CN111360826B - System capable of displaying grabbing pose in real time

Info

Publication number: CN111360826B (application CN202010132892.6A)
Authority: CN (China)
Prior art keywords: mechanical arm, grabbing, computer, camera, target object
Legal status: Active (granted)
Inventors: 庞剑坤, 魏武
Assignee (current and original): South China University of Technology (SCUT)
Other languages: Chinese (zh)
Other versions: CN111360826A
Priority/filing date: 2020-02-29
Application filed by South China University of Technology (SCUT)
Publication of CN111360826A (application): 2020-07-03
Publication of CN111360826B (grant): 2023-01-06

Classifications

    • B25J 9/00 Programme-controlled manipulators; B25J 9/16 Programme controls
    • B25J 9/1612 Programme controls characterised by the hand, wrist, grip control
    • B25J 9/1679 Programme controls characterised by the tasks executed
    • B25J 19/00 Accessories fitted to manipulators, e.g. for monitoring, for viewing; safety devices combined with or specially adapted for use in connection with manipulators
    • B25J 19/02 Sensing devices; B25J 19/04 Viewing devices
    • G06T 7/00 Image analysis; G06T 7/70 Determining position or orientation of objects or cameras
    • H04N 23/00 Cameras or camera modules comprising electronic image sensors; H04N 23/57 Mechanical or electrical details of cameras or camera modules specially adapted for being embedded in other devices

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • General Health & Medical Sciences (AREA)
  • Signal Processing (AREA)
  • Health & Medical Sciences (AREA)
  • Multimedia (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Theoretical Computer Science (AREA)
  • Manipulator (AREA)

Abstract

The invention discloses a system capable of displaying a grabbing pose in real time, comprising a mechanical arm part, a camera part, a target object and a computer. The mechanical arm part comprises a six-degree-of-freedom mechanical arm and a two-finger clamping jaw, the clamping jaw being connected to the mechanical arm; the camera part comprises a depth camera, which is connected to the computer; the computer comprises an algorithm processing unit for calculating the grabbing poses of the mechanical arm, the camera and the target object. The target object is placed below the camera. As the mechanical arm part moves from top to bottom along a trajectory, the camera acquires depth images of the target object and sends them to the computer, where the algorithm processing unit produces an information entropy map. The computer's display then shows, in real time, a changing plot of the optimal grabbing pose of the target object, the depth image acquired from the camera, and the information entropy map.

Description

System capable of displaying grabbing pose in real time
Technical Field
The invention relates to the field of mechanical arm visual grabbing, in particular to a system capable of displaying grabbing poses in real time.
Background
In recent years, visual grasping with mechanical arms has gradually become a research hotspot, and related applications are steadily entering the market. Most existing mechanical-arm grasping systems rely on high-performance hardware, such as multi-core processors and graphics cards with large memory. Visual grasping systems that rely on conventional image recognition in this way are difficult to deploy in real-world scenarios. For example, the high-performance TITAN graphics card used in "Deep Object Pose Estimation for Semantic Robotic Grasping of Household Objects", published by NVIDIA, is very expensive. On the one hand, the high price and demanding hardware requirements limit the spread of such systems. On the other hand, with general-purpose hardware they cannot meet the real-time requirements of grasping; typically they can only grasp static objects, take a long time, and are inefficient.
The system described here, which displays the grabbing pose in real time, addresses these problems. The invention fully sets out the components of the system, the communication between them, and the specific processing method. During the downward grasping motion of the mechanical arm, depth information of the target object is obtained from multiple viewing angles, the optimal grasping pose of the target object is generated and displayed in real time, and dynamic grasping is thereby achieved. The processing time from information input to decision is greatly shortened, making the system efficient and effective; it is of considerable reference value for the Robot Operating System (ROS) and environment-perception systems and can be popularized in the field of industrial robots.
Disclosure of Invention
The invention provides a system capable of displaying the grabbing pose in real time. It mainly addresses the problems that existing algorithms place high demands on hardware and cannot display the grabbing pose in real time. It fully sets out the components of the system, the communication between the components and the specific processing method, and shortens the computation time, realizing real-time display of the optimal grabbing pose through a computationally efficient grid-map method.
The invention is realized by at least one of the following technical schemes.
A system capable of displaying a grabbing pose in real time comprises a mechanical arm part, a camera part, a target object and a computer;
the mechanical arm part comprises a six-degree-of-freedom mechanical arm and two-finger clamping jaws, the six-degree-of-freedom mechanical arm is connected with the two-finger clamping jaws, and the two-finger clamping jaws are arranged at the tail end of the six-degree-of-freedom mechanical arm;
the camera part comprises a depth camera, and the depth camera is connected with the computer; the depth camera is arranged right above the two-finger clamping jaw;
the computer comprises an algorithm processing unit, and the algorithm processing unit is used for calculating the grabbing poses of the mechanical arm, the camera and the target object;
the target object is placed below the depth camera, the mechanical arm part moves from top to bottom along a track, in the process, the depth camera acquires a depth image of the target object and sends the depth image to the computer, an information entropy diagram is obtained through processing of an algorithm processing unit of the computer, and a change diagram of the optimal grabbing pose of the target object, the depth image acquired from the depth camera and the information entropy diagram are displayed on a display screen of the computer.
Further, the six-degree-of-freedom mechanical arm is a UR5 industrial mechanical arm, and the two-finger clamping jaw is an RG2 gripper.
Further, the depth camera is an Intel Realsense D435i.
Further, the computer system used by the algorithm processing unit is Ubuntu 16.04, and the robot operating system is ROS Kinetic.
Further, the motion trajectory of the mechanical arm part is defined as follows:
$p$: the three-dimensional position of the camera as the mechanical arm moves downward along a trajectory;
$k$: the index of the series of points $p$ along that trajectory;
$\Gamma = \{p_0, \ldots, p_k\}$: a random trajectory consisting of the points $p$;
$p_0$: the camera position before the mechanical arm starts to move, at vertical height $z_{\max}$;
$p_k$: the camera position after the motion ends, at vertical height $z_{\min}$.
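For illustration, a minimal Python sketch (not part of the patent) of how such a descent trajectory $\Gamma$ might be generated; the function name, the linear descent and the lateral jitter are assumptions of this sketch:

```python
import numpy as np

def make_descent_trajectory(xy_start, z_max, z_min, k, xy_jitter=0.02):
    """Generate a top-down viewpoint trajectory Gamma = {p_0, ..., p_k}.

    xy_start  : (x, y) of the camera before the arm starts moving
    z_max     : vertical height of p_0
    z_min     : vertical height of p_k
    xy_jitter : random lateral offset (m), so the camera observes the
                target from multiple viewpoints on the way down
    """
    heights = np.linspace(z_max, z_min, k + 1)          # monotone descent
    x, y = xy_start
    trajectory = []
    for z in heights:
        dx, dy = np.random.uniform(-xy_jitter, xy_jitter, size=2)
        trajectory.append(np.array([x + dx, y + dy, z]))
    return trajectory                                   # points p_0 ... p_k

# usage: Gamma = make_descent_trajectory((0.4, 0.0), z_max=0.55, z_min=0.20, k=10)
```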
Further, after the computer starts the depth camera, depth images are sent to the computer at an update rate of 80 fps, ensuring continuity of data transmission.
Further, the six-degree-of-freedom mechanical arm, the depth camera and the computer communicate under ROS. Specifically, the computer is connected to the six-degree-of-freedom mechanical arm through a network cable, starts the mechanical arm, and sends it control instructions. The computer receives in real time the depth information that the depth camera acquires from the target object and feeds it to the internal algorithm processing unit, which outputs the optimal grabbing pose of the target object; this pose is displayed on the computer screen, so that the optimal grabbing pose of the target object is shown in real time. The algorithm processing time is less than 0.5 s.
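A minimal sketch of this loop as a single ROS (Kinetic-era rospy) node follows; the topic names and the process_depth stub are assumptions of the sketch, not names given in the patent:

```python
#!/usr/bin/env python
import rospy
from sensor_msgs.msg import Image
from geometry_msgs.msg import PoseStamped
from cv_bridge import CvBridge

bridge = CvBridge()

def process_depth(depth):
    """Stub standing in for the algorithm processing unit described above."""
    pose = PoseStamped()
    pose.header.stamp = rospy.Time.now()
    pose.header.frame_id = 'base_link'   # assumed robot base frame
    return pose

def depth_callback(msg):
    depth = bridge.imgmsg_to_cv2(msg, desired_encoding='passthrough')
    pose_pub.publish(process_depth(depth))   # total latency must stay < 0.5 s

if __name__ == '__main__':
    rospy.init_node('grasp_pose_node')
    pose_pub = rospy.Publisher('/best_grasp_pose', PoseStamped, queue_size=1)
    rospy.Subscriber('/camera/depth/image_rect_raw', Image, depth_callback,
                     queue_size=1, buff_size=2 ** 24)
    rospy.spin()
```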
Further, the grabbing pose of the target object calculated by the algorithm processing unit is defined as follows:
$g = (c, \phi, w, q)$: the parameters involved in one complete grasping motion;
$c = (x, y, z)$: the three-dimensional coordinates of the grasping point on the target object, i.e. the position the clamping jaw must reach; $x$, $y$, $z$ are the Cartesian coordinates along the X, Y and Z axes, in mm;
$\phi \in [0, \pi]$: the angle through which the clamping jaw must rotate to grasp the target object;
$w$: the width to which the clamping jaw must open to grasp the target object, in mm;
$q \in [0, 1]$: the grasping quality; by convention, the larger the value, the higher the grasp success rate.
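As a data structure, the grasp pose defined above might be represented as follows (a sketch; the class name and validation are additions, not the patent's):

```python
import math
from dataclasses import dataclass

@dataclass
class GraspPose:
    """g = (c, phi, w, q) for one complete gripping motion."""
    c: tuple      # (x, y, z): grasp point in mm, Cartesian coordinates
    phi: float    # jaw rotation angle, in [0, pi]
    w: float      # jaw opening width, in mm
    q: float      # grasp quality in [0, 1]; larger means more likely to succeed

    def __post_init__(self):
        assert 0.0 <= self.phi <= math.pi, "phi must lie in [0, pi]"
        assert 0.0 <= self.q <= 1.0, "q must lie in [0, 1]"

# usage: g = GraspPose(c=(412.0, -37.5, 55.0), phi=1.2, w=64.0, q=0.83)
```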
In order to combine the observations made along the viewpoint trajectory at each time step, the workspace of the six-degree-of-freedom mechanical arm and two-finger clamping jaw is represented as a two-dimensional grid map $M$ of $J \times K$ cells, where $J$ and $K$ are the length and width of the grid map; each cell corresponds to a $u \times u$ physical area, where $u$ is the side length of a unit cell.
In each cell $(j, k)$, corresponding to one $u \times u$ physical region, the grasp quality observations $q$ are accumulated into a vector $q_{j,k}$ discretized into $n_q$ intervals, where $n_q$ indexes the grid-map row coordinate. Combined quality-and-angle observations $(q, \phi)$ are recorded in a two-dimensional histogram $m_{j,k}$ with $n_q \cdot n_\phi$ intervals, where the abscissa $n_\phi$ indexes the rotation angle and the ordinate $n_q$ indexes the grasping quality. These vectors describe the distribution of observations at each point and form the basis of information acquisition. In the grid map, the number inside each cell represents a probability: the larger the number, the larger the information gain $(n_q, n_\phi)$ in that cell's region. Whether a region contains an object can therefore be read off the grid map: cells with large values contain objects, and cells with small values do not.
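One way the per-cell histograms could be realized is sketched below; the class and parameter names are assumptions, and the bin counts n_q and n_phi are illustrative:

```python
import numpy as np

class GraspGridMap:
    """J x K grid map; each cell (j, k) keeps a quality histogram q[j,k]
    with n_q bins and a joint (q, phi) histogram m[j,k] with n_q * n_phi bins."""

    def __init__(self, J, K, n_q=10, n_phi=18):
        self.n_q, self.n_phi = n_q, n_phi
        self.q_hist = np.zeros((J, K, n_q))           # vector q_{j,k}
        self.m = np.zeros((J, K, n_q, n_phi))         # histogram m_{j,k}

    def add_observation(self, j, k, q, phi):
        """Record one grasp observation (q in [0, 1], phi in [0, pi)) in cell (j, k)."""
        iq = min(int(q * self.n_q), self.n_q - 1)
        ip = min(int(phi / np.pi * self.n_phi), self.n_phi - 1)
        self.q_hist[j, k, iq] += 1
        self.m[j, k, iq, ip] += 1
```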
further, the grid map is obtained by digitally converting a depth image of the object, and the capture in the (j, k) region is defined by parameterizing the average of the observations in the region:
Figure GDA0003879928450000041
wherein, g j,k Represents grabbing within the (j, k) region; c. C j,k Representing the three-dimensional position of the target grabbing center point;
Figure GDA0003879928450000042
represents phi j,k Mean value of (phi) j,k Representing the angle of the object to be rotated when the two fingers grab the object in the (j, k) area;
Figure GDA0003879928450000043
represents w j,k Mean value of (1), w j,k The two-finger clamping jaw is used for clamping an object in the area (j, k) and needs to be opened;
Figure GDA0003879928450000044
represents q j,k Mean value of q j,k Representing the grabbing quality of the two fingers in the (j, k) area;
further, the average observed value is calculated as follows, and for a single cell, the average grab quality observed value q is given by:
Figure GDA0003879928450000045
wherein N is q Represents n q The set of (a) and (b),
Figure GDA0003879928450000046
representing a subscript of n q Q value of (1);
mean angle of rotation
Figure GDA0003879928450000047
Is the vector mean of the angle observations weighted by the corresponding grabbed quality observations:
Figure GDA0003879928450000051
Figure GDA0003879928450000052
Figure GDA0003879928450000053
wherein pi represents a grid diagram
Figure GDA0003879928450000054
The sum of all the sine values of the grabbing angles, psi represents the sum of all the cosine values of the grabbing angles, N q Represents n q Set of (2), N φ Represents n φ A set of (a);
average opening width of one unit
Figure GDA0003879928450000055
Average of n observations:
Figure GDA0003879928450000056
where n represents the number of values of w.
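The three averages can be computed directly from the per-cell observations. The sketch below uses the equivalent observation-wise form of the bin sums above; the binning is omitted for brevity, and the modulo mapping of the angle back into $[0, \pi)$ is an addition of this sketch:

```python
import numpy as np

def cell_averages(qs, phis, ws):
    """Average grasp for one cell from its accumulated observations.

    qs   : grasp quality observations, each in [0, 1]
    phis : grasp angle observations, each in [0, pi)
    ws   : jaw opening width observations, in mm
    """
    qs, phis, ws = map(np.asarray, (qs, phis, ws))

    q_bar = qs.mean()                       # average grasp quality

    # vector mean of angles weighted by quality; the factor 2 handles
    # the pi-periodicity of the grasp angle
    Pi = np.sum(qs * np.sin(2.0 * phis))    # weighted sum of sines
    Psi = np.sum(qs * np.cos(2.0 * phis))   # weighted sum of cosines
    phi_bar = (0.5 * np.arctan2(Pi, Psi)) % np.pi

    w_bar = ws.mean()                       # average opening width
    return q_bar, phi_bar, w_bar

# usage: cell_averages([0.7, 0.9], [1.1, 1.3], [55.0, 60.0])
```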
Compared with the prior art, the invention has the advantages and beneficial effects that:
1. The invention fully sets out the components of the system, the communication between them and the specific processing method. During the downward grasping motion of the mechanical arm, depth information of the target object is obtained from multiple viewing angles, the optimal grasping pose is generated and displayed in real time, dynamic grasping is achieved, and the processing time from information input to decision is greatly shortened, making the system efficient and effective.
2. The system components and communication scheme adopted by the invention simplify the information-transmission process. At the same time, the grid-based computation greatly shortens the time needed to compute the optimal grasping pose; no powerful processor or graphics card is required, and the system can run on an ordinary industrial PC or laptop, making it easy to popularize.
Drawings
Fig. 1 is a schematic diagram of a system capable of displaying a grabbing pose in real time according to the embodiment;
fig. 2 is a depth image obtained from the target object in this embodiment;
FIG. 3 is a diagram of the optimal grabbing pose displayed in real time in this embodiment;
FIG. 4 is a grid diagram in the present embodiment;
FIG. 5 is a diagram of the path followed by the jaws of this embodiment;
in the figure: 1-six degree of freedom mechanical arm; 2-a clamping jaw; 3-a depth camera; 4-a target object; 5-a computer.
Detailed Description
The working principle and working process of the present invention will be further explained in detail with reference to the accompanying drawings.
As shown in fig. 1, a system capable of displaying a grabbing pose in real time comprises a mechanical arm part, a camera part, a target object and a computer, wherein the mechanical arm part comprises a six-degree-of-freedom mechanical arm 1 and two-finger clamping jaws 2, and the two-finger clamping jaws 2 are arranged at the tail end of the six-degree-of-freedom mechanical arm 1;
the camera part comprises a depth camera 3, the camera 3 is connected with a computer 5, and the depth camera 3 is arranged right above the two-finger clamping jaw 2;
the target object part comprises a plurality of target objects 4 common in daily life;
the computer 5 comprises an algorithm processing unit which is used for calculating the grabbing poses of the mechanical arm, the camera and the target object;
as shown in fig. 1, the camera 3 acquires a depth image of the target object and sends the depth image to the computer 5, the information entropy diagram shown in fig. 3 is obtained through processing by the algorithm processing unit of the computer 5, and a variation diagram of the optimal grabbing pose (rectangle) of the target object is displayed on the display screen of the computer 5.
The six-degree-of-freedom mechanical arm 1 is a UR5 industrial mechanical arm and the two-finger clamping jaw 2 is an RG2 gripper, the two being connected to each other; the depth camera 3 is an Intel Realsense D435i; the computer system used by the algorithm processing unit is Ubuntu 16.04, and the robot operating system is ROS Kinetic;
Specifically, the target object 4 is placed randomly directly below the camera, and the six-degree-of-freedom mechanical arm 1 drives the two-finger clamping jaw 2 from top to bottom along a trajectory, as shown in fig. 5. During this motion the camera 3 continuously obtains depth information of the target object from several different viewing angles and transmits the depth images to the computer 5, which processes them with the algorithm and calculates the optimal grabbing pose of the target object. At the same time, the object depth map (shown in fig. 2), the information entropy map produced by the algorithm, and the change plot of the optimal (rectangular) grabbing pose computed by the algorithm are displayed on the screen of the computer 5.
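For illustration, a sketch of the final step: choosing the cell with the highest average quality and drawing the grasp rectangle on a visualized depth image. The OpenCV rendering choices (intensity scaling, rectangle aspect ratio, colour) are assumptions of this sketch, and the width map is assumed to be already converted to pixels:

```python
import numpy as np
import cv2

def draw_best_grasp(depth_img, q_map, phi_map, w_map, cell_px):
    """Overlay the best grasp rectangle on an 8-bit view of the depth image.

    q_map, phi_map, w_map : J x K arrays of per-cell averages
    cell_px               : side length of one grid cell, in pixels
    """
    j, k = np.unravel_index(np.argmax(q_map), q_map.shape)
    cx, cy = (k + 0.5) * cell_px, (j + 0.5) * cell_px   # cell centre (pixels)
    rect = ((cx, cy),
            (w_map[j, k], 0.5 * w_map[j, k]),           # opening width x jaw depth
            np.degrees(phi_map[j, k]))
    box = cv2.boxPoints(rect).astype(np.int32)

    vis = cv2.convertScaleAbs(depth_img, alpha=255.0 / max(depth_img.max(), 1))
    vis = cv2.cvtColor(vis, cv2.COLOR_GRAY2BGR)
    cv2.drawContours(vis, [box], 0, (0, 0, 255), 2)     # red grasp rectangle
    return vis
```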
The motion trajectory of the mechanical arm part is defined as follows:
$p$: the three-dimensional position of the camera as the mechanical arm 1 moves downward along a trajectory;
$k$: the index of the series of points $p$ along that trajectory;
$\Gamma = \{p_0, \ldots, p_k\}$: a random trajectory consisting of the points $p$;
$p_0$: the camera position before the mechanical arm starts to move, at vertical height $z_{\max}$;
$p_k$: the camera position after the motion ends, at vertical height $z_{\min}$.
After the computer 5 starts the depth camera 3, the camera acquires the corresponding depth images and sends them to the computer 5 at an update rate of 80 fps, ensuring continuity of data transmission.
As shown in fig. 1, the mechanical arm 1, the camera 3 and the computer 5 communicate under the Robot Operating System (ROS). The computer 5 is connected to the mechanical arm 1 through a network cable, starts the mechanical arm 1, and transmits status information and control commands to the six-degree-of-freedom mechanical arm 1. The computer 5 receives the information sent by the depth camera 3 in real time and feeds it to the algorithm processing unit. The whole communication flow is as follows: under ROS, the mechanical arm 1 and the depth camera 3 each publish their relevant information; the depth camera 3 acquires and publishes the depth information of the target object, namely the thickness of the object and its distance from the camera; the computer 5 continuously receives this information and passes the depth data to the algorithm processing unit, which outputs the optimal grabbing pose of the target object and publishes it as a topic. The computer 5 can then start Rviz (the visualization tool bundled with ROS) to obtain the optimal grabbing pose of the target object and display it on the screen of the computer 5, so that the optimal grabbing pose is shown in real time; the algorithm processing time is less than 0.5 s.
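The pose message construction glossed over in the earlier node sketch might look as follows; the frame name, the top-down orientation convention and the topic name are again assumptions, not values given in the patent:

```python
#!/usr/bin/env python
import math
import rospy
from geometry_msgs.msg import PoseStamped
from tf.transformations import quaternion_from_euler

def publish_grasp(pub, x, y, z, phi):
    """Publish the grasp position c = (x, y, z) and rotation phi as a
    PoseStamped that Rviz can display directly."""
    msg = PoseStamped()
    msg.header.stamp = rospy.Time.now()
    msg.header.frame_id = 'base_link'              # assumed robot base frame
    msg.pose.position.x, msg.pose.position.y, msg.pose.position.z = x, y, z
    # top-down grasp: gripper pointing down (roll pi), rotated by phi about Z
    qx, qy, qz, qw = quaternion_from_euler(math.pi, 0.0, phi)
    msg.pose.orientation.x = qx
    msg.pose.orientation.y = qy
    msg.pose.orientation.z = qz
    msg.pose.orientation.w = qw
    pub.publish(msg)

# usage:
# rospy.init_node('grasp_pose_publisher')
# pub = rospy.Publisher('/best_grasp_pose', PoseStamped, queue_size=1)
# publish_grasp(pub, 0.41, -0.04, 0.05, phi=1.2)
```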
The grabbing pose of the target object calculated by the algorithm processing unit is defined as follows:
$g = (c, \phi, w, q)$: the parameters involved in one complete grasping motion;
$c = (x, y, z)$: the three-dimensional coordinates of the grasping point on the target object, i.e. the position the clamping jaw must reach; $x$, $y$, $z$ are the Cartesian coordinates along the X, Y and Z axes, in mm;
$\phi \in [0, \pi]$: the angle through which the clamping jaw must rotate to grasp the target object;
$w$: the width to which the clamping jaw must open, in mm;
$q \in [0, 1]$: the grasping quality; the larger the value, the higher the grasp success rate.
In order to combine the observations made along the viewpoint trajectory at each time step, the workspace of the six-degree-of-freedom mechanical arm and two-finger clamping jaw is represented as a two-dimensional grid map $M$ of $J \times K$ cells, $J$ and $K$ being the length and width of the grid map; each cell corresponds to a $u \times u$ physical area, $u$ being the side length of a unit cell.
As shown in fig. 4, in each cell $(j, k)$, corresponding to one $u \times u$ physical region, the grasp quality observations $q$ are accumulated into a vector $q_{j,k}$ discretized into $n_q$ intervals, $n_q$ indexing the grid-map row coordinate; combined quality-and-angle observations $(q, \phi)$ are recorded in a two-dimensional histogram $m_{j,k}$ with $n_q \cdot n_\phi$ intervals, the abscissa $n_\phi$ indexing the rotation angle and the ordinate $n_q$ the grasping quality. These vectors represent the distribution of observations at each point and form the basis of the information-acquisition method. In the grid map, the number inside each cell represents a probability: the larger the number, the larger the information gain $(n_q, n_\phi)$ in that cell's region, so whether an area contains an object can be determined from the values in the grid map; cells with large numbers contain objects, and cells with small numbers indicate that no objects are contained.
The adopted grid map is obtained by digitally converting the depth image of the object. The grasp in the $(j, k)$ region is defined by parameterizing the average of the observations in the region:

$g_{j,k} = (c_{j,k}, \bar{\phi}_{j,k}, \bar{w}_{j,k}, \bar{q}_{j,k})$

where $g_{j,k}$ represents the grasp within the $(j, k)$ region; $c_{j,k}$ represents the three-dimensional position of the grasp center point; $\bar{\phi}_{j,k}$ is the mean of $\phi_{j,k}$, the angle the two-finger clamping jaw must rotate to grasp the object in the $(j, k)$ region; $\bar{w}_{j,k}$ is the mean of $w_{j,k}$, the width the clamping jaw must open to grasp the object in the $(j, k)$ region; and $\bar{q}_{j,k}$ is the mean of $q_{j,k}$, the grasping quality of the clamping jaw in the $(j, k)$ region.

The average observed values are computed as follows. For a single cell, the average grasp quality $\bar{q}$ is

$\bar{q} = \frac{1}{|N_q|} \sum_{n_q \in N_q} q_{n_q}$

where $N_q$ is the set of indices $n_q$ and $q_{n_q}$ is the quality value with subscript $n_q$.

The mean rotation angle $\bar{\phi}$ is the vector mean of the angle observations weighted by the corresponding grasp quality observations:

$\Pi = \sum_{n_q \in N_q} \sum_{n_\phi \in N_\phi} q_{n_q}\, m_{j,k}(n_q, n_\phi)\, \sin(2\phi_{n_\phi})$
$\Psi = \sum_{n_q \in N_q} \sum_{n_\phi \in N_\phi} q_{n_q}\, m_{j,k}(n_q, n_\phi)\, \cos(2\phi_{n_\phi})$
$\bar{\phi} = \frac{1}{2} \arctan\frac{\Pi}{\Psi}$

where $\Pi$ represents the cumulative sum of the sines of all grasp angles, $\Psi$ the cumulative sum of their cosines, $N_q$ the set of $n_q$, and $N_\phi$ the set of $n_\phi$.

The average opening width $\bar{w}$ of one cell is the average of the $n$ observations:

$\bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i$

where $n$ represents the number of $w$ observations.
The above description is only for the purpose of illustrating the preferred embodiments of the present invention and is not to be construed as limiting the invention, and any modifications, equivalents or improvements made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (9)

1. A system capable of displaying a grabbing pose in real time, characterized in that the system comprises a mechanical arm part, a camera part, a target object and a computer;
the mechanical arm part comprises a six-degree-of-freedom mechanical arm (1) and two-finger clamping jaws (2), the six-degree-of-freedom mechanical arm (1) is connected with the two-finger clamping jaws (2), and the two-finger clamping jaws (2) are installed at the tail end of the six-degree-of-freedom mechanical arm (1);
the camera part comprises a depth camera (3), and the depth camera (3) is connected with a computer (5); the depth camera (3) is arranged right above the two-finger clamping jaw (2);
the computer (5) comprises an algorithm processing unit, and the algorithm processing unit is used for calculating the grabbing poses of the mechanical arm, the camera and the target object;
the target object is placed below the depth camera (3), the mechanical arm part moves from top to bottom along a track, in the process, the depth camera (3) acquires a depth image of the target object and sends the depth image to the computer (5), an information entropy diagram is obtained through processing of an algorithm processing unit of the computer (5), and a change diagram of the optimal grabbing pose of the target object, the depth image acquired from the depth camera (3) and the information entropy diagram are displayed on a display screen of the computer (5);
the grabbing pose of the target object calculated by the algorithm processing unit is defined as follows:
$g = (c, \phi, w, q)$: the parameters involved in one complete grasping motion;
$c = (x, y, z)$: the three-dimensional coordinates of the grasping point on the target object, i.e. the position the clamping jaw must reach; $x$, $y$, $z$ are the Cartesian coordinates along the X, Y and Z axes, in mm;
$\phi \in [0, \pi]$: the angle through which the clamping jaw must rotate to grasp the target object;
$w$: the width to which the clamping jaw must open, in mm;
$q \in [0, 1]$: the grasping quality; the larger the value, the higher the grasp success rate;
in order to combine the observations made along the viewpoint trajectory at each time step, the workspace of the six-degree-of-freedom mechanical arm and two-finger clamping jaw is represented as a two-dimensional grid map $M$ of $J \times K$ cells, $J$ and $K$ being the length and width of the grid map; each cell corresponds to a $u \times u$ physical area, $u$ being the side length of a unit cell;
in each cell $(j, k)$, corresponding to one $u \times u$ physical region, the grasp quality observations $q$ are accumulated into a vector $q_{j,k}$ discretized into $n_q$ intervals, $n_q$ indexing the grid-map row coordinate; combined quality-and-angle observations $(q, \phi)$ are recorded in a two-dimensional histogram $m_{j,k}$ with $n_q \cdot n_\phi$ intervals, the abscissa $n_\phi$ indexing the rotation angle and the ordinate $n_q$ the grasping quality; these vectors represent the distribution of observations at each point and form the basis of information acquisition; in the grid map, the number inside each cell represents a probability: the larger the number, the larger the information gain $(n_q, n_\phi)$ in that cell's region, so whether an area contains an object is determined from the values in the grid map; cells with large numbers contain objects, and cells with small numbers indicate that no objects are contained.
2. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: the six-degree-of-freedom mechanical arm (1) is a UR5 industrial mechanical arm, and the two-finger clamping jaw (2) is an RG2 clamping jaw.
3. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: the depth camera (3) is an Intel Realsense D435i.
4. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: the computer system used by the algorithm processing unit is Ubuntu 16.04, and the robot operating system is ROS Kinetic.
5. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: the mechanical arm part motion trajectory is defined as follows:
$p$: the three-dimensional position of the camera as the mechanical arm (1) moves downward along a trajectory;
$k$: the index of the series of points $p$ along that trajectory;
$\Gamma = \{p_0, \ldots, p_k\}$: a random trajectory consisting of the points $p$;
$p_0$: the camera position before the mechanical arm starts to move, at vertical height $z_{\max}$;
$p_k$: the camera position after the motion ends, at vertical height $z_{\min}$.
6. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: after the depth camera is started by the computer (5), the depth image is sent to the computer (5), and the updating frequency is 80fps, so that the continuity of data transmission is ensured.
7. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: the communication mode of the six-degree-of-freedom mechanical arm (1), the depth camera (3) and the computer (5) is communication under ROS, specifically, the computer (5) is connected with the six-degree-of-freedom mechanical arm (1) through a network cable, then the six-degree-of-freedom mechanical arm (1) is started, and a control instruction is sent to the six-degree-of-freedom mechanical arm (1); the computer (5) receives the depth information acquired by the depth camera (3) from the target object in real time, inputs the depth information into an internal algorithm processing unit, outputs the optimal grabbing pose of the target object after the depth information is processed by the algorithm processing unit, and displays the optimal grabbing pose through a screen of the computer (5) so as to display the optimal grabbing pose of the target object in real time, wherein the time of algorithm processing is less than 0.5s.
8. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: the grid map is obtained by digitally transforming a depth image of the object, and the capture in the (j, k) region is defined by parameterizing the average of the observations in the region:
$g_{j,k} = (c_{j,k}, \bar{\phi}_{j,k}, \bar{w}_{j,k}, \bar{q}_{j,k})$
wherein $g_{j,k}$ represents the grasp within the $(j, k)$ region; $c_{j,k}$ represents the three-dimensional position of the grasp center point; $\bar{\phi}_{j,k}$ is the mean of $\phi_{j,k}$, the angle the two-finger clamping jaw must rotate to grasp the object in the $(j, k)$ region; $\bar{w}_{j,k}$ is the mean of $w_{j,k}$, the width the clamping jaw must open to grasp the object in the $(j, k)$ region; and $\bar{q}_{j,k}$ is the mean of $q_{j,k}$, the grasping quality of the clamping jaw in the $(j, k)$ region.
9. The system capable of displaying the grabbing pose in real time according to claim 1, wherein: the average observations are calculated as follows, and for a single cell, the average grab quality observation q is given by:
$\bar{q} = \frac{1}{|N_q|} \sum_{n_q \in N_q} q_{n_q}$
wherein $N_q$ represents the set of indices $n_q$ and $q_{n_q}$ represents the quality value with subscript $n_q$;
the mean rotation angle $\bar{\phi}$ is the vector mean of the angle observations weighted by the corresponding grasp quality observations:
$\Pi = \sum_{n_q \in N_q} \sum_{n_\phi \in N_\phi} q_{n_q}\, m_{j,k}(n_q, n_\phi)\, \sin(2\phi_{n_\phi})$
$\Psi = \sum_{n_q \in N_q} \sum_{n_\phi \in N_\phi} q_{n_q}\, m_{j,k}(n_q, n_\phi)\, \cos(2\phi_{n_\phi})$
$\bar{\phi} = \frac{1}{2} \arctan\frac{\Pi}{\Psi}$
wherein $\Pi$ represents the cumulative sum of the sines of all grasp angles, $\Psi$ represents the cumulative sum of their cosines, $N_q$ represents the set of $n_q$, and $N_\phi$ represents the set of $n_\phi$;
the average opening width $\bar{w}$ of one cell is the average of the $n$ observations:
$\bar{w} = \frac{1}{n} \sum_{i=1}^{n} w_i$
wherein $n$ represents the number of $w$ observations.
CN202010132892.6A 2020-02-29 2020-02-29 System capable of displaying grabbing pose in real time Active CN111360826B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010132892.6A CN111360826B (en) 2020-02-29 2020-02-29 System capable of displaying grabbing pose in real time


Publications (2)

Publication Number Publication Date
CN111360826A CN111360826A (en) 2020-07-03
CN111360826B true CN111360826B (en) 2023-01-06

Family

Family ID: 71200197

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010132892.6A Active CN111360826B (en) 2020-02-29 2020-02-29 System capable of displaying grabbing pose in real time

Country Status (1)

Country Link
CN (1) CN111360826B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113499138B (en) * 2021-07-07 2022-08-09 南开大学 Active navigation system for surgical operation and control method thereof

Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573221A (en) * 2018-03-28 2018-09-25 重庆邮电大学 A vision-based saliency detection method for robot target parts

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP3408729B1 (en) * 2016-01-29 2021-11-17 ABB Schweiz AG Method for calibrating touchscreen panel with industrial robot and system, industrial robot and touchscreen using the same
US11148295B2 (en) * 2018-06-17 2021-10-19 Robotics Materials, Inc. Systems, devices, components, and methods for a compact robotic gripper with palm-mounted sensing, grasping, and computing devices and components

Patent Citations (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108573221A (en) * 2018-03-28 2018-09-25 重庆邮电大学 A vision-based saliency detection method for robot target parts

Also Published As

Publication number Publication date
CN111360826A (en) 2020-07-03


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant