CN114083545B - Moving object robot grabbing method and device based on visual perception - Google Patents

Moving object robot grabbing method and device based on visual perception

Info

Publication number
CN114083545B
CN114083545B
Authority
CN
China
Prior art keywords
pose
motion
energy function
mechanical arm
grabbing
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202210076251.2A
Other languages
Chinese (zh)
Other versions
CN114083545A (en)
Inventor
李特
王彬
顾建军
曹昕
金立
秦学英
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhejiang Lab
Original Assignee
Zhejiang Lab
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhejiang Lab filed Critical Zhejiang Lab
Priority to CN202210076251.2A
Publication of CN114083545A
Application granted
Publication of CN114083545B
Active legal status (current)
Anticipated expiration legal status

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1694Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697Vision controlled systems
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J13/00Controls for manipulators
    • B25J13/08Controls for manipulators by means of sensing devices, e.g. viewing or touching devices
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/1607Calculation of inertia, jacobian matrixes and inverses
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Software Systems (AREA)
  • Human Computer Interaction (AREA)
  • Manipulator (AREA)
  • Image Analysis (AREA)

Abstract

The invention discloses a moving object robot grabbing method and device based on visual perception. The method comprises the following steps: acquiring a first energy function at a first camera view angle and a second energy function at a second camera view angle; calculating a comprehensive energy function in the object center coordinate system from the first energy function and the second energy function; minimizing the comprehensive energy function to obtain a pose transformation increment; updating the object pose at each camera view angle according to the pose transformation increment; inputting the object poses from several updates into an object motion prediction model and predicting a first pose of the object within a predetermined future time; comparing the first pose with all second poses in a grabbing pose database, and taking the first pose whose difference from the second poses is the smallest and smaller than a preset threshold as the end pose of the mechanical arm corresponding to the grabbing pose; and controlling the mechanical arm to move according to the end pose of the mechanical arm and then controlling the end effector to grab the object.

Description

Moving object robot grabbing method and device based on visual perception
Technical Field
The application relates to the technical field of three-dimensional object tracking, trajectory prediction and mechanical arm planning control, in particular to a moving object robot grabbing method and device based on visual perception.
Background
Three-dimensional object tracking technology has wide applications, such as AR games, AR navigation on mobile devices (for example in a shopping environment), and electronic instructions for instrument maintenance, in which an instrument is tracked and the processing steps or components to be handled are rendered on the screen in real time. Real-time, high-precision tracking of the three-dimensional pose of an object has always been a direction of researchers' efforts.
The field of robot control and planning has been a focus of research. Robot researchers have made significant advances in developing algorithms and methods for robot operations in static environments.
However, in the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
robot operation becomes more difficult in dynamic environments, which are common in the real world, for example dynamic grabbing, ball catching, and human-robot handover, where the objects and obstacles to be interacted with may move with unknown or known motions. On the one hand, visual perception of the moving object tends to be unstable; on the other hand, predicting the motion trajectory of the object is also a difficult point of the grabbing problem.
Disclosure of Invention
The embodiment of the application aims to provide a moving object robot grabbing method and device based on visual perception so as to solve the technical problem that a moving object is difficult to grab in the related technology.
According to a first aspect of embodiments of the present application, there is provided a moving object robot grabbing method based on visual perception, including:
acquiring a first energy function under a first camera view angle and a second energy function under a second camera view angle;
according to the first energy function and the second energy function, calculating a comprehensive energy function of double-view joint optimization under an object center coordinate system;
minimizing the comprehensive energy function to obtain a pose transformation increment;
updating the object pose at each camera view angle according to the pose transformation increment;
inputting the object poses updated for a plurality of times into an object motion prediction model, and predicting a first pose of the object in a future preset time;
comparing the first pose with all second poses in a grabbing pose database, and taking the first pose whose difference from the second poses is the smallest and smaller than a preset threshold as the end pose of the mechanical arm corresponding to the grabbing pose;
and controlling the end effector to grab the object after the mechanical arm moves according to the pose of the tail end of the mechanical arm.
Further, minimizing the comprehensive energy function to obtain a pose transformation increment, including:
calculating a first Jacobian matrix of the first energy function and a second Jacobian matrix of the second energy function;
and calculating a pose transformation increment according to the first Jacobian matrix and the second Jacobian matrix.
Further, the prediction process of the object motion prediction model includes:
acquiring the updated pose of the object, and judging the motion mode of the object;
if the motion mode of the object is uniform linear motion or uniform circular motion, predicting the pose of the object through a Kalman filtering algorithm;
if the motion mode of the object is spherical pendulum motion, predicting the translational motion of the object by establishing an object translation model, predicting the rotational motion of the object by establishing a key sequence of the rotational motion of the object, and obtaining a prediction result of the pose of the object according to the translational motion of the object and the rotational motion of the object.
Further, before comparing the first pose with all second poses in a grab pose database, further comprising:
acquiring the motion direction and the motion speed of the target object at the previous frame according to the first pose;
and pre-screening the first pose according to the motion direction and the motion speed.
Further, comparing the first pose with all second poses in a capture pose database, comprising:
acquiring a conversion relation between a camera coordinate system and a mechanical arm base coordinate system, wherein a first pose obtained by the object motion prediction model is a pose in the camera coordinate system;
converting the first position posture into a coordinate system of a mechanical arm base to obtain a third position posture;
and comparing the third pose with all second poses in a capture pose database.
Further, controlling the end effector to grab the object after the mechanical arm moves according to the end pose of the mechanical arm comprises the following steps:
acquiring a motion mode of an object;
if the motion mode of the object is uniform linear motion or uniform circular motion, directly controlling an end effector to grab the object after the mechanical arm moves to the tail end pose of the mechanical arm;
and if the motion mode of the object is spherical pendulum motion, predicting the end pose of the mechanical arm for grabbing again after the mechanical arm has moved to the end pose of the mechanical arm, and controlling the end effector to grab the object according to the prediction result.
Further, predicting the end pose of the mechanical arm for grabbing again comprises:
inputting the grabbing pose into a spherical pendulum motion prediction model;
setting a prediction time;
and obtaining a pre-grabbing end pose of the mechanical arm according to the object pose and the motion direction output by the spherical pendulum motion prediction model after the set prediction time.
According to a second aspect of the embodiments of the present application, there is provided a moving object robot gripper device based on visual perception, including:
the acquisition module is used for acquiring a first energy function under a first camera view angle and a second energy function under a second camera view angle;
the calculation module is used for calculating a comprehensive energy function of double-view joint optimization under an object center coordinate system according to the first energy function and the second energy function;
the minimization module is used for minimizing the comprehensive energy function to obtain a pose transformation increment;
the updating module is used for updating the object pose under each camera view angle according to the pose transformation increment;
the prediction module is used for inputting the updated object poses for a plurality of times into the object motion prediction model and predicting the first pose of the object in the future preset time;
the comparison module is used for comparing the first pose with all second poses in the grabbing pose database, and taking the first pose whose difference from the second poses is the smallest and smaller than a preset threshold as the end pose of the mechanical arm corresponding to the grabbing pose;
and the control module is used for controlling the end effector to grab the object after the mechanical arm moves according to the pose of the tail end of the mechanical arm.
According to a third aspect of embodiments of the present application, there is provided an electronic apparatus, including:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method according to the first aspect.
According to a fourth aspect of embodiments herein, there is provided a computer readable storage medium having stored thereon computer instructions which, when executed by a processor, implement the steps of the method according to the first aspect.
The technical scheme provided by the embodiment of the application can have the following beneficial effects:
according to the embodiment, the comprehensive energy function of the double-view joint optimization under the object center coordinate system is calculated according to the first energy function under the first camera view angle and the second energy function under the second camera view angle, the object posture is tracked through the double-view method, and the perception of the moving object is realized; the double-view method is adopted, the technical problem of insufficient tracking precision is solved, the input data of the prediction model is accurate enough, and the output result of the prediction model can be used for a grabbing task, so that the grabbing of a moving object is realized.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present application and together with the description, serve to explain the principles of the application.
Fig. 1 is a flowchart illustrating a moving object robot grasping method based on visual perception according to an exemplary embodiment.
Fig. 2 is a flowchart illustrating step S13 according to an exemplary embodiment.
FIG. 3 is a flow diagram illustrating a prediction process of an object motion prediction model according to an exemplary embodiment.
FIG. 4 is a flowchart illustrating steps S41-S42, according to an exemplary embodiment.
Fig. 5 is a flowchart illustrating "compare the first pose with all second poses in the grab pose database" in step S16 according to an exemplary embodiment.
Fig. 6 is a flowchart illustrating step S17 according to an exemplary embodiment.
Fig. 7 is a flowchart illustrating "predict again the robot arm end pose at the time of grasping" in step S63 according to an exemplary embodiment.
FIG. 8 is a block diagram illustrating a moving object robotic gripper device based on visual perception according to an example embodiment.
FIG. 9 is a schematic diagram of an electronic device shown in accordance with an example embodiment.
Detailed Description
Reference will now be made in detail to the exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, like numbers in different drawings represent the same or similar elements unless otherwise indicated. The embodiments described in the following exemplary embodiments do not represent all embodiments consistent with the present application. Rather, they are merely examples of apparatus and methods consistent with certain aspects of the present application, as detailed in the appended claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this application and the appended claims, the singular forms "a", "an", and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any and all possible combinations of one or more of the associated listed items.
It is to be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited to these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information, and similarly, second information may also be referred to as first information, without departing from the scope of the present application. The word "if" as used herein may be interpreted as "upon", "when", or "in response to determining", depending on the context.
Definitions of terms:
Object center coordinate system: a coordinate system with the center of the target object as the coordinate origin.
Mechanical arm base coordinate system: a coordinate system with the center of the mechanical arm base as the coordinate origin.
Fig. 1 is a flowchart illustrating a method for grabbing a moving object based on visual perception by a robot according to an exemplary embodiment, where the method is applied to a terminal, and may include the following steps:
step S11: acquiring a first energy function under a first camera view angle and a second energy function under a second camera view angle;
step S12: according to the first energy function and the second energy function, calculating a comprehensive energy function of double-view joint optimization under an object center coordinate system;
step S13: minimizing the comprehensive energy function to obtain a pose transformation increment;
step S14: updating the object pose at each camera view angle according to the pose transformation increment;
step S15: inputting the object poses updated for a plurality of times into an object motion prediction model, and predicting a first pose of the object in a future preset time;
step S16: comparing the first pose with all second poses in a grabbing pose database, and taking the first pose with the minimum difference value of the second poses and the difference value smaller than a preset threshold value as the end pose of the mechanical arm corresponding to the grabbing pose;
step S17: and controlling the end effector to grab the object after the mechanical arm moves according to the pose of the tail end of the mechanical arm.
According to the above embodiment, the comprehensive energy function of double-view joint optimization in the object center coordinate system is calculated from the first energy function at the first camera view angle and the second energy function at the second camera view angle, and the object pose is tracked with the double-view method, realizing perception of the moving object. The double-view method overcomes the technical problem of insufficient tracking accuracy, so that the input data of the prediction model are accurate enough and the output of the prediction model can be used for the grabbing task, thereby realizing grabbing of a moving object.
In a specific implementation of step S11, a first energy function at a first camera view angle and a second energy function at a second camera view angle are obtained;
specifically, the first energy function is an energy function of the target object in the object center coordinate system with respect to a first image in a first camera view angle, the second energy function is an energy function of the target object in the object center coordinate system with respect to a second image in a second camera view angle, and the calculation method of the energy function may be various, such as a region-based monocular three-dimensional tracking method and an edge-based monocular three-dimensional tracking method, where a general calculation formula is as follows:
$$E_i = E\left(\mathbf{I}_i,\ \boldsymbol{\xi}_i,\ \mathbf{T}_i,\ \Delta\boldsymbol{\xi}\right)$$
wherein $E_i$ denotes the energy function of the region-based monocular three-dimensional tracking method at the i-th camera view angle, $\mathbf{I}_i$ denotes the image at the i-th camera view angle, $\boldsymbol{\xi}_i$ denotes the previous-frame pose of the target object expressed in Lie-algebra form in the camera-i coordinate system, $\mathbf{T}_i$ denotes the previous-frame pose of the target object expressed in Lie-group form in the camera-i coordinate system, and $\Delta\boldsymbol{\xi}$ denotes the pose transformation increment between the current frame and the previous frame in the object center coordinate system.
In the specific implementation of step S12, according to the first energy function and the second energy function, calculating a comprehensive energy function of the dual-view joint optimization in the object center coordinate system;
in particular, the first energy function is added to the second energy function, the reason for direct addition mainly considering that the sharing of the two cameras in the adjustment of the object pose is comparable. And moreover, due to the combination of the image information under the visual angles of the two cameras, the optimized and obtained object pose is more accurate. Comprehensive energy function of double-view joint optimization under object center coordinate system
Figure 942591DEST_PATH_IMAGE007
Is expressed as
Figure 203808DEST_PATH_IMAGE008
In the specific implementation of step S13, the comprehensive energy function is minimized to obtain a pose transformation increment;
specifically, as shown in fig. 2, step S13 may include the following sub-steps:
step S21: calculating a first Jacobian matrix of the first energy function and a second Jacobian matrix of the second energy function;
specifically, the jacobian matrix is a matrix in which the first partial derivatives of the function are arranged in a manner that is essentially optimized using a gradient descent method.
Step S22: calculating a pose transformation increment according to the first Jacobian matrix and the second Jacobian matrix;
specifically, pose transformation increments
Figure 685736DEST_PATH_IMAGE006
Is calculated by the formula
Figure 741417DEST_PATH_IMAGE009
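The description does not give the explicit update formula, so the following is only a minimal sketch, assuming a damped Gauss-Newton step in which each camera view contributes the Jacobian and residual vector of its energy function; the function name, array shapes, and damping value are illustrative assumptions rather than the patented formula.

```python
import numpy as np

def pose_increment(J1, r1, J2, r2, damping=1e-6):
    """One damped Gauss-Newton step on the joint (summed) energy.

    J1, J2 : (N1, 6) and (N2, 6) Jacobians of the first and second energy
             functions with respect to the 6-DoF pose increment.
    r1, r2 : (N1,) and (N2,) residual vectors of the two camera views.
    Returns the pose transformation increment delta_xi as a 6-vector.
    """
    H = J1.T @ J1 + J2.T @ J2      # joint approximate Hessian
    g = J1.T @ r1 + J2.T @ r2      # joint gradient
    H += damping * np.eye(6)       # small damping for numerical stability
    return -np.linalg.solve(H, g)
```

Summing the two normal-equation terms mirrors the direct addition of the two energy functions described above.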
In a specific implementation of step S14, updating the object pose at each camera view angle according to the pose transformation delta;
specifically, under the object center coordinate system, the pose updating formula is as follows:
$$\mathbf{T}_i^{\ast} = \exp\left(\Delta\boldsymbol{\xi}\right)\,\mathbf{T}_i$$
wherein $\mathbf{T}_i^{\ast}$ is the updated object pose represented in the Lie group, $\exp(\Delta\boldsymbol{\xi})$ maps the pose transformation increment $\Delta\boldsymbol{\xi}$ from Lie-algebra space to Lie-group space, and $\mathbf{T}_i$ is the previous-frame pose of the target object represented in the Lie group in the camera-i coordinate system.
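As an illustration of the update step, the sketch below implements the exponential map from the Lie algebra se(3) to the Lie group SE(3) and the resulting pose update; the 6-vector layout (translation part first, rotation part second) and the left-multiplication convention are assumptions, since the description only states that exp(Δξ) is composed with the previous-frame pose.

```python
import numpy as np

def se3_exp(xi):
    """Exponential map from se(3) (6-vector [rho, phi]) to a 4x4 SE(3) matrix.

    xi[:3] is the translational part rho, xi[3:] the rotational part phi
    (axis-angle vector); this ordering is an assumed convention.
    """
    rho, phi = xi[:3], xi[3:]
    theta = np.linalg.norm(phi)
    K = np.array([[0.0, -phi[2], phi[1]],
                  [phi[2], 0.0, -phi[0]],
                  [-phi[1], phi[0], 0.0]])
    if theta < 1e-10:                      # small-angle approximation
        R = np.eye(3) + K
        V = np.eye(3) + 0.5 * K
    else:
        R = (np.eye(3) + np.sin(theta) / theta * K
             + (1 - np.cos(theta)) / theta**2 * K @ K)
        V = (np.eye(3) + (1 - np.cos(theta)) / theta**2 * K
             + (theta - np.sin(theta)) / theta**3 * K @ K)
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = V @ rho
    return T

def update_pose(T_prev, delta_xi):
    """Compose the previous-frame pose with the pose transformation increment."""
    return se3_exp(delta_xi) @ T_prev
```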
In the specific implementation of step S15, the object poses from several updates are input into the object motion prediction model, and a first pose of the object within a predetermined future time is predicted;
In particular, using the results of several updates avoids, as far as possible, a single inaccurate tracking result causing a large deviation in prediction accuracy.
Specifically, as shown in fig. 3, the prediction process of the object motion prediction model includes:
step S31: acquiring the updated pose of the object, and judging the motion mode of the object;
specifically, the motion modes of the object in this embodiment include uniform linear motion, uniform circular motion, and spherical pendulum motion.
Step S32: if the motion mode of the object is uniform linear motion or uniform circular motion, predicting the pose of the object through a Kalman filtering algorithm;
specifically, a plurality of obtained three-dimensional object tracking results are input into a Kalman filtering model corresponding to the motion mode of the object, the pose of the object in the next frame is obtained according to the existing results, the pose is used as input, a plurality of frames are output in the form, the number of the output frames is related to the set time, the time interval between the two frames is fixed and known, and therefore the real-time performance of the program can meet the requirement while the precision is guaranteed.
Step S33: if the motion mode of the object is spherical pendulum motion, predicting the translational motion of the object by establishing an object translation model, predicting the rotational motion of the object by establishing a key sequence of the rotational motion of the object, and obtaining a prediction result of the pose of the object according to the translational motion of the object and the rotational motion of the object;
specifically, for the translational motion part of the object, an approximate model is established to simulate the spherical pendulum motion. After taking into account the energy loss C due to friction, by the Lagrangian method
Figure 149011DEST_PATH_IMAGE013
We can obtain the free vibration equation:
Figure 605531DEST_PATH_IMAGE014
wherein L, T, V represents the lagrangian amount, kinetic energy, and potential energy, respectively.
Figure 19326DEST_PATH_IMAGE015
Is a generalized coordinate of the coordinate system of the device,
Figure 544985DEST_PATH_IMAGE016
is a generalized velocity, both with respect to time
Figure 702428DEST_PATH_IMAGE017
As a function of (c).
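As a rough illustration of such a translation model, the sketch below numerically integrates a damped planar pendulum, a simplified stand-in for the spherical pendulum, over the prediction window; the dynamics, damping coefficient, and parameter values are assumptions introduced for illustration only.

```python
import numpy as np
from scipy.integrate import solve_ivp

def simulate_pendulum(theta0, omega0, length=0.5, damping=0.05,
                      g=9.81, t_pred=1.0, fps=30):
    """Integrate a damped planar pendulum as a stand-in translation model.

    theta0, omega0 : initial angle [rad] and angular velocity [rad/s]
                     estimated from the tracked poses.
    Returns the frame times and predicted angles within t_pred seconds.
    """
    def dynamics(t, y):
        theta, omega = y
        # Free-vibration equation with a friction (damping) term.
        return [omega, -(g / length) * np.sin(theta) - damping * omega]

    t_eval = np.arange(1, int(t_pred * fps) + 1) / fps
    sol = solve_ivp(dynamics, (0.0, t_pred), [theta0, omega0], t_eval=t_eval)
    return sol.t, sol.y[0]
```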
For the rotational motion of the object, a key sequence is constructed in advance from experimental data in the form of a sequence database. Specifically, the increment between the object pose of each subsequent frame and that of the first frame of a sequence is calculated, and sequences similar to ones already in the database are removed so that a certain difference is maintained between sequences. The advantage is that the data volume of the database can be greatly reduced, which increases the prediction speed at run time.
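A minimal sketch of building such a key-sequence database is given below; the pose representation, the similarity measure (mean increment distance), and the threshold are assumptions introduced for illustration.

```python
import numpy as np

def build_key_sequences(raw_sequences, min_distance=0.2):
    """Build a rotation key-sequence database from experimental sequences.

    raw_sequences : list of (T, 6) arrays of object poses, one array per sequence.
    Each sequence is re-expressed as increments relative to its first frame;
    sequences too similar to ones already stored are discarded.
    """
    database = []
    for seq in raw_sequences:
        increments = seq - seq[0]                  # increment w.r.t. first frame
        too_similar = any(
            increments.shape == kept.shape
            and np.mean(np.linalg.norm(increments - kept, axis=1)) < min_distance
            for kept in database
        )
        if not too_similar:
            database.append(increments)
    return database
```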
In a specific implementation of step S16, the first pose is compared with all second poses in the grabbing pose database, and the first pose whose difference from the second poses is the smallest and smaller than a predetermined threshold is taken as the end pose of the mechanical arm corresponding to the grabbing pose;
specifically, as shown in fig. 4, step S16 may further include, before:
step S41: acquiring a movement direction and a movement speed corresponding to the target object according to the first pose;
step S42: pre-screening the first posture according to the movement direction and the movement speed;
in the specific implementation of the steps S41-S42, it is the most reasonable strategy to perform capturing according to the motion direction and motion direction of the target object, and the pre-screening can eliminate most candidate poses that do not satisfy the capturing conditions, so that the real-time performance of the program can be greatly improved.
Specifically, as shown in fig. 5, the "comparing the first pose with all the second poses in the grab pose database" in step S16 may include the sub-steps of:
step S51: acquiring a conversion relation between a camera coordinate system and a mechanical arm base coordinate system, wherein a first pose obtained by the object motion prediction model is a pose in the camera coordinate system;
step S52: converting the first position posture into a coordinate system of a mechanical arm base to obtain a third position posture;
step S53: comparing the third pose with all second poses in a capture pose database;
in the embodiment of step S51-step S53, since the tracked and predicted pose is the result of the camera coordinate system, the robot arm can be planned only in the robot arm base coordinate system, and therefore, the coordinate transformation is necessary.
In the specific implementation of step S17, the end effector is controlled to perform object grabbing after the robot arm is controlled to move according to the pose of the end of the robot arm.
Specifically, as shown in fig. 6, step S17 includes the following sub-steps:
step S61: acquiring a motion mode of an object;
specifically, the motion modes of the object in this embodiment include uniform linear motion, uniform circular motion, and spherical pendulum motion.
Step S62: if the motion mode of the object is uniform linear motion or uniform circular motion, directly controlling an end effector to grab the object after the mechanical arm moves to the tail end pose of the mechanical arm;
specifically, the movement speed of the object can be known from the tracking result, so that the pattern of the movement of the object can be obtained. The uniform linear motion and the uniform circular motion are very regular motions, so that the prediction result is very accurate even if the prediction time is very long, and the grabbing can be completed only by once planning.
Step S63: if the motion mode of the object is spherical pendulum motion, predicting the end pose of the mechanical arm when the mechanical arm moves to the end pose of the mechanical arm, and controlling the end effector to grab the object according to the prediction result;
specifically, firstly, a tracking result is input into a prediction model, a relatively long-time prediction is set, then after the prediction result is screened, the tail end of the mechanical arm is moved to grab the position close to the pose (a certain distance along the moving direction of an object at the current pose); the next prediction is then performed while reducing the time of the prediction to ensure that the program can complete the prediction more quickly, thereby ensuring that the object can be successfully grasped.
Specifically, as shown in fig. 7, the "predicting again the robot arm end pose at the time of grasping" in step S63 may include the following sub-steps:
step S71: inputting the grabbing pose into a spherical pendulum motion prediction model;
step S72: setting a prediction time;
step S73: according to the object pose and the motion direction output by the spherical pendulum motion prediction model after the prediction time is input, obtaining a pre-grabbing gesture of the tail end pose of the mechanical arm;
specifically, a more accurate result obtained by the latest tracking is input into the prediction model, and meanwhile, the prediction time, namely the number of predicted frames, is shortened, so that the prediction precision meets the precision of a mechanical arm grabbing task, and the grabbing of a moving object is completed.
Corresponding to the embodiment of the moving object robot grabbing method based on visual perception, the application also provides an embodiment of the moving object robot grabbing device based on visual perception.
Fig. 8 is a block diagram of a moving object robotic gripper based on visual perception, according to an example embodiment. Referring to fig. 8, the apparatus may include:
an obtaining module 21, configured to obtain a first energy function at a first camera view angle and a second energy function at a second camera view angle;
the calculation module 22 is configured to calculate a comprehensive energy function of the dual-view joint optimization in the object center coordinate system according to the first energy function and the second energy function;
a minimization module 23, configured to minimize the comprehensive energy function to obtain a pose transformation increment;
an updating module 24, configured to update the object pose at each camera view angle according to the pose transformation increment;
the prediction module 25 is configured to input the updated object poses of several times into the object motion prediction model, and predict a first pose of the object in a future predetermined time;
a comparison module 26, configured to compare the first pose with all second poses in the grabbing pose database, and take the first pose whose difference from the second poses is the smallest and smaller than a predetermined threshold as the end pose of the mechanical arm corresponding to the grabbing pose;
and the control module 27 is used for controlling the end effector to grab the object after the mechanical arm moves according to the pose of the tail end of the mechanical arm.
With regard to the apparatus in the above-described embodiment, the specific manner in which each module performs the operation has been described in detail in the embodiment related to the method, and will not be elaborated here.
For the device embodiments, since they substantially correspond to the method embodiments, reference may be made to the partial description of the method embodiments for relevant points. The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules can be selected according to actual needs to achieve the purpose of the scheme of the application. One of ordinary skill in the art can understand and implement it without inventive effort.
Correspondingly, the present application also provides an electronic device, comprising: one or more processors; and a memory for storing one or more programs; wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the moving object robot grabbing method based on visual perception as described above. Fig. 9 shows a hardware structure diagram of a device with data processing capability on which the moving object robot grabbing method based on visual perception provided by the embodiment of the present invention runs. In addition to the processor and the memory shown in fig. 9, such a device may also include other hardware according to its actual function, which is not described again here.
Accordingly, the present application also provides a computer-readable storage medium on which computer instructions are stored, and the instructions, when executed by a processor, implement the moving object robot grabbing method based on visual perception as described above. The computer-readable storage medium may be an internal storage unit, such as a hard disk or a memory, of any device with data processing capability described in any of the foregoing embodiments. The computer-readable storage medium may also be an external storage device of that device, such as a plug-in hard disk, a Smart Media Card (SMC), an SD card, or a Flash memory card (Flash Card) provided on the device. Further, the computer-readable storage medium may include both an internal storage unit and an external storage device of any device with data processing capability. The computer-readable storage medium is used for storing the computer program and other programs and data required by the device, and may also be used for temporarily storing data that has been output or is to be output.
Other embodiments of the present application will be apparent to those skilled in the art from consideration of the specification and practice of the disclosure disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the application and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the application being indicated by the following claims.
It will be understood that the present application is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the application is limited only by the appended claims.

Claims (10)

1. A moving object robot grabbing method based on visual perception is characterized by comprising the following steps:
acquiring a first energy function under a first camera view angle and a second energy function under a second camera view angle;
according to the first energy function and the second energy function, calculating a comprehensive energy function of double-view joint optimization under an object center coordinate system;
minimizing the comprehensive energy function to obtain a pose transformation increment;
updating the object pose at each camera view angle according to the pose transformation increment;
inputting the object poses updated for a plurality of times into an object motion prediction model, and predicting a first pose of the object in a future preset time;
comparing the first pose with all second poses in a grabbing pose database, and taking the first pose whose difference from the second poses is the smallest and smaller than a preset threshold as the end pose of the mechanical arm corresponding to the grabbing pose;
and controlling the end effector to grab the object after the mechanical arm moves according to the pose of the tail end of the mechanical arm.
2. The method of claim 1, wherein minimizing the composite energy function to obtain pose transformation increments comprises:
calculating a first Jacobian matrix of the first energy function and a second Jacobian matrix of the second energy function;
and calculating a pose transformation increment according to the first Jacobian matrix and the second Jacobian matrix.
3. The method of claim 1, wherein the prediction process of the object motion prediction model comprises:
acquiring the updated pose of the object, and judging the motion mode of the object;
if the motion mode of the object is uniform linear motion or uniform circular motion, predicting the pose of the object through a Kalman filtering algorithm;
if the motion mode of the object is spherical pendulum motion, predicting the translational motion of the object by establishing an object translation model, predicting the rotational motion of the object by establishing a key sequence of the rotational motion of the object, and obtaining a prediction result of the pose of the object according to the translational motion of the object and the rotational motion of the object.
4. The method of claim 1, further comprising, prior to comparing the first pose to all second poses in a grab pose database:
acquiring the motion direction and the motion speed of the target object at the previous frame according to the first pose;
and pre-screening the first posture according to the movement direction and the movement speed.
5. The method of claim 1, wherein comparing the first pose to all second poses in a grab pose database comprises:
acquiring a conversion relation between a camera coordinate system and a mechanical arm base coordinate system, wherein a first pose obtained by the object motion prediction model is a pose under the camera coordinate system;
converting the first position posture into a coordinate system of a mechanical arm base to obtain a third position posture;
and comparing the third pose with all second poses in a capture pose database.
6. The method according to claim 1, wherein controlling the end effector to perform object grabbing after controlling the robot arm to move according to the pose of the end of the robot arm comprises:
acquiring a motion mode of an object;
if the motion mode of the object is uniform linear motion or uniform circular motion, directly controlling an end effector to grab the object after the mechanical arm moves to the tail end pose of the mechanical arm;
and if the motion mode of the object is spherical pendulum motion, predicting the end pose of the mechanical arm for grabbing again after the mechanical arm has moved to the end pose of the mechanical arm, and controlling the end effector to grab the object according to the prediction result.
7. The method of claim 6, wherein predicting the pose of the end of the robot arm at the time of the grabbing again comprises:
inputting the grabbing pose into a spherical pendulum motion prediction model;
setting a prediction time;
and obtaining a pre-grabbing end pose of the mechanical arm according to the object pose and the motion direction output by the spherical pendulum motion prediction model after the set prediction time.
8. A moving object robot grabbing device based on visual perception, characterized by comprising:
the acquisition module is used for acquiring a first energy function under a first camera view angle and a second energy function under a second camera view angle;
the calculation module is used for calculating a comprehensive energy function of double-view joint optimization under an object center coordinate system according to the first energy function and the second energy function;
the minimization module is used for minimizing the comprehensive energy function to obtain a pose transformation increment;
the updating module is used for updating the object pose under each camera view angle according to the pose transformation increment;
the prediction module is used for inputting the updated object poses for a plurality of times into the object motion prediction model and predicting the first pose of the object in the future preset time;
the comparison module is used for comparing the first pose with all second poses in the grabbing pose database, and taking the first pose whose difference from the second poses is the smallest and smaller than a preset threshold as the end pose of the mechanical arm corresponding to the grabbing pose;
and the control module is used for controlling the end effector to grab the object after the mechanical arm moves according to the pose of the tail end of the mechanical arm.
9. An electronic device, comprising:
one or more processors;
a memory for storing one or more programs;
the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 1-7.
10. A computer-readable storage medium having stored thereon computer instructions, which when executed by a processor, perform the steps of the method according to any one of claims 1-7.
CN202210076251.2A 2022-01-24 2022-01-24 Moving object robot grabbing method and device based on visual perception Active CN114083545B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210076251.2A CN114083545B (en) 2022-01-24 2022-01-24 Moving object robot grabbing method and device based on visual perception

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210076251.2A CN114083545B (en) 2022-01-24 2022-01-24 Moving object robot grabbing method and device based on visual perception

Publications (2)

Publication Number Publication Date
CN114083545A CN114083545A (en) 2022-02-25
CN114083545B true CN114083545B (en) 2022-07-01

Family

ID=80309167

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210076251.2A Active CN114083545B (en) 2022-01-24 2022-01-24 Moving object robot grabbing method and device based on visual perception

Country Status (1)

Country Link
CN (1) CN114083545B (en)


Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101402199A (en) * 2008-10-20 2009-04-08 北京理工大学 Hand-eye type robot movable target extracting method with low servo accuracy based on visual sensation
CN103150544A (en) * 2011-08-30 2013-06-12 精工爱普生株式会社 Method and apparatus for object pose estimation
JP2016070762A (en) * 2014-09-29 2016-05-09 ファナック株式会社 Detection method and detector for detecting three-dimensional position of object
CN104589356A (en) * 2014-11-27 2015-05-06 北京工业大学 Dexterous hand teleoperation control method based on Kinect human hand motion capturing
CN110076772A (en) * 2019-04-03 2019-08-02 浙江大华技术股份有限公司 A kind of grasping means of mechanical arm and device
CN110378325A (en) * 2019-06-20 2019-10-25 西北工业大学 A kind of object pose recognition methods during robot crawl
CN110271007A (en) * 2019-07-24 2019-09-24 广东工业大学 A kind of the grasping body method and relevant apparatus of mechanical arm
JP2021122924A (en) * 2020-02-10 2021-08-30 住友重機械工業株式会社 Work device
CN112936275A (en) * 2021-02-05 2021-06-11 华南理工大学 Mechanical arm grabbing system based on depth camera and control method
CN113246131A (en) * 2021-05-27 2021-08-13 广东智源机器人科技有限公司 Motion capture method and device, electronic equipment and mechanical arm control system

Also Published As

Publication number Publication date
CN114083545A (en) 2022-02-25


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant