CN117644511A - Robot grabbing method, system, equipment and medium based on implicit neural representation - Google Patents
Robot grabbing method, system, equipment and medium based on implicit neural representation
- Publication number
- CN117644511A (application number CN202311718061.7A)
- Authority
- CN
- China
- Prior art keywords
- grabbing
- availability
- implicit
- target object
- gripping
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a robot grabbing control method, system, equipment and medium based on implicit neural representation. The method comprises: predicting the grabbing availability of previously unobserved directions by exploiting the novel-view synthesis capability of the implicit neural representation, and selecting the next observation viewpoint based on the potentially optimal grabbing availability of the target object; and visually modeling the target scene through closed-loop continuous observation, stopping observation and executing the grab once the grabbing availability of the target object exceeds a target threshold. Compared with the prior art, the invention offers high grabbing accuracy and high efficiency.
Description
Technical Field
The invention relates to the field of robot grabbing control, and in particular to a robot grabbing method, system, equipment and medium based on implicit neural representation.
Background
Existing robot grabbing methods are generally driven by visual input, for example capturing environmental features with a depth camera. Most adopt a fixed depth-acquisition scheme and model the scene from a single view or a fixed set of views. Such methods struggle to grab a specified object in complex, occluded, stacked scenes.
Active perception planning recursively plans the sensor's next observation pose. Compared with the passive-observation paradigm, active perception with recursive view planning acquires environmental information more flexibly, and it is currently applied in fields such as object reconstruction, object recognition and grabbing detection.
Active perception planning methods generally fall into two categories: synthesis-based and search-based. Synthesis-based methods directly compute the next observation pose from the current observation and the task constraints, whereas search-based methods first generate a number of candidate viewpoints and then select the best one according to hand-crafted criteria.
Existing active perception planning methods have the following defects:
1) Synthesis-based methods struggle with complex scenes when computing the next observation pose, and their algorithmic complexity is high.
2) Search-based methods evaluate the next observation pose using geometric information, so the correlation between geometric reconstruction quality and grabbing quality is hard to guarantee, and they cope poorly with heavily occluded scenes and target objects of complex appearance; moreover, the latest methods that evaluate observation-pose planning by grabbing availability often require a large number of observations to obtain a good grabbing-quality estimate, which lowers overall grabbing efficiency.
Therefore, a robot grabbing method with high grabbing accuracy and high efficiency needs to be designed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a robot grabbing method, system, equipment and medium based on implicit neural representation that achieves high grabbing accuracy and high efficiency.
The aim of the invention can be achieved by the following technical scheme:
According to a first aspect of the present invention, there is provided a robot grabbing control method based on implicit neural representation, the method comprising:
predicting the grabbing availability of previously unobserved directions by exploiting the novel-view synthesis capability of the implicit neural representation, and selecting the next observation viewpoint based on the potentially optimal grabbing availability of the target object;
and visually modeling the target scene through closed-loop continuous observation, stopping observation and executing the grab once the grabbing availability of the target object exceeds a target threshold.
Preferably, the method comprises the steps of:
S1, fusing the depth map acquired by a depth sensor into a truncated signed distance function (TSDF), feeding the fused data into a neural network model for feature extraction to obtain the implicit neural representation of the target scene, and predicting the grabbing availability and grabbing pose of the target object;
S2, when the grabbing availability of the target object reaches the target threshold, driving the robot's mechanical arm to the designated position to execute the grab; otherwise, going to S3;
S3, sampling candidate next observation poses, predicting the grabbing availability corresponding to each new viewing direction, selecting the direction with the potentially optimal grabbing availability as the next observation pose, driving the robot's mechanical arm to that pose, and returning to S1.
Preferably, the neural network model is a three-dimensional convolutional neural network.
Preferably, a multi-layer perceptron network is used to predict the grabbing availability and grabbing pose of the target object.
Preferably, using the multi-layer perceptron network to predict the grabbing availability and grabbing pose of the target object comprises:
for each point in the target object's bounding box, feeding the features extracted from the implicit neural representation of the target scene at that spatial position and direction into the multi-layer perceptron network, and predicting the grabbing availability and grabbing pose of the target object.
Preferably, the grabbing pose is a 6-DoF grabbing pose.
Preferably, the depth sensor is a depth camera.
According to a second aspect of the present invention, there is provided a robot gripping control system based on implicit neural representation, the system performing target object gripping control using any of the above methods.
According to a third aspect of the present invention there is provided an electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method of any one of the above when executing the program.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any one of the above.
Compared with the prior art, the invention has the following beneficial effects:
1) In the robot grabbing control method based on implicit neural representation, the corresponding grabbing quality is predicted more accurately when the observation direction coincides with the grabbing direction; the depth-sensor input is compressed and encoded into an implicit neural representation, and grabbing availability is used to evaluate the next best observation viewpoint, which improves the robot's grabbing accuracy.
2) The invention compresses and encodes the depth-sensor input into an implicit neural representation through a convolutional neural network model and uses novel-view depth-map synthesis for multi-task training, which improves the efficiency of the mechanical arm's whole observe-and-grab pipeline.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the neural network model for the implicit neural representation according to the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the protection scope of the present invention.
Examples
This embodiment provides a robot grabbing control method based on implicit neural representation. Considering that the corresponding grabbing quality is predicted more accurately when the observation direction coincides with the grabbing direction, the method exploits the novel-view synthesis capability of the implicit neural representation to predict the grabbing availability of previously unobserved directions and selects the next viewpoint based on the potentially optimal grabbing availability of the target object; the target scene is visually modeled through closed-loop continuous observation, and observation stops and the grab is executed once the grabbing quality of the target object reaches the target threshold.
The method of this embodiment will be described in detail with reference to FIGS. 1 and 2.
S1, fusing the depth map acquired by the depth sensor into a truncated signed distance function (TSDF), feeding the fused data into the neural network model, which compresses and encodes the TSDF, extracting features to obtain the implicit neural representation of the target scene, and predicting the grabbing availability and grabbing pose of the target object;
S2, when the grabbing availability of the target object reaches the target threshold, driving the robot's mechanical arm to the designated position to execute the grab; otherwise, going to S3;
S3, sampling candidate next observation poses, predicting the grabbing availability corresponding to each new viewing direction, selecting the direction with the potentially optimal grabbing availability as the next observation pose, driving the robot's mechanical arm to that pose, and returning to S1.
The truncated signed distance function (TSDF) in this embodiment is a data structure for representing object surfaces in three dimensions: space is divided into a regular grid of voxels, and each voxel stores a signed distance value giving the distance from the voxel center to the object surface.
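For illustration only (this sketch is not part of the patent text), the TSDF fusion in S1 can be reduced to a truncated, weighted running average per voxel. The grid shape, the truncation distance, and the precomputed surface_dist input are assumptions:

```python
import numpy as np

def fuse_depth_to_tsdf(tsdf, weight, surface_dist, trunc=0.1):
    """Fuse one depth observation into a TSDF grid.

    tsdf, weight : per-voxel arrays holding the running average and its weight
    surface_dist : signed distance from each voxel center to the observed
                   surface for this frame (positive = in front of the surface)
    """
    d = np.clip(surface_dist, -trunc, trunc) / trunc  # truncate, normalize to [-1, 1]
    new_w = weight + 1.0                              # unit weight per observation
    return (tsdf * weight + d) / new_w, new_w

# toy example: a 1-D line of 11 voxel centers, observed surface at x = 0.5
x = np.linspace(0.0, 1.0, 11)
tsdf = np.zeros_like(x)
w = np.zeros_like(x)
tsdf, w = fuse_depth_to_tsdf(tsdf, w, surface_dist=x - 0.5, trunc=0.1)
# voxels far in front of/behind the surface saturate at +/-1;
# the zero crossing marks the surface
```

Repeating the call with surface_dist computed from new viewpoints averages the observations, which is how closed-loop observation would refine the volume over time.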
The target threshold in this embodiment may be set according to actual requirements; further, it may be set to 0.95.
As another preferred embodiment, the depth sensor may be a depth camera.
As another preferred embodiment, the neural network model consists of multiple 3D CNN layers, and feature extraction yields the implicit neural representation of the target scene.
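As an illustrative sketch only (the grid size, kernel size and single-filter setup are assumptions; the model above stacks several learned 3D CNN layers), one 3-D convolution step over a TSDF volume can be written as:

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid 3-D convolution (cross-correlation, as in CNNs) with one filter."""
    D, H, W = volume.shape
    d, h, w = kernel.shape
    out = np.zeros((D - d + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i + d, j:j + h, k:k + w] * kernel)
    return out

rng = np.random.default_rng(0)
tsdf_volume = rng.uniform(-1.0, 1.0, (8, 8, 8))      # toy 8x8x8 TSDF grid
kernel = rng.normal(0.0, 0.1, (3, 3, 3))             # one (would-be learned) filter
feat = np.maximum(conv3d(tsdf_volume, kernel), 0.0)  # ReLU feature map
```

A real encoder would use many such filters per layer (and a deep-learning framework rather than explicit loops); the sketch only shows the sliding-window operation itself.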
As another preferred embodiment, a multi-layer perceptron network is employed: by traversing each direction G_v of every point in the target object's bounding box, the grabbing availability and grabbing pose of each spatial point in each direction are predicted. The corresponding expression is:

F(G_v, C_geo) → (G_r, G_q, G_w)

where F denotes the forward computation of the multi-layer perceptron, G_v is the direction associated with a point in the target object's bounding box, C_geo is the feature extracted from the implicit neural representation at that spatial position, G_r is the grabbing pose of the target object, G_q is the grabbing availability of the target object, and G_w is the grabbing width of the target object.
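The expression above can be sketched as follows; the 32-dimensional C_geo feature, the layer widths, the quaternion parameterization of G_r, and the randomly initialized (untrained) weights are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, layers):
    """Forward pass F of a small multi-layer perceptron."""
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layers
    W, b = layers[-1]
    return x @ W + b

dim_in, dim_h = 3 + 32, 64  # G_v (3-D direction) + C_geo (assumed 32-D feature)
layers = [(rng.normal(0.0, 0.1, (dim_in, dim_h)), np.zeros(dim_h)),
          (rng.normal(0.0, 0.1, (dim_h, 6)), np.zeros(6))]  # 4 + 1 + 1 outputs

g_v = np.array([0.0, 0.0, 1.0])  # candidate direction for one bounding-box point
c_geo = rng.normal(size=32)      # feature queried from the implicit representation
out = mlp(np.concatenate([g_v, c_geo]), layers)

g_r = out[:4] / np.linalg.norm(out[:4])  # G_r: grabbing pose as a unit quaternion
g_q = 1.0 / (1.0 + np.exp(-out[4]))      # G_q: grabbing availability in (0, 1)
g_w = out[5]                             # G_w: grabbing width
```

In use, this forward pass would be evaluated for every bounding-box point and direction, and the direction with the highest G_q kept.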
Given a cluttered desktop scene and the bounding box T_bbox of the target object, the 6-DoF grabbing pose of the target object is predicted. The process is implemented as follows:
1) Judge whether the current time t exceeds the maximum running time T_max; if not, go to 2), otherwise end;
2) Fuse the depth map D_t at the current moment into the truncated signed distance function volume M_t, input M_t into the three-dimensional convolutional neural network, and take its output as the implicit neural representation C_t at the current moment; go to 3);
3) From the bounding box T_bbox of the target object, the implicit neural representation C_t and the current observation direction O_v,t, predict the optimal grabbing availability; go to 4);
4) Judge whether the optimal grabbing availability at the current moment is greater than the target threshold; if so, directly execute the predicted optimal grab, otherwise go to 5);
5) Traverse all candidate directions G_v, search for the best next observation direction O_v,t+1, drive the mechanical arm to that direction, and return to 1).
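Steps 1)–5) form a closed observe-predict-act loop. A structural sketch follows, where observe, predict, and the toy stand-ins at the bottom are hypothetical placeholders rather than the patent's actual components:

```python
def closed_loop_grasp(observe, predict, views, threshold=0.95, t_max=10):
    """Observe, predict grabbing availability, and either grab or move on.

    observe(scene, view) -> updated scene model (e.g. TSDF + implicit code)
    predict(scene, view) -> (grabbing availability, grasp) for that view
    """
    view, scene = views[0], None
    for t in range(1, t_max + 1):
        scene = observe(scene, view)              # fuse the new depth observation
        quality, grasp = predict(scene, view)
        if quality > threshold:
            return grasp, t                       # availability high enough: grab
        # otherwise move to the view with the highest predicted availability
        view = max(views, key=lambda v: predict(scene, v)[0])
    return None, t_max                            # gave up after t_max rounds

# toy stand-ins: each new observation raises confidence by 0.3
observe = lambda scene, view: (scene or 0) + 1
predict = lambda scene, view: (0.3 * scene, ("grasp-at", view))
result, steps = closed_loop_grasp(observe, predict, views=["a", "b", "c"])
```

With these stand-ins the loop terminates after the fourth observation, once the predicted availability first exceeds the 0.95 threshold.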
The method was verified through simulation and physical experiments. Compared with existing methods, it improves the grabbing success rate by 2% while using only 69% of the observations, raising overall grabbing efficiency. In experiments that limit the maximum number of observations, the method improves the grabbing success rate by 3% over existing methods. In addition, the method transfers well from simulation training to physical experiments.
The electronic device of the present invention includes a Central Processing Unit (CPU) that can perform various appropriate actions and processes according to computer program instructions stored in a Read Only Memory (ROM) or computer program instructions loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device can also be stored. The CPU, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
A plurality of components in a device are connected to an I/O interface, comprising: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processing unit performs the various methods and processes described above. For example, in some embodiments, the method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. One or more steps of the methods described above may be performed when the computer program is loaded into RAM and executed by a CPU. Alternatively, in other embodiments, the CPU may be configured to perform the method by any other suitable means (e.g., by means of firmware).
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), etc.
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (10)
1. A robot grabbing method based on implicit neural representation, the method comprising:
predicting the grabbing availability of previously unobserved directions by exploiting the novel-view synthesis capability of the implicit neural representation, and selecting the next observation viewpoint based on the potentially optimal grabbing availability of the target object;
and visually modeling the target scene through closed-loop continuous observation, stopping observation and executing the grab once the grabbing availability of the target object exceeds a target threshold.
2. The robot grabbing method based on implicit neural representation according to claim 1, characterized in that it comprises the steps of:
S1, fusing the depth map acquired by a depth sensor into a truncated signed distance function (TSDF), feeding the fused data into a neural network model for feature extraction to obtain the implicit neural representation of the target scene, and predicting the grabbing availability and grabbing pose of the target object;
S2, when the grabbing availability of the target object reaches the target threshold, driving the robot's mechanical arm to the designated position to execute the grab; otherwise, going to S3;
S3, sampling candidate next observation poses, predicting the grabbing availability corresponding to each new viewing direction, selecting the direction with the potentially optimal grabbing availability as the next observation pose, driving the robot's mechanical arm to that pose, and returning to S1.
3. The robot grabbing method based on implicit neural representation according to claim 2, wherein the neural network model is a three-dimensional convolutional neural network.
4. The robot grabbing method based on implicit neural representation according to claim 2, wherein a multi-layer perceptron network is used to predict the grabbing availability and grabbing pose of the target object.
5. The robot grabbing method based on implicit neural representation according to claim 4, wherein using the multi-layer perceptron network to predict the grabbing availability and grabbing pose of the target object comprises:
for each point in the target object's bounding box, feeding the features extracted from the implicit neural representation of the target scene at that spatial position and direction into the multi-layer perceptron network, and predicting the grabbing availability and grabbing pose of the target object.
6. The robot grabbing method based on implicit neural representation according to claim 2, wherein the grabbing pose is a 6-DoF grabbing pose.
7. The robot grabbing method based on implicit neural representation according to claim 2, wherein the depth sensor is a depth camera.
8. A robot grabbing system based on implicit neural representation, wherein the system performs target-object grabbing control using the method of any one of claims 1 to 7.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, characterized in that the processor, when executing the program, implements the method according to any of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311718061.7A CN117644511A (en) | 2023-12-14 | 2023-12-14 | Robot grabbing method, system, equipment and medium based on implicit neural representation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117644511A true CN117644511A (en) | 2024-03-05 |
Family
ID=90049379
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||