CN117644511A - Robot grabbing method, system, equipment and medium based on implicit neural representation - Google Patents
Robot grabbing method, system, equipment and medium based on implicit neural representation
- Publication number
- CN117644511A (application number CN202311718061.7A)
- Authority
- CN
- China
- Prior art keywords
- grabbing
- availability
- implicit
- target object
- gripping
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Abstract
The invention relates to a robot grabbing control method, system, equipment and medium based on implicit neural representation. The method comprises: predicting the grabbing availability of previously unobserved directions by exploiting the novel-view synthesis capability of the implicit neural representation, and selecting the next observation viewpoint based on the potentially optimal grabbing availability of the target object; and visually modeling the target scene through closed-loop continuous observation, stopping observation and executing the grab once the grabbing availability of the target object exceeds a target threshold. Compared with the prior art, the invention offers high grabbing accuracy and high efficiency.
Description
Technical Field
The invention relates to the field of robot grabbing control, and in particular to a robot grabbing method, system, equipment and medium based on implicit neural representation.
Background
Existing robot grabbing methods are generally driven by visual input, for example capturing environmental features with a depth camera. Most adopt a fixed depth-acquisition scheme and model the scene from a single view or a fixed set of views. Such methods struggle to grab a specified object in complex, occluded, stacked scenes.
Active perception planning recursively plans the sensor's next observation pose. Compared with the passive-observation paradigm, active perception with recursive view planning acquires environmental information more flexibly, and it is currently applied in fields such as object reconstruction, object recognition and grabbing detection.
Active perception planning methods generally fall into two categories: synthesis-based and search-based. Synthesis-based methods directly compute the next observation pose from the current observation and the task constraints, whereas search-based methods first generate a number of candidate viewpoints and then select the best one according to hand-crafted criteria.
Existing active perception planning methods have the following defects:
1) Synthesis-based methods struggle with complex scenes when computing the next observation pose, and their algorithmic complexity is high.
2) Search-based methods evaluate the next observation pose using geometric information, so the correlation between geometric reconstruction quality and grabbing quality is hard to guarantee, and they cope poorly with heavily occluded scenes and target objects of complex appearance; moreover, the latest methods that evaluate observation-pose planning by grabbing availability often require a large number of observations to obtain a good grabbing-quality estimate, which lowers overall grabbing efficiency.
Therefore, a robot grabbing method with high grabbing accuracy and high efficiency needs to be designed.
Disclosure of Invention
The invention aims to overcome the defects of the prior art and to provide a robot grabbing method, system, equipment and medium based on implicit neural representation that achieves high grabbing accuracy and high efficiency.
The aim of the invention can be achieved by the following technical scheme:
According to a first aspect of the present invention, there is provided a robot grabbing control method based on implicit neural representation, the method comprising:
predicting the grabbing availability of previously unobserved directions by exploiting the novel-view synthesis capability of the implicit neural representation, and selecting the next observation viewpoint based on the potentially optimal grabbing availability of the target object;
and visually modeling the target scene through closed-loop continuous observation, stopping observation and executing the grab once the grabbing availability of the target object exceeds a target threshold.
Preferably, the method comprises the steps of:
S1, fusing the depth map acquired by a depth sensor into a truncated signed distance function (TSDF), feeding the fused data into a neural network model for feature extraction to obtain the implicit neural representation of the target scene, and predicting the grabbing availability and grabbing pose of the target object;
S2, when the grabbing availability of the target object reaches the target threshold, driving the robot's mechanical arm to the designated position to execute the grab; otherwise, going to S3;
S3, sampling candidate next observation poses, predicting the grabbing availability corresponding to each new viewing direction, selecting the direction with the potentially optimal grabbing availability as the next observation pose, driving the robot's mechanical arm to that pose, and returning to S1.
Preferably, the neural network model is a three-dimensional convolutional neural network.
Preferably, a multi-layer perceptron network is used to predict the grabbing availability and grabbing pose of the target object.
Preferably, using the multi-layer perceptron network to predict the grabbing availability and grabbing pose of the target object comprises:
for each point in the target object's bounding box, feeding the features extracted from the implicit neural representation of the target scene at that spatial position and direction into the multi-layer perceptron network, and predicting the grabbing availability and grabbing pose of the target object.
Preferably, the grabbing pose is a 6-DoF grabbing pose.
Preferably, the depth sensor is a depth camera.
According to a second aspect of the present invention, there is provided a robot gripping control system based on implicit neural representation, the system performing target object gripping control using any of the above methods.
According to a third aspect of the present invention there is provided an electronic device comprising a memory and a processor, the memory having stored thereon a computer program, the processor implementing the method of any one of the above when executing the program.
According to a fourth aspect of the present invention, there is provided a computer readable storage medium having stored thereon a computer program which when executed by a processor implements the method of any one of the above.
Compared with the prior art, the invention has the following beneficial effects:
1) In the robot grabbing control method based on implicit neural representation, the corresponding grabbing quality is predicted more accurately when the observation direction coincides with the grabbing direction; the depth-sensor input is compressed and encoded into an implicit neural representation, and grabbing availability is used to evaluate the next best observation viewpoint, which improves the robot's grabbing accuracy.
2) The invention compresses and encodes the depth-sensor input into an implicit neural representation through a convolutional neural network model and uses novel-view depth-map synthesis for multi-task training, which improves the efficiency of the mechanical arm's whole observe-and-grab pipeline.
Drawings
FIG. 1 is a flow chart of the method of the present invention;
FIG. 2 is a schematic diagram of the neural network model for the implicit neural representation according to the present invention.
Detailed Description
The embodiments of the present invention are described below clearly and completely with reference to the accompanying drawings. The described embodiments are evidently only some, not all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on these embodiments without inventive effort fall within the protection scope of the present invention.
Examples
This embodiment provides a robot grabbing control method based on implicit neural representation. Considering that the corresponding grabbing quality is predicted more accurately when the observation direction coincides with the grabbing direction, the method exploits the novel-view synthesis capability of the implicit neural representation to predict the grabbing availability of previously unobserved directions and selects the next viewpoint based on the potentially optimal grabbing availability of the target object; the target scene is visually modeled through closed-loop continuous observation, and observation stops and the grab is executed once the grabbing quality of the target object reaches the target threshold.
The method of this embodiment will be described in detail with reference to FIGS. 1 and 2.
S1, fusing the depth map acquired by the depth sensor into a truncated signed distance function (TSDF), feeding the fused data into the neural network model, which compresses and encodes the TSDF, extracting features to obtain the implicit neural representation of the target scene, and predicting the grabbing availability and grabbing pose of the target object;
S2, when the grabbing availability of the target object reaches the target threshold, driving the robot's mechanical arm to the designated position to execute the grab; otherwise, going to S3;
S3, sampling candidate next observation poses, predicting the grabbing availability corresponding to each new viewing direction, selecting the direction with the potentially optimal grabbing availability as the next observation pose, driving the robot's mechanical arm to that pose, and returning to S1.
The truncated signed distance function (TSDF) in this embodiment is a data structure for representing object surfaces in three dimensions: space is divided into a regular grid of voxels, and each voxel stores a signed distance value giving the distance from the voxel center to the object surface.
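For illustration only (this sketch is not part of the patent text), the TSDF fusion in S1 can be reduced to a truncated, weighted running average per voxel. The grid shape, the truncation distance, and the precomputed surface_dist input are assumptions:

```python
import numpy as np

def fuse_depth_to_tsdf(tsdf, weight, surface_dist, trunc=0.1):
    """Fuse one depth observation into a TSDF grid.

    tsdf, weight : per-voxel arrays holding the running average and its weight
    surface_dist : signed distance from each voxel center to the observed
                   surface for this frame (positive = in front of the surface)
    """
    d = np.clip(surface_dist, -trunc, trunc) / trunc  # truncate, normalize to [-1, 1]
    new_w = weight + 1.0                              # unit weight per observation
    return (tsdf * weight + d) / new_w, new_w

# toy example: a 1-D line of 11 voxel centers, observed surface at x = 0.5
x = np.linspace(0.0, 1.0, 11)
tsdf = np.zeros_like(x)
w = np.zeros_like(x)
tsdf, w = fuse_depth_to_tsdf(tsdf, w, surface_dist=x - 0.5, trunc=0.1)
# voxels far in front of/behind the surface saturate at +/-1;
# the zero crossing marks the surface
```

Repeating the call with surface_dist computed from new viewpoints averages the observations, which is how closed-loop observation would refine the volume over time.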
The target threshold in this embodiment may be set according to actual requirements; further, it may be set to 0.95.
As another preferred embodiment, the depth sensor may be a depth camera.
As another preferred embodiment, the neural network model consists of multiple 3D CNN layers, and feature extraction yields the implicit neural representation of the target scene.
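As an illustrative sketch only (the grid size, kernel size and single-filter setup are assumptions; the model above stacks several learned 3D CNN layers), one 3-D convolution step over a TSDF volume can be written as:

```python
import numpy as np

def conv3d(volume, kernel):
    """Valid 3-D convolution (cross-correlation, as in CNNs) with one filter."""
    D, H, W = volume.shape
    d, h, w = kernel.shape
    out = np.zeros((D - d + 1, H - h + 1, W - w + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            for k in range(out.shape[2]):
                out[i, j, k] = np.sum(volume[i:i + d, j:j + h, k:k + w] * kernel)
    return out

rng = np.random.default_rng(0)
tsdf_volume = rng.uniform(-1.0, 1.0, (8, 8, 8))      # toy 8x8x8 TSDF grid
kernel = rng.normal(0.0, 0.1, (3, 3, 3))             # one (would-be learned) filter
feat = np.maximum(conv3d(tsdf_volume, kernel), 0.0)  # ReLU feature map
```

A real encoder would use many such filters per layer (and a deep-learning framework rather than explicit loops); the sketch only shows the sliding-window operation itself.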
As another preferred embodiment, a multi-layer perceptron network is employed: by traversing each direction G_v of every point in the target object's bounding box, the grabbing availability and grabbing pose of each spatial point in each direction are predicted. The corresponding expression is:

F(G_v, C_geo) → (G_r, G_q, G_w)

where F denotes the forward computation of the multi-layer perceptron, G_v is the direction associated with a point in the target object's bounding box, C_geo is the feature extracted from the implicit neural representation at that spatial position, G_r is the grabbing pose of the target object, G_q is the grabbing availability of the target object, and G_w is the grabbing width of the target object.
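The expression above can be sketched as follows; the 32-dimensional C_geo feature, the layer widths, the quaternion parameterization of G_r, and the randomly initialized (untrained) weights are all assumptions for illustration:

```python
import numpy as np

rng = np.random.default_rng(0)

def mlp(x, layers):
    """Forward pass F of a small multi-layer perceptron."""
    for W, b in layers[:-1]:
        x = np.maximum(x @ W + b, 0.0)  # ReLU hidden layers
    W, b = layers[-1]
    return x @ W + b

dim_in, dim_h = 3 + 32, 64  # G_v (3-D direction) + C_geo (assumed 32-D feature)
layers = [(rng.normal(0.0, 0.1, (dim_in, dim_h)), np.zeros(dim_h)),
          (rng.normal(0.0, 0.1, (dim_h, 6)), np.zeros(6))]  # 4 + 1 + 1 outputs

g_v = np.array([0.0, 0.0, 1.0])  # candidate direction for one bounding-box point
c_geo = rng.normal(size=32)      # feature queried from the implicit representation
out = mlp(np.concatenate([g_v, c_geo]), layers)

g_r = out[:4] / np.linalg.norm(out[:4])  # G_r: grabbing pose as a unit quaternion
g_q = 1.0 / (1.0 + np.exp(-out[4]))      # G_q: grabbing availability in (0, 1)
g_w = out[5]                             # G_w: grabbing width
```

In use, this forward pass would be evaluated for every bounding-box point and direction, and the direction with the highest G_q kept.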
Given a cluttered desktop scene and the bounding box T_bbox of the target object, the 6-DoF grabbing pose of the target object is predicted. The process is implemented as follows:
1) Judge whether the current time t exceeds the maximum running time T_max; if not, go to 2), otherwise end;
2) Fuse the depth map D_t at the current moment into the truncated signed distance function volume M_t, input M_t into the three-dimensional convolutional neural network, and take its output as the implicit neural representation C_t at the current moment; go to 3);
3) From the bounding box T_bbox of the target object, the implicit neural representation C_t and the current observation direction O_v,t, predict the optimal grabbing availability; go to 4);
4) Judge whether the optimal grabbing availability at the current moment is greater than the target threshold; if so, directly execute the predicted optimal grab, otherwise go to 5);
5) Traverse all candidate directions G_v, search for the best next observation direction O_v,t+1, drive the mechanical arm to that direction, and return to 1).
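Steps 1)–5) form a closed observe-predict-act loop. A structural sketch follows, where observe, predict, and the toy stand-ins at the bottom are hypothetical placeholders rather than the patent's actual components:

```python
def closed_loop_grasp(observe, predict, views, threshold=0.95, t_max=10):
    """Observe, predict grabbing availability, and either grab or move on.

    observe(scene, view) -> updated scene model (e.g. TSDF + implicit code)
    predict(scene, view) -> (grabbing availability, grasp) for that view
    """
    view, scene = views[0], None
    for t in range(1, t_max + 1):
        scene = observe(scene, view)              # fuse the new depth observation
        quality, grasp = predict(scene, view)
        if quality > threshold:
            return grasp, t                       # availability high enough: grab
        # otherwise move to the view with the highest predicted availability
        view = max(views, key=lambda v: predict(scene, v)[0])
    return None, t_max                            # gave up after t_max rounds

# toy stand-ins: each new observation raises confidence by 0.3
observe = lambda scene, view: (scene or 0) + 1
predict = lambda scene, view: (0.3 * scene, ("grasp-at", view))
result, steps = closed_loop_grasp(observe, predict, views=["a", "b", "c"])
```

With these stand-ins the loop terminates after the fourth observation, once the predicted availability first exceeds the 0.95 threshold.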
The method was verified through simulation and physical experiments. Compared with existing methods, it improves the grabbing success rate by 2% while using only 69% of the observations, raising overall grabbing efficiency. In experiments that limit the maximum number of observations, the method improves the grabbing success rate by 3% over existing methods. In addition, the method transfers well from simulation training to physical experiments.
The electronic device of the present invention includes a Central Processing Unit (CPU) that can perform various appropriate actions and processes according to computer program instructions stored in a Read Only Memory (ROM) or computer program instructions loaded from a storage unit into a Random Access Memory (RAM). In the RAM, various programs and data required for the operation of the device can also be stored. The CPU, ROM and RAM are connected to each other by a bus. An input/output (I/O) interface is also connected to the bus.
A plurality of components in a device are connected to an I/O interface, comprising: an input unit such as a keyboard, a mouse, etc.; an output unit such as various types of displays, speakers, and the like; a storage unit such as a magnetic disk, an optical disk, or the like; and communication units such as network cards, modems, wireless communication transceivers, and the like. The communication unit allows the device to exchange information/data with other devices via a computer network, such as the internet, and/or various telecommunication networks.
The processing unit performs the various methods and processes described above. For example, in some embodiments, the method may be implemented as a computer software program tangibly embodied on a machine-readable medium, such as a storage unit. In some embodiments, part or all of the computer program may be loaded and/or installed onto the device via the ROM and/or the communication unit. One or more steps of the methods described above may be performed when the computer program is loaded into RAM and executed by a CPU. Alternatively, in other embodiments, the CPU may be configured to perform the method by any other suitable means (e.g., by means of firmware).
The functions described above herein may be performed, at least in part, by one or more hardware logic components. For example, without limitation, exemplary types of hardware logic components that may be used include: a Field Programmable Gate Array (FPGA), an Application Specific Integrated Circuit (ASIC), an Application Specific Standard Product (ASSP), a system on a chip (SOC), a Complex Programmable Logic Device (CPLD), etc.
Program code for carrying out methods of the present invention may be written in any combination of one or more programming languages. These program code may be provided to a processor or controller of a general purpose computer, special purpose computer, or other programmable data processing apparatus such that the program code, when executed by the processor or controller, causes the functions/operations specified in the flowchart and/or block diagram to be implemented. The program code may execute entirely on the machine, partly on the machine, as a stand-alone software package, partly on the machine and partly on a remote machine or entirely on the remote machine or server.
In the context of the present invention, a machine-readable medium may be a tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. The machine-readable medium may be a machine-readable signal medium or a machine-readable storage medium. The machine-readable medium may include, but is not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples of a machine-readable storage medium would include an electrical connection based on one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
While the invention has been described with reference to certain preferred embodiments, it will be understood by those skilled in the art that various changes and substitutions of equivalents may be made and equivalents will be apparent to those skilled in the art without departing from the scope of the invention. Therefore, the protection scope of the invention is subject to the protection scope of the claims.
Claims (10)
1. A robot grabbing method based on implicit neural representation, the method comprising:
predicting the grabbing availability of previously unobserved directions by exploiting the novel-view synthesis capability of the implicit neural representation, and selecting the next observation viewpoint based on the potentially optimal grabbing availability of the target object;
and visually modeling the target scene through closed-loop continuous observation, stopping observation and executing the grab once the grabbing availability of the target object exceeds a target threshold.
2. The robot grabbing method based on implicit neural representation according to claim 1, characterized in that it comprises the steps of:
S1, fusing the depth map acquired by a depth sensor into a truncated signed distance function (TSDF), feeding the fused data into a neural network model for feature extraction to obtain the implicit neural representation of the target scene, and predicting the grabbing availability and grabbing pose of the target object;
S2, when the grabbing availability of the target object reaches the target threshold, driving the robot's mechanical arm to the designated position to execute the grab; otherwise, going to S3;
S3, sampling candidate next observation poses, predicting the grabbing availability corresponding to each new viewing direction, selecting the direction with the potentially optimal grabbing availability as the next observation pose, driving the robot's mechanical arm to that pose, and returning to S1.
3. The robot grabbing method based on implicit neural representation according to claim 2, wherein the neural network model is a three-dimensional convolutional neural network.
4. The robot grabbing method based on implicit neural representation according to claim 2, wherein a multi-layer perceptron network is used to predict the grabbing availability and grabbing pose of the target object.
5. The robot grabbing method based on implicit neural representation according to claim 4, wherein using the multi-layer perceptron network to predict the grabbing availability and grabbing pose of the target object comprises:
for each point in the target object's bounding box, feeding the features extracted from the implicit neural representation of the target scene at that spatial position and direction into the multi-layer perceptron network, and predicting the grabbing availability and grabbing pose of the target object.
6. The robot grabbing method based on implicit neural representation according to claim 2, wherein the grabbing pose is a 6-DoF grabbing pose.
7. The robot grabbing method based on implicit neural representation according to claim 2, wherein the depth sensor is a depth camera.
8. A robot grabbing system based on implicit neural representation, wherein the system performs target-object grabbing control using the method of any one of claims 1 to 7.
9. An electronic device comprising a memory and a processor, the memory having stored thereon a computer program, characterized in that the processor, when executing the program, implements the method according to any of claims 1-7.
10. A computer readable storage medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any one of claims 1-7.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202311718061.7A CN117644511A (en) | 2023-12-14 | 2023-12-14 | Robot grabbing method, system, equipment and medium based on implicit neural representation |
Publications (1)
Publication Number | Publication Date |
---|---|
CN117644511A true CN117644511A (en) | 2024-03-05 |
Family
ID=90049379
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||