CN114474060A - Control method and device of industrial robot and storage medium - Google Patents

Control method and device of industrial robot and storage medium

Info

Publication number
CN114474060A
Authority
CN
China
Prior art keywords
industrial robot
action
network
grabbing
depth image
Prior art date
Legal status
Granted
Application number
CN202210143375.8A
Other languages
Chinese (zh)
Other versions
CN114474060B (en
Inventor
张平
张佳鑫
Current Assignee
South China University of Technology SCUT
Original Assignee
South China University of Technology SCUT
Priority date
Filing date
Publication date
Application filed by South China University of Technology SCUT filed Critical South China University of Technology SCUT
Priority to CN202210143375.8A priority Critical patent/CN114474060B/en
Publication of CN114474060A publication Critical patent/CN114474060A/en
Application granted granted Critical
Publication of CN114474060B publication Critical patent/CN114474060B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/1605Simulation of manipulator lay-out, design, modelling of manipulator
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1602Programme controls characterised by the control system, structure, architecture
    • B25J9/161Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1612Programme controls characterised by the hand, wrist, grip control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1628Programme controls characterised by the control loop
    • B25J9/163Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • BPERFORMING OPERATIONS; TRANSPORTING
    • B25HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25JMANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00Programme-controlled manipulators
    • B25J9/16Programme controls
    • B25J9/1656Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02PCLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Orthopedic Medicine & Surgery (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Manipulator (AREA)

Abstract

The application discloses a control method, a control device and a storage medium of an industrial robot, wherein the method comprises the following steps: defining a pushing action and a grabbing action of an industrial robot, and carrying out Markov modeling on the pushing action and the grabbing action; acquiring a depth image, establishing a quadtree for the depth image, and acquiring the minimum distance between edge points of different objects in the depth image according to the quadtree; and learning by using a deep reinforcement learning network, and rewarding the changes that the pushing action and the grabbing action of the industrial robot make to the environment. The method and the device can handle objects placed arbitrarily in an unstructured scene and achieve stable, reliable sorting and grabbing control.

Description

Control method and device of industrial robot and storage medium
Technical Field
The application relates to the field of intelligent manufacturing and reinforcement learning, in particular to a control method and device of an industrial robot and a storage medium.
Background
Robots are increasingly used in industrial production. In an intelligent manufacturing scenario, when a sorting task is performed in an unstructured environment, the objects to be sorted are placed randomly, and the effect of the robot's actions on the workspace is continuous and sequential. The hard-coding and teaching methods of the related art offer high repeatability, but their operational flexibility is poor and they cannot adapt to unstructured environments quickly and at low cost. In addition, when conventional target detection and manipulation methods are used in a fixed scene, the object information in the scene changes frequently, and the time and labor costs of programming environment-adapted actions for every robot are high.
Therefore, the above technical problems of the related art need to be solved.
Disclosure of Invention
The present application is directed to solving one of the technical problems in the related art. Therefore, the embodiments of the present application provide a control method and device for an industrial robot and a storage medium, which can learn the coordination of pushing and grabbing actions in an unstructured scene and achieve stable, reliable sorting and grabbing control.
According to an aspect of an embodiment of the present application, there is provided a control method of an industrial robot, the method including:
defining a pushing action and a grabbing action of an industrial robot, and carrying out Markov modeling on the pushing action and the grabbing action;
acquiring a depth image, establishing a quadtree for the depth image, and acquiring the minimum distance between edge points of different objects in the depth image according to the quadtree;
and learning by using a deep reinforcement learning network, and rewarding the environment changed by the pushing action and the grabbing action of the industrial robot.
In one embodiment, the defining the pushing and gripping actions of the industrial robot comprises:
setting a safety height of the industrial robot;
defining the pushing action as pushing in a preset direction within the safety height;
defining the grabbing action as grabbing at a preset position within the safety height.
In one embodiment, establishing a quadtree for a depth image, and obtaining a minimum distance between edge points of different objects in the depth image according to the quadtree includes:
dividing the image into four areas according to a coordinate axis by taking the central point of the depth image as a coordinate origin;
traversing with the four vertices of the depth image as starting points, and setting the points of the external object-free connected region to a preset value;
obtaining edge points of different objects in the depth image according to the preset value;
and calculating the minimum distance between the edge points of different objects according to the edge points.
In one embodiment, the calculation formula for calculating the minimum distance between the edge points of different objects according to the edge points is as follows:
d_min^(i,j) = min{ ||p - q|| : p ∈ N_i, q ∈ N_j }
where d_min^(i,j) is the minimum distance between the edge points of different objects, and N_i and N_j are the nodes of the quadtree of the depth image that contain the respective objects.
In one embodiment, the deep reinforcement learning network comprises an action value network and an action strategy network, the action value network uses a feedforward full convolution network to represent the pushing action, uses an extended depth value network to represent the grabbing action, and inputs the depth image into the feedforward full convolution network and the extended depth value network; the action strategy network comprises an actor network and a critic network, wherein the actor network is used for selecting the best executed action, and the critic network scores the output of the actor network and is used for enabling the actor network to perform strategy gradient descent.
In one embodiment, rewarding the gripping motion change environment of the industrial robot comprises:
rewarding the environment changed by the grabbing action of the industrial robot after successfully grabbing an object, wherein the rewarding function is as follows:
R_g(s_t, s_{t+1}) = 1, if grasp succeeded; R_g(s_t, s_{t+1}) = 0, if grasp failed
where R_g(s_t, s_{t+1}) is the reward given to the industrial robot, "grasp succeeded" is the grabbing success condition, and "grasp failed" is the grabbing failure condition.
In one embodiment, rewarding the push action changing environment of the industrial robot comprises:
if the number of objects changes after the pushing action is executed, the reward function is:
R_p(s_t, s_{t+1}) = 1, if n_t < n_{t+1}
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, and n_t and n_{t+1} are the numbers of said objects before and after the action;
if the number of objects is not changed after the pushing action is executed, the reward function is as follows:
R_p(s_t, s_{t+1}) = -ReLU((L - d_{t+1}) / L), if any d^(i,j) < L and n_t == n_{t+1}
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, n_t and n_{t+1} are the numbers of said objects, ReLU is the rectified linear unit function, d_{t+1} is the minimum distance between the edge points of different objects, and L is the minimum grabbing distance.
According to an aspect of an embodiment of the present application, there is provided a control apparatus of an industrial robot, the apparatus including:
the system comprises a definition module, a data processing module and a data processing module, wherein the definition module is used for defining pushing action and grabbing action of the industrial robot and carrying out Markov modeling on the pushing action and the grabbing action;
the acquisition module is used for acquiring a depth image, establishing a quadtree for the depth image, and acquiring the minimum distance between edge points of different objects in the depth image according to the quadtree;
and the reward module is used for learning by using a deep reinforcement learning network and rewarding the environment changed by the pushing action and the grabbing action of the industrial robot.
According to an aspect of an embodiment of the present application, there is provided a control apparatus of an industrial robot, the apparatus including:
at least one processor;
at least one memory for storing at least one program;
at least one of said programs, when executed by at least one of said processors, implements a method of controlling an industrial robot as described in the previous embodiments.
According to an aspect of the embodiments of the present application, there is provided a storage medium storing a program executable by a processor, wherein the program executable by the processor is used for implementing a control method of an industrial robot according to the foregoing embodiments when executed by the processor.
The control method of the industrial robot has the following beneficial effects: the pushing action and the grabbing action of an industrial robot are defined, and Markov modeling is carried out on the pushing action and the grabbing action; a depth image is acquired, a quadtree is established for the depth image, and the minimum distance between edge points of different objects in the depth image is obtained according to the quadtree; a deep reinforcement learning network is used for learning, and the changes that the pushing action and the grabbing action of the industrial robot make to the environment are rewarded. As a result, objects can be placed arbitrarily in an unstructured scene (for example, placed compactly), and stable, reliable sorting and grabbing control is achieved.
Additional aspects and advantages of the present application will be set forth in part in the description which follows and, in part, will be obvious from the description, or may be learned by practice of the present application.
Drawings
In order to more clearly illustrate the technical solutions in the embodiments of the present application, the drawings needed to be used in the description of the embodiments are briefly introduced below, and it is obvious that the drawings in the following description are only some embodiments of the present application, and it is obvious for those skilled in the art to obtain other drawings based on these drawings without creative efforts.
Fig. 1 is a schematic diagram illustrating a control method of an industrial robot according to an embodiment of the present application;
fig. 2 is a flowchart of a control method of an industrial robot according to an embodiment of the present disclosure;
fig. 3 is a schematic diagram illustrating a picture partial area division according to an embodiment of the present application;
FIG. 4 is a diagram of a leaf node of a quadtree associated with two object edge points according to an embodiment of the present application;
fig. 5 is a schematic diagram of a control device of an industrial robot according to an embodiment of the present application;
fig. 6 is a schematic view of another control device of an industrial robot according to an embodiment of the present application.
Detailed Description
In order to make the technical solutions better understood by those skilled in the art, the technical solutions in the embodiments of the present application will be clearly and completely described below with reference to the drawings in the embodiments of the present application, and it is obvious that the described embodiments are only partial embodiments of the present application, but not all embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The terms "first," "second," "third," and "fourth," etc. in the description and claims of this application and the accompanying drawings are used for distinguishing between different objects and not for describing a particular order. Furthermore, the terms "include" and "have," as well as any variations thereof, are intended to cover non-exclusive inclusions. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those steps or elements listed, but may alternatively include other steps or elements not listed, or inherent to such process, method, article, or apparatus.
Reference herein to "an embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the application. The appearances of the phrase in various places in the specification are not necessarily all referring to the same embodiment, nor are separate or alternative embodiments mutually exclusive of other embodiments. It is explicitly and implicitly understood by one skilled in the art that the embodiments described herein can be combined with other embodiments.
Robots are increasingly used in industrial production. In an intelligent manufacturing scenario, when a sorting task is performed in an unstructured environment, the objects to be sorted are placed randomly, and the effect of the robot's actions on the workspace is continuous and sequential. The hard-coding and teaching methods of the related art offer high repeatability, but their operational flexibility is poor and they cannot adapt to unstructured environments quickly and at low cost. In addition, when conventional target detection and manipulation methods are used in a fixed scene, the object information in the scene changes frequently, and the time and labor costs of programming environment-adapted actions for every robot are high.
For the convenience of understanding, the description explains the related terms and terms of the application, specifically as follows:
markov model: the Markov Model is a statistical Model, and is widely applied to the application fields of natural language processing such as speech recognition, automatic part-of-speech tagging, phonetic-to-character conversion, probabilistic grammar and the like. A markov process in which both the markov model time and the state are discrete is called a markov chain, and is denoted by Xn ═ x (n), n ═ 0,1, 2. The markov chain is a sequence of random variables X1, X2, X3. The range of these variables, i.e., the set of all their possible values, is called the "state space", while the value of Xn is the state at time n. If Xn +1 is only a function of Xn for the past state, then P (Xn +1 | X0, X1, X2, …, Xn) ═ P (Xn +1 | Xn), where X is some state in the process.
Depth image: in a computer vision system, three-dimensional scene information provides more possibilities for various computer vision applications such as image segmentation, target detection, object tracking and the like, and a Depth image (Depth map) is widely applied as a general three-dimensional scene information expression mode. The gray value of each pixel point of the depth image can be used for representing the distance between a certain point in the scene and the camera.
A quadtree: a quadtree is a tree-like data structure with four sub-blocks at each node. Quaternary trees are often used for analysis and classification of two-dimensional spatial data. It divides the data into four quadrants. The data range may be square or rectangular or any other shape.
Deep reinforcement learning: the deep reinforcement learning combines the perception capability of the deep learning and the decision capability of the reinforcement learning, can be directly controlled according to an input image, and is an artificial intelligence method closer to a human thinking mode.
Fig. 1 is a schematic diagram illustrating a control method of an industrial robot according to an embodiment of the present application. As shown in fig. 1, the control method of an industrial robot of the present application includes: an action value network: a feedforward fully convolutional network and an extended depth value network are used to represent the pushing and grabbing actions, respectively; both networks take the depth image of the state as input and output a per-pixel value map with the same size and resolution as the input image, where each value is the expected return of executing the action at that pixel; an action strategy network: the strategy network comprises an actor network and a critic network; the actor network selects the best action to execute for the current workspace scene, and the critic network scores the output of the actor network, the score reflecting how good the actor network's action decision is, so that policy gradient descent is performed on the actor network, while the critic network is trained by gradient descent according to the reward the robot obtains for executing the action. In fig. 1, a sensor and a camera capture an image and feed the depth camera data into the action value network and the action strategy network; the action value network outputs a pushing execution point and a grabbing execution point for the industrial robot, and the action strategy network selects the action to be executed by the industrial robot.
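The following is a minimal sketch of the two per-pixel value heads described above; the layer sizes and the plain fully convolutional stand-in for the extended depth value network are illustrative assumptions and are not taken from the patent.

```python
# Minimal sketch of the per-pixel action value heads (illustrative layer sizes;
# the grasp head is a plain FCN standing in for the extended depth value network).
import torch
import torch.nn as nn

class FullyConvQNet(nn.Module):
    """Maps a 1-channel depth image to a per-pixel Q-value map for one action type."""
    def __init__(self):
        super().__init__()
        self.net = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=3, padding=1), nn.ReLU(),
            nn.Conv2d(64, 1, kernel_size=1),  # one expected return per pixel
        )

    def forward(self, depth):            # depth: (B, 1, H, W)
        return self.net(depth)           # value map: (B, 1, H, W), same resolution as the input

push_q_net = FullyConvQNet()             # feedforward fully convolutional network for pushing
grasp_q_net = FullyConvQNet()            # stand-in for the extended depth value network for grabbing

depth = torch.rand(1, 1, 224, 224)       # depth image of the current state
push_q, grasp_q = push_q_net(depth), grasp_q_net(depth)
# The pixel with the highest value is the candidate execution point for that action.
best_push_pixel = push_q.flatten().argmax()
```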
Based on the above principle, the present application proposes a control method of an industrial robot, as shown in fig. 2, including:
s201, defining a pushing action and a grabbing action of the industrial robot, and carrying out Markov modeling on the pushing action and the grabbing action.
In step S201, the defining of the pushing action and the grabbing action of the industrial robot includes: setting a safety height of the industrial robot; defining the pushing action as pushing in a preset direction within the safety height; and defining the grabbing action as grabbing at a preset position within the safety height. Specifically, the safe height of the robot end-effector during operation is defined as follows: in the picture taken by the depth camera, with (x, y) as the center, the highest point of the region is found as the safe height safe_z, and the action point is (x, y, safe_z). The actions are defined as follows:
(1) Pushing action:
starting from the starting point, the end-effector moves to the point (x, y, safe_z + d_h) above the action point, the jaws are closed, the end-effector moves vertically down to the action point, pushes along the given direction, and then returns to the original position.
(2) Grabbing action:
starting from the starting point, the end-effector moves to (x, y, safe_z + d_h) (the value of d_h is set according to the specific scene), the jaws are opened, the end-effector moves vertically down along the z axis to the action point, the jaws are closed, the end-effector moves vertically up, and when an object is detected to be clamped, the object is moved out of the working space; the end-effector then returns to the original position.
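As a rough illustration of these two primitives, the sketch below uses a hypothetical robot/gripper interface; `robot`, `gripper` and their methods (move_to, move_home, open, close, holding_object) are placeholder names, not an API defined by the patent.

```python
# Hypothetical sketch of the two motion primitives defined above.
def push(robot, gripper, x, y, safe_z, d_h, direction, push_len=0.05):
    dx, dy = direction                      # unit vector of the preset push direction
    robot.move_to(x, y, safe_z + d_h)       # approach point above the action point
    gripper.close()                         # push with closed jaws
    robot.move_to(x, y, safe_z)             # descend vertically to the action point
    robot.move_to(x + push_len * dx, y + push_len * dy, safe_z)   # push along the direction
    robot.move_home()                       # return to the original position

def grasp(robot, gripper, x, y, safe_z, d_h, drop_pose):
    robot.move_to(x, y, safe_z + d_h)       # d_h is set according to the specific scene
    gripper.open()
    robot.move_to(x, y, safe_z)             # descend vertically along the z axis
    gripper.close()
    robot.move_to(x, y, safe_z + d_h)       # lift vertically
    if gripper.holding_object():            # object clamped: move it out of the workspace
        robot.move_to(*drop_pose)
        gripper.open()
    robot.move_home()
```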
It should be noted that the Markov model in this embodiment can be divided into four parts: state, action, policy, and reward, where each quantity in the Markov model is represented as follows:
(1) State: the state information is the state information of the objects in the environment, with s_i denoting the state at time step i:
S = (s_0, ..., s_{i-1}, s_i, s_{i+1}, ..., s_n)
(2) Action: the action set consists of the two basic actions of pushing and grabbing, with a_i denoting the action executed by the robot at time step i:
A = (a_0, ..., a_{i-1}, a_i, a_{i+1}, ..., a_{n-1})
(3) Policy: the policy is represented by a deep network and expresses the state transition probability, recorded as p(s_{i+1} | s_i, a_i), i.e. the probability distribution of transitioning to s_{i+1} after executing action a_i in state s_i.
(4) Reward: the reward is obtained according to a reward function after the robot executes an action; the reward R_i obtained after executing action a_i in state s_i is:
R_i = R_Function(s_i, a_i)
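A minimal sketch of how one transition of this Markov model could be stored (for example in the experience pool mentioned later); the field layout is purely illustrative.

```python
# Illustrative data holder for one transition (s_i, a_i, R_i, s_{i+1}) of the Markov model.
from dataclasses import dataclass
import numpy as np

@dataclass
class Transition:
    s_i: np.ndarray     # state s_i: depth image of the workspace at time step i
    a_i: tuple          # action a_i: ("push" or "grasp", x, y, direction)
    r_i: float          # reward R_i = R_Function(s_i, a_i)
    s_next: np.ndarray  # state s_{i+1}, reached with probability p(s_{i+1} | s_i, a_i)
```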
S202, obtaining a depth image, establishing a quadtree for the depth image, and obtaining the minimum distance between edge points of different objects in the depth image according to the quadtree.
In step S202, a quadtree is established for the depth image, and the minimum distance between edge points of different objects in the depth image is obtained according to the quadtree, including: dividing the image into four areas along the coordinate axes by taking the central point of the depth image as the coordinate origin; traversing with the four vertices of the depth image as starting points and setting the points of the external object-free connected region to a preset value; obtaining the edge points of different objects in the depth image according to the preset value; and calculating the minimum distance between the edge points of different objects according to the edge points. The minimum distance between the edge points of different objects is calculated from the edge points as follows:
d_min^(i,j) = min{ ||p - q|| : p ∈ N_i, q ∈ N_j }
where d_min^(i,j) is the minimum distance between the edge points of different objects, and N_i and N_j are the nodes of the quadtree of the depth image that contain the respective objects.
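A minimal sketch of this step, assuming the depth image has already been thresholded into an object mask; it uses a flat pixel scan and SciPy connected-component labeling in place of the quadtree, so it reproduces the result but not the complexity of the search described later.

```python
# Simplified sketch: flat pixel scan instead of the quadtree (same result, different complexity).
import numpy as np
from scipy import ndimage

def min_object_distance(object_mask):
    """object_mask: boolean image, True where a pixel belongs to some object
    (obtained by thresholding the depth image against the empty workspace)."""
    labels, n_clusters = ndimage.label(object_mask)               # connected object clusters
    edges = object_mask & ~ndimage.binary_erosion(object_mask)    # object edge points
    pts = [np.argwhere(edges & (labels == k)) for k in range(1, n_clusters + 1)]
    d_min = np.inf
    for i in range(n_clusters):
        for j in range(i + 1, n_clusters):
            if len(pts[i]) and len(pts[j]):
                diffs = pts[i][:, None, :].astype(float) - pts[j][None, :, :]
                d_min = min(d_min, float(np.sqrt((diffs ** 2).sum(-1)).min()))
    return d_min, n_clusters   # minimum edge-to-edge distance (pixels) and cluster count
```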
And S203, learning by using a deep reinforcement learning network, and rewarding the environment changed by the pushing action and the grabbing action of the industrial robot.
In this embodiment, the deep reinforcement learning network includes an action value network and an action strategy network, the action value network uses a feedforward full convolution network to represent the pushing action, uses an extended depth value network to represent the grabbing action, and inputs the depth image into the feedforward full convolution network and the extended depth value network; the action strategy network comprises an actor network and a critic network, wherein the actor network is used for selecting the best executed action, and the critic network scores the output of the actor network and is used for enabling the actor network to perform strategy gradient descent.
The strategy network comprises an actor network and a critic network. The actor network selects the best action to execute for the current workspace scene s_t. The critic network scores the output of the actor network; the score reflects how good the actor network's action decision is and is used to perform policy gradient descent on the actor network, while the critic network itself is trained by gradient descent according to the reward the robot obtains for executing the action.
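A minimal actor-critic update sketch along these lines follows; the feature extractor, layer sizes, optimizers, and the greedy action choice are illustrative assumptions, since the patent does not specify them.

```python
# Minimal actor-critic update sketch; sizes and the greedy choice are assumptions for illustration.
import torch
import torch.nn as nn

actor = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 2))   # logits: push / grasp
critic = nn.Sequential(nn.Linear(128, 64), nn.ReLU(), nn.Linear(64, 1))  # scalar score
actor_opt = torch.optim.Adam(actor.parameters(), lr=1e-4)
critic_opt = torch.optim.Adam(critic.parameters(), lr=1e-4)

def update(state_feat, reward):
    """state_feat: (1, 128) feature of the workspace scene s_t; reward: reward of the executed action."""
    # Critic: trained by gradient descent toward the reward obtained for the action.
    value = critic(state_feat).squeeze()
    critic_loss = (value - reward) ** 2
    critic_opt.zero_grad(); critic_loss.backward(); critic_opt.step()

    # Actor: the critic's score drives the policy gradient step.
    log_prob = torch.log_softmax(actor(state_feat).squeeze(0), dim=-1)
    action = torch.argmax(log_prob)          # greedy choice (training would usually sample)
    advantage = reward - critic(state_feat).squeeze().detach()
    actor_loss = -log_prob[action] * advantage
    actor_opt.zero_grad(); actor_loss.backward(); actor_opt.step()

update(torch.rand(1, 128), reward=1.0)       # one illustrative update step
```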
The reward value for the policy selection action is as follows:
when the grabbing action is successful, 1 is awarded, and the strategy is to grab the object which can be grabbed as far as possible and then execute other actions
R(st,st+1)=Rg(st,st+1)
When the pushing action is executed and the return report which is more than or equal to zero is obtained, the return report value is given to the strategy return function, and when the pushing action return is a negative value, the record is not put into an experience pool for playback
R(st,st+1)=Rp(st,st+1)
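The following is a small sketch of this reward assignment and replay filtering; the replay buffer and transition objects are placeholders.

```python
# Sketch of the strategy-level reward assignment and replay filtering described above.
def policy_reward(action_type, r_grasp, r_push, replay_buffer, transition):
    if action_type == "grasp":
        replay_buffer.append(transition)
        return r_grasp                  # 1 when the grasp succeeds
    # pushing action: only non-negative push returns are kept for experience replay
    if r_push >= 0:
        replay_buffer.append(transition)
    return r_push
```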
Optionally, rewarding the gripping motion changing environment of the industrial robot comprises: rewarding the environment changed by the grabbing action of the industrial robot after successfully grabbing an object, wherein the rewarding function is as follows:
R_g(s_t, s_{t+1}) = 1, if grasp succeeded; R_g(s_t, s_{t+1}) = 0, if grasp failed
where R_g(s_t, s_{t+1}) is the reward given to the industrial robot, "grasp succeeded" is the grabbing success condition, and "grasp failed" is the grabbing failure condition.
Correspondingly, reward is given to the pushing action changing environment of the industrial robot, and the reward comprises the following steps:
(I) if the number of object clusters changes after the pushing action is executed, the reward function is:
R_p(s_t, s_{t+1}) = 1, if n_t < n_{t+1}
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, and n_t and n_{t+1} are the numbers of object clusters before and after the action;
(II) if the number of object clusters is not changed after the pushing action is executed, the reward function is as follows:
R_p(s_t, s_{t+1}) = -ReLU((L - d_{t+1}) / L), if any d^(i,j) < L and n_t == n_{t+1}
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, n_t and n_{t+1} are the numbers of object clusters, ReLU is the rectified linear unit function, d_{t+1} is the minimum distance between the edge points of different objects, and L is the minimum grabbing distance;
(III) when all object distances in the scene s_t are greater than the minimum grabbing distance L and the number of object clusters does not change after the pushing action: if all object distances in the scene s_{t+1} are still greater than the minimum grabbing distance, the reward value is 0; if, after the push, some object distance in s_{t+1} is smaller than the minimum grabbing distance, a negative reward is given according to how far the distance falls below L:
R_p(s_t, s_{t+1}) = 0, if d_{t+1} >= L; R_p(s_t, s_{t+1}) = -ReLU((L - d_{t+1}) / L), if d_{t+1} < L
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, n_t and n_{t+1} are the numbers of object clusters, and d_{t+1} is the minimum distance between the edge points of different objects.
(IV) when the number of clusters n_t in the scene s_t is greater than the number of clusters n_{t+1} in the scene s_{t+1}, an object has been pushed out of the workspace by the pushing action, which makes that object ungraspable, so a reward value of -1 is given:
R_p(s_t, s_{t+1}) = -1, if n_t > n_{t+1}
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, and n_t and n_{t+1} are the numbers of object clusters.
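Putting the four cases together, the following is a sketch of the push reward; since the original formula images are not reproduced here, the exact scaling of the negative term is an assumption, and cases (II) and (III) are merged into one branch.

```python
# Sketch of the push reward R_p over the four cases above (scaling of the negative term assumed).
def push_reward(n_t, n_t1, d_t1, L):
    """n_t, n_t1: object-cluster counts before/after the push;
    d_t1: minimum inter-object edge distance after the push; L: minimum grabbing distance."""
    relu = lambda x: max(x, 0.0)
    if n_t < n_t1:        # (I) the push separated a cluster into more clusters
        return 1.0
    if n_t > n_t1:        # (IV) an object was pushed out of the workspace
        return -1.0
    # (II)/(III) cluster count unchanged: zero once objects are farther apart than the
    # grabbing distance, otherwise a negative reward that shrinks as d_t1 approaches L
    return 0.0 if d_t1 >= L else -relu((L - d_t1) / L)
```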
Fig. 3 is a schematic diagram illustrating the division of a picture into partial regions according to an embodiment of the present disclosure. As shown in fig. 3, the root node represents the center point of the whole picture, an index node represents the center point of each region, and a leaf node is a region of minimum granularity, where the granularity may be adjusted according to precision requirements. The picture is regarded as a two-dimensional array; using the vertices at the four corners of the picture as starting points, all points of the external object-free connected region are set to a special value (e.g., -1, shown as gray) in a single depth-first traversal.
Fig. 4 is a graph of the leaf nodes of the quadtree associated with the edge points of two objects according to an embodiment of the present application. As shown in fig. 4, the mapping of the two objects onto the leaf nodes of the quadtree assigns each object a unique label obj_i; if the picture region represented by a leaf node contains an edge point of an object, all path nodes from that leaf node to the root node are marked, so that the edge of the object can be found by searching downward from the root node.
d_min^(i,j) = min{ ||p - q|| : p ∈ N_level^(i), q ∈ N_level^(j) }
where N_level^(i) is the set of nodes at a given level of the quadtree that contain the object obj_i. The above formula computes the distance between two objects as they are represented at the same layer of the quadtree, and the nodes of different objects in the same layer are compared according to it. Taking the image as a plane, the quadtree divides it into four regions; at each layer only the three node pairs with the smallest values are kept, the search continues downward until the leaf nodes, and the minimum of the results computed at the leaf nodes is taken as the minimum distance. The time complexity of finding the minimum distance between two objects is O(h·k^2), where h is the height of the tree and k is the number of children of each node; since it is a quadtree, k = 4, and the time complexity of detecting all m objects at once is O(m^2·h·k^2). That is, in fig. 4, the mapping of the two objects onto the leaf nodes of the quadtree marks and uniquely represents each object by its object number; if the picture region represented by a leaf node contains an edge point of an object, all path nodes from that leaf node to the root node are marked; the image is taken as a plane, the quadtree divides it into four regions, only the three closest node pairs are kept at each layer, the search continues down to the leaf nodes, and the minimum of the leaf-node results is taken as the minimum distance.
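The following is a simplified sketch of this coarse-to-fine search; the Node layout and the use of node centers as the interim distance measure at intermediate levels are illustrative assumptions.

```python
# Simplified sketch of the coarse-to-fine quadtree search: at every level only the
# closest node pairs (three in the text) are kept and expanded toward the leaves.
import heapq
from dataclasses import dataclass, field

@dataclass
class Node:
    center: tuple                                   # (row, col) center of the region
    objects: set                                    # labels of objects whose edge points lie below this node
    children: list = field(default_factory=list)    # empty for leaf nodes

def dist(a, b):
    return ((a.center[0] - b.center[0]) ** 2 + (a.center[1] - b.center[1]) ** 2) ** 0.5

def quadtree_min_distance(root, obj_i, obj_j, keep=3):
    pairs = [(root, root)]
    while any(a.children or b.children for a, b in pairs):   # descend until only leaves remain
        candidates = [
            (ca, cb)
            for a, b in pairs
            for ca in (a.children or [a]) if obj_i in ca.objects
            for cb in (b.children or [b]) if obj_j in cb.objects
        ]
        pairs = heapq.nsmallest(keep, candidates, key=lambda p: dist(*p))
    return min(dist(a, b) for a, b in pairs)        # minimum over the surviving leaf pairs
```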
The control method of the industrial robot has the following beneficial effects: the pushing action and the grabbing action of the industrial robot are defined, and Markov modeling is carried out on the pushing action and the grabbing action; a depth image is acquired, a quadtree is established for the depth image, and the minimum distance between edge points of different objects in the depth image is obtained according to the quadtree; a deep reinforcement learning network is used for learning, and the changes that the pushing action and the grabbing action of the industrial robot make to the environment are rewarded. As a result, objects can be placed arbitrarily in an unstructured scene (for example, placed compactly), and stable, reliable sorting and grabbing control is achieved.
Fig. 5 is a schematic diagram of a control device of an industrial robot provided in an embodiment of the present application. As shown in fig. 5, the control device of an industrial robot of the present application includes:
the defining module 501 is used for defining pushing actions and grabbing actions of the industrial robot and carrying out Markov modeling on the pushing actions and the grabbing actions;
an obtaining module 502, configured to obtain a depth image, establish a quadtree for the depth image, and obtain a minimum distance between edge points of different objects in the depth image according to the quadtree;
and the reward module 503 is used for learning by using a deep reinforcement learning network and rewarding the environment changed by the pushing action and the grabbing action of the industrial robot.
It can be seen that the contents in the foregoing method embodiments are all applicable to this apparatus embodiment, the functions specifically implemented by this apparatus embodiment are the same as those in the foregoing method embodiment, and the advantageous effects achieved by this apparatus embodiment are also the same as those achieved by the foregoing method embodiment.
Referring to fig. 6, an embodiment of the present application provides an industrial robot control apparatus including:
at least one processor 601;
at least one memory 602 for storing at least one program;
the at least one program, when executed by the at least one processor 601, causes the at least one processor 601 to implement a method of controlling an industrial robot of the foregoing embodiment.
Similarly, the contents of the method embodiments are all applicable to the apparatus embodiments, the functions specifically implemented by the apparatus embodiments are the same as the method embodiments, and the beneficial effects achieved by the apparatus embodiments are also the same as the beneficial effects achieved by the method embodiments.
An embodiment of the present invention further provides a storage medium storing a program, which is used to implement the method of the foregoing embodiment when the program is executed by a processor.
Similarly, the contents in the foregoing method embodiments are all applicable to this storage medium embodiment, the functions specifically implemented by this storage medium embodiment are the same as those in the foregoing method embodiments, and the advantageous effects achieved by this storage medium embodiment are also the same as those achieved by the foregoing method embodiments.
In alternative embodiments, the functions/acts noted in the block diagrams may occur out of the order noted in the operational illustrations. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality/acts involved. Furthermore, the embodiments presented and described in the flowcharts of the present application are provided by way of example in order to provide a more thorough understanding of the technology. The disclosed methods are not limited to the operations and logic flows presented herein. Alternative embodiments are contemplated in which the order of various operations is changed and in which sub-operations described as part of larger operations are performed independently.
Furthermore, although the present application is described in the context of functional modules, it should be understood that, unless otherwise stated to the contrary, one or more of the functions and/or features may be integrated in a single physical device and/or software module, or one or more functions and/or features may be implemented in separate physical devices or software modules. It will also be appreciated that a detailed discussion regarding the actual implementation of each module is not necessary for an understanding of the present application. Rather, the actual implementation of the various functional modules in the apparatus disclosed herein will be understood within the ordinary skill of an engineer, given the nature, function, and internal relationship of the modules. Accordingly, those skilled in the art can, using ordinary skill, practice the present application as set forth in the claims without undue experimentation. It is also to be understood that the specific concepts disclosed are merely illustrative of and not intended to limit the scope of the application, which is defined by the appended claims and their full scope of equivalents.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
The logic and/or steps represented in the flowcharts or otherwise described herein, e.g., an ordered listing of executable instructions that can be considered to implement logical functions, can be embodied in any computer-readable medium for use by or in connection with an instruction execution system, apparatus, or device, such as a computer-based system, processor-containing system, or other system that can fetch the instructions from the instruction execution system, apparatus, or device and execute the instructions. For the purposes of this description, a "computer-readable medium" can be any means that can contain, store, communicate, propagate, or transport the program for use by or in connection with the instruction execution system, apparatus, or device.
More specific examples (a non-exhaustive list) of the computer-readable medium would include the following: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CDROM). Additionally, the computer-readable medium could even be paper or another suitable medium upon which the program is printed, as the program can be electronically captured, via for instance optical scanning of the paper or other medium, then compiled, interpreted or otherwise processed in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that portions of the present application may be implemented in hardware, software, firmware, or a combination thereof. In the above embodiments, the various steps or methods may be implemented in software or firmware stored in memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, any one or combination of the following techniques, which are known in the art, may be used: a discrete logic circuit having a logic gate circuit for implementing a logic function on a data signal, an application specific integrated circuit having an appropriate combinational logic gate circuit, a Programmable Gate Array (PGA), a Field Programmable Gate Array (FPGA), or the like.
In the foregoing description of the specification, reference to the description of "one embodiment/example," "another embodiment/example," or "certain embodiments/examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: numerous changes, modifications, substitutions and alterations can be made to the embodiments without departing from the principles and spirit of the application, the scope of which is defined by the claims and their equivalents.
The above embodiments are only used to illustrate the technical solutions of the present application, and not to limit the same; although the present application has been described in detail with reference to the foregoing embodiments, it should be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions in the embodiments of the present application.

Claims (10)

1. A method for controlling an industrial robot, characterized in that the method comprises:
defining a pushing action and a grabbing action of an industrial robot, and carrying out Markov modeling on the pushing action and the grabbing action;
acquiring a depth image, establishing a quadtree for the depth image, and acquiring the minimum distance between edge points of different objects in the depth image according to the quadtree;
and learning by using a deep reinforcement learning network, and rewarding the environment changed by the pushing action and the grabbing action of the industrial robot.
2. A control method for an industrial robot according to claim 1, characterized in that said defining the pushing and gripping actions of an industrial robot comprises:
setting a safety height of the industrial robot;
defining the pushing action as pushing in a preset direction within the safety height;
defining the grabbing action as grabbing at a preset position within the safety height.
3. A control method for an industrial robot according to claim 1, characterized in that establishing a quadtree for a depth image, from which a minimum distance between edge points of different objects in the depth image is obtained, comprises:
dividing the image into four areas according to a coordinate axis by taking the central point of the depth image as a coordinate origin;
traversing with the four vertices of the depth image as starting points, and setting the points of the external object-free connected region to a preset value;
obtaining edge points of different objects in the depth image according to the preset value;
and calculating the minimum distance between the edge points of different objects according to the edge points.
4. A control method for an industrial robot according to claim 3, characterized in that the mathematical expression for calculating the minimum distance between edge points of different objects from said edge points is:
d_min^(i,j) = min{ ||p - q|| : p ∈ N_i, q ∈ N_j }
where d_min^(i,j) is the minimum distance between the edge points of different objects, and N_i and N_j are the nodes of the quadtree of the depth image that contain the respective objects.
5. The control method for an industrial robot according to claim 1, wherein said deep reinforcement learning network comprises an action value network and an action strategy network, said action value network represents said pushing action using a feed-forward full convolution network, said grabbing action using an extended depth value network, said depth image is inputted to said feed-forward full convolution network and said extended depth value network; the action strategy network comprises an actor network and a critic network, wherein the actor network is used for selecting the best executed action, and the critic network scores the output of the actor network and is used for enabling the actor network to perform strategy gradient descent.
6. A control method for an industrial robot according to claim 1, characterized in that rewarding the gripping motion changing environment of the industrial robot comprises:
rewarding the environment changed by the grabbing action of the industrial robot after successfully grabbing an object, wherein the rewarding function is as follows:
R_g(s_t, s_{t+1}) = 1, if grasp succeeded; R_g(s_t, s_{t+1}) = 0, if grasp failed
where R_g(s_t, s_{t+1}) is the reward given to the industrial robot, "grasp succeeded" is the grabbing success condition, and "grasp failed" is the grabbing failure condition.
7. A control method of an industrial robot according to claim 1, characterized in that rewarding a push action change environment of the industrial robot comprises:
if the number of objects changes after the pushing action is executed, the reward function is:
R_p(s_t, s_{t+1}) = 1, if n_t < n_{t+1}
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, and n_t and n_{t+1} are the numbers of said objects before and after the action;
if the number of objects is not changed after the pushing action is executed, the reward function is as follows:
R_p(s_t, s_{t+1}) = -ReLU((L - d_{t+1}) / L), if any d^(i,j) < L and n_t == n_{t+1}
where R_p(s_t, s_{t+1}) is the reward given to said industrial robot, n_t and n_{t+1} are the numbers of said objects, ReLU is the rectified linear unit function, d_{t+1} is the minimum distance between the edge points of different objects, and L is the minimum grabbing distance.
8. A control arrangement of an industrial robot, characterized in that the arrangement comprises:
the system comprises a definition module, a data processing module and a data processing module, wherein the definition module is used for defining pushing action and grabbing action of the industrial robot and carrying out Markov modeling on the pushing action and the grabbing action;
the acquisition module is used for acquiring a depth image, establishing a quadtree for the depth image, and acquiring the minimum distance between edge points of different objects in the depth image according to the quadtree;
and the reward module is used for learning by using a deep reinforcement learning network and rewarding the environment changed by the pushing action and the grabbing action of the industrial robot.
9. A control arrangement of an industrial robot, characterized in that the arrangement comprises:
at least one processor;
at least one memory for storing at least one program;
wherein, when at least one of said programs is executed by at least one of said processors, the at least one processor implements a control method of an industrial robot according to any one of claims 1-7.
10. Storage medium, characterized in that the storage medium stores a processor-executable program which, when executed by a processor, implements a method of controlling an industrial robot according to any one of claims 1-7.
CN202210143375.8A 2022-02-16 2022-02-16 Control method and device for industrial robot and storage medium Active CN114474060B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210143375.8A CN114474060B (en) 2022-02-16 2022-02-16 Control method and device for industrial robot and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210143375.8A CN114474060B (en) 2022-02-16 2022-02-16 Control method and device for industrial robot and storage medium

Publications (2)

Publication Number Publication Date
CN114474060A true CN114474060A (en) 2022-05-13
CN114474060B CN114474060B (en) 2023-06-16

Family

ID=81483255

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210143375.8A Active CN114474060B (en) 2022-02-16 2022-02-16 Control method and device for industrial robot and storage medium

Country Status (1)

Country Link
CN (1) CN114474060B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103559500A (en) * 2013-10-15 2014-02-05 北京航空航天大学 Multispectral remote sensing image land feature classification method based on spectrum and textural features
CN110400345A (en) * 2019-07-24 2019-11-01 西南科技大学 Radioactive waste based on deeply study, which pushes away, grabs collaboration method for sorting
US20200061811A1 (en) * 2018-08-24 2020-02-27 Nvidia Corporation Robotic control system
CN110900601A (en) * 2019-11-15 2020-03-24 武汉理工大学 Robot operation autonomous control method for human-robot cooperation safety guarantee
CN111079561A (en) * 2019-11-26 2020-04-28 华南理工大学 Robot intelligent grabbing method based on virtual training
CN111644398A (en) * 2020-05-28 2020-09-11 华中科技大学 Push-grab cooperative sorting network based on double viewing angles and sorting method and system thereof
CN112102405A (en) * 2020-08-26 2020-12-18 东南大学 Robot stirring-grabbing combined method based on deep reinforcement learning
CN112643668A (en) * 2020-12-01 2021-04-13 浙江工业大学 Mechanical arm pushing and grabbing cooperation method suitable for intensive environment
CN112975977A (en) * 2021-03-05 2021-06-18 西北大学 Efficient mechanical arm grabbing depth reinforcement learning reward training method and system
CN113664828A (en) * 2021-08-17 2021-11-19 东南大学 Robot grabbing-throwing method based on deep reinforcement learning

Also Published As

Publication number Publication date
CN114474060B (en) 2023-06-16

Similar Documents

Publication Publication Date Title
Dasari et al. Robonet: Large-scale multi-robot learning
Bapst et al. Structured agents for physical construction
CN111203878B (en) Robot sequence task learning method based on visual simulation
Schenck et al. Learning robotic manipulation of granular media
Sieb et al. Graph-structured visual imitation
Bousmalis et al. Robocat: A self-improving foundation agent for robotic manipulation
Simeonov et al. A long horizon planning framework for manipulating rigid pointcloud objects
Chen et al. Combining reinforcement learning and rule-based method to manipulate objects in clutter
CN114387513A (en) Robot grabbing method and device, electronic equipment and storage medium
Teng et al. Multidimensional deformable object manipulation based on DN-transporter networks
Zhang et al. Safe and efficient robot manipulation: Task-oriented environment modeling and object pose estimation
Valencia et al. Toward real-time 3D shape tracking of deformable objects for robotic manipulation and shape control
Wang et al. Bulletarm: An open-source robotic manipulation benchmark and learning framework
Bousmalis et al. Robocat: A self-improving generalist agent for robotic manipulation
Ni et al. Learning an end-to-end spatial grasp generation and refinement algorithm from simulation
Zhang et al. Reinforcement learning based pushing and grasping objects from ungraspable poses
Akinola et al. Visionary: Vision architecture discovery for robot learning
CN114474060B (en) Control method and device for industrial robot and storage medium
CN116673968A (en) Mechanical arm track planning element selection method and system based on reinforcement learning
Golluccio et al. Objects relocation in clutter with robot manipulators via tree-based q-learning algorithm: Analysis and experiments
Leitner et al. Artificial neural networks for spatial perception: Towards visual object localisation in humanoid robots
Park et al. Scalable learned geometric feasibility for cooperative grasp and motion planning
Xiao et al. One-shot sim-to-real transfer policy for robotic assembly via reinforcement learning with visual demonstration
Fichtl et al. Bootstrapping relational affordances of object pairs using transfer
Ito et al. Visualization of focal cues for visuomotor coordination by gradient-based methods: A recurrent neural network shifts the attention depending on task requirements

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant