CN116619374A - Robot control method and device and robot - Google Patents

Robot control method and device and robot

Info

Publication number
CN116619374A
Authority
CN
China
Prior art keywords
image
target
point cloud
component
environment
Prior art date
Legal status
Pending
Application number
CN202310653567.8A
Other languages
Chinese (zh)
Inventor
王在进
张泽宇
刘航欣
Current Assignee
Beijing General Artificial Intelligence Research Institute
Original Assignee
Beijing General Artificial Intelligence Research Institute
Priority date
Filing date
Publication date
Application filed by Beijing General Artificial Intelligence Research Institute filed Critical Beijing General Artificial Intelligence Research Institute
Priority to CN202310653567.8A
Publication of CN116619374A

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1694: Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697: Vision controlled systems
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1602: Programme controls characterised by the control system, structure, architecture
    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1679: Programme controls characterised by the tasks executed
    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00: Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/02: Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)

Abstract

The application discloses a robot control method and device and a robot, belonging to the field of robots. The robot control method comprises: acquiring a target environment image of the environment where a robot is located; performing feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to a target object in the target environment image and a complete component point cloud image corresponding to a target component, the target component being a component of the target object; acquiring target environment understanding information of the robot about the environment based on the complete object point cloud image and the complete component point cloud image; and controlling the robot based on the target environment understanding information and a task to be executed by the robot. The method deepens the robot's understanding of its environment, remarkably improves control precision, enables task execution in complex environments, and generalizes well to complex tasks.

Description

Robot control method and device and robot
Technical Field
The application belongs to the field of robots, and in particular relates to a robot control method, a robot control device and a robot.
Background
With the development of industrial technology, robots are increasingly widely used in people's daily life and work, for example loading and unloading robots in industrial workshops, automatic bolt-locking robots, automatic vegetable-serving robots, and household intelligent robots. In the related art, robot control is mainly realized by arranging sensors on the robot to sense environmental information. Under such conventional sensing, however, the robot's ability to recognize and understand the environment is limited, so the robot can only repeatedly execute simple tasks, cannot be controlled in a refined manner, and cannot be applied to complex environments.
Disclosure of Invention
The present application aims to solve at least one of the technical problems existing in the prior art. Therefore, the application provides a robot control method, a robot control device and a robot, which remarkably improve control precision, enable the robot to execute tasks in complex environments, and generalize well to complex tasks.
In a first aspect, the present application provides a robot control method, the method comprising:
acquiring a target environment image of an environment where a robot is located;
performing feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to a target object in the target environment image and a complete component point cloud image corresponding to a target component, wherein the target component is a component in the target object;
acquiring target environment understanding information of the robot on the environment based on the complete object point cloud image and the complete component point cloud image, wherein the target environment understanding information comprises first information of each object in the environment and second information of each component in each object;
and controlling the robot based on the target environment understanding information and the task to be executed by the robot.
According to the robot control method, the object-level point cloud image and the component-level point cloud images are obtained directly by image segmentation of the target environment image, so that different components within the same object can be distinguished while the objects in the environment are effectively distinguished, and the specific structural information of a single object is effectively expressed; the target environment understanding information of the robot about the environment is established based on the connection relations between components, deepening the robot's understanding of the environment. On this basis, the robot is controlled to execute corresponding actions based on the target environment understanding information and the task, so that the robot can accurately operate a specific object and specific components within it; control precision is remarkably improved, task execution in complex environments is realized, and the method generalizes well to complex tasks.
According to an embodiment of the present application, the feature segmentation is performed on the target environment image to obtain a complete object point cloud image corresponding to a target object and a complete component point cloud image corresponding to a target component in the target environment image, including:
inputting the target environment image into an image segmentation network of a target model, and acquiring an object-level image block corresponding to the target object and a component-level image block corresponding to the target component output by the image segmentation network;
and respectively inputting the object-level image block and the component-level image block into a point cloud completion network of the target model, and acquiring the complete object point cloud image corresponding to the object-level image block and the complete component point cloud image corresponding to the component-level image block output by the point cloud completion network.
According to one embodiment of the present application, the inputting the target environment image into an image segmentation network of a target model, obtaining an object-level image block corresponding to the target object and a component-level image block corresponding to the target component output by the image segmentation network, includes:
inputting the target environment image to an object segmentation module of the image segmentation network, and acquiring the object-level image block output by the object segmentation module;
and inputting the target environment image or the object-level image block into a component segmentation module of the image segmentation network, and acquiring the component-level image block output by the component segmentation module.
According to one embodiment of the application, the target environment image comprises a color image and a depth image.
According to an embodiment of the present application, the inputting the object-level image block and the component-level image block to the point cloud completion network of the target model respectively, obtaining a complete object point cloud image corresponding to the object-level image block and a complete component point cloud image corresponding to the component-level image block output by the point cloud completion network includes:
extracting first point cloud information corresponding to the target object from the object-level image block, and extracting second point cloud information corresponding to the target component from the component-level image block;
predicting a first deviation based on the first point cloud information, and predicting a second deviation based on the second point cloud information;
and completing the first point cloud information based on the first deviation to obtain the complete object point cloud image, and completing the second point cloud information based on the second deviation to obtain the complete component point cloud image.
According to one embodiment of the present application, the obtaining, based on the complete object point cloud image and the complete component point cloud image, target environment understanding information of the robot on the environment includes:
establishing a first coordinate system corresponding to the target object by taking the centroid of the target object as the origin of the coordinate system, and acquiring a first position coordinate of the target object based on the complete object point cloud image and the first coordinate system;
establishing a second coordinate system corresponding to the target component by taking the centroid of the target component as the coordinate system origin, and acquiring a second position coordinate of the target component based on the complete component point cloud image and the second coordinate system;
establishing a relative positional relationship between the target component and the target object based on the first coordinate system and the second coordinate system;
and acquiring the target environment understanding information based on the first position coordinate, the second position coordinate and the relative position relation.
According to an embodiment of the present application, the controlling the robot based on the target environment understanding information and the task to be performed by the robot includes:
extracting interactable parts from a plurality of components included in the target object based on the target environment understanding information;
determining target pose information based on the task and the interactable part;
and controlling the robot based on the target pose information.
According to one embodiment of the present application, the acquiring a target environment image of an environment in which a robot is located includes:
acquiring a plurality of environment images corresponding to the environment under a plurality of acquisition angles; the plurality of acquisition angles are in one-to-one correspondence with the plurality of environment images, and the plurality of acquisition angles are different; the target environment image is any image in the plurality of environment images;
the obtaining, based on the complete object point cloud image and the complete component point cloud image, target environment understanding information of the robot on the environment includes:
determining first environment understanding information of the robot on the environment under the acquisition angle corresponding to the target environment image based on the complete object point cloud image and the complete component point cloud image corresponding to the target environment image in the plurality of environment images;
and fusing first environment understanding information of the robot to the environment under the plurality of acquisition angles to acquire the target environment understanding information.
In a second aspect, the present application provides a robot control device comprising:
the first processing module is used for acquiring a target environment image of the environment where the robot is located;
the second processing module is used for carrying out feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to a target object in the target environment image and a complete component point cloud image corresponding to a target component, wherein the target component is a component in the target object;
the third processing module is used for acquiring target environment understanding information of the robot on the environment based on the complete object point cloud image and the complete component point cloud image, wherein the target environment understanding information comprises first information of each object in the environment and second information of each component in each object;
and the fourth processing module is used for controlling the robot based on the target environment understanding information and the task to be executed by the robot.
According to the robot control device, the object-level point cloud image and the component-level point cloud images are obtained directly by image segmentation of the target environment image, so that different components within the same object can be distinguished while the objects in the environment are effectively distinguished, and the specific structural information of a single object is effectively expressed; the target environment understanding information of the robot about the environment is established based on the connection relations between components, deepening the robot's understanding of the environment. On this basis, the robot is controlled to execute corresponding actions based on the target environment understanding information and the task, so that the robot can accurately operate a specific object and specific components within it; control precision is remarkably improved, task execution in complex environments is realized, and the device generalizes well to complex tasks.
In a third aspect, the present application provides a robot comprising:
a robot body;
an image sensor provided to the robot body;
the robot control device according to the second aspect, wherein the robot control device is electrically connected to the robot body and the image sensor, respectively.
According to the robot disclosed by the application, the object-level point cloud image and the component-level point cloud images are obtained directly by image segmentation of the target environment image, so that different components within the same object can be distinguished while the objects in the environment are effectively distinguished, and the specific structural information of a single object is effectively expressed; the target environment understanding information of the robot about the environment is established based on the connection relations between components, deepening the robot's understanding of the environment. On this basis, the robot is controlled to execute corresponding actions based on the target environment understanding information and the task, so that the robot can accurately operate a specific object and specific components within it; control precision is remarkably improved, task execution in complex environments is realized, and the robot generalizes well to complex tasks.
In a fourth aspect, the present application provides a non-transitory computer readable storage medium having stored thereon a computer program which, when executed by a processor, implements a robot control method as described in the first aspect above.
In a fifth aspect, the present application provides a computer program product comprising a computer program which, when executed by a processor, implements a robot control method as described in the first aspect above.
The above technical solutions in the embodiments of the present application have at least one of the following technical effects:
the object-level point cloud image and the component-level point cloud images are obtained directly by image segmentation of the target environment image, so that different components within the same object can be distinguished while the objects in the environment are effectively distinguished, and the specific structural information of a single object is effectively expressed; the target environment understanding information of the robot about the environment is established based on the connection relations between components, deepening the robot's understanding of the environment. On this basis, the robot is controlled to execute corresponding actions based on the target environment understanding information and the task, so that the robot can accurately operate a specific object and specific components within it; control precision is remarkably improved, task execution in complex environments is realized, and the method generalizes well to complex tasks.
Further, by arranging an object segmentation module and a component segmentation module in the image segmentation network, with the object segmentation module performing instance segmentation and the component segmentation module performing component-level image segmentation, the object-level image block and the component-level image block can be acquired directly from the target environment image, so that both the segmentation precision and the segmentation efficiency are high.
Furthermore, on the basis of acquiring the object-level image block and the component-level image block with the image segmentation network, point cloud completion is performed on them with the point cloud completion network to acquire the complete object point cloud image and the complete component point cloud image, which remarkably improves the completeness of the finally acquired point cloud information, deepens the understanding of the environment subsequently established by the robot, and facilitates refined control of the robot.
Still further, by establishing a first coordinate system for representing the position information of an object with the centroid of the object as the origin, and a second coordinate system for representing the position information of a component with the centroid of that component as the origin, the established coordinate systems are related only to the geometric shape of the object or component and are independent of the specific position of the object in the environment and of the component in the object. This avoids the influence of position errors caused by movement of the object or component on the control accuracy of the robot, and further improves the rationality and stability of the established target environment understanding information, thereby improving the accuracy and precision of robot control.
Additional aspects and advantages of the application will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the application.
Drawings
The foregoing and/or additional aspects and advantages of the application will become apparent and may be better understood from the following description of embodiments taken in conjunction with the accompanying drawings in which:
fig. 1 is a first schematic flow chart of a robot control method according to an embodiment of the present application;
fig. 2 is a second schematic flow chart of a robot control method according to an embodiment of the present application;
fig. 3 is a third schematic flow chart of a robot control method according to an embodiment of the present application;
fig. 4 is a fourth schematic flow chart of a robot control method according to an embodiment of the present application;
fig. 5 is a schematic structural diagram of a robot control device according to an embodiment of the present application;
fig. 6 is a schematic structural diagram of a robot according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application will be described clearly below with reference to the drawings in the embodiments of the present application. It is apparent that the described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person skilled in the art based on the embodiments of the present application fall within the scope of protection of the present application.
The terms "first", "second" and the like in the description and claims are used to distinguish between similar objects and are not necessarily used to describe a particular order or sequence. It should be understood that the terms so used are interchangeable where appropriate, so that the embodiments of the present application can be implemented in orders other than those illustrated or described herein. Objects distinguished by "first", "second", etc. are generally of one type, and the number of objects is not limited; for example, the first object may be one or more. In addition, "and/or" in the description and claims denotes at least one of the connected objects, and the character "/" generally indicates an "or" relationship between the associated objects.
The robot control method, the robot control device, the robot and the readable storage medium provided by the embodiment of the application are described in detail below through specific embodiments and application scenes thereof with reference to the accompanying drawings.
The robot control method can be applied to the terminal, and can be specifically executed by hardware or software in the terminal.
The terminal includes, but is not limited to, a portable communication device such as a mobile phone or tablet computer. It should also be appreciated that in some embodiments, the terminal may not be a portable communication device, but rather a desktop computer or robot.
The execution subject of the robot control method provided by the embodiments of the present application may be a robot, or a functional module or functional entity in the robot capable of implementing the robot control method. The method is described below with the robot as the execution subject.
It should be noted that the robot control method of the present application may be applied to any intelligent robot scene, such as an industrial robot scene or a home robot scene.
As shown in fig. 1, the robot control method includes: step 110, step 120, step 130 and step 140.
Step 110, acquiring a target environment image of the environment where the robot is located;
In this step, the target environment image is an environment image acquired by the robot at a target acquisition angle.
The target environment image is an image for characterizing color information and depth information, and may be an RGB-D image, for example.
In the actual execution process, an image sensor can be used to collect the image; the collected target environment image is stored in a local or cloud database and retrieved when needed later.
The image sensor may include a color camera, a depth camera, a 3D camera, an RGB-D camera, and the like, among others.
The number of target environment images may be one or more; when there are multiple images, at least some of them correspond to different acquisition angles.
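As an illustration of this step, the following is a minimal Python sketch of loading a color image and an aligned depth image from disk as one target environment image. The use of OpenCV, the file names, and the depth scale of 1000 are assumptions for a typical RGB-D export, not details taken from the application.

```python
import cv2
import numpy as np

def load_rgbd(color_path: str, depth_path: str, depth_scale: float = 1000.0):
    """Load a color image and its aligned depth image as one RGB-D sample.

    depth_scale converts raw depth units (e.g. millimetres) to metres; the
    value 1000.0 is an assumption for a typical depth camera.
    """
    color = cv2.cvtColor(cv2.imread(color_path, cv2.IMREAD_COLOR), cv2.COLOR_BGR2RGB)
    depth_raw = cv2.imread(depth_path, cv2.IMREAD_UNCHANGED)  # 16-bit depth map
    depth = depth_raw.astype(np.float32) / depth_scale        # metres
    return color, depth

# Example: one target environment image captured at one acquisition angle
# (hypothetical file names):
# color, depth = load_rgbd("frame_000000.color.jpg", "frame_000000.depth.png")
```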
Step 120, performing feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to the target object in the target environment image and a complete component point cloud image corresponding to the target component, wherein the target component is a component in the target object;
In this step, the target object may be any object included in the target environment image.
For example, in the case where the target environment image is an in-plant environment image, the target object may be a table, a chair, a machine, a wall, a floor, a lamp, a person, or the like in the plant.
The target object may be one, more or all of all objects included in the target environment image.
The target component may be any component in the target object.
For example, in the case where the target object is a table, the target component may be a single component such as a leg and a table top.
The target component may be one, more or all of all components included in the target object.
In some embodiments, obtaining the complete object point cloud image corresponding to the target object and the complete component point cloud image corresponding to the target component in the target environment image may include: and obtaining a complete object point cloud image corresponding to at least part of the objects in the target environment image and a complete component point cloud image corresponding to at least part of the components in each object.
In some embodiments, obtaining the complete object point cloud image corresponding to the target object and the complete component point cloud image corresponding to the target component in the target environment image may include: and obtaining a complete object point cloud image corresponding to all the objects in the target environment image and a complete component point cloud image corresponding to all the components in each object.
The complete object point cloud image is the full point cloud corresponding to the complete region of the target object.
The complete component point cloud image is the full point cloud corresponding to the complete region of the target component.
The point cloud images can represent the distances from the robot to each part of the target object and the target component.
In the actual execution process, any image segmentation algorithm, such as threshold segmentation, edge segmentation, region segmentation or morphological segmentation, can be adopted for the image segmentation processing to obtain the complete object point cloud image and the complete component point cloud image.
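For illustration only, the sketch below shows threshold segmentation, one of the conventional alternatives listed above, using OpenCV; the fixed threshold value is an assumption, and the learned segmentation network described later is what the application actually relies on.

```python
import cv2
import numpy as np

def threshold_segment(gray: np.ndarray, thresh: int = 127) -> np.ndarray:
    """Binary threshold segmentation: returns a 0/255 mask of foreground pixels.

    The fixed threshold of 127 is only an illustrative assumption; Otsu's
    method (cv2.THRESH_OTSU) can pick the threshold automatically.
    """
    _, mask = cv2.threshold(gray, thresh, 255, cv2.THRESH_BINARY)
    return mask
```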
In other embodiments, the target environment image may be subjected to preliminary feature extraction, and point cloud complement based on the extracted features, so as to obtain a complete object point cloud image and a complete component point cloud image.
The inventors found that, in the field of robot control, an image segmentation network is mostly used to segment individual objects from the acquired image information so as to obtain pose information of the object as a whole, and the robot is then controlled to execute corresponding operations based on that pose information. However, this approach cannot effectively express the specific structural information of a single object, such as the composition of its components and the connection relations between them, so refined control of the robot over a specific component cannot be achieved and control precision is low.
In the present application, by combining instance segmentation and component-level segmentation, the object-level point cloud image and the component-level point cloud images corresponding to each component of the object are obtained. Different components within the same object can thus be distinguished while the objects in the environment are effectively distinguished, and the specific structural information of a single object, such as the composition of its components and the connection relations between them, is effectively expressed. This facilitates refined control of the robot over specific components, improves control precision, and enables the robot to execute tasks, including complex tasks, in complex environments.
In some embodiments, step 120 may include: and inputting the target environment image into a target model, and obtaining a complete object point cloud image and a complete component point cloud image which are output by the target model.
In this embodiment, the target model may be a pre-trained neural network model.
In the actual execution process, the target environment image only needs to be input into the target model; the target model performs object-level and component-level image segmentation on the target environment image and directly outputs the complete object point cloud image and the complete component point cloud image.
The target model is obtained by taking a sample environment image as a sample, taking a sample object point cloud image corresponding to a sample object included in the sample environment image and a sample component point cloud image corresponding to a sample component in the sample object as sample labels, and training.
Wherein the sample object point cloud image is an image obtained based on semantic/instance segmentation.
The sample component point cloud image is an image obtained by further performing component segmentation on the basis of the semantic/instance segmentation.
In some embodiments, the training samples may be built from datasets such as ScanNet, PartNet and Scan2CAD.
For example, a correspondence between ScanNet and PartNet is first established through the Scan2CAD dataset, and object models containing component information in PartNet are imported into ScanNet. The semantic segmentation region (i.e. the sample object point cloud image) and the component segmentation region (i.e. the sample component point cloud image) of each object in the 2D images of ScanNet are then obtained by back-projection, the depth images provided by ScanNet are aligned to the color images, and the intrinsic parameters of the image sensor are stored into the dataset together, thereby obtaining the training samples.
In some embodiments, the serialized 2D images and depth images in ScanNet may be sampled once every target number of frames, for example every 3, 5 or 10 frames; the application is not limited in this respect.
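A minimal sketch of this frame-sampling step is given below; the directory layout and the frame_XXXXXX.color.jpg / frame_XXXXXX.depth.png naming are assumptions about the export format, not specified by the application.

```python
from pathlib import Path

def sample_frames(scan_dir: str, every_n: int = 5):
    """Yield (color, depth) file pairs, keeping one frame out of every `every_n`.

    Assumes frames are exported as frame_XXXXXX.color.jpg / frame_XXXXXX.depth.png;
    adapt the naming to the actual export format.
    """
    colors = sorted(Path(scan_dir).glob("frame_*.color.jpg"))
    for i, color_path in enumerate(colors):
        if i % every_n != 0:
            continue
        depth_path = color_path.with_name(color_path.name.replace(".color.jpg", ".depth.png"))
        if depth_path.exists():
            yield color_path, depth_path
```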
The specific architecture and training of the object model will be described below and will not be described here.
According to the robot control method provided by the embodiment of the application, the pre-trained target model is used for carrying out image segmentation on the target environment image so as to simultaneously obtain the complete object point cloud image and the complete component point cloud image, so that the extracted characteristics are finer, and the accuracy and the precision of the extraction result are improved.
As shown in fig. 2, in some embodiments, step 120 may include:
inputting the target environment image into an image segmentation network of a target model, and acquiring an object-level image block corresponding to a target object and a component-level image block corresponding to a target component output by the image segmentation network;
and respectively inputting the object-level image block and the component-level image block into a point cloud completion network of the target model, and acquiring the complete object point cloud image corresponding to the object-level image block and the complete component point cloud image corresponding to the component-level image block output by the point cloud completion network.
In this embodiment, the object model may include: an image segmentation network and a point cloud completion network.
The output of the image segmentation network is connected with the input of the point cloud completion network.
The image segmentation network is used for performing object-level image segmentation to obtain object-level image blocks and is also used for performing component-level image segmentation to obtain component-level image blocks.
The point cloud completion network is used for carrying out point cloud completion on the image obtained by the image segmentation network.
It will be appreciated that the object-level image block and the component-level image block output by the image segmentation network may correspond to complete point clouds or incomplete point clouds, as shown in the input of fig. 4. In the latter case, point cloud completion needs to be performed on the object-level image block and/or the component-level image block to obtain the complete object point cloud image and/or the complete component point cloud image, as shown in the output of fig. 4.
The input of the point cloud completion network may have multiple channels.
In some embodiments, the input channels of the point cloud completion network may be three channels, corresponding to the x, y, z coordinates of the point cloud, respectively.
In other embodiments, the input channels of the point cloud completion network may be six channels, corresponding to x, y, z coordinates and r, g, b color information of the point cloud, respectively.
By introducing the color channels, different components within the same object can be further distinguished; this also helps to distinguish planar objects from three-dimensional objects and improves the accuracy of the subsequent completion result.
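The six-channel input can be sketched as follows; scaling the colors to [0, 1] is an assumption.

```python
import numpy as np

def make_six_channel_input(xyz: np.ndarray, rgb: np.ndarray) -> np.ndarray:
    """Stack point coordinates and colors into an (N, 6) array: x, y, z, r, g, b.

    xyz: (N, 3) point coordinates in metres.
    rgb: (N, 3) colors in 0-255, scaled here to [0, 1] (an assumption).
    """
    assert xyz.shape == rgb.shape and xyz.shape[1] == 3
    return np.concatenate([xyz, rgb.astype(np.float32) / 255.0], axis=1)
```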
In the application, the point cloud of an incomplete object is completed, which avoids collisions and similar problems caused by incomplete object point clouds during subsequent robot control, prolongs the service life of the robot, and reduces potential safety hazards.
According to the robot control method provided by the embodiment of the application, on the basis of acquiring the object-level image block and the component-level image block with the image segmentation network, point cloud completion is further performed on them with the point cloud completion network to acquire the complete object point cloud image and the complete component point cloud image. This remarkably improves the completeness of the finally acquired point cloud information, deepens the robot's understanding of the environment, and facilitates refined control of the robot.
In some embodiments, the target model is obtained by taking a sample environment image as a sample, taking a sample object point cloud image corresponding to a sample object included in the sample environment image and a sample component point cloud image corresponding to a sample component in the sample object as a sample label, and training.
In this embodiment, the sample object point cloud image and the sample component point cloud image may be real object point cloud images and component point cloud images corresponding to the sample environment image.
The specific acquisition method of the training sample is described in the above embodiments, and is not described herein.
In some embodiments, the target environment image may include a color image and a depth image.
In some embodiments, the image segmentation network may include a plurality of cascaded U-Nets, each U-Net being trained with a multi-class cross-entropy loss function and optimized with Adam.
In this embodiment, the U-Net is a CNN-based image segmentation network whose hidden layers have relatively many feature dimensions, which helps the model learn more varied and comprehensive features. In addition, the U-shaped structure of the U-Net makes the cropping and concatenation process more intuitive and reasonable, so accurate results can be obtained with fewer training samples.
During training, a multi-class cross-entropy loss function may be selected as the loss function, and Adam may be selected as the optimization method.
In some embodiments, the target environment image may be an RGB-D image, wherein the color image includes r, g, b color information; the depth image includes depth information.
As shown in fig. 3, in the actual implementation, 4 cascaded U-Nets may be used, where the input of the first U-Net is a 4-channel RGB-D image (corresponding to the r, g, b and depth channels, respectively) and the output of the last U-Net comprises the object-level image block and the component-level image block.
According to the robot control method provided by the embodiment of the application, setting the image segmentation network as a plurality of cascaded U-Nets, training it with a multi-class cross-entropy loss function and optimizing it with Adam helps the model learn more varied and comprehensive features, and accurate results can be obtained with fewer training samples.
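The following is a minimal PyTorch sketch of training cascaded U-Nets with a multi-class cross-entropy loss and Adam, as described above. The tiny U-Net definition, the cascade wiring (each later stage refining the previous stage's logits), the channel widths and the number of classes are illustrative assumptions rather than the application's actual architecture.

```python
import torch
import torch.nn as nn

class TinyUNet(nn.Module):
    """A deliberately small U-Net: one down step, one up step, one skip connection."""
    def __init__(self, in_ch: int, out_ch: int, width: int = 32):
        super().__init__()
        self.enc = nn.Sequential(nn.Conv2d(in_ch, width, 3, padding=1), nn.ReLU())
        self.down = nn.Sequential(nn.MaxPool2d(2),
                                  nn.Conv2d(width, width * 2, 3, padding=1), nn.ReLU())
        self.up = nn.ConvTranspose2d(width * 2, width, 2, stride=2)
        self.dec = nn.Sequential(nn.Conv2d(width * 2, width, 3, padding=1), nn.ReLU(),
                                 nn.Conv2d(width, out_ch, 1))

    def forward(self, x):
        e = self.enc(x)                      # skip features at input resolution
        d = self.up(self.down(e))            # encode, then decode back to input size
        return self.dec(torch.cat([e, d], dim=1))

# Four cascaded U-Nets: the first takes the 4-channel RGB-D image, later stages
# refine the previous stage's logits (an assumed cascade wiring).
num_classes = 20                             # assumed number of object/part labels
nets = nn.ModuleList(
    [TinyUNet(4, num_classes)] + [TinyUNet(num_classes, num_classes) for _ in range(3)])

criterion = nn.CrossEntropyLoss()            # multi-class cross-entropy loss
optimizer = torch.optim.Adam(nets.parameters(), lr=1e-4)

rgbd = torch.randn(2, 4, 128, 128)           # dummy batch: r, g, b, depth channels
labels = torch.randint(0, num_classes, (2, 128, 128))

logits = rgbd
for net in nets:                             # pass through the cascade
    logits = net(logits)
loss = criterion(logits, labels)             # supervise the last stage (assumption)
optimizer.zero_grad()
loss.backward()
optimizer.step()
```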
With continued reference to fig. 2, in some embodiments, inputting the target environment image into the image segmentation network of the target model, obtaining the object-level image block corresponding to the target object and the component-level image block corresponding to the target component output by the image segmentation network may include:
inputting the target environment image into an object segmentation module of the image segmentation network, and acquiring the object-level image block output by the object segmentation module;
and inputting the target environment image or the object-level image block into a component segmentation module of the image segmentation network, and acquiring the component-level image block output by the component segmentation module.
In this embodiment, the image segmentation network may include an object segmentation module and a component segmentation module.
The object segmentation module is used for object-level image segmentation, namely instance segmentation.
The component segmentation module is used for performing component-level image segmentation.
In some embodiments, an input of the component segmentation module may be connected to an output of the object segmentation module for component-level segmentation of the object-level image block output by the object segmentation module to obtain a component-level image block.
In other embodiments, the object segmentation module and the component segmentation module may also be performed synchronously, such as inputting the target environment image to the object segmentation module and the component segmentation module, respectively, to obtain the object-level image block and the component-level image block.
During training, the image segmentation network can be trained module by module or as a whole; the most suitable training scheme can be selected according to actual requirements, and the application is not limited in this respect.
For example, in the actual implementation process, a 3D camera may first collect the target environment image to obtain a corresponding RGB-D image, and the RGB-D image is sent to the image segmentation network to obtain the pixel-level region of each object, the label of each object, the region of each component in the object, and so on.
Then, referring to the region of each component of the segmented object in the depth image and combining the camera intrinsics, the point cloud of the object is recovered from the depth image; combining the RGB image, colors are assigned to the points to obtain a colored point cloud (i.e. the object-level image block and the component-level image blocks).
The colored point cloud, which may be incomplete, is sent into the point cloud completion network to complete the defective object into a complete object and acquire the complete object point cloud image and the complete component point cloud image, which facilitates subsequent robot manipulation and motion planning and avoids collisions caused by incomplete object point clouds.
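A minimal sketch of the back-projection step described above, recovering a colored point cloud for one segmented region from the depth image and the camera intrinsics; the intrinsic values in the usage comment are placeholders, not values from the application.

```python
import numpy as np

def backproject_masked(depth: np.ndarray, color: np.ndarray, mask: np.ndarray,
                       fx: float, fy: float, cx: float, cy: float) -> np.ndarray:
    """Back-project masked pixels to a colored point cloud (N, 6): x, y, z, r, g, b.

    depth: (H, W) in metres; color: (H, W, 3) in 0-255; mask: (H, W) boolean
    region of one object or component from the segmentation network.
    """
    v, u = np.nonzero(mask & (depth > 0))          # valid pixel coordinates
    z = depth[v, u]
    x = (u - cx) * z / fx                          # pinhole camera model
    y = (v - cy) * z / fy
    rgb = color[v, u].astype(np.float32) / 255.0
    return np.concatenate([np.stack([x, y, z], axis=1), rgb], axis=1)

# Placeholder intrinsics; in practice they are read from the dataset or camera:
# cloud = backproject_masked(depth, color, obj_mask, fx=577.0, fy=577.0, cx=319.5, cy=239.5)
```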
It should be noted that the image segmentation network provided by the embodiment of the present application can complete pixel-level instance segmentation and component segmentation of objects in a single pass, and outputs an instance category and a label for each component of every instance.
According to the robot control method provided by the embodiment of the application, by arranging the object segmentation module and the component segmentation module in the image segmentation network, with the object segmentation module performing instance segmentation and the component segmentation module performing component-level image segmentation, the semantic/instance segmentation and component segmentation of the RGB-D image can be completed by the neural network in one pass, and the object-level image block and the component-level image block can be acquired directly from the target environment image, so that both the segmentation precision and the segmentation efficiency are high.
In some embodiments, inputting the object-level image block and the component-level image block to the point cloud completion network of the target model, respectively, and obtaining a complete object point cloud image corresponding to the object-level image block and a complete component point cloud image corresponding to the component-level image block output by the point cloud completion network may include:
extracting first point cloud information corresponding to the target object from the object-level image block, and extracting second point cloud information corresponding to the target component from the component-level image block;
predicting a first deviation based on the first point cloud information, and predicting a second deviation based on the second point cloud information;
and completing the first point cloud information based on the first deviation to acquire the complete object point cloud image, and completing the second point cloud information based on the second deviation to acquire the complete component point cloud image.
In this embodiment, the object-level image block and the component-level image block each comprise a grayscale image, and in some embodiments, the object-level image block and the component-level image block may also comprise color images.
The first point cloud information and the second point cloud information are the point clouds directly extracted from the object-level image block and the component-level image block, respectively.
The first deviation is the offset corresponding to each point in the target object, which is predicted based on the input first point cloud information.
The offset may represent a distance from other points near the input point within the target object to the input point.
The second deviation is the offset corresponding to each point in the target component, which is predicted based on the input second point cloud information.
The offset may represent a distance from the input point to other points within the target component near the input point.
In the actual execution process, the offset corresponding to each point is first predicted, and the predicted offset is then added to that point, so that the final complete point cloud information is obtained.
Continuing with the example in which the image segmentation network is a plurality of cascaded U-Nets, as shown in fig. 3, the RGB-D image can be input into the cascaded U-Nets, and the last-stage U-Net outputs two streams of data: one is the object-level image block and the other is the component-level image block.
And then the object-level image block and the component-level image block are respectively input into the point cloud completion network.
As shown in fig. 4, the x, y, z coordinates and the r, g, b color information of the point cloud are input into the six channels of the point cloud completion network, the point cloud is downsampled or upsampled to 2048 points and regularized, position information is embedded after processing by a multi-layer perceptron (MLP), the result is fed into a Transformer encoder and decoder to predict the offset of each point, and the offset of each point is then added to the coordinates of the input point cloud to obtain the final output point cloud, that is, the complete object point cloud image and the complete component point cloud image.
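A minimal PyTorch sketch of the completion step described above, assuming the point cloud has already been resampled to 2048 points: the six-channel input is normalized, embedded by a per-point MLP with a learned positional embedding, passed through a Transformer (an encoder-only stack is used here for brevity in place of the full encoder-decoder), and a per-point offset is predicted and added to the input coordinates. The layer sizes are illustrative assumptions.

```python
import torch
import torch.nn as nn

class OffsetCompletion(nn.Module):
    """Predict a per-point offset and add it to the input coordinates."""
    def __init__(self, d_model: int = 256, n_points: int = 2048):
        super().__init__()
        self.embed = nn.Sequential(nn.Linear(6, d_model), nn.ReLU(),
                                   nn.Linear(d_model, d_model))          # per-point MLP
        self.pos = nn.Parameter(torch.zeros(1, n_points, d_model))       # learned position info
        layer = nn.TransformerEncoderLayer(d_model, nhead=8, batch_first=True)
        self.encoder = nn.TransformerEncoder(layer, num_layers=4)
        self.offset_head = nn.Linear(d_model, 3)                         # per-point offset

    def forward(self, pts):                  # pts: (B, 2048, 6) = x, y, z, r, g, b
        center = pts[:, :, :3].mean(dim=1, keepdim=True)
        centered = pts[:, :, :3] - center
        scale = centered.flatten(1).abs().max(dim=1, keepdim=True)[0].view(-1, 1, 1) + 1e-6
        xyz_n = centered / scale                                         # simple normalization
        feats = self.embed(torch.cat([xyz_n, pts[:, :, 3:]], dim=2)) + self.pos
        offsets = self.offset_head(self.encoder(feats)) * scale          # back to metric units
        return pts[:, :, :3] + offsets                                   # completed coordinates
```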
It should be noted that with the complete point cloud image obtained after point cloud completion by the method provided by the application, the same complete point cloud image of a target object can be obtained regardless of the acquisition view angle of the image sensor.
According to the robot control method provided by the embodiment of the application, the offset is predicted based on the input point cloud and point cloud completion is performed based on the offset; the prediction accuracy is higher, the completion effect is better, the method is not limited by the acquisition angle of the image sensor, and the universality is higher.
Step 130, acquiring target environment understanding information of the robot on the environment based on the complete object point cloud image and the complete component point cloud image, wherein the target environment understanding information comprises first information of each object in the environment and second information of each component in each object;
In this step, the target environment understanding information is used to assist the robot in deeply perceiving the environment.
The target environment information may include first information for each object in the environment and second information for each component in each object.
The first information may include a category of objects included in the environment, the number of objects of each category, a position of each object in the environment, a relative positional relationship between each object and the robot, and the like.
The second information may include the type of the parts included in the same object, the number of parts of each class, the relative positional relationship of the parts in the object, the relative positional relationship between the parts, the mutual positional relationship between the parts and the robot, and the like.
The target environment understanding information may be a URDF file; URDF is a file format for describing robot structure.
After the complete object point cloud images and the complete component point cloud images are obtained, the point cloud information corresponding to each object and each component can be extracted from them, and the target environment understanding information of the robot about the environment is then established based on the extracted point cloud information.
The obtained target environment understanding information can be stored in a local or cloud database so as to be called up when needed later.
In some embodiments, step 130 may include:
establishing a first coordinate system corresponding to the target object by taking the centroid of the target object as the origin of the coordinate system, and acquiring a first position coordinate of the target object based on the complete object point cloud image and the first coordinate system;
establishing a second coordinate system corresponding to the target component by taking the centroid of the target component as the origin of the coordinate system, and acquiring a second position coordinate of the target component based on the complete component point cloud image and the second coordinate system;
establishing a relative positional relationship between the target component and the target object based on the first coordinate system and the second coordinate system;
and acquiring target environment understanding information based on the first position coordinate, the second position coordinate and the relative position relation.
In this embodiment, the origin of the coordinate system of the first coordinate system is the centroid of the target object.
The first coordinate system is related only to the geometry of the target object itself, irrespective of the spatial position in which the target object is located.
It will be appreciated that, as long as the geometry of the target object does not change, the position and attitude of the first coordinate system relative to the various components of the target object remain fixed.
For example, according to the completed point cloud, principal component analysis (PCA) may be used to obtain the pose information of the target object: a first coordinate system is constructed with the centroid of the target object as the origin of the coordinate system, and the coordinates of each point on the target object along the x, y and z axes of the first coordinate system are obtained.
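A minimal NumPy sketch of this PCA step: the centroid of the completed point cloud becomes the coordinate-system origin and the principal axes become the coordinate axes, so the resulting frame depends only on the object's own geometry.

```python
import numpy as np

def pca_frame(points: np.ndarray):
    """Build a coordinate system from an (N, 3) completed point cloud.

    Returns the centroid (coordinate-system origin), a 3x3 matrix whose columns
    are the principal axes, and the points expressed in that frame.
    """
    centroid = points.mean(axis=0)
    centered = points - centroid
    cov = np.cov(centered.T)                         # 3x3 covariance matrix
    eigvals, eigvecs = np.linalg.eigh(cov)           # eigenvalues in ascending order
    axes = eigvecs[:, ::-1]                          # principal axes, largest variance first
    local = centered @ axes                          # coordinates in the object frame
    return centroid, axes, local
```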
The origin of coordinates of the second coordinate system is the centroid of the target component.
The second coordinate system is related only to the geometry of the target component itself, irrespective of the spatial position in which the target component is located.
The relative positional relationship describes the positional relationships between objects in the environment and between objects and their components.
The relative positional relationships may be determined from the first coordinate systems of different objects, or from the first coordinate system of an object and the second coordinate systems of the components within that object.
For example, the relative positional relationships between objects can be established from the first coordinate systems of the objects in the environment, and the relative positional relationship between each second coordinate system and its corresponding first coordinate system can be established. The point clouds of the objects and of the components within them are then combined, the robot's understanding of the environment is established based on the first position coordinates, the second position coordinates and the relative positional relationships, and a URDF file is generated, thereby obtaining the target environment understanding information.
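A minimal sketch of emitting such a URDF-style description, in which each object becomes a link, each component becomes a child link, and a fixed joint records the component-frame origin expressed in the object frame; the link naming scheme and the use of fixed joints are assumptions.

```python
def to_urdf(objects) -> str:
    """objects: list of dicts like
       {"name": "table_0", "parts": [{"name": "leg_0", "xyz": (0.4, 0.3, -0.35)}, ...]}
    where xyz is the component-frame origin expressed in the object frame."""
    lines = ['<robot name="environment">']
    for obj in objects:
        lines.append(f'  <link name="{obj["name"]}"/>')
        for part in obj["parts"]:
            x, y, z = part["xyz"]
            child = f'{obj["name"]}_{part["name"]}'
            lines.append(f'  <link name="{child}"/>')
            lines.append(f'  <joint name="{child}_joint" type="fixed">')
            lines.append(f'    <parent link="{obj["name"]}"/>')
            lines.append(f'    <child link="{child}"/>')
            lines.append(f'    <origin xyz="{x} {y} {z}" rpy="0 0 0"/>')
            lines.append('  </joint>')
    lines.append('</robot>')
    return "\n".join(lines)
```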
The inventors also found during development that, in the related art, a coordinate system is mainly established by treating the object as a whole in order to acquire the position coordinates of its components and thus its pose. With this construction, the position coordinates of the components change correspondingly as the object or a component moves or rotates, which introduces displacement errors, affects the robot's accurate understanding of the environmental information, and in turn affects the precision and accuracy of robot control.
In the present application, a first coordinate system representing the position information of an object is established with the centroid of the object as the origin, and a second coordinate system representing the position information of a component is established with the centroid of that component as the origin, so that the robot understands the environment in finer detail. The established coordinate systems are related only to the geometric shape of the object or component and are independent of the specific position of the object in the environment and of the component in the object. This avoids the influence of position errors caused by movement of the object or component on the control accuracy of the robot, makes the method applicable to arbitrarily moving objects and components and to arbitrarily symmetric components, broadens the application scenarios, and further improves the rationality and universality of the established target environment understanding information as well as the precision and accuracy of robot control.
According to the robot control method provided by the embodiment of the application, by establishing a first coordinate system for representing the position information of an object with the centroid of the object as the origin, and a second coordinate system for representing the position information of a component with the centroid of that component as the origin, the established coordinate systems are related only to the geometric shape of the object or component and are independent of the specific position of the object in the environment and of the component in the object. This avoids the influence of position errors caused by movement of the object or component on the control accuracy of the robot, further improves the rationality and stability of the established target environment understanding information, and thereby improves the accuracy and precision of robot control.
Step 140, controlling the robot based on the target environment understanding information and the task to be executed by the robot.
In this step, the tasks to be performed may include, but are not limited to: grabbing and placing objects, pushing and pulling objects, opening and closing doors, robot navigation, and the like.
In the actual execution process, the object in the environment that the robot is to interact with can be determined based on the task, the position information of the object and of each of its components is then acquired from the established target environment understanding information, and the robot is controlled to execute the task based on this position information.
In the present application, image segmentation of the target environment image by instance segmentation and component segmentation directly yields the object-level point cloud images and the component-level point cloud images, segmenting the objects in the environment more finely. Different components within the same object can be distinguished while the objects in the environment are effectively distinguished, so the specific structural information of a single object, such as the composition of its components and the connection relations between them, is effectively expressed; the robot can thus understand the environment in finer detail, and its perception of the environment is enhanced so as to construct in-depth target environment understanding information. The robot is then controlled to execute corresponding actions based on the target environment understanding information and the task, achieving refined control of the robot over specific components, improving control precision, and enabling the robot to execute tasks, including complex tasks, in complex environments.
According to the robot control method provided by the embodiment of the application, the object-level point cloud image and the component-level point cloud image are obtained directly by image segmentation of the target environment image, so that different components within the same object can be further distinguished while objects in the environment are effectively distinguished, and the specific structural information of a single object is effectively expressed. The target environment understanding information of the robot on the environment is established based on the connection relationships between components, which enhances the robot's understanding of the environment. On this basis, the robot is controlled to execute corresponding actions based on the target environment understanding information and the task, so that it can precisely operate on a specific object and on specific components within that object. Control precision is thereby significantly improved, task execution in complex environments is realized, and the method has high universality for the execution of complex tasks.
In some embodiments, step 140 may comprise:
extracting interactable parts from a plurality of components included in the target object based on the target environment understanding information;
determining target pose information based on the task and the interactable part;
based on the target pose information, the robot is controlled.
In this embodiment, the target object may be any object in the environment.
For example, in actual execution, the interactable part of an object can be extracted according to the category of the object, the category of each component in the object, and the object and component point clouds recorded in the URDF file, and the target pose information suitable for robot interaction is calculated in combination with the specific task, thereby providing support for the robot to grasp and place objects, push and pull objects, open and close doors, navigate, and the like.
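For illustration only, the following Python sketch shows one possible way to select interactable parts for a task and derive a rough target pose from their point clouds; the scene dictionary, the task-to-category table, and the approach-direction heuristic are all hypothetical and not taken from the original disclosure.

```python
import numpy as np

# Hypothetical environment description distilled from a URDF-style file:
# each object maps to its components, and each component carries its
# category and completed point cloud expressed in the world frame.
SCENE = {
    "cabinet_0": {
        "handle_0": {"category": "handle", "points": np.random.rand(256, 3)},
        "door_0":   {"category": "door",   "points": np.random.rand(2048, 3)},
    },
}

# Hypothetical mapping from task type to component categories that can be
# interacted with for that task.
INTERACTABLE_FOR_TASK = {"open_door": {"handle"}, "grasp": {"handle", "knob"}}

def extract_interactable_parts(scene, task):
    allowed = INTERACTABLE_FOR_TASK[task]
    return [(obj, comp, info) for obj, comps in scene.items()
            for comp, info in comps.items() if info["category"] in allowed]

def target_pose(points: np.ndarray):
    """A very rough pose: approach the component centroid along -z."""
    position = points.mean(axis=0)
    approach = np.array([0.0, 0.0, -1.0])
    return position, approach

for obj, comp, info in extract_interactable_parts(SCENE, "open_door"):
    pos, approach = target_pose(info["points"])
    # A real controller would convert (pos, approach) into joint commands here.
    print(obj, comp, pos, approach)
```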
According to the robot control method provided by the embodiment of the application, all interactable parts in the environment are extracted from the constructed target environment understanding information, so that in actual execution the target pose information of the robot can be conveniently determined from the interactable part selected for the task, and the robot is then controlled to execute the corresponding task based on this target pose information. Refined operation of the robot on objects is thereby realized, with high control precision and high control efficiency.
In some embodiments, step 110 may further comprise:
acquiring a plurality of environment images corresponding to the environment under a plurality of acquisition angles; the plurality of acquisition angles are in one-to-one correspondence with the plurality of environment images, and the plurality of acquisition angles are different; the target environment image is any image in a plurality of environment images;
Step 130 may further include:
determining first environment understanding information of the robot on the environment under the acquisition angle corresponding to the target environment image based on the complete object point cloud image and the complete component point cloud image corresponding to the target environment image in the plurality of environment images;
and fusing first environment understanding information of the robot to the environment under a plurality of acquisition angles to acquire target environment understanding information.
With continued reference to fig. 2, in actual implementation, environment images at different acquisition angles can be acquired in the manner of step 110 by shooting from multiple acquisition angles and positions; a complete object point cloud image and a complete component point cloud image corresponding to each environment image are then obtained from that environment image in the manner of step 120; first environment understanding information of the robot on the environment at the corresponding acquisition angle is obtained in the manner of step 130 based on the complete object point cloud image and the complete component point cloud image of each environment image; and finally the first environment understanding information of the multiple acquisition angles is fused to obtain the final target environment understanding information, which refines the URDF file and avoids missing objects in the environment.
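As a purely illustrative sketch (assumed data layout, not the original disclosure), the following Python code fuses per-view environment descriptions, where each view maps object IDs to per-component point clouds already expressed in a common world frame:

```python
import numpy as np

def fuse_views(views):
    fused = {}
    for view in views:
        for obj_id, components in view.items():
            fused_obj = fused.setdefault(obj_id, {})
            for comp_id, points in components.items():
                if comp_id in fused_obj:
                    # Merge observations of the same component seen from
                    # different acquisition angles.
                    fused_obj[comp_id] = np.vstack([fused_obj[comp_id], points])
                else:
                    # A component missed in earlier views is added here, which
                    # is how fusion avoids missing objects or parts.
                    fused_obj[comp_id] = points
    return fused

view_a = {"cabinet_0": {"door_0": np.random.rand(500, 3)}}
view_b = {"cabinet_0": {"door_0": np.random.rand(400, 3),
                        "handle_0": np.random.rand(120, 3)}}
target_env = fuse_views([view_a, view_b])
```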
According to the robot control method provided by the embodiment of the application, multiple environment images are acquired by shooting from multiple acquisition angles and positions, and the first environment understanding information corresponding to each acquisition angle is fused, so that the target environment understanding information can be further refined and its completeness and accuracy improved.
The execution subject of the robot control method provided by the embodiment of the application may be a robot control device. In the embodiment of the application, the robot control device provided by the embodiment of the application is described by taking a robot control device executing the robot control method as an example.
The embodiment of the application also provides a robot control device.
As shown in fig. 5, the robot control device includes: a first processing module 510, a second processing module 520, a third processing module 530, and a fourth processing module 540.
A first processing module 510, configured to obtain a target environment image of an environment in which the robot is located;
the second processing module 520 is configured to perform feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to the target object in the target environment image and a complete component point cloud image corresponding to the target component, where the target component is a component in the target object;
a third processing module 530, configured to obtain, based on the complete object point cloud image and the complete component point cloud image, target environment understanding information of the robot on the environment, where the target environment understanding information includes first information of each object in the environment and second information of each component in each object;
the fourth processing module 540 is configured to control the robot based on the target environment understanding information and the task to be performed by the robot.
According to the robot control device provided by the embodiment of the application, the object-level point cloud image and the component-level point cloud image are obtained directly by image segmentation of the target environment image, so that different components within the same object can be further distinguished while objects in the environment are effectively distinguished, and the specific structural information of a single object is effectively expressed. The target environment understanding information of the robot on the environment is established based on the connection relationships between components, which enhances the robot's understanding of the environment. On this basis, the robot is controlled to execute corresponding actions based on the target environment understanding information and the task, so that it can precisely operate on a specific object and on specific components within that object. Control precision is thereby significantly improved, task execution in complex environments is realized, and the device has high universality for the execution of complex tasks.
In some embodiments, the second processing module 520 may also be configured to:
inputting the target environment image into an image segmentation network of a target model, and acquiring an object-level image block corresponding to a target object and a component-level image block corresponding to a target component output by the image segmentation network;
and respectively inputting the object-level image block and the component-level image block into a point cloud completion network of the target model, and acquiring the complete object point cloud image corresponding to the object-level image block and the complete component point cloud image corresponding to the component-level image block output by the point cloud completion network, as sketched below.
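The sketch below illustrates the shape of this two-stage pipeline only; ImageSegmentationNet and PointCloudCompletionNet are hypothetical stand-ins with placeholder behavior, not the networks of this application.

```python
import numpy as np

class ImageSegmentationNet:
    """Hypothetical stand-in for the image segmentation network: returns one
    object-level patch and one component-level patch of an RGB-D image."""
    def __call__(self, rgbd_image: np.ndarray):
        h, w, _ = rgbd_image.shape
        object_patch = rgbd_image[: h // 2, : w // 2]      # placeholder crop
        component_patch = rgbd_image[: h // 4, : w // 4]   # placeholder crop
        return object_patch, component_patch

class PointCloudCompletionNet:
    """Hypothetical stand-in for the point cloud completion network."""
    def __call__(self, image_patch: np.ndarray) -> np.ndarray:
        # Use the depth channel as a stand-in for a partial cloud (a real
        # system would back-project with camera intrinsics) and add a
        # predicted offset (zero here) to "complete" it.
        depth = image_patch[..., 3].reshape(-1, 1)
        partial = depth * np.ones((1, 3))
        predicted_offset = np.zeros_like(partial)
        return partial + predicted_offset

segment = ImageSegmentationNet()
complete = PointCloudCompletionNet()

rgbd = np.random.rand(480, 640, 4)          # RGB channels plus a depth channel
object_patch, component_patch = segment(rgbd)
full_object_cloud = complete(object_patch)
full_component_cloud = complete(component_patch)
```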
In some embodiments, the second processing module 520 may also be configured to:
inputting the target environment image into an object segmentation module of an image segmentation network, and acquiring an object-level image block output by the object segmentation module;
and inputting the target environment image or the object-level image block into a component segmentation module of the image segmentation network, and acquiring the component-level image block output by the component segmentation module.
In some embodiments, the second processing module 520 may also be configured to:
extracting first point cloud information corresponding to a target object from an object-level image block, and extracting second point cloud information corresponding to a target component from a component-level image block;
predicting and obtaining a first deviation based on the first point cloud information, and predicting and obtaining a second deviation based on the second point cloud information;
complementing the first point cloud information based on the first deviation to acquire the complete object point cloud image; and complementing the second point cloud information based on the second deviation to acquire the complete component point cloud image, as illustrated by the sketch after this list.
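One plausible reading of "predict a deviation and complete the partial cloud with it" is sketched below in Python; the placeholder predictor, the number of generated points, and the centroid-plus-offset construction are assumptions for illustration, not the patented network.

```python
import numpy as np

def predict_deviation(partial_cloud: np.ndarray) -> np.ndarray:
    """Placeholder for a learned model that predicts, for each point to be
    added, an offset from the centroid of the observed partial cloud."""
    rng = np.random.default_rng(0)
    num_missing = 128
    return 0.05 * rng.standard_normal((num_missing, 3))

def complete_cloud(partial_cloud: np.ndarray) -> np.ndarray:
    deviation = predict_deviation(partial_cloud)
    centroid = partial_cloud.mean(axis=0)
    # New points are generated by applying the predicted deviations to the
    # centroid and appended to the observed points to form the complete cloud.
    generated_points = centroid + deviation
    return np.vstack([partial_cloud, generated_points])

partial_object = np.random.rand(512, 3)
full_object = complete_cloud(partial_object)   # shape (512 + 128, 3)
```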
In some embodiments, the third processing module 530 may also be configured to:
establishing a first coordinate system corresponding to the target object by taking the centroid of the target object as the origin of the coordinate system, and acquiring a first position coordinate of the target object based on the complete object point cloud image and the first coordinate system;
establishing a second coordinate system corresponding to the target component by taking the centroid of the target component as the origin of the coordinate system, and acquiring a second position coordinate of the target object based on the complete component point cloud image and the second coordinate system;
establishing a relative position relationship between the target component and the target object based on the first coordinate system and the second coordinate system;
and acquiring target environment understanding information based on the first position coordinate, the second position coordinate and the relative position relation.
In some embodiments, the fourth processing module 540 may also be configured to:
extracting interactable parts from a plurality of components included in the target object based on the target environment understanding information;
determining target pose information based on the task and the interactable part;
based on the target pose information, the robot is controlled.
In some embodiments, the first processing module 510 may also be configured to:
acquiring a plurality of environment images corresponding to the environment under a plurality of acquisition angles; the plurality of acquisition angles are in one-to-one correspondence with the plurality of environment images, and the plurality of acquisition angles are different; the target environment image is any image in a plurality of environment images;
In some embodiments, the third processing module 530 may also be configured to:
determining first environment understanding information of the robot on the environment under the acquisition angle corresponding to the target environment image based on the complete object point cloud image and the complete component point cloud image corresponding to the target environment image in the plurality of environment images;
and fusing first environment understanding information of the robot to the environment under a plurality of acquisition angles to acquire target environment understanding information.
The robot control device in the embodiment of the application can be a robot, a component in the robot, such as an integrated circuit or a chip, or a server in communication connection with the robot.
The robot control device in the embodiment of the application may be a device having an operating system. The operating system may be an Android operating system, an IOS operating system, or other possible operating systems, and the embodiment of the present application is not limited specifically.
The robot control device provided by the embodiment of the present application can implement each process implemented by the embodiments of the methods of fig. 1 to fig. 4, and in order to avoid repetition, a detailed description is omitted here.
The embodiment of the application also provides a robot.
As shown in fig. 6, the robot includes: the robot body, the image sensor 610, and the robot control device as described in any of the above embodiments.
Wherein the image sensor 610 is used to capture an image of the target environment.
The image sensor 610 may be provided to the robot body.
The number of image sensors 610 may be one or more.
For example, the image sensor 610 may be a color camera, a depth camera, or an RGB-D camera.
Types of robots may include, but are not limited to: household intelligent robots, robotic arms, and other industrial intelligent robots, etc.
With continued reference to fig. 6, in some embodiments, the robot body may further include: a mechanical arm 621, a clamping jaw 622, a mechanical arm base 623, and a movable component 624.
The clamping jaw 622 is disposed at one end of the mechanical arm 621, and is used for performing operations such as grabbing and placing.
The mechanical arm 621 is disposed on the mechanical arm base 623, and the mechanical arm 621 is rotatable relative to the mechanical arm base 623.
The mechanical arm base 623 is disposed on the movable component 624, and is driven by the movable component 624 to move.
The robot control device is electrically connected to the robot body and the image sensor 610, respectively, for performing the robot control method as described in any of the embodiments above.
In some embodiments, with continued reference to fig. 6, a robot control device may be provided to the robot body.
In some embodiments, the robot control device may include an image processing module 631 and a mechanical arm control module 632. The image processing module 631 is electrically connected to the image sensor 610 and the mechanical arm control module 632, respectively, and is configured to generate, based on the received target environment image, a complete object point cloud image corresponding to the target object in the target environment image and a complete component point cloud image corresponding to a target component in the target object; the mechanical arm control module 632 is configured to acquire target environment understanding information of the robot on the environment based on the complete object point cloud image and the complete component point cloud image, and to control the robot to perform tasks based on the target environment understanding information.
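For illustration, the following Python sketch mirrors this split between an image processing module and an arm control module; the class names, data shapes, and printed command are hypothetical and not part of the original disclosure.

```python
import numpy as np

class ImageProcessingModule:
    def process(self, rgbd_image: np.ndarray):
        # In the described device this would run segmentation and point cloud
        # completion; here dummy clouds of the expected shape are returned.
        object_cloud = np.random.rand(1024, 3)
        component_cloud = np.random.rand(256, 3)
        return object_cloud, component_cloud

class ArmControlModule:
    def build_understanding(self, object_cloud, component_cloud):
        return {"object": object_cloud, "component": component_cloud}

    def execute(self, understanding, task: str):
        target = understanding["component"].mean(axis=0)
        print(f"{task}: move end effector towards {target}")

image_module = ImageProcessingModule()
arm_module = ArmControlModule()
obj_cloud, comp_cloud = image_module.process(np.random.rand(480, 640, 4))
env = arm_module.build_understanding(obj_cloud, comp_cloud)
arm_module.execute(env, "open_door")
```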
According to the robot provided by the embodiment of the application, the object-level point cloud image and the component-level point cloud image are obtained directly by image segmentation of the target environment image, so that different components within the same object can be further distinguished while objects in the environment are effectively distinguished, and the specific structural information of a single object is effectively expressed. The target environment understanding information of the robot on the environment is established based on the connection relationships between components, which enhances the robot's understanding of the environment. On this basis, the robot is controlled to execute corresponding actions based on the target environment understanding information and the task, so that it can precisely operate on a specific object and on specific components within that object. Control precision is thereby significantly improved, task execution in complex environments is realized, and the robot has high universality for the execution of complex tasks.
The embodiment of the application also provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor, implements the processes of the robot control method embodiment described above, and can achieve the same technical effects, and in order to avoid repetition, the description is omitted here.
Wherein the processor is the processor in the robot described in the above embodiment. The readable storage medium includes a computer readable storage medium such as a computer read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disk.
The embodiment of the application also provides a computer program product, which comprises a computer program, wherein the computer program realizes the robot control method when being executed by a processor.
Wherein the processor is the processor in the robot described in the above embodiment.
The embodiment of the application further provides a chip. The chip includes a processor and a communication interface, the communication interface being coupled to the processor, and the processor being configured to run programs or instructions to implement the processes of the above robot control method embodiment and achieve the same technical effects; to avoid repetition, details are not described here again.
It should be understood that the chip referred to in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-a-chip.
It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element preceded by "comprising a ..." does not exclude the presence of other like elements in the process, method, article, or apparatus that comprises that element. Furthermore, it should be noted that the scope of the methods and apparatus in the embodiments of the present application is not limited to performing the functions in the order shown or discussed; depending on the functions involved, the functions may also be performed in a substantially simultaneous manner or in the reverse order. For example, the described methods may be performed in an order different from that described, and various steps may be added, omitted, or combined. In addition, features described with reference to certain examples may be combined in other examples.
From the above description of the embodiments, it will be clear to those skilled in the art that the methods of the above embodiments may be implemented by means of software plus a necessary general hardware platform, and may of course also be implemented by hardware, although in many cases the former is the preferred implementation. Based on such understanding, the technical solution of the present application, or the part contributing to the prior art, may be embodied in the form of a computer software product stored in a storage medium (e.g. ROM/RAM, magnetic disk, optical disk) comprising instructions for causing a terminal (which may be a mobile phone, a computer, a server, a network device, or the like) to perform the methods according to the embodiments of the present application.
The embodiments of the present application have been described above with reference to the accompanying drawings, but the present application is not limited to the above-described embodiments, which are merely illustrative and not restrictive, and many forms may be made by those having ordinary skill in the art without departing from the spirit of the present application and the scope of the claims, which are to be protected by the present application.
In the description of the present specification, reference to the terms "one embodiment," "some embodiments," "illustrative embodiments," "examples," "specific examples," or "some examples," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the application. In this specification, schematic representations of the above terms do not necessarily refer to the same embodiments or examples. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
While embodiments of the present application have been shown and described, it will be understood by those of ordinary skill in the art that: many changes, modifications, substitutions and variations may be made to the embodiments without departing from the spirit and principles of the application, the scope of which is defined by the claims and their equivalents.

Claims (12)

1. A robot control method, comprising:
acquiring a target environment image of an environment where a robot is located;
performing feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to a target object in the target environment image and a complete component point cloud image corresponding to a target component, wherein the target component is a component in the target object;
Acquiring target environment understanding information of the robot on the environment based on the complete object point cloud image and the complete component point cloud image, wherein the target environment understanding information comprises first information of each object in the environment and second information of each component in each object;
and controlling the robot based on the target environment understanding information and the task to be executed by the robot.
2. The method according to claim 1, wherein the performing feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to a target object and a complete component point cloud image corresponding to a target component in the target environment image includes:
inputting the target environment image into an image segmentation network of a target model, and acquiring an object-level image block corresponding to the target object and a component-level image block corresponding to the target component output by the image segmentation network;
and respectively inputting the object-level image block and the component-level image block into a point cloud completion network of the target model, and acquiring a complete object point cloud image corresponding to the object-level image block and a complete component point cloud image corresponding to the component-level image block, which are output by the point cloud completion network.
3. The robot control method according to claim 2, wherein the inputting the target environment image into the image segmentation network of the target model, obtaining the object-level image block corresponding to the target object and the component-level image block corresponding to the target component output by the image segmentation network, includes:
inputting the target environment image to an object segmentation module of the image segmentation network, and acquiring the object-level image block output by the object segmentation module;
and inputting the target environment image or the object-level image block into a component segmentation module of the image segmentation network, and acquiring the component-level image block output by the component segmentation module.
4. The robot control method of claim 2, wherein the target environment image comprises a color image and a depth image.
5. The method according to claim 2, wherein the inputting the object-level image block and the component-level image block to the point cloud completion network of the target model, respectively, obtaining a complete object point cloud image corresponding to the object-level image block and a complete component point cloud image corresponding to the component-level image block output by the point cloud completion network, includes:
Extracting first point cloud information corresponding to the target object from the object-level image block, and extracting second point cloud information corresponding to the target component from the component-level image block;
predicting to obtain a first deviation based on the first point cloud information, and predicting to obtain a second deviation based on the second point cloud information;
complementing the first point cloud information based on the first deviation to acquire the complete object point cloud image; and complementing the second point cloud information based on the second deviation to acquire the complete component point cloud image.
6. The robot control method according to any one of claims 1 to 5, wherein the acquiring the target environment understanding information of the robot to the environment based on the complete object point cloud image and the complete part point cloud image includes:
establishing a first coordinate system corresponding to the target object by taking the centroid of the target object as the origin of the coordinate system, and acquiring a first position coordinate of the target object based on the complete object point cloud image and the first coordinate system;
establishing a second coordinate system corresponding to the target component by taking the centroid of the target component as a coordinate system origin, and acquiring a second position coordinate of the target object based on the complete component point cloud image and the second coordinate system;
Establishing a relative positional relationship between the target component and the target object based on the first coordinate system and the second coordinate system;
and acquiring the target environment understanding information based on the first position coordinate, the second position coordinate and the relative position relation.
7. The robot control method according to any one of claims 1 to 5, wherein the controlling the robot based on the target environment understanding information and a task to be performed by the robot includes:
extracting interactable parts from a plurality of components included in the target object based on the target environment understanding information;
determining target pose information based on the task and the interactable part;
and controlling the robot based on the target pose information.
8. The robot control method according to any one of claims 1 to 5, wherein the acquiring a target environment image of an environment in which the robot is located includes:
acquiring a plurality of environment images corresponding to the environment under a plurality of acquisition angles; the plurality of acquisition angles are in one-to-one correspondence with the plurality of environment images, and the plurality of acquisition angles are different; the target environment image is any image in the plurality of environment images;
The obtaining, based on the complete object point cloud image and the complete component point cloud image, target environment understanding information of the robot on the environment includes:
determining first environment understanding information of the robot on the environment under the acquisition angle corresponding to the target environment image based on the complete object point cloud image and the complete component point cloud image corresponding to the target environment image in the plurality of environment images;
and fusing first environment understanding information of the robot to the environment under the plurality of acquisition angles to acquire the target environment understanding information.
9. A robot control device, comprising:
the first processing module is used for acquiring a target environment image of the environment where the robot is located;
the second processing module is used for carrying out feature segmentation on the target environment image to obtain a complete object point cloud image corresponding to a target object in the target environment image and a complete component point cloud image corresponding to a target component, wherein the target component is a component in the target object;
the third processing module is used for acquiring target environment understanding information of the robot on the environment based on the complete object point cloud image and the complete component point cloud image, wherein the target environment understanding information comprises first information of each object in the environment and second information of each component in each object;
And the fourth processing module is used for controlling the robot based on the target environment understanding information and the task to be executed by the robot.
10. A robot, comprising:
a robot body;
an image sensor provided to the robot body;
the robot control device of claim 9, electrically connected to the robot body and the image sensor, respectively.
11. A non-transitory computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when executed by a processor, implements the robot control method according to any one of claims 1-8.
12. A computer program product comprising a computer program, characterized in that the computer program, when executed by a processor, implements the robot control method according to any of claims 1-8.

