CN118123799A - Robot intelligent control method, system, electronic equipment and medium - Google Patents

Robot intelligent control method, system, electronic equipment and medium

Info

Publication number
CN118123799A
CN118123799A (application number CN202410309152.3A)
Authority
CN
China
Prior art keywords
subtask
semantic map
global
dimensional semantic
robot
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202410309152.3A
Other languages
Chinese (zh)
Inventor
韩定强
余驿
高党波
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hangzhou Ezviz Network Co Ltd
Original Assignee
Hangzhou Ezviz Network Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hangzhou Ezviz Network Co Ltd filed Critical Hangzhou Ezviz Network Co Ltd
Priority to CN202410309152.3A priority Critical patent/CN118123799A/en
Publication of CN118123799A publication Critical patent/CN118123799A/en
Pending legal-status Critical Current

Classifications

    • Y — GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 — TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P — CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P 90/00 — Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P 90/02 — Total factory control, e.g. smart factories, flexible manufacturing systems [FMS] or integrated manufacturing systems [IMS]


Abstract

Embodiments of the application provide a robot intelligent control method, system, electronic device and medium. In the embodiments, a current task is decomposed into a subtask sequence through a large language model; before a corresponding subtask is executed, task execution information for the robot to execute the subtask, such as a collision-free foothold point, the position and posture of the robot, or a grasping path, is planned according to a local three-dimensional semantic map and/or a global three-dimensional semantic map. In this way, a current task with complex content can be automatically decomposed into subtasks that each have a single, easily executed content, and task execution information is planned for each subtask so that the subtask can be completed based on that information, thereby improving the intelligence of robot control.

Description

Robot intelligent control method, system, electronic equipment and medium
Technical Field
The application relates to the technical field of robots, in particular to an intelligent control method, an intelligent control system, electronic equipment and a medium for robots.
Background
With the continuous development of technology in recent years, various types of robots (such as mobile robots) have advanced rapidly in both technology and market terms. A robot is a machine device that performs work automatically, accomplishing various functions by means of its own power and control capability.
In practical applications, a robot can typically accomplish only one single-content task at a time (such as grasping a certain target object); if more tasks are to be accomplished, the user must correspondingly input more instructions, so the intelligence of the robot is not high.
Disclosure of Invention
In view of the above, the application provides a robot intelligent control method, system, electronic device and medium to improve the intelligence of the robot.
The embodiment of the application provides an intelligent control method for a robot, which comprises the following steps:
decomposing the current task into a subtask sequence through a large language model;
planning, before a corresponding subtask is executed, task execution information for the robot to execute the subtask; when the subtask is moving to a target position, planning a collision-free foothold point of the robot based on a global three-dimensional semantic map, and when the subtask is grasping a target object, planning the position and posture and a grasping path of the robot based on a local three-dimensional semantic map;
wherein the local three-dimensional semantic map is determined according to perception information of the working environment perceived by the robot during subtask execution; the perception information at least comprises: object attributes, object point clouds and environment point clouds in the working environment;
the global three-dimensional semantic map is a recorded global three-dimensional semantic map, or is generated by stitching the recorded global three-dimensional semantic map with the most recently obtained local three-dimensional semantic map.
The embodiment of the application also provides a robot intelligent control system, which comprises: a decision module, a perception module and at least one planning module;
the decision module is used for decomposing the current task into a subtask sequence through the large language model and distributing a corresponding planning module for each subtask in the subtask sequence;
any planning module is used for planning, before a corresponding subtask is executed, task execution information for the robot to execute the subtask; when the subtask is grasping a target object, the planning module allocated to the subtask plans the position and posture and a grasping path of the robot based on a local three-dimensional semantic map;
wherein the local three-dimensional semantic map is determined by a perception module according to perception information of the working environment perceived by the robot during subtask execution; the perception information at least comprises: object attributes, object point clouds and environment point clouds in the working environment;
the global three-dimensional semantic map is a recorded global three-dimensional semantic map, or is generated by stitching the recorded global three-dimensional semantic map with the latest local three-dimensional semantic map obtained by the perception module.
The embodiment of the application also provides electronic equipment, which comprises:
A processor and a memory for storing computer program instructions which, when executed by the processor, cause the processor to perform the steps of the method as above.
Embodiments of the present application also provide a computer readable storage medium storing computer program instructions which, when executed, enable the steps of the method as above to be carried out.
According to the above technical solution, in the embodiments the current task is decomposed into a subtask sequence through a large language model; before a corresponding subtask is executed, task execution information for the robot to execute the subtask, such as a collision-free foothold point, the position and posture of the robot, or a grasping path, is planned according to the local three-dimensional semantic map and/or the global three-dimensional semantic map. In this way, a current task with complex content can be automatically decomposed into subtasks that each have a single, easily executed content, and task execution information is planned for each subtask so that the subtask can be completed based on that information, thereby improving the intelligence of robot control.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the application as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the application and together with the description, serve to explain the principles of the application.
Fig. 1 is a schematic flow chart of a robot intelligent control method according to an embodiment of the present application.
Fig. 2 is an implementation schematic diagram of a robot control system according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of an intelligent control system for a robot according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of an electronic device according to an embodiment of the present application.
Detailed Description
Reference will now be made in detail to exemplary embodiments, examples of which are illustrated in the accompanying drawings. When the following description refers to the accompanying drawings, the same numbers in different drawings refer to the same or similar elements, unless otherwise indicated. The implementations described in the following exemplary examples do not represent all implementations consistent with the application. Rather, they are merely examples of apparatus and methods consistent with aspects of the application as detailed in the accompanying claims.
The terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of the application. As used in this specification and the appended claims, the singular forms "a," "an," and "the" are intended to include the plural forms as well, unless the context clearly indicates otherwise. It should also be understood that the term "and/or" as used herein refers to and encompasses any or all possible combinations of one or more of the associated listed items.
It should be understood that although the terms first, second, third, etc. may be used herein to describe various information, such information should not be limited by these terms. These terms are only used to distinguish one type of information from another. For example, first information may also be referred to as second information and, similarly, second information may also be referred to as first information without departing from the scope of the application. The word "if" as used herein may be interpreted as "when", "upon" or "in response to determining", depending on the context.
In order to facilitate understanding of the technical solution of the embodiment of the present application, the technical concept related to the embodiment of the present application is explained below.
Point cloud: a data representation used to describe the surface of an object in three-dimensional space. It consists of a large number of points, each containing spatial coordinates and possibly other attribute information such as color and normal direction; point clouds are used, for example, to reconstruct three-dimensional object models, terrain surface models, building models, and so on.
Large language model: a language model with a large number of parameters and a large amount of training data. It is trained by deep learning techniques on large-scale text data to learn and predict the probability distributions and patterns of language. A large language model can be applied to many natural language processing tasks, such as language generation, text classification, machine translation and semantic understanding.
Three-dimensional semantic map: a map representation based on semantic information of a three-dimensional environment. It models and describes the objects, scenes and structures in the environment and their relationships, including information on the position, shape, size, category and pose of objects.
In order to better understand the technical solution provided by the embodiments of the present application and make the above objects, features and advantages of the embodiments of the present application more obvious, the technical solution in the embodiments of the present application will be described in further detail below with reference to the accompanying drawings.
Referring to fig. 1, fig. 1 is a schematic flow chart of a robot intelligent control method according to an embodiment of the present application. Alternatively, the flow may be applied to a robot such as a mobile robot, which is not particularly limited in this embodiment.
As shown in fig. 1, the process may include the steps of:
s101, decomposing the current task into a subtask sequence through a large language model.
In this embodiment, the large language model may be a model trained in advance for task decomposition. The current task may refer to the currently received task; its specific content is not limited here, and may be, for example, "help me take an apple" or "help me close the window in the kitchen".
Optionally, as an embodiment, there are many specific implementations of decomposing the current task into a subtask sequence through the large language model. For example, if at least one target historical subtask associated with the current task exists in a recorded historical task set, each target historical subtask, the current task and the recorded linguistic global map are input into the large language model as input data to obtain the subtask sequence; otherwise, the current task and the recorded linguistic global map are input into the large language model as input data to obtain the subtask sequence. The historical task set comprises at least one historical subtask that the robot has executed within a specified time period and that meets a set condition; any historical subtask at least carries an execution result indicating whether it was executed successfully; the set condition at least includes that the task type is a target type.
Here, the specified time period may be set flexibly according to actual demands, for example a period from a set historical time point to the current time point, where the set historical time point is not particularly limited either and may be, for example, 00:00 of the current day or 00:00 of the previous day. As an embodiment, the target type may also be set flexibly according to actual requirements; for example, it may be a task type that takes a long time to execute (e.g., the execution time of the subtask exceeds a set duration), or a task type that controls something to be opened or closed (e.g., opening a window, opening a bedroom door, etc.).
As one example, there are many specific implementations of determining whether at least one target historical subtask associated with the current task exists in the recorded historical task set. For example, at least one keyword may first be determined based on the current task; if the current task is "help me close the window in the kitchen", the corresponding keywords may include: help me, close, kitchen, window, etc. Matching is then performed against the historical task set according to the keywords corresponding to the current task, and if a historical subtask matching each keyword exists, the matched historical subtask is determined to be a target historical subtask. For example, the target historical subtasks may include a subtask of moving near the kitchen window that was executed successfully during time period A, a subtask of closing the window that was executed successfully during time period B, and so on.
On this basis, as an embodiment, there are many specific implementations of inputting each target historical subtask, the current task and the recorded linguistic global map into the large language model as input data to obtain the subtask sequence. For example, the large language model first decomposes the current task into a subtask sequence according to the recorded linguistic global map, and then the subtasks that are identical to successfully executed target historical subtasks are removed from the sequence to obtain a new subtask sequence, which can be regarded as the final subtask sequence. Repeated execution of certain subtasks can thus be avoided, effectively improving the execution efficiency of the current task. For example, if the current task is "help me close the window in the kitchen and bring me an apple", then, since the target historical subtasks indicate that the kitchen window has already been closed, the window does not need to be closed again.
As another embodiment, any historical subtask may further carry its execution start time point, so that the time elapsed since the historical subtask was executed can be determined. On this basis, the current task is decomposed into a subtask sequence by the large language model according to the recorded linguistic global map; target historical subtasks whose execution start time point is further from the current time point than a set threshold are then discarded, and the subtasks in the sequence that are identical to the remaining successfully executed target historical subtasks are removed to obtain a new subtask sequence, which can be regarded as the final subtask sequence. The reason is that if a target historical subtask (e.g., closing the kitchen window) was executed too long ago, the state of the window may have changed in the meantime (for example, it may have been operated by another robot or manually), so that target historical subtask needs to be excluded to ensure the integrity of the current task execution.
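To make the history-aware decomposition above concrete, the following minimal sketch shows one possible reading of it; the helper names (llm_decompose, HistorySubtask), the keyword matching and the age threshold are illustrative assumptions, not the claimed implementation.

```python
from dataclasses import dataclass
from time import time

@dataclass
class HistorySubtask:
    description: str   # e.g. "close the kitchen window"
    succeeded: bool    # execution result carried by the historical subtask
    start_time: float  # execution start time point (epoch seconds)

def find_target_history_subtasks(history, keywords):
    """Return historical subtasks whose description matches every keyword."""
    return [h for h in history if all(k in h.description for k in keywords)]

def decompose(current_task, linguistic_map, history, llm_decompose,
              max_age_s=3600.0, now=None):
    """History-aware decomposition sketch.

    llm_decompose(task, linguistic_map) -> list[str] is assumed to wrap the
    large language model; the rest follows the filtering rules above.
    """
    now = time() if now is None else now
    keywords = current_task.split()                      # placeholder keyword extraction
    targets = find_target_history_subtasks(history, keywords)

    subtasks = llm_decompose(current_task, linguistic_map)

    # Discard historical subtasks executed too long ago (their effect may no longer hold).
    recent = [t for t in targets if now - t.start_time <= max_age_s]

    # Remove subtasks already covered by a recent, successful historical subtask.
    done = {t.description for t in recent if t.succeeded}
    return [s for s in subtasks if s not in done]
```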
Optionally, as an embodiment, after the current task is decomposed into the subtask sequence by the large language model, a corresponding planning module may further be allocated to each subtask in the sequence. There are many specific implementations of this allocation; for example, the planning modules may be configured in advance according to actual requirements, one per task type, e.g. a planning module for grasping tasks, a planning module for environment exploration tasks, a planning module for movement tasks, a planning module for switching (open/close) tasks, and so on. On this basis, the planning module matching each subtask is determined from the preconfigured planning modules according to the task type of each subtask in the sequence; for example, if the subtask is "pick up the apple", the planning module allocated to it is the one for grasping tasks, and so on.
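A dispatch table keyed by task type is one simple way to realize the module allocation just described; the planner names below are placeholders rather than components defined by the application.

```python
# Hypothetical planner registry keyed by task type; the planner names are placeholders.
PLANNERS = {
    "grasp":   "GraspPlanner",
    "explore": "ExplorationPlanner",
    "move":    "MovePlanner",
    "switch":  "SwitchPlanner",   # e.g. opening/closing windows, doors, drawers
}

def assign_planner(subtask_type: str) -> str:
    """Return the pre-configured planning module matching the subtask's task type."""
    try:
        return PLANNERS[subtask_type]
    except KeyError:
        raise ValueError(f"no planning module configured for task type {subtask_type!r}")
```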
S102, planning task execution information of the robot to execute the subtasks before the corresponding subtasks are executed.
In the present embodiment, task execution information may be understood as information used to execute a task, such as a collision-free foothold point, a grasping point, the position and posture of the robot, a grasping path, or a collision-free moving path of the robot, which is not particularly limited here.
When the subtask is moving to a target position, a collision-free foothold point of the robot is planned based on the global three-dimensional semantic map; when the subtask is grasping a target object, the position and posture and the grasping path of the robot are planned based on the local three-dimensional semantic map. Here, the target position may refer to, for example, the position where an apple is located, and the target object may be, for example, the apple. The position and posture may refer to information such as the position, orientation and rotation state of an object in space.
In this embodiment, the local three-dimensional semantic map is determined according to perception information of the working environment perceived by the robot during subtask execution; the perception information at least comprises: object attributes, object point clouds and environment point clouds in the working environment. Here, object attributes may include information such as the category, color, surface text, texture and shape of an object. The environment can be understood as the parts that are not identified as objects, such as the floor, table tops and obstacles; an obstacle may be, for example, a stool or a pot. Optionally, as an embodiment, a perception module may determine the local three-dimensional semantic map according to the perception information perceived by the robot during subtask execution, where the perception module may refer to, for example, a camera or a camera group consisting of a plurality of cameras.
The global three-dimensional semantic map is a recorded global three-dimensional semantic map or is generated by splicing the recorded global three-dimensional semantic map and a latest obtained local three-dimensional semantic map.
Optionally, as an embodiment, there are many specific implementations of determining the recorded global three-dimensional semantic map. For example, images are collected periodically, where each collected image can be understood as an image of the local working environment within the current collection range. For each collected image, the perception information of the robot's working environment is obtained from the image, for example by performing object detection, contour segmentation, attribute recognition and other processing on it. Fusion processing is then performed on the perception information to obtain a local three-dimensional semantic map, and the recorded local three-dimensional semantic map is updated to this local map. A global three-dimensional semantic map is obtained by stitching the local three-dimensional semantic map with the recorded global three-dimensional semantic map, so that a global map containing more comprehensive working-environment information is obtained, and the recorded global three-dimensional semantic map is updated to this global map.
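The following sketch illustrates, under simplifying assumptions, the fusion and stitching steps described above: a local map is represented as a dict of (attribute, point-cloud) pairs plus an environment cloud, and stitching applies a localization pose to merge it into the global map. The data layout and function names are illustrative only.

```python
import numpy as np

def fuse_local_map(object_attrs, object_points, env_points):
    """Fuse one frame's perception info into a local three-dimensional semantic map.

    object_attrs: list of dicts (category, color, ...); object_points: list of (N,3)
    arrays in the robot frame; env_points: (M,3) array of unrecognized points.
    """
    return {"objects": list(zip(object_attrs, object_points)),
            "environment": env_points}

def stitch_global_map(global_map, local_map, pose):
    """Stitch the latest local map into the recorded global map.

    pose is a 4x4 homogeneous transform from the local (robot) frame to the
    world frame, assumed to come from localization and mapping.
    """
    R, t = pose[:3, :3], pose[:3, 3]
    for attrs, pts in local_map["objects"]:
        global_map.setdefault("objects", []).append((attrs, pts @ R.T + t))
    env = local_map["environment"] @ R.T + t
    global_map["environment"] = (
        np.vstack([global_map["environment"], env])
        if "environment" in global_map else env
    )
    return global_map
```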
Optionally, as an embodiment, after the global three-dimensional semantic map is obtained by stitching the local three-dimensional semantic map with the recorded global three-dimensional semantic map, the method further includes: compressing the global three-dimensional semantic map, and performing linguistification based on the compressed global map to obtain a linguistic global map, where the linguistic global map may refer to text information that expresses and describes, in natural-language form, the objects, scenes and semantic information of the three-dimensional environment contained in the global three-dimensional semantic map; the recorded linguistic global map is then updated to this linguistic global map. The compression method is not particularly limited here and may be, for example, a PCA (principal component analysis) algorithm.
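As an illustrative reading of the compression and linguistification step, the sketch below reduces each object's point cloud to a center (and rough extents) via PCA and emits one numbered text line per object; it assumes the global-map layout of the previous sketch (a list of (attribute-dict, Nx3 array) pairs) and is not the claimed algorithm.

```python
import numpy as np

def compress_object(points):
    """Compress an object's point cloud to its center and principal extents (PCA)."""
    center = points.mean(axis=0)
    centered = points - center
    # Principal component analysis via SVD of the centered cloud.
    _, s, _ = np.linalg.svd(centered, full_matrices=False)
    extents = 2.0 * s / np.sqrt(max(len(points) - 1, 1))   # rough size along principal axes
    return center, extents

def linguistify(global_map):
    """Describe each numbered object of the global map as a short text line."""
    lines = []
    for i, (attrs, pts) in enumerate(global_map["objects"], start=1):
        center, _ = compress_object(pts)
        lines.append(
            f"id {i}: {attrs.get('category', 'object')} "
            f"({attrs.get('color', 'unknown color')}) at "
            f"({center[0]:.2f}, {center[1]:.2f}, {center[2]:.2f})"
        )
    return "\n".join(lines)
```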
Optionally, as an embodiment, there are many specific implementations of planning the collision-free foothold point of the robot based on the global three-dimensional semantic map. For example: determine the point cloud of the target position in the global three-dimensional semantic map; determine a sampling range based on the point cloud of the target position and attribute information of the robot; sample the point cloud within the sampling range of the global map uniformly at intervals to obtain a plurality of sampling points; for any sampling point, determine whether an obstacle point cloud exists among the points of the global map located within a first target range, the first target range being determined according to the sampling point; if no obstacle point cloud exists, determine the sampling point to be a reference collision-free foothold point; and determine the collision-free foothold point of the robot from the reference collision-free foothold points.
As an embodiment, the attribute information of the robot may include, for example, information such as the shape of the robot (e.g., a circular chassis of radius R) and its grasping range. There are many specific implementations of determining the sampling range based on the point cloud of the target position and the attribute information of the robot; for example, the sampling range is obtained by taking the point cloud of the target position as the center and the graspable range as the radius. There are likewise many specific implementations of determining the first target range from a sampling point; for example, the first target range may be the range obtained by taking the sampling point as the center and a set size as the radius. There are also many specific implementations of determining the robot's collision-free foothold point from the reference collision-free foothold points; for example, the reference collision-free foothold point closest to the robot may be selected.
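The sampling-based foothold planning can be sketched as follows, assuming a circular-chassis robot and planar (x, y) collision checking; the radii, sample count and function name are illustrative.

```python
import numpy as np

def plan_foothold(target_center, obstacle_points, robot_xy,
                  robot_radius=0.30, reach=0.60, n_samples=36):
    """Sample candidate footholds on a circle around the target and keep collision-free ones.

    target_center: (3,) target position from the global map; obstacle_points: (M,3)
    object/obstacle point cloud; robot_radius and reach are illustrative robot attributes.
    Returns the collision-free candidate closest to the robot, or None.
    """
    angles = np.linspace(0.0, 2.0 * np.pi, n_samples, endpoint=False)
    candidates = np.stack(
        [target_center[0] + reach * np.cos(angles),
         target_center[1] + reach * np.sin(angles)], axis=1)

    obs_xy = obstacle_points[:, :2]
    free = []
    for c in candidates:
        # A candidate is collision-free if no obstacle point lies within the robot radius.
        if obs_xy.shape[0] == 0 or np.min(np.linalg.norm(obs_xy - c, axis=1)) > robot_radius:
            free.append(c)
    if not free:
        return None
    free = np.array(free)
    dists = np.linalg.norm(free - np.asarray(robot_xy), axis=1)
    return free[np.argmin(dists)]   # reachable point closest to the robot
```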
Optionally, as an embodiment, there are many specific implementations of planning the position and posture and the grasping path of the robot based on the local three-dimensional semantic map. For example, the position and posture of the robot are planned based on the point cloud of the target object in the local three-dimensional semantic map, and the grasping path of the robot is planned based on the point cloud of the target object and the point cloud located within a second target range in the local map, where the second target range is determined according to the point cloud of the target object. As an example, the second target range may be the range obtained by taking the point cloud of the target object as the center and a preset size as the radius.
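A deliberately simplified sketch of the grasp planning step is given below: the grasp point is taken as the object's point-cloud centroid with a top-down approach, and the path check is a straight-line clearance test. Real systems would substitute a learned grasp-point planner and a sampling-based arm path planner, as mentioned later in the description.

```python
import numpy as np

def plan_grasp_point(object_points):
    """Toy grasp-point choice: the centroid of the target object's point cloud,
    approached from above. Real systems would use a dedicated grasp planner instead."""
    grasp_point = object_points.mean(axis=0)
    approach = np.array([0.0, 0.0, -1.0])   # top-down approach direction
    return grasp_point, approach

def straight_path_is_free(start, goal, obstacle_points, clearance=0.05, steps=50):
    """Check a straight-line arm path against the point cloud of the second target range.
    A real planner would search around obstacles instead of only checking one line."""
    for a in np.linspace(0.0, 1.0, steps):
        p = (1.0 - a) * np.asarray(start) + a * np.asarray(goal)
        if obstacle_points.shape[0] and np.min(np.linalg.norm(obstacle_points - p, axis=1)) < clearance:
            return False
    return True
```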
In this embodiment, it is understood that each sub-task in the sub-task sequence is sequentially executed in the specified order. After the current subtask is executed, the next subtask is triggered to be executed until all the subtasks are executed.
Optionally, as an embodiment, the method further includes: for any subtask, obtaining the execution result of the robot executing the subtask; if the execution result indicates that the subtask failed and the total number of failures of the subtask has not reached a set number, re-planning the task execution information for the robot to execute the subtask according to the recorded linguistic global map and/or the recorded global three-dimensional semantic map; if the total number of failures has reached the set number, re-decomposing the current task into a new subtask sequence through the large language model according to the recorded linguistic global map and the execution result indicating the failure, and continuing with the step of planning, before each corresponding subtask is executed, the task execution information for the robot to execute that subtask. The set number here may be, for example, 3 or 5.
As an embodiment, there are many specific implementations of re-decomposing the current task into a new subtask sequence through the large language model according to the recorded linguistic global map and the execution result indicating the failure. For example, it can be determined from the failure result that the new subtask replacing the failed subtask must not be identical to it: if the failed subtask was grasping apple 1, and the recorded linguistic global map shows that the working environment contains another apple, apple 2, different from apple 1, then the new subtask corresponding to the failed one becomes grasping apple 2. On this basis, the new subtask sequence is obtained by re-decomposition through the large language model according to the recorded linguistic global map.
Optionally, as an embodiment, if the total number of failures of the subtask reaches the set number, event information indicating that the subtask failed is fed back to the target user who issued the current task, so that the target user can decide on the next step.
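The retry-or-redecompose policy described in the last three paragraphs can be summarized by the following sketch; replan, redecompose and notify_user stand for the map-based re-planning, the large-language-model re-decomposition and the user feedback respectively, and are assumptions of this illustration.

```python
def handle_result(subtask, succeeded, failure_counts, replan, redecompose,
                  notify_user, max_failures=3):
    """Retry-or-redecompose policy for one subtask (illustrative only).

    replan(subtask) re-plans task execution information from the recorded maps;
    redecompose(failed_subtask) asks the large language model for a new subtask
    sequence; notify_user(event) reports the failure to the task issuer.
    """
    if succeeded:
        failure_counts[subtask] = 0
        return "next_subtask"

    failure_counts[subtask] = failure_counts.get(subtask, 0) + 1
    if failure_counts[subtask] < max_failures:
        replan(subtask)
        return "retry"

    notify_user(f"subtask failed {max_failures} times: {subtask}")
    redecompose(subtask)
    return "redecomposed"
```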
Optionally, as an embodiment, for any subtask, before the subtask is executed it may further be determined, according to the recorded local three-dimensional semantic map, whether the subtask needs to be executed; if so, the task execution information for the robot to execute the subtask is planned. Specifically, if the subtask is, for example, "clean the master bedroom", it can be judged from the recorded local three-dimensional semantic map whether the master bedroom needs cleaning; if the map shows that the master bedroom is currently clean (e.g., no dust and no scattered obstacles), it is determined that the subtask does not need to be executed, and it can be skipped so that the next subtask is executed. The method of judging whether a subtask needs to be executed according to the recorded local three-dimensional semantic map is not particularly limited here; for example, the map may be processed by a corresponding deep learning algorithm to analyze whether the subtask needs to be executed.
Thus, the flow shown in fig. 1 is completed.
As can be seen from the flow shown in fig. 1, in this embodiment the current task is decomposed into a subtask sequence through a large language model; before a corresponding subtask is executed, task execution information for the robot to execute the subtask, such as a collision-free foothold point, the position and posture of the robot, or a grasping path, is planned according to the local three-dimensional semantic map and/or the global three-dimensional semantic map. In this way, a current task with complex content can be automatically decomposed into subtasks that each have a single, easily executed content, and task execution information is planned for each subtask so that the subtask can be completed based on that information, thereby improving the intelligence of robot control.
In order to facilitate understanding of the specific implementation process of the intelligent control of the robot, an example is described below through a specific embodiment.
At present, related robot control schemes can accomplish only a single grasping task and cannot carry out more complex tasks such as mobile grasping, and they show no autonomous decision-making or intelligence on the robot's part. In addition, because grasping relies on the object pose, objects whose overall size exceeds that of the gripper jaws or manipulator cannot be grasped.
In the scheme provided by this embodiment, all objects that the visual model can detect are stored in a memory module to form a semantic map, which is then converted into language through an information compression technique. A large language model combines the user instruction with the linguistified map information to decompose the target task indicated by the instruction into a plurality of subtasks, and the corresponding subtask planning is then completed by a planning module. When executing a grasping subtask, a 6-degree-of-freedom pose is not required; the corresponding point cloud data can be used directly to plan the grasping point. In this embodiment, the user only needs to issue an instruction for the robot to automatically complete multiple tasks in an open environment, and the robot has a certain autonomous decision-making capability.
Referring to the implementation schematic of the robot control system shown in fig. 2, the entire robot control system may be composed of five modules: the system comprises a sensing module, a memory module, a decision module, a planning module and an executing module; these five modules may be configured on the robot.
The perception module outputs the detectable objects, fuses the detected object information into a local three-dimensional semantic map and outputs it to the memory module, where a global three-dimensional semantic map is formed through localization and mapping and the semantic map is then converted into language through information compression. The large language model combines the user instruction with the linguistified map information to decompose the target task indicated by the instruction into a plurality of subtasks; the corresponding subtask planning is then completed by the planning module to obtain executable target points, motion trajectories and the like; finally the execution module controls the robot to complete the corresponding actions. The planning module comprises a plurality of subtask planners, such as planners for grasping, exploring the environment, moving, and opening and closing drawers. The user only needs to issue an instruction for the robot to automatically complete various tasks in an open environment, and the robot has a certain autonomous decision-making capability.
In this embodiment, as shown in fig. 2, a specific process for implementing robot control based on the above-mentioned robot control system is as follows:
Perception module (which may include at least one camera): the perception module can identify all identifiable objects in the environment where the robot is located and convert them into point cloud data and text information. It consists of two sub-modules, visual recognition and information fusion. The RGB, depth and other information from the camera (i.e., the images collected by the camera) is input to the visual recognition sub-module, which, after object detection, contour segmentation, attribute recognition and other processing, outputs object attribute information (e.g., category, color, surface text and other characteristics), object point cloud information (an "object" being a recognized item) and environment point cloud information (the "environment" being composed of unrecognized parts such as the floor, table tops and obstacles). These are input to the information fusion sub-module, which fuses them into a local three-dimensional semantic map; the local three-dimensional semantic map is then input to the memory module and the planning module respectively.
A memory module: the memory module inputs corresponding data acquired by sensors such as a laser radar, imu, a chassis encoder and the like, and meanwhile, based on a local three-dimensional semantic map transmitted by the sensing module, the local three-dimensional semantic map is spliced by positioning and mapping processing in the memory module to generate a final global three-dimensional semantic map, and the map comprises relatively complete object attribute, object point cloud and environment point cloud information; then the global three-dimensional semantic map is output to an information compression sub-module, information compression processing (such as PCA principal component analysis algorithm) is carried out on point cloud information of objects in the global three-dimensional semantic map to obtain the center position of the objects, and each object is numbered (the objects of the same category are prevented from being confused, such as apples 1 and apples 2); the finally output linguistic map information comprises characteristic attribute information such as an object id number (namely an object serial number), a name, an object point cloud center position, colors/characters and the like. Meanwhile, the memory module also stores the history key event description (namely at least one history subtask executed by the robot in a designated time period and the execution result information corresponding to any history subtask and indicating whether the history subtask is executed successfully) in the decision module, so as to provide information reference for subsequent decisions.
Decision module: the decision module may specifically generate a serialized subtask through a large language model; when detecting instructions such as voice or text input by a user, the decision module is based on the current associated environment information (namely, the linguistic map information (namely, linguistic global map) +the historical key event (namely, target historical subtask) associated with the user instructions) which is latest transmitted by the memory module, then the user instructions and the current associated environment information are simultaneously input into a large language model, and the large language model decomposes the user instructions according to preset rules to obtain serialized subtasks; it can be appreciated that initially the historical key event is null;
Illustratively, the user inputs the instruction: help me take an apple.
Decision process of the decision module: determine information such as the apple's position and the user's position from the latest linguistic map information transmitted by the memory module, and obtain from the memory module the historical key events associated with the user instruction (empty if there are none); then generate the serialized subtasks: subtask 1, move to the vicinity of the apple; subtask 2, pick up the apple; subtask 3, move to the vicinity of the user; subtask 4, hand over the apple. If there is no apple in the environment, an instruction to explore unknown areas in search of an apple is generated. In this way, the embodiment can make flexible decisions through the intelligence of the large language model.
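The decision step can be pictured with the following sketch: a prompt is assembled from the linguistic map, the associated historical key events and the user instruction, and the expected serialized output for the apple example is shown as a plain list. The prompt wording and the output schema are illustrative assumptions, not the preset rules of the application.

```python
def build_decision_prompt(user_instruction, linguistic_map, key_events):
    """Assemble the large-language-model input from the user instruction and the
    associated environment information; the prompt wording is illustrative only."""
    events = "\n".join(key_events) if key_events else "(none)"
    return (
        "Environment description:\n" + linguistic_map + "\n"
        "Relevant historical events:\n" + events + "\n"
        "Instruction: " + user_instruction + "\n"
        "Decompose the instruction into an ordered list of executable subtasks."
    )

# Expected serialized output for "help me take an apple" (example only):
EXAMPLE_SUBTASKS = [
    {"type": "move",    "target": "apple 1"},   # subtask 1: move to the vicinity of the apple
    {"type": "grasp",   "target": "apple 1"},   # subtask 2: pick up the apple
    {"type": "move",    "target": "user"},      # subtask 3: move to the vicinity of the user
    {"type": "deliver", "target": "user"},      # subtask 4: hand over the apple
]
```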
And a planning module: the planning module inputs the local three-dimensional semantic map transmitted by the sensing module, the global three-dimensional semantic map transmitted by the memory module and the serialized subtasks generated by the decision module, selects a corresponding subtask planning sub-module (namely a subtask action planner in the figure) according to the subtask category, and then outputs control instructions such as specific target points, motion tracks and the like which can be executed by the robot; the global three-dimensional semantic map can be used for planning instructions such as collision-free footage points of the robot, and the local three-dimensional semantic map is used for providing real-time uncompressed environment information for a planning module, so that a specific robot position posture is planned based on the detailed environment information, such as planning of grabbing points through point cloud data of an object, planning of a collision-free path of a mechanical arm through the point cloud data of the environment, and the like; the planning module also receives state feedback from the execution module so as to judge whether the subtasks are successfully executed; for any subtask, if the subtask is executed for a plurality of times and fails, uploading state information indicating that the subtask is executed and fails to a decision module, and making a decision again or feeding back to a user by the decision module; the planning module can comprise a plurality of preset subtask planning sub-modules, and the more the number of the sub-modules is, the more powerful and comprehensive the functions of the robot are;
exemplary: the subtask 1-moves to the vicinity of the apple, the planning module can find the apple and surrounding local map information from the global three-dimensional semantic map which is currently generated by the memory module, and then a robot foot drop point which is collision-free and can pick up an object is planned according to the information of the shape (such as a chassis with a circular radius R) of the robot, a grabbing space and the like; for example, the specific method includes taking apples as circle centers, uniformly sampling at intervals on circles formed by different grabbing ranges as radiuses, calculating whether object/obstacle point clouds exist in the range of each sampling point distance R, if not, then, ordering all the reachable points according to a rule that the distance from the current robot is from near to far, and selecting the best suitable point as a target point (such as the reachable point closest to the current robot); finally, the target point is sent to an execution module, and an instruction that the robot reaches the target point is completed by the execution module;
For subtask 2 (pick up the apple), the latest local three-dimensional semantic map generated by the perception module is obtained; the grasping point and grasping posture are planned from the apple's point cloud data (e.g., with a grasp-point planning algorithm such as DexNet), a collision-free path for the mechanical arm is planned from the point cloud data of the other obstacles in the local map (e.g., with an RRT algorithm), and finally the path is sent to the execution module to complete the grasping task.
The execution module: the execution module can be composed of a mechanical arm control part, a mechanical arm/clamping jaw control part and a navigation-based motion control part, and is used for respectively receiving and executing instructions from the planning module and feeding back status information such as whether the execution is finished or not; the mechanical arm control can receive instructions such as joint positions, tail end positions, contact forces and the like, and the return state is whether the target point is reached; the manipulator/clamping jaw control can receive instructions such as joint positions, grabbing forces and the like, and the returning state is whether to grab an object or not; the motion control based on navigation can be oriented to a wheeled robot, a foot robot and the like, the position and the orientation of the robot in a room are controlled, meanwhile, the robot can automatically avoid obstacles in the motion process, and the return state is whether the robot reaches a target point or not.
In this embodiment, by combining a large language model, visual recognition technology, robot planning and control technology and the like, an implementation and association method for the five modules of perception, memory, decision, planning and execution is provided, so as to realize a service robot/robot manager with a certain degree of intelligence. The whole scheme of this embodiment is implemented in a modular manner, so the robot's functions can be continuously extended and improved (e.g., by adding subtask planning sub-modules) without affecting its existing functions.
This completes the description of the method provided in this embodiment. The robot intelligent control system provided by the embodiments of the application is described below.
Referring to fig. 3, fig. 3 is a schematic structural diagram of an intelligent control system for a robot according to an embodiment of the present application.
As shown in fig. 3, the robot intelligent control system 300 includes: a decision module 301, a perception module 302, and at least one planning module 303;
The decision module 301 is configured to decompose a current task into a subtask sequence through a large language model, and allocate a corresponding planning module 303 to each subtask in the subtask sequence;
any planning module 303 is configured to plan, before a corresponding subtask is executed, task execution information for the robot to execute the subtask; when the subtask is grasping a target object, the planning module 303 allocated to the subtask plans the position and posture and the grasping path of the robot based on the local three-dimensional semantic map;
The local three-dimensional semantic map is determined by a perception module 302 according to perception information of a robot in a working environment perceived in the execution process of the subtasks; the perception information at least comprises: object properties, object point clouds, and environmental point clouds in a work environment;
The global three-dimensional semantic map is a recorded global three-dimensional semantic map, or is generated by splicing the recorded global three-dimensional semantic map and a local three-dimensional semantic map obtained by the perception module 302.
As one embodiment, the recorded global three-dimensional semantic map is determined according to the following steps:
periodically collecting images through the perception module; for each collected image, obtaining the perception information of the robot's working environment from the collected image;
Fusion processing is carried out on the basis of the perception information to obtain a local three-dimensional semantic map, and the recorded local three-dimensional semantic map is updated to be the local three-dimensional semantic map;
and splicing the local three-dimensional semantic map and the recorded global three-dimensional semantic map to obtain a global three-dimensional semantic map, and updating the recorded global three-dimensional semantic map into the global three-dimensional semantic map.
As an embodiment, after the global three-dimensional semantic map is obtained by stitching based on the local three-dimensional semantic map and the recorded global three-dimensional semantic map, the method further includes:
Compressing based on the global three-dimensional semantic map, and carrying out linguistic processing based on the compressed global three-dimensional semantic map to obtain a linguistic global map;
updating the recorded linguistic global map into a linguistic global map.
As one embodiment, decomposing the current task into a sequence of subtasks by a large language model includes:
if at least one target history subtask associated with the current task exists in the recorded history task set, inputting each target history subtask, the current task and the recorded linguistic global map into a large language model as input data to obtain a subtask sequence;
Otherwise, the current task and the recorded linguistic global map are used as input data to be input into a large language model to obtain a subtask sequence;
The historical task set comprises at least one historical subtask that the robot has executed within a specified time period and that meets a set condition; any historical subtask at least carries an execution result indicating whether it was executed successfully; the set condition at least includes that the task type is a target type.
As an embodiment, the method further comprises:
for any subtask, obtaining the execution result of the robot executing the subtask;
if the execution result indicates that the subtask failed and the total number of failures of the subtask has not reached a set number, re-planning the task execution information for the robot to execute the subtask according to the recorded linguistic global map and/or the recorded global three-dimensional semantic map;
if the total number of failures of the subtask has reached the set number, re-decomposing the current task into a new subtask sequence through the large language model according to the recorded linguistic global map and the execution result indicating the failure, and continuing with the step of allocating a corresponding planning module to each subtask in the new subtask sequence.
As one embodiment, planning the collision-free foothold point of the robot based on the global three-dimensional semantic map includes:
Determining a point cloud of a target position in the global three-dimensional semantic map based on the global three-dimensional semantic map;
Determining a sampling range based on the point cloud of the target position and attribute information of the robot; uniformly sampling point clouds in the sampling range in the global three-dimensional semantic map at intervals to obtain a plurality of sampling points;
determining whether obstacle point clouds exist in point clouds positioned in a first target range in the global three-dimensional semantic map according to any sampling point; the first target range is determined in dependence on the sampling point;
if no obstacle point cloud exists, determining the sampling point to be a reference collision-free foothold point;
determining the collision-free foothold point of the robot from the reference collision-free foothold points.
As one embodiment, planning the position and posture and the grasping path of the robot based on the local three-dimensional semantic map includes:
planning the position and posture of the robot based on the point cloud of the target object in the local three-dimensional semantic map;
Planning a grabbing path of the robot based on the point cloud of the target object in the local three-dimensional semantic map and the point cloud positioned in the second target range; the second target range is determined in dependence of the point cloud of the target object.
The implementation process of the functions and roles of each module in the intelligent control system of the robot is specifically detailed in the implementation process of the corresponding steps in the method, and is not repeated here.
Since the robot intelligent control system embodiments essentially correspond to the method embodiments, the relevant parts of the description of the method embodiments may be referred to. The system embodiments described above are merely illustrative: the modules described as separate components may or may not be physically separate, and the components shown as modules may or may not be physical modules, i.e., they may be located in one place or distributed across multiple network modules. Some or all of the modules may be selected according to actual needs to achieve the purposes of the application. Those of ordinary skill in the art can understand and implement the application without inventive effort.
Referring to fig. 4, a schematic hardware structure diagram of an electronic device according to an exemplary embodiment of the application is shown. The electronic device may include a processor 401, a communication interface 402, a memory 403 and a communication bus 404. The processor 401, the communication interface 402 and the memory 403 communicate with each other via the communication bus 404. The memory 403 stores a computer program; by executing the program stored in the memory 403, the processor 401 can perform the steps of the method described in the above embodiments. The electronic device may further include other hardware according to its actual functions, which will not be described in detail.
Embodiments of the subject matter and functional operations described in this disclosure may be implemented in the following: digital electronic circuitry, tangibly embodied computer software or firmware, computer hardware including the structures disclosed in this application and structural equivalents thereof, or a combination of one or more of them. Embodiments of the subject matter described in this specification can be implemented as one or more computer programs, i.e., one or more modules of computer program instructions encoded on a tangible, non-transitory program carrier for execution by, or to control the operation of, data processing apparatus. Alternatively or additionally, the program instructions may be encoded on a manually-generated propagated signal, e.g., a machine-generated electrical, optical, or electromagnetic signal, that is generated to encode and transmit information to suitable receiver apparatus for execution by data processing apparatus. The computer storage medium may be a machine-readable storage device, a machine-readable storage substrate, a random or serial access memory device, or a combination of one or more of them.
The processes and logic flows described in this application can be performed by one or more programmable computers executing one or more computer programs to perform corresponding functions by operating on input data and generating output. The processes and logic flows can also be performed by, and apparatus can also be implemented as, special purpose logic circuitry, e.g., an FPGA (field programmable gate array) or an ASIC (application-specific integrated circuit).
Computers suitable for executing computer programs include, for example, general purpose and/or special purpose microprocessors, or any other type of central processing unit. Typically, the central processing unit will receive instructions and data from a read only memory and/or a random access memory. The essential elements of a computer include a central processing unit for carrying out or executing instructions and one or more memory devices for storing instructions and data. Typically, a computer will also include, or be operatively coupled to receive data from or transfer data to, or both, one or more mass storage devices for storing data, e.g., magnetic, magneto-optical disks, or optical disks, etc. However, a computer does not have to have such a device. Furthermore, the computer may be embedded in another device, such as a mobile phone, a Personal Digital Assistant (PDA), a mobile audio or video player, a game console, a Global Positioning System (GPS) receiver, or a portable storage device such as a Universal Serial Bus (USB) flash drive, to name a few.
Computer readable media suitable for storing computer program instructions and data include all forms of non-volatile memory, media and memory devices including, for example, semiconductor memory devices (e.g., EPROM, EEPROM, and flash memory devices), magnetic disks (e.g., internal hard disk or removable disks), magneto-optical disks, and CD-ROM and DVD-ROM disks. The processor and the memory can be supplemented by, or incorporated in, special purpose logic circuitry.
While this application contains many specific implementation details, these should not be construed as limitations on the scope of any application or of what may be claimed, but rather as features of specific embodiments of particular applications. Certain features that are described in this application in the context of separate embodiments can also be implemented in combination in a single embodiment. On the other hand, the various features described in the individual embodiments may also be implemented separately in the various embodiments or in any suitable subcombination. Furthermore, although features may be acting in certain combinations and even initially claimed as such, one or more features from a claimed combination can in some cases be excised from the combination, and the claimed combination may be directed to a subcombination or variation of a subcombination.
Similarly, although operations are depicted in the drawings in a particular order, this should not be understood as requiring that such operations be performed in the particular order shown or in sequential order, or that all illustrated operations be performed, to achieve desirable results. In some cases, multitasking and parallel processing may be advantageous. Moreover, the separation of various system modules and components in the embodiments described above should not be understood as requiring such separation in all embodiments, and it should be understood that the described program components and systems can generally be integrated together in a single software product or packaged into multiple software products.
Thus, particular embodiments of the subject matter have been described. Other embodiments are within the scope of the following claims. In some cases, the actions recited in the claims can be performed in a different order and still achieve desirable results. Furthermore, the processes depicted in the accompanying drawings need not be performed in the particular order shown, or in sequential order, to achieve desirable results. In some implementations, multitasking and parallel processing may be advantageous.
The foregoing are merely preferred embodiments of the application and are not intended to limit the application; any modification, equivalent replacement, improvement or the like made within the spirit and principles of the application shall fall within its scope of protection.

Claims (10)

1. An intelligent control method for a robot is characterized by comprising the following steps:
decomposing the current task into a subtask sequence through a large language model;
planning, before a corresponding subtask is executed, task execution information for the robot to execute the subtask; when the subtask is to move to a target position, planning a collision-free foothold point of the robot based on a global three-dimensional semantic map, and when the subtask is to grab a target object, planning a position and posture and a grabbing path of the robot based on a local three-dimensional semantic map;
the local three-dimensional semantic map is determined according to perception information of the working environment perceived by the robot during subtask execution; the perception information at least comprises: object attributes, object point clouds and environment point clouds in the working environment;
the global three-dimensional semantic map is a recorded global three-dimensional semantic map, or is generated by splicing the recorded global three-dimensional semantic map with the latest local three-dimensional semantic map.
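By way of illustration only, the following Python sketch shows one way the flow of claim 1 could be organized: an LLM-style decomposer splits the current task into subtasks, and each subtask is planned just before execution from the appropriate map. The Subtask class and the stubbed decompose_task / plan_subtask functions are hypothetical placeholders, not part of the application.

```python
from dataclasses import dataclass
from typing import List

@dataclass
class Subtask:
    kind: str     # "move_to" or "grab"
    target: str   # target position name or target object name

def decompose_task(current_task: str) -> List[Subtask]:
    # Stand-in for the large-language-model call; a real system would send the
    # task (plus a linguistic global map) to the LLM and parse its reply.
    if "cup" in current_task:
        return [Subtask("move_to", "table"), Subtask("grab", "cup")]
    return [Subtask("move_to", current_task)]

def plan_subtask(subtask: Subtask) -> dict:
    # Plan task execution information before the subtask runs: a collision-free
    # foothold from the global 3D semantic map for "move_to", or a position,
    # posture and grabbing path from the local 3D semantic map for "grab".
    if subtask.kind == "move_to":
        return {"foothold": (1.0, 2.0, 0.0)}
    return {"pose": (0.3, 0.1, 0.8), "path": ["approach", "close_gripper"]}

if __name__ == "__main__":
    for st in decompose_task("bring me the cup"):
        print(st, plan_subtask(st))
```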
2. The method of claim 1, wherein the recorded global three-dimensional semantic map is determined according to the steps of:
Periodically acquiring images; for each acquired image, obtaining perception information in the working environment of the robot according to the acquired image;
performing fusion processing based on the perception information to obtain a local three-dimensional semantic map, and updating the recorded local three-dimensional semantic map to the obtained local three-dimensional semantic map;
splicing the local three-dimensional semantic map with the recorded global three-dimensional semantic map to obtain a global three-dimensional semantic map, and updating the recorded global three-dimensional semantic map to the obtained global three-dimensional semantic map.
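A minimal sketch of the update cycle in claim 2, assuming the maps are stored as NumPy point arrays with per-point labels and that point-cloud registration is handled elsewhere; the function names below are illustrative, not the application's API.

```python
import numpy as np

# Recorded maps, kept as (N, 3) point arrays plus per-point semantic labels.
recorded_local = {"points": np.empty((0, 3)), "labels": []}
recorded_global = {"points": np.empty((0, 3)), "labels": []}

def fuse_perception(object_points, object_labels, env_points):
    """Fuse object point clouds, their labels and the environment point cloud
    into a local 3D semantic map."""
    points = np.vstack([object_points, env_points])
    labels = list(object_labels) + ["environment"] * len(env_points)
    return {"points": points, "labels": labels}

def splice(local_map, global_map):
    """Splice the latest local map into the recorded global map (plain
    concatenation here; a real system would align the clouds first)."""
    return {"points": np.vstack([global_map["points"], local_map["points"]]),
            "labels": global_map["labels"] + local_map["labels"]}

def on_new_image(object_points, object_labels, env_points):
    """Called for each periodically acquired image."""
    global recorded_local, recorded_global
    recorded_local = fuse_perception(object_points, object_labels, env_points)
    recorded_global = splice(recorded_local, recorded_global)
```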
3. The method of claim 2, wherein after splicing based on the local three-dimensional semantic map and the recorded global three-dimensional semantic map to obtain a global three-dimensional semantic map, the method further comprises:
compressing the global three-dimensional semantic map, and performing linguistic processing based on the compressed global three-dimensional semantic map to obtain a linguistic global map;
updating the recorded linguistic global map to the obtained linguistic global map.
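One plausible reading of the compression and linguistic-processing step in claim 3, sketched below as voxel downsampling followed by a plain-text summary of object centroids; the voxel size and the output format are assumptions made only for this sketch.

```python
import numpy as np

def compress(points: np.ndarray, voxel: float = 0.05) -> np.ndarray:
    """Compress a point cloud by voxel downsampling: keep one point per voxel."""
    keys = np.floor(points / voxel).astype(int)
    _, idx = np.unique(keys, axis=0, return_index=True)
    return points[np.sort(idx)]

def to_linguistic_map(objects: dict) -> str:
    """Turn per-object point clouds into a short textual description that an
    LLM can consume, e.g. "cup at (0.3, 1.2, 0.8); table at (0.0, 1.0, 0.7)"."""
    parts = []
    for name, pts in objects.items():
        cx, cy, cz = compress(pts).mean(axis=0)
        parts.append(f"{name} at ({cx:.1f}, {cy:.1f}, {cz:.1f})")
    return "; ".join(parts)
```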
4. The method of claim 1, wherein decomposing the current task into a sequence of subtasks via a large language model comprises:
if at least one target historical subtask associated with the current task exists in a recorded historical task set, inputting each target historical subtask, the current task and a recorded linguistic global map into the large language model as input data to obtain the subtask sequence;
otherwise, inputting the current task and the recorded linguistic global map into the large language model as input data to obtain the subtask sequence;
wherein the historical task set comprises at least one historical subtask that was executed by the robot within a specified time period and meets a set condition; any historical subtask at least carries an execution result indicating whether the historical subtask was executed successfully; and the set condition at least comprises that the task type is a target type.
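An illustrative way to assemble the LLM input described in claim 4; the history record format, the relatedness test and the prompt wording are all assumptions introduced only for this sketch.

```python
from typing import Dict, List

def build_llm_input(current_task: str, linguistic_map: str,
                    history: List[Dict]) -> str:
    """Include recorded historical subtasks only when they match the set
    condition (here: a target task type) and are associated with the current
    task (here: share a word with it); otherwise send the task and map alone."""
    related = [h for h in history
               if h.get("type") == "target"
               and any(w in h["description"] for w in current_task.split())]
    lines = [f"Environment: {linguistic_map}", f"Task: {current_task}"]
    if related:
        lines.append("Relevant history:")
        lines += [f"- {h['description']} -> "
                  f"{'succeeded' if h['success'] else 'failed'}" for h in related]
    lines.append("Decompose the task into an ordered list of subtasks.")
    return "\n".join(lines)
```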
5. The method according to claim 1, characterized in that the method further comprises:
for any subtask, obtaining an execution result of the robot executing the subtask;
if the execution result indicates that execution of the subtask has failed and the total number of execution failures of the subtask has not reached a set number, re-planning, according to the recorded linguistic global map and/or the recorded global three-dimensional semantic map, task execution information for the robot to execute the subtask;
if the total number of execution failures of the subtask has reached the set number, decomposing the current task again, through the large language model, into a new subtask sequence according to the recorded linguistic global map and the execution result indicating that execution of the subtask has failed, and continuing with the step of planning, before a corresponding subtask is executed, task execution information for the robot to execute the subtask.
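The failure handling of claim 5 can be pictured as the small state machine below; the failure budget, the Result type and the replan/redecompose callbacks are placeholders for whatever the application actually uses.

```python
from dataclasses import dataclass

MAX_FAILURES = 3  # hypothetical "set number of times"

@dataclass
class Result:
    subtask_id: str
    success: bool

failure_counts: dict = {}

def handle_result(result: Result, replan, redecompose) -> str:
    """Re-plan the same subtask while the failure budget lasts; once it is
    spent, feed the failure back to the LLM and re-decompose the whole task."""
    if result.success:
        return "continue"
    failure_counts[result.subtask_id] = failure_counts.get(result.subtask_id, 0) + 1
    if failure_counts[result.subtask_id] < MAX_FAILURES:
        replan(result.subtask_id)   # uses linguistic global map and/or global 3D map
        return "retry"
    redecompose(result)             # LLM produces a new subtask sequence
    return "restart"
```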
6. The method of claim 1, wherein the planning a collision-free foothold point of the robot based on the global three-dimensional semantic map comprises:
determining a point cloud of the target position in the global three-dimensional semantic map based on the global three-dimensional semantic map;
determining a sampling range based on the point cloud of the target position and attribute information of the robot; uniformly sampling, at intervals, the point cloud located within the sampling range in the global three-dimensional semantic map to obtain a plurality of sampling points;
for any sampling point, determining whether an obstacle point cloud exists in the point cloud located within a first target range in the global three-dimensional semantic map, the first target range being determined depending on the sampling point;
if no obstacle point cloud exists, determining the sampling point as a reference collision-free foothold point;
determining a collision-free foothold point of the robot based on each reference collision-free foothold point.
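A rough sketch of the foothold search in claim 6, assuming a flat floor, a circular robot footprint and hand-picked thresholds; none of these numbers come from the application.

```python
from typing import Optional
import numpy as np

def plan_foothold(global_points: np.ndarray, target: np.ndarray,
                  robot_radius: float = 0.3, step: float = 0.1) -> Optional[np.ndarray]:
    """Sample candidate footholds at uniform intervals around the target and
    keep those whose neighbourhood (the "first target range", here a disc of
    robot_radius) contains no obstacle points above the floor."""
    candidates = []
    for dx in np.arange(-1.0, 1.0 + step, step):          # sampling range derived from
        for dy in np.arange(-1.0, 1.0 + step, step):      # the target and robot footprint
            p = target + np.array([dx, dy, 0.0])
            dist = np.linalg.norm(global_points[:, :2] - p[:2], axis=1)
            blocked = np.any((dist < robot_radius) & (global_points[:, 2] > 0.05))
            if not blocked:
                candidates.append(p)                      # reference collision-free foothold
    if not candidates:
        return None
    candidates = np.array(candidates)
    # Pick the reference foothold closest to the target as the final foothold.
    return candidates[np.argmin(np.linalg.norm(candidates - target, axis=1))]
```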
7. The method of claim 1, wherein the planning a position and posture and a grabbing path of the robot based on the local three-dimensional semantic map comprises:
planning the position and posture of the robot based on a point cloud of the target object in the local three-dimensional semantic map;
planning the grabbing path of the robot based on the point cloud of the target object and the point cloud located within a second target range in the local three-dimensional semantic map, the second target range being determined depending on the point cloud of the target object.
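Claim 7 could be realized along the lines below: the grasp target is taken from the object's point cloud and a straight approach path is checked for clearance against points in the surrounding region; the offsets and the clearance threshold are illustrative assumptions.

```python
from typing import Optional
import numpy as np

def plan_grasp(object_points: np.ndarray, nearby_points: np.ndarray,
               approach_offset: float = 0.15) -> Optional[dict]:
    """Derive a robot position/posture from the object's point cloud and a
    straight-line grabbing path, rejecting it if points in the "second target
    range" (nearby_points) come too close to the path."""
    grasp_point = object_points.mean(axis=0)                      # crude grasp centre
    approach = grasp_point + np.array([0.0, 0.0, approach_offset])
    for t in np.linspace(0.0, 1.0, 10):                           # clearance check
        waypoint = approach + t * (grasp_point - approach)
        if nearby_points.size and np.any(
                np.linalg.norm(nearby_points - waypoint, axis=1) < 0.03):
            return None                                           # path blocked
    return {"pose": approach, "path": [approach, grasp_point]}
```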
8. An intelligent control system for a robot, the system comprising: the system comprises a decision module, a perception module and at least one planning module;
the decision module is used for decomposing the current task into a subtask sequence through a large language model, and distributing a corresponding planning module for each subtask in the subtask sequence;
any planning module is used for planning, before a corresponding subtask is executed, task execution information for the robot to execute the subtask; when the subtask is to move to a target position, the planning module allocated to the subtask plans a collision-free foothold point of the robot based on the global three-dimensional semantic map, and when the subtask is to grab a target object, the planning module allocated to the subtask plans a position and posture and a grabbing path of the robot based on the local three-dimensional semantic map;
The local three-dimensional semantic map is determined by the perception module according to perception information in a working environment perceived by the robot in the subtask execution process; the perception information at least comprises: object attributes, object point clouds and environment point clouds in the working environment;
the global three-dimensional semantic map is a recorded global three-dimensional semantic map, or is generated by splicing the recorded global three-dimensional semantic map with the latest local three-dimensional semantic map obtained by the perception module.
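Read as software, the module split of claim 8 might look like the sketch below, with one decision module dispatching subtasks to per-kind planning modules; the class names and the stubbed planners are hypothetical.

```python
class FootholdPlanner:
    def plan(self, subtask, global_map):
        return {"foothold": (1.0, 2.0, 0.0)}          # placeholder result

class GraspPlanner:
    def plan(self, subtask, local_map):
        return {"pose": (0.3, 0.1, 0.8), "path": ["approach", "close"]}  # placeholder

class DecisionModule:
    """Allocates a planning module to each subtask according to its kind."""
    def __init__(self):
        self.planners = {"move_to": FootholdPlanner(), "grab": GraspPlanner()}

    def dispatch(self, subtasks):
        return [(st, self.planners[st["kind"]]) for st in subtasks]
```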
9. An electronic device, the electronic device comprising:
a processor; and
A memory in which computer program instructions are stored which, when executed by the processor, cause the processor to perform the steps of the method of any one of claims 1 to 7.
10. A computer readable storage medium, characterized in that it has stored thereon computer program instructions which, when executed by a processor, cause the processor to perform the steps of the method according to any of claims 1 to 7.
CN202410309152.3A 2024-03-18 2024-03-18 Robot intelligent control method, system, electronic equipment and medium Pending CN118123799A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202410309152.3A CN118123799A (en) 2024-03-18 2024-03-18 Robot intelligent control method, system, electronic equipment and medium

Publications (1)

Publication Number Publication Date
CN118123799A true CN118123799A (en) 2024-06-04

Family

ID=91245311

Country Status (1)

Country Link
CN (1) CN118123799A (en)

Similar Documents

Publication Publication Date Title
US10376117B2 (en) Apparatus and methods for programming and training of robotic household appliances
Dömel et al. Toward fully autonomous mobile manipulation for industrial environments
US11580724B2 (en) Virtual teach and repeat mobile manipulation system
Premebida et al. Intelligent robotic perception systems
Saxena et al. Robotic grasping of novel objects using vision
CN110785268A (en) Machine learning method and device for semantic robot grabbing
CN107206592A (en) Special purpose robot's motion planning hardware and production and preparation method thereof
Araki et al. MT-DSSD: Deconvolutional single shot detector using multi task learning for object detection, segmentation, and grasping detection
Asadi et al. Automated object manipulation using vision-based mobile robotic system for construction applications
Raessa et al. Teaching a robot to use electric tools with regrasp planning
CN116494201A (en) Monitoring integrated power machine room inspection robot and unmanned inspection method
Silva et al. Navigation and obstacle avoidance: A case study using Pepper robot
Spranger et al. Human-machine interface for remote training of robot tasks
Militaru et al. Object handling in cluttered indoor environment with a mobile manipulator
US20190321969A1 (en) Method and robotic system for manipluating instruments
Lunenburg et al. Tech united eindhoven team description 2012
Schnaubelt et al. Autonomous assistance for versatile grasping with rescue robots
CN118123799A (en) Robot intelligent control method, system, electronic equipment and medium
Lim et al. Human-robot collaboration and safety management for logistics and manipulation tasks
Kyrarini et al. Robot learning of object manipulation task actions from human demonstrations
Ma et al. A learning from demonstration framework for adaptive task and motion planning in varying package-to-order scenarios
Liang et al. Visual reconstruction and localization-based robust robotic 6-DoF grasping in the wild
US20230084774A1 (en) Learning from Demonstration for Determining Robot Perception Motion
Morrison Robotic grasping in unstructured and dynamic environments
US20240198530A1 (en) High-level sensor fusion and multi-criteria decision making for autonomous bin picking

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination