CN112784664A - Semantic map construction and operation method, autonomous mobile device and storage medium - Google Patents

Semantic map construction and operation method, autonomous mobile device and storage medium

Info

Publication number
CN112784664A
CN112784664A (application number CN202010852364.8A)
Authority
CN
China
Prior art keywords
semantic
information
semantic information
area
environment
Prior art date
Legal status
Withdrawn
Application number
CN202010852364.8A
Other languages
Chinese (zh)
Inventor
蔡瑞莹 (Cai Ruiying)
鲍亮 (Bao Liang)
Current Assignee
Ecovacs Robotics Suzhou Co Ltd
Original Assignee
Ecovacs Robotics Suzhou Co Ltd
Application filed by Ecovacs Robotics Suzhou Co Ltd
Publication of CN112784664A

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/10: Terrestrial scenes
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/20: Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F 16/29: Geographical information databases
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 20/00: Scenes; Scene-specific elements
    • G06V 20/50: Context or environment of the image
    • G06V 20/56: Context or environment of the image exterior to a vehicle by using sensors mounted on the vehicle
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06V: IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V 2201/00: Indexing scheme relating to image or video recognition or understanding
    • G06V 2201/07: Target detection

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Multimedia (AREA)
  • Databases & Information Systems (AREA)
  • Remote Sensing (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • Manipulator (AREA)

Abstract

The embodiment of the application provides a semantic map construction and operation method, an autonomous mobile device, and a storage medium. In the embodiment of the application, environment information of the working area of the autonomous mobile device is collected, and target detection and semantic recognition are performed on the environment information to obtain semantic information of the objects contained in the working area; further, scene semantic information of the working area is identified according to the semantic information of the objects it contains, and a semantic map containing the scene semantic information of the working area is constructed.

Description

Semantic map construction and operation method, autonomous mobile device and storage medium
Technical Field
The application relates to the technical field of artificial intelligence, in particular to a semantic map construction and operation method, autonomous mobile equipment and a storage medium.
Background
With the development of artificial intelligence technology, home appliances are becoming increasingly intelligent. For example, a sweeping robot can automatically perform floor sweeping tasks, freeing users from this cleaning work. Some robots can construct an environment map in real time using sensors such as a camera or a radar, and use the environment map for autonomous positioning and navigation.
However, existing environment maps contain only numerical information, which is neither intuitive nor easy for users to understand. A semantic map fuses user-friendly "semantic information" into the environment map and has therefore attracted wide attention in fields such as computer vision and machine intelligence. However, the semantic information contained in existing semantic maps is relatively simple and cannot meet users' needs.
Disclosure of Invention
Aspects of the present application provide a semantic map construction and operation method, an autonomous mobile device, and a storage medium, so as to enrich the semantic information contained in a semantic map and better meet users' needs for semantic maps.
The embodiment of the application provides a semantic map construction method, which is suitable for autonomous mobile equipment and comprises the following steps: collecting environmental information of a working area; carrying out target detection on the environmental information to obtain semantic information of at least one object contained in the working area; identifying scene semantic information of the operation area according to the semantic information of at least one object; and constructing a semantic map corresponding to the operation area according to the scene semantic information of the operation area.
The embodiment of the application also provides a semantic map construction method, which is suitable for autonomous mobile equipment and comprises the following steps: collecting environmental information of a working environment, wherein the working environment comprises at least one working partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of a plurality of objects contained in the operation environment; and constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition.
The embodiment of the application also provides a semantic map construction method, which is suitable for a server and comprises the following steps: receiving environment information of a working area reported by the autonomous mobile equipment; carrying out target detection on the environment information to obtain semantic information of at least one object contained in the working area; identifying scene semantic information of the working area according to the semantic information of at least one object; according to the scene semantic information of the working area, constructing a semantic map corresponding to the working area; and sending the semantic map to the autonomous mobile equipment so that the autonomous mobile equipment can execute the operation task based on the semantic map.
The embodiment of the application also provides a semantic map construction method, which is suitable for a server and comprises the following steps: receiving environment information of a working environment reported by an autonomous mobile device, wherein the working environment comprises at least one working partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of a plurality of objects contained in the operation environment; constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition; and sending the semantic map to the autonomous mobile equipment so that the autonomous mobile equipment can execute the operation task based on the semantic map.
An embodiment of the present application further provides an operating method, which is applicable to an autonomous mobile device, and includes: receiving a work instruction, wherein the work instruction comprises scene semantic information of a target work area; determining the position of the target operation area in a pre-established semantic map according to the scene semantic information of the target operation area; and moving to the target working area based on the position, and executing the working task in the target working area.
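As a rough illustration of this operating flow, a minimal Python sketch follows; the instruction format, the map layout, and the helpers navigate_to() and execute_task() are assumptions made for the example and are not taken from the application.

```python
# Hedged sketch of the operating flow summarised above; the instruction format,
# map layout, and the navigate_to()/execute_task() helpers are illustrative assumptions.
from typing import Dict, Tuple

semantic_map: Dict[str, Tuple[float, float]] = {   # scene label -> partition centre (metres)
    "kitchen": (2.0, 1.5),
    "living room": (7.0, 2.5),
}

def navigate_to(position: Tuple[float, float]) -> None:
    """Placeholder for the device's own path planning and motion control."""

def execute_task(scene: str) -> None:
    """Placeholder for the actual work task, e.g. starting a sweeping routine."""

def handle_work_instruction(instruction: dict) -> None:
    scene = instruction["target_scene"]          # scene semantic information, e.g. "kitchen"
    position = semantic_map.get(scene)           # look up the target work area in the semantic map
    if position is None:
        raise ValueError(f"scene '{scene}' not found in the semantic map")
    navigate_to(position)                        # move to the target work area
    execute_task(scene)                          # execute the work task there

handle_work_instruction({"target_scene": "kitchen"})
```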
The embodiment of the present application further provides an operation method, which is applicable to a terminal device, and the method includes: displaying a semantic map corresponding to the working environment of the autonomous mobile device, wherein the semantic map comprises at least one working area and scene semantic information thereof; responding to the selection operation of a user on at least one operation area, and generating an operation instruction, wherein the operation instruction comprises a target operation area selected by the user and scene semantic information of the target operation area; and sending the operation instruction to the autonomous mobile equipment so as to control the autonomous mobile equipment to execute the operation task in the target operation area.
An embodiment of the present application further provides an autonomous mobile device, including: the device comprises a device body, wherein one or more processors and one or more memories for storing computer instructions are arranged on the device body; one or more processors to execute computer instructions to: collecting environmental information of a working area; carrying out target detection on the environmental information to obtain semantic information of at least one object contained in the working area; identifying scene semantic information of the operation area according to the semantic information of at least one object; and according to the scene semantic information of the operation area, constructing a semantic map corresponding to the operation area.
An embodiment of the present application further provides an autonomous mobile device, including: the device comprises a device body, wherein one or more processors and one or more memories for storing computer instructions are arranged on the device body; one or more processors to execute computer instructions to: collecting environmental information of a working environment, wherein the working environment comprises at least one working partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of a plurality of objects contained in the operation environment; and constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition.
An embodiment of the present application further provides an autonomous mobile device, including: the device comprises a device body, wherein one or more processors and one or more memories for storing computer instructions are arranged on the device body; one or more processors to execute computer instructions to: receiving a work instruction, wherein the work instruction comprises scene semantic information of a target work area; determining the position of the target operation area in a pre-established semantic map according to the scene semantic information of the target operation area; and controlling the autonomous mobile equipment to move to the target working area based on the position, and executing the working task in the target working area.
An embodiment of the present application further provides a server, including: one or more processors, a communications component, and one or more memories storing computer instructions; one or more processors to execute computer instructions to: receiving environment information of an operation area reported by the autonomous mobile equipment through a communication component; carrying out target detection on the environment information to obtain semantic information of at least one object contained in the operation area; identifying scene semantic information of the operation area according to the semantic information of at least one object; according to the scene semantic information of the operation area, constructing a semantic map corresponding to the operation area; the semantic map is sent to the autonomous mobile device through the communication component for the autonomous mobile device to perform the job task based on the semantic map.
An embodiment of the present application further provides a server, including: one or more processors, a communications component, and one or more memories storing computer instructions; one or more processors to execute computer instructions to: receiving environment information of a working environment reported by the autonomous mobile equipment through a communication component, wherein the working environment comprises at least one working partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of a plurality of objects contained in the operation environment; and constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition. The semantic map is sent to the autonomous mobile device through the communication component for the autonomous mobile device to perform the job task based on the semantic map.
An embodiment of the present application further provides a terminal device, including: one or more processors, a display, a communications component, and one or more memories storing computer instructions; the display is used for displaying a semantic map corresponding to the working environment of the autonomous mobile equipment, and the semantic map comprises at least one working area and scene semantic information thereof; one or more processors to execute computer instructions to: responding to the selection operation of a user on at least one operation area, and generating an operation instruction, wherein the operation instruction comprises a target operation area selected by the user and scene semantic information of the target operation area; and sending the work instruction to the autonomous mobile equipment through the communication component so as to control the autonomous mobile equipment to execute the work task in the target work area.
Embodiments of the present application also provide a computer-readable storage medium storing computer instructions, which, when executed by one or more processors, cause the one or more processors to perform the steps in the semantic map construction method or the operation method embodiments provided in the embodiments of the present application.
In the embodiment of the application, environment information of the working area of the autonomous mobile device is collected, and target detection and semantic recognition are performed on the environment information to obtain semantic information of the objects contained in the working area; further, scene semantic information of the working area is identified according to the semantic information of those objects, and a semantic map containing the scene semantic information of the working area is constructed.
Drawings
The accompanying drawings, which are included to provide a further understanding of the application and are incorporated in and constitute a part of this application, illustrate embodiment(s) of the application and together with the description serve to explain the application and not to limit the application. In the drawings:
fig. 1a is a schematic flowchart of a semantic map construction method according to an exemplary embodiment of the present disclosure;
fig. 1b is a schematic diagram of the defined relationships between objects and area scenes learned by a Bayesian network model according to an embodiment of the present application;
FIG. 1c is a schematic flow chart illustrating another semantic map construction method according to an exemplary embodiment of the present disclosure;
fig. 2a is a schematic flowchart of a method for constructing a semantic map according to an exemplary embodiment of the present disclosure;
FIG. 2b is a schematic flowchart illustrating a method for constructing a semantic map according to an exemplary embodiment of the present disclosure;
fig. 2c is a schematic diagram of a semantic map constructed according to an exemplary embodiment of the present application;
FIG. 3a is a schematic flow chart of a method of operation provided in an exemplary embodiment of the present application;
FIG. 3b is a schematic flow chart diagram illustrating another method of operation provided by an exemplary embodiment of the present application;
fig. 4a is a schematic structural diagram of an autonomous mobile apparatus according to an exemplary embodiment of the present application;
fig. 4b is a schematic structural diagram of another autonomous mobile device provided in an exemplary embodiment of the present application;
fig. 4c is a schematic structural diagram of another autonomous mobile apparatus provided in an exemplary embodiment of the present application;
fig. 5 is a schematic structural diagram of a sweeping robot according to an exemplary embodiment of the present disclosure;
fig. 6 is a schematic structural diagram of a server according to an exemplary embodiment of the present application;
fig. 7 is a schematic structural diagram of a terminal device according to an exemplary embodiment of the present application.
Detailed Description
In order to make the objects, technical solutions and advantages of the present application more apparent, the technical solutions of the present application will be described in detail and completely with reference to the following specific embodiments of the present application and the accompanying drawings. It should be apparent that the described embodiments are only some of the embodiments of the present application, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present application.
The semantic information contained in existing semantic maps is relatively simple and cannot meet users' needs. In view of this technical problem, in some embodiments of the application, environment information of the working area of the autonomous mobile device is collected, and target detection and semantic recognition are performed on the environment information to obtain semantic information of the objects contained in the working area; further, scene semantic information of the working area is identified according to the semantic information of those objects, and a semantic map containing the scene semantic information of the working area is constructed.
It is noted that the semantic map construction method provided in the embodiments of the present application may be implemented by an autonomous mobile device or by a server. These implementations are described in detail below in separate embodiments with reference to figures 1a-2b.
Before the embodiments of the present application are explained in detail, the "autonomous mobile device" in the embodiments of the present application is explained. The explanation is applicable to all the embodiments of the present application, and repeated explanation will not be provided in the following embodiments.
In the embodiments of the present application, the autonomous mobile device may be any mechanical device capable of moving through its environment with a high degree of autonomy, for example a robot, a cleaner, an unmanned vehicle, or the like. The robot may include a sweeping robot, an accompanying robot, a guiding robot, or the like.
Fig. 1a is a schematic flowchart of a semantic map construction method according to an exemplary embodiment of the present disclosure. The method is applicable to autonomous mobile devices, as shown in fig. 1a, and comprises the following steps:
10a, collecting environmental information of a working area by autonomous mobile equipment;
11a, carrying out target detection on the environmental information to obtain semantic information of at least one object contained in the working area;
12a, identifying scene semantic information of the working area according to the semantic information of at least one object;
and 13a, constructing a semantic map corresponding to the operation area according to the scene semantic information of the operation area.
The work area of the autonomous mobile device refers to the environmental area in which the autonomous mobile device is located while executing work tasks, and it generally contains at least one object. Different implementation forms of the autonomous mobile device correspond to different work areas, and different work areas contain different objects. The following examples illustrate this:
for example, if the autonomous mobile device is a sweeping robot, the work area may be a kitchen, a living room, a bedroom, or the like. A kitchen area may include, but is not limited to: an electric rice cooker, a microwave oven, a refrigerator, etc.; a bedroom area may include, but is not limited to: a bed, a wardrobe, an air conditioner, a bedside lamp, etc.; a living room area may include, but is not limited to: a television, a tea table, a sofa, etc.
In this embodiment, the autonomous mobile device may acquire environmental information of a working area of the autonomous mobile device, and perform target detection on the acquired environmental information to obtain at least one object included in the working area. In this embodiment, the autonomous mobile device may acquire the environmental information of the operation area in real time during the operation process, or may adopt an offline acquisition mode.
The autonomous mobile device may use one or more sensors to collect environmental information of the work area. These sensors may be mounted on the autonomous mobile device, or may be separate from but communicatively coupled to it. The sensors may be of various types, such as a monocular camera, a binocular camera, a lidar, a microwave radar, and an infrared radar. The type of environmental information collected depends on the type of sensor; if the sensor is a visual sensor such as a camera, the acquired environmental information is an environment image.
Constructing semantic maps is an important problem in fields such as computer vision and machine intelligence, and autonomous mobile devices need to construct them. In this embodiment, target detection is first performed on the collected environment information so that the at least one object contained in the working area can be detected; semantic recognition can additionally be performed on the detected object(s) to obtain their semantic information. The scenized semantic information of the working area can then be identified according to the semantic information of the objects the working area contains, and a semantic map containing that scenized semantic information can be constructed. Compared with existing semantic maps, the semantic map constructed in this embodiment has a higher semantic level, facilitates interactive control of the autonomous mobile device, improves user experience, and can better meet users' needs for semantic maps.
In the present embodiment, semantic information of an object refers to information, expressed in natural language, that conveys what the object is or what category it belongs to; it may include, but is not limited to, the name, shape, and position of the object. Similarly, scenized semantic information of a work area refers to information, expressed in natural language, that conveys what scene the work area belongs to, for example the name or description of that scene.
The embodiments of the present application do not limit the manner in which target detection is performed on the environment information of the work area. Any embodiment that can detect the at least one object contained in the work area from the environment information and recognize its semantic information is applicable. An alternative embodiment is given below.
In an alternative embodiment, the environmental information may be subject to object detection and semantic recognition using an object detection model to obtain semantic information of the at least one object. In this optional embodiment, the target detection model has both the functions of target detection and semantic recognition, and can recognize the object and its position included in the environmental information and perform semantic recognition on the detected object to obtain the semantic information of the object. All models with the functions of target detection and semantic recognition can be used as the target detection model of the embodiment. In the embodiment of the present application, a Convolutional Neural Network (CNN) model may be adopted, and the following brief description is made on the operation principle of the CNN model in the embodiment of the present application:
the CNN model provides functions such as classification, target detection, and semantic segmentation. In this embodiment, the collected environment information is input into the CNN model: the convolutional layers perform preliminary feature extraction on the input, the pooling layers extract the main features from those preliminary features, the fully connected layer aggregates the features of each part, and the classifier performs prediction and identification on the aggregated features. Taking environment information containing a television, a tea table, and a sofa as an example, the CNN model uses the convolutional layers to extract local features of the environment information, uses the pooling layers to extract the main features from those local features and determine that three objects are present, uses the fully connected layer to aggregate the main features and identify attributes such as the shapes, positions, and sizes of the three objects, and finally uses the classifier to identify the three objects as a television, a tea table, and a sofa.
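As a rough illustration of such a detection-plus-recognition pipeline, the sketch below uses a pretrained torchvision detector as a stand-in for the target detection model; the application does not specify any particular framework, and the label subset is an assumption made for the example.

```python
# Minimal sketch: a pretrained torchvision detector standing in for the
# (unspecified) target detection model; the label subset is illustrative.
import torch
import torchvision
from torchvision.transforms.functional import to_tensor
from PIL import Image

HOME_LABELS = {63: "couch", 72: "tv", 73: "laptop"}  # subset of COCO labels, for illustration

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(image_path: str, score_threshold: float = 0.6):
    """Return (name, bounding box, score) triples as object-level semantic information."""
    image = to_tensor(Image.open(image_path).convert("RGB"))
    with torch.no_grad():
        prediction = model([image])[0]
    results = []
    for label, box, score in zip(prediction["labels"], prediction["boxes"], prediction["scores"]):
        if float(score) >= score_threshold and int(label) in HOME_LABELS:
            results.append((HOME_LABELS[int(label)], box.tolist(), float(score)))
    return results
```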
Before the target detection model is used, it may be trained in advance on labeled environment information, that is, environment information in which the positions and semantic information of the contained objects have been annotated. For example, one piece of labeled environment information contains furniture and appliances such as a tea table, a television, and a sofa; another contains a bed, a bedside cabinet, and a wardrobe; and another contains a refrigerator, cabinets, and the like. A network model such as a CNN model can be trained on such labeled environment information.
In some embodiments of the present application, the defined relationships between certain objects and area scenes may be learned in advance. On this basis, an embodiment of recognizing the scenized semantic information of the work area according to the semantic information of the at least one object includes: recognizing the scenized semantic information of the work area according to the semantic information of the at least one object, in combination with the pre-learned defined relationships between objects and area scenes.
In an optional embodiment, if the pre-learned defined relationship between objects and area scenes is a mapping between object semantic information and area scenes, the semantic information of the at least one object can be matched against that mapping, and the matched area scene is taken as the scenized semantic information of the work area.
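A minimal sketch of this mapping-based matching is given below; the scene labels and object sets are illustrative assumptions, not values taken from the application.

```python
# Sketch of matching object semantic information against a pre-learned
# object-to-scene mapping (all labels are illustrative).
SCENE_OBJECT_MAPPING = {
    "kitchen": {"refrigerator", "cabinet", "microwave oven", "rice cooker"},
    "living room": {"sofa", "tea table", "television"},
    "bedroom": {"bed", "wardrobe", "bedside lamp"},
}

def match_scene(detected_objects):
    """Return the area scene whose object set overlaps the detections the most."""
    detected = set(detected_objects)
    best_scene, best_overlap = None, 0
    for scene, objects in SCENE_OBJECT_MAPPING.items():
        overlap = len(detected & objects)
        if overlap > best_overlap:
            best_scene, best_overlap = scene, overlap
    return best_scene

print(match_scene(["sofa", "television"]))  # -> "living room"
```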
In another optional embodiment, a semantic recognition model may be trained in advance using a machine learning method; such a model learns, and thus embodies, the defined relationships between objects and area scenes. On this basis, recognizing the scenized semantic information of the work area according to the semantic information of the at least one object, in combination with the pre-learned defined relationships between objects and area scenes, includes: inputting the semantic information of the at least one object into the pre-trained semantic recognition model, and using the semantic recognition model to recognize the scenized semantic information of the work area.
In the embodiments of the present application, the semantic recognition model is not limited; any machine learning model that can represent the defined relationship between objects and area scenes is applicable. For example, the semantic recognition model may be a Bayesian network model. A Bayesian network model is a probabilistic network model whose basis is the Bayes formula; it is a mathematical model for probabilistic reasoning, in which probability information about some variables is used to obtain probability information about others. For example, given the probability that an object such as a refrigerator or a cabinet appears, the prior probability that an area scene is a kitchen, and the conditional probability that a refrigerator or a cabinet appears in a kitchen, the probability that a partition is a kitchen given that a refrigerator appears in that partition can be obtained.
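The Bayes-rule computation described above can be written out directly; the probabilities below are made-up values used only to show the arithmetic.

```python
# Worked Bayes-rule example with illustrative (made-up) probabilities.
p_kitchen = 0.2                 # prior probability that a partition is a kitchen
p_fridge = 0.25                 # marginal probability that a refrigerator is observed
p_fridge_given_kitchen = 0.9    # conditional probability of a refrigerator given a kitchen

p_kitchen_given_fridge = p_fridge_given_kitchen * p_kitchen / p_fridge
print(f"P(kitchen | refrigerator observed) = {p_kitchen_given_fridge:.2f}")  # 0.72
```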
Fig. 1b is a schematic diagram of the defined relationships between objects and area scenes learned by a simple Bayesian network model. The Bayesian network model shown in fig. 1b can represent the defined relationships between objects such as a sofa, a ceiling, and a refrigerator and the area scene "living room"; between objects such as a sliding door and the area scene "balcony"; between objects such as a sliding door, a ceiling, a refrigerator, and cabinets and the area scene "kitchen"; between objects such as a ceiling and a toilet and the area scene "bathroom"; and between objects such as a bed and a wardrobe and the area scene "bedroom". The defined relationships shown in fig. 1b are merely exemplary and are not limiting.
With reference to the defined relationships between objects and area scenes shown in fig. 1b, suppose the objects identified from the environment information of a certain working area include a cabinet and a refrigerator. The two pieces of semantic information, cabinet and refrigerator, are input into the Bayesian network model; inside the model, based on the learned defined relationships, the working area is judged with high probability to be a kitchen, so the output scenized semantic information of the working area is "kitchen".
Before the Bayesian network model is used, it may be trained in advance. The training process includes: acquiring a plurality of labeled samples, each containing the scene semantic information of a sample area and the semantic information of the sample objects contained in that area; obtaining, from the labeled samples, the prior probability of each sample object appearing in each sample area; and training the Bayesian network model according to these prior probabilities. The trained model has learned the defined relationships between objects and area scenes: when the semantic information of the objects contained in a working area is fed into it, the model outputs the area scene to which those objects belong, that is, the scene semantic information of the working area in which the objects are located.
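As a rough sketch of this training-and-inference idea, the naive-Bayes-style code below estimates per-scene object occurrence probabilities from a handful of labeled samples and scores candidate scenes; it is a simplified stand-in for a full Bayesian network, and all sample data are assumptions.

```python
# Simplified naive-Bayes-style stand-in for the Bayesian network described above
# (the labeled samples and the vocabulary size are illustrative assumptions).
from collections import Counter, defaultdict

labeled_samples = [             # (scene label, objects observed in the sample area)
    ("kitchen", ["refrigerator", "cabinet", "sliding door", "ceiling"]),
    ("living room", ["sofa", "television", "ceiling", "refrigerator"]),
    ("bedroom", ["bed", "wardrobe", "ceiling"]),
    ("bathroom", ["toilet", "ceiling"]),
]

scene_counts = Counter(scene for scene, _ in labeled_samples)
object_counts = defaultdict(Counter)
for scene, objects in labeled_samples:
    object_counts[scene].update(objects)

def score_scene(scene, observed_objects, smoothing=1.0, vocab_size=20):
    """Unnormalised P(scene) * prod P(object | scene), with Laplace smoothing."""
    score = scene_counts[scene] / sum(scene_counts.values())   # prior probability of the scene
    for obj in observed_objects:
        score *= (object_counts[scene][obj] + smoothing) / (
            sum(object_counts[scene].values()) + smoothing * vocab_size)
    return score

observed = ["refrigerator", "cabinet"]
print(max(scene_counts, key=lambda s: score_scene(s, observed)))  # -> "kitchen"
```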
For example, in a home environment, environment information of each work area such as the kitchen, living room, bedroom, bathroom, and balcony may be collected in advance, and the semantic information of the objects contained in each work area may be identified and labeled: the kitchen contains objects such as a refrigerator, cabinets, a sliding door, and a ceiling; the living room contains objects such as a sofa, a ceiling, and a refrigerator; the balcony contains objects such as a sliding door and a clothes hanger; the bathroom contains objects such as a ceiling and a toilet; and the bedroom contains objects such as a bed and a wardrobe. These work areas and the objects they contain are taken as sample areas and sample objects, respectively, and model training is performed according to the scene semantic information of the sample areas and the semantic information of the sample objects contained in them, obtaining the Bayesian network model.
The semantic map of the work area constructed in the embodiments of the present application may include the semantic information of the at least one object in the work area, the scenized semantic information corresponding to the work area, or a combination of the two. For example, a semantic map of a home environment may label each piece of furniture on the map, may label the name of each room, or may label both the furniture and the room names; all three cases are semantic maps of the home environment. On this basis, in an optional embodiment, constructing the semantic map corresponding to the work area according to its scenized semantic information includes: constructing the semantic map corresponding to the work area according to the scenized semantic information of the work area and the semantic information of the at least one object. Such a semantic map contains two layers of semantic information, covering both objects and the work area, which makes the semantic information richer and better satisfies users' needs for semantic maps.
In addition, the semantic map may include the position information, shape, and other attributes of the at least one object within the work area. On this basis, a detailed embodiment of constructing the semantic map corresponding to the work area includes: constructing the semantic map corresponding to the work area according to the scenized semantic information of the work area and the semantic information of the at least one object, in combination with the position information of the at least one object within the work area.
In the process of performing target detection and semantic recognition on the environment information of the working area, the position information of the at least one object contained in the working area can also be detected. On this basis, in the process of constructing the semantic map, an environment map corresponding to the working area can be constructed in combination with the position information of the at least one object within the working area; the scenized semantic information of the working area and the semantic information of the at least one object are then marked in the environment map to obtain the semantic map corresponding to the working area. Such a semantic map embodies both the environment information and the semantic information of the working area: the environment information helps the autonomous mobile device execute various tasks and solve problems such as path planning, while the semantic information facilitates communication between the user and the autonomous mobile device, further meeting users' needs.
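One possible in-memory layout for such a two-layer semantic map is sketched below; the class and field names are illustrative assumptions rather than structures defined by the application.

```python
# Possible in-memory layout for the two-layer semantic map (names are illustrative).
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class ObjectSemantics:
    name: str                        # e.g. "sofa"
    position: Tuple[float, float]    # location in the environment map (metres)

@dataclass
class SemanticMap:
    occupancy_grid: List[List[int]]              # the underlying environment map
    scene_label: str                             # scenized semantic information, e.g. "living room"
    objects: List[ObjectSemantics] = field(default_factory=list)

    def annotate(self, name: str, position: Tuple[float, float]) -> None:
        """Mark an object's semantic information and position on the map."""
        self.objects.append(ObjectSemantics(name, position))

semantic_map = SemanticMap(occupancy_grid=[[0] * 10 for _ in range(10)], scene_label="living room")
semantic_map.annotate("sofa", (2.0, 3.5))
semantic_map.annotate("television", (0.5, 3.0))
```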
The above embodiment has been described taking the case where the autonomous mobile device constructs the semantic map corresponding to the work area as an example. The semantic map may also be constructed by the autonomous mobile device in cooperation with a server, with the server performing the main work; this is described in detail in the following embodiment.
Fig. 1c is a schematic flowchart of another semantic map construction method according to an exemplary embodiment of the present disclosure. As shown in fig. 1c, the method comprises the steps of:
10c, the server receives the environment information of the operation area reported by the autonomous mobile equipment;
11c, the server carries out target detection on the environmental information to obtain semantic information of at least one object contained in the working area;
12c, identifying scene semantic information of the operation area according to the semantic information of at least one object;
13c, constructing a semantic map corresponding to the operation area according to the scene semantic information of the operation area;
14c, the server sends the semantic map to the autonomous mobile device so that the autonomous mobile device can execute the operation task based on the semantic map.
In this embodiment, the autonomous mobile device may collect the environment information of the work area and report it to the server. The server receives the environment information of the work area reported by the autonomous mobile device, constructs the semantic map corresponding to the work area, and finally sends the semantic map to the autonomous mobile device so that the device can execute work tasks based on it.
The detailed implementation of steps 11c-13c is similar to that of steps 11a-13a in the above embodiment; the only difference is the executing entity. For the detailed process or implementation, refer to the foregoing embodiments, which are not repeated here.
In order to facilitate understanding, the semantic map construction process provided by the embodiment of the application is described in detail by taking the autonomous mobile device as a sweeping robot and combining a scene in which the sweeping robot executes a sweeping task.
Application scenario example 1:
in a home environment, the sweeping robot is required to clean an unknown working area, which may be an area the robot is cleaning for the first time or an area for which no semantic map has yet been constructed and which the robot therefore cannot understand. While cleaning the unknown working area, the robot uses its camera to collect environment images of the area in real time and feeds them to a target detection model built into the robot for target detection and semantic recognition, obtaining the semantic information of the at least one object contained in the unknown working area, that is, which objects are present. In addition, the actual position of the at least one object in the working area can be estimated from its pixel position in the environment image. In this embodiment, assuming the unknown area contains objects such as a sofa, a tea table, and a television, the built-in target detection model outputs semantic information indicating that the unknown working area contains these objects. The sweeping robot also has a pre-trained Bayesian network model built in; after the semantic information of objects such as the sofa, tea table, and television is obtained, it is input into the Bayesian network model, which outputs the scenized semantic information of the unknown working area. In this embodiment, because the working area contains a sofa, a tea table, and a television, the Bayesian model judges with high probability that the working area is a living room, so the output scenized semantic information of the working area is "living room". Of course, the objects contained in the working area are not limited to these; the area might contain only a sofa, and if, in that case, the Bayesian network model can still recognize with high probability that the working area is a living room, the output scenized semantic information of the working area is likewise "living room".
Having obtained the scenized semantic information of the unknown working area, the semantic information of the objects it contains, and the position information of those objects, a semantic map of the unknown working area can be constructed by combining the three. The semantic map includes the positions and semantic information of objects such as the sofa, tea table, and television, and also includes the semantic information that the area is a living room.
Optionally, for any object, the estimating of the actual position of the object in the working area includes: determining the pixel coordinates of the object in the environment image; and converting the pixel coordinates of the object in the environment image into coordinates in a world coordinate system, namely the actual position of the object in the working area according to the coordinate conversion relation between the coordinate system of the camera and the world coordinate system.
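A sketch of that pixel-to-world conversion is shown below; it assumes the camera intrinsic matrix, a depth estimate for the pixel, and the camera pose in the world frame are known, and all numeric values are placeholders.

```python
# Sketch of converting a pixel coordinate to a world coordinate (all values are placeholders).
import numpy as np

K = np.array([[525.0, 0.0, 319.5],       # camera intrinsic matrix (assumed)
              [0.0, 525.0, 239.5],
              [0.0, 0.0, 1.0]])
R_world_cam = np.eye(3)                   # camera orientation in the world frame
t_world_cam = np.array([1.0, 0.5, 0.3])   # camera position in the world frame (metres)

def pixel_to_world(u: float, v: float, depth: float) -> np.ndarray:
    """Back-project pixel (u, v) at the given depth into world coordinates."""
    point_cam = depth * np.linalg.inv(K) @ np.array([u, v, 1.0])   # camera frame
    return R_world_cam @ point_cam + t_world_cam                   # world frame

print(pixel_to_world(320, 240, depth=2.0))
```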
Application scenario example 2:
in a home environment, the sweeping robot is required to sweep an unknown working area, and the unknown working area can be an area where the sweeping robot performs a sweeping task for the first time or an area which cannot be understood by the sweeping robot because a semantic map is not constructed yet. In the process of cleaning the unknown operation area by the sweeping robot, the camera on the robot is used for acquiring the environment image of the unknown operation area in real time, and the environment image acquired in real time is sent to the server.
The server receives the environment images sent by the sweeping robot and performs target detection and semantic recognition on them using a target detection model, obtaining the semantic information of the at least one object contained in the unknown working area, that is, which objects are present. In addition, the server can estimate the actual position of each object in the unknown working area from its pixel position in the environment image. In this embodiment, taking the case where the unknown working area contains objects such as a range hood, an induction cooker, and a coffee machine as an example, the server uses the target detection model to output the semantic information of these objects. This semantic information is fed into a pre-trained Bayesian network model, which, based on the pre-learned defined relationships between objects and area scenes, judges with high probability that the unknown working area is a kitchen and therefore outputs "kitchen" as its scenized semantic information. A semantic map of the unknown working area is then constructed based on its scenized semantic information and the semantic information of the objects it contains, in combination with the actual positions of those objects in the area. The semantic map includes the positions and semantic information of objects such as the range hood, induction cooker, and coffee machine, and also includes the semantic information that the area is a kitchen.
The above example illustrates a scheme for constructing a semantic map according to the embodiment of the present application, taking one work area as an example. The technical scheme provided by the embodiment of the application can also be used for constructing the semantic map aiming at the operation environment comprising a plurality of operation areas. This will be explained in the following examples.
Fig. 2a is a schematic flowchart of a method for constructing a semantic map according to an exemplary embodiment of the present application. The method is described from the perspective of an autonomous mobile device, as shown in fig. 2a, and comprises the steps of:
20a, collecting environment information of a working environment, wherein the working environment comprises at least one working partition;
21a, carrying out target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment;
22a, identifying scene semantic information of at least one work partition according to semantic information of a plurality of objects contained in the work environment;
and 23a, constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition.
Depending on the application scenario, the operating environment of the autonomous mobile device may also vary. The operating environment of the autonomous mobile device may be the entire environmental area in which the autonomous mobile device is located or may be a partial environmental area in which the autonomous mobile device is located. Taking a home environment as an example, the work environment of the autonomous mobile device may be the entire home environment area; further, if the entire home environment is divided into different areas such as bedrooms, kitchens, living rooms, balconies, etc., the working environment of the autonomous mobile device may also be a local area such as bedrooms, kitchens, etc. in the home environment.
In this embodiment, the autonomous mobile device may collect environment information of its working environment, which generally includes a plurality of objects. Optionally, the autonomous mobile device may collect environment information of its work environment during execution of a work in the work environment. Alternatively, the autonomous mobile device may move (not performing a job) within its work environment and collect environmental information of the work environment during the movement in order to construct a semantic map.
In this embodiment, the work environment of the autonomous mobile device may be divided into at least one work partition. For example, if the work environment is the entire home environment, the home environment may be divided into work partitions such as the bedroom, kitchen, living room, and balcony. As another example, if the work environment is the living room of a home environment, the living room area may be divided into work partitions such as a main area, a sofa area, and a television area. As yet another example, if the work environment is the kitchen of a home environment, the kitchen area may be further divided into work partitions such as a kitchen range area, a dining area, and a bar area.
It should be noted that the at least one work partition included in the work environment may be obtained by partitioning the work environment in advance, or may be obtained by partitioning the work environment in real time according to the collected environment information during semantic map construction; this embodiment does not limit this.
After the environmental information of the working environment is collected, the collected environmental information can be subjected to target detection to obtain semantic information of a plurality of objects contained in the working environment; furthermore, the scene semantic information of at least one operation partition contained in the operation environment can be identified according to the semantic information of a plurality of objects contained in the operation environment; furthermore, a semantic map containing semantic information of the job partition can be constructed based on the scene semantic information of at least one job partition contained in the job environment. The semantic map has higher semantic level, is convenient to interact with the autonomous mobile equipment, and meets the requirements of people on the semantic map.
In the embodiment of the present application, the embodiment of detecting the target of the environment information of the work environment is not limited. Any embodiment that can detect semantic information of a plurality of objects and a plurality of objects included in a work environment from environment information is applicable to the embodiment of the present application. An alternative embodiment is given below.
In an alternative embodiment, the environment information may be subjected to object detection and semantic recognition by using an object detection model to obtain semantic information of a plurality of objects included in the working environment. For a detailed implementation of the target detection model and the target detection and semantic recognition based on the target detection model, reference may be made to the foregoing embodiments, which are not described herein again.
In one embodiment, an implementation of identifying scenarized semantic information for at least one job partition based on semantic information for a plurality of objects included in a job environment includes: determining objects contained in each of at least one work partition according to the position information of the plurality of objects and the boundary of the at least one work partition; and identifying scene semantic information of the at least one work partition according to the semantic information of the object contained in the at least one work partition.
Optionally, in the process of performing target detection and semantic recognition on the environment information, the position information of the plurality of objects contained in the work environment can also be detected. The objects contained in each of the at least one work partition can then be determined by combining the position information of the objects with the boundaries of the partitions.
If the at least one partition included in the work environment is obtained by partitioning the work environment in real time according to the collected environment information during semantic map construction, then before determining the objects contained in each work partition, the method further includes: partitioning the work environment in real time according to the environment information to determine the boundary of each work partition, which provides the basis for determining the objects contained in each partition. For example, in a home environment, doors, walls, and the like may serve as the boundaries of different work partitions, so that the whole home environment can be divided into work partitions such as the kitchen, living room, bedroom, and bathroom. Of course, in a home environment, a line or some kind of object on the ground may also serve as the boundary between work partitions. For example, for a kitchen environment, the kitchen range, bar counter, table, and the like may be used as boundary information, dividing the whole kitchen environment into work partitions such as a kitchen range area, a bar counter area, and a table area.
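A minimal sketch of grouping detected objects into work partitions by their boundaries follows; partitions are simplified to axis-aligned rectangles, and all coordinates are illustrative assumptions.

```python
# Sketch: assign detected objects to work partitions by their boundaries
# (partitions simplified to axis-aligned rectangles; values are illustrative).
from typing import Dict, List, Tuple

partition_boundaries: Dict[str, Tuple[float, float, float, float]] = {
    "kitchen": (0.0, 0.0, 4.0, 3.0),          # (x_min, y_min, x_max, y_max) in metres
    "living room": (4.0, 0.0, 10.0, 5.0),
}

detected_objects = [("refrigerator", (1.0, 2.0)), ("sofa", (6.5, 2.5))]

def objects_per_partition(objects, boundaries) -> Dict[str, List[str]]:
    """Group each object into the partition whose boundary contains its position."""
    grouped: Dict[str, List[str]] = {name: [] for name in boundaries}
    for obj_name, (x, y) in objects:
        for part_name, (x0, y0, x1, y1) in boundaries.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                grouped[part_name].append(obj_name)
                break
    return grouped

print(objects_per_partition(detected_objects, partition_boundaries))
# -> {'kitchen': ['refrigerator'], 'living room': ['sofa']}
```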
In some optional embodiments of the present application, the defined relationships between certain objects and area scenes may be learned in advance. On this basis, an embodiment of identifying the scenized semantic information of the at least one work partition according to the semantic information of the objects contained in each partition includes: identifying the scenized semantic information of the at least one work partition according to the semantic information of the objects contained in each work partition, in combination with the pre-learned defined relationships between objects and area scenes.
Further optionally, a semantic recognition model may be trained in advance using a machine learning method; such a model learns, and thus embodies, the defined relationships between objects and area scenes. On this basis, the above embodiment of recognizing the scenized semantic information of the at least one work partition in combination with the pre-learned defined relationships includes: inputting the semantic information of the objects contained in the at least one work partition into the pre-trained semantic recognition model, and using the semantic recognition model to recognize the scenized semantic information of the at least one work partition.
In the embodiments of the present application, the semantic recognition model is not limited, and any machine model that can represent a defined relationship between an object and a regional scene is suitable for the embodiments of the present application. For example, the semantic recognition model may be a Bayesian network model. For the working principle and the training process of the bayesian network model, reference may be made to the foregoing embodiments, which are not described herein again.
The semantic map of the work environment constructed in the embodiments of the present application may include the scenized semantic information of the at least one work partition, the semantic information of the objects contained in each partition, or a combination of the two. For example, a semantic map of a home environment may label on the map the scenized semantic information of partitions such as a bar counter area and a kitchen range area, may label the furniture and household appliances in each room, or may label both the furniture and appliances and the partition information; all three cases are semantic maps of the home environment.
In an optional embodiment, constructing the semantic map corresponding to the work environment according to the scenarized semantic information of the at least one work partition includes: constructing the semantic map corresponding to the work environment according to the scenarized semantic information of the at least one work partition and the semantic information of the plurality of objects. Such a semantic map carries two layers of semantic information, objects and work partitions, so the semantic information is richer and better meets users' expectations of a semantic map.
In addition, the semantic map may also contain the position information, shapes, and the like of the objects contained in the work environment. Based on this, a detailed implementation of constructing the semantic map includes: constructing the semantic map corresponding to the work environment according to the scenarized semantic information of the at least one work partition and the semantic information of the objects contained in each work partition, in combination with the position information of the plurality of objects contained in the work environment.
In the process of performing target detection and semantic recognition on the environment information of the work environment, the position information of the plurality of objects contained in the work environment can also be detected. Based on this, during construction of the semantic map, an environment map corresponding to the work environment can first be built from the position information of the plurality of objects contained in the work environment; the scenarized semantic information of the at least one work partition and the semantic information of the objects contained in each work partition are then marked in the environment map, yielding the semantic map corresponding to the work environment. The semantic map thus embodies both the environment information and the semantic information of the work environment: the environment information helps the autonomous mobile device execute various tasks and solve problems such as path planning, while the semantic information facilitates communication between the user and the autonomous mobile device, further meeting users' needs.
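As a purely illustrative sketch (the class and field names are our assumptions, not the patent's), the two-layer semantic map described above could be represented as an environment map annotated with partition-level scene labels and object-level labels plus positions:

```python
# Minimal sketch of a two-layer semantic map: partitions with scene labels,
# each holding labelled objects and their estimated positions.
from dataclasses import dataclass, field
from typing import List, Tuple

@dataclass
class SemanticObject:
    name: str                       # e.g. "sofa"
    position: Tuple[float, float]   # estimated position in map coordinates

@dataclass
class WorkPartition:
    scene: str                              # scenarized semantic information, e.g. "living room"
    boundary: List[Tuple[float, float]]     # partition boundary polygon
    objects: List[SemanticObject] = field(default_factory=list)

@dataclass
class SemanticMap:
    partitions: List[WorkPartition] = field(default_factory=list)

    def locate(self, scene: str) -> List[Tuple[float, float]]:
        """Return the boundary of the first partition whose scene label matches."""
        for p in self.partitions:
            if p.scene == scene:
                return p.boundary
        return []
```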
In the above embodiments, construction of the semantic map of a work environment containing at least one work partition by the autonomous mobile device is taken as an example. The semantic map may also be constructed by the autonomous mobile device in cooperation with a server, with the server performing most of the work; this is described in the following embodiments.
Fig. 2b is a flowchart illustrating a method for constructing a semantic map according to an exemplary embodiment of the present disclosure. The method is described from the server perspective, as shown in fig. 2b, and comprises the following steps:
20b, receiving environment information of a working environment reported by the autonomous mobile equipment, wherein the working environment comprises at least one working partition;
21b, carrying out target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment;
22b, identifying scene semantic information of at least one operation partition according to the semantic information of a plurality of objects contained in the operation environment;
23b, constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition;
24b, sending the semantic map to the autonomous mobile device so that the autonomous mobile device can execute work tasks based on the semantic map.
In this embodiment, the autonomous mobile device may collect the environment information of the work environment and report it to the server. The server receives the environment information of the work environment reported by the autonomous mobile device, constructs a semantic map corresponding to the work environment based on the scenarized semantic information identified for the at least one work partition, and finally sends the semantic map to the autonomous mobile device so that the autonomous mobile device can execute work tasks based on it.
The detailed implementation of steps 21b-23b is similar to that of steps 21a-23a in the above embodiments; the only difference is the execution subject. For the detailed description or implementation, reference may be made to the foregoing embodiments, which are not repeated here.
For convenience in understanding, the semantic map construction process provided by the embodiment of the present application is described in detail by taking an autonomous mobile device as a sweeping robot as an example and combining a scene in which the sweeping robot executes a sweeping task.
Application scenario example 3:
in a home environment, the sweeping robot is required to clean an unknown work environment, which may be an area where the sweeping robot performs a cleaning task for the first time, or an area the sweeping robot cannot yet understand because no semantic map has been constructed. While the sweeping robot cleans the unknown work environment, a camera on the robot collects environment images of the work environment in real time, and the images are fed to a target detection model built into the sweeping robot for target detection and semantic recognition, yielding the semantic information of the plurality of objects contained in the unknown work environment. In addition, the actual positions of these objects within the work environment may be estimated from their pixel positions in the environment images. In this embodiment, assuming the unknown work environment contains objects such as a sofa, a television, a bed, a dining table, a dishwasher, a microwave oven, a washing machine, and a toilet, the target detection model built into the sweeping robot may output the semantic information of these objects contained in the unknown work environment.
Furthermore, the sweeping robot may partition the work environment according to the collected environment information to obtain at least one work partition. In this embodiment, taking the unknown work environment to be the whole home environment, assume it is divided into several work partitions: a kitchen, bedrooms (a master bedroom and a secondary bedroom), a living room, and a bathroom. The actual positions of the plurality of objects contained in the work environment may then be combined with the boundaries of these work partitions to determine the objects contained in each work partition. As can be seen in connection with fig. 2c, the kitchen contains a dishwasher, a microwave oven, and an oven; the living room contains a sofa, a television, and a dining table; the bedroom contains a bed; and the bathroom contains a toilet and a washing machine.
The sweeping robot is further provided with a pre-trained Bayesian network model. After the objects contained in the work partitions such as the kitchen, bedrooms, bathroom, and living room are obtained, the semantic information of the objects contained in each work partition is input into the pre-trained Bayesian network model, and the model outputs the scenarized semantic information of that work partition. For example, for the work partition containing a sofa, a television, and a dining table, the semantic information of these objects is input into the Bayesian network model; based on the previously learned constraint relationship between objects and area scenes, the model considers that this partition is most likely a living room and therefore outputs its scenarized semantic information as living room. Similarly, for the partition containing a bed, the model outputs bedroom; for the partition containing a toilet and a washing machine, the model outputs bathroom; and for the partition containing a dishwasher, a microwave oven, and an oven, the model outputs kitchen.
Furthermore, based on the scenarized semantic information of each work partition, the semantic information of the objects contained in each work partition, and the actual positions of those objects within their partitions, a semantic map of the work environment covering both the objects and the work partitions is constructed. As shown in fig. 2c, the semantic map contains the semantic information of work partitions such as the kitchen, bathroom, living room, and bedrooms, as well as the positions and semantic information of objects such as the sofa, television, bed, dining table, dishwasher, microwave oven, washing machine, and toilet.
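Purely as an illustration of the result (the coordinates, boundaries, and nesting are invented for the example and are not taken from fig. 2c), such a two-layer semantic map could be held in memory as:

```python
# Illustrative in-memory semantic map: partition-level scene labels, plus
# object-level labels with estimated positions inside each partition.
semantic_map = {
    "kitchen":     {"boundary": [(5, 0), (8, 0), (8, 3), (5, 3)],
                    "objects": {"dishwasher": (6.0, 0.5),
                                "microwave oven": (7.5, 1.0),
                                "oven": (7.5, 2.0)}},
    "living room": {"boundary": [(0, 0), (5, 0), (5, 4), (0, 4)],
                    "objects": {"sofa": (1.0, 3.0),
                                "television": (4.5, 2.0),
                                "dining table": (2.5, 1.0)}},
    "bedroom":     {"boundary": [(0, 4), (5, 4), (5, 8), (0, 8)],
                    "objects": {"bed": (2.5, 6.0)}},
    "bathroom":    {"boundary": [(5, 3), (8, 3), (8, 6), (5, 6)],
                    "objects": {"toilet": (6.0, 4.0),
                                "washing machine": (7.0, 5.0)}},
}

print(list(semantic_map["kitchen"]["objects"]))  # ['dishwasher', 'microwave oven', 'oven']
```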
Application scenario example 4:
in a home environment, the sweeping robot is required to clean an unknown work environment, which may be an area where the sweeping robot performs a cleaning task for the first time, or an area the sweeping robot cannot yet understand because no semantic map has been constructed. While the sweeping robot cleans the unknown work environment, a camera on the robot collects environment images of the work environment in real time and sends them to the server.
The server receives the environment images sent by the sweeping robot and performs target detection and semantic recognition on them with a target detection model, obtaining the semantic information of the plurality of objects contained in the unknown work environment, i.e. which objects they are. In addition, the server may estimate the actual positions of these objects within the work environment from their pixel positions in the environment images. In this embodiment, assuming the unknown work environment contains objects such as a sofa, a television, a bed, a dining table, a dishwasher, a microwave oven, a washing machine, and a toilet, the target detection model may output the semantic information of these objects contained in the unknown work environment.
Further, the server may partition the work environment according to the collected environment information to obtain at least one work partition. In this embodiment, taking the unknown work environment to be the whole home environment, assume it is divided into several work partitions: a kitchen, bedrooms (a master bedroom and a secondary bedroom), a living room, and a bathroom. The actual positions of the plurality of objects contained in the work environment may then be combined with the boundaries of these work partitions to determine the objects contained in each work partition. As can be seen in connection with fig. 2c, the kitchen contains a dishwasher, a microwave oven, and an oven; the living room contains a sofa, a television, and a dining table; the bedroom contains a bed; and the bathroom contains a toilet and a washing machine.
The server is likewise provided with a pre-trained Bayesian network model. After the objects contained in the work partitions such as the kitchen, bedrooms, bathroom, and living room are obtained, the semantic information of the objects contained in each work partition is input into the pre-trained Bayesian network model, and the model outputs the scenarized semantic information of that work partition.
For example, for the work partition containing a sofa, a television, and a dining table, the semantic information of these objects is input into the Bayesian network model; based on the previously learned constraint relationship between objects and area scenes, the model considers that this partition is most likely a living room and therefore outputs its scenarized semantic information as living room. Similarly, for the partition containing a bed, the model outputs bedroom; for the partition containing a toilet and a washing machine, the model outputs bathroom; and for the partition containing a dishwasher, a microwave oven, and an oven, the model outputs kitchen.
Furthermore, the server may construct, based on the scenarized semantic information of each work partition, the semantic information of the objects contained in each work partition, and the actual positions of those objects within their partitions, a semantic map of the work environment covering both the objects and the work partitions. As shown in fig. 2c, the semantic map contains the semantic information of work partitions such as the kitchen, bathroom, living room, and bedrooms, as well as the positions and semantic information of objects such as the sofa, television, bed, dining table, dishwasher, microwave oven, washing machine, and toilet.
After the semantic map corresponding to the work environment is constructed, the server sends the semantic map to the autonomous mobile device so that the autonomous mobile device can execute work tasks based on the semantic map.
Whether in application scenario example 3 or application scenario example 4, an example of the semantic map corresponding to the home environment is shown in fig. 2c. In fig. 2c there are work areas such as the kitchen, bathroom, living room, master bedroom, and secondary bedroom, and each work area contains at least one object. Taking the living room as an example, it contains a television, a sofa, a dining table, and the like; the kitchen contains a microwave oven, a dishwasher, and the like. Fig. 2c distinguishes the master bedroom from the secondary bedroom by way of example, but this is not limiting. Both belong to the bedroom scene, and they may be distinguished in the semantic map, for example according to the size and position of each bedroom; of course, the semantic map may also simply mark both uniformly as bedrooms without distinguishing master from secondary, which is not limited here.
No matter which embodiment is adopted to construct the semantic map, after the semantic map is constructed, the autonomous mobile device can execute the operation task based on the constructed semantic map. The following describes in detail an operation method of the sweeping robot for executing an operation task based on a semantic map.
FIG. 3a is a schematic flow chart of a method of operation provided in an exemplary embodiment of the present application; the method is applicable to an autonomous mobile device, as shown in fig. 3a, and comprises the following steps:
30a, the autonomous mobile equipment receives a work instruction, wherein the work instruction comprises scene semantic information of a target work area;
31a, determining the position of the target operation area in a pre-established semantic map according to the scene semantic information of the target operation area;
32a, moving to the target work area based on the position, and executing the work task within the target work area.
In this embodiment, the autonomous mobile device may receive a work instruction. Because a semantic map is available, this work instruction differs from existing work instructions in that it contains the scenarized semantic information of the target work area. The target work area may be the work area in the embodiments shown in fig. 1a and 1b, or one or more of the work partitions in the embodiments shown in fig. 2a and 2b. The scenarized semantic information of the target work area is, for example, kitchen, bedroom, or living room.
After receiving the work instruction, the autonomous mobile device can parse the target work area and its scenarized semantic information from the instruction; determine the position of the target work area in the pre-established semantic map based on that scenarized semantic information; and then move to the target work area based on the position and execute the work task within it.
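A minimal sketch of this lookup-and-dispatch flow is given below, under assumed names and a stubbed robot interface (nothing here is the patent's actual API; the map format follows the illustrative dictionary used earlier):

```python
class RobotStub:
    """Hypothetical navigation/cleaning interface, used only for this sketch."""
    def move_to(self, boundary):
        print(f"moving to area bounded by {boundary}")
    def run_task(self, boundary, task):
        print(f"performing '{task}' task inside {boundary}")

def execute_job(instruction, semantic_map, robot):
    scene = instruction["target_scene"]            # scenarized semantic information, e.g. "kitchen"
    area = semantic_map.get(scene)
    if area is None:
        raise ValueError(f"no work area labelled '{scene}' in the semantic map")
    robot.move_to(area["boundary"])                # move to the target work area
    robot.run_task(area["boundary"], task=instruction.get("task", "clean"))

execute_job({"target_scene": "kitchen", "task": "clean"},
            {"kitchen": {"boundary": [(5, 0), (8, 0), (8, 3), (5, 3)]}},
            RobotStub())
```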
Depending on the device type of the autonomous mobile device, the manner and content of performing the task in the target work area by the autonomous mobile device may vary. For example, if the autonomous moving apparatus is a sweeping robot, the autonomous moving apparatus may move to a target working area and perform a sweeping task in the target working area. If the autonomous mobile device is an air purifier, the autonomous mobile device can move to a target operation area, and an air purification task is executed in the target operation area. And if the autonomous mobile equipment is a family accompanying robot, the autonomous mobile equipment can move to a target operation area, and a family accompanying task is executed in the target operation area. And if the autonomous mobile equipment is the welcome robot, the autonomous mobile equipment can move to the target operation area, and the welcome task is executed in the target operation area.
In the embodiments of the present application, the issuing manner of the work instruction is not limited, and any manner that can issue the work instruction to the autonomous mobile device is applicable to the embodiments of the present application. The following description will be given by referring to several application scenarios:
in application scenario A1, the user is in the environment where the autonomous mobile device is located, and the autonomous mobile device has a voice recognition function. When the user needs the device to work on a target work area, the user can issue a work instruction to the autonomous mobile device by voice; the instruction contains the target work area and its scenarized semantic information and instructs the autonomous mobile device to work on that area. For example, the user may say "please sweep the kitchen" to the autonomous mobile device. The autonomous mobile device receives the voice work instruction containing the scenarized semantic information of the target work area; determines the position of the target work area in the pre-established semantic map according to that scenarized semantic information; and moves to the target work area based on the position and executes the work task within it.
In another application scenario A2, the environment where the autonomous mobile device is located includes an audio playing device, which may be any device capable of playing audio signals, such as a smart speaker, a television, a smart phone, or a tablet computer. The user can set a work instruction and its sending time on the audio playing device in advance, so that the audio playing device, which is in the same environment as the autonomous mobile device, sends the work instruction to the autonomous mobile device on the user's behalf. Optionally, if the audio playing device has corresponding physical keys, the user may set the work instruction and its playing time through those keys; if the audio playing device supports voice recognition, the user may set them by voice; or if the audio playing device has a touch screen, the user may set them through the touch screen. When the playing time arrives, the audio playing device plays the work instruction set by the user. The autonomous mobile device receives the work instruction, containing the scenarized semantic information of the target work area, sent by the audio playing device in the same environment; it then determines the position of the target work area in the pre-established semantic map according to that scenarized semantic information, moves to the target work area based on the position, and executes the work task within it.
In another application scenario A3, the user binds the terminal device he or she uses with the autonomous mobile device, so that the user can send a work instruction to the autonomous mobile device through the bound terminal device regardless of whether the user is in the environment of the autonomous mobile device. In particular, when the user is not in that environment, the work instruction may be sent to the autonomous mobile device remotely through the terminal device. The terminal device may be a smart phone, a smart watch, a smart bracelet, or the like. Taking home monitoring as an example, during a business trip or at work, the user can send a work instruction, containing the scenarized semantic information of the target work area, to the autonomous mobile device at home through a carried terminal device such as a smart phone, smart watch, or smart bracelet; the autonomous mobile device then determines the position of the target work area in the pre-established semantic map according to that scenarized semantic information, moves to the target work area based on the position, and executes the work task within it.
In an optional embodiment, an App for controlling the autonomous mobile device may be installed on the terminal device, and the user may send a job instruction to the autonomous mobile device through the App. For example, a user can open a semantic map corresponding to the autonomous mobile device through the App, determine a target operation area by performing selection operation in the semantic map, and then send an operation instruction to the autonomous mobile device through the App. For the terminal equipment, a semantic map corresponding to the operation environment of the autonomous mobile equipment can be displayed, and the semantic map comprises at least one operation area and scene semantic information thereof; responding to the selection operation of a user on at least one operation area, and generating an operation instruction, wherein the operation instruction comprises a target operation area selected by the user and scene semantic information of the target operation area; and sending the operation instruction to the autonomous mobile equipment so as to control the autonomous mobile equipment to execute the operation task in the target operation area.
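As a hypothetical sketch only (the message fields and function names are assumptions, not the patent's protocol), the terminal App could turn the user's selection on the displayed semantic map into a work instruction like this:

```python
# Minimal sketch: build a work instruction from the area the user selected on the map.
import json

def build_job_instruction(selected_area, scene_label, task="clean"):
    return json.dumps({
        "target_area": selected_area,   # identifier of the area the user selected
        "target_scene": scene_label,    # its scenarized semantic information
        "task": task,                   # the work task to perform there
    })

payload = build_job_instruction("partition_2", "kitchen")
print(payload)  # this payload would then be sent to the autonomous mobile device
```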
Further, for the terminal device, before displaying the semantic map corresponding to the work environment of the autonomous mobile device, the method further includes: receiving a semantic map sent by a server or an autonomous mobile device.
In another application scenario A4, the autonomous mobile device has an electronic screen, through which a semantic map containing at least one work area and its scenarized semantic information may be displayed to the user. The user can perform a selection operation in the semantic map on the electronic screen to determine the target work area. The autonomous mobile device may generate a work instruction in response to the user's selection operation on the at least one work area, the instruction containing the user-selected target work area and its scenarized semantic information; it then determines the position of the target work area in the pre-established semantic map according to that scenarized semantic information, moves to the target work area based on the position, and executes the work task within it.
Fig. 3b is a schematic flowchart of another operation method provided in an exemplary embodiment of the present application, where the method is applied to a terminal device, and as shown in fig. 3b, the method includes:
30b, displaying a semantic map corresponding to the working environment of the autonomous mobile equipment, wherein the semantic map comprises at least one working area and scene semantic information thereof;
31b, responding to the selection operation of the user on at least one operation area, and generating an operation instruction, wherein the operation instruction comprises a target operation area selected by the user and scene semantic information of the target operation area;
and 32b, sending the work instruction to the autonomous mobile equipment so as to control the autonomous mobile equipment to execute the work task in the target work area.
In an optional embodiment, before displaying the semantic map corresponding to the working environment of the autonomous mobile device, the method further comprises: receiving the semantic map sent by a server or an autonomous mobile device.
It should be noted that the execution subjects of the steps of the methods provided in the above embodiments may be the same device, or different devices may serve as the execution subjects of different steps. For example, the execution subject of steps 10a to 13a may be device A; for another example, the execution subject of step 10a may be device A while the execution subject of steps 11a to 13a is device B; and so on.
In addition, some of the flows described in the above embodiments and drawings contain multiple operations appearing in a specific order, but it should be clearly understood that these operations may be executed out of the order presented here or in parallel. Sequence numbers such as 11a and 12a are merely used to distinguish different operations and do not by themselves represent any execution order. The flows may also include more or fewer operations, which may be executed sequentially or in parallel. It should also be noted that terms such as "first" and "second" herein are used to distinguish different messages, devices, modules, and the like; they represent neither a sequential order nor a requirement that "first" and "second" be of different types.
Fig. 4a is a schematic structural diagram of an autonomous mobile device according to an exemplary embodiment of the present application. As shown in fig. 4a, the autonomous mobile apparatus includes: the device body 40a is provided with one or more processors 401a, one or more memories 402a for storing computer instructions, a communication component 403a, and an information acquisition component 407 a.
In this embodiment, the one or more processors 401a are configured to execute the computer instructions stored in the one or more memories 402a to: collecting the environment information of the operation area through an information collecting component 407 a; carrying out target detection on the environmental information to obtain semantic information of at least one object; identifying scene semantic information of the operation area according to the semantic information of at least one object; and according to the scene semantic information of the operation area, constructing a semantic map corresponding to the operation area.
In an optional embodiment, when obtaining semantic information of at least one object included in the work area, the one or more processors 401a are specifically configured to: and carrying out target detection and semantic recognition on the environmental information by using the target detection model to obtain semantic information of at least one object contained in the working area.
In an alternative embodiment, when identifying the scenized semantic information of the work area based on the semantic information of the at least one object, the one or more processors 401a are specifically configured to: identify the scene semantic information of the work area according to the semantic information of the at least one object, in combination with the previously learned constraint relationship between objects and area scenes.
Optionally, the one or more processors 401a, when recognizing the scenized semantic information of the working area according to the semantic information of at least one object in combination with the previously learned defined relationship between the object and the area scene, are specifically configured to: inputting semantic information of at least one object into a semantic recognition model trained in advance, and recognizing scene semantic information of a working area by using the semantic recognition model; wherein the semantic recognition model can embody a defined relationship between the object and the regional scene.
Optionally, the semantic recognition model is a bayesian network model, but is not limited thereto.
In an alternative embodiment, the one or more processors 401a are also used to pre-train the Bayesian network model. Wherein, when training the bayesian network model, the one or more processors 401a are specifically configured to: acquiring a plurality of labeled samples, wherein each labeled sample comprises scene semantic information of a sample region and semantic information of a sample object contained in the sample region; obtaining the prior probability of each sample object appearing in each sample area from a plurality of labeled samples; and training a Bayesian network model according to the prior probability of each sample object appearing in each sample area.
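As an illustrative sketch only (the helper names are ours; the patent does not specify the computation), the prior probability of each sample object appearing in each sample area could be estimated from the labelled samples like this:

```python
# Minimal sketch: estimate P(object appears | area scene) from labelled samples.
from collections import Counter, defaultdict

def object_given_area_priors(labelled_samples):
    """labelled_samples: list of (area_scene, [object names]) pairs."""
    area_counts = Counter()
    object_counts = defaultdict(Counter)
    for area, objects in labelled_samples:
        area_counts[area] += 1
        for obj in set(objects):             # count each object once per sample area
            object_counts[area][obj] += 1
    return {area: {obj: n / area_counts[area] for obj, n in objs.items()}
            for area, objs in object_counts.items()}

samples = [("kitchen", ["oven", "dishwasher"]),
           ("kitchen", ["oven", "microwave oven"]),
           ("living room", ["sofa", "television"])]
print(object_given_area_priors(samples)["kitchen"]["oven"])  # 1.0
```

These priors would then feed the training of the Bayesian network model described above; how the model itself is parameterized is left as in the foregoing embodiments.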
In an optional embodiment, when the one or more processors 401a construct, according to the scenized semantic information of the work area, the semantic map corresponding to the work area, the processor is specifically configured to: and constructing a semantic map corresponding to the operation area according to the scene semantic information of the operation area and the semantic information of at least one object.
In an optional embodiment, when constructing the semantic map corresponding to the work area, the one or more processors 401a are specifically configured to: and combining the position information of the at least one object in the operation area, and constructing a semantic map corresponding to the operation area according to the scene semantic information of the operation area and the semantic information of the at least one object.
Further optionally, when the one or more processors 401a combine the position information of the at least one object in the work area, and construct a semantic map corresponding to the work area according to the scenized semantic information of the work area and the semantic information of the at least one object, the one or more processors are specifically configured to: combining the position information of at least one object in the operation area to construct an environment map corresponding to the operation area; and marking the scene semantic information of the operation area and the semantic information of at least one object in the environment map to obtain a semantic map corresponding to the operation area.
Further, as shown in fig. 4a, the autonomous mobile apparatus further includes: a display 404a, a power component 405a, an audio component 406a, and other components. Only some of the components are schematically shown in fig. 4a, and it is not meant that the autonomous mobile device includes only the components shown in fig. 4a. In addition, the components within the dashed box in fig. 4a are optional components, not mandatory components, and may depend on the product form of the autonomous mobile device.
Fig. 4b is a schematic structural diagram of another autonomous mobile device according to an exemplary embodiment of the present application. As shown in fig. 4b, the autonomous mobile apparatus includes: the device body 40b is provided with one or more processors 401b, one or more memories 402b for storing computer instructions, a communication component 403b and an information acquisition component 407 b. Optionally, the information collecting component 407b may be various sensors such as a camera, a microwave radar, an infrared radar, and the like.
In this embodiment, the one or more processors 401b are configured to execute the computer instructions stored in the one or more memories 402b to: collecting, by the information collection component 407b, environmental information of a work environment, the work environment comprising at least one work partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of a plurality of objects contained in the operation environment; and constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition.
In an optional embodiment, when performing the target detection on the environmental information to obtain the semantic information of the multiple objects, the one or more processors 401b are specifically configured to: and carrying out target detection and semantic recognition on the environmental information by using the target detection model to obtain semantic information of a plurality of objects contained in the working environment.
In an alternative embodiment, the one or more processors 401b, when identifying the scenized semantic information of the at least one job partition based on the semantic information of the plurality of objects included in the job environment, are specifically configured to: determining objects contained in each of at least one work partition according to the position information of the plurality of objects and the boundary of the at least one work partition; and identifying scene semantic information of the at least one work partition according to the semantic information of the object contained in the at least one work partition.
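A minimal sketch of this position-against-boundary assignment, assuming axis-aligned rectangular partition boundaries for simplicity (not the patent's boundary representation; all names are illustrative):

```python
# Minimal sketch: assign detected objects to work partitions by position.
def assign_objects(objects, partitions):
    """objects: {name: (x, y)}; partitions: {scene: (xmin, ymin, xmax, ymax)}."""
    assignment = {scene: [] for scene in partitions}
    for name, (x, y) in objects.items():
        for scene, (xmin, ymin, xmax, ymax) in partitions.items():
            if xmin <= x <= xmax and ymin <= y <= ymax:   # point inside the partition box
                assignment[scene].append(name)
                break
    return assignment

print(assign_objects({"sofa": (1.0, 3.0), "oven": (7.5, 2.0)},
                     {"living room": (0, 0, 5, 4), "kitchen": (5, 0, 8, 3)}))
# {'living room': ['sofa'], 'kitchen': ['oven']}
```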
In an alternative embodiment, the one or more processors 401b are further configured to: before determining the objects contained in each of the at least one work partition, partitioning the work environment according to the environment information of the work environment to determine the boundary of the at least one work partition.
In an alternative embodiment, when identifying the scenarized semantic information of the at least one work partition, the one or more processors 401b are specifically configured to: identify the scenarized semantic information of each of the at least one work partition according to the semantic information of the objects it contains, in combination with the previously learned constraint relationship between objects and area scenes.
Further optionally, when the one or more processors 401b respectively identify the scenized semantic information of at least one job partition according to the semantic information of the object included in each of the at least one job partition in combination with the previously learned limited relationship between the object and the area scene, specifically: for any operation partition, inputting semantic information of objects contained in the operation partition into a semantic recognition model trained in advance, and recognizing scene semantic information of the operation partition by using the semantic recognition model; wherein the semantic recognition model may embody a defined relationship between an object and a regional scene.
Alternatively, the semantic recognition model may be a bayesian network model, but is not limited thereto.
In an alternative embodiment, the one or more processors 401b are further configured to: the bayesian network model is trained in advance. Wherein, when training the bayesian network model, the one or more processors 401b are specifically configured to: acquiring a plurality of labeled samples, wherein each labeled sample comprises scene semantic information of a sample region and semantic information of a sample object contained in the sample region; obtaining the prior probability of each sample object appearing in each sample area from a plurality of labeled samples; and training a Bayesian network model according to the prior probability of each sample object appearing in each sample area.
In an optional embodiment, when the semantic map corresponding to the job environment is constructed according to the scenized semantic information of at least one job partition, the one or more processors 401b are specifically configured to: and constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition and the semantic information of the plurality of objects.
Further, as shown in fig. 4b, the autonomous mobile apparatus further includes: a display 404b, a power component 405b, an audio component 406b, and other components. Only some of the components are schematically shown in fig. 4b, and it is not meant that the autonomous mobile device includes only the components shown in fig. 4b. In addition, the components within the dashed box in fig. 4b are optional components, not mandatory components, and may depend on the product form of the autonomous mobile device.
Fig. 4c is a schematic structural diagram of another autonomous mobile apparatus provided in an exemplary embodiment of the present application. As shown in fig. 4c, the autonomous mobile apparatus includes: the device body 40c is provided with one or more processors 401c, one or more memories 402c storing computer instructions, and a communication component 403c, on the device body 40 c.
In this embodiment, the one or more processors 401c are configured to execute the computer instructions stored in the one or more memories 402c to: receive a work instruction through the communication component 403c, the work instruction containing the scenarized semantic information of the target work area; determine the position of the target work area in the pre-established semantic map according to that scenarized semantic information; and control the autonomous mobile device to move to the target work area based on the position and execute the work task within it.
In an optional embodiment, when receiving the job instruction, the one or more processors 401c are specifically configured to: displaying a semantic map, wherein the semantic map comprises at least one operation area and scene semantic information thereof; and responding to the selection operation of the user on at least one working area, and generating a working instruction, wherein the working instruction comprises a target working area selected by the user and scene semantic information of the target working area.
In another optional embodiment, the one or more processors 401c, when receiving the job instruction, are specifically configured to: and receiving a working instruction sent by the terminal equipment, wherein the terminal equipment is in communication connection with the autonomous mobile equipment.
Further, as shown in fig. 4c, the autonomous mobile apparatus may further include: a display 404c, a power component 405c, an audio component 406c, and other components. Only some components are schematically shown in this embodiment; it does not mean that the autonomous mobile device includes only these components. It should be noted that the components shown in dashed boxes in fig. 4c are optional components, not essential components.
The autonomous mobile device provided by the above embodiments may be a robot, a purifier, an unmanned vehicle, or the like. The robot can be a sweeping robot, a accompanying robot or a welcoming robot and the like.
In an alternative embodiment, the autonomous mobile device provided in the above embodiments is implemented as a robot. As shown in fig. 5, the robot 500 of the present embodiment includes: the machine body 501 is provided with one or more processors 502, one or more memories 503 for storing computer instructions, and a communication component 504. The communication component 504 may be a Wifi module, an infrared module, or a bluetooth module, etc.
In addition to the one or more processors 502, the communication component 504, and the one or more memories 503, some basic components of the robot 500 are provided on the machine body 501, such as a vision sensor 506, a power supply component 507, and a driving component 508. The vision sensor may be a camera or the like. Optionally, the driving component 508 may include driving wheels, a driving motor, universal wheels, and the like. Optionally, if the robot 500 is a sweeping robot, it may further include a sweeping component 505, which may include a sweeping motor, sweeping brushes, a dust suction fan, and the like. The basic components included in different robots 500 and their configurations differ; the embodiments of the present application give only some examples. It should be noted that the components shown in dashed boxes in fig. 5 are optional components, not essential components.
It is noted that the one or more processors 502 and the one or more memories 503 may be disposed inside the machine body 501, or may be disposed on the surface of the machine body 501.
The machine body 501 is an execution mechanism by which the robot 500 performs a task, and can execute an operation designated by the processor 502 in a certain environment. The machine body 501 represents the appearance of the robot 500 to some extent. In the present embodiment, the external appearance of the robot 500 is not limited, and may be, for example, a circle, an ellipse, a triangle, a convex polygon, or the like.
The one or more memories 503 are used primarily to store computer instructions that are executable by the one or more processors 502 to cause the one or more processors 502 to control the robot 500 to perform corresponding tasks. In addition to storing computer instructions, the one or more memories 503 may also be configured to store other various data to support operations on the robot 500. Examples of such data include instructions for any application or method operating on the robot 500, semantic maps of the environment/scene in which the robot 500 is located, and so forth.
The memory or memories 503 may be implemented by any type or combination of volatile or non-volatile memory devices such as Static Random Access Memory (SRAM), electrically erasable programmable read-only memory (EEPROM), erasable programmable read-only memory (EPROM), programmable read-only memory (PROM), read-only memory (ROM), magnetic memory, flash memory, magnetic or optical disks.
One or more processors 502, which may be considered a control system for the robot 500, may be used to execute computer instructions stored in one or more memories 503 to control the robot 500 to perform corresponding tasks.
In this embodiment, the one or more processors 502 may execute the computer instructions stored in the one or more memories 503 to construct a semantic map. The semantic map may be constructed in manners including, but not limited to, the following two ways:
mode 1: collecting environmental information of the work area through a vision sensor 506; carrying out target detection on the environmental information to obtain semantic information of at least one object contained in the working area; identifying scene semantic information of the operation area according to the semantic information of at least one object; and according to the scene semantic information of the operation area, constructing a semantic map corresponding to the operation area.
Or
Mode 2: collecting environmental information of a work environment by a vision sensor 506, the work environment comprising at least one work zone; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of objects contained in the at least one operation partition; and constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition.
Further, after the semantic map is constructed, the sweeping robot of the embodiment can execute the operation task according to the semantic map. Wherein, the process of executing the job task comprises the following steps: receiving a job instruction through the communication component 504, wherein the job instruction comprises scenized semantic information of a target job area; determining the position of the target operation area in a pre-established semantic map according to the scene semantic information of the target operation area; the robot 500 is controlled to move to the target working area based on the above-described position, and performs a working task within the target working area.
For a detailed description of the related operations, reference may be made to the foregoing embodiments, which are not repeated herein.
In addition to the autonomous mobile device or robot described above, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
collecting environmental information of a working area; carrying out target detection on the environmental information to obtain semantic information of at least one object contained in the working area; identifying scene semantic information of the operation area according to the semantic information of at least one object; and according to the scene semantic information of the operation area, constructing a semantic map corresponding to the operation area.
Or
Collecting environmental information of a working environment, wherein the working environment comprises at least one working partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of objects contained in the at least one operation partition; and constructing a semantic map corresponding to the operation environment according to the scene semantic information of at least one operation partition.
Or
Receiving a work instruction, wherein the work instruction comprises scene semantic information of a target work area; determining the position of the target operation area in a pre-established semantic map according to the scene semantic information of the target operation area; and moving to the target working area, and executing the working task in the target working area.
In addition to the above actions, the one or more processors may also perform other actions when executing the computer instructions in the computer-readable storage medium, and the other actions may refer to the description in the foregoing embodiments and are not described herein again.
Fig. 6 is a schematic structural diagram of a server according to an exemplary embodiment of the present application; as shown in fig. 6, the server includes: one or more processors 601, one or more memories 602 storing computer instructions, and communication components 603, power components 605. The present embodiment only schematically shows some components, and does not mean that the server includes only these components.
In an alternative embodiment, one or more processors 601 configured to execute computer instructions stored in one or more memories 602 configured to: receiving the environment information of its operation area reported from the autonomous mobile device through the communication component 603; carrying out target detection on the environmental information to obtain semantic information of at least one object contained in the working area; identifying scene semantic information of the operation area according to the semantic information of at least one object; according to the scene semantic information of the operation area, constructing a semantic map corresponding to the operation area; the semantic map is sent to the autonomous mobile device through the communication component 603 for the autonomous mobile device to perform job tasks based on the semantic map.
In another alternative embodiment, the one or more processors 601 executing computer instructions stored in the one or more memories 602 may also be configured to: receiving environment information of its working environment reported from the autonomous mobile device through the communication component 603, the working environment comprising at least one working partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of objects contained in the at least one operation partition; according to the scene semantic information of at least one operation partition, constructing a semantic map corresponding to an operation environment; the semantic map is sent to the autonomous mobile device through the communication component 603 for the autonomous mobile device to perform job tasks based on the semantic map.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising:
receiving environment information of a work area reported by the autonomous mobile device, wherein the work area contains at least one object; performing target detection on the environment information to obtain semantic information of the at least one object contained in the work area; identifying scene semantic information of the work area according to the semantic information of the at least one object; constructing a semantic map corresponding to the work area according to the scene semantic information of the work area; and sending the semantic map to the autonomous mobile device so that the autonomous mobile device can execute work tasks based on the semantic map.
Or
Receiving environment information of a working environment reported by an autonomous mobile device, wherein the working environment comprises at least one working partition; performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the working environment; identifying scene semantic information of at least one operation partition according to semantic information of objects contained in the at least one operation partition; according to the scene semantic information of at least one operation partition, constructing a semantic map corresponding to an operation environment; and sending the semantic map to the autonomous mobile equipment so that the autonomous mobile equipment can execute the operation task based on the semantic map.
In addition to the above actions, the one or more processors may also perform other actions when executing the computer instructions in the computer-readable storage medium, and the other actions may refer to the description in the foregoing embodiments and are not described herein again.
Fig. 7 is a schematic structural diagram of a terminal device according to an exemplary embodiment of the present application. As shown in fig. 7, the terminal device includes: one or more processors 701, one or more memories 702 storing computer instructions, and communications components 703, a display 704.
Further, as shown in fig. 7, the terminal device may further include: power component 705, audio component 706, and other components. The present embodiment only schematically shows some components, and does not mean that the terminal device includes only these components. It is to be noted that the components shown in fig. 7 by the dashed line boxes are optional components, not essential components.
Among other things, one or more processors 701 for executing computer instructions stored in one or more memories 702 for: displaying a semantic map corresponding to a work environment of the autonomous mobile device on the display 704, the semantic map including at least one work area and its scenarized semantic information; responding to the selection operation of a user on at least one operation area, and generating an operation instruction, wherein the operation instruction comprises a target operation area selected by the user and scene semantic information of the target operation area; the work instructions are sent to the autonomous mobile device via the communication component 703 to control the autonomous mobile device to perform work tasks within the target work area.
Optionally, before displaying the semantic map corresponding to the operation environment of the autonomous mobile device, the one or more processors 701 are further configured to: receive, through the communication component 703, the semantic map sent by the server or by the autonomous mobile device.
Accordingly, embodiments of the present application also provide a computer-readable storage medium storing computer instructions that, when executed by one or more processors, cause the one or more processors to perform acts comprising: displaying a semantic map corresponding to the operation environment of the autonomous mobile device, wherein the semantic map includes at least one operation area and its scene semantic information; in response to a user's selection of the at least one operation area, generating an operation instruction, wherein the operation instruction includes the target operation area selected by the user and the scene semantic information of the target operation area; and sending the operation instruction to the autonomous mobile device to control the autonomous mobile device to perform an operation task in the target operation area.
The communication component in the above embodiments is configured to facilitate wired or wireless communication between the device in which it is located and other devices. The device in which the communication component is located can access a wireless network based on a communication standard, such as WiFi, 2G, 3G, or a combination thereof. In an exemplary embodiment, the communication component receives a broadcast signal or broadcast-related information from an external broadcast management system via a broadcast channel. In an exemplary embodiment, the communication component may further include a Near Field Communication (NFC) module, which may be implemented based on Radio Frequency Identification (RFID) technology, Infrared Data Association (IrDA) technology, Ultra Wideband (UWB) technology, Bluetooth (BT) technology, and the like.
The display in the above embodiments includes a screen, which may include a Liquid Crystal Display (LCD) and a Touch Panel (TP). If the screen includes a touch panel, the screen may be implemented as a touch screen to receive an input signal from a user. The touch panel includes one or more touch sensors to sense touch, slide, and gestures on the touch panel. The touch sensor may not only sense the boundary of a touch or slide action, but also detect the duration and pressure associated with the touch or slide operation.
The power supply components in the embodiments of the figures described above provide power to the various components of the device in which the power supply components are located. The power components may include a power management system, one or more power supplies, and other components associated with generating, managing, and distributing power for the device in which the power component is located.
The audio component in the above embodiments may be configured to output and/or input an audio signal. For example, the audio component includes a Microphone (MIC) configured to receive an external audio signal when the device in which the audio component is located is in an operational mode, such as a call mode, a recording mode, and a voice recognition mode. The received audio signal may further be stored in a memory or transmitted via a communication component. In some embodiments, the audio assembly further comprises a speaker for outputting audio signals.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
In a typical configuration, a computing device includes one or more processors (CPUs), input/output interfaces, network interfaces, and memory.
The memory may include non-persistent memory, Random Access Memory (RAM), and/or non-volatile memory in a computer-readable medium, such as Read-Only Memory (ROM) or flash memory (flash RAM). Memory is an example of a computer-readable medium.
Computer-readable media include permanent and non-permanent, removable and non-removable media, and may store information by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random access memory (SRAM), dynamic random access memory (DRAM), other types of random access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape, magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can be used to store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
It should also be noted that the terms "comprises," "comprising," or any other variation thereof are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements includes not only those elements but may also include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of other identical elements in the process, method, article, or apparatus that comprises the element.
The above descriptions are merely embodiments of the present application and are not intended to limit the present application. Various modifications and changes may occur to those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present application shall fall within the scope of the claims of the present application.

Claims (24)

1. A semantic map construction method, applicable to an autonomous mobile device, wherein the method comprises:
collecting environment information of an operation area;
performing target detection on the environment information to obtain semantic information of at least one object contained in the operation area;
identifying scene semantic information of the operation area according to the semantic information of the at least one object;
and constructing a semantic map corresponding to the operation area according to the scene semantic information of the operation area.
2. The method of claim 1, wherein performing target detection on the environment information to obtain semantic information of at least one object contained in the operation area comprises:
performing target detection and semantic recognition on the environment information by using a target detection model to obtain the semantic information of the at least one object contained in the operation area.
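As one possible concretization of claim 2 (an assumption; the application does not name a detection model), the sketch below runs torchvision's pretrained Faster R-CNN over a captured frame and keeps high-confidence detections as the objects' semantic information.

```python
import torch
import torchvision

# Pretrained COCO detector as a stand-in for the claimed target detection model.
# (torchvision >= 0.13; older versions use pretrained=True instead of weights=.)
model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_semantics(image_tensor: torch.Tensor, score_threshold: float = 0.6):
    """Return (label_id, box) pairs for objects detected in one environment image."""
    with torch.no_grad():
        predictions = model([image_tensor])[0]   # dict with 'boxes', 'labels', 'scores'
    results = []
    for box, label, score in zip(predictions["boxes"], predictions["labels"], predictions["scores"]):
        if float(score) >= score_threshold:
            results.append((int(label), [round(float(v), 1) for v in box]))
    return results

if __name__ == "__main__":
    dummy_image = torch.rand(3, 480, 640)        # stand-in for a captured RGB frame in [0, 1]
    print(detect_semantics(dummy_image))
```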
3. The method of claim 1, wherein identifying scene semantic information of the operation area according to the semantic information of the at least one object comprises:
identifying the scene semantic information of the operation area according to the semantic information of the at least one object, in combination with a previously learned defined relationship between objects and area scenes.
4. The method of claim 3, wherein identifying the scene semantic information of the operation area according to the semantic information of the at least one object in combination with the previously learned defined relationship between objects and area scenes comprises:
inputting the semantic information of the at least one object into a pre-trained semantic recognition model, and identifying the scene semantic information of the operation area by using the semantic recognition model; wherein the semantic recognition model embodies the defined relationship between objects and area scenes.
5. The method of claim 4, wherein the semantic recognition model is a Bayesian network model, and pre-training the Bayesian network model comprises:
acquiring a plurality of labeled samples, wherein each labeled sample comprises scene semantic information of a sample area and semantic information of sample objects contained in the sample area;
obtaining, from the plurality of labeled samples, a prior probability of each sample object appearing in each sample area;
and training the Bayesian network model according to the prior probability of each sample object appearing in each sample area.
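A toy sketch of this training procedure is shown below: prior probabilities of each sample object appearing in each sample area scene are estimated from labeled samples, and candidate scenes are then scored for a new set of detected objects. The add-one smoothing, log-scoring, and sample data are illustrative assumptions rather than the claimed Bayesian network implementation.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, Iterable, List

def train_priors(labeled_samples: List[Dict]) -> Dict[str, Dict[str, float]]:
    """Estimate P(object appears | sample area scene) from labeled samples, with add-one smoothing."""
    scene_object_counts: Dict[str, Counter] = defaultdict(Counter)
    scene_totals: Counter = Counter()
    for sample in labeled_samples:
        scene = sample["scene"]
        scene_totals[scene] += 1
        scene_object_counts[scene].update(set(sample["objects"]))
    return {
        scene: {obj: (counts[obj] + 1) / (scene_totals[scene] + 2) for obj in counts}
        for scene, counts in scene_object_counts.items()
    }

def score_scenes(objects: Iterable[str], priors: Dict[str, Dict[str, float]]) -> str:
    """Pick the area scene with the highest log-score for the detected objects."""
    best_scene, best_score = "unknown", float("-inf")
    for scene, table in priors.items():
        score = sum(math.log(table.get(obj, 1e-3)) for obj in objects)
        if score > best_score:
            best_scene, best_score = scene, score
    return best_scene

if __name__ == "__main__":
    samples = [
        {"scene": "bedroom", "objects": ["bed", "wardrobe"]},
        {"scene": "kitchen", "objects": ["stove", "sink"]},
        {"scene": "bedroom", "objects": ["bed", "desk"]},
    ]
    print(score_scenes(["bed", "desk"], train_priors(samples)))   # expected: bedroom
```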
6. The method of claim 1, wherein constructing the semantic map corresponding to the operation area according to the scene semantic information of the operation area comprises:
constructing the semantic map corresponding to the operation area according to the scene semantic information of the operation area and the semantic information of the at least one object.
7. The method of claim 6, wherein constructing the semantic map corresponding to the operation area according to the scene semantic information of the operation area and the semantic information of the at least one object comprises:
constructing the semantic map corresponding to the operation area according to the scene semantic information of the operation area and the semantic information of the at least one object, in combination with position information of the at least one object in the operation area.
8. The method of claim 7, wherein constructing the semantic map corresponding to the operation area according to the scene semantic information of the operation area and the semantic information of the at least one object, in combination with the position information of the at least one object in the operation area, comprises:
constructing an environment map corresponding to the operation area in combination with the position information of the at least one object in the operation area;
and marking the scene semantic information of the operation area and the semantic information of the at least one object in the environment map to obtain the semantic map corresponding to the operation area.
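The sketch below illustrates claim 8 under assumed data structures: an environment map is built from object positions on a coarse grid, and the scene semantic information of the operation area plus per-object labels are then marked on it to form the semantic map. The grid representation and resolution are assumptions, not the application's map format.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

@dataclass
class EnvironmentMap:
    resolution: float = 0.05                                   # metres per grid cell
    occupied_cells: List[Tuple[int, int]] = field(default_factory=list)

@dataclass
class SemanticMap:
    base_map: EnvironmentMap
    area_scene: str = "unknown"                                # scene semantic information
    object_labels: Dict[Tuple[int, int], str] = field(default_factory=dict)

def build_environment_map(positions: List[Tuple[float, float]], resolution: float = 0.05) -> EnvironmentMap:
    cells = [(int(x / resolution), int(y / resolution)) for x, y in positions]
    return EnvironmentMap(resolution=resolution, occupied_cells=cells)

def annotate(base_map: EnvironmentMap, scene: str,
             objects: List[Tuple[str, Tuple[float, float]]]) -> SemanticMap:
    """Mark the area scene and each object's semantic label on the environment map."""
    semantic = SemanticMap(base_map=base_map, area_scene=scene)
    for label, (x, y) in objects:
        cell = (int(x / base_map.resolution), int(y / base_map.resolution))
        semantic.object_labels[cell] = label
    return semantic

if __name__ == "__main__":
    objs = [("bed", (1.0, 2.0)), ("wardrobe", (0.5, 3.0))]
    env_map = build_environment_map([pos for _, pos in objs])
    semantic_map = annotate(env_map, "bedroom", objs)
    print(semantic_map.area_scene, semantic_map.object_labels)
```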
9. A semantic map construction method, applicable to an autonomous mobile device, wherein the method comprises:
collecting environment information of an operation environment, wherein the operation environment comprises at least one operation partition;
performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the operation environment;
identifying scene semantic information of the at least one operation partition according to the semantic information of the plurality of objects contained in the operation environment;
and constructing a semantic map corresponding to the operation environment according to the scene semantic information of the at least one operation partition.
10. The method of claim 9, wherein identifying the scene semantic information of the at least one operation partition according to the semantic information of the plurality of objects contained in the operation environment comprises:
determining the objects contained in each of the at least one operation partition according to position information of the plurality of objects and a boundary of the at least one operation partition;
and identifying the scene semantic information of the at least one operation partition respectively according to the semantic information of the objects contained in the at least one operation partition.
11. The method of claim 10, further comprising, before determining the objects contained in each of the at least one operation partition:
partitioning the operation environment according to the environment information to determine the boundary of the at least one operation partition.
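The following sketch illustrates claims 10 and 11 under the assumption that partition boundaries are axis-aligned rectangles (the application does not specify how boundaries are represented): the operation environment is described by named boundaries, and each detected object is assigned to the partition whose boundary contains its position.

```python
from typing import Dict, List, Tuple

Boundary = Tuple[float, float, float, float]       # (x_min, y_min, x_max, y_max)

def assign_objects_to_partitions(
    objects: List[Tuple[str, Tuple[float, float]]],
    boundaries: Dict[str, Boundary],
) -> Dict[str, List[str]]:
    """Group detected objects by the operation partition whose boundary contains their position."""
    grouped: Dict[str, List[str]] = {name: [] for name in boundaries}
    for label, (x, y) in objects:
        for name, (x0, y0, x1, y1) in boundaries.items():
            if x0 <= x <= x1 and y0 <= y <= y1:
                grouped[name].append(label)
                break
    return grouped

if __name__ == "__main__":
    boundaries = {"partition_1": (0, 0, 5, 5), "partition_2": (5, 0, 10, 5)}
    objects = [("bed", (1.0, 2.0)), ("stove", (7.0, 3.0))]
    # e.g. {'partition_1': ['bed'], 'partition_2': ['stove']} -> bedroom / kitchen scenes
    print(assign_objects_to_partitions(objects, boundaries))
```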
12. The method of claim 10, wherein identifying the scene semantic information of the at least one operation partition respectively according to the semantic information of the objects contained in the at least one operation partition comprises:
identifying the scene semantic information of the at least one operation partition respectively according to the semantic information of the objects contained in the at least one operation partition, in combination with a previously learned defined relationship between objects and area scenes.
13. A semantic map construction method, applicable to a server, wherein the method comprises:
receiving environment information of an operation area reported by an autonomous mobile device;
performing target detection on the environment information to obtain semantic information of at least one object contained in the operation area;
identifying scene semantic information of the operation area according to the semantic information of the at least one object;
constructing a semantic map corresponding to the operation area according to the scene semantic information of the operation area;
and sending the semantic map to the autonomous mobile device for the autonomous mobile device to perform an operation task based on the semantic map.
14. A semantic map construction method, applicable to a server, wherein the method comprises:
receiving environment information of an operation environment reported by an autonomous mobile device, wherein the operation environment comprises at least one operation partition;
performing target detection on the environment information to obtain semantic information of a plurality of objects contained in the operation environment;
identifying scene semantic information of the at least one operation partition according to the semantic information of the plurality of objects contained in the operation environment;
constructing a semantic map corresponding to the operation environment according to the scene semantic information of the at least one operation partition;
and sending the semantic map to the autonomous mobile device for the autonomous mobile device to perform an operation task based on the semantic map.
15. An operation method, applicable to an autonomous mobile device, wherein the method comprises:
receiving an operation instruction, wherein the operation instruction comprises scene semantic information of a target operation area;
determining a position of the target operation area in a pre-established semantic map according to the scene semantic information of the target operation area;
and moving to the target operation area based on the position, and performing an operation task in the target operation area.
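A sketch of this operation flow is given below under assumed interfaces: the semantic map is reduced to a mapping from scene labels to area positions, and move_to / perform_task are placeholders for the device's navigation and task-execution layers, which the claim does not specify.

```python
from typing import Dict, Optional, Tuple

# Assumed reduction of the pre-established semantic map: scene label -> area position.
SEMANTIC_MAP: Dict[str, Tuple[float, float]] = {"bedroom": (1.5, 2.5), "kitchen": (7.0, 3.0)}

def move_to(position: Tuple[float, float]) -> None:
    print(f"navigating to {position}")                   # placeholder for path planning / motion control

def perform_task(scene: str) -> None:
    print(f"executing operation task in the {scene}")    # placeholder for e.g. a cleaning routine

def handle_operation_instruction(instruction: Dict[str, str]) -> None:
    target_scene = instruction["target_scene"]           # scene semantic information of the target area
    position: Optional[Tuple[float, float]] = SEMANTIC_MAP.get(target_scene)
    if position is None:
        print(f"no area labelled '{target_scene}' in the semantic map")
        return
    move_to(position)
    perform_task(target_scene)

if __name__ == "__main__":
    handle_operation_instruction({"target_scene": "bedroom"})
```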
16. The method of claim 15, wherein receiving the operation instruction comprises:
displaying the semantic map, wherein the semantic map comprises at least one operation area and its scene semantic information;
and generating the operation instruction in response to a user's selection of the at least one operation area, wherein the operation instruction comprises the target operation area selected by the user and the scene semantic information of the target operation area.
17. An operation method, applicable to a terminal device, wherein the method comprises:
displaying a semantic map corresponding to an operation environment of an autonomous mobile device, wherein the semantic map comprises at least one operation area and its scene semantic information;
generating an operation instruction in response to a user's selection of the at least one operation area, wherein the operation instruction comprises a target operation area selected by the user and the scene semantic information of the target operation area;
and sending the operation instruction to the autonomous mobile device to control the autonomous mobile device to perform an operation task in the target operation area.
18. An autonomous mobile device, comprising a device body, wherein one or more memories and one or more processors are disposed on the device body;
the one or more memories are configured to store a computer program;
the one or more processors are configured to execute the computer program to:
collect environment information of an operation area of the autonomous mobile device;
perform target detection on the environment information to obtain semantic information of at least one object contained in the operation area;
identify scene semantic information of the operation area according to the semantic information of the at least one object;
and construct a semantic map corresponding to the operation area according to the scene semantic information of the operation area.
19. An autonomous mobile device, comprising a device body, wherein one or more memories and one or more processors are disposed on the device body;
the one or more memories are configured to store a computer program;
the one or more processors are configured to execute the computer program to:
collect environment information of an operation environment of the autonomous mobile device, the operation environment comprising at least one operation partition;
perform target detection on the environment information to obtain semantic information of a plurality of objects contained in the operation environment;
identify scene semantic information of the at least one operation partition according to the semantic information of the plurality of objects contained in the operation environment;
and construct a semantic map corresponding to the operation environment according to the scene semantic information of the at least one operation partition.
20. A server, comprising: one or more memories, one or more processors, and a communication component;
the one or more memories are configured to store a computer program;
the one or more processors are configured to execute the computer program to:
receive, through the communication component, environment information of an operation area reported by an autonomous mobile device;
perform target detection on the environment information to obtain semantic information of at least one object contained in the operation area;
identify scene semantic information of the operation area according to the semantic information of the at least one object;
construct a semantic map corresponding to the operation area according to the scene semantic information of the operation area;
and send, through the communication component, the semantic map to the autonomous mobile device for the autonomous mobile device to perform an operation task based on the semantic map.
21. A server, comprising: one or more memories, one or more processors, and a communication component;
the one or more memories are configured to store a computer program;
the one or more processors are configured to execute the computer program to:
receive, through the communication component, environment information of an operation environment reported by an autonomous mobile device, the operation environment comprising at least one operation partition;
perform target detection on the environment information to obtain semantic information of a plurality of objects contained in the operation environment;
identify scene semantic information of the at least one operation partition according to the semantic information of the plurality of objects contained in the operation environment;
construct a semantic map corresponding to the operation environment according to the scene semantic information of the at least one operation partition;
and send, through the communication component, the semantic map to the autonomous mobile device for the autonomous mobile device to perform an operation task based on the semantic map.
22. An autonomous mobile device, comprising a device body, wherein one or more memories, one or more processors, and a communication component are disposed on the device body;
the one or more memories are configured to store a computer program;
the one or more processors are configured to execute the computer program to:
receive, through the communication component, an operation instruction, wherein the operation instruction comprises scene semantic information of a target operation area;
determine a position of the target operation area in a pre-established semantic map according to the scene semantic information of the target operation area;
and move to the target operation area based on the position, and perform an operation task in the target operation area.
23. A terminal device, comprising: one or more memories, one or more processors, a communication component, and a display;
the display is configured to display a semantic map corresponding to an operation environment of an autonomous mobile device, the semantic map comprising at least one operation area and its scene semantic information;
the one or more memories are configured to store a computer program;
the one or more processors are configured to execute the computer program to:
generate an operation instruction in response to a user's selection of the at least one operation area, wherein the operation instruction comprises a target operation area selected by the user and the scene semantic information of the target operation area;
and send the operation instruction to the autonomous mobile device through the communication component to control the autonomous mobile device to perform an operation task in the target operation area.
24. A computer-readable storage medium storing a computer program, wherein the computer program, when executed by one or more processors, causes the one or more processors to perform the steps of the method of any one of claims 1-17.
CN202010852364.8A 2019-11-07 2020-08-21 Semantic map construction and operation method, autonomous mobile device and storage medium Withdrawn CN112784664A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN2019110828660 2019-11-07
CN201911082866 2019-11-07

Publications (1)

Publication Number Publication Date
CN112784664A true CN112784664A (en) 2021-05-11

Family

ID=75750249

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010852364.8A Withdrawn CN112784664A (en) 2019-11-07 2020-08-21 Semantic map construction and operation method, autonomous mobile device and storage medium

Country Status (1)

Country Link
CN (1) CN112784664A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106782029A (en) * 2016-11-30 2017-05-31 北京贝虎机器人技术有限公司 Indoor map generation method and device
CN106814734A (en) * 2016-11-30 2017-06-09 北京贝虎机器人技术有限公司 The method and system of autonomous formula equipment are controlled using computing device
CN107248176A (en) * 2017-06-30 2017-10-13 联想(北京)有限公司 Indoor map construction method and electronic equipment
CN108303101A (en) * 2018-03-05 2018-07-20 弗徕威智能机器人科技(上海)有限公司 A kind of construction method of navigation map

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113552879A (en) * 2021-06-30 2021-10-26 北京百度网讯科技有限公司 Control method and device of self-moving equipment, electronic equipment and storage medium
CN113552879B (en) * 2021-06-30 2024-06-07 北京百度网讯科技有限公司 Control method and device of self-mobile device, electronic device and storage medium
CN114589708A (en) * 2022-02-28 2022-06-07 华南师范大学 Indoor autonomous exploration method and device based on environmental information and robot
CN114589708B (en) * 2022-02-28 2023-11-07 华南师范大学 Indoor autonomous exploration method and device based on environment information and robot
CN114872056A (en) * 2022-04-21 2022-08-09 美智纵横科技有限责任公司 House map generation method and device, cleaning assembly and cleaning equipment
CN116091607A (en) * 2023-04-07 2023-05-09 科大讯飞股份有限公司 Method, device, equipment and readable storage medium for assisting user in searching object
CN116091607B (en) * 2023-04-07 2023-09-26 科大讯飞股份有限公司 Method, device, equipment and readable storage medium for assisting user in searching object
CN116300974A (en) * 2023-05-18 2023-06-23 科沃斯家用机器人有限公司 Operation planning, partitioning, operation method, autonomous mobile device and cleaning robot

Similar Documents

Publication Publication Date Title
JP7395229B2 (en) Mobile cleaning robot artificial intelligence for situational awareness
CN112784664A (en) Semantic map construction and operation method, autonomous mobile device and storage medium
CN111657798B (en) Cleaning robot control method and device based on scene information and cleaning robot
US20190349214A1 (en) Smart home automation systems and methods
CN111839371B (en) Ground sweeping method and device, sweeper and computer storage medium
KR102071575B1 (en) Moving robot, user terminal apparatus, and control method thereof
CN109003303A (en) Apparatus control method and device based on voice and space object identification and positioning
CN111973075B (en) Floor sweeping method and device based on house type graph, sweeper and computer medium
CN111643017B (en) Cleaning robot control method and device based on schedule information and cleaning robot
CN108919653B (en) Method and device for searching home equipment
KR102273192B1 (en) Artificial intelligence refrigerator and operating method thereof
WO2016202524A1 (en) Device for assisting a user in a household
CN110738304A (en) Machine model updating method, device and storage medium
US11730328B2 (en) Visual fiducial for behavior control zone
CN114821236A (en) Smart home environment sensing method, system, storage medium and electronic device
CN113729564A (en) Mobile robot scheduling and control based on context and user experience
CN114158980A (en) Job method, job mode configuration method, device, and storage medium
CN111343696A (en) Communication method of self-moving equipment, self-moving equipment and storage medium
CN114504273A (en) Robot control method and device
CN108415572B (en) Module control method and device applied to mobile terminal and storage medium
Kranz et al. Robots, objects, humans: Towards seamless interaction in intelligent environments
CN114332289A (en) Environment map construction method, equipment and storage medium
CN108804534A (en) A kind of method and system of information recommendation
CN111510667A (en) Monitoring method, device and storage medium
CN114494278A (en) Map partitioning and building method, object identification and cleaning method, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WW01 Invention patent application withdrawn after publication (Application publication date: 20210511)