CN111496784A - Space environment identification method and system for robot intelligent service - Google Patents

Space environment identification method and system for robot intelligent service

Info

Publication number
CN111496784A
CN111496784A (application CN202010228789.1A)
Authority
CN
China
Prior art keywords
article
relationship
image
visual
acquiring
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202010228789.1A
Other languages
Chinese (zh)
Other versions
CN111496784B (en)
Inventor
吴皓
马庆
焦梦林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shandong University
Original Assignee
Shandong University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shandong University filed Critical Shandong University
Priority to CN202010228789.1A priority Critical patent/CN111496784B/en
Publication of CN111496784A publication Critical patent/CN111496784A/en
Application granted granted Critical
Publication of CN111496784B publication Critical patent/CN111496784B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical


Classifications

    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1694 Programme controls characterised by use of sensors other than normal servo-feedback from position, speed or acceleration sensors, perception control, multi-sensor controlled systems, sensor fusion
    • B25J9/1697 Vision controlled systems
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J11/00 Manipulators not otherwise provided for
    • B25J11/008 Manipulators for service tasks
    • B PERFORMING OPERATIONS; TRANSPORTING
    • B25 HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00 Programme-controlled manipulators
    • B25J9/16 Programme controls
    • B25J9/1602 Programme controls characterised by the control system, structure, architecture

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Image Analysis (AREA)
  • Manipulator (AREA)

Abstract

The disclosure provides a space environment identification method and system for robot intelligent service, belonging to the technical field of robots. The method acquires at least one image of the space environment region to be identified; obtains the visual relationship triples of the articles in the image by applying a preset visual relationship acquisition model to the acquired image; obtains the attribute of each article by applying a preset attribute acquisition model to the extracted article features; builds the article multiple-relationship graph of the current image from the visual relationship triples and attributes of each article; and, after processing each image of the environment region to be identified in this way, constructs a robot semantic visual space from the article multiple-relationship graphs of all images, realizing identification of the space environment. By constructing relationship triples, the method obtains the relationships among the articles in the space and the relationships between articles and attributes, greatly improving the environment identification capability of the space.

Description

Space environment identification method and system for robot intelligent service
Technical Field
The disclosure relates to the technical field of robots, in particular to a space environment identification method and system for robot intelligent service.
Background
The statements in this section merely provide background information related to the present disclosure and may not necessarily constitute prior art.
When a robot performs intelligent services, it needs to understand the environment at the semantic level; in particular, mapping the visual environment into a semantic space enhances the robot's cognitive intelligence and allows tasks to be analyzed better.
The inventor of the present disclosure finds that: (1) most existing space identification methods simply recognize images without considering the attributes of the articles in them, which makes space identification inaccurate and leaves the relationships among the articles in the space poorly defined; (2) existing recognition algorithms mainly use neural networks to extract image features and thereby recognize the articles in an image, but there is little research on the relative position relationships within an image, so the relative position relationships among articles in the space cannot be expressed effectively.
Disclosure of Invention
To remedy the deficiencies of the prior art, the present disclosure provides a space environment identification method and system for robot intelligent service, which obtain the relationships among the articles in the space and the relationships between articles and attributes by constructing relationship triples, greatly improving the environment identification capability of the space.
To achieve this purpose, the present disclosure adopts the following technical scheme:
the first aspect of the disclosure provides a space environment identification method for robot intelligent service.
A space environment identification method for robot intelligent service comprises the following steps:
acquiring at least one image of a spatial environment region to be identified;
acquiring, from the acquired image and using a preset visual relationship acquisition model, the article features in the image and the relationship features among the articles, and further obtaining the visual relationship triples of the articles in the image;
obtaining the attribute of each article from the acquired article features using a preset attribute acquisition model;
obtaining the article multiple-relationship graph of the current image according to the obtained visual relationship triples and attributes of each article;
processing each acquired image of the environment region to be identified to obtain its article multiple-relationship graph, and constructing a robot semantic visual space according to the article multiple-relationship graphs of all acquired images, so as to realize identification of the space environment.
A second aspect of the present disclosure provides a space environment identification system for robot intelligent service.
A space environment identification system for robot intelligent service, comprising:
a data acquisition module configured to: acquire at least one image of the spatial environment region to be identified;
a visual relationship triple acquisition module configured to: acquire, from the acquired image and using a preset visual relationship acquisition model, the article features in the image and the relationship features among the articles, and further obtain the visual relationship triples of the articles in the image;
an article attribute acquisition module configured to: obtain the attribute of each article from the acquired article features using a preset attribute acquisition model;
an article multiple-relationship graph acquisition module configured to: obtain the article multiple-relationship graph of the current image according to the obtained visual relationship triples and attributes of each article;
an environment identification module configured to: process each image of the environment region to be identified to obtain its article multiple-relationship graph, and construct a robot semantic visual space according to the article multiple-relationship graphs of all images, so as to realize identification of the environment.
A third aspect of the present disclosure provides a medium on which a program is stored, the program, when executed by a processor, implementing the steps in the space environment identification method for robot intelligent service according to the first aspect of the present disclosure.
A fourth aspect of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor; when the processor executes the program, the steps in the space environment identification method for robot intelligent service according to the first aspect of the present disclosure are implemented.
Compared with the prior art, the beneficial effects of the present disclosure are:
1. The method, system, medium, and electronic device of the present disclosure use a preset visual relationship acquisition model to obtain the article features in the image and the relationship features between articles, build the triple features of the articles in the image, and further obtain the visual relationship triple of each article, thereby identifying the relative position relationships of the articles in the image and improving the identification capability of the space.
2. The method, system, medium, and electronic device of the present disclosure obtain the attribute of each article from the acquired article features using a preset attribute acquisition model, and build the article multiple-relationship graph of the current image from the obtained visual relationship triples and attributes of each article; by effectively combining the visual relationships and attributes of the articles, the space environment is identified more efficiently and accurately.
Drawings
Fig. 1 is a schematic diagram of a visual relationship acquisition framework provided in embodiment 1 of the present disclosure.
Fig. 2 is a schematic diagram of a BRNN framework in the semantic visual space construction provided in embodiment 1 of the present disclosure.
Fig. 3 is a multiple relation diagram in a semantic visual space provided in embodiment 1 of the present disclosure.
Detailed Description
It should be noted that the following detailed description is exemplary and is intended to provide further explanation of the disclosure. Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this disclosure belongs.
It is noted that the terminology used herein is for the purpose of describing particular embodiments only and is not intended to be limiting of example embodiments according to the present disclosure. As used herein, the singular forms "a", "an", and "the" are intended to include the plural forms as well unless the context clearly indicates otherwise, and it should be understood that the terms "comprises" and/or "comprising", when used in this specification, specify the presence of stated features, steps, operations, devices, components, and/or combinations thereof.
The embodiments and features of the embodiments in the present disclosure may be combined with each other without conflict.
Embodiment 1:
As shown in Figs. 1 to 3, Embodiment 1 of the present disclosure provides a space environment identification method for robot intelligent service, comprising the following steps:
acquiring at least one image of a spatial environment region to be identified;
acquiring, from the acquired image and using a preset visual relationship acquisition model, the article features in the image and the relationship features among the articles, building the triple features of the articles in the image, and further obtaining the visual relationship triples of the articles in the image;
obtaining the attribute of each article from the acquired article features using a preset attribute acquisition model;
obtaining the article multiple-relationship graph of the current image according to the obtained visual relationship triples and attributes of each article;
processing each acquired image of the environment region to be identified to obtain its article multiple-relationship graph, and constructing a robot semantic visual space according to the article multiple-relationship graphs of all acquired images, so as to realize identification of the space environment.
The visual space is constructed by the following steps:
First, the object relationships in the environment are mapped to the semantic level, enhancing the robot's cognition of the spatial relationships in the service environment. Next, to further understand the intrinsic logical relationships of the articles, a method of obtaining article attributes from the network is given. Finally, the obtained environmental knowledge is stored in the form of a multiple-relationship graph, completing the construction of the visual space.
The method specifically comprises the following steps:
(1) Acquisition of visual relationships
In the semantic visual space construction process, the visual relationships between the objects in the scene, such as "Computer on desk", are obtained first. These relationships are generally expressed as triples of the form <subject, predicate, object>. Visual relationships realize a higher-level understanding at the semantic level and are an important path from perceptual intelligence to cognitive intelligence. In the construction of the semantic visual space, the visual relationships directly provide the dominant knowledge of the environment. Fig. 1 shows the visual relationship acquisition method, which combines image and semantic information to acquire the visual relationships.
In this method, Faster R-CNN (a region-based convolutional neural network) is first used for target detection to obtain the semantics of each target object and the coordinate information of its detection box. The semantics of the target objects are feature-encoded, and the relationship features between objects are acquired from the detection box information, completing the construction of the triple features. The triple features are then input into a Bi-RNN (bidirectional recurrent neural network) to construct the relationship triples. In particular, considering that dynamic objects can undermine the robustness of the constructed visual relationships, the framework also uses a dynamic article filtering module to reduce the influence of dynamic objects on the visual relationship library.
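As a concrete illustration of this detection stage, the following sketch runs an off-the-shelf detector to obtain object labels and detection-box coordinates. It is a minimal sketch only: the patent does not name an implementation, so torchvision's pretrained Faster R-CNN, the score threshold, and the helper name are assumptions.

```python
# Hedged sketch of the target-detection step (detector choice is an assumption).
import torch
import torchvision

model = torchvision.models.detection.fasterrcnn_resnet50_fpn(weights="DEFAULT")
model.eval()

def detect_objects(image_tensor, score_threshold=0.7):
    """Return (label_id, box) pairs for confident detections in one image.

    image_tensor: float tensor of shape (3, H, W), values in [0, 1].
    """
    with torch.no_grad():
        pred = model([image_tensor])[0]  # dict with 'boxes', 'labels', 'scores'
    keep = pred["scores"] >= score_threshold
    return list(zip(pred["labels"][keep].tolist(),
                    pred["boxes"][keep].tolist()))
```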
(1-1) Triple feature construction
The triple features comprise the word vectors of the entity targets and the entity relationship encoding. A BERT (Bidirectional Encoder Representations from Transformers) model is used to construct the word vectors of the entity targets: the semantic labels in the data set are used as training data and input into the BERT model for fine-tuning, and the parameters obtained by the model serve as the features, i.e., the word vectors.
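A hedged sketch of this word-vector step follows. The patent fine-tunes BERT on the data set's semantic labels; here a pretrained Chinese BERT is used without fine-tuning, and the model name and mean-pooling are assumptions.

```python
# Sketch: encode an object's semantic label into a fixed-size word vector.
import torch
from transformers import BertTokenizer, BertModel

tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
bert = BertModel.from_pretrained("bert-base-chinese")
bert.eval()

def label_word_vector(label: str) -> torch.Tensor:
    """Encode an object label, e.g. "desk", into one 768-dim vector."""
    inputs = tokenizer(label, return_tensors="pt")
    with torch.no_grad():
        outputs = bert(**inputs)
    # Mean-pool the token embeddings into a single word vector.
    return outputs.last_hidden_state.mean(dim=1).squeeze(0)
```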
When obtaining the relational features, the spatial relationships between the targets are used to model them. The relationship between objects is denoted [x1, x2, x3], where x1 represents object 1, x3 represents object 3, and x2 represents the relationship. The spatial relationship s is computed from the prediction-box geometry of the two objects (Equation (1); the equation image is not reproduced in this text). In Equation (1), bi (i = 1, 3) denotes the prediction box of a target object, (xi, yi) are the center coordinates of the prediction box, and wi and hi represent its width and height respectively; W and H represent the width and height of the intersection of the prediction boxes of target 1 and target 3. After the spatial relationship s is obtained, s is input into a multilayer perceptron (MLP) to obtain a feature representation x2 with the same dimension as the word vectors, and thus the feature expression of the triple is obtained.
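The following sketch shows how such a relational feature can be built and projected to the word-vector dimension. Since the exact encoding of Equation (1) is not reproduced above, the feature layout below is only one plausible arrangement of the quantities the text names (box centers, widths, heights, and the intersection width W and height H); the MLP sizes are likewise assumptions.

```python
# Sketch: spatial-relation feature s from two prediction boxes, then an MLP
# maps s to the word-vector dimension (768 for BERT-base). Layout is assumed.
import torch
import torch.nn as nn

def spatial_relation(box1, box3):
    """box = (cx, cy, w, h): center coordinates, width, height."""
    (cx1, cy1, w1, h1), (cx3, cy3, w3, h3) = box1, box3
    # Width/height of the intersection of the two boxes (0 if disjoint).
    W = max(0.0, min(cx1 + w1 / 2, cx3 + w3 / 2) - max(cx1 - w1 / 2, cx3 - w3 / 2))
    H = max(0.0, min(cy1 + h1 / 2, cy3 + h3 / 2) - max(cy1 - h1 / 2, cy3 - h3 / 2))
    eps = 1e-6
    return torch.tensor([(cx1 - cx3) / (w3 + eps), (cy1 - cy3) / (h3 + eps),
                         w1 / (w3 + eps), h1 / (h3 + eps),
                         W / (w1 + w3 + eps), H / (h1 + h3 + eps)])

relation_mlp = nn.Sequential(nn.Linear(6, 256), nn.ReLU(), nn.Linear(256, 768))
x2 = relation_mlp(spatial_relation((120, 80, 60, 40), (130, 140, 200, 90)))
```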
(1-2) Construction of visual relationships
After the triple features are obtained, they are input into the Bi-RNN for visual relationship prediction. The BRNN (bidirectional recurrent neural network) is an efficient natural language processing model formed by stacking two RNNs, one running forward over the sequence and one backward, so that the output is determined jointly by the states of both. In the visual relationship acquisition model, the Bi-RNN is designed to produce a single output rather than one per step, and its input is the triple features, as shown in Fig. 2.
The framework shown in Fig. 2 includes forward propagation and backward propagation, and is divided into an input layer, a hidden layer, and an output layer. The hidden layer and the output layer follow the standard bidirectional-RNN formulas (the equation images were not reproduced in this text; the forms below are reconstructed from the parameter definitions that follow):

\overrightarrow{h}_t = f(W_{xh}^{\rightarrow} x_t + W_{hh}^{\rightarrow} \overrightarrow{h}_{t-1} + b^{\rightarrow})    (2)

\overleftarrow{h}_t = f(W_{xh}^{\leftarrow} x_t + W_{hh}^{\leftarrow} \overleftarrow{h}_{t+1} + b^{\leftarrow})    (3)

y_t = W_{hy}^{\rightarrow} \overrightarrow{h}_t + W_{hy}^{\leftarrow} \overleftarrow{h}_t + b_y    (4)

In these formulas, W_{xh}^{\rightarrow} and W_{hh}^{\rightarrow} are the input-layer-to-hidden-layer and hidden-layer-to-hidden-layer parameter matrices of the forward propagation process; W_{hy}^{\rightarrow} and b^{\rightarrow} are the hidden-layer-to-output-layer parameter matrix and offset vector of the forward propagation process. The parameters of the backward propagation process (superscript \leftarrow) are defined in the same way. x_t represents the input, y the output, f the activation function, and b_y an offset value.
From the framework shown in Fig. 2 and Equation (4), the predicted relationship output y is obtained as a function of the hidden states h_1, h_2, h_3 (Equation (5); the equation image is not reproduced in this text). The values of h_1, h_2, h_3 in forward and backward propagation are given by formulas (2) and (3), and y represents the predicted relationship, such as "with" or "on". After the word vectors and relationship features are processed by the trained Bi-RNN, the visual relationship triples are obtained.
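A minimal sketch of this predictor is given below: a bidirectional RNN reads the three triple features and emits a single relation label. The GRU cell, hidden size, mean-pooling of the step states, and classifier head are assumptions, as the patent specifies only a Bi-RNN with one output.

```python
# Sketch: Bi-RNN over the triple features (x1, x2, x3) -> one relation label.
import torch
import torch.nn as nn

class RelationBiRNN(nn.Module):
    def __init__(self, feat_dim=768, hidden=256, num_relations=50):
        super().__init__()
        self.rnn = nn.GRU(feat_dim, hidden, bidirectional=True, batch_first=True)
        self.head = nn.Linear(2 * hidden, num_relations)

    def forward(self, triple_feats):
        """triple_feats: (batch, 3, feat_dim), the features of x1, x2, x3."""
        h, _ = self.rnn(triple_feats)     # (batch, 3, 2*hidden)
        # One output rather than three: pool the per-step states h1, h2, h3.
        return self.head(h.mean(dim=1))   # relation logits

logits = RelationBiRNN()(torch.randn(1, 3, 768))
predicted_relation = logits.argmax(dim=-1)
```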
(1-3) Dynamic object filtering
In the process of acquiring visual relationships, the service robot is concerned with the relationships among the articles in the environment, which generally remain unchanged for a certain period of time. However, dynamic objects in the environment directly affect the timeliness of the visual relationships. For example, when detecting visual relationships, relationships between people and articles are often extracted, and these relationships may change rapidly. If they are added to the relationship library, they will be inconsistent with reality when visual relationships are retrieved in subsequent services, while also increasing the redundancy of the relationship library and reducing the robot's service efficiency. Therefore, the obtained triple relationships that contain dynamic objects need to be filtered out.
The dynamic filtering module mainly maintains a dynamic object list: when a visual relationship triple is detected to contain a dynamic object, the triple is filtered out. Objects may be added to the list according to common sense, e.g. "people", or according to experience in actual operation: if repeated observation shows that the visual relationships of a certain type of object change rapidly, that object is added. This completes the acquisition of the visual relationships, as sketched below.
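The sketch assumes triples are plain (subject, predicate, object) tuples and seeds the list with a "person"-like class; both the seed and the change-rate rule are illustrative.

```python
# Sketch of the dynamic-object filter over visual relationship triples.
DYNAMIC_OBJECTS = {"person"}  # seeded from common sense, e.g. people

def filter_triples(triples):
    """Keep only <subject, predicate, object> triples with no dynamic object."""
    return [t for t in triples
            if t[0] not in DYNAMIC_OBJECTS and t[2] not in DYNAMIC_OBJECTS]

def observe(obj, change_rate, threshold=0.5):
    """Add an object class whose visual relationships change too quickly."""
    if change_rate > threshold:
        DYNAMIC_OBJECTS.add(obj)
```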
(2) Attribute knowledge acquisition
When the robot executes intelligent services, it needs to understand not only the physical relationships between articles but also their deeper attributes, so as to better understand the relationships of articles at the logical level. Therefore, when building the semantic visual space, the attributes of the articles also need to be added. Owing to the diversity of articles and related attributes in the environment, the related attributes of the articles are obtained from the network.
In the process of acquiring the related attributes of an article, the article semantics are first used as keywords to search for related information on the network; the search yields a series of result character strings, from which the attributes of the article are extracted.
In this method, the text information is extracted with a BERT model and feature-encoded, and the encoding is input to a CRF layer for information decoding to obtain the labeling sequence. Here, CRF refers to a conditional random field, a discriminative undirected probabilistic graphical model used for labeling and segmenting sequence data. BERT-CRF is an improvement on the Bi-LSTM-based architecture, with the BERT model replacing the Bi-LSTM; the two have similar training principles.
The input sequence X = (X_1, X_2, …, X_n) is passed through the BERT-CRF model to obtain the predicted tag sequence y = (y_1, y_2, …, y_n).

The score of a predicted sequence is defined as follows (the equation image was not reproduced in this text; the form below is reconstructed from the definitions that follow):

s(X, y) = \sum_{i=1}^{n} P_{i, y_i} + \sum_{i=1}^{n-1} A_{y_i, y_{i+1}}    (6)

where P_{i, y_i} denotes the probability that the output result at the i-th position is y_i, and A_{y_i, y_{i+1}} denotes the transition probability from state i to i + 1. Because the scoring function takes the previous states into account, the obtained result is more consistent with the actual output.
During training, for each sample X, every possible sequence y is scored with s(X, y), and the total score is normalized:

P(y \mid X) = e^{s(X, y)} / \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})}    (7)

In Equation (7), y represents the correct annotation sequence and Y_X represents the set of all possible annotation sequences. The loss function follows from Equation (7):

L = -\log P(y \mid X) = \log \sum_{\tilde{y} \in Y_X} e^{s(X, \tilde{y})} - s(X, y)    (8)
Through continuous iterative training, the loss function is driven to its minimum, yielding the desired entity labeling performance. The attribute keywords in the result character strings are extracted by this model and linked to the articles in the environment, enriching the article semantics in the environment and further improving the knowledge system of the semantic visual space.
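The scoring and loss of equations (6)-(8) can be sketched directly. The brute-force partition sum below is for clarity only (practical CRF implementations use the forward algorithm), and the emission matrix P and transition matrix A are assumed to come from the BERT encoder and learned parameters respectively.

```python
# Sketch: CRF sequence score s(X, y) and negative log-likelihood loss.
import itertools
import torch

def score(P, A, y):
    """s(X, y) = sum_i P[i, y_i] + sum_i A[y_i, y_{i+1}] for one sequence."""
    emit = sum(P[i, yi] for i, yi in enumerate(y))
    trans = sum(A[y[i], y[i + 1]] for i in range(len(y) - 1))
    return emit + trans

def crf_nll(P, A, gold):
    """L = log sum_Y exp(s(X, Y)) - s(X, gold), summing over all sequences."""
    n, k = P.shape  # sequence length, number of tags
    all_scores = torch.stack([score(P, A, y)
                              for y in itertools.product(range(k), repeat=n)])
    return torch.logsumexp(all_scores, dim=0) - score(P, A, gold)

loss = crf_nll(torch.randn(4, 5), torch.randn(5, 5), [0, 2, 2, 1])
```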
(3) Storage of environmental knowledge
After the visual relationships and the article attributes in the environment have been acquired, the data preparation for constructing the semantic visual space is complete. Because the connections among these pieces of knowledge are diverse, the information needs to be organized and expressed efficiently in the form of a multiple-relationship graph. The basic elements of the multiple-relationship graph are nodes and edges: nodes store entity semantics and article attribute information, while edges represent the relationships between entities and between entities and attributes. Unlike a general graph, a multiple-relationship graph contains multiple types of nodes and multiple types of edges. Fig. 3 shows an example of multiple-relationship-graph storage.
In Fig. 3, the gray boxes represent entity information in the environment and the white boxes represent article attributes. The visual relationships and article attributes in the environment are combined in the form of a multiple-relationship graph; through this combination, the robot not only understands the article relationships in the environment at the semantic level but also gains knowledge of the articles at the logical level, forming a uniform, standard expression.
The acquired environmental knowledge is stored in the multiple-relationship graph, finally forming a semantic visual space suitable for the robot's intelligent services.
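As an illustration of this storage scheme, the sketch below holds entity and attribute nodes with typed edges in a networkx MultiDiGraph; the node and edge attribute names are assumptions, not the patent's schema.

```python
# Sketch: multiple-relationship graph with typed nodes and typed edges.
import networkx as nx

g = nx.MultiDiGraph()
g.add_node("computer", kind="entity")
g.add_node("desk", kind="entity")
g.add_node("electronic device", kind="attribute")

g.add_edge("computer", "desk", relation="on", kind="visual")  # <computer, on, desk>
g.add_edge("computer", "electronic device", relation="is_a", kind="attribute")

# Query: every visual relationship involving "computer".
visual = [(u, d["relation"], v) for u, v, d in g.edges(data=True)
          if d["kind"] == "visual" and u == "computer"]
```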
Embodiment 2:
Embodiment 2 of the present disclosure provides a space environment identification system for robot intelligent service, comprising:
a data acquisition module configured to: acquire at least one image of the spatial environment region to be identified;
a visual relationship triple acquisition module configured to: acquire, from the acquired image and using a preset visual relationship acquisition model, the article features in the image and the relationship features among the articles, and further obtain the visual relationship triples of the articles in the image;
an article attribute acquisition module configured to: obtain the attribute of each article from the acquired article features using a preset attribute acquisition model;
an article multiple-relationship graph acquisition module configured to: obtain the article multiple-relationship graph of the current image according to the obtained visual relationship triples and attributes of each article;
an environment identification module configured to: process each image of the environment region to be identified to obtain its article multiple-relationship graph, and construct a robot semantic visual space according to the article multiple-relationship graphs of all images, so as to realize identification of the environment.
The working method of the system described in this embodiment is the same as the space environment identification method for robot intelligent service in Embodiment 1, and details are not repeated here.
Embodiment 3:
Embodiment 3 of the present disclosure provides a medium on which a program is stored, the program, when executed by a processor, implementing the steps in the space environment identification method for robot intelligent service according to Embodiment 1 of the present disclosure.
Embodiment 4:
Embodiment 4 of the present disclosure provides an electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor; when the processor executes the program, the steps in the space environment identification method for robot intelligent service according to Embodiment 1 of the present disclosure are implemented.
As will be appreciated by one skilled in the art, embodiments of the present disclosure may be provided as a method, system, or computer program product. Accordingly, the present disclosure may take the form of a hardware embodiment, a software embodiment, or an embodiment combining software and hardware aspects. Furthermore, the present disclosure may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, optical storage, and the like) having computer-usable program code embodied therein.
The present disclosure is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the disclosure. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
It will be understood by those skilled in the art that all or part of the processes of the methods of the embodiments described above can be implemented by a computer program, which can be stored in a computer-readable storage medium, and when executed, can include the processes of the embodiments of the methods described above. The storage medium may be a magnetic disk, an optical disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), or the like.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure, and various modifications and changes may be made to the present disclosure by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the spirit and principle of the present disclosure should be included in the protection scope of the present disclosure.

Claims (10)

1. A space environment identification method for robot intelligent service is characterized by comprising the following steps:
acquiring at least one image of a spatial environment region to be identified;
acquiring, from the acquired image and using a preset visual relationship acquisition model, the article features in the image and the relationship features among the articles, and further obtaining the visual relationship triples of the articles in the image;
obtaining the attribute of each article from the acquired article features using a preset attribute acquisition model;
obtaining the article multiple-relationship graph of the current image according to the obtained visual relationship triples and attributes of each article;
processing each acquired image of the environment region to be identified to obtain its article multiple-relationship graph, and constructing a robot semantic visual space according to the article multiple-relationship graphs of all acquired images, so as to realize identification of the space environment.
2. The space environment identification method for robot intelligent service according to claim 1, wherein the preset visual relationship acquisition model comprises a convolutional neural network for target detection and a bidirectional recurrent neural network, and specifically comprises:
performing target detection on the acquired image with the convolutional neural network, acquiring the semantics of the target articles and the coordinate information of their detection boxes, performing feature coding on the semantics of the target and non-target articles, and acquiring the relationship features between articles from the detection box coordinate information to obtain the triple features;
inputting the obtained triple features into the bidirectional recurrent neural network for visual relationship prediction to obtain the visual relationship triples of the articles in the image.
3. The space environment identification method for robot intelligent service according to claim 2, wherein the preset visual relationship acquisition model further comprises a dynamic filtering module which, upon detecting through a preset dynamic object list that a visual relationship triple contains a dynamic object, filters out that triple.
4. The space environment identification method for robot intelligent service according to claim 3, wherein if the speed of change of the visual relationships of a certain type of object exceeds a preset threshold, the object is added to the dynamic object list so that the corresponding visual relationship triples are filtered out.
5. The space environment identification method for robot intelligent service according to claim 1, wherein the attribute of each article is obtained using a preset attribute acquisition model, specifically:
searching a preset database for article-related information, using the article semantics as keywords, to obtain a plurality of result character strings;
identifying the attribute keywords in the obtained result character strings using a BERT-CRF model, specifically:
extracting the semantic text information of the articles with a BERT model and performing feature coding, then inputting the codes into a CRF layer for information decoding to obtain the labeling sequences of the articles.
6. The space environment identification method for robot intelligent service according to claim 1, wherein the article multiple-relationship graph comprises a plurality of nodes storing entity semantics and article attribute information, and a plurality of edges representing the relationships between entities and between entities and attributes.
7. The space environment identification method for robot intelligent service according to claim 1, wherein the relationship features between articles are characterized by the spatial relationships between objects, specifically:
the triple features comprise the word vectors of the objects in the image; the relationship between targets is denoted [x1, x2, x3], where x1 represents object 1, x3 represents object 3, and x2 represents the relationship, and the spatial relationship s is computed from the prediction-box geometry (the claim's equation image is not reproduced in this text), in which bi denotes the prediction box of a target object, with i = 1 or 3, (xi, yi) are the center coordinates of the prediction box, wi and hi respectively represent the width and height of the prediction box, W and H represent the width and height of the intersection of the prediction boxes of target 1 and target 3, and the remaining term is the encoding of the relevant spatial relationships;
after the spatial relationship s is obtained, s is input into a multilayer perceptron to obtain a feature representation x2 with the same dimension as the word vectors, thereby obtaining the feature expression of the triple.
8. A space environment identification system for robot intelligent service, characterized by comprising:
a data acquisition module configured to: acquire at least one image of the spatial environment region to be identified;
a visual relationship triple acquisition module configured to: acquire, from the acquired image and using a preset visual relationship acquisition model, the article features in the image and the relationship features among the articles, and further obtain the visual relationship triples of the articles in the image;
an article attribute acquisition module configured to: obtain the attribute of each article from the acquired article features using a preset attribute acquisition model;
an article multiple-relationship graph acquisition module configured to: obtain the article multiple-relationship graph of the current image according to the obtained visual relationship triples and attributes of each article;
an environment identification module configured to: process each image of the environment region to be identified to obtain its article multiple-relationship graph, and construct a robot semantic visual space according to the article multiple-relationship graphs of all images, so as to realize identification of the environment.
9. A medium on which a program is stored, characterized in that the program, when executed by a processor, implements the steps in the space environment identification method for robot intelligent service according to any one of claims 1-7.
10. An electronic device comprising a memory, a processor, and a program stored on the memory and executable on the processor, characterized in that the processor, when executing the program, implements the steps in the space environment identification method for robot intelligent service according to any one of claims 1-7.
CN202010228789.1A 2020-03-27 2020-03-27 Space environment identification method and system for robot intelligent service Active CN111496784B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010228789.1A CN111496784B (en) 2020-03-27 2020-03-27 Space environment identification method and system for robot intelligent service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010228789.1A CN111496784B (en) 2020-03-27 2020-03-27 Space environment identification method and system for robot intelligent service

Publications (2)

Publication Number Publication Date
CN111496784A true CN111496784A (en) 2020-08-07
CN111496784B CN111496784B (en) 2021-05-07

Family

ID=71867152

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010228789.1A Active CN111496784B (en) 2020-03-27 2020-03-27 Space environment identification method and system for robot intelligent service

Country Status (1)

Country Link
CN (1) CN111496784B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112765992A (en) * 2021-01-14 2021-05-07 深圳市人马互动科技有限公司 Training data construction method and device, computer equipment and storage medium
CN114630280A (en) * 2020-12-14 2022-06-14 曲阜师范大学 Indoor intelligent positioning system and control method thereof

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516127A (en) * 2017-08-21 2017-12-26 山东大学 Service robot independently obtains people and wears the method and system for taking article ownership semanteme
CN107807986A (en) * 2017-10-31 2018-03-16 中南大学 A kind of method for the remote sensing image intelligent Understanding for describing atural object spatial relation semantics
CN109816686A (en) * 2019-01-15 2019-05-28 山东大学 Robot semanteme SLAM method, processor and robot based on object example match
CN110334219A (en) * 2019-07-12 2019-10-15 电子科技大学 The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method
CN110489395A (en) * 2019-07-27 2019-11-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Automatically the method for multi-source heterogeneous data knowledge is obtained
CN110717052A (en) * 2019-10-15 2020-01-21 山东大学 Environment characterization method in service robot intelligent service
WO2020020085A1 (en) * 2018-07-24 2020-01-30 华为技术有限公司 Representation learning method and device
CN110781254A (en) * 2020-01-02 2020-02-11 四川大学 Automatic case knowledge graph construction method, system, equipment and medium
CN110825829A (en) * 2019-10-16 2020-02-21 华南理工大学 Method for realizing autonomous navigation of robot based on natural language and semantic map
CN110866190A (en) * 2019-11-18 2020-03-06 支付宝(杭州)信息技术有限公司 Method and device for training neural network model for representing knowledge graph

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107516127A (en) * 2017-08-21 2017-12-26 山东大学 Service robot independently obtains people and wears the method and system for taking article ownership semanteme
CN107807986A (en) * 2017-10-31 2018-03-16 中南大学 A kind of method for the remote sensing image intelligent Understanding for describing atural object spatial relation semantics
WO2020020085A1 (en) * 2018-07-24 2020-01-30 华为技术有限公司 Representation learning method and device
CN109816686A (en) * 2019-01-15 2019-05-28 山东大学 Robot semanteme SLAM method, processor and robot based on object example match
CN110334219A (en) * 2019-07-12 2019-10-15 电子科技大学 The knowledge mapping for incorporating text semantic feature based on attention mechanism indicates learning method
CN110489395A (en) * 2019-07-27 2019-11-22 西南电子技术研究所(中国电子科技集团公司第十研究所) Automatically the method for multi-source heterogeneous data knowledge is obtained
CN110717052A (en) * 2019-10-15 2020-01-21 山东大学 Environment characterization method in service robot intelligent service
CN110825829A (en) * 2019-10-16 2020-02-21 华南理工大学 Method for realizing autonomous navigation of robot based on natural language and semantic map
CN110866190A (en) * 2019-11-18 2020-03-06 支付宝(杭州)信息技术有限公司 Method and device for training neural network model for representing knowledge graph
CN110781254A (en) * 2020-01-02 2020-02-11 四川大学 Automatic case knowledge graph construction method, system, equipment and medium

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
YI YANG: "Large-scale 3D Semantic Mapping Using Stereo Vision", 《INTERNATIONAL JOURNAL OF AUTOMATION AND COMPUTING》 *
吴晓健 (WU Xiaojian): "Research on Environment Information Representation Mechanism of Service Robots Based on Knowledge Graph", 《中国优秀硕士学位论文全文数据库 信息科技辑》 (China Master's Theses Full-text Database, Information Science and Technology) *

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114630280A (en) * 2020-12-14 2022-06-14 曲阜师范大学 Indoor intelligent positioning system and control method thereof
CN114630280B (en) * 2020-12-14 2024-06-04 曲阜师范大学 Indoor intelligent positioning system and control method thereof
CN112765992A (en) * 2021-01-14 2021-05-07 深圳市人马互动科技有限公司 Training data construction method and device, computer equipment and storage medium

Also Published As

Publication number Publication date
CN111496784B (en) 2021-05-07

Similar Documents

Publication Publication Date Title
CN110276765B (en) Image panorama segmentation method based on multitask learning deep neural network
CN109344285B (en) Monitoring-oriented video map construction and mining method and equipment
CN110298404B (en) Target tracking method based on triple twin Hash network learning
CN107146237B (en) Target tracking method based on online state learning and estimation
CN109993102A (en) Similar face retrieval method, apparatus and storage medium
CN110457514A (en) A kind of multi-tag image search method based on depth Hash
Zhou et al. Deep learning enabled cutting tool selection for special-shaped machining features of complex products
CN111496784B (en) Space environment identification method and system for robot intelligent service
CN112884802B (en) Attack resistance method based on generation
Perveen et al. Facial expression recognition using facial characteristic points and Gini index
CN110147841A (en) The fine grit classification method for being detected and being divided based on Weakly supervised and unsupervised component
EP1801731B1 (en) Adaptive scene dependent filters in online learning environments
CN110826056A (en) Recommendation system attack detection method based on attention convolution self-encoder
CN111259264B (en) Time sequence scoring prediction method based on generation countermeasure network
Shukla et al. A survey on image mining, its techniques and application
CN111241326B (en) Image visual relationship indication positioning method based on attention pyramid graph network
CN117423032B (en) Time sequence dividing method for human body action with space-time fine granularity, electronic equipment and computer readable storage medium
AFFES et al. Comparison of YOLOV5, YOLOV6, YOLOV7 and YOLOV8 for Intelligent Video Surveillance.
CN116051924B (en) Divide-and-conquer defense method for image countermeasure sample
CN110111358B (en) Target tracking method based on multilayer time sequence filtering
CN111259176A (en) Cross-modal Hash retrieval method based on matrix decomposition and integrated with supervision information
Tao et al. Semi-supervised online learning for efficient classification of objects in 3d data streams
CN111090723B (en) Knowledge graph-based recommendation method for safe production content of power grid
CN111832815A (en) Scientific research hotspot prediction method and system
CN109934302B (en) New category identification method and robot system based on fuzzy theory and deep learning

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant