CN112509392B - Robot behavior teaching method based on meta-learning

Robot behavior teaching method based on meta-learning

Info

Publication number
CN112509392B
CN112509392B
Authority
CN
China
Prior art keywords
robot
action
video
neural network
demo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011483927.7A
Other languages
Chinese (zh)
Other versions
CN112509392A (en)
Inventor
胡梓烨
李伟
甘中学
王旭升
胡林强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202011483927.7A
Publication of CN112509392A
Application granted
Publication of CN112509392B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/02 Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a robot behavior teaching method based on meta-learning, characterized by comprising the following steps: acquiring a teaching video; and learning the teaching video with a trained neural network model. The training process of the neural network model comprises the following steps: collecting training content; preprocessing the training content to obtain a preprocessed comparison video, a preprocessed teaching video and a preprocessed motion video; constructing an initial neural network model; taking the preprocessed teaching video as input to obtain a demonstration action and calculating the demonstration action loss; updating the initial neural network model according to the demonstration action loss to obtain an updated model; taking the preprocessed motion video and the trajectory action as input to obtain the predicted trajectory action, demonstration semantics, motion semantics and comparison semantics, and calculating the target action loss and the semantic loss to construct the total loss; and updating the updated model based on the total loss, until the total loss stably converges to a total loss threshold, yielding the trained neural network model.

Description

Robot behavior teaching method based on meta-learning
Technical Field
The invention relates to the technical field of program-controlled manipulators, and in particular to a robot behavior teaching method based on meta-learning.
Background
With the development of artificial intelligence technology and the wide application of robots in fields such as aerospace, education, service, inspection and medical treatment, the intelligence of robots has attracted broad attention. However, most conventional robots have a limited level of intelligence and lack the ability to learn and flexibly adapt to task changes. Taking industrial robots as an example, every time a production line is set up, it must be calibrated in advance and programmed by professionals, which is laborious. In practical industrial applications, production lines often need to be reset because of changing business requirements. Each time a line is rebuilt, workpiece positions must be located precisely, and time-consuming programming by professionals is required, at a high cost in manpower and material resources. Therefore, how to save manpower and material resources through simple visual teaching, and thereby simplify the process, is an open problem in the industrial field.
Given these difficulties, one possible approach is behavior teaching based on meta-learning: since humans can quickly learn new behaviors by observing others, robots should have a similar learning ability. The model-agnostic meta-learning algorithm (MAML) [1] is currently one of the best meta-learning methods, a simple but powerful technique. Although meta-learning algorithms such as MAML perform well in regression, classification, image super-resolution, reinforcement learning and other fields, they still face many problems in practice. For example, the performance of meta-learning methods [1,2,3,4] degrades rapidly when only visual information is provided as the demonstration.
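To make the MAML structure concrete, here is a minimal, illustrative PyTorch sketch of one MAML meta-update (an inner adaptation step followed by an outer meta-objective). This is not the patent's exact procedure; the model, data, loss function and learning rates are placeholder assumptions.

```python
import torch
import torch.nn as nn

def maml_step(model, loss_fn, support, query, inner_lr=0.01):
    """One MAML meta-update for a single task: adapt on `support`,
    evaluate on `query`, return a loss differentiable w.r.t. the
    original parameters (second-order MAML)."""
    x_s, y_s = support
    x_q, y_q = query

    # Inner loop (adaptation): one gradient step, kept differentiable.
    params = dict(model.named_parameters())
    inner_loss = loss_fn(torch.func.functional_call(model, params, (x_s,)), y_s)
    grads = torch.autograd.grad(inner_loss, params.values(), create_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Outer loop (meta-objective): evaluate the adapted parameters.
    return loss_fn(torch.func.functional_call(model, adapted, (x_q,)), y_q)

# Usage: back-propagate the returned loss through a standard optimizer.
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
support = (torch.randn(8, 4), torch.randn(8, 2))
query = (torch.randn(8, 4), torch.randn(8, 2))
loss = maml_step(model, nn.functional.mse_loss, support, query)
opt.zero_grad()
loss.backward()
opt.step()
```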
Reference to the literature
[1] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
[2] Finn C, Yu T, Zhang T, et al. One-shot visual imitation learning via meta-learning. arXiv preprint arXiv:1709.04905, 2017.
[3] James S, Bloesch M, Davison A J. Task-embedded control networks for few-shot imitation learning. arXiv preprint arXiv:1810.03237, 2018.
[4] Yu T, Finn C, Xie A, et al. One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557, 2018.
Disclosure of Invention
To solve the above problems, the invention provides a robot behavior teaching method that can learn the robot's motion path planning and can combine different semantic environments to define the target task to be executed, thereby enhancing stability and robustness. The invention adopts the following technical scheme:
the invention provides a baseA robot behavior teaching method based on meta learning is used for learning teaching videos acquired by a robot so as to complete various tasks, and is characterized by comprising the following steps: step S1, a teaching video is obtained; s2, learning the teaching video by using a pre-trained neural network model so as to complete various tasks, wherein the training process of the neural network model comprises the following steps: step T1, collecting a video V containing contrast comparison Training and teaching video V demo Robot motion video V robot And a track motion V action (ii) a Step T2, comparing the video V by using a preset data normalization method comparison Training and teaching video V demo And robot motion video V robot Carrying out normalization processing to obtain a preprocessed contrast video V' comparison And preprocessing the teaching video V' demo And pre-processing the motion video V' robot And unifying the time length; step T3, constructing an initial neural network model theta; step T4, teaching video V 'of the preprocessing' demo Inputting the initial neural network model theta to obtain a demonstration action D action And calculating and demonstrating motion loss L demo
Figure GDA0003893356370000031
(ii) a Step T5, according to the demonstration action loss L demo Performing parameter updating on the initial neural network model theta to obtain an updated neural network model serving as an updated model theta':
Figure GDA0003893356370000032
in the formula, lambda is the learning rate of the hyper-parameter; step T6, teaching video V 'of the preprocessing' demo Inputting the updated model theta' to obtain the predicted demonstration track motion P action-demo And corresponding presentation semantics E demo And comparing the preprocessed contrast video V' comparison Inputting the updated model theta' to obtain the predicted contrast track motion P action-comparison And corresponding contrast semantics E comparison And converting the preprocessed motion video V' robot Robot track motion P predicted by inputting updated model theta action-robot And corresponding robot object semantics E target Wherein we represent each presentation semantic E by an N-dimensional vector that takes values as a set of real numbers demo Comparison semantics E comparison And robot object semantics E target
P action-demo ,E demo =f θ' (V' demo )
P action-comparison ,E comparison =f θ' (V' comparison ) (3)
P action-robot ,E target =f θ' (V' robot )
(ii) a Step T7, acting V based on the track action And predicted robot trajectory action P action-robot Calculating to obtain the target action loss L action
Figure GDA0003893356370000033
(ii) a Step T8, according to the demonstration semantic E demo Motion semantics E target And contrast semantics E comparison Calculating to obtain semantic loss L embedding
L embedding =∑max[0,M-E demo ·E target +E target ·E comparison +E demo ·E comparison ] (5)
Wherein M is a threshold value; step T9, based on the target action loss L action And semantic loss L embedding The total loss L is obtained:
L=αL action +βL embedding (6)
in the formula, alpha and beta are hyper-parameters; step T10, carrying out derivation based on the total loss L to obtain a loss gradient
Figure GDA0003893356370000042
Thereby completing the update of the updated model theta 'to obtain a neural network model theta':
Figure GDA0003893356370000041
in the formula, delta is a hyper-parameter learning rate; step T11, repeating the steps T4 to T11 for a predetermined training time until the total loss L stably converges to a predetermined total loss threshold L margin And obtaining the trained neural network model.
The robot behavior teaching method based on the meta learning provided by the invention can also have the technical characteristics that the initial neural network model is an end-to-end neural network model.
The robot behavior teaching method based on meta learning provided by the invention can also have the technical characteristics that the tasks comprise an arrival task of the robot to reach the target position and a push task of the robot to move the target object.
Action and effects of the invention
According to the robot behavior teaching method based on meta-learning, the updated model is obtained by updating with the demonstration action loss, and the demonstration action loss is computed from the preprocessed teaching video, so the robot can understand the various demonstrated tasks in the preprocessed teaching video. Because the preprocessed teaching video V'_demo, the preprocessed motion video V'_robot and the preprocessed comparison video V'_comparison are input into the updated model to obtain the corresponding demonstration semantics E_demo, target semantics E_target and comparison semantics E_comparison, and the semantic loss L_embedding is constructed from these semantics, the robot can accurately understand task semantics in different semantic environments: samples with the same task but different task environments obtain similar semantics, while samples with different task targets but similar environments obtain different semantics, so that the neural network model finally has a more definite semantic target. Moreover, because the preprocessed comparison video V'_comparison, whose task target differs from those of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot, is introduced during training, semantic vectors for different scenes are learned without supervision, and the contrastive induction capability and stability of the neural network model are enhanced, making the model more robust and its inferred predictions more precise.
When an industrial production line is reset, the robot behavior teaching method based on meta-learning enables the robot to quickly understand human behavioral intent from the teaching video, so that new tasks after the reset can be completed, such as the arrival task of automatic path planning and the push task of moving objects during object sorting.
Drawings
Fig. 1 is a flowchart of a robot behavior teaching method based on meta learning according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an arrival task and a push task according to an embodiment of the present invention;
FIG. 3 is a live view of an arrival task and a push task in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart of a training process of a neural network model according to an embodiment of the present invention; and
fig. 5 is a schematic diagram of a neural network model according to an embodiment of the present invention.
Detailed Description
In order to make the technical means, the creation features, the achievement purposes and the effects of the invention easy to understand, the following describes a robot behavior teaching method based on meta-learning specifically with reference to the embodiments and the accompanying drawings.
< example >
The robot behavior teaching method based on meta-learning in this embodiment is carried out on a UR10 robot. A camera is placed on the UR10 worktable and connected to a computer for visual information feedback. A table is placed in front of the worktable, with objects of various shapes on it, so that the robot can perform tasks such as arrival, pick-up, put-down and push on the objects.
In this embodiment, the preprocessed teaching video mainly concerns object sorting: based on an input human-hand teaching video of object sorting, the method understands the sorting task and automatically plans an appropriate path, saving manpower and material resources and keeping the process simple.
Fig. 1 is a flowchart of a robot behavior teaching method based on meta learning according to an embodiment of the present invention.
As shown in fig. 1, the robot behavior teaching method based on meta learning includes the following steps:
and S1, acquiring a teaching video.
In this embodiment, the teaching video is a teaching video related to an object sorting task.
And S2, learning the teaching video by using a pre-trained neural network model so as to complete various tasks.
FIG. 2 is a diagram illustrating an arrival task and a push task according to an embodiment of the present invention.
FIG. 3 is a diagram of a real scene of an arrival task and a push task according to an embodiment of the present invention.
As shown in fig. 2 and 3, the object sorting task mainly involves an arrival task in which the robot arrives at a target position and a push task in which the robot moves the target object.
In the arrival task, the robot learns the demonstration part of the teaching video about the arrival task through the trained neural network model, and the robot arm then moves from the base, representing the origin, to the position of a specified color block according to the learning result. In the arrival task of Fig. 2, the color blocks M, N and O represent different positions, and M is the specified block the arm must reach; the demonstration input teaches the arm to move from the origin position to the position of block M, and the imitation output is the arm arriving at block M from the origin. In Fig. 3, the arrival task is the robot arm moving from the base position to the position of the strip "3".
In the push task, the robot learns the demonstration part of the teaching video about the push task through the trained neural network model, and then controls the robot arm to push an object to a specified position according to the learning result. In this embodiment, the push task builds on the arrival task: the robot pushes the object to the position learned in the arrival task, i.e. to the specified position (as shown in Fig. 2). Specifically, in Fig. 3, the push task is to move the square "1" or the hexagon "2" to the position of the strip "3" with the robot arm.
Fig. 4 is a flowchart of a training process of a neural network model according to an embodiment of the present invention.
As shown in fig. 4, the training process of the neural network model includes the following steps:
step T1, collecting a video V containing contrast comparison Training teaching video V demo Robot motion video V robot And a track motion V action As the training content, step T2 is then entered.
The trajectory action V_action is the reference action taught by a standard expert, i.e. the motion, such as the up-down and left-right movement direction, corresponding to the robot in each frame of image in the training video.
In addition, the training teaching video V_demo includes a demonstration video for the demonstration task, a target task video for the target task, a comparison video V_comparison for the comparison task, and so on. Specifically, V_demo comprises a plurality of videos A, B, C, etc.; if A is the demonstration video and B is the target task video, the comparison video V_comparison can be B or C, since any video can serve as the comparison video V_comparison provided it differs from the demonstration video A, as sketched below.
Meanwhile, V_action is the expert-taught action used as the reference.
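As a hedged illustration of this selection rule, the sketch below picks a comparison clip whose task target differs from the demonstration clip's; the clip dictionaries and the `task_id` field are assumed bookkeeping, not part of the patent.

```python
import random

def pick_comparison(clips, demo):
    """Return a clip usable as V_comparison: any clip whose task target
    differs from that of the demonstration clip."""
    candidates = [c for c in clips if c["task_id"] != demo["task_id"]]
    if not candidates:
        raise ValueError("no clip with a different task target available")
    return random.choice(candidates)

# Example: with demo A (task 0), either B or C may be chosen, never A.
clips = [{"name": "A", "task_id": 0}, {"name": "B", "task_id": 1},
         {"name": "C", "task_id": 2}]
print(pick_comparison(clips, clips[0])["name"])  # "B" or "C"
```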
Step T2, using the predetermined data normalization method, normalizing the comparison video V_comparison, the training teaching video V_demo and the robot motion video V_robot in the training content to obtain the preprocessed comparison video V'_comparison, the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot with unified durations, then proceeding to step T3.
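The patent does not fix a concrete normalization, so the following is only a plausible preprocessing sketch, assuming clips arrive as uint8 arrays of shape (T, H, W, C); the [0, 1] pixel scaling and the unified frame count T_OUT are assumptions.

```python
import numpy as np

T_OUT = 40  # unified number of frames per clip (assumed value)

def preprocess(video: np.ndarray) -> np.ndarray:
    """Normalize pixel values and resample a clip to a fixed duration."""
    v = video.astype(np.float32) / 255.0      # scale pixels to [0, 1]
    idx = np.linspace(0, len(v) - 1, T_OUT)   # uniform frame positions
    return v[idx.round().astype(int)]         # (T_OUT, H, W, C)

# Example: a 93-frame clip becomes a 40-frame normalized clip.
clip = (np.random.rand(93, 64, 64, 3) * 255).astype(np.uint8)
print(preprocess(clip).shape)  # (40, 64, 64, 3)
```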
And step T3, constructing an initial neural network model theta, and then entering step T4.
Wherein, the initial neural network model is an end-to-end neural network model.
Step T4, inputting the preprocessed teaching video V'_demo into the initial neural network model θ to obtain the demonstration action D_action and calculating the demonstration action loss L_demo (equation (1), which appears only as an image in the original).
Step T5, updating the parameters of the initial neural network model θ according to the demonstration action loss L_demo to obtain the updated model θ':

$$\theta' = \theta - \lambda \nabla_{\theta} L_{demo} \qquad (2)$$

where λ is the hyper-parameter learning rate; then proceed to step T6.
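In code, the update of equation (2) is a single differentiable gradient step. A sketch under the assumption of a PyTorch model, with `demo_loss` computed as in step T4; keeping `create_graph=True` lets the later outer update of step T10 back-propagate through this step.

```python
import torch

def inner_update(model, demo_loss, lam):
    """theta' = theta - lambda * grad(L_demo), kept differentiable."""
    params = dict(model.named_parameters())
    grads = torch.autograd.grad(demo_loss, params.values(), create_graph=True)
    return {n: p - lam * g for (n, p), g in zip(params.items(), grads)}
```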
Step T6, inputting the preprocessed teaching video V'_demo into the updated model θ' to obtain the predicted demonstration trajectory action P_action-demo and the corresponding demonstration semantics E_demo, inputting the preprocessed comparison video V'_comparison into the updated model θ' to obtain the predicted comparison trajectory action P_action-comparison and the corresponding comparison semantics E_comparison, and inputting the preprocessed motion video V'_robot into the updated model θ' to obtain the predicted robot trajectory action P_action-robot and the corresponding robot target semantics E_target, wherein each of the semantics E_demo, E_comparison and E_target is represented by an N-dimensional real-valued vector:

$$P_{action\text{-}demo},\; E_{demo} = f_{\theta'}(V'_{demo})$$
$$P_{action\text{-}comparison},\; E_{comparison} = f_{\theta'}(V'_{comparison}) \qquad (3)$$
$$P_{action\text{-}robot},\; E_{target} = f_{\theta'}(V'_{robot})$$
The task target of the preprocessed comparison video V'_comparison differs from those of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot, while the task targets of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot are kept consistent.
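A sketch of the three forward passes of equation (3), assuming the network's forward pass returns an action trajectory together with the N-dimensional semantic vector, and reusing the adapted parameters θ' from step T5 via `functional_call`.

```python
import torch

def forward_all(model, theta_prime, v_demo, v_comparison, v_robot):
    """Run the adapted model f_theta' on the three preprocessed videos."""
    run = lambda v: torch.func.functional_call(model, theta_prime, (v,))
    p_demo, e_demo = run(v_demo)          # P_action-demo, E_demo
    p_comp, e_comp = run(v_comparison)    # P_action-comparison, E_comparison
    p_robot, e_target = run(v_robot)      # P_action-robot, E_target
    return p_demo, e_demo, p_comp, e_comp, p_robot, e_target
```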
Step T7, calculating the target action loss L_action from the trajectory action V_action and the predicted robot trajectory action P_action-robot:

$$L_{action} = \sum \left\| P_{action\text{-}robot} - V_{action} \right\|^{2} \qquad (4)$$
Step T8, calculating the semantic loss L_embedding from the demonstration semantics E_demo, the target semantics E_target and the comparison semantics E_comparison:

$$L_{embedding} = \sum \max\left[0,\; M - E_{demo} \cdot E_{target} + E_{target} \cdot E_{comparison} + E_{demo} \cdot E_{comparison}\right] \qquad (5)$$

where M is the margin threshold; then proceed to step T9.
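Equation (5) is a hinge-style contrastive loss: it rewards a large dot product between the demonstration and target semantics and penalizes similarity of either to the comparison semantics. A direct PyTorch transcription, where batching over samples is an assumption.

```python
import torch

def semantic_loss(e_demo, e_target, e_comp, margin):
    """L_embedding of equation (5): dot products over the last dimension,
    hinged at the margin M and summed over the batch."""
    s = (margin
         - (e_demo * e_target).sum(dim=-1)
         + (e_target * e_comp).sum(dim=-1)
         + (e_demo * e_comp).sum(dim=-1))
    return torch.clamp(s, min=0.0).sum()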
Step T9, combining the target action loss L_action and the semantic loss L_embedding into the total loss L:

$$L = \alpha L_{action} + \beta L_{embedding} \qquad (6)$$

where α and β are hyper-parameters; then proceed to step T10.
Step T10, differentiating the total loss L to obtain the loss gradient $\nabla_{\theta'} L$, thereby completing the update of the updated model θ' and obtaining the neural network model θ'':

$$\theta'' = \theta' - \delta \nabla_{\theta'} L \qquad (7)$$

where δ is the hyper-parameter learning rate; then proceed to step T11.
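Steps T7 to T10 combine into one outer update. The sketch below assumes a squared-error form for the action loss of equation (4), whose original rendering is an image, and applies equation (7) to the adapted parameters; α, β, δ and the margin M carry placeholder values, and `semantic_loss` is the function sketched under step T8.

```python
import torch

def outer_update(theta_prime, p_robot, v_action, e_demo, e_target, e_comp,
                 alpha=1.0, beta=0.1, delta=1e-3, margin=1.0):
    """Compute L = alpha*L_action + beta*L_embedding (equation (6)) and take
    one gradient step on the adapted parameters theta' (equation (7))."""
    l_action = ((p_robot - v_action) ** 2).sum()               # assumed eq. (4)
    l_embed = semantic_loss(e_demo, e_target, e_comp, margin)  # eq. (5)
    total = alpha * l_action + beta * l_embed                  # eq. (6)
    grads = torch.autograd.grad(total, list(theta_prime.values()))
    with torch.no_grad():
        theta_next = {n: p - delta * g                         # eq. (7)
                      for (n, p), g in zip(theta_prime.items(), grads)}
    return total.item(), theta_next
```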
Step T11, repeating steps T4 to T11 for the predetermined number of training iterations until the total loss L stably converges to the predetermined total loss threshold L_margin, thereby obtaining the trained neural network model. Specifically:
Step T11-1, judging whether the total loss L has stably converged to the predetermined total loss threshold L_margin within the predetermined number of training iterations; if yes, proceed to step T11-2, otherwise repeat from step T4.
Step T11-2, the trained neural network model is obtained, and the process ends.
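The stopping rule of step T11 can be sketched as an outer loop that runs the meta-update until the total loss stays below L_margin for a while; the patience window and iteration cap are assumptions, since the patent only requires stable convergence.

```python
def train(meta_step, l_margin=0.05, max_iters=10_000, patience=100):
    """Repeat steps T4-T10 (wrapped in `meta_step`) until the total loss
    remains below l_margin for `patience` consecutive iterations."""
    below = 0
    for it in range(max_iters):
        total = meta_step()                 # one pass of steps T4-T10
        below = below + 1 if total < l_margin else 0
        if below >= patience:               # treated as stable convergence
            return it
    return max_iters
```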
Fig. 5 is a schematic diagram of a neural network model according to an embodiment of the present invention.
As shown in fig. 5, the trained neural network model can understand the semantics of the demonstration task, the target task and the comparison task, and can complete the target task according to the demonstration task. The demonstration task comprises the teaching video, the motion path and so on; the target task requires the robot to learn automatic path planning from the provided demonstration video; and the comparison task supports contrastive analysis during learning, so that the robot learns to distinguish semantics in different scenes.
A convolutional neural network (CNN) is built for these three tasks, with all tasks sharing the same weights, i.e. the same CNN, as sketched below.
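A minimal, assumed version of this shared-weight CNN: one convolutional trunk processes every task's video, and two heads emit the predicted action and the N-dimensional semantic vector; all layer sizes and dimensions are illustrative, not specified by the patent.

```python
import torch
import torch.nn as nn

class TeachingNet(nn.Module):
    """One CNN shared by the demonstration, target and comparison tasks."""
    def __init__(self, action_dim=7, embed_dim=20):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.action_head = nn.Linear(64, action_dim)    # predicted action
        self.semantic_head = nn.Linear(64, embed_dim)   # N-dim semantics

    def forward(self, frames):                # frames: (T, 3, H, W)
        h = self.trunk(frames).mean(dim=0)    # average features over time
        return self.action_head(h), self.semantic_head(h)
```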
Inputting the teaching video into the neural network model yields the predicted action output and the predicted semantic output. The demonstration task corresponding to the preprocessed teaching video V'_demo and the target task corresponding to the preprocessed motion video V'_robot share a consistent task target, whereas the comparison task corresponding to the preprocessed comparison video V'_comparison has a task target different from both the demonstration task and the target task.
Specifically, in fig. 5, inputting the demonstration task into the CNN yields the corresponding demonstration prediction action output and demonstration prediction semantic output; inputting the target task yields the corresponding support prediction action output and support prediction semantic output; and inputting the comparison task yields the corresponding comparison prediction action output and comparison prediction semantic output. The demonstration prediction semantic output and the support prediction semantic output point to the same color block (the block of color X in fig. 5), i.e. the demonstration task and the target task share the same task target; the comparison prediction semantic output points to a different color block (color Y versus color X in fig. 5), i.e. the comparison task has a task target different from the demonstration task and the target task.
Action and effects of the embodiment
According to the robot behavior teaching method based on meta-learning provided by this embodiment, the updated model is obtained by updating with the demonstration action loss, and the demonstration action loss is computed from the preprocessed teaching video, so the robot can understand the various demonstrated tasks in the preprocessed teaching video. Because the preprocessed teaching video V'_demo, the preprocessed motion video V'_robot and the preprocessed comparison video V'_comparison are input into the updated model to obtain the corresponding demonstration semantics E_demo, target semantics E_target and comparison semantics E_comparison, and the semantic loss L_embedding is constructed from these semantics, the robot can accurately understand task semantics in different semantic environments: samples with the same task but different task environments obtain similar semantics, while samples with different task targets but similar environments obtain different semantics, so that the neural network model finally has a more definite semantic target. Moreover, because the preprocessed comparison video V'_comparison, whose task target differs from those of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot, is introduced during training, semantic vectors for different scenes are learned without supervision, and the contrastive induction capability and stability of the neural network model are enhanced, making the model more robust and its inferred predictions more precise.
In addition, in this embodiment, since the initial neural network model is an end-to-end neural network model, the trained model is compact, can be ported directly onto the robot, and can infer prediction results directly, saving time.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the scope of the description of the above-described embodiments.

Claims (3)

1. A robot behavior teaching method based on meta-learning is used for learning teaching videos acquired by a robot so as to complete various tasks, and is characterized by comprising the following steps:
s1, acquiring the teaching video;
s2, learning the teaching video by using a pre-trained neural network model so as to complete various tasks,
the training process of the neural network model comprises the following steps:
step T1, collecting a comparison video V_comparison, a training teaching video V_demo, a robot motion video V_robot and a trajectory action V_action;
step T2, normalizing the comparison video V_comparison, the training teaching video V_demo and the robot motion video V_robot with a predetermined data normalization method to obtain a preprocessed comparison video V'_comparison, a preprocessed teaching video V'_demo and a preprocessed motion video V'_robot with unified durations;
step T3, constructing an initial neural network model θ;
step T4, inputting the preprocessed teaching video V'_demo into the initial neural network model θ to obtain a demonstration action D_action and calculating the demonstration action loss L_demo (equation (1), which appears only as an image in the original);
step T5, updating the parameters of the initial neural network model θ according to the demonstration action loss L_demo to obtain the updated model θ':

$$\theta' = \theta - \lambda \nabla_{\theta} L_{demo} \qquad (2)$$

where λ is the hyper-parameter learning rate;
step T6, inputting the preprocessed teaching video V'_demo into the updated model θ' to obtain the predicted demonstration trajectory action P_action-demo and the corresponding demonstration semantics E_demo, inputting the preprocessed comparison video V'_comparison into the updated model θ' to obtain the predicted comparison trajectory action P_action-comparison and the corresponding comparison semantics E_comparison, and inputting the preprocessed motion video V'_robot into the updated model θ' to obtain the predicted robot trajectory action P_action-robot and the corresponding robot target semantics E_target, wherein each of the semantics E_demo, E_comparison and E_target is represented by an N-dimensional real-valued vector:

$$P_{action\text{-}demo},\; E_{demo} = f_{\theta'}(V'_{demo})$$
$$P_{action\text{-}comparison},\; E_{comparison} = f_{\theta'}(V'_{comparison}) \qquad (3)$$
$$P_{action\text{-}robot},\; E_{target} = f_{\theta'}(V'_{robot})$$

step T7, calculating the target action loss L_action from the trajectory action V_action and the predicted robot trajectory action P_action-robot:

$$L_{action} = \sum \left\| P_{action\text{-}robot} - V_{action} \right\|^{2} \qquad (4)$$

step T8, calculating the semantic loss L_embedding from the demonstration semantics E_demo, the target semantics E_target and the comparison semantics E_comparison:

$$L_{embedding} = \sum \max\left[0,\; M - E_{demo} \cdot E_{target} + E_{target} \cdot E_{comparison} + E_{demo} \cdot E_{comparison}\right] \qquad (5)$$

where M is a threshold;
step T9, combining the target action loss L_action and the semantic loss L_embedding into the total loss L:

$$L = \alpha L_{action} + \beta L_{embedding} \qquad (6)$$

where α and β are hyper-parameters;
step T10, differentiating the total loss L to obtain the loss gradient $\nabla_{\theta'} L$, thereby completing the update of the updated model θ' and obtaining the neural network model θ'':

$$\theta'' = \theta' - \delta \nabla_{\theta'} L \qquad (7)$$

where δ is the hyper-parameter learning rate;
step T11, repeating steps T4 to T11 for a predetermined number of training iterations until the total loss L stably converges to a predetermined total loss threshold L_margin, and obtaining the trained neural network model.
2. The meta learning based robot behavior teaching method according to claim 1, wherein:
wherein the initial neural network model is an end-to-end neural network model.
3. The meta learning based robot behavior teaching method according to claim 1, wherein:
wherein the tasks include an arrival task in which the robot arrives at a target position and a push task in which the robot moves a target object.
CN202011483927.7A 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning Active CN112509392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011483927.7A CN112509392B (en) 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011483927.7A CN112509392B (en) 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning

Publications (2)

Publication Number Publication Date
CN112509392A CN112509392A (en) 2021-03-16
CN112509392B (en) 2022-11-29

Family

ID=74972443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011483927.7A Active CN112509392B (en) 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning

Country Status (1)

Country Link
CN (1) CN112509392B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491039B (en) * 2022-01-27 2023-10-03 四川大学 Primitive learning few-sample text classification method based on gradient improvement
CN114881240B (en) * 2022-02-28 2023-09-26 复旦大学 Robot vision teaching learning model and method based on multi-attention mechanism
CN117464683B (en) * 2023-11-23 2024-05-14 中机生产力促进中心有限公司 Method for controlling mechanical arm to simulate video motion

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9256838B2 (en) * 2013-03-15 2016-02-09 International Business Machines Corporation Scalable online hierarchical meta-learning
CN110785268B (en) * 2017-06-28 2023-04-04 谷歌有限责任公司 Machine learning method and device for semantic robot grabbing
US11341364B2 (en) * 2017-09-20 2022-05-24 Google Llc Using simulation and domain adaptation for robotic control
CN109571487B (en) * 2018-09-12 2020-08-28 河南工程学院 Robot demonstration learning method based on vision
CN109875777B (en) * 2019-02-19 2021-08-31 西安科技大学 Fetching control method of wheelchair with fetching function
CN111199458B (en) * 2019-12-30 2023-06-02 北京航空航天大学 Recommendation system based on meta learning and reinforcement learning
CN111890357B (en) * 2020-07-01 2023-07-04 广州中国科学院先进技术研究所 Intelligent robot grabbing method based on action demonstration teaching

Also Published As

Publication number Publication date
CN112509392A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
Pertsch et al. Accelerating reinforcement learning with learned skill priors
CN112509392B (en) Robot behavior teaching method based on meta-learning
Sadeghi et al. Sim2real viewpoint invariant visual servoing by recurrent control
CN111203878B (en) Robot sequence task learning method based on visual simulation
Qi et al. Towards latent space based manipulation of elastic rods using autoencoder models and robust centerline extractions
Passalis et al. Deep reinforcement learning for controlling frontal person close-up shooting
CN113657573B (en) Robot skill acquisition method based on meta learning under scene memory guidance
Hoppe et al. Planning approximate exploration trajectories for model-free reinforcement learning in contact-rich manipulation
Sena et al. Improving task-parameterised movement learning generalisation with frame-weighted trajectory generation
Nasiriany et al. Pivot: Iterative visual prompting elicits actionable knowledge for vlms
Auddy et al. Continual learning from demonstration of robotics skills
Pauly et al. O2a: one-shot observational learning with action vectors
Ye et al. Efficient robotic object search via hiem: Hierarchical policy learning with intrinsic-extrinsic modeling
Wang et al. Bulletarm: An open-source robotic manipulation benchmark and learning framework
Huang et al. Learning graph dynamics with external contact for deformable linear objects shape control
CN114779661B (en) Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm
Kasaei et al. Object learning and grasping capabilities for robotic home assistants
Kulić et al. Incremental learning of full body motion primitives
Gao et al. Online learning in planar pushing with combined prediction model
Nguyen et al. Deep learning with experience ranking convolutional neural network for robot manipulator
Zito et al. One-shot learning for autonomous aerial manipulation
Zhou et al. Humanoid action imitation learning via boosting sample DQN in virtual demonstrator environment
Saleem et al. Obstacle-avoidance algorithm using deep learning based on rgbd images and robot orientation
Sejnova et al. Feedback-driven incremental imitation learning using sequential VAE
CN114881240B (en) Robot vision teaching learning model and method based on multi-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant