CN112509392B - Robot behavior teaching method based on meta-learning

Robot behavior teaching method based on meta-learning

Info

Publication number
CN112509392B
CN112509392B
Authority
CN
China
Prior art keywords
robot
action
video
neural network
demo
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202011483927.7A
Other languages
Chinese (zh)
Other versions
CN112509392A (en)
Inventor
胡梓烨
李伟
甘中学
王旭升
胡林强
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Fudan University
Original Assignee
Fudan University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Fudan University filed Critical Fudan University
Priority to CN202011483927.7A
Publication of CN112509392A
Application granted
Publication of CN112509392B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G09 EDUCATION; CRYPTOGRAPHY; DISPLAY; ADVERTISING; SEALS
    • G09B EDUCATIONAL OR DEMONSTRATION APPLIANCES; APPLIANCES FOR TEACHING, OR COMMUNICATING WITH, THE BLIND, DEAF OR MUTE; MODELS; PLANETARIA; GLOBES; MAPS; DIAGRAMS
    • G09B5/00 Electrically-operated educational appliances
    • G09B5/02 Electrically-operated educational appliances with visual presentation of the material to be studied, e.g. using film strip
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06V IMAGE OR VIDEO RECOGNITION OR UNDERSTANDING
    • G06V20/00 Scenes; Scene-specific elements
    • G06V20/40 Scenes; Scene-specific elements in video content

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Computing Systems (AREA)
  • Artificial Intelligence (AREA)
  • Mathematical Physics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Biophysics (AREA)
  • Multimedia (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Biomedical Technology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Business, Economics & Management (AREA)
  • Educational Administration (AREA)
  • Educational Technology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Manipulator (AREA)

Abstract

The invention provides a robot behavior teaching method based on meta-learning, characterized by comprising the following steps: acquiring a teaching video; and learning the teaching video with a trained neural network model. The training process of the neural network model comprises the following steps: collecting training content; preprocessing the training content to obtain a preprocessed comparison video, a preprocessed teaching video and a preprocessed motion video; constructing an initial neural network model; taking the preprocessed teaching video as input to obtain a demonstration action and calculating the demonstration action loss; updating the initial neural network model according to the demonstration action loss to obtain an updated model; taking the preprocessed motion video and the trajectory action as input to obtain the predicted trajectory action, demonstration semantics, motion semantics and comparison semantics, and calculating the target action loss and the semantic loss to construct the total loss; and updating the updated model based on the total loss, until the total loss stably converges to a total loss threshold, yielding the trained neural network model.

Description

Robot behavior teaching method based on meta-learning
Technical Field
The invention relates to the technical field of program-controlled manipulators, and in particular to a robot behavior teaching method based on meta-learning.
Background
With the development of artificial intelligence technology and the wide application of robots in fields such as aerospace, education, service, inspection and medical treatment, the intelligence of robots has attracted broad attention. However, most conventional robots have a limited level of intelligence and lack the ability to learn and flexibly adapt to task changes. Taking industrial robots as an example, every time a production line is set up, it must be calibrated in advance and programmed by professionals, which is laborious. In practical industrial applications, production lines often need to be reset because of changing business requirements. Each time a line is rebuilt, workpiece positions must be located precisely, and time-consuming programming by professionals is required, at a high cost in manpower and material resources. Therefore, how to save manpower and material resources through simple visual teaching, and thereby simplify the process, is an open problem in the industrial field.
Given these difficulties, one possible approach is behavior teaching based on meta-learning: since humans can quickly learn new behaviors by observing others, robots should have a similar learning ability. The model-agnostic meta-learning algorithm (MAML) [1] is currently one of the best meta-learning methods, a simple but powerful technique. Although meta-learning algorithms such as MAML perform well in regression, classification, image super-resolution, reinforcement learning and other fields, they still face many problems in practice. For example, the performance of meta-learning methods [1,2,3,4] degrades rapidly when only visual information is provided as the demonstration.
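To make the MAML structure concrete, here is a minimal, illustrative PyTorch sketch of one MAML meta-update (an inner adaptation step followed by an outer meta-objective). This is not the patent's exact procedure; the model, data, loss function and learning rates are placeholder assumptions.

```python
import torch
import torch.nn as nn

def maml_step(model, loss_fn, support, query, inner_lr=0.01):
    """One MAML meta-update for a single task: adapt on `support`,
    evaluate on `query`, return a loss differentiable w.r.t. the
    original parameters (second-order MAML)."""
    x_s, y_s = support
    x_q, y_q = query

    # Inner loop (adaptation): one gradient step, kept differentiable.
    params = dict(model.named_parameters())
    inner_loss = loss_fn(torch.func.functional_call(model, params, (x_s,)), y_s)
    grads = torch.autograd.grad(inner_loss, params.values(), create_graph=True)
    adapted = {n: p - inner_lr * g for (n, p), g in zip(params.items(), grads)}

    # Outer loop (meta-objective): evaluate the adapted parameters.
    return loss_fn(torch.func.functional_call(model, adapted, (x_q,)), y_q)

# Usage: back-propagate the returned loss through a standard optimizer.
model = nn.Sequential(nn.Linear(4, 32), nn.ReLU(), nn.Linear(32, 2))
opt = torch.optim.Adam(model.parameters(), lr=1e-3)
support = (torch.randn(8, 4), torch.randn(8, 2))
query = (torch.randn(8, 4), torch.randn(8, 2))
loss = maml_step(model, nn.functional.mse_loss, support, query)
opt.zero_grad()
loss.backward()
opt.step()
```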
Reference to the literature
[1] Finn C, Abbeel P, Levine S. Model-agnostic meta-learning for fast adaptation of deep networks. arXiv preprint arXiv:1703.03400, 2017.
[2] Finn C, Yu T, Zhang T, et al. One-shot visual imitation learning via meta-learning. arXiv preprint arXiv:1709.04905, 2017.
[3] James S, Bloesch M, Davison A J. Task-embedded control networks for few-shot imitation learning. arXiv preprint arXiv:1810.03237, 2018.
[4] Yu T, Finn C, Xie A, et al. One-shot imitation from observing humans via domain-adaptive meta-learning. arXiv preprint arXiv:1802.01557, 2018.
Disclosure of Invention
To solve the above problems, the invention provides a robot behavior teaching method that can learn the robot's motion path planning and can combine different semantic environments to define the target task to be executed, thereby enhancing stability and robustness. The invention adopts the following technical scheme:
the invention provides a baseA robot behavior teaching method based on meta learning is used for learning teaching videos acquired by a robot so as to complete various tasks, and is characterized by comprising the following steps: step S1, a teaching video is obtained; s2, learning the teaching video by using a pre-trained neural network model so as to complete various tasks, wherein the training process of the neural network model comprises the following steps: step T1, collecting a video V containing contrast comparison Training and teaching video V demo Robot motion video V robot And a track motion V action (ii) a Step T2, comparing the video V by using a preset data normalization method comparison Training and teaching video V demo And robot motion video V robot Carrying out normalization processing to obtain a preprocessed contrast video V' comparison And preprocessing the teaching video V' demo And pre-processing the motion video V' robot And unifying the time length; step T3, constructing an initial neural network model theta; step T4, teaching video V 'of the preprocessing' demo Inputting the initial neural network model theta to obtain a demonstration action D action And calculating and demonstrating motion loss L demo
Figure GDA0003893356370000031
(ii) a Step T5, according to the demonstration action loss L demo Performing parameter updating on the initial neural network model theta to obtain an updated neural network model serving as an updated model theta':
Figure GDA0003893356370000032
in the formula, lambda is the learning rate of the hyper-parameter; step T6, teaching video V 'of the preprocessing' demo Inputting the updated model theta' to obtain the predicted demonstration track motion P action-demo And corresponding presentation semantics E demo And comparing the preprocessed contrast video V' comparison Inputting the updated model theta' to obtain the predicted contrast track motion P action-comparison And corresponding contrast semantics E comparison And converting the preprocessed motion video V' robot Robot track motion P predicted by inputting updated model theta action-robot And corresponding robot object semantics E target Wherein we represent each presentation semantic E by an N-dimensional vector that takes values as a set of real numbers demo Comparison semantics E comparison And robot object semantics E target
P action-demo ,E demo =f θ' (V' demo )
P action-comparison ,E comparison =f θ' (V' comparison ) (3)
P action-robot ,E target =f θ' (V' robot )
(ii) a Step T7, acting V based on the track action And predicted robot trajectory action P action-robot Calculating to obtain the target action loss L action
Figure GDA0003893356370000033
(ii) a Step T8, according to the demonstration semantic E demo Motion semantics E target And contrast semantics E comparison Calculating to obtain semantic loss L embedding
L embedding =∑max[0,M-E demo ·E target +E target ·E comparison +E demo ·E comparison ] (5)
Wherein M is a threshold value; step T9, based on the target action loss L action And semantic loss L embedding The total loss L is obtained:
L=αL action +βL embedding (6)
in the formula, alpha and beta are hyper-parameters; step T10, carrying out derivation based on the total loss L to obtain a loss gradient
Figure GDA0003893356370000042
Thereby completing the update of the updated model theta 'to obtain a neural network model theta':
Figure GDA0003893356370000041
in the formula, delta is a hyper-parameter learning rate; step T11, repeating the steps T4 to T11 for a predetermined training time until the total loss L stably converges to a predetermined total loss threshold L margin And obtaining the trained neural network model.
The robot behavior teaching method based on the meta learning provided by the invention can also have the technical characteristics that the initial neural network model is an end-to-end neural network model.
The robot behavior teaching method based on meta learning provided by the invention can also have the technical characteristics that the tasks comprise an arrival task of the robot to reach the target position and a push task of the robot to move the target object.
Action and effects of the invention
According to the robot behavior teaching method based on meta-learning, the updated model is obtained by updating with the demonstration action loss, and the demonstration action loss is computed from the preprocessed teaching video, so the robot can understand the various demonstrated tasks in the preprocessed teaching video. Because the preprocessed teaching video V'_demo, the preprocessed motion video V'_robot and the preprocessed comparison video V'_comparison are input into the updated model to obtain the corresponding demonstration semantics E_demo, target semantics E_target and comparison semantics E_comparison, and the semantic loss L_embedding is constructed from these semantics, the robot can accurately understand task semantics in different semantic environments: samples with the same task but different task environments obtain similar semantics, while samples with different task targets but similar environments obtain different semantics, so that the neural network model finally has a more definite semantic target. Moreover, because the preprocessed comparison video V'_comparison, whose task target differs from those of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot, is introduced during training, semantic vectors for different scenes are learned without supervision, and the contrastive induction capability and stability of the neural network model are enhanced, making the model more robust and its inferred predictions more precise.
When an industrial production line is reset, the robot behavior teaching method based on meta-learning enables the robot to quickly understand human behavioral intent from the teaching video, so that new tasks after the reset can be completed, such as the arrival task of automatic path planning and the push task of moving objects during object sorting.
Drawings
Fig. 1 is a flowchart of a robot behavior teaching method based on meta learning according to an embodiment of the present invention;
FIG. 2 is a diagram illustrating an arrival task and a push task according to an embodiment of the present invention;
FIG. 3 is a live view of an arrival task and a push task in accordance with an embodiment of the present invention;
FIG. 4 is a flow chart of a training process of a neural network model according to an embodiment of the present invention; and
fig. 5 is a schematic diagram of a neural network model according to an embodiment of the present invention.
Detailed Description
In order to make the technical means, the creation features, the achievement purposes and the effects of the invention easy to understand, the following describes a robot behavior teaching method based on meta-learning specifically with reference to the embodiments and the accompanying drawings.
< example >
The robot behavior teaching method based on meta-learning in this embodiment is carried out on a UR10 robot. A camera is placed on the UR10 worktable and connected to a computer for visual information feedback. A table is placed in front of the worktable, with objects of various shapes on it, so that the robot can perform tasks such as arrival, pick-up, put-down and push on the objects.
In this embodiment, the preprocessed teaching video mainly concerns object sorting: based on an input human-hand teaching video of object sorting, the method understands the sorting task and automatically plans an appropriate path, saving manpower and material resources and keeping the process simple.
Fig. 1 is a flowchart of a robot behavior teaching method based on meta learning according to an embodiment of the present invention.
As shown in fig. 1, the robot behavior teaching method based on meta learning includes the following steps:
and S1, acquiring a teaching video.
In this embodiment, the teaching video is a teaching video related to an object sorting task.
And S2, learning the teaching video by using a pre-trained neural network model so as to complete various tasks.
FIG. 2 is a diagram illustrating an arrival task and a push task according to an embodiment of the present invention.
FIG. 3 is a diagram of a real scene of an arrival task and a push task according to an embodiment of the present invention.
As shown in fig. 2 and 3, the object sorting task mainly involves an arrival task in which the robot arrives at a target position and a push task in which the robot moves the target object.
In the arrival task, the robot learns the demonstration part of the teaching video about the arrival task through the trained neural network model, and the robot arm then moves from the base, representing the origin, to the position of a specified color block according to the learning result. In the arrival task of Fig. 2, the color blocks M, N and O represent different positions, and M is the specified block the arm must reach; the demonstration input teaches the arm to move from the origin position to the position of block M, and the imitation output is the arm arriving at block M from the origin. In Fig. 3, the arrival task is the robot arm moving from the base position to the position of the strip "3".
In the push task, the robot learns the demonstration part of the teaching video about the push task through the trained neural network model, and then controls the robot arm to push an object to a specified position according to the learning result. In this embodiment, the push task builds on the arrival task: the robot pushes the object to the position learned in the arrival task, i.e. to the specified position (as shown in Fig. 2). Specifically, in Fig. 3, the push task is to move the square "1" or the hexagon "2" to the position of the strip "3" with the robot arm.
Fig. 4 is a flowchart of a training process of a neural network model according to an embodiment of the present invention.
As shown in fig. 4, the training process of the neural network model includes the following steps:
step T1, collecting a video V containing contrast comparison Training teaching video V demo Robot motion video V robot And a track motion V action As the training content, step T2 is then entered.
The trajectory action V_action is the reference action taught by a standard expert, i.e. the motion, such as the up-down and left-right movement direction, corresponding to the robot in each frame of image in the training video.
In addition, the training teaching video V_demo includes a demonstration video for the demonstration task, a target task video for the target task, a comparison video V_comparison for the comparison task, and so on. Specifically, V_demo comprises a plurality of videos A, B, C, etc.; if A is the demonstration video and B is the target task video, the comparison video V_comparison can be B or C, since any video can serve as the comparison video V_comparison provided it differs from the demonstration video A, as sketched below.
Meanwhile, V_action is the expert-taught action used as the reference.
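As a hedged illustration of this selection rule, the sketch below picks a comparison clip whose task target differs from the demonstration clip's; the clip dictionaries and the `task_id` field are assumed bookkeeping, not part of the patent.

```python
import random

def pick_comparison(clips, demo):
    """Return a clip usable as V_comparison: any clip whose task target
    differs from that of the demonstration clip."""
    candidates = [c for c in clips if c["task_id"] != demo["task_id"]]
    if not candidates:
        raise ValueError("no clip with a different task target available")
    return random.choice(candidates)

# Example: with demo A (task 0), either B or C may be chosen, never A.
clips = [{"name": "A", "task_id": 0}, {"name": "B", "task_id": 1},
         {"name": "C", "task_id": 2}]
print(pick_comparison(clips, clips[0])["name"])  # "B" or "C"
```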
Step T2, using the predetermined data normalization method, normalizing the comparison video V_comparison, the training teaching video V_demo and the robot motion video V_robot in the training content to obtain the preprocessed comparison video V'_comparison, the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot with unified durations, then proceeding to step T3.
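The patent does not fix a concrete normalization, so the following is only a plausible preprocessing sketch, assuming clips arrive as uint8 arrays of shape (T, H, W, C); the [0, 1] pixel scaling and the unified frame count T_OUT are assumptions.

```python
import numpy as np

T_OUT = 40  # unified number of frames per clip (assumed value)

def preprocess(video: np.ndarray) -> np.ndarray:
    """Normalize pixel values and resample a clip to a fixed duration."""
    v = video.astype(np.float32) / 255.0      # scale pixels to [0, 1]
    idx = np.linspace(0, len(v) - 1, T_OUT)   # uniform frame positions
    return v[idx.round().astype(int)]         # (T_OUT, H, W, C)

# Example: a 93-frame clip becomes a 40-frame normalized clip.
clip = (np.random.rand(93, 64, 64, 3) * 255).astype(np.uint8)
print(preprocess(clip).shape)  # (40, 64, 64, 3)
```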
And step T3, constructing an initial neural network model theta, and then entering step T4.
Wherein, the initial neural network model is an end-to-end neural network model.
Step T4, inputting the preprocessed teaching video V'_demo into the initial neural network model θ to obtain the demonstration action D_action and calculating the demonstration action loss L_demo (equation (1), which appears only as an image in the original).
Step T5, updating the parameters of the initial neural network model θ according to the demonstration action loss L_demo to obtain the updated model θ':

$$\theta' = \theta - \lambda \nabla_{\theta} L_{demo} \qquad (2)$$

where λ is the hyper-parameter learning rate; then proceed to step T6.
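In code, the update of equation (2) is a single differentiable gradient step. A sketch under the assumption of a PyTorch model, with `demo_loss` computed as in step T4; keeping `create_graph=True` lets the later outer update of step T10 back-propagate through this step.

```python
import torch

def inner_update(model, demo_loss, lam):
    """theta' = theta - lambda * grad(L_demo), kept differentiable."""
    params = dict(model.named_parameters())
    grads = torch.autograd.grad(demo_loss, params.values(), create_graph=True)
    return {n: p - lam * g for (n, p), g in zip(params.items(), grads)}
```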
Step T6, inputting the preprocessed teaching video V'_demo into the updated model θ' to obtain the predicted demonstration trajectory action P_action-demo and the corresponding demonstration semantics E_demo, inputting the preprocessed comparison video V'_comparison into the updated model θ' to obtain the predicted comparison trajectory action P_action-comparison and the corresponding comparison semantics E_comparison, and inputting the preprocessed motion video V'_robot into the updated model θ' to obtain the predicted robot trajectory action P_action-robot and the corresponding robot target semantics E_target, wherein each of the semantics E_demo, E_comparison and E_target is represented by an N-dimensional real-valued vector:

$$P_{action\text{-}demo},\; E_{demo} = f_{\theta'}(V'_{demo})$$
$$P_{action\text{-}comparison},\; E_{comparison} = f_{\theta'}(V'_{comparison}) \qquad (3)$$
$$P_{action\text{-}robot},\; E_{target} = f_{\theta'}(V'_{robot})$$
The task target of the preprocessed comparison video V'_comparison differs from those of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot, while the task targets of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot are kept consistent.
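A sketch of the three forward passes of equation (3), assuming the network's forward pass returns an action trajectory together with the N-dimensional semantic vector, and reusing the adapted parameters θ' from step T5 via `functional_call`.

```python
import torch

def forward_all(model, theta_prime, v_demo, v_comparison, v_robot):
    """Run the adapted model f_theta' on the three preprocessed videos."""
    run = lambda v: torch.func.functional_call(model, theta_prime, (v,))
    p_demo, e_demo = run(v_demo)          # P_action-demo, E_demo
    p_comp, e_comp = run(v_comparison)    # P_action-comparison, E_comparison
    p_robot, e_target = run(v_robot)      # P_action-robot, E_target
    return p_demo, e_demo, p_comp, e_comp, p_robot, e_target
```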
Step T7, calculating the target action loss L_action from the trajectory action V_action and the predicted robot trajectory action P_action-robot:

$$L_{action} = \sum \left\| P_{action\text{-}robot} - V_{action} \right\|^{2} \qquad (4)$$
Step T8, calculating the semantic loss L_embedding from the demonstration semantics E_demo, the target semantics E_target and the comparison semantics E_comparison:

$$L_{embedding} = \sum \max\left[0,\; M - E_{demo} \cdot E_{target} + E_{target} \cdot E_{comparison} + E_{demo} \cdot E_{comparison}\right] \qquad (5)$$

where M is the margin threshold; then proceed to step T9.
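Equation (5) is a hinge-style contrastive loss: it rewards a large dot product between the demonstration and target semantics and penalizes similarity of either to the comparison semantics. A direct PyTorch transcription, where batching over samples is an assumption.

```python
import torch

def semantic_loss(e_demo, e_target, e_comp, margin):
    """L_embedding of equation (5): dot products over the last dimension,
    hinged at the margin M and summed over the batch."""
    s = (margin
         - (e_demo * e_target).sum(dim=-1)
         + (e_target * e_comp).sum(dim=-1)
         + (e_demo * e_comp).sum(dim=-1))
    return torch.clamp(s, min=0.0).sum()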
Step T9, combining the target action loss L_action and the semantic loss L_embedding into the total loss L:

$$L = \alpha L_{action} + \beta L_{embedding} \qquad (6)$$

where α and β are hyper-parameters; then proceed to step T10.
Step T10, differentiating the total loss L to obtain the loss gradient $\nabla_{\theta'} L$, thereby completing the update of the updated model θ' and obtaining the neural network model θ'':

$$\theta'' = \theta' - \delta \nabla_{\theta'} L \qquad (7)$$

where δ is the hyper-parameter learning rate; then proceed to step T11.
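Steps T7 to T10 combine into one outer update. The sketch below assumes a squared-error form for the action loss of equation (4), whose original rendering is an image, and applies equation (7) to the adapted parameters; α, β, δ and the margin M carry placeholder values, and `semantic_loss` is the function sketched under step T8.

```python
import torch

def outer_update(theta_prime, p_robot, v_action, e_demo, e_target, e_comp,
                 alpha=1.0, beta=0.1, delta=1e-3, margin=1.0):
    """Compute L = alpha*L_action + beta*L_embedding (equation (6)) and take
    one gradient step on the adapted parameters theta' (equation (7))."""
    l_action = ((p_robot - v_action) ** 2).sum()               # assumed eq. (4)
    l_embed = semantic_loss(e_demo, e_target, e_comp, margin)  # eq. (5)
    total = alpha * l_action + beta * l_embed                  # eq. (6)
    grads = torch.autograd.grad(total, list(theta_prime.values()))
    with torch.no_grad():
        theta_next = {n: p - delta * g                         # eq. (7)
                      for (n, p), g in zip(theta_prime.items(), grads)}
    return total.item(), theta_next
```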
Step T11, repeating steps T4 to T11 for the predetermined number of training iterations until the total loss L stably converges to the predetermined total loss threshold L_margin, thereby obtaining the trained neural network model. Specifically:
Step T11-1, judging whether the total loss L has stably converged to the predetermined total loss threshold L_margin within the predetermined number of training iterations; if yes, proceed to step T11-2, otherwise repeat from step T4.
Step T11-2, the trained neural network model is obtained, and the process ends.
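The stopping rule of step T11 can be sketched as an outer loop that runs the meta-update until the total loss stays below L_margin for a while; the patience window and iteration cap are assumptions, since the patent only requires stable convergence.

```python
def train(meta_step, l_margin=0.05, max_iters=10_000, patience=100):
    """Repeat steps T4-T10 (wrapped in `meta_step`) until the total loss
    remains below l_margin for `patience` consecutive iterations."""
    below = 0
    for it in range(max_iters):
        total = meta_step()                 # one pass of steps T4-T10
        below = below + 1 if total < l_margin else 0
        if below >= patience:               # treated as stable convergence
            return it
    return max_iters
```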
Fig. 5 is a schematic diagram of a neural network model according to an embodiment of the present invention.
As shown in fig. 5, the trained neural network model can understand the semantics of the demonstration task, the target task and the comparison task, and can complete the target task according to the demonstration task. The demonstration task comprises the teaching video, the motion path and so on; the target task requires the robot to learn automatic path planning from the provided demonstration video; and the comparison task supports contrastive analysis during learning, so that the robot learns to distinguish semantics in different scenes.
A convolutional neural network (CNN) is built for these three tasks, with all tasks sharing the same weights, i.e. the same CNN, as sketched below.
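A minimal, assumed version of this shared-weight CNN: one convolutional trunk processes every task's video, and two heads emit the predicted action and the N-dimensional semantic vector; all layer sizes and dimensions are illustrative, not specified by the patent.

```python
import torch
import torch.nn as nn

class TeachingNet(nn.Module):
    """One CNN shared by the demonstration, target and comparison tasks."""
    def __init__(self, action_dim=7, embed_dim=20):
        super().__init__()
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=5, stride=2), nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=5, stride=2), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.action_head = nn.Linear(64, action_dim)    # predicted action
        self.semantic_head = nn.Linear(64, embed_dim)   # N-dim semantics

    def forward(self, frames):                # frames: (T, 3, H, W)
        h = self.trunk(frames).mean(dim=0)    # average features over time
        return self.action_head(h), self.semantic_head(h)
```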
Inputting the teaching video into the neural network model yields the predicted action output and the predicted semantic output. The demonstration task corresponding to the preprocessed teaching video V'_demo and the target task corresponding to the preprocessed motion video V'_robot share a consistent task target, whereas the comparison task corresponding to the preprocessed comparison video V'_comparison has a task target different from both the demonstration task and the target task.
Specifically, in fig. 5, inputting the demonstration task into the CNN yields the corresponding demonstration prediction action output and demonstration prediction semantic output; inputting the target task yields the corresponding support prediction action output and support prediction semantic output; and inputting the comparison task yields the corresponding comparison prediction action output and comparison prediction semantic output. The demonstration prediction semantic output and the support prediction semantic output point to the same color block (the block of color X in fig. 5), i.e. the demonstration task and the target task share the same task target; the comparison prediction semantic output points to a different color block (color Y versus color X in fig. 5), i.e. the comparison task has a task target different from the demonstration task and the target task.
Action and effects of the embodiment
According to the robot behavior teaching method based on meta-learning provided by this embodiment, the updated model is obtained by updating with the demonstration action loss, and the demonstration action loss is computed from the preprocessed teaching video, so the robot can understand the various demonstrated tasks in the preprocessed teaching video. Because the preprocessed teaching video V'_demo, the preprocessed motion video V'_robot and the preprocessed comparison video V'_comparison are input into the updated model to obtain the corresponding demonstration semantics E_demo, target semantics E_target and comparison semantics E_comparison, and the semantic loss L_embedding is constructed from these semantics, the robot can accurately understand task semantics in different semantic environments: samples with the same task but different task environments obtain similar semantics, while samples with different task targets but similar environments obtain different semantics, so that the neural network model finally has a more definite semantic target. Moreover, because the preprocessed comparison video V'_comparison, whose task target differs from those of the preprocessed teaching video V'_demo and the preprocessed motion video V'_robot, is introduced during training, semantic vectors for different scenes are learned without supervision, and the contrastive induction capability and stability of the neural network model are enhanced, making the model more robust and its inferred predictions more precise.
In addition, in this embodiment, since the initial neural network model is an end-to-end neural network model, the trained model is compact, can be ported directly onto the robot, and can infer prediction results directly, saving time.
The above-described embodiments are merely illustrative of specific embodiments of the present invention, and the present invention is not limited to the scope of the description of the above-described embodiments.

Claims (3)

1. A robot behavior teaching method based on meta-learning is used for learning teaching videos acquired by a robot so as to complete various tasks, and is characterized by comprising the following steps:
s1, acquiring the teaching video;
s2, learning the teaching video by using a pre-trained neural network model so as to complete various tasks,
the training process of the neural network model comprises the following steps:
step T1, collecting a comparison video V_comparison, a training teaching video V_demo, a robot motion video V_robot and a trajectory action V_action;
step T2, normalizing the comparison video V_comparison, the training teaching video V_demo and the robot motion video V_robot with a predetermined data normalization method to obtain a preprocessed comparison video V'_comparison, a preprocessed teaching video V'_demo and a preprocessed motion video V'_robot with unified durations;
step T3, constructing an initial neural network model θ;
step T4, inputting the preprocessed teaching video V'_demo into the initial neural network model θ to obtain a demonstration action D_action and calculating the demonstration action loss L_demo (equation (1), which appears only as an image in the original);
step T5, updating the parameters of the initial neural network model θ according to the demonstration action loss L_demo to obtain the updated model θ':

$$\theta' = \theta - \lambda \nabla_{\theta} L_{demo} \qquad (2)$$

where λ is the hyper-parameter learning rate;
step T6, inputting the preprocessed teaching video V'_demo into the updated model θ' to obtain the predicted demonstration trajectory action P_action-demo and the corresponding demonstration semantics E_demo, inputting the preprocessed comparison video V'_comparison into the updated model θ' to obtain the predicted comparison trajectory action P_action-comparison and the corresponding comparison semantics E_comparison, and inputting the preprocessed motion video V'_robot into the updated model θ' to obtain the predicted robot trajectory action P_action-robot and the corresponding robot target semantics E_target, wherein each of the semantics E_demo, E_comparison and E_target is represented by an N-dimensional real-valued vector:

$$P_{action\text{-}demo},\; E_{demo} = f_{\theta'}(V'_{demo})$$
$$P_{action\text{-}comparison},\; E_{comparison} = f_{\theta'}(V'_{comparison}) \qquad (3)$$
$$P_{action\text{-}robot},\; E_{target} = f_{\theta'}(V'_{robot})$$

step T7, calculating the target action loss L_action from the trajectory action V_action and the predicted robot trajectory action P_action-robot:

$$L_{action} = \sum \left\| P_{action\text{-}robot} - V_{action} \right\|^{2} \qquad (4)$$

step T8, calculating the semantic loss L_embedding from the demonstration semantics E_demo, the target semantics E_target and the comparison semantics E_comparison:

$$L_{embedding} = \sum \max\left[0,\; M - E_{demo} \cdot E_{target} + E_{target} \cdot E_{comparison} + E_{demo} \cdot E_{comparison}\right] \qquad (5)$$

where M is a threshold;
step T9, combining the target action loss L_action and the semantic loss L_embedding into the total loss L:

$$L = \alpha L_{action} + \beta L_{embedding} \qquad (6)$$

where α and β are hyper-parameters;
step T10, differentiating the total loss L to obtain the loss gradient $\nabla_{\theta'} L$, thereby completing the update of the updated model θ' and obtaining the neural network model θ'':

$$\theta'' = \theta' - \delta \nabla_{\theta'} L \qquad (7)$$

where δ is the hyper-parameter learning rate;
step T11, repeating steps T4 to T11 for a predetermined number of training iterations until the total loss L stably converges to a predetermined total loss threshold L_margin, and obtaining the trained neural network model.
2. The meta learning based robot behavior teaching method according to claim 1, wherein:
wherein the initial neural network model is an end-to-end neural network model.
3. The meta learning based robot behavior teaching method according to claim 1, wherein:
wherein the tasks include an arrival task in which the robot arrives at a target position and a push task in which the robot moves a target object.
CN202011483927.7A 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning Active CN112509392B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202011483927.7A CN112509392B (en) 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202011483927.7A CN112509392B (en) 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning

Publications (2)

Publication Number Publication Date
CN112509392A CN112509392A (en) 2021-03-16
CN112509392B (en) 2022-11-29

Family

ID=74972443

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202011483927.7A Active CN112509392B (en) 2020-12-16 2020-12-16 Robot behavior teaching method based on meta-learning

Country Status (1)

Country Link
CN (1) CN112509392B (en)

Families Citing this family (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114491039B (en) * 2022-01-27 2023-10-03 四川大学 Primitive learning few-sample text classification method based on gradient improvement
CN114881240B (en) * 2022-02-28 2023-09-26 复旦大学 Robot vision teaching learning model and method based on multi-attention mechanism
CN117464683B (en) * 2023-11-23 2024-05-14 中机生产力促进中心有限公司 Method for controlling mechanical arm to simulate video motion

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9256838B2 (en) * 2013-03-15 2016-02-09 International Business Machines Corporation Scalable online hierarchical meta-learning
CN110785268B (en) * 2017-06-28 2023-04-04 谷歌有限责任公司 Machine learning method and device for semantic robot grabbing
US11341364B2 (en) * 2017-09-20 2022-05-24 Google Llc Using simulation and domain adaptation for robotic control
CN109571487B (en) * 2018-09-12 2020-08-28 河南工程学院 Robot demonstration learning method based on vision
CN109875777B (en) * 2019-02-19 2021-08-31 西安科技大学 Fetching control method of wheelchair with fetching function
CN111199458B (en) * 2019-12-30 2023-06-02 北京航空航天大学 Recommendation system based on meta learning and reinforcement learning
CN111890357B (en) * 2020-07-01 2023-07-04 广州中国科学院先进技术研究所 Intelligent robot grabbing method based on action demonstration teaching

Also Published As

Publication number Publication date
CN112509392A (en) 2021-03-16

Similar Documents

Publication Publication Date Title
Pertsch et al. Accelerating reinforcement learning with learned skill priors
CN112509392B (en) Robot behavior teaching method based on meta-learning
Sadeghi et al. Sim2real viewpoint invariant visual servoing by recurrent control
CN111203878B (en) Robot sequence task learning method based on visual simulation
Qi et al. Towards latent space based manipulation of elastic rods using autoencoder models and robust centerline extractions
Passalis et al. Deep reinforcement learning for controlling frontal person close-up shooting
CN113657573B (en) Robot skill acquisition method based on meta learning under scene memory guidance
Hoppe et al. Planning approximate exploration trajectories for model-free reinforcement learning in contact-rich manipulation
Sena et al. Improving task-parameterised movement learning generalisation with frame-weighted trajectory generation
Nasiriany et al. Pivot: Iterative visual prompting elicits actionable knowledge for vlms
Auddy et al. Continual learning from demonstration of robotics skills
Pauly et al. O2a: one-shot observational learning with action vectors
Ye et al. Efficient robotic object search via hiem: Hierarchical policy learning with intrinsic-extrinsic modeling
Wang et al. Bulletarm: An open-source robotic manipulation benchmark and learning framework
Huang et al. Learning graph dynamics with external contact for deformable linear objects shape control
CN114779661B (en) Chemical synthesis robot system based on multi-classification generation confrontation imitation learning algorithm
Kasaei et al. Object learning and grasping capabilities for robotic home assistants
Kulić et al. Incremental learning of full body motion primitives
Gao et al. Online learning in planar pushing with combined prediction model
Nguyen et al. Deep learning with experience ranking convolutional neural network for robot manipulator
Zito et al. One-shot learning for autonomous aerial manipulation
Zhou et al. Humanoid action imitation learning via boosting sample DQN in virtual demonstrator environment
Saleem et al. Obstacle-avoidance algorithm using deep learning based on rgbd images and robot orientation
Sejnova et al. Feedback-driven incremental imitation learning using sequential VAE
CN114881240B (en) Robot vision teaching learning model and method based on multi-attention mechanism

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant