CN113635310B - Model migration method and device - Google Patents


Info

Publication number
CN113635310B
Authority
CN
China
Prior art keywords
model
matrix
migration
migration model
data set
Prior art date
Legal status
Active
Application number
CN202111206993.4A
Other languages
Chinese (zh)
Other versions
CN113635310A (en)
Inventor
邢登鹏
杨依明
李佳乐
徐波
Current Assignee
Institute of Automation of Chinese Academy of Science
Original Assignee
Institute of Automation of Chinese Academy of Science
Priority date
Filing date
Publication date
Application filed by Institute of Automation of Chinese Academy of Science
Priority to CN202111206993.4A
Publication of CN113635310A
Application granted
Publication of CN113635310B
Legal status: Active
Anticipated expiration

Classifications

    • B: PERFORMING OPERATIONS; TRANSPORTING
    • B25: HAND TOOLS; PORTABLE POWER-DRIVEN TOOLS; MANIPULATORS
    • B25J: MANIPULATORS; CHAMBERS PROVIDED WITH MANIPULATION DEVICES
    • B25J9/00: Programme-controlled manipulators
    • B25J9/16: Programme controls
    • B25J9/1602: Programme controls characterised by the control system, structure, architecture
    • B25J9/1605: Simulation of manipulator lay-out, design, modelling of manipulator
    • B25J9/161: Hardware, e.g. neural networks, fuzzy logic, interfaces, processor
    • B25J9/1628: Programme controls characterised by the control loop
    • B25J9/163: Programme controls characterised by the control loop learning, adaptive, model based, rule based expert control
    • B25J9/1656: Programme controls characterised by programming, planning systems for manipulators
    • B25J9/1661: Programme controls characterised by programming, planning systems for manipulators characterised by task planning, object-oriented languages

Landscapes

  • Engineering & Computer Science (AREA)
  • Robotics (AREA)
  • Mechanical Engineering (AREA)
  • Automation & Control Theory (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Computation (AREA)
  • Fuzzy Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Feedback Control In General (AREA)

Abstract

The disclosure relates to a model migration method and device, wherein the method comprises the following steps: acquiring a target model, a verification data set and a parameter fine-tuning data set; performing knowledge distillation processing on the target model to obtain a migration model, and, during the knowledge distillation processing of the target model, performing optimization processing on the migration model according to the verification data set and an error function corresponding to the knowledge distillation processing; and performing self-supervised training on the migration model using the parameter fine-tuning data set so as to fine-tune the parameters of the migration model. By adopting these technical means, the problem that the prior art lacks a robot model migration method is solved.

Description

Model migration method and device
Technical Field
The present disclosure relates to the field of machine learning, and in particular, to a model migration method and apparatus.
Background
Robot model migration is one of the research hotspots of robot control learning. In the prior art, the robot models obtained by large-scale robot learning methods are too large to be directly applied to real scenes. The whole process of adapting such a robot model so that it can be applied to a real scene is called robot model migration.
In the course of implementing the disclosed concept, the inventors found that there are at least the following technical problems in the related art: the prior art lacks a method for transferring a robot model.
Disclosure of Invention
In order to solve the above technical problem, or at least partially solve it, embodiments of the present disclosure provide a model migration method and apparatus to address at least the problem that the prior art lacks a method for robot model migration.
The purpose of the present disclosure is realized by the following technical scheme:
in a first aspect, an embodiment of the present disclosure provides a model migration method, including: acquiring a target model, a verification data set and a parameter fine tuning data set; knowledge distillation processing is carried out on the target model to obtain a migration model, and optimization processing is carried out on the migration model according to the verification data set and an error function corresponding to the knowledge distillation processing in the knowledge distillation processing process of the target model; and performing self-supervision training on the migration model by using the parameter fine-tuning data set so as to perform fine tuning on the parameters of the migration model.
In an exemplary embodiment, the optimizing the migration model according to the validation data set and the error function corresponding to the knowledge distillation process during the knowledge distillation process on the target model includes: optimizing the migration model according to a first verification data set and a first error function, wherein the verification data set comprises the first verification data set, the error function comprises the first error function, and the first error function is used for representing an error between an output of a multi-head attention layer of the target model and an output of the multi-head attention layer of the migration model; and/or performing optimization processing on the migration model according to a second verification data set and a second error function, wherein the verification data set comprises the second verification data set, the error function comprises the second error function, and the second error function is used for representing an error between an output of a fully-connected layer of the target model and an output of a fully-connected layer of the migration model; and/or optimizing the migration model according to a third verification data set and a third error function, wherein the verification data set includes the third verification data set, and the error function includes the third error function, and the third error function is used for representing an error between an output of the output layer of the target model and a label of the third verification data set.
In an exemplary embodiment, the first error function is
\[ L_{attn}^{j} = \sum_{i=1}^{q} \mathrm{MSE}\big(A_{j,i}^{S},\, A_{j,i}^{T}\big) \]
where q is the number of heads of the multi-head attention layer of the migration model, j is the serial number of the multi-head attention layer, A_{j,i}^{S} is the output of the i-th head of the j-th multi-head attention layer of the migration model, A_{j,i}^{T} is the output of the i-th head of the j-th multi-head attention layer of the target model, and MSE() is the mean square error function.
In an exemplary embodiment, the second error function is
\[ L_{hidn} = \mathrm{MSE}\big(H^{S} W_{h},\, H^{T}\big) \]
where H^{S} is the output of the fully-connected layer of the migration model, W_{h} is the conversion matrix, H^{T} is the output of the fully-connected layer of the target model, and MSE() is the mean square error function.
In an exemplary embodiment, the third error function is
\[ L_{pred} = \mathrm{MSE}\big(B,\, \hat{B}\big) \]
where MSE() is the mean square error function, B is the output of the output layer of the migration model, and \(\hat{B}\) is the label of the third verification data set.
In an exemplary embodiment, the optimizing the migration model according to the verification data set and the error function corresponding to the knowledge distillation processing during the knowledge distillation processing on the target model includes: determining a first batch number for each optimization of the migration model, wherein the first batch number indicates the number of robot trajectories selected from the verification data set each time the migration model is optimized; and circularly executing the following steps to perform the optimization processing on the migration model: step one, determining the first batch number of robot trajectories from the verification data set; step two, generating a first matrix from each determined trajectory to obtain a plurality of first matrices; step three, respectively inputting the plurality of first matrices into the migration model in sequence to obtain a plurality of second matrices; step four, calculating an error value between each first matrix and its corresponding second matrix through the error function; step five, performing the optimization processing on the migration model according to the error values; and step six, in the optimization processing of the current batch, on the premise that the migration model is kept smallest, ending the loop when the error value between the first matrix last input into the migration model and its corresponding second matrix is smaller than a first preset threshold.
In an exemplary embodiment, the generating a first matrix according to each determined track to obtain a plurality of first matrices includes: determining states and joint moments of the robot corresponding to the trajectory in multiple time dimensions, wherein the states include: the position and velocity of each joint of the robot; constructing the first matrix by taking the states and the joint moments as columns of the first matrix and taking the plurality of time dimensions as rows of the first matrix; and when the model dimension of the migration model corresponding to the track is smaller than a second preset threshold value, performing zero filling processing on the first matrix.
In an exemplary embodiment, the self-supervised training of the migration model using the parameter fine-tuning data set to fine-tune the parameters of the migration model comprises: determining a second batch number for each self-supervised training pass of the migration model, wherein the second batch number indicates the number of robot trajectories selected from the parameter fine-tuning data set each time the self-supervised training is performed on the migration model; and circularly executing the following steps to perform the self-supervised training on the migration model: step one, determining the second batch number of robot trajectories from the parameter fine-tuning data set; step two, generating a third matrix from each determined trajectory to obtain a plurality of third matrices; step three, respectively inputting the plurality of third matrices into the migration model in sequence to obtain a plurality of fourth matrices; step four, calculating an error value between each third matrix and its corresponding fourth matrix through the error function; step five, performing the self-supervised training on the migration model according to the error values; and step six, in the self-supervised training of the current batch, ending the loop when the error value between the third matrix last input into the migration model and its corresponding fourth matrix is smaller than a third preset threshold.
In an exemplary embodiment, the generating a third matrix according to each determined track to obtain a plurality of third matrices includes: determining states and joint moments of the robot corresponding to the trajectory in multiple time dimensions, wherein the states include: the position and velocity of each joint of the robot; constructing a third matrix by taking the states and the joint moments as columns of the third matrix and taking the plurality of time dimensions as rows of the third matrix; and when the model dimension of the migration model corresponding to the track is smaller than a fourth preset threshold value, performing zero filling processing on the third matrix.
In a second aspect, an embodiment of the present disclosure provides a model migration apparatus, including: the acquisition module is used for acquiring a target model, a verification data set and a parameter fine-tuning data set; the knowledge distillation module is used for carrying out knowledge distillation processing on the target model to obtain a migration model, and optimizing the migration model according to the verification data set and an error function corresponding to the knowledge distillation processing in the knowledge distillation processing process of the target model; and the fine tuning module is used for carrying out self-supervision training on the migration model by using the parameter fine tuning data set so as to carry out fine tuning on the parameters of the migration model.
In a third aspect, embodiments of the present disclosure provide an electronic device. The electronic device comprises a processor, a communication interface, a memory and a communication bus, wherein the processor, the communication interface and the memory communicate with each other through the communication bus; the memory is used for storing a computer program; and the processor is used for implementing the model migration method described above when executing the program stored in the memory.
In a fourth aspect, embodiments of the present disclosure provide a computer-readable storage medium. The computer-readable storage medium stores a computer program which, when executed by a processor, implements the model migration method described above.
Compared with the prior art, the technical scheme provided by the embodiment of the disclosure at least has part or all of the following advantages: acquiring a target model, a verification data set and a parameter fine tuning data set; knowledge distillation processing is carried out on the target model to obtain a migration model, and optimization processing is carried out on the migration model according to the verification data set and an error function corresponding to the knowledge distillation processing in the knowledge distillation processing process of the target model; and performing self-supervision training on the migration model by using the parameter fine-tuning data set so as to perform fine tuning on the parameters of the migration model. In the knowledge distillation process performed on the target model, the migration model is optimized according to the verification data set and the error function corresponding to the knowledge distillation process, and the parameter fine tuning data set is used for performing self-supervision training on the migration model to fine tune the parameters of the migration model.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the present disclosure and together with the description, serve to explain the principles of the disclosure.
In order to more clearly illustrate the embodiments of the present disclosure or the technical solutions in the prior art, the drawings needed in the description of the embodiments or the related art will be briefly introduced below; it is obvious that other drawings can be derived from these drawings by those skilled in the art without inventive effort.
Fig. 1 schematically illustrates a hardware structure block diagram of a computer terminal of a model migration method according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a model migration method of an embodiment of the present disclosure;
FIG. 3 schematically illustrates a schematic diagram of a model migration process of an embodiment of the present disclosure;
FIG. 4 is a block diagram schematically illustrating a structure of a model migration apparatus according to an embodiment of the present disclosure;
fig. 5 schematically shows a block diagram of an electronic device provided in an embodiment of the present disclosure.
Detailed Description
The present disclosure will be described in detail below with reference to the accompanying drawings in conjunction with embodiments. It should be noted that, in the present disclosure, the embodiments and features of the embodiments may be combined with each other without conflict.
It should be noted that the terms "first," "second," and the like in the description and claims of the present disclosure and in the above-described drawings are used for distinguishing between similar elements and not necessarily for describing a particular sequential or chronological order.
The method embodiments provided by the embodiments of the present disclosure may be executed on a computer terminal or a similar computing device. Taking a computer terminal as an example, fig. 1 schematically shows a hardware structure block diagram of a computer terminal for a model migration method according to an embodiment of the present disclosure. As shown in fig. 1, the computer terminal may include one or more processors 102 (only one is shown in fig. 1), where the processor 102 may include, but is not limited to, a processing device such as a Microprocessor (MPU) or a Programmable Logic Device (PLD), and a memory 104 for storing data. Optionally, the computer terminal may further include a transmission device 106 for communication functions and an input/output device 108. Those skilled in the art will understand that the structure shown in fig. 1 is merely illustrative and does not limit the structure of the computer terminal; for example, the computer terminal may include more or fewer components than those shown in fig. 1, or have an equivalent or different configuration.
The memory 104 may be used to store a computer program, for example, a software program and a module of application software, such as a computer program corresponding to the model migration method in the embodiment of the present disclosure, and the processor 102 executes the computer program stored in the memory 104 to execute various functional applications and data processing, i.e., to implement the method described above. The memory 104 may include high speed random access memory, and may also include non-volatile memory, such as one or more magnetic storage devices, flash memory, or other non-volatile solid-state memory. In some examples, the memory 104 may further include memory located remotely from the processor 102, which may be connected to a computer terminal over a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
The transmission device 106 is used to receive or transmit data via a network. Specific examples of the network described above may include a wireless network provided by a communication provider of the computer terminal. In one example, the transmission device 106 includes a Network adapter (NIC) that can be connected to other Network devices through a base station to communicate with the internet. In one example, the transmission device 106 may be a Radio Frequency (RF) module, which is used to communicate with the internet in a wireless manner.
In an embodiment of the present disclosure, a model migration method is provided, and fig. 2 schematically illustrates a flowchart of the model migration method in the embodiment of the present disclosure, and as shown in fig. 2, the flowchart includes the following steps:
step S202, obtaining a target model, a verification data set and a parameter fine tuning data set;
step S204, knowledge distillation processing is carried out on the target model to obtain a migration model, and optimization processing is carried out on the migration model according to the verification data set and an error function corresponding to the knowledge distillation processing in the knowledge distillation processing process of the target model;
step S206, performing self-supervision training on the migration model by using the parameter fine-tuning data set so as to fine-tune the parameters of the migration model.
It should be noted that the target model in the embodiments of the present disclosure may be a robot model corresponding to a robot of any structure, and the model scale or model size of the migration model is smaller than that of the target model. The target model is too large to be directly applied to a real scene; because the migration model is much smaller than the target model, the migration model can be applied to the real scene, thereby providing a method for robot model migration. The model migration method provided in the embodiments of the disclosure can be applied to robot dynamics or inverse robot dynamics.
The robot dynamics refers to the state of the robot at the next moment calculated by the state of the robot at the current or previous moment and the moment corresponding to the current or previous moment, and then the robot is controlled. The inverse dynamics of the robot is to calculate the moment corresponding to each time through the state of the robot at the current or previous time and the state of the robot at the next time, and then to control the robot.
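The two prediction directions described above can be sketched in Python as follows. This is a minimal illustration only: `model_forward` and `model_inverse` are hypothetical callables standing in for the migrated model, not names from this disclosure.

```python
def forward_dynamics_step(model_forward, state_t, torque_t):
    """Robot dynamics: predict the state at the next moment from the
    current joint state (positions and velocities) and the joint
    torques applied at the current moment."""
    return model_forward(state_t, torque_t)


def inverse_dynamics_step(model_inverse, state_t, state_next):
    """Inverse robot dynamics: recover the joint torques that take the
    robot from state_t to state_next."""
    return model_inverse(state_t, state_next)
```

In practice either mapping would be realized by the migration model itself; the stand-in callables above only make the input/output contract of the two directions explicit.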
Through the present disclosure, a target model, a verification data set and a parameter fine-tuning data set are acquired; knowledge distillation processing is performed on the target model to obtain a migration model, and during the knowledge distillation processing of the target model, the migration model is optimized according to the verification data set and an error function corresponding to the knowledge distillation processing; and self-supervised training is performed on the migration model using the parameter fine-tuning data set so as to fine-tune the parameters of the migration model.
In step S204, in the process of performing the knowledge distillation processing on the target model, the optimizing processing on the migration model according to the verification data set and the error function corresponding to the knowledge distillation processing includes: optimizing the migration model according to a first verification data set and a first error function, wherein the verification data set comprises the first verification data set, the error function comprises the first error function, and the first error function is used for representing an error between an output of a multi-head attention layer of the target model and an output of the multi-head attention layer of the migration model; and/or performing optimization processing on the migration model according to a second verification data set and a second error function, wherein the verification data set comprises the second verification data set, the error function comprises the second error function, and the second error function is used for representing an error between an output of a fully-connected layer of the target model and an output of a fully-connected layer of the migration model; and/or optimizing the migration model according to a third verification data set and a third error function, wherein the verification data set includes the third verification data set, and the error function includes the third error function, and the third error function is used for representing an error between an output of the output layer of the target model and a label of the third verification data set.
Knowledge distillation is a model compression method. Performing knowledge distillation processing on the target model means compressing the target model to obtain a compressed migration model. During the compression of the target model, the compressed migration model is optimized; these technical means ensure both that the compressed migration model is as small as possible and that its error relative to the target model is minimized. The migration model is optimized through at least one of the following error functions: a first error function, a second error function and a third error function.
The first error function is
\[ L_{attn}^{j} = \sum_{i=1}^{q} \mathrm{MSE}\big(A_{j,i}^{S},\, A_{j,i}^{T}\big) \]
where q is the number of heads of the multi-head attention layer of the migration model, j is the serial number of the multi-head attention layer, A_{j,i}^{S} is the output of the i-th head of the j-th multi-head attention layer of the migration model, A_{j,i}^{T} is the output of the i-th head of the j-th multi-head attention layer of the target model, and MSE() is the mean square error function.
The second error function is
\[ L_{hidn} = \mathrm{MSE}\big(H^{S} W_{h},\, H^{T}\big) \]
where H^{S} is the output of the fully-connected layer of the migration model, W_{h} is the conversion matrix, H^{T} is the output of the fully-connected layer of the target model, and MSE() is the mean square error function.
Each multi-head attention layer has q heads, so the first error function of each multi-head attention layer is the sum of the errors of all the heads in that layer. The fully-connected layer has no multi-head mechanism, so its error is simply the error between the output of the fully-connected layer of the target model and the output of the fully-connected layer of the migration model. W_{h} is a parameter newly introduced when the knowledge distillation processing is performed on the target model; this parameter is also adjusted by gradient back-propagation during the knowledge distillation processing, so W_{h} is learnable. "Conversion matrix" means that the dimensions of the output results of the target model and the migration model are not the same; therefore, in order to compare the output results of the target model and the migration model, the conversion matrix W_{h} is needed to bring the output results of the two models to matching dimensions.
The third error function is
\[ L_{pred} = \mathrm{MSE}\big(B,\, \hat{B}\big) \]
where MSE() is the mean square error function, B is the output of the output layer of the migration model, and \(\hat{B}\) is the label of the third verification data set.
Optionally, the migration model is optimized according to a fourth validation data set and a fourth error function, wherein the validation data set includes the fourth validation data set, and the error function includes the fourth error function, and the fourth error function is used to represent a total error corresponding to the knowledge distillation process.
The fourth error function is
\[ L_{total} = \sum_{j=1}^{d} \big( L_{attn}^{j} + L_{hidn}^{j} \big) + L_{pred} \]
where d is the number of multi-head attention layers of the migration model, j is the serial number of the multi-head attention layer, L_{attn}^{j} is the first error function between the j-th multi-head attention layer of the target model and the j-th multi-head attention layer of the migration model, L_{hidn}^{j} is the second error function between the j-th fully-connected layer of the target model and the j-th fully-connected layer of the migration model, L_{pred} is the error between the output of the output layer of the migration model and the label of the third verification data set, and MSE() is the mean square error function.
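The error functions described above can be sketched in a few lines of NumPy. This is an illustrative reading of the formulas, not the disclosure's implementation; the function names (`attention_loss`, `hidden_loss`, etc.) are chosen here for clarity.

```python
import numpy as np


def mse(a, b):
    """Mean square error between two equally shaped arrays."""
    return float(np.mean((np.asarray(a) - np.asarray(b)) ** 2))


def attention_loss(heads_student, heads_teacher):
    """First error function: sum over the q heads of one multi-head
    attention layer of the MSE between student (migration model) and
    teacher (target model) outputs.  Each argument has shape (q, ...)."""
    return sum(mse(s, t) for s, t in zip(heads_student, heads_teacher))


def hidden_loss(H_student, W_h, H_teacher):
    """Second error function: project the student's fully-connected
    output with the learnable conversion matrix W_h so its dimension
    matches the teacher's, then take the MSE."""
    return mse(H_student @ W_h, H_teacher)


def prediction_loss(B_student, labels):
    """Third error function: MSE between the student's output layer
    and the labels of the third verification data set."""
    return mse(B_student, labels)


def total_loss(attn_losses, hidn_losses, pred_loss):
    """Fourth error function: sum of the per-layer attention and
    hidden-state losses plus the prediction loss."""
    return sum(attn_losses) + sum(hidn_losses) + pred_loss
```

In a real training setup each of these would be computed on framework tensors so that gradients flow back into the migration model and into W_h; the NumPy version only shows the arithmetic.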
In step S204, during the knowledge distillation processing on the target model, the optimizing of the migration model according to the verification data set and the error function corresponding to the knowledge distillation processing includes: determining a first batch number for each optimization of the migration model, wherein the first batch number indicates the number of robot trajectories selected from the verification data set each time the migration model is optimized; and circularly executing the following steps to perform the optimization processing on the migration model: step one, determining the first batch number of robot trajectories from the verification data set; step two, generating a first matrix from each determined trajectory to obtain a plurality of first matrices; step three, respectively inputting the plurality of first matrices into the migration model in sequence to obtain a plurality of second matrices; step four, calculating an error value between each first matrix and its corresponding second matrix through the error function; step five, performing the optimization processing on the migration model according to the error values; and step six, in the optimization processing of the current batch, on the premise that the migration model is kept smallest, ending the loop when the error value between the first matrix last input into the migration model and its corresponding second matrix is smaller than a first preset threshold.
On the premise that the migration model is kept smallest, the loop ends when the error value between the first matrix last input into the migration model and its corresponding second matrix is smaller than the first preset threshold; that is, these technical means ensure both that the migration model is as small as possible and that its error relative to the target model is minimized.
The first batch number is the number of robot trajectories selected from the verification data set for each batch; "batch" is a term of art in model training and is not further explained in this disclosure. It should be noted that, when determining the batch number for training the migration model, the maximum model dimension of the migration model may also be determined, along with the hyper-parameters of each module of the network, such as the number of network layers of the migration model, the number of masked multi-head attention heads, the learning-rate decay, the random seeds, and the like. Generating a first matrix from each determined trajectory means converting each trajectory into matrix data, namely the first matrix. The first matrices are respectively input into the migration model in sequence to obtain a plurality of second matrices; a second matrix is the data of the trajectory predicted by the migration model from the corresponding first matrix. The error value between each first matrix and its corresponding second matrix is calculated through the error function, i.e., the difference between the predicted value and the true value of the migration model, and the migration model is finally trained according to the error values. Steps one to five are executed in a loop, gradually reducing the error value and improving the prediction accuracy of the migration model; the loop ends when the error value between the first matrix last input into the migration model in the current batch of training and its corresponding second matrix is smaller than the first preset threshold. It should be noted that the selection of the batch of robot trajectories from the verification data set may be repeated in each cycle.
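The batched optimization loop of steps one to six can be sketched as follows. Here `run_batch_item` is a hypothetical stand-in for steps two to five (building the first matrix, running the migration model, computing the error and updating the model), and the convergence check follows step six under the assumption that only the last batch item's error is compared against the threshold.

```python
import random


def distillation_loop(run_batch_item, trajectories, batch_number,
                      threshold, max_rounds=100):
    """Repeatedly sample `batch_number` robot trajectories from the
    verification data set, process each one, and stop once the error of
    the last item in a batch drops below `threshold`."""
    last_error = float("inf")
    for _ in range(max_rounds):
        batch = random.sample(trajectories, batch_number)  # step one
        errors = [run_batch_item(t) for t in batch]        # steps two to five
        last_error = errors[-1]
        if last_error < threshold:                         # step six
            break
    return last_error
```

The same loop shape applies to the self-supervised fine-tuning stage, with the parameter fine-tuning data set and the second batch number substituted in.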
In step S204, generating a first matrix from each determined trajectory to obtain a plurality of first matrices includes: determining the states and joint moments of the robot corresponding to the trajectory in multiple time dimensions, where the states include the position and velocity of each joint of the robot; constructing the first matrix with the states and joint moments as its columns and the multiple time dimensions as its rows; and zero-padding the first matrix when the model dimension of the migration model corresponding to the trajectory is smaller than a second preset threshold.
Determining the states and joint moments of the robot corresponding to the trajectory in multiple time dimensions reflects the fact that a trajectory actually consists of the robot's states and joint moments across those time dimensions. The position of each joint of the robot may be expressed as the angle of that joint. With the states and joint moments as columns, each column of the first matrix may be formed from a state-joint-moment pair. For example, the element in the first row and first column of the fully constructed first matrix may be the state of the robot at the first time step. The second preset threshold is determined by the maximum model dimension of the migration model.
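The construction just described can be illustrated as follows. The layout (time steps as rows, per-joint states followed by joint moments as columns, zero-padding on the right up to the model dimension) follows the text; the function and argument names are hypothetical, and treating the padding target as the model dimension is one plausible reading of the threshold condition.

```python
import numpy as np

def build_first_matrix(states, torques, model_dim):
    """Rows are time steps; columns are the per-joint states (position,
    velocity) followed by the joint moments, zero-padded on the right
    when the trajectory's width is below the model dimension."""
    mat = np.concatenate([states, torques], axis=1)  # (T, state_dim + torque_dim)
    pad = model_dim - mat.shape[1]
    if pad > 0:                                      # width below the threshold
        mat = np.pad(mat, ((0, 0), (0, pad)))        # zero-padding
    return mat

# 5 time steps, 3 joints: (position, velocity) per joint -> 6 state columns,
# plus 3 joint-moment columns = 9 columns, padded to a model dimension of 12.
states = np.zeros((5, 6))
torques = np.zeros((5, 3))
m = build_first_matrix(states, torques, model_dim=12)
```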
The second matrix corresponding to a first matrix is obtained from that first matrix; the second matrix has the multiple time dimensions as rows and the predicted states corresponding to the states as columns. The second matrix is the data of the trajectory predicted by the migration model from the first matrix; since one trajectory corresponds to the robot's states and joint moments in multiple time dimensions, the second matrix is likewise arranged over those time dimensions with the predicted states as columns, which is consistent with the above.
The number of dimensions of the plurality of time dimensions is determined by the length of the motion time series corresponding to the robot trajectory in the motion trajectory set.
It should be noted that zero-padding may be applied to the first matrix when the model dimension of the migration model corresponding to the trajectory is smaller than the second preset threshold, or, alternatively, when the first matrix corresponding to the trajectory is smaller than the second preset threshold.
In step S206, performing self-supervised training on the migration model using the parameter fine-tuning dataset to fine-tune the parameters of the migration model includes: determining a second batch number for each round of self-supervised training of the migration model, where the second batch number indicates the number of robot trajectories selected from the parameter fine-tuning dataset each time the self-supervised training is performed; and executing the following steps in a loop to perform the self-supervised training: step one, determining the second batch number of robot trajectories from the parameter fine-tuning dataset; step two, generating a third matrix from each determined trajectory to obtain a plurality of third matrices; step three, inputting the third matrices into the migration model in turn to obtain a plurality of fourth matrices; step four, calculating the error value between each third matrix and its corresponding fourth matrix by the error function; step five, performing the self-supervised training on the migration model according to the error value; step six, in the current batch of self-supervised training, ending the loop when the error value between the third matrix last input into the migration model and its corresponding fourth matrix is smaller than a third preset threshold.
The second batch number is the number of robot trajectories selected from the parameter fine-tuning dataset for each batch; "batch" is a standard term in model training and is not further explained in this disclosure. A third matrix is generated from each determined trajectory, that is, each trajectory is converted into matrix data, which is the third matrix. The third matrices are then input into the migration model in turn to obtain a plurality of fourth matrices; each fourth matrix is the data of the trajectory predicted by the migration model from the corresponding third matrix. The error value between each third matrix and its corresponding fourth matrix is calculated by the error function, that is, the difference between the migration model's predicted value and the true value, and the migration model is finally trained according to this error value. Steps one to five are executed in a loop, gradually reducing the error value and improving the prediction accuracy of the migration model; the loop ends when the error value between the third matrix last input into the migration model in the current batch of training and its corresponding fourth matrix is smaller than the third preset threshold. It should be noted that the determination of the batch of robot trajectories from the parameter fine-tuning dataset may be repeated for each cycle.
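One plausible form of the self-supervised objective is next-state prediction: the model predicts each state from the preceding states and joint moments, so the trajectory itself supplies the labels. The shift-by-one pairing below is an assumption for illustration; the disclosure only says the model predicts the trajectory.

```python
import numpy as np

def self_supervised_pairs(third_matrix, state_dim):
    """Inputs: rows t = 0..T-2 of the trajectory matrix (states + torques).
    Targets: the state columns of rows t = 1..T-1, i.e. the next states."""
    inputs = third_matrix[:-1]
    targets = third_matrix[1:, :state_dim]
    return inputs, targets

# 5 time steps; columns: 2 state columns followed by 2 joint-moment columns.
traj = np.arange(20.0).reshape(5, 4)
x, y = self_supervised_pairs(traj, state_dim=2)
```

With such pairs, the error function compares the model's predicted next states (the fourth matrix) against the true next states extracted here.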
The first matrix has the same form as the third matrix, and the second matrix the same form as the fourth; the different names serve only to distinguish whether the matrices come from the verification dataset or the parameter fine-tuning dataset.
In step S206, generating a third matrix from each determined trajectory to obtain a plurality of third matrices includes: determining the states and joint moments of the robot corresponding to the trajectory in multiple time dimensions, where the states include the position and velocity of each joint of the robot; constructing the third matrix with the states and joint moments as its columns and the multiple time dimensions as its rows; and zero-padding the third matrix when the model dimension of the migration model corresponding to the trajectory is smaller than a fourth preset threshold.
Each column of the third matrix may be formed from a state-joint-moment pair. For example, the element in the first row and first column of the fully constructed third matrix may be the state of the robot at the first time step.
The fourth matrix corresponding to a third matrix is obtained from that third matrix; the fourth matrix has the multiple time dimensions as rows and the predicted states corresponding to the states as columns. The fourth matrix is the data of the trajectory predicted by the migration model from the third matrix; since one trajectory corresponds to the robot's states and joint moments in multiple time dimensions, the fourth matrix is likewise arranged over those time dimensions with the predicted states as columns, which is consistent with the above.
The number of dimensions of the plurality of time dimensions is determined by the length of the motion time series corresponding to the robot trajectory in the motion trajectory set.
It should be noted that zero-padding may be applied to the third matrix when the model dimension of the migration model corresponding to the trajectory is smaller than the fourth preset threshold, or, alternatively, when the third matrix corresponding to the trajectory is smaller than the fourth preset threshold.
In order to better understand the technical solutions, the embodiments of the present disclosure also provide an alternative embodiment for explaining the technical solutions.
Fig. 3 schematically illustrates a model migration process according to an embodiment of the present disclosure. As shown in fig. 3:
the large-scale network, i.e. the target model, includes: an input layer, an encoding layer, a plurality of Block layers, a decoding layer, and an output layer, where a Block is a network block layer and may be a multi-head attention layer, a fully-connected layer, or the like;
the distillation network, i.e. the migration model, likewise includes: an input layer, an encoding layer, a plurality of Block layers, a decoding layer, and an output layer, where the number of Block layers of the migration model is smaller than the number of Block layers of the target model.
It should be noted that each Block layer sequentially includes: a masked multi-head attention network, a sum & normalization network, a feed-forward network, and another sum & normalization network.
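The Block layer structure just named (masked multi-head attention, sum & normalization, feed-forward, sum & normalization) can be sketched as follows. This is a single-head numpy illustration with placeholder weights, not the disclosed network; a multi-head version would split the feature dimension into q heads.

```python
import numpy as np

def layer_norm(x, eps=1e-5):
    # "Sum & normalization": normalize each row after the residual addition.
    return (x - x.mean(-1, keepdims=True)) / np.sqrt(x.var(-1, keepdims=True) + eps)

def softmax(x):
    e = np.exp(x - x.max(-1, keepdims=True))
    return e / e.sum(-1, keepdims=True)

def masked_attention(x, mask):
    # Scaled dot-product self-attention; masked positions cannot be attended to.
    scores = x @ x.T / np.sqrt(x.shape[-1])
    scores = np.where(mask, scores, -1e9)
    return softmax(scores) @ x

def block_layer(x):
    T = x.shape[0]
    mask = np.tril(np.ones((T, T), dtype=bool))   # causal (masked) attention
    x = layer_norm(x + masked_attention(x, mask)) # sum & normalization
    ffn = np.tanh(x @ np.eye(x.shape[-1]))        # placeholder feed-forward network
    return layer_norm(x + ffn)                    # sum & normalization

out = block_layer(np.ones((4, 8)) * 0.1)
```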
In Fig. 3, three error terms are marked: the third error function, which is the error between the output of the output layer of the migration model and the label of the third verification dataset; the second error function, which is computed on the output of the fully-connected layer of the migration model; and the first error function, which is computed on the output of the multi-head attention layer of the migration model.
Through the present disclosure, a target model, a verification dataset, and a parameter fine-tuning dataset are obtained; knowledge distillation is performed on the target model to obtain a migration model, and during the knowledge distillation of the target model the migration model is optimized according to the verification dataset and the error function corresponding to the knowledge distillation; finally, self-supervised training is performed on the migration model using the parameter fine-tuning dataset to fine-tune the parameters of the migration model.
Through the above description of the embodiments, those skilled in the art can clearly understand that the method according to the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, although the former is in many cases the better implementation. Based on such understanding, the technical solutions of the present disclosure, or the portions contributing to the prior art, may be embodied in the form of a software product, where the computer software product is stored in a storage medium (such as a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, or an optical disk) and includes several instructions for enabling a terminal device (which may be a mobile phone, a computer, a server, or a network device) to execute the methods of the embodiments of the present disclosure.
In this embodiment, a model migration apparatus is further provided, and the model migration apparatus is used to implement the foregoing embodiments and preferred embodiments, which have already been described and are not described again. As used below, the term "module" may be a combination of software and/or hardware that implements a predetermined function. Although the means described in the embodiments below are preferably implemented in software, an implementation in hardware, or a combination of software and hardware is also possible and contemplated.
Fig. 4 is a block diagram schematically illustrating a model migration apparatus according to an alternative embodiment of the present disclosure, and as shown in fig. 4, the apparatus includes:
an obtaining module 402, configured to obtain a target model, a verification data set, and a parameter fine-tuning data set;
a knowledge distillation module 404, configured to perform knowledge distillation processing on the target model to obtain a migration model, and perform optimization processing on the migration model according to the verification data set and an error function corresponding to the knowledge distillation processing in the process of performing the knowledge distillation processing on the target model;
a fine tuning module 406, configured to perform self-supervision training on the migration model using the parameter fine tuning data set, so as to perform fine tuning on parameters of the migration model.
Through the present disclosure, a target model, a verification dataset, and a parameter fine-tuning dataset are obtained; knowledge distillation is performed on the target model to obtain a migration model, and during the knowledge distillation of the target model the migration model is optimized according to the verification dataset and the error function corresponding to the knowledge distillation; finally, self-supervised training is performed on the migration model using the parameter fine-tuning dataset to fine-tune the parameters of the migration model.
Optionally, the knowledge distillation module 404 is further configured to perform an optimization process on the migration model according to a first validation data set and a first error function, wherein the validation data set includes the first validation data set, and the error function includes the first error function, and the first error function is configured to represent an error between an output of the multi-head attention layer of the target model and an output of the multi-head attention layer of the migration model; and/or performing optimization processing on the migration model according to a second verification data set and a second error function, wherein the verification data set comprises the second verification data set, the error function comprises the second error function, and the second error function is used for representing an error between an output of a fully-connected layer of the target model and an output of a fully-connected layer of the migration model; and/or optimizing the migration model according to a third verification data set and a third error function, wherein the verification data set includes the third verification data set, and the error function includes the third error function, and the third error function is used for representing an error between an output of the output layer of the target model and a label of the third verification data set.
Knowledge distillation is a model-compression method. Performing knowledge distillation on the target model means compressing the target model to obtain a compressed migration model. During compression of the target model, the compressed migration model is optimized; this technical means ensures both that the compressed migration model is as small as possible and that its error relative to the target model is as small as possible. The migration model is optimized through at least one of the following error functions: the first error function, the second error function, and the third error function.
The first error function may be written, per the surrounding definitions, as

L_{attn}^{(j)} = \sum_{i=1}^{q} \mathrm{MSE}(A_{j,i}^{S}, A_{j,i}^{T})

where q is the number of heads of the multi-head attention layer of the migration model, j is the index of the multi-head attention layer, A_{j}^{S} is the output of the j-th multi-head attention layer of the migration model, A_{j}^{T} is the output of the j-th multi-head attention layer of the target model, and MSE(·) is the mean squared error function.
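A direct numpy transcription of the first error function, i.e. the sum of per-head MSEs within one attention layer as the text states; the (heads, T, T) layout of the attention outputs is an assumption for illustration.

```python
import numpy as np

def mse(a, b):
    return float(np.mean((a - b) ** 2))

def attn_loss_layer(A_student, A_teacher):
    """First error function for one multi-head attention layer: the sum of
    the MSEs over all q heads. Inputs have shape (q_heads, T, T)."""
    return sum(mse(s, t) for s, t in zip(A_student, A_teacher))

q, T = 4, 3
A_teacher = np.zeros((q, T, T))
A_student = np.full((q, T, T), 0.1)
loss = attn_loss_layer(A_student, A_teacher)  # 4 heads * MSE of 0.01 each
```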
The second error function may be written as

L_{hidden} = \mathrm{MSE}(H^{S} W_{h}, H^{T})

where H^{S} is the output of the fully-connected layer of the migration model, W_{h} is the transformation matrix, H^{T} is the output of the fully-connected layer of the target model, and MSE(·) is the mean squared error function.
Each multi-head attention layer has q heads, so the first error function of each multi-head attention layer is the sum of the errors over all heads in that layer; the fully-connected layer has no multi-head mechanism, so its error is simply the error between the output of the fully-connected layer of the target model and that of the migration model. W_{h} is a parameter newly introduced when knowledge distillation is performed on the target model; it is also adjusted by gradient back-propagation during the distillation process, so W_{h} is learnable. It is called a "transformation matrix" because the output dimensions of the target model and the migration model differ: to compare the two outputs, the transformation matrix W_{h} is needed to bring the output dimensions of the migration model and the target model into agreement.
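The role of the transformation matrix can be illustrated as follows: the migration model's fully-connected output is projected to the target model's width before the MSE is taken. The dimensions are illustrative, and in training W_h would be a learnable parameter rather than the fixed array used here.

```python
import numpy as np

def hidden_loss(H_student, W_h, H_teacher):
    # Project the migration model's output into the target model's
    # dimension with the transformation matrix W_h, then compare by MSE.
    return float(np.mean((H_student @ W_h - H_teacher) ** 2))

T, d_student, d_teacher = 5, 4, 8
H_s = np.ones((T, d_student))
W_h = np.zeros((d_student, d_teacher))  # in training, updated by backpropagation
H_t = np.zeros((T, d_teacher))
loss = hidden_loss(H_s, W_h, H_t)
```

Without the projection, `H_s` (width 4) and `H_t` (width 8) could not be compared elementwise at all, which is the point the text makes.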
The third error function may be written as

L_{pred} = \mathrm{MSE}(B, y)

where MSE(·) is the mean squared error function, B is the output of the output layer of the migration model, and y is the label of the third verification dataset.
Optionally, the knowledge distillation module 404 is further configured to perform an optimization process on the migration model according to a fourth validation data set and a fourth error function, wherein the validation data set includes the fourth validation data set, and the error function includes the fourth error function, and the fourth error function is used to represent a total error corresponding to the knowledge distillation process.
The fourth error function may be written as

L_{total} = \sum_{j=1}^{d} \left( L_{attn}^{(j)} + L_{hidden}^{(j)} \right) + L_{pred}

where d is the number of multi-head attention layers of the migration model, j is the layer index, L_{attn}^{(j)} is the first error function between the j-th multi-head attention layer of the target model and the j-th multi-head attention layer of the migration model, L_{hidden}^{(j)} is the second error function between the j-th fully-connected layer of the target model and the j-th fully-connected layer of the migration model, L_{pred} is the error between the output of the output layer of the migration model and the label of the third verification dataset, and MSE(·) is the mean squared error function.
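The combination of the per-layer terms into the total distillation error can be sketched as follows, assuming the total is the sum of the first and second error functions over all layers plus the output-layer error, as the surrounding text states.

```python
def total_loss(attn_losses, hidden_losses, pred_loss):
    """Fourth error function: first and second error functions summed over
    all d layers, plus the output-layer (third) error."""
    return sum(attn_losses) + sum(hidden_losses) + pred_loss

# Two layers (d = 2) with illustrative per-layer values.
loss = total_loss(attn_losses=[0.1, 0.2], hidden_losses=[0.05, 0.05], pred_loss=0.3)
```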
Optionally, the knowledge distillation module 404 is further configured to determine a first batch number for each round of optimization of the migration model, where the first batch number indicates the number of robot trajectories selected from the verification dataset each time the migration model is optimized, and to execute the following steps in a loop to perform the optimization on the migration model: step one, determining the first batch number of robot trajectories from the verification dataset; step two, generating a first matrix from each determined trajectory to obtain a plurality of first matrices; step three, inputting the first matrices into the migration model in turn to obtain a plurality of second matrices; step four, calculating the error value between each first matrix and its corresponding second matrix by the error function; step five, performing the optimization on the migration model according to the error value; step six, in the current batch of optimization, subject to the principle that the migration model is as small as possible, ending the loop when the error value between the first matrix last input into the migration model and its corresponding second matrix is smaller than the first preset threshold.
The first batch number is the number of robot trajectories selected from the verification dataset for each batch; "batch" is a standard term in model training and is not further explained in this disclosure. It should be noted that when the batch number for training the migration model is determined, the maximum model dimension of the migration model may also be determined, along with the hyper-parameters of each network module, such as the number of network layers of the migration model, the number of masked multi-head attention networks, the learning-rate decay, and the random seeds. A first matrix is generated from each determined trajectory, that is, each trajectory is converted into matrix data, which is the first matrix. The first matrices are then input into the migration model in turn to obtain a plurality of second matrices; each second matrix is the data of the trajectory predicted by the migration model from the corresponding first matrix. The error value between each first matrix and its corresponding second matrix is calculated by the error function, that is, the difference between the migration model's predicted value and the true value, and the migration model is finally trained according to this error value. Steps one to five are executed in a loop, gradually reducing the error value and improving the prediction accuracy of the migration model; the loop ends when the error value between the first matrix last input into the migration model in the current batch of training and its corresponding second matrix is smaller than the first preset threshold. It should be noted that the determination of the batch of robot trajectories from the verification dataset may be repeated for each cycle.
Optionally, the knowledge distillation module 404 is further configured to determine the states and joint moments of the robot corresponding to the trajectory in multiple time dimensions, where the states include the position and velocity of each joint of the robot; to construct the first matrix with the states and joint moments as its columns and the multiple time dimensions as its rows; and to zero-pad the first matrix when the model dimension of the migration model corresponding to the trajectory is smaller than the second preset threshold.
Determining the states and joint moments of the robot corresponding to the trajectory in multiple time dimensions reflects the fact that a trajectory actually consists of the robot's states and joint moments across those time dimensions. The position of each joint of the robot may be expressed as the angle of that joint. With the states and joint moments as columns, each column of the first matrix may be formed from a state-joint-moment pair. For example, the element in the first row and first column of the fully constructed first matrix may be the state of the robot at the first time step. The second preset threshold is determined by the maximum model dimension of the migration model.
The second matrix corresponding to a first matrix is obtained from that first matrix; the second matrix has the multiple time dimensions as rows and the predicted states corresponding to the states as columns. The second matrix is the data of the trajectory predicted by the migration model from the first matrix; since one trajectory corresponds to the robot's states and joint moments in multiple time dimensions, the second matrix is likewise arranged over those time dimensions with the predicted states as columns, which is consistent with the above.
The number of dimensions of the plurality of time dimensions is determined by the length of the motion time series corresponding to the robot trajectory in the motion trajectory set.
It should be noted that zero-padding may be applied to the first matrix when the model dimension of the migration model corresponding to the trajectory is smaller than the second preset threshold, or, alternatively, when the first matrix corresponding to the trajectory is smaller than the second preset threshold.
Optionally, the fine-tuning module 406 is further configured to determine a second batch number for each round of self-supervised training of the migration model, where the second batch number indicates the number of robot trajectories selected from the parameter fine-tuning dataset each time the self-supervised training is performed, and to execute the following steps in a loop to perform the self-supervised training on the migration model: step one, determining the second batch number of robot trajectories from the parameter fine-tuning dataset; step two, generating a third matrix from each determined trajectory to obtain a plurality of third matrices; step three, inputting the third matrices into the migration model in turn to obtain a plurality of fourth matrices; step four, calculating the error value between each third matrix and its corresponding fourth matrix by the error function; step five, performing the self-supervised training on the migration model according to the error value; step six, in the current batch of self-supervised training, ending the loop when the error value between the third matrix last input into the migration model and its corresponding fourth matrix is smaller than a third preset threshold.
The second batch number is the number of robot trajectories selected from the parameter fine-tuning dataset for each batch; "batch" is a standard term in model training and is not further explained in this disclosure. A third matrix is generated from each determined trajectory, that is, each trajectory is converted into matrix data, which is the third matrix. The third matrices are then input into the migration model in turn to obtain a plurality of fourth matrices; each fourth matrix is the data of the trajectory predicted by the migration model from the corresponding third matrix. The error value between each third matrix and its corresponding fourth matrix is calculated by the error function, that is, the difference between the migration model's predicted value and the true value, and the migration model is finally trained according to this error value. Steps one to five are executed in a loop, gradually reducing the error value and improving the prediction accuracy of the migration model; the loop ends when the error value between the third matrix last input into the migration model in the current batch of training and its corresponding fourth matrix is smaller than the third preset threshold. It should be noted that the determination of the batch of robot trajectories from the parameter fine-tuning dataset may be repeated for each cycle.
The first matrix has the same form as the third matrix, and the second matrix the same form as the fourth; the different names serve only to distinguish whether the matrices come from the verification dataset or the parameter fine-tuning dataset.
Optionally, the fine tuning module 406 is further configured to determine a state and a joint moment of the robot corresponding to the trajectory in multiple time dimensions, where the state includes: the position and velocity of each joint of the robot; constructing a third matrix by taking the states and the joint moments as columns of the third matrix and taking the plurality of time dimensions as rows of the third matrix; and when the model dimension of the migration model corresponding to the track is smaller than a fourth preset threshold value, performing zero filling processing on the third matrix.
Each column of the third matrix may be formed from a state-joint-moment pair. For example, the element in the first row and first column of the fully constructed third matrix may be the state of the robot at the first time step.
The fourth matrix corresponding to a third matrix is obtained from that third matrix; the fourth matrix has the multiple time dimensions as rows and the predicted states corresponding to the states as columns. The fourth matrix is the data of the trajectory predicted by the migration model from the third matrix; since one trajectory corresponds to the robot's states and joint moments in multiple time dimensions, the fourth matrix is likewise arranged over those time dimensions with the predicted states as columns, which is consistent with the above.
The number of dimensions of the plurality of time dimensions is determined by the length of the motion time series corresponding to the robot trajectory in the motion trajectory set.
It should be noted that zero-padding may be applied to the third matrix when the model dimension of the migration model corresponding to the trajectory is smaller than the fourth preset threshold, or, alternatively, when the third matrix corresponding to the trajectory is smaller than the fourth preset threshold.
It should be noted that, the above modules may be implemented by software or hardware, and for the latter, the following may be implemented, but not limited to: the modules are all positioned in the same processor; alternatively, the modules are respectively located in different processors in any combination.
Embodiments of the present disclosure provide an electronic device.
Fig. 5 schematically shows a block diagram of an electronic device provided in an embodiment of the present disclosure.
Referring to fig. 5, an electronic device 500 provided in the embodiment of the present disclosure includes a processor 501, a communication interface 502, a memory 503 and a communication bus 504, where the processor 501, the communication interface 502 and the memory 503 communicate with each other through the communication bus 504; the memory 503 is configured to store a computer program; and the processor 501 is configured to implement the steps in any of the above method embodiments when executing the program stored in the memory.
Optionally, the electronic apparatus may further include a transmission device and an input/output device, wherein the input/output device is connected to the processor.
Optionally, in this embodiment, the processor may be configured, by means of a computer program, to execute the following steps:
S1, acquiring a target model, a verification data set and a parameter fine-tuning data set;
S2, performing knowledge distillation on the target model to obtain a migration model, and, during the knowledge distillation of the target model, optimizing the migration model according to the verification data set and an error function corresponding to the knowledge distillation;
S3, performing self-supervised training on the migration model by using the parameter fine-tuning data set, so as to fine-tune the parameters of the migration model.
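The three steps S1-S3 can be sketched as a small pipeline. This is an illustrative skeleton only, not the patented implementation: `acquire`, `distill` and `finetune` are hypothetical callables standing in for the real acquisition, knowledge-distillation and self-supervised fine-tuning procedures.

```python
def migrate_model(acquire, distill, finetune):
    """Sketch of steps S1-S3. All three callables are placeholders:
    acquire()  -> (target_model, validation_set, finetune_set)    (S1)
    distill    -> builds the migration model, optimized against
                  the validation set and the distillation error   (S2)
    finetune   -> self-supervised fine-tuning of its parameters   (S3)
    """
    target_model, validation_set, finetune_set = acquire()   # S1
    migration_model = distill(target_model, validation_set)  # S2
    migration_model = finetune(migration_model, finetune_set)  # S3
    return migration_model
```

Any concrete distillation or fine-tuning routine with these shapes can be plugged in; the skeleton only fixes the order of the three steps.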
Embodiments of the present disclosure also provide a computer-readable storage medium. The computer-readable storage medium has stored thereon a computer program which, when executed by a processor, performs the steps of any of the method embodiments described above.
Alternatively, in the present embodiment, the storage medium may be configured to store a computer program for executing the following steps:
S1, acquiring a target model, a verification data set and a parameter fine-tuning data set;
S2, performing knowledge distillation on the target model to obtain a migration model, and, during the knowledge distillation of the target model, optimizing the migration model according to the verification data set and an error function corresponding to the knowledge distillation;
S3, performing self-supervised training on the migration model by using the parameter fine-tuning data set, so as to fine-tune the parameters of the migration model.
The computer-readable storage medium may be contained in the apparatus/device described in the above embodiments; or may be present alone without being assembled into the device/apparatus. The computer-readable storage medium carries one or more programs which, when executed, implement the method according to an embodiment of the disclosure.
According to embodiments of the present disclosure, the computer-readable storage medium may be a non-volatile computer-readable storage medium, which may include, for example, but is not limited to: a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present disclosure, a computer-readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
Optionally, for specific examples in this embodiment, reference may be made to the examples described in the above embodiments and optional implementations; details are not repeated here.
It will be apparent to those skilled in the art that the modules or steps of the present disclosure described above may be implemented by a general-purpose computing device; they may be centralized on a single computing device or distributed across a network of multiple computing devices. Optionally, they may be implemented by program code executable by a computing device, so that they may be stored in a storage device and executed by a computing device; in some cases, the steps shown or described may be performed in an order different from that described herein. They may also be fabricated separately as individual integrated circuit modules, or multiple ones of them may be fabricated as a single integrated circuit module. As such, the present disclosure is not limited to any specific combination of hardware and software.
The above description is only a preferred embodiment of the present disclosure and is not intended to limit the present disclosure; various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, improvement and the like made within the principle of the present disclosure shall be included in the protection scope of the present disclosure.

Claims (9)

1. A method of model migration, comprising:
acquiring a target model, a verification data set and a parameter fine tuning data set;
performing knowledge distillation on the target model to obtain a migration model, and, during the knowledge distillation of the target model, optimizing the migration model according to the verification data set and an error function corresponding to the knowledge distillation;
performing self-supervision training on the migration model by using the parameter fine-tuning data set so as to fine-tune parameters of the migration model;
wherein, during the knowledge distillation of the target model, optimizing the migration model according to the verification data set and the error function corresponding to the knowledge distillation comprises:
determining a first batch size for each optimization of the migration model, wherein the first batch size indicates the number of robot trajectories selected from the verification data set each time the migration model is optimized;
cyclically executing the following steps to optimize the migration model:
step one, determining, from the verification data set, a first-batch-size number of trajectories of the robot;
step two, generating a first matrix from each determined trajectory to obtain a plurality of first matrices;
step three, inputting the plurality of first matrices into the migration model in sequence to obtain a plurality of second matrices;
step four, calculating, through the error function, an error value between each first matrix and the second matrix corresponding to that first matrix;
step five, optimizing the migration model according to the error values;
step six, in the optimization processing of the current batch, on the basis of minimizing the error of the migration model, ending the loop when the error value between the first matrix last input into the migration model and its corresponding second matrix is smaller than a first preset threshold value.
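The per-batch loop of steps one to six can be sketched as follows. This is a hedged simplification: batch selection is reduced to taking the head of the list rather than sampling, the distillation error function is stood in for by a plain mean square error, and `to_first_matrix`, `forward` and `update` are hypothetical callables for trajectory-to-matrix conversion, the migration model's forward pass, and the parameter update.

```python
def mse(a, b):
    """Mean square error between two equal-shaped matrices (nested lists)."""
    flat = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum((x - y) ** 2 for x, y in flat) / len(flat)

def optimize_migration_model(trajectories, to_first_matrix, forward, update,
                             batch_size, threshold, max_rounds=50):
    """Steps one-six of the batched optimization loop."""
    last_err = float("inf")
    for _ in range(max_rounds):
        batch = trajectories[:batch_size]                      # step one (sketch)
        firsts = [to_first_matrix(t) for t in batch]           # step two
        seconds = [forward(m) for m in firsts]                 # step three
        errors = [mse(f, s) for f, s in zip(firsts, seconds)]  # step four
        update(errors)                                         # step five
        last_err = errors[-1]
        if last_err < threshold:                               # step six
            break
    return last_err
```

With a toy one-parameter model whose output converges toward its input, the loop terminates once the last pair's error drops below the threshold.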
2. The method according to claim 1, wherein, during the knowledge distillation of the target model, optimizing the migration model according to the verification data set and the error function corresponding to the knowledge distillation comprises:
optimizing the migration model according to a first verification data set and a first error function, wherein the verification data set comprises the first verification data set, the error function comprises the first error function, and the first error function is used for representing an error between an output of a multi-head attention layer of the target model and an output of the multi-head attention layer of the migration model; and/or
Optimizing the migration model according to a second verification data set and a second error function, wherein the verification data set includes the second verification data set, the error function includes the second error function, and the second error function is used for representing an error between an output of a fully-connected layer of the target model and an output of a fully-connected layer of the migration model; and/or
Optimizing the migration model according to a third verification data set and a third error function, wherein the verification data set includes the third verification data set, the error function includes the third error function, and the third error function is used for representing an error between an output of the output layer of the target model and a label of the third verification data set.
3. The method of claim 2, wherein the first error function is

$$\frac{1}{q}\sum_{j=1}^{q}\mathrm{MSE}\left(A_j^{S},\,A_j^{T}\right)$$

where $q$ is the number of heads of the multi-head attention layer of the migration model, $j$ is the serial number of the multi-head attention layer, $A_j^{S}$ is the output of the $j$-th multi-head attention layer of the migration model, $A_j^{T}$ is the output of the $j$-th multi-head attention layer of the target model, and MSE() is the mean square error function.
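The first error function above, an average over the q attention layers of the per-layer mean square error between the migration model's and the target model's attention outputs, can be sketched directly. Representing each attention output as a nested-list matrix is an assumption of this sketch.

```python
def mse(a, b):
    """Mean square error between two equal-shaped matrices (nested lists)."""
    flat = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum((x - y) ** 2 for x, y in flat) / len(flat)

def first_error(attn_student, attn_teacher):
    """Average over the q layers of MSE(A_j^S, A_j^T), where attn_student[j]
    is the migration model's j-th attention output and attn_teacher[j] the
    target model's."""
    q = len(attn_student)
    return sum(mse(s, t) for s, t in zip(attn_student, attn_teacher)) / q
```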
4. The method of claim 2, wherein the second error function is

$$\mathrm{MSE}\left(H^{S}W_h,\,H^{T}\right)$$

where $H^{S}$ is the output of the fully-connected layer of the migration model, $W_h$ is a conversion matrix, $H^{T}$ is the output of the fully-connected layer of the target model, and MSE() is the mean square error function.
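The second error function projects the migration model's fully-connected output through the conversion matrix W_h before comparing it to the target model's output, which lets the two layers have different widths. A minimal nested-list sketch (the concrete shapes are illustrative):

```python
def mse(a, b):
    """Mean square error between two equal-shaped matrices (nested lists)."""
    flat = [(x, y) for ra, rb in zip(a, b) for x, y in zip(ra, rb)]
    return sum((x - y) ** 2 for x, y in flat) / len(flat)

def matmul(a, b):
    """Plain nested-list matrix product."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)]
            for row in a]

def second_error(h_student, w_h, h_teacher):
    """MSE(H^S W_h, H^T): the conversion matrix W_h maps the migration
    model's fully-connected output into the target model's dimension."""
    return mse(matmul(h_student, w_h), h_teacher)
```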
5. The method of claim 2, wherein the third error function is

$$\mathrm{MSE}\left(B,\,\hat{Y}\right)$$

where MSE() is the mean square error function, $B$ is the output of the output layer of the migration model, and $\hat{Y}$ is the label of the third verification data set.
6. The method of claim 1, wherein generating a first matrix from each determined trajectory to obtain a plurality of first matrices comprises:
determining states and joint moments of the robot corresponding to the trajectory in multiple time dimensions, wherein the states include: the position and velocity of each joint of the robot;
constructing the first matrix by taking the states and the joint moments as columns of the first matrix and taking the plurality of time dimensions as rows of the first matrix;
and when the model dimension of the migration model corresponding to the trajectory is smaller than a second preset threshold value, performing zero padding processing on the first matrix.
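The construction in claim 6 can be sketched as follows. One hedged reading is assumed here: each row of the first matrix is one time step, its columns are the per-joint positions and velocities followed by the joint moments, and rows narrower than the migration model's input dimension are zero-padded on the right. Treating the second preset threshold as that input dimension is this sketch's assumption, not something the claim fixes.

```python
def trajectory_to_matrix(states, joint_moments, model_dim):
    """Build the first matrix: rows are time dimensions, columns are the
    state (per-joint position and velocity) followed by the joint moments,
    zero-padded up to model_dim when the row is shorter."""
    matrix = []
    for state_t, moment_t in zip(states, joint_moments):
        row = list(state_t) + list(moment_t)       # columns: state, moments
        if len(row) < model_dim:
            row += [0.0] * (model_dim - len(row))  # zero padding
        matrix.append(row)                         # one row per time step
    return matrix
```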
7. The method of claim 1, wherein performing self-supervised training on the migration model by using the parameter fine-tuning data set to fine-tune the parameters of the migration model comprises:
determining a second batch size for each round of self-supervised training of the migration model, wherein the second batch size indicates the number of robot trajectories selected from the parameter fine-tuning data set each time the self-supervised training is performed;
cyclically executing the following steps to perform the self-supervised training on the migration model:
step one, determining, from the parameter fine-tuning data set, a second-batch-size number of trajectories of the robot;
step two, generating a third matrix from each determined trajectory to obtain a plurality of third matrices;
step three, inputting the plurality of third matrices into the migration model in sequence to obtain a plurality of fourth matrices;
step four, calculating, through the error function, an error value between each third matrix and the fourth matrix corresponding to that third matrix;
step five, performing the self-supervised training on the migration model according to the error values;
step six, in the self-supervised training of the current batch, ending the loop when the error value between the third matrix last input into the migration model and its corresponding fourth matrix is smaller than a third preset threshold value.
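The self-supervision in claim 7 compares the model's output (the fourth matrix) against its own input (the third matrix), so no external labels are needed. One fine-tuning step can be illustrated with a toy one-parameter "migration model" y = k·x; the single scale parameter k and the learning rate lr are stand-ins for the model's real parameters, not the patented update rule.

```python
def self_supervised_step(third_matrix, k, lr=0.1):
    """One self-supervised fine-tuning step for the toy model y = k * x:
    the fourth matrix is k times the third, the error is MSE(third, fourth),
    and k takes one gradient-descent step on that error."""
    flat = [x for row in third_matrix for x in row]
    n = len(flat)
    error = sum((x - k * x) ** 2 for x in flat) / n       # step four
    grad = sum(2.0 * (k * x - x) * x for x in flat) / n   # d(error)/dk
    return k - lr * grad, error                           # step five
```

Iterating this step drives k toward 1, i.e. toward reproducing the input, which is the fixed point of the reconstruction error.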
8. The method of claim 7, wherein generating a third matrix from each determined trajectory to obtain a plurality of third matrices comprises:
determining states and joint moments of the robot corresponding to the trajectory in multiple time dimensions, wherein the states include: the position and velocity of each joint of the robot;
constructing a third matrix by taking the states and the joint moments as columns of the third matrix and taking the plurality of time dimensions as rows of the third matrix;
and when the model dimension of the migration model corresponding to the trajectory is smaller than a fourth preset threshold value, performing zero padding processing on the third matrix.
9. A model migration apparatus, comprising:
the acquisition module is used for acquiring a target model, a verification data set and a parameter fine-tuning data set;
the knowledge distillation module is used for performing knowledge distillation on the target model to obtain a migration model and, during the knowledge distillation of the target model, optimizing the migration model according to the verification data set and an error function corresponding to the knowledge distillation;
the fine-tuning module is used for performing self-supervised training on the migration model by using the parameter fine-tuning data set, so as to fine-tune the parameters of the migration model;
wherein, during the knowledge distillation of the target model, optimizing the migration model according to the verification data set and the error function corresponding to the knowledge distillation comprises:
determining a first batch size for each optimization of the migration model, wherein the first batch size indicates the number of robot trajectories selected from the verification data set each time the migration model is optimized;
cyclically executing the following steps to optimize the migration model:
step one, determining, from the verification data set, a first-batch-size number of trajectories of the robot;
step two, generating a first matrix from each determined trajectory to obtain a plurality of first matrices;
step three, inputting the plurality of first matrices into the migration model in sequence to obtain a plurality of second matrices;
step four, calculating, through the error function, an error value between each first matrix and the second matrix corresponding to that first matrix;
step five, optimizing the migration model according to the error values;
step six, in the optimization processing of the current batch, on the basis of minimizing the error of the migration model, ending the loop when the error value between the first matrix last input into the migration model and its corresponding second matrix is smaller than a first preset threshold value.
CN202111206993.4A 2021-10-18 2021-10-18 Model migration method and device Active CN113635310B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111206993.4A CN113635310B (en) 2021-10-18 2021-10-18 Model migration method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111206993.4A CN113635310B (en) 2021-10-18 2021-10-18 Model migration method and device

Publications (2)

Publication Number Publication Date
CN113635310A CN113635310A (en) 2021-11-12
CN113635310B true CN113635310B (en) 2022-01-11

Family

ID=78427324

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111206993.4A Active CN113635310B (en) 2021-10-18 2021-10-18 Model migration method and device

Country Status (1)

Country Link
CN (1) CN113635310B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114118584A (en) * 2021-11-29 2022-03-01 新智我来网络科技有限公司 Resource usage amount prediction method and device

Citations (7)

Publication number Priority date Publication date Assignee Title
CN111339302A (en) * 2020-03-06 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for training element classification model
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111709497A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
WO2021056043A1 (en) * 2019-09-23 2021-04-01 Presagen Pty Ltd Decentralised artificial intelligence (ai)/machine learning training system
CN113177612A (en) * 2021-05-24 2021-07-27 同济大学 Agricultural pest image identification method based on CNN few samples
CN113326940A (en) * 2021-06-25 2021-08-31 江苏大学 Knowledge distillation method, device, equipment and medium based on multiple knowledge migration
CN113449680A (en) * 2021-07-15 2021-09-28 北京理工大学 Knowledge distillation-based multimode small target detection method

Family Cites Families (2)

Publication number Priority date Publication date Assignee Title
US11620515B2 (en) * 2019-11-07 2023-04-04 Salesforce.Com, Inc. Multi-task knowledge distillation for language model
US11900260B2 (en) * 2020-03-05 2024-02-13 Huawei Technologies Co., Ltd. Methods, devices and media providing an integrated teacher-student system

Patent Citations (7)

Publication number Priority date Publication date Assignee Title
WO2021056043A1 (en) * 2019-09-23 2021-04-01 Presagen Pty Ltd Decentralised artificial intelligence (ai)/machine learning training system
CN111339302A (en) * 2020-03-06 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for training element classification model
CN111554268A (en) * 2020-07-13 2020-08-18 腾讯科技(深圳)有限公司 Language identification method based on language model, text classification method and device
CN111709497A (en) * 2020-08-20 2020-09-25 腾讯科技(深圳)有限公司 Information processing method and device and computer readable storage medium
CN113177612A (en) * 2021-05-24 2021-07-27 同济大学 Agricultural pest image identification method based on CNN few samples
CN113326940A (en) * 2021-06-25 2021-08-31 江苏大学 Knowledge distillation method, device, equipment and medium based on multiple knowledge migration
CN113449680A (en) * 2021-07-15 2021-09-28 北京理工大学 Knowledge distillation-based multimode small target detection method

Non-Patent Citations (1)

Title
A survey of pose estimation methods for rigid objects from a single image; Yang Buyi et al.; Journal of Image and Graphics (中国图象图形学报); Feb. 28, 2021; Vol. 26, No. 2; pp. 334-353 *

Also Published As

Publication number Publication date
CN113635310A (en) 2021-11-12

Similar Documents

Publication Publication Date Title
CN111950225B (en) Chip layout method and device, storage medium and electronic equipment
CN112232513B (en) Quantum state preparation method and device
CN112633508B (en) Quantum circuit generation method and device, storage medium and electronic device
CN110163368A (en) Deep learning model training method, apparatus and system based on mixed-precision
CN107783840A (en) A kind of Distributed-tier deep learning resource allocation methods and device
CN113635310B (en) Model migration method and device
CN108304926B (en) Pooling computing device and method suitable for neural network
CN114792378B (en) Quantum image recognition method and device
TW202004569A (en) Method for batch normalization layer pruning in deep neural networks
CN113868061A (en) Chip verification method and device and server
CN111898750A (en) Neural network model compression method and device based on evolutionary algorithm
CN112396191A (en) Method, system and device for updating model parameters based on federal learning
CN110874626B (en) Quantization method and quantization device
CN108304925A (en) A kind of pond computing device and method
CN109214515A (en) A kind of deep neural network inference method and calculate equipment
CN114358317B (en) Data classification method based on machine learning framework and related equipment
CN112906046A (en) Model training method and device by using single-bit compression perception technology
WO2022147583A2 (en) System and method for optimal placement of interacting objects on continuous (or discretized or mixed) domains
CN113561187B (en) Robot control method, device, electronic device and storage medium
CN113561185B (en) Robot control method, device and storage medium
CN116011682B (en) Meteorological data prediction method and device, storage medium and electronic device
CN116011681A (en) Meteorological data prediction method and device, storage medium and electronic device
CN111027669A (en) Method and device for realizing deep neural network on field programmable gate array
CN116403657A (en) Drug response prediction method and device, storage medium and electronic device
CN110659731B (en) Neural network training method and device

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant