CN111950736A

CN111950736A - Migration ensemble learning method, terminal device, and computer-readable storage medium

Info

Publication number: CN111950736A
Application number: CN202010726052.2A
Authority: CN
Inventors: 袁春; 钟括
Original assignee: Shenzhen International Graduate School of Tsinghua University
Current assignee: Shenzhen International Graduate School of Tsinghua University
Priority date: 2020-07-24
Filing date: 2020-07-24
Publication date: 2020-11-17
Anticipated expiration: 2040-07-24
Also published as: CN111950736B

Abstract

The invention provides a migration ensemble learning method, a terminal device and a computer readable storage medium, wherein the method comprises the following steps: s1: uniformly sampling at least 3 different total migratable degrees of the source models from 0 to 1, wherein each total migratable degree of the source models corresponds to one individual learner; s2: for each individual learner, determining an optimal migration strategy based on the total migratable extent of the corresponding source model; s3: migrating a source model to an individual learner of a target task sequentially through a parameter migration regular term and a task loss function based on the optimal migration strategy corresponding to each individual learner; s4: and obtaining an integrated model in an average mode based on the individual learners of each target task. By utilizing the effectiveness of ensemble learning in relieving overfitting, the estimation of the migration strategy is more stable and efficient through reasonable assumption, and finally the generalization capability of the ensemble model is greatly improved.

Description

Migration ensemble learning method, terminal device, and computer-readable storage medium

Technical Field

The invention relates to the technical field of artificial intelligence and transfer learning, in particular to a transfer ensemble learning method, a terminal device and a computer readable storage medium.

Background

The introduction of deep neural networks has significantly facilitated the development of computer vision and natural language processing in recent years. Compared with the traditional method for manually defining the characteristics, the model can automatically learn the characteristics with more discrimination through layer-by-layer stacking of the network. However, the neural network also brings a new problem that it requires a large amount of tag data. Therefore, to achieve better generalization performance, the model needs to be trained with more data. This seems to be a ferrology in the deep learning field today. However, obtaining a large amount of tag data is costly. For example, in the segmentation task, each pixel point needs to be labeled with a category, so that a large amount of manual labor is consumed. Even more fatal, due to privacy protection, it becomes completely impossible to collect a large amount of tag data.

Facing the problem of a small number of training samples, one current direction of fire comparison is transfer learning. Intuitively, the method of migrating and riding the bicycle can be used for learning how to ride the motorcycle and how to take the Chinese chess into the chess, thereby rapidly adapting to a new task, namely the migration learning. The transfer learning is to improve the generalization performance of the target model on the target task by transferring the knowledge of the related task under the condition of a small number of label data sets. The transfer learning is widely applied to the field of deep learning.

Considering the mobility of model parameters, there is a lot of work to migrate through parameter migration regularization terms. But these efforts all suffer from severe overfitting problems.

The above background disclosure is only for the purpose of assisting understanding of the concept and technical solution of the present invention and does not necessarily belong to the prior art of the present patent application, and should not be used for evaluating the novelty and inventive step of the present application in the case that there is no clear evidence that the above content is disclosed at the filing date of the present patent application.

Disclosure of Invention

In order to solve the existing problems, the invention provides a migration ensemble learning method, a terminal device and a computer readable storage medium.

In order to solve the above problems, the technical solution adopted by the present invention is as follows:

a migration ensemble learning method, comprising the steps of: s1: uniformly sampling at least 3 different total migratable degrees of the source models from 0 to 1, wherein each total migratable degree of the source models corresponds to one individual learner; s2: for each individual learner, determining an optimal migration strategy based on the total migratable extent of the corresponding source model; s3: migrating a source model to an individual learner of a target task sequentially through a parameter migration regular term and a task loss function based on the optimal migration strategy corresponding to each individual learner; s4: and obtaining an integrated model in an average mode based on the individual learners of each target task.

Preferably, each of said source models has a total migratable extent c_mCorresponding to one individual learner f_m(·；ω^t，m) And sampling M total migratable degrees of the source models, wherein the total migratable degree of the source models is

Preferably, the estimating the optimal migration strategy according to the total migratable extent of each source model includes:

wherein ,

is the training set of the target task, c is the total migratable extent of the source model, λ is the migration strategy, L_TIs a loss function of the target task, ω^s，ω^tParameters, λ, of the individual learners representing the source model parameters and the target task, respectively_lIs a migration policy per layer, N_lIs the number of parameters of the l-th layer, and gamma is a normalizationA constant.

Preferably, the estimating of the optimal migration strategy through the total migratable extent of each source model includes:

wherein ,σ_i，jIs a loss function L of the target task_TSecond derivative of jth model parameter at ith layer; u is an auxiliary variable by which the migration strategy λ is made to satisfy the following condition:

in particular, the method of manufacturing a semiconductor device,

preferably, the parameter migration regularization term is:

the loss function of the target model is:

preferably, the optimal migration strategy λ and the pre-training model parameter ω corresponding to each individual learner are based on^sTarget model parameters omega of the target task through a complete loss function L^tOptimizing to obtain the individual learner f (·; omega)^t)。

Preferably, M migration policies based on the selection

Obtaining M in turnThe individual learning device

wherein ,f_mDenotes the m-th learner, ω^t，mRepresenting the corresponding model parameters.

Preferably, M of said individual learners are trained

And obtaining an integrated model F ═ Sigma by an averaging mode_mf_m。

The invention also provides a migration ensemble learning terminal device, which comprises a memory, a processor and a computer program stored in the memory and capable of running on the processor, wherein the processor executes the computer program to realize the steps of any one of the methods.

The invention further provides a computer-readable storage medium having stored thereon a computer program which, when executed by a processor, carries out the steps of the method as set forth in any of the above.

The invention has the beneficial effects that: the method comprises the steps of sampling at least 3 different source model total migratable degrees from 0 to 1 to ensure the difference between individual learners, estimating a corresponding optimal migration strategy according to the different source model total migratable degrees, obtaining a corresponding number of individual learners through a migration strategy and a loss function, and obtaining an integrated model in an averaging mode. By utilizing the effectiveness of ensemble learning in relieving overfitting, the estimation of the migration strategy is more stable and efficient through reasonable assumption, and finally the generalization capability of the ensemble model is greatly improved.

Drawings

Fig. 1 is a schematic diagram of a migration ensemble learning method according to an embodiment of the present invention.

Detailed Description

In order to make the technical problems, technical solutions and advantageous effects to be solved by the embodiments of the present invention more clearly apparent, the present invention is further described in detail below with reference to the accompanying drawings and the embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

It will be understood that when an element is referred to as being "secured to" or "disposed on" another element, it can be directly on the other element or be indirectly on the other element. When an element is referred to as being "connected to" another element, it can be directly connected to the other element or be indirectly connected to the other element. In addition, the connection may be for either a fixing function or a circuit connection function.

It is to be understood that the terms "length," "width," "upper," "lower," "front," "rear," "left," "right," "vertical," "horizontal," "top," "bottom," "inner," "outer," and the like are used in an orientation or positional relationship indicated in the drawings for convenience in describing the embodiments of the present invention and to simplify the description, and are not intended to indicate or imply that the referenced device or element must have a particular orientation, be constructed in a particular orientation, and be in any way limiting of the present invention.

Furthermore, the terms "first", "second" and "first" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly indicating the number of technical features indicated. Thus, a feature defined as "first" or "second" may explicitly or implicitly include one or more of that feature. In the description of the embodiments of the present invention, "a plurality" means two or more unless specifically limited otherwise.

In the invention, the source model refers to a pre-training model of a related task; the integrated model is a strong learner formed by training a plurality of individual learners and finally through a certain combination strategy. Related tasks refer to tasks related to the target task. In one particular embodiment, the target task is cat and dog discrimination, and the related task may be tiger and wolf discrimination. In practical applications, there are typically a large number of relevant tasks gathered before the target task is resolved, which may have been previously utilized. Now for the target task, in addition to the data of the target task, the previous related task knowledge can be salvaged.

As shown in fig. 1, the present invention provides a migration ensemble learning method, comprising the following steps:

s1: uniformly sampling at least 3 different total migratable degrees of the source models from 0 to 1, wherein each total migratable degree of the source models corresponds to one individual learner;

s2: for each individual learner, determining an optimal migration strategy based on the total migratable extent of the corresponding source model;

s3: migrating a source model to an individual learner of a target task sequentially through a parameter migration regular term and a task loss function based on the optimal migration strategy corresponding to each individual learner;

s4: and obtaining an integrated model in an average mode based on the individual learners of each target task.

The invention considers the effectiveness of ensemble learning in relieving overfitting through research, and can be introduced into transfer learning. Considering that the superiority of individual learning and the difference of individual learners are important factors for relieving overfitting after integration, the invention ensures the difference between the individual learners by sampling at least 3 different total migratable degrees of source models from 0 to 1, then estimates the corresponding optimal migration strategy according to the total migratable degrees of the different source models, finally can obtain the corresponding number of individual learners through the migration strategy and the loss function, and obtains the integrated model through an average mode.

According to the invention, according to the migratability of the relevant task pre-training model, namely the source model, the generalization performance of the target model of the target task can be improved by migrating the relevant task pre-training model, and then according to the effectiveness of the ensemble learning in relieving the overfitting, the ensemble learning can be introduced into the migration learning, so that the model performance is further improved.

The invention is oriented to a migration ensemble learning method based on model parameters, utilizes the effectiveness of ensemble learning in relieving overfitting, further enables the estimation of a migration strategy to be more stable and efficient through reasonable assumption, and finally greatly improves the generalization capability of an ensemble model.

In an embodiment of the present invention, each of the source models has a total migratable extent c_mCorresponding to one individual learner f_m(·；ω^t，m) And sampling M total migratable degrees of the source models, wherein the total migratable degree of the source models is

Since the over-fitting can be relieved by the difference between the individual learners, the difference between the source model and the target model of the target task can be ensured by changing the total migratable degree of the source model, so that the over-fitting is relieved to the maximum extent. When the source model f (·; ω)^s) And a target model f (-) of the target task; omega^t) When the output of (a) is close, the total migratability of the source model can be considered high. Further, when the source model and the target model of the target task have the same structure, the source model parameter ω can be considered as the source model parameter ω^sAnd target model parameters omega^tWhen the similarity is high, the model f (·; omega) is pre-trained^s) And model f (·; omega^t) The output of (c) is close, so the source model is always migratable to a high degree. Controlling source model parameters omega by a migration strategy lambda^sAnd target model parameters omega^tThe similarity of (c). It is thus possible to use a function of the migration strategy lambda as a proxy value for which the total migratability as source model is high. Considering the influence of different layer parameters on the total migratable degree of the source model, the model migratable degree

wherein N_lIs the number of parameters for the l-th layer and gamma is a normalization constant. Before estimating M migration strategies, the invention uniformly samples M different total model migratability degrees from 0 to 1

Wherein each c_mCorresponding to an individual learner f_m(·；ω^t，m) And selecting the value of M according to the actual storage space and the actual operation resource, wherein M is at least more than 3.

The method for estimating the optimal migration strategy through the total migratable degree of each source model comprises the following steps:

wherein ,

is the training set of the target task, c is the total migratable extent of the source model, λ is the migration strategy, L_TIs a loss function of the target task, ω^s，ω^tParameters, λ, of the individual learners representing the source model parameters and the target task, respectively_lIs a migration policy per layer, N_lIs the number of parameters of the l-th layer, gamma is a normalization constant,

are the individual learner parameters estimated by the integrity loss function.

It will be appreciated that the above description, for simplicity of calculation, will refer to the total migratable extent c of the source model_mThe specific meaning of c is the same, here merely to simplify the formulation in the case of clearance.

Further, estimating an optimal migration strategy through the total migratable extent of each source model, including:

in particular, the method of manufacturing a semiconductor device,

by a reasonable approximation of two-stage optimization, u is estimated first and then the specific migration strategy λ can be estimated by the above equation.

Furthermore, M corresponding migration strategies can be obtained by transforming the total model migration degree.

According to the model parameter mobility, a migration strategy lambda of each layer can be defined_lThen, the regular term Ω and the migration strategy λ ═ λ are migrated through the parameters₁，λ₂，...，λ_L]Migration of knowledge from the source model can be achieved.

The parameter migration regularization term is:

considering that the individual learners need to be trained on the target task at the same time to improve the performance of the target model on the target task, a target task loss function L is added_TThe loss function of the target model is:

learning based on each of the individual learnersCorresponding optimal migration strategy lambda and pre-training model parameter omega^sTarget model parameters omega of the target task through a complete loss function L^tOptimizing to obtain the individual learner f (·; omega)^t)。

Further, based on the M migration policies selected

Sequentially obtaining M individual learners

Training M individual learners

And obtaining an integrated model F ═ Sigma by an averaging mode_mf_m。

To verify the effectiveness of the method of the present invention, the method of the present invention was applied to bird detection systems. The wild bird detection system aims at distinguishing the categories of birds in the wild environment, and the core problem is how to distinguish the categories of the birds in the image from a complex background through a small number of bird training sets. Because there are only a few bird training sets, the prior art cannot get a good model through a classical deep learning network. The method of the invention was applied to this system, which in particular contained 200 birds, a training set size of 5994 and a test set size of 5794, this data being provided by the open data set CUBS. Compared to mainstream methods such as finenet, FitNet, ABdistill, L2SP, cross stick, BAN, SE, Bagging, the method of the present invention exceeds the related mainstream methods by up to about 1% accuracy, which can prove the effectiveness of the patent. The effectiveness of the method of the present invention in mitigating overfitting and improving model performance is demonstrated.

An embodiment of the present application further provides a control apparatus, including a processor and a storage medium for storing a computer program; wherein a processor is adapted to perform at least the method as described above when executing the computer program.

Embodiments of the present application also provide a storage medium for storing a computer program, which when executed performs at least the method described above.

Embodiments of the present application further provide a processor, where the processor executes a computer program to perform at least the method described above.

The storage medium may be implemented by any type of volatile or non-volatile storage device, or combination thereof. Among them, the nonvolatile Memory may be a Read Only Memory (ROM), a Programmable Read Only Memory (PROM), an erasable Programmable Read-Only Memory (EPROM), an electrically erasable Programmable Read-Only Memory (EEPROM), a magnetic random Access Memory (FRAM), a Flash Memory (Flash Memory), a magnetic surface Memory, an optical Disc, or a Compact Disc Read-Only Memory (CD-ROM); the magnetic surface storage may be disk storage or tape storage. Volatile Memory can be Random Access Memory (RAM), which acts as external cache Memory. By way of illustration and not limitation, many forms of RAM are available, such as Static Random Access Memory (SRAM), Synchronous Static Random Access Memory (SSRAM), Dynamic Random Access Memory (DRAM), Synchronous Dynamic Random Access Memory (SDRAM), Double Data rate Synchronous Dynamic Random Access Memory (DDRSDRAM, Double Data rate Synchronous Dynamic Random Access Memory), Enhanced Synchronous Dynamic Random Access Memory (ESDRAM, Enhanced Synchronous Dynamic Random Access Memory), Synchronous link Dynamic Random Access Memory (SLDRAM, Synchronous Dynamic Random Access Memory (DRAM), Direct Memory (DRM, Random Access Memory). The storage media described in connection with the embodiments of the invention are intended to comprise, without being limited to, these and any other suitable types of memory.

In the several embodiments provided in the present application, it should be understood that the disclosed system and method may be implemented in other ways. The above-described device embodiments are merely illustrative, for example, the division of the unit is only a logical functional division, and there may be other division ways in actual implementation, such as: multiple units or components may be combined, or may be integrated into another system, or some features may be omitted, or not implemented. In addition, the coupling, direct coupling or communication connection between the components shown or discussed may be through some interfaces, and the indirect coupling or communication connection between the devices or units may be electrical, mechanical or other forms.

The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, that is, may be located in one place, or may be distributed on a plurality of network units; some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.

In addition, all the functional units in the embodiments of the present invention may be integrated into one processing unit, or each unit may be separately regarded as one unit, or two or more units may be integrated into one unit; the integrated unit can be realized in a form of hardware, or in a form of hardware plus a software functional unit.

Those of ordinary skill in the art will understand that: all or part of the steps for implementing the method embodiments may be implemented by hardware related to program instructions, and the program may be stored in a computer readable storage medium, and when executed, the program performs the steps including the method embodiments; and the aforementioned storage medium includes: a mobile storage device, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.

Alternatively, the integrated unit of the present invention may be stored in a computer-readable storage medium if it is implemented in the form of a software functional module and sold or used as a separate product. Based on such understanding, the technical solutions of the embodiments of the present invention may be essentially implemented or a part contributing to the prior art may be embodied in the form of a software product, which is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, or a network device) to execute all or part of the methods described in the embodiments of the present invention. And the aforementioned storage medium includes: a removable storage device, a ROM, a RAM, a magnetic or optical disk, or various other media that can store program code.

The methods disclosed in the several method embodiments provided in the present application may be combined arbitrarily without conflict to obtain new method embodiments.

Features disclosed in several of the product embodiments provided in the present application may be combined in any combination to yield new product embodiments without conflict.

The features disclosed in the several method or apparatus embodiments provided in the present application may be combined arbitrarily, without conflict, to arrive at new method embodiments or apparatus embodiments.

The foregoing is a more detailed description of the invention in connection with specific preferred embodiments and it is not intended that the invention be limited to these specific details. For those skilled in the art to which the invention pertains, several equivalent substitutions or obvious modifications can be made without departing from the spirit of the invention, and all the properties or uses are considered to be within the scope of the invention.

Claims

1. A migration ensemble learning method, characterized by comprising the steps of:

2. The migration ensemble learning method of claim 1, wherein each of said source models has a total migratable extent c_mCorresponding to one individual learner f_m(·；ω^t，m) And sampling M total migratable degrees of the source models, wherein the total migratable degree of the source models is

3. The migration ensemble learning method of claim 2, wherein estimating an optimal migration policy by a total migratable extent of each of said source models comprises:

wherein ,

is the training set of the target task, c is the total migratable extent of the source model, λ is the migration strategy, L_TIs a loss function of the target task, ω^s，ω^tParameters, λ, of the individual learners representing the source model parameters and the target task, respectively_lIs a migration policy per layer, N_lIs the number of parameters for the l-th layer and gamma is a normalization constant.

4. The migration ensemble learning method of claim 3, wherein estimating an optimal migration strategy by a total migratable extent of each of said source models comprises:

in particular, the method of manufacturing a semiconductor device,

5. the migration ensemble learning method of claim 4, wherein said parameter migration regularization term is:

the loss function of the target model is:

6. the migration ensemble learning method according to claim 5, wherein the optimal migration strategy λ and pre-training model parameter ω are based on the optimal migration strategy λ and pre-training model parameter ω corresponding to each of the individual learners^sTarget model parameters omega of the target task through a complete loss function L^tOptimizing to obtain the individual learner f (·; omega)^t)。

7. The migration ensemble learning method of claim 6, wherein the M migration strategies based on the selection

Sequentially obtaining M individual learners

8. The method of migratory ensemble learning of claim 7, wherein M of said individual learners are trained

And obtaining an integrated model F ═ Sigma by an averaging mode_mf_m。

9. A migration ensemble learning terminal device comprising a memory, a processor and a computer program stored in said memory and executable on said processor, characterized in that said processor, when executing said computer program, implements the steps of the method according to any of claims 1-8.

10. A computer-readable storage medium, in which a computer program is stored which, when being executed by a processor, carries out the steps of the method according to any one of claims 1 to 8.