CN109754068A - Transfer learning method and terminal device based on deep learning pre-training model


Info

Publication number
CN109754068A
Authority
CN
China
Prior art keywords
training
data
model
data set
new
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201811473650.2A
Other languages
Chinese (zh)
Inventor
许国杰
刘川
吴又奎
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongke Hengyun Co Ltd
Original Assignee
Zhongke Hengyun Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongke Hengyun Co Ltd filed Critical Zhongke Hengyun Co Ltd
Priority to CN201811473650.2A priority Critical patent/CN109754068A/en
Publication of CN109754068A publication Critical patent/CN109754068A/en
Pending legal-status Critical Current

Landscapes

  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention belongs to the field of model construction techniques and provides a transfer learning method and terminal device based on a deep learning pre-training model. The method comprises: dividing a data set into a training data set and a test data set; retraining an acquired pre-training model according to the data in the training data set to obtain a new model; detecting the generalization performance of the new model according to the data in the test data set to obtain a detection result; and, when the detection result reaches a pre-set level value, determining that the new model is a model that satisfies the application. This scheme can solve the prior-art problems that, when facing a particular problem in a certain field, data of the scale needed to build a model is often unavailable, and that building a new model is time-consuming and resource-intensive.

Description

Transfer learning method and terminal device based on deep learning pre-training model
Technical field
The invention belongs to the field of model construction techniques and more particularly relates to a transfer learning method and terminal device based on a deep learning pre-training model.
Background technique
Under the traditional machine learning framework, the learning task is to construct a new model on the basis of given, sufficient training data. However, current machine learning research faces the following key difficulties:
1. Large-scale data are needed to train a new model, and large-scale data for a specific field are often difficult to obtain.
2. Training is time-consuming. A deep learning model is a large-scale neural network with many layers, so the time spent on training is long; the more complex the neural network and the more data there are, the more time the training process requires.
3. Training consumes resources. A neural network usually requires a large number of labeled samples, and the large amount of data together with the responses of each layer of the network consumes a large amount of memory. In addition, traditional machine learning usually assumes that the training data and the test data obey the same data distribution. In many cases, however, this same-distribution assumption does not hold: for example, the training data may be outdated, which forces us to label a large amount of new training data to meet the needs of training, and labeling new data requires considerable manpower and material resources. Even when we possess a large amount of training data under a different distribution, discarding those data entirely is also very wasteful.
Summary of the invention
In view of this, embodiments of the present invention provide a transfer learning method based on a deep learning pre-training model and a terminal device, which can solve the prior-art problems that, when facing a particular problem in a certain field, data of the scale needed to build a model is often unavailable, and that building a new model is time-consuming and resource-intensive.
A first aspect of the embodiments of the present invention provides a transfer learning method based on a deep learning pre-training model, comprising:
dividing a data set into a training data set and a test data set;
retraining an acquired pre-training model according to the data in the training data set to obtain a new model;
detecting the generalization performance of the new model according to the data in the test data set to obtain a detection result;
when the detection result reaches a pre-set level value, determining that the new model is a model that satisfies the application.
In one embodiment, the training data set and the test data set are two mutually exclusive data sets with consistent data distributions.
In one embodiment, the test data set is obtained by sampling from the data set by way of stratified sampling, and the data in the data set other than the test data set form the training data set.
In one embodiment, the training data set contains more data than the test data set.
In one embodiment, the proportion of the data of the data set accounted for by the data in the training data set lies in the interval [2/3, 4/5].
In one embodiment, dividing the data set into a training data set and a test data set comprises:
performing N random divisions of the data set to obtain N groups of training data sets and corresponding test data sets, where N is greater than or equal to 1.
In one embodiment, retraining the acquired pre-training model according to the data in the training data set to obtain a new model comprises:
retraining, according to the data in the training data set, the training weights in the rear-end levels of the acquired pre-training model to obtain new weights;
adjusting, according to the data in the training data set, the parameters in the rear-end levels of the pre-training model to obtain new parameters;
obtaining the new model from the training weights that remain unchanged in the front-end levels of the pre-training model, the new weights, and the new parameters.
A second aspect of the embodiments of the present invention provides a transfer learning device based on a deep learning pre-training model, comprising:
a division module, configured to divide a data set into a training data set and a test data set;
a training module, configured to retrain an acquired pre-training model according to the data in the training data set to obtain a new model;
a test module, configured to detect the generalization performance of the new model according to the data in the test data set to obtain a detection result;
a determining module, configured to determine, when the detection result reaches a pre-set level value, that the new model is a model that satisfies the application.
A third aspect of the embodiments of the present invention provides a terminal device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the transfer learning method based on a deep learning pre-training model.
A fourth aspect of the embodiments of the present invention provides a computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the transfer learning method based on a deep learning pre-training model are implemented.
Compared with the prior art, the embodiments of the present invention have the following beneficial effects: the provided scheme obtains a new model by adjusting part of the layers of a deep learning pre-training model according to the data of a new field, and then assesses and adjusts the new model so that it can be applied to a practical problem. This solves the prior-art problems that a new model must be learned on the basis of given, sufficient training data and that, when the needed new model is a large-scale neural network, training is time-consuming and resource-intensive.
Detailed description of the invention
In order to describe the technical solutions in the embodiments of the present invention more clearly, the accompanying drawings needed for the embodiments or the description of the prior art are briefly introduced below. Obviously, the accompanying drawings in the following description show only some embodiments of the present invention; those of ordinary skill in the art can obtain other drawings from these drawings without any creative labor.
Fig. 1 is a schematic flowchart of a transfer learning method based on a deep learning pre-training model provided by an embodiment of the present invention;
Fig. 2 is a schematic flowchart of another transfer learning method based on a deep learning pre-training model provided by an embodiment of the present invention;
Fig. 3 is an example diagram of a transfer learning device based on a deep learning pre-training model provided by an embodiment of the present invention;
Fig. 4 is a schematic diagram of a terminal device provided by an embodiment of the present invention.
Specific embodiment
In the following description, specific details such as particular system structures and techniques are set forth for the purpose of illustration rather than limitation, so as to provide a thorough understanding of the embodiments of the present invention. However, it will be clear to those skilled in the art that the present invention may also be implemented in other embodiments without these specific details. In other instances, detailed descriptions of well-known systems, devices, circuits, and methods are omitted, lest unnecessary details interfere with the description of the present invention.
In order to illustrate the technical solutions of the present invention, specific embodiments are described below.
An embodiment of the present invention provides a transfer learning method based on a deep learning pre-training model. As shown in Fig. 1, the method comprises the following steps.
Step 101: divide a data set into a training data set and a test data set.
Optionally, the training data set and the test data set are two mutually exclusive data sets with consistent data distributions. For example, the data set D is divided into two mutually exclusive data sets with consistent distributions, one being the training data set S and the other the test data set T, such that the union of S and T is D and the intersection of S and T is empty.
Optionally, keeping the data distributions of S and T consistent avoids introducing additional bias through the data-division process, which would influence the final result.
Further, in order to guarantee the consistency of the data distribution, this application samples the data by way of stratified sampling. Specifically, the test data set is obtained by sampling from the data set by way of stratified sampling, and the data in the data set other than the test data set form the training data set. For example, suppose the data set D contains m1 positive samples and m2 negative samples, and that S accounts for a proportion p of D, so that T accounts for a proportion (1 - p). Then m1 * p samples can be drawn from the m1 positive samples as the positive samples of the training data set, m2 * p samples can be drawn from the m2 negative samples as the negative samples of the training data set, and the remaining samples form the test data set.
Optionally, the training data set contains more data than the test data set.
When the data set D is divided, if the training data set S contains many samples and is close to D, the newly trained model may be very close to the model that would be trained on D itself, but T is then small, which may make the assessment result insufficiently accurate and stable; if S contains few samples, the newly trained model may differ greatly from the model that would be trained on D. Therefore, specifically, the proportion of the data of the data set accounted for by the training data set lies in the interval [2/3, 4/5], so that the proportion accounted for by the test data set lies in the interval [1/5, 1/3].
Further, when dividing the data set, the data set can be subjected to N random divisions to obtain N groups of training data sets and corresponding test data sets, where N is greater than or equal to 1. In this way the N detection results obtained can be averaged as the final assessment result, which is more accurate and better suits rapid application in different fields. A sketch of such a division is given below.
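By way of illustration only, the following Python sketch realizes this division step under the assumption that scikit-learn is used; the application does not prescribe any particular library, and the features, labels, and parameter values are hypothetical placeholders.

    # A minimal sketch of step 101, assuming scikit-learn. StratifiedShuffleSplit
    # keeps the positive/negative ratio of the data set D in both S and T
    # (stratified sampling), and n_splits realizes the N random divisions.
    import numpy as np
    from sklearn.model_selection import StratifiedShuffleSplit

    X = np.random.rand(1000, 32)       # placeholder features for data set D
    y = np.random.randint(0, 2, 1000)  # placeholder positive/negative labels

    splitter = StratifiedShuffleSplit(
        n_splits=5,        # N random divisions (N >= 1)
        train_size=4 / 5,  # training proportion within [2/3, 4/5]
        random_state=0,
    )
    splits = []
    for train_idx, test_idx in splitter.split(X, y):
        # S (training data set) and T (test data set) are mutually exclusive
        # and both preserve the class distribution of D.
        splits.append((train_idx, test_idx))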
Step 102: retrain the acquired pre-training model according to the data in the training data set to obtain a new model.
Optionally, as shown in Fig. 2, this step comprises the following sub-steps.
Step 1021: retrain, according to the data in the training data set, the training weights in the rear-end levels of the pre-training model to obtain new weights.
Optionally, before this step, the method may further comprise: acquiring a pre-training model, the pre-training model comprising training weights.
In deep learning, computing resources are often limited or the training data set is small, yet good and stable results are still desired. In that case, already-trained models, i.e. pre-training models, can be acquired first, and a new model can be obtained by directly retraining a pre-training model rather than training a new model from scratch, which saves a large amount of manpower and material resources.
The source model of a pre-training model is selected from available models; many research institutions have released models trained on very large data sets, and all of these can serve as candidate source models. The pre-training model acquired in this scheme is a pre-training model with training weights.
Further optionally, deep learning continuously adjusts parameters through forward computation and backpropagation so as to extract optimal features and achieve the purpose of prediction. The front-end levels of a model are commonly used to capture high-level structure in the input data, such as image edges and main bodies; the rear-end levels are commonly used to capture the information that helps make the final decision, such as the detailed information for distinguishing the target output.
After the pre-training model is acquired, the whole model does not need to be retrained; it is sufficient to train only some of its layers. The weights of some initial layers of the model are kept unchanged, and the subsequent layers are retrained to obtain new weights. That is, according to the data in the training data set, the training weights in the rear-end levels of the acquired pre-training model are retrained to obtain new weights.
During model adjustment, multiple attempts can be made: the pre-training model is adjusted according to the N different groups of training data, so that the best allocation between frozen layers and retrained layers can be found according to the results. A minimal sketch of this sub-step follows.
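The sketch below, by way of illustration only, assumes PyTorch and a torchvision ResNet-18 as the pre-training model; neither choice, nor the number of output classes, is prescribed by the application.

    # A minimal sketch of step 1021, assuming PyTorch and a torchvision
    # ResNet-18 with trained weights as the pre-training model.
    import torch
    import torch.nn as nn
    from torchvision import models

    model = models.resnet18(pretrained=True)  # pre-training model with weights

    # Freeze the front-end levels: their training weights remain unchanged.
    for param in model.parameters():
        param.requires_grad = False

    # Replace the rear-end level for the new task (10 classes is a
    # hypothetical number chosen for illustration).
    model.fc = nn.Linear(model.fc.in_features, 10)

    # Only the rear-end parameters are handed to the optimizer, so only the
    # retrained layers obtain new weights.
    optimizer = torch.optim.SGD(model.fc.parameters(), lr=1e-3, momentum=0.9)
    criterion = nn.CrossEntropyLoss()

    def retrain_step(inputs, labels):
        optimizer.zero_grad()
        loss = criterion(model(inputs), labels)
        loss.backward()   # backpropagation
        optimizer.step()  # adjust the rear-end weights
        return loss.item()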
Step 1022: adjust, according to the data in the training data set, the parameters in the rear-end levels of the pre-training model to obtain new parameters.
Optionally, by fine-tuning the parameters of the pre-training model, the trained model is applied to a similar task, or to a different task with only nuanced differences.
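Continuing the sketch above, and again by way of illustration only, such fine-tuning can be realized by unfreezing a rear-end level and adjusting it with a smaller learning rate; the layer names and learning rates are illustrative assumptions rather than values prescribed by the application.

    # A minimal sketch of step 1022 under the same PyTorch/ResNet-18
    # assumptions: a rear-end level (layer4) is unfrozen and fine-tuned with
    # a smaller learning rate than the new output layer, so its pre-trained
    # parameters are adjusted rather than relearned from scratch.
    for param in model.layer4.parameters():
        param.requires_grad = True

    fine_tune_optimizer = torch.optim.SGD(
        [
            {"params": model.layer4.parameters(), "lr": 1e-4},  # gentle adjustment
            {"params": model.fc.parameters(), "lr": 1e-3},      # new rear layer
        ],
        momentum=0.9,
    )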
Step 1023: obtain the new model from the training weights that remain unchanged in the front-end levels of the pre-training model, the new weights, and the new parameters.
It should be understood that applying a new model is a process of loop iteration: only through continuous adjustment and tuning of the model can it adapt to online data and business objectives, and only then can the most effective new model be found.
Step 103: detect the generalization performance of the new model according to the data in the test data set to obtain a detection result.
After the new model is obtained, its performance needs to be assessed. This application detects the generalization performance of the new model so as to obtain an assessment of the new model.
Generalization performance, i.e. generalization ability, refers to the adaptability of a machine learning algorithm to fresh samples. The goal of learning is to discover the rules that lie behind the data, so that for data outside the training set that obey the same rules, the trained network can also give suitable outputs; this capability is called generalization ability.
Optionally, in order to obtain a more accurate detection result, the generalization performance of the new model is detected with the data in the N groups of test data sets, N detection results are obtained, and the average of the N detection results is taken as the final detection result, as sketched below.
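The following sketch, by way of illustration only and under the same PyTorch assumptions, detects the accuracy of the new model on each of the N test data sets and averages the results; accuracy is one possible detection metric, not one prescribed by the application.

    # A minimal sketch of step 103, assuming PyTorch and one data loader per
    # test data set T_i. The accuracy on each of the N test data sets is
    # detected and the mean is taken as the final detection result.
    import torch

    @torch.no_grad()
    def detect_generalization(model, test_loaders):
        model.eval()
        results = []
        for loader in test_loaders:  # one loader per test data set
            correct = total = 0
            for inputs, labels in loader:
                preds = model(inputs).argmax(dim=1)
                correct += (preds == labels).sum().item()
                total += labels.numel()
            results.append(correct / total)
        return sum(results) / len(results)  # average of the N detection results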
Step 104: when the detection result reaches the pre-set level value, determine that the new model is a model that satisfies the application.
When the new model reaches the set pre-set level value, it can be brought online and put into production, yielding a transfer-learning model for use by enterprises and users.
Optionally, as shown in Fig. 2, before this step the method further comprises the following steps.
Step 105: detect whether the detection result reaches the pre-set level value.
Step 106: when the detection result does not reach the pre-set level value, determine that the new model does not satisfy the application and continue to adjust the model, i.e. execute step 1021, until the assessment result satisfies the application.
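The loop formed by steps 104 to 106 can be sketched as follows, by way of illustration only, continuing the sketches above; the pre-set level value of 0.9 and the training data loader are hypothetical placeholders.

    # A minimal sketch of steps 104-106, reusing retrain_step and
    # detect_generalization from the sketches above. PRESET_LEVEL and
    # train_loader are hypothetical placeholders.
    PRESET_LEVEL = 0.9

    result = detect_generalization(model, test_loaders)
    while result < PRESET_LEVEL:
        # Step 106: the new model does not yet satisfy the application,
        # so return to step 1021 and keep adjusting the rear-end levels.
        model.train()
        for inputs, labels in train_loader:
            retrain_step(inputs, labels)
        result = detect_generalization(model, test_loaders)
    # Step 104: the detection result reaches the pre-set level value, so the
    # new model satisfies the application.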
The transfer learning method based on a deep learning pre-training model provided by the embodiments of the present invention divides a data set into mutually exclusive training and test data sets with consistent distributions; retrains an acquired pre-training model according to the data in the training data set to obtain a new model; detects the generalization performance of the new model according to the data in the test data set to obtain a detection result; and, when the detection result reaches a pre-set level value, determines that the new model is a model that satisfies the application. This can solve the prior-art problems that, when facing a particular problem in a certain field, the large-scale data needed to build a model is often unavailable, and that when a new model must be learned on the basis of given, sufficient training data and that model is a large-scale neural network, training is time-consuming and resource-intensive. By retraining part of the layers of a pre-training model through transfer learning for a certain category of data, the relationships captured in the obtained new model can also be easily applied to different problems in the same field.
It should be understood that the sequence numbers of the steps in the above embodiments do not imply an order of execution; the execution order of the processes should be determined by their functions and internal logic and should not constitute any limitation on the implementation process of the embodiments of the present invention.
An embodiment of the present invention provides a transfer learning device based on a deep learning pre-training model. As shown in Fig. 3, the device comprises a division module 301, a training module 302, a test module 303, and a determining module 304.
The division module 301 is configured to divide a data set into a training data set and a test data set.
Optionally, as described above for step 101, the training data set and the test data set are two mutually exclusive data sets with consistent data distributions: the test data set is obtained from the data set by stratified sampling, the remaining data form the training data set, the training data set contains more data than the test data set and accounts for a proportion of the data set in the interval [2/3, 4/5], and the data set can be subjected to N random divisions (N greater than or equal to 1) to obtain N groups of training data sets and corresponding test data sets, whose N detection results can be averaged as the final assessment result.
The training module 302 is configured to retrain the acquired pre-training model according to the data in the training data set to obtain a new model.
Optionally, the training module 302 is specifically configured to: retrain, according to the data in the training data set, the training weights in the rear-end levels of the acquired pre-training model to obtain new weights; adjust, according to the data in the training data set, the parameters in the rear-end levels of the pre-training model to obtain new parameters; and obtain the new model from the training weights that remain unchanged in the front-end levels of the pre-training model, the new weights, and the new parameters.
As described above for step 102, a pre-training model with training weights is selected from the available models released by research institutions and does not need to be retrained in its entirety: the weights of its initial layers are kept unchanged while the subsequent layers are retrained to obtain new weights, and the pre-training model can be adjusted according to the N different groups of training data so as to find the best allocation between frozen layers and retrained layers.
The test module 303 is configured to detect the generalization performance of the new model according to the data in the test data set to obtain a detection result.
Optionally, in order to obtain a more accurate detection result, the test module 303 detects the generalization performance of the new model with the data in the N groups of test data sets, obtains N detection results, and takes the average of the N detection results as the final detection result.
The determining module 304 is configured to determine, when the detection result reaches the pre-set level value, that the new model is a model that satisfies the application.
When the detection result does not reach the pre-set level value, it is determined that the new model does not satisfy the application, and the training module 302 continues to adjust the model until the assessment result of the obtained new model satisfies the application.
In the transfer learning device based on a deep learning pre-training model provided by this embodiment, the training module adjusts a deep learning pre-training model according to the data of a new field through transfer learning to obtain a new model, and the test module then assesses the obtained new model, so that the new model determined by the determining module to satisfy the application can also be easily applied to different problems in the same field.
Fig. 4 is a schematic diagram of the terminal device provided by an embodiment of the present invention. As shown in Fig. 4, the terminal device 4 of this embodiment comprises a processor 401, a memory 402, and a computer program 403 stored in the memory 402 and runnable on the processor 401, such as a transfer learning program based on a deep learning pre-training model. When executing the computer program 403, the processor 401 implements the steps in the above embodiments of the transfer learning method based on a deep learning pre-training model, such as steps 101 to 104 shown in Fig. 1 or steps 101 to 106 shown in Fig. 2, and realizes the functions of the modules in the above device embodiments, such as the functions of modules 301 to 304 shown in Fig. 3.
Illustratively, the computer program 403 may be divided into one or more modules that are stored in the memory 402 and executed by the processor 401 to carry out the present invention. The one or more modules may be a series of computer program instruction segments capable of completing specific functions, the instruction segments describing the execution process of the computer program 403 in the transfer learning device based on a deep learning pre-training model or in the terminal device 4. For example, the computer program 403 may be divided into the division module 301, the training module 302, the test module 303, and the determining module 304; the specific functions of these modules are shown in Fig. 3 and are not repeated here.
The terminal device 4 may be a computing device such as a desktop computer, a notebook, a palmtop computer, or a cloud server. The terminal device may include, but is not limited to, the processor 401 and the memory 402. Those skilled in the art will understand that Fig. 4 is only an example of the terminal device 4 and does not constitute a limitation on it; the terminal device may include more or fewer components than illustrated, combine certain components, or use different components, and may, for example, also include input/output devices, network access devices, buses, and so on.
The processor 401 may be a central processing unit (CPU), another general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or other programmable logic device, a discrete gate or transistor logic device, discrete hardware components, etc. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The memory 402 may be an internal storage unit of the terminal device 4, such as a hard disk or memory of the terminal device 4. The memory 402 may also be an external storage device of the terminal device 4, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card equipped on the terminal device 4. Further, the memory 402 may include both an internal storage unit and an external storage device of the terminal device 4. The memory 402 is used to store the computer program and other programs and data required by the terminal device 4, and may also be used to temporarily store data that has been output or is to be output.
It will be clear to those skilled in the art that, for convenience and brevity of description, the division into the above functional units and modules is only used as an example; in practical applications, the above functions may be allocated to different functional units and modules as needed, i.e. the internal structure of the device may be divided into different functional units or modules to complete all or part of the functions described above. The specific names of the functional units and modules are only for the convenience of distinguishing them from each other and are not intended to limit the protection scope of this application. For the specific working processes of the units and modules in the above system, reference may be made to the corresponding processes in the foregoing method embodiments, which are not repeated here.
In the above embodiments, each embodiment is described with its own emphasis; for parts that are not detailed or recorded in one embodiment, reference may be made to the relevant descriptions of other embodiments.
Those of ordinary skill in the art will appreciate that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented by electronic hardware or by a combination of computer software and electronic hardware. Whether these functions are performed in hardware or software depends on the specific application and the design constraints of the technical solution. Skilled artisans may use different methods to implement the described functions for each specific application, but such implementations should not be considered beyond the scope of the present invention.
In the embodiments provided by the present invention, it should be understood that the disclosed device/terminal device and method may be implemented in other ways. For example, the device/terminal device embodiments described above are only schematic: the division into modules or units is only a logical functional division, and other division manners are possible in actual implementation; for example, multiple units or components may be combined or integrated into another system, or some features may be ignored or not executed. In addition, the mutual couplings, direct couplings, or communication connections shown or discussed may be indirect couplings or communication connections through interfaces, devices, or units, and may be electrical, mechanical, or in other forms.
The units described as separate components may or may not be physically separate, and the components displayed as units may or may not be physical units; they may be located in one place or distributed over multiple network units. Some or all of the units may be selected according to actual needs to achieve the purposes of the solutions of the embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing unit, each unit may exist physically alone, or two or more units may be integrated into one unit. The integrated unit may be implemented in the form of hardware or in the form of a software functional unit.
If the integrated module/unit is implemented in the form of a software functional unit and sold or used as an independent product, it may be stored in a computer-readable storage medium. Based on this understanding, the present invention may implement all or part of the processes in the methods of the above embodiments by instructing relevant hardware through a computer program. The computer program may be stored in a computer-readable storage medium, and when executed by a processor, the computer program can realize the steps of each of the above method embodiments. The computer program comprises computer program code, which may be in source code form, object code form, an executable file, certain intermediate forms, etc. The computer-readable medium may include any entity or device capable of carrying the computer program code, a recording medium, a USB flash disk, a removable hard disk, a magnetic disk, an optical disk, a computer memory, a read-only memory (ROM), a random access memory (RAM), an electrical carrier signal, a telecommunication signal, a software distribution medium, and so on. It should be noted that the content contained in the computer-readable medium may be added to or subtracted from as appropriate according to the requirements of legislation and patent practice in a given jurisdiction; for example, in certain jurisdictions, according to legislation and patent practice, the computer-readable medium does not include electrical carrier signals and telecommunication signals.
The above embodiments are only used to illustrate the technical solutions of the present invention rather than to limit them. Although the present invention has been described in detail with reference to the foregoing embodiments, those of ordinary skill in the art should understand that the technical solutions recorded in the foregoing embodiments can still be modified, or some of the technical features therein can be equivalently replaced; such modifications or replacements do not cause the essence of the corresponding technical solutions to depart from the spirit and scope of the technical solutions of the embodiments of the present invention, and should all be included within the protection scope of the present invention.

Claims (10)

1. A transfer learning method based on a deep learning pre-training model, characterized by comprising:
dividing a data set into a training data set and a test data set;
retraining an acquired pre-training model according to the data in the training data set to obtain a new model;
detecting the generalization performance of the new model according to the data in the test data set to obtain a detection result;
when the detection result reaches a pre-set level value, determining that the new model is a model that satisfies the application.
2. The transfer learning method based on a deep learning pre-training model according to claim 1, characterized in that the training data set and the test data set are two mutually exclusive data sets with consistent data distributions.
3. The transfer learning method based on a deep learning pre-training model according to claim 2, characterized in that the test data set is obtained by sampling from the data set by way of stratified sampling, and the data in the data set other than the test data set form the training data set.
4. The transfer learning method based on a deep learning pre-training model according to claim 3, characterized in that the training data set contains more data than the test data set.
5. The transfer learning method based on a deep learning pre-training model according to claim 4, characterized in that the proportion of the data of the data set accounted for by the data in the training data set lies in the interval [2/3, 4/5].
6. The transfer learning method based on a deep learning pre-training model according to any one of claims 1 to 5, characterized in that dividing the data set into a training data set and a test data set comprises:
performing N random divisions of the data set to obtain N groups of training data sets and corresponding test data sets, where N is greater than or equal to 1.
7. The transfer learning method based on a deep learning pre-training model according to claim 6, characterized in that retraining the acquired pre-training model according to the data in the training data set to obtain a new model comprises:
retraining, according to the data in the training data set, the training weights in the rear-end levels of the acquired pre-training model to obtain new weights;
adjusting, according to the data in the training data set, the parameters in the rear-end levels of the pre-training model to obtain new parameters;
obtaining the new model from the training weights that remain unchanged in the front-end levels of the pre-training model, the new weights, and the new parameters.
8. A transfer learning device based on a deep learning pre-training model, characterized by comprising:
a division module, configured to divide a data set into a training data set and a test data set;
a training module, configured to retrain an acquired pre-training model according to the data in the training data set to obtain a new model;
a test module, configured to detect the generalization performance of the new model according to the data in the test data set to obtain a detection result;
a determining module, configured to determine, when the detection result reaches a pre-set level value, that the new model is a model that satisfies the application.
9. A terminal device comprising a memory, a processor, and a computer program stored in the memory and runnable on the processor, characterized in that the processor, when executing the computer program, implements the steps of the method according to any one of claims 1 to 7.
10. A computer-readable storage medium storing a computer program, characterized in that, when the computer program is executed by a processor, the steps of the method according to any one of claims 1 to 7 are implemented.
CN201811473650.2A 2018-12-04 2018-12-04 Transfer learning method and terminal device based on deep learning pre-training model Pending CN109754068A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811473650.2A CN109754068A (en) 2018-12-04 2018-12-04 Transfer learning method and terminal device based on deep learning pre-training model


Publications (1)

Publication Number Publication Date
CN109754068A true CN109754068A (en) 2019-05-14

Family

ID=66403533

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811473650.2A Pending CN109754068A (en) 2018-12-04 2018-12-04 Transfer learning method and terminal device based on deep learning pre-training model

Country Status (1)

Country Link
CN (1) CN109754068A (en)

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108399420A (en) * 2018-01-30 2018-08-14 北京理工雷科电子信息技术有限公司 A kind of visible light naval vessel false-alarm elimination method based on depth convolutional network
CN108805137A (en) * 2018-04-17 2018-11-13 平安科技(深圳)有限公司 Extracting method, device, computer equipment and the storage medium of livestock feature vector
CN108875590A (en) * 2018-05-25 2018-11-23 平安科技(深圳)有限公司 BMI prediction technique, device, computer equipment and storage medium

Cited By (19)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110598737A (en) * 2019-08-06 2019-12-20 深圳大学 Online learning method, device, equipment and medium of deep learning model
CN112434717A (en) * 2019-08-26 2021-03-02 杭州海康威视数字技术股份有限公司 Model training method and device
CN112434717B (en) * 2019-08-26 2024-03-08 杭州海康威视数字技术股份有限公司 Model training method and device
CN110532314A (en) * 2019-08-30 2019-12-03 国家电网有限公司 The method and terminal device of High-Voltage Electrical Appliances quality testing
CN110688288A (en) * 2019-09-09 2020-01-14 平安普惠企业管理有限公司 Automatic testing method, device, equipment and storage medium based on artificial intelligence
CN110688288B (en) * 2019-09-09 2023-11-07 新疆北斗同创信息科技有限公司 Automatic test method, device, equipment and storage medium based on artificial intelligence
CN110929877A (en) * 2019-10-18 2020-03-27 平安科技(深圳)有限公司 Model establishing method, device, equipment and storage medium based on transfer learning
CN110929877B (en) * 2019-10-18 2023-09-15 平安科技(深圳)有限公司 Model building method, device, equipment and storage medium based on transfer learning
CN111191558A (en) * 2019-12-25 2020-05-22 深圳市优必选科技股份有限公司 Robot and face recognition teaching method and storage medium thereof
CN111191558B (en) * 2019-12-25 2024-02-02 深圳市优必选科技股份有限公司 Robot and face recognition teaching method and storage medium thereof
CN113127614A (en) * 2020-01-16 2021-07-16 微软技术许可有限责任公司 Providing QA training data and training QA model based on implicit relevance feedback
CN111898650A (en) * 2020-07-08 2020-11-06 国网浙江省电力有限公司杭州供电公司 Marketing and distribution data automatic clustering analysis equipment and method based on deep learning
CN112712213A (en) * 2021-01-15 2021-04-27 上海交通大学 Method and system for predicting energy consumption of deep migration learning of centralized air-conditioning house
CN112712213B (en) * 2021-01-15 2023-07-04 上海交通大学 Method and system for predicting deep migration learning energy consumption of concentrated air conditioning house
CN113094994A (en) * 2021-04-12 2021-07-09 上海电享信息科技有限公司 Power battery prediction method based on big data migration learning
CN114121161B (en) * 2021-06-04 2022-08-05 深圳太力生物技术有限责任公司 Culture medium formula development method and system based on transfer learning
CN114121161A (en) * 2021-06-04 2022-03-01 东莞太力生物工程有限公司 Culture medium formula development method and system based on transfer learning
WO2024016637A1 (en) * 2022-07-22 2024-01-25 中控技术股份有限公司 Method for constructing parameter setting model and industrial process control method
CN115114863A (en) * 2022-08-23 2022-09-27 苏州清研精准汽车科技有限公司 Battery pack Y capacitance prediction method and device, computer equipment and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination