CN108460455A - Model treatment method and device - Google Patents
Model treatment method and device
- Publication number
- CN108460455A CN108460455A CN201810103695.4A CN201810103695A CN108460455A CN 108460455 A CN108460455 A CN 108460455A CN 201810103695 A CN201810103695 A CN 201810103695A CN 108460455 A CN108460455 A CN 108460455A
- Authority
- CN
- China
- Prior art keywords
- model
- training
- target
- network model
- data
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
- G06F16/355—Class or cluster creation or modification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/08—Learning methods
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Data Mining & Analysis (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Biomedical Technology (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Evolutionary Computation (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Databases & Information Systems (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
An embodiment of the present invention provides a model treatment method and device. The model treatment method includes: training an initial network model using multiple groups of data to obtain a pre-training model, where the initial network model includes a target structure formed of multiple layers and a collocation structure formed of multiple layers, and the pre-training model is the network model corresponding to the target structure after the initial network model has been trained; inputting target data into the pre-training model and performing calculation to obtain intermediate output data; and training a target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model for identifying a target feature in data to be identified.
Description
Technical field
The present invention relates to the field of data processing, and in particular to a model treatment method and device.
Background technology
With the development of computer technology, machine learning has been widely applied. Existing approaches mainly train a network model by repeatedly feeding it a large amount of data, so that the network model learns to identify specific features. However, because the volume of data of each type is large, the identification model must be fully retrained every time a new feature needs to be learned, which makes the amount of data processing large and the training slow.
Summary of the invention
In view of this, an object of the embodiments of the present invention is to provide a model treatment method and device.
An embodiment of the present invention provides a model treatment method applied to an electronic terminal, where the electronic terminal stores an initial network model and a target network model. The model treatment method includes:
training the initial network model using multiple groups of data to obtain a pre-training model, where the initial network model includes a target structure formed of multiple layers and a collocation structure formed of multiple layers, and the pre-training model is the network model corresponding to the target structure after the initial network model has been trained;
inputting target data into the pre-training model and performing calculation to obtain intermediate output data; and
training the target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model for identifying a target feature in data to be identified.
An embodiment of the present invention further provides a model treatment method applied to an electronic terminal, where the electronic terminal pre-stores a target network model and a pre-training model trained using multiple groups of data. The model treatment method includes:
inputting target data into the pre-training model to obtain intermediate output data; and
training the target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model that identifies a target feature in data to be identified.
An embodiment of the present invention further provides a model treatment device applied to an electronic terminal, where the electronic terminal stores an initial network model and a target network model. The model treatment device includes:
a pre-training module, configured to train the initial network model using multiple groups of data to obtain a pre-training model, where the initial network model includes a target structure formed of multiple layers and a collocation structure formed of multiple layers, and the pre-training model is the network model corresponding to the target structure after the initial network model has been trained;
a calculation module, configured to input target data into the pre-training model and perform calculation to obtain intermediate output data; and
a target training module, configured to train the target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model for identifying a target feature in data to be identified.
An embodiment of the present invention further provides a model treatment device applied to an electronic terminal, where the electronic terminal pre-stores a target network model and a pre-training model trained using multiple groups of data. The model treatment device includes:
a calculation module, configured to input target data into the pre-training model to obtain intermediate output data; and
a target training module, configured to train the target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model that identifies a target feature in data to be identified.
Compared with the prior art, the model treatment method and device of the embodiments of the present invention first train a pre-training model in advance using multiple groups of data. When an identification model is needed, only the target network model has to be trained to obtain an overlap model, and splicing the pre-training model and the overlap model forms the identification model. Since the original model corresponding to the identification model does not need to be trained from scratch every time, the training workload required to obtain an identification model is reduced and the efficiency of model training is improved.
To make the above objects, features and advantages of the present invention clearer and easier to understand, preferred embodiments are described in detail below with reference to the accompanying drawings.
Description of the drawings
To illustrate the technical solutions of the embodiments of the present invention more clearly, the drawings needed in the embodiments are briefly described below. It should be understood that the following drawings illustrate only certain embodiments of the present invention and therefore should not be construed as limiting its scope. For those of ordinary skill in the art, other related drawings can be obtained from these drawings without creative effort.
Fig. 1 is a block diagram of an electronic terminal provided by a preferred embodiment of the present invention.
Fig. 2 is a flowchart of a model treatment method provided by a preferred embodiment of the present invention.
Fig. 3 is a detailed flowchart of step S101 of the model treatment method provided by a preferred embodiment of the present invention.
Fig. 4 is a schematic diagram of the training process flow of the pre-training model in the model treatment method provided by a preferred embodiment of the present invention.
Fig. 5 is a flowchart of a model treatment method provided by another preferred embodiment of the present invention.
Fig. 6 is a schematic diagram of the intermediate data output process flow in the model treatment method provided by a preferred embodiment of the present invention.
Fig. 7 is a schematic diagram of the training process flow of the overlap model in the model treatment method provided by a preferred embodiment of the present invention.
Fig. 8 is a functional block diagram of the model treatment device provided by a preferred embodiment of the present invention.
Detailed description of the embodiments
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. Obviously, the described embodiments are only some rather than all of the embodiments of the present invention. The components of the embodiments of the present invention, as generally described and illustrated in the drawings herein, can be arranged and designed in a variety of different configurations. Therefore, the following detailed description of the embodiments of the present invention provided in the drawings is not intended to limit the scope of the claimed invention, but merely represents selected embodiments of the present invention. All other embodiments obtained by those skilled in the art based on the embodiments of the present invention without creative effort shall fall within the protection scope of the present invention.
It should be noted that similar reference numerals and letters indicate similar items in the following drawings; therefore, once an item is defined in one drawing, it does not need to be further defined and explained in subsequent drawings. In addition, in the description of the present invention, the terms "first", "second" and the like are only used to distinguish descriptions and are not to be understood as indicating or implying relative importance.
As shown in Fig. 1, a block diagram of an electronic terminal 100 is illustrated. The electronic terminal 100 includes a model treatment device 110, a memory 111, a storage controller 112, a processor 113, a peripheral interface 114, an input-output unit 115 and a display unit 116. Those skilled in the art will appreciate that the structure shown in Fig. 1 is only illustrative and does not limit the structure of the electronic terminal 100. For example, the electronic terminal 100 may include more or fewer components than shown in Fig. 1, or may have a configuration different from that shown in Fig. 1. The electronic terminal 100 described in this embodiment may be any computing device with data processing capability, such as a personal computer, a processing server or a mobile electronic device.
The memory 111, the storage controller 112, the processor 113, the peripheral interface 114, the input-output unit 115 and the display unit 116 are directly or indirectly electrically connected to each other to realize the transmission or interaction of data. For example, these elements may be electrically connected to each other through one or more communication buses or signal lines. The model treatment device 110 includes at least one software function module that may be stored in the memory 111 in the form of software or firmware, or solidified in the operating system (OS) of the electronic terminal 100. The processor 113 is configured to execute the executable modules stored in the memory, such as the software function modules or computer programs included in the model treatment device 110.
The memory 111 may be, but is not limited to, a random access memory (RAM), a read-only memory (ROM), a programmable read-only memory (PROM), an erasable programmable read-only memory (EPROM), an electrically erasable programmable read-only memory (EEPROM), or the like. The memory 111 is configured to store programs, and the processor 113 executes a program after receiving an execution instruction. The method performed by the electronic terminal 100, as defined by the processes disclosed in any embodiment of the present invention, may be applied to the processor 113 or implemented by the processor 113.
The processor 113 may be an integrated circuit chip with signal processing capability. The processor 113 may be a general-purpose processor, including a central processing unit (CPU), a network processor (NP), and the like; it may also be a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or execute the methods, steps and logic diagrams disclosed in the embodiments of the present invention. The general-purpose processor may be a microprocessor, or the processor may be any conventional processor.
The peripheral interface 114 couples various input/output devices to the processor 113 and the memory 111. In some embodiments, the peripheral interface 114, the processor 113 and the storage controller 112 may be implemented in a single chip. In other instances, they may each be implemented by an independent chip.
The input-output unit 115 is configured to provide input data to the user. The input-output unit 115 may be, but is not limited to, a mouse, a keyboard, and the like.
The display unit 116 provides an interactive interface (such as a user operation interface) between the electronic terminal 100 and the user, or displays image data for the user's reference. In this embodiment, the display unit may be a liquid crystal display or a touch display. If it is a touch display, it may be a capacitive or resistive touch screen supporting single-point and multi-point touch operations; that is, the touch display can sense touch operations generated simultaneously at one or more positions on the display and hand the sensed touch operations to the processor for calculation and processing.
Referring to Fig. 2, a flowchart is shown of the model treatment method provided by a preferred embodiment of the present invention and applied to the electronic terminal shown in Fig. 1. The electronic terminal stores an initial network model and a target network model. The specific flow shown in Fig. 2 is described in detail below.
Step S101: train the initial network model using multiple groups of data to obtain a pre-training model.
In this embodiment, the initial network model includes a target structure formed of multiple layers and a collocation structure formed of multiple layers, and the pre-training model is the network model corresponding to the target structure after the initial network model has been trained.
In this embodiment, each layer of the initial network model includes parameters to be determined, which are determined by training with the multiple groups of data.
In this embodiment, the multiple groups of data may be data from multiple fields stored in advance in the electronic terminal, for example, weather data, everyday questions and historical knowledge data.
Step S102: input target data into the pre-training model and perform calculation to obtain intermediate output data.
In this embodiment, step S101 may be executed once in advance and the resulting pre-training model saved. When the pre-training model is needed, it is reloaded and the target data is input into the pre-training model for calculation. In this embodiment, the pre-training model can be used multiple times, and step S101 does not need to be executed before every execution of step S102. That is, after step S101 has been executed once, the resulting pre-training model is saved for repeated use.
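The save-once, reload-many workflow described above can be sketched as follows. The file path, the `pickle` serialization, and the representation of the model as a plain parameter dictionary are illustrative assumptions, not part of the patent:

```python
import os
import pickle
import tempfile

def save_pretrained(model, path):
    # Persist the trained pre-training model once (after step S101).
    with open(path, "wb") as f:
        pickle.dump(model, f)

def load_pretrained(path):
    # Reload the saved pre-training model whenever step S102 runs.
    with open(path, "rb") as f:
        return pickle.load(f)

# Hypothetical "model": just the determined parameters of the target structure.
pretrained = {"embedding": [0.1, 0.2], "lstm": [0.3], "cnn": [0.4]}
path = os.path.join(tempfile.gettempdir(), "pretrain_model.pkl")
save_pretrained(pretrained, path)

# Step S102 can now be executed many times without re-running step S101.
for _ in range(3):
    model = load_pretrained(path)
    assert model == pretrained
```

In practice a framework-specific checkpoint format would replace `pickle`, but the control flow — train once, persist, reload per use — is the same.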
In this embodiment, the target data may be sample data sampled from a certain field. For example, the target data may be everyday-dialogue sample data, briefing-class data, or the like.
Step S103: train the target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model for identifying a target feature in data to be identified.
In the model treatment method of this embodiment of the present invention, a pre-training model is first trained in advance using multiple groups of data. When an identification model is needed, only the target network model needs to be trained to obtain an overlap model, and splicing the pre-training model and the overlap model forms the identification model. Since the original model corresponding to the identification model does not need to be trained every time, the training workload required to obtain an identification model is reduced and the efficiency of model training is improved.
In this embodiment, each group of data in the multiple groups of data includes multiple sentences. As shown in Fig. 3, step S101 includes step S1011 and step S1012.
Step S1011: perform numerical conversion on each sentence in the multiple groups of data to obtain a vector of a specified length.
In this embodiment, the electronic terminal first identifies the characters in each sentence and converts them into numbers. In one embodiment, a sentence can be converted into numbers in the following manner.
First, the character string is cleaned to remove forbidden characters. The forbidden characters include special characters and URLs. Second, digits are normalized: every digit is converted to a designated character, for example "@". Third, the sentence is segmented into characters and words. Finally, the character and word features are converted into numbers according to the indexes of a dictionary, and a single sample shorter than 35 is padded to 35. The dictionary assigns a unique number to each character or word feature, and the same global dictionary is used throughout. For example, the open-source segmentation library jieba can be used.
A specific example sentence is described below.
Example sentence: "hello今天天气不错,温度20度&_&https://www.***.com". Processing the forbidden characters and digits in the example sentence gives: "hello今天天气不错温度@度".
Segmenting the sentence by characters gives: "hello今天天气不错温度@度" => ['h','e','l','l','o','今','天','天','气','不','错','温','度','度'].
Segmenting the sentence by words gives: "hello今天天气不错温度@度" => ['hello','今天','天天','天气','今天天气','不错','温度','@','度'].
The characters and words after segmentation are looked up in the dictionary to obtain their corresponding numbers. This step produces data of shape (n_sample, 35), which is the training data of the model, where n_sample is the total number of samples after oversampling. In one example, n_sample may be obtained by oversampling to 1000; that is, if the number of samples corresponding to a category is less than 1000, that category is oversampled to 1000. The method is: multiplier = (1000 - current number of samples) / current number of samples; the available samples are then duplicated by this multiplier to make up 1000.
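The per-class oversampling rule above — pad every class to 1000 samples by duplicating the samples it already has — can be sketched as follows. The function name and the use of cyclic duplication (rather than an explicit multiplier) are assumptions for illustration:

```python
def oversample_to(samples, target=1000):
    """Duplicate existing samples until the class holds `target` samples.

    Classes that already meet the target are returned unchanged.
    """
    if len(samples) >= target:
        return list(samples)
    out = list(samples)
    i = 0
    # Cycle through the available samples until the shortfall is made up.
    while len(out) < target:
        out.append(samples[i % len(samples)])
        i += 1
    return out

small_class = ["sent_%d" % k for k in range(300)]  # a class with 300 samples
balanced = oversample_to(small_class)
print(len(balanced))  # 1000
```

Cyclic duplication reaches exactly 1000 samples; the multiplier formula in the text describes the same shortfall being filled from the available samples.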
Looking up the above characters and words in the dictionary yields the vectors:
['h','e','l','l','o','今','天','天','气','不','错','温','度','度'] => [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1695 2793 473 473 440 477 32 32 398 3 181 1538 459 459]
['hello','今天','天天','天气','今天天气','不错','温度','@','度'] => [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 2508 10926 10622 22894 79562 12070 18442 459].
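A minimal sketch of the numerical-conversion pipeline of step S1011 follows: cleaning, digit normalization, character segmentation, dictionary lookup, and left-padding to length 35. The regular expressions, the toy dictionary, and the left-padding convention are assumptions inferred from the example vectors above (real indexes would come from the global dictionary):

```python
import re

MAX_LEN = 35

def clean(text):
    # Remove forbidden characters: URLs and special characters.
    text = re.sub(r"https?://\S+", "", text)
    text = re.sub(r"[&_]+", "", text)
    # Normalize every digit run to the designated character '@'.
    return re.sub(r"\d+", "@", text)

def char_segment(text):
    # Character-level segmentation, dropping whitespace and punctuation.
    return [c for c in text if not c.isspace() and c not in ",.!?，。"]

def to_vector(tokens, dictionary):
    # Unknown tokens map to 0, which also serves as the padding value.
    ids = [dictionary.get(t, 0) for t in tokens]
    # Left-pad with zeros so every sample has length 35, matching the example.
    return [0] * (MAX_LEN - len(ids)) + ids[:MAX_LEN]

# Toy dictionary (a real one would be global and cover the full vocabulary).
dictionary = {"h": 1695, "e": 2793, "l": 473, "o": 440}

sentence = "hello,温度20度 https://www.example.com"
tokens = char_segment(clean(sentence))
vector = to_vector(tokens, dictionary)
print(len(vector))  # 35
```

Word-level segmentation would plug `jieba` in where `char_segment` is, with the same lookup and padding steps.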
Further, the multiple groups of data can also be classified, and label data is added to each class. The label data corresponding to each sample is the digital number corresponding to the class name, which is converted into one_hot form during training. For example, if the number of categories class_num is 10, the label data obtained after processing has shape (n_sample, 10); these are the training labels.
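The conversion of class numbers to one_hot form can be sketched as follows, assuming class_num = 10 as in the text; the function name is illustrative:

```python
def one_hot(label, class_num=10):
    # Map a class number to a one_hot row; stacking all rows over n_sample
    # samples gives label data of shape (n_sample, class_num).
    row = [0] * class_num
    row[label] = 1
    return row

labels = [0, 3, 9]              # class numbers for three samples
train_labels = [one_hot(y) for y in labels]
print(train_labels[1])  # [0, 0, 0, 1, 0, 0, 0, 0, 0, 0]
```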
Step S1012: input the matrix formed by the vectors of each sentence in the multiple groups of data into the initial network model for training to obtain the pre-training model.
In this embodiment, step S1012 includes: a. inputting the matrix formed by the vectors of each sentence in the multiple groups of data into the initial network model for iterative calculation; b. calculating the value of the loss function of the initial network model and adjusting the parameters to be determined in each layer of the initial network model so that the average value of the loss function corresponding to the adjusted parameters decreases. Steps a and b are repeated until the difference between the average values calculated in a preset number of consecutive iterations is less than a preset value, where the target structure with the parameters determined by the last adjustment is the pre-training model.
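Steps a and b and the stopping rule above can be sketched with a toy one-parameter model. The quadratic loss, the learning rate, and the preset values are illustrative assumptions standing in for the real network's loss and optimizer:

```python
def train(w=5.0, lr=0.1, preset_times=3, preset_value=1e-6):
    """Adjust a parameter until the loss stops changing by more than
    preset_value across preset_times consecutive iterations."""
    history = []
    while True:
        # Step a: forward pass (here a toy quadratic loss (w - 2)^2).
        loss = (w - 2.0) ** 2
        history.append(loss)
        # Step b: adjust the parameter so the loss decreases (gradient step).
        w -= lr * 2.0 * (w - 2.0)
        # Stop once the last preset_times consecutive differences are small.
        if len(history) > preset_times:
            recent = history[-preset_times - 1:]
            diffs = [abs(recent[i] - recent[i + 1]) for i in range(preset_times)]
            if max(diffs) < preset_value:
                return w, loss

w, final_loss = train()
print(abs(w - 2.0) < 0.01)  # True
```

The same comparison of consecutive loss averages against a preset value is what lets the real training loop decide that the loss "has stopped declining".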
In this embodiment, by first training the initial network model using multiple groups of data, a pre-training model applicable to multiple fields can be obtained. When an identification model for identifying a target feature is needed, only part of the structure has to be trained, which reduces the amount of calculation during training and improves the efficiency of obtaining the identification model.
As shown in Fig. 4, the flow of training a specific model to obtain the pre-training model is described below. The initial network model includes a target structure formed of multiple layers and a collocation structure formed of multiple layers. In the example shown in Fig. 4, the target structure includes an embedding layer, an LSTM layer and a CNN layer; the collocation structure includes an FCNN layer and a softmax layer. First, the processed multiple groups of data are input into the initial network model for calculation. In this embodiment, when the multiple groups of data are input into the initial network model for the first time, the values of the parameters to be determined are initial default values. The matrix formed by the vectors of each sentence in the multiple groups of data is input into the initial network model for iterative calculation. At the output of the softmax layer, the value of the loss function of each layer is calculated, and the parameters to be determined of each layer are adjusted so that the average value of the loss functions of all layers decreases; this completes one iteration. Then the above process is repeated: the multiple groups of data are input into the initial network model again, the value of the loss function is calculated, the parameters to be determined are adjusted, and whether the loss has stopped declining is judged. Training stops once adjusting the parameters to be determined can no longer reduce the average value of the loss function, and the pre-training model is obtained. In this example, the pre-training model is formed by the embedding layer, the LSTM layer and the CNN layer after the parameters to be determined have been determined.
Referring to Fig. 5, a flowchart is shown of the model treatment method provided by another preferred embodiment of the present invention and applied to the electronic terminal shown in Fig. 1. The electronic terminal pre-stores a target network model and a pre-training model trained using multiple groups of data. The specific flow shown in Fig. 5 is described in detail below.
Step S201: input target data into the pre-training model to obtain intermediate output data.
Step S202: train the target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model that identifies a target feature in data to be identified.
Step S201 in this embodiment is similar to step S102 in the previous method embodiment, and step S202 in this embodiment is similar to step S103 in the previous method embodiment. For related descriptions in this embodiment, reference may be made to the descriptions in the previous embodiment; details are not repeated here.
In the model treatment method of this embodiment of the present invention, a pre-training model is first trained in advance using multiple groups of data. When an identification model is needed, only the target network model needs to be trained to obtain an overlap model, and splicing the pre-training model and the overlap model forms the identification model. Since the original model corresponding to the identification model does not need to be trained every time, the training workload required to obtain an identification model is reduced and the efficiency of model training is improved.
In this embodiment, the step of training the target network model using the intermediate output data to obtain the overlap model includes: c. inputting the intermediate output data into the target network model for iterative calculation; d. calculating the value of the loss function of the target network model and adjusting the parameters to be determined in each layer of the target network model so that the average value of the loss function of each layer corresponding to the adjusted parameters decreases. Steps c and d are repeated until the difference between the average values calculated in a preset number of consecutive iterations is less than a preset value, where the target network model with the parameters determined by the last adjustment is the overlap model.
As shown in Figs. 6 and 7, the flow of training a specific model to obtain the overlap model is described below. As shown in Fig. 6, the target data is input into the pre-training model and calculation is performed to obtain intermediate output data. The pre-training model in the example shown in Fig. 6 includes an embedding layer, an LSTM layer and a CNN layer. As shown in Fig. 7, the intermediate output data is input into the target network model for training. The target network model in the example shown in Fig. 7 includes three FCNN layers. In this example, when the intermediate output data is input into the target network model for the first time, the values of the parameters to be determined of the target network model are initial default values, which those skilled in the art can set as required. Then the intermediate output data is input into the target network model for iterative calculation. At the output of the FCNN layers, the value of the loss function of the target network model is calculated, and the parameters to be determined of each layer are adjusted so that the average value of the loss functions of all layers decreases; this completes one iteration. The process is then repeated: the value of the loss function is calculated again, the parameters to be determined are adjusted, and whether the loss has stopped declining is judged. Training stops once adjusting the parameters to be determined can no longer reduce the average value of the loss function, and the overlap model is obtained. In this example, the overlap model is formed by the three FCNN layers after the parameters to be determined have been determined.
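The overlap training of Figs. 6 and 7 can be sketched end to end: a frozen pre-trained feature extractor produces the intermediate output data, only the head's parameters are trained, and splicing the two gives the identification model. The linear extractor, the logistic head, and the toy data below are illustrative assumptions; the patent's actual layers (embedding, LSTM, CNN, FCNN) are not reproduced:

```python
import math
import random

def pretrained(x):
    # Frozen pre-training model: its parameters are NOT adjusted here.
    return [x[0] + x[1], x[0] - x[1]]

def head(z, w, b):
    # Trainable overlap model: a single logistic unit standing in for the FCNN layers.
    s = sum(wi * zi for wi, zi in zip(w, z)) + b
    s = max(-60.0, min(60.0, s))  # clamp to avoid overflow in exp
    return 1.0 / (1.0 + math.exp(-s))

random.seed(0)
data = [[random.uniform(-1, 1), random.uniform(-1, 1)] for _ in range(200)]
labels = [1 if x[0] + x[1] > 0 else 0 for x in data]

# Step S201: compute the intermediate output data once with the frozen model.
inter = [pretrained(x) for x in data]

# Step S202: train only the overlap model's parameters on the intermediate data.
w, b, lr = [0.0, 0.0], 0.0, 0.5
for _ in range(200):
    for z, y in zip(inter, labels):
        p = head(z, w, b)
        g = p - y  # gradient of the logistic loss w.r.t. the pre-activation
        w = [wi - lr * g * zi for wi, zi in zip(w, z)]
        b -= lr * g

# Splicing: identification model = overlap model applied after the pre-training model.
identify = lambda x: head(pretrained(x), w, b) > 0.5
accuracy = sum(identify(x) == bool(y) for x, y in zip(data, labels)) / len(data)
print("training accuracy:", accuracy)
```

Because only `w` and `b` are updated, the expensive pre-trained structure is reused unchanged, which is the source of the training-workload saving the patent claims.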
Referring to Fig. 8, a functional block diagram is shown of the model treatment device 110 of Fig. 1 provided by a preferred embodiment of the present invention. The model treatment device 110 includes a pre-training module 1101, a calculation module 1102 and a target training module 1103.
The pre-training module 1101 is configured to train the initial network model using multiple groups of data to obtain a pre-training model, where the initial network model includes a target structure formed of multiple layers and a collocation structure formed of multiple layers, and the pre-training model is the network model corresponding to the target structure after the initial network model has been trained.
The calculation module 1102 is configured to input target data into the pre-training model and perform calculation to obtain intermediate output data.
The target training module 1103 is configured to train the target network model using the intermediate output data to obtain an overlap model, where splicing the pre-training model and the overlap model forms an identification model for identifying a target feature in data to be identified.
In the present embodiment, every group of data include multiple sentences in the multi-group data, and the pre-training module 1101 includes:
Date Conversion Unit and data training unit.
The Date Conversion Unit is specified for each sentence in the multi-group data to be carried out numerical value conversion
The vector of length.
The data training unit is configured to input the matrix formed by the vectors of the sentences in the multiple groups of data into the initial network model for training, obtaining the pre-training model.
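One common way to realize the numerical conversion is to map each sentence to token ids and then pad or truncate to the specified length, so every sentence yields a vector of the same size. The vocabulary, pad id, and length below are toy assumptions, not values specified by the patent.

```python
def sentence_to_vector(sentence, vocab, length=6, pad_id=0):
    """Convert one sentence into a fixed-length id vector."""
    ids = [vocab.get(tok, pad_id) for tok in sentence.split()]
    ids = ids[:length]                     # truncate long sentences
    ids += [pad_id] * (length - len(ids))  # pad short sentences
    return ids

vocab = {"the": 1, "model": 2, "is": 3, "trained": 4}
matrix = [sentence_to_vector(s, vocab)
          for s in ["the model is trained", "the model"]]
print(matrix)  # [[1, 2, 3, 4, 0, 0], [1, 2, 0, 0, 0, 0]]
```

The resulting rows stack into the matrix that is fed to the initial network model.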
In this embodiment, the data training unit trains and obtains the pre-training model through the following subunits:
An initial iteration computation subunit, configured to input the matrix formed by the vectors of the sentences in the multiple groups of data into the initial network model and compute the value of the loss function in each layer of the initial network model.
An initial parameter adjusting subunit, configured to adjust the parameters to be determined in each layer of the initial network model so that the average value of the loss function over the layers decreases after the adjustment.
The initial iteration computation subunit and the initial parameter adjusting subunit are executed repeatedly until the differences between the average values obtained in a preset number of consecutive iterations are less than a preset value, wherein the target structure with the parameters fixed by the last adjustment is the pre-training model.
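The stopping rule above can be sketched as follows: training halts once the differences between the average loss values of a preset number of consecutive iterations all fall below a preset value. The decaying loss sequence here is synthetic; in practice each value would come from training the network for one iteration.

```python
def train_until_converged(losses, preset_times=3, preset_value=1e-3):
    """Return the iteration index at which the average loss has converged."""
    history = []
    for step, avg_loss in enumerate(losses):
        history.append(avg_loss)
        if len(history) > preset_times:
            # Differences between the last preset_times consecutive averages.
            recent = history[-(preset_times + 1):]
            diffs = [abs(a - b) for a, b in zip(recent, recent[1:])]
            if all(d < preset_value for d in diffs):
                return step  # converged at this iteration
    return len(history) - 1

# Synthetic average-loss curve that flattens out over time.
losses = [1.0 / (1 + i) for i in range(1000)]
step = train_until_converged(losses)
print(step)
```

The same rule governs both the pre-training stage here and the target-training stage described later.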
For other details of this embodiment, reference may be made to the description in the method embodiments above, which will not be repeated here.
The model treatment device of this embodiment of the present invention first trains a pre-training model using multiple groups of data. When an identification model is needed, only the target network model needs to be trained to obtain an overlap model, and splicing the pre-training model with the overlap model forms the identification model. Since a complete original model need not be trained for each identification model, the training burden of obtaining an identification model is reduced and the efficiency of model training is improved.
A functional block diagram of the model treatment device shown in Fig. 1 is also provided by a preferred embodiment of the present invention. The model treatment device of this embodiment is similar to that of the previous embodiment, except that here the pre-training model is pre-stored in the electronic terminal 100 serving as the execution body. The model treatment device includes a computing module 1102 and a target training module 1103.
The computing module 1102 is configured to input target data into the pre-training model to obtain intermediate output data.
The target training module 1103 is configured to train the target network model using the intermediate output data to obtain an overlap model. Splicing the pre-training model with the overlap model forms an identification model, by which a target feature in data to be identified can be identified.
In this embodiment, the target training module trains and obtains the overlap model through the following units:
A target iteration computing unit, configured to input the intermediate output data into the target network model and compute the value of the loss function in each layer of the target network model.
A target parameter adjusting unit, configured to adjust the parameters to be determined in each layer of the target network model so that the average value of the loss function over the layers decreases after the adjustment.
The target iteration computing unit and the target parameter adjusting unit are executed repeatedly until the differences between the average values obtained in a preset number of consecutive iterations are less than a preset value, wherein the target network model with the parameters fixed by the last adjustment is the overlap model.
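The target-training step above can be sketched as gradient descent on a toy problem: the intermediate output data serves as input, and the parameters to be determined (here one weight matrix of a linear layer) are adjusted so the average loss value decreases. The shapes, squared-error loss, and learning rate are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(1)
X = rng.standard_normal((32, 8))   # intermediate output data
true_w = rng.standard_normal((8, 1))
y = X @ true_w                     # synthetic training targets

w = np.zeros((8, 1))               # parameter to be determined
lr = 0.05
losses = []
for _ in range(100):
    err = X @ w - y
    losses.append(float(np.mean(err ** 2)))   # average loss value
    w -= lr * (2.0 / len(X)) * (X.T @ err)    # adjust to reduce the loss

print(losses[-1] < losses[0])  # True
```

Each pass through the loop corresponds to one execution of the target iteration computing unit followed by the target parameter adjusting unit.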
For other details of this embodiment, reference may be made to the description in the method embodiments above, which will not be repeated here.
The model treatment device of this embodiment of the present invention first trains a pre-training model using multiple groups of data. When an identification model is needed, only the target network model needs to be trained to obtain an overlap model, and splicing the pre-training model with the overlap model forms the identification model. Since a complete original model need not be trained for each identification model, the training burden of obtaining an identification model is reduced and the efficiency of model training is improved.
In the several embodiments provided in this application, it should be understood that the disclosed device and method may also be implemented in other ways. The device embodiments described above are merely illustrative. For example, the flowcharts and block diagrams in the accompanying drawings show the possible architectures, functions, and operations of devices, methods, and computer program products according to multiple embodiments of the present invention. In this regard, each box in a flowchart or block diagram may represent a module, program segment, or part of code, which includes one or more executable instructions for implementing the specified logical function. It should also be noted that, in some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two consecutive boxes may in fact be executed substantially in parallel, or sometimes in the opposite order, depending on the functions involved. It should also be noted that each box in the block diagrams and/or flowcharts, and combinations of boxes in the block diagrams and/or flowcharts, can be implemented by a dedicated hardware-based system that performs the specified functions or actions, or by a combination of dedicated hardware and computer instructions.
In addition, the functional modules in the embodiments of the present invention may be integrated to form one independent part, each module may exist separately, or two or more modules may be integrated to form one independent part.
If the functions are implemented in the form of software functional modules and sold or used as an independent product, they may be stored in a computer-readable storage medium. Based on this understanding, the technical solution of the present invention, in essence, or the part contributing to the prior art, or part of the technical solution, may be embodied in the form of a software product. The computer software product is stored in a storage medium and includes several instructions for causing a computer device (which may be a personal computer, a server, a network device, or the like) to perform all or part of the steps of the methods described in the embodiments of the present invention. The aforementioned storage medium includes various media that can store program code, such as a USB flash drive, a removable hard disk, a read-only memory (ROM), a random access memory (RAM), a magnetic disk, or an optical disc.
It should be noted that, in this document, relational terms such as first and second are used merely to distinguish one entity or operation from another, and do not necessarily require or imply any such actual relationship or order between these entities or operations. Moreover, the terms "include", "comprise", or any other variant thereof are intended to cover non-exclusive inclusion, so that a process, method, article, or device including a series of elements includes not only those elements but also other elements not explicitly listed, or elements inherent to such a process, method, article, or device. Without further limitation, an element defined by the phrase "including a ..." does not exclude the existence of other identical elements in the process, method, article, or device that includes the element.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the present invention; various modifications and variations are possible for those skilled in the art. Any modification, equivalent replacement, improvement, etc. made within the spirit and principles of the present invention shall be included in the protection scope of the present invention. It should be noted that similar reference numerals and letters denote similar items in the following drawings; therefore, once an item is defined in one drawing, it need not be further defined and explained in subsequent drawings.
The above is only a specific embodiment of the present invention, but the protection scope of the present invention is not limited thereto. Any person familiar with the art can easily conceive of changes or substitutions within the technical scope disclosed by the present invention, and these shall all be covered within the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.
Claims (10)
1. A model treatment method, characterized in that it is applied to an electronic terminal storing an initial network model and a target network model, the model treatment method comprising:
training the initial network model using multiple groups of data to obtain a pre-training model, the initial network model including a target structure formed by a multilayer structure and a collocation structure formed by a multilayer structure, the pre-training model being the network model corresponding to the target structure after the initial network model has been trained;
inputting target data into the pre-training model to compute intermediate output data; and
training the target network model using the intermediate output data to obtain an overlap model, wherein the pre-training model and the overlap model can be spliced to form an identification model for identifying a target feature in data to be identified.
2. The model treatment method according to claim 1, characterized in that each group of data in the multiple groups of data includes multiple sentences, and the step of training the initial network model using multiple groups of data to obtain the pre-training model includes:
performing numerical conversion on each sentence in the multiple groups of data to obtain a vector of a specified length; and
inputting the matrix formed by the vectors of the sentences in the multiple groups of data into the initial network model for training to obtain the pre-training model.
3. The model treatment method according to claim 2, characterized in that the step of inputting the matrix formed by the vectors of the sentences in the multiple groups of data into the initial network model for training to obtain the pre-training model includes:
a. inputting the matrix formed by the vectors of the sentences in the multiple groups of data into the initial network model for iterative calculation;
b. computing the value of the loss function of the initial network model and adjusting the parameters to be determined in each layer of the initial network model so that the average value of the loss function decreases after the adjustment; and
repeating steps a and b until the differences between the average values obtained in a preset number of consecutive iterations are less than a preset value, wherein the target structure with the parameters fixed by the last adjustment is the pre-training model.
4. A model treatment method, characterized in that it is applied to an electronic terminal pre-storing a target network model and a pre-training model trained using multiple groups of data, the model treatment method comprising:
inputting target data into the pre-training model to obtain intermediate output data; and
training the target network model using the intermediate output data to obtain an overlap model, wherein the pre-training model and the overlap model can be spliced to form an identification model by which a target feature in data to be identified is identified.
5. The model treatment method according to claim 4, characterized in that the step of training the target network model using the intermediate output data to obtain the overlap model includes:
c. inputting the intermediate output data into the target network model for iterative calculation;
d. computing the value of the loss function of the target network model and adjusting the parameters to be determined in each layer of the target network model so that the average value of the loss function over the layers decreases after the adjustment; and
repeating steps c and d until the differences between the average values obtained in a preset number of consecutive iterations are less than a preset value, wherein the target network model with the parameters fixed by the last adjustment is the overlap model.
6. A model treatment device, characterized in that it is applied to an electronic terminal storing an initial network model and a target network model, the model treatment device comprising:
a pre-training module, configured to train the initial network model using multiple groups of data to obtain a pre-training model, the initial network model including a target structure formed by a multilayer structure and a collocation structure formed by a multilayer structure, the pre-training model being the network model corresponding to the target structure after the initial network model has been trained;
a computing module, configured to input target data into the pre-training model to compute intermediate output data; and
a target training module, configured to train the target network model using the intermediate output data to obtain an overlap model, wherein the pre-training model and the overlap model can be spliced to form an identification model for identifying a target feature in data to be identified.
7. The model treatment device according to claim 6, characterized in that each group of data in the multiple groups of data includes multiple sentences, and the pre-training module includes:
a data conversion unit, configured to perform numerical conversion on each sentence in the multiple groups of data to obtain a vector of a specified length; and
a data training unit, configured to input the matrix formed by the vectors of the sentences in the multiple groups of data into the initial network model for training to obtain the pre-training model.
8. The model treatment device according to claim 7, characterized in that the data training unit trains and obtains the pre-training model through:
an initial iteration computation subunit, configured to input the matrix formed by the vectors of the sentences in the multiple groups of data into the initial network model for iterative calculation; and
an initial parameter adjusting subunit, configured to compute the value of the loss function of the initial network model and adjust the parameters to be determined in each layer of the initial network model so that the average value of the loss function decreases after the adjustment;
wherein the initial iteration computation subunit and the initial parameter adjusting subunit are executed repeatedly until the differences between the average values obtained in a preset number of consecutive iterations are less than a preset value, and the target structure with the parameters fixed by the last adjustment is the pre-training model.
9. A model treatment device, characterized in that it is applied to an electronic terminal pre-storing a target network model and a pre-training model trained using multiple groups of data, the model treatment device comprising:
a computing module, configured to input target data into the pre-training model to obtain intermediate output data; and
a target training module, configured to train the target network model using the intermediate output data to obtain an overlap model, wherein the pre-training model and the overlap model can be spliced to form an identification model by which a target feature in data to be identified is identified.
10. The model treatment device according to claim 9, characterized in that the target training module trains and obtains the overlap model through:
a target iteration computing unit, configured to input the intermediate output data into the target network model for iterative calculation; and
a target parameter adjusting unit, configured to compute the value of the loss function of the target network model and adjust the parameters to be determined in each layer of the target network model so that the average value of the loss function over the layers decreases after the adjustment;
wherein the target iteration computing unit and the target parameter adjusting unit are executed repeatedly until the differences between the average values obtained in a preset number of consecutive iterations are less than a preset value, and the target network model with the parameters fixed by the last adjustment is the overlap model.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810103695.4A CN108460455A (en) | 2018-02-01 | 2018-02-01 | Model treatment method and device |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108460455A true CN108460455A (en) | 2018-08-28 |
Family
ID=63239310
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810103695.4A Pending CN108460455A (en) | 2018-02-01 | 2018-02-01 | Model treatment method and device |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108460455A (en) |
Cited By (7)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111105020B (en) * | 2018-10-29 | 2024-03-29 | 西安宇视信息科技有限公司 | Feature representation migration learning method and related device |
CN109243493A (en) * | 2018-10-30 | 2019-01-18 | 南京工程学院 | Based on the vagitus emotion identification method for improving long memory network in short-term |
CN109243493B (en) * | 2018-10-30 | 2022-09-16 | 南京工程学院 | Infant crying emotion recognition method based on improved long-time and short-time memory network |
CN111274422A (en) * | 2018-12-04 | 2020-06-12 | 北京嘀嘀无限科技发展有限公司 | Model training method, image feature extraction method and device and electronic equipment |
CN109685120A (en) * | 2018-12-11 | 2019-04-26 | 中科恒运股份有限公司 | Quick training method and terminal device of the disaggregated model under finite data |
CN111221963A (en) * | 2019-11-19 | 2020-06-02 | 成都晓多科技有限公司 | Intelligent customer service data training model field migration method |
CN111221963B (en) * | 2019-11-19 | 2023-05-12 | 成都晓多科技有限公司 | Intelligent customer service data training model field migration method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
CB02 | Change of applicant information | ||
Address after: 610000 Huayang Street, Tianfu New District, Chengdu City, Sichuan Province, No. 1, No. 2, No. 19 Building, Unit 2, 1903 Applicant after: Chengdu Xiaoduo Technology Co., Ltd. Address before: 610000 New Hope International Block A 2207, No. 19 Tianfu Third Street, Chengdu High-tech Zone, Sichuan Province Applicant before: CHENGDU XIAODUO TECH CO., LTD. |
RJ01 | Rejection of invention patent application after publication | ||
Application publication date: 20180828 |