CN111737952A - Training method and device for sequence labeling model - Google Patents

Training method and device for sequence labeling model

Info

Publication number
CN111737952A
Authority
CN
China
Prior art keywords
model
training
sequence
sequence labeling
loss information
Prior art date
Legal status
Pending
Application number
CN202010591966.2A
Other languages
Chinese (zh)
Inventor
周楠楠
杨海军
徐倩
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010591966.2A priority Critical patent/CN111737952A/en
Publication of CN111737952A publication Critical patent/CN111737952A/en
Priority to PCT/CN2021/094180 priority patent/WO2021258914A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of natural language processing, and in particular to a training method and device for a sequence labeling model, which enable the model to be trained effectively when the amount of sample data is insufficient. The method comprises the following steps: training the sequence labeling model based on a sample training sentence set to obtain first loss information; determining an adversarial perturbation factor according to the model parameters, and obtaining second loss information based on the sample training sentence set with the adversarial perturbation factor added; adjusting the model parameters of the sequence labeling model based on target loss information calculated from the first loss information and the second loss information; and performing iterative training until the convergence condition is determined to be met. In this way, by adding the adversarial perturbation factor, different loss information can be obtained from a single sample training sentence, so that the trained sequence labeling model has stronger generalization ability and higher accuracy, unnecessary noise interference is avoided, and resource consumption is reduced.

Description

Training method and device for sequence labeling model
Technical Field
The invention relates to the field of natural language processing, in particular to a training method and a training device for a sequence labeling model.
Background
Sequence labeling is an important and widely applied problem in the field of natural language processing. Once a constructed sequence labeling model has been trained on training samples, it can be used to perform sequence labeling on input sentences. In many cases, however, the amount of sample data available for training the sequence labeling model is insufficient.
In the prior art, in order to obtain a sufficient sample size, data augmentation is usually adopted to derive multiple pieces of sample data from a single piece, and the sequence labeling model is then trained on the derived sample data. However, training on sample data generated by data augmentation introduces noise, which greatly affects the accuracy of the sequence labeling model and, in turn, the accuracy of the sequence labeling.
Disclosure of Invention
The invention provides a training method and device for a sequence labeling model, to solve the prior-art problem that an effective sequence labeling model cannot be obtained when the amount of sample data is insufficient.
The specific technical scheme provided by the invention is as follows:
in a first aspect, a method for training a sequence annotation model is provided, including:
acquiring a sequence labeling model to be trained and a sample training sentence set;
training the sequence labeling model based on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
calculating target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when a preset convergence condition is met.
Optionally, the determining an adversarial perturbation factor according to the model parameters of the sequence labeling model includes:
acquiring the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used for adjusting the strength of the generated adversarial perturbation.
Optionally, the calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter includes:
acquiring the preset hyper-parameter, and dividing the product of the hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the method further includes:
acquiring an original learning rate set for a pre-training model in the sequence labeling model, wherein the pre-training model is used for generating a corresponding word vector set based on an input sample training sentence;
according to layer coefficients preset corresponding to each level in the pre-training model, and in combination with the original learning rate, respectively determining the learning rate corresponding to each level, wherein the learning rate is used for representing the adjustment range of model parameters corresponding to each level;
the adjusting and iterative training of the model parameters of the sequence labeling model based on the target loss information comprises:
and adjusting the model parameters of each level in the sequence labeling model in an error back propagation mode based on the determined learning rate of each level in the sequence labeling model and the target loss information.
Optionally, before the obtaining of the sequence labeling model to be trained and the sample training sentence set, the method further includes:
obtaining a plurality of training sentences, determining the sentence length of each training sentence, and performing any one of the following operations on each of the plurality of training sentences based on its sentence length:
if the sentence length does not reach a preset fixed sentence length, padding the training sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncating the part of the training sentence exceeding the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly taking the training sentence as a sample training sentence.
Optionally, training the sequence labeling model based on the sample training sentence set to obtain first loss information, including:
inputting each sample training sentence in the sample training sentence set into a sequence labeling model, and respectively executing the following operations for each sample training sentence input into the sequence labeling model:
determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first prediction labeling information;
and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, the training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information includes:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second prediction labeling information;
and calculating to obtain second loss information based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, the determining that the preset convergence condition is met includes:
determining, in N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and the prediction accuracy in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset accuracy difference range; or,
determining, in M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and the target loss in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset loss difference range; or,
determining that the preset convergence condition is met when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after outputting the trained sequence labeling model, the method further includes:
acquiring a sentence to be processed, and calling the sequence labeling model to perform sequence labeling processing on the sentence to be processed to obtain output prediction labeling information.
In a second aspect, a training apparatus for a sequence labeling model is provided, including:
the acquisition unit is used for acquiring a sequence labeling model to be trained and a sample training sentence set;
the training unit is used for training the sequence labeling model based on the sample training sentence set to obtain first loss information;
the determining unit is used for determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
and the adjusting unit is used for calculating target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when a preset convergence condition is met.
Optionally, when determining the adversarial perturbation factor according to the model parameters of the sequence labeling model, the determining unit is configured to:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used for adjusting the strength of the generated adversarial perturbation.
Optionally, when calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter, the determining unit is configured to:
acquiring the preset hyper-parameter, and dividing the product of the hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the adjusting unit is further configured to:
acquiring an original learning rate set for a pre-training model in the sequence labeling model, wherein the pre-training model is used for generating a corresponding word vector set based on an input sample training sentence;
according to layer coefficients preset corresponding to each level in the pre-training model, and in combination with the original learning rate, respectively determining the learning rate corresponding to each level, wherein the learning rate is used for representing the adjustment range of model parameters corresponding to each level;
the adjusting and iterative training of the model parameters of the sequence labeling model based on the target loss information comprises:
and adjusting the model parameters of each level in the sequence labeling model in an error back propagation mode based on the determined learning rate of each level in the sequence labeling model and the target loss information.
Optionally, before the obtaining of the sequence labeling model to be trained and the sample training sentence set, the obtaining unit is further configured to:
obtaining a plurality of training sentences, determining the sentence length of each training sentence, and performing any one of the following operations on each of the plurality of training sentences based on its sentence length:
if the sentence length does not reach a preset fixed sentence length, padding the training sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncating the part of the training sentence exceeding the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly taking the training sentence as a sample training sentence.
Optionally, when the sequence labeling model is trained based on the sample training sentence set to obtain first loss information, the training unit is configured to:
inputting each sample training sentence in the sample training sentence set into a sequence labeling model, and respectively executing the following operations for each sample training sentence input into the sequence labeling model:
determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first prediction labeling information;
and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information, the determining unit is configured to:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second prediction labeling information;
and calculating to obtain second loss information based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when it is determined that the preset convergence condition is satisfied, the adjusting unit is configured to:
determining, in N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and the prediction accuracy in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset accuracy difference range; or,
determining, in M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and the target loss in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset loss difference range; or,
determining that the preset convergence condition is met when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after outputting the trained sequence labeling model, the adjusting unit is further configured to:
acquiring a sentence to be processed, and calling the sequence labeling model to perform sequence labeling processing on the sentence to be processed to obtain output prediction labeling information.
In a third aspect, a training apparatus for a sequence labeling model is provided, including:
a memory for storing executable instructions;
and the processor is used for reading and executing the executable instructions stored in the memory so as to realize the training method of the sequence labeling model.
In a fourth aspect, a storage medium is provided, wherein when instructions in the storage medium are executed by a processor, the processor is enabled to execute the training method of the sequence labeling model described in any one of the above.
The invention has the following beneficial effects:
the method comprises the steps of obtaining a sequence marking model to be trained and a sample training sentence set, then training the sequence marking model based on the sample training sentence set to obtain first loss information, determining an anti-disturbance factor according to model parameters of the sequence marking model, training the sequence marking model based on the sample training sentence set added with the anti-disturbance factor to obtain second loss information, then calculating to obtain target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence marking model based on the target loss information, performing iterative training, and outputting the trained sequence marking model when a preset convergence condition is met.
Therefore, by adding the anti-disturbance factor in the sequence labeling model, different loss information can be obtained based on one sample training sentence, so that the generalization capability of the sequence labeling model obtained by training is stronger, the precision is higher, unnecessary noise interference is avoided, and the resource consumption is saved. In addition, a large number of training samples do not need to be labeled manually, a large amount of labor cost and time can be saved, and therefore the training efficiency of the sequence labeling model can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of training a sequence annotation model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a logic structure of a training apparatus for a sequence annotation model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an entity structure of a training apparatus for a sequence annotation model according to an embodiment of the present invention.
Detailed Description
In order to effectively train a sequence labeling model when the amount of sample data is insufficient, an embodiment of the present invention acquires a sequence labeling model to be trained and a sample training sentence set, trains the sequence labeling model based on the sample training sentence set to obtain first loss information, determines an adversarial perturbation factor according to the model parameters of the sequence labeling model, and trains the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information. Target loss information is then calculated based on the first loss information and the second loss information, the model parameters of the sequence labeling model are adjusted based on the target loss information, iterative training is performed, and the trained sequence labeling model is output when the preset convergence condition is met.
Preferred embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, in the embodiment of the present invention, a training process of a sequence labeling model is as follows:
s101: and acquiring a sequence marking model to be trained and a sample training sentence set.
Specifically, first, sample data is acquired, and the statement length of each sample data is determined.
For example, sample data 1 {Xiaoming goes to the bank today and repays two thousand yuan} is acquired and its sentence length is determined to be 12 characters; sample data 2 {Xiaoli goes to school every day to study} is read and its sentence length is determined to be 9 characters (the lengths count the characters of the original Chinese sentences).
Further, after the sentence length of each piece of sample data is determined, each piece of sample data is processed based on its sentence length to obtain a sample training sentence, until all sample data are processed. The processing of each piece of sample data includes, but is not limited to, the following cases:
In the first case: the sentence length of the sample data does not reach the preset fixed sentence length.
If the sentence length of a piece of sample data does not reach the preset fixed sentence length, the sample data is padded with preset characters to generate a sample training sentence.
For example, assume the preset fixed sentence length is 128 characters and the preset character is "0". The sentence length of sample data 1 is 12 characters, which does not reach 128 characters, so sample data 1 is padded with the character "0" to generate sample training sentence 1.
In the second case: the sentence length of the sample data exceeds the preset fixed sentence length.
If the sentence length of a piece of sample data exceeds the preset fixed sentence length, the part of the sample data exceeding the fixed sentence length is truncated to generate a sample training sentence.
For example, assume the preset fixed sentence length is 128 characters. The sentence length of sample data X is 130 characters, which exceeds the preset fixed sentence length of 128 characters, so the part of sample data X beyond 128 characters is truncated to generate sample training sentence X.
In the third case: the sentence length of the sample data reaches the preset fixed sentence length.
If the sentence length of a piece of sample data equals the preset fixed sentence length, the sample data is directly taken as a sample training sentence.
For example, if the fixed sentence length is 128 characters and the sentence length of sample data N is 128 characters, sample data N is directly used as sample training sentence N.
It should be noted that, in the embodiment of the present invention, before the sentence length of a piece of sample data is determined, the sample data is processed with a preset sentence start tag [CLS] and a preset sentence end tag [SEP]; that is, the [CLS] tag is added at the start of the sample data and the [SEP] tag at its end.
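As an illustration of the length normalization above, the following is a minimal Python sketch under the stated assumptions (the fixed length of 128, the padding character "0", and the [CLS]/[SEP] tags come from the examples; helper names such as pad_or_truncate are hypothetical, not part of the patent):

```python
FIXED_LEN = 128   # preset fixed sentence length from the example
PAD_CHAR = "0"    # preset padding character from the example

def pad_or_truncate(sample: str) -> str:
    """Normalize one piece of sample data to the fixed sentence length."""
    # Add the sentence start/end tags before the length is determined,
    # as noted above; the tags are written inline for illustration only.
    tagged = "[CLS]" + sample + "[SEP]"
    if len(tagged) < FIXED_LEN:      # case 1: pad with the preset character
        return tagged + PAD_CHAR * (FIXED_LEN - len(tagged))
    if len(tagged) > FIXED_LEN:      # case 2: truncate the excess part
        return tagged[:FIXED_LEN]
    return tagged                    # case 3: already the fixed length

raw_samples = ["Xiaoming goes to the bank today and repays two thousand yuan"]
sample_training_sentences = [pad_or_truncate(s) for s in raw_samples]
```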
Further, in the embodiment of the present invention, a sample training sentence set is obtained from the sample training sentences generated by the above processing.
It should be noted that, in the embodiment of the present invention, the obtained sequence labeling model is constructed on a model architecture of Bidirectional Encoder Representations from Transformers (BERT) + Bidirectional Long Short-Term Memory network (BiLSTM) + Conditional Random Field (CRF), where the BERT model is the pre-training model in the sequence labeling model.
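For concreteness, a BERT+BiLSTM+CRF model of this kind might be assembled as in the sketch below. This is an assumption of this write-up rather than part of the patent: PyTorch with the transformers and pytorch-crf packages is assumed, and names such as SeqLabelModel are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # from the pytorch-crf package

class SeqLabelModel(nn.Module):
    """Illustrative BERT + BiLSTM + CRF sequence labeling model."""
    def __init__(self, num_tags: int, hidden: int = 256):
        super().__init__()
        # BERT is the pre-training model that produces the word vectors
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.bilstm = nn.LSTM(768, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)  # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)

    def embed(self, input_ids, attention_mask):
        # One 768-dimensional word vector per character: the "first word vector set"
        return self.bert(input_ids, attention_mask=attention_mask).last_hidden_state

    def loss(self, word_vectors, tags, mask):
        feats, _ = self.bilstm(word_vectors)
        emissions = self.emit(feats)
        # The CRF returns a log-likelihood; its negation is used as the loss
        return -self.crf(emissions, tags, mask=mask)
```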
S102: and training the sequence labeling model based on the sample training sentence set to obtain first loss information.
In the embodiment of the invention, after a sample training sentence set is obtained, a sequence labeling model is trained based on the sample sequence sentence set to obtain first loss information.
It should be noted that, in the embodiment of the present invention, when the sequence marker model is trained, in the iterative process, the sample training statement is read and processed in a batch processing manner. That is, according to the preset batch processing size, it is determined that a corresponding number of sample training sentences are read each time to perform model training.
For example, assuming that the preset batch size is 32, it is determined that 32 sample training sentences are read each time for model training.
For another example, assuming that the preset batch size is 64, it is determined that 64 sample training sentences are read each time for model training.
For convenience of description, the following describes the training process by taking only the example of inputting a sample training sentence into the sequence labeling model.
Specifically, each sample training sentence in the sample training sentence set is input into a sequence labeling model, and for each sample training sentence input into the sequence labeling model, the following operations are respectively performed:
s1: determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set.
Specifically, after a sample training sentence is input into a sequence labeling model, a corresponding first word vector set is generated based on word vectors corresponding to each character in the sample training sentence and output by a pre-training model in the sequence labeling model.
For example, after a sample training sentence {Xiaoming is reading a book} is input into the sequence labeling model, the word vector corresponding to each of its five characters is determined (word vectors 1 through 5), so the first word vector set obtained for this sample training sentence contains word vectors 1-5, each of which is 768-dimensional.
S2: and carrying out entity labeling on a sample training sentence based on the word vector set to obtain corresponding first prediction labeling information.
For example, after determining a word vector set corresponding to a sample training sentence, obtaining a prediction result of each word vector, for example, for a sample training sentence { two thousand yuan is paid by going to bank today in a mingming mode, corresponding first prediction marking information is a small (B-NAM) mingming (E-NAM) day (O) going to a silver (O) line (O) and a small (B-NAM) money (O) two thousand yuan (O) in a representation sample training sentence, wherein the small is the beginning of a name of a person, the mingming is the end of the name of the person, and the current, day, going to the silver, line, and money, two, thousand yuan are classified as other marks.
S3: and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to a sample training sentence, wherein the first loss information is recorded as Le.
Specifically, a BilSTM + CRF model in the sequence labeling model determines a labeling difference between first prediction labeling information and real labeling information based on the obtained first prediction labeling information and the real labeling information corresponding to a sample training sentence, and calculates a first loss corresponding to the current sequence labeling model in a targeted manner.
For example, for a sample training sentence { Xiaoming two thousand yuan for today to go to bank, the corresponding real labeling information is: small (B-NAM) day (E-TIM) to silver (B-LOC) line (E-LOC) and also (O) two (B-MON) thousand (I-MON) elements (E-MON), where "B-" indicates the beginning of the labeled element, "I-" indicates the middle of the labeled element, and "E-" indicates the end of the labeled element. And then, calculating a first loss corresponding to the current sequence labeling model based on the difference between the real labeling information and the first labeling information obtained by the sequence labeling model.
S103: and determining a counterdisturbance factor according to the model parameters of the sequence labeling model.
In the embodiment of the invention, the current model parameters of the sequence labeling model are obtained, the gradient of the sequence labeling model is calculated based on the model parameters, and the disturbance resisting factor is calculated based on the gradient and the preset hyper-parameters, wherein the hyper-parameters are used for adjusting the strength of the generated disturbance resisting factor.
Specifically, the gradient g of the sequence labeling model is calculated based on the current model parameter of the sequence labeling model, further, in the embodiment of the present invention, a preset hyper-parameter is obtained, and the product of the obtained hyper-parameter and the gradient is subjected to quotient with the norm of the gradient to obtain the disturbance rejection factor.
The adversarial perturbation factor r is calculated according to the following formula:
r = ε·g / ||g||2
where g denotes the gradient of the sequence labeling model, ε denotes the preset hyper-parameter that adjusts the perturbation strength, and ||g||2 denotes the L2 norm of the gradient.
S104: and training the sequence labeling model based on the sample training sentence set added with the anti-disturbance factor to obtain second loss information.
Specifically, the process of inputting a sample training sentence in the sample training set into the sequence labeling model in S102 to obtain the second loss information is described.
And adding the confrontation disturbance factor into the first word vector set obtained in the step S102 to obtain a second word vector set, and then carrying out entity labeling on the sample training sentence by a BilSTM + CRF model in the sequence labeling model based on the second word vector set to obtain corresponding second prediction labeling information.
For example, after an anti-disturbance factor is added to a first word vector set corresponding to a sample training statement { two thousand yuan for every day of Mingming and repayment for banking }, each word vector in the first word vector set is disturbed to generate a second word vector set. The sequence annotation model output is then obtained such as: and the second prediction marking information of the small (B-NAM) amine (E-NAM) is obtained by removing the second prediction marking information of the silver (O) line (E-NAM) and the second prediction marking information of the silver (O) line (E-NA.
Further, second loss information is calculated based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Specifically, based on the obtained second prediction labeling information and the real labeling information corresponding to the sample data, the labeling difference between them is determined, and the second loss of the current sequence labeling model, denoted Lr, is calculated accordingly.
For example, for the sample training sentence {Xiaoming goes to the bank today and repays two thousand yuan}, the corresponding real labeling information is: Xiao (B-NAM), ming (E-NAM), jin (B-TIM), tian (E-TIM), qu (O), yin (B-LOC), hang (E-LOC), huan (O), kuan (O), liang (B-MON), qian (I-MON), yuan (E-MON), while the second prediction labeling information obtained for the sentence is: Xiao (B-NAM), ming (E-NAM), jin (O), tian (O), qu (O), yin (B-NAM), hang (E-NAM), huan (O), kuan (O), liang (O), qian (O), yuan (O). The second loss of the current sequence labeling model is then obtained based on the difference between the second prediction labeling information and the real labeling information.
In this way, on the basis of a single sample training sentence, different prediction labeling information output by the sequence labeling model is obtained once the adversarial perturbation factor is added. Under the condition of limited samples, the effective number of training sentences can thus be increased without introducing noise, and training on the sample training sentences with the adversarial perturbation factor added gives the sequence labeling model stronger generalization ability and higher accuracy.
S105: and calculating target loss information based on the first loss information and the second loss information, and adjusting model parameters of the sequence labeling model based on the target loss information and performing iterative training.
Specifically, after an Le determined for the difference between the first prediction labeling information and the real labeling information and an Lr determined for the difference between the second prediction labeling information and the real labeling information are obtained, the sum of the Le and the Lr is further used as target loss information, and the target loss information is denoted as L.
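Putting S102-S105 together, a single training step of this scheme could be sketched as below, reusing the SeqLabelModel and adversarial_perturbation sketches given earlier; this remains an illustrative assumption, not the patent's prescribed implementation:

```python
def train_step(model, optimizer, input_ids, attention_mask, tags, epsilon=1.0):
    optimizer.zero_grad()
    mask = attention_mask.bool()

    # S102: forward pass on the clean word vectors -> first loss Le
    word_vecs = model.embed(input_ids, attention_mask)
    word_vecs.retain_grad()              # keep the gradient w.r.t. the word vectors
    le = model.loss(word_vecs, tags, mask)
    le.backward(retain_graph=True)       # populates word_vecs.grad and parameter grads

    # S103: adversarial perturbation factor from the gradient
    r = adversarial_perturbation(word_vecs.grad, epsilon)

    # S104: forward pass on the perturbed word vectors -> second loss Lr
    lr_loss = model.loss(word_vecs + r, tags, mask)

    # S105: target loss L = Le + Lr; the two backward passes together
    # accumulate the gradient of L, which the optimizer then applies
    lr_loss.backward()
    optimizer.step()
    return (le + lr_loss).item()
```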
It should be noted that, in the embodiment of the present invention, before the model parameters of the sequence labeling model are adjusted and iterative training is performed based on the target loss information, a learning rate needs to be configured for each layer of the pre-training model in the sequence labeling model. Specifically, the original learning rate set for the pre-training model is acquired, where the pre-training model is used to generate the corresponding word vector set from an input sample training sentence; then, according to the layer coefficient preset for each level of the pre-training model, combined with the original learning rate, the learning rate corresponding to each level is determined, where the learning rate represents the adjustment range of the model parameters at that level.
It should be noted that the learning rate differs across the levels of the pre-training model. Generally, the upper layers of the pre-training model contain more semantic-level information, the middle layers contain syntactic-level information, and the bottom layers contain phrase-level information. Therefore, when learning rates are configured, a higher learning rate is generally set for the upper layers of the pre-training model so that the upper-layer parameters change more, and a lower learning rate is set for the bottom layers so that the bottom-layer parameters change less.
Specifically, the learning rate of each layer in the pre-training model in the sequence labeling model is calculated by adopting the following two modes:
in a first way,
Calculating the learning rate of each layer in a pre-training model in the sequence labeling model according to the following formula:
Li = L0 / Ci
where Li represents the learning rate of the i-th layer, L0 represents the original learning rate configured for the pre-training model, and Ci represents the layer coefficient of the i-th layer.
It should be noted that a smaller i indicates a higher level. Taking a pre-training model with three levels as an example, C1 represents the layer coefficient of the upper level of the pre-training model, C2 that of the middle level, and C3 that of the bottom level, where the value of Ci may be adjusted according to actual processing requirements.
The second way:
Calculating the learning rate of each layer in a pre-training model in the sequence labeling model by adopting the following formula:
Li+1 = Li / C
where Li denotes the learning rate of the i-th layer, Li+1 denotes the learning rate of the (i+1)-th layer, and C is a fixed parameter.
For example, if the initial learning rate L1 of layer 1 is set to 0.025 and the fixed parameter C is set to 5, the learning rate of layer 2 is 0.005 and the learning rate of layer 3 is 0.001.
It should be noted that a smaller i indicates a higher level. Taking a pre-training model with three levels as an example, L1 represents the learning rate of the upper layer of the pre-training model, L2 that of the middle layer, and L3 that of the bottom layer.
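Either scheme can be realized with per-layer parameter groups in the optimizer. The sketch below uses the second formula with the example values L1 = 0.025 and C = 5; the three-level grouping of BERT's modules is an assumption made for illustration:

```python
bert = model.bert
levels = [bert.encoder.layer[8:],    # upper layers: more semantic information
          bert.encoder.layer[4:8],   # middle layers: syntactic information
          list(bert.encoder.layer[:4]) + [bert.embeddings]]  # bottom: phrase-level

param_groups, lr = [], 0.025         # L1, the learning rate of the top level
for level in levels:
    params = [p for module in level for p in module.parameters()]
    param_groups.append({"params": params, "lr": lr})
    lr /= 5                          # L(i+1) = Li / C

optimizer = torch.optim.SGD(param_groups)  # level learning rates: 0.025, 0.005, 0.001
```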
In this way, in view of the insufficient sample size, different learning rates are configured for the different levels of the pre-training model when it is adjusted, so that the upper-layer parameters of the pre-training model change more and the bottom-layer parameters change less. This makes the parameter adjustment of the pre-training model more principled and yields a sequence labeling model of higher accuracy.
Further, based on the determined learning rate of each level in the sequence labeling model and the target loss information, the model parameters of each level in the sequence labeling model are adjusted by error back-propagation. The specific process may be as follows: calculate the partial derivative of the target loss information with respect to the model parameters of each level, calculate the product of each level's learning rate and the calculated partial derivative, and adjust each level's model parameters using the corresponding product; that is, each adjusted model parameter is the difference between the parameter before adjustment and the corresponding product.
For example, take the adjustment of a model parameter W1 at some level of the sequence labeling model. First, the partial derivative of the target loss information with respect to W1 is calculated from the computed target loss information; the updated W1 is then the value of W1 before adjustment minus the product of the partial derivative and the learning rate. Assuming the original W1 is 0.4, the learning rate is 0.01, and the calculated partial derivative is 0.04728, the updated W1 is 0.4 − 0.01 × 0.04728 = 0.3995272.
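The single-parameter update in this example can be checked directly (plain Python, numbers taken from the example):

```python
w1, lr, grad = 0.4, 0.01, 0.04728
w1_new = w1 - lr * grad   # gradient-descent update with the level's learning rate
print(w1_new)             # 0.3995272 (up to floating-point rounding)
```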
In the embodiment of the present invention, iterative training is performed over the sample training sentence set. Specifically, during training of the sequence labeling model, target loss information is obtained based on a sample training sentence; after one adjustment of the sequence labeling model is completed, a new sample training sentence is taken from the sample training sentence set and input into the adjusted sequence labeling model for the next round of training. This is repeated until the convergence condition is determined to be satisfied, which is not described again here.
This is equivalent to adding the adversarial perturbation to the loss of the sequence labeling model: when the sequence labeling model is adjusted based on target loss information that incorporates the adversarial perturbation factor, its generalization ability becomes stronger and its accuracy higher.
S106: When the preset convergence condition is met, output the trained sequence labeling model.
Specifically, in the embodiment of the present invention, after determining the target loss information of the sequence labeling model, the following methods may be adopted to determine that the preset convergence condition is satisfied:
the first mode is as follows: and determining the difference between the prediction accuracy of the sample training sentence in each iteration process and the prediction accuracy of the sample training sentence in the previous iteration process in the continuous N iteration processes, and determining that the preset convergence condition is reached when the preset accuracy difference range is met.
It should be noted that the prediction accuracy can be measured by comparing the prediction labeling information with the real labeling information, that is, as the percentage of correctly labeled content in the prediction labeling information.
For example, it is assumed that after a sample training sentence including 50 characters is sequence-labeled by the sequence labeling model, 40 character sequences are determined to be correctly labeled, and thus, the accuracy of the sequence labeling model is 80%.
In the embodiment of the invention, the value of N can be set according to the actual application scene.
For example, assume N is 2 and the preset accuracy difference range is 1% to 5%. The prediction accuracy for the sample training sentences is 80% in the 10th iteration, 75% in the 9th iteration, and 70% in the 8th iteration. The difference between the 10th and 9th iterations is 5%, and the difference between the 9th and 8th iterations is also 5%. For 2 consecutive iterations, the difference between the prediction accuracy of each iteration and that of the previous iteration therefore falls within the preset accuracy difference range, and the preset convergence condition is judged to be met.
The third mode is as follows: and determining the difference value between the target loss information of the sequence marking model in each iteration process and the target loss information of the sequence marking model in the previous iteration process in the continuous M iteration processes, and determining that the preset convergence condition is reached when the preset accuracy difference value range is met.
It should be noted that, in the embodiment of the present invention, the value of M may be set according to an actual application scenario.
For example, assume M is 5 and the preset loss difference range is within 2.5%. The target loss information of the sequence labeling model is 7.5% in the 30th iteration, 8.4% in the 29th, 9.2% in the 28th, 10.3% in the 27th, 11.6% in the 26th, and 13.0% in the 25th. The differences in target loss information between adjacent iterations from the 25th to the 30th are therefore 1.4%, 1.3%, 1.1%, 0.8%, and 0.9%; since the differences for 5 consecutive iterations all fall within the loss difference range, the preset convergence condition is determined to be met.
The third mode: when the current number of iterations reaches the preset maximum number of iterations, determine that the preset convergence condition is met.
For example, assuming that the preset maximum number of iterations is 50, when the current number of iterations reaches 50, it is determined that the preset convergence condition is satisfied.
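A convergence check in the spirit of the second mode can be sketched as a sliding window over the recorded target losses (the window size and threshold follow the example above; all names are illustrative):

```python
def converged(losses, m=5, max_diff=0.025):
    """Second mode: differences of the last m adjacent target losses all within range."""
    if len(losses) < m + 1:
        return False
    recent = losses[-(m + 1):]
    diffs = [abs(a - b) for a, b in zip(recent, recent[1:])]
    return all(d <= max_diff for d in diffs)

history = [0.130, 0.116, 0.103, 0.092, 0.084, 0.075]  # losses from the example
print(converged(history))  # True: diffs 0.014, 0.013, 0.011, 0.008, 0.009
```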
Further, when any one of the above convergence conditions is determined to be satisfied, the sequence labeling model can be judged to have converged, and the trained sequence labeling model is output.
Further, in practical application, after a sentence to be processed is acquired, the sequence labeling model is called to perform sequence labeling processing on the sentence to be processed, so as to obtain the output prediction labeling information.
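Calling the trained model on a new sentence could then look like the following sketch (the tokenizer and the decoding step are assumptions of this write-up, reusing the SeqLabelModel sketch):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
enc = tokenizer("小明今天去银行还款两千元", return_tensors="pt",
                padding="max_length", truncation=True, max_length=128)

model.eval()
with torch.no_grad():
    vecs = model.embed(enc["input_ids"], enc["attention_mask"])
    feats, _ = model.bilstm(vecs)
    tag_ids = model.crf.decode(model.emit(feats),
                               mask=enc["attention_mask"].bool())  # Viterbi decoding
print(tag_ids[0])  # predicted tag indices, mapping to labels such as B-NAM, E-NAM, O
```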
Based on the same inventive concept, referring to fig. 2, in an embodiment of the present invention, a training apparatus for a sequence annotation model is provided, which at least includes: an acquisition unit 201, a training unit 202, a determination unit 203 and an adjustment unit 204, wherein,
an acquisition unit 201, configured to acquire a sequence labeling model to be trained and a sample training sentence set;
a training unit 202, configured to train the sequence labeling model based on the sample training sentence set to obtain first loss information;
a determining unit 203, configured to determine an adversarial perturbation factor according to the model parameters of the sequence labeling model, and train the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
an adjusting unit 204, configured to calculate target loss information based on the first loss information and the second loss information, adjust model parameters of the sequence labeling model based on the target loss information, perform iterative training, and output the trained sequence labeling model when it is determined that a preset convergence condition is satisfied.
Optionally, when determining the adversarial perturbation factor according to the model parameters of the sequence labeling model, the determining unit 203 is configured to:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used for adjusting the strength of the generated adversarial perturbation.
Optionally, when calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter, the determining unit 203 is configured to:
acquiring the preset hyper-parameter, and dividing the product of the hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the adjusting unit 204 is further configured to:
acquiring an original learning rate set for a pre-training model in the sequence labeling model, wherein the pre-training model is used for generating a corresponding word vector set based on an input sample training sentence;
according to layer coefficients preset corresponding to each level in the pre-training model, and in combination with the original learning rate, respectively determining the learning rate corresponding to each level, wherein the learning rate is used for representing the adjustment range of model parameters corresponding to each level;
the adjusting and iterative training of the model parameters of the sequence labeling model based on the target loss information comprises:
and adjusting the model parameters of each level in the sequence labeling model in an error back propagation mode based on the determined learning rate of each level in the sequence labeling model and the target loss information.
Optionally, before the obtaining of the sequence labeling model to be trained and the sample training sentence set, the obtaining unit 201 is further configured to:
obtaining a plurality of training sentences, determining the sentence length of each training sentence, and performing any one of the following operations on each of the plurality of training sentences based on its sentence length:
if the sentence length does not reach a preset fixed sentence length, padding the training sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncating the part of the training sentence exceeding the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly taking the training sentence as a sample training sentence.
Optionally, when the sequence labeling model is trained based on the sample training sentence set to obtain first loss information, the training unit 202 is configured to:
inputting each sample training sentence in the sample training sentence set into a sequence labeling model, and respectively executing the following operations for each sample training sentence input into the sequence labeling model:
determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first prediction labeling information;
and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information, the determining unit 203 is configured to:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second prediction labeling information;
and calculating to obtain second loss information based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when it is determined that the preset convergence condition is satisfied, the adjusting unit 204 is configured to:
determining, in N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and the prediction accuracy in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset accuracy difference range; or,
determining, in M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and the target loss in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset loss difference range; or,
determining that the preset convergence condition is met when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after outputting the trained sequence tagging model, the adjusting unit 204 is further configured to:
acquiring a sentence to be processed, and calling the sequence labeling model to perform sequence labeling processing on the sentence to be processed to obtain output prediction labeling information.
Based on the same inventive concept, referring to fig. 3, the disclosed embodiment provides a training apparatus for a sequence annotation model, which at least includes:
a memory 301 for storing executable instructions;
a processor 302 for reading and executing the executable instructions stored in the memory, and performing the following processes:
acquiring a sequence labeling model to be trained and a sample training sentence set;
training the sequence labeling model based on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
calculating target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when a preset convergence condition is met.
Optionally, when determining the disturbance rejection factor according to the model parameter of the sequence labeling model, the processor 302 is configured to:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the disturbance resisting factor based on the gradient and the preset hyper-parameters, wherein the hyper-parameters are used for adjusting the strength of the generated disturbance resisting.
Optionally, when calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter, the processor 302 is configured to:
acquire the preset hyper-parameter, and divide the product of the acquired hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
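This is the fast-gradient-method style perturbation of Miyato et al. (cited in the non-patent references below). A minimal sketch; the small additive constant guarding against a zero-norm gradient is an added assumption:

```python
import torch

def adversarial_perturbation(grad: torch.Tensor, epsilon: float) -> torch.Tensor:
    # r_adv = (epsilon * g) / ||g||: the quotient of the product of the
    # preset hyper-parameter and the gradient, and the norm of the gradient.
    return epsilon * grad / (grad.norm() + 1e-12)  # 1e-12 avoids division by zero
```

A larger epsilon produces a stronger perturbation and hence a harder adversarial example; dividing by the norm keeps the perturbation's magnitude controlled by epsilon alone.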
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the processor 302 is further configured to:
acquire the original learning rate set for the pre-training model within the sequence labeling model, wherein the pre-training model is used to generate a corresponding word vector set based on an input sample training sentence;
and determine, for each layer of the pre-training model, the learning rate of that layer according to the layer coefficient preset for that layer in combination with the original learning rate, wherein the learning rate characterizes the adjustment range of the model parameters of that layer.
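Per-layer learning rates are commonly realized as optimizer parameter groups. The following sketch assumes a geometric layer coefficient and a bottom-to-top ordering of the pre-training model's layers; both are illustrative choices, not the claimed scheme:

```python
import torch

def layerwise_param_groups(layers, original_lr, layer_coeff=0.95):
    # One parameter group per layer; lower layers, which carry more general
    # language knowledge, receive a smaller learning rate (i.e. a smaller
    # parameter adjustment range) than the task-specific upper layers.
    num_layers = len(layers)
    return [
        {"params": layer.parameters(),
         "lr": original_lr * layer_coeff ** (num_layers - 1 - depth)}
        for depth, layer in enumerate(layers)
    ]

# Usage with assumed names:
#   optimizer = torch.optim.AdamW(layerwise_param_groups(encoder_layers, 2e-5))
```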
Optionally, before acquiring the sequence labeling model to be trained and the sample training sentence set, the processor 302 is further configured to:
obtain a plurality of raw sentences, determine the sentence length of each raw sentence, and perform one of the following operations on each raw sentence according to its sentence length:
if the sentence length is below a preset fixed sentence length, pad the raw sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, cut off the part beyond the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, use the raw sentence directly as a sample training sentence.
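The length normalization above amounts to a few lines of code; the `[PAD]` placeholder stands in for whatever preset character is chosen:

```python
def to_fixed_length(chars, fixed_len, pad="[PAD]"):
    # Normalize one raw sentence (as a list of characters) to the preset
    # fixed sentence length by padding or truncation.
    if len(chars) < fixed_len:
        return chars + [pad] * (fixed_len - len(chars))  # too short: pad
    return chars[:fixed_len]   # too long: truncate; no-op if exactly fixed_len
```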
Optionally, when training the sequence labeling model based on the sample training sentence set to obtain the first loss information, the processor 302 is configured to:
input each sample training sentence of the sample training sentence set into the sequence labeling model, and perform the following operations for each input sample training sentence:
determine the word vector corresponding to each character of the sample training sentence, and generate the corresponding first word vector set;
perform entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first predicted labeling information;
and calculate the first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
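Expressed as code, the first loss is an ordinary supervised tagging loss over the unperturbed vectors. The sketch reuses the assumed `model.embed`, `model.tagger`, and `tagging_loss` names from the training-step sketch above, and also returns the first word vector set, since the second loss reuses it:

```python
def first_pass(model, chars, gold_tags):
    # One word vector per character: the first word vector set.
    vectors = model.embed(chars)          # (batch, seq_len, hidden)
    # Entity labeling over the first word vector set yields the first
    # predicted labeling information (one tag score vector per character).
    logits = model.tagger(vectors)        # (batch, seq_len, num_tags)
    # First loss: labeling difference against the ground-truth tags.
    return tagging_loss(logits, gold_tags), vectors
```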
Optionally, when training the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain the second loss information, the processor 302 is configured to:
add the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
perform entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second predicted labeling information;
and calculate the second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
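Continuing with the same assumed names, the second loss differs from the first only in the vectors being labeled; the sentence itself and its ground truth are unchanged:

```python
def second_pass(model, first_vectors, r_adv, gold_tags):
    # Adding the adversarial perturbation factor to the first word vector
    # set yields the second word vector set.
    second_vectors = first_vectors.detach() + r_adv
    # Entity labeling over the perturbed vectors yields the second
    # predicted labeling information.
    logits = model.tagger(second_vectors)
    # Second loss: labeling difference against the same ground truth.
    return tagging_loss(logits, gold_tags)
```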
Optionally, when determining that the preset convergence condition is satisfied, the processor 302 is configured to:
determine, over N consecutive iterations, the difference between the prediction accuracy on the sample training sentences in each iteration and that in the previous iteration, and determine that the preset convergence condition is reached when each difference falls within a preset accuracy difference range; or,
determine, over M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and that in the previous iteration, and determine that the preset convergence condition is reached when each difference falls within a preset loss difference range; or,
determine that the preset convergence condition is reached when the current number of iterations reaches a preset maximum number of iterations.
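The three alternative stopping rules can be checked by one small helper. N, M, the difference ranges, and the maximum number of iterations are all preset values; the defaults below are purely illustrative:

```python
def converged(acc_history, loss_history, iteration,
              n=3, m=3, acc_eps=1e-4, loss_eps=1e-4, max_iters=50):
    def stable(history, window, eps):
        # True if the last `window` consecutive differences all fall
        # within the preset difference range `eps`.
        if len(history) <= window:
            return False
        recent = history[-(window + 1):]
        return all(abs(a - b) <= eps for a, b in zip(recent, recent[1:]))

    return (stable(acc_history, n, acc_eps)       # accuracy has stabilized
            or stable(loss_history, m, loss_eps)  # target loss has stabilized
            or iteration >= max_iters)            # iteration budget exhausted
```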
Optionally, after the trained sequence labeling model is output, the processor 302 is further configured to:
acquire a sentence to be processed, and invoke the trained sequence labeling model to perform sequence labeling on the sentence to be processed, so as to obtain the output predicted labeling information.
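After training, labeling a sentence to be processed is a single forward pass. The sketch below reuses `to_fixed_length` from above, together with an assumed `id2tag` mapping from tag indices to label names:

```python
import torch

def label_sentence(model, sentence, fixed_len, id2tag):
    # Normalize the input to the fixed length used during training.
    chars = to_fixed_length(list(sentence), fixed_len)
    model.eval()
    with torch.no_grad():
        logits = model.tagger(model.embed(chars))  # (seq_len, num_tags)
        tag_ids = logits.argmax(dim=-1)            # most likely tag per character
    return [id2tag[int(t)] for t in tag_ids]
```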
Based on the same inventive concept, embodiments of the present invention provide a storage medium, and instructions in the storage medium, when executed by a processor, enable the processor to execute the training method of the sequence annotation model as described in any one of the above embodiments.
In summary, in the embodiments of the present invention, a sequence labeling model to be trained and a sample training sentence set are acquired; the sequence labeling model is trained on the sample training sentence set to obtain first loss information; an adversarial perturbation factor is determined from the model parameters of the sequence labeling model, and the model is trained on the sample training sentence set to which the perturbation factor has been added to obtain second loss information; target loss information is then calculated from the first and second loss information, the model parameters are adjusted based on the target loss information, and iterative training proceeds until it is determined that a preset convergence condition is satisfied, at which point the trained sequence labeling model is output. Because the adversarial perturbation factor lets a single sample training sentence yield two different pieces of loss information, the trained sequence labeling model generalizes better and is more accurate, unnecessary noise interference is avoided, and resource consumption is reduced. In addition, no large corpus of manually labeled training samples is required, which saves substantial labor cost and time and thereby improves the training efficiency of the sequence labeling model.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (10)

1. A training method for a sequence labeling model, characterized by comprising the following steps:
acquiring a sequence labeling model to be trained and a sample training sentence set;
training the sequence labeling model based on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain second loss information;
calculating target loss information based on the first loss information and the second loss information, adjusting the model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when it is determined that a preset convergence condition is satisfied.
2. The method of claim 1, wherein the determining an adversarial perturbation factor according to the model parameters of the sequence labeling model comprises:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used to adjust the strength of the generated adversarial perturbation.
3. The method of claim 2, wherein the calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter comprises:
acquiring the hyper-parameter, and dividing the product of the acquired hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
4. The method of claim 1, wherein before the adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the method further comprises:
acquiring an original learning rate set for a pre-training model within the sequence labeling model, wherein the pre-training model is used to generate a corresponding word vector set based on an input sample training sentence;
determining, for each layer of the pre-training model, the learning rate of that layer according to the layer coefficient preset for that layer in combination with the original learning rate, wherein the learning rate characterizes the adjustment range of the model parameters of that layer;
and the adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training comprises:
adjusting the model parameters of each layer of the sequence labeling model by error back-propagation, based on the determined learning rate of each layer of the sequence labeling model and the target loss information.
5. The method of any one of claims 1-4, wherein the training the sequence labeling model based on the sample training sentence set to obtain first loss information comprises:
inputting each sample training sentence of the sample training sentence set into the sequence labeling model, and performing the following operations for each input sample training sentence:
determining the word vector corresponding to each character of the sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first predicted labeling information;
and calculating the first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
6. The method of claim 5, wherein the training the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain second loss information, comprises:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second predicted labeling information;
and calculating the second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
7. The method of any one of claims 1-4, wherein after the outputting the trained sequence labeling model, the method further comprises:
acquiring a sentence to be processed, and invoking the sequence labeling model to perform sequence labeling on the sentence to be processed, so as to obtain output predicted labeling information.
8. A training device for a sequence labeling model, characterized by comprising:
an acquisition unit, configured to acquire a sequence labeling model to be trained and a sample training sentence set;
a training unit, configured to train the sequence labeling model based on the sample training sentence set to obtain first loss information;
a determining unit, configured to determine an adversarial perturbation factor according to the model parameters of the sequence labeling model, and train the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain second loss information;
and an adjusting unit, configured to calculate target loss information based on the first loss information and the second loss information, adjust the model parameters of the sequence labeling model based on the target loss information, perform iterative training, and output the trained sequence labeling model when a preset convergence condition is satisfied.
9. A training device for a sequence labeling model, characterized by comprising:
a memory for storing executable instructions;
a processor for reading and executing the executable instructions stored in the memory, so as to implement the training method for a sequence labeling model according to any one of claims 1 to 7.
10. A storage medium, characterized in that instructions in the storage medium, when executed by a processor, enable the processor to perform the training method for a sequence labeling model according to any one of claims 1 to 7.
CN202010591966.2A 2020-06-24 2020-06-24 Training method and device for sequence labeling model Pending CN111737952A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010591966.2A CN111737952A (en) 2020-06-24 2020-06-24 Training method and device for sequence labeling model
PCT/CN2021/094180 WO2021258914A1 (en) 2020-06-24 2021-05-17 Method and apparatus for training sequence labeling model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010591966.2A CN111737952A (en) 2020-06-24 2020-06-24 Training method and device for sequence labeling model

Publications (1)

Publication Number Publication Date
CN111737952A true CN111737952A (en) 2020-10-02

Family

ID=72651128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010591966.2A Pending CN111737952A (en) 2020-06-24 2020-06-24 Training method and device for sequence labeling model

Country Status (2)

Country Link
CN (1) CN111737952A (en)
WO (1) WO2021258914A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146055B (en) * 2022-04-18 2024-07-23 重庆邮电大学 Text universal countermeasure defense method and system based on countermeasure training
CN114896307B (en) * 2022-06-30 2022-09-27 北京航空航天大学杭州创新研究院 Time series data enhancement method and device and electronic equipment
CN115881212A (en) * 2022-10-26 2023-03-31 溪砾科技(深圳)有限公司 RNA target-based small molecule compound screening method and device
CN115938353B (en) * 2022-11-24 2023-06-27 北京数美时代科技有限公司 Voice sample distributed sampling method, system, storage medium and electronic equipment
CN116503923B (en) * 2023-02-16 2023-12-08 深圳市博安智控科技有限公司 Method and device for training face recognition model
CN116388884B (en) * 2023-06-05 2023-10-20 浙江大学 Method, system and device for designing anti-eavesdrop ultrasonic interference sample

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568261B2 (en) * 2018-10-26 2023-01-31 Royal Bank Of Canada System and method for max-margin adversarial training
CN109902705A (en) * 2018-10-30 2019-06-18 华为技术有限公司 Method and apparatus for generating an object detection model resistant to adversarial perturbation
CN110459282B (en) * 2019-07-11 2021-03-09 新华三大数据技术有限公司 Sequence labeling model training method, electronic medical record processing method and related device
CN110781934A (en) * 2019-10-15 2020-02-11 深圳市商汤科技有限公司 Supervised learning and label prediction method and device, electronic equipment and storage medium
CN114936639A (en) * 2019-12-31 2022-08-23 北京航空航天大学 Progressive adversarial training method and device
CN111737952A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Training method and device for sequence labeling model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582793A (en) * 2018-11-23 2019-04-05 深圳前海微众银行股份有限公司 Model training method, customer service system, data labeling system, and readable storage medium
CN110532377A (en) * 2019-05-13 2019-12-03 南京大学 A semi-supervised text classification method based on adversarial training and an adversarial learning network
CN111091004A (en) * 2019-12-18 2020-05-01 上海风秩科技有限公司 Training method and training device for sentence entity labeling model and electronic equipment
CN111191453A (en) * 2019-12-25 2020-05-22 中国电子科技集团公司第十五研究所 Named entity recognition method based on adversarial training

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TAKERU MIYATO et al.: "Adversarial Training Methods for Semi-Supervised Text Classification", arXiv, 6 May 2017 (2017-05-06) *
ZHANG Xiaohui; YU Shuangyuan; WANG Quanxin; XU Baomin: "Text Representation and Classification Algorithm Based on Adversarial Training" (基于对抗训练的文本表示和分类算法), Computer Science (计算机科学), no. 1, 15 June 2020 (2020-06-15) *
DU Changshun: "Research on Key Technologies of Public-Opinion Sentiment Analysis for Specialized Domains" (面向细分领域的舆情情感分析关键技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology Series (monthly), no. 01, 15 January 2020 (2020-01-15), pages 1 *
HUANG Peixin; ZHAO Xiang; FANG Yang; ZHU Huiming; XIAO Weidong: "End-to-End Joint Extraction of Knowledge Triples Incorporating Adversarial Training" (融合对抗训练的端到端知识三元组联合抽取), Journal of Computer Research and Development (计算机研究与发展), no. 12, 15 December 2019 (2019-12-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021258914A1 (en) * 2020-06-24 2021-12-30 深圳前海微众银行股份有限公司 Method and apparatus for training sequence labeling model
CN112434213A (en) * 2020-10-15 2021-03-02 中国科学院深圳先进技术研究院 Network model training method, information pushing method and related device
CN112434213B (en) * 2020-10-15 2023-09-29 中国科学院深圳先进技术研究院 Training method of network model, information pushing method and related devices
CN114648028A (en) * 2020-12-21 2022-06-21 阿里巴巴集团控股有限公司 Method and device for training label model, electronic equipment and storage medium
CN113032560A (en) * 2021-03-16 2021-06-25 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method and equipment
CN113032560B (en) * 2021-03-16 2023-10-27 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method and equipment
WO2023045949A1 (en) * 2021-09-27 2023-03-30 华为技术有限公司 Model training method and related device
WO2023071581A1 (en) * 2021-10-29 2023-05-04 北京有竹居网络技术有限公司 Method and apparatus for determining response sentence, device, and medium
CN114138973A (en) * 2021-12-03 2022-03-04 大连海事大学 Log sequence anomaly detection method based on contrastive adversarial training

Also Published As

Publication number Publication date
WO2021258914A1 (en) 2021-12-30

Similar Documents

Publication Publication Date Title
CN111737952A (en) Training method and device for sequence labeling model
CN109726696B (en) Image description generation system and method based on attention-pushing mechanism
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN108305612A (en) Text-processing, model training method, device, storage medium and computer equipment
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN111177325B (en) Method and system for automatically generating answers
CN110678882B (en) Method and system for selecting answer spans from electronic documents using machine learning
CN111951780A (en) Speech synthesis multitask model training method and related equipment
CN110750630A (en) Generating type machine reading understanding method, device, equipment and storage medium
CN117173504A (en) Training method, training device, training equipment and training storage medium for text-generated graph model
CN109766550A (en) A text brand identification method, identification device and storage medium
CN111753076A (en) Dialogue method, dialogue device, electronic equipment and readable storage medium
CN109726400A (en) Entity word recognition result evaluation method, apparatus, equipment and entity word extraction system
CN110399472A (en) Interview question prompting method and apparatus, computer device and storage medium
CN112270181A (en) Sequence labeling method, system, computer readable storage medium and computer device
WO2023042045A1 (en) Convolution attention network for multi-label clinical document classification
CN117194619A (en) Multi-round dialogue question-answering method and system based on historical position coding
CN116542328A (en) Knowledge distillation method and device for CTR prediction model
CN115952266A (en) Question generation method and device, computer equipment and storage medium
CN112765936B (en) Training method and device for operation based on language model
CN115270795A (en) Small sample learning-based named entity recognition technology in environmental assessment field
CN114707518A (en) Semantic fragment-oriented target emotion analysis method, device, equipment and medium
CN111400484B (en) Keyword extraction method and system
CN114115878A (en) Workflow node recommendation method and device
CN110147881A (en) Language processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination