CN111737952A - Training method and device for sequence labeling model - Google Patents

Training method and device for sequence labeling model

Info

Publication number
CN111737952A
Authority
CN
China
Prior art keywords
model
training
sequence
sequence labeling
loss information
Prior art date
Legal status
Pending
Application number
CN202010591966.2A
Other languages
Chinese (zh)
Inventor
周楠楠
杨海军
徐倩
Current Assignee
WeBank Co Ltd
Original Assignee
WeBank Co Ltd
Priority date
Filing date
Publication date
Application filed by WeBank Co Ltd filed Critical WeBank Co Ltd
Priority to CN202010591966.2A priority Critical patent/CN111737952A/en
Publication of CN111737952A publication Critical patent/CN111737952A/en
Priority to PCT/CN2021/094180 priority patent/WO2021258914A1/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to the field of natural language processing, and in particular to a training method and device for a sequence labeling model, which enable the model to be trained effectively when the amount of sample data is insufficient. The method comprises the following steps: training the sequence labeling model based on a sample training sentence set to obtain first loss information; determining an adversarial perturbation factor according to the model parameters, and obtaining second loss information based on the sample training sentence set with the adversarial perturbation factor added; adjusting the model parameters of the sequence labeling model based on target loss information calculated from the first loss information and the second loss information; and performing iterative training until the convergence condition is determined to be met. In this way, by adding the adversarial perturbation factor, different loss information can be obtained from a single sample training sentence, so that the trained sequence labeling model has stronger generalization ability and higher accuracy, unnecessary noise interference is avoided, and resource consumption is reduced.

Description

Training method and device for sequence labeling model
Technical Field
The invention relates to the field of natural language processing, in particular to a training method and a training device for a sequence labeling model.
Background
Sequence labeling is an important and widely applied problem in the field of natural language processing. Once a constructed sequence labeling model has been trained on training samples, it can be used to perform sequence labeling on input sentences. In many cases, however, the amount of sample data available for training the sequence labeling model is insufficient.
In the prior art, in order to obtain a sufficient sample size, data augmentation is usually adopted to derive multiple pieces of sample data from a single piece, and the sequence labeling model is then trained on the derived sample data. However, training on sample data generated by data augmentation introduces noise, which greatly affects the accuracy of the sequence labeling model and, in turn, the accuracy of the sequence labeling.
Disclosure of Invention
The invention provides a training method and device for a sequence labeling model, to solve the prior-art problem that an effective sequence labeling model cannot be obtained when the amount of sample data is insufficient.
The specific technical scheme provided by the invention is as follows:
in a first aspect, a method for training a sequence annotation model is provided, including:
acquiring a sequence labeling model to be trained and a sample training sentence set;
training the sequence labeling model based on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
calculating target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when a preset convergence condition is met.
Optionally, the determining an adversarial perturbation factor according to the model parameters of the sequence labeling model includes:
acquiring the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used for adjusting the strength of the generated adversarial perturbation.
Optionally, the calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter includes:
acquiring the preset hyper-parameter, and dividing the product of the hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the method further includes:
acquiring an original learning rate set for a pre-training model in the sequence labeling model, wherein the pre-training model is used for generating a corresponding word vector set based on an input sample training sentence;
according to layer coefficients preset corresponding to each level in the pre-training model, and in combination with the original learning rate, respectively determining the learning rate corresponding to each level, wherein the learning rate is used for representing the adjustment range of model parameters corresponding to each level;
the adjusting and iterative training of the model parameters of the sequence labeling model based on the target loss information comprises:
and adjusting the model parameters of each level in the sequence labeling model in an error back propagation mode based on the determined learning rate of each level in the sequence labeling model and the target loss information.
Optionally, before the obtaining of the sequence labeling model to be trained and the sample training sentence set, the method further includes:
obtaining a plurality of training sentences, determining the sentence length of each training sentence, and performing any one of the following operations on each of the plurality of training sentences based on its sentence length:
if the sentence length does not reach a preset fixed sentence length, padding the training sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncating the part of the training sentence exceeding the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly taking the training sentence as a sample training sentence.
Optionally, training the sequence labeling model based on the sample training sentence set to obtain first loss information, including:
inputting each sample training sentence in the sample training sentence set into a sequence labeling model, and respectively executing the following operations for each sample training sentence input into the sequence labeling model:
determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first prediction labeling information;
and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, the training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information includes:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second prediction labeling information;
and calculating to obtain second loss information based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, the determining that the preset convergence condition is met includes:
determining, in N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and the prediction accuracy in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset accuracy difference range; or,
determining, in M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and the target loss in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset loss difference range; or,
determining that the preset convergence condition is met when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after outputting the trained sequence labeling model, the method further includes:
acquiring a sentence to be processed, and calling the sequence labeling model to perform sequence labeling processing on the sentence to be processed to obtain output prediction labeling information.
In a second aspect, a training apparatus for a sequence labeling model is provided, including:
the acquisition unit is used for acquiring a sequence labeling model to be trained and a sample training sentence set;
the training unit is used for training the sequence labeling model based on the sample training sentence set to obtain first loss information;
the determining unit is used for determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
and the adjusting unit is used for calculating target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when a preset convergence condition is met.
Optionally, when determining the adversarial perturbation factor according to the model parameters of the sequence labeling model, the determining unit is configured to:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used for adjusting the strength of the generated adversarial perturbation.
Optionally, when calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter, the determining unit is configured to:
acquiring the preset hyper-parameter, and dividing the product of the hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the adjusting unit is further configured to:
acquiring an original learning rate set for a pre-training model in the sequence labeling model, wherein the pre-training model is used for generating a corresponding word vector set based on an input sample training sentence;
according to layer coefficients preset corresponding to each level in the pre-training model, and in combination with the original learning rate, respectively determining the learning rate corresponding to each level, wherein the learning rate is used for representing the adjustment range of model parameters corresponding to each level;
the adjusting and iterative training of the model parameters of the sequence labeling model based on the target loss information comprises:
and adjusting the model parameters of each level in the sequence labeling model in an error back propagation mode based on the determined learning rate of each level in the sequence labeling model and the target loss information.
Optionally, before the obtaining of the sequence labeling model to be trained and the sample training sentence set, the obtaining unit is further configured to:
obtaining a plurality of training sentences, determining the sentence length of each training sentence, and performing any one of the following operations on each of the plurality of training sentences based on its sentence length:
if the sentence length does not reach a preset fixed sentence length, padding the training sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncating the part of the training sentence exceeding the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly taking the training sentence as a sample training sentence.
Optionally, when the sequence labeling model is trained based on the sample training sentence set to obtain first loss information, the training unit is configured to:
inputting each sample training sentence in the sample training sentence set into a sequence labeling model, and respectively executing the following operations for each sample training sentence input into the sequence labeling model:
determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first prediction labeling information;
and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information, the determining unit is configured to:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second prediction labeling information;
and calculating to obtain second loss information based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when it is determined that the preset convergence condition is satisfied, the adjusting unit is configured to:
determining, in N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and the prediction accuracy in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset accuracy difference range; or,
determining, in M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and the target loss in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset loss difference range; or,
determining that the preset convergence condition is met when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after outputting the trained sequence labeling model, the adjusting unit is further configured to:
acquiring a sentence to be processed, and calling the sequence labeling model to perform sequence labeling processing on the sentence to be processed to obtain output prediction labeling information.
In a third aspect, a training apparatus for a sequence labeling model is provided, including:
a memory for storing executable instructions;
and the processor is used for reading and executing the executable instructions stored in the memory so as to realize the training method of the sequence labeling model.
In a fourth aspect, a storage medium is provided, wherein when instructions in the storage medium are executed by a processor, the processor is enabled to execute the training method of the sequence labeling model described in any one of the above.
The invention has the following beneficial effects:
the method comprises the steps of obtaining a sequence marking model to be trained and a sample training sentence set, then training the sequence marking model based on the sample training sentence set to obtain first loss information, determining an anti-disturbance factor according to model parameters of the sequence marking model, training the sequence marking model based on the sample training sentence set added with the anti-disturbance factor to obtain second loss information, then calculating to obtain target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence marking model based on the target loss information, performing iterative training, and outputting the trained sequence marking model when a preset convergence condition is met.
Therefore, by adding the anti-disturbance factor in the sequence labeling model, different loss information can be obtained based on one sample training sentence, so that the generalization capability of the sequence labeling model obtained by training is stronger, the precision is higher, unnecessary noise interference is avoided, and the resource consumption is saved. In addition, a large number of training samples do not need to be labeled manually, a large amount of labor cost and time can be saved, and therefore the training efficiency of the sequence labeling model can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the present invention, and for those skilled in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a flow chart of training a sequence annotation model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a logic structure of a training apparatus for a sequence annotation model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of an entity structure of a training apparatus for a sequence annotation model according to an embodiment of the present invention.
Detailed Description
In order to effectively train a sequence labeling model when the amount of sample data is insufficient, an embodiment of the present invention acquires a sequence labeling model to be trained and a sample training sentence set, trains the sequence labeling model based on the sample training sentence set to obtain first loss information, determines an adversarial perturbation factor according to the model parameters of the sequence labeling model, and trains the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information. Target loss information is then calculated based on the first loss information and the second loss information, the model parameters of the sequence labeling model are adjusted based on the target loss information, iterative training is performed, and the trained sequence labeling model is output when the preset convergence condition is met.
Preferred embodiments of the present invention will be described in further detail with reference to the accompanying drawings.
Referring to fig. 1, in the embodiment of the present invention, a training process of a sequence labeling model is as follows:
s101: and acquiring a sequence marking model to be trained and a sample training sentence set.
Specifically, first, sample data is acquired, and the statement length of each sample data is determined.
For example, sample data 1 {Xiaoming goes to the bank today and repays two thousand yuan} is acquired and its sentence length is determined to be 12 characters; sample data 2 {Xiaoli goes to school every day to study} is read and its sentence length is determined to be 9 characters (the lengths count the characters of the original Chinese sentences).
Further, after the sentence length of each piece of sample data is determined, each piece of sample data is processed based on its sentence length to obtain a sample training sentence, until all sample data are processed. The processing of each piece of sample data includes, but is not limited to, the following cases:
In the first case: the sentence length of the sample data does not reach the preset fixed sentence length.
If the sentence length of a piece of sample data does not reach the preset fixed sentence length, the sample data is padded with preset characters to generate a sample training sentence.
For example, assume the preset fixed sentence length is 128 characters and the preset character is "0". The sentence length of sample data 1 is 12 characters, which does not reach 128 characters, so sample data 1 is padded with the character "0" to generate sample training sentence 1.
In the second case: the sentence length of the sample data exceeds the preset fixed sentence length.
If the sentence length of a piece of sample data exceeds the preset fixed sentence length, the part of the sample data exceeding the fixed sentence length is truncated to generate a sample training sentence.
For example, assume the preset fixed sentence length is 128 characters. The sentence length of sample data X is 130 characters, which exceeds the preset fixed sentence length of 128 characters, so the part of sample data X beyond 128 characters is truncated to generate sample training sentence X.
In the third case: the sentence length of the sample data reaches the preset fixed sentence length.
If the sentence length of a piece of sample data equals the preset fixed sentence length, the sample data is directly taken as a sample training sentence.
For example, if the fixed sentence length is 128 characters and the sentence length of sample data N is 128 characters, sample data N is directly used as sample training sentence N.
It should be noted that, in the embodiment of the present invention, before the sentence length of a piece of sample data is determined, the sample data is processed with a preset sentence start tag [CLS] and a preset sentence end tag [SEP]; that is, the [CLS] tag is added at the start of the sample data and the [SEP] tag at its end.
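As an illustration of the length normalization above, the following is a minimal Python sketch under the stated assumptions (the fixed length of 128, the padding character "0", and the [CLS]/[SEP] tags come from the examples; helper names such as pad_or_truncate are hypothetical, not part of the patent):

```python
FIXED_LEN = 128   # preset fixed sentence length from the example
PAD_CHAR = "0"    # preset padding character from the example

def pad_or_truncate(sample: str) -> str:
    """Normalize one piece of sample data to the fixed sentence length."""
    # Add the sentence start/end tags before the length is determined,
    # as noted above; the tags are written inline for illustration only.
    tagged = "[CLS]" + sample + "[SEP]"
    if len(tagged) < FIXED_LEN:      # case 1: pad with the preset character
        return tagged + PAD_CHAR * (FIXED_LEN - len(tagged))
    if len(tagged) > FIXED_LEN:      # case 2: truncate the excess part
        return tagged[:FIXED_LEN]
    return tagged                    # case 3: already the fixed length

raw_samples = ["Xiaoming goes to the bank today and repays two thousand yuan"]
sample_training_sentences = [pad_or_truncate(s) for s in raw_samples]
```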
Further, in the embodiment of the present invention, a sample training sentence set is obtained from the sample training sentences generated by the above processing.
It should be noted that, in the embodiment of the present invention, the obtained sequence labeling model is constructed on a model architecture of Bidirectional Encoder Representations from Transformers (BERT) + Bidirectional Long Short-Term Memory network (BiLSTM) + Conditional Random Field (CRF), where the BERT model is the pre-training model in the sequence labeling model.
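For concreteness, a BERT+BiLSTM+CRF model of this kind might be assembled as in the sketch below. This is an assumption of this write-up rather than part of the patent: PyTorch with the transformers and pytorch-crf packages is assumed, and names such as SeqLabelModel are illustrative.

```python
import torch
import torch.nn as nn
from transformers import BertModel
from torchcrf import CRF  # from the pytorch-crf package

class SeqLabelModel(nn.Module):
    """Illustrative BERT + BiLSTM + CRF sequence labeling model."""
    def __init__(self, num_tags: int, hidden: int = 256):
        super().__init__()
        # BERT is the pre-training model that produces the word vectors
        self.bert = BertModel.from_pretrained("bert-base-chinese")
        self.bilstm = nn.LSTM(768, hidden, batch_first=True, bidirectional=True)
        self.emit = nn.Linear(2 * hidden, num_tags)  # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)

    def embed(self, input_ids, attention_mask):
        # One 768-dimensional word vector per character: the "first word vector set"
        return self.bert(input_ids, attention_mask=attention_mask).last_hidden_state

    def loss(self, word_vectors, tags, mask):
        feats, _ = self.bilstm(word_vectors)
        emissions = self.emit(feats)
        # The CRF returns a log-likelihood; its negation is used as the loss
        return -self.crf(emissions, tags, mask=mask)
```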
S102: and training the sequence labeling model based on the sample training sentence set to obtain first loss information.
In the embodiment of the invention, after a sample training sentence set is obtained, a sequence labeling model is trained based on the sample sequence sentence set to obtain first loss information.
It should be noted that, in the embodiment of the present invention, when the sequence marker model is trained, in the iterative process, the sample training statement is read and processed in a batch processing manner. That is, according to the preset batch processing size, it is determined that a corresponding number of sample training sentences are read each time to perform model training.
For example, assuming that the preset batch size is 32, it is determined that 32 sample training sentences are read each time for model training.
For another example, assuming that the preset batch size is 64, it is determined that 64 sample training sentences are read each time for model training.
For convenience of description, the following describes the training process by taking only the example of inputting a sample training sentence into the sequence labeling model.
Specifically, each sample training sentence in the sample training sentence set is input into a sequence labeling model, and for each sample training sentence input into the sequence labeling model, the following operations are respectively performed:
s1: determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set.
Specifically, after a sample training sentence is input into a sequence labeling model, a corresponding first word vector set is generated based on word vectors corresponding to each character in the sample training sentence and output by a pre-training model in the sequence labeling model.
For example, after a sample training sentence {Xiaoming is reading a book} is input into the sequence labeling model, the word vector corresponding to each of its five characters is determined (word vectors 1 through 5), so the first word vector set obtained for this sample training sentence contains word vectors 1-5, each of which is 768-dimensional.
S2: and carrying out entity labeling on a sample training sentence based on the word vector set to obtain corresponding first prediction labeling information.
For example, after determining a word vector set corresponding to a sample training sentence, obtaining a prediction result of each word vector, for example, for a sample training sentence { two thousand yuan is paid by going to bank today in a mingming mode, corresponding first prediction marking information is a small (B-NAM) mingming (E-NAM) day (O) going to a silver (O) line (O) and a small (B-NAM) money (O) two thousand yuan (O) in a representation sample training sentence, wherein the small is the beginning of a name of a person, the mingming is the end of the name of the person, and the current, day, going to the silver, line, and money, two, thousand yuan are classified as other marks.
S3: and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to a sample training sentence, wherein the first loss information is recorded as Le.
Specifically, a BilSTM + CRF model in the sequence labeling model determines a labeling difference between first prediction labeling information and real labeling information based on the obtained first prediction labeling information and the real labeling information corresponding to a sample training sentence, and calculates a first loss corresponding to the current sequence labeling model in a targeted manner.
For example, for a sample training sentence { Xiaoming two thousand yuan for today to go to bank, the corresponding real labeling information is: small (B-NAM) day (E-TIM) to silver (B-LOC) line (E-LOC) and also (O) two (B-MON) thousand (I-MON) elements (E-MON), where "B-" indicates the beginning of the labeled element, "I-" indicates the middle of the labeled element, and "E-" indicates the end of the labeled element. And then, calculating a first loss corresponding to the current sequence labeling model based on the difference between the real labeling information and the first labeling information obtained by the sequence labeling model.
S103: and determining a counterdisturbance factor according to the model parameters of the sequence labeling model.
In the embodiment of the invention, the current model parameters of the sequence labeling model are obtained, the gradient of the sequence labeling model is calculated based on the model parameters, and the disturbance resisting factor is calculated based on the gradient and the preset hyper-parameters, wherein the hyper-parameters are used for adjusting the strength of the generated disturbance resisting factor.
Specifically, the gradient g of the sequence labeling model is calculated based on the current model parameter of the sequence labeling model, further, in the embodiment of the present invention, a preset hyper-parameter is obtained, and the product of the obtained hyper-parameter and the gradient is subjected to quotient with the norm of the gradient to obtain the disturbance rejection factor.
The adversarial perturbation factor r is calculated according to the following formula:
r = ε·g / ||g||2
where g denotes the gradient of the sequence labeling model, ε denotes the preset hyper-parameter that adjusts the perturbation strength, and ||g||2 denotes the L2 norm of the gradient.
S104: and training the sequence labeling model based on the sample training sentence set added with the anti-disturbance factor to obtain second loss information.
Specifically, the process of inputting a sample training sentence in the sample training set into the sequence labeling model in S102 to obtain the second loss information is described.
And adding the confrontation disturbance factor into the first word vector set obtained in the step S102 to obtain a second word vector set, and then carrying out entity labeling on the sample training sentence by a BilSTM + CRF model in the sequence labeling model based on the second word vector set to obtain corresponding second prediction labeling information.
For example, after an anti-disturbance factor is added to a first word vector set corresponding to a sample training statement { two thousand yuan for every day of Mingming and repayment for banking }, each word vector in the first word vector set is disturbed to generate a second word vector set. The sequence annotation model output is then obtained such as: and the second prediction marking information of the small (B-NAM) amine (E-NAM) is obtained by removing the second prediction marking information of the silver (O) line (E-NAM) and the second prediction marking information of the silver (O) line (E-NA.
Further, second loss information is calculated based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Specifically, based on the obtained second prediction labeling information and the real labeling information corresponding to the sample data, the labeling difference between them is determined, and the second loss of the current sequence labeling model, denoted Lr, is calculated accordingly.
For example, for the sample training sentence {Xiaoming goes to the bank today and repays two thousand yuan}, the corresponding real labeling information is: Xiao (B-NAM), ming (E-NAM), jin (B-TIM), tian (E-TIM), qu (O), yin (B-LOC), hang (E-LOC), huan (O), kuan (O), liang (B-MON), qian (I-MON), yuan (E-MON), while the second prediction labeling information obtained for the sentence is: Xiao (B-NAM), ming (E-NAM), jin (O), tian (O), qu (O), yin (B-NAM), hang (E-NAM), huan (O), kuan (O), liang (O), qian (O), yuan (O). The second loss of the current sequence labeling model is then obtained based on the difference between the second prediction labeling information and the real labeling information.
In this way, on the basis of a single sample training sentence, different prediction labeling information output by the sequence labeling model is obtained once the adversarial perturbation factor is added. Under the condition of limited samples, the effective number of training sentences can thus be increased without introducing noise, and training on the sample training sentences with the adversarial perturbation factor added gives the sequence labeling model stronger generalization ability and higher accuracy.
S105: and calculating target loss information based on the first loss information and the second loss information, and adjusting model parameters of the sequence labeling model based on the target loss information and performing iterative training.
Specifically, after an Le determined for the difference between the first prediction labeling information and the real labeling information and an Lr determined for the difference between the second prediction labeling information and the real labeling information are obtained, the sum of the Le and the Lr is further used as target loss information, and the target loss information is denoted as L.
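Putting S102-S105 together, a single training step of this scheme could be sketched as below, reusing the SeqLabelModel and adversarial_perturbation sketches given earlier; this remains an illustrative assumption, not the patent's prescribed implementation:

```python
def train_step(model, optimizer, input_ids, attention_mask, tags, epsilon=1.0):
    optimizer.zero_grad()
    mask = attention_mask.bool()

    # S102: forward pass on the clean word vectors -> first loss Le
    word_vecs = model.embed(input_ids, attention_mask)
    word_vecs.retain_grad()              # keep the gradient w.r.t. the word vectors
    le = model.loss(word_vecs, tags, mask)
    le.backward(retain_graph=True)       # populates word_vecs.grad and parameter grads

    # S103: adversarial perturbation factor from the gradient
    r = adversarial_perturbation(word_vecs.grad, epsilon)

    # S104: forward pass on the perturbed word vectors -> second loss Lr
    lr_loss = model.loss(word_vecs + r, tags, mask)

    # S105: target loss L = Le + Lr; the two backward passes together
    # accumulate the gradient of L, which the optimizer then applies
    lr_loss.backward()
    optimizer.step()
    return (le + lr_loss).item()
```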
It should be noted that, in the embodiment of the present invention, before the model parameters of the sequence labeling model are adjusted and iterative training is performed based on the target loss information, a learning rate needs to be configured for each layer of the pre-training model in the sequence labeling model. Specifically, the original learning rate set for the pre-training model is acquired, where the pre-training model is used to generate the corresponding word vector set from an input sample training sentence; then, according to the layer coefficient preset for each level of the pre-training model, combined with the original learning rate, the learning rate corresponding to each level is determined, where the learning rate represents the adjustment range of the model parameters at that level.
It should be noted that the learning rate differs across the levels of the pre-training model. Generally, the upper layers of the pre-training model contain more semantic-level information, the middle layers contain syntactic-level information, and the bottom layers contain phrase-level information. Therefore, when learning rates are configured, a higher learning rate is generally set for the upper layers of the pre-training model so that the upper-layer parameters change more, and a lower learning rate is set for the bottom layers so that the bottom-layer parameters change less.
Specifically, the learning rate of each layer in the pre-training model in the sequence labeling model is calculated by adopting the following two modes:
in a first way,
Calculating the learning rate of each layer in a pre-training model in the sequence labeling model according to the following formula:
Li = L0 / Ci
where Li represents the learning rate of the i-th layer, L0 represents the original learning rate configured for the pre-training model, and Ci represents the layer coefficient of the i-th layer.
It should be noted that a smaller i indicates a higher level. Taking a pre-training model with three levels as an example, C1 represents the layer coefficient of the upper level of the pre-training model, C2 that of the middle level, and C3 that of the bottom level, where the value of Ci may be adjusted according to actual processing requirements.
The second way:
Calculating the learning rate of each layer in a pre-training model in the sequence labeling model by adopting the following formula:
Li+1 = Li / C
where Li denotes the learning rate of the i-th layer, Li+1 denotes the learning rate of the (i+1)-th layer, and C is a fixed parameter.
For example, if the initial learning rate L1 of layer 1 is set to 0.025 and the fixed parameter C is set to 5, the learning rate of layer 2 is 0.005 and the learning rate of layer 3 is 0.001.
It should be noted that a smaller i indicates a higher level. Taking a pre-training model with three levels as an example, L1 represents the learning rate of the upper layer of the pre-training model, L2 that of the middle layer, and L3 that of the bottom layer.
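Either scheme can be realized with per-layer parameter groups in the optimizer. The sketch below uses the second formula with the example values L1 = 0.025 and C = 5; the three-level grouping of BERT's modules is an assumption made for illustration:

```python
bert = model.bert
levels = [bert.encoder.layer[8:],    # upper layers: more semantic information
          bert.encoder.layer[4:8],   # middle layers: syntactic information
          list(bert.encoder.layer[:4]) + [bert.embeddings]]  # bottom: phrase-level

param_groups, lr = [], 0.025         # L1, the learning rate of the top level
for level in levels:
    params = [p for module in level for p in module.parameters()]
    param_groups.append({"params": params, "lr": lr})
    lr /= 5                          # L(i+1) = Li / C

optimizer = torch.optim.SGD(param_groups)  # level learning rates: 0.025, 0.005, 0.001
```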
In this way, in view of the insufficient sample size, different learning rates are configured for the different levels of the pre-training model when it is adjusted, so that the upper-layer parameters of the pre-training model change more and the bottom-layer parameters change less. This makes the parameter adjustment of the pre-training model more principled and yields a sequence labeling model of higher accuracy.
Further, based on the determined learning rate of each level in the sequence labeling model and the target loss information, the model parameters of each level in the sequence labeling model are adjusted by error back-propagation. The specific process may be as follows: calculate the partial derivative of the target loss information with respect to the model parameters of each level, calculate the product of each level's learning rate and the calculated partial derivative, and adjust each level's model parameters using the corresponding product; that is, each adjusted model parameter is the difference between the parameter before adjustment and the corresponding product.
For example, take the adjustment of a model parameter W1 at some level of the sequence labeling model. First, the partial derivative of the target loss information with respect to W1 is calculated from the computed target loss information; the updated W1 is then the value of W1 before adjustment minus the product of the partial derivative and the learning rate. Assuming the original W1 is 0.4, the learning rate is 0.01, and the calculated partial derivative is 0.04728, the updated W1 is 0.4 − 0.01 × 0.04728 = 0.3995272.
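The single-parameter update in this example can be checked directly (plain Python, numbers taken from the example):

```python
w1, lr, grad = 0.4, 0.01, 0.04728
w1_new = w1 - lr * grad   # gradient-descent update with the level's learning rate
print(w1_new)             # 0.3995272 (up to floating-point rounding)
```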
In the embodiment of the present invention, iterative training is performed over the sample training sentence set. Specifically, during training of the sequence labeling model, target loss information is obtained based on a sample training sentence; after one adjustment of the sequence labeling model is completed, a new sample training sentence is taken from the sample training sentence set and input into the adjusted sequence labeling model for the next round of training. This is repeated until the convergence condition is determined to be satisfied, which is not described again here.
This is equivalent to adding the adversarial perturbation to the loss of the sequence labeling model: when the sequence labeling model is adjusted based on target loss information that incorporates the adversarial perturbation factor, its generalization ability becomes stronger and its accuracy higher.
S106: When the preset convergence condition is met, output the trained sequence labeling model.
Specifically, in the embodiment of the present invention, after determining the target loss information of the sequence labeling model, the following methods may be adopted to determine that the preset convergence condition is satisfied:
the first mode is as follows: and determining the difference between the prediction accuracy of the sample training sentence in each iteration process and the prediction accuracy of the sample training sentence in the previous iteration process in the continuous N iteration processes, and determining that the preset convergence condition is reached when the preset accuracy difference range is met.
It should be noted that the prediction accuracy can be measured by comparing the prediction labeling information with the real labeling information, that is, as the percentage of correctly labeled content in the prediction labeling information.
For example, it is assumed that after a sample training sentence including 50 characters is sequence-labeled by the sequence labeling model, 40 character sequences are determined to be correctly labeled, and thus, the accuracy of the sequence labeling model is 80%.
In the embodiment of the invention, the value of N can be set according to the actual application scene.
For example, assume N is 2 and the preset accuracy difference range is 1% to 5%. The prediction accuracy for the sample training sentences is 80% in the 10th iteration, 75% in the 9th iteration, and 70% in the 8th iteration. The difference between the 10th and 9th iterations is 5%, and the difference between the 9th and 8th iterations is also 5%. For 2 consecutive iterations, the difference between the prediction accuracy of each iteration and that of the previous iteration therefore falls within the preset accuracy difference range, and the preset convergence condition is judged to be met.
The third mode is as follows: and determining the difference value between the target loss information of the sequence marking model in each iteration process and the target loss information of the sequence marking model in the previous iteration process in the continuous M iteration processes, and determining that the preset convergence condition is reached when the preset accuracy difference value range is met.
It should be noted that, in the embodiment of the present invention, the value of M may be set according to an actual application scenario.
For example, assume M is 5 and the preset loss difference range is within 2.5%. The target loss information of the sequence labeling model is 7.5% in the 30th iteration, 8.4% in the 29th, 9.2% in the 28th, 10.3% in the 27th, 11.6% in the 26th, and 13.0% in the 25th. The differences in target loss information between adjacent iterations from the 25th to the 30th are therefore 1.4%, 1.3%, 1.1%, 0.8%, and 0.9%; since the differences for 5 consecutive iterations all fall within the loss difference range, the preset convergence condition is determined to be met.
The third mode: when the current number of iterations reaches the preset maximum number of iterations, determine that the preset convergence condition is met.
For example, assuming that the preset maximum number of iterations is 50, when the current number of iterations reaches 50, it is determined that the preset convergence condition is satisfied.
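A convergence check in the spirit of the second mode can be sketched as a sliding window over the recorded target losses (the window size and threshold follow the example above; all names are illustrative):

```python
def converged(losses, m=5, max_diff=0.025):
    """Second mode: differences of the last m adjacent target losses all within range."""
    if len(losses) < m + 1:
        return False
    recent = losses[-(m + 1):]
    diffs = [abs(a - b) for a, b in zip(recent, recent[1:])]
    return all(d <= max_diff for d in diffs)

history = [0.130, 0.116, 0.103, 0.092, 0.084, 0.075]  # losses from the example
print(converged(history))  # True: diffs 0.014, 0.013, 0.011, 0.008, 0.009
```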
Further, when any one of the above convergence conditions is determined to be satisfied, the sequence labeling model can be judged to have converged, and the trained sequence labeling model is output.
Further, in practical application, after a sentence to be processed is acquired, the sequence labeling model is called to perform sequence labeling processing on the sentence to be processed, so as to obtain the output prediction labeling information.
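Calling the trained model on a new sentence could then look like the following sketch (the tokenizer and the decoding step are assumptions of this write-up, reusing the SeqLabelModel sketch):

```python
from transformers import BertTokenizerFast

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
enc = tokenizer("小明今天去银行还款两千元", return_tensors="pt",
                padding="max_length", truncation=True, max_length=128)

model.eval()
with torch.no_grad():
    vecs = model.embed(enc["input_ids"], enc["attention_mask"])
    feats, _ = model.bilstm(vecs)
    tag_ids = model.crf.decode(model.emit(feats),
                               mask=enc["attention_mask"].bool())  # Viterbi decoding
print(tag_ids[0])  # predicted tag indices, mapping to labels such as B-NAM, E-NAM, O
```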
Based on the same inventive concept, referring to fig. 2, in an embodiment of the present invention, a training apparatus for a sequence annotation model is provided, which at least includes: an acquisition unit 201, a training unit 202, a determination unit 203 and an adjustment unit 204, wherein,
an acquisition unit 201, configured to acquire a sequence labeling model to be trained and a sample training sentence set;
a training unit 202, configured to train the sequence labeling model based on the sample training sentence set to obtain first loss information;
a determining unit 203, configured to determine an adversarial perturbation factor according to the model parameters of the sequence labeling model, and train the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
an adjusting unit 204, configured to calculate target loss information based on the first loss information and the second loss information, adjust model parameters of the sequence labeling model based on the target loss information, perform iterative training, and output the trained sequence labeling model when it is determined that a preset convergence condition is satisfied.
Optionally, when determining the adversarial perturbation factor according to the model parameters of the sequence labeling model, the determining unit 203 is configured to:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used for adjusting the strength of the generated adversarial perturbation.
Optionally, when calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter, the determining unit 203 is configured to:
acquiring the preset hyper-parameter, and dividing the product of the hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the adjusting unit 204 is further configured to:
acquiring an original learning rate set for a pre-training model in the sequence labeling model, wherein the pre-training model is used for generating a corresponding word vector set based on an input sample training sentence;
according to layer coefficients preset corresponding to each level in the pre-training model, and in combination with the original learning rate, respectively determining the learning rate corresponding to each level, wherein the learning rate is used for representing the adjustment range of model parameters corresponding to each level;
the adjusting and iterative training of the model parameters of the sequence labeling model based on the target loss information comprises:
and adjusting the model parameters of each level in the sequence labeling model in an error back propagation mode based on the determined learning rate of each level in the sequence labeling model and the target loss information.
Optionally, before the obtaining of the sequence labeling model to be trained and the sample training sentence set, the obtaining unit 201 is further configured to:
obtaining a plurality of training sentences, determining the sentence length of each training sentence, and performing any one of the following operations on each of the plurality of training sentences based on its sentence length:
if the sentence length does not reach a preset fixed sentence length, padding the training sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, truncating the part of the training sentence exceeding the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, directly taking the training sentence as a sample training sentence.
Optionally, when the sequence labeling model is trained based on the sample training sentence set to obtain first loss information, the training unit 202 is configured to:
inputting each sample training sentence in the sample training sentence set into a sequence labeling model, and respectively executing the following operations for each sample training sentence input into the sequence labeling model:
determining word vectors corresponding to characters in a sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first prediction labeling information;
and calculating to obtain first loss information based on the labeling difference between the first prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information, the determining unit 203 is configured to:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second prediction labeling information;
and calculating to obtain second loss information based on the labeling difference between the second prediction labeling information and the real labeling information corresponding to the sample training sentence.
Optionally, when it is determined that the preset convergence condition is satisfied, the adjusting unit 204 is configured to:
determining, in N consecutive iterations, the difference between the prediction accuracy for the sample training sentences in each iteration and the prediction accuracy in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset accuracy difference range; or,
determining, in M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and the target loss in the previous iteration, and determining that the preset convergence condition is met when each difference falls within a preset loss difference range; or,
determining that the preset convergence condition is met when the current number of iterations reaches a preset maximum number of iterations.
Optionally, after outputting the trained sequence tagging model, the adjusting unit 204 is further configured to:
acquiring a sentence to be processed, and calling the sequence labeling model to perform sequence labeling processing on the sentence to be processed to obtain output prediction labeling information.
Based on the same inventive concept, referring to fig. 3, the disclosed embodiment provides a training apparatus for a sequence annotation model, which at least includes:
a memory 301 for storing executable instructions;
a processor 302 for reading and executing the executable instructions stored in the memory, and performing the following processes:
acquiring a sequence labeling model to be trained and a sample training sentence set;
training the sequence labeling model based on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set with the adversarial perturbation factor added to obtain second loss information;
calculating target loss information based on the first loss information and the second loss information, adjusting model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when a preset convergence condition is met.
Optionally, when determining the disturbance rejection factor according to the model parameter of the sequence labeling model, the processor 302 is configured to:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the disturbance resisting factor based on the gradient and the preset hyper-parameters, wherein the hyper-parameters are used for adjusting the strength of the generated disturbance resisting.
Optionally, when calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter, the processor 302 is configured to:
acquire the preset hyper-parameter, and divide the product of the acquired hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
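This is the fast-gradient-method style perturbation of Miyato et al. (cited in the non-patent references below). A minimal sketch; the small additive constant guarding against a zero-norm gradient is an added assumption:

```python
import torch

def adversarial_perturbation(grad: torch.Tensor, epsilon: float) -> torch.Tensor:
    # r_adv = (epsilon * g) / ||g||: the quotient of the product of the
    # preset hyper-parameter and the gradient, and the norm of the gradient.
    return epsilon * grad / (grad.norm() + 1e-12)  # 1e-12 avoids division by zero
```

A larger epsilon produces a stronger perturbation and hence a harder adversarial example; dividing by the norm keeps the perturbation's magnitude controlled by epsilon alone.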
Optionally, before adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the processor 302 is further configured to:
acquire the original learning rate set for the pre-training model within the sequence labeling model, wherein the pre-training model is used to generate a corresponding word vector set based on an input sample training sentence;
and determine, for each layer of the pre-training model, the learning rate of that layer according to the layer coefficient preset for that layer in combination with the original learning rate, wherein the learning rate characterizes the adjustment range of the model parameters of that layer.
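Per-layer learning rates are commonly realized as optimizer parameter groups. The following sketch assumes a geometric layer coefficient and a bottom-to-top ordering of the pre-training model's layers; both are illustrative choices, not the claimed scheme:

```python
import torch

def layerwise_param_groups(layers, original_lr, layer_coeff=0.95):
    # One parameter group per layer; lower layers, which carry more general
    # language knowledge, receive a smaller learning rate (i.e. a smaller
    # parameter adjustment range) than the task-specific upper layers.
    num_layers = len(layers)
    return [
        {"params": layer.parameters(),
         "lr": original_lr * layer_coeff ** (num_layers - 1 - depth)}
        for depth, layer in enumerate(layers)
    ]

# Usage with assumed names:
#   optimizer = torch.optim.AdamW(layerwise_param_groups(encoder_layers, 2e-5))
```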
Optionally, before acquiring the sequence labeling model to be trained and the sample training sentence set, the processor 302 is further configured to:
obtain a plurality of raw sentences, determine the sentence length of each raw sentence, and perform one of the following operations on each raw sentence according to its sentence length:
if the sentence length is below a preset fixed sentence length, pad the raw sentence with preset characters to generate a sample training sentence; or,
if the sentence length exceeds the preset fixed sentence length, cut off the part beyond the fixed sentence length to generate a sample training sentence; or,
if the sentence length equals the preset fixed sentence length, use the raw sentence directly as a sample training sentence.
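The length normalization above amounts to a few lines of code; the `[PAD]` placeholder stands in for whatever preset character is chosen:

```python
def to_fixed_length(chars, fixed_len, pad="[PAD]"):
    # Normalize one raw sentence (as a list of characters) to the preset
    # fixed sentence length by padding or truncation.
    if len(chars) < fixed_len:
        return chars + [pad] * (fixed_len - len(chars))  # too short: pad
    return chars[:fixed_len]   # too long: truncate; no-op if exactly fixed_len
```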
Optionally, when training the sequence labeling model based on the sample training sentence set to obtain the first loss information, the processor 302 is configured to:
input each sample training sentence of the sample training sentence set into the sequence labeling model, and perform the following operations for each input sample training sentence:
determine the word vector corresponding to each character of the sample training sentence, and generate the corresponding first word vector set;
perform entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first predicted labeling information;
and calculate the first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
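Expressed as code, the first loss is an ordinary supervised tagging loss over the unperturbed vectors. The sketch reuses the assumed `model.embed`, `model.tagger`, and `tagging_loss` names from the training-step sketch above, and also returns the first word vector set, since the second loss reuses it:

```python
def first_pass(model, chars, gold_tags):
    # One word vector per character: the first word vector set.
    vectors = model.embed(chars)          # (batch, seq_len, hidden)
    # Entity labeling over the first word vector set yields the first
    # predicted labeling information (one tag score vector per character).
    logits = model.tagger(vectors)        # (batch, seq_len, num_tags)
    # First loss: labeling difference against the ground-truth tags.
    return tagging_loss(logits, gold_tags), vectors
```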
Optionally, when training the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain the second loss information, the processor 302 is configured to:
add the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
perform entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second predicted labeling information;
and calculate the second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
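Continuing with the same assumed names, the second loss differs from the first only in the vectors being labeled; the sentence itself and its ground truth are unchanged:

```python
def second_pass(model, first_vectors, r_adv, gold_tags):
    # Adding the adversarial perturbation factor to the first word vector
    # set yields the second word vector set.
    second_vectors = first_vectors.detach() + r_adv
    # Entity labeling over the perturbed vectors yields the second
    # predicted labeling information.
    logits = model.tagger(second_vectors)
    # Second loss: labeling difference against the same ground truth.
    return tagging_loss(logits, gold_tags)
```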
Optionally, when determining that the preset convergence condition is satisfied, the processor 302 is configured to:
determine, over N consecutive iterations, the difference between the prediction accuracy on the sample training sentences in each iteration and that in the previous iteration, and determine that the preset convergence condition is reached when each difference falls within a preset accuracy difference range; or,
determine, over M consecutive iterations, the difference between the target loss of the sequence labeling model in each iteration and that in the previous iteration, and determine that the preset convergence condition is reached when each difference falls within a preset loss difference range; or,
determine that the preset convergence condition is reached when the current number of iterations reaches a preset maximum number of iterations.
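The three alternative stopping rules can be checked by one small helper. N, M, the difference ranges, and the maximum number of iterations are all preset values; the defaults below are purely illustrative:

```python
def converged(acc_history, loss_history, iteration,
              n=3, m=3, acc_eps=1e-4, loss_eps=1e-4, max_iters=50):
    def stable(history, window, eps):
        # True if the last `window` consecutive differences all fall
        # within the preset difference range `eps`.
        if len(history) <= window:
            return False
        recent = history[-(window + 1):]
        return all(abs(a - b) <= eps for a, b in zip(recent, recent[1:]))

    return (stable(acc_history, n, acc_eps)       # accuracy has stabilized
            or stable(loss_history, m, loss_eps)  # target loss has stabilized
            or iteration >= max_iters)            # iteration budget exhausted
```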
Optionally, after the trained sequence labeling model is output, the processor 302 is further configured to:
acquire a sentence to be processed, and invoke the trained sequence labeling model to perform sequence labeling on the sentence to be processed, so as to obtain the output predicted labeling information.
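After training, labeling a sentence to be processed is a single forward pass. The sketch below reuses `to_fixed_length` from above, together with an assumed `id2tag` mapping from tag indices to label names:

```python
import torch

def label_sentence(model, sentence, fixed_len, id2tag):
    # Normalize the input to the fixed length used during training.
    chars = to_fixed_length(list(sentence), fixed_len)
    model.eval()
    with torch.no_grad():
        logits = model.tagger(model.embed(chars))  # (seq_len, num_tags)
        tag_ids = logits.argmax(dim=-1)            # most likely tag per character
    return [id2tag[int(t)] for t in tag_ids]
```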
Based on the same inventive concept, embodiments of the present invention provide a storage medium, and instructions in the storage medium, when executed by a processor, enable the processor to execute the training method of the sequence annotation model as described in any one of the above embodiments.
In summary, in the embodiments of the present invention, a sequence labeling model to be trained and a sample training sentence set are acquired; the sequence labeling model is trained on the sample training sentence set to obtain first loss information; an adversarial perturbation factor is determined from the model parameters of the sequence labeling model, and the model is trained on the sample training sentence set to which the perturbation factor has been added to obtain second loss information; target loss information is then calculated from the first and second loss information, the model parameters are adjusted based on the target loss information, and iterative training proceeds until it is determined that a preset convergence condition is satisfied, at which point the trained sequence labeling model is output. Because the adversarial perturbation factor lets a single sample training sentence yield two different pieces of loss information, the trained sequence labeling model generalizes better and is more accurate, unnecessary noise interference is avoided, and resource consumption is reduced. In addition, no large corpus of manually labeled training samples is required, which saves substantial labor cost and time and thereby improves the training efficiency of the sequence labeling model.
As will be appreciated by one skilled in the art, embodiments of the present invention may be provided as a method, system, or computer program product. Accordingly, the present invention may take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment combining software and hardware aspects. Furthermore, the present invention may take the form of a computer program product embodied on one or more computer-usable storage media (including, but not limited to, disk storage, CD-ROM, optical storage, and the like) having computer-usable program code embodied therein.
The present invention is described with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems), and computer program products according to embodiments of the invention. It will be understood that each flow and/or block of the flow diagrams and/or block diagrams, and combinations of flows and/or blocks in the flow diagrams and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, embedded processor, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be stored in a computer-readable memory that can direct a computer or other programmable data processing apparatus to function in a particular manner, such that the instructions stored in the computer-readable memory produce an article of manufacture including instruction means which implement the function specified in the flowchart flow or flows and/or block diagram block or blocks.
These computer program instructions may also be loaded onto a computer or other programmable data processing apparatus to cause a series of operational steps to be performed on the computer or other programmable apparatus to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide steps for implementing the functions specified in the flowchart flow or flows and/or block diagram block or blocks.
While preferred embodiments of the present invention have been described, additional variations and modifications in those embodiments may occur to those skilled in the art once they learn of the basic inventive concepts. Therefore, it is intended that the appended claims be interpreted as including preferred embodiments and all such alterations and modifications as fall within the scope of the invention.
It will be apparent to those skilled in the art that various modifications and variations can be made in the embodiments of the present invention without departing from the spirit or scope of the embodiments of the invention. Thus, if such modifications and variations of the embodiments of the present invention fall within the scope of the claims of the present invention and their equivalents, the present invention is also intended to encompass such modifications and variations.

Claims (10)

1. A training method for a sequence labeling model, characterized by comprising the following steps:
acquiring a sequence labeling model to be trained and a sample training sentence set;
training the sequence labeling model based on the sample training sentence set to obtain first loss information;
determining an adversarial perturbation factor according to the model parameters of the sequence labeling model, and training the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain second loss information;
calculating target loss information based on the first loss information and the second loss information, adjusting the model parameters of the sequence labeling model based on the target loss information, performing iterative training, and outputting the trained sequence labeling model when it is determined that a preset convergence condition is satisfied.
2. The method of claim 1, wherein the determining an adversarial perturbation factor according to the model parameters of the sequence labeling model comprises:
obtaining the current model parameters of the sequence labeling model, calculating the gradient of the sequence labeling model based on the model parameters, and calculating the adversarial perturbation factor based on the gradient and a preset hyper-parameter, wherein the hyper-parameter is used to adjust the strength of the generated adversarial perturbation.
3. The method of claim 2, wherein the calculating the adversarial perturbation factor based on the gradient and the preset hyper-parameter comprises:
acquiring the hyper-parameter, and dividing the product of the acquired hyper-parameter and the gradient by the norm of the gradient to obtain the adversarial perturbation factor.
4. The method of claim 1, wherein before the adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training, the method further comprises:
acquiring an original learning rate set for a pre-training model within the sequence labeling model, wherein the pre-training model is used to generate a corresponding word vector set based on an input sample training sentence;
determining, for each layer of the pre-training model, the learning rate of that layer according to the layer coefficient preset for that layer in combination with the original learning rate, wherein the learning rate characterizes the adjustment range of the model parameters of that layer;
and the adjusting the model parameters of the sequence labeling model based on the target loss information and performing iterative training comprises:
adjusting the model parameters of each layer of the sequence labeling model by error back-propagation, based on the determined learning rate of each layer of the sequence labeling model and the target loss information.
5. The method of any one of claims 1-4, wherein the training the sequence labeling model based on the sample training sentence set to obtain first loss information comprises:
inputting each sample training sentence of the sample training sentence set into the sequence labeling model, and performing the following operations for each input sample training sentence:
determining the word vector corresponding to each character of the sample training sentence, and generating a corresponding first word vector set;
performing entity labeling on the sample training sentence based on the first word vector set to obtain corresponding first predicted labeling information;
and calculating the first loss information based on the labeling difference between the first predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
6. The method of claim 5, wherein the training the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain second loss information, comprises:
adding the adversarial perturbation factor to the first word vector set to obtain a second word vector set;
performing entity labeling on the sample training sentence based on the second word vector set to obtain corresponding second predicted labeling information;
and calculating the second loss information based on the labeling difference between the second predicted labeling information and the ground-truth labeling information corresponding to the sample training sentence.
7. The method of any one of claims 1-4, wherein after the outputting the trained sequence labeling model, the method further comprises:
acquiring a sentence to be processed, and invoking the sequence labeling model to perform sequence labeling on the sentence to be processed, so as to obtain output predicted labeling information.
8. A training device for a sequence labeling model, characterized by comprising:
an acquisition unit, configured to acquire a sequence labeling model to be trained and a sample training sentence set;
a training unit, configured to train the sequence labeling model based on the sample training sentence set to obtain first loss information;
a determining unit, configured to determine an adversarial perturbation factor according to the model parameters of the sequence labeling model, and train the sequence labeling model based on the sample training sentence set to which the adversarial perturbation factor has been added, so as to obtain second loss information;
and an adjusting unit, configured to calculate target loss information based on the first loss information and the second loss information, adjust the model parameters of the sequence labeling model based on the target loss information, perform iterative training, and output the trained sequence labeling model when a preset convergence condition is satisfied.
9. A training device for a sequence labeling model, characterized by comprising:
a memory for storing executable instructions;
a processor for reading and executing the executable instructions stored in the memory, so as to implement the training method for a sequence labeling model according to any one of claims 1 to 7.
10. A storage medium, characterized in that instructions in the storage medium, when executed by a processor, enable the processor to perform the training method for a sequence labeling model according to any one of claims 1 to 7.
CN202010591966.2A 2020-06-24 2020-06-24 Training method and device for sequence labeling model Pending CN111737952A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN202010591966.2A CN111737952A (en) 2020-06-24 2020-06-24 Training method and device for sequence labeling model
PCT/CN2021/094180 WO2021258914A1 (en) 2020-06-24 2021-05-17 Method and apparatus for training sequence labeling model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202010591966.2A CN111737952A (en) 2020-06-24 2020-06-24 Training method and device for sequence labeling model

Publications (1)

Publication Number Publication Date
CN111737952A true CN111737952A (en) 2020-10-02

Family

ID=72651128

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010591966.2A Pending CN111737952A (en) 2020-06-24 2020-06-24 Training method and device for sequence labeling model

Country Status (2)

Country Link
CN (1) CN111737952A (en)
WO (1) WO2021258914A1 (en)

Families Citing this family (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115146055B (en) * 2022-04-18 2024-07-23 重庆邮电大学 Text universal countermeasure defense method and system based on countermeasure training
CN114896307B (en) * 2022-06-30 2022-09-27 北京航空航天大学杭州创新研究院 Time series data enhancement method and device and electronic equipment
CN115881212A (en) * 2022-10-26 2023-03-31 溪砾科技(深圳)有限公司 RNA target-based small molecule compound screening method and device
CN115938353B (en) * 2022-11-24 2023-06-27 北京数美时代科技有限公司 Voice sample distributed sampling method, system, storage medium and electronic equipment
CN116503923B (en) * 2023-02-16 2023-12-08 深圳市博安智控科技有限公司 Method and device for training face recognition model
CN116388884B (en) * 2023-06-05 2023-10-20 浙江大学 Method, system and device for designing anti-eavesdrop ultrasonic interference sample

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US11568261B2 (en) * 2018-10-26 2023-01-31 Royal Bank Of Canada System and method for max-margin adversarial training
CN109902705A (en) * 2018-10-30 2019-06-18 华为技术有限公司 Method and apparatus for generating an object detection model resistant to adversarial perturbation
CN110459282B (en) * 2019-07-11 2021-03-09 新华三大数据技术有限公司 Sequence labeling model training method, electronic medical record processing method and related device
CN110781934A (en) * 2019-10-15 2020-02-11 深圳市商汤科技有限公司 Supervised learning and label prediction method and device, electronic equipment and storage medium
CN114936639A (en) * 2019-12-31 2022-08-23 北京航空航天大学 Progressive adversarial training method and device
CN111737952A (en) * 2020-06-24 2020-10-02 深圳前海微众银行股份有限公司 Training method and device for sequence labeling model

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109582793A (en) * 2018-11-23 2019-04-05 深圳前海微众银行股份有限公司 Model training method, customer service system, data labeling system, and readable storage medium
CN110532377A (en) * 2019-05-13 2019-12-03 南京大学 A semi-supervised text classification method based on adversarial training and an adversarial learning network
CN111091004A (en) * 2019-12-18 2020-05-01 上海风秩科技有限公司 Training method and training device for sentence entity labeling model and electronic equipment
CN111191453A (en) * 2019-12-25 2020-05-22 中国电子科技集团公司第十五研究所 Named entity recognition method based on adversarial training

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
TAKERU MIYATO et al.: "Adversarial Training Methods for Semi-Supervised Text Classification", arXiv, 6 May 2017 (2017-05-06) *
ZHANG Xiaohui; YU Shuangyuan; WANG Quanxin; XU Baomin: "Text Representation and Classification Algorithm Based on Adversarial Training" (基于对抗训练的文本表示和分类算法), Computer Science (计算机科学), no. 1, 15 June 2020 (2020-06-15) *
DU Changshun: "Research on Key Technologies of Public-Opinion Sentiment Analysis for Specialized Domains" (面向细分领域的舆情情感分析关键技术研究), China Doctoral Dissertations Full-text Database, Information Science and Technology Series (monthly), no. 01, 15 January 2020 (2020-01-15), pages 1 *
HUANG Peixin; ZHAO Xiang; FANG Yang; ZHU Huiming; XIAO Weidong: "End-to-End Joint Extraction of Knowledge Triples Incorporating Adversarial Training" (融合对抗训练的端到端知识三元组联合抽取), Journal of Computer Research and Development (计算机研究与发展), no. 12, 15 December 2019 (2019-12-15) *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2021258914A1 (en) * 2020-06-24 2021-12-30 深圳前海微众银行股份有限公司 Method and apparatus for training sequence labeling model
CN112434213A (en) * 2020-10-15 2021-03-02 中国科学院深圳先进技术研究院 Network model training method, information pushing method and related device
CN112434213B (en) * 2020-10-15 2023-09-29 中国科学院深圳先进技术研究院 Training method of network model, information pushing method and related devices
CN114648028A (en) * 2020-12-21 2022-06-21 阿里巴巴集团控股有限公司 Method and device for training label model, electronic equipment and storage medium
CN113032560A (en) * 2021-03-16 2021-06-25 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method and equipment
CN113032560B (en) * 2021-03-16 2023-10-27 北京达佳互联信息技术有限公司 Sentence classification model training method, sentence processing method and equipment
WO2023045949A1 (en) * 2021-09-27 2023-03-30 华为技术有限公司 Model training method and related device
WO2023071581A1 (en) * 2021-10-29 2023-05-04 北京有竹居网络技术有限公司 Method and apparatus for determining response sentence, device, and medium
CN114138973A (en) * 2021-12-03 2022-03-04 大连海事大学 Log sequence anomaly detection method based on contrastive adversarial training

Also Published As

Publication number Publication date
WO2021258914A1 (en) 2021-12-30

Similar Documents

Publication Publication Date Title
CN111737952A (en) Training method and device for sequence labeling model
CN109726696B (en) Image description generation system and method based on attention-pushing mechanism
CN110929515B (en) Reading understanding method and system based on cooperative attention and adaptive adjustment
CN108305612A (en) Text-processing, model training method, device, storage medium and computer equipment
CN112528637B (en) Text processing model training method, device, computer equipment and storage medium
CN111177325B (en) Method and system for automatically generating answers
CN110678882B (en) Method and system for selecting answer spans from electronic documents using machine learning
CN111951780A (en) Speech synthesis multitask model training method and related equipment
CN110750630A (en) Generating type machine reading understanding method, device, equipment and storage medium
CN117173504A (en) Training method, training device, training equipment and training storage medium for text-generated graph model
CN109766550A (en) A text brand identification method, identification device and storage medium
CN111753076A (en) Dialogue method, dialogue device, electronic equipment and readable storage medium
CN109726400A (en) Entity word recognition result evaluation method, apparatus, equipment and entity word extraction system
CN110399472A (en) Interview question prompting method and apparatus, computer device and storage medium
CN112270181A (en) Sequence labeling method, system, computer readable storage medium and computer device
WO2023042045A1 (en) Convolution attention network for multi-label clinical document classification
CN117194619A (en) Multi-round dialogue question-answering method and system based on historical position coding
CN116542328A (en) Knowledge distillation method and device for CTR prediction model
CN115952266A (en) Question generation method and device, computer equipment and storage medium
CN112765936B (en) Training method and device for operation based on language model
CN115270795A (en) Small sample learning-based named entity recognition technology in environmental assessment field
CN114707518A (en) Semantic fragment-oriented target emotion analysis method, device, equipment and medium
CN111400484B (en) Keyword extraction method and system
CN114115878A (en) Workflow node recommendation method and device
CN110147881A (en) Language processing method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination