CN112307769B - Natural language model generation method and computer equipment - Google Patents

Natural language model generation method and computer equipment

Info

Publication number
CN112307769B
CN112307769B (application CN201910689711.7A)
Authority
CN
China
Prior art keywords
sentence
pair
word
masked
masking
Prior art date
Legal status
Active
Application number
CN201910689711.7A
Other languages
Chinese (zh)
Other versions
CN112307769A (en)
Inventor
刘坤
Current Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Original Assignee
Wuhan TCL Group Industrial Research Institute Co Ltd
Priority date
Filing date
Publication date
Application filed by Wuhan TCL Group Industrial Research Institute Co Ltd
Priority to CN201910689711.7A
Publication of CN112307769A
Application granted
Publication of CN112307769B

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology

Abstract

The application relates to a natural language model generation method and computer equipment, where the method comprises the following steps: inputting a masked sentence pair group into an initial neural network to obtain a predicted sentence pair group and a predicted label, where the masked sentence pair group is obtained by preprocessing a sentence pair in training data, and the training data comprises the sentence pair and a true label; and adjusting the parameters of the initial neural network according to the sentence pair, the true label, the predicted sentence pair group and the predicted label, and repeating the step of inputting the masked sentence pair group into the initial neural network until a preset training condition is met, thereby obtaining a trained natural language model. The semantic representation of a natural language model obtained by this method carries both the global information and the local semantic information of the masked sentence pairs, which improves the accuracy of natural language processing tasks.

Description

Natural language model generation method and computer equipment
Technical Field
The present invention relates to the field of natural language processing technologies, and in particular, to a natural language model generation method and computer equipment.
Background
Natural Language Processing (NLP) is a sub-field of artificial intelligence and is generally divided into four broad classes of tasks: sequence labeling, classification, relationship judgment, and generation. Training a natural language processing model plays an important role in improving the accuracy of natural language processing results, because suitable word vectors are obtained through training, and suitable word vectors in turn improve the accuracy of task results.
In NLP, models are usually trained on particular natural language tasks, for example the Neural Network Language Model proposed by Bengio, or the mainstream CBOW and Skip-gram training methods proposed by Google. Most of these methods learn the sense information of a feature word from its context words taken individually; they do not jointly learn the sense information of the words adjacent to the feature word together with global semantic information. As a result, a language processing model trained with the existing methods cannot learn both the sense information of the words before and after an isolated word and the global semantic information, its semantic representation capability is weak, and the accuracy of its processing results is low.
Accordingly, the prior art is in need of improvement.
Disclosure of Invention
The technical problem the invention aims to solve is to provide a natural language model generation method and computer equipment, such that the semantic representation of the trained natural language model carries both the global information and the local semantic information of sentence pairs, improving the accuracy of natural language processing tasks.
In a first aspect, an embodiment of the present invention provides a method for generating a natural language model, the method including:
inputting a masked sentence pair group into an initial neural network to obtain a predicted sentence pair group and a predicted label, where the masked sentence pair group includes at least one masked sentence pair, and the predicted label represents the contextual relationship between the two sentences in the at least one masked sentence pair; the masked sentence pair group is obtained by preprocessing a sentence pair in training data, the training data includes the sentence pair and a true label, the sentence pair includes two sentences, and the true label represents the contextual relationship between the two sentences in the sentence pair;
adjusting the parameters of the initial neural network according to the sentence pair, the true label, the predicted sentence pair group and the predicted label, and repeating the step of inputting the masked sentence pair group into the initial neural network until a preset training condition is met, thereby obtaining a trained natural language model.
As a further improvement, obtaining the masked sentence pair group corresponding to the sentence pair by preprocessing the sentence pair includes:
randomly selecting at least one word to be masked from the sentence pair;
determining at least one target set to be masked according to the selected words to be masked, and masking the sentence pair according to the at least one target set to be masked to obtain the masked sentence pair group corresponding to the sentence pair.
As a further improvement, determining at least one target set to be masked according to the selected words to be masked includes:
for each word to be masked, determining a first word and a second word corresponding to the word to be masked, where the first word is the word adjacent to and preceding the word to be masked in the sentence pair to which it belongs, and the second word is the word adjacent to and following the word to be masked in that sentence pair;
generating a first target set to be masked from each word to be masked;
generating a second target set to be masked from the words to be masked and their respective first words;
generating a third target set to be masked from the words to be masked and their respective second words;
generating a fourth target set to be masked from the words to be masked, their respective first words, and their respective second words.
As a further improvement, the masked sentence pair group includes a first masked sentence pair, a second masked sentence pair, a third masked sentence pair, and a fourth masked sentence pair;
masking the sentence pair according to the at least one target set to be masked to obtain the masked sentence pair group corresponding to the sentence pair includes:
masking the sentence pair according to the first target set to be masked to obtain the first masked sentence pair;
masking the sentence pair according to the second target set to be masked to obtain the second masked sentence pair;
masking the sentence pair according to the third target set to be masked to obtain the third masked sentence pair;
masking the sentence pair according to the fourth target set to be masked to obtain the fourth masked sentence pair.
As a further improvement, adjusting the parameters of the initial neural network according to the sentence pair, the true label, the predicted sentence pair group and the predicted label, and repeating the step of inputting the masked sentence pair group into the initial neural network until a preset training condition is met to obtain a trained natural language model, includes:
calculating a total loss value according to the sentence pair, the true label, the predicted sentence pair group and the predicted label;
adjusting the parameters of the initial neural network according to the total loss value, and repeating the step of inputting the masked sentence pair group into the initial neural network until the preset training condition is met, thereby obtaining the trained natural language model.
As a further improvement, calculating the total loss value according to the sentence pair, the true label, the predicted sentence pair group and the predicted label includes:
calculating a first loss value according to the predicted sentence pair group and the sentence pair;
calculating a second loss value according to the predicted label and the true label corresponding to the sentence pair;
calculating the total loss value according to the first loss value and the second loss value.
As a further improvement, the predicted sentence pair group includes a first predicted sentence pair, a second predicted sentence pair, a third predicted sentence pair, and a fourth predicted sentence pair; calculating the first loss value according to the predicted sentence pair group and the sentence pair corresponding to the masked sentence pair group includes:
obtaining a first difference value according to the sentence pair and the first predicted sentence pair;
obtaining a second difference value according to the sentence pair and the second predicted sentence pair;
obtaining a third difference value according to the sentence pair and the third predicted sentence pair;
obtaining a fourth difference value according to the sentence pair and the fourth predicted sentence pair;
calculating the first loss value according to the first, second, third, and fourth difference values.
In a second aspect, an embodiment of the present invention provides a natural language processing method, including:
acquiring a sentence pair to be processed, where the sentence pair to be processed includes two sentences to be processed;
randomly selecting partial words from the sentence pair to be processed, and masking those words in the sentence pair to be processed to obtain a masked sentence pair to be processed;
inputting the masked sentence pair to be processed into a trained natural language model to obtain a fifth predicted sentence pair corresponding to the masked sentence pair to be processed and a target label, where the target label represents the contextual relationship between the two sentences in the masked sentence pair to be processed, and the trained natural language model is obtained through the above natural language model generation method.
In a third aspect, an embodiment of the present invention provides computer equipment including a memory and a processor, the memory storing a computer program, and the processor implementing the following steps when executing the computer program:
acquiring training data, where the training data includes a sentence pair and a true label, the sentence pair includes two sentences, and the true label represents the contextual relationship between the two sentences in the sentence pair;
preprocessing the sentence pair to obtain the masked sentence pair group corresponding to the sentence pair, where the masked sentence pair group includes at least one masked sentence pair;
inputting the masked sentence pair group into an initial neural network to obtain a predicted sentence pair group and a predicted label, where the predicted label represents the contextual relationship between the two sentences in the at least one masked sentence pair;
adjusting the parameters of the initial neural network according to the sentence pair, the true label, the predicted sentence pair group and the predicted label, and repeating the step of inputting the masked sentence pair group into the initial neural network until a preset training condition is met, thereby obtaining a trained natural language model.
In a fourth aspect, an embodiment of the present invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of:
inputting a masked sentence pair group into an initial neural network to obtain a predicted sentence pair group and a predicted label, where the masked sentence pair group includes at least one masked sentence pair, and the predicted label represents the contextual relationship between the two sentences in the at least one masked sentence pair; the masked sentence pair group is obtained by preprocessing a sentence pair in training data, the training data includes the sentence pair and a true label, the sentence pair includes two sentences, and the true label represents the contextual relationship between the two sentences in the sentence pair;
adjusting the parameters of the initial neural network according to the sentence pair, the true label, the predicted sentence pair group and the predicted label, and repeating the step of inputting the masked sentence pair group into the initial neural network until a preset training condition is met, thereby obtaining a trained natural language model.
Compared with the prior art, the embodiments of the invention have the following advantages:
In the natural language model training method provided by the embodiments of the invention, a masked sentence pair group is first input into an initial neural network to obtain a predicted sentence pair group and a predicted label, where the masked sentence pair group includes at least one masked sentence pair and the predicted label represents the contextual relationship between the two sentences in the at least one masked sentence pair; the masked sentence pair group is obtained by preprocessing a sentence pair in training data, the training data includes the sentence pair and a true label, the sentence pair includes two sentences, and the true label represents the contextual relationship between the two sentences in the sentence pair. The parameters of the initial neural network are then adjusted according to the sentence pair, the true label, the predicted sentence pair group and the predicted label, and the step of inputting the masked sentence pair group into the initial neural network is repeated until a preset training condition is met, yielding a trained natural language model. On the one hand, the masked sentence pair group comprises several masked sentence pairs, the objects to be masked in each masked sentence pair are not only isolated words, and the initial neural network is trained by predicting the predicted sentence pair group corresponding to the masked sentence pair group, so the semantic representation of the trained natural language model carries the local semantic information around the words to be masked; on the other hand, the initial neural network is also trained by predicting the contextual relationship of the two sentences in each masked sentence pair, so the semantic representation of the trained natural language model carries the global information of the masked sentence pairs. The joint training of the two improves the accuracy of the tasks processed by the trained natural language processing model.
Drawings
To illustrate the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings required by the description of the embodiments or the prior art are briefly introduced below. The drawings in the following description are obviously only some embodiments of the present invention, and other drawings may be derived from them by those skilled in the art without inventive effort.
FIG. 1 is a flow chart of a method for generating a natural language model according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of a Transformer encoding structure according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a multi-head attention model according to an embodiment of the present invention;
FIG. 4 is a diagram of obtaining the V, K and Q of a word vector in an embodiment of the invention;
FIG. 5 is an overall flow chart of calculating the total loss value according to an embodiment of the present invention;
FIG. 6 is a flow chart of a natural language processing method according to an embodiment of the invention;
fig. 7 is an internal structural diagram of a computer device in an embodiment of the present invention.
Detailed Description
In order to make the present invention better understood by those skilled in the art, the following description will clearly and completely describe the technical solutions in the embodiments of the present invention with reference to the accompanying drawings, and it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the invention without making any inventive effort, are intended to be within the scope of the invention.
The inventors have found that most existing training methods for natural language processing models can only learn isolated word sense information; they cannot learn the sense information of the words before and after an isolated word together with global semantic information, so the semantic representation capability of the resulting natural language processing model is weak.
To solve the above problem, in the embodiments of the present invention an initial neural network is trained with masked sentence pair groups, so that the semantic representation of the trained natural language model carries both the global information and the local semantic information of the sentence pairs, improving the accuracy of natural language processing tasks.
Various non-limiting embodiments of the present invention are described in detail below with reference to the attached drawing figures.
Referring to fig. 1, a method for generating a natural language model according to an embodiment of the present invention is shown, where the method includes:
s1, inputting a shielding statement pair group into an initial neural network to obtain a prediction statement pair group and a prediction tag, wherein the shielding statement pair group comprises at least one shielding statement pair, and the prediction tag represents the context relation of two statements in the at least one shielding statement pair; the shielding statement pair group is obtained by preprocessing statement pairs in training data, the training data comprises the statement pairs and real labels, wherein the statement pairs comprise two statements, and the real labels represent the context relation of the two statements in the statement pairs.
In the embodiment of the invention, the training data comprises statement pairs and real labels, wherein the statement pairs can be selected from any article, and the article materials are not limited and can be sports, literature, financial and the like. In the embodiment of the invention, the method for acquiring the statement pair is not limited. The sentence pairs can be two continuous sentences or two discontinuous sentences in the original article according to the context order. If two sentences in the sentence pair are continuous two sentences in the original article according to the context sequence, the real label of the sentence pair represents the context relation between the continuous two sentences of the sentence pair, for example, the next sentence in the sentence immediately follows the previous sentence, namely the next sentence is the next sentence of the previous sentence; if two sentences in the sentence pair are discontinuous two sentences in the original article according to the context order, the real label of the sentence pair represents the context relation of the discontinuous two sentences of the sentence pair, for example, the next sentence in the sentence pair is not immediately behind the previous sentence in the original text, i.e. the next sentence is not the next sentence of the previous sentence.
For example, if the sentence pair is: "The man went to the store. In the store, he bought a gallon of milk.", its corresponding true label is: is next, meaning that "In the store, he bought a gallon of milk" immediately follows "The man went to the store".
In step S1, the sentence pair includes two sentences. A classification symbol [CLS] is added before the sentence pair; the classification symbol is recognized by the initial neural network and triggers it to perform the natural language processing tasks on the sentence pair that follows. A separator [SEP] is used to split the two sentences: one sentence precedes the first separator (the separator also marks the end of a sentence), and the other sentence lies between the two separators.
For example, the sentence pair becomes: [CLS] The man went to the store [SEP] In the store, he bought a gallon of milk [SEP].
The initial neural network recognizes the classification symbol [CLS] and then performs the natural language processing tasks on the following sentence pair: The man went to the store [SEP] In the store, he bought a gallon of milk [SEP]. The text before the first separator [SEP] is one sentence, i.e., "The man went to the store", and the text between the two separators is the other sentence, i.e., "In the store, he bought a gallon of milk".
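As an illustration of the input format just described, the following Python sketch assembles a sentence pair with the classification symbol and separators; the function name and the whitespace tokenization are assumptions made for illustration, not part of the patent:

```python
def build_input(sentence_a, sentence_b):
    # Assemble "[CLS] A [SEP] B [SEP]" as described above.
    tokens = ["[CLS]"]              # classification symbol recognized by the network
    tokens += sentence_a.split()    # first sentence, before the first separator
    tokens += ["[SEP]"]             # separator; also marks the end of a sentence
    tokens += sentence_b.split()    # second sentence, between the two separators
    tokens += ["[SEP]"]
    return tokens

pair = build_input("The man went to the store",
                   "In the store, he bought a gallon of milk")
print(" ".join(pair))
# [CLS] The man went to the store [SEP] In the store, he bought a gallon of milk [SEP]
```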
In the embodiment of the invention, the initial neural network is trained with a masked word prediction task and a next sentence prediction task. Masked word prediction task: given a sentence pair, at least one word is randomly masked, and the natural language processing model predicts the masked word(s) from the remaining words, similar to the cloze (fill-in-the-blank) question type in English tests. Next sentence prediction task: given two sentences from an article forming a sentence pair, judge whether the second sentence of the pair immediately follows the first sentence in the article. Training the initial neural network with both the masked word prediction task and the next sentence prediction task enables the resulting natural language processing model to describe the overall information of the input text as comprehensively and accurately as possible, provides better initial model parameters for natural language processing tasks, and makes the obtained natural language processing results more accurate.
In the embodiment of the present invention, the initial neural network may include a BERT model, a masked word prediction model, and a next sentence prediction model. In natural language processing, the words in a text are generally represented by one-dimensional vectors called word vectors. The BERT model processes the masked sentence pairs to obtain the word vector of each word; the masked word prediction model processes the word vectors output by the BERT model to obtain sentence pairs in which the masked words are predicted; and the next sentence prediction model processes the word vectors output by the BERT model to predict the contextual relationship of the two sentences in each sentence pair.
The BERT model can be trained on a large-scale unlabeled corpus to obtain the word vectors of the words in a text, which are then input into a task-specific model for fine-tuning to obtain a natural language processing result. The BERT model builds the language model with a Transformer encoding structure; see fig. 2, which shows a schematic diagram of the Transformer encoding structure and its processing flow. Assume the input is a sentence pair: each word in the pair is converted into a corresponding word vector, and a positional encoding is added to each word vector, representing the position of each word in the sentence and the distances between different words. The word vectors with positional encodings added are input into the multi-head attention model; the word vectors processed by the multi-head attention model are added to the word vectors that bypassed it, and the result is normalized to obtain intermediate word vectors. The intermediate word vectors are input into a feed-forward neural network; the intermediate word vectors processed by the feed-forward network are added to the intermediate word vectors that bypassed it, and the result is normalized to obtain the output word vectors.
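The encoder flow described above can be summarized in a short numpy sketch; the attention and feed-forward functions are passed in as placeholders, and every name here is an illustrative assumption rather than the patent's implementation:

```python
import numpy as np

def layer_norm(x, eps=1e-6):
    # Normalize each word vector to zero mean and unit variance.
    return (x - x.mean(-1, keepdims=True)) / (x.std(-1, keepdims=True) + eps)

def positional_encoding(seq_len, d_model):
    # Sinusoidal position codes; the patent only says positions and distances
    # are encoded, so this standard Transformer choice is an assumption.
    pos = np.arange(seq_len)[:, None]
    i = np.arange(d_model)[None, :]
    angles = pos / np.power(10000.0, (2 * (i // 2)) / d_model)
    return np.where(i % 2 == 0, np.sin(angles), np.cos(angles))

def encoder_layer(x, attention, feed_forward):
    # Add the attended vectors to the vectors that bypassed attention, normalize,
    # then repeat the same residual pattern around the feed-forward network.
    h = layer_norm(x + attention(x))
    return layer_norm(h + feed_forward(h))

# Toy usage: identity "attention" and "feed-forward" just to show the data flow.
x = np.random.randn(6, 8) + positional_encoding(6, 8)
out = encoder_layer(x, attention=lambda v: v, feed_forward=lambda v: v)
```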
The main building block of the Transformer encoding structure is the multi-head attention model; see fig. 3, which shows a schematic diagram of its framework. The essence of the attention function can be described as mapping a query (Q) against a series of key-value pairs (K-V), representing the retrieval of data (value, V) by a data identifier (key, K). A word vector is multiplied by initialization matrices to yield its V (value), K (key) and Q (query).
Referring to FIG. 4, the process of obtaining the V (value), K (key) and Q (query) of a word vector is shown. For example, if the word vector of the word "Thinking" is X, then Q is obtained by multiplying X by the initialization matrix W^Q corresponding to Q; K is obtained by multiplying X by the initialization matrix W^K corresponding to K; and V is obtained by multiplying X by the initialization matrix W^V corresponding to V. That is, Q = XW^Q, K = XW^K, V = XW^V.
The multi-head attention model takes the V, K and Q of each word vector as input. It first applies a linear transformation to each V, K and Q and then feeds them into the scaled dot-product attention module. This step is performed h times, each time computing one head, for h heads in total, which is why the attention is "multi-head". The word vectors processed by the scaled dot-product attention module are then concatenated and passed through a final linear transformation.
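A minimal numpy sketch of the scaled dot-product attention and the h-head loop described above; matrix shapes and names are illustrative assumptions:

```python
import numpy as np

def softmax(z):
    e = np.exp(z - z.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

def scaled_dot_product_attention(Q, K, V):
    # softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.shape[-1]
    return softmax(Q @ K.T / np.sqrt(d_k)) @ V

def multi_head_attention(X, W_Q, W_K, W_V, W_O):
    # One head per (W_Q[i], W_K[i], W_V[i]) triple: project (Q = X W_Q, etc.),
    # attend, then concatenate the heads and apply a final linear map W_O.
    heads = [scaled_dot_product_attention(X @ W_Q[i], X @ W_K[i], X @ W_V[i])
             for i in range(len(W_Q))]
    return np.concatenate(heads, axis=-1) @ W_O

# Toy shapes: 6 words, model width 8, h = 2 heads of width 4.
rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))
W_Q, W_K, W_V = (rng.normal(size=(2, 8, 4)) for _ in range(3))
W_O = rng.normal(size=(8, 8))
print(multi_head_attention(X, W_Q, W_K, W_V, W_O).shape)  # (6, 8)
```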
In the embodiment of the invention, obtaining the masked sentence pair group corresponding to a sentence pair by preprocessing the sentence pair includes the following steps:
S11, randomly selecting at least one word to be masked from the sentence pair;
The at least one word to be masked is selected by randomly choosing a preset percentage (e.g., 15%) of all the words in the sentence pair as words to be masked. Assuming there are 15 words in a sentence pair, 2 words are randomly selected as words to be masked.
For example, given the sentence pair: [CLS] The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP], assume the randomly selected words to be masked are: only, morning and settlement.
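A sketch of this random selection, assuming the special symbols are excluded from the candidates (the patent does not state this explicitly):

```python
import random

def pick_words_to_mask(tokens, ratio=0.15):
    # Choose a preset percentage (default 15%) of the words as words to be masked.
    candidates = [i for i, t in enumerate(tokens)
                  if t not in ("[CLS]", "[SEP]")]   # assumed: never mask symbols
    k = max(1, round(len(candidates) * ratio))
    return sorted(random.sample(candidates, k))
```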
S12, determining at least one target set to be masked according to the selected words to be masked, and masking the sentence pair according to the at least one target set to be masked to obtain the masked sentence pair group corresponding to the sentence pair.
In the embodiment of the invention, following the order of the words to be masked in the sentence pair, the adjacent preceding word of each word to be masked is selected as its first word and the adjacent following word as its second word, and at least one target set to be masked is generated for each word to be masked from the word itself, its first word, and its second word. For each target set to be masked, the words of that target set are masked in the sentence pair, giving the masked sentence pair corresponding to that target set; the masked sentence pairs corresponding to the respective target sets together form the masked sentence pair group, so the number of masked sentence pairs in the group equals the number of target sets to be masked per word.
Specifically, in step S12, determining at least one target set to be masked according to the selected words to be masked includes:
S121a, determining the first word and the second word corresponding to each word to be masked, where the first word is the word adjacent to and preceding the word to be masked in the sentence pair to which it belongs, and the second word is the word adjacent to and following the word to be masked in that sentence pair.
For example, in the sentence pair above: The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP], with the randomly selected words to be masked being only, morning and settlement, the word adjacent to and preceding the word only is: had, and the word adjacent to and following it is: ceased; so the first word corresponding to the word only is had and the second word is ceased. By the same method, the first word corresponding to the word morning is of and the second word is at; and the first word of the word settlement is: the, and the second word is: awoke.
S121b, generating a first target set to be masked from each word to be masked.
In the embodiment of the invention, the first target set to be masked corresponding to each word to be masked is the word to be masked itself.
For example, in the example above, the first target set to be masked of the word only is itself: only; likewise, for the word morning, the first target set to be masked is: morning; and for the word settlement, the first target set to be masked is: settlement.
S121c, generating a second target set to be masked from the words to be masked and their respective first words.
In the embodiment of the invention, the second target set to be masked corresponding to each word to be masked consists of the word to be masked and its first word.
For example, in the example above, for the word only, the first word of only is had, so the second target set to be masked of only is: had only; for the word morning, the first word is of, so its second target set to be masked is: of morning; for the word settlement, the first word is the, so its second target set to be masked is: the settlement.
S121d, generating a third target set to be masked from the words to be masked and their respective second words.
In the embodiment of the invention, the third target set to be masked corresponding to each word to be masked consists of the word to be masked and its second word.
For example, in the example above, the second word of the word only is ceased, so the third target set to be masked of only is: only ceased; the second word of the word morning is at, so its third target set to be masked is: morning at; the second word of the word settlement is awoke, so its third target set to be masked is: settlement awoke.
S121e, generating a fourth target set to be masked from the words to be masked, their respective first words, and their respective second words.
In the embodiment of the invention, the fourth target set to be masked corresponding to each word to be masked consists of the word to be masked, its first word, and its second word.
For example, in the example above, for the word only, the first word of only is had and the second word of only is ceased, so the fourth target set to be masked of only is: had only ceased; for the word morning, the first word is of and the second word is at, so its fourth target set to be masked is: of morning at; for the word settlement, the first word is the and the second word is awoke, so its fourth target set to be masked is: the settlement awoke.
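The four target sets of one word to be masked can be generated as below; the handling of words at the boundary of the sentence pair is a simplifying assumption:

```python
def target_sets(tokens, idx):
    # Unigram, first-word bigram, second-word bigram, and trigram target sets
    # for the word to be masked at position idx.
    first = [tokens[idx - 1]] if idx > 0 else []                 # adjacent preceding word
    second = [tokens[idx + 1]] if idx + 1 < len(tokens) else []  # adjacent following word
    word = [tokens[idx]]
    return [word, first + word, word + second, first + word + second]

tokens = ("The rain had only ceased with the gray streaks of morning "
          "at Blazing Star").split()
print(target_sets(tokens, tokens.index("only")))
# [['only'], ['had', 'only'], ['only', 'ceased'], ['had', 'only', 'ceased']]
```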
Specifically, in step S12, the masked sentence pair group includes a first masked sentence pair, a second masked sentence pair, a third masked sentence pair, and a fourth masked sentence pair.
In the embodiment of the present invention, the four target sets to be masked of each word to be masked are obtained in step S121. The first masked sentence pair is obtained from the sentence pair and the first target sets to be masked; likewise, the second masked sentence pair is obtained from the sentence pair and the second target sets to be masked; the third masked sentence pair from the sentence pair and the third target sets to be masked; and the fourth masked sentence pair from the sentence pair and the fourth target sets to be masked. The four masked sentence pairs form the masked sentence pair group.
Specifically, masking the sentence pair according to the at least one target set to be masked to obtain the masked sentence pair group corresponding to the sentence pair includes:
S122a, masking the sentence pair according to the first target sets to be masked to obtain the first masked sentence pair.
In the embodiment of the invention, the words in the sentence pair corresponding to the first target sets to be masked are masked, and the resulting masked sentence pair is the first masked sentence pair.
For example, in the example above, the sentence pair is: [CLS] The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP], and the first target sets to be masked of the three words only, morning, settlement are: only, morning, settlement. Masking these first target sets in the sentence pair gives the first masked sentence pair, which in this example is Unigram MASK: [CLS] The rain had [MASK] ceased with the gray streaks of [MASK] at Blazing Star [SEP] and the [MASK] awoke to a moral sense of cleanliness [SEP].
When the sentence pair is masked, the rules of the masking operation are as follows:
with 80% probability the word to be masked is replaced by [MASK]; with 10% probability it is randomly replaced by another word; with 10% probability it keeps its real form.
By way of example, suppose nice is selected for masking in the sentence "You are so nice":
with 80% probability, nice is actually masked, yielding: You are so [MASK];
with 10% probability, nice is randomly replaced by another word, yielding, e.g.: You are so milk;
with 10% probability, nice is unchanged, yielding: You are so nice.
The purpose of the 10% probability of keeping the real word is to bias the representation toward the actually observed words. The 10% probability of random replacement does not impair the model's understanding, because randomly replaced words amount to only 1.5% of the sentence pair (15% of the words in the pair are selected for masking, and only 10% of those are randomly replaced).
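The 80/10/10 rule can be written directly; `vocab` stands in for a vocabulary to draw random replacements from and is an assumption:

```python
import random

def corrupt(word, vocab):
    # 80%: replace with [MASK]; 10%: replace with a random word;
    # 10%: keep the real word (biases the representation toward observed words).
    r = random.random()
    if r < 0.8:
        return "[MASK]"
    if r < 0.9:
        return random.choice(vocab)
    return word

print(corrupt("nice", vocab=["milk", "store", "rain"]))  # e.g. "[MASK]"
```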
S122b, masking the sentence pair according to the second target sets to be masked to obtain the second masked sentence pair.
In the embodiment of the invention, the words in the sentence pair corresponding to the second target sets to be masked are masked, and the resulting masked sentence pair is the second masked sentence pair.
For example, the sentence pair in the example above is: [CLS] The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP], and the second target sets to be masked of the three words only, morning, settlement are: had only, of morning, the settlement. Masking these second target sets in the sentence pair gives the second masked sentence pair, which in this example is Bigram MASK-1: [CLS] The rain [MASK] [MASK] ceased with the gray streaks [MASK] [MASK] at Blazing Star [SEP] and [MASK] [MASK] awoke to a moral sense of cleanliness [SEP].
S122c, masking the sentence pair according to the third target sets to be masked to obtain the third masked sentence pair.
In the embodiment of the invention, the words in the sentence pair corresponding to the third target sets to be masked are masked, and the resulting masked sentence pair is the third masked sentence pair.
For example: the sentence pair is: [CLS] The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP], and the third target sets to be masked of the three words only, morning, settlement are: only ceased, morning at, settlement awoke. Masking these third target sets in the sentence pair gives the third masked sentence pair, which in this example is Bigram MASK-2: [CLS] The rain had [MASK] [MASK] with the gray streaks of [MASK] [MASK] Blazing Star [SEP] and the [MASK] [MASK] to a moral sense of cleanliness [SEP].
S122d, masking the sentence pair according to the fourth target sets to be masked to obtain the fourth masked sentence pair.
In the embodiment of the invention, the words in the sentence pair corresponding to the fourth target sets to be masked are masked, and the resulting masked sentence pair is the fourth masked sentence pair.
For example: the sentence pair is: [CLS] The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP], and the fourth target sets to be masked of the three words only, morning, settlement are: had only ceased, of morning at, the settlement awoke. Masking these fourth target sets in the sentence pair gives the fourth masked sentence pair, which in this example is Trigram MASK: [CLS] The rain [MASK] [MASK] [MASK] with the gray streaks [MASK] [MASK] [MASK] Blazing Star [SEP] and [MASK] [MASK] [MASK] to a moral sense of cleanliness [SEP].
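Putting the four maskings together, the sketch below builds the whole masked sentence pair group from the selected word positions; it always inserts [MASK] (the 80/10/10 rule of the earlier sketch is omitted for clarity), and interior word positions are assumed:

```python
def masked_pair_group(tokens, mask_indices):
    # (left, right) = how many neighbors join the word to be masked:
    # unigram, first-word bigram, second-word bigram, trigram.
    group = []
    for left, right in [(0, 0), (1, 0), (0, 1), (1, 1)]:
        out = list(tokens)
        for i in mask_indices:
            for j in range(i - left, i + right + 1):
                out[j] = "[MASK]"
        group.append(" ".join(out))
    return group

tokens = ("[CLS] The man went to the store [SEP] "
          "In the store , he bought a gallon of milk [SEP]").split()
for pair in masked_pair_group(tokens, [tokens.index("man"), tokens.index("of")]):
    print(pair)
```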
In the embodiment of the invention, the masked sentence pair group is input into the initial neural network. The BERT module of the initial neural network outputs the word vector of each word in each masked sentence pair. The word vectors of each masked sentence pair are input into the masked word prediction model, which predicts the masked words in each masked sentence pair to obtain the predicted sentence pair group. The word vectors of each masked sentence pair are also input into the next sentence prediction model, which outputs a predicted label representing its prediction of the contextual relationship of the two sentences in each masked sentence pair; e.g., for the first masked sentence pair, the next sentence prediction model outputs the predicted contextual relationship of the two sentences in the first masked sentence pair.
For example, for the first masked sentence pair, Unigram MASK: [CLS] The rain had [MASK] ceased with the gray streaks of [MASK] at Blazing Star [SEP] and the [MASK] awoke to a moral sense of cleanliness [SEP], the masked word prediction model outputs its prediction of the first target sets that were masked. Assume the output first predicted sentence pair is: [CLS] The rain had [only] ceased with the gray streaks of [morning] at Blazing Star [SEP] and the [settlement] awoke to a moral sense of cleanliness [SEP]; each masked sentence pair has a corresponding predicted sentence pair.
For example, from the first masked sentence pair Unigram MASK: [CLS] The rain had [MASK] ceased with the gray streaks of [MASK] at Blazing Star [SEP] and the [MASK] awoke to a moral sense of cleanliness [SEP], the next sentence prediction model may output a predicted label: predicted Label-1. This predicted label represents the predicted contextual relationship of the two sentences in the first masked sentence pair, i.e., whether the second sentence of the pair immediately follows the first in the original article. Assuming predicted Label-1 = is next, the initial neural network predicts that the two sentences in the first masked sentence pair are contextually consecutive. Each masked sentence pair corresponds to one predicted label.
S2, adjusting the parameters of the initial neural network according to the sentence pair, the true label, the predicted sentence pair group and the predicted labels, and repeating the step of inputting the masked sentence pair group into the initial neural network until a preset training condition is met, thereby obtaining a trained natural language model.
In the embodiment of the invention, the parameters of the initial neural network are adjusted according to the sentence pair, the true label, the predicted sentence pair group and the predicted labels: the sentence pair and the predicted sentence pair group are compared to obtain a first loss value, the true label and the predicted labels are compared to obtain a second loss value, the loss value of the initial neural network is obtained from the first and second loss values, and the parameters of the initial neural network are adjusted according to that loss value.
Further, in an implementation of this embodiment, step S2 includes:
S21, calculating a total loss value according to the sentence pair, the true label, the predicted sentence pair group and the predicted labels.
In the embodiment of the invention, the first loss value is obtained from the sentence pair and its corresponding predicted sentence pair group, the second loss value is obtained from the true label and the predicted labels, and the total loss value is obtained from the first and second loss values.
Specifically, step S21 includes:
S211, calculating the first loss value according to the predicted sentence pair group and the sentence pair.
In the embodiment of the invention, the predicted sentence pair group includes four predicted sentence pairs, which correspond to the four masked sentence pairs in the masked sentence pair group. Since the masked sentence pair group was obtained by masking the sentence pair, the sentence pair is equivalent to the standard answer of the masked word prediction task. Each predicted sentence pair in the group, obtained through the initial neural network, is compared with the sentence pair, yielding the first loss value.
Specifically, step S211 includes:
S211a, obtaining a first difference value according to the sentence pair and the first predicted sentence pair;
S211b, obtaining a second difference value according to the sentence pair and the second predicted sentence pair;
S211c, obtaining a third difference value according to the sentence pair and the third predicted sentence pair;
S211d, obtaining a fourth difference value according to the sentence pair and the fourth predicted sentence pair.
In the embodiment of the present invention, the first difference value is the computed difference between the word vector of each word in the sentence pair and the word vector of each word in the first predicted sentence pair, denoted Loss(Unigram); the second difference value is the difference between the word vectors of the sentence pair and those of the second predicted sentence pair, denoted Loss(Bigram1); the third difference value is the difference between the word vectors of the sentence pair and those of the third predicted sentence pair, denoted Loss(Bigram2); and the fourth difference value is the difference between the word vectors of the sentence pair and those of the fourth predicted sentence pair, denoted Loss(Trigram).
S211e, calculating the first loss value according to the first, second, third and fourth difference values.
The computed first, second, third and fourth difference values are summed with task weights to obtain the loss value of the masked word prediction task, i.e., the first loss value Loss1, as shown in formula (1):

Loss1 = αLoss(Unigram) + βLoss(Bigram1) + γLoss(Bigram2) + δLoss(Trigram)   (1)

where α, β, γ and δ are parameters of the initial neural network, Loss(Unigram) is the first difference value, Loss(Bigram1) the second, Loss(Bigram2) the third, Loss(Trigram) the fourth, and Loss1 is the first loss value.
S212, calculating the second loss value according to the predicted labels and the true label corresponding to the sentence pair.
In the embodiment of the invention, the masked sentence pair group is obtained by masking the sentence pair, and the sentence pair has a true label representing the contextual relationship of its two sentences. The predicted label of each masked sentence pair in the group is predicted by the next sentence prediction model in the initial neural network, and the second loss value is obtained by comparing the predicted label corresponding to each masked sentence pair with the true label.
For example, the sentence pair is The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP], and its corresponding true label is: true label = is next. For the first masked sentence pair, the initial neural network predicts predicted Label-1, which is compared with the true label; similarly, the second masked sentence pair has a corresponding Loss(label-2), the third a corresponding Loss(label-3), and the fourth a corresponding Loss(label-4). Using a cross-entropy loss function over the true label and the predicted label of each masked sentence pair, the loss value of the next sentence prediction task, i.e., the second loss value, can be calculated as in formula (2):

Loss2 = −Σₜ [ y·log(ŷₜ) + (1 − y)·log(1 − ŷₜ) ]   (2)

where y denotes the true label (y = 1 for is next), and ŷₜ denotes the predicted label corresponding to the t-th masked sentence pair, e.g., ŷ₁ corresponds to the first masked sentence pair; in the embodiment of the present invention there are 4 predicted labels, so t may be 1, 2, 3 and 4.
S213, calculating the total loss value according to the first loss value and the second loss value.
In this embodiment, the total loss value may be determined from the first and second loss values, for example by taking their sum as the total loss value; specifically, the total loss value may be obtained by formula (3):

TotalLoss = Loss1 + Loss2   (3)

where TotalLoss is the total loss value, Loss1 is the first loss value, and Loss2 is the second loss value.
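A compact sketch of formulas (1) to (3); the difference values, weights, true label and predicted probabilities below are toy numbers used purely for illustration:

```python
import numpy as np

def total_loss(diffs, weights, y, y_hat):
    # diffs: the four difference values Loss(Unigram..Trigram); weights: (α, β, γ, δ).
    loss1 = float(np.dot(weights, diffs))                       # formula (1)
    # Binary cross-entropy over the predicted label of each of the four
    # masked sentence pairs; y = 1 means the true label is "is next".
    y_hat = np.asarray(y_hat)
    loss2 = float(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))  # (2)
    return loss1 + loss2                                        # formula (3)

print(total_loss(diffs=[0.9, 1.1, 1.0, 1.3],
                 weights=[0.25, 0.25, 0.25, 0.25],
                 y=1, y_hat=[0.8, 0.7, 0.9, 0.75]))
```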
S22, adjusting the parameters of the initial neural network according to the total loss value, and repeating the step of inputting the masked sentence pair group into the initial neural network until a preset training condition is met, thereby obtaining a trained natural language model.
The total loss value combines the loss value of the masked word prediction task and the loss value of the next sentence prediction task. The parameters of the initial neural network are modified according to the total loss value until the preset training condition is met, giving the trained natural language model.
In the embodiment of the present invention, the preset training condition may be that the number of training iterations reaches a preset number; optionally, the preset number may be 100,000. The preset training condition may also be that the model converges. However, the initial neural network may converge before the training count reaches the preset number, which would cause repeated unnecessary work; or the model may never converge, which would cause an endless loop and the training process would never finish. In view of these two cases, the preset training condition may also be that either the preset number is reached or the model converges.
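The combined stopping condition can look like the sketch below; `model.step` and `model.update`, and `batches` as an iterable of training batches, are assumed interfaces rather than the patent's API:

```python
def train(model, batches, max_steps=100_000, tol=1e-4):
    # Stop at the preset number of steps OR when the model converges
    # (loss change below tol), whichever comes first.
    prev_loss = float("inf")
    for step, batch in zip(range(max_steps), batches):
        loss = model.step(batch)      # forward pass + total loss
        model.update()                # adjust parameters from the total loss
        if abs(prev_loss - loss) < tol:
            break                     # converged before the preset count
        prev_loss = loss
    return model
```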
Referring to fig. 5, fig. 5 is an overall flowchart of calculating the total loss value according to an embodiment of the invention.
Statement pairs: the man went to the store. In the store he bought a gallon of mill.
a. Acquire the sentence pair and its true label;
b. randomly select the words to be masked from the sentence pair: man, of;
c. for the words to be masked: man, of, select the target sets to be masked;
d. mask the sentence pair to obtain the masked sentence pair group.
For this sentence pair, the masked sentence pair group is:
the first pair of MASK statements, unigram MASK [ CLS ] The [ MASK ] went to The store [ SEP ] Inthe store, he bought a gallon [ MASK ] MASK [ SEP ].
And a second MASK statement pair, bigram MASK-1: [ CLS ] [ MASK ] [ MASK ] went to the store [ SEP ] In the store, he bought a [ MASK ] [ MASK ] mill [ SEP ].
A third MASK statement pair, bigram MASK-2, [ CLS ] The [ MASK ] [ MASK ] to The store [ SEP ] In The store, he bought a gallon [ MASK ] [ MASK ] [ SEP ].
And a fourth MASK statement pair, trigram MASK [ CLS ] [ MASK ] [ MASK ] [ MASK ] to the store [ SEP ] Inthe store, he bought a [ MASK ] [ MASK ] [ MASK ] [ SEP ].
f. Input the masked sentence pair group into the initial neural network, obtain the predicted sentence pair group through the masked word prediction model in the initial neural network, and obtain at least one predicted label through the next sentence prediction model in the initial neural network: predicted Label.
g. Calculate the first loss value from the predicted sentence pair group and the sentence pair, and calculate the second loss value from the predicted labels and the true label;
h. calculate the total loss value from the first loss value and the second loss value.
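Tying the lettered steps together, reusing the helper sketches from earlier in this description; `model`, `first_loss` and `second_loss` are assumed interfaces:

```python
def train_step(tokens, true_label, model):
    idxs = pick_words_to_mask(tokens)                 # steps b and c
    group = masked_pair_group(tokens, idxs)           # step d
    pred_pairs, pred_labels = model.forward(group)    # step f
    loss1 = first_loss(pred_pairs, tokens)            # step g: masked word task
    loss2 = second_loss(pred_labels, true_label)      #         next sentence task
    return loss1 + loss2                              # step h: total loss
```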
In the above natural language model generation method, during training a target set to be masked is constructed for each selected word, and predicting a target set, rather than a single masked word, makes the semantic representation learned by the natural language processing model carry the local semantic information around the word to be masked instead of isolated semantic information only. Also during training, the contextual relationship of the two sentences in the sentence pair is predicted, so the learned semantic representation carries the global information of the sentence pair. The natural language processing model obtained through the joint training of the masked word prediction task and the next sentence prediction task therefore performs better and gives more accurate results on both tasks.
Referring to fig. 6, the embodiment of the invention further provides a natural language processing method, which includes:
K1, acquiring a sentence pair to be processed, where the sentence pair to be processed includes two sentences to be processed;
In the embodiment of the invention, the two sentences to be processed may be selected from any article, and the two selected sentences form the sentence pair to be processed.
For example, the acquired sentence pair to be processed is [CLS] The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP].
K2, randomly selecting partial words in the sentence pair to be processed, and masking those words to obtain a masked sentence pair to be processed.
In the embodiment of the invention, the partial words are selected by randomly choosing a preset percentage of all the words in the sentence pair to be processed; for example, the preset percentage may be 15%. After the partial words are selected, they are masked in the sentence pair to be processed.
For example, with the sentence pair to be processed being [CLS] The rain had only ceased with the gray streaks of morning at Blazing Star [SEP] and the settlement awoke to a moral sense of cleanliness [SEP] and partial words randomly selected, assume the masked sentence pair to be processed obtained after masking is: [CLS] The rain had only ceased [MASK] the gray [MASK] of morning at Blazing Star [SEP] and the settlement awoke to a moral [MASK] of cleanliness [SEP].
K3, inputting the shielded sentence pair to be processed into a trained natural language model to obtain a fifth prediction statement pair corresponding to the shielded sentence pair and a target label, wherein the target label represents the context relation between the two sentences in the shielded sentence pair, and the trained natural language model is obtained through the above method for generating a natural language model.
In the embodiment of the invention, the shielded sentence pair to be processed is input into the trained natural language model, which comprises a BERT model, a shielding word prediction model and a next sentence prediction model. Word vectors corresponding to each word in the shielded sentence pair are obtained through the BERT model. These word vectors are input into the shielding word prediction model to obtain the fifth prediction statement pair, which contains the words predicted by the shielding word prediction model for each shielded position. The word vectors are also input into the next sentence prediction model to obtain the target label, which represents whether the second sentence in the shielded sentence pair follows the first sentence.
For example, the shielded sentence pair to be processed [CLS] The rain had only ceased [MASK] the gray [MASK] of morning at Blazing Star [SEP] and the settlement awoke to a moral [MASK] of cleanliness [SEP] is input into the natural language model to obtain the result of the shielding word prediction task, namely the fifth prediction statement pair, and the result of the next sentence prediction task, namely the target label.
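For readers who want to reproduce this inference flow, here is a minimal sketch using the publicly available BertForPreTraining model from the Hugging Face transformers library as a stand-in; the trained natural language model of this embodiment is a custom jointly trained model, so the pretrained weights below are only an assumption for illustration.

import torch
from transformers import BertTokenizer, BertForPreTraining

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForPreTraining.from_pretrained("bert-base-uncased")
model.eval()

text_a = "The rain had only ceased [MASK] the gray [MASK] of morning at Blazing Star"
text_b = "and the settlement awoke to a moral [MASK] of cleanliness"
inputs = tokenizer(text_a, text_b, return_tensors="pt")

with torch.no_grad():
    outputs = model(**inputs)

# Fifth prediction statement pair: fill each shielded position with the top word.
mask_positions = (inputs["input_ids"][0] == tokenizer.mask_token_id).nonzero(as_tuple=True)[0]
predicted_ids = outputs.prediction_logits[0, mask_positions].argmax(dim=-1)
print(tokenizer.convert_ids_to_tokens(predicted_ids))

# Target label: whether the second sentence follows the first (0 = is next).
print(outputs.seq_relationship_logits.argmax(dim=-1).item())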
In the above natural language processing method, the natural language model is obtained through joint training of the shielding word prediction task and the next sentence prediction task, so that the results of the shielding word prediction task and the next sentence prediction task are more accurate and the performance is better.
In one embodiment, the present invention provides a computer device, which may be a terminal, with an internal structure as shown in fig. 7. The computer device includes a processor, a memory, a network interface, a display screen, and an input device connected by a system bus. The processor of the computer device is configured to provide computing and control capabilities. The memory of the computer device includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system and a computer program. The internal memory provides an environment for the operation of the operating system and the computer program in the non-volatile storage medium. The network interface of the computer device is used for communicating with an external terminal through a network connection. The computer program, when executed by the processor, implements the method for generating a natural language model. The display screen of the computer device may be a liquid crystal display screen or an electronic ink display screen, and the input device of the computer device may be a touch layer covering the display screen, keys, a trackball or a touchpad arranged on the housing of the computer device, or an external keyboard, touchpad or mouse.
It will be appreciated by those skilled in the art that the block diagram of fig. 7 is merely a partial structure related to the present application and does not constitute a limitation of the computer device to which the present application is applied; a specific computer device may include more or fewer components than those shown in the figure, combine certain components, or have a different arrangement of components.
An embodiment of the present invention provides a computer device, including a memory and a processor, where the memory stores a computer program, and the processor executes the computer program to implement the following steps:
inputting a shielding statement pair group into an initial neural network to obtain a prediction statement pair group and a prediction tag, wherein the shielding statement pair group comprises at least one shielding statement pair, and the prediction tag represents the context relation of two statements in the at least one shielding statement pair; the shielding statement pair group is obtained by preprocessing statement pairs in training data, the training data comprises the statement pairs and real labels, wherein the statement pairs comprise two statements, and the real labels represent the context relation of the two statements in the statement pairs;
And adjusting parameters of the initial neural network according to the statement pairs, the real labels, the prediction statement pair groups and the prediction labels, and continuously executing the step of inputting the shielding statement pair groups into the initial neural network until preset training conditions are met, so as to obtain a trained natural language model.
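The training loop implied by these two steps could be sketched as follows, assuming PyTorch, a hypothetical model object that returns the shielding word prediction logits for each variant together with the next sentence logits, and the total_loss helper sketched earlier; the preset training condition is taken to be a fixed epoch budget, which the patent leaves open.

import torch

def train(model, data_loader, optimizer, epochs=3):
    """Adjust the parameters of the initial neural network until the preset
    training condition (here: an assumed epoch budget) is met."""
    model.train()
    for _ in range(epochs):
        for batch in data_loader:
            # The model consumes the shielding statement pair group and returns
            # the prediction statement pair logits and the prediction label logits.
            mask_logits_group, nsp_logits = model(batch["shielded_ids_group"])
            loss = total_loss(mask_logits_group, batch["original_word_ids_group"],
                              nsp_logits, batch["real_label"])
            optimizer.zero_grad()
            loss.backward()   # gradients of the total loss value
            optimizer.step()  # parameter adjustment step
    return model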
The embodiment of the invention also provides a computer readable storage medium, on which a computer program is stored, characterized in that the computer program when executed by a processor realizes the following steps:
inputting a shielding statement pair group into an initial neural network to obtain a prediction statement pair group and a prediction tag, wherein the shielding statement pair group comprises at least one shielding statement pair, and the prediction tag represents the context relation of two statements in the at least one shielding statement pair; the shielding statement pair group is obtained by preprocessing statement pairs in training data, the training data comprises the statement pairs and real labels, wherein the statement pairs comprise two statements, and the real labels represent the context relation of the two statements in the statement pairs;
and adjusting parameters of the initial neural network according to the statement pairs, the real labels, the prediction statement pair groups and the prediction labels, and continuously executing the step of inputting the shielding statement pair groups into the initial neural network until preset training conditions are met, so as to obtain a trained natural language model.
The technical features of the above embodiments may be combined arbitrarily. For brevity of description, not all possible combinations of the technical features in the above embodiments are described; however, as long as there is no contradiction between the combinations of these technical features, they should be considered to be within the scope of this description.
The above embodiments merely represent a few implementations of the present application; their description is relatively specific and detailed, but should not be construed as limiting the scope of the invention. It should be noted that various modifications and improvements could be made by those skilled in the art without departing from the spirit of the present application, and these would fall within the scope of protection of the present application. Accordingly, the scope of protection of the present application is to be determined by the appended claims.

Claims (8)

1. A method for generating a natural language model, the method comprising:
inputting a shielding statement pair group into an initial neural network to obtain a prediction statement pair group and a prediction tag, wherein the shielding statement pair group comprises at least one shielding statement pair, and the prediction tag represents the context relation of two statements in the at least one shielding statement pair; the shielding statement pair group is obtained by preprocessing statement pairs in training data, the training data comprises the statement pairs and real labels, wherein the statement pairs comprise two statements, and the real labels represent the context relation of the two statements in the statement pairs;
Adjusting parameters of the initial neural network according to the statement pairs, the real labels, the prediction statement pair groups and the prediction labels, and continuously executing the step of inputting the shielding statement pair groups into the initial neural network until preset training conditions are met, so as to obtain a trained natural language model;
the method for obtaining the shielding statement pair group corresponding to the statement pair by preprocessing the statement pair comprises the following steps:
randomly selecting at least one word to be masked from the sentence pair;
determining at least one target set to be shielded according to the selected words to be shielded, and shielding the statement pairs according to the at least one target set to be shielded to obtain shielding statement pair groups corresponding to the statement pairs;
the determining at least one target set to be masked according to the selected word to be masked comprises:
for each word to be shielded, determining a first word and a second word corresponding to the word to be shielded, wherein the first word is a word which is adjacent to the word to be shielded and positioned in front of the word to be shielded in a sentence pair to which the word to be shielded belongs, and the second word is a word which is adjacent to the word to be shielded and positioned behind the word to be shielded in a sentence pair to which the word to be shielded belongs;
Generating a first target set to be shielded according to each word to be shielded;
generating a second target set to be shielded according to the words to be shielded and the first words respectively corresponding to the words to be shielded;
generating a third target set to be shielded according to the words to be shielded and the second words respectively corresponding to the words to be shielded;
generating a fourth target set to be shielded according to the words to be shielded, the first words respectively corresponding to the words to be shielded and the second words respectively corresponding to the words to be shielded;
wherein the probability that the word to be shielded is replaced by a shielding mark is 80%; the probability that the word to be shielded is randomly replaced by another word is 10%; the probability that the word to be shielded retains the real word is 10%.
2. The method of claim 1, wherein the set of mask statement pairs includes a first mask statement pair, a second mask statement pair, a third mask statement pair, and a fourth mask statement pair;
the step of masking the statement pairs according to the at least one target set to be masked to obtain a masking statement pair group corresponding to the statement pairs, including:
masking the statement pairs according to the first target set to be masked to obtain the first masking statement pairs;
Shielding the statement pairs according to the second target set to be shielded to obtain the second shielding statement pairs;
shielding the statement pairs according to the third target set to be shielded to obtain the third shielding statement pairs;
and shielding the statement pairs according to the fourth target set to be shielded to obtain the fourth shielding statement pairs.
3. The method of claim 1, wherein the adjusting parameters of the initial neural network according to the sentence pairs, the real labels, the prediction sentence pair groups, and the prediction labels, and continuing the step of inputting the masked sentence pair groups into the initial neural network until a preset training condition is satisfied, comprises:
calculating a total loss value according to the statement pair, the real label, the prediction statement pair and the prediction label;
and adjusting parameters of the initial neural network according to the total loss value, and continuously executing the step of inputting the shielding statement pair group into the initial neural network until a preset training condition is met, so as to obtain a trained natural language model.
4. A method according to claim 3, wherein said calculating a total loss value from said statement pair, said real tag, said predicted statement pair, and said predicted tag comprises:
Calculating a first loss value according to the prediction statement pair group and the statement pair;
calculating to obtain a second loss value according to the prediction label and the real label corresponding to the statement pair;
and calculating the total loss value according to the first loss value and the second loss value.
5. The method of claim 4, wherein the set of prediction statement pairs includes a first prediction statement pair, a second prediction statement pair, a third prediction statement pair, and a fourth prediction statement pair; the calculating a first loss value according to the prediction statement pair group and the statement pair corresponding to the shielding statement pair group comprises the following steps:
obtaining a first difference value according to the statement pair and the first prediction statement pair;
obtaining a second difference value according to the statement pairs and the second prediction statement pairs;
obtaining a third difference value according to the statement pair and the third prediction statement pair;
obtaining a fourth difference value according to the statement pair and the fourth prediction statement pair;
and calculating the first loss value according to the first difference value, the second difference value, the third difference value and the fourth difference value.
6. A natural language processing method, characterized by comprising the following steps:
Acquiring a statement pair to be processed, wherein the statement pair to be processed comprises two statements to be processed;
randomly selecting partial words from the sentence pair to be processed, and shielding the partial words in the sentence pair to be processed to obtain a shielded sentence pair to be processed;
inputting the to-be-processed shielding sentence pair into a trained natural language model to obtain a fifth prediction sentence pair corresponding to the to-be-processed shielding sentence pair and a target label, wherein the target label represents the context relation between two sentences in the to-be-processed shielding sentence pair, and the trained natural language model is a model generated by the natural language model generation method according to any one of claims 1 to 5.
7. A computer device comprising a memory and a processor, the memory storing a computer program, characterized in that the processor implements the steps of the method of any one of claims 1 to 5 when the computer program is executed.
8. A computer readable storage medium, on which a computer program is stored, characterized in that the computer program, when being executed by a processor, implements the steps of the method of any of claims 1 to 5.
CN201910689711.7A 2019-07-29 2019-07-29 Natural language model generation method and computer equipment Active CN112307769B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910689711.7A CN112307769B (en) 2019-07-29 2019-07-29 Natural language model generation method and computer equipment


Publications (2)

Publication Number Publication Date
CN112307769A CN112307769A (en) 2021-02-02
CN112307769B true CN112307769B (en) 2024-03-15

Family

ID=74328900

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910689711.7A Active CN112307769B (en) 2019-07-29 2019-07-29 Natural language model generation method and computer equipment

Country Status (1)

Country Link
CN (1) CN112307769B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111259663B (en) * 2020-01-14 2023-05-26 北京百度网讯科技有限公司 Information processing method and device
CN113204969A (en) * 2021-05-31 2021-08-03 平安科技(深圳)有限公司 Medical named entity recognition model generation method and device and computer equipment

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
EP1939758A2 (en) * 2006-12-28 2008-07-02 AT&T Corp. Discriminative training of models for sequence classification
WO2010105216A2 (en) * 2009-03-13 2010-09-16 Invention Machine Corporation System and method for automatic semantic labeling of natural language texts
WO2018094294A1 (en) * 2016-11-18 2018-05-24 Salesforce.Com, Inc. Spatial attention model for image captioning
CN109710770A (en) * 2019-01-31 2019-05-03 北京牡丹电子集团有限责任公司数字电视技术中心 A kind of file classification method and device based on transfer learning
CN109992648A (en) * 2019-04-10 2019-07-09 北京神州泰岳软件股份有限公司 The word-based depth text matching technique and device for migrating study


Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding; Jacob Devlin; arXiv; pp. 1-16 *
Application of BERT in Chinese reading comprehension question answering; Cai Xinyi; Jiang Weiyu; Han Langkun; Zong Hongwei; Information & Computer (Theoretical Edition) (08); pp. 45-46 *
ERNIE: Enhanced Representation through Knowledge Integration; Yu Sun; arXiv; pp. 1-8 *

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant