CN111241843A - Semantic relation inference system and method based on composite neural network

Semantic relation inference system and method based on composite neural network

Info

Publication number
CN111241843A
CN111241843A (application CN201811446102.0A)
Authority
CN
China
Prior art keywords
texts
neural network
vectors
training
model
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201811446102.0A
Other languages
Chinese (zh)
Other versions
CN111241843B (en)
Inventor
何广
朱琦
林鹏飞
袁源
覃玲华
毛仕文
陈开添
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Original Assignee
China Mobile Communications Group Co Ltd
China Mobile Group Guangdong Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Mobile Communications Group Co Ltd, China Mobile Group Guangdong Co Ltd filed Critical China Mobile Communications Group Co Ltd
Priority to CN201811446102.0A priority Critical patent/CN111241843B/en
Publication of CN111241843A publication Critical patent/CN111241843A/en
Application granted granted Critical
Publication of CN111241843B publication Critical patent/CN111241843B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/24Classification techniques
    • G06F18/241Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2411Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on the proximity to a decision surface, e.g. support vector machines
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/04Architecture, e.g. interconnection topology
    • G06N3/045Combinations of networks
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Health & Medical Sciences (AREA)
  • Software Systems (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The embodiment of the invention provides a semantic relation inference system and method based on a composite neural network. The system comprises a feature extraction unit, a training unit and a decision unit, wherein the training unit comprises a Siamese long short-term memory (LSTM) neural network model, a decomposable attention model and an enhanced sequential inference model. The training unit is used for receiving the word vectors, training the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model respectively on the word vectors of the two texts to be matched, and outputting the result vectors produced by the models to the decision unit. The decision unit is used for receiving the result vectors input by the training unit, integrating them through a gradient boosting decision tree, and outputting the semantic relation of the two texts to be matched. The embodiment of the invention can improve the accuracy of synonym/near-synonym relation detection.

Description

Semantic relation inference system and method based on composite neural network
Technical Field
The embodiment of the invention relates to the technical field of natural language processing, in particular to a semantic relation inference system and a semantic relation inference method based on a composite neural network.
Background
With the rise of deep learning, semantic analysis based on neural networks has become a research hotspot, and detecting synonym/near-synonym relations has become the key to inferring the contextual relation between short texts.
At present, methods for improving the accuracy of semantic relation inference mainly extract large numbers of hand-crafted features, usually selected specifically for the business scenario and the data at hand; for example, common business synonyms are normalized. However, the accuracy gains obtained this way are usually difficult to transfer to another data set, and manual feature extraction consumes most of the time spent on system construction.
Disclosure of Invention
The embodiment of the invention provides a semantic relation inference system and a semantic relation inference method based on a composite neural network, which are used for solving the problem of low semantic relation inference accuracy in the prior art.
In a first aspect, an embodiment of the present invention provides a semantic relation inference system based on a composite neural network, where the system includes a feature extraction unit, a training unit, and a decision unit, and the training unit includes a Siamese long short-term memory (LSTM) neural network model, a decomposable attention model, and an enhanced sequential inference model, where:
the feature extraction unit is used for extracting word vectors of input texts and outputting the word vectors to the training unit;
the training unit is used for receiving the word vectors, training the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model respectively on the word vectors of the two texts to be matched, and outputting the result vectors produced by the models to the decision unit;
and the decision unit is used for receiving the result vectors input by the training unit, integrating the result vectors through a gradient boosting decision tree, and outputting the semantic relation of the two texts to be matched.
In a second aspect, an embodiment of the present invention provides a semantic relation inference method for a composite neural network, where the method includes:
extracting a word vector of an input text;
training a Siamese LSTM neural network model, a decomposable attention model and an enhanced sequential inference model respectively on the word vectors;
and integrating the result vectors output by the models and outputting the semantic relation of the two texts to be matched.
In a third aspect, an embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, where the processor executes the computer program to implement the method provided in the second aspect.
In a fourth aspect, the present invention also provides a non-transitory computer readable storage medium, on which a computer program is stored, which when executed by a processor implements the method provided in the second aspect.
According to the embodiment of the invention, the word vectors are trained respectively on the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model, and the semantic relation of the word vectors is then judged through the gradient boosting decision tree, so that the detection accuracy of synonym/near-synonym relations can be improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below, and it is obvious that the drawings in the following description are some embodiments of the present invention, and those skilled in the art can also obtain other drawings according to the drawings without creative efforts.
Fig. 1 is a schematic structural diagram of a semantic relationship inference system based on a composite neural network according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of the network structure of a Siamese LSTM neural network model according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of the network structure of a decomposable attention model according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of the network structure of an enhanced sequential inference model according to an embodiment of the present invention;
FIG. 5 is a flowchart illustrating a semantic relationship inference method based on a composite neural network according to an embodiment of the present invention;
fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are some, but not all, embodiments of the present invention. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Fig. 1 shows a schematic structural diagram of a semantic relation inference system based on a composite neural network according to an embodiment of the present invention.
As shown in fig. 1, the system includes a feature extraction unit 11, a training unit 12, and a decision unit 13; the training unit includes a Siamese LSTM neural network model 121, a decomposable attention model 122, and an enhanced sequential inference model 123, wherein:
the feature extraction unit is used for extracting word vectors of input texts and outputting the word vectors to the training unit;
specifically, the embodiment of the present invention may use a pre-trained word vector model or train itself on the original text to generate a word vector.
The training unit is used for receiving the word vectors, training the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model respectively on the word vectors of the two texts to be matched, and outputting the result vectors produced by the models to the decision unit;
specifically, the embodiment of the invention inputs the word vectors of two texts to be matched into the training unit, and trains three models in the training unit respectively by using the data set. And finally, outputting the result vectors output by the three models to a decision unit as embedded vectors.
And the decision unit is used for receiving the result vectors input by the training unit, integrating them through a gradient boosting decision tree, and outputting the semantic relation of the two texts to be matched.
Specifically, the decision unit adopts a gradient boosting decision tree to integrate the embedded vectors input by the training unit into a final judgment of the semantic relation between the word vectors of the two texts, i.e. whether the two texts are synonymous or near-synonymous, thereby obtaining the semantic relation of the texts.
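As a minimal sketch of the decision unit (scikit-learn is an assumed choice; the patent only specifies a gradient boosting decision tree), the embedded vectors of the three models can be concatenated and fed to a GBDT classifier; hyperparameters below are illustrative:

# Sketch of the decision unit: integrate the three result vectors with a
# gradient boosting decision tree.
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

def train_decision_unit(siamese_vecs, attention_vecs, esim_vecs, labels):
    # One row per text pair; labels: 1 = synonymous/near-synonymous, 0 = not.
    X = np.hstack([siamese_vecs, attention_vecs, esim_vecs])
    gbdt = GradientBoostingClassifier(n_estimators=200, max_depth=3)
    gbdt.fit(X, labels)
    return gbdt

At inference time, gbdt.predict(X_new) yields the semantic-relation label for new text pairs.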
According to the embodiment of the invention, the word vectors are trained respectively on the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model, and the semantic relation of the word vectors is then judged through the gradient boosting decision tree, so that the detection accuracy of synonym/near-synonym relations can be improved.
Meanwhile, automatic feature extraction by the neural networks reduces the workload of manual feature selection and construction during system building, so the method has a wider range of application and can infer semantic relations more conveniently and quickly.
On the basis of the above embodiment, the Siamese LSTM neural network model includes:
the first input module, used for inputting the word vectors of the two texts to be matched into two long short-term memory neural networks respectively, to obtain the final hidden states of the two texts;
the first training module, used for training with the normalized difference of the final hidden states of the two texts as the prediction label;
and the first output module, used for concatenating the final hidden states of the two trained texts and outputting the result to the decision unit.
Fig. 2 is a schematic network structure diagram of the Siamese LSTM neural network model provided by an embodiment of the present invention.
As shown in fig. 2, the Siamese long short-term memory neural network (Siamese LSTM) model provided by the embodiment of the present invention includes two long short-term memory neural networks (LSTM-A and LSTM-B), and the training process is as follows:
respectively inputting two texts to be matched into two LSTM networks;
and taking the normalized difference of the final hidden states of LSTM-A and LSTM-B as the prediction label, which is trained to match the label provided by the data set; the prediction label is calculated as:
exp(-||h_A - h_B||_1)
where h_A and h_B are the final hidden states of LSTM-A and LSTM-B respectively.
After training is finished, the final hidden states of LSTM-A and LSTM-B are concatenated at inference time and input into the final gradient boosting decision tree model.
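A minimal PyTorch sketch of this branch, assuming the hidden size and a mean-squared-error training loss (the patent fixes neither); the exp(-||h_A - h_B||_1) similarity follows the formula above:

# Sketch of the Siamese LSTM branch: two LSTMs encode the two texts, the
# prediction label is exp(-L1 distance of the final hidden states), and the
# concatenated hidden states are later passed to the decision tree.
import torch
import torch.nn as nn

class SiameseLSTM(nn.Module):
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        self.lstm_a = nn.LSTM(dim, hidden, batch_first=True)  # LSTM-A
        self.lstm_b = nn.LSTM(dim, hidden, batch_first=True)  # LSTM-B

    def forward(self, text_a, text_b):
        _, (h_a, _) = self.lstm_a(text_a)
        _, (h_b, _) = self.lstm_b(text_b)
        h_a, h_b = h_a[-1], h_b[-1]            # final hidden states
        pred = torch.exp(-torch.norm(h_a - h_b, p=1, dim=1))  # in (0, 1]
        # pred is trained against the data-set label (e.g. with MSE);
        # [h_a; h_b] is the embedded vector passed to the decision tree.
        return pred, torch.cat([h_a, h_b], dim=1)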
On the basis of the above embodiment, the decomposable attention model includes:
the second input module, used for inputting the word vectors of the two texts to be matched into a decomposable attention matrix to obtain the aligned word vectors for the positions of the two sets of word vectors;
the second training unit, used for inputting the comparison result of each aligned word vector and the original word vector at the corresponding position into a feedforward neural network for training;
and the second output unit, used for concatenating the pooled position-comparison vectors of the two trained texts and outputting them to the decision unit.
Fig. 3 shows a network structure diagram of the decomposable attention model provided by the embodiment of the present invention.
As shown in fig. 3, the training process of the decomposable attention model is as follows:
Each word vector is weighted by a neural network; these weights constitute the decomposable attention (Decomposable Attention). Denote the word vector at position i of text A as a_i, the word vector at position j of text B as b_j, and the neural network as a function F(·). Each element of the attention matrix is then:
e_ij = F(a_i)^T F(b_j)
The attention matrix is aggregated with the original word vectors to compute the aligned word vector for each text word-vector position (β_i aligned to position i of text A, α_j aligned to position j of text B):
β_i = Σ_j (exp(e_ij) / Σ_k exp(e_ik)) · b_j
α_j = Σ_i (exp(e_ij) / Σ_k exp(e_kj)) · a_i
Each aligned word vector obtained above is compared with the original word vector at the corresponding position; the comparison concatenates the two vectors and inputs them into a feedforward neural network.
The comparison results over the word-vector positions are then integrated across the text range using global average pooling (Global Average Pooling).
The pooled vectors of the two texts are concatenated and output to a final linear layer to obtain the final inference result.
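The following PyTorch sketch captures the branch just described (layer sizes and the two-class output are assumptions): attend corresponds to F(·), the two softmax products compute β and α, and the comparison network is followed by global average pooling and the final linear layer:

# Sketch of the decomposable attention branch.
import torch
import torch.nn as nn

class DecomposableAttention(nn.Module):
    def __init__(self, dim=300, hidden=200, classes=2):
        super().__init__()
        self.attend = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU())    # F(.)
        self.compare = nn.Sequential(nn.Linear(2 * dim, hidden), nn.ReLU())
        self.out = nn.Linear(2 * hidden, classes)   # final linear layer

    def forward(self, a, b):                   # a: (B, La, d), b: (B, Lb, d)
        e = self.attend(a) @ self.attend(b).transpose(1, 2)   # e_ij matrix
        beta = torch.softmax(e, dim=2) @ b     # aligned vectors for text A
        alpha = torch.softmax(e, dim=1).transpose(1, 2) @ a   # for text B
        # Compare aligned and original vectors, then global average pooling.
        v_a = self.compare(torch.cat([a, beta], dim=-1)).mean(dim=1)
        v_b = self.compare(torch.cat([b, alpha], dim=-1)).mean(dim=1)
        v = torch.cat([v_a, v_b], dim=1)
        return self.out(v), v   # logits for training, embedding for the GBDT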
Fig. 4 is a schematic network structure diagram of the enhanced sequential inference model provided by an embodiment of the present invention.
As shown in fig. 4, the enhanced sequential inference model provided in the embodiment of the present invention comprises a Siamese LSTM layer, a decomposable attention layer and a further long short-term memory neural network. It receives the paired word vectors, trains the Siamese LSTM, the decomposable attention layer and the LSTM in turn, applies global max pooling and global average pooling over the text range to the hidden states output by the LSTM, and outputs the concatenation of the max-pooled and average-pooled vectors of the paired word vectors to the decision unit.
The training process of the enhanced sequence inference model is as follows:
inputting two texts to be matched into two LSTM networks (LSTM-A1 and LSTM-B1);
the hidden state at each step of this first-layer LSTM is used as the local context encoding of the word at that position, denoted a_i for text A and b_j for text B;
the elements e_ij of the attention matrix are computed from the local encodings, and the aligned local encoding at each position is obtained as:
a'_i = Σ_j (exp(e_ij) / Σ_k exp(e_ik)) · b_j
b'_j = Σ_i (exp(e_ij) / Σ_k exp(e_kj)) · a_i
the aligned local encodings and the original local encodings are combined and transformed to obtain the integrated local encodings;
the integrated local encodings are input in sequence into the next-layer LSTM; the hidden states it outputs are then subjected to global max pooling and global average pooling across the text range to obtain a global text representation;
the max-pooled vectors and the average-pooled vectors corresponding to the two texts are concatenated; the concatenated vector is either output to a neural network acting as the final decision device and trained, or output directly to the final gradient boosting decision tree model.
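A condensed PyTorch sketch of this branch (an ESIM-style reading of the description; layer sizes and the [x; x'; x - x'; x * x'] integration are assumptions borrowed from the standard enhanced sequential inference model):

# Sketch of the enhanced sequential inference branch: first-layer LSTMs,
# soft alignment, integration, a second LSTM, then max- and mean-pooling.
import torch
import torch.nn as nn

class EnhancedSequentialInference(nn.Module):
    def __init__(self, dim=300, hidden=128):
        super().__init__()
        self.encode_a = nn.LSTM(dim, hidden, batch_first=True)  # LSTM-A1
        self.encode_b = nn.LSTM(dim, hidden, batch_first=True)  # LSTM-B1
        self.project = nn.Sequential(nn.Linear(4 * hidden, hidden), nn.ReLU())
        self.compose = nn.LSTM(hidden, hidden, batch_first=True)  # next layer

    def _pool(self, x):
        # Global max pooling and global average pooling over the text range.
        return torch.cat([x.max(dim=1).values, x.mean(dim=1)], dim=1)

    def forward(self, a, b):
        a_enc, _ = self.encode_a(a)            # local context encodings a_i
        b_enc, _ = self.encode_b(b)            # local context encodings b_j
        e = a_enc @ b_enc.transpose(1, 2)      # attention matrix e_ij
        a_alig = torch.softmax(e, dim=2) @ b_enc   # aligned local encodings
        b_alig = torch.softmax(e, dim=1).transpose(1, 2) @ a_enc
        # Integrate aligned and original encodings, then compose.
        m_a = self.project(torch.cat([a_enc, a_alig, a_enc - a_alig, a_enc * a_alig], -1))
        m_b = self.project(torch.cat([b_enc, b_alig, b_enc - b_alig, b_enc * b_alig], -1))
        v_a, _ = self.compose(m_a)
        v_b, _ = self.compose(m_b)
        return torch.cat([self._pool(v_a), self._pool(v_b)], dim=1)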
The embodiment of the invention outputs the embedded vectors of the three models of the training unit to the final gradient boosting decision tree model, which outputs the final result.
Fig. 5 is a flow chart illustrating a semantic relation inference method based on a composite neural network according to an embodiment of the present invention.
As shown in fig. 5, the semantic relationship inference method based on a composite neural network provided in the embodiment of the present invention specifically includes the following steps:
s11, extracting word vectors of the input text;
Specifically, the embodiment of the present invention may use a pre-trained word-vector model, or train one on the original text, to generate the word vectors.
S12, training a Siamese LSTM neural network model, a decomposable attention model and an enhanced sequential inference model respectively on the word vectors;
Specifically, in the embodiment of the invention the word vectors of the two texts to be matched are input into the training unit, and the three models in the training unit are trained separately with the data set. Finally, the result vectors output by the three models are passed to the decision unit as embedded vectors (embedding vectors).
And S13, integrating the result vectors output by the models and outputting the semantic relation of the two texts to be matched.
Specifically, the decision unit adopts a gradient boosting decision tree to integrate the embedded vectors input by the training unit into a final judgment of the semantic relation between the word vectors of the two texts, i.e. whether the two texts are synonymous or near-synonymous, thereby obtaining the semantic relation of the texts.
According to the embodiment of the invention, the word vectors are trained respectively on the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model, and the semantic relation of the word vectors is then judged through the gradient boosting decision tree, so that the detection accuracy of synonym/near-synonym relations can be improved.
Meanwhile, automatic feature extraction by the neural networks reduces the workload of manual feature selection and construction during system building, so the method has a wider range of application and can infer semantic relations more conveniently and quickly.
On the basis of the above embodiment, S12 specifically includes the training step of the Siamese LSTM neural network model:
inputting the word vectors of the two texts to be matched into two long short-term memory neural networks respectively, to obtain the final hidden states of the two texts;
training with the normalized difference of the final hidden states of the two texts as the prediction label;
and concatenating the final hidden states of the two trained texts, and outputting the result to the decision unit.
Referring to fig. 2, the Siamese long short-term memory neural network (Siamese Long Short-Term Memory, Siamese LSTM) model provided by the embodiment of the present invention includes two long short-term memory neural networks (LSTM-A and LSTM-B), and the training process is as follows:
respectively inputting two texts to be matched into two LSTM networks;
and taking the normalized difference of the final hidden states of LSTM-A and LSTM-B as the prediction label, which is trained to match the label provided by the data set; the prediction label is calculated as:
exp(-||h_A - h_B||_1)
where h_A and h_B are the final hidden states of LSTM-A and LSTM-B respectively.
After training is finished, the final hidden states of LSTM-A and LSTM-B are concatenated at inference time and input into the final gradient boosting decision tree model.
On the basis of the above embodiment, the decomposable attention model includes:
the second input module, used for inputting the word vectors of the two texts to be matched into a decomposable attention matrix to obtain the aligned word vectors for the positions of the two sets of word vectors;
the second training unit, used for inputting the comparison result of each aligned word vector and the original word vector at the corresponding position into a feedforward neural network for training;
and the second output unit, used for concatenating the pooled position-comparison vectors of the two trained texts and outputting them to the decision unit.
On the basis of the above embodiment, S12 specifically includes the training step of the decomposable attention model:
inputting the word vectors of the two texts to be matched into a decomposable attention matrix to obtain the aligned word vectors for the positions of the two sets of word vectors;
inputting the comparison result of each aligned word vector and the original word vector at the corresponding position into a feedforward neural network for training;
and concatenating the pooled position-comparison vectors of the two trained texts, and outputting them to the decision unit.
Referring to fig. 3, the training process of the decomposable attention model provided by the embodiment of the present invention is as follows:
Each word vector is weighted by a neural network; these weights constitute the decomposable attention (Decomposable Attention). Denote the word vector at position i of text A as a_i, the word vector at position j of text B as b_j, and the neural network as a function F(·). Each element of the attention matrix is then:
e_ij = F(a_i)^T F(b_j)
The attention matrix is aggregated with the original word vectors to compute the aligned word vector for each text word-vector position (β_i aligned to position i of text A, α_j aligned to position j of text B):
β_i = Σ_j (exp(e_ij) / Σ_k exp(e_ik)) · b_j
α_j = Σ_i (exp(e_ij) / Σ_k exp(e_kj)) · a_i
Each aligned word vector obtained above is compared with the original word vector at the corresponding position; the comparison concatenates the two vectors and inputs them into a feedforward neural network.
The comparison results over the word-vector positions are then integrated across the text range using global average pooling (Global Average Pooling).
The pooled vectors of the two texts are concatenated and output to a final linear layer to obtain the final inference result.
On the basis of the above embodiment, S12 specifically includes the training step of the enhanced sequential inference model:
inputting the word vectors of the two texts to be matched into a Siamese LSTM neural network to obtain the hidden state of each step for the two texts;
taking the hidden state of each step of the Siamese LSTM as the local context encoding of the corresponding text, and inputting it into a decomposable attention matrix to obtain the aligned local encodings of the two texts;
inputting the aligned local encodings of the two texts into a long short-term memory neural network to obtain the hidden states of the two texts;
and concatenating the pooled hidden-state vectors of the two texts, and outputting the result to the decision unit.
Referring to fig. 4, the training process of the enhanced sequential inference model provided in the embodiment of the present invention is as follows:
inputting the two texts to be matched into two LSTM networks respectively, which may be denoted LSTM-A1 and LSTM-B1;
the hidden state at each step of this first-layer LSTM is used as the local context encoding of the word at that position, denoted a_i for text A and b_j for text B;
the elements e_ij of the attention matrix are computed from the local encodings, and the aligned local encoding at each position is obtained as:
a'_i = Σ_j (exp(e_ij) / Σ_k exp(e_ik)) · b_j
b'_j = Σ_i (exp(e_ij) / Σ_k exp(e_kj)) · a_i
the aligned local encodings and the original local encodings are combined and transformed to obtain the integrated local encodings;
the integrated local encodings are input in sequence into the next-layer LSTM; the hidden states it outputs are then subjected to global max pooling and global average pooling across the text range to obtain a global text representation;
the max-pooled vectors and the average-pooled vectors corresponding to the two texts are concatenated; the concatenated vector is either output to a neural network acting as the final decision device and trained, or output directly to the final gradient boosting decision tree model.
The embodiment of the invention outputs the embedded vectors of the three models of the training unit to the final gradient boosting decision tree model, which outputs the final result.
An embodiment of the present invention further provides an electronic device, which includes a memory, a processor, and a computer program stored in the memory and executable on the processor, and when the processor executes the computer program, the method shown in fig. 5 is implemented.
Fig. 6 is a schematic structural diagram of an electronic device according to an embodiment of the present invention.
As shown in fig. 6, the electronic device provided by the embodiment of the present invention includes a memory 21, a processor 22, a bus 23, and a computer program stored on the memory 21 and executable on the processor 22. The memory 21 and the processor 22 complete communication with each other through the bus 23.
The processor 22 is used to call the program instructions in the memory 21 to implement the method of fig. 5 when executing the program.
For example, the processor implements the following method when executing the program:
extracting a word vector of an input text;
training a Siamese LSTM neural network model, a decomposable attention model and an enhanced sequential inference model respectively on the word vectors;
and integrating the result vectors output by the models and outputting the semantic relation of the two texts to be matched.
According to the electronic equipment provided by the embodiment of the invention, the word vectors are trained respectively by the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model, and the semantic relation of the word vectors is judged by the gradient boosting decision tree, so that the detection accuracy of synonym/near-synonym relations can be improved.
An embodiment of the present invention further provides a non-transitory computer readable storage medium, on which a computer program is stored, and the program, when executed by a processor, implements the steps of fig. 5.
For example, the processor implements the following method when executing the program:
extracting a word vector of an input text;
training a Siamese LSTM neural network model, a decomposable attention model and an enhanced sequential inference model respectively on the word vectors;
and integrating the result vectors output by the models and outputting the semantic relation of the two texts to be matched.
The non-transitory computer-readable storage medium provided by the embodiment of the invention can improve the accuracy of synonym/near-synonym relation detection by training the word vectors respectively through a Siamese LSTM neural network model, a decomposable attention model and an enhanced sequential inference model and judging the semantic relation of the word vectors through the gradient boosting decision tree.
An embodiment of the present invention discloses a computer program product, the computer program product comprising a computer program stored on a non-transitory computer-readable storage medium, the computer program comprising program instructions, which when executed by a computer, enable the computer to perform the methods provided by the above-mentioned method embodiments, for example, including:
extracting a word vector of an input text;
training a Siamese LSTM neural network model, a decomposable attention model and an enhanced sequential inference model respectively on the word vectors;
and integrating the result vectors output by the models and outputting the semantic relation of the two texts to be matched.
The above-described embodiments of the apparatus are merely illustrative, and the units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one place, or may be distributed on a plurality of network units. Some or all of the modules may be selected according to actual needs to achieve the purpose of the solution of the present embodiment. One of ordinary skill in the art can understand and implement it without inventive effort.
Through the above description of the embodiments, those skilled in the art will clearly understand that each embodiment can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware. With this understanding in mind, the above-described technical solutions may be embodied in the form of a software product, which can be stored in a computer-readable storage medium such as ROM/RAM, magnetic disk, optical disk, etc., and includes instructions for causing a computer device (which may be a personal computer, a server, or a network device, etc.) to execute the methods described in the embodiments or some parts of the embodiments.
Finally, it should be noted that: the above examples are only intended to illustrate the technical solution of the present invention, but not to limit it; although the present invention has been described in detail with reference to the foregoing embodiments, it will be understood by those of ordinary skill in the art that: the technical solutions described in the foregoing embodiments may still be modified, or some technical features may be equivalently replaced; and such modifications or substitutions do not depart from the spirit and scope of the corresponding technical solutions of the embodiments of the present invention.

Claims (10)

1. A semantic relation inference system based on a composite neural network, characterized by comprising a feature extraction unit, a training unit and a decision unit, wherein the training unit comprises a Siamese long short-term memory (LSTM) neural network model, a decomposable attention model and an enhanced sequential inference model, and wherein:
the feature extraction unit is used for extracting word vectors of input texts and outputting the word vectors to the training unit;
the training unit is used for receiving the word vectors, training the Siamese LSTM neural network model, the decomposable attention model and the enhanced sequential inference model respectively on the word vectors of the two texts to be matched, and outputting the result vectors produced by the models to the decision unit;
and the decision unit is used for receiving the result vectors input by the training unit, integrating the result vectors through a gradient boosting decision tree, and outputting the semantic relation of the two texts to be matched.
2. The system of claim 1, wherein the Siamese LSTM neural network model comprises:
the first input module, used for inputting the word vectors of the two texts to be matched into two long short-term memory neural networks respectively, to obtain the final hidden states of the two texts;
the first training module, used for training with the normalized difference of the final hidden states of the two texts as the prediction label;
and the first output module, used for concatenating the final hidden states of the two trained texts and outputting the result to the decision unit.
3. The system of claim 1, wherein the decomposable attention model comprises:
the second input module, used for inputting the word vectors of the two texts to be matched into a decomposable attention matrix to obtain the aligned word vectors for the positions of the two sets of word vectors;
the second training unit, used for inputting the comparison result of each aligned word vector and the original word vector at the corresponding position into a feedforward neural network for training;
and the second output unit, used for concatenating the pooled position-comparison vectors of the two trained texts and outputting them to the decision unit.
4. The system of claim 1, wherein the enhanced sequential inference model comprises:
the third input module, used for inputting the word vectors of the two texts to be matched into a Siamese LSTM neural network to obtain the hidden state of each step for the two texts;
the fourth input module, used for inputting the hidden state of each step of the Siamese LSTM, as the local context encoding of the corresponding text, into a decomposable attention matrix to obtain the aligned local encodings of the two texts;
the fifth input module, used for inputting the aligned local encodings of the two texts into a long short-term memory neural network to obtain the hidden states of the two texts;
and the third output unit, used for concatenating the pooled hidden-state vectors of the two texts and outputting the result to the decision unit.
5. A semantic relationship inference method based on a composite neural network, the method comprising:
extracting a word vector of an input text;
training a Siamese LSTM neural network model, a decomposable attention model and an enhanced sequential inference model respectively on the word vectors;
and integrating the result vectors output by the models and outputting the semantic relation of the two texts to be matched.
6. The method of claim 5, further comprising:
training the Siamese LSTM neural network model:
inputting the word vectors of the two texts to be matched into two long short-term memory neural networks respectively, to obtain the final hidden states of the two texts;
training with the normalized difference of the final hidden states of the two texts as the prediction label;
and concatenating the final hidden states of the two trained texts, and outputting the result to the decision unit.
7. The method of claim 5, further comprising:
training the decomposable attention model:
inputting the word vectors of the two texts to be matched into a decomposable attention matrix to obtain the aligned word vectors for the positions of the two sets of word vectors;
inputting the comparison result of each aligned word vector and the original word vector at the corresponding position into a feedforward neural network for training;
and concatenating the pooled position-comparison vectors of the two trained texts, and outputting them to the decision unit.
8. The method of claim 5, further comprising:
training the enhanced sequential inference model:
inputting the word vectors of the two texts to be matched into a Siamese LSTM neural network to obtain the hidden state of each step for the two texts;
taking the hidden state of each step of the Siamese LSTM as the local context encoding of the corresponding text, and inputting it into a decomposable attention matrix to obtain the aligned local encodings of the two texts;
inputting the aligned local encodings of the two texts into a long short-term memory neural network to obtain the hidden states of the two texts;
and concatenating the pooled hidden-state vectors of the two texts, and outputting the result to the decision unit.
9. An electronic device comprising a memory, a processor and a computer program stored on the memory and executable on the processor, wherein the processor when executing the program implements the steps of the composite neural network-based semantic relationship inference method according to any one of claims 5 to 8.
10. A non-transitory computer readable storage medium, on which a computer program is stored, wherein the computer program, when being executed by a processor, implements the steps of the composite neural network-based semantic relationship inference method according to any one of claims 5 to 8.
CN201811446102.0A 2018-11-29 2018-11-29 Semantic relation inference system and method based on composite neural network Active CN111241843B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201811446102.0A CN111241843B (en) 2018-11-29 2018-11-29 Semantic relation inference system and method based on composite neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201811446102.0A CN111241843B (en) 2018-11-29 2018-11-29 Semantic relation inference system and method based on composite neural network

Publications (2)

Publication Number Publication Date
CN111241843A true CN111241843A (en) 2020-06-05
CN111241843B CN111241843B (en) 2023-09-22

Family

ID=70872518

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201811446102.0A Active CN111241843B (en) 2018-11-29 2018-11-29 Semantic relation inference system and method based on composite neural network

Country Status (1)

Country Link
CN (1) CN111241843B (en)


Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080221878A1 (en) * 2007-03-08 2008-09-11 Nec Laboratories America, Inc. Fast semantic extraction using a neural network architecture
CN107578106A (en) * 2017-09-18 2018-01-12 中国科学技术大学 A kind of neutral net natural language inference method for merging semanteme of word knowledge
CN107871144A (en) * 2017-11-24 2018-04-03 税友软件集团股份有限公司 Invoice trade name sorting technique, system, equipment and computer-readable recording medium
CN108304911A (en) * 2018-01-09 2018-07-20 中国科学院自动化研究所 Knowledge Extraction Method and system based on Memory Neural Networks and equipment

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112529390A (en) * 2020-12-02 2021-03-19 平安医疗健康管理股份有限公司 Task allocation method and device, computer equipment and storage medium
CN113643241A (en) * 2021-07-15 2021-11-12 北京迈格威科技有限公司 Interaction relation detection method, interaction relation detection model training method and device
CN113468288A (en) * 2021-07-23 2021-10-01 平安国际智慧城市科技股份有限公司 Content extraction method of text courseware based on artificial intelligence and related equipment
CN113468288B (en) * 2021-07-23 2024-04-16 平安国际智慧城市科技股份有限公司 Text courseware content extraction method based on artificial intelligence and related equipment

Also Published As

Publication number Publication date
CN111241843B (en) 2023-09-22


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant