CN111488438B - Question-answer matching attention processing method, computer equipment and storage medium - Google Patents

Question-answer matching attention processing method, computer equipment and storage medium

Info

Publication number
CN111488438B
CN111488438B
Authority
CN
China
Prior art keywords
neural network
network model
matrix
answer
question
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010107570.6A
Other languages
Chinese (zh)
Other versions
CN111488438A (en)
Inventor
赵洪科
张兮
王涛
成一航
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tianjin University
Original Assignee
Tianjin University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tianjin University filed Critical Tianjin University
Priority to CN202010107570.6A priority Critical patent/CN111488438B/en
Publication of CN111488438A publication Critical patent/CN111488438A/en
Application granted granted Critical
Publication of CN111488438B publication Critical patent/CN111488438B/en

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F 16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F 16/30 - Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F 16/33 - Querying
    • G06F 16/332 - Query formulation
    • G06F 16/3329 - Natural language query formulation or dialogue systems
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/04 - Architecture, e.g. interconnection topology
    • G06N 3/045 - Combinations of networks
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06N - COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 - Computing arrangements based on biological models
    • G06N 3/02 - Neural networks
    • G06N 3/08 - Learning methods

Abstract

The invention discloses a question-answer matching attention processing method, which comprises the following steps: establishing a multi-task learning neural network model, where the multi-task learning neural network model comprises a main task neural network model and an auxiliary task neural network model; the main task neural network model is used for extracting answer feature vectors and attention feature vectors, and the auxiliary task neural network model is used for extracting externalization degree feature vectors of the answers; the auxiliary task neural network model is attached to the main task neural network model; acquiring training samples and labeling the question-answer matched samples with the externalization degree of the answers; and inputting the processed samples into the multi-task learning neural network model for comprehensive loss training, and optimizing the shared parameters of the multi-task learning neural network model. The invention also discloses computer equipment and a storage medium for implementing the method. The invention can concentrate on externalization characteristics, avoid the loss of relevant information between questions and answers, and enhance the interpretability of the deep neural network.

Description

Question-answer matching attention processing method, computer equipment and storage medium
Technical Field
The invention relates to the technical field of machine learning and text data mining, in particular to a question-answer matching attention processing method, computer equipment and a storage medium.
Background
Recently, community question answering (CQA) platforms, such as Quora, have developed rapidly because they can provide richer information than conventional search engines. One of the most important research directions is how to guarantee the quality of answers in CQA. To this end, many scholars have carried out a great deal of research, such as expert discovery, information retrieval, and question-answer matching. Currently, most research on question-answer matching assumes that the text similarity between the question and the answer can be used to determine whether they match, and most of it processes the question and the answer separately when extracting feature vectors; this separate processing means that much of the relevant information between the question and the answer is ignored. Therefore, in order to avoid the loss of relevant information between questions and answers and to enhance the interpretability of the deep neural network, attention networks have been introduced into question-answer matching. At present, most attention models in question-answer matching use some primary text features to build attention, for example giving higher weight to elements that appear in both the question and the answer. This approach may overlook many high-level features that are important in CQA, such as readability, understandability, and language logic.
Disclosure of Invention
The invention provides a question-answer matching attention processing method, computer equipment and a storage medium for solving the technical problems in the prior art. The method aims to improve the accuracy and the interpretability of an attention model in question-answer matching by combining theories suitable for CQA scenes.
The technical scheme adopted by the invention to solve the technical problems in the prior art is as follows: a question-answer matching attention processing method in which a multi-task learning neural network model is established; the multi-task learning neural network model comprises a main task neural network model and an auxiliary task neural network model; the main task neural network model is used for extracting answer feature vectors and attention feature vectors, and the auxiliary task neural network model is used for extracting externalization degree feature vectors of answers; the auxiliary task neural network model is attached to the main task neural network model; training samples are acquired and the question-answer matched samples are labeled with the externalization degree of the answers; the processed samples are input into the multi-task learning neural network model for comprehensive loss training, and the shared parameters of the multi-task learning neural network model are optimized.
Further, the auxiliary task neural network model has separate pooling layers, a connection layer, and an output layer.
Further, the method comprises the following specific steps:
Step one, obtaining a sample set of question texts and answer texts, and performing word segmentation on the question texts and answer texts in the sample set to obtain corpus word-segmentation data; embedding the obtained corpus word-segmentation data to obtain an embedding matrix of the question, denoted E_q, and an embedding matrix of the answer, denoted E_a.
Step two, setting a first attention matrix to represent the similarity between the i-th word vector of E_q and the j-th word vector of E_a, the first attention matrix being denoted A_1; constructing two trainable matrices M_q ∈ R^(l_a×d) and M_a ∈ R^(l_q×d), where M_q is used to convert A_1 into a matrix I_q of the same size as E_q, and M_a is used to convert (A_1)^T into a matrix I_a of the same size as E_a; l_a denotes the maximum number of character strings of the answer text; l_q denotes the maximum number of character strings of the question text; d denotes the word vector dimension. The conversion formulas are as follows:
I_q = A_1 · M_q
I_a = (A_1)^T · M_a
Step three, after obtaining E_q, I_q, E_a and I_a, inputting the composite input word vector matrix [E_q; I_q] into the main task neural network model to obtain a primary question feature vector matrix, denoted F_q; inputting the composite input word vector matrix [E_a; I_a] into the main task neural network model and the auxiliary task neural network model respectively; the main task neural network model yields a primary answer feature vector matrix, denoted F_a, and the auxiliary task neural network model yields a primary externalization degree feature vector matrix, denoted F_e.
Step four, setting a second attention matrix to represent the similarity between F_q and F_a, the second attention matrix being denoted A_2; combining F_e with the second attention matrix to obtain a third attention matrix, denoted A_3.
Step five, after obtaining the third attention matrix A_3, calculating the merging weight of the question, denoted W_q, and the merging weight of the answer, denoted W_a, by summing the rows and columns of A_3 respectively; then, by weighted total pooling, applying W_q and W_a to F_q, F_a and F_e to obtain a secondary question feature vector matrix, a secondary answer feature vector matrix and a secondary externalization degree feature vector matrix, denoted V_q, V_a and V_e respectively.
Step six, inputting V_q and V_a simultaneously into the two fully connected layers of the main task neural network model; inputting V_e into the two connection layers of the auxiliary task neural network model, obtaining a total loss function composed of the loss function of the main task neural network model and the loss function of the auxiliary task neural network model.
Step seven, taking the total loss function as the objective function, and optimizing all trainable parameters by using a batch gradient descent method.
Further, the specific method of combining F_e with the second attention matrix to obtain the third attention matrix is as follows:
First, a trainable matrix M_e is constructed, and the transpose of F_e is converted by the trainable matrix M_e into a matrix C of the same size as A_2; the conversion formula is as follows:
C = (F_e)^T · M_e
Then, C and A_2 are combined into A_3 by convex linear combination; each element A_3(i,j) of A_3 is calculated by the following formula:
A_3(i,j) = λ(i,j) · C(i,j) + (1 - λ(i,j)) · A_2(i,j), i ∈ [1, l_q], j ∈ [1, l_a]
where C(i,j) denotes an element of C; A_2(i,j) denotes an element of A_2; l_a denotes the maximum number of character strings of the answer text; l_q denotes the maximum number of character strings of the question text; λ(i,j) are trainable convex combination coefficients, all initialized to λ(i,j) = 1 and trained during back propagation.
Further, when performing comprehensive loss training, the output data of the main task neural network model and the auxiliary task neural network model are converted, using a SOFTMAX activation function, into normalized output results distributed between 0 and 1 and summing to 1.
Furthermore, the two fully connected layers of the main task neural network model are fully connected layers based on a Siamese architecture.
Further, a pairwise training method is adopted, in which the feature vectors of a pair of questions and answers are simultaneously input into the two fully connected layers of the main task neural network model so as to predict the final result.
Further, the multi-task learning neural network model is trained and verified by using data from the Quora database as training samples.
The invention also provides a computer device comprising a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement any of the above-mentioned steps of the question-answer matching attention processing method when executing the computer program.
The present invention also provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the question-answer matching attention processing method steps of any one of the above.
The invention has the following advantages and positive effects: the invention combines theory suited to the CQA scenario to improve the accuracy and interpretability of the attention model in question-answer matching. The invention chooses Externalization in knowledge creation theory as the background theory because it describes semantic information that closely matches the goal of question-answer matching, namely finding the best answer. This makes up for the lack of high-level features, such as readability and understandability, in question-answer matching algorithms.
First, the invention combines neural network algorithms with social science theory, which has been a very active and popular research direction in recent years; as far as is known, few studies combine knowledge creation theory with question-answer matching algorithms. Therefore, the invention introduces the externalization process of knowledge creation theory into question-answer matching, which is a new exploration combining knowledge creation theory with the CQA scenario and is of important significance for research on CQA business activity and on knowledge creation theory.
Secondly, the invention provides a novel multi-task learning neural network model in which an auxiliary task of identifying the externalization degree of the answer is introduced, so that the model can focus on externalization characteristics. In addition, according to the results of attention visualization, the attention generation process is purposefully influenced through parameter sharing, thereby improving the interpretability of the attention network in the field of question-answer matching. Because poor interpretability is a general defect of artificial neural network models, the multi-task learning neural network model provided by the invention, through purposeful multi-task learning, has higher interpretability and is of theoretical significance for enhancing the interpretability of neural networks.
Third, the model was tested on two real world datasets, see fig. 3 and 4. The experimental result shows that compared with other models, the multi-task learning neural network model has better precision and F1-Score performance and smaller volatility, and the method can improve the precision of the model and the stability of the result. The method has practical significance for further improving the question-answer matching system of the CQA platform.
Drawings
FIG. 1 is a block diagram of a multi-task learning neural network model of the present invention;
FIG. 2 is a flowchart of a process for embedding question text and answer text;
FIG. 3 is a boxplot of the results of the experiment;
FIG. 4 is an attention visualization effect map.
Detailed Description
For further understanding of the contents, features and effects of the present invention, the following embodiments are enumerated in conjunction with the accompanying drawings, and the following detailed description is given:
referring to fig. 1 to 4, a question-answer matching attention processing method is to establish a multi-task learning neural network model; the multi-task learning neural network model comprises a main task neural network model and an auxiliary task neural network model; the main task neural network model is used for extracting answer feature vectors and attention feature vectors, and the auxiliary task neural network model is used for extracting externalization degree feature vectors of answers; the auxiliary task neural network model is attached to the main task neural network model; acquiring a training sample, and marking the externalization degree of answers to the question-answer matched sample; inputting the processed sample into the multi-task learning neural network model for comprehensive loss training, and optimizing the shared parameters of the multi-task learning neural network model. Preferably, the auxiliary task neural network model may have separate pooling layers, a connection layer, and an output layer.
The multi-task learning neural network model of the invention has two tasks, with question-answer matching as the main task and identifying the externalization degree of the answer as the auxiliary task. By introducing an auxiliary task based on the externalization process, the invention provides an externalization-based attention neural network model, namely the Externalization-Based Attention Neural Network for Question-Answer Matching, EBANN model for short, which adds externalized high-level text features to the process of generating attention.
According to the externalization rule and the set judgment standard, the externalization degree of the answer is additionally marked in the actual data matched with the question and answer, so that the input data with the main task label and the auxiliary task label can be obtained. And inputting the training samples into the EBANN model, and then carrying out comprehensive loss training on the model to ensure that the EBANN model simultaneously realizes question-answer matching and text classification based on externalization. A trainable soft parameter sharing method is adopted to design a new attention network by combining a main task and an auxiliary task, so that an attention model has auxiliary task characteristics.
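For illustration only, the following is a minimal PyTorch-style sketch of such a two-branch multi-task structure; the layer sizes, the weight sharing between the question and answer branches, and the names (EBANNSketch, main_conv, aux_conv) are assumptions of this sketch, not the patented implementation. The attention fusion and weighted pooling are shown in later sketches and are stubbed out here with plain mean pooling.

```python
import torch
import torch.nn as nn

class EBANNSketch(nn.Module):
    def __init__(self, d=300, n_filters=64, filter_size=3, l_q=30, l_a=60):
        super().__init__()
        # Main-task CNN (sharing it between question and answer branches is an assumption).
        self.main_conv = nn.Conv1d(2 * d, n_filters, filter_size, padding="same")
        # Auxiliary-task CNN: l_q output channels so its feature map can be fused with A_2.
        self.aux_conv = nn.Conv1d(2 * d, l_q, filter_size, padding="same")
        # Separate heads: two fully connected layers for each task.
        self.main_fc = nn.Sequential(nn.Linear(2 * n_filters, n_filters), nn.ReLU(), nn.Linear(n_filters, 2))
        self.aux_fc = nn.Sequential(nn.Linear(l_q, l_q), nn.ReLU(), nn.Linear(l_q, 2))

    def forward(self, eq_iq, ea_ia):
        # eq_iq: (batch, 2d, l_q) composite question input [E_q; I_q]
        # ea_ia: (batch, 2d, l_a) composite answer input [E_a; I_a]
        f_q = torch.tanh(self.main_conv(eq_iq))   # (batch, n, l_q)
        f_a = torch.tanh(self.main_conv(ea_ia))   # (batch, n, l_a)
        f_e = torch.tanh(self.aux_conv(ea_ia))    # (batch, l_q, l_a)
        # ... attention fusion and weighted pooling would go here (see the later sketches) ...
        v_q = f_q.mean(dim=2)                     # placeholder pooling
        v_a = f_a.mean(dim=2)
        v_e = f_e.mean(dim=2)
        main_out = self.main_fc(torch.cat([v_q, v_a], dim=1))  # question-answer matching output
        aux_out = self.aux_fc(v_e)                              # externalization degree output
        return main_out, aux_out
```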
Finally, 10-fold cross-validation was performed on two Quora datasets, see FIG. 3 and FIG. 4. The experimental results show that the accuracy and stability of the EBANN model are superior to those of the baseline methods. Furthermore, the improvement in interpretability of the attention network is also shown by the attention visualization.
Preferably, the method may comprise the specific steps of:
Step one, a sample set of question texts and answer texts may be obtained, and the question texts and answer texts in the sample set may be word-segmented to obtain corpus word-segmentation data; the obtained corpus word-segmentation data may be embedded to obtain an embedding matrix of the question, which may be denoted E_q, and an embedding matrix of the answer, which may be denoted E_a.
Step two, a first attention matrix may be set to represent the similarity between the i-th word vector of E_q and the j-th word vector of E_a; the first attention matrix is denoted A_1. Two trainable matrices M_q ∈ R^(l_a×d) and M_a ∈ R^(l_q×d) may be constructed, where M_q is used to convert A_1 into a matrix I_q of the same size as E_q, and M_a is used to convert (A_1)^T into a matrix I_a of the same size as E_a; l_a denotes the maximum number of character strings of the answer text; l_q denotes the maximum number of character strings of the question text; d denotes the word vector dimension. The conversion formulas are as follows:
I_q = A_1 · M_q
I_a = (A_1)^T · M_a
the trainable matrix is a parameter matrix and is a more common calculation method in the deep learning model. We initialize the trainable matrix, for example, initialize all the elements of the matrix to 1 matrix, then train and optimize it like the model parameters, finally optimize it to get the optimal result and use it as part of the model.
Since convolutional neural network models, like most other models, require input data length to be consistent, we here need to set maximum length l for question text and answer text, respectively q And l a I.e. the maximum number of strings. For the text exceeding the maximum length, the exceeding part is deleted; for those texts that do not reach the maximum length, we will fill them to the maximum length with a zero vector.
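A minimal sketch of this truncation and zero-padding, assuming the word vectors have already been looked up (the function name and the sizes are illustrative only):

```python
import numpy as np

def pad_or_truncate(word_vectors: np.ndarray, max_len: int) -> np.ndarray:
    """Trim a (num_words, d) matrix to max_len rows, or pad it with zero vectors."""
    num_words, d = word_vectors.shape
    if num_words >= max_len:
        return word_vectors[:max_len]                 # delete the excess part
    padding = np.zeros((max_len - num_words, d))      # zero-vector padding
    return np.vstack([word_vectors, padding])

# Example: a 5-word answer embedded in d=4 dimensions, padded to l_a=8.
E_a = pad_or_truncate(np.random.randn(5, 4), max_len=8)
print(E_a.shape)  # (8, 4)
```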
Step three, after obtaining E_q, I_q, E_a and I_a, the composite input word vector matrix [E_q; I_q] may be input into the main task neural network model to obtain a primary question feature vector matrix, which may be denoted F_q; the composite input word vector matrix [E_a; I_a] may be input into the main task neural network model and the auxiliary task neural network model respectively, the main task neural network model yielding a primary answer feature vector matrix, which may be denoted F_a, and the auxiliary task neural network model yielding a primary externalization degree feature vector matrix, which may be denoted F_e.
Step four, a second attention matrix may be set to represent the similarity between F_q and F_a; the second attention matrix may be denoted A_2. F_e may be combined with the second attention matrix to obtain a third attention matrix, which may be denoted A_3.
Step five, after obtaining the third attention matrix A_3, the merging weight of the question, which may be denoted W_q, and the merging weight of the answer, which may be denoted W_a, may be calculated by summing the rows and columns of A_3 respectively; then, by weighted total pooling, W_q and W_a may be applied to F_q, F_a and F_e to obtain a secondary question feature vector matrix, a secondary answer feature vector matrix and a secondary externalization degree feature vector matrix, which may be denoted V_q, V_a and V_e respectively.
Step six, V_q and V_a may be simultaneously input into the two fully connected layers of the main task neural network model; V_e may be input into the two connection layers of the auxiliary task neural network model, obtaining a total loss function composed of the loss function of the main task neural network model and the loss function of the auxiliary task neural network model.
Step seven, the total loss function may be used as the objective function, and a batch gradient descent method may be used to optimize all trainable parameters.
Preferably, the specific method of combining F_e with the second attention matrix to obtain the third attention matrix may be as follows:
First, a trainable matrix M_e may be constructed, and the transpose of F_e may be converted by the trainable matrix M_e into a matrix C of the same size as A_2; the conversion formula may be as follows:
C = (F_e)^T · M_e
Then, a convex linear combination may be used to combine C and A_2 into A_3; each element A_3(i,j) of A_3 may be calculated by the following formula:
A_3(i,j) = λ(i,j) · C(i,j) + (1 - λ(i,j)) · A_2(i,j), i ∈ [1, l_q], j ∈ [1, l_a]
where C(i,j) denotes an element of C; A_2(i,j) denotes an element of A_2; l_a denotes the maximum number of character strings of the answer text; l_q denotes the maximum number of character strings of the question text; λ(i,j) are trainable convex combination coefficients, one for each position, all initialized to λ(i,j) = 1 and trained during back propagation.
Adopting this convex linear combination removes the need to manually configure a sharing layer, reduces training time, and improves model precision.
Preferably, in the comprehensive loss training, the output data of the main task neural network model and the auxiliary task neural network model can be converted, using the SOFTMAX activation function, into normalized output results distributed between 0 and 1 and summing to 1.
When the comprehensive loss training is carried out, the outputs of the network models can be normalized with the SOFTMAX activation function, for example using the following formula:
y_m = exp(n_m^1) / (exp(n_m^1) + exp(n_m^0))
where n_m^1 and n_m^0 may respectively be the frequencies of O_m = 1 and O_m = 0 in a training batch; in the same way, n_e^1 and n_e^0 may respectively be the frequencies of O_e = 1 and O_e = 0 in a training batch, and y_e is obtained analogously.
The total loss function is set to consist of three parts:
Loss = -[L_m·log(y_m) + (1 - L_m)·log(1 - y_m)] - λ_e·[L_e·log(y_e) + (1 - L_e)·log(1 - y_e)] + L_2
In the above formulas:
E_q denotes the embedding matrix of the question;
E_a denotes the embedding matrix of the answer;
I_q denotes the question embedding matrix after conversion by the attention matrix;
I_a denotes the answer embedding matrix after conversion by the attention matrix;
M_q denotes the trainable question conversion matrix;
M_a denotes the trainable answer conversion matrix;
M_e denotes the trainable externalization degree conversion matrix;
W_q denotes the merging weight of the question;
W_a denotes the merging weight of the answer;
F_q denotes the primary question feature vector matrix;
F_a denotes the primary answer feature vector matrix;
F_e denotes the primary externalization degree feature vector matrix;
V_q denotes the secondary question feature vector matrix;
V_a denotes the secondary answer feature vector matrix;
V_e denotes the secondary externalization degree feature vector matrix;
A_1 denotes the first attention matrix;
A_2 denotes the second attention matrix;
A_3 denotes the third attention matrix;
L_m is the main task label, a Boolean variable representing the result of the question-answer matching main task, where 0 indicates that the latter answer in the paired training sample data matches the question better than the former answer, and 1 indicates that the former answer matches the question better than the latter answer;
L_e is the auxiliary task label, a Boolean variable representing the externalization degree result, where 0 indicates that the latter answer in the paired training sample data has higher externalization characteristics than the former answer, and 1 indicates that the former answer has higher externalization characteristics than the latter answer;
n_m^1 is the frequency with which the main task output is 1 in a training batch;
n_m^0 is the frequency with which the main task output is 0 in a training batch;
n_e^1 is the frequency with which the auxiliary task output is 1 in a training batch;
n_e^0 is the frequency with which the auxiliary task output is 0 in a training batch;
y_m is the output result of the main task after SOFTMAX normalization;
y_e is the output result of the auxiliary task after SOFTMAX normalization;
λ_e is the scaling factor of the auxiliary task in the comprehensive loss function;
L_2 is the regularization penalty term.
The first part of the total loss function, -[L_m·log(y_m) + (1 - L_m)·log(1 - y_m)], is the cross-entropy function between the output O_m and the main task label L_m in the sample data; the second part, -λ_e·[L_e·log(y_e) + (1 - L_e)·log(1 - y_e)], is the cross-entropy function between the output O_e and the auxiliary task label L_e, scaled by λ_e; the third part, L_2, is a regularization penalty term used to prevent the model from overfitting.
Preferably, the fully connected layers of the main task neural network may be fully connected layers based on a Siamese architecture. In this structure, the feature vectors V_q and V_a are first mapped to a similarity space, and then the similarities for all filter sizes are put into the fully connected layers. Finally, the output O_m indicating whether the question and answer match is obtained, which is a Boolean variable.
Preferably, a pairwise training method may be adopted, in which the feature vectors of a pair of questions and answers are simultaneously input into the fully connected layers to predict the final result. Training the model on the relative relation of a pair of data reduces the influence of subjectivity.
Preferably, the multi-task learning neural network model may be trained and verified using data from the Quora database as training samples. The first dataset, containing various questions related to Quora, was released by the U.S. question-and-answer website Quora with the aim of helping researchers in fields such as machine learning, natural language processing and neural network science to build scalable online knowledge sharing platforms.
The invention also provides a computer device embodiment, the computer device comprises a memory and a processor; the memory is used for storing a computer program; the processor is configured to execute the computer program and implement the steps of the question-answer matching attention processing method according to any one of the above items when the computer program is executed.
The present invention further provides an embodiment of a computer-readable storage medium, which stores a computer program, and when the computer program is executed by a processor, the method for processing the attention of question and answer matching is implemented.
The working principle of the invention is explained below in connection with a preferred embodiment of the invention:
In order to improve the accuracy and interpretability of the question-answer matching attention network, the externalization process is combined with the attention network to establish a multi-task learning neural network model, which is an externalization-based attention neural network model, EBANN model for short, as shown in FIG. 1. The multi-task learning neural network model has two tasks: question-answer matching as the main task, and identifying the externalization degree of the answer as the auxiliary task. An auxiliary task neural network model is then introduced as an additional neural network model (a CNN network) and attached to the main task neural network model by means of multi-task learning, so that the externalization degree can be incorporated into the attention mechanism of the main task.
In the following, the convolutional neural network model is abbreviated as CNN; the main task neural network model is abbreviated as the main task CNN; and the auxiliary task neural network model is abbreviated as the auxiliary task CNN.
The degree of externalization is the degree of conversion from implicit knowledge to explicit knowledge.
Externalization means converting implicit knowledge into explicit knowledge, expressing ideas and know-how in forms such as language; this "implicit knowledge" is transformed into "explicit knowledge" within teams through dialogue and discussion, which belongs to the "externalization" stage of knowledge creation. The auxiliary task neural network model is attached to the main task neural network model.
The so-called "implicit knowledge" refers to knowledge that has not been informatized, including the experience, techniques, culture and habits of an enterprise, its operators or its employees; it is knowledge of personal experience and intuition that is not easy to communicate or express in words, and it appears as skills that are difficult to formalize or embody. Such knowledge is subjective, self-comprehended knowledge related to an individual's cognition and ideas.
"Explicit knowledge" includes all knowledge presented in the form of documents, manuals, reports, maps, programs, pictures, sounds, images and the like, whether conventional written documents or electronic files; it is expressed by means of formal or institutional language or textual images and is objective, rational knowledge.
In the process of externalization, implicit knowledge can be expressed through metaphor, analogy, concepts and the like, and after modeling it forms conceptual knowledge, namely explicit knowledge.
The degree of externalization is a label marked by a person and therefore carries a certain subjectivity. A pairwise training method is therefore adopted, training the model on the relative relation between a pair of data so as to reduce the influence of subjectivity.
Moreover, in order to reduce the subjective influence caused by data labeling, the pairwise training method lets the model learn how to select the better answer from the relative quality of two answers. Therefore, the two answers input into the model each time are two answer data items under the same question; they are input into the model separately, the results are calculated, and the results are then compared to decide which is better. FIG. 1 only illustrates the processing of one of the two answers in the model. A pair of feature vectors is input together into the fully connected layers to predict the final result.
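A rough illustration of this pairwise comparison; the match_score function below is a stand-in for the full model and is assumed to return a scalar matching score for one question-answer pair:

```python
def pairwise_predict(match_score, question, answer_former, answer_latter):
    """Compare two answers to the same question using the same scoring model."""
    s_former = match_score(question, answer_former)
    s_latter = match_score(question, answer_latter)
    # Label convention from the text: 1 if the former answer matches better, 0 if the latter does.
    return 1 if s_former > s_latter else 0
```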
Step 1: l_q and l_a are the maximum lengths of the question and the answer respectively, and each word vector has dimension d. As in most studies, for each textual question-answer pair these unstructured data are first converted by the trainable embedding module in FIG. 2 into two computable word embedding matrices, the question embedding matrix E_q ∈ R^(l_q×d) and the answer embedding matrix E_a ∈ R^(l_a×d). In addition, e_q(i) is the i-th word vector in the question embedding matrix, corresponding to the i-th word in the question, and e_a(j) is the j-th word vector in the answer embedding matrix, corresponding to the j-th word in the answer.
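A minimal sketch of this embedding step, assuming a small illustrative vocabulary and a trainable embedding table whose padding row is a zero vector (all names and sizes are illustrative):

```python
import numpy as np

rng = np.random.default_rng(0)
vocab = {"<pad>": 0, "how": 1, "does": 2, "attention": 3, "work": 4}
d, l_q = 4, 6
embedding_table = rng.normal(size=(len(vocab), d))   # trainable in the real model
embedding_table[0] = 0.0                              # the <pad> row is a zero vector

def embed(tokens, max_len):
    ids = [vocab.get(t, 0) for t in tokens][:max_len]
    ids += [0] * (max_len - len(ids))                 # zero-pad to the maximum length
    return embedding_table[ids]                       # (max_len, d) embedding matrix

E_q = embed(["how", "does", "attention", "work"], l_q)
print(E_q.shape)  # (6, 4)
```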
Step 2: after obtaining I_q and I_a, convolution operations are performed. The composite input word vector matrices [E_q; I_q] and [E_a; I_a] are each convolved to obtain the feature vector matrices F_q and F_a, with i ∈ [1, l_q], j ∈ [1, l_a] and k ∈ [1, n]; i indexes the i-th word vector of the question, j indexes the j-th word vector of the answer, and k indexes the k-th of the n filters, there being n filters for each filter size f. The convolution weights and biases are trainable parameters of the convolutional layer. During this convolution the filter is moved only in the vertical direction in FIG. 2, which means that one word is convolved together with the f words after it. This is because the individual dimensions of a word vector carry no particular meaning relative to one another, so no horizontal movement is required in the model.
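A minimal NumPy sketch of this vertical-only convolution over the composite input; the tanh activation and the concrete sizes are assumptions, since the patent does not fix them:

```python
import numpy as np

rng = np.random.default_rng(1)
l_q, d, f, n = 6, 4, 3, 5                       # words, embed dim, filter size, filter count
E_q = rng.normal(size=(l_q, d))
I_q = rng.normal(size=(l_q, d))                 # attention-converted input, same size as E_q
composite = np.concatenate([E_q, I_q], axis=1)  # [E_q; I_q], shape (l_q, 2d)

W = rng.normal(size=(n, f, 2 * d))              # one (f x 2d) filter per feature map
b = np.zeros(n)
padded = np.vstack([composite, np.zeros((f - 1, 2 * d))])  # keep l_q output rows

F_q = np.empty((l_q, n))
for i in range(l_q):                            # slide only along the word axis (vertical)
    window = padded[i:i + f]                    # a word and the following words covered by the filter
    for k in range(n):
        F_q[i, k] = np.tanh(np.sum(window * W[k]) + b[k])

print(F_q.shape)  # (6, 5): l_q rows, one column per filter
```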
Step 3: the attention matrices A_1 and A_2 are constructed from the similarity of the word vectors or of the feature vector matrices. The attention matrices make the model pay more attention to features that occur in both the question and the answer. Specifically, each element A_1(i,j) of the first attention matrix A_1 ∈ R^(l_q×l_a) is the similarity between the word vector e_q(i) of E_q and the word vector e_a(j) of E_a, i.e. the similarity between the i-th word vector of the question and the j-th word vector of the answer; either Euclidean distance or cosine similarity can be used to measure this similarity. After that, to obtain the composite input word vector matrices [E_q; I_q] and [E_a; I_a] of the CNN, A_1 must be converted into matrices of the same size as E_q and E_a. Thus, two trainable matrices M_q ∈ R^(l_a×d) and M_a ∈ R^(l_q×d) are constructed, and A_1 is converted into I_q ∈ R^(l_q×d) and I_a ∈ R^(l_a×d) by the following matrix multiplications:
I_q = A_1 · M_q, I_a = (A_1)^T · M_a
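A minimal sketch of building A_1 and converting it into I_q and I_a, using cosine similarity (one of the two options the text allows); M_q and M_a are randomly initialized here but trainable in the real model:

```python
import numpy as np

rng = np.random.default_rng(2)
l_q, l_a, d = 6, 8, 4
E_q = rng.normal(size=(l_q, d))
E_a = rng.normal(size=(l_a, d))

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-8))

# First attention matrix: word-level similarity between question and answer.
A1 = np.array([[cosine(E_q[i], E_a[j]) for j in range(l_a)] for i in range(l_q)])

# Trainable conversion matrices bring A1 to the sizes of E_q and E_a.
M_q = rng.normal(size=(l_a, d))
M_a = rng.normal(size=(l_q, d))
I_q = A1 @ M_q        # (l_q, d), same size as E_q
I_a = A1.T @ M_a      # (l_a, d), same size as E_a
print(A1.shape, I_q.shape, I_a.shape)  # (6, 8) (6, 4) (8, 4)
```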
Similarly, each element A_2(i,j) of the second attention matrix A_2 ∈ R^(l_q×l_a) is the similarity between the i-th row of F_q and the j-th row of F_a, i.e. the similarity between the i-th feature map vector of the question and the j-th feature map vector of the answer, and it is calculated in the same way as the elements of A_1.
Step 4: more importantly, A_2 is combined with the auxiliary task to obtain a third attention matrix A_3 that can take the externalization degree into account; this is the most critical part of the model. After the convolution operation on the composite word vector matrix [E_a; I_a], the feature vector matrix F_e of the auxiliary task is obtained, with j ∈ [1, l_a] and h ∈ [1, l_q], where h denotes the h-th filter; the auxiliary task uses l_q filters rather than the n filters used for E_q, so that after transposition F_e yields a matrix of the same size as the second attention matrix A_2. F_e has two functions: the first is to be part of the independent deep neural network of the auxiliary task, which identifies the externalization degree of the answer, and the second is to build the multi-task learning module that improves the main task. The invention combines the main task of question-answer matching and the auxiliary task of identifying the externalization degree of the answer through the parameter sharing of the two CNNs. Specifically, through multi-task learning, the auxiliary task feature vector matrix F_e is incorporated into the attention module A_2 to obtain a new attention module, which is defined as the third attention matrix A_3. The procedure is as follows:
First, the auxiliary task feature vector matrix F_e is converted by the trainable matrix M_e into a matrix C of the same size as A_2, with the calculation formula:
C = (F_e)^T · M_e
Second, C and A_2 are combined into A_3 by taking a convex combination of the corresponding elements. Each element A_3(i,j) of A_3 is calculated by the following formula:
A_3(i,j) = λ(i,j) · C(i,j) + (1 - λ(i,j)) · A_2(i,j)
where C(i,j) denotes an element of C, and λ(i,j) are trainable convex combination coefficients, one for each position; they are all initialized to λ(i,j) = 1 and trained during back propagation. Compared with a multi-task learning network with a manually designed sharing layer, this multi-task module has a more simplified structure and fewer parameters: it removes the need to manually configure a sharing layer, reduces training time, and improves model precision.
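A minimal sketch of this fusion step with the shapes described above; M_e and the coefficients λ are initialized here as stated in the text but would be trained in the real model:

```python
import numpy as np

rng = np.random.default_rng(3)
l_q, l_a = 6, 8
A2 = rng.normal(size=(l_q, l_a))        # second attention matrix from F_q / F_a similarity
F_e = rng.normal(size=(l_a, l_q))       # auxiliary-task feature map: l_q filters over l_a words
M_e = rng.normal(size=(l_a, l_a))       # trainable conversion matrix

C = F_e.T @ M_e                         # (l_q, l_a), same size as A2
lam = np.ones((l_q, l_a))               # convex coefficients, initialized to 1, trained later
A3 = lam * C + (1.0 - lam) * A2         # element-wise convex combination
print(A3.shape)  # (6, 8)
```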
Step 5: after obtaining A_3, the merging weight W_q ∈ R^(1×l_q) of the question and the merging weight W_a ∈ R^(1×l_a) of the answer are calculated by summing the rows and columns of A_3 respectively:
W_q(i) = Σ_{j=1..l_a} A_3(i,j), W_a(j) = Σ_{i=1..l_q} A_3(i,j)
where W_q(i) and W_a(j) denote the elements of W_q and W_a. Then, using a weighted complete pooling method with W_q and W_a as the weights, F_q and F_a are reduced to the question feature vector V_q ∈ R^(1×n) and the answer feature vector V_a ∈ R^(1×n), calculated as follows:
V_q(k) = Σ_{i=1..l_q} W_q(i) · F_q(i,k), V_a(k) = Σ_{j=1..l_a} W_a(j) · F_a(j,k), k ∈ [1, n], i ∈ [1, l_q], j ∈ [1, l_a]
Compared with the traditional pooling approach, this approach can assign more weight to the more important elements.
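A minimal sketch of the row/column summation and the weighted complete pooling, continuing the shapes used above:

```python
import numpy as np

rng = np.random.default_rng(4)
l_q, l_a, n = 6, 8, 5
A3 = rng.normal(size=(l_q, l_a))
F_q = rng.normal(size=(l_q, n))
F_a = rng.normal(size=(l_a, n))

W_q = A3.sum(axis=1)         # (l_q,): one merging weight per question word
W_a = A3.sum(axis=0)         # (l_a,): one merging weight per answer word

V_q = W_q @ F_q              # (n,): weighted complete pooling of the question features
V_a = W_a @ F_a              # (n,): weighted complete pooling of the answer features
print(V_q.shape, V_a.shape)  # (5,) (5,)
```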
Step 6: in order to obtain the final Boolean output result and the total loss function, V_q and V_a are put together into a two-layer fully connected network. Since a pairwise training method is used, pairs of feature vectors must be input into the fully connected layers; V_q and V_a in FIG. 1 are therefore each made up of two parts. In addition, for the fully connected layers of the main task, two types of fully connected layers are used for comparison.
The first type of fully connected layer is based on the Siamese architecture. In this structure, the feature vectors V_q and V_a are first mapped to a similarity space, and then the similarities for all filter sizes are put into the fully connected layers. Finally, the output O_m indicating whether the question and answer match is obtained, which is a Boolean variable.
In the second type of fully connected layer, V_q and V_a are put directly into two fully connected layers. After that, the final Boolean output O_m indicating whether the question and answer match can also be obtained. If the second type of fully connected layer is used, the corresponding model carries a superscript mark in the discussion of the results in FIG. 3.
In addition, for the fully connected layers of the auxiliary task, V_e is simply put directly into two fully connected layers without any further processing. After that, the output O_e of the auxiliary task, which is also a Boolean variable, can be obtained.
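A minimal sketch of the second (direct) type of fully connected head and of the auxiliary head; the hidden width of 16 is an assumption, and the Siamese variant would first map V_q and V_a into a similarity space:

```python
import torch
import torch.nn as nn

n, l_q = 5, 6
main_head = nn.Sequential(nn.Linear(2 * n, 16), nn.ReLU(), nn.Linear(16, 2))  # takes [V_q; V_a]
aux_head = nn.Sequential(nn.Linear(l_q, 16), nn.ReLU(), nn.Linear(16, 2))     # takes V_e

V_q, V_a, V_e = torch.randn(1, n), torch.randn(1, n), torch.randn(1, l_q)
O_m_logits = main_head(torch.cat([V_q, V_a], dim=1))  # main-task output (before softmax)
O_e_logits = aux_head(V_e)                            # auxiliary-task output (before softmax)
print(O_m_logits.shape, O_e_logits.shape)  # torch.Size([1, 2]) torch.Size([1, 2])
```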
Step 7: finally, the total loss function consists of the loss functions of the main task and the auxiliary task. Batch gradient descent (BGD) is used to perform the back propagation process and optimize all trainable parameters. In order to train the model, the loss function must be set as the objective function. Therefore, the results are first normalized using the SOFTMAX activation function:
y_m = exp(n_m^1) / (exp(n_m^1) + exp(n_m^0))
where n_m^1 and n_m^0 are the frequencies of O_m = 1 and O_m = 0 in a training batch; in the same way, n_e^1 and n_e^0 are the frequencies of O_e = 1 and O_e = 0 in a training batch, and y_e is obtained analogously. Then, the overall loss function is designed, consisting of three parts:
Loss = -[L_m·log(y_m) + (1 - L_m)·log(1 - y_m)] - λ_e·[L_e·log(y_e) + (1 - L_e)·log(1 - y_e)] + L_2
The first part of the total loss function, -[L_m·log(y_m) + (1 - L_m)·log(1 - y_m)], is the cross-entropy function between the output O_m and the main task label L_m in the sample data; the second part, -λ_e·[L_e·log(y_e) + (1 - L_e)·log(1 - y_e)], is the cross-entropy function between the output O_e and the auxiliary task label L_e, scaled by λ_e; the third part, L_2, is a regularization penalty term used to prevent the model from overfitting.
In the above formulas:
E q an embedded matrix represented as a problem;
E a an embedded matrix represented as an answer;
I q the problem after the attention moment array conversion is expressed as an embedded matrix;
I a expressed as the answer after the attention moment matrix conversionEmbedding the matrix;
M q expressed as a trainable problem transformation matrix;
M a expressed as a trainable answer transformation matrix;
M e expressed as a trainable externalization level transformation matrix;
W q a merge weight represented as a problem;
W a a merge weight represented as an answer;
F q expressed as a primary problem eigenvector matrix;
F a expressed as a primary answer feature vector matrix;
F e expressed as a primary externalization degree eigenvector matrix;
V q representing the feature vector matrix of the second-level problem;
V a Expressed as a secondary answer feature vector matrix;
V e representing the characteristic vector matrix of the two-stage externalization degree;
A 1 expressed as a first attention matrix;
A 2 expressed as a second attention matrix;
A 3 expressed as a third attention matrix;
L m the main task label is a boolean variable used for representing the result of question-answer matching the main task, wherein 0 represents that the next answer in the paired training pattern data is more matched with the question than the previous answer, and 1 represents that the previous answer in the paired training pattern data is more matched with the question than the next answer.
L e The auxiliary task label is a boolean variable used for representing the result of externalization degree, wherein 0 represents that the next answer in the paired training pattern data has higher externalization characteristics than the previous answer, and 1 represents that the previous answer in the paired training pattern data has higher externalization characteristics than the next answer.
Figure BDA0002388915410000151
Outputting a frequency of 1 for the main task label in a training batch;
Figure BDA0002388915410000152
the frequency of 0 output for the master task label in a training batch;
Figure BDA0002388915410000153
outputting a frequency of 1 for the auxiliary task label in a training batch;
Figure BDA0002388915410000154
outputting a frequency of 1 for the auxiliary task label in a training batch;
y m the output result of the main task after the SoftMax function standardization is obtained;
y e The output result of the auxiliary task after the SoftMax function standardization is obtained;
λ e a scaling factor in the composite loss function for the auxiliary task;
L 2 to normalize the penalty parameters.
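A minimal sketch of this combined loss, assuming y_m and y_e are already SOFTMAX-normalized; the value of λ_e and the L2 weight are illustrative:

```python
import torch

def ebann_loss(y_m, y_e, L_m, L_e, params, lambda_e=0.5, weight_decay=1e-4):
    """y_m, y_e: normalized outputs in (0, 1); L_m, L_e: Boolean labels as floats."""
    ce_main = -(L_m * torch.log(y_m) + (1 - L_m) * torch.log(1 - y_m))
    ce_aux = -(L_e * torch.log(y_e) + (1 - L_e) * torch.log(1 - y_e))
    l2 = weight_decay * sum((p ** 2).sum() for p in params)   # regularization penalty term
    return (ce_main + lambda_e * ce_aux).mean() + l2

# Example with dummy values:
y_m = torch.tensor([0.8]); y_e = torch.tensor([0.3])
L_m = torch.tensor([1.0]); L_e = torch.tensor([0.0])
params = [torch.randn(4, 4, requires_grad=True)]
print(ebann_loss(y_m, y_e, L_m, L_e, params))
```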
Step 8: the experimental process. The Z-scores of W_q and W_a serve as the basis for the attention visualization in FIG. 4, showing the interpretability of the attention network. It can be seen that:
first, the EBANN model may retain most of the original attention features of ABCNN, which means that the model may inherit the features of existing question-and-answer matching attention networks.
Second, the attention weight of the EBANN model has more negative values. This means that the EBANN model can not only focus more on meaningful information, but also effectively suppress useless information.
Third, early studies demonstrated that too many prepositions and pronouns reduced the readability of the text, while the EBANN model had much negative weight on prepositions and pronouns. Therefore, the EBANN model can transfer the auxiliary task characteristic of identifying the externalization degree to the main task, so that the attention network has better interpretability and the accuracy of model prediction is improved.
Finally, according to the experimental results and the attention visualization, it is considered that the process of generating attention in question-answer matching can be purposefully influenced through parameter sharing.
The invention provides an externalized attention neural network (EBANN) model by introducing an auxiliary task of an externalized process based on a knowledge creation theory into a question-answer matching model. In addition, the main task of question-answer matching is combined with the auxiliary task through a multi-task learning method, so that an attention model based on word vector similarity features can pay attention to high-level externalized semantic features.
Through the description of the above embodiments, it is clear to those skilled in the art that the above embodiments may be implemented by software, or by software plus a necessary general hardware platform. Based on such understanding, the technical solutions of the above embodiments may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a usb disk, a removable hard disk, or the like), and includes several instructions for enabling a computer device (which may be a personal computer, a server, or a network device, or the like) to execute the methods according to the embodiments of the present invention.
The above-mentioned embodiments are only for illustrating the technical ideas and features of the present invention, and the purpose thereof is to enable those skilled in the art to understand the contents of the present invention and to carry out the same, and the present invention shall not be limited to the embodiments, i.e. the equivalent changes or modifications made within the spirit of the present invention shall fall within the scope of the present invention.

Claims (9)

1. A question-answer matching attention processing method is characterized in that a multitask learning neural network model is established; the multi-task learning neural network model comprises a main task neural network model and an auxiliary task neural network model; the main task neural network model is used for extracting an answer characteristic matrix and an attention characteristic matrix, and the auxiliary task neural network model is used for extracting an externalization degree characteristic matrix of an answer; the auxiliary task neural network model is attached to the main task neural network model; acquiring a training sample, and marking the externalization degree of answers to the question-answer matched sample; inputting the processed sample into the multi-task learning neural network model for comprehensive loss training, and optimizing the shared parameters of the multi-task learning neural network model;
the method comprises the following specific steps:
step one, obtaining a sample set of question texts and answer texts, and performing word segmentation on the question texts and answer texts in the sample set to obtain corpus word-segmentation data; embedding the obtained corpus word-segmentation data to obtain an embedding matrix of the question, denoted E_q, and an embedding matrix of the answer, denoted E_a;
step two, setting a first attention matrix to represent the similarity between the i-th word vector of E_q and the j-th word vector of E_a, the first attention matrix being denoted A_1; constructing two trainable matrices M_q ∈ R^(l_a×d) and M_a ∈ R^(l_q×d), wherein M_q is used to convert A_1 into a matrix I_q of the same size as E_q, and M_a is used to convert (A_1)^T into a matrix I_a of the same size as E_a; l_a denotes the maximum number of character strings of the answer text; l_q denotes the maximum number of character strings of the question text; d denotes the word vector dimension; the conversion formulas are as follows:
I_q = A_1 · M_q
I_a = (A_1)^T · M_a
step three, after obtaining E_q, I_q, E_a and I_a, inputting the composite input word vector matrix [E_q; I_q] into the main task neural network model to obtain a primary question feature vector matrix, denoted F_q; inputting the composite input word vector matrix [E_a; I_a] into the main task neural network model and the auxiliary task neural network model respectively, the main task neural network model yielding a primary answer feature vector matrix, denoted F_a, and the auxiliary task neural network model yielding a primary externalization degree feature vector matrix, denoted F_e;
step four, setting a second attention matrix to represent the similarity between F_q and F_a, the second attention matrix being denoted A_2; combining F_e with the second attention matrix to obtain a third attention matrix, denoted A_3;
step five, after obtaining the third attention matrix A_3, calculating the merging weight of the question, denoted W_q, and the merging weight of the answer, denoted W_a, by summing the rows and columns of A_3 respectively; then, by weighted total pooling, applying W_q and W_a to F_q, F_a and F_e to obtain a secondary question feature vector matrix, a secondary answer feature vector matrix and a secondary externalization degree feature vector matrix, denoted V_q, V_a and V_e respectively;
step six, inputting V_q and V_a simultaneously into the two fully connected layers of the main task neural network model; inputting V_e into the two connection layers of the auxiliary task neural network model, obtaining a total loss function composed of the loss function of the main task neural network model and the loss function of the auxiliary task neural network model;
step seven, taking the total loss function as the objective function, and optimizing all trainable parameters by using a batch gradient descent method.
2. The question-answer matching attention processing method according to claim 1, characterized in that the auxiliary task neural network model has separate pooling layers, connection layers and output layers.
3. The question-answer matching attention processing method according to claim 1, characterized in that the specific method of combining F_e with the second attention matrix to obtain the third attention matrix is as follows:
first, a trainable matrix M_e is constructed, and the transpose of F_e is converted by the trainable matrix M_e into a matrix C of the same size as A_2; the conversion formula is as follows:
C = (F_e)^T · M_e
then, C and A_2 are combined into A_3 by convex linear combination; each element A_3(i,j) of A_3 is calculated by the following formula:
A_3(i,j) = λ(i,j) · C(i,j) + (1 - λ(i,j)) · A_2(i,j), i ∈ [1, l_q], j ∈ [1, l_a]
wherein C(i,j) denotes an element of C; A_2(i,j) denotes an element of A_2; l_a denotes the maximum number of character strings of the answer text; l_q denotes the maximum number of character strings of the question text; λ(i,j) are trainable convex combination coefficients, all initialized to λ(i,j) = 1 and trained during back propagation.
4. The question-answer matching attention processing method according to claim 1, characterized in that, when performing synthetic loss training, output data of the main task neural network model and the auxiliary task neural network model are converted into normalized output results which are distributed between 0 and 1 and have a sum of 1 by using a SOFTMAX activation function.
5. The question-answer matching attention processing method according to claim 1, characterized in that the two fully connected layers of the main task neural network model are fully connected layers based on a Siamese architecture.
6. The question-answer matching attention processing method according to claim 1, characterized in that a pair training method is adopted, feature vectors of a pair of questions and answers are simultaneously input into two fully connected layers of a main task neural network model to predict a final result.
7. The question-answer matching attention processing method according to claim 1, characterized in that the multitask learning neural network model is trained and verified by using data of a Quora database as training samples.
8. A computer device, wherein the computer device comprises a memory and a processor; the memory is used for storing a computer program; the processor for executing the computer program and implementing the question-answer matching attention processing method steps according to any one of claims 1 to 7 when executing the computer program.
9. A computer-readable storage medium storing a computer program, characterized in that the computer program, when executed by a processor, implements the steps of the question-answer matching attention processing method according to any one of claims 1 to 7.
CN202010107570.6A 2020-02-21 2020-02-21 Question-answer matching attention processing method, computer equipment and storage medium Active CN111488438B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010107570.6A CN111488438B (en) 2020-02-21 2020-02-21 Question-answer matching attention processing method, computer equipment and storage medium

Publications (2)

Publication Number Publication Date
CN111488438A CN111488438A (en) 2020-08-04
CN111488438B true CN111488438B (en) 2022-07-29

Family

ID=71812408

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010107570.6A Active CN111488438B (en) 2020-02-21 2020-02-21 Question-answer matching attention processing method, computer equipment and storage medium

Country Status (1)

Country Link
CN (1) CN111488438B (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112199481B (en) * 2020-09-30 2023-06-16 中国人民大学 Single-user personalized dialogue method and system adopting PCC dialogue model
CN114330355A (en) * 2020-10-09 2022-04-12 腾讯科技(深圳)有限公司 Text processing method and device, electronic equipment and storage medium
CN112667797B (en) * 2021-01-06 2023-05-30 华南师范大学 Question-answer matching method, system and storage medium for self-adaptive transfer learning
CN115438156B (en) * 2021-05-17 2024-02-02 中山大学 Method and system for answer selection and question classification of multi-task learning

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102439595A (en) * 2009-03-13 2012-05-02 发明机器公司 Question-answering system and method based on semantic labeling of text documents and user questions
CN102118431A (en) * 2010-12-30 2011-07-06 百度在线网络技术(北京)有限公司 Knowledge information interaction service method, platform and site
CN107203600A (en) * 2017-05-12 2017-09-26 浙江大学 It is a kind of to utilize the evaluation method for portraying cause and effect dependence and sequential influencing mechanism enhancing answer quality-ordered
CN107562792A (en) * 2017-07-31 2018-01-09 同济大学 A kind of question and answer matching process based on deep learning
CN108052588A (en) * 2017-12-11 2018-05-18 浙江大学城市学院 A kind of construction method of the document automatically request-answering system based on convolutional neural networks
CN108804689A (en) * 2018-06-14 2018-11-13 合肥工业大学 The label recommendation method of the fusion hidden connection relation of user towards answer platform
CN110704601A (en) * 2019-10-11 2020-01-17 浙江大学 Method for solving video question-answering task requiring common knowledge by using problem-knowledge guided progressive space-time attention network

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Personalized Recommendation for Crowdfunding Platform: A Multi-objective Approach; Lei Zhang; 2019 IEEE Congress on Evolutionary Computation (CEC); 2019-08-08; full text *
Context-dependent question-answer pairing method based on an attention mechanism; Wang Lu; Journal of Chinese Information Processing; 2019-01-31; full text *
Video question answering under a spatio-temporal attention model; Gao Kun; China Master's Theses Full-text Database (Information Science and Technology); 2019-04-15; full text *

Also Published As

Publication number Publication date
CN111488438A (en) 2020-08-04

Similar Documents

Publication Publication Date Title
CN111488438B (en) Question-answer matching attention processing method, computer equipment and storage medium
Raschka et al. Python machine learning: Machine learning and deep learning with Python, scikit-learn, and TensorFlow 2
Raschka et al. Machine Learning with PyTorch and Scikit-Learn: Develop machine learning and deep learning models with Python
Yu et al. Beyond bilinear: Generalized multimodal factorized high-order pooling for visual question answering
CN110083705B (en) Multi-hop attention depth model, method, storage medium and terminal for target emotion classification
CN108804530B (en) Subtitling areas of an image
Santa Cruz et al. Visual permutation learning
CN111581395A (en) Model fusion triple representation learning system and method based on deep learning
Guan et al. Autoattend: Automated attention representation search
CN108804677A (en) In conjunction with the deep learning question classification method and system of multi-layer attention mechanism
Wang et al. Manifold interpolation for large-scale multiobjective optimization via generative adversarial networks
Wu et al. Exploring large language model for graph data understanding in online job recommendations
CN111460201A (en) Cross-modal retrieval method for modal consistency based on generative countermeasure network
Steur et al. Next-generation neural networks: Capsule networks with routing-by-agreement for text classification
Jiang et al. An intelligent recommendation approach for online advertising based on hybrid deep neural network and parallel computing
Huang Robustness analysis of visual question answering models by basic questions
Park et al. An effective 3D text recurrent voting generator for metaverse
Yusuf et al. Evaluation of graph convolutional networks performance for visual question answering on reasoning datasets
Guan et al. On the imaginary wings: Text-assisted complex-valued fusion network for fine-grained visual classification
CN113988201B (en) Multi-mode emotion classification method based on neural network
CN114048361A (en) Crowdsourcing software developer recommendation method based on deep learning
Hirano et al. Implementation of Actual Data for Artificial Market Simulation
Song Aspect-Based Sentiment Analysis on Mobile Game Reviews Using Deep Learning
WO2023166746A1 (en) Summary generation device, summary model learning device, summary generation method, summary model learning method, and program
WO2023166747A1 (en) Training data generation device, training data generation method, and program

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant