CN113127599A - Question-answering position detection method and device of hierarchical alignment structure - Google Patents
- Publication number
- CN113127599A (application number CN202110230676.XA)
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
- G06F16/3344 — Query execution using natural language analysis
- G06F16/3346 — Query execution using probabilistic model
- G06F16/35 — Clustering; Classification
- G06Q50/01 — Social networking
Abstract
The invention discloses a question-answering stance detection method and device with a hierarchical alignment structure, wherein the method comprises the following steps: respectively converting a question text and an answer text into a question sequence and an answer sequence; splicing the question sequence and the answer sequence to obtain a question-answer sequence; and inputting the question sequence, the answer sequence, and the question-answer sequence into a hierarchical alignment model to obtain a question-answering stance detection result. The hierarchical alignment model uses a pre-trained BERT model to obtain a coarse-grained stance representation, then performs concept-level target alignment and evidence-level information alignment from both the question side and the answer side of the QA pair to obtain a coarse-to-fine stance representation, and thereby achieves higher accuracy and F1 values on the question-answering stance detection task.
Description
Technical Field
The invention relates to stance detection on social media, a subfield of natural language processing, and in particular to a question-answering stance detection method and device with a hierarchical alignment structure.
Background Art
The stance detection task is a classification problem that aims to identify the stance an author expresses toward a specific target (such as an entity, claim, or event). It plays an important role in tasks such as opinion mining, political debate analysis, rumor detection, and fake-news detection. On social question-answering platforms, question-answering stance detection is a new variant of this task, which aims to identify the stance carried in an answer toward a specific question.
For the stance detection task, early research focused on online debate forums, mainly using rule-based algorithms (Walker, M., Fox Tree, J., Anand, P., Abbott, R., King, J.: A corpus for research on deliberation and debate. In: Proceedings of the Eighth International Conference on Language Resources and Evaluation (LREC'12), pp. 812-817, European Language Resources Association (ELRA), Istanbul, Turkey, May 2012), SVM classifiers (Hasan, K.S., Ng, V.: Stance classification of ideological debates: Data, models, features, and constraints. In: Proceedings of the Sixth International Joint Conference on Natural Language Processing, 2013), and related feature-based machine-learning methods. Recent work has gradually shifted to the social media domain, and research methods have shifted to deep learning, using deep neural network models to analyze the stance toward targets, e.g., (Vijayaraghavan, P., Sysoev, I., Vosoughi, S., Roy, D.: DeepStance at SemEval-2016 Task 6: Detecting stance in tweets using character and word-level CNNs. In: Proceedings of the 10th International Workshop on Semantic Evaluation (SemEval-2016), pp. 413-419, Association for Computational Linguistics, San Diego, California, 2016), (Zarrella, G., Marsh, A.: MITRE at SemEval-2016 Task 6: Transfer learning for stance detection. In: Proceedings of SemEval-2016, Association for Computational Linguistics, 2016), and (Sun, Q., Wang, Z., Zhu, Q., Zhou, G.: Stance detection with hierarchical attention network. In: Proceedings of the 27th International Conference on Computational Linguistics, pp. 2399-2409, Association for Computational Linguistics, Santa Fe, New Mexico, USA, Aug 2018).
In addition, some studies, such as (Zhang, B., Yang, M., Li, X., Ye, Y., Xu, X., Dai, K.: Enhancing cross-target stance detection with transferable semantic-emotion knowledge. In: Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, pp. 3188-3197, Association for Computational Linguistics, Online, Jul 2020) and related transfer-learning work, use transfer learning to migrate knowledge about target objects across domains.
Question-answering stance detection targets the question in a QA text and identifies the stance expressed in the answer text. Given a question-answer (QA) pair, the latest approach proposes a recurrent conditional attention network (Yuan, J., Zhao, Y., Xu, J., Qin, B.: Exploring answer stance detection with recurrent conditional attention. In: The Thirty-Third AAAI Conference on Artificial Intelligence, AAAI 2019, Honolulu, Hawaii, USA, January 27 - February 1, 2019, pp. 7426-7433) that models the interaction between the question and the answer to obtain the final stance of the answer toward the question. To solve the question-answering stance detection task, a model must not only understand the semantics of the question and answer texts, but also model the relationship between them.
Furthermore, compared with related stance detection subtasks such as rumor stance detection and fake-news stance detection (Gorrell, G., Kochkina, E., Liakata, M., Aker, A., Zubiaga, A., Bontcheva, K., Derczynski, L.: SemEval-2019 Task 7: RumourEval, determining rumour veracity and support for rumours. In: Proceedings of the 13th International Workshop on Semantic Evaluation, pp. 845-854, Association for Computational Linguistics, Minneapolis, Minnesota, USA, Jun 2019), question-answering stance detection focuses more on learning the mutual correlation within a QA pair and modeling the stance representation under a specified target. Target-dependent sentiment analysis is also a related task; there, a representation of the target itself is learned, whereas question-answering stance detection must find the targets and information associated with the entire question.
The prior art, when applied to the question-answering stance detection task, ignores the following two problems. First, in question-answering stance detection the stance is related to targets corresponding to concepts in the question text, but the words expressing the same concept in the question and answer texts may not coincide, so target alignment should be performed. Second, the answer text may contain more than one concept-related target, and the extra target information may interfere with stance recognition, so context alignment should be performed to find the content that supports the question text, i.e., the evidence-related context.
Disclosure of Invention
The invention provides a question-answering stance detection method and device with a hierarchical alignment structure. Through concept-related target alignment and evidence-related context alignment, it solves the problem that the concept-related targets and evidence-related contexts in a QA pair may be inconsistent, and builds a coarse-to-fine vector representation of the stance. This effectively improves performance on the question-answering stance detection task and accurately identifies the stance that the answer text carries toward the question in a QA pair.
In order to achieve the purpose, the invention provides the following technical scheme:
a question-answering position detection method of a hierarchical alignment structure comprises the following steps:
1) respectively converting the question text and the answer text into a question sequence and an answer sequence;
2) splicing the question sequence and the answer sequence to obtain a question answer sequence;
3) inputting the question sequence, the answer sequence and the question answer sequence into a hierarchical alignment model to obtain a question-answer standpoint detection result;
wherein, a question-answering place detection model is obtained through the following steps:
a) respectively converting the plurality of sample question texts and the plurality of sample answer texts into sample question sequences and sample answer sequences, and splicing the sample question sequences and the corresponding sample answer sequences to obtain a plurality of sample question answer sequences;
b) respectively coding each sample question sequence, sample answer sequence and sample question answer sequence to obtain a plurality of question sequence representations SQAnswer sequence representation SAAnd coarse grain size in the vertical representation of SQA;
c) Representing S by a question sequenceQAs a query and representing the corresponding answer sequence as SAObtaining a number of question-dependent answer representations M as keys and valuesQ→AExpressing S as a sequence of answersAAs a query and representing the corresponding answer sequence as SQObtaining, as keys and values, a number of answer-dependent question representations MA→QAnd connecting the question-dependent answer representation MQ→AWith corresponding answer-dependent question representation MA→QObtaining a plurality of fine-grained representations DQA;
d) Aligning fine-grained representations D based on a multi-head attention mechanismQARepresenting S from the corresponding coarse-grained standpointQAThe sentence meanings related to evidence between the two groups of the sentences obtain a plurality of vectors representing O from a coarse position to a fine position;
e) and classifying a plurality of vector representations O from coarse to fine to obtain a level alignment model.
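Steps a) through e) above can be sketched end to end as follows. This is an illustrative sketch only, not the claimed implementation: the BERT encoder and the multi-head matching blocks are stood in by random features and a single-head attention helper (`cross_attend`), and all dimensions are toy values.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d, C = 4, 6, 8, 3          # question length, answer length, embedding size, #classes

# a)-b) encode question, answer, and the spliced QA sequence
# (the BERT encoder is stood in by random features here)
S_Q  = rng.standard_normal((N, d))
S_A  = rng.standard_normal((M, d))
S_QA = rng.standard_normal((N + 1 + M, d))   # +1 row for the [SEP] separator

def cross_attend(query, keyval):
    """Single-head stand-in for the matching blocks: query attends over keyval."""
    scores = query @ keyval.T / np.sqrt(d)
    weights = np.exp(scores - scores.max(axis=1, keepdims=True))
    weights /= weights.sum(axis=1, keepdims=True)
    return weights @ keyval

# c) target alignment in both directions, then concatenation into D_QA
M_QA = cross_attend(S_Q, S_A)                # question-dependent answer representation
M_AQ = cross_attend(S_A, S_Q)                # answer-dependent question representation
D_QA = np.concatenate([M_QA, M_AQ], axis=0)  # fine-grained representation

# d) context alignment: D_QA attends over the coarse-grained representation S_QA
O = cross_attend(D_QA, S_QA)                 # coarse-to-fine stance representation

# e) pool and classify into one of C stance classes
logits = O.mean(axis=0) @ rng.standard_normal((d, C))
probs = np.exp(logits - logits.max()); probs /= probs.sum()
assert O.shape == (N + M, d) and abs(probs.sum() - 1.0) < 1e-9
```

The sketch only shows the data flow and tensor shapes; the detailed description below gives the actual multi-head formulation of each step.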
Further, encoding the sample question sequence, the sample answer sequence, and the sample question-answer sequence comprises: using a pre-trained BERT model.
Further, the question-dependent answer representation M_Q→A is obtained by the following steps:
1) using the question sequence representation S_Q as the query and the corresponding answer sequence representation S_A as the keys and values to obtain the output of the first question-answer matching block, comprising the steps of:
a) obtaining the output of the i-th head, ATT_i(S_Q, S_A) = softmax(S_Q W_i^Q (S_A W_i^K)^T / √d_k) S_A W_i^V, where d_k = d/h is the dimension of each head, d is the embedding size used when converting the sample question text and sample answer text into the sample question sequence and sample answer sequence, h is the number of heads, W_i^Q, W_i^K, W_i^V are learnable parameters, and 1 ≤ i ≤ h;
b) concatenating the outputs of the h heads and applying a linear projection to the concatenation to obtain MATT(S_Q, S_A) = [ATT_1(S_Q, S_A), ..., ATT_h(S_Q, S_A)] W^O, where W^O is a learnable parameter;
c) applying a residual connection between the question sequence representation S_Q and MATT(S_Q, S_A) to obtain Z = LN(S_Q + MATT(S_Q, S_A)), where LN is the layer normalization operation;
d) feeding Z into a feed-forward network and another residual connection layer to obtain the output of the first matching block, TIM_1(S_Q, S_A) = LN(Z + MLP(Z)), where MLP is the feed-forward network;
2) stacking l_m question-answer matching blocks and taking the output of the last block as the question-dependent answer representation M_Q→A.
Further, the coarse-to-fine stance vector representation is O = MATT′(D_QA, S_QA) = [ATT′_1(D_QA, S_QA), ..., ATT′_h′(D_QA, S_QA)] W′^O, where h′ is the number of attention heads and W′^O is a learnable parameter.
Further, the coarse-to-fine stance vector representations O are classified by:
1) using a softmax function to calculate the probability that a coarse-to-fine stance vector representation O belongs to each stance class;
2) taking the class with the highest probability as the class of that coarse-to-fine stance vector representation O.
Further, before calculating the probability that a coarse-to-fine stance vector representation O belongs to each stance class, a linear layer is used to reduce the dimensionality of each coarse-to-fine stance vector representation O.
Further, the loss function for training the hierarchical alignment model is L = −(1/N) Σ_{i=1}^{N} Σ_{j=1}^{|C|} y_{ij} log(ŷ_{ij}), where ŷ_{ij} is the predicted probability that sample i belongs to the j-th class, y_{ij} is 1 when the j-th class is the true label of sample i and 0 otherwise, N is the number of sample question texts (equivalently, sample answer texts), and |C| is the number of stance classes.
Further, the set of stance classes C comprises: favor, against, and neutral.
A storage medium having a computer program stored therein, wherein the computer program is arranged to perform the above-mentioned method when executed.
An electronic device comprising a memory and a processor, the memory having a computer program stored therein, and the processor being arranged to run the computer program to perform the method described above.
Compared with the prior art, the invention has the following advantages:
compared with the scheme of the circulation condition attention network, the method explicitly models the target dependency information through the attention coding strategy. In contrast, the conditional attention and extraction process only simulates the interaction between the QA pair, and neither learns a feature-rich text representation nor explicitly performs target and context alignment at the encoding stage, but the present invention uses the BERT pre-training model to obtain coarse-grained vertical representation, and then performs concept-level target alignment and evidence-level information alignment from both the question and the answer in the QA pair to obtain coarse-to-fine vertical representation. Experiments prove that the technical method can obtain higher accuracy and F1 value on the question-answering position detection task.
Drawings
FIG. 1 is a Hierarchical Alignment (HAT) model architecture diagram of the present invention.
Detailed Description
In order to make the technical solutions in the embodiments of the present invention better understood and make the objects, features, and advantages of the present invention more comprehensible, the technical core of the present invention is described in further detail below with reference to the accompanying drawings and examples.
The invention provides a novel question-answering stance detection model, a Transformer-based Hierarchical Alignment (HAT) model, as shown in Fig. 1. The model aligns the concept-related targets and evidence-related contexts in a question-answer pair, learns a coarse-to-fine stance representation, and is applied to question-answering stance detection. The HAT model mainly comprises three modules: a question-answer text encoding module, a concept-related target alignment module, and an evidence-related context alignment module. First, the invention uses the pre-trained model BERT (Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: BERT: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), pp. 4171-4186, Association for Computational Linguistics, Minneapolis, Minnesota, Jun 2019) to compute the basic semantic features of the question and answer texts. Then, QA interaction matching blocks are introduced to align the concept-related targets from two directions, obtaining a question-dependent answer representation and an answer-dependent question representation. Finally, a multi-head attention mechanism is used to align the evidence-related contexts and learn a better stance representation for question-answering stance detection.
The method is mainly divided into the following four parts: question-answer text encoding, target alignment, context alignment, and stance classification.
1. Question-answer text encoding
For the question text, the invention converts the text sequence into a sequence representation X = {x_1, x_2, ..., x_N}, where each x_i ∈ R^d is the sum of its word embedding, segment embedding, and position embedding, N is the length of the question sequence, and d is the embedding size, which is also the hidden dimension of the pre-trained BERT model used to obtain the text representation. The encoded text is the output of the last layer of the BERT encoder, i.e., the question sequence representation S_Q ∈ R^{N×d}. The answer sequence representation S_A ∈ R^{M×d}, where M is the length of the answer sequence, is obtained in the same way. Then, the invention splices the question sequence and the answer sequence and inputs the result into the pre-trained BERT model to obtain a coarse-grained stance representation, denoted S_QA ∈ R^{(N+1+M)×d}, where the one extra position corresponds to the separator [SEP] between the Q and A sequences.
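The encoding step can be illustrated with a small shape sketch. The BERT outputs are simulated with random tensors here (the `768` hidden size and the sequence lengths are assumed toy values); only the tensor shapes mirror the description.

```python
import numpy as np

rng = np.random.default_rng(0)
N, M, d = 5, 7, 768    # question length, answer length, BERT hidden size (assumed)

# each x_i is the sum of word, segment, and position embeddings
word_emb     = rng.standard_normal((N, d))
segment_emb  = rng.standard_normal((N, d))
position_emb = rng.standard_normal((N, d))
X = word_emb + segment_emb + position_emb      # input sequence representation

# the encoded representations are the last BERT layer outputs (simulated here)
S_Q  = rng.standard_normal((N, d))             # question sequence representation
S_A  = rng.standard_normal((M, d))             # answer sequence representation
# encoding the spliced input Q [SEP] A yields the coarse-grained stance representation
S_QA = rng.standard_normal((N + 1 + M, d))
assert X.shape == (N, d) and S_QA.shape == (N + 1 + M, d)
```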
2. Target alignment
The role of the concept-related target alignment module is to align the concept-related targets from both the question side and the answer side of the QA pair, learning a question-dependent answer representation and an answer-dependent question representation. To this end, QA interaction matching blocks are constructed that use an attention mechanism to align the concept-level targets from the two sides. We propose two QA interaction matching blocks: a question-answer matching block and an answer-question matching block.
The question-answer matching block uses the question sequence representation S_Q as the query and the answer sequence representation S_A as the keys and values. Conversely, the answer-question matching block uses the answer sequence representation S_A as the query and the question sequence representation S_Q as the keys and values. In this way, the model attends to the concept-related targets from both the question side and the answer side, and thus obtains a question-dependent answer representation and an answer-dependent question representation.
Specifically, the i-th head of the question-answer matching block is computed as:
ATT_i(S_Q, S_A) = softmax(S_Q W_i^Q (S_A W_i^K)^T / √d_k) S_A W_i^V
where d_k = d/h is the dimension of each head, W_i^Q, W_i^K, W_i^V are learnable parameters, and h is the number of heads.
Then, the outputs of the h heads are concatenated and linearly projected:
MATT(S_Q, S_A) = [ATT_1(S_Q, S_A), ATT_2(S_Q, S_A), ..., ATT_h(S_Q, S_A)] W^O
where W^O is a learnable parameter.
Then, a residual connection is applied between S_Q and MATT(S_Q, S_A):
Z = LN(S_Q + MATT(S_Q, S_A))
where LN is the layer normalization operation. Z is then fed into a feed-forward network (MLP) and another residual connection layer, yielding the output of the first matching block:
TIM(S_Q, S_A) = LN(Z + MLP(Z))
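A minimal NumPy sketch of one question-answer matching block follows, assuming standard scaled dot-product attention for each head; all weights are random stand-ins for learned parameters, and the feed-forward network is assumed to be a two-layer ReLU MLP.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

def matching_block(S_q, S_kv, params, h):
    """One matching block: multi-head cross attention (query = S_q,
    keys/values = S_kv), residual + layer norm, a feed-forward network,
    and a second residual + layer norm (TIM = LN(Z + MLP(Z)))."""
    d = S_q.shape[-1]
    d_k = d // h
    heads = []
    for i in range(h):
        Q = S_q  @ params["Wq"][i]           # (len_q, d_k)
        K = S_kv @ params["Wk"][i]           # (len_kv, d_k)
        V = S_kv @ params["Wv"][i]           # (len_kv, d_k)
        A = softmax(Q @ K.T / np.sqrt(d_k))  # attention weights of head i
        heads.append(A @ V)
    matt = np.concatenate(heads, axis=-1) @ params["Wo"]    # MATT(S_Q, S_A)
    Z = layer_norm(S_q + matt)                              # Z = LN(S_Q + MATT)
    mlp = np.maximum(Z @ params["W1"], 0.0) @ params["W2"]  # two-layer FFN with ReLU
    return layer_norm(Z + mlp)                              # TIM(S_Q, S_A)

rng = np.random.default_rng(0)
N, M, d, h = 4, 6, 8, 2
params = {
    "Wq": [rng.standard_normal((d, d // h)) for _ in range(h)],
    "Wk": [rng.standard_normal((d, d // h)) for _ in range(h)],
    "Wv": [rng.standard_normal((d, d // h)) for _ in range(h)],
    "Wo": rng.standard_normal((d, d)),
    "W1": rng.standard_normal((d, 2 * d)),
    "W2": rng.standard_normal((2 * d, d)),
}
S_Q, S_A = rng.standard_normal((N, d)), rng.standard_normal((M, d))
out = matching_block(S_Q, S_A, params, h)   # question-dependent direction
assert out.shape == (N, d)
```

Swapping the two arguments (`matching_block(S_A, S_Q, ...)` with appropriately shaped weights) gives the answer-question direction, since the output always keeps the query's length.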
We stack l_m question-answer matching blocks and take the output of the last block as the question-dependent answer representation, denoted M_Q→A, where l_m is a hyper-parameter giving the number of matching blocks.
Similarly, we stack l_m answer-question matching blocks to obtain the answer-dependent question representation, denoted M_A→Q.
Finally, we concatenate the two representations M_Q→A and M_A→Q to obtain the fine-grained representation D_QA as the output of the concept-related target alignment module.
3. Context alignment
The evidence-related context alignment module aims to align the evidence contexts of the QA pair and accumulate the coarse-grained and fine-grained information into a stance representation for question-answering stance classification. To this end, the invention employs a multi-head attention layer to align the evidence-related sentence meanings between the fine-grained representation D_QA and the coarse-grained stance representation S_QA.
Specifically, the multi-head attention is computed as:
MATT′(D_QA, S_QA) = [ATT′_1(D_QA, S_QA), ..., ATT′_h′(D_QA, S_QA)] W′^O
where h′ is the number of attention heads and W′^O is a learnable parameter; each head ATT′_i takes D_QA as the query and S_QA as the keys and values. The resulting coarse-to-fine stance vector representation is denoted O = MATT′(D_QA, S_QA), which completes the context alignment process.
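The context-alignment attention can be sketched the same way: the fine-grained representation D_QA is the query and the coarse-grained representation S_QA supplies the keys and values. The head count, sequence lengths, and random weights below are toy assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(1)
d, h = 8, 2                           # toy embedding size and head count h'
d_k = d // h
D_QA = rng.standard_normal((10, d))   # fine-grained representation (query)
S_QA = rng.standard_normal((11, d))   # coarse-grained stance representation (keys/values)

heads = []
for i in range(h):
    Wq, Wk, Wv = (rng.standard_normal((d, d_k)) for _ in range(3))
    A = softmax((D_QA @ Wq) @ (S_QA @ Wk).T / np.sqrt(d_k))
    heads.append(A @ (S_QA @ Wv))
O = np.concatenate(heads, axis=-1) @ rng.standard_normal((d, d))  # W'^O stand-in
assert O.shape == D_QA.shape          # coarse-to-fine stance vector representation
```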
4. Stance classification
After the coarse-to-fine stance vector representation O is obtained, the final stance classification is performed. In the stance classification part, a linear layer first reduces the dimensionality; then a softmax function calculates the probability that the representation belongs to each stance class, and the class with the highest probability is taken as the stance class of the given QA pair. This part is formulated as:
p = softmax(W O + b)
where W and b are the parameters of the linear layer and p is the vector of class probabilities; the class with the highest probability, argmax_j p_j, is taken as the predicted stance class.
the loss function during training is:
wherein the content of the first and second substances,is the result of the prediction, representing the probability of the jth class of categories; when the jth class is the true label of sample i,is 1, otherwise is 0; n is the data size of the training data; i C is the size of the number of the set of vertical classes, where the set of vertical classes C ═ Favor, Against, Neutral }.
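A minimal sketch of the classification head and the cross-entropy loss for a single sample, assuming the stance representation is pooled to one d-dimensional vector before the linear layer (the pooling step is an assumption, not stated explicitly in the text); the weights are random stand-ins.

```python
import numpy as np

def softmax(x):
    e = np.exp(x - x.max())
    return e / e.sum()

rng = np.random.default_rng(2)
d, C = 8, 3                          # hidden size; classes: Favor, Against, Neutral
o = rng.standard_normal(d)           # pooled stance vector (pooling is an assumption)
W, b = rng.standard_normal((d, C)), np.zeros(C)

p = softmax(o @ W + b)               # linear layer + softmax -> class probabilities
pred = int(np.argmax(p))             # class with the highest probability

y = np.zeros(C); y[0] = 1.0          # one-hot true label (Favor, for illustration)
loss = -np.sum(y * np.log(p))        # cross-entropy for this single sample
assert abs(p.sum() - 1.0) < 1e-9 and 0 <= pred < C and loss > 0
```

Averaging this per-sample loss over the N training samples gives the training objective stated above.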
(III) positive effects
To verify the effect of the method, the experiments use the open-source dataset proposed in the recurrent conditional attention network scheme mentioned above, which contains Chinese question-answer pairs. The QA pairs were collected from three websites, Baidu Zhidao, Sogou Wenwen, and a medical Q&A site, and the concept-related targets mainly involve pregnancy, food safety, diseases, and the like. The training set contains 10,598 QA pairs and the test set contains 2,993; the number of samples in each stance class of the training and test sets is shown in Table 1.
The evaluation metrics are accuracy, F1-macro, F1-favor, and F1-against, where F1-favor is the F1 value on samples whose stance label is Favor and F1-against is the F1 value on samples whose stance label is Against. The method (HAT model) was compared with several mainstream methods; the specific results are shown in Table 2.
TABLE 1 data set statistics
TABLE 2 results of the experiment
The model provided by the invention achieves the best results on every evaluation metric, exceeding the performance of several mainstream models and demonstrating the effectiveness of the proposed method.
The above examples are provided only for the purpose of describing the present invention, and are not intended to limit the scope of the present invention. The scope of the invention is defined by the appended claims. Various equivalent substitutions and modifications can be made without departing from the spirit and principles of the invention, and are intended to be within the scope of the invention.
Claims (10)
1. A question-answering stance detection method of a hierarchical alignment structure, comprising the following steps:
1) respectively converting a question text and an answer text into a question sequence and an answer sequence;
2) splicing the question sequence and the answer sequence to obtain a question-answer sequence;
3) inputting the question sequence, the answer sequence, and the question-answer sequence into a hierarchical alignment model to obtain a question-answering stance detection result;
wherein the hierarchical alignment model is obtained through the following steps:
a) respectively converting a number of sample question texts and sample answer texts into sample question sequences and sample answer sequences, and splicing each sample question sequence with the corresponding sample answer sequence to obtain a number of sample question-answer sequences;
b) respectively encoding each sample question sequence, sample answer sequence, and sample question-answer sequence to obtain a number of question sequence representations S_Q, answer sequence representations S_A, and coarse-grained stance representations S_QA;
c) using each question sequence representation S_Q as the query and the corresponding answer sequence representation S_A as the keys and values to obtain a number of question-dependent answer representations M_Q→A; using each answer sequence representation S_A as the query and the corresponding question sequence representation S_Q as the keys and values to obtain a number of answer-dependent question representations M_A→Q; and concatenating each question-dependent answer representation M_Q→A with the corresponding answer-dependent question representation M_A→Q to obtain a number of fine-grained representations D_QA;
d) based on a multi-head attention mechanism, aligning the evidence-related sentence meanings between each fine-grained representation D_QA and the corresponding coarse-grained stance representation S_QA to obtain a number of coarse-to-fine stance vector representations O;
e) classifying the coarse-to-fine stance vector representations O to obtain the hierarchical alignment model.
2. The method of claim 1, wherein encoding the sample question sequence, the sample answer sequence, and the sample question-answer sequence comprises: using a pre-trained BERT model.
3. The method of claim 1, wherein the question-dependent answer representation M_Q→A is obtained by the following steps:
1) using the question sequence representation S_Q as the query and the corresponding answer sequence representation S_A as the keys and values, obtaining the output of a first answer-question matching block through the following steps:
a) obtaining the output of the i-th head ATT_i(S_Q, S_A) = softmax((S_Q W_i^Q)(S_A W_i^K)^T / √(d/h)) (S_A W_i^V), where W_i^Q, W_i^K, W_i^V ∈ R^(d×(d/h)) are learnable parameters, d is the embedding size used when converting the sample question texts and sample answer texts into sample question sequences and sample answer sequences, h is the number of heads, and 1 ≤ i ≤ h;
b) concatenating the outputs of the h heads and applying a linear projection to the concatenation to obtain MATT(S_Q, S_A) = [ATT_1(S_Q, S_A), ATT_2(S_Q, S_A), ..., ATT_h(S_Q, S_A)] W^O, where W^O ∈ R^(d×d) is a learnable parameter;
c) applying a residual connection between the question sequence representation S_Q and MATT(S_Q, S_A) to obtain Z = LN(S_Q + MATT(S_Q, S_A)), where LN is the layer normalization operation;
d) feeding Z into a feed-forward network and another residual connection layer to obtain the output of the first Transformer encoder TIM_1(S_Q, S_A) = LN(Z + MLP(Z)), where MLP is the feed-forward network;
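Steps a)–d) of claim 3 can be sketched as follows. This is a numpy sketch assuming the standard scaled dot-product formulation of the per-head attention (the garbled original equation is reconstructed on that assumption); all weight matrices are random stand-ins for the learnable parameters, and the dimensions are illustrative.

```python
import numpy as np

rng = np.random.default_rng(1)
d, h = 16, 4          # embedding size and number of heads (illustrative)
dk = d // h

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def layer_norm(x, eps=1e-6):
    mu = x.mean(axis=-1, keepdims=True)
    sd = x.std(axis=-1, keepdims=True)
    return (x - mu) / (sd + eps)

S_Q = rng.normal(size=(5, d))   # question sequence representation
S_A = rng.normal(size=(8, d))   # answer sequence representation

# a) per-head cross-attention ATT_i(S_Q, S_A)
heads = []
for _ in range(h):
    W_q, W_k, W_v = (rng.normal(size=(d, dk)) for _ in range(3))
    scores = (S_Q @ W_q) @ (S_A @ W_k).T / np.sqrt(dk)
    heads.append(softmax(scores) @ (S_A @ W_v))

# b) concatenate the h heads and project: MATT(S_Q, S_A)
W_o = rng.normal(size=(d, d))
MATT = np.concatenate(heads, axis=-1) @ W_o

# c) residual connection + layer normalization
Z = layer_norm(S_Q + MATT)

# d) feed-forward network + second residual layer: TIM_1(S_Q, S_A)
W_1, W_2 = rng.normal(size=(d, 4 * d)), rng.normal(size=(4 * d, d))
TIM_1 = layer_norm(Z + np.maximum(Z @ W_1, 0) @ W_2)
print(TIM_1.shape)  # (5, 16)
```

Note the asymmetry: S_Q supplies the queries (and the residual path), so the block's output has the question's sequence length but is conditioned on the answer.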
5. The method of claim 1, wherein the coarse-to-fine stance vector representations O are classified by:
1) computing the probability that each coarse-to-fine stance vector representation O belongs to each stance class using a softmax function;
2) taking the class with the highest probability as the class of that coarse-to-fine stance vector representation O.
6. The method of claim 5, wherein the dimensionality of each coarse-to-fine stance vector representation O is reduced with a linear layer before computing the probability that it belongs to each stance class.
8. The method of claim 7, wherein the set of stance classes C comprises: approval, disapproval, and neutrality.
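Claims 5, 6, and 8 together describe a linear projection followed by a softmax over the three stance classes. A minimal sketch, with a random stand-in for the input vector and the learned weights:

```python
import numpy as np

rng = np.random.default_rng(2)
classes = ["approval", "disapproval", "neutrality"]  # the set C of claim 8

O = rng.normal(size=(64,))               # a coarse-to-fine stance vector (claim 1)
W = rng.normal(size=(64, len(classes)))  # claim 6: linear layer reduces dimensionality

logits = O @ W
probs = np.exp(logits - logits.max())
probs /= probs.sum()                       # claim 5, step 1: softmax over stance classes
label = classes[int(np.argmax(probs))]     # claim 5, step 2: highest-probability class
print(label, probs.round(3))
```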
9. A storage medium having a computer program stored thereon, wherein the computer program is arranged to perform, when run, the method of any one of claims 1-8.
10. An electronic device comprising a memory in which a computer program is stored, and a processor arranged to run the computer program so as to perform the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110230676.XA CN113127599B (en) | 2021-03-02 | 2021-03-02 | Question-answering position detection method and device of hierarchical alignment structure |
Publications (2)
Publication Number | Publication Date |
---|---|
CN113127599A true CN113127599A (en) | 2021-07-16 |
CN113127599B CN113127599B (en) | 2022-07-12 |
Family
ID=76772366
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110230676.XA Active CN113127599B (en) | 2021-03-02 | 2021-03-02 | Question-answering position detection method and device of hierarchical alignment structure |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN113127599B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109558477A (en) * | 2018-10-23 | 2019-04-02 | 深圳先进技术研究院 | A kind of community's question answering system, method and electronic equipment based on multi-task learning |
CN111581979A (en) * | 2020-05-06 | 2020-08-25 | 西安交通大学 | False news detection system and method based on evidence perception layered interactive attention network |
US20200334334A1 (en) * | 2019-04-18 | 2020-10-22 | Salesforce.Com, Inc. | Systems and methods for unifying question answering and text classification via span extraction |
CN112232058A (en) * | 2020-10-15 | 2021-01-15 | 济南大学 | False news identification method and system based on deep learning three-layer semantic extraction framework |
CN112256861A (en) * | 2020-09-07 | 2021-01-22 | 中国科学院信息工程研究所 | Rumor detection method based on search engine return result and electronic device |
Also Published As
Publication number | Publication date |
---|---|
CN113127599B (en) | 2022-07-12 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Tan et al. | Deep semantic role labeling with self-attention | |
JP7195365B2 (en) | A Method for Training Convolutional Neural Networks for Image Recognition Using Image Conditional Mask Language Modeling | |
CN112347268A (en) | Text-enhanced knowledge graph joint representation learning method and device | |
CN114565104A (en) | Language model pre-training method, result recommendation method and related device | |
CN111191002A (en) | Neural code searching method and device based on hierarchical embedding | |
CN111291188A (en) | Intelligent information extraction method and system | |
CN116097250A (en) | Layout aware multimodal pre-training for multimodal document understanding | |
CN109614480B (en) | Method and device for generating automatic abstract based on generation type countermeasure network | |
CN115310551A (en) | Text analysis model training method and device, electronic equipment and storage medium | |
CN113742733A (en) | Reading understanding vulnerability event trigger word extraction and vulnerability type identification method and device | |
CN114281931A (en) | Text matching method, device, equipment, medium and computer program product | |
CN116662488A (en) | Service document retrieval method, device, equipment and storage medium | |
CN113705191A (en) | Method, device and equipment for generating sample statement and storage medium | |
Kovvuri et al. | Pirc net: Using proposal indexing, relationships and context for phrase grounding | |
CN111831624A (en) | Data table creating method and device, computer equipment and storage medium | |
CN114330483A (en) | Data processing method, model training method, device, equipment and storage medium | |
Michael et al. | A First Experimental Demonstration of Massive Knowledge Infusion. | |
Peng et al. | MPSC: A multiple-perspective semantics-crossover model for matching sentences | |
CN117112743A (en) | Method, system and storage medium for evaluating answers of text automatic generation questions | |
CN116628162A (en) | Semantic question-answering method, device, equipment and storage medium | |
CN116680407A (en) | Knowledge graph construction method and device | |
CN113127599B (en) | Question-answering position detection method and device of hierarchical alignment structure | |
CN113204679B (en) | Code query model generation method and computer equipment | |
CN115203388A (en) | Machine reading understanding method and device, computer equipment and storage medium | |
CN115455144A (en) | Data enhancement method of completion type space filling type for small sample intention recognition |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||