CN115964497A - Event extraction method integrating attention mechanism and convolutional neural network


Info

Publication number
CN115964497A
Authority
CN
China
Prior art keywords
text
extracted
event
vector
word
Prior art date
Legal status
Pending
Application number
CN202310154608.9A
Other languages
Chinese (zh)
Inventor
周永彬
周沁仪
林海伦
张倩
Current Assignee
Nanjing University of Science and Technology
Institute of Information Engineering of CAS
Original Assignee
Nanjing University of Science and Technology
Institute of Information Engineering of CAS
Priority date
Filing date
Publication date
Application filed by Nanjing University of Science and Technology, Institute of Information Engineering of CAS filed Critical Nanjing University of Science and Technology
Priority to CN202310154608.9A priority Critical patent/CN115964497A/en
Publication of CN115964497A publication Critical patent/CN115964497A/en
Pending legal-status Critical Current

Classifications

    • Y: GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02: TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D: CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Machine Translation (AREA)

Abstract

The invention discloses an event extraction method integrating an attention mechanism and a convolutional neural network, comprising the following steps: 1) Performing feature representation on the text content to be extracted with a text encoder to obtain the distributed features of the text to be extracted; 2) Extracting the contextual features of the text to be extracted and the association information between words from the distributed features with a feature extractor; 3) Inputting the contextual features of the text to be extracted and the association information between words into an event trigger word classifier, outputting the event trigger words of the text to be extracted, and then determining the event type of the text to be extracted based on those trigger words; 4) The event element classifier judging in turn, according to the event type and the contextual features of the text to be extracted, whether each token in the text to be extracted is an event element; 5) Identifying a role category for each event element with an element role classifier. The invention greatly improves the accuracy of event extraction while remaining highly efficient.

Description

Event extraction method integrating attention mechanism and convolutional neural network
Technical Field
The invention belongs to an information extraction technology in the field of natural language processing, and particularly relates to an event extraction method integrating an attention mechanism and a convolutional neural network.
Background
Information extraction extracts specific information from natural language text, so as to automatically classify, extract and reconstruct massive heterogeneous text content. It mainly comprises entity extraction, relation extraction and event extraction, of which event extraction is the task that most highly structures the various entities and relations. Event extraction identifies the important arguments of target-related events from semi-structured or unstructured text, i.e., it obtains event trigger words and event-related argument information and organizes them into event information, which is widely applied in fields such as semantic search, information analysis, event reasoning, risk early warning and intelligent question answering. The event extraction task divides into two parts, event identification and event argument identification: finding the event trigger word and determining the event type, then identifying the key event arguments and determining their argument roles. Since understanding the world and solving problems in units of events fits human cognitive habits, event extraction has become a focus of attention in industry and academia at home and abroad. In recent years most of this research has relied on deep learning to obtain event information, and the existing event extraction methods are mainly classified into the following categories according to the neural network architecture used:
(1) Event extraction based on convolutional neural networks induces k-gram information with a convolutional neural network, captures local semantic features, learns the compositional semantic features of sentences, and completes the identification and extraction of event information. This approach can handle multiple-event scenarios. However, it cannot encode vocabulary semantics differently in different contexts, and if more global information is needed, the receptive field must be enlarged by cascading.
(2) Event extraction based on recurrent neural networks models sequence information, captures the dependency between argument roles and trigger word types, and mines temporal and long-distance relationships in the text. This method can effectively analyze long-dependency text, but it is essentially a Markov decision process: it cannot learn global structural information well, neglects the position information of entity pairs, cannot be executed in parallel, and is slow.
(3) Event extraction based on the attention mechanism learns the dependencies among words at different distances, analyzes the weights among words, judges semantic relations, reuses event schema information and simplifies data annotation. This method can capture important semantic information and effectively utilize global information, but because sequences are directly compared pairwise, position cannot be modeled well.
(4) Event extraction based on combined neural networks improves the extraction of semantic features by stacking two or more neural networks, which verifies that fusing different neural networks to improve the accuracy and efficiency of event extraction is effective.
However, the existing event extraction methods still fail to fully utilize the semantic features in the text, and they ignore the improvement in event extraction accuracy that the semantic correlation between event types and event arguments can bring during extraction. Fusing multiple neural networks is an effective mode of feature extraction, but how to design a suitable neural network that fully exploits this semantic correlation still needs research.
Disclosure of Invention
The invention aims to design an event extraction method fusing multiple neural networks that fully utilizes the semantic correlation in text; the method can improve the accuracy of event extraction while guaranteeing the efficiency of event extraction.
In order to achieve this purpose, the invention adopts the following technical scheme: an event extraction method integrating an attention mechanism and a convolutional neural network, characterized in that: (1) the combined semantic feature vectors in the literature sentences are effectively acquired through a convolutional neural network, and a max-pooling operation is performed on them to obtain the sentence-level local features and semantic structure; (2) the attention fusion mechanism solves the long-distance dependence problem by calculating the mutual influence among words in the same sentence; multi-head attention fuses the different knowledge generated by the same attention pooling, the differences coming from different subspace representations of the same query, key and value, which effectively reduces the influence of text noise on the extraction effect; (3) word vectors generated by loading a pre-trained model (such as BERT) map each input word to a corresponding word vector representation, i.e., feature vectors fusing context information are dynamically generated through fine-tuning and the plain text is represented as distributed feature information, so that the hidden state of each word contains the influence of the words at different positions in the sentence while the sparsity of the parameter space is avoided.
An event extraction method fusing an attention mechanism and a convolutional neural network comprises the following steps:
1) Performing feature representation on the text content to be extracted by using a text encoder to obtain the distributed features of the text to be extracted;
2) Extracting the contextual features of the text to be extracted and the association information between words from the distributed features by using a feature extractor;
3) Inputting the contextual features of the text to be extracted and the association information between words into an event trigger word classifier, outputting the event trigger words of the text to be extracted, and then determining the event type of the text to be extracted based on the event trigger words of the text to be extracted;
4) The event element classifier judging in turn, according to the event type and the contextual features of the text to be extracted, whether each token in the text to be extracted is an event element;
5) Identifying a role category for each of the event elements using an element role classifier.
Further, the method for obtaining the distributed features of the text to be extracted comprises: the text encoder generates a text sequence S = [x_1, x_2, x_3, ..., x_n] from the text to be extracted, where n is the number of tokens in the text to be extracted and x_i is the i-th token; then a word embedding vector, a segmentation vector and a position vector are generated for each token in the text sequence S, and S is converted into an input sequence T = (t_1, t_2, t_3, ..., t_n) by summing the word embedding vector, the segmentation vector and the position vector, where t_n is the fused vector of the word embedding, segmentation and position vectors of the n-th token; the sequence T = (t_1, t_2, t_3, ..., t_n) is input into a Transformer layer to obtain the inter-word associations and distribute weights through a self-attention function, obtaining feature vectors fusing context information; the feature vectors fusing context information are input into a pre-trained model to obtain the sequence E_m = {e_1, e_2, e_3, ..., e_n} of the text to be extracted, where e_n represents the word vector corresponding to the n-th token.
Further, the feature extractor comprises a bidirectional long short-term memory network, a convolutional layer and an attention unit. The method for extracting the contextual features of the text to be extracted and the association information between words comprises: first, the distributed features are input into the bidirectional LSTM to obtain sequence feature vectors, which are input into the convolutional layer; the convolutional layer performs convolution calculation on the sequence feature vectors to obtain the local-feature and semantic-structure high-dimensional feature vectors of the text to be extracted; the semantic-structure high-dimensional feature vectors are input into the attention unit to obtain, for each token x_i in the text to be extracted, the correlation feature vector r_i with the target entity, where i ∈ [1, n].
Further, the event trigger word classifier splices h_i and r_i and inputs the result into a conditional random field to obtain the event type corresponding to each token x_i in the text to be extracted, where h_i is the semantic-structure high-dimensional feature vector corresponding to token x_i.
Further, the pre-training model is a BERT model.
A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the above method.
A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the above-mentioned method.
The invention completes event extraction through a text encoder, a feature extractor, an event trigger word classifier, an event element classifier and an element role classifier, as follows:
Firstly, the text encoder performs feature representation on the text content to be extracted, representing the plain text as distributed feature information.
Secondly, the feature extractor successively acquires features at different degrees of abstraction: a bidirectional long short-term memory network automatically captures text sentence features from the distributed features produced by the text encoder; a convolutional neural network then extracts the local features around each word; combining the local feature vectors extracted by the convolutional layer, the contextual features and key information features among words are further extracted from the sentence features captured by the BiLSTM; and an attention mechanism assigns different weights to the different vector representations, learning the correlation information among words from the sentence features captured by the BiLSTM and reducing the influence of text noise on the extraction effect.
Then, using the deep features extracted by the convolutional neural network and the attention mechanism, the event trigger word classifier classifies each new sample, judges whether a word is an event trigger word, classifies the event category based on the trigger word information, and completes event type identification.
Next, according to the event type and the acquired features, the event element classifier judges each entity in the sentence in turn and decides whether the phrase is an element of the event.
Finally, the element role classifier judges the role type of each acquired event element, finally yielding a structured event comprising the trigger word and the event elements with their types.
The text encoder performs feature representation on the text of the information to be extracted, divided into two processes: feature representation model construction and model learning. The specific execution steps are as follows:
1) Constructing a feature representation model: prepare a text sequence S = [x_1, x_2, x_3, ..., x_n] containing n tokens, where x_i represents the i-th character of the input text sequence; convert the text sequence S into an input sequence T = (t_1, t_2, t_3, ..., t_n) by summing a word embedding vector, a segmentation vector and a position vector, where the word embedding vector is obtained by querying a word vector table, the segmentation vector indicates the sentence to which the token belongs, the position vector encodes the token position information, and t_n is the sum of the word embedding, segmentation and position vectors of the n-th token in S; input the sequence T = (t_1, t_2, t_3, ..., t_n) into a Transformer layer to obtain the associations among tokens and distribute weights through the self-attention function, obtaining feature vectors fusing context information:

Attention(Q, K, V) = softmax(Q·Kᵀ / √d_k)·V

Q represents the hidden vector of the decoded character and K holds the hidden vectors corresponding to the different words in the encoder; Q computes a value with each word of the encoding region through a vector dot product, and the weights are computed through the softmax function. Intuitively speaking, Q, K and V are the same sentence: the relation matrix between tokens obtained through the vector product is normalized and then applied back to the input sentence itself. Projecting Q, K and V through several different linear transformations yields single-head attention:

Head_i = Attention(Q·W_i^Q, K·W_i^K, V·W_i^V)

The multi-head attention mechanism splices the results of the different self-attention heads and computes position information in different spatial dimensions:

MultiHead(Q, K, V) = Concat(Head_1, Head_2, Head_3, ..., Head_n)·W^O
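To make the computation concrete, here is a minimal PyTorch sketch of the scaled dot-product attention and multi-head splicing described above; the class name, tensor shapes and head count are illustrative assumptions, not specified by the patent.

```python
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V):
    # Attention(Q, K, V) = softmax(Q K^T / sqrt(d_k)) V
    d_k = Q.size(-1)
    scores = Q @ K.transpose(-2, -1) / d_k ** 0.5  # token-pair relation matrix
    return F.softmax(scores, dim=-1) @ V

class MultiHeadSelfAttention(torch.nn.Module):
    def __init__(self, d_model=768, n_heads=12):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads, self.d_head = n_heads, d_model // n_heads
        # different linear projections of the same Q, K, V: one subspace per head
        self.w_q = torch.nn.Linear(d_model, d_model)
        self.w_k = torch.nn.Linear(d_model, d_model)
        self.w_v = torch.nn.Linear(d_model, d_model)
        self.w_o = torch.nn.Linear(d_model, d_model)  # W^O, applied after Concat

    def forward(self, x):  # x: (batch, n, d_model), one row per token
        B, n, _ = x.shape
        split = lambda t: t.view(B, n, self.n_heads, self.d_head).transpose(1, 2)
        # self-attention: Q, K and V all come from the same sentence x
        heads = scaled_dot_product_attention(
            split(self.w_q(x)), split(self.w_k(x)), split(self.w_v(x)))
        concat = heads.transpose(1, 2).reshape(B, n, -1)  # Concat(Head_1, ..., Head_h)
        return self.w_o(concat)
```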
2) Model learning: train the feature representation model on a training set drawn from the literary work, the Epic of King Gesar, dynamically generating feature vectors fusing context information to obtain the sequence E_m = BERT(T), mapped as E_m = {e_1, e_2, e_3, ..., e_n}, e_i = [w_1, w_2, ..., w_m]. E_m is the resulting distributed feature, e_i is the word embedding vector corresponding to the i-th input character, w ∈ [-1, 1], and m is the designated word vector dimension.
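A hedged sketch of this encoding step using the Hugging Face transformers library; the checkpoint name bert-base-chinese and the example sentence are assumptions for illustration, since the patent only specifies a fine-tuned pre-trained BERT model.

```python
import torch
from transformers import BertTokenizer, BertModel

# "bert-base-chinese" is an assumption for illustration; the patent only states
# that a pre-trained BERT model is fine-tuned on the corpus
tokenizer = BertTokenizer.from_pretrained("bert-base-chinese")
encoder = BertModel.from_pretrained("bert-base-chinese")

sentence = "格萨尔降伏了妖魔"  # an illustrative sentence, not taken from the actual corpus
inputs = tokenizer(sentence, return_tensors="pt")  # segment and position vectors are added inside BERT
with torch.no_grad():
    E_m = encoder(**inputs).last_hidden_state  # (1, n, 768): one context-fused vector e_i per token
```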
The feature extractor is divided into two processes for distributed feature extraction: sentence-level feature capture and high-level semantic feature capture. The specific execution steps are as follows:
1) Sentence-level feature capture: a bidirectional long short-term memory network is used to capture sentence vector context information. The distributed features E_m are input into the bidirectional LSTM, and the forward output →h_i and the backward output ←h_i of the i-th character are concatenated to obtain the sequence feature vector of the i-th character, denoted L_i = [→h_i; ←h_i], the sentence-level semantic representation of the i-th character.
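A brief PyTorch sketch of this capture step under assumed dimensions (768-dimensional input vectors, 256 hidden units per direction); the dropout rate is likewise an assumption:

```python
import torch

# dimensions are assumptions: 768-dim encoder vectors in, 256 hidden units per direction
bilstm = torch.nn.LSTM(input_size=768, hidden_size=256,
                       batch_first=True, bidirectional=True)
dropout = torch.nn.Dropout(0.5)  # guards against overfitting, as in step 301 below

E_m = torch.randn(1, 20, 768)    # distributed features from the text encoder
L, _ = bilstm(E_m)
L = dropout(L)
# L[:, i, :] is L_i: the forward and backward hidden states of the i-th
# character concatenated, shape (1, 20, 2 * 256)
```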
2) High-level semantic feature capture: compute c_i = Conv_k(L_i), performing local feature extraction on the acquired L_i through the convolutional layer, using multiple convolution kernels in combination, where Conv_k denotes the convolution operation with kernel size k; compute h_i = MaxPooling(c_i), the max-pooling operation after processing by the different convolution kernels, obtaining the local-feature and semantic-structure high-dimensional feature vector h_i; through the fusion attention mechanism compute α = softmax(Q×K), where Q represents the hidden vector of the decoded word and K holds the hidden vectors corresponding to the different words in the encoder; Q computes a value with each word in the encoding region through a vector dot product, and the weights are computed through the softmax function; compute the output vector r_i = α·V, obtaining the correlation feature vector between the words in the sentence and the target entity, i.e., r_i is the correlation feature vector of token x_i in the text to be extracted with the target entity, where i ∈ [1, n].
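The sketch below shows one plausible PyTorch reading of this step: per-token convolution with several kernel sizes, windowed max pooling, and the fusion attention α = softmax(Q×K), r = α·V. The filter counts, kernel sizes and pooling window are assumptions (the later embodiment uses kernel sizes 2, 3 and 4 with 20 filters each), and the patent does not fix the pooling detail.

```python
import torch
import torch.nn.functional as F

class ConvAttentionExtractor(torch.nn.Module):
    """High-level semantic feature capture: Conv_k, max pooling, fusion attention."""

    def __init__(self, d_in=512, n_filters=20, kernel_sizes=(2, 3, 4)):
        super().__init__()
        # one Conv1d per kernel size k; padding="same" keeps one vector per token
        self.convs = torch.nn.ModuleList(
            torch.nn.Conv1d(d_in, n_filters, k, padding="same") for k in kernel_sizes)
        d_h = n_filters * len(kernel_sizes)
        self.w_q = torch.nn.Linear(d_h, d_h)
        self.w_k = torch.nn.Linear(d_h, d_h)
        self.w_v = torch.nn.Linear(d_h, d_h)

    def forward(self, L):            # L: (batch, n, d_in) from the BiLSTM
        x = L.transpose(1, 2)        # Conv1d expects (batch, channels, n)
        # windowed max pooling keeps a per-token vector h_i; the window of 3
        # is an assumption, as the patent leaves this detail open
        feats = [F.max_pool1d(torch.relu(conv(x)), 3, stride=1, padding=1)
                 for conv in self.convs]
        h = torch.cat(feats, dim=1).transpose(1, 2)  # (batch, n, d_h)
        # fusion attention: alpha = softmax(Q x K), r = alpha V
        alpha = F.softmax(self.w_q(h) @ self.w_k(h).transpose(1, 2), dim=-1)
        r = alpha @ self.w_v(h)      # r_i: correlation feature of token x_i
        return h, r

h, r = ConvAttentionExtractor()(torch.randn(1, 20, 512))  # usage sketch
```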
The event trigger word classifier splices the vectors h_i and r_i output by the feature extractor and inputs them into a conditional random field, completing acquisition of the event type. This is divided into two parts: constructing the event type dependency relationship and inferring the event type. The specific execution steps are as follows:
1) Constructing event type dependency relations: dependencies between tags are modeled using a conditional random field. For a given sentence S = {x_1, x_2, ..., x_n} and its corresponding sequence label y = {y_1, y_2, ..., y_n}, the conditional probability p(y|S) over all possible label sequences is calculated as

p(y|S) = exp(score(S, y)) / Σ_{y'∈β(S)} exp(score(S, y'))

where β(S) denotes the possible event type label sequences of S. The score is computed as

score(S, y) = Σ_i ( W_{y_i}·f(x_i) + T_{y_{i-1}, y_i} )

where f is a mapping function that maps the feature vector to event type label scores, W_y is the prediction weight matrix, and T_{y_{i-1}, y_i} is the transition weight. The loss function is computed as L = -Σ log p(y|S);
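For reference, a compact PyTorch sketch of the CRF loss above, computing the gold-path score and the log-partition over all label sequences β(S) with the forward algorithm; tensor shapes and the toy call at the end are illustrative:

```python
import torch

def crf_neg_log_likelihood(emissions, transitions, tags):
    """Linear-chain CRF loss  L = -log p(y|S)  for one sentence.

    emissions:   (n, num_tags) emission scores, one score per tag per token
    transitions: (num_tags, num_tags) transition weights T[y_prev, y_next]
    tags:        gold label sequence y as a list of tag indices
    """
    n, num_tags = emissions.shape
    # score(S, y): emission plus transition scores along the gold path
    gold = emissions[0, tags[0]]
    for t in range(1, n):
        gold = gold + transitions[tags[t - 1], tags[t]] + emissions[t, tags[t]]
    # log-partition over all label sequences beta(S), by the forward algorithm
    log_alpha = emissions[0]  # (num_tags,)
    for t in range(1, n):
        log_alpha = emissions[t] + torch.logsumexp(
            log_alpha.unsqueeze(1) + transitions, dim=0)
    log_Z = torch.logsumexp(log_alpha, dim=0)
    return log_Z - gold  # -log p(y|S)

emissions = torch.randn(6, 5)      # 6 tokens, 5 event-type tags (illustrative)
transitions = torch.randn(5, 5)
loss = crf_neg_log_likelihood(emissions, transitions, [0, 2, 2, 1, 0, 4])
```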
2) Inferring the event type: the input model τ = (A, B, π) and the observations O = {o_1, o_2, ..., o_T} are decoded using the Viterbi algorithm. Initialize δ_1(i) = π_i·b_i(o_1), i ∈ [1, N]; recurse

δ_t(i) = max_{1≤j≤N}[δ_{t-1}(j)·a_ji]·b_i(o_t) and ψ_t(i) = argmax_{1≤j≤N}[δ_{t-1}(j)·a_ji]

until P* = max_{1≤i≤N} δ_T(i) and i*_T = argmax_{1≤i≤N}[δ_T(i)]; backtrack via i*_t = ψ_{t+1}(i*_{t+1}) to determine the optimal path I* = (i*_1, i*_2, ..., i*_T), obtain the analysis result, and output the event type corresponding to the trigger word.
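The Viterbi recursion above can be written directly in a few lines of Python with NumPy; the toy model at the end is illustrative only:

```python
import numpy as np

def viterbi(pi, A, B, O):
    """Viterbi decoding for model tau = (A, B, pi) and observations O.

    pi: (N,) initial probabilities;  A: (N, N) transitions, A[j, i] = a_ji
    B:  (N, K) emission probabilities b_i(o_t);  O: observation indices
    """
    N, T = len(pi), len(O)
    delta = np.zeros((T, N))
    psi = np.zeros((T, N), dtype=int)
    delta[0] = pi * B[:, O[0]]                    # delta_1(i) = pi_i b_i(o_1)
    for t in range(1, T):
        trans = delta[t - 1, :, None] * A         # delta_{t-1}(j) a_ji
        psi[t] = trans.argmax(axis=0)             # psi_t(i)
        delta[t] = trans.max(axis=0) * B[:, O[t]] # delta_t(i)
    best = [int(delta[T - 1].argmax())]           # i*_T = argmax delta_T(i)
    for t in range(T - 1, 0, -1):                 # backtrack the optimal path
        best.append(int(psi[t][best[-1]]))
    return delta[T - 1].max(), best[::-1]         # P* and the optimal path I*

pi = np.array([0.6, 0.4]); A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.5, 0.5], [0.1, 0.9]])
p_star, path = viterbi(pi, A, B, [0, 1, 1])
```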
The event element classifier is divided into three parts: feature splicing, constructing the event argument dependency relationship, and inferring the event argument type. The specific execution steps are as follows:
1) Feature splicing: the event type corresponding to each token x_i in the text to be extracted is mapped to an event vector V_i and spliced onto r_i, yielding the feature vector fusing event type information, R_i = {V_i, r_i};
2) Constructing the event argument dependency relationship: dependencies between tags are modeled using a conditional random field. For a given sentence S = {x_1, x_2, ..., x_n} and its corresponding sequence label g = {g_1, g_2, ..., g_n}, the conditional probability p(g|S) over all possible label sequences is calculated as

p(g|S) = exp(score(S, g)) / Σ_{g'∈β(S)} exp(score(S, g'))

where β(S) denotes the possible event argument label sequences of S. The score is computed as

score(S, g) = Σ_i ( W_{g_i}·f(R_i) + T_{g_{i-1}, g_i} )

where f is a mapping function that maps the feature vector to event argument label scores, W_g is the prediction weight matrix, and T_{g_{i-1}, g_i} is the transition weight. The loss function is computed as L = -Σ log p(g|S);
3) Inferring the event argument type: the input model τ = (C, D, π) and the observations O = {o_1, o_2, ..., o_T} are decoded using the Viterbi algorithm. Initialize δ_1(i) = π_i·b_i(o_1), i ∈ [1, N]; recurse

δ_t(i) = max_{1≤j≤N}[δ_{t-1}(j)·a_ji]·b_i(o_t) and ψ_t(i) = argmax_{1≤j≤N}[δ_{t-1}(j)·a_ji]

until P* = max_{1≤i≤N} δ_T(i) and i*_T = argmax_{1≤i≤N}[δ_T(i)]; backtrack via i*_t = ψ_{t+1}(i*_{t+1}) to determine the optimal path I* = (i*_1, ..., i*_T), obtain the analysis result, and output the argument and the argument role.
Compared with the prior art, the invention has the following positive effects:
according to the event extraction method fusing a plurality of neural networks, the convolutional neural network, the attention mechanism and the pre-training model are combined, so that the training effect is guaranteed, and the model training speed is effectively increased; on the other hand, the method pays attention to the importance of the semantic relation between the event type and the event argument to the event extraction and focuses on utilizing the relation, so that the accuracy and the efficiency of the model are remarkably improved compared with other models of the same type.
Drawings
FIG. 1 is an overall flow diagram of the method of the present invention.
FIG. 2 is a flow chart of an algorithm of an event extraction method for merging an attention mechanism with a convolutional neural network according to the present invention.
Detailed Description
The invention is further described with reference to the following figures and examples.
Fig. 1 is a system model diagram of an event extraction method combining an attention mechanism and a convolutional neural network according to the present invention. The method mainly comprises the following five steps to complete event extraction:
step 101, preparing the event text data set used for training, namely the Epic of King Gesar;
step 102, constructing text feature representation by using a text encoder, and acquiring distributed feature representation of a text S to be extracted;
and 103, capturing sentence-level semantic representation, high-dimensional feature vectors of local features and semantic structures, and correlation feature vectors of words and target entities in the distributed feature representation by using a feature extractor.
step 104, using an event trigger word classifier to detect, in the feature representation acquired in step 103, the trigger words that represent events, and acquiring the event type corresponding to the text through the event trigger words;
step 105, using an event element classifier and an element role classifier to merge the event type obtained in step 104 with the feature representation obtained in step 103, judging each entity in the sentence in turn through the merged feature vector, deciding whether the phrase is an element of the event, and identifying its role category;
further, in step 101, an event text data set is collected in the literature, "the university of gesarang", and the data set is divided into a training set and a testing set according to a ratio of 9. The training set is input into a text coder and training events are extracted. The event extraction model adopts supervised learning, and the labeling of event corpora adopts a double-pointer labeling method. The double-pointer marking method can effectively solve the entity nesting problem by endowing each character with the corresponding starting position and the corresponding ending position of each label, and the labels cover entities, event trigger words, event elements and corresponding roles. Each corpus comprises two forms of a single sentence and a plurality of sentences, has event theme consistency and only considers main events. There are 22 experimental corpus event types.
Further, in step 102, the pre-trained model used in the text encoder adopts the bert-base-hierarchy version from Hugging Face, and the parameters of the model are learned using the Adam method. The steps are as follows:
step 201, selecting the training data set M for model learning and initializing the algorithm input, the event type set C and the relationship set L; setting the model learning parameters: maximum number of iterations epochs, learning rate λ, maximum input sequence length b, and batch size k (in this example epochs is 100, λ is 5e-5, b is 128, and k is 32);
step 202, training the feature representation model constructed by the text encoder on the data set selected in step 201 and learning the model parameters, adopting the Adam method during model learning;
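A minimal configuration sketch of this training setup in PyTorch, using the parameter values from step 201; the placeholder model stands in for the assembled network, which is defined elsewhere:

```python
import torch

# learning parameters from step 201
epochs, lr, max_seq_len, batch_size = 100, 5e-5, 128, 32

# `model` stands for the full network (text encoder, feature extractor and the
# three classifiers); a single linear layer is used here only as a placeholder
model = torch.nn.Linear(768, 22)
optimizer = torch.optim.Adam(model.parameters(), lr=lr)

for epoch in range(epochs):
    pass  # iterate mini-batches of size batch_size, truncated to max_seq_len,
          # compute the CRF losses, call loss.backward() and optimizer.step()
```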
further, in step 103, the feature extractor includes a stack of a bidirectional LSTM neural network, a convolutional neural network, and an attention mechanism, including:
step 301, performing feature extraction on the learned feature representation by using a bidirectional LSTM neural network, with a dropout function used to prevent overfitting during training;
step 302, performing secondary feature extraction on the shallow feature representation obtained in step 301 by using a convolutional neural network, setting m convolution kernel lengths with n kernels of each length (in this example m is 3, the lengths are 2, 3 and 4, and n is 20);
step 303, performing attention calculation on the shallow feature representation obtained in step 301 by using an attention mechanism to obtain the key information between words and sentences, with similarity computed by the cosine distance formula;
step 304, splicing the feature representations obtained in the steps 302 and 303 to obtain a highly abstracted feature representation;
further, in step 104, the event type identification module takes the feature representation from step 304 as input, uses the transition matrix of the conditional random field layer to find the relations between the entities labelled in the training-set text and the labels, and extracts the event trigger words to obtain the event type;
further, in step 105, the event element classifier maps the event type obtained in step 104 into a 1-dimensional array and splices it onto the output of step 304, introducing event type label information to enhance the descriptive capability; it then uses the transition matrix of the conditional random field layer to find the relations between labels, predicts the constraint relations between labels, and extracts the event arguments and their corresponding roles.
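A small sketch of the feature splicing this step performs, reading the "1-dimensional array" as a learned event-type embedding V_i concatenated onto r_i; the embedding width and feature dimension are assumptions:

```python
import torch

NUM_EVENT_TYPES, D_R = 22, 256  # 22 event types (step 101); D_R, the width of r_i, is assumed

# map the predicted event type to a vector V_i and splice it onto r_i
type_embedding = torch.nn.Embedding(NUM_EVENT_TYPES, 32)  # the 32-dim V_i is illustrative

def fuse_event_type(r, event_type_ids):
    """R_i = {V_i, r_i}: feature vector fused with event type information.

    r: (n, D_R) correlation features; event_type_ids: (n,) predicted types
    """
    V = type_embedding(event_type_ids)  # (n, 32)
    return torch.cat([V, r], dim=-1)    # input to the argument CRF layer

# usage sketch
r = torch.randn(10, D_R)
types = torch.zeros(10, dtype=torch.long)
R = fuse_event_type(r, types)           # (10, 32 + D_R)
```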
The invention provides an event extraction method integrating an attention mechanism and a convolutional neural network. The method combines the convolutional neural network, the attention mechanism and the pre-trained model to effectively obtain the sentence-level event features in the literary description, learn the relationships among words, and dynamically generate feature vectors fusing context information; it guarantees extraction efficiency and finally realizes full utilization of the semantic correlation between event types and event arguments in the event extraction flow, greatly improving the accuracy of event extraction.
Of course, the present invention may have other embodiments; it is not limited to those described in the detailed description, and other embodiments derived by those skilled in the art from the technical solutions of the present invention also fall within the scope of the appended claims.

Claims (7)

1. An event extraction method fusing an attention mechanism and a convolutional neural network, comprising the following steps:
1) Performing feature representation on the text content to be extracted by using a text encoder to obtain the distributed features of the text to be extracted;
2) Extracting the contextual features of the text to be extracted and the association information between words from the distributed features by using a feature extractor;
3) Inputting the contextual features of the text to be extracted and the association information between words into an event trigger word classifier, outputting the event trigger words of the text to be extracted, and then determining the event type of the text to be extracted based on the event trigger words of the text to be extracted;
4) The event element classifier judging in turn, according to the event type and the contextual features of the text to be extracted, whether each token in the text to be extracted is an event element;
5) Identifying a role category for each of the event elements using an element role classifier.
2. The method of claim 1, wherein the method for obtaining the distributed features of the text to be extracted comprises: the text encoder generates a text sequence S = [x_1, x_2, x_3, ..., x_n] from the text to be extracted, where n is the number of tokens in the text to be extracted and x_i is the i-th token; then a word embedding vector, a segmentation vector and a position vector are generated for each token in the text sequence S, and S is converted into an input sequence T = (t_1, t_2, t_3, ..., t_n) by summing the word embedding vector, the segmentation vector and the position vector, where t_n is the fused vector of the word embedding, segmentation and position vectors of the n-th token; the sequence T = (t_1, t_2, t_3, ..., t_n) is input into a Transformer layer to obtain the inter-word associations and distribute weights through a self-attention function, obtaining feature vectors fusing context information; the feature vectors fusing context information are input into a pre-trained model to obtain the sequence E_m = {e_1, e_2, e_3, ..., e_n} of the text to be extracted, where e_n represents the word vector corresponding to the n-th token.
3. The method of claim 2, wherein the feature extractor comprises a bidirectional long short-term memory network, a convolutional layer and an attention unit; the method for extracting the contextual features of the text to be extracted and the association information between words comprises: first, the distributed features are input into the bidirectional LSTM to obtain sequence feature vectors, which are input into the convolutional layer; the convolutional layer performs convolution calculation on the sequence feature vectors to obtain the local-feature and semantic-structure high-dimensional feature vectors of the text to be extracted; the semantic-structure high-dimensional feature vectors are input into the attention unit to obtain, for each token x_i in the text to be extracted, the correlation feature vector r_i with the target entity, where i ∈ [1, n].
4. The method of claim 3, wherein the event trigger word classifier splices h_i and r_i and inputs the result into a conditional random field to obtain the event type corresponding to each token x_i in the text to be extracted, where h_i is the semantic-structure high-dimensional feature vector corresponding to token x_i.
5. The method of claim 2, wherein the pre-training model is a BERT model.
6. A server, comprising a memory and a processor, the memory storing a computer program configured to be executed by the processor, the computer program comprising instructions for carrying out the steps of the method according to any one of claims 1 to 5.
7. A computer-readable storage medium, on which a computer program is stored, which, when being executed by a processor, carries out the steps of the method of one of claims 1 to 5.
CN202310154608.9A 2023-02-23 2023-02-23 Event extraction method integrating attention mechanism and convolutional neural network Pending CN115964497A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310154608.9A CN115964497A (en) 2023-02-23 2023-02-23 Event extraction method integrating attention mechanism and convolutional neural network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202310154608.9A CN115964497A (en) 2023-02-23 2023-02-23 Event extraction method integrating attention mechanism and convolutional neural network

Publications (1)

Publication Number Publication Date
CN115964497A (en) 2023-04-14

Family

ID=87358657

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310154608.9A Pending CN115964497A (en) 2023-02-23 2023-02-23 Event extraction method integrating attention mechanism and convolutional neural network

Country Status (1)

Country Link
CN (1) CN115964497A (en)


Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116701576A (en) * 2023-08-04 2023-09-05 华东交通大学 Event detection method and system without trigger words
CN116701576B (en) * 2023-08-04 2023-10-10 华东交通大学 Event detection method and system without trigger words


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination