CN111783394B - Training method of event extraction model, event extraction method, system and equipment - Google Patents

Training method of event extraction model, event extraction method, system and equipment

Info

Publication number
CN111783394B
CN111783394B CN202010798835.1A CN202010798835A
Authority
CN
China
Prior art keywords
event
document
features
fusion
feature
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010798835.1A
Other languages
Chinese (zh)
Other versions
CN111783394A (en)
Inventor
程刚
张剑
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Raisound Technology Co ltd
Original Assignee
Shenzhen Raisound Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Raisound Technology Co ltd filed Critical Shenzhen Raisound Technology Co ltd
Priority to CN202010798835.1A priority Critical patent/CN111783394B/en
Publication of CN111783394A publication Critical patent/CN111783394A/en
Application granted granted Critical
Publication of CN111783394B publication Critical patent/CN111783394B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/10Text processing
    • G06F40/103Formatting, i.e. changing of presentation of documents
    • G06F40/117Tagging; Marking up; Designating a block; Setting of attributes
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/30Semantic analysis
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/21Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24Classification techniques
    • G06F18/25Fusion techniques
    • G06F18/253Fusion techniques of extracted features
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • General Physics & Mathematics (AREA)
  • Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a training method of an event extraction model, an event extraction method, an event extraction system and event extraction equipment. The training method of the event extraction model comprises the following steps: acquiring document data, carrying out related labeling, and carrying out vectorization processing on the labeled document data; extracting features of the vectorized document data at the three levels of sentence, paragraph and document, and fusing the extracted three levels of features to obtain features of fused full-text information; based on the features of the fused full-text information and the annotated labels, carrying out training of entity identification, event identification and event element classification through a supervised deep learning algorithm to obtain an event extraction model. According to the invention, sentence-level, paragraph-level and document-level features are extracted respectively and then fused, so that full-text information is fully used, feature expression is enhanced, and the performance of event extraction and recognition is further improved.

Description

Training method of event extraction model, event extraction method, system and equipment
Technical Field
The present invention relates to an event extraction technology in the field of computer data processing technology, and in particular, to a training method for an event extraction model, an event extraction method, an event extraction system and a device.
Background
In recent years, with the rise of digitization in the financial field and continuous economic growth, digital financial documents have grown explosively, and massive financial documents exist on the network, such as a large number of company announcements and company annual reports. This massive data contains a large amount of valuable information, but most of it is unstructured or semi-structured, very difficult to structure and difficult to quantify, so manual analysis is needed; yet analyzing massive data purely by manpower is not only very inefficient but also hard to realize. Research on structuring such data has therefore become a research hot spot, has developed rapidly in recent years, and various data structuring systems have emerged one after another.
Event extraction refers to identifying a specific type of event and determining and extracting its relevant information. Event extraction plays an important role in natural language processing: it generates evaluable structured information that facilitates various tasks such as knowledge base construction, question answering and language understanding. Based on these characteristics, event extraction has wide application in the financial field. Take the extraction of events from financial announcements as an example: an event here is something that occurs at a specific time point or time period within a specific geographical range, a change of state or thing composed of one or more actions involving one or more roles. Event extraction can be applied to convert unstructured financial announcements into structured data, laying a good foundation for later analysis and knowledge-graph research.
Traditional event extraction is based primarily on template matching and machine learning. In recent years, thanks to the continuously improving performance of deep learning methods, deep learning can be trained end-to-end and extract robust features, so it has become the main research direction of current event extraction. A deep learning framework can learn class-discriminative information from the selected features, and the choice of parameters, feature sets and network architecture plays a decisive role in classification performance.
Event extraction techniques have been applied in many areas of financial structuring, such as the extraction of company announcements and the construction of financial event graphs: unstructured data that a computer cannot understand is converted into structured data that it can, so that the powerful computing and reasoning capabilities of the computer can be exploited to mine meaningful information and assist financial investment institutions in making financial decisions.
Most existing event extraction studies are sentence-level, because the time and space complexity of the relevant feature encoders grows quadratically with sentence length, so they cannot process overly long text and their feature range can only be sentence-level. Restricting event extraction to the sentence level splits up textual context interaction and ignores a large amount of context information in the document content, which reduces the accuracy of extracting financial event elements, degrades the overall performance of the financial event extraction system, and in turn makes the user experience of financial event extraction poor in practical applications.
Disclosure of Invention
The invention mainly aims to provide a training method of an event extraction model and a corresponding event extraction method, which enhance feature expression and improve the performance and effect of event extraction and recognition by expanding event extraction from the sentence level alone to the combined sentence, paragraph and document levels. The invention also aims to provide a corresponding training system of the event extraction model, an event extraction system and related equipment.
In order to achieve the above purpose, the technical scheme adopted by the invention is as follows.
In a first aspect, a training method of an event extraction model is provided, including: acquiring document data, carrying out related labeling, and carrying out vectorization processing on the labeled document data; extracting features of the vectorized document data at the three levels of sentence, paragraph and document, and fusing the extracted three levels of features to obtain features of fused full-text information; based on the features of the fused full-text information and the annotated labels, carrying out training of entity identification, event identification and event element classification through a supervised deep learning algorithm to obtain an event extraction model.
In a possible implementation manner, the fusing of the extracted three levels of features to obtain the features of the fused full-text information includes: denoting the sentence-level, paragraph-level and document-level features as S_i, P_i and D_i respectively; calculating the fusion weight of the paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of the document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), where W1, W2 and b are preset parameters; finally, the features of the fused full-text information are obtained and recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
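A minimal NumPy sketch of this gated fusion, under the reconstruction of the formulas above (the use of NumPy and the vector shapes are assumptions; the patent does not prescribe an implementation):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_features(S, P, D, W1, W2, b):
    """Gated fusion of sentence (S), paragraph (P) and document (D)
    level features. S, P, D: (d,) vectors; W1, W2: (d, d); b: (d,)."""
    Gp = sigmoid(W1 @ S + W2 @ P + b)   # paragraph-level fusion weight
    Gd = sigmoid(W1 @ S + W2 @ D + b)   # document-level fusion weight
    # F = G^d * D + G^p * P + (2 - G^p - G^d) * S
    return Gd * D + Gp * P + (2.0 - Gp - Gd) * S

d = 8
rng = np.random.default_rng(0)
S, P, D = rng.normal(size=(3, d))
W1, W2 = rng.normal(size=(2, d, d)) * 0.1
b = np.zeros(d)
F = fuse_features(S, P, D, W1, W2, b)
```

Note that when both gates are 0.5, the sentence feature keeps a full unit weight (2 - 0.5 - 0.5 = 1), so the fused feature never discards the sentence-level signal.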
In a possible implementation manner, the feature extraction according to three levels of sentences, paragraphs and documents includes: inputting the document data into a feature extractor by taking sentences as processing units for feature interaction, and extracting features at sentence level; segmenting the document data on the basis of sentences, taking paragraphs as processing units, inputting the paragraphs into a feature extractor for feature interaction, and extracting features at paragraph levels; and inputting the document data into a feature extractor by taking the whole text as a processing unit for feature interaction, and extracting the features of the document level.
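The three processing granularities can be sketched as follows; the blank-line paragraph rule, the punctuation-based sentence splitter and the toy character-code extractor are placeholders, since the patent leaves the concrete feature extractor unspecified:

```python
import re

def split_paragraphs(text):
    # assumption for this sketch: paragraphs separated by blank lines
    return [p.strip() for p in text.split("\n\n") if p.strip()]

def split_sentences(text):
    # split after sentence-ending punctuation (Chinese or Western)
    return [s for s in re.split(r"(?<=[。？！.?!])\s*", text) if s]

def toy_extractor(unit):
    # stand-in for a real encoder: mean character code of the unit
    return sum(map(ord, unit)) / max(len(unit), 1)

def three_level_features(doc):
    sent_feats = [toy_extractor(s) for p in split_paragraphs(doc)
                  for s in split_sentences(p)]
    para_feats = [toy_extractor(p) for p in split_paragraphs(doc)]
    doc_feat = toy_extractor(doc)
    return sent_feats, para_feats, doc_feat
```

Each level feeds the same kind of extractor a different processing unit, which is the essence of the three-level scheme described above.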
In a possible implementation manner, the training of entity identification, event identification and event element classification through a supervised deep learning algorithm includes: inputting the obtained features of the fused full-text information into a conditional random field (CRF) module for entity recognition training, recognizing the relevant event trigger word entities and event element trigger word entities in the document data; converting the identified event trigger word entities into vectors and inputting them into a classifier for event identification training; and converting the identified event element trigger word entities into vectors and inputting them into a classifier for event element classification training.
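The CRF stage outputs a tag sequence per character; as an illustrative sketch (the BIO tag names B-TRG/I-TRG for event triggers and B-ARG/I-ARG for event element triggers are assumptions, not from the patent), decoding that sequence into the recognized entities might look like:

```python
def bio_to_entities(tokens, tags):
    """Collect contiguous spans from BIO tags into (label, text) pairs,
    e.g. event trigger words (TRG) and event element triggers (ARG)."""
    entities, cur = [], None
    for tok, tag in zip(tokens, tags):
        if tag.startswith("B-"):
            if cur:
                entities.append(cur)
            cur = [tag[2:], [tok]]            # start a new span
        elif tag.startswith("I-") and cur and cur[0] == tag[2:]:
            cur[1].append(tok)                # continue the span
        else:
            if cur:
                entities.append(cur)
            cur = None                        # "O" tag: outside any span
    if cur:
        entities.append(cur)
    return [(label, "".join(toks)) for label, toks in entities]

entities = bio_to_entities(list("公司收购股份"),
                           ["O", "O", "B-TRG", "I-TRG", "B-ARG", "I-ARG"])
```

The decoded spans would then be re-vectorized and fed to the event and event element classifiers as described above.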
In a possible implementation manner, the method further includes: for the forward propagation process formed by entity identification, event identification and event element classification, constructing a backward propagation process to reduce the respective losses obtained while training them: the entity recognition classification loss function loss1, the event recognition classification loss function loss2, and the event element classification loss function loss3, so as to optimize the obtained event extraction model.
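A minimal sketch of combining the three losses into one joint objective for backward propagation (equal weighting is an assumption; the patent does not specify how loss1, loss2 and loss3 are combined):

```python
def total_loss(loss1, loss2, loss3, w=(1.0, 1.0, 1.0)):
    """Joint multi-task objective over entity recognition (loss1),
    event identification (loss2) and event element classification
    (loss3). The weights w are hypothetical tuning knobs."""
    return w[0] * loss1 + w[1] * loss2 + w[2] * loss3
```

In an autodiff framework the scalar returned here is what backward propagation would differentiate, so all three tasks update the shared feature extractors jointly.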
In a second aspect, there is provided an event extraction method, including: acquiring an input document and performing vectorization processing; extracting features of the input document subjected to vectorization according to three levels of sentences, paragraphs and documents, and carrying out fusion processing on the extracted three levels of features to obtain features of fused full-text information; and sending the obtained features fused with the full-text information into an event extraction model, and carrying out entity identification, event identification and event element classification to realize event extraction.
In a possible implementation manner, the fusing of the extracted three levels of features to obtain the features of the fused full-text information includes: denoting the sentence-level, paragraph-level and document-level features as S_i, P_i and D_i respectively; calculating the fusion weight of the paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of the document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), where W1, W2 and b are preset parameters; finally, the features of the fused full-text information are obtained and recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
In a possible implementation manner, the feature extraction according to three levels of sentences, paragraphs and documents includes: inputting the input document into a feature extractor by taking sentences as processing units for feature interaction, and extracting features at sentence level; segmenting an input document on the basis of sentences, taking paragraphs as processing units, inputting the segmentation into a feature extractor for feature interaction, and extracting features at paragraph levels; and inputting the input document into a feature extractor by taking the whole text as a processing unit to perform feature interaction, and extracting the features of the document level.
In a third aspect, a training system for an event extraction model is provided, comprising: a document processing module for acquiring document data, carrying out related labeling, and carrying out vectorization processing on the labeled document data; a feature extraction module for extracting features of the vectorized document data at the three levels of sentence, paragraph and document, and fusing the extracted three levels of features to obtain features of fused full-text information; and a model training module for carrying out training of entity identification, event identification and event element classification through a supervised deep learning algorithm, based on the features of the fused full-text information and the annotated labels, to obtain an event extraction model.
In a fourth aspect, there is provided an event extraction system comprising: a document processing module for acquiring an input document and carrying out vectorization processing; a feature extraction module for extracting features of the vectorized input document at the three levels of sentence, paragraph and document, and fusing the extracted three levels of features to obtain features of fused full-text information; and an event extraction module for sending the obtained features of the fused full-text information into the event extraction model to carry out entity identification, event identification and event element classification, thereby realizing event extraction.
In a fifth aspect, there is provided a computer device comprising a processor and a memory, the memory having stored therein a program comprising computer-executable instructions that, when executed by the computer device, cause the computer device to perform the training method of the event extraction model as described in the first aspect.
In a sixth aspect, there is provided a computer device comprising a processor and a memory, the memory having stored therein a program comprising computer-executable instructions, the processor executing the computer-executable instructions stored in the memory when the computer device is running to cause the computer device to perform the event extraction method as described in the second aspect.
In a seventh aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising computer-executable instructions, which when executed by a computer device, cause the computer device to perform the training method of the event extraction model as described in the first aspect.
In an eighth aspect, there is provided a computer readable storage medium storing one or more programs, the one or more programs comprising computer-executable instructions, which when executed by a computer device, cause the computer device to perform the event extraction method as described in the second aspect.
From the above technical solutions, the embodiment of the present invention has the following advantages:
according to the invention, three layers of feature extraction are constructed, sentence level features, paragraph level features and document level (namely full text level) features are respectively extracted, and then the three layers of features are fused, so that three kinds of information including semantic level information, context information and global information are fused, full text information is fully used, feature expression is enhanced, and further the performance of event extraction and identification is improved. The invention is especially suitable for the financial field, and is beneficial to improving the performance and effect of financial event extraction and identification.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings used in the description of the embodiments and the prior art will be briefly described below, and it is obvious that the drawings in the following description are only some embodiments of the present invention, and other drawings may be obtained according to these drawings without inventive effort for a person skilled in the art.
FIG. 1 is a flow chart of a training method for an event extraction model according to one embodiment of the present invention;
FIG. 2 is a functional block diagram of event extraction in accordance with one embodiment of the present invention;
FIG. 3 is a functional block diagram of feature extraction and fusion in one embodiment of the invention;
FIG. 4 is a flow chart of a method for event extraction according to one embodiment of the present invention;
FIG. 5 is a block diagram of a training system for providing an event extraction model in accordance with one embodiment of the present invention;
FIG. 6 is a block diagram of an event extraction system according to one embodiment of the present invention;
fig. 7 is a block diagram of a computer device according to an embodiment of the present invention.
Detailed Description
In order that those skilled in the art will better understand the present invention, a technical solution in the embodiments of the present invention will be clearly and completely described below with reference to the accompanying drawings in which it is apparent that the described embodiments are only some embodiments of the present invention, not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the present invention without making any inventive effort, shall fall within the scope of the present invention.
The terms first, second, third and the like in the description and in the claims and in the above drawings, are used for distinguishing between different objects and not necessarily for describing a particular sequential or chronological order. Furthermore, the terms "comprise" and "have," as well as any variations thereof, are intended to cover a non-exclusive inclusion. For example, a process, method, system, article, or apparatus that comprises a list of steps or elements is not limited to only those listed steps or elements but may include other steps or elements not listed or inherent to such process, method, article, or apparatus.
For ease of understanding, related art terms related to the present invention will be explained first.
Event extraction: it means that events of interest to the user are extracted from the text data describing the event information and presented in a structured form.
Events: the description of a specific event that occurs objectively in natural language form is typically a sentence or group of sentences. An event consists of an event trigger word and an event element.
Event trigger words: the word most representative of an event occurrence in an event reference is an important feature that determines the type (or class) of the event, and is typically a verb or noun.
Event element: the participants in an event are mainly composed of entities, time and attribute values.
ACE2005 defines 8 event types and 33 event subtypes, each event type corresponding to a unique event template. Thus, event extraction becomes a task of event recognition and event element recognition.
Event extraction based on deep learning mainly consists of the following parts: entity recognition, which mainly recognizes event trigger words and event element trigger words; event identification, whose task is to determine whether an entity mention is an argument of a particular event and then determine the relevant event type; and event element classification, whose main function is to classify the relevant event elements into the relevant predefined event types.
The present invention will be described in detail with reference to the following examples.
Referring to fig. 1, in an embodiment of the present invention, a training method of an event extraction model is provided. The method is an event extraction method based on deep learning, and can be executed by an event extraction system running on computer equipment. The method mainly comprises the following steps:
11: acquiring document data, carrying out related labeling, and carrying out vectorization processing on the document data subjected to related labeling;
12: extracting features of the vectorized document data according to three levels of sentences, paragraphs and documents, and carrying out fusion processing on the extracted three levels of features to obtain features of fused full-text information;
13: and carrying out entity identification and labeling labels based on the features of the fusion full-text information, and carrying out entity identification, event identification and event element classification training through a supervised deep learning algorithm to obtain an event extraction model.
As described above, the core characteristic of the method is that sentence-level, paragraph-level and full-text-level features are fused to obtain text features carrying full-text information, on which model training is performed, finally improving the event extraction effect. The invention is mainly aimed at financial event extraction but can be applied to event extraction in various other fields.
The method of the present invention will be further described with reference to fig. 2 and 3, taking a specific application scenario of financial event extraction as an example, and the steps are as follows.
1. Entity identification
1.1. Acquiring document data and performing related labeling
A) There is a large amount of financial document data on the Internet. According to the task requirement, for multimedia web pages (called target web pages) within a specified network range, a web crawler can first be used to capture relevant unannotated financial document data; crawling data with a web crawler is a prior-art means. The acquired document data may then be stored in database 1.
B) The data crawled from web pages cannot be directly used to train the event extraction model, mainly for the following reasons: 1) the data is dirty, containing a large amount of interference data such as meaningless characters and punctuation; 2) current related models are mostly trained with supervised methods, but the data available on the Internet is mostly unannotated and cannot be used to train such models, so the acquired data needs to be annotated.
The data is processed in two steps:
b.1 The first step of data processing is mainly data denoising, and since the crawled data contains a large number of non-text parts, such as html tags, the characters have no meaning, and only interfere with the understanding of the text, the interfering text needs to be removed. The data can be filtered by writing related regular expressions, characters of nonsensical and interference items, such as some website links and some messy code characters, are filtered, so that the data is denoised, the data is clean, and the influence of the interference items on subsequent processing is reduced.
B.2) The second step is to annotate the data. Since current deep-learning-based financial event extraction uses a supervised training mode, deriving a prediction function from annotated training data, the data needs to be labeled. The processing mainly proceeds as follows: first, summarize the different event types contained in the crawled data; then, according to the characteristics of each event type, define different event elements for it, each event element representing a representative element of that event. The data is then labeled according to this predefined information, mainly by labeling the event trigger words and related event element trigger words in the original document data, which can be processed according to existing standards. The annotated text-format document data (or text data) may then be divided into the following three parts: 1) training set: mainly used to train the event extraction model; 2) validation set: mainly used to verify the performance of the models during training and screen out the best-trained model; 3) test set: mainly used to check the effect of the trained model. Optionally, the division ratio is 8:1:1, i.e., training data is eighty percent of all annotated text, and test data and validation data are ten percent each. Finally, the annotated and partitioned data may be stored to database 2.
1.2. Feature extraction
A) And carrying out vectorization processing on the document data, and converting the text data into a form which can be understood by a computer.
A.1) The first step is to process the text and build a relevant mapping table to map the text data to relevant vectors. First, a text-handling data structure containing a character mapping function may be constructed, mapping each character to an id (identity document, identification number); it is simply a dictionary whose key is a character and whose value is the corresponding id. At the same time, a label alphabet can be constructed to map each annotated label to a relevant id as well.
A.2) A text vector space is then constructed, whose main function is to map text to vectors. It is a 2-dimensional array W1 with dimensions N x D, where N is the number of characters and D is the dimension of the character vector. Each character obtains a vector id through the alphabet, and through that id it can be mapped to the related vector in W1, thereby converting text to vectors. Meanwhile, a label vector space W2 is also constructed; it works in the same way as the text vector space, mapping each label to a related vector through its id.
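A minimal sketch of the character alphabet of step A.1) together with the text vector space W1 of step A.2) (reserving ids 0 and 1 for padding and unknown characters is an assumption of this sketch):

```python
import numpy as np

class Alphabet:
    """Character-to-id dictionary; id 0 = padding, id 1 = unknown."""
    def __init__(self):
        self.char2id = {"<pad>": 0, "<unk>": 1}
    def add(self, ch):
        if ch not in self.char2id:
            self.char2id[ch] = len(self.char2id)
    def get(self, ch):
        return self.char2id.get(ch, 1)

alpha = Alphabet()
for ch in "公司发布年度报告":
    alpha.add(ch)

D = 16                                             # character vector dimension
W1 = np.random.default_rng(0).normal(size=(len(alpha.char2id), D))
ids = [alpha.get(c) for c in "公司报告"]
vectors = W1[ids]                                  # text mapped to a (4, D) array
```

The label vector space W2 would be built the same way, with labels in place of characters.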
A.3) First, the financial document can be segmented, mainly by splitting it on sentence-ending punctuation such as "。", "？" and "！" to convert the text into N sentences, and converting the text into corresponding ids by the method described in step A.1). To allow parallel processing of the later data, the maximum sentence length is set to 128: when a sentence is shorter than 128, the positions with no character are padded with 0, and the part exceeding 128 is directly truncated, keeping the first 128 characters. Processing the input text in this way converts a financial document into a 2-dimensional array with dimensions N x 128, where N is the number of sentences in the document and each of the 128 positions holds the vector id of one character. Then, by the method described in step A.2), a three-dimensional array N x 128 x D is obtained, where D is the dimension of the character vector, thereby converting the text into vectors.
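Under the assumptions of step A.3) (split on sentence-ending punctuation, pad/truncate to 128, id 0 for padding and id 1 for unknown characters), the conversion to the N x 128 id array could be sketched as:

```python
import re

MAX_LEN = 128

def doc_to_id_matrix(doc, char2id, max_len=MAX_LEN):
    """Split a document into sentences on 。？！, map characters to ids,
    and pad or truncate each sentence to max_len."""
    sents = [s for s in re.split(r"(?<=[。？！])", doc) if s]
    rows = []
    for s in sents:
        ids = [char2id.get(c, 1) for c in s][:max_len]   # truncate
        ids += [0] * (max_len - len(ids))                # pad with 0
        rows.append(ids)
    return rows    # N sentences x max_len ids
```

Indexing the embedding array W1 with this matrix then yields the N x 128 x D tensor described above.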
A.4 Finally, the marked labels are processed, the texts and the labels are aligned, each text corresponds to one label, and the labels are used as labels to be trained next.
B) And extracting the characteristics of the processed document data.
B.1 In the text vectors obtained above, each character has merely been converted into a pre-trained vector in isolation: the processing granularity is the single character, and the context in which each character appears is ignored. To address this, the text vectors are further processed as follows.
B.2 The text features are fed into the following three feature extractors: 1) Sentence-level feature extractor: its processing scope is a single sentence; by letting the characters of one sentence interact, it produces text features fused with sentence-level context information. 2) Paragraph-level feature extractor: the original text is first segmented by existing techniques, dividing the document into paragraphs, which are then input into the paragraph-level feature extractor; its processing scope is a paragraph, and by letting the characters of one paragraph interact it produces text features fused with paragraph-level context information. 3) Document-level feature extractor: its processing scope is the whole document; by letting the characters of the entire document interact, it produces global text features fused over the full document. 4) Finally, a fusion module is constructed to fuse the three kinds of features, yielding features that combine sentence-level, paragraph-level and document-level (i.e., full-text-level) information, that is, multi-level fused features. These four parts are described in detail below, and the overall block diagram of the model is shown in fig. 3.
B.3 Sentence-level encoding, as the name suggests, operates at the sentence level: its processing unit is one sentence. The text can be input, sentence by sentence, into some existing feature extractor for feature interaction.
The feature extractor may be any off-the-shelf feature processing module such as a Transformer, BiLSTM or BiGRU. BiLSTM and BiGRU perform sentence-level feature interaction through a bidirectional mechanism, while the Transformer realizes it through self-attention and multi-head attention. The N x 128 x D text vectors are fed to the feature extractor one sentence at a time as 2-dimensional 128 x D arrays, each producing a 128 x D feature array; all processed sentences are then recombined into the final feature vector S, whose dimension is N x 128 x D.
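As one concrete instance of sentence-level feature interaction, a single-head scaled dot-product self-attention pass (the mechanism named above for the Transformer) can be sketched in NumPy. This is a simplified illustration: a real extractor would use learned projection matrices and multiple heads, and the sizes here are miniature stand-ins for N x 128 x D.

```python
import numpy as np

def self_attention(X):
    """One self-attention pass over a single sentence.
    X: (L, D) character vectors -> (L, D) context-fused features."""
    d = X.shape[-1]
    scores = X @ X.T / np.sqrt(d)                      # (L, L) pairwise interactions
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)                 # row-wise softmax
    return w @ X

# Encode each of the N sentences independently, then stack back to N x L x D
N, L, D = 3, 128, 16
text = np.random.default_rng(1).standard_normal((N, L, D))
S = np.stack([self_attention(sent) for sent in text])
```

Each output position is now a weighted mixture over the whole sentence, which is exactly the "fused with context information" property the extractor is meant to provide.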
B.4 The paragraph-level feature extractor, as its name implies, processes paragraph-level features: its processing unit is one paragraph. First, segmentation is performed on top of the existing sentence splits, mainly by one of the following methods: (1) hard cutting: fixed-size groups of consecutive sentences form a paragraph; this is simple to implement and gives every paragraph the same length, which is convenient for parallelization, but it may group semantically unrelated sentences into one paragraph and split semantically related sentences into different paragraphs; (2) semantic segmentation: contextual relevance is taken into account so that text with low semantic relevance is not grouped into one paragraph, but the resulting paragraphs have inconsistent lengths, which is inconvenient for parallel encoding. After segmentation by one of these methods, the sentence-unit vectors are converted into paragraph-unit vectors of dimension P x L x D, where P is the number of paragraphs obtained after segmentation, L is the text length of each paragraph, and D is the character-vector dimension.
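A minimal version of the hard-cutting method (1), grouping consecutive sentences into fixed-size paragraphs so that every paragraph has the same length; the function name and parameter are assumptions for illustration:

```python
def hard_cut(sentences, para_size):
    """Hard cutting: group consecutive sentences into paragraphs of a fixed size.
    The last paragraph may be shorter if the counts do not divide evenly."""
    return [sentences[i:i + para_size] for i in range(0, len(sentences), para_size)]
```

This is what makes parallel encoding convenient, and also why semantically related sentences can land in different paragraphs: the cut points ignore content entirely.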
The paragraph-level feature processing module may be an off-the-shelf feature extractor such as BiLSTM or BiGRU. If a Transformer is to be used, an improved variant is required, because the complexity of a conventional Transformer grows quadratically with the length of the processed text, making it poorly suited to paragraph-level encoding; an improved Transformer such as Longformer can be used instead. The P x L x D text vectors are fed to the feature extractor one paragraph at a time as 2-dimensional L x D arrays, each producing an L x D feature array; all processed paragraphs are then recombined into the final feature vector P, whose dimension is P x L x D. To facilitate the final feature fusion, the P x L x D vector is converted back to an N x 128 x D vector.
B.5 Document-level feature extraction, as the name implies, produces full-text-level features: its processing unit is the entire document. The text can be input as a whole into an existing feature extractor for feature interaction. Since processing the full text means the input is very long, encoding complexity must be taken into account, and an encoder designed for documents, i.e., one whose cost scales well with text length, must be used; an existing model such as Longformer is suitable. The N x 128 x D text vectors are fed to the feature extractor as a single 2-dimensional (N x 128) x D array, producing an (N x 128) x D feature array and thereby the final feature vector D, whose dimension is (N x 128) x D. To facilitate the final feature fusion, the (N x 128) x D vector is converted back to an N x 128 x D vector.
B.6 The three levels of features acquired above are aligned so that each character position corresponds to a triple of features (S_i, P_i, D_i).
The sentence-level and paragraph-level features are input into a gated fusion module, which computes a gating vector controlling the fusion weight of paragraph-level information: G^p_i = sigmoid(W1*S_i + W2*P_i + b), where W1, W2 and b are parameters of the gated fusion module. (The sigmoid function is the S-shaped growth curve known from biology; in information science it is commonly used as a neural-network activation function because it is monotonically increasing and maps its input into the interval (0, 1).)
The sentence-level and document-level features are then input into the gated fusion module, which computes a gating vector controlling the fusion weight of full-text-level information: G^d_i = sigmoid(W1*S_i + W2*D_i + b). With G^p_i and G^d_i controlling the respective fusion weights, the final features fusing full-text information are obtained through the following fusion mechanism: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
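The gated fusion mechanism can be written directly in NumPy. In this sketch W1, W2 and b are scalars applied element-wise, a simplifying assumption; in practice they would be learned parameters of matching shape.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_fusion(S, P, Dv, W1, W2, b):
    """Fuse sentence (S), paragraph (P) and document (Dv) features per position.
    The gates G^p and G^d control how much paragraph/document info flows in."""
    Gp = sigmoid(W1 * S + W2 * P + b)    # fusion weight of paragraph-level info
    Gd = sigmoid(W1 * S + W2 * Dv + b)   # fusion weight of document-level info
    return Gd * Dv + Gp * P + (2 - Gp - Gd) * S

rng = np.random.default_rng(2)
S, P, Dv = (rng.standard_normal((128, 16)) for _ in range(3))
F = gated_fusion(S, P, Dv, W1=0.5, W2=0.5, b=0.0)
```

Because each gate lies in (0, 1), the coefficient (2 - G^p - G^d) on S stays in (0, 2), so the sentence features are never entirely suppressed.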
C) Entity identification module:
c.1 Entity recognition training is performed to obtain an entity recognition classifier. The final features F_i fusing full-text information obtained in the above steps are input into a CRF (conditional random field) module for entity recognition training; the extracted features are sequence-labeled by the relevant algorithm, and the relevant text is recognized as event trigger word entities E1 and event element trigger word entities E2. The tagged labels are then one-hot encoded so that, as in supervised learning, the tagged labels can be compared with the predicted labels to construct a cross-entropy loss function loss1.
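The one-hot encoding of the tagged labels used to build loss1 can be sketched as follows; the helper name is an assumption.

```python
import numpy as np

def one_hot(label_ids, num_labels):
    """One-hot encode a sequence of label ids for the cross-entropy loss:
    row k has a 1 at column label_ids[k] and 0 elsewhere."""
    out = np.zeros((len(label_ids), num_labels))
    out[np.arange(len(label_ids)), label_ids] = 1.0
    return out
```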
2. Event identification
2.1. Event trigger word entity vectorization
The recognized event trigger word entities E1 are converted into related vectors. Each recognized event trigger word entity is a piece of text, and each character corresponds to a text feature vector f_i and a label feature vector l_i, which are fused by concatenation into a vector t_i = [f_i, l_i]. Since a recognized event trigger word entity is typically a consecutive phrase [w_i, w_i+1, ..., w_j] with corresponding vectors [t_i, t_i+1, ..., t_j], feature aggregation is performed by a max-pooling operation, converting the recognized entity into a single vector all_T.
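The max-pooling aggregation over a multi-token entity can be sketched as follows; the function name is an assumption.

```python
import numpy as np

def entity_vector(token_vecs):
    """Aggregate the per-token vectors [t_i, ..., t_j] of one multi-token
    entity into a single vector all_T by element-wise max pooling."""
    return np.max(np.stack(token_vecs), axis=0)
```

Element-wise max keeps, for each dimension, the strongest activation across the entity's tokens, giving a fixed-size vector regardless of the phrase length.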
2.2. Event identification
And performing event recognition training to obtain an event recognition classifier. The obtained feature vector all_T is input into the event recognition classifier for event recognition classification training; event classification yields a predicted event label, and a cross-entropy loss function loss2 is constructed from the predicted label and the tagged event label.
3. Event element classification
3.1. Event trigger word entity vectorization
The recognized event element trigger word entities E2 are converted into related vectors. Each recognized event element trigger word entity is a piece of text, and each character corresponds to a text feature vector f_i and a label feature vector l_i, which are fused by concatenation into a vector a_i = [f_i, l_i]. Since a recognized event element trigger word entity is typically a consecutive phrase [w_i, w_i+1, ..., w_j] with corresponding vectors [a_i, a_i+1, ..., a_j], feature aggregation is performed by a max-pooling operation, converting the recognized entity into a single vector all_A.
3.2. Event element classification
And training the event element classification to obtain an event element classifier. The obtained feature vector all_A is input into the event element classifier for event element classification training; event element classification yields a predicted label, and a cross-entropy loss function loss3 is constructed from the predicted label and the tagged label.
As described above, by performing training for entity recognition, event recognition, and event element classification, a desired event extraction model is obtained, which includes the training-obtained entity recognition classifier, event recognition classifier, and event element classifier.
4. Model optimization
4.1. Forward propagation
As above, the labeled data passes through the three steps of entity recognition, event recognition and event element classification, forming a complete forward-propagation process and producing, respectively, the entity recognition classification loss function loss1, the event recognition classification loss function loss2 and the event element classification loss function loss3.
4.2. Back propagation optimization
Through the relevant framework and the setting of a relevant optimizer, an automatic back-propagation process can be constructed for the forward-propagation process to optimize the event extraction model. Optimizing the relevant model parameters includes: reducing loss1, thereby improving the accuracy of entity recognition; reducing loss2, thereby improving the accuracy of event recognition; and reducing loss3, thereby improving the accuracy of event element classification.
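A NumPy sketch of the three cross-entropy losses and their joint objective; the logits, shapes and variable names here are illustrative stand-ins, and in a real framework the backward pass over `total_loss` would be automatic.

```python
import numpy as np

def cross_entropy(logits, gold_onehot):
    """Mean cross-entropy between predicted logits and one-hot gold labels."""
    p = np.exp(logits - logits.max(axis=-1, keepdims=True))
    p /= p.sum(axis=-1, keepdims=True)          # softmax probabilities
    return float(-np.sum(gold_onehot * np.log(p + 1e-12), axis=-1).mean())

rng = np.random.default_rng(3)
ner_logits, event_logits, arg_logits = (rng.standard_normal((4, 5)) for _ in range(3))
gold = np.eye(5)[[0, 1, 2, 3]]                  # one-hot gold labels for 4 examples

loss1 = cross_entropy(ner_logits, gold)         # entity recognition
loss2 = cross_entropy(event_logits, gold)       # event recognition
loss3 = cross_entropy(arg_logits, gold)         # event element classification
total_loss = loss1 + loss2 + loss3              # jointly minimized by backprop
```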
5. Model deployment
The trained event extraction model can be obtained through the above parts. The trained model can then be tested on a test set, and its quality quantified by three evaluation criteria: event recognition precision, recall and F1 value, so that the best event extraction model is selected for deployment.
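Precision, recall and F1 can be computed from entity-level counts with the standard definitions; the count names are illustrative.

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from true-positive, false-positive and
    false-negative counts of extracted events."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```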
Referring to fig. 4, another embodiment of the present invention provides a corresponding event extraction method, including:
41. acquiring an input document and performing vectorization processing;
42. extracting features of the input document subjected to vectorization according to three levels of sentences, paragraphs and documents, and carrying out fusion processing on the extracted three levels of features to obtain features of fused full-text information;
43. and sending the obtained features fused with the full-text information into an event extraction model, and carrying out entity identification, event identification and event element classification to realize event extraction.
Wherein, the event extraction model is an event extraction model trained by the method shown in fig. 1 to 3.
In step 41, after the input document is obtained, denoising may be performed on it. Non-text parts contained in the input document, such as html tags, carry no meaning and only interfere with understanding the text, so the interfering text needs to be removed. The data can be filtered with relevant regular expressions, removing meaningless and interfering characters such as website links and garbled characters, so that the data is clean and the influence of interference on subsequent processing is reduced.
In step 41, the input document is vectorized, which means that the text data is converted into a form that can be understood by a computer, that is, a correlation mapping table is constructed to map the text data into a correlation vector. The method specifically comprises the following steps:
a) The document clause will be entered, primarily by passing the document through. ? The following is carried out The equal segmenter converts the text into N sentences, and simultaneously sets the maximum length of each sentence to 128 for parallelization processing of the subsequent data, so that the length of each sentence is set to 128, when the length of each sentence is smaller than 128, the part corresponding to no character is filled with 0, and the excess part is directly truncated to only take the first 128 characters.
B) A character alphabet is built: a text-processing data structure containing a character mapping function, by which each character can be mapped to an id (identification number); in short, a dictionary is built whose keys are characters and whose values are the corresponding ids.
By processing the input document in this way, a financial document can be converted into a 2-dimensional array of dimension N x 128, where N is the number of sentences in the document, 128 is the sentence length, and each of the 128 positions holds the vector id of the corresponding character.
C) A text vector space is constructed, whose main function is to map text to vectors. It is a 2-dimensional array W1 of dimension N x D, where N is the number of characters in the alphabet and D is the dimension of a character vector. Each character obtains its vector id through the alphabet, and the text is mapped to the related vector W1[id], thereby converting the text into related vectors.
Processing the input document in this way yields a three-dimensional array N x 128 x D, where D is the dimension of the character vector, so that the text is converted into vectors.
In step 42, feature extraction of the quantized input document may include:
inputting the input document into a sentence-level feature extractor for feature interaction by taking sentences as processing units, and extracting features of sentence levels;
segmenting an input document on the basis of sentences, taking a paragraph as a processing unit, inputting the paragraph into a paragraph-level feature extractor for feature interaction, and extracting paragraph-level features;
and the input document takes the full text as a processing unit, and is input into a document-level feature extractor for feature interaction, so that the document-level features are extracted.
In step 42, the fusion processing is performed on the extracted three levels of features, which may include:
respectively marking sentence, paragraph and document level characteristics as S_i, P_i and D_i;
calculating the fusion weight of paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), where W1, W2 and b are preset parameters;
finally, the features fusing full-text information are obtained and recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
For a more detailed implementation of step 42, reference is made to the descriptions of the previous sections b.1) to b.6), and no further description is given here.
Referring to fig. 5, in one embodiment of the present invention, a training system for an event extraction model is further provided for performing the training method for the event extraction model as described above.
The system may include:
the document processing module 51 is configured to acquire document data and perform related labeling, and perform vectorization processing on the document data that has been subjected to related labeling;
the feature extraction module 52 is configured to perform feature extraction on the vectorized document data according to three levels of sentences, paragraphs and documents, and perform fusion processing on the extracted three levels of features to obtain features of fused full-text information;
The model training module 53 is configured to perform training of entity identification, event identification and event element classification through a supervised deep learning algorithm based on the feature and the labeled label of the fused full-text information, so as to obtain an event extraction model.
Optionally, the feature extraction module 52 further includes:
the feature fusion module is used for marking the sentence, paragraph and document level features as S_i, P_i and D_i respectively; calculating the fusion weight of paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), where W1, W2 and b are preset parameters; and finally obtaining the final features fusing full-text information, recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
Optionally, the feature extraction module 52 further includes:
the sentence-level coding module is used for inputting the document data into the feature extractor by taking sentences as processing units for feature interaction, and extracting the features of sentence levels;
the paragraph level coding module is used for segmenting the document data on the basis of sentences, taking paragraphs as processing units, inputting the paragraphs into the feature extractor for feature interaction, and extracting features of paragraph levels;
And the document-level coding module is used for taking the whole text of the document data as a processing unit, inputting the document data into the feature extractor for feature interaction, and extracting the features of the document level.
Optionally, the system further includes an optimization module, configured to construct a back-propagation process for the forward-propagation process formed by entity identification, event identification and event element classification, so as to reduce the respective loss functions obtained during their training: the entity recognition classification loss function loss1, the event recognition classification loss function loss2 and the event element classification loss function loss3, thereby optimizing the obtained event extraction model.
Referring to fig. 6, an event extraction system is further provided according to an embodiment of the present invention. The system for performing the event extraction method as described above may include:
a document processing module 61 for acquiring an input document and performing vectorization processing;
the feature extraction module 62 is configured to extract features of the vectorized input document according to three levels of sentences, paragraphs and documents, and perform fusion processing on the extracted three levels of features to obtain features of fused full-text information;
The event extraction module 63 is configured to send the obtained feature of the fused full text information to an event extraction model, perform entity identification, event identification and event element classification, and implement event extraction.
Optionally, the feature extraction module 62 is specifically configured to:
inputting the input document into a feature extractor by taking sentences as processing units for feature interaction, and extracting features at sentence level; segmenting an input document on the basis of sentences, taking paragraphs as processing units, inputting the segmentation into a feature extractor for feature interaction, and extracting features at paragraph levels; the method comprises the steps of taking the whole text of an input document as a processing unit, inputting the input document into a feature extractor for feature interaction, and extracting features of a document level;
and, marking the sentence, paragraph and document level features as S_i, P_i and D_i respectively; calculating the fusion weight of paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), where W1, W2 and b are preset parameters; and finally obtaining the features fusing full-text information, recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
Referring to fig. 7, an embodiment of the present invention further provides a computer device 70, including a processor 71 and a memory 72, where the memory 72 stores a program, where the program includes computer-executable instructions, and when the computer device is running, the processor 71 executes the computer-executable instructions stored in the memory, so that the computer device 70 performs a training method of an event extraction model as described above, or performs an event extraction method as described above.
An embodiment of the present invention also provides a computer-readable storage medium storing one or more programs, the one or more programs comprising computer-executable instructions, which when executed by a computer device, cause the computer device to perform a training method of an event extraction model as described above, or to perform an event extraction method as described above.
In summary, the invention discloses a training method of an event extraction model, a corresponding event extraction method, a related system and related equipment.
From the above technical solutions, the embodiment of the present invention has the following advantages:
according to the invention, three layers of feature extraction are constructed, sentence level features, paragraph level features and document level (namely full text level) features are respectively extracted, and then the three layers of features are fused, so that three kinds of information including semantic level information, context information and global information are fused, full text information is fully used, feature expression is enhanced, and further the performance of event extraction and identification is improved. The invention is especially suitable for the financial field, and is beneficial to improving the performance and effect of financial event extraction and identification.
In the foregoing embodiments, the descriptions of the embodiments are each focused, and for those portions of one embodiment that are not described in detail, reference may be made to the related descriptions of other embodiments.
The above embodiments are only for illustrating the technical solution of the present invention, and are not limiting; those of ordinary skill in the art will appreciate that: the technical scheme described in the above embodiments can be modified or some technical features thereof can be replaced equivalently; such modifications and substitutions do not depart from the spirit and scope of the technical solutions of the embodiments of the present invention.

Claims (8)

1. A method for training an event extraction model, comprising:
acquiring document data, carrying out relevant labeling, and carrying out vectorization processing on the labeled document data;
extracting features of the vectorized document data according to three levels of sentences, paragraphs and documents, and carrying out fusion processing on the extracted three levels of features to obtain features of fused full-text information;
based on the feature and label of the fusion full-text information, training entity identification, event identification and event element classification through a supervised deep learning algorithm to obtain an event extraction model;
The method for obtaining the characteristics of the fusion full-text information comprises the following steps of:
respectively marking sentence, paragraph and document level characteristics as S_i, P_i and D_i;
calculating the fusion weight of paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), wherein W1, W2 and b are preset parameters;
finally, the features of the fused full-text information are obtained and recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
2. The method of claim 1, wherein the feature extraction at three levels of sentences, paragraphs, and documents comprises:
inputting the document data into a feature extractor by taking sentences as processing units for feature interaction, and extracting features at sentence level;
segmenting the document data on the basis of sentences, taking paragraphs as processing units, inputting the paragraphs into a feature extractor for feature interaction, and extracting features at paragraph levels;
and inputting the document data into a feature extractor by taking the whole text as a processing unit for feature interaction, and extracting the features of the document level.
3. The method of claim 1, wherein the training of entity recognition, event recognition, and event element classification by a supervised deep learning algorithm comprises:
Inputting the obtained characteristics of the integrated full-text information into a conditional random field CRF module for entity recognition training, and recognizing the document data as related event trigger word entities and event element trigger word entities;
converting the identified event trigger word entity into a vector, and inputting the vector into a classifier to perform event identification training;
and converting the identified event element trigger word entity into a vector, and inputting the vector into a classifier for event element classification training.
4. The method of claim 1, wherein after obtaining the event extraction model, further comprising:
for the forward-propagation process formed by entity identification, event identification and event element classification, constructing a back-propagation process to reduce the respective loss functions obtained during their training: the entity recognition classification loss function loss1, the event recognition classification loss function loss2 and the event element classification loss function loss3, so as to optimize the obtained event extraction model.
5. An event extraction method, comprising:
acquiring an input document and performing vectorization processing;
extracting features of the input document subjected to vectorization according to three levels of sentences, paragraphs and documents, and carrying out fusion processing on the extracted three levels of features to obtain features of fused full-text information;
Sending the obtained features fused with the full-text information into an event extraction model, and carrying out entity identification, event identification and event element classification to realize event extraction;
the method for obtaining the characteristics of the fusion full-text information comprises the following steps of:
respectively marking sentence, paragraph and document level characteristics as S_i, P_i and D_i;
calculating the fusion weight of paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), wherein W1, W2 and b are preset parameters;
finally, the features of the fused full-text information are obtained and recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
6. The method of claim 5, wherein the feature extraction at three levels of sentences, paragraphs, and documents comprises:
inputting the input document into a feature extractor by taking sentences as processing units for feature interaction, and extracting features at sentence level;
segmenting an input document on the basis of sentences, taking paragraphs as processing units, inputting the segmentation into a feature extractor for feature interaction, and extracting features at paragraph levels;
and inputting the input document into a feature extractor by taking the whole text as a processing unit to perform feature interaction, and extracting the features of the document level.
7. A training system for an event extraction model, comprising:
the document processing module is used for acquiring document data and carrying out related labeling, and carrying out vectorization processing on the document data subjected to related labeling;
the feature extraction module is used for extracting features of the vectorized document data according to three levels of sentences, paragraphs and documents, and carrying out fusion processing on the extracted three levels of features to obtain features of fusion full-text information;
the model training module is used for training entity identification, event identification and event element classification through a supervised deep learning algorithm based on the feature and the labeled label of the fusion full-text information to obtain an event extraction model;
the method for obtaining the characteristics of the fusion full-text information comprises the following steps of:
respectively marking sentence, paragraph and document level characteristics as S_i, P_i and D_i;
calculating the fusion weight of paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), wherein W1, W2 and b are preset parameters;
finally, the features of the fused full-text information are obtained and recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
8. An event extraction system, comprising:
the document processing module is used for acquiring an input document and vectorizing it;
the feature extraction module is used for extracting features from the vectorized input document at three levels, namely sentence, paragraph and document, and fusing the three levels of extracted features to obtain features of the fused full-text information;
the event extraction module is used for feeding the obtained features of the fused full-text information into the event extraction model and performing entity identification, event identification and event element classification, thereby realizing event extraction;
wherein the features of the fused full-text information are obtained by the following steps:
respectively recording the sentence-, paragraph- and document-level features as S_i, P_i and D_i;
calculating the fusion weight of the paragraph-level information as G^p_i = sigmoid(W1*S_i + W2*P_i + b), and the fusion weight of the document-level information as G^d_i = sigmoid(W1*S_i + W2*D_i + b), where W1, W2 and b are preset parameters;
finally, obtaining the features of the fused full-text information, recorded as: F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i.
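As an illustration only (not part of the claims), the gated fusion of the three feature levels described above can be sketched in NumPy. The parameter names W1, W2 and b follow the claim; the feature dimension, the use of shared square weight matrices, and the random toy inputs are assumptions made for the sketch:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def fuse_full_text(S_i, P_i, D_i, W1, W2, b):
    """Gated fusion of sentence- (S_i), paragraph- (P_i) and document-level
    (D_i) features, following the claimed formulas."""
    # Paragraph-level fusion weight: G^p_i = sigmoid(W1*S_i + W2*P_i + b)
    G_p = sigmoid(W1 @ S_i + W2 @ P_i + b)
    # Document-level fusion weight: G^d_i = sigmoid(W1*S_i + W2*D_i + b)
    G_d = sigmoid(W1 @ S_i + W2 @ D_i + b)
    # F_i = G^d_i*D_i + G^p_i*P_i + (2 - G^p_i - G^d_i)*S_i
    return G_d * D_i + G_p * P_i + (2.0 - G_p - G_d) * S_i

# Toy example with an assumed feature dimension of 4
rng = np.random.default_rng(0)
d = 4
S, P, D = rng.normal(size=(3, d))          # one feature vector per level
W1, W2 = rng.normal(size=(2, d, d)) * 0.1  # preset parameters (here random)
b = np.zeros(d)
F = fuse_full_text(S, P, D, W1, W2, b)
print(F.shape)  # (4,)
```

Note that the sentence-level term carries the residual weight (2 - G^p_i - G^d_i), so when both gates are small the fused feature falls back toward the local sentence representation.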
CN202010798835.1A 2020-08-11 2020-08-11 Training method of event extraction model, event extraction method, system and equipment Active CN111783394B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010798835.1A CN111783394B (en) 2020-08-11 2020-08-11 Training method of event extraction model, event extraction method, system and equipment

Publications (2)

Publication Number Publication Date
CN111783394A CN111783394A (en) 2020-10-16
CN111783394B true CN111783394B (en) 2024-03-08

Family

ID=72761827

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010798835.1A Active CN111783394B (en) 2020-08-11 2020-08-11 Training method of event extraction model, event extraction method, system and equipment

Country Status (1)

Country Link
CN (1) CN111783394B (en)

Families Citing this family (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112380863A (en) * 2020-10-29 2021-02-19 国网天津市电力公司 Sequence labeling method based on multi-head self-attention mechanism
CN114519552A (en) * 2020-11-18 2022-05-20 阿里巴巴集团控股有限公司 Method, system, device and equipment for determining post-meeting execution items
CN112487171A (en) * 2020-12-15 2021-03-12 中国人民解放军国防科技大学 Event extraction system and method under open domain
CN112528676B (en) * 2020-12-18 2022-07-08 南开大学 Document-level event argument extraction method
CN113591483A (en) * 2021-04-27 2021-11-02 重庆邮电大学 Document-level event argument extraction method based on sequence labeling
CN113255321B (en) * 2021-06-10 2021-10-29 之江实验室 Financial field chapter-level event extraction method based on article entity word dependency relationship
CN113806536B (en) * 2021-09-14 2024-04-16 广州华多网络科技有限公司 Text classification method and device, equipment, medium and product thereof
CN113806486B (en) * 2021-09-23 2024-05-10 深圳市北科瑞声科技股份有限公司 Method and device for calculating long text similarity, storage medium and electronic device
CN114168727A (en) * 2021-12-06 2022-03-11 哈尔滨工业大学 Document level event subject pair extraction method, storage medium and equipment for financial field
CN114328687B (en) * 2021-12-23 2023-04-07 北京百度网讯科技有限公司 Event extraction model training method and device and event extraction method and device
CN114936559B (en) * 2022-01-05 2023-04-07 华能贵诚信托有限公司 Multi-model fusion-based multi-level event extraction method and extraction system
CN114637853A (en) * 2022-05-17 2022-06-17 天津卓朗科技发展有限公司 Grading method of emergency and model training method and device thereof
CN117493486B (en) * 2023-11-10 2024-06-14 华泰证券股份有限公司 Sustainable financial event extraction system and method based on data replay

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110134720A (en) * 2019-05-17 2019-08-16 苏州大学 Joint event extraction method fusing local features with deep learning
CN110135457A (en) * 2019-04-11 2019-08-16 中国科学院计算技术研究所 Event trigger word abstracting method and system based on self-encoding encoder fusion document information

Family Cites Families (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10157177B2 (en) * 2016-10-28 2018-12-18 Kira Inc. System and method for extracting entities in electronic documents
US11687827B2 (en) * 2018-10-04 2023-06-27 Accenture Global Solutions Limited Artificial intelligence (AI)-based regulatory data processing system
US20200184016A1 (en) * 2018-12-10 2020-06-11 Government Of The United States As Represetned By The Secretary Of The Air Force Segment vectors

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
Peeling Back the Layers: Detecting Event Role Fillers in Secondary Contexts; Huang R. H. et al.; Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics; 1137-1147 *
Research on Event-Oriented Knowledge Processing; Fu Jianfeng; Wanfang Dissertation Database; 1-118 *
Research on Document-Level Event Extraction for Unstructured Chinese Text; Yang Hang; China Master's Theses Full-text Database, Information Science and Technology; I138-1709 *


Similar Documents

Publication Publication Date Title
CN111783394B (en) Training method of event extraction model, event extraction method, system and equipment
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN113569050B (en) Method and device for automatically constructing government affair field knowledge map based on deep learning
CN112749562A (en) Named entity identification method, device, storage medium and electronic equipment
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN111831783B (en) Method for extracting chapter-level relation
CN112632224B (en) Case recommendation method and device based on case knowledge graph and electronic equipment
CN113408287B (en) Entity identification method and device, electronic equipment and storage medium
CN112016320A (en) English punctuation adding method, system and equipment based on data enhancement
KR20200052412A (en) Artificial intelligence employment system and employing method of thereof
CN114139551A (en) Method and device for training intention recognition model and method and device for recognizing intention
CN113919366A (en) Semantic matching method and device for power transformer knowledge question answering
CN115827819A (en) Intelligent question and answer processing method and device, electronic equipment and storage medium
CN113051887A (en) Method, system and device for extracting announcement information elements
CN111026880A (en) Joint learning-based judicial knowledge graph construction method
CN111178080A (en) Named entity identification method and system based on structured information
CN114239579A (en) Electric power searchable document extraction method and device based on regular expression and CRF model
CN116795789B (en) Method and device for automatically generating patent retrieval report
CN114065749A (en) Text-oriented Cantonese recognition model and system, and training and recognition methods thereof
CN113505222A (en) Government affair text classification method and system based on text circulation neural network
CN112579666A (en) Intelligent question-answering system and method and related equipment
CN111104492B (en) Civil aviation field automatic question and answer method based on layering Attention mechanism
CN116595023A (en) Address information updating method and device, electronic equipment and storage medium
CN114648029A (en) Electric power field named entity identification method based on BiLSTM-CRF model
CN114611489A (en) Text logic condition extraction AI model construction method, extraction method and system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant