CN113553850A - Entity relation extraction method based on ordered structure encoding pointer network decoding - Google Patents

Entity relation extraction method based on ordered structure encoding pointer network decoding

Info

Publication number
CN113553850A
Authority
CN
China
Prior art keywords
entity
layer
sentence
decoding
head
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110338079.9A
Other languages
Chinese (zh)
Inventor
贾海涛
邢增传
张博阳
黄超
耿昊天
曾靓
刘桐
李嘉豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110338079.9A
Publication of CN113553850A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an entity relation extraction method based on ordered structure encoding and pointer network decoding, which comprises the following steps: at the input layer, word embedding is performed using word vectors trained by a BERT pre-training model, and negative examples of the sentence vector representation generated by adversarial training are added to construct the initial sentence vector; at the encoding layer, a Bi-OnLSTM captures the global semantic information of the text; at the decoding layer, the head entities, tail entities and relations are extracted separately following the decoding idea of a pointer network, with Sigmoid replacing Softmax for prediction, completing the entity relation triple extraction task. Because the decoding layer adopts pointer network decoding, the problems of overlapping entity relations and of effectively extracting the many triples a sentence may contain are well solved, and the accuracy of entity and relation extraction is improved.

Description

Entity relation extraction method based on ordered structure encoding pointer network decoding
Technical Field
The invention belongs to the field of natural language processing.
Background
Since the birth of the computer, continuous technological innovation and the worldwide spread of the Internet have brought unprecedented improvements to how people live, study and travel. At the same time, a large amount of text data is generated every day in the form of news articles, blogs, question-and-answer forums, social media and the like. Much important information is hidden in this document text data, and people can only obtain it through a large amount of tedious screening and reading. Information extraction technology therefore emerged to remove redundant data and reduce the amount of human reading while still capturing the effective information. The extracted information can help us acquire and manage the implicit knowledge in a large text corpus and can be used to construct question-answering, retrieval and recommendation systems. Unlike manual data filtering, which returns a series of documents, information extraction can extract the factual event information contained in a given sentence, speech, document or even a batch of data; this information consists of entities and relations and is generally called triple data. Entity types such as persons and organizations are the most basic units of information, and entities appearing in a sentence can be connected by explicit relations such as "born in" or "located in". The entity and relation extraction task (RE) is to automatically identify these entities and the relations between them. Through information extraction technology, people can obtain the effective content of information without reading the data word by word. Research on information extraction technology, especially entity relation extraction, remains one of the major hotspots in the field of artificial intelligence.
Information Extraction (IE) is a young subfield of natural language processing that has now developed for about twenty years; its predecessor, text understanding, has developed for decades. In the 1980s, the Message Understanding Conference (MUC), established with U.S. government support, began driving the development of information extraction technology. MUC attracted company laboratories and academic research institutions around the world by holding information extraction competitions: each team built a model on an officially released dataset against the three major evaluation indexes of information extraction, and the organizers then evaluated the models on a test set, so that information extraction technology was continuously developed and improved.
At present, a top-level task of natural language processing is constructing a Knowledge Graph (KG), a large-scale information representation method that can be used in various fields. The most common method for representing a KG follows the Resource Description Framework (RDF): entities are represented by nodes, and the relation between two entities by the edge between their nodes. Each edge together with its two endpoints forms a set of factual information, the triple (head entity, relation, tail entity); for example, (Zhou Jielun, born in, New Taipei City, Taiwan) means that the birthplace of Zhou Jielun is New Taipei City, Taiwan. A KG is a heterogeneous graph network containing a large number of different types of entity nodes and relations, and may even contain sentence nodes. With this representation we can discover the various attributes of entities, high-level relations between entities, and associations between relations. The entity relation extraction technology underlying knowledge graph construction is therefore of great importance.
The entity relation extraction task is the first-stage subtask of the information extraction task and can be divided into 2 subtasks. The first is named entity recognition, i.e. recognizing the head entity (also called the subject) contained in a sentence and then the tail entity (also called the object); the second is relation extraction, i.e. identifying the implied relation (predicate) between the head and tail entities. Each entity pair and its relation are integrated into a triple (S, P, O), for example (Zhou Jielun, born in, Taiwan). However, the entity relation extraction task faces two types of problems, summarized as follows:
In the first category, the conventional Pipeline processing method first performs named entity recognition, i.e. recognizes the two entities existing in a sentence, and then feeds the two entities into a relation classification model to recognize the relation between them. In essence, the relation extraction task is divided into 2 subtasks, with the output of the entity recognition model used as the input of the relation classification model. However, this creates several problems:
(1) Error accumulation: errors in the entity extraction phase affect the relation extraction performance of the relation classification phase.
(2) Entity redundancy: because the head and tail entities are extracted first, the classification step may find that the two entities have no relation. Such unrelated entity pairs are redundant for subsequent knowledge graph tasks such as entity linking, and a question-answering system using unrelated candidate entities increases the amount of computation and reduces model accuracy.
(3) Missing interaction: the entity recognition task and the relation extraction task may have associations or share parameters; simply taking the output of one subtask as the input of the next loses this interaction.
In the second category, a pipeline-based method sometimes extracts a pair of entities that have no relation, and because this method performs triple extraction on a sentence only once, it also fails to extract all of the multiple triples a sentence may contain. Most importantly, if a sentence contains overlapping entities or relations, neither the traditional model nor a simple joint extraction model can extract the triples completely, as shown in fig. 7.
(1) Single Entity Overlap (SEO): for example, as shown in Table 1, in the sentence "Zhou Jielun starred in Initial D and Secret", the head entity "Zhou Jielun" corresponds to 2 tail entities, "Initial D" and "Secret", forming two groups of triples in which the relation and head entity are the same but the tail entities differ, i.e. the head entity overlaps. This situation exists in many sentences.
(2) Entity Pair Overlap (EPO): the sentence "Secret, directed by and starring Zhou Jielun, sold well at the box office" includes 2 groups of triples over the same entity pair (Zhou Jielun, Secret): a typical entity pair overlap in which the relations differ, one being "actor" and one "director".
Aiming at the above problems of the entity relation extraction task, the invention provides a joint entity relation extraction method. In view of the good performance of the encoder-decoder framework on other natural language processing tasks, the invention improves the traditional LSTM network on the basis of this framework and builds the AT-BiOnLSTM-Point pointer network decoding model with an added perturbation term to extract entity relation triples.
Disclosure of Invention
The invention provides an entity relation extraction method based on ordered structure encoding and pointer network decoding, aiming to improve the three indexes of the entity relation extraction task, namely accuracy, recall and F1 value, as well as the ability to extract overlapping entity triples. The method comprises the following steps:
(1) Select features at the input layer to construct the initial sentence vector, representing the sentence as a vector.
(2) Capture hierarchical structure information at the encoding layer to obtain the hidden embedding of each word of the sentence.
(3) At the decoding layer, use a pointer network to further extract abstract features from the encoded features and extract the sentence triples.
Drawings
FIG. 1 is an overall framework diagram of the entity relationship extraction model of the present invention.
FIG. 2 is an example of the entities to be extracted and the relation dataset for a sentence according to the present invention.
FIG. 3 is a schematic diagram of adding an AT perturbation term after the representation layer, as adopted by the present invention.
FIG. 4 is a diagram illustrating hierarchical granularity in a sentence according to the present invention.
FIG. 5 is a schematic diagram of the structure of the On-LSTM unit employed in the present invention.
FIG. 6 is a schematic diagram of a pointer network employed in the present invention.
FIG. 7 is an example of the entity overlap type problem of the present invention.
FIG. 8 is a diagram of NYT data set information as used in the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings.
As shown in fig. 1, the invention is based mainly on an encoder-decoder framework and constructs a pointer network with an added perturbation term by improving the traditional LSTM to extract entity relations. It mainly comprises an Input Layer, an Encoder Layer, and a Decoder Layer (the latter including a head entity pointer tagging layer and a tail entity and relation pointer tagging layer). The specific implementation is as follows:
Step one: Input layer
The improved joint entity relation extraction model is evaluated on the standard English dataset NYT (New York Times) and its derivative versions. The corpus used in the experiments is the NYT dataset obtained by Zeng et al. by aligning the raw data with relations in Freebase; it has 24 relations, its test set is manually labeled, and it contains many overlapping entity relations, as shown in fig. 8(a).
Statistics of the overlapping-type triples contained in the NYT dataset are shown in fig. 8(b), where it can be seen that each sentence contains 1.5 triples on average, in both the training set and the test set. The overlapping entity types are divided into 3 kinds: Normal indicates sentences with no overlapping entities or entity pairs, EPO indicates sentences in which entity pairs overlap, and SEO indicates sentences in which only a single entity (which may be either the head or the tail entity) overlaps.
Fig. 2 shows an example of the entities to be extracted and the relations in the NYT dataset used by the entity relation extraction task.
The input layer completes sentence vectorization: word embedding is first performed on the input sentence using word vectors trained by a pre-training model, and then adversarial training is added at the vector representation layer, between the word vector output and the encoder, to generate negative examples of the sentence vector representation and enhance model training.
We convert each word into a vector consisting of the following two parts.
1. Word vector
The entity relation extraction task must consult the context to find the entity words and relation words in a sentence; which words are entities and relations can only be identified from the context. Therefore, the invention uses the context-dependent word vectors trained by the BERT pre-trained language model to map the input sentence into vector space.
The advantage of BERT over word2vec is that the word vectors trained by BERT are not static, i.e. the semantics are not fixed, so sentences containing ambiguous words can be represented well. Compared with ELMo, which splices and fuses features from a bidirectional LSTM, that approach is naturally weaker than BERT's integrated feature fusion; and GPT, being a unidirectional language model, is naturally much weaker than BERT as well.
BERT adopts the same two-stage training scheme as GPT: first, language model pre-training; second, Fine-Tuning when applied to downstream tasks.
First, the input sentence sequence may be expressed as X = {x_1, x_2, …, x_i, …, x_n}, where x_i represents the i-th character in the sequence. We then use the pre-trained BERT word vectors to represent x_i as e_i ∈ R^d, where d is the vector dimension.
Then the word vector matrix for the entire sentence is as shown in equation 1.
E=[e1,e2,…,en] (1)
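As an illustrative sketch only (not part of the patent text), the input-layer mapping can be reproduced with the Hugging Face transformers library; the checkpoint name bert-base-cased and the example sentence are assumptions, since the patent only specifies a BERT pre-trained model:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    bert = AutoModel.from_pretrained("bert-base-cased")

    sentence = "Zhou Jielun was born in New Taipei City."  # hypothetical example input
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = bert(**inputs)

    E = out.last_hidden_state  # word vector matrix E = [e_1, ..., e_n] of equation 1, shape (1, n, d)
    print(E.shape)             # d = 768 for the base model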
2. Adversarial training
Adversarial training (AT) was first proposed in image processing, aiming to improve the robustness of classifiers in image recognition. In natural language processing, many variants have been produced for adversarial training on different tasks, such as text classification and part-of-speech tagging. So-called adversarial training is actually regarded as a regularization method, but unlike many regularization methods that introduce random noise, adversarial training improves model performance by generating perturbations that are easily misclassified by the classifier.
In order to improve the performance of the entity relation extraction model, the invention adds adversarial training at the word embedding layer, generating negative examples of the original input information by adding noise to the concatenated word vector representation layer, as shown in fig. 3.
The input representation layer model comprises the word vectors and adversarial training, adding a small perturbation to the training data, as shown in equation 2.

η_adv = argmax_(η, ||η|| ≤ ε) loss(ω + η; θ̂)   (2)

That is, the worst-case perturbation η_adv is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters. Then the original examples and the generated negative examples are trained jointly, so the final loss is as shown in equation 3.

loss_final = loss(ω; θ) + loss(ω + η_adv; θ)   (3)
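A minimal sketch of such an embedding-level adversarial step in PyTorch, in the spirit of FGM; the L2 normalization, the epsilon value and the compute_loss closure are assumptions rather than the patent's exact procedure:

    import torch

    def fgm_adversarial_loss(embedding: torch.nn.Embedding, compute_loss, epsilon: float = 1.0):
        loss = compute_loss()  # loss on the original examples
        grad = torch.autograd.grad(loss, embedding.weight, retain_graph=True)[0]
        eta_adv = epsilon * grad / (grad.norm() + 1e-12)  # worst-case perturbation (equation 2)
        embedding.weight.data.add_(eta_adv)               # build the negative example
        adv_loss = compute_loss()                         # loss on the perturbed embeddings
        embedding.weight.data.sub_(eta_adv)               # restore the original embeddings
        return loss + adv_loss                            # joint loss (equation 3)

The returned joint loss is then backpropagated and the optimizer stepped as usual; the perturbation lives only on the word embedding weights, matching the description above.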
Step two: coding layer
For tasks in different fields, different combinations of encoding and decoding layers can be selected; for example, image processing tasks usually use a convolutional neural network to form the encoding layer, while natural language processing tasks such as event element extraction usually select a recurrent neural network.
In Chinese text processing there is a concept of hierarchy: the character is the lowest level, followed by the word, then the sentence, the paragraph, and so on. The higher the level, the coarser the granularity and the larger the span of the information in the sentence. FIG. 4 is a schematic diagram of hierarchical granularity.
However, the neurons of conventional recurrent neural networks such as the LSTM are usually unordered, so they cannot learn and extract hierarchical structure information. Therefore, the invention selects the bidirectional ordered long short-term memory network (Bi-OnLSTM) as the basic structure of the encoding layer, so that high-level information is kept over a longer period while low-level information is forgotten over a shorter interval, and the differing information propagation spans form the hierarchical structure of the input sequence. The forward calculation of the On-LSTM is shown in equation 4, and FIG. 5 is a schematic structural diagram of the On-LSTM unit.
f_t = σ(W_f x_t + U_f h_(t-1) + b_f)
i_t = σ(W_i x_t + U_i h_(t-1) + b_i)
o_t = σ(W_o x_t + U_o h_(t-1) + b_o)
ĉ_t = tanh(W_c x_t + U_c h_(t-1) + b_c)
f̃_t = cumax(W_f̃ x_t + U_f̃ h_(t-1) + b_f̃)
ĩ_t = 1 − cumax(W_ĩ x_t + U_ĩ h_(t-1) + b_ĩ)
ω_t = f̃_t ∘ ĩ_t
c_t = (f_t ∘ ω_t + f̃_t − ω_t) ∘ c_(t-1) + (i_t ∘ ω_t + ĩ_t − ω_t) ∘ ĉ_t
h_t = o_t ∘ tanh(c_t)   (4)

Compared with the traditional LSTM, the modification of the On-LSTM mainly lies in the added master forget gate f̃_t and master input gate ĩ_t, both built from the cumax operation cumax(·) = cumsum(softmax(·)), applied in the right and left cumulative-sum directions respectively.
The present invention designs the introduced On-LSTM as a bidirectional network. In the entity relation extraction task, acquiring only the unidirectional left-to-right context is not enough to support the extraction task; a right-to-left On-LSTM layer is needed to acquire the following context, so the encoding layer structure of the improved joint entity relation extraction model is Bi-OnLSTM. The forward On-LSTM computes the left context state h→_t of word x_t at time t (the final hidden state of the forward propagation layer), the backward On-LSTM computes the right context state h←_t (the final hidden state of the backward propagation layer), and the output of the encoding layer for word x_t at time t is the concatenation h_t = [h→_t; h←_t].
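A minimal sketch of a single On-LSTM cell step in PyTorch; the fused weight layout (six gate blocks in one projection) and the omission of the chunking trick are simplifying assumptions:

    import torch
    import torch.nn.functional as F

    def cumax(x):
        # cumulative softmax cumsum(softmax(x)): the ordering operation behind the master gates
        return torch.cumsum(F.softmax(x, dim=-1), dim=-1)

    def on_lstm_cell(x_t, h_prev, c_prev, W, U, b, hidden_size):
        # project input and previous hidden state onto all six gates at once
        gates = x_t @ W + h_prev @ U + b
        f, i, o, c_hat, mf, mi = gates.split(hidden_size, dim=-1)
        f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)
        c_hat = torch.tanh(c_hat)
        f_tilde = cumax(mf)      # master forget gate, monotonically rising from 0 to 1
        i_tilde = 1 - cumax(mi)  # master input gate, monotonically falling from 1 to 0
        w = f_tilde * i_tilde    # overlap region where both master gates are active
        c_t = (f * w + f_tilde - w) * c_prev + (i * w + i_tilde - w) * c_hat  # equation 4
        h_t = o * torch.tanh(c_t)
        return h_t, c_t

Running this cell left-to-right, running a second one right-to-left, and concatenating the two hidden states per token gives the Bi-OnLSTM encoder output h_t described above.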
Step three: decoding layer
Because the Bi-OnLSTM of the encoding layer captures all the hierarchical and sequence information, the invention performs joint entity relation extraction at the decoding layer and solves the entity relation overlap problem using the decoding idea of a pointer network.
Different from the previous approach of extracting the entities first and then judging the relation between them, the invention adopts an improved extraction mechanism. Fig. 6 is a schematic diagram of the pointer network. The task is divided into two stages: the first stage tags the possible candidate head entities in the sentence, and the second stage tags the tail entities and relations according to the semantic and positional features of each candidate head entity. This solves the overlap problem in which one head entity corresponds to several tail entities and relations; and because each head entity yields its triples according to its own semantic and positional features, the extraction of meaningless triples is avoided and redundant information is reduced.
Then, the conventional triple extraction formula becomes a conditional probability formula, as shown in formula 5.
p(s,p,o|Sen)=p(s|Sen)p(p,o|s,Sen) (5)
In the formula, Sen is the sentence representation, and s, p, o are the entity relation triple. First, we use the head entity tagger p(s|Sen) to identify the head entities in a sentence, and then, for each relation r, use the tail entity tagger p(p, o|s, Sen) to identify the tail entities having the corresponding relation with each head entity.
The joint entity relation extraction decoding layer extracts the abstract-feature triples through the following two modules.
1. Head entity extraction module
The encoded vectors h_i output by the Bi-OnLSTM encoding layer are decoded by the head entity extraction module of the decoding layer to identify all vectors that may be head entities. First, a head entity tagging layer is added on top of the encoder output, i.e. two classifier layers (tagging layers), a start layer and an end layer, recognize the start and end positions of the head entity. Concretely, each token of the sentence representation is tagged with a binary label (0, 1): a token carrying a "1" tag in the start layer marks a start position, a token carrying a "1" tag in the end layer marks an end position, and the tag is "0" otherwise. The head entity tagging layer computes the probability of a head entity existing in the sentence as shown in equation 6.

p_i^start = σ(W_start h_i^start + b_start)
p_i^end = σ(W_end h_i^end + b_end)   (6)

A Bi-OnLSTM layer is first added inside the start tagging layer, and the pre-decoding representation h_i is fed into it to obtain the further hidden state vector h_i^start of the head entity start position; p_i^start is the probability that token i is a head entity start position. Likewise, h_i^end is the hidden state vector of the head entity end position, p_i^end is the probability that token i is a head entity end position, and σ is the sigmoid activation function.
Then the maximum likelihood over all possible start and end positions of the head entity is computed on the input sentence token representation x (x_i = h_N[i]), giving the head entity span probability shown in equation 7.

p(s|Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (7)

where L is the length of the sentence in tokens, y_i^t is the binary tag of token i for layer t, and f{·} is an indicator: f{x} = 1 when x is 1 and f{x} = 0 when x is 0.
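A sketch of the head entity tagging layers as sigmoid start/end pointer classifiers; it omits the extra Bi-OnLSTM inside the start tagging layer, and the 0.5 decoding threshold is an assumption:

    import torch
    import torch.nn as nn

    class HeadEntityTagger(nn.Module):
        # start/end pointer layers over the encoder output (equation 6)
        def __init__(self, d_model: int):
            super().__init__()
            self.start_fc = nn.Linear(d_model, 1)
            self.end_fc = nn.Linear(d_model, 1)

        def forward(self, h):                                      # h: (batch, seq_len, d_model)
            p_start = torch.sigmoid(self.start_fc(h)).squeeze(-1)  # P(token i starts a head entity)
            p_end = torch.sigmoid(self.end_fc(h)).squeeze(-1)      # P(token i ends a head entity)
            return p_start, p_end

At decoding time, positions with probability above 0.5 are tagged 1, and each start position is paired with the nearest following end position to form a candidate head entity span.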
2. Tail entity and relation extraction module
The head entity vector representation v_sub^k output by the head entity tagging layer and the hidden state vector representation x_i = h_N[i] output by the encoding layer are fed into the tail entity and relation tagging layer. Similarly, the probability of a possible tail entity is shown in equation 8.

p_i^start_o = σ(W_start^r (x_i + v_sub^k) + b_start^r)
p_i^end_o = σ(W_end^r (x_i + v_sub^k) + b_end^r)   (8)

The tail entity maximum likelihood function is shown in equation 9.

p_r(o|s, Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (9)

Finally, the loss function is computed from the likelihoods of equations 7 and 9, as shown in equation 10.

loss = − ∑_s log p(s|Sen) − ∑_r log p_r(o|s, Sen)   (10)
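A sketch of the tail entity and relation tagging layer under the same assumptions; the head entity vector is fused by addition as in equation 8, and for the NYT corpus num_relations would be 24:

    import torch
    import torch.nn as nn

    class TailEntityRelationTagger(nn.Module):
        # per-relation start/end pointer layers conditioned on the head entity (equation 8)
        def __init__(self, d_model: int, num_relations: int):
            super().__init__()
            self.start_fc = nn.Linear(d_model, num_relations)
            self.end_fc = nn.Linear(d_model, num_relations)

        def forward(self, h, v_sub):                    # h: (B, L, d); v_sub: (B, d) head entity vector
            x = h + v_sub.unsqueeze(1)                  # add head entity features to every token
            p_start = torch.sigmoid(self.start_fc(x))   # (B, L, num_relations)
            p_end = torch.sigmoid(self.end_fc(x))
            return p_start, p_end

Training then reduces to binary cross-entropy on the start/end tags of the head tagger (equation 7) and of this tagger for every relation (equation 9), summed as in equation 10.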
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the scope of the invention is not limited to these specific embodiments. Variations that are obvious to those skilled in the art and that utilize the concepts of the present invention are all intended to be protected.

Claims (4)

1. An entity relation extraction method based on ordered structure encoding and pointer network decoding, characterized in that the method aims to identify and extract the triples composed of the entities and relations in a sentence, and comprises the following steps:
step 1: select features at the input layer to construct the initial sentence vector, representing the sentence as a vector;
step 2: capture hierarchical structure information at the encoding layer to obtain the hidden embedding of each word of the sentence;
step 3: at the decoding layer, use a pointer network to further extract abstract features from the encoded features, and extract the sentence triples.
2. The entity relation extraction method based on ordered structure encoded pointer network decoding as claimed in claim 1, wherein constructing the initial sentence vector in step 1 specifically refers to: in the entity relation extraction task, word vectors with added adversarial-training negative examples are selected to represent the sentence;
step 1.1: training word vector
the input sentence sequence is denoted X = {x_1, x_2, …, x_i, …, x_n}, where x_i represents the i-th character in the sequence; context-dependent word vectors trained by the BERT pre-trained language model are used to map the input sentence into vector space, adopting a two-stage training scheme: first, language model pre-training; second, Fine-Tuning when applied to downstream tasks; the pre-trained BERT word vectors then represent x_i as e_i ∈ R^d, where d is the vector dimension;
the word vector matrix of the whole sentence is then as shown in equation 1;
E=[e1,e2,…,en] (1)
step 1.2: adversarial training
In order to improve the performance of the entity relationship extraction model, the invention adds countermeasure training on the word embedding layer, and generates a negative example of the original input information by adding some noises on the spliced word vector representation layer, as shown in fig. 2;
the input representation layer model comprises the word vectors and adversarial training, adding a small perturbation to the training data, as shown in equation 2;

η_adv = argmax_(η, ||η|| ≤ ε) loss(ω + η; θ̂)   (2)

i.e. the worst-case perturbation η_adv is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters; then the original examples and the generated negative examples are trained jointly, so the final loss is as shown in equation 3.

loss_final = loss(ω; θ) + loss(ω + η_adv; θ)   (3)
3. The entity relation extraction method based on ordered structure encoded pointer network decoding as claimed in claim 2, wherein capturing the hierarchical structure information and sequence information at the encoding layer in step 2 specifically refers to:
for tasks in different fields, different combinations can be selected for the encoding and decoding layers; for example, image processing tasks usually use a convolutional neural network to form the encoding layer, while natural language processing tasks such as event element extraction usually select a recurrent neural network;
in Chinese text processing there is a concept of hierarchy: the character is the lowest level, followed by the word, then the sentence, the paragraph, and so on; the higher the level, the coarser the granularity and the larger the span of the information in the sentence; FIG. 4 is a schematic diagram of hierarchical granularity;
however, the neurons of conventional recurrent neural networks such as the LSTM are usually unordered, so they cannot learn and extract hierarchical structure information; the bidirectional ordered long short-term memory network (Bi-OnLSTM) is therefore selected as the basic structure of the encoding layer, so that high-level information is kept over a longer period while low-level information is forgotten over a shorter interval, and the differing information propagation spans form the hierarchical structure of the input sequence; the forward calculation of the On-LSTM is shown in equation 4, and FIG. 5 is a schematic structural diagram of the On-LSTM unit;
f_t = σ(W_f x_t + U_f h_(t-1) + b_f)
i_t = σ(W_i x_t + U_i h_(t-1) + b_i)
o_t = σ(W_o x_t + U_o h_(t-1) + b_o)
ĉ_t = tanh(W_c x_t + U_c h_(t-1) + b_c)
f̃_t = cumax(W_f̃ x_t + U_f̃ h_(t-1) + b_f̃)
ĩ_t = 1 − cumax(W_ĩ x_t + U_ĩ h_(t-1) + b_ĩ)
ω_t = f̃_t ∘ ĩ_t
c_t = (f_t ∘ ω_t + f̃_t − ω_t) ∘ c_(t-1) + (i_t ∘ ω_t + ĩ_t − ω_t) ∘ ĉ_t
h_t = o_t ∘ tanh(c_t)   (4)

wherein, compared with the traditional LSTM, the modification of the On-LSTM mainly lies in the added master forget gate f̃_t and master input gate ĩ_t, both built from the cumax operation cumax(·) = cumsum(softmax(·)), applied in the right and left cumulative-sum directions respectively;
the introduced On-LSTM is designed as a bidirectional network; in the entity relation extraction task, acquiring only the unidirectional left-to-right context is not enough to support the extraction task, and a right-to-left On-LSTM layer is needed to acquire the following context, so the encoding layer structure of the improved joint entity relation extraction model is Bi-OnLSTM; the forward On-LSTM computes the left context state h→_t of word x_t at time t (the final hidden state of the forward propagation layer), the backward On-LSTM computes the right context state h←_t (the final hidden state of the backward propagation layer), and the output of the encoding layer for word x_t at time t is the concatenation h_t = [h→_t; h←_t].
4. The entity relation extraction method based on ordered structure encoded pointer network decoding as claimed in claim 3, wherein further extracting abstract features from the encoded features with a pointer network at the decoding layer in step 3 specifically refers to:
because the Bi-OnLSTM of the encoding layer captures all the hierarchical and sequence information, joint entity relation extraction is performed at the decoding layer, and the entity relation overlap problem is solved using the decoding idea of a pointer network;
different from the previous approach of extracting the entities first and then judging the relation between them, an improved extraction mechanism is adopted; FIG. 6 is a schematic diagram of the pointer network; the task is divided into two stages: the first stage tags the possible candidate head entities in the sentence, and the second stage tags the tail entities and relations according to the semantic and positional features of each candidate head entity, solving the overlap problem in which one head entity corresponds to several tail entities and relations; because each head entity yields its triples according to its own semantic and positional features, the extraction of meaningless triples is avoided and redundant information is reduced;
Therefore, the conventional triple extraction formula becomes a conditional probability solving formula, as shown in formula 5;
p(s,p,o|Sen)=p(s|Sen)p(p,o|s,Sen) (5)
in the formula, Sen is the sentence representation and s, p, o are the entity relation triple; first, the head entity tagger p(s|Sen) is used to identify the head entities in the sentence, and then, for each relation r, the tail entity tagger p(p, o|s, Sen) is used to identify the tail entity corresponding to the head entity;
the abstract-feature triples are extracted by the joint entity relation extraction decoding layer, which consists of the following two modules;
step 3.1: head entity extraction
the encoded vectors h_i output by the Bi-OnLSTM encoding layer are sent to the head entity extraction module of the decoding layer for decoding, identifying all vectors that may be head entities; first, a head entity tagging layer is added on top of the encoder output, i.e. two classifier layers (tagging layers), a start layer and an end layer, identify the start and end positions of the head entity; concretely, each token of the sentence representation is tagged with a binary label (0, 1): a token carrying a "1" tag in the start layer marks a start position, a token carrying a "1" tag in the end layer marks an end position, and the tag is "0" otherwise; the head entity tagging layer computes the probability of a head entity existing in the sentence as shown in equation 6;

p_i^start = σ(W_start h_i^start + b_start)
p_i^end = σ(W_end h_i^end + b_end)   (6)

a Bi-OnLSTM layer is first added inside the start tagging layer, and the pre-decoding representation h_i is sent to it to obtain the further hidden state vector h_i^start of the head entity start position; p_i^start is the probability that token i is a head entity start position, h_i^end is the hidden state vector of the head entity end position, p_i^end is the probability that token i is a head entity end position, and σ is the activation function;
then the maximum likelihood over all possible start and end positions of the head entity is computed on the input sentence token representation x (x_i = h_N[i]), giving the head entity span probability shown in equation 7;

p(s|Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (7)

where L is the length of the sentence in tokens, f{x} = 1 when x is 1 and f{x} = 0 when x is 0;
step 3.2: tail entity and relation extraction
the head entity vector representation v_sub^k output by the head entity tagging layer and the hidden state vector representation x_i = h_N[i] output by the encoding layer are fed into the tail entity and relation tagging layer; similarly, the probability of a possible tail entity is shown in equation 8;

p_i^start_o = σ(W_start^r (x_i + v_sub^k) + b_start^r)
p_i^end_o = σ(W_end^r (x_i + v_sub^k) + b_end^r)   (8)

the tail entity maximum likelihood function is shown in equation 9;

p_r(o|s, Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (9)

finally, the loss function is computed from the likelihoods of equations 7 and 9, as shown in equation 10.

loss = − ∑_s log p(s|Sen) − ∑_r log p_r(o|s, Sen)   (10)
CN202110338079.9A 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding Pending CN113553850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110338079.9A CN113553850A (en) 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110338079.9A CN113553850A (en) 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding

Publications (1)

Publication Number Publication Date
CN113553850A 2021-10-26

Family

ID=78101730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110338079.9A Pending CN113553850A (en) 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding

Country Status (1)

Country Link
CN (1) CN113553850A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370325A1 (en) * 2018-06-04 2019-12-05 Infosys Limited Extraction of tokens and relationship between tokens to form an entity relationship map
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN111914091A (en) * 2019-05-07 2020-11-10 四川大学 Entity and relation combined extraction method based on reinforcement learning
CN111950297A (en) * 2020-08-26 2020-11-17 桂林电子科技大学 Abnormal event oriented relation extraction method
CN112183103A (en) * 2020-10-27 2021-01-05 杭州电子科技大学 Convolutional neural network entity relationship extraction method fusing different pre-training word vectors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIANQIAN ZHANG et al.: "A Review on Entity Relation Extraction", 2017 Second International Conference on Mechanical, Control and Computer Engineering *
ZHANG Xinyi et al.: "An entity recognition and relation extraction model for coal mines", Journal of Computer Applications *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051929A (en) * 2021-03-23 2021-06-29 电子科技大学 Entity relationship extraction method based on fine-grained semantic information enhancement
CN113869049A (en) * 2021-12-03 2021-12-31 北京大学 Fact extraction method and device with legal attribute based on legal consultation problem
CN114298052A (en) * 2022-01-04 2022-04-08 中国人民解放军国防科技大学 Entity joint labeling relation extraction method and system based on probability graph
CN115169326A (en) * 2022-04-15 2022-10-11 山西长河科技股份有限公司 Chinese relation extraction method, device, terminal and storage medium
CN114691895A (en) * 2022-05-31 2022-07-01 南京航天数智科技有限公司 Criminal case entity relationship joint extraction method based on pointer network
CN116226408A (en) * 2023-03-27 2023-06-06 中国科学院空天信息创新研究院 Agricultural product growth environment knowledge graph construction method and device and storage medium
CN116226408B (en) * 2023-03-27 2023-12-19 中国科学院空天信息创新研究院 Agricultural product growth environment knowledge graph construction method and device and storage medium
CN117408247A (en) * 2023-12-15 2024-01-16 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network
CN117408247B (en) * 2023-12-15 2024-03-29 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network

Similar Documents

Publication Publication Date Title
WO2021147726A1 (en) Information extraction method and apparatus, electronic device and storage medium
CN113553850A (en) Entity relation extraction method based on ordered structure encoding pointer network decoding
CN110717017B (en) Method for processing corpus
US20220050967A1 (en) Extracting definitions from documents utilizing definition-labeling-dependent machine learning background
CN114064918B (en) Multi-modal event knowledge graph construction method
CN113254610B (en) Multi-round conversation generation method for patent consultation
CN114020936B (en) Construction method and system of multi-modal affair map and readable storage medium
US12002276B2 (en) Document distinguishing based on page sequence learning
CN115034224A (en) News event detection method and system integrating representation of multiple text semantic structure diagrams
CN112100332A (en) Word embedding expression learning method and device and text recall method and device
KR102379660B1 (en) Method for utilizing deep learning based semantic role analysis
Arumugam et al. Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
Perez-Martin et al. A comprehensive review of the video-to-text problem
CN111881292A (en) Text classification method and device
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113312912A (en) Machine reading understanding method for traffic infrastructure detection text
CN114880307A (en) Structured modeling method for knowledge in open education field
Cao et al. Visual question answering research on multi-layer attention mechanism based on image target features
CN117172253A (en) Label information guiding-based social media multi-modal named entity recognition method
CN116628207A (en) Training method and device for text classification model, electronic equipment and storage medium
CN114519353B (en) Model training method, emotion message generation method and device, equipment and medium
CN115964497A (en) Event extraction method integrating attention mechanism and convolutional neural network
CN116186241A (en) Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium
CN114911940A (en) Text emotion recognition method and device, electronic equipment and storage medium
Zhang Exploration of Cross‐Modal Text Generation Methods in Smart Justice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20211026)