CN113553850A - Entity relation extraction method based on ordered structure encoding pointer network decoding - Google Patents

Entity relation extraction method based on ordered structure encoding pointer network decoding

Info

Publication number
CN113553850A
Authority
CN
China
Prior art keywords
entity
layer
sentence
decoding
head
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202110338079.9A
Other languages
Chinese (zh)
Inventor
贾海涛
邢增传
张博阳
黄超
耿昊天
曾靓
刘桐
李嘉豪
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China
Priority to CN202110338079.9A
Publication of CN113553850A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00 Handling natural language data
    • G06F 40/20 Natural language analysis
    • G06F 40/279 Recognition of textual entities
    • G06F 40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295 Named entity recognition
    • G06F 40/205 Parsing
    • G06F 40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F 40/253 Grammatical analysis; Style critique
    • G06F 40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00 Computing arrangements based on biological models
    • G06N 3/02 Neural networks
    • G06N 3/04 Architecture, e.g. interconnection topology
    • G06N 3/044 Recurrent networks, e.g. Hopfield networks
    • G06N 3/045 Combinations of networks
    • G06N 3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an entity relation extraction method based on ordered structure encoding and pointer network decoding, which comprises the following steps: at the input layer, word embedding is performed using word vectors trained by a BERT pre-training model, and negative examples of the sentence vector representation generated by adversarial training are added to construct the initial sentence vector; at the encoding layer, a Bi-OnLSTM captures the global semantic information of the text; at the decoding layer, the head entities, tail entities and relations are extracted separately following the decoding idea of a pointer network, with Sigmoid replacing Softmax for prediction, completing the entity relation triple extraction task. Because the decoding layer adopts pointer network decoding, the problems of overlapping entity relations and of effectively extracting the many triples a sentence may contain are well solved, and the accuracy of entity and relation extraction is improved.

Description

Entity relation extraction method based on ordered structure encoding pointer network decoding
Technical Field
The invention belongs to the field of natural language processing.
Background
Since the birth of the computer, continuous technological innovation and the worldwide spread of the Internet have brought unprecedented improvements to how people live, study and travel. At the same time, a large amount of text data is generated every day in the form of news articles, blogs, question-and-answer forums, social media and the like. Much important information is hidden in this document text data, and people can only obtain it through a large amount of tedious screening and reading. Information extraction technology therefore emerged to remove redundant data and reduce the amount of human reading while still capturing the effective information. The extracted information can help us acquire and manage the implicit knowledge in a large text corpus and can be used to construct question-answering, retrieval and recommendation systems. Unlike manual data filtering, which returns a series of documents, information extraction can extract the factual event information contained in a given sentence, speech, document or even a batch of data; this information consists of entities and relations and is generally called triple data. Entity types such as persons and organizations are the most basic units of information, and entities appearing in a sentence can be connected by explicit relations such as "born in" or "located in". The entity and relation extraction task (RE) is to automatically identify these entities and the relations between them. Through information extraction technology, people can obtain the effective content of information without reading the data word by word. Research on information extraction technology, especially entity relation extraction, remains one of the major hotspots in the field of artificial intelligence.
Information Extraction (IE) is a young subfield of natural language processing that has now developed for about twenty years; its predecessor, text understanding, has developed for decades. In the 1980s, the Message Understanding Conference (MUC), established with U.S. government support, began driving the development of information extraction technology. MUC attracted company laboratories and academic research institutions around the world by holding information extraction competitions: each team built a model on an officially released dataset against the three major evaluation indexes of information extraction, and the organizers then evaluated the models on a test set, so that information extraction technology was continuously developed and improved.
At present, a top-level task of natural language processing is constructing a Knowledge Graph (KG), a large-scale information representation method that can be used in various fields. The most common method for representing a KG follows the Resource Description Framework (RDF): entities are represented by nodes, and the relation between two entities by the edge between their nodes. Each edge together with its two endpoints forms a set of factual information, the triple (head entity, relation, tail entity); for example, (Zhou Jielun, born in, New Taipei City, Taiwan) means that the birthplace of Zhou Jielun is New Taipei City, Taiwan. A KG is a heterogeneous graph network containing a large number of different types of entity nodes and relations, and may even contain sentence nodes. With this representation we can discover the various attributes of entities, high-level relations between entities, and associations between relations. The entity relation extraction technology underlying knowledge graph construction is therefore of great importance.
The entity relation extraction task is the first-stage subtask of the information extraction task and can be divided into 2 subtasks. The first is named entity recognition, i.e. recognizing the head entity (also called the subject) contained in a sentence and then the tail entity (also called the object); the second is relation extraction, i.e. identifying the implied relation (predicate) between the head and tail entities. Each entity pair and its relation are integrated into a triple (S, P, O), for example (Zhou Jielun, born in, Taiwan). However, the entity relation extraction task faces two types of problems, summarized as follows:
In the first category, the conventional Pipeline processing method first performs named entity recognition, i.e. recognizes the two entities existing in a sentence, and then feeds the two entities into a relation classification model to recognize the relation between them. In essence, the relation extraction task is divided into 2 subtasks, with the output of the entity recognition model used as the input of the relation classification model. However, this creates several problems:
(1) Error accumulation: errors in the entity extraction phase affect the relation extraction performance of the relation classification phase.
(2) Entity redundancy: because the head and tail entities are extracted first, the classification step may find that the two entities have no relation. Such unrelated entity pairs are redundant for subsequent knowledge graph tasks such as entity linking, and a question-answering system using unrelated candidate entities increases the amount of computation and reduces model accuracy.
(3) Missing interaction: the entity recognition task and the relation extraction task may have associations or share parameters; simply taking the output of one subtask as the input of the next loses this interaction.
In the second category, a pipeline-based method sometimes extracts a pair of entities that have no relation, and because this method performs triple extraction on a sentence only once, it also fails to extract all of the multiple triples a sentence may contain. Most importantly, if a sentence contains overlapping entities or relations, neither the traditional model nor a simple joint extraction model can extract the triples completely, as shown in fig. 7.
(1) Single Entity Overlap (SEO): for example, as shown in Table 1, in the sentence "Zhou Jielun starred in Initial D and Secret", the head entity "Zhou Jielun" corresponds to 2 tail entities, "Initial D" and "Secret", forming two groups of triples in which the relation and head entity are the same but the tail entities differ, i.e. the head entity overlaps. This situation exists in many sentences.
(2) Entity Pair Overlap (EPO): the sentence "Secret, directed by and starring Zhou Jielun, sold well at the box office" includes 2 groups of triples over the same entity pair (Zhou Jielun, Secret): a typical entity pair overlap in which the relations differ, one being "actor" and one "director".
Aiming at the above problems of the entity relation extraction task, the invention provides a joint entity relation extraction method. In view of the good performance of the encoder-decoder framework on other natural language processing tasks, the invention improves the traditional LSTM network on the basis of this framework and builds the AT-BiOnLSTM-Point pointer network decoding model with an added perturbation term to extract entity relation triples.
Disclosure of Invention
The invention provides an entity relation extraction method based on ordered structure encoding and pointer network decoding, aiming to improve the three indexes of the entity relation extraction task, namely accuracy, recall and F1 value, as well as the ability to extract overlapping entity triples. The method comprises the following steps:
(1) Select features at the input layer to construct the initial sentence vector, representing the sentence as a vector.
(2) Capture hierarchical structure information at the encoding layer to obtain the hidden embedding of each word of the sentence.
(3) At the decoding layer, use a pointer network to further extract abstract features from the encoded features and extract the sentence triples.
Drawings
FIG. 1 is an overall framework diagram of the entity relationship extraction model of the present invention.
FIG. 2 is an example of the entities to be extracted and the relation dataset for a sentence according to the present invention.
FIG. 3 is a schematic diagram of adding an AT perturbation term after the representation layer, as adopted by the present invention.
FIG. 4 is a diagram illustrating hierarchical granularity in a sentence according to the present invention.
FIG. 5 is a schematic diagram of the structure of the On-LSTM unit employed in the present invention.
FIG. 6 is a schematic diagram of a pointer network employed in the present invention.
FIG. 7 is an example of the entity overlap type problem of the present invention.
FIG. 8 is a diagram of NYT data set information as used in the present invention.
Detailed Description
The technical solution in the embodiments of the present invention will be described clearly and completely below with reference to the accompanying drawings.
As shown in fig. 1, the invention is based mainly on an encoder-decoder framework and constructs a pointer network with an added perturbation term by improving the traditional LSTM to extract entity relations. It mainly comprises an Input Layer, an Encoder Layer, and a Decoder Layer (the latter including a head entity pointer tagging layer and a tail entity and relation pointer tagging layer). The specific implementation is as follows:
Step one: Input layer
The improved joint entity relation extraction model is evaluated on the standard English dataset NYT (New York Times) and its derivative versions. The corpus used in the experiments is the NYT dataset obtained by Zeng et al. by aligning the raw data with relations in Freebase; it has 24 relations, its test set is manually labeled, and it contains many overlapping entity relations, as shown in fig. 8(a).
Statistics of the overlapping-type triples contained in the NYT dataset are shown in fig. 8(b), where it can be seen that each sentence contains 1.5 triples on average, in both the training set and the test set. The overlapping entity types are divided into 3 kinds: Normal indicates sentences with no overlapping entities or entity pairs, EPO indicates sentences in which entity pairs overlap, and SEO indicates sentences in which only a single entity (which may be either the head or the tail entity) overlaps.
Fig. 2 shows an example of the entities to be extracted and the relations in the NYT dataset used by the entity relation extraction task.
The input layer completes sentence vectorization: word embedding is first performed on the input sentence using word vectors trained by a pre-training model, and then adversarial training is added at the vector representation layer, between the word vector output and the encoder, to generate negative examples of the sentence vector representation and enhance model training.
We convert each word into a vector consisting of the following two parts.
1. Word vector
The entity relation extraction task must consult the context to find the entity words and relation words in a sentence; which words are entities and relations can only be identified from the context. Therefore, the invention uses the context-dependent word vectors trained by the BERT pre-trained language model to map the input sentence into vector space.
The advantage of BERT over word2vec is that the word vectors trained by BERT are not static, i.e. the semantics are not fixed, so sentences containing ambiguous words can be represented well. Compared with ELMo, which splices and fuses features from a bidirectional LSTM, that approach is naturally weaker than BERT's integrated feature fusion; and GPT, being a unidirectional language model, is naturally much weaker than BERT as well.
BERT adopts the same two-stage training scheme as GPT: first, language model pre-training; second, Fine-Tuning when applied to downstream tasks.
First, the input sentence sequence may be expressed as X = {x_1, x_2, …, x_i, …, x_n}, where x_i represents the i-th character in the sequence. We then use the pre-trained BERT word vectors to represent x_i as e_i ∈ R^d, where d is the vector dimension.
Then the word vector matrix for the entire sentence is as shown in equation 1.
E=[e1,e2,…,en] (1)
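As an illustrative sketch only (not part of the patent text), the input-layer mapping can be reproduced with the Hugging Face transformers library; the checkpoint name bert-base-cased and the example sentence are assumptions, since the patent only specifies a BERT pre-trained model:

    import torch
    from transformers import AutoModel, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
    bert = AutoModel.from_pretrained("bert-base-cased")

    sentence = "Zhou Jielun was born in New Taipei City."  # hypothetical example input
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        out = bert(**inputs)

    E = out.last_hidden_state  # word vector matrix E = [e_1, ..., e_n] of equation 1, shape (1, n, d)
    print(E.shape)             # d = 768 for the base model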
2. Adversarial training
Adversarial training (AT) was first proposed in image processing, aiming to improve the robustness of classifiers in image recognition. In natural language processing, many variants have been produced for adversarial training on different tasks, such as text classification and part-of-speech tagging. So-called adversarial training is actually regarded as a regularization method, but unlike many regularization methods that introduce random noise, adversarial training improves model performance by generating perturbations that are easily misclassified by the classifier.
In order to improve the performance of the entity relation extraction model, the invention adds adversarial training at the word embedding layer, generating negative examples of the original input information by adding noise to the concatenated word vector representation layer, as shown in fig. 3.
The input representation layer model comprises the word vectors and adversarial training, adding a small perturbation to the training data, as shown in equation 2.

η_adv = argmax_(η, ||η|| ≤ ε) loss(ω + η; θ̂)   (2)

That is, the worst-case perturbation η_adv is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters. Then the original examples and the generated negative examples are trained jointly, so the final loss is as shown in equation 3.

loss_final = loss(ω; θ) + loss(ω + η_adv; θ)   (3)
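A minimal sketch of such an embedding-level adversarial step in PyTorch, in the spirit of FGM; the L2 normalization, the epsilon value and the compute_loss closure are assumptions rather than the patent's exact procedure:

    import torch

    def fgm_adversarial_loss(embedding: torch.nn.Embedding, compute_loss, epsilon: float = 1.0):
        loss = compute_loss()  # loss on the original examples
        grad = torch.autograd.grad(loss, embedding.weight, retain_graph=True)[0]
        eta_adv = epsilon * grad / (grad.norm() + 1e-12)  # worst-case perturbation (equation 2)
        embedding.weight.data.add_(eta_adv)               # build the negative example
        adv_loss = compute_loss()                         # loss on the perturbed embeddings
        embedding.weight.data.sub_(eta_adv)               # restore the original embeddings
        return loss + adv_loss                            # joint loss (equation 3)

The returned joint loss is then backpropagated and the optimizer stepped as usual; the perturbation lives only on the word embedding weights, matching the description above.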
Step two: coding layer
For tasks in different fields, different combinations of encoding and decoding layers can be selected; for example, image processing tasks usually use a convolutional neural network to form the encoding layer, while natural language processing tasks such as event element extraction usually select a recurrent neural network.
In Chinese text processing there is a concept of hierarchy: the character is the lowest level, followed by the word, then the sentence, the paragraph, and so on. The higher the level, the coarser the granularity and the larger the span of the information in the sentence. FIG. 4 is a schematic diagram of hierarchical granularity.
However, the neurons of conventional recurrent neural networks such as the LSTM are usually unordered, so they cannot learn and extract hierarchical structure information. Therefore, the invention selects the bidirectional ordered long short-term memory network (Bi-OnLSTM) as the basic structure of the encoding layer, so that high-level information is kept over a longer period while low-level information is forgotten over a shorter interval, and the differing information propagation spans form the hierarchical structure of the input sequence. The forward calculation of the On-LSTM is shown in equation 4, and FIG. 5 is a schematic structural diagram of the On-LSTM unit.
f_t = σ(W_f x_t + U_f h_(t-1) + b_f)
i_t = σ(W_i x_t + U_i h_(t-1) + b_i)
o_t = σ(W_o x_t + U_o h_(t-1) + b_o)
ĉ_t = tanh(W_c x_t + U_c h_(t-1) + b_c)
f̃_t = cumax(W_f̃ x_t + U_f̃ h_(t-1) + b_f̃)
ĩ_t = 1 − cumax(W_ĩ x_t + U_ĩ h_(t-1) + b_ĩ)
ω_t = f̃_t ∘ ĩ_t
c_t = (f_t ∘ ω_t + f̃_t − ω_t) ∘ c_(t-1) + (i_t ∘ ω_t + ĩ_t − ω_t) ∘ ĉ_t
h_t = o_t ∘ tanh(c_t)   (4)

Compared with the traditional LSTM, the modification of the On-LSTM mainly lies in the added master forget gate f̃_t and master input gate ĩ_t, both built from the cumax operation cumax(·) = cumsum(softmax(·)), applied in the right and left cumulative-sum directions respectively.
The present invention designs the introduced On-LSTM as a bidirectional network. In the entity relation extraction task, acquiring only the unidirectional left-to-right context is not enough to support the extraction task; a right-to-left On-LSTM layer is needed to acquire the following context, so the encoding layer structure of the improved joint entity relation extraction model is Bi-OnLSTM. The forward On-LSTM computes the left context state h→_t of word x_t at time t (the final hidden state of the forward propagation layer), the backward On-LSTM computes the right context state h←_t (the final hidden state of the backward propagation layer), and the output of the encoding layer for word x_t at time t is the concatenation h_t = [h→_t; h←_t].
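A minimal sketch of a single On-LSTM cell step in PyTorch; the fused weight layout (six gate blocks in one projection) and the omission of the chunking trick are simplifying assumptions:

    import torch
    import torch.nn.functional as F

    def cumax(x):
        # cumulative softmax cumsum(softmax(x)): the ordering operation behind the master gates
        return torch.cumsum(F.softmax(x, dim=-1), dim=-1)

    def on_lstm_cell(x_t, h_prev, c_prev, W, U, b, hidden_size):
        # project input and previous hidden state onto all six gates at once
        gates = x_t @ W + h_prev @ U + b
        f, i, o, c_hat, mf, mi = gates.split(hidden_size, dim=-1)
        f, i, o = torch.sigmoid(f), torch.sigmoid(i), torch.sigmoid(o)
        c_hat = torch.tanh(c_hat)
        f_tilde = cumax(mf)      # master forget gate, monotonically rising from 0 to 1
        i_tilde = 1 - cumax(mi)  # master input gate, monotonically falling from 1 to 0
        w = f_tilde * i_tilde    # overlap region where both master gates are active
        c_t = (f * w + f_tilde - w) * c_prev + (i * w + i_tilde - w) * c_hat  # equation 4
        h_t = o * torch.tanh(c_t)
        return h_t, c_t

Running this cell left-to-right, running a second one right-to-left, and concatenating the two hidden states per token gives the Bi-OnLSTM encoder output h_t described above.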
Step three: decoding layer
Because the Bi-OnLSTM of the encoding layer captures all the hierarchical and sequence information, the invention performs joint entity relation extraction at the decoding layer and solves the entity relation overlap problem using the decoding idea of a pointer network.
Different from the previous approach of extracting the entities first and then judging the relation between them, the invention adopts an improved extraction mechanism. Fig. 6 is a schematic diagram of the pointer network. The task is divided into two stages: the first stage tags the possible candidate head entities in the sentence, and the second stage tags the tail entities and relations according to the semantic and positional features of each candidate head entity. This solves the overlap problem in which one head entity corresponds to several tail entities and relations; and because each head entity yields its triples according to its own semantic and positional features, the extraction of meaningless triples is avoided and redundant information is reduced.
Then, the conventional triple extraction formula becomes a conditional probability formula, as shown in formula 5.
p(s,p,o|Sen)=p(s|Sen)p(p,o|s,Sen) (5)
In the formula, Sen is the sentence representation, and s, p, o are the entity relation triple. First, we use the head entity tagger p(s|Sen) to identify the head entities in a sentence, and then, for each relation r, use the tail entity tagger p(p, o|s, Sen) to identify the tail entities having the corresponding relation with each head entity.
The joint entity relation extraction decoding layer extracts the abstract-feature triples through the following two modules.
1. Head entity extraction module
The encoded vectors h_i output by the Bi-OnLSTM encoding layer are decoded by the head entity extraction module of the decoding layer to identify all vectors that may be head entities. First, a head entity tagging layer is added on top of the encoder output, i.e. two classifier layers (tagging layers), a start layer and an end layer, recognize the start and end positions of the head entity. Concretely, each token of the sentence representation is tagged with a binary label (0, 1): a token carrying a "1" tag in the start layer marks a start position, a token carrying a "1" tag in the end layer marks an end position, and the tag is "0" otherwise. The head entity tagging layer computes the probability of a head entity existing in the sentence as shown in equation 6.

p_i^start = σ(W_start h_i^start + b_start)
p_i^end = σ(W_end h_i^end + b_end)   (6)

A Bi-OnLSTM layer is first added inside the start tagging layer, and the pre-decoding representation h_i is fed into it to obtain the further hidden state vector h_i^start of the head entity start position; p_i^start is the probability that token i is a head entity start position. Likewise, h_i^end is the hidden state vector of the head entity end position, p_i^end is the probability that token i is a head entity end position, and σ is the sigmoid activation function.
Then the maximum likelihood over all possible start and end positions of the head entity is computed on the input sentence token representation x (x_i = h_N[i]), giving the head entity span probability shown in equation 7.

p(s|Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (7)

where L is the length of the sentence in tokens, y_i^t is the binary tag of token i for layer t, and f{·} is an indicator: f{x} = 1 when x is 1 and f{x} = 0 when x is 0.
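A sketch of the head entity tagging layers as sigmoid start/end pointer classifiers; it omits the extra Bi-OnLSTM inside the start tagging layer, and the 0.5 decoding threshold is an assumption:

    import torch
    import torch.nn as nn

    class HeadEntityTagger(nn.Module):
        # start/end pointer layers over the encoder output (equation 6)
        def __init__(self, d_model: int):
            super().__init__()
            self.start_fc = nn.Linear(d_model, 1)
            self.end_fc = nn.Linear(d_model, 1)

        def forward(self, h):                                      # h: (batch, seq_len, d_model)
            p_start = torch.sigmoid(self.start_fc(h)).squeeze(-1)  # P(token i starts a head entity)
            p_end = torch.sigmoid(self.end_fc(h)).squeeze(-1)      # P(token i ends a head entity)
            return p_start, p_end

At decoding time, positions with probability above 0.5 are tagged 1, and each start position is paired with the nearest following end position to form a candidate head entity span.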
2. Tail entity and relation extraction module
The head entity vector representation v_sub^k output by the head entity tagging layer and the hidden state vector representation x_i = h_N[i] output by the encoding layer are fed into the tail entity and relation tagging layer. Similarly, the probability of a possible tail entity is shown in equation 8.

p_i^start_o = σ(W_start^r (x_i + v_sub^k) + b_start^r)
p_i^end_o = σ(W_end^r (x_i + v_sub^k) + b_end^r)   (8)

The tail entity maximum likelihood function is shown in equation 9.

p_r(o|s, Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (9)

Finally, the loss function is computed from the likelihoods of equations 7 and 9, as shown in equation 10.

loss = − ∑_s log p(s|Sen) − ∑_r log p_r(o|s, Sen)   (10)
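A sketch of the tail entity and relation tagging layer under the same assumptions; the head entity vector is fused by addition as in equation 8, and for the NYT corpus num_relations would be 24:

    import torch
    import torch.nn as nn

    class TailEntityRelationTagger(nn.Module):
        # per-relation start/end pointer layers conditioned on the head entity (equation 8)
        def __init__(self, d_model: int, num_relations: int):
            super().__init__()
            self.start_fc = nn.Linear(d_model, num_relations)
            self.end_fc = nn.Linear(d_model, num_relations)

        def forward(self, h, v_sub):                    # h: (B, L, d); v_sub: (B, d) head entity vector
            x = h + v_sub.unsqueeze(1)                  # add head entity features to every token
            p_start = torch.sigmoid(self.start_fc(x))   # (B, L, num_relations)
            p_end = torch.sigmoid(self.end_fc(x))
            return p_start, p_end

Training then reduces to binary cross-entropy on the start/end tags of the head tagger (equation 7) and of this tagger for every relation (equation 9), summed as in equation 10.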
Although illustrative embodiments of the present invention have been described above to facilitate understanding by those skilled in the art, it should be understood that the scope of the invention is not limited to these specific embodiments. Variations that are obvious to those skilled in the art and that utilize the concepts of the present invention are all intended to be protected.

Claims (4)

1. An entity relation extraction method based on ordered structure encoding and pointer network decoding, characterized in that the method aims to identify and extract the triples composed of the entities and relations in a sentence, and comprises the following steps:
step 1: select features at the input layer to construct the initial sentence vector, representing the sentence as a vector;
step 2: capture hierarchical structure information at the encoding layer to obtain the hidden embedding of each word of the sentence;
step 3: at the decoding layer, use a pointer network to further extract abstract features from the encoded features, and extract the sentence triples.
2. The entity relation extraction method based on ordered structure encoded pointer network decoding as claimed in claim 1, wherein constructing the initial sentence vector in step 1 specifically refers to: in the entity relation extraction task, word vectors with added adversarial-training negative examples are selected to represent the sentence;
step 1.1: training word vector
the input sentence sequence is denoted X = {x_1, x_2, …, x_i, …, x_n}, where x_i represents the i-th character in the sequence; context-dependent word vectors trained by the BERT pre-trained language model are used to map the input sentence into vector space, adopting a two-stage training scheme: first, language model pre-training; second, Fine-Tuning when applied to downstream tasks; the pre-trained BERT word vectors then represent x_i as e_i ∈ R^d, where d is the vector dimension;
the word vector matrix of the whole sentence is then as shown in equation 1;
E=[e1,e2,…,en] (1)
step 1.2: adversarial training
In order to improve the performance of the entity relationship extraction model, the invention adds countermeasure training on the word embedding layer, and generates a negative example of the original input information by adding some noises on the spliced word vector representation layer, as shown in fig. 2;
the input representation layer model comprises the word vectors and adversarial training, adding a small perturbation to the training data, as shown in equation 2;

η_adv = argmax_(η, ||η|| ≤ ε) loss(ω + η; θ̂)   (2)

i.e. the worst-case perturbation η_adv is added to the original embedding vector ω so as to maximize the loss function, where θ̂ is a copy of the current model parameters; then the original examples and the generated negative examples are trained jointly, so the final loss is as shown in equation 3.

loss_final = loss(ω; θ) + loss(ω + η_adv; θ)   (3)
3. The entity relation extraction method based on ordered structure encoded pointer network decoding as claimed in claim 2, wherein capturing the hierarchical structure information and sequence information at the encoding layer in step 2 specifically refers to:
for tasks in different fields, different combinations can be selected for the encoding and decoding layers; for example, image processing tasks usually use a convolutional neural network to form the encoding layer, while natural language processing tasks such as event element extraction usually select a recurrent neural network;
in Chinese text processing there is a concept of hierarchy: the character is the lowest level, followed by the word, then the sentence, the paragraph, and so on; the higher the level, the coarser the granularity and the larger the span of the information in the sentence; FIG. 4 is a schematic diagram of hierarchical granularity;
however, the neurons of conventional recurrent neural networks such as the LSTM are usually unordered, so they cannot learn and extract hierarchical structure information; the bidirectional ordered long short-term memory network (Bi-OnLSTM) is therefore selected as the basic structure of the encoding layer, so that high-level information is kept over a longer period while low-level information is forgotten over a shorter interval, and the differing information propagation spans form the hierarchical structure of the input sequence; the forward calculation of the On-LSTM is shown in equation 4, and FIG. 5 is a schematic structural diagram of the On-LSTM unit;
f_t = σ(W_f x_t + U_f h_(t-1) + b_f)
i_t = σ(W_i x_t + U_i h_(t-1) + b_i)
o_t = σ(W_o x_t + U_o h_(t-1) + b_o)
ĉ_t = tanh(W_c x_t + U_c h_(t-1) + b_c)
f̃_t = cumax(W_f̃ x_t + U_f̃ h_(t-1) + b_f̃)
ĩ_t = 1 − cumax(W_ĩ x_t + U_ĩ h_(t-1) + b_ĩ)
ω_t = f̃_t ∘ ĩ_t
c_t = (f_t ∘ ω_t + f̃_t − ω_t) ∘ c_(t-1) + (i_t ∘ ω_t + ĩ_t − ω_t) ∘ ĉ_t
h_t = o_t ∘ tanh(c_t)   (4)

wherein, compared with the traditional LSTM, the modification of the On-LSTM mainly lies in the added master forget gate f̃_t and master input gate ĩ_t, both built from the cumax operation cumax(·) = cumsum(softmax(·)), applied in the right and left cumulative-sum directions respectively;
the introduced On-LSTM is designed as a bidirectional network; in the entity relation extraction task, acquiring only the unidirectional left-to-right context is not enough to support the extraction task, and a right-to-left On-LSTM layer is needed to acquire the following context, so the encoding layer structure of the improved joint entity relation extraction model is Bi-OnLSTM; the forward On-LSTM computes the left context state h→_t of word x_t at time t (the final hidden state of the forward propagation layer), the backward On-LSTM computes the right context state h←_t (the final hidden state of the backward propagation layer), and the output of the encoding layer for word x_t at time t is the concatenation h_t = [h→_t; h←_t].
4. The entity relation extraction method based on ordered structure encoded pointer network decoding as claimed in claim 3, wherein further extracting abstract features from the encoded features with a pointer network at the decoding layer in step 3 specifically refers to:
because the Bi-OnLSTM of the encoding layer captures all the hierarchical and sequence information, joint entity relation extraction is performed at the decoding layer, and the entity relation overlap problem is solved using the decoding idea of a pointer network;
different from the previous approach of extracting the entities first and then judging the relation between them, an improved extraction mechanism is adopted; FIG. 6 is a schematic diagram of the pointer network; the task is divided into two stages: the first stage tags the possible candidate head entities in the sentence, and the second stage tags the tail entities and relations according to the semantic and positional features of each candidate head entity, solving the overlap problem in which one head entity corresponds to several tail entities and relations; because each head entity yields its triples according to its own semantic and positional features, the extraction of meaningless triples is avoided and redundant information is reduced;
Therefore, the conventional triple extraction formula becomes a conditional probability solving formula, as shown in formula 5;
p(s,p,o|Sen)=p(s|Sen)p(p,o|s,Sen) (5)
in the formula, Sen is the sentence representation and s, p, o are the entity relation triple; first, the head entity tagger p(s|Sen) is used to identify the head entities in the sentence, and then, for each relation r, the tail entity tagger p(p, o|s, Sen) is used to identify the tail entity corresponding to the head entity;
the abstract-feature triples are extracted by the joint entity relation extraction decoding layer, which consists of the following two modules;
step 3.1: head entity extraction
the encoded vectors h_i output by the Bi-OnLSTM encoding layer are sent to the head entity extraction module of the decoding layer for decoding, identifying all vectors that may be head entities; first, a head entity tagging layer is added on top of the encoder output, i.e. two classifier layers (tagging layers), a start layer and an end layer, identify the start and end positions of the head entity; concretely, each token of the sentence representation is tagged with a binary label (0, 1): a token carrying a "1" tag in the start layer marks a start position, a token carrying a "1" tag in the end layer marks an end position, and the tag is "0" otherwise; the head entity tagging layer computes the probability of a head entity existing in the sentence as shown in equation 6;

p_i^start = σ(W_start h_i^start + b_start)
p_i^end = σ(W_end h_i^end + b_end)   (6)

a Bi-OnLSTM layer is first added inside the start tagging layer, and the pre-decoding representation h_i is sent to it to obtain the further hidden state vector h_i^start of the head entity start position; p_i^start is the probability that token i is a head entity start position, h_i^end is the hidden state vector of the head entity end position, p_i^end is the probability that token i is a head entity end position, and σ is the activation function;
then the maximum likelihood over all possible start and end positions of the head entity is computed on the input sentence token representation x (x_i = h_N[i]), giving the head entity span probability shown in equation 7;

p(s|Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (7)

where L is the length of the sentence in tokens, f{x} = 1 when x is 1 and f{x} = 0 when x is 0;
step 3.2: tail entity and relation extraction
the head entity vector representation v_sub^k output by the head entity tagging layer and the hidden state vector representation x_i = h_N[i] output by the encoding layer are fed into the tail entity and relation tagging layer; similarly, the probability of a possible tail entity is shown in equation 8;

p_i^start_o = σ(W_start^r (x_i + v_sub^k) + b_start^r)
p_i^end_o = σ(W_end^r (x_i + v_sub^k) + b_end^r)   (8)

the tail entity maximum likelihood function is shown in equation 9;

p_r(o|s, Sen) = ∏_(t∈{start,end}) ∏_(i=1)^L (p_i^t)^(f{y_i^t = 1}) (1 − p_i^t)^(f{y_i^t = 0})   (9)

finally, the loss function is computed from the likelihoods of equations 7 and 9, as shown in equation 10.

loss = − ∑_s log p(s|Sen) − ∑_r log p_r(o|s, Sen)   (10)
CN202110338079.9A 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding Pending CN113553850A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110338079.9A CN113553850A (en) 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110338079.9A CN113553850A (en) 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding

Publications (1)

Publication Number Publication Date
CN113553850A 2021-10-26

Family

ID=78101730

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110338079.9A Pending CN113553850A (en) 2021-03-30 2021-03-30 Entity relation extraction method based on ordered structure encoding pointer network decoding

Country Status (1)

Country Link
CN (1) CN113553850A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20190370325A1 (en) * 2018-06-04 2019-12-05 Infosys Limited Extraction of tokens and relationship between tokens to form an entity relationship map
CN109408812A (en) * 2018-09-30 2019-03-01 北京工业大学 A method of the sequence labelling joint based on attention mechanism extracts entity relationship
CN111914091A (en) * 2019-05-07 2020-11-10 四川大学 Entity and relation combined extraction method based on reinforcement learning
CN111950297A (en) * 2020-08-26 2020-11-17 桂林电子科技大学 Abnormal event oriented relation extraction method
CN112183103A (en) * 2020-10-27 2021-01-05 杭州电子科技大学 Convolutional neural network entity relationship extraction method fusing different pre-training word vectors

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
QIANQIAN ZHANG et al.: "A Review on Entity Relation Extraction", 2017 Second International Conference on Mechanical, Control and Computer Engineering *
ZHANG Xinyi et al.: "An entity recognition and relation extraction model for coal mines", Journal of Computer Applications *

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113051929A (en) * 2021-03-23 2021-06-29 电子科技大学 Entity relationship extraction method based on fine-grained semantic information enhancement
CN113869049A (en) * 2021-12-03 2021-12-31 北京大学 Fact extraction method and device with legal attribute based on legal consultation problem
CN114298052A (en) * 2022-01-04 2022-04-08 中国人民解放军国防科技大学 Entity joint labeling relation extraction method and system based on probability graph
CN115169326A (en) * 2022-04-15 2022-10-11 山西长河科技股份有限公司 Chinese relation extraction method, device, terminal and storage medium
CN114691895A (en) * 2022-05-31 2022-07-01 南京航天数智科技有限公司 Criminal case entity relationship joint extraction method based on pointer network
CN116226408A (en) * 2023-03-27 2023-06-06 中国科学院空天信息创新研究院 Agricultural product growth environment knowledge graph construction method and device and storage medium
CN116226408B (en) * 2023-03-27 2023-12-19 中国科学院空天信息创新研究院 Agricultural product growth environment knowledge graph construction method and device and storage medium
CN117408247A (en) * 2023-12-15 2024-01-16 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network
CN117408247B (en) * 2023-12-15 2024-03-29 南京邮电大学 Intelligent manufacturing triplet extraction method based on relational pointer network

Similar Documents

Publication Publication Date Title
WO2021147726A1 (en) Information extraction method and apparatus, electronic device and storage medium
CN113553850A (en) Entity relation extraction method based on ordered structure encoding pointer network decoding
CN110717017B (en) Method for processing corpus
US20220050967A1 (en) Extracting definitions from documents utilizing definition-labeling-dependent machine learning background
CN114064918B (en) Multi-modal event knowledge graph construction method
CN113254610B (en) Multi-round conversation generation method for patent consultation
CN114020936B (en) Construction method and system of multi-modal affair map and readable storage medium
US12002276B2 (en) Document distinguishing based on page sequence learning
CN115034224A (en) News event detection method and system integrating representation of multiple text semantic structure diagrams
CN112100332A (en) Word embedding expression learning method and device and text recall method and device
KR102379660B1 (en) Method for utilizing deep learning based semantic role analysis
Arumugam et al. Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
Perez-Martin et al. A comprehensive review of the video-to-text problem
CN111881292A (en) Text classification method and device
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN113312912A (en) Machine reading understanding method for traffic infrastructure detection text
CN114880307A (en) Structured modeling method for knowledge in open education field
Cao et al. Visual question answering research on multi-layer attention mechanism based on image target features
CN117172253A (en) Label information guiding-based social media multi-modal named entity recognition method
CN116628207A (en) Training method and device for text classification model, electronic equipment and storage medium
CN114519353B (en) Model training method, emotion message generation method and device, equipment and medium
CN115964497A (en) Event extraction method integrating attention mechanism and convolutional neural network
CN116186241A (en) Event element extraction method and device based on semantic analysis and prompt learning, electronic equipment and storage medium
CN114911940A (en) Text emotion recognition method and device, electronic equipment and storage medium
Zhang Exploration of Cross‐Modal Text Generation Methods in Smart Justice

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20211026)