CN113761893B - Relation extraction method based on mode pre-training - Google Patents

Relation extraction method based on mode pre-training

Info

Publication number
CN113761893B
Authority
CN
China
Prior art keywords
training
entity
text
model
dependency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202111331381.8A
Other languages
Chinese (zh)
Other versions
CN113761893A (en)
Inventor
杜熙源
高成睿
岳元浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Aerospace Technology & Innovation Industrial Co ltd
Original Assignee
Shenzhen Aerospace Technology & Innovation Industrial Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Aerospace Technology & Innovation Industrial Co ltd
Priority to CN202111331381.8A priority Critical patent/CN113761893B/en
Publication of CN113761893A publication Critical patent/CN113761893A/en
Application granted granted Critical
Publication of CN113761893B publication Critical patent/CN113761893B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G06F40/30 Semantic analysis
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/047 Probabilistic or stochastic networks
    • G06N3/048 Activation functions
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • Computational Linguistics (AREA)
  • Molecular Biology (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computing Systems (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The scheme discloses a relation extraction method based on pattern pre-training, aimed mainly at the problems of entity boundary determination and imbalanced relation distribution in joint entity-relation extraction. The scheme constructs a pattern for pre-training the model, which follows an encoder-decoder architecture, and pre-trains the encoding part and the decoding part separately. The encoding part is pre-trained on entity-related data so that it learns to predict entity boundaries. The decoding part is pre-trained on relation-related data: a dependency tree is built by syntactic dependency analysis, and its adjacency matrix is used as the mask of an attention mechanism to shield part of the information. Finally, the parameters of the whole framework are updated by supervised learning, yielding a relation extraction model with strong adaptability and expressive power.

Description

Relation extraction method based on mode pre-training
Technical Field
The invention relates to a relation extraction method based on mode pre-training, and belongs to the technical field of information extraction.
Background
With the rapid development and popularization of computers and the internet, the amount of data created by humans is growing at high speed. In this age of information explosion, how to analyze and process information rapidly and extract valuable information from text has become a research hotspot and an urgent problem to be solved. In response to these challenges, it is imperative to develop automated information processing tools that quickly extract valuable knowledge from vast amounts of information. In this context, Information Extraction (IE) technology has become a hot topic of academic and industrial research. The purpose of information extraction is to extract specific, valuable information from semi-structured and unstructured text and from structured data, and to process it into structured data that can easily be stored and computed by machines; information extraction includes entity recognition (Entity Recognition), relation extraction (Relation Extraction) and event extraction (Event Extraction).
As one of the important tasks of information extraction, relation extraction aims to extract the semantic relation between two entities in a text, and has wide application value in fields such as mass data processing, automatic construction of knowledge bases, and automatic question answering. For example, large-scale knowledge base systems constructed by automatically processing Web text with relation extraction technology can be applied to search engine optimization: the Knowledge Vault built by Google, with more than 1.6 billion facts, is used to improve search results and the user experience. Meanwhile, relation extraction technology provides infrastructure for other tasks in the natural language field, such as entity linking, automatic summarization and sentiment analysis.
The idea of using pre-training to improve model performance has been widely applied in deep learning. Natural language processing researchers have built on this work and proposed pre-trained language models, the best known of which are BERT, GPT and ELMo. According to the problem addressed, the aims of a pre-training model fall into two categories: one oriented to language encoding and one oriented to the model's vector space. The language-encoding direction mainly solves the problem of how to encode a piece of text, for example a deep convolutional neural network that encodes text using pre-trained word vector representations. The model-vector-space direction mainly addresses the long training time and high cost of deep learning by adjusting the model parameters toward an easier learning problem: some information is introduced so that the parameters start from a position from which they converge more easily to an optimal solution, including manually annotated information or additional information carried by an entity, such as the entity category. Notably, the two types of approach are not mutually exclusive and can be used together.
Disclosure of Invention
Aiming at the technical problems in the prior art, the invention provides an entity-relationship joint extraction method based on mode pre-training, which comprises the steps of constructing a pre-training mode by introducing a pre-training structure, training the capability of a model for positioning an entity by using an entity marking task, and extracting the relationship between two entities by an attention mechanism.
The scheme discloses an entity-relation joint extraction method based on mode pre-training. The scheme provides a supervised learning method that mines <entity h, relation r, entity t> triples from text. For each text segment, the model is first pre-trained through a pre-training network whose tasks comprise an encoding part and a decoding part: the encoding part enhances the ability to locate entities through entity data, and the decoding part enhances the weights of entities in relation extraction through syntactic dependency analysis. Pre-training is carried out with these steps, the pre-training structure is then removed, and formal data are used for training, yielding the probability distribution of the relations between entity pairs in the text.
In order to achieve the purpose, the technical scheme of the invention is as follows: a relation extraction method based on mode pre-training comprises the following specific steps:
step 1) pre-training in an encoding stage, constructing a pre-training network for acquiring information of an entity position, and representing related information by labeling a text, which mainly comprises the following steps:
a) marking entity position information, namely the head, the interior and the tail of the entity, and identifying various entities to provide information for the model;
b) the goal of pre-training the encoding-part model is to obtain the position information of entities; it is specified that the head character of an entity represents the information of the whole entity, a BERT pre-trained language model is used to encode the text, and pre-training is carried out on this encoding so that the model captures the text information;
c) the encoding part is pre-trained; specifically, the pre-training of the encoding part is multi-task training of word prediction and boundary prediction: for a piece of text, given a character, the model predicts the head and tail positions of the entity in which the character is located, and predicts the content of the whole entity (if the character is the head of the entity). In actual operation, a neural network is added on top of the BERT encoding result, probability mapping is carried out, and the BIO labels corresponding to the sequence are predicted.
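By way of illustration only, the following Python sketch shows one possible form of the boundary-prediction head described in c): a BERT encoder with an added neural network that maps each token to a probability distribution over BIO labels. The class name, model checkpoint and tag inventory are assumptions made for the sketch and are not part of the disclosure.

# Hypothetical sketch of the BIO boundary-prediction head of step 1) c.
import torch
import torch.nn as nn
from transformers import BertModel, BertTokenizerFast

NUM_BIO_TAGS = 3  # B, I, O

class BoundaryTagger(nn.Module):
    def __init__(self, bert_name: str = "bert-base-chinese"):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)
        # "a neural network is added on top of the BERT encoding result"
        self.classifier = nn.Linear(self.bert.config.hidden_size, NUM_BIO_TAGS)

    def forward(self, input_ids, attention_mask):
        hidden = self.bert(input_ids=input_ids,
                           attention_mask=attention_mask).last_hidden_state
        logits = self.classifier(hidden)           # (batch, seq_len, 3)
        return torch.log_softmax(logits, dim=-1)   # probability mapping over BIO tags

tokenizer = BertTokenizerFast.from_pretrained("bert-base-chinese")
batch = tokenizer(["深圳航天科创公司位于深圳。"], return_tensors="pt")
model = BoundaryTagger()
log_probs = model(batch["input_ids"], batch["attention_mask"])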
Step 2) pre-training in a decoding stage, enhancing information of an entity through a mask matrix of an attention mechanism, obtaining pre-trained data by using a syntax dependency tree, and generating the mask matrix, wherein the method mainly comprises the following steps:
a) the SDP (shortest dependency path) of the entity-related text: by performing word segmentation, part-of-speech tagging and dependency syntax analysis on the text containing the target entities, the shortest dependency path between the entities is found; since the shortest dependency path contains all the semantic information between the entities in the text, syntactic dependency analysis is performed on a large corpus, and the constructed dependency trees are sampled to obtain subject-predicate and verb-object type relations that serve as the training data of the decoding part;
b) for an original text, a syntactic dependency tree is constructed with a syntactic dependency analysis tool to obtain the corresponding adjacency matrix, and this matrix is used as the mask matrix of the decoding-part attention mechanism (a minimal sketch of this construction is given after this list);
c) a mask mechanism in the attention is applied to mask vectors; in the syntactic dependency tree a node is regarded as an entity, which highlights the importance of entities in the overall task and enhances the model's ability to detect entities, and the two vectors are weighted and summed to obtain the vector used for decoding;
d) the prediction result of the decoding part is represented by a label representation method in the form of a matrix, the elements represented by the rows and the columns of the matrix are the same and are all entities corresponding to the input text, and the relationship between the two entities is represented by the values of the matrix, obviously, the matrix is a symmetrical matrix;
e) a prediction structure for the subject-predicate-object triple is added after the transformer part to complete training.
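By way of illustration of b) above, the following Python sketch builds the adjacency matrix of a dependency tree for use as the attention mask of the decoding part. The function name and the head-index convention (heads given as 1-based indices, 0 for the root) are assumptions; the dependency parser itself is not shown.

# Hypothetical sketch: dependency parse -> adjacency matrix used as attention mask.
import numpy as np

def dependency_mask(heads):
    """heads[i] is the 1-based index of token i's syntactic head, 0 for the root."""
    n = len(heads)
    adj = np.zeros((n, n), dtype=np.float32)
    for i, h in enumerate(heads):
        adj[i, i] = 1.0                  # keep each token's own information
        if h > 0:                        # connect dependent and head in both directions
            adj[i, h - 1] = 1.0
            adj[h - 1, i] = 1.0
    # positions that remain 0 will be shielded in the attention scores
    return adj

# e.g. a 4-token sentence whose dependency heads are [2, 0, 4, 2]
mask = dependency_mask([2, 0, 4, 2])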
Step 3) training the model by using formal data:
in formal training, the network in the pre-training stage is removed, the network is trained in a mini-batch gradient descending mode, and parameters are updated. The method comprises the steps of abandoning a full connection layer added in a pre-training process, directly using a BERT model which is pre-trained to encode a text, inputting the text to a transform part of a decoding layer, modeling a triple into a mapping from a subject to a predicate, and normalizing an output node value through a softmax function to obtain the probability distribution of the text-to-relation.
Wherein, in step 1) a) additional information needs to be obtained in different ways.
1-1) First, the original text is segmented into words with a related tool, and entity annotation data, comprising the category and the position of each entity, is obtained with manual refinement. Word segmentation, part-of-speech tagging and dependency syntactic analysis are performed on the text containing the entity pairs. In the dependency syntax tree generated by the dependency analysis, the shortest connection between the two entities is found; the words and edges on this shortest connected path form the shortest dependency path of the entity pair in the text. In this way, the subject-predicate relations are constructed.
1-2) Pre-train a word vector model. Word vectors for the data set are trained in advance using BERT or GPT approaches; if they are not, the method of this scheme trains them synchronously with the model parameters. In practice, however, word vectors pre-trained on large-scale corpora hold more semantic and grammatical information.
Training a pre-training language model in a coding stage by using the constructed entity data in the step 1):
in the boundary prediction task, mapping the pre-training language model, and performing the following steps by a Bi-LSTM sublayer:
Figure 387173DEST_PATH_IMAGE001
and passing the obtained mapping result through a CRF layer:
Figure 773155DEST_PATH_IMAGE002
wherein
Figure 21734DEST_PATH_IMAGE003
In order to normalize the factors, the method comprises the steps of,the calculation formula is as follows:
Figure 885784DEST_PATH_IMAGE004
in the word prediction task, 15% of tokens in an entity are masked (Mask), and are directly replaced by Mask with 80% probability, and replaced by any other word with 10% probability, and the original Token is retained with 10% probability.
In step 1) c), a series of vectors for the text is obtained through step 1) b), represented as a feature matrix X. Here X is the feature-vector representation of all input nodes, with dimensions N x D, where N is the number of nodes and D is the dimension of each node's vector representation.
Dependency analysis is performed on the original text with a syntactic dependency tool to construct a syntactic dependency tree, and the adjacency matrix M of the text, with dimension N x N, is obtained. The operation is performed through a multi-head attention mechanism: the mask mechanism in the transformer shields the entities that are not 1 in the matrix, highlights the information of related entities, and pre-trains the transformer layer, specifically as follows:
head_i = softmax( (Q · K^T) / √d_k + M ) · V

where M is the mask matrix (its addition to the attention scores is the mask operator in the attention), Q, K and V are the three vectors of the attention mechanism, d_k is the dimension of the vector K, and i denotes the i-th attention head.
In step 2) a), the output of step 1) is an N x F matrix Z, represented as a sequence Z = {z_0, z_1, ..., z_N}, each node being a vector of dimension F. The vector representation of each node is then computed by weighted averaging to obtain the final vector representation. The results of each calculation are combined through a feed-forward neural network sublayer:

FFN(z) = max(0, z · W_1 + b_1) · W_2 + b_2

The final vector is expressed as:

h = α · FFN(z) + (1 - α) · h_enc

where h_enc is the original result output by the encoding layer.
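A minimal single-head Python sketch of the masked attention, feed-forward sublayer and weighted combination described above is given below; the dimensions, the value of α and the module names are assumptions, not the exact implementation of the scheme.

# Minimal single-head sketch of masked attention + FFN + weighted combination.
import torch
import torch.nn as nn

class MaskedAttentionBlock(nn.Module):
    def __init__(self, dim=768, hidden=2048, alpha=0.5):
        super().__init__()
        self.q = nn.Linear(dim, dim)
        self.k = nn.Linear(dim, dim)
        self.v = nn.Linear(dim, dim)
        self.ffn = nn.Sequential(nn.Linear(dim, hidden), nn.ReLU(), nn.Linear(hidden, dim))
        self.alpha = alpha            # weight of the attention branch, in (0, 1)

    def forward(self, x, adj):
        # x: (n, dim) node vectors from the encoder; adj: (n, n) dependency adjacency matrix
        q, k, v = self.q(x), self.k(x), self.v(x)
        scores = q @ k.transpose(0, 1) / (x.size(-1) ** 0.5)
        scores = scores.masked_fill(adj == 0, float("-inf"))   # shield non-adjacent entries
        z = torch.softmax(scores, dim=-1) @ v
        # weighted sum keeps the original encoding while adding dependency information
        return self.alpha * self.ffn(z) + (1.0 - self.alpha) * x

block = MaskedAttentionBlock()
out = block(torch.randn(6, 768), torch.eye(6))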
It is specified here that a relation is modeled as a function whose argument is the subject and whose result is the object. The probabilities that the i-th character is the start position and the end position of a subject are:

p_i^{sub_start} = σ(W_start · x_i + b_start),   p_i^{sub_end} = σ(W_end · x_i + b_end)

where σ is the sigmoid function, W_start, W_end, b_start and b_end are trainable parameters, and x_i is the word vector at the i-th position.

The probabilities for the positions of the object are:

p_i^{obj_start} = σ(W^r_start · (x_i + v^k_sub) + b^r_start),   p_i^{obj_end} = σ(W^r_end · (x_i + v^k_sub) + b^r_end)

where v^k_sub is the vector representation of the k-th subject, σ is the sigmoid function, and W^r_start, W^r_end, b^r_start and b^r_end are trainable parameters.
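By way of illustration, the following Python sketch computes such sigmoid scores for subject start/end positions and, given a subject vector, for object start/end positions; the dimensions and the class name are assumptions.

# Hypothetical sketch of the subject/object position scoring described above.
import torch
import torch.nn as nn

class SpanScorer(nn.Module):
    def __init__(self, dim=768):
        super().__init__()
        self.sub_start = nn.Linear(dim, 1)
        self.sub_end = nn.Linear(dim, 1)
        self.obj_start = nn.Linear(dim, 1)
        self.obj_end = nn.Linear(dim, 1)

    def subject_scores(self, h):                     # h: (seq_len, dim)
        return torch.sigmoid(self.sub_start(h)), torch.sigmoid(self.sub_end(h))

    def object_scores(self, h, subject_vec):         # subject_vec: (dim,)
        shifted = h + subject_vec                     # add the k-th subject representation
        return torch.sigmoid(self.obj_start(shifted)), torch.sigmoid(self.obj_end(shifted))

scorer = SpanScorer()
h = torch.randn(20, 768)
s_start, s_end = scorer.subject_scores(h)
o_start, o_end = scorer.object_scores(h, subject_vec=h[3:6].mean(dim=0))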
Step 2) is trained with the data generated in step 1); the specific loss functions are as follows.

Encoding part:

L_enc = -log p(Y | X)

where X is the corresponding text sequence and Y is the corresponding labeled sequence.

Decoding part, whose training function is represented as:

J(Θ) = Σ_{(x,T)∈D} [ Σ_{s∈T} log p_θ(s | x) + Σ_{(r,o)∈T|s} log p_φr(o | s, x) + Σ_{r∈R∖T|s} log p_φr(o_∅ | s, x) ]

where p_θ(s | x), the likelihood function for labeling the subject part, is specifically defined as:

p_θ(s | x) = Π_{t ∈ {sub_start, sub_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, and t is the parameter ranging over {sub_start, sub_end};

p_φr(o | s, x), the likelihood function for labeling the object part, is specifically defined as:

p_φr(o | s, x) = Π_{t ∈ {obj_start, obj_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, t is the parameter ranging over {obj_start, obj_end}, and o_∅ denotes the null object of a subject with no corresponding object, whose tags are always 0.

The overall loss function can be expressed as the sum of these terms over the training set D; training of the model is completed by maximizing this function with respect to the weights.
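As an illustration of how such a likelihood can be computed in practice, the following Python sketch evaluates the log-likelihood of the 0/1 start and end tag sequences as a negated binary cross-entropy; the function and variable names are assumptions.

# Hypothetical sketch of the span log-likelihood used in the decoding-part objective.
import torch
import torch.nn.functional as F

def span_log_likelihood(pred_start, pred_end, gold_start, gold_end):
    """pred_*: (seq_len,) sigmoid scores in (0, 1); gold_*: (seq_len,) 0/1 tags."""
    ll_start = -F.binary_cross_entropy(pred_start, gold_start, reduction="sum")
    ll_end = -F.binary_cross_entropy(pred_end, gold_end, reduction="sum")
    return ll_start + ll_end          # sum of log p over t in {start, end} and i in 1..L

seq_len = 20
pred_s, pred_e = torch.rand(seq_len), torch.rand(seq_len)
gold_s, gold_e = torch.zeros(seq_len), torch.zeros(seq_len)
gold_s[3], gold_e[5] = 1.0, 1.0
loss = -span_log_likelihood(pred_s, pred_e, gold_s, gold_e)   # maximizing likelihood = minimizing loss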
And 3) completing the pre-training of the model through the previous steps, removing the pre-trained network, and training by using formal data:
and removing word prediction codes of the pre-training part, and training by using formal data by using the network after the pre-training is finished. By modeling the code from the decoding layer as a function with the parameters subject and the result object as defined above, the probability that it is the starting and ending position of the subject for the ith character is:
Figure 732825DEST_PATH_IMAGE047
wherein
Figure 844918DEST_PATH_IMAGE048
Is a function of the Sigmod and,
Figure 897188DEST_PATH_IMAGE049
are trainable parameters.
The probability of the position of the object is:
Figure 615745DEST_PATH_IMAGE050
wherein
Figure 69860DEST_PATH_IMAGE051
Is a vector representation of the kth subject,
Figure 582881DEST_PATH_IMAGE052
is a function of the Sigmod and,
Figure 122447DEST_PATH_IMAGE053
are trainable parameters.
The specific loss functions are as follows.

Encoding part:

L_enc = -log p(Y | X)

where X is the corresponding text sequence and Y is the corresponding labeled sequence.

Decoding part:

J(Θ) = Σ_{(x,T)∈D} [ Σ_{s∈T} log p_θ(s | x) + Σ_{(r,o)∈T|s} log p_φr(o | s, x) + Σ_{r∈R∖T|s} log p_φr(o_∅ | s, x) ]

where p_θ(s | x), the likelihood function for labeling the subject part, is specifically defined as:

p_θ(s | x) = Π_{t ∈ {sub_start, sub_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, and t is the parameter ranging over {sub_start, sub_end};

p_φr(o | s, x), the likelihood function for labeling the object part, is specifically defined as:

p_φr(o | s, x) = Π_{t ∈ {obj_start, obj_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, t is the parameter ranging over {obj_start, obj_end}, and o_∅ denotes the null object of a subject with no corresponding object, whose tags are always 0.

The overall loss function can be expressed as the sum of these terms over the training set D; training of the model is completed by maximizing this function with respect to the weights.
Compared with the prior art, the invention has the following advantages:
1) the method introduces additional corpora containing entity category, entity representation, entity context and text path information, and can improve the coding capacity of the model compared with the traditional representation-based method and rule-based method.
2) Compared with other methods of introducing additional information, the method of this scheme introduces the additional information into the pre-trained language model by pre-training at the encoding layer. The method has good adaptability: it can be applied to this task, and the pre-trained model can also be applied to other problems. Furthermore, multiple kinds of extra information act simultaneously, which avoids the insufficient coverage of any single kind of information.
3) The method customizes a reasonable initialization method for all extra information, uses a scheme based on word segmentation and syntactic dependency analysis on entity and relation embedding, adopts a method based on attention mechanism on relation extraction, and can fully utilize semantic information so as to achieve better effect.
4) The method uses the SDP (shortest dependency path) of the text to introduce an attention mechanism: the weighted-sum combination retains the original semantics of the text, adds the dependency-syntactic features, and highlights the importance of the entities in the text encoding, which cannot be achieved by traditional text encoding.
5) The scheme compresses and screens information by using an attention mechanism on a graph, and the method has the characteristics of high efficiency and strong expression capability. The attention mechanism can effectively mine the association between the entities, can integrate high-order logic expression of an inference chain into the feature representation, and well utilizes the summary experience of human beings.
6) The scheme introduces an attention mechanism and syntactic dependency analysis at the decoding layer for further pre-training and feature extraction. The various information introduced in the encoding stage may carry a lot of noise, such as entity-category identification errors or irrelevant text brought in by a text path, which contributes little to extracting the relevant entities. Therefore the relation data representation generated by syntactic dependency analysis is used as supervision information through the attention mechanism: relevance is computed for each character, character features with low relevance are shielded by the attention mask, the nodes closer to entities are retained, and a weighted summation with the original encoding result is taken, so that after weighting the features are further compressed and the noise is filtered.
7) In the scheme, different entity pairs and different additional information have different contribution degrees to the relation extraction task, and the information is filtered by using a mask mechanism of an attention mechanism. The method has the advantages that semantic deviation caused by partial information is avoided, and the model automatically selects the most favorable information of the current entity to carry out relation classification.
Drawings
FIG. 1 is a schematic overall framework diagram of the present solution;
FIG. 2 is a schematic diagram of the encoding part of the pre-training in the present scheme;
fig. 3 is a schematic diagram of formal training in the present scheme.
Detailed Description
The following detailed description of the embodiments of the invention is provided in connection with the accompanying drawings.
Example 1: referring to figs. 1-3, the present invention is a remotely supervised relation extraction method based on mode pre-training, comprising the following steps:
step 1) pre-training in an encoding stage, constructing a pre-training network for acquiring information of an entity position, and representing related information by labeling a text, which mainly comprises the following steps:
1-1) First, the original text is segmented into words with a related tool, and entity annotation data, comprising the category and the position of each entity, is obtained with manual refinement. Word segmentation, part-of-speech tagging and dependency syntactic analysis are performed on the text containing the entity pairs. In the dependency syntax tree generated by the dependency analysis, the shortest connection between the two entities is found; the words and edges on this shortest connected path form the shortest dependency path of the entity pair in the text (see the sketch after 1-2) below). In this way, the subject-predicate relations are constructed.
1-2) Pre-train a word vector model. Word vectors for the data set are trained in advance using BERT or GPT approaches; if they are not, the method of this scheme trains them synchronously with the model parameters. In practice, however, word vectors pre-trained on large-scale corpora hold more semantic and grammatical information.
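By way of illustration of the shortest-dependency-path construction in 1-1) above, the following Python sketch builds an undirected graph over the tokens of a parsed sentence and takes the shortest path between the two entity positions; the function name, the head-index convention and the example parse are assumptions.

# Hypothetical sketch of extracting the shortest dependency path (SDP).
import networkx as nx

def shortest_dependency_path(heads, entity_a, entity_b):
    """heads[i] is the 1-based head index of token i (0 = root); entity_* are token indices."""
    g = nx.Graph()
    g.add_nodes_from(range(len(heads)))
    for i, h in enumerate(heads):
        if h > 0:
            g.add_edge(i, h - 1)                     # dependency arc, direction ignored
    return nx.shortest_path(g, source=entity_a, target=entity_b)

# tokens: 0 "小明", 1 "毕业", 2 "于", 3 "清华大学"; heads (1-based): [2, 0, 2, 2] (assumed parse)
sdp = shortest_dependency_path([2, 0, 2, 2], entity_a=0, entity_b=3)   # -> [0, 1, 3]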
2-1) In the boundary prediction task, the output of the pre-trained language model is mapped through a BiLSTM:

h_i = BiLSTM(x_i)

where x_i is the vector encoding obtained from the BERT pre-trained language model.

Sequence annotation is then predicted using a CRF (conditional random field):

y = CRF(h)

which gives the result of the sequence annotation.
2-2) In the word prediction task, 15% of the entity tokens in the input text are masked: with 80% probability the token is directly replaced by [MASK], with 10% probability it is replaced by any other word, and with 10% probability the original token is retained. A corpus is repeatedly provided to the model in this way for training.
Step 2) pre-training in a decoding stage, enhancing information of an entity through a mask matrix of an attention mechanism, obtaining pre-trained data by using a syntax dependency tree, and generating the mask matrix, wherein the method mainly comprises the following steps:
a series of vectors for the text is obtained by step 1), represented as a feature matrix X. Wherein, X is the feature vector representation of all the input nodes, the dimension is N X D, N is the number of the nodes, and D is the dimension of the vector representation of each node.
Dependency analysis is performed on the original text with a syntactic dependency tool to construct a syntactic dependency tree, and the adjacency matrix M of the text, with dimension N x N, is obtained. The operation is performed through a multi-head attention mechanism: the mask mechanism in the transformer shields the entities that are not 1 in the matrix, highlights the information of related entities, and pre-trains the transformer layer, specifically as follows:
head_i = softmax( (Q · K^T) / √d_k + M ) · V

where M is the mask matrix, Q, K and V are the three vectors of the attention mechanism, d_k is the dimension of the vector K, and i denotes the i-th attention head.
The output of the above steps is an N x F matrix Z, denoted as a sequence of nodes Z = {z_0, z_1, ..., z_N}, each node being a vector of dimension F. The vector representation of each node is then computed by weighted averaging to obtain the final vector representation. The results of each calculation are combined through a feed-forward neural network sublayer:

FFN(z) = max(0, z · W_1 + b_1) · W_2 + b_2

where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward neural network.

The final vector is expressed as:

h = α · FFN(z) + (1 - α) · h_enc

where α is a real number in (0, 1) and h_enc is the original result output by the encoding layer.
A relation is modeled as a function whose argument is the subject and whose result is the object; the probabilities that the i-th character is the start and end positions of a subject are:

p_i^{sub_start} = σ(W_start · x_i + b_start),   p_i^{sub_end} = σ(W_end · x_i + b_end)

where σ is the sigmoid function, W_start, W_end, b_start and b_end are trainable parameters, and x_i is the word vector at the i-th position.

The probabilities for the positions of the object are:

p_i^{obj_start} = σ(W^r_start · (x_i + v^k_sub) + b^r_start),   p_i^{obj_end} = σ(W^r_end · (x_i + v^k_sub) + b^r_end)

where v^k_sub is the vector representation of the k-th subject, σ is the sigmoid function, and W^r_start, W^r_end, b^r_start and b^r_end are trainable parameters.
The specific loss functions are as follows.

Encoding part:

L_enc = -log p(Y | X)

The training function in the decoding part is represented as:

J(Θ) = Σ_{(x,T)∈D} [ Σ_{s∈T} log p_θ(s | x) + Σ_{(r,o)∈T|s} log p_φr(o | s, x) + Σ_{r∈R∖T|s} log p_φr(o_∅ | s, x) ]

where p_θ(s | x), the likelihood function for labeling the subject part, is specifically defined as:

p_θ(s | x) = Π_{t ∈ {sub_start, sub_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, and t is the parameter ranging over {sub_start, sub_end};

p_φr(o | s, x), the likelihood function for labeling the object part, is specifically defined as:

p_φr(o | s, x) = Π_{t ∈ {obj_start, obj_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, t is the parameter ranging over {obj_start, obj_end}, and o_∅ denotes the null object of a subject with no corresponding object, whose tags are always 0.

The overall loss function can be expressed as the sum of these terms over the training set D; training of the model is completed by maximizing this function with respect to the weights.
Step 3) training the model by using formal data;
in formal training, the full-link layer added in the pre-training process is abandoned. Directly coding a text by using a pre-trained BERT model, inputting the text into a transform part of a decoding layer, then mapping the text to an output layer through a full connection layer, and then normalizing an output node value through a softmax function to obtain probability distribution of a text pair relation; through the previous steps, the model is pre-trained, the pre-trained network is removed, and formal data are used for training:
and removing word prediction codes of the pre-training part, and training by using formal data by using the network after the pre-training is finished. The output result is obtained by inputting the code obtained from the decoding layer into a feedforward neural network and predicting the probability by an activation function.
During the training phase, the loss function is as follows:

J(Θ) = Σ_{(x,T)∈D} [ Σ_{s∈T} log p_θ(s | x) + Σ_{(r,o)∈T|s} log p_φr(o | s, x) + Σ_{r∈R∖T|s} log p_φr(o_∅ | s, x) ]

with the specific parameters defined as in step 2).
The task of the training phase is to formally train the model, so the pre-training of the encoding part is not involved here.
It should be noted that the above-mentioned embodiments are only preferred embodiments of the present invention, and are not intended to limit the scope of the present invention, and all equivalent substitutions or substitutions made on the basis of the above-mentioned technical solutions belong to the scope of the present invention.

Claims (8)

1. A relation extraction method based on mode pre-training comprises the following steps:
step 1) pre-training in an encoding stage, constructing a pre-training network for acquiring information of an entity position, and representing related information by labeling a text, which mainly comprises the following steps:
marking entity position information, namely the head, the interior and the tail of the entity, and identifying various entities to provide information for the model;
the expectation of pre-training the coding part model is to obtain the position information of an entity, the head character of the entity represents the information of the whole entity, a BERT pre-training language model is used for coding a text, and the model is pre-trained on the basis of the coding mechanism so as to enable the model to obtain the text information;
pre-training the encoding part, wherein the pre-training of the encoding part is multi-task training, namely word prediction and boundary prediction: for a piece of text, given a character, the BERT encoding result is fed to an added neural network, probability mapping is carried out, and the BIO labels corresponding to the sequence are predicted;
step 2) pre-training in a decoding stage, enhancing information of an entity through a mask matrix of an attention mechanism, obtaining pre-trained data by using a syntax dependency tree, and generating the mask matrix, wherein the method mainly comprises the following steps:
the SDP (shortest dependency path) of the entity-related text is the shortest dependency path between the entities, found by performing word segmentation, part-of-speech tagging and dependency syntax analysis on the text containing the target entities; since the shortest dependency path contains all the semantic information between the entities in the text, syntactic dependency analysis is performed on a large corpus, and the constructed dependency trees are sampled to obtain subject-predicate and verb-object type relations to be used as the training data of the decoding part;
for an original text, a syntactic dependency tree is constructed with a syntactic dependency analysis tool to obtain the corresponding adjacency matrix, and this matrix is used as the mask matrix of the decoding-part attention mechanism;
a mask mechanism in the attention is applied to mask vectors; in the syntactic dependency tree a node is regarded as an entity, which highlights the importance of entities in the overall task and enhances the model's ability to detect entities, and the two vectors are weighted and summed to obtain the vector used for decoding;
the prediction result of the decoding part is represented by a label representation method in the form of a matrix, the elements represented by the rows and the columns of the matrix are the same and are all entities corresponding to the input text, and the relationship between the two entities is represented by the values of the matrix, obviously, the matrix is a symmetrical matrix;
adding a prediction structure for the subject-predicate-object triple after the transformer part to complete training;
and 3) training the model with formal data: in formal training, the fully connected layer added during pre-training is discarded, the text is directly encoded by the pre-trained BERT model and input to the transformer part of the decoding layer, the triple is then modeled as a mapping from subject to predicate, and the output node values are normalized by a softmax function to obtain the probability distribution over relations for the text.
2. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step of obtaining the sequence labeling information of the text in step 1) is as follows:
1-1) firstly, segmenting a section of original text by using a related tool to acquire entity position information contained in the original text, specifically, using a character "B" to represent the head of an entity, using a character "I" to represent the middle part of the entity, and using a character "O" to represent that the character does not belong to any entity;
1-2) pre-training a word vector model, acquiring the pre-trained model in advance by using a BERT or GPT mode,
the step of obtaining the sequence labeling information of the text in the step 2) is as follows:
2-1) performing word segmentation and dependency syntax analysis on a text containing entity pairs, finding the shortest connection path between two entities in a dependency syntax tree generated by the dependency syntax analysis, wherein words and edges on the shortest connection path are used as the shortest dependency paths of the entity pairs in the text;
2-2) the dependency syntax tree contains multiple types of edges, and one or more types of paths are selected.
3. The pattern pre-training based relationship extraction method of claim 1, wherein step 1) pre-trains the BERT pre-trained language model as follows:
3-1) pre-training a BERT model to obtain the vectorized representation of the text; the vectorized representation of each entity, including position information and semantic information, is obtained through BERT; BERT is a transformer-based pre-trained language model whose input vector is composed of three parts, namely token embedding, segment embedding and position embedding, and the vectorized representation containing semantic information is obtained through BERT;
3-2) performing further pre-training on the existing BERT model, where the task is word prediction with a Masked Language Model: given a character, the model judges whether the character is the first character of an entity and predicts the full content of the word in which the character is located;
3-3) performing further pre-training on BERT based on the above steps, where the task is boundary prediction: given a character, the model judges from the character's vector whether the character belongs to an entity, and predicts the boundary of the entity in which the character is located;
3-4) modeling is carried out based on the tasks in the steps, the tasks are multi-task training of word prediction and boundary prediction, in the word prediction task, part of words are randomly covered, and the words are replaced according to the specified probability;
3-5) modeling is carried out based on the tasks of the above steps, the tasks being multi-task training of word prediction and boundary prediction; in the boundary prediction task, entity boundary prediction is carried out based on the obtained word encodings, and the vectors are mapped using a BiLSTM:

h_i = BiLSTM(x_i)

where x_i is the vector encoding obtained from the BERT pre-trained language model;

sequence labeling prediction is performed using a CRF conditional random field:

y = CRF(h)

where h denotes the positions in the sequence and y is the corresponding label sequence.
4. The pattern pre-training based relationship extraction method according to claim 1, wherein the step 2) models the decoding part as follows:
4-1) constructing a syntactic tree for the processed sequence through syntactic dependency analysis to obtain the adjacency matrix between entities as the transformer's mask matrix,
4-2) based on the vectorized representation of the text obtained in step 3) and the mask matrix of step 4-1), operation is performed through a multi-head attention mechanism; the mask mechanism in the transformer shields the entities that are not 1 in the matrix, highlights the information of related entities, and pre-trains the transformer layer, specifically as follows:
head_i = softmax( (Q · K^T) / √d_k + M ) · V

where M is the mask matrix, the addition of M to the attention scores is the mask operator in the attention, Q, K and V are the three vectors of the attention mechanism, d_k is the dimension of the vector K, and i denotes the i-th attention head;
combining the results of each calculation, through a feed-forward neural network sublayer:

FFN(z) = max(0, z · W_1 + b_1) · W_2 + b_2

where W_1, W_2, b_1 and b_2 are the parameters of the feed-forward neural network,

the final obtained vector is expressed as:

h = α · FFN(z) + (1 - α) · h_enc

where h_enc is the original result output by the encoding layer;
4-3) expressing the predicted relation with a matrix whose rows and columns are the entity sequence, and mapping the vector obtained in 4-2) through a feed-forward neural network and an activation function to obtain the predicted relation result.
5. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step 3) is as follows:
removing the networks added in the pre-training, namely word prediction and boundary prediction of the coding layer and relation prediction of the decoding layer, inputting formal linguistic data for training, specifically, decoding a vector through a decoding layer transformer, and expressing the obtained vector as follows:
h = α · h_dec + (1 - α) · h_enc

where α is a real number in (0, 1).
6. The method for extracting relationship based on pattern pre-training as claimed in claim 1, wherein the step 2) is specifically as follows:
a relationship is modeled as a function with parameters as the subject and the result as the object, and for the ith character, the probability of being the starting and ending positions of the subject is:
p_i^{sub_start} = σ(W_start · x_i + b_start),   p_i^{sub_end} = σ(W_end · x_i + b_end)

where σ is the sigmoid function, W and b are trainable parameters, and x_i is the word vector at the i-th position;

in the same modeling manner, the probabilities for the position of the object are:

p_i^{obj_start} = σ(W^r_start · (x_i + v^k_sub) + b^r_start),   p_i^{obj_end} = σ(W^r_end · (x_i + v^k_sub) + b^r_end)

where v^k_sub is the vector representation of the k-th subject, σ is the sigmoid function, and W and b are trainable parameters.
7. The pattern pre-training based relationship extraction method according to claim 1, wherein the training function of step 3) is as follows: and an encoding part:
L_enc = -log p(Y | X)

where X is the corresponding text sequence and Y is the corresponding labeled sequence; the training function in the decoding part is expressed as:

J(Θ) = Σ_{(x,T)∈D} [ Σ_{s∈T} log p_θ(s | x) + Σ_{(r,o)∈T|s} log p_φr(o | s, x) + Σ_{r∈R∖T|s} log p_φr(o_∅ | s, x) ]

where p_θ(s | x), the likelihood function for labeling the subject part, is specifically defined as:

p_θ(s | x) = Π_{t ∈ {sub_start, sub_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, and t is the parameter ranging over {sub_start, sub_end};

p_φr(o | s, x), the likelihood function for labeling the object part, is specifically defined as:

p_φr(o | s, x) = Π_{t ∈ {obj_start, obj_end}} Π_{i=1}^{L} (p_i^t)^{I(y_i^t = 1)} · (1 - p_i^t)^{I(y_i^t = 0)}

where L is the length of the input text, I(·) takes 1 when its condition is true and 0 when it is false, y_i^t is the 0/1 tag of the i-th token, t is the parameter ranging over {obj_start, obj_end}, and o_∅ represents the null object of a subject with no corresponding object, whose tags are always 0;

the overall loss function can be expressed as the sum of these terms over the training set D, and training of the model is completed by maximizing this function with respect to the weights.
8. The method for extracting relationship based on pattern pre-training as claimed in claim 7, wherein the training process of step 3) is as follows:
the pre-trained word prediction and boundary prediction parts are discarded, and the modeling of the triples is re-initialized, specifically: for the i-th character, the probabilities that it is the start and end positions of a subject are:

p_i^{sub_start} = σ(W_start · x_i + b_start),   p_i^{sub_end} = σ(W_end · x_i + b_end)

where σ is the sigmoid function, W and b are trainable parameters, and x_i is the word vector at the i-th position;

in the same modeling manner, the probabilities for the position of the object are:

p_i^{obj_start} = σ(W^r_start · (x_i + v^k_sub) + b^r_start),   p_i^{obj_end} = σ(W^r_end · (x_i + v^k_sub) + b^r_end)

where v^k_sub is the vector representation of the k-th subject, σ is the sigmoid function, and W and b are trainable parameters;

the training function is:

J(Θ) = Σ_{(x,T)∈D} [ Σ_{s∈T} log p_θ(s | x) + Σ_{(r,o)∈T|s} log p_φr(o | s, x) + Σ_{r∈R∖T|s} log p_φr(o_∅ | s, x) ]

and by maximizing this function, training of the model is accomplished.
CN202111331381.8A 2021-11-11 2021-11-11 Relation extraction method based on mode pre-training Active CN113761893B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111331381.8A CN113761893B (en) 2021-11-11 2021-11-11 Relation extraction method based on mode pre-training

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111331381.8A CN113761893B (en) 2021-11-11 2021-11-11 Relation extraction method based on mode pre-training

Publications (2)

Publication Number Publication Date
CN113761893A CN113761893A (en) 2021-12-07
CN113761893B true CN113761893B (en) 2022-02-11

Family

ID=78784893

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111331381.8A Active CN113761893B (en) 2021-11-11 2021-11-11 Relation extraction method based on mode pre-training

Country Status (1)

Country Link
CN (1) CN113761893B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114519356B (en) * 2022-02-22 2023-07-18 平安科技(深圳)有限公司 Target word detection method and device, electronic equipment and storage medium
CN114328978B (en) * 2022-03-10 2022-05-24 合肥综合性国家科学中心人工智能研究院(安徽省人工智能实验室) Relationship extraction method, device, equipment and readable storage medium
CN114528394B (en) * 2022-04-22 2022-08-26 杭州费尔斯通科技有限公司 Text triple extraction method and device based on mask language model
CN114861600B (en) 2022-07-07 2022-12-13 之江实验室 NER-oriented Chinese clinical text data enhancement method and device
CN117807956A (en) * 2023-12-29 2024-04-02 兰州理工大学 ICD automatic coding method based on clinical text tree structure

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110472235A (en) * 2019-07-22 2019-11-19 北京航天云路有限公司 A kind of end-to-end entity relationship joint abstracting method towards Chinese text
CN111488726A (en) * 2020-03-31 2020-08-04 成都数之联科技有限公司 Pointer network-based unstructured text extraction multi-task joint training method

Family Cites Families (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103514165A (en) * 2012-06-15 2014-01-15 佳能株式会社 Method and device for identifying persons mentioned in conversation
CN109543183B (en) * 2018-11-16 2021-01-19 西安交通大学 Multi-label entity-relation combined extraction method based on deep neural network and labeling strategy
CN112507699B (en) * 2020-09-16 2022-11-18 东南大学 Remote supervision relation extraction method based on graph convolution network
CN112613306A (en) * 2020-12-31 2021-04-06 恒安嘉新(北京)科技股份公司 Method, device, electronic equipment and storage medium for extracting entity relationship
CN113254429B (en) * 2021-05-13 2023-07-21 东北大学 BERT and MLM-based noise reduction method for remote supervision relation extraction
CN113326698B (en) * 2021-06-18 2023-05-09 深圳前海微众银行股份有限公司 Method for detecting entity relationship, model training method and electronic equipment

Also Published As

Publication number Publication date
CN113761893A (en) 2021-12-07

Similar Documents

Publication Publication Date Title
CN113761893B (en) Relation extraction method based on mode pre-training
CN110134757B (en) Event argument role extraction method based on multi-head attention mechanism
CN111241294B (en) Relationship extraction method of graph convolution network based on dependency analysis and keywords
CN110597735B (en) Software defect prediction method for open-source software defect feature deep learning
CN112507699B (en) Remote supervision relation extraction method based on graph convolution network
CN111159407B (en) Method, apparatus, device and medium for training entity recognition and relation classification model
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN110263323A (en) Keyword abstraction method and system based on the long Memory Neural Networks in short-term of fence type
CN109063164A (en) A kind of intelligent answer method based on deep learning
CN112183064B (en) Text emotion reason recognition system based on multi-task joint learning
CN112800203B (en) Question-answer matching method and system fusing text representation and knowledge representation
CN112966525B (en) Law field event extraction method based on pre-training model and convolutional neural network algorithm
CN112463424A (en) End-to-end program repair method based on graph
CN113868432A (en) Automatic knowledge graph construction method and system for iron and steel manufacturing enterprises
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
CN115292568B (en) Civil news event extraction method based on joint model
CN114168754A (en) Relation extraction method based on syntactic dependency and fusion information
CN113378024B (en) Deep learning-oriented public inspection method field-based related event identification method
CN111382333B (en) Case element extraction method in news text sentence based on case correlation joint learning and graph convolution
CN117033423A (en) SQL generating method for injecting optimal mode item and historical interaction information
CN112148879B (en) Computer readable storage medium for automatically labeling code with data structure
CN116975161A (en) Entity relation joint extraction method, equipment and medium of power equipment partial discharge text
CN114298052B (en) Entity joint annotation relation extraction method and system based on probability graph
CN116340507A (en) Aspect-level emotion analysis method based on mixed weight and double-channel graph convolution
CN115422945A (en) Rumor detection method and system integrating emotion mining

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant