CN114444506A - Method for extracting relation triple fusing entity types - Google Patents
- Publication number
- CN114444506A (application CN202210026447.0A)
- Authority
- CN
- China
- Prior art keywords
- model
- entity
- head
- training
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a relation triple extraction method fusing entity types, which comprises the following steps: collecting text data as training samples; cleaning the collected training sample data to obtain a data set; segmenting the data set and dividing it into a training set, a verification set and a test set in a fixed proportion; building a deep learning network based on a BERT pre-training model and loading pre-trained parameters to obtain deep representations of the training samples; building a Fast Gradient Method (FGM) adversarial network after the model to improve its robustness and generalization; constructing a relation triple extraction model from a multi-head attention mechanism and a deep neural network; training and testing the model, saving K fold models on the verification set by K-fold cross-validation, testing the test set with the ensemble of K fold models, and taking the average probability as the test result of the model; and outputting the model AttnFGM-MARE.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to an intelligent extraction method for relation triples fusing entity types.
Background
Relation extraction results are typically organized and presented as triples, and this structured knowledge supports many downstream tasks, such as information extraction, knowledge graphs, search engines and question answering. The relation extraction task consists of identifying head entities and tail entities in text and classifying the relation between them. At present, neural network methods are generally used to extract relation triples. Inspired by the rapid development of pre-trained language model technology, relation extraction uses a pre-trained model for unsupervised learning on large-scale unlabeled data to obtain deep representations of text. Such methods depend on the deep representation of the head entity, which influences the extraction of the tail entity and relation type in the downstream task. However, their ability to represent head entity features is limited, the benefit of the head entity's type information for tail entity and relation extraction is not considered, and the models also suffer from low robustness and weak generalization.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a relation triple extraction method fusing entity types, which uses a multi-head attention mechanism to fuse head entity type features and obtains relation triples through a deep neural network (DNN), thereby improving the robustness and generalization of the model.
In order to solve the technical problem, the invention is realized in the following way:
a method for extracting a relation triple fused with entity types specifically comprises the following steps:
1) collecting text data as training samples;
2) cleaning the training sample data collected in the step 1) to form a data set;
3) segmenting the data set formed in step 2) and dividing it into a training set, a verification set and a test set in the proportion 7:2:1;
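The 7:2:1 split of step 3) can be sketched as follows. This is an illustrative sketch, not the patent's code; the deterministic shuffle seed is an assumption, since the patent does not specify one.

```python
import random

def split_dataset(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle and split samples into training / verification / test sets by ratio."""
    data = list(samples)
    random.Random(seed).shuffle(data)      # deterministic shuffle for reproducibility
    n_train = int(len(data) * ratios[0])
    n_val = int(len(data) * ratios[1])
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]          # the remainder becomes the test set
    return train, val, test

train, val, test = split_dataset(range(100))
```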
4) building a deep learning network based on a BERT pre-training model, and loading pre-training parameters to obtain deep expression of training set data;
5) building a Fast Gradient Method (FGM) adversarial neural network model after the BERT pre-training model to improve the robustness and generalization of the model;
6) predicting a head entity by utilizing a head entity extraction module DNN;
7) extracting head entity features from the head entities predicted in step 6), and building the relation triple extraction model using a multi-head attention mechanism and a deep neural network;
8) predicting the relation and the tail entity by using a relation and tail entity extraction module DNN;
9) performing model training, saving K fold models by applying a K-fold cross-validation method to the verification set of step 3), testing the test set with the K fold models, and taking the average probability as the test result of the model; the model AttnFGM-MARE is output.
Further, the BERT pre-training model in step 4) is provided in sequence with a position embedding layer, a segment embedding layer and a token embedding layer, followed by an E[cls] layer, a fully connected layer and a T[cls] layer; the deep representation is as follows:

H = BERT(S)    (1)

where S is the training set text data and H is the deep representation of the hidden state of S after the BERT pre-training model.
The expression of the Fast Gradient Method adversarial neural network model built in step 5) is as follows:

g = ∇_x L(θ, x, y)    (2)

r_adv = ε·g/||g||_2    (3)

x_adv = x + r_adv    (4)

where g represents the gradient of the loss function with respect to the input, θ represents the parameters of the adversarial neural network, x represents the input of the model, y represents the label corresponding to input x, L represents the loss function of the trained neural network, ε represents a hyper-parameter of the adversarial network, x_adv represents the model input after adding the adversarial perturbation, and r_adv represents the adversarial perturbation added.
In step 6), the head entity extraction module is connected after the Fast Gradient Method adversarial neural network model, and the head entity start position and head entity end position are predicted by a head-entity-start fully connected layer and a head-entity-end fully connected layer respectively:

p_i^start = σ(W_s x_i + b_s)    (5)

p_i^end = σ(W_e x_i + b_e)    (6)

where x_i is the deep representation of the i-th character in the text, W_s, W_e, b_s, b_e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, p_i^start represents the probability that the i-th character is the head entity start character, and p_i^end represents the probability that the i-th character is the head entity end character.
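The head entity start/end prediction described above can be sketched in NumPy. The random weights, zero biases and 0.5 threshold are illustrative assumptions standing in for the trained fully connected layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def head_entity_probs(X, W_s, b_s, W_e, b_e):
    """Per-character probabilities of being the head entity start / end character."""
    p_start = sigmoid(X @ W_s + b_s)       # p_i^start = sigma(W_s x_i + b_s)
    p_end = sigmoid(X @ W_e + b_e)         # p_i^end   = sigma(W_e x_i + b_e)
    return p_start, p_end

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                # deep representations x_i of 6 characters
W_s, W_e = rng.normal(size=8), rng.normal(size=8)
p_start, p_end = head_entity_probs(X, W_s, 0.0, W_e, 0.0)
starts = np.where(p_start > 0.5)[0]        # candidate head entity start positions
```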
In step 7), head entity type representation features are obtained with a multi-head attention mechanism, and the head entity features, head entity type representation features and context representation features are combined by feature fusion (feature addition) to obtain the relation triple extraction model of the deep neural network; the expression of the multi-head attention mechanism is as follows:

Attention(Q, K, V) = softmax(QK^T/√d_k)V    (7)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)    (8)

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O    (9)

where Q, K, V represent vectors obtained by linearly transforming the input features, d_k is a parameter controlling the variance, head_i represents the output of the i-th attention head in the multi-head attention model, and W_i^Q, W_i^K, W_i^V, W^O represent trainable parameters of the deep neural network.
The relation and tail entity prediction in step 8) is specifically as follows: the relation and the tail entity start and end positions are predicted from the fused features of step 7) by a relation-and-tail-entity-start fully connected layer and a relation-and-tail-entity-end fully connected layer:

p_i^{start,r} = σ(W_r^s(x_i + v_k^sub + v_k^type) + b_r^s)    (10)

p_i^{end,r} = σ(W_r^e(x_i + v_k^sub + v_k^type) + b_r^e)    (11)

where W_r^s, W_r^e, b_r^s, b_r^e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, v_k^sub represents the deep representation of the k-th head entity, v_k^type represents the type deep representation of the k-th head entity, p_i^{start,r} represents the probability that the i-th character is the tail entity start character given relation r, and p_i^{end,r} represents the probability that the i-th character is the tail entity end character given relation r.
Compared with the prior art, the invention has the following beneficial effects:
the triple extraction method disclosed by the invention has the advantages that through fusing an entity type neural network end-to-end model (AttnFGM-MARE), a pre-training model is adopted as the feature extraction of text context, and relational triples are intelligently extracted; the robust performance and the generalization performance of the model are improved by adopting a Fast Gradient Method antagonistic network model, the entity type characteristics of the head are fused by using a multi-head attention mechanism, and the effect of extracting the relation triplets is obtained and improved by a Deep Neural Network (DNN).
Drawings
FIG. 1 is a flow chart of a relational triple extraction model according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention is provided in connection with the accompanying drawings and the examples. It should be understood that terms such as "having", "including" and "comprising", as used herein, do not preclude the presence or addition of one or more other elements or combinations thereof.
As shown in fig. 1, a method for extracting relation triples fusing entity types specifically includes the following steps:
1) collecting text data as training samples, and setting the batch size N_b and the learning rate α; the batch size N_b is the amount of data the model processes in one batch, i.e., the number of samples used in one training step (processing one sample at a time would be slow); the learning rate α is the step size with which the model updates its parameters, so that the optimization process approaches the optimal solution in steps of this size.
3) segmenting the data set formed in step 2) and dividing it into a training set, a verification set and a test set in the proportion 7:2:1;
4) Building a deep learning network based on a BERT pre-training model, and loading pre-training parameters to obtain deep expression of training set data;
5) a Fast Gradient Method (FGM) adversarial neural network model is built after the BERT pre-training model to improve the robustness and generalization of the model; the expression of the Fast Gradient Method adversarial neural network model is as follows:

g = ∇_x L(θ, x, y)    (2)

r_adv = ε·g/||g||_2    (3)

x_adv = x + r_adv    (4)

where g represents the gradient of the loss function with respect to the input, θ represents the parameters of the adversarial neural network, x represents the input of the model, y represents the label corresponding to input x, L represents the loss function of the trained neural network, ε represents a hyper-parameter of the adversarial network, x_adv represents the model input after adding the adversarial perturbation, and r_adv represents the adversarial perturbation added;
6) predicting a head entity by utilizing a head entity extraction module DNN;
p_i^start = σ(W_s x_i + b_s)    (5)

p_i^end = σ(W_e x_i + b_e)    (6)

where x_i is the deep representation of the i-th character in the text, W_s, W_e, b_s, b_e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, p_i^start represents the probability that the i-th character is the head entity start character, and p_i^end represents the probability that the i-th character is the head entity end character;
7) extracting head entity features from the head entities predicted in step 6), and building the relation triple extraction model using a multi-head attention mechanism and a deep neural network; the expression of the multi-head attention mechanism is as follows:

Attention(Q, K, V) = softmax(QK^T/√d_k)V    (7)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)    (8)

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O    (9)

where Q, K, V represent vectors obtained by linearly transforming the input features, d_k is a parameter controlling the variance, head_i represents the output of the i-th attention head in the multi-head attention model, and W_i^Q, W_i^K, W_i^V, W^O represent trainable parameters of the deep neural network;
8) predicting the relation and the tail entity by using a relation and tail entity extraction module DNN;
p_i^{start,r} = σ(W_r^s(x_i + v_k^sub + v_k^type) + b_r^s)    (10)

p_i^{end,r} = σ(W_r^e(x_i + v_k^sub + v_k^type) + b_r^e)    (11)

where W_r^s, W_r^e, b_r^s, b_r^e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, v_k^sub represents the deep representation of the k-th head entity, v_k^type represents the type deep representation of the k-th head entity, p_i^{start,r} represents the probability that the i-th character is the tail entity start character given relation r, and p_i^{end,r} represents the probability that the i-th character is the tail entity end character given relation r;
9) performing model training, saving K fold models by applying a K-fold cross-validation method to the verification set of step 3), testing the test set with the K fold models, and taking the average probability as the test result of the model; the model AttnFGM-MARE is output. In K-fold cross-validation, the original data is divided into K groups; each subset serves once as the validation set while the remaining K-1 subsets form the training set, yielding K fold models.
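The K-fold procedure of step 9) can be sketched as follows. The stand-in fold models (constant probability functions) are assumptions replacing the trained BERT-based models, which are outside the scope of this sketch.

```python
def k_fold_splits(indices, k):
    """Split indices into k folds; each fold is the validation set once, the rest train."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, val

def ensemble_average(fold_models, x):
    """Average the predicted probabilities of the k fold models (step 9)."""
    probs = [m(x) for m in fold_models]
    return sum(probs) / len(probs)

splits = list(k_fold_splits(list(range(10)), k=5))
models = [lambda x, b=b: 0.1 * b for b in range(5)]   # stand-in fold models
avg = ensemble_average(models, None)                  # averaged test probability
```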
For the deep representation H of the present application, the network output of the deep neural network (DNN) over a batch is computed as:

y* = DNN(H)    (12)

where y* represents the output of the neural network;
the loss of training is calculated through a loss function, and the expression of the loss function is as follows:
wherein ,xiRepresents the ith sample in the training set, TiRepresentative of the occurrence in the training sample xiS represents a head entity appearing in the relational triple, o represents a tail entity appearing in the relational triple, prRepresenting a probability value under a specified relationship r;
The parameters θ of the adversarial neural network model are updated by training with the loss function; θ is chosen to minimize the expected worst-case adversarial loss:

min_θ E_{(x,y)} [ max_{r_adv} L(θ, x + r_adv, y) ]    (14)
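One step of the adversarial training described above (loss on x, FGM perturbation, loss on x + r_adv, parameter update) can be sketched for a toy linear model. The model, loss and learning rate are illustrative assumptions, not the patent's network.

```python
import numpy as np

def loss_and_grads(theta, x, y):
    """Toy squared-error loss (assumption): returns loss, dL/dtheta, dL/dx."""
    err = theta @ x - y
    return 0.5 * err ** 2, err * x, err * theta

def adversarial_step(theta, x, y, eps=0.1, lr=0.01):
    """One FGM adversarial training step: gradient descent on L(x) + L(x + r_adv)."""
    _, g_theta, g_x = loss_and_grads(theta, x, y)
    r_adv = eps * g_x / (np.linalg.norm(g_x) + 1e-12)   # FGM perturbation
    _, adv_g_theta, _ = loss_and_grads(theta, x + r_adv, y)
    return theta - lr * (g_theta + adv_g_theta)          # update the parameters theta

theta = np.array([0.5, -0.5])
x, y = np.array([1.0, 2.0]), 1.0
theta_new = adversarial_step(theta, x, y)
```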
the foregoing is illustrative of embodiments of the present invention and it will be further appreciated by those skilled in the art that various modifications may be made without departing from the principles of the invention and that such modifications are intended to be included within the scope of the appended claims.
Claims (6)
1. A method for extracting relation triples fusing entity types is characterized in that: the method specifically comprises the following steps:
1) collecting text data as training samples;
2) cleaning the training sample data collected in the step 1) to form a data set;
3) segmenting the data set formed in step 2) and dividing it into a training set, a verification set and a test set in the proportion 7:2:1;
4) building a deep learning network based on a BERT pre-training model, and loading pre-training parameters to obtain deep expression of training set data;
5) building a Fast Gradient Method (FGM) adversarial neural network model after the BERT pre-training model to improve the robustness and generalization of the model;
6) predicting a head entity by utilizing a head entity extraction module DNN;
7) extracting head entity features from the head entities predicted in step 6), and building the relation triple extraction model using a multi-head attention mechanism and a deep neural network;
8) predicting the relation and the tail entity by using a relation and tail entity extraction module DNN;
9) performing model training, saving K fold models by applying a K-fold cross-validation method to the verification set of step 3), testing the test set with the K fold models, and taking the average probability as the test result of the model; the model AttnFGM-MARE is output.
2. The method for extracting relation triples fusing entity types according to claim 1, wherein:
the BERT pre-training model in step 4) is provided in sequence with a position embedding layer, a segment embedding layer and a token embedding layer, followed by an E[cls] layer, a fully connected layer and a T[cls] layer; the deep representation is as follows:

H = BERT(S)    (1)

where S is the training set text data and H is the deep representation of the hidden state of S after the BERT pre-training model.
3. The method for extracting relation triples fusing entity types according to claim 1, wherein:
the expression of the Fast Gradient Method adversarial neural network model built in step 5) is as follows:

g = ∇_x L(θ, x, y)    (2)

r_adv = ε·g/||g||_2    (3)

x_adv = x + r_adv    (4)

where g represents the gradient of the loss function with respect to the input, θ represents the parameters of the adversarial neural network, x represents the input of the model, y represents the label corresponding to input x, L represents the loss function of the trained neural network, ε represents a hyper-parameter of the adversarial network, x_adv represents the model input after adding the adversarial perturbation, and r_adv represents the adversarial perturbation added.
4. The method for extracting relation triples fusing entity types according to claim 1, wherein:
in step 6), the head entity extraction module is connected after the Fast Gradient Method adversarial neural network model, and the head entity start position and head entity end position are predicted by a head-entity-start fully connected layer and a head-entity-end fully connected layer respectively:

p_i^start = σ(W_s x_i + b_s)    (5)

p_i^end = σ(W_e x_i + b_e)    (6)

where x_i is the deep representation of the i-th character in the text, W_s, W_e, b_s, b_e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, p_i^start represents the probability that the i-th character is the head entity start character, and p_i^end represents the probability that the i-th character is the head entity end character.
5. The method for extracting relation triples fusing entity types according to claim 1, wherein:
step 7) obtains head entity type representation features with a multi-head attention mechanism, and fuses the head entity features, head entity type representation features and context representation features by feature fusion to obtain the relation triple extraction model of the deep neural network; the expression of the multi-head attention mechanism is as follows:

Attention(Q, K, V) = softmax(QK^T/√d_k)V    (7)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)    (8)

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O    (9)

where Q, K, V represent vectors obtained by linearly transforming the input features, d_k is a parameter controlling the variance, head_i represents the output of the i-th attention head in the multi-head attention model, and W_i^Q, W_i^K, W_i^V, W^O represent trainable parameters of the deep neural network.
6. The method for extracting relation triples fusing entity types according to claim 1, wherein:
the relation and tail entity prediction in step 8) is specifically: the relation and the tail entity start and end positions are predicted from the fused features of step 7) by a relation-and-tail-entity-start fully connected layer and a relation-and-tail-entity-end fully connected layer:

p_i^{start,r} = σ(W_r^s(x_i + v_k^sub + v_k^type) + b_r^s)    (10)

p_i^{end,r} = σ(W_r^e(x_i + v_k^sub + v_k^type) + b_r^e)    (11)

where W_r^s, W_r^e, b_r^s, b_r^e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, v_k^sub represents the deep representation of the k-th head entity, v_k^type represents the type deep representation of the k-th head entity, p_i^{start,r} represents the probability that the i-th character is the tail entity start character given relation r, and p_i^{end,r} represents the probability that the i-th character is the tail entity end character given relation r.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026447.0A CN114444506B (en) | 2022-01-11 | 2022-01-11 | Relation triplet extraction method for fusing entity types |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026447.0A CN114444506B (en) | 2022-01-11 | 2022-01-11 | Relation triplet extraction method for fusing entity types |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114444506A true CN114444506A (en) | 2022-05-06 |
CN114444506B CN114444506B (en) | 2023-05-02 |
Family
ID=81368025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210026447.0A Active CN114444506B (en) | 2022-01-11 | 2022-01-11 | Relation triplet extraction method for fusing entity types |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114444506B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191453A (en) * | 2019-12-25 | 2020-05-22 | 中国电子科技集团公司第十五研究所 | Named entity recognition method based on confrontation training |
CN111931506A (en) * | 2020-05-22 | 2020-11-13 | 北京理工大学 | Entity relationship extraction method based on graph information enhancement |
CN112148997A (en) * | 2020-08-07 | 2020-12-29 | 江汉大学 | Multi-modal confrontation model training method and device for disaster event detection |
CN113221567A (en) * | 2021-05-10 | 2021-08-06 | 北京航天情报与信息研究所 | Judicial domain named entity and relationship combined extraction method |
WO2021190236A1 (en) * | 2020-03-23 | 2021-09-30 | 浙江大学 | Entity relation mining method based on biomedical literature |
Non-Patent Citations (4)
Title |
---|
YUANFEI DAI et al.: "Generative adversarial networks based on Wasserstein distance for knowledge graph embeddings" *
吕建成 et al.: "Brain-inspired ultra-large-scale deep neural network ***" *
李涛; 郭渊博; 琚安康: "Network security knowledge triple extraction fusing adversarial active learning" *
黄培馨; 赵翔; 方阳; 朱慧明; 肖卫东: "End-to-end joint extraction of knowledge triples fusing adversarial training" *
Also Published As
Publication number | Publication date |
---|---|
CN114444506B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111914644B (en) | Dual-mode cooperation based weak supervision time sequence action positioning method and system | |
CN109934261B (en) | Knowledge-driven parameter propagation model and few-sample learning method thereof | |
US11301759B2 (en) | Detective method and system for activity-or-behavior model construction and automatic detection of the abnormal activities or behaviors of a subject system without requiring prior domain knowledge | |
CN111914091B (en) | Entity and relation combined extraction method based on reinforcement learning | |
CN112560432B (en) | Text emotion analysis method based on graph attention network | |
CN111950269A (en) | Text statement processing method and device, computer equipment and storage medium | |
CN112100403A (en) | Knowledge graph inconsistency reasoning method based on neural network | |
US20220121949A1 (en) | Personalized neural network pruning | |
CN115357904B (en) | Multi-class vulnerability detection method based on program slicing and graph neural network | |
CN115631365A (en) | Cross-modal contrast zero sample learning method fusing knowledge graph | |
CN115203507A (en) | Event extraction method based on pre-training model and oriented to document field | |
CN113673242A (en) | Text classification method based on K-neighborhood node algorithm and comparative learning | |
CN114880307A (en) | Structured modeling method for knowledge in open education field | |
CN117354207A (en) | Reverse analysis method and device for unknown industrial control protocol | |
CN116384379A (en) | Chinese clinical term standardization method based on deep learning | |
CN112507720A (en) | Graph convolution network root identification method based on causal semantic relation transfer | |
CN113342982B (en) | Enterprise industry classification method integrating Roberta and external knowledge base | |
CN114444506A (en) | Method for extracting relation triple fusing entity types | |
CN115661539A (en) | Less-sample image identification method embedded with uncertainty information | |
CN114780725A (en) | Text classification algorithm based on deep clustering | |
US20240185078A1 (en) | Purified contrastive learning for lightweight neural network training | |
CN117909766A (en) | Open information extraction clustering method based on manual guidance | |
CN117688472A (en) | Unsupervised domain adaptive multivariate time sequence classification method based on causal structure | |
WO2023167791A1 (en) | On-device artificial intelligence video search | |
CN118132751A (en) | Standard document industry classification method, apparatus, computer device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||