CN115203434A - Entity relationship extraction method fusing BERT network and position characteristic information and application thereof - Google Patents

Entity relationship extraction method fusing BERT network and position characteristic information and application thereof Download PDF

Info

Publication number
CN115203434A
Authority
CN
China
Prior art keywords
entity
bert
matrix
network
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210791774.5A
Other languages
Chinese (zh)
Inventor
冯勇
张宁
徐红艳
王嵘冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning University
Original Assignee
Liaoning University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning University filed Critical Liaoning University
Priority to CN202210791774.5A priority Critical patent/CN115203434A/en
Publication of CN115203434A publication Critical patent/CN115203434A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An entity relationship extraction method fusing a BERT network and position characteristic information, and application thereof, comprises the following steps: (1) entity data acquisition; (2) data processing; (3) proposing a model structure; (4) offline training; (5) online testing. The invention discloses an entity relationship extraction method fusing a BERT network and position characteristic information. The method addresses the problems of information redundancy, entity overlapping, and low accuracy in traditional entity extraction methods. It first uses the BERT network to perform unsupervised pre-training on text, and then adds a Bi-LSTM for fine-tuning on the specific downstream task. Next, position characteristic information is added on top of the attention mechanism so that particular information is selectively attended to. Finally, entity relation identification and classification are realized. Comparison experiments on the public data set FB15K verify that the disclosed method has higher accuracy and good application prospects.

Description

Entity relationship extraction method fusing BERT network and position characteristic information and application thereof
Technical Field
The invention relates to an entity relation extraction method fusing a BERT network and position characteristic information and application thereof, belonging to the field of knowledge extraction.
Background
With the development of knowledge graphs across multiple domains, analyzing and mining large amounts of heterogeneous data can greatly improve the comprehensive processing and utilization of data in most vertical domains, and entity relationship extraction is an important step in constructing large-scale domain knowledge graphs. Against the backdrop of the big data era, a knowledge graph must fuse complex and varied data. Traditional entity relationship extraction methods largely ignore the internal connection between entity recognition and relation classification, suffer from error propagation and information redundancy, and cannot effectively handle overlapping entity relationships. Deep learning methods have been introduced to address these problems; they can alleviate the shortcomings of relation extraction models based on traditional features to a certain extent and accumulate less error. However, because data volumes differ in scale across fields, directly applying deep learning to entity relationship extraction has many limitations, and related network models need to be improved to solve entity relationship extraction in specific application fields.
Disclosure of Invention
In order to solve the existing problems, the invention provides an entity relationship extraction method fusing a BERT network and position characteristic information and application thereof.
The purpose of the invention is realized by the following technical scheme:
the method for extracting the entity relationship fusing the BERT network and the position characteristic information comprises the following steps:
1) Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2) Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization (see the sketch following these steps);
3) Proposing a model structure: the entity relationship extraction model fusing the BERT network and position characteristic information consists of a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer. First, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
4) Offline training: training the model with the training set and a regularization strategy, and saving the optimal parameters of the entity classification matrix C;
5) Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
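As referenced in step 2), the following is a minimal sketch of the data-processing step, assuming scikit-learn-style splitting; the 8:1:1 split ratio, feature layout, and function names are illustrative assumptions rather than the patented procedure.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_and_normalize(features: np.ndarray, labels: np.ndarray):
    """Split entity samples into train/validation/test and max-min normalize."""
    # Illustrative 8:1:1 split; stratification is omitted for brevity.
    x_train, x_rest, y_train, y_rest = train_test_split(
        features, labels, test_size=0.2, random_state=42)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, random_state=42)

    # Max-min normalization: scale each feature to [0, 1] using statistics
    # computed on the training set only, to avoid information leakage.
    f_min, f_max = x_train.min(axis=0), x_train.max(axis=0)
    span = np.where(f_max > f_min, f_max - f_min, 1.0)   # guard constant features
    norm = lambda x: (x - f_min) / span
    return (norm(x_train), y_train), (norm(x_val), y_val), (norm(x_test), y_test)
```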
In the step 3), the specific method is as follows:
3.1) The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:

BERT serves as the encoder of the input text sequence, and the hidden-layer state vector $X_t$ is obtained from BERT as shown in formula (1):

$$X_t = \mathrm{BERT}_{base}(w_t) \tag{1}$$

The result is denoted as the sequence $X = (x_1, x_2, \ldots, x_n)$;

the sequence $X$ obtained from the BERT network is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer, where $h_t$ introduces a corresponding memory cell in the hidden layer, as shown in formula (2):

$$\begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(C_t) \end{aligned} \tag{2}$$

where $h_{t-1}$ is the hidden-layer output of the previous LSTM unit; $C_{t-1}$ is the cell-state result of the previous unit; $x_t$ is the word-vector input of the text; $f_t$ is the output of the forget gate, with $\sigma$ the sigmoid activation function; $i_t$ and $\tilde{C}_t$ are the outputs of the input gate; $\tanh$ is the tanh activation function; $o_t$ is the output of the output gate; $C_t$ is the state value of the current cell; and $h_t$ is the hidden-layer output of the current cell;

the forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, denoted $H = (h_1, h_2, \ldots, h_n)$;

finally, the hidden state sequence is mapped to dimension $s$ through the linear layer, and the mapped sequence is denoted $L = (l_1, l_2, \ldots, l_n)$, where $l_i$ represents the score of word $x_i$ for each corresponding category label $y_j$;
3.2) A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix C;

the weight values of the entity relationship classification result are calculated with an attention-mechanism QKV model: the Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is obtained from the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors;

the attention weight values in entity relationship extraction are calculated according to formula (3):

$$Attention\_w_{n \times 1} = \mathrm{softmax}(key\_w_{n \times k} * query\_w_{k \times 1}) \tag{3}$$

where the softmax function performs the vector normalization; $key\_w_{n \times k}$ is the Key vector matrix of the attention mechanism; $query\_w_{k \times 1}$ is the Query vector matrix of the attention mechanism; and $Attention\_w_{n \times 1}$ is the attention weight value;

the attention output matrix in entity relationship extraction is calculated according to formula (4):

$$Attention\_r_{k \times 1} = (Attention\_w^{T} * value\_w_{n \times k})^{T} \tag{4}$$

where $value\_w_{n \times k}$ is the output-vector matrix of the Bi-LSTM hidden layer and $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism;

in addition, a relation is established between the attention output sequence and the BERT output sequence to distinguish the feature representations of the same entity under different relations and make the final relation classification prediction;

the distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix $P$ is then randomly initialized according to the maximum sentence length $m$ and the position-feature size $n$, and the relational position feature $Pf_t$ of each word is obtained by looking up $P$, calculated according to formula (5):

$$Pf_t = P_{Pr - Pw} \tag{5}$$

where $Pr$ is the position of the relation trigger word and $Pw$ is the position of the word in the BERT output sequence;

finally, the position embedding matrix and the attention output matrix are concatenated at the fully connected layer to obtain the entity classification matrix, formula (6):

$$C = f(Attention\_r_{k \times 1}; P) \tag{6}$$

where $f(\cdot)$ denotes the fully connected layer; $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism; $P$ is the position embedding matrix; and $C$ is the current entity classification matrix.
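The following is a minimal PyTorch sketch of the relational position-feature attention of step 3.2 for a single sentence. The Query initialization follows the uniform sampling of formula (3), while the distance clipping and the mean-pooling of the per-word position features before the fully connected layer are illustrative assumptions, since the text does not spell out how the features $Pf_t$ are reduced for concatenation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPositionAttention(nn.Module):
    """QKV attention with a relational position feature, following formulas (3)-(6)."""

    def __init__(self, k: int, max_len: int, pos_dim: int, num_classes: int):
        super().__init__()
        # Query: a k x 1 vector matrix sampled from a uniform distribution.
        self.query = nn.Parameter(torch.rand(k, 1))
        # Position embedding matrix P, randomly initialized; indexed by the
        # shifted distance Pr - Pw between the trigger word and each word.
        self.pos_embed = nn.Embedding(2 * max_len + 1, pos_dim)
        self.max_len = max_len
        # Fully connected layer f(.) over the concatenation [Attention_r ; Pf].
        self.fc = nn.Linear(k + pos_dim, num_classes)

    def forward(self, key_w: torch.Tensor, value_w: torch.Tensor, trigger_pos: int):
        # key_w, value_w: (n, k) matrices built from the sentence word vectors
        # and the Bi-LSTM hidden-layer outputs, respectively.
        n = key_w.size(0)
        attn_w = F.softmax(key_w @ self.query, dim=0)     # formula (3): (n, 1)
        attn_r = (attn_w.T @ value_w).T                   # formula (4): (k, 1)
        # Formula (5): Pf_t = P[Pr - Pw], distances shifted to be non-negative
        # (the clipping to [-max_len, max_len] is an assumption).
        dist = (trigger_pos - torch.arange(n)).clamp(-self.max_len, self.max_len)
        pf = self.pos_embed(dist + self.max_len)          # (n, pos_dim)
        pf_pooled = pf.mean(dim=0)                        # assumption: mean-pool Pf_t
        # Formula (6): C = f(Attention_r ; P).
        return self.fc(torch.cat([attn_r.squeeze(1), pf_pooled]))
```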
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to medicine, with CNMER data as input. The specific application method is as follows: entity extraction is performed on the CNMER data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of entities such as disease signs and disease treatments and improving the classification accuracy of medical entities.
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to the military field, with AIR FORCE MIL-HDBK-310-1997 data as input. The specific application method is as follows: entity extraction is performed on the AIR FORCE MIL-HDBK-310-1997 data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of the different climate entities relevant to developing military products, solving the difficulty of relation extraction caused by overlapping entity relationships in the military-product climate field, and predicting the climates suitable for developing military products.
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to finance, with Lendingclub data as input. The specific application method is as follows: entity extraction is performed on the Lendingclub data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the entity classification of loan clients, loan businesses, and loan default factors, so as to comprehensively understand the development of loan financial events and predict the development of the loan financial market.
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to law: entity extraction is performed on the CALL2018 data using the method, completing the classification of crime-name, legal-provision, and sentence-term entities and improving the accuracy of criminal-law entity classification, so as to predict crime names, recommend legal provisions, and predict sentence terms.
The beneficial effects of the invention are as follows:
With the above scheme, multiple data samples conforming to the field are generated through BIO labeling and sampling, and the data set is then divided and normalized. The BERT network performs unsupervised pre-training on the text, and a Bi-LSTM is added for fine-tuning on the specific downstream task, yielding a stronger ability to capture contextual semantic information. Next, position feature information is added on top of the attention mechanism so that particular information is selectively attended to and information features are better extracted. Finally, entity relation identification and classification are realized. The entity relationship extraction method fusing the BERT network and position characteristic information takes the entity-overlapping nature of domain data into account and can achieve high-precision entity extraction in complex fields and with small numbers of samples. The invention performs entity extraction on the FB15K data set.
Drawings
FIG. 1 is a diagram of an improved attention model.
Fig. 2 is a diagram of an entity relationship extraction model fusing a BERT network and location feature information.
FIG. 3 is a basic BERT structure diagram for entity extraction.
FIG. 4 is a graph of ACC values under different models.
Detailed Description
The method for extracting the entity relationship fusing the BERT network and the position characteristic information comprises the following steps:
1) Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2) Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization;
3) Proposing a model structure: the proposed entity relationship extraction model fusing the BERT network and position characteristic information is shown in FIG. 2 and comprises a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer. First, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
3.1) The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:

BERT serves as the encoder of the input text sequence, and the hidden-layer state vector $X_t$ is obtained from BERT as shown in formula (1):

$$X_t = \mathrm{BERT}_{base}(w_t) \tag{1}$$

The result is denoted as the sequence $X = (x_1, x_2, \ldots, x_n)$.

The sequence $X$ obtained from the BERT network is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer, where $h_t$ introduces a corresponding memory cell in the hidden layer, as shown in formula (2):

$$\begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(C_t) \end{aligned} \tag{2}$$

where $h_{t-1}$ is the hidden-layer output of the previous LSTM unit; $C_{t-1}$ is the cell-state result of the previous unit; $x_t$ is the word-vector input of the text; $f_t$ is the output of the forget gate, with $\sigma$ the sigmoid activation function; $i_t$ and $\tilde{C}_t$ are the outputs of the input gate; $\tanh$ is the tanh activation function; $o_t$ is the output of the output gate; $C_t$ is the state value of the current cell; and $h_t$ is the hidden-layer output of the current cell.

The forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, denoted $H = (h_1, h_2, \ldots, h_n)$.

Finally, the hidden state sequence is mapped to dimension $s$ through the linear layer, and the mapped sequence is denoted $L = (l_1, l_2, \ldots, l_n)$, where $l_i$ represents the score of word $x_i$ for each corresponding category label $y_j$.
3.2) A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix.

The weight values of the entity relationship classification result are calculated with an attention-mechanism QKV model: the Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is obtained from the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors.

The attention weight values in entity relationship extraction are calculated according to formula (3):

$$Attention\_w_{n \times 1} = \mathrm{softmax}(key\_w_{n \times k} * query\_w_{k \times 1}) \tag{3}$$

where the softmax function performs the vector normalization; $key\_w_{n \times k}$ is the Key vector matrix of the attention mechanism; $query\_w_{k \times 1}$ is the Query vector matrix of the attention mechanism; and $Attention\_w_{n \times 1}$ is the attention weight value.

The attention output matrix in entity relationship extraction is calculated according to formula (4):

$$Attention\_r_{k \times 1} = (Attention\_w^{T} * value\_w_{n \times k})^{T} \tag{4}$$

where $value\_w_{n \times k}$ is the output-vector matrix of the Bi-LSTM hidden layer and $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism.

In addition, a relation is established between the attention output sequence and the BERT output sequence to distinguish the feature representations of the same entity under different relations and make the final relation classification prediction.

The distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix $P$ is then randomly initialized according to the maximum sentence length $m$ and the position-feature size $n$, and the relational position feature $Pf_t$ of each word is obtained by looking up $P$, calculated according to formula (5):

$$Pf_t = P_{Pr - Pw} \tag{5}$$

where $Pr$ is the position of the relation trigger word and $Pw$ is the position of the word in the BERT output sequence.

Finally, the position embedding matrix and the attention output matrix are concatenated at the fully connected layer to obtain the entity classification matrix, formula (6):

$$C = f(Attention\_r_{k \times 1}; P) \tag{6}$$

where $f(\cdot)$ denotes the fully connected layer; $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism; $P$ is the position embedding matrix; and $C$ is the current entity classification matrix.
The structure of the improved attention mechanism model is shown in FIG. 1. The Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors.
4) Offline training: training the model with the training set and a regularization strategy, and saving the optimal parameters of the entity classification matrix C;
5) Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
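The following is a minimal sketch of steps 4 and 5, reusing the BertBiLSTMScorer sketch given earlier; the optimizer, the weight-decay regularization, the loss function, and the file name are illustrative assumptions rather than the patented training procedure.

```python
import torch

def offline_train(model, train_loader, epochs: int = 10, lr: float = 2e-5):
    """Step 4: train with the training set and a regularization strategy."""
    # Weight decay stands in for the regularization strategy (assumption).
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for input_ids, attention_mask, labels in train_loader:
            opt.zero_grad()
            scores = model(input_ids, attention_mask)        # (batch, n, s)
            loss = loss_fn(scores.transpose(1, 2), labels)   # per-token labels (batch, n)
            loss.backward()
            opt.step()
    torch.save(model.state_dict(), "entity_model.pt")        # store optimal parameters

def online_finetune(model, path: str = "entity_model.pt"):
    """Step 5: parameter-sharing transfer, i.e. load the pre-trained
    parameters and then fine-tune the whole model on the new domain."""
    model.load_state_dict(torch.load(path))
    return model
```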
Example 1:
1. The theoretical basis of the scheme of the invention is as follows:
1. BERT network
BERT consists of three modules: the Embedding module on the left, the Transformer module in the middle, and the pre-fine-tuning module on the right. The general BERT structure used in entity extraction is shown in FIG. 3.
In entity extraction, the Embedding module includes three parts: the word embedding tensor (Token Embedding), the sentence-segmentation embedding tensor (Segment Embedding), and the position encoding tensor (Position Embedding). The output tensor of the entire Embedding module is the direct sum of these three tensors.
In entity extraction, BERT uses only the Encoder part of the classic Transformer architecture and discards the Decoder part entirely. The two pre-training tasks are likewise embodied in training the Transformer module. After processing by the middle Transformer layers, the last layer of BERT is adjusted according to the different requirements of entity extraction.
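The following is a minimal sketch of the Embedding module described above, whose output is the element-wise sum of the three tensors; the vocabulary size, maximum length, and hidden dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BertEmbedding(nn.Module):
    """Embedding module: output = Token + Segment + Position embeddings."""

    def __init__(self, vocab_size: int = 21128, max_len: int = 512, hidden: int = 768):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)    # Token Embedding
        self.segment = nn.Embedding(2, hidden)           # Segment Embedding (sentence A/B)
        self.position = nn.Embedding(max_len, hidden)    # Position Embedding

    def forward(self, input_ids: torch.Tensor, segment_ids: torch.Tensor):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        # The module output is the direct sum of the three tensors.
        return self.token(input_ids) + self.segment(segment_ids) + self.position(positions)
```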
2. The technical scheme of the invention comprises the following implementation processes:
1. Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2. Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization;
3. Proposing a model structure: the entity relationship extraction model fusing the BERT network and position characteristic information consists of a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer. First, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
3.1 The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:
The BERT model adopts multiple layers of Transformers as its encoding structure; compared with recurrent neural networks based on time sequences, BERT has a stronger ability to capture contextual semantic information and contains richer syntactic, semantic, and contextual information. BERT serves as the encoder of the input text sequence, and its output is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer. The forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, which is then mapped through the linear layer to dimension s, the number of label types in the label set.
3.2 A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix;
To attend to particular information and better extract information features, a relational position-feature attention mechanism is proposed. A QKV attention model calculates the attention weight values and the attention output matrix in entity relationship extraction. The distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix P is then randomly initialized according to the maximum sentence length m and the position-feature size n, and the relational position feature $Pf_t$ of each word is obtained by looking up P. Finally, entity classification is completed by the entity classification matrix C.
4. Off-line training: training a model by utilizing a training set and a regularization strategy and storing the optimal parameters of the entity classification matrix C;
5. Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
Evaluation indexes: in the field of entity relationship extraction, the precision and recall of different models are compared; when these two indexes do not directly determine which model performs better, the F1-Score is compared. Precision, recall, and F1-Score are calculated by formulas (7) to (9):
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{8}$$

$$F1\text{-}Score = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{9}$$

where TP is the number of samples that are actually true and predicted true; FP is the number that are actually false but predicted true (the false-alarm count); and FN is the number that are actually true but predicted false (the missed-report count).
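The following is a minimal sketch of formulas (7) to (9), taking the TP, FP, and FN counts defined above as inputs.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1-Score per formulas (7)-(9)."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # formula (7)
    recall = tp / (tp + fn) if tp + fn else 0.0      # formula (8)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # formula (9)
    return precision, recall, f1
```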
5.1 FB15K data set
FB15K is a subset of the knowledge graph Freebase and contains extensive general human knowledge. FB15K contains 14,951 entities and 592,213 triples, and the approximate ratio of the training set, validation set, and test set is 9:1:1. The experimental results are shown in Table 1.
TABLE 1 accuracy, recall and F1-Score values for different models
As can be seen from Table 1, BERT_BAP, the algorithm of the present invention, achieves the best precision, recall, and F1-Score on the data set. Compared with word embedding models, the BERT pre-trained network model better extracts the feature information between corpora.
FIG. 4 compares the ACC of the different models on the FB15K data set; BERT_BAP is the algorithm of the present invention. As training proceeds, the ACC value eventually converges to a stable value, and the figure shows that the ACC value of BERT_BAP is good: the BERT-based network model performs better overall.
The algorithm provided by the invention can be applied to the military field, the medical field, and others. Unsupervised pre-training is performed on the text through the BERT network, and a Bi-LSTM is then added to the specific downstream task for fine-tuning, giving a stronger ability to capture contextual semantic information. Position feature information is added on top of the attention mechanism so that particular information is selectively attended to and information features are better extracted. Entity relations are thereby identified and classified, achieving effective extraction of entity relationships.

Claims (6)

1. The method for extracting the entity relationship fusing the BERT network and the position characteristic information is characterized by comprising the following steps:
1) Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2) Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization;
3) Proposing a model structure: the entity relationship extraction model fusing the BERT network and position characteristic information consists of a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer; first, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
4) Offline training: training the model with the training set and a regularization strategy, and saving the optimal parameters of the entity classification matrix C;
5) Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
2. The method for extracting entity relationship fusing BERT network and location feature information according to claim 1, wherein in step 3), the specific method is as follows:
3.1) The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:

BERT serves as the encoder of the input text sequence, and the hidden-layer state vector $X_t$ is obtained from BERT as shown in formula (1):

$$X_t = \mathrm{BERT}_{base}(w_t) \tag{1}$$

The result is denoted as the sequence $X = (x_1, x_2, \ldots, x_n)$;

the sequence $X$ obtained from the BERT network is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer, where $h_t$ introduces a corresponding memory cell in the hidden layer, as shown in formula (2):

$$\begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(C_t) \end{aligned} \tag{2}$$

where $h_{t-1}$ is the hidden-layer output of the previous LSTM unit; $C_{t-1}$ is the cell-state result of the previous unit; $x_t$ is the word-vector input of the text; $f_t$ is the output of the forget gate, with $\sigma$ the sigmoid activation function; $i_t$ and $\tilde{C}_t$ are the outputs of the input gate; $\tanh$ is the tanh activation function; $o_t$ is the output of the output gate; $C_t$ is the state value of the current cell; and $h_t$ is the hidden-layer output of the current cell;

the forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, denoted $H = (h_1, h_2, \ldots, h_n)$;

finally, the hidden state sequence is mapped to dimension $s$ through the linear layer, and the mapped sequence is denoted $L = (l_1, l_2, \ldots, l_n)$, where $l_i$ represents the score of word $x_i$ for each corresponding category label $y_j$;
3.2) A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix C;

the weight values of the entity relationship classification result are calculated with an attention-mechanism QKV model: the Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is obtained from the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors;

the attention weight values in entity relationship extraction are calculated according to formula (3):

$$Attention\_w_{n \times 1} = \mathrm{softmax}(key\_w_{n \times k} * query\_w_{k \times 1}) \tag{3}$$

where the softmax function performs the vector normalization; $key\_w_{n \times k}$ is the Key vector matrix of the attention mechanism; $query\_w_{k \times 1}$ is the Query vector matrix of the attention mechanism; and $Attention\_w_{n \times 1}$ is the attention weight value;

the attention output matrix in entity relationship extraction is calculated according to formula (4):

$$Attention\_r_{k \times 1} = (Attention\_w^{T} * value\_w_{n \times k})^{T} \tag{4}$$

where $value\_w_{n \times k}$ is the output-vector matrix of the Bi-LSTM hidden layer and $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism;

in addition, a relation is established between the attention output sequence and the BERT output sequence to distinguish the feature representations of the same entity under different relations and make the final relation classification prediction;

the distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix $P$ is then randomly initialized according to the maximum sentence length $m$ and the position-feature size $n$, and the relational position feature $Pf_t$ of each word is obtained by looking up $P$, calculated according to formula (5):

$$Pf_t = P_{Pr - Pw} \tag{5}$$

where $Pr$ is the position of the relation trigger word and $Pw$ is the position of the word in the BERT output sequence;

finally, the position embedding matrix and the attention output matrix are concatenated at the fully connected layer to obtain the entity classification matrix, formula (6):

$$C = f(Attention\_r_{k \times 1}; P) \tag{6}$$

where $f(\cdot)$ denotes the fully connected layer; $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism; $P$ is the position embedding matrix; and $C$ is the current entity classification matrix.
3. The application in medicine of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the CNMER data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of disease-sign and disease-treatment entities.
4. The application in the military field of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the AIR FORCE MIL-HDBK-310-1997 data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of the different climate entities relevant to developing military products.
5. The application in finance of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the Lendingclub data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the entity classification of loan clients, loan businesses, and loan default factors.
6. The application in law of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the CALL2018 data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of crime-name, legal-provision, and sentence-term entities and improving the accuracy of criminal-law entity classification.
CN202210791774.5A 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof Pending CN115203434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210791774.5A CN115203434A (en) 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210791774.5A CN115203434A (en) 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof

Publications (1)

Publication Number Publication Date
CN115203434A true CN115203434A (en) 2022-10-18

Family

ID=83580085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210791774.5A Pending CN115203434A (en) 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof

Country Status (1)

Country Link
CN (1) CN115203434A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116227495A (en) * 2023-05-05 2023-06-06 公安部信息通信中心 Entity classification data processing system
CN116894436A (en) * 2023-09-06 2023-10-17 神州医疗科技股份有限公司 Data enhancement method and system based on medical named entity recognition
CN116894436B (en) * 2023-09-06 2023-12-15 神州医疗科技股份有限公司 Data enhancement method and system based on medical named entity recognition


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination