CN115203434A - Entity relationship extraction method fusing BERT network and position characteristic information and application thereof - Google Patents

Entity relationship extraction method fusing BERT network and position characteristic information and application thereof Download PDF

Info

Publication number
CN115203434A
Authority
CN
China
Prior art keywords
entity
bert
matrix
network
output
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210791774.5A
Other languages
Chinese (zh)
Inventor
冯勇
张宁
徐红艳
王嵘冰
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Liaoning University
Original Assignee
Liaoning University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Liaoning University filed Critical Liaoning University
Priority to CN202210791774.5A priority Critical patent/CN115203434A/en
Publication of CN115203434A publication Critical patent/CN115203434A/en
Pending legal-status Critical Current


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/088 Non-supervised learning, e.g. competitive learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Biomedical Technology (AREA)
  • Artificial Intelligence (AREA)
  • Biophysics (AREA)
  • Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Animal Behavior & Ethology (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

An entity relationship extraction method fusing a BERT network and position characteristic information, and application thereof, comprises the following steps: (1) entity data acquisition; (2) data processing; (3) proposing a model structure; (4) offline training; (5) online testing. The invention discloses an entity relationship extraction method fusing a BERT network and position characteristic information. The method addresses the problems of information redundancy, entity overlapping, and low accuracy in traditional entity extraction methods. It first uses the BERT network to perform unsupervised pre-training on text, and then adds a Bi-LSTM for fine-tuning on the specific downstream task. Next, position characteristic information is added on top of the attention mechanism so that particular information is selectively attended to. Finally, entity relation identification and classification are realized. Comparison experiments on the public data set FB15K verify that the disclosed method has higher accuracy and good application prospects.

Description

Entity relationship extraction method fusing BERT network and position characteristic information and application thereof
Technical Field
The invention relates to an entity relation extraction method fusing a BERT network and position characteristic information and application thereof, belonging to the field of knowledge extraction.
Background
With the development of knowledge graphs across multiple domains, analyzing and mining large amounts of heterogeneous data can greatly improve the comprehensive processing and utilization of data in most vertical domains, and entity relationship extraction is an important step in constructing large-scale domain knowledge graphs. Against the backdrop of the big data era, a knowledge graph must fuse complex and varied data. Traditional entity relationship extraction methods largely ignore the internal connection between entity recognition and relation classification, suffer from error propagation and information redundancy, and cannot effectively handle overlapping entity relationships. Deep learning methods have been introduced to address these problems; they can alleviate the shortcomings of relation extraction models based on traditional features to a certain extent and accumulate less error. However, because data volumes differ in scale across fields, directly applying deep learning to entity relationship extraction has many limitations, and related network models need to be improved to solve entity relationship extraction in specific application fields.
Disclosure of Invention
In order to solve the existing problems, the invention provides an entity relationship extraction method fusing a BERT network and position characteristic information and application thereof.
The purpose of the invention is realized by the following technical scheme:
the method for extracting the entity relationship fusing the BERT network and the position characteristic information comprises the following steps:
1) Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2) Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization (see the sketch following these steps);
3) Proposing a model structure: the entity relationship extraction model fusing the BERT network and position characteristic information consists of a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer. First, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
4) Offline training: training the model with the training set and a regularization strategy, and saving the optimal parameters of the entity classification matrix C;
5) Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
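As referenced in step 2), the following is a minimal sketch of the data-processing step, assuming scikit-learn-style splitting; the 8:1:1 split ratio, feature layout, and function names are illustrative assumptions rather than the patented procedure.

```python
import numpy as np
from sklearn.model_selection import train_test_split

def split_and_normalize(features: np.ndarray, labels: np.ndarray):
    """Split entity samples into train/validation/test and max-min normalize."""
    # Illustrative 8:1:1 split; stratification is omitted for brevity.
    x_train, x_rest, y_train, y_rest = train_test_split(
        features, labels, test_size=0.2, random_state=42)
    x_val, x_test, y_val, y_test = train_test_split(
        x_rest, y_rest, test_size=0.5, random_state=42)

    # Max-min normalization: scale each feature to [0, 1] using statistics
    # computed on the training set only, to avoid information leakage.
    f_min, f_max = x_train.min(axis=0), x_train.max(axis=0)
    span = np.where(f_max > f_min, f_max - f_min, 1.0)   # guard constant features
    norm = lambda x: (x - f_min) / span
    return (norm(x_train), y_train), (norm(x_val), y_val), (norm(x_test), y_test)
```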
In the step 3), the specific method is as follows:
3.1) The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:

BERT serves as the encoder of the input text sequence, and the hidden-layer state vector $X_t$ is obtained from BERT as shown in formula (1):

$$X_t = \mathrm{BERT}_{base}(w_t) \tag{1}$$

The result is denoted as the sequence $X = (x_1, x_2, \ldots, x_n)$;

the sequence $X$ obtained from the BERT network is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer, where $h_t$ introduces a corresponding memory cell in the hidden layer, as shown in formula (2):

$$\begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(C_t) \end{aligned} \tag{2}$$

where $h_{t-1}$ is the hidden-layer output of the previous LSTM unit; $C_{t-1}$ is the cell-state result of the previous unit; $x_t$ is the word-vector input of the text; $f_t$ is the output of the forget gate, with $\sigma$ the sigmoid activation function; $i_t$ and $\tilde{C}_t$ are the outputs of the input gate; $\tanh$ is the tanh activation function; $o_t$ is the output of the output gate; $C_t$ is the state value of the current cell; and $h_t$ is the hidden-layer output of the current cell;

the forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, denoted $H = (h_1, h_2, \ldots, h_n)$;

finally, the hidden state sequence is mapped to dimension $s$ through the linear layer, and the mapped sequence is denoted $L = (l_1, l_2, \ldots, l_n)$, where $l_i$ represents the score of word $x_i$ for each corresponding category label $y_j$;
3.2) A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix C;

the weight values of the entity relationship classification result are calculated with an attention-mechanism QKV model: the Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is obtained from the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors;

the attention weight values in entity relationship extraction are calculated according to formula (3):

$$Attention\_w_{n \times 1} = \mathrm{softmax}(key\_w_{n \times k} * query\_w_{k \times 1}) \tag{3}$$

where the softmax function performs the vector normalization; $key\_w_{n \times k}$ is the Key vector matrix of the attention mechanism; $query\_w_{k \times 1}$ is the Query vector matrix of the attention mechanism; and $Attention\_w_{n \times 1}$ is the attention weight value;

the attention output matrix in entity relationship extraction is calculated according to formula (4):

$$Attention\_r_{k \times 1} = (Attention\_w^{T} * value\_w_{n \times k})^{T} \tag{4}$$

where $value\_w_{n \times k}$ is the output-vector matrix of the Bi-LSTM hidden layer and $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism;

in addition, a relation is established between the attention output sequence and the BERT output sequence to distinguish the feature representations of the same entity under different relations and make the final relation classification prediction;

the distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix $P$ is then randomly initialized according to the maximum sentence length $m$ and the position-feature size $n$, and the relational position feature $Pf_t$ of each word is obtained by looking up $P$, calculated according to formula (5):

$$Pf_t = P_{Pr - Pw} \tag{5}$$

where $Pr$ is the position of the relation trigger word and $Pw$ is the position of the word in the BERT output sequence;

finally, the position embedding matrix and the attention output matrix are concatenated at the fully connected layer to obtain the entity classification matrix, formula (6):

$$C = f(Attention\_r_{k \times 1}; P) \tag{6}$$

where $f(\cdot)$ denotes the fully connected layer; $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism; $P$ is the position embedding matrix; and $C$ is the current entity classification matrix.
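The following is a minimal PyTorch sketch of the relational position-feature attention of step 3.2 for a single sentence. The Query initialization follows the uniform sampling of formula (3), while the distance clipping and the mean-pooling of the per-word position features before the fully connected layer are illustrative assumptions, since the text does not spell out how the features $Pf_t$ are reduced for concatenation.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class RelPositionAttention(nn.Module):
    """QKV attention with a relational position feature, following formulas (3)-(6)."""

    def __init__(self, k: int, max_len: int, pos_dim: int, num_classes: int):
        super().__init__()
        # Query: a k x 1 vector matrix sampled from a uniform distribution.
        self.query = nn.Parameter(torch.rand(k, 1))
        # Position embedding matrix P, randomly initialized; indexed by the
        # shifted distance Pr - Pw between the trigger word and each word.
        self.pos_embed = nn.Embedding(2 * max_len + 1, pos_dim)
        self.max_len = max_len
        # Fully connected layer f(.) over the concatenation [Attention_r ; Pf].
        self.fc = nn.Linear(k + pos_dim, num_classes)

    def forward(self, key_w: torch.Tensor, value_w: torch.Tensor, trigger_pos: int):
        # key_w, value_w: (n, k) matrices built from the sentence word vectors
        # and the Bi-LSTM hidden-layer outputs, respectively.
        n = key_w.size(0)
        attn_w = F.softmax(key_w @ self.query, dim=0)     # formula (3): (n, 1)
        attn_r = (attn_w.T @ value_w).T                   # formula (4): (k, 1)
        # Formula (5): Pf_t = P[Pr - Pw], distances shifted to be non-negative
        # (the clipping to [-max_len, max_len] is an assumption).
        dist = (trigger_pos - torch.arange(n)).clamp(-self.max_len, self.max_len)
        pf = self.pos_embed(dist + self.max_len)          # (n, pos_dim)
        pf_pooled = pf.mean(dim=0)                        # assumption: mean-pool Pf_t
        # Formula (6): C = f(Attention_r ; P).
        return self.fc(torch.cat([attn_r.squeeze(1), pf_pooled]))
```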
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to medicine, with CNMER data as input. The specific application method is as follows: entity extraction is performed on the CNMER data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of entities such as disease signs and disease treatments and improving the classification accuracy of medical entities.
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to the military field, with AIR FORCE MIL-HDBK-310-1997 data as input. The specific application method is as follows: entity extraction is performed on the AIR FORCE MIL-HDBK-310-1997 data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of the different climate entities relevant to developing military products, solving the difficulty of relation extraction caused by overlapping entity relationships in the military-product climate field, and predicting the climates suitable for developing military products.
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to finance, with Lendingclub data as input. The specific application method is as follows: entity extraction is performed on the Lendingclub data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the entity classification of loan clients, loan businesses, and loan default factors, so as to comprehensively understand the development of loan financial events and predict the development of the loan financial market.
The entity relationship extraction method fusing the BERT network and position characteristic information is applied to law: entity extraction is performed on the CALL2018 data using the method, completing the classification of crime-name, legal-provision, and sentence-term entities and improving the accuracy of criminal-law entity classification, so as to predict crime names, recommend legal provisions, and predict sentence terms.
The beneficial effects of the invention are as follows:
With the above scheme, multiple data samples conforming to the field are generated through BIO labeling and sampling, and the data set is then divided and normalized. The BERT network performs unsupervised pre-training on the text, and a Bi-LSTM is added for fine-tuning on the specific downstream task, yielding a stronger ability to capture contextual semantic information. Next, position feature information is added on top of the attention mechanism so that particular information is selectively attended to and information features are better extracted. Finally, entity relation identification and classification are realized. The entity relationship extraction method fusing the BERT network and position characteristic information takes the entity-overlapping nature of domain data into account and can achieve high-precision entity extraction in complex fields and with small numbers of samples. The invention performs entity extraction on the FB15K data set.
Drawings
FIG. 1 is a diagram of an improved attention model.
Fig. 2 is a diagram of an entity relationship extraction model fusing a BERT network and location feature information.
FIG. 3 is a basic BERT structure diagram for entity extraction.
FIG. 4 is a graph of ACC values under different models.
Detailed Description
The method for extracting the entity relationship fusing the BERT network and the position characteristic information comprises the following steps:
1) Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2) Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization;
3) Proposing a model structure: the proposed entity relationship extraction model fusing the BERT network and position characteristic information is shown in FIG. 2 and comprises a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer. First, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
3.1) The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:

BERT serves as the encoder of the input text sequence, and the hidden-layer state vector $X_t$ is obtained from BERT as shown in formula (1):

$$X_t = \mathrm{BERT}_{base}(w_t) \tag{1}$$

The result is denoted as the sequence $X = (x_1, x_2, \ldots, x_n)$.

The sequence $X$ obtained from the BERT network is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer, where $h_t$ introduces a corresponding memory cell in the hidden layer, as shown in formula (2):

$$\begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(C_t) \end{aligned} \tag{2}$$

where $h_{t-1}$ is the hidden-layer output of the previous LSTM unit; $C_{t-1}$ is the cell-state result of the previous unit; $x_t$ is the word-vector input of the text; $f_t$ is the output of the forget gate, with $\sigma$ the sigmoid activation function; $i_t$ and $\tilde{C}_t$ are the outputs of the input gate; $\tanh$ is the tanh activation function; $o_t$ is the output of the output gate; $C_t$ is the state value of the current cell; and $h_t$ is the hidden-layer output of the current cell.

The forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, denoted $H = (h_1, h_2, \ldots, h_n)$.

Finally, the hidden state sequence is mapped to dimension $s$ through the linear layer, and the mapped sequence is denoted $L = (l_1, l_2, \ldots, l_n)$, where $l_i$ represents the score of word $x_i$ for each corresponding category label $y_j$.
3.2) A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix.

The weight values of the entity relationship classification result are calculated with an attention-mechanism QKV model: the Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is obtained from the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors.

The attention weight values in entity relationship extraction are calculated according to formula (3):

$$Attention\_w_{n \times 1} = \mathrm{softmax}(key\_w_{n \times k} * query\_w_{k \times 1}) \tag{3}$$

where the softmax function performs the vector normalization; $key\_w_{n \times k}$ is the Key vector matrix of the attention mechanism; $query\_w_{k \times 1}$ is the Query vector matrix of the attention mechanism; and $Attention\_w_{n \times 1}$ is the attention weight value.

The attention output matrix in entity relationship extraction is calculated according to formula (4):

$$Attention\_r_{k \times 1} = (Attention\_w^{T} * value\_w_{n \times k})^{T} \tag{4}$$

where $value\_w_{n \times k}$ is the output-vector matrix of the Bi-LSTM hidden layer and $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism.

In addition, a relation is established between the attention output sequence and the BERT output sequence to distinguish the feature representations of the same entity under different relations and make the final relation classification prediction.

The distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix $P$ is then randomly initialized according to the maximum sentence length $m$ and the position-feature size $n$, and the relational position feature $Pf_t$ of each word is obtained by looking up $P$, calculated according to formula (5):

$$Pf_t = P_{Pr - Pw} \tag{5}$$

where $Pr$ is the position of the relation trigger word and $Pw$ is the position of the word in the BERT output sequence.

Finally, the position embedding matrix and the attention output matrix are concatenated at the fully connected layer to obtain the entity classification matrix, formula (6):

$$C = f(Attention\_r_{k \times 1}; P) \tag{6}$$

where $f(\cdot)$ denotes the fully connected layer; $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism; $P$ is the position embedding matrix; and $C$ is the current entity classification matrix.
The structure of the improved attention mechanism model is shown in FIG. 1. The Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors.
4) Offline training: training the model with the training set and a regularization strategy, and saving the optimal parameters of the entity classification matrix C;
5) Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
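The following is a minimal sketch of steps 4 and 5, reusing the BertBiLSTMScorer sketch given earlier; the optimizer, the weight-decay regularization, the loss function, and the file name are illustrative assumptions rather than the patented training procedure.

```python
import torch

def offline_train(model, train_loader, epochs: int = 10, lr: float = 2e-5):
    """Step 4: train with the training set and a regularization strategy."""
    # Weight decay stands in for the regularization strategy (assumption).
    opt = torch.optim.AdamW(model.parameters(), lr=lr, weight_decay=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    for _ in range(epochs):
        for input_ids, attention_mask, labels in train_loader:
            opt.zero_grad()
            scores = model(input_ids, attention_mask)        # (batch, n, s)
            loss = loss_fn(scores.transpose(1, 2), labels)   # per-token labels (batch, n)
            loss.backward()
            opt.step()
    torch.save(model.state_dict(), "entity_model.pt")        # store optimal parameters

def online_finetune(model, path: str = "entity_model.pt"):
    """Step 5: parameter-sharing transfer, i.e. load the pre-trained
    parameters and then fine-tune the whole model on the new domain."""
    model.load_state_dict(torch.load(path))
    return model
```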
Example 1:
1. The theoretical basis of the scheme of the invention is as follows:
1. BERT network
BERT consists of three modules: the Embedding module on the left, the Transformer module in the middle, and the pre-fine-tuning module on the right. The general BERT structure used in entity extraction is shown in FIG. 3.
In entity extraction, the Embedding module includes three parts: the word embedding tensor (Token Embedding), the sentence-segmentation embedding tensor (Segment Embedding), and the position encoding tensor (Position Embedding). The output tensor of the entire Embedding module is the direct sum of these three tensors.
In entity extraction, BERT uses only the Encoder part of the classic Transformer architecture and discards the Decoder part entirely. The two pre-training tasks are likewise embodied in training the Transformer module. After processing by the middle Transformer layers, the last layer of BERT is adjusted according to the different requirements of entity extraction.
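The following is a minimal sketch of the Embedding module described above, whose output is the element-wise sum of the three tensors; the vocabulary size, maximum length, and hidden dimension are illustrative assumptions.

```python
import torch
import torch.nn as nn

class BertEmbedding(nn.Module):
    """Embedding module: output = Token + Segment + Position embeddings."""

    def __init__(self, vocab_size: int = 21128, max_len: int = 512, hidden: int = 768):
        super().__init__()
        self.token = nn.Embedding(vocab_size, hidden)    # Token Embedding
        self.segment = nn.Embedding(2, hidden)           # Segment Embedding (sentence A/B)
        self.position = nn.Embedding(max_len, hidden)    # Position Embedding

    def forward(self, input_ids: torch.Tensor, segment_ids: torch.Tensor):
        positions = torch.arange(input_ids.size(1), device=input_ids.device)
        # The module output is the direct sum of the three tensors.
        return self.token(input_ids) + self.segment(segment_ids) + self.position(positions)
```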
2. The technical scheme of the invention comprises the following implementation processes:
1. Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2. Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization;
3. Proposing a model structure: the entity relationship extraction model fusing the BERT network and position characteristic information consists of a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer. First, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
3.1 The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:
The BERT model adopts multiple layers of Transformers as its encoding structure; compared with recurrent neural networks based on time sequences, BERT has a stronger ability to capture contextual semantic information and contains richer syntactic, semantic, and contextual information. BERT serves as the encoder of the input text sequence, and its output is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer. The forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, which is then mapped through the linear layer to dimension s, the number of label types in the label set.
3.2 A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix;
To attend to particular information and better extract information features, a relational position-feature attention mechanism is proposed. A QKV attention model calculates the attention weight values and the attention output matrix in entity relationship extraction. The distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix P is then randomly initialized according to the maximum sentence length m and the position-feature size n, and the relational position feature $Pf_t$ of each word is obtained by looking up P. Finally, entity classification is completed by the entity classification matrix C.
4. Off-line training: training a model by utilizing a training set and a regularization strategy and storing the optimal parameters of the entity classification matrix C;
5. Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
Evaluation indexes: in the field of entity relationship extraction, the precision and recall of different models are compared; when these two indexes do not directly determine which model performs better, the F1-Score is compared. Precision, recall, and F1-Score are calculated by formulas (7) to (9):
$$\mathrm{precision} = \frac{TP}{TP + FP} \tag{7}$$

$$\mathrm{recall} = \frac{TP}{TP + FN} \tag{8}$$

$$F1\text{-}Score = \frac{2 \times \mathrm{precision} \times \mathrm{recall}}{\mathrm{precision} + \mathrm{recall}} \tag{9}$$

where TP is the number of samples that are actually true and predicted true; FP is the number that are actually false but predicted true (the false-alarm count); and FN is the number that are actually true but predicted false (the missed-report count).
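The following is a minimal sketch of formulas (7) to (9), taking the TP, FP, and FN counts defined above as inputs.

```python
def precision_recall_f1(tp: int, fp: int, fn: int):
    """Compute precision, recall, and F1-Score per formulas (7)-(9)."""
    precision = tp / (tp + fp) if tp + fp else 0.0   # formula (7)
    recall = tp / (tp + fn) if tp + fn else 0.0      # formula (8)
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)            # formula (9)
    return precision, recall, f1
```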
5.1 FB15K data set
FB15K is a subset of the knowledge graph Freebase and contains extensive general human knowledge. FB15K contains 14,951 entities and 592,213 triples, and the approximate ratio of the training set, validation set, and test set is 9:1:1. The experimental results are shown in Table 1.
TABLE 1 accuracy, recall and F1-Score values for different models
As can be seen from Table 1, BERT_BAP, the algorithm of the present invention, achieves the best precision, recall, and F1-Score on the data set. Compared with word embedding models, the BERT pre-trained network model better extracts the feature information between corpora.
FIG. 4 compares the ACC of the different models on the FB15K data set; BERT_BAP is the algorithm of the present invention. As training proceeds, the ACC value eventually converges to a stable value, and the figure shows that the ACC value of BERT_BAP is good: the BERT-based network model performs better overall.
The algorithm provided by the invention can be applied to the military field, the medical field, and others. Unsupervised pre-training is performed on the text through the BERT network, and a Bi-LSTM is then added to the specific downstream task for fine-tuning, giving a stronger ability to capture contextual semantic information. Position feature information is added on top of the attention mechanism so that particular information is selectively attended to and information features are better extracted. Entity relations are thereby identified and classified, achieving effective extraction of entity relationships.

Claims (6)

1. The method for extracting the entity relationship fusing the BERT network and the position characteristic information is characterized by comprising the following steps:
1) Entity data acquisition: acquiring a publicly available data set in a professional vertical field, and sampling according to BIO (Begin-Inside-Outside) labels;
2) Data processing: dividing the obtained entity samples into a training set, a validation set, and a test set, and performing max-min normalization;
3) Proposing a model structure: the entity relationship extraction model fusing the BERT network and position characteristic information consists of a BERT model, a Bi-LSTM, a linear layer, an Attention layer, and a fully connected layer; first, the BERT layer captures strong contextual semantic information; second, the Bi-LSTM concatenates the complete hidden state sequence over all time steps, which is mapped through the linear layer to obtain the score of each category label; then the Attention layer assigns weights to selectively attend to particular information, and the selected information is concatenated with the position embedding matrix P in the fully connected layer to distinguish the feature representations of the same entity under different relations; finally, entity classification prediction is completed through the entity classification matrix C;
4) Offline training: training the model with the training set and a regularization strategy, and saving the optimal parameters of the entity classification matrix C;
5) Online testing: applying the test set to verify model performance, or fine-tuning the whole model by loading the pre-trained parameters; parameter-sharing transfer learning enables timely training of the model.
2. The method for extracting entity relationship fusing BERT network and location feature information according to claim 1, wherein in step 3), the specific method is as follows:
3.1) The data passes through the BERT model and the Bi-LSTM to obtain the score of each category label at the word level:

BERT serves as the encoder of the input text sequence, and the hidden-layer state vector $X_t$ is obtained from BERT as shown in formula (1):

$$X_t = \mathrm{BERT}_{base}(w_t) \tag{1}$$

The result is denoted as the sequence $X = (x_1, x_2, \ldots, x_n)$;

the sequence $X$ obtained from the BERT network is taken as the input of each time step of the bidirectional long short-term memory network Bi-LSTM to obtain the forward hidden state sequence $\overrightarrow{h_t}$ and the backward hidden state sequence $\overleftarrow{h_t}$ of the Bi-LSTM layer, where $h_t$ introduces a corresponding memory cell in the hidden layer, as shown in formula (2):

$$\begin{aligned} f_t &= \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \\ i_t &= \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \\ \tilde{C}_t &= \tanh(W_C \cdot [h_{t-1}, x_t] + b_C) \\ C_t &= f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \\ o_t &= \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \\ h_t &= o_t \odot \tanh(C_t) \end{aligned} \tag{2}$$

where $h_{t-1}$ is the hidden-layer output of the previous LSTM unit; $C_{t-1}$ is the cell-state result of the previous unit; $x_t$ is the word-vector input of the text; $f_t$ is the output of the forget gate, with $\sigma$ the sigmoid activation function; $i_t$ and $\tilde{C}_t$ are the outputs of the input gate; $\tanh$ is the tanh activation function; $o_t$ is the output of the output gate; $C_t$ is the state value of the current cell; and $h_t$ is the hidden-layer output of the current cell;

the forward and backward hidden state sequences are concatenated at each time step to obtain the complete hidden state sequence, denoted $H = (h_1, h_2, \ldots, h_n)$;

finally, the hidden state sequence is mapped to dimension $s$ through the linear layer, and the mapped sequence is denoted $L = (l_1, l_2, \ldots, l_n)$, where $l_i$ represents the score of word $x_i$ for each corresponding category label $y_j$;
3.2) A relational position-feature attention mechanism is proposed: the score of each category label is input into the relational position-feature attention mechanism to obtain the entity classification matrix C;

the weight values of the entity relationship classification result are calculated with an attention-mechanism QKV model: the Query matrix is a vector matrix $query_{k \times 1}$ randomly sampled from a uniform distribution, where $k$ is the output-vector dimension of the Bi-LSTM hidden layer; the Key matrix is obtained from the feature matrix generated by the word vectors of the Chinese word segments in the sentence; and the Value matrix is formed from the Bi-LSTM hidden-layer output vectors;

the attention weight values in entity relationship extraction are calculated according to formula (3):

$$Attention\_w_{n \times 1} = \mathrm{softmax}(key\_w_{n \times k} * query\_w_{k \times 1}) \tag{3}$$

where the softmax function performs the vector normalization; $key\_w_{n \times k}$ is the Key vector matrix of the attention mechanism; $query\_w_{k \times 1}$ is the Query vector matrix of the attention mechanism; and $Attention\_w_{n \times 1}$ is the attention weight value;

the attention output matrix in entity relationship extraction is calculated according to formula (4):

$$Attention\_r_{k \times 1} = (Attention\_w^{T} * value\_w_{n \times k})^{T} \tag{4}$$

where $value\_w_{n \times k}$ is the output-vector matrix of the Bi-LSTM hidden layer and $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism;

in addition, a relation is established between the attention output sequence and the BERT output sequence to distinguish the feature representations of the same entity under different relations and make the final relation classification prediction;

the distance between each word in the BERT output sequence and the trigger word of the current attention output sequence is calculated; a position embedding matrix $P$ is then randomly initialized according to the maximum sentence length $m$ and the position-feature size $n$, and the relational position feature $Pf_t$ of each word is obtained by looking up $P$, calculated according to formula (5):

$$Pf_t = P_{Pr - Pw} \tag{5}$$

where $Pr$ is the position of the relation trigger word and $Pw$ is the position of the word in the BERT output sequence;

finally, the position embedding matrix and the attention output matrix are concatenated at the fully connected layer to obtain the entity classification matrix, formula (6):

$$C = f(Attention\_r_{k \times 1}; P) \tag{6}$$

where $f(\cdot)$ denotes the fully connected layer; $Attention\_r_{k \times 1}$ is the output vector matrix of the attention mechanism; $P$ is the position embedding matrix; and $C$ is the current entity classification matrix.
3. The application in medicine of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the CNMER data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of disease-sign and disease-treatment entities.
4. The application in the military field of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the AIR FORCE MIL-HDBK-310-1997 data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of the different climate entities relevant to developing military products.
5. The application in finance of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the Lendingclub data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the entity classification of loan clients, loan businesses, and loan default factors.
6. The application in law of the entity relationship extraction method fusing the BERT network and the position characteristic information as claimed in claim 2, wherein the specific application method is as follows: entity extraction is performed on the CALL2018 data using the entity relationship extraction method fusing the BERT network and position characteristic information, completing the classification of crime-name, legal-provision, and sentence-term entities and improving the accuracy of criminal-law entity classification.
CN202210791774.5A 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof Pending CN115203434A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210791774.5A CN115203434A (en) 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210791774.5A CN115203434A (en) 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof

Publications (1)

Publication Number Publication Date
CN115203434A true CN115203434A (en) 2022-10-18

Family

ID=83580085

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210791774.5A Pending CN115203434A (en) 2022-07-07 2022-07-07 Entity relationship extraction method fusing BERT network and position characteristic information and application thereof

Country Status (1)

Country Link
CN (1) CN115203434A (en)


Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116227495A (en) * 2023-05-05 2023-06-06 公安部信息通信中心 Entity classification data processing system
CN116894436A (en) * 2023-09-06 2023-10-17 神州医疗科技股份有限公司 Data enhancement method and system based on medical named entity recognition
CN116894436B (en) * 2023-09-06 2023-12-15 神州医疗科技股份有限公司 Data enhancement method and system based on medical named entity recognition


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination