CN113987150A - Bert-based multi-layer attention mechanism relation extraction method - Google Patents

Bert-based multi-layer attention mechanism relation extraction method

Info

Publication number
CN113987150A
Authority
CN
China
Prior art keywords
layer
attention mechanism
bert
extraction method
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111275258.9A
Other languages
Chinese (zh)
Inventor
王伟 (Wang Wei)
陈加杰 (Chen Jiajie)
孙思明 (Sun Siming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN202111275258.9A
Publication of CN113987150A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Animal Behavior & Ethology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an entity relation extraction method, in particular to a bert-based multi-layer attention mechanism relation extraction method, which comprises the following steps: step S1: acquiring sample data; step S2: dividing the sample data; step S3: replacing the entities in the sample data with #; step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample; step S5: inputting the training samples into a BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features; step S6: applying a word-level attention mechanism to the training samples to weight their local features; step S7: applying a sentence-level attention mechanism to weight the global features; step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.

Description

Bert-based multi-layer attention mechanism relation extraction method
[Technical Field]
The invention relates to an entity relationship extraction method, in particular to a bert-based multi-layer attention mechanism relationship extraction method.
[Background of the Invention]
In the field of natural language processing, the construction of knowledge graphs plays a very important role in the development of artificial intelligence. Within knowledge graph construction, knowledge extraction is the core foundation and mainly comprises three subtasks: entity extraction, relation extraction and event extraction, among which entity relation extraction is a core task and an important link of knowledge extraction. The main objective of entity relation extraction is to identify and judge the specific relation between entity pairs in natural language text, thereby providing basic support for intelligent retrieval, semantic analysis and the like and improving search efficiency.
Supervised relation extraction is currently considered the approach with the better extraction effect: the relation extraction task is treated as a classification problem, effective features are designed from the training data so that a classification model can be learned, and the trained model is then used to predict relations.
Therefore, the prior art is not sufficient and needs to be improved.
[Summary of the Invention]
In order to overcome the technical problems, the invention provides a bert-based multi-layer attention mechanism relation extraction method.
The invention provides a bert-based multi-layer attention mechanism relation extraction method to solve the above technical problem, comprising the following steps:
step S1: acquiring sample data;
step S2: dividing the sample data;
step S3: replacing the entities in the sample data with #;
step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample;
step S5: inputting the training samples into a BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features;
step S6: applying a word-level attention mechanism to the training samples and weighting their local features;
step S7: applying a sentence-level attention mechanism and weighting the global features;
step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.
Preferably, the BERT language model includes an Embedding layer, a bidirectional Transformer encoder, and an output layer.
Preferably, the Embedding layer is formed by adding Token Embedding, Segment Embedding and Position Embedding.
Preferably, the bidirectional Transformer encoder is a multi-layer coding unit stack.
Preferably, the encoding unit includes a self-attention mechanism module, a residual network layer, a layer normalization structure, and a Dropout layer.
Preferably, the calculation formula of the layer normalization structure is:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β
Preferably, the calculation formula of the residual network layer is:
FFN = max(0, xW_1 + b_1) W_2 + b_2
compared with the prior art, the multi-layer attention mechanism relation extraction method based on bert has the following advantages:
the method can accurately extract the entity relationship from the unstructured text, is favorable for enriching the data of the entity relationship in the knowledge map, enriches the semantic representation of words through a BERT language model, is favorable for utilizing the local characteristics and the global characteristics of the text through increasing the attention mechanism of the words and the sentences, and is favorable for improving the identification accuracy.
The data processing mode of connecting two entities and a sentence by $ and replacing the entity name in the sentence by # in the data preprocessing stage enables the model to learn semantic information of other words related to the entity relationship better, thereby improving the accuracy of model identification.
[Description of the Drawings]
FIG. 1 is a detailed flow chart of the multi-layer attention mechanism relationship extraction method based on bert in the invention.
[Detailed Description of the Embodiments]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a bert-based multi-layer attention mechanism relation extraction method, comprising the following steps:
step S1: acquiring sample data;
step S2: dividing the sample data;
step S3: replacing the entities in the sample data with #;
step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample (a non-limiting preprocessing sketch follows this list);
step S5: inputting the training samples into the BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features;
step S6: applying a word-level attention mechanism to the training samples and weighting their local features;
step S7: applying a sentence-level attention mechanism and weighting the global features;
step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.
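By way of a non-limiting illustration of the preprocessing in steps S3 and S4, the following Python sketch shows one possible serialization of a training sample; the separator placement, function name and example sentence are assumptions for illustration only, not a format fixed by the invention:

def build_training_sample(sentence: str, head: str, tail: str) -> str:
    # Step S3: replace the entity mentions inside the sentence with '#'
    masked = sentence.replace(head, "#").replace(tail, "#")
    # Step S4: concatenate the entity pair and the masked sentence with '$'
    return f"{head} $ {tail} $ {masked}"

sample = build_training_sample("小明出生于北京。", "小明", "北京")
# sample == "小明 $ 北京 $ #出生于#。"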
Further, in step S5, the BERT language model includes an Embedding layer, a bidirectional Transformer encoder and an output layer. The Embedding layer is obtained by adding the Token Embedding, Segment Embedding and Position Embedding, and the Position vector is calculated as shown in the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) (1)
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)) (2)
In the above formulas, 2i represents an even number, 2i + 1 represents an odd number, and d_model is 64.
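As an illustrative sketch of formulas (1) and (2) and the Embedding-layer sum, the following minimal PyTorch example is given; the tensor shapes and helper name are assumptions, not the invention's exact implementation:

import torch

def positional_encoding(max_len: int, d_model: int = 64) -> torch.Tensor:
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimensions 2i
    angle = pos / torch.pow(torch.tensor(10000.0), i / d_model)     # (max_len, d_model/2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)   # even dimensions
    pe[:, 1::2] = torch.cos(angle)   # odd dimensions
    return pe

# The Embedding layer output is then the element-wise sum:
# embedding = token_embedding + segment_embedding + positional_encoding(max_len, d_model)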
Further, the bidirectional Transformer encoder is used to extract text semantic features. It is a stack of multi-layer encoding units, each of which includes a self-attention mechanism module, a residual network layer, a layer normalization structure and a Dropout layer. The self-attention mechanism module adjusts the weight coefficient matrix according to the degree of association between words in the same sentence to obtain the word representations, so that each word representation contains information about the other words in the sentence, has contextual association and better global coverage, as shown in the following formula:
Attention(Q, K, V) = softmax(QK^T / √d_k) V (3)
where Q, K and V are word vector matrices and d_k is the Embedding dimension. The multi-head attention mechanism projects Q, K and V through several different linear transformations and finally concatenates the different attention results, as follows:
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_n) W^O (4)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (5)
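A minimal PyTorch sketch of formulas (3) to (5) is given below; the head count and dimensions are assumptions for illustration, not values prescribed by the invention:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    # Scaled dot-product attention (3) over n heads, concatenated and projected by W^O, formulas (4)-(5).
    def __init__(self, d_model: int = 768, num_heads: int = 12):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, m, _ = x.shape
        # project and split into heads: (b, num_heads, m, d_k)
        q, k, v = (w(x).view(b, m, self.num_heads, self.d_k).transpose(1, 2)
                   for w in (self.w_q, self.w_k, self.w_v))
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)   # QK^T / sqrt(d_k)
        heads = torch.matmul(F.softmax(scores, dim=-1), v)                    # softmax(...) V
        heads = heads.transpose(1, 2).contiguous().view(b, m, -1)             # Concat(head_1, ..., head_n)
        return self.w_o(heads)                                                # projection by W^O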
the layer normalization structure, i.e. the LN layer, is calculated as follows:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β (6)
the residual network layer is calculated as follows:
FFN = max(0, xW_1 + b_1) W_2 + b_2 (7)
the layer normalization structure and the residual error network layer mainly solve the degradation problem in deep learning.
Further, in step S6: in relation extraction, the semantic information of each Chinese character contained in a relation instance influences the task to a different degree, so a character-level (word) attention mechanism is introduced here to automatically focus on the specific Chinese characters that are decisive for relation extraction and to capture the important semantic information in the instance. Given the word vector matrix H = (h_1, h_2, h_3, ..., h_m) output by the BERT language model, a sentence vector is obtained by weighting each word relative to the sentence:
s = Σ_{i=1}^{m} α_i h_i (8)
where m is the length of the sentence and α_i is the character-level attention weight, with α_i ∈ [0, 1] and
Σ_{i=1}^{m} α_i = 1
α_i is calculated by a feed-forward neural network as:
μ_i = tanh(W_C h_i + b_C) (9)
where W_C is the word correlation weight matrix and b_C is the bias value. The calculated μ_1, μ_2, ..., μ_m are then normalized with softmax to obtain the weights α_i:
α_i = exp(μ_i) / Σ_{j=1}^{m} exp(μ_j) (10)
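A minimal PyTorch sketch of the character-level attention in formulas (8) to (10) follows, assuming the token vectors come from the BERT layer with shape (batch, m, d_model); the class name and default dimension are assumptions:

import torch
import torch.nn as nn

class WordAttention(nn.Module):
    def __init__(self, d_model: int = 768):
        super().__init__()
        self.w_c = nn.Linear(d_model, 1)    # W_C and b_C of formula (9)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, m, d_model) token vectors h_1 ... h_m
        mu = torch.tanh(self.w_c(h))        # mu_i = tanh(W_C h_i + b_C)
        alpha = torch.softmax(mu, dim=1)    # alpha_i in [0, 1], summing to 1 over the m tokens
        return (alpha * h).sum(dim=1)       # s = sum_i alpha_i h_i, formula (8)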
Further, in step S7: suppose the set of k sentences containing the same entity pair is S = {s_1, s_2, ..., s_k}. To highlight the relevance of all sentences to the predicted relation, the set vector R can be expressed as a weighted sum of all the sentence vectors in S:
R = Σ_{i=1}^{k} β_i s_i (11)
where β_i ∈ [0, 1] is the weight of each sentence in the set relative to the predicted relation, and
Σ_{i=1}^{k} β_i = 1
by calculating the inner product of the sentence vector and the entity relationship vector and normalizing the inner product by using a softmax function, the attention weight beta of the sentence layer can be obtainedi
ai=softmax(siAvr) (12)
Where A is a randomly initialized weighted diagonal matrix, vrIs a vector representation of the relation r.
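The sentence-level attention of formulas (11) and (12) can be sketched in PyTorch as follows; storing A as its diagonal and keeping one learned v_r per relation are implementation assumptions, not details fixed by the invention:

import torch
import torch.nn as nn

class SentenceAttention(nn.Module):
    def __init__(self, d_model: int = 768, num_relations: int = 12):
        super().__init__()
        self.A_diag = nn.Parameter(torch.randn(d_model))              # diagonal of the matrix A
        self.v_r = nn.Parameter(torch.randn(num_relations, d_model))  # relation vectors v_r

    def forward(self, s: torch.Tensor, r: int) -> torch.Tensor:
        # s: (k, d_model) sentence vectors s_1 ... s_k; r: index of the candidate relation
        scores = (s * self.A_diag) @ self.v_r[r]        # s_i A v_r
        beta = torch.softmax(scores, dim=0)             # beta_i, formula (12)
        return (beta.unsqueeze(1) * s).sum(dim=0)       # R = sum_i beta_i s_i, formula (11)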
The resulting vector R is then passed through a softmax layer to calculate the conditional probability p(r | R) of the predicted relation:
p(r | R) = softmax(WR + b) (13)
where W is the relation representation matrix and b is the bias.
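Formula (13) amounts to a fully connected layer followed by softmax; a minimal sketch with assumed dimensions and class name is:

import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, d_model: int = 768, num_relations: int = 12):
        super().__init__()
        self.fc = nn.Linear(d_model, num_relations)     # W and b of formula (13)

    def forward(self, R: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.fc(R), dim=-1)        # p(r | R) = softmax(WR + b)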
During training, cross entropy is adopted as the objective loss function, and L2 regularization is used to constrain the parameter magnitudes, which relieves the overfitting problem; the loss function is calculated as follows:
J(θ) = −Σ_i log p(y_i | s_i, θ) + λ‖θ‖₂² (14)
in the above formula, θ represents a parameter of a parameter trained in the model, yiThe number of the real labels is shown, and lambda is a regularization coefficient.
It can be understood that the invention calculates all parameter gradients through back propagation and minimizes the loss function using the Adam algorithm, thereby achieving smooth updates of the parameters. In addition, in order to avoid overfitting, dropout is added to the fully connected layer, so that hidden-layer nodes are dropped from training at a certain rate, reducing the dependence of the parameters on a fixed network structure.
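A minimal training sketch of the setup described above (cross entropy, L2 regularization via weight decay, Adam updates, dropout before the fully connected layer) is given below; the learning rate, weight-decay coefficient, dropout rate, model layout and data loader are assumptions, not values fixed by the invention:

import torch
import torch.nn as nn

def train_epoch(model: nn.Module, train_loader) -> None:
    # model: e.g. nn.Sequential(nn.Dropout(0.5), nn.Linear(768, num_relations)), producing logits
    criterion = nn.CrossEntropyLoss()   # cross entropy over softmax(WR + b), as in formula (14)
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)  # lambda * ||theta||^2
    model.train()
    for bag_vectors, labels in train_loader:   # hypothetical DataLoader yielding (R vectors, relation labels)
        logits = model(bag_vectors)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()                        # back-propagate all parameter gradients
        optimizer.step()                       # Adam update of the parameters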
Compared with the prior art, the multi-layer attention mechanism relation extraction method based on bert has the following advantages:
the method can accurately extract the entity relationship from the unstructured text, is favorable for enriching the data of the entity relationship in the knowledge map, enriches the semantic representation of words through a BERT language model, is favorable for utilizing the local characteristics and the global characteristics of the text through increasing the attention mechanism of the words and the sentences, and is favorable for improving the identification accuracy.
The data processing mode of connecting two entities and a sentence by $ and replacing the entity name in the sentence by # in the data preprocessing stage enables the model to learn semantic information of other words related to the entity relationship better, thereby improving the accuracy of model identification.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit of the present invention should be included in the scope of the present invention.

Claims (7)

1. A multi-layer attention mechanism relation extraction method based on bert is characterized in that: the method for extracting the relation of the multi-layer attention mechanism based on the bert comprises the following steps:
step S1: acquiring sample data;
step S2: dividing the sample data;
step S3: replacing the entities in the sample data with #;
step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample;
step S5: inputting the training samples into a BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features;
step S6: applying a word-level attention mechanism to the training samples and weighting their local features;
step S7: applying a sentence-level attention mechanism and weighting the global features;
step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.
2. The bert-based multi-tiered attention mechanism relationship extraction method of claim 1, wherein: the BERT language model comprises an Embedding layer, a bidirectional Transformer encoder and an output layer.
3. The bert-based multi-tiered attention mechanism relationship extraction method of claim 2, wherein: the Embedding layer is formed by adding Token Embedding, Segment Embedding and Position Embedding.
4. The bert-based multi-tiered attention mechanism relationship extraction method of claim 2, wherein: the bi-directional Transformer encoder is a multi-layer coding unit stack.
5. The bert-based multi-tiered attention mechanism relationship extraction method of claim 4, wherein: the encoding unit comprises a self-attention mechanism module, a residual error network layer, a layer normalization structure and a Dropout layer.
6. The bert-based multi-tiered attention mechanism relationship extraction method of claim 5, wherein: the calculation formula of the layer normalization structure is as follows:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β
7. the bert-based multi-tiered attention mechanism relationship extraction method of claim 5, wherein: the calculation formula of the residual error network layer is as follows:
FFN = max(0, xW_1 + b_1) W_2 + b_2
CN202111275258.9A 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method Pending CN113987150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111275258.9A CN113987150A (en) 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111275258.9A CN113987150A (en) 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method

Publications (1)

Publication Number Publication Date
CN113987150A true CN113987150A (en) 2022-01-28

Family

ID=79744727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111275258.9A Pending CN113987150A (en) 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method

Country Status (1)

Country Link
CN (1) CN113987150A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502749A (en) * 2019-08-02 2019-11-26 中国电子科技集团公司第二十八研究所 A kind of text Relation extraction method based on the double-deck attention mechanism Yu two-way GRU
US20200134422A1 (en) * 2018-10-29 2020-04-30 International Business Machines Corporation Relation extraction from text using machine learning
CN111950269A (en) * 2020-08-21 2020-11-17 清华大学 Text statement processing method and device, computer equipment and storage medium
CN112800774A (en) * 2021-01-15 2021-05-14 华南理工大学 Entity relationship extraction method, device, medium and equipment based on attention mechanism
CN113221567A (en) * 2021-05-10 2021-08-06 北京航天情报与信息研究所 Judicial domain named entity and relationship combined extraction method
CN113254429A (en) * 2021-05-13 2021-08-13 东北大学 BERT and MLM-based noise reduction method for remote supervision relationship extraction
CN113360582A (en) * 2021-06-04 2021-09-07 中国人民解放军战略支援部队信息工程大学 Relation classification method and system based on BERT model fusion multi-element entity information


Similar Documents

Publication Publication Date Title
CN110210037B (en) Syndrome-oriented medical field category detection method
CN111626063B (en) Text intention identification method and system based on projection gradient descent and label smoothing
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN109918671A (en) Electronic health record entity relation extraction method based on convolution loop neural network
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
Suyanto Synonyms-based augmentation to improve fake news detection using bidirectional LSTM
CN114048286A (en) Automatic fact verification method fusing graph converter and common attention network
CN114722835A (en) Text emotion recognition method based on LDA and BERT fusion improved model
CN112069825A (en) Entity relation joint extraction method for alert condition record data
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN113806528A (en) Topic detection method and device based on BERT model and storage medium
CN117725211A (en) Text classification method and system based on self-constructed prompt template
Xue et al. Intent-enhanced attentive Bert capsule network for zero-shot intention detection
CN114757183B (en) Cross-domain emotion classification method based on comparison alignment network
CN113051886B (en) Test question duplicate checking method, device, storage medium and equipment
CN114692604A (en) Deep learning-based aspect-level emotion classification method
CN113987150A (en) Bert-based multi-layer attention mechanism relation extraction method
Al-Harigy et al. Deep Pre-trained Contrastive Self-Supervised Learning: A Cyberbullying Detection Approach with Augmented Datasets
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN114282537A (en) Social text-oriented cascade linear entity relationship extraction method

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination