CN113987150A - Bert-based multi-layer attention mechanism relation extraction method - Google Patents

Bert-based multi-layer attention mechanism relation extraction method

Info

Publication number
CN113987150A
Authority
CN
China
Prior art keywords
layer
attention mechanism
bert
extraction method
sentence
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111275258.9A
Other languages
Chinese (zh)
Inventor
王伟 (Wang Wei)
陈加杰 (Chen Jiajie)
孙思明 (Sun Siming)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Original Assignee
Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Qianhai Huanrong Lianyi Information Technology Service Co Ltd
Priority to CN202111275258.9A
Publication of CN113987150A
Legal status: Pending


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/332 Query formulation
    • G06F16/3329 Natural language query formulation or dialogue systems
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/367 Ontology
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • Mathematical Physics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • General Health & Medical Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Biomedical Technology (AREA)
  • Computing Systems (AREA)
  • Software Systems (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Human Computer Interaction (AREA)
  • Animal Behavior & Ethology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Machine Translation (AREA)

Abstract

The invention relates to an entity relation extraction method, in particular to a bert-based multi-layer attention mechanism relation extraction method, which comprises the following steps: step S1: acquiring sample data; step S2: dividing the sample data; step S3: replacing the entities in the sample data with #; step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample; step S5: inputting the training samples into a BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features; step S6: applying a word-level attention mechanism to the training samples to weight their local features; step S7: applying a sentence-level attention mechanism to weight the global features; step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.

Description

Bert-based multi-layer attention mechanism relation extraction method
[Technical Field]
The invention relates to an entity relationship extraction method, in particular to a bert-based multi-layer attention mechanism relationship extraction method.
[Background of the Invention]
In the field of natural language processing, the construction of knowledge graphs plays a very important role in the development of artificial intelligence. Within knowledge graph construction, knowledge extraction is the core foundation and mainly comprises three subtasks: entity extraction, relation extraction and event extraction, among which entity relation extraction is a core task and an important link of knowledge extraction. The main objective of entity relation extraction is to identify and judge the specific relation between entity pairs in natural language text, thereby providing basic support for intelligent retrieval, semantic analysis and the like and improving search efficiency.
Supervised relation extraction is currently considered the approach with the better extraction effect: the relation extraction task is treated as a classification problem, effective features are designed from the training data so that a classification model can be learned, and the trained model is then used to predict relations.
Therefore, the prior art is not sufficient and needs to be improved.
[Summary of the Invention]
In order to overcome the technical problems, the invention provides a bert-based multi-layer attention mechanism relation extraction method.
The invention provides a bert-based multi-layer attention mechanism relation extraction method to solve the above technical problem, comprising the following steps:
step S1: acquiring sample data;
step S2: dividing the sample data;
step S3: replacing the entities in the sample data with #;
step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample;
step S5: inputting the training samples into a BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features;
step S6: applying a word-level attention mechanism to the training samples and weighting their local features;
step S7: applying a sentence-level attention mechanism and weighting the global features;
step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.
Preferably, the BERT language model includes an Embedding layer, a bidirectional Transformer encoder, and an output layer.
Preferably, the Embedding layer is formed by adding Token Embedding, Segment Embedding and Position Embedding.
Preferably, the bidirectional Transformer encoder is a multi-layer coding unit stack.
Preferably, the encoding unit includes a self-attention mechanism module, a residual network layer, a layer normalization structure, and a Dropout layer.
Preferably, the calculation formula of the layer normalization structure is:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β
Preferably, the calculation formula of the residual network layer is:
FFN = max(0, xW_1 + b_1) W_2 + b_2
compared with the prior art, the multi-layer attention mechanism relation extraction method based on bert has the following advantages:
the method can accurately extract the entity relationship from the unstructured text, is favorable for enriching the data of the entity relationship in the knowledge map, enriches the semantic representation of words through a BERT language model, is favorable for utilizing the local characteristics and the global characteristics of the text through increasing the attention mechanism of the words and the sentences, and is favorable for improving the identification accuracy.
The data processing mode of connecting two entities and a sentence by $ and replacing the entity name in the sentence by # in the data preprocessing stage enables the model to learn semantic information of other words related to the entity relationship better, thereby improving the accuracy of model identification.
[Description of the Drawings]
FIG. 1 is a detailed flow chart of the multi-layer attention mechanism relationship extraction method based on bert in the invention.
[Detailed Description of the Embodiments]
In order to make the objects, technical solutions and advantages of the present invention more apparent, the present invention will be described in further detail below with reference to the accompanying drawings and embodiments. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, the present invention provides a bert-based multi-layer attention mechanism relation extraction method, comprising the following steps:
step S1: acquiring sample data;
step S2: dividing the sample data;
step S3: replacing the entities in the sample data with #;
step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample (a non-limiting preprocessing sketch follows this list);
step S5: inputting the training samples into the BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features;
step S6: applying a word-level attention mechanism to the training samples and weighting their local features;
step S7: applying a sentence-level attention mechanism and weighting the global features;
step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.
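By way of a non-limiting illustration of the preprocessing in steps S3 and S4, the following Python sketch shows one possible serialization of a training sample; the separator placement, function name and example sentence are assumptions for illustration only, not a format fixed by the invention:

def build_training_sample(sentence: str, head: str, tail: str) -> str:
    # Step S3: replace the entity mentions inside the sentence with '#'
    masked = sentence.replace(head, "#").replace(tail, "#")
    # Step S4: concatenate the entity pair and the masked sentence with '$'
    return f"{head} $ {tail} $ {masked}"

sample = build_training_sample("小明出生于北京。", "小明", "北京")
# sample == "小明 $ 北京 $ #出生于#。"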
Further, in step S5, the BERT language model includes an Embedding layer, a bidirectional Transformer encoder and an output layer. The Embedding layer is obtained by adding the Token Embedding, Segment Embedding and Position Embedding, and the Position vector is calculated as shown in the following formulas:
PE(pos, 2i) = sin(pos / 10000^(2i/d_model)) (1)
PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model)) (2)
In the above formulas, 2i represents an even number, 2i + 1 represents an odd number, and d_model is 64.
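As an illustrative sketch of formulas (1) and (2) and the Embedding-layer sum, the following minimal PyTorch example is given; the tensor shapes and helper name are assumptions, not the invention's exact implementation:

import torch

def positional_encoding(max_len: int, d_model: int = 64) -> torch.Tensor:
    # PE(pos, 2i) = sin(pos / 10000^(2i/d_model)); PE(pos, 2i+1) = cos(pos / 10000^(2i/d_model))
    pos = torch.arange(max_len, dtype=torch.float32).unsqueeze(1)   # (max_len, 1)
    i = torch.arange(0, d_model, 2, dtype=torch.float32)            # even dimensions 2i
    angle = pos / torch.pow(torch.tensor(10000.0), i / d_model)     # (max_len, d_model/2)
    pe = torch.zeros(max_len, d_model)
    pe[:, 0::2] = torch.sin(angle)   # even dimensions
    pe[:, 1::2] = torch.cos(angle)   # odd dimensions
    return pe

# The Embedding layer output is then the element-wise sum:
# embedding = token_embedding + segment_embedding + positional_encoding(max_len, d_model)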
Further, the bidirectional Transformer encoder is used to extract text semantic features. It is a stack of multi-layer encoding units, each of which includes a self-attention mechanism module, a residual network layer, a layer normalization structure and a Dropout layer. The self-attention mechanism module adjusts the weight coefficient matrix according to the degree of association between words in the same sentence to obtain the word representations, so that each word representation contains information about the other words in the sentence, has contextual association and better global coverage, as shown in the following formula:
Attention(Q, K, V) = softmax(QK^T / √d_k) V (3)
where Q, K and V are word vector matrices and d_k is the Embedding dimension. The multi-head attention mechanism projects Q, K and V through several different linear transformations and finally concatenates the different attention results, as follows:
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_n) W^O (4)
head_i = Attention(QW_i^Q, KW_i^K, VW_i^V) (5)
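A minimal PyTorch sketch of formulas (3) to (5) is given below; the head count and dimensions are assumptions for illustration, not values prescribed by the invention:

import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class MultiHeadSelfAttention(nn.Module):
    # Scaled dot-product attention (3) over n heads, concatenated and projected by W^O, formulas (4)-(5).
    def __init__(self, d_model: int = 768, num_heads: int = 12):
        super().__init__()
        assert d_model % num_heads == 0
        self.d_k = d_model // num_heads
        self.num_heads = num_heads
        self.w_q = nn.Linear(d_model, d_model)
        self.w_k = nn.Linear(d_model, d_model)
        self.w_v = nn.Linear(d_model, d_model)
        self.w_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, m, _ = x.shape
        # project and split into heads: (b, num_heads, m, d_k)
        q, k, v = (w(x).view(b, m, self.num_heads, self.d_k).transpose(1, 2)
                   for w in (self.w_q, self.w_k, self.w_v))
        scores = torch.matmul(q, k.transpose(-2, -1)) / math.sqrt(self.d_k)   # QK^T / sqrt(d_k)
        heads = torch.matmul(F.softmax(scores, dim=-1), v)                    # softmax(...) V
        heads = heads.transpose(1, 2).contiguous().view(b, m, -1)             # Concat(head_1, ..., head_n)
        return self.w_o(heads)                                                # projection by W^O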
the layer normalization structure, i.e. the LN layer, is calculated as follows:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β (6)
the residual network layer is calculated as follows:
FFN = max(0, xW_1 + b_1) W_2 + b_2 (7)
the layer normalization structure and the residual error network layer mainly solve the degradation problem in deep learning.
Further, in step S6: in relation extraction, the semantic information of each Chinese character contained in a relation instance influences the task to a different degree, so a character-level (word) attention mechanism is introduced here to automatically focus on the specific Chinese characters that are decisive for relation extraction and to capture the important semantic information in the instance. Given the word vector matrix H = (h_1, h_2, h_3, ..., h_m) output by the BERT language model, a sentence vector is obtained by weighting each word relative to the sentence:
s = Σ_{i=1}^{m} α_i h_i (8)
where m is the length of the sentence and α_i is the character-level attention weight, with α_i ∈ [0, 1] and
Σ_{i=1}^{m} α_i = 1
α_i is calculated by a feed-forward neural network as:
μ_i = tanh(W_C h_i + b_C) (9)
where W_C is the word correlation weight matrix and b_C is the bias value. The calculated μ_1, μ_2, ..., μ_m are then normalized with softmax to obtain the weights α_i:
α_i = exp(μ_i) / Σ_{j=1}^{m} exp(μ_j) (10)
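A minimal PyTorch sketch of the character-level attention in formulas (8) to (10) follows, assuming the token vectors come from the BERT layer with shape (batch, m, d_model); the class name and default dimension are assumptions:

import torch
import torch.nn as nn

class WordAttention(nn.Module):
    def __init__(self, d_model: int = 768):
        super().__init__()
        self.w_c = nn.Linear(d_model, 1)    # W_C and b_C of formula (9)

    def forward(self, h: torch.Tensor) -> torch.Tensor:
        # h: (batch, m, d_model) token vectors h_1 ... h_m
        mu = torch.tanh(self.w_c(h))        # mu_i = tanh(W_C h_i + b_C)
        alpha = torch.softmax(mu, dim=1)    # alpha_i in [0, 1], summing to 1 over the m tokens
        return (alpha * h).sum(dim=1)       # s = sum_i alpha_i h_i, formula (8)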
Further, in step S7: suppose the set of k sentences containing the same entity pair is S = {s_1, s_2, ..., s_k}. To highlight the relevance of all sentences to the predicted relation, the set vector R can be expressed as a weighted sum of all the sentence vectors in S:
R = Σ_{i=1}^{k} β_i s_i (11)
where β_i ∈ [0, 1] is the weight of each sentence in the set relative to the predicted relation, and
Σ_{i=1}^{k} β_i = 1
by calculating the inner product of the sentence vector and the entity relationship vector and normalizing the inner product by using a softmax function, the attention weight beta of the sentence layer can be obtainedi
ai=softmax(siAvr) (12)
Where A is a randomly initialized weighted diagonal matrix, vrIs a vector representation of the relation r.
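The sentence-level attention of formulas (11) and (12) can be sketched in PyTorch as follows; storing A as its diagonal and keeping one learned v_r per relation are implementation assumptions, not details fixed by the invention:

import torch
import torch.nn as nn

class SentenceAttention(nn.Module):
    def __init__(self, d_model: int = 768, num_relations: int = 12):
        super().__init__()
        self.A_diag = nn.Parameter(torch.randn(d_model))              # diagonal of the matrix A
        self.v_r = nn.Parameter(torch.randn(num_relations, d_model))  # relation vectors v_r

    def forward(self, s: torch.Tensor, r: int) -> torch.Tensor:
        # s: (k, d_model) sentence vectors s_1 ... s_k; r: index of the candidate relation
        scores = (s * self.A_diag) @ self.v_r[r]        # s_i A v_r
        beta = torch.softmax(scores, dim=0)             # beta_i, formula (12)
        return (beta.unsqueeze(1) * s).sum(dim=0)       # R = sum_i beta_i s_i, formula (11)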
The resulting vector R is then passed through a softmax layer to calculate the conditional probability p(r | R) of the predicted relation:
p(r | R) = softmax(WR + b) (13)
where W is the relation representation matrix and b is the bias.
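Formula (13) amounts to a fully connected layer followed by softmax; a minimal sketch with assumed dimensions and class name is:

import torch
import torch.nn as nn

class RelationClassifier(nn.Module):
    def __init__(self, d_model: int = 768, num_relations: int = 12):
        super().__init__()
        self.fc = nn.Linear(d_model, num_relations)     # W and b of formula (13)

    def forward(self, R: torch.Tensor) -> torch.Tensor:
        return torch.softmax(self.fc(R), dim=-1)        # p(r | R) = softmax(WR + b)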
During training, cross entropy is adopted as the objective loss function, and L2 regularization is used to constrain the parameter magnitudes, which relieves the overfitting problem; the loss function is calculated as follows:
J(θ) = −Σ_i log p(y_i | s_i, θ) + λ‖θ‖₂² (14)
in the above formula, θ represents a parameter of a parameter trained in the model, yiThe number of the real labels is shown, and lambda is a regularization coefficient.
It can be understood that the invention calculates all parameter gradients through back propagation and minimizes the loss function using the Adam algorithm, thereby achieving smooth updates of the parameters. In addition, in order to avoid overfitting, dropout is added to the fully connected layer, so that hidden-layer nodes are dropped from training at a certain rate, reducing the dependence of the parameters on a fixed network structure.
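A minimal training sketch of the setup described above (cross entropy, L2 regularization via weight decay, Adam updates, dropout before the fully connected layer) is given below; the learning rate, weight-decay coefficient, dropout rate, model layout and data loader are assumptions, not values fixed by the invention:

import torch
import torch.nn as nn

def train_epoch(model: nn.Module, train_loader) -> None:
    # model: e.g. nn.Sequential(nn.Dropout(0.5), nn.Linear(768, num_relations)), producing logits
    criterion = nn.CrossEntropyLoss()   # cross entropy over softmax(WR + b), as in formula (14)
    optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-5)  # lambda * ||theta||^2
    model.train()
    for bag_vectors, labels in train_loader:   # hypothetical DataLoader yielding (R vectors, relation labels)
        logits = model(bag_vectors)
        loss = criterion(logits, labels)
        optimizer.zero_grad()
        loss.backward()                        # back-propagate all parameter gradients
        optimizer.step()                       # Adam update of the parameters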
Compared with the prior art, the multi-layer attention mechanism relation extraction method based on bert has the following advantages:
the method can accurately extract the entity relationship from the unstructured text, is favorable for enriching the data of the entity relationship in the knowledge map, enriches the semantic representation of words through a BERT language model, is favorable for utilizing the local characteristics and the global characteristics of the text through increasing the attention mechanism of the words and the sentences, and is favorable for improving the identification accuracy.
The data processing mode of connecting two entities and a sentence by $ and replacing the entity name in the sentence by # in the data preprocessing stage enables the model to learn semantic information of other words related to the entity relationship better, thereby improving the accuracy of model identification.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and any modifications, equivalents, improvements, etc. made within the spirit of the present invention should be included in the scope of the present invention.

Claims (7)

1. A multi-layer attention mechanism relation extraction method based on bert is characterized in that: the method for extracting the relation of the multi-layer attention mechanism based on the bert comprises the following steps:
step S1: acquiring sample data;
step S2: dividing the sample data;
step S3: replacing the entities in the sample data with #;
step S4: acquiring the entity pair and concatenating it with the sentence using $ to form a training sample;
step S5: inputting the training samples into a BERT language model and a fully connected layer, and vectorizing them to obtain a vectorized representation of the fused features;
step S6: applying a word-level attention mechanism to the training samples and weighting their local features;
step S7: applying a sentence-level attention mechanism and weighting the global features;
step S8: passing the sentence-level feature vectors through a fully connected layer and into a classifier to obtain the classification result.
2. The bert-based multi-tiered attention mechanism relationship extraction method of claim 1, wherein: the BERT language model comprises an Embedding layer, a bidirectional Transformer encoder and an output layer.
3. The bert-based multi-tiered attention mechanism relationship extraction method of claim 2, wherein: the Embedding layer is formed by adding Token Embedding, Segment Embedding and Position Embedding.
4. The bert-based multi-tiered attention mechanism relationship extraction method of claim 2, wherein: the bi-directional Transformer encoder is a multi-layer coding unit stack.
5. The bert-based multi-tiered attention mechanism relationship extraction method of claim 4, wherein: the encoding unit comprises a self-attention mechanism module, a residual error network layer, a layer normalization structure and a Dropout layer.
6. The bert-based multi-tiered attention mechanism relationship extraction method of claim 5, wherein: the calculation formula of the layer normalization structure is as follows:
LN(x_i) = α · (x_i − μ) / √(σ² + ε) + β
7. the bert-based multi-tiered attention mechanism relationship extraction method of claim 5, wherein: the calculation formula of the residual error network layer is as follows:
FFN = max(0, xW_1 + b_1) W_2 + b_2
CN202111275258.9A 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method Pending CN113987150A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111275258.9A CN113987150A (en) 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111275258.9A CN113987150A (en) 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method

Publications (1)

Publication Number Publication Date
CN113987150A true CN113987150A (en) 2022-01-28

Family

ID=79744727

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111275258.9A Pending CN113987150A (en) 2021-10-29 2021-10-29 Bert-based multi-layer attention mechanism relation extraction method

Country Status (1)

Country Link
CN (1) CN113987150A (en)

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502749A (en) * 2019-08-02 2019-11-26 中国电子科技集团公司第二十八研究所 A kind of text Relation extraction method based on the double-deck attention mechanism Yu two-way GRU
US20200134422A1 (en) * 2018-10-29 2020-04-30 International Business Machines Corporation Relation extraction from text using machine learning
CN111950269A (en) * 2020-08-21 2020-11-17 清华大学 Text statement processing method and device, computer equipment and storage medium
CN112800774A (en) * 2021-01-15 2021-05-14 华南理工大学 Entity relationship extraction method, device, medium and equipment based on attention mechanism
CN113221567A (en) * 2021-05-10 2021-08-06 北京航天情报与信息研究所 Judicial domain named entity and relationship combined extraction method
CN113254429A (en) * 2021-05-13 2021-08-13 东北大学 BERT and MLM-based noise reduction method for remote supervision relationship extraction
CN113360582A (en) * 2021-06-04 2021-09-07 中国人民解放军战略支援部队信息工程大学 Relation classification method and system based on BERT model fusion multi-element entity information


Similar Documents

Publication Publication Date Title
CN110210037B (en) Syndrome-oriented medical field category detection method
CN111626063B (en) Text intention identification method and system based on projection gradient descent and label smoothing
CN110969020B (en) CNN and attention mechanism-based Chinese named entity identification method, system and medium
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN109918671A (en) Electronic health record entity relation extraction method based on convolution loop neural network
CN111931506B (en) Entity relationship extraction method based on graph information enhancement
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN113705238B (en) Method and system for analyzing aspect level emotion based on BERT and aspect feature positioning model
CN113191148A (en) Rail transit entity identification method based on semi-supervised learning and clustering
CN116661805B (en) Code representation generation method and device, storage medium and electronic equipment
Suyanto Synonyms-based augmentation to improve fake news detection using bidirectional LSTM
CN114048286A (en) Automatic fact verification method fusing graph converter and common attention network
CN114722835A (en) Text emotion recognition method based on LDA and BERT fusion improved model
CN112069825A (en) Entity relation joint extraction method for alert condition record data
CN115687609A (en) Zero sample relation extraction method based on Prompt multi-template fusion
CN113806528A (en) Topic detection method and device based on BERT model and storage medium
CN117725211A (en) Text classification method and system based on self-constructed prompt template
Xue et al. Intent-enhanced attentive Bert capsule network for zero-shot intention detection
CN114757183B (en) Cross-domain emotion classification method based on comparison alignment network
CN113051886B (en) Test question duplicate checking method, device, storage medium and equipment
CN114692604A (en) Deep learning-based aspect-level emotion classification method
CN113987150A (en) Bert-based multi-layer attention mechanism relation extraction method
Al-Harigy et al. Deep Pre-trained Contrastive Self-Supervised Learning: A Cyberbullying Detection Approach with Augmented Datasets
CN114238649A (en) Common sense concept enhanced language model pre-training method
CN114282537A (en) Social text-oriented cascade linear entity relationship extraction method

Legal Events

Code Description
PB01 Publication
SE01 Entry into force of request for substantive examination