CN114444506A - Method for extracting relation triple fusing entity types - Google Patents
- Publication number
- CN114444506A (application CN202210026447.0A)
- Authority
- CN
- China
- Prior art keywords
- model
- entity
- head
- training
- representing
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/045—Combinations of networks
Abstract
The invention discloses a relation triple extraction method fusing entity types, which comprises the following steps: collecting text data as training samples; cleaning the collected training sample data to obtain a data set; segmenting the data set and dividing it into a training set, a verification set and a test set in a fixed proportion; building a deep learning network based on a BERT pre-training model and loading pre-trained parameters to obtain deep representations of the training samples; building a Fast Gradient Method (FGM) adversarial network after the model to improve its robustness and generalization; constructing a relation triple extraction model from a multi-head attention mechanism and a deep neural network; training and testing the model, saving K fold models on the verification set by K-fold cross-validation, testing the test set with the ensemble of K fold models, and taking the average probability as the test result of the model; and outputting the model AttnFGM-MARE.
Description
Technical Field
The invention relates to the technical field of natural language processing, in particular to an intelligent extraction method for relation triples fusing entity types.
Background
Relation extraction results are typically organized and presented as triples, and this structured knowledge supports many downstream tasks, such as information extraction, knowledge graphs, search engines and question answering. The relation extraction task consists of identifying head entities and tail entities in text and classifying the relation between them. At present, neural network methods are generally used to extract relation triples. Inspired by the rapid development of pre-trained language model technology, relation extraction uses a pre-trained model for unsupervised learning on large-scale unlabeled data to obtain deep representations of text. Such methods depend on the deep representation of the head entity, which influences the extraction of the tail entity and relation type in the downstream task. However, their ability to represent head entity features is limited, the benefit of the head entity's type information for tail entity and relation extraction is not considered, and the models also suffer from low robustness and weak generalization.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a relation triple extraction method fusing entity types, which uses a multi-head attention mechanism to fuse head entity type features and obtains relation triples through a deep neural network (DNN), thereby improving the robustness and generalization of the model.
In order to solve the technical problem, the invention is realized in the following way:
a method for extracting a relation triple fused with entity types specifically comprises the following steps:
1) collecting text data as training samples;
2) cleaning the training sample data collected in the step 1) to form a data set;
3) segmenting the data set formed in step 2) and dividing it into a training set, a verification set and a test set in the proportion 7:2:1;
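The 7:2:1 split of step 3) can be sketched as follows. This is an illustrative sketch, not the patent's code; the deterministic shuffle seed is an assumption, since the patent does not specify one.

```python
import random

def split_dataset(samples, ratios=(0.7, 0.2, 0.1), seed=42):
    """Shuffle and split samples into training / verification / test sets by ratio."""
    data = list(samples)
    random.Random(seed).shuffle(data)      # deterministic shuffle for reproducibility
    n_train = int(len(data) * ratios[0])
    n_val = int(len(data) * ratios[1])
    train = data[:n_train]
    val = data[n_train:n_train + n_val]
    test = data[n_train + n_val:]          # the remainder becomes the test set
    return train, val, test

train, val, test = split_dataset(range(100))
```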
4) building a deep learning network based on a BERT pre-training model, and loading pre-training parameters to obtain deep expression of training set data;
5) building a Fast Gradient Method (FGM) adversarial neural network model after the BERT pre-training model to improve the robustness and generalization of the model;
6) predicting a head entity by utilizing a head entity extraction module DNN;
7) extracting head entity features from the head entities predicted in step 6), and building the relation triple extraction model using a multi-head attention mechanism and a deep neural network;
8) predicting the relation and the tail entity by using a relation and tail entity extraction module DNN;
9) performing model training, saving K fold models by applying a K-fold cross-validation method to the verification set of step 3), testing the test set with the K fold models, and taking the average probability as the test result of the model; the model AttnFGM-MARE is output.
Further, the BERT pre-training model in step 4) is provided in sequence with a position embedding layer, a segment embedding layer and a token embedding layer, followed by an E[cls] layer, a fully connected layer and a T[cls] layer; the deep representation is as follows:

H = BERT(S)    (1)

where S is the training set text data and H is the deep representation of the hidden state of S after the BERT pre-training model.
The expression of the Fast Gradient Method adversarial neural network model built in step 5) is as follows:

g = ∇_x L(θ, x, y)    (2)

r_adv = ε·g/||g||_2    (3)

x_adv = x + r_adv    (4)

where g represents the gradient of the loss function with respect to the input, θ represents the parameters of the adversarial neural network, x represents the input of the model, y represents the label corresponding to input x, L represents the loss function of the trained neural network, ε represents a hyper-parameter of the adversarial network, x_adv represents the model input after adding the adversarial perturbation, and r_adv represents the adversarial perturbation added.
In step 6), the head entity extraction module is connected after the Fast Gradient Method adversarial neural network model, and the head entity start position and head entity end position are predicted by a head-entity-start fully connected layer and a head-entity-end fully connected layer respectively:

p_i^start = σ(W_s x_i + b_s)    (5)

p_i^end = σ(W_e x_i + b_e)    (6)

where x_i is the deep representation of the i-th character in the text, W_s, W_e, b_s, b_e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, p_i^start represents the probability that the i-th character is the head entity start character, and p_i^end represents the probability that the i-th character is the head entity end character.
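The head entity start/end prediction described above can be sketched in NumPy. The random weights, zero biases and 0.5 threshold are illustrative assumptions standing in for the trained fully connected layers.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def head_entity_probs(X, W_s, b_s, W_e, b_e):
    """Per-character probabilities of being the head entity start / end character."""
    p_start = sigmoid(X @ W_s + b_s)       # p_i^start = sigma(W_s x_i + b_s)
    p_end = sigmoid(X @ W_e + b_e)         # p_i^end   = sigma(W_e x_i + b_e)
    return p_start, p_end

rng = np.random.default_rng(0)
X = rng.normal(size=(6, 8))                # deep representations x_i of 6 characters
W_s, W_e = rng.normal(size=8), rng.normal(size=8)
p_start, p_end = head_entity_probs(X, W_s, 0.0, W_e, 0.0)
starts = np.where(p_start > 0.5)[0]        # candidate head entity start positions
```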
In step 7), head entity type representation features are obtained with a multi-head attention mechanism, and the head entity features, head entity type representation features and context representation features are combined by feature fusion (feature addition) to obtain the relation triple extraction model of the deep neural network; the expression of the multi-head attention mechanism is as follows:

Attention(Q, K, V) = softmax(QK^T/√d_k)V    (7)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)    (8)

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O    (9)

where Q, K, V represent vectors obtained by linearly transforming the input features, d_k is a parameter controlling the variance, head_i represents the output of the i-th attention head in the multi-head attention model, and W_i^Q, W_i^K, W_i^V, W^O represent trainable parameters of the deep neural network.
The relation and tail entity prediction in step 8) is specifically as follows: the relation and the tail entity start and end positions are predicted from the fused features of step 7) by a relation-and-tail-entity-start fully connected layer and a relation-and-tail-entity-end fully connected layer:

p_i^{start,r} = σ(W_r^s(x_i + v_k^sub + v_k^type) + b_r^s)    (10)

p_i^{end,r} = σ(W_r^e(x_i + v_k^sub + v_k^type) + b_r^e)    (11)

where W_r^s, W_r^e, b_r^s, b_r^e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, v_k^sub represents the deep representation of the k-th head entity, v_k^type represents the type deep representation of the k-th head entity, p_i^{start,r} represents the probability that the i-th character is the tail entity start character given relation r, and p_i^{end,r} represents the probability that the i-th character is the tail entity end character given relation r.
Compared with the prior art, the invention has the following beneficial effects:
the triple extraction method disclosed by the invention has the advantages that through fusing an entity type neural network end-to-end model (AttnFGM-MARE), a pre-training model is adopted as the feature extraction of text context, and relational triples are intelligently extracted; the robust performance and the generalization performance of the model are improved by adopting a Fast Gradient Method antagonistic network model, the entity type characteristics of the head are fused by using a multi-head attention mechanism, and the effect of extracting the relation triplets is obtained and improved by a Deep Neural Network (DNN).
Drawings
FIG. 1 is a flow chart of a relational triple extraction model according to the present invention.
Detailed Description
The following detailed description of embodiments of the invention is provided in connection with the accompanying drawings and the examples. It should be understood that terms such as "having", "including" and "comprising", as used herein, do not preclude the presence or addition of one or more other elements or combinations thereof.
As shown in fig. 1, a method for extracting relation triples fusing entity types specifically includes the following steps:
1) collecting text data as training samples, and setting the batch size N_b and the learning rate α; the batch size N_b is the amount of data the model processes in one batch, i.e., the number of samples used in one training step (processing one sample at a time would be slow); the learning rate α is the step size with which the model updates its parameters, so that the optimization process approaches the optimal solution in steps of this size.
3) segmenting the data set formed in step 2) and dividing it into a training set, a verification set and a test set in the proportion 7:2:1;
4) Building a deep learning network based on a BERT pre-training model, and loading pre-training parameters to obtain deep expression of training set data;
5) a Fast Gradient Method (FGM) adversarial neural network model is built after the BERT pre-training model to improve the robustness and generalization of the model; the expression of the Fast Gradient Method adversarial neural network model is as follows:

g = ∇_x L(θ, x, y)    (2)

r_adv = ε·g/||g||_2    (3)

x_adv = x + r_adv    (4)

where g represents the gradient of the loss function with respect to the input, θ represents the parameters of the adversarial neural network, x represents the input of the model, y represents the label corresponding to input x, L represents the loss function of the trained neural network, ε represents a hyper-parameter of the adversarial network, x_adv represents the model input after adding the adversarial perturbation, and r_adv represents the adversarial perturbation added;
6) predicting a head entity by utilizing a head entity extraction module DNN;
p_i^start = σ(W_s x_i + b_s)    (5)

p_i^end = σ(W_e x_i + b_e)    (6)

where x_i is the deep representation of the i-th character in the text, W_s, W_e, b_s, b_e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, p_i^start represents the probability that the i-th character is the head entity start character, and p_i^end represents the probability that the i-th character is the head entity end character;
7) extracting head entity features from the head entities predicted in step 6), and building the relation triple extraction model using a multi-head attention mechanism and a deep neural network; the expression of the multi-head attention mechanism is as follows:

Attention(Q, K, V) = softmax(QK^T/√d_k)V    (7)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)    (8)

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O    (9)

where Q, K, V represent vectors obtained by linearly transforming the input features, d_k is a parameter controlling the variance, head_i represents the output of the i-th attention head in the multi-head attention model, and W_i^Q, W_i^K, W_i^V, W^O represent trainable parameters of the deep neural network;
8) predicting the relation and the tail entity by using a relation and tail entity extraction module DNN;
p_i^{start,r} = σ(W_r^s(x_i + v_k^sub + v_k^type) + b_r^s)    (10)

p_i^{end,r} = σ(W_r^e(x_i + v_k^sub + v_k^type) + b_r^e)    (11)

where W_r^s, W_r^e, b_r^s, b_r^e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, v_k^sub represents the deep representation of the k-th head entity, v_k^type represents the type deep representation of the k-th head entity, p_i^{start,r} represents the probability that the i-th character is the tail entity start character given relation r, and p_i^{end,r} represents the probability that the i-th character is the tail entity end character given relation r;
9) performing model training, saving K fold models by applying a K-fold cross-validation method to the verification set of step 3), testing the test set with the K fold models, and taking the average probability as the test result of the model; the model AttnFGM-MARE is output. In K-fold cross-validation, the original data is divided into K groups; each subset serves once as the validation set while the remaining K-1 subsets form the training set, yielding K fold models.
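The K-fold procedure of step 9) can be sketched as follows. The stand-in fold models (constant probability functions) are assumptions replacing the trained BERT-based models, which are outside the scope of this sketch.

```python
def k_fold_splits(indices, k):
    """Split indices into k folds; each fold is the validation set once, the rest train."""
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        val = folds[i]
        train = [x for j, f in enumerate(folds) if j != i for x in f]
        yield train, val

def ensemble_average(fold_models, x):
    """Average the predicted probabilities of the k fold models (step 9)."""
    probs = [m(x) for m in fold_models]
    return sum(probs) / len(probs)

splits = list(k_fold_splits(list(range(10)), k=5))
models = [lambda x, b=b: 0.1 * b for b in range(5)]   # stand-in fold models
avg = ensemble_average(models, None)                  # averaged test probability
```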
For the deep representation H of the present application, the network output of the deep neural network (DNN) over a batch is computed as:

y* = DNN(H)    (12)

where y* represents the output of the neural network;
the loss of training is calculated through a loss function, and the expression of the loss function is as follows:
wherein ,xiRepresents the ith sample in the training set, TiRepresentative of the occurrence in the training sample xiS represents a head entity appearing in the relational triple, o represents a tail entity appearing in the relational triple, prRepresenting a probability value under a specified relationship r;
The parameters θ of the adversarial neural network model are updated by training with the loss function; θ is chosen to minimize the expected worst-case adversarial loss:

min_θ E_{(x,y)} [ max_{r_adv} L(θ, x + r_adv, y) ]    (14)
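One step of the adversarial training described above (loss on x, FGM perturbation, loss on x + r_adv, parameter update) can be sketched for a toy linear model. The model, loss and learning rate are illustrative assumptions, not the patent's network.

```python
import numpy as np

def loss_and_grads(theta, x, y):
    """Toy squared-error loss (assumption): returns loss, dL/dtheta, dL/dx."""
    err = theta @ x - y
    return 0.5 * err ** 2, err * x, err * theta

def adversarial_step(theta, x, y, eps=0.1, lr=0.01):
    """One FGM adversarial training step: gradient descent on L(x) + L(x + r_adv)."""
    _, g_theta, g_x = loss_and_grads(theta, x, y)
    r_adv = eps * g_x / (np.linalg.norm(g_x) + 1e-12)   # FGM perturbation
    _, adv_g_theta, _ = loss_and_grads(theta, x + r_adv, y)
    return theta - lr * (g_theta + adv_g_theta)          # update the parameters theta

theta = np.array([0.5, -0.5])
x, y = np.array([1.0, 2.0]), 1.0
theta_new = adversarial_step(theta, x, y)
```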
the foregoing is illustrative of embodiments of the present invention and it will be further appreciated by those skilled in the art that various modifications may be made without departing from the principles of the invention and that such modifications are intended to be included within the scope of the appended claims.
Claims (6)
1. A method for extracting relation triples fusing entity types is characterized in that: the method specifically comprises the following steps:
1) collecting text data as training samples;
2) cleaning the training sample data collected in the step 1) to form a data set;
3) segmenting the data set formed in step 2) and dividing it into a training set, a verification set and a test set in the proportion 7:2:1;
4) building a deep learning network based on a BERT pre-training model, and loading pre-training parameters to obtain deep expression of training set data;
5) building a Fast Gradient Method (FGM) adversarial neural network model after the BERT pre-training model to improve the robustness and generalization of the model;
6) predicting a head entity by utilizing a head entity extraction module DNN;
7) extracting head entity features from the head entities predicted in step 6), and building the relation triple extraction model using a multi-head attention mechanism and a deep neural network;
8) predicting the relation and the tail entity by using a relation and tail entity extraction module DNN;
9) performing model training, saving K fold models by applying a K-fold cross-validation method to the verification set of step 3), testing the test set with the K fold models, and taking the average probability as the test result of the model; the model AttnFGM-MARE is output.
2. The method for extracting relation triples fusing entity types according to claim 1, wherein:
the BERT pre-training model in step 4) is provided in sequence with a position embedding layer, a segment embedding layer and a token embedding layer, followed by an E[cls] layer, a fully connected layer and a T[cls] layer; the deep representation is as follows:

H = BERT(S)    (1)

where S is the training set text data and H is the deep representation of the hidden state of S after the BERT pre-training model.
3. The method for extracting relation triples fusing entity types according to claim 1, wherein:
the expression of the Fast Gradient Method adversarial neural network model built in step 5) is as follows:

g = ∇_x L(θ, x, y)    (2)

r_adv = ε·g/||g||_2    (3)

x_adv = x + r_adv    (4)

where g represents the gradient of the loss function with respect to the input, θ represents the parameters of the adversarial neural network, x represents the input of the model, y represents the label corresponding to input x, L represents the loss function of the trained neural network, ε represents a hyper-parameter of the adversarial network, x_adv represents the model input after adding the adversarial perturbation, and r_adv represents the adversarial perturbation added.
4. The method for extracting relation triples fusing entity types according to claim 1, wherein:
in step 6), the head entity extraction module is connected after the Fast Gradient Method adversarial neural network model, and the head entity start position and head entity end position are predicted by a head-entity-start fully connected layer and a head-entity-end fully connected layer respectively:

p_i^start = σ(W_s x_i + b_s)    (5)

p_i^end = σ(W_e x_i + b_e)    (6)

where x_i is the deep representation of the i-th character in the text, W_s, W_e, b_s, b_e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, p_i^start represents the probability that the i-th character is the head entity start character, and p_i^end represents the probability that the i-th character is the head entity end character.
5. The method for extracting relation triples fusing entity types according to claim 1, wherein:
step 7) obtains head entity type representation features with a multi-head attention mechanism, and fuses the head entity features, head entity type representation features and context representation features by feature fusion to obtain the relation triple extraction model of the deep neural network; the expression of the multi-head attention mechanism is as follows:

Attention(Q, K, V) = softmax(QK^T/√d_k)V    (7)

head_i = Attention(QW_i^Q, KW_i^K, VW_i^V)    (8)

MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_h)W^O    (9)

where Q, K, V represent vectors obtained by linearly transforming the input features, d_k is a parameter controlling the variance, head_i represents the output of the i-th attention head in the multi-head attention model, and W_i^Q, W_i^K, W_i^V, W^O represent trainable parameters of the deep neural network.
6. The method for extracting relation triples fusing entity types according to claim 1, wherein:
the relation and tail entity prediction in step 8) is specifically: the relation and the tail entity start and end positions are predicted from the fused features of step 7) by a relation-and-tail-entity-start fully connected layer and a relation-and-tail-entity-end fully connected layer:

p_i^{start,r} = σ(W_r^s(x_i + v_k^sub + v_k^type) + b_r^s)    (10)

p_i^{end,r} = σ(W_r^e(x_i + v_k^sub + v_k^type) + b_r^e)    (11)

where W_r^s, W_r^e, b_r^s, b_r^e represent trainable parameters of the deep neural network, σ represents the sigmoid activation function, v_k^sub represents the deep representation of the k-th head entity, v_k^type represents the type deep representation of the k-th head entity, p_i^{start,r} represents the probability that the i-th character is the tail entity start character given relation r, and p_i^{end,r} represents the probability that the i-th character is the tail entity end character given relation r.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026447.0A CN114444506B (en) | 2022-01-11 | 2022-01-11 | Relation triplet extraction method for fusing entity types |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210026447.0A CN114444506B (en) | 2022-01-11 | 2022-01-11 | Relation triplet extraction method for fusing entity types |
Publications (2)
Publication Number | Publication Date |
---|---|
CN114444506A true CN114444506A (en) | 2022-05-06 |
CN114444506B CN114444506B (en) | 2023-05-02 |
Family
ID=81368025
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210026447.0A Active CN114444506B (en) | 2022-01-11 | 2022-01-11 | Relation triplet extraction method for fusing entity types |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114444506B (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111191453A (en) * | 2019-12-25 | 2020-05-22 | 中国电子科技集团公司第十五研究所 | Named entity recognition method based on confrontation training |
CN111931506A (en) * | 2020-05-22 | 2020-11-13 | 北京理工大学 | Entity relationship extraction method based on graph information enhancement |
CN112148997A (en) * | 2020-08-07 | 2020-12-29 | 江汉大学 | Multi-modal confrontation model training method and device for disaster event detection |
CN113221567A (en) * | 2021-05-10 | 2021-08-06 | 北京航天情报与信息研究所 | Judicial domain named entity and relationship combined extraction method |
WO2021190236A1 (en) * | 2020-03-23 | 2021-09-30 | 浙江大学 | Entity relation mining method based on biomedical literature |
Non-Patent Citations (4)
Title |
---|
YUANFEI DAI et al.: "Generative adversarial networks based on Wasserstein distance for knowledge graph embeddings" *
吕建成 et al.: "Brain-inspired ultra-large-scale deep neural network ***" *
李涛; 郭渊博; 琚安康: "Network security knowledge triple extraction fusing adversarial active learning" *
黄培馨; 赵翔; 方阳; 朱慧明; 肖卫东: "End-to-end joint extraction of knowledge triples fusing adversarial training" *
Also Published As
Publication number | Publication date |
---|---|
CN114444506B (en) | 2023-05-02 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN111914644B (en) | Dual-mode cooperation based weak supervision time sequence action positioning method and system | |
CN109934261B (en) | Knowledge-driven parameter propagation model and few-sample learning method thereof | |
US11301759B2 (en) | Detective method and system for activity-or-behavior model construction and automatic detection of the abnormal activities or behaviors of a subject system without requiring prior domain knowledge | |
CN111914091B (en) | Entity and relation combined extraction method based on reinforcement learning | |
CN112560432B (en) | Text emotion analysis method based on graph attention network | |
CN111950269A (en) | Text statement processing method and device, computer equipment and storage medium | |
CN112100403A (en) | Knowledge graph inconsistency reasoning method based on neural network | |
US20220121949A1 (en) | Personalized neural network pruning | |
CN115357904B (en) | Multi-class vulnerability detection method based on program slicing and graph neural network | |
CN115631365A (en) | Cross-modal contrast zero sample learning method fusing knowledge graph | |
CN115203507A (en) | Event extraction method based on pre-training model and oriented to document field | |
CN113673242A (en) | Text classification method based on K-neighborhood node algorithm and comparative learning | |
CN114880307A (en) | Structured modeling method for knowledge in open education field | |
CN117354207A (en) | Reverse analysis method and device for unknown industrial control protocol | |
CN116384379A (en) | Chinese clinical term standardization method based on deep learning | |
CN112507720A (en) | Graph convolution network root identification method based on causal semantic relation transfer | |
CN113342982B (en) | Enterprise industry classification method integrating Roberta and external knowledge base | |
CN114444506A (en) | Method for extracting relation triple fusing entity types | |
CN115661539A (en) | Less-sample image identification method embedded with uncertainty information | |
CN114780725A (en) | Text classification algorithm based on deep clustering | |
US20240185078A1 (en) | Purified contrastive learning for lightweight neural network training | |
CN117909766A (en) | Open information extraction clustering method based on manual guidance | |
CN117688472A (en) | Unsupervised domain adaptive multivariate time sequence classification method based on causal structure | |
WO2023167791A1 (en) | On-device artificial intelligence video search | |
CN118132751A (en) | Standard document industry classification method, apparatus, computer device, and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||