CN114781375A - Military equipment relation extraction method based on BERT and attention mechanism - Google Patents

Military equipment relation extraction method based on BERT and attention mechanism Download PDF

Info

Publication number
CN114781375A
Authority
CN
China
Prior art keywords
entity
relation
extraction
bert
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210555624.4A
Other languages
Chinese (zh)
Inventor
王鑫鹏
阮国庆
李晓冬
吴蔚
徐建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202210555624.4A priority Critical patent/CN114781375A/en
Publication of CN114781375A publication Critical patent/CN114781375A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a military equipment relation extraction method based on BERT and an attention mechanism. Equipment target relation information in military news is extracted through joint entity and relation extraction. First, a BERT layer is constructed to extract text feature information. Second, the network splits into an entity extraction branch and a relation extraction branch: the entity extraction branch adds a fully connected layer and a conditional random field on top of the BERT network for label sequence prediction and optimization; the relation extraction branch embeds the label features and the start/end marker features of the relation's head and tail entities on top of the BERT network output, mines the relation between entities through a GRU and an attention layer, and finally predicts the relation through a fully connected layer. Third, during training the loss values of the entity extraction branch and the relation extraction branch are added and optimized by the same optimizer. Experimental results show that the method is effective for Chinese text relation extraction.

Description

Military equipment relation extraction method based on BERT and attention mechanism
Technical Field
The invention relates to the technical field of text relation extraction, in particular to a military equipment relation extraction method based on BERT and attention mechanism.
Background
With the rapid development of information technology and networks, the amount of information is growing explosively, and how to extract important information from massive data has become a research hotspot in information services. Text information processing covers directions such as entity extraction, relation extraction, event extraction and machine reading comprehension. Relation extraction establishes the relations between entities, converts text information into structured data, and provides data support for downstream applications such as Chinese information content retrieval and knowledge graph construction.
Relation extraction methods mainly fall into supervised, semi-supervised and unsupervised entity relation extraction. Unsupervised entity relation extraction consists of entity clustering and relation type word selection, but suffers from inaccurate feature extraction, unreasonable clustering results and low accuracy of relation results. Semi-supervised entity relation extraction methods, such as Bootstrapping, summarize entity relation sequence patterns from texts containing relation seeds and then use them to find more relation seed instances; however, noise mixed in during iteration causes semantic drift. The main idea of supervised entity relation extraction is to train a machine learning model on labeled data and perform relation recognition on test data. Supervised methods are further divided into rule-based and feature-based relation extraction. Rule-based methods summarize rules or templates from the corpus and domain and extract entity relations through template matching; such methods rely on named entity recognition systems and distance calculations and are prone to extra error propagation and time consumption. Feature-based methods mainly use machine learning models such as RNN (Recurrent Neural Network), CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory network) to extract text features automatically without constructing complex features, but they cannot make full use of both the local and global features of text information.
Disclosure of Invention
Purpose of the invention: the invention aims to solve the above technical problems of the prior art and provides a military equipment relation extraction method based on BERT and an attention mechanism, which can effectively improve the accuracy of Chinese text relation extraction.
To solve the technical problem, the invention discloses a military equipment relation extraction method based on BERT and an attention mechanism, comprising the following steps:
step 1, performing entity labeling and relation labeling on a text corpus to obtain labeled data;
step 2, preprocessing the labeled data to generate a text relation extraction model training set and a test set;
step 3, constructing a text relation extraction model;
step 4, training a text relation extraction model to obtain a trained text relation extraction model;
and 5, inputting the test set data into the trained text relation extraction model to obtain a relation extraction result.
Further, in step 1, the annotation data comprises three parts: the first part is the original text of the text corpus, the second part is the entity annotation data, and the third part is the relation annotation data;
the preprocessing of the annotation data in step 2 comprises: expressing the entity annotation data in the form of {entity start position, entity end position, entity label} and then converting it into the BMES entity annotation scheme; converting the relation annotation data into the form of {first entity, second entity, relation, first entity start position, first entity end position, first entity label, second entity start position, second entity end position, second entity label};
generating the text relation extraction model training set and test set in step 2 by splitting the entity annotation data and the relation annotation data respectively at a ratio of 7:3.
Further, in step 3, the text relation extraction model includes a BERT (Bidirectional Encoder Representations from Transformers) layer, an entity extraction branch and a relation extraction branch, where the BERT layer is configured to perform deep feature extraction on the input text to obtain input text features. BERT essentially performs self-supervised learning on massive data and learns a good feature representation for words and characters. In subsequent downstream tasks, the BERT features can be used directly as the word embedding features of the task, and a well-performing model is obtained after fine-tuning according to the requirements of the downstream task.
The entity extraction branch is used for mapping the input text features to entity tags to obtain entity tag sequence vectors, and then predicting the entity tag sequence vectors to obtain entity categories;
and the relation extraction branch is used for carrying out classification prediction on the combined characteristics of the BERT output and the entity identification output to obtain the relation category between the two entities.
Compared with a pipeline-style information extraction mode that first extracts entities and then extracts relations, constructing a joint entity and relation extraction model has lower hardware resource overhead and higher speed. Meanwhile, the connection between the subtasks is strengthened, error propagation and accumulation between subtasks is reduced, and the relation extraction effect is improved.
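For illustration, the following is a minimal PyTorch-style sketch of such a joint extraction network; the class and parameter names (e.g. JointExtractionModel, num_entity_labels, num_relations) are illustrative assumptions and the CRF layer is omitted for brevity, so this is a sketch of the idea rather than the patent's own implementation:

```python
# Minimal sketch (assumptions: PyTorch + HuggingFace transformers are available;
# names such as JointExtractionModel and num_relations are illustrative only).
import torch
import torch.nn as nn
from transformers import BertModel

class JointExtractionModel(nn.Module):
    def __init__(self, bert_name="bert-base-chinese",
                 num_entity_labels=16, num_relations=24, gru_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)      # shared BERT layer
        hidden = self.bert.config.hidden_size
        # Entity extraction branch: fully connected layer (CRF omitted in this sketch)
        self.entity_fc = nn.Linear(hidden, num_entity_labels)
        # Relation extraction branch: label embedding + BiGRU + attention + classifier
        self.label_emb = nn.Embedding(num_entity_labels, hidden)
        self.bigru = nn.GRU(hidden, gru_hidden, batch_first=True, bidirectional=True)
        self.att_w = nn.Linear(2 * gru_hidden, 2 * gru_hidden)
        self.att_u = nn.Parameter(torch.randn(2 * gru_hidden))  # word context vector u_w
        self.rel_fc = nn.Linear(2 * gru_hidden, num_relations)

    def forward(self, input_ids, attention_mask, entity_label_ids):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        entity_logits = self.entity_fc(h)                      # fed to a CRF in the full model
        # Combine BERT features with entity-label features for the relation branch
        rel_in = h + self.label_emb(entity_label_ids)
        g, _ = self.bigru(rel_in)
        u = torch.tanh(self.att_w(g))                          # u_t = tanh(W_w h_t + b_w)
        alpha = torch.softmax(u @ self.att_u, dim=1)           # attention weights over tokens
        sa = (alpha.unsqueeze(-1) * g).sum(dim=1)              # sentence vector sa
        rel_logits = self.rel_fc(sa)                           # Softmax applied in the loss
        return entity_logits, rel_logits
```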
Further, in step 3, the entity extraction branch sequentially comprises a fully connected layer and a conditional random field layer. The fully connected layer is configured to map the input text features to the entity labels to obtain entity label sequence vectors, denoted h_1, h_2, ..., h_n, where n is the maximum model input length; the Conditional Random Field (CRF) layer is used to optimize and predict the entity label sequence vector to obtain the entity class. In contrast to the traditional Hidden Markov Model (HMM), a CRF is a Markov chain over the state sequence. CRFs can be used for various prediction problems and are commonly used in machine learning to handle sequence labeling problems. The probability P(y|s) of the label sequence y is calculated as:
P(y|s) = exp( Σ_{i=1..m} (W_{l_i} · h_i + b_{l_i}) ) / Σ_{y'} exp( Σ_{i=1..m} (W_{l'_i} · h_i + b_{l'_i}) )
where s denotes the input sentence, m denotes the number of labels in the label sequence y, and the labels in the label sequence y are l_1, l_2, ..., l_m; y' denotes an arbitrary label sequence with labels l'_1, l'_2, ..., l'_m; i denotes the label index in the label sequences y and y', with 1 ≤ i ≤ m; W_{l_i} denotes the weight vector corresponding to label sequence y, b_{l_i} denotes the bias corresponding to label sequence y, W_{l'_i} denotes the weight vector corresponding to label sequence y', and b_{l'_i} denotes the bias corresponding to label sequence y'. The optimal label sequence is then searched by the first-order Viterbi algorithm to obtain the entity class.
Further, the relation extraction branch in step 3 sequentially comprises a feature combination layer, a bidirectional GRU (Gated Recurrent Unit) layer, an attention layer and a Softmax classifier. The feature combination layer is configured to combine the input text features with the entity categories to obtain the relation extraction input features, denoted E_r;
the bidirectional GRU layer is used to obtain abstract features;
the attention layer is used to simulate the attention mechanism of a person reading information, focusing on the semantic information, such as keywords, whose local features have an important influence; its output feature is denoted A;
the Softmax classifier is used to map the attention layer output feature A to the entity relation categories, obtaining the probability of each category R = [r_1, ..., r_N], where N is the number of relation categories.
Further, step 3 comprises:
step 3-1: converting the training text in the training set into character vector characteristics, embedding position information to obtain BERT input characteristics, and recording as Ei(ii) a The position embedding mode is as follows:
Figure RE-GDA0003683803650000041
Figure RE-GDA0003683803650000042
wherein 2c denotes the even number bits of the input sequence, 2c +1 denotes the odd number bits of the input sequence, c is a natural number, dmodelIs the characteristic dimension of the BERT model, pos represents the word location information, and PE represents the location embedding function.
Step 3-2: inputting BERT input features Ei into a text relation extraction model, and performing depth feature extraction to obtain abstract features of a BERT layer;
Step 3-3: for a batch of training data {(s_j, y_j)}, j = 1, ..., K, the entity extraction branch loss function is:
loss_ner = -Σ_{j=1..K} log P(y_j | s_j) + (λ/2) ‖Θ‖^2
where λ denotes the L2 regularization parameter and Θ is the entity extraction branch parameter set.
Step 3-4: in the bidirectional GRU layer of the relation extraction branch, the hidden state h of the current time is calculatedt
Step 3-5: is closingDraw the attention layer of the branch according to the result h obtained in step 3-4tCalculating to obtain a word vector ut
In fact, each word in the sentence has an unequal effect on the expression of the meaning of the sentence, and a randomly initialized word context vector u is added during the attention level training processwAnd performing co-training, and calculating the correlation degree of the words and the relations by adding an attention layer to form an attention layer sentence vector sa.
Step 3-6: the Softmax classifier maps the sentence vector sa to a set of vectors with elements in the [0,1] interval, the vector sum being 1, as follows:
R=Softmax(s),R=[r1,r2,...,rN]andri∈[0,1]and∑ri=1
Further, in step 4, the text relation extraction model is trained, and the text relation extraction network loss value loss is a weighted sum of the entity extraction branch loss value loss_ner and the relation extraction branch loss value loss_re:
loss = loss_ner + 2 × loss_re
To enhance the influence of relation extraction, the weights of loss_ner and loss_re are set to 1 and 2, respectively. To prevent the model from overfitting during training, L2 regularization is added to constrain the text relation extraction network, a dropout strategy is introduced in the training process with the drop probability set to 0.5, and the mini-batch Adam optimization algorithm is adopted for parameter training of the relation extraction model.
Further, the BERT layer in step 3 adopts a 12-layer Transformer structure.
Further, in step 5, the entity recognition result and the test set data are input into the text relation extraction model to obtain the relation extraction result; the performance evaluation indexes are precision, recall and the F1 score.
Further, in step 1, entity labeling and relationship labeling are performed on the text corpus by manual labeling.
The principle of the invention is as follows: the invention extracts and associates the target information mentioned in military texts by constructing a joint entity and relation extraction model based on the BERT deep-learning pre-trained language model.
Beneficial effects:
Compared with the prior art, the invention has the following notable advantages: the BERT deep-learning pre-trained language model is obtained by training on large-scale open-source data and implicitly learns universal grammatical and semantic knowledge, so downstream tasks can achieve nearly the best effect with only a small amount of task data for fine-tuning. Meanwhile, a joint entity and relation extraction model is constructed; compared with a pipeline-style information extraction mode that first extracts entities and then extracts relations, the hardware resource overhead is smaller and the speed is higher. The connection between the subtasks is also strengthened, error propagation and accumulation between subtasks is reduced, and the effect of extracting military equipment target relations is improved.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a BERT-ATT-RE relationship extraction network of the present invention.
FIG. 2 is a schematic diagram of the BERT model structure of the present invention.
FIG. 3 is a schematic diagram of the structure of the Transformer of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings.
With the development of the information age, the amount of text data is increasing rapidly and computing power is developing quickly. In the field of natural language understanding, pre-trained models are produced by unsupervised learning on massive data and implicitly learn common grammatical and semantic knowledge, so downstream tasks can achieve nearly optimal effect with only a small amount of task data for fine-tuning. Text information extraction can be roughly divided into a pipeline mode and a joint extraction mode according to the logical order in which entity recognition and relation extraction are performed. The pipeline mode first extracts the entities of the Chinese text and then identifies the relations between the entities, whereas the joint extraction mode extracts entities and their relations simultaneously in the same model. The joint extraction mode strengthens the connection between tasks, reduces error propagation and accumulation between subtasks, and improves the relation extraction effect.
The embodiment of the application discloses a military equipment relationship extraction method (BERT-ATT-RE) based on a BERT and attention mechanism, which is suitable for an equipment target relationship information extraction scene in military news and comprises the following steps:
step 1, carrying out entity labeling and relation labeling on a text corpus to obtain labeled data;
step 2, preprocessing the labeled data to generate a text relation extraction model training set and a test set;
step 3, constructing a text relation extraction model;
step 4, training a text relation extraction model to obtain a trained text relation extraction model;
and 5, inputting the test set data into the trained text relation extraction model to obtain a relation extraction result.
In this embodiment, in step 1, the annotation data includes three parts: the first part is the original text of the text corpus, the second part is the entity annotation data, and the third part is the relation annotation data; entity labeling and relation labeling are performed on the text corpus by manual labeling.
The preprocessing of the annotation data in step 2 comprises: expressing the entity annotation data in the form of {entity start position, entity end position, entity label} and then converting it into the BMES entity annotation scheme; and converting the relation annotation data into the form of {first entity, second entity, relation, first entity start position, first entity end position, first entity label, second entity start position, second entity end position, second entity label}. For example, for the sentence "The X-country aircraft carrier is currently active in the XX sea area", the entity annotation result is {0, 2, "country"}, {2, 6, "equipment"}, {9, 11, "location"}, {13, 15, "action"}. The entity annotation data is converted into the BMES entity annotation scheme, where B denotes the start position of an entity, M denotes the middle of an entity, E denotes the end position of an entity, and S denotes a single-character entity. For example, "aircraft carrier" is labeled "B-equipment, M-equipment, E-equipment". The relation annotation result is {"aircraft carrier", "X country", "country", 2, 6, "equipment", 0, 2, "country"}, {"aircraft carrier", "XX", "position", 2, 6, "equipment", 9, 11, "location"}, {"aircraft carrier", "active", "mission", 2, 6, "equipment", 13, 15, "action"}.
Generating a text relation extraction model training set and a test set in the step 2 according to the following steps of 7: and 3, respectively segmenting the entity annotation data and the relation annotation data.
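As an illustration of this preprocessing step, the following sketch converts span-style entity annotations into BMES character tags and performs the 7:3 split; the function names and exact label strings are assumptions for demonstration rather than the patent's own implementation:

```python
# Illustrative sketch of the preprocessing described above (names are assumed).
import random

def spans_to_bmes(text, entities):
    """entities: list of (start, end, label) spans; returns one BMES tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in entities:
        if end - start == 1:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{label}"
            tags[end - 1] = f"E-{label}"
    return tags

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Shuffle and split the annotated samples into a 7:3 training/test partition."""
    random.seed(seed)
    samples = samples[:]
    random.shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

# Example usage with the spans from the description above
text = "X国航空母舰目前在XX海域活动"  # "The X-country aircraft carrier is currently active in the XX sea area"
entities = [(0, 2, "country"), (2, 6, "equipment"), (9, 11, "location"), (13, 15, "action")]
print(spans_to_bmes(text, entities))
```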
In this embodiment, in step 3, the text relationship extraction model includes a BERT layer, an entity extraction branch, and a relationship extraction branch. The BERT layer is used for carrying out depth feature extraction on an input text to obtain input text features; the entity extraction branch is used for mapping the input text features to entity tags to obtain entity tag sequence vectors, and then predicting the entity tag sequence vectors to obtain entity categories; and the relation extraction branch is used for carrying out classification prediction on the combined characteristics of the BERT output and the entity identification output to obtain the relation category between the two entities. As shown in fig. 1. The method specifically comprises the following steps:
step 3-1: converting training texts { X, state, navigation, air, mother, ship, eye, front, in, X, X, sea, domain, living, moving } in a training set into character vector characteristics, embedding position information to obtain BERT input characteristics, and marking as Ei. The position embedding mode is as follows:
Figure RE-GDA0003683803650000071
Figure RE-GDA0003683803650000072
wherein 2c represents the even number of the input sequence, 2c +1 represents the odd number of the input sequence, c is a natural number, dmodelIs the characteristic dimension of the BERT model, pos represents the word position information, PE represents the position embedding functionCounting; .
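A small sketch of this sinusoidal position embedding (the standard Transformer formulation the text describes) might look as follows; the sequence length and d_model are example values:

```python
# Sketch of the sinusoidal position embedding described in step 3-1.
import numpy as np

def position_embedding(seq_len, d_model):
    pe = np.zeros((seq_len, d_model))
    pos = np.arange(seq_len)[:, None]                # word positions
    c = np.arange(0, d_model, 2)                     # even dimension indices 2c
    div = np.power(10000.0, c / d_model)
    pe[:, 0::2] = np.sin(pos / div)                  # PE(pos, 2c)
    pe[:, 1::2] = np.cos(pos / div)                  # PE(pos, 2c+1)
    return pe

pe = position_embedding(seq_len=128, d_model=768)    # 768 = BERT-base feature dimension
print(pe.shape)  # (128, 768)
```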
Step 3-2: the first layer of the text relation extraction model network is BERT. BERT essentially performs self-supervised learning through mass data, learning a good feature representation for words/words. In subsequent downstream tasks, the characteristics of BERT may be used directly to represent word embedding characteristics as a task. And obtaining a model with good effect after fine adjustment according to the requirements of downstream tasks. BERT employs a 12-layer Transformer structure, as shown in FIG. 2, trm in the rectangular box represents the Transformer structure.
The transform, which is essentially an Encoder (Encoder) -Decoder (Decoder) structure, is shown in fig. 3, with an Encoder structure in the left dashed box and a Decoder structure in the right dashed box. The encoder structure of the transform model is formed by stacking Nx (Nx ═ 6) identical base layers, each base Layer is composed of two sub-layers, the first is a Multi-Head Attention Layer (Multi-Head Attention), the second is a dense full-link Feed Forward neural Network Layer (Feed Forward Network), then a Residual Connection (Residual Connection) is used once in the two sub-layers, and then a Layer Normalization (Layer Normalization) operation is performed (added & Norm in fig. 3 represents Residual Connection and Layer Normalization). The decoder structure is similar to the encoder structure, and is composed of 6 identical basic layers, each layer includes a Multi-Head Attention layer and a feedforward neural network layer, and a concealed Multi-Head Attention layer (Multi-Head Attention) for performing Multi-Head Attention operation on the output of the encoder layer. Each sub-layer of the decoder also employs residual concatenation and then normalization operations.
The multi-head attention mechanism can be regarded as an upgraded version of the Scaled Dot-Product Attention mechanism. Scaled dot-product attention is calculated as follows:
Attention(Q, K, V) = Softmax( Q K^T / sqrt(d_k) ) V
where Q, K, V denote the text query vector and the key-value pair vectors respectively, and d_k is the feature dimension of K. In Self-Attention, Q, K and V are all input vectors of the sentence, so the attention distribution within the sentence can be obtained, incorporating information from the other words in the sentence. The multi-head attention mechanism builds on scaled dot-product attention by adding linear transformations that extend it to several different representation subspaces, enlarging the feature information. The multi-head attention mechanism is calculated as follows:
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_k) W^O
head_j = Attention( Q W_j^Q, K W_j^K, V W_j^V )
where W_j^Q denotes the Q-oriented weight matrix, W_j^K denotes the K-oriented weight matrix, W_j^V denotes the V-oriented weight matrix, W^O denotes the output weight matrix, k denotes the number of attention heads, j denotes the j-th attention head, and 1 ≤ j ≤ k.
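For reference, here is a compact sketch of scaled dot-product and multi-head attention as described above; the head count and dimensions are example values, and the implementation is illustrative rather than the patent's own code:

```python
# Sketch of scaled dot-product and multi-head attention (illustrative values).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # Q K^T / sqrt(d_k)
    return F.softmax(scores, dim=-1) @ v

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=768, num_heads=12):
        super().__init__()
        self.num_heads, self.d_head = num_heads, d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)   # W^Q for all heads
        self.w_k = nn.Linear(d_model, d_model)   # W^K
        self.w_v = nn.Linear(d_model, d_model)   # W^V
        self.w_o = nn.Linear(d_model, d_model)   # output projection W^O

    def forward(self, q, k, v):
        b, t, _ = q.shape
        def split(x):  # (b, t, d_model) -> (b, heads, t, d_head)
            return x.view(b, -1, self.num_heads, self.d_head).transpose(1, 2)
        out = scaled_dot_product_attention(split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v)))
        out = out.transpose(1, 2).contiguous().view(b, t, -1)  # concatenate the heads
        return self.w_o(out)
```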
Step 3-3: after the text is subjected to depth feature extraction through a BERT layer, the text is divided into two branches to be respectively subjected to entity extraction and relationship extraction. And the entity extraction branch is added into a full connection layer and a CRF layer on the basis of the BERT layer. The full connection layer maps the features to entity labels, and the CRF layer optimizes the sequence. In contrast to conventional Hidden Markov Models (HMMs), CRF is a Markov Chain of state sequences (Markov Chain). CRF can be used for various predictive problems, and is commonly used in the machine learning field as a processing annotation problem. Finally, processing the output vector h of the full connection layer by a CRF layer1,h2,...,hnAnd n is the maximum model input length. The probability P (y | s) of the tag sequence y is calculated as:
Figure RE-GDA0003683803650000085
here, s denotes an input sentence, m denotes the number of tags in the tag sequence y, and the tags in the tag sequence y include l1,l2,...,lm(ii) a y' represents any tag sequence, including tags; i represents the label indexes of the label sequences y and y', and i is more than or equal to 1 and less than or equal to m;
Figure RE-GDA0003683803650000086
represents the weight vector corresponding to the label sequence y,
Figure RE-GDA0003683803650000087
indicates the offset corresponding to the tag sequence y,
Figure RE-GDA0003683803650000088
represents the weight vector corresponding to the label sequence y',
Figure RE-GDA0003683803650000089
represents the offset corresponding to the label sequence y'; and then searching the optimal label sequence through a first-order Viterbi algorithm to obtain the entity class. The sentence { X, Country, aviation, mother, ship, current, on, X, X, sea, territory, alive, action } corresponds to the entity class { B-Country, E-Country, B-equipment, M-equipment, E-equipment, O, O, O, B-site, E-site, O, O, B-action, E-action }. For a batch of training data
Figure RE-GDA0003683803650000091
The loss function is:
Figure RE-GDA0003683803650000092
where λ represents the L2 regularization parameter and Θ is the set of parameters.
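The following sketch illustrates a first-order Viterbi decode over per-position emission scores in the spirit of the formula above; it adds a label-transition matrix and random scores purely for illustration, so it is a simplified assumption rather than the full CRF layer of the patent:

```python
# Simplified first-order Viterbi decoding over emission scores h (from the FC layer)
# and a label-transition matrix; the transition matrix is an assumption for illustration.
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_labels) scores; transitions: (num_labels, num_labels)."""
    seq_len, num_labels = emissions.shape
    dp = emissions[0].copy()                      # best score ending in each label at t = 0
    backptr = np.zeros((seq_len, num_labels), dtype=int)
    for t in range(1, seq_len):
        scores = dp[:, None] + transitions + emissions[t][None, :]
        backptr[t] = scores.argmax(axis=0)        # best previous label for each current label
        dp = scores.max(axis=0)
    best = [int(dp.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]                             # optimal label index sequence

# Example with random scores for a 5-character sentence and 4 labels
rng = np.random.default_rng(0)
path = viterbi_decode(rng.normal(size=(5, 4)), rng.normal(size=(4, 4)))
print(path)
```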
Step 3-4: the relation extraction branch is added on the basis of BERT outputBidirectional gru (gated secure unit) and attention layer, and finally output the result through Softmax classifier. The bidirectional GRU layer is used for obtaining abstract characteristics and comprises two GRU units, and each GRU unit comprises a reset gate (reset gate) rtAnd an update gate zt. Updating the door ztFor controlling the output h of the preceding momentt-1Input x with the current timetThe degree of retention of the information contained in (a) is taken as the output h of the gating unit at time ttThe larger the value, the higher the degree of retention; and reset the door rtBy inputting x at the current momenttDetermining the previous time ht-1The smaller the reset gate value is, the higher the neglect degree is. Calculating to obtain the memory of the current moment
Figure RE-GDA0003683803650000093
And the hidden state h at the current moment after the reset gate and the update gatet. Update gate z of the GRU unit at time ttReset gate rtNew memory of
Figure RE-GDA0003683803650000094
Final hidden state htIs calculated as follows:
zt=σ(Wz·[ht-1,xt])
rt=σ(Wr·[ht-1,xt])
Figure RE-GDA0003683803650000095
Figure RE-GDA0003683803650000096
wherein, σ () is sigmoid nonlinear activation function for enhancing the processing capability of model to nonlinear data, and σ (x) is 1/(1+ e)-x). Denotes dot multiplication. tan h (x) or (e)x-e-x)/(ex+e-x)。W、Wr、WzIs the weight matrix of the model. []Indicating that two vectors are connected.
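A direct sketch of these GRU equations (assuming NumPy and randomly initialized weights purely for illustration) could look like this:

```python
# Sketch of one GRU step following the equations above (weights are random for illustration).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, W_z, W_r, W):
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                                   # update gate
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))    # candidate memory
    return (1.0 - z_t) * h_prev + z_t * h_tilde                   # new hidden state h_t

hidden, dim = 4, 3
rng = np.random.default_rng(0)
W_z, W_r, W = (rng.normal(size=(hidden, hidden + dim)) for _ in range(3))
h_t = gru_step(np.zeros(hidden), rng.normal(size=dim), W_z, W_r, W)
print(h_t)
```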
Step 3-5: attention layer. Attention mechanism (Attention) is a model proposed by Treisman and Gelade to simulate the Attention mechanism of the human brain. And highlighting key information influencing the output of the model by utilizing probability distribution. Therefore, the attention mechanism can more effectively utilize the local features and the global features of the text, improve the accuracy of relation extraction, and focus on semantic information with important influence of the local features like keywords.
For one sentence vector w ═ w1,w2,...,wTH } converting the result obtained in the step 3-2 into a result htThe word vector u is obtained by the following formulat
ut=tanh(Ww·ht+bw)
Wherein, WwRepresenting the attention level weight parameter, bwAn attention layer offset value parameter is represented.
In fact, each word in the sentence has an unequal effect on the expression of the meaning of the sentence, and a randomly initialized word context vector u is added during the attention level training processwAnd (5) performing co-training. And calculating the degree of correlation between the words and the relations by adding an attention layer to form an attention layer sentence vector sa. The attention layer calculation formula is as follows:
Figure RE-GDA0003683803650000101
Figure RE-GDA0003683803650000102
αtis the word utAnd uwSa is the weighted sentence vector representation at the current moment.
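A small sketch of this attention pooling over the GRU hidden states (weights randomly initialized for illustration) is shown below:

```python
# Sketch of the attention layer: u_t = tanh(W_w h_t + b_w), alpha_t = softmax(u_t . u_w), sa = sum(alpha_t h_t).
import numpy as np

def attention_pool(H, W_w, b_w, u_w):
    """H: (T, d) GRU hidden states; returns the weighted sentence vector sa."""
    U = np.tanh(H @ W_w.T + b_w)                   # word vectors u_t
    scores = U @ u_w                               # similarity with the context vector u_w
    alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights alpha_t
    return alpha @ H                               # sentence vector sa

T, d = 15, 8
rng = np.random.default_rng(0)
sa = attention_pool(rng.normal(size=(T, d)), rng.normal(size=(d, d)),
                    rng.normal(size=d), rng.normal(size=d))
print(sa.shape)  # (8,)
```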
Step 3-6: the Softmax classifier maps the sentence vector sa to a set of elements at [0,1]Vectors within the interval, i.e. the probability R ═ R for each entity relationship class1,...,rN]N is offThe coefficient class number, vector sum is 1, as follows:
R=Softmax(s),R=[r1,r2,...,rN]andri∈[0,1]and∑ri=1
and selecting the relation category with the highest probability as the relation between the two entities. For the example "X national aircraft carrier is currently active in XX sea area", the relationship recognition result (format: { starting entity, ending entity, relationship category }) is { "aircraft carrier", "X country", "country" }, { "aircraft carrier", "XX", "position" }, { "aircraft carrier", "activity", "mission" }.
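As a usage illustration, selecting the predicted relation from the classifier output could look like the following; the category list and scores are assumed values, with only three of the relation categories shown:

```python
# Illustrative selection of the highest-probability relation category (scores are assumed).
import numpy as np

relation_categories = ["country", "position", "mission"]   # partial example list (24 in total)
logits = np.array([2.1, 0.3, -0.5])                        # classifier scores for one entity pair
R = np.exp(logits) / np.exp(logits).sum()                  # softmax probabilities, summing to 1
predicted = relation_categories[int(R.argmax())]
print(predicted, R)
```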
In this embodiment, step 4 trains the BERT-ATT-RE relation extraction model. The BERT-ATT-RE network loss value loss is a weighted sum of the entity extraction branch loss value loss_ner and the relation extraction branch loss value loss_re:
loss = loss_ner + 2 × loss_re
To prevent the model from overfitting during training, the L2 regularization method is added to constrain the BERT-ATT-RE network. A dropout strategy is introduced in the training process with the drop probability set to 0.5, and the mini-batch Adam optimization method is adopted for model parameter training.
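A minimal sketch of this joint training step, reusing the JointExtractionModel sketch above, might look as follows; the learning rate, weight-decay value and the cross-entropy stand-in for the CRF loss are assumptions, not values disclosed by the patent:

```python
# Sketch of one joint training step: loss = loss_ner + 2 * loss_re, Adam with L2 weight decay (assumed values).
import torch
import torch.nn as nn

model = JointExtractionModel()                      # sketch defined earlier
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-4)  # L2 regularization
ce = nn.CrossEntropyLoss()

def train_step(batch):
    model.train()                                   # training mode (dropout active)
    entity_logits, rel_logits = model(batch["input_ids"], batch["attention_mask"],
                                      batch["entity_label_ids"])
    loss_ner = ce(entity_logits.view(-1, entity_logits.size(-1)),
                  batch["entity_label_ids"].view(-1))   # stand-in for the CRF negative log-likelihood
    loss_re = ce(rel_logits, batch["relation_ids"])
    loss = loss_ner + 2 * loss_re                   # weighted sum favouring relation extraction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```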
In this embodiment, in step 5, the entity recognition result and the test set data are input into the BERT-ATT-RE relation extraction model to obtain the relation extraction result. The performance evaluation indexes of the relation extraction result are precision, recall and the F1 score, calculated as follows:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1 = 2 × precision × recall / (precision + recall)
where TP denotes the number of samples correctly predicted as the positive class, FP denotes the number of negative-class samples judged as the positive class, and FN denotes the number of positive-class samples predicted as the negative class.
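These metrics can be computed directly from the prediction counts, for example (the counts below are hypothetical, for illustration only):

```python
# Sketch of the evaluation metrics defined above.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts for illustration only
print(precision_recall_f1(tp=2285, fp=90, fn=105))
```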
The experimental data set was generated by manual annotation based on military corpora from Baidu Encyclopedia and Interactive Encyclopedia. The data set comprises 13940 training samples and 2390 test samples, covering 24 relations. Recognizing the test samples yields a relation extraction precision, recall and F1 of 0.962, 0.956 and 0.959 respectively, a clear improvement over the BERT baseline's 0.946, 0.93 and 0.938.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit, where the computer storage medium is capable of storing a computer program, and when the computer program is executed by the data processing unit, the computer program may run the inventive content of the military equipment relationship extraction method based on BERT and attention mechanism provided by the present invention and some or all of the steps in each embodiment. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
It is obvious to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and its corresponding general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a computer program or a software product, which may be stored in a storage medium and includes several instructions to enable a device (which may be a personal computer, a server, a single chip computer, MUU, or a network device) including a data processing unit to execute the method in the embodiments or some parts of the embodiments of the present invention.
The present invention provides a military equipment relation extraction method based on BERT and an attention mechanism. There are many methods and ways to implement this technical solution, and the above description is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and embellishments without departing from the principle of the invention, and these improvements and embellishments should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be implemented by the prior art.

Claims (10)

1. A military equipment relation extraction method based on BERT and attention mechanism is characterized by comprising the following steps:
step 1, performing entity labeling and relation labeling on a text corpus to obtain labeled data;
step 2, preprocessing the labeled data to generate a text relation extraction model training set and a test set;
step 3, constructing a text relation extraction model;
step 4, training a text relation extraction model to obtain a trained text relation extraction model;
and 5, inputting the test set data into the trained text relation extraction model to obtain a relation extraction result.
2. The military equipment relation extraction method based on BERT and attention mechanism as claimed in claim 1, characterized in that in step 1, the annotation data comprises three parts: the first part is the original text of the text corpus, the second part is the entity annotation data, and the third part is the relation annotation data;
the preprocessing of the annotation data in step 2 comprises: expressing the entity annotation data in the form of {entity start position, entity end position, entity label} and then converting it into the BMES entity annotation scheme; converting the relation annotation data into the form of {first entity, second entity, relation, first entity start position, first entity end position, first entity label, second entity start position, second entity end position, second entity label};
generating the text relation extraction model training set and test set in step 2 by splitting the entity annotation data and the relation annotation data respectively at a ratio of 7:3.
3. The military equipment relationship extraction method based on BERT and attention mechanism as claimed in claim 2, characterized in that in step 3, the text relationship extraction model comprises a BERT layer, an entity extraction branch and a relationship extraction branch; the BERT layer is used for carrying out depth feature extraction on an input text to obtain input text features; the entity extraction branch is used for mapping the input text features to entity tags to obtain entity tag sequence vectors, and then predicting the entity tag sequence vectors to obtain entity categories; and the relation extraction branch is used for carrying out classification prediction on the combined characteristics of the BERT output and the entity identification output to obtain the relation category between the two entities.
4. The military equipment relation extraction method based on BERT and attention mechanism as claimed in claim 3, wherein the entity extraction branch in step 3 sequentially comprises a fully connected layer and a conditional random field layer; the fully connected layer is used for mapping the input text features to entity labels to obtain entity label sequence vectors, denoted h_1, h_2, ..., h_n, where n is the maximum model input length; the conditional random field layer is used for optimizing and predicting the entity label sequence vector to obtain the entity class; the probability P(y|s) of the label sequence y is calculated as:
P(y|s) = exp( Σ_{i=1..m} (W_{l_i} · h_i + b_{l_i}) ) / Σ_{y'} exp( Σ_{i=1..m} (W_{l'_i} · h_i + b_{l'_i}) )
where s denotes the input sentence, m denotes the number of labels in the label sequence y, and the labels in the label sequence y are l_1, l_2, ..., l_m; y' denotes an arbitrary label sequence; i denotes the label index in the label sequences y and y', with 1 ≤ i ≤ m; W_{l_i} denotes the weight vector corresponding to label sequence y, b_{l_i} denotes the bias corresponding to label sequence y, W_{l'_i} denotes the weight vector corresponding to label sequence y', and b_{l'_i} denotes the bias corresponding to label sequence y'; the optimal label sequence is then searched by the first-order Viterbi algorithm to obtain the entity class.
5. The method as claimed in claim 4, wherein the relation extraction branch in step 3 sequentially comprises a feature combination layer, a bidirectional GRU layer, an attention layer and a Softmax classifier, wherein the feature combination layer is used for combining the input text features and the entity categories to obtain the relation extraction input features, denoted E_r;
the bidirectional GRU layer is used for acquiring abstract features;
the attention layer is used for simulating the attention mechanism of a person reading information, mainly focusing on local features, and its output feature is denoted A;
the Softmax classifier is used for mapping the attention layer output feature A to the entity relation categories, obtaining the probability of each category R = [r_1, ..., r_N], where N is the number of relation categories.
6. The military equipment relation extraction method based on the BERT and attention mechanism as claimed in claim 5, wherein step 3 comprises:
Step 3-1: converting the training text in the training set into character vector features and embedding position information to obtain the BERT input features, denoted E_i; the position embedding is:
PE(pos, 2c) = sin( pos / 10000^(2c / d_model) )
PE(pos, 2c+1) = cos( pos / 10000^(2c / d_model) )
where 2c denotes the even positions of the input sequence, 2c+1 denotes the odd positions of the input sequence, c is a natural number, d_model is the feature dimension of the BERT model, pos denotes the word position information, and PE denotes the position embedding function;
Step 3-2: inputting the BERT input features E_i into the text relation extraction model and performing deep feature extraction to obtain the abstract features of the BERT layer;
Step 3-3: for a batch of training data {(s_j, y_j)}, j = 1, ..., K, the entity extraction branch loss function is:
loss_ner = -Σ_{j=1..K} log P(y_j | s_j) + (λ/2) ‖Θ‖^2
where λ denotes the L2 regularization parameter and Θ is the entity extraction branch parameter set;
Step 3-4: in the bidirectional GRU layer of the relation extraction branch, calculating the hidden state h_t at the current time;
Step 3-5: in the attention layer of the relation extraction branch, calculating the word vector u_t from the result h_t obtained in step 3-4; adding a randomly initialized word context vector u_w for joint training during attention layer training, and calculating the correlation between words and relations through the attention layer to form the attention layer sentence vector sa;
Step 3-6: the Softmax classifier maps the sentence vector sa to a set of vectors whose elements lie in the interval [0,1] and sum to 1, as follows:
R = Softmax(sa), R = [r_1, r_2, ..., r_N], r_i ∈ [0,1], Σ r_i = 1.
7. the method of claim 6, wherein in step 4, a text relation extraction model training is performed, and the text relation extraction network loss value loss is an entity extraction branch loss value lossnerSum relation decimation Branch penalty value lossreWeighted sum:
loss=lossner+2×lossre
adding L2 regularization to constrain a text relation extraction network, introducing a dropout strategy in a training process, setting the suppression probability to be 0.5, and adopting a batch Adam optimization algorithm for parameter training of the relation extraction model.
8. The military equipment relationship extraction method based on the BERT and attention mechanism as claimed in claim 7, wherein the BERT layer in step 3 adopts a 12-layer Transformer structure.
9. The military equipment relation extraction method based on the BERT and attention mechanism as claimed in claim 8, wherein in step 5, the test set data is input into the text relation extraction model to obtain the relation extraction result; the performance evaluation indexes of the relation extraction result are precision, recall and the F1 score.
10. The military equipment relationship extraction method based on BERT and attention mechanism as claimed in claim 9, wherein in step 1, entity labeling and relationship labeling are performed on text corpus by manual labeling.
CN202210555624.4A 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism Pending CN114781375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210555624.4A CN114781375A (en) 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210555624.4A CN114781375A (en) 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism

Publications (1)

Publication Number Publication Date
CN114781375A true CN114781375A (en) 2022-07-22

Family

ID=82409685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210555624.4A Pending CN114781375A (en) 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism

Country Status (1)

Country Link
CN (1) CN114781375A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628174A (en) * 2023-02-17 2023-08-22 广东技术师范大学 End-to-end relation extraction method and system for fusing entity and relation information
CN117114739A (en) * 2023-09-27 2023-11-24 数据空间研究院 Enterprise supply chain information mining method, mining system and storage medium
CN117688974A (en) * 2024-02-01 2024-03-12 中国人民解放军总医院 Knowledge graph-based generation type large model modeling method, system and equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502749A (en) * 2019-08-02 2019-11-26 中国电子科技集团公司第二十八研究所 A kind of text Relation extraction method based on the double-deck attention mechanism Yu two-way GRU
CN111160008A (en) * 2019-12-18 2020-05-15 华南理工大学 Entity relationship joint extraction method and system
CN111985239A (en) * 2020-07-31 2020-11-24 杭州远传新业科技有限公司 Entity identification method and device, electronic equipment and storage medium
CN112163092A (en) * 2020-10-10 2021-01-01 成都数之联科技有限公司 Entity and relation extraction method, system, device and medium
CN112215004A (en) * 2020-09-04 2021-01-12 中国电子科技集团公司第二十八研究所 Application method in extraction of text entities of military equipment based on transfer learning
CN112883732A (en) * 2020-11-26 2021-06-01 中国电子科技网络信息安全有限公司 Method and device for identifying Chinese fine-grained named entities based on associative memory network
CN113011191A (en) * 2021-04-28 2021-06-22 广东工业大学 Knowledge joint extraction model training method
CN114118056A (en) * 2021-10-13 2022-03-01 中国人民解放军军事科学院国防工程研究院工程防护研究所 Information extraction method for war research report
CN114490885A (en) * 2021-12-24 2022-05-13 华南师范大学 Entity relationship extraction method and device, electronic equipment and storage medium
CN114510576A (en) * 2021-12-21 2022-05-17 一拓通信集团股份有限公司 Entity relationship extraction method based on BERT and BiGRU fusion attention mechanism

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502749A (en) * 2019-08-02 2019-11-26 中国电子科技集团公司第二十八研究所 A kind of text Relation extraction method based on the double-deck attention mechanism Yu two-way GRU
CN111160008A (en) * 2019-12-18 2020-05-15 华南理工大学 Entity relationship joint extraction method and system
CN111985239A (en) * 2020-07-31 2020-11-24 杭州远传新业科技有限公司 Entity identification method and device, electronic equipment and storage medium
CN112215004A (en) * 2020-09-04 2021-01-12 中国电子科技集团公司第二十八研究所 Application method in extraction of text entities of military equipment based on transfer learning
CN112163092A (en) * 2020-10-10 2021-01-01 成都数之联科技有限公司 Entity and relation extraction method, system, device and medium
CN112883732A (en) * 2020-11-26 2021-06-01 中国电子科技网络信息安全有限公司 Method and device for identifying Chinese fine-grained named entities based on associative memory network
CN113011191A (en) * 2021-04-28 2021-06-22 广东工业大学 Knowledge joint extraction model training method
CN114118056A (en) * 2021-10-13 2022-03-01 中国人民解放军军事科学院国防工程研究院工程防护研究所 Information extraction method for war research report
CN114510576A (en) * 2021-12-21 2022-05-17 一拓通信集团股份有限公司 Entity relationship extraction method based on BERT and BiGRU fusion attention mechanism
CN114490885A (en) * 2021-12-24 2022-05-13 华南师范大学 Entity relationship extraction method and device, electronic equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628174A (en) * 2023-02-17 2023-08-22 广东技术师范大学 End-to-end relation extraction method and system for fusing entity and relation information
CN117114739A (en) * 2023-09-27 2023-11-24 数据空间研究院 Enterprise supply chain information mining method, mining system and storage medium
CN117114739B (en) * 2023-09-27 2024-05-03 数据空间研究院 Enterprise supply chain information mining method, mining system and storage medium
CN117688974A (en) * 2024-02-01 2024-03-12 中国人民解放军总医院 Knowledge graph-based generation type large model modeling method, system and equipment
CN117688974B (en) * 2024-02-01 2024-04-26 中国人民解放军总医院 Knowledge graph-based generation type large model modeling method, system and equipment

Similar Documents

Publication Publication Date Title
CN110502749B (en) Text relation extraction method based on double-layer attention mechanism and bidirectional GRU
CN110188358B (en) Training method and device for natural language processing model
CN111046179B (en) Text classification method for open network question in specific field
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN109800437B (en) Named entity recognition method based on feature fusion
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN110263325B (en) Chinese word segmentation system
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN115794999B (en) Patent document query method based on diffusion model and computer equipment
CN113128203A (en) Attention mechanism-based relationship extraction method, system, equipment and storage medium
CN112257449A (en) Named entity recognition method and device, computer equipment and storage medium
CN111984791B (en) Attention mechanism-based long text classification method
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN113282714B (en) Event detection method based on differential word vector representation
CN113743099A (en) Self-attention mechanism-based term extraction system, method, medium and terminal
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
Amari et al. Deep convolutional neural network for Arabic speech recognition
Mankolli et al. Machine learning and natural language processing: Review of models and optimization problems
Pappas et al. A survey on language modeling using neural networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination