CN114781375A - Military equipment relation extraction method based on BERT and attention mechanism - Google Patents

Military equipment relation extraction method based on BERT and attention mechanism Download PDF

Info

Publication number
CN114781375A
Authority
CN
China
Prior art keywords
entity
relation
extraction
bert
text
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210555624.4A
Other languages
Chinese (zh)
Inventor
王鑫鹏
阮国庆
李晓冬
吴蔚
徐建
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
CETC 28 Research Institute
Original Assignee
CETC 28 Research Institute
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by CETC 28 Research Institute filed Critical CETC 28 Research Institute
Priority to CN202210555624.4A priority Critical patent/CN114781375A/en
Publication of CN114781375A publication Critical patent/CN114781375A/en
Pending legal-status Critical Current

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/30 Semantic analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Molecular Biology (AREA)
  • Biophysics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biomedical Technology (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a military equipment relation extraction method based on BERT and an attention mechanism. Equipment target relation information in military news is extracted through joint entity and relation extraction. First, a BERT layer is constructed to extract text feature information. Second, the network splits into an entity extraction branch and a relation extraction branch: the entity extraction branch adds a fully connected layer and a conditional random field on top of the BERT network for label sequence prediction and optimization; the relation extraction branch embeds the label features and the start/end marker features of the relation's head and tail entities on top of the BERT network output, mines the relation between entities through a GRU and an attention layer, and finally predicts the relation through a fully connected layer. Third, during training the loss values of the entity extraction branch and the relation extraction branch are added and optimized by the same optimizer. Experimental results show that the method is effective for Chinese text relation extraction.

Description

Military equipment relation extraction method based on BERT and attention mechanism
Technical Field
The invention relates to the technical field of text relation extraction, in particular to a military equipment relation extraction method based on BERT and attention mechanism.
Background
With the rapid development of information technology and networks, the amount of information is growing explosively, and how to extract important information from massive data has become a research hotspot in information services. Text information processing covers directions such as entity extraction, relation extraction, event extraction and machine reading comprehension. Relation extraction establishes the relations between entities, converts text information into structured data, and provides data support for downstream applications such as Chinese information content retrieval and knowledge graph construction.
Relation extraction methods mainly fall into supervised, semi-supervised and unsupervised entity relation extraction. Unsupervised entity relation extraction consists of entity clustering and relation type word selection, but suffers from inaccurate feature extraction, unreasonable clustering results and low accuracy of relation results. Semi-supervised entity relation extraction methods, such as Bootstrapping, summarize entity relation sequence patterns from texts containing relation seeds and then use them to find more relation seed instances; however, noise mixed in during iteration causes semantic drift. The main idea of supervised entity relation extraction is to train a machine learning model on labeled data and perform relation recognition on test data. Supervised methods are further divided into rule-based and feature-based relation extraction. Rule-based methods summarize rules or templates from the corpus and domain and extract entity relations through template matching; such methods rely on named entity recognition systems and distance calculations and are prone to extra error propagation and time consumption. Feature-based methods mainly use machine learning models such as RNN (Recurrent Neural Network), CNN (Convolutional Neural Network) and LSTM (Long Short-Term Memory network) to extract text features automatically without constructing complex features, but they cannot make full use of both the local and global features of text information.
Disclosure of Invention
Purpose of the invention: the invention aims to solve the above technical problems of the prior art and provides a military equipment relation extraction method based on BERT and an attention mechanism, which can effectively improve the accuracy of Chinese text relation extraction.
To solve the technical problem, the invention discloses a military equipment relation extraction method based on BERT and an attention mechanism, comprising the following steps:
step 1, performing entity labeling and relation labeling on a text corpus to obtain labeled data;
step 2, preprocessing the labeled data to generate a text relation extraction model training set and a test set;
step 3, constructing a text relation extraction model;
step 4, training a text relation extraction model to obtain a trained text relation extraction model;
and 5, inputting the test set data into the trained text relation extraction model to obtain a relation extraction result.
Further, in step 1, the annotation data comprises three parts: the first part is the original text of the text corpus, the second part is the entity annotation data, and the third part is the relation annotation data;
the preprocessing of the annotation data in step 2 comprises: expressing the entity annotation data in the form of {entity start position, entity end position, entity label} and then converting it into the BMES entity annotation scheme; converting the relation annotation data into the form of {first entity, second entity, relation, first entity start position, first entity end position, first entity label, second entity start position, second entity end position, second entity label};
generating the text relation extraction model training set and test set in step 2 by splitting the entity annotation data and the relation annotation data respectively at a ratio of 7:3.
Further, in step 3, the text relation extraction model includes a BERT (Bidirectional Encoder Representations from Transformers) layer, an entity extraction branch and a relation extraction branch, where the BERT layer is configured to perform deep feature extraction on the input text to obtain input text features. BERT essentially performs self-supervised learning on massive data and learns a good feature representation for words and characters. In subsequent downstream tasks, the BERT features can be used directly as the word embedding features of the task, and a well-performing model is obtained after fine-tuning according to the requirements of the downstream task.
The entity extraction branch is used for mapping the input text features to entity tags to obtain entity tag sequence vectors, and then predicting the entity tag sequence vectors to obtain entity categories;
and the relation extraction branch is used for carrying out classification prediction on the combined characteristics of the BERT output and the entity identification output to obtain the relation category between the two entities.
Compared with a pipeline-style information extraction mode that first extracts entities and then extracts relations, constructing a joint entity and relation extraction model has lower hardware resource overhead and higher speed. Meanwhile, the connection between the subtasks is strengthened, error propagation and accumulation between subtasks is reduced, and the relation extraction effect is improved.
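For illustration, the following is a minimal PyTorch-style sketch of such a joint extraction network; the class and parameter names (e.g. JointExtractionModel, num_entity_labels, num_relations) are illustrative assumptions and the CRF layer is omitted for brevity, so this is a sketch of the idea rather than the patent's own implementation:

```python
# Minimal sketch (assumptions: PyTorch + HuggingFace transformers are available;
# names such as JointExtractionModel and num_relations are illustrative only).
import torch
import torch.nn as nn
from transformers import BertModel

class JointExtractionModel(nn.Module):
    def __init__(self, bert_name="bert-base-chinese",
                 num_entity_labels=16, num_relations=24, gru_hidden=256):
        super().__init__()
        self.bert = BertModel.from_pretrained(bert_name)      # shared BERT layer
        hidden = self.bert.config.hidden_size
        # Entity extraction branch: fully connected layer (CRF omitted in this sketch)
        self.entity_fc = nn.Linear(hidden, num_entity_labels)
        # Relation extraction branch: label embedding + BiGRU + attention + classifier
        self.label_emb = nn.Embedding(num_entity_labels, hidden)
        self.bigru = nn.GRU(hidden, gru_hidden, batch_first=True, bidirectional=True)
        self.att_w = nn.Linear(2 * gru_hidden, 2 * gru_hidden)
        self.att_u = nn.Parameter(torch.randn(2 * gru_hidden))  # word context vector u_w
        self.rel_fc = nn.Linear(2 * gru_hidden, num_relations)

    def forward(self, input_ids, attention_mask, entity_label_ids):
        h = self.bert(input_ids, attention_mask=attention_mask).last_hidden_state
        entity_logits = self.entity_fc(h)                      # fed to a CRF in the full model
        # Combine BERT features with entity-label features for the relation branch
        rel_in = h + self.label_emb(entity_label_ids)
        g, _ = self.bigru(rel_in)
        u = torch.tanh(self.att_w(g))                          # u_t = tanh(W_w h_t + b_w)
        alpha = torch.softmax(u @ self.att_u, dim=1)           # attention weights over tokens
        sa = (alpha.unsqueeze(-1) * g).sum(dim=1)              # sentence vector sa
        rel_logits = self.rel_fc(sa)                           # Softmax applied in the loss
        return entity_logits, rel_logits
```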
Further, in step 3, the entity extraction branch sequentially comprises a fully connected layer and a conditional random field layer. The fully connected layer is configured to map the input text features to the entity labels to obtain entity label sequence vectors, denoted h_1, h_2, ..., h_n, where n is the maximum model input length; the Conditional Random Field (CRF) layer is used to optimize and predict the entity label sequence vector to obtain the entity class. In contrast to the traditional Hidden Markov Model (HMM), a CRF is a Markov chain over the state sequence. CRFs can be used for various prediction problems and are commonly used in machine learning to handle sequence labeling problems. The probability P(y|s) of the label sequence y is calculated as:
P(y|s) = exp( Σ_{i=1..m} (W_{l_i} · h_i + b_{l_i}) ) / Σ_{y'} exp( Σ_{i=1..m} (W_{l'_i} · h_i + b_{l'_i}) )
where s denotes the input sentence, m denotes the number of labels in the label sequence y, and the labels in the label sequence y are l_1, l_2, ..., l_m; y' denotes an arbitrary label sequence with labels l'_1, l'_2, ..., l'_m; i denotes the label index in the label sequences y and y', with 1 ≤ i ≤ m; W_{l_i} denotes the weight vector corresponding to label sequence y, b_{l_i} denotes the bias corresponding to label sequence y, W_{l'_i} denotes the weight vector corresponding to label sequence y', and b_{l'_i} denotes the bias corresponding to label sequence y'. The optimal label sequence is then searched by the first-order Viterbi algorithm to obtain the entity class.
Further, the relation extraction branch in step 3 sequentially comprises a feature combination layer, a bidirectional GRU (Gated Recurrent Unit) layer, an attention layer and a Softmax classifier. The feature combination layer is configured to combine the input text features with the entity categories to obtain the relation extraction input features, denoted E_r;
the bidirectional GRU layer is used to obtain abstract features;
the attention layer is used to simulate the attention mechanism of a person reading information, focusing on the semantic information, such as keywords, whose local features have an important influence; its output feature is denoted A;
the Softmax classifier is used to map the attention layer output feature A to the entity relation categories, obtaining the probability of each category R = [r_1, ..., r_N], where N is the number of relation categories.
Further, step 3 comprises:
step 3-1: converting the training text in the training set into character vector characteristics, embedding position information to obtain BERT input characteristics, and recording as Ei(ii) a The position embedding mode is as follows:
Figure RE-GDA0003683803650000041
Figure RE-GDA0003683803650000042
wherein 2c denotes the even number bits of the input sequence, 2c +1 denotes the odd number bits of the input sequence, c is a natural number, dmodelIs the characteristic dimension of the BERT model, pos represents the word location information, and PE represents the location embedding function.
Step 3-2: inputting BERT input features Ei into a text relation extraction model, and performing depth feature extraction to obtain abstract features of a BERT layer;
Step 3-3: for a batch of training data {(s_j, y_j)}, j = 1, ..., K, the entity extraction branch loss function is:
loss_ner = -Σ_{j=1..K} log P(y_j | s_j) + (λ/2) ‖Θ‖^2
where λ denotes the L2 regularization parameter and Θ is the entity extraction branch parameter set.
Step 3-4: in the bidirectional GRU layer of the relation extraction branch, the hidden state h of the current time is calculatedt
Step 3-5: is closingDraw the attention layer of the branch according to the result h obtained in step 3-4tCalculating to obtain a word vector ut
In fact, each word in the sentence has an unequal effect on the expression of the meaning of the sentence, and a randomly initialized word context vector u is added during the attention level training processwAnd performing co-training, and calculating the correlation degree of the words and the relations by adding an attention layer to form an attention layer sentence vector sa.
Step 3-6: the Softmax classifier maps the sentence vector sa to a set of vectors with elements in the [0,1] interval, the vector sum being 1, as follows:
R=Softmax(s),R=[r1,r2,...,rN]andri∈[0,1]and∑ri=1
Further, in step 4, the text relation extraction model is trained, and the text relation extraction network loss value loss is a weighted sum of the entity extraction branch loss value loss_ner and the relation extraction branch loss value loss_re:
loss = loss_ner + 2 × loss_re
To enhance the influence of relation extraction, the weights of loss_ner and loss_re are set to 1 and 2, respectively. To prevent the model from overfitting during training, L2 regularization is added to constrain the text relation extraction network, a dropout strategy is introduced in the training process with the drop probability set to 0.5, and the mini-batch Adam optimization algorithm is adopted for parameter training of the relation extraction model.
Further, the BERT layer in step 3 adopts a 12-layer Transformer structure.
Further, in step 5, the entity recognition result and the test set data are input into the text relation extraction model to obtain the relation extraction result; the performance evaluation indexes are precision, recall and the F1 score.
Further, in step 1, entity labeling and relationship labeling are performed on the text corpus by manual labeling.
The principle of the invention is as follows: the invention extracts and associates the target information mentioned in military texts by constructing a joint entity and relation extraction model based on the BERT deep-learning pre-trained language model.
Beneficial effects:
Compared with the prior art, the invention has the following notable advantages: the BERT deep-learning pre-trained language model is obtained by training on large-scale open-source data and implicitly learns universal grammatical and semantic knowledge, so downstream tasks can achieve nearly the best effect with only a small amount of task data for fine-tuning. Meanwhile, a joint entity and relation extraction model is constructed; compared with a pipeline-style information extraction mode that first extracts entities and then extracts relations, the hardware resource overhead is smaller and the speed is higher. The connection between the subtasks is also strengthened, error propagation and accumulation between subtasks is reduced, and the effect of extracting military equipment target relations is improved.
Drawings
The foregoing and/or other advantages of the invention will become further apparent from the following detailed description of the invention when taken in conjunction with the accompanying drawings.
FIG. 1 is a BERT-ATT-RE relationship extraction network of the present invention.
FIG. 2 is a schematic diagram of the BERT model structure of the present invention.
FIG. 3 is a schematic diagram of the structure of the Transformer of the present invention.
Detailed Description
Embodiments of the present invention will be described below with reference to the accompanying drawings.
With the development of the information age, the amount of text data is increasing rapidly and computing power is developing quickly. In the field of natural language understanding, pre-trained models are produced by unsupervised learning on massive data and implicitly learn common grammatical and semantic knowledge, so downstream tasks can achieve nearly optimal effect with only a small amount of task data for fine-tuning. Text information extraction can be roughly divided into a pipeline mode and a joint extraction mode according to the logical order in which entity recognition and relation extraction are performed. The pipeline mode first extracts the entities of the Chinese text and then identifies the relations between the entities, whereas the joint extraction mode extracts entities and their relations simultaneously in the same model. The joint extraction mode strengthens the connection between tasks, reduces error propagation and accumulation between subtasks, and improves the relation extraction effect.
The embodiment of the application discloses a military equipment relationship extraction method (BERT-ATT-RE) based on a BERT and attention mechanism, which is suitable for an equipment target relationship information extraction scene in military news and comprises the following steps:
step 1, carrying out entity labeling and relation labeling on a text corpus to obtain labeled data;
step 2, preprocessing the labeled data to generate a text relation extraction model training set and a test set;
step 3, constructing a text relation extraction model;
step 4, training a text relation extraction model to obtain a trained text relation extraction model;
and 5, inputting the test set data into the trained text relation extraction model to obtain a relation extraction result.
In this embodiment, in step 1, the annotation data includes three parts: the first part is the original text of the text corpus, the second part is the entity annotation data, and the third part is the relation annotation data; entity labeling and relation labeling are performed on the text corpus by manual labeling.
The preprocessing of the annotation data in step 2 comprises: expressing the entity annotation data in the form of {entity start position, entity end position, entity label} and then converting it into the BMES entity annotation scheme; and converting the relation annotation data into the form of {first entity, second entity, relation, first entity start position, first entity end position, first entity label, second entity start position, second entity end position, second entity label}. For example, for the sentence "The X-country aircraft carrier is currently active in the XX sea area", the entity annotation result is {0, 2, "country"}, {2, 6, "equipment"}, {9, 11, "location"}, {13, 15, "action"}. The entity annotation data is converted into the BMES entity annotation scheme, where B denotes the start position of an entity, M denotes the middle of an entity, E denotes the end position of an entity, and S denotes a single-character entity. For example, "aircraft carrier" is labeled "B-equipment, M-equipment, E-equipment". The relation annotation result is {"aircraft carrier", "X country", "country", 2, 6, "equipment", 0, 2, "country"}, {"aircraft carrier", "XX", "position", 2, 6, "equipment", 9, 11, "location"}, {"aircraft carrier", "active", "mission", 2, 6, "equipment", 13, 15, "action"}.
Generating a text relation extraction model training set and a test set in the step 2 according to the following steps of 7: and 3, respectively segmenting the entity annotation data and the relation annotation data.
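As an illustration of this preprocessing step, the following sketch converts span-style entity annotations into BMES character tags and performs the 7:3 split; the function names and exact label strings are assumptions for demonstration rather than the patent's own implementation:

```python
# Illustrative sketch of the preprocessing described above (names are assumed).
import random

def spans_to_bmes(text, entities):
    """entities: list of (start, end, label) spans; returns one BMES tag per character."""
    tags = ["O"] * len(text)
    for start, end, label in entities:
        if end - start == 1:
            tags[start] = f"S-{label}"
        else:
            tags[start] = f"B-{label}"
            for i in range(start + 1, end - 1):
                tags[i] = f"M-{label}"
            tags[end - 1] = f"E-{label}"
    return tags

def split_dataset(samples, train_ratio=0.7, seed=42):
    """Shuffle and split the annotated samples into a 7:3 training/test partition."""
    random.seed(seed)
    samples = samples[:]
    random.shuffle(samples)
    cut = int(len(samples) * train_ratio)
    return samples[:cut], samples[cut:]

# Example usage with the spans from the description above
text = "X国航空母舰目前在XX海域活动"  # "The X-country aircraft carrier is currently active in the XX sea area"
entities = [(0, 2, "country"), (2, 6, "equipment"), (9, 11, "location"), (13, 15, "action")]
print(spans_to_bmes(text, entities))
```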
In this embodiment, in step 3, the text relationship extraction model includes a BERT layer, an entity extraction branch, and a relationship extraction branch. The BERT layer is used for carrying out depth feature extraction on an input text to obtain input text features; the entity extraction branch is used for mapping the input text features to entity tags to obtain entity tag sequence vectors, and then predicting the entity tag sequence vectors to obtain entity categories; and the relation extraction branch is used for carrying out classification prediction on the combined characteristics of the BERT output and the entity identification output to obtain the relation category between the two entities. As shown in fig. 1. The method specifically comprises the following steps:
step 3-1: converting training texts { X, state, navigation, air, mother, ship, eye, front, in, X, X, sea, domain, living, moving } in a training set into character vector characteristics, embedding position information to obtain BERT input characteristics, and marking as Ei. The position embedding mode is as follows:
Figure RE-GDA0003683803650000071
Figure RE-GDA0003683803650000072
wherein 2c represents the even number of the input sequence, 2c +1 represents the odd number of the input sequence, c is a natural number, dmodelIs the characteristic dimension of the BERT model, pos represents the word position information, PE represents the position embedding functionCounting; .
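A small sketch of this sinusoidal position embedding (the standard Transformer formulation the text describes) might look as follows; the sequence length and d_model are example values:

```python
# Sketch of the sinusoidal position embedding described in step 3-1.
import numpy as np

def position_embedding(seq_len, d_model):
    pe = np.zeros((seq_len, d_model))
    pos = np.arange(seq_len)[:, None]                # word positions
    c = np.arange(0, d_model, 2)                     # even dimension indices 2c
    div = np.power(10000.0, c / d_model)
    pe[:, 0::2] = np.sin(pos / div)                  # PE(pos, 2c)
    pe[:, 1::2] = np.cos(pos / div)                  # PE(pos, 2c+1)
    return pe

pe = position_embedding(seq_len=128, d_model=768)    # 768 = BERT-base feature dimension
print(pe.shape)  # (128, 768)
```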
Step 3-2: the first layer of the text relation extraction model network is BERT. BERT essentially performs self-supervised learning through mass data, learning a good feature representation for words/words. In subsequent downstream tasks, the characteristics of BERT may be used directly to represent word embedding characteristics as a task. And obtaining a model with good effect after fine adjustment according to the requirements of downstream tasks. BERT employs a 12-layer Transformer structure, as shown in FIG. 2, trm in the rectangular box represents the Transformer structure.
The transform, which is essentially an Encoder (Encoder) -Decoder (Decoder) structure, is shown in fig. 3, with an Encoder structure in the left dashed box and a Decoder structure in the right dashed box. The encoder structure of the transform model is formed by stacking Nx (Nx ═ 6) identical base layers, each base Layer is composed of two sub-layers, the first is a Multi-Head Attention Layer (Multi-Head Attention), the second is a dense full-link Feed Forward neural Network Layer (Feed Forward Network), then a Residual Connection (Residual Connection) is used once in the two sub-layers, and then a Layer Normalization (Layer Normalization) operation is performed (added & Norm in fig. 3 represents Residual Connection and Layer Normalization). The decoder structure is similar to the encoder structure, and is composed of 6 identical basic layers, each layer includes a Multi-Head Attention layer and a feedforward neural network layer, and a concealed Multi-Head Attention layer (Multi-Head Attention) for performing Multi-Head Attention operation on the output of the encoder layer. Each sub-layer of the decoder also employs residual concatenation and then normalization operations.
The multi-head attention mechanism can be regarded as an upgraded version of the Scaled Dot-Product Attention mechanism. Scaled dot-product attention is calculated as follows:
Attention(Q, K, V) = Softmax( Q K^T / sqrt(d_k) ) V
where Q, K, V denote the text query vector and the key-value pair vectors respectively, and d_k is the feature dimension of K. In Self-Attention, Q, K and V are all input vectors of the sentence, so the attention distribution within the sentence can be obtained, incorporating information from the other words in the sentence. The multi-head attention mechanism builds on scaled dot-product attention by adding linear transformations that extend it to several different representation subspaces, enlarging the feature information. The multi-head attention mechanism is calculated as follows:
MultiHead(Q, K, V) = Concat(head_1, head_2, ..., head_k) W^O
head_j = Attention( Q W_j^Q, K W_j^K, V W_j^V )
where W_j^Q denotes the Q-oriented weight matrix, W_j^K denotes the K-oriented weight matrix, W_j^V denotes the V-oriented weight matrix, W^O denotes the output weight matrix, k denotes the number of attention heads, j denotes the j-th attention head, and 1 ≤ j ≤ k.
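For reference, here is a compact sketch of scaled dot-product and multi-head attention as described above; the head count and dimensions are example values, and the implementation is illustrative rather than the patent's own code:

```python
# Sketch of scaled dot-product and multi-head attention (illustrative values).
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

def scaled_dot_product_attention(q, k, v):
    d_k = q.size(-1)
    scores = q @ k.transpose(-2, -1) / math.sqrt(d_k)   # Q K^T / sqrt(d_k)
    return F.softmax(scores, dim=-1) @ v

class MultiHeadAttention(nn.Module):
    def __init__(self, d_model=768, num_heads=12):
        super().__init__()
        self.num_heads, self.d_head = num_heads, d_model // num_heads
        self.w_q = nn.Linear(d_model, d_model)   # W^Q for all heads
        self.w_k = nn.Linear(d_model, d_model)   # W^K
        self.w_v = nn.Linear(d_model, d_model)   # W^V
        self.w_o = nn.Linear(d_model, d_model)   # output projection W^O

    def forward(self, q, k, v):
        b, t, _ = q.shape
        def split(x):  # (b, t, d_model) -> (b, heads, t, d_head)
            return x.view(b, -1, self.num_heads, self.d_head).transpose(1, 2)
        out = scaled_dot_product_attention(split(self.w_q(q)), split(self.w_k(k)), split(self.w_v(v)))
        out = out.transpose(1, 2).contiguous().view(b, t, -1)  # concatenate the heads
        return self.w_o(out)
```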
Step 3-3: after the text is subjected to depth feature extraction through a BERT layer, the text is divided into two branches to be respectively subjected to entity extraction and relationship extraction. And the entity extraction branch is added into a full connection layer and a CRF layer on the basis of the BERT layer. The full connection layer maps the features to entity labels, and the CRF layer optimizes the sequence. In contrast to conventional Hidden Markov Models (HMMs), CRF is a Markov Chain of state sequences (Markov Chain). CRF can be used for various predictive problems, and is commonly used in the machine learning field as a processing annotation problem. Finally, processing the output vector h of the full connection layer by a CRF layer1,h2,...,hnAnd n is the maximum model input length. The probability P (y | s) of the tag sequence y is calculated as:
Figure RE-GDA0003683803650000085
here, s denotes an input sentence, m denotes the number of tags in the tag sequence y, and the tags in the tag sequence y include l1,l2,...,lm(ii) a y' represents any tag sequence, including tags; i represents the label indexes of the label sequences y and y', and i is more than or equal to 1 and less than or equal to m;
Figure RE-GDA0003683803650000086
represents the weight vector corresponding to the label sequence y,
Figure RE-GDA0003683803650000087
indicates the offset corresponding to the tag sequence y,
Figure RE-GDA0003683803650000088
represents the weight vector corresponding to the label sequence y',
Figure RE-GDA0003683803650000089
represents the offset corresponding to the label sequence y'; and then searching the optimal label sequence through a first-order Viterbi algorithm to obtain the entity class. The sentence { X, Country, aviation, mother, ship, current, on, X, X, sea, territory, alive, action } corresponds to the entity class { B-Country, E-Country, B-equipment, M-equipment, E-equipment, O, O, O, B-site, E-site, O, O, B-action, E-action }. For a batch of training data
Figure RE-GDA0003683803650000091
The loss function is:
Figure RE-GDA0003683803650000092
where λ represents the L2 regularization parameter and Θ is the set of parameters.
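The following sketch illustrates a first-order Viterbi decode over per-position emission scores in the spirit of the formula above; it adds a label-transition matrix and random scores purely for illustration, so it is a simplified assumption rather than the full CRF layer of the patent:

```python
# Simplified first-order Viterbi decoding over emission scores h (from the FC layer)
# and a label-transition matrix; the transition matrix is an assumption for illustration.
import numpy as np

def viterbi_decode(emissions, transitions):
    """emissions: (seq_len, num_labels) scores; transitions: (num_labels, num_labels)."""
    seq_len, num_labels = emissions.shape
    dp = emissions[0].copy()                      # best score ending in each label at t = 0
    backptr = np.zeros((seq_len, num_labels), dtype=int)
    for t in range(1, seq_len):
        scores = dp[:, None] + transitions + emissions[t][None, :]
        backptr[t] = scores.argmax(axis=0)        # best previous label for each current label
        dp = scores.max(axis=0)
    best = [int(dp.argmax())]
    for t in range(seq_len - 1, 0, -1):
        best.append(int(backptr[t, best[-1]]))
    return best[::-1]                             # optimal label index sequence

# Example with random scores for a 5-character sentence and 4 labels
rng = np.random.default_rng(0)
path = viterbi_decode(rng.normal(size=(5, 4)), rng.normal(size=(4, 4)))
print(path)
```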
Step 3-4: the relation extraction branch is added on the basis of BERT outputBidirectional gru (gated secure unit) and attention layer, and finally output the result through Softmax classifier. The bidirectional GRU layer is used for obtaining abstract characteristics and comprises two GRU units, and each GRU unit comprises a reset gate (reset gate) rtAnd an update gate zt. Updating the door ztFor controlling the output h of the preceding momentt-1Input x with the current timetThe degree of retention of the information contained in (a) is taken as the output h of the gating unit at time ttThe larger the value, the higher the degree of retention; and reset the door rtBy inputting x at the current momenttDetermining the previous time ht-1The smaller the reset gate value is, the higher the neglect degree is. Calculating to obtain the memory of the current moment
Figure RE-GDA0003683803650000093
And the hidden state h at the current moment after the reset gate and the update gatet. Update gate z of the GRU unit at time ttReset gate rtNew memory of
Figure RE-GDA0003683803650000094
Final hidden state htIs calculated as follows:
zt=σ(Wz·[ht-1,xt])
rt=σ(Wr·[ht-1,xt])
Figure RE-GDA0003683803650000095
Figure RE-GDA0003683803650000096
wherein, σ () is sigmoid nonlinear activation function for enhancing the processing capability of model to nonlinear data, and σ (x) is 1/(1+ e)-x). Denotes dot multiplication. tan h (x) or (e)x-e-x)/(ex+e-x)。W、Wr、WzIs the weight matrix of the model. []Indicating that two vectors are connected.
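A direct sketch of these GRU equations (assuming NumPy and randomly initialized weights purely for illustration) could look like this:

```python
# Sketch of one GRU step following the equations above (weights are random for illustration).
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(h_prev, x_t, W_z, W_r, W):
    concat = np.concatenate([h_prev, x_t])
    z_t = sigmoid(W_z @ concat)                                   # update gate
    r_t = sigmoid(W_r @ concat)                                   # reset gate
    h_tilde = np.tanh(W @ np.concatenate([r_t * h_prev, x_t]))    # candidate memory
    return (1.0 - z_t) * h_prev + z_t * h_tilde                   # new hidden state h_t

hidden, dim = 4, 3
rng = np.random.default_rng(0)
W_z, W_r, W = (rng.normal(size=(hidden, hidden + dim)) for _ in range(3))
h_t = gru_step(np.zeros(hidden), rng.normal(size=dim), W_z, W_r, W)
print(h_t)
```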
Step 3-5: attention layer. Attention mechanism (Attention) is a model proposed by Treisman and Gelade to simulate the Attention mechanism of the human brain. And highlighting key information influencing the output of the model by utilizing probability distribution. Therefore, the attention mechanism can more effectively utilize the local features and the global features of the text, improve the accuracy of relation extraction, and focus on semantic information with important influence of the local features like keywords.
For one sentence vector w ═ w1,w2,...,wTH } converting the result obtained in the step 3-2 into a result htThe word vector u is obtained by the following formulat
ut=tanh(Ww·ht+bw)
Wherein, WwRepresenting the attention level weight parameter, bwAn attention layer offset value parameter is represented.
In fact, each word in the sentence has an unequal effect on the expression of the meaning of the sentence, and a randomly initialized word context vector u is added during the attention level training processwAnd (5) performing co-training. And calculating the degree of correlation between the words and the relations by adding an attention layer to form an attention layer sentence vector sa. The attention layer calculation formula is as follows:
Figure RE-GDA0003683803650000101
Figure RE-GDA0003683803650000102
αtis the word utAnd uwSa is the weighted sentence vector representation at the current moment.
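A small sketch of this attention pooling over the GRU hidden states (weights randomly initialized for illustration) is shown below:

```python
# Sketch of the attention layer: u_t = tanh(W_w h_t + b_w), alpha_t = softmax(u_t . u_w), sa = sum(alpha_t h_t).
import numpy as np

def attention_pool(H, W_w, b_w, u_w):
    """H: (T, d) GRU hidden states; returns the weighted sentence vector sa."""
    U = np.tanh(H @ W_w.T + b_w)                   # word vectors u_t
    scores = U @ u_w                               # similarity with the context vector u_w
    alpha = np.exp(scores) / np.exp(scores).sum()  # attention weights alpha_t
    return alpha @ H                               # sentence vector sa

T, d = 15, 8
rng = np.random.default_rng(0)
sa = attention_pool(rng.normal(size=(T, d)), rng.normal(size=(d, d)),
                    rng.normal(size=d), rng.normal(size=d))
print(sa.shape)  # (8,)
```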
Step 3-6: the Softmax classifier maps the sentence vector sa to a set of elements at [0,1]Vectors within the interval, i.e. the probability R ═ R for each entity relationship class1,...,rN]N is offThe coefficient class number, vector sum is 1, as follows:
R=Softmax(s),R=[r1,r2,...,rN]andri∈[0,1]and∑ri=1
and selecting the relation category with the highest probability as the relation between the two entities. For the example "X national aircraft carrier is currently active in XX sea area", the relationship recognition result (format: { starting entity, ending entity, relationship category }) is { "aircraft carrier", "X country", "country" }, { "aircraft carrier", "XX", "position" }, { "aircraft carrier", "activity", "mission" }.
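As a usage illustration, selecting the predicted relation from the classifier output could look like the following; the category list and scores are assumed values, with only three of the relation categories shown:

```python
# Illustrative selection of the highest-probability relation category (scores are assumed).
import numpy as np

relation_categories = ["country", "position", "mission"]   # partial example list (24 in total)
logits = np.array([2.1, 0.3, -0.5])                        # classifier scores for one entity pair
R = np.exp(logits) / np.exp(logits).sum()                  # softmax probabilities, summing to 1
predicted = relation_categories[int(R.argmax())]
print(predicted, R)
```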
In this embodiment, step 4 trains the BERT-ATT-RE relation extraction model. The BERT-ATT-RE network loss value loss is a weighted sum of the entity extraction branch loss value loss_ner and the relation extraction branch loss value loss_re:
loss = loss_ner + 2 × loss_re
To prevent the model from overfitting during training, the L2 regularization method is added to constrain the BERT-ATT-RE network. A dropout strategy is introduced in the training process with the drop probability set to 0.5, and the mini-batch Adam optimization method is adopted for model parameter training.
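A minimal sketch of this joint training step, reusing the JointExtractionModel sketch above, might look as follows; the learning rate, weight-decay value and the cross-entropy stand-in for the CRF loss are assumptions, not values disclosed by the patent:

```python
# Sketch of one joint training step: loss = loss_ner + 2 * loss_re, Adam with L2 weight decay (assumed values).
import torch
import torch.nn as nn

model = JointExtractionModel()                      # sketch defined earlier
optimizer = torch.optim.Adam(model.parameters(), lr=2e-5, weight_decay=1e-4)  # L2 regularization
ce = nn.CrossEntropyLoss()

def train_step(batch):
    model.train()                                   # training mode (dropout active)
    entity_logits, rel_logits = model(batch["input_ids"], batch["attention_mask"],
                                      batch["entity_label_ids"])
    loss_ner = ce(entity_logits.view(-1, entity_logits.size(-1)),
                  batch["entity_label_ids"].view(-1))   # stand-in for the CRF negative log-likelihood
    loss_re = ce(rel_logits, batch["relation_ids"])
    loss = loss_ner + 2 * loss_re                   # weighted sum favouring relation extraction
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()
    return loss.item()
```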
In this embodiment, in step 5, the entity recognition result and the test set data are input into the BERT-ATT-RE relation extraction model to obtain the relation extraction result. The performance evaluation indexes of the relation extraction result are precision, recall and the F1 score, calculated as follows:
precision = TP / (TP + FP)
recall = TP / (TP + FN)
F1 = 2 × precision × recall / (precision + recall)
where TP denotes the number of samples correctly predicted as the positive class, FP denotes the number of negative-class samples judged as the positive class, and FN denotes the number of positive-class samples predicted as the negative class.
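These metrics can be computed directly from the prediction counts, for example (the counts below are hypothetical, for illustration only):

```python
# Sketch of the evaluation metrics defined above.
def precision_recall_f1(tp, fp, fn):
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1

# Hypothetical counts for illustration only
print(precision_recall_f1(tp=2285, fp=90, fn=105))
```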
The experimental data set was generated by manual annotation based on military corpora from Baidu Encyclopedia and Interactive Encyclopedia. The data set comprises 13940 training samples and 2390 test samples, covering 24 relations. Recognizing the test samples yields a relation extraction precision, recall and F1 of 0.962, 0.956 and 0.959 respectively, a clear improvement over the BERT baseline's 0.946, 0.93 and 0.938.
In a specific implementation, the present application provides a computer storage medium and a corresponding data processing unit, where the computer storage medium is capable of storing a computer program, and when the computer program is executed by the data processing unit, the computer program may run the inventive content of the military equipment relationship extraction method based on BERT and attention mechanism provided by the present invention and some or all of the steps in each embodiment. The storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), a Random Access Memory (RAM), or the like.
It is obvious to those skilled in the art that the technical solutions in the embodiments of the present invention can be implemented by means of a computer program and its corresponding general-purpose hardware platform. Based on such understanding, the technical solutions in the embodiments of the present invention may be essentially or partially implemented in the form of a computer program or a software product, which may be stored in a storage medium and includes several instructions to enable a device (which may be a personal computer, a server, a single chip computer, MUU, or a network device) including a data processing unit to execute the method in the embodiments or some parts of the embodiments of the present invention.
The present invention provides a military equipment relation extraction method based on BERT and an attention mechanism. There are many methods and ways to implement this technical solution, and the above description is only a preferred embodiment of the invention. It should be noted that those skilled in the art can make several improvements and embellishments without departing from the principle of the invention, and these improvements and embellishments should also be regarded as falling within the protection scope of the invention. All components not specified in this embodiment can be implemented by the prior art.

Claims (10)

1. A military equipment relation extraction method based on BERT and attention mechanism is characterized by comprising the following steps:
step 1, performing entity labeling and relation labeling on a text corpus to obtain labeled data;
step 2, preprocessing the labeled data to generate a text relation extraction model training set and a test set;
step 3, constructing a text relation extraction model;
step 4, training a text relation extraction model to obtain a trained text relation extraction model;
and 5, inputting the test set data into the trained text relation extraction model to obtain a relation extraction result.
2. The military equipment relation extraction method based on BERT and attention mechanism as claimed in claim 1, characterized in that in step 1, the annotation data comprises three parts: the first part is the original text of the text corpus, the second part is the entity annotation data, and the third part is the relation annotation data;
the preprocessing of the annotation data in step 2 comprises: expressing the entity annotation data in the form of {entity start position, entity end position, entity label} and then converting it into the BMES entity annotation scheme; converting the relation annotation data into the form of {first entity, second entity, relation, first entity start position, first entity end position, first entity label, second entity start position, second entity end position, second entity label};
generating the text relation extraction model training set and test set in step 2 by splitting the entity annotation data and the relation annotation data respectively at a ratio of 7:3.
3. The military equipment relationship extraction method based on BERT and attention mechanism as claimed in claim 2, characterized in that in step 3, the text relationship extraction model comprises a BERT layer, an entity extraction branch and a relationship extraction branch; the BERT layer is used for carrying out depth feature extraction on an input text to obtain input text features; the entity extraction branch is used for mapping the input text features to entity tags to obtain entity tag sequence vectors, and then predicting the entity tag sequence vectors to obtain entity categories; and the relation extraction branch is used for carrying out classification prediction on the combined characteristics of the BERT output and the entity identification output to obtain the relation category between the two entities.
4. The military equipment relation extraction method based on BERT and attention mechanism as claimed in claim 3, wherein the entity extraction branch in step 3 sequentially comprises a fully connected layer and a conditional random field layer; the fully connected layer is used for mapping the input text features to entity labels to obtain entity label sequence vectors, denoted h_1, h_2, ..., h_n, where n is the maximum model input length; the conditional random field layer is used for optimizing and predicting the entity label sequence vector to obtain the entity class; the probability P(y|s) of the label sequence y is calculated as:
P(y|s) = exp( Σ_{i=1..m} (W_{l_i} · h_i + b_{l_i}) ) / Σ_{y'} exp( Σ_{i=1..m} (W_{l'_i} · h_i + b_{l'_i}) )
where s denotes the input sentence, m denotes the number of labels in the label sequence y, and the labels in the label sequence y are l_1, l_2, ..., l_m; y' denotes an arbitrary label sequence; i denotes the label index in the label sequences y and y', with 1 ≤ i ≤ m; W_{l_i} denotes the weight vector corresponding to label sequence y, b_{l_i} denotes the bias corresponding to label sequence y, W_{l'_i} denotes the weight vector corresponding to label sequence y', and b_{l'_i} denotes the bias corresponding to label sequence y'; the optimal label sequence is then searched by the first-order Viterbi algorithm to obtain the entity class.
5. The method as claimed in claim 4, wherein the relation extraction branch in step 3 sequentially comprises a feature combination layer, a bidirectional GRU layer, an attention layer and a Softmax classifier, wherein the feature combination layer is used for combining the input text features and the entity categories to obtain the relation extraction input features, denoted E_r;
the bidirectional GRU layer is used for acquiring abstract features;
the attention layer is used for simulating the attention mechanism of a person reading information, mainly focusing on local features, and its output feature is denoted A;
the Softmax classifier is used for mapping the attention layer output feature A to the entity relation categories, obtaining the probability of each category R = [r_1, ..., r_N], where N is the number of relation categories.
6. The military equipment relation extraction method based on the BERT and attention mechanism as claimed in claim 5, wherein step 3 comprises:
Step 3-1: converting the training text in the training set into character vector features and embedding position information to obtain the BERT input features, denoted E_i; the position embedding is:
PE(pos, 2c) = sin( pos / 10000^(2c / d_model) )
PE(pos, 2c+1) = cos( pos / 10000^(2c / d_model) )
where 2c denotes the even positions of the input sequence, 2c+1 denotes the odd positions of the input sequence, c is a natural number, d_model is the feature dimension of the BERT model, pos denotes the word position information, and PE denotes the position embedding function;
Step 3-2: inputting the BERT input features E_i into the text relation extraction model and performing deep feature extraction to obtain the abstract features of the BERT layer;
Step 3-3: for a batch of training data {(s_j, y_j)}, j = 1, ..., K, the entity extraction branch loss function is:
loss_ner = -Σ_{j=1..K} log P(y_j | s_j) + (λ/2) ‖Θ‖^2
where λ denotes the L2 regularization parameter and Θ is the entity extraction branch parameter set;
Step 3-4: in the bidirectional GRU layer of the relation extraction branch, calculating the hidden state h_t at the current time;
Step 3-5: in the attention layer of the relation extraction branch, calculating the word vector u_t from the result h_t obtained in step 3-4; adding a randomly initialized word context vector u_w for joint training during attention layer training, and calculating the correlation between words and relations through the attention layer to form the attention layer sentence vector sa;
Step 3-6: the Softmax classifier maps the sentence vector sa to a set of vectors whose elements lie in the interval [0,1] and sum to 1, as follows:
R = Softmax(sa), R = [r_1, r_2, ..., r_N], r_i ∈ [0,1], Σ r_i = 1.
7. the method of claim 6, wherein in step 4, a text relation extraction model training is performed, and the text relation extraction network loss value loss is an entity extraction branch loss value lossnerSum relation decimation Branch penalty value lossreWeighted sum:
loss=lossner+2×lossre
adding L2 regularization to constrain a text relation extraction network, introducing a dropout strategy in a training process, setting the suppression probability to be 0.5, and adopting a batch Adam optimization algorithm for parameter training of the relation extraction model.
8. The military equipment relationship extraction method based on the BERT and attention mechanism as claimed in claim 7, wherein the BERT layer in step 3 adopts a 12-layer Transformer structure.
9. The military equipment relation extraction method based on the BERT and attention mechanism as claimed in claim 8, wherein in step 5, the test set data is input into the text relation extraction model to obtain the relation extraction result; the performance evaluation indexes of the relation extraction result are precision, recall and the F1 score.
10. The military equipment relationship extraction method based on BERT and attention mechanism as claimed in claim 9, wherein in step 1, entity labeling and relationship labeling are performed on text corpus by manual labeling.
CN202210555624.4A 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism Pending CN114781375A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210555624.4A CN114781375A (en) 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210555624.4A CN114781375A (en) 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism

Publications (1)

Publication Number Publication Date
CN114781375A true CN114781375A (en) 2022-07-22

Family

ID=82409685

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210555624.4A Pending CN114781375A (en) 2022-05-19 2022-05-19 Military equipment relation extraction method based on BERT and attention mechanism

Country Status (1)

Country Link
CN (1) CN114781375A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628174A (en) * 2023-02-17 2023-08-22 广东技术师范大学 End-to-end relation extraction method and system for fusing entity and relation information
CN117114739A (en) * 2023-09-27 2023-11-24 数据空间研究院 Enterprise supply chain information mining method, mining system and storage medium
CN117688974A (en) * 2024-02-01 2024-03-12 中国人民解放军总医院 Knowledge graph-based generation type large model modeling method, system and equipment

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502749A (en) * 2019-08-02 2019-11-26 中国电子科技集团公司第二十八研究所 A kind of text Relation extraction method based on the double-deck attention mechanism Yu two-way GRU
CN111160008A (en) * 2019-12-18 2020-05-15 华南理工大学 Entity relationship joint extraction method and system
CN111985239A (en) * 2020-07-31 2020-11-24 杭州远传新业科技有限公司 Entity identification method and device, electronic equipment and storage medium
CN112163092A (en) * 2020-10-10 2021-01-01 成都数之联科技有限公司 Entity and relation extraction method, system, device and medium
CN112215004A (en) * 2020-09-04 2021-01-12 中国电子科技集团公司第二十八研究所 Application method in extraction of text entities of military equipment based on transfer learning
CN112883732A (en) * 2020-11-26 2021-06-01 中国电子科技网络信息安全有限公司 Method and device for identifying Chinese fine-grained named entities based on associative memory network
CN113011191A (en) * 2021-04-28 2021-06-22 广东工业大学 Knowledge joint extraction model training method
CN114118056A (en) * 2021-10-13 2022-03-01 中国人民解放军军事科学院国防工程研究院工程防护研究所 Information extraction method for war research report
CN114490885A (en) * 2021-12-24 2022-05-13 华南师范大学 Entity relationship extraction method and device, electronic equipment and storage medium
CN114510576A (en) * 2021-12-21 2022-05-17 一拓通信集团股份有限公司 Entity relationship extraction method based on BERT and BiGRU fusion attention mechanism

Patent Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110502749A (en) * 2019-08-02 2019-11-26 中国电子科技集团公司第二十八研究所 A kind of text Relation extraction method based on the double-deck attention mechanism Yu two-way GRU
CN111160008A (en) * 2019-12-18 2020-05-15 华南理工大学 Entity relationship joint extraction method and system
CN111985239A (en) * 2020-07-31 2020-11-24 杭州远传新业科技有限公司 Entity identification method and device, electronic equipment and storage medium
CN112215004A (en) * 2020-09-04 2021-01-12 中国电子科技集团公司第二十八研究所 Application method in extraction of text entities of military equipment based on transfer learning
CN112163092A (en) * 2020-10-10 2021-01-01 成都数之联科技有限公司 Entity and relation extraction method, system, device and medium
CN112883732A (en) * 2020-11-26 2021-06-01 中国电子科技网络信息安全有限公司 Method and device for identifying Chinese fine-grained named entities based on associative memory network
CN113011191A (en) * 2021-04-28 2021-06-22 广东工业大学 Knowledge joint extraction model training method
CN114118056A (en) * 2021-10-13 2022-03-01 中国人民解放军军事科学院国防工程研究院工程防护研究所 Information extraction method for war research report
CN114510576A (en) * 2021-12-21 2022-05-17 一拓通信集团股份有限公司 Entity relationship extraction method based on BERT and BiGRU fusion attention mechanism
CN114490885A (en) * 2021-12-24 2022-05-13 华南师范大学 Entity relationship extraction method and device, electronic equipment and storage medium

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116628174A (en) * 2023-02-17 2023-08-22 广东技术师范大学 End-to-end relation extraction method and system for fusing entity and relation information
CN117114739A (en) * 2023-09-27 2023-11-24 数据空间研究院 Enterprise supply chain information mining method, mining system and storage medium
CN117114739B (en) * 2023-09-27 2024-05-03 数据空间研究院 Enterprise supply chain information mining method, mining system and storage medium
CN117688974A (en) * 2024-02-01 2024-03-12 中国人民解放军总医院 Knowledge graph-based generation type large model modeling method, system and equipment
CN117688974B (en) * 2024-02-01 2024-04-26 中国人民解放军总医院 Knowledge graph-based generation type large model modeling method, system and equipment

Similar Documents

Publication Publication Date Title
CN110502749B (en) Text relation extraction method based on double-layer attention mechanism and bidirectional GRU
CN110188358B (en) Training method and device for natural language processing model
CN111046179B (en) Text classification method for open network question in specific field
CN112115238B (en) Question-answering method and system based on BERT and knowledge base
CN109800437B (en) Named entity recognition method based on feature fusion
CN112733541A (en) Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN111738003B (en) Named entity recognition model training method, named entity recognition method and medium
CN110609891A (en) Visual dialog generation method based on context awareness graph neural network
CN110263325B (en) Chinese word segmentation system
CN113239700A (en) Text semantic matching device, system, method and storage medium for improving BERT
CN109214006B (en) Natural language reasoning method for image enhanced hierarchical semantic representation
CN114781375A (en) Military equipment relation extraction method based on BERT and attention mechanism
CN111274790B (en) Chapter-level event embedding method and device based on syntactic dependency graph
CN115794999B (en) Patent document query method based on diffusion model and computer equipment
CN113128203A (en) Attention mechanism-based relationship extraction method, system, equipment and storage medium
CN112257449A (en) Named entity recognition method and device, computer equipment and storage medium
CN111984791B (en) Attention mechanism-based long text classification method
CN111966812A (en) Automatic question answering method based on dynamic word vector and storage medium
CN113282714B (en) Event detection method based on differential word vector representation
CN113743099A (en) Self-attention mechanism-based term extraction system, method, medium and terminal
CN111145914B (en) Method and device for determining text entity of lung cancer clinical disease seed bank
CN115600597A (en) Named entity identification method, device and system based on attention mechanism and intra-word semantic fusion and storage medium
Amari et al. Deep convolutional neural network for Arabic speech recognition
Mankolli et al. Machine learning and natural language processing: Review of models and optimization problems
Pappas et al. A survey on language modeling using neural networks

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination