CN114429132A - Named entity identification method and device based on mixed lattice self-attention network

Info

Publication number: CN114429132A
Application number: CN202210172667.4A
Authority: CN
Inventors: 王立松; 何宗锋; 刘绍翰; 刘亮
Original assignee: Nanjing University of Aeronautics and Astronautics
Current assignee: Nanjing University of Aeronautics and Astronautics
Priority date: 2022-02-24
Filing date: 2022-02-24
Publication date: 2022-05-03

Abstract

The invention discloses a named entity recognition method based on a mixed lattice self-attention network, comprising the following steps: S1, encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix to obtain character-word vector representations with a mixed lattice structure; S2, constructing a self-attention network to capture the influence of the word vectors on the character vectors and enhance the feature representation of each character vector; S3, fusing word features at the Embedding layer of BERT and obtaining better character vector representations through fine-tuning, then performing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby completing an entity recognition model based on the mixed lattice self-attention network. The invention captures global vocabulary information, generates semantically rich character vector representations, and improves Chinese named entity recognition accuracy on multiple data sets.

Description

Named entity identification method and device based on mixed lattice self-attention network

Technical Field

The invention relates to the technical field of natural language processing in artificial intelligence, in particular to a named entity identification method and device based on a mixed lattice self-attention network.

Background

Named Entity Recognition (NER), also called entity extraction, was first proposed at the MUC-6 conference and is an information extraction technique for extracting entities from text. Early entity recognition relied on rule-based and statistical methods; because these traditional methods depend heavily on manual design, their coverage and precision are low, and they have been replaced by deep learning methods. Deep-learning entity recognition models are divided into character-based models and word-based models. Languages such as English generally adopt the character-based model because each word has a definite meaning; in Chinese the meaning of a single character is ambiguous while the meaning of a word is concrete, so word-based models are used in Chinese NER methods. To better represent each character vector in Chinese, researchers later proposed methods based on representation learning, a learning approach that converts human language information into features a machine can recognize and that improves the accuracy of semantic expression in machine learning.

In named entity recognition, external vocabulary information can effectively improve recognition accuracy, but such methods depend on the performance of the fusion algorithm. For example, patent CN113836930A proposes a method for recognizing named entities of Chinese hazardous chemicals that is based on a BiLSTM-CRF model: it uses the pre-trained language model BERT to obtain character-level encodings of text in the hazardous chemicals domain, obtains context-based character vectors, and then introduces an attention mechanism to strengthen the model's ability to mine global and local text features. Patent CN113128232A provides a named entity recognition method based on ALBERT and multi-word information embedding, which can effectively represent the ambiguity of characters and improve the efficiency of entity recognition. Patent CN111310470A discloses a Chinese named entity recognition method that fuses character and word features; the result data obtained after comprehensive analysis strengthens the model's understanding of the text and improves the F1 value on the recognition task.

Although existing methods fuse word feature vectors with good results, the existing techniques have the following problems: 1) character-word feature fusion methods do not consider that vectors trained by different models differ in semantic expression, and they fuse the character vectors and word vectors directly and naively, so the word-level features of the character vectors cannot be effectively enhanced; 2) vocabulary enhancement methods based on learned word weights consider only the influence of each character's matched words on the semantic representation of that character and ignore the effect of global vocabulary information.

Disclosure of Invention

To address the defects of the prior art, the invention provides a named entity recognition method and device based on a mixed lattice self-attention network. Based on the idea of representation learning, the proposed model fuses vocabulary information to enhance the feature representation of the character vectors, so that the generated character vectors contain more entity boundary information and the accuracy of the NER task improves.

In order to achieve the purpose, the invention adopts the following technical scheme:

a named entity recognition method based on a mixed lattice self-attention network comprises the following steps:

S1, looking up in a dictionary the words formed by consecutive characters of the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed lattice encoding, to obtain the corresponding character-word vector representation of the mixed lattice structure;

S2, constructing a self-attention network on the mixed lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector;

S3, fusing word features at the Embedding layer of BERT and learning better character vector representations through fine-tuning; performing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby constructing the entity recognition model based on the mixed lattice self-attention network;

S4, training the entity recognition model based on the mixed lattice self-attention network on the data sets.

In order to optimize the technical scheme, the specific measures adopted further comprise:

Further, in step S1, encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed lattice encoding, to obtain the corresponding mixed lattice vector representation, comprises the following steps:

S11, given a sentence $s_c = \{c_1, c_2, \ldots, c_n\}$, obtaining the character feature vector representation $C$ of $s_c$ by loading the pre-trained BERT weights, where $c_i$ denotes the $i$-th character of $s_c$, $n$ denotes the length of $s_c$ in characters, and $e^B$ denotes the lookup table of BERT pre-trained character vectors, so that the vector of $c_i$ is $e^B(c_i)$;

S12, given a Chinese dictionary L, constructing a Trie (dictionary tree) and traversing its nodes to obtain the words matched by each character;

S13, grouping all matched words by their BMES tags: for a character $c_i$, the set $B(c_i)$ consists of the matched words that begin with $c_i$, the set $M(c_i)$ consists of the matched words in which $c_i$ is an internal character, the set $E(c_i)$ consists of the matched words that end with $c_i$, and the set $S(c_i)$ consists of the single-character word formed by $c_i$; the word set $w_i$ of each character $c_i$ in sentence $s_c$ is expressed as:

$$w_i = \{e^w(B(c_i)),\ e^w(M(c_i)),\ e^w(E(c_i)),\ e^w(S(c_i))\};$$

where $e^w$ denotes the lookup table of pre-trained word vectors;

S14, providing two learnable nonlinear fully connected layers that raise the dimension of $w_i$ to match that of the character vectors; the weights of these two layers are learned while BERT is fine-tuned, so that the pre-trained word feature vectors are mapped into the semantic feature space of BERT; in this transformation, $W_1 \in \mathbb{R}^{d_c \times d_c}$ and $W_2 \in \mathbb{R}^{d_c \times d_w}$ are learnable weight matrices, $b_1$ and $b_2$ are the corresponding biases, $d_c$ denotes the dimension of the BERT character vectors, and $d_w$ denotes the dimension of the pre-trained word vectors;

S15, taking the converted word feature vectors as the input of the feature fusion model and, according to the correspondence between each character and its word sets, forming the character-word pair feature $I_i$ of each character;

S16, encoding the character-word pair features with mixed lattice encoding into the vector $V^{ME}$, in which the character vectors and word vectors are joined by the vector concatenation operator.

Further, in step S2, constructing a self-attention network on the mixed lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector, comprises the following steps:

S21, designing a mixed-lattice self-attention network to capture the associations between character and word features; the network takes the mixed character-word encoding vector $V^{ME}$ and the word-position mask matrix $M$ as input and models the global character vectors and word vectors, so that the model learns the semantic relevance weights between characters and words; the $Q$, $K$ and $V$ matrices are calculated as:

$$[Q, K, V] = [W_q V^{ME},\ W_k V^{ME},\ W_v V^{ME}];$$

where $W_q$, $W_k$ and $W_v$ are learnable weight matrices and $d_e = d_c + d_w$; $Q$, $K$ and $V$ are, respectively, the query matrix, the key matrix corresponding to the queries, and the value matrix to be weighted and averaged; $d_e$ denotes the dimension of the mixed lattice vector, $d_c$ the dimension of the character vector, and $d_w$ the dimension of the word vector;

S22, using the dot product as the similarity score and computing:

$$F_{Att} = \mathrm{Softmax}(S_{Att} + \varepsilon M)\,V;$$

where $M$ is the static word-position mask matrix, $\varepsilon$ is a matrix of very large negative values, so that masked positions receive zero weight after the Softmax, and $F_{Att}$ is the output of the self-attention network; $S_{Att}$ denotes the normalized attention score computed from $Q$ and $K^{T}$, and $K^{T}$ denotes the transpose of the matrix $K$;

S23, adding the word feature information to the BERT pre-trained character vectors as a residual, which yields the word-enhanced character feature vector:

$$C' = C + g(F_{Att});$$

where $C$ denotes the pre-trained character vector features of BERT, and the function $g(\cdot)$ removes the word-vector channels of the self-attention network output so that $C$ and $F_{Att}$ have consistent vector dimensions, giving the vocabulary-enhanced character embedding vector $C'$.

Further, in step S3, constructing the entity recognition model based on the mixed lattice self-attention network comprises the following steps:

S31, given a sentence sequence $s_c = \{c_1, c_2, \ldots, c_n\}$ of length $n$, the vocabulary-enhanced character vectors are denoted $C' = \{c'_1, c'_2, \ldots, c'_n\}$; $C'$ is fine-tuned in the BERT model, and the vocabulary-enhanced BERT character embedding vector is expressed as:

$$E'_i = C'_i + E_s(i) + E_p(i);$$

where $E_s$ and $E_p$ denote the segment vector and position vector lookup tables, respectively, and $i$ indexes the $i$-th character of the length-$n$ character sequence $s_c$;

S32, inputting the obtained $E'$ into BERT, where each Transformer block is computed as:

$$D = \mathrm{LN}(H_{k-1} + \mathrm{MHA}(H_{k-1}));$$

$$H_k = \mathrm{LN}(\mathrm{FFN}(D) + D);$$

where $H_k$ denotes the hidden state output of the $k$-th layer and $H_0 = E'$ denotes the bottom-layer character vectors; LN is the layer normalization function; MHA is the multi-head self-attention module; FFN denotes a two-layer feed-forward neural network; and $D$ denotes the normalized output of the multi-head attention module;

S33, obtaining the hidden state output vectors of the last Transformer layer and feeding them into a bidirectional LSTM network, which captures the semantic information of the sentence from left to right and from right to left; denoting the hidden state output of the forward LSTM network by $\overrightarrow{h_i}$ and that of the backward LSTM network by $\overleftarrow{h_i}$, the output of the Bi-LSTM network, which is the output of the sequence labeling layer, is expressed as:

$$h_i = [\overrightarrow{h_i};\ \overleftarrow{h_i}];$$

where $h_i$ denotes the concatenated hidden state output of the $i$-th Bi-LSTM unit and serves as the character-level contextual semantic representation of $c_i$;

S34, predicting the NER labels with a standard CRF layer: given the hidden state output $H = \{h_1, h_2, \ldots, h_n\}$ of the last network layer and a label sequence $y = \{y_1, y_2, \ldots, y_n\}$, for a sentence $s = \{s_1, s_2, \ldots, s_n\}$ the probability of the corresponding label sequence is defined as:

$$p(y \mid s) = \frac{\exp\left(\sum_{i}\left(W^{y_i} h_i + b^{(y_{i-1},\, y_i)}\right)\right)}{\sum_{y'} \exp\left(\sum_{i}\left(W^{y'_i} h_i + b^{(y'_{i-1},\, y'_i)}\right)\right)};$$

where $y'$ ranges over all possible label sequences; $W^{y_i}$ denotes the learnable weight parameter of the network corresponding to $y_i$; $b^{(y_{i-1},\, y_i)}$ denotes the bias between $y_{i-1}$ and $y_i$; in the same way, $W^{y'_i}$ and $b^{(y'_{i-1},\, y'_i)}$ denote the model weight parameter and bias under any possible label sequence $y'$;

S35, taking the negative log-likelihood of the labeled sequences, $L = -\sum \log p(y \mid s)$ summed over the training sentences, as the loss function of the model.

Based on the above named entity recognition method, the invention provides a named entity recognition device based on a mixed lattice self-attention network, comprising a mixed lattice structure encoding module, a vocabulary enhancement module, a sequence labeling and decoding module, and a model training module;

the mixed lattice structure encoding module is used for looking up in a dictionary the words formed by consecutive characters of the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed lattice encoding, to obtain the corresponding character-word vector representation of the mixed lattice structure;

the vocabulary enhancement module is used for constructing a self-attention network on the generated mixed lattice vectors to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector;

the sequence labeling and decoding module is used for fusing word features at the Embedding layer of BERT and learning better character vector representations through fine-tuning; performing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby constructing the entity recognition model based on the mixed lattice self-attention network;

the model training module is used for training the entity recognition model based on the mixed lattice self-attention network on the data sets.

The invention has the beneficial effects that:

By constructing the fusion network, the named entity recognition method and device based on the mixed lattice self-attention network capture global vocabulary information, generate semantically rich character vector representations, and improve Chinese named entity recognition accuracy on multiple data sets. Compared with the BERT baseline model without the vocabulary enhancement network, the invention improves performance by 4.55%, 0.54%, 1.82% and 0.91% on the four data sets, respectively, which shows that improving the feature representation of character vectors through vocabulary enhancement is an effective way to improve NER performance. Compared with other vocabulary enhancement methods, the proposed feature fusion framework (MELSN) fuses richer lexical semantic features, and with the fine-tuning mechanism of the pre-trained model BERT the vocabulary-enhanced character feature representation contains more lexical semantics. The comparative experiments show that the invention makes better use of the character-word lattice structure for vocabulary enhancement and demonstrate the effectiveness of the proposed method on the Chinese named entity recognition task.

Drawings

Fig. 1 is a schematic structural diagram of the named entity recognition device based on the mixed lattice self-attention network according to the invention.

Detailed Description

The present invention will now be described in further detail with reference to the accompanying drawings.

It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" are used in the invention only for clarity of description and are not intended to limit its scope; changes or adjustments to the relative relationships they describe, without substantive change to the technical content, also fall within the scope of the invention.

The invention provides a named entity recognition method based on a mixed lattice self-attention network, which comprises the following steps:

S1, looking up in a dictionary the words formed by consecutive characters of the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed lattice encoding, to obtain the corresponding character-word vector representation of the mixed lattice structure.

S2, constructing a self-attention network on the mixed lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector.

S3, fusing word features at the Embedding layer of BERT and learning better character vector representations through fine-tuning; performing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby constructing the entity recognition model based on the mixed lattice self-attention network.

Based on the foregoing method, this embodiment further provides a named entity recognition device based on a mixed lattice self-attention network. Fig. 1 is a schematic structural diagram of this device. The overall framework is divided into three parts: a mixed lattice encoding module (Mixed-lattice Encoding Module), a vocabulary enhancement module (Lexicon Enhancement Module), and a sequence labeling and decoding module (Sequence-labeling and Decoding). The first module encodes the character-word pair vectors: it looks up all words in the dictionary tree, loads the pre-trained character vectors and word vectors, encodes the character-word pair vectors into mixed lattice embedding vectors (Mixed-lattice embedding), and passes these together with the word mask vectors (Words mask) generated at this stage to the next module. The second module performs vocabulary enhancement with the proposed self-attention model, and the enhanced character vector representation is passed into the BERT model for fine-tuning. The last module models the enhanced and fine-tuned character vectors to complete the label prediction and decoding for each character vector.

The invention provides a named entity recognition method based on vocabulary enhancement. The model improves on the BERT-BiLSTM-CRF network by adding an Attention-based vocabulary enhancement module at the Embedding layer of BERT. For encoding the character-word vectors, an interleaved character-word vector encoding is designed; after vocabulary enhancement is completed by the Attention network, the fused features are normalized and then added to the original BERT character vectors as a residual. The specific implementation process of the invention is as follows:

Step 1: Constructing the character-word vectors of the mixed lattice structure

Mixed lattice (Mixed-Lattice) encoding encodes the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix. First, the words formed by consecutive characters of the input sentence are looked up in a dictionary; they are then merged into a single multidimensional vector through alternating position mapping.

Specifically, given a sentence $s_c = \{c_1, c_2, \ldots, c_n\}$, the character feature vector representation $C$ of $s_c$ is obtained directly by loading the pre-trained BERT weights, where $c_i$ denotes the $i$-th character of $s_c$, $n$ denotes the length of $s_c$ in characters, and $e^B$ denotes the lookup table of BERT pre-trained character vectors, so that the vector of $c_i$ is $e^B(c_i)$. Given a Chinese dictionary L, a Trie (dictionary tree) is first constructed, and traversing its nodes yields the words matched by each character. All matched words are grouped by their "BMES" tags: for a character $c_i$, the set $B(c_i)$ consists of the matched words that begin with $c_i$; similarly, the set $M(c_i)$ consists of the matched words in which $c_i$ is an internal character, the set $E(c_i)$ consists of the matched words that end with $c_i$, and the set $S(c_i)$ consists of the single-character word formed by $c_i$. The word set $w_i$ of each character $c_i$ in sentence $s_c$ can therefore be expressed as:

$$w_i = \{e^w(B(c_i)),\ e^w(M(c_i)),\ e^w(E(c_i)),\ e^w(S(c_i))\};$$

where $e^w$ denotes the pre-trained word vector lookup table. To keep the vector dimensions consistent, two learnable nonlinear fully connected layers raise the dimension of $w_i$ to match that of the character vectors. The weights of these two layers are learned while BERT is fine-tuned, so that the pre-trained word feature vectors are mapped into the semantic feature space of BERT. In this transformation, $W_1 \in \mathbb{R}^{d_c \times d_c}$ and $W_2 \in \mathbb{R}^{d_c \times d_w}$ are learnable weight matrices and $b_1$ and $b_2$ are the corresponding biases; $d_c$ denotes the dimension of the BERT character vectors and $d_w$ the dimension of the pre-trained word vectors. In our feature fusion method, the converted word feature vectors serve as the input of the feature fusion model. According to the correspondence between each character and its word sets, every character is paired with its word sets to form the character-word pair feature $I_i$.

At present, many NER methods based on vocabulary fusion use the character-word pair features $I_i$ directly as the input of the vocabulary enhancement network. This approach can fuse only part of the vocabulary information, a limitation we observed in experiments on sentences $s_c$. To enable the model to capture global word-level features, this embodiment proposes a new character-word pair encoding, namely mixed lattice encoding (Mixed-Lattice encoding), which joins the character vectors and word vectors with the vector concatenation operator into the vector $V^{ME}$. With this encoding, the character vectors and word vectors are interleaved into a feature vector of fixed dimension, in which word vectors placed closer to a character are more strongly associated with that character, which also matches practice. In the next stage, this embodiment constructs a fusion network based on the Attention mechanism to model the character-word pair feature representation $V^{ME}$.
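The dictionary lookup and BMES grouping of Step 1 can be sketched in plain Python. This is a minimal illustration, not the patented implementation: the class `Trie`, the function `bmes_word_sets` and the toy lexicon are hypothetical names and data chosen for the example, and the sketch returns the matched word strings rather than their vectors $e^w(\cdot)$.

```python
from collections import defaultdict


class Trie:
    """Minimal dictionary tree for matching lexicon words in a sentence."""

    def __init__(self):
        self.children = {}
        self.is_word = False

    def insert(self, word):
        node = self
        for ch in word:
            node = node.children.setdefault(ch, Trie())
        node.is_word = True

    def match_from(self, sentence, start):
        """Return (start, end) spans of all lexicon words beginning at `start`."""
        node, found = self, []
        for end in range(start, len(sentence)):
            node = node.children.get(sentence[end])
            if node is None:
                break
            if node.is_word:
                found.append((start, end))
        return found


def bmes_word_sets(sentence, lexicon):
    """Group the matched words of every character into B/M/E/S sets."""
    trie = Trie()
    for word in lexicon:
        trie.insert(word)
    sets = [defaultdict(list) for _ in sentence]
    for start in range(len(sentence)):
        for s, e in trie.match_from(sentence, start):
            word = sentence[s:e + 1]
            if s == e:
                sets[s]["S"].append(word)        # single-character word
            else:
                sets[s]["B"].append(word)        # word begins with this character
                sets[e]["E"].append(word)        # word ends with this character
                for mid in range(s + 1, e):
                    sets[mid]["M"].append(word)  # character is inside the word
    return [{tag: d.get(tag, []) for tag in "BMES"} for d in sets]


# Toy lexicon, assumed for illustration only.
lexicon = ["南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江"]
word_sets = bmes_word_sets("南京市长江大桥", lexicon)
```

Here `word_sets[i]` plays the role of the four sets $B(c_i)$, $M(c_i)$, $E(c_i)$ and $S(c_i)$; a full pipeline would map each word to its pre-trained vector before building $w_i$.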

Step 2: Computation of the character-word feature fusion network

The vocabulary enhancement network models the character-word feature representation vector $V^{ME}$ and uses the word features to enhance the feature representation of the character vectors in the sequence. Following the attention mechanism proposed by Vaswani et al., a mixed-lattice self-attention network is designed to capture the associations between character and word features. The previous stage produced the mixed lattice vector representation $V^{ME}$ that encodes the character-word pairs; the lattice structure organizes the character vectors and word vectors into one multidimensional vector in which character and word features are interleaved. The self-attention network takes $V^{ME}$ and the word-position mask matrix $M$ as input, and through this modeling the model learns the semantic relevance weights between characters and words. Given the mixed character-word encoding vector $V^{ME}$ of a sentence and the matrix $M$, the $Q$, $K$ and $V$ matrices are calculated as:

$$[Q, K, V] = [W_q V^{ME},\ W_k V^{ME},\ W_v V^{ME}];$$

where $W_q$, $W_k$ and $W_v$ are learnable weight matrices and $d_e = d_c + d_w$. The dot product is then used as the similarity score:

$$F_{Att} = \mathrm{Softmax}(S_{Att} + \varepsilon M)\,V;$$

where $F_{Att}$ is the output of the self-attention network. The attention scores at word positions are masked by multiplying $M$ element-wise with the matrix $\varepsilon$ of very large negative values. After the Softmax, the probabilities at the masked word positions are all zero, while each character position holds the weight scores of the words. The fusion of character and word features is thus completed with the word-position mask inside the self-attention network.
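The effect of the additive mask in $F_{Att} = \mathrm{Softmax}(S_{Att} + \varepsilon M)V$ can be checked with a small pure-Python sketch. Several points are assumptions made for the example: the score $S_{Att}$ is taken to be the scaled dot product $QK^{T}/\sqrt{d}$, $\varepsilon$ is approximated by the constant `-1e9`, and the mask simply suppresses one column; which positions the actual matrix $M$ marks is defined by the invention and not by this code.

```python
import math


def softmax(row):
    """Numerically stable softmax over a list of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [x / total for x in exps]


def masked_attention(Q, K, V, mask, neg=-1e9):
    """Scaled dot-product attention with an additive position mask.

    mask[i][j] == 1 marks a position whose score is pushed towards
    negative infinity, so its weight after the softmax is zero.
    """
    d = len(Q[0])
    outputs, weights = [], []
    for i, q in enumerate(Q):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + neg * mask[i][j]
            for j, k in enumerate(K)
        ]
        w = softmax(scores)
        weights.append(w)
        outputs.append(
            [sum(w[j] * V[j][c] for j in range(len(V))) for c in range(len(V[0]))]
        )
    return outputs, weights


# Three positions with 2-dimensional vectors; the third position is masked.
X = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
M = [[0, 0, 1] for _ in X]
F_att, att_weights = masked_attention(X, X, X, M)
```

Every row of `att_weights` sums to one and assigns exactly zero weight to the masked column, which is the behaviour the paragraph above attributes to the word-position mask.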

Through the above process the word features are integrated. The word feature information is added to the BERT pre-trained character vectors as a residual, and the final word-enhanced character feature vector is:

$$C' = C + g(F_{Att});$$

where $C$ denotes the BERT pre-trained character vector features, and the function $g(\cdot)$ removes the word-vector channels of the self-attention network output to keep the vector dimensions of $C$ and $F_{Att}$ consistent, which yields the vocabulary-enhanced character embedding vector $C'$.
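A short sketch of the residual step $C' = C + g(F_{Att})$ follows. How $g(\cdot)$ selects channels is not detailed in the text, so the sketch assumes, purely for illustration, that the character channels occupy the first $d_c$ components of each fused vector; `drop_word_channels` and `enhance` are hypothetical names.

```python
def drop_word_channels(f_att, d_c):
    """Hypothetical g(.): keep the first d_c channels of each fused vector.

    Assumes the character channels come first; the real layout may differ.
    """
    return [row[:d_c] for row in f_att]


def enhance(C, f_att):
    """Residual fusion C' = C + g(F_att), computed row by row."""
    d_c = len(C[0])
    reduced = drop_word_channels(f_att, d_c)
    return [
        [c + f for c, f in zip(c_row, f_row)]
        for c_row, f_row in zip(C, reduced)
    ]


# One character with d_c = 2 and a fused vector of dimension d_e = 3.
C_prime = enhance([[1.0, 2.0]], [[0.5, 0.5, 9.0]])
```

The word channel (value `9.0`) is discarded, so `C_prime` has the same shape as `C` and can replace it at the BERT embedding layer.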

Through the above process, the character-word pair vectors are re-encoded and a dedicated self-attention network models the global character vectors and word vectors, which yields the fused character feature vectors and achieves the fusion of global word feature information.

Step 3: Constructing the named entity recognition model

1) Calculation procedure of BERT structure

This embodiment improves on the BERT-BiLSTM network: features are enhanced at the embedding layer of BERT, and a mechanism is provided for integrating word features into that layer. Step 2 yields the vocabulary-enhanced character vector representation $C'$. Given a sentence sequence $s_c = \{c_1, c_2, \ldots, c_n\}$ of length $n$, the character vectors $C'$ are then fine-tuned in the BERT model, and the vocabulary-enhanced BERT character embedding vector can be expressed as:

$$E'_i = C'_i + E_s(i) + E_p(i)$$

where $E_s$ and $E_p$ denote the segment vector and position vector lookup tables, respectively. The resulting $E'$ is then input into BERT, where each Transformer block is computed as:

$$D = \mathrm{LN}(H_{k-1} + \mathrm{MHA}(H_{k-1}))$$

$$H_k = \mathrm{LN}(\mathrm{FFN}(D) + D)$$

where $H_k$ denotes the hidden state output of the $k$-th layer ($H_0 = E'$ denotes the bottom-layer character vectors); LN is the layer normalization function; MHA is the multi-head self-attention module; FFN denotes a two-layer feed-forward neural network. Finally, the hidden state output vectors of the last Transformer layer are obtained and passed to the subsequent sequence labeling and decoding tasks.
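The embedding sum and the two residual-plus-normalization steps of a Transformer block can be written out as a small pure-Python sketch. It is a simplification under stated assumptions: `mha` and `ffn` are caller-supplied stand-ins for the multi-head attention and feed-forward sub-layers, and the layer normalization omits the learnable gain and bias that a real BERT layer has.

```python
import math


def layer_norm(x, eps=1e-12):
    """Layer normalization without learnable gain and bias (simplification)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]


def embed(c_prime, seg, pos):
    """E'_i = C'_i + E_s(i) + E_p(i), element-wise for every character."""
    return [
        [a + b + c for a, b, c in zip(ci, si, pi)]
        for ci, si, pi in zip(c_prime, seg, pos)
    ]


def transformer_block(H, mha, ffn):
    """D = LN(H + MHA(H)); H_k = LN(FFN(D) + D)."""
    attn = mha(H)
    D = [layer_norm([h + a for h, a in zip(hr, ar)]) for hr, ar in zip(H, attn)]
    ff = ffn(D)
    return [layer_norm([f + d for f, d in zip(fr, dr)]) for fr, dr in zip(ff, D)]


# Identity functions stand in for the attention and feed-forward sub-layers.
E = embed([[1.0, 2.0]], [[0.1, 0.1]], [[0.5, 0.5]])
H1 = transformer_block([[1.0, 3.0]], mha=lambda h: h, ffn=lambda d: d)
```

Because the enhanced vectors $C'$ enter only through `embed`, the rest of the encoder stack is unchanged, which is what allows the pre-trained BERT weights to be reused during fine-tuning.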

2) Computation procedure for LSTM networks

The fine-tuned mixed lattice embedding vectors already contain word-level semantic information, because the fusion process focuses on the semantic associations between characters and words. To capture the global semantic information among the characters of a sentence and further improve NER performance, this embodiment follows the practice of most NER models and uses a BiLSTM as the sequence labeling layer of the model. Given a sentence $s_c = \{c_1, c_2, \ldots, c_n\}$, the previous steps yield the vocabulary-enhanced character feature representation $C' = \{c'_1, c'_2, \ldots, c'_n\}$ and, after BERT fine-tuning, the hidden state output matrix. This matrix is fed into a bidirectional LSTM network, which captures the semantic information of the sentence from left to right and from right to left.

Denoting the hidden state output of the forward LSTM network by $\overrightarrow{h_i}$ and that of the backward LSTM network by $\overleftarrow{h_i}$, the output of the Bi-LSTM network, which is the output of the sequence labeling layer, can be expressed as:

$$h_i = [\overrightarrow{h_i};\ \overleftarrow{h_i}]$$

where $h_i$ denotes the concatenated hidden state output of the $i$-th Bi-LSTM unit, which we use as the character-level contextual semantic representation of $c_i$.

3) Decoding process for CRF network

After the sequence labeling layer, the NER labels are predicted with a standard CRF layer. Given the hidden state output $H = \{h_1, h_2, \ldots, h_n\}$ of the last network layer and a label sequence $y = \{y_1, y_2, \ldots, y_n\}$, for a sentence $s = \{s_1, s_2, \ldots, s_n\}$ the probability of the corresponding label sequence is defined as:

$$p(y \mid s) = \frac{\exp\left(\sum_{i}\left(W^{y_i} h_i + b^{(y_{i-1},\, y_i)}\right)\right)}{\sum_{y'} \exp\left(\sum_{i}\left(W^{y'_i} h_i + b^{(y'_{i-1},\, y'_i)}\right)\right)}$$

where $y'$ ranges over all possible label sequences, $W^{y_i}$ denotes the learnable weight parameter of the network corresponding to $y_i$, and $b^{(y_{i-1},\, y_i)}$ denotes the bias between $y_{i-1}$ and $y_i$. The negative log-likelihood, $L = -\sum \log p(y \mid s)$ summed over the training sentences, is taken as the loss function of the model.
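The CRF probability and the negative log-likelihood can be illustrated with a brute-force sketch that enumerates every label sequence. This is only feasible for toy inputs (real CRF layers use the forward algorithm), and it makes two assumptions: the emission score of label $y_i$ is the dot product of a weight vector `W[y_i]` with $h_i$, and no transition term is added at the first position. All function names are hypothetical.

```python
import math
from itertools import product


def sequence_score(H, y, W, b):
    """Sum of emission scores W[y_i] . h_i and transition scores b[y_{i-1}][y_i]."""
    score = 0.0
    for i, (h, tag) in enumerate(zip(H, y)):
        score += sum(w * x for w, x in zip(W[tag], h))
        if i > 0:  # assumption: no transition term for the first position
            score += b[y[i - 1]][tag]
    return score


def crf_log_prob(H, y, W, b):
    """log p(y | s), normalized over every possible label sequence."""
    n_tags = len(W)
    all_scores = [
        sequence_score(H, cand, W, b)
        for cand in product(range(n_tags), repeat=len(H))
    ]
    m = max(all_scores)
    log_z = m + math.log(sum(math.exp(s - m) for s in all_scores))
    return sequence_score(H, y, W, b) - log_z


def nll_loss(H, y, W, b):
    """Negative log-likelihood of the gold label sequence y."""
    return -crf_log_prob(H, y, W, b)


# Two positions, two labels, zero transition biases.
H = [[1.0, 0.0], [0.0, 1.0]]
W = [[1.0, 0.0], [0.0, 1.0]]
b = [[0.0, 0.0], [0.0, 0.0]]
loss = nll_loss(H, (0, 1), W, b)
```

In this toy setting the label sequence `(0, 1)` has the highest score, so it receives the lowest loss, and the probabilities of all four candidate sequences sum to one.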

Step 4: Model learning process

The method proposed in this embodiment is trained separately on four public data sets, and the model converges by optimizing the negative log-likelihood loss described above. The model uses a warmup learning rate schedule. The BERT parameters use a learning rate of 1e-5, since a small learning rate lets the model converge quickly during fine-tuning; the LSTM parameters use a learning rate of 1e-3, and all other parameters use a learning rate of 1e-4. On all four data sets the model converged within 20 epochs. The experiments on the MSRA data set were run on a V100 graphics card, and those on the other data sets on a 1080Ti graphics card. Because results were found to differ somewhat between machines, the reported results are averages over several runs.
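The per-group learning rates above can be sketched as follows. The parameter-name prefixes (`bert.`, `lstm.`) and the linear shape of the warmup are assumptions made for illustration; the text states only that a warmup strategy is used, not its form or length.

```python
def learning_rate_for(param_name):
    """Pick the base learning rate from a (hypothetical) parameter-name prefix."""
    if param_name.startswith("bert."):
        return 1e-5   # BERT parameters
    if param_name.startswith("lstm."):
        return 1e-3   # LSTM parameters
    return 1e-4       # all other parameters


def warmup_factor(step, warmup_steps):
    """Linear warmup (assumed shape): scale grows from 0 to 1, then stays at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


def build_param_groups(param_names):
    """Group parameter names by base learning rate, smallest rate first."""
    groups = {}
    for name in param_names:
        groups.setdefault(learning_rate_for(name), []).append(name)
    return [{"lr": lr, "params": names} for lr, names in sorted(groups.items())]


names = ["bert.encoder.layer.0.weight", "lstm.weight_ih", "crf.transitions"]
groups = build_param_groups(names)
# Effective rate of the BERT group halfway through a 100-step warmup.
effective_lr = groups[0]["lr"] * warmup_factor(50, 100)
```

Grouping by prefix keeps the small fine-tuning rate on the pre-trained weights while the newly initialized layers train with larger rates; deep learning frameworks typically accept such groups directly in their optimizers.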

This embodiment is evaluated on four public Chinese named entity recognition data sets and compared with other models; the data are presented in tables to show the effect of the invention.

Introduction of data set:

The Weibo data set is a public social media data set collected from the Sina Weibo website and contains four entity types: location, person, organization, and geo-political entity. The Resume data set also comes from Sina and consists of finance-related résumé data. The MSRA and OntoNotes 4 data sets come from the public news domain and provide gold-standard labels for the training data. The statistics of these data sets are shown in Table 1.

Table 1. Statistics of the data sets

And (3) measurement indexes are as follows:

the F1 value, a measure commonly used by classification models, is used here to compare the recognition accuracy of other models and the present invention. First, some preconditions for calculating the F1 value are introduced: TP, True Positives, indicates that the sample is divided into positive samples and allocated correctly; TN, True negotives, indicates that the sample is divided into samples and assigned correctly; FP, False Positives, indicates that a sample is divided into positive samples but is allocated incorrectly; FN, False Negatives, indicates that the sample is divided into negative samples but is misassigned. Precision, which represents the ratio of the number of correctly assigned positive samples to the total number of correctly assigned positive samples, is given by:

Recall is the proportion of correctly classified positive samples among all actual positive samples, and is given by:

Recall＝TP/(TP+FN)；

The F1-Score, also called the F1 score, is a metric for classification problems that is often used as the final indicator for multi-class problems; it is the harmonic mean of precision and recall. For a single category, the F1 score can be calculated with the following formula:

F1＝2×Precision×Recall/(Precision+Recall).
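As a worked illustration of the precision, recall and F1 definitions above, the following minimal sketch computes the three values from raw counts. The counts in the example are made up for illustration and are not experimental results, and returning 0.0 when a denominator is zero is an assumed convention.

```python
from typing import Tuple


def precision_recall_f1(tp: int, fp: int, fn: int) -> Tuple[float, float, float]:
    """Compute precision, recall and F1 for a single category.

    Returns 0.0 for any metric whose denominator is zero (assumed convention).
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    # F1 is the harmonic mean of precision and recall.
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Illustrative counts only: 8 correct positives, 2 false alarms, 4 misses.
p, r, f1 = precision_recall_f1(tp=8, fp=2, fn=4)
```

TN does not appear in any of the three formulas, which is why it is not needed as an input.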

Description of results:

The F1 values on the four datasets are shown in Table 2. This embodiment is compared not only with methods that use BERT pre-trained vectors, but also with other classical state-of-the-art vocabulary enhancement methods. Using the same pre-trained character vectors and word vectors, the method provided by this embodiment achieves F1 values of 71.88%, 96.22%, 95.72% and 81.75% on the four public datasets Weibo, Resume, MSRA and OntoNotes 4, respectively. Compared with the classical Chinese NER models Lattice-LSTM, LR-CNN and FLAT, the method achieves improvements of 8.46%, 0.69%, 5% and 1.37% on the four datasets, respectively. Compared with NER models that use BERT pre-trained vectors, such as SoftLexicon-BERT, FLAT-BERT and LEBERT, the method achieves slight improvements of 0.28%, 0.12% and 0.41% on the Resume, MSRA and OntoNotes 4 datasets, while on the Weibo dataset its F1 score is clearly better than that of the SoftLexicon-BERT model, an improvement of 1.64%. This embodiment is an improvement based on BERT-BiLSTM; compared with the BERT baseline model without the vocabulary enhancement network, it improves performance by 4.55%, 0.54%, 1.82% and 0.91% on the four datasets, which shows that improving the feature representation of character vectors through vocabulary enhancement is an effective way to improve NER performance.

Compared with other vocabulary enhancement methods, the feature fusion framework (MELSN) provided by this embodiment can effectively fuse richer lexical semantic features, and through the fine-tuning mechanism of the pre-trained BERT model, the vocabulary-enhanced character feature representation contains more lexical semantics. As shown in Table 2, the comparative experiment results indicate that this embodiment makes better use of the character-word lattice structure to achieve vocabulary enhancement, and demonstrate the effectiveness of the proposed method on the Chinese named entity recognition task.

Table 2: Comparative experiment results

The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above examples, and all technical solutions that fall within the spirit of the present invention belong to its scope of protection. It should be noted that improvements and refinements made by those skilled in the art without departing from the principle of the invention are also regarded as falling within the scope of protection of the invention.

Claims

1. A named entity recognition method based on a mixed lattice self-attention network is characterized by comprising the following steps:

S2, constructing a corresponding self-attention network based on the word vectors of the mixed lattice structure generated in step S1, to capture the influence of the word vectors on the character vectors and thereby enhance the feature representation of each character vector;

2. The named entity recognition method based on a mixed lattice self-attention network as claimed in claim 1, wherein in step S1, the process of encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed character-word lattice encoding, to obtain the word vector representation with the corresponding mixed lattice structure, comprises the following steps:

S11, given a sentence s_c＝{c₁,c₂,…,c_n}, the character feature vector representation of the sentence s_c is obtained by loading pre-trained BERT weights:

Wherein

S13, all matched words are grouped according to BMES tags, namely, for the character c_i, the word set B(c_i) consists of the matched words that begin with c_i, the set M(c_i) consists of the matched words in which c_i is an internal character, the set E(c_i) consists of the matched words that end with c_i, and the set S(c_i) consists of the single-character word c_i; the word set w_i of each character c_i in the sentence s_c is expressed as:

w_i＝{e^w(B(c_i)),e^w(M(c_i)),e^w(E(c_i)),e^w(S(c_i))}；

wherein e^w denotes the lookup table of pre-trained word vectors;

During BERT fine-tuning, the weights of the two layers are learned so that the pre-trained word feature vectors are mapped into the semantic feature space of BERT; the processed word feature vector is expressed as:

wherein W₁∈(d_c×d_c) and W₂∈(d_c×d_w) are learnable weight matrices, b₁ and b₂ are the corresponding biases, d_c denotes the dimension of the BERT character vector, and d_w denotes the dimension of the pre-trained word vector;

S15, taking the word feature vector v_i^w as the input of the feature fusion model, and according to the correspondence between characters and word sets, the feature of each character-word pair is expressed as:

S16, the features of the character-word pairs are expressed as:

wherein

represents the vector concatenation operator.
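The BMES grouping of step S13 can be sketched as follows. This is a minimal illustration, assuming a toy lexicon and a hypothetical maximum word length max_len for lexicon matching; neither is specified in the claims. It produces the word sets B(c_i), M(c_i), E(c_i) and S(c_i) for every character, before any lookup in the word-vector table e^w.

```python
from typing import Dict, List, Set


def bmes_word_sets(
    sentence: str, lexicon: Set[str], max_len: int = 4
) -> List[Dict[str, List[str]]]:
    """Group lexicon words matched in the sentence into B/M/E/S sets per character."""
    sets = [{"B": [], "M": [], "E": [], "S": []} for _ in sentence]
    n = len(sentence)
    for start in range(n):
        for end in range(start + 1, min(n, start + max_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if len(word) == 1:
                # Single-character word: goes to S of that character.
                sets[start]["S"].append(word)
                continue
            sets[start]["B"].append(word)    # word begins with this character
            sets[end - 1]["E"].append(word)  # word ends with this character
            for mid in range(start + 1, end - 1):
                sets[mid]["M"].append(word)  # character is inside the word
    return sets


# Toy lexicon (illustrative only).
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
word_sets = bmes_word_sets("南京市长江大桥", lexicon)
```

For example, the character "长" begins both "长江" and "长江大桥" and ends "市长", so its B set holds two words and its E set holds one.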

3. The named entity recognition method based on a mixed lattice self-attention network as claimed in claim 2, wherein in step S2, the process of constructing a corresponding self-attention network based on the word vectors of the mixed lattice structure generated in step S1, to capture the influence of the word vectors on the character vectors and thereby enhance the feature representation of each character vector, comprises the following steps:

[Q,K,V]＝[W_qV^ME,W_kV^ME,W_vV^ME]；

wherein

F_Att＝Softmax(S_Att+εM)V；

C′＝C+g(F_Att)；

wherein C represents the pre-trained character vector features of BERT; the function g(·) is used to remove the word vector path in the self-attention network so that the vector dimensions of C and F_Att are consistent, thereby obtaining the vocabulary-enhanced character embedding vector C′.
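The masked attention F_Att＝Softmax(S_Att+εM)V and the residual update C′＝C+g(F_Att) can be sketched as follows. Several points are assumptions, because the corresponding definitions are not reproduced in this text: S_Att is taken to be the scaled dot product of Q and K, ε is taken to be a large negative constant, an entry of 1 in M marks a blocked position, and g(·) is modelled as keeping only the rows that correspond to characters. The projections W_q, W_k and W_v are omitted, so Q, K and V are passed in directly.

```python
import math
from typing import List

Matrix = List[List[float]]


def softmax(row: List[float]) -> List[float]:
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]


def masked_attention(Q: Matrix, K: Matrix, V: Matrix, M: Matrix, eps: float = -1e9) -> Matrix:
    """Compute Softmax(S_Att + eps * M) V with assumed scaled dot-product scores."""
    d = len(Q[0])
    out = []
    for i, q in enumerate(Q):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d) + eps * M[i][j]
            for j, k in enumerate(K)
        ]
        weights = softmax(scores)
        out.append([sum(w * v[c] for w, v in zip(weights, V)) for c in range(len(V[0]))])
    return out


def enhance(C: Matrix, F_att: Matrix) -> Matrix:
    """C' = C + g(F_att), with g assumed to keep the first len(C) (character) rows."""
    return [[c + f for c, f in zip(c_row, f_row)] for c_row, f_row in zip(C, F_att[: len(C)])]


# Toy input: two character positions followed by one word position.
Q = [[1.0, 0.0], [0.0, 0.0], [0.0, 1.0]]
K = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
V = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
M = [[0, 1, 1], [0, 0, 0], [0, 0, 0]]  # row 0 may attend only to itself
F_att = masked_attention(Q, K, V, M)
C = [[10.0, 10.0], [20.0, 20.0]]
C_prime = enhance(C, F_att)
```

Adding εM before the softmax drives the weights of blocked positions to zero, so row 0 of the toy example simply copies its own value vector.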

4. The named entity recognition method based on a mixed lattice self-attention network as claimed in claim 3, wherein in step S3, the process of constructing the entity recognition model based on the mixed lattice self-attention network comprises the following steps:

S31, given a sentence sequence s_c＝{c₁,c₂,…,c_n} of length n, the vocabulary-enhanced character vectors are denoted as C′＝{c′₁,c′₂,…,c′_n}; the character vectors C′ are fine-tuned in the BERT model, and the vocabulary-enhanced BERT character embedding vector is expressed as:

E′_i＝C′_i+E_s(i)+E_p(i)；

wherein E_s and E_p denote the segment vector lookup table and the position vector lookup table, respectively; i denotes the i-th character in the character sequence s_c of length n;

D＝LN(H_{k-1}+MHA(H_{k-1}))；

H_k＝LN(FFN(D)+D)；

S33, obtaining the hidden state output vector of the last transformer layer

Will be provided with

The backward LSTM network outputs

wherein h_i denotes the concatenated hidden state output of the i-th Bi-LSTM unit and serves as the character-level contextual semantic representation of c_i;

wherein y′ denotes any one of all possible tag sequences;

represents the bias corresponding to the transition from y_{i-1} to y_i;
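The CRF decoding layer and its negative log-likelihood loss can be illustrated with the sketch below. Because the claim's own formulas are not reproduced in this text, the scoring function here is an assumed standard linear-chain form: the score of a tag sequence is the sum of per-position emission scores and of the transition terms between y_{i-1} and y_i, and the normalizer sums over all tag sequences y′. Start and end transitions are omitted, and all numbers are toy values.

```python
import math
from itertools import product
from typing import List, Sequence

Matrix = List[List[float]]


def logsumexp(xs: Sequence[float]) -> float:
    """Numerically stable log(sum(exp(x)))."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))


def sequence_score(emissions: Matrix, transitions: Matrix, tags: Sequence[int]) -> float:
    """Score of one tag sequence: emission terms plus transition terms."""
    score = emissions[0][tags[0]]
    for i in range(1, len(tags)):
        score += transitions[tags[i - 1]][tags[i]] + emissions[i][tags[i]]
    return score


def log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """Log of the sum of exp(score) over all tag sequences (forward algorithm)."""
    alpha = list(emissions[0])
    num_tags = len(alpha)
    for i in range(1, len(emissions)):
        alpha = [
            logsumexp([alpha[p] + transitions[p][t] for p in range(num_tags)])
            + emissions[i][t]
            for t in range(num_tags)
        ]
    return logsumexp(alpha)


def brute_force_log_partition(emissions: Matrix, transitions: Matrix) -> float:
    """Same quantity by explicit enumeration of every y'; feasible only for toy sizes."""
    num_tags = len(emissions[0])
    all_sequences = product(range(num_tags), repeat=len(emissions))
    return logsumexp([sequence_score(emissions, transitions, y) for y in all_sequences])


def crf_nll(emissions: Matrix, transitions: Matrix, tags: Sequence[int]) -> float:
    """Negative log-likelihood of the gold tag sequence."""
    return log_partition(emissions, transitions) - sequence_score(emissions, transitions, tags)


# Toy values: 3 characters, 2 tags.
emissions = [[1.0, 0.2], [0.3, 0.9], [0.5, 0.4]]
transitions = [[0.1, -0.2], [0.0, 0.3]]
gold = [0, 1, 1]
loss = crf_nll(emissions, transitions, gold)
```

The forward algorithm gives the same normalizer as enumerating every y′ but in time linear in the sentence length, which is what makes training on full sentences practical.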

5. A named entity recognition device based on a mixed lattice self-attention network, characterized by comprising a mixed lattice structure encoding module, a vocabulary enhancement module, a sequence labeling and decoding module, and a model training module;