CN114429132A - Named entity identification method and device based on mixed lattice self-attention network - Google Patents
- Publication number
- CN114429132A (application number CN202210172667.4A)
- Authority
- CN
- China
- Prior art keywords
- word
- vector
- network
- self
- mixed
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/242—Dictionaries
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Physics & Mathematics (AREA)
- General Health & Medical Sciences (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Computational Linguistics (AREA)
- General Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Biomedical Technology (AREA)
- Molecular Biology (AREA)
- Computing Systems (AREA)
- Evolutionary Computation (AREA)
- Data Mining & Analysis (AREA)
- Mathematical Physics (AREA)
- Software Systems (AREA)
- Biophysics (AREA)
- Life Sciences & Earth Sciences (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a named entity recognition method based on a mixed lattice self-attention network, which comprises the following steps: encoding the sentence feature vectors represented by character-word pairs into a matrix of fixed dimension to obtain a character-word vector representation with a mixed lattice structure; constructing a self-attention network to capture the influence of the word vectors on the character vectors and thereby enhance the feature representation of each character vector; fusing word features at the Embedding layer of BERT and obtaining better character vector representations through fine-tuning; and implementing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features and completes the entity recognition model based on the mixed lattice self-attention network. The invention can capture global vocabulary information, generate semantically rich character vector representations, and improve the recognition accuracy of Chinese named entities on multiple data sets.
Description
Technical Field
The invention relates to the technical field of natural language processing in artificial intelligence, in particular to a named entity identification method and device based on a mixed lattice self-attention network.
Background
Named Entity Recognition (NER), also called entity extraction, was first proposed at the MUC-6 conference and is an information extraction technique for extracting entities from text. Early entity recognition relied on rule-based and statistical methods. Because these traditional methods depend heavily on manual design, their coverage and recognition precision are low, and they have been replaced by deep learning methods. Among deep learning methods, entity recognition models are divided into character-based models and word-based models. Other languages such as English generally adopt the character-based model because each word has a definite meaning; in Chinese, the meaning of a single character is ambiguous while the meaning of a word is concrete, so a word-based model is used in Chinese NER methods. To better represent each character vector in Chinese, some scholars later proposed methods based on representation learning, a learning paradigm that converts human language information into features a machine can recognize and that can improve the accuracy of semantic expression in machine learning.
In named entity recognition, external vocabulary information can effectively improve recognition accuracy, but such methods depend on the performance of the fusion algorithm. For example, the invention with patent number CN113836930A proposes a named entity recognition method for Chinese hazardous chemicals that is based on a BiLSTM-CRF model; it uses the pretrained language model BERT to obtain character-level encodings of text in the hazardous chemicals domain and context-based character vectors, and then introduces an attention mechanism to strengthen the model's ability to mine global and local text features. The invention with patent number CN113128232A provides a named entity recognition method based on ALBERT and multiple word information embedding, which can effectively represent the ambiguity of characters and improve the efficiency of entity recognition. The invention with patent number CN111310470A discloses a Chinese named entity recognition method that fuses character and word features; the result data obtained after comprehensive analysis strengthen the model's understanding of the text and improve the F1 value on the recognition task.
Although existing methods have achieved good results in fusing character and word feature vectors, the existing technical means have the following problems: 1) the character-word feature fusion methods do not consider that character and word vectors trained by different models differ in semantic expression, and they fuse the character vectors and word vectors directly and naively, so the word-level features of the character vectors cannot be effectively enhanced; 2) the vocabulary enhancement methods based on learned word weights consider only the influence of the words matched to each character on the semantic representation of that character, and ignore the effect of global vocabulary information.
Disclosure of Invention
In view of the defects in the prior art, the invention provides a named entity recognition method and device based on a mixed lattice self-attention network. Based on the idea of representation learning, the proposed model fuses vocabulary information to enhance the feature representation of the character vectors, so that the generated character vectors contain more entity boundary information and the accuracy of the NER task is improved.
In order to achieve the purpose, the invention adopts the following technical scheme:
A named entity recognition method based on a mixed lattice self-attention network comprises the following steps:
S1, searching the dictionary for words composed of consecutive characters in the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a matrix of fixed dimension using the mixed-lattice encoding scheme, to obtain the corresponding character-word vector representation of the mixed lattice structure;
S2, constructing a corresponding self-attention network based on the mixed-lattice character-word vectors generated in step S1, to capture the influence of the word vectors on the character vectors and thereby enhance the feature representation of each character vector;
S3, fusing word features at the Embedding layer of BERT and learning better character vector representations through fine-tuning; implementing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby completing the entity recognition model based on the mixed lattice self-attention network;
and S4, training the entity recognition model based on the mixed lattice self-attention network on the data set.
In order to optimize the technical scheme, the specific measures adopted further comprise:
Further, in step S1, encoding the sentence feature vectors represented by character-word pairs into a matrix of fixed dimension using the mixed-lattice encoding scheme, to obtain the corresponding character-word vector representation of the mixed lattice structure, comprises the following steps:
S11, given a sentence $s_c = \{c_1, c_2, \ldots, c_n\}$, obtaining the character feature vector representation $C = \{e^B(c_1), e^B(c_2), \ldots, e^B(c_n)\}$ of $s_c$ by loading the pretrained BERT weights, where $c_i$ denotes the $i$-th character of $s_c$, $n$ denotes the length of the sentence in characters, and $e^B$ denotes the lookup table of BERT pretrained character vectors;
S12, given a Chinese dictionary $L$, constructing a Trie (dictionary tree) and traversing its nodes to obtain the words that match each character;
S13, grouping all matched words according to the BMES tags: for a character $c_i$, the set $B(c_i)$ consists of the matched words that begin with $c_i$, the set $M(c_i)$ consists of the matched words that contain $c_i$ as an internal character, the set $E(c_i)$ consists of the matched words that end with $c_i$, and the set $S(c_i)$ consists of the single-character word formed by $c_i$; the word set $w_i$ of each character $c_i$ in the sentence $s_c$ is expressed as:
$w_i = \{e^w(B(c_i)), e^w(M(c_i)), e^w(E(c_i)), e^w(S(c_i))\}$;
where $e^w$ denotes the lookup table of pretrained word vectors;
S14, setting two learnable nonlinear fully connected layers to raise the dimension of $w_i$ to match that of the character vectors; when BERT is fine-tuned, the weights of these two layers are learned so that the pretrained word feature vectors are mapped into the semantic feature space of BERT; the processed word feature vector is represented as follows:
where $W_1 \in \mathbb{R}^{d_c \times d_c}$ and $W_2 \in \mathbb{R}^{d_c \times d_w}$ are learnable weight matrices, $b_1$ and $b_2$ are the corresponding biases, $d_c$ denotes the dimension of the BERT character vectors, and $d_w$ denotes the dimension of the pretrained word vectors;
S15, taking the converted word feature vectors as the input of the feature fusion model and, according to the correspondence between characters and word sets, expressing the feature of each character-word pair as:
S16, expressing the mixed-lattice encoding of the character-word pair features as follows:
Further, in step S2, constructing a corresponding self-attention network based on the mixed-lattice character-word vectors generated in step S1, to capture the influence of the word vectors on the character vectors and thereby enhance the feature representation of each character vector, comprises the following steps:
S21, designing a mixed-lattice self-attention network to capture the associations between character and word features; the self-attention network takes the mixed character-word encoding vector $V_{ME}$ and a word-position mask matrix $M$ as the input of the enhancement network, and models the global word vectors and character vectors so that the model learns the semantic correlation weights between characters and words; the $Q$, $K$, $V$ matrices are calculated as follows:
$[Q, K, V] = [W_q V_{ME}, W_k V_{ME}, W_v V_{ME}]$;
where $W_q$, $W_k$ and $W_v$ are learnable weight matrices and $d_e = d_c + d_w$; $Q$, $K$ and $V$ are, respectively, the query matrix, the key matrix corresponding to the queries, and the value matrix to be weighted and averaged; $d_e$ denotes the dimension of the mixed-lattice vector, $d_c$ the dimension of the character vector, and $d_w$ the dimension of the word vector;
S22, using the dot product operation to calculate the similarity score:
$F_{Att} = \mathrm{Softmax}(S_{Att} + \varepsilon M)V$;
where $M$ is a static word-position mask matrix, $\varepsilon$ is a matrix whose entries are infinitely small (in practice, very large negative values), and $F_{Att}$ is the output of the self-attention network; $S_{Att}$ denotes the normalized attention score, and $K^T$ denotes the transpose of the matrix $K$;
S23, adding the word feature information as a residual to the BERT pretrained character vectors, so that the word-enhanced character feature vector is:
$C' = C + g(F_{Att})$;
where $C$ denotes the pretrained character vector features of BERT, and the function $g(\cdot)$ removes the word vector channel of the self-attention network so that the vector dimensions of $C$ and $F_{Att}$ are consistent, yielding the vocabulary-enhanced character embedding vector $C'$.
Further, in step S3, constructing the entity recognition model based on the mixed lattice self-attention network comprises the following steps:
S31, given a sentence sequence $s_c = \{c_1, c_2, \ldots, c_n\}$ of length $n$ whose vocabulary-enhanced character vectors are denoted $C' = \{c'_1, c'_2, \ldots, c'_n\}$, fine-tuning the character vectors $C'$ in the BERT model, the vocabulary-enhanced BERT character embedding vector being expressed as:
$E'_i = C'_i + E_s(i) + E_p(i)$;
where $E_s$ and $E_p$ denote the lookup tables of the segment vectors and the position vectors, respectively, and $i$ indexes the $i$-th character of the character sequence $s_c$ of length $n$;
S32, inputting the obtained $E'$ into BERT, where each transformer block is calculated as follows:
$D = \mathrm{LN}(H_{k-1} + \mathrm{MHA}(H_{k-1}))$;
$H_k = \mathrm{LN}(\mathrm{FFN}(D) + D)$;
where $H_k$ denotes the hidden state output of the $k$-th layer and $H_0 = E'$ denotes the bottom-layer character vectors; LN is the layer normalization function; MHA is the multi-head self-attention module; FFN denotes a two-layer feedforward neural network; $D$ denotes the normalized output vector of the multi-head attention module;
S33, obtaining the hidden state output vector of the last transformer layer and inputting it into a bidirectional LSTM network, which captures the semantic information of the sentence from left to right and from right to left, respectively; with the hidden state output of the forward LSTM network denoted $\overrightarrow{h_i}$ and that of the backward LSTM network denoted $\overleftarrow{h_i}$, the output of the BiLSTM network, which is the output of the sequence labeling layer, is expressed as:
$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$;
where $h_i$ denotes the concatenated hidden state output of the $i$-th BiLSTM unit and serves as the character-level contextual semantic representation of $c_i$;
S34, predicting the NER labels with a standard CRF layer; given the hidden state output vectors $H = \{h_1, h_2, \ldots, h_n\}$ of the last network layer, and letting $y = \{y_1, y_2, \ldots, y_n\}$ denote a label sequence, the probability of the label sequence for a sentence $s = \{s_1, s_2, \ldots, s_n\}$ is defined as follows:
$p(y \mid s) = \dfrac{\exp\left(\sum_{i}\left(W_{y_i} h_i + b_{(y_{i-1}, y_i)}\right)\right)}{\sum_{y'} \exp\left(\sum_{i}\left(W_{y'_i} h_i + b_{(y'_{i-1}, y'_i)}\right)\right)}$;
where $y'$ denotes an arbitrary label sequence; $W_{y_i}$ denotes the learnable weight parameter corresponding to $y_i$; $b_{(y_{i-1}, y_i)}$ denotes the bias between $y_{i-1}$ and $y_i$; likewise, $W_{y'_i}$ and $b_{(y'_{i-1}, y'_i)}$ denote the model weight parameter and bias under any possible label sequence $y'$;
S35, taking the negative log-likelihood loss as the loss function of the model, expressed as:
Based on the above named entity recognition method, the invention provides a named entity recognition device based on a mixed lattice self-attention network, which comprises a mixed lattice structure encoding module, a vocabulary enhancement module, a sequence labeling and decoding module, and a model training module;
the mixed lattice structure encoding module is used for searching the dictionary for words composed of consecutive characters in the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a matrix of fixed dimension using the mixed-lattice encoding scheme, to obtain the corresponding character-word vector representation of the mixed lattice structure;
the vocabulary enhancement module is used for constructing a corresponding self-attention network based on the generated mixed-lattice character-word vectors, to capture the influence of the word vectors on the character vectors and thereby enhance the feature representation of each character vector;
the sequence labeling and decoding module is used for fusing word features at the Embedding layer of BERT and learning better character vector representations through fine-tuning, and for implementing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby completing the entity recognition model based on the mixed lattice self-attention network;
the model training module is used for training the entity recognition model based on the mixed lattice self-attention network on the data sets.
The invention has the beneficial effects that:
The named entity recognition method and device based on the mixed lattice self-attention network capture global vocabulary information by constructing a fusion network, generate semantically rich character vector representations, and improve the recognition accuracy of Chinese named entities on multiple data sets. Compared with the BERT baseline model without the vocabulary enhancement network, the invention improves performance by 4.55%, 0.54%, 1.82% and 0.91% on the four data sets, respectively, which shows that enhancing the feature representation of the character vectors through vocabulary enhancement is an effective way to improve NER performance. Compared with other vocabulary enhancement methods, the proposed feature fusion framework (MELSN) fuses richer lexical semantic features, and, through the fine-tuning mechanism of the pretrained model BERT, the vocabulary-enhanced character feature representations contain more lexical semantics. The comparative experimental results show that the invention makes better use of the character-word lattice structure to achieve vocabulary enhancement, and demonstrate the effectiveness of the proposed method on the Chinese named entity recognition task.
Drawings
Fig. 1 is a schematic structural diagram of a named entity recognition device based on a mixed lattice self-attention network according to the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that the terms "upper", "lower", "left", "right", "front", "back", etc. used in the present invention are for clarity of description only, and are not intended to limit the scope of the present invention, and the relative relationship between the terms and the terms is not limited by the technical contents of the essential changes.
The invention provides a named entity recognition method based on a mixed lattice self-attention network, which comprises the following steps:
S1, searching the dictionary for words composed of consecutive characters in the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a matrix of fixed dimension using the mixed-lattice encoding scheme, to obtain the corresponding character-word vector representation of the mixed lattice structure.
S2, constructing a corresponding self-attention network based on the mixed-lattice character-word vectors generated in step S1, to capture the influence of the word vectors on the character vectors and thereby enhance the feature representation of each character vector.
S3, fusing word features at the Embedding layer of BERT and learning better character vector representations through fine-tuning; implementing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby completing the entity recognition model based on the mixed lattice self-attention network.
And S4, training the entity recognition model based on the mixed lattice self-attention network on the data set.
Based on the foregoing method, this embodiment further provides a named entity recognition device based on a mixed lattice self-attention network, whose structure is shown schematically in Fig. 1. The overall framework is divided into three parts: a mixed-lattice encoding module (Mixed-lattice Encoding Module), a vocabulary enhancement module (Lexicon Enhancement Module), and a sequence labeling and decoding module (Sequence-labeling and Decoding). The first module encodes the character-word pair vectors: it looks up all words in the dictionary tree, loads the pretrained character and word vectors, encodes the character-word pair vectors into mixed-lattice embedding vectors (Mixed-lattice embedding), and passes these embeddings, together with the word mask vectors (Words mask) generated at this stage, to the next module. The second module performs vocabulary enhancement through the proposed self-attention model, and the enhanced character vector representations are passed into the BERT model for fine-tuning. The last module models the enhanced and fine-tuned character vectors to complete label prediction and decoding for each character vector.
The invention provides a named entity recognition method based on vocabulary enhancement. The model improves on the BERT-BiLSTM-CRF network by adding an attention-based vocabulary enhancement module at the Embedding layer of BERT. For encoding the character and word vectors, an interleaved character-word vector encoding scheme is designed; after vocabulary enhancement is completed by the attention network, the fused features are normalized and then added to the original BERT character vectors as a residual. The specific implementation process of the invention is as follows:
Step 1: Construction of the mixed-lattice character-word vectors
Mixed-lattice (Mixed-Lattice) encoding encodes the sentence feature vectors represented by character-word pairs into a matrix of fixed dimension. First, the dictionary is searched for words composed of consecutive characters of the input sentence; the matched words are then merged into a single multidimensional vector through alternating position mapping.
Specifically, given a sentence $s_c = \{c_1, c_2, \ldots, c_n\}$, the character feature vector representation $C = \{e^B(c_1), e^B(c_2), \ldots, e^B(c_n)\}$ of $s_c$ is obtained directly by loading the pretrained BERT weights, where $c_i$ denotes the $i$-th character of $s_c$, $n$ denotes the length of the sentence in characters, and $e^B$ denotes the lookup table of BERT pretrained character vectors. Given a Chinese dictionary $L$, a Trie (dictionary tree) is first constructed, and traversing its nodes yields the words that match each character. All matched words are grouped according to the "BMES" tags: for a character $c_i$, the set $B(c_i)$ consists of the matched words that begin with $c_i$. Similarly, the set $M(c_i)$ consists of the matched words that contain $c_i$ as an internal character, the set $E(c_i)$ consists of the matched words that end with $c_i$, and the set $S(c_i)$ consists of the single-character word formed by $c_i$, so the word set $w_i$ of each character $c_i$ in the sentence $s_c$ can be expressed as:
$w_i = \{e^w(B(c_i)), e^w(M(c_i)), e^w(E(c_i)), e^w(S(c_i))\}$;
where $e^w$ denotes the lookup table of pretrained word vectors. To keep the vector dimensions consistent, two learnable nonlinear fully connected layers raise the dimension of $w_i$ to match that of the character vectors. During BERT fine-tuning, the weights of these two layers are learned so that the pretrained word feature vectors are mapped into the semantic feature space of BERT. The processed word feature vector is represented as follows:
where $W_1 \in \mathbb{R}^{d_c \times d_c}$ and $W_2 \in \mathbb{R}^{d_c \times d_w}$ are learnable weight matrices, and $b_1$ and $b_2$ are the corresponding biases. $d_c$ denotes the dimension of the BERT character vectors and $d_w$ denotes the dimension of the pretrained word vectors. In our feature fusion method, the converted word feature vectors serve as the input of the feature fusion model. According to the correspondence between characters and word sets, the feature of each character-word pair can be expressed as:
At present, many NER methods based on vocabulary fusion directly use the character-word pair features $I_i$ as the input of the vocabulary enhancement network; such methods can fuse only part of the vocabulary information. To enable the model to capture the global word-level features of the sentence $s_c$, this embodiment proposes a new character-word pair encoding scheme, namely mixed-lattice encoding (Mixed-Lattice encoding), in which the character-word pair features are expressed as follows:
where the operator in the expression denotes vector concatenation. With this encoding method, the character and word vectors are cross-encoded into a feature vector of fixed dimension; in this vector, the closer a word vector is placed to a character, the stronger its association with that character, which also matches reality. In the next stage, this embodiment constructs a fusion network based on the attention mechanism to model the character-word pair feature representation $V_{ME}$.
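The Trie matching and BMES grouping described in this step can be sketched as follows. This is a minimal illustration, not the patented implementation: the nested-dict trie, the function names `build_trie`, `match_words` and `bmes_sets`, and the sample lexicon are all assumptions introduced here, and the sketch returns the matched words themselves rather than their pretrained vectors.

```python
from collections import defaultdict

_END = object()  # sentinel key marking the end of a lexicon word (assumed layout)

def build_trie(lexicon):
    """Build a nested-dict trie from an iterable of words."""
    root = {}
    for word in lexicon:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node[_END] = True
    return root

def match_words(sentence, trie):
    """Return (start, end, word) for every lexicon word found in the sentence."""
    matches = []
    for start in range(len(sentence)):
        node = trie
        for end in range(start, len(sentence)):
            node = node.get(sentence[end])
            if node is None:
                break
            if _END in node:
                matches.append((start, end, sentence[start:end + 1]))
    return matches

def bmes_sets(sentence, lexicon):
    """Group the matched words of each character into B, M, E and S sets."""
    sets = [defaultdict(list) for _ in sentence]
    for start, end, word in match_words(sentence, build_trie(lexicon)):
        if start == end:
            sets[start]["S"].append(word)  # single-character word
            continue
        sets[start]["B"].append(word)      # word begins with this character
        sets[end]["E"].append(word)        # word ends with this character
        for i in range(start + 1, end):
            sets[i]["M"].append(word)      # character is inside the word
    return [dict(s) for s in sets]
```

For example, with the sentence "南京市长江大桥" and a lexicon containing "市长", "长江" and "长江大桥", the character "长" receives "长江" and "长江大桥" in its B set and "市长" in its E set.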
Step 2: Computation process of the character-word feature fusion network
The vocabulary enhancement network models the character-word feature representation vector $V_{ME}$ and uses the word features to enhance the feature representation of the character vectors in the sequence. Following the attention mechanism design proposed by Vaswani et al., a mixed-lattice self-attention network is designed to capture the associations between character and word features. The previous stage produced the mixed-lattice vector representation $V_{ME}$ that encodes the character-word pairs; the lattice structure organizes the character and word vectors into one multidimensional vector in which character and word features are interleaved. The self-attention network takes $V_{ME}$ and the word-position mask matrix $M$ as the input of the enhancement network, and through its modeling the model learns the semantic correlation weights between characters and words. Given the mixed character-word encoding vector $V_{ME}$ of a sentence and the matrix $M$, the $Q$, $K$, $V$ matrices are calculated as follows:
$[Q, K, V] = [W_q V_{ME}, W_k V_{ME}, W_v V_{ME}]$;
where $W_q$, $W_k$ and $W_v$ are learnable weight matrices and $d_e = d_c + d_w$. The dot product operation is then used to calculate the similarity score:
$F_{Att} = \mathrm{Softmax}(S_{Att} + \varepsilon M)V$;
where $M$ is a static word-position mask matrix, $\varepsilon$ is a matrix whose entries are infinitely small (in practice, very large negative values), and $F_{Att}$ is the output of the self-attention network. The attention scores at word positions are masked by multiplying $M$ element-wise with $\varepsilon$. After the Softmax function, the probabilities at the word positions are all zero, while each character position carries the weight scores of the words. The fusion of character and word features is thus completed with the word-position mask inside the self-attention network.
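The masking effect of the term $\varepsilon M$ inside the Softmax can be illustrated with a small pure-Python sketch. It assumes that `mask[i][j] = 1` marks an entry to be hidden and that the infinitely small entries of $\varepsilon$ are approximated by the constant `NEG_INF`; the function names and the layout of the mask are hypothetical, and the scores are supplied directly rather than computed from $Q$ and $K$.

```python
import math

NEG_INF = -1e9  # assumed stand-in for the infinitely small entries of epsilon

def softmax(row):
    """Numerically stable softmax over one row of scores."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [e / total for e in exps]

def masked_attention(scores, mask, values):
    """Compute Softmax(S + eps * M) V row by row.

    scores: n x n attention scores, mask: n x n matrix of 0/1 entries,
    values: n x d value vectors. Returns (output, attention weights).
    """
    out, weights = [], []
    for s_row, m_row in zip(scores, mask):
        w = softmax([s + NEG_INF * m for s, m in zip(s_row, m_row)])
        weights.append(w)
        out.append([sum(wj * v[d] for wj, v in zip(w, values))
                    for d in range(len(values[0]))])
    return out, weights
```

Masked entries receive an attention weight of zero after the Softmax, so the corresponding value vectors do not contribute to the output row.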
Through the above process, the character and word features are integrated. The word feature information is added as a residual to the BERT pretrained character vectors, and the resulting word-enhanced character feature vector is:
$C' = C + g(F_{Att})$;
where $C$ denotes the pretrained character vector features of BERT, and the function $g(\cdot)$ removes the word vector channel of the self-attention network so that the vector dimensions of $C$ and $F_{Att}$ are consistent, yielding the vocabulary-enhanced character embedding vector $C'$.
Based on the above process, the character-word pair vectors are re-encoded, and a dedicated self-attention network models the global word vectors and character vectors to obtain character feature vectors fused with word features, thereby fusing global word feature information.
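The dimension bookkeeping behind $C' = C + g(F_{Att})$ can be sketched as below. The text only states that $g$ removes the word vector channel; the concrete choice here, keeping the first $d_c$ components of each $d_e$-dimensional output row, is an assumption made purely for illustration, as are the function names.

```python
def g(f_att_row, d_c):
    """Hypothetical g: keep the first d_c components (assumed character channel)."""
    return f_att_row[:d_c]

def enhance(C, F_att):
    """Add the projected attention output to the BERT character vectors as a residual."""
    d_c = len(C[0])
    return [[c + f for c, f in zip(c_row, g(f_row, d_c))]
            for c_row, f_row in zip(C, F_att)]
```

With $d_c = 2$ and $d_w = 2$, a row of $F_{Att}$ has $d_e = 4$ components, and only the first two are added to the matching row of $C$.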
Step 3: Constructing the named entity recognition model
1) Calculation procedure of the BERT structure
This embodiment improves on the BERT-BiLSTM network by enhancing the features at the embedding layer of BERT, and proposes a mechanism for integrating word features into that layer. Step 2 produced the vocabulary-enhanced character vector representation $C'$. Given a sentence sequence $s_c = \{c_1, c_2, \ldots, c_n\}$ of length $n$, the character vectors $C'$ are then fine-tuned in the BERT model, and the vocabulary-enhanced BERT character embedding vector can be expressed as:
$E'_i = C'_i + E_s(i) + E_p(i)$
where $E_s$ and $E_p$ denote the lookup tables of the segment vectors and the position vectors, respectively. The resulting $E'$ is then input into BERT, where each transformer block is calculated as follows:
$D = \mathrm{LN}(H_{k-1} + \mathrm{MHA}(H_{k-1}))$
$H_k = \mathrm{LN}(\mathrm{FFN}(D) + D)$
where $H_k$ denotes the hidden state output of the $k$-th layer ($H_0 = E'$ denotes the bottom-layer character vectors); LN is the layer normalization function; MHA is the multi-head self-attention module; FFN denotes a two-layer feedforward neural network. Finally, the hidden state output vector of the last transformer layer is obtained and passed to the subsequent sequence labeling and decoding tasks.
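The residual-plus-normalization pattern of one transformer block, $D = \mathrm{LN}(H_{k-1} + \mathrm{MHA}(H_{k-1}))$ followed by $H_k = \mathrm{LN}(\mathrm{FFN}(D) + D)$, can be sketched in plain Python. This is a structural illustration under simplifying assumptions: the layer normalization omits the learnable gain and bias, and `mha` and `ffn` are caller-supplied placeholder functions rather than real attention or feedforward layers.

```python
import math

def layer_norm(x, eps=1e-12):
    """Normalize one vector to zero mean and unit variance (no learnable gain/bias)."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def transformer_block(H_prev, mha, ffn):
    """Apply one post-LN block.

    H_prev: list of per-position vectors.
    mha: function mapping the whole sequence to a sequence of the same shape.
    ffn: function mapping a single vector to a vector of the same length.
    """
    attended = mha(H_prev)
    # D = LN(H_prev + MHA(H_prev)), applied position by position
    D = [layer_norm([h + a for h, a in zip(h_row, a_row)])
         for h_row, a_row in zip(H_prev, attended)]
    # H_k = LN(FFN(D) + D)
    return [layer_norm([f + d for f, d in zip(ffn(d_row), d_row)]) for d_row in D]
```

Stacking this function, starting from $H_0 = E'$, reproduces the layer-by-layer recursion described above.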
2) Computation procedure for LSTM networks
The fine-tuned mixed-lattice embedding vectors already contain word-level semantic information. Because the character-word fusion process focuses on the semantic associations between characters and words, BiLSTM, the approach used by most NER models, is adopted as the sequence labeling layer of this embodiment's model in order to capture the global semantic information among the characters of a sentence and further improve NER performance. Given a sentence $s_c = \{c_1, c_2, \ldots, c_n\}$, the previous step produced the vocabulary-enhanced character feature representation $C' = \{c'_1, c'_2, \ldots, c'_n\}$ and the hidden state output matrix after BERT fine-tuning. This output matrix is input into a bidirectional LSTM network, which captures the semantic information of the sentence from left to right and from right to left, respectively.
The hidden state output of the forward LSTM network can be represented as $\overrightarrow{h_i}$, and likewise the backward LSTM network outputs $\overleftarrow{h_i}$. The output of the BiLSTM network, which is the output of the sequence labeling layer, can therefore be expressed as:
$h_i = [\overrightarrow{h_i}; \overleftarrow{h_i}]$;
where $h_i$ denotes the concatenated hidden state output of the $i$-th BiLSTM unit, which we use as the character-level contextual semantic representation of $c_i$.
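How the forward and backward hidden states are combined per position can be sketched as follows. The recurrence is supplied by the caller as a generic `step` function and stands in for an LSTM cell; it is not an LSTM implementation, and all names here are illustrative assumptions.

```python
def run_direction(inputs, step, h0):
    """Run a recurrence over the inputs and collect the hidden state at each position."""
    states, h = [], h0
    for x in inputs:
        h = step(x, h)
        states.append(h)
    return states

def bidirectional(inputs, step_fwd, step_bwd, h0):
    """Concatenate left-to-right and right-to-left hidden states position by position."""
    fwd = run_direction(inputs, step_fwd, h0)
    # Process the reversed sequence, then flip back so index i aligns with fwd[i].
    bwd = run_direction(inputs[::-1], step_bwd, h0)[::-1]
    return [f + b for f, b in zip(fwd, bwd)]  # list concatenation per position
```

Each output vector therefore summarizes the context to the left of position $i$ in its first half and the context to the right in its second half.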
3) Decoding process for CRF network
After the computation of the sequence labeling layer, the NER labels are predicted with a standard CRF layer. Given the hidden state output vectors $H = \{h_1, h_2, \ldots, h_n\}$ of the last network layer, and letting $y = \{y_1, y_2, \ldots, y_n\}$ denote a label sequence, the probability of the label sequence for a sentence $s = \{s_1, s_2, \ldots, s_n\}$ is defined as follows:
$p(y \mid s) = \dfrac{\exp\left(\sum_{i}\left(W_{y_i} h_i + b_{(y_{i-1}, y_i)}\right)\right)}{\sum_{y'} \exp\left(\sum_{i}\left(W_{y'_i} h_i + b_{(y'_{i-1}, y'_i)}\right)\right)}$;
where $y'$ denotes an arbitrary label sequence among all label sequences; $W_{y_i}$ denotes the learnable weight parameter corresponding to $y_i$; $b_{(y_{i-1}, y_i)}$ denotes the bias between $y_{i-1}$ and $y_i$. The negative log-likelihood loss is taken as the loss function of the model, which can be expressed as:
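The CRF label-sequence probability and its negative log-likelihood can be made concrete with a small sketch. It assumes that `emissions[i][t]` plays the role of the weight term for label `t` at position `i` and that `transitions[p][t]` plays the role of the bias between consecutive labels `p` and `t`; start and end transitions are omitted for brevity, and the function names are hypothetical. The normalizer over all label sequences is computed with the standard forward algorithm.

```python
import math

def logsumexp(xs):
    """Stable log(sum(exp(x))) over a list of floats."""
    m = max(xs)
    return m + math.log(sum(math.exp(x - m) for x in xs))

def sequence_score(emissions, transitions, tags):
    """Sum of emission and transition scores along one label path."""
    score = emissions[0][tags[0]]
    for i in range(1, len(tags)):
        score += transitions[tags[i - 1]][tags[i]] + emissions[i][tags[i]]
    return score

def log_partition(emissions, transitions):
    """Forward algorithm: log of the summed exp-scores over all label paths."""
    alpha = list(emissions[0])
    n_tags = len(alpha)
    for i in range(1, len(emissions)):
        alpha = [logsumexp([alpha[p] + transitions[p][t] for p in range(n_tags)])
                 + emissions[i][t] for t in range(n_tags)]
    return logsumexp(alpha)

def crf_nll(emissions, transitions, tags):
    """Negative log-likelihood of one label path under the CRF."""
    return log_partition(emissions, transitions) - sequence_score(emissions, transitions, tags)
```

Minimizing `crf_nll` over the training sentences raises the score of the gold label path relative to all competing paths.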
Step 4: Model learning process
The method proposed in this embodiment is trained separately on four public datasets, and the model converges by minimizing the negative log-likelihood loss described above. Training uses a warmup learning-rate schedule. The BERT parameters use a learning rate of 1e-5, since a small learning rate lets the model converge quickly during fine-tuning; the LSTM parameters use a learning rate of 1e-3, and all other parameters use a learning rate of 1e-4. On all four datasets, the model converged within 20 epochs. Experiments on the MSRA dataset were run on V100 GPUs, and experiments on the other datasets were run on 1080Ti GPUs. Because results were found to differ somewhat between machines, the reported results are averages over multiple runs.
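The per-group learning rates can be sketched as follows. Only the three base rates (1e-5, 1e-3, 1e-4) come from the text. The linear shape of the warmup, the `warmup_steps` value, and the `bert.` / `lstm.` parameter-name prefixes are assumptions made for illustration, since the source only states that a warmup strategy is used.

```python
# Base learning rates stated in the description.
BASE_LR = {"bert": 1e-5, "lstm": 1e-3, "other": 1e-4}


def warmup_factor(step, warmup_steps):
    """Assumed linear warmup: rises to 1 over warmup_steps steps, then stays at 1."""
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, (step + 1) / warmup_steps)


def group_of(param_name):
    """Hypothetical naming convention: the name prefix selects the group."""
    if param_name.startswith("bert."):
        return "bert"
    if param_name.startswith("lstm."):
        return "lstm"
    return "other"


def learning_rate(param_name, step, warmup_steps=1000):
    """Learning rate of one parameter at a given optimizer step."""
    return BASE_LR[group_of(param_name)] * warmup_factor(step, warmup_steps)
```

For example, under these assumptions `learning_rate("lstm.weight_ih", 499)` is half of the LSTM base rate, and any BERT parameter reaches 1e-5 once warmup has finished.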
This embodiment is evaluated on four public Chinese named entity recognition datasets and compared with other models; the effect of the invention is shown by the data in the tables.
Datasets:
The Weibo dataset is a public social media dataset collected from the Sina Weibo website and contains four entity types: location, person, organization, and geopolitical entity. The Resume dataset also comes from Sina and consists of finance-related resume data. The MSRA and OntoNotes 4 datasets come from the public news domain and provide gold-standard labels for the training data. The statistics of these datasets are shown in Table 1.
Table 1. Statistics of the datasets
Evaluation metrics:
The F1 score, a metric commonly used for classification models, is used here to compare the recognition accuracy of other models with that of the present invention. The quantities needed to compute F1 are introduced first: TP (true positives) is the number of samples predicted positive that are actually positive; TN (true negatives) is the number of samples predicted negative that are actually negative; FP (false positives) is the number of samples predicted positive that are actually negative; FN (false negatives) is the number of samples predicted negative that are actually positive. Precision is the proportion of correctly predicted positive samples among all samples predicted positive:
$$P = \frac{TP}{TP + FP}$$
Recall is the proportion of correctly predicted positive samples among all actual positive samples:
$$R = \frac{TP}{TP + FN}$$
The F1 score is the harmonic mean of precision and recall and is often used as the final metric for multi-class classification problems. For a single category, the F1 score is computed as:
$$F1 = \frac{2 \cdot P \cdot R}{P + R}$$
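As a worked illustration of these metrics, the sketch below computes precision, recall, and F1 from TP, FP, and FN counts. The entity-level counting (an entity counts as correct only if its span and type both match) is an assumption that reflects common NER practice; the source does not state the matching criterion. The spans in the example are invented.

```python
def entity_counts(gold, predicted):
    """Count TP, FP, FN between two sets of (start, end, type) entity spans."""
    tp = len(gold & predicted)
    fp = len(predicted - gold)
    fn = len(gold - predicted)
    return tp, fp, fn


def precision_recall_f1(tp, fp, fn):
    """Precision, recall, and their harmonic mean; 0.0 when a ratio is undefined."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    return precision, recall, 2 * precision * recall / (precision + recall)


# Invented example: 3 gold entities, 4 predicted entities, 2 exact matches.
gold = {(0, 2, "PER"), (5, 7, "ORG"), (9, 10, "LOC")}
predicted = {(0, 2, "PER"), (5, 7, "LOC"), (9, 10, "LOC"), (12, 13, "PER")}

tp, fp, fn = entity_counts(gold, predicted)
precision, recall, f1 = precision_recall_f1(tp, fp, fn)
```

Here TP = 2, FP = 2, and FN = 1, so precision is 1/2, recall is 2/3, and F1 is 4/7 (about 0.571).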
Description of the effects:
The F1 scores on the four datasets are shown in the table below. This embodiment is compared not only with methods that use BERT pre-trained character vectors but also with other classical state-of-the-art (SOTA) vocabulary enhancement methods. Using the same pre-trained character vectors and word vectors, the proposed method achieves F1 scores of 71.88%, 96.22%, 95.72% and 81.75% on the four public datasets Weibo, Resume, MSRA and OntoNotes 4, respectively. Compared with the classical Chinese NER models Lattice-LSTM, LR-CNN and FLAT, the proposed method improves by 8.46%, 0.69%, 5% and 1.37% on the four datasets, respectively. Compared with NER models that use BERT pre-trained character vectors, such as SoftLexicon-BERT, FLAT-BERT and LEBERT, the proposed method achieves slight improvements of 0.28%, 0.12% and 0.41% on the Resume, MSRA and OntoNotes 4 datasets, but on the Weibo dataset its F1 score is clearly better than that of the SoftLexicon-BERT model, an improvement of 1.64%. This embodiment builds on BERT-BiLSTM; compared with the BERT baseline model without the vocabulary enhancement network, it improves by 4.55%, 0.54%, 1.82% and 0.91% on the four datasets, which shows that improving the feature representation of the character vectors through vocabulary enhancement is an effective way to improve NER performance. Moreover, compared with other vocabulary enhancement methods, the feature fusion framework (MELSN) proposed in this embodiment fuses richer lexical semantic features, and, through the fine-tuning mechanism of the pre-trained BERT model, the vocabulary-enhanced character feature representation contains more lexical semantics.
Referring to Table 2, the comparative experimental results show that this embodiment makes better use of the character-word lattice structure to achieve vocabulary enhancement, and they also demonstrate the effectiveness of the proposed method on the Chinese named entity recognition task.
Table 2. Comparative experimental results
The above are only preferred embodiments of the present invention. The scope of protection of the present invention is not limited to the above examples, and all technical solutions falling within the spirit of the present invention belong to its scope of protection. It should be noted that those skilled in the art may make modifications and refinements without departing from the principle of the invention, and these also fall within the scope of the invention.
Claims (5)
1. A named entity recognition method based on a mixed-lattice self-attention network, characterized by comprising the following steps:
S1, looking up in a dictionary the words composed of consecutive characters in the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed character-word lattice encoding, to obtain the character-word vector representation of the corresponding mixed-lattice structure;
S2, constructing a corresponding self-attention network based on the mixed-lattice character-word vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector;
S3, fusing the word features at the Embedding layer of BERT, and learning a better character vector representation through the fine-tuning process; implementing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, modeling the fused character features through the network, and thereby constructing the entity recognition model based on the mixed-lattice self-attention network;
S4, training the entity recognition model based on the mixed-lattice self-attention network on a dataset.
2. The named entity recognition method based on a mixed-lattice self-attention network as claimed in claim 1, wherein in step S1, encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed character-word lattice encoding to obtain the character-word vector representation of the corresponding mixed-lattice structure comprises the following steps:
S11, given a sentence $s_c = \{c_1, c_2, \ldots, c_n\}$, loading the pre-trained BERT weights to obtain the character feature vector representation of the sentence $s_c$, where $c_i$ denotes the $i$-th character of $s_c$, $n$ denotes the number of characters in $s_c$, and $e^B$ denotes the lookup table of BERT pre-trained character vectors;
S12, given a Chinese dictionary L, constructing a Trie (dictionary tree) and traversing its nodes to obtain the words matched by each character;
S13, grouping all matched words according to BMES tags; that is, for a character $c_i$, the set $B(c_i)$ consists of the matched words that begin with $c_i$, the set $M(c_i)$ consists of the matched words that contain $c_i$ as an internal character, the set $E(c_i)$ consists of the matched words that end with $c_i$, and the set $S(c_i)$ consists of the single-character word $c_i$; the word set $w_i$ of each character $c_i$ in the sentence $s_c$ is expressed as:
$w_i = \{e^w(B(c_i)),\ e^w(M(c_i)),\ e^w(E(c_i)),\ e^w(S(c_i))\}$;
where $e^w$ denotes the lookup table of pre-trained word vectors;
S14, providing two learnable nonlinear fully connected layers that raise the dimension of $w_i$ to match that of the character vector; during BERT fine-tuning, the weights of these two layers are learned so that the pre-trained word feature vectors are mapped into the semantic feature space of BERT; the processed word feature vector $v_i^w$ is expressed as follows:
where $W_1 \in \mathbb{R}^{d_c \times d_c}$ and $W_2 \in \mathbb{R}^{d_c \times d_w}$ are learnable weight matrices, $b_1$ and $b_2$ are the corresponding biases, $d_c$ denotes the dimension of the BERT character vector, and $d_w$ denotes the dimension of the pre-trained word vector;
S15, taking the word feature vectors $v_i^w$ as the input of the feature fusion model; according to the correspondence between characters and word sets, the feature of each character-word pair is expressed as:
S16, the features of the character-word pairs are expressed as follows:
3. The named entity recognition method based on a mixed-lattice self-attention network as claimed in claim 2, wherein in step S2, constructing a corresponding self-attention network based on the mixed-lattice character-word vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector, comprises the following steps:
S21, designing a mixed-lattice self-attention network to capture the associations between character and word features; the self-attention network takes the mixed character-word encoding vector $V_{ME}$ and the character-word position mask matrix $M$ as the inputs of the enhancement network, and models the global character vectors and word vectors through self-attention so that the model learns the semantic relevance weights between characters and words; the $Q$, $K$, $V$ matrices are calculated as follows:
$[Q, K, V] = [W_q V_{ME},\ W_k V_{ME},\ W_v V_{ME}]$;
where $W_q$, $W_k$, $W_v$ are learnable weight matrices, and $d_e = d_c + d_w$; $Q$, $K$, $V$ are, respectively, the query matrix, the key matrix corresponding to the queries, and the value matrix to be weighted and averaged; $d_e$ denotes the dimension of the mixed-lattice vector, $d_c$ the dimension of the character vector, and $d_w$ the dimension of the word vector;
S22, using the dot-product operation as the formula for calculating the similarity score:
$F_{Att} = \mathrm{Softmax}(S_{Att} + \varepsilon M)\,V$;
where $M$ is the static character-word position mask matrix, $\varepsilon$ is a matrix whose values are infinitesimal, $F_{Att}$ is the output of the self-attention network, $S_{Att}$ denotes the normalized attention score, and $K^T$ denotes the transpose of matrix $K$;
S23, adding the word feature information as a residual to the BERT pre-trained character vectors to obtain the vocabulary-enhanced character feature vectors as follows:
$C' = C + g(F_{Att})$;
4. The named entity recognition method based on a mixed-lattice self-attention network as claimed in claim 3, wherein in step S3, constructing the entity recognition model based on the mixed-lattice self-attention network comprises the following steps:
S31, given a sentence sequence $s_c = \{c_1, c_2, \ldots, c_n\}$ of length $n$, with the vocabulary-enhanced character vectors denoted $C' = \{c'_1, c'_2, \ldots, c'_n\}$, fine-tuning the character vectors $C'$ in the BERT model, the vocabulary-enhanced BERT character embedding vector being expressed as:
$E'_i = C'_i + E_s(i) + E_p(i)$;
where $E_s$ and $E_p$ denote the segment embedding and position embedding lookup tables, respectively, and $i$ denotes the $i$-th character of the character sequence $s_c$ of length $n$;
S32, inputting the obtained $E'$ into BERT, each Transformer block being computed as follows:
$D = \mathrm{LN}(H^{k-1} + \mathrm{MHA}(H^{k-1}))$;
$H^{k} = \mathrm{LN}(\mathrm{FFN}(D) + D)$;
where $H^{k}$ denotes the hidden-state output of the $k$-th layer, and $H^{0} = E'$ denotes the bottom-layer character vectors; LN is the layer normalization function; MHA is the multi-head self-attention module; FFN is a two-layer feed-forward neural network; $D$ denotes the normalized output vector of the multi-head attention module;
S33, obtaining the hidden-state output vectors of the last Transformer layer and inputting them into a bidirectional LSTM network, which captures semantic information of the sentence from left to right and from right to left, respectively; the hidden-state output of the forward LSTM network is denoted $\overrightarrow{h}_i$ and that of the backward LSTM network is denoted $\overleftarrow{h}_i$; the output of the BiLSTM network is the output of the sequence labeling layer, expressed as:
$h_i = [\overrightarrow{h}_i ; \overleftarrow{h}_i]$;
where $h_i$ denotes the concatenated hidden-state output of the $i$-th BiLSTM unit and is used as the character-level contextual semantic representation of $c_i$;
S34, predicting the NER labels with a standard CRF layer: given the hidden-state output vectors of the last network layer, $H = \{h_1, h_2, \ldots, h_n\}$, and a label sequence $y = \{y_1, y_2, \ldots, y_n\}$ for a sentence $s = \{s_1, s_2, \ldots, s_n\}$, the probability of the label sequence is defined as follows:
$p(y \mid s) = \dfrac{\exp\left(\sum_{i=1}^{n}\left(W_{y_i} h_i + b_{(y_{i-1},\, y_i)}\right)\right)}{\sum_{y'} \exp\left(\sum_{i=1}^{n}\left(W_{y'_i} h_i + b_{(y'_{i-1},\, y'_i)}\right)\right)}$;
where $y'$ denotes any one of all label sequences; $W_{y_i}$ denotes the learnable weight parameter of the network corresponding to $y_i$; $b_{(y_{i-1},\, y_i)}$ denotes the bias corresponding to the transition from $y_{i-1}$ to $y_i$; $W_{y'_i}$ and $b_{(y'_{i-1},\, y'_i)}$ denote, respectively, the model weight parameter and the bias under any possible label sequence $y'$;
S35, the negative log-likelihood, taken as the loss function of the model, is expressed as:
$\mathcal{L} = -\log p(y \mid s)$.
5. A named entity recognition device based on a mixed-lattice self-attention network, characterized by comprising a mixed-lattice structure encoding module, a vocabulary enhancement module, a sequence labeling and decoding module, and a model training module;
the mixed-lattice structure encoding module is used for looking up in a dictionary the words composed of consecutive characters in the input sentence, merging them into a single multidimensional vector through alternating position mapping, and encoding the sentence feature vectors represented by character-word pairs into a fixed-dimension matrix using mixed character-word lattice encoding, to obtain the character-word vector representation of the corresponding mixed-lattice structure;
the vocabulary enhancement module is used for constructing a corresponding self-attention network based on the generated mixed-lattice character-word vectors to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector;
the sequence labeling and decoding module is used for fusing the word features at the Embedding layer of BERT and learning a better character vector representation through the fine-tuning process; implementing the entity sequence labeling task and the decoding process of entity recognition with a BiLSTM-CRF network, modeling the fused character features through the network, and thereby constructing the entity recognition model based on the mixed-lattice self-attention network;
the model training module is used for training the entity recognition model based on the mixed-lattice self-attention network on a dataset.
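The dictionary matching and BMES grouping recited in steps S12 and S13 of claim 2 can be sketched as below. This is an illustrative sketch under stated assumptions, not the claimed implementation: the nested-dict trie, the `"$end"` end-of-word marker, the function names, and the toy lexicon are all hypothetical, and the sketch returns the matched words themselves rather than their embedding vectors.

```python
def build_trie(lexicon):
    """Build a nested-dict trie; the multi-character key "$end" marks a word end."""
    root = {}
    for word in lexicon:
        node = root
        for ch in word:
            node = node.setdefault(ch, {})
        node["$end"] = True
    return root


def match_words(sentence, trie):
    """Return (start, end, word) for every lexicon word in the sentence (end inclusive)."""
    matches = []
    for start in range(len(sentence)):
        node = trie
        for end in range(start, len(sentence)):
            node = node.get(sentence[end])
            if node is None:
                break
            if "$end" in node:
                matches.append((start, end, sentence[start:end + 1]))
    return matches


def bmes_sets(sentence, lexicon):
    """For each character, group matched words into B, M, E, and S sets."""
    trie = build_trie(lexicon)
    sets = [{"B": set(), "M": set(), "E": set(), "S": set()} for _ in sentence]
    for start, end, word in match_words(sentence, trie):
        if start == end:
            sets[start]["S"].add(word)  # single-character word
            continue
        sets[start]["B"].add(word)  # word begins with this character
        sets[end]["E"].add(word)  # word ends with this character
        for mid in range(start + 1, end):
            sets[mid]["M"].add(word)  # character is inside the word
    return sets


# Toy lexicon and sentence, chosen only for illustration.
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥", "江"}
sentence = "南京市长江大桥"
sets = bmes_sets(sentence, lexicon)
```

With this toy input, the character 市 (index 2) begins 市长 and ends 南京市, while 江 (index 4) ends 长江, lies inside 长江大桥, and is itself a single-character word.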
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202210172667.4A CN114429132A (en) | 2022-02-24 | 2022-02-24 | Named entity identification method and device based on mixed lattice self-attention network |
Publications (1)
Publication Number | Publication Date |
---|---|
CN114429132A true CN114429132A (en) | 2022-05-03 |
Family
ID=81312807
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202210172667.4A Pending CN114429132A (en) | 2022-02-24 | 2022-02-24 | Named entity identification method and device based on mixed lattice self-attention network |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN114429132A (en) |
Cited By (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114818721A (en) * | 2022-06-30 | 2022-07-29 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
CN115545035A (en) * | 2022-11-29 | 2022-12-30 | 城云科技(中国)有限公司 | Text entity recognition model and construction method, device and application thereof |
CN115935994A (en) * | 2022-12-12 | 2023-04-07 | 重庆邮电大学 | Method for intelligently identifying electric trademark |
CN117992924A (en) * | 2024-04-02 | 2024-05-07 | 云南师范大学 | HyperMixer-based knowledge tracking method |
Cited By (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN114818721A (en) * | 2022-06-30 | 2022-07-29 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
CN114818721B (en) * | 2022-06-30 | 2022-11-01 | 湖南工商大学 | Event joint extraction model and method combined with sequence labeling |
CN115545035A (en) * | 2022-11-29 | 2022-12-30 | 城云科技(中国)有限公司 | Text entity recognition model and construction method, device and application thereof |
CN115545035B (en) * | 2022-11-29 | 2023-02-17 | 城云科技(中国)有限公司 | Text entity recognition model and construction method, device and application thereof |
CN115935994A (en) * | 2022-12-12 | 2023-04-07 | 重庆邮电大学 | Method for intelligently identifying electric trademark |
CN115935994B (en) * | 2022-12-12 | 2024-03-08 | 芽米科技(广州)有限公司 | Method for intelligently identifying current label questions |
CN117992924A (en) * | 2024-04-02 | 2024-05-07 | 云南师范大学 | HyperMixer-based knowledge tracking method |
CN117992924B (en) * | 2024-04-02 | 2024-06-07 | 云南师范大学 | HyperMixer-based knowledge tracking method |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992782B (en) | Legal document named entity identification method and device and computer equipment | |
CN114169330B (en) | Chinese named entity recognition method integrating time sequence convolution and transform encoder | |
CN110196980B (en) | Domain migration on Chinese word segmentation task based on convolutional network | |
CN111737496A (en) | Power equipment fault knowledge map construction method | |
CN114429132A (en) | Named entity identification method and device based on mixed lattice self-attention network | |
CN113268995B (en) | Chinese academy keyword extraction method, device and storage medium | |
CN111767718B (en) | Chinese grammar error correction method based on weakened grammar error feature representation | |
CN114757182A (en) | BERT short text sentiment analysis method for improving training mode | |
CN113392209B (en) | Text clustering method based on artificial intelligence, related equipment and storage medium | |
CN113190656A (en) | Chinese named entity extraction method based on multi-label framework and fusion features | |
CN111309918A (en) | Multi-label text classification method based on label relevance | |
CN111368542A (en) | Text language association extraction method and system based on recurrent neural network | |
Deng et al. | Self-attention-based BiGRU and capsule network for named entity recognition | |
CN114818717A (en) | Chinese named entity recognition method and system fusing vocabulary and syntax information | |
Wu et al. | An effective approach of named entity recognition for cyber threat intelligence | |
CN114332519A (en) | Image description generation method based on external triple and abstract relation | |
CN112905736A (en) | Unsupervised text emotion analysis method based on quantum theory | |
CN114757184B (en) | Method and system for realizing knowledge question and answer in aviation field | |
CN111680529A (en) | Machine translation algorithm and device based on layer aggregation | |
CN115545033A (en) | Chinese field text named entity recognition method fusing vocabulary category representation | |
CN115169349A (en) | Chinese electronic resume named entity recognition method based on ALBERT | |
CN115017879A (en) | Text comparison method, computer device and computer storage medium | |
US11822887B2 (en) | Robust name matching with regularized embeddings | |
CN111581365B (en) | Predicate extraction method | |
CN116680407A (en) | Knowledge graph construction method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||