CN114429132A - Named entity identification method and device based on mixed lattice self-attention network - Google Patents

Named entity identification method and device based on mixed lattice self-attention network

Info

Publication number
CN114429132A
CN114429132A (application CN202210172667.4A)
Authority
CN
China
Prior art keywords
word
vector
network
self
mixed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210172667.4A
Other languages
Chinese (zh)
Inventor
王立松
何宗锋
刘绍翰
刘亮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Nanjing University of Aeronautics and Astronautics
Original Assignee
Nanjing University of Aeronautics and Astronautics
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Nanjing University of Aeronautics and Astronautics filed Critical Nanjing University of Aeronautics and Astronautics
Priority to CN202210172667.4A priority Critical patent/CN114429132A/en
Publication of CN114429132A publication Critical patent/CN114429132A/en
Pending legal-status Critical Current

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G06F 40/289: Phrasal analysis, e.g. finite state techniques or chunking
    • G06F 40/295: Named entity recognition
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/237: Lexical tools
    • G06F 40/242: Dictionaries
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 40/00: Handling natural language data
    • G06F 40/20: Natural language analysis
    • G06F 40/279: Recognition of textual entities
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 3/00: Computing arrangements based on biological models
    • G06N 3/02: Neural networks
    • G06N 3/04: Architecture, e.g. interconnection topology
    • G06N 3/044: Recurrent networks, e.g. Hopfield networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Health & Medical Sciences (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Computational Linguistics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Biomedical Technology (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • Evolutionary Computation (AREA)
  • Data Mining & Analysis (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Biophysics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a named entity recognition method based on a mixed-lattice self-attention network, comprising the following steps: S1, encoding the sentence feature vectors represented as character-word pairs into a matrix of fixed dimensionality to obtain a vector representation with a mixed-lattice structure; constructing a self-attention network to capture the influence of the word vectors in this representation on the character vectors and thereby enhance the feature representation of each character vector; fusing the word features in the Embedding layer of BERT and obtaining better character-vector representations through fine-tuning; and performing the entity sequence-labeling task and the decoding step of entity recognition with a BiLSTM-CRF network, which models the fused character features, thus completing an entity recognition model based on a mixed-lattice self-attention network. The invention can capture global lexical information, generate semantically rich character-vector representations, and improve the recognition accuracy of Chinese named entities on multiple data sets.

Description

Named entity identification method and device based on mixed lattice self-attention network
Technical Field
The invention relates to the technical field of natural language processing in artificial intelligence, in particular to a named entity identification method and device based on a mixed lattice self-attention network.
Background
Named Entity Recognition (NER), also called entity extraction, was first proposed at the MUC-6 conference and is the information-extraction technique of extracting entities from text. Early entity recognition relied on rule-based and statistical methods; because these traditional methods depend heavily on manual design, their coverage and accuracy are low, and they have been superseded by deep learning methods. In deep learning approaches, entity recognition models are divided into character-based models and word-based models. Languages such as English generally adopt word-based models, since each word has a definite meaning, whereas in Chinese the meaning of a single character is ambiguous while the meaning of a word is concrete, so Chinese NER methods typically work at the character level and benefit from incorporating word information. To better represent each character vector in Chinese, researchers later proposed methods based on representation learning, a learning paradigm that converts human language into features a machine can process and improves the accuracy of semantic expression in machine learning.
In named entity recognition, external lexical information can effectively improve recognition accuracy, but such methods depend on the performance of the fusion algorithm. For example, patent CN113836930A proposes a Chinese named entity recognition method for hazardous chemicals based on a BiLSTM-CRF model: the pre-trained language model BERT is used to obtain character-level encodings of text in the hazardous-chemicals domain and context-dependent character vectors, and an attention mechanism is then introduced to strengthen the model's ability to mine global and local text features. Patent CN113128232A provides a named entity recognition method based on ALBERT and multiple-word-information embedding, which can effectively represent the ambiguity of characters and improve the efficiency of entity recognition. Patent CN111310470A discloses a Chinese named entity recognition method that fuses word features; the model's understanding of the text is enhanced by the result data obtained after comprehensive analysis, improving the F1 value on the model's recognition task.
Although existing methods have achieved good results in fusing word feature vectors, they still have the following problems: 1) word-feature fusion methods do not account for the differences in semantic expression between character and word vectors trained by different models and simply concatenate them directly, so the word-level features cannot effectively enhance the character vectors; 2) lexicon-enhancement methods based on learned word weights only consider the influence of the words matched to each character on that character's semantic representation and ignore the effect of global lexical information.
Disclosure of Invention
In view of the deficiencies of the prior art, the invention provides a named entity recognition method and device based on a mixed-lattice self-attention network. Based on the idea of representation learning, the proposed model fuses lexical information to enhance the feature representation of character vectors, so that the generated character vectors contain more entity-boundary information, which improves the accuracy of the NER task.
In order to achieve the purpose, the invention adopts the following technical scheme:
A named entity recognition method based on a mixed-lattice self-attention network comprises the following steps:
S1, searching a dictionary for the words formed by consecutive characters of the input sentence, merging them into a single multidimensional vector by alternating-position mapping, and encoding the sentence feature vectors represented as character-word pairs into a matrix of fixed dimensionality by mixed-lattice encoding, to obtain the corresponding mixed-lattice vector representation;
S2, constructing a self-attention network over the mixed-lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector;
S3, fusing the word features in the Embedding layer of BERT and learning better character-vector representations through fine-tuning; performing the entity sequence-labeling task and the decoding step of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby completing the entity recognition model based on the mixed-lattice self-attention network;
S4, training the entity recognition model based on the mixed-lattice self-attention network on a data set.
In order to optimize the technical scheme, the specific measures adopted further comprise:
further, in step S1, the process of encoding the sentence feature vectors represented by the word pairs into a matrix with a fixed dimension by using a mixed-word lattice encoding method to obtain a word vector representation with a corresponding mixed-lattice structure includes the following steps:
s11, a sentence S is givenc={c1,c2,…,cnGet sentence s by loading the pretrained BERT weightscWord feature vector representation of
Figure BDA0003517951980000021
Wherein
Figure BDA0003517951980000022
ciDenotes scOf (a), n represents the word length of s, eBA lookup table representing a BERT pre-training word vector;
s12, giving a Chinese dictionary L, constructing a Trie dictionary tree, traversing the nodes of the tree, and obtaining the vocabulary matched with each word;
s13, all matched words are grouped according to BMES marks, namely for the character ciWord set B (c)i) Consisting of matching words starting with it, set M (c)i) From ciSet E (c) for matching word composition of its internal charactersi) By ciMatching word composition at the end, set S (c)i) From ciThe single character word composition of (1); sentence scIn each word ciWord set wiExpressed as:
wi={ew(B(ci)),ew(M(ci)),ew(E(ci)),ew(S(ci))};
wherein ewA lookup table of word vectors representing pre-training;
s14, setting two learnable nonlinear full-connection layers to connect wiIs raised to a sum word vector
Figure BDA0003517951980000023
When BERT is in fine adjustment, learning the weights of the two layers so as to map the pre-trained word feature vector to the semantic feature space of the BERT; the processed word feature vector is represented as follows:
Figure BDA0003517951980000024
wherein W1∈(dc×dc),W2∈(dc×dw) Is a learnable weight matrix, b1And b2Is a corresponding offset, dcDimension representing the vector of the BERT word, dwA dimension representing a pre-training word vector;
s15, converting the word feature vector
Figure BDA0003517951980000025
As the input of the feature fusion model, according to the corresponding relation between the words and the word sets, the feature of each word-word pair is expressed as:
Figure BDA0003517951980000026
s16, characterizing the word-word pairs as follows:
Figure BDA0003517951980000027
wherein
Figure BDA0003517951980000028
Representing a vector concatenator.
Further, in step S2, constructing the self-attention network over the mixed-lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector, comprises the following steps:
S21, design a mixed-lattice self-attention network to capture the associations between character and word features; the self-attention network takes the mixed-lattice encoding vector $V_{ME}$ and the word-position mask matrix $M$ as inputs of the enhancement network and models the global word and character vectors, so that the model learns the semantic correlation weights between characters and words; the $Q$, $K$, $V$ matrices are computed as
$[Q,K,V]=[W_q V_{ME},\,W_k V_{ME},\,W_v V_{ME}]$,
where $W_q,W_k,W_v\in\mathbb{R}^{d_e\times d_e}$ are learnable weight matrices and $d_e=d_c+d_w$; $Q$, $K$ and $V$ are the query matrix, the key matrix corresponding to the queries, and the value matrix to be weighted and averaged; $d_e$ is the dimension of the mixed-lattice vector, $d_c$ the dimension of the character vectors, and $d_w$ the dimension of the word vectors;
S22, use the scaled dot product as the similarity score:
$S_{Att}=\dfrac{QK^{T}}{\sqrt{d_e}}$,
$F_{Att}=\mathrm{Softmax}(S_{Att}+\varepsilon M)\,V$,
where $M$ is the static word-position mask matrix, $\varepsilon$ is a matrix with infinitesimally small (very large negative) values, $F_{Att}$ is the output of the self-attention network, $S_{Att}$ is the normalized attention score, and $K^{T}$ is the transpose of $K$;
S23, add the word feature information as a residual to the BERT pre-trained character vectors; the lexicon-enhanced character feature vectors are
$C'=C+g(F_{Att})$,
where $C\in\mathbb{R}^{n\times d_c}$ denotes the BERT pre-trained character vector features and the function $g(\cdot)$ removes the word-vector positions from the self-attention output so that $C$ and $F_{Att}$ have consistent dimensions, giving the lexicon-enhanced character embedding vector $C'$.
Further, in step S3, the process of constructing the entity recognition model based on the mixed-lattice self-attention network comprises the following steps:
S31, given a sentence sequence $s_c=\{c_1,c_2,\dots,c_n\}$ of length $n$, denote the lexicon-enhanced character vectors as $C'=\{c'_1,c'_2,\dots,c'_n\}$ and fine-tune the character vectors $C'$ in the BERT model; the lexicon-enhanced BERT embedding vector is expressed as
$E'_i=C'_i+E_s(i)+E_p(i)$,
where $E_s$ and $E_p$ are the segment-vector and position-vector lookup tables, respectively, and $i$ denotes the $i$-th character of the character sequence $s_c$ of length $n$;
S32, input the resulting $E'$ into BERT; each Transformer block is computed as
$D=\mathrm{LN}\!\left(H^{k-1}+\mathrm{MHA}(H^{k-1})\right)$,
$H^{k}=\mathrm{LN}\!\left(\mathrm{FFN}(D)+D\right)$,
where $H^{k}$ is the hidden-state output of the $k$-th layer, $H^{0}=E'$ is the input embedding, LN is layer normalization, MHA is the multi-head self-attention module, FFN is a two-layer feed-forward network, and $D$ is the normalized output of the multi-head attention module;
S33, obtain the hidden-state output $H^{L}=\{h_1^{L},\dots,h_n^{L}\}$ of the last Transformer layer and feed it into a bidirectional LSTM network, which captures the left-to-right and right-to-left semantic information of the sentence; the hidden-state outputs of the forward and backward LSTM networks are $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$, respectively, and the output of the Bi-LSTM network, which is the output of the sequence-labeling layer, is
$h_i=[\overrightarrow{h_i};\overleftarrow{h_i}]$,
where $h_i$ is the concatenated hidden-state output of the $i$-th Bi-LSTM unit and represents the character-level contextual semantics of $c_i$;
S34, predict the NER labels with a standard CRF layer; given the hidden-state output $H=\{h_1,h_2,\dots,h_n\}$ of the last network layer, let $y=\{y_1,y_2,\dots,y_n\}$ denote a label sequence; for a sentence $s=\{s_1,s_2,\dots,s_n\}$ the probability of its label sequence is defined as
$P(y\mid s)=\dfrac{\exp\!\Big(\sum_i\big(W^{y_i}_{\mathrm{CRF}}h_i+b^{(y_{i-1},y_i)}_{\mathrm{CRF}}\big)\Big)}{\sum_{y'}\exp\!\Big(\sum_i\big(W^{y'_i}_{\mathrm{CRF}}h_i+b^{(y'_{i-1},y'_i)}_{\mathrm{CRF}}\big)\Big)}$,
where $y'$ ranges over all possible label sequences, $W^{y_i}_{\mathrm{CRF}}$ is the learnable weight parameter corresponding to $y_i$, $b^{(y_{i-1},y_i)}_{\mathrm{CRF}}$ is the transition bias between $y_{i-1}$ and $y_i$, and likewise $W^{y'_i}_{\mathrm{CRF}}$ and $b^{(y'_{i-1},y'_i)}_{\mathrm{CRF}}$ are the model weight parameters and biases under any possible label sequence $y'$;
S35, the negative log-likelihood loss used as the loss function of the model is expressed as
$L=-\sum\log\big(P(y\mid s)\big)$.
based on the named entity recognition method, the invention provides a named entity recognition device based on a mixed lattice self-attention network, wherein the named entity recognition device comprises a mixed lattice structure coding module, a vocabulary enhancement module, a sequence marking and decoding module and a model training module;
the mixed lattice structure coding module is used for searching words consisting of continuous words in input sentences in a dictionary, alternately mapping and merging the words into a single multidimensional vector through positions, and coding sentence characteristic vectors represented by word pairs into a matrix with fixed dimensionality by adopting a mixed word lattice coding mode to obtain word vector representation of a corresponding mixed lattice structure;
the vocabulary enhancement module is used for constructing a corresponding self-attention network based on the generated word vectors of the mixed lattice structure so as to capture the influence of the word vectors in the vectors on the word vectors, thereby enhancing the feature representation of each word vector;
the sequence labeling and decoding module is used for fusing word features at an Embedding layer of BERT and learning to obtain better word vector representation through a fine tuning learning process; according to a BilSTM-CRF network, realizing an entity sequence labeling task and a decoding process in entity identification, completing modeling of character features after fusion through the network, and constructing and completing an entity identification model based on a mixed lattice self-attention network;
the model training module is used for training the entity recognition model based on the hybrid grid self-attention network on the data set.
The invention has the beneficial effects that:
the named entity recognition method and device based on the mixed grid self-attention network capture global vocabulary information by constructing the fusion network, generate word vector representation with rich semantics, and improve the recognition precision of the Chinese named entity on a plurality of data sets. Compared with the BERT reference model without using the vocabulary enhancement network, the invention has the performance improvement of 4.55%, 0.54%, 1.82% and 0.91% on the four data sets respectively, which shows that the improvement of the feature representation of the word vector by the vocabulary enhancement technology is a method for effectively improving the performance of the NER. Meanwhile, compared with other vocabulary enhancement methods, the feature fusion framework (MELSN) provided by the invention can effectively fuse richer vocabulary semantic features, and word feature representation after vocabulary enhancement contains more vocabulary semantics by means of a fine adjustment mechanism of the pre-training model BERT. The comparison experiment result shows that the invention can better utilize the character-word lattice structure to realize word enhancement, and also shows the high efficiency of the method provided by the invention on the recognition task of the Chinese named entity.
Drawings
Fig. 1 is a schematic structural diagram of a named entity recognition apparatus based on a mixed-grid self-attention network according to the present invention.
Detailed Description
The present invention will now be described in further detail with reference to the accompanying drawings.
It should be noted that terms such as "upper", "lower", "left", "right", "front" and "back" used herein are only for clarity of description and are not intended to limit the scope of the invention; changes or adjustments of their relative relationships, without substantive changes to the technical content, are also regarded as within the scope of the invention.
The invention provides a named entity recognition method based on a mixed-lattice self-attention network, comprising the following steps:
S1, search a dictionary for the words formed by consecutive characters of the input sentence, merge them into a single multidimensional vector by alternating-position mapping, and encode the sentence feature vectors represented as character-word pairs into a matrix of fixed dimensionality by mixed-lattice encoding, obtaining the corresponding mixed-lattice vector representation.
S2, construct a self-attention network over the mixed-lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector.
S3, fuse the word features in the Embedding layer of BERT and learn better character-vector representations through fine-tuning; perform the entity sequence-labeling task and the decoding step of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby completing the entity recognition model based on the mixed-lattice self-attention network.
S4, train the entity recognition model based on the mixed-lattice self-attention network on a data set.
Based on the foregoing method, this embodiment further provides a named entity recognition device based on a mixed-lattice self-attention network. Fig. 1 is a schematic structural diagram of the device. The overall framework is divided into three parts: a Mixed-lattice Encoding Module, a Lexicon Enhancement Module, and a Sequence-labeling and Decoding Module. The first module completes the encoding of the character-word pair vectors: all words are searched in the dictionary Trie, the pre-trained character and word vectors are loaded, the character-word pairs are encoded into mixed-lattice embeddings, and these embeddings, together with the word mask vectors generated at this stage, are passed to the next module. The second module performs lexicon enhancement through the proposed self-attention model, and the enhanced character-vector representations are passed into the BERT model for fine-tuning. The last module performs modeling on the enhanced and fine-tuned character vectors and completes label prediction and decoding for each character.
The invention provides a named entity recognition method based on lexicon enhancement. The model is an improvement on the BERT-BiLSTM-CRF network: a lexicon-enhancement module based on an attention network is added at the Embedding layer of BERT. For the encoding of character and word vectors, an interleaved character-word encoding scheme is designed; after lexicon enhancement is completed through the attention network, the fused features are normalized and then added to the original BERT character vectors as a residual. The specific implementation of the invention is as follows:
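As a bridge to the step-by-step description below, the following Python/PyTorch skeleton shows one way the stages could be composed; it is a sketch only, all module and variable names are hypothetical, and it assumes a Hugging Face style BERT encoder (with a `config.hidden_size` attribute and an `inputs_embeds` argument) plus the lexicon-attention module sketched in Step 2.

```python
import torch.nn as nn

class MixedLatticeNER(nn.Module):
    """Skeleton of the overall model: lexicon enhancement at the embedding layer,
    BERT fine-tuning, a BiLSTM sequence-labeling layer, then per-tag emission
    scores consumed by a CRF. The sub-modules are sketched in Steps 1-4 below."""

    def __init__(self, bert, lexicon_attention, lstm_hidden, num_tags):
        super().__init__()
        self.bert = bert                            # a pre-trained BERT encoder
        self.lexicon_attention = lexicon_attention  # mixed-lattice self-attention (Step 2)
        self.bilstm = nn.LSTM(bert.config.hidden_size, lstm_hidden,
                              batch_first=True, bidirectional=True)
        self.emissions = nn.Linear(2 * lstm_hidden, num_tags)  # scores for the CRF layer

    def forward(self, char_embeds, mixed_lattice_embeds, word_mask, attention_mask):
        # Step 2: residual lexicon enhancement, C' = C + g(F_Att)
        enhanced = char_embeds + self.lexicon_attention(mixed_lattice_embeds, word_mask)
        # Step 3: BERT consumes the enhanced embeddings in place of its token lookup
        hidden = self.bert(inputs_embeds=enhanced,
                           attention_mask=attention_mask).last_hidden_state
        # Sequence-labeling layer (BiLSTM) followed by per-tag emission scores
        hidden, _ = self.bilstm(hidden)
        return self.emissions(hidden)
```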
step 1: word vector construction of mixed lattice structure
The Mixed-word Lattice (Mixed-Lattice) coding has the effect of coding sentence feature vectors represented by word pairs into a matrix with a fixed dimension. Firstly, words composed of continuous words in an input sentence are searched in a dictionary, and then the words are combined into a single multidimensional vector through position alternate mapping.
Specifically, given a sentence $s_c=\{c_1,c_2,\dots,c_n\}$, the character feature representation of $s_c$,
$C=\{e_B(c_1),e_B(c_2),\dots,e_B(c_n)\}$,
is obtained directly by loading the pre-trained BERT weights, where $c_i$ denotes the $i$-th character of $s_c$, $n$ is the character length of $s_c$, and $e_B$ is the lookup table of BERT pre-trained character vectors. Given a Chinese dictionary L, a Trie is first constructed and its nodes are traversed to obtain the words matched to each character. All matched words are grouped according to the BMES scheme: for character $c_i$, the set $B(c_i)$ consists of the matched words that begin with $c_i$; similarly, $M(c_i)$ consists of the matched words containing $c_i$ as an internal character, $E(c_i)$ of the matched words ending with $c_i$, and $S(c_i)$ of the single-character word $c_i$. The word set $w_i$ of each character $c_i$ in sentence $s_c$ can thus be expressed as
$w_i=\{e_w(B(c_i)),e_w(M(c_i)),e_w(E(c_i)),e_w(S(c_i))\}$,
where $e_w$ is the lookup table of pre-trained word vectors. To keep the vector dimensions consistent, two learnable non-linear fully connected layers are set up to lift $w_i$ to the dimension of the character vectors, and their weights are learned during BERT fine-tuning so that the pre-trained word feature vectors can be mapped into BERT's semantic feature space. The processed word feature vector is
$v_i^w=W_1\,\sigma\!\left(W_2\,w_i+b_2\right)+b_1$,
where $\sigma$ is a non-linear activation, $W_1\in\mathbb{R}^{d_c\times d_c}$ and $W_2\in\mathbb{R}^{d_c\times d_w}$ are learnable weight matrices, $b_1$ and $b_2$ are the corresponding biases, $d_c$ is the dimension of the BERT character vectors, and $d_w$ is the dimension of the pre-trained word vectors. In our feature fusion method, the transformed word feature vectors $v_i^w$ serve as the input of the feature fusion model. According to the correspondence between characters and word sets, the feature of each character-word pair can be expressed as
$I_i=\{e_B(c_i),\,v_i^w\}$.
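A short sketch of the BMES grouping described above; it is illustrative only, a windowed scan over a plain Python set stands in for the Trie traversal, and the mini-lexicon and `max_word_len` are hypothetical. In practice each resulting set would then be mapped to vectors through the pre-trained word-embedding lookup $e_w$.

```python
def bmes_word_sets(sentence, lexicon, max_word_len=4):
    """For each character c_i, collect the matched words that Begin at, pass through
    the Middle of, End at, or Single-ly cover position i (the BMES grouping above)."""
    n = len(sentence)
    sets = [{"B": [], "M": [], "E": [], "S": []} for _ in range(n)]
    for start in range(n):
        for end in range(start + 1, min(n, start + max_word_len) + 1):
            word = sentence[start:end]
            if word not in lexicon:
                continue
            if end - start == 1:
                sets[start]["S"].append(word)
            else:
                sets[start]["B"].append(word)
                for mid in range(start + 1, end - 1):
                    sets[mid]["M"].append(word)
                sets[end - 1]["E"].append(word)
    return sets

# Hypothetical mini-lexicon for illustration:
lexicon = {"南京", "南京市", "市长", "长江", "长江大桥", "大桥"}
print(bmes_word_sets("南京市长江大桥", lexicon)[0]["B"])  # ['南京', '南京市']
```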
at present, a plurality of NER methods based on vocabulary fusion directly use character-word pair characteristics IiAs the input of the vocabulary enhancement network, the method can only fuse partial vocabulary information, and in the experiment, we find that in the sentence scIn order to enable the model to capture global word-level features, the embodiment proposes a new word-word pairThe encoding mode, i.e. Mixed-word encoding (Mixed-Lattice encoding), features of word-word pairs are expressed as follows:
Figure BDA0003517951980000067
wherein
Figure BDA0003517951980000068
Representing a vector concatenator, the word vectors are cross-encoded into a fixed-dimension feature vector based on the encoding method, in which the closer the word vectors are arranged, the higher the association with the word, which also corresponds to reality. In the next phase, the embodiment constructs a fusion network based on the Attention mechanism to represent V for the word-word pair featureMEAnd modeling.
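A minimal sketch of the two pieces just described: the two-layer non-linear projection of the word-set vector into the character feature space, and the alternating-position ("mixed-lattice") arrangement with its word-position vector. The choice of Tanh as the activation and the assumption that the projected word vectors share the character dimension are placeholders, not specified by the original text.

```python
import torch
import torch.nn as nn

class WordToCharSpace(nn.Module):
    """Two learnable non-linear fully connected layers that lift a word-set
    vector (dimension d_w) into BERT's character feature space (dimension d_c)."""
    def __init__(self, d_w, d_c):
        super().__init__()
        self.proj = nn.Sequential(nn.Linear(d_w, d_c), nn.Tanh(),
                                  nn.Linear(d_c, d_c), nn.Tanh())

    def forward(self, w):              # w: (batch, n, d_w)
        return self.proj(w)            # (batch, n, d_c)

def mixed_lattice_encode(char_vecs, word_vecs):
    """Interleave character and (projected) word vectors position by position,
    [e_B(c_1), v_1^w, e_B(c_2), v_2^w, ...], and record which slots hold words."""
    batch, n, d = char_vecs.shape
    v_me = torch.stack([char_vecs, word_vecs], dim=2).reshape(batch, 2 * n, d)
    word_positions = torch.zeros(batch, 2 * n, dtype=torch.bool, device=char_vecs.device)
    word_positions[:, 1::2] = True     # odd slots hold the word vectors
    return v_me, word_positions
```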
Step 2: Computation of the word-feature fusion network
The function of the lexicon-enhancement network is to model the character-word feature representation $V_{ME}$ and to use the word features to enhance the feature representation of the character vectors in the sequence. Following the attention-mechanism design of Vaswani et al., a mixed-lattice self-attention network is designed to capture the associations between character and word features. The previous stage produced the mixed-lattice representation $V_{ME}$ of the encoded character-word pairs; the lattice structure organizes the character and word vectors into one multidimensional vector, with character and word features interleaved in the embedding. The self-attention network takes $V_{ME}$ and the word-position mask matrix $M$ as the inputs of the enhancement network; through the modeling of the self-attention network, the model can learn the semantic correlation weights between characters and words. Given the mixed-lattice encoding vector $V_{ME}$ of a sentence and the matrix $M$, the $Q$, $K$, $V$ matrices are computed as
$[Q,K,V]=[W_q V_{ME},\,W_k V_{ME},\,W_v V_{ME}]$,
where $W_q,W_k,W_v\in\mathbb{R}^{d_e\times d_e}$ are learnable weight matrices and $d_e=d_c+d_w$. The scaled dot product is then taken as the similarity score:
$S_{Att}=\dfrac{QK^{T}}{\sqrt{d_e}}$,
$F_{Att}=\mathrm{Softmax}(S_{Att}+\varepsilon M)\,V$,
where $M$ is the static word-position mask matrix, $\varepsilon$ is a matrix with infinitesimally small (very large negative) values, and $F_{Att}$ is the output of the self-attention network. The attention scores at the word positions are masked by the element-wise product of $M$ with the infinitesimal matrix; after the Softmax, the probabilities at the word positions become zero, while each character position carries the weight score of each word. The fusion of word features is thus completed through the word-position masking inside the self-attention network.
Through the above process the word features can be fused: the word feature information is added to the BERT pre-trained character vectors as a residual, and the lexicon-enhanced character feature vectors are finally obtained as
$C'=C+g(F_{Att})$,
where $C\in\mathbb{R}^{n\times d_c}$ denotes the BERT pre-trained character vector features and the function $g(\cdot)$ removes the word-vector positions from the self-attention output so that $C$ and $F_{Att}$ have consistent dimensions, yielding the lexicon-enhanced character embedding vector $C'$.
In summary, the character-word pair vectors are re-encoded and a dedicated self-attention network models the global word and character vectors, producing the fused character feature vectors and achieving the fusion of global word feature information.
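A minimal sketch of this masked self-attention and the residual fusion. It assumes, for simplicity, that every interleaved position already has the BERT dimension $d_c$ (the $d_e=d_c+d_w$ bookkeeping is not reproduced), approximates $\varepsilon M$ with a large negative additive bias, and leaves the exact construction of the mask matrix to the caller (for example, derived from the word-position vector of `mixed_lattice_encode` according to the masking scheme chosen).

```python
import math
import torch
import torch.nn as nn

class MixedLatticeSelfAttention(nn.Module):
    """Sketch of the lexicon-enhancement attention. `v_me` is the interleaved
    character/word sequence of length 2n; `mask` is a boolean (batch, 2n, 2n)
    matrix marking the attention entries to suppress, standing in for eps*M."""

    def __init__(self, d_model):
        super().__init__()
        self.wq = nn.Linear(d_model, d_model, bias=False)
        self.wk = nn.Linear(d_model, d_model, bias=False)
        self.wv = nn.Linear(d_model, d_model, bias=False)
        self.scale = math.sqrt(d_model)

    def forward(self, v_me, mask):
        q, k, v = self.wq(v_me), self.wk(v_me), self.wv(v_me)
        scores = q @ k.transpose(-2, -1) / self.scale    # S_Att, shape (batch, 2n, 2n)
        scores = scores.masked_fill(mask, -1e9)          # eps*M as a large negative bias
        fused = torch.softmax(scores, dim=-1) @ v        # F_Att
        return fused[:, 0::2, :]                         # g(.): keep the character slots only

# Residual fusion as in C' = C + g(F_Att):
# enhanced = bert_char_embeds + attn(v_me, mask)
```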
Step 3: Constructing the named entity recognition model
1) Calculation procedure of BERT structure
This embodiment is an improvement on the BERT-BiLSTM network: feature enhancement is performed at the embedding layer of BERT, and a mechanism for fusing word features into the embedding layer is provided. Step 2 produced the lexicon-enhanced character vector representation $C'$. Given a sentence sequence $s_c=\{c_1,c_2,\dots,c_n\}$ of length $n$, the character vectors $C'$ are fine-tuned in the BERT model, and the lexicon-enhanced BERT embedding vector can be expressed as
$E'_i=C'_i+E_s(i)+E_p(i)$,
where $E_s$ and $E_p$ are the segment-vector and position-vector lookup tables, respectively. The resulting $E'$ is then input into BERT, and each Transformer block is computed as
$D=\mathrm{LN}\!\left(H^{k-1}+\mathrm{MHA}(H^{k-1})\right)$,
$H^{k}=\mathrm{LN}\!\left(\mathrm{FFN}(D)+D\right)$,
where $H^{k}$ is the hidden-state output of the $k$-th layer ($H^{0}=E'$ is the input embedding), LN is layer normalization, MHA is the multi-head self-attention module, and FFN is a two-layer feed-forward network. Finally, the hidden-state output $H^{L}=\{h_1^{L},\dots,h_n^{L}\}$ of the last Transformer layer is obtained and passed to the subsequent sequence-labeling and decoding tasks.
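A sketch of how the enhanced character vectors could be pushed through BERT, assuming the Hugging Face `transformers` library and the `bert-base-chinese` checkpoint; with `inputs_embeds`, only the token-embedding lookup is replaced, and BERT's embedding layer still adds the position and segment embeddings ($E_p$, $E_s$) before the Transformer blocks.

```python
from transformers import BertModel  # assumes the Hugging Face transformers library

bert = BertModel.from_pretrained("bert-base-chinese")

def encode_with_bert(enhanced_char_embeds, attention_mask):
    """Feed the lexicon-enhanced character vectors C' (shape (batch, n, hidden_size))
    through BERT and return the hidden states of the last Transformer layer, H^L."""
    out = bert(inputs_embeds=enhanced_char_embeds, attention_mask=attention_mask)
    return out.last_hidden_state
```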
2) Computation procedure for LSTM networks
The fine-tuned mixed-lattice embedding vectors already contain word-level semantic information, because the fusion stage focuses on the semantic associations between characters and words. To capture the global semantics between the characters of a sentence and further improve NER performance, the approach commonly used by most NER models is adopted: a BiLSTM serves as the sequence-labeling layer of this embodiment's model. Given a sentence $s_c=\{c_1,c_2,\dots,c_n\}$, the previous step produced the lexicon-enhanced character feature representation $C'=\{c'_1,c'_2,\dots,c'_n\}$ and, after BERT fine-tuning, the hidden-state output matrix $H^{L}=\{h_1^{L},\dots,h_n^{L}\}$. $H^{L}$ is input into a bidirectional LSTM network, which captures the left-to-right and right-to-left semantic information of the sentence. The hidden-state output of the forward LSTM network can be written as $\overrightarrow{h_i}$ and that of the backward LSTM network as $\overleftarrow{h_i}$, so the output of the Bi-LSTM network, which is the output of the sequence-labeling layer, can be expressed as
$h_i=[\overrightarrow{h_i};\overleftarrow{h_i}]$,
where $h_i$ is the concatenated hidden-state output of the $i$-th Bi-LSTM unit and is used to represent the character-level contextual semantics of $c_i$.
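A minimal PyTorch sketch of this sequence-labeling layer; the mapping to per-tag emission scores for the CRF is included as an assumption about where the tag projection sits.

```python
import torch.nn as nn

class SequenceLabelingLayer(nn.Module):
    """BiLSTM sequence-labeling layer: for each character it concatenates the
    forward and backward hidden states, h_i = [h_i_fwd ; h_i_bwd], and maps them
    to per-tag emission scores for the CRF layer."""
    def __init__(self, input_dim, hidden_dim, num_tags):
        super().__init__()
        self.bilstm = nn.LSTM(input_dim, hidden_dim,
                              batch_first=True, bidirectional=True)
        self.hidden2tag = nn.Linear(2 * hidden_dim, num_tags)

    def forward(self, bert_hidden_states):        # (batch, n, input_dim)
        h, _ = self.bilstm(bert_hidden_states)    # (batch, n, 2 * hidden_dim)
        return self.hidden2tag(h)                 # (batch, n, num_tags)
```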
3) Decoding process for CRF network
After the sequence-labeling layer, the NER labels are predicted with a standard CRF layer. Given the hidden-state output $H=\{h_1,h_2,\dots,h_n\}$ of the last network layer, let $y=\{y_1,y_2,\dots,y_n\}$ denote a label sequence; for a sentence $s=\{s_1,s_2,\dots,s_n\}$ the probability of its label sequence is defined as
$P(y\mid s)=\dfrac{\exp\!\Big(\sum_i\big(W^{y_i}_{\mathrm{CRF}}h_i+b^{(y_{i-1},y_i)}_{\mathrm{CRF}}\big)\Big)}{\sum_{y'}\exp\!\Big(\sum_i\big(W^{y'_i}_{\mathrm{CRF}}h_i+b^{(y'_{i-1},y'_i)}_{\mathrm{CRF}}\big)\Big)}$,
where $y'$ ranges over all possible label sequences, $W^{y_i}_{\mathrm{CRF}}$ is the learnable weight parameter corresponding to $y_i$, and $b^{(y_{i-1},y_i)}_{\mathrm{CRF}}$ is the transition bias between $y_{i-1}$ and $y_i$. The negative log-likelihood is taken as the loss function of the model:
$L=-\sum\log\big(P(y\mid s)\big)$.
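A sketch of the CRF loss and Viterbi decoding; it assumes the third-party `pytorch-crf` package rather than the patent's own implementation, and the tag count is a placeholder.

```python
from torchcrf import CRF  # third-party `pytorch-crf` package (an assumption of this sketch)

num_tags = 9  # placeholder: depends on the tagging scheme and entity types of the dataset
crf = CRF(num_tags, batch_first=True)

def crf_loss(emissions, tags, mask):
    """Negative log-likelihood -sum log P(y|s); `emissions` are the BiLSTM scores,
    `tags` the gold label ids, `mask` marks the real (non-padding) characters."""
    return -crf(emissions, tags, mask=mask, reduction="mean")

def crf_decode(emissions, mask):
    """Viterbi decoding of the most likely tag sequence for each sentence."""
    return crf.decode(emissions, mask=mask)
```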
and 4, step 4: model learning process
The method proposed in this embodiment is trained separately on four public data sets, and the model converges by optimizing the negative log-likelihood loss above. A warmup learning-rate schedule is used: the BERT parameters use a learning rate of 1e-5, since a small learning rate lets the model converge quickly during fine-tuning; the LSTM parameters use 1e-3; and all other parameters use 1e-4. On all four data sets the model converges within 20 epochs. The experiments on the MSRA data set were run on a V100 graphics card and the other data sets on 1080Ti graphics cards. Because results were found to differ somewhat between machines, the reported results are the average over several runs.
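A sketch of one way to set up the per-module learning rates and warmup schedule described above; the grouping of parameters by name, the AdamW optimizer, and the 10% warmup ratio are assumptions of this sketch, and `get_linear_schedule_with_warmup` comes from the Hugging Face `transformers` library.

```python
import torch
from transformers import get_linear_schedule_with_warmup

def build_optimizer(model, num_training_steps, warmup_ratio=0.1):
    """Parameter groups mirroring the learning rates above: 1e-5 for BERT,
    1e-3 for the LSTM, 1e-4 for everything else, with linear warmup."""
    bert_params, lstm_params, other_params = [], [], []
    for name, param in model.named_parameters():
        if name.startswith("bert."):
            bert_params.append(param)
        elif "lstm" in name:
            lstm_params.append(param)
        else:
            other_params.append(param)
    optimizer = torch.optim.AdamW([
        {"params": bert_params, "lr": 1e-5},
        {"params": lstm_params, "lr": 1e-3},
        {"params": other_params, "lr": 1e-4},
    ])
    scheduler = get_linear_schedule_with_warmup(
        optimizer,
        num_warmup_steps=int(warmup_ratio * num_training_steps),
        num_training_steps=num_training_steps,
    )
    return optimizer, scheduler
```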
This embodiment evaluates four public Chinese named entity recognition data sets, compares the results with other models, and demonstrates the effect of the invention through the tabulated data.
Data set introduction:
the Weibo dataset is a social media public dataset collected from the Xinlang microblog website, and contains four entity types: place name, person name, organization, and politically related entity name. The Resume dataset is also from the New wave social media data, and is about financial Resume data. The MSRA and ontotonotes 4 data sets originate from the public news domain and contain authentic labels for training data. The statistics of these data sets are shown in table 1.
Table 1. Dataset statistics (the table is provided as an image in the original publication and is not reproduced here).
Evaluation metrics:
the F1 value, a measure commonly used by classification models, is used here to compare the recognition accuracy of other models and the present invention. First, some preconditions for calculating the F1 value are introduced: TP, True Positives, indicates that the sample is divided into positive samples and allocated correctly; TN, True negotives, indicates that the sample is divided into samples and assigned correctly; FP, False Positives, indicates that a sample is divided into positive samples but is allocated incorrectly; FN, False Negatives, indicates that the sample is divided into negative samples but is misassigned. Precision, which represents the ratio of the number of correctly assigned positive samples to the total number of correctly assigned positive samples, is given by:
Figure BDA0003517951980000092
recall, the Recall rate, represents the proportion of correctly assigned positive samples to the total number of positive samples, and is given by the formula:
Figure BDA0003517951980000093
the F1-Score is also called F1 Score, is a measure of the classification problem, is often used as a final indicator of the multi-classification problem, and is a harmonic mean of the precision and the recall ratio. For a single category of F1 scores, the following formula can be used to calculate
Figure BDA0003517951980000094
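A small sketch of these metrics in Python; the counts in the usage line are purely illustrative and are not results from the patent.

```python
def prf1(tp, fp, fn):
    """Precision, recall and F1 from the counts defined above."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)) if precision + recall else 0.0
    return precision, recall, f1

print(prf1(tp=90, fp=10, fn=20))  # (0.9, 0.818..., 0.857...) with illustrative counts
```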
Description of the effects:
the F1 values on the four data are shown in the table below. This embodiment compares not only the method of using BERT to pre-train word vectors, but also other classical vocabulary enhancement SOTA methods. On the basis of ensuring that the same pre-training word vector and word vector are used, the method provided by the embodiment achieves the performances of 71.88%, 96.22%, 95.72% and 81.75% of F1 values on four public data sets Weibo, Resume, MSRA and Ontonote4 respectively. Compared with the performances of the classical Chinese NER models, namely Lattice-LSTM, LR-CNN and FLAT, the method provided by the embodiment respectively achieves the improvement of 8.46%, 0.69%, 5% and 1.37% on the four data sets. Compared with NER models using BERT pre-training word vectors, such as Softlexicon-BERT, FLAT-BERT, LEBERT and the like, on Resume, MSRA and OnNote4 data sets, the method provided by the embodiment achieves slight improvements of 0.28%, 0.12% and 0.41%, but F1 score on Weibo data set is obviously better than that of Softlexicon BERT model and is improved by 1.64%. This embodiment is based on BERT-BilSTM improvement, and compared with the BERT reference model without using the vocabulary enhancement network, the embodiment has performance improvement of 4.55%, 0.54%, 1.82% and 0.91% on the four data sets, which shows that improving the feature representation of the word vector by the vocabulary enhancement technology is a method for effectively improving the performance of the NER. Meanwhile, compared with other vocabulary enhancement methods, the feature fusion framework (MELSN) provided by the embodiment can effectively fuse richer vocabulary semantic features, and word feature representation after vocabulary enhancement contains more vocabulary semantics by means of a fine adjustment mechanism of the pre-training model BERT. Referring to table 2, the comparison experiment results show that the embodiment can better utilize the word-word lattice structure to realize vocabulary enhancement, and also show the high efficiency of the method provided by the embodiment on the recognition task of the named entity in chinese.
Table 2. Comparative experimental results (the table is provided as an image in the original publication and is not reproduced here).
The above are only preferred embodiments of the invention, and the scope of protection of the invention is not limited to the above examples; all technical solutions falling under the concept of the invention belong to its scope of protection. It should be noted that modifications and improvements made by those skilled in the art without departing from the principle of the invention are also regarded as falling within the scope of protection of the invention.

Claims (5)

1. A named entity recognition method based on a mixed-lattice self-attention network, characterized by comprising the following steps:
S1, searching a dictionary for the words formed by consecutive characters of the input sentence, merging them into a single multidimensional vector by alternating-position mapping, and encoding the sentence feature vectors represented as character-word pairs into a matrix of fixed dimensionality by mixed-lattice encoding, to obtain the corresponding mixed-lattice vector representation;
S2, constructing a self-attention network over the mixed-lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector;
S3, fusing the word features in the Embedding layer of BERT and learning better character-vector representations through fine-tuning; performing the entity sequence-labeling task and the decoding step of entity recognition with a BiLSTM-CRF network, which models the fused character features, thereby completing the entity recognition model based on the mixed-lattice self-attention network;
S4, training the entity recognition model based on the mixed-lattice self-attention network on a data set.
2. The named entity recognition method based on a mixed-lattice self-attention network according to claim 1, characterized in that, in step S1, the process of encoding the sentence feature vectors represented as character-word pairs into a matrix of fixed dimensionality by mixed-lattice encoding, to obtain the corresponding mixed-lattice vector representation, comprises the following steps:
S11, given a sentence $s_c=\{c_1,c_2,\dots,c_n\}$, obtaining the character feature representation
$C=\{e_B(c_1),e_B(c_2),\dots,e_B(c_n)\}$
of $s_c$ by loading the pre-trained BERT weights, where $c_i$ denotes the $i$-th character of $s_c$, $n$ is the character length of $s_c$, and $e_B$ is the lookup table of BERT pre-trained character vectors;
S12, given a Chinese dictionary L, constructing a Trie, traversing its nodes, and obtaining the words matched to each character;
S13, grouping all matched words according to the BMES scheme: for character $c_i$, the set $B(c_i)$ consists of the matched words that begin with $c_i$, $M(c_i)$ of the matched words containing $c_i$ as an internal character, $E(c_i)$ of the matched words ending with $c_i$, and $S(c_i)$ of the single-character word $c_i$; the word set $w_i$ of each character $c_i$ in sentence $s_c$ is expressed as
$w_i=\{e_w(B(c_i)),e_w(M(c_i)),e_w(E(c_i)),e_w(S(c_i))\}$,
where $e_w$ is the lookup table of pre-trained word vectors;
S14, setting up two learnable non-linear fully connected layers that lift $w_i$ to the dimension of the character vectors, the weights of the two layers being learned during BERT fine-tuning so that the pre-trained word feature vectors are mapped into BERT's semantic feature space, the processed word feature vector being
$v_i^w=W_1\,\sigma\!\left(W_2\,w_i+b_2\right)+b_1$,
where $\sigma$ is a non-linear activation, $W_1\in\mathbb{R}^{d_c\times d_c}$ and $W_2\in\mathbb{R}^{d_c\times d_w}$ are learnable weight matrices, $b_1$ and $b_2$ are the corresponding biases, $d_c$ is the dimension of the BERT character vectors, and $d_w$ is the dimension of the pre-trained word vectors;
S15, taking the transformed word feature vectors $v_i^w$ as the input of the feature fusion model, the feature of each character-word pair being expressed, according to the correspondence between characters and word sets, as
$I_i=\{e_B(c_i),\,v_i^w\}$;
S16, encoding the character-word pair features as
$V_{ME}=e_B(c_1)\oplus v_1^w\oplus e_B(c_2)\oplus v_2^w\oplus\dots\oplus e_B(c_n)\oplus v_n^w$,
where $\oplus$ denotes vector concatenation.
3. The named entity recognition method based on a mixed-lattice self-attention network according to claim 2, characterized in that, in step S2, constructing the self-attention network over the mixed-lattice vectors generated in step S1 to capture the influence of the word vectors on the character vectors, thereby enhancing the feature representation of each character vector, comprises the following steps:
S21, designing a mixed-lattice self-attention network to capture the associations between character and word features, the self-attention network taking the mixed-lattice encoding vector $V_{ME}$ and the word-position mask matrix $M$ as inputs of the enhancement network and modeling the global word and character vectors, so that the model learns the semantic correlation weights between characters and words, the $Q$, $K$, $V$ matrices being computed as
$[Q,K,V]=[W_q V_{ME},\,W_k V_{ME},\,W_v V_{ME}]$,
where $W_q,W_k,W_v\in\mathbb{R}^{d_e\times d_e}$ are learnable weight matrices and $d_e=d_c+d_w$; $Q$, $K$ and $V$ are the query matrix, the key matrix corresponding to the queries, and the value matrix to be weighted and averaged; $d_e$ is the dimension of the mixed-lattice vector, $d_c$ the dimension of the character vectors, and $d_w$ the dimension of the word vectors;
S22, using the scaled dot product as the similarity score:
$S_{Att}=\dfrac{QK^{T}}{\sqrt{d_e}}$,
$F_{Att}=\mathrm{Softmax}(S_{Att}+\varepsilon M)\,V$,
where $M$ is the static word-position mask matrix, $\varepsilon$ is a matrix with infinitesimally small (very large negative) values, $F_{Att}$ is the output of the self-attention network, $S_{Att}$ is the normalized attention score, and $K^{T}$ is the transpose of $K$;
S23, adding the word feature information as a residual to the BERT pre-trained character vectors, the lexicon-enhanced character feature vectors being obtained as
$C'=C+g(F_{Att})$,
where $C\in\mathbb{R}^{n\times d_c}$ denotes the BERT pre-trained character vector features and the function $g(\cdot)$ removes the word-vector positions from the self-attention output so that $C$ and $F_{Att}$ have consistent dimensions, giving the lexicon-enhanced character embedding vector $C'$.
4. The named entity recognition method based on a mixed-lattice self-attention network according to claim 3, characterized in that, in step S3, the process of constructing the entity recognition model based on the mixed-lattice self-attention network comprises the following steps:
S31, given a sentence sequence $s_c=\{c_1,c_2,\dots,c_n\}$ of length $n$, denoting the lexicon-enhanced character vectors as $C'=\{c'_1,c'_2,\dots,c'_n\}$ and fine-tuning the character vectors $C'$ in the BERT model, the lexicon-enhanced BERT embedding vector being expressed as
$E'_i=C'_i+E_s(i)+E_p(i)$,
where $E_s$ and $E_p$ are the segment-vector and position-vector lookup tables, respectively, and $i$ denotes the $i$-th character of the character sequence $s_c$ of length $n$;
S32, inputting the resulting $E'$ into BERT, each Transformer block being computed as
$D=\mathrm{LN}\!\left(H^{k-1}+\mathrm{MHA}(H^{k-1})\right)$,
$H^{k}=\mathrm{LN}\!\left(\mathrm{FFN}(D)+D\right)$,
where $H^{k}$ is the hidden-state output of the $k$-th layer, $H^{0}=E'$ is the input embedding, LN is layer normalization, MHA is the multi-head self-attention module, FFN is a two-layer feed-forward network, and $D$ is the normalized output of the multi-head attention module;
S33, obtaining the hidden-state output $H^{L}=\{h_1^{L},\dots,h_n^{L}\}$ of the last Transformer layer and inputting it into a bidirectional LSTM network, which captures the left-to-right and right-to-left semantic information of the sentence, the hidden-state outputs of the forward and backward LSTM networks being $\overrightarrow{h_i}$ and $\overleftarrow{h_i}$, respectively, and the output of the Bi-LSTM network, which is the output of the sequence-labeling layer, being
$h_i=[\overrightarrow{h_i};\overleftarrow{h_i}]$,
where $h_i$ is the concatenated hidden-state output of the $i$-th Bi-LSTM unit and represents the character-level contextual semantics of $c_i$;
S34, predicting the NER labels with a standard CRF layer: given the hidden-state output $H=\{h_1,h_2,\dots,h_n\}$ of the last network layer, letting $y=\{y_1,y_2,\dots,y_n\}$ denote a label sequence, the probability of the label sequence of a sentence $s=\{s_1,s_2,\dots,s_n\}$ being defined as
$P(y\mid s)=\dfrac{\exp\!\Big(\sum_i\big(W^{y_i}_{\mathrm{CRF}}h_i+b^{(y_{i-1},y_i)}_{\mathrm{CRF}}\big)\Big)}{\sum_{y'}\exp\!\Big(\sum_i\big(W^{y'_i}_{\mathrm{CRF}}h_i+b^{(y'_{i-1},y'_i)}_{\mathrm{CRF}}\big)\Big)}$,
where $y'$ ranges over all possible label sequences, $W^{y_i}_{\mathrm{CRF}}$ is the learnable weight parameter corresponding to $y_i$, $b^{(y_{i-1},y_i)}_{\mathrm{CRF}}$ is the transition bias between $y_{i-1}$ and $y_i$, and likewise $W^{y'_i}_{\mathrm{CRF}}$ and $b^{(y'_{i-1},y'_i)}_{\mathrm{CRF}}$ are the model weight parameters and biases under any possible label sequence $y'$;
S35, the negative log-likelihood loss used as the loss function of the model being expressed as
$L=-\sum\log\big(P(y\mid s)\big)$.
5. a named entity recognition device based on a mixed lattice self-attention network is characterized by comprising a mixed lattice structure coding module, a vocabulary enhancing module, a sequence marking and decoding module and a model training module;
the mixed lattice structure coding module is used for searching words consisting of continuous words in input sentences in a dictionary, alternately mapping and merging the words into a single multidimensional vector through positions, and coding sentence characteristic vectors represented by word pairs into a matrix with fixed dimensionality by adopting a mixed word lattice coding mode to obtain word vector representation of a corresponding mixed lattice structure;
the vocabulary enhancement module is used for constructing a corresponding self-attention network based on the generated word vectors of the mixed lattice structure so as to capture the influence of the word vectors in the vectors on the word vectors, thereby enhancing the feature representation of each word vector;
the sequence labeling and decoding module is used for fusing word features at an Embedding layer of BERT and learning to obtain better word vector representation through a fine tuning learning process; according to a BilSTM-CRF network, realizing an entity sequence labeling task and a decoding process in entity identification, completing modeling of character features after fusion through the network, and constructing and completing an entity identification model based on a mixed lattice self-attention network;
the model training module is used for training the entity recognition model based on the hybrid grid self-attention network on the data set.
CN202210172667.4A 2022-02-24 2022-02-24 Named entity identification method and device based on mixed lattice self-attention network Pending CN114429132A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202210172667.4A CN114429132A (en) 2022-02-24 2022-02-24 Named entity identification method and device based on mixed lattice self-attention network

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202210172667.4A CN114429132A (en) 2022-02-24 2022-02-24 Named entity identification method and device based on mixed lattice self-attention network

Publications (1)

Publication Number Publication Date
CN114429132A true CN114429132A (en) 2022-05-03

Family

ID=81312807

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210172667.4A Pending CN114429132A (en) 2022-02-24 2022-02-24 Named entity identification method and device based on mixed lattice self-attention network

Country Status (1)

Country Link
CN (1) CN114429132A (en)

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114818721A (en) * 2022-06-30 2022-07-29 湖南工商大学 Event joint extraction model and method combined with sequence labeling
CN114818721B (en) * 2022-06-30 2022-11-01 湖南工商大学 Event joint extraction model and method combined with sequence labeling
CN115545035A (en) * 2022-11-29 2022-12-30 城云科技(中国)有限公司 Text entity recognition model and construction method, device and application thereof
CN115545035B (en) * 2022-11-29 2023-02-17 城云科技(中国)有限公司 Text entity recognition model and construction method, device and application thereof
CN115935994A (en) * 2022-12-12 2023-04-07 重庆邮电大学 Method for intelligently identifying electric trademark
CN115935994B (en) * 2022-12-12 2024-03-08 芽米科技(广州)有限公司 Method for intelligently identifying current label questions
CN117992924A (en) * 2024-04-02 2024-05-07 云南师范大学 HyperMixer-based knowledge tracking method
CN117992924B (en) * 2024-04-02 2024-06-07 云南师范大学 HyperMixer-based knowledge tracking method

Similar Documents

Publication Publication Date Title
CN109992782B (en) Legal document named entity identification method and device and computer equipment
CN114169330B (en) Chinese named entity recognition method integrating time sequence convolution and transform encoder
CN110196980B (en) Domain migration on Chinese word segmentation task based on convolutional network
CN111737496A (en) Power equipment fault knowledge map construction method
CN114429132A (en) Named entity identification method and device based on mixed lattice self-attention network
CN113268995B (en) Chinese academy keyword extraction method, device and storage medium
CN111767718B (en) Chinese grammar error correction method based on weakened grammar error feature representation
CN114757182A (en) BERT short text sentiment analysis method for improving training mode
CN113392209B (en) Text clustering method based on artificial intelligence, related equipment and storage medium
CN113190656A (en) Chinese named entity extraction method based on multi-label framework and fusion features
CN111309918A (en) Multi-label text classification method based on label relevance
CN111368542A (en) Text language association extraction method and system based on recurrent neural network
Deng et al. Self-attention-based BiGRU and capsule network for named entity recognition
CN114818717A (en) Chinese named entity recognition method and system fusing vocabulary and syntax information
Wu et al. An effective approach of named entity recognition for cyber threat intelligence
CN114332519A (en) Image description generation method based on external triple and abstract relation
CN112905736A (en) Unsupervised text emotion analysis method based on quantum theory
CN114757184B (en) Method and system for realizing knowledge question and answer in aviation field
CN111680529A (en) Machine translation algorithm and device based on layer aggregation
CN115545033A (en) Chinese field text named entity recognition method fusing vocabulary category representation
CN115169349A (en) Chinese electronic resume named entity recognition method based on ALBERT
CN115017879A (en) Text comparison method, computer device and computer storage medium
US11822887B2 (en) Robust name matching with regularized embeddings
CN111581365B (en) Predicate extraction method
CN116680407A (en) Knowledge graph construction method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination