CN114048750A - Named entity recognition method fusing high-level information features - Google Patents

Named entity recognition method fusing high-level information features

Info

Publication number
CN114048750A
Authority
CN
China
Prior art keywords
text
feature
character
information
named entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111510990.XA
Other languages
Chinese (zh)
Other versions
CN114048750B (en)
Inventor
程良伦
聂梦娜
张伟文
叶海明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111510990.XA priority Critical patent/CN114048750B/en
Priority claimed from CN202111510990.XA external-priority patent/CN114048750B/en
Publication of CN114048750A publication Critical patent/CN114048750A/en
Application granted granted Critical
Publication of CN114048750B publication Critical patent/CN114048750B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the technical field of natural language processing and discloses a named entity recognition method fusing high-level information features, comprising the following specific steps: S1, acquiring text information to be analyzed, and preprocessing the text information to obtain preprocessed text features; S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters; S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation; and S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition. The method solves the problem that the prior art cannot extract named entities reliably and efficiently, and it is convenient to compute, practical, and efficient.

Description

Named entity recognition method fusing high-level information features
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a named entity recognition method fusing high-level information features.
Background
Named entity recognition is the task of identifying entities, such as people, places, and organizations, in unstructured text, and it plays a significant role in understanding such text. Building on prior research, named entity recognition has been widely applied in many scenarios, such as relation extraction, question answering, event extraction, information retrieval, and knowledge graph construction.
For languages such as English, in which words are naturally separated, named entity recognition is usually treated as a sequence labeling problem and has achieved state-of-the-art results with deep-learning-based approaches. In contrast to English, Chinese has no spaces that delimit words, so Chinese words are generally obtained with a segmentation tool or by introducing an external dictionary. In specialized fields such as the marine industry, many professional terms, such as whale species, maritime scenic spots, and underwater restaurants, cannot be obtained effectively by word segmentation tools or by matching against external dictionaries; entity boundaries therefore become blurred, which makes Chinese named entity recognition considerably more difficult.
To address these problems, an existing patent discloses a named entity recognition method and device, in which: a convolutional neural network (CNN) extracts information from a character image to obtain a glyph vector for each character in the image; the glyph vector and the corresponding character vector are concatenated, and a feature vector is obtained from the concatenation; a set of named entities is obtained from the feature vector; and a question corresponding to the text image is constructed to locate the named entity to be acquired. However, this existing method has low precision and cannot extract named entities reliably and efficiently. How to devise a named entity recognition method that extracts named entities reliably and efficiently is therefore an urgent problem in this technical field.
Disclosure of Invention
The invention provides a named entity recognition method fusing high-level information features, aiming to solve the problem that the prior art cannot extract named entities reliably and efficiently; the method is convenient to compute, practical, and efficient.
In order to achieve the purpose of the invention, the technical scheme is as follows:
a named entity identification method fusing information advanced features comprises the following specific steps:
s1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text characteristics;
s2, inputting the obtained preprocessed text features into a deep convolutional neural network to extract character-level features in the text, and inputting the extracted character-level features into a full-connection network to obtain the optimal feature representation of the characters;
s3, obtaining lattice through the obtained character-level features and the text information, and embedding the lattice to obtain integrated representation;
s4, connecting the integrated representation and the optimal feature representation, using a Transformer as an encoder,
and decoding by using the conditional random field to obtain the relation between words and performing entity recognition.
Preferably, step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
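As an illustration of this mapping, the following minimal Python sketch (assuming PyTorch, and substituting a hypothetical toy vocabulary for the pre-trained character-and-word vectors $e_c$) turns a sentence into the feature matrix $X$:

```python
import torch
from torch import nn

# Hypothetical toy vocabulary; in the method itself, e_c is a
# pre-trained embedding containing characters and words.
chars = ["<pad>"] + list("鲸鱼在水下餐厅游")
vocab = {ch: i for i, ch in enumerate(chars)}
e_c = nn.Embedding(num_embeddings=len(vocab), embedding_dim=50)

sentence = "鲸鱼在水下"                        # s = {c1, ..., cn}
ids = torch.tensor([[vocab[c] for c in sentence]])
X = e_c(ids)                                   # X = [x1, ..., xn]
print(X.shape)                                 # torch.Size([1, 5, 50])
```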
further, in step S2, the specific steps are:
s201, filling two ends of the preprocessed text features;
s202, scanning the filled text features by using three filters with different sizes, and extracting a first feature, a second feature and a third feature;
s203, performing maximum pooling operation on the first feature, the second feature and the third feature respectively, and splicing the operated first feature, the operated second feature and the operated third feature to obtain character information in the text;
s204, inputting the character information in the obtained text into a full-connection network, and obtaining the optimal feature expression x of the character informationc
Further, the three filters of different sizes in step S202 have areas of 2 × 50, 3 × 50, and 4 × 50, respectively.
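A minimal sketch of steps S201 to S204, assuming PyTorch; the filter widths 2, 3, and 4 over 50-dimensional embeddings correspond to the 2 × 50, 3 × 50, and 4 × 50 filters above, while pooling over the whole padded sequence is an assumption, since the pooling granularity is not specified here:

```python
import torch
from torch import nn

class CharCNN(nn.Module):
    """Sketch of S201-S204: pad, scan with filters of width 2, 3, 4,
    max-pool each feature map, concatenate, then a fully connected
    layer producing the character representation x_c."""
    def __init__(self, emb_dim=50, n_filters=50, out_dim=50):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=k, padding=k - 1)
            for k in (2, 3, 4))                # the 2x50, 3x50, 4x50 filters
        self.fc = nn.Linear(3 * n_filters, out_dim)

    def forward(self, X):                      # X: (batch, n, emb_dim)
        X = X.transpose(1, 2)                  # (batch, emb_dim, n)
        pooled = [conv(X).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # x_c: (batch, out_dim)

x_c = CharCNN()(torch.randn(1, 5, 50))
print(x_c.shape)                               # torch.Size([1, 50])
```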
Further, step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
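The dictionary matching of step S301 can be sketched as follows in plain Python; the dictionary contents and the sample sentence are hypothetical:

```python
def match_lattice_words(s, D):
    """Sketch of S301: every subsequence of s found in dictionary D,
    returned with its head index b and tail index e (0-based)."""
    matches = []
    for b in range(len(s)):
        for e in range(b, len(s)):
            if s[b:e + 1] in D:
                matches.append((s[b:e + 1], b, e))
    return matches

D = {"鲸鱼", "水下餐厅"}                       # hypothetical dictionary
print(match_lattice_words("鲸鱼在水下餐厅", D))
# [('鲸鱼', 0, 1), ('水下餐厅', 3, 6)]
```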
Furthermore, the lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
Further, in step S4, connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
Further, the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer. The self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel, then concatenates the results of the heads and projects them again to produce the final value. The feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; it comprises two linear transformations with a ReLU activation function between them. The output of the self-attention sublayer is fed into the feedforward neural network for further processing.
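A minimal sketch of one such encoder layer, assuming PyTorch; the stock multi-head attention module stands in for the span-relative attention detailed below, the residual Add & Norm follows the LayerNorm(x + layer(x)) form described in the embodiments, and d_model = 160 assumes the 8 heads of dimension 20 listed in the hyperparameter table later in this document:

```python
import torch
from torch import nn

class EncoderLayer(nn.Module):
    """Sketch of the encoder layer: multi-head self-attention plus a
    two-layer feedforward network with ReLU, each sublayer wrapped in
    a residual connection and LayerNorm(x + layer(x))."""
    def __init__(self, d_model=160, n_heads=8, d_ff=640):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, n, d_model)
        a, _ = self.attn(x, x, x)              # heads run in parallel
        x = self.norm1(x + a)                  # Add & Norm
        return self.norm2(x + self.ffn(x))     # Add & Norm

y = EncoderLayer()(torch.randn(1, 6, 160))
print(y.shape)                                 # torch.Size([1, 6, 160])
```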
Furthermore, the self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
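A sketch of this encoding under stated assumptions (PyTorch; the trainable $W_r$ is instantiated inline for illustration; the sine and cosine components are concatenated blockwise rather than interleaved):

```python
import torch
from torch import nn

def relative_position(head, tail, d_model=20):
    """Sketch of ST401-ST402: the four span distances and R_ij.
    head, tail: (n,) integer tensors of span boundaries in the lattice."""
    hh = head[:, None] - head[None, :]         # d^(hh)_ij
    ht = head[:, None] - tail[None, :]         # d^(ht)_ij
    th = tail[:, None] - head[None, :]         # d^(th)_ij
    tt = tail[:, None] - tail[None, :]         # d^(tt)_ij

    def p(d):                                  # sinusoidal encoding p_d
        k = torch.arange(0, d_model, 2, dtype=torch.float32)
        angle = d[..., None].float() / (10000 ** (k / d_model))
        return torch.cat([angle.sin(), angle.cos()], dim=-1)

    W_r = nn.Linear(4 * d_model, d_model)      # trainable W_r (inline here)
    four = torch.cat([p(hh), p(ht), p(th), p(tt)], dim=-1)
    return torch.relu(W_r(four))               # R_ij: (n, n, d_model)

R = relative_position(torch.tensor([0, 3]), torch.tensor([1, 6]))
print(R.shape)                                 # torch.Size([2, 2, 20])
```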
Further, in step S4, the decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
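A minimal NumPy sketch of the Viterbi search of step SP403, assuming the emission and transition scores have already been produced by the model:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Sketch of SP403: find the maximum-probability tag path.
    emissions: (n, T) per-token tag scores; transitions: (T, T) scores
    of moving from tag a to tag b; both assumed given by the model."""
    n, T = emissions.shape
    score = emissions[0].copy()                # best score ending in each tag
    back = np.zeros((n, T), dtype=int)         # backpointers
    for i in range(1, n):
        total = score[:, None] + transitions + emissions[i][None, :]
        back[i] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):              # follow backpointers
        path.append(int(back[i][path[-1]]))
    return path[::-1]

print(viterbi(np.random.rand(5, 3), np.random.rand(3, 3)))
```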
The invention has the following beneficial effects:
the invention provides a named entity recognition method fusing information high-level features, which obtains lattice through character-level features and text information obtained by a deep convolutional neural network, extracts the character-level features and sends the extracted character-level features and lattice representations to a full-connection network so as to obtain the optimal feature representation of each input. Then, the words are encoded through a transformer and decoded in a conditional random field so as to effectively find the relation among the words and effectively identify the entity. Compared with other models, the method has the advantages that the effectiveness of the more fine-grained features extracted in the embedding layer is improved, the features capture meaningful information from the entity, and the problem that the named entity cannot be extracted reliably and efficiently in the prior art is solved.
Drawings
Fig. 1 is a flow chart of the named entity recognition method.
Fig. 2 is a model structure of the named entity recognition method.
Fig. 3 is a flowchart illustrating step S2 of the named entity recognition method.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments.
Example 1
As shown in fig. 1, a named entity recognition method fusing high-level information features comprises the following specific steps:
S1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
Step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
As shown in fig. 3, step S2 comprises the following specific steps:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text; in this embodiment, the max-pooling operation is used to reduce overfitting.
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
The three filters of different sizes in step S202 have areas of 2 × 50, 3 × 50, and 4 × 50, respectively.
Example 2
As shown in fig. 1, a named entity recognition method fusing high-level information features comprises the following specific steps:
S1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
Step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
As shown in fig. 3, step S2 comprises the following specific steps:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text;
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
Step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
The lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
In step S4, connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
As shown in fig. 2, the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer. In this embodiment, each sublayer is followed by summation and normalization; in other words, the output of each sublayer is LayerNorm(x + layer(x)), where layer(x) is the output of the self-attention or feedforward network. The self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel; the results of the H heads are then concatenated and projected again to produce the final value. Each head is calculated as follows:
$\mathrm{Att}(A, V) = \mathrm{softmax}(A) V$ (13)
$A_{ij} = (Q_i + u)^T K_j + (Q_i + v)^T R_{ij} W_R$ (14)
$[Q, K, V] = E_x [W_q, W_k, W_v]$ (15)
where $E_x$ is the input vector $x_{in}$; $W_q$, $W_k$, $W_v$ are trainable matrices that project $E_x$ into different spaces; $W_R$ is a trainable matrix; $u$ and $v$ are trainable bias vectors; $Q_i$ and $K_j$ correspond to spans $x_i$ and $x_j$ respectively; $V$ is the value matrix; and $A_{ij}$ denotes the attention score. The feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; it comprises two linear transformations with a ReLU activation function between them. The output of the self-attention sublayer is fed into the feedforward neural network for further processing.
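A sketch of equations (13) to (15) for a single head, assuming PyTorch; $u$ and $v$ are the trainable bias vectors of equation (14), and all weights are passed in explicitly for clarity:

```python
import torch

def span_attention(E, R, W_q, W_k, W_v, W_R, u, v):
    """Sketch of per-head attention, eqs. (13)-(15).
    E: (n, d) input x_in; R: (n, n, d) relative position encodings;
    W_q, W_k, W_v, W_R: (d, d) trainable matrices; u, v: (d,) biases."""
    Q, K, V = E @ W_q, E @ W_k, E @ W_v        # eq. (15)
    content = (Q + u) @ K.T                    # (Q_i + u)^T K_j
    position = torch.einsum("id,ijd->ij", Q + v, R @ W_R)
    A = content + position                     # eq. (14)
    return torch.softmax(A, dim=-1) @ V        # eq. (13): Att(A, V)

n, d = 4, 20
E, R = torch.randn(n, d), torch.randn(n, n, d)
Ws = [torch.randn(d, d) for _ in range(4)]
out = span_attention(E, R, *Ws, torch.zeros(d), torch.zeros(d))
print(out.shape)                               # torch.Size([4, 20])
```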
The self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
In step S4, decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
Example 3
As shown in fig. 1, a named entity recognition method fusing high-level information features comprises the following specific steps:
S1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
in this embodiment, the text information to be analyzed comprises 900 marine-industry sentences usable for named entity recognition.
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
Step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
As shown in fig. 3, step S2 comprises the following specific steps:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text;
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
Step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
The lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
In step S4, connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
As shown in fig. 2, the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer. In this embodiment, each sublayer is followed by summation and normalization; in other words, the output of each sublayer is LayerNorm(x + layer(x)), where layer(x) is the output of the self-attention or feedforward network. The self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel; the results of the H heads are then concatenated and projected again to produce the final value. Each head is calculated as follows:
$\mathrm{Att}(A, V) = \mathrm{softmax}(A) V$ (13)
$A_{ij} = (Q_i + u)^T K_j + (Q_i + v)^T R_{ij} W_R$ (14)
$[Q, K, V] = E_x [W_q, W_k, W_v]$ (15)
where $E_x$ is the input vector $x_{in}$; $W_q$, $W_k$, $W_v$ are trainable matrices that project $E_x$ into different spaces; $W_R$ is a trainable matrix; $u$ and $v$ are trainable bias vectors; $Q_i$ and $K_j$ correspond to spans $x_i$ and $x_j$ respectively; $V$ is the value matrix; and $A_{ij}$ denotes the attention score. The feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; it comprises two linear transformations with a ReLU activation function between them. The output of the self-attention sublayer is fed into the feedforward neural network for further processing.
The self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
In step S4, decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
In this embodiment, experiments are designed to test the effectiveness of the named entity recognition method.
In this embodiment, a data set of 900 marine-industry sentences usable for named entity recognition is used; the entity types comprise eight classes, such as PER (person), LOC (location), and ORG (organization).
This example uses BiLSTM-CRF and FLAT as baseline models. FLAT is a method that uses a character-word lattice structure and applies a Transformer with relative position encoding to NER. For comparison with other models, the same embeddings and lexicon as Zhang are adopted, and the embeddings and lexicon are kept identical to those of FLAT throughout the comparisons.
The following table shows the hyperparameter settings of the model; stochastic gradient descent (SGD) is used as the optimization algorithm of the model.
Parameter Value
epoch 100
batch 7
Char emb size 50
Lattice emb size 50
Cnn emb size 50
Cnn dropout 0.23
Transformer layer 1
head 8
Head dim 20
Learning rate lr 0.001
Input dropout 0.5
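Read as a PyTorch configuration, the table above translates into a sketch such as the following (the one-layer model is a placeholder for the full network):

```python
import torch

# Hyperparameters transcribed from the table above.
config = {"epoch": 100, "batch": 7, "char_emb_size": 50,
          "lattice_emb_size": 50, "cnn_emb_size": 50, "cnn_dropout": 0.23,
          "transformer_layers": 1, "heads": 8, "head_dim": 20,
          "lr": 0.001, "input_dropout": 0.5}

model = torch.nn.Linear(160, 9)    # placeholder for the full network
optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
```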
This example adopts precision (P), recall (R), and the F1 score (F1) as evaluation indexes, calculated as follows:
$P = \dfrac{TP}{TP + FP}$
$R = \dfrac{TP}{TP + FN}$
$F1 = \dfrac{2PR}{P + R}$
TP denotes the case where a word tagged as an entity is correctly identified as an entity, and FP denotes the case where a non-entity word is predicted as an entity. Further, FN denotes the case where a word tagged as an entity is not detected, and TN denotes the case where a word tagged as a non-entity is correctly detected as a non-entity. Precision is the ratio of the number of correctly predicted entities to the total number of predicted entities. Recall is the ratio of the number of correctly predicted entities to the total number of entities. The F1 score is the harmonic mean of precision and recall and reflects the overall performance of the NER model.
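A minimal sketch of these three indexes from entity-level counts (the counts in the example call are illustrative only):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from entity-level counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(prf1(tp=130, fp=70, fn=70))   # (0.65, 0.65, 0.65); counts illustrative
```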
As shown in the table of comparative results, the named entity recognition method holds a relative advantage over the other models on these evaluation indexes.
(The comparative results table appears only as an image in the original publication.)
it can be seen from the table that the baseline FLAT model has an average F1 score of 63.16%, and that the average F1 score increased to 65.07% over all other competitive models for comparison, especially those character-based models that did not use an external dictionary, in addition to the higher level features extracted from the character-level input by the deep convolutional neural network. The deep convolutional neural network can extract the relation between each character and the preceding and following characters, so that potential words in the text are captured, the entity boundary can be defined more clearly and accurately by combining with an external dictionary, and the accuracy of entity recognition is improved.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A named entity recognition method fusing high-level information features, characterized in that the method comprises the following specific steps:
S1, acquiring text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
2. The named entity recognition method fusing high-level information features of claim 1, wherein step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
3. The named entity recognition method fusing high-level information features of claim 2, wherein step S2 specifically comprises:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text;
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
4. The named entity recognition method fusing high-level information features of claim 3, wherein the sizes of the three filters in step S202 are 2 × 50, 3 × 50, and 4 × 50, respectively.
5. The named entity recognition method fusing high-level information features of claim 4, wherein step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the text features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
6. The named entity recognition method fusing high-level information features of claim 5, wherein the lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
7. The named entity recognition method fusing high-level information features of claim 6, wherein in step S4 connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
8. The named entity recognition method fusing high-level information features of claim 7, wherein the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer; the self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel, then concatenates the results of the heads and projects them again to produce the final value; the feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; the feedforward neural network comprises two linear transformations with a ReLU activation function between them; and the output of the self-attention sublayer is fed into the feedforward neural network for further processing.
9. The named entity recognition method fusing high-level information features of claim 8, wherein the self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
10. The named entity recognition method fusing high-level information features of claim 9, wherein in step S4 the decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
CN202111510990.XA 2021-12-10 Named entity recognition method fusing high-level information features Active CN114048750B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111510990.XA CN114048750B (en) 2021-12-10 Named entity recognition method fusing high-level information features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111510990.XA CN114048750B (en) 2021-12-10 Named entity recognition method fusing high-level information features

Publications (2)

Publication Number Publication Date
CN114048750A true CN114048750A (en) 2022-02-15
CN114048750B CN114048750B (en) 2024-06-28



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354701A (en) * 2016-08-30 2017-01-25 腾讯科技(深圳)有限公司 Chinese character processing method and device
CN109753660A (en) * 2019-01-07 2019-05-14 福州大学 A kind of acceptance of the bid webpage name entity abstracting method based on LSTM
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method
WO2021114745A1 (en) * 2019-12-13 2021-06-17 华南理工大学 Named entity recognition method employing affix perception for use in social media
CN111783459A (en) * 2020-05-08 2020-10-16 昆明理工大学 Laos named entity recognition method based on improved transform + CRF
CN112270193A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Chinese named entity identification method based on BERT-FLAT
CN112733541A (en) * 2021-01-06 2021-04-30 重庆邮电大学 Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN113297851A (en) * 2021-06-21 2021-08-24 北京富通东方科技有限公司 Recognition method for confusable sports injury entity words
CN113743122A (en) * 2021-09-14 2021-12-03 河南工业大学 Grain situation named entity identification method based on new word discovery and Flat-lattice

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
石春丹; 秦岭: "Chinese Named Entity Recognition Method Based on BGRU-CRF", Computer Science (计算机科学), no. 009, 31 December 2019 (2019-12-31) *
邓博研; 程良伦: "Chinese Named Entity Recognition Method Based on ALBERT", Computer Science and Application (计算机科学与应用), no. 005, 31 December 2020 (2020-12-31) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329766A (en) * 2022-08-23 2022-11-11 中国人民解放军国防科技大学 Named entity identification method based on dynamic word information fusion

Similar Documents

Publication Publication Date Title
CN113220919B (en) Dam defect image text cross-modal retrieval method and model
CN111382565B (en) Emotion-reason pair extraction method and system based on multiple labels
CN111476023B (en) Method and device for identifying entity relationship
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN111737975A (en) Text connotation quality evaluation method, device, equipment and storage medium
CN113221567A (en) Judicial domain named entity and relationship combined extraction method
CN114169330A (en) Chinese named entity identification method fusing time sequence convolution and Transformer encoder
CN110347787B (en) Interview method and device based on AI auxiliary interview scene and terminal equipment
CN111460142B (en) Short text classification method and system based on self-attention convolutional neural network
CN111814463B (en) International disease classification code recommendation method and system, corresponding equipment and storage medium
CN112487812A (en) Nested entity identification method and system based on boundary identification
CN114020906A (en) Chinese medical text information matching method and system based on twin neural network
CN116702091B (en) Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN111984780A (en) Multi-intention recognition model training method, multi-intention recognition method and related device
CN112100212A (en) Case scenario extraction method based on machine learning and rule matching
CN114925157A (en) Nuclear power station maintenance experience text matching method based on pre-training model
CN113704396A (en) Short text classification method, device, equipment and storage medium
CN112052319A (en) Intelligent customer service method and system based on multi-feature fusion
CN114564950A (en) Electric Chinese named entity recognition method combining word sequence
CN116822513A (en) Named entity identification method integrating entity types and keyword features
CN117009516A (en) Converter station fault strategy model training method, pushing method and device
CN115965026A (en) Model pre-training method and device, text analysis method and device and storage medium
CN115994220A (en) Contact net text data defect identification method and device based on semantic mining
CN114048750B (en) Named entity recognition method fusing high-level information features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant