CN114048750A - Named entity recognition method fusing high-level information features - Google Patents

Named entity recognition method fusing high-level information features

Info

Publication number
CN114048750A
Authority
CN
China
Prior art keywords
text
feature
character
information
named entity
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202111510990.XA
Other languages
Chinese (zh)
Other versions
CN114048750B (en)
Inventor
程良伦
聂梦娜
张伟文
叶海明
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Guangdong University of Technology
Original Assignee
Guangdong University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Guangdong University of Technology filed Critical Guangdong University of Technology
Priority to CN202111510990.XA priority Critical patent/CN114048750B/en
Priority claimed from CN202111510990.XA external-priority patent/CN114048750B/en
Publication of CN114048750A publication Critical patent/CN114048750A/en
Application granted granted Critical
Publication of CN114048750B publication Critical patent/CN114048750B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G06F40/295 Named entity recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/24 Classification techniques
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06F18/2415 Classification techniques relating to the classification model based on parametric or probabilistic models, e.g. based on likelihood ratio or false acceptance rate versus a false rejection rate
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/045 Combinations of networks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • General Health & Medical Sciences (AREA)
  • Computational Linguistics (AREA)
  • Health & Medical Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Molecular Biology (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Probability & Statistics with Applications (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Character Discrimination (AREA)

Abstract

The invention relates to the technical field of natural language processing and discloses a named entity recognition method fusing high-level information features, comprising the following specific steps: S1, acquiring text information to be analyzed, and preprocessing the text information to obtain preprocessed text features; S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters; S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation; and S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition. The method solves the problem that the prior art cannot extract named entities reliably and efficiently, and it is convenient to compute, practical, and efficient.

Description

Named entity recognition method fusing high-level information features
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a named entity recognition method fusing high-level information features.
Background
Named entity recognition is the task of identifying entities, such as people, places, and organizations, in unstructured text, and it plays a significant role in understanding such text. Building on prior research, named entity recognition has been widely applied in many scenarios, such as relation extraction, question answering, event extraction, information retrieval, and knowledge graph construction.
For languages such as English, in which words are naturally separated, named entity recognition is usually treated as a sequence labeling problem and has achieved state-of-the-art results with deep-learning-based approaches. In contrast to English, Chinese has no spaces that delimit words, so Chinese words are generally obtained with a segmentation tool or by introducing an external dictionary. In specialized fields such as the marine industry, many professional terms, such as whale species, maritime scenic spots, and underwater restaurants, cannot be obtained effectively by word segmentation tools or by matching against external dictionaries; entity boundaries therefore become blurred, which makes Chinese named entity recognition considerably more difficult.
To address these problems, an existing patent discloses a named entity recognition method and device, in which: a convolutional neural network (CNN) extracts information from a character image to obtain a glyph vector for each character in the image; the glyph vector and the corresponding character vector are concatenated, and a feature vector is obtained from the concatenation; a set of named entities is obtained from the feature vector; and a question corresponding to the text image is constructed to locate the named entity to be acquired. However, this existing method has low precision and cannot extract named entities reliably and efficiently. How to devise a named entity recognition method that extracts named entities reliably and efficiently is therefore an urgent problem in this technical field.
Disclosure of Invention
The invention provides a named entity recognition method fusing high-level information features, aiming to solve the problem that the prior art cannot extract named entities reliably and efficiently; the method is convenient to compute, practical, and efficient.
In order to achieve the purpose of the invention, the technical scheme is as follows:
a named entity identification method fusing information advanced features comprises the following specific steps:
s1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text characteristics;
s2, inputting the obtained preprocessed text features into a deep convolutional neural network to extract character-level features in the text, and inputting the extracted character-level features into a full-connection network to obtain the optimal feature representation of the characters;
s3, obtaining lattice through the obtained character-level features and the text information, and embedding the lattice to obtain integrated representation;
s4, connecting the integrated representation and the optimal feature representation, using a Transformer as an encoder,
and decoding by using the conditional random field to obtain the relation between words and performing entity recognition.
Preferably, step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
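As an illustration of this mapping, the following minimal Python sketch (assuming PyTorch, and substituting a hypothetical toy vocabulary for the pre-trained character-and-word vectors $e_c$) turns a sentence into the feature matrix $X$:

```python
import torch
from torch import nn

# Hypothetical toy vocabulary; in the method itself, e_c is a
# pre-trained embedding containing characters and words.
chars = ["<pad>"] + list("鲸鱼在水下餐厅游")
vocab = {ch: i for i, ch in enumerate(chars)}
e_c = nn.Embedding(num_embeddings=len(vocab), embedding_dim=50)

sentence = "鲸鱼在水下"                        # s = {c1, ..., cn}
ids = torch.tensor([[vocab[c] for c in sentence]])
X = e_c(ids)                                   # X = [x1, ..., xn]
print(X.shape)                                 # torch.Size([1, 5, 50])
```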
further, in step S2, the specific steps are:
s201, filling two ends of the preprocessed text features;
s202, scanning the filled text features by using three filters with different sizes, and extracting a first feature, a second feature and a third feature;
s203, performing maximum pooling operation on the first feature, the second feature and the third feature respectively, and splicing the operated first feature, the operated second feature and the operated third feature to obtain character information in the text;
s204, inputting the character information in the obtained text into a full-connection network, and obtaining the optimal feature expression x of the character informationc
Further, the three filters of different sizes in step S202 have areas of 2 × 50, 3 × 50, and 4 × 50, respectively.
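A minimal sketch of steps S201 to S204, assuming PyTorch; the filter widths 2, 3, and 4 over 50-dimensional embeddings correspond to the 2 × 50, 3 × 50, and 4 × 50 filters above, while pooling over the whole padded sequence is an assumption, since the pooling granularity is not specified here:

```python
import torch
from torch import nn

class CharCNN(nn.Module):
    """Sketch of S201-S204: pad, scan with filters of width 2, 3, 4,
    max-pool each feature map, concatenate, then a fully connected
    layer producing the character representation x_c."""
    def __init__(self, emb_dim=50, n_filters=50, out_dim=50):
        super().__init__()
        self.convs = nn.ModuleList(
            nn.Conv1d(emb_dim, n_filters, kernel_size=k, padding=k - 1)
            for k in (2, 3, 4))                # the 2x50, 3x50, 4x50 filters
        self.fc = nn.Linear(3 * n_filters, out_dim)

    def forward(self, X):                      # X: (batch, n, emb_dim)
        X = X.transpose(1, 2)                  # (batch, emb_dim, n)
        pooled = [conv(X).max(dim=2).values for conv in self.convs]
        return self.fc(torch.cat(pooled, dim=1))   # x_c: (batch, out_dim)

x_c = CharCNN()(torch.randn(1, 5, 50))
print(x_c.shape)                               # torch.Size([1, 50])
```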
Further, step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
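The dictionary matching of step S301 can be sketched as follows in plain Python; the dictionary contents and the sample sentence are hypothetical:

```python
def match_lattice_words(s, D):
    """Sketch of S301: every subsequence of s found in dictionary D,
    returned with its head index b and tail index e (0-based)."""
    matches = []
    for b in range(len(s)):
        for e in range(b, len(s)):
            if s[b:e + 1] in D:
                matches.append((s[b:e + 1], b, e))
    return matches

D = {"鲸鱼", "水下餐厅"}                       # hypothetical dictionary
print(match_lattice_words("鲸鱼在水下餐厅", D))
# [('鲸鱼', 0, 1), ('水下餐厅', 3, 6)]
```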
Furthermore, the lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
Further, in step S4, connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
Further, the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer. The self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel, then concatenates the results of the heads and projects them again to produce the final value. The feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; it comprises two linear transformations with a ReLU activation function between them. The output of the self-attention sublayer is fed into the feedforward neural network for further processing.
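A minimal sketch of one such encoder layer, assuming PyTorch; the stock multi-head attention module stands in for the span-relative attention detailed below, the residual Add & Norm follows the LayerNorm(x + layer(x)) form described in the embodiments, and d_model = 160 assumes the 8 heads of dimension 20 listed in the hyperparameter table later in this document:

```python
import torch
from torch import nn

class EncoderLayer(nn.Module):
    """Sketch of the encoder layer: multi-head self-attention plus a
    two-layer feedforward network with ReLU, each sublayer wrapped in
    a residual connection and LayerNorm(x + layer(x))."""
    def __init__(self, d_model=160, n_heads=8, d_ff=640):
        super().__init__()
        self.attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        self.ffn = nn.Sequential(
            nn.Linear(d_model, d_ff), nn.ReLU(), nn.Linear(d_ff, d_model))
        self.norm1 = nn.LayerNorm(d_model)
        self.norm2 = nn.LayerNorm(d_model)

    def forward(self, x):                      # x: (batch, n, d_model)
        a, _ = self.attn(x, x, x)              # heads run in parallel
        x = self.norm1(x + a)                  # Add & Norm
        return self.norm2(x + self.ffn(x))     # Add & Norm

y = EncoderLayer()(torch.randn(1, 6, 160))
print(y.shape)                                 # torch.Size([1, 6, 160])
```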
Furthermore, the self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
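A sketch of this encoding under stated assumptions (PyTorch; the trainable $W_r$ is instantiated inline for illustration; the sine and cosine components are concatenated blockwise rather than interleaved):

```python
import torch
from torch import nn

def relative_position(head, tail, d_model=20):
    """Sketch of ST401-ST402: the four span distances and R_ij.
    head, tail: (n,) integer tensors of span boundaries in the lattice."""
    hh = head[:, None] - head[None, :]         # d^(hh)_ij
    ht = head[:, None] - tail[None, :]         # d^(ht)_ij
    th = tail[:, None] - head[None, :]         # d^(th)_ij
    tt = tail[:, None] - tail[None, :]         # d^(tt)_ij

    def p(d):                                  # sinusoidal encoding p_d
        k = torch.arange(0, d_model, 2, dtype=torch.float32)
        angle = d[..., None].float() / (10000 ** (k / d_model))
        return torch.cat([angle.sin(), angle.cos()], dim=-1)

    W_r = nn.Linear(4 * d_model, d_model)      # trainable W_r (inline here)
    four = torch.cat([p(hh), p(ht), p(th), p(tt)], dim=-1)
    return torch.relu(W_r(four))               # R_ij: (n, n, d_model)

R = relative_position(torch.tensor([0, 3]), torch.tensor([1, 6]))
print(R.shape)                                 # torch.Size([2, 2, 20])
```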
Further, in step S4, the decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
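A minimal NumPy sketch of the Viterbi search of step SP403, assuming the emission and transition scores have already been produced by the model:

```python
import numpy as np

def viterbi(emissions, transitions):
    """Sketch of SP403: find the maximum-probability tag path.
    emissions: (n, T) per-token tag scores; transitions: (T, T) scores
    of moving from tag a to tag b; both assumed given by the model."""
    n, T = emissions.shape
    score = emissions[0].copy()                # best score ending in each tag
    back = np.zeros((n, T), dtype=int)         # backpointers
    for i in range(1, n):
        total = score[:, None] + transitions + emissions[i][None, :]
        back[i] = total.argmax(axis=0)
        score = total.max(axis=0)
    path = [int(score.argmax())]
    for i in range(n - 1, 0, -1):              # follow backpointers
        path.append(int(back[i][path[-1]]))
    return path[::-1]

print(viterbi(np.random.rand(5, 3), np.random.rand(3, 3)))
```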
The invention has the following beneficial effects:
the invention provides a named entity recognition method fusing information high-level features, which obtains lattice through character-level features and text information obtained by a deep convolutional neural network, extracts the character-level features and sends the extracted character-level features and lattice representations to a full-connection network so as to obtain the optimal feature representation of each input. Then, the words are encoded through a transformer and decoded in a conditional random field so as to effectively find the relation among the words and effectively identify the entity. Compared with other models, the method has the advantages that the effectiveness of the more fine-grained features extracted in the embedding layer is improved, the features capture meaningful information from the entity, and the problem that the named entity cannot be extracted reliably and efficiently in the prior art is solved.
Drawings
Fig. 1 is a flow chart of the named entity recognition method.
Fig. 2 is a model structure of the named entity recognition method.
Fig. 3 is a flowchart illustrating step S2 of the named entity recognition method.
Detailed Description
The invention is described in detail below with reference to the drawings and specific embodiments.
Example 1
As shown in fig. 1, a named entity recognition method fusing high-level information features comprises the following specific steps:
S1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
Step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
As shown in fig. 3, step S2 comprises the following specific steps:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text; in this embodiment, the max-pooling operation is used to reduce overfitting.
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
The three filters of different sizes in step S202 have areas of 2 × 50, 3 × 50, and 4 × 50, respectively.
Example 2
As shown in fig. 1, a named entity recognition method fusing high-level information features comprises the following specific steps:
S1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
Step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
As shown in fig. 3, step S2 comprises the following specific steps:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text;
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
Step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
The lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
In step S4, connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
As shown in fig. 2, the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer. In this embodiment, each sublayer is followed by summation and normalization; in other words, the output of each sublayer is LayerNorm(x + layer(x)), where layer(x) is the output of the self-attention or feedforward network. The self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel; the results of the H heads are then concatenated and projected again to produce the final value. Each head is calculated as follows:
$\mathrm{Att}(A, V) = \mathrm{softmax}(A) V$ (13)
$A_{ij} = (Q_i + u)^T K_j + (Q_i + v)^T R_{ij} W_R$ (14)
$[Q, K, V] = E_x [W_q, W_k, W_v]$ (15)
where $E_x$ is the input vector $x_{in}$; $W_q$, $W_k$, $W_v$ are trainable matrices that project $E_x$ into different spaces; $W_R$ is a trainable matrix; $u$ and $v$ are trainable bias vectors; $Q_i$ and $K_j$ correspond to spans $x_i$ and $x_j$ respectively; $V$ is the value matrix; and $A_{ij}$ denotes the attention score. The feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; it comprises two linear transformations with a ReLU activation function between them. The output of the self-attention sublayer is fed into the feedforward neural network for further processing.
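A sketch of equations (13) to (15) for a single head, assuming PyTorch; $u$ and $v$ are the trainable bias vectors of equation (14), and all weights are passed in explicitly for clarity:

```python
import torch

def span_attention(E, R, W_q, W_k, W_v, W_R, u, v):
    """Sketch of per-head attention, eqs. (13)-(15).
    E: (n, d) input x_in; R: (n, n, d) relative position encodings;
    W_q, W_k, W_v, W_R: (d, d) trainable matrices; u, v: (d,) biases."""
    Q, K, V = E @ W_q, E @ W_k, E @ W_v        # eq. (15)
    content = (Q + u) @ K.T                    # (Q_i + u)^T K_j
    position = torch.einsum("id,ijd->ij", Q + v, R @ W_R)
    A = content + position                     # eq. (14)
    return torch.softmax(A, dim=-1) @ V        # eq. (13): Att(A, V)

n, d = 4, 20
E, R = torch.randn(n, d), torch.randn(n, n, d)
Ws = [torch.randn(d, d) for _ in range(4)]
out = span_attention(E, R, *Ws, torch.zeros(d), torch.zeros(d))
print(out.shape)                               # torch.Size([4, 20])
```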
The self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
In step S4, decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
Example 3
As shown in fig. 1, a named entity recognition method fusing high-level information features comprises the following specific steps:
S1, obtaining text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
in this embodiment, the text information to be analyzed comprises 900 marine-industry sentences usable for named entity recognition.
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
Step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the input sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
As shown in fig. 3, step S2 comprises the following specific steps:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text;
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
Step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
The lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
In step S4, connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
As shown in fig. 2, the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer. In this embodiment, each sublayer is followed by summation and normalization; in other words, the output of each sublayer is LayerNorm(x + layer(x)), where layer(x) is the output of the self-attention or feedforward network. The self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel; the results of the H heads are then concatenated and projected again to produce the final value. Each head is calculated as follows:
$\mathrm{Att}(A, V) = \mathrm{softmax}(A) V$ (13)
$A_{ij} = (Q_i + u)^T K_j + (Q_i + v)^T R_{ij} W_R$ (14)
$[Q, K, V] = E_x [W_q, W_k, W_v]$ (15)
where $E_x$ is the input vector $x_{in}$; $W_q$, $W_k$, $W_v$ are trainable matrices that project $E_x$ into different spaces; $W_R$ is a trainable matrix; $u$ and $v$ are trainable bias vectors; $Q_i$ and $K_j$ correspond to spans $x_i$ and $x_j$ respectively; $V$ is the value matrix; and $A_{ij}$ denotes the attention score. The feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; it comprises two linear transformations with a ReLU activation function between them. The output of the self-attention sublayer is fed into the feedforward neural network for further processing.
The self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
In step S4, decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
In this embodiment, experiments are designed to test the effectiveness of the named entity recognition method.
In this embodiment, a data set of 900 marine-industry sentences usable for named entity recognition is used; the entity types comprise eight classes, such as PER (person), LOC (location), and ORG (organization).
This example uses BiLSTM-CRF and FLAT as baseline models. FLAT is a method that uses a character-word lattice structure and applies a Transformer with relative position encoding to NER. For comparison with other models, the same embeddings and lexicon as Zhang are adopted, and the embeddings and lexicon are kept identical to those of FLAT throughout the comparisons.
The following table shows the hyperparameter settings of the model; stochastic gradient descent (SGD) is used as the optimization algorithm of the model.
Parameter Value
epoch 100
batch 7
Char emb size 50
Lattice emb size 50
Cnn emb size 50
Cnn dropout 0.23
Transformer layer 1
head 8
Head dim 20
Learning rate lr 0.001
Input dropout 0.5
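Read as a PyTorch configuration, the table above translates into a sketch such as the following (the one-layer model is a placeholder for the full network):

```python
import torch

# Hyperparameters transcribed from the table above.
config = {"epoch": 100, "batch": 7, "char_emb_size": 50,
          "lattice_emb_size": 50, "cnn_emb_size": 50, "cnn_dropout": 0.23,
          "transformer_layers": 1, "heads": 8, "head_dim": 20,
          "lr": 0.001, "input_dropout": 0.5}

model = torch.nn.Linear(160, 9)    # placeholder for the full network
optimizer = torch.optim.SGD(model.parameters(), lr=config["lr"])
```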
This example adopts precision (P), recall (R), and the F1 score (F1) as evaluation indexes, calculated as follows:
$P = \dfrac{TP}{TP + FP}$
$R = \dfrac{TP}{TP + FN}$
$F1 = \dfrac{2PR}{P + R}$
TP denotes the case where a word tagged as an entity is correctly identified as an entity, and FP denotes the case where a non-entity word is predicted as an entity. Further, FN denotes the case where a word tagged as an entity is not detected, and TN denotes the case where a word tagged as a non-entity is correctly detected as a non-entity. Precision is the ratio of the number of correctly predicted entities to the total number of predicted entities. Recall is the ratio of the number of correctly predicted entities to the total number of entities. The F1 score is the harmonic mean of precision and recall and reflects the overall performance of the NER model.
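A minimal sketch of these three indexes from entity-level counts (the counts in the example call are illustrative only):

```python
def prf1(tp, fp, fn):
    """Precision, recall, and F1 from entity-level counts."""
    p = tp / (tp + fp) if tp + fp else 0.0
    r = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * p * r / (p + r) if p + r else 0.0
    return p, r, f1

print(prf1(tp=130, fp=70, fn=70))   # (0.65, 0.65, 0.65); counts illustrative
```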
As shown in the table of comparative results, the named entity recognition method holds a relative advantage over the other models on these evaluation indexes.
(The comparative results table appears only as an image in the original publication.)
it can be seen from the table that the baseline FLAT model has an average F1 score of 63.16%, and that the average F1 score increased to 65.07% over all other competitive models for comparison, especially those character-based models that did not use an external dictionary, in addition to the higher level features extracted from the character-level input by the deep convolutional neural network. The deep convolutional neural network can extract the relation between each character and the preceding and following characters, so that potential words in the text are captured, the entity boundary can be defined more clearly and accurately by combining with an external dictionary, and the accuracy of entity recognition is improved.
It should be understood that the above-described embodiments of the present invention are merely examples for clearly illustrating the present invention and are not intended to limit its embodiments. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present invention shall fall within the protection scope of the claims of the present invention.

Claims (10)

1. A named entity recognition method fusing high-level information features, characterized in that the method comprises the following specific steps:
S1, acquiring text information to be analyzed, and preprocessing the text information to obtain preprocessed text features;
S2, inputting the preprocessed text features into a deep convolutional neural network to extract the character-level features of the text, and inputting the extracted character-level features into a fully connected network to obtain the optimal feature representation of the characters;
S3, obtaining a lattice from the character-level features and the text information, and embedding the lattice to obtain an integrated representation;
S4, connecting the integrated representation and the optimal feature representation, using a Transformer as the encoder, and decoding with a conditional random field to obtain the relations between words and perform entity recognition.
2. The named entity recognition method fusing high-level information features of claim 1, wherein step S1 specifically comprises: expressing the obtained text information as the sequence $s = \{c_1, c_2, \ldots, c_n\}$, where $n$ denotes the sentence length and $c_i$ is a character of the sentence $s$; each character $c_i$ of the sentence is mapped to a character vector by
$x_i = e_c(c_i)$,
where $x_i$ is the character vector and $e_c$ is a pre-trained embedding of characters and words; the preprocessed text features are thus obtained as
$X = [x_1, x_2, \ldots, x_n]$.
3. The named entity recognition method fusing high-level information features of claim 2, wherein step S2 specifically comprises:
S201, padding both ends of the preprocessed text features;
S202, scanning the padded text features with three filters of different sizes to extract a first feature, a second feature, and a third feature;
S203, applying a max-pooling operation to the first, second, and third features respectively, and concatenating the pooled features to obtain the character information of the text;
S204, inputting the obtained character information into a fully connected network to obtain the optimal feature representation $x_c$ of the character information.
4. The named entity recognition method fusing high-level information features of claim 3, wherein the sizes of the three filters in step S202 are 2 × 50, 3 × 50, and 4 × 50, respectively.
5. The named entity recognition method fusing high-level information features of claim 4, wherein step S3 specifically comprises:
S301, matching every subsequence of the text sequence $s$ against the existing dictionary $D$, where $w_{b,e}$ denotes the subsequence starting at character $b$ and ending at character $e$, $b$ and $e$ being constants, and obtaining all sequences $c_w$ in the text sequence $s$ that match the existing dictionary $D$;
S302, mapping $c_w$ to a vector: $x_w = e_c(c_w)$, where $e_c$ is the pre-trained embedding of characters and words;
S303, concatenating the preprocessed text features $X$ and the vector $x_w$ to obtain the lattice $x_L$, i.e. $x_L = [X, x_w]$;
S304, inputting $x_L$ into a fully connected layer to obtain the integrated representation $x_l$ of the text features $X$ and the vector $x_w$:
$x_l = W_L x_L + b$,
where $W_L$ is a trainable parameter.
6. The named entity recognition method fusing high-level information features of claim 5, wherein the lattice comprises spans of different lengths, and any two spans stand in one of three relations: containment, separation, or intersection.
7. The named entity recognition method fusing high-level information features of claim 6, wherein in step S4 connecting the integrated representation and the optimal feature representation specifically comprises: connecting the optimal feature representation $x_c$ with the integrated representation $x_l$, and converting the result back to the original dimension $x_{in}$ by a linear projection:
$x_{in} = \mathrm{Linear}[x_c; x_l]$,
where Linear denotes a linear function.
8. The named entity recognition method fusing high-level information features of claim 7, wherein the Transformer in step S4 comprises two sub-networks, namely a self-attention layer and a feedforward neural network layer; the self-attention layer is a multi-head attention network, which attends to the input sequence with multiple heads in parallel, then concatenates the results of the heads and projects them again to produce the final value; the feedforward neural network is a multilayer perceptron composed of successive nonlinear layers; the feedforward neural network comprises two linear transformations with a ReLU activation function between them; and the output of the self-attention sublayer is fed into the feedforward neural network for further processing.
9. The named entity recognition method fusing high-level information features of claim 8, wherein the self-attention layer adopts relative position encoding, with the following specific steps:
ST401, computing the encoding by continuous transformations of the head and tail information: head[i] and tail[i] denote the head and tail positions of span $x_i$ in the lattice, and the relation between spans $x_i$ and $x_j$ is expressed by four relative distances:
$d_{ij}^{(hh)} = \mathrm{head}[i] - \mathrm{head}[j]$
$d_{ij}^{(ht)} = \mathrm{head}[i] - \mathrm{tail}[j]$
$d_{ij}^{(th)} = \mathrm{tail}[i] - \mathrm{head}[j]$
$d_{ij}^{(tt)} = \mathrm{tail}[i] - \mathrm{tail}[j]$
where $d_{ij}^{(hh)}$ denotes the distance between the head of span $x_i$ and the head of span $x_j$, $d_{ij}^{(ht)}$ the distance between the head of $x_i$ and the tail of $x_j$, $d_{ij}^{(th)}$ the distance between the tail of $x_i$ and the head of $x_j$, and $d_{ij}^{(tt)}$ the distance between the tail of $x_i$ and the tail of $x_j$;
ST402, obtaining the relative position of the spans by a nonlinear transformation of the four distances:
$R_{ij} = \mathrm{ReLU}\left(W_r \left(p_{d_{ij}^{(hh)}} \oplus p_{d_{ij}^{(ht)}} \oplus p_{d_{ij}^{(th)}} \oplus p_{d_{ij}^{(tt)}}\right)\right)$
where $W_r$ is a trainable parameter, $\oplus$ denotes the concatenation operator, and $p_d$ is calculated as
$p_d^{(2k)} = \sin\left(d / 10000^{2k/d_{model}}\right)$
$p_d^{(2k+1)} = \cos\left(d / 10000^{2k/d_{model}}\right)$
where $d$ is one of the four distances $d_{ij}^{(hh)}$, $d_{ij}^{(ht)}$, $d_{ij}^{(th)}$, $d_{ij}^{(tt)}$, and $k$ denotes the dimension index of the position encoding.
10. The named entity recognition method fusing high-level information features of claim 9, wherein in step S4 the decoding with a conditional random field specifically comprises:
SP401, for a given text sequence $s$, creating its corresponding tag sequence $y = \{y_1, y_2, \ldots, y_n\}$, with $Y(s)$ denoting all valid tag sequences;
SP402, calculating the probability of $y$:
$P(y \mid s) = \dfrac{\exp\left(\sum_i f(y_{i-1}, y_i, s)\right)}{\sum_{y' \in Y(s)} \exp\left(\sum_i f(y'_{i-1}, y'_i, s)\right)}$
where $f(y_{i-1}, y_i, s)$ computes the transition score from $y_{i-1}$ to $y_i$ and the score of $y_i$;
SP403, combining the probabilities, finding the path with the maximum probability by the Viterbi algorithm.
CN202111510990.XA 2021-12-10 Named entity recognition method fusing high-level information features Active CN114048750B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111510990.XA CN114048750B (en) 2021-12-10 Named entity recognition method fusing high-level information features

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111510990.XA CN114048750B (en) 2021-12-10 Named entity recognition method fusing high-level information features

Publications (2)

Publication Number Publication Date
CN114048750A true CN114048750A (en) 2022-02-15
CN114048750B CN114048750B (en) 2024-06-28



Patent Citations (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106354701A (en) * 2016-08-30 2017-01-25 腾讯科技(深圳)有限公司 Chinese character processing method and device
CN109753660A (en) * 2019-01-07 2019-05-14 福州大学 A kind of acceptance of the bid webpage name entity abstracting method based on LSTM
WO2021000362A1 (en) * 2019-07-04 2021-01-07 浙江大学 Deep neural network model-based address information feature extraction method
WO2021114745A1 (en) * 2019-12-13 2021-06-17 华南理工大学 Named entity recognition method employing affix perception for use in social media
CN111783459A (en) * 2020-05-08 2020-10-16 昆明理工大学 Laos named entity recognition method based on improved transform + CRF
CN112270193A (en) * 2020-11-02 2021-01-26 重庆邮电大学 Chinese named entity identification method based on BERT-FLAT
CN112733541A (en) * 2021-01-06 2021-04-30 重庆邮电大学 Named entity identification method of BERT-BiGRU-IDCNN-CRF based on attention mechanism
CN113297851A (en) * 2021-06-21 2021-08-24 北京富通东方科技有限公司 Recognition method for confusable sports injury entity words
CN113743122A (en) * 2021-09-14 2021-12-03 河南工业大学 Grain situation named entity identification method based on new word discovery and Flat-lattice

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
石春丹; 秦岭: "Chinese Named Entity Recognition Method Based on BGRU-CRF", Computer Science (计算机科学), no. 009, 31 December 2019 (2019-12-31) *
邓博研; 程良伦: "Chinese Named Entity Recognition Method Based on ALBERT", Computer Science and Application (计算机科学与应用), no. 005, 31 December 2020 (2020-12-31) *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN115329766A (en) * 2022-08-23 2022-11-11 中国人民解放军国防科技大学 Named entity identification method based on dynamic word information fusion

Similar Documents

Publication Publication Date Title
CN113220919B (en) Dam defect image text cross-modal retrieval method and model
CN111382565B (en) Emotion-reason pair extraction method and system based on multiple labels
CN111476023B (en) Method and device for identifying entity relationship
CN111738004A (en) Training method of named entity recognition model and named entity recognition method
CN111897908A (en) Event extraction method and system fusing dependency information and pre-training language model
CN111737975A (en) Text connotation quality evaluation method, device, equipment and storage medium
CN113221567A (en) Judicial domain named entity and relationship combined extraction method
CN114169330A (en) Chinese named entity identification method fusing time sequence convolution and Transformer encoder
CN110347787B (en) Interview method and device based on AI auxiliary interview scene and terminal equipment
CN111460142B (en) Short text classification method and system based on self-attention convolutional neural network
CN111814463B (en) International disease classification code recommendation method and system, corresponding equipment and storage medium
CN112487812A (en) Nested entity identification method and system based on boundary identification
CN114020906A (en) Chinese medical text information matching method and system based on twin neural network
CN116702091B (en) Multi-mode ironic intention recognition method, device and equipment based on multi-view CLIP
CN111984780A (en) Multi-intention recognition model training method, multi-intention recognition method and related device
CN112100212A (en) Case scenario extraction method based on machine learning and rule matching
CN114925157A (en) Nuclear power station maintenance experience text matching method based on pre-training model
CN113704396A (en) Short text classification method, device, equipment and storage medium
CN112052319A (en) Intelligent customer service method and system based on multi-feature fusion
CN114564950A (en) Electric Chinese named entity recognition method combining word sequence
CN116822513A (en) Named entity identification method integrating entity types and keyword features
CN117009516A (en) Converter station fault strategy model training method, pushing method and device
CN115965026A (en) Model pre-training method and device, text analysis method and device and storage medium
CN115994220A (en) Contact net text data defect identification method and device based on semantic mining
CN114048750B (en) Named entity recognition method fusing high-level information features

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant