CN110765775A - Self-adaptive method for named entity recognition field fusing semantics and label differences - Google Patents

Self-adaptive method for named entity recognition field fusing semantics and label differences

Info

Publication number
CN110765775A
Authority
CN
China
Prior art keywords
sentence
sentences
character
vector
representation
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201911059048.9A
Other languages
Chinese (zh)
Other versions
CN110765775B (en)
Inventor
李思
王蓬辉
徐雅静
李明正
孙忆南
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing University of Posts and Telecommunications
Original Assignee
Beijing University of Posts and Telecommunications
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing University of Posts and Telecommunications
Priority to CN201911059048.9A
Publication of CN110765775A
Application granted
Publication of CN110765775B
Active legal status
Anticipated expiration legal status

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00 Computing arrangements based on biological models
    • G06N3/02 Neural networks
    • G06N3/04 Architecture, e.g. interconnection topology
    • G06N3/044 Recurrent networks, e.g. Hopfield networks
    • G06N3/045 Combinations of networks
    • G06N3/08 Learning methods
    • G06N3/084 Backpropagation, e.g. using gradient descent

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Data Mining & Analysis (AREA)
  • General Health & Medical Sciences (AREA)
  • Biomedical Technology (AREA)
  • Biophysics (AREA)
  • Computational Linguistics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Evolutionary Computation (AREA)
  • Artificial Intelligence (AREA)
  • Molecular Biology (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Software Systems (AREA)
  • Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method that expands the training data of a target domain by selecting positive-sample sentences from source-domain data, fusing the semantic differences and label differences between sentences in the source domain and the target domain, so as to enhance named entity recognition performance in the target domain. On the basis of a conventional Bi-LSTM + CRF model, the semantic and label differences are introduced through the state representation and reward settings of reinforcement learning. The trained decision network can therefore select, from the source-domain data, sentences that positively influence named entity recognition in the target domain; these sentences expand the target-domain training data, alleviating the shortage of target-domain training data and improving named entity recognition performance in the target domain.

Description

Self-adaptive method for named entity recognition field fusing semantics and label differences
Technical Field
The invention relates to the technical field of the internet, and in particular to a method that fuses the semantic differences and label differences between domains to perform domain migration for the named entity recognition task.
Background
In recent years, deep learning and machine learning have made great progress in computer vision and natural language processing. In computer vision, deep neural networks are used to classify images, for example convolutional neural networks for recognizing handwritten digits, which have achieved accuracy exceeding human performance on that task. In natural language processing, deep learning is applied in ever more scenarios, for example analyzing a user's browsing records and consumption behavior with a neural network to recommend products the user may like, or training a translation system on a large parallel corpus so that the machine reaches a high level of translation ability. As the number of internet users grows, more and more information is generated, and automatically extracting useful information from this mass of user information is of great importance. Chinese named entity recognition, as an upstream task of information extraction, is critical to the development of information extraction technology.
Chinese named entity recognition refers to recognizing entities with specific meanings in text, generally including names of people, places, times, and so on. Named entity recognition is performed on text because many downstream tasks need the entity information it contains: information extraction is directly concerned with the entities in the text, and relation extraction must first identify the entities before determining the relations between them; named entity recognition is likewise of great significance for machine translation and knowledge graph construction.
Chinese named entity recognition typically involves two processes: (1) determining the boundaries of an entity; (2) identifying the type of the entity. Generally, named entity recognition is treated as a sequence labeling problem, with a labeling scheme that marks both the type and the boundaries of each entity. Traditional methods for named entity recognition include maximum entropy models, support vector machines, and conditional random fields. In recent years, deep learning methods such as recurrent neural networks and convolutional neural networks have also been widely applied to Chinese named entity recognition, achieving high accuracy on several large corpora.
Deep learning allows a neural network to capture the features of data automatically, but a large amount of data is usually required to achieve high accuracy. In Chinese named entity recognition, however, the existing large corpora cover only the news domain, and annotated corpora in domains such as microblogs are scarce, so a neural network trained there cannot reach good accuracy. In recent years, to improve accuracy, transfer learning has been adopted for named entity recognition tasks in domains such as microblogs, improving model performance mainly by means of large-scale corpora from outside domains such as news.
In domain migration, a corpus with large-scale annotations is called source-domain data, and a corpus with no annotations or only a small number of annotations is called target-domain data. Domain migration using unlabeled target-domain data is called unsupervised domain migration, and domain migration using a small amount of labeled target-domain data is called semi-supervised domain migration.
Domain migration for Chinese named entity recognition faces two problems: first, the sentence semantics of the corpora differ greatly; second, the label sets of the corpora differ, a consequence of different annotation rules. To address these problems, existing domain migration techniques migrate either on the basis of the semantic vectors of sentences in different corpora, or on the basis of the conversion relationships between the labels used in different corpora.
In the article "A Unified Model for Cross-Domain and Semi-Supervised Named EntityRecognization in Chinese Social Media", authors perform Domain migration for Named entity recognition based on the similarity between sentences in the source corpus and the target Domain corpus.
First, a word vector model is trained on a large number of sentences from an unlabeled corpus to obtain a pre-trained word vector dictionary; each word of the source domain and the target domain is then looked up in this dictionary to obtain its word vector; next, all word vectors in a sentence are averaged to form the vector representation of that sentence; finally, the learning rate applied when training on each sentence is calculated according to the following formula.
α(x) = α₀(x) · func(x, IN)
[equation image defining func(x, IN); not reproduced]
where v_x is the sentence vector of a source-domain sentence, func(x, IN) weights the sentence by its similarity to the in-domain corpus IN, α₀ is the learning rate for target-domain sentences, and C is an adjustable parameter.
In the article "Named Entity Recognition for Novel Types by Transfer Learning", authors propose to use a two-layer linear network to learn the correlation between the labels of the source domain and the target domain for domain migration.
First, a named entity recognition model is trained with a large amount of source-domain data; then the correlation between source-domain and target-domain labels is learned with the two-layer linear network; finally, a conditional random field is trained with target-domain data to obtain the output labels of the target domain.
The inventor found in the course of research that the prior art of "A Unified Model for Cross-Domain and Semi-Supervised Named Entity Recognition in Chinese Social Media" and "Named Entity Recognition for Novel Types by Transfer Learning" has the following shortcomings:
1. Conventionally, whether a current source-domain sentence benefits the training of the target-domain named entity recognition model is judged only by the semantic similarity between the source domain and the target domain, without considering the influence of differing entity labels.
2. When migration relies on the label transfer relationship between the source domain and the target domain, the case in which the semantic vectors of source-domain and target-domain sentences differ too greatly is not fully considered.
Disclosure of Invention
In order to solve the above technical problems, the invention provides a method for domain migration of the named entity task that fuses semantic and label differences. The semantic and label differences between the source domain and the target domain are introduced through the state representation and reward settings of deep reinforcement learning, a decision network is trained, and source-domain data are selectively added to the training process, so that positive-sample data in the source domain enhance named entity recognition in the target domain while the influence of negative-sample data in the source domain is avoided.
The invention provides a method for performing domain migration by fusing the semantic differences and label differences of texts in a source domain and a target domain: a decision network trained by reinforcement learning selectively adds source-domain data so as to enhance named entity recognition performance in the target domain.
The method comprises the following steps.
Step one, preprocessing the sentences in the corpora of the source domain and the target domain, removing web addresses and special symbols, and converting traditional Chinese characters into simplified Chinese.
Step two, processing the labels of the sentences in the source-domain corpus and unifying the entity label sets of the target domain and the source domain.
Step three, mapping the sentences of the source domain and the target domain into vector representations according to the same dictionary, digitizing the input text into a numerical matrix formed by concatenating the character vectors column by column.
Step four, to enhance the character vector representation, splicing each character's word segmentation tag and bigram vector after its character vector to introduce word-level information and segmentation information.
Step five, extracting a context-dependent feature vector for each character with a bidirectional long short-term memory network (Bi-LSTM), and obtaining the probability of each entity label for each character with a linear layer.
Step six, decoding with a conditional random field (CRF) to obtain the final label of each character and form the output label sequence.
Step seven, performing steps one to six with the target-domain corpus to obtain the named entity recognition model of the target domain.
Step eight, using the model obtained in step seven to compute the state representation and the current reward of each sentence in the source domain.
Step nine, the decision network takes an action according to the state representation of the current source-domain sentence and decides whether to add the sentence to the training data; the loss function of the decision network is then calculated from each sentence's reward and the gradient is back-propagated.
Step ten, combining the source-domain sentences selected by the decision network with the target-domain sentences to obtain the expanded training data, and continuing to train the named entity recognition model of the target domain.
Step eleven, repeating steps eight to ten continuously, selecting the model that achieves the highest F value on the development set, testing it, and saving the model.
Further, in the non-training case, steps one to ten are replaced by:
Step one, a sentence in the target-domain corpus is used as input to the trained named entity recognition model;
Step two, each character of the sentence is mapped to its corresponding vector representation using the character vector dictionary from training;
Step three, the vector representation of each sentence is input into the bidirectional long short-term memory network to obtain each sentence's context-dependent feature representation;
Step four, the obtained feature representation of the sentence is input into a linear layer to obtain the prediction probability of each label for every character in the sentence;
Step five, the label prediction probabilities of each character are input into the conditional random field and decoded into the optimal sequence to obtain the named entity recognition result, as sketched below.
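The inference path of steps one to five can be sketched as follows. This is a minimal sketch, not the patented configuration: the third-party pytorch-crf package is used for the conditional random field, and the layer sizes and vocabulary size are illustrative assumptions.

```python
import torch
import torch.nn as nn
from torchcrf import CRF  # third-party pytorch-crf package (assumed available)

class NERModel(nn.Module):
    """Bi-LSTM + linear layer + CRF, mirroring steps two to five above."""
    def __init__(self, vocab_size, emb_dim=100, hidden_dim=128, num_tags=9):
        super().__init__()
        self.embed = nn.Embedding(vocab_size, emb_dim)        # character vector dictionary
        self.bilstm = nn.LSTM(emb_dim, hidden_dim,
                              batch_first=True,
                              bidirectional=True)             # context-dependent features
        self.linear = nn.Linear(2 * hidden_dim, num_tags)     # per-character tag scores
        self.crf = CRF(num_tags, batch_first=True)            # optimal-sequence decoding

    def decode(self, char_ids):
        h, _ = self.bilstm(self.embed(char_ids))
        return self.crf.decode(self.linear(h))   # list of best tag sequences

model = NERModel(vocab_size=5000)
tags = model.decode(torch.randint(0, 5000, (1, 12)))   # one sentence of 12 characters
```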
Further, in step three, mapping the Chinese characters of the target domain and the source domain into vector representations with the same dictionary comprises:
randomly initializing a mapping dictionary with a word embedding method, so that identical characters receive the same randomly initialized dense vector representation, and mapping each Chinese character of the corpus data to its dense vector representation through the mapping dictionary; or
training word vectors with the Skip-Gram or Continuous Bag-of-Words (CBOW) model to obtain vector representations containing a certain amount of word information, and mapping each Chinese character of the corpus data to its dense vector representation through the mapping dictionary. A sketch of such pre-training follows.
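For illustration, character vectors can be pre-trained with gensim's Word2Vec, where sg=1 selects Skip-Gram and sg=0 selects CBOW; treating each character as a token and the toy corpus below are assumptions for the sketch, not part of the original disclosure.

```python
from gensim.models import Word2Vec

# Each training "sentence" is a list of characters, so character vectors are learned.
corpus = [list('北京邮电大学位于北京'), list('命名实体识别是信息抽取的上游任务')]

model = Word2Vec(sentences=corpus, vector_size=100, window=5,
                 min_count=1, sg=1)        # sg=1: Skip-Gram; sg=0: CBOW
vec = model.wv['北']                        # dense vector representation of one character
```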
Further, in step four, in order to enhance the character vector representation, the word segmentation information and the bigram information are added after each character vector, specifically as follows:
x_i = [c_i : b_i : seg_i]
where c_i is the character vector of the i-th character in the sentence, b_i is the corresponding bigram vector, and seg_i is the word segmentation tag.
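A sketch of this concatenation in PyTorch follows; the embedding dimensions, vocabulary sizes, and the four-tag segmentation scheme are assumptions for illustration.

```python
import torch
import torch.nn as nn

class CharFeatures(nn.Module):
    """Build x_i = [c_i : b_i : seg_i] from character, bigram and segmentation-tag ids."""
    def __init__(self, n_chars, n_bigrams, n_seg_tags=4,
                 char_dim=100, bigram_dim=100, seg_dim=20):
        super().__init__()
        self.char_emb = nn.Embedding(n_chars, char_dim)        # c_i
        self.bigram_emb = nn.Embedding(n_bigrams, bigram_dim)  # b_i
        self.seg_emb = nn.Embedding(n_seg_tags, seg_dim)       # seg_i (e.g. a BMES scheme)

    def forward(self, char_ids, bigram_ids, seg_ids):
        return torch.cat([self.char_emb(char_ids),
                          self.bigram_emb(bigram_ids),
                          self.seg_emb(seg_ids)], dim=-1)      # (batch, seq_len, 220)
```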
Further, in step five, the numerical matrix is input into the bidirectional long short-term memory network to obtain the feature representation; the calculation process is as follows:
f_t = σ(W_f · [h_{t-1} : x_t] + b_f)
i_t = σ(W_i · [h_{t-1} : x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1} : x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(W_o · [h_{t-1} : x_t] + b_o)
h_t = o_t * tanh(C_t)
where f_t, i_t and C̃_t denote the outputs of the forget gate, the memory gate and the temporary (candidate) cell state, respectively; C_t is the cell state at the current time; o_t is the output of the output gate; and h_t, the hidden-layer output, is taken as the feature representation of each character.
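For reference, the gate equations above map line by line onto the following single-step sketch; it is didactic only, since in practice nn.LSTM(bidirectional=True) computes the same recurrence in both directions.

```python
import torch

def lstm_step(x_t, h_prev, C_prev, W_f, b_f, W_i, b_i, W_C, b_C, W_o, b_o):
    """One step of the LSTM recurrence, mirroring the equations above (unbatched)."""
    z = torch.cat([h_prev, x_t])                # [h_{t-1} : x_t]
    f_t = torch.sigmoid(W_f @ z + b_f)          # forget gate
    i_t = torch.sigmoid(W_i @ z + b_i)          # memory (input) gate
    C_tilde = torch.tanh(W_C @ z + b_C)         # temporary (candidate) cell state
    C_t = f_t * C_prev + i_t * C_tilde          # current cell state
    o_t = torch.sigmoid(W_o @ z + b_o)          # output gate
    h_t = o_t * torch.tanh(C_t)                 # hidden state = character feature
    return h_t, C_t
```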
Further, the state representation and the reward of the source-domain sentences in step eight are calculated as follows:
s_t = (h_1 + h_2 + … + h_n) / n
reward = log P(Y|X)
where h_1, h_2, …, h_n are the outputs of the bidirectional long short-term memory network for the out-of-domain (source-domain) sentence, and P(Y|X) is the probability of the label sequence obtained by conditional random field decoding; s_t is the state representation of the sentence, and reward is the reward the current sentence receives in the named entity recognition model.
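With the NERModel sketched earlier, the state and reward could be computed as below; pairing the pytorch-crf log-likelihood with log P(Y|X) is an assumption consistent with the formulas above.

```python
import torch

def state_and_reward(model, char_ids, tags):
    """State s_t = mean of Bi-LSTM outputs; reward = log P(Y|X) from the CRF."""
    h, _ = model.bilstm(model.embed(char_ids))             # (batch, seq_len, 2*hidden)
    s_t = h.mean(dim=1)                                    # (h_1 + ... + h_n) / n
    emissions = model.linear(h)
    reward = model.crf(emissions, tags, reduction='none')  # per-sentence log-likelihood
    return s_t, reward
```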
Further, the decision network in step nine makes its judgment as follows:
a = softmax(W · s_t + b)
where W and b are the weight parameters of the selector, softmax is the normalization operation, and a ∈ R^{2×1} is the action output by the selector. A multilayer perceptron is adopted as the decision network, which takes a corresponding action a according to the current state of each sentence: if a_0 > 0.5, the sentence is selected and added to the training data, otherwise it is discarded. The corresponding reward is obtained at the same time, the loss function of the decision network is calculated, and reverse gradient propagation is performed.
The loss function is calculated as follows:
Loss = -reward · (a_0·log a_0 + (1 - a_0)·log(1 - a_0)) + L_1 + L_2
where L_1 and L_2 are the L1 and L2 regularization terms of the selector, and reward is the reward the current sentence receives in the named entity recognition model.
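A sketch of the decision network and this loss follows; the hidden width, the small epsilon for numerical stability, and the L2-only regularization are assumptions (an L1 term would be added analogously).

```python
import torch
import torch.nn as nn

class DecisionNetwork(nn.Module):
    """Multilayer-perceptron selector producing a = softmax(W·s_t + b)."""
    def __init__(self, state_dim, hidden_dim=64):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(state_dim, hidden_dim), nn.Tanh(),
                                 nn.Linear(hidden_dim, 2))

    def forward(self, s_t):
        return torch.softmax(self.mlp(s_t), dim=-1)   # a in R^2; a_0 = P(select)

def selector_loss(a0, reward, params, l2_weight=1e-4, eps=1e-8):
    """Loss = -reward * (a0*log a0 + (1-a0)*log(1-a0)) + regularization."""
    loss = -reward * (a0 * torch.log(a0 + eps) + (1 - a0) * torch.log(1 - a0 + eps))
    loss = loss + l2_weight * sum(p.pow(2).sum() for p in params)  # L2 term
    return loss.mean()
```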
The invention provides a method for domain migration of the named entity recognition task that fuses semantic and label differences. A neural network is used as the decision network in reinforcement learning, thereby avoiding the problem of the infinite state space in natural language processing; meanwhile, the semantic and label differences between the source domain and the target domain are introduced through the state representation and reward settings of reinforcement learning to train the decision network, so that the decision network can select sentences that positively influence the target-domain named entity recognition model, realizing instance-based domain migration for Chinese named entity recognition.
Drawings
FIG. 1 is a flow chart of a first embodiment;
fig. 2 is a network structure diagram of a domain migration method for fusing semantic and tag differences on a named entity recognition task according to the present invention.
Detailed Description
In order to make the technical solutions of the present invention better understood, the technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention. Wherein, the abbreviations and key terms appearing in this embodiment are defined as follows:
BP: Back Propagation;
CRF: Conditional Random Field;
Bi-LSTM: Bidirectional Long Short-Term Memory neural network.
Embodiment one
Referring to fig. 1 and fig. 2, the present invention provides a method for domain migration of the named entity recognition task that fuses semantic and label differences; specifically, during training, the method includes:
the method comprises the steps of firstly, preprocessing sentences in a corpus of a source domain and a target domain, removing websites and special symbols in the source domain and the target domain, and converting complex and simplified sentences into Chinese simplified sentences.
Step two, processing the labels of the sentences in the source-domain corpus and unifying the entity label sets of the source domain and the target domain. Specifically, the PER tag in the source domain is changed to per.nam, the LOC tag to loc.nam, and the ORG tag to org.nam; the O tag is unchanged. This mapping can be sketched as follows.
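A minimal sketch of this tag-set unification is given below; the handling of B-/I- prefixes is an assumption about the labeling scheme, which the patent does not spell out.

```python
# Entity-type mapping described above; casing follows the text of this embodiment.
TAG_MAP = {'PER': 'per.nam', 'LOC': 'loc.nam', 'ORG': 'org.nam'}

def unify_tags(tags):
    """Map source-domain entity tags onto the target-domain tag set; 'O' is unchanged."""
    unified = []
    for tag in tags:
        if tag == 'O':
            unified.append(tag)
        else:
            prefix, _, etype = tag.partition('-')   # e.g. 'B-PER' -> ('B', 'PER')
            unified.append(f'{prefix}-{TAG_MAP.get(etype, etype)}')
    return unified

print(unify_tags(['B-PER', 'I-PER', 'O']))  # ['B-per.nam', 'I-per.nam', 'O']
```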
Step three, mapping the sentences of the source domain and the target domain into vector representations according to the same dictionary, digitizing the input text into a numerical matrix formed by concatenating the character vectors column by column.
Further, a randomly initialized mapping dictionary may be used: with a word embedding method, identical characters are randomly initialized to the same dense vector representation, and each Chinese character of the corpus data is then mapped to its dense vector representation through the mapping dictionary.
Alternatively, the word vectors are trained with a GloVe model, yielding vector representations that contain a certain amount of word information, and each Chinese character of the corpus data is mapped to its dense vector representation through the mapping dictionary.
In this embodiment, a large amount of unlabeled target-domain corpus and source-domain data obtained by a web crawler is used to pre-train the word vector model and build the character vector mapping dictionary, so that the same character is always mapped to the same vector; characters that do not appear in the dictionary are randomly initialized.
Step four, to enhance the character vector representation, splicing each character's word segmentation tag and bigram vector after its character vector to introduce word-level information and segmentation information.
Specifically, the segmentation information and bigram information are appended after each character vector as follows:
x_i = [c_i : b_i : seg_i]
where c_i is the character vector of the i-th character in the sentence, b_i is the corresponding bigram vector, and seg_i is the word segmentation tag; the word segmentation tool of (Neural Word Segmentation with Rich Pretraining, Yang et al. 2017a) is adopted to segment the target-domain corpus.
Step five, extracting a context-dependent feature vector for each character with a bidirectional long short-term memory network (Bi-LSTM), and obtaining the probability of each entity label for each character with a linear layer.
The numerical matrix is input into the bidirectional long short-term memory network to obtain the feature representation; the calculation process is as follows:
f_t = σ(W_f · [h_{t-1} : x_t] + b_f)
i_t = σ(W_i · [h_{t-1} : x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1} : x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(W_o · [h_{t-1} : x_t] + b_o)
h_t = o_t * tanh(C_t)
where f_t, i_t and C̃_t denote the outputs of the forget gate, the memory gate and the temporary (candidate) cell state, respectively; C_t is the cell state at the current time; o_t is the output of the output gate; and h_t, the hidden-layer output, is taken as the feature representation of each character.
Step six, decoding with a conditional random field (CRF) to obtain the final label of each character and form the output label sequence.
Step seven, performing steps one to six with the target-domain corpus to obtain the named entity recognition model of the target domain.
Step eight, using the named entity recognition model obtained in step seven to compute the state representation and the current reward of the sentences in the source domain.
The state representation and the reward of the source domain sentence are calculated as follows:
s_t = (h_1 + h_2 + … + h_n) / n
reward = log P(Y|X)
where h_1, h_2, …, h_n are the outputs of the bidirectional long short-term memory network for the source-domain sentence, and P(Y|X) is the probability of the label sequence obtained by conditional random field decoding; s_t is the state representation of the sentence, and reward is the reward the current sentence receives in the named entity recognition model.
Step nine, the decision network takes an action according to the state representation of the current sentence and decides whether to add the sentence to the training data; the loss function of the decision network is then calculated from each sentence's reward and the gradient is back-propagated.
The decision network in step nine makes its judgment as follows:
a = softmax(W · s_t + b)
where W and b are the weight parameters of the decision network, softmax is the normalization operation, and a ∈ R^{2×1} is the action output by the decision network. A multilayer perceptron is adopted as the decision network, which takes a corresponding action a according to the current state of each sentence: if a_0 > 0.5, the sentence is selected and added to the training data; the corresponding reward is then obtained, the loss function is calculated, and reverse gradient propagation is performed.
The loss function is calculated as follows:
Loss = -reward · (a_0·log a_0 + (1 - a_0)·log(1 - a_0)) + L_1 + L_2
where L_1 and L_2 are the regularization terms of the decision network, and reward is the reward the current sentence receives in the named entity recognition model.
Step ten, combining the source-domain sentences selected by the decision network with the sentences of the target-domain corpus to obtain the expanded training data, and continuing to train the named entity recognition model of the target domain.
Step eleven, repeating steps eight to ten continuously, selecting the model that achieves the highest F value on the development set, testing it, and saving the model.
In the non-training case, steps one to ten are replaced by:
Step one, the sentences of the target-domain corpus are used as input to the neural network;
Step two, each character of a sentence in the target-domain corpus is mapped to its corresponding vector representation using the character vector dictionary from training;
Step three, the vector representation of each sentence is input into the bidirectional long short-term memory network (Bi-LSTM) to obtain each sentence's context-dependent feature representation;
Step four, the obtained feature representation of the sentence is input into a linear layer to obtain the prediction probability of each label for every character in the sentence;
Step five, the label prediction probabilities of each character are input into a conditional random field (CRF) and decoded into the optimal sequence, completing entity recognition.
In a preferred embodiment, each character in a sentence is mapped to a dense vector of dimension n, and the features of each character are extracted by the bidirectional long short-term memory network. For source-domain data, the state of each sentence is input into the decision network trained by reinforcement learning to obtain the corresponding action and reward; the action determines whether the current sentence is added to the training data, while the loss of the decision network is computed from the returned reward, back-propagated, and used to update the decision network. Sentences of the target domain are added to the training data directly, without selection. The named entity recognition model is then retrained on the resulting training data, the corresponding loss is computed and back-propagated, and the model parameters are updated. One possible realization of this loop, reusing the helper sketches above, follows.
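This is a compressed, illustrative realization of the procedure described in the preceding paragraph, reusing the NERModel, state_and_reward, DecisionNetwork and selector_loss sketches above; the batch size of 1 and the optimizer handling are assumptions.

```python
import torch

def train_with_selection(ner_model, policy, source_data, target_data,
                         policy_opt, ner_opt, rounds=10):
    """Alternate source-sentence selection (steps eight and nine) with retraining (step ten)."""
    for _ in range(rounds):
        selected = []
        for char_ids, tags in source_data:                  # each shaped (1, seq_len)
            with torch.no_grad():                           # state/reward come from the NER model
                s_t, reward = state_and_reward(ner_model, char_ids, tags)
            a = policy(s_t)
            if a[0, 0].item() > 0.5:                        # a_0 > 0.5: keep this source sentence
                selected.append((char_ids, tags))
            loss = selector_loss(a[0, 0], reward[0], policy.parameters())
            policy_opt.zero_grad(); loss.backward(); policy_opt.step()
        # Retrain the NER model on target data plus the selected source sentences.
        for char_ids, tags in list(target_data) + selected:
            h, _ = ner_model.bilstm(ner_model.embed(char_ids))
            nll = -ner_model.crf(ner_model.linear(h), tags)  # negative log-likelihood
            ner_opt.zero_grad(); nll.backward(); ner_opt.step()
```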
The invention provides a method for domain migration of the named entity recognition task that fuses semantic and label differences. A neural network is used as the decision network in reinforcement learning, thereby avoiding the problem of the infinite state space in natural language processing; the semantic and label differences between the source domain and the target domain are introduced through the state representation and reward settings of reinforcement learning to train the decision network, so that the decision network can select sentences that positively influence the target-domain named entity recognition model; by exploiting existing large-scale annotated data, the accuracy of named entity recognition in the target domain is improved and the burden of manually annotating corpora is relieved.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
The above description is only for the specific embodiments of the present invention, but the scope of the present invention is not limited thereto, and any person skilled in the art can easily conceive of the changes or substitutions within the technical scope of the present invention, and all the changes or substitutions should be covered within the scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the appended claims.

Claims (7)

1. A named entity recognition field self-adaptive method fusing semantic and label differences is characterized in that the semantic and label differences are introduced in a deep reinforcement learning mode, a decision network is trained, data of a source field are selectively added, and training data of a target field are expanded, and the method comprises the following steps:
(1) preprocessing the sentences in the target corpus to remove web addresses and special symbols, and performing traditional-to-simplified conversion so that all sentences in the target corpus are converted into simplified Chinese;
(2) processing labels of sentences in the corpus of the source field and the target field, and unifying label sets of entities in different corpora;
(3) the method comprises the steps of mapping sentences in a source field and sentences in a target field into vector representations according to the same dictionary, and digitizing input texts into a numerical matrix formed by connecting character vector columns;
(4) in order to enhance the representation of the character vector, the word segmentation label and the bigram vector of each character are spliced behind the character vector to introduce word level information and word segmentation information;
(5) extracting context-related feature vectors of each character by adopting a Bidirectional Long-Short Term Memory neural network (Bi-LSTM), and obtaining the probability of various entity labels of each character by using a linear layer;
(6) decoding by adopting a Conditional Random Field (CRF) to obtain a final label of each word and form an output label sequence;
(7) carrying out the operations of the steps (1) to (6) by using a target corpus to obtain a named entity recognition model trained by a target domain;
(8) obtaining the state representation and the current reward of each sentence in the source field in a reinforcement learning mode by adopting the named entity recognition model obtained in the step (7);
(9) training a decision network by using a deep reinforcement learning mode, wherein the decision network makes corresponding actions according to the state representation of the current sentence, judges whether the current sentence is added into training data or not, and then obtains rewards after the actions are executed for calculating a loss function of the decision network and performing gradient back propagation;
(10) combining positive samples in the source field selected by the decision network with sentences in the target field corpus, expanding training data, and continuing to train the named entity recognition model of the target field;
(11) repeating steps (8) to (10) continuously, selecting the model that achieves the highest F value on the target-domain development set, performing the model test, and saving the model.
2. The method of claim 1, wherein in the non-training case, steps (1) - (10) are replaced with:
(2.1) taking sentences in the target domain corpus as input of the trained named entity recognition model of the target domain;
(2.2) mapping sentences in the target domain corpus to corresponding vector representations through a dictionary by utilizing the character vector dictionary in the training process;
(2.3) inputting the vector representation of each sentence into the bidirectional long-short term memory neural network, and acquiring the feature representation of each sentence related to the context;
(2.4) inputting the obtained characteristic representation of the sentence into a linear layer to obtain the prediction probability of various labels of each character in the sentence;
(2.5) inputting the label prediction probability of each character into the conditional random field and decoding the optimal sequence to obtain the named entity recognition result for each sentence.
3. The method of claim 1, wherein in step (3), mapping the chinese characters of the target domain and the source domain into a vector representation using the same dictionary comprises:
(3.1) randomly initializing a mapping dictionary, adopting a word embedding method, randomly initializing the same dense vector representation for the same characters, and mapping each Chinese character of the corpus data into dense vector representation through the mapping dictionary;
(3.2) training word vectors with the Skip-Gram or Continuous Bag-of-Words (CBOW) model to obtain vector representations containing a certain amount of word information, and mapping each Chinese character of the corpus data into a dense vector representation through the mapping dictionary.
4. The method according to claim 1, wherein in step (4), in order to enhance the representation of the character-level vectors, the word segmentation tag information and bigram information are added after each character vector, specifically as follows:
x_i = [c_i : b_i : seg_i]
where c_i is the character vector of the i-th character in the sentence, b_i is the corresponding bigram vector, and seg_i is the word segmentation tag.
5. The method as claimed in claim 1, wherein in the step (5), the numerical matrix is input into the bidirectional long-short term memory neural network to obtain the feature representation, and the calculation process is as follows:
f_t = σ(W_f · [h_{t-1} : x_t] + b_f)
i_t = σ(W_i · [h_{t-1} : x_t] + b_i)
C̃_t = tanh(W_C · [h_{t-1} : x_t] + b_C)
C_t = f_t * C_{t-1} + i_t * C̃_t
o_t = σ(W_o · [h_{t-1} : x_t] + b_o)
h_t = o_t * tanh(C_t)
where f_t, i_t and C̃_t represent the outputs of the forget gate, the memory gate and the temporary (candidate) cell state, respectively; C_t is the cell state at the current time; o_t is the output of the output gate; and h_t, the hidden-layer output, is taken as the feature representation of each character.
6. The method as claimed in claim 1, wherein the state representation and the award of the source domain sentence in the step (8) are calculated as follows:
s_t = (h_1 + h_2 + … + h_n) / n
reward = log P(Y|X)
where h_1, h_2, …, h_n are the hidden-layer states output by the bidirectional long short-term memory network for a source-domain sentence, and P(Y|X) is the probability, obtained by conditional random field decoding, of outputting the true label sequence of the current sentence; the summed Bi-LSTM hidden-layer states are taken as the state representation s_t of the current sentence in the target-domain named entity recognition model, and, in order to select sentences that positively influence the target-domain named entity recognition model, the conditional probability of the corresponding true label sequence output by the conditional random field of the target-domain model is taken as the reward.
7. The method of claim 1, wherein the selector in step (9) makes its judgment in the following manner:
a = softmax(W · s_t + b)
where W and b are the parameters of the decision network, softmax is the normalization operation, and a ∈ R^{2×1} is the action of the decision network in the current state; a multilayer perceptron is taken as the decision network, which makes a corresponding action a according to the current state of each sentence; if a_0 > 0.5, the source-domain sentence is selected and added to the training data, otherwise it is discarded; the reward that the current sentence finally obtains in the named entity recognition model of the target domain is then used to calculate the loss function, and reverse gradient propagation is performed;
the loss function of the decision network is calculated as follows:
Loss = -reward · (a_0·log a_0 + (1 - a_0)·log(1 - a_0)) + L_1 + L_2
where L_1 and L_2 are the L1 and L2 regularization terms of the decision network, and reward is the reward the current sentence receives in the named entity recognition model.
CN201911059048.9A 2019-11-01 2019-11-01 Self-adaptive method for named entity recognition field fusing semantics and label differences Active CN110765775B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911059048.9A CN110765775B (en) 2019-11-01 2019-11-01 Self-adaptive method for named entity recognition field fusing semantics and label differences

Publications (2)

Publication Number Publication Date
CN110765775A true CN110765775A (en) 2020-02-07
CN110765775B CN110765775B (en) 2020-08-04

Family

ID=69335232

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911059048.9A Active CN110765775B (en) 2019-11-01 2019-11-01 Self-adaptive method for named entity recognition field fusing semantics and label differences

Country Status (1)

Country Link
CN (1) CN110765775B (en)

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664589A (en) * 2018-05-08 2018-10-16 苏州大学 Text message extracting method, device, system and medium based on domain-adaptive
CN108874997A (en) * 2018-06-13 2018-11-23 广东外语外贸大学 A kind of name name entity recognition method towards film comment
CN109871538A (en) * 2019-02-18 2019-06-11 华南理工大学 A kind of Chinese electronic health record name entity recognition method
CN109871541A (en) * 2019-03-06 2019-06-11 电子科技大学 It is a kind of suitable for multilingual multi-field name entity recognition method
CN110175227A (en) * 2019-05-10 2019-08-27 神思电子技术股份有限公司 A kind of dialogue auxiliary system based on form a team study and level reasoning
CN110209770A (en) * 2019-06-03 2019-09-06 北京邮电大学 A kind of name entity recognition method based on policy value network and tree search enhancing
CN110196980A (en) * 2019-06-05 2019-09-03 北京邮电大学 A kind of field migration based on convolutional network in Chinese word segmentation task

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
LIZHEN QU et al.: "Named Entity Recognition for Novel Types by Transfer Learning", Proceedings of the 2016 Conference on Empirical Methods in Natural Language Processing *
陈佳沣 et al.: "Joint extraction model of entity relations based on reinforcement learning" (基于强化学习的实体关系联合抽取模型), Journal of Computer Applications (计算机应用) *

Cited By (25)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111522965A (en) * 2020-04-22 2020-08-11 重庆邮电大学 Question-answering method and system for entity relationship extraction based on transfer learning
CN111666734A (en) * 2020-04-24 2020-09-15 北京大学 Sequence labeling method and device
CN111611802B (en) * 2020-05-21 2021-08-31 苏州大学 Multi-field entity identification method
CN111611802A (en) * 2020-05-21 2020-09-01 苏州大学 Multi-field entity identification method
CN111738003A (en) * 2020-06-15 2020-10-02 中国科学院计算技术研究所 Named entity recognition model training method, named entity recognition method, and medium
CN111738003B (en) * 2020-06-15 2023-06-06 中国科学院计算技术研究所 Named entity recognition model training method, named entity recognition method and medium
CN111767718A (en) * 2020-07-03 2020-10-13 北京邮电大学 Chinese grammar error correction method based on weakened grammar error feature representation
CN111767718B (en) * 2020-07-03 2021-12-07 北京邮电大学 Chinese grammar error correction method based on weakened grammar error feature representation
CN112163372A (en) * 2020-09-21 2021-01-01 上海玫克生储能科技有限公司 SOC estimation method of power battery
CN112163372B (en) * 2020-09-21 2022-05-13 上海玫克生储能科技有限公司 SOC estimation method of power battery
CN112084783A (en) * 2020-09-24 2020-12-15 中国民航大学 Entity identification method and system based on civil aviation non-civilized passengers
CN112084783B (en) * 2020-09-24 2022-04-12 中国民航大学 Entity identification method and system based on civil aviation non-civilized passengers
CN112199511A (en) * 2020-09-28 2021-01-08 西南电子技术研究所(中国电子科技集团公司第十研究所) Cross-language multi-source vertical domain knowledge graph construction method
CN112199511B (en) * 2020-09-28 2022-07-08 西南电子技术研究所(中国电子科技集团公司第十研究所) Cross-language multi-source vertical domain knowledge graph construction method
CN112528894A (en) * 2020-12-17 2021-03-19 科大讯飞股份有限公司 Method and device for distinguishing difference items
CN112528894B (en) * 2020-12-17 2024-05-31 科大讯飞股份有限公司 Method and device for discriminating difference term
CN112925886A (en) * 2021-03-11 2021-06-08 杭州费尔斯通科技有限公司 Few-sample entity identification method based on field adaptation
CN113342904A (en) * 2021-04-01 2021-09-03 山东省人工智能研究院 Enterprise service recommendation method based on enterprise feature propagation
CN113342904B (en) * 2021-04-01 2021-12-24 山东省人工智能研究院 Enterprise service recommendation method based on enterprise feature propagation
WO2022227163A1 (en) * 2021-04-30 2022-11-03 平安科技(深圳)有限公司 Training method for named entity recognition model, apparatus, device, and medium
CN115221871A (en) * 2022-06-24 2022-10-21 毕开龙 Multi-feature fusion English scientific and technical literature keyword extraction method
CN115221871B (en) * 2022-06-24 2024-02-20 毕开龙 Multi-feature fusion English scientific literature keyword extraction method
CN115577707A (en) * 2022-12-08 2023-01-06 中国传媒大学 Word segmentation method for multi-language news subject words
CN117744660A (en) * 2024-02-19 2024-03-22 广东省人民医院 Named entity recognition method and device based on reinforcement learning and migration learning
CN117744660B (en) * 2024-02-19 2024-05-10 广东省人民医院 Named entity recognition method and device based on reinforcement learning and migration learning

Also Published As

Publication number Publication date
CN110765775B (en) 2020-08-04

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant