CN112906397A - Short text entity disambiguation method - Google Patents
Short text entity disambiguation method
- Publication number
- CN112906397A (application CN202110366911.6A)
- Authority
- CN
- China
- Prior art keywords
- entity
- sentence
- model
- training
- word
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
- G06F40/295—Named entity recognition
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/35—Clustering; Classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06N—COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
- G06N3/00—Computing arrangements based on biological models
- G06N3/02—Neural networks
- G06N3/04—Architecture, e.g. interconnection topology
- G06N3/044—Recurrent networks, e.g. Hopfield networks
Abstract
The invention provides a short text entity disambiguation method based on deep learning, mainly intended to solve the problem that the same entity refers to different things in different short texts. The method comprises the following steps: step 1, segment sentences with the jieba word segmentation technology, find the entities to be disambiguated, and use listed-company entities and their abbreviations as the dictionary; step 2, cut each sentence to a 32-character window centered on the entity to be disambiguated; step 3, convert each sentence containing an entity to be disambiguated into Bidirectional Encoder Representations from Transformers (BERT) word vectors; step 4, feed the word vectors in batches into a Long Short-Term Memory (LSTM) model, compute the loss with cross entropy, and continuously optimize the parameters to obtain the final model. The invention achieves good results not only in special domains such as company entities but also in the general domain.
Description
Technical Field
The invention belongs to the field of natural language processing, and particularly relates to a short text entity disambiguation method: an effective entity disambiguation technique based on the deep learning Long Short-Term Memory (LSTM) and Bidirectional Encoder Representations from Transformers (BERT) models, mainly used for solving the problem that company entities refer to different things in different short texts.
Background
In the internet era, with its information explosion and massive news feeds, people hope that advanced AI technology can associate texts with the information of massive entities (companies, person names, etc.), improve users' reading fluency, and realize accurate content recommendation. Intelligent processing of financial news not only provides intelligent services for the financial industry, but also opens more room for innovation in financial business.
Text is the main medium through which information about company entities spreads, and accurately locating the company entity that a news item concerns directly determines how downstream financial work is carried out. In financial news, company entities (numbering in the tens of millions) often appear as abbreviated names, which causes ambiguity. For example, Apple is a listed company in the United States and also a fruit. The goal of entity disambiguation is to eliminate entity ambiguity during information processing and to purify the text. Disambiguation is generally achieved by incorporating knowledge about the entities. In recent years, the rapid development of artificial intelligence has made many previously intractable problems solvable, and people hope to apply leading-edge AI methods to the entity ambiguity problem in financial information.
The traditional entity disambiguation task is mainly based on long texts backed by a complete knowledge base: the long text offers rich context to assist disambiguation. Building an entity disambiguation system on vertical-domain (company entity) data is therefore more challenging.
The BERT model offers parallelism and the ability to extract features and model text bidirectionally, achieving good results with less data and in less time; the long short-term memory network can retain important information and forget redundant information. The invention combines the two techniques with binary classification to disambiguate entities, yielding a novel deep-learning-based entity disambiguation technique.
Disclosure of Invention
The technical problem to be solved by the invention is to provide a short text entity disambiguation method that can effectively help natural language processing developers and related readers judge, according to their own requirements, whether a word to be disambiguated is a company name, with high accuracy and efficiency.
In order to solve the above technical problem, an embodiment of the present invention provides a short text entity disambiguation method, including the following steps:
S1, performing word segmentation on the training samples and the test samples;
S2, cutting each sample with the entity to be disambiguated at the center;
S3, converting the samples containing the entity to be disambiguated into word vectors pre-trained by a BERT model;
S4, constructing a neural network model;
S5, using cross entropy as the loss function to compute the value between the one-dimensional vector output by the neural network and the label vector of the sample, and optimizing the neural network parameter model;
S6, using Microsoft Neural Network Intelligence (NNI) to search for parameters with higher training accuracy.
The specific steps of step S1 are:
S1.1, creating a dictionary of all entity names (including company full names and abbreviations), and finding all entities to be disambiguated in the training and test samples using the jieba word segmentation technology;
S1.2, generating a prefix tree for the text to be segmented, and constructing a directed acyclic graph (DAG) of all candidate word segmentations using regular-expression matching;
S1.3, finding the maximum-probability segmentation path by dynamic programming; to adapt the segmentation to the text, an HMM (hidden Markov model) is solved with the Viterbi algorithm to discover new words.
The specific steps of step S2 are:
S2.1, segmenting sentences, keeping only 32 characters when a sentence is encoded;
S2.2, cutting the sentence with the entity name at the center: find the position of the entity name in the text and take the 13 characters before it and the 14 characters after it into one sentence, with the entity name fixed at 5 characters.
The specific steps of step S3 are:
S3.1, for each word in each sentence of the clipped training and validation samples, finding the id corresponding to the BERT pre-training model;
S3.2, recording the length of each sentence and using 0 and 1 as a mask, where 0 means no word at that position and 1 means a word at that position, so that each sentence is converted into a quadruple [I, T, L, M], where I is the BERT model id of each word, T marks whether the sample is a company name (1 for a company name, 0 for a non-company name), L is the sentence length, and M is the sentence mask;
S3.3, batching the whole training set with 32 samples per batch and optimizing the parameters.
The specific steps of step S4 are as follows. The neural network model is divided into three sub-modules:
S4.1, a BERT conversion module, which converts the ids from step S3.1 into actual pre-trained BERT vectors;
S4.2, an LSTM module, used as the first training layer to learn information across the token sequence of each sentence;
S4.3, a linear output module, which produces the final output vector.
Further, in step S4.1, for the BERT model, the corresponding gradient information is retained in the calculation, where loss is the loss function, w is the weight, and y_i is the true value;
in step S4.2, the LSTM module uses the dropout algorithm: in each layer, neurons are temporarily dropped from the network with a certain probability, and different neurons are randomly selected in each training iteration, which is equivalent to training a different neural network each time;
in step S4.3, the linear output module uses an Attention mechanism, which gives higher weight to the tokens that strongly influence each word in the sentence. The attention score over tokens is computed as
α_t = softmax(f_T(h_t)), c_T = Σ_t α_t·h_t
where f_T is a linear layer, h_t is the hidden-layer state of the t-th token, and c_T is the context vector over the tokens.
The specific steps of step S5 are:
S5.1, computing the neural network loss with cross entropy and optimizing the network parameter model;
S5.2, since an entity name is grammatically just a referring pronoun with no intrinsic meaning, the problem is simplified into binary classification: an entity name is labeled 1 and a non-entity name 0. Cross entropy is well suited to binary classification and is sensitive to small differences, and an optimal solution is found by gradient descent. The cross-entropy loss function is defined as
loss = -(1/N)·Σ_i [y_i·log(ŷ_i) + (1-y_i)·log(1-ŷ_i)]
where y_i is the label of sample i (positive class 1, negative class 0) and ŷ_i is the predicted probability that sample i is positive;
S5.3, optimizing the parameters with Adam as the gradient descent algorithm. At each training step, Adam applies exponentially weighted averaging to the gradients and then updates the weight W and bias b with the averaged values; if some direction oscillates strongly, its update speed is reduced, damping the oscillation. The exponentially weighted average is
v_t = β·v_{t-1} + (1-β)·θ_t
where β is a hyperparameter, v_t is the moving average at step t, and θ_t is the value at step t.
The specific steps of step S6 are:
Microsoft Neural Network Intelligence (NNI) is a lightweight but powerful toolkit for hyperparameter tuning; it adjusts the batch size, learning rate, per-sentence length, number of epochs, and number of convolution kernels, using the F1 value as the selection criterion:
P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2·P·R/(P+R)
where TP is the number of positive samples judged positive, FP the number of negative samples judged positive, and FN the number of positive samples judged negative.
The technical scheme of the invention has the following beneficial effects:
the invention provides an entity disambiguation method based on the combination of a Bidirectional Encoder reproduction from transformations (BERT) model and a Long-Short Term Memory RNN (LSTM) model, which can effectively help natural language processing developers and related readers to judge whether a word to be disambiguated is a company name according to the requirements of the developers and the related readers, and has higher accuracy and efficiency.
Drawings
FIG. 1 is an overall framework of the present invention;
FIG. 2 is a flow chart of the jieba word segmentation work in the present invention;
FIG. 3 is a graph of a sentence segmentation algorithm in the present invention;
FIG. 4 is a general framework diagram of a neural network in the present invention;
FIG. 5 is a graph of the value of F1 obtained using the three word vectors in the present invention;
FIG. 6 is a graph of F1 values obtained using three neural networks in the present invention;
fig. 7 shows the values of F1 obtained in the present invention using three text lengths.
Detailed Description
In order to make the technical problems, technical solutions and advantages of the present invention more apparent, the following detailed description is given with reference to the accompanying drawings and specific embodiments.
The invention provides a deep-learning-based short text entity disambiguation technique, mainly intended to help natural language processing developers and related readers judge, according to their own requirements, whether a word to be disambiguated is a company name. The technique first finds the entities to be disambiguated through jieba word segmentation and cuts the long text into short texts, reducing the scale of the neural network; second, it uses the BERT model as the word-vector pre-training model, converts each word in each sentence into its BERT model id, and records the length, the mask, and whether the sentence refers to a company name; finally, it builds and trains a deep neural network with the long short-term memory network, the Attention mechanism, cross entropy, and related techniques to obtain good parameters.
The invention provides a short text entity disambiguation method, which comprises the following steps:
S1, performing word segmentation on the training samples and the test samples; the specific steps are as follows:
S1.1, creating a dictionary of all entity names (including company full names and abbreviations), and finding all entities to be disambiguated in the training and test samples using the jieba word segmentation technology; FIG. 2 shows the jieba word segmentation workflow, in which the loaded dictionary consists of the entity names, so that words to be disambiguated can be found conveniently and quickly.
S1.2, generating a prefix tree for the text to be segmented, and constructing a directed acyclic graph (DAG) of all candidate word segmentations using regular-expression matching;
S1.3, finding the maximum-probability segmentation path by dynamic programming; to adapt the segmentation to the text, an HMM (hidden Markov model) is solved with the Viterbi algorithm to discover new words.
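Steps S1.2 and S1.3 are essentially jieba's core algorithm: build a DAG of candidate dictionary words over the text, then pick the maximum-probability path by dynamic programming. A minimal self-contained sketch of that idea follows; the toy frequency table and test sentence are illustrative stand-ins for jieba's real dictionary, and the HMM new-word discovery step is omitted:

```python
import math

# Toy word-frequency table; real jieba ships a large built-in dictionary
# to which the entity names (step S1.1) are added as user words.
FREQ = {"苹果": 5, "公司": 5, "苹果公司": 8, "发布": 6, "新": 4, "手机": 6}
TOTAL = sum(FREQ.values())

def build_dag(text):
    """For each start index i, list every end index j with text[i:j] in the dictionary."""
    dag = {}
    for i in range(len(text)):
        ends = [j for j in range(i + 1, len(text) + 1) if text[i:j] in FREQ]
        dag[i] = ends or [i + 1]  # unknown single character falls through
    return dag

def segment(text):
    """Maximum-probability path over the DAG, computed right-to-left by DP."""
    dag, n = build_dag(text), len(text)
    route = {n: (0.0, n)}
    for i in range(n - 1, -1, -1):
        route[i] = max(
            (math.log(FREQ.get(text[i:j], 1)) - math.log(TOTAL) + route[j][0], j)
            for j in dag[i]
        )
    words, i = [], 0
    while i < n:
        words.append(text[i:route[i][1]])
        i = route[i][1]
    return words
```

Under this toy dictionary, segment("苹果公司发布新手机") keeps the company name intact as the single token "苹果公司" because its path probability beats splitting it into "苹果" + "公司".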
S2, segmenting the sample by taking the entity to be disambiguated as the center; the method comprises the following specific steps:
S2.1, segmenting sentences and keeping only 32 characters when a sentence is encoded, which reduces neural network training time as much as possible while preserving accuracy;
S2.2, cutting the sentence with the entity name at the center: find the position of the entity name in the text and take the 13 characters before it and the 14 characters after it into one sentence, with the entity name fixed at 5 characters, as shown in FIG. 3.
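The windowing in step S2.2 (13 characters before the entity, 14 after) can be sketched as below; the function name and the behavior at text boundaries (truncation rather than padding) are illustrative assumptions:

```python
def clip_window(text, entity, before=13, after=14):
    """Cut a window around the first occurrence of `entity`.

    With the entity slot fixed at 5 characters, 13 + 5 + 14 = 32 characters,
    matching the sentence length used when encoding (step S2.1).
    """
    pos = text.find(entity)
    if pos < 0:
        return None                      # entity not present in this text
    start = max(0, pos - before)
    end = min(len(text), pos + len(entity) + after)
    return text[start:end]
```

Sentences shorter than the window simply come out shorter, which is why step S3.2 records each sentence's length and mask.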
S3, converting the sample containing the entity to be disambiguated into a word vector pre-trained by a BERT model; the method comprises the following specific steps:
S3.1, for each word in each sentence of the clipped training and validation samples, finding the id corresponding to the BERT pre-training model;
S3.2, since step S2 only guarantees equal length for long sentences, shorter sentences may fall below the fixed length. Therefore the length of each sentence must be recorded, with 0 and 1 used as a mask: 0 means no word at that position and 1 means a word at that position. Each sentence is thus converted into a quadruple [I, T, L, M], where I is the BERT model id of each word, T marks whether the sample is a company name (1 for a company name, 0 for a non-company name), L is the sentence length, and M is the sentence mask;
S3.3, batching the whole training set with 32 samples per batch and optimizing the parameters.
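The [I, T, L, M] quadruple and the 32-sample batching of steps S3.2 and S3.3 can be sketched as follows; the toy character-to-id vocabulary stands in for the real BERT vocabulary, whose ids would come from the pre-trained model:

```python
MAX_LEN = 32  # sentence length fixed in step S2.1

def encode(sentence, is_company, vocab):
    """Convert one clipped sentence into the quadruple [I, T, L, M].
    `vocab` is a toy char-to-id table standing in for the BERT vocabulary."""
    ids = [vocab.get(ch, vocab["[UNK]"]) for ch in sentence][:MAX_LEN]
    length = len(ids)
    mask = [1] * length + [0] * (MAX_LEN - length)  # 1 = word present, 0 = padding
    return [ids + [0] * (MAX_LEN - length), int(is_company), length, mask]

def batches(samples, batch_size=32):
    """Group encoded samples into batches of 32, as in step S3.3."""
    return [samples[i:i + batch_size] for i in range(0, len(samples), batch_size)]
```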
S4, constructing a neural network model, wherein the overall framework of the neural network is shown in FIG. 4, and the neural network model is divided into three sub-modules:
S4.1, a BERT conversion module, which converts the ids from step S3.1 into actual pre-trained BERT vectors;
S4.2, an LSTM module, used as the first training layer to learn information across the token sequence of each sentence;
S4.3, a linear output module, which produces the final output vector.
For the BERT model, the corresponding gradient information is retained in the calculation, where loss is the loss function, w is the weight, and y_i is the true value.
The LSTM module uses the dropout algorithm: in each layer, neurons are temporarily dropped from the network with a certain probability, and different neurons are randomly selected in each training iteration, which is equivalent to training a different neural network each time;
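The dropout behavior just described can be sketched in a few lines; this "inverted dropout" rescaling is one common variant, and the seed parameter exists only to make the sketch reproducible:

```python
import random

def dropout(activations, p=0.5, training=True, seed=0):
    """Inverted dropout: during training, zero each neuron with probability p
    and rescale the survivors by 1/(1-p); at inference, pass values through."""
    if not training:
        return list(activations)
    rng = random.Random(seed)
    keep = 1.0 - p
    return [a / keep if rng.random() < keep else 0.0 for a in activations]
```

Because different neurons are zeroed on each call with a fresh random state, every training step effectively samples a different thinned sub-network.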
Since the important part of a sentence usually lies in its key words, the linear output module uses an Attention mechanism, which gives higher weight to the tokens that strongly influence each word in the sentence. The attention score over tokens is computed as
α_t = softmax(f_T(h_t)), c_T = Σ_t α_t·h_t
where f_T is a linear layer, h_t is the hidden-layer state of the t-th token, and c_T is the context vector over the tokens.
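A minimal numpy sketch of this attention step, assuming the scoring layer f_T is a dot product with a learned vector w plus a bias b (the exact parameterization of f_T is not specified in the text):

```python
import numpy as np

def attention_context(hidden, w, b=0.0):
    """hidden: (T, d) token hidden states; w: (d,) weights of the scoring layer.
    Returns the context vector c_T = sum_t alpha_t * h_t."""
    scores = hidden @ w + b                    # f_T(h_t) for each token t
    weights = np.exp(scores - scores.max())    # numerically stable softmax
    weights /= weights.sum()
    return weights @ hidden
```

Tokens whose hidden states score highest dominate the weighted sum, which is exactly the "higher weight to important tokens" behavior described above.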
S5, calculating the value between the one-dimensional vector output by the neural network and the label vector of the sample by using the cross entropy as a loss function, and optimizing a neural network parameter model; the method comprises the following specific steps:
S5.1, computing the neural network loss with cross entropy and optimizing the network parameter model;
S5.2, since an entity name is grammatically just a referring pronoun with no intrinsic meaning, the problem is simplified into binary classification: an entity name is labeled 1 and a non-entity name 0. Cross entropy is well suited to binary classification and is sensitive to small differences, and an optimal solution is found by gradient descent. The cross-entropy loss function is defined as
loss = -(1/N)·Σ_i [y_i·log(ŷ_i) + (1-y_i)·log(1-ŷ_i)]
where y_i is the label of sample i (positive class 1, negative class 0) and ŷ_i is the predicted probability that sample i is positive;
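The binary cross-entropy above, written out directly as a plain-Python sketch (deep learning frameworks compute the same quantity in a vectorized, numerically safeguarded form):

```python
import math

def binary_cross_entropy(y_true, y_pred):
    """Mean binary cross-entropy: -(1/N) * sum(y*log(p) + (1-y)*log(1-p))."""
    return -sum(
        y * math.log(p) + (1 - y) * math.log(1 - p)
        for y, p in zip(y_true, y_pred)
    ) / len(y_true)
```

The loss is small when confident predictions match the labels and grows without bound as a confident prediction turns out wrong, which is what makes it sensitive to slight differences.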
S5.3, optimizing the parameters with Adam as the gradient descent algorithm. At each training step, Adam applies exponentially weighted averaging to the gradients and then updates the weight W and bias b with the averaged values; if some direction oscillates strongly, its update speed is reduced, damping the oscillation. The exponentially weighted average is
v_t = β·v_{t-1} + (1-β)·θ_t
where β is a hyperparameter, v_t is the moving average at step t, and θ_t is the value at step t.
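The exponentially weighted average v_t = β·v_{t-1} + (1-β)·θ_t, as Adam applies it to its gradient moments, reduces to a few lines (initializing v_0 = 0 and omitting Adam's bias correction):

```python
def exp_weighted_average(values, beta=0.9):
    """v_t = beta * v_{t-1} + (1 - beta) * theta_t, with v_0 = 0."""
    v, out = 0.0, []
    for theta in values:
        v = beta * v + (1 - beta) * theta
        out.append(v)
    return out
```

A larger β averages over more past steps, which is what smooths out oscillating gradient directions.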
S6, searching parameters with higher training accuracy by using Microsoft Neural Network Intelligence (NNI); the method comprises the following specific steps:
Microsoft Neural Network Intelligence (NNI) is a lightweight but powerful toolkit for hyperparameter tuning; it adjusts the batch size, learning rate, per-sentence length, number of epochs, and number of convolution kernels, using the F1 value as the selection criterion:
P = TP/(TP+FP), R = TP/(TP+FN), F1 = 2·P·R/(P+R)
where TP is the number of positive samples judged positive, FP the number of negative samples judged positive, and FN the number of positive samples judged negative.
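The F1 criterion used by the NNI search, computed from the confusion counts named above:

```python
def f1_score(tp, fp, fn):
    """F1 from confusion counts: the harmonic mean of precision and recall."""
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)
```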
The general framework of the proposed method is shown in FIG. 1. It combines the BERT model with the LSTM model: the BERT model contributes vector parameters pre-trained on massive data, capturing relations between words, while the LSTM model captures information within a sentence through its input, output, and forget gates.
Model comparisons are performed below, with analysis being performed for the word vector model, the neural network, and the text length, respectively.
Comparison 1: the test-set F1 values obtained with the Word2vec, BERT, and ERNIE models are shown in FIG. 5, showing that BERT and ERNIE obtain the best results, with the BERT curve being smoother.
Comparison 2: comparing three neural network models, a plain feed-forward network, a convolutional neural network (CNN), and a long short-term memory network (LSTM), FIG. 6 shows that the LSTM converges more smoothly.
Comparison 3: for different text lengths, FIG. 7 shows that, over the same number of training epochs, text length has little effect.
The experimental results and analysis show that the invention uses the BERT model to effectively capture relations between words while avoiding the introduction of redundant information. For the neural network, the LSTM solves the problem of retaining information over long sequences. In addition, reasonably clipping the text length preserves enough information while improving training speed.
While the foregoing is directed to the preferred embodiment of the present invention, it will be understood by those skilled in the art that various changes and modifications may be made without departing from the spirit and scope of the invention as defined in the appended claims.
Claims (7)
1. A short text entity disambiguation method, comprising the steps of:
S1, performing word segmentation on the training samples and the test samples;
S2, cutting each sample with the entity to be disambiguated at the center;
S3, converting the samples containing the entity to be disambiguated into word vectors pre-trained by a BERT model;
S4, constructing a neural network model;
S5, using cross entropy as the loss function to compute the value between the one-dimensional vector output by the neural network and the label vector of the sample, and optimizing the neural network parameter model;
S6, using Microsoft Neural Network Intelligence (NNI) to search for parameters with higher training accuracy.
2. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S1 are:
S1.1, creating a dictionary of all entity names, and finding all entities to be disambiguated in the training and test samples using the jieba word segmentation technology;
S1.2, generating a prefix tree for the text to be segmented, and constructing a directed acyclic graph (DAG) of all candidate word segmentations using regular-expression matching;
S1.3, finding the maximum-probability segmentation path by dynamic programming; to adapt the segmentation to the text, an HMM (hidden Markov model) is solved with the Viterbi algorithm to discover new words.
3. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S2 are:
S2.1, segmenting sentences, keeping only 32 characters when a sentence is encoded;
S2.2, cutting the sentence with the entity name at the center: find the position of the entity name in the text and take the 13 characters before it and the 14 characters after it into one sentence, with the entity name fixed at 5 characters.
4. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S3 are:
S3.1, for each word in each sentence of the clipped training and validation samples, finding the id corresponding to the BERT pre-training model;
S3.2, recording the length of each sentence and using 0 and 1 as a mask, where 0 means no word at that position and 1 means a word at that position, so that each sentence is converted into a quadruple [I, T, L, M], where I is the BERT model id of each word, T marks whether the sample is a company name (1 for a company name, 0 for a non-company name), L is the sentence length, and M is the sentence mask;
S3.3, batching the whole training set with 32 samples per batch and optimizing the parameters;
the specific steps of step S4 are as follows. The neural network model is divided into three sub-modules:
S4.1, a BERT conversion module, which converts the ids from step S3.1 into actual pre-trained BERT vectors;
S4.2, an LSTM module, used as the first training layer to learn information across the token sequence of each sentence;
S4.3, a linear output module, which produces the final output vector.
5. The short text entity disambiguation method of claim 4, wherein in step S4.1, for the BERT model, the corresponding gradient information is retained in the calculation, where loss is the loss function, w is the weight, and y_i is the true value;
in step S4.2, the LSTM module uses a dropout algorithm, for each layer of neurons, the neurons are temporarily discarded from the network according to a certain probability, and different neurons are randomly selected during each iterative training, which is equivalent to performing training on different neural networks each time;
in step S4.3, the linear output module uses an Attention mechanism, which gives higher weight to the tokens that strongly influence each word in the sentence; the attention score over tokens is computed as α_t = softmax(f_T(h_t)), with context vector c_T = Σ_t α_t·h_t, where f_T is a linear layer and h_t is the hidden-layer state of the t-th token.
6. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S5 are:
S5.1, computing the neural network loss with cross entropy and optimizing the network parameter model;
S5.2, since an entity name is grammatically just a referring pronoun with no intrinsic meaning, the problem is simplified into binary classification: an entity name is labeled 1 and a non-entity name 0. Cross entropy is well suited to binary classification and is sensitive to small differences, and an optimal solution is found by gradient descent. The cross-entropy loss function is defined as
loss = -(1/N)·Σ_i [y_i·log(ŷ_i) + (1-y_i)·log(1-ŷ_i)]
where y_i is the label of sample i (positive class 1, negative class 0) and ŷ_i is the predicted probability that sample i is positive;
S5.3, optimizing the parameters with Adam as the gradient descent algorithm. At each training step, Adam applies exponentially weighted averaging to the gradients and then updates the weight W and bias b with the averaged values; if some direction oscillates strongly, its update speed is reduced, damping the oscillation. The exponentially weighted average is
v_t = β·v_{t-1} + (1-β)·θ_t
where β is a hyperparameter, v_t is the moving average at step t, and θ_t is the value at step t.
7. The method for disambiguating an entity of short text as claimed in claim 1, wherein the specific steps of step S6 are:
the Microsoft Neural Network Intelligence (NNI) toolkit is used to tune the hyperparameters: batch size, learning rate, maximum length processed per sentence, number of training cycles, and number of convolution kernels, with the F1 value as the judgment basis; the F1 formula is as follows:

F1 = 2·TP / (2·TP + FP + FN)
where TP represents the number of positive samples determined to be positive, FP represents the number of negative samples determined to be positive, and FN represents the number of positive samples determined to be negative.
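With TP, FP, and FN defined as above, the F1 value is the harmonic mean of precision and recall (a minimal sketch with made-up counts, not the patent's evaluation code):

```python
def f1_score(tp, fp, fn):
    """F1 = 2*TP / (2*TP + FP + FN): the harmonic mean of
    precision TP/(TP+FP) and recall TP/(TP+FN)."""
    denom = 2 * tp + fp + fn
    return 2 * tp / denom if denom else 0.0

# 8 true positives, 2 false positives, 2 false negatives:
# precision = 8/10 = 0.8, recall = 8/10 = 0.8, so F1 = 0.8
score = f1_score(tp=8, fp=2, fn=2)
```

Because it is a harmonic mean, F1 is pulled toward whichever of precision or recall is worse, which makes it a stricter tuning target than accuracy on imbalanced entity/non-entity data.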
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110366911.6A CN112906397B (en) | 2021-04-06 | 2021-04-06 | Short text entity disambiguation method |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110366911.6A CN112906397B (en) | 2021-04-06 | 2021-04-06 | Short text entity disambiguation method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN112906397A true CN112906397A (en) | 2021-06-04 |
CN112906397B CN112906397B (en) | 2021-11-19 |
Family
ID=76109966
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110366911.6A Active CN112906397B (en) | 2021-04-06 | 2021-04-06 | Short text entity disambiguation method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN112906397B (en) |
Cited By (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN113449516A (en) * | 2021-06-07 | 2021-09-28 | 深延科技(北京)有限公司 | Disambiguation method, system, electronic device and storage medium for acronyms |
CN113704416A (en) * | 2021-10-26 | 2021-11-26 | 深圳市北科瑞声科技股份有限公司 | Word sense disambiguation method and device, electronic equipment and computer-readable storage medium |
CN113779959A (en) * | 2021-08-31 | 2021-12-10 | 西南电子技术研究所(中国电子科技集团公司第十研究所) | Small sample text data mixing enhancement method |
CN114818736A (en) * | 2022-05-31 | 2022-07-29 | 北京百度网讯科技有限公司 | Text processing method, chain finger method and device for short text and storage medium |
CN115238701A (en) * | 2022-09-21 | 2022-10-25 | 北京融信数联科技有限公司 | Multi-field named entity recognition method and system based on subword level adapter |
Citations (4)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108566627A (en) * | 2017-11-27 | 2018-09-21 | 浙江鹏信信息科技股份有限公司 | A method and system for identifying fraudulent text messages using deep learning |
CN111581973A (en) * | 2020-04-24 | 2020-08-25 | 中国科学院空天信息创新研究院 | Entity disambiguation method and system |
CN112069826A (en) * | 2020-07-15 | 2020-12-11 | 浙江工业大学 | Vertical domain entity disambiguation method fusing topic model and convolutional neural network |
CN112464669A (en) * | 2020-12-07 | 2021-03-09 | 宁波深擎信息科技有限公司 | Stock entity word disambiguation method, computer device and storage medium |
Non-Patent Citations (3)
Title |
---|
DU J等: "Using bert for word sense disambiguation", 《ARXIV PREPRINT ARXIV:1909.08358》 * |
HUANG L等: "GlossBERT: BERT for word sense disambiguation with gloss knowledge", 《ARXIV PREPRINT ARXIV:1908.07245》 * |
JACOB DEVLIN等: "BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding", 《ARXIV:1810.04805V1》 * |
Also Published As
Publication number | Publication date |
---|---|
CN112906397B (en) | 2021-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109992782B (en) | Legal document named entity identification method and device and computer equipment | |
CN112906397B (en) | Short text entity disambiguation method | |
CN108628823B (en) | Named entity recognition method combining attention mechanism and multi-task collaborative training | |
CN108920622B (en) | Training method, training device and recognition device for intention recognition | |
CN109284506B (en) | User comment emotion analysis system and method based on attention convolution neural network | |
CN106776581B (en) | Subjective text emotion analysis method based on deep learning | |
CN111738003B (en) | Named entity recognition model training method, named entity recognition method and medium | |
CN111209401A (en) | System and method for classifying and processing sentiment polarity of online public opinion text information | |
CN110909736B (en) | Image description method based on long-term and short-term memory model and target detection algorithm | |
CN113591483A (en) | Document-level event argument extraction method based on sequence labeling | |
CN111709242B (en) | Chinese punctuation mark adding method based on named entity recognition | |
CN114757182A (en) | BERT short text sentiment analysis method for improving training mode | |
CN111931506A (en) | Entity relationship extraction method based on graph information enhancement | |
WO2023134083A1 (en) | Text-based sentiment classification method and apparatus, and computer device and storage medium | |
CN115392259B (en) | Microblog text sentiment analysis method and system based on confrontation training fusion BERT | |
CN112818110B (en) | Text filtering method, equipment and computer storage medium | |
CN114416979A (en) | Text query method, text query equipment and storage medium | |
CN112163089A (en) | Military high-technology text classification method and system fusing named entity recognition | |
CN115831102A (en) | Speech recognition method and device based on pre-training feature representation and electronic equipment | |
CN115238693A (en) | Chinese named entity recognition method based on multi-word segmentation and multi-layer bidirectional long-short term memory | |
Chen et al. | Chinese Weibo sentiment analysis based on character embedding with dual-channel convolutional neural network | |
CN115204143A (en) | Method and system for calculating text similarity based on prompt | |
CN115169349A (en) | Chinese electronic resume named entity recognition method based on ALBERT | |
CN116522165B (en) | Public opinion text matching system and method based on twin structure | |
Sinapoy et al. | Comparison of lstm and indobert method in identifying hoax on twitter |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
TR01 | Transfer of patent right | ||
Effective date of registration: 2024-06-13
Address after: 518000, Room 1104, Building A, Zhiyun Industrial Park, No. 13 Huaxing Road, Henglang Community, Longhua District, Shenzhen, Guangdong Province
Patentee after: Shenzhen Hongyue Information Technology Co., Ltd. (China)
Address before: 226019, No. 9 sik Road, Chongchuan District, Nantong City, Jiangsu Province
Patentee before: NANTONG University (China)