CN110968665A - Method for recognizing upper and lower level word relation based on gradient enhanced decision tree - Google Patents
- Publication number: CN110968665A
- Application number: CN201911086620.0A
- Authority: CN (China)
- Legal status: Granted (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis)
Classifications
- G06F16/355 — Information retrieval of unstructured textual data; Class or cluster creation or modification
- G06F16/3344 — Query execution using natural language analysis
- G06F18/2148 — Generating training patterns; Bootstrap methods, e.g. bagging or boosting, characterised by the process organisation or structure, e.g. boosting cascade
Abstract
The invention relates to a method for identifying hypernym-hyponym relations based on a gradient boosted decision tree. To train the classification model, the inputs are entity pairs together with their path information, and the output is 1 (the pair is a hypernym-hyponym pair) or 0 (it is not). A high-confidence recommendation set is obtained from the positive classification results by jointly training two classifiers. By continuously iterating over the high-confidence set, the model quickly adapts to the regular patterns of unlabeled corpus text. The method is well suited to mining hypernym-hyponym relations in the e-commerce domain.
Description
Technical Field
The invention relates to the technical field of natural language processing, and in particular to a method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree.
Background
Automatically mining and verifying hypernym-hyponym relations between entities is an important task in e-commerce. A hypernym-hyponym relation holds between a generic entity (the hypernym) and a specific instance of it (the hyponym), for example appliance and refrigerator. In e-commerce, mining such relations helps to better understand user queries and to recommend commodities.
However, in e-commerce this task faces many challenges. First, web text corpora often contain a great deal of noise, and the text is updated frequently. The noise makes it difficult for generic methods to extract valid information from e-commerce text, and the high update frequency makes hand-labeling new and domain-specific words costly. Second, roughly 10 billion commodity entities are currently known (including a large number of homographs). Assuming every hypernym y in the category tree has at least one associated hyponym x, the category tree over commodity entities is huge, so a method must achieve good recall while maintaining accuracy. Aimed at the particularities of corpus text in the e-commerce domain, a jointly trained gradient boosted decision tree method is proposed, borrowing from semi-supervised learning. The method automatically mines hypernym-hyponym entity relations from domain-specific and noisy text. Existing entity relation mining methods can be classified as supervised, semi-supervised, or unsupervised. In binary classification, training multiple classifiers jointly achieves higher accuracy than training each alone, provided the related tasks share a similar representation. Bootstrapping trains a classifier on a small number of labeled samples and then iteratively augments the training set with samples the current model labels with high confidence; it is good at absorbing new or large unlabeled e-commerce corpora from a small set of seed samples. However, bootstrapping suffers from a "semantic drift" problem after many iterations.
To reduce the errors that semi-supervised learning continually introduces across iterations, one approach is to cross-train on different types of samples to prevent precision from degrading, or to reduce labeling-error bias through conditionally independent partitions of the feature space. Other non-bootstrapping techniques use the same extraction method to generate independent errors, triggering predictions from multiple extractors; combining these predictions improves extraction accuracy. In addition, some methods handle hypernym-hyponym relation mining with two complementary approaches: distribution-based methods and path-based methods. Distribution-based methods excel at discovering entity relations, while some path-based methods encode paths with a recurrent neural network and achieve results comparable to distribution-based methods.
Mining hypernym-hyponym relations in complex text mainly serves the following purposes:
Firstly, when a user searches for commodities, the search content can be expanded through hypernyms and hyponyms, reducing repeated searches and improving the user experience.
Secondly, increasing commodity information recall: without changing the dimensionality, the precision of recalled information is improved and the amount of recalled information is enriched.
Thirdly, improving how often scene cards can be reused across application scenarios.
Fourthly, layering the related words of the commodity domain into a classification tree of categories, attributes, and attribute values.
Fifthly, locating trending new words.
Disclosure of Invention
The invention aims to overcome the above defects and provides a method for identifying hypernym-hyponym relations based on a gradient boosted decision tree. To train the classification model, the inputs are entity pairs together with their path information, and the output is 1 (the pair is a hypernym-hyponym pair) or 0 (it is not). A high-confidence recommendation set is obtained from the positive classification results by jointly training two classifiers. By continuously iterating over the high-confidence set, the model quickly adapts to the regular patterns of unlabeled corpus text. The method is well suited to mining hypernym-hyponym relations in the e-commerce domain.
The invention achieves this aim through the following technical scheme. A method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree comprises the following steps:
(1) constructing a random misalignment sample training set;
(2) constructing a path-based sample training set;
(3) training a semi-supervised joint gradient boosted decision tree model on the constructed random misalignment sample training set and path-based sample training set, and recognizing hypernym-hyponym relations with the trained model.
Preferably, the random misalignment sample training set is constructed as follows:
(1.1) segmenting the corpus text with the Alibaba Word Segmenter lexical analysis system; matching hypernym-hyponym word pairs extracted from an existing lexicon against the text, and constructing positive samples from the word pairs combined with the text between them;
(1.2) misaligning the hypernyms and hyponyms of successfully matched word pairs to form negative word pairs, and matching these misaligned pairs against the text to construct random misalignment negative samples;
(1.3) combining the positive and negative samples obtained above into the random misalignment sample training set.
Preferably, the path-based sample training set is constructed as follows:
(2.1) fragmenting the corpus text, recorded as S_split = Split({S_1, S_2, S_3, …, S_n});
(2.2) taking the misaligned word pairs from the random misalignment samples and matching them against the corpus text to obtain the set of sentences containing misaligned pairs, S_<x,y> = {S_<x1,y1>, S_<x2,y2>, S_<x3,y3>, …, S_<xn,yn>};
(2.3) extracting the paths between the misaligned word pairs, recorded as P = {P_1, P_2, P_3, …, P_n};
(2.4) matching these paths against the corpus fragments {S_1, S_2, …, S_n}; after a successful match, looking up the fragment's prototype sentence and taking the first word before and after the path P' that is not part of the original misaligned pair as a path-based negative word pair; combining with the positive samples yields the path-based sample training set.
Preferably, the corpus fragmentation uses an N-gram algorithm to enumerate the sentence fragments formed by all consecutive word segments; each segment counts as length 1, and only fragments whose path length is at most 5 are kept.
Preferably, the semi-supervised joint gradient boosted decision tree model is an additive model, the learning algorithm is forward stagewise, and the basis functions are CART trees; the loss function is the squared error loss, i.e.:
L(y, f(x)) = (1/2)(y − f(x))^2
The negative gradient is then:
−∂L(y, f(x))/∂f(x) = y − f(x)
where y − f(x) is the residual; the output is the classification tree f(x).
Preferably, the semi-supervised joint gradient enhancement decision tree training method comprises the following steps:
inputting a text corpus T, pre-trained word embedding and maximum iteration I;
(i) preprocessing T data, and extracting two types of training samples XpAnd XdWherein X ispFor the path-based sample training set, XdTraining set for random dislocation sample;
(ii) converting each training sample into a vector representation using word embedding W;
(iv) using X separatelyp∪X′pAnd Xd∪X′dBy training two classifiers f1And f2;
(v) Predicting the unlabeled samples, and selecting positive samples with high confidence coefficient to be used as new training samples X'pAnd X'dCarrying out expansion;
(vi) recycling step (iv) and step (v) up to X'pAnd X'dNew annotated samples are not appearing;
and (3) outputting: two classifiers and a prediction label for the test sample.
The beneficial effects of the invention are: the method can construct samples from complex text and predict labels for unlabeled entities; it analyzes the characteristics of e-commerce domain text, summarizes hypernym-hyponym word pairs of the e-commerce domain by means of substrings, patterns, and rule learning, and can thus mine hypernym-hyponym relations of the e-commerce domain well.
Drawings
FIG. 1 is a schematic flow diagram of the process of the present invention;
FIG. 2 is a schematic diagram of a training set of random misalignment samples according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of a path sample training set-based construction according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of a training process of a gradient enhanced decision tree model according to an embodiment of the present invention.
Detailed Description
The invention will be further described with reference to specific examples, but the scope of the invention is not limited thereto:
Example: as shown in FIG. 1, a method for identifying hypernym-hyponym relations based on a gradient boosted decision tree includes the following steps:
(1) The random misalignment sample training set is constructed as follows, as shown in FIG. 2:
The corpus text is first segmented by the AliWS (Alibaba Word Segmenter) lexical analysis system. Hypernym-hyponym word pairs extracted from an existing lexicon are matched against the text, and positive samples are constructed from the word pairs combined with the text between them. The hypernyms and hyponyms of successfully matched pairs are then misaligned to form negative word pairs, which are matched against the text to construct random misalignment negative samples, for example:
(1) < apple is a fruit >
(2) < fruits such as apple >
(3) < dog is an animal >
After misalignment, <apple, fruit> and <dog, animal> become <apple, animal> and <dog, fruit>. Sentence paths matching these misaligned pairs are then found in the corpus. After screening, the following are obtained:
(1) < apple for tropical animals >
(2) < dogs do not eat fruit >
Each misaligned word pair together with its path information is constructed as a negative sample. The positive and negative samples are combined to form the random misalignment sample training set.
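The misalignment step can be sketched as follows (a minimal illustration: the function names, the toy seed pairs, and the substring-based sentence matching are assumptions, not the patent's implementation):

```python
import random

def make_negative_pairs(seed_pairs, rng=random):
    """Shuffle the hyponyms across the hypernyms so that no original pair survives."""
    hypers = [h for h, _ in seed_pairs]
    hypos = [h for _, h in seed_pairs]
    while True:
        shuffled = hypos[:]
        rng.shuffle(shuffled)
        cand = list(zip(hypers, shuffled))
        if all(c not in seed_pairs for c in cand):  # reject shuffles that keep a true pair
            return cand

def match_sentences(pairs, sentences):
    """Keep each (pair, sentence) where both words of the pair occur in the sentence."""
    return [((x, y), s) for x, y in pairs for s in sentences if x in s and y in s]

seeds = [("fruit", "apple"), ("animal", "dog")]      # (hypernym, hyponym) seed pairs
negatives = make_negative_pairs(seeds)               # with two seeds, the only option is the swap
corpus = ["apple is for tropical animals", "dogs do not eat fruit"]
matches = match_sentences(negatives, corpus)         # misaligned pairs + their paths -> negatives
```

With the two seed pairs above, the only misalignment avoiding both originals is the swap, so `negatives` is `[("fruit", "dog"), ("animal", "apple")]` and both toy sentences match.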
The path taken between the two words of a pair must satisfy:
1. Its length does not exceed 5 word segments; for example, the path rendered here as "is a" counts as length 3 in the original Chinese.
2. Word pairs containing a single-character word of byte length less than 2 are excluded; such words, which appear in non-hypernym-hyponym pairs, include "one", "none" and "not".
3. When no corpus sentence can be matched that contains both words of a pair at the same time, no training sample is constructed for that word pair.
Based on the word-pattern feature vector obtained above, the word embeddings of the word pair are concatenated with the path feature vector, and the final concatenated feature vector is used as the representation of the word pair. The calculation is:
v(x, y) = [w_x ; p(x, y) ; w_y]
where w_x and w_y are the word embeddings of the pair <x, y> and p(x, y) is the vector representing the path of the given word pair <x, y>.
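The concatenation can be sketched as follows (the 100-dimensional embeddings and the averaging of the path's word vectors are assumptions for illustration; the patent states only that the word embeddings and the path feature vector are spliced):

```python
import numpy as np

def pair_representation(w_x, w_y, path_vecs):
    """Concatenate hypernym embedding, a path feature vector, and hyponym embedding."""
    p = np.mean(path_vecs, axis=0)            # assumed: average the path's word vectors
    return np.concatenate([w_x, p, w_y])      # [x ; path ; y]

rng = np.random.default_rng(0)
w_x, w_y = rng.normal(size=100), rng.normal(size=100)   # 100-d embeddings (assumed)
path = [rng.normal(size=100) for _ in range(3)]         # a 3-token path, e.g. "is a kind of"
v = pair_representation(w_x, w_y, path)                 # 300-d pair representation
```

The 300-dimensional result is consistent with the vector sizes described later in the embodiment.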
(2) The path-based sample training set is constructed as follows, as shown in FIG. 3. First, the corpus text is fragmented, recorded as S_split = Split({S_1, S_2, S_3, …, S_n}). The fragmentation uses an N-gram algorithm to enumerate the sentence fragments formed by all consecutive word segments; each segment counts as length 1, and only fragments whose path length is at most 5 are kept. For example, the sentence "dragon fruit is a rose family fruit", of length 7, is fragmented into:
(1) dragon fruit
(2) dragon fruit is
(3) dragon fruit is a
(4) dragon fruit is a rose
(5) dragon fruit is a rose family
and so on, 28 fragments in total.
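The fragment enumeration can be sketched as follows (the token strings are an English stand-in for the Chinese word segments; `ngram_fragments` is an illustrative name):

```python
def ngram_fragments(tokens, max_len=None):
    """Enumerate all contiguous token fragments; each token counts as length 1."""
    n = len(tokens)
    frags = []
    for i in range(n):
        for j in range(i + 1, n + 1):
            if max_len is None or j - i <= max_len:
                frags.append(tokens[i:j])
    return frags

tokens = ["dragon-fruit", "is", "a", "rose", "family", "of", "fruit"]  # 7 segments
frags = ngram_fragments(tokens)        # n(n+1)/2 = 28 fragments for n = 7
capped = ngram_fragments(tokens, 5)    # keeping only fragments of length <= 5
```

For a 7-segment sentence this yields the 28 fragments mentioned above; capping the length at 5 keeps 25 of them.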
The misaligned word pairs from the random misalignment samples are matched against the corpus text to obtain the set of sentences containing misaligned hypernym-hyponym pairs, S_<x,y> = {S_<x1,y1>, S_<x2,y2>, S_<x3,y3>, …, S_<xn,yn>}, for example:
(1) S_<x1,y1> = <apple for tropical animals>
(2) S_<x2,y2> = <dogs do not eat fruit>
The paths between the misaligned word pairs are then extracted, recorded as P = {P_1, P_2, P_3, …, P_n}, and matched against the corpus fragments {S_1, S_2, …, S_n}. After a successful match, the fragment's prototype sentence is looked up, and the first word before and after the path P' that is not part of the original misaligned pair is taken as a path-based negative word pair. For example:
P_1 = <is for tropical>
P_2 = <do not eat>
Matching these paths against the text fragments yields sentences such as:
S'_1 = <such a temperature is very suitable for tropical animals>
S'_2 = <in cold weather, people do not eat cold food>
from which the path-based negative word pairs <temperature, animal> and <people, cold food> are obtained. Finally, these are combined with the positive samples to obtain the path-based sample training set.
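The path-based negative sampling can be sketched as follows (whitespace tokenization and the widened path "is very suitable for tropical", chosen so the English rendering reproduces the <temperature, animals> pair above, are assumptions; `path_negatives` is an illustrative name):

```python
def path_negatives(path, sentences):
    """For each occurrence of `path` in a sentence, take the words immediately
    before and after it as a candidate path-based negative word pair."""
    pairs = []
    p = path.split()
    for s in sentences:
        toks = s.split()
        # need at least one token on each side of the path
        for i in range(1, len(toks) - len(p)):
            if toks[i:i + len(p)] == p:
                pairs.append((toks[i - 1], toks[i + len(p)]))
    return pairs

sents = ["such a temperature is very suitable for tropical animals"]
pairs = path_negatives("is very suitable for tropical", sents)
```

In a full implementation the resulting pair would also be checked against the original misaligned pair, as the text requires.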
(3) A semi-supervised joint gradient boosted decision tree model is trained on the constructed random misalignment sample training set and path-based sample training set; the training process is shown in FIG. 4. The trained model is then used to recognize hypernym-hyponym relations.
After the two kinds of training samples, random misalignment samples and path-based samples, have been constructed, training of the semi-supervised joint gradient boosted decision tree model begins. A misalignment-based sample is represented by a 300-dimensional vector built from the 100-dimensional vectors of the path, the hypernym, and the hyponym, while a path-based sample is represented by a 200-dimensional vector built from the hypernym and hyponym vectors.
The semi-supervised joint gradient boosted decision tree is trained as follows:
Input: a text corpus T, pre-trained word embeddings W, and a maximum number of iterations I;
(i) preprocess T and extract the two training sample sets X_p and X_d, where X_p is the path-based sample training set and X_d is the random misalignment sample training set;
(ii) convert each training sample into a vector representation using the word embeddings W;
(iii) train two classifiers f_1 and f_2 using X_p ∪ X'_p and X_d ∪ X'_d respectively;
(iv) predict the unlabeled samples, and select the high-confidence positive samples to extend the new training sample sets X'_p and X'_d;
(v) repeat steps (iii) and (iv) until no new labeled samples appear in X'_p and X'_d;
Output: the two classifiers and the prediction labels for the test samples.
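Steps (i)-(v) can be sketched with scikit-learn's `GradientBoostingClassifier` standing in for the patent's GBDT. This is a toy sketch: the synthetic 20-dimensional features, set sizes, and labeling rule are assumptions; only the two-view co-training loop and the 0.8 threshold used later in the embodiment follow the text.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier

rng = np.random.default_rng(0)

def toy_set(dim, n=200):
    """Synthetic labeled stand-in for X_p / X_d with a trivial labeling rule."""
    X = rng.normal(size=(n, dim))
    y = (X[:, 0] > 0).astype(int)
    return X, y

X_p, y_p = toy_set(20)                     # path-based view
X_d, y_d = toy_set(20)                     # misalignment view
U_p = rng.normal(size=(100, 20))           # unlabeled pool, path view
U_d = rng.normal(size=(100, 20))           # same samples, misalignment view (row-aligned)

f1 = GradientBoostingClassifier(random_state=0)
f2 = GradientBoostingClassifier(random_state=0)

for it in range(3):                        # maximum iterations I
    f1.fit(X_p, y_p)
    f2.fit(X_d, y_d)
    p1 = f1.predict_proba(U_p)[:, 1]
    p2 = f2.predict_proba(U_d)[:, 1]
    conf = (p1 > 0.8) & (p2 > 0.8)         # high confidence on BOTH classifiers
    if not conf.any():
        break                              # no new labeled samples appear
    X_p = np.vstack([X_p, U_p[conf]]); y_p = np.append(y_p, np.ones(conf.sum()))
    X_d = np.vstack([X_d, U_d[conf]]); y_d = np.append(y_d, np.ones(conf.sum()))
    U_p, U_d = U_p[~conf], U_d[~conf]      # remove pseudo-labeled samples from the pool
```

Row i of `U_p` and `U_d` is treated as the two views of the same unlabeled sample, which is what lets the two classifiers confirm each other's predictions.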
Hypernym-hyponym relation mining is essentially a binary classification task. Among conventional machine learning algorithms, gradient boosted decision trees are among the best at fitting the true distribution: they classify or regress by using an additive model (a linear combination of basis functions) and by continually reducing the residuals produced during training. The gradient boosted decision tree model is an additive model, the learning algorithm is forward stagewise, and the basis functions are CART trees. The loss function is the squared error loss, i.e.:
L(y, f(x)) = (1/2)(y − f(x))^2
The negative gradient is then:
−∂L(y, f(x))/∂f(x) = y − f(x)
where y − f(x) is the residual: in each iteration the model learns a weak classifier by fitting this residual. Weak classifiers are generally required to be simple enough, with low variance and high bias, because the training process improves the accuracy of the final classifier by continually reducing bias. The core idea is that each weak classifier fits the residual of the sum of all previous classifiers' conclusions, the residual being the amount that, added to the predicted value, gives the true value. The model's input is the labeled samples, divided into path samples and misalignment samples. Since this is a binary classification task, the label takes the value 1 for a hypernym-hyponym pair and 0 otherwise. The output is the classification tree f(x).
The method mainly comprises the following steps:
(1) Initialization: f_0(x) = c, where c is the constant that minimizes the loss function; this is a tree with a single root node, and for the squared loss c is the mean of the targets.
(2) For m = 1, 2, 3, …, M:
(a) compute the residuals r_mi = y_i − f_{m−1}(x_i) for the samples i = 1, 2, 3, …, N;
(b) fit a classification tree to {(x_1, r_m1), …, (x_N, r_mN)}, obtaining the leaf node regions R_mj, j = 1, 2, …, J, of the m-th tree;
(c) for j = 1, 2, …, J, estimate the value of each leaf node region by line search to minimize the loss function, computing c_mj = (1/K) Σ_{x_i ∈ R_mj} r_mi, where K is the number of samples in the j-th node of the m-th tree; that is, c_mj is the average of the residuals in the j-th node of the m-th tree;
(d) update f_m(x) = f_{m−1}(x) + Σ_{j=1}^{J} c_mj I(x ∈ R_mj), where I(·) is the indicator function, which minimizes the loss.
(3) Obtain the final classification tree: f_M(x) = f_0(x) + Σ_{m=1}^{M} Σ_{j=1}^{J} c_mj I(x ∈ R_mj).
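The boosting steps (1)-(3) can be illustrated with a minimal from-scratch sketch using squared loss and depth-1 regression stumps (an illustrative toy, not the patent's implementation; with squared loss the leaf values are exactly the residual means, as in step (c)):

```python
import numpy as np

def fit_stump(x, r):
    """Best single-threshold split of 1-d inputs x minimizing squared error on r."""
    best = (np.inf, None, r.mean(), r.mean())
    for t in np.unique(x)[:-1]:
        left, right = r[x <= t], r[x > t]
        sse = ((left - left.mean())**2).sum() + ((right - right.mean())**2).sum()
        if sse < best[0]:
            best = (sse, t, left.mean(), right.mean())
    return best[1:]                          # (threshold, left leaf value, right leaf value)

def gbdt_fit_predict(x, y, x_test, M=20):
    f0 = y.mean()                            # (1) constant minimizing the squared loss
    pred = np.full_like(y, f0, dtype=float)
    test_pred = np.full(len(x_test), f0)
    for _ in range(M):                       # (2) for m = 1..M
        r = y - pred                         # (a) residuals r_mi = y_i - f_{m-1}(x_i)
        t, cl, cr = fit_stump(x, r)          # (b),(c) fit tree; leaves = residual means
        pred += np.where(x <= t, cl, cr)     # (d) f_m = f_{m-1} + sum_j c_mj I(x in R_mj)
        test_pred += np.where(x_test <= t, cl, cr)
    return test_pred                         # (3) the final additive model

x = np.array([0., 1., 2., 3., 4., 5.])
y = (x > 2.5).astype(float)                  # a step function: residuals vanish quickly
out = gbdt_fit_predict(x, y, x)
```

On this separable toy target a single stump already drives the residuals to zero, so the fitted values match `y` exactly.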
obtaining a gradient enhanced decision tree classification function, and then carrying out marking-free data T'1And performing path sample construction and then putting the path sample into a classification tree for prediction. The method comprises the following steps:
the input is as follows:
(4) during training, a classification regression tree is trained for each possible class of the sample X. The training set has two types, namely an upper-lower relation or a non-upper-lower word relation, for the sample<x,y>The prediction result 0 indicates a non-hypernym relationship, and 1 indicates a hypernym relationship. After multi-round iterative training, two trees are generated, and a new sample is obtained<x’,y’>Is respectively F1(x),F2(x) Then the probability that the sample belongs to a certain class c is:
the method comprises the steps of training two classifiers by constructing different samples, taking a sample of which the prediction result of the same sample on the two classifiers is greater than 0.8 as a high-confidence sample.
When text { T1,T2,T3,…,TnWhen there is no intersection, the new text T2The generated high confidence set is directly added into the training set after being audited. At this time, the growth rate:when the semi-supervised model learns the high confidence set of the nth mutually disjoint text, the growth rate tends to be 0:
when text { T'1,T′2,T′3,...,T′NT 'for any two texts'n,T′mWhen there is intersection
T′n∪T′m=T′n\T′m+T′n∩T′m+T′m\Tn
I.e. for T'1Newly-added T'm,T′nThe effect of the text is equivalent to newly added T'n\T′m+T′n∩T′m+T′m\T′n. And taking the intersection of the upper and lower word pairs in the meaning of the intersection.
Then when n documents are newly added,
that is, when there is intersection between the texts, any n texts can be split into at mostMutually disjoint text. When learning the nth text, assuming that i ≠ j, the text growth rate is:
is T'i\T′j=T′ij,T′i\T′j=T′ji,T′i∩T′j=T′(j,i)And T'ij,T′ji,T′(j,i)And if the two are not mutually intersected, then:
when N → N
So the amount of newly added information tends to be 0 when N → N, i.e., text, tends to add the full amount of text. If for any T'iTo learn T'iWhen the growth rate of the model is greater than or equal to 0, when i tends to infinity, the growth rate of the model tends to 0; the model converges.
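The set identity underlying this convergence argument can be checked directly on small examples (a sanity-check sketch: representing each text as a set of word pairs is an illustrative simplification):

```python
# |A ∪ B| = |A \ B| + |A ∩ B| + |B \ A|: the union splits into three disjoint
# parts, which is what the growth-rate argument above relies on.
A = {("fruit", "apple"), ("animal", "dog"), ("appliance", "fridge")}
B = {("animal", "dog"), ("appliance", "fridge"), ("fruit", "pear")}

parts = [A - B, A & B, B - A]
assert sum(len(p) for p in parts) == len(A | B)
assert set.union(*parts) == A | B
# and the three parts are pairwise disjoint:
assert all(p & q == set() for p in parts for q in parts if p is not q)
```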
While the invention has been described in connection with specific embodiments thereof, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (6)
1. A method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree, characterized by comprising the following steps:
(1) constructing a random misalignment sample training set;
(2) constructing a path-based sample training set;
(3) training a semi-supervised joint gradient boosted decision tree model on the constructed random misalignment sample training set and path-based sample training set, and recognizing hypernym-hyponym relations with the trained model.
2. The method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree as claimed in claim 1, wherein the random misalignment sample training set is constructed as follows:
(1.1) segmenting the corpus text with the Alibaba Word Segmenter lexical analysis system; matching hypernym-hyponym word pairs extracted from an existing lexicon against the text, and constructing positive samples from the word pairs combined with the text between them;
(1.2) misaligning the hypernyms and hyponyms of successfully matched word pairs to form negative word pairs, and matching these misaligned pairs against the text to construct random misalignment negative samples;
(1.3) combining the positive and negative samples obtained above into the random misalignment sample training set.
3. The method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree as claimed in claim 1, wherein the path-based sample training set is constructed as follows:
(2.1) fragmenting the corpus text, recorded as S_split = Split({S_1, S_2, S_3, …, S_n});
(2.2) taking the misaligned word pairs from the random misalignment samples and matching them against the corpus text to obtain the set of sentences containing misaligned pairs, S_<x,y> = {S_<x1,y1>, S_<x2,y2>, S_<x3,y3>, …, S_<xn,yn>};
(2.3) extracting the paths between the misaligned word pairs, recorded as P = {P_1, P_2, P_3, …, P_n};
(2.4) matching these paths against the corpus fragments {S_1, S_2, …, S_n}; after a successful match, looking up the fragment's prototype sentence and taking the first word before and after the path P' that is not part of the original misaligned pair as a path-based negative word pair; combining with the positive samples yields the path-based sample training set.
4. The method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree as claimed in claim 3, wherein the corpus fragmentation uses an N-gram algorithm to enumerate the sentence fragments formed by all consecutive word segments; each segment counts as length 1, and only fragments whose path length is at most 5 are kept.
5. The method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree as claimed in claim 1, wherein the semi-supervised joint gradient boosted decision tree model is an additive model, the learning algorithm is forward stagewise, and the basis functions are CART trees; the loss function is the squared error loss, i.e.:
L(y, f(x)) = (1/2)(y − f(x))^2
The negative gradient is then:
−∂L(y, f(x))/∂f(x) = y − f(x)
where y − f(x) is the residual; the output is the classification tree f(x).
6. The method for recognizing hypernym-hyponym relations based on a gradient boosted decision tree as claimed in claim 1, wherein the semi-supervised joint gradient boosted decision tree is trained as follows:
Input: a text corpus T, pre-trained word embeddings W, and a maximum number of iterations I;
(i) preprocess T and extract the two training sample sets X_p and X_d, where X_p is the path-based sample training set and X_d is the random misalignment sample training set;
(ii) convert each training sample into a vector representation using the word embeddings W;
(iii) train two classifiers f_1 and f_2 using X_p ∪ X'_p and X_d ∪ X'_d respectively;
(iv) predict the unlabeled samples, and select the high-confidence positive samples to extend the new training sample sets X'_p and X'_d;
(v) repeat steps (iii) and (iv) until no new labeled samples appear in X'_p and X'_d;
Output: the two classifiers and the prediction labels for the test samples.
Priority Applications (1)

| Application Number | Priority Date | Filing Date | Title |
|---|---|---|---|
| CN201911086620.0A (granted as CN110968665B) | 2019-11-08 | 2019-11-08 | Method for recognizing upper and lower level word relation based on gradient enhanced decision tree |
Publications (2)

| Publication Number | Publication Date |
|---|---|
| CN110968665A | 2020-04-07 |
| CN110968665B | 2022-09-23 |