CN109885680B - Short text classification preprocessing method, system and device based on semantic extension - Google Patents
- Publication number
- CN109885680B
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Active
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention discloses a short text classification preprocessing method, system and device based on semantic extension. The method comprises the following steps: performing preliminary processing on short texts to be classified to obtain original word vectors; performing semantic expansion processing on each word in the original word vector to obtain expanded word vectors, which form a candidate expanded word vector set; performing semantic similarity calculation on the expanded word vectors in the candidate set, and screening out the group of expanded word vectors with the largest semantic similarity as the specific word vectors; and weighting the original word vector and the group of specific word vectors to obtain the word vector to be classified. The invention effectively overcomes the insufficient information content of the original text. Because it relies on semantic expansion, it places no restriction on the choice of the later-stage classification algorithm, and it recognizes newly appearing words better. It thereby helps improve the generalization performance of the subsequent classification algorithm and greatly improves the accuracy of the subsequent classification.
Description
Technical Field
The invention relates to the technical field of data processing, in particular to a short text classification preprocessing method, system and device based on semantic extension.
Background
Text classification is a very widely encountered application scenario: for example, news articles must be classified into sports, politics and so on, or novels into science fiction, story, wuxia (swordsman) fiction and so on. Current text classification methods are mainly based either on traditional feature engineering combined with a machine learning algorithm, or directly on a deep learning algorithm. However, a long text provides a large amount of information while a short text provides very limited information, so feature information is easy to extract from long texts and much harder to extract from short texts.
For short text classification, existing methods focus on which classification algorithm to adopt in order to improve accuracy, such as convolutional neural networks, multi-model fusion, SVM and random forests. In practice, however, the difficulty of short text classification is that the text is too short and contains too little information, so too few features are available as input to any classification algorithm, and the classification accuracy is low.
Disclosure of Invention
To solve the above technical problems, the present invention provides a short text classification preprocessing method, system and device based on semantic extension that can improve classification accuracy.
The technical scheme adopted by the invention is as follows:
a short text classification preprocessing method based on semantic extension comprises the following steps:
performing preliminary processing on short texts to be classified to obtain original word vectors;
performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set;
performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
weighting the original word vector and the specific word vector group to obtain word vectors to be classified;
and inputting the word vector to be classified into a classifier for text classification.
As a further improvement of the short text classification preprocessing method based on the semantic extension, the step of performing preliminary processing on the short text to be classified to obtain an original word vector specifically includes:
performing word segmentation processing on short texts to be classified to obtain word segmentation results;
and performing stop word deletion processing on the word segmentation result to obtain an original word vector.
As a further improvement of the short text classification preprocessing method based on semantic extension, the step of performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further forming a candidate expanded word vector set specifically includes:
performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
extracting sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and forming a candidate expansion word vector set according to the obtained expansion word vectors.
As a further improvement of the short text classification preprocessing method based on semantic extension, the step of performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set and screening out the group of expansion word vectors with the largest average semantic similarity as the specific word vectors specifically includes:
vectorizing expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vectors;
calculating the average semantic similarity of a word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vectors according to the average similarity of the word vector representation set corresponding to each expansion word vector.
The other technical scheme adopted by the invention is as follows:
a short text classification preprocessing system based on semantic extension, comprising:
the preliminary processing unit is used for carrying out preliminary processing on the short texts to be classified to obtain original word vectors;
the semantic expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further form a candidate expanded word vector set;
the screening unit is used for carrying out semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
the weighting processing unit is used for weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
and the input unit is used for inputting the word vectors to be classified into the classifier to classify the texts.
As a further improvement of the short text classification preprocessing system based on the semantic extension, the preliminary processing unit specifically includes:
the word segmentation processing unit is used for carrying out word segmentation processing on the short text to be classified to obtain a word segmentation result;
and the stop word processing unit is used for deleting the stop words from the word segmentation result to obtain an original word vector.
As a further improvement of the short text classification preprocessing system based on the semantic extension, the semantic extension unit specifically includes:
the expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
the extraction unit is used for extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and the set forming unit is used for forming a candidate expansion word vector set according to the obtained expansion word vectors.
As a further improvement of the short text classification preprocessing system based on the semantic extension, the screening unit specifically includes:
the vectorization processing unit is used for vectorizing and characterizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
the semantic similarity calculation unit is used for calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
the average calculating unit is used for calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and the word vector screening unit is used for screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
A further technical scheme adopted by the invention is as follows:
an apparatus for preprocessing short text classification based on semantic extension, comprising:
a memory for storing a program;
a processor for executing the program, the program causing the processor to execute the method for preprocessing short text classification based on the semantic extension.
The invention has the beneficial effects that:
the invention provides a short text classification preprocessing method, system and device based on semantic extension which, through semantic expansion, semantic similarity calculation and weighting, obtain word vectors to be classified that replace the original short text as input to the classification algorithm, thereby overcoming the insufficient information content of the original text.
Drawings
FIG. 1 is a flowchart illustrating steps of a short text classification preprocessing method based on semantic extension according to the present invention;
FIG. 2 is a block diagram of a short text classification preprocessing system based on semantic extension according to the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings:
referring to fig. 1, the invention relates to a short text classification preprocessing method based on semantic extension, comprising the following steps:
S1, performing preliminary processing on the short texts to be classified to obtain original word vectors;
S2, performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set;
S3, performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
S4, weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
and S5, inputting the word vector to be classified into a classifier for text classification.
In the present embodiment, it is assumed that for a short text $T_i$, two groups of expansion word vectors $p$ and $q$ are selected, denoted $V_i^p$ and $V_i^q$. The three parts are concatenated to form a new text, i.e. the word vector to be classified, which replaces the original short text as input to the classification algorithm:
$$T_i' = (T_i,\ V_i^p,\ V_i^q)$$
To strengthen the original short text $T_i$, however, the invention proposes weighting it, i.e. the new substitute text is:
$$T_i' = (w \cdot T_i,\ V_i^p,\ V_i^q), \quad w > 1$$
The processes S1 to S4 are performed on all short texts to form a new substitute text dataset, which is then processed by any of various existing text classification algorithms to obtain the class of each short text.
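The weighting formula cannot be fully recovered from this text, so the following is only a minimal Python sketch under an explicit assumption: the substitute text is treated as a weighted bag of words in which every word of the original short text $T_i$ counts $w$ times ($w > 1$) and every sememe word of the two selected expansion word vectors $p$ and $q$ counts once. The function name `build_substitute_text` and the toy tokens are hypothetical.

```python
from collections import Counter
from typing import Sequence


def build_substitute_text(original: Sequence[str],
                          expansions: Sequence[Sequence[str]],
                          w: float = 2.0) -> Counter:
    """Combine the original short text T_i with the selected expansion vectors.

    Assumption (not stated in the source): weighting is modelled as a weighted
    bag of words; original words get weight w, expansion words get weight 1,
    and the weights of a word that appears several times are summed.
    """
    if w <= 1:
        raise ValueError("the weight of the original text must satisfy w > 1")
    bag: Counter = Counter()
    for word in original:          # weighted original short text, w * T_i
        bag[word] += w
    for group in expansions:       # each selected expansion word vector (p, q)
        for word in group:
            bag[word] += 1
    return bag


if __name__ == "__main__":
    t_i = ["apple", "release", "phone"]   # toy original word vector
    v_p = ["fruit", "act", "tool"]        # toy expansion word vector p
    v_q = ["brand", "event", "tool"]      # toy expansion word vector q
    print(build_substitute_text(t_i, [v_p, v_q], w=2.0))
```

A weighted bag of words like this can be passed to any bag-of-words vectorizer before classification; for classifiers that consume token sequences, repeating the original tokens an integer $w$ times would be a comparable alternative.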
Further as a preferred embodiment, the preliminary processing is performed on the short text to be classified to obtain an original word vector, and this step specifically includes:
S11, performing word segmentation processing on the short texts to be classified to obtain word segmentation results;
and S12, performing stop word deletion processing on the word segmentation result to obtain an original word vector.
In this embodiment, each short text may be segmented with any word segmentation tool (e.g., jieba), after which preset stop words in the short text, such as the Chinese structural particles “的”, “地” and “得” (rendered literally as “of”, “ground” and “get”), are deleted, giving the following original word vector:
$$T_i = (t_{i1}, t_{i2}, \dots, t_{iC_i})$$
where $T_i$ represents the word vector of the $i$-th short text, $t_{i1}$ represents the first word in the text, $t_{iC_i}$ represents the $C_i$-th word in the short text, and $C_i$ is therefore also the number of words obtained from the short text.
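Steps S11 and S12 can be sketched in Python with a pluggable tokenizer. The default `str.split` only works on pre-segmented (space-separated) text; for raw Chinese input a segmenter such as jieba's `jieba.lcut` could be passed instead. The stop-word list contains only the three particles mentioned above and is illustrative, not the list used in the source; the name `preprocess` is hypothetical.

```python
from typing import Callable, Iterable, List

# Illustrative stop words only: the structural particles named in the text.
DEFAULT_STOP_WORDS = frozenset({"的", "地", "得"})


def preprocess(text: str,
               tokenize: Callable[[str], List[str]] = str.split,
               stop_words: Iterable[str] = DEFAULT_STOP_WORDS) -> List[str]:
    """Return the original word vector T_i = (t_i1, ..., t_iCi).

    S11: segment the short text with `tokenize`.
    S12: drop stop words and whitespace-only tokens.
    """
    stops = set(stop_words)
    return [tok for tok in tokenize(text) if tok.strip() and tok not in stops]


if __name__ == "__main__":
    # Pre-segmented toy input; with jieba installed one could instead call
    # preprocess(raw_text, tokenize=jieba.lcut).
    words = preprocess("他 高兴 地 买 了 新 的 手机")
    print(words, len(words))  # len(words) plays the role of C_i
```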
Further as a preferred embodiment, the performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set specifically includes:
S21, performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
Each word $t_{ij}$ in the original word vector $T_i$ of each short text corresponds to one or more concepts in HowNet, so the invention extends the representation of the short text based on the sememes of each HowNet concept.
Let the $j$-th word of short text $i$ be $t_{ij}$. In HowNet's concept-sememe descriptions, this word corresponds to a sememe set $S_{ij} = \{s_{ij}^{1}, s_{ij}^{2}, \dots, s_{ij}^{CS_j}\}$, where $s_{ij}^{1}$ represents the first sememe in the set; the word has $CS_j$ sememes in total.
S22, extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
the preset method for constructing a group of expanded word vectors in this embodiment is as follows: from each sememe set $S_{ij}$ in the sequence of sememe sets $T\_Sem_i = (S_{i1}, S_{i2}, \dots, S_{iC_i})$ corresponding to the original word vector $T_i$, one or two sememes are extracted and combined to form an expanded word vector. For example, taking the first item of each sememe set forms the expanded word vector:
$$V_i^1 = (s_{i1}^{1}, s_{i2}^{1}, \dots, s_{iC_i}^{1})$$
For an original word vector $T_i$, multiple such expanded word vectors exist, and together they form the set of expanded word vectors $V_i = \{V_i^1, V_i^2, \dots\}$.
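The construction of the candidate set described above can be sketched with a Cartesian product over the sememe sets. This is a minimal Python sketch under stated assumptions: `sememe_table` is a hypothetical word-to-sememe-list lookup standing in for HowNet, exactly one sememe is taken per word (the source also allows two), words without an entry keep themselves as their only "sememe", and `max_candidates` is a hypothetical cap on the combinatorial growth.

```python
from itertools import islice, product
from typing import Dict, List, Sequence, Tuple


def candidate_expansions(words: Sequence[str],
                         sememe_table: Dict[str, List[str]],
                         max_candidates: int = 1000) -> List[Tuple[str, ...]]:
    """Build the candidate expanded word vectors for one short text T_i.

    Assumptions (hypothetical, for illustration):
    - `sememe_table` maps a word to its sememe list (a stand-in for HowNet);
    - a word without an entry keeps itself as its only "sememe";
    - exactly one sememe is taken per word.
    """
    sememe_sets = [sememe_table.get(word) or [word] for word in words]  # T_Sem_i
    # product() varies the last position fastest, so the first candidate takes
    # the first sememe of every set, matching the example in the text.
    # islice() caps the number of combinations (product of all set sizes).
    return list(islice(product(*sememe_sets), max_candidates))


if __name__ == "__main__":
    table = {"apple": ["fruit", "brand"], "phone": ["tool", "communicate"]}
    for candidate in candidate_expansions(["apple", "phone"], table):
        print(candidate)
```

Because the number of combinations is the product of all sememe-set sizes, some cap or pruning is likely needed for longer texts; the cap here is a design choice of the sketch, not something the source specifies.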
And S23, forming a candidate expansion word vector set according to the obtained expansion word vectors.
Further as a preferred embodiment, the semantic similarity calculation is performed on the expansion word vectors in the candidate expansion word vector set, and the group of expansion word vectors with the largest average semantic similarity is obtained by screening as the specific word vector, where the step specifically includes:
S31, vectorizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
this embodiment of the invention uses word2vec, trained on the Wikipedia or Sogou corpus, to represent each sememe word in an expanded word vector as a vector. The vector dimension can be set to 50, 100, 300 or other values, and each component is a floating-point number; this completes the vectorized representation of each word.
This yields the word2vec vector representation sets corresponding to the candidate expansion word vectors, $W_i = \{W_i^1, W_i^2, \dots\}$, where $W_i^k$ holds the word vectors of the sememe words in the $k$-th expansion word vector of $T_i$.
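Step S31 amounts to a table lookup per sememe word. The sketch below assumes a hypothetical in-memory mapping from word to vector (for example, exported from a word2vec model trained on Wikipedia or Sogou text) and, as an extra assumption not in the source, skips sememe words that have no vector. A gensim `KeyedVectors` object supports the same `in` and `[]` operations, so it could likely be passed in place of the dictionary; the function names are hypothetical.

```python
from typing import List, Mapping, Sequence

Vector = List[float]


def vectorize(expansion: Sequence[str],
              embeddings: Mapping[str, Vector]) -> List[Vector]:
    """S31: word vector representation set for one expanded word vector.

    Each sememe word is replaced by its embedding; out-of-vocabulary words
    are skipped (assumption).
    """
    return [list(embeddings[word]) for word in expansion if word in embeddings]


def vectorize_all(candidates: Sequence[Sequence[str]],
                  embeddings: Mapping[str, Vector]) -> List[List[Vector]]:
    """Apply S31 to every candidate expanded word vector of one short text."""
    return [vectorize(candidate, embeddings) for candidate in candidates]


if __name__ == "__main__":
    toy_embeddings = {"fruit": [1.0, 0.0], "tool": [0.0, 1.0]}  # toy 2-d vectors
    print(vectorize_all([("fruit", "tool"), ("fruit", "unknown")], toy_embeddings))
```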
S32, calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
the method of the present invention is not limited to a particular similarity calculation method; this embodiment takes cosine similarity only as an example. For two $n$-dimensional vectors $A$ and $B$, the cosine similarity is:
$$\cos(A, B) = \frac{A \cdot B}{\|A\|\,\|B\|} = \frac{\sum_{l=1}^{n} A_l B_l}{\sqrt{\sum_{l=1}^{n} A_l^{2}}\,\sqrt{\sum_{l=1}^{n} B_l^{2}}}$$
the invention uses such a similarity calculation method (e.g., the cosine similarity formula above) to calculate the semantic similarity between any two word vectors $\mathbf{x}_a$ and $\mathbf{x}_b$ in a word vector representation set:
$$sim(\mathbf{x}_a, \mathbf{x}_b) = \cos(\mathbf{x}_a, \mathbf{x}_b), \quad a < b$$
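Cosine similarity and the pairwise step S32 can be sketched in plain Python as follows. Returning 0.0 when either vector has zero length is an assumption of this sketch (the formula is undefined in that case), and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """cos(A, B) = (A . B) / (||A|| * ||B||) for equal-length vectors.

    Returns 0.0 if either vector has zero length (assumption).
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def pairwise_similarities(vectors: Sequence[Sequence[float]]) -> List[float]:
    """S32: cosine similarity of every unordered pair (a < b) in one set."""
    return [cosine(u, v) for u, v in combinations(vectors, 2)]


if __name__ == "__main__":
    representation_set = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]  # toy vectors
    print(pairwise_similarities(representation_set))
```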
S33, calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
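each word2vec vector representation set $W_i^k$ (the word vectors of the $k$-th expansion word vector) contains $m$ vectors $\mathbf{x}_1, \dots, \mathbf{x}_m$ (for example, $m = C_i$ when one sememe is taken per word), so there are $m(m-1)/2$ such pairwise similarities in total. The average similarity of the vector representation set $W_i^1$ can therefore be calculated as:
$$\overline{sim}(W_i^1) = \frac{2}{m(m-1)} \sum_{a=1}^{m-1} \sum_{b=a+1}^{m} sim(\mathbf{x}_a, \mathbf{x}_b)$$
This is the average similarity of the first expansion word vector of the short text $T_i$. Similarly, the average similarities of the other expansion word vectors of $T_i$ can be calculated, giving the average similarity vector of the expanded word vectors of $T_i$:
$$sim(V_i) = \left(\overline{sim}(W_i^1), \overline{sim}(W_i^2), \dots\right)$$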
by the definition of cosine similarity, the similarity ranges from -1 to 1: -1 means that the two vectors point in exactly opposite directions, 1 means that they point in exactly the same direction, 0 usually means that they are independent (orthogonal), and values in between indicate intermediate degrees of similarity or dissimilarity.
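Step S33 averages the pairwise similarities within each representation set and collects one average per candidate into sim(V_i). A minimal, self-contained Python sketch follows (cosine is redefined so the block runs on its own); scoring a set with fewer than two vectors as 0.0 is an assumption, not something the source specifies, and the function names are hypothetical.

```python
import math
from itertools import combinations
from typing import List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; 0.0 when either vector has zero length (assumption)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average_similarity(vectors: Sequence[Vector]) -> float:
    """S33: mean cosine over all m(m-1)/2 unordered pairs of m vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:  # fewer than two vectors: assumed to score 0.0
        return 0.0
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def similarity_vector(feature_sets: Sequence[Sequence[Vector]]) -> List[float]:
    """sim(V_i): one average similarity per candidate expanded word vector."""
    return [average_similarity(vectors) for vectors in feature_sets]


if __name__ == "__main__":
    sets = [[[1.0, 0.0], [0.9, 0.1]], [[1.0, 0.0], [0.0, 1.0]]]  # toy sets
    print(similarity_vector(sets))
```

Since every cosine lies in [-1, 1], each average also lies in that range, so the entries of sim(V_i) can be compared directly.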
S34, screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
The invention performs a screening operation on $sim(V_i)$ and selects the entry with the largest value: the larger the value, the closer the semantic association among the sememes of the corresponding expansion word vector, and the more likely that vector is to serve as an expansion that can replace the original short text. The invention also proposes selecting the two largest entries of $sim(V_i)$ as the expansion word vectors (the two groups $p$ and $q$ used in step S4).
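Screening then reduces to picking the entry with the largest value (or the two largest entries) of sim(V_i). A minimal Python sketch, assuming the candidates and their average similarities are given as parallel sequences; the name `select_expansions` is hypothetical, and breaking ties in favour of the earlier candidate is an assumption.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def select_expansions(candidates: Sequence[T],
                      sims: Sequence[float],
                      k: int = 1) -> List[T]:
    """S34: keep the k candidates with the largest average similarity.

    k = 1 selects the single best expansion; k = 2 yields the two groups
    p and q. sorted() is stable, so ties keep the earlier candidate first.
    """
    if len(candidates) != len(sims):
        raise ValueError("one similarity score per candidate is required")
    order = sorted(range(len(sims)), key=lambda idx: sims[idx], reverse=True)
    return [candidates[idx] for idx in order[:k]]


if __name__ == "__main__":
    cands = [("fruit", "tool"), ("brand", "tool"), ("fruit", "communicate")]
    print(select_expansions(cands, [0.2, 0.9, 0.5], k=2))
```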
Referring to fig. 2, the invention relates to a short text classification preprocessing system based on semantic extension, comprising:
the preliminary processing unit is used for carrying out preliminary processing on the short texts to be classified to obtain original word vectors;
the semantic expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further form a candidate expanded word vector set;
the screening unit is used for carrying out semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
the weighting processing unit is used for weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
and the input unit is used for inputting the word vectors to be classified into the classifier to classify the texts.
Further as a preferred embodiment, the preliminary processing unit specifically includes:
the word segmentation processing unit is used for carrying out word segmentation processing on the short text to be classified to obtain a word segmentation result;
and the stop word processing unit is used for deleting the stop words from the word segmentation result to obtain an original word vector.
Further as a preferred embodiment, the semantic extension unit specifically includes:
the expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
the extraction unit is used for extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and the set forming unit is used for forming a candidate expansion word vector set according to the obtained expansion word vectors.
Further as a preferred embodiment, the screening unit specifically includes:
the vectorization processing unit is used for vectorizing and characterizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
the semantic similarity calculation unit is used for calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
the average calculating unit is used for calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and the word vector screening unit is used for screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
The invention also provides a short text classification preprocessing device based on semantic extension, which specifically comprises:
a memory for storing a program;
a processor for executing the program, the program causing the processor to execute the method for preprocessing short text classification based on the semantic extension.
By expanding the information in the short text, the method facilitates the application of later-stage text classification algorithms and can effectively improve classification accuracy. Traditional expansion methods, designed around the existing vocabulary of the dataset, tend to restrict the choice of the later-stage classification algorithm or struggle to recognize new words that appear in the test set. The invention instead expands each word in the short text with externally associated sememe words to form a substitute text whose length can be flexibly controlled. As a result, it places no restriction on the choice of the later-stage algorithm, applies to both training-set and test-set data, and recognizes newly appearing words better in future detection.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.
Claims (7)
1. A short text classification preprocessing method based on semantic extension is characterized by comprising the following steps:
performing preliminary processing on short texts to be classified to obtain original word vectors;
performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set;
performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
weighting the original word vector and the specific word vector group to obtain word vectors to be classified;
inputting the word vectors to be classified into a classifier for text classification;
wherein the step of performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set and screening out the group of expansion word vectors with the largest average semantic similarity as the specific word vectors specifically comprises:
vectorizing expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vectors;
calculating the average semantic similarity of a word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vectors according to the average similarity of the word vector representation set corresponding to each expansion word vector.
2. The short text classification preprocessing method based on semantic extension according to claim 1, wherein the step of performing preliminary processing on the short texts to be classified to obtain an original word vector specifically comprises:
performing word segmentation processing on short texts to be classified to obtain word segmentation results;
and performing stop word deletion processing on the word segmentation result to obtain an original word vector.
3. The short text classification preprocessing method based on semantic extension according to claim 1, wherein the step of performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further forming a candidate expanded word vector set specifically comprises:
performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
extracting sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and forming a candidate expansion word vector set according to the obtained expansion word vectors.
4. A short text classification preprocessing system based on semantic extension, comprising:
the preliminary processing unit is used for carrying out preliminary processing on the short texts to be classified to obtain original word vectors;
the semantic expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further form a candidate expanded word vector set;
the screening unit is used for carrying out semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
the weighting processing unit is used for weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
the input unit is used for inputting the word vectors to be classified into the classifier to classify the texts;
the screening unit specifically comprises:
the vectorization processing unit is used for vectorizing and characterizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
the semantic similarity calculation unit is used for calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
the average calculating unit is used for calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and the word vector screening unit is used for screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
5. The short text classification preprocessing system based on semantic extension according to claim 4, wherein the preliminary processing unit specifically comprises:
the word segmentation processing unit is used for carrying out word segmentation processing on the short text to be classified to obtain a word segmentation result;
and the stop word processing unit is used for deleting the stop words from the word segmentation result to obtain an original word vector.
6. The short text classification preprocessing system based on semantic extension according to claim 4, wherein the semantic extension unit specifically comprises:
the expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
the extraction unit is used for extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and the set forming unit is used for forming a candidate expansion word vector set according to the obtained expansion word vectors.
7. An apparatus for preprocessing short text classification based on semantic extension, comprising:
a memory for storing a program;
a processor for executing the program, the program causing the processor to execute the method of the short text classification preprocessing based on the semantic extension according to any one of claims 1 to 3.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910060245.6A CN109885680B (en) | 2019-01-22 | 2019-01-22 | Short text classification preprocessing method, system and device based on semantic extension |
Publications (2)
Publication Number | Publication Date |
---|---|
CN109885680A (en) | 2019-06-14 |
CN109885680B (en) | 2020-05-19 |
Family ID: 66926608
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910060245.6A CN109885680B (en), Active | 2019-01-22 | 2019-01-22 | Short text classification preprocessing method, system and device based on semantic extension |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109885680B (en) |
Families Citing this family (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN110765259A (en) * | 2019-09-19 | 2020-02-07 | 平安科技(深圳)有限公司 | Text filtering method based on lexical semaphores and related equipment |
CN115083550B (en) * | 2022-06-29 | 2023-08-08 | 西安理工大学 | Patient similarity classification method based on multi-source information |
Family Cites Families (2)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US8527523B1 (en) * | 2009-04-22 | 2013-09-03 | Equivio Ltd. | System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith |
CN108920482B (en) * | 2018-04-27 | 2020-08-21 | 浙江工业大学 | Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model |
- 2019-01-22: CN201910060245.6A (CN109885680B), CN, status Active
Similar Documents
Publication | Title |
---|---|
CN107102981B (en) | Word vector generation method and device |
Chen et al. | Learning deep features for image emotion classification |
CN111159485B (en) | Tail entity linking method, device, server and storage medium |
CN112347284B (en) | Combined trademark image retrieval method |
KR20200075114A (en) | System and Method for Matching Similarity between Image and Text |
CN112819023A (en) | Sample set acquisition method and device, computer equipment and storage medium |
CN106227719B (en) | Chinese word segmentation disambiguation method and system |
CN106844482B (en) | Search engine-based retrieval information matching method and device |
CN106708798A (en) | String segmentation method and device |
CN109885680B (en) | Short text classification preprocessing method, system and device based on semantic extension |
CN115544303A (en) | Method, apparatus, device and medium for determining label of video |
CN114647713A (en) | Knowledge graph question-answering method, device and storage medium based on virtual confrontation |
CN111368066A (en) | Method, device and computer readable storage medium for acquiring dialogue abstract |
CN110413997B (en) | New word discovery method, system and readable storage medium for power industry |
CN115187910A (en) | Video classification model training method and device, electronic equipment and storage medium |
CN105354264B (en) | Fast topic label adding method based on locality-sensitive hashing |
CN112711944B (en) | Word segmentation method and system, and word segmentation device generation method and system |
CN111241271A (en) | Text emotion classification method and device and electronic equipment |
CN113704623A (en) | Data recommendation method, device, equipment and storage medium |
Pei-Xia et al. | Learning discriminative CNN features and similarity metrics for image retrieval |
CN111159456B (en) | Multi-scale clothing retrieval method and system based on deep learning and traditional features |
Henri et al. | A deep transfer learning model for the identification of bird songs: A case study for Mauritius |
CN111488400A (en) | Data classification method, device and computer readable storage medium |
CN109241124A (en) | Method and system for quickly searching similar character strings |
Sridhar et al. | Performance Analysis of Two-Stage Iterative Ensemble Method over Random Oversampling Methods on Multiclass Imbalanced Datasets |
Legal Events
Code | Title |
---|---|
PB01 | Publication |
SE01 | Entry into force of request for substantive examination |
GR01 | Patent grant |