CN109885680B - Short text classification preprocessing method, system and device based on semantic extension - Google Patents

Short text classification preprocessing method, system and device based on semantic extension

Info

Publication number
CN109885680B
Authority
CN
China
Prior art keywords
word
word vector
expansion
semantic
vectors
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910060245.6A
Other languages
Chinese (zh)
Other versions
CN109885680A (en)
Inventor
郑建华
刘双印
朱蓉
贺超波
徐龙琴
张世龙
冯大春
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Zhongkai University of Agriculture and Engineering
Original Assignee
Zhongkai University of Agriculture and Engineering
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Zhongkai University of Agriculture and Engineering
Priority to CN201910060245.6A
Publication of CN109885680A
Application granted
Publication of CN109885680B
Current legal status: Active
Anticipated expiration legal-status Critical

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention discloses a short text classification preprocessing method, system and device based on semantic extension. The method comprises the following steps: performing preliminary processing on the short texts to be classified to obtain original word vectors; performing semantic expansion processing on each word in the original word vector to obtain expansion word vectors, which form a candidate expansion word vector set; performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening out the group of expansion word vectors with the maximum semantic similarity as the specific word vectors; and weighting the original word vector and the specific word vector group to obtain the word vector to be classified. The invention overcomes the insufficient information content of the original text. Because it relies on semantic expansion, it does not restrict the choice of the later classification algorithm and it recognizes newly appearing words better, which helps the generalization of the subsequent classification algorithm and greatly improves the accuracy of the subsequent classification.

Description

Short text classification preprocessing method, system and device based on semantic extension
Technical Field
The invention relates to the technical field of data processing, in particular to a short text classification preprocessing method, system and device based on semantic extension.
Background
Text classification is a widely encountered application scenario: for example, a news item must be classified as sports, politics and so on, or a novel as science fiction, story, martial arts (wuxia) and so on. Current text classification methods mainly either combine traditional feature engineering with a machine learning algorithm or apply a deep learning algorithm directly. However, a long text provides a large amount of information while a short text provides very little, so characteristic information is easier to extract from long texts and harder to extract from short ones.
For short text classification, existing methods focus on which classification algorithm to adopt in order to improve accuracy, such as convolutional neural networks, multi-model fusion, SVM and random forests. In practice, however, the difficulty of short text classification is that the text is too short and contains too little information, so the features fed to any classification algorithm are too few and the classification accuracy is low.
Disclosure of Invention
In order to solve the above technical problems, the present invention provides a method, a system and a device for preprocessing short text classification based on semantic extension, which can improve accuracy.
The technical scheme adopted by the invention is as follows:
a short text classification preprocessing method based on semantic extension comprises the following steps:
performing primary processing on short texts to be classified to obtain original word vectors;
performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set;
performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
weighting the original word vector and the specific word vector group to obtain word vectors to be classified;
and inputting the word vector to be classified into a classifier for text classification.
As a further improvement of the short text classification preprocessing method based on the semantic extension, the step of performing preliminary processing on the short text to be classified to obtain an original word vector specifically includes:
performing word segmentation processing on short texts to be classified to obtain word segmentation results;
and performing stop word deletion processing on the word segmentation result to obtain an original word vector.
As a further improvement of the short text classification preprocessing method based on the semantic extension, the step of performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set, specifically includes:
performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
extracting sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and forming a candidate expansion word vector set according to the obtained expansion word vectors.
As a further improvement of the short text classification preprocessing method based on the semantic expansion, the semantic similarity calculation is performed on the expansion word vectors in the candidate expansion word vector set, and the group of expansion word vectors with the largest average semantic similarity is obtained by screening as the specific word vector, and the step specifically includes:
vectorizing expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vectors;
calculating the average semantic similarity of a word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vectors according to the average similarity of the word vector representation set corresponding to each expansion word vector.
The other technical scheme adopted by the invention is as follows:
A short text classification preprocessing system based on semantic extension, comprising:
the preliminary processing unit is used for carrying out preliminary processing on the short texts to be classified to obtain original word vectors;
the semantic expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further form a candidate expanded word vector set;
the screening unit is used for carrying out semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
the weighting processing unit is used for weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
and the input unit is used for inputting the word vectors to be classified into the classifier to classify the texts.
As a further improvement of the short text classification preprocessing system based on the semantic extension, the preliminary processing unit specifically includes:
the word segmentation processing unit is used for carrying out word segmentation processing on the short text to be classified to obtain a word segmentation result;
and the stop word processing unit is used for deleting the stop words from the word segmentation result to obtain an original word vector.
As a further improvement of the short text classification preprocessing system based on the semantic extension, the semantic extension unit specifically includes:
the expansion unit is used for carrying out semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
the extraction unit is used for extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and the set forming unit is used for forming a candidate expansion word vector set according to the obtained expansion word vectors.
As a further improvement of the short text classification preprocessing system based on the semantic extension, the screening unit specifically includes:
the vectorization processing unit is used for vectorizing and characterizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
the semantic similarity calculation unit is used for calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
the average calculating unit is used for calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and the word vector screening unit is used for screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
The invention adopts another technical scheme that:
an apparatus for preprocessing short text classification based on semantic extension, comprising:
a memory for storing a program;
a processor for executing the program, the program causing the processor to execute the method for preprocessing short text classification based on the semantic extension.
The invention has the beneficial effects that:
The short text classification preprocessing method, system and device based on semantic extension apply semantic expansion, semantic similarity calculation and weighting to obtain word vectors to be classified, which replace the original short text as input to the classification algorithm and thereby overcome the insufficient information content of the original text.
Drawings
FIG. 1 is a flowchart illustrating steps of a short text classification preprocessing method based on semantic extension according to the present invention;
FIG. 2 is a block diagram of a short text classification preprocessing system based on semantic extension according to the present invention.
Detailed Description
The following further describes embodiments of the present invention with reference to the accompanying drawings:
referring to fig. 1, the invention relates to a short text classification preprocessing method based on semantic extension, comprising the following steps:
s1, carrying out primary processing on the short texts to be classified to obtain original word vectors;
s2, performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set;
s3, performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
s4, weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
and S5, inputting the word vector to be classified into a classifier for text classification.
In the present embodiment, assume that for a short text T_i two groups of expansion word vectors, p and q, are selected, respectively:
Figure GDA0002415953520000061
A new text, namely the word vector to be classified, is then formed by concatenating the three parts (the original word vector and the two expansion word vectors); it replaces the original short text as input to the classification algorithm.
However, to strengthen the contribution of the original short text T_i, the invention proposes weighting the original short text T_i, so that the new substitute text is as follows:
Figure GDA0002415953520000062
wherein w>1。
And performing the processes of S1-S4 on all the short texts to form a new substitute text data set, and then obtaining the classification condition of each short text by using the new substitute text data set through various existing text classification algorithms.
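The weighting step can be sketched as follows. This is a minimal illustration, not the patent's formula: it assumes that the weight w > 1 is realised by repeating the original tokens an integer number of times before appending the selected expansion word vectors. The function name `build_substitute_text` and the toy tokens are hypothetical.

```python
from typing import List, Sequence


def build_substitute_text(
    original: Sequence[str],
    expansions: Sequence[Sequence[str]],
    w: int = 2,
) -> List[str]:
    """Concatenate the weighted original tokens with the selected expansions.

    Assumption: the weight w is applied by repeating the original tokens
    w times; the source only states that w > 1.
    """
    if w <= 1:
        raise ValueError("w must be greater than 1")
    substitute = list(original) * w
    for expansion in expansions:
        substitute.extend(expansion)
    return substitute


# Toy example with made-up tokens: T_i plus two expansion word vectors p and q.
t_i = ["apple", "phone"]
p = ["fruit", "tool"]
q = ["plant", "communicate"]
print(build_substitute_text(t_i, [p, q], w=2))
```

Repetition is only one way to weight the original text; a classifier that accepts per-token weights could apply w directly instead.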
Further as a preferred embodiment, the preliminary processing is performed on the short text to be classified to obtain an original word vector, and this step specifically includes:
s11, performing word segmentation processing on the short texts to be classified to obtain word segmentation results;
and S12, performing stop word deletion processing on the word segmentation result to obtain an original word vector.
In this embodiment, each short text may be segmented with any word segmentation tool (e.g., jieba); preset stop words such as "的", "地" and "得" are then deleted from the short text, which gives the following original word vector:
Figure GDA0002415953520000071
where T_i denotes the word vector of the i-th short text,
Figure GDA0002415953520000072
representing the first word in the text,
Figure GDA0002415953520000073
denotes the C_i-th word in the short text, and C_i is also the number of words extracted from the short text.
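Steps S11 and S12 can be sketched as below. To keep the example self-contained, the segmenter is passed in as a callable and a whitespace splitter stands in for a real tool; in practice a function such as `jieba.lcut` could be supplied instead. The stop-word set, the function names and the sample sentence are illustrative assumptions.

```python
from typing import Callable, Iterable, List, Set

# Illustrative stop-word set; a real list would be much longer.
STOP_WORDS: Set[str] = {"的", "地", "得"}


def preprocess(
    text: str,
    segment: Callable[[str], Iterable[str]],
    stop_words: Set[str] = STOP_WORDS,
) -> List[str]:
    """Segment the text (S11) and delete stop words (S12)."""
    tokens = [tok.strip() for tok in segment(text)]
    return [tok for tok in tokens if tok and tok not in stop_words]


def whitespace_segment(text: str) -> List[str]:
    """Stand-in segmenter for pre-tokenised input (assumption)."""
    return text.split()


t_i = preprocess("今天 的 比赛 很 精彩", whitespace_segment)
print(t_i, len(t_i))  # len(t_i) plays the role of C_i
```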
Further as a preferred embodiment, the performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set specifically includes:
S21, performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
For the original word vector of each short text
Figure GDA0002415953520000074
each word
Figure GDA0002415953520000075
corresponds to a concept in HowNet (知网). The invention therefore extends the representation of the short text based on the sememes of each HowNet concept.
Let the j-th word in short text i be
Figure GDA0002415953520000076
In the HowNet (知网) concept definition, this word has a corresponding sememe set
Figure GDA0002415953520000077
where
Figure GDA0002415953520000078
denotes the first sememe in the set, and the word has CS_j sememes in total.
Similarly, for the other words in short text i
Figure GDA0002415953520000079
corresponding sememe sets can also be constructed:
Figure GDA0002415953520000081
Thus, for the original word vector T_i of each short text, a sequence of sememe sets can be obtained:
Figure GDA0002415953520000082
S22, extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
The preset mode for constructing a group of expansion word vectors in this embodiment is as follows: from each sememe set item in the sememe set sequence T_Sem_i corresponding to the original word vector T_i, namely
Figure GDA0002415953520000083
one or two sememes are extracted and combined to form an expansion word vector. For example, extracting the first item of each sememe set forms an expansion word vector expressed as follows:
Figure GDA0002415953520000084
For an original word vector T_i, there exist
Figure GDA0002415953520000085
such expansion word vectors, which together form a set of expansion word vectors, i.e.
Figure GDA0002415953520000086
And S23, forming a candidate expansion word vector set according to the obtained expansion word vectors.
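Steps S21 to S23 can be sketched as below, under stated assumptions: exactly one sememe is taken from each word's sememe set, every combination is enumerated, and words without a sememe entry are skipped. The dictionary `TOY_SEMEMES` is made up for illustration; real sememe sets would come from a HowNet lookup, which is not shown here.

```python
from itertools import product
from typing import Dict, List, Sequence, Tuple

# Hypothetical sememe sets; not taken from HowNet.
TOY_SEMEMES: Dict[str, List[str]] = {
    "apple": ["fruit", "computer"],
    "phone": ["tool", "communicate", "electronic"],
}


def candidate_expansions(
    words: Sequence[str],
    sememes: Dict[str, List[str]],
) -> List[Tuple[str, ...]]:
    """Enumerate expansion word vectors, one sememe per word (assumption)."""
    # Words with no known sememes are skipped (assumption).
    sememe_sets = [sememes[word] for word in words if sememes.get(word)]
    if not sememe_sets:
        return []
    # Each element of the Cartesian product is one expansion word vector.
    return list(product(*sememe_sets))


candidates = candidate_expansions(["apple", "phone"], TOY_SEMEMES)
for candidate in candidates:
    print(candidate)
```

With one sememe per word, the number of candidates is the product of the sememe set sizes (here 2 x 3 = 6), so a real system would likely cap or sample the combinations.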
Further as a preferred embodiment, the semantic similarity calculation is performed on the expansion word vectors in the candidate expansion word vector set, and the group of expansion word vectors with the largest average semantic similarity is obtained by screening as the specific word vector, where the step specifically includes:
s31, vectorizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
This embodiment uses word2vec, trained on a Wikipedia or Sogou corpus, to represent each sememe in the expansion word vector
Figure GDA0002415953520000087
as a vector. The vector dimension can be set to 50, 100, 300 or another value, and each component is a floating-point number, which completes the vectorized representation of each word.
For example, in
Figure GDA0002415953520000091
the sememe
Figure GDA0002415953520000092
is represented as
Figure GDA0002415953520000093
and the sememe
Figure GDA0002415953520000094
is represented as
Figure GDA0002415953520000095
This yields the word2vec vector representation set corresponding to the candidate expansion word vector:
Figure GDA0002415953520000096
S32, calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
The method of the invention is not limited to a particular similarity calculation; this embodiment takes cosine similarity only as an example. Given two n-dimensional vectors A and B, their cosine similarity is:
$$\cos(A, B) = \frac{A \cdot B}{\|A\| \, \|B\|} = \frac{\sum_{k=1}^{n} A_k B_k}{\sqrt{\sum_{k=1}^{n} A_k^2} \, \sqrt{\sum_{k=1}^{n} B_k^2}}$$
The invention uses a similarity calculation method (such as the cosine similarity formula) to calculate the semantic similarity between any two word vector representations:
Figure GDA0002415953520000098
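The cosine similarity used as the example measure in S32 can be sketched in plain Python as follows. Returning 0.0 for a zero-length vector is a convention chosen for this sketch, not something the source specifies.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention for this sketch: zero vectors have no direction
    return dot / (norm_a * norm_b)


print(cosine_similarity([1.0, 0.0], [1.0, 0.0]))
print(cosine_similarity([1.0, 0.0], [0.0, 1.0]))
print(cosine_similarity([1.0, 0.0], [-1.0, 0.0]))
```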
s33, calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
Each word2vec vector representation set
Figure GDA0002415953520000099
has a total of
Figure GDA00024159535200000910
such pairwise similarities. The average similarity of the vector representation set
Figure GDA00024159535200000911
can therefore be calculated as:
Figure GDA0002415953520000101
This is the average similarity of the first expansion word vector of short text T_i. The average similarity of the other expansion word vectors of T_i can be calculated in the same way, giving the average similarity vector of the expansion word vectors of T_i:
Figure GDA0002415953520000102
By definition, cosine similarity ranges from -1 to 1: -1 means that the two vectors point in exactly opposite directions, 1 means that they point in exactly the same direction, 0 usually means that they are independent (orthogonal), and values in between indicate intermediate similarity or dissimilarity.
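Step S33 can be sketched as below: the average is taken over all unordered pairs of vectors in one representation set, so a set of n vectors contributes n(n-1)/2 similarities. Returning 0.0 when fewer than two vectors are present is an assumption of this sketch, and the toy vectors stand in for word2vec embeddings.

```python
import math
from itertools import combinations
from typing import Sequence


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; zero vectors give 0.0 by convention here."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def average_pairwise_similarity(vectors: Sequence[Sequence[float]]) -> float:
    """Mean cosine similarity over all unordered pairs of vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        return 0.0  # assumption: a single sememe has no pairwise similarity
    return sum(cosine(a, b) for a, b in pairs) / len(pairs)


# Toy 2-dimensional vectors standing in for sememe embeddings.
v_i1 = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
print(average_pairwise_similarity(v_i1))
```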
S34, screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
The invention performs a screening operation on sim(V_i) and selects, in
Figure GDA0002415953520000103
the term with the largest value: the larger the value, the closer the semantic association within the corresponding expansion word vector, and the more likely it is an expansion vector that can stand in for the original short text. The invention also allows selecting, in
Figure GDA0002415953520000104
the two largest terms and using the corresponding expansion word vectors.
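The screening in S34 amounts to ranking the candidate expansion word vectors by their average similarity and keeping the top one or two. A minimal sketch, with the hypothetical function name `select_expansions` and made-up scores:

```python
from typing import List, Sequence


def select_expansions(
    candidates: Sequence[Sequence[str]],
    avg_sims: Sequence[float],
    k: int = 1,
) -> List[List[str]]:
    """Return the k candidates with the largest average similarity."""
    if len(candidates) != len(avg_sims):
        raise ValueError("one similarity value is required per candidate")
    # sorted() is stable, so ties keep their original candidate order.
    ranked = sorted(range(len(candidates)), key=lambda idx: avg_sims[idx], reverse=True)
    return [list(candidates[idx]) for idx in ranked[:k]]


candidates = [["fruit", "tool"], ["computer", "electronic"], ["fruit", "communicate"]]
avg_sims = [0.2, 0.9, 0.5]  # made-up average similarities
print(select_expansions(candidates, avg_sims, k=1))
print(select_expansions(candidates, avg_sims, k=2))
```

Setting k=2 corresponds to the variant in which the two largest terms are kept, as with the vectors p and q in the weighting example.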
Referring to fig. 2, the invention relates to a short text classification preprocessing system based on semantic extension, comprising:
the preliminary processing unit is used for carrying out preliminary processing on the short texts to be classified to obtain original word vectors;
the semantic expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further form a candidate expanded word vector set;
the screening unit is used for carrying out semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
the weighting processing unit is used for weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
and the input unit is used for inputting the word vectors to be classified into the classifier to classify the texts.
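The five units listed above can be pictured as a chain of interchangeable components. The sketch below wires them together as plain callables; the class name `PreprocessingSystem`, its field names and the stub units are all hypothetical and only show how data flows from one unit to the next.

```python
from dataclasses import dataclass
from typing import Callable, List

Tokens = List[str]


@dataclass
class PreprocessingSystem:
    """Chains the five units; each unit is supplied as a callable."""

    preliminary: Callable[[str], Tokens]          # preliminary processing unit
    expand: Callable[[Tokens], List[Tokens]]      # semantic expansion unit
    screen: Callable[[List[Tokens]], List[Tokens]]  # screening unit
    weight: Callable[[Tokens, List[Tokens]], Tokens]  # weighting processing unit
    classify: Callable[[Tokens], str]             # input unit plus classifier

    def run(self, text: str) -> str:
        original = self.preliminary(text)
        candidates = self.expand(original)
        selected = self.screen(candidates)
        to_classify = self.weight(original, selected)
        return self.classify(to_classify)


# Stub units, for illustration only.
system = PreprocessingSystem(
    preliminary=str.split,
    expand=lambda toks: [[t.upper() for t in toks]],
    screen=lambda cands: cands[:1],
    weight=lambda orig, sel: orig * 2 + [s for group in sel for s in group],
    classify=lambda toks: "long" if len(toks) > 4 else "short",
)
print(system.run("a b"))
```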
Further as a preferred embodiment, the preliminary treatment unit specifically includes:
the word segmentation processing unit is used for carrying out word segmentation processing on the short text to be classified to obtain a word segmentation result;
and the stop word processing unit is used for deleting the stop words from the word segmentation result to obtain an original word vector.
Further as a preferred embodiment, the semantic expansion unit specifically includes:
the expansion unit is used for carrying out semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
the extraction unit is used for extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and the set forming unit is used for forming a candidate expansion word vector set according to the obtained expansion word vectors.
Further as a preferred embodiment, the screening unit specifically includes:
the vectorization processing unit is used for vectorizing and characterizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
the semantic similarity calculation unit is used for calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
the average calculating unit is used for calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and the word vector screening unit is used for screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
The invention also comprises a short text classification preprocessing device based on the semantic extension, which specifically comprises the following steps:
a memory for storing a program;
a processor for executing the program, the program causing the processor to execute the method for preprocessing short text classification based on the semantic extension.
The method expands the information in the short text, which facilitates the later text classification algorithm and effectively improves classification accuracy. Traditional expansion methods rely on the vocabulary already present in the data set, which tends to restrict the choice of the later classification algorithm or makes it hard to recognize new words that appear in the test set. The invention instead expands each word in a short text with externally associated sememes to form a substitute text whose length can be flexibly controlled. It therefore places no restriction on the choice of the later classification algorithm, applies to both training set and test set data, and recognizes newly appearing words better in later detection.
While the preferred embodiments of the present invention have been illustrated and described, it will be understood by those skilled in the art that various changes in form and details may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims (7)

1. A short text classification preprocessing method based on semantic extension is characterized by comprising the following steps:
performing primary processing on short texts to be classified to obtain original word vectors;
performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set;
performing semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
weighting the original word vector and the specific word vector group to obtain word vectors to be classified;
inputting the word vectors to be classified into a classifier for text classification;
the semantic similarity calculation is performed on the expansion word vectors in the candidate expansion word vector set, and the group of expansion word vectors with the largest average semantic similarity is obtained by screening and is used as the specific word vector, and the method specifically comprises the following steps:
vectorizing expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vectors;
calculating the average semantic similarity of a word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vectors according to the average similarity of the word vector representation set corresponding to each expansion word vector.
2. The method of claim 1, wherein the short text classification preprocessing method based on the semantic extension comprises: the method comprises the following steps of performing preliminary processing on short texts to be classified to obtain an original word vector, wherein the steps specifically comprise:
performing word segmentation processing on short texts to be classified to obtain word segmentation results;
and performing stop word deletion processing on the word segmentation result to obtain an original word vector.
3. The method of claim 1, wherein the short text classification preprocessing method based on the semantic extension comprises: the method specifically includes the steps of performing semantic expansion processing on each word in an original word vector to obtain an expanded word vector, and further forming a candidate expanded word vector set, wherein the steps include:
performing semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
extracting sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and forming a candidate expansion word vector set according to the obtained expansion word vectors.
4. A short text classification preprocessing system based on semantic extension, comprising:
the preliminary processing unit is used for carrying out preliminary processing on the short texts to be classified to obtain original word vectors;
the semantic expansion unit is used for performing semantic expansion processing on each word in the original word vector to obtain an expanded word vector and further form a candidate expanded word vector set;
the screening unit is used for carrying out semantic similarity calculation on the expansion word vectors in the candidate expansion word vector set, and screening to obtain a group of expansion word vectors with the maximum semantic similarity as specific word vectors;
the weighting processing unit is used for weighting the original word vector and the specific word vector group to obtain a word vector to be classified;
the input unit is used for inputting the word vectors to be classified into the classifier to classify the texts;
the screening unit specifically comprises:
the vectorization processing unit is used for vectorizing and characterizing the expansion word vectors in the candidate expansion word vector set to obtain a word vector feature set corresponding to the expansion word vectors;
the semantic similarity calculation unit is used for calculating the semantic similarity of any two word vector representations according to the word vector representation set corresponding to the expanded word vector;
the average calculating unit is used for calculating the average semantic similarity of the word vector representation set corresponding to the expansion word vector according to the semantic similarity represented by any two word vectors;
and the word vector screening unit is used for screening the group of expansion word vectors with the maximum average semantic similarity as the specific word vector according to the average similarity of the word vector expression set corresponding to each expansion word vector.
5. The system of claim 4, wherein the short text classification preprocessing system based on the semantic extension comprises: the preliminary processing unit specifically comprises:
the word segmentation processing unit is used for carrying out word segmentation processing on the short text to be classified to obtain a word segmentation result;
and the stop word processing unit is used for deleting the stop words from the word segmentation result to obtain an original word vector.
6. The short text classification preprocessing system based on semantic extension according to claim 4, wherein the semantic extension unit specifically comprises:
the expansion unit is used for carrying out semantic expansion processing on each word in the original word vector to obtain a sememe set corresponding to each word;
the extraction unit is used for extracting the sememes from the sememe set corresponding to each word according to a preset mode to form an expanded word vector;
and the set forming unit is used for forming a candidate expansion word vector set according to the obtained expansion word vectors.
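For illustration, one possible reading of the semantic extension unit is sketched below. The claim leaves the "preset mode" of extraction open; this sketch assumes it means taking one sememe per word and enumerating every combination. The `SEMEMES` lexicon and its entries are hypothetical.

```python
from itertools import product

# Hypothetical sememe lexicon: each word maps to its sememe set.
SEMEMES = {
    "apple": ["fruit", "company"],
    "phone": ["device"],
}


def expand(word, lexicon):
    """Return the sememe set of a word; unknown words fall back to themselves."""
    return lexicon.get(word, [word])


def candidate_expansions(original, lexicon):
    """Form the candidate set by choosing one sememe per word in every possible way.

    Assumption: this Cartesian-product rule stands in for the 'preset mode'.
    """
    sememe_sets = [expand(word, lexicon) for word in original]
    return [list(choice) for choice in product(*sememe_sets)]
```

The number of candidates grows as the product of the sememe-set sizes, so a practical implementation would likely cap or sample the combinations.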
7. An apparatus for preprocessing short text classification based on semantic extension, comprising:
a memory for storing a program;
a processor for executing the program, the program causing the processor to execute the short text classification preprocessing method based on semantic extension according to any one of claims 1 to 3.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910060245.6A CN109885680B (en) 2019-01-22 2019-01-22 Short text classification preprocessing method, system and device based on semantic extension

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910060245.6A CN109885680B (en) 2019-01-22 2019-01-22 Short text classification preprocessing method, system and device based on semantic extension

Publications (2)

Publication Number Publication Date
CN109885680A CN109885680A (en) 2019-06-14
CN109885680B CN109885680B (en) 2020-05-19

Family

ID=66926608

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910060245.6A Active CN109885680B (en) 2019-01-22 2019-01-22 Short text classification preprocessing method, system and device based on semantic extension

Country Status (1)

Country Link
CN (1) CN109885680B (en)

Families Citing this family (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110765259A (en) * 2019-09-19 2020-02-07 平安科技(深圳)有限公司 Text filtering method based on lexical semaphores and related equipment
CN115083550B (en) * 2022-06-29 2023-08-08 西安理工大学 Patient similarity classification method based on multi-source information

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US8527523B1 (en) * 2009-04-22 2013-09-03 Equivio Ltd. System for enhancing expert-based computerized analysis of a set of digital documents and methods useful in conjunction therewith
CN108920482B (en) * 2018-04-27 2020-08-21 浙江工业大学 Microblog short text classification method based on lexical chain feature extension and LDA (latent Dirichlet Allocation) model

Also Published As

Publication number Publication date
CN109885680A (en) 2019-06-14

Similar Documents

Publication Publication Date Title
CN107102981B (en) Word vector generation method and device
Chen et al. Learning deep features for image emotion classification
CN111159485B (en) Tail entity linking method, device, server and storage medium
CN112347284B (en) Combined trademark image retrieval method
KR20200075114A (en) System and Method for Matching Similarity between Image and Text
CN112819023A (en) Sample set acquisition method and device, computer equipment and storage medium
CN106227719B (en) Chinese word segmentation disambiguation method and system
CN106844482B (en) Search engine-based retrieval information matching method and device
CN106708798A (en) String segmentation method and device
CN109885680B (en) Short text classification preprocessing method, system and device based on semantic extension
CN115544303A (en) Method, apparatus, device and medium for determining label of video
CN114647713A (en) Knowledge graph question-answering method, device and storage medium based on virtual confrontation
CN111368066A (en) Method, device and computer readable storage medium for acquiring dialogue abstract
CN110413997B (en) New word discovery method, system and readable storage medium for power industry
CN115187910A (en) Video classification model training method and device, electronic equipment and storage medium
CN105354264B (en) A kind of quick adding method of theme label based on local sensitivity Hash
CN112711944B (en) Word segmentation method and system, and word segmentation device generation method and system
CN111241271A (en) Text emotion classification method and device and electronic equipment
CN113704623A (en) Data recommendation method, device, equipment and storage medium
Pei-Xia et al. Learning discriminative CNN features and similarity metrics for image retrieval
CN111159456B (en) Multi-scale clothing retrieval method and system based on deep learning and traditional features
Henri et al. A deep transfer learning model for the identification of bird songs: A case study for Mauritius
CN111488400A (en) Data classification method, device and computer readable storage medium
CN109241124A (en) A kind of method and system of quick-searching similar character string
Sridhar et al. Performance Analysis of Two-Stage Iterative Ensemble Method over Random Oversampling Methods on Multiclass Imbalanced Datasets

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant