CN111078876A - Short text classification method and system based on multi-model integration

Short text classification method and system based on multi-model integration

Info

Publication number
CN111078876A
CN111078876A
Authority
CN
China
Prior art keywords
classification
model
training
short text
models
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201911229492.0A
Other languages
Chinese (zh)
Inventor
段东圣
井雅琪
任博雅
时磊
孙旷怡
李扬曦
佟玲玲
习健
宋永浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Computing Technology of CAS
National Computer Network and Information Security Management Center
Original Assignee
Institute of Computing Technology of CAS
National Computer Network and Information Security Management Center
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Computing Technology of CAS and National Computer Network and Information Security Management Center
Priority to CN201911229492.0A
Publication of CN111078876A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/353 Clustering; Classification into predefined classes
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G06F16/355 Class or cluster creation or modification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a short text classification method based on multi-model integration, which comprises the following steps: selecting a plurality of classification models for classifying short texts; sampling the training samples to generate training sets in one-to-one correspondence with the classification models; training each classification model on its corresponding training set to obtain a corresponding final model; classifying the target text with all the final models to obtain a plurality of classification result vectors; and integrating all the classification result vectors into a final result vector, and taking the class represented by the largest element of the final result vector as the class of the target text.

Description

Short text classification method and system based on multi-model integration
Technical Field
The invention relates to the field of deep learning, and in particular to a method and system for classifying Chinese short text information with multiple models.
Background
With the rapid development of social media such as microblogs (Weibo) and WeChat, short texts have become an important form of information in daily life. Properly classifying short text messages (i.e., assigning each sample a category from a predefined set of subject categories) has wide application, such as identifying information of specific categories and multi-dimensional classification of product reviews.
Chinese patent CN109739986A discloses a complaint short text classification method based on deep ensemble learning: a BTM topic model and a convolutional neural network are used to extract text features separately, and the combined features are input into an ensemble random forest model. Compared with that method, which uses a random forest for integration, the present invention integrates submodels of different types and structures (Bert, TextRNN, TextCNN and SVM); these submodels differ greatly in structure and are rich in diversity, and can extract and encode the differentiated features of short text data samples from different angles, so that the extracted feature distribution is closer to the overall feature distribution of the data. Chinese patent CN107292348A discloses a Bagging-BSJ short text classification method, which applies the Bagging ensemble idea to perform semantic feature expansion on short texts and classifies the expanded texts by combining a Bayesian algorithm, a support vector machine algorithm and the J48 algorithm.
Using deep learning models to classify short text messages has become common in recent years. In particular, the Bert model released by the Google AI team in 2018 is a large model built on a deep bidirectional Transformer with more than 300 million parameters; it achieved the best performance of its time on 11 NLP tasks and caused a huge stir in the NLP community. Subsequently, organizations such as OpenAI and FastAI successively released their own large models, and well-known models such as GPT, GPT-2 and ELMo have refreshed NLP leaderboards many times.
However, large models represented by Bert still have unsolved problems in real-world short text classification; only one of them is analyzed here. Because the number of parameters to be trained is huge, a large model needs a large amount of training data even when it is fine-tuned from a pre-trained model, and in practical applications it is difficult to collect labeled data in quantities that match the model's capacity. Because a large model has extremely strong fitting capability, overfitting often occurs when data is insufficient, so the generalization capability is insufficient: the trained model classifies the training data well, but its classification performance on unknown data drops sharply.
At present, no relevant method or scheme has been found for improving the generalization capability of the Bert model. In traditional machine learning and deep learning applications, generalization is usually improved by expanding the training data set: with more training samples, the distribution of the training set better approximates the overall distribution of the data, so the trained model fits that overall distribution more accurately and generalizes better. In real-world applications, however, it is often difficult to collect enough training data, which requires high time and labor costs, so increasing the generalization capability of Bert in this way is expensive.
Disclosure of Invention
To address the insufficient generalization of Bert-based short text classification when the scale of the training data cannot match the number of parameters of the Bert model in practical applications, the present method trains a plurality of short text classification models separately and then integrates their classification results to obtain the final classification result.
Specifically, the short text classification method based on multi-model integration comprises the following steps: selecting a plurality of classification models for classifying short texts; sampling the training samples to generate a plurality of training sets in one-to-one correspondence with the classification models; training each classification model on its corresponding training set to obtain a corresponding final model; classifying the target text with all the final models to obtain a plurality of classification result vectors; and integrating all the classification result vectors into a final result vector, and taking the class represented by the largest element of the final result vector as the class of the target text.
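For illustration only, the following minimal Python sketch shows the shape of this pipeline; the model objects and their fit/predict_proba interface are hypothetical stand-ins for the Bert, TextRNN, TextCNN and SVM classifiers, not code prescribed by the patent:

```python
from typing import Sequence


class EnsembleShortTextClassifier:
    """Sketch of the claimed pipeline: N models, N training sets, fused output."""

    def __init__(self, models: Sequence):
        self.models = models  # e.g. wrappers around Bert, TextRNN, TextCNN, SVM

    def fit(self, training_sets: Sequence[tuple]) -> None:
        # one training set per model, in one-to-one correspondence
        for model, (texts, labels) in zip(self.models, training_sets):
            model.fit(texts, labels)

    def predict(self, text: str) -> int:
        # each model returns a two-element vector [P(class 0), P(class 1)]
        vectors = [model.predict_proba(text) for model in self.models]
        # element-wise average as the final result vector (weights omitted here)
        final = [sum(v[i] for v in vectors) / len(vectors) for i in range(2)]
        # the class represented by the largest element is the prediction
        return max(range(2), key=lambda i: final[i])
```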
In the short text classification method of the invention, the classification models comprise: a Bert model, a TextRNN model, a TextCNN model, and an SVM model.
In the short text classification method of the invention, the classification result vector is a two-element vector: its first value represents the probability that the target text belongs to a first class, and its second value represents the probability that the target text belongs to a second class; the final result vector, also a two-element vector, is obtained by taking the weighted average of all the classification result vectors.
In the short text classification method of the invention, sampling the training samples comprises: sampling data from the training samples multiple times with replacement to generate the training sets; when the number of training samples is greater than a sampling threshold, the generated training sets are independent of each other, and when the number of training samples is less than or equal to the sampling threshold, the generated training sets are identical.
The invention also provides a short text classification system based on multi-model integration, which comprises: a classification model selection module for selecting a plurality of classification models for classifying short texts; a training data acquisition module for sampling the training samples and generating a plurality of training sets in one-to-one correspondence with the classification models; a classification model training module for training each classification model on its corresponding training set to obtain a plurality of final models; a target text classification module for classifying the target text with all the final models to obtain a plurality of classification result vectors; and a classification result integration module for integrating all the classification result vectors into a final result vector and taking the class represented by the largest element of the final result vector as the class of the target text.
In the short text classification system of the invention, the classification models comprise: a Bert model, a TextRNN model, a TextCNN model, and an SVM model.
In the short text classification system of the invention, in the target text classification module, the classification result vector is a two-element vector whose first value represents the probability that the target text belongs to a first class and whose second value represents the probability that the target text belongs to a second class; in the classification result integration module, the final result vector, also a two-element vector, is obtained by taking the weighted average of all the classification result vectors.
In the short text classification system of the invention, the training data acquisition module samples data from the training samples multiple times with replacement to generate the training sets; when the number of training samples is greater than a sampling threshold, the generated training sets are independent of each other, and when the number of training samples is less than or equal to the sampling threshold, the generated training sets are identical.
The present invention also provides a computer-readable storage medium storing executable instructions for performing the short text classification method based on multi-model integration as described above.
The invention also provides a data processing device comprising the above computer-readable storage medium; a processor of the data processing device retrieves and executes the executable instructions in the storage medium to perform short text classification based on multi-model integration.
In the short text classification method of the invention, the classification models are trained on separate training sets and their classification results are then weighted and averaged into the final classification result, so that unknown data is classified better and better generalization capability is obtained.
Drawings
FIG. 1 is a flow chart of the short text classification method based on multi-model integration of the present invention.
FIG. 2 is a flow chart of training sample sampling of the short text classification method of the present invention.
FIG. 3 is a schematic diagram of the training of the classification model of the short text classification method of the present invention.
FIG. 4 is a schematic diagram of the multi-model ensemble classification of the present invention.
FIG. 5 is a schematic diagram of a data processing apparatus of the present invention.
Detailed Description
In order to make the objects, technical solutions and advantages of the present invention more clearly understood, the short text classification method and system based on multi-model integration proposed by the present invention are further described in detail below with reference to the accompanying drawings. It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
The invention aims to solve the problem that the generalization capability of a Bert model is insufficient for short text classification when, in practical applications, the scale of the training data cannot match the number of parameters of the model, and provides a multi-model integration framework for this purpose.
The invention integrates classification models of different types and structures (Bert, TextRNN, TextCNN and SVM); the models differ greatly in structure and are rich in diversity, and can extract and encode the differentiated features of short text data samples from different angles, so that the extracted feature distribution is closer to the overall feature distribution of the data. The integration considers not only the traditional SVM model with a non-deep network structure, but also adds a Bert model built on a Transformer and an attention mechanism, a TextRNN model based on RNN, and a TextCNN model based on CNN. By selecting short text classification models of different types, differentiated extraction and encoding of text features is achieved, so that the overall distribution of the data is fitted better.
The technical key points of the model integration framework and system for improving the generalization capability of short text classification mainly comprise the selection of multiple short text classification models, training data sampling and model training, and the fusion of multi-model classification results:
1. Selecting multiple short text classification models. The key is to select short text classification models of different types, so as to achieve differentiated extraction and encoding of text features and better fit the overall distribution of the data. Because the Bert model is a deep model built on a Transformer and an attention mechanism, this scheme selects three models whose mechanisms differ from Bert's: a TextRNN model based on RNN, a TextCNN model based on CNN, and an SVM model with a non-deep network structure (all three are open-source models). They extract and encode the differentiated features of short text data samples from different angles, so that the extracted feature distribution is closer to the feature distribution of the data population.
2. For a given training set, sampling data multiple times with replacement to generate multiple data sets. Matching the 4 short text classification models of the specific embodiment of the present invention: for a data set with more than 20,000 samples, each of the 4 generated data sets contains 15,000 samples; for a data set with 20,000 samples or fewer, no sampling is needed (i.e., the 4 data sets are the same original data set). The 4 selected models are then trained with the 4 generated data sets, respectively. The technical effect of this step is as follows: when training data is insufficient, dividing the data and separately training 3 small-scale auxiliary models avoids overfitting of the Bert model to the data characteristics of the training samples.
3. Fusing the multi-model classification results. For short text data to be classified, the 4 trained models each perform classification; each model outputs a vector of per-class probability values, and the 4 result vectors are weighted and averaged to generate the final result vector. In the final result vector, the class represented by the largest element is the class of the short text data. Because the 4 selected models extract and encode the features of the input data from different angles, the weighted average reduces the high variance caused by overfitting of a single model and improves the accuracy of text classification.
To address the insufficient generalization of Bert-based short text classification when the scale of the training data cannot match the number of parameters of the Bert model in practical applications, the invention designs the multi-model integration framework and system for improving the generalization capability of short text classification, which achieve a better classification effect on unknown data and better generalization capability. FIG. 1 is a flow chart of the short text classification method based on multi-model integration of the present invention. As shown in FIG. 1, the present invention targets binary (two-class) classification; a specific embodiment is as follows:
Step S1: selecting multiple short text classification models. The invention takes the Bert model as the base model and selects three models whose mechanisms differ from Bert's: a TextRNN model based on RNN, a TextCNN model based on CNN, and an SVM model with a non-deep network structure (all three are open-source models). They extract and encode the differentiated features of the data samples from different angles, so that the extracted feature distribution is closer to the overall feature distribution of the data.
Step S2: sampling the training samples to generate four independent training sets. FIG. 2 is a flow chart of training sample sampling for the short text classification method of the present invention. As shown in FIG. 2, the sampling algorithm proceeds as follows (a code sketch is given after the steps):
Step S21, reading all the labeled data samples;
Step S22, judging whether the total number of labeled data samples exceeds 20,000: if it is 20,000 or fewer, no sampling is needed (the 4 classification models are trained with the original training set) and the sampling process ends; if it exceeds 20,000, the next step is executed;
Step S23, randomly drawing one sample from the sample set and putting it into the first result set;
Step S24, repeating step S23 until the number of samples in the first result set reaches 15,000, then generating the second, third and fourth result data sets in the same way until all data sets are complete.
At this point, four independent sets of training samples have been generated.
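A minimal Python sketch of steps S21 to S24, under the thresholds stated above (20,000 samples to trigger sampling, 15,000 samples per result set); `random.choices` draws with replacement:

```python
import random

SAMPLE_THRESHOLD = 20_000  # S22: at or below this, the original set is reused
RESULT_SET_SIZE = 15_000   # S24: size of each bootstrap result set
NUM_MODELS = 4


def build_training_sets(labeled_samples: list) -> list:
    """Steps S21-S24: generate four training sets by sampling with replacement."""
    if len(labeled_samples) <= SAMPLE_THRESHOLD:
        # S22: not enough data -- all four models share the original training set
        return [labeled_samples] * NUM_MODELS
    # S23/S24: draw 15,000 samples with replacement for each of the four sets
    return [random.choices(labeled_samples, k=RESULT_SET_SIZE)
            for _ in range(NUM_MODELS)]
```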
Step S3: training the classification models. The four training data sets generated in step S2 are used to train the four text classification models (Bert, TextRNN, TextCNN and SVM) respectively, until the parameters converge. FIG. 3 is a schematic diagram of classification model training in the short text classification method of the present invention. As shown in FIG. 3, the training text data must first be cleaned, which includes arranging the text into a format that satisfies the input requirements and removing special symbols that are meaningless for the text, such as asterisks (*) and pound signs (#). After data cleaning, the training process differs per model. The SVM and the TextRNN both require word segmentation, and a user-defined dictionary is added in the segmentation step to improve segmentation accuracy. The segmented text data is vectorized by a pre-trained Word2Vec language model and then input into the SVM and TextRNN models for training. The Word2Vec language model is trained with nearly 5 million pieces of in-domain corpus data, so that the model is more aware of and sensitive to characteristics such as the composition and distribution of text in the specific domain, improving recognition accuracy. Bert and TextCNN take the text data directly as input, without the segmentation and vectorization steps; Bert additionally goes through a "pre-training" process before training (fine-tuning), which also uses the nearly 5 million pieces of in-domain corpus data.
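The following Python sketch illustrates the cleaning, segmentation and vectorization path for the SVM/TextRNN branch using jieba and gensim; the user-dictionary path and the averaging of word vectors into a single text vector are illustrative assumptions, not details fixed by the patent:

```python
import re

import jieba
import numpy as np
from gensim.models import Word2Vec

# hypothetical path to the user-defined segmentation dictionary
jieba.load_userdict("user_dict.txt")


def clean(text: str) -> str:
    # remove symbols that are meaningless for the text, e.g. '*' and '#'
    return re.sub(r"[*#]", "", text)


def vectorize(text: str, w2v: Word2Vec) -> np.ndarray:
    """Segment the cleaned text and turn it into one vector for SVM/TextRNN."""
    tokens = [t for t in jieba.lcut(clean(text)) if t in w2v.wv]
    if not tokens:
        return np.zeros(w2v.vector_size)
    # average the pre-trained word vectors (one common vectorization choice)
    return np.mean([w2v.wv[t] for t in tokens], axis=0)
```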
Step S4: integrating the multi-model classification results. When classifying, the input text (target text) is classified simultaneously by the multiple models generated in step S3. FIG. 4 is a schematic diagram of the multi-model integrated classification of the present invention. As shown in FIG. 4, each classification model produces a classification result that is a two-element vector: the first value represents the probability that the input text belongs to the first class, and the second value represents the probability that the input text belongs to the second class. The four two-element vectors are then weighted and averaged element-wise to obtain a new two-element vector (the final result vector), and the class represented by the largest element of the final result vector is the class of the short text data. In embodiments of the present invention, the classification results may also be integrated in other ways, such as summation, and the invention is not limited in this respect.
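A short sketch of this fusion step in Python/NumPy; the weight values in the usage example are purely illustrative, since the patent does not specify them:

```python
import numpy as np


def fuse(result_vectors, weights=None) -> int:
    """Element-wise weighted average of the per-model two-element vectors;
    the index of the largest element of the final vector is the class."""
    vectors = np.asarray(result_vectors)               # shape: (4, 2)
    final = np.average(vectors, axis=0, weights=weights)
    return int(np.argmax(final))


# e.g. Bert, TextRNN, TextCNN, SVM outputs for one target text
outputs = [[0.9, 0.1], [0.7, 0.3], [0.8, 0.2], [0.6, 0.4]]
print(fuse(outputs))                                 # unweighted -> class 0
print(fuse(outputs, weights=[0.4, 0.2, 0.2, 0.2]))   # illustrative weights
```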
FIG. 5 is a schematic diagram of a data processing apparatus of the present invention. As shown in FIG. 5, embodiments of the present invention also provide a computer-readable storage medium and a data processing apparatus. The computer-readable storage medium stores executable instructions which, when executed by a processor of the data processing apparatus, implement the above short text classification method based on multi-model integration. Those skilled in the art will understand that all or part of the steps of the above method may be implemented by instructing relevant hardware (e.g., a processor, an FPGA, an ASIC, etc.) through a program, and the program may be stored in a readable storage medium such as a read-only memory or a magnetic or optical disk. All or some of the steps of the above embodiments may also be implemented with one or more integrated circuits. Accordingly, the modules in the above embodiments may be implemented in hardware, for example by an integrated circuit, or in software, for example by a processor executing programs/instructions stored in a memory. Embodiments of the invention are not limited to any specific combination of hardware and software.
The method and system can be applied to short text classification scenarios, such as classifying SMS messages, screening specific categories of data on Weibo, filtering spam mail, and classifying queries for chatbots. To address the insufficient generalization of Bert-based short text classification when the scale of the training data cannot match the number of parameters of the Bert model in practical applications, the invention designs a multi-model integration framework and system for improving the generalization capability of Bert. The integrated system has undergone extensive practical testing in Weibo data screening applications; the results show that although the integrated model takes more time to train, the resulting composite model classifies unknown data with better accuracy and stability, achieving better generalization.
The above embodiments are only for illustrating the invention and are not to be construed as limiting it; those skilled in the art can make various changes and modifications without departing from the spirit and scope of the invention. Therefore, all equivalent technical solutions also fall within the scope of the invention, which is defined by the claims.

Claims (10)

1. A short text classification method based on multi-model integration is characterized by comprising the following steps:
selecting a plurality of classification models for classifying the short texts;
sampling the training samples to generate a plurality of training sets in one-to-one correspondence with the classification models;
training each classification model on its corresponding training set to obtain a corresponding final model;
classifying the target text through all the final models to obtain a plurality of classification result vectors;
and integrating all the classification result vectors to obtain a final result vector, and taking the class represented by the element with the maximum value in the final result vector as the class of the target text.
2. The short text classification method of claim 1, wherein the classification models comprise: a Bert model, a TextRNN model, a TextCNN model, and an SVM model.
3. The short text classification method according to claim 1 or 2, wherein the classification result vector is a two-element vector, a first value of which represents the probability that the target text belongs to a first class and a second value of which represents the probability that the target text belongs to a second class; and the final result vector, also a two-element vector, is obtained by taking the weighted average of all the classification result vectors.
4. The short text classification method of claim 1, wherein sampling the training samples comprises: sampling data from the training samples multiple times with replacement to generate the training sets; when the number of training samples is greater than a sampling threshold, the generated training sets are independent of each other, and when the number of training samples is less than or equal to the sampling threshold, the generated training sets are identical.
5. A short text classification system based on multi-model integration, comprising:
the classification model selection module is used for selecting a plurality of classification models for classifying the short texts;
the training data acquisition module is used for sampling the training samples and generating a plurality of training sets in one-to-one correspondence with the classification models;
the classification model training module is used for training each classification model on its corresponding training set to obtain a plurality of final models;
the target text classification module is used for classifying the target texts through all the final models to obtain a plurality of classification result vectors;
and the classification result integration module is used for integrating all the classification result vectors to obtain a final result vector, and taking the class represented by the element with the maximum value in the final result vector as the class of the target text.
6. The short text classification system of claim 5, wherein the classification models comprise: a Bert model, a TextRNN model, a TextCNN model, and an SVM model.
7. The short text classification system according to claim 5 or 6, wherein, in the target text classification module, the classification result vector is a two-element vector, a first value of which represents the probability that the target text belongs to a first class and a second value of which represents the probability that the target text belongs to a second class;
in the classification result integration module, the final result vector, also a two-element vector, is obtained by taking the weighted average of all the classification result vectors.
8. The short text classification system of claim 5, wherein the training data acquisition module samples data from the training samples multiple times with replacement to generate the training sets; when the number of training samples is greater than a sampling threshold, the generated training sets are independent of each other, and when the number of training samples is less than or equal to the sampling threshold, the generated training sets are identical.
9. A computer readable storage medium storing executable instructions for performing the short text classification method based on multi-model integration according to any one of claims 1 to 4.
10. A data processing apparatus comprising the computer-readable storage medium of claim 9, wherein a processor of the data processing apparatus retrieves and executes the executable instructions in the storage medium to perform short text classification based on multi-model integration.
CN201911229492.0A 2019-12-04 2019-12-04 Short text classification method and system based on multi-model integration Pending CN111078876A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201911229492.0A CN111078876A (en) 2019-12-04 2019-12-04 Short text classification method and system based on multi-model integration

Publications (1)

Publication Number Publication Date
CN111078876A true CN111078876A (en) 2020-04-28

Family

ID=70312849

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201911229492.0A Pending CN111078876A (en) 2019-12-04 2019-12-04 Short text classification method and system based on multi-model integration

Country Status (1)

Country Link
CN (1) CN111078876A (en)

Patent Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105468713A (en) * 2015-11-19 2016-04-06 西安交通大学 Multi-model fused short text classification method
CN105912716A (en) * 2016-04-29 2016-08-31 国家计算机网络与信息安全管理中心 Short text classification method and apparatus
CN108280462A (en) * 2017-12-11 2018-07-13 北京三快在线科技有限公司 A kind of model training method and device, electronic equipment
CN108959265A (en) * 2018-07-13 2018-12-07 深圳市牛鼎丰科技有限公司 Cross-domain texts sensibility classification method, device, computer equipment and storage medium
CN109034233A (en) * 2018-07-18 2018-12-18 武汉大学 A kind of high-resolution remote sensing image multi classifier combination classification method of combination OpenStreetMap

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111597328A (en) * 2020-05-27 2020-08-28 青岛大学 New event theme extraction method
CN111782804A (en) * 2020-06-09 2020-10-16 中科院成都信息技术股份有限公司 TextCNN-based same-distribution text data selection method, system and storage medium
CN111782804B (en) * 2020-06-09 2023-05-02 中科院成都信息技术股份有限公司 Text CNN-based co-distributed text data selection method, system and storage medium
CN111737473A (en) * 2020-07-17 2020-10-02 浙江口碑网络技术有限公司 Text classification method, device and equipment
WO2021189974A1 (en) * 2020-10-21 2021-09-30 平安科技(深圳)有限公司 Model training method and apparatus, text classification method and apparatus, computer device and medium
CN112307212A (en) * 2020-11-11 2021-02-02 上海昌投网络科技有限公司 Public opinion delivery monitoring method for advertisement delivery
CN113780338A (en) * 2021-07-30 2021-12-10 国家计算机网络与信息安全管理中心 Confidence evaluation method, system, equipment and storage medium in big data analysis based on support vector machine
CN113780338B (en) * 2021-07-30 2024-04-09 国家计算机网络与信息安全管理中心 Confidence evaluation method, system, equipment and storage medium in big data analysis based on support vector machine
CN118069852A (en) * 2024-04-22 2024-05-24 数据空间研究院 Multi-model fusion data classification prediction method and system

Similar Documents

Publication Publication Date Title
CN111078876A (en) Short text classification method and system based on multi-model integration
CN110609897B (en) Multi-category Chinese text classification method integrating global and local features
Prusa et al. The effect of dataset size on training tweet sentiment classifiers
CN107480143B (en) Method and system for segmenting conversation topics based on context correlation
CN112069310B (en) Text classification method and system based on active learning strategy
CN108287858A (en) The semantic extracting method and device of natural language
CN108255813B (en) Text matching method based on word frequency-inverse document and CRF
CN109271514B (en) Generation method, classification method, device and storage medium of short text classification model
CN112906397B (en) Short text entity disambiguation method
CN112231477A (en) Text classification method based on improved capsule network
CN111368096A (en) Knowledge graph-based information analysis method, device, equipment and storage medium
CN111046979A (en) Method and system for discovering badcase based on small sample learning
CN114490953B (en) Method for training event extraction model, method, device and medium for extracting event
CN110647995A (en) Rule training method, device, equipment and storage medium
CN113609289A (en) Multi-mode dialog text-based emotion recognition method
CN117150026B (en) Text content multi-label classification method and device
CN115374845A (en) Commodity information reasoning method and device
Mishra et al. Twitter sentiment analysis using naive bayes algorithm
Jayakody et al. Sentiment analysis on product reviews on twitter using Machine Learning Approaches
CN114065749A (en) Text-oriented Guangdong language recognition model and training and recognition method of system
CN105183807A (en) emotion reason event identifying method and system based on structure syntax
CN112463964B (en) Text classification and model training method, device, equipment and storage medium
CN114254622A (en) Intention identification method and device
CN110162629B (en) Text classification method based on multi-base model framework
Li et al. Multilingual toxic text classification model based on deep learning

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication (application publication date: 20200428)