CN113032563A - Regularization text classification fine-tuning method based on manually-covered keywords - Google Patents

Regularization text classification fine-tuning method based on manually-covered keywords

Info

Publication number
CN113032563A
Authority
CN
China
Prior art keywords
keywords
keyword
model
regularization
text classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110302636.1A
Other languages
Chinese (zh)
Other versions
CN113032563B (en)
Inventor
潘晓光
陈亮
董虎弟
宋晓晨
张雅娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Sanyouhe Smart Information Technology Co Ltd
Original Assignee
Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Sanyouhe Smart Information Technology Co Ltd filed Critical Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority to CN202110302636.1A priority Critical patent/CN113032563B/en
Publication of CN113032563A publication Critical patent/CN113032563A/en
Application granted granted Critical
Publication of CN113032563B publication Critical patent/CN113032563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35 Clustering; Classification
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3344 Query execution using natural language analysis
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/334 Query execution
    • G06F16/3346 Query execution using probabilistic model
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/216 Parsing using statistical methods
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of text classification, and in particular relates to a regularized text classification fine-tuning method based on manually-masked keywords, comprising the following steps: data acquisition and processing, frequency-based keyword selection, attention-value-based keyword selection, masked keyword reconstruction, masked entropy regularization, and performance evaluation. Data acquisition and processing acquires the text data required by the model, labels its class, constructs the data set required by the model, and pre-trains on it; frequency-based keyword selection chooses keywords by the relative frequencies of words in the data set; attention-value-based keyword selection chooses keywords by the model's attention values. The method regularizes the model so that it reconstructs keywords from the other words and makes low-confidence predictions when context is insufficient. The method can greatly improve OOD detection and cross-domain generalization without reducing classification accuracy.

Description

Regularization text classification fine-tuning method based on manually-covered keywords
Technical Field
The invention relates to the technical field of text classification, and in particular to a regularized text classification fine-tuning method based on manually-masked keywords.
Background
Pre-trained language models currently achieve state-of-the-art accuracy on a variety of text classification tasks, such as sentiment analysis, natural language inference, and semantic textual similarity. However, the reliability of fine-tuned text classifiers has been largely overlooked. It has not been possible to build models that can detect out-of-distribution (OOD) samples or that are robust to domain shifts, mainly because models depend excessively on a limited number of keywords rather than attending to the whole context.
Cause of the problems or defects: current research on text classification focuses only on evaluating model accuracy and ignores reliability; meanwhile, the excessive dependence of conventional methods on keywords can harm out-of-distribution detection and cross-domain generalization.
Disclosure of Invention
The invention aims to provide a regularized text classification fine-tuning method based on manually-masked keywords.
In order to achieve this purpose, the invention provides the following technical scheme: a regularized text classification fine-tuning method based on manually-masked keywords, comprising the following steps:
s100, data acquisition and processing: acquiring text data required by a model, labeling the type of the text data, constructing a data set required by the model, and pre-training the data set;
s200, selecting keywords based on frequency: selecting keywords using relative frequencies of words in the dataset;
s300, selecting keywords based on the attention value: selecting keywords using model attention;
s400, hiding the keyword reconstruction: reconstructing the keywords from the keyword mask document;
s500, hidden entropy regularization: regularizing the random deletion of the non-key words in the context for the prediction of the context-obscured document;
s600, performance evaluation: and evaluating the text classification precision.
Further, in the frequency-based keyword selection of step S200, the importance of a token is measured by TF-IDF, comparing its term frequency in the target document against its inverse document frequency over the entire corpus; the keywords are defined as the words with the highest TF-IDF scores.
Further, in the attention-value-based keyword selection of step S300, the model is trained with the standard cross-entropy loss L_CE, and the model's attention values are used to select the labeled keywords with the highest attention values.
Further, in the keyword selection based on the attention value in step S300, let a ═ a1, … aT ∈ R T be the attention value of document embedding, and the attention-based scoring formula in which ai corresponds to ti in the input symbol and the symbol t is set as ti
Figure BDA0002986861920000021
Further, in the masked keyword reconstruction of step S400, the sentence is regularized through its keywords. Let $\tilde{k}$ be a random subset of the full keyword set $k$, where each element is chosen independently with probability $p$; $\tilde{k}$ is masked out of the original document $x$ to obtain the masked document $\tilde{x} = x - \tilde{k}$. The masked keyword reconstruction loss is then
$$\mathcal{L}_{\mathrm{MKR}}(x) = \frac{1}{\lvert \mathrm{index}(\tilde{k}) \rvert} \sum_{i \in \mathrm{index}(\tilde{k})} \ell_{\mathrm{CE}}\big(p_{\mathrm{rec}}(\tilde{x})_i,\, v_i\big)$$
Further, in the masked entropy regularization of step S500, let $\tilde{c}$ be a randomly chosen subset of the full context words $c = x - k$, where each element is chosen independently with probability $q$; $\tilde{c}$ is then masked out of the original document $x$ to obtain the context-masked document $x - \tilde{c}$. The masked entropy regularization is
$$\mathcal{L}_{\mathrm{MER}}(x) = D_{\mathrm{KL}}\big(U(y) \,\Vert\, p(y \mid x - \tilde{c})\big)$$
Finally, the overall training objective is set as
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{MKR}}\,\mathcal{L}_{\mathrm{MKR}} + \lambda_{\mathrm{MER}}\,\mathcal{L}_{\mathrm{MER}}$$
Further, in the performance evaluation of step S600, classification accuracy, OOD detection and cross-domain generalization are mainly evaluated.
Further, steps S200 and S300 may be performed in either order.
Further, steps S400 and S500 may be performed in either order.
The invention has the following technical effects: aiming at the problems that current research neglects the reliability of models and that existing text classification methods depend excessively on keywords, the invention provides a method that makes holistic predictions on the basis of context and has higher reliability. The method regularizes the model so that it reconstructs keywords from the other words and makes low-confidence predictions when context is insufficient. Run on pre-trained language models such as BERT, RoBERTa and ALBERT, this approach can greatly improve OOD detection and cross-domain generalization without reducing classification accuracy.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
Examples
A regularized text classification fine-tuning method based on manually-masked keywords, as shown in FIG. 1, comprises the following steps:
s100, data acquisition and processing: acquiring text data required by a model, labeling the type of the text data, constructing a data set required by the model, and pre-training the data set;
s200, selecting keywords based on frequency: selecting keywords using relative frequencies of words in the dataset;
s300, selecting keywords based on the attention value: selecting keywords using model attention;
s400, hiding the keyword reconstruction: reconstructing the keywords from the keyword mask document;
s500, hidden entropy regularization: regularizing the random deletion of the non-key words in the context for the prediction of the context-obscured document;
s600, performance evaluation: and evaluating the text classification precision.
In the frequency-based keyword selection of step S200, the importance of a token is measured by TF-IDF (term frequency-inverse document frequency): its term frequency in the target document is compared against its inverse document frequency over the entire corpus, and the keywords are defined as the words with the highest TF-IDF scores. Let $X_c$ be the document formed by concatenating all tokens of the class corpus $D_c$, and let $D = [X_1, \ldots, X_C]$ be the full corpus; the frequency-based keyword selection score of a token $t$ is
$$s_{\mathrm{freq}}(t) = \mathrm{tf}(t, X_c) \cdot \mathrm{idf}(t, D)$$
where $\mathrm{tf}(t, X) = 0.5 + 0.5 \cdot n_t / \max_{t'} n_{t'}$ (augmented term frequency, with $n_t$ the count of token $t$ in $X$) and $\mathrm{idf}(t, D) = \log\big(\lvert D \rvert / \lvert \{X \in D : t \in X\} \rvert\big)$. Frequency-based selection is model-independent and relatively easy to compute, but it does not directly reflect the contribution of a word to the model's predictions.
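As a minimal illustration of this scoring, the following sketch computes frequency-based keyword scores with the tf/idf variants given above (the tokenization, function and variable names are assumptions for illustration, not part of the patent):

```python
import math
from collections import Counter

def tfidf_keyword_scores(class_docs, all_docs):
    """class_docs: token lists of one class, conceptually concatenated as X_c;
    all_docs: token lists of the whole corpus D."""
    counts = Counter(t for doc in class_docs for t in doc)
    max_count = max(counts.values())
    # document frequency |{X in D : t in X}| for the idf term
    doc_freq = Counter(t for doc in all_docs for t in set(doc))
    scores = {}
    for t, n_t in counts.items():
        tf = 0.5 + 0.5 * n_t / max_count             # augmented term frequency
        idf = math.log(len(all_docs) / doc_freq[t])  # inverse document frequency
        scores[t] = tf * idf
    return scores

# Keywords are the tokens with the highest TF-IDF scores.
pos_docs = [["great", "movie", "great", "cast"], ["great", "fun"]]
corpus = pos_docs + [["dull", "movie"], ["boring", "cast"]]
scores = tfidf_keyword_scores(pos_docs, corpus)
keywords = sorted(scores, key=scores.get, reverse=True)[:2]
```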
Step S300 selects keywords using the model's attention values, since this is a more direct and efficient way to quantify how important a word is to the model's prediction. The model is trained with the standard cross-entropy loss L_CE, and the model's attention values are used to select the labeled keywords with the highest attention values.
In the attention-value-based keyword selection of step S300, let $a = [a_1, \ldots, a_T] \in \mathbb{R}^T$ be the attention values over the embedded document, where $a_i$ corresponds to the input token $t_i$; the attention-based score of a token $t$ is set as
$$s_{\mathrm{att}}(t) = \sum_{i=1}^{T} \mathbb{1}[t_i = t]\,\lVert a_i \rVert_2$$
where $\mathbb{1}[\cdot]$ is the indicator function and $\lVert \cdot \rVert_2$ is the L2 norm.
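A minimal sketch of this accumulation, assuming the per-token attention vectors have already been extracted from a fine-tuned model (the tensor shapes and all names here are assumptions):

```python
import torch

def attention_keyword_scores(token_ids, attentions, vocab_size):
    """token_ids: LongTensor [N, T]; attentions: Tensor [N, T, H] holding the
    per-token attention vectors a_i (e.g. taken from the model's [CLS] row)."""
    norms = attentions.norm(p=2, dim=-1)      # ||a_i||_2, shape [N, T]
    scores = torch.zeros(vocab_size)
    # s(t) = sum_i 1[t_i = t] * ||a_i||: scatter each norm onto its vocab index
    scores.scatter_add_(0, token_ids.reshape(-1), norms.reshape(-1))
    return scores

token_ids = torch.randint(0, 100, (8, 16))    # toy batch of 8 documents
attentions = torch.rand(8, 16, 12)
scores = attention_keyword_scores(token_ids, attentions, vocab_size=100)
keywords = scores.topk(10).indices            # highest-attention tokens
```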
In the masked keyword reconstruction of step S400, the model is forced to reconstruct the keywords from the keyword-masked document, so as to strengthen its understanding of the surrounding context. The principle is similar to the masking mechanism of the BERT model, but this scheme masks only keywords rather than random words. Masked keyword reconstruction regularizes only the keywords of a sentence, and the loss on sentences without keywords is ignored. Formally, let $\tilde{k}$ be a random subset of the full keyword set $k$, where each element is chosen independently with probability $p$; $\tilde{k}$ is masked out of the original document $x$ to obtain the masked document $\tilde{x} = x - \tilde{k}$. The masked keyword reconstruction loss is then
$$\mathcal{L}_{\mathrm{MKR}}(x) = \frac{1}{\lvert \mathrm{index}(\tilde{k}) \rvert} \sum_{i \in \mathrm{index}(\tilde{k})} \ell_{\mathrm{CE}}\big(p_{\mathrm{rec}}(\tilde{x})_i,\, v_i\big)$$
where $\mathrm{index}(\tilde{k})$ is the set of positions of the masked keywords $\tilde{k}$ within the original document $x$, $v_i$ is the vocabulary index of the keyword at position $i$, and $p_{\mathrm{rec}}(\tilde{x})_i$ is the model's reconstruction prediction at that position. Choosing a suitable keyword-selection method also matters here: experiments show that attention-based keyword selection performs better than frequency-based or random selection.
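A hedged sketch of this loss, assuming a BERT-style encoder returning per-token hidden states and a linear reconstruction head over the vocabulary (encoder, rec_head and all other names are assumptions, not the patent's implementation):

```python
import torch
import torch.nn.functional as F

def mkr_loss(encoder, rec_head, input_ids, keyword_mask, mask_token_id, p=0.5):
    """keyword_mask: BoolTensor [N, T], True at keyword positions."""
    # draw the random keyword subset k~: each keyword masked with probability p
    drop = keyword_mask & (torch.rand(input_ids.shape) < p)
    masked_ids = input_ids.masked_fill(drop, mask_token_id)  # x~ = x - k~
    hidden = encoder(masked_ids)        # [N, T, H] contextual embeddings
    logits = rec_head(hidden)           # [N, T, V] reconstruction logits
    if not drop.any():                  # no keyword masked in this batch
        return logits.sum() * 0.0
    # cross-entropy only at the masked keyword positions index(k~),
    # with the original vocabulary ids v_i as targets
    return F.cross_entropy(logits[drop], input_ids[drop])
```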
In the masked entropy regularization of step S500, the model should not be able to classify a context-masked document confidently, since the original context is no longer present. Formally, let $\tilde{c}$ be a randomly chosen subset of the full context words $c = x - k$, where each element is chosen independently with probability $q$; $\tilde{c}$ is then masked out of the original document $x$ to obtain the context-masked document $x - \tilde{c}$. The masked entropy regularization is
$$\mathcal{L}_{\mathrm{MER}}(x) = D_{\mathrm{KL}}\big(U(y) \,\Vert\, p(y \mid x - \tilde{c})\big)$$
where $D_{\mathrm{KL}}$ is the KL divergence and $U(y)$ is the uniform distribution. Masked entropy regularization does not degrade classification accuracy, because it constrains unrealistic, masked sentences rather than complete documents. Finally, the overall training objective is set as
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{MKR}}\,\mathcal{L}_{\mathrm{MKR}} + \lambda_{\mathrm{MER}}\,\mathcal{L}_{\mathrm{MER}}$$
where $\lambda_{\mathrm{MKR}}$ and $\lambda_{\mathrm{MER}}$ are the loss weights of the masked keyword reconstruction (MKR) and masked entropy regularization (MER) terms, respectively.
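A companion sketch of the entropy term and the combined objective, under the same assumptions as above (classifier returns class logits; all names are hypothetical):

```python
import torch
import torch.nn.functional as F

def mer_loss(classifier, input_ids, keyword_mask, mask_token_id, q=0.5):
    """Pushes the prediction on a context-masked document toward the uniform
    distribution: KL(U(y) || p(y | x - c~))."""
    context = ~keyword_mask                                   # c = x - k
    drop = context & (torch.rand(input_ids.shape) < q)        # subset c~
    masked_ids = input_ids.masked_fill(drop, mask_token_id)   # x - c~
    log_p = F.log_softmax(classifier(masked_ids), dim=-1)     # log p(y | x - c~)
    uniform = torch.full_like(log_p, 1.0 / log_p.size(-1))    # U(y)
    # F.kl_div(log_q, p) computes KL(p || q), here KL(U || p)
    return F.kl_div(log_p, uniform, reduction="batchmean")

# Combined objective: L = L_CE + lambda_MKR * L_MKR + lambda_MER * L_MER
# loss = ce_loss + lam_mkr * mkr_term + lam_mer * mer_term
```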
In the performance evaluation of step S600, classification accuracy, OOD detection and cross-domain generalization are mainly evaluated. The scheme does not reduce the model's classification accuracy, while its OOD detection and cross-domain generalization metrics improve greatly.
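The patent names OOD detection as an evaluation index but does not specify its computation; one common choice, shown here as an assumption rather than the patent's procedure, is the AUROC of the maximum softmax probability:

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def ood_auroc(probs_in, probs_out):
    """probs_in / probs_out: softmax outputs [N, C] (numpy arrays) on the
    in-distribution and out-of-distribution test sets."""
    conf = np.concatenate([probs_in.max(axis=1), probs_out.max(axis=1)])
    labels = np.concatenate([np.ones(len(probs_in)), np.zeros(len(probs_out))])
    return roc_auc_score(labels, conf)  # higher confidence should mean in-distribution
```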
Steps S200 and S300 may be performed in either order, as may steps S400 and S500.
The invention provides a fine-tuning method based on regularization with manually-masked keywords, so that predictions are made holistically on the basis of context. The method regularizes the model so that it reconstructs keywords from the other words and makes low-confidence predictions when context is insufficient. Run on pre-trained language models such as BERT, RoBERTa and ALBERT, the method shows good reliability and can greatly improve OOD detection and cross-domain generalization without reducing classification accuracy.
Although only the preferred embodiments of the present invention have been described in detail, the present invention is not limited to the above embodiments, and various changes can be made without departing from the spirit of the present invention within the knowledge of those skilled in the art, and all changes are encompassed in the scope of the present invention.

Claims (9)

1. A regularized text classification fine-tuning method based on manually-masked keywords, characterized by comprising the following steps:
s100, data acquisition and processing: acquiring text data required by a model, labeling the type of the text data, constructing a data set required by the model, and pre-training the data set;
s200, selecting keywords based on frequency: selecting keywords using relative frequencies of words in the dataset;
s300, selecting keywords based on the attention value: selecting keywords using model attention;
s400, hiding the keyword reconstruction: reconstructing the keywords from the keyword mask document;
s500, hidden entropy regularization: regularizing the random deletion of the non-key words in the context for the prediction of the context-obscured document;
s600, performance evaluation: and evaluating the text classification precision.
2. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein in the frequency-based keyword selection of S200, the importance of a token is measured by TF-IDF, comparing its term frequency in the target document against its inverse document frequency over the entire corpus, and the keywords are defined as the words with the highest TF-IDF scores.
3. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein in the attention-value-based keyword selection of S300, the model is trained with the standard cross-entropy loss L_CE, and the model's attention values are used to select the labeled keywords with the highest attention values.
4. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein in the attention-value-based keyword selection of S300, let $a = [a_1, \ldots, a_T] \in \mathbb{R}^T$ be the attention values over the embedded document, where $a_i$ corresponds to the input token $t_i$, and the attention-based score of a token $t$ is set as
$$s_{\mathrm{att}}(t) = \sum_{i=1}^{T} \mathbb{1}[t_i = t]\,\lVert a_i \rVert_2$$
5. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein in the masked keyword reconstruction of S400, the keywords of the sentence are regularized: let $\tilde{k}$ be a random subset of the full keyword set $k$, where each element is chosen independently with probability $p$, and $\tilde{k}$ is masked out of the original document $x$ to obtain the masked document $\tilde{x} = x - \tilde{k}$; the masked keyword reconstruction loss is then
$$\mathcal{L}_{\mathrm{MKR}}(x) = \frac{1}{\lvert \mathrm{index}(\tilde{k}) \rvert} \sum_{i \in \mathrm{index}(\tilde{k})} \ell_{\mathrm{CE}}\big(p_{\mathrm{rec}}(\tilde{x})_i,\, v_i\big)$$
6. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein in the masked entropy regularization of S500, let $\tilde{c}$ be a randomly chosen subset of the full context words $c = x - k$, where each element is chosen independently with probability $q$, and $\tilde{c}$ is masked out of the original document $x$ to obtain the context-masked document $x - \tilde{c}$; the masked entropy regularization is
$$\mathcal{L}_{\mathrm{MER}}(x) = D_{\mathrm{KL}}\big(U(y) \,\Vert\, p(y \mid x - \tilde{c})\big)$$
Finally, the overall training objective is set as
$$\mathcal{L} = \mathcal{L}_{\mathrm{CE}} + \lambda_{\mathrm{MKR}}\,\mathcal{L}_{\mathrm{MKR}} + \lambda_{\mathrm{MER}}\,\mathcal{L}_{\mathrm{MER}}$$
7. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein in the performance evaluation of step S600, classification accuracy, OOD detection and cross-domain generalization are mainly evaluated.
8. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein steps S200 and S300 may be performed in either order.
9. The regularized text classification fine-tuning method based on manually-masked keywords of claim 1, wherein steps S400 and S500 may be performed in either order.
CN202110302636.1A 2021-03-22 2021-03-22 Regularized text classification fine tuning method based on manual masking keywords Active CN113032563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110302636.1A CN113032563B (en) 2021-03-22 2021-03-22 Regularized text classification fine tuning method based on manual masking keywords


Publications (2)

Publication Number Publication Date
CN113032563A (en) 2021-06-25
CN113032563B CN113032563B (en) 2023-07-14

Family

ID=76472302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110302636.1A Active CN113032563B (en) 2021-03-22 2021-03-22 Regularized text classification fine tuning method based on manual masking keywords

Country Status (1)

Country Link
CN (1) CN113032563B (en)

Citations (13)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014043519A1 (en) * 2012-09-14 2014-03-20 Population Diagnostics Inc. Methods and compositions for diagnosing, prognosing, and treating neurological conditions
CN110119765A (en) * 2019-04-18 2019-08-13 浙江工业大学 A kind of keyword extracting method based on Seq2seq frame
CN110222349A (en) * 2019-06-13 2019-09-10 成都信息工程大学 A kind of model and method, computer of the expression of depth dynamic context word
US20200193500A1 (en) * 2017-07-04 2020-06-18 Beijing Jingdong Shangke Information Technology Co., Ltd. Data processing method and apparatus based on electronic commerce
CN111339278A (en) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for generating training speech generating model and method and device for generating answer speech
CN111444709A (en) * 2020-03-09 2020-07-24 腾讯科技(深圳)有限公司 Text classification method, device, storage medium and equipment
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model
CN111488459A (en) * 2020-04-15 2020-08-04 焦点科技股份有限公司 Product classification method based on keywords
CN111563166A (en) * 2020-05-28 2020-08-21 浙江学海教育科技有限公司 Pre-training model method for mathematical problem classification
CN111563373A (en) * 2020-04-13 2020-08-21 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN112115247A (en) * 2020-09-07 2020-12-22 中国人民大学 Personalized dialogue generation method and system based on long-time and short-time memory information
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
CN112256876A (en) * 2020-10-26 2021-01-22 南京工业大学 Aspect-level emotion classification model based on multi-memory attention network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
YUN ZHAO et al.: "BERTSurv: BERT-Based Survival Models for Predicting Outcomes of Trauma Patients", arXiv:2103.10928v1 *
王楠禔: "Research on an improved text representation model based on BERT" (in Chinese), China Master's Theses Full-text Database, Information Science & Technology *
田芳: "Research on fine-grained text sentiment classification for the social Internet of Things" (in Chinese), China Master's Theses Full-text Database, Information Science & Technology *
薄层: "Understanding KL divergence" (in Chinese), https://www.cnblogs.com/boceng/p/11519381.html *

Also Published As

Publication number Publication date
CN113032563B (en) 2023-07-14

Similar Documents

Publication Publication Date Title
Mei et al. Wavcaps: A chatgpt-assisted weakly-labelled audio captioning dataset for audio-language multimodal research
Parameswaran et al. Towards the web of concepts: Extracting concepts from large datasets
CN111125349A (en) Graph model text abstract generation method based on word frequency and semantics
CN108304372A (en) Entity extraction method and apparatus, computer equipment and storage medium
Plank Domain adaptation for parsing
CN112749274B (en) Chinese text classification method based on attention mechanism and interference word deletion
CN110807324A (en) Video entity identification method based on IDCNN-crf and knowledge graph
CN110807326A (en) Short text keyword extraction method combining GPU-DMM and text features
CN110287314A (en) Long text credibility evaluation method and system based on Unsupervised clustering
Tungthamthiti et al. Recognition of sarcasm in microblogging based on sentiment analysis and coherence identification
Shnarch et al. GRASP: Rich patterns for argumentation mining
CN112528653B (en) Short text entity recognition method and system
Chang et al. The secret’s in the word order: Text-to-text generation for linguistic steganography
CN116720498A (en) Training method and device for text similarity detection model and related medium thereof
CN110019814B (en) News information aggregation method based on data mining and deep learning
CN113032563B (en) Regularized text classification fine tuning method based on manual masking keywords
CN114996442B (en) Text abstract generation system combining abstract degree discrimination and abstract optimization
CN107729509B (en) Discourse similarity determination method based on recessive high-dimensional distributed feature representation
Tang et al. Text semantic understanding based on knowledge enhancement and multi-granular feature extraction
Wang et al. Weakly Supervised Chinese short text classification algorithm based on ConWea model
CN115455975A (en) Method and device for extracting topic keywords based on multi-model fusion decision
Amini et al. Incorporating prior knowledge into a transductive ranking algorithm for multi-document summarization
CN112257458A (en) Intention recognition model training method, intention recognition method, device and equipment
Kalmar Bootstrapping Websites for Classification of Organization Names on Twitter.
Jain Unsupervised method for text summarization using content based approach

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant