CN113032563B - Regularized text classification fine tuning method based on manual masking keywords - Google Patents

Regularized text classification fine tuning method based on manual masking keywords

Info

Publication number
CN113032563B
Authority
CN
China
Prior art keywords
keywords
keyword
masking
model
text classification
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110302636.1A
Other languages
Chinese (zh)
Other versions
CN113032563A (en)
Inventor
潘晓光
陈亮
董虎弟
宋晓晨
张雅娜
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shanxi Sanyouhe Smart Information Technology Co Ltd
Original Assignee
Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shanxi Sanyouhe Smart Information Technology Co Ltd filed Critical Shanxi Sanyouhe Smart Information Technology Co Ltd
Priority to CN202110302636.1A priority Critical patent/CN113032563B/en
Publication of CN113032563A publication Critical patent/CN113032563A/en
Application granted granted Critical
Publication of CN113032563B publication Critical patent/CN113032563B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35Clustering; Classification
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3344Query execution using natural language analysis
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33Querying
    • G06F16/3331Query processing
    • G06F16/334Query execution
    • G06F16/3346Query execution using probabilistic model
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/211Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention belongs to the technical field of text classification and specifically relates to a regularized text classification fine-tuning method based on manually masked keywords, comprising the following steps: data acquisition and processing, frequency-based keyword selection, attention-based keyword selection, masked keyword reconstruction, masked entropy regularization, and performance evaluation. Data acquisition and processing collects the text data required by the model, labels its categories, constructs the data set required by the model, and pre-trains on it; frequency-based keyword selection picks keywords using the relative frequencies of words in the data set; attention-based keyword selection picks keywords using the model's attention. The method regularizes the model so that it reconstructs keywords from the surrounding words and makes low-confidence predictions when it lacks sufficient context. It can greatly improve OOD detection and cross-domain generalization without reducing classification accuracy.

Description

Regularized text classification fine tuning method based on manual masking keywords
Technical Field
The invention relates to the technical field of text classification, in particular to a regularized text classification fine tuning method based on manual masking keywords.
Background
Pre-trained language models achieve state-of-the-art accuracy on a variety of text classification tasks such as sentiment analysis, natural language inference, and semantic textual similarity. However, the reliability of fine-tuned text classifiers is severely underestimated. It has not yet been possible to build models that can detect out-of-distribution (OOD) samples or remain robust under domain shift, mainly because such models rely excessively on a limited set of keywords instead of attending to the whole context.
Causes of the problems or defects: current research on text classification focuses only on evaluating model accuracy and neglects model reliability. Moreover, the excessive dependence of existing methods on keywords harms out-of-distribution detection and cross-domain generalization.
Disclosure of Invention
The invention aims to provide a regularized text classification fine tuning method based on manual masking keywords.
In order to achieve the above purpose, the present invention provides the following technical solutions: a regularized text classification fine tuning method based on manual masking keywords comprises the following steps:
s100, data acquisition and processing: collecting text data required by a model, marking the categories of the text data, constructing a data set required by the model, and pre-training the data set;
s200, selecting keywords based on frequency: selecting keywords using the relative frequencies of words in the dataset;
s300, keyword selection based on attention value: selecting a keyword using model attention;
S400, masked keyword reconstruction: reconstructing keywords from the keyword-masked document;
S500, masked entropy regularization: randomly deleting non-keyword context words and regularizing the model's predictions on the resulting context-masked document;
s600, performance evaluation: and evaluating the text classification accuracy.
Further, in the frequency-based keyword selection of step S200, the importance of a token is measured by TF-IDF, i.e. by comparing its frequency in the target document with its frequency in the whole corpus, and the keywords are defined as the words with the highest TF-IDF scores.
Further, in the attention-based keyword selection of step S300, the model is trained with the standard cross-entropy loss L_CE, and the model's attention values are used to select the tokens with the highest attention values as keywords.
Further, in the attention-based keyword selection of step S300, let a = [a_1, …, a_T] ∈ ℝ^T be the attention values over the embedded document, where a_i corresponds to the input token t_i; the attention-based score of token t is set as

s_attn(t) = Σ_{X∈D} Σ_{i=1…T} 1[t_i = t]·‖a_i‖
Further, in the masked keyword reconstruction of step S400, keyword regularization is applied to each sentence. Let k̃ be a random subset of the full keyword set k, with each element selected independently with probability p; k̃ is then masked out of the original document x to obtain the keyword-masked document x̃_k = x − k̃. Finally, the masked keyword reconstruction loss is

L_MKR = −Σ_{i∈index(k̃)} log p(x_i = v_i | x̃_k)
Further, in the masked entropy regularization of step S500, let c̃ be a random subset of the full context-word set c = x − k, with each element selected independently with probability q; c̃ is then masked out of the original document x to obtain the context-masked document x̃_c = x − c̃. The masked entropy regularization term is

L_MER = D_KL(U(y) ‖ p(y | x̃_c))

Finally, the overall fine-tuning objective is set as

L_final = L_CE + λ_MKR·L_MKR + λ_MER·L_MER
Further, the performance evaluation of step S600 mainly evaluates classification accuracy, OOD detection, and cross-domain generalization metrics.
Further, step S200 and step S300 may be performed in either order.
Further, step S400 and step S500 may be performed in either order.
The invention has the following technical effects: aiming at the problems in current text classification research, such as neglecting model reliability and excessive dependence on keywords, the invention provides a more reliable method that makes holistic predictions based on the full context. The method regularizes the model so that it reconstructs keywords from the surrounding words and makes low-confidence predictions when the context is insufficient. Applied to pre-trained language models such as BERT, RoBERTa, and ALBERT, the method can greatly improve OOD detection and cross-domain generalization without degrading classification accuracy.
Drawings
FIG. 1 is a flow chart of the system of the present invention.
Detailed Description
The following describes the technical solutions in the embodiments of the present invention clearly and completely with reference to the accompanying drawings. It is apparent that the described embodiments are only some, rather than all, of the embodiments of the invention. All other embodiments obtained by those skilled in the art based on the embodiments of the invention without creative effort fall within the scope of the invention.
Examples
A regularized text classification fine tuning method based on manual masking keywords, as shown in FIG. 1, comprises the following steps:
s100, data acquisition and processing: collecting text data required by a model, marking the categories of the text data, constructing a data set required by the model, and pre-training the data set;
s200, selecting keywords based on frequency: selecting keywords using the relative frequencies of words in the dataset;
s300, keyword selection based on attention value: selecting a keyword using model attention;
S400, masked keyword reconstruction: reconstructing keywords from the keyword-masked document;
S500, masked entropy regularization: randomly deleting non-keyword context words and regularizing the model's predictions on the resulting context-masked document;
s600, performance evaluation: and evaluating the text classification accuracy.
In the frequency-based keyword selection of step S200, the importance of a token is measured by TF-IDF (term frequency–inverse document frequency): its term frequency in the target document is compared with its inverse document frequency over the whole corpus, and the keywords are defined as the words with the highest TF-IDF scores. Let X_c be the document obtained by concatenating all tokens of the documents in corpus D_c, and let D = [X_1, …, X_C] be the resulting collection of super-documents; the frequency-based keyword score of token t is
s_freq(t) = tf(t, X_c)·idf(t, D)
where tf(t, X) = 0.5 + 0.5·n_t/max_{t′} n_{t′} (n_t being the count of token t in X) and idf(t, D) = log(|D|/|{X ∈ D: t ∈ X}|). The frequency-based selection is model-independent and relatively easy to compute, but it does not directly reflect the contribution of a word to the model's prediction.
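For illustration, a minimal Python sketch of the frequency-based scoring described above is given below; the corpus layout (a mapping from class label to tokenized documents), the augmented term-frequency normalization, and the keyword budget K are assumptions made for the example rather than requirements of the invention.

```python
import math
from collections import Counter

def tfidf_keyword_scores(class_documents):
    """Frequency-based keyword scores s_freq(t) = tf(t, X_c) * idf(t, D).

    class_documents: dict mapping class label -> list of tokenized documents.
    Each X_c is the concatenation of all tokens of that class; D is the
    collection of these per-class super-documents.
    """
    # Build one super-document per class by concatenating its documents.
    super_docs = {c: [tok for doc in docs for tok in doc]
                  for c, docs in class_documents.items()}
    D = list(super_docs.values())
    doc_freq = Counter()                       # in how many X_c does token t appear
    for X in D:
        doc_freq.update(set(X))

    scores = {}
    for X in D:
        counts = Counter(X)
        max_count = max(counts.values())
        for t, n_t in counts.items():
            tf = 0.5 + 0.5 * n_t / max_count               # augmented term frequency
            idf = math.log(len(D) / doc_freq[t])           # inverse document frequency
            scores[t] = max(scores.get(t, 0.0), tf * idf)  # keep best score across classes
    return scores

# Usage: pick the K highest-scoring tokens as keywords (K is an assumed budget).
# keywords = sorted(scores, key=scores.get, reverse=True)[:K]
```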
In the attention-based keyword selection of step S300, keywords are selected using the model's attention, because this is a more direct and effective way to measure the importance of keywords in the model's predictions. The model is trained with the standard cross-entropy loss L_CE, and the attention values are used to select the tokens with the highest attention values as keywords.
In the attention-based keyword selection of step S300, let a = [a_1, …, a_T] ∈ ℝ^T be the attention values over the embedded document, where a_i corresponds to the input token t_i; the attention-based score of token t is set as

s_attn(t) = Σ_{X∈D} Σ_{i=1…T} 1[t_i = t]·‖a_i‖
where 1[·] is the indicator function and ‖·‖ is the L2 norm.
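A minimal sketch of attention-based keyword scoring follows; it assumes a Hugging Face Transformers model, takes the last layer's [CLS]-to-token attention averaged over heads as the per-token attention value a_i, and uses its absolute value as the norm — these concrete choices are illustrative assumptions, since the invention only requires some per-token attention value.

```python
import torch
from collections import defaultdict
from transformers import AutoTokenizer, AutoModelForSequenceClassification

def attention_keyword_scores(texts, model_name="bert-base-uncased", device="cpu"):
    """Score s_attn(t) = sum over documents and positions of 1[t_i = t] * ||a_i||,
    where a_i is the attention value assigned to input token t_i."""
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForSequenceClassification.from_pretrained(
        model_name, output_attentions=True).to(device).eval()

    scores = defaultdict(float)
    with torch.no_grad():
        for text in texts:
            enc = tokenizer(text, return_tensors="pt", truncation=True).to(device)
            out = model(**enc)
            # Last-layer attention has shape (batch, heads, seq, seq).
            # Here a_i is the [CLS]-to-token attention averaged over heads
            # (an assumption; any per-token attention vector fits the same formula).
            attn = out.attentions[-1].mean(dim=1)[0, 0]   # shape (seq_len,)
            tokens = tokenizer.convert_ids_to_tokens(enc["input_ids"][0])
            for tok, a in zip(tokens, attn.tolist()):
                scores[tok] += abs(a)                     # ||a_i|| for a scalar value
    return scores
```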
In the masked keyword reconstruction of step S400, to force the model to understand the surrounding context, the model is made to reconstruct keywords from the keyword-masked document. The principle is similar to the masking mechanism of BERT pre-training, except that this scheme masks only keywords rather than random words. The masked keyword reconstruction regularizes only sentences that contain keywords and ignores the loss for sentences without keywords. Formally, let k̃ be a random subset of the full keyword set k, with each element selected independently with probability p; k̃ is then masked out of the original document x to obtain the keyword-masked document x̃_k = x − k̃. Finally, the masked keyword reconstruction loss is

L_MKR = −Σ_{i∈index(k̃)} log p(x_i = v_i | x̃_k)
where index(k̃) is the set of positions of the masked keywords k̃ in the original document x, and v_i is the vocabulary index of the keyword at position i. Choosing a proper keyword-selection method also matters here: experiments show that attention-based keyword selection performs better than frequency-based or random selection.
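A minimal PyTorch sketch of the masked keyword reconstruction loss follows; the use of a masked-LM head, the [MASK] token mechanism, and the value of the masking probability p are illustrative assumptions consistent with the description above.

```python
import torch
import torch.nn.functional as F

def mkr_loss(input_ids, keyword_mask, mlm_model, mask_token_id, p=0.5):
    """Masked keyword reconstruction: mask each keyword token independently with
    probability p and ask the model to reconstruct only those positions.

    input_ids:    (batch, seq) original token ids of x
    keyword_mask: (batch, seq) bool, True where the token belongs to the keyword set k
    mlm_model:    a model with a masked-LM head (e.g. BertForMaskedLM)
    """
    # k~: random subset of keywords, each selected with independent probability p.
    selected = keyword_mask & (torch.rand_like(input_ids, dtype=torch.float) < p)
    masked_ids = input_ids.clone()
    masked_ids[selected] = mask_token_id             # x~_k = x - k~

    logits = mlm_model(input_ids=masked_ids).logits  # (batch, seq, vocab)
    # Reconstruction loss only over index(k~); v_i is the vocabulary index of x_i.
    labels = input_ids[selected]
    preds = logits[selected]
    if labels.numel() == 0:                          # sentence without masked keywords
        return torch.tensor(0.0, device=input_ids.device)
    return F.cross_entropy(preds, labels)
```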
In the masked entropy regularization of step S500, the model should not be able to classify the context-masked document confidently, because the original context is no longer present. Formally, let c̃ be a random subset of the full context-word set c = x − k, with each element selected independently with probability q; c̃ is then masked out of the original document x to obtain the context-masked document x̃_c = x − c̃. The masked entropy regularization term is

L_MER = D_KL(U(y) ‖ p(y | x̃_c))
where D_KL is the KL divergence and U(y) is the uniform distribution. Masked entropy regularization does not reduce classification accuracy, because it regularizes unrealistic masked sentences rather than complete documents. Finally, the overall fine-tuning objective is set as

L_final = L_CE + λ_MKR·L_MKR + λ_MER·L_MER
where λ_MKR and λ_MER are hyperparameters weighting the masked keyword reconstruction (MKR) loss and the masked entropy regularization (MER) loss, respectively.
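The sketch below illustrates, under the same assumptions as above, the masked entropy regularization term and the combined fine-tuning objective; the classifier interface and the default values of q, λ_MKR, and λ_MER are placeholders, not values prescribed by the invention.

```python
import torch
import torch.nn.functional as F

def mer_loss(input_ids, keyword_mask, classifier, mask_token_id, q=0.5):
    """Masked entropy regularization: mask a random subset c~ of the non-keyword
    context words (each with probability q) and push the prediction on the
    context-masked document x~_c toward the uniform distribution U(y)."""
    context_mask = ~keyword_mask                                   # c = x - k
    selected = context_mask & (torch.rand_like(input_ids, dtype=torch.float) < q)
    masked_ids = input_ids.clone()
    masked_ids[selected] = mask_token_id                           # x~_c = x - c~

    logits = classifier(input_ids=masked_ids).logits               # (batch, num_classes)
    log_probs = F.log_softmax(logits, dim=-1)
    uniform = torch.full_like(log_probs, 1.0 / logits.size(-1))
    # D_KL(U(y) || p(y | x~_c)): kl_div takes log-probabilities as input, target as probs.
    return F.kl_div(log_probs, uniform, reduction="batchmean")

def total_loss(ce_loss, l_mkr, l_mer, lambda_mkr=0.001, lambda_mer=0.001):
    """Overall fine-tuning objective: L = L_CE + lambda_MKR*L_MKR + lambda_MER*L_MER."""
    return ce_loss + lambda_mkr * l_mkr + lambda_mer * l_mer
```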
In the performance evaluation of step S600, classification accuracy, OOD detection, and cross-domain generalization metrics are mainly evaluated. The classification accuracy of the model is not reduced, while the OOD detection and cross-domain generalization metrics are greatly improved.
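As one possible way to compute these metrics, the sketch below evaluates in-distribution accuracy and OOD detection, using the maximum softmax probability as the confidence score and AUROC as the detection metric; this particular OOD scoring rule is a common choice assumed here for illustration.

```python
import numpy as np
from sklearn.metrics import roc_auc_score

def evaluate(in_dist_probs, in_dist_labels, ood_probs):
    """in_dist_probs: (N, C) softmax outputs on in-distribution test data
    in_dist_labels: (N,) gold class ids
    ood_probs:      (M, C) softmax outputs on out-of-distribution data"""
    # Classification accuracy on in-distribution data.
    accuracy = (in_dist_probs.argmax(axis=1) == in_dist_labels).mean()

    # OOD detection: maximum softmax probability as confidence, AUROC as the metric
    # (in-distribution samples are the positive class).
    confidence = np.concatenate([in_dist_probs.max(axis=1), ood_probs.max(axis=1)])
    is_in_dist = np.concatenate([np.ones(len(in_dist_probs)), np.zeros(len(ood_probs))])
    auroc = roc_auc_score(is_in_dist, confidence)
    return {"accuracy": float(accuracy), "ood_auroc": float(auroc)}
```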
Step S200 and step S300 may be performed in either order, and step S400 and step S500 may be performed in either order.
The invention provides a fine-tuning method based on regularization with manually masked keywords, so that predictions are made holistically from the full context. The method regularizes the model so that it reconstructs keywords from the surrounding words and makes low-confidence predictions when the context is insufficient. Applied to pre-trained language models such as BERT, RoBERTa, and ALBERT, the method is highly reliable and can greatly improve OOD detection and cross-domain generalization without reducing classification accuracy.
The preferred embodiments of the present invention have been described in detail, but the present invention is not limited to the above embodiments, and various changes can be made within the knowledge of those skilled in the art without departing from the spirit of the present invention, and the various changes are included in the scope of the present invention.

Claims (5)

1. A regularized text classification fine tuning method based on manual masking keywords is characterized by comprising the following steps:
s100, data acquisition and processing: collecting text data required by a model, marking the categories of the text data, constructing a data set required by the model, and pre-training the data set;
s200, selecting keywords based on frequency: selecting keywords using the relative frequencies of words in the dataset;
S300, attention-based keyword selection: selecting keywords using model attention; in the attention-based keyword selection, the model is trained with the standard cross-entropy loss L_CE, and the model's attention values are used to select the tokens with the highest attention values as keywords; in the attention-based keyword selection of S300, let a = [a_1, …, a_T] ∈ ℝ^T be the attention values over the embedded document, where a_i corresponds to the input token t_i, and the attention-based score of token t is set as

s_attn(t) = Σ_{X∈D} Σ_{i=1…T} 1[t_i = t]·‖a_i‖

where 1[·] is the indicator function and ‖·‖ is the L2 norm;
S400, masked keyword reconstruction: reconstructing keywords from the keyword-masked document; in the masked keyword reconstruction of S400, keyword regularization is applied to each sentence; let k̃ be a random subset of the full keyword set k, with each element selected independently with probability p; k̃ is then masked out of the original document x to obtain the keyword-masked document x̃_k = x − k̃; finally, the masked keyword reconstruction loss is

L_MKR = −Σ_{i∈index(k̃)} log p(x_i = v_i | x̃_k)

where index(k̃) is the set of positions of the masked keywords k̃ in the original document x, and v_i is the vocabulary index of the keyword at position i;
S500, masked entropy regularization: randomly deleting non-keyword context words and regularizing the model's predictions on the context-masked document; in the masked entropy regularization of S500, let c̃ be a random subset of the full context-word set c = x − k, with each element selected independently with probability q; c̃ is then masked out of the original document x to obtain the context-masked document x̃_c = x − c̃; the masked entropy regularization term is

L_MER = D_KL(U(y) ‖ p(y | x̃_c))

and finally the overall fine-tuning objective is set as

L_final = L_CE + λ_MKR·L_MKR + λ_MER·L_MER

where D_KL is the KL divergence, U(y) is the uniform distribution, and λ_MKR and λ_MER are hyperparameters weighting the masked keyword reconstruction (MKR) loss and the masked entropy regularization (MER) loss, respectively;
s600, performance evaluation: and evaluating the text classification accuracy.
2. The regularized text classification fine tuning method based on manual masking keywords according to claim 1, wherein in the frequency-based keyword selection of step S200, the importance of a token is measured by TF-IDF, i.e. by comparing its frequency in the target document with its frequency in the whole corpus, and the keywords are defined as the words with the highest TF-IDF scores.
3. The regularized text classification fine tuning method based on manual masking keywords as recited in claim 1, wherein
in the performance evaluation of step S600, classification accuracy, OOD detection, and cross-domain generalization metrics are mainly evaluated.
4. The regularized text classification fine tuning method based on manual masking keywords as recited in claim 1, wherein
step S200 and step S300 may be performed in either order.
5. The regularized text classification fine tuning method based on manual masking keywords as recited in claim 1, wherein
step S400 and step S500 may be performed in either order.
CN202110302636.1A 2021-03-22 2021-03-22 Regularized text classification fine tuning method based on manual masking keywords Active CN113032563B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110302636.1A CN113032563B (en) 2021-03-22 2021-03-22 Regularized text classification fine tuning method based on manual masking keywords

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110302636.1A CN113032563B (en) 2021-03-22 2021-03-22 Regularized text classification fine tuning method based on manual masking keywords

Publications (2)

Publication Number Publication Date
CN113032563A CN113032563A (en) 2021-06-25
CN113032563B true CN113032563B (en) 2023-07-14

Family

ID=76472302

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110302636.1A Active CN113032563B (en) 2021-03-22 2021-03-22 Regularized text classification fine tuning method based on manual masking keywords

Country Status (1)

Country Link
CN (1) CN113032563B (en)


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107315823B (en) * 2017-07-04 2020-11-03 北京京东尚科信息技术有限公司 Data processing method and device based on electronic commerce

Patent Citations (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2014043519A1 (en) * 2012-09-14 2014-03-20 Population Diagnostics Inc. Methods and compositions for diagnosing, prognosing, and treating neurological conditions
CN110119765A (en) * 2019-04-18 2019-08-13 浙江工业大学 A kind of keyword extracting method based on Seq2seq frame
CN110222349A (en) * 2019-06-13 2019-09-10 成都信息工程大学 A kind of model and method, computer of the expression of depth dynamic context word
CN111339278A (en) * 2020-02-28 2020-06-26 支付宝(杭州)信息技术有限公司 Method and device for generating training speech generating model and method and device for generating answer speech
CN111444709A (en) * 2020-03-09 2020-07-24 腾讯科技(深圳)有限公司 Text classification method, device, storage medium and equipment
CN111563373A (en) * 2020-04-13 2020-08-21 中南大学 Attribute-level emotion classification method for focused attribute-related text
CN111488459A (en) * 2020-04-15 2020-08-04 焦点科技股份有限公司 Product classification method based on keywords
CN111444721A (en) * 2020-05-27 2020-07-24 南京大学 Chinese text key information extraction method based on pre-training language model
CN111563166A (en) * 2020-05-28 2020-08-21 浙江学海教育科技有限公司 Pre-training model method for mathematical problem classification
CN112115247A (en) * 2020-09-07 2020-12-22 中国人民大学 Personalized dialogue generation method and system based on long-time and short-time memory information
CN112214599A (en) * 2020-10-20 2021-01-12 电子科技大学 Multi-label text classification method based on statistics and pre-training language model
CN112256876A (en) * 2020-10-26 2021-01-22 南京工业大学 Aspect-level emotion classification model based on multi-memory attention network

Non-Patent Citations (4)

* Cited by examiner, † Cited by third party
Title
BERTSurv: BERT-Based Survival Models for Predicting Outcomes of Trauma Patients; Yun Zhao et al.; arXiv:2103.10928v1; 2021-03-19; 1-15 *
Understanding KL divergence; 薄层; https://www.cnblogs.com/boceng/p/11519381.html; 2019-09-14; 1-3 *
Research on an Improved BERT-Based Text Representation Model; 王楠禔; China Master's Theses Full-text Database, Information Science and Technology; 2020-01-15 (No. 01); I138-2641 *
Research on Fine-Grained Text Sentiment Classification Methods for the Social Internet of Things; 田芳; China Master's Theses Full-text Database, Information Science and Technology; 2021-03-15 (No. 03); I136-338 *

Also Published As

Publication number Publication date
CN113032563A (en) 2021-06-25

Similar Documents

Publication Publication Date Title
Diggelmann et al. Climate-fever: A dataset for verification of real-world climate claims
Mei et al. Wavcaps: A chatgpt-assisted weakly-labelled audio captioning dataset for audio-language multimodal research
Wu et al. Learning to tag
CN111125349A (en) Graph model text abstract generation method based on word frequency and semantics
CN108304372A (en) Entity extraction method and apparatus, computer equipment and storage medium
CN111914062B (en) Long text question-answer pair generation system based on keywords
CN110096572B (en) Sample generation method, device and computer readable medium
CN110807324A (en) Video entity identification method based on IDCNN-crf and knowledge graph
CN115309872B (en) Multi-model entropy weighted retrieval method and system based on Kmeans recall
Xie et al. T2ranking: A large-scale chinese benchmark for passage ranking
Hillard et al. Learning weighted entity lists from web click logs for spoken language understanding
CN117271792A (en) Method for constructing enterprise domain knowledge base based on large model
CN116756303A (en) Automatic generation method and system for multi-topic text abstract
CN116862318B (en) New energy project evaluation method and device based on text semantic feature extraction
CN111581365B (en) Predicate extraction method
CN115146021A (en) Training method and device for text retrieval matching model, electronic equipment and medium
CN113032563B (en) Regularized text classification fine tuning method based on manual masking keywords
CN116720498A (en) Training method and device for text similarity detection model and related medium thereof
CN117131383A (en) Method for improving search precision drainage performance of double-tower model
CN110019814B (en) News information aggregation method based on data mining and deep learning
CN109189915A (en) A kind of information retrieval method based on depth relevant matches model
Amini et al. Incorporating prior knowledge into a transductive ranking algorithm for multi-document summarization
Kalmar Bootstrapping Websites for Classification of Organization Names on Twitter.
Sotudeh et al. Qontsum: On contrasting salient content for query-focused summarization
CN111930880A (en) Text code retrieval method, device and medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant