CN109325114A - A text classification algorithm fusing statistical features and an Attention mechanism - Google Patents

A text classification algorithm fusing statistical features and an Attention mechanism Download PDF

Info

Publication number
CN109325114A
CN109325114A CN201810817616.6A
Authority
CN
China
Prior art keywords
event
word
text
attention
value
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810817616.6A
Other languages
Chinese (zh)
Inventor
程艳芬
李超
陈逸灵
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wuhan University of Technology WUT
Original Assignee
Wuhan University of Technology WUT
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wuhan University of Technology WUT filed Critical Wuhan University of Technology WUT
Priority to CN201810817616.6A priority Critical patent/CN109325114A/en
Publication of CN109325114A publication Critical patent/CN109325114A/en
Pending legal-status Critical Current

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/279Recognition of textual entities
    • G06F40/289Phrasal analysis, e.g. finite state techniques or chunking
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/20Natural language analysis
    • G06F40/205Parsing
    • G06F40/216Parsing using statistical methods
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00Handling natural language data
    • G06F40/30Semantic analysis

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Probability & Statistics with Applications (AREA)
  • Machine Translation (AREA)

Abstract

The present invention relates to a text classification algorithm that fuses statistical features with an Attention mechanism. Attention mechanisms are increasingly being applied in the field of natural language processing, but existing methods incur a substantial computational cost when computing Attention weights. The present invention proposes computing Attention weights at the level of structured events. On the one hand, an event carries richer semantics than a single word or phrase; on the other hand, event-based Attention reduces computational complexity. In addition, statistical features are incorporated into the Attention weight computation. Compared with existing models, the semantic information contained in the event structure and the corresponding statistical features improve the quality of the text-vector representation and yield better classification performance. Evaluated on classification accuracy, the experimental results show that the model achieves better results while reducing training time.

Description

A text classification algorithm fusing statistical features and an Attention mechanism
Technical field
The present invention relates to a novel text classification algorithm aimed particularly at large-scale text data sets, improving classification accuracy while reducing the time complexity of the computation.
Background art
With the rapid development of networks and information technology, data grows at an exponential rate, and text is the main form in which information is expressed on the Internet. How to extract key, effective information from large and heterogeneous text data is a current research focus in the field of data mining. Text classification, a key technology in this field, provides an initial processing and categorization of textual information.
The main tasks of text classification are text representation, feature extraction, classification, and effect evaluation. Before a text can be computed on and processed by a computer, the raw input must first be represented using an appropriate feature extraction algorithm; only then can a classification algorithm be trained on the extracted features, and the resulting model used to classify unseen texts. Traditional text feature extraction methods are mainly probability-based models that extract keywords by computing statistical features of the text; they largely ignore the deeper syntactic and semantic information of the text, which inevitably reduces classification accuracy.
Summary of the invention
In view of the above deficiencies, the present invention proposes a text classification algorithm that computes Attention weights at the level of structured events. On the one hand, an event carries richer semantics than a word or phrase; on the other hand, event-based Attention reduces the time complexity of the computation. Furthermore, to address the inability of existing deep learning models to learn the statistical features of text, statistical features are incorporated into the Attention weight computation. Compared with existing models, the semantic information contained in the event structure and the corresponding statistical features improve the quality of the text-vector representation and achieve better classification performance.
Building on existing text classification models, the present invention proposes an event-based Attention mechanism for text classification. The main differences from existing models are as follows:
(1) Existing Attention mechanisms operate mainly at the word level; the event-based Attention mechanism proposed by the present invention computes weights at the level of event structures.
(2) End-to-end deep learning models cannot learn statistical features, which have a measurable influence on classification results; adding statistical features to the model yields a text representation vector that carries more information.
The present invention adopts the following technical scheme:
A text classification algorithm fusing statistical features and an Attention mechanism, characterized by comprising:
Step 1: given a document set, first perform word segmentation, part-of-speech tagging, and stop-word removal; record the term-frequency information of each word and replace synonyms in the documents; then train word vectors for each word with the word2vec tool, compute each word's tf-idf value, and weight the tf-idf value according to the word's part of speech to obtain the word's statistical feature value;
Step 2: extract the events in each document, and compute the statistical feature value of each event and the event-based Attention weight;
Step 3: fuse the event Attention weights with the event statistical feature values to obtain the final vector representation;
Step 4: train the model, and use the trained model on test texts to obtain classification results.
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 1 specifically includes the following steps:
Step 1: apply the Chinese word segmentation tool NLPIR to the document set for word segmentation and part-of-speech tagging, then remove stop words from the documents using a Chinese stop-word list;
Step 2: use the extended edition of HIT's Tongyici Cilin ("Chinese Thesaurus") as the semantic dictionary, replace all near-synonyms in the documents with their representative words, and obtain the final text input sequence;
Step 3: use the word2vec tool to train a word vector for each word in the text input sequence;
Step 4: for each trained word vector, compute its tf-idf value, and compute the word's statistical feature value from the word's part of speech and tf-idf value as: Wi = posw*posi + tfidfw*tfidfi, where posi denotes the part-of-speech value of the word, and the weights take the values posw = 0.5, tfidfw = 0.8.
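A minimal sketch of the word-level statistical feature above. The weights posw = 0.5 and tfidfw = 0.8 come from the text; the per-part-of-speech values in `POS_VALUE` are illustrative assumptions, since the patent does not list them here:

```python
# Statistical feature of a word: W_i = pos_w * pos_i + tfidf_w * tfidf_i.
POS_W, TFIDF_W = 0.5, 0.8
# Hypothetical pos_i table (the patent's actual values are not reproduced here).
POS_VALUE = {"n": 1.0, "v": 0.8, "a": 0.6, "other": 0.2}

def word_stat_feature(pos_tag: str, tfidf: float) -> float:
    """W_i = pos_w * pos_i + tfidf_w * tfidf_i."""
    pos_i = POS_VALUE.get(pos_tag, POS_VALUE["other"])
    return POS_W * pos_i + TFIDF_W * tfidf

print(word_stat_feature("n", 0.25))  # 0.5*1.0 + 0.8*0.25 = 0.7
```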
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 2 specifically includes the following steps:
Step 1: for a given document, perform dependency analysis on each sentence using the Stanford dependency parser to obtain the dependency structure of each sentence; then extract events using the two dependency relations nsubj and dobj: if an nsubj relation and a dobj relation share the same predicate, they are merged into one event, represented by the triple <subj, verb, obj>; relations in the parse result that cannot be merged are kept as two-element events;
Step 2: from the extracted events, obtain the vector representation of each event: e = [xsubj; xverb; xobj], where xsubj, xverb, xobj denote the vector representations of the subject, predicate, and object of the event; then compute the influence weights of the events e1, e2, e3, ……, et in the text on the document as a whole, which highlights the effect of critical events and reduces the influence of non-critical events on the overall semantics of the document; the semantic coding of the attention distribution probability is computed as follows:
aki = exp(eki) / Σj=1..T exp(ekj), eki = v^T · tanh(W·hk + U·hi + b),
where aki denotes the attention weight of node i relative to the overall input, eki denotes the score of the i-th event of the input sequence, T is the number of event elements in the input sequence, hk is the hidden-layer state corresponding to the overall input X`; hi denotes the hidden-layer state corresponding to the i-th event element of the input sequence; v, W, U are weight matrices, b is a bias parameter, and the tanh function serves as the activation function;
Step 3: for each event in the event set of the text, compute its statistical feature value: Ti = Tsubj + Tverb + Tobj, where Tsubj, Tobj, Tverb denote the statistical feature values of the subject, object, and predicate of the event; if the event has no subject or no object, the corresponding value is 0.
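The nsubj/dobj merging rule above can be sketched over a hypothetical list of (relation, head, dependent) arcs, rather than an actual Stanford parser call; the arc data below is invented for illustration:

```python
def extract_events(arcs):
    """arcs: list of (relation, head_word, dependent_word) dependency arcs.
    An nsubj arc and a dobj arc sharing a predicate merge into <subj, verb, obj>;
    unmatched arcs remain two-element events."""
    subj = {h: d for rel, h, d in arcs if rel == "nsubj"}
    obj = {h: d for rel, h, d in arcs if rel == "dobj"}
    events = []
    for verb in subj.keys() | obj.keys():
        if verb in subj and verb in obj:
            events.append((subj[verb], verb, obj[verb]))  # triple event
        elif verb in subj:
            events.append((subj[verb], verb))             # binary event
        else:
            events.append((verb, obj[verb]))              # binary event
    return events

arcs = [("nsubj", "improves", "attention"), ("dobj", "improves", "accuracy"),
        ("nsubj", "converges", "model")]
print(extract_events(arcs))
```

In a real pipeline the arcs would come from the parser's output for one sentence; the rule itself is unchanged.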
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 3 specifically includes the following steps:
Step 1: fuse the statistical feature and the Attention weight: Aki = Tw*Ti + Aw*aki, where aki is the event's Attention weight, Ti is the event's statistical feature value, and Tw and Aw are the weights assigned to the event's statistical feature and Attention weight respectively, with Tw = 1 and Aw = 2.5;
Step 2: obtain the semantic coding C as the accumulated product of event criticality and hidden-layer states: C = Σi=1..T Aki·hi, where Aki is the event weight obtained in step 1 above, hi is the hidden-layer state value of the bidirectional long short-term memory network, and T is the number of events contained in the document;
Step 3: take the obtained semantic coding C, the hidden-layer state value hk of the bidirectional LSTM, and the averaged text input X` as the input of the bidirectional LSTM module; Hk` = H(C, hk, X`) is the final vector representation of the document.
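A numpy sketch of steps 1–2 above: fusing the statistical feature value with the Attention weight (Tw = 1, Aw = 2.5 from the text) and accumulating the weighted hidden states into the semantic coding C. The hidden states and weights below are random stand-ins for real BiLSTM outputs:

```python
import numpy as np

TW, AW = 1.0, 2.5  # fusion weights Tw, Aw from the text

def semantic_coding(a, t, h):
    """a: (T,) event attention weights; t: (T,) event statistical features;
    h: (T, d) hidden states. Returns C = sum_i (Tw*t_i + Aw*a_i) * h_i."""
    fused = TW * t + AW * a   # A_ki = Tw*T_i + Aw*a_ki
    return fused @ h          # C = sum_i A_ki * h_i

rng = np.random.default_rng(0)
a = np.array([0.2, 0.5, 0.3])          # stand-in attention weights
t = np.array([1.0, 0.4, 0.8])          # stand-in statistical features
h = rng.standard_normal((3, 4))        # stand-in BiLSTM hidden states
C = semantic_coding(a, t, h)
print(C.shape)  # (4,)
```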
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 4 specifically includes the following steps:
Step 1: feed the final text representation vector into a softmax classifier and train the model;
Step 2: test the trained model on test texts to obtain the final classification results.
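The softmax classification of step 1 can be sketched as a linear layer followed by a softmax; the class count, vector dimension, and random weights below are illustrative stand-ins for a trained model:

```python
import numpy as np

def softmax(z):
    z = z - z.max()            # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def classify(text_vec, W, b):
    """Linear layer + softmax over the final text representation vector."""
    return softmax(W @ text_vec + b)

rng = np.random.default_rng(1)
W = rng.standard_normal((5, 8))  # 5 hypothetical classes, 8-dim text vector
b = np.zeros(5)
probs = classify(rng.standard_normal(8), W, b)
print(probs.sum())  # ~1.0 (a probability distribution over classes)
```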
The present invention obtains more semantic features while reducing the influence of useless features on classification results. To evaluate the validity of the algorithm, five groups of comparative experiments were designed and implemented on four data sets, run on a server with 64 GB of memory. Comparing the models' average running time and convergence speed under the same learning rate shows that the present invention effectively reduces training time and greatly accelerates convergence, while classification accuracy also improves correspondingly: synonym replacement raises accuracy by 1.68% on average over no replacement, and fusing statistical features adds an average of 2.22%, showing that statistical features have a measurable influence on classification accuracy; the event-based Attention mechanism adds an average of 3.62%. With the proposed model, accuracy improves by 4.97% on average, and the classification effect is the best.
Description of the drawings
Fig. 1 shows the structure of the Attention network.
Fig. 2 is a comparison diagram of model training convergence.
Specific embodiment
The technical solution of the present invention is further described below with reference to the embodiments and the accompanying drawings.
Embodiment:
For an input passage of text, word segmentation, stop-word removal, and synonym replacement are performed first. Then a word vector is trained for each word with the word2vec tool, the tf-idf value of each word is computed, and weights are assigned according to the word's part of speech and tf-idf value to obtain the word's statistical feature value. The event-based Attention weight is computed, along with the statistical feature weight of each event. The two weights are fused, and the feature vector obtained on this basis contains more semantic information. The specific algorithm logic is as follows:
(1) Segment the document set, perform part-of-speech tagging, and remove stop words.
(2) Record the term-frequency information of each word and replace synonyms in the documents.
(3) Extract the events in each document.
(4) Compute the statistical feature value of each event and the event-based Attention weight.
(5) Fuse the event Attention weights and statistical feature values to obtain the final vector representation.
(6) Train the model and obtain the classification results.
1. Statistical feature values of words.
Synonyms in the text are normalized first. The extended edition of HIT's Tongyici Cilin ("Chinese Thesaurus") is used here as the semantic dictionary. Each word has several codes, and each code consists of a complete five-level code plus one flag bit. A five-level code denotes one atomic cluster, and the flag bit is "=", "#", or "@", where "=" denotes synonymy, "#" denotes similarity (related words), and "@" denotes an independent word. Since only synonyms are to be replaced, only atomic clusters flagged "=" are selected. When preprocessing the text, the first word of each atomic cluster is taken as the representative word of that cluster, and all near-synonyms in the text are replaced with their representative words to obtain the final text input sequence.
The computation of word statistical features relies mainly on statistical theory, using existing data to estimate the influence of each feature on the final classification and thereby screen effective features. Although the introduction of deep learning overcomes the defect of the feature-independence assumption and captures more semantic information, the influence of word statistical features on classification results cannot be ignored. The statistical feature value of a word is computed as follows.
Definition 1. The part-of-speech value posi of word Wi is the importance of the part of speech to which Wi belongs; the value assigned to each part of speech is as follows:
The statistical feature value of the corresponding word is computed as:
Wi = posw*posi + tfidfw*tfidfi
To obtain the statistical feature value of a word, weights are assigned to the part of speech and the tf-idf value respectively, and the weighted feature values are summed to give the total feature value. In the formula, posi denotes the part-of-speech value of word Wi, tfidfi denotes the tf-idf value of word Wi, posw denotes the part-of-speech weight, and tfidfw denotes the tf-idf weight. Experimental tuning gives the weight values posw = 0.5 and tfidfw = 0.8.
2. The event-based Attention mechanism.
For a passage of text, judging its category in units of "events" accords with normal cognitive habits. Given a text, dependency analysis is first performed on each sentence in the document with the Stanford dependency parser to obtain the dependency structure of each sentence. Events are then extracted using the two dependency relations nsubj and dobj: if an nsubj relation and a dobj relation share the same predicate, they are merged into one event, represented by the triple <subj, verb, obj>. Relations in the parse result that cannot be merged are kept as two-element events.
After the candidate events are extracted, each word is replaced by its trained word vector, and the resulting event is represented as a vector with three times the word-vector dimension, computed as: e = [xsubj; xverb; xobj], i.e. the concatenation of the subject, predicate, and object vectors.
The semantic coding of the attention distribution probability is computed as follows:
aki = exp(eki) / Σj=1..T exp(ekj), eki = v^T · tanh(W·hk + U·hi + b)
where aki denotes the attention weight of node i relative to the overall input, T is the number of event elements in the input sequence, and hk is the hidden-layer state corresponding to the overall input X`. hi denotes the hidden-layer state corresponding to the i-th event element of the input sequence; v, W, U are weight matrices and b is a bias parameter. The model structure is shown in Fig. 1.
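A numpy sketch of the additive-attention score in the standard form consistent with the variables described above (v, W, U, b, tanh): e_ki = v^T·tanh(W·hk + U·hi + b), normalized by a softmax into a_ki. All parameters below are random stand-ins for learned weights:

```python
import numpy as np

def attention_weights(hk, H, W, U, v, b):
    """hk: (d,) hidden state of the overall input; H: (T, d) per-event hidden
    states. Returns the normalized attention weights over the T events."""
    scores = np.tanh(H @ U.T + W @ hk + b) @ v  # e_ki = v.T tanh(W hk + U hi + b)
    scores = scores - scores.max()              # stable softmax
    e = np.exp(scores)
    return e / e.sum()                          # a_ki

rng = np.random.default_rng(3)
d, T, m = 4, 3, 5                               # m: attention hidden size (assumed)
W, U = rng.standard_normal((m, d)), rng.standard_normal((m, d))
v, b = rng.standard_normal(m), rng.standard_normal(m)
a = attention_weights(rng.standard_normal(d), rng.standard_normal((T, d)),
                      W, U, v, b)
print(a.sum())  # ~1.0
```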
3. Fusion of feature weights.
In the keyword extraction process, the weight information obtained from traditional statistical features is combined with that obtained from the Attention mechanism, and the resulting semantic coding is used as the input of the BiLSTM. The final text feature vector thus takes the statistical feature values of the text into account while containing more semantic information. The text representation vector obtained by this algorithm better reflects the main information of the text and effectively improves classification accuracy. The main processing logic is shown in Algorithm 1.
The algorithm first computes the sum of the statistical feature values of the words corresponding to each event element, obtaining the event's feature weight; it then computes the event's Attention weight, and the two values are combined with fixed weights to obtain the event's overall weight. The semantic coding C is obtained as the accumulated product of event criticality and the BiLSTM hidden-layer outputs. Finally, the semantic coding C, the overall feature vector of the document, and the document's overall input vector X` are fed into the BiLSTM module; the hidden-layer state value Hk` of the last node is the final feature vector. This feature vector carries the weight information of the historical input nodes and highlights the effect of keywords. A multi-class logistic regression classifier is then built to obtain the classification results.
4. Effect of the invention.
To verify the validity of the model, yelp2013, the Sogou corpus, Amazon Review, and IMDB are chosen as the experimental data sets; for each, 90% is used as the training set and 10% as the test set. The experimental framework is based on the TensorFlow deep learning framework, and five groups of comparative experiments are implemented: BiLSTM_Attention (BA), BiLSTM_Attention with synonym replacement (S_BA), BiLSTM_Attention fused with statistical features (T_BA), event-based BiLSTM_Attention (E_BA), and the model designed by the present invention (Proposed). The optimizer is Adam with a learning rate of 0.01; num_epoch is set to 20, batch_size to 32, and the number of hidden-layer nodes is 256. To speed up training, the model uses a single-layer network, and the classifier is a multi-class logistic regression classifier. The specific experimental procedure is: after preprocessing the texts to be trained, map them to 50-dimensional vectors with the word2vec tool; the feature extraction part is implemented with the corresponding model among the five above, and the input of the classifier is the last hidden-layer state value of the corresponding model.
The experiments are run on a server with 64 GB of memory. Comparing the models' average running time shows that the event-based Attention mechanism effectively reduces training time while greatly accelerating convergence. The training times of three of the models on the four data sets are shown in Table 1. The training results of two of the models on the yelp2013 data set are shown in Fig. 2, from which it can be seen that the event-based Attention mechanism converges faster than BA while achieving higher accuracy.
Table 1. Model training time
For the above five groups of experiments, each group is tuned over repeated training runs and the best result is selected; the resulting statistics are shown in Table 2. Comparing the results of the five groups shows that the designed model effectively improves the accuracy of text classification. Synonym replacement raises accuracy by 1.68% on average over no replacement, and fusing statistical features adds an average of 2.22%, showing that statistical features have a measurable influence on classification accuracy; the event-based Attention mechanism adds an average of 3.62%. With the proposed model, accuracy improves by 4.97% on average, and the classification effect is the best.
Table 2. Accuracy of the five models on the four data sets
The specific embodiments described herein are merely illustrative of the spirit of the present invention. Those skilled in the art to which the present invention belongs can make various modifications, additions, or similar substitutions to the described embodiments without departing from the spirit of the present invention or exceeding the scope defined by the appended claims.

Claims (5)

1. A text classification algorithm fusing statistical features and an Attention mechanism, characterized by comprising:
Step 1: given a document set, first perform word segmentation, part-of-speech tagging, and stop-word removal; record the term-frequency information of each word and replace synonyms in the documents; then train word vectors for each word with the word2vec tool, compute each word's tf-idf value, and assign weights according to the word's part of speech and tf-idf value to obtain the word's statistical feature value;
Step 2: extract the events in each document, and compute the statistical feature value of each event and the event-based Attention weight;
Step 3: fuse the event Attention weights with the event statistical feature values to obtain the final vector representation;
Step 4: train the model, and use the trained model on test texts to obtain classification results.
2. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 1 specifically includes the following steps:
Step 1: apply the Chinese word segmentation tool NLPIR to the document set for word segmentation and part-of-speech tagging, then remove stop words from the documents using a Chinese stop-word list;
Step 2: use the extended edition of HIT's Tongyici Cilin ("Chinese Thesaurus") as the semantic dictionary, replace all near-synonyms in the documents with their representative words, and obtain the final text input sequence;
Step 3: use the word2vec tool to train a word vector for each word in the text input sequence;
Step 4: for each trained word vector, compute its tf-idf value, and compute the word's statistical feature value from the word's part of speech and tf-idf value as: Wi = posw*posi + tfidfw*tfidfi, where posi denotes the part-of-speech value of the word, and the weights take the values posw = 0.5, tfidfw = 0.8.
3. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 2 specifically includes the following steps:
Step 1: for a given document, perform dependency analysis on each sentence using the Stanford dependency parser to obtain the dependency structure of each sentence; then extract events using the two dependency relations nsubj and dobj: if an nsubj relation and a dobj relation share the same predicate, they are merged into one event, represented by the triple <subj, verb, obj>; relations in the parse result that cannot be merged are kept as two-element events;
Step 2: from the extracted events, obtain the vector representation of each event: e = [xsubj; xverb; xobj], where xsubj, xverb, xobj denote the vector representations of the subject, predicate, and object of the event; then compute the influence weights of the events e1, e2, e3, ……, et in the text on the document as a whole, which highlights the effect of critical events and reduces the influence of non-critical events on the overall semantics of the document; the semantic coding of the attention distribution probability is computed as follows:
aki = exp(eki) / Σj=1..T exp(ekj), eki = v^T · tanh(W·hk + U·hi + b),
where aki denotes the attention weight of node i relative to the overall input, eki denotes the score of the i-th event of the input sequence, T is the number of event elements in the input sequence, hk is the hidden-layer state corresponding to the overall input X`; hi denotes the hidden-layer state corresponding to the i-th event element of the input sequence; v, W, U are weight matrices, b is a bias parameter, and the tanh function serves as the activation function;
Step 3: for each event in the event set of the text, compute its statistical feature value: Ti = Tsubj + Tverb + Tobj, where Tsubj, Tobj, Tverb denote the statistical feature values of the subject, object, and predicate of the event; if the event has no subject or no object, the corresponding value is 0.
4. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 3 specifically includes the following steps:
Step 1: fuse the statistical feature and the Attention weight: Aki = Tw*Ti + Aw*aki, where aki is the event's Attention weight, Ti is the event's statistical feature value, and Tw and Aw are the weights assigned to the event's statistical feature and Attention weight respectively, with Tw = 1 and Aw = 2.5;
Step 2: obtain the semantic coding C as the accumulated product of event criticality and hidden-layer states: C = Σi=1..T Aki·hi, where Aki is the event weight obtained in step 1 above, hi is the hidden-layer state value of the bidirectional long short-term memory network, and T is the number of events contained in the document;
Step 3: take the obtained semantic coding C, the hidden-layer state value hk of the bidirectional LSTM, and the averaged text input X` as the input of the bidirectional LSTM module; Hk` = H(C, hk, X`) is the final vector representation of the document.
5. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 4 specifically includes the following steps:
Step 1: feed the final text representation vector into a softmax classifier and train the model;
Step 2: test the trained model on test texts to obtain the final classification results.
CN201810817616.6A 2018-07-24 2018-07-24 A text classification algorithm fusing statistical features and an Attention mechanism Pending CN109325114A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810817616.6A CN109325114A (en) 2018-07-24 2018-07-24 A text classification algorithm fusing statistical features and an Attention mechanism

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201810817616.6A CN109325114A (en) 2018-07-24 2018-07-24 A text classification algorithm fusing statistical features and an Attention mechanism

Publications (1)

Publication Number Publication Date
CN109325114A true CN109325114A (en) 2019-02-12

Family

ID=65263959

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810817616.6A Pending CN109325114A (en) 2018-07-24 2018-07-24 A text classification algorithm fusing statistical features and an Attention mechanism

Country Status (1)

Country Link
CN (1) CN109325114A (en)

Cited By (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992774A (en) * 2019-03-25 2019-07-09 北京理工大学 The key phrase recognition methods of word-based attribute attention mechanism
CN110209824A (en) * 2019-06-13 2019-09-06 中国科学院自动化研究所 Text emotion analysis method based on built-up pattern, system, device
CN110309306A (en) * 2019-06-19 2019-10-08 淮阴工学院 A kind of Document Modeling classification method based on WSD level memory network
CN110309317A (en) * 2019-05-22 2019-10-08 中国传媒大学 Term vector generation method, system, electronic device and the medium of Chinese corpus
CN110781303A (en) * 2019-10-28 2020-02-11 佰聆数据股份有限公司 Short text classification method and system
CN111061881A (en) * 2019-12-27 2020-04-24 浪潮通用软件有限公司 Text classification method, equipment and storage medium
CN111159409A (en) * 2019-12-31 2020-05-15 腾讯科技(深圳)有限公司 Text classification method, device, equipment and medium based on artificial intelligence
CN112612898A (en) * 2021-03-05 2021-04-06 蚂蚁智信(杭州)信息技术有限公司 Text classification method and device
CN113407721A (en) * 2021-06-29 2021-09-17 哈尔滨工业大学(深圳) Method, device and computer storage medium for detecting log sequence abnormity

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013065257A (en) * 2011-09-20 2013-04-11 Fuji Xerox Co Ltd Information processing unit and program
JP2016024545A (en) * 2014-07-17 2016-02-08 株式会社Nttドコモ Information management apparatus, information management system, and information management method
CN107045524A (en) * 2016-12-30 2017-08-15 中央民族大学 A kind of method and system of network text public sentiment classification

Patent Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2013065257A (en) * 2011-09-20 2013-04-11 Fuji Xerox Co Ltd Information processing unit and program
JP2016024545A (en) * 2014-07-17 2016-02-08 株式会社Nttドコモ Information management apparatus, information management system, and information management method
CN107045524A (en) * 2016-12-30 2017-08-15 中央民族大学 A kind of method and system of network text public sentiment classification

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
CHAO LI et al.: "A Novel Document Classification Algorithm Based on Statistical Features and Attention Mechanism", 2018 International Joint Conference on Neural Networks (IJCNN) *

Cited By (12)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109992774A (en) * 2019-03-25 2019-07-09 Beijing Institute of Technology Key phrase recognition method based on a word-attribute attention mechanism
CN110309317A (en) * 2019-05-22 2019-10-08 Communication University of China Word vector generation method, system, electronic device and medium for a Chinese corpus
CN110309317B (en) * 2019-05-22 2021-07-23 Communication University of China Method, system, electronic device and medium for generating word vectors of a Chinese corpus
CN110209824A (en) * 2019-06-13 2019-09-06 Institute of Automation, Chinese Academy of Sciences Text sentiment analysis method, system and device based on a combined model
CN110209824B (en) * 2019-06-13 2021-06-22 Institute of Automation, Chinese Academy of Sciences Text sentiment analysis method, system and device based on a combined model
CN110309306A (en) * 2019-06-19 2019-10-08 Huaiyin Institute of Technology Document modeling and classification method based on a WSD hierarchical memory network
CN110781303A (en) * 2019-10-28 2020-02-11 Bailing Data Co., Ltd. Short text classification method and system
CN111061881A (en) * 2019-12-27 2020-04-24 Inspur General Software Co., Ltd. Text classification method, device and storage medium
CN111159409A (en) * 2019-12-31 2020-05-15 Tencent Technology (Shenzhen) Co., Ltd. Text classification method, apparatus, device and medium based on artificial intelligence
CN111159409B (en) * 2019-12-31 2023-06-02 Tencent Technology (Shenzhen) Co., Ltd. Text classification method, apparatus, device and medium based on artificial intelligence
CN112612898A (en) * 2021-03-05 2021-04-06 Ant Zhixin (Hangzhou) Information Technology Co., Ltd. Text classification method and device
CN113407721A (en) * 2021-06-29 2021-09-17 Harbin Institute of Technology (Shenzhen) Method, device and computer storage medium for detecting log sequence anomalies

Similar Documents

Publication Publication Date Title
CN109325114A (en) Text classification algorithm fusing statistical features and the attention mechanism
Wang et al. A hybrid document feature extraction method using latent Dirichlet allocation and word2vec
Song et al. Research on text classification based on convolutional neural network
Wang et al. Integrating extractive and abstractive models for long text summarization
US11675981B2 (en) Neural network systems and methods for target identification from text
WO2019080863A1 (en) Text sentiment classification method, storage medium and computer
Quan et al. An efficient framework for sentence similarity modeling
CN112001185A (en) Sentiment classification method combining Chinese syntax and a graph convolutional neural network
CN108197111A (en) Automatic text summarization method based on fused semantic clustering
CN108733653A (en) Sentiment analysis method using Skip-gram models fusing part-of-speech and semantic information
CN108763402A (en) Class-center-vector text categorization method based on dependency relations, part of speech and a semantic dictionary
CN111078833B (en) Text classification method based on neural network
CN112001186A (en) Sentiment classification method using a graph convolutional neural network and Chinese syntax
CN104834735A (en) Automatic document summarization method based on word vectors
CN110378409A (en) Chinese-Vietnamese news document summarization method based on an element-association attention mechanism
CN106599032A (en) Text event extraction method combining sparse coding and a structured perceptron
CN109684642A (en) Summary extraction method combining page parsing rules and NLP text vectors
CN110175221A (en) Spam message recognition method using word vectors combined with machine learning
Qiu et al. Advanced sentiment classification of *** microblogs on smart campuses based on multi-feature fusion
CN102779119B (en) Keyword extraction method and device
Errami et al. Sentiment Analysis onMoroccan Dialect based on ML and Social Media Content Detection
Gao et al. Sentiment classification for stock news
Foong et al. Text summarization using latent semantic analysis model in mobile android platform
CN103744838A (en) Chinese emotional summarization system and method for measuring mainstream emotional information
CN114265936A (en) Method for text mining of science and technology projects

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (Application publication date: 20190212)