CN109325114A - Text classification algorithm fusing statistical features and an Attention mechanism - Google Patents
Text classification algorithm fusing statistical features and an Attention mechanism
- Publication number
- CN109325114A (application CN201810817616.6A)
- Authority
- CN
- China
- Prior art keywords
- event
- word
- text
- attention
- value
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/216—Parsing using statistical methods
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/30—Semantic analysis
Abstract
The present invention relates to a text classification algorithm that fuses statistical features with an Attention mechanism. Attention mechanisms are increasingly applied in natural language processing, but existing methods incur a large computational cost when computing Attention weights. The present invention instead computes Attention weights at the level of structured events. On the one hand, an event carries richer semantics than a single word or phrase; on the other hand, event-based Attention reduces computational complexity. In addition, statistical features are incorporated into the Attention weight computation. Compared with existing models, the semantic information contained in the event structure and the corresponding statistical features improve the quality of the text vector representation and yield better classification performance. Evaluated on classification accuracy, the experimental results show that the model achieves better results while reducing training time.
Description
Technical field
The present invention relates to a novel text classification algorithm that, particularly for large-scale text data sets, improves classification accuracy while reducing the time complexity of the computation.
Background technique
With the rapid development of network and information technology, data grows at an exponential rate, and text is the principal form in which information is expressed on the Internet. How to extract key, effective information from large and heterogeneous text data is a current research hotspot in data mining; text classification, as a key technology of the field, provides a first pass of processing and sorting over textual information.
The main tasks of text classification are text representation, feature extraction, classification, and effect evaluation. Before a text can be computed on and processed by a machine, the initially input text must first be represented with a suitable feature extraction algorithm; only then can a classification algorithm be trained on the extracted text features, and the resulting trained model be used to classify texts. Traditional text feature extraction methods are mainly probabilistic models that extract keywords by computing statistical features of the text; they largely ignore the deeper syntactic and semantic information of the text, which necessarily reduces classification accuracy.
Summary of the invention
In view of the above deficiencies, the present invention proposes a text classification algorithm that computes Attention weights at the level of structured events. On the one hand, an event carries richer semantics than a word or phrase; on the other hand, event-based Attention reduces the time complexity of the computation. Furthermore, to address the fact that existing end-to-end deep learning models cannot learn the statistical features of a text, statistical features are added to the Attention weight computation. Compared with existing models, the semantic information contained in the event structure and the corresponding statistical features improve the quality of the text vector representation and achieve better classification performance.
Building on existing text classification models, the present invention proposes an event-based Attention mechanism for text classification. The main differences from existing models are as follows:
(1) Existing Attention mechanisms mostly operate at the word level; the event-based Attention mechanism proposed by the present invention computes weights at the level of event structures.
(2) End-to-end deep learning models cannot learn statistical features, which have a definite influence on the classification result; adding statistical features to the model yields a text representation vector that carries more information.
The present invention adopts the following technical scheme:
A text classification algorithm fusing statistical features and an Attention mechanism, characterized by comprising:
Step 1: given a document set, first segment it into words, perform part-of-speech tagging and stop-word removal, record the term-frequency information of each word, and replace synonyms throughout the documents; then train a word vector for each word with the word2vec tool, compute the tf-idf value of each word, and assign weights to the part of speech and the tf-idf value to obtain the word's statistical characteristic value;
Step 2: extract the events in each document, and compute each event's statistical characteristic value and event-based Attention weight;
Step 3: fuse the event Attention weights with the event statistical characteristic values to obtain the final vector representation;
Step 4: train the model, and test it with test texts on the final training result to obtain the classification result.
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 1 specifically comprises the following sub-steps:
Step 1.1: segment and part-of-speech-tag the document set with the Chinese word segmentation tool NLPIR, then remove the stop words in the documents with a Chinese stop-word list;
Step 1.2: using the extended edition of Harbin Institute of Technology's Tongyici Cilin ("Chinese thesaurus") as the semantic dictionary, replace all near-synonyms in the documents with their representative words to obtain the final text input sequence;
Step 1.3: train a word vector for each word in the text input sequence with the word2vec tool;
Step 1.4: compute the tf-idf value of each word vector produced by training, and compute the word's statistical characteristic value from its part of speech and tf-idf value as W_i = pos_w * pos_i + tfidf_w * tfidf_i, where pos_i is the part-of-speech value of the word and the weights take the values pos_w = 0.5 and tfidf_w = 0.8.
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 2 specifically comprises the following sub-steps:
Step 2.1: given a document, run dependency parsing on each sentence with the Stanford dependency parser to obtain each sentence's dependency structure; then extract events using the two dependency relations nsubj and dobj: if an nsubj relation and a dobj relation share the same predicate, they are merged into one event represented by the triple <subj, verb, obj>; relations in the parse result that are not merged are kept as two-element events;
Step 2.2: from the extracted events, obtain each event's vector representation as the concatenation x_e = [x_subj; x_verb; x_obj], where x_subj, x_verb and x_obj are the vector representations of the event's subject, predicate and object; computing the influence weights of the events e_1, e_2, e_3, ..., e_t on the overall document highlights the effect of critical events and reduces the influence of non-critical events on the overall semantics; the attention distribution probabilities of the semantic coding are computed as:
e_ki = v * tanh(W*h_k + U*h_i + b)
a_ki = exp(e_ki) / Σ_{j=1..T} exp(e_kj)
where a_ki is the attention weight of node i relative to the overall input, e_ki is the alignment score of the i-th event in the input event sequence, T is the number of event elements of the input sequence, h_k is the hidden-layer state corresponding to the overall input X`, h_i is the hidden-layer state value corresponding to the i-th event element of the input sequence, v, W and U are weight matrices, b is a bias parameter, and tanh serves as the activation function;
Step 2.3: for each event in the document's event set, compute its statistical characteristic value T_e = T_subj + T_verb + T_obj, where T_subj, T_verb and T_obj are the statistical characteristic values of the event's subject, predicate and object; if the event lacks a subject or an object, the corresponding value is 0.
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 3 specifically comprises the following sub-steps:
Step 3.1: fuse the statistical features and Attention weights as A_ki = T_w*T_i + A_w*a_ki, where a_ki is the event's Attention weight, T_i is the event's statistical characteristic value, and T_w and A_w are the proportions assigned to the event's statistical feature and Attention weight respectively, with T_w = 1 and A_w = 2.5;
Step 3.2: obtain the semantic code C by accumulating the products of event criticality and hidden-layer state, C = Σ_{i=1..T} A_ki * h_i, where A_ki is the event weight obtained in step 3.1, h_i is the hidden-layer state value of the bidirectional long short-term memory network, and T is the number of events contained in the document;
Step 3.3: feed the obtained semantic code C, the bidirectional long short-term memory network hidden-layer state value h_k, and the average text input X` into the bidirectional long short-term memory network module; H_k` = H(C, h_k, X`) is the final vector representation of the document.
In the above text classification algorithm fusing statistical features and an Attention mechanism, step 4 specifically comprises the following sub-steps:
Step 4.1: feed the final text representation vector into a softmax classifier and train the model;
Step 4.2: test the training result with test texts and obtain the final classification result.
The present invention captures more semantic features while reducing the influence of useless features on the classification result. To evaluate the effectiveness of the algorithm, five groups of comparative experiments were designed and implemented on four data sets, run on a server with 64 GB of memory. Comparing the models' average running time and their convergence speed at the same learning rate shows that the present invention effectively reduces training time and greatly accelerates convergence, while classification accuracy improves correspondingly: synonym replacement raises accuracy by 1.68% on average over no replacement, and fusing statistical features adds an average of 2.22%, showing that statistical features have a definite influence on text classification accuracy; the event-based Attention mechanism adds an average of 3.62%. With the full model proposed here, accuracy improves by 4.97% on average, and the classification effect is the best.
Description of the drawings
Fig. 1 shows the structure of the Attention network.
Fig. 2 compares the convergence of model training.
Specific embodiment
The technical solutions of the present invention are further described below with reference to the embodiment and the accompanying drawings.
Embodiment:
For a piece of input text, first segment it into words, remove stop words, and perform synonym replacement. Then train a word vector for each word with the word2vec tool, compute the tf-idf value of each word, and assign weights to the part of speech and the tf-idf value to obtain the word's statistical characteristic value. Compute the event-based Attention weight and, at the same time, the statistical feature weight of each event. Fusing the two weights yields a feature vector that contains more semantic information. The specific algorithm logic is as follows:
(1) Segment the document set, perform part-of-speech tagging, and remove stop words.
(2) Record the term-frequency information of each word and replace synonyms in the documents.
(3) Extract the events in each document.
(4) Compute each event's statistical characteristic value and event-based Attention weight.
(5) Fuse the event Attention weights and statistical characteristic values to obtain the final vector representation.
(6) Train the model and obtain the classification result.
1. Statistical characteristic value of a word.
First, synonyms in the text are normalized. The extended edition of Harbin Institute of Technology's Tongyici Cilin ("Chinese thesaurus") is used as the semantic dictionary. Each word has several codes, and each code consists of a full five-level identifier plus one flag bit: the five-level code denotes one atom cluster, and the flag bit is "=", "#" or "@", where "=" means synonymous, "#" means similar (related words), and "@" denotes an independent word. Since the goal is synonym replacement, only atom clusters flagged "=" are used: when preprocessing the text, the first word of each such cluster serves as the cluster's representative word, and every near-synonym in the text is replaced with its representative word, giving the final text input sequence.
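The replacement step above can be sketched as follows. The dictionary fragment is a toy stand-in for the Cilin format (English placeholder words, invented codes), not real dictionary content; only the "=" flag handling mirrors the text.

```python
# Toy fragment of a Cilin-style dictionary: (code, words-in-atom-cluster).
# A code ending in '=' marks a synonymous cluster; '#' marks a merely
# similar cluster, which is left untouched.
TOY_CILIN = [
    ("Aa01A01=", ["person", "human", "individual"]),
    ("Aa01B02#", ["crowd", "throng"]),   # '#' cluster: not replaced
]

def build_replacement_map(entries):
    """Map every word in an '='-flagged cluster to the cluster's first word."""
    mapping = {}
    for code, words in entries:
        if code.endswith("="):
            rep = words[0]               # first word represents the cluster
            for w in words:
                mapping[w] = rep
    return mapping

def replace_synonyms(tokens, mapping):
    """Replace each token by its representative word, if it has one."""
    return [mapping.get(t, t) for t in tokens]

rep_map = build_replacement_map(TOY_CILIN)
print(replace_synonyms(["a", "human", "crowd"], rep_map))
```

Applied to a real corpus, the same two functions would consume the full dictionary and the segmented token stream from step 1.1.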
The computation of word statistical features relies mainly on statistical theory: existing data are used to estimate each feature's influence on the final classification, so that effective features can be selected. Although the introduction of deep learning overcomes the defect of the feature-word independence assumption and captures more semantic information, the influence of a word's statistical features on the classification result cannot be ignored. The statistical characteristic value of a word is computed as follows.
Definition 1. The part-of-speech value pos_i of word W_i is the importance of the part of speech to which W_i belongs; each part of speech is assigned a fixed value.
The statistical characteristic value of the corresponding word is computed as:
W_i = pos_w * pos_i + tfidf_w * tfidf_i
To obtain the word's statistical characteristic value, the part of speech and the tf-idf value are each assigned a weight, and the weighted feature values are summed into a total characteristic value. Here pos_i is the part-of-speech value of word W_i, tfidf_i is its tf-idf value, pos_w is the part-of-speech weight, and tfidf_w is the tf-idf weight. Experimental tuning gives pos_w = 0.5 and tfidf_w = 0.8.
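A minimal sketch of this computation, using the weights given in the text (pos_w = 0.5, tfidf_w = 0.8). The per-part-of-speech values in `POS_VALUE` are illustrative assumptions; the patent assigns fixed values but does not reproduce them here.

```python
import math

POS_W, TFIDF_W = 0.5, 0.8                     # weights from the text
POS_VALUE = {"n": 1.0, "v": 0.8, "a": 0.6}    # assumed per-POS values

def tf_idf(term, doc, docs):
    """Plain tf-idf: term frequency in doc times log inverse document frequency."""
    tf = doc.count(term) / len(doc)
    df = sum(1 for d in docs if term in d)
    return tf * math.log(len(docs) / df) if df else 0.0

def word_stat_value(pos_tag, tfidf_value):
    """W_i = pos_w * pos_i + tfidf_w * tfidf_i."""
    pos_i = POS_VALUE.get(pos_tag, 0.2)        # assumed default for other POS
    return POS_W * pos_i + TFIDF_W * tfidf_value

print(word_stat_value("n", 0.5))   # 0.5*1.0 + 0.8*0.5 = 0.9
```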
2. Event-based Attention mechanism.
Judging the category of a piece of text in units of "events" matches normal cognitive practice. Given a text, each sentence in the document is first dependency-parsed with the Stanford dependency parser, yielding each sentence's dependency structure. Events are then extracted from the two dependency relations nsubj and dobj: if an nsubj relation and a dobj relation share the same predicate, they are merged into one event represented by the triple <subj, verb, obj>; relations in the parse result that are not merged remain two-element events.
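The extraction rule can be sketched as below. Dependencies are given as pre-parsed (relation, head, dependent) tuples rather than produced by the Stanford parser, and the sentence is an invented example; only the nsubj/dobj merging logic mirrors the text.

```python
def extract_events(deps):
    """Merge nsubj and dobj relations that share a predicate into a
    <subj, verb, obj> triple; keep unmerged relations as pairs."""
    subj, obj = {}, {}                 # predicate -> subject / object
    for rel, head, dep in deps:
        if rel == "nsubj":
            subj[head] = dep
        elif rel == "dobj":
            obj[head] = dep
    events = []
    for verb in subj.keys() | obj.keys():
        if verb in subj and verb in obj:
            events.append((subj[verb], verb, obj[verb]))   # full triple
        elif verb in subj:
            events.append((subj[verb], verb))              # subject-only pair
        else:
            events.append((verb, obj[verb]))               # object-only pair
    return events

# "Alice bought a book. Bob left." as (relation, head, dependent) tuples.
deps = [("nsubj", "bought", "Alice"), ("dobj", "bought", "book"),
        ("nsubj", "left", "Bob")]
print(sorted(extract_events(deps)))
```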
After the candidate events are extracted, each word is replaced by its trained word vector, so that an event is represented as a vector with three times the word-vector dimension, obtained by concatenation:
x_e = [x_subj; x_verb; x_obj]
The attention distribution probabilities of the semantic coding are computed as:
e_ki = v * tanh(W*h_k + U*h_i + b)
a_ki = exp(e_ki) / Σ_{j=1..T} exp(e_kj)
where a_ki is the attention weight of node i relative to the overall input, T is the number of event elements of the input sequence, h_k is the hidden-layer state corresponding to the overall input X`, h_i is the hidden-layer state corresponding to the i-th event element of the input sequence, v, W and U are weight matrices, and b is a bias parameter. The model structure is shown in Fig. 1.
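A toy sketch of the event vector and the attention weights, assuming the standard additive (Bahdanau-style) scoring that is consistent with the symbols defined above. All matrices here are random placeholders, not trained parameters.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 4                                  # toy hidden size
v = rng.normal(size=d)                 # scoring vector v
W = rng.normal(size=(d, d))            # weight matrix W (applied to h_k)
U = rng.normal(size=(d, d))            # weight matrix U (applied to h_i)
b = rng.normal(size=d)                 # bias parameter b

def event_vector(x_subj, x_verb, x_obj):
    """Event representation: concatenation of the three word vectors."""
    return np.concatenate([x_subj, x_verb, x_obj])

def attention_weights(h_k, h_events):
    """a_ki over T events: softmax of e_ki = v . tanh(W h_k + U h_i + b)."""
    scores = np.array([v @ np.tanh(W @ h_k + U @ h_i + b) for h_i in h_events])
    e = np.exp(scores - scores.max())  # numerically stable softmax
    return e / e.sum()

a = attention_weights(rng.normal(size=d), rng.normal(size=(3, d)))
print(a)                               # three weights that sum to 1
```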
3. Fusion of feature weights.
During keyword extraction, the weight information obtained from traditional statistical features is combined with that obtained from the Attention mechanism, and the resulting semantic code serves as input to the BiLSTM. The final text feature vector thus takes the statistical characteristic values of the text into account while containing more semantic information; the text representation vector obtained by the algorithm better reflects the main information of the text and effectively improves classification accuracy. The main processing logic is shown in Algorithm 1.
The algorithm first computes, for each event, the sum of the statistical characteristic values of the words corresponding to its elements, giving the event's feature weight; it then computes the event's Attention weight, and combines the two values with fixed proportions to obtain the event's overall weight. The semantic code C is obtained by accumulating the products of event criticality and the BiLSTM hidden-layer outputs. Finally, the semantic code C, the overall feature vector of the article, and the article's overall input vector X` serve as inputs to the BiLSTM module; the hidden-layer state value H_k` of the last node is the final feature vector. This feature vector carries the weight information of the historical input nodes and highlights the effect of keywords. A multi-class logistic regression classifier is then built to obtain the classification result.
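The fusion and accumulation steps can be sketched as follows, using the proportions given in the text (T_w = 1, A_w = 2.5). The hidden states here are toy unit-basis vectors, not real BiLSTM outputs.

```python
import numpy as np

T_W, A_W = 1.0, 2.5                    # proportions from the text

def fuse_and_encode(stat_vals, attn_vals, hidden):
    """Fused event weights A_ki = T_w*T_i + A_w*a_ki and semantic code
    C = sum_i A_ki * h_i; `hidden` holds the T hidden states row-wise."""
    A = T_W * np.asarray(stat_vals) + A_W * np.asarray(attn_vals)
    C = (A[:, None] * np.asarray(hidden)).sum(axis=0)
    return A, C

# Two toy events with unit-basis hidden states, so C simply echoes A:
# A = 1*[0.4, 0.9] + 2.5*[0.7, 0.3] = [2.15, 1.65].
A, C = fuse_and_encode([0.4, 0.9], [0.7, 0.3], np.eye(2))
print(A, C)
```

In the full model, `C` would then be fed back into the BiLSTM together with h_k and X` to produce the final representation H_k`.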
4. Effect of the invention.
To verify the effectiveness of the model, yelp2013, the Sogou corpus, Amazon Review, and IMDB were chosen as experimental data sets, with 90% of each used for training and 10% for testing. The experimental framework is based on the TensorFlow deep learning framework, and five groups of comparative experiments were designed and implemented: BiLSTM_Attention (BA), BiLSTM_Attention with synonym replacement (S_BA), BiLSTM_Attention fusing statistical features (T_BA), event-based BiLSTM_Attention (E_BA), and the model designed by the present invention (Proposed). The experiments use the Adam optimizer with the learning rate set to 0.01, num_epoch set to 20, batch_size set to 32, and 256 hidden-layer nodes. To speed up training, the model uses a single-layer network, and the classifier is a multi-class logistic regression classifier. The concrete procedure is: after preprocessing, the text to be trained is mapped to 50-dimensional vectors with the word2vec tool; the feature extraction part is implemented by the corresponding model among the five above, and the input of the classifier is the final hidden-layer state value of the corresponding model.
The experiments were run on a server with 64 GB of memory. Comparing the models' average running time shows that the event-based Attention mechanism effectively reduces training time while greatly accelerating convergence. The training times of three models on the four data sets are shown in Table 1, and the training results of two of the models on the yelp2013 data set are shown in Fig. 2. The figure shows that the event-based Attention mechanism converges faster than BA while reaching higher accuracy.
Table 1. Model training time.
For the five groups of experiments above, each group was repeatedly trained and tuned and the best result selected; the resulting statistics are shown in Table 2. Comparing the results of the five groups shows that the model designed and implemented here effectively improves the accuracy of text classification: synonym replacement raises accuracy by 1.68% on average over no replacement, and fusing statistical features adds an average of 2.22%, showing that statistical features have a definite influence on text classification accuracy; the event-based Attention mechanism adds an average of 3.62%. With the model proposed here, accuracy improves by 4.97% on average, and the classification effect is the best.
Table 2. Accuracy of the five models on the four data sets.
The specific embodiment described herein is merely an example of the spirit of the present invention. Those skilled in the art to which the present invention belongs can make various modifications or additions to the described embodiment, or substitute it in similar ways, without departing from the spirit of the invention or exceeding the scope defined by the appended claims.
Claims (5)
1. A text classification algorithm fusing statistical features and an Attention mechanism, characterized by comprising:
Step 1: given a document set, first segment it into words, perform part-of-speech tagging and stop-word removal, record the term-frequency information of each word, and replace synonyms in the documents; then train a word vector for each word with the word2vec tool, compute the tf-idf value of each word, and assign weights to the part of speech and the tf-idf value to obtain the word's statistical characteristic value;
Step 2: extract the events in each document, and compute each event's statistical characteristic value and event-based Attention weight;
Step 3: fuse the event Attention weights with the event statistical characteristic values to obtain the final vector representation;
Step 4: train the model, and test it with test texts on the final training result to obtain the classification result.
2. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 1 specifically comprises the following sub-steps:
Step 1.1: segment and part-of-speech-tag the document set with the Chinese word segmentation tool NLPIR, then remove the stop words in the documents with a Chinese stop-word list;
Step 1.2: using the extended edition of Harbin Institute of Technology's Tongyici Cilin ("Chinese thesaurus") as the semantic dictionary, replace all near-synonyms in the documents with their representative words to obtain the final text input sequence;
Step 1.3: train a word vector for each word in the text input sequence with the word2vec tool;
Step 1.4: compute the tf-idf value of each word vector produced by training, and compute the word's statistical characteristic value from its part of speech and tf-idf value as W_i = pos_w * pos_i + tfidf_w * tfidf_i, where pos_i is the part-of-speech value of the word and the weights take the values pos_w = 0.5 and tfidf_w = 0.8.
3. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 2 specifically comprises the following sub-steps:
Step 2.1: given a document, run dependency parsing on each sentence with the Stanford dependency parser to obtain each sentence's dependency structure; then extract events using the two dependency relations nsubj and dobj: if an nsubj relation and a dobj relation share the same predicate, they are merged into one event represented by the triple <subj, verb, obj>; relations in the parse result that are not merged are kept as two-element events;
Step 2.2: from the extracted events, obtain each event's vector representation as the concatenation x_e = [x_subj; x_verb; x_obj], where x_subj, x_verb and x_obj are the vector representations of the event's subject, predicate and object; computing the influence weights of the events e_1, e_2, e_3, ..., e_t on the overall document highlights the effect of critical events and reduces the influence of non-critical events on the overall semantics; the attention distribution probabilities of the semantic coding are computed as:
e_ki = v * tanh(W*h_k + U*h_i + b)
a_ki = exp(e_ki) / Σ_{j=1..T} exp(e_kj)
where a_ki is the attention weight of node i relative to the overall input, e_ki is the alignment score of the i-th event in the input event sequence, T is the number of event elements of the input sequence, h_k is the hidden-layer state corresponding to the overall input X`, h_i is the hidden-layer state value corresponding to the i-th event element of the input sequence, v, W and U are weight matrices, b is a bias parameter, and tanh serves as the activation function;
Step 2.3: for each event in the document's event set, compute its statistical characteristic value T_e = T_subj + T_verb + T_obj, where T_subj, T_verb and T_obj are the statistical characteristic values of the event's subject, predicate and object; if the event lacks a subject or an object, the corresponding value is 0.
4. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 3 specifically comprises the following sub-steps:
Step 3.1: fuse the statistical features and Attention weights as A_ki = T_w*T_i + A_w*a_ki, where a_ki is the event's Attention weight, T_i is the event's statistical characteristic value, and T_w and A_w are the proportions assigned to the event's statistical feature and Attention weight respectively, with T_w = 1 and A_w = 2.5;
Step 3.2: obtain the semantic code C by accumulating the products of event criticality and hidden-layer state, C = Σ_{i=1..T} A_ki * h_i, where A_ki is the event weight obtained in step 3.1, h_i is the hidden-layer state value of the bidirectional long short-term memory network, and T is the number of events contained in the document;
Step 3.3: feed the obtained semantic code C, the bidirectional long short-term memory network hidden-layer state value h_k, and the average text input X` into the bidirectional long short-term memory network module; H_k` = H(C, h_k, X`) is the final vector representation of the document.
5. The text classification algorithm fusing statistical features and an Attention mechanism according to claim 1, characterized in that step 4 specifically comprises the following sub-steps:
Step 4.1: feed the final text representation vector into a softmax classifier and train the model;
Step 4.2: test the training result with test texts and obtain the final classification result.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810817616.6A CN109325114A (en) | 2018-07-24 | 2018-07-24 | Text classification algorithm fusing statistical features and an Attention mechanism
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810817616.6A CN109325114A (en) | 2018-07-24 | 2018-07-24 | Text classification algorithm fusing statistical features and an Attention mechanism
Publications (1)
Publication Number | Publication Date |
---|---|
CN109325114A true CN109325114A (en) | 2019-02-12 |
Family
ID=65263959
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201810817616.6A Pending CN109325114A (en) | 2018-07-24 | 2018-07-24 | A kind of text classification algorithm merging statistical nature and Attention mechanism |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN109325114A (en) |
Cited By (9)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992774A (en) * | 2019-03-25 | 2019-07-09 | 北京理工大学 | The key phrase recognition methods of word-based attribute attention mechanism |
CN110209824A (en) * | 2019-06-13 | 2019-09-06 | 中国科学院自动化研究所 | Text emotion analysis method based on built-up pattern, system, device |
CN110309306A (en) * | 2019-06-19 | 2019-10-08 | 淮阴工学院 | A kind of Document Modeling classification method based on WSD level memory network |
CN110309317A (en) * | 2019-05-22 | 2019-10-08 | 中国传媒大学 | Term vector generation method, system, electronic device and the medium of Chinese corpus |
CN110781303A (en) * | 2019-10-28 | 2020-02-11 | 佰聆数据股份有限公司 | Short text classification method and system |
CN111061881A (en) * | 2019-12-27 | 2020-04-24 | 浪潮通用软件有限公司 | Text classification method, equipment and storage medium |
CN111159409A (en) * | 2019-12-31 | 2020-05-15 | 腾讯科技(深圳)有限公司 | Text classification method, device, equipment and medium based on artificial intelligence |
CN112612898A (en) * | 2021-03-05 | 2021-04-06 | 蚂蚁智信(杭州)信息技术有限公司 | Text classification method and device |
CN113407721A (en) * | 2021-06-29 | 2021-09-17 | 哈尔滨工业大学(深圳) | Method, device and computer storage medium for detecting log sequence abnormity |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013065257A (en) * | 2011-09-20 | 2013-04-11 | Fuji Xerox Co Ltd | Information processing unit and program |
JP2016024545A (en) * | 2014-07-17 | 2016-02-08 | 株式会社Nttドコモ | Information management apparatus, information management system, and information management method |
CN107045524A (en) * | 2016-12-30 | 2017-08-15 | 中央民族大学 | A kind of method and system of network text public sentiment classification |
- 2018-07-24: CN CN201810817616.6A patent/CN109325114A/en active Pending
Patent Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2013065257A (en) * | 2011-09-20 | 2013-04-11 | Fuji Xerox Co Ltd | Information processing unit and program |
JP2016024545A (en) * | 2014-07-17 | 2016-02-08 | 株式会社Nttドコモ | Information management apparatus, information management system, and information management method |
CN107045524A (en) * | 2016-12-30 | 2017-08-15 | 中央民族大学 | A kind of method and system of network text public sentiment classification |
Non-Patent Citations (1)
Title |
---|
CHAO LI et al.: "A Novel Document Classification Algorithm Based on Statistical Features and Attention Mechanism", 2018 International Joint Conference on Neural Networks *
Cited By (12)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109992774A (en) * | 2019-03-25 | 2019-07-09 | Beijing Institute of Technology | Key phrase recognition method based on a word-attribute attention mechanism |
CN110309317A (en) * | 2019-05-22 | 2019-10-08 | Communication University of China | Word vector generation method, system, electronic device and medium for Chinese corpus |
CN110309317B (en) * | 2019-05-22 | 2021-07-23 | Communication University of China | Method, system, electronic device and medium for generating word vectors of Chinese corpus |
CN110209824A (en) * | 2019-06-13 | 2019-09-06 | Institute of Automation, Chinese Academy of Sciences | Text sentiment analysis method, system and device based on a combined model |
CN110209824B (en) * | 2019-06-13 | 2021-06-22 | Institute of Automation, Chinese Academy of Sciences | Text sentiment analysis method, system and device based on a combined model |
CN110309306A (en) * | 2019-06-19 | 2019-10-08 | Huaiyin Institute of Technology | Document modeling and classification method based on a WSD hierarchical memory network |
CN110781303A (en) * | 2019-10-28 | 2020-02-11 | 佰聆数据股份有限公司 | Short text classification method and system |
CN111061881A (en) * | 2019-12-27 | 2020-04-24 | Inspur General Software Co., Ltd. | Text classification method, device and storage medium |
CN111159409A (en) * | 2019-12-31 | 2020-05-15 | Tencent Technology (Shenzhen) Co., Ltd. | Text classification method, device, equipment and medium based on artificial intelligence |
CN111159409B (en) * | 2019-12-31 | 2023-06-02 | Tencent Technology (Shenzhen) Co., Ltd. | Text classification method, device, equipment and medium based on artificial intelligence |
CN112612898A (en) * | 2021-03-05 | 2021-04-06 | Ant Zhixin (Hangzhou) Information Technology Co., Ltd. | Text classification method and device |
CN113407721A (en) * | 2021-06-29 | 2021-09-17 | Harbin Institute of Technology (Shenzhen) | Method, device and computer storage medium for detecting anomalies in log sequences |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN109325114A (en) | Text classification algorithm fusing statistical features and the Attention mechanism | |
Wang et al. | A hybrid document feature extraction method using latent Dirichlet allocation and word2vec | |
Song et al. | Research on text classification based on convolutional neural network | |
Wang et al. | Integrating extractive and abstractive models for long text summarization | |
US11675981B2 (en) | Neural network systems and methods for target identification from text | |
WO2019080863A1 (en) | Text sentiment classification method, storage medium and computer | |
Quan et al. | An efficient framework for sentence similarity modeling | |
CN112001185A (en) | Sentiment classification method combining Chinese syntax and a graph convolutional neural network | |
CN108197111A (en) | Automatic text summarization method based on fused semantic clustering | |
CN108733653A (en) | Sentiment analysis method using a Skip-gram model fusing part-of-speech and semantic information | |
CN108763402A (en) | Class-center-vector text categorization method based on dependency relations, part of speech and a semantic dictionary | |
CN111078833B (en) | Text classification method based on neural network | |
CN112001186A (en) | Sentiment classification method using a graph convolutional neural network and Chinese syntax | |
CN104834735A (en) | Automatic document summarization method based on word vectors | |
CN110378409A (en) | Chinese-Vietnamese news document abstract generation method based on an element-association attention mechanism | |
CN106599032A (en) | Text event extraction method combining sparse coding and a structured perceptron | |
CN109684642A (en) | Abstract extraction method combining page parsing rules and NLP text vectors | |
CN110175221A (en) | Spam message recognition method using word vectors combined with machine learning | |
Qiu et al. | Advanced sentiment classification of *** microblogs on smart campuses based on multi-feature fusion | |
CN102779119B (en) | Keyword extraction method and device | |
Errami et al. | Sentiment Analysis on Moroccan Dialect based on ML and Social Media Content Detection | |
Gao et al. | Sentiment classification for stock news | |
Foong et al. | Text summarization using latent semantic analysis model in mobile android platform | |
CN103744838A (en) | Chinese sentiment summarization system and method for measuring mainstream sentiment information | |
CN114265936A (en) | Method for text mining of science and technology projects |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
RJ01 | Rejection of invention patent application after publication | | Application publication date: 20190212 |