CN110069780A - Emotion word recognition method and system based on domain-specific text - Google Patents
Emotion word recognition method and system based on domain-specific text
- Publication number
- CN110069780A (application number CN201910316622.8A)
- Authority
- CN
- China
- Prior art keywords
- word
- emotion word
- emotion
- corpus
- template
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/33—Querying
- G06F16/3331—Query processing
- G06F16/3332—Query translation
- G06F16/3335—Syntactic pre-processing, e.g. stopword elimination, stemming
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/30—Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
- G06F16/36—Creation of semantic tools, e.g. ontology or thesauri
- G06F16/374—Thesaurus
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/237—Lexical tools
- G06F40/247—Thesauruses; Synonyms
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- General Physics & Mathematics (AREA)
- Computational Linguistics (AREA)
- Physics & Mathematics (AREA)
- General Engineering & Computer Science (AREA)
- Databases & Information Systems (AREA)
- Data Mining & Analysis (AREA)
- Health & Medical Sciences (AREA)
- Artificial Intelligence (AREA)
- Audiology, Speech & Language Pathology (AREA)
- General Health & Medical Sciences (AREA)
- Machine Translation (AREA)
Abstract
The present invention provides an emotion word recognition method and system based on domain-specific text. The method comprises the following steps: preprocessing, in which the corpus data are preprocessed; emotion word recognition, in which emotional expression words are obtained by template-based emotion word discovery and naive Bayes-based emotion word discovery, and the polarity of the obtained emotional expression words is determined by the naive Bayes-based emotion word discovery; and post-processing, in which the emotional expression words obtained by the emotion word recognition are ranked and the highest-scoring candidate emotion words are selected as final emotion words for expanding the emotion word dictionary. The invention can identify and extract emotion words and output their positive or negative polarity. Because the method requires no manually annotated corpus, emotion word recognition can be fully automated.
Description
Technical field
The present invention relates to the field of emotion word recognition, and in particular to a method and system for recognizing emotion words and their polarity based on domain-specific text.
Background art
In the era of big data and artificial intelligence, artificial intelligence systems are expected not only to think and reason like humans but also to perceive and express emotion. Sentiment analysis, the process of analyzing, processing, summarizing, and reasoning over subjective text that carries emotional coloring, is therefore a current research hotspot and a difficult problem. Like other natural language processing tasks, it first requires the support of resources, on the basis of which text sentiment classification is carried out. The construction of resources is thus the foundation of all such tasks. Emotion resources generally comprise sentiment dictionaries and emotion corpora. Sentiment dictionaries are currently built mainly by manual screening, which is costly and yields dictionaries of small scale. Compared with English, the construction of and research on Chinese sentiment dictionaries are still immature.
The present invention provides a method for expanding a sentiment dictionary based on a specific domain, which supplies corpus support for sentiment analysis research in different domains and on different tasks.
In this field, prior art CN106776566A discloses a method and device for recognizing emotion vocabulary. That method first requires the emotion words in the text to be labeled as positive or negative, and then determines candidate emotion words from the labeling results using a chi-square statistical feature selection algorithm. Prior art CN107729374A discloses a method for expanding an emotion word dictionary. That method selects a corpus according to the direction of expansion, computes the word-vector matching degree between the corpus and the sentiment dictionary, and, according to the result, selects corpus data from the corpus to expand the sentiment dictionary. The emotion word recognition methods used in the above prior art differ substantially from the present scheme. Moreover, because the present scheme requires no manual annotation, it achieves automatic recognition in practice.
Summary of the invention
The purpose of the present invention is to provide an emotion word recognition system and method based on domain-specific text, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
Other features and advantages of the invention will become apparent from the following detailed description, or may be learned in part through practice of the invention.
The present invention addresses the text sentiment analysis task. For heuristic-based sentiment analysis, the accuracy and coverage of the dictionary determine the precision and recall of the task; in addition, as an external resource, a sentiment dictionary can effectively improve the accuracy of machine learning models. In the big data era, massive corpora are easy to obtain, but because of their sheer scale a sentiment dictionary cannot be built by manual screening. Moreover, with the development and spread of Internet applications, Internet slang and new words emerge constantly; building the dictionary manually would not only consume enormous labor but also could not guarantee its scale. In view of these deficiencies, the present invention automatically recognizes the emotion words in a corpus, computes their emotion weights, and expands the sentiment dictionary according to their sentiment orientation.
The present invention first provides an emotion word recognition method based on domain-specific text, comprising the following steps:
Preprocessing: the corpus data are preprocessed, the preprocessing consisting of cleaning and filtering, sentence splitting, and word segmentation of the corpus;
Emotion word recognition: emotional expression words are obtained by template-based emotion word discovery and naive Bayes-based emotion word discovery, and the polarity of the obtained emotional expression words is determined by the naive Bayes-based emotion word discovery;
Post-processing: the emotional expression words obtained by the emotion word recognition are ranked, and the highest-scoring candidate emotion words are selected as final emotion words for expanding the emotion word dictionary.
Preferably, the template-based emotion word discovery algorithm targets newly emerging words and finds currently popular emotional expression words; the naive Bayes-based emotion word discovery algorithm targets formal written words and mines conventional emotional expression words.
Preferably, the template-based emotion word discovery is implemented as follows:
Input: a seed word set seed = {word_1, word_2, ..., word_n} and the preprocessed corpus;
Step 1: extract all templates for the current seed set, a template being the word or punctuation mark immediately before a seed word together with the word or punctuation mark immediately after it;
Step 2: score all templates and select the 5 highest-scoring templates as the extraction templates. The template scoring formula is:
where T denotes a template, score(T) denotes the final score of a template, s denotes an emotion word in the seed word set, and Freq(T, s) denotes the number of times the template T containing the seed emotion word s occurs in the corpus; σ = 2 if T contains a degree adverb, and σ = 1 otherwise;
Step 3: extract instances or words using the extraction templates;
Step 4: score all extracted words and select the highest-scoring ones as candidate emotion words. The scoring formula for the extracted words is:
where word is an extracted emotion word; score(word) is the final score of that emotion word; the frequency term denotes, over the template set T, how often a template t containing the word word occurs in the corpus; and |T| denotes the number of templates in the set T;
Step 5: select the remaining seed words and return to Step 1 to continue extracting emotion words, until the seed set is empty;
Output: the candidate emotion words with their scores.
Preferably, the naive Bayes-based emotion word discovery is implemented as follows:
All texts in the corpus are organized into a single file, and the text in that file is segmented into words, split into sentences, and filtered for stop words;
The tf-idf values of the preprocessed corpus are computed;
The words with the largest tf-idf values are selected as keywords, that is, as candidate emotion words;
The seed emotion word dictionary is loaded, and the naive Bayes-based emotion word recognition algorithm computes the sentiment orientation of each candidate emotion word to obtain the emotion words.
The weight of a candidate emotion word is computed with the following formula:
where S# denotes the weight of the candidate emotion word; w_i denotes a character, i = 1...n, so that w_1...w_i...w_n denotes a candidate emotion word; and S* denotes the seed emotion words;
where n(w_i, S*) denotes the number of times w_i and S* occur together in the corpus, n(w_i) denotes the number of times w_i occurs in the corpus, and δ is a constant introduced for data smoothing so that a character that does not occur in the corpus does not produce a frequency of 0;
where Freq(S*) is the number of times the seed emotion words occur in the corpus, the summation term denotes the total number of occurrences of all words, and word denotes an extracted emotion word, i.e. the aforementioned w_1...w_i...w_n;
Since each candidate emotion word has a positive emotion weight and a negative emotion weight, the final emotion value is computed from these two weights.
The present invention also provides an emotion word recognition system based on domain-specific text, characterized in that the system comprises:
A preprocessing module, which performs data cleaning, word segmentation, and sentence splitting on the corpus data; when the corpus data are web page data, the preprocessing module additionally removes noise data and extracts the relevant text content;
An emotion word recognition module, which finds candidate emotion words using the template-based emotion word discovery algorithm and the naive Bayes-based emotion word discovery algorithm; the two algorithms are executed simultaneously, and the polarity of the candidate emotion words obtained by the template-based discovery is determined by the naive Bayes-based discovery;
A post-processing module, which receives the candidate emotion words with polarity obtained by the emotion word recognition module, ranks them, and selects the highest-scoring portion of the candidate words as final emotion words to expand the sentiment dictionary.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention.
Brief description of the drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and, together with the specification, serve to explain its principles. The drawings described below are evidently only some embodiments of the invention; those of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 shows the emotion word recognition workflow of the invention;
Fig. 2 shows the flowchart of naive Bayes-based emotion word discovery in an embodiment of the invention;
Fig. 3 shows the post-processing workflow of the invention.
Detailed description of the embodiments
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that the invention will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, many specific details are provided to give a full understanding of the embodiments of the invention. Those skilled in the art will appreciate, however, that the technical solution of the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the invention.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flowcharts shown in the drawings are merely illustrative; they need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps may be decomposed and others may be merged or partially merged, so the actual order of execution may change according to the actual situation.
In view of the problems in the prior art, the present invention provides an emotion word recognition method and system based on domain-specific text.
The invention is divided into three parts: a preprocessing module, an emotion word recognition module, and a post-processing module, as shown in Fig. 1. Data first enter the preprocessing module, which applies the necessary processing to the corpus, such as cleaning and filtering, sentence splitting, and word segmentation. The processed data then enter the emotion word recognition module, which consists of two algorithms: template-based emotion word discovery and naive Bayes-based emotion word discovery. Both algorithms are applied to the data to find candidate emotion words. The former is designed for newly emerging words, and its main role is to find recently popular emotional expression words; the latter mainly targets more formal written words, and its main role is to mine conventional emotional expression words. The two algorithms are executed simultaneously, but the template-based algorithm cannot determine the polarity of the words it finds (that is, whether a word is positive or negative), so the polarity of those words must be determined by the naive Bayes-based emotion word discovery. All candidate emotion words are then sent to the post-processing module, which ranks the emotion words selected by the algorithms and chooses the highest-scoring candidates as final emotion words for expanding the sentiment dictionary.
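The data flow between the three modules can be sketched as a small piece of glue code. This is only an illustration under stated assumptions: the function name `run_pipeline` and all of its callable parameters are hypothetical and do not come from the patent, and the two discovery algorithms are simply called one after the other rather than concurrently.

```python
from typing import Callable, Dict, Iterable, List

Sentence = List[str]  # one sentence as a list of tokens


def run_pipeline(
    raw_docs: Iterable[str],
    preprocess: Callable[[Iterable[str]], List[Sentence]],
    template_discovery: Callable[[List[Sentence]], Iterable[str]],
    bayes_discovery: Callable[[List[Sentence]], Iterable[str]],
    polarity: Callable[[str, List[Sentence]], float],
    postprocess: Callable[[Dict[str, float]], Dict[str, float]],
) -> Dict[str, float]:
    """Hypothetical glue code for the three modules shown in Fig. 1."""
    # Preprocessing module: cleaning, sentence splitting, word segmentation.
    sentences = preprocess(raw_docs)
    # Recognition module: both algorithms read the same preprocessed corpus
    # and their candidate words are pooled.
    candidates = set(template_discovery(sentences)) | set(bayes_discovery(sentences))
    # Template-found words carry no polarity, so every candidate is scored
    # by the naive Bayes polarity function.
    scored = {word: polarity(word, sentences) for word in candidates}
    # Post-processing module: ranking and selection.
    return postprocess(scored)
```

Passing the stages in as callables keeps the sketch independent of any particular segmenter or scoring formula.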
The preprocessing module preprocesses the corpus data with routine operations such as sentence splitting and word segmentation. Different data sources are handled differently; for web page data, for example, noise such as web page tags must first be removed and the relevant text content extracted.
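A minimal sketch of such preprocessing for web page data follows, using only the Python standard library. The regular expressions and function names are assumptions made for illustration. In particular, `tokenize` is a simple stand-in that splits on word characters and punctuation; real Chinese text would need a proper word segmenter, which the patent does not name.

```python
import html
import re
from typing import List

# Assumed patterns: drop <script>/<style> blocks and any remaining tags.
TAG_RE = re.compile(r"<(script|style)\b.*?</\1>|<[^>]+>", re.S | re.I)
# Assumed sentence delimiters (Chinese and Western punctuation, newlines).
SENT_SPLIT_RE = re.compile(r"[。！？!?.;；\n]+")
# Stand-in tokenizer: runs of word characters, or single punctuation marks.
TOKEN_RE = re.compile(r"\w+|[^\w\s]")


def strip_markup(page: str) -> str:
    """Remove web page tags (noise) and decode HTML entities."""
    return html.unescape(TAG_RE.sub(" ", page))


def split_sentences(text: str) -> List[str]:
    """Split cleaned text into non-empty sentences."""
    return [part.strip() for part in SENT_SPLIT_RE.split(text) if part.strip()]


def tokenize(sentence: str) -> List[str]:
    """Placeholder for word segmentation."""
    return TOKEN_RE.findall(sentence)


def preprocess(page: str) -> List[List[str]]:
    """Cleaning, sentence splitting, then word segmentation."""
    return [tokenize(s) for s in split_sentences(strip_markup(page))]
```

Splitting on every full stop is crude (it would break decimal numbers, for instance), so a production cleaner would need more careful rules.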
The emotion word recognition module comprises two algorithms: template-based emotion word discovery and naive Bayes-based emotion word discovery.
According to one embodiment of the invention, the template-based emotion word discovery is implemented as follows:
Input: a seed word set seed = {word_1, word_2, ..., word_n} and the preprocessed corpus;
Step 1: extract all templates for the current seed set, a template being the word or punctuation mark immediately before a seed word together with the word or punctuation mark immediately after it;
Step 2: score all templates and select the 5 highest-scoring templates as the extraction templates. The template scoring formula is:
where T denotes a template; for example, "too <word>" is a template. The algorithm first uses the seed emotion words to obtain, from the corpus, the templates with the highest template scores, and then matches the whole corpus against those templates to obtain candidate emotion words. For example, suppose the seed words include words such as "awesome" and "domineering", and statistics show that collocations such as "too awesome" and "too domineering" occur frequently; this yields the template "too <word>". Searching the whole corpus with this template matches collocations such as "too dazzling" and "too deceitful", which finally yields emotion words such as "dazzling" and "deceitful". score(T) denotes the final score of a template, s denotes an emotion word in the seed word set, and Freq(T, s) denotes the number of times the template T containing the seed emotion word s occurs in the corpus; σ = 2 if T contains a degree adverb and σ = 1 otherwise. Statistics show that emotion words often occur together with degree adverbs, which is why the variable σ is introduced: a template containing a degree adverb (σ = 2) obtains a higher final score, and any other template (σ = 1) a lower one.
Step 3: extract instances or words using the extraction templates;
Step 4: score all extracted words and select the highest-scoring ones as candidate emotion words. After emotion words have been extracted with the templates, a preliminary screening is needed. Ideally a word appears in as many templates as possible and is distributed as evenly as possible across them. The words are therefore scored with the following formula, and the highest-scoring words are again taken:
where word is an extracted emotion word; score(word) is the final score of that emotion word; the frequency term denotes, over the template set T, how often a template t containing the word word occurs in the corpus; and |T| denotes the number of templates in the set T;
Step 5: select the remaining seed words and return to Step 1 to continue extracting emotion words, until the seed set is empty;
Output: the candidate emotion words with their scores.
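Steps 1 to 4 above can be sketched in Python as follows. Because the two scoring formulas are not reproduced in this text, the forms used here are assumptions chosen to agree with the symbol definitions: score(T) is taken as σ times the summed Freq(T, s), and score(word) as the summed template frequency divided by |T|. The boundary markers, the function names, and the `degree_adverbs` parameter are hypothetical, and the Step 5 iteration is simplified to a single pass over all seed words.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterator, List, Sequence, Set, Tuple

Template = Tuple[str, str]  # (previous token, next token)
BOS, EOS = "<s>", "</s>"    # hypothetical sentence-boundary markers


def contexts(sentence: Sequence[str]) -> Iterator[Tuple[Template, str]]:
    """Yield (template, word) for every token position in a sentence."""
    padded = [BOS, *sentence, EOS]
    for i in range(1, len(padded) - 1):
        yield (padded[i - 1], padded[i + 1]), padded[i]


def discover_by_template(
    sentences: List[List[str]],
    seeds: Set[str],
    degree_adverbs: Set[str],
    top_k: int = 5,
) -> Dict[str, float]:
    # Step 1: count how often each template surrounds a seed word.
    template_freq: Counter = Counter()
    for sent in sentences:
        for tpl, word in contexts(sent):
            if word in seeds:
                template_freq[tpl] += 1

    # Step 2 (assumed form): score(T) = sigma * sum_s Freq(T, s).
    def score(tpl: Template) -> int:
        sigma = 2 if (tpl[0] in degree_adverbs or tpl[1] in degree_adverbs) else 1
        return sigma * template_freq[tpl]

    best = sorted(template_freq, key=lambda t: (-score(t), t))[:top_k]
    if not best:
        return {}
    best_set = set(best)

    # Step 3: match the selected templates against the whole corpus.
    word_freq: Dict[str, Counter] = defaultdict(Counter)
    for sent in sentences:
        for tpl, word in contexts(sent):
            if tpl in best_set and word not in seeds:
                word_freq[word][tpl] += 1

    # Step 4 (assumed form): score(word) = sum_t Freq(t, word) / |T|.
    return {w: sum(c.values()) / len(best) for w, c in word_freq.items()}
```

With seed words "awesome" and "arrogant" and the degree adverb "too", a toy corpus containing "too awesome", "too arrogant" and "too dazzling" yields the template ("too", end of sentence) and the candidate "dazzling".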
According to another embodiment of the invention, as shown in Fig. 2, the naive Bayes-based emotion word discovery is implemented as follows:
Statistics over a large-scale corpus reveal a phenomenon: certain characters appear frequently in emotion words. The method therefore assumes that each character in the corpus carries an emotion probability. Using naive Bayes, the sentiment orientation of a word is computed from the emotion probabilities of the characters in the candidate emotion word, and emotion words are thereby picked out of the large-scale corpus.
First, all documents in the corpus are organized into a single file. The text is then split into sentences and segmented into words, and stop words are filtered out. Next, the tf-idf values of the preprocessed corpus are computed (tf-idf is a weighting technique commonly used in information retrieval and data mining to assess how important a term is to one document in a document collection or corpus). The words with the largest values are selected as keywords, that is, as candidate emotion words. Finally, the seed emotion word dictionary is loaded and the naive Bayes-based emotion word recognition algorithm is applied to obtain the emotion words.
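The candidate selection by tf-idf can be sketched as below. The patent does not say what plays the role of a "document" once the corpus has been merged into one file, so this sketch assumes each sentence is one document. It also assumes a smoothed idf of the form ln((1 + N) / (1 + df)) + 1; both choices, along with the function name and the `top_n` parameter, are illustrative.

```python
import math
from collections import Counter
from typing import List, Set


def tfidf_candidates(
    sentences: List[List[str]], stop_words: Set[str], top_n: int = 100
) -> List[str]:
    """Return the top_n words by tf-idf after stop-word filtering."""
    docs = [[t for t in sent if t not in stop_words] for sent in sentences]
    docs = [d for d in docs if d]
    if not docs:
        return []
    tf = Counter(t for d in docs for t in d)       # corpus-level term counts
    df = Counter(t for d in docs for t in set(d))  # sentences containing the term
    total = sum(tf.values())
    n_docs = len(docs)
    score = {
        t: (tf[t] / total) * (math.log((1 + n_docs) / (1 + df[t])) + 1.0)
        for t in tf
    }
    # Sort by descending score; ties are broken alphabetically.
    return sorted(score, key=lambda t: (-score[t], t))[:top_n]
```

The returned words are only candidates; whether each one is an emotion word, and with what polarity, is decided by the naive Bayes step.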
Given the preprocessed corpus and a seed emotion word dictionary containing positive emotion words S_pos and negative emotion words S_neg, let w_i denote a character, i = 1...n, so that w_1...w_i...w_n denotes a candidate emotion word. Determining whether it is an emotion word, and the degree of its sentiment orientation, is abstracted into the following mathematical expression:
S# denotes the emotion weight of the candidate emotion word. P(S*|w_1..w_i..w_n) denotes the probability that the candidate word w_1..w_i..w_n is an emotion word. P(w_1..w_i..w_n) is the probability distribution of the candidate word in the corpus; it is a fixed value and can be ignored.
By the conditional independence assumption of naive Bayes:
P(S*, w_1, w_2...w_n) = P(w_1|S*) P(w_2|S*) ··· P(w_n|S*) P(S*), where S* denotes the seed emotion words;
The weight of the candidate emotion word is computed as follows:
Each part of the above formula is computed as follows. Formula (1) computes the emotion probability of a character in the corpus; by the law of large numbers, frequency is used to represent probability. To handle a character that does not occur in the corpus, whose frequency would be 0, data smoothing is applied by introducing a constant δ (which prevents a denominator of 0). Here n(w_i, S*) is the number of times w_i and S* occur together in the corpus, and n(w_i) is the number of times w_i occurs in the corpus.
Formula (2) represents the distribution of the emotion words S* in the corpus: Freq(S*) is the number of times the emotion words occur in the corpus, and the summation term is the total number of occurrences of all words.
Since each candidate emotion word has a positive emotion weight and a negative emotion weight, a final emotion value S is computed from the two. If S is greater than 0 the word is a positive emotion word; otherwise it is a negative emotion word.
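The weight and polarity computation can be sketched as follows. Since formulas (1) and (2) and the final-value formula are not reproduced in this text, several assumptions are made and should be read as such: P(w_i|S*) is taken as (n(w_i, S*) + δ) / (n(w_i) + 2δ), co-occurrence is counted at sentence level, P(S*) is Freq(S*) divided by the total token count, and the final value S is the positive weight minus the negative weight. The function names and the tiny corpus in the usage note are invented for illustration.

```python
from typing import List, Set


def emotion_weight(
    candidate: str, seeds: Set[str], sentences: List[List[str]], delta: float = 1.0
) -> float:
    """Unnormalised P(S*) * prod_i P(w_i | S*) for one polarity (assumed forms)."""
    total_tokens = sum(len(sent) for sent in sentences)
    seed_freq = sum(1 for sent in sentences for tok in sent if tok in seeds)
    # Assumed formula (2): P(S*) = Freq(S*) / total occurrences of all words.
    weight = seed_freq / total_tokens if total_tokens else 0.0
    for ch in candidate:
        n_ch = 0     # n(w_i): occurrences of the character in the corpus
        n_joint = 0  # n(w_i, S*): occurrences in sentences that contain a seed
        for sent in sentences:
            occurrences = sum(tok.count(ch) for tok in sent)
            n_ch += occurrences
            if occurrences and any(tok in seeds for tok in sent):
                n_joint += occurrences
        # Assumed formula (1) with additive smoothing by delta.
        weight *= (n_joint + delta) / (n_ch + 2 * delta)
    return weight


def polarity(
    candidate: str,
    pos_seeds: Set[str],
    neg_seeds: Set[str],
    sentences: List[List[str]],
    delta: float = 1.0,
) -> float:
    """Assumed final value S = positive weight - negative weight."""
    pos = emotion_weight(candidate, pos_seeds, sentences, delta)
    neg = emotion_weight(candidate, neg_seeds, sentences, delta)
    return pos - neg
```

For example, with the positive seed "好", the negative seed "差", and the invented sentences `[["质量", "好", "很", "给力"], ["服务", "差", "太", "坑爹"], ["物流", "好", "给力"]]`, the candidate "给力" receives a value above 0 and "坑爹" a value below 0.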
Finally, as shown in Fig. 3, the obtained emotion words are input to the post-processing module, ranked by emotion value, and the top 30% of the total are output as emotion words and stored in the dictionary.
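The post-processing step can be sketched as below. The text says the words are ranked by emotion value but not how negative values are treated, so this sketch assumes ranking by absolute value so that strongly negative words are also kept. The function names, the `percent` parameter, and the dictionary layout are hypothetical.

```python
from typing import Dict, List, Tuple


def select_top(scored: Dict[str, float], percent: int = 30) -> List[Tuple[str, float]]:
    """Rank candidates by |emotion value| and keep the top `percent` percent."""
    ranked = sorted(scored.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    if not ranked:
        return []
    # Integer arithmetic avoids floating-point rounding; keep at least one word.
    keep = max(1, len(ranked) * percent // 100)
    return ranked[:keep]


def expand_dictionary(
    dictionary: Dict[str, str], selected: List[Tuple[str, float]]
) -> Dict[str, str]:
    """Store each selected word with its polarity, leaving existing entries untouched."""
    for word, value in selected:
        dictionary.setdefault(word, "positive" if value > 0 else "negative")
    return dictionary
```

Using `setdefault` means a word already present in the seed dictionary keeps its original polarity label.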
The present invention addresses the fact that, in text sentiment analysis, building a sentiment dictionary by hand is time-consuming and laborious, and proposes two methods for automatically discovering emotion words. Starting from a seed dictionary, emotion words are recognized automatically in a large-scale corpus. The template-based emotion word recognition method mainly handles new words that appear in Internet corpora (for example, on microblog social platforms). Because some new words are expressed in much the same way as existing words, templates are used to identify the boundaries of new words, so that the new word recognition step is skipped and emotion words are identified directly. The naive Bayes-based emotion word recognition method uses the conditional independence assumption of naive Bayes to compute the emotion value of a whole word from the emotion values of its characters, and can quickly and accurately extract the emotion words in documents (such as news corpora).
It should be noted that although several modules or units of the device for performing actions are mentioned in the detailed description above, this division is not mandatory. In fact, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided among multiple modules or units.
Other embodiments of the invention will readily occur to those skilled in the art upon consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or conventional techniques in the art not disclosed herein. The specification and examples are to be regarded as illustrative only, with the true scope and spirit of the invention indicated by the claims.
It should be understood that the invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.
Claims (9)
1. An emotion word recognition method based on domain-specific text, characterized in that the method comprises the following steps:
Preprocessing: the corpus data are preprocessed;
Emotion word recognition: emotional expression words are obtained by template-based emotion word discovery and naive Bayes-based emotion word discovery, and the polarity of the obtained emotional expression words is determined by the naive Bayes-based emotion word discovery;
Post-processing: the emotional expression words obtained by the emotion word recognition are ranked, and the highest-scoring candidate emotion words are selected as final emotion words for expanding the emotion word dictionary.
2. The method according to claim 1, characterized in that the preprocessing consists of cleaning and filtering, sentence splitting, and word segmentation of the corpus.
3. The method according to claim 1, characterized in that the template-based emotion word discovery algorithm targets newly emerging words and finds currently popular emotional expression words, and the naive Bayes-based emotion word discovery algorithm targets formal written words and mines conventional emotional expression words.
4. The method according to any one of claims 1-3, characterized in that the template-based emotion word discovery is implemented as follows:
Input: a seed word set seed = {word_1, word_2, ..., word_n} and the preprocessed corpus;
Step 1: extract all templates for the current seed set, a template being the word or punctuation mark immediately before a seed word together with the word or punctuation mark immediately after it;
Step 2: score all templates and select the 5 highest-scoring templates as the extraction templates;
Step 3: extract instances or words using the extraction templates;
Step 4: score all extracted words and select the highest-scoring ones as candidate emotion words;
Step 5: select the remaining seed words and return to Step 1 to continue extracting emotion words, until the seed set is empty;
Output: the candidate emotion words with their scores.
5. The method according to claim 4, characterized in that the template scoring formula used to score all templates in Step 2 is:
where T denotes a template, score(T) denotes the final score of a template, s denotes an emotion word in the seed word set, and Freq(T, s) denotes the number of times the template T containing the seed emotion word s occurs in the corpus; σ = 2 if T contains a degree adverb, and σ = 1 otherwise;
The scoring formula used to score all extracted words in Step 4 is:
where word is an extracted emotion word; score(word) is the final score of that emotion word; the frequency term denotes, over the template set T, how often a template t containing the word word occurs in the corpus; and |T| denotes the number of templates in the set T.
6. The method according to any one of claims 1-3, characterized in that the naive Bayes-based emotion word discovery is implemented as follows:
All texts in the corpus are organized into a single file, and the text in that file is segmented into words, split into sentences, and filtered for stop words;
The tf-idf values of the preprocessed corpus are computed;
The words with the largest tf-idf values are selected as keywords, that is, as candidate emotion words;
The seed emotion word dictionary is loaded, and the naive Bayes-based emotion word recognition algorithm computes the sentiment orientation of each candidate emotion word to obtain the emotion words.
7. The method according to claim 6, characterized in that the weight of a candidate emotion word is computed with the following formula:
where S# denotes the weight of the candidate emotion word; w_i denotes a character, i = 1...n, so that w_1...w_i...w_n denotes a candidate emotion word; and S* denotes the seed emotion words;
where n(w_i, S*) denotes the number of times w_i and S* occur together in the corpus, n(w_i) denotes the number of times w_i occurs in the corpus, and δ is a constant introduced for data smoothing so that a character that does not occur in the corpus does not produce a frequency of 0;
where Freq(S*) is the number of times the seed emotion words occur in the corpus, the summation term denotes the total number of occurrences of all words, and word_i denotes an extracted candidate emotion word;
Since each candidate emotion word has a positive emotion weight and a negative emotion weight, the final emotion value is computed from these two weights.
8. An emotion word recognition system based on domain-specific text, for implementing the method according to any one of claims 1 to 7, characterized in that the system comprises:
a preprocessing module, which performs data cleaning, word segmentation, and sentence splitting on the corpus data;
an emotion word recognition module, which finds candidate emotion words using the template-based emotion word discovery algorithm and the naive-Bayes-based emotion word discovery algorithm, wherein the two algorithms are executed simultaneously, and the polarity of the candidate emotion words obtained by the template-based emotion word discovery is judged by the naive-Bayes-based emotion word discovery;
a post-processing module, which receives as input the candidate emotion words with polarity obtained by the emotion word recognition module, ranks those candidate emotion words, and selects the highest-scoring portion of the candidate words as the final emotion words, so as to expand the sentiment dictionary.
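The post-processing step can be sketched as below. The claim does not say what fraction of candidates is kept, so `top_ratio` is a hypothetical cut-off, and the data layout (a score and a polarity label per candidate) is assumed for illustration only.

```python
def expand_dictionary(candidates, sentiment_dict, top_ratio=0.5):
    """Rank polar candidates by score and add the best ones to the dictionary.

    candidates: {word: (score, polarity)}
    sentiment_dict: {word: polarity}, the existing seed dictionary
    top_ratio: hypothetical fraction of candidates to keep (not from the claims)
    """
    ranked = sorted(candidates.items(), key=lambda kv: kv[1][0], reverse=True)
    keep = max(1, int(len(ranked) * top_ratio)) if ranked else 0
    expanded = dict(sentiment_dict)  # leave the input dictionary untouched
    for word, (_, pol) in ranked[:keep]:
        expanded.setdefault(word, pol)  # never overwrite an existing seed entry
    return expanded

candidates = {
    "bullish": (0.9, "pos"),
    "slump": (0.7, "neg"),
    "rally": (0.4, "pos"),
    "table": (0.1, "pos"),
}
expanded = expand_dictionary(candidates, {"good": "pos"}, top_ratio=0.5)
```

Keeping seed entries unchanged is a design choice of this sketch: it ensures that automatically discovered words cannot flip the polarity of a curated seed word.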
9. The system according to claim 8, characterized in that, when the corpus data is web page data, the preprocessing module additionally removes noise data and extracts the relevant text content.
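For web page data, noise removal could look like the sketch below, which uses Python's standard `html.parser` module. What counts as "noise" is not specified in the claim; dropping `<script>` and `<style>` content is an assumption of this example, and `TextExtractor` and `extract_text` are hypothetical names.

```python
from html.parser import HTMLParser

class TextExtractor(HTMLParser):
    """Collects visible text, skipping <script> and <style> content."""
    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        # Keep only non-empty text that is outside skipped elements.
        if text and self._skip_depth == 0:
            self.chunks.append(text)

def extract_text(html):
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)

page = "<p>Hello</p><script>var x = 1;</script><style>p {color: red}</style><p>World</p>"
text = extract_text(page)
```

The extracted text would then go through the same cleaning, segmentation, and sentence-splitting steps as any other corpus data.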
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201910316622.8A CN110069780B (en) | 2019-04-19 | 2019-04-19 | Specific field text-based emotion word recognition method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN110069780A (en) | 2019-07-30 |
CN110069780B (en) | 2021-11-19 |
Family ID: 67367977
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201910316622.8A | Active | CN110069780B (en) | 2019-04-19 | 2019-04-19 | Specific field text-based emotion word recognition method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN110069780B (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN111310455A (en) * | 2020-02-11 | 2020-06-19 | 安徽理工大学 | New emotion word polarity calculation method for online shopping comments |
Citations (8)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2008151926A (en) * | 2006-12-15 | 2008-07-03 | Internatl Business Mach Corp <Ibm> | Technique to search new phrase to be registered in voice processing dictionary |
CN103544246A (en) * | 2013-10-10 | 2014-01-29 | 清华大学 | Method and system for constructing multi-emotion dictionary for internet |
CN103955453A (en) * | 2014-05-23 | 2014-07-30 | 清华大学 | Method and device for automatically discovering new words from document set |
CN104268197A (en) * | 2013-09-22 | 2015-01-07 | 中科嘉速(北京)并行软件有限公司 | Industry comment data fine grain sentiment analysis method |
CN105912576A (en) * | 2016-03-31 | 2016-08-31 | 北京外国语大学 | Emotion classification method and emotion classification system |
CN107305539A (en) * | 2016-04-18 | 2017-10-31 | 南京理工大学 | A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries |
CN109947951A (en) * | 2019-03-19 | 2019-06-28 | 北京师范大学 | A kind of automatically updated emotion dictionary construction method for financial text analyzing |
CN110889275A (en) * | 2018-09-07 | 2020-03-17 | 鼎复数据科技(北京)有限公司 | Information extraction method based on deep semantic understanding |
Also Published As
Publication number | Publication date |
---|---|
CN110069780B (en) | 2021-11-19 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Singh et al. | Vectorization of text documents for identifying unifiable news articles | |
CN104462053B (en) | A kind of personal pronoun reference resolution method based on semantic feature in text | |
CN104572958B (en) | A kind of sensitive information monitoring method based on event extraction | |
Panchenko et al. | Unsupervised does not mean uninterpretable: The case for word sense induction and disambiguation | |
WO2019080863A1 (en) | Text sentiment classification method, storage medium and computer | |
CN109670039B (en) | Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis | |
CN111221962B (en) | Text emotion analysis method based on new word expansion and complex sentence pattern expansion | |
CN109376251A (en) | A kind of microblogging Chinese sentiment dictionary construction method based on term vector learning model | |
CN111831802B (en) | Urban domain knowledge detection system and method based on LDA topic model | |
CN106446109A (en) | Acquiring method and device for audio file abstract | |
CN107729468A (en) | Answer extracting method and system based on deep learning | |
CN103744953A (en) | Network hotspot mining method based on Chinese text emotion recognition | |
CN108038205A (en) | For the viewpoint analysis prototype system of Chinese microblogging | |
CN105760363B (en) | Word sense disambiguation method and device for text file | |
CN108228569A (en) | A kind of Chinese microblog emotional analysis method based on Cooperative Study under the conditions of loose | |
CN112052356A (en) | Multimedia classification method, apparatus and computer-readable storage medium | |
CN104573030A (en) | Textual emotion prediction method and device | |
CN109325122A (en) | Vocabulary generation method, file classification method, device, equipment and storage medium | |
CN105589976B (en) | Method and device is determined based on the target entity of semantic relevancy | |
CN111626050A (en) | Microblog emotion analysis method based on expression dictionary and emotion common sense | |
Nguyen et al. | An ensemble of shallow and deep learning algorithms for Vietnamese sentiment analysis | |
CN114722176A (en) | Intelligent question answering method, device, medium and electronic equipment | |
JP2013131075A (en) | Classification model learning method, device, program, and review document classifying method | |
Pay et al. | An ensemble of automatic keyword extractors: TextRank, RAKE and TAKE | |
CN112926341A (en) | Text data processing method and device |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||
GR01 | Patent grant |