CN110069780A - Emotion word recognition method and system based on domain-specific text

Info

Publication number
CN110069780A
Authority
CN
China
Prior art keywords
word
emotion word
emotion
corpus
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201910316622.8A
Other languages
Chinese (zh)
Other versions
CN110069780B (en
Inventor
张力文
程国艮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Chinese Translation Language Through Polytron Technologies Inc
Original Assignee
Chinese Translation Language Through Polytron Technologies Inc
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Chinese Translation Language Through Polytron Technologies Inc filed Critical Chinese Translation Language Through Polytron Technologies Inc
Priority to CN201910316622.8A priority Critical patent/CN110069780B/en
Publication of CN110069780A publication Critical patent/CN110069780A/en
Application granted granted Critical
Publication of CN110069780B publication Critical patent/CN110069780B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The present invention provides an emotion word recognition method and system based on domain-specific text. The method comprises the following steps: preprocessing, in which the corpus data is preprocessed; emotion word recognition, in which emotional expression words are obtained by template-based emotion word discovery and naive-Bayes-based emotion word discovery, and the polarity of the obtained emotional expression words is determined by the naive-Bayes-based discovery; and post-processing, in which the emotional expression words obtained by the recognition step are ranked and the highest-scoring candidates are selected as final emotion words for expanding the emotion word dictionary. The invention can identify and extract emotion words and output their positive or negative polarity. Because the method requires no manually annotated corpus, emotion word recognition can be fully automated.

Description

Emotion word recognition method and system based on domain-specific text
Technical field
The present invention relates to the field of emotion word recognition, and in particular to a method and system for recognizing emotion words and their polarity based on domain-specific text.
Background
In the era of big data and artificial intelligence, AI systems are expected not only to think and reason like humans but also to perceive and express emotion. Sentiment analysis is therefore a current research focus and challenge: it is the process of analyzing, processing, summarizing, and reasoning about subjective text that carries emotional color. Like other natural language processing tasks, it first requires the support of resources, on which text sentiment classification is then built. Resource construction is thus the foundation of all such tasks. Sentiment resources generally comprise sentiment dictionaries and sentiment corpora. Sentiment dictionaries are currently built mainly by manual screening, which is costly and yields dictionaries of limited size. Compared with English, the construction and study of Chinese sentiment dictionaries is still immature.
The present invention provides a method for expanding a sentiment dictionary based on a specific domain, which supplies resource support for sentiment analysis research across different domains and tasks.
In this field, prior art CN106776566A discloses a method and device for recognizing emotion vocabulary. That method first requires positive and negative emotion labels to be applied to the emotion words in the text, and then determines candidate emotion words from the labeling result and a chi-square feature selection algorithm. Prior art CN107729374A discloses a method for expanding an emotion word dictionary. That method selects a corpus according to the direction of expansion, computes the word-vector matching degree between the corpus and the sentiment dictionary, and, according to the result, selects data from the corpus to expand the sentiment dictionary. The recognition methods used in this prior art differ substantially from the present scheme. In addition, because the present scheme requires no manual labeling, it achieves genuinely automatic recognition.
Summary of the invention
The purpose of the present invention is to provide an emotion word recognition system and method based on domain-specific text, thereby overcoming, at least to some extent, one or more problems caused by the limitations and defects of the related art.
Other features and advantages of the invention will become apparent from the following detailed description, or may be learned in part by practice of the invention.
The invention targets text sentiment analysis. In heuristic sentiment analysis, the accuracy and coverage of the dictionary determine the precision and recall of the task. In addition, a sentiment dictionary used as an external resource can effectively improve the accuracy of machine learning models. In the big-data era massive corpora are easy to obtain, but their sheer scale makes it impossible to build a sentiment dictionary by manual screening. Moreover, as Internet applications develop and spread, Internet slang and new words emerge constantly; building a dictionary manually not only consumes enormous labor but also cannot guarantee its scale. To address these deficiencies, the invention automatically identifies emotion words in a corpus, computes their emotion weights, and expands the sentiment dictionary according to their sentiment orientation.
The present invention first provides an emotion word recognition method based on domain-specific text, comprising the following steps:
Preprocessing: the corpus data is preprocessed, the preprocessing comprising cleaning and filtering, sentence splitting, and word segmentation of the corpus;
Emotion word recognition: emotional expression words are obtained by template-based emotion word discovery and naive-Bayes-based emotion word discovery, and the polarity of the obtained emotional expression words is determined by the naive-Bayes-based emotion word discovery;
Post-processing: the emotional expression words obtained by the recognition step are ranked, and the highest-scoring candidate emotion words are selected as final emotion words for expanding the emotion word dictionary.
Preferably, the template-based emotion word discovery algorithm targets newly emerging words and finds currently popular emotional expression words, while the naive-Bayes-based emotion word discovery algorithm targets more formal, written words and mines conventional emotional expression words.
Preferably, the template-based emotion word discovery is implemented as follows:
Input: a seed word set $seed = \{word_1, word_2, \dots, word_n\}$ and the preprocessed corpus;
Step 1: extract all templates for the current seed set, a template being the word or punctuation mark immediately before a seed word together with the word or punctuation mark immediately after it;
Step 2: evaluate all templates and select the 5 highest-scoring templates as the extraction templates. The template evaluation formula is as follows:
where $T$ denotes a template, $score(T)$ denotes the final score of the template, $s$ denotes an emotion word in the seed word set, and $Freq(T, s)$ denotes the number of times the template $T$ containing the seed emotion word $s$ occurs in the corpus; $\sigma = 2$ if $T$ contains a degree adverb, and $\sigma = 1$ otherwise;
Step 3: extract instances or words using the extraction templates;
Step 4: evaluate all extracted words and select the highest-scoring ones as candidate emotion words. The evaluation formula for the extracted words is as follows:
where $word$ is an extracted emotion word and $score(word)$ is its final score; the frequency term denotes, within the template set $T$, how often a template $t$ containing the word $word$ occurs in the corpus; and $|T|$ denotes the number of templates in the set $T$;
Step 5: select the remaining seed words and return to Step 1 to continue emotion word extraction until the seed set is empty;
Output: the candidate emotion words with their scores.
Preferably, the naive-Bayes-based emotion word discovery is implemented as follows:
All texts in the corpus are organized into one file; the text in this file is segmented into words and split into sentences, and stop words are filtered out;
The tf-idf values of the preprocessed corpus are calculated;
The words with the largest tf-idf values are selected as keywords, i.e. candidate emotion words;
The seed emotion word dictionary is loaded, and the naive-Bayes-based emotion word recognition algorithm is used to calculate the sentiment orientation of each candidate emotion word, thereby obtaining the emotion words.
The weight of a candidate emotion word is calculated by the following formula:
where $S^{\#}$ denotes the weight of the candidate emotion word; $w_i$ denotes a character, $i = 1 \dots n$, so that $w_1 \dots w_i \dots w_n$ denotes one candidate emotion word; and $S^{*}$ denotes a seed emotion word;
where $n(w_i, S^{*})$ denotes the number of times $w_i$ and $S^{*}$ occur together in the corpus, $n(w_i)$ denotes the number of times $w_i$ occurs in the corpus, and $\delta$ is a smoothing constant introduced so that a character that does not occur in the corpus does not yield a frequency of 0;
where $Freq(S^{*})$ is the number of times the seed emotion word occurs in the corpus, the denominator denotes the total number of occurrences of all words, and $word$ denotes the extracted emotion word, i.e. the aforementioned $w_1 \dots w_i \dots w_n$.
Each candidate emotion word has both a positive emotion weight and a negative emotion weight, from which
the final emotion value is obtained.
The present invention also provides an emotion word recognition system based on domain-specific text, characterized in that the system comprises:
A preprocessing module, which performs data cleaning, word segmentation, and sentence splitting on the corpus data; when the corpus data is web page data, the preprocessing module additionally removes noise data and extracts the relevant text content;
An emotion word recognition module, which finds candidate emotion words using the template-based emotion word discovery algorithm and the naive-Bayes-based emotion word discovery algorithm; the two algorithms are executed simultaneously, and the polarity of the candidate emotion words obtained by the template-based discovery is determined by the naive-Bayes-based discovery;
A post-processing module, which receives the candidate emotion words with polarity obtained by the emotion word recognition module, ranks them, and selects the highest-scoring portion of the candidates as final emotion words for expanding the sentiment dictionary.
It should be understood that the above general description and the following detailed description are merely exemplary and explanatory and do not limit the present invention.
Brief description of the drawings
The accompanying drawings are incorporated into and form part of this specification. They show embodiments consistent with the invention and, together with the specification, serve to explain its principles. The drawings described below cover only some embodiments of the invention; a person of ordinary skill in the art can derive other drawings from them without creative effort. In the drawings:
Fig. 1 shows the emotion word recognition workflow of the invention;
Fig. 2 shows the flow chart of naive-Bayes-based emotion word discovery in an embodiment of the invention;
Fig. 3 shows the post-processing workflow of the invention.
Detailed description of embodiments
Example embodiments are now described more fully with reference to the drawings. Example embodiments can, however, be implemented in many forms and should not be construed as limited to the examples set forth here; rather, these embodiments are provided so that the disclosure will be thorough and complete and will fully convey the concept of the example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in one or more embodiments in any suitable manner. In the following description, many specific details are given to provide a thorough understanding of the embodiments of the invention. Those skilled in the art will appreciate, however, that the technical solution of the invention can be practiced without one or more of these specific details, or with other methods, components, devices, steps, and so on. In other cases, well-known methods, devices, implementations, or operations are not shown or described in detail, to avoid obscuring aspects of the invention.
The block diagrams shown in the drawings are merely functional entities and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in software, in one or more hardware modules or integrated circuits, or in different networks and/or processor devices and/or microcontroller devices.
The flow charts shown in the drawings are merely illustrative. They need not include all contents and operations/steps, nor must the steps be executed in the order described. For example, some operations/steps may be decomposed and others may be merged wholly or partly, so the actual order of execution may change according to the circumstances.
In view of the problems of the prior art, the present invention provides an emotion word recognition method and system based on domain-specific text.
The invention is divided into three parts: a preprocessing module, an emotion word recognition module, and a post-processing module.
As shown in Fig. 1, data first enters the preprocessing module, which performs the necessary processing on the corpus, such as cleaning and filtering, sentence splitting, and word segmentation. The processed data then enters the emotion word recognition module, which consists of two algorithms: template-based emotion word discovery and naive-Bayes-based emotion word discovery. Both algorithms are applied to the data simultaneously to find candidate emotion words. The former is designed for newly emerging words, and its main role is to find recently popular emotional expression words; the latter mainly targets more formal, written words, and its main role is to mine conventional emotional expression words. Although the two algorithms run simultaneously, the template-based algorithm cannot determine the polarity of the emotion words it finds (that is, whether a word is a positive or a negative emotion word), so those words must additionally be passed through the naive-Bayes-based discovery to determine their polarity. All candidate emotion words are then sent to the post-processing module, which ranks the emotion words selected by the algorithms and chooses the highest-scoring candidates as final emotion words for expanding the sentiment dictionary.
The preprocessing module applies routine operations to the corpus data, such as sentence splitting and word segmentation. Different data sources are handled differently; for example, web page data first requires removing noise such as web page tags and extracting the relevant text content.
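The preprocessing stage described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function names (`clean`, `split_sentences`, `tokenize`, `preprocess`) are hypothetical, and the tokenizer is a simple regular-expression stand-in. Chinese text has no spaces between words, so a real system would substitute a dedicated word segmenter at the `tokenize` step.

```python
import html
import re

TAG_RE = re.compile(r"<[^>]+>")
# A sentence is a run of non-terminator characters plus an optional terminator.
SENT_RE = re.compile(r"[^。！？!?.\n]+[。！？!?.]?")


def clean(raw):
    """Remove script/style blocks and tags, decode entities, collapse spaces."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1>", " ", raw)
    text = TAG_RE.sub(" ", text)
    text = html.unescape(text)
    return re.sub(r"\s+", " ", text).strip()


def split_sentences(text):
    """Split cleaned text into sentences on Chinese and Western terminators."""
    return [s.strip() for s in SENT_RE.findall(text) if s.strip()]


def tokenize(sentence):
    """Stand-in tokenizer: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def preprocess(raw):
    """Return the corpus as a list of token lists, one per sentence."""
    return [tokenize(s) for s in split_sentences(clean(raw))]
```

Punctuation marks are kept as tokens because the template-based discovery treats the punctuation next to a seed word as part of a template.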
The emotion word recognition module comprises two algorithms: template-based emotion word discovery and naive-Bayes-based emotion word discovery.
According to one embodiment of the invention, the template-based emotion word discovery is implemented as follows:
Input: a seed word set $seed = \{word_1, word_2, \dots, word_n\}$ and the preprocessed corpus;
Step 1: extract all templates for the current seed set, a template being the word or punctuation mark immediately before a seed word together with the word or punctuation mark immediately after it;
Step 2: evaluate all templates and select the 5 highest-scoring templates as the extraction templates. The template evaluation formula is as follows:
where $T$ denotes a template; for example, "too <word>" is a template. The algorithm first uses the seed emotion words to obtain, from the corpus, the templates with the highest evaluation scores, and then matches the discovered templates against the entire corpus to obtain candidate emotion words. For example, if the seed words include "awesome" and "domineering", statistics show that collocations such as "too awesome" and "too domineering" occur frequently, which yields the template "too <word>". Searching the entire corpus with this template then matches collocations such as "too dazzling" and "too hole-father" (a literal rendering of an Internet slang expression), which finally yields emotion words such as "dazzling" and "hole-father". $score(T)$ denotes the final score of a template, $s$ denotes an emotion word in the seed word set, and $Freq(T, s)$ denotes the number of times the template $T$ containing the seed emotion word $s$ occurs in the corpus. Statistics show that emotion words often occur together with degree adverbs, so a variable $\sigma$ is introduced: $\sigma = 2$ if the template contains a degree adverb, giving a higher final score, and $\sigma = 1$ otherwise, giving a lower score.
Step 3: extract instances or words using the extraction templates;
Step 4: evaluate all extracted words and select the highest-scoring ones as candidate emotion words. After the emotion words have been extracted with the templates, an initial screening is needed. Ideally a word should appear in as many templates as possible, and should be distributed as evenly as possible across them. As with the templates, the highest-scoring words are kept. The evaluation formula for the extracted words is as follows:
where $word$ is an extracted emotion word and $score(word)$ is its final score; the frequency term denotes, within the template set $T$, how often a template $t$ containing the word $word$ occurs in the corpus; and $|T|$ denotes the number of templates in the set $T$;
Step 5: select the remaining seed words and return to Step 1 to continue emotion word extraction until the seed set is empty;
Output: the candidate emotion words with their scores.
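The steps above can be sketched in Python as a single pass over all seed words. The two scoring formulas are not reproduced in this text, so the sketch substitutes assumed forms: a template scores $\sigma$ times its total frequency over the seeds, and a word scores its total matches divided by $|T|$. The degree-adverb list, the function names, and the `<s>`/`</s>` boundary markers are likewise hypothetical.

```python
from collections import Counter

# Hypothetical degree-adverb list; the patent does not enumerate one.
DEGREE_ADVERBS = {"too", "very", "so"}


def contexts(tokens):
    """Yield (previous token, word, next token) for every position."""
    padded = ["<s>"] + list(tokens) + ["</s>"]
    for i in range(1, len(padded) - 1):
        yield padded[i - 1], padded[i], padded[i + 1]


def extract_templates(sentences, seeds):
    """Step 1: count the (previous, next) contexts that surround seed words."""
    freq = Counter()
    for tokens in sentences:
        for prev, word, nxt in contexts(tokens):
            if word in seeds:
                freq[(prev, nxt)] += 1
    return freq


def top_templates(freq, k=5):
    """Step 2: keep the k best templates.

    Assumed score: sigma * total frequency over all seeds, with sigma = 2
    when the template contains a degree adverb and 1 otherwise.
    """
    def score(template):
        sigma = 2 if DEGREE_ADVERBS & set(template) else 1
        return sigma * freq[template]

    return sorted(freq, key=lambda t: (-score(t), t))[:k]


def extract_candidates(sentences, templates, seeds):
    """Steps 3-4: match the templates and score the words they capture.

    Assumed score: number of template matches for the word divided by |T|.
    """
    wanted = set(templates)
    hits = Counter()
    for tokens in sentences:
        for prev, word, nxt in contexts(tokens):
            if (prev, nxt) in wanted and word not in seeds and word.isalnum():
                hits[word] += 1
    return {w: n / len(wanted) for w, n in hits.items()}
```

With the seeds "cool" and "great" and a corpus containing "too cool !", "too great !", and "too shiny !", the context `("too", "!")` becomes the top template and "shiny" is returned as a candidate. Step 5's iteration over the remaining seed words is omitted for brevity.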
According to another embodiment of the invention, as shown in Fig. 2, the naive-Bayes-based emotion word discovery is implemented as follows:
Statistics over a large-scale corpus reveal a phenomenon: certain characters appear frequently in emotion words. The method therefore assumes that each character in the corpus has an emotion probability. Using naive Bayes, the sentiment orientation of a candidate emotion word is calculated from the emotion probabilities of the characters it contains, and emotion words are thereby picked out of the large-scale corpus.
First, all documents in the corpus are organized into one file. The text is then split into sentences and segmented into words, and stop words are filtered out. Next, the tf-idf values of the preprocessed corpus are calculated (tf-idf is a weighting technique commonly used in information retrieval and data mining to assess how important a word is to one document in a document collection or corpus). The words with the largest values are selected as keywords, i.e. candidate emotion words. Finally, the seed emotion word dictionary is loaded and the naive-Bayes-based emotion word recognition algorithm is applied to obtain the emotion words.
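The keyword-selection step can be sketched as below. The text does not say what counts as a "document" once the corpus has been merged into one file, so this sketch assumes each sentence is treated as a document and sums a word's tf-idf over all sentences. The function name `tfidf_keywords` and the `top_k` parameter are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(sentences, stopwords=frozenset(), top_k=10):
    """Rank words by summed tf-idf, treating each sentence as one document."""
    docs = [[t for t in s if t not in stopwords] for s in sentences]
    docs = [d for d in docs if d]  # drop sentences left empty by filtering
    n_docs = len(docs)

    # Document frequency: in how many sentences does each word occur?
    df = Counter()
    for doc in docs:
        df.update(set(doc))

    scores = Counter()
    for doc in docs:
        for word, count in Counter(doc).items():
            tf = count / len(doc)
            idf = math.log(n_docs / df[word])
            scores[word] += tf * idf

    # Highest score first; ties broken alphabetically for determinism.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

With the plain `log(N / df)` form of idf used here, a word that occurs in every sentence scores 0, which is why stop words that slip through the filter still tend to rank last.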
Suppose a preprocessed corpus and a seed emotion word dictionary are given, the dictionary containing positive emotion words $S_{pos}$ and negative emotion words $S_{neg}$. Let $w_i$ denote a character, $i = 1 \dots n$, so that $w_1 \dots w_i \dots w_n$ denotes one candidate emotion word. Deciding whether the candidate is an emotion word, and its degree of sentiment orientation, is abstracted into the following mathematical expression:
$S^{\#}$ denotes the emotion weight of the candidate emotion word. $P(S^{*} \mid w_1 \dots w_i \dots w_n)$ denotes the probability that the candidate word $w_1 \dots w_i \dots w_n$ is an emotion word. $P(w_1 \dots w_i \dots w_n)$ is the probability distribution of the candidate word in the corpus; it is a fixed value and can be ignored.
By the conditional independence assumption of naive Bayes:
$P(S^{*}, w_1, w_2, \dots, w_n) = P(w_1 \mid S^{*}) \, P(w_2 \mid S^{*}) \cdots P(w_n \mid S^{*}) \, P(S^{*})$, where $S^{*}$ denotes a seed emotion word;
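The factorization above follows from Bayes' rule. The display below is a reconstruction from the surrounding definitions rather than a copy of the patent's own formula, which is not reproduced in this text:

```latex
\begin{aligned}
S^{\#} = P(S^{*} \mid w_1 \dots w_n)
  &= \frac{P(w_1 \dots w_n \mid S^{*}) \, P(S^{*})}{P(w_1 \dots w_n)} \\
  &\propto P(S^{*}) \prod_{i=1}^{n} P(w_i \mid S^{*})
\end{aligned}
```

The denominator $P(w_1 \dots w_n)$ is the same for the positive and the negative seed class, so dropping it does not change which class scores higher for a given candidate.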
The weight of a candidate emotion word is calculated as follows:
The calculation of each part of the above formula is explained below. Formula (1) calculates the emotion probability of a character in the corpus; by the law of large numbers, the frequency is used to represent the probability. To guard against a character that does not occur in the corpus, whose frequency would be 0, data smoothing is applied by introducing a constant $\delta$ (which also prevents a zero denominator). Here $n(w_i, S^{*})$ denotes the number of times $w_i$ and $S^{*}$ occur together in the corpus, and $n(w_i)$ denotes the number of times $w_i$ occurs in the corpus.
Formula (2) represents the distribution of the emotion word $S^{*}$ in the corpus: $Freq(S^{*})$ is the number of times the emotion word occurs in the corpus, and the denominator is the total number of occurrences of all words.
Each candidate emotion word has both a positive emotion weight and a negative emotion weight, from which the final emotion value $S$ is calculated. If $S$ is greater than 0, the word is a positive emotion word; otherwise it is a negative emotion word.
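The weight calculation can be sketched as follows. Formulas (1) and (2) and the final combination are not reproduced in this text, so the forms used here are assumptions: co-occurrence is counted at sentence level, formula (1) is taken as `(n(w_i, S*) + delta) / (n(w_i) + 2 * delta)`, formula (2) as seed occurrences over total tokens, and the final value as the positive weight minus the negative weight. Treating each $w_i$ as a single character of the candidate is one reading of the translated text. The function names are hypothetical.

```python
from collections import Counter


def emotion_weight(candidate, sentences, seed_words, delta=1.0):
    """Assumed form: S# = P(S*) * product over characters of P(w_i | S*)."""
    seed_words = set(seed_words)
    n_char = Counter()   # n(w_i): sentences containing the character
    n_joint = Counter()  # n(w_i, S*): of those, sentences that also hold a seed
    seed_freq = 0        # Freq(S*): occurrences of seed words
    total = 0            # total occurrences of all words

    for tokens in sentences:
        total += len(tokens)
        seed_hits = sum(1 for t in tokens if t in seed_words)
        seed_freq += seed_hits
        for ch in set("".join(tokens)):
            n_char[ch] += 1
            if seed_hits:
                n_joint[ch] += 1

    weight = seed_freq / total if total else 0.0  # prior, formula (2)
    for ch in candidate:
        # Smoothed character probability, formula (1); delta avoids zeros.
        weight *= (n_joint[ch] + delta) / (n_char[ch] + 2 * delta)
    return weight


def polarity(candidate, sentences, pos_seeds, neg_seeds, delta=1.0):
    """Return (S, label), with S assumed to be positive minus negative weight."""
    s = (emotion_weight(candidate, sentences, pos_seeds, delta)
         - emotion_weight(candidate, sentences, neg_seeds, delta))
    return s, ("positive" if s > 0 else "negative")
```

For long candidates the product of probabilities becomes very small; summing logarithms instead is a common alternative, although the text does not mention it.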
Finally, as shown in Fig. 3, the obtained emotion words are input to the post-processing module, ranked by their emotion values, and the top 30% of the total are output as emotion words and stored in the dictionary.
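A minimal sketch of this selection step follows. Two details are assumptions not stated in the text: candidates are ranked by the absolute value of their emotion value, so that strongly negative words are kept as well as strongly positive ones, and the 30% cut-off is rounded up. The function name `select_top` and its `percent` parameter are hypothetical.

```python
def select_top(scored, percent=30):
    """Rank candidates by |emotion value| and keep the top `percent` percent.

    `scored` maps each candidate word to its signed emotion value.
    """
    ranked = sorted(scored.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    # Integer ceiling of len * percent / 100, avoiding float rounding issues.
    keep = (len(ranked) * percent + 99) // 100
    return ranked[:keep]
```

The sign of each retained value still gives the word's polarity, so the output can be written to the positive or negative section of the dictionary as appropriate.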
Because manually building a sentiment dictionary for text sentiment analysis is time-consuming and laborious, the invention proposes two methods for discovering emotion words automatically. Starting from a seed dictionary, emotion words are identified automatically from a large-scale corpus. The template-based method mainly handles new words that appear in network corpora (for example, microblog social platforms). Because some new words are expressed in ways similar to existing words, templates are used to identify the boundaries of new words, which skips new-word recognition and identifies emotion words directly. The naive-Bayes-based method uses the conditional independence assumption of naive Bayes to calculate the emotion value of a whole word from the emotion values of its characters, and can quickly and accurately extract emotion words from documents (news corpora).
It should be noted that although the above detailed description mentions several modules or units of a device for performing actions, this division is not mandatory. In fact, according to embodiments of the invention, the features and functions of two or more of the modules or units described above may be embodied in a single module or unit. Conversely, the features and functions of one module or unit described above may be further divided and embodied in multiple modules or units.
Other embodiments of the invention will readily occur to those skilled in the art after considering the specification and practicing the invention disclosed here. This application is intended to cover any variations, uses, or adaptations of the invention that follow its general principles and include common knowledge or customary technical means in the art not disclosed here. The specification and examples are to be regarded as exemplary only, with the true scope and spirit of the invention indicated by the claims.
It should be understood that the invention is not limited to the precise structures described above and shown in the drawings, and that various modifications and changes may be made without departing from its scope. The scope of the invention is limited only by the appended claims.

Claims (9)

1. An emotion word recognition method based on domain-specific text, characterized in that the method comprises the following steps:
Preprocessing: the corpus data is preprocessed;
Emotion word recognition: emotional expression words are obtained by template-based emotion word discovery and naive-Bayes-based emotion word discovery, and the polarity of the obtained emotional expression words is determined by the naive-Bayes-based emotion word discovery;
Post-processing: the emotional expression words obtained by the above emotion word recognition are ranked, and the highest-scoring candidate emotion words are selected as final emotion words for expanding the emotion word dictionary.
2. The method according to claim 1, characterized in that the preprocessing comprises cleaning and filtering, sentence splitting, and word segmentation of the corpus.
3. The method according to claim 1, characterized in that the template-based emotion word discovery algorithm targets newly emerging words and finds currently popular emotional expression words, and the naive-Bayes-based emotion word discovery algorithm targets formal, written words and mines conventional emotional expression words.
4. The method according to any one of claims 1-3, characterized in that the template-based emotion word discovery is implemented as follows:
Input: a seed word set $seed = \{word_1, word_2, \dots, word_n\}$ and the preprocessed corpus;
Step 1: extract all templates for the current seed set, a template being the word or punctuation mark immediately before a seed word together with the word or punctuation mark immediately after it;
Step 2: evaluate all templates and select the 5 highest-scoring templates as the extraction templates;
Step 3: extract instances or words using the extraction templates;
Step 4: evaluate all extracted words and select the highest-scoring ones as candidate emotion words;
Step 5: select the remaining seed words and return to Step 1 to continue emotion word extraction until the seed set is empty;
Output: the candidate emotion words with their scores.
5. The method according to claim 4, characterized in that the template evaluation formula used to evaluate all templates in Step 2 is as follows:
where $T$ denotes a template, $score(T)$ denotes the final score of the template, $s$ denotes an emotion word in the seed word set, and $Freq(T, s)$ denotes the number of times the template $T$ containing the seed emotion word $s$ occurs in the corpus; $\sigma = 2$ if $T$ contains a degree adverb, and $\sigma = 1$ otherwise;
The evaluation formula used to evaluate all extracted words in Step 4 is as follows:
where $word$ is an extracted emotion word and $score(word)$ is its final score; the frequency term denotes, within the template set $T$, how often a template $t$ containing the word $word$ occurs in the corpus; and $|T|$ denotes the number of templates in the set $T$.
6. The method according to any one of claims 1-3, characterized in that the naive-Bayes-based emotion word discovery is implemented as follows:
All texts in the corpus are organized into one file; the text in this file is segmented into words and split into sentences, and stop words are filtered out;
The tf-idf values of the preprocessed corpus are calculated;
The words with the largest tf-idf values are selected as keywords, i.e. candidate emotion words;
The seed emotion word dictionary is loaded, and the naive-Bayes-based emotion word recognition algorithm is used to calculate the sentiment orientation of each candidate emotion word, thereby obtaining the emotion words.
7. The method according to claim 6, characterized in that the weight of the candidate emotion word is calculated using the following formula:
Wherein S# denotes the weight of a candidate emotion word; w_i denotes a word, i = 1...n, so that w_1...w_i...w_n denotes one candidate emotion word; S* denotes a seed emotion word;
Wherein n(w_i, S*) denotes the number of times w_i and S* co-occur in the corpus, n(w_i) denotes the number of times w_i occurs in the corpus, and δ is a smoothing constant introduced so that a word that does not occur in the corpus does not produce a word frequency of 0;
Wherein Freq(S*) is the number of times the seed emotion word occurs in the corpus, the accompanying total is the number of occurrences of all words, and word_i denotes an extracted candidate emotion word;
Since each candidate emotion word has both a positive emotion weight and a negative emotion weight, the final emotional value is computed from these two weights.
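Because the formulas are not reproduced in this text, the sketch below only illustrates a naive-Bayes-style weight consistent with the symbol definitions above. All of its forms are assumptions: the prior is taken as Freq(S*) divided by the total word count, each conditional term as (n(w_i, S*) + δ) / (n(w_i) + δ), and the final emotional value as the positive weight minus the negative weight. The function name `nb_weight` and all counts are hypothetical.

```python
def nb_weight(words, cooccur, count, seed_freq, total_freq, delta=1.0):
    """Assumed form: P(S*) * product over i of (n(w_i, S*) + delta) / (n(w_i) + delta)."""
    weight = seed_freq / total_freq  # assumed prior: Freq(S*) / total occurrences
    for w in words:
        weight *= (cooccur.get(w, 0) + delta) / (count.get(w, 0) + delta)
    return weight

# Illustrative counts only.
candidate = ["really", "nice"]
count = {"really": 10, "nice": 4}          # n(w_i)
co_positive = {"really": 4, "nice": 3}     # n(w_i, S*) with positive seeds
co_negative = {"really": 4, "nice": 0}     # n(w_i, S*) with negative seeds

pos_weight = nb_weight(candidate, co_positive, count, seed_freq=50, total_freq=1000)
neg_weight = nb_weight(candidate, co_negative, count, seed_freq=50, total_freq=1000)

# Assumption: the final emotional value is the difference of the two weights.
emotion_value = pos_weight - neg_weight
```

With these counts the candidate co-occurs more often with positive seeds, so the difference is positive; the smoothing constant keeps the negative weight non-zero even though "nice" never co-occurs with a negative seed.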
8. An emotion word recognition system based on specific-field text for implementing the method of any one of claims 1-7, characterized in that the system comprises:
a preprocessing module, which performs data cleaning, word segmentation, and sentence splitting on the corpus data;
an emotion word recognition module, which finds candidate emotion words using the template-based emotion word discovery algorithm and the naive-Bayes-based emotion word discovery algorithm; the two algorithms are executed simultaneously, and the polarity of the candidate emotion words obtained by the template-based emotion word discovery is determined by the naive-Bayes-based emotion word discovery;
a post-processing module, to which the candidate emotion words with polarity obtained by the emotion word recognition module are input; the post-processing module sorts the candidate emotion words with polarity and selects the highest-scoring portion of the candidate words as the final emotion words, so as to expand the sentiment dictionary.
9. The system according to claim 8, characterized in that, when the corpus data is web page data, the preprocessing module additionally removes noise data and extracts the relevant text content.
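The post-processing step can be pictured with a short Python sketch. It is only an illustration: ranking by the absolute emotional value, the `keep_ratio` parameter, and the function name `expand_lexicon` are assumptions, since the claim says only that the highest-scoring portion of the candidates is kept.

```python
def expand_lexicon(lexicon, scored_candidates, keep_ratio=0.5):
    """Sort polar candidates and add the top-scoring portion to a copy of the lexicon."""
    # Assumption: rank by magnitude of the emotional value, ties broken alphabetically.
    ranked = sorted(scored_candidates.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    keep = max(1, int(len(ranked) * keep_ratio)) if ranked else 0
    expanded = dict(lexicon)
    for word, value in ranked[:keep]:
        # Existing entries are left unchanged; non-positive values are labeled negative.
        expanded.setdefault(word, "positive" if value > 0 else "negative")
    return expanded

# Illustrative data only.
lexicon = {"good": "positive"}
scored = {"crisp": 0.8, "laggy": -0.6, "okay": 0.1, "blue": 0.05}
expanded = expand_lexicon(lexicon, scored, keep_ratio=0.5)
```

With a keep ratio of 0.5, the two strongest candidates ("crisp" and "laggy") are added with their polarities, while the weakly scored words are discarded.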

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910316622.8A CN110069780B (en) 2019-04-19 2019-04-19 Specific field text-based emotion word recognition method

Publications (2)

Publication Number Publication Date
CN110069780A true CN110069780A (en) 2019-07-30
CN110069780B CN110069780B (en) 2021-11-19

Family

ID=67367977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910316622.8A Active CN110069780B (en) 2019-04-19 2019-04-19 Specific field text-based emotion word recognition method

Country Status (1)

Country Link
CN (1) CN110069780B (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310455A (en) * 2020-02-11 2020-06-19 安徽理工大学 New emotion word polarity calculation method for online shopping comments

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008151926A (en) * 2006-12-15 2008-07-03 Internatl Business Mach Corp <Ibm> Technique to search new phrase to be registered in voice processing dictionary
CN103544246A (en) * 2013-10-10 2014-01-29 清华大学 Method and system for constructing multi-emotion dictionary for internet
CN103955453A (en) * 2014-05-23 2014-07-30 清华大学 Method and device for automatically discovering new words from document set
CN104268197A (en) * 2013-09-22 2015-01-07 中科嘉速(北京)并行软件有限公司 Industry comment data fine grain sentiment analysis method
CN105912576A (en) * 2016-03-31 2016-08-31 北京外国语大学 Emotion classification method and emotion classification system
CN107305539A (en) * 2016-04-18 2017-10-31 南京理工大学 A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries
CN109947951A (en) * 2019-03-19 2019-06-28 北京师范大学 A kind of automatically updated emotion dictionary construction method for financial text analyzing
CN110889275A (en) * 2018-09-07 2020-03-17 鼎复数据科技(北京)有限公司 Information extraction method based on deep semantic understanding

Also Published As

Publication number Publication date
CN110069780B (en) 2021-11-19

Similar Documents

Publication Publication Date Title
Singh et al. Vectorization of text documents for identifying unifiable news articles
CN104462053B (en) A kind of personal pronoun reference resolution method based on semantic feature in text
CN104572958B (en) A kind of sensitive information monitoring method based on event extraction
Panchenko et al. Unsupervised does not mean uninterpretable: The case for word sense induction and disambiguation
WO2019080863A1 (en) Text sentiment classification method, storage medium and computer
CN109670039B (en) Semi-supervised e-commerce comment emotion analysis method based on three-part graph and cluster analysis
CN111221962B (en) Text emotion analysis method based on new word expansion and complex sentence pattern expansion
CN109376251A (en) A kind of microblogging Chinese sentiment dictionary construction method based on term vector learning model
CN111831802B (en) Urban domain knowledge detection system and method based on LDA topic model
CN106446109A (en) Acquiring method and device for audio file abstract
CN107729468A (en) Answer extracting method and system based on deep learning
CN103744953A (en) Network hotspot mining method based on Chinese text emotion recognition
CN108038205A (en) For the viewpoint analysis prototype system of Chinese microblogging
CN105760363B (en) Word sense disambiguation method and device for text file
CN108228569A (en) A kind of Chinese microblog emotional analysis method based on Cooperative Study under the conditions of loose
CN112052356A (en) Multimedia classification method, apparatus and computer-readable storage medium
CN104573030A (en) Textual emotion prediction method and device
CN109325122A (en) Vocabulary generation method, file classification method, device, equipment and storage medium
CN105589976B (en) Method and device is determined based on the target entity of semantic relevancy
CN111626050A (en) Microblog emotion analysis method based on expression dictionary and emotion common sense
Nguyen et al. An ensemble of shallow and deep learning algorithms for Vietnamese sentiment analysis
CN114722176A (en) Intelligent question answering method, device, medium and electronic equipment
JP2013131075A (en) Classification model learning method, device, program, and review document classifying method
Pay et al. An ensemble of automatic keyword extractors: TextRank, RAKE and TAKE
CN112926341A (en) Text data processing method and device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant