CN110069780B - Specific field text-based emotion word recognition method - Google Patents

Info

Publication number
CN110069780B
Authority
CN
China
Prior art keywords
word
words
emotion
emotional
template
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910316622.8A
Other languages
Chinese (zh)
Other versions
CN110069780A (en)
Inventor
张力文
程国艮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Glabal Tone Communication Technology Co ltd
Original Assignee
Glabal Tone Communication Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Glabal Tone Communication Technology Co ltd
Priority to CN201910316622.8A
Publication of CN110069780A
Application granted
Publication of CN110069780B
Legal status: Active
Anticipated expiration

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/33 Querying
    • G06F16/3331 Query processing
    • G06F16/3332 Query translation
    • G06F16/3335 Syntactic pre-processing, e.g. stopword elimination, stemming
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30 Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36 Creation of semantic tools, e.g. ontology or thesauri
    • G06F16/374 Thesaurus
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/247 Thesauruses; Synonyms
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • General Health & Medical Sciences (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides an emotion word recognition method based on a specific field text, which comprises the following steps: preprocessing, namely preprocessing the corpus data; emotion word recognition, namely obtaining emotion expression words by template-based emotion word discovery and naive Bayes-based emotion word discovery, and determining the polarity of the obtained emotion expression words by naive Bayes-based emotion word discovery; and post-processing, namely ranking the emotion expression words obtained by emotion word recognition, selecting the highest-scoring emotion candidate words, and using them as the final emotion words to expand an emotion word dictionary. The method identifies and extracts emotion words, outputs their positive or negative emotion polarity, requires no manually labeled corpus, and therefore achieves fully automatic emotion word recognition.

Description

Specific field text-based emotion word recognition method
Technical Field
The invention relates to the field of emotion word recognition, in particular to an emotion word and polarity recognition method and system based on a specific field text.
Background
In the era of big data and artificial intelligence, artificial intelligence systems are required not only to have human-like thinking and reasoning capabilities, but also to be able to perceive and express emotions. Emotion analysis, the process of analyzing, processing, summarizing and reasoning about subjective text that carries emotional coloring, is therefore both a hot topic and a difficult problem in current research. Like other natural language processing tasks, it first requires the support of resources, on the basis of which text emotion classification is carried out. The construction of resources is thus the cornerstone of all such tasks, and emotion resources generally include emotion dictionaries and emotion corpora. Emotion dictionaries are at present constructed mainly by manual screening, which is costly and yields dictionaries of small scale. Compared with English, the construction of and research on Chinese emotion dictionaries are still immature.
The invention provides an emotion dictionary expansion method based on a specific field, which provides corpus support for emotion analysis research in different fields and for different tasks.
Prior art CN106776566A discloses a method and a device for recognizing emotion words, in which the emotion words in a text must first be labeled with positive or negative emotion, and candidate emotion words are then determined from the labeling result and a chi-square statistical feature selection algorithm. Prior art CN107729374A discloses a method for expanding an emotion word dictionary, which selects a corpus according to the expansion direction, calculates the word-vector matching degree between the corpus and the emotion dictionary, and selects corpus data to expand the emotion dictionary according to the calculation result. The emotion word recognition methods adopted in this prior art differ substantially from the present scheme; moreover, because the present scheme requires no manual labeling, it achieves automatic recognition.
Disclosure of Invention
It is an object of the present invention to provide a system and method for emotion word recognition based on domain-specific text, which overcome, at least to some extent, one or more of the problems due to limitations and disadvantages of the related art.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
For the text emotion analysis task, the precision and recall of heuristic emotion analysis are determined by the accuracy and coverage of the dictionary; in addition, an emotion dictionary used as an external resource can effectively improve the accuracy of machine learning models. In the era of big data, large amounts of corpus data are easy to obtain, but precisely because the corpora are so large, an emotion dictionary cannot be constructed by manual screening. Moreover, with the development and spread of internet applications, internet slang and new expressions emerge endlessly; constructing the dictionary manually would consume enormous labor, and its scale could not be guaranteed. To address these shortcomings, the method automatically identifies the emotion words in the corpus, calculates their emotion weights, and expands the emotion dictionary according to their emotional tendency.
The invention first provides an emotion word recognition method based on a specific field text, which comprises the following steps:
preprocessing, namely preprocessing the corpus data, wherein the preprocessing comprises cleaning and filtering the corpus, sentence segmentation and word segmentation;
emotion word recognition, namely obtaining emotion expression words by template-based emotion word discovery and naive Bayes-based emotion word discovery, and determining the polarity of the obtained emotion expression words by naive Bayes-based emotion word discovery;
post-processing, namely ranking the emotion expression words obtained by emotion word recognition, selecting the highest-scoring emotion candidate words, and using them as the final emotion words to expand an emotion word dictionary.
Preferably, the template-based emotion word discovery algorithm targets newly appeared words and discovers currently popular emotion expression words; the naive Bayes-based emotion word discovery algorithm targets formal written words and discovers conventional emotion expression words.
Preferably, the implementation of emotion word discovery using templates is as follows:
Input: seed word set = {word_1, word_2, ..., word_n} and the preprocessed corpus;
Step 1: extracting all templates under the current seed set, a template being the word or punctuation mark immediately before the seed word together with the word or punctuation mark immediately after it;
Step 2: evaluating all the templates, and selecting the 5 templates with the highest scores as the extraction templates; the template evaluation formula is as follows:
Figure GDA0003105478480000021
wherein T represents a template, score(T) represents the final score of the template, s represents an emotion word in the seed word set, and Freq(T, s) represents the number of times the template T containing the seed emotion word s appears in the corpus; if T contains a degree adverb, σ = 2, and otherwise σ = 1;
Step 3: extracting instances or words using the extraction templates;
Step 4: evaluating all the extracted words and selecting the words with the highest scores as candidate emotion words, wherein the evaluation formula for the extracted words is as follows:
Figure GDA0003105478480000031
Figure GDA0003105478480000032
wherein word is an extracted emotion word; score(word) is the final score of the emotion word;
Figure GDA0003105478480000033
represents the frequency with which a template T in the template set, containing the word, appears in the corpus; |T| represents the number of templates in the template set;
Step 5: selecting the remaining seed words and returning to step 1 to continue extracting emotion words until the seed set is empty;
Output: candidate emotion words with scores.
Preferably, the implementation of emotion word discovery using naive Bayes is as follows:
organizing all documents in the corpus into a file, performing word segmentation and sentence segmentation operations on the text in the file, and filtering stop words;
calculating tf-idf values of the preprocessed linguistic data;
selecting the word with the largest tf-idf value as a keyword, namely a candidate emotion word;
and loading a seed emotional word dictionary, and calculating the emotional tendency of the candidate emotional words by using an emotional word recognition algorithm based on naive Bayes so as to obtain the emotional words.
The weight calculation of the candidate emotion words adopts the following formula:
Figure GDA0003105478480000034
wherein S# represents the weight of the candidate emotion word; w_i represents a character, i = 1, ..., n, so that w_1 ... w_i ... w_n represents a candidate emotion word; S* represents the seed emotion words;
Figure GDA0003105478480000041
wherein n(w_i, S*) represents the number of co-occurrences of w_i and S* in the corpus, n(w_i) represents the number of occurrences of w_i in the corpus, and δ is a constant introduced by data smoothing to prevent a frequency of 0 when a character does not occur in the corpus;
Figure GDA0003105478480000042
wherein Freq(S*) is the number of times the seed emotion words appear in the corpus,
Figure GDA0003105478480000043
represents the total number of occurrences of all words, and word represents an extracted emotion word, i.e. the aforementioned w_1 ... w_i ... w_n.
Because each candidate emotion word has a positive emotion weight
Figure GDA0003105478480000044
and a negative emotion weight
Figure GDA0003105478480000045
the final emotion value is
Figure GDA0003105478480000046
Meanwhile, the invention also provides an emotion word recognition system based on the specific field text, which is characterized in that: the system comprises:
the preprocessing module is used for carrying out data cleaning, word segmentation and sentence segmentation preprocessing on the corpus data, and when the corpus data is webpage data, the preprocessing module is also used for executing noise data removal and extracting related text contents;
the emotion word recognition module discovers candidate emotion words by adopting template-based emotion word discovery and naive Bayes-based emotion word discovery algorithms, the template-based emotion word discovery and the naive Bayes-based emotion word discovery algorithms are executed simultaneously, and the candidate emotion words obtained by the template-based emotion word discovery are subjected to emotion word polarity judgment through the naive Bayes-based emotion word discovery;
and the post-processing module receives the candidate emotion words with polarities obtained by the emotion word recognition module, ranks them, and selects the highest-scoring candidate words as the final emotion words to expand an emotion dictionary.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The accompanying drawings, which are incorporated in and constitute a part of this specification, illustrate embodiments consistent with the invention and together with the description, serve to explain the principles of the invention. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort. In the drawings:
FIG. 1 illustrates an emotional word recognition workflow of the present invention;
FIG. 2 shows a flow diagram of naive Bayes-based emotion word discovery according to an embodiment of the invention;
FIG. 3 illustrates the post-processing workflow of the present invention.
Detailed Description
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the examples set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. I.e. these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
Aiming at the problems in the prior art, the invention provides an emotion word recognition method based on a specific field text.
The invention is divided into three parts: a preprocessing module, an emotion word recognition module and a post-processing module.
As shown in FIG. 1, the data first enters the preprocessing module, which performs necessary processing such as cleaning, filtering, sentence segmentation and word segmentation. The processed data then enters the emotion word recognition module, which consists of two algorithms: template-based emotion word discovery and naive Bayes-based emotion word discovery. Both algorithms are applied to the data to discover candidate emotion words. The former is designed for newly appeared words and mainly discovers recently popular emotion expression words; the latter targets more formal written words and mainly discovers conventional emotion expression words. The two algorithms are executed simultaneously, but template-based emotion word discovery cannot determine the polarity of the words it finds (whether a word is a positive or a negative emotion word), so the polarity of those words is additionally determined by naive Bayes-based emotion word discovery. All candidate emotion words are then sent to the post-processing module, which ranks the emotion words selected by the algorithms and selects the highest-scoring candidates as the final emotion words used to expand the emotion dictionary.
The preprocessing module preprocesses the corpus data, for example by sentence segmentation and word segmentation. Different data sources are handled differently; for example, if the data is web page data, noise such as web page tags must be removed and the relevant text content extracted.
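The preprocessing stage described above can be sketched as follows. This is a minimal illustration, not the patented implementation: the function names, the regular expressions and the whitespace-and-punctuation tokenizer are all assumptions, and a real Chinese corpus would need a dedicated word segmenter in place of `tokenize`.

```python
import re
from html import unescape


def strip_markup(raw: str) -> str:
    """Remove script/style blocks and tags, then collapse whitespace."""
    text = re.sub(r"(?is)<(script|style)\b.*?</\1\s*>", " ", raw)
    text = re.sub(r"(?s)<[^>]+>", " ", text)
    return re.sub(r"\s+", " ", unescape(text)).strip()


def split_sentences(text: str) -> list[str]:
    """Split after Western or Chinese sentence-ending punctuation."""
    parts = re.split(r"(?<=[.!?。！？])\s*", text)
    return [p.strip() for p in parts if p.strip()]


def tokenize(sentence: str) -> list[str]:
    """Stand-in tokenizer (assumption): word runs and single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def preprocess(raw: str, is_web_page: bool = False) -> list[list[str]]:
    """Clean the input, then return one token list per sentence."""
    text = strip_markup(raw) if is_web_page else raw
    return [tokenize(s) for s in split_sentences(text)]
```

Punctuation marks are kept as tokens on purpose, because the template step uses the word or punctuation mark on either side of a seed word.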
The emotion word recognition module comprises two algorithms, namely emotion word discovery based on a template and emotion word discovery based on naive Bayes.
According to one embodiment of the invention, the emotion word discovery based on the template is realized by the following steps:
Input: seed word set = {word_1, word_2, ..., word_n} and the preprocessed corpus;
Step 1: extracting all templates under the current seed set, a template being the word or punctuation mark immediately before the seed word together with the word or punctuation mark immediately after it;
Step 2: evaluating all the templates, and selecting the 5 templates with the highest scores as the extraction templates; the template evaluation formula is as follows:
Figure GDA0003105478480000061
where T represents a template; for example, "too <word>" is a template. The algorithm first uses the seed emotion words to obtain the templates with the highest evaluation scores from the corpus, and then matches the found templates against the whole corpus to obtain candidate emotion words. For example, if the seed words include "power" and "dominance" and the collocations "too power" and "too dominance" occur with high frequency, the template "too <word>" is obtained. The whole corpus is then searched with this template; matches such as "too dazzling" and "too hole satay" are found, which finally yields emotion words such as "dazzling" and "hole satay". score(T) represents the final score of a template, s represents an emotion word in the seed word set, and Freq(T, s) represents the number of times the template T containing the seed emotion word s appears in the corpus. If T contains a degree adverb, σ = 2; otherwise σ = 1. Statistics show that emotion words often appear together with degree adverbs, which is why the variable σ is introduced: a template that contains a degree adverb receives a higher final score, and a template that does not (σ = 1) receives a lower one.
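Steps 1 and 2 can be sketched as below. The template evaluation formula itself survives here only as an image reference, so the scoring rule used in this sketch, score(T) = σ · Σ_s Freq(T, s), is an assumption based on the symbol definitions given with it. The boundary markers `<s>` and `</s>`, the `DEGREE_ADVERBS` list and the function names are likewise hypothetical.

```python
from collections import Counter

# Hypothetical degree-adverb list; a real system would load a curated lexicon.
DEGREE_ADVERBS = {"too", "very", "so"}


def extract_templates(sentences, seeds):
    """Count (previous token, next token) contexts around every seed word."""
    freq = Counter()
    for tokens in sentences:
        padded = ["<s>", *tokens, "</s>"]  # assumed sentence-boundary markers
        for i in range(1, len(padded) - 1):
            if padded[i] in seeds:
                # Summing over all seeds gives sum_s Freq(T, s) directly.
                freq[(padded[i - 1], padded[i + 1])] += 1
    return freq


def score_templates(freq, top_k=5):
    """Assumed score(T) = sigma * sum_s Freq(T, s); return the top_k templates."""
    scores = {}
    for (left, right), count in freq.items():
        has_adverb = left in DEGREE_ADVERBS or right in DEGREE_ADVERBS
        sigma = 2 if has_adverb else 1
        scores[(left, right)] = sigma * count
    ranked = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    return ranked[:top_k]
```

With the seeds "awesome" and "fierce", the context ("too", "!") outranks a context without a degree adverb both because it is more frequent and because σ doubles its score.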
Step 3: extracting instances or words using the extraction templates;
Step 4: evaluating all the extracted words, and selecting the words with the highest scores as the candidate emotion words. After emotion words have been extracted with the templates, a preliminary screening is needed. A word should occur in as many templates as possible and, at the same time, be distributed as uniformly as possible across those templates. The extracted words are therefore evaluated with the following formula:
Figure GDA0003105478480000071
Figure GDA0003105478480000072
wherein, word is the extracted emotional words; score (word) is the final score of the emotion word;
Figure GDA0003105478480000073
represents the frequency with which a template T in the template set, containing the word, appears in the corpus; |T| represents the number of templates in the template set;
Step 5: selecting the remaining seed words and returning to step 1 to continue extracting emotion words until the seed set is empty;
Output: candidate emotion words with scores.
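Steps 3 and 4 can be sketched as follows. The word evaluation formula is available here only as image references, so `score_word` is a hypothetical stand-in rather than the patented formula: it rewards total frequency and, through the entropy of the word's distribution over templates, an even spread across many templates, then normalizes by |T|. The boundary markers and function names are assumptions as well, and the outer loop of step 5 is omitted.

```python
import math
from collections import Counter, defaultdict


def extract_words(sentences, templates, seeds=frozenset()):
    """Return {word: Counter({template: frequency})} for tokens that fill a template slot."""
    hits = defaultdict(Counter)
    wanted = set(templates)
    for tokens in sentences:
        padded = ["<s>", *tokens, "</s>"]  # assumed sentence-boundary markers
        for i in range(1, len(padded) - 1):
            template = (padded[i - 1], padded[i + 1])
            if template in wanted and padded[i] not in seeds:
                hits[padded[i]][template] += 1
    return hits


def score_word(template_counts, n_templates):
    """Hypothetical score: total frequency * (1 + entropy over templates) / |T|."""
    total = sum(template_counts.values())
    entropy = -sum(
        (c / total) * math.log(c / total) for c in template_counts.values()
    )
    return total * (1.0 + entropy) / n_templates


def rank_candidates(sentences, templates, seeds=frozenset()):
    """Score every extracted word and sort from highest to lowest score."""
    hits = extract_words(sentences, templates, seeds)
    scored = {w: score_word(c, len(templates)) for w, c in hits.items()}
    return sorted(scored.items(), key=lambda kv: (-kv[1], kv[0]))
```

Under this stand-in, a word seen once in each of two templates ranks above a word seen twice in a single template, which matches the stated preference for words that occur in many templates and are spread evenly across them.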
According to another embodiment of the invention, as shown in FIG. 2, naive Bayes-based emotion word discovery is realized as follows:
Statistics over large-scale corpora reveal a phenomenon: certain characters appear frequently in emotion words. The method therefore assumes that each character in the corpus has an emotion probability. Using the naive Bayes method, the emotional tendency of a candidate emotion word is calculated from the emotion probabilities of the characters it contains, and emotion words are thereby selected from the large-scale corpus.
First, all documents in the corpus are organized into a single file. Sentence segmentation and word segmentation are then performed on the text, and stop words are filtered out. Next, the tf-idf values of the preprocessed corpus are calculated (tf-idf is a weighting technique commonly used in information retrieval and data mining to assess how important a word is to one document in a collection or corpus). The words with the largest values are selected as keywords, namely the candidate emotion words. Finally, the seed emotion word dictionary is loaded, and the emotion words are obtained with the naive Bayes-based emotion word recognition algorithm.
Given the preprocessed corpus and a seed emotion word dictionary containing positive emotion words S_positive and negative emotion words S_negative, suppose w_i represents a character, i = 1, ..., n, so that w_1 ... w_i ... w_n represents a candidate emotion word. Determining whether the candidate is an emotion word, and the degree of its emotional tendency, can be abstracted into the following mathematical expression:
Figure GDA0003105478480000081
where S# represents the emotion weight of the candidate emotion word, P(S* | w_1 ... w_i ... w_n) represents the probability that the candidate word w_1 ... w_i ... w_n is an emotion word, and P(w_1 ... w_i ... w_n), the probability of the candidate word in the corpus, is a constant and can be ignored.
The conditional independence assumption of naive Bayes gives:
P(S*, w_1, w_2, ..., w_n) = P(w_1 | S*) P(w_2 | S*) ··· P(w_n | S*) P(S*), where S* represents a seed emotion word;
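For reference, the step from Bayes' rule to this factorized form can be written out as follows. It uses only the factorization stated above and the remark that the probability of the candidate word is a constant; it is a restatement of the reasoning and is not a reproduction of the weight formula that follows.

```latex
\begin{aligned}
P(S^{*} \mid w_1 \ldots w_n)
  &= \frac{P(S^{*}, w_1, \ldots, w_n)}{P(w_1 \ldots w_n)} \\
  &= \frac{P(S^{*}) \prod_{i=1}^{n} P(w_i \mid S^{*})}{P(w_1 \ldots w_n)} \\
  &\propto P(S^{*}) \prod_{i=1}^{n} P(w_i \mid S^{*})
\end{aligned}
```

Because the denominator is the same for the positive and the negative seed class, dropping it does not change which class scores higher.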
the weight of the candidate emotion words is calculated as follows:
Figure GDA0003105478480000082
The calculation of each part of the formula is explained below. Formula (1) calculates the emotion probability of a character in the corpus; by the law of large numbers, the probability is estimated by frequency. To prevent a frequency of 0 when a character does not appear in the corpus, data smoothing is applied and a constant δ is introduced (which also prevents a denominator of 0). Here n(w_i, S*) represents the number of co-occurrences of w_i and S* in the corpus, and n(w_i) represents the number of occurrences of w_i in the corpus.
Figure GDA0003105478480000091
Formula (2) represents the distribution of the emotion words S* in the corpus, where Freq(S*) is the number of times the emotion words appear in the corpus and
Figure GDA0003105478480000092
represents the total number of occurrences of all words;
Figure GDA0003105478480000093
Since each candidate emotion word has both a positive emotion weight and a negative emotion weight, namely
Figure GDA0003105478480000094
and
Figure GDA0003105478480000095
the final emotion value is calculated as
Figure GDA0003105478480000096
If S is greater than 0, the word is a positive emotion word; otherwise it is a negative emotion word.
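The weight and polarity calculation can be sketched as below. Formulas (1), (2) and the final emotion value are available here only as image references, so every formula in this sketch is an assumption reconstructed from the symbol definitions: the smoothed character probability is taken as (n(w_i, S*) + δ) / (n(w_i) + 2δ), the prior as Freq(S*) divided by the total word count, the weight as the prior times the product of character probabilities, and the final value as the positive weight minus the negative weight. The co-occurrence counts are passed in precomputed, since how co-occurrence is counted is not specified.

```python
import math


def char_prob(char, cooc, char_count, delta=1.0):
    """Assumed smoothing: (n(w_i, S*) + delta) / (n(w_i) + 2 * delta)."""
    return (cooc.get(char, 0) + delta) / (char_count.get(char, 0) + 2 * delta)


def weight(word, cooc, char_count, seed_freq, total_freq, delta=1.0):
    """Assumed S# = P(S*) * prod_i P(w_i | S*), with P(S*) = Freq(S*) / total."""
    prior = seed_freq / total_freq
    return prior * math.prod(char_prob(c, cooc, char_count, delta) for c in word)


def polarity(word, pos, neg, char_count, total_freq, delta=1.0):
    """pos and neg are (co-occurrence dict, seed frequency) pairs.
    Assumed final value S = S#_positive - S#_negative; S > 0 means positive."""
    s_pos = weight(word, pos[0], char_count, pos[1], total_freq, delta)
    s_neg = weight(word, neg[0], char_count, neg[1], total_freq, delta)
    s = s_pos - s_neg
    return s, ("positive" if s > 0 else "negative")
```

For long candidate words the product of probabilities becomes very small; summing logarithms instead would be a common alternative, at the cost of no longer matching the product form literally.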
Finally, as shown in FIG. 3, the obtained emotion words are input into the post-processing module and sorted by their emotion values; the top 30% are output as emotion words and stored in the dictionary.
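A minimal sketch of this post-processing step follows. The text does not say whether ranking uses the signed emotion value or its magnitude; this sketch assumes the magnitude, so that strongly negative words are kept alongside strongly positive ones. The function names and the rounding rule are also assumptions.

```python
def select_top(scored, fraction=0.30):
    """scored: {word: emotion value}. Keep the top fraction ranked by |value|
    (assumed ranking key), always keeping at least one word if any exist."""
    ranked = sorted(scored.items(), key=lambda kv: (-abs(kv[1]), kv[0]))
    keep = max(1, round(len(ranked) * fraction)) if ranked else 0
    return ranked[:keep]


def expand_dictionary(dictionary, selected):
    """Add each selected word with its polarity label; existing entries are kept."""
    for word, value in selected:
        dictionary.setdefault(word, "positive" if value > 0 else "negative")
    return dictionary
```

Using `setdefault` means a word already in the seed dictionary keeps its original label, which is one reasonable choice when automatic scores and curated entries disagree.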
The invention provides two methods for automatically discovering emotion words, addressing the fact that manually constructing an emotion dictionary for text emotion analysis is time-consuming and labor-intensive. With the help of a seed dictionary, emotion words are automatically recognized from a large-scale corpus. The template-based emotion word recognition method mainly targets new words appearing in internet corpora (for example, social platforms such as microblogs). Because new words are expressed in ways similar to existing words, the templates are used to identify the boundaries of new words, so that a separate new-word recognition step is skipped and emotion words are recognized directly. The naive Bayes-based emotion word recognition method uses the conditional independence assumption of naive Bayes to calculate the emotion value of a whole word from the emotion values of its characters, and can quickly and accurately extract emotion words from formal text (news corpora).
It should be noted that although several modules or units of the device for action execution are mentioned in the above detailed description, such a division is not mandatory. Indeed, according to embodiments of the invention, the features and functions of two or more modules or units described above may be embodied in one module or unit. Conversely, the features and functions of one module or unit described above may be further divided among and embodied in a plurality of modules or units.
Other embodiments of the invention will be apparent to those skilled in the art from consideration of the specification and practice of the invention disclosed herein. This application is intended to cover any variations, uses, or adaptations of the invention following, in general, the principles of the invention and including such departures from the present disclosure as come within known or customary practice within the art to which the invention pertains. It is intended that the specification and examples be considered as exemplary only, with a true scope and spirit of the invention being indicated by the following claims.
It will be understood that the invention is not limited to the precise arrangements described above and shown in the drawings and that various modifications and changes may be made without departing from the scope thereof. The scope of the invention is limited only by the appended claims.

Claims (5)

1. A method for recognizing emotion words based on a text in a specific field is characterized by comprising the following steps:
preprocessing, namely preprocessing the corpus data;
recognizing the emotional words, calculating and obtaining emotional expression words by utilizing template-based emotional word discovery and naive Bayes-based emotional word discovery, and judging the polarity of the obtained emotional expression words by utilizing naive Bayes-based emotional word discovery; the emotion word discovery method based on the template comprises the following implementation modes:
Input: seed word set = {word_1, word_2, ..., word_n} and the preprocessed corpus,
Step 1: extracting all templates under the current seed set, a template being the word or punctuation mark immediately before the seed word together with the word or punctuation mark immediately after it,
Step 2: evaluating all the templates and selecting the 5 templates with the highest scores as the extraction templates, wherein the template evaluation formula adopted for evaluating all the templates is as follows:
Figure FDA0003105478470000011
wherein T represents a template, score(T) represents the final score of the template, s represents an emotion word in the seed word set, and Freq(T, s) represents the number of times the template T containing the seed emotion word s appears in the corpus; if T contains a degree adverb, σ = 2, and otherwise σ = 1,
Step 3: extracting instances or words using the extraction templates,
Step 4: evaluating all the extracted words and selecting the words with the highest scores as candidate emotion words, wherein the evaluation formula for the extracted words is as follows:
Figure FDA0003105478470000012
Figure FDA0003105478470000013
wherein, word is the extracted emotional words; score (word) is the final score of the emotion word;
Figure FDA0003105478470000014
represents the frequency with which a template T in the template set, containing the word, appears in the corpus; |T| represents the number of templates in the template set,
Step 5: selecting the remaining seed words and returning to step 1 to continue extracting emotion words until the seed set is empty,
Output: candidate emotion words with scores;
post-processing, namely ranking the emotion expression words obtained by emotion word recognition, selecting the highest-scoring emotion candidate words, and using them as the final emotion words to expand an emotion word dictionary.
2. The method of claim 1, wherein: the preprocessing comprises cleaning and filtering the corpus, sentence segmentation and word segmentation.
3. The method of claim 1, wherein: the template-based emotion word discovery algorithm targets newly appeared words and discovers currently popular emotion expression words; the naive Bayes-based emotion word discovery algorithm targets formal written words and discovers conventional emotion expression words.
4. The method of any one of claims 1-3, wherein: the naive Bayes-based emotion word discovery is implemented as follows:
merging all documents in the corpus into one file, performing word segmentation and sentence segmentation on the text in the file, and filtering stop words;
calculating tf-idf values for the preprocessed corpus;
selecting the word with the largest tf-idf value as a keyword, namely a candidate emotion word;
loading a seed emotional word dictionary, and calculating the emotional tendency of the candidate emotional words by using the naive Bayes-based emotion word recognition algorithm, so as to obtain the emotional words.
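The tf-idf selection step can be sketched as follows. The claim does not specify a tf-idf variant, so this example assumes relative term frequency within a document and idf = log(N / df), and it ranks each word by its highest tf-idf value over the documents. The function name, the `top_k` parameter, and the toy data are hypothetical.

```python
import math
from collections import Counter


def tfidf_keywords(docs, stop_words=frozenset(), top_k=3):
    """Return the top_k words ranked by their best tf-idf value in any document.

    docs is a list of already segmented documents (lists of words).
    """
    docs = [[w for w in d if w not in stop_words] for d in docs]
    n_docs = len(docs)
    # Document frequency: number of documents containing each word.
    df = Counter(w for d in docs for w in set(d))
    best = {}
    for d in docs:
        if not d:
            continue
        for w, count in Counter(d).items():
            score = (count / len(d)) * math.log(n_docs / df[w])
            best[w] = max(best.get(w, 0.0), score)
    # Sort by descending score, breaking ties alphabetically.
    return sorted(best, key=lambda w: (-best[w], w))[:top_k]


docs = [
    ["the", "screen", "is", "gorgeous", "gorgeous"],
    ["the", "battery", "is", "weak"],
    ["the", "screen", "is", "fine"],
]
candidates = tfidf_keywords(docs, stop_words={"the", "is"}, top_k=2)
```

Words that occur in every document get an idf of zero under this variant, so they fall to the bottom of the ranking even when they survive the stop-word filter.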
5. The method of claim 4, wherein: the weight calculation of the candidate emotion words adopts the following formula:
Figure FDA0003105478470000021
wherein S# represents the weight of the candidate emotional word; w_i represents a word, i = 1, ..., n, so that w_1 ... w_i ... w_n represents a candidate emotional word; S* represents the seed emotion words;
Figure FDA0003105478470000022
wherein n(w_i, S*) represents the number of co-occurrences of w_i and S* in the corpus, n(w_i) represents the number of occurrences of w_i in the corpus, and δ is a smoothing constant introduced to prevent a word frequency of 0 when the word does not occur in the corpus;
Figure FDA0003105478470000031
wherein Freq(S*) is the number of times the seed emotion word appears in the corpus,
Figure FDA0003105478470000032
represents the total number of occurrences of all words, and word_i represents an extracted candidate emotional word;
because each candidate emotional word has a positive emotional weight
Figure FDA0003105478470000033
and a negative emotional weight
Figure FDA0003105478470000034
the final sentiment value is:
Figure FDA0003105478470000035
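The weight calculation of claim 5 can be sketched as below. The formulas survive here only as figure references, so every formula in the code is an assumption made for illustration: the conditional term is taken as (n(w_i, S*) + δ) / (n(w_i) + δ), the prior as Freq(S*) divided by the total word count, the weight S# as the prior times the product of the conditional terms, and the final sentiment value as the positive weight minus the negative weight. All function and parameter names are hypothetical.

```python
from math import prod


def conditional(n_co, n_w, delta=1.0):
    """Assumed smoothed estimate: (n(w_i, S*) + delta) / (n(w_i) + delta)."""
    return (n_co + delta) / (n_w + delta)


def prior(seed_freq, total_freq):
    """Assumed prior: Freq(S*) over the total number of word occurrences."""
    return seed_freq / total_freq


def weight(words, cooc, counts, seed_freq, total_freq, delta=1.0):
    """Assumed naive Bayes weight S#: prior times product of conditional terms."""
    return prior(seed_freq, total_freq) * prod(
        conditional(cooc.get(w, 0), counts.get(w, 0), delta) for w in words
    )


def sentiment(words, pos, neg, counts, total_freq, delta=1.0):
    """Assumed final value: positive weight minus negative weight."""
    s_pos = weight(words, pos["cooc"], counts, pos["freq"], total_freq, delta)
    s_neg = weight(words, neg["cooc"], counts, neg["freq"], total_freq, delta)
    return s_pos - s_neg


# Toy counts: the candidate occurs 4 times, 3 of them next to positive seeds.
counts = {"gorgeous": 4}
pos = {"cooc": {"gorgeous": 3}, "freq": 10}
neg = {"cooc": {"gorgeous": 0}, "freq": 10}
value = sentiment(["gorgeous"], pos, neg, counts, total_freq=100)
```

Under these assumptions a positive value marks the candidate as leaning positive and a negative value as leaning negative; with long candidates, summing logarithms instead of multiplying probabilities would avoid underflow.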
CN201910316622.8A 2019-04-19 2019-04-19 Specific field text-based emotion word recognition method Active CN110069780B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910316622.8A CN110069780B (en) 2019-04-19 2019-04-19 Specific field text-based emotion word recognition method

Publications (2)

Publication Number Publication Date
CN110069780A (en) 2019-07-30
CN110069780B (en) 2021-11-19

Family

ID=67367977

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910316622.8A Active CN110069780B (en) 2019-04-19 2019-04-19 Specific field text-based emotion word recognition method

Country Status (1)

Country Link
CN (1) CN110069780B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111310455B (en) * 2020-02-11 2022-09-20 安徽理工大学 New emotion word polarity calculation method for online shopping comments

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008151926A (en) * 2006-12-15 2008-07-03 Internatl Business Mach Corp <Ibm> Technique to search new phrase to be registered in voice processing dictionary
CN103544246A (en) * 2013-10-10 2014-01-29 清华大学 Method and system for constructing multi-emotion dictionary for internet
CN103955453A (en) * 2014-05-23 2014-07-30 清华大学 Method and device for automatically discovering new words from document set
CN104268197A (en) * 2013-09-22 2015-01-07 中科嘉速(北京)并行软件有限公司 Industry comment data fine grain sentiment analysis method
CN105912576A (en) * 2016-03-31 2016-08-31 北京外国语大学 Emotion classification method and emotion classification system
CN107305539A (en) * 2016-04-18 2017-10-31 南京理工大学 A kind of text tendency analysis method based on Word2Vec network sentiment new word discoveries
CN109947951A (en) * 2019-03-19 2019-06-28 北京师范大学 A kind of automatically updated emotion dictionary construction method for financial text analyzing
CN110889275A (en) * 2018-09-07 2020-03-17 鼎复数据科技(北京)有限公司 Information extraction method based on deep semantic understanding

Also Published As

Publication number Publication date
CN110069780A (en) 2019-07-30

Similar Documents

Publication Publication Date Title
CN108363790B (en) Method, device, equipment and storage medium for evaluating comments
CN109299480B (en) Context-based term translation method and device
CN109710947B (en) Electric power professional word bank generation method and device
CN109101478B (en) Aspect-level emotion analysis method for E-commerce comment text
CN110705206B (en) Text information processing method and related device
CN107273358B (en) End-to-end English chapter structure automatic analysis method based on pipeline mode
US10831993B2 (en) Method and apparatus for constructing binary feature dictionary
CN111444330A (en) Method, device and equipment for extracting short text keywords and storage medium
CN107688630B (en) Semantic-based weakly supervised microbo multi-emotion dictionary expansion method
CN111858935A (en) Fine-grained emotion classification system for flight comment
CN113704416B (en) Word sense disambiguation method and device, electronic equipment and computer-readable storage medium
CN112434164B (en) Network public opinion analysis method and system taking topic discovery and emotion analysis into consideration
CN112052356A (en) Multimedia classification method, apparatus and computer-readable storage medium
CN109325122A (en) Vocabulary generation method, file classification method, device, equipment and storage medium
CN104573030A (en) Textual emotion prediction method and device
CN103678565A (en) Domain self-adaption sentence alignment system based on self-guidance mode
CN112699232A (en) Text label extraction method, device, equipment and storage medium
CN107357895A (en) A kind of processing method of the text representation based on bag of words
CN112860896A (en) Corpus generalization method and man-machine conversation emotion analysis method for industrial field
Boag et al. Twitterhawk: A feature bucket based approach to sentiment analysis
CN114491062B (en) Short text classification method integrating knowledge graph and topic model
CN107577713A (en) Text handling method based on electric power dictionary
CN110069780B (en) Specific field text-based emotion word recognition method
CN106021225B (en) A kind of Chinese Maximal noun phrase recognition methods based on the simple noun phrase of Chinese
CN111681731A (en) Method for automatically marking colors of inspection report

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant