CN112711941B - Emotional score analysis processing method based on emotional dictionary entity - Google Patents

Emotional score analysis processing method based on emotional dictionary entity

Info

Publication number
CN112711941B
Authority
CN
China
Prior art keywords
entity
emotion
emotional
entities
score
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110021645.3A
Other languages
Chinese (zh)
Other versions
CN112711941A (en)
Inventor
张娴
王盼盼
周庆勇
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Inspur Cloud Information Technology Co Ltd
Original Assignee
Inspur Cloud Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Inspur Cloud Information Technology Co Ltd
Priority to CN202110021645.3A
Publication of CN112711941A
Application granted
Publication of CN112711941B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/237 Lexical tools
    • G06F40/242 Dictionaries
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Health & Medical Sciences (AREA)
  • Artificial Intelligence (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a method for analyzing and processing emotion scores based on emotion dictionary entities. It belongs to the field of natural language processing and comprises six steps: 1) preparing the dictionaries; 2) defining the entity structure; 3) establishing an entity comparator; 4) traversing the text to be analyzed according to the established entities to generate all candidate entities; 5) screening the candidate entities; and 6) calculating the emotion score. The method creates entities from four dictionaries, including an emotion dictionary, and processes the entity traversal at a fine granularity, which reduces errors.

Description

Emotional score analysis processing method based on emotional dictionary entity
Technical Field
The invention relates to the field of natural language processing, in particular to an emotion score analysis processing method based on an emotion dictionary entity.
Background
What is sentiment analysis? Briefly, it is the process of analyzing, processing, summarizing, and reasoning about subjective text that carries emotional coloring. The internet (for example blogs, forums, and social service networks such as Dianping and Meituan) generates a great deal of valuable review information about people, events, products, and so on. These comments express people's emotional coloring and tendencies, such as joy, anger, sorrow, delight, criticism, and praise. By browsing such subjective comments, a potential user can learn what the public thinks of a certain event or product. The rapid development of this field has benefited from the rapid growth of social media on the web, such as product reviews, forum discussions, and microblogs. Since the early 2000s, sentiment analysis has grown into one of the most active research areas in natural language processing (NLP) and has been widely studied in data mining, web mining, text mining, and information retrieval. At present, emotional tendency is mainly analyzed either by text classification methods or by dictionary-based methods. The classification methods have the drawback that the labels of the training samples must be annotated manually, which consumes manpower and material resources; the dictionary-based calculation methods either consider only one kind of dictionary, the emotion dictionary, or make certain errors when searching for emotion words.
Disclosure of Invention
To solve the above technical problems, the invention provides an emotion score analysis processing method based on emotion dictionary entities. It processes the entity traversal at a fine granularity to reduce errors, and aims to perform emotion score analysis on unstructured emotional text data through text processing and statistical methods.
The technical scheme of the invention is as follows:
An emotion score analysis processing method based on emotion dictionary entities,
comprising six steps:
1) Preparing the dictionaries;
2) Defining the entity structure;
3) Establishing an entity comparator;
4) Traversing the text to be analyzed according to the established entities to generate all candidate entities;
5) Screening the candidate entities;
6) Calculating the emotion score.
Further,
four dictionaries need to be prepared first: emotion words, degree adverbs, negation words, and punctuation marks.
The four dictionaries come from a general-purpose dictionary or from a custom dictionary for a specific industry, according to specific requirements; wherein
positive emotion words are assigned positive scores, and the stronger the emotion, the higher the score; negative emotion words are assigned negative scores, and the stronger the emotion, the lower the score; each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses, and in general the higher the degree it represents, the larger the score; the negation word dictionary is a simple list of negation words; the punctuation dictionary contains the commonly used symbols that break sentences or paragraphs.
Further,
the entity structure comprises an entity name, an entity start index, an entity end index, an entity type, and an entity length, where the entity type is one of emotion word, degree adverb, negation word, or punctuation mark.
Further,
an entity comparator is established. Given two entities, entity one and entity two: if the start position of entity one is greater than that of entity two, return 1; if it is smaller, return -1; if the start positions are equal, compare the lengths of the two entities: if the length of entity one is greater than that of entity two, return 1; otherwise return -1.
Further,
candidate entities are generated as follows. Given a text to be analyzed, the four dictionaries are traversed in turn; if a word in a dictionary appears in the text, a corresponding entity is constructed from that word and put into the candidate entity list. After the four dictionaries have been traversed, all candidate entities have been generated, and they are sorted with the defined comparator, so the candidate entity list is ordered by start position.
Further,
when screening entities, the candidate entity list is searched iteratively. If subsequent entities have the same start index as the current entity, the longest of them is taken as the entity for the current index. The start index of the next word must be greater than the end index of that longest entity; an entity whose index is smaller than the end index of the previous entity is skipped directly and the next entity is examined. The required entity list is finally obtained.
Further,
the generated final entity list is traversed. If the current entity is not an emotion entity, it is skipped directly. If it is an emotion entity, the method searches forward from the entity's position for the nearest emotion entity or punctuation mark entity and uses that position as an index; the number of all emotion entities is recorded at the same time.
The emotion score of the current emotion entity is then calculated. The initial weight of the emotion entity is the score of the emotion word. The negation entities and degree adverb entities that appear between the emotion entity and the position index are found, excluding the case "degree adverb + degree adverb + emotion word". The score of the emotion entity is: (degree adverb entity score) ^ (number of degree adverb entities) × (-1) ^ (number of negation word entities) × initial weight.
All emotion entities are traversed and their emotion scores are summed to obtain the emotion score of the text to be analyzed; if normalization is required, the sum can be divided by the number of emotion entities.
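The score expression above can be read as a formula. The symbols below are introduced here for illustration and do not appear in the original; the reading assumes that the degree adverb score is raised to the number of degree adverbs found and that the result is multiplied by the sign factor and the initial weight:

```latex
S_e = d^{\,n_d} \times (-1)^{n_{\mathrm{neg}}} \times w_e,
\qquad
S_{\mathrm{text}} = \sum_{e=1}^{N} S_e,
\qquad
\bar{S}_{\mathrm{text}} = \frac{S_{\mathrm{text}}}{N}
```

Here w_e is the score of the emotion word (the initial weight), d is the degree adverb score, n_d and n_neg are the numbers of degree adverb and negation word entities found for that emotion entity, and N is the number of emotion entities in the text. With the hypothetical values w_e = 1, d = 1.5, n_d = 1, and n_neg = 1, the entity score is 1.5 × (-1) × 1 = -1.5.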
The invention has the following advantages:
1. The invention is not limited to a specific field or scenario; the emotional text to be analyzed can come from fields such as news, product reviews, and public opinion analysis.
2. Text analysis usually begins with word segmentation, which introduces a certain segmentation error. The method does not perform basic operations such as word segmentation on the text to be analyzed, which improves accuracy to some extent.
3. The custom method of the invention uses four dictionaries and adds punctuation mark entities for sentences or paragraphs, which improves the accuracy of searching among entities; it also finds the modifier entities that modify an emotion entity and changes the weight accordingly.
Drawings
FIG. 1 is a schematic of the work flow of the present invention.
Detailed Description
To make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions in the embodiments are described below clearly and completely with reference to the drawings. The described embodiments are obviously some, but not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art on the basis of these embodiments without creative effort fall within the protection scope of the present invention.
The invention provides an emotion score analysis processing method based on emotion dictionary entities, which is mainly realized by the following technical scheme and specifically comprises the following steps:
1. Dictionary preparation
First, four dictionaries need to be prepared: emotion words, degree adverbs, negation words, and punctuation marks. According to specific requirements, the four dictionaries can come from a general-purpose dictionary or from a custom dictionary for a specific industry. Each emotion word in the emotion word dictionary is assigned a score that expresses the strength of the emotion: in general, positive emotion words are assigned positive scores, and the stronger the emotion, the higher the score; negative emotion words are assigned negative scores, and the stronger the emotion, the lower the score. Each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses, and in general the higher the degree it represents, the larger the score. The negation word dictionary is a simple list of negation words. The punctuation dictionary contains the commonly used symbols that break sentences or paragraphs.
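As a minimal sketch of what the four dictionaries might look like in code: every word, score, and name below is a hypothetical illustration, not content from the patent. English words are used for readability, and the requirement that degree adverb scores be positive is an assumption of this sketch.

```python
# Hypothetical dictionary contents; the words and scores are illustrative only.

# Emotion words: positive words get positive scores, negative words get
# negative scores; a larger magnitude means a stronger emotion.
EMOTION_WORDS = {"happy": 1.0, "delighted": 2.0, "sad": -1.0, "furious": -2.0}

# Degree adverbs: a higher score means a stronger degree.
DEGREE_ADVERBS = {"slightly": 0.5, "very": 1.5, "extremely": 2.0}

# Negation words carry no score of their own.
NEGATION_WORDS = {"not", "never"}

# Symbols that end a sentence or clause.
PUNCTUATION = {",", ".", "!", "?", ";"}


def check_dictionaries(emotion_words, degree_adverbs):
    """Return True if every emotion score is non-zero and every degree score is positive.

    The positivity check on degree scores is an assumption of this sketch.
    """
    emotions_ok = all(score != 0 for score in emotion_words.values())
    degrees_ok = all(score > 0 for score in degree_adverbs.values())
    return emotions_ok and degrees_ok
```

A domain-specific deployment would swap in its own word lists and scores while keeping the same four-dictionary shape.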
2. Defining the structure of an entity
The entity structure comprises an entity name, an entity start index, an entity end index, an entity type, and an entity length, where the entity type is one of emotion word, degree adverb, negation word, or punctuation mark. The subsequent calculation steps use these attributes of the entity.
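One possible way to represent this structure is sketched below. The class and field names are hypothetical. The sketch also assumes that the end index is exclusive (Python slice convention) and derives the length from the two indices instead of storing it; the text does not specify either point.

```python
from dataclasses import dataclass
from enum import Enum


class EntityType(Enum):
    EMOTION = "emotion_word"
    DEGREE = "degree_adverb"
    NEGATION = "negation_word"
    PUNCTUATION = "punctuation"


@dataclass(frozen=True)
class Entity:
    name: str          # the matched dictionary word
    start: int         # index of the first character in the text
    end: int           # index one past the last character (assumed exclusive)
    type: EntityType   # which of the four dictionaries the word came from

    @property
    def length(self) -> int:
        # Derived here so it can never disagree with start and end.
        return self.end - self.start


# Example: the word "very" found at characters 4..7 of some text.
example = Entity(name="very", start=4, end=8, type=EntityType.DEGREE)
```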
3. Building an entity comparator
Given two entities, entity one and entity two: if the start position of entity one is greater than that of entity two, return 1; if it is smaller, return -1; if the start positions are equal, compare the lengths of the two entities: if the length of entity one is greater than that of entity two, return 1; otherwise return -1.
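The comparator described above can be sketched as follows. The `Entity` tuple and the sample entities are hypothetical. Note that, as described, the comparator never returns 0, so two entities with the same start and length are still ordered; Python's `functools.cmp_to_key` is used here to plug it into `sorted`.

```python
from collections import namedtuple
from functools import cmp_to_key

# Hypothetical entity record: name, start index, end index, type, length.
Entity = namedtuple("Entity", "name start end type length")


def compare_entities(first, second):
    """Order by start position, then by length (shorter first)."""
    if first.start > second.start:
        return 1
    if first.start < second.start:
        return -1
    # Equal start positions: the longer entity sorts later.
    return 1 if first.length > second.length else -1


entities = [
    Entity("happy", 2, 7, "emotion", 5),
    Entity("unhappy", 0, 7, "emotion", 7),
    Entity("un", 0, 2, "negation", 2),
]
ordered = sorted(entities, key=cmp_to_key(compare_entities))
```

With this ordering, entities that share a start index end up next to each other, which is what the later screening step relies on.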
4. Generating candidate entities
Given a text to be analyzed, the four dictionaries are traversed in turn; if a word in a dictionary appears in the text, a corresponding entity is constructed from that word and put into the candidate entity list. After the four dictionaries have been traversed, all candidate entities have been generated, and they are sorted with the defined comparator, so the candidate entity list is ordered by start position.
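A rough sketch of candidate generation by plain substring search, with no word segmentation, is shown below. The function names, the dictionary layout, and the sample text are assumptions made for illustration; the text does not say how occurrences are located, so `str.find` is used here, and the end index is assumed to be exclusive.

```python
from collections import namedtuple
from functools import cmp_to_key

Entity = namedtuple("Entity", "name start end type length")


def compare_entities(first, second):
    """Comparator from the previous step: start position, then length."""
    if first.start != second.start:
        return 1 if first.start > second.start else -1
    return 1 if first.length > second.length else -1


def find_all(text, word):
    """Yield every index at which word occurs in text, overlaps included."""
    pos = text.find(word)
    while pos != -1:
        yield pos
        pos = text.find(word, pos + 1)


def generate_candidates(text, dictionaries):
    """Build one entity per occurrence of every dictionary word, then sort."""
    candidates = []
    for entity_type, words in dictionaries.items():
        for word in words:
            if not word:
                continue  # ignore empty entries
            for start in find_all(text, word):
                candidates.append(
                    Entity(word, start, start + len(word), entity_type, len(word))
                )
    return sorted(candidates, key=cmp_to_key(compare_entities))


# Hypothetical dictionaries and text.
dictionaries = {
    "emotion": {"happy": 1.0, "unhappy": -1.0},
    "degree": {"very": 1.5},
    "negation": {"not"},
    "punctuation": {"."},
}
candidates = generate_candidates("very unhappy.", dictionaries)
```

Because matching is done on raw substrings, "happy" is also reported inside "unhappy"; resolving such overlaps is left to the screening step.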
5. Screening candidate entities
The candidate entity list is searched iteratively. If subsequent entities have the same start index as the current entity, the longest of them is taken as the entity for the current index. The start index of the next word must be greater than the end index of that longest entity; an entity whose index is smaller than the end index of the previous entity is skipped directly and the next entity is examined. The required entity list is finally obtained.
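One way to read the screening rule is sketched below. It assumes the candidate list is already sorted by start position, treats the end index as exclusive, and skips any candidate that starts before the end of the last kept entity; these details, the function name, and the sample entities are assumptions of the sketch.

```python
from collections import namedtuple

Entity = namedtuple("Entity", "name start end type length")


def screen_candidates(candidates):
    """Keep the longest entity per start index and drop overlapping ones."""
    kept = []
    i = 0
    while i < len(candidates):
        current = candidates[i]
        # Skip a candidate that begins inside the previously kept entity.
        if kept and current.start < kept[-1].end:
            i += 1
            continue
        # Among the following candidates with the same start, keep the longest.
        longest = current
        j = i
        while j + 1 < len(candidates) and candidates[j + 1].start == current.start:
            j += 1
            if candidates[j].length > longest.length:
                longest = candidates[j]
        kept.append(longest)
        i = j + 1
    return kept


# Hypothetical candidates for the text "very unhappy.", sorted by start.
candidates = [
    Entity("very", 0, 4, "degree", 4),
    Entity("un", 5, 7, "negation", 2),
    Entity("unhappy", 5, 12, "emotion", 7),
    Entity("happy", 7, 12, "emotion", 5),
    Entity(".", 12, 13, "punctuation", 1),
]
final_entities = screen_candidates(candidates)
```

Here "un" loses to the longer "unhappy" at the same start index, and "happy" is dropped because it begins inside "unhappy".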
6. Calculating an emotion score
The final entity list generated in the previous step is traversed. If the current entity is not an emotion entity, it is skipped directly. If it is an emotion entity, the method searches forward from the entity's position for the nearest emotion entity or punctuation mark entity and uses that position as an index; the number of all emotion entities is recorded at the same time. The emotion score of the current emotion entity is then calculated. The initial weight of the emotion entity is the score of the emotion word. The negation entities and degree adverb entities that appear between the emotion entity and the position index are found, excluding the case "degree adverb + degree adverb". The score of the emotion entity is: (degree adverb entity score) ^ (number of degree adverb entities) × (-1) ^ (number of negation word entities) × initial weight. All emotion entities are traversed and their emotion scores are summed to obtain the emotion score of the text to be analyzed. If normalization is required, the sum can be divided by the number of emotion entities.
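The scoring step might be sketched as follows, under several stated assumptions: "searching forward" is taken to mean looking at the entities that precede the emotion word, since modifiers normally come before the word they modify; when several different degree adverbs occur, their scores are multiplied, which equals score ^ count when they are the same adverb; and the excluded adverb combination mentioned above is not handled, because the text does not say how. All names and values are hypothetical.

```python
from collections import namedtuple

Entity = namedtuple("Entity", "name start end type length")


def score_text(entities, emotion_scores, degree_scores, normalize=False):
    """Sum the per-entity emotion scores over a screened entity list."""
    total = 0.0
    emotion_count = 0
    for i, entity in enumerate(entities):
        if entity.type != "emotion":
            continue
        emotion_count += 1
        # Walk back to the nearest preceding emotion or punctuation entity.
        boundary = i - 1
        while boundary >= 0 and entities[boundary].type not in ("emotion", "punctuation"):
            boundary -= 1
        degree_factor = 1.0
        negations = 0
        for modifier in entities[boundary + 1:i]:
            if modifier.type == "degree":
                degree_factor *= degree_scores[modifier.name]
            elif modifier.type == "negation":
                negations += 1
        # degree factor x (-1)^negations x initial weight
        total += degree_factor * (-1) ** negations * emotion_scores[entity.name]
    if normalize and emotion_count:
        total /= emotion_count
    return total


# Hypothetical screened entities for "not very happy, very sad."
entities = [
    Entity("not", 0, 3, "negation", 3),
    Entity("very", 4, 8, "degree", 4),
    Entity("happy", 9, 14, "emotion", 5),
    Entity(",", 14, 15, "punctuation", 1),
    Entity("very", 16, 20, "degree", 4),
    Entity("sad", 21, 24, "emotion", 3),
    Entity(".", 24, 25, "punctuation", 1),
]
emotion_scores = {"happy": 1.0, "sad": -1.0}
degree_scores = {"very": 1.5}
raw = score_text(entities, emotion_scores, degree_scores)
mean = score_text(entities, emotion_scores, degree_scores, normalize=True)
```

In this example "happy" contributes 1.5 × (-1) × 1.0 = -1.5 and "sad" contributes 1.5 × 1 × (-1.0) = -1.5, so the raw score is -3.0 and the normalized score is -1.5.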
The invention can be adjusted according to actual requirements. For example, the specific contents of the four dictionaries can be customized, and specific details can be personalized accordingly: the definition of emotion words may differ between industries, and this can be optimized by modifying the emotion dictionary. The method considers the combination of the four dictionaries, and weights can be given to different combination forms. For example, for the combination "degree adverb + degree adverb + emotion word", a user who wants to emphasize this combination can assign it a corresponding weight. The method therefore has great applicability and extensibility.
The method does not perform operations such as word segmentation and filtering on the text to be analyzed, which reduces the errors that such operations introduce through inaccurate processing of the information. Candidate entities are generated by traversing the entities; they are then screened according to the designed rules, and the final entities are retained, which improves accuracy. Finally, the emotion score of the text to be analyzed is calculated and standardized or normalized, and the user can define emotion grades as needed.
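The text leaves the grading scheme to the user. As a purely illustrative sketch, a normalized score could be mapped to grades with user-chosen cut-offs; the function name, the three grade labels, and the default thresholds below are hypothetical and not taken from the patent.

```python
def emotion_grade(score, negative_below=-0.5, positive_above=0.5):
    """Map a normalized emotion score to a coarse grade.

    The thresholds are placeholders that a user would tune for their domain.
    """
    if score < negative_below:
        return "negative"
    if score > positive_above:
        return "positive"
    return "neutral"
```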
The above description is only a preferred embodiment of the present invention, and is only used to illustrate the technical solutions of the present invention, and not to limit the protection scope of the present invention. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims (1)

1. An analysis processing method of emotion score based on emotion dictionary entity is characterized in that,
comprises 6 steps:
1) Preparing a dictionary;
2) Establishing a structure for defining an entity;
3) Establishing an entity comparator;
4) Traversing the text to be analyzed according to the established entity to generate all candidate entities;
5) Screening candidate entities;
6) Calculating an emotion score;
first, four dictionaries need to be prepared: emotion words, degree adverbs, negation words, and punctuation marks;
the four dictionaries come from a general-purpose dictionary or from a custom dictionary for a specific industry, according to specific requirements; wherein
positive emotion words are assigned positive scores, and the stronger the emotion, the higher the score; negative emotion words are assigned negative scores, and the stronger the emotion, the lower the score; each degree adverb in the degree adverb dictionary is assigned a score according to the strength it expresses, and the higher the degree it represents, the larger the score; the negation word dictionary is a simple list of negation words; the punctuation dictionary is a dictionary of the commonly used symbols that break sentences or paragraphs;
the entity structure comprises an entity name, an entity starting index, an entity ending index, an entity type and an entity length, wherein the entity type is divided into emotional words, degree adverbs, negative words and punctuation marks;
an entity comparator is established, namely, given two entities: if the start position of the first entity is greater than that of the second entity, return 1; if the start position of the first entity is smaller than that of the second entity, return -1; if the start positions of the two entities are equal, compare the lengths of the two entities: if the length of the first entity is greater than that of the second entity, return 1; otherwise return -1;
generating candidate entities: given a text to be analyzed, the four dictionaries are traversed in turn; if a word in a dictionary appears in the text, a corresponding entity is constructed from that word and put into the candidate entity list; after the four dictionaries have been traversed, all candidate entities have been generated, and they are sorted with the defined comparator, so that the candidate entity list is ordered by start position;
when screening entities, the candidate entity list is searched iteratively; if subsequent entities have the same start index as the current entity, the longest of them is taken as the entity for the current index; the start index of the next word must be greater than the end index of that longest entity, and an entity whose index is smaller than the end index of the previous entity is skipped directly and the next entity is examined; the required entity list is finally obtained;
traversing the generated final entity list: if the current entity is not an emotion entity, it is skipped directly; if it is an emotion entity, the position of the nearest emotion entity or punctuation mark entity is searched for forward from the position of the entity and used as an index, and the number of all emotion entities is recorded at the same time;
calculating the emotion score of the current emotion entity: the initial weight of the emotion entity is the score of the emotion word; the negation entities and degree adverb entities that appear between the emotion entity and the position index are found, excluding the case "degree adverb + emotion word"; the score of the emotion entity is: (degree adverb entity score) ^ (number of degree adverb entities) × (-1) ^ (number of negation word entities) × initial weight, which gives the emotion score of the current emotion entity;
traversing all emotion entities and summing all emotion scores to obtain the emotion score of the text to be analyzed; the sum is divided by the number of emotion entities if normalization is required.
CN202110021645.3A 2021-01-08 2021-01-08 Emotional score analysis processing method based on emotional dictionary entity Active CN112711941B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110021645.3A CN112711941B (en) 2021-01-08 2021-01-08 Emotional score analysis processing method based on emotional dictionary entity

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110021645.3A CN112711941B (en) 2021-01-08 2021-01-08 Emotional score analysis processing method based on emotional dictionary entity

Publications (2)

Publication Number Publication Date
CN112711941A CN112711941A (en) 2021-04-27
CN112711941B CN112711941B (en) 2022-12-27

Family

ID=75548493

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110021645.3A Active CN112711941B (en) 2021-01-08 2021-01-08 Emotional score analysis processing method based on emotional dictionary entity

Country Status (1)

Country Link
CN (1) CN112711941B (en)

Citations (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105320960A (en) * 2015-10-14 2016-02-10 北京航空航天大学 Voting based classification method for cross-language subjective and objective sentiments
CN106407235A (en) * 2015-08-03 2017-02-15 北京众荟信息技术有限公司 A semantic dictionary establishing method based on comment data
CN106610990A (en) * 2015-10-22 2017-05-03 北京国双科技有限公司 Emotional tendency analysis method and apparatus
CN106897274A (en) * 2017-01-09 2017-06-27 北京众荟信息技术股份有限公司 Method is repeated in a kind of comment across languages
CN107656917A (en) * 2016-07-26 2018-02-02 深圳联友科技有限公司 A kind of Chinese sentiment analysis method and system
CN107862087A (en) * 2017-12-01 2018-03-30 广州简亦迅信息科技有限公司 Sentiment analysis method, apparatus and storage medium based on big data and deep learning
CN110362679A (en) * 2019-06-05 2019-10-22 北京大学(天津滨海)新一代信息技术研究院 A kind of financial field comment sensibility classification method and system based on sentiment dictionary
CN110399603A (en) * 2018-04-25 2019-11-01 北京中润普达信息技术有限公司 A kind of text-processing technical method and system based on sense-group division
CN111027322A (en) * 2019-12-13 2020-04-17 新华智云科技有限公司 Sentiment dictionary-based sentiment analysis method for fine-grained entities in financial news
CN111612339A (en) * 2020-05-21 2020-09-01 中国标准化研究院 Big data-based online commodity emotional tendency analysis method

Family Cites Families (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9747278B2 (en) * 2012-02-23 2017-08-29 Palo Alto Research Center Incorporated System and method for mapping text phrases to geographical locations
CN111985223A (en) * 2020-08-25 2020-11-24 武汉长江通信产业集团股份有限公司 Emotion calculation method based on combination of long and short memory networks and emotion dictionaries

Also Published As

Publication number Publication date
CN112711941A (en) 2021-04-27

Similar Documents

Publication Publication Date Title
CN107862343B (en) Commodity comment attribute level emotion classification method based on rules and neural network
Mahtab et al. Sentiment analysis on bangladesh cricket with support vector machine
CN109101478B (en) Aspect-level emotion analysis method for E-commerce comment text
US20160299955A1 (en) Text mining system and tool
Kaur Incorporating sentimental analysis into development of a hybrid classification model: A comprehensive study
CN112926345B (en) Multi-feature fusion neural machine translation error detection method based on data enhancement training
CN112287197B (en) Method for detecting sarcasm of case-related microblog comments described by dynamic memory cases
CN113360647B (en) 5G mobile service complaint source-tracing analysis method based on clustering
CN113312922A (en) Improved chapter-level triple information extraction method
CN113221569A (en) Method for extracting text information of damage test
CN112749283A (en) Entity relationship joint extraction method for legal field
CN111984790A (en) Entity relation extraction method
CN115374258A (en) Knowledge base query method and system combining semantic understanding with question template
Chumwatana COMMENT ANALYSIS FOR PRODUCT AND SERVICE SATISFACTION FROM THAI CUSTOMERS'REVIEW IN SOCIAL NETWORK
CN112711941B (en) Emotional score analysis processing method based on emotional dictionary entity
CN117291190A (en) User demand calculation method based on emotion dictionary and LDA topic model
Abdussalam et al. BERT implementation on news sentiment analysis and analysis benefits on branding
Roșca et al. UNLOCKING CUSTOMER SENTIMENT INSIGHTS WITH AZURE SENTIMENT ANALYSIS: A COMPREHENSIVE REVIEW AND ANALYSIS.
Jayawickrama et al. Seeking sinhala sentiment: Predicting facebook reactions of sinhala posts
Alorini et al. Machine learning enabled sentiment index estimation using social media big data
Abdulla et al. Sentiment analyses for kurdish social network texts using naive bayes classifier
CN114117047A (en) Method and system for classifying illegal voice based on C4.5 algorithm
Bhattacharya et al. Towards the exploitation of statistical language models for sentiment analysis of twitter posts
Kumar et al. Deep learning-based emotion classification of Hindi text from social media
Wadhwani et al. Analysis and implementation of sentiment analysis of user YouTube comments

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant