CN109948148A - Text information sentiment determination method and determination device - Google Patents

Info

Publication number
CN109948148A
Authority
CN
China
Prior art keywords
text
sentiment orientation
determined
determination method
probability
Legal status
Pending
Application number
CN201910149488.7A
Other languages
Chinese (zh)
Inventor
吴明平
黄楷
梁新敏
吴明辉
Current Assignee
BEIJING XUEZHITU NETWORK TECHNOLOGY Co Ltd
Original Assignee
BEIJING XUEZHITU NETWORK TECHNOLOGY Co Ltd
Application filed by BEIJING XUEZHITU NETWORK TECHNOLOGY Co Ltd
Priority to CN201910149488.7A
Publication of CN109948148A
Current legal status: Pending

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The present invention provides a text information sentiment determination method and determination device. They solve the technical problem that existing sentiment analysis methods cannot quantitatively evaluate rich emotions. The method includes obtaining the sentiment orientation probability of a text to be determined by means of a naive Bayes classification process based on sentiment orientation categories. The sentiment orientation categories express a qualitative tendency rather than a specific emotion type, and they do not directly reflect the emotion type. This avoids the shortcomings of classification models in fine-grained analysis and their limitations in industry data analysis. In addition, a single text often consists of several short clauses. For this case, a method of subdividing the text by industry attribute is provided, which yields more accurate sentiment determination results.

Description

Text information sentiment determination method and determination device
Technical field
The present invention relates to the technical field of semantic recognition, and in particular to a text information sentiment determination method and determination device.
Background art
Users publish text information such as comments and original blog posts on Internet platforms. This text reflects their attitudes toward an event or their subjective evaluations of a brand or product. The information can be used to mine users' interests and behavior patterns and to carry out more accurate public opinion analysis, thereby enabling personalized precision marketing.
At present, methods for sentiment analysis of text information fall into two major categories.

The first category is based on rules and statistics and mainly combines a sentiment dictionary with sentence structure. However, accurately constructing the sentiment dictionary and handling the irregular sentence structure of text information are technical difficulties.

The second category represents text as vectors and then, using a machine learning classification algorithm, treats sentiment differences as a text classification task. Most existing text classification techniques give only a two-class polar sentiment determination, that is, only positive and negative sentiment, which is insufficient for public opinion analysis. Using too many sentiment classification categories, on the other hand, increases the text annotation cost of sentiment analysis without producing greater discriminative value, and can instead raise the error rate.
Summary of the invention
In view of the above problems, embodiments of the present invention provide a text information sentiment determination method and determination device, which solve the technical problem that existing sentiment analysis methods cannot quantitatively evaluate rich emotions.
The text information sentiment determination method of an embodiment of the present invention includes:
Obtaining the sentiment orientation probability of a text to be determined by using a naive Bayes classification process based on sentiment orientation categories.
In one embodiment of the present invention, forming the naive Bayes classification process based on sentiment orientation categories includes:
Performing text preprocessing on source data to form source data texts;
Sampling from the source data texts to form sampled data texts;
Performing orientation annotation on the sampled data texts to form sentiment orientation categories and the corresponding sets of sampled data texts;
Performing text feature extraction on the sampled data texts and forming a training sample set from the text features;
While forming the naive Bayes classification from the training sample set, estimating the frequency of occurrence of each sentiment orientation category in the training samples and the conditional probability of each text feature for each sentiment orientation.
In one embodiment of the present invention, the sentiment orientation categories include positive, negative and neutral.
In one embodiment of the present invention, the source data come from at least one of an e-commerce platform, a microblog platform and a WeChat platform.
In one embodiment of the present invention, performing text preprocessing on the source data includes at least one of the following processing modes:
Deleting time information;
Deleting link information;
Deleting topic and/or subject information;
For forwarded microblog posts, retaining only the content published by the current user;
Deleting user names and/or user nicknames;
Deleting special symbols;
Matching emoticons with regular expressions and replacing each with the standard text corresponding to the regular expression.
In one embodiment of the present invention, sampling from the source data texts uses random sampling.
In one embodiment of the present invention, performing text feature extraction on the sampled data texts includes:
Filtering out the high-frequency ("hot") words of the sampled data texts using a bag-of-words model;
Calculating the weight of each hot word using the TF-IDF algorithm;
Determining a text feature vector from the hot-word weights.
In one embodiment of the present invention, obtaining the sentiment orientation probability of the text to be determined includes:
Obtaining the text features of the text to be determined;
Performing a class probability comparison on the text features of the text to be determined through the naive Bayes classification process based on sentiment orientation categories, to obtain the sentiment orientation probability of the text to be determined.
In one embodiment of the present invention, the method further includes:
Setting sentiment orientation segmentation thresholds over the sentiment orientation probability, and determining the emotion type from the sentiment orientation probability of the text to be determined.
In one embodiment of the present invention, setting the sentiment orientation segmentation thresholds includes:
Determining the sentiment orientation segmentation thresholds according to the precision and recall obtained by verifying sentiment orientation within the industry field.
In one embodiment of the present invention, the method further includes:
Splitting the text to be determined according to industry attributes to form text segments, and determining the sentiment orientation probability of each text segment using the naive Bayes classification process based on sentiment orientation categories.
In one embodiment of the present invention, the industry attribute is identified from the industry feature keywords in the text to be determined.
In one embodiment of the present invention, forming the industry feature keywords includes:
Performing word segmentation on the business content in the data source to form a corpus;
After cleaning the corpus, removing the basic nouns of the industry field as stop words;
Selecting the noun corpus for word frequency statistics, and filtering the high-frequency words for validity to form the industry feature keywords.
In one embodiment of the present invention, splitting the text to be determined according to industry attributes to form text segments includes:
Segmenting the text to be determined according to the industry feature keywords and separators to form text segments corresponding to the industry attributes.
In one embodiment of the present invention, determining the sentiment orientation probability for the text segments includes:
Forming a mapping association between the sentiment orientation probability of the text to be determined and the sentiment orientation probabilities of the text segments in the text to be determined.
The text information sentiment determination device of an embodiment of the present invention includes:
Memory, configured to store the program code of the processing of the above text information sentiment determination method;
Processor, configured to execute the program code.
The text information sentiment determination device of an embodiment of the present invention includes:
Naive Bayes classification determination module, configured to obtain the sentiment orientation probability of a text to be determined using the naive Bayes classification process based on sentiment orientation categories.
In one embodiment of the present invention, the device further includes:
Sentiment orientation division module, configured to set sentiment orientation segmentation thresholds over the sentiment orientation probability and to determine the emotion type from the sentiment orientation probability of the text to be determined.
In one embodiment of the present invention, the device further includes:
Text attribute discrimination module, configured to split the text to be determined according to industry attributes to form text segments, and to determine the sentiment orientation probability of each text segment using the naive Bayes classification process based on sentiment orientation categories.
The text information sentiment determination method and determination device of the embodiments of the present invention use sentiment orientation categories that express the qualitative tendency of a non-specific emotion type rather than directly reflecting the emotion type. This avoids the shortcomings of classification models in fine-grained analysis and their limitations in industry data analysis.

Sentiment analysis results can be provided for industry data. Besides giving sentiment determination results, the method allows users to correct the data according to their own needs.

For a text containing multiple attributes, a sentiment judgment is given for each attribute. Compared with judging the sentiment of the original text as a whole, this reflects users' real opinions more accurately. Based on the different attributes, users can also carry out customized, precise analyses.
The text information sentiment determination method and determination device of the embodiments of the present invention are applicable to data from a wide range of data sources and scales. When forming the sentiment orientation probability, they simultaneously estimate the corresponding text features and their probabilities with respect to the sentiment orientation categories. Combining different data sources, and using texts to be determined to form incremental training samples, improves the precision of sentiment judgment for user comments, which is a further technical advantage.
Description of the drawings
Fig. 1 is a flow diagram of a text information sentiment determination method according to an embodiment of the present invention.
Fig. 2 is a flow diagram of a text information sentiment determination method according to an embodiment of the present invention.
Fig. 3 is a flow diagram of a text information sentiment determination method according to an embodiment of the present invention.
Fig. 4 is an architecture diagram of a text information sentiment determination device according to an embodiment of the present invention.
Detailed description of embodiments
To make the objectives, technical solutions and advantages of the present invention clearer, the present invention is further described below with reference to the drawings and specific embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art from these embodiments without creative effort fall within the protection scope of the present invention.
A text information sentiment determination method according to one embodiment of the present invention is shown in Fig. 1. In Fig. 1, this embodiment includes:
Step 100: obtaining the sentiment orientation probability of a text to be determined by using a naive Bayes classification process based on sentiment orientation categories.
Sentiment orientation categories express the qualitative tendency of a non-specific emotion type and do not directly reflect the emotion type. The naive Bayes classification process based on sentiment orientation categories uses a training sample set built from text information with known sentiment orientation. From this set it estimates the frequency of occurrence of each sentiment orientation category in the training samples and the conditional probability of each sample feature for each category. The process then applies these probabilities to the text features of the text to be determined, producing a quantized probability of that text's sentiment orientation.
The method of the embodiments of the present invention applies a probabilistic classification process to the emotional tendency that users latently express in text information. This produces quantized data on the degree of tendency, decoupled from specific emotion types. Subsequent determination of specific emotion types therefore has a quantitative basis, and targeted, fine-grained judgments of users' emotional expression can be made according to the industry field.
As shown in Fig. 1, in an embodiment of the present invention, forming the naive Bayes classification process based on sentiment orientation categories in the text information sentiment determination method includes:
Step 110: performing text preprocessing on source data to form source data texts.
Source data may include data from different types of Internet platforms, including but not limited to e-commerce, microblog and WeChat platforms. The data sources may cover multiple industries, including but not limited to the food, cosmetics and postal industries. They may also cover multiple business fields, including but not limited to payment collection, manufacturing and after-sales service. The information types of the source data include, but are not limited to, the platform on which the user published the text and the industry field of the text.
The purpose of text preprocessing is to exclude redundant and interfering data other than the valid feature information of the text. Text preprocessing of the source data texts includes, but is not limited to, the following processing:
Deleting time information;
Deleting link information;
Deleting topic and subject information;
For forwarded microblog posts, retaining only the content published by the current user;
Deleting user names and user nicknames;
Deleting special symbols;
Matching emoticons with regular expressions and replacing each with the standard text corresponding to the expression.
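The embodiment lists these deletions without giving concrete matching rules. The following is a minimal Python sketch under stated assumptions. It assumes Weibo-style conventions: topics written as `#topic#`, mentions as `@name`, forwarded content introduced by `//@`, and emoticons as bracketed codes such as `[哈哈]`. The regular expressions and the `EMOTICON_MAP` table are hypothetical illustrations, not the patent's own rules.

```python
import re

# Hypothetical emoticon table (assumption): Weibo-style bracket codes -> standard text.
EMOTICON_MAP = {"[哈哈]": "开心", "[怒]": "生气", "[泪]": "伤心"}

def preprocess(text: str) -> str:
    """Clean one source data text; every pattern here is an illustrative assumption."""
    # Forwarded posts: keep only the current user's content before the first "//@".
    text = text.split("//@", 1)[0]
    # Delete link information.
    text = re.sub(r"https?://\S+", "", text)
    # Delete time information such as "2019-02-28 10:30".
    text = re.sub(r"\d{4}-\d{1,2}-\d{1,2}(\s+\d{1,2}:\d{2}(:\d{2})?)?", "", text)
    # Delete topic/subject information written as #topic#.
    text = re.sub(r"#[^#]*#", "", text)
    # Delete user names and nicknames written as @name.
    text = re.sub(r"@[\w\-]+", "", text)
    # Replace each matched emoticon with its standard text (unknown codes are dropped).
    text = re.sub(r"\[[^\[\]]+\]", lambda m: EMOTICON_MAP.get(m.group(0), ""), text)
    # Delete special symbols: keep word characters (incl. CJK), whitespace, common punctuation.
    text = re.sub(r"[^\w\s，。！？,.!?]", "", text)
    # Collapse the whitespace left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()
```

For example, `preprocess("今天很开心[哈哈] #好物推荐# @小明 http://t.cn/abc //@小红:转发")` keeps only the user's own words, with the emoticon replaced by its standard text.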
Step 120: sampling from the source data texts to form sampled data texts.
Sampling obtains a subset of the source data texts. Sampling methods include, but are not limited to, simple random sampling, stratified sampling, cluster sampling and systematic sampling.
Embodiments of the present invention preferably use random sampling. Random sampling gives the sampled data texts a good spread across the source data texts and leaves no content associations between them. This guarantees the independence of the sampled data texts and avoids introducing a systematic distribution of implicit information.
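Simple random sampling without replacement can be sketched with Python's standard `random` module. The helper name `sample_texts` and the fixed `seed`, which only makes runs reproducible, are illustrative assumptions.

```python
import random
from typing import List, Sequence

def sample_texts(source_texts: Sequence[str], n: int, seed: int = 0) -> List[str]:
    """Simple random sampling without replacement (hypothetical helper)."""
    rng = random.Random(seed)      # local generator; the seed only makes runs reproducible
    k = min(n, len(source_texts))  # never ask for more texts than exist
    return rng.sample(list(source_texts), k)
```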
Step 130: performing orientation annotation on the sampled data texts to form sentiment orientation categories and the corresponding sets of sampled data texts.
Orientation annotation is performed manually. The annotation rules define three sentiment classes, positive, negative and neutral, which form three classification categories. Each sampled data text is assigned to the set of its orientation category according to its annotation.
Annotating sentiment orientation instead of specific emotion types reduces errors caused by differing human judgment standards. This works together with the way the sampled data texts are formed: the sampled data texts keep the original distribution of the source data, while the sentiment orientation categories avoid introducing emotion-type categories and simplify subsequent quantization.
Step 140: performing text feature extraction on the sampled data texts and forming a training sample set from the text features.
Text feature extraction produces a series of feature attributes for each sampled data text. Those skilled in the art will understand that the extraction can use at least models such as bag of words (BOW), word embedding, or the TF-IDF (term frequency-inverse document frequency) algorithm.
Text feature extraction in the embodiments of the present invention combines the bag-of-words model with the TF-IDF algorithm, and includes:
Filtering out the high-frequency ("hot") words of the sampled data texts using the bag-of-words model;
Calculating the weight of each hot word using the TF-IDF algorithm;
Determining a text feature vector from the hot-word weights.
Each sampled data text corresponds to one text feature vector, and these vectors form the training sample set of the sampled data texts.
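The following is a minimal pure-Python sketch of the bag-of-words plus TF-IDF pipeline. It assumes the texts are already word-segmented into token lists. Two further choices are assumptions, since the embodiment fixes neither: the `k` most frequent tokens are taken as the hot words, and IDF uses the smoothed form `log((1 + N) / (1 + df)) + 1`. All function names are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List

def select_hot_words(docs: List[List[str]], k: int) -> List[str]:
    # Bag of words: count every token over all sampled documents and keep
    # the k most frequent ("hot") words as the feature vocabulary.
    counts = Counter(tok for doc in docs for tok in doc)
    return [w for w, _ in counts.most_common(k)]

def fit_idf(docs: List[List[str]], vocab: List[str]) -> Dict[str, float]:
    # Smoothed inverse document frequency (assumed form): log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter(w for doc in docs for w in set(doc))
    return {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}

def tfidf_vector(doc: List[str], vocab: List[str], idf: Dict[str, float]) -> List[float]:
    # Relative term frequency of each hot word in this document, times its IDF weight.
    tf = Counter(doc)
    total = len(doc) or 1
    return [tf[w] / total * idf[w] for w in vocab]
```

The `vocab` and `idf` fitted on the sampled data texts should be reused when vectorizing a text to be determined. Reusing them is what keeps the two feature vector spaces compatible, as required in Step 160.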
Step 150: while forming the naive Bayes classification from the training sample set, estimating the frequency of occurrence of each sentiment orientation category in the training samples and the conditional probability of each text feature for each sentiment orientation.
Training a general naive Bayes classification process on the training sample set yields the naive Bayes classification process based on sentiment orientation categories.
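Step 150 amounts to estimating class priors and per-class feature likelihoods. Below is a minimal multinomial naive Bayes sketch in Python. It assumes token-count features, although the TF-IDF weights of Step 140 could be substituted for raw counts. It also assumes add-alpha (Laplace) smoothing, which the embodiment does not specify. The function name `train_naive_bayes` is hypothetical.

```python
import math
from collections import Counter, defaultdict
from typing import Dict, List, Tuple

def train_naive_bayes(samples: List[Tuple[List[str], str]], alpha: float = 1.0):
    """Return (log_prior, log_cond) from (tokens, orientation_label) pairs.

    log_prior[c]   = log of the frequency of category c among the training samples
    log_cond[c][w] = log P(w | c) with add-alpha smoothing (an assumption)
    """
    class_counts = Counter(label for _, label in samples)
    word_counts: Dict[str, Counter] = defaultdict(Counter)
    vocab = set()
    for tokens, label in samples:
        word_counts[label].update(tokens)  # token occurrences per category
        vocab.update(tokens)
    n = len(samples)
    log_prior = {c: math.log(k / n) for c, k in class_counts.items()}
    log_cond: Dict[str, Dict[str, float]] = {}
    for c in class_counts:
        denom = sum(word_counts[c].values()) + alpha * len(vocab)
        log_cond[c] = {w: math.log((word_counts[c][w] + alpha) / denom) for w in vocab}
    return log_prior, log_cond
```

Working in log space avoids underflow when many small conditional probabilities are later combined for one text.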
The method of the embodiments of the present invention uses sentiment orientation in place of specific emotion types to form the classification categories and the training sample set. This lets a general naive Bayes classification process reasonably quantify the intensity of the sentiment orientation of text information. The training sample set effectively reflects the latent information distribution of the source data, and text feature vectorization effectively reflects the inherent semantic features of the training samples. Together, these ensure accurate class probability estimates when the improved naive Bayes classification process classifies the sentiment orientation of the text under test.
As shown in Fig. 1, in an embodiment of the present invention, obtaining the sentiment orientation probability of the text to be determined in the text information sentiment determination method includes:
Step 160: obtaining the text features of the text to be determined.
The text to be determined may be a source data text outside the sampled data texts, or any text from the source data. Its text features may be extracted with the same combined bag-of-words and TF-IDF process as in the above embodiment. The text features of the text to be determined then lie in a vector space compatible with that of the sampled data texts in the training sample set.
Step 170: performing a class probability comparison on the text features of the text to be determined through the naive Bayes classification process based on sentiment orientation categories, to obtain the sentiment orientation probability of the text to be determined.
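One way of describing the class probability comparison is:
$$P(\mathrm{pos}) = 1 + \exp\bigl(T_m[\mathrm{neg}] - T_m[\mathrm{pos}]\bigr), \qquad P(\mathrm{neg}) = 1 + \exp\bigl(T_m[\mathrm{pos}] - T_m[\mathrm{neg}]\bigr)$$
$$\mathrm{Prob}(\mathrm{pos}) = \frac{1}{P(\mathrm{pos})}, \qquad \mathrm{Prob}(\mathrm{neg}) = 1 - \frac{1}{P(\mathrm{neg})}$$
In these formulas:
- neg and pos denote the text feature arrays formed by the negative and positive samples of the training sample set, respectively.
- T_m[c] is the log-domain score of the text to be determined under class c, accumulated from the conditional probability estimates of its text features for c.
- P(pos) and P(neg) are the comparison quantities between the two classes, so 1/P(c) is the posterior probability of class c.
- Prob(pos) is the probability of belonging to the positive class and is used when the text is classified as positive.
- Prob(neg) is used when the text is classified as negative. It converts the negative-class probability 1/P(neg) onto the same [0, 1] orientation scale, on which values closer to 1 are more positive.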
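The comparison formulas can be sketched as follows for the two polar classes. The sketch assumes `log_prior` and `log_cond` hold log P(c) and log P(w | c) as estimated in Step 150, and that features unseen in training are skipped. The function name is hypothetical.

```python
import math
from typing import Dict, List

def orientation_probability(tokens: List[str],
                            log_prior: Dict[str, float],
                            log_cond: Dict[str, Dict[str, float]]) -> float:
    """Sentiment orientation probability in [0, 1] (closer to 1 = more positive)."""
    # T_m[c]: log prior plus the summed log conditional probabilities of the
    # text's features under class c (features unseen in training are skipped).
    t = {c: log_prior[c] + sum(log_cond[c][w] for w in tokens if w in log_cond[c])
         for c in ("pos", "neg")}

    def p(c: str) -> float:
        # P(c) = sum over both classes of exp(T_m[c'] - T_m[c]) = 1 + exp(T_m[other] - T_m[c]).
        return sum(math.exp(t[other] - t[c]) for other in t)

    if t["pos"] >= t["neg"]:
        return 1.0 / p("pos")        # Prob(pos)
    return 1.0 - 1.0 / p("neg")      # Prob(neg), on the same [0, 1] scale
```

P(c) is evaluated only for the higher-scoring class, so every exponent is non-positive and `math.exp` cannot overflow, even for long texts.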
A text information sentiment determination method according to one embodiment of the present invention is shown in Fig. 2. In Fig. 2, building on the above text information sentiment determination method, the method further includes:
Step 200: setting sentiment orientation segmentation thresholds over the sentiment orientation probability, and determining the emotion type from the sentiment orientation probability of the text to be determined.
The sentiment orientation probability of each text to be determined lies in [0, 1]. The closer it is to 1, the more positive the sentiment, which can be judged as approval, liking or longing according to its magnitude. The closer it is to 0, the more negative the sentiment, which can be judged as contempt, dislike or disgust according to its magnitude. Setting segmentation thresholds yields the three sentiment orientation results and also allows users to adjust the sentiment classification standard to their actual needs.
As shown in Fig. 2, in an embodiment of the present invention, setting the sentiment orientation segmentation thresholds in the text information sentiment determination method includes:
Step 210: determining the segmentation thresholds according to the precision and recall obtained by verifying sentiment orientation within the industry field.
The sentiment orientation of the texts to be determined is also judged manually. Statistics over these data give the precision and recall of the sentiment orientation probabilities relative to the manual judgments. The segmentation thresholds are then determined from the precision and recall, which improves the accuracy of emotion-type determination.
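The embodiment does not state how precision and recall are combined into a threshold. One possible sketch assumes a binary check of the positive orientation against manual labels (1 = manually judged positive) and picks the candidate threshold with the highest F1 score. Both the F1 criterion and the function names are assumptions.

```python
from typing import List, Optional, Sequence, Tuple

def precision_recall(probs: Sequence[float], labels: Sequence[int],
                     threshold: float) -> Tuple[float, float]:
    """Precision and recall of 'positive if prob >= threshold' against manual labels."""
    pairs = list(zip(probs, labels))
    tp = sum(1 for p, y in pairs if p >= threshold and y == 1)
    fp = sum(1 for p, y in pairs if p >= threshold and y == 0)
    fn = sum(1 for p, y in pairs if p < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def choose_threshold(probs: Sequence[float], labels: Sequence[int],
                     candidates: Optional[List[float]] = None) -> float:
    """Pick the candidate threshold with the highest F1 score (assumed criterion)."""
    if candidates is None:
        candidates = sorted(set(probs))  # every observed probability is a candidate
    best_t, best_f1 = candidates[0], -1.0
    for t in candidates:
        p, r = precision_recall(probs, labels, t)
        f1 = 2 * p * r / (p + r) if p + r else 0.0
        if f1 > best_f1:  # strict '>' keeps the lowest threshold among ties
            best_t, best_f1 = t, f1
    return best_t
```

A field that values precision over recall, for example, could replace F1 with a weighted F-beta score without changing the structure.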
In an embodiment of the present invention, the thresholds applied to the sentiment orientation probability produced by the classification process are adjusted according to user demand. Emotional intensity can then be divided into five classes, strong positive, weak positive, neutral, weak negative and strong negative, giving a more precise emotion-type determination.
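A sketch of the five-level division follows. It assumes four ascending, user-adjustable cut points. The default values `(0.2, 0.4, 0.6, 0.8)` and the function name are hypothetical; in practice the cut points would come from the precision and recall verification of Step 210.

```python
from bisect import bisect_right
from typing import Sequence

LEVELS = ("strong negative", "weak negative", "neutral", "weak positive", "strong positive")

def emotion_level(prob: float, cuts: Sequence[float] = (0.2, 0.4, 0.6, 0.8)) -> str:
    """Map a sentiment orientation probability in [0, 1] to one of five levels.

    cuts: four ascending segmentation thresholds (hypothetical defaults);
    a probability equal to a cut falls into the higher level.
    """
    if len(cuts) != len(LEVELS) - 1 or list(cuts) != sorted(cuts):
        raise ValueError("expected four ascending thresholds")
    return LEVELS[bisect_right(list(cuts), prob)]
```

For example, `emotion_level(0.3, cuts=(0.1, 0.25, 0.75, 0.9))` returns `"neutral"`, which shows how a user can widen the neutral band without retraining the classifier.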
A text information sentiment determination method according to one embodiment of the present invention is shown in Fig. 3. In Fig. 3, building on the above text information sentiment determination method, the method further includes:
Step 300: splitting the text to be determined according to industry attributes to form text segments, and determining the sentiment orientation probability of each text segment using the naive Bayes classification process based on sentiment orientation categories.
The text to be determined may correspond to at least one industry attribute. Each industry attribute includes a series of industry feature keywords that describe the characteristics of the industry. These keywords are formed from the industry characteristics of the relevant industry field in the data source of the text to be determined.
As shown in Fig. 3, in an embodiment of the present invention, forming the industry feature keywords of an industry attribute in the text information sentiment determination method includes:
Step 310: performing word segmentation on the business content in the data source to form a corpus, so that the business content can be processed efficiently.
Step 320: after cleaning the corpus, removing the basic nouns of the industry field as stop words. This excludes unreasonable corpus items, redundant nouns and nouns of general meaning.
Step 330: selecting the noun corpus for word frequency statistics, and filtering the high-frequency words for validity to form the industry feature keywords. That is, word frequency statistics over the key nouns of the industry are used to select high-frequency words as industry feature keywords.
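Steps 310 to 330 can be sketched as below. The sketch assumes the business content has already been segmented and part-of-speech tagged into `(word, tag)` pairs, with noun tags starting with `n` (the convention of tokenizers such as jieba's `posseg`). The validity filter shown (minimum length of two characters, minimum count, top-k cut) is an assumption, not the patent's rule, and the function name is hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

def industry_keywords(tagged_docs: Iterable[List[Tuple[str, str]]], stop_nouns: Set[str],
                      top_k: int = 10, min_count: int = 2) -> List[str]:
    """Word-frequency selection of industry feature keywords (hypothetical helper)."""
    counts: Counter = Counter()
    for doc in tagged_docs:
        for word, tag in doc:
            # Noun corpus only (tags starting with "n"), minus the industry stop nouns;
            # single-character words are dropped as an assumed validity rule.
            if tag.startswith("n") and word not in stop_nouns and len(word) > 1:
                counts[word] += 1
    # Assumed validity filter: the top_k high-frequency nouns seen at least min_count times.
    return [w for w, c in counts.most_common(top_k) if c >= min_count]
```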
As shown in Fig. 3, in an embodiment of the present invention, splitting the text to be determined to form text segments in the text information sentiment determination method includes:
Step 340: determining the industry feature keywords of the industry attributes in the text to be determined.
The industry attributes present in the text to be determined are identified from the set of different industry attributes. Then the industry feature keywords of those attributes in the text are determined.
Step 350: segmenting the text to be determined according to the industry feature keywords and separators to form text segments corresponding to the industry attributes.
In one embodiment of the present invention, a preferred segmentation process includes:
A text to be determined that contains at most one industry feature keyword is not segmented;
A text to be determined that contains at least two industry feature keywords is split, starting from the second industry feature keyword, at the separator (comma or space) preceding each industry feature keyword;
The text following the last cut, which contains the last industry feature keyword, forms the final text segment;
When two adjacent industry feature keywords belong to the same industry attribute, they are merged into one segment.
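The segmentation rules above can be sketched as follows. The sketch assumes a given `keyword_attr` mapping from industry feature keyword to industry attribute, and treats the Chinese comma, ASCII comma and space as separators; all names are hypothetical. When two consecutive keywords share an attribute, no cut is made between them, which implements the merge rule. A keyword with no separator before it also stays in the current segment.

```python
import re
from typing import Dict, List, Tuple

SEPARATORS = "，, "  # assumed separators: Chinese comma, ASCII comma, space

def split_by_attribute(text: str, keyword_attr: Dict[str, str]) -> List[Tuple[str, str]]:
    """Split text into (industry attribute, segment) pairs (hypothetical helper)."""
    hits: List[Tuple[int, str]] = []
    if keyword_attr:
        # Longest keywords first so that overlapping keywords prefer the longer match.
        pattern = "|".join(sorted(map(re.escape, keyword_attr), key=len, reverse=True))
        hits = [(m.start(), keyword_attr[m.group(0)]) for m in re.finditer(pattern, text)]
    if len(hits) <= 1:  # at most one keyword: no segmentation
        return [(hits[0][1] if hits else "", text)]
    segments: List[Tuple[str, str]] = []
    start, attr = 0, hits[0][1]
    for pos, next_attr in hits[1:]:  # from the second keyword onward
        # Last separator between the current segment start and this keyword.
        cut = max(text.rfind(s, start, pos) for s in SEPARATORS)
        if cut <= start:
            continue  # no separator before this keyword: it stays in the current segment
        if next_attr != attr:  # same attribute as the current segment: merge (no cut)
            segments.append((attr, text[start:cut].strip(SEPARATORS)))
            start, attr = cut + 1, next_attr
    segments.append((attr, text[start:].strip(SEPARATORS)))  # text after the last cut
    return segments
```

For the review "颜色很正，包装精美，物流太慢，价格还行", with 颜色 and 包装 both mapped to one appearance attribute, the sketch returns three segments: appearance, logistics and price.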
As shown in Fig. 3, in an embodiment of the present invention, determining the sentiment orientation probability for each text segment in the text information sentiment determination method includes:
Step 360: forming a mapping association between the sentiment orientation probability of the text to be determined and the sentiment orientation probabilities of the text segments in the text to be determined.
When naive Bayes classification based on sentiment orientation categories is performed on a text to be determined, it is also performed separately on each text segment of that text. The sentiment orientation probability of the text and those of its text segments are then associated. The association forms the basis for judging the sentiment orientation of the text as a whole, and also the basis for judging the sentiment orientation of each industry attribute identified in the text.
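The mapping association can be represented as a record that links the whole-text probability to per-attribute segment probabilities. The following is a minimal sketch. It assumes a callable `score` that returns a sentiment orientation probability in [0, 1] for a string, for example the trained classifier. The `SentimentRecord` structure and the `associate` function are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

@dataclass
class SentimentRecord:
    text: str
    overall: float  # sentiment orientation probability of the whole text
    # industry attribute -> list of (text segment, segment orientation probability)
    by_attribute: Dict[str, List[Tuple[str, float]]] = field(default_factory=dict)

def associate(text: str, segments: List[Tuple[str, str]],
              score: Callable[[str], float]) -> SentimentRecord:
    """Classify the whole text and each segment, and link the results (hypothetical)."""
    record = SentimentRecord(text=text, overall=score(text))
    for attr, segment in segments:
        record.by_attribute.setdefault(attr, []).append((segment, score(segment)))
    return record
```

Keeping a list per attribute allows one attribute to appear in several non-adjacent segments of the same text.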
The text information sentiment determination device of an embodiment of the present invention includes:
Memory, configured to store the program code of the processing of the text information sentiment determination method of the embodiments of the present invention;
Processor, configured to execute the program code of the processing of the text information sentiment determination method of the embodiments of the present invention.
The processor may be a DSP (Digital Signal Processing) digital signal processor, an FPGA (Field-Programmable Gate Array), an MCU (Microcontroller Unit) system board, an SoC (System on a Chip) system board, or a PLC (Programmable Logic Controller) minimal system including I/O.
A text information sentiment determination device according to one embodiment of the present invention is shown in Fig. 4. In Fig. 4, this embodiment includes:
Naive Bayes classification determination module 10, configured to obtain the sentiment orientation probability of a text to be determined using the naive Bayes classification process based on sentiment orientation categories.
As shown in Fig. 4, in one embodiment of the present invention, the text information sentiment determination device further includes:
Sentiment orientation division module 20, configured to set sentiment orientation segmentation thresholds over the sentiment orientation probability and to determine the emotion type from the sentiment orientation probability of the text to be determined.
As shown in Fig. 4, in one embodiment of the present invention, the text information sentiment determination device further includes:
Text attribute discrimination module 30, configured to split the text to be determined according to industry attributes to form text segments, and to determine the sentiment orientation probability of each text segment using the naive Bayes classification process based on sentiment orientation categories.
As shown in Fig. 4, in one embodiment of the present invention, the naive Bayes classification determination module 10 includes:
Preprocessing unit 11, configured to perform text preprocessing on source data to form source data texts.
Sampling unit 12, configured to sample from the source data texts to form sampled data texts.
Classification annotation unit 13, configured to perform orientation annotation on the sampled data texts to form sentiment orientation categories and the corresponding sets of sampled data texts.
Text feature extraction unit 14, configured to perform text feature extraction on the sampled data texts and to form a training sample set from the text features.
Classification process forming unit 15, configured to estimate, while forming the naive Bayes classification from the training sample set, the frequency of occurrence of each sentiment orientation category in the training samples and the conditional probability of each text feature for each sentiment orientation.
As shown in Fig. 4, in one embodiment of the present invention, the naive Bayes classification determination module 10 further includes:
Input acquisition unit 16, configured to obtain the text features of the text to be determined.
Class probability judgment unit 17, configured to perform a class probability comparison on the text features of the text to be determined through the naive Bayes classification process based on sentiment orientation categories, to obtain the sentiment orientation probability of the text to be determined.
As shown in Figure 4, in one embodiment of the invention, the sentiment orientation division module 20 includes:
Threshold setting unit 21, configured to determine the segmentation thresholds according to the accuracy and recall obtained by verifying sentiment orientation in the industry domain.
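One possible reading of unit 21, sketched under stated assumptions: treat each candidate threshold as a binary cut for one orientation, measure precision (our interpretation of "accuracy") and recall on labelled industry validation texts, and keep the threshold with the best F1 score. The F1 criterion, the candidate grid, and the name `choose_threshold` are hypothetical; the patent only says the thresholds follow from accuracy and recall.

```python
from typing import Iterable, Optional, Sequence, Tuple


def choose_threshold(
    scores: Sequence[float],
    gold: Sequence[bool],
    candidates: Iterable[float],
) -> Tuple[Optional[float], float]:
    """Return (best threshold, its F1) for predicting 'positive' when score >= t."""
    if len(scores) != len(gold):
        raise ValueError("scores and gold labels must have the same length")
    best_t: Optional[float] = None
    best_f1 = -1.0
    for t in candidates:
        predicted = [s >= t for s in scores]
        tp = sum(p and g for p, g in zip(predicted, gold))
        fp = sum(p and not g for p, g in zip(predicted, gold))
        fn = sum((not p) and g for p, g in zip(predicted, gold))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        # F1 balances precision and recall (assumed selection criterion).
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        if f1 > best_f1:
            best_t, best_f1 = t, f1
    return best_t, best_f1
```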
As shown in Figure 4, in one embodiment of the invention, the text attribute discriminating module 30 includes:
Word segmentation unit 31, configured to segment the business content in the data source into words to form a corpus.
Elimination unit 32, configured to clean the corpus and then remove the basic nouns of the industry domain as stop words.
Statistic unit 33, configured to select the noun corpus for word frequency statistics, and to filter the high-frequency words for validity to form the industry feature keywords.
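Units 31 to 33 can be sketched as follows, assuming word segmentation and part-of-speech tagging have already been performed by an external Chinese segmenter (not shown) and that noun tags begin with `n`, as in common Chinese tag sets. The name `industry_keywords`, the parameters `min_count` and `top_k`, and the length-based validity check are hypothetical stand-ins for the patent's unspecified "validity filtering".

```python
from collections import Counter
from typing import Callable, Iterable, List, Set, Tuple


def industry_keywords(
    tagged_docs: Iterable[List[Tuple[str, str]]],
    basic_nouns: Set[str],
    min_count: int = 2,
    top_k: int = 20,
    is_valid: Callable[[str], bool] = lambda w: len(w) >= 2,  # hypothetical check
) -> List[str]:
    """Frequent industry nouns from (word, pos) pairs, basic nouns excluded."""
    # Count nouns only, treating the industry's basic nouns as stop words.
    counts = Counter(
        word
        for doc in tagged_docs
        for word, pos in doc
        if pos.startswith("n") and word not in basic_nouns
    )
    # Keep high-frequency words that pass the validity check.
    keywords = [w for w, c in counts.most_common() if c >= min_count and is_valid(w)]
    return keywords[:top_k]
```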
As shown in Figure 4, in one embodiment of the invention, the text attribute discriminating module 30 further includes:
Keyword determination unit 34, configured to determine the industry feature keywords of the industry attributes in the text to be determined.
Sentence-breaking unit 35, configured to break the text to be determined into text fragments corresponding to industry attributes according to the industry feature keywords and separators.
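A minimal sketch of unit 35, assuming the separators are common Chinese and ASCII clause punctuation plus whitespace, and that each industry attribute is described by a hypothetical list of feature keywords. `split_by_attribute` is a hypothetical name; in this sketch a fragment containing keywords of two attributes is assigned to both, and fragments with no keyword are dropped.

```python
import re
from typing import Dict, List

# Assumed separator set: Chinese and ASCII clause punctuation plus whitespace.
SEPARATORS = re.compile(r"[，。！？；,.!?;\s]+")


def split_by_attribute(
    text: str, attribute_keywords: Dict[str, List[str]]
) -> Dict[str, List[str]]:
    """Break text at separators and group fragments by the attribute keywords they contain."""
    fragments: Dict[str, List[str]] = {}
    for piece in SEPARATORS.split(text):
        if not piece:
            continue  # re.split yields empty strings at trailing separators
        for attribute, keywords in attribute_keywords.items():
            if any(k in piece for k in keywords):
                fragments.setdefault(attribute, []).append(piece)
    return fragments
```

For example, a review praising the screen but criticizing the battery yields one fragment per attribute, each of which can then be classified separately.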
As shown in Figure 4, in one embodiment of the invention, the text attribute discriminating module 30 further includes:
Attribute probability association unit 36, configured to form a mapping association between the sentiment orientation probability of the text to be determined and the sentiment orientation probabilities of the text fragments within it.
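The mapping association of unit 36 can be pictured as a small record that keeps the whole-text probabilities next to per-attribute fragment probabilities. A minimal data-structure sketch, assuming a `classify` callable that returns a dictionary of orientation probabilities (a naive Bayes classifier, for instance); `EmotionRecord` and `associate` are hypothetical names.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List, Tuple

Probs = Dict[str, float]


@dataclass
class EmotionRecord:
    text: str
    text_probs: Probs
    # attribute -> list of (fragment, fragment orientation probabilities)
    fragment_probs: Dict[str, List[Tuple[str, Probs]]] = field(default_factory=dict)


def associate(
    text: str,
    fragments: Dict[str, List[str]],
    classify: Callable[[str], Probs],
) -> EmotionRecord:
    """Link the text-level probabilities to those of its attribute fragments."""
    record = EmotionRecord(text=text, text_probs=classify(text))
    for attribute, pieces in fragments.items():
        record.fragment_probs[attribute] = [(p, classify(p)) for p in pieces]
    return record
```

Keeping both levels lets an overall mixed or neutral review still show that one attribute (say, the screen) is judged positively and another (the battery) negatively.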
The foregoing is only a preferred embodiment of the present invention, and the scope of protection of the present invention is not limited thereto. Any change or substitution that a person skilled in the art could readily conceive within the technical scope disclosed by the present invention shall be covered by the protection scope of the present invention. Therefore, the protection scope of the present invention shall be subject to the protection scope of the claims.

Claims (19)

1. A text information emotion determination method, characterized by comprising:
obtaining the sentiment orientation probability of a text to be determined using a naive Bayes classification process based on sentiment orientation categories.
2. The text information emotion determination method according to claim 1, characterized in that forming the naive Bayes classification process based on sentiment orientation categories comprises:
performing text preprocessing on source data to form source data text;
sampling the source data text to form sampled data text;
annotating the sampled data text with sentiment orientations to form sentiment orientation categories and the corresponding sets of sampled data text;
performing text feature extraction on the sampled data text, and forming a training sample set from the text features;
forming the naive Bayes classification process from the training sample set by estimating the frequency of occurrence of each sentiment orientation category in the training samples and the conditional probability of each text feature under each sentiment orientation.
3. The text information emotion determination method according to claim 2, characterized in that the sentiment orientation categories include positive, negative, and neutral.
4. The text information emotion determination method according to claim 2, characterized in that the source data comes from at least one of an e-commerce platform, a microblog platform, and a WeChat platform.
5. The text information emotion determination method according to claim 2, characterized in that performing text preprocessing on the source data includes at least one of the following processing modes:
deleting time information;
deleting link information;
deleting topic and/or subject information;
for forwarded microblog posts, retaining only the content posted by the current user;
deleting user names and/or user nicknames;
deleting special symbols;
for emoticons, performing regular expression matching and replacing each match with the standard text corresponding to that regular expression.
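The preprocessing modes of claim 5 map naturally onto regular expressions. A minimal sketch, assuming Weibo-style conventions (`#topic#`, `@nickname`, forwarded chains introduced by `//@`, bracketed emoticon codes such as `[哈哈]`); the `EMOTICONS` table, the date and time patterns, and the set of kept punctuation are hypothetical. Emoticons are replaced before special symbols are stripped, because the brackets that identify them would otherwise be lost.

```python
import re
from typing import Dict

# Hypothetical emoticon table: bracketed Weibo-style codes -> standard words.
EMOTICONS: Dict[str, str] = {"[哈哈]": "开心", "[怒]": "生气"}


def preprocess(post: str, emoticons: Dict[str, str] = EMOTICONS) -> str:
    # Forwarded post: keep only the current user's text before the first "//@".
    text = post.split("//@", 1)[0]
    # Links.
    text = re.sub(r"https?://\S+", " ", text)
    # Topics written as #topic#.
    text = re.sub(r"#[^#]+#", " ", text)
    # User names / nicknames written as @name.
    text = re.sub(r"@[\w\-]+", " ", text)
    # Dates such as 2019-02-28 or 2019年2月28日, then clock times such as 10:30.
    text = re.sub(r"\d{4}[-/年]\d{1,2}[-/月]\d{1,2}日?", " ", text)
    text = re.sub(r"\d{1,2}:\d{2}(?::\d{2})?", " ", text)
    # Emoticons: replace known codes with standard text, drop unknown ones.
    text = re.sub(r"\[[^\[\]]+\]", lambda m: emoticons.get(m.group(0), " "), text)
    # Special symbols: keep word characters (Python's \w includes CJK) and basic punctuation.
    text = re.sub(r"[^\w，。！？,.!?]", " ", text)
    # Collapse the whitespace left behind by the deletions.
    return re.sub(r"\s+", " ", text).strip()
```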
6. The text information emotion determination method according to claim 2, characterized in that sampling the source data text uses random selection.
7. The text information emotion determination method according to claim 2, characterized in that performing text feature extraction on the sampled data text comprises:
selecting the hot words (high-frequency vocabulary) of the sampled data text using a bag-of-words model;
calculating the weight of each hot word using the TF-IDF algorithm;
determining a text feature vector from the hot-word weights.
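Claim 7 can be illustrated with the standard library alone. The sketch assumes that "hot words" means the `top_k` most frequent tokens across the sampled corpus (a bag-of-words count), that term frequency is normalized by document length, and that IDF is the smoothed form log((1 + N) / (1 + df)) + 1; the patent fixes none of these choices, and `tfidf_features` is a hypothetical name.

```python
import math
from collections import Counter
from typing import List, Tuple


def tfidf_features(
    docs: List[List[str]], top_k: int = 1000
) -> Tuple[List[str], List[List[float]]]:
    """Return (hot-word vocabulary, one TF-IDF vector per document)."""
    # Bag of words over the whole sample: the top_k most frequent tokens are the hot words.
    freq = Counter(token for doc in docs for token in doc)
    vocab = [word for word, _ in freq.most_common(top_k)]
    # Document frequency: number of documents containing each token.
    n = len(docs)
    df = Counter(token for doc in docs for token in set(doc))
    # Smoothed inverse document frequency (assumed formula).
    idf = {w: math.log((1 + n) / (1 + df[w])) + 1.0 for w in vocab}
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        length = len(doc) or 1  # avoid division by zero for empty documents
        vectors.append([tf[w] / length * idf[w] for w in vocab])
    return vocab, vectors
```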
8. The text information emotion determination method according to claim 1, characterized in that obtaining the sentiment orientation probability of the text to be determined comprises:
obtaining the text features of the text to be determined;
computing and comparing the class probabilities of the text features of the text to be determined under the naive Bayes classification process based on sentiment orientation categories, to obtain the sentiment orientation probability of the text to be determined.
9. The text information emotion determination method according to claim 1, characterized by further comprising:
setting sentiment orientation segmentation thresholds according to the sentiment orientation probability, and determining the emotion type of the text to be determined from its sentiment orientation probability.
10. The text information emotion determination method according to claim 9, characterized in that setting the sentiment orientation segmentation thresholds comprises:
determining the sentiment orientation segmentation thresholds according to the accuracy and recall obtained by verifying sentiment orientation in the industry domain.
11. The text information emotion determination method according to claim 1 or 9, characterized by further comprising:
splitting the text to be determined into text fragments according to industry attributes, and determining a sentiment orientation probability for each text fragment using the naive Bayes classification process based on sentiment orientation categories.
12. The text information emotion determination method according to claim 11, characterized in that the industry attributes are identified from the industry feature keywords in the text to be determined.
13. The text information emotion determination method according to claim 12, characterized in that forming the industry feature keywords comprises:
segmenting the business content in a data source into words to form a corpus;
cleaning the corpus and removing the basic nouns of the industry domain as stop words;
selecting the noun corpus for word frequency statistics, and filtering the high-frequency words for validity to form the industry feature keywords.
14. The text information emotion determination method according to claim 11, characterized in that splitting the text to be determined into text fragments according to industry attributes comprises:
breaking the text to be determined into text fragments corresponding to industry attributes according to the industry feature keywords and separators.
15. The text information emotion determination method according to claim 11, characterized in that determining the sentiment orientation probability for the text fragments comprises:
forming a mapping association between the sentiment orientation probability of the text to be determined and the sentiment orientation probabilities of the text fragments within the text to be determined.
16. A text information emotion determination device, characterized by comprising:
a memory, configured to store program code for the processing of the text information emotion determination method according to any one of claims 1 to 15;
a processor, configured to execute the program code.
17. A text information emotion determination device, characterized by comprising:
a naive Bayes classification determination module, configured to obtain the sentiment orientation probability of a text to be determined using a naive Bayes classification process based on sentiment orientation categories.
18. The text information emotion determination device according to claim 17, characterized by further comprising:
a sentiment orientation division module, configured to set sentiment orientation segmentation thresholds according to the sentiment orientation probability, and to determine the emotion type of the text to be determined from its sentiment orientation probability.
19. The text information emotion determination device according to claim 17, characterized by further comprising:
a text attribute discriminating module, configured to split the text to be determined into text fragments according to industry attributes, and to determine a sentiment orientation probability for each text fragment using the naive Bayes classification process based on sentiment orientation categories.
CN201910149488.7A 2019-02-28 2019-02-28 A kind of text information emotion determination method and decision maker Pending CN109948148A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910149488.7A CN109948148A (en) 2019-02-28 2019-02-28 A kind of text information emotion determination method and decision maker

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910149488.7A CN109948148A (en) 2019-02-28 2019-02-28 A kind of text information emotion determination method and decision maker

Publications (1)

Publication Number Publication Date
CN109948148A true CN109948148A (en) 2019-06-28

Family

ID=67007036

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910149488.7A Pending CN109948148A (en) 2019-02-28 2019-02-28 A kind of text information emotion determination method and decision maker

Country Status (1)

Country Link
CN (1) CN109948148A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111177392A (en) * 2019-12-31 2020-05-19 腾讯云计算(北京)有限责任公司 Data processing method and device
CN112685559A (en) * 2020-12-21 2021-04-20 深圳供电局有限公司 Monitoring method, device, computer equipment and medium for metering automation system
CN112711693A (en) * 2019-10-24 2021-04-27 富驰律法(北京)科技有限公司 Litigation clue mining method and system based on multi-feature fusion
CN113177163A (en) * 2021-04-28 2021-07-27 烟台中科网络技术研究所 Method, system and storage medium for social dynamic information sentiment analysis
CN115187996A (en) * 2022-09-09 2022-10-14 中电科新型智慧城市研究院有限公司 Semantic recognition method and device, terminal equipment and storage medium

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103116637A (en) * 2013-02-08 2013-05-22 无锡南理工科技发展有限公司 Text sentiment classification method facing Chinese Web comments
CN105095190A (en) * 2015-08-25 2015-11-25 众联数据技术(南京)有限公司 Chinese semantic structure and finely segmented word bank combination based emotional analysis method
CN105912576A (en) * 2016-03-31 2016-08-31 北京外国语大学 Emotion classification method and emotion classification system
CN108108462A (en) * 2017-12-29 2018-06-01 河南科技大学 A kind of text emotion analysis method of feature based classification
CN109214454A (en) * 2018-08-31 2019-01-15 东北大学 A kind of emotion community classification method towards microblogging
CN109271634A (en) * 2018-09-17 2019-01-25 重庆理工大学 A kind of microblog text affective polarity check method based on user feeling tendency perception

Cited By (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN112711693A (en) * 2019-10-24 2021-04-27 富驰律法(北京)科技有限公司 Litigation clue mining method and system based on multi-feature fusion
CN112711693B (en) * 2019-10-24 2024-04-09 富驰律法(北京)科技有限公司 Litigation thread mining method and system based on multi-feature fusion
CN111177392A (en) * 2019-12-31 2020-05-19 腾讯云计算(北京)有限责任公司 Data processing method and device
CN112685559A (en) * 2020-12-21 2021-04-20 深圳供电局有限公司 Monitoring method, device, computer equipment and medium for metering automation system
CN112685559B (en) * 2020-12-21 2024-01-23 深圳供电局有限公司 Monitoring method, device, computer equipment and medium for metering automation system
CN113177163A (en) * 2021-04-28 2021-07-27 烟台中科网络技术研究所 Method, system and storage medium for social dynamic information sentiment analysis
CN115187996A (en) * 2022-09-09 2022-10-14 中电科新型智慧城市研究院有限公司 Semantic recognition method and device, terminal equipment and storage medium
CN115187996B (en) * 2022-09-09 2023-01-06 中电科新型智慧城市研究院有限公司 Semantic recognition method and device, terminal equipment and storage medium

Similar Documents

Publication Publication Date Title
CN109948148A (en) A kind of text information emotion determination method and decision maker
CN107491531B (en) Chinese network comment sensibility classification method based on integrated study frame
CN107977798B (en) Risk assessment method for quality of electronic commerce product
CN109829166B (en) People and host customer opinion mining method based on character-level convolutional neural network
CN109165294B (en) Short text classification method based on Bayesian classification
CN105824922B (en) A kind of sensibility classification method merging further feature and shallow-layer feature
CN108073673A (en) A kind of legal knowledge map construction method, apparatus, system and medium based on machine learning
Khatri Sarcasm detection in tweets with BERT and GloVe embeddings
CN104778186B (en) Merchandise items are mounted to the method and system of standardized product unit
CN103116637A (en) Text sentiment classification method facing Chinese Web comments
CN110096575B (en) Psychological portrait method facing microblog user
CN103064971A (en) Scoring and Chinese sentiment analysis based review spam detection method
TW201115370A (en) Systems and methods for capturing and managing collective social intelligence information
CN112632274B (en) Abnormal event classification method and system based on text processing
CN112597283B (en) Notification text information entity attribute extraction method, computer equipment and storage medium
CN112015721A (en) E-commerce platform storage database optimization method based on big data
CN109299252A (en) The viewpoint polarity classification method and device of stock comment based on machine learning
CN109933648A (en) A kind of differentiating method and discriminating device of real user comment
Alksher et al. A review of methods for mining idea from text
TWI477987B (en) Methods for sentimental analysis of news text
CN108733652A (en) The test method of film review emotional orientation analysis based on machine learning
CN116304020A (en) Industrial text entity extraction method based on semantic source analysis and span characteristics
KR20130083092A (en) Summary information generating system and method for review of product and service
CN109035025A (en) The method and apparatus for evaluating stock comment reliability
CN106227720B (en) A kind of APP software users comment mode identification method

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: Room 2020, 2 / F, building 27, No. 25, North Third Ring Road West, Haidian District, Beijing 100098

Applicant after: Beijing minglue Zhaohui Technology Co.,Ltd.

Address before: 100070 Wangjing SOHO tower 1-c-1802, Chaoyang District, Beijing

Applicant before: BEIJING SUPERTOOL INTERNET TECHNOLOGY LTD.

RJ01 Rejection of invention patent application after publication

Application publication date: 20190628