CN103970864A - Emotion classification and emotion component analyzing method and system based on microblog texts - Google Patents


Publication number
CN103970864A
Authority
CN
China
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN201410193638.1A
Other languages
Chinese (zh)
Other versions
CN103970864B (en)
Inventor
徐华
杨炜炜
王玮
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tsinghua University
Original Assignee
Tsinghua University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tsinghua University
Priority to CN201410193638.1A
Publication of CN103970864A
Application granted
Publication of CN103970864B
Legal status: Active
Anticipated expiration

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/36: Creation of semantic tools, e.g. ontology or thesauri
    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/30: Information retrieval; Database structures therefor; File system structures therefor of unstructured textual data
    • G06F16/35: Clustering; Classification

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses an emotion classification and emotion component analysis method based on microblog texts. The method comprises the following steps: acquiring multiple microblog texts posted by users from the Internet; segmenting the microblog texts into words and obtaining a plurality of words according to the part of speech of each word; extracting a plurality of feature words from the plurality of words; training a classifier for each node of an emotion classification system according to the feature words so as to construct the emotion classification system, and performing emotion classification through the emotion classification system; and analyzing the emotion components of the microblog texts according to the classification result. By extracting feature words to construct the emotion classification system, the method performs emotion classification and analyzes the emotion components of the microblog texts from the classification result. It saves time, improves classification speed and classification quality, and allows the emotion components to be analyzed quickly, so that user needs are better met. The invention further discloses an emotion classification and emotion component analysis system based on microblog texts.

Description

Emotion classification and emotion component analysis method and system based on microblog texts
Technical field
The present invention relates to the fields of computer applications and Internet technology, and in particular to an emotion classification and emotion component analysis method and system based on microblog texts.
Background
With the development of the Internet and Web 2.0, microblogs have become an indispensable channel through which people obtain information and publish news in daily life. On a microblog, users can record their own lives and can also voice their views and opinions on current hot topics, and such posts often carry the publisher's emotions. By analyzing the microblog texts that users post, the users' emotions can therefore be inferred, so that the emotional state of individual users and of all users toward a given hot event can be mined from microblogs to provide data support for later decision-making. Taking Sina Weibo as an example, however, it has about 500 million registered users, and more than 200 million new posts are published every day. Processing these posts entirely by hand would be extremely time-consuming and labor-intensive and could not meet users' needs well.
Summary of the invention
The present invention aims to solve, at least to some extent, one of the technical problems in the related art. To this end, one object of the present invention is to propose an emotion classification and emotion component analysis method based on microblog texts that saves time, improves classification speed and classification quality, and can analyze emotion components quickly.
Another object of the present invention is to propose an emotion classification and emotion component analysis system based on microblog texts.
To achieve the above objects, an embodiment of one aspect of the present invention proposes an emotion classification and emotion component analysis method based on microblog texts, comprising the following steps: obtaining multiple microblog texts posted by users from the Internet; segmenting the multiple microblog texts into words, and obtaining a plurality of words according to the part of speech of each word; extracting a plurality of feature words from the plurality of words; training the classifier of each node in an emotion classification system according to the plurality of feature words so as to build the emotion classification system, and performing emotion classification through the emotion classification system; and analyzing the emotion components of the microblog texts according to the classification result.
According to the emotion classification and emotion component analysis method based on microblog texts of the embodiment of the present invention, a plurality of words is obtained by segmenting the microblog texts, and a plurality of feature words is extracted from those words. The classifier of each node in the emotion classification system is trained on the feature words to build the emotion classification system, through which emotion classification is performed. The emotion components of the microblog texts are then analyzed quickly from the classification result to detect the most important emotions in the text. This saves time, improves classification speed and classification quality, and better meets users' needs.
In addition, the emotion classification and emotion component analysis method based on microblog texts according to the above embodiment of the present invention may also have the following additional technical features:
In one embodiment of the invention, extracting the plurality of feature words from the plurality of words specifically comprises: judging whether each word is a high-frequency word; if the word is a high-frequency word, calculating the relevance of the word; and if the word is a low-frequency word, calculating the PMI value of the word.
Further, in one embodiment of the invention, the relevance of the word is calculated according to the following formula:
$$\chi^2(t,c)=\frac{N\,(AD-BC)^2}{(A+C)(A+B)(B+D)(C+D)}$$
where t is the word being evaluated, c is the class, and N is the number of documents; A is the number of documents that belong to class c and contain word t, B is the number of documents that do not belong to class c but contain word t, C is the number of documents that belong to class c but do not contain word t, and D is the number of documents that neither belong to class c nor contain word t.
Further, in one embodiment of the invention, the PMI (Pointwise Mutual Information) value of the word is calculated according to the following formula:
$$\mathrm{PMI}(t,c)=\log\frac{p(t,c)}{p(t)\,p(c)}$$
where p(t, c) is the probability that a document contains word t and belongs to class c, p(t) is the probability that a document contains word t, and p(c) is the probability that a document belongs to class c.
Further, in one embodiment of the invention, the emotion classification and emotion component analysis method based on microblog texts also comprises: if the relevance of the word is greater than a first preset threshold, extracting the word as a feature word; and if the PMI value of the word is greater than a second preset threshold, extracting the word as a feature word.
Further, in one embodiment of the invention, analyzing the emotion components of the microblog text according to the classification result further comprises: obtaining the regression value of the microblog text for every emotion in the emotion classification system; and calculating the score of every emotion from its regression values, so as to select a preset number of emotions and calculate the proportions of the selected emotions.
Further, in one embodiment of the invention, the score of every emotion is calculated according to the following formula:
$$S_i=e^{V_{i,3}+V_{i,4}}$$
where $S_i$ is the score of the i-th emotion, $V_{i,3}$ is the layer-3 regression value of the i-th emotion, and $V_{i,4}$ is the layer-4 regression value of the i-th emotion. The proportions of the preset number of emotions are calculated according to the following formula:
$$P_i=\frac{e^{V_{i,3}+V_{i,4}}}{\sum_{k=1}^{4}e^{V_{k,3}+V_{k,4}}}$$
where $P_i$ is the proportion of the i-th emotion and K denotes that there are K emotions in total.
An embodiment of another aspect of the present invention proposes an emotion classification and emotion component analysis system based on microblog texts, comprising: an acquisition module for obtaining multiple microblog texts posted by users from the Internet; a word segmentation module for segmenting the multiple microblog texts into words and obtaining a plurality of words according to the part of speech of each word; an extraction module for extracting a plurality of feature words from the plurality of words; a creation module for training the classifier of each node in an emotion classification system according to the plurality of feature words so as to build the emotion classification system, and for performing emotion classification through the emotion classification system; and an analysis module for analyzing the emotion components of the microblog texts according to the classification result.
According to the emotion classification and emotion component analysis system based on microblog texts of the embodiment of the present invention, a plurality of words is obtained by segmenting the microblog texts, and a plurality of feature words is extracted from those words. The classifier of each node in the emotion classification system is trained on the feature words to build the emotion classification system, through which emotion classification is performed. The emotion components of the microblog texts are then analyzed quickly from the classification result to detect the most important emotions in the text. This saves time, improves classification speed and classification quality, and better meets users' needs.
In addition, the emotion classification and emotion component analysis system based on microblog texts according to the above embodiment of the present invention may also have the following additional technical features:
In one embodiment of the invention, the extraction module is further configured to: judge whether each word is a high-frequency word; if the word is a high-frequency word, calculate the relevance of the word; and if the word is a low-frequency word, calculate the PMI value of the word.
Further, in one embodiment of the invention, the relevance of the word is calculated according to the following formula:
$$\chi^2(t,c)=\frac{N\,(AD-BC)^2}{(A+C)(A+B)(B+D)(C+D)}$$
where t is the word being evaluated, c is the class, and N is the number of documents; A is the number of documents that belong to class c and contain word t, B is the number of documents that do not belong to class c but contain word t, C is the number of documents that belong to class c but do not contain word t, and D is the number of documents that neither belong to class c nor contain word t.
Further, in one embodiment of the invention, the PMI value of the word is calculated according to the following formula:
$$\mathrm{PMI}(t,c)=\log\frac{p(t,c)}{p(t)\,p(c)}$$
where p(t, c) is the probability that a document contains word t and belongs to class c, p(t) is the probability that a document contains word t, and p(c) is the probability that a document belongs to class c.
Further, in one embodiment of the invention, the extraction module is further configured to: extract the word as a feature word if its relevance is greater than a first preset threshold; and extract the word as a feature word if its PMI value is greater than a second preset threshold.
Further, in one embodiment of the invention, the analysis module is further configured to: obtain the regression value of the microblog text for every emotion in the emotion classification system; and calculate the score of every emotion from its regression values, so as to select a preset number of emotions and calculate the proportions of the selected emotions.
Further, in one embodiment of the invention, the score of every emotion is calculated according to the following formula:
$$S_i=e^{V_{i,3}+V_{i,4}}$$
where $S_i$ is the score of the i-th emotion, $V_{i,3}$ is the layer-3 regression value of the i-th emotion, and $V_{i,4}$ is the layer-4 regression value of the i-th emotion. The proportions of the preset number of emotions are calculated according to the following formula:
$$P_i=\frac{e^{V_{i,3}+V_{i,4}}}{\sum_{k=1}^{4}e^{V_{k,3}+V_{k,4}}}$$
where $P_i$ is the proportion of the i-th emotion and K denotes that there are K emotions in total.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from that description or be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of the emotion classification and emotion component analysis method based on microblog texts according to one embodiment of the invention;
Fig. 2 is a flowchart of the emotion classification and emotion component analysis method based on microblog texts according to a specific embodiment of the invention;
Fig. 3 shows a four-layer fine-grained emotion classification system according to one embodiment of the invention;
Fig. 4 is a structural diagram of the emotion classification and emotion component analysis system based on microblog texts according to one embodiment of the invention; and
Fig. 5 is a structural diagram of the emotion classification and emotion component analysis system based on microblog texts according to a specific embodiment of the invention.
Detailed description of the embodiments
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the invention and are not to be construed as limiting it.
The following disclosure provides many different embodiments or examples for implementing different structures of the invention. To simplify the disclosure, the components and arrangements of specific examples are described below. They are, of course, only examples and are not intended to limit the invention. In addition, the invention may repeat reference numerals and/or letters in different examples. This repetition is for simplicity and clarity and does not in itself indicate a relationship between the various embodiments and/or arrangements discussed. Furthermore, the invention provides examples of various specific processes and materials, but those of ordinary skill in the art will recognize the applicability of other processes and/or the use of other materials. Moreover, a structure described below in which a first feature is "on" a second feature may include embodiments in which the first and second features are formed in direct contact, and may also include embodiments in which additional features are formed between the first and second features, so that the first and second features may not be in direct contact.
In the description of the invention, it should be noted that, unless otherwise specified and limited, the terms "mounted", "connected with" and "connected" are to be understood broadly: the connection may, for example, be mechanical or electrical, may be internal to two elements, and may be direct or indirect through an intermediary. Those of ordinary skill in the art can understand the specific meaning of these terms according to the circumstances.
The emotion classification and emotion component analysis method and system based on microblog texts proposed according to embodiments of the invention are described below with reference to the drawings, beginning with the method. As shown in Fig. 1, the emotion classification and emotion component analysis method based on microblog texts (hereinafter, the analysis method) comprises the following steps:
S101: obtain multiple microblog texts posted by users from the Internet.
In one embodiment of the invention, as shown in Fig. 2, the original microblog texts are obtained from the Internet so that emotion classification and emotion component analysis can be performed on them. The data are mainly crawled from the microblog platform by a web crawler based on the microblog API (Application Programming Interface) and saved to a corresponding database. The crawled data are generally microblog texts; if the posts related to a particular event are to be analyzed, the corresponding API can be used to crawl the data.
S102: segment the multiple microblog texts into words, and obtain a plurality of words according to the part of speech of each word.
In one embodiment of the invention, the microblog texts are preferably segmented with the ICTCLAS word segmentation system of the Chinese Academy of Sciences. After segmentation, words with the following parts of speech are retained: noun (n), character string (x), numeral (m), measure word (q), verb (v), adjective (a), descriptive word (z), distinguishing word (b), adverb (d), unknown part of speech (un), interrogative pronoun (ry), question mark (ww), exclamation mark (wt), left parenthesis (wkz) and right parenthesis (wky).
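The part-of-speech filter described above might be sketched as follows. This is only an illustration: the function name `filter_by_pos`, the (word, tag) pair representation and the sample segmenter output are assumptions, and the exact-match comparison of tags is a simplification, since a real tagger may emit finer-grained sub-tags.

```python
# Tags retained after segmentation, as listed in the text above.
KEPT_TAGS = {"n", "x", "m", "q", "v", "a", "z", "b", "d",
             "un", "ry", "ww", "wt", "wkz", "wky"}


def filter_by_pos(tagged_words):
    """Keep only the (word, tag) pairs whose tag is in KEPT_TAGS."""
    return [(word, tag) for word, tag in tagged_words if tag in KEPT_TAGS]


# Hypothetical segmenter output for one post (the tags "t" and "ude1" are dropped).
tagged = [("今天", "t"), ("很", "d"), ("高兴", "a"), ("的", "ude1"), ("！", "wt")]
kept = filter_by_pos(tagged)
```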
Further, in one embodiment of the invention, two processing rules are added so that the words with the retained parts of speech are extracted more reliably, yielding the plurality of words that forms the feature space from which feature words are extracted. Rule one is the repeated-punctuation rule: to distinguish multiple question marks (or exclamation marks) from a single one while keeping the features uniform, any run of consecutive question marks (or exclamation marks) is represented by exactly two. Rule two is the negation rule: when a negated phrase such as "not very happy" occurs, the segmentation system splits off the negation word, giving "not/d very/d happy/a", which matches neither the requirement nor the actual semantics. Therefore, if an adjective occurs within the three words following a negation word, those words are treated together as one word, and the segmentation result becomes "not very happy/a".
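The two rules above could be sketched roughly as below. The negation word list `NEGATIONS`, the function names and the choice to merge everything from the negation word through the first adjective are assumptions made for illustration; the text itself only gives the "not very happy" example.

```python
import re

# Assumed negation words; the source gives only one example phrase.
NEGATIONS = {"不", "没", "没有"}


def collapse_punct(text):
    """Rule one: shrink any run of the same question/exclamation mark to two."""
    return re.sub(r"([?？!！])\1+", r"\1\1", text)


def merge_negation(tagged, window=3):
    """Rule two: if an adjective (tag "a") occurs within `window` words after
    a negation word, merge the negation word through that adjective into one
    token tagged as an adjective."""
    out, i = [], 0
    while i < len(tagged):
        word, tag = tagged[i]
        if word in NEGATIONS:
            span = tagged[i + 1:i + 1 + window]
            adj = next((k for k, (_, t) in enumerate(span) if t == "a"), None)
            if adj is not None:
                merged = word + "".join(w for w, _ in span[:adj + 1])
                out.append((merged, "a"))
                i += adj + 2  # skip the negation word and the merged words
                continue
        out.append((word, tag))
        i += 1
    return out


merged = merge_negation([("不", "d"), ("太", "d"), ("高兴", "a")])
```

Here `merged` holds a single token, "不太高兴" tagged `a`, which corresponds to the "not very happy/a" result in the text.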
S103: extract a plurality of feature words from the plurality of words.
In one embodiment of the invention, extracting the plurality of feature words from the plurality of words specifically comprises: judging whether each word is a high-frequency word; if the word is a high-frequency word, calculating the relevance of the word; and if the word is a low-frequency word, calculating the PMI value of the word.
Specifically, in one embodiment of the invention, the target feature word set to be chosen by the proposed feature selection algorithm, that is, the plurality of feature words, is divided into two parts: a high-frequency word set and a low-frequency word set. High-frequency words are words that occur with relatively high frequency in the sampled text, and low-frequency words are words that occur with relatively low frequency. In practice a preset frequency is determined according to actual conditions: a word whose frequency is above the preset frequency is a high-frequency word, and a word whose frequency is below it is a low-frequency word. In addition, it should be noted that the thresholds involved in the following process are all determined iteratively.
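A minimal sketch of the high/low frequency split follows. It assumes that "frequency" means document frequency (the number of posts containing the word) and that a word exactly at the preset frequency counts as low-frequency; neither detail is specified in the text, and the function name is hypothetical.

```python
from collections import Counter


def split_by_frequency(documents, preset_frequency):
    """Split the vocabulary into (high, low) frequency word sets.

    documents: list of token lists, one per microblog text.
    A word is high-frequency if it appears in more than `preset_frequency`
    documents (assumed definition); all other words are low-frequency.
    """
    doc_freq = Counter(word for doc in documents for word in set(doc))
    high = {w for w, count in doc_freq.items() if count > preset_frequency}
    low = set(doc_freq) - high
    return high, low


docs = [["a", "b"], ["a", "c"], ["a", "b"]]
high, low = split_by_frequency(docs, preset_frequency=1)
```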
Specifically, for the high-frequency word set, a method combining the chi-square test with odds ratios is adopted. The chi-square test algorithm is as follows: let the word to be evaluated be t and the class be c, and let there be N documents in total, that is, N microblog texts. As shown in Table 1, the documents are divided into the following 4 categories according to whether they contain t and whether they belong to c:
Table 1
                          Belongs to class c    Does not belong to class c
Contains word t           A                     B
Does not contain word t   C                     D
Further, in one embodiment of the invention, the relevance of the word is calculated according to the following formula:
$$\chi^2(t,c)=\frac{N\,(AD-BC)^2}{(A+C)(A+B)(B+D)(C+D)}$$
where t is the word being evaluated, c is the class, and N is the number of documents; A is the number of documents that belong to class c and contain word t, B is the number of documents that do not belong to class c but contain word t, C is the number of documents that belong to class c but do not contain word t, and D is the number of documents that neither belong to class c nor contain word t.
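As a worked illustration of the chi-square relevance, the sketch below counts the four cells of Table 1 from labeled documents and evaluates the formula. The function names and the toy numbers are assumptions; returning 0.0 when the denominator vanishes is a convention chosen here, not something the text specifies.

```python
def contingency(documents, labels, t, c):
    """Count the Table 1 cells (A, B, C, D) for word t and class c."""
    A = B = C = D = 0
    for tokens, label in zip(documents, labels):
        has_t, in_c = t in tokens, label == c
        if has_t and in_c:
            A += 1
        elif has_t:
            B += 1
        elif in_c:
            C += 1
        else:
            D += 1
    return A, B, C, D


def chi_square(A, B, C, D):
    """Chi-square statistic N(AD - BC)^2 / ((A+C)(A+B)(B+D)(C+D))."""
    N = A + B + C + D
    denom = (A + C) * (A + B) * (B + D) * (C + D)
    if denom == 0:
        return 0.0  # assumed convention for degenerate tables
    return N * (A * D - B * C) ** 2 / denom


# Toy counts: 100 documents, word t concentrated in class c.
score = chi_square(40, 10, 10, 40)  # 100 * 1500**2 / 50**4 = 36.0
```

With A = D = 40 and B = C = 10, the numerator is 100 × 1500² and the denominator is 50⁴, giving 36; a word distributed independently of the class (all four cells equal) scores 0.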
Further, in one embodiment of the invention, the PMI value of the word is calculated according to the following formula:
$$\mathrm{PMI}(t,c)=\log\frac{p(t,c)}{p(t)\,p(c)}$$
where p(t, c) is the probability that a document contains word t and belongs to class c, p(t) is the probability that a document contains word t, and p(c) is the probability that a document belongs to class c.
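The PMI value can be estimated from the same four counts used in Table 1. The sketch below assumes maximum-likelihood probability estimates and the natural logarithm (the text does not state the base), and returns negative infinity when the word never occurs in the class, which is a convention chosen here.

```python
import math


def pmi(A, B, C, D):
    """PMI(t, c) = log(p(t, c) / (p(t) p(c))) from the Table 1 counts."""
    N = A + B + C + D
    p_tc = A / N          # document contains t and belongs to c
    p_t = (A + B) / N     # document contains t
    p_c = (A + C) / N     # document belongs to c
    if p_tc == 0:
        return float("-inf")  # assumed convention: t never occurs in c
    return math.log(p_tc / (p_t * p_c))


value = pmi(40, 10, 10, 40)  # log(0.4 / (0.5 * 0.5)) = log(1.6)
```

A positive value means the word co-occurs with the class more often than independence would predict; equal cell counts give a PMI of 0.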
Further, in one embodiment of the invention, the above analysis method also comprises: if the relevance of the word is greater than a first preset threshold, extracting the word as a feature word; and if the PMI value of the word is greater than a second preset threshold, extracting the word as a feature word.
Specifically, in one embodiment of the invention, high-frequency words are selected by traversing the high-frequency word set from high to low: a word is selected if its odds ratio, that is, its relevance, is greater than the set threshold, and the traversal continues until no words remain or the number of selected words reaches a threshold. Low-frequency words are selected using PMI: a word is selected if its positive PMI or its negative PMI is higher than the set threshold. Finally, the selected high-frequency and low-frequency word sets are merged to form the feature word set that is ultimately retained.
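One possible reading of this selection procedure is sketched below. Several details are assumptions: the traversal order is taken to be descending relevance, the word-count limit is the hypothetical parameter `max_high`, and each low-frequency word is assumed to come with a precomputed (positive PMI, negative PMI) pair, since the text does not define how the two PMI values are obtained.

```python
def select_features(high_scores, low_pmi, relevance_threshold,
                    max_high, pmi_threshold):
    """Merge selected high- and low-frequency words into one feature set.

    high_scores: {word: relevance} for high-frequency words.
    low_pmi:     {word: (positive_pmi, negative_pmi)} for low-frequency words.
    """
    selected = set()
    # Traverse high-frequency words from high to low relevance (assumed order).
    ranked = sorted(high_scores.items(), key=lambda kv: kv[1], reverse=True)
    for word, relevance in ranked:
        if len(selected) >= max_high:
            break  # the word-count threshold has been reached
        if relevance > relevance_threshold:
            selected.add(word)
    # Keep a low-frequency word if either PMI value exceeds the threshold.
    for word, (pos, neg) in low_pmi.items():
        if pos > pmi_threshold or neg > pmi_threshold:
            selected.add(word)
    return selected


features = select_features(
    {"开心": 12.0, "难过": 8.0, "今天": 0.5},
    {"哭泣": (1.2, -0.3), "桌子": (0.1, 0.0)},
    relevance_threshold=1.0, max_high=2, pmi_threshold=1.0,
)
```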
S104: train the classifier of each node in the emotion classification system according to the plurality of feature words so as to build the emotion classification system, and perform emotion classification through the emotion classification system.
Further, in one embodiment of the invention, SVM (Support Vector Machine) is a machine learning algorithm for processing linearly separable data. When the data are not linearly separable, SVM can map them into a higher-dimensional space in which they become linearly separable. To avoid the computational complexity of working in the higher-dimensional space, SVM can use a kernel function to compute the result. The classifier used in the embodiment of the invention is SVR (Support Vector Regression), a branch of SVM. Unlike SVM, which directly outputs a classification result, SVR outputs a regression value for each sample, so the classification threshold can be adjusted more flexibly. In the multi-class case, SVR first calculates the regression value for each class, then calculates the difference between each regression value and its threshold, and selects the class with the largest difference as the final class. In other words, the embodiment of the invention trains the SVR of each node in the emotion classification system according to the plurality of feature words so as to build a four-layer fine-grained emotion classification system. It should be noted that, in practical applications, the feature selection algorithm of each layer can be adjusted flexibly according to the characteristics of the data, and an algorithm different from that of the present invention can be chosen to build the emotion classification system. The analysis method of the embodiment not only improves emotion classification quality but also increases emotion classification speed.
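The multi-class decision rule described above (pick the class whose regression value exceeds its threshold by the largest margin) can be sketched as follows. The regression values would come from the trained SVR models, which are not shown; the function name, the per-class threshold dictionary and the sample numbers are assumptions for illustration.

```python
def pick_class(regression_values, thresholds):
    """Return the class with the largest (regression value - threshold)."""
    return max(regression_values,
               key=lambda label: regression_values[label] - thresholds[label])


# Hypothetical regression outputs of three sibling nodes for one post.
values = {"happy": 0.7, "sad": 0.6, "angry": 0.2}
limits = {"happy": 0.5, "sad": 0.2, "angry": 0.5}
chosen = pick_class(values, limits)  # margins: 0.2, 0.4, -0.3
```

In this example "sad" is chosen even though "happy" has the higher raw regression value, because its margin over the threshold is larger. This is how adjusting a threshold shifts the decision without retraining.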
Preferably, in one embodiment of the invention, as shown in Fig. 3, the emotion classification system is a four-layer fine-grained emotion classification system. Earlier emotion classification algorithms generally use a 3-layer classification system with 7 emotions in total, whereas the emotion classification system adopted by the embodiment of the invention adds one more layer on that basis, with 19 fine-grained basic emotions, and can therefore characterize emotion in finer detail.
In an embodiment of the invention, the classifiers are trained according to the plurality of feature words. The plurality of feature words is divided into a training set and a test set; the classifiers are trained on the training set and evaluated on the test set. The evaluation metrics are precision, recall and the F1 score. In a specific embodiment of the invention, the classification results are shown in Table 2; the data used are original microblog texts crawled from Sina Weibo, 9960 in total. As Table 2 shows, the embodiment of the invention improves the precision and coverage of emotion classification and classifies the emotions of microblog texts better.
Table 2
Emotion         Precision   Recall   F1
Sad             0.398       0.412    0.415
Compunction     0.333       0.130    0.188
Disappointed    0.327       0.358    0.341
Miss            0.446       0.465    0.455
In surprise     0.417       0.312    0.357
Unbearably      0.529       0.429    0.474
Frightened      0.500       0.583    0.538
Shy             0.267       0.267    0.267
Indignation     0.750       0.493    0.595
Censure         0.284       0.338    0.309
Unhappy         0.300       0.401    0.344
Suspect         0.188       0.115    0.143
Abhor           0.514       0.463    0.487
Like            0.273       0.185    0.220
Believe         0.467       0.389    0.424
Praise          0.111       0.070    0.086
Wish            0.606       0.680    0.641
Feel at ease    0.294       0.294    0.294
Happy           0.578       0.585    0.581
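For reference, per-emotion precision, recall and F1 of the kind reported in Table 2 follow the standard definitions, sketched below. The function name and the toy labels are illustrative only; this is not the evaluation code behind the table.

```python
def precision_recall_f1(gold, predicted, label):
    """Standard one-vs-rest precision, recall and F1 for a single label."""
    pairs = list(zip(gold, predicted))
    tp = sum(1 for g, p in pairs if g == label and p == label)
    fp = sum(1 for g, p in pairs if g != label and p == label)
    fn = sum(1 for g, p in pairs if g == label and p != label)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


gold = ["happy", "happy", "sad", "sad"]
pred = ["happy", "sad", "happy", "sad"]
metrics = precision_recall_f1(gold, pred, "happy")  # tp = fp = fn = 1
```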
S105, analyzes microblogging text mood composition according to classification results.
Further, in one embodiment of the invention, according to classification results, microblogging text mood composition is analyzed, further comprised: the regressand value that obtains microblogging text every kind of mood in mood degree taxonomic hierarchies; According to the regressand value of every kind of mood, calculate the score of every kind of mood, to choose the mood of default value, and calculate the ratio of the mood of described default value.
Further, in one embodiment of the invention, the score of each mood is calculated according to the following formula:
$$S_i = e^{V_{i,3} + V_{i,4}}$$
where $S_i$ denotes the score of the i-th mood, $V_{i,3}$ denotes the layer-3 regression value of the i-th mood, and $V_{i,4}$ denotes the layer-4 regression value of the i-th mood. The proportion of each of the preset number of moods is calculated according to the following formula:
$$P_i = \frac{e^{V_{i,3} + V_{i,4}}}{\sum_{k=1}^{4} e^{V_{k,3} + V_{k,4}}}$$
where $P_i$ denotes the proportion of the i-th mood and K denotes the total number of moods (the sum runs over the selected moods, 4 in this formula).
Specifically, the mood component analysis relies on the mood classification result and detects the dominant moods in the current text. The scores are computed mainly from the regression values of the current text on the third and fourth layers of the mood classification hierarchy, and the preset number of moods with the highest scores, for example 4, are selected. For the i-th basic mood, the score is $S_i = e^{V_{i,3} + V_{i,4}}$, where $V_{i,3}$ and $V_{i,4}$ are the regression values of mood i on the third and fourth layers respectively. The 4 moods with the highest scores $S_i$ are then selected and the proportion of each is calculated as $P_i = e^{V_{i,3} + V_{i,4}} / \sum_{k=1}^{4} e^{V_{k,3} + V_{k,4}}$, which completes the analysis of the mood composition of the microblog text. In this way the embodiment of the present invention automatically identifies the moods in a microblog by computer, detects the 4 dominant moods, calculates their proportions, and displays the result dynamically.
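The scoring and proportion steps described above can be sketched in a few lines of Python. This is a minimal illustration, assuming the score is the exponential of the sum of the layer-3 and layer-4 regression values and that the proportions are normalized over the selected moods only; the function name, the input layout and the sample values are hypothetical.

```python
import math


def mood_composition(regression, top_k=4):
    """Return the top_k moods with their proportions.

    regression maps a mood name to a (layer-3, layer-4) pair of
    regression values (hypothetical input layout).
    """
    # S_i = exp(V_{i,3} + V_{i,4})
    scores = {mood: math.exp(v3 + v4) for mood, (v3, v4) in regression.items()}
    # Keep the top_k moods with the highest scores.
    top = sorted(scores, key=scores.get, reverse=True)[:top_k]
    # P_i = S_i divided by the sum of the selected scores.
    total = sum(scores[mood] for mood in top)
    return {mood: scores[mood] / total for mood in top}


# Made-up regression values for five moods.
values = {
    "happy": (0.9, 0.8),
    "wish": (0.5, 0.4),
    "sad": (-0.2, -0.1),
    "like": (0.1, 0.3),
    "fear": (-0.5, -0.6),
}
for mood, share in mood_composition(values).items():
    print(f"{mood}: {share:.2%}")
```

Normalizing over the selected moods makes the displayed proportions sum to 1, which suits a chart of the dominant moods.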
The analysis method of the embodiment of the present invention has the following main features: 1) It saves time. Microblog texts no longer need manual analysis; the mood classification and the dominant moods of a microblog text can be obtained quickly. 2) It is widely applicable. The method can be used by manufacturers or competent authorities to analyze the overall mood trend of users, and by individual users to analyze their own and other people's emotional state. 3) It is fine-grained. Previous mood classification algorithms generally use a 3-layer classification hierarchy with 7 moods in total, whereas the hierarchy adopted by the embodiment of the present invention adds one more layer on that basis, giving 19 fine-grained basic moods, so moods can be characterized more precisely.
According to the mood classification and mood component analysis method based on microblog text of the embodiment of the present invention, the microblog texts are segmented into a plurality of words, a plurality of feature words are extracted from those words, and the classifier of each node in the mood classification hierarchy is trained on the feature words, which completes the construction of the hierarchy. Mood classification is then performed through the hierarchy, and the mood composition of the microblog text is analyzed quickly from the classification result to detect the dominant moods in the text. This not only saves time and increases classification speed but also improves classification performance, and better meets users' needs.
Fig. 4 is a structural diagram of the mood classification and mood component analysis system based on microblog text according to an embodiment of the present invention. As shown in Fig. 4, the system (hereinafter referred to as analysis system 100) comprises: an acquisition module 10, a word segmentation module 20, an extraction module 30, a creation module 40 and an analysis module 50.
The acquisition module 10 is configured to obtain a plurality of user-posted microblog texts from the Internet. The word segmentation module 20 is configured to segment the microblog texts and to obtain a plurality of words according to the part of speech of each word. The extraction module 30 is configured to extract a plurality of feature words from the plurality of words. The creation module 40 is configured to train the classifier of each node in the mood classification hierarchy on the feature words, so as to build the hierarchy and perform mood classification through it. The analysis module 50 is configured to analyze the mood composition of the microblog text according to the classification result.
In one embodiment of the invention, as shown in Fig. 2, original microblog texts are obtained from the Internet for mood classification and mood component analysis. The data are crawled from the microblog platform by a web crawler, mainly through the microblog API, and saved in database 80. The crawled data are generally microblog texts; to analyze the microblogs related to a particular event, the corresponding API can be used to crawl the data.
Preferably, in one embodiment of the invention, the microblog text is segmented with the ICTCLAS word segmentation system of the Chinese Academy of Sciences, and after segmentation only words with the following parts of speech are retained: noun (n), character string (x), numeral (m), measure word (q), verb (v), adjective (a), descriptive word (z), distinguishing word (b), adverb (d), unknown part of speech (un), interrogative pronoun (ry), question mark (ww), exclamation mark (wt), left parenthesis (wkz) and right parenthesis (wky).
Further, in one embodiment of the invention, two processing rules are added so that the words with the retained parts of speech are extracted more reliably, yielding the plurality of words that form the feature space from which the feature words are extracted. Rule 1, repeated punctuation: to distinguish a run of question marks (or exclamation marks) from a single one while keeping the features uniform, any run of consecutive question marks (or exclamation marks) is represented by exactly two question marks (or exclamation marks). Rule 2, negation: when a negated phrase such as "not very happy" occurs, the segmentation system splits off the negation word and produces "not/d very/d happy/a", which matches neither the requirement nor the actual meaning. Therefore, if an adjective appears within the three words following a negation word, these words are treated together as a single word, so that the segmentation result becomes "not very happy/a".
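The two processing rules can be illustrated with a short Python sketch. It is only an interpretation of the text above: the negation word list, the English tokens and the choice to merge everything from the negation word through the first adjective are assumptions, and a real system would operate on Chinese tokens produced by the segmenter.

```python
import re

# Hypothetical negation list; a real system would use Chinese negation words.
NEGATION_WORDS = {"not", "no"}


def collapse_punctuation(text):
    """Rule 1: replace any run of two or more identical question or
    exclamation marks (ASCII or full-width) with exactly two."""
    return re.sub(r"([?？!！])\1+", r"\1\1", text)


def merge_negation(tokens, window=3, joiner=" "):
    """Rule 2: if an adjective (tag 'a') appears within `window` tokens
    after a negation word, merge the negation word through that adjective
    into one token tagged 'a'. tokens is a list of (word, pos) pairs."""
    merged = []
    i = 0
    while i < len(tokens):
        word, pos = tokens[i]
        if word in NEGATION_WORDS:
            following = tokens[i + 1 : i + 1 + window]
            adj = next((j for j, (_, p) in enumerate(following) if p == "a"), None)
            if adj is not None:
                span = tokens[i : i + adj + 2]
                merged.append((joiner.join(w for w, _ in span), "a"))
                i += len(span)
                continue
        merged.append((word, pos))
        i += 1
    return merged


print(collapse_punctuation("really???"))
print(merge_negation([("not", "d"), ("very", "d"), ("happy", "a")]))
```

For Chinese text the `joiner` argument would typically be an empty string, since words are not separated by spaces.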
Further, in one embodiment of the invention, the extraction module 30 is also configured to: judge whether each word is a high-frequency word; if the word is a high-frequency word, calculate the relevance of the word; and if the word is a low-frequency word, calculate the PMI value of the word.
Specifically, in one embodiment of the invention, the feature selection algorithm proposed here divides the target feature word set to be selected, that is, the plurality of feature words, into two parts: a high-frequency word set and a low-frequency word set. A high-frequency word is one that occurs frequently in the sampled text, and a low-frequency word is one that occurs rarely. In practice a preset frequency is chosen according to the actual situation: a word whose frequency is higher than the preset frequency is a high-frequency word, and a word whose frequency is lower than the preset frequency is a low-frequency word. It should be noted that the thresholds involved in the following process are all determined by iteration.
Specifically, for the high-frequency word set, a method combining the chi-square test and odds ratios is adopted. The chi-square test works as follows: let t be the word to be evaluated, c the category, and N the total number of documents, that is, N microblog texts. As shown in Table 1, the documents are divided into 4 classes according to whether they contain t and whether they belong to c.
Further, in one embodiment of the invention, the relevance of a word is calculated according to the following formula:
$$\chi^2(t, c) = \frac{N (AD - BC)^2}{(A + C)(A + B)(B + D)(C + D)}$$
where t is the word to be evaluated, c is the category, N is the number of documents, A is the number of documents that belong to category c and contain word t, B is the number of documents that do not belong to category c and contain word t, C is the number of documents that belong to category c and do not contain word t, and D is the number of documents that neither belong to category c nor contain word t.
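A minimal Python sketch of this chi-square relevance follows, assuming A, B, C and D are the four document counts of the contingency table and N is their sum. The function name and the sample counts are illustrative only.

```python
def chi_square(a, b, c, d):
    """Chi-square relevance of word t to category c.

    a: documents in c that contain t
    b: documents not in c that contain t
    c: documents in c that do not contain t
    d: documents not in c that do not contain t
    """
    n = a + b + c + d
    denominator = (a + c) * (a + b) * (b + d) * (c + d)
    if denominator == 0:
        # A degenerate table (an empty row or column) carries no evidence.
        return 0.0
    return n * (a * d - b * c) ** 2 / denominator


# Made-up counts: 100 documents, 60 in the category, 50 containing the word.
print(chi_square(40, 10, 20, 30))
```

When the word is distributed independently of the category, AD equals BC and the value is zero; larger values indicate stronger association.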
Further, in one embodiment of the invention, the PMI value of a word is calculated according to the following formula:
$$\mathrm{PMI}(t, c) = \log \frac{p(t, c)}{p(t)\, p(c)}$$
where p(t, c) denotes the probability that a document contains word t and belongs to category c, p(t) denotes the probability that a document contains word t, and p(c) denotes the probability that a document belongs to category c.
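The PMI value can be estimated from document counts, as in the following minimal Python sketch. Estimating the probabilities as relative frequencies and using the natural logarithm are assumptions; the text does not state the estimator or the base of the logarithm.

```python
import math


def pmi(a, b, c, n):
    """Pointwise mutual information of word t and category c.

    a: documents in c that contain t
    b: documents not in c that contain t
    c: documents in c that do not contain t
    n: total number of documents
    """
    if a == 0:
        # The word never co-occurs with the category.
        return float("-inf")
    p_tc = a / n        # p(t, c)
    p_t = (a + b) / n   # p(t)
    p_c = (a + c) / n   # p(c)
    return math.log(p_tc / (p_t * p_c))


# Made-up counts: p(t, c) = 0.4, p(t) = 0.5, p(c) = 0.6
print(pmi(40, 10, 20, 100))
```

A positive value means the word occurs in the category more often than independence would predict, and a negative value means it occurs less often.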
Further, in one embodiment of the invention, the extraction module 30 is also configured to extract a word as a feature word if its relevance is greater than a first preset threshold, and to extract a word as a feature word if its PMI value is greater than a second preset threshold.
Specifically, in one embodiment of the invention, high-frequency words are selected by traversing the words of the high-frequency word set from high to low; a word is selected if its odds ratio, that is, its relevance, is greater than the set threshold, and the traversal continues until no candidate words remain or the number of selected words reaches a threshold. Low-frequency words are selected with PMI: a word in the low-frequency word set is selected if its positive PMI or negative PMI is higher than the set threshold. Finally, the selected high-frequency words and low-frequency words are merged to form the final retained feature word set.
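The two-part selection procedure can be sketched as follows. This is a hedged illustration rather than the patented algorithm: the traversal order (by frequency), the reading of "positive PMI or negative PMI" as the absolute PMI value, the treatment of words exactly at the cutoff as low-frequency, and all names and sample numbers are assumptions.

```python
def select_features(freq, chi2, pmi_scores, freq_cutoff,
                    chi2_threshold, pmi_threshold, max_high=None):
    """Select feature words from precomputed statistics.

    freq, chi2 and pmi_scores map a word to its frequency, chi-square
    relevance and PMI value respectively (hypothetical inputs).
    """
    high = [w for w in freq if freq[w] > freq_cutoff]
    low = [w for w in freq if freq[w] <= freq_cutoff]

    # High-frequency words: traverse from high to low frequency and keep
    # those whose relevance exceeds the threshold, up to max_high words.
    selected_high = []
    for word in sorted(high, key=lambda w: freq[w], reverse=True):
        if max_high is not None and len(selected_high) >= max_high:
            break
        if chi2.get(word, 0.0) > chi2_threshold:
            selected_high.append(word)

    # Low-frequency words: keep those with a strongly positive or
    # strongly negative PMI (assumed to mean |PMI| above the threshold).
    selected_low = [w for w in low if abs(pmi_scores.get(w, 0.0)) > pmi_threshold]

    # Merge both parts into the final feature word set.
    return selected_high + selected_low


freq = {"happy": 50, "sad": 40, "the": 90, "yay": 3, "meh": 2}
chi2 = {"happy": 12.0, "sad": 8.0, "the": 0.5}
pmi_scores = {"yay": 1.2, "meh": 0.1}
print(select_features(freq, chi2, pmi_scores, freq_cutoff=10,
                      chi2_threshold=3.84, pmi_threshold=0.5))
```

Since the text states that the thresholds are determined by iteration, the three threshold arguments would in practice be tuned rather than fixed in advance.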
Further, in one embodiment of the invention, SVM (support vector machine) is a machine learning algorithm for handling linearly separable data. When the data are not linearly separable, SVM maps them into a higher-dimensional space in which they become linearly separable. To avoid the computational complexity of working in the higher-dimensional space, SVM uses a kernel function to compute the result. The classifier used in the embodiment of the present invention is SVR (support vector regression), a branch of SVM. Unlike SVM, which outputs a classification result directly, SVR outputs a regression value for each sample, which allows the classification threshold to be adjusted more flexibly. For multi-class cases, SVR first computes the regression value of each class, then computes the difference between each regression value and the threshold, and selects the class with the largest difference as the final class. In other words, the embodiment of the present invention trains an SVR for each node in the mood classification hierarchy on the plurality of feature words, so as to build a four-layer fine-grained mood classification hierarchy. It should be noted that in practical applications the feature selection algorithm of each layer can be adjusted flexibly according to the characteristics of the data, and algorithms different from those of the present invention may be chosen to build the mood classification hierarchy. The analysis method of the embodiment of the present invention can improve both mood classification performance and mood classification speed.
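The multi-class decision step described above (largest difference between regression value and threshold) can be sketched in plain Python. The regression values would come from the trained SVR of each class; here they are simply given as input, and the per-class threshold layout, the function name and the numbers are hypothetical.

```python
def pick_class(regression_values, thresholds):
    """Return the class whose regression value exceeds its threshold by
    the largest margin.

    regression_values and thresholds map a class label to a float
    (hypothetical layout; one threshold per class is an assumption).
    """
    margins = {
        label: value - thresholds[label]
        for label, value in regression_values.items()
    }
    return max(margins, key=margins.get)


# Made-up outputs of three sibling classifiers at one node.
values = {"happy": 0.62, "sad": 0.70, "fear": 0.10}
thresholds = {"happy": 0.30, "sad": 0.55, "fear": 0.20}
print(pick_class(values, thresholds))  # happy
```

In the example "sad" has the highest raw regression value, but "happy" wins because it clears its own threshold by a wider margin, which shows how adjustable thresholds change the outcome.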
Preferably, in one embodiment of the invention, as shown in Fig. 3, the mood classification hierarchy is a four-layer fine-grained mood classification hierarchy. Specifically, previous mood classification algorithms generally use a 3-layer classification hierarchy with 7 moods in total, whereas the hierarchy adopted by the embodiment of the present invention adds one more layer on that basis, giving 19 fine-grained basic moods, so moods can be characterized more precisely.
In an embodiment of the present invention, the classifiers are trained on the plurality of feature words. The plurality of feature words are divided into a training set and a test set; each classifier is trained on the training set and its performance is tested on the test set. Precision, recall and F1 score (F1-Score) are used as the evaluation metrics. In a specific embodiment of the present invention, the data are all original microblog texts crawled from Sina Weibo, 9960 in total, and the classification results are shown in Table 2. As Table 2 shows, the embodiment of the present invention improves the precision and coverage of mood classification and classifies microblog texts by mood more effectively.
Further, in one embodiment of the invention, the analysis module 50 is also configured to: obtain the regression value of the microblog text for each mood in the mood classification hierarchy; and calculate the score of each mood from its regression value, so as to select a preset number of moods and calculate the proportion of each of the selected moods.
Further, in one embodiment of the invention, the score of each mood is calculated according to the following formula:
$$S_i = e^{V_{i,3} + V_{i,4}}$$
where $S_i$ denotes the score of the i-th mood, $V_{i,3}$ denotes the layer-3 regression value of the i-th mood, and $V_{i,4}$ denotes the layer-4 regression value of the i-th mood. The proportion of each of the preset number of moods is calculated according to the following formula:
$$P_i = \frac{e^{V_{i,3} + V_{i,4}}}{\sum_{k=1}^{4} e^{V_{k,3} + V_{k,4}}}$$
where $P_i$ denotes the proportion of the i-th mood and K denotes the total number of moods (the sum runs over the selected moods, 4 in this formula).
Specifically, the mood component analysis relies on the mood classification result and detects the dominant moods in the current text. The scores are computed mainly from the regression values of the current text on the third and fourth layers of the mood classification hierarchy, and the preset number of moods with the highest scores, for example 4, are selected. For the i-th basic mood, the score is $S_i = e^{V_{i,3} + V_{i,4}}$, where $V_{i,3}$ and $V_{i,4}$ are the regression values of mood i on the third and fourth layers respectively. The 4 moods with the highest scores $S_i$ are then selected and the proportion of each is calculated as $P_i = e^{V_{i,3} + V_{i,4}} / \sum_{k=1}^{4} e^{V_{k,3} + V_{k,4}}$, which completes the analysis of the mood composition of the microblog text. In this way the embodiment of the present invention automatically identifies the moods in a microblog by computer, detects the 4 dominant moods, calculates their proportions, and displays the result dynamically.
Further, in one embodiment of the invention, as shown in Fig. 5, the analysis system 100 may also comprise a user interface module 60 and a database interface module 70.
The user interface module 60 provides the users of the analysis system 100 with a friendly graphical user interface, so that they can conveniently browse their own and other people's emotional state. The database interface module 70 provides the read/write interface to database 80 for the whole system, so that the other functional modules can conveniently perform data I/O operations.
Further, in one embodiment of the invention, the acquisition module 10, word segmentation module 20, extraction module 30, creation module 40, analysis module 50, user interface module 60 and database interface module 70 of the analysis system 100 are all developed in Java, Python and JSP under Windows. Given this development platform, deploying and running the analysis system 100 requires the following runtime environment. At the operating system layer, the analysis system 100 must run on Windows XP or a compatible operating system, and it also requires the program runtime support environments, namely the Java and Python runtimes. Only when this supporting environment is in place can the analysis system 100 run normally. Users only need to access the system through a web browser to browse their own and other people's emotional state.
The analysis system 100 of the embodiment of the present invention has the following main features: 1) It saves time. Microblog texts no longer need manual analysis; the mood classification and the dominant moods of a microblog text can be obtained quickly. 2) It is widely applicable. The system can be used by manufacturers or competent authorities to analyze the overall mood trend of users, and by individual users to analyze their own and other people's emotional state. 3) It is fine-grained. Previous mood classification algorithms generally use a 3-layer classification hierarchy with 7 moods in total, whereas the hierarchy adopted by the embodiment of the present invention adds one more layer on that basis, giving 19 fine-grained basic moods, so moods can be characterized more precisely.
According to the mood classification and mood component analysis system based on microblog text of the embodiment of the present invention, the microblog texts are segmented into a plurality of words, a plurality of feature words are extracted from those words, and the classifier of each node in the mood classification hierarchy is trained on the feature words, which completes the construction of the hierarchy. Mood classification is then performed through the hierarchy, and the mood composition of the microblog text is analyzed quickly from the classification result to detect the dominant moods in the text. This not only saves time and increases classification speed but also improves classification performance, and better meets users' needs.
Any process or method description in a flowchart or otherwise described herein may be understood as representing a module, segment or portion of code comprising one or more executable instructions for implementing specific logical functions or steps of the process, and the scope of the preferred embodiments of the present invention includes other implementations in which functions may be performed out of the order shown or discussed, including substantially concurrently or in reverse order depending on the functions involved, as should be understood by those skilled in the art to which the embodiments of the present invention belong.
The logic and/or steps represented in a flowchart or otherwise described herein, for example an ordered list of executable instructions for implementing logical functions, may be embodied in any computer-readable medium for use by, or in connection with, an instruction execution system, apparatus or device (such as a computer-based system, a system comprising a processor, or another system that can fetch instructions from an instruction execution system, apparatus or device and execute them). For the purposes of this specification, a "computer-readable medium" may be any means that can contain, store, communicate, propagate or transport a program for use by, or in connection with, an instruction execution system, apparatus or device. More specific examples (a non-exhaustive list) of computer-readable media include: an electrical connection (electronic device) having one or more wires, a portable computer diskette (magnetic device), random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), an optical fiber device, and a portable compact disc read-only memory (CD-ROM). The computer-readable medium may even be paper or another suitable medium on which the program is printed, since the program can be obtained electronically, for example by optically scanning the paper or other medium and then editing, interpreting or otherwise processing it in a suitable manner if necessary, and then stored in a computer memory.
It should be understood that the parts of the present invention may be implemented in hardware, software, firmware or a combination thereof. In the above embodiments, a plurality of steps or methods may be implemented in software or firmware that is stored in a memory and executed by a suitable instruction execution system. For example, if implemented in hardware, as in another embodiment, they may be implemented by any one or a combination of the following techniques known in the art: a discrete logic circuit having logic gates for implementing logic functions on data signals, an application-specific integrated circuit having suitable combinational logic gates, a programmable gate array (PGA), a field programmable gate array (FPGA), and so on.
Those of ordinary skill in the art will appreciate that all or some of the steps of the above method embodiments may be carried out by a program instructing the relevant hardware; the program may be stored in a computer-readable storage medium and, when executed, performs one or a combination of the steps of the method embodiments.
In addition, the functional units in the embodiments of the present invention may be integrated into one processing module, or each unit may exist physically on its own, or two or more units may be integrated into one module. The integrated module may be implemented in the form of hardware or in the form of a software functional module. If the integrated module is implemented as a software functional module and sold or used as an independent product, it may also be stored in a computer-readable storage medium. In addition, the terms "first" and "second" are used for descriptive purposes only and shall not be understood as indicating or implying relative importance or implicitly indicating the number of the technical features referred to. A feature qualified by "first" or "second" may therefore explicitly or implicitly include at least one such feature. In the description of the present invention, "a plurality of" means at least two, for example two or three, unless specifically defined otherwise.
The storage medium mentioned above may be a read-only memory, a magnetic disk, an optical disc, or the like.
In this specification, a description referring to the terms "one embodiment", "some embodiments", "an example", "a specific example" or "some examples" means that a specific feature, structure, material or characteristic described in connection with that embodiment or example is included in at least one embodiment or example of the present invention. Such terms do not necessarily refer to the same embodiment or example, and the specific features, structures, materials or characteristics described may be combined in a suitable manner in any one or more embodiments or examples.
Although embodiments of the present invention have been shown and described, those of ordinary skill in the art will understand that various changes, modifications, substitutions and variations may be made to these embodiments without departing from the principles and spirit of the present invention; the scope of the present invention is defined by the claims and their equivalents.

Claims (14)

1. A mood classification and mood component analysis method based on microblog text, characterized by comprising the following steps:
obtaining a plurality of user-posted microblog texts from the Internet;
segmenting said plurality of microblog texts, so as to obtain a plurality of words according to the part of speech of each word;
extracting a plurality of feature words from said plurality of words;
training the classifier of each node in a mood classification hierarchy according to said plurality of feature words, so as to build said mood classification hierarchy, and performing mood classification through said mood classification hierarchy; and
analyzing the mood composition of the microblog text according to the classification result.
2. The method according to claim 1, characterized in that extracting said plurality of feature words from said plurality of words specifically comprises:
judging whether each word is a high-frequency word;
if the word is judged to be said high-frequency word, calculating the relevance of said word; and
if said word is judged to be a low-frequency word, calculating the PMI value of said word.
3. The method according to claim 2, characterized in that the relevance of said word is calculated according to the following formula:
$$\chi^2(t, c) = \frac{N (AD - BC)^2}{(A + C)(A + B)(B + D)(C + D)}$$
where t is the word to be evaluated, c is the category, N is the number of documents, A is the number of documents that belong to category c and contain word t, B is the number of said documents that do not belong to said category c and contain said word t, C is the number of said documents that belong to said category c and do not contain said word t, and D is the number of said documents that neither belong to said category c nor contain said word t.
4. The method according to claim 2, characterized in that the PMI value of said word is calculated according to the following formula:
$$\mathrm{PMI}(t, c) = \log \frac{p(t, c)}{p(t)\, p(c)}$$
where p(t, c) denotes the probability that said document contains said word t and belongs to said category c, p(t) denotes the probability that said document contains said word t, and p(c) denotes the probability that said document belongs to said category c.
5. The method according to any one of claims 2-4, characterized by further comprising:
if the relevance of said word is greater than a first preset threshold, extracting said word as a feature word;
if the PMI value of said word is greater than a second preset threshold, extracting said word as said feature word.
6. The method according to claim 1, characterized in that analyzing the mood composition of said microblog text according to said classification result further comprises:
obtaining the regression value of said microblog text for each mood in said mood classification hierarchy;
calculating the score of said each mood according to the regression value of said each mood, so as to select a preset number of moods, and calculating the proportion of each of said preset number of moods.
7. The method according to claim 6, characterized in that the score of said each mood is calculated according to the following formula:
$$S_i = e^{V_{i,3} + V_{i,4}}$$
where $S_i$ denotes the score of the i-th mood, $V_{i,3}$ denotes the layer-3 regression value of said i-th mood, and $V_{i,4}$ denotes the layer-4 regression value of said i-th mood;
the proportion of each of said preset number of moods is calculated according to the following formula:
$$P_i = \frac{e^{V_{i,3} + V_{i,4}}}{\sum_{k=1}^{4} e^{V_{k,3} + V_{k,4}}}$$
where $P_i$ denotes the proportion of said i-th mood, and K denotes the total number of moods.
8. A mood classification and mood component analysis system based on microblog text, characterized by comprising:
an acquisition module, configured to obtain a plurality of user-posted microblog texts from the Internet;
a word segmentation module, configured to segment said plurality of microblog texts, so as to obtain a plurality of words according to the part of speech of each word;
an extraction module, configured to extract a plurality of feature words from said plurality of words;
a creation module, configured to train the classifier of each node in a mood classification hierarchy according to said plurality of feature words, so as to build said mood classification hierarchy, and to perform mood classification through said mood classification hierarchy; and
an analysis module, configured to analyze the mood composition of the microblog text according to the classification result.
9. The system according to claim 8, characterized in that said extraction module is also configured to:
judge whether each word is a high-frequency word;
if the word is judged to be said high-frequency word, calculate the relevance of said word; and
if said word is judged to be a low-frequency word, calculate the PMI value of said word.
10. The system according to claim 9, characterized in that the relevance of said word is calculated according to the following formula:
$$\chi^2(t, c) = \frac{N (AD - BC)^2}{(A + C)(A + B)(B + D)(C + D)}$$
where t is the word to be evaluated, c is the category, N is the number of documents, A is the number of documents that belong to category c and contain word t, B is the number of said documents that do not belong to said category c and contain said word t, C is the number of said documents that belong to said category c and do not contain said word t, and D is the number of said documents that neither belong to said category c nor contain said word t.
11. The system according to claim 9, characterized in that the PMI value of said word is calculated according to the following formula:
$$\mathrm{PMI}(t, c) = \log \frac{p(t, c)}{p(t)\, p(c)}$$
where p(t, c) denotes the probability that said document contains said word t and belongs to said category c, p(t) denotes the probability that said document contains said word t, and p(c) denotes the probability that said document belongs to said category c.
12. The system according to any one of claims 9-11, characterized in that said extraction module is also configured to:
if the relevance of said word is greater than a first preset threshold, extract said word as a feature word;
if the PMI value of said word is greater than a second preset threshold, extract said word as said feature word.
13. The system according to claim 8, characterized in that said analysis module is also configured to:
obtain the regression value of said microblog text for each mood in said mood classification hierarchy;
calculate the score of said each mood according to the regression value of said each mood, so as to select a preset number of moods, and calculate the proportion of each of said preset number of moods.
14. The system according to claim 13, characterized in that the score of said each mood is calculated according to the following formula:
$$S_i = e^{V_{i,3} + V_{i,4}}$$
where $S_i$ denotes the score of the i-th mood, $V_{i,3}$ denotes the layer-3 regression value of said i-th mood, and $V_{i,4}$ denotes the layer-4 regression value of said i-th mood;
the proportion of each of said preset number of moods is calculated according to the following formula:
$$P_i = \frac{e^{V_{i,3} + V_{i,4}}}{\sum_{k=1}^{4} e^{V_{k,3} + V_{k,4}}}$$
where $P_i$ denotes the proportion of said i-th mood, and K denotes the total number of moods.
CN201410193638.1A 2014-05-08 2014-05-08 Mood classification and mood component analyzing method and system based on microblogging text Active CN103970864B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410193638.1A CN103970864B (en) 2014-05-08 2014-05-08 Mood classification and mood component analyzing method and system based on microblogging text

Publications (2)

Publication Number Publication Date
CN103970864A true CN103970864A (en) 2014-08-06
CN103970864B CN103970864B (en) 2017-09-22

Family

ID=51240361

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410193638.1A Active CN103970864B (en) 2014-05-08 2014-05-08 Emotion classification and emotion component analyzing method and system based on microblog texts

Country Status (1)

Country Link
CN (1) CN103970864B (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101127042A (en) * 2007-09-21 2008-02-20 浙江大学 Sensibility classification method based on language model
CN103034626A (en) * 2012-12-26 2013-04-10 上海交通大学 Emotion analyzing system and method
CN103049435A (en) * 2013-01-04 2013-04-17 浙江工商大学 Text fine granularity sentiment analysis method and text fine granularity sentiment analysis device

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
WEIYUAN LI et al.: "Text-based emotion classification using emotion cause extraction", 《EXPERT SYSTEMS WITH APPLICATIONS》 *
杨炜炜: "中文微博文本情绪分析" [Emotion analysis of Chinese microblog texts], 《HTTP://OAPS.LIB.TSINGHUA.EDU.CN/HANDLE/123456789/3156》 *

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104794209B (en) * 2015-04-24 2018-10-02 清华大学 Chinese microblog sentiment classification method and system based on Markov logic network
CN104794209A (en) * 2015-04-24 2015-07-22 清华大学 Chinese microblog sentiment classification method and system based on Markov logic network
CN104794208A (en) * 2015-04-24 2015-07-22 清华大学 Sentiment classification method and system based on contextual information of microblog text
CN105573983A (en) * 2015-12-17 2016-05-11 清华大学 Topic model based hierarchical classification method and system for microblog user emotions
CN106294845A (en) * 2016-08-19 2017-01-04 清华大学 Multi-emotion classification method and device based on weight learning and multi-feature extraction
CN106294845B (en) * 2016-08-19 2019-08-09 清华大学 Multi-emotion classification method and device based on weight learning and multi-feature extraction
CN106777361A (en) * 2017-01-20 2017-05-31 清华大学 Microblog text emotion classification method and classification system based on vector paragraph model
CN108095740A (en) * 2017-12-20 2018-06-01 姜涵予 User emotion assessment method and device
CN108021704A (en) * 2017-12-27 2018-05-11 广东广业开元科技有限公司 Agent seat optimal configuration method based on social public opinion data mining technology
CN108536868A (en) * 2018-04-24 2018-09-14 北京慧闻科技发展有限公司 Data processing method for short text data on social networks and application thereof
CN108536868B (en) * 2018-04-24 2022-04-15 北京慧闻科技(集团)有限公司 Data processing method and device for short text data on social network
CN108806724A (en) * 2018-08-15 2018-11-13 太原理工大学 Method and system for predicting PAD values of emotional speech
WO2020082612A1 (en) * 2018-10-22 2020-04-30 平安科技(深圳)有限公司 Method and apparatus for sentiment analysis on security research report using big data, and computer device
WO2020087774A1 (en) * 2018-10-31 2020-05-07 平安科技(深圳)有限公司 Concept-tree-based intention recognition method and apparatus, and computer device
CN109740685A (en) * 2019-01-08 2019-05-10 武汉斗鱼鱼乐网络科技有限公司 Feature analysis method, prediction method, device, equipment and medium for user churn
CN109740685B (en) * 2019-01-08 2020-10-27 武汉斗鱼鱼乐网络科技有限公司 User loss characteristic analysis method, prediction method, device, equipment and medium
CN110929005A (en) * 2019-10-18 2020-03-27 平安科技(深圳)有限公司 Emotion analysis-based task follow-up method, device, equipment and storage medium
CN112949305A (en) * 2021-05-13 2021-06-11 平安科技(深圳)有限公司 Negative feedback information acquisition method, device, equipment and storage medium

Also Published As

Publication number Publication date
CN103970864B (en) 2017-09-22

Similar Documents

Publication Publication Date Title
CN103970864A (en) Emotion classification and emotion component analyzing method and system based on microblog texts
Ghosh et al. Fracking sarcasm using neural network
CN110717339B (en) Semantic representation model processing method and device, electronic equipment and storage medium
Sulea et al. Predicting the law area and decisions of french supreme court cases
Cheng et al. Hierarchical attention networks for cyberbullying detection on the instagram social network
CN104699763B (en) The text similarity gauging system of multiple features fusion
CN111310476B (en) Public opinion monitoring method and system using aspect-based emotion analysis method
CN107077463A (en) Remote supervisory relation extractor
US20200160196A1 (en) Methods and systems for detecting check worthy claims for fact checking
CN111309910A (en) Text information mining method and device
Arumugam et al. Hands-On Natural Language Processing with Python: A practical guide to applying deep learning architectures to your NLP applications
Gao et al. Text classification research based on improved Word2vec and CNN
CN104317965A (en) Establishment method of emotion dictionary based on linguistic data
CN112966526A (en) Automobile online comment emotion analysis method based on emotion word vector
CN112732910B (en) Cross-task text emotion state evaluation method, system, device and medium
Winters et al. Automatic joke generation: Learning humor from examples
Subramani et al. Child abuse and domestic abuse: Content and feature analysis from social media disclosures
Ahanin et al. A multi-label emoji classification method using balanced pointwise mutual information-based feature selection
Xu et al. Text sentiment analysis and classification based on bidirectional Gated Recurrent Units (GRUs) model
CN106599824A (en) GIF cartoon emotion identification method based on emotion pairs
Naveenkumar et al. Amrita-cen-sentidb 1: Improved twitter dataset for sentimental analysis and application of deep learning
CN114357204B (en) Media information processing method and related equipment
Romberg et al. Making sense of citizens’ input through artificial intelligence: a review of methods for computational text analysis to support the evaluation of contributions in public participation
Devisree et al. A hybrid approach to relationship extraction from stories
CN117574879A (en) Data enhancement method, system, equipment and medium based on pre-training model

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant