CN102945268A - Method and system for excavating comments on characteristics of product - Google Patents


Publication number: CN102945268A
Application number: CN2012104138543A
Authority: CN (China)
Legal status: Pending
Other languages: Chinese (zh)
Inventor: 杨睿尘 (Yang Ruichen)
Current assignee: Beijing Tengyi Science & Technology Development Co Ltd
Original assignee: Beijing Tengyi Science & Technology Development Co Ltd


Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention provides a method and system for mining comments on product features. The method comprises the following steps: acquiring comments from websites with a web crawler, each comment being either an explicit comment or an implicit comment; extracting explicit features from the explicit comments; extracting implicit features from the implicit comments by means of the explicit features or the explicit comments; performing sentiment classification on the comments that describe the same feature; and extracting a plurality of comments from the sentiment-classified comments. According to the method of the embodiment, the large volume of comments acquired from websites serves as the data source, which improves the accuracy of the data; at the same time, sentiment classification presents the comments to users in an intuitive form and makes querying convenient.

Description

Product feature comment mining method and system
Technical field
The present invention relates to the field of computer technology, and in particular to a method and system for mining comments on product features.
Background
With the continued spread of the Internet and the rapid development of Web 2.0, the reviews of social events, public figures and all kinds of products circulating on the Internet have attracted attention from many quarters. They have become an important channel through which people obtain information, and they often weigh heavily in people's decisions.
In terms of how information spreads, the Internet is interactive and can propagate users' opinions quickly and effectively, shaping public opinion to some degree. Compared with traditional media, it therefore has large advantages in propagation speed, timeliness, social influence and reach. Users are no longer mere browsers of information; more and more often they also publish it. Forums, blogs, review websites, e-mail, microblogs and the like all give Web 2.0 users a place to publish information and express their own views. As a result, a large amount of subjective review content has appeared on the Internet. This content may be a user's opinion of a product or service, a user's experience of using it, or a user's view on some social event.
The approach generally adopted at present obtains comments directly from channels such as review websites, extracts the comments that contain product-feature phrases for analysis, and lists representative comments for users to query.
Shortcomings of the conventional approach include:
(1) Object features are extracted in only one way, which lowers the accuracy of feature extraction.
(2) Comments are merely listed, so the information users can obtain is limited.
Summary of the invention
The present invention aims to solve at least one of the above technical shortcomings.
To this end, an embodiment of one aspect of the present invention proposes a product feature comment mining method, comprising the following steps: S1: acquiring comments from websites with a web crawler, wherein each comment is either an explicit comment (one that names the feature it evaluates) or an implicit comment (one that does not); S2: extracting explicit features from the explicit comments; S3: extracting implicit features from the implicit comments by means of the explicit features or the explicit comments; S4: performing sentiment classification on the comments that describe the same feature; and S5: extracting a plurality of comments from the sentiment-classified comments to generate a comment summary.
According to the method of this embodiment, the large volume of comments acquired from websites serves as the data source, which improves the accuracy of the data; at the same time, sentiment classification presents the comments to users in an intuitive form and makes querying convenient.
In one embodiment of the invention, the method further comprises: S6: viewing the comment summary by product feature.
In one embodiment of the invention, step S2 specifically comprises: S21: extracting frequently occurring nouns or noun phrases from the explicit comments; S22: processing the nouns or noun phrases to obtain a concept set; and S23: clustering the concept set to obtain a set of concept clusters, which are the explicit features.
In one embodiment of the invention, step S3 specifically comprises: S31: generating association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent item and an explicit feature; and S32: extracting the implicit features from the implicit comments according to the association rules.
In one embodiment of the invention, step S3 may instead comprise: S310: selecting attributes according to the explicit comments and the explicit features to generate a training model; S320: training a classifier on the training model; S330: acquiring the implicit comments; and S340: analyzing the implicit comments with the classifier to determine the implicit features.
In one embodiment of the invention, step S4 specifically comprises: S41: extracting, from the comments, those that describe the same feature; and S42: performing sentiment classification on those comments with a dictionary.
To this end, an embodiment of another aspect of the present invention proposes a product feature comment mining system, comprising: an acquisition module for acquiring comments from websites with a web crawler, wherein each comment is either an explicit comment or an implicit comment; a first extraction module for extracting explicit features from the explicit comments; a second extraction module for extracting implicit features from the implicit comments by means of the explicit features or the explicit comments; a sentiment classification module for performing sentiment classification on the comments that describe the same feature; and a summary generation module for extracting a plurality of comments from the sentiment-classified comments to generate a comment summary.
According to the system of this embodiment, the large volume of comments acquired from websites serves as the data source, which improves the accuracy of the data; at the same time, sentiment classification presents the comments to users in an intuitive form and makes querying convenient.
In one embodiment of the invention, the system further comprises: a viewing module for viewing the comment summary by product feature.
In one embodiment of the invention, the first extraction module specifically comprises: a first extraction unit for extracting frequently occurring nouns or noun phrases from the explicit comments; a processing unit for processing the nouns or noun phrases to obtain a concept set; and a clustering unit for clustering the concept set to obtain a set of concept clusters, which are the explicit features.
In one embodiment of the invention, the second extraction module specifically comprises: a first generation unit for generating association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent item and an explicit feature; and a second extraction unit for extracting the implicit features from the implicit comments according to the association rules.
In one embodiment of the invention, the second extraction module may instead comprise: a second generation unit for selecting attributes according to the explicit comments and the explicit features to generate a training model; a third generation unit for training a classifier on the training model; an acquisition unit for acquiring the implicit comments; and a determination unit for analyzing the implicit comments with the classifier to determine the implicit features.
In one embodiment of the invention, the sentiment classification module specifically comprises: a third extraction unit for extracting, from the comments, those that describe the same feature; and a sentiment classification unit for performing sentiment classification on those comments with a dictionary.
Additional aspects and advantages of the present invention will be given in part in the following description, will in part become apparent from it, or will be learned through practice of the invention.
Brief description of the drawings
The above and/or additional aspects and advantages of the present invention will become apparent and easy to understand from the following description of the embodiments in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a product feature comment mining method according to an embodiment of the invention;
Fig. 2 is a flowchart of obtaining implicit features by training a classification model according to an embodiment of the invention;
Fig. 3 is a flowchart of a product feature comment mining method according to another embodiment of the invention;
Fig. 4 is a block diagram of a product feature comment mining system according to an embodiment of the invention;
Fig. 5 is a block diagram of the first extraction module according to an embodiment of the invention;
Fig. 6 is a block diagram of the second extraction module extracting implicit features by association rule mining according to an embodiment of the invention;
Fig. 7 is a block diagram of the second extraction module extracting implicit features by training a classification model according to an embodiment of the invention; and
Fig. 8 is a block diagram of a product feature comment mining system according to another embodiment of the invention.
Detailed description of the embodiments
Embodiments of the invention are described in detail below, and examples of them are shown in the accompanying drawings, in which identical or similar reference numerals denote identical or similar elements, or elements with identical or similar functions, throughout. The embodiments described below with reference to the drawings are exemplary, serve only to explain the invention, and are not to be construed as limiting it.
In the description of the invention, it should be understood that the terms "first", "second" and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or implicitly specifying the number of the technical features indicated. Thus, a feature qualified by "first", "second" or "third" may explicitly or implicitly include one or more such features. In the description of the invention, "a plurality of" means two or more unless otherwise expressly and specifically limited.
Fig. 1 is a flowchart of the product feature comment mining method of an embodiment of the invention. As shown in Fig. 1, the method comprises the following steps:
Step S101: acquire comments from websites with a web crawler, each comment being either an explicit comment or an implicit comment.
Specifically, a large number of user comments on particular products are obtained from the Internet for opinion mining and analysis. A web crawler collects comments from sites such as forums, blogs, Jingdong (JD.com), e-mail and microblogs, or from dedicated review websites, and saves them to the relevant database. The crawled data include product information and comment details, such as users' opinions of a product or service or their experience of using it, and may also include users' views on social events.
In one embodiment of the invention, the web crawler periodically crawls the newly added comments for each product, based on their posting time, to keep the data complete and up to date.
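The periodic, time-based crawl can be illustrated with a per-product "watermark" that remembers the newest comment already stored. This is a minimal sketch under assumptions: the watermark scheme, the function `select_new_comments` and the dictionary keys `product_id`, `posted_at` and `text` are hypothetical, and the actual page fetching is left out.

```python
from datetime import datetime


def select_new_comments(fetched, last_seen):
    """Keep only comments posted after each product's watermark.

    fetched:   list of dicts with hypothetical keys "product_id", "posted_at", "text".
    last_seen: dict mapping product_id -> datetime of the newest comment already stored.
    Returns (new_comments, updated_watermarks).
    """
    new_comments = []
    watermarks = dict(last_seen)  # copy so the caller's dict is not mutated
    for comment in fetched:
        product = comment["product_id"]
        seen = last_seen.get(product)
        if seen is not None and comment["posted_at"] <= seen:
            continue  # already stored during an earlier crawl
        new_comments.append(comment)
        newest = watermarks.get(product)
        if newest is None or comment["posted_at"] > newest:
            watermarks[product] = comment["posted_at"]
    return new_comments, watermarks


fetched = [
    {"product_id": "p1", "posted_at": datetime(2012, 10, 1), "text": "old"},
    {"product_id": "p1", "posted_at": datetime(2012, 10, 3), "text": "new"},
    {"product_id": "p2", "posted_at": datetime(2012, 10, 2), "text": "first"},
]
new, marks = select_new_comments(fetched, {"p1": datetime(2012, 10, 2)})
print([c["text"] for c in new])  # ['new', 'first']
```

Storing the returned watermarks after each run lets the next scheduled crawl skip everything already saved, which is one way to get the completeness and timeliness the text describes.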
Step S102: extract explicit features from the explicit comments.
Specifically, frequently occurring nouns or noun phrases are extracted from the explicit comments. Because the invention is mainly concerned with focus concepts related to the object being commented on, these nominal phrases follow regular patterns. Based on these patterns, the grammatical forms of the nominal phrases to be extracted can be defined, for example adjective + noun, noun + noun, pronoun/verb/adjective/noun + "的" + noun, and noun + "的" + verb. Nouns and noun phrases are then identified according to these grammatical forms and extracted.
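The pattern-based extraction above can be sketched as a sliding window over POS-tagged tokens. This assumes the comment has already been segmented and tagged; the single-letter tag names and the `PATTERNS` table are hypothetical stand-ins for whatever tagger is used.

```python
# Hypothetical single-letter tags: "a" adjective, "n" noun, "r" pronoun,
# "v" verb, "u" the particle 的. Any POS tagger's output would need mapping.
PATTERNS = [
    ("a", "n"),             # adjective + noun
    ("n", "n"),             # noun + noun
    ("r|v|a|n", "u", "n"),  # pronoun/verb/adjective/noun + 的 + noun
    ("n", "u", "v"),        # noun + 的 + verb
]


def matches(tags, pattern):
    """True if every tag is one of the alternatives allowed at that position."""
    return all(tag in slot.split("|") for tag, slot in zip(tags, pattern))


def extract_candidates(tagged):
    """tagged: list of (word, tag) pairs for one comment sentence.
    Returns noun-phrase matches followed by the bare nouns."""
    phrases = []
    for i in range(len(tagged)):
        for pattern in PATTERNS:
            window = tagged[i:i + len(pattern)]
            if len(window) == len(pattern) and matches([t for _, t in window], pattern):
                phrases.append("".join(word for word, _ in window))
    # Frequent single nouns are candidates as well.
    phrases.extend(word for word, tag in tagged if tag == "n")
    return phrases


sentence = [("屏幕", "n"), ("的", "u"), ("分辨率", "n"), ("很", "d"), ("高", "a")]
print(extract_candidates(sentence))  # ['屏幕的分辨率', '屏幕', '分辨率']
```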
The nouns or noun phrases are then processed to obtain a concept set. In one embodiment of the invention, extracting the nouns and noun phrases as above yields the raw candidate set of focus concepts. From this raw candidate set, the nouns and noun phrases that reviewers pay the most attention to and comment on most often are taken as the focus-concept candidate set. Frequent nouns or noun phrases that are meaningless as comment targets are then removed. After removal, the single-character words, multi-character words (Chinese words of at least two characters) and nominal phrases among the frequent candidates are pruned separately to obtain the concept set.
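One way to read the selection and pruning step is as document-frequency filtering with a stop list and a stricter threshold for single-character words. The sketch below makes those assumptions explicit: `build_concept_set`, its parameters, the stop list and the default "twice as strict for one character" rule are hypothetical, and the patent does not give concrete thresholds.

```python
from collections import Counter


def build_concept_set(candidates_per_comment, min_support, stoplist,
                      min_support_single_char=None):
    """Turn raw candidates into a pruned concept set.

    candidates_per_comment: one list of candidate nouns/phrases per comment.
    min_support: minimum number of comments that must mention a candidate.
    stoplist: frequent but meaningless candidates (hypothetical, hand-curated).
    min_support_single_char: stricter threshold for one-character words,
        which are noisier in Chinese (assumed default: twice min_support).
    """
    if min_support_single_char is None:
        min_support_single_char = 2 * min_support
    doc_freq = Counter()
    for candidates in candidates_per_comment:
        doc_freq.update(set(candidates))  # count each candidate once per comment
    concepts = set()
    for term, freq in doc_freq.items():
        if term in stoplist:
            continue  # frequent, but not a meaningful comment target
        threshold = min_support_single_char if len(term) == 1 else min_support
        if freq >= threshold:
            concepts.add(term)
    return concepts


comments = [["屏幕", "价格", "东西"], ["屏幕", "价格"], ["屏幕", "东西", "机"], ["价钱", "机"]]
print(sorted(build_concept_set(comments, min_support=2, stoplist={"东西"})))  # ['价格', '屏幕']
```

Here the one-character 机 is pruned by the stricter threshold, 东西 ("thing") by the stop list, and 价钱 because it is too rare.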
The concept set is then clustered to obtain a set of concept clusters, which are the explicit features. After concept extraction, the concepts most relevant to the commented object have been obtained from the raw review text. Several of these concepts may refer to the same attribute, feature or related notion of the object; clustering these related concepts yields the set of concept clusters, i.e., the explicit features.
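The patent does not specify the clustering algorithm or similarity measure. As a minimal sketch, the code below assumes single-link clustering with character-overlap (Jaccard) similarity, which happens to group Chinese near-synonyms such as 价格/价钱 ("price"); a real system would more likely compare context vectors or use a synonym lexicon. `char_jaccard`, `cluster_concepts` and the 0.3 threshold are illustrative choices.

```python
def char_jaccard(a, b):
    """Jaccard similarity of the character sets of two concepts."""
    sa, sb = set(a), set(b)
    return len(sa & sb) / len(sa | sb)


def cluster_concepts(concepts, threshold=0.3):
    """Single-link clustering: two concepts share a cluster whenever a chain
    of pairs with similarity >= threshold connects them (union-find)."""
    concepts = sorted(concepts)
    parent = {c: c for c in concepts}

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for i, a in enumerate(concepts):
        for b in concepts[i + 1:]:
            if char_jaccard(a, b) >= threshold:
                parent[find(a)] = find(b)

    clusters = {}
    for c in concepts:
        clusters.setdefault(find(c), []).append(c)
    return sorted(clusters.values())


print(cluster_concepts({"价格", "价钱", "屏幕", "外观", "外形"}))
# [['价格', '价钱'], ['外形', '外观'], ['屏幕']]
```

Each resulting cluster plays the role of one explicit feature; character overlap can also merge unrelated words, so the similarity function is the part most likely to need replacing.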
Step S103: extract implicit features from the implicit comments by means of the explicit features or the explicit comments.
Specifically, implicit features can be extracted from comments in two ways: by association rule mining, or by training a classification model that classifies the remaining texts and thereby extracts their implicit features.
Extracting implicit features by association rule mining closely resembles collocation extraction in natural language processing, because the input data set at this stage consists entirely of explicit comments related to a given feature.
First, association rules are generated from the explicit comments by a collocation extraction method, each rule consisting of a word and an explicit feature, or of a frequent item and an explicit feature.
In one embodiment of the invention, commonly used collocation extraction methods include: frequency, mutual information, frequency × mutual information, the t test, and the χ² (chi-square) test.
The simplest method is to count how often a candidate collocation occurs, i.e., its frequency. If two words often occur together, they are likely to form a common collocation.
Mutual information is an information-theoretic measure used to identify interesting collocations. The (pointwise) mutual information between two words is computed as:
$$\mathrm{PMI}(x, y) = \log_2 \frac{P_{xy}}{P_x P_y}$$
where $P_{xy}$ is the joint probability that $x$ and $y$ occur together in the corpus, and $P_x$ and $P_y$ are the probabilities that $x$ and $y$ occur individually. Mutual information is a very good measure of independence but is not a suitable measure of dependence, because for dependent pairs the score depends mainly on how often $x$ and $y$ occur individually: other things being equal, the fewer times $x$ and $y$ occur individually, the higher the score. This runs counter to the purpose of extracting common collocations, since a frequently occurring collocation is in everyday use, is more reliable, and should receive a higher score. One improvement is to add frequency information, i.e., frequency × mutual information:
$$P_{xy} \cdot \mathrm{PMI}(x, y) = P_{xy} \log_2 \frac{P_{xy}}{P_x P_y}$$
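Frequency, PMI and frequency × PMI can all be computed from the same co-occurrence counts. This sketch assumes that a sentence is the co-occurrence window and estimates the probabilities per sentence; the function name `collocation_scores` and the toy corpus are hypothetical.

```python
import math
from collections import Counter
from itertools import combinations


def collocation_scores(sentences):
    """Frequency, PMI and frequency x PMI for every co-occurring word pair.

    sentences: list of token lists; a pair "co-occurs" if both words appear
    in the same sentence (an assumed window). Probabilities are per sentence.
    """
    n = len(sentences)
    word_counts = Counter()
    pair_counts = Counter()
    for tokens in sentences:
        unique = sorted(set(tokens))  # sorted so each pair has one key
        word_counts.update(unique)
        pair_counts.update(combinations(unique, 2))
    scores = {}
    for (x, y), c_xy in pair_counts.items():
        p_xy = c_xy / n
        p_x, p_y = word_counts[x] / n, word_counts[y] / n
        pmi = math.log2(p_xy / (p_x * p_y))
        scores[(x, y)] = {"freq": c_xy, "pmi": pmi, "freq_pmi": p_xy * pmi}
    return scores


sentences = [["price", "cheap"], ["price", "cheap"], ["price", "expensive"], ["looks", "nice"]]
scores = collocation_scores(sentences)
for pair in [("cheap", "price"), ("expensive", "price")]:
    print(pair, round(scores[pair]["pmi"], 3), round(scores[pair]["freq_pmi"], 3))
```

Both pairs get the same PMI, log2(4/3) ≈ 0.415, even though (cheap, price) was seen twice as often; frequency × PMI separates them, which is exactly the correction motivated above.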
Another collocation extraction method is hypothesis testing, which is often used to judge whether an event is a chance occurrence; it can be used to judge whether the co-occurrence of two words $x$ and $y$ is accidental. We first formulate the null hypothesis $H_0$: $x$ and $y$ have no association other than chance co-occurrence. We then compute the probability $P$ of the observed event under the assumption that $H_0$ is true, and finally decide from the value of $P$ whether to reject $H_0$.
A commonly used hypothesis test is the t test. It assumes that the sample is drawn from a normal distribution with mean $\mu$, computes the sample mean and variance, and decides whether to accept the hypothesis by comparing the difference between the observed mean and the expected mean. The t statistic is computed as:
$$t = \frac{\bar{x} - \mu}{\sqrt{S^2 / N}}$$
where $\bar{x}$ is the sample mean, $S^2$ is the sample variance, $N$ is the sample size, and $\mu$ is the mean of the distribution. In collocation extraction, $\mu = P_x P_y$; because $P_{xy}$ is very small, the variance is approximated as $S^2 = P_{xy}(1 - P_{xy}) \approx P_{xy}$, and $\bar{x}$ is the probability $P_{xy}$ that $x$ and $y$ actually occur together in the corpus. If the t value is large enough, the null hypothesis is rejected. The t test assumes that the sample is normally distributed, which does not always hold in practice; a test that does not require this assumption is the χ² (chi-square) test, which assesses how strongly two words are associated by computing a chi-square value. The χ² value of two words $x$ and $y$ is computed as:
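Substituting the approximations above gives $t = (P_{xy} - P_x P_y) / \sqrt{P_{xy}/N}$. A minimal sketch, assuming raw counts over $N$ co-occurrence opportunities; `t_score` and the counts are hypothetical, and 2.576 is the conventional large-sample critical value at α = 0.005.

```python
import math


def t_score(c_xy, c_x, c_y, n):
    """t statistic for the collocation (x, y) using the approximations above:
    sample mean = P_xy, mu = P_x * P_y, S^2 ~= P_xy, sample size N = n."""
    p_xy = c_xy / n
    mu = (c_x / n) * (c_y / n)
    return (p_xy - mu) / math.sqrt(p_xy / n)


# Hypothetical counts: x seen 100 times, y 200 times, together 20 times in N = 10,000.
t = t_score(20, 100, 200, 10_000)
print(round(t, 3))  # 4.025
print(t > 2.576)    # True: reject H0 at alpha = 0.005
```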
$$\chi^2 = \sum_{i=1}^{c} \sum_{j=1}^{r} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$
where $o_{ij}$ is the observed frequency (actual count) of the joint event $(x_i, y_j)$ and $e_{ij}$ is the expected frequency of $(x_i, y_j)$, computed as:
$$e_{ij} = \frac{\mathrm{count}(x = x_i) \cdot \mathrm{count}(y = y_j)}{N}$$
where $N$ is the number of data tuples, $\mathrm{count}(x = x_i)$ is the number of tuples in which $x$ takes the value $x_i$, and $\mathrm{count}(y = y_j)$ is the number of tuples in which $y$ takes the value $y_j$. The larger the chi-square value, the more strongly the two words are associated, and the more likely they are to form a collocation.
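For a word pair the formula reduces to a 2 × 2 table ($c = r = 2$): rows for $x$ present or absent, columns for $y$ present or absent. The sketch below fills that table from hypothetical counts and applies the expected-frequency formula cell by cell; `chi_square` is an illustrative name, and 3.841 is the standard critical value for one degree of freedom at α = 0.05.

```python
def chi_square(c_xy, c_x, c_y, n):
    """Pearson chi-square for the 2x2 table (x present/absent) x (y present/absent)."""
    observed = [
        [c_xy, c_x - c_xy],                  # x present: y present, y absent
        [c_y - c_xy, n - c_x - c_y + c_xy],  # x absent:  y present, y absent
    ]
    row_totals = [sum(row) for row in observed]
    col_totals = [observed[0][j] + observed[1][j] for j in range(2)]
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n  # e_ij
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2


score = chi_square(20, 100, 200, 10_000)
print(round(score, 2))  # 166.98
print(score > 3.841)    # True: association is significant at alpha = 0.05 (df = 1)
```

With the same counts as the t-test example, both tests reject independence; when the observed co-occurrence equals its expectation (e.g. 2 joint occurrences here), the statistic is 0.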
The implicit features are then extracted from the implicit comments using the association rules, each of which consists of a word and an explicit feature, or of a frequent item and an explicit feature; for example, stylish → {appearance, exterior}, cheap → {price, cost}, good value → {price, cost}. When a word or frequent item from a generated rule appears in an implicit comment, the comment can be judged to describe the corresponding feature, which gives its implicit feature.
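Rule generation and rule application can be sketched together. This assumes the explicit comments arrive as (tokens, feature) pairs and keeps a word → feature rule when the pair has enough support and confidence, a common association-rule criterion swapped in here because the patent does not state thresholds; `mine_rules`, `apply_rules`, `min_support` and `min_confidence` are hypothetical.

```python
from collections import Counter, defaultdict


def mine_rules(explicit_comments, min_support=2, min_confidence=0.6):
    """explicit_comments: list of (tokens, feature) pairs.
    Returns {word: feature} for words that co-occur with one feature often
    enough (support) and predominantly (confidence)."""
    word_total = Counter()
    word_feature = defaultdict(Counter)
    for tokens, feature in explicit_comments:
        for word in set(tokens):
            word_total[word] += 1
            word_feature[word][feature] += 1
    rules = {}
    for word, by_feature in word_feature.items():
        feature, support = by_feature.most_common(1)[0]
        if support >= min_support and support / word_total[word] >= min_confidence:
            rules[word] = feature
    return rules


def apply_rules(tokens, rules):
    """Return the implicit features triggered by words in an implicit comment."""
    return {rules[word] for word in tokens if word in rules}


explicit = [
    (["price", "cheap"], "price"),
    (["cost", "very", "cheap"], "price"),
    (["appearance", "stylish"], "appearance"),
    (["exterior", "very", "stylish"], "appearance"),
]
rules = mine_rules(explicit)
print(sorted(rules.items()))                                       # [('cheap', 'price'), ('stylish', 'appearance')]
print(sorted(apply_rules(["so", "stylish", "and", "cheap"], rules)))  # ['appearance', 'price']
```

"very" appears with two different features, so its confidence stays at 0.5 and no rule is produced for it; this is how the confidence threshold keeps generic words out of the rule set.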
Fig. 2 is a flowchart of obtaining implicit features by training a classification model according to an embodiment of the invention. As shown in Fig. 2, the steps are as follows:
Step S201: select attributes according to the explicit comments and the explicit features to generate a training model. As the training scheme, attributes are selected from the frequent items, nouns or noun phrases recorded in each explicit comment together with that comment's explicit feature, so as to obtain a correspondence between comments and features.
Step S202: train a classifier on the training model. The resulting classifier can directly assign a comment to a specific class; for example, it can decide directly whether a comment concerns appearance, price or the screen.
Step S203: acquire the implicit comments and analyze them with the classifier to determine their implicit features. The acquired implicit comments are searched for patterns that match the training model; when a match is found, the classifier directly determines the implicit feature of that implicit comment. Because of the nature of review text, once the number of comments is large enough, counterparts for the implicit comments can be found among the explicit comments.
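The patent does not name a classifier. As an illustration only, the sketch below reads the "training model" as labelled (tokens, explicit feature) pairs taken from the explicit comments and trains a multinomial naive Bayes classifier with add-one smoothing, then applies it to implicit comments. `NaiveBayesFeatureClassifier` and the training data are hypothetical.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFeatureClassifier:
    """Multinomial naive Bayes with add-one smoothing (an illustrative choice)."""

    def fit(self, labelled_comments):
        # labelled_comments: (tokens, explicit_feature) pairs from explicit comments
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for tokens, feature in labelled_comments:
            self.class_counts[feature] += 1
            self.word_counts[feature].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, tokens):
        """Return the most probable feature for an implicit comment."""
        total_docs = sum(self.class_counts.values())
        best, best_score = None, -math.inf
        for feature, n_docs in self.class_counts.items():
            denom = sum(self.word_counts[feature].values()) + len(self.vocab)
            score = math.log(n_docs / total_docs)  # log prior
            for word in tokens:
                if word in self.vocab:  # words unseen in training carry no evidence
                    score += math.log((self.word_counts[feature][word] + 1) / denom)
            if score > best_score:
                best, best_score = feature, score
        return best


training = [
    (["price", "cheap"], "price"),
    (["cost", "very", "cheap"], "price"),
    (["appearance", "stylish"], "appearance"),
    (["exterior", "very", "stylish"], "appearance"),
    (["screen", "clear"], "screen"),
    (["display", "sharp", "clear"], "screen"),
]
clf = NaiveBayesFeatureClassifier().fit(training)
print(clf.predict(["really", "cheap"]))  # price
```

Compared with hard association rules, a probabilistic classifier can weigh several weak cues in one implicit comment, at the cost of needing enough labelled explicit comments per feature.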
Step S104: perform sentiment classification on the comments that describe the same feature.
Specifically, the comments describing the same feature are extracted from all comments according to the correspondence between comments and features. These comments are then classified by sentiment with a dictionary.
In one embodiment of the invention, a relatively complete sentiment dictionary is built by combining resources such as existing sentiment dictionaries, results of linguistic research on sentiment, Internet vocabulary and input-method dictionaries. On this basis, with rules for how users express sentiment added, comments are classified by sentiment at the sentence level. Common natural-language constructions that need rule-based handling include negation, adversative (contrastive) sentences and sentences containing degree adverbs:
(1) Sentences containing negation words are handled mainly as follows: negation + positive = negative; negation + negative = positive; negation + neutral = negative.
(2) When a sentence contains an adversative conjunction, its sentiment orientation is generally opposite to that of the preceding clause.
(3) When a sentence contains a degree adverb, it carries a definite sentiment orientation in most cases.
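The three rules above can be sketched as sentence-level lexicon scoring. The lexicons and weights are hypothetical stand-ins for the merged sentiment dictionary, and rule (3) is read here, as an assumption, as "a degree adverb amplifies the next sentiment word".

```python
# Hypothetical miniature lexicons standing in for the merged sentiment dictionary.
POLARITY = {"good": 1, "clear": 1, "cheap": 1, "bad": -1, "blurry": -1,
            "expensive": -1, "ok": 0}
NEGATIONS = {"not", "never"}
ADVERSATIVES = {"but", "however"}
DEGREE = {"very": 2.0, "slightly": 0.5}  # assumed intensity weights


def clause_score(tokens):
    """Sum lexicon polarities, applying negation and degree words to the next
    sentiment word (rules (1) and (3))."""
    score, negate, weight = 0.0, False, 1.0
    for word in tokens:
        if word in NEGATIONS:
            negate = True
        elif word in DEGREE:
            weight *= DEGREE[word]
        elif word in POLARITY:
            polarity = POLARITY[word]
            if negate:
                # positive -> negative, negative -> positive, neutral -> negative
                polarity = -polarity if polarity != 0 else -1
            score += weight * polarity
            negate, weight = False, 1.0
    return score


def sentence_polarity(tokens):
    """Rule (2): after an adversative, the following clause decides; if it
    carries no sentiment, flip the preceding clause's orientation."""
    cuts = [i for i, w in enumerate(tokens) if w in ADVERSATIVES]
    if cuts:
        after = clause_score(tokens[cuts[-1] + 1:])
        score = after if after != 0 else -clause_score(tokens[:cuts[-1]])
    else:
        score = clause_score(tokens)
    return "positive" if score > 0 else "negative" if score < 0 else "neutral"


for s in (["screen", "very", "clear"], ["not", "bad"], ["not", "ok"], ["cheap", "but", "blurry"]):
    print(" ".join(s), "->", sentence_polarity(s))
```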
These rules give the basic sentiment orientation of each comment sentence. Most sentiment words can be extracted directly and their orientation judged with the sentiment dictionary. However, a minority of words change their sentiment dynamically with the domain and the feature described, as in "the ambient noise in this hotel is very large", and such words often interfere strongly with sentiment classification. Therefore, statistics are gathered over the corpus to build, for each feature, a dictionary of words whose polarity changes with context, for example large/small, high/low and thick/thin; the sentiment orientation of these words toward a given object feature in the domain is then inferred iteratively by analyzing the context of the comments. Sentiment classification and construction of the context-dependent sentiment dictionary proceed iteratively: classifying the sentiment for each feature uses the information in the context-dependent dictionary, and the two steps alternate until the dictionary no longer changes, at which point the final sentiment class of each comment is determined.
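The alternation between classification and dictionary building can be sketched as a fixed-point loop. The patent does not say how a word's polarity is inferred from context; this sketch assumes that each occurrence of a context-dependent word votes with the sentiment of the other words in the same comment, using the dictionary from the previous round. `BASE_LEXICON`, `CONTEXT_WORDS`, `build_context_dictionary` and the iteration cap are hypothetical.

```python
BASE_LEXICON = {"annoying": -1, "great": 1, "quiet": 1}  # context-free words
CONTEXT_WORDS = {"large", "small", "high", "low"}        # polarity depends on feature


def sign(x):
    return (x > 0) - (x < 0)


def comment_score(feature, tokens, context_dict, skip=None):
    """Polarity sum of a comment, optionally ignoring the token at index skip."""
    total = 0
    for i, word in enumerate(tokens):
        if i == skip:
            continue
        if word in BASE_LEXICON:
            total += BASE_LEXICON[word]
        elif word in CONTEXT_WORDS:
            total += context_dict.get((feature, word), 0)
    return total


def build_context_dictionary(comments, max_iter=20):
    """comments: list of (feature, tokens). Each (feature, word) pair takes the
    sign of the sentiment of its surrounding words; repeat until nothing changes."""
    context_dict = {}
    for _ in range(max_iter):  # cap in case the updates oscillate
        votes = {}
        for feature, tokens in comments:
            for i, word in enumerate(tokens):
                if word in CONTEXT_WORDS:
                    key = (feature, word)
                    votes[key] = votes.get(key, 0) + comment_score(feature, tokens, context_dict, skip=i)
        new_dict = {key: sign(v) for key, v in votes.items() if sign(v) != 0}
        if new_dict == context_dict:
            break  # dictionary stable: stop iterating
        context_dict = new_dict
    return context_dict


def classify(comments, context_dict):
    return [sign(comment_score(f, t, context_dict)) for f, t in comments]


comments = [
    ("noise", ["large", "annoying"]),
    ("noise", ["large", "high"]),
    ("battery", ["large", "great"]),
]
d = build_context_dictionary(comments)
print(classify(comments, d))  # [-1, -1, 1]
```

"large" ends up negative for noise but positive for battery, and ("noise", "high") is only resolved in the second round, after ("noise", "large") has been learned; this is why the process has to iterate until the dictionary stops changing.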
Step S105: extract a plurality of comments from the sentiment-classified comments to generate a comment summary.
Specifically, after sentiment classification, the sentiment orientations of the comments are tallied; for example, an orientation may be positive or negative. A plurality of comments is then extracted from the positive and negative comments according to the proportions they account for, generating the comment summary.
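Proportional selection can be sketched as follows, assuming the comments for one feature arrive already ordered by some representativeness score (hypothetical, e.g. helpfulness votes) and that neutral comments are left out of the summary. `build_summary` and its output keys are illustrative names.

```python
def build_summary(classified, k):
    """classified: (text, polarity) pairs for one feature, assumed pre-sorted by
    a representativeness score. Picks up to k comments, split between positive
    and negative in proportion to their shares."""
    positive = [text for text, pol in classified if pol == "positive"]
    negative = [text for text, pol in classified if pol == "negative"]
    total = len(positive) + len(negative)
    if total == 0:
        return {"positive_ratio": None, "positive": [], "negative": []}
    k = min(k, total)
    n_pos = min(round(k * len(positive) / total), len(positive))
    n_neg = min(k - n_pos, len(negative))
    return {
        "positive_ratio": len(positive) / total,
        "positive": positive[:n_pos],
        "negative": negative[:n_neg],
    }


classified = [(f"p{i}", "positive") for i in range(6)] + [(f"n{i}", "negative") for i in range(2)]
summary = build_summary(classified, k=4)
print(summary["positive_ratio"], summary["positive"], summary["negative"])
# 0.75 ['p0', 'p1', 'p2'] ['n0']
```

Reporting the ratio alongside the sampled comments gives the user both the overall balance of opinion and concrete examples for each side.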
According to the method for the embodiment of the invention, comment on as data source by the magnanimity that obtains in the website, therefore improved the accuracy of data, simultaneously comment is carried out emotional semantic classification for the user provides intuitively data, made things convenient for user's inquiry.
Fig. 2 is the process flow diagram of product feature comment method for digging in accordance with another embodiment of the present invention.As shown in Figure 2, the product feature comment method for digging according to the embodiment of the invention may further comprise the steps:
Step S301 obtains comment by web crawlers from the website, wherein, comments on a kind of in the comment of explicit comment or implicit expression.
Step S302 extracts explicit features from explicit comment.
Step S303 extracts implicit features by explicit features or explicit comment in the implicit expression comment.
Can pass through the described implicit features of dual mode extracting comment.A kind of is to extract implicit features by association rule mining, and another kind is other texts are classified and then to extract implicit features by train classification models.Extract implicit features by correlation rule: generate correlation rule according to explicit comment and by the collocation extracting method, wherein, correlation rule is comprised of word and explicit features or Frequent and explicit features, then extracts implicit features according to correlation rule in the implicit expression comment.Extract implicit features by train classification models: select attribute to generate training pattern according to explicit comment and explicit features, then the training according to training pattern generates sorter, obtains at last the implicit expression comment and extracts implicit features by sorter.
Step S304 carries out the emotional culture classification with the comment of describing same characteristic features.
To from all comments, extraction be described by the same characteristic features comment according to comment and the corresponding relation of feature.Then, by dictionary the emotional culture classification is carried out in the comment of describing same characteristic features.
Step S305 extracts a plurality of comments and generates the comment summary the comment behind emotional semantic classification.
Step S306 checks the comment summary by product feature.
Particularly, the user can by the comment summary relevant with this feature to inquiring about of feature, be known advantage or deficiency, advantage or inferior position to this feature.Strategy is bought in the formation that the consumer can be rough thus, and the provider of product or service then can further improve this product or service.
According to the method for the embodiment of the invention, the user is by checking the comment summary of product or service, and the consumer can more understand product or service provides convenience for consumption, and the provider of product or service then can further improve weak point.
Below in conjunction with the product feature comment digging system of Figure of description detailed description according to the embodiment of the invention.
Fig. 4 is the structured flowchart of the product feature comment digging system of the embodiment of the invention, as shown in Figure 4, the product feature comment digging system according to the embodiment of the invention comprises acquisition module 100, the first extraction module 200, the second extraction module 300, emotional semantic classification module 400 and summarization generation module 500.
Particularly, acquisition module 100 is used for obtaining comment by web crawlers from the website, wherein, comments on a kind of in explicit comment or the implicit expression comment.
Obtain a large amount of user comments for some specific products from the internet, in order to carry out opining mining and analysis.Crawl comment from websites such as forum, blog, Jingdone district, mail, microbloggings or in special comment website by web crawlers, and be saved in associated databases.The data of crawl comprise the view of product information, comment specifying information, certain some service or use gains in depth of comprehension, also can be the viewpoints held for certain social event of user etc.
In one embodiment of the invention, web crawlers can regularly grasp the newly-increased comment of each product with integrality and the real-time of assurance data according to the time of institute.
The first extraction module 200 is configured to extract explicit features from the explicit comments.
Fig. 5 is a block diagram of the first extraction module according to an embodiment of the invention. As shown in Fig. 5, the first extraction module 200 comprises a first extraction unit 210, a processing unit 220 and a clustering unit 230.
More specifically, the first extraction unit 210 is configured to extract frequently occurring nouns or noun phrases from the explicit comments.
The processing unit 220 is configured to process the nouns or noun phrases to obtain a concept set.
The clustering unit 230 is configured to cluster the concept set to obtain a set of concept clusters, namely the explicit features.
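The work of units 210 to 230 can be illustrated with a small sketch. It assumes that each comment has already been word-segmented and part-of-speech tagged (Chinese text would need a segmenter first), that "frequent" means occurring in at least a minimum fraction of comments, that the processing step maps synonyms onto one canonical concept using a given table, and that clustering is single-linkage grouping of concepts whose context words overlap (Jaccard similarity). None of these choices, nor the names `frequent_nouns`, `to_concepts`, `cluster_concepts` or the tag set, come from the patent.

```python
from collections import Counter, defaultdict
from itertools import combinations

# Hypothetical POS tags for nouns and noun phrases produced by an upstream tagger.
NOUN_TAGS = {"n", "np"}


def frequent_nouns(tagged_comments, min_support=0.3):
    """Unit 210 / S21: nouns that occur in at least `min_support` of all comments."""
    doc_freq = Counter()
    for tokens in tagged_comments:
        doc_freq.update({word for word, tag in tokens if tag in NOUN_TAGS})
    n = len(tagged_comments) or 1
    return {word for word, df in doc_freq.items() if df / n >= min_support}


def to_concepts(nouns, synonyms):
    """Unit 220 / S22: map each noun onto a canonical concept via a synonym table."""
    return {synonyms.get(word, word) for word in nouns}


def cluster_concepts(concepts, tagged_comments, synonyms, threshold=0.5):
    """Unit 230 / S23: single-linkage clustering by Jaccard overlap of context words."""
    contexts = defaultdict(set)
    for tokens in tagged_comments:
        words = [synonyms.get(word, word) for word, _ in tokens]
        for concept in concepts:
            if concept in words:
                contexts[concept].update(w for w in words if w != concept)

    parent = {c: c for c in concepts}  # union-find forest

    def find(c):
        while parent[c] != c:
            parent[c] = parent[parent[c]]  # path halving
            c = parent[c]
        return c

    for a, b in combinations(sorted(concepts), 2):
        union = contexts[a] | contexts[b]
        if union and len(contexts[a] & contexts[b]) / len(union) >= threshold:
            parent[find(a)] = find(b)

    clusters = defaultdict(set)
    for c in concepts:
        clusters[find(c)].add(c)
    return sorted(sorted(group) for group in clusters.values())
```

With comments about "screen"/"display" (declared synonyms) and "price"/"cost" (used with the same word "low"), the sketch yields the two explicit features `["cost", "price"]` and `["screen"]`.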
The second extraction module 300 is configured to extract implicit features from the implicit comments by means of the explicit features or the explicit comments.
Specifically, the implicit features can be extracted in two ways: by association rule mining, or by training a classification model and using it to classify the remaining texts, thereby extracting the implicit features.
Fig. 6 is a block diagram of the second extraction module extracting implicit features by association rule mining according to an embodiment of the invention. As shown in Fig. 6, the second extraction module 300 comprises a first generation unit 310 and a second extraction unit 320.
The first generation unit 310 is configured to generate association rules from the explicit comments by a collocation extraction method, where each association rule consists of a word and an explicit feature, or of a frequent word and an explicit feature.
The second extraction unit 320 is configured to extract implicit features from the implicit comments according to the association rules.
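A minimal sketch of the association-rule route (units 310 and 320) follows. It assumes each explicit comment has already been reduced to its opinion or collocation words plus its explicit feature, and it uses ordinary support and confidence thresholds to keep rules of the form word → feature. The function names and threshold values are illustrative assumptions, not taken from the patent.

```python
from collections import Counter


def mine_rules(explicit_comments, min_support=2, min_confidence=0.6):
    """S31: mine rules word -> feature from explicit comments.

    explicit_comments: list of (words, feature) pairs, where `words` are the
    collocation words already extracted from one comment (assumed input).
    """
    word_count = Counter()
    pair_count = Counter()
    for words, feature in explicit_comments:
        for w in set(words):
            word_count[w] += 1
            pair_count[(w, feature)] += 1
    rules = {}
    for (w, feature), n in pair_count.items():
        confidence = n / word_count[w]
        if n >= min_support and confidence >= min_confidence:
            # Keep only the highest-confidence feature for each word.
            if w not in rules or confidence > rules[w][1]:
                rules[w] = (feature, confidence)
    return rules


def infer_implicit_feature(words, rules):
    """S32: feature of the highest-confidence matching rule, or None."""
    matches = [rules[w] for w in words if w in rules]
    if not matches:
        return None
    return max(matches, key=lambda m: m[1])[0]
```

For example, if "expensive" co-occurs with the explicit feature "price" in two of its three explicit comments, the implicit comment "too expensive" is assigned the feature "price".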
Fig. 7 is a block diagram of the second extraction module extracting implicit features by training a classification model according to an embodiment of the invention. As shown in Fig. 7, the second extraction module 300 comprises a second generation unit 330, a third generation unit 340, an acquisition unit 350 and a determination unit 360.
The second generation unit 330 is configured to select attributes according to the explicit comments and the explicit features to generate a training model.
The third generation unit 340 is configured to generate a classifier by training with the training model.
The acquisition unit 350 is configured to obtain the implicit comments.
The determination unit 360 is configured to analyze the implicit comments with the classifier to determine their implicit features.
In one embodiment of the invention, the training model is built by selecting, as attributes of each explicit comment, the frequent words, nouns or noun phrases it contains, and pairing them with the comment's explicit feature, so that a correspondence between attributes and features is obtained. A classifier is then generated by training on this model. The classifier can classify comments directly, for example deciding whether a comment concerns appearance, price or the screen. The implicit comments are searched for cases that match the training model; when a match is found, the implicit feature of that implicit comment can be determined directly by the classifier. Owing to the particular nature of comments, once their number is large enough, a counterpart for each implicit comment can be found among the explicit comments.
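The patent does not name a classifier. As one plausible instance, the sketch below trains a multinomial Naive Bayes model with add-one smoothing on explicit comments labeled with their explicit features, then predicts the feature of an implicit comment. The class name, the input format and the choice to drop the feature word from the training context are assumptions.

```python
import math
from collections import Counter, defaultdict


class NaiveBayesFeatureClassifier:
    """Multinomial Naive Bayes with add-one smoothing (an assumed model choice)."""

    def fit(self, explicit_comments):
        # explicit_comments: list of (words, feature). The feature word itself is
        # dropped so the model learns from the surrounding words only, which is
        # all an implicit comment will contain.
        self.class_counts = Counter()
        self.word_counts = defaultdict(Counter)
        self.vocab = set()
        for words, feature in explicit_comments:
            context = [w for w in words if w != feature]
            self.class_counts[feature] += 1
            self.word_counts[feature].update(context)
            self.vocab.update(context)
        self.total = sum(self.class_counts.values())
        return self

    def predict(self, words):
        """Return the most probable feature for an implicit comment's words."""
        best, best_score = None, -math.inf
        v = len(self.vocab)
        for feature, n in self.class_counts.items():
            counts = self.word_counts[feature]
            denom = sum(counts.values()) + v
            score = math.log(n / self.total)  # log prior
            for w in words:
                if w in self.vocab:  # ignore words never seen in training
                    score += math.log((counts[w] + 1) / denom)
            if score > best_score:
                best, best_score = feature, score
        return best
```

Trained on comments such as "screen clear bright" and "price cheap worth", the classifier labels the implicit comment "very cheap" with the feature "price".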
The sentiment classification module 400 is configured to perform sentiment classification on comments describing the same feature.
The sentiment classification module 400 comprises a third extraction unit and a sentiment classification unit.
The third extraction unit is configured to extract, from the comments, those comments describing the same feature.
The sentiment classification unit is configured to perform sentiment classification on the comments describing the same feature by means of a dictionary.
In one embodiment of the invention, a relatively complete sentiment dictionary is built by combining resources such as existing sentiment dictionaries, linguistic research on emotion, online lexicons and input-method dictionaries. On this basis, with rules for how users express sentiment added, comments are classified by sentiment at the sentence level. Next, statistics are collected over the corpus to build, for each feature, a dictionary whose sentiment polarity changes dynamically with context; the contexts of the comments are then analyzed to infer, iteratively, the sentiment orientation of these words toward particular object features in the domain. Sentiment classification and the construction of the context-dependent sentiment dictionary are repeated until the context-dependent dictionary no longer changes, at which point the sentiment classification of each comment is final.
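The iterative loop described above can be sketched as follows under simplifying assumptions: sentence polarity is the sign of the summed word polarities, a small general lexicon stands in for the combined sentiment dictionary, and only a given set of context-dependent candidate words (such as "long") receive feature-specific polarities, voted by the labels of the comments they appear in. The function names, the voting rule and the `max_iter` safeguard are hypothetical.

```python
from collections import defaultdict


def classify(words, feature, lexicon, context_lexicon):
    """Sentence-level polarity (+1, -1 or 0); feature-specific entries take priority."""
    score = sum(context_lexicon.get((feature, w), lexicon.get(w, 0)) for w in words)
    return (score > 0) - (score < 0)


def build_context_lexicon(comments, lexicon, candidates, max_iter=20):
    """Alternate classification and dictionary updates until the dictionary is stable.

    comments: list of (feature, words); lexicon: general word -> +1/-1 dictionary;
    candidates: words whose polarity is assumed to depend on the feature.
    """
    context_lexicon = {}
    for _ in range(max_iter):
        labels = [classify(words, f, lexicon, context_lexicon) for f, words in comments]
        votes = defaultdict(int)
        for (feature, words), label in zip(comments, labels):
            for w in set(words):
                if w in candidates and label != 0:
                    votes[(feature, w)] += label
        updated = {k: (v > 0) - (v < 0) for k, v in votes.items() if v != 0}
        if updated == context_lexicon:
            break  # dictionary no longer changes
        context_lexicon = updated
    labels = [classify(words, f, lexicon, context_lexicon) for f, words in comments]
    return context_lexicon, labels
```

In a toy corpus, "long" ends up positive for the feature "battery" (long battery life) and negative for "wait" (a long wait), and comments that contained only "long" are classified once the dictionary has stabilized.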
The summary generation module 500 is configured to extract a number of comments from the sentiment-classified comments to generate a comment summary.
In one embodiment of the invention, the sentiment orientations of the comments are counted after sentiment classification; for example, the orientation may be positive or negative. Comments are then extracted from the positive and negative comments in proportion to the share each polarity occupies, and the comment summary is generated from them.
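One way to realize the proportional selection is sketched below with assumed names: `summarize` computes the share of positive comments and picks up to `k` comments whose positive/negative mix mirrors that share. The patent does not say which comments are chosen within each polarity; this sketch simply takes them in input order.

```python
def summarize(labeled, k=4):
    """Return (positive_ratio, picked_comments) for one feature.

    labeled: list of (text, label) with label > 0 for positive and < 0 for
    negative comments (assumed output of the sentiment classification step).
    """
    pos = [text for text, label in labeled if label > 0]
    neg = [text for text, label in labeled if label < 0]
    total = len(pos) + len(neg)
    if total == 0:
        return 0.0, []
    pos_ratio = len(pos) / total
    n_pos = min(len(pos), round(k * pos_ratio))  # positive share of the summary
    n_neg = min(len(neg), k - n_pos)             # remainder goes to negatives
    return pos_ratio, pos[:n_pos] + neg[:n_neg]
```

With three positive comments and one negative comment, a four-comment summary keeps all three positive comments and the single negative one, reporting a positive ratio of 0.75.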
With the system according to the embodiment of the invention, the large volume of comments obtained from websites serves as the data source, which improves the accuracy of the data. At the same time, sentiment classification of the comments gives users intuitive data and makes queries convenient.
Fig. 8 is a block diagram of a product feature comment mining system according to another embodiment of the invention. As shown in Fig. 8, the system further comprises a viewing module 600.
The viewing module 600 is configured to view the comment summary by product feature.
In one embodiment of the invention, by querying a feature, the user can view the comment summary related to that feature and learn its strengths and weaknesses. Consumers can thus form a rough purchasing strategy, while the provider of the product or service can further improve it.
It should be understood that the specific operation of each module and unit in the system embodiment may be the same as described in the method embodiment and is not repeated here.
Although embodiments of the invention have been shown and described above, it should be understood that they are exemplary and are not to be construed as limiting the invention. Those of ordinary skill in the art may change, modify, replace and vary the above embodiments within the scope of the invention without departing from its principles and spirit.

Claims (12)

1. A product feature comment mining method, characterized by comprising the following steps:
S1: obtaining comments from websites by means of a web crawler, wherein each comment is either an explicit comment or an implicit comment;
S2: extracting explicit features from the explicit comments;
S3: extracting implicit features from the implicit comments by means of the explicit features or the explicit comments;
S4: performing sentiment classification on comments describing the same feature; and
S5: extracting a plurality of comments from the sentiment-classified comments to generate a comment summary.
2. The product feature comment mining method according to claim 1, characterized by further comprising:
S6: viewing the comment summary by product feature.
3. The product feature comment mining method according to claim 1, characterized in that step S2 further comprises:
S21: extracting frequently occurring nouns or noun phrases from the explicit comments;
S22: processing the nouns or noun phrases to obtain a concept set; and
S23: clustering the concept set to obtain a set of concept clusters, namely the explicit features.
4. The product feature comment mining method according to claim 1, characterized in that step S3 specifically comprises:
S31: generating association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent word and an explicit feature; and
S32: extracting the implicit features from the implicit comments according to the association rules.
5. The product feature comment mining method according to claim 1, characterized in that step S3 specifically comprises:
S310: selecting attributes according to the explicit comments and the explicit features to generate a training model;
S320: generating a classifier by training with the training model;
S330: obtaining the implicit comments; and
S340: analyzing the implicit comments with the classifier to determine the implicit features.
6. The product feature comment mining method according to claim 1, characterized in that step S4 specifically comprises:
S41: extracting, from the comments, those comments describing the same feature; and
S42: performing sentiment classification on the comments describing the same feature by means of a dictionary.
7. A product feature comment mining system, characterized by comprising:
an acquisition module, configured to obtain comments from websites by means of a web crawler, wherein each comment is either an explicit comment or an implicit comment;
a first extraction module, configured to extract explicit features from the explicit comments;
a second extraction module, configured to extract implicit features from the implicit comments by means of the explicit features or the explicit comments;
a sentiment classification module, configured to perform sentiment classification on comments describing the same feature; and
a summary generation module, configured to extract a plurality of comments from the sentiment-classified comments to generate a comment summary.
8. The product feature comment mining system according to claim 7, characterized by further comprising:
a viewing module, configured to view the comment summary by product feature.
9. The product feature comment mining system according to claim 8, characterized in that the first extraction module specifically comprises:
a first extraction unit, configured to extract frequently occurring nouns or noun phrases from the explicit comments;
a processing unit, configured to process the nouns or noun phrases to obtain a concept set; and
a clustering unit, configured to cluster the concept set to obtain a set of concept clusters, namely the explicit features.
10. The product feature comment mining system according to claim 7, characterized in that the second extraction module specifically comprises:
a first generation unit, configured to generate association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent word and an explicit feature; and
a second extraction unit, configured to extract the implicit features from the implicit comments according to the association rules.
11. The product feature comment mining system according to claim 7, characterized in that the second extraction module specifically comprises:
a second generation unit, configured to select attributes according to the explicit comments and the explicit features to generate a training model;
a third generation unit, configured to generate a classifier by training with the training model;
an acquisition unit, configured to obtain the implicit comments; and
a determination unit, configured to analyze the implicit comments with the classifier to determine the implicit features.
12. The product feature comment mining system according to claim 7, characterized in that the sentiment classification module specifically comprises:
a third extraction unit, configured to extract, from the comments, those comments describing the same feature; and
a sentiment classification unit, configured to perform sentiment classification on the comments describing the same feature by means of a dictionary.
CN2012104138543A 2012-10-25 2012-10-25 Method and system for excavating comments on characteristics of product Pending CN102945268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN2012104138543A CN102945268A (en) 2012-10-25 2012-10-25 Method and system for excavating comments on characteristics of product

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN2012104138543A CN102945268A (en) 2012-10-25 2012-10-25 Method and system for excavating comments on characteristics of product

Publications (1)

Publication Number Publication Date
CN102945268A true CN102945268A (en) 2013-02-27

Family

ID=47728212

Family Applications (1)

Application Number Title Priority Date Filing Date
CN2012104138543A Pending CN102945268A (en) 2012-10-25 2012-10-25 Method and system for excavating comments on characteristics of product

Country Status (1)

Country Link
CN (1) CN102945268A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN101609459A (en) * 2009-07-21 2009-12-23 北京大学 A kind of extraction system of affective characteristic words
CN101667194A (en) * 2009-09-29 2010-03-10 北京大学 Automatic abstracting method and system based on user comment text feature
CN102236722A (en) * 2011-08-17 2011-11-09 广州索答信息科技有限公司 Method and system for generating user comment summaries based on triples
CN102385579A (en) * 2010-08-30 2012-03-21 腾讯科技(深圳)有限公司 Internet information classification method and system
US20120179751A1 (en) * 2011-01-06 2012-07-12 International Business Machines Corporation Computer system and method for sentiment-based recommendations of discussion topics in social media

Cited By (30)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103345525A (en) * 2013-07-22 2013-10-09 苏州大学 Method, device and processor for text categorization
CN103399916A (en) * 2013-07-31 2013-11-20 清华大学 Internet comment and opinion mining method and system on basis of product features
CN104462480A (en) * 2014-12-18 2015-03-25 刘耀强 Typicality-based big comment data mining method
CN104462480B (en) * 2014-12-18 2017-11-10 刘耀强 Comment big data method for digging based on typicalness
CN106202108A (en) * 2015-05-06 2016-12-07 阿里巴巴集团控股有限公司 Web crawlers captures method for allocating tasks and device and data grab method and device
CN106202108B (en) * 2015-05-06 2019-09-06 阿里巴巴集团控股有限公司 Web crawlers grabs method for allocating tasks and device and data grab method and device
CN106708868B (en) * 2015-11-16 2020-02-21 ***通信集团北京有限公司 Internet data analysis method and system
CN106708868A (en) * 2015-11-16 2017-05-24 ***通信集团北京有限公司 Method and system for analyzing internet data
CN106021413A (en) * 2016-05-13 2016-10-12 清华大学 Theme model based self-extendable type feature selecting method and system
CN106021413B (en) * 2016-05-13 2019-07-02 清华大学 Auto-expanding type feature selection approach and system based on topic model
CN106354754A (en) * 2016-08-16 2017-01-25 清华大学 Bootstrap-type implicit characteristic mining method and system based on dispersed independent component analysis
CN107273351A (en) * 2017-05-31 2017-10-20 温州市鹿城区中津先进科技研究院 A kind of product feature extracting method based on big data opining mining
CN107943909A (en) * 2017-11-17 2018-04-20 合肥工业大学 User demand trend method for digging and device, storage medium based on comment data
CN108920545A (en) * 2018-06-13 2018-11-30 四川大学 The Chinese affective characteristics selection method of sentiment dictionary and Ka Fang model based on extension
CN108920545B (en) * 2018-06-13 2021-07-09 四川大学 Chinese emotion feature selection method based on extended emotion dictionary and chi-square model
CN110738046A (en) * 2018-07-03 2020-01-31 百度在线网络技术(北京)有限公司 Viewpoint extraction method and device
CN110738046B (en) * 2018-07-03 2023-06-06 百度在线网络技术(北京)有限公司 Viewpoint extraction method and apparatus
CN109190109A (en) * 2018-07-26 2019-01-11 中国科学院自动化研究所 Merge the method and device that user information generates comment abstract
CN109190109B (en) * 2018-07-26 2020-09-29 中国科学院自动化研究所 Method and device for generating comment abstract by fusing user information
CN109299460A (en) * 2018-09-18 2019-02-01 北京三快在线科技有限公司 Analyze method, apparatus, electronic equipment and the storage medium of the evaluation data in shop
CN109299460B (en) * 2018-09-18 2022-07-12 北京三快在线科技有限公司 Method and device for analyzing evaluation data of shop, electronic device and storage medium
CN109582945A (en) * 2018-12-17 2019-04-05 北京百度网讯科技有限公司 Article generation method, device and storage medium
CN109886104A (en) * 2019-01-14 2019-06-14 浙江大学 A kind of motion feature extracting method based on the perception of video before and after frames relevant information
CN111428489A (en) * 2020-03-19 2020-07-17 北京百度网讯科技有限公司 Comment generation method and device, electronic equipment and storage medium
CN111428489B (en) * 2020-03-19 2023-08-29 北京百度网讯科技有限公司 Comment generation method and device, electronic equipment and storage medium
CN112270170A (en) * 2020-10-19 2021-01-26 中译语通科技股份有限公司 Analysis method, device, medium and electronic equipment for implicit expression statement
CN112559746A (en) * 2020-12-11 2021-03-26 南京邮电大学 Product comment mining method and system
CN113722487A (en) * 2021-08-31 2021-11-30 平安普惠企业管理有限公司 User emotion analysis method, device and equipment and storage medium
CN114663246A (en) * 2022-05-24 2022-06-24 中国电子科技集团公司第三十研究所 Representation modeling method of information product in propagation simulation and multi-agent simulation method
CN114663246B (en) * 2022-05-24 2022-09-23 中国电子科技集团公司第三十研究所 Representation modeling method of information product in propagation simulation and multi-agent simulation method

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C12 Rejection of a patent application after its publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20130227