CN102945268A - Method and system for excavating comments on characteristics of product - Google Patents
- Publication number
- CN102945268A CN2012104138543A CN201210413854A
- Authority
- CN
- China
- Prior art keywords
- comment
- explicit
- features
- implicit
- comments
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Landscapes
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
Abstract
The invention provides a method and system for mining comments on product features. The method comprises the following steps: acquiring comments from websites with a web crawler, wherein each comment is either an explicit comment or an implicit comment; extracting explicit features from the explicit comments; extracting implicit features from the implicit comments by means of the explicit features or the explicit comments; performing sentiment classification on the comments that describe the same feature; and extracting a plurality of comments from the sentiment-classified comments. According to the method provided by the embodiment of the invention, the large number of comments acquired from websites serves as the data source, which improves the accuracy of the data; in addition, the sentiment classification of the comments gives users intuitive data and makes querying more convenient.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a method and system for mining product feature comments.
Background technology
With the continuing spread of the Internet and the rapid development of Web 2.0, the comment information that the Internet carries about social events, public figures, and all kinds of products has attracted attention from many quarters. It has become an important channel through which people obtain information, and it often plays a large part in their decisions.
In terms of how information spreads, the Internet is interactive and can propagate users' opinions quickly and effectively, thereby steering public opinion to some extent. Compared with traditional media, it therefore has great advantages in the speed of propagation, the timeliness of information, social influence, and the shaping of opinion. Users are no longer merely browsers of information; more often they are also publishers of it. Forums, blogs, review websites, e-mail, microblogs, and the like all give Web 2.0 users a place to publish information and express their own views. As a result, a large amount of subjective comment information has appeared on the Internet. It may be a user's opinion of, or experience with, a product or service, or a user's view of a social event.
The method commonly used at present is to obtain comments directly from channels such as review websites, extract the comments that contain product feature phrases for analysis, and list representative comments for users to consult.
The shortcomings of the conventional method include:
(1) Object features are extracted in only one way, which reduces the accuracy of feature extraction.
(2) Comments are merely listed, so the information obtained is limited.
Summary of the invention
The object of the present invention is to solve at least one of the above technical defects.
To achieve the above object, an embodiment of one aspect of the present invention proposes a product feature comment mining method comprising the following steps: S1: obtaining comments from websites with a web crawler, wherein each comment is either an explicit comment or an implicit comment; S2: extracting explicit features from the explicit comments; S3: extracting implicit features from the implicit comments by means of the explicit features or the explicit comments; S4: performing sentiment classification on the comments that describe the same feature; and S5: extracting a plurality of comments from the sentiment-classified comments and generating a comment summary.
According to the method of the embodiment of the present invention, the massive number of comments obtained from websites serves as the data source, which improves the accuracy of the data; in addition, the sentiment classification of the comments gives users intuitive data and makes querying more convenient.
In one embodiment of the present invention, the method further comprises: S6: viewing the comment summary by product feature.
In one embodiment of the present invention, step S2 specifically comprises: S21: extracting frequently occurring nouns or noun phrases from the explicit comments; S22: processing the nouns or noun phrases to obtain a concept set; and S23: clustering the concept set to obtain a set of concept clusters, i.e., the explicit features.
In one embodiment of the present invention, step S3 specifically comprises: S31: generating association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent phrase and an explicit feature; and S32: extracting the implicit features from the implicit comments according to the association rules.
In one embodiment of the present invention, step S3 specifically comprises: S310: selecting attributes according to the explicit comments and the explicit features to generate a training model; S320: generating a classifier by training on the training model; S330: obtaining the implicit comments; and S340: analyzing the implicit comments with the classifier to determine the implicit features.
In one embodiment of the present invention, step S4 specifically comprises: S41: extracting, from the comments, the comments that describe the same feature; and S42: performing sentiment classification on the comments that describe the same feature by means of a dictionary.
To achieve the above object, an embodiment of another aspect of the present invention proposes a product feature comment mining system comprising: an acquisition module for obtaining comments from websites with a web crawler, wherein each comment is either an explicit comment or an implicit comment; a first extraction module for extracting explicit features from the explicit comments; a second extraction module for extracting implicit features from the implicit comments by means of the explicit features or the explicit comments; a sentiment classification module for performing sentiment classification on the comments that describe the same feature; and a summary generation module for extracting a plurality of comments from the sentiment-classified comments and generating a comment summary.
According to the system of the embodiment of the present invention, the massive number of comments obtained from websites serves as the data source, which improves the accuracy of the data; in addition, the sentiment classification of the comments gives users intuitive data and makes querying more convenient.
In one embodiment of the present invention, the system further comprises: a viewing module for viewing the comment summary by product feature.
In one embodiment of the present invention, the first extraction module specifically comprises: a first extraction unit for extracting frequently occurring nouns or noun phrases from the explicit comments; a processing unit for processing the nouns or noun phrases to obtain a concept set; and a clustering unit for clustering the concept set to obtain a set of concept clusters, i.e., the explicit features.
In one embodiment of the present invention, the second extraction module specifically comprises: a first generation unit for generating association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent phrase and an explicit feature; and a second extraction unit for extracting the implicit features from the implicit comments according to the association rules.
In one embodiment of the present invention, the second extraction module specifically comprises: a second generation unit for selecting attributes according to the explicit comments and the explicit features to generate a training model; a third generation unit for generating a classifier by training on the training model; an acquiring unit for obtaining the implicit comments; and a determining unit for analyzing the implicit comments with the classifier to determine the implicit features.
In one embodiment of the present invention, the sentiment classification module specifically comprises: a third extraction unit for extracting, from the comments, the comments that describe the same feature; and a sentiment classification unit for performing sentiment classification on the comments that describe the same feature by means of a dictionary.
Additional aspects and advantages of the present invention will be set forth in part in the following description, and in part will become apparent from the description or be learned through practice of the invention.
Description of drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily understood from the following description of embodiments taken in conjunction with the accompanying drawings, in which:
Fig. 1 is a flowchart of a product feature comment mining method according to an embodiment of the present invention;
Fig. 2 is a flowchart of obtaining implicit features by training a classification model according to an embodiment of the present invention;
Fig. 3 is a flowchart of a product feature comment mining method according to another embodiment of the present invention;
Fig. 4 is a block diagram of a product feature comment mining system according to an embodiment of the present invention;
Fig. 5 is a block diagram of the first extraction module according to an embodiment of the present invention;
Fig. 6 is a block diagram of the second extraction module extracting implicit features by association rule mining according to an embodiment of the present invention;
Fig. 7 is a block diagram of the second extraction module extracting implicit features by training a classification model according to an embodiment of the present invention; and
Fig. 8 is a block diagram of a product feature comment mining system according to another embodiment of the present invention.
Embodiment
Embodiments of the present invention are described in detail below. Examples of the embodiments are shown in the accompanying drawings, in which identical or similar reference numerals denote, throughout, identical or similar elements or elements having identical or similar functions. The embodiments described below with reference to the drawings are exemplary; they serve only to explain the present invention and are not to be construed as limiting it.
In the description of the present invention, it should be understood that the terms "first", "second", and "third" are used for descriptive purposes only and are not to be construed as indicating or implying relative importance or as implicitly indicating the number of the technical features referred to. Thus, a feature qualified by "first", "second", or "third" may explicitly or implicitly include one or more of that feature. In the description of the present invention, "a plurality of" means two or more, unless specifically defined otherwise.
Fig. 1 is a flowchart of the product feature comment mining method of an embodiment of the present invention. As shown in Fig. 1, the product feature comment mining method according to the embodiment of the present invention comprises the following steps:
Step S101: comments are obtained from websites by a web crawler, wherein each comment is either an explicit comment or an implicit comment.
Specifically, a large number of user comments on specific products are obtained from the Internet for opinion mining and analysis. The web crawler fetches comments from websites such as forums, blogs, Jingdong, e-mail, and microblogs, or from dedicated review websites, and saves them to a corresponding database. The crawled data include product information, the specific content of each comment, and users' opinions of or experience with a service; they may also be users' views of a social event.
In one embodiment of the present invention, the web crawler may periodically fetch the newly added comments for each product on the basis of time, so as to keep the data complete and up to date.
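As a non-limiting illustration, the storage side of such a periodic crawl may be sketched as follows. The sketch assumes that an unspecified crawler has already fetched the comments; the table layout, the field names (`comment_id`, `product`, `posted_at`, `body`), and the helper names are hypothetical and are not taken from the embodiment. Comments that are already stored are skipped, and the latest stored timestamp gives the point from which the next crawl can resume.

```python
import sqlite3

def open_store(path):
    """Open the comment database and create the (hypothetical) table if needed."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS comments ("
        "comment_id TEXT PRIMARY KEY, product TEXT, posted_at TEXT, body TEXT)"
    )
    return conn

def save_new_comments(conn, comments):
    """Store comments that are not yet in the database; return how many were new."""
    before = conn.total_changes
    conn.executemany(
        "INSERT OR IGNORE INTO comments "
        "VALUES (:comment_id, :product, :posted_at, :body)",
        comments,
    )
    conn.commit()
    return conn.total_changes - before

def latest_timestamp(conn, product):
    """Return the newest stored posting time for a product, or None if none is stored."""
    row = conn.execute(
        "SELECT MAX(posted_at) FROM comments WHERE product = ?", (product,)
    ).fetchone()
    return row[0]
```

Timestamps are assumed to be ISO 8601 strings, so that their text order matches their time order.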
Step S102: explicit features are extracted from the explicit comments.
Specifically, frequently occurring nouns or noun phrases are extracted from the explicit comments. Because the present invention is mainly concerned with the focus concepts related to the commented object, nominal phrases of this kind follow regular patterns. Based on these patterns, the grammatical forms of the nominal phrases to be extracted can be defined, for example: adjective + noun; noun + noun; pronoun/verb/adjective/noun + " " + noun; noun + " " + verb; and so on. Nouns or noun phrases are identified and segmented according to these grammatical forms and are then extracted.
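By way of a hedged example, matching such grammatical forms against part-of-speech-tagged text may be sketched as follows. The sketch assumes that an external tagger has already produced (word, tag) pairs; the tag names ("a" for adjective, "n" for noun, "v" for verb) and the function name are hypothetical, and only the adjective + noun and noun + noun forms are shown.

```python
# Hypothetical tag names: "a" = adjective, "n" = noun, "v" = verb.
PATTERNS = [("a", "n"), ("n", "n")]

def extract_candidates(tagged_sentence):
    """Return single nouns plus every run of tokens whose tags match a pattern."""
    candidates = []
    for i, (word, tag) in enumerate(tagged_sentence):
        if tag == "n":
            candidates.append(word)
        for pattern in PATTERNS:
            window = tagged_sentence[i:i + len(pattern)]
            if tuple(t for _, t in window) == pattern:
                candidates.append(" ".join(w for w, _ in window))
    return candidates

sentence = [("long", "a"), ("battery", "n"), ("life", "n"), ("is", "v"), ("great", "a")]
print(extract_candidates(sentence))
```

The candidates returned here are only the raw material; frequency filtering and pruning are separate steps.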
The nouns or noun phrases are processed to obtain a concept set. In one embodiment of the present invention, the extraction of nouns and noun phrases described above yields the raw candidate set for the focus concepts. From this raw candidate set, the nouns or noun phrases that reviewers care about most and comment on most are extracted as the focus-concept candidate set. Frequent phrases that are irrelevant to the comments are removed from the frequently occurring nouns and noun phrases. After this removal, the frequently occurring single-character words, multi-character words (Chinese words containing at least two Chinese characters), and nominal phrases are each pruned to obtain the concept set.
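A minimal sketch of the frequency filtering is given below. The embodiment does not state the pruning criteria, so the support threshold, the stoplist of irrelevant phrases, and the function name are assumptions made purely for illustration.

```python
from collections import Counter

def frequent_concepts(candidates_per_comment, min_support, stoplist=frozenset()):
    """Keep candidates that occur in at least min_support comments and are not stoplisted."""
    counts = Counter()
    for candidates in candidates_per_comment:
        counts.update(set(candidates))  # count each candidate once per comment
    return {c for c, n in counts.items() if n >= min_support and c not in stoplist}
```

Counting each candidate once per comment keeps a single long comment from dominating the frequency statistics; this is a design choice of the sketch, not a requirement of the method.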
The concept set is clustered to obtain a set of concept clusters, i.e., the explicit features. After concept extraction, the concepts most relevant to the commented object have been obtained from the raw comment text. Several of these concepts may refer to the same attribute, feature, or related concept of the commented object; such related concepts are clustered to obtain the set of concept clusters, i.e., the explicit features.
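The embodiment does not name a clustering algorithm. Purely as an illustration, the sketch below assumes that pairs of related concepts are already known (for example from a synonym resource, which is hypothetical here) and merges them with a union-find structure, so that each resulting cluster stands for one explicit feature.

```python
def cluster_concepts(concepts, related_pairs):
    """Group concepts into clusters, merging every pair listed in related_pairs."""
    parent = {c: c for c in concepts}

    def find(c):
        # Follow parent links to the cluster representative, shortening the path.
        while parent[c] != c:
            parent[c] = parent[parent[c]]
            c = parent[c]
        return c

    for a, b in related_pairs:
        if a in parent and b in parent:
            parent[find(a)] = find(b)

    clusters = {}
    for c in concepts:
        clusters.setdefault(find(c), set()).add(c)
    return sorted(clusters.values(), key=sorted)
```

Any other clustering technique, such as one based on a similarity measure between concepts, could take the place of the pair list.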
Step S103: implicit features are extracted from the implicit comments by means of the explicit features or the explicit comments.
Specifically, the implicit feature described by a comment can be extracted in two ways. One is to extract implicit features by association rule mining; the other is to train a classification model, classify the other texts with it, and thereby extract the implicit features.
Extracting implicit features by association rule mining closely resembles collocation extraction in natural language processing, because the input data set at this point consists entirely of explicit comments related to a given feature.
First, association rules are generated from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent phrase and an explicit feature.
In one embodiment of the present invention, commonly used collocation extraction methods include frequency, mutual information, frequency × mutual information, the t-test, the χ² (chi-square) test, and so on.
The simplest collocation method is to count how often a collocation occurs, i.e., its frequency. If two words often occur together, they are likely to be a common collocation.
Mutual information is an information-theoretic measure used to identify interesting collocations. The mutual information between two words x and y is calculated by the following formula:

$$I(x, y) = \log \frac{P_{xy}}{P_x P_y}$$

where $P_{xy}$ is the joint probability that x and y occur together in the corpus, and $P_x$ and $P_y$ are the probabilities that x and y occur individually. Mutual information is a very good measure of independence, but it is not a suitable measure of dependence, because for dependent pairs the score depends mainly on how often x and y occur individually. Other things being equal, the score is higher when x and y each occur less often. This runs counter to the aim of extracting common collocations: if a collocation occurs frequently, it is in common use and more reliable, and it should receive a higher score. One improvement is to factor in the frequency information, i.e., frequency × mutual information:

$$P_{xy} \log \frac{P_{xy}}{P_x P_y}$$
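The three scores can be compared with a short sketch. It assumes that the probabilities are estimated as relative frequencies from co-occurrence counts and that the logarithm is taken to base 2; both are assumptions of the sketch, as are the function and parameter names.

```python
import math

def collocation_scores(n_xy, n_x, n_y, n_total):
    """Score a word pair by frequency, mutual information, and their product.

    n_xy: times x and y occur together; n_x, n_y: times each occurs alone or
    together; n_total: number of observation units (all assumed to be > 0).
    """
    p_xy, p_x, p_y = n_xy / n_total, n_x / n_total, n_y / n_total
    mi = math.log2(p_xy / (p_x * p_y))
    return {"frequency": p_xy, "mi": mi, "frequency_x_mi": p_xy * mi}

rare = collocation_scores(1, 1, 1, 1000)          # a pair seen only once
common = collocation_scores(50, 100, 100, 1000)   # a pair seen often
print(rare["mi"] > common["mi"])                           # MI favors the rare pair
print(common["frequency_x_mi"] > rare["frequency_x_mi"])   # the product favors the common pair
```

The example reproduces the weakness described above: mutual information alone ranks the pair seen once above the pair seen fifty times, whereas the frequency-weighted score reverses that order.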
Another collocation extraction approach is hypothesis testing, which is often used to judge whether an event happened by chance. Hypothesis testing can be used to judge whether two words x and y occur together merely by chance. A null hypothesis $H_0$ is first assumed, stating that x and y have no association beyond chance co-occurrence. The probability P that the event would occur if $H_0$ were true is then computed, and finally the hypothesis is accepted or rejected according to the value of P.
A commonly used hypothesis test is the t-test. It assumes that the sample follows a normal distribution with mean μ, and then computes the mean and variance of the sample. Whether to accept the hypothesis is decided by comparing the computed sample mean with the expected mean. The t value is calculated by the following formula:

$$t = \frac{\bar{x} - \mu}{\sqrt{S^2 / N}}$$

where $\bar{x}$ is the sample mean, $S^2$ is the sample variance, N is the size of the sample space, and μ is the mean of the distribution. When this method is applied to collocation extraction, $\mu = P_x P_y$; because $P_{xy}$ is very small, the variance is approximately $S^2 = P_{xy}(1 - P_{xy}) \approx P_{xy}$; and $\bar{x}$ is the probability $P_{xy}$ that x and y actually occur together in the corpus. If the t value is large enough, the hypothesis proposed above is rejected.

The t-test assumes that the sample is normally distributed, which does not always hold in practice. Another hypothesis test, one that does not require the normality assumption, is the χ² (chi-square) test. The evaluation method based on the chi-square test assesses the degree of association between two words by computing a chi-square value. The χ² value of two words x and y can be calculated with the following formula:

$$\chi^2 = \sum_{i} \sum_{j} \frac{(o_{ij} - e_{ij})^2}{e_{ij}}$$

where $o_{ij}$ is the observed frequency (i.e., the actual count) of the joint event $(x_i, y_j)$, and $e_{ij}$ is the expected frequency of $(x_i, y_j)$, computed as follows:

$$e_{ij} = \frac{count(x = x_i) \times count(y = y_j)}{N}$$

where N is the number of data tuples, $count(x = x_i)$ is the number of tuples in which x has the value $x_i$, and $count(y = y_j)$ is the number of tuples in which y has the value $y_j$. The larger the chi-square value, the more strongly the two words are associated, and the more likely they are to form a collocation.
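Both tests can be sketched for a single word pair as follows. The sketch assumes that each word is treated as a two-valued variable (present or absent in a tuple), so that the chi-square sum runs over a 2 × 2 table; the function and parameter names are hypothetical, and all counts are assumed to be such that no division by zero occurs.

```python
import math

def t_score(n_xy, n_x, n_y, n_total):
    """t value with x_bar = P_xy, mu = P_x * P_y, and S^2 approximated by P_xy."""
    x_bar = n_xy / n_total
    mu = (n_x / n_total) * (n_y / n_total)
    return (x_bar - mu) / math.sqrt(x_bar / n_total)

def chi_square(n_xy, n_x, n_y, n_total):
    """Chi-square value over the 2 x 2 table of x present/absent by y present/absent."""
    observed = [
        [n_xy, n_x - n_xy],                          # x present: y present, y absent
        [n_y - n_xy, n_total - n_x - n_y + n_xy],    # x absent:  y present, y absent
    ]
    row_totals = [sum(row) for row in observed]      # count(x = x_i)
    col_totals = [sum(col) for col in zip(*observed)]  # count(y = y_j)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / n_total  # e_ij
            chi2 += (observed[i][j] - expected) ** 2 / expected
    return chi2
```

For a pair that co-occurs exactly as often as chance predicts, both values are zero; the more the observed co-occurrence count exceeds the expected one, the larger both values become.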
Implicit features are then extracted from the implicit comments by means of the association rules, each of which consists of a word and an explicit feature, or of a frequent phrase and an explicit feature. For example: fashionable → {appearance, look}; cheap → {price, cost}; good value → {price, cost}. When a word or frequent phrase from a generated association rule appears in an implicit comment, the feature that the comment describes can be determined, and the implicit feature is thereby obtained.
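Applying the rules to an implicit comment may be sketched as follows. The rule table simply restates the examples above, with one representative name chosen per feature cluster; the plain substring matching and the function name are simplifying assumptions of the sketch.

```python
# Cue word or frequent phrase -> explicit feature (illustrative rules only).
RULES = {"fashionable": "appearance", "cheap": "price", "good value": "price"}

def implicit_features(comment, rules=RULES):
    """Return the features whose cue words or phrases appear in the comment."""
    text = comment.lower()
    return sorted({feature for cue, feature in rules.items() if cue in text})

print(implicit_features("Very fashionable, and cheap too."))
```

A comment that contains no cue yields an empty list and would have to be handled by another means, for example the classifier-based approach.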
Fig. 2 is a flowchart of obtaining implicit features by training a classification model according to an embodiment of the present invention. As shown in Fig. 2, the steps for obtaining implicit features by training a classification model are as follows:
Step S201: attributes are selected according to the explicit comments and the explicit features to generate a training model. The frequent phrases, nouns, or noun phrases recorded in an explicit comment, together with the explicit feature of that comment, serve as a training pattern, and the corresponding attributes are selected from the comment so that a correspondence between them is obtained.
Step S202: a classifier is generated by training on the training model. The classifier generated from the training model can classify comments directly; for example, it can directly decide that a comment concerns appearance, price, or screen.
Step S203: the implicit comments are obtained and analyzed by the classifier to determine the implicit features. The obtained implicit comments are searched for cases that match the training model; when one is found, the implicit feature of that implicit comment can be determined directly by the classifier. Owing to the nature of comments, as long as the number of comments is large enough, an explicit comment corresponding to each implicit comment can be found among the explicit comments.
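The embodiment does not specify the type of classifier. As one possible illustration only, the sketch below uses a multinomial naive Bayes classifier with add-one smoothing, trained on the words of explicit comments labelled with their explicit features; the choice of technique, the data layout, and all names are assumptions of the sketch.

```python
import math
from collections import Counter, defaultdict

def train(explicit_comments):
    """explicit_comments: list of (list_of_words, explicit_feature) pairs."""
    word_counts = defaultdict(Counter)
    label_counts = Counter()
    vocabulary = set()
    for words, feature in explicit_comments:
        label_counts[feature] += 1
        word_counts[feature].update(words)
        vocabulary.update(words)
    return word_counts, label_counts, vocabulary

def classify(words, model):
    """Return the most probable feature for the words of an implicit comment."""
    word_counts, label_counts, vocabulary = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        denom = sum(word_counts[label].values()) + len(vocabulary)
        score = math.log(n / total)                  # log prior of the feature
        for w in words:
            if w in vocabulary:                      # unseen words are ignored
                score += math.log((word_counts[label][w] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label
```

In practice the feature word itself would be removed from each training comment, so that the classifier learns from the surrounding cue words, which are all that an implicit comment contains.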
Step S104: sentiment classification is performed on the comments that describe the same feature.
Specifically, according to the correspondence between comments and features, the comments that describe the same feature are extracted from all the comments. Sentiment classification is then performed on these comments by means of a dictionary.
In one embodiment of the present invention, a relatively complete sentiment dictionary is built by combining resources such as existing sentiment dictionaries, linguistic research on sentiment, web lexicons, and input-method lexicons. With rules for how users express sentiment added on top, the comments are classified by sentiment at sentence granularity. Some common natural-language constructions that need rule-based handling are negative sentences, adversative sentences, and sentences containing degree words:
(1) Sentences containing a negation word are handled mainly as follows: negation + positive = negative; negation + negative = positive; negation + neutral = negative.
(2) When a sentence contains an adversative, its sentiment orientation is generally opposite to that of the preceding sentence.
(3) When a sentence contains a degree word, it in most cases carries a definite sentiment orientation.
These rules yield the basic sentiment orientation of each comment sentence. The orientation of most sentiment words can be determined directly from the sentiment dictionary. However, a minority of words change their sentiment with the domain and with the feature described, as in "the ambient noise in this hotel is very loud", and such words tend to interfere heavily with the sentiment classification process. By gathering statistics over the corpus, a dictionary whose sentiment polarity varies with context, with entries such as big/small, high/low, and thick/thin, is built for each feature; the sentiment orientation of these words toward a given feature of an object in the domain is then inferred iteratively by analyzing the context of the comments. Sentiment classification and the construction of the context-dependent sentiment dictionary must be carried out iteratively: sentiment classification for each feature uses the information in the context-dependent sentiment dictionary, and the two steps are repeated until the context-dependent sentiment dictionary no longer changes, at which point the sentiment classification of the comments is final.
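A minimal sketch of sentence-level classification with the negation rules and a per-feature, context-dependent dictionary is given below. The dictionary entries, the negation list, and the function name are hypothetical; the adversative and degree-word rules, as well as the iterative construction of the context-dependent dictionary, are not implemented here.

```python
BASE = {"good": 1, "great": 1, "bad": -1, "poor": -1}          # general sentiment dictionary
CONTEXT = {("noise", "large"): -1, ("screen", "large"): 1}     # (feature, word) -> polarity
NEGATIONS = {"not", "no", "never"}

def sentence_polarity(words, feature):
    """Return 1 (positive), -1 (negative), or 0 (neutral) for a tokenized sentence."""
    score, negate = 0, False
    for w in words:
        if w in NEGATIONS:
            negate = True
            continue
        # The context-dependent entry for this feature takes precedence.
        polarity = CONTEXT.get((feature, w), BASE.get(w, 0))
        if polarity:
            score += -polarity if negate else polarity  # negation flips the polarity
            negate = False
    if negate and score == 0:
        return -1                                       # negation + neutral = negative
    return (score > 0) - (score < 0)

print(sentence_polarity(["the", "noise", "is", "large"], "noise"))
print(sentence_polarity(["the", "screen", "is", "large"], "screen"))
```

The same word, "large", is classified differently for the two features, which is the behavior that the context-dependent dictionary is meant to provide.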
Step S105: a plurality of comments are extracted from the sentiment-classified comments and a comment summary is generated.
Specifically, after sentiment classification, the orientations of the comments are counted; the orientation may be, for example, positive or negative. A plurality of comments are then extracted from the positive comments and the negative comments according to the proportions that the positive and negative comments account for, and the comment summary is generated.
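Proportional extraction may be sketched as follows. The embodiment does not state how the individual comments are chosen, so taking the first comments of each list, as well as the structure of the returned summary and the function name, are assumptions of the sketch.

```python
def build_summary(positive, negative, size):
    """Pick up to `size` comments, split in proportion to the positive/negative counts."""
    total = len(positive) + len(negative)
    if total == 0:
        return {"positive_share": 0.0, "positive": [], "negative": []}
    share = len(positive) / total
    n_pos = min(len(positive), round(size * share))
    n_neg = min(len(negative), size - n_pos)
    return {
        "positive_share": share,
        "positive": positive[:n_pos],   # assumed selection: first comments in the list
        "negative": negative[:n_neg],
    }
```

A ranking step, for example by comment length or helpfulness, could be placed before the slicing to select more representative comments.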
According to the method for the embodiment of the invention, comment on as data source by the magnanimity that obtains in the website, therefore improved the accuracy of data, simultaneously comment is carried out emotional semantic classification for the user provides intuitively data, made things convenient for user's inquiry.
Fig. 3 is a flowchart of a product feature comment mining method according to another embodiment of the present invention. As shown in Fig. 3, the product feature comment mining method according to the embodiment of the present invention comprises the following steps:
Step S301: comments are obtained from websites by a web crawler, wherein each comment is either an explicit comment or an implicit comment.
Step S302: explicit features are extracted from the explicit comments.
Step S303: implicit features are extracted from the implicit comments by means of the explicit features or the explicit comments.
The implicit feature described by a comment can be extracted in two ways. One is to extract implicit features by association rule mining; the other is to train a classification model, classify the other texts with it, and thereby extract the implicit features. To extract implicit features by association rules, association rules are generated from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent phrase and an explicit feature; the implicit features are then extracted from the implicit comments according to the association rules. To extract implicit features by training a classification model, attributes are selected according to the explicit comments and the explicit features to generate a training model; a classifier is then generated by training on the training model; finally, the implicit comments are obtained and the implicit features are extracted by the classifier.
Step S304: sentiment classification is performed on the comments that describe the same feature.
According to the correspondence between comments and features, the comments that describe the same feature are extracted from all the comments. Sentiment classification is then performed on these comments by means of a dictionary.
Step S305: a plurality of comments are extracted from the sentiment-classified comments and a comment summary is generated.
Step S306: the comment summary is viewed by product feature.
Specifically, a user can query a feature and, through the comment summary related to that feature, learn the strengths and weaknesses of that feature. From this, a consumer can form a rough purchasing strategy, and the provider of the product or service can improve the product or service further.
According to the method of the embodiment of the present invention, by viewing the comment summary of a product or service, a consumer can understand the product or service better, which makes purchasing more convenient, and the provider of the product or service can further remedy its shortcomings.
Below in conjunction with the product feature comment digging system of Figure of description detailed description according to the embodiment of the invention.
Fig. 4 is the structured flowchart of the product feature comment digging system of the embodiment of the invention, as shown in Figure 4, the product feature comment digging system according to the embodiment of the invention comprises acquisition module 100, the first extraction module 200, the second extraction module 300, emotional semantic classification module 400 and summarization generation module 500.
Particularly, acquisition module 100 is used for obtaining comment by web crawlers from the website, wherein, comments on a kind of in explicit comment or the implicit expression comment.
Obtain a large amount of user comments for some specific products from the internet, in order to carry out opining mining and analysis.Crawl comment from websites such as forum, blog, Jingdone district, mail, microbloggings or in special comment website by web crawlers, and be saved in associated databases.The data of crawl comprise the view of product information, comment specifying information, certain some service or use gains in depth of comprehension, also can be the viewpoints held for certain social event of user etc.
In one embodiment of the invention, web crawlers can regularly grasp the newly-increased comment of each product with integrality and the real-time of assurance data according to the time of institute.
The first extraction module 200 is used for extracting explicit features from explicit comment.
Fig. 5 is the frame diagram of the first extraction module according to an embodiment of the invention.As shown in Figure 5, the first extraction module 200 comprises: the first extraction unit 210, processing unit 220 and cluster cell 230.
More specifically, the first extraction unit 210 is used for extracting frequent noun or the noun phrase that occurs from explicit comment;
Processing unit 220 is used for noun or noun phrase processed and obtains the concept set.
Cluster cell 230 is used for that cluster is carried out in the concept set and obtains conceptual clustering set, i.e. explicit features.
The second extraction module 300 is used to extract implicit features from the implicit comments by means of the explicit features or the explicit comments.
Specifically, the implicit features of comments can be extracted in two ways. One is to extract implicit features by association rule mining; the other is to train a classification model, classify other texts with it, and then extract the implicit features.
Fig. 6 is a block diagram of the second extraction module extracting implicit features by association rule mining according to an embodiment of the invention. As shown in Fig. 6, the second extraction module 300 comprises a first generation unit 310 and a second extraction unit 320.
The first generation unit 310 is used to generate association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent item and an explicit feature.
The second extraction unit 320 is used to extract implicit features from the implicit comments according to the association rules.
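A minimal sketch of the association-rule route follows. The patent does not give the collocation extraction method or any thresholds, so this sketch swaps in plain co-occurrence counting with support and confidence; `min_support`, `min_confidence` and the data layout (a list of `(words, feature)` pairs) are assumptions made for illustration.

```python
from collections import Counter


def mine_rules(explicit_comments, min_support=2, min_confidence=0.6):
    """Build word -> (feature, confidence) rules from explicit comments."""
    word_count = Counter()
    pair_count = Counter()
    for words, feature in explicit_comments:
        for w in set(words):
            if w == feature:
                continue  # the feature term itself is absent from implicit comments
            word_count[w] += 1
            pair_count[(w, feature)] += 1
    rules = {}
    for (w, feature), n in pair_count.items():
        confidence = n / word_count[w]
        if n >= min_support and confidence >= min_confidence:
            # Keep only the strongest rule for each word.
            if w not in rules or confidence > rules[w][1]:
                rules[w] = (feature, confidence)
    return rules


def infer_implicit_feature(words, rules):
    """Return the feature of the most confident matching rule, or None."""
    matches = [rules[w] for w in words if w in rules]
    if not matches:
        return None
    return max(matches, key=lambda rule: rule[1])[0]
```

With rules mined from explicit comments in which "expensive" mostly co-occurs with the feature "price", an implicit comment such as "it is too expensive" would be assigned the implicit feature "price" even though the word "price" never appears in it.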
Fig. 7 is a block diagram of the second extraction module extracting implicit features by training a classification model according to an embodiment of the invention. As shown in Fig. 7, the second extraction module 300 comprises a second generation unit 330, a third generation unit 340, an acquisition unit 350 and a determination unit 360.
The second generation unit 330 is used to select attributes according to the explicit comments and explicit features to generate a training model.
The third generation unit 340 is used to generate a classifier by training the training model.
The acquisition unit 350 is used to obtain the implicit comments.
The determination unit 360 is used to analyze the implicit comments with the classifier to determine the implicit features.
In one embodiment of the invention, the frequent items, nouns or noun phrases recorded in an explicit comment, together with the explicit feature of that comment, serve as a training pattern, and the corresponding attributes are selected according to the comment so that a correspondence between them is obtained. A classifier is then generated by training the training model. The classifier generated in this way can directly assign comments to specific classes; for example, it can directly determine that a comment concerns appearance, price or screen. The acquired implicit comments are searched for cases that match the training model; when a match is found, the implicit feature of that implicit comment can be determined directly by the classifier. Owing to the particular nature of comments, once the number of comments reaches a certain amount, a counterpart for each implicit comment can be found among the explicit comments.
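The classifier route can be sketched in the same spirit. The patent does not name a classification algorithm, so the multinomial naive Bayes model with add-one smoothing used here is an assumption chosen for brevity; the data layout (a list of `(words, feature)` pairs taken from explicit comments) is likewise hypothetical.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(explicit_comments):
    """Count words per explicit feature; the features act as class labels."""
    class_docs = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for words, feature in explicit_comments:
        class_docs[feature] += 1
        word_counts[feature].update(words)
        vocab.update(words)
    return class_docs, word_counts, vocab


def classify(words, model):
    """Return the most probable feature for an implicit comment."""
    class_docs, word_counts, vocab = model
    total_docs = sum(class_docs.values())
    best_feature, best_score = None, -math.inf
    for feature, n_docs in class_docs.items():
        total_words = sum(word_counts[feature].values())
        score = math.log(n_docs / total_docs)
        for w in words:
            if w in vocab:  # words never seen in training are ignored
                score += math.log(
                    (word_counts[feature][w] + 1) / (total_words + len(vocab))
                )
        if score > best_score:
            best_feature, best_score = feature, score
    return best_feature
```

The training data comes for free from the explicit comments, because each one already names its feature; the trained model is then applied to implicit comments, which do not.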
The sentiment classification module 400 is used to perform sentiment classification on the comments that describe the same feature.
The sentiment classification module 400 comprises a third extraction unit and a sentiment classification unit.
The third extraction unit is used to extract, from the comments, the comments that describe the same feature.
The sentiment classification unit is used to perform sentiment classification on the comments describing the same feature by means of a dictionary.
In one embodiment of the invention, a relatively complete sentiment dictionary is built by combining resources such as existing sentiment dictionaries, linguistic research results on emotion, network lexicons and input-method lexicons; rules for how users express sentiment are then added, and the comments are classified by sentiment at sentence-level granularity. Next, statistics are gathered over the corpus and, for each feature, a dictionary is built in which sentiment polarity changes dynamically with context; by analyzing the context of the comments, the sentiment orientation of these words toward a given feature of an object in the domain is inferred iteratively. Sentiment classification and construction of the context-dependent sentiment dictionary are carried out iteratively until the context-dependent sentiment dictionary no longer changes, at which point the final sentiment classification of the comments is determined.
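The alternation between sentence-level classification and the feature-specific dictionary can be illustrated with a heavily simplified sketch. All of it is assumed rather than taken from the patent: polarities are +1 or -1, `base_lexicon` stands in for the merged general sentiment dictionary, `candidates` is a hypothetical set of words whose polarity depends on the feature (such as "long"), and a candidate's polarity is inferred by majority vote from the other words in the sentences where it occurs.

```python
def classify_sentence(words, feature, base_lexicon, context_lexicon):
    """Return 1 (positive), -1 (negative) or 0 (neutral) for one sentence."""
    score = 0
    for w in words:
        # A feature-specific polarity, when known, overrides the general one.
        score += context_lexicon.get((feature, w), base_lexicon.get(w, 0))
    return (score > 0) - (score < 0)


def build_context_lexicon(sentences, base_lexicon, candidates, max_iterations=20):
    """Iterate until the (feature, word) -> polarity dictionary stops changing."""
    context = {}
    for _ in range(max_iterations):
        votes = {}
        for feature, words in sentences:
            for w in words:
                if w not in candidates:
                    continue
                # Judge the sentence by its other words, then credit that
                # polarity to the candidate word for this feature.
                others = [x for x in words if x != w]
                label = classify_sentence(others, feature, base_lexicon, context)
                votes[(feature, w)] = votes.get((feature, w), 0) + label
        updated = {k: (v > 0) - (v < 0) for k, v in votes.items() if v != 0}
        if updated == context:
            break  # fixed point reached: the dictionary no longer changes
        context = updated
    return context
```

In this toy setting "long" ends up positive for the feature "battery" and negative for the feature "startup", which is the kind of feature-dependent polarity the embodiment describes. The `max_iterations` bound is a safeguard added for the sketch.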
The summary generation module 500 is used to extract a plurality of comments from the sentiment-classified comments to generate a comment summary.
In one embodiment of the invention, the sentiment orientation of the comments is counted after sentiment classification; for example, the sentiment orientation can be positive or negative. According to the proportions of positive and negative comments, a plurality of comments are extracted from the positive comments and the negative comments to generate the comment summary.
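A minimal sketch of proportion-based summary generation follows. The patent does not say how individual comments are chosen, so this sketch simply takes the first ones in each class; the `size` parameter and the returned dict layout are assumptions.

```python
def build_summary(classified_comments, size):
    """Pick positive and negative comments in proportion to their share.

    classified_comments is a list of (text, label) pairs where label is
    "positive" or "negative"; size is the desired number of comments.
    """
    positive = [text for text, label in classified_comments if label == "positive"]
    negative = [text for text, label in classified_comments if label == "negative"]
    total = len(positive) + len(negative)
    if total == 0:
        return {"positive_ratio": 0.0, "positive": [], "negative": []}
    ratio = len(positive) / total
    n_pos = min(len(positive), round(size * ratio))
    n_neg = min(len(negative), size - n_pos)
    return {
        "positive_ratio": ratio,
        "positive": positive[:n_pos],
        "negative": negative[:n_neg],
    }
```

With six positive and two negative comments and a summary size of four, the summary holds three positive comments and one negative comment, mirroring the 75% positive share.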
According to the system of the embodiment of the invention, the massive number of comments obtained from websites is used as the data source, which improves the accuracy of the data; at the same time, sentiment classification of the comments provides users with intuitive data and makes querying convenient.
Fig. 8 is a block diagram of a product feature comment mining system according to another embodiment of the invention. As shown in Fig. 8, the product feature comment mining system according to the embodiment of the invention further comprises a viewing module 600.
The viewing module 600 is used to view the comment summary by product feature.
In one embodiment of the invention, by querying a feature the user can obtain the comment summary related to that feature and learn the feature's advantages and shortcomings, strengths and weaknesses. Consumers can thus form a rough purchasing strategy, and the provider of the product or service can further improve the product or service.
According to the method of the embodiment of the invention, by viewing the comment summary of a product or service, consumers can better understand the product or service, which facilitates purchasing, and the provider of the product or service can further remedy its shortcomings.
It should be understood that the specific operation of each module and unit in the system embodiment of the invention may be the same as described in the method embodiment and is not described in detail here.
Although embodiments of the invention have been shown and described above, it should be understood that the above embodiments are exemplary and shall not be construed as limiting the invention; those of ordinary skill in the art may make changes, modifications, substitutions and variations to the above embodiments within the scope of the invention without departing from its principle and purpose.
Claims (12)
1. A product feature comment mining method, characterized by comprising the following steps:
S1: obtaining comments from a website by a web crawler, wherein each comment is one of an explicit comment or an implicit comment;
S2: extracting explicit features from the explicit comments;
S3: extracting implicit features from the implicit comments by means of the explicit features or the explicit comments;
S4: performing sentiment classification on the comments that describe the same feature; and
S5: extracting a plurality of comments from the sentiment-classified comments to generate a comment summary.
2. The product feature comment mining method according to claim 1, characterized by further comprising:
S6: viewing the comment summary by product feature.
3. The product feature comment mining method according to claim 1, characterized in that step S2 further comprises:
S21: extracting frequently occurring nouns or noun phrases from the explicit comments;
S22: processing the nouns or noun phrases to obtain a concept set; and
S23: clustering the concept set to obtain a set of concept clusters, i.e. the explicit features.
4. The product feature comment mining method according to claim 1, characterized in that step S3 specifically comprises:
S31: generating association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent item and an explicit feature; and
S32: extracting the implicit features from the implicit comments according to the association rules.
5. The product feature comment mining method according to claim 1, characterized in that step S3 specifically comprises:
S310: selecting attributes according to the explicit comments and explicit features to generate a training model;
S320: generating a classifier by training the training model;
S330: obtaining the implicit comments; and
S340: analyzing the implicit comments with the classifier to determine the implicit features.
6. The product feature comment mining method according to claim 1, characterized in that step S4 specifically comprises:
S41: extracting, from the comments, the comments that describe the same feature; and
S42: performing sentiment classification on the comments describing the same feature by means of a dictionary.
7. A product feature comment mining system, characterized by comprising:
an acquisition module, used to obtain comments from a website by a web crawler, wherein each comment is one of an explicit comment or an implicit comment;
a first extraction module, used to extract explicit features from the explicit comments;
a second extraction module, used to extract implicit features from the implicit comments by means of the explicit features or the explicit comments;
a sentiment classification module, used to perform sentiment classification on the comments that describe the same feature; and
a summary generation module, used to extract a plurality of comments from the sentiment-classified comments to generate a comment summary.
8. The product feature comment mining system according to claim 7, characterized by further comprising:
a viewing module, used to view the comment summary by product feature.
9. The product feature comment mining system according to claim 8, characterized in that the first extraction module specifically comprises:
a first extraction unit, used to extract frequently occurring nouns or noun phrases from the explicit comments;
a processing unit, used to process the nouns or noun phrases to obtain a concept set; and
a clustering unit, used to cluster the concept set to obtain a set of concept clusters, i.e. the explicit features.
10. The product feature comment mining system according to claim 7, characterized in that the second extraction module specifically comprises:
a first generation unit, used to generate association rules from the explicit comments by a collocation extraction method, wherein each association rule consists of a word and an explicit feature, or of a frequent item and an explicit feature; and
a second extraction unit, used to extract the implicit features from the implicit comments according to the association rules.
11. The product feature comment mining system according to claim 7, characterized in that the second extraction module specifically comprises:
a second generation unit, used to select attributes according to the explicit comments and explicit features to generate a training model;
a third generation unit, used to generate a classifier by training the training model;
an acquisition unit, used to obtain the implicit comments; and
a determination unit, used to analyze the implicit comments with the classifier to determine the implicit features.
12. The product feature comment mining system according to claim 7, characterized in that the sentiment classification module specifically comprises:
a third extraction unit, used to extract, from the comments, the comments that describe the same feature; and
a sentiment classification unit, used to perform sentiment classification on the comments describing the same feature by means of a dictionary.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN2012104138543A CN102945268A (en) | 2012-10-25 | 2012-10-25 | Method and system for excavating comments on characteristics of product |
Publications (1)
Publication Number | Publication Date |
---|---|
CN102945268A (en) | 2013-02-27 |
Family
ID=47728212
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN2012104138543A Pending CN102945268A (en) | 2012-10-25 | 2012-10-25 | Method and system for excavating comments on characteristics of product |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN102945268A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN101609459A (en) * | 2009-07-21 | 2009-12-23 | 北京大学 | A kind of extraction system of affective characteristic words |
CN101667194A (en) * | 2009-09-29 | 2010-03-10 | 北京大学 | Automatic abstracting method and system based on user comment text feature |
CN102236722A (en) * | 2011-08-17 | 2011-11-09 | 广州索答信息科技有限公司 | Method and system for generating user comment summaries based on triples |
CN102385579A (en) * | 2010-08-30 | 2012-03-21 | 腾讯科技(深圳)有限公司 | Internet information classification method and system |
US20120179751A1 (en) * | 2011-01-06 | 2012-07-12 | International Business Machines Corporation | Computer system and method for sentiment-based recommendations of discussion topics in social media |
Cited By (30)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103345525A (en) * | 2013-07-22 | 2013-10-09 | 苏州大学 | Method, device and processor for text categorization |
CN103399916A (en) * | 2013-07-31 | 2013-11-20 | 清华大学 | Internet comment and opinion mining method and system on basis of product features |
CN104462480A (en) * | 2014-12-18 | 2015-03-25 | 刘耀强 | Typicality-based big comment data mining method |
CN104462480B (en) * | 2014-12-18 | 2017-11-10 | 刘耀强 | Comment big data method for digging based on typicalness |
CN106202108A (en) * | 2015-05-06 | 2016-12-07 | 阿里巴巴集团控股有限公司 | Web crawlers captures method for allocating tasks and device and data grab method and device |
CN106202108B (en) * | 2015-05-06 | 2019-09-06 | 阿里巴巴集团控股有限公司 | Web crawlers grabs method for allocating tasks and device and data grab method and device |
CN106708868B (en) * | 2015-11-16 | 2020-02-21 | ***通信集团北京有限公司 | Internet data analysis method and system |
CN106708868A (en) * | 2015-11-16 | 2017-05-24 | ***通信集团北京有限公司 | Method and system for analyzing internet data |
CN106021413A (en) * | 2016-05-13 | 2016-10-12 | 清华大学 | Theme model based self-extendable type feature selecting method and system |
CN106021413B (en) * | 2016-05-13 | 2019-07-02 | 清华大学 | Auto-expanding type feature selection approach and system based on topic model |
CN106354754A (en) * | 2016-08-16 | 2017-01-25 | 清华大学 | Bootstrap-type implicit characteristic mining method and system based on dispersed independent component analysis |
CN107273351A (en) * | 2017-05-31 | 2017-10-20 | 温州市鹿城区中津先进科技研究院 | A kind of product feature extracting method based on big data opining mining |
CN107943909A (en) * | 2017-11-17 | 2018-04-20 | 合肥工业大学 | User demand trend method for digging and device, storage medium based on comment data |
CN108920545A (en) * | 2018-06-13 | 2018-11-30 | 四川大学 | The Chinese affective characteristics selection method of sentiment dictionary and Ka Fang model based on extension |
CN108920545B (en) * | 2018-06-13 | 2021-07-09 | 四川大学 | Chinese emotion feature selection method based on extended emotion dictionary and chi-square model |
CN110738046A (en) * | 2018-07-03 | 2020-01-31 | 百度在线网络技术(北京)有限公司 | Viewpoint extraction method and device |
CN110738046B (en) * | 2018-07-03 | 2023-06-06 | 百度在线网络技术(北京)有限公司 | Viewpoint extraction method and apparatus |
CN109190109A (en) * | 2018-07-26 | 2019-01-11 | 中国科学院自动化研究所 | Merge the method and device that user information generates comment abstract |
CN109190109B (en) * | 2018-07-26 | 2020-09-29 | 中国科学院自动化研究所 | Method and device for generating comment abstract by fusing user information |
CN109299460A (en) * | 2018-09-18 | 2019-02-01 | 北京三快在线科技有限公司 | Analyze method, apparatus, electronic equipment and the storage medium of the evaluation data in shop |
CN109299460B (en) * | 2018-09-18 | 2022-07-12 | 北京三快在线科技有限公司 | Method and device for analyzing evaluation data of shop, electronic device and storage medium |
CN109582945A (en) * | 2018-12-17 | 2019-04-05 | 北京百度网讯科技有限公司 | Article generation method, device and storage medium |
CN109886104A (en) * | 2019-01-14 | 2019-06-14 | 浙江大学 | A kind of motion feature extracting method based on the perception of video before and after frames relevant information |
CN111428489A (en) * | 2020-03-19 | 2020-07-17 | 北京百度网讯科技有限公司 | Comment generation method and device, electronic equipment and storage medium |
CN111428489B (en) * | 2020-03-19 | 2023-08-29 | 北京百度网讯科技有限公司 | Comment generation method and device, electronic equipment and storage medium |
CN112270170A (en) * | 2020-10-19 | 2021-01-26 | 中译语通科技股份有限公司 | Analysis method, device, medium and electronic equipment for implicit expression statement |
CN112559746A (en) * | 2020-12-11 | 2021-03-26 | 南京邮电大学 | Product comment mining method and system |
CN113722487A (en) * | 2021-08-31 | 2021-11-30 | 平安普惠企业管理有限公司 | User emotion analysis method, device and equipment and storage medium |
CN114663246A (en) * | 2022-05-24 | 2022-06-24 | 中国电子科技集团公司第三十研究所 | Representation modeling method of information product in propagation simulation and multi-agent simulation method |
CN114663246B (en) * | 2022-05-24 | 2022-09-23 | 中国电子科技集团公司第三十研究所 | Representation modeling method of information product in propagation simulation and multi-agent simulation method |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
C12 | Rejection of a patent application after its publication | ||
RJ01 | Rejection of invention patent application after publication |
Application publication date: 20130227 |