CN102236722B - Method and system for generating user comment summaries based on triples - Google Patents


Info

Publication number
CN102236722B
Authority
CN
China
Legal status
Active
Application number
CN201110236683.7A
Other languages
Chinese (zh)
Other versions
CN102236722A (en)
Inventor
石忠民
徐亚波
杜伟夫
Current Assignee
GUANGZHOU SUMMBA INFORMATION TECHNOLOGY CO LTD
Original Assignee
GUANGZHOU SUMMBA INFORMATION TECHNOLOGY CO LTD
Application filed by GUANGZHOU SUMMBA INFORMATION TECHNOLOGY CO LTD
Priority to CN201110236683.7A
Publication of CN102236722A
Application granted
Publication of CN102236722B
Legal status: Active (current)

Landscapes

  • Machine Translation (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a system for generating triple-based summaries of user comments. The method builds a feature lexicon, a mapping word list and a sentiment lexicon for an object, and it constructs a feature tree from the feature lexicon. It then crawls the web pages that carry user comments and receives the comments. Each comment is processed in turn to generate its own triple-based comment summary. The comment triples of all comments are aggregated into decision triples, and the method counts the decision triples that share the same feature and the same sentiment-word polarity. Finally, all decision triples are extracted to generate a decision summary. The method or system gives users a comment summary for every user comment that they can consult. Aggregating all comment triples yields decision triples that serve as guidance. Extracting all decision triples yields a decision summary that reflects the overall evaluation and supports decision-making, which helps users reach a correct decision quickly.

Description

Method and system for generating triple-based user comment summaries
Technical field
The present invention relates to the field of computer information mining. More specifically, it concerns a method and system for generating triple-based user comment summaries. The main use is to take the large number of user comments about an object and generate a decision summary that objectively reflects the overall evaluation they express.
Background
The Internet is now widespread, and before consuming something, users want to read what other users have said about it. They use those comments to decide whether it is worth their money. The object may be a merchant, a product or a service. A user considering a restaurant or a shopping mall, for example, cannot rely on its advertising alone, because advertising rarely gives an objective picture of actual product quality and service level. The user naturally wants to know how other customers rated the place after visiting.

When an object has a very large number of comments, several things become hard for the user:
- telling how many positive and how many negative evaluations there are for the particular features he or she cares about;
- telling whether the comments as a whole are mostly positive or mostly negative.

Take a user who plans to eat at a restaurant and cares mainly about its food and environment. The comments that mention food and environment are scattered irregularly among all the others. The user must read every comment and tally the positive and negative evaluations by hand, which is slow and tedious. Even after all that effort, the user knows the evaluation of only those two features. Getting the evaluation of the other features, and the overall evaluation of all comments, would take an enormous amount of work.

Comments also vary widely in length. The only information the user actually needs from them is the features of the object and the sentiment words that describe those features, and everything else is noise. Yet when reading comments, the user has no way to view only the information he or she cares about.
In summary, users who read the comments on an object currently face two significant problems:
1. User comments are full of irrelevant information, so reading them wastes time.
2. Users cannot directly see how many positive and how many negative evaluations each feature received, nor the overall evaluation of all comments. A large body of comments is available as a reference, but it does not help users reach a correct decision quickly and intuitively.
Summary of the invention
In view of these shortcomings of the prior art, the main object of the present invention is to provide a method for generating triple-based user comment summaries.
Another object of the present invention is to provide a system for generating triple-based user comment summaries.
To achieve these objects, the present invention adopts the following technical solutions:
A method for generating triple-based user comment summaries, comprising:
Step 1. Build a feature lexicon, a mapping word list and a sentiment lexicon for the object, and construct a feature tree of the object from the features in the feature lexicon, where:
- each mapping word in the mapping word list maps to a feature in the feature lexicon;
- the sentiment lexicon comprises a positive sentiment lexicon and a negative sentiment lexicon;
- the root node at the top of the feature tree is the object, and each level of nodes below the root holds features of the object;
- a node on a lower level is a sub-feature of the corresponding node on the level above, and that node is its parent feature.
Step 2. Crawl the user-comment web pages of the object from the Internet in a targeted (focused) manner.
Step 3. Receive all user comments about the object from those web pages.
Step 4. Process the user comments one by one as follows, generating a comment summary for each:
Step 41. Extract features of the object according to the feature lexicon and the mapping word list.
Step 42. Identify sentiment words according to the sentiment lexicon.
Step 43. Pair features with sentiment words to generate comment triples of the form (object, feature, sentiment word).
Step 44. Extract the comment triples to generate the comment summary of this user comment.
The method further comprises:
Step 5. Aggregate the comment triples of all user comments. In every comment triple whose feature is a node below the first level of the feature tree, replace that feature with its corresponding first-level node. This produces decision triples whose features are all first-level nodes of the feature tree.
Step 6. Count the decision triples that share the same feature and the same sentiment-word polarity. If the count is 1, that decision triple combined with the count represents itself. If the count is greater than 1, any one of those decision triples combined with the count represents all the decision triples that have that feature and polarity.
Step 7. Extract all decision triples, each represented as a decision triple combined with a count, to generate the decision summary of all user comments.
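Step 5 amounts to replacing each comment triple's feature with its first-level ancestor in the feature tree. The following is a minimal Python sketch of that roll-up. It assumes the tree is stored as a child-to-parent dictionary; the text does not prescribe a data structure, so this is an assumption. The node labels are English translations of part of the restaurant example in the embodiment. The names `PARENT`, `FIRST_LEVEL`, `to_decision_triple` and `decision_triples` are hypothetical.

```python
# Excerpt of the restaurant feature tree, stored as child -> parent (assumed representation).
PARENT = {
    "atmosphere": "environment", "decoration": "environment",
    "storefront": "decoration", "nutrition": "food",
}
# First-level nodes: the direct children of the root object "restaurant".
FIRST_LEVEL = {"food", "service", "value for money", "environment"}

def to_decision_triple(triple):
    """Roll a comment triple's feature up to its first-level ancestor (step 5)."""
    obj, feature, sentiment = triple
    while feature not in FIRST_LEVEL:
        feature = PARENT[feature]  # climb one level towards the root
    return (obj, feature, sentiment)

def decision_triples(comment_triples):
    """Apply step 5 to the comment triples of all user comments."""
    return [to_decision_triple(t) for t in comment_triples]
```

A triple whose feature is already a first-level node passes through unchanged, which matches the behaviour described for step (5) in the embodiment.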
In a preferred embodiment, step 41 comprises:
Step 411. Divide the user comment into sentences.
Step 412. Check each word in a sentence against the lexicons:
- if the word appears in the feature lexicon, extract it as a feature;
- if it is not in the feature lexicon but appears in the mapping word list, extract the feature in the feature lexicon that the word maps to, and use that as the feature.
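Steps 411 and 412 can be sketched as follows. This is a minimal illustration, not the patent's implementation. The tiny English lexicons and every name in the block are hypothetical. Whitespace tokenization stands in for the Chinese word segmenter that real input would need. Splitting at commas as well as full stops is an assumption, so that clauses are treated as sentences.

```python
import re

# Hypothetical toy lexicons for the object "restaurant" (illustration only).
FEATURE_LEXICON = {"food", "service", "environment", "price", "popularity"}
# Hypothetical mapping word list: mapping word -> the feature it implies.
MAPPING_WORDS = {"eat": "food", "expensive": "price"}

def split_sentences(comment: str) -> list[str]:
    """Step 411: split at Chinese and Western punctuation (commas included)."""
    parts = re.split(r"[。！？；，!?;,.]+", comment)
    return [p.strip() for p in parts if p.strip()]

def extract_features(sentence: str) -> list[tuple[str, str]]:
    """Step 412: return (feature, kind) pairs, where kind is 'explicit' or 'latent'."""
    found = []
    # Whitespace tokens stand in for the output of a Chinese word segmenter.
    for word in sentence.lower().split():
        if word in FEATURE_LEXICON:
            found.append((word, "explicit"))
        elif word in MAPPING_WORDS:
            found.append((MAPPING_WORDS[word], "latent"))
    return found
```

For example, `extract_features("Eat well but too expensive")` yields two latent features, `("food", "latent")` and `("price", "latent")`, even though neither feature word appears in the sentence.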
In a preferred embodiment, step 42 comprises:
Step 421. Divide the user comment into sentences.
Step 422. Go through each word in a sentence and extract the words that appear in the sentiment lexicon as sentiment words.
Step 423. Give each extracted sentiment word the polarity of the sentiment lexicon (positive or negative) it belongs to.
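Here is a minimal sketch of steps 421 to 423, assuming two small hypothetical English lexicons and whitespace tokenization in place of a real Chinese segmenter. It returns the sentiment words found in each sentence, each tagged with the polarity of the lexicon it came from.

```python
import re

# Hypothetical toy positive and negative sentiment lexicons (illustration only).
POSITIVE_LEXICON = {"good", "nice", "satisfied", "reasonable"}
NEGATIVE_LEXICON = {"disappointing", "bad", "dirty"}

def recognize_sentiment_words(comment: str) -> list[list[tuple[str, str]]]:
    """Return, for each sentence, the (sentiment word, polarity) pairs it contains."""
    # Step 421: split the comment into sentences at punctuation.
    sentences = [s for s in re.split(r"[。！？；，!?;,.]+", comment) if s.strip()]
    result = []
    for sentence in sentences:
        pairs = []
        for word in sentence.lower().split():      # step 422: lexicon lookup
            if word in POSITIVE_LEXICON:
                pairs.append((word, "positive"))   # step 423: polarity of its lexicon
            elif word in NEGATIVE_LEXICON:
                pairs.append((word, "negative"))
        result.append(pairs)
    return result
```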
In a preferred embodiment, step 43 comprises:
Step 431. Extract triple feature templates from training samples.
Step 432. Train a classifier from the feature templates using an SVM classification method.
Step 433. Combine features and sentiment words using syntactic rules to generate triples.
Step 434. Pair features and sentiment words using the classifier to generate triples.
Step 435. Filter all triples generated by the syntactic rules and by the classifier against a manually annotated set of candidate triples. Triples whose feature and sentiment word form an implausible pairing are removed, and the remaining triples are the comment triples.
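The sketch below shows only the union-then-filter structure of steps 433 to 435. It makes two simplifications, both assumptions. First, a simple nearest-feature rule stands in for the patent's syntactic rules. Second, no SVM is trained: the optional `classifier_pairs` argument is a hypothetical placeholder for the output of steps 431, 432 and 434. The candidate set is likewise hypothetical.

```python
# Hypothetical manually annotated candidate set of plausible (feature, sentiment word) pairs.
CANDIDATE_PAIRS = {
    ("service", "good"), ("environment", "nice"),
    ("price", "expensive"), ("price", "reasonable"),
}

def pair_by_proximity(tokens, features, sentiment_words):
    """Simplified stand-in for the syntactic rules: pair each sentiment word with the nearest feature."""
    feature_positions = [i for i, tok in enumerate(tokens) if tok in features]
    pairs = set()
    for j, tok in enumerate(tokens):
        if tok in sentiment_words and feature_positions:
            nearest = min(feature_positions, key=lambda i: abs(i - j))
            pairs.add((tokens[nearest], tok))
    return pairs

def comment_triples(obj, tokens, features, sentiment_words, classifier_pairs=()):
    """Union rule-based and classifier pairs, then keep only candidate-set pairs (step 435)."""
    generated = pair_by_proximity(tokens, features, sentiment_words) | set(classifier_pairs)
    return sorted((obj, f, s) for f, s in generated if (f, s) in CANDIDATE_PAIRS)
```

In "environment nice but expensive", the proximity rule also proposes ("environment", "expensive"). The candidate filter discards that pair, which is the kind of implausible pairing the text describes.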
In a preferred embodiment, step 6 also counts the positive decision triples and the negative decision triples. Step 7 then also extracts the number of positive decision triples and the number of negative decision triples as content of the decision summary.
A system for generating triple-based user comment summaries, comprising:
A preprocessing unit, for building a feature lexicon, a mapping word list and a sentiment lexicon for the object, and for constructing a feature tree of the object from the features in the feature lexicon. Each mapping word in the mapping word list maps to a feature in the feature lexicon, and the sentiment lexicon comprises a positive sentiment lexicon and a negative sentiment lexicon. The root node at the top of the feature tree is the object, and each level of nodes below the root holds features of the object. A node on a lower level is a sub-feature of the corresponding node on the level above, and that node is its parent feature.
A crawler, for crawling the user-comment web pages of the object from the Internet in a targeted manner.
A receiving device, for receiving all user comments about the object from the user-comment web pages.
A processing device, for processing the user comments one by one and generating a comment summary for each. The processing device comprises a feature extraction device, for extracting features of the object according to the feature lexicon and the mapping word list.
A sentiment word recognition device, for identifying sentiment words according to the sentiment lexicon.
A comment triple generation device, for pairing features with sentiment words to generate comment triples of the form (object, feature, sentiment word).
A comment summary generation device, for extracting the comment triples to generate the comment summary of the user comment.
The system further comprises:
A decision triple generation device, for aggregating the comment triples of all user comments. In every comment triple whose feature is a node below the first level of the feature tree, the device replaces the feature with its corresponding first-level node. This produces decision triples whose features are all first-level nodes of the feature tree.
A counting device, for counting the decision triples that share the same feature and the same sentiment-word polarity. If the count is 1, that decision triple combined with the count represents itself. If the count is greater than 1, any one of those decision triples combined with the count represents all the decision triples that have that feature and polarity.
A decision summary generation device, for extracting all decision triples, each represented as a decision triple combined with a count, to generate the decision summary of all user comments.
In a preferred embodiment, the feature extraction device comprises:
A device for dividing a user comment into sentences.
A device for checking each word in a sentence against the lexicons. If the word appears in the feature lexicon, it is extracted as a feature. If it is not in the feature lexicon but appears in the mapping word list, the feature in the feature lexicon that the word maps to is extracted as the feature.
In a preferred embodiment, the sentiment word recognition device comprises:
A device for dividing a user comment into sentences.
A device for going through each word in a sentence and extracting the words that appear in the sentiment lexicon as sentiment words.
A device for giving each extracted sentiment word the polarity of the sentiment lexicon it belongs to.
In a preferred embodiment, the comment triple generation device comprises:
A device for extracting triple feature templates from training samples.
A device for training a classifier from the feature templates using an SVM classification method.
A device for combining features and sentiment words using syntactic rules to generate triples.
A device for pairing features and sentiment words using the classifier to generate triples.
A device for filtering all triples generated by the syntactic rules and by the classifier against a manually annotated set of candidate triples. It removes triples whose feature and sentiment word form an implausible pairing, and the remaining triples are the comment triples.
In a preferred embodiment, the counting device also counts the positive decision triples and the negative decision triples. The decision summary generation device also extracts the number of positive decision triples and the number of negative decision triples as content of the decision summary.
The triple-based user comment summary method and system of the present invention have the following beneficial effects.

Features and sentiment words are extracted from each user comment to generate comment triples of the form (object, feature, sentiment word). A triple-based comment summary is then generated for each comment. The information users care about is thus pulled out of each comment into a concise, intuitive summary that users can consult.

The comment triples are also aggregated into decision triples that provide guidance. All decision triples are extracted, each represented as a decision triple combined with a count, into a decision summary that reflects the overall evaluation and supports decision-making. From the decision summary, users can see directly how many positive and negative evaluations the features they care about received, and the same for all other features. They can also see the overall evaluation of all comments. Together, this helps users reach a correct decision quickly.
Brief description of the drawings
Fig. 1 is a flow diagram of the triple-based user comment summary generation method of the present invention.
Fig. 2 is a structural diagram of a feature tree.
Fig. 3 shows a user comment and its comment summary.
Fig. 4 shows a decision summary.
Embodiment
The invention is described further below with reference to the drawings and specific embodiments.
Fig. 1 shows the overall flow of the triple-based user comment summary generation method of the present invention. In step (1), a feature lexicon, a mapping word list and a sentiment lexicon are built for the object. A feature tree of the object is then constructed from the features in the feature lexicon, where:
The object may be a product, a merchant or a service. The feature lexicon is the set of words, collected from a large corpus, that can serve as features of the object. It can be built with a statistical method, for example:
1. Collect a seed feature lexicon containing all nouns in the corpus.
2. Count how often each noun in the seed lexicon occurs in the corpus.
3. Remove, as stop words, the nouns whose frequency is below a preset threshold, which gives an initial feature lexicon.
4. Filter the words in the initial feature lexicon to produce the final feature lexicon.
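The statistical procedure above can be sketched in a few lines. This sketch makes three assumptions, none of which the text specifies: the corpus is already segmented and POS-tagged, nouns carry the tag `"n"`, and the final filtering step is a manually supplied exclusion set. `build_feature_lexicon` is a hypothetical name.

```python
from collections import Counter

def build_feature_lexicon(tagged_corpus, min_freq, excluded=frozenset()):
    """
    tagged_corpus: iterable of (word, pos_tag) pairs from a segmented, POS-tagged corpus.
    The noun tag "n" and the `excluded` set used for final filtering are assumptions.
    """
    seed = [word for word, tag in tagged_corpus if tag == "n"]   # seed lexicon: all nouns
    freq = Counter(seed)                                          # corpus frequency of each noun
    initial = {w for w, c in freq.items() if c >= min_freq}       # drop rare nouns as stop words
    return initial - set(excluded)                                # final filtering step
```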
Each mapping word in the mapping word list maps to a feature in the feature lexicon. The purpose of the mapping word list is to uncover latent features in user comments.

A latent feature is defined in contrast to an explicit feature. If a feature from the feature lexicon appears in a user comment, it is an explicit feature of that comment. Because Chinese is flexible and users express themselves in different ways, a user may evaluate a feature of the object without ever writing the feature itself. That feature is then a latent feature of the comment. Suppose a user writes, in a comment about a restaurant, "it's actually nice to eat here, just too expensive". No feature is written out, yet the verb "eat" implies the feature "food". So "eat" is a mapping word, and "food" is the latent feature that corresponds to it.

The mapping word list is the set of mapping words implying latent features, collected from a large corpus. Which feature in the feature lexicon a mapping word should map to can be decided by computing the PMI (point-wise mutual information) between the feature and the mapping word:

$\mathrm{PMI}(f, d) = \dfrac{\mathrm{hits}(f, d)}{\mathrm{hits}(f)\,\mathrm{hits}(d)}$

where f is a feature, d is a mapping word, hits(f, d) is the number of hits in which f and d co-occur, and hits(f) and hits(d) are their individual hit counts. A higher PMI value means the feature is more likely to be the latent feature of the mapping word. The correspondence between the mapping word list and the feature lexicon is therefore generally established by pairing each mapping word with the feature that has the highest PMI value.
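The PMI-based choice of a latent feature can be sketched as below. The score follows the formula exactly as given in the text. The usual definition of PMI takes a logarithm and normalizes by corpus size, but since the logarithm is monotonic and the normalization is a constant, neither would change which feature scores highest. The hit counts in the usage example are made-up numbers, and the function names are hypothetical.

```python
def pmi(hits_fd: int, hits_f: int, hits_d: int) -> float:
    """Score as written in the text: hits(f, d) / (hits(f) * hits(d)), with no logarithm."""
    if hits_f == 0 or hits_d == 0:
        return 0.0  # a word that never occurs cannot support a mapping
    return hits_fd / (hits_f * hits_d)

def latent_feature_for(word, features, hits, co_hits):
    """Map a mapping word to the feature with the highest PMI score."""
    return max(
        features,
        key=lambda f: pmi(co_hits.get((f, word), 0), hits.get(f, 0), hits.get(word, 0)),
    )
```

Take made-up counts hits = {"food": 100, "price": 50, "eat": 40} and co-occurrence counts 30 for ("food", "eat") and 5 for ("price", "eat"). The scores are 30 / 4000 = 0.0075 for "food" and 5 / 2000 = 0.0025 for "price", so "eat" maps to "food".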
The sentiment lexicon comprises a positive sentiment lexicon and a negative sentiment lexicon. It is a set of words with an obvious sentiment orientation, collected from a large corpus, and it contains words of only two polarities:
- positive, such as "good" and "satisfied";
- negative, such as "disappointing".
These two opposite polarities give users useful reference value, whereas neutral sentiment words mean little to them. The sentiment lexicon can be built with a statistical method.
The root node at the top of the feature tree is the object, and each level of nodes below the root holds features of the object. A node on a lower level is a sub-feature of the corresponding node on the level above, and that node is its parent feature. The feature tree thus defines the relations among the features of the current object as a hierarchy. The higher a node sits in the tree, the more general the concept it represents. A parent feature summarizes the attributes of all its sub-features, and each sub-feature refines its parent from a particular angle. All sub-features of the same parent are siblings of equal rank.

Take a restaurant in the catering field as an example. Fig. 2 sketches a feature tree whose object is "restaurant":
- The root node at the top is the object "restaurant".
- The first-level nodes "food", "service", "value for money" and "environment" are features that summarize the attributes of the restaurant.
- On the second level, "nutrition" is a sub-feature of "food", and "atmosphere", "tableware" and "decoration" are sub-features of "environment".
- "Storefront", "color tone" and "style" are in turn sub-features of "decoration".

The feature tree can be constructed by combining statistical machine learning with rule-based methods. These methods classify and consolidate the features in the feature lexicon and abstract different concept levels, producing the required tree.
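The Fig. 2 example can be held in a simple child-to-parent dictionary. That representation is an assumption, since the text does not fix one. The sketch below stores the tree that way and adds two helper functions: one lists the sub-features of a node, and the other gives a node's level below the root. The labels are English translations, and `PARENT`, `children` and `level` are hypothetical names.

```python
# Fig. 2 feature tree stored as child -> parent (translated labels, assumed representation).
PARENT = {
    "food": "restaurant", "service": "restaurant",
    "value for money": "restaurant", "environment": "restaurant",
    "nutrition": "food",
    "atmosphere": "environment", "tableware": "environment", "decoration": "environment",
    "storefront": "decoration", "color tone": "decoration", "style": "decoration",
}

def children(node: str) -> list[str]:
    """Sub-features of a node, in the order they were defined above."""
    return [child for child, parent in PARENT.items() if parent == node]

def level(node: str, root: str = "restaurant") -> int:
    """Depth below the root: first-level features are 1, their sub-features 2, and so on."""
    depth = 0
    while node != root:
        node = PARENT[node]
        depth += 1
    return depth
```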
Step (2) crawls the user-comment web pages of the object from the Internet in a targeted manner, which requires a web crawler. To crawl topic-relevant resources efficiently, commonly used crawling strategies and algorithms include:
- heuristics based on text content;
- methods that evaluate the web hyperlink graph;
- methods based on classifier prediction;
- other focused-crawling methods.
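Below is a minimal sketch of the first strategy listed above, a best-first crawl driven by a text-content heuristic. Everything in it is an illustrative assumption rather than the patent's crawler: relevance is the fraction of topic words found on the page, and links inherit their parent page's score. To keep the sketch self-contained and offline, the caller supplies the hypothetical `fetch(url)` function, which returns the page text and its outgoing links.

```python
import heapq

def focused_crawl(seed, fetch, topic_words, max_pages=10, min_score=0.1):
    """
    Best-first crawl driven by a text-content heuristic.
    `fetch(url)` is a caller-supplied (hypothetical) function returning (page_text, outlinks).
    """
    def relevance(text):
        # Fraction of topic words that occur in the page text.
        return len(set(text.lower().split()) & topic_words) / len(topic_words)

    frontier = [(-1.0, seed)]          # heapq is a min-heap, so scores are negated
    seen = {seed}
    relevant = []
    while frontier and len(relevant) < max_pages:
        _, url = heapq.heappop(frontier)
        text, outlinks = fetch(url)
        score = relevance(text)
        if score < min_score:
            continue                   # off-topic page: do not follow its links
        relevant.append(url)
        for link in outlinks:
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score, link))  # links inherit the parent page's score
    return relevant
```

In practice, `fetch` would wrap an HTTP client and a link extractor. Passing it in as a parameter keeps the prioritization logic separate from network access.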
Step (3) receives all user comments about the object from the user-comment web pages. This involves web text information extraction, which can be implemented with techniques such as:
- wrapper-based web information extraction;
- statistics-based web text extraction;
- vision-based web page block analysis;
- web text extraction based on data mining.

A staged scheme can also be adopted. The first stage still follows the wrapper-based approach, but unlike common wrapper-based extraction, every wrapper extraction rule is configurable. The rules are handled with XML parsing; more specifically, each extraction rule is an XPath query, which makes extraction convenient and flexible. Each concrete XPath is generated semi-automatically with the help of the autopager browser plug-in. The second stage uses machine learning on the structured web pages to be extracted. A heuristic algorithm uses the characteristics of the structured information to identify the corresponding XPath automatically and writes the matching XPath configuration file. Wrapper rules are thereby extracted automatically.
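The first-stage idea of configurable XPath wrapper rules can be sketched with Python's standard `xml.etree.ElementTree`, which supports a limited XPath subset. Several parts are assumptions for illustration only: the rule file format, the field names and the page markup. The page must also be well-formed XHTML. Real pages are usually not well-formed, so a production wrapper would more likely use an HTML-tolerant parser such as lxml.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper configuration: one XPath rule per field to extract.
RULES_XML = """
<wrapper site="example">
  <rule field="user" xpath=".//div[@class='comment']/span[@class='user']"/>
  <rule field="comment" xpath=".//div[@class='comment']/p"/>
</wrapper>
"""

def load_rules(config_xml: str) -> dict[str, str]:
    """Read field -> XPath rules from the XML configuration."""
    root = ET.fromstring(config_xml.strip())
    return {rule.get("field"): rule.get("xpath") for rule in root.iter("rule")}

def extract_fields(page_xhtml: str, rules: dict[str, str]) -> dict[str, list[str]]:
    """Apply each rule to a well-formed (XHTML) page using ElementTree's XPath subset."""
    page = ET.fromstring(page_xhtml)
    return {field: [el.text for el in page.findall(xpath)] for field, xpath in rules.items()}
```

Changing what gets extracted then only means editing the XML rule file, not the code, which is the flexibility the first stage aims for.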
Step (4) processes the user comments one by one and extracts the information users care about, namely features and sentiment words. Features and sentiment words are paired according to certain rules to generate comment triples of the form (object, feature, sentiment word). Each comment's triples form its comment summary. The comment triples capture the opinions of the comment, and the comment summary presents the evaluation it expresses concisely and intuitively. In the UI, the comment summary can be displayed to the right of the user comment.

Fig. 3 shows a user comment and its comment summary. On May 1, 2009, user "abc" posted a comment evaluating a restaurant: "The service here is good, and it's very popular, the environment is nice as well, and the prices are quite reasonable! I'll come twice a week, everyone should come here more! You'll arrive on a whim and leave delighted." The comment summary generated from it reads "environment not bad, popularity high, service good, price reasonable". It contains four comment triples:
- <restaurant, environment, not bad>
- <restaurant, popularity, high>
- <restaurant, service, good>
- <restaurant, price, reasonable>

All comment triples and decision triples share the same object, so the object can be hidden in the UI to keep the interface simple. In Fig. 3, the common object "restaurant" of the four comment triples is hidden. As Fig. 3 shows, the information the user cared about has been pulled out of the comment into a summary of four comment triples. This makes it very easy to see what evaluation the comment makes.

In detail, step (4) processes each user comment in four sub-steps:
- Step (41): extract features of the object according to the feature lexicon and the mapping word list. A feature is an attribute of the object that reflects its basic condition. For the object "restaurant", for example, features include "food", "service", "value for money" and "environment".
- Step (42): identify sentiment words according to the sentiment lexicon. A sentiment word expresses an opinion with obvious subjectivity, and users use it to judge how good a feature is. In catering comments, for example, words such as "high", "low", "satisfied" and "disappointing" are common sentiment words.
- Step (43): pair features with sentiment words to generate comment triples of the form (object, feature, sentiment word).
- Step (44): extract the comment triples to generate the comment summary of the comment.

In step (41), the comment is first divided into sentences at punctuation marks, and each word in a sentence is checked against the feature lexicon. If the word appears there, it is extracted as an explicit feature. If it does not appear in the feature lexicon but appears in the mapping word list, the feature it maps to is extracted as a latent feature. Mapping words are generally sentiment words or verbs.

In step (42), the comment is likewise divided into sentences at punctuation marks. Each word in a sentence that appears in the sentiment lexicon is extracted as a sentiment word, and it takes the polarity of that lexicon.

In step (43), a machine learning method that incorporates syntactic features can be used to judge the relation between feature words and sentiment words:
1. Extract triple feature templates from training samples.
2. Train a classifier from the feature templates using an SVM classification method.
3. Generate triples both by combining features and sentiment words with syntactic rules and by pairing them with the classifier.
4. Filter all triples generated by the syntactic rules and the classifier against a manually annotated set of candidate triples, removing triples whose feature and sentiment word form an implausible pairing. The remaining triples are the comment triples.

The candidate triple set is a predefined set of the plausible triples that may exist for the current object. Any triple that is not in the candidate set is an implausible pairing of feature and sentiment word and can be filtered out directly. For example, the feature "environment" paired with the sentiment word "expensive" is obviously implausible, so this pairing cannot be in the candidate set.
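Step (44) turns a comment's triples into the one-line summary shown in Fig. 3. The sketch below renders the four Fig. 3 triples (translated) and, by default, hides the shared object as the text describes for the UI. The comma-separated output format and the name `comment_summary` are assumptions.

```python
def comment_summary(triples, hide_object=True):
    """
    Render one comment's (object, feature, sentiment word) triples as a one-line summary.
    All triples of a comment share the object, so it can be hidden as in Fig. 3.
    """
    parts = []
    for obj, feature, sentiment in triples:
        text = f"{feature} {sentiment}"
        parts.append(text if hide_object else f"{obj} {text}")
    return ", ".join(parts)
```

Applied to the four Fig. 3 triples, it produces "environment not bad, popularity high, service good, price reasonable".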
Proceed to step (5): aggregate the comment triples of all user comments. For every comment triple whose feature is a node below the first level of the feature tree, the feature is mapped up to its corresponding first-level node, so that every generated decision triple has a first-level node of the feature tree as its feature. Many users may review the same object, and their opinions may agree, differ, or even be completely opposed. Correspondingly, different comment triples may evaluate the same feature both positively and negatively, so after reading the comment-triple information a user still cannot tell whether the evaluations of a feature are mostly positive or mostly negative. Moreover, users' evaluations of a given feature are scattered: for the feature "environment", for example, users may comment from angles such as "decoration", "hygiene", and "atmosphere". Such scattered evaluations make it hard for a user to form a quick, accurate, intuitive judgment of a feature. Decision triples are meant to help the user see how the features that summarize the object's properties are evaluated. Because the features at the first-level nodes of the feature tree are the most representative features of the object, they are chosen as the features of decision triples, and all other features are aggregated up to them. For example, "environment" is a first-level node and serves as a decision-triple feature, so every environment-related feature in the comment triples is aggregated to "environment" to generate decision triples. Thus the comment triple <restaurant, atmosphere, good>, whose object is "restaurant", feature "atmosphere", and emotion word "good", becomes the decision triple <restaurant, environment, good> once "atmosphere" is mapped to "environment". If a comment triple already has a first-level node as its feature, the generated decision triple is identical to it. As with comment triples, decision triples that share the same object need not display the object in the UI.
Proceed to step (6): count the decision triples that have the same feature and the same emotion-word polarity. If the count is greater than 1, these decision triples are represented by any one of them combined with the count. If the count equals 1, no other decision triple shares this feature and polarity, so there is nothing else to choose from and the decision triple is represented by itself combined with the count. This step may also count the total number of positive decision triples and the total number of negative decision triples.
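Step (6) amounts to grouping decision triples by (feature, polarity). The sketch below assumes a hypothetical `POLARITY` lookup built from the positive and negative emotion dictionaries, represents triples as plain `(object, feature, emotion_word)` tuples, and uses the first triple seen in each group as the representative (the text allows any one).

```python
from collections import Counter

# Hypothetical polarity lookup derived from the positive/negative emotion dictionaries.
POLARITY = {"good": "+", "clean": "+", "cozy": "+", "dirty": "-", "noisy": "-"}

def group_decision_triples(triples, polarity=POLARITY):
    """Combine decision triples that share feature and emotion-word polarity.

    Returns a list of (representative_triple, count), one per
    (feature, polarity) group, in first-seen order.
    """
    representative = {}
    counts = Counter()
    for obj, feature, emotion in triples:
        key = (feature, polarity[emotion])
        representative.setdefault(key, (obj, feature, emotion))
        counts[key] += 1
    return [(representative[k], counts[k]) for k in representative]

def polarity_totals(triples, polarity=POLARITY):
    """Total positive and negative decision triples (the optional counts)."""
    c = Counter(polarity[emotion] for _, _, emotion in triples)
    return c["+"], c["-"]
```

Groups of size 1 fall out naturally as `(triple, 1)`, so the two cases in the text need no separate branch.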
Proceed to step (7): extract all decision triples, in their "decision triple plus count" representation, to generate the decision summary of all user comments; the numbers of positive and negative decision triples may also be extracted as part of the decision summary. In the decision summary, a user can see directly how many positive and negative evaluations the features of interest, and the other features, received, and can also see the overall evaluation across all user comments, which helps the user make a correct decision quickly. Figure 4 shows the decision summary of all user comments on "A restaurant". It shows how many positive and negative decision triples have "food", "environment", "service", "taste", and "value for money" as their feature, together with the total numbers of positive and negative decision triples; this decision summary gives the user useful reference information for making a quick decision.
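One possible way to assemble the decision summary of step (7) is sketched below. It assumes the grouped counts from step (6) are available as a hypothetical dictionary keyed by `(feature, polarity)`; the text layout produced here is illustrative only and is not the layout of Figure 4.

```python
def decision_summary(grouped):
    """Build per-feature positive/negative counts plus overall totals.

    `grouped` maps (feature, polarity) -> count, with polarity "+" or "-".
    Returns (rows, total_positive, total_negative), where rows is a list of
    (feature, positive_count, negative_count) in first-seen feature order.
    """
    rows = {}
    for (feature, pol), n in grouped.items():
        pos, neg = rows.get(feature, (0, 0))
        rows[feature] = (pos + n, neg) if pol == "+" else (pos, neg + n)
    total_pos = sum(p for p, _ in rows.values())
    total_neg = sum(n for _, n in rows.values())
    return [(f, p, n) for f, (p, n) in rows.items()], total_pos, total_neg

def format_summary(obj, grouped):
    """Render the summary as plain text lines."""
    rows, total_pos, total_neg = decision_summary(grouped)
    lines = [f"Decision summary for {obj}"]
    lines += [f"  {f}: {p} positive, {n} negative" for f, p, n in rows]
    lines.append(f"  total: {total_pos} positive, {total_neg} negative")
    return "\n".join(lines)
```

For example, `format_summary("A restaurant", {("food", "+"): 5, ("food", "-"): 1, ("service", "-"): 2})` lists food with 5 positive and 1 negative, service with 0 positive and 2 negative, and totals of 5 and 3; these counts are made up for illustration.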
It should be noted that the repeated catering-industry examples above serve only to make the invention easier to understand and do not limit its application; the invention can be widely applied to any field involving products, merchants, or services.
To extract features and emotion words, the invention must first segment user comments into words, and segmentation quality strongly affects triple generation. The invention uses a Hidden Markov Model (HMM) for word segmentation and part-of-speech tagging, together with a Delimiter-based segmentation method. Segmentation performance was evaluated with the metrics commonly used in natural language processing:
Precision: P = C3/C2;
Recall: R = C3/C1;
F value: F = 2*P*R/(P+R);
where C1 is the number of words actually in the reference corpus, C2 is the number of words output by the segmenter, and C3 is the number of words the segmenter segmented correctly. The evaluation corpus consists of cosmetics reviews from Taobao and Jingdong Mall (JD.com), collected with the crawler and the receiving device; 251 comments were randomly sampled and then manually segmented and corrected to form the gold-standard evaluation corpus. The results are shown in the following table:
Precision  Recall  F value  C1    C2    C3
0.93       0.963   0.946    7025  7279  6769
As the table shows, the segmentation F value reaches 94.6%, a high level of performance that lays a solid foundation for generating high-quality triples.
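The patent does not give its HMM parameters. As a minimal sketch, assuming the common character-level formulation in which each character is tagged B, M, or E (begin, middle, end of a multi-character word) or S (single-character word), Viterbi decoding over these tags could look like the following Python. The start, transition, and emission probabilities are hand-set toy values, not trained parameters, and part-of-speech tagging and the Delimiter-based method are omitted.

```python
import math

NEG_INF = float("-inf")
STATES = "BMES"  # B/M/E: begin/middle/end of a multi-character word; S: single-character word

# Toy log-probabilities chosen by hand (NOT trained). Disallowed moves such as
# B->B or E->M are absent from TRANS and therefore score -inf.
START = {"B": math.log(0.6), "S": math.log(0.4), "M": NEG_INF, "E": NEG_INF}
TRANS = {
    "B": {"M": math.log(0.3), "E": math.log(0.7)},
    "M": {"M": math.log(0.3), "E": math.log(0.7)},
    "E": {"B": math.log(0.6), "S": math.log(0.4)},
    "S": {"B": math.log(0.6), "S": math.log(0.4)},
}

def emit(state, char, emissions):
    """Log emission probability, with a small floor for unseen characters."""
    return math.log(emissions.get(state, {}).get(char, 1e-6))

def viterbi(sentence, emissions):
    """Return the most likely BMES tag string for `sentence`."""
    if not sentence:
        return ""
    scores = [{s: START[s] + emit(s, sentence[0], emissions) for s in STATES}]
    back = [{}]
    for ch in sentence[1:]:
        prev, cur, ptr = scores[-1], {}, {}
        for s in STATES:
            best_prev, best = max(
                ((p, prev[p] + TRANS[p].get(s, NEG_INF)) for p in STATES),
                key=lambda pair: pair[1],
            )
            cur[s] = best + emit(s, ch, emissions)
            ptr[s] = best_prev
        scores.append(cur)
        back.append(ptr)
    # A sentence must end at a word boundary, i.e. on E or S.
    tags = [max("ES", key=lambda s: scores[-1][s])]
    for ptr in reversed(back[1:]):
        tags.append(ptr[tags[-1]])
    return "".join(reversed(tags))

def tags_to_words(sentence, tags):
    """Cut the sentence after every E or S tag."""
    words, start = [], 0
    for i, tag in enumerate(tags):
        if tag in "ES":
            words.append(sentence[start:i + 1])
            start = i + 1
    return words

# Toy emission table for the phrase 服务很好 ("the service is very good").
EMISSIONS = {"B": {"服": 0.5}, "E": {"务": 0.5}, "S": {"很": 0.3, "好": 0.3}}
```

With these toy values, `viterbi("服务很好", EMISSIONS)` returns `"BESS"`, which `tags_to_words` turns into `["服务", "很", "好"]`. A real segmenter would estimate all three tables from a tagged corpus.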
The invention also performs well in feature extraction. Feature-extraction performance is measured by coverage, computed as coverage = F_our / F_all, where F_our is the total number of features the invention identifies during feature extraction and F_all is the total number of features after manual annotation. The experimental corpus again consists of cosmetics reviews from Taobao and Jingdong Mall, collected with the crawler and the receiving device; 1,745 comments were randomly sampled as the evaluation corpus. Manual annotation yielded 398 features in total, of which the invention extracted 338, a coverage of 84.9%, which indicates high feature coverage.
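As a worked check of the coverage figure, with F_our the number of features identified by the invention and F_all the number of manually annotated features, the reported counts give:

```latex
\text{coverage} = \frac{F_{\text{our}}}{F_{\text{all}}} = \frac{338}{398} \approx 0.849 = 84.9\%
```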
Generating comment triples is the core and the main difficulty of the invention. The invention uses machine learning to decide whether a feature word and an emotion word form a reasonable collocation. To evaluate comment-triple generation, 133 comments were randomly sampled from the cosmetics domain as a test set; comment triples were then extracted and corrected manually to form the gold-standard evaluation corpus. Evaluation uses the same P-R-F scheme, where C1 is the number of actual comment triples in the corpus, C2 is the number of comment triples identified by the classifier, and C3 is the number of triples the classifier identified correctly. The results are shown in the following table:
Precision  Recall  F value  C1   C2   C3
80.4%      67.5%   73.95%   949  797  641
As the table shows, comment-triple identification reaches a precision of 80.4% and a recall of 67.5%, which is a good result for irregular review text.
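The P-R-F scheme used in both evaluations above can be written as a short Python sketch. Matching predicted words to reference words by exact character span is an assumption about how "correctly segmented" is counted; the patent does not spell this out.

```python
def spans(words):
    """Map a word sequence to the set of (start, end) character offsets it covers."""
    result, pos = set(), 0
    for w in words:
        result.add((pos, pos + len(w)))
        pos += len(w)
    return result

def prf(c1, c2, c3):
    """Precision P = C3/C2, recall R = C3/C1, F = 2PR/(P+R)."""
    if c3 == 0:
        return 0.0, 0.0, 0.0
    p, r = c3 / c2, c3 / c1
    return p, r, 2 * p * r / (p + r)

def evaluate_segmentation(gold_words, predicted_words):
    """A predicted word counts as correct when its exact span is in the reference."""
    gold, pred = spans(gold_words), spans(predicted_words)
    return prf(len(gold), len(pred), len(gold & pred))

segmentation = prf(c1=7025, c2=7279, c3=6769)
triples = prf(c1=949, c2=797, c3=641)
```

For the segmentation counts this gives P ≈ 0.93 and F ≈ 0.946, matching the first table. For the comment-triple counts it gives P ≈ 0.804, R ≈ 0.675, and F ≈ 0.734 (73.4%); the 73.95% in the second table equals the arithmetic mean (80.4% + 67.5%)/2 rather than the harmonic-mean F defined by F = 2*P*R/(P+R).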
The invention also provides a system for generating triple-based user comment summaries. The system comprises: a preprocessing device, for building the object's feature dictionary, mapping vocabulary, and emotion dictionary, and for constructing the object's feature tree from the features in the feature dictionary, wherein each mapping word in the mapping vocabulary maps to a feature in the feature dictionary, the emotion dictionary comprises a positive emotion dictionary and a negative emotion dictionary, the root node at the top of the feature tree is the object, each level of nodes below the root consists of features of the object, and each node on a lower level is a sub-feature of its corresponding node on the level above, which is in turn the parent feature of that lower-level node; a crawler, for purposefully crawling user comment web pages about the object from the Internet; a receiving device, for receiving all user comments about the object on those pages; and a processing device, for processing each user comment in turn to generate its own comment summary, the processing device comprising: a feature extraction device, for extracting the object's features according to the feature dictionary and the mapping vocabulary; an emotion word recognition device, for identifying emotion words according to the emotion dictionary; a comment triple generation device, for pairing features with emotion words to generate comment triples of the form <object, feature, emotion word>; and a comment summary generation device, for extracting the comment triples to generate the comment summary of that user comment. The system further comprises: a decision triple generation device, for aggregating the comment triples of all user comments by mapping the feature of every comment triple whose feature is a node below the first level of the feature tree up to its corresponding first-level node, thereby generating decision triples that all have a first-level node of the feature tree as their feature; a calculation device, for counting the decision triples that have the same feature and emotion-word polarity, where if the count equals 1 the decision triple is represented by itself combined with the count, and if the count is greater than 1 these decision triples, which share the same feature and emotion-word polarity, are represented by any one of them combined with the count; and a decision summary generation device, for extracting all decision triples, in their "decision triple plus count" representation, to generate the decision summary of all user comments.
The feature extraction device comprises: a device for dividing a user comment into sentences; and a device for traversing each word in a sentence and checking whether it appears in the feature dictionary, where a word that appears in the feature dictionary is extracted as a feature, and a word that does not appear in the feature dictionary but does appear in the mapping vocabulary causes the feature in the feature dictionary to which it maps to be extracted as the feature.
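A minimal sketch of this dictionary lookup follows, assuming small hypothetical English dictionaries and whitespace tokenization as a stand-in for the HMM segmenter; the sentence-splitting punctuation set is also an assumption.

```python
import re

# Hypothetical example dictionaries (English stand-ins for the real lexicons).
FEATURE_DICT = {"environment", "service", "taste", "atmosphere"}
MAPPING_VOCAB = {"ambience": "atmosphere", "waiters": "service", "flavor": "taste"}

def split_sentences(comment):
    """Split a comment on Chinese and Western sentence-ending punctuation."""
    return [s.strip() for s in re.split(r"[。！？；!?;.\n]+", comment) if s.strip()]

def extract_features(words, feature_dict=FEATURE_DICT, mapping=MAPPING_VOCAB):
    """Keep dictionary features; replace mapping words by the feature they map to."""
    found = []
    for w in words:
        if w in feature_dict:
            found.append(w)
        elif w in mapping:
            found.append(mapping[w])
    return found

def features_per_sentence(comment, tokenize=lambda s: s.lower().split()):
    """Return the extracted features of each sentence, in order."""
    # Whitespace tokenization stands in for the HMM word segmenter here.
    return [extract_features(tokenize(s)) for s in split_sentences(comment)]
```

For example, "The ambience was cozy. Waiters were slow!" yields `[["atmosphere"], ["service"]]`, because both words are found only in the mapping vocabulary.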
The emotion word recognition device comprises: a device for dividing a user comment into sentences; a device for traversing each word in a sentence and extracting the words that appear in the emotion dictionary as emotion words; and a device for determining the polarity of each extracted emotion word according to the polarity recorded in the emotion dictionary.
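Since the emotion dictionary is split into a positive and a negative part, recognition and polarity assignment can happen in one pass. A sketch, assuming small hypothetical English word sets and an already tokenized sentence:

```python
# Hypothetical positive and negative emotion dictionaries.
POSITIVE = {"good", "cozy", "clean", "friendly"}
NEGATIVE = {"slow", "dirty", "noisy", "expensive"}

def recognize_emotions(sentence_words, positive=POSITIVE, negative=NEGATIVE):
    """Return (emotion_word, polarity) pairs in sentence order; polarity is +1 or -1."""
    found = []
    for w in sentence_words:
        if w in positive:
            found.append((w, +1))
        elif w in negative:
            found.append((w, -1))
    return found
```

Words in neither set are ignored, so `recognize_emotions(["waiters", "were", "slow", "but", "friendly"])` returns `[("slow", -1), ("friendly", 1)]`.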
The comment triple generation device comprises: a device for extracting triple feature templates from training samples; a device for training a classifier from the feature templates using SVM classification; a device for combining features and emotion words with syntactic rules to generate triples; a device for pairing features and emotion words with the classifier to generate triples; and a device for filtering all triples generated by the syntactic rules and the classifier against a manually annotated candidate triple set, removing triples whose feature and emotion word form an unreasonable collocation, to obtain the comment triples.
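The overall flow (rule-based candidates, classifier-based candidates, then filtering against an annotated set) could be sketched as follows. The patent's actual syntactic rules and SVM feature templates are not given, so this sketch substitutes a hypothetical "nearest feature word" rule and accepts any callable `classifier(feature, emotion, words) -> bool` in place of the trained SVM; `allowed` is a hypothetical set of acceptable (feature, emotion word) pairs standing in for the manually annotated candidate set.

```python
def rule_pairs(obj, words, features, emotions):
    """Hypothetical stand-in for the syntactic rules: pair each emotion word
    with the nearest feature word in the same sentence."""
    feature_positions = [i for i, w in enumerate(words) if w in features]
    pairs = set()
    for j, w in enumerate(words):
        if w in emotions and feature_positions:
            i = min(feature_positions, key=lambda k: abs(k - j))
            pairs.add((obj, words[i], w))
    return pairs

def generate_triples(obj, words, features, emotions, classifier=None, allowed=None):
    """Union rule-based and classifier-based candidates, then filter them."""
    candidates = rule_pairs(obj, words, features, emotions)
    if classifier is not None:
        # Any callable classifier(feature, emotion, words) -> bool stands in
        # for the SVM trained on feature templates.
        feats = {w for w in words if w in features}
        emos = {w for w in words if w in emotions}
        for f in feats:
            for e in emos:
                if classifier(f, e, words):
                    candidates.add((obj, f, e))
    if allowed is not None:
        # Drop unreasonable collocations using the annotated (feature, emotion) pairs.
        candidates = {t for t in candidates if (t[1], t[2]) in allowed}
    return sorted(candidates)
```

Taking the union before filtering lets the classifier recover pairs the rules miss, while the annotated set removes implausible pairs produced by either source.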
In addition, the calculation device also counts the positive decision triples and the negative decision triples, and the decision summary generation device also extracts these two counts as content of the decision summary.
The related techniques used by the system are the same as in the embodiment of the triple-based user comment summary generation method described above and are not repeated here.
The key design point of the invention is as follows. With this method or system, the features and emotion words in each user comment are extracted to generate comment triples of the form <object, feature, emotion word>, and a comment summary based on these triples is generated for each user comment. The information users care about in each comment is thus extracted into a concise, intuitive summary that users can consult. In addition, the comment triples are aggregated into decision triples with guiding significance, and all decision triples, in their "decision triple plus count" representation, are extracted to generate a decision summary that reflects the overall evaluation and supports decision-making. In the decision summary, a user can see directly how many positive and negative evaluations the features of interest, and the other features, received, and can also see the overall evaluation across all user comments, which helps the user make a correct decision quickly.
The above is only a preferred embodiment of the invention and does not restrict its technical scope in any way. Any minor modification, equivalent change, or variation made to the above embodiment in accordance with the technical essence of the invention still falls within the scope of the technical solution of the invention.

Claims (10)

1. A method for generating triple-based user comment summaries, characterized in that it comprises:
Step 1. Build the object's feature dictionary, mapping vocabulary, and emotion dictionary, and construct the object's feature tree from the features in the feature dictionary, wherein each mapping word in the mapping vocabulary maps to a feature in the feature dictionary; the emotion dictionary comprises a positive emotion dictionary and a negative emotion dictionary; the root node at the top of the feature tree is the object; each level of nodes below the root consists of features of the object; and each node on a lower level is a sub-feature of its corresponding node on the level above, which is the parent feature of that lower-level node;
Step 2. Purposefully crawl user comment web pages about the object from the Internet;
Step 3. Receive all user comments about the object on the user comment web pages;
Step 4. Process each user comment in turn as follows to generate its own comment summary:
Step 41. Extract the object's features according to the feature dictionary and the mapping vocabulary;
Step 42. Identify emotion words according to the emotion dictionary;
Step 43. Pair features with emotion words to generate comment triples of the form <object, feature, emotion word>;
Step 44. Extract the comment triples to generate the comment summary of the user comment;
The method further comprises:
Step 5. Aggregate the comment triples of all user comments: for every comment triple whose feature is a node below the first level of the feature tree, map the feature to its corresponding first-level node, thereby generating decision triples that all have a first-level node of the feature tree as their feature;
Step 6. Count the decision triples that have the same feature and emotion-word polarity; if the count equals 1, represent the decision triple by itself combined with the count; if the count is greater than 1, represent these decision triples, which share the same feature and emotion-word polarity, by any one of them combined with the count;
Step 7. Extract all decision triples, in their "decision triple plus count" representation, to generate the decision summary of all user comments.
2. The method for generating triple-based user comment summaries according to claim 1, characterized in that step 41 comprises:
Step 411. Divide the user comment into sentences;
Step 412. Traverse each word in a sentence and check whether it appears in the feature dictionary; if so, extract it as a feature; if it does not appear in the feature dictionary but appears in the mapping vocabulary, extract the feature in the feature dictionary to which the word maps as the feature.
3. The method for generating triple-based user comment summaries according to claim 1, characterized in that step 42 comprises:
Step 421. Divide the user comment into sentences;
Step 422. Traverse each word in a sentence and extract the words that appear in the emotion dictionary as emotion words;
Step 423. Determine the polarity of each extracted emotion word according to the polarity in the emotion dictionary.
4. The method for generating triple-based user comment summaries according to claim 1, characterized in that step 43 comprises:
Step 431. Extract triple feature templates from training samples;
Step 432. Train a classifier from the feature templates using SVM classification;
Step 433. Combine features and emotion words using syntactic rules to generate triples;
Step 434. Pair features and emotion words using the classifier to generate triples;
Step 435. Filter all triples generated by the syntactic rules and the classifier against a manually annotated candidate triple set, removing triples whose feature and emotion word form an unreasonable collocation, to obtain the comment triples.
5. The method for generating triple-based user comment summaries according to claim 1, characterized in that step 6 further comprises counting the positive decision triples and the negative decision triples, and step 7 further comprises extracting these two counts as content of the decision summary.
6. A system for generating triple-based user comment summaries, characterized in that it comprises:
A preprocessing device, for building the object's feature dictionary, mapping vocabulary, and emotion dictionary, and for constructing the object's feature tree from the features in the feature dictionary, wherein each mapping word in the mapping vocabulary maps to a feature in the feature dictionary; the emotion dictionary comprises a positive emotion dictionary and a negative emotion dictionary; the root node at the top of the feature tree is the object; each level of nodes below the root consists of features of the object; and each node on a lower level is a sub-feature of its corresponding node on the level above, which is the parent feature of that lower-level node;
A crawler, for purposefully crawling user comment web pages about the object from the Internet;
A receiving device, for receiving all user comments about the object on the user comment web pages;
A processing device, for processing each user comment in turn to generate its own comment summary, the processing device comprising:
A feature extraction device, for extracting the object's features according to the feature dictionary and the mapping vocabulary;
An emotion word recognition device, for identifying emotion words according to the emotion dictionary;
A comment triple generation device, for pairing features with emotion words to generate comment triples of the form <object, feature, emotion word>;
A comment summary generation device, for extracting the comment triples to generate the comment summary of the user comment;
The system further comprises:
A decision triple generation device, for aggregating the comment triples of all user comments by mapping the feature of every comment triple whose feature is a node below the first level of the feature tree to its corresponding first-level node, thereby generating decision triples that all have a first-level node of the feature tree as their feature;
A calculation device, for counting the decision triples that have the same feature and emotion-word polarity, wherein if the count equals 1, the decision triple is represented by itself combined with the count, and if the count is greater than 1, these decision triples, which share the same feature and emotion-word polarity, are represented by any one of them combined with the count;
A decision summary generation device, for extracting all decision triples, in their "decision triple plus count" representation, to generate the decision summary of all user comments.
7. The system for generating triple-based user comment summaries according to claim 6, characterized in that the feature extraction device comprises:
A device for dividing the user comment into sentences;
A device for traversing each word in a sentence and checking whether it appears in the feature dictionary, extracting it as a feature if it does, and, if it does not appear in the feature dictionary but appears in the mapping vocabulary, extracting the feature in the feature dictionary to which the word maps as the feature.
8. The system for generating triple-based user comment summaries according to claim 6, characterized in that the emotion word recognition device comprises:
A device for dividing the user comment into sentences;
A device for traversing each word in a sentence and extracting the words that appear in the emotion dictionary as emotion words;
A device for determining the polarity of each extracted emotion word according to the polarity in the emotion dictionary.
9. The system for generating triple-based user comment summaries according to claim 6, characterized in that the comment triple generation device comprises:
A device for extracting triple feature templates from training samples;
A device for training a classifier from the feature templates using SVM classification;
A device for combining features and emotion words using syntactic rules to generate triples;
A device for pairing features and emotion words using the classifier to generate triples;
A device for filtering all triples generated by the syntactic rules and the classifier against a manually annotated candidate triple set, removing triples whose feature and emotion word form an unreasonable collocation, to obtain the comment triples.
10. The system for generating triple-based user comment summaries according to claim 6, characterized in that the calculation device also counts the positive decision triples and the negative decision triples, and the decision summary generation device also extracts these two counts as content of the decision summary.
CN201110236683.7A 2011-08-17 2011-08-17 Method and system for generating user comment summaries based on triples Active CN102236722B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201110236683.7A CN102236722B (en) 2011-08-17 2011-08-17 Method and system for generating user comment summaries based on triples

Publications (2)

Publication Number Publication Date
CN102236722A CN102236722A (en) 2011-11-09
CN102236722B true CN102236722B (en) 2014-08-27

Family

ID=44887368

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201110236683.7A Active CN102236722B (en) 2011-08-17 2011-08-17 Method and system for generating user comment summaries based on triples

Country Status (1)

Country Link
CN (1) CN102236722B (en)

Families Citing this family (24)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103377262B (en) * 2012-04-28 2017-09-12 国际商业机器公司 The method and apparatus being grouped to user
CN102890707A (en) * 2012-08-28 2013-01-23 华南理工大学 System for mining emotional tendencies of brief network comments based on conditional random field
CN103678371B (en) * 2012-09-14 2017-10-10 富士通株式会社 Word library updating device, data integration device and method and electronic equipment
CN102945268A (en) * 2012-10-25 2013-02-27 北京腾逸科技发展有限公司 Method and system for excavating comments on characteristics of product
CN103970786A (en) * 2013-01-31 2014-08-06 百度在线网络技术(北京)有限公司 LBS (Location Based Service)-based information obtaining method and equipment
CN103970783A (en) * 2013-01-31 2014-08-06 百度在线网络技术(北京)有限公司 LBS (Location Based Service)-based information acquisition method and equipment
CN103970784A (en) * 2013-01-31 2014-08-06 百度在线网络技术(北京)有限公司 Retrieval method and equipment
CN103399916A (en) * 2013-07-31 2013-11-20 清华大学 Internet comment and opinion mining method and system on basis of product features
CN104375739B (en) * 2013-08-12 2019-07-26 联想(北京)有限公司 The method and electronic equipment of information processing
CN104375977B (en) * 2013-08-14 2018-11-23 腾讯科技(深圳)有限公司 The processing method and processing device of reply message in Ask-Answer Community
CN104462132A (en) * 2013-09-23 2015-03-25 华为技术有限公司 Comment information display method and device
CN105512333A (en) * 2015-12-28 2016-04-20 上海电机学院 Product comment theme searching method based on emotional tendency
CN105761152A (en) * 2016-02-07 2016-07-13 重庆邮电大学 Topic participation prediction method based on triadic group in social network
CN105760502A (en) * 2016-02-23 2016-07-13 常州普适信息科技有限公司 Commercial quality emotional dictionary construction system based on big data text mining
CN105912644A (en) * 2016-04-08 2016-08-31 国家计算机网络与信息安全管理中心 Network review generation type abstract method
CN106055542B (en) * 2016-08-17 2019-01-22 山东大学 A kind of text snippet automatic generation method and system based on temporal knowledge extraction
CN106469145A (en) * 2016-09-30 2017-03-01 中科鼎富(北京)科技发展有限公司 Text emotion analysis method and device
CN108133014B (en) * 2017-12-22 2022-03-22 广州数说故事信息科技有限公司 Triple generation method and device based on syntactic analysis and clustering and user terminal
CN109992661A (en) * 2019-03-05 2019-07-09 广发证券股份有限公司 A kind of intelligent public sentiment monitoring method and system towards securities industry
CN109948031A (en) * 2019-03-12 2019-06-28 南京航空航天大学 On-Line review sentence automatic creation system with Sentiment orientation
CN110134765B (en) * 2019-05-05 2021-06-29 杭州师范大学 Restaurant user comment analysis system and method based on emotion analysis
CN110349620B (en) * 2019-06-28 2020-06-19 南方医科大学 Method for accurately identifying molecular interaction and polarity and directionality thereof
CN114116989B (en) * 2022-01-28 2022-04-15 京华信息科技股份有限公司 Formatted document generation method and system based on OCR recognition
CN114724010B (en) * 2022-05-16 2022-09-02 中译语通科技股份有限公司 Method, device and equipment for determining sample to be trained and readable storage medium

Citations (3)

Publication number Priority date Publication date Assignee Title
CN101667194A (en) * 2009-09-29 2010-03-10 北京大学 Automatic abstracting method and system based on user comment text feature
CN101727487A (en) * 2009-12-04 2010-06-09 中国人民解放军信息工程大学 Network criticism oriented viewpoint subject identifying method and system
CN102096680A (en) * 2009-12-15 2011-06-15 北京大学 Method and device for analyzing information validity

Non-Patent Citations (2)

Title
Xi Yahui (郗亚辉) et al., "A survey of product review mining" (产品评论挖掘研究综述), Journal of Shandong University (Natural Science), vol. 46, no. 5, pp. 16-22, May 2011. *

Also Published As

Publication number Publication date
CN102236722A (en) 2011-11-09

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
C14 Grant of patent or utility model
GR01 Patent grant