CN105844424A - Product quality problem discovery and risk assessment method based on network comments - Google Patents
- Publication number
- CN105844424A (application CN201610212917.7A)
- Authority
- CN
- China
- Prior art keywords
- word
- risk assessment
- emotion
- comment
- product quality
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0639—Performance analysis of employees; Performance analysis of enterprise or organisation operations
- G06Q10/06395—Quality analysis or management
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/951—Indexing; Web crawling techniques
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/953—Querying, e.g. by the use of web search engines
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/90—Details of database functions independent of the retrieved data types
- G06F16/95—Retrieval from the web
- G06F16/955—Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/205—Parsing
- G06F40/211—Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/284—Lexical analysis, e.g. tokenisation or collocates
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F40/00—Handling natural language data
- G06F40/20—Natural language analysis
- G06F40/279—Recognition of textual entities
- G06F40/289—Phrasal analysis, e.g. finite state techniques or chunking
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q10/00—Administration; Management
- G06Q10/06—Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
- G06Q10/063—Operations research, analysis or management
- G06Q10/0635—Risk analysis of enterprise or organisation activities
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q50/00—Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
- G06Q50/04—Manufacturing
-
- Y—GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
- Y02—TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
- Y02P—CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
- Y02P90/00—Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
- Y02P90/30—Computing systems specially adapted for manufacturing
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Business, Economics & Management (AREA)
- Human Resources & Organizations (AREA)
- Physics & Mathematics (AREA)
- General Physics & Mathematics (AREA)
- Databases & Information Systems (AREA)
- General Engineering & Computer Science (AREA)
- Strategic Management (AREA)
- Economics (AREA)
- Health & Medical Sciences (AREA)
- General Health & Medical Sciences (AREA)
- Entrepreneurship & Innovation (AREA)
- Data Mining & Analysis (AREA)
- Audiology, Speech & Language Pathology (AREA)
- Marketing (AREA)
- Educational Administration (AREA)
- Development Economics (AREA)
- Tourism & Hospitality (AREA)
- General Business, Economics & Management (AREA)
- Computational Linguistics (AREA)
- Artificial Intelligence (AREA)
- Quality & Reliability (AREA)
- Game Theory and Decision Science (AREA)
- Operations Research (AREA)
- Manufacturing & Machinery (AREA)
- Primary Health Care (AREA)
- Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
- Machine Translation (AREA)
Abstract
The invention provides a method for discovering product quality problems and assessing risk based on network comments. The method comprises three steps: (1) data acquisition, in which a web crawler fetches web pages related to a designated product and the comment data on those pages is extracted and stored in a database; (2) quality feature word extraction, in which the comment texts are pre-processed and a conditional random field model is used to extract quality feature words from the comment data; and (3) quality problem discovery and risk assessment, in which product quality problems are counted on the basis of quality problem description templates and each aspect of product quality is assessed with a risk assessment algorithm. With this method, the quality problems reported by users can be discovered quickly and effectively, and quality risks arising during product use can be monitored in real time.
Description
Technical field:
The invention belongs to the field of product quality management, and in particular relates to a method for product quality problem discovery and risk assessment based on network comments.
Background:
Product quality is the lifeblood of an enterprise; it reflects the enterprise's overall quality and its comprehensive strength. Traditional quality control methods focus only on quality management during production, so that shipping a product from the factory marks the end of quality management. With the rise of total quality control, the scope of quality management has expanded to the stage in which customers use the product: enterprises strive to discover the quality problems that arise during use and feed them back to the design and production departments, thereby improving product quality and the user experience.
At present, enterprises mainly collect the product quality problems that arise during use through their after-sales service departments. Many large manufacturers set up after-sales service points across the country, collect through them the quality problems that users encounter, and feed these problems back to the design and production departments to guide quality improvement. However, because of limits on funds, manpower and material resources, the coverage of after-sales service points is limited, and some enterprises have no after-sales service points at all. The traditional approach of collecting quality problems through the after-sales service department therefore cannot fully meet enterprises' needs.
With the development of the Internet, more and more users post their evaluations of products as comments on online platforms such as forums and e-commerce sites, and these comments often imply the quality problems that users found during use. Making effective use of these comments to mine the quality problems that arise during use compensates for the incomplete information collected by after-sales service departments.
Summary of the invention:
The main purpose of the invention is to provide a method for product quality problem discovery and risk assessment based on network comments, as a supplement to traditional quality management methods.
The method for product quality problem discovery and risk assessment based on network comments comprises the following steps:
Step 1, data acquisition: use a web crawler to fetch web pages, such as forum and e-commerce pages, related to the designated product, then extract the comment data from the pages and save it in a database;
Step 2, quality feature word extraction: first pre-process the comment text in three steps (word segmentation and part-of-speech tagging, syntactic analysis, and sentiment word marking) and define a feature template; then train a conditional random field model; finally use the conditional random field model to extract quality feature words from the comment data;
Step 3, quality problem discovery and risk assessment: first define quality problem description templates and, based on these templates, count the quality problems related to each quality feature word; then define a risk assessment algorithm based on quality feature words and use it to calculate the risk assessment value of each quality feature word.
In the above method, in step 1, when fetching the forum, e-commerce and other web pages related to the designated product, the similarity between the product name and the web page title is calculated by the following formula:
where $Z$ is a normalization factor, $\alpha_k$ is a position parameter with $0 < \alpha_k \le 1$, and $P_k$ is a single similarity whose value is 0 or 1.
In the above method, in step 2, a sentiment word dictionary is used for sentiment word marking. In the sentiment word dictionary, the sentiment orientation of a word falls into three classes: positive, negative and neutral, denoted P, N and M respectively. For positive and negative sentiment words, sentiment intensity is divided into five levels, 1, 3, 5, 7 and 9, where 9 is the strongest and 1 the weakest; for neutral sentiment words, the sentiment intensity is 0.
In the above method, in step 3, the quality problem description templates fall into two main classes: the first class consists of a quality feature word and a sentiment word, and the second class consists of a negation word ("not") and a quality feature word.
In the above method, in step 3, the risk assessment algorithm is implemented using the sentiment word dictionary and a degree adverb dictionary. In the degree adverb dictionary, words are divided by intensity into four classes, "extremely", "very", "relatively" and "slightly", with corresponding intensity values 4, 3, 2 and 1.
In the above method, in step 3, the formula of the risk assessment algorithm is as follows:
$V(S) = V_1(S) + V_2(S)$
where $V_1(S)$ is the risk assessment value of $S$ in the comment data matching the first class of quality problem description templates, and $V_2(S)$ is the risk assessment value of $S$ in the comment data matching the second class of quality problem description templates.
In the above formula of the risk assessment algorithm, $V_1(S)$ is calculated as:
where $V_P(S)$, $V_N(S)$ and $V_M(S)$ denote the positive, negative and neutral risk assessment values of the quality feature word $S$, respectively; $a$, $b$ and $c$ denote the numbers of positive, negative and neutral sentiment words modifying $S$, respectively; $Score(PS_k)$ denotes the sentiment intensity of the $k$-th positive sentiment word modifying $S$; $Score(PAS_k)$ denotes the intensity of the degree adverb of the $k$-th positive sentiment word modifying $S$; $Score(NS_l)$ denotes the sentiment intensity of the $l$-th negative sentiment word modifying $S$; $P_i$ and $N_i$ denote the numbers of positive and negative sentiment words, respectively, in the comment containing the $i$-th neutral sentiment word modifying $S$; and $Score(PS_{ij})$ denotes the sentiment intensity of the $j$-th positive sentiment word in the comment containing the $i$-th neutral sentiment word modifying $S$.
In the above formula of the risk assessment algorithm, $V_2(S)$ is calculated as:
where $T_i$ denotes the score of the $i$-th template and $Num_i$ denotes the number of occurrences of comment data matching the $i$-th template.
The invention automatically fetches user comment data related to a designated product from the network, discovers the product's quality problems from that data, and then assesses the risk of each aspect of product quality. Using this method, an enterprise can discover the product quality problems reported by users more quickly and effectively, and can monitor quality risks during product use in real time.
Brief description of the drawings:
Fig. 1 is the flow chart of the invention.
Fig. 2 is the data acquisition flow chart of the invention.
Fig. 3 is the flow chart of quality feature word extraction.
Fig. 4 is an example of dependency analysis.
Fig. 5 is an example training text for quality feature word extraction.
Fig. 6 is the feature template for quality feature word extraction.
Detailed description of the invention:
The invention is further described below with reference to the accompanying drawings.
The invention takes user comments on online platforms such as forums and e-commerce sites as its object of study; its purpose is to mine product quality problems from network comments and to assess quality risk.
The method for product quality problem discovery and risk assessment based on network comments comprises three steps: data acquisition, quality feature word extraction, and quality problem discovery and risk assessment, as shown in Fig. 1. Each step is described in detail below.
Step 1, data acquisition: use a web crawler to fetch web pages, such as forum and e-commerce pages, related to the designated product, then extract the comment data from the pages and save it in a database.
The data acquisition flow is shown in Fig. 2. First, the Baidu search interface is called to search for the designated product, yielding a specified number of search result pages, each containing 13 search results. Each search result page is then processed by the following steps:
Step S101: extract the title of the j-th search result on the i-th search result page.
Step S102: calculate title similarity: use formula (1) to calculate the similarity between the title and the product name, denoted Sim(title, ProductName), with 0 ≤ Sim(title, ProductName) ≤ 1. If the similarity is greater than or equal to 0.8, continue to the next step; otherwise increment j by 1 and return to step S101.
where $Z$ is the normalization factor,
$\alpha_k$ is the position parameter,
$P_k$ is the single similarity,
In formulas (1), (2), (3) and (4), $m$ is the number of words in the product name, $n$ is the number of words in the title, "title(k+l-1)" denotes the (k+l-1)-th word of the title, and "ProductName(l)" denotes the l-th word of the product name.
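The exact forms of formulas (1) to (4) are not reproduced in this text, so the following is only a minimal sketch of a position-weighted title similarity built from the stated roles of $Z$, $\alpha_k$ and $P_k$. The weighted sum itself, the linear decay of `alpha_k` and the choice of `z` as the number of matching positions are all assumptions made for illustration, not the patent's definitions.

```python
def single_similarity(title_words, name_words, k):
    """P_k: 1 if the product name matches the title starting at word k (1-based), else 0."""
    m = len(name_words)
    window = title_words[k - 1 : k - 1 + m]
    return 1 if window == name_words else 0


def title_similarity(title_words, name_words):
    """Position-weighted similarity in [0, 1] between a title and a product name."""
    n, m = len(title_words), len(name_words)
    if m == 0 or n < m:
        return 0.0
    weighted, matches = 0.0, 0
    for k in range(1, n - m + 2):
        p_k = single_similarity(title_words, name_words, k)
        # Assumed position parameter: decays linearly, always in (0, 1].
        alpha_k = 1.0 - (k - 1) / n
        weighted += alpha_k * p_k
        matches += p_k
    # Assumed normalization factor: the number of matching positions.
    z = matches
    return weighted / z if z else 0.0
```

Under these assumptions a title that begins with the product name scores 1.0 and would pass the 0.8 threshold of step S102, while a title that mentions the product name only later scores lower.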
Step S103: extract the URL of the j-th search result on the i-th search result page.
Step S104: match the URL: based on the URL of the j-th search result, determine whether the result belongs to a forum or e-commerce website. If so, continue to the next step; otherwise increment j by 1 and return to step S101.
Step S105: web page fetching and information extraction: different types of web pages require different fetching and extraction strategies, so a separate fetching and extraction template must be defined for each website. Fig. 2 shows templates for Zhongguancun, Pacific, www.yesky.com, Jingdong, Suning and Yihaodian (No. 1 Shop); the number of templates is not limited and can be extended.
Step S106: termination check: after all search results on the i-th search result page have been processed, if more than 10 of the 13 search results on page i satisfy the title similarity condition, set i = i + 1 and j = 1 and return to S101 to process the next search result page; otherwise, data acquisition ends.
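One simple way to perform the URL check of step S104 is to compare the host name of the URL with a table of known sites. The sketch below assumes such a table; the domains in `SITE_TYPES` are placeholders, not the sites used by the invention.

```python
from urllib.parse import urlparse

# Hypothetical mapping from site domain to site type (placeholder domains).
SITE_TYPES = {
    "forum.example.com": "forum",
    "shop.example.com": "e-commerce",
}


def classify_url(url, site_types=SITE_TYPES):
    """Return the site type for a URL, or None if the site is not a known forum or shop."""
    host = (urlparse(url).hostname or "").lower()
    for domain, kind in site_types.items():
        # Accept the domain itself and any of its subdomains.
        if host == domain or host.endswith("." + domain):
            return kind
    return None
```

A result whose URL yields `None` would be skipped, which corresponds to incrementing j and returning to step S101.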
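The control flow of steps S101 to S106 can be sketched as a loop over search result pages. This is an illustrative outline only: `fetch_results_page`, `is_similar`, `is_supported_site` and `handle` are hypothetical callables supplied by the caller, standing in for the search interface, the title similarity test, the URL check and the site-specific fetching and extraction templates.

```python
def crawl(fetch_results_page, is_similar, is_supported_site, handle, max_pages):
    """Walk search result pages following steps S101 to S106; return pages handled."""
    collected = 0
    for i in range(1, max_pages + 1):
        results = fetch_results_page(i)      # list of (title, url) pairs
        similar = 0
        for title, url in results:           # S101 and S103: title and URL of result j
            if not is_similar(title):        # S102: title similarity test
                continue
            similar += 1
            if not is_supported_site(url):   # S104: forum or e-commerce site?
                continue
            handle(url)                      # S105: fetch the page and extract comments
            collected += 1
        if similar <= 10:                    # S106: continue only if more than 10 matched
            break
    return collected
```

The stopping rule reflects the idea that once fewer results on a page match the product name, later pages are unlikely to be relevant.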
Step 2, quality feature word extraction: first pre-process the comment text in three steps (word segmentation and part-of-speech tagging, syntactic analysis, and sentiment word marking) and define a feature template; then train a conditional random field model; finally use the conditional random field model to extract quality feature words from the comment data.
The invention provides a method for extracting quality feature words from comment data; its flow chart is shown in Fig. 3. First, three pre-processing steps are carried out: word segmentation and part-of-speech tagging (S201), syntactic analysis (S202) and sentiment word marking (S203), yielding structured text 201. Next, the results for 500 comments are drawn from text 201 by uniform sampling, and every quality feature word in these 500 comments is manually labeled "S", yielding training set 202. Then, with training set 202 and feature template 203 as input, the conditional random field algorithm is used to train conditional random field model 204, which characterizes the statistical relationship between quality feature words and five factors: the word, its part of speech, its dependency relation, its governing word, and the sentiment marking of the governing word. Model 204 is then used to label the quality feature words in text 201 automatically, yielding result set 205. Finally, the words labeled "S" are extracted from the result set, yielding quality feature word set 206.
Steps S201 to S204 are described in detail below:
Step S201, word segmentation and part-of-speech tagging: the purpose of quality feature word extraction is to extract words related to product quality from the comment data. Because written Chinese has no spaces between words, a computer cannot identify words directly, so word segmentation must be performed first. Word segmentation divides a continuous text into individual words. For example, given the sentence "the phone's screen is very blurry", segmentation yields the five words "phone", "'s" (a structural particle), "screen", "very" and "blurry". Words that describe quality problems follow certain statistical regularities in part of speech: for example, most quality feature words are nouns, while the probability that an adverb is a quality feature word is almost zero. Part-of-speech tagging is therefore performed after segmentation to mark the part of speech of each word; for the example above the tagged result is "phone/n 's/u screen/n very/d blurry/a".
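As a small illustration of how the tagged output might be consumed downstream, the sketch below parses a "word/tag" string and keeps the nouns as candidate quality feature words. The English tokens stand in for the segmented Chinese words, and treating every tag that starts with "n" as a noun is an assumption about the tag set.

```python
def parse_tagged(text):
    """Split 'word/tag word/tag ...' into a list of (word, tag) pairs."""
    pairs = []
    for token in text.split():
        # Split on the last slash so that words containing '/' stay intact.
        word, _, tag = token.rpartition("/")
        pairs.append((word, tag))
    return pairs


def noun_candidates(pairs):
    """Keep nouns only, since most quality feature words are nouns."""
    return [word for word, tag in pairs if tag.startswith("n")]
```

For the running example, `noun_candidates(parse_tagged("phone/n 's/u screen/n very/d blurry/a"))` keeps "phone" and "screen"; deciding which candidate is actually a quality feature word is left to the later steps.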
Step S202, dependency analysis: the theoretical basis of dependency analysis is dependency grammar, which holds that the predicate verb of a sentence is the center that governs the other constituents and is itself governed by no other constituent; every governed constituent is subordinate to its governor through some dependency relation. Dependency grammar thus directly describes the relations between words. For the example "phone/n 's/u screen/n very/d blurry/a", the result of dependency analysis is shown in Fig. 4. In this result, a dependency relation holds directly between two words, which form a dependency pair: one is the governing word and the other the dependent word. The dependency relation is drawn as a directed arc, called a dependency arc, which points from the governing word to the dependent word. Each dependency arc carries a label, called the relation type, which indicates what kind of dependency relation holds between the two words of the pair. In this example, "screen" is the quality feature word; Fig. 4 shows that the governing word of "screen" is "blurry" and that the dependency relation between "screen" and "blurry" is "SBV", i.e. the subject-predicate relation.
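A dependency analysis result can be held as a list of labeled arcs, from which the governing word of any word is looked up. The sketch below is illustrative: only the SBV arc between "screen" and "blurry" comes from the description, and the other two arcs and their labels are assumed for the sake of a complete example.

```python
from typing import NamedTuple, Optional


class Arc(NamedTuple):
    governor: str    # governing word (the arc points away from it)
    dependent: str   # dependent word (the arc points to it)
    relation: str    # relation type, e.g. "SBV" for subject-predicate


def governing_arc(arcs, word) -> Optional[Arc]:
    """Return the arc whose dependent is `word`, or None for the sentence root."""
    for arc in arcs:
        if arc.dependent == word:
            return arc
    return None


EXAMPLE_ARCS = [
    Arc("blurry", "screen", "SBV"),   # stated in the description
    Arc("blurry", "very", "ADV"),     # assumed label
    Arc("screen", "phone", "ATT"),    # assumed label
]
```

Looking up "screen" returns the SBV arc governed by "blurry", while "blurry" has no governing arc because it is the center of the sentence.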
Step S203, sentiment word marking: steps S201 and S202 yield four items: the word, its part of speech, its dependency relation and its governing word. For the example "the phone's screen is very blurry", the result is the first 5 rows of the table in Fig. 5, where each row is one record and each record has four fields: word, part of speech, dependency relation and governing word. Sentiment word marking relies on a sentiment dictionary containing common sentiment words such as "blurry", "high" and "good". The object of sentiment marking is the governing word: using the sentiment dictionary, each governing word is checked and marked "Y" if it is a sentiment word and "N" if it is not. After sentiment marking, the result shown in Fig. 5 is obtained.
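The marking step amounts to appending a fifth field to each record. A minimal sketch, assuming records are tuples of (word, part of speech, dependency relation, governing word) and the sentiment dictionary is a set of words:

```python
def mark_sentiment(rows, sentiment_words):
    """Append 'Y' or 'N' to each record depending on whether its governing word is a sentiment word."""
    marked = []
    for word, pos, relation, governor in rows:
        flag = "Y" if governor in sentiment_words else "N"
        marked.append((word, pos, relation, governor, flag))
    return marked
```

With the dictionary {"blurry", "high", "good"}, the record for "screen" (governed by "blurry") is marked "Y", whereas a record governed by "screen" is marked "N".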
Step S204, quality feature word extraction based on the conditional random field model: extraction based on the conditional random field consists of two stages, training and processing. In the training stage, the results for 500 comments are first drawn from text 201 by uniform sampling, and every quality feature word in these 500 comments is manually labeled "S", yielding training set 202. Next, taking into account five factors (the word, its part of speech, its dependency relation, its governing word, and the sentiment marking of the governing word), the feature template shown in Fig. 6 is defined. Then, with training set 202 and feature template 203 as input, the conditional random field algorithm is used to train conditional random field model 204, which characterizes the statistical relationship between quality feature words and these five factors. In the processing stage, the trained model 204 is used to label the quality feature words in text 201 automatically, yielding result set 205; the words labeled "S" are then extracted from the result set, yielding quality feature word set 206.
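The sketch below shows, under stated assumptions, the two pieces that surround the conditional random field model: turning each record into a feature dictionary built from the five factors, and collecting the words labeled "S" from a labeled result. The feature template of Fig. 6 is not reproduced here, so the one-token context window is an assumption, as are the relation labels other than SBV and the use of "O" for non-feature words; training itself would be delegated to a CRF toolkit and is not shown.

```python
def token_features(rows, i):
    """Feature dictionary for record i, built from the five factors plus assumed context."""
    word, pos, relation, governor, governor_is_sentiment = rows[i]
    features = {
        "word": word,
        "pos": pos,
        "rel": relation,
        "gov": governor,
        "gov_sent": governor_is_sentiment,
    }
    # Assumed context features: part of speech of the neighbouring tokens.
    if i > 0:
        features["prev_pos"] = rows[i - 1][1]
    if i < len(rows) - 1:
        features["next_pos"] = rows[i + 1][1]
    return features


def extract_feature_words(rows, labels):
    """Collect the distinct words labeled 'S', i.e. the quality feature word set."""
    return sorted({row[0] for row, label in zip(rows, labels) if label == "S"})
```

In the processing stage, the labels passed to `extract_feature_words` would come from the trained model rather than from manual annotation.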
Step 3, quality problem discovery and risk assessment: first define quality problem description templates and, based on these templates, count the quality problems related to each quality feature word; then define a risk assessment algorithm based on quality feature words and use it to calculate the risk assessment value of each quality feature word.
When users describe quality problems, their language habits differ, so the same quality problem may be described in many forms. Based on an analysis of a large amount of comment data, the invention abstracts templates that cover most quality problem descriptions. Quality problem description templates fall into two main classes. The first class consists of a quality feature word and a sentiment word, e.g. "screen blurry", where "screen" is the quality feature word and "blurry" is the sentiment word. The second class consists of a negation word ("not") and a quality feature word, e.g. "cannot read the address book", which contains a negation word and in which "address book" is the quality feature word. A more detailed classification of the templates is given in Table 1, where templates 1, 2 and 3 belong to the first class and templates 4, 5 and 6 to the second class.
Table 1: Detailed classification of quality problem description templates
No. | Quality problem description template | Example |
1 | Quality feature word + sentiment word | Screen blurry |
2 | Quality feature word + degree adverb + sentiment word | Pixel count very low |
3 | Quality feature word + sentiment word + degree adverb | System is very bad |
4 | Verb + negation word + quality feature word | Cannot read the address book |
5 | Quality feature word + verb + negation word | Photo-taking cannot be used |
6 | Quality feature word + negation word + verb | Compass cannot be used |
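The six templates of Table 1 can be encoded as sequences of word roles and matched by lookup. In the sketch below the role codes ("F" for quality feature word, "E" for sentiment word, "D" for degree adverb, "V" for verb, "NEG" for negation word) are names chosen for illustration, and matching is done on the whole role sequence of a phrase, which is a simplification.

```python
# Role sequence of each template in Table 1, mapped to its template number.
TEMPLATES = {
    ("F", "E"): 1,
    ("F", "D", "E"): 2,
    ("F", "E", "D"): 3,
    ("V", "NEG", "F"): 4,
    ("F", "V", "NEG"): 5,
    ("F", "NEG", "V"): 6,
}


def match_template(roles):
    """Return the template number for a role sequence, or None if no template matches."""
    return TEMPLATES.get(tuple(roles))


def template_class(template_id):
    """Templates 1 to 3 form the first class, templates 4 to 6 the second class."""
    if template_id in (1, 2, 3):
        return 1
    if template_id in (4, 5, 6):
        return 2
    return None
```

Counting how often each template is matched for a given quality feature word gives the per-template occurrence counts used later in the risk assessment.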
The risk assessment algorithm is described below.
The risk assessment algorithm is based on a sentiment word dictionary and a degree adverb dictionary. The sentiment word dictionary used by the invention is based on the sentiment vocabulary ontology of Dalian University of Technology: a subset of the words in that ontology was selected, some new Internet slang was added, and the sentiment classification of the words was revised. In the sentiment word dictionary of the invention, the sentiment orientation of a word falls into three classes: positive, negative and neutral, denoted P, N and M respectively. For positive and negative sentiment words, sentiment intensity is divided into five levels, 1, 3, 5, 7 and 9, where 9 is the strongest and 1 the weakest; for neutral sentiment words, the sentiment intensity is 0. The degree adverb dictionary used by the invention is based on the degree-level word set of HowNet, from which a subset of words was selected and to which some common degree adverbs were added. This dictionary divides words by intensity into four classes, "extremely", "very", "relatively" and "slightly", with corresponding intensity values 4, 3, 2 and 1.
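The two dictionaries can be represented as simple lookup tables. In this sketch the four degree classes and their values follow the description, while the sample sentiment entries and their intensities are invented for illustration; `is_valid_entry` checks an entry against the allowed orientation and intensity combinations.

```python
# Degree adverb classes and their intensity values, as given in the description.
DEGREE_INTENSITY = {"extremely": 4, "very": 3, "relatively": 2, "slightly": 1}

# Hypothetical sentiment dictionary entries: word -> (orientation, intensity).
SENTIMENT = {
    "good": ("P", 5),
    "blurry": ("N", 5),
    "ordinary": ("M", 0),
}


def is_valid_entry(orientation, intensity):
    """Check an entry against the allowed orientation and intensity combinations."""
    if orientation in ("P", "N"):
        return intensity in (1, 3, 5, 7, 9)
    if orientation == "M":
        return intensity == 0
    return False
```

Validating entries when the dictionary is extended with new Internet slang keeps the intensity scale consistent with the five levels described above.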
Given a quality feature word $S$, its risk assessment value is denoted $V(S)$ and is calculated as follows:
$V(S) = V_1(S) + V_2(S)$ (5)
where $V_1(S)$ is the risk assessment value of $S$ in the comment data matching the first class of quality problem description templates. It is calculated as follows:
The risk assessment of a quality feature word is divided into three kinds: positive risk assessment, negative risk assessment and neutral risk assessment.
Positive risk assessment: within a comment, if a first-class quality problem description template is matched and the sentiment word modifying the feature word is positive, the positive risk assessment is calculated. First find the sentiment word modifying the feature word, then determine whether a template containing a degree adverb is matched. If so, the risk assessment is "sentiment intensity of the sentiment word + intensity of the degree adverb"; if not, the risk assessment is "sentiment intensity of the sentiment word".
Negative risk assessment: within a comment, if a first-class quality problem description template is matched and the sentiment word modifying the feature word is negative, the negative risk assessment is calculated. First find the sentiment word modifying the feature word, then determine whether a template containing a degree adverb is matched. If so, the risk assessment is "sentiment intensity of the sentiment word + intensity of the degree adverb"; if not, the risk assessment is "sentiment intensity of the sentiment word".
Neutral risk assessment: within a comment, if a first-class quality problem description template is matched and the nearest sentiment word modifying the feature word is neutral, the neutral risk assessment is calculated. In this case, the risk assessment of the feature word equals the risk assessment of the comment, which is the difference between the comment's positive risk assessment and its negative risk assessment.
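The three rules above can be sketched as two small functions. This covers only the per-occurrence scores described in the text; how these scores are combined and normalized into the overall value for the first template class is defined by a formula that is not reproduced here, so no aggregation is attempted.

```python
def modifier_score(sentiment_intensity, degree_intensity=None):
    """Score of one positive or negative modifier.

    Sentiment intensity alone, or sentiment intensity plus degree adverb
    intensity when a template containing a degree adverb is matched.
    """
    return sentiment_intensity + (degree_intensity or 0)


def neutral_score(positive_scores, negative_scores):
    """Comment-level score used when the nearest modifier of the feature word is neutral.

    Difference between the comment's positive and negative risk assessments.
    """
    return sum(positive_scores) - sum(negative_scores)
```

For example, a sentiment word of intensity 5 modified by an adverb of the "very" class (value 3) scores 8, and a comment with positive scores 5 and 3 and one negative score 7 yields a neutral-case score of 1.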
V1(S) is computed as follows:
Here, Ti is a normalization factor:
Ti = Pi + Ni (7)
In formulas (6) and (7), VP(S), VN(S), and VM(S) denote the commendatory, derogatory, and neutral risk assessment values of quality feature word S, respectively. a, b, and c denote the numbers of commendatory, derogatory, and neutral emotion words modifying feature word S, respectively. Score(PSk) is the emotion intensity of the k-th commendatory emotion word modifying S; Score(PASk) is the emotion intensity of the degree adverb attached to the k-th commendatory emotion word modifying S; Score(NSl) is the emotion intensity of the l-th derogatory emotion word modifying S. Pi is the number of commendatory emotion words in the comment that contains the i-th neutral emotion word modifying S, and Ni is the number of derogatory emotion words in that same comment. Score(PSij) is the emotion intensity of the j-th commendatory emotion word in the comment that contains the i-th neutral emotion word modifying S.
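As a rough sketch, the quantities defined above could be accumulated for one feature word S as shown below. Several points are assumptions rather than statements of the patent's method: the class and function names are hypothetical, a missing degree adverb is represented by intensity 0, the derogatory intensities inside a neutral comment are assumed to mirror Score(PSij), and dividing each neutral comment's difference by Ti is only inferred from Ti being described as a normalization factor. The final combination of VP(S), VN(S), and VM(S) into V1(S) is deliberately not shown.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class NeutralComment:
    """A comment whose nearest emotion word modifying S is neutral (hypothetical container)."""
    commendatory: List[int]  # intensities Score(PS_ij) of commendatory words in the comment
    derogatory: List[int]    # intensities of derogatory words in the comment (assumed counterpart)


def polar_value(words: List[Tuple[int, int]]) -> int:
    """Sum (emotion intensity + degree-adverb intensity) over the words modifying S.

    Each tuple is (emotion_intensity, adverb_intensity); use 0 when there is no adverb.
    Applies to either the commendatory words (VP) or the derogatory words (VN).
    """
    return sum(score + adverb for score, adverb in words)


def neutral_value(comments: List[NeutralComment]) -> float:
    """Accumulate the normalized commendatory-minus-derogatory difference per neutral comment."""
    total = 0.0
    for c in comments:
        t_i = len(c.commendatory) + len(c.derogatory)  # T_i = P_i + N_i, formula (7)
        if t_i == 0:
            continue  # no polar words in this comment, so it contributes nothing
        total += (sum(c.commendatory) - sum(c.derogatory)) / t_i
    return total
```

With this reading, a neutral comment containing commendatory words of intensity 7 and 5 and one derogatory word of intensity 3 has Ti = 3 and contributes (12 - 3) / 3 to the neutral value.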
V2(S) is the risk assessment value of S over the comment data that match the second class of quality problem description templates. It is computed as follows:
Here, Ti is the score of the i-th template and Numi is the number of times comment data matching the i-th template occur. The index i takes the values 4, 5, and 6, corresponding to templates No. 4, 5, and 6 respectively.
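One plausible reading of this definition is a score-weighted count over templates No. 4, 5, and 6. The sketch below assumes that form (each template's score multiplied by its match count, then summed); the function name, the dictionary-based inputs, and the example scores are hypothetical and are not taken from the patent.

```python
from typing import Mapping

SECOND_CLASS_TEMPLATES = (4, 5, 6)  # template numbers named in the text


def v2(template_scores: Mapping[int, float], match_counts: Mapping[int, int]) -> float:
    """Assumed form: sum over i in {4, 5, 6} of T_i * Num_i.

    template_scores maps template number -> score T_i.
    match_counts maps template number -> Num_i; missing templates count as 0.
    """
    return sum(template_scores[i] * match_counts.get(i, 0)
               for i in SECOND_CLASS_TEMPLATES)
```

For example, with illustrative scores `{4: 1.0, 5: 2.0, 6: 3.0}` and counts `{4: 2, 6: 1}`, the call `v2(...)` adds 1.0 × 2 for template 4, nothing for template 5, and 3.0 × 1 for template 6.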
The present invention can automatically capture user comment data related to a specified product from the network, discover the product's quality problems from that data, and then assess the risk of each aspect of product quality. Using this method, an enterprise can find the product quality problems reported by users more quickly and effectively, and can monitor quality risk during product use in real time.
Claims (10)
1. A product quality problem discovery and risk assessment method based on network comments, characterized by comprising:
Step 1, data acquisition: use a web crawler to capture web pages, such as forums and e-commerce sites, related to a specified product, then extract the comment data from the web pages and save the comment data to a database;
Step 2, quality feature word extraction: first preprocess the comment text in three steps (word segmentation with part-of-speech tagging, syntactic analysis, and emotion word tagging) and formulate feature templates, then train a conditional random field model, and finally use the conditional random field model to extract quality feature words from the comment data;
Step 3, quality problem discovery and risk assessment: first propose quality problem description templates and, based on these templates, compile the quality problems related to each quality feature word; then propose a risk assessment algorithm based on quality feature words and use it to compute the risk assessment value of each quality feature word.
2. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in Step 1, when capturing web pages such as forums and e-commerce sites related to the specified product, the similarity between the product name and the web page title is computed by the following formula:
Here, Z is a normalization factor, αk is a position parameter with 0 < αk ≤ 1, and Pk is a single similarity whose value is 0 or 1.
3. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in Step 2, an emotion word dictionary is used to perform the emotion word tagging.
4. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 3, characterized in that: in the emotion word dictionary, the emotion orientation of a word falls into three classes, commendatory, derogatory, and neutral, denoted P, N, and M respectively; for commendatory and derogatory emotion words, emotion intensity is divided into five grades, 1, 3, 5, 7, and 9, where 9 is the strongest and 1 is the weakest; for neutral emotion words, the emotion intensity is 0.
5. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in Step 3, the quality problem description templates are divided into two main classes: the first class consists of a quality feature word and an emotion word, and the second class consists of the word "no" and a quality feature word.
6. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in Step 3, the risk assessment algorithm is implemented using an emotion word dictionary and a degree adverb dictionary.
7. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 6, characterized in that: in the degree adverb dictionary, words are divided into four classes according to emotion intensity, namely "extremely", "very", "relatively", and "slightly", with corresponding emotion intensity values of 4, 3, 2, and 1.
8. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in Step 3, the formula of the risk assessment algorithm is:
V(S) = V1(S) + V2(S)
where V1(S) is the risk assessment value of S over the comment data that match the first class of quality problem description templates, and V2(S) is the risk assessment value of S over the comment data that match the second class of quality problem description templates.
9. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 8, characterized in that: in the formula of the risk assessment algorithm, V1(S) is computed by the formula:
Here, VP(S), VN(S), and VM(S) denote the commendatory, derogatory, and neutral risk assessment values of quality feature word S, respectively; a, b, and c denote the numbers of commendatory, derogatory, and neutral emotion words modifying feature word S, respectively; Score(PSk) is the emotion intensity of the k-th commendatory emotion word modifying S, Score(PASk) is the emotion intensity of the degree adverb attached to the k-th commendatory emotion word modifying S, and Score(NSl) is the emotion intensity of the l-th derogatory emotion word modifying S; Pi is the number of commendatory emotion words in the comment that contains the i-th neutral emotion word modifying S, Ni is the number of derogatory emotion words in that same comment, and Score(PSij) is the emotion intensity of the j-th commendatory emotion word in the comment that contains the i-th neutral emotion word modifying S.
10. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 8, characterized in that: in the formula of the risk assessment algorithm, V2(S) is computed by the formula:
Here, Ti is the score of the i-th template and Numi is the number of times comment data matching the i-th template occur.
Priority Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610212917.7A CN105844424A (en) | 2016-05-30 | 2016-05-30 | Product quality problem discovery and risk assessment method based on network comments |
CN202110934697.XA CN113837531A (en) | 2016-05-30 | 2016-05-30 | Product quality problem finding and risk assessment method based on network comments |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610212917.7A CN105844424A (en) | 2016-05-30 | 2016-05-30 | Product quality problem discovery and risk assessment method based on network comments |
Related Child Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN202110934697.XA Division CN113837531A (en) | 2016-05-30 | 2016-05-30 | Product quality problem finding and risk assessment method based on network comments |
Publications (1)
Publication Number | Publication Date |
---|---|
CN105844424A (en) | 2016-08-10 |
Family
ID=56596842
Family Applications (2)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110934697.XA Pending CN113837531A (en) | 2016-05-30 | 2016-05-30 | Product quality problem finding and risk assessment method based on network comments |
CN201610212917.7A Pending CN105844424A (en) | 2016-05-30 | 2016-05-30 | Product quality problem discovery and risk assessment method based on network comments |
Family Applications Before (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN202110934697.XA Pending CN113837531A (en) | 2016-05-30 | 2016-05-30 | Product quality problem finding and risk assessment method based on network comments |
Country Status (1)
Country | Link |
---|---|
CN (2) | CN113837531A (en) |
Citations (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103399916A (en) * | 2013-07-31 | 2013-11-20 | 清华大学 | Internet comment and opinion mining method and system on basis of product features |
CN103646088A (en) * | 2013-12-13 | 2014-03-19 | 合肥工业大学 | Product comment fine-grained emotional element extraction method based on CRFs and SVM |
US20150186790A1 (en) * | 2013-12-31 | 2015-07-02 | Soshoma Inc. | Systems and Methods for Automatic Understanding of Consumer Evaluations of Product Attributes from Consumer-Generated Reviews |
CN104794154A (en) * | 2015-03-11 | 2015-07-22 | 南通天呈医流互联网技术有限公司 | O2O service quality evaluation model for medical apparatus based on text mining |
CN105205699A (en) * | 2015-09-17 | 2015-12-30 | 北京众荟信息技术有限公司 | User label and hotel label matching method and device based on hotel comments |
Family Cites Families (5)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US20080249764A1 (en) * | 2007-03-01 | 2008-10-09 | Microsoft Corporation | Smart Sentiment Classifier for Product Reviews |
CN102890707A (en) * | 2012-08-28 | 2013-01-23 | 华南理工大学 | System for mining emotional tendencies of brief network comments based on conditional random field |
CN103455562A (en) * | 2013-08-13 | 2013-12-18 | 西安建筑科技大学 | Text orientation analysis method and product review orientation discriminator on basis of same |
CN103544242B (en) * | 2013-09-29 | 2017-02-15 | 广东工业大学 | Microblog-oriented emotion entity searching system |
CN105354183A (en) * | 2015-10-19 | 2016-02-24 | Tcl集团股份有限公司 | Analytic method, apparatus and system for internet comments of household electrical appliance products |
Cited By (29)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107767156A (en) * | 2016-08-17 | 2018-03-06 | 百度在线网络技术(北京)有限公司 | A kind of information input method, apparatus and system |
CN106294885A (en) * | 2016-10-09 | 2017-01-04 | 华东师范大学 | A kind of data collection towards isomery webpage and mask method |
CN106649260A (en) * | 2016-10-19 | 2017-05-10 | 中国计量大学 | Product feature structure tree construction method based on comment text mining |
CN106649260B (en) * | 2016-10-19 | 2022-01-25 | 中国计量大学 | Product characteristic structure tree construction method based on comment text mining |
CN106570631B (en) * | 2016-10-28 | 2021-01-01 | 南京邮电大学 | P2P platform-oriented operation risk assessment method and system |
CN106570631A (en) * | 2016-10-28 | 2017-04-19 | 南京邮电大学 | Method and system of facing P2P platform operation risk estimation |
CN107133214A (en) * | 2017-05-05 | 2017-09-05 | 中国计量大学 | A kind of product demand preference profiles based on comment information are excavated and its method for evaluating quality |
CN107169091A (en) * | 2017-05-12 | 2017-09-15 | 北京奇艺世纪科技有限公司 | A kind of data analysing method and device |
CN107977798A (en) * | 2017-12-21 | 2018-05-01 | 中国计量大学 | A kind of risk evaluating method of e-commerce product quality |
CN107977798B (en) * | 2017-12-21 | 2023-09-12 | 中国计量大学 | Risk assessment method for quality of electronic commerce product |
CN108256078A (en) * | 2018-01-18 | 2018-07-06 | 北京百度网讯科技有限公司 | Information acquisition method and device |
CN108256078B (en) * | 2018-01-18 | 2019-07-12 | 北京百度网讯科技有限公司 | Information acquisition method and device |
CN108733748B (en) * | 2018-04-04 | 2022-01-14 | 浙江大学城市学院 | Cross-border product quality risk fuzzy prediction method based on commodity comment public sentiment |
CN108733748A (en) * | 2018-04-04 | 2018-11-02 | 浙江大学城市学院 | A kind of cross-border product quality risk fuzzy prediction method based on comment on commodity public sentiment |
CN109145097A (en) * | 2018-06-11 | 2019-01-04 | 人民法院信息技术服务中心 | A kind of judgement document's classification method based on information extraction |
CN109857838B (en) * | 2019-02-12 | 2021-01-26 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN109857838A (en) * | 2019-02-12 | 2019-06-07 | 北京字节跳动网络技术有限公司 | Method and apparatus for generating information |
CN110008311B (en) * | 2019-04-04 | 2020-11-24 | 北京邮电大学 | Product information safety risk monitoring method based on semantic analysis |
CN110008311A (en) * | 2019-04-04 | 2019-07-12 | 北京邮电大学 | A kind of product information security risk monitoring method based on semantic analysis |
CN110135694A (en) * | 2019-04-12 | 2019-08-16 | 深圳壹账通智能科技有限公司 | Product risks appraisal procedure, device, computer equipment and storage medium |
CN110704581A (en) * | 2019-09-11 | 2020-01-17 | 阿里巴巴集团控股有限公司 | Computer-executed text emotion analysis method and device |
CN110704581B (en) * | 2019-09-11 | 2024-03-08 | 创新先进技术有限公司 | Text emotion analysis method and device executed by computer |
CN111461876A (en) * | 2020-05-07 | 2020-07-28 | 赵玉洁 | E-commerce credit system management system and method based on big data |
CN111861507A (en) * | 2020-06-30 | 2020-10-30 | 成都数之联科技有限公司 | Identification method and system for analyzing risks of online catering stores in real time |
CN111861507B (en) * | 2020-06-30 | 2023-10-24 | 成都数之联科技股份有限公司 | Identification method and system for real-time analysis of risks of network restaurant shops |
CN112182165A (en) * | 2020-10-28 | 2021-01-05 | 杭州电子科技大学 | New product quality planning method based on online comments |
CN112182165B (en) * | 2020-10-28 | 2022-05-20 | 杭州电子科技大学 | New product quality planning method based on online comments |
CN117150025A (en) * | 2023-10-31 | 2023-12-01 | 湖南锦鳞智能科技有限公司 | Intelligent data service identification system |
CN117150025B (en) * | 2023-10-31 | 2024-01-26 | 湖南锦鳞智能科技有限公司 | Intelligent data service identification system |
Also Published As
Publication number | Publication date |
---|---|
CN113837531A (en) | 2021-12-24 |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
 | C06 | Publication | |
 | PB01 | Publication | |
 | C10 | Entry into substantive examination | |
 | SE01 | Entry into force of request for substantive examination | |
 | CB02 | Change of applicant information | Address after: 310018 258 Xiyuan street, Xiasha, Hangzhou, Zhejiang; Applicant after: China Jiliang University; Address before: 310018 258 Xiyuan street, Xiasha, Hangzhou, Zhejiang; Applicant before: China Jiliang University |
 | RJ01 | Rejection of invention patent application after publication | Application publication date: 20160810 |