CN105844424A - Product quality problem discovery and risk assessment method based on network comments

Info

Publication number: CN105844424A
Application number: CN201610212917.7A
Authority: CN
Inventors: 徐新胜; 朱凡凡; 林静
Original assignee: China Jiliang University
Current assignee: China Jiliang University
Priority date: 2016-05-30
Filing date: 2016-05-30
Publication date: 2016-08-10
Also published as: CN113837531A

Abstract

The invention provides a method for discovering product quality problems and assessing quality risk based on network comments. The method comprises three steps: (1) data acquisition, in which a web crawler fetches web pages related to a designated product, and the comment data on those pages is extracted and stored in a database; (2) quality feature word extraction, in which the comment text is pre-processed and a conditional random field model is used to extract quality feature words from the comment data; and (3) quality problem discovery and risk assessment, in which product quality problems are counted on the basis of quality problem description templates, and each aspect of product quality is assessed with a risk assessment algorithm. With this method, the quality problems reported by users can be discovered quickly and effectively, and quality risks arising during product use can be monitored in real time.

Description

Product quality problem discovery and risk assessment method based on network comments

Technical field:

The invention belongs to the field of product quality management, and in particular relates to a product quality problem discovery and risk assessment method based on network comments.

Background technology:

Product quality is the lifeblood of an enterprise: it displays the enterprise's overall quality and embodies its comprehensive strength. Traditional quality control methods focus only on quality management in the production process, so quality management ends when the product leaves the factory. With the rise of total quality management, the scope of quality management has expanded to the stage in which the user uses the product. Enterprises now seek to discover the quality problems that arise during use and to feed these problems back to the design and production departments, thereby improving product quality and the user experience.

At present, enterprises mainly collect the product quality problems that arise during use through their after-sales service departments. Many large manufacturing enterprises set up after-sales service points across the country, collect through them the quality problems that users encounter in use, and feed these problems back to the design and production departments to guide product quality improvement. However, because of limits on funds, manpower, and material resources, the coverage of after-sales service points is limited, and some enterprises have not set up any at all. The traditional approach of collecting product quality problems through the after-sales service department therefore cannot fully meet the needs of enterprises.

With the development of the Internet, more and more users post their evaluations of a product as comments on network platforms such as forums and e-commerce sites, and these comments often imply product quality problems that the users found in use. Making effective use of these comments to mine the quality problems that arise during product use compensates for the incomplete information gathered by after-sales service departments.

Summary of the invention:

The main object of the invention is to provide a product quality problem discovery and risk assessment method based on network comments, as a supplement to traditional quality management methods.

A product quality problem discovery and risk assessment method based on network comments comprises the following steps:

Step 1, data acquisition: use a web crawler to fetch the forum, e-commerce, and other web pages related to the designated product, then extract the comment data from the web pages and save it to a database;

Step 2, quality feature word extraction: first pre-process the comment text in three steps (word segmentation and part-of-speech tagging, syntactic analysis, and sentiment word tagging) and define a feature template; then train a conditional random field model; finally use the conditional random field model to extract quality feature words from the comment data;

Step 3, quality problem discovery and risk assessment: first define quality problem description templates and, based on these templates, count the quality problems related to each quality feature word; then define a risk assessment algorithm based on quality feature words and use it to calculate the risk assessment value of each quality feature word.

In the above method, in step 1, when fetching the forum, e-commerce, and other web pages related to the designated product, the similarity between the product name and the web page title is calculated by the following formula:

where Z is the normalization factor; α_k is the position parameter, with 0 < α_k ≤ 1; and P_k is the single similarity, which takes the value 0 or 1.

In the above method, in step 2, a sentiment word dictionary is used for sentiment word tagging. In the sentiment word dictionary, the sentiment orientation of a word falls into one of three classes: positive, negative, and neutral, denoted P, N, and M respectively. For positive and negative sentiment words, sentiment intensity is divided into five levels, 1, 3, 5, 7, and 9, where 9 is the strongest and 1 the weakest. For neutral sentiment words, the sentiment intensity is 0.

In the above method, in step 3, the quality problem description templates fall into two broad classes: the first class consists of a quality feature word and a sentiment word, and the second class consists of the negation word "not" and a quality feature word.

In the above method, in step 3, the risk assessment algorithm relies on the sentiment word dictionary and a degree adverb dictionary. In the degree adverb dictionary, words are divided into four classes by sentiment intensity: "extremely", "very", "relatively", and "slightly", with corresponding sentiment intensity values of 4, 3, 2, and 1.

In the above method, in step 3, the formula of the risk assessment algorithm is as follows:

V(S) = V₁(S) + V₂(S)

where V₁(S) is the risk assessment value of S in the comment data that matches the first class of quality problem description templates, and V₂(S) is the risk assessment value of S in the comment data that matches the second class of quality problem description templates.

In the above formula, V₁(S) is computed as:

$$
\begin{aligned}
V_1(S) &= V_P(S) - V_N(S) + V_M(S) \\
&= \sum_{k=1}^{a}\left[\mathrm{Score}(P_{Sk}) + \mathrm{Score}(PA_{Sk})\right] - \sum_{l=1}^{b}\left[\mathrm{Score}(N_{Sl}) + \mathrm{Score}(NA_{Sl})\right] \\
&\quad + \sum_{i=1}^{c}\frac{1}{T_i}\left\{\sum_{j=1}^{P_i}\left[\mathrm{Score}(P_{Sij}) + \mathrm{Score}(PA_{Sij})\right] - \sum_{j=1}^{N_i}\left[\mathrm{Score}(N_{Sij}) + \mathrm{Score}(NA_{Sij})\right]\right\}
\end{aligned}
$$

where V_P(S), V_N(S), and V_M(S) denote the positive, negative, and neutral risk assessment values of quality feature word S, respectively; a, b, and c denote the numbers of positive, negative, and neutral sentiment words modifying S, respectively; Score(P_Sk) is the sentiment intensity of the k-th positive sentiment word modifying S; Score(PA_Sk) is the sentiment intensity of the degree adverb of the k-th positive sentiment word modifying S; Score(N_Sl) is the sentiment intensity of the l-th negative sentiment word modifying S; P_i is the number of positive sentiment words in the comment that contains the i-th neutral sentiment word modifying S; N_i is the number of negative sentiment words in that comment; T_i = P_i + N_i is a normalization factor; and Score(P_Sij) is the sentiment intensity of the j-th positive sentiment word in the comment that contains the i-th neutral sentiment word modifying S.

In the above formula, V₂(S) is computed as:

$$V_2(S) = \sum_{i=4}^{6} T_i \times \mathrm{Num}_i$$

where T_i here denotes the score of the i-th template, and Num_i is the number of occurrences of comment data matching the i-th template.

The invention can automatically fetch user comment data related to a designated product from the network, discover the product's quality problems from that data, and then assess the risk of each aspect of product quality. With the method of the invention, an enterprise can discover the product quality problems reported by users more quickly and effectively, and can monitor quality risk during product use in real time.

Brief description of the drawings:

Fig. 1 is the overall flow chart of the invention.

Fig. 2 is the data acquisition flow chart of the invention.

Fig. 3 is the quality feature word extraction flow chart of the invention.

Fig. 4 is an example of dependency analysis in the invention.

Fig. 5 is an example of the training text for quality feature word extraction.

Fig. 6 is the feature template for quality feature word extraction.

Detailed description of the invention:

The invention is further described below with reference to the accompanying drawings.

The invention takes user comments on network platforms such as forums and e-commerce sites as its object of study. Its purpose is to mine product quality problems from network comments and to assess quality risk.

The product quality problem discovery and risk assessment method based on network comments comprises three steps: data acquisition, quality feature word extraction, and quality problem discovery and risk assessment, as shown in Fig. 1. The three steps are described in detail below.

Step 1, data acquisition: use a web crawler to fetch the forum, e-commerce, and other web pages related to the designated product, then extract the comment data from the web pages and save it to a database.

The data acquisition flow is shown in Fig. 2. First, the Baidu search interface is called to search for the designated product, yielding a specified number of search results pages, each containing 13 search results. Each search results page is then processed according to the following steps:

Step S101: extract the title of the j-th search result on the i-th search results page.

Step S102: compute the title similarity. Formula (1) is used to compute the similarity between the title and the product name, denoted Sim(title, product name), where 0 ≤ Sim(title, product name) ≤ 1. If the similarity is greater than or equal to 0.8, continue to the next step; otherwise, increment j by 1 and return to step S101.

where Z is the normalization factor,

α_k is the position parameter,

and P_k is the single similarity,

In formulas (1), (2), (3), and (4), m is the number of words in the product name, n is the number of words in the title, "title(k+l-1)" denotes the (k+l-1)-th word of the title, and "product name(l)" denotes the l-th word of the product name.

Step S103: extract the URL of the j-th search result on the i-th search results page.

Step S104: match the URL. Based on the URL of the j-th search result, determine whether the result belongs to a forum or e-commerce website. If so, continue to the next step; otherwise, increment j by 1 and return to step S101.

Step S105: web page fetching and information extraction. Different types of web pages require different fetching and extraction strategies, so a separate fetching and extraction template must be defined for each website. Fig. 2 shows templates for Zhongguancun (ZOL), Pacific (PConline), Yesky (www.yesky.com), Jingdong (JD.com), Suning, and No. 1 Store (Yihaodian). The number of templates is not limited and can be extended.

Step S106: termination check. After all search results on the i-th search results page have been processed, if more than 10 of the 13 search results on that page satisfy the title similarity condition, set i = i + 1 and j = 1, return to step S101, and continue with the next search results page; otherwise, data acquisition ends.
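The control flow of steps S101 to S106 can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the `similarity` and `is_target_site` callables, the `example.com` host names, and the toy similarity function are hypothetical stand-ins (in particular, the toy function does not reproduce formula (1)). Only the 0.8 threshold and the "more than 10 matching results" continuation rule come from the text.

```python
from typing import Callable, Iterable, List, Tuple
from urllib.parse import urlparse

SIM_THRESHOLD = 0.8   # step S102: keep results with similarity >= 0.8
MIN_MATCHES = 10      # step S106: continue only if more than 10 results matched

SearchResult = Tuple[str, str]  # (title, url)


def collect_target_urls(
    pages: Iterable[List[SearchResult]],
    similarity: Callable[[str], float],
    is_target_site: Callable[[str], bool],
) -> List[str]:
    """Return the URLs that would be handed to the fetch/extract templates (S105)."""
    selected: List[str] = []
    for page in pages:                        # i-th search results page
        matches = 0
        for title, url in page:               # j-th result: S101 (title), S103 (URL)
            if similarity(title) < SIM_THRESHOLD:   # S102: title not similar enough
                continue
            matches += 1
            if is_target_site(url):           # S104: forum or e-commerce site?
                selected.append(url)          # S105 would fetch and extract here
        if matches <= MIN_MATCHES:            # S106: too few relevant results, stop
            break
    return selected


# Hypothetical stand-ins, for illustration only.
TARGET_HOSTS = {"forum.example.com", "shop.example.com"}


def toy_similarity(title: str) -> float:
    # NOT formula (1): a placeholder that returns 1.0 on a substring match.
    return 1.0 if "phone x" in title.lower() else 0.0


def is_known_site(url: str) -> bool:
    return urlparse(url).netloc in TARGET_HOSTS
```

Passing the similarity function in as a parameter keeps the loop independent of how formula (1) is actually evaluated, so the stopping rule can be checked on its own.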

Step 2, quality feature word extraction: first pre-process the comment text in three steps (word segmentation and part-of-speech tagging, syntactic analysis, and sentiment word tagging) and define a feature template; then train a conditional random field model; finally use the conditional random field model to extract quality feature words from the comment data.

The invention provides a method for extracting quality feature words from comment data; its flow chart is shown in Fig. 3. First, three pre-processing steps are performed: word segmentation and part-of-speech tagging (S201), syntactic analysis (S202), and sentiment word tagging (S203), yielding structured text 201. Next, the results for 500 comments are drawn from text 201 by uniform sampling, and all quality feature words in these 500 comments are manually labeled "S", yielding training set 202. Then, with training set 202 and feature template 203 as input, the conditional random field algorithm is used to train conditional random field model 204, which characterizes the statistical relationship between quality feature words and five factors: the word, its part of speech, its dependency relation, its governing word, and the sentiment orientation of the governing word. Model 204 is then used to label the quality feature words in text 201 automatically, yielding result set 205. Finally, the words labeled "S" are extracted from the result set, yielding quality feature word set 206.

Steps S201 to S204 are described in detail below:

Step S201, word segmentation and part-of-speech tagging: the purpose of quality feature word extraction is to extract the words related to product quality from the comment data. Because written Chinese has no spaces between words, a computer cannot identify words directly, so word segmentation must be performed first. Segmentation divides a continuous piece of text into individual words. For example, given the sentence "the screen of the mobile phone is very blurry", the segmentation result is "mobile phone / of / screen / very / blurry". The words that describe quality problems follow certain statistical regularities in part of speech: for example, most quality feature words are nouns, while the probability that an adverb is a quality feature word is almost zero. After segmentation, therefore, part-of-speech tagging is performed to mark the part of speech of each word. The tagging result for the example above is "mobile phone/n of/u screen/n very/d blurry/a".

Step S202, dependency analysis: the theoretical basis of dependency analysis is dependency grammar. This grammar holds that the predicate verb of a sentence is the center that governs the other constituents and is itself governed by no other constituent; every governed constituent is subordinate to its governor through some dependency relation. Dependency grammar thus directly describes the relations between words. For the example "mobile phone/n of/u screen/n very/d blurry/a", the result of dependency analysis is shown in Fig. 4. In the result, a dependency relation holds directly between two words, which form a dependency pair: one is the governing word and the other is the dependent. The dependency relation is represented by a directed arc, called a dependency arc, which points from the governing word to the dependent. Each dependency arc carries a label, called the relation type, which indicates what kind of dependency relation holds between the two words of the pair. In this example, "screen" is the quality feature word. Fig. 4 shows that the governing word of "screen" is "blurry", and the dependency relation between "screen" and "blurry" is "SBV", that is, a subject-predicate relation.

Step S203, sentiment word tagging: steps S201 and S202 yield four items for each word: the word itself, its part of speech, its dependency relation, and its governing word. For the example "the screen of the mobile phone is very blurry", the result is the first five rows of the table in Fig. 5, where each row is one record and each record contains the four fields word, part of speech, dependency relation, and governing word. Sentiment word tagging is based on a sentiment dictionary, which contains commonly used sentiment words such as "blurry", "high", and "good". The object of sentiment tagging is the governing word: the sentiment dictionary is used to mark whether the governing word is a sentiment word. If it is, it is labeled "Y"; if it is not, it is labeled "N". After sentiment tagging, the result shown in Fig. 5 is obtained.
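The record layout and the Y/N tagging of step S203 can be illustrated with a short Python sketch. It assumes the records have already been produced by a segmenter and dependency parser. Only the "screen / n / SBV / blurry" row and the three sample dictionary words come from the text; the other relation labels (`ATT`, `RAD`, `ADV`, `HED`) and the `ROOT` placeholder are assumptions, since Fig. 4 and Fig. 5 are not reproduced here.

```python
from typing import List, NamedTuple, Set, Tuple


class Record(NamedTuple):
    word: str
    pos: str        # part-of-speech tag
    relation: str   # dependency relation to the governing word
    governor: str   # governing word


def tag_governor_sentiment(records: List[Record], lexicon: Set[str]) -> List[Tuple[str, ...]]:
    """Append "Y" if the governing word is in the sentiment dictionary, else "N"."""
    return [(*r, "Y" if r.governor in lexicon else "N") for r in records]


# Sample sentiment dictionary words named in the text.
SENTIMENT_WORDS = {"blurry", "high", "good"}

# "the screen of the mobile phone is very blurry"
RECORDS = [
    Record("mobile phone", "n", "ATT", "screen"),    # relation label assumed
    Record("of", "u", "RAD", "mobile phone"),        # relation label assumed
    Record("screen", "n", "SBV", "blurry"),          # from the text
    Record("very", "d", "ADV", "blurry"),            # relation label assumed
    Record("blurry", "a", "HED", "ROOT"),            # relation label assumed
]

TAGGED = tag_governor_sentiment(RECORDS, SENTIMENT_WORDS)

if __name__ == "__main__":
    for row in TAGGED:
        print("\t".join(row))
```

Each output row carries the five factors (word, part of speech, dependency relation, governing word, sentiment flag of the governing word) that the feature template in step S204 draws on.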

Step S204, quality feature word extraction based on the conditional random field model: extraction consists of two stages, training and processing. In the training stage, the results for 500 comments are first drawn from text 201 by uniform sampling, and all quality feature words in these 500 comments are manually labeled "S", yielding training set 202. Next, taking into account five factors (the word, its part of speech, its dependency relation, its governing word, and the sentiment orientation of the governing word), the feature template shown in Fig. 6 is defined. Then, with training set 202 and feature template 203 as input, the conditional random field algorithm is used to train conditional random field model 204, which characterizes the statistical relationship between quality feature words and these five factors. In the processing stage, the trained model 204 is used to label the quality feature words in text 201 automatically, yielding result set 205; the words labeled "S" are then extracted from the result set, yielding quality feature word set 206.

Step 3, quality problem discovery and risk assessment: first define quality problem description templates and, based on these templates, count the quality problems related to each quality feature word; then define a risk assessment algorithm based on quality feature words and use it to calculate the risk assessment value of each quality feature word.

When users describe quality problems, their language habits differ, so the same quality problem may be described in many forms. Based on an analysis of a large amount of comment data, the invention abstracts templates that cover most quality problem descriptions. The quality problem description templates fall into two broad classes. The first class consists of a quality feature word and a sentiment word, for example "the screen is blurry", where "screen" is the quality feature word and "blurry" is the sentiment word. The second class consists of the negation word "not" and a quality feature word, for example "cannot read the address book", which contains the negation word, and where "address book" is the quality feature word. A more detailed classification of the templates is given in Table 1, where templates 1, 2, and 3 belong to the first class and templates 4, 5, and 6 belong to the second class.

Table 1: Detailed classification of quality problem description templates

No.	Quality problem description template	Example
1	Quality feature word + sentiment word	The screen is blurry
2	Quality feature word + degree adverb + sentiment word	The pixels are very low
3	Quality feature word + sentiment word + degree adverb	The system is very bad
4	Verb + negation word + quality feature word	Cannot read the address book
5	Quality feature word + verb + negation word	Taking photos does not work
6	Quality feature word + negation word + verb	The compass cannot be used
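The six templates in Table 1 can be read as fixed sequences of word roles. The Python sketch below illustrates that reading and counts matched problems per quality feature word. It is a simplified sketch: it assumes each phrase has already been isolated and that every word carries a role label, and the role codes `S`, `E`, `D`, `V`, and `NEG` are hypothetical names introduced here, not notation from the patent.

```python
from collections import Counter
from typing import Dict, List, Optional, Sequence, Tuple

# Role codes (hypothetical): S = quality feature word, E = sentiment word,
# D = degree adverb, V = verb, NEG = negation word.
TEMPLATES: Dict[Tuple[str, ...], int] = {
    ("S", "E"): 1,
    ("S", "D", "E"): 2,
    ("S", "E", "D"): 3,
    ("V", "NEG", "S"): 4,
    ("S", "V", "NEG"): 5,
    ("S", "NEG", "V"): 6,
}

Phrase = List[Tuple[str, str]]  # list of (word, role)


def match_template(roles: Sequence[str]) -> Optional[int]:
    """Return the Table 1 template number for a role sequence, or None."""
    return TEMPLATES.get(tuple(roles))


def template_class(number: int) -> int:
    """Templates 1-3 form the first class; templates 4-6 form the second."""
    return 1 if number in (1, 2, 3) else 2


def count_problems(phrases: List[Phrase]) -> Counter:
    """Count matched phrases per (quality feature word, template number)."""
    counts: Counter = Counter()
    for phrase in phrases:
        number = match_template([role for _, role in phrase])
        if number is None:
            continue  # phrase fits no template
        feature = next(word for word, role in phrase if role == "S")
        counts[(feature, number)] += 1
    return counts


PHRASES: List[Phrase] = [
    [("screen", "S"), ("blurry", "E")],
    [("screen", "S"), ("blurry", "E")],
    [("pixels", "S"), ("very", "D"), ("low", "E")],
    [("compass", "S"), ("not", "NEG"), ("use", "V")],
    [("nice", "E")],  # no quality feature word, so no template applies
]
COUNTS = count_problems(PHRASES)
```

The per-template counts for templates 4 to 6 are the kind of quantity that the V₂(S) formula later multiplies by a template score.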

The risk assessment algorithm is described below.

The risk assessment algorithm is based on a sentiment word dictionary and a degree adverb dictionary. The sentiment word dictionary used by the invention is based on the sentiment vocabulary ontology of Dalian University of Technology: a subset of its words was selected, some new Internet slang was added, and the sentiment classification of the words was redone. In the sentiment word dictionary of the invention, the sentiment orientation of a word falls into one of three classes: positive, negative, and neutral, denoted P, N, and M respectively. For positive and negative sentiment words, sentiment intensity is divided into five levels, 1, 3, 5, 7, and 9, where 9 is the strongest and 1 the weakest. For neutral sentiment words, the sentiment intensity is 0. The degree adverb dictionary used by the invention is based on the HowNet degree-level word set: a subset of its words was selected, and some commonly used degree adverbs were added. This degree adverb dictionary divides words into four classes by sentiment intensity: "extremely", "very", "relatively", and "slightly", with corresponding sentiment intensity values of 4, 3, 2, and 1.
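The two dictionaries described above can be represented as simple lookup tables. The following Python sketch shows one possible layout. The polarity codes, the five intensity levels, and the four degree classes with scores 4, 3, 2, and 1 come from the text; every individual entry and its intensity are hypothetical examples, and returning 0 for an unknown adverb is an assumption made here.

```python
from typing import Dict, Tuple

# word -> (polarity, intensity); polarity is "P", "N", or "M".
# All entries below are hypothetical examples.
SENTIMENT_LEXICON: Dict[str, Tuple[str, int]] = {
    "good": ("P", 5),
    "excellent": ("P", 9),
    "blurry": ("N", 5),
    "low": ("N", 3),
    "ordinary": ("M", 0),
}

# Degree classes and their intensity values, as given in the text.
DEGREE_CLASS_SCORE: Dict[str, int] = {
    "extremely": 4,
    "very": 3,
    "relatively": 2,
    "slightly": 1,
}

# adverb -> degree class (hypothetical assignments).
DEGREE_ADVERBS: Dict[str, str] = {
    "extremely": "extremely",
    "too": "extremely",
    "very": "very",
    "fairly": "relatively",
    "a bit": "slightly",
}

POLAR_INTENSITIES = {1, 3, 5, 7, 9}


def is_valid_entry(polarity: str, intensity: int) -> bool:
    """Check an entry against the intensity rules stated in the text."""
    if polarity in ("P", "N"):
        return intensity in POLAR_INTENSITIES
    return polarity == "M" and intensity == 0


def degree_score(adverb: str) -> int:
    """Intensity of a degree adverb; 0 if it is not in the dictionary (assumption)."""
    degree_class = DEGREE_ADVERBS.get(adverb)
    return DEGREE_CLASS_SCORE[degree_class] if degree_class else 0
```

Validating entries when the dictionary is loaded is a cheap way to catch an intensity outside the five permitted levels before it reaches the risk calculation.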

Given a quality feature word S, its risk assessment value is denoted V(S) and is computed as follows:

V(S) = V₁(S) + V₂(S) (5)

where V₁(S) is the risk assessment value of S in the comment data that matches the first class of quality problem description templates. It is computed as follows:

The risk assessment of a quality feature word is divided into three kinds: positive risk assessment, negative risk assessment, and neutral risk assessment.

Positive risk assessment: if a comment matches a first-class quality problem description template and the sentiment word modifying the feature word is positive, positive risk assessment is performed. First find the sentiment word modifying the feature word, then determine whether a template containing a degree adverb is matched. If so, the risk assessment is "sentiment intensity of the sentiment word + sentiment intensity of the degree adverb"; if not, the risk assessment is "sentiment intensity of the sentiment word".

Negative risk assessment: if a comment matches a first-class quality problem description template and the sentiment word modifying the feature word is negative, negative risk assessment is performed. First find the sentiment word modifying the feature word, then determine whether a template containing a degree adverb is matched. If so, the risk assessment is "sentiment intensity of the sentiment word + sentiment intensity of the degree adverb"; if not, the risk assessment is "sentiment intensity of the sentiment word".

Neutral risk assessment: if a comment matches a first-class quality problem description template and the nearest sentiment word modifying the feature word is neutral, neutral risk assessment is performed. In this case, the risk assessment of the feature word equals the risk assessment of the comment, which is the difference between the comment's positive risk assessment and its negative risk assessment.

V₁(S) is computed as follows:

$$
\begin{aligned}
V_1(S) &= V_P(S) - V_N(S) + V_M(S) \\
&= \sum_{k=1}^{a}\left[\mathrm{Score}(P_{Sk}) + \mathrm{Score}(PA_{Sk})\right] - \sum_{l=1}^{b}\left[\mathrm{Score}(N_{Sl}) + \mathrm{Score}(NA_{Sl})\right] \\
&\quad + \sum_{i=1}^{c}\frac{1}{T_i}\left\{\sum_{j=1}^{P_i}\left[\mathrm{Score}(P_{Sij}) + \mathrm{Score}(PA_{Sij})\right] - \sum_{j=1}^{N_i}\left[\mathrm{Score}(N_{Sij}) + \mathrm{Score}(NA_{Sij})\right]\right\}
\end{aligned}
\qquad (6)
$$

where T_i is a normalization factor:

T_i = P_i + N_i (7)

In formulas (6) and (7), V_P(S), V_N(S), and V_M(S) denote the positive, negative, and neutral risk assessment values of quality feature word S, respectively; a, b, and c denote the numbers of positive, negative, and neutral sentiment words modifying S, respectively; Score(P_Sk) is the sentiment intensity of the k-th positive sentiment word modifying S; Score(PA_Sk) is the sentiment intensity of the degree adverb of the k-th positive sentiment word modifying S; Score(N_Sl) is the sentiment intensity of the l-th negative sentiment word modifying S; P_i is the number of positive sentiment words in the comment that contains the i-th neutral sentiment word modifying S; N_i is the number of negative sentiment words in that comment; and Score(P_Sij) is the sentiment intensity of the j-th positive sentiment word in the comment that contains the i-th neutral sentiment word modifying S.

V₂(S) is the risk assessment value of S in the comment data that matches the second class of quality problem description templates. It is computed as follows:

$$V_2(S) = \sum_{i=4}^{6} T_i \times \mathrm{Num}_i$$

where T_i here denotes the score of the i-th template and Num_i is the number of occurrences of comment data matching the i-th template; i takes the values 4, 5, and 6, corresponding to templates 4, 5, and 6.
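The formulas for V(S), V₁(S), and V₂(S) can be traced with a small worked example in Python. This is a sketch under stated assumptions: the data layout (pairs of sentiment-word intensity and degree-adverb intensity), the sample intensities, and in particular the template scores for templates 4 to 6 are hypothetical, since the text does not give concrete template score values. Skipping a neutral comment that contains no positive or negative sentiment words (T_i = 0) is also an assumption, made to avoid division by zero.

```python
from typing import Dict, List, Tuple

# (sentiment word intensity, degree adverb intensity or 0 if no adverb)
Pair = Tuple[int, int]
# One neutral-case comment: (positive pairs, negative pairs) found in that comment
NeutralComment = Tuple[List[Pair], List[Pair]]


def pair_sum(pairs: List[Pair]) -> int:
    """Sum of Score(word) + Score(adverb) over all pairs."""
    return sum(word + adverb for word, adverb in pairs)


def v1(positive: List[Pair], negative: List[Pair], neutral: List[NeutralComment]) -> float:
    """First-class value: V1(S) = V_P(S) - V_N(S) + V_M(S), formula (6)."""
    v_p = pair_sum(positive)
    v_n = pair_sum(negative)
    v_m = 0.0
    for pos, neg in neutral:
        t_i = len(pos) + len(neg)          # formula (7): T_i = P_i + N_i
        if t_i == 0:
            continue                       # assumption: nothing to normalize
        v_m += (pair_sum(pos) - pair_sum(neg)) / t_i
    return v_p - v_n + v_m


def v2(template_scores: Dict[int, float], template_counts: Dict[int, int]) -> float:
    """Second-class value: sum over templates 4..6 of T_i * Num_i."""
    return sum(template_scores[i] * template_counts.get(i, 0) for i in (4, 5, 6))


def risk_value(positive, negative, neutral, template_scores, template_counts) -> float:
    """Formula (5): V(S) = V1(S) + V2(S)."""
    return v1(positive, negative, neutral) + v2(template_scores, template_counts)


# Hypothetical data for one quality feature word S.
POSITIVE = [(5, 3)]                        # one positive word (5) with "very" (3)
NEGATIVE = [(7, 0), (3, 1)]                # two negative words
NEUTRAL = [([(5, 0)], [(3, 0), (1, 0)])]   # one neutral-case comment
TEMPLATE_SCORES = {4: -2.0, 5: -2.0, 6: -2.0}   # hypothetical template scores
TEMPLATE_COUNTS = {4: 1, 6: 2}                  # Num_4 = 1, Num_5 = 0, Num_6 = 2
```

With these inputs, V_P(S) = 8, V_N(S) = 11, and V_M(S) = (5 − 4)/3, so V₁(S) = −8/3; V₂(S) = −6 under the hypothetical template scores; and V(S) = −26/3.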

Claims

1. A product quality problem discovery and risk assessment method based on network comments, characterized by comprising:

Step 1, data acquisition: use a web crawler to fetch the forum, e-commerce, and other web pages related to the designated product, then extract the comment data from the web pages and save it to a database;

Step 2, quality feature word extraction: first pre-process the comment text in three steps (word segmentation and part-of-speech tagging, syntactic analysis, and sentiment word tagging) and define a feature template; then train a conditional random field model; finally use the conditional random field model to extract quality feature words from the comment data;

Step 3, quality problem discovery and risk assessment: first define quality problem description templates and, based on these templates, count the quality problems related to each quality feature word; then define a risk assessment algorithm based on quality feature words and use it to calculate the risk assessment value of each quality feature word.

2. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 1, when fetching the forum, e-commerce, and other web pages related to the designated product, the similarity between the product name and the web page title is calculated by the following formula:

3. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 2, a sentiment word dictionary is used for sentiment word tagging.

4. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 3, characterized in that: in the sentiment word dictionary, the sentiment orientation of a word falls into one of three classes: positive, negative, and neutral, denoted P, N, and M respectively; for positive and negative sentiment words, sentiment intensity is divided into five levels, 1, 3, 5, 7, and 9, where 9 is the strongest and 1 the weakest; for neutral sentiment words, the sentiment intensity is 0.

5. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 3, the quality problem description templates fall into two broad classes: the first class consists of a quality feature word and a sentiment word, and the second class consists of the negation word "not" and a quality feature word.

6. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 3, the risk assessment algorithm relies on a sentiment word dictionary and a degree adverb dictionary.

7. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 6, characterized in that: in the degree adverb dictionary, words are divided into four classes by sentiment intensity: "extremely", "very", "relatively", and "slightly", with corresponding sentiment intensity values of 4, 3, 2, and 1.

8. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 3, the formula of the risk assessment algorithm is:

V(S) = V₁(S) + V₂(S)

9. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 8, characterized in that: in the formula of the risk assessment algorithm, V₁(S) is computed as:

$$
\begin{aligned}
V_1(S) &= V_P(S) - V_N(S) + V_M(S) \\
&= \sum_{k=1}^{a}\left[\mathrm{Score}(P_{Sk}) + \mathrm{Score}(PA_{Sk})\right] - \sum_{l=1}^{b}\left[\mathrm{Score}(N_{Sl}) + \mathrm{Score}(NA_{Sl})\right] \\
&\quad + \sum_{i=1}^{c}\frac{1}{T_i}\left\{\sum_{j=1}^{P_i}\left[\mathrm{Score}(P_{Sij}) + \mathrm{Score}(PA_{Sij})\right] - \sum_{j=1}^{N_i}\left[\mathrm{Score}(N_{Sij}) + \mathrm{Score}(NA_{Sij})\right]\right\}
\end{aligned}
$$

where V_P(S), V_N(S), and V_M(S) denote the positive, negative, and neutral risk assessment values of quality feature word S, respectively; a, b, and c denote the numbers of positive, negative, and neutral sentiment words modifying S, respectively; Score(P_Sk) is the sentiment intensity of the k-th positive sentiment word modifying S; Score(PA_Sk) is the sentiment intensity of the degree adverb of the k-th positive sentiment word modifying S; Score(N_Sl) is the sentiment intensity of the l-th negative sentiment word modifying S; P_i is the number of positive sentiment words in the comment that contains the i-th neutral sentiment word modifying S; N_i is the number of negative sentiment words in that comment; and Score(P_Sij) is the sentiment intensity of the j-th positive sentiment word in the comment that contains the i-th neutral sentiment word modifying S.

10. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 8, characterized in that: in the formula of the risk assessment algorithm, V₂(S) is computed as:

$$V_2(S) = \sum_{i=4}^{6} T_i \times \mathrm{Num}_i$$