CN105844424A - Product quality problem discovery and risk assessment method based on network comments - Google Patents

Product quality problem discovery and risk assessment method based on network comments

Publication number
CN105844424A
Authority
CN
China
Prior art keywords
word
risk assessment
emotion
comment
product quality
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201610212917.7A
Other languages
Chinese (zh)
Inventor
徐新胜
朱凡凡
林静
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
China Jiliang University
Original Assignee
China Jiliang University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by China Jiliang University
Priority to CN201610212917.7A (CN105844424A)
Priority to CN202110934697.XA (CN113837531A)
Publication of CN105844424A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0639 Performance analysis of employees; Performance analysis of enterprise or organisation operations
    • G06Q10/06395 Quality analysis or management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/951 Indexing; Web crawling techniques
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/953 Querying, e.g. by the use of web search engines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 Details of database functions independent of the retrieved data types
    • G06F16/95 Retrieval from the web
    • G06F16/955 Retrieval from the web using information identifiers, e.g. uniform resource locators [URL]
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/205 Parsing
    • G06F40/211 Syntactic parsing, e.g. based on context-free grammar [CFG] or unification grammars
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/284 Lexical analysis, e.g. tokenisation or collocates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F40/00 Handling natural language data
    • G06F40/20 Natural language analysis
    • G06F40/279 Recognition of textual entities
    • G06F40/289 Phrasal analysis, e.g. finite state techniques or chunking
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
    • G06Q10/063 Operations research, analysis or management
    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
    • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
    • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
    • Y02P90/30 Computing systems specially adapted for manufacturing

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Business, Economics & Management (AREA)
  • Human Resources & Organizations (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Databases & Information Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Strategic Management (AREA)
  • Economics (AREA)
  • Health & Medical Sciences (AREA)
  • General Health & Medical Sciences (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Data Mining & Analysis (AREA)
  • Audiology, Speech & Language Pathology (AREA)
  • Marketing (AREA)
  • Educational Administration (AREA)
  • Development Economics (AREA)
  • Tourism & Hospitality (AREA)
  • General Business, Economics & Management (AREA)
  • Computational Linguistics (AREA)
  • Artificial Intelligence (AREA)
  • Quality & Reliability (AREA)
  • Game Theory and Decision Science (AREA)
  • Operations Research (AREA)
  • Manufacturing & Machinery (AREA)
  • Primary Health Care (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)
  • Machine Translation (AREA)

Abstract

The invention provides a product quality problem discovery and risk assessment method based on network comments. The method comprises three steps. Step 1, data acquisition: a web crawler fetches the web pages related to a designated product, and the comment data on those pages is extracted and stored in a database. Step 2, quality characteristic word extraction: the comment texts are pre-processed, and a conditional random field model is used to extract quality characteristic words from the comment data. Step 3, quality problem discovery and risk assessment: product quality problems are counted on the basis of quality problem description templates, and each aspect of product quality is assessed with a risk assessment algorithm. With this method, the quality problems reported by users can be discovered quickly and effectively, and quality risks arising during product use can be monitored in real time.

Description

Product quality problem discovery and risk assessment method based on network comments
Technical field:
The invention belongs to the field of product quality management, and in particular relates to a product quality problem discovery and risk assessment method based on network comments.
Background art:
Product quality is the lifeblood of an enterprise: it reflects the enterprise's overall quality and its comprehensive strength. Traditional quality control methods focus only on quality management during production, so that quality management ends when the product leaves the factory. With the rise of total quality management, the scope of quality management has expanded to the stage in which the product is in use. Enterprises now try to discover the quality problems that arise during use and feed them back to the design and production departments, so as to improve product quality and the user experience.
At present, enterprises mainly collect the product quality problems that arise during use through their after-sales service departments. Many large manufacturers set up after-sales service points across the country, collect the quality problems that users encounter through these points, and feed the problems back to the design and production departments to guide quality improvement. However, because of limits on funds, manpower and material resources, the coverage of after-sales service points is limited, and some enterprises have no after-sales service points at all. Collecting quality problems through the after-sales service department alone therefore cannot fully meet the needs of enterprises.
With the development of the Internet, more and more users publish their evaluations of products as comments on network platforms such as forums and e-commerce sites, and these comments often contain the quality problems that users found during use. Making effective use of these comments to mine the quality problems of products in use compensates for the incomplete information collected by after-sales service departments.
Summary of the invention:
The main purpose of the present invention is to provide a product quality problem discovery and risk assessment method based on network comments, as a supplement to traditional quality management methods.
A product quality problem discovery and risk assessment method based on network comments comprises the following steps:
Step 1, data acquisition: a web crawler fetches web pages, such as forum and e-commerce pages, that are related to the designated product; the comment data in those pages is then extracted and saved to a database.
Step 2, quality characteristic word extraction: the comment text first undergoes three pre-processing steps (word segmentation and part-of-speech tagging, syntactic analysis, and emotion word tagging), and a feature template is drawn up; a conditional random field model is then trained; finally, the conditional random field model is used to extract quality characteristic words from the comment data.
Step 3, quality problem discovery and risk assessment: quality problem description templates are first defined, and the quality problems related to each quality characteristic word are counted on the basis of these templates; a risk assessment algorithm based on quality characteristic words is then defined and used to calculate the risk assessment value of each quality characteristic word.
In the above product quality problem discovery and risk assessment method based on network comments, in step 1, when web pages such as forum and e-commerce pages related to the designated product are fetched, the similarity between the product name and the web page title is calculated by a formula in which $Z$ is a normalization factor, $\alpha_k$ is a position parameter with $0 < \alpha_k \le 1$, and $P_k$ is a single similarity whose value is 0 or 1.
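The similarity formula itself is not reproduced in this text, so the following is only a minimal sketch of how such a score could be computed. It assumes the form Sim = (1/Z) Σ α_k P_k, where P_k is 1 when the product-name words equal the window of title words starting at position k (following the "title (k+l-1)" and "ProductName (l)" notation used in the detailed description). The default α_k = 1/k and Z = Σ α_k are placeholder assumptions chosen only so that 0 < α_k ≤ 1 and 0 ≤ Sim ≤ 1 hold; the product names in the usage note are hypothetical.

```python
from typing import Callable, Optional, Sequence


def single_similarity(title: Sequence[str], product: Sequence[str], k: int) -> int:
    """P_k: 1 if title words k..k+m-1 (1-based) equal the m product-name words, else 0."""
    m = len(product)
    return int(list(title[k - 1:k - 1 + m]) == list(product))


def title_similarity(
    title: Sequence[str],
    product: Sequence[str],
    alpha: Callable[[int], float] = lambda k: 1.0 / k,  # ASSUMPTION: placeholder position parameter in (0, 1]
    z: Optional[float] = None,                           # ASSUMPTION: defaults to the sum of all alpha_k
) -> float:
    """Assumed form Sim = (1/Z) * sum_k alpha_k * P_k over all window start positions k."""
    n, m = len(title), len(product)
    if m == 0 or n < m:
        return 0.0
    starts = range(1, n - m + 2)
    weights = [alpha(k) for k in starts]
    norm = sum(weights) if z is None else z
    score = sum(w * single_similarity(title, product, k) for k, w in zip(starts, weights))
    return score / norm
```

For example, `title_similarity(["acme", "x1"], ["acme", "x1"])` gives 1.0 for an exact match. Because α_k and Z are assumed here, values from this sketch are not comparable with the 0.8 threshold used in the method.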
In the above product quality problem discovery and risk assessment method based on network comments, in step 2, an emotion word dictionary is used for emotion word tagging. In the emotion word dictionary, the emotional orientation of a word falls into three classes, commendatory, derogatory and neutral, denoted P, N and M respectively. For commendatory and derogatory emotion words, emotion intensity is divided into five levels, 1, 3, 5, 7 and 9, where 9 is the strongest and 1 the weakest; for neutral emotion words, the emotion intensity is 0.
In the above product quality problem discovery and risk assessment method based on network comments, in step 3, the quality problem description templates fall into two main classes: the first class consists of a quality characteristic word and an emotion word, and the second class consists of a negation word ("not") and a quality characteristic word.
In the above product quality problem discovery and risk assessment method based on network comments, in step 3, the risk assessment algorithm relies on the emotion word dictionary and a degree adverb dictionary. In the degree adverb dictionary, words are divided by emotion intensity into four classes, "extremely", "very", "relatively" and "slightly", with corresponding emotion intensity values of 4, 3, 2 and 1.
In the above product quality problem discovery and risk assessment method based on network comments, in step 3, the formula of the risk assessment algorithm is:
$V(S) = V_1(S) + V_2(S)$
where $V_1(S)$ is the risk assessment value of S over the comment data that matches the first class of quality problem description templates, and $V_2(S)$ is the risk assessment value of S over the comment data that matches the second class of quality problem description templates.
The computing formula for $V_1(S)$ uses the following quantities. $V_P(S)$, $V_N(S)$ and $V_M(S)$ denote the commendatory, derogatory and neutral risk assessment values of the quality characteristic word S, respectively. $a$, $b$ and $c$ denote the numbers of commendatory, derogatory and neutral emotion words that modify S, respectively. $Score(PS_k)$ denotes the emotion intensity of the k-th commendatory emotion word that modifies S, $Score(PAS_k)$ denotes the emotion intensity of the degree adverb of the k-th commendatory emotion word that modifies S, and $Score(NS_l)$ denotes the emotion intensity of the l-th derogatory emotion word that modifies S. $P_i$ and $N_i$ denote the numbers of commendatory and derogatory emotion words, respectively, in the comment that contains the i-th neutral emotion word modifying S, and $Score(PS_{ij})$ denotes the emotion intensity of the j-th commendatory emotion word in that comment.
In the computing formula for $V_2(S)$, $T_i$ denotes the score of the i-th template, and $Num_i$ denotes the number of occurrences of comment data that matches the i-th template.
The present invention can automatically collect the user comment data related to a designated product on the network, discover the product's quality problems from that data, and then assess the risk of each aspect of product quality. With the method of the present invention, an enterprise can discover the product quality problems reported by users more quickly and effectively, and can monitor quality risk during product use in real time.
Brief description of the drawings:
Fig. 1 is the flow chart of the present invention.
Fig. 2 is the data acquisition flow chart of the present invention.
Fig. 3 is the flow chart of quality characteristic word extraction in the present invention.
Fig. 4 is an example of dependency analysis in the present invention.
Fig. 5 is an example of the training text for quality characteristic word extraction in the present invention.
Fig. 6 is the feature template for quality characteristic word extraction in the present invention.
Detailed description:
The present invention is further described below with reference to the accompanying drawings.
The present invention takes user comments on network platforms such as forums and e-commerce sites as its object of study. Its purpose is to mine product quality problems from network comments and to assess quality risk.
The product quality problem discovery and risk assessment method based on network comments comprises three steps, namely data acquisition, quality characteristic word extraction, and quality problem discovery and risk assessment, as shown in Fig. 1. The three steps are described in detail below.
Step 1, data acquisition: a web crawler fetches web pages, such as forum and e-commerce pages, that are related to the designated product; the comment data in those pages is then extracted and saved to a database.
The data acquisition flow is shown in Fig. 2. First, the Baidu search interface is called to search for the designated product, and a specified number of search result pages is obtained; each search result page contains 13 search results. Each search result page is then processed in the following steps:
Step S101: extract the title of the j-th search result on the i-th search result page.
Step S102: calculate the title similarity. Formula (1) is used to calculate the similarity between the title and the product name, denoted Sim(title, ProductName), with 0 ≤ Sim(title, ProductName) ≤ 1. If the similarity is greater than or equal to 0.8, continue to the next step; otherwise, increment j by 1 and return to step S101.
Here $Z$ is a normalization factor, $\alpha_k$ is a position parameter, and $P_k$ is a single similarity.
In formulas (1), (2), (3) and (4), m is the number of words in "ProductName", n is the number of words in "title", "title (k+l-1)" denotes the (k+l-1)-th word of the title, and "ProductName (l)" denotes the l-th word of the product name.
Step S103: extract the URL of the j-th search result on the i-th search result page.
Step S104: match the URL. Based on the URL of the j-th search result, determine whether the result belongs to a forum or an e-commerce website. If so, continue to the next step; otherwise, increment j by 1 and return to step S101.
Step S105: web page fetching and information extraction. Different types of web page require different fetching and extraction strategies, so a separate fetching and extraction template must be drawn up for each website. Fig. 2 shows templates for Zhongguancun Online, Pacific (PConline), Yesky (www.yesky.com), JD.com, Suning and Yihaodian (No. 1 Shop); the number of templates is not limited and can be extended.
Step S106: termination check. After all search results on the i-th search result page have been processed, if more than 10 of the 13 search results on page i satisfy the title similarity condition, set i = i + 1 and j = 1 and go to S101 to process the next search result page; otherwise, data acquisition ends.
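The control flow of steps S101 to S106 can be sketched as follows. This is an illustrative outline rather than the patented implementation: the four callables (`search_page`, `similarity`, `pick_template`, `scrape`) are hypothetical stand-ins for the search interface, formula (1), URL matching and the site-specific extraction templates, and `max_pages` stands for the specified number of search result pages. Only the 0.8 threshold and the "more than 10 results" rule come from the text.

```python
from typing import Callable, Iterable, List, Optional, Tuple

SIM_THRESHOLD = 0.8  # step S102: keep results whose title similarity is >= 0.8
MIN_RELEVANT = 10    # step S106: go on to the next page only if more than 10 results pass


def collect_comments(
    product: str,
    search_page: Callable[[str, int], Iterable[Tuple[str, str]]],  # hypothetical: (title, url) pairs of page i
    similarity: Callable[[str, str], float],                        # hypothetical: stands in for formula (1)
    pick_template: Callable[[str], Optional[str]],                  # hypothetical: template name, or None
    scrape: Callable[[str, str], List[str]],                        # hypothetical: comments from one page
    max_pages: int,
) -> List[str]:
    comments: List[str] = []
    for i in range(1, max_pages + 1):
        relevant = 0
        for title, url in search_page(product, i):          # S101 and S103
            if similarity(title, product) < SIM_THRESHOLD:   # S102
                continue
            relevant += 1
            template = pick_template(url)                    # S104: forum or e-commerce site?
            if template is None:
                continue
            comments.extend(scrape(url, template))           # S105
        if relevant <= MIN_RELEVANT:                         # S106: stop when too few titles match
            break
    return comments
```

Passing the collaborators in as functions keeps the loop independent of any particular search engine or site template, which matches the statement that the set of templates can be extended.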
Step 2, quality characteristic word extraction: the comment text first undergoes three pre-processing steps (word segmentation and part-of-speech tagging, syntactic analysis, and emotion word tagging), and a feature template is drawn up; a conditional random field model is then trained; finally, the conditional random field model is used to extract quality characteristic words from the comment data.
The invention provides a method for extracting quality characteristic words from comment data; its flow chart is shown in Fig. 3. First, three pre-processing steps are carried out, namely word segmentation and part-of-speech tagging S201, syntactic analysis S202 and emotion word tagging S203, yielding structured text 201. Next, uniform sampling is used to take the results for 500 comments from text 201, and all quality characteristic words in these 500 comments are manually labeled "S", yielding training set 202. Then, with training set 202 and feature template 203 as input, the conditional random field algorithm is used to train a conditional random field model 204 that characterizes the statistical relationship between quality characteristic words and five factors: the word, its part of speech, its dependency relation, its governing word, and the emotional orientation of the governing word. Model 204 is then used to label the quality characteristic words in text 201 automatically, yielding result set 205. Finally, the words labeled "S" are extracted from the result set, yielding quality characteristic word set 206.
Steps S201 to S204 are described in detail below:
Step S201, word segmentation and part-of-speech tagging: the purpose of quality characteristic word extraction is to extract the words related to product quality from the comment data. Because written Chinese has no spaces between words, a computer cannot identify the words directly, so word segmentation must be carried out first. Word segmentation divides a continuous piece of text into individual words. For example, given the sentence "the screen of the mobile phone is very blurry" (written without spaces in Chinese), the segmentation result is the word sequence "mobile phone", "of", "screen", "very", "blurry". Words that describe quality problems follow certain statistical regularities in their part of speech: for instance, most quality characteristic words are nouns, and the probability that an adverb is a quality characteristic word is almost zero. Therefore, after segmentation, part-of-speech tagging is carried out to mark the part of speech of each word. The tagging result for the above example is "mobile phone/n of/u screen/n very/d blurry/a".
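To make the input and output of step S201 concrete, here is a toy sketch. It swaps in forward maximum matching over a five-entry lexicon, which is a simple stand-in and not the segmenter or tagger the invention actually uses (the text does not name one). The Chinese sentence is an assumed rendering of the translated example, and the lexicon and tags are illustrative.

```python
from typing import Dict, List, Tuple

# Toy lexicon (assumed): word -> part-of-speech tag (n noun, u particle, d adverb, a adjective).
TOY_LEXICON: Dict[str, str] = {"手机": "n", "的": "u", "屏幕": "n", "很": "d", "模糊": "a"}


def segment(text: str, lexicon: Dict[str, str], max_len: int = 4) -> List[str]:
    """Forward maximum matching: at each position take the longest lexicon word."""
    words: List[str] = []
    i = 0
    while i < len(text):
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in lexicon:  # fall back to a single character
                words.append(candidate)
                i += size
                break
    return words


def tag(words: List[str], lexicon: Dict[str, str]) -> List[Tuple[str, str]]:
    """Look up each word's tag; unknown words get the placeholder tag 'x'."""
    return [(w, lexicon.get(w, "x")) for w in words]


def render(tagged: List[Tuple[str, str]]) -> str:
    """Format as 'word/tag' pairs separated by spaces."""
    return " ".join(f"{w}/{t}" for w, t in tagged)
```

With this lexicon, `render(tag(segment("手机的屏幕很模糊", TOY_LEXICON), TOY_LEXICON))` returns `手机/n 的/u 屏幕/n 很/d 模糊/a`, the same word/tag layout as the example above.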
Step S202, dependency analysis: the theoretical basis of dependency analysis is dependency grammar. This grammar holds that the predicate verb of a sentence is the center that governs the other components and is itself not governed by any other component; every governed component is subordinate to its governor through some dependency relation. Dependency grammar thus directly describes the relations between words. Given the example "mobile phone/n of/u screen/n very/d blurry/a", the result of dependency analysis is shown in Fig. 4. In the result, a dependency relation holds directly between two words, which form a dependency pair: one word is the governing word and the other is the dependent. The dependency relation is represented by a directed arc, called a dependency arc, which points from the governing word to the dependent. Each dependency arc carries a label, called the relation type, which indicates what kind of dependency relation holds between the two words of the pair. In this example "screen" is the quality characteristic word. As Fig. 4 shows, the governing word of "screen" is "blurry", and the dependency relation between "screen" and "blurry" is "SBV", i.e. the subject-predicate relation.
Step S203, emotion word tagging: steps S201 and S202 yield four items, namely the word, its part of speech, its dependency relation and its governing word. For the example "the screen of the mobile phone is very blurry", the result is the first five rows of the table in Fig. 5, where each row is one record and each record includes the four fields word, part of speech, dependency relation and governing word. Emotion word tagging is based on an emotion dictionary, which contains commonly used emotion words such as "blurry", "high" and "good". The object of emotion tagging is the governing word: the emotion dictionary is used to determine whether the governing word is an emotion word. If it is, it is tagged "Y"; if not, it is tagged "N". After emotion tagging, the result shown in Fig. 5 is obtained.
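The record structure built up in steps S201 to S203 can be sketched as below. The field names are assumptions. Of the sample rows in the usage note, only the "screen" row (relation SBV, governing word "blurry") is confirmed by the text; the "mobile phone" row uses an illustrative relation label, since Fig. 4 is not reproduced here.

```python
from dataclasses import dataclass, replace
from typing import Iterable, List, Set


@dataclass(frozen=True)
class TokenRecord:
    word: str
    pos: str                         # part-of-speech tag from S201
    relation: str                    # dependency relation type from S202
    governor: str                    # governing word from S202
    governor_is_emotion: str = "N"   # "Y"/"N" flag filled in by S203


# Example emotion words named in the text.
EMOTION_WORDS: Set[str] = {"blurry", "high", "good"}


def tag_emotion(records: Iterable[TokenRecord], emotion_words: Set[str]) -> List[TokenRecord]:
    """S203: mark whether each record's governing word is in the emotion dictionary."""
    return [
        replace(r, governor_is_emotion="Y" if r.governor in emotion_words else "N")
        for r in records
    ]
```

For instance, `tag_emotion([TokenRecord("screen", "n", "SBV", "blurry")], EMOTION_WORDS)` marks the record "Y" because its governing word "blurry" is an emotion word, while a record such as `TokenRecord("mobile phone", "n", "ATT", "screen")` stays "N".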
Step S204, quality characteristic word extraction based on the conditional random field model: this extraction consists of two stages, training and processing. In the training stage, uniform sampling is first used to take the results for 500 comments from text 201, and all quality characteristic words in these 500 comments are manually labeled "S", yielding training set 202. Next, five factors are taken into account, namely the word, its part of speech, its dependency relation, its governing word, and the emotional orientation of the governing word, and the feature template shown in Fig. 6 is drawn up. Then, with training set 202 and feature template 203 as input, the conditional random field algorithm is used to train a conditional random field model 204 that characterizes the statistical relationship between quality characteristic words and these five factors. In the processing stage, the trained model 204 is used to label the quality characteristic words in text 201 automatically, yielding result set 205; the words labeled "S" are then extracted from the result set, yielding quality characteristic word set 206.
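The text does not name a CRF toolkit or reproduce the training text of Fig. 5, so the sketch below only illustrates one plausible data layout: a column format with one token per line and a blank line between sentences, as used by several CRF toolkits. The tab separator, the "O" label for non-quality words, and the "HED"/"ROOT" values in the usage note are assumptions; only the "S" label and the five factor columns come from the text.

```python
from typing import List, Sequence, Tuple

# Columns: word, part of speech, dependency relation, governing word,
# governing-word emotion flag (Y/N), label ("S" = quality characteristic word).
Row = Tuple[str, str, str, str, str, str]


def to_column_text(sentences: Sequence[Sequence[Row]]) -> str:
    """Serialize labeled sentences: one token per line, blank line between sentences."""
    blocks = ["\n".join("\t".join(row) for row in sentence) for sentence in sentences]
    return "\n\n".join(blocks) + "\n"


def quality_words(rows: Sequence[Row]) -> List[str]:
    """Processing stage: keep the words whose label column is 'S'."""
    return [row[0] for row in rows if row[-1] == "S"]
```

Given the rows `("screen", "n", "SBV", "blurry", "Y", "S")` and `("blurry", "a", "HED", "ROOT", "N", "O")`, `quality_words` returns `["screen"]`, which corresponds to building word set 206 from result set 205.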
Step 3, quality problem discovery and risk assessment: quality problem description templates are first defined, and the quality problems related to each quality characteristic word are counted on the basis of these templates; a risk assessment algorithm based on quality characteristic words is then defined and used to calculate the risk assessment value of each quality characteristic word.
When users describe quality problems, their language habits differ, so the same quality problem may be described in many different forms. On the basis of an analysis of a large amount of comment data, the present invention abstracts templates that cover most quality problem descriptions. The quality problem description templates fall into two main classes. The first class consists of a quality characteristic word and an emotion word, for example "the screen is blurry", where "screen" is the quality characteristic word and "blurry" is the emotion word. The second class consists of a negation word ("not") and a quality characteristic word, for example "cannot read the address book", which contains a negation word and in which "address book" is the quality characteristic word. The more detailed classification of the quality problem description templates is shown in Table 1, where templates 1, 2 and 3 belong to the first class and templates 4, 5 and 6 belong to the second class.
Table 1: Detailed classification of quality problem description templates
No.   Quality problem description template                          Example
1     Quality characteristic word + emotion word                    The screen is blurry
2     Quality characteristic word + degree adverb + emotion word    The pixel count is very low
3     Quality characteristic word + emotion word + degree adverb    The system is very bad
4     Verb + negation particle + quality characteristic word        Cannot read the address book
5     Quality characteristic word + verb + negation particle        Photo-taking cannot be used
6     Quality characteristic word + negation particle + verb        The compass cannot be used
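Table 1 can be read as six fixed sequences of word roles. The sketch below encodes them and matches a role sequence against them; the role codes (Q, E, D, V, NEG) and the exact-match rule are assumptions made for illustration, since the text does not say how adjacency or intervening words are handled.

```python
from typing import Dict, Optional, Sequence, Tuple

# Role codes (assumed): Q quality characteristic word, E emotion word,
# D degree adverb, V verb, NEG negation particle.
TEMPLATES: Dict[int, Tuple[str, ...]] = {
    1: ("Q", "E"),
    2: ("Q", "D", "E"),
    3: ("Q", "E", "D"),
    4: ("V", "NEG", "Q"),
    5: ("Q", "V", "NEG"),
    6: ("Q", "NEG", "V"),
}


def match_template(roles: Sequence[str]) -> Optional[int]:
    """Return the Table 1 template number whose role sequence equals `roles`, or None."""
    for number, pattern in TEMPLATES.items():
        if tuple(roles) == pattern:
            return number
    return None


def template_class(number: int) -> int:
    """Templates 1-3 form the first class, templates 4-6 the second class."""
    return 1 if number in (1, 2, 3) else 2
```

For example, the roles of "pixel count / very / low" are `["Q", "D", "E"]`, which matches template 2 in the first class, while `["Q", "NEG", "V"]` ("compass / not / use") matches template 6 in the second class.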
The risk assessment algorithm is described below.
The risk assessment algorithm is based on an emotion word dictionary and a degree adverb dictionary. The emotion word dictionary used by the present invention is based on the affective lexicon ontology of Dalian University of Technology: a subset of the words in that ontology was selected, some new Internet slang terms were added, and the emotion classification of the words was redefined. In the emotion word dictionary of the present invention, the emotional orientation of a word falls into three classes, commendatory, derogatory and neutral, denoted P, N and M respectively. For commendatory and derogatory emotion words, emotion intensity is divided into five levels, 1, 3, 5, 7 and 9, where 9 is the strongest and 1 the weakest; for neutral emotion words, the emotion intensity is 0. The degree adverb dictionary used by the present invention is based on the HowNet degree-level word set: a subset of its words was selected, and some commonly used degree adverbs were added. This degree adverb dictionary divides words by emotion intensity into four classes, "extremely", "very", "relatively" and "slightly", with corresponding emotion intensity values of 4, 3, 2 and 1.
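The two dictionaries can be represented as simple lookup tables. The sketch below enforces the value ranges stated above (P, N or M; intensity in 1, 3, 5, 7, 9, or 0 for neutral) and uses the four degree classes with scores 4 to 1. The individual dictionary entries and their intensities are hypothetical, because the text does not list the dictionary contents.

```python
from dataclasses import dataclass
from typing import Dict, Optional

POLARITIES = {"P", "N", "M"}      # commendatory, derogatory, neutral
POLAR_LEVELS = {1, 3, 5, 7, 9}    # allowed intensities for P and N words
DEGREE_SCORES: Dict[str, int] = {"extremely": 4, "very": 3, "relatively": 2, "slightly": 1}


@dataclass(frozen=True)
class EmotionEntry:
    polarity: str
    intensity: int

    def __post_init__(self) -> None:
        if self.polarity not in POLARITIES:
            raise ValueError(f"unknown polarity: {self.polarity}")
        allowed = {0} if self.polarity == "M" else POLAR_LEVELS
        if self.intensity not in allowed:
            raise ValueError(f"invalid intensity {self.intensity} for polarity {self.polarity}")


# Hypothetical entries, for illustration only.
EMOTION_DICT: Dict[str, EmotionEntry] = {
    "blurry": EmotionEntry("N", 5),
    "good": EmotionEntry("P", 5),
    "average": EmotionEntry("M", 0),
}


def modifier_score(emotion_word: str, degree_class: Optional[str] = None) -> int:
    """Emotion intensity, plus the degree adverb intensity when a degree adverb is present."""
    score = EMOTION_DICT[emotion_word].intensity
    if degree_class is not None:
        score += DEGREE_SCORES[degree_class]
    return score
```

With these hypothetical entries, "very blurry" scores 5 + 3 = 8 via `modifier_score("blurry", "very")`.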
Given a quality characteristic word S, its risk assessment value is denoted V(S), and V(S) is calculated as follows:
$V(S) = V_1(S) + V_2(S)$ (5)
where $V_1(S)$ is the risk assessment value of S over the comment data that matches the first class of quality problem description templates. It is calculated as follows:
The risk assessment of a quality characteristic word is divided into three kinds: commendatory risk assessment, derogatory risk assessment and neutral risk assessment.
Commendatory risk assessment: in a comment, if a first-class quality problem description template is matched and the emotion word that modifies the characteristic word is commendatory, the commendatory risk assessment is calculated. First the emotion word that modifies the characteristic word is found, and then it is determined whether a template containing a degree adverb is matched. If so, the risk assessment is "emotion intensity of the emotion word + emotion intensity of the degree adverb"; if not, the risk assessment is "emotion intensity of the emotion word".
Derogatory risk assessment: in a comment, if a first-class quality problem description template is matched and the emotion word that modifies the characteristic word is derogatory, the derogatory risk assessment is calculated. First the emotion word that modifies the characteristic word is found, and then it is determined whether a template containing a degree adverb is matched. If so, the risk assessment is "emotion intensity of the emotion word + emotion intensity of the degree adverb"; if not, the risk assessment is "emotion intensity of the emotion word".
Neutral risk assessment: in a comment, if a first-class quality problem description template is matched and the nearest emotion word that modifies the characteristic word is neutral, the neutral risk assessment is calculated. In this case, the risk assessment of the characteristic word equals the risk assessment of the whole comment, which is the difference between the comment's commendatory risk assessment and its derogatory risk assessment.
The computing formula of $V_1(S)$ is formula (6), in which $T_i$ is a normalization factor:
$T_i = P_i + N_i$ (7)
In formulas (6) and (7), $V_P(S)$, $V_N(S)$ and $V_M(S)$ denote the commendatory, derogatory and neutral risk assessment values of the quality characteristic word S, respectively. $a$, $b$ and $c$ denote the numbers of commendatory, derogatory and neutral emotion words that modify S, respectively. $Score(PS_k)$ denotes the emotion intensity of the k-th commendatory emotion word that modifies S, $Score(PAS_k)$ denotes the emotion intensity of the degree adverb of the k-th commendatory emotion word that modifies S, and $Score(NS_l)$ denotes the emotion intensity of the l-th derogatory emotion word that modifies S. $P_i$ and $N_i$ denote the numbers of commendatory and derogatory emotion words, respectively, in the comment that contains the i-th neutral emotion word modifying S, and $Score(PS_{ij})$ denotes the emotion intensity of the j-th commendatory emotion word in that comment.
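Formula (6) is not reproduced in this text, so the sketch below only computes the three components described in prose and does not combine them into $V_1(S)$. It assumes that $V_P(S)$ and $V_N(S)$ are plain sums over the modifying emotion words, and that each neutral occurrence contributes the comment's commendatory-minus-derogatory difference divided by $T_i = P_i + N_i$; the position of the normalization factor, the zero contribution when $T_i = 0$, and the input layout are all assumptions.

```python
from typing import Sequence, Tuple

# One modifier: (emotion word intensity, degree adverb intensity or 0 if absent).
Modifier = Tuple[int, int]
# One comment containing a neutral modifier of S:
# (intensities of its commendatory words, intensities of its derogatory words).
CommentScores = Tuple[Sequence[int], Sequence[int]]


def polar_component(modifiers: Sequence[Modifier]) -> int:
    """V_P(S) or V_N(S): ASSUMED to be the sum of (emotion intensity + degree intensity)."""
    return sum(emotion + degree for emotion, degree in modifiers)


def neutral_component(comments: Sequence[CommentScores]) -> float:
    """V_M(S): ASSUMED sum over comments of (commendatory - derogatory) / T_i, T_i = P_i + N_i."""
    total = 0.0
    for commendatory, derogatory in comments:
        t_i = len(commendatory) + len(derogatory)   # formula (7)
        if t_i:                                     # ASSUMPTION: no emotion words contributes 0
            total += (sum(commendatory) - sum(derogatory)) / t_i
    return total
```

As a worked example, two commendatory modifiers with scores (5, 3) and (7, 0) give `polar_component([(5, 3), (7, 0)]) == 15`. How the three components enter formula (6), including their signs, is left open here because the source formula is unavailable.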
$V_2(S)$ is the risk assessment value of S over the comment data that matches the second type of quality problem description template. It is computed as follows:

$$V_2(S) = \sum_{i=4}^{6} T_i \times Num_i$$

where $T_i$ is the score of the i-th template and $Num_i$ is the number of occurrences of comment data matching the i-th template; i takes the values 4, 5, and 6, corresponding to templates No. 4, 5, and 6, respectively.
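The template-count term can be sketched as a weighted sum over templates No. 4 to 6. The function name `v2` and the dictionary layout are hypothetical; the template scores are not given in this passage, so any values passed in are placeholders.

```python
from typing import Dict

SECOND_TYPE_TEMPLATES = (4, 5, 6)  # template numbers named in the text


def v2(template_scores: Dict[int, float],
       template_counts: Dict[int, int]) -> float:
    """Return the sum over i = 4..6 of T_i * Num_i for one feature word S.

    template_scores maps a template number to its score T_i;
    template_counts maps a template number to Num_i, the number of
    matching comments (treated as 0 when a template never matched).
    """
    return sum(template_scores[i] * template_counts.get(i, 0)
               for i in SECOND_TYPE_TEMPLATES)
```

A call might look like `v2({4: 1.0, 5: 2.0, 6: 3.0}, {4: 2, 6: 1})`, where the three scores are made-up placeholders.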
The present invention can automatically capture user comment data related to a specified product from the network, discover the product's quality problems from that data, and then perform risk assessment on each aspect of product quality. Using the method of the present invention, an enterprise can discover the product quality problems reported by users more quickly and effectively, and can monitor quality risk during product use in real time.

Claims (10)

1. A product quality problem discovery and risk assessment method based on network comments, characterized by comprising:
Step 1, data acquisition: a web crawler is used to capture web pages, such as forums and e-commerce sites, related to a specified product; the comment data in those pages is then extracted and saved to a database;
Step 2, quality feature word extraction: the comment text first undergoes three preprocessing steps, namely word segmentation with part-of-speech tagging, syntactic analysis, and emotion word tagging; feature templates are formulated; a conditional random field model is then trained; finally, the conditional random field model is used to extract quality feature words from the comment data;
Step 3, quality problem discovery and risk assessment: quality problem description templates are first proposed, and the quality problems related to each quality feature word are counted on the basis of these templates; a risk assessment algorithm based on quality feature words is then proposed and used to calculate the risk assessment value of each quality feature word.
2. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 1, when web pages such as forums and e-commerce sites related to the specified product are captured, the similarity between the product name and the web page title is calculated by the formula:
where Z is a normalization factor, $\alpha_k$ is a position parameter with $0 < \alpha_k \le 1$, and $P_k$ is a single similarity whose value is 0 or 1.
3. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 2, an emotion word dictionary is used to perform emotion word tagging.
4. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 3, characterized in that: in the emotion word dictionary, the sentiment orientation of a word falls into three classes, commendatory, derogatory, and neutral, denoted P, N, and M respectively; for commendatory and derogatory emotion words, the emotion intensity is divided into five grades, 1, 3, 5, 7, and 9, where 9 is the maximum intensity and 1 is the minimum; for neutral emotion words, the emotion intensity is 0.
5. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 3, the quality problem description templates are divided into two main types: the first type consists of a quality feature word and an emotion word, and the second type consists of the word "no" and a quality feature word.
6. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 3, the risk assessment algorithm is implemented using an emotion word dictionary and a degree adverb dictionary.
7. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 6, characterized in that: in the degree adverb dictionary, words are divided into four classes according to emotion intensity, namely "extremely", "very", "relatively", and "slightly", with corresponding emotion intensity values of 4, 3, 2, and 1.
8. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 1, characterized in that: in step 3, the formula of the risk assessment algorithm is:
$$V(S) = V_1(S) + V_2(S)$$
where $V_1(S)$ is the risk assessment value of S over the comment data that matches the first type of quality problem description template, and $V_2(S)$ is the risk assessment value of S over the comment data that matches the second type of quality problem description template.
9. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 8, characterized in that: in the formula of the risk assessment algorithm, $V_1(S)$ is computed as:
$$
\begin{aligned}
V_1(S) &= V_P(S) - V_N(S) + V_M(S) \\
&= \sum_{k=1}^{a}\left[\mathrm{Score}(PS_k) + \mathrm{Score}(PAS_k)\right] - \sum_{l=1}^{b}\left[\mathrm{Score}(NS_l) + \mathrm{Score}(NAS_l)\right] \\
&\quad + \sum_{i=1}^{c}\frac{1}{T_i}\left\{\sum_{j=1}^{P_i}\left[\mathrm{Score}(PS_{ij}) + \mathrm{Score}(PAS_{ij})\right] - \sum_{j=1}^{N_i}\left[\mathrm{Score}(NS_{ij}) + \mathrm{Score}(NAS_{ij})\right]\right\}
\end{aligned}
$$
where $V_P(S)$, $V_N(S)$, and $V_M(S)$ denote the commendatory, derogatory, and neutral risk assessment values of quality feature word S, respectively; a, b, and c denote the numbers of commendatory, derogatory, and neutral emotion words modifying feature word S, respectively; $\mathrm{Score}(PS_k)$ is the emotion intensity of the k-th commendatory emotion word modifying S, $\mathrm{Score}(PAS_k)$ is the emotion intensity of the degree adverb of that k-th commendatory emotion word, and $\mathrm{Score}(NS_l)$ is the emotion intensity of the l-th derogatory emotion word modifying S; $P_i$ is the number of commendatory emotion words in the comment containing the i-th neutral emotion word modifying S, $N_i$ is the number of derogatory emotion words in that comment, and $\mathrm{Score}(PS_{ij})$ is the emotion intensity of the j-th commendatory emotion word in that comment.
10. The product quality problem discovery and risk assessment method based on network comments as claimed in claim 8, characterized in that: in the formula of the risk assessment algorithm, $V_2(S)$ is computed as:
$$V_2(S) = \sum_{i=4}^{6} T_i \times Num_i$$
where $T_i$ is the score of the i-th template and $Num_i$ is the number of occurrences of comment data matching the i-th template.
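The dictionary constraints in claims 4 and 7 can be made concrete with a small Python sketch. The type `EmotionEntry` and the constant names are hypothetical and chosen only for this illustration; the claims fix the polarity codes, the intensity grades, and the four adverb classes, but not any particular data structure.

```python
from dataclasses import dataclass

# Intensity grades allowed for commendatory (P) and derogatory (N) words.
VALID_INTENSITIES = frozenset({1, 3, 5, 7, 9})

# The four degree adverb classes and their intensity values (claim 7).
DEGREE_ADVERB_INTENSITY = {
    "extremely": 4,
    "very": 3,
    "relatively": 2,
    "slightly": 1,
}


@dataclass(frozen=True)
class EmotionEntry:
    """One emotion word dictionary entry: polarity P, N, or M plus intensity."""
    polarity: str
    intensity: int

    def __post_init__(self) -> None:
        if self.polarity == "M":
            # Neutral emotion words always carry intensity 0.
            if self.intensity != 0:
                raise ValueError("neutral emotion words must have intensity 0")
        elif self.polarity in ("P", "N"):
            if self.intensity not in VALID_INTENSITIES:
                raise ValueError("intensity must be one of 1, 3, 5, 7, 9")
        else:
            raise ValueError("polarity must be 'P', 'N', or 'M'")
```

Validating entries when the dictionary is loaded means the scoring step can trust every intensity it looks up.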
CN201610212917.7A 2016-05-30 2016-05-30 Product quality problem discovery and risk assessment method based on network comments Pending CN105844424A (en)

Priority Applications (2)

Application Number Priority Date Filing Date Title
CN201610212917.7A CN105844424A (en) 2016-05-30 2016-05-30 Product quality problem discovery and risk assessment method based on network comments
CN202110934697.XA CN113837531A (en) 2016-05-30 2016-05-30 Product quality problem finding and risk assessment method based on network comments

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201610212917.7A CN105844424A (en) 2016-05-30 2016-05-30 Product quality problem discovery and risk assessment method based on network comments

Related Child Applications (1)

Application Number Title Priority Date Filing Date
CN202110934697.XA Division CN113837531A (en) 2016-05-30 2016-05-30 Product quality problem finding and risk assessment method based on network comments

Publications (1)

Publication Number Publication Date
CN105844424A true CN105844424A (en) 2016-08-10

Family

ID=56596842

Family Applications (2)

Application Number Title Priority Date Filing Date
CN202110934697.XA Pending CN113837531A (en) 2016-05-30 2016-05-30 Product quality problem finding and risk assessment method based on network comments
CN201610212917.7A Pending CN105844424A (en) 2016-05-30 2016-05-30 Product quality problem discovery and risk assessment method based on network comments

Family Applications Before (1)

Application Number Title Priority Date Filing Date
CN202110934697.XA Pending CN113837531A (en) 2016-05-30 2016-05-30 Product quality problem finding and risk assessment method based on network comments

Country Status (1)

Country Link
CN (2) CN113837531A (en)

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103399916A (en) * 2013-07-31 2013-11-20 清华大学 Internet comment and opinion mining method and system on basis of product features
CN103646088A (en) * 2013-12-13 2014-03-19 合肥工业大学 Product comment fine-grained emotional element extraction method based on CRFs and SVM
US20150186790A1 (en) * 2013-12-31 2015-07-02 Soshoma Inc. Systems and Methods for Automatic Understanding of Consumer Evaluations of Product Attributes from Consumer-Generated Reviews
CN104794154A (en) * 2015-03-11 2015-07-22 南通天呈医流互联网技术有限公司 O2O service quality evaluation model for medical apparatus based on text mining
CN105205699A (en) * 2015-09-17 2015-12-30 北京众荟信息技术有限公司 User label and hotel label matching method and device based on hotel comments

Family Cites Families (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20080249764A1 (en) * 2007-03-01 2008-10-09 Microsoft Corporation Smart Sentiment Classifier for Product Reviews
CN102890707A (en) * 2012-08-28 2013-01-23 华南理工大学 System for mining emotional tendencies of brief network comments based on conditional random field
CN103455562A (en) * 2013-08-13 2013-12-18 西安建筑科技大学 Text orientation analysis method and product review orientation discriminator on basis of same
CN103544242B (en) * 2013-09-29 2017-02-15 广东工业大学 Microblog-oriented emotion entity searching system
CN105354183A (en) * 2015-10-19 2016-02-24 Tcl集团股份有限公司 Analytic method, apparatus and system for internet comments of household electrical appliance products

Cited By (29)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107767156A (en) * 2016-08-17 2018-03-06 百度在线网络技术(北京)有限公司 A kind of information input method, apparatus and system
CN106294885A (en) * 2016-10-09 2017-01-04 华东师范大学 A kind of data collection towards isomery webpage and mask method
CN106649260A (en) * 2016-10-19 2017-05-10 中国计量大学 Product feature structure tree construction method based on comment text mining
CN106649260B (en) * 2016-10-19 2022-01-25 中国计量大学 Product characteristic structure tree construction method based on comment text mining
CN106570631B (en) * 2016-10-28 2021-01-01 南京邮电大学 P2P platform-oriented operation risk assessment method and system
CN106570631A (en) * 2016-10-28 2017-04-19 南京邮电大学 Method and system of facing P2P platform operation risk estimation
CN107133214A (en) * 2017-05-05 2017-09-05 中国计量大学 A kind of product demand preference profiles based on comment information are excavated and its method for evaluating quality
CN107169091A (en) * 2017-05-12 2017-09-15 北京奇艺世纪科技有限公司 A kind of data analysing method and device
CN107977798A (en) * 2017-12-21 2018-05-01 中国计量大学 A kind of risk evaluating method of e-commerce product quality
CN107977798B (en) * 2017-12-21 2023-09-12 中国计量大学 Risk assessment method for quality of electronic commerce product
CN108256078A (en) * 2018-01-18 2018-07-06 北京百度网讯科技有限公司 Information acquisition method and device
CN108256078B (en) * 2018-01-18 2019-07-12 北京百度网讯科技有限公司 Information acquisition method and device
CN108733748B (en) * 2018-04-04 2022-01-14 浙江大学城市学院 Cross-border product quality risk fuzzy prediction method based on commodity comment public sentiment
CN108733748A (en) * 2018-04-04 2018-11-02 浙江大学城市学院 A kind of cross-border product quality risk fuzzy prediction method based on comment on commodity public sentiment
CN109145097A (en) * 2018-06-11 2019-01-04 人民法院信息技术服务中心 A kind of judgement document's classification method based on information extraction
CN109857838B (en) * 2019-02-12 2021-01-26 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN109857838A (en) * 2019-02-12 2019-06-07 北京字节跳动网络技术有限公司 Method and apparatus for generating information
CN110008311B (en) * 2019-04-04 2020-11-24 北京邮电大学 Product information safety risk monitoring method based on semantic analysis
CN110008311A (en) * 2019-04-04 2019-07-12 北京邮电大学 A kind of product information security risk monitoring method based on semantic analysis
CN110135694A (en) * 2019-04-12 2019-08-16 深圳壹账通智能科技有限公司 Product risks appraisal procedure, device, computer equipment and storage medium
CN110704581A (en) * 2019-09-11 2020-01-17 阿里巴巴集团控股有限公司 Computer-executed text emotion analysis method and device
CN110704581B (en) * 2019-09-11 2024-03-08 创新先进技术有限公司 Text emotion analysis method and device executed by computer
CN111461876A (en) * 2020-05-07 2020-07-28 赵玉洁 E-commerce credit system management system and method based on big data
CN111861507A (en) * 2020-06-30 2020-10-30 成都数之联科技有限公司 Identification method and system for analyzing risks of online catering stores in real time
CN111861507B (en) * 2020-06-30 2023-10-24 成都数之联科技股份有限公司 Identification method and system for real-time analysis of risks of network restaurant shops
CN112182165A (en) * 2020-10-28 2021-01-05 杭州电子科技大学 New product quality planning method based on online comments
CN112182165B (en) * 2020-10-28 2022-05-20 杭州电子科技大学 New product quality planning method based on online comments
CN117150025A (en) * 2023-10-31 2023-12-01 湖南锦鳞智能科技有限公司 Intelligent data service identification system
CN117150025B (en) * 2023-10-31 2024-01-26 湖南锦鳞智能科技有限公司 Intelligent data service identification system

Also Published As

Publication number Publication date
CN113837531A (en) 2021-12-24

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
CB02 Change of applicant information

Address after: 310018 258 Xiyuan street, Xiasha, Hangzhou, Zhejiang

Applicant after: China Jiliang University

Address before: 310018 258 Xiyuan street, Xiasha, Hangzhou, Zhejiang

Applicant before: China Jiliang University

RJ01 Rejection of invention patent application after publication

Application publication date: 20160810

RJ01 Rejection of invention patent application after publication