CN110648173B

CN110648173B - Unsupervised abnormal commodity data detection method based on good evaluation and poor evaluation rates of commodities

Info

Publication number: CN110648173B
Application number: CN201910887119.8A
Authority: CN
Inventors: 刘静; 侯志鹏
Original assignee: Xidian University
Current assignee: Xidian University
Priority date: 2019-09-19
Filing date: 2019-09-19
Publication date: 2023-04-07
Anticipated expiration: 2039-09-19
Also published as: CN110648173A

Abstract

The invention discloses an unsupervised method for detecting abnormal commodity data based on the good-rating and poor-rating rates of commodities. It mainly addresses the low accuracy with which abnormal commodity data in online shopping malls is detected. The scheme is as follows. First, determine which type of abnormal commodity is to be detected. To detect abnormally high-scored commodities, calculate the good-rating rate of each commodity, then calculate its differential good-rating rate by applying a difference operator, and finally determine the abnormally high-scored commodity. To detect abnormally low-scored commodities, calculate the poor-rating rate of each commodity, then calculate its scaled poor-rating rate by applying a scaling operator, and finally determine the abnormally low-scored commodity. The invention provides two indexes and two operators for these two detection scenarios and detects abnormal commodities more accurately. It helps system maintainers find problematic commodities early and delete abnormal data in time, and it can be used to detect abnormal commodity data and keep online shopping mall systems stable.

Description

Unsupervised abnormal commodity data detection method based on good evaluation and poor evaluation rates of commodities

Technical Field

The invention belongs to the technical field of anomaly detection. In particular, it relates to a method for detecting abnormal commodity data that online malls can use to detect such data and keep their systems stable.

Background

With the rapid development of information technology and the Internet, more and more people choose to buy goods online. To increase the exposure and sales of their commodities, some merchants use cashback, rewards and similar incentives to induce users to give their commodities good reviews, that is, high scores. Some merchants even hire users to maliciously give competitors' commodities bad reviews, that is, low scores, in order to suppress them. Such abnormal commodity data has been found on well-known e-commerce and review websites. Examples include the Chinese online shopping website Taobao, Douban (a community website that provides recommendations and reviews of books, movies and music), and the foreign online shopping website eBay. Abnormal commodity data can seriously affect the stability of a system, degrade the user experience, and even drive users to abandon the system. It is therefore very important to detect abnormal commodity data promptly and effectively. Doing so lets system maintainers find problematic commodities as early as possible, delete the abnormal data in time, and keep the system stable.

The publication "Robust Collaborative Recommendation" by Robin Burke et al. (Recommender Systems Handbook, pp. 805-835, 2015) describes two classical and widely used methods for detecting abnormal data: the clustering-based KNN method and the decision-tree-based C4.5 method.

The clustering-based KNN method clusters the raw data directly, so that abnormal and normal data fall into different clusters, which completes the detection. It is unsupervised, needs no prior training, and is simple and effective. However, it uses the commodities' raw rating information directly and does not quantitatively analyze the highest and lowest scores each commodity receives. Its accuracy in detecting abnormal commodity data is therefore low.

The decision-tree-based C4.5 method builds a decision tree directly from the data to separate and detect abnormal data. It is more accurate than the clustering-based KNN method, but it is a supervised model: a certain amount of fake data must be constructed manually in advance to train it. Manually constructed data often differs greatly from real data and can hardly reproduce the complex situations found in a real system. This limits the use of the method in real systems.

Disclosure of Invention

The invention aims to provide an unsupervised method for detecting abnormal commodity data based on the good-rating and poor-rating rates of commodities. It overcomes two problems of the prior art. The first is low detection accuracy, caused by the lack of quantitative analysis of how commodities are rated. The second is the need to construct a certain amount of fake data manually in advance before detection can be performed.

The technical idea of the invention is as follows.

To detect abnormally high-scored commodities, a good-rating index is defined to quantify how often a commodity receives high scores. A difference operator is then defined to remove noise from the data and highlight the good-rating data of abnormally high-scored commodities, which improves detection accuracy.

To detect abnormally low-scored commodities, a poor-rating index is defined to quantify how often a commodity receives low scores. A scaling operator is then defined to compensate for the power-law distribution of the data and highlight the poor-rating data of abnormally low-scored commodities, which again improves detection accuracy.

The implementation steps are as follows:

(1) Input data:

From the users' rating records of commodities on an e-commerce website, extract the rating data of each commodity. Form the commodity set $O = \{o_1, o_2, \dots, o_i, \dots, o_m\}$ from all commodities in the extracted data, and form the user set $U = \{u_1, u_2, \dots, u_j, \dots, u_n\}$ from all users in the extracted data. Here $o_i$ denotes the $i$-th commodity, $i = 1, \dots, m$, and $m$ is the total number of commodities; $u_j$ denotes the $j$-th user, $j = 1, \dots, n$, and $n$ is the total number of users;

(2) Determine whether the detection targets abnormally high-scored commodities. If so, go to step (3). If not, the target is abnormally low-scored commodities, so skip to step (6);

(3) Calculate the good-rating rate of each commodity:

(3a) For each commodity $o_i$ in the commodity set $O$, count the number $r_i$ of users who have rated it;

(3b) For each commodity $o_i$ in $O$, calculate its good-rating rate $H_i$:

$$H_i = \frac{r_{i,\max}}{r_i}$$

where $r_{i,\max}$ is the number of ratings of $o_i$ equal to the highest score allowed by the system. For example, if the system allows scores from 1 to 5, $r_{i,\max}$ is the number of ratings of $o_i$ equal to 5;

(4) Calculate the differential good-rating rate of each commodity:

(4a) Sort the commodities' good-rating rates $H_i$ in descending order of the number of ratings $r_i$ that each commodity has received;

(4b) In this ordering by $r_i$, take the position of each commodity $o_i$ as the center, and select $l/2$ commodities before it and $l/2$ commodities after it. These form the neighbor commodity set $\Gamma_i = \{g_1, g_2, \dots, g_k, \dots, g_l\}$ of $o_i$, where $g_k$ denotes the $k$-th neighbor of $o_i$, $k = 1, \dots, l$, and $l$ is the total number of neighbors of $o_i$;

(4c) For each commodity, apply the difference operator to its good-rating rate to obtain the differential good-rating rate $D_i$:

where $H_k$ is the good-rating rate of the $k$-th neighbor of $o_i$;

(5) From the commodity set $O$, select the commodities whose number of ratings $r_i$ exceeds 1% of the total number of users $n$; these form the candidate set of abnormal commodities. Output the commodity $o_i$ with the largest differential good-rating rate $D_i$ in this candidate set as the detection result;

(6) Calculate the poor-rating rate of each commodity:

(6a) For each commodity $o_i$ in the commodity set $O$, count the number $r_i$ of users who have rated it;

(6b) For each commodity $o_i$ in $O$, calculate its poor-rating rate $C_i$:

$$C_i = \frac{r_{i,\min}}{r_i}$$

where $r_{i,\min}$ is the number of ratings of $o_i$ equal to the lowest score allowed by the system. For example, if the system allows scores from 1 to 5, $r_{i,\min}$ is the number of ratings of $o_i$ equal to 1;

(7) For each commodity, apply the scaling operator to its poor-rating rate to obtain the scaled poor-rating rate $S_i$:

where $\bar{r} = \frac{1}{m}\sum_{i=1}^{m} r_i$ is the average number of ratings $r_i$ per commodity in the commodity set $O$;

(8) From the commodity set $O$, select the commodities whose number of ratings $r_i$ exceeds 1% of the total number of users $n$; these form the candidate set of abnormal commodities. Output the commodity $o_i$ with the largest scaled poor-rating rate $S_i$ in this candidate set as the detection result.

Compared with the prior art, the invention has the following advantages:

First, the invention defines two statistical indexes, the good-rating rate and the poor-rating rate, which quantify how often a commodity receives high and low scores. Compared with numerical analysis performed directly on all of a commodity's scores, these two indexes reflect the differences in abnormal commodity data more directly, so abnormal commodities can be detected better.

Second, commodities with similar numbers of ratings have close good-rating rates, whereas the good-rating rate of an abnormally high-scored commodity differs greatly from those of commodities with similar numbers of ratings. Based on this distribution, the invention defines a difference operator. The operator smooths the noise in the good-rating rates, amplifies the differences between commodities' good-rating rates, and highlights the abnormal good-rating rates of abnormally high-scored commodities, further improving detection accuracy for such commodities.

Third, the poor-rating rates of commodities follow a power-law distribution. The invention therefore defines a scaling operator after which the poor-rating rates of commodities lie roughly on a common baseline. The scaled poor-rating rate of an abnormally low-scored commodity then forms a clear peak relative to normal commodities, further improving detection accuracy for abnormally low-scored commodities.

Fourth, the detection method is based on statistical indexes of the data and does not require data to be constructed manually in advance to train a model. It is therefore an unsupervised detection method with a wider range of application.

Drawings

FIG. 1 is a flow chart of an implementation of the present invention;

FIG. 2 is a simulation diagram comparing, for each commodity, the good-rating rate and the differential good-rating rate defined by the invention before and after malicious high scoring;

FIG. 3 is a simulation diagram comparing, for each commodity, the poor-rating rate and the scaled poor-rating rate defined by the invention before and after malicious low scoring;

FIG. 4 is a simulation diagram of the results of detecting abnormally high-scored commodities with the invention;

FIG. 5 is a simulation diagram of the results of detecting abnormally low-scored commodities with the invention.

Detailed Description of the Embodiments

The embodiments and effects of the invention are described in further detail below with reference to the accompanying drawings.

Referring to FIG. 1, the implementation steps of the invention are as follows:

Step 1, input data:

1.1 From the users' rating records of commodities on an e-commerce website, extract the specific ratings that users gave to each commodity on the website;

1.2 Construct the commodity set $O = \{o_1, o_2, \dots, o_i, \dots, o_m\}$ from all commodities in the extracted data, where $o_i$ denotes the $i$-th commodity, $i = 1, \dots, m$, and $m$ is the total number of commodities;

1.3 Construct the user set $U = \{u_1, u_2, \dots, u_j, \dots, u_n\}$ from all users in the extracted data, where $u_j$ denotes the $j$-th user, $j = 1, \dots, n$, and $n$ is the total number of users.
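Step 1 can be sketched in a few lines of Python. This sketch assumes the scoring records arrive as (user_id, item_id, score) tuples. The function name `build_sets` and the record layout are hypothetical and do not come from the source. The function collects $O$, $U$ and, for each commodity, the list of scores it received. $r_i$ is then simply the length of that list.

```python
from collections import defaultdict


def build_sets(records):
    """Step 1: build the commodity set O, the user set U and per-commodity scores.

    `records` is an iterable of (user_id, item_id, score) tuples; this layout
    is an assumption of the sketch, not something the method prescribes.
    """
    commodities, users = set(), set()
    scores_by_item = defaultdict(list)
    for user, item, score in records:
        users.add(user)
        commodities.add(item)
        scores_by_item[item].append(score)
    # r_i for commodity o_i is simply len(scores_by_item[o_i])
    return sorted(commodities), sorted(users), dict(scores_by_item)


records = [("u1", "o1", 5), ("u2", "o1", 1), ("u1", "o2", 3)]
O, U, scores_by_item = build_sets(records)
m, n = len(O), len(U)  # total number of commodities and of users
```

Because $O$ is built from the rating data itself, every commodity in it has at least one rating. The divisions by $r_i$ in the later steps are therefore always defined.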

Step 2, determine whether the detection targets abnormally high-scored commodities.

In general, abnormal commodity data detection falls into two cases: detecting abnormally high-scored commodities and detecting abnormally low-scored commodities. The type of detection is chosen according to actual requirements. If abnormally high-scored commodities are to be detected, execute step 3. Otherwise the target is abnormally low-scored commodities, and the method skips to step 6.

Step 3, calculate the good-rating rate of each commodity:

3.1 For each commodity $o_i$ in the commodity set $O$, count the number $r_i$ of users who have rated it;

3.2 For each commodity $o_i$ in $O$, calculate its good-rating rate $H_i$:

$$H_i = \frac{r_{i,\max}}{r_i}$$

where $r_{i,\max}$ is the number of ratings of $o_i$ equal to the highest score allowed by the system. For example, if the system allows scores from 1 to 5, $r_{i,\max}$ is the number of ratings of $o_i$ equal to 5.
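The sketch below computes step 3 under the reading $H_i = r_{i,\max}/r_i$, i.e. the share of a commodity's ratings that equal the system's highest score. The function `good_rating_rates` and its dictionary input (commodity mapped to its list of scores) are illustrative assumptions.

```python
def good_rating_rates(scores_by_item, max_score=5):
    """Step 3: good-rating rate H_i = r_i_max / r_i for every commodity.

    Every commodity in O has at least one rating, so r_i is never zero.
    """
    H = {}
    for item, scores in scores_by_item.items():
        r_i = len(scores)                                   # 3.1: users who rated o_i
        r_i_max = sum(1 for s in scores if s == max_score)  # ratings equal to the top score
        H[item] = r_i_max / r_i                             # 3.2
    return H


H = good_rating_rates({"o1": [5, 5, 1, 3], "o2": [4]})
print(H)  # {'o1': 0.5, 'o2': 0.0}
```

`max_score` is a parameter so that systems with a score range other than 1 to 5 can reuse the function unchanged.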

Step 4, calculate the differential good-rating rate of each commodity.

4.1 Sort the commodities' good-rating rates $H_i$ in descending order of the number of ratings $r_i$ that each commodity has received;

4.2 In this ordering by $r_i$, take the position of each commodity $o_i$ as the center, and select $l/2$ commodities before it and $l/2$ commodities after it. These form the neighbor commodity set $\Gamma_i = \{g_1, g_2, \dots, g_k, \dots, g_l\}$ of $o_i$, where $g_k$ denotes the $k$-th neighbor of $o_i$, $k = 1, \dots, l$, and $l$ is the total number of neighbors of $o_i$. In this embodiment, $l$ equals 1% of the number of users $n$;

4.3 Apply the difference operator to the good-rating rate $H_i$ of each commodity to obtain the differential good-rating rate $D_i$:

where $H_k$ is the good-rating rate of the $k$-th neighbor of $o_i$.
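Step 4 can be sketched as follows. The source does not reproduce the formula for $D_i$, so this sketch makes two assumptions. First, the difference operator subtracts the mean good-rating rate of the neighbor set $\Gamma_i$ from $H_i$. Second, the window is truncated where fewer than $l/2$ commodities exist on one side of the ranking. Both choices, and the name `differential_good_rates`, are hypothetical. The caller is also assumed to round $l$ (1% of $n$ in this embodiment) to an integer.

```python
def differential_good_rates(r, H, l):
    """Step 4: differential good-rating rate D_i for every commodity.

    r: dict mapping commodity -> number of ratings r_i
    H: dict mapping commodity -> good-rating rate H_i
    l: integer neighbourhood size; l // 2 commodities are taken on each side

    ASSUMPTIONS (not stated in the source): D_i = H_i - mean(H_k over Gamma_i),
    and the window is truncated at either end of the ranking.
    """
    # 4.1: rank commodities by number of ratings, largest first (stable for ties)
    order = sorted(r, key=lambda item: r[item], reverse=True)
    half = l // 2
    D = {}
    for pos, item in enumerate(order):
        # 4.2: neighbour set Gamma_i, up to l/2 commodities before and after o_i
        neighbours = order[max(0, pos - half):pos] + order[pos + 1:pos + 1 + half]
        if not neighbours:  # only possible when l < 2 or there is one commodity
            D[item] = 0.0
            continue
        mean_h = sum(H[g] for g in neighbours) / len(neighbours)
        D[item] = H[item] - mean_h  # 4.3: assumed difference operator
    return D


r = {"o1": 10, "o2": 9, "o3": 8}
H = {"o1": 0.2, "o2": 0.8, "o3": 0.2}
D = differential_good_rates(r, H, l=2)
print(max(D, key=D.get))  # o2
```

Comparing each commodity only with commodities of similar popularity is what lets `o2` stand out here. Its good-rating rate of 0.8 is unremarkable in absolute terms but far above its neighbors' rate of 0.2.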

Step 5, determine the abnormally high-scored commodity from the calculated differential good-rating rates.

From the commodity set $O$, select the commodities whose number of ratings $r_i$ exceeds 1% of the total number of users $n$; these form the candidate set of abnormal commodities. Output the commodity with the largest differential good-rating rate in this candidate set as the detection result.
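The selection rule of step 5 can be sketched as a single helper. Step 8 repeats the same rule with $S_i$ in place of $D_i$. The name `select_anomaly` is an assumption. So is returning `None` when no commodity exceeds the threshold, a case the source does not discuss.

```python
def select_anomaly(scores, r, n_users, min_fraction=0.01):
    """Steps 5 and 8: return the candidate with the largest D_i (or S_i).

    scores: dict mapping commodity -> D_i (step 5) or S_i (step 8)
    r: dict mapping commodity -> number of ratings r_i
    Candidates are commodities with r_i > min_fraction * n_users (1 % of n).
    """
    threshold = min_fraction * n_users
    candidates = [item for item in scores if r[item] > threshold]
    if not candidates:
        return None
    return max(candidates, key=lambda item: scores[item])


# o1 has the largest score but only 5 ratings, below 1 % of 943 users (9.43)
print(select_anomaly({"o1": 0.9, "o2": 0.5, "o3": 0.1},
                     {"o1": 5, "o2": 20, "o3": 40}, n_users=943))  # o2
```

The minimum-rating filter keeps very sparsely rated commodities, whose rates can swing to extremes from a single rating, from dominating the maximum.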

Step 6, calculate the poor-rating rate of each commodity:

6.1 For each commodity $o_i$ in the commodity set $O$, count the number $r_i$ of users who have rated it;

6.2 For each commodity $o_i$ in $O$, calculate its poor-rating rate $C_i$:

$$C_i = \frac{r_{i,\min}}{r_i}$$

where $r_{i,\min}$ is the number of ratings of $o_i$ equal to the lowest score allowed by the system. For example, if the system allows scores from 1 to 5, $r_{i,\min}$ is the number of ratings of $o_i$ equal to 1.

Step 7, apply the scaling operator to the poor-rating rate $C_i$ of each commodity to obtain the scaled poor-rating rate $S_i$:

where $\bar{r} = \frac{1}{m}\sum_{i=1}^{m} r_i$ is the average number of ratings per commodity in the commodity set $O$.

Step 8, determine the abnormally low-scored commodity from the calculated scaled poor-rating rates.

From the commodity set $O$, select the commodities whose number of ratings $r_i$ exceeds 1% of the total number of users $n$; these form the candidate set of abnormal commodities. Output the commodity with the largest scaled poor-rating rate in this candidate set as the detection result.
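Steps 6 and 7 can be sketched as below, assuming $C_i = r_{i,\min}/r_i$, the share of a commodity's ratings that equal the lowest score. The source does not reproduce the formula of the scaling operator. This sketch therefore substitutes a hypothetical power-law rescaling $S_i = C_i\,(r_i/\bar{r})^{\beta}$, where $\beta$ is a tunable exponent. If $C_i$ decays roughly like $r_i^{-\beta}$, the rescaling puts the rates on a common baseline, which is the effect the text attributes to the operator. Treat it as a placeholder, not as the patented operator. The final selection of step 8 is the same candidate filter and maximum as in step 5.

```python
def scaled_poor_rates(scores_by_item, min_score=1, beta=0.5):
    """Steps 6-7: poor-rating rate C_i and scaled poor-rating rate S_i.

    C_i = r_i_min / r_i. The scaling S_i = C_i * (r_i / r_bar) ** beta is an
    ASSUMED power-law rescaling; beta is a hypothetical tuning parameter,
    not a value taken from the source.
    """
    r = {item: len(s) for item, s in scores_by_item.items()}
    r_bar = sum(r.values()) / len(r)  # mean number of ratings per commodity
    C, S = {}, {}
    for item, scores in scores_by_item.items():
        r_i_min = sum(1 for s in scores if s == min_score)  # ratings equal to the lowest score
        C[item] = r_i_min / r[item]                         # step 6
        S[item] = C[item] * (r[item] / r_bar) ** beta       # step 7 (assumed form)
    return C, S


C, S = scaled_poor_rates({"o1": [1, 1, 5, 5], "o2": [1] + [5] * 15})
print(C["o1"], C["o2"])  # 0.5 0.0625
```

The exponent matters. With `beta=1`, this assumed form reduces to $r_{i,\min}/\bar{r}$, which simply ranks commodities by their raw count of lowest scores. With `beta=0`, it leaves $C_i$ unchanged.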

The effect of the invention is further described below with simulation experiments.

1. Simulation conditions:

The simulation experiments use MovieLens-100K, a dataset commonly used in the e-commerce field. It contains 100,000 ratings given by 943 users to 1,682 commodities, with scores ranging from 1 to 5.
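As a worked example added for illustration, the quantities the method uses can be evaluated for this dataset. $O$ is built from the rating data, so every commodity has at least one rating and $\sum_i r_i = 100000$. The source does not say how fractional counts such as 9.43 are rounded, so the integer readings below are assumptions:

$$0.01\,n = 0.01 \times 943 = 9.43 \;\Rightarrow\; \text{candidates satisfy } r_i > 9.43, \text{ i.e. } r_i \ge 10$$

$$l = 0.01\,n = 9.43 \;\Rightarrow\; l/2 \approx 4.7 \text{ neighbors on each side before rounding}$$

$$\bar{r} = \frac{1}{m}\sum_{i=1}^{m} r_i = \frac{100000}{1682} \approx 59.45$$

$$0.03\,n = 0.03 \times 943 = 28.29 \;\Rightarrow\; \text{about 28 injected ratings per commodity in Simulations 1 and 2}$$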

2. Simulation content and result analysis:

Simulation 1 shows how well the good-rating rate and the difference operator defined by the invention distinguish the data of abnormally high-scored commodities.

First, with the MovieLens-100K dataset as input, calculate the original good-rating rate and the original differential good-rating rate of each commodity on the original data, using the abnormally high-scored commodity detection method of the invention.

Next, simulate malicious high scoring for each commodity. For each commodity, randomly select 3% of the 943 system users from among those who have not rated it. Add a rating of the commodity by each of these users with the system's highest score, i.e. 5.

Then, use the same method to calculate the good-rating rate and the differential good-rating rate of each commodity after the malicious high scoring.

Finally, plot the good-rating rates and differential good-rating rates of the commodities before and after the malicious high scoring for comparison. The results are shown in FIG. 2(a) and FIG. 2(b), where:

FIG. 2(a) shows the distribution of the commodities' good-rating rates before and after malicious high scoring. For each commodity, 3% of the system users who had not rated it were randomly selected to give it high scores. The abscissa is the number of ratings of the commodity in the original dataset, and the ordinate is the good-rating rate. The gray line is the good-rating rate curve before malicious high scoring, and the black line is the curve after malicious high scoring.

FIG. 2(b) shows the corresponding distribution of the differential good-rating rates under the same malicious high scoring. The abscissa is the number of ratings of the commodity in the original dataset, and the ordinate is the differential good-rating rate. The gray line is the differential good-rating rate curve before malicious high scoring, and the black line is the curve after malicious high scoring.

FIG. 2(a) shows that commodities with similar numbers of ratings have similar good-rating rates. Under malicious high scoring, the good-rating rate of an abnormally high-scored commodity differs greatly from those of commodities with similar numbers of ratings. The good-rating rate index therefore describes quantitatively how often a commodity receives high scores, and it discriminates well when malicious high scoring is present.

Comparing the curves in FIG. 2(a) and FIG. 2(b) shows that the difference operator smooths the noise in the good-rating rates well. It also amplifies the differences between commodities' good-rating rates and highlights the abnormal good-rating rates of abnormally high-scored commodities.

In practice, malicious high scoring usually targets commodities with poor scores or few reviews, i.e. the second half of the curves in FIG. 2. FIG. 2 clearly shows that, for this part of the data, the good-rating rate index and difference operator defined by the invention distinguish abnormally high-scored commodity data well from normal commodity data.
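The rating injection used in Simulation 1 (and, with `score=1`, in Simulation 2) can be sketched as follows. The (user, item, score) record layout and the name `inject_attack` are assumptions. So is rounding $0.03 \times 943 = 28.29$ to 28 attackers; the source gives only the percentage. A local, seeded `random.Random` keeps runs reproducible without touching global state.

```python
import random


def inject_attack(records, users, target, fraction, score, seed=None):
    """Return a copy of `records` with malicious ratings added for `target`.

    fraction: share of all system users taking part (0.03 in Simulations 1-2)
    score: rating every attacker gives (5 for high scoring, 1 for low scoring)
    Attackers are drawn only from users who have not yet rated `target`;
    round() of the attacker count is an assumption of this sketch.
    """
    rng = random.Random(seed)
    already_rated = {u for u, item, _ in records if item == target}
    eligible = [u for u in users if u not in already_rated]
    k = min(round(fraction * len(users)), len(eligible))
    attackers = rng.sample(eligible, k)
    return list(records) + [(u, target, score) for u in attackers]


users = [f"u{j}" for j in range(943)]
records = [("u0", "o1", 3)]
attacked = inject_attack(records, users, "o1", fraction=0.03, score=5, seed=42)
print(len(attacked) - len(records))  # 28
```

Returning a new list leaves the original ratings intact, so the "before" and "after" rates in FIG. 2 and FIG. 3 can both be computed from the same starting data.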

Simulation 2 shows how well the poor-rating rate and the scaling operator defined by the invention distinguish the data of abnormally low-scored commodities.

First, with the MovieLens-100K dataset as input, calculate the original poor-rating rate and the original scaled poor-rating rate of each commodity on the original data, using the abnormally low-scored commodity detection method of the invention.

Next, simulate malicious low scoring for each commodity. For each commodity, randomly select 3% of the 943 system users from among those who have not rated it. Add a rating of the commodity by each of these users with the system's lowest score, i.e. 1.

Then, use the same method to calculate the poor-rating rate and the scaled poor-rating rate of each commodity after the malicious low scoring.

Finally, plot the poor-rating rates and scaled poor-rating rates of the commodities before and after the malicious low scoring for comparison. The results are shown in FIG. 3(a) and FIG. 3(b), where:

FIG. 3(a) shows the distribution of the commodities' poor-rating rates before and after malicious low scoring. For each commodity, 3% of the system users who had not rated it were randomly selected to give it low scores. The abscissa is the number of ratings of the commodity in the original dataset, and the ordinate is the poor-rating rate. The gray line is the poor-rating rate curve before malicious low scoring, and the black line is the curve after malicious low scoring.

FIG. 3(b) shows the corresponding distribution of the scaled poor-rating rates under the same malicious low scoring. The abscissa is the number of ratings of the commodity in the original dataset, and the ordinate is the scaled poor-rating rate. The gray line is the scaled poor-rating rate curve before malicious low scoring, and the black line is the curve after malicious low scoring.

FIG. 3(a) shows that the poor-rating rates of the commodities change markedly under malicious low scoring. The poor-rating index defined by the invention therefore describes quantitatively how often a commodity receives low scores, and it distinguishes abnormally low-scored commodities well.

The figure also shows that the more reviews a commodity has, the lower its poor-rating rate, and the fewer reviews, the higher its poor-rating rate. In other words, the poor-rating rates of commodities follow a power-law distribution.

Comparing the curves in FIG. 3(a) and FIG. 3(b) shows that, after the scaling operator is applied, the poor-rating rates of the commodities lie roughly on a common baseline. The scaled poor-rating rates of abnormally low-scored commodities form clear peaks compared with those of normal commodities. The scaling operator defined by the invention thus removes the influence of the power-law distribution of poor-rating rates on the detection result and further highlights the data anomalies of abnormally low-scored commodities.

In practice, malicious low scoring usually targets commodities with high scores or many reviews, i.e. the first half of the curves in FIG. 3. FIG. 3 clearly shows that, for this part of the data, the poor-rating index and scaling operator defined by the invention distinguish abnormally low-scored commodity data well from normal commodity data.

Simulation 3 compares the method of the invention with the clustering-based detection method KNN and the decision-tree-based detection method C4.5 in detecting abnormally high-scored commodity data.

First, with the MovieLens-100K dataset as input, randomly select 50 commodities from the commodity set as the set of commodities to be attacked with malicious high scores. In each detection run, one commodity is taken from this set as the target of malicious high scoring. Users who have not rated the commodity are randomly selected from the user set, in the specified number of attacking users, and a rating of 5 (a good rating) of the commodity is added for each of them.

Then, apply the method of the invention to the modified dataset to obtain a detection result.

Finally, check whether the abnormal commodity output by the method is the commodity selected for malicious high scoring in the previous step. If so, record 1 (correct detection); otherwise record 0 (wrong detection). This gives the proportion of the 50 commodities that the method detects correctly. The higher this accuracy, the more accurate the detection.

In the simulation experiment, the detection accuracy of the method is tested with the number of users taking part in malicious high scoring set to 1% of the system users, then increased in steps of 1% up to 10%. The results are shown in FIG. 4, where:

the abscissa is the ratio of the number of users taking part in malicious high scoring to the total number of system users, increasing from 1% to 10% in steps of 1%, and the ordinate is the accuracy of detecting abnormally high-scored commodities. The curve marked with circles is the accuracy of the clustering-based detection method KNN, the curve marked with triangles is the accuracy of the decision-tree-based detection method C4.5, and the curve marked with squares is the accuracy of the method of the invention.

FIG. 4 shows that the accuracy curve of the method of the invention always lies above those of KNN and C4.5, which indicates that the method detects abnormally high-scored commodity data more accurately. When few users take part in malicious high scoring, e.g. 1% to 2% of the total number of system users, the accuracy of the method is far higher than that of KNN and C4.5. The method is therefore more sensitive to the data of abnormally high-scored commodities and can detect anomalies early, which is further evidence of its effectiveness.
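The accuracy measurement of Simulation 3 (and, with `score=1`, Simulation 4) reduces to the loop below. The following are hypothetical: the name `detection_accuracy`, the `detect` callable interface, and the rounding of the attacker count. `detect` stands for whichever detector is being evaluated, for example one assembled from steps 1 to 5. The toy detector at the end exists only to exercise the loop.

```python
import random


def detection_accuracy(records, users, targets, fraction, score, detect, seed=0):
    """Fraction of target commodities that `detect` identifies after an attack.

    For each target, a fresh copy of `records` (user, item, score) receives
    round(fraction * len(users)) ratings equal to `score` from randomly chosen
    users who had not rated the target. A run counts as correct (1) only if
    detect(attacked_records) returns exactly that target, otherwise 0.
    """
    rng = random.Random(seed)
    hits = 0
    for target in targets:
        rated = {u for u, item, _ in records if item == target}
        eligible = [u for u in users if u not in rated]
        k = min(round(fraction * len(users)), len(eligible))
        attackers = rng.sample(eligible, k)
        attacked = list(records) + [(u, target, score) for u in attackers]
        hits += int(detect(attacked) == target)
    return hits / len(targets)


def most_top_scores(recs, top=5):
    """Toy detector: the commodity with the most ratings equal to `top`."""
    counts = {}
    for _, item, s in recs:
        if s == top:
            counts[item] = counts.get(item, 0) + 1
    return max(counts, key=counts.get) if counts else None


users = [f"u{j}" for j in range(10)]
records = [("u0", item, 3) for item in ("o1", "o2", "o3")]
print(detection_accuracy(records, users, ["o1", "o2", "o3"], 0.3, 5, most_top_scores))  # 1.0
```

Producing one curve of FIG. 4 would mean calling the function for `fraction` = 0.01, 0.02, ..., 0.10 with the same 50 targets. The source does not say whether the random attackers are redrawn for each fraction, so the fixed `seed` here is a further assumption.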

Simulation 4 compares the method of the invention with the clustering-based detection method KNN and the decision-tree-based detection method C4.5 in detecting abnormally low-scored commodity data.

First, with the MovieLens-100K dataset as input, randomly select 50 commodities from the commodity set as the set of commodities to be attacked with malicious low scores. In each detection run, one commodity is taken from this set as the target of malicious low scoring. Users who have not rated the commodity are randomly selected from the user set, in the specified number of attacking users, and a rating of 1 (a poor rating) of the commodity is added for each of them.

Then, apply the method of the invention to the modified dataset to obtain a detection result.

Finally, check whether the abnormal commodity output by the method is the commodity selected for malicious low scoring in the previous step. If so, record 1 (correct detection); otherwise record 0 (wrong detection). This gives the detection accuracy of the method on the 50 commodities. The higher the accuracy, the more accurate the detection.

In the simulation experiment, the detection accuracy of the method is tested with the number of users taking part in malicious low scoring set to 1% of the system users, then increased in steps of 1% up to 10%. The results are shown in FIG. 5, where:

the abscissa is the ratio of the number of users taking part in malicious low scoring to the total number of system users, increasing from 1% to 10% in steps of 1%, and the ordinate is the accuracy of detecting abnormally low-scored commodities. The curve marked with circles is the accuracy of the clustering-based detection method KNN, the curve marked with triangles is the accuracy of the decision-tree-based detection method C4.5, and the curve marked with squares is the accuracy of the method of the invention.

FIG. 5 shows that the accuracy curve of the method of the invention always lies above those of KNN and C4.5, which indicates that the method detects abnormally low-scored commodity data more accurately. When few users take part in malicious low scoring, e.g. 1% to 2% of the total number of system users, the accuracy of the method is far higher than that of KNN and C4.5. The method is therefore more sensitive to the data of abnormally low-scored commodities and can detect anomalies early, which is further evidence of its effectiveness.

Claims

1. An unsupervised abnormal commodity data detection method based on the good-rating and poor-rating rates of commodities, characterized by comprising the following steps:

(1) Input data:

extracting the rating data of each commodity from the users' rating records of commodities on an e-commerce website; forming the commodity set $O = \{o_1, o_2, \dots, o_i, \dots, o_m\}$ from all commodities in the extracted data; and constructing the user set $U = \{u_1, u_2, \dots, u_j, \dots, u_n\}$ from all users in the extracted data; where $o_i$ denotes the $i$-th commodity, $i = 1, \dots, m$, $m$ is the total number of commodities, $u_j$ denotes the $j$-th user, $j = 1, \dots, n$, and $n$ is the total number of users;

(3) Calculating the good-rating rate of each commodity:

(3a) for each commodity $o_i$ in the commodity set $O$, counting the number $r_i$ of users who have rated it;

(4) Calculating the differential good-rating rate of each commodity:

(4b) in the ordering of the commodities by their number of ratings $r_i$, taking the position of each commodity $o_i$ as the center and selecting $l/2$ commodities before it and $l/2$ commodities after it to form the neighbor commodity set $\Gamma_i = \{g_1, g_2, \dots, g_k, \dots, g_l\}$ of $o_i$, where $g_k$ denotes the $k$-th neighbor of $o_i$, $k = 1, \dots, l$, and $l$ is the total number of neighbors of $o_i$;

where $H_k$ is the good-rating rate of the $k$-th neighbor of $o_i$;

(5) selecting, from the commodity set $O$, the commodities whose number of ratings $r_i$ exceeds 1% of the total number of users $n$ to form the candidate set of abnormal commodities, and outputting the commodity $o_i$ with the largest differential good-rating rate $D_i$ in this candidate set as the detection result;

(6) Calculating the poor-rating rate of each commodity:

(6a) for each commodity $o_i$ in the commodity set $O$, counting the number $r_i$ of users who have rated it;

(6b) for each commodity $o_i$ in $O$, calculating its poor-rating rate $C_i$:

$$C_i = \frac{r_{i,\min}}{r_i}$$

where $r_{i,\min}$ is the number of ratings of $o_i$ equal to the lowest score allowed by the system; for example, if the system allows scores from 1 to 5, $r_{i,\min}$ is the number of ratings of $o_i$ equal to 1;


Wherein