CN104463601A - Method for detecting users who score maliciously in online social media system - Google Patents

Publication number
CN104463601A
Authority
CN
China
Prior art keywords
user
scoring
product
degree
social media
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201410638173.6A
Other languages
Chinese (zh)
Inventor
尚明生
蔡世民
高见
董宇蔚
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
University of Electronic Science and Technology of China
Original Assignee
University of Electronic Science and Technology of China
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by University of Electronic Science and Technology of China filed Critical University of Electronic Science and Technology of China
Priority to CN201410638173.6A priority Critical patent/CN104463601A/en
Publication of CN104463601A publication Critical patent/CN104463601A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method for detecting users who score maliciously in an online social media system. The method targets systems with rating feedback. First, users are clustered according to the scores they give to products, and a normalized user confidence is calculated. Second, the reliability of each user's scoring is calculated from the user confidence to obtain a candidate list of users who score maliciously. Finally, the candidates are re-sorted by the deviation between their scores and product quality to obtain the final list of users who score maliciously. The method has advantages in accuracy and efficiency and can be applied to large-scale online social media websites.

Description

A method for detecting malicious rating users in an online social media system
Technical field
The present invention relates to methods for detecting users who submit malicious evaluations in online social media systems, and in particular to a method for detecting malicious rating users in social media systems that collect rating feedback.
Background
The Internet, as a carrier of commerce, has become an indispensable tool for acquiring, transmitting and exchanging information, and the arrival of the information age has injected new vitality into Internet-based IT services. Social media has drawn particular attention: it is widely regarded as a new economic model and a catalyst of the 21st-century world order, and has been called a "sunrise industry" and a "green industry". As a new form of networked economic activity, social media is developing at unprecedented speed and has become an effective means for countries to strengthen their economic competitiveness and gain an advantage in the global allocation of resources. Through social media, people no longer trade face to face, inspecting physical goods and exchanging paper documents (including cash); instead they trade through the wide range of product information available online, a mature logistics and distribution system, and convenient and secure payment systems. Social media hosts tens of thousands of online merchants and hundreds of millions of consumers, so establishing an effective reputation evaluation mechanism, building an environment of orderly competition, and guiding rational consumption are especially important.
Most current reputation evaluation systems are based on users' reviews or ratings of products. By reviewing a purchased product or giving it a satisfaction rating, a user expresses an opinion of the product and a degree of satisfaction with it. This review information is a valuable resource for both producers and potential consumers. By analyzing it, producers can keep up with market conditions and consumer feedback, and potential consumers can use it as an important reference when buying. When a potential consumer decides whether to buy a product, the most direct and most important references are often the rating the product has received and the quality of its reviews. On large-scale social media trading platforms, most recommender systems that suggest products to potential users are likewise based on users' historical ratings and review content. If most reviews of a product are positive, a user is very likely to buy it; if most are negative, the product will hardly sell. In practice, some unscrupulous merchants, seeking to increase their own profits, hire groups of people to post malicious reviews of certain products. The content of such reviews does not match the actual value of the product: it either maliciously flatters or maliciously disparages it. Malicious ratings and reviews reduce the reference value of review information, seriously mislead consumers' choices, and dilute the ratings and reviews of normal users, so that consumers gradually lose trust in the product evaluation system and, ultimately, the whole social media industry is harmed. The authenticity and usefulness of rating data and review information in a reputation evaluation system are therefore of great importance to healthy competition in social media, and the importance of screening out malicious rating users is self-evident.
There are currently two main approaches to detecting users who post fraudulent reviews or malicious ratings:
The first approach is manual labeling. A human observer examines a user's ratings, review content and other reviewing behavior and judges whether the user is a fraudulent reviewer. This approach is highly subjective, and because of the large volume of data to be processed, manual methods are hard to apply to malicious user detection in large-scale social media systems.
The second approach is automatic identification by computer. Typical fraudulent reviewers are labeled first, and unlabeled users are then classified by a machine learning algorithm. Two practices are typical: one judges the similarity of users' review content in evaluations that include text reviews; the other calculates how far a user's ratings deviate from the intrinsic quality of the products.
For example, an article published in 2011 (A robust ranking algorithm to spamming. EPL, 94 (2011), 48002) proposes a correlation-based user reputation ranking algorithm for detecting malicious rating users. The algorithm computes user reputation values and product averages simultaneously through an iterative scheme, and finally detects malicious rating users from the reputation ranking. In essence, the algorithm computes product quality as the reputation-weighted average of user ratings and then detects users by the deviation between their ratings and the intrinsic quality of the products: the larger the deviation, the more likely the user is a malicious rater. Although the method is simple, the intrinsic quality of a product is not directly measurable, and satisfaction with the same product varies from user to user. Representing product quality by the average of all ratings the product receives therefore carries some error and can lower detection accuracy. In addition, the algorithm is robust when the proportion of malicious rating users is particularly large, but it performs poorly on real rating systems in which both the proportion of malicious rating users and the proportion of fraudulent ratings are small.
As another example, a paper at the 2012 WWW conference (Spotting Fake Reviewer Groups in Consumer Reviews. WWW '12, 2012, pp. 191-200) proposes a method for detecting malicious rating users based on the similarity of review content. The method detects fraudulent reviewers by analyzing the similarity of review texts: if two reviews are highly similar, the users who posted them are more likely to be fraudulent reviewers. Although this method can detect fraudulent reviewers effectively, it requires text analysis of the review content of the entire social media system, so the data volume is large and processing is inefficient. Moreover, in many social media systems users do not actively write reviews, and those who do often leave only a few words, so analysis based on review content cannot work properly in many systems. Rating-based feedback, by contrast, is available in most current systems, and because rating costs the user little, many users participate; methods that rely on review text cannot be used in such systems.
With the development of social networks, US Patent No. 8176057, granted on August 5, 2012, disclosed a user reputation detection method based on social networks, in which reputation values are propagated through the feedback of high-reputation users in order to detect low-reputation users. Although the method can calculate user reputation values effectively, it is mainly used to identify users with higher reputation, and its accuracy in detecting malicious rating users is not high.
In summary, existing methods do not meet the practical needs of most social media websites: they are inaccurate in identification, cannot be applied efficiently to real detection, or are unsuitable for some evaluation systems.
Summary of the invention
The object of the invention is to provide an effective method for detecting malicious rating users in online social media systems. The invention targets social media systems with rating feedback and detects malicious rating users by analyzing users' rating values. It avoids the very large computational cost of analyzing and processing review text, which improves detection efficiency, while maintaining high accuracy.
The technical solution provided by the invention is a method for detecting malicious rating users in an online social media system, comprising the following steps:
Step 1: extract the user rating data from the system and preprocess it to obtain normalized user rating data consisting of the user ID, the product ID and the user's rating of the product; store these three items as triples (u, p, v);
Step 2: cluster the user ratings and calculate the confidence vector of each user's ratings;
Step 2-1: cluster users who give the same rating to the same product into one group;
Step 2-2: calculate the confidence vector of each user, where each component of the vector represents the user's reputation value for one product; this reputation value is the ratio of the size of the group the user belongs to for that product to the number of all users who rated the product, and this ratio is defined as the conformity ratio;
Step 3: from the user confidence vectors calculated in step 2, calculate the reliability of each user's ratings, treat the N least reliable users as malicious rating users and generate a candidate list of malicious rating users, where N is set according to factors such as the user rating ratio of the actual system and the required detection accuracy;
Step 4: re-sort the candidate list according to the degree to which the ratings of each candidate deviate from the intrinsic quality of the products, and select the M users with the largest deviation as the final malicious rating users, where M is set according to factors such as the user rating ratio of the actual system and the required detection accuracy.
The specific steps of step 1 are:
Step 1-1: remove users whose number of ratings is below a threshold K, where K can be adjusted according to the rating activity of the system and the desired granularity of detection;
Step 1-2: discretize non-integer ratings to integers by rounding half up;
Step 1-3: store the user ID, the product ID and the user's rating of the product as triples (u, p, v).
In step 1, K is usually set to 8.
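The preprocessing of step 1 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `preprocess`, its parameter names and the in-memory list of tuples are hypothetical choices made here.

```python
import math
from collections import Counter


def preprocess(raw_ratings, k=8):
    """Filter low-activity users and round scores to integers.

    raw_ratings: iterable of (user_id, product_id, score); score may be fractional.
    k: minimum number of ratings a user needs to be kept (threshold K of step 1-1).
    """
    raw_ratings = list(raw_ratings)
    ratings_per_user = Counter(user for user, _, _ in raw_ratings)
    triples = []
    for user, product, score in raw_ratings:
        if ratings_per_user[user] < k:
            continue  # step 1-1: drop users with fewer than k ratings
        # step 1-2: round half up; math.floor(x + 0.5) is used because
        # Python's built-in round() rounds halves to the nearest even integer
        triples.append((user, product, math.floor(score + 0.5)))
    return triples  # step 1-3: list of (u, p, v) triples
```

The explicit `math.floor(score + 0.5)` matters for ratings such as 2.5, which the built-in `round()` would turn into 2 rather than 3.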
In step 2, the confidence vectors of different users have different dimensions and are stored in an XML file.
The specific steps of step 3 are:
Step 3-1: calculate the mean and the variance of each user's confidence vector, then divide the mean by the variance to obtain the user's reliability;
Step 3-2: sort all users in ascending order of reliability and select the first N users to generate the candidate list of malicious rating users.
The specific steps of step 4 are:
Step 4-1: calculate the mean of each product's ratings; this mean is taken as the intrinsic quality of the product;
Step 4-2: for each user in the candidate list obtained in step 3, calculate the deviation from intrinsic quality for each product the user rated, that is, the difference between the user's rating of the product and the product's intrinsic quality;
Step 4-3: take the absolute value of each of the user's deviations and average them to obtain the user's rating deviation;
Step 4-4: sort the users in descending order of rating deviation and select the first M users as the final malicious rating users, generating the list of malicious rating users.
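Steps 4-1 to 4-3 can be written compactly as below. The symbols Q_j, D_i, U(O_j) and O(U_i) are notation introduced here for illustration only and do not appear in the original text: r_{i,j} is the rating of user U_i for product O_j, U(O_j) is the set of users who rated product O_j, and O(U_i) is the set of products rated by user U_i.

```latex
Q_j = \frac{1}{|U(O_j)|} \sum_{U_i \in U(O_j)} r_{i,j},
\qquad
D_i = \frac{1}{|O(U_i)|} \sum_{O_j \in O(U_i)} \left| r_{i,j} - Q_j \right|
```

Here Q_j is the intrinsic quality of step 4-1 and D_i is the user rating deviation of step 4-3; step 4-4 then sorts the candidates by D_i in descending order.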
Because the invention detects on the basis of user ratings, it avoids the complex processing of text, which improves detection efficiency and makes it applicable to almost all evaluation systems. In addition, it first detects a candidate set of malicious rating users and then performs a second detection on the users in that set, which greatly improves identification accuracy. Detection is particularly effective in real evaluation systems where each user rates only a small fraction of all products and malicious rating users are a small fraction of all users.
Brief description of the drawings
Fig. 1 is a flow chart of the method provided by the invention for detecting malicious rating users in large-scale social media systems.
Fig. 2 is a flow chart of the generation of user confidence vectors provided by the invention.
Fig. 3 is a flow chart of the user reliability calculation and the generation of the candidate list of malicious rating users provided by the invention.
Fig. 4 is a flow chart, provided by the invention, of re-sorting the candidate list according to the deviation of the candidates' ratings from the intrinsic quality of the products to obtain the final malicious rating users.
Detailed description of the embodiments
To make the objects, technical solutions and advantages of the invention clearer, the invention is described in more detail below with reference to specific embodiments and the accompanying drawings.
The invention is described in detail below with reference to the drawings. Note that the examples described are intended only to aid understanding of the invention and do not limit it in any way.
The invention proposes a method for detecting malicious rating users in social media based on clustering of rating behavior; the overall flow is shown in Fig. 1.
Step 1 is the data preprocessing module. It preprocesses the raw data input to the system, filters out noise data and discretizes the rating data to integers; the preprocessed data is the input to the feature extraction in step 2.
Step 2 is the user confidence calculation module. It clusters the ratings in the data preprocessed in step 1 and calculates user confidence vectors from the conformity ratio of the cluster sizes; the user confidence vectors are the input to the second-level feature extraction in step 3.
Step 3 is the module that calculates the reliability of user ratings and generates the candidate list of malicious rating users. From each user's confidence vector it extracts the mean and the variance, takes the ratio of mean to variance as the user's reliability, sorts users by reliability and generates the candidate list of malicious rating users.
Step 4 is the module that re-sorts the candidate list and generates the final malicious rating users. It calculates the intrinsic quality of each product and, starting from the candidate list, uses the deviation between user ratings and intrinsic product quality to perform a second-stage detection on the preliminary result, producing the final detection result.
Each main step is described in detail below:
1. Input the original evaluation data of the system, preprocess the input data and store the result (step 1).
Preprocessing consists of two main parts: noise filtering and integer discretization of ratings. The user rating data is first separated from the raw input, and users with fewer than 8 ratings are filtered out together with their ratings. If a rating is not an integer, it is rounded half up. Because the noise data consists of users with few ratings and those ratings, removing it has little effect on the system as a whole but effectively improves computational efficiency. Discretizing ratings to integers reduces the complexity of clustering and makes the method easier to apply to real systems.
2. Cluster the user ratings and calculate the confidence vector of each user's ratings.
Step 2 computes the user confidence vectors. As shown in the workflow of Fig. 2, it comprises clustering of rating behavior, calculation of the conformity ratio, and generation and storage of the user confidence vectors.
In step 2-1, rating behavior is clustered as follows: users who give the same rating to the same product are clustered into one group. This clustering is performed for every product in the system. If a user has rated N products, the user's confidence is an N-dimensional vector, each component of which is the reputation value the user obtains from one rating. Because ratings are discrete after preprocessing, clustering produces a fixed number of groups per product.
$$G_j(r) = \{\, U_i \mid r_{i,j} = r,\ r \in \mathrm{Rate} \,\}$$
where $G_j(r)$ is the group formed by clustering the ratings of product $O_j$, $r_{i,j}$ is the rating given by user $U_i$ to product $O_j$, and Rate is the discrete integer rating range of the product.
Step 2-2 calculates, for each user, the ratio of the size of the group the user belongs to over the total number of users who rated the product; the larger the ratio, the more strongly the user belongs to the majority. This ratio reflects how far the user's evaluation behavior departs from that of most people. If the user belongs to a small group, the ratio is small and the user's evaluation departs more from the popular one. Conversely, if the user belongs to a large group, the user's evaluation agrees with that of most people, the departure is small, and the rating is credible. The system calculates conformity strength by normalizing cluster size, then generates and stores the user confidence vectors. Confidence is assigned according to the group the user belongs to and the conformity ratio of that group, so the confidence characterizes the conformity strength of the user's group. Users who gave the same rating to the same product, and were therefore clustered into one group, receive the same confidence. This yields each user's confidence for every product the user rated, and together these form the user's confidence vector. Finally, the confidence vector of each user is stored.
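The clustering of step 2-1 and the conformity ratio of step 2-2 can be sketched together as follows. This is an illustrative sketch only: the function name `confidence_vectors` and the dict-of-dicts result are assumptions made here (the text itself stores the vectors in an XML file).

```python
from collections import Counter, defaultdict


def confidence_vectors(triples):
    """Map each user to {product: confidence} from (u, p, v) triples."""
    # size of each group G_j(r): users who gave rating r to product j
    group_size = Counter((product, score) for _, product, score in triples)
    # number of users who rated each product
    raters = Counter(product for _, product, _ in triples)
    confidence = defaultdict(dict)
    for user, product, score in triples:
        # conformity ratio: group size normalized by the number of raters
        confidence[user][product] = group_size[(product, score)] / raters[product]
    return dict(confidence)


# ratings of product O1 taken from Table 1 of the worked example
o1 = [("U1", "O1", 4), ("U3", "O1", 3), ("U4", "O1", 5), ("U5", "O1", 3),
      ("U6", "O1", 2), ("U8", "O1", 1), ("U9", "O1", 5), ("U10", "O1", 5)]
conf = confidence_vectors(o1)
print(conf["U1"]["O1"], conf["U3"]["O1"], conf["U4"]["O1"])
```

Each user's vector has one entry per product actually rated, which is why vectors differ in dimension from user to user.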
3. Calculate the reliability of user ratings and generate the candidate list of malicious rating users (step S3).
Step 3 calculates user reliability from the user confidence vectors generated in step S2, sorts users by reliability, and adds the top K percent of the ranking to the candidate list of malicious rating users. The flow of step S3 is shown in Fig. 3 and comprises calculating the mean and variance of the confidence (step S31), calculating user reliability (step S32), and generating and storing the malicious rating user candidates.
In step 3-1, the mean and variance of all confidence values of each user are extracted. The mean reflects the average level of the user's reliability, and the variance reflects how much it fluctuates. User reliability is the final measure of trustworthiness, computed on top of the user confidence by combining the average level and the fluctuation, as given by the following formula:
$$\mathrm{Score}_i = \frac{Rs_i}{Ps_i}, \quad \text{where} \quad Rs_i = \frac{\sum_{j \in O,\, i \in U} rp_{i,j}}{\dim \vec{rp}_i}, \quad Ps_i = \frac{\sum_{j \in O,\, i \in U} \left( rp_{i,j} - Rs_i \right)^2}{\dim \vec{rp}_i};$$
where $\mathrm{Score}_i$ is the reliability of user $U_i$, that is, the user's final trustworthiness; $rp_{i,j}$ is the component of the confidence vector $\vec{rp}_i$ of user $U_i$ for product $O_j$; $Rs_i$ is the mean of the user's confidence values and represents the average level of the user's reliability; $Ps_i$ is the variance of the user's confidence values and represents how much the user's reliability fluctuates. When the mean is large and the variance is small, the user consistently obtains high reputation values for each rating; such a user has high reliability and is trustworthy.
In step 3-2, the candidate list of malicious rating users is generated from the user reliability calculated in step S32. Users are sorted in ascending order of reliability, and the top K percent of the list are added to the candidate set of malicious rating users, which completes the preliminary detection. Sorting uses the well-established quicksort algorithm, which is not a focus of the invention; when the data volume is large, the sort can readily be distributed to improve efficiency.
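The reliability calculation and candidate selection of step 3 can be sketched as follows. Several points are assumptions made here rather than statements from the text: the function names are hypothetical; the spread is measured with the sample standard deviation (`statistics.stdev`), because that choice reproduces the figures listed under "Variance" in Table 4 of the worked example, whereas the formula as written divides by the vector dimension (`statistics.pvariance` would follow it literally); and users whose spread cannot be computed or is zero are treated as fully reliable.

```python
import statistics


def reliability(confidences):
    """Mean of a user's confidence values divided by their spread."""
    if len(confidences) < 2:
        return float("inf")  # assumption: too few ratings to measure spread
    spread = statistics.stdev(confidences)  # assumption: sample std, see lead-in
    if spread == 0:
        return float("inf")  # assumption: perfectly stable user
    return statistics.mean(confidences) / spread


def candidate_list(confidence_by_user, n):
    """Return the n least reliable users, least reliable first."""
    ranked = sorted(confidence_by_user,
                    key=lambda user: reliability(confidence_by_user[user]))
    return ranked[:n]


# three rows of Table 3, used here only as sample input
sample = {
    "U2": [0.571, 0.125, 0.111, 0.125],
    "U3": [0.250, 0.571, 0.444, 0.625],
    "U7": [0.143, 0.125, 0.444, 0.625],
}
print(candidate_list(sample, 2))
```

Python's `sorted` uses Timsort rather than quicksort; the ordering it produces is the same, so the choice of sort does not affect the result.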
4. Re-sort the candidate list according to the deviation of the candidates' ratings from the intrinsic quality of the products to obtain the final malicious rating users.
The overall flow of step 4 is shown in Fig. 4. It mainly comprises extracting the intrinsic quality of the products (step S41), calculating the rating deviation (step S42), sorting by the mean deviation between user ratings and intrinsic product quality (step S43), and finally generating the malicious rating user detection result.
In step 4-1, the intrinsic quality of a product is measured by the mean of all its ratings. Intrinsic quality is itself not directly measurable and is usually estimated by some algorithm. In the invention, the arithmetic mean of the ratings each product receives is taken as its intrinsic quality: the larger the mean, the better the product, and the smaller the mean, the worse the product.
In step 4-2, the rating deviation is calculated; this can be done offline. The absolute value of the difference between the user's rating and the product's intrinsic quality is taken as the user's rating deviation for that product.
In step 4-3, the same calculation is applied to every product the user has rated, giving the user's rating deviation vector; the mean of this vector is the user's final rating deviation. The same processing is applied to every user in the candidate set generated in step 3. "User rating deviation" below always refers to the mean of the absolute deviations of the user's ratings.
In step 4-4, the users in the candidate set are sorted in descending order of the user rating deviation computed in the preceding steps. The higher a user ranks, the larger the rating deviation and the more likely the user is a malicious rater. The top K users of the ranking (the M users of step 4 in the summary above) form the final list of malicious rating users. K can be adjusted according to factors such as the user rating ratio of the actual system and the required detection accuracy. The resulting ranking is the malicious rating user detection result.
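The second-stage detection of step 4 can be sketched as below. The function name `rerank_by_deviation` and its parameters are hypothetical, and the sketch assumes that intrinsic quality is averaged over the ratings of all users, not only the candidates.

```python
from collections import defaultdict


def rerank_by_deviation(triples, candidates, m):
    """Return the m candidates whose ratings deviate most from product quality."""
    # step 4-1: intrinsic quality = arithmetic mean of all ratings of a product
    totals = defaultdict(lambda: [0, 0])
    for _, product, score in triples:
        totals[product][0] += score
        totals[product][1] += 1
    quality = {p: total / count for p, (total, count) in totals.items()}

    # steps 4-2 and 4-3: mean absolute deviation per candidate user
    deviations = defaultdict(list)
    for user, product, score in triples:
        if user in candidates:
            deviations[user].append(abs(score - quality[product]))
    mean_dev = {user: sum(d) / len(d) for user, d in deviations.items()}

    # step 4-4: largest deviation first, keep the first m users
    return sorted(mean_dev, key=mean_dev.get, reverse=True)[:m]


demo = [("a", "p", 5), ("b", "p", 3), ("c", "p", 1)]
print(rerank_by_deviation(demo, {"a", "b"}, 2))
```

Because the product means do not depend on the candidate set, they can be computed once offline and reused, in line with the remark in step 4-2.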
The execution of the invention is illustrated below with a concrete example.
To simplify the illustration, the example social media website has the original ratings of 10 users for 5 products, with ratings from 1 to 5 points, 5 rating levels in total. As shown in Table 1, rows represent users (U) and columns represent products (O); the value in a cell is the user's rating of the product, and an empty cell (-) means that the user has not bought the product and the rating is empty. This forms the user-product rating matrix R of Table 1.
User  O1  O2  O3  O4  O5
U1    4   5   3   4   -
U2    -   4   4   2   5
U3    3   4   -   5   3
U4    5   -   -   4   3
U5    3   4   5   -   3
U6    2   4   3   5   3
U7    -   3   1   5   3
U8    1   -   3   3   4
U9    5   2   2   5   -
U10   5   -   2   1   4
Table 1
To simplify the illustration, only one implementation of rating behavior clustering is described here; clustering follows the procedure described in step S21 and yields the rating groups. In Table 2, rows correspond to the 5 products of Table 1 and columns to the ratings 1 to 5; each cell holds the group formed by clustering the rating behavior, where the number is the size of the group and an empty cell (-) means that no user gave that rating.
Table 2
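The group sizes described for Table 2 follow directly from Table 1. The sketch below derives them from the Table 1 ratings; the resulting values are computed here rather than copied from the original table, and the names `TABLE_1`, `PRODUCTS` and `group_sizes` are hypothetical.

```python
from collections import Counter

PRODUCTS = ["O1", "O2", "O3", "O4", "O5"]
TABLE_1 = {  # None marks a product the user did not rate
    "U1": [4, 5, 3, 4, None], "U2": [None, 4, 4, 2, 5],
    "U3": [3, 4, None, 5, 3], "U4": [5, None, None, 4, 3],
    "U5": [3, 4, 5, None, 3], "U6": [2, 4, 3, 5, 3],
    "U7": [None, 3, 1, 5, 3], "U8": [1, None, 3, 3, 4],
    "U9": [5, 2, 2, 5, None], "U10": [5, None, 2, 1, 4],
}


def group_sizes(table, products, scores=range(1, 6)):
    """For each product, count the users who gave each rating 1..5."""
    counts = Counter()
    for row in table.values():
        for product, score in zip(products, row):
            if score is not None:
                counts[(product, score)] += 1
    return {p: [counts[(p, s)] for s in scores] for p in products}


sizes = group_sizes(TABLE_1, PRODUCTS)
for product in PRODUCTS:
    print(product, sizes[product])  # group sizes for ratings 1, 2, 3, 4, 5
```

Dividing each group size by the row total (the number of users who rated the product) gives the confidence values listed in Table 3; for example, the group of users who rated O1 with 5 points has size 3 out of 8 raters, giving 0.375.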
From the user groups obtained for each product, user confidence is obtained by normalizing the cluster size: users who gave the same rating to the same product, and were clustered into one group, receive the same confidence. As shown in Table 3, rows represent the 10 users and columns the 5 products; each cell is the user's confidence for the product. An empty cell means that the user did not rate the product.
User  O1     O2     O3     O4     O5
U1    0.125  0.143  0.375  0.222  -
U2    -      0.571  0.125  0.111  0.125
U3    0.250  0.571  -      0.444  0.625
U4    0.375  -      -      0.222  0.625
U5    0.250  0.571  0.125  -      0.625
U6    0.125  0.571  0.375  0.444  0.625
U7    -      0.143  0.125  0.444  0.625
U8    0.125  -      0.375  0.111  0.250
U9    0.375  0.143  0.250  0.444  -
U10   0.375  -      0.250  0.111  0.250
Table 3
Following step S31, the statistical features of each user's confidence vector, its mean and variance, are calculated. The ratio of mean to variance gives the user's final reliability. As shown in Table 4, rows represent users and columns show the mean, the variance and the final reliability of the user's confidence.
User  Mean    Variance  Reliability
U1    0.2163  0.1139    1.899
U2    0.2330  0.2254    1.034
U3    0.4725  0.1666    2.836
U4    0.4073  0.2034    2.002
U5    0.3927  0.2434    1.613
U6    0.4280  0.1963    2.180
U7    0.3342  0.2429    1.376
U8    0.2153  0.1235    1.743
U9    0.3030  0.1335    2.269
U10   0.2465  0.1079    2.285
Table 4
Sorting the users in ascending order of the reliability in Table 4 gives: U2, U7, U5, U8, U1, U4, U6, U9, U10, U3. The first 40% of the list are added to the candidate set of malicious rating users, giving the candidate set {U2, U7, U5, U8}.
Following step S41, the mean of all ratings of each product is calculated as its intrinsic quality, and the candidate set is re-sorted by the deviation between user ratings and intrinsic product quality. As shown in Table 5, rows represent users and columns represent products, for the 4 candidate users and 5 products; each cell is the deviation between the user's rating and the product's intrinsic quality.
Table 5
Sorting the mean deviations calculated from Table 5 in descending order gives the list of malicious rating users: U2, U7, U8, U5. Compared with the order in the candidate set, U8 has a larger rating deviation than U5 and is therefore more likely to be a malicious rating user. Detection is now complete: in the resulting list, the higher a user ranks, the more likely the user is a malicious rater.
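The whole worked example can be replayed end to end from the Table 1 ratings. The sketch below is illustrative and rests on assumptions made here: the filtering threshold of step 1 is skipped because every example user has fewer than 8 ratings, reliability uses the sample standard deviation (which matches the Table 4 figures), the candidate set is fixed at 4 users (40% of 10), and all variable names are hypothetical.

```python
import statistics
from collections import Counter, defaultdict

PRODUCTS = ["O1", "O2", "O3", "O4", "O5"]
TABLE_1 = {  # None marks a product the user did not rate
    "U1": [4, 5, 3, 4, None], "U2": [None, 4, 4, 2, 5],
    "U3": [3, 4, None, 5, 3], "U4": [5, None, None, 4, 3],
    "U5": [3, 4, 5, None, 3], "U6": [2, 4, 3, 5, 3],
    "U7": [None, 3, 1, 5, 3], "U8": [1, None, 3, 3, 4],
    "U9": [5, 2, 2, 5, None], "U10": [5, None, 2, 1, 4],
}
triples = [(u, p, v) for u, row in TABLE_1.items()
           for p, v in zip(PRODUCTS, row) if v is not None]

# step 2: confidence = size of the user's rating group / raters of the product
group = Counter((p, v) for _, p, v in triples)
raters = Counter(p for _, p, _ in triples)
conf = defaultdict(list)
for u, p, v in triples:
    conf[u].append(group[(p, v)] / raters[p])

# step 3: reliability = mean / sample standard deviation, ascending order
score = {u: statistics.mean(c) / statistics.stdev(c) for u, c in conf.items()}
candidates = sorted(score, key=score.get)[:4]

# step 4: re-sort candidates by mean absolute deviation from product means
total = defaultdict(int)
for _, p, v in triples:
    total[p] += v
quality = {p: total[p] / raters[p] for p in raters}
dev = defaultdict(list)
for u, p, v in triples:
    if u in candidates:
        dev[u].append(abs(v - quality[p]))
final = sorted(candidates, key=lambda u: statistics.mean(dev[u]), reverse=True)

print(candidates)  # ['U2', 'U7', 'U5', 'U8']
print(final)       # ['U2', 'U7', 'U8', 'U5']
```

Both printed lists agree with the candidate set and the final ranking stated in the example.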
Illustrative embodiments of the invention have been described above so that those skilled in the art can understand the invention. It should be noted, however, that the invention is not limited to the scope of these embodiments. To those skilled in the art, variations are apparent as long as they fall within the spirit and scope of the invention as defined and determined by the appended claims, and all innovations that make use of the inventive concept are protected.

Claims (6)

1. in online Social Media system detection of malicious scoring user a method, the method comprises:
Step 1: the user's score data in extraction system, pre-service is carried out to data, obtains normalized user's score data and comprise by user ID, product IDs, user to the scoring of product, by these three classes data according to tlv triple (u, p, v) form store;
Step 2: user marks cluster, calculates the degree of confidence vector of user's scoring;
Step 2-1: be one group by the user clustering giving identical scoring for same product;
Step 2-2: the degree of confidence vector calculating every user, this user of each representation in components of this degree of confidence vector is to a kind of credit value of product, and this credit value is for user is for the ratio value of comforming of phylogenetic group size belonging to this product and all evaluation numbers of users;
Step 3: according to the user's degree of confidence vector calculated in step 2, calculate the fiduciary level of user's scoring, be considered as least reliable N number of user maliciously to mark user, generate malice scoring user candidate list, wherein N to mark ratio and detect the factors such as degree of accuracy and set according to the user of real system;
Step 4: with the departure degree of product proper mass, user's candidate list of maliciously marking is resequenced according to user's scoring in malice scoring user candidate list, choose the maximum M of a departure degree user, obtain final malice scoring user, wherein M to mark ratio and detect the factors such as degree of accuracy and set according to the user of real system.
2. The method for detecting malicious-scoring users in an online social media system as claimed in claim 1, characterized in that the concrete steps of step 1 are:
Step 1-1: remove the users whose number of scores is lower than a threshold K, where K can be adjusted according to the scoring situation of the system and the required granularity of detection;
Step 1-2: discretize every non-integer score to an integer by rounding to the nearest integer (halves rounded up);
Step 1-3: store the user ID, the product ID, and the user's score for the product as triples (u, p, v).
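
A minimal Python sketch of the preprocessing in claim 2 is given below. The function name `preprocess` and the sample data are hypothetical. The sketch assumes non-negative scores and uses `math.floor(v + 0.5)` for round-half-up, because Python's built-in `round` rounds halves to the nearest even integer.

```python
import math
from collections import Counter

def preprocess(raw, k=8):
    """Return integer-scored (u, p, v) triples for users with at least k scores."""
    # Step 1-1: count scores per user and drop users below the threshold K.
    counts = Counter(u for u, _, _ in raw)
    # Step 1-2: round half up (valid for non-negative scores).
    # Step 1-3: keep the result as (u, p, v) triples.
    return [(u, p, math.floor(v + 0.5)) for u, p, v in raw if counts[u] >= k]

# Hypothetical raw data; k=2 is used here only to keep the example small.
raw = [("u1", "p1", 4.5), ("u1", "p2", 3.2), ("u2", "p1", 2.5)]
triples = preprocess(raw, k=2)
```

With `k=2`, user "u2" has only one score and is removed, and the two scores of "u1" become 5 and 3.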
3. The method for detecting malicious-scoring users in an online social media system as claimed in claim 2, characterized in that in step 1-1 the value of K is usually 8.
4. The method for detecting malicious-scoring users in an online social media system as claimed in claim 1, characterized in that the confidence vectors of the users in step 2 do not all have the same dimension and are stored in an XML file.
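
Claim 4 does not specify an XML layout. The sketch below shows one possible way to serialize variable-length confidence vectors with Python's standard `xml.etree.ElementTree` module; the element and attribute names (`users`, `user`, `credit`, `id`, `product`) are assumptions made for illustration only.

```python
import xml.etree.ElementTree as ET

def vectors_to_xml(vectors):
    """Serialize {user: {product: credit}} to an XML string (hypothetical schema)."""
    root = ET.Element("users")
    for user, vec in vectors.items():
        user_el = ET.SubElement(root, "user", id=user)
        # Each user element holds as many <credit> children as the user has scores.
        for product, credit in vec.items():
            ET.SubElement(user_el, "credit", product=product).text = repr(credit)
    return ET.tostring(root, encoding="unicode")

def xml_to_vectors(text):
    """Parse the XML produced by vectors_to_xml back into nested dicts."""
    root = ET.fromstring(text)
    return {
        u.get("id"): {c.get("product"): float(c.text) for c in u.findall("credit")}
        for u in root.findall("user")
    }

# Hypothetical vectors of different dimensions.
vectors = {"u1": {"p1": 0.75, "p2": 0.5}, "u2": {"p1": 0.75}}
xml_text = vectors_to_xml(vectors)
restored = xml_to_vectors(xml_text)
```

A nested layout of this kind suits vectors of unequal length, since each user element simply contains one child per scored product.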
5. The method for detecting malicious-scoring users in an online social media system as claimed in claim 1, characterized in that the concrete steps of step 3 are:
Step 3-1: calculate the mean and the variance of every user's confidence vector, then divide the mean by the magnitude of the variance to obtain the user's reliability;
Step 3-2: sort all users in ascending order of reliability and select the first N users to generate the candidate list of malicious-scoring users.
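
The reliability ranking of claim 5 might be sketched as follows. Two details are assumptions that the claim does not settle: the population variance is used, and a small constant `eps` is added to the denominator so that a user whose credit values are all equal (zero variance) does not cause a division by zero. The function names and sample vectors are hypothetical.

```python
from statistics import fmean, pvariance

def reliability(vec, eps=1e-9):
    """Step 3-1: mean of the confidence vector divided by its variance."""
    values = list(vec.values())
    # eps is an assumed guard against zero variance, not part of the claim.
    return fmean(values) / (pvariance(values) + eps)

def candidate_list(vectors, n):
    """Step 3-2: the n users with the lowest reliability, least reliable first."""
    ranked = sorted(vectors, key=lambda u: reliability(vectors[u]))
    return ranked[:n]

# Hypothetical confidence vectors for three users.
vectors = {
    "u1": {"p1": 0.8, "p2": 0.6},
    "u2": {"p1": 0.2, "p2": 0.6},
    "u3": {"p1": 0.7, "p2": 0.7},
}
candidates = candidate_list(vectors, 2)
```

Here "u2" has the lowest mean and the highest variance, so it is ranked as the least reliable user, followed by "u1".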
6. The method for detecting malicious-scoring users in an online social media system as claimed in claim 1, characterized in that the concrete steps of step 4 are:
Step 4-1: calculate the mean score of each product and regard this mean as the true quality of the product;
Step 4-2: for each user in the candidate list obtained in step 3, calculate the deviation from the true quality for each product, i.e. the difference between the user's score for the product and the true quality of that product;
Step 4-3: take the absolute value of each user's deviation for each product and average these values to obtain the user's scoring deviation;
Step 4-4: sort the users in descending order of scoring deviation and select the first M users as the final malicious-scoring users, generating the list of malicious-scoring users.
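
The re-ranking in claim 6 can be illustrated with the short Python sketch below. It assumes that the true quality of a product is the mean over all stored scores, including those of candidate users, which the claim does not state explicitly. The function name `final_list` and the toy data are hypothetical.

```python
from collections import defaultdict
from statistics import fmean

def final_list(triples, candidates, m):
    """Return the m candidates whose scores deviate most from product quality."""
    candidates = set(candidates)
    # Step 4-1: true quality of a product = mean of its scores.
    by_product = defaultdict(list)
    for _, p, v in triples:
        by_product[p].append(v)
    quality = {p: fmean(vs) for p, vs in by_product.items()}
    # Steps 4-2 and 4-3: mean absolute deviation per candidate user.
    deviations = defaultdict(list)
    for u, p, v in triples:
        if u in candidates:
            deviations[u].append(abs(v - quality[p]))
    score = {u: fmean(d) for u, d in deviations.items()}
    # Step 4-4: descending order of deviation, keep the first m users.
    return sorted(score, key=score.get, reverse=True)[:m]

# Hypothetical toy data and candidate list.
ratings = [
    ("u1", "p1", 5), ("u2", "p1", 5), ("u3", "p1", 5), ("u4", "p1", 1),
    ("u1", "p2", 4), ("u4", "p2", 2),
]
malicious = final_list(ratings, ["u1", "u4"], m=1)
```

In this data the true quality is 4 for "p1" and 3 for "p2", so "u1" has a scoring deviation of 1 and "u4" a scoring deviation of 2, which puts "u4" first.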
CN201410638173.6A 2014-11-13 2014-11-13 Method for detecting users who score maliciously in online social media system Pending CN104463601A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201410638173.6A CN104463601A (en) 2014-11-13 2014-11-13 Method for detecting users who score maliciously in online social media system

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201410638173.6A CN104463601A (en) 2014-11-13 2014-11-13 Method for detecting users who score maliciously in online social media system

Publications (1)

Publication Number Publication Date
CN104463601A true CN104463601A (en) 2015-03-25

Family

ID=52909593

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201410638173.6A Pending CN104463601A (en) 2014-11-13 2014-11-13 Method for detecting users who score maliciously in online social media system

Country Status (1)

Country Link
CN (1) CN104463601A (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103020482A (en) * 2013-01-05 2013-04-03 南京邮电大学 Relation-based spam comment detection method
US20140317732A1 (en) * 2013-04-22 2014-10-23 Facebook, Inc. Categorizing social networking system users based on user connections to objects

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
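董宇蔚 (Dong Yuwei): "电子商务中的评论挖掘及应用研究" (Research on review mining and its applications in e-commerce), 《万方学位论文》 (Wanfang dissertation database) *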

Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN105389505A (en) * 2015-10-19 2016-03-09 西安电子科技大学 Shilling attack detection method based on stack type sparse self-encoder
CN105389505B (en) * 2015-10-19 2018-06-12 西安电子科技大学 Support attack detection method based on the sparse self-encoding encoder of stack
CN106991425A (en) * 2016-01-21 2017-07-28 阿里巴巴集团控股有限公司 The detection method and device of commodity transaction quality
CN106991584A (en) * 2017-04-10 2017-07-28 山东科技大学 A kind of electronic commerce credits computational methods based on scoring person's impression
CN107689960A (en) * 2017-09-11 2018-02-13 南京大学 A kind of attack detection method for inorganization malicious attack
US10824721B2 (en) 2018-05-22 2020-11-03 International Business Machines Corporation Detecting and delaying effect of machine learning model attacks
CN109344176A (en) * 2018-09-05 2019-02-15 浙江工业大学 False comment detection method based on Two-way Cycle figure
CN111382435A (en) * 2018-12-28 2020-07-07 卡巴斯基实验室股份制公司 System and method for detecting sources of malicious activity in a computer system
CN111382435B (en) * 2018-12-28 2023-06-23 卡巴斯基实验室股份制公司 System and method for detecting source of malicious activity in computer system
CN111242647A (en) * 2020-01-20 2020-06-05 南京财经大学 Method for identifying malicious user based on E-commerce comment
CN112312169A (en) * 2020-11-20 2021-02-02 广州欢网科技有限责任公司 Method and equipment for checking program scoring validity

Similar Documents

Publication Publication Date Title
CN104463601A (en) Method for detecting users who score maliciously in online social media system
Nahar et al. Sentiment analysis for effective detection of cyber bullying
Cheng et al. Personalized click prediction in sponsored search
CN111882446B (en) Abnormal account detection method based on graph convolution network
CN102841946B (en) Commodity data retrieval ordering and Method of Commodity Recommendation and system
CN103514255B (en) A kind of collaborative filtering recommending method based on project stratigraphic classification
CN105389505B (en) Support attack detection method based on the sparse self-encoding encoder of stack
CN106991447A (en) A kind of embedded multi-class attribute tags dynamic feature selection algorithm
CN104156403B (en) A kind of big data normal mode extracting method and system based on cluster
CN103902545B (en) A kind of classification path identification method and system
CN108415913A (en) Crowd's orientation method based on uncertain neighbours
CN102156706A (en) Mentor recommendation system and method
CN105069072A (en) Emotional analysis based mixed user scoring information recommendation method and apparatus
CN103353880B (en) A kind of utilization distinctiveness ratio cluster and the data digging method for associating
CN109635010B (en) User characteristic and characteristic factor extraction and query method and system
CN110992059B (en) Surrounding string behavior recognition analysis method based on big data
CN110414780A (en) A kind of financial transaction negative sample generation method based on generation confrontation network
CN106156372A (en) The sorting technique of a kind of internet site and device
CN111738843B (en) Quantitative risk evaluation system and method using running water data
Rajamohana et al. An effective hybrid cuckoo search with harmony search for review spam detection
CN103366009A (en) Book recommendation method based on self-adaption clustering
Yin et al. Improved fake reviews detection model based on vertical ensemble tri-training and active learning
CN109977131A (en) A kind of house type matching system
Pei et al. Subgraph anomaly detection in financial transaction networks
Kumar et al. Friend Recommendation using graph mining on social media

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20150325