CN107480187A

CN107480187A - User value classification method and apparatus based on cluster analysis

Info

Publication number: CN107480187A
Application number: CN201710555480.1A
Authority: CN
Inventors: Wang Shuo (王硕); Zheng Kailun (郑凯伦)
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Jingdong Shangke Information Technology Co Ltd
Priority date: 2017-07-10
Filing date: 2017-07-10
Publication date: 2017-12-15

Abstract

The invention discloses a user value classification method and apparatus based on cluster analysis, relating to the field of computer technology. One embodiment of the method includes: determining an original feature vector of each user according to the user's index values under each index feature; performing factor analysis on the original feature vectors of the users, and taking the common factors whose factor scores meet a set rule as factor variables; determining a comprehensive feature vector of each user based on the user's values under the factor variables; and performing cluster analysis on the comprehensive feature vectors of the users to determine the category of each user. The embodiment avoids overly cumbersome index features, reduces the number of factors in the cluster analysis, and reduces the time cost and database resource consumption of user value classification; it avoids assigning weights to the index features empirically, which improves the accuracy of weight allocation and thus the accuracy of user value classification; and it improves the objectivity and accuracy of user value grouping.

Description

User value classification method and apparatus based on cluster analysis

Technical field

The present invention relates to the field of computer technology, and in particular to a user value classification method and device based on cluster analysis.

Background art

With the continuous development of relationship marketing theory, customer relationship management has received attention from more and more enterprises, and the focus of enterprise management has shifted from products to customers. Managing a huge customer base effectively has become an unavoidable problem for many enterprises. Effectively identifying customer value and characteristics, and carrying out targeted management on that basis, is the core issue of customer relationship management. Customers are the foundation of an enterprise's survival and development; as customer relationship management matures, acquiring and retaining high-quality customers has become a focus of enterprises and, accordingly, of marketing researchers. In research on customer acquisition and retention, how to measure customer value is a key problem. Most existing user value models multiply a user's index value under each index feature by an empirically assigned weight for that feature, sum the products to obtain the user's value score, and divide the scores at certain quantiles to obtain the user's final value group.

In the course of realizing the present invention, the inventors found that the prior art has at least the following problems:

Existing user value models select overly cumbersome index features, so every processing run consumes a large amount of time and database resources;

Weights are assigned to the index features empirically, so a feature's weight is easily set too large or too small, which makes the user value grouping inaccurate;

Existing user value models divide users into value groups at certain quantiles, which is overly subjective and arbitrary.

Summary of the invention

In view of this, embodiments of the present invention provide a user value classification method and device based on cluster analysis that can classify user value objectively and accurately.

To achieve the above object, according to one aspect of the embodiments of the present invention, a user value classification method based on cluster analysis is provided, comprising:

determining an original feature vector of each user according to the user's index values under each index feature;

performing factor analysis on the original feature vector of each user, and taking the common factors whose factor scores meet a set rule as factor variables; and determining a comprehensive feature vector of each user based on the user's values under the factor variables;

performing cluster analysis on the comprehensive feature vectors of the users to determine the category of each user.

Optionally, the common factors are sorted in descending order of factor score to obtain a common factor sequence;

the first n common factors in the common factor sequence are taken as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.

Optionally, before factor analysis is performed on the original feature vector of each user, the method further comprises:

identifying, through data cleaning, whether an index value is an outlier; and

if the index value is an outlier, rejecting the index value.

Optionally, the data cleaning is performed with a box plot.

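Optionally, before factor analysis is performed on the original feature vector of each user, the method further comprises: determining whether the index feature is a negative index; and

if the index feature is a negative index, performing positive transformation on the index values under the negative index.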

Optionally, positive transformation is performed on the index values under the negative index as follows:

obtaining the maximum index value under the negative index;

taking the difference between the maximum index value and the index value as the index value under the negative index after positive transformation.

Optionally, before factor analysis is performed on the original feature vector of each user, the method further comprises: standardizing the index values.

Optionally, the index values are standardized as follows:

obtaining the maximum value and the minimum value of an index feature;

taking the ratio of the difference between the index value and the minimum value to the difference between the maximum value and the minimum value as the standardized index value under that index feature.

Optionally, the number of cluster centers is determined using the elbow method.

Optionally, performing cluster analysis on the comprehensive feature vectors of the users comprises:

selecting the center value of the center point of each cluster;

updating the center value of each center point iteratively, where in each iteration: the distance between each user and each center point is determined based on the user's comprehensive feature vector and the center value of each center point, and each user is assigned to the class of the center point nearest to it; and the center value of each center point is updated;

if the center value of each center point remains unchanged before and after the update, the iteration ends.

According to another aspect of the present invention, a user value classification device based on cluster analysis is further provided, comprising:

an acquisition module, configured to determine the original feature vector of each user according to the user's index values under each index feature;

an analysis module, configured to perform factor analysis on the original feature vector of each user, take the common factors whose factor scores meet a set rule as factor variables, and determine the comprehensive feature vector of each user based on the user's values under the factor variables;

a clustering module, configured to perform cluster analysis on the comprehensive feature vectors of the users to determine the category of each user.

Optionally, the analysis module sorts the common factors in descending order of factor score to obtain a common factor sequence;

the analysis module takes the first n common factors in the common factor sequence as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.

Optionally, the user value classification device of the invention further comprises a cleaning module, configured to identify whether an index value is an outlier and, if it is, to reject the index value.

Optionally, the cleaning module performs the data cleaning with a box plot.

Optionally, the user value classification device of the invention further comprises a positive transformation module, configured to determine whether the index feature is a negative index and, if it is, to perform positive transformation on the index values under the negative index.

Optionally, the positive transformation module performs positive transformation on the index values under the negative index as follows:

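obtaining the maximum index value under the negative index;

taking the difference between the maximum index value and the index value as the index value under the negative index after positive transformation.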

Optionally, the user value classification device of the invention further comprises a standardization module, configured to standardize the index values.

Optionally, the standardization module standardizes the index values as follows:

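obtaining the maximum value and the minimum value of an index feature;

taking the ratio of the difference between the index value and the minimum value to the difference between the maximum value and the minimum value as the standardized index value under that index feature.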

Optionally, the clustering module determines the number of cluster centers using the elbow method.

Optionally, the clustering module performing cluster analysis on the comprehensive feature vectors of the users comprises:

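selecting the center value of the center point of each cluster;

updating the center value of each center point iteratively, where in each iteration: the distance between each user and each center point is determined based on the user's comprehensive feature vector and the center value of each center point, and each user is assigned to the class of the center point nearest to it; and the center value of each center point is updated;

if the center value of each center point remains unchanged before and after the update, the iteration ends.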

According to yet another aspect of the present invention, a user value classification terminal based on cluster analysis is provided, comprising:

one or more processors;

a storage device, configured to store one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the user value classification method based on cluster analysis of the present invention.

According to still another aspect of the present invention, a computer-readable medium is provided, on which a computer program is stored, wherein the program, when executed by a processor, implements the user value classification method based on cluster analysis of the present invention.

An embodiment of the above invention has the following advantages or beneficial effects:

By performing factor analysis on the original feature vector of each user and determining each user's comprehensive feature vector from the user's values under the factor variables, overly cumbersome index features are avoided, the number of factors in the cluster analysis is reduced, and the time cost and database resource consumption of user value classification are reduced;

By taking the common factors whose factor scores meet a set rule as factor variables, empirically assigning a weight to each index feature is avoided, which improves the accuracy of weight allocation and, in turn, the accuracy of user value classification;

By performing cluster analysis on the comprehensive feature vectors of the users, dividing users into value groups at certain quantiles is avoided, which improves the objectivity and accuracy of user value grouping.

Further effects of the above optional implementations are described below in connection with the specific embodiments.

Brief description of the drawings

The drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on it. In the drawings:

Fig. 1 is a schematic diagram of the main flow of a user value classification method based on cluster analysis according to an embodiment of the present invention;

Fig. 2 is a schematic diagram of the main steps of a user value classification method based on cluster analysis according to an embodiment of the present invention;

Fig. 3 is a schematic diagram of the main modules of a user value classification device based on cluster analysis according to an embodiment of the present invention;

Fig. 4 is a diagram of an exemplary system architecture to which an embodiment of the present invention can be applied;

Fig. 5 is a structural schematic diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.

Detailed description of the embodiments

Exemplary embodiments of the present invention are described below with reference to the drawings. They include various details of the embodiments to aid understanding, and these details should be regarded as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.

Fig. 1 is a schematic diagram of the main flow of a user value classification method based on cluster analysis according to an embodiment of the present invention. As shown in Fig. 1:

Step S101: first, determine the original feature vector of each user according to the user's index values under each index feature.

Those skilled in the art can decide how to select users and index features according to the actual situation and the purpose of the user value analysis. Taking an e-commerce platform as an example, accounts that have placed orders within the last year may be selected as users, and any one, two, or more of the following may be selected as index features:

Total consumption amount in the last year: the sum of the amounts of all orders the user validly completed in the last year;

Maximum order amount in the last year: the largest amount of a single order the user validly completed in the last year;

Average order value in the last year: the average value per order of the orders the user validly completed in the last year;

Number of orders in the last year: the total number of orders the user validly completed in the last year;

Number of first-level categories purchased in the last year: the number of first-level product categories in which the user validly completed purchases in the last year;

Discount rate of goods purchased in the last year: the discount rate of the goods the user validly purchased in the last year;

Overall gross margin in the last year: the aggregate gross margin of the goods the user validly purchased in the last year;

Returns and exchanges in the last year: the number of returns the user made in the last year;

Rejected orders in the last year: the number of orders the user rejected on delivery in the last year;

Login days: the cumulative number of days the user has logged in since registration;

Number of shared orders: the cumulative number of orders the user has shared (posted photo reviews of);

Number of followed products in the last year: the number of products the user followed in the last year;

Number of reviews in the last year: the number of reviews the user posted in the last year.
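Step S101 amounts to laying each user's index values out in one fixed feature order. A minimal Python sketch, assuming per-user index values are already available as a dictionary; the feature identifiers below are hypothetical and only stand in for some of the indicators listed above.

```python
from typing import Dict, List, Sequence

# Hypothetical identifiers for a few of the indicators above (illustration only).
FEATURES: List[str] = ["total_amount_1y", "order_count_1y", "return_count_1y", "login_days"]


def original_feature_vectors(records: Dict[str, Dict[str, float]],
                             features: Sequence[str] = FEATURES) -> Dict[str, List[float]]:
    """Map each user ID to its index values, always in the same feature order."""
    # A missing indicator raises KeyError instead of silently shifting columns.
    return {user: [float(values[f]) for f in features]
            for user, values in records.items()}
```

Keeping the order fixed matters because every later step (cleaning, standardization, factor analysis) treats column j of every vector as the same index feature.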

Of course, the user value classification method of the invention can also be applied to other fields. For example, in an analysis of China's social development, each province, autonomous region, or municipality directly under the central government can be treated as a user, and per-capita gross domestic product (GDP), annual per-capita disposable income of urban residents, newly added fixed assets, the number of institutions of higher education, the number of health and medical institutions, and so on can be used as index features.

When user value classification is performed, some data may be extracted from multiple business systems, so it is unavoidable that some data are erroneous or that some data conflict with each other; moreover, because of the data acquisition method or other reasons, the index values obtained under an index feature may be abnormal. For ease of description, the present invention refers to these abnormal index values as outliers. For example, when the number of orders placed is used as an index feature and a user maliciously places fake orders, that user's order count is an outlier; when a region's per-capita GDP is used as an index feature and the region's economic data contain a certain degree of statistical error, that region's per-capita GDP is an outlier; and when the amount of a chemical reagent added is used as an index feature and the analytical balance used to weigh the reagent has a large systematic error, the amounts determined with that balance are outliers; and so on. To avoid, as far as possible, the negative effects these outliers may have on user value classification, in some preferred embodiments, before factor analysis is performed on the original feature vector of each user, data cleaning is used to identify whether an index value is an outlier, and if it is, the index value is rejected.

The prior art screens outliers using the 3σ rule or the standard score (z-score) method. As is well known, the 3σ rule and the z-score method assume that the data follow a normal distribution, but real data often do not follow a normal distribution strictly. Moreover, both methods judge outliers on the basis of the mean and standard deviation of the values, which have very little resistance: the outliers themselves strongly influence these statistics, so the number of outliers detected this way will not exceed 0.7% of the total. Clearly, judging outliers this way in non-normal data has limited validity.
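The low resistance of the mean and standard deviation can be seen on a tiny example. The sketch below is a minimal Python version of the 3σ rule, assuming the population standard deviation; it is shown only to illustrate the weakness described above, not as part of the claimed method.

```python
import statistics
from typing import List


def three_sigma_outliers(values: List[float]) -> List[float]:
    """Return the values whose |z-score| exceeds 3 (population standard deviation)."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return []
    return [x for x in values if abs(x - mu) / sigma > 3]


# The extreme value 100 inflates both the mean and the standard deviation,
# so its own z-score is only about 2.0 and the 3-sigma rule misses it.
print(three_sigma_outliers([1, 2, 3, 4, 100]))  # prints []
```

With the population standard deviation, no z-score in a sample of n values can exceed the square root of n - 1, so for ten or fewer values this rule can never flag anything.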

In some preferred embodiments of the present invention, data cleaning is performed with a box plot. A box plot is drawn from the actual index values and does not require the index values of an index feature to follow any particular distribution in advance; it therefore imposes no restrictive requirements on the index values, shows the shape of the index values truthfully and intuitively, and identifies outliers objectively.

For example, for a given index feature, the index values under it may first be sorted in ascending order and divided into four equal parts. The value at the 25% position of the ascending sequence is defined as the first quartile (Q1), also called the "lower quartile"; the value at the 50% position is defined as the second quartile (Q2), also called the "median"; and the value at the 75% position is defined as the third quartile (Q3), also called the "upper quartile". The difference between the third quartile and the first quartile is called the interquartile range (IQR). Index values less than (Q1 - 1.5·IQR) or greater than (Q3 + 1.5·IQR) are then defined as outliers and rejected. The range that defines outliers can be set according to the actual situation, and the present invention does not specifically limit it. The above embodiment is based on quartiles and the interquartile range, and quartiles have a certain resistance: up to 25% of the index values can be moved arbitrarily far without greatly disturbing the quartiles. The quartile method therefore has a definite advantage in identifying outliers.
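The box-plot rule above can be sketched in a few lines of Python. This assumes linear interpolation between order statistics for the quartiles (`statistics.quantiles` with `method="inclusive"`); the patent does not fix a quartile convention, and the multiplier `k` stands for the adjustable 1.5 in the text.

```python
import statistics
from typing import List, Tuple


def iqr_bounds(values: List[float], k: float = 1.5) -> Tuple[float, float]:
    """Return (Q1 - k*IQR, Q3 + k*IQR) for the index values of one index feature."""
    # Assumed quartile convention: linear interpolation ("inclusive" method).
    q1, _q2, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def reject_outliers(values: List[float], k: float = 1.5) -> List[float]:
    """Keep only the index values that lie inside the box-plot whiskers."""
    low, high = iqr_bounds(values, k)
    return [x for x in values if low <= x <= high]
```

For example, `reject_outliers([1, 2, 3, 4, 100])` keeps `[1, 2, 3, 4]`: the quartiles are 2 and 4, so the upper bound is 7. The same value 100 is not caught by the 3σ rule on this data.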

For some index features, larger index values are better; for others they are not. For example, on an e-commerce platform, index values such as the number of returns and the number of rejected orders are better when smaller. For ease of description, the present invention defines these two kinds of index features as positive indexes and negative indexes, respectively. The invention does not limit which specific features are positive or negative: index features whose values are better when larger may be defined as positive indexes and those whose values are better when smaller as negative indexes, or, conversely, index features whose values are better when smaller may be defined as positive indexes and those whose values are better when larger as negative indexes. To avoid adverse effects that negative indexes may have on user value classification, before factor analysis is performed on the original feature vector of each user, it can be determined whether an index feature is a negative index, and if it is, positive transformation is performed on the index values under that negative index.

Those skilled in the art may perform positive transformation on a negative index by taking the opposite number of its index values; of course, other methods may also be used, and the present invention does not limit the positive transformation method. In some embodiments, positive transformation may be performed on the index values under a negative index as follows:

obtain the maximum index value max(x) under the negative index;

take the difference between the maximum index value max(x) and the index value x, x_new = max(x) - x, as the index value under the negative index after positive transformation.
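A minimal Python sketch of the max(x) - x transformation, assuming all index values of one negative index are held in a list. The smallest (best) original value becomes the largest transformed value, and the result is never negative, which the opposite-number alternative does not guarantee.

```python
from typing import List


def positivize(values: List[float]) -> List[float]:
    """Turn a negative index (smaller is better) into a positive one: x_new = max(x) - x."""
    m = max(values)
    return [m - x for x in values]


# Return counts of three users: the user with 0 returns now scores highest.
print(positivize([3, 0, 5]))  # prints [2, 5, 0]
```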

In some cases, the index values of some index features are inconvenient to compare. For example, for the same index feature, two index values measured in different units are hard to compare directly; likewise, for the same index feature, index values determined on a hundred-point scale are hard to compare directly with those determined on a ten-point scale; and so on. To make the index values of the same index feature, or of different index features, comparable, the method may preferably further comprise, before factor analysis is performed on the original feature vector of each user: standardizing the index values.

In some embodiments, min-max (deviation) standardization can be used, for example by standardizing the index values as follows: obtain the maximum value and the minimum value of an index feature; take the ratio of the difference between the index value and the minimum value to the difference between the maximum value and the minimum value as the standardized index value under that index feature. Min-max standardization is a linear transformation of the original index values. Let minA and maxA be the minimum and maximum values of index feature A; min-max standardization maps an original value x of index feature A to a value x' in the interval [0, 1] by the formula: x' = (x - minA) / (maxA - minA).
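The min-max formula can be sketched in Python as follows, applied to all index values of one index feature. The handling of a constant feature (maxA = minA, where the formula would divide by zero) is an assumption of this sketch; the patent does not address that case.

```python
from typing import List


def min_max(values: List[float]) -> List[float]:
    """Map the index values of one feature to [0, 1]: x' = (x - minA) / (maxA - minA)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Assumption: a constant feature carries no information, so map it to 0.0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]


print(min_max([2, 4, 6]))  # prints [0.0, 0.5, 1.0]
```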

In other embodiments, Z-score standardization (zero-mean normalization), also called standard-deviation standardization, can be used. After this standardization the data have a mean of 0 and a standard deviation of 1 (if the original data are normally distributed, the result follows the standard normal distribution). Its conversion function is:

$$x' = \frac{x - \mu}{\sigma}$$

where μ is the mean of all index values of an index feature, σ is the standard deviation of all index values of that index feature, x is the index value before standardization, and x' is the index value after standardization.

The above is only a brief list of standardization methods; those skilled in the art may choose other standardization methods according to the actual situation, and the present invention does not limit this.
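A minimal Python sketch of Z-score standardization for one index feature. It assumes the population standard deviation (`statistics.pstdev`); the sample standard deviation (`statistics.stdev`) is an equally common choice that the text does not rule out. A constant feature is mapped to zeros by assumption.

```python
import statistics
from typing import List


def z_score(values: List[float]) -> List[float]:
    """Standardize one index feature: x' = (x - mu) / sigma."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)  # assumption: population standard deviation
    if sigma == 0:
        return [0.0 for _ in values]
    return [(x - mu) / sigma for x in values]


print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))  # mean 5, sigma 2
```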

Step S102: perform factor analysis on the original feature vector of each user, and take the common factors whose factor scores meet a set rule as factor variables; determine the comprehensive feature vector of each user based on the user's values under the factor variables.

Existing user value classification models often screen a large number of index features. On the one hand, overly cumbersome index features consume a large amount of time and database resources in every processing run; on the other hand, with many index features, the correlations that more or less exist between the variables impair the clustering result, and having too many index features participate makes the subsequent clustering very complicated. The present invention applies the idea of dimensionality reduction: starting from the dependence structure within the correlation matrix of the original variables, factor analysis is performed on the original feature vector of each user, and each user's comprehensive feature vector is determined from the user's values under the factor variables. The more important comprehensive features selected in this way are used for cluster analysis, so that the original indexes are combined into fewer indexes and variables with intricate relationships are reduced to a few comprehensive factors. This avoids overly cumbersome index features, reduces the number of factors in the cluster analysis, and reduces the time cost and database resource consumption of user value classification.

In the prior art, weights are often assigned to the index features empirically, so weights set too large or too small easily make the user value grouping inaccurate. By taking the common factors whose factor scores meet a set rule as factor variables, the present invention avoids empirically assigning weights to the index features, which improves the accuracy of weight allocation and, in turn, the accuracy of user value classification.

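In some embodiments, the mathematical model of factor analysis is:

$$\begin{cases} X_1 = a_{11}F_1 + a_{12}F_2 + \cdots + a_{1k}F_k + \varepsilon_1 \\ X_2 = a_{21}F_1 + a_{22}F_2 + \cdots + a_{2k}F_k + \varepsilon_2 \\ \quad\vdots \\ X_p = a_{p1}F_1 + a_{p2}F_2 + \cdots + a_{pk}F_k + \varepsilon_p \end{cases}$$

The above model can also be expressed in matrix form as:

$$\begin{pmatrix} X_1 \\ X_2 \\ \vdots \\ X_p \end{pmatrix} = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1k} \\ a_{21} & a_{22} & \cdots & a_{2k} \\ \vdots & \vdots & \ddots & \vdots \\ a_{p1} & a_{p2} & \cdots & a_{pk} \end{pmatrix} \begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_k \end{pmatrix} + \begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix}$$

which is abbreviated as:

$$X = AF + \varepsilon$$

where X is the original variable matrix of the factor analysis, A is the factor loading matrix, F is the common factor matrix, p is the number of index features, k is the number of common factors (k ≤ p), and ε is the specific factor.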
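One way to estimate this model is sketched below with NumPy. The patent does not specify an extraction or scoring method, so this sketch assumes the principal component method (loadings from the eigenvectors of the correlation matrix), no factor rotation, and regression (Thomson) factor scores; the function name is hypothetical. Each row of the returned `F` is one user's values under the k common factors, and each entry of `contribution` is that factor's variance contribution ratio.

```python
import numpy as np


def principal_factor_analysis(X, k):
    """Estimate X = A F + eps with k common factors (principal component method).

    X: (n_users, p) index values with no constant column.
    Returns A (p x k loadings), contribution (k variance contribution ratios,
    descending) and F (n_users x k factor values, regression method).
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)      # z-score each index feature
    R = np.corrcoef(Z, rowvar=False)               # p x p correlation matrix
    eigvals, eigvecs = np.linalg.eigh(R)           # eigh returns ascending order
    order = np.argsort(eigvals)[::-1]
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    A = eigvecs[:, :k] * np.sqrt(eigvals[:k])      # loadings a_ij
    contribution = eigvals[:k] / eigvals.sum()     # eigvals sum to trace(R) = p
    F = Z @ np.linalg.solve(R, A)                  # regression scores: Z R^-1 A
    return A, contribution, F
```

When k = p the loadings reproduce the correlation matrix exactly (A Aᵀ = R); choosing k < p is what produces the dimensionality reduction described above.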

The factor score of each common factor can be regarded as the variance contribution ratio of that common factor; the higher the factor score, the more important the common factor. In an optional embodiment of the present invention, the common factors whose factor scores meet a set rule are taken as factor variables; for example, the common factors whose factor scores exceed a set threshold are taken as factor variables. Alternatively, the common factors may be sorted in descending order of factor score to obtain a common factor sequence, and the first n common factors in the sequence are taken as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers. The value of n can be determined according to the actual situation, for example n = 5. Provided the required accuracy of user value classification is met, appropriately reducing n reduces the computation and processing time of the cluster analysis and improves the efficiency of user value classification.
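Both selection rules can be sketched in Python, assuming the factor scores (variance contribution ratios) of the k common factors are already known; the function name and parameters are hypothetical. The returned indices say which columns of the users' factor values form their comprehensive feature vectors.

```python
from typing import List, Optional


def select_factors(scores: List[float], n: Optional[int] = None,
                   threshold: Optional[float] = None) -> List[int]:
    """Return the indices of the common factors kept as factor variables.

    scores: factor score (variance contribution ratio) of each common factor.
    Rule 1: keep the first n factors in descending score order (1 <= n < k).
    Rule 2: keep every factor whose score exceeds the threshold.
    """
    k = len(scores)
    # sorted() is stable, so tied factors keep their original order.
    order = sorted(range(k), key=lambda j: scores[j], reverse=True)
    if n is not None:
        if not 1 <= n < k:
            raise ValueError("n must satisfy 1 <= n < k")
        return order[:n]
    if threshold is not None:
        return [j for j in order if scores[j] > threshold]
    raise ValueError("specify either n or threshold")


print(select_factors([0.10, 0.45, 0.05, 0.30, 0.10], n=2))  # prints [1, 3]
```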

Step S103: perform cluster analysis on the comprehensive feature vectors of the users to determine the category of each user. Clustering is the process of classifying data objects into different classes or clusters: data objects in the same class are highly similar, while data objects in different classes differ greatly. Clustering the users' comprehensive feature vectors avoids dividing users into value groups at certain quantiles and improves the objectivity and accuracy of user value grouping.

To assign data objects to classes, cluster analysis methods such as partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods can be used. Each of these kinds of method has widely used clustering algorithms, for example the K-means clustering algorithm among partitioning methods, agglomerative hierarchical clustering among hierarchical methods, and neural-network-based clustering algorithms among model-based methods. Of course, to avoid rigidly assigning a data object to exactly one class, fuzzy cluster analysis can also be used; for example, a fuzzy clustering algorithm determines, through a membership function, the degree to which each data object belongs to each class. Those skilled in the art can choose a suitable clustering method and algorithm according to the actual situation, and the present invention does not specifically limit this.
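As one concrete example of a membership function, the sketch below uses the standard fuzzy c-means membership with fuzzifier m > 1. Fuzzy c-means is named here only as an illustration; the patent does not specify which fuzzy clustering algorithm is used. Memberships of one data object across all classes sum to 1.

```python
import math
from typing import List, Sequence


def fcm_memberships(point: Sequence[float], centers: Sequence[Sequence[float]],
                    m: float = 2.0) -> List[float]:
    """Fuzzy c-means membership of one data object in each class (m > 1)."""
    d = [math.dist(point, c) for c in centers]
    zeros = [j for j, dj in enumerate(d) if dj == 0.0]
    if zeros:
        # The object sits exactly on one or more centers: share membership among them.
        return [1.0 / len(zeros) if j in zeros else 0.0 for j in range(len(d))]
    p = 2.0 / (m - 1.0)
    # u_j = 1 / sum_l (d_j / d_l) ** (2 / (m - 1))
    return [1.0 / sum((dj / dl) ** p for dl in d) for dj in d]


print(fcm_memberships((1.0, 0.0), [(0.0, 0.0), (3.0, 0.0)]))  # about [0.8, 0.2]
```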

The number of cluster centers can be determined from empirical values or obtained through model training. In an optional embodiment of the present invention, the number K of cluster centers is determined using the elbow method. For example, a range of candidate values for K can be estimated in advance, typically 2 to 15, and the cost function value for each K plotted. The cost function is the sum of the distortions of the classes and can be written as:

$$J = \sum_{m=1}^{K} \sum_{y_i \in C_m} \lVert y_i - \mu_m \rVert^2$$

where C_m is the set of all points in the m-th class, y_i is any point in the m-th class, and μ_m is the center position of the m-th class. The distortion of each class equals the sum of the squared distances between the class center and each point inside the class. The more compact the points inside a class, the smaller its distortion; conversely, the more scattered the members of a class, the larger its distortion. As K increases the average distortion decreases, and the K value at which this decrease slows down most sharply (the bend, or "elbow", of the curve) is taken as the number of cluster centers.
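A minimal Python sketch of the cost function J and of one way to read the elbow off the curve. Picking the K with the largest second difference, J(K-1) - 2J(K) + J(K+1), is an assumed heuristic; the patent only says the elbow is read from the cost curve. The candidate K values are assumed consecutive, and the first and last candidates cannot be selected.

```python
import math
from typing import Dict, Sequence


def distortion(points: Sequence[Sequence[float]], labels: Sequence[int],
               centers: Sequence[Sequence[float]]) -> float:
    """Cost J: sum of squared distances from each point to its class center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def elbow_k(costs: Dict[int, float]) -> int:
    """Pick K where the drop in cost slows down most (largest second difference)."""
    ks = sorted(costs)
    if len(ks) < 3:
        raise ValueError("need at least three consecutive K values")
    best_k, best_bend = ks[1], -math.inf
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        bend = (costs[prev] - costs[k]) - (costs[k] - costs[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k


print(elbow_k({1: 100, 2: 80, 3: 20, 4: 15, 5: 12, 6: 10}))  # prints 3
```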

In an optional embodiment, performing cluster analysis on the comprehensive feature vectors of the users includes:

selecting the center value of the center point of each cluster;

updating the center value of each center point iteratively, where in each iteration: the distance between each user and each center point is determined based on the user's comprehensive feature vector and the center value of each center point, and each user is assigned to the class of the center point nearest to it; and the center value of each center point is updated;

if the center value of each center point remains unchanged before and after the update, the iteration ends.

In the iterative process of the above embodiment, "the center value remains unchanged" may mean that the center values before and after an iteration are equal, or that they differ by no more than a preset range.
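The iteration above matches the standard K-means (Lloyd) loop. A minimal Python sketch, under stated assumptions: Euclidean distance, initial centers supplied by the caller (the text does not say how they are chosen), `tol` playing the role of the preset range (0.0 means "exactly unchanged"), an empty class keeping its old center, and `max_iter` as a safety cap.

```python
import math
from typing import List, Sequence, Tuple

Point = Tuple[float, ...]


def kmeans(points: Sequence[Sequence[float]], initial_centers: Sequence[Sequence[float]],
           tol: float = 0.0, max_iter: int = 100) -> Tuple[List[int], List[Point]]:
    """Cluster comprehensive feature vectors; return (labels, final centers)."""
    centers: List[Point] = [tuple(float(v) for v in c) for c in initial_centers]
    labels: List[int] = []
    for _ in range(max_iter):
        # Assign each user to the class of the nearest center (Euclidean distance).
        labels = [min(range(len(centers)), key=lambda j: math.dist(p, centers[j]))
                  for p in points]
        # Update each center to the mean of its members; an empty class keeps its center.
        new_centers: List[Point] = []
        for j, old in enumerate(centers):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:
                new_centers.append(old)
        shift = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        # Stop when no center moved by more than the preset range.
        if shift <= tol:
            break
    return labels, centers


labels, centers = kmeans([(0, 0), (0, 1), (10, 10), (10, 11)], [(0, 0), (10, 10)])
print(labels, centers)  # [0, 0, 1, 1] [(0.0, 0.5), (10.0, 10.5)]
```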

Fig. 2 is a schematic diagram of the main steps of a user value classification method based on cluster analysis according to an embodiment of the present invention. As shown in Fig. 2, the steps include:

1. Obtain the index values of each user under each index feature;

2. Determine whether the index feature is a negative index: if so, perform positive transformation on the index values under that index feature and then proceed to the next step; if not, proceed directly to the next step;

3. Standardize each index value;

4. Determine whether the standardized index value is an outlier: if so, reject the index value and then proceed to the next step; if not, proceed directly to the next step;

5. Determine the original feature vector of each user;

6. Perform factor analysis on the original feature vector of each user, take the common factors whose factor scores meet a set rule as factor variables, and determine the comprehensive feature vector of each user based on the user's values under the factor variables;

7. Perform cluster analysis on the comprehensive feature vectors of the users to determine the category of each user.

Fig. 3 is a schematic diagram of the main modules of a user value classification device 300 based on cluster analysis according to an embodiment of the present invention. As shown in Fig. 3, the device includes:

Acquisition module 301, configured to determine the original feature vector of each user according to the user's indicator values under each indicator feature;

Analysis module 305, configured to perform factor analysis on the original feature vectors of the users, take the common factors whose factor scores satisfy a set rule as factor variables, and determine each user's composite feature vector based on the user's values under the factor variables;

Cluster module 306, configured to perform cluster analysis on the composite feature vectors of the users to determine the class of each user.

Order-sharing count: the cumulative number of purchased orders the user has shared publicly;

Of course, the user value classification device of the present invention can also be applied to other fields. Taking an analysis of China's social development as an example, each province, autonomous region or municipality directly under the central government can be treated as a user, with GDP per capita, per-capita annual disposable income of urban residents, newly added fixed assets, the number of institutions of higher education, the number of health and medical institutions and the like serving as indicator features.

When user value classification is performed, some data may be extracted from multiple business systems, so it is unavoidable that some records are erroneous or conflict with one another; moreover, because of the way the data are collected or for other reasons, the acquired indicator values may also be abnormal. For ease of description, the present invention refers to such abnormal indicator values as outliers. For example, when the number of orders placed is used as an indicator feature and a user maliciously inflates orders (order brushing), that user's order count is an outlier; when the GDP per capita of a region is used as an indicator feature and the region's economic data contain a certain degree of statistical error, that region's GDP per capita is an outlier; and when the added amount of a chemical reagent is used as an indicator feature and the analytical balance used to weigh the reagent has a large systematic error, the added amount measured with that balance is an outlier, and so on. To minimize the possible negative impact of these outliers on user value classification, in some preferred embodiments the user value classification device of the present invention further includes: a cleaning module 304, configured to perform data cleaning, identify whether an indicator value is an outlier, and discard the indicator value if it is an outlier.

In some preferred embodiments of the present invention, the cleaning module 304 performs data cleaning by means of a box plot. A box plot is drawn from the actual indicator values and does not require the indicator values to be assumed in advance to follow a particular distribution. It therefore imposes no restrictive requirements on the indicator values, shows their true shape intuitively, and identifies outliers objectively.
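A minimal sketch of box-plot cleaning for one indicator feature, assuming NumPy. It uses the conventional Tukey fences (1.5 times the interquartile range beyond the first and third quartiles); the text does not state the whisker length, so `whisker=1.5` is an assumption, as are the function names and the sample data.

```python
import numpy as np

def boxplot_outlier_mask(values, whisker=1.5):
    """True where a value lies outside the fences Q1 - w*IQR and Q3 + w*IQR."""
    v = np.asarray(values, dtype=float)
    q1, q3 = np.percentile(v, [25, 75])
    iqr = q3 - q1
    lower, upper = q1 - whisker * iqr, q3 + whisker * iqr
    return (v < lower) | (v > upper)

def drop_outliers(values, whisker=1.5):
    """Return the indicator values with box-plot outliers discarded."""
    v = np.asarray(values, dtype=float)
    return v[~boxplot_outlier_mask(v, whisker)]

# Hypothetical order counts; the last user looks like order brushing.
orders = [10, 12, 11, 13, 12, 11, 500]
print(drop_outliers(orders))
```

Discarding a value leaves a gap in that user's original feature vector; whether the user is then dropped or the value is imputed is not specified in the text.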

For some indicator features, larger values are better, while for others larger values are not better, for example the number of returns or the number of rejected deliveries when the method is applied to an e-commerce platform. For ease of description, the present invention defines these two kinds of indicator features as positive indicators and negative indicators respectively, and does not specifically limit which kind is which. For example, indicator features for which larger values are better may be defined as positive indicators and those for which smaller values are better as negative indicators; of course, the smaller-is-better features may instead be defined as positive indicators and the larger-is-better features as negative indicators. To avoid possible adverse effects of negative indicators on user value classification, the user value classification device of the present invention may further include: a positivization module 302, configured to determine whether an indicator feature is a negative indicator and, if so, to positivize the indicator values under that negative indicator.

Those skilled in the art may positivize a negative indicator by taking the opposite (negated) value of each of its indicator values; of course, other positivization methods may also be used, and the present invention does not specifically limit the positivization method. In some embodiments, the positivization module 302 may positivize the indicator values under a negative indicator as follows:

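Obtain the maximum indicator value max(x) under the negative indicator;

Take the difference max(x) - x between this maximum and each indicator value x as the positivized indicator value under the negative indicator.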
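The max-minus-value positivization above fits in a few lines. This sketch assumes NumPy and one indicator column at a time; the function name and the sample return counts are hypothetical.

```python
import numpy as np

def positivize(values):
    """Positivize a negative indicator: x -> max(x) - x, so larger becomes better."""
    v = np.asarray(values, dtype=float)
    return v.max() - v

# Hypothetical return counts for three users: fewer returns now map to larger values.
print(positivize([3, 0, 5]))
```

Unlike plain negation (-x), this keeps every positivized value non-negative, with the worst user mapped to 0.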

In some cases, the indicator values of some indicator features are inconvenient to compare. For example, for the same indicator feature, two values measured in different dimensions (units) are difficult to compare directly; similarly, for the same indicator feature, a value on a 100-point scale is difficult to compare directly with a value on a 10-point scale; and so on. To make the indicator values of the same indicator feature, or of different indicator features, comparable, the user value classification device of the present invention may preferably further include: a standardization module 303, configured to standardize the indicator values.

In some embodiments, the standardization module 303 may use min-max (deviation) standardization, for example standardizing the indicator values as follows: obtain the maximum and minimum values of an indicator feature, and take the ratio of the difference between the indicator value and the minimum to the difference between the maximum and the minimum as the standardized indicator value under that feature. Min-max standardization is a linear transformation of the original indicator values. If minA and maxA are the minimum and maximum of indicator feature A, min-max standardization maps an original value x of A to a value x' in the interval [0, 1] by the formula x' = (x - minA)/(maxA - minA).
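A minimal NumPy sketch of the min-max formula x' = (x - minA)/(maxA - minA) for one indicator feature. The handling of a constant indicator (maxA = minA, where the formula would divide by zero) is an assumption the text does not address; the function name is hypothetical.

```python
import numpy as np

def min_max_standardize(values):
    """Map indicator values linearly onto [0, 1]: x' = (x - minA) / (maxA - minA)."""
    v = np.asarray(values, dtype=float)
    lo, hi = v.min(), v.max()
    if hi == lo:
        # Assumption: a constant indicator carries no information, so map it to 0.
        return np.zeros_like(v)
    return (v - lo) / (hi - lo)

print(min_max_standardize([2, 4, 6]))
```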

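In other embodiments, the standardization module 303 may use Z-score standardization (zero-mean normalization), also called standard-deviation standardization. After this standardization the data have mean 0 and standard deviation 1 (and follow the standard normal distribution if the original data were normally distributed). The conversion function is:

$$x^{*}=\frac{x-\mu}{\sigma}$$

where μ and σ are the mean and standard deviation of the indicator values under the indicator feature.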
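A minimal NumPy sketch of Z-score standardization for one indicator feature. It uses the population standard deviation (`ddof=0`); the text does not say whether the population or sample standard deviation is meant, so that choice, the constant-indicator handling and the function name are assumptions.

```python
import numpy as np

def z_score_standardize(values):
    """x* = (x - mean) / std, using the population standard deviation (ddof=0)."""
    v = np.asarray(values, dtype=float)
    sigma = v.std()  # ddof=0; ddof=1 (sample std) is an equally common choice
    if sigma == 0:
        return np.zeros_like(v)  # assumption: a constant indicator maps to 0
    return (v - v.mean()) / sigma

z = z_score_standardize([1, 2, 3, 10])
print(z.mean(), z.std())
```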
The above merely lists some standardization methods that the standardization module 303 may use; those skilled in the art may select other standardization methods according to the actual situation, and the present invention does not specifically limit this.

In existing user value classification models, a large number of indicator features are often screened. On the one hand, the indicator features are overly cumbersome, so each round of processing consumes a great deal of time and database resources; on the other hand, with many indicator features, correlations between the variables degrade the clustering result, and too many indicator features make the subsequent clustering very complex. The present invention applies the idea of dimensionality reduction: based on the dependency structure within the correlation matrix of the original variables, factor analysis is performed on each user's original feature vector, and each user's composite feature vector is determined from the user's values under the factor variables. Cluster analysis is then performed on these more important composite feature vectors, so that the original indicators are synthesized into fewer indicators and variables with intricate relationships are attributed to a small number of composite factors. This keeps the indicator features from becoming overly cumbersome, reduces the number of factors in the cluster analysis, and lowers the time cost and database resource consumption of user value classification.

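In some embodiments, the mathematical model of factor analysis is:

$$\begin{cases} X_1 = a_{11}F_1 + a_{12}F_2 + \cdots + a_{1k}F_k + \varepsilon_1 \\ X_2 = a_{21}F_1 + a_{22}F_2 + \cdots + a_{2k}F_k + \varepsilon_2 \\ \quad\vdots \\ X_p = a_{p1}F_1 + a_{p2}F_2 + \cdots + a_{pk}F_k + \varepsilon_p \end{cases}$$

which can be abbreviated as:

$$X = AF + \varepsilon$$

where X = (X_1, ..., X_p)^T are the p standardized original indicator variables, F = (F_1, ..., F_k)^T are the k common factors, A = (a_{ij}) is the p × k factor loading matrix, and ε = (ε_1, ..., ε_p)^T are the specific factors.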
The factor score of each common factor can be regarded as the variance contribution ratio of that common factor: the higher the factor score, the more important the common factor. In an optional embodiment of the present invention, the analysis module 305 takes the common factors whose factor scores satisfy a set rule as factor variables; for example, it takes the common factors whose factor scores exceed a set threshold. Alternatively, the analysis module 305 may sort the common factors in descending order of factor score to obtain a common factor sequence, and take the first n common factors in the sequence as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers. The value of n can be determined according to the actual situation, for example n = 5. Provided that the accuracy of user value classification is satisfied, appropriately reducing n reduces the computation and processing time of the cluster analysis and improves the efficiency of user value classification.
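A minimal NumPy sketch of the top-n selection rule, offered as an illustration rather than the patent's exact procedure. It extracts candidate common factors by the principal-component method (eigen-decomposition of the correlation matrix of the original indicators), uses each factor's share of total variance as its "factor score" in the sense of the paragraph above, keeps the first n, and computes each user's values under the kept factors with the regression method F = Z R^{-1} A. The extraction method, the absence of factor rotation, the synthetic data and the function name are all assumptions.

```python
import numpy as np

def select_factors(X, n):
    """Return (composite feature vectors, variance contribution ratios of kept factors).

    X: shape (n_users, p), one original feature vector per user.
    n: number of common factors to keep, 1 <= n < k (here k = p candidate factors).
    """
    X = np.asarray(X, dtype=float)
    k = X.shape[1]
    if not 1 <= n < k:
        raise ValueError("n must satisfy 1 <= n < k")
    Z = (X - X.mean(axis=0)) / X.std(axis=0)       # standardized indicators
    R = np.corrcoef(X, rowvar=False)               # correlation matrix of the indicators
    eigvals, eigvecs = np.linalg.eigh(R)           # eigh returns ascending eigenvalues
    order = np.argsort(eigvals)[::-1]              # sort factors high to low
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / k                     # variance contribution ratio per factor
    loadings = eigvecs[:, :n] * np.sqrt(eigvals[:n])   # A: p x n loading matrix
    scores = Z @ np.linalg.solve(R, loadings)      # regression-method factor values
    return scores, contribution[:n]

rng = np.random.default_rng(0)
latent = rng.normal(size=(200, 2))
# Hypothetical data: 6 indicators driven by 2 underlying factors plus noise.
X = latent @ rng.normal(size=(2, 6)) + 0.1 * rng.normal(size=(200, 6))
scores, ratios = select_factors(X, n=2)
print(scores.shape, ratios.round(3))
```

Each row of `scores` would serve as a user's composite feature vector for the cluster analysis. Libraries such as scikit-learn's `FactorAnalysis` provide a maximum-likelihood alternative to this extraction.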

The cluster module 306 performs cluster analysis on the composite feature vectors of the users to determine the class of each user. Clustering is the process of grouping data objects into different classes or clusters such that data objects in the same class are highly similar while data objects in different classes differ greatly. Clustering the composite feature vectors of the users avoids grouping user value by fixed quantiles and improves the objectivity and accuracy of user value grouping.

To assign data objects to classes, cluster analysis methods such as partitioning methods, hierarchical methods, density-based methods, grid-based methods and model-based methods can be used, and each of these families has widely used clustering algorithms, for example the K-means clustering algorithm among partitioning methods, agglomerative hierarchical clustering among hierarchical methods, and neural-network-based clustering among model-based methods. Of course, to avoid rigidly assigning a data object to a single class, fuzzy cluster analysis can also be used, for example a fuzzy clustering algorithm in which a membership function determines the degree to which each data object belongs to each class. Those skilled in the art may select suitable clustering methods and algorithms according to the actual situation, and the present invention does not specifically limit this.
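The text does not name a specific fuzzy clustering algorithm. As one illustration, the sketch below computes the membership matrix used by fuzzy c-means, where user i's degree of membership in class j is u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m-1)), with d the Euclidean distance to each class center and m > 1 the fuzzifier. Only the membership step is shown (full fuzzy c-means alternates it with a center update); the choice of fuzzy c-means, the default m = 2 and the function name are assumptions.

```python
import numpy as np

def fcm_memberships(points, centers, m=2.0):
    """Fuzzy c-means membership of each point (row) in each class (column); rows sum to 1."""
    x = np.asarray(points, dtype=float)
    c = np.asarray(centers, dtype=float)
    d = np.linalg.norm(x[:, None, :] - c[None, :, :], axis=2)
    u = np.empty_like(d)
    for i, row in enumerate(d):
        zero = row == 0
        if zero.any():
            # A point sitting exactly on a center belongs fully to that center.
            u[i] = zero / zero.sum()
        else:
            ratio = (row[:, None] / row[None, :]) ** (2.0 / (m - 1.0))
            u[i] = 1.0 / ratio.sum(axis=1)
    return u

u = fcm_memberships([[0, 0], [5, 0], [1, 0]], [[0, 0], [10, 0]])
print(u.round(3))
```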

The number of cluster centers may be set from experience or obtained through model training. In an optional embodiment of the present invention, the cluster module 306 determines the number K of cluster centers with the elbow rule. For example, a range of candidate values for K, typically 2 to 15, can be estimated in advance, and the value of the cost function is then plotted for each K. The cost function is the sum of the distortions of all classes and can be written as:

$$J=\sum_{m=1}^{K}\sum_{y_i\in C_m}\lVert y_i-\mu_m\rVert^2$$
Optionally, the cluster module 306 clusters the composite feature vectors of the users as follows:

Select an initial center value for the center point of each cluster;

Fig. 4 shows an exemplary system architecture 400 to which the user value classification method or the user value classification device of an embodiment of the present invention can be applied.

As shown in Fig. 4, the system architecture 400 may include terminal devices 401, 402, 403, a network 404 and a server 405. The network 404 is the medium providing communication links between the terminal devices 401, 402, 403 and the server 405. The network 404 may include various connection types, such as wired or wireless communication links or fiber-optic cables.

A user may use the terminal devices 401, 402, 403 to interact with the server 405 through the network 404 to receive or send messages and the like. Various communication client applications may be installed on the terminal devices 401, 402, 403, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients and social platform software.

The terminal devices 401, 402, 403 may be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers and desktop computers.

The server 405 may be a server providing various services, for example a back-end management server that supports the shopping websites browsed by users on the terminal devices 401, 402, 403. The back-end management server may analyze and otherwise process received data such as product information query requests, and feed the processing results back to the terminal devices.

It should be noted that the user value classification method provided by the embodiments of the present invention is generally executed by the server 405; accordingly, the user value classification device is generally disposed in the server 405.

It should be understood that the numbers of terminal devices, networks and servers in Fig. 4 are merely illustrative. There may be any number of terminal devices, networks and servers as required by the implementation.

One or more processors;

A storage device, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the user value classification method based on cluster analysis of the present invention.

Referring now to Fig. 5, a schematic structural diagram of a computer system 500 suitable for implementing a terminal device of an embodiment of the present invention is shown. The terminal device shown in Fig. 5 is merely an example and should not impose any limitation on the functions or scope of use of the embodiments of the present invention.

As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processes according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage section 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502 and the RAM 503 are connected to one another through a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.

The following components are connected to the I/O interface 505: an input section 506 including a keyboard, a mouse and the like; an output section 507 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage section 508 including a hard disk and the like; and a communication section 509 including a network interface card such as a LAN card or a modem. The communication section 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disk or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage section 508 as needed.

In particular, according to the embodiments disclosed in the present invention, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, the disclosed embodiments include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for performing the methods shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network through the communication section 509 and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the system of the present invention are performed.

It should be noted that the computer-readable medium shown in the present invention may be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium may be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared or semiconductor system, apparatus or device, or any combination of the above. More specific examples of computer-readable storage media may include, but are not limited to: an electrical connection with one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium may be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus or device. In the present invention, a computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal may take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium may also be any computer-readable medium other than a computer-readable storage medium that can send, propagate or transmit a program for use by or in connection with an instruction execution system, apparatus or device. Program code contained on a computer-readable medium may be transmitted over any appropriate medium, including but not limited to wireless, wire, optical cable, RF and the like, or any suitable combination of the above.

The flowcharts and block diagrams in the accompanying drawings illustrate the possible architectures, functions and operations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram may represent a module, a program segment or a part of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks may occur in an order different from that noted in the drawings. For example, two blocks shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should further be noted that each block of the block diagrams or flowcharts, and combinations of blocks therein, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.

The modules involved in the embodiments of the present invention may be implemented in software or in hardware. The described modules may also be provided in a processor; for example, it may be described as: a processor including a sending module, an acquisition module, a determining module and a first processing module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the sending module may also be described as "a module that sends a picture acquisition request to a connected server".

In another aspect, the present invention further provides a computer-readable medium, which may be included in the device described in the above embodiments or may exist separately without being assembled into the device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to implement the user value classification method based on cluster analysis of the present invention.

The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, improvement and the like made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.

Claims

1. A user value classification method based on cluster analysis, characterized by comprising:

determining an original feature vector of each user according to the user's indicator values under each indicator feature;

performing factor analysis on the original feature vectors of the users, and taking the common factors whose factor scores satisfy a set rule as factor variables; determining a composite feature vector of each user based on the user's values under the factor variables;

performing cluster analysis on the composite feature vectors of the users to determine the class of each user.
2. The user value classification method according to claim 1, characterized in that:

the common factors are sorted in descending order of factor score to obtain a common factor sequence;

the first n common factors in the common factor sequence are taken as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.
3. The user value classification method according to claim 1, characterized by further comprising, before performing factor analysis on the original feature vectors of the users:

identifying, by data cleaning, whether the indicator value is an outlier, and

if the indicator value is an outlier, discarding the indicator value.
4. The user value classification method according to claim 3, characterized in that the data cleaning is performed by means of a box plot.
5. The user value classification method according to claim 1, characterized by further comprising, before performing factor analysis on the original feature vectors of the users:

determining whether the indicator feature is a negative indicator, and

if the indicator feature is a negative indicator, positivizing the indicator values under the negative indicator.
6. The user value classification method according to claim 5, characterized in that the indicator values under the negative indicator are positivized as follows:

obtaining the maximum indicator value under the negative indicator;

taking the difference between the maximum indicator value and the indicator value as the positivized indicator value under the negative indicator.
7. The user value classification method according to claim 1, characterized by further comprising, before performing factor analysis on the original feature vectors of the users:

standardizing the indicator values.
8. The user value classification method according to claim 7, characterized in that the indicator values are standardized as follows:

obtaining the maximum value and the minimum value of an indicator feature;

taking the ratio of the difference between the indicator value and the minimum value to the difference between the maximum value and the minimum value as the standardized indicator value under that indicator feature.
9. The user value classification method according to claim 1, characterized in that the number of cluster centers is determined using the elbow rule.
10. The user value classification method according to claim 1, characterized in that performing cluster analysis on the composite feature vectors of the users comprises:

selecting an initial center value for the center point of each cluster;

updating the center value of each center point iteratively, wherein in each iteration: the distance between each user and each center point is determined based on the composite feature vector of each user and the center value of each center point, and each user is assigned to the class of the center point nearest to it; and the center value of each center point is updated;

if the center value of each center point remains unchanged before and after the update, the iteration terminates.
11. A user value classification device based on cluster analysis, characterized by comprising:

an acquisition module, configured to determine an original feature vector of each user according to the user's indicator values under each indicator feature;

an analysis module, configured to perform factor analysis on the original feature vectors of the users, take the common factors whose factor scores satisfy a set rule as factor variables, and determine a composite feature vector of each user based on the user's values under the factor variables;

a cluster module, configured to perform cluster analysis on the composite feature vectors of the users to determine the class of each user.
12. The user value classification device according to claim 11, characterized in that:

the analysis module sorts the common factors in descending order of factor score to obtain a common factor sequence;

the analysis module takes the first n common factors in the common factor sequence as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.
13. The user value classification device according to claim 11, characterized by further comprising: a cleaning module, configured to identify whether the indicator value is an outlier and, if the indicator value is an outlier, to discard the indicator value.
14. The user value classification device according to claim 13, characterized in that the cleaning module performs data cleaning by means of a box plot.
15. The user value classification device according to claim 11, characterized by further comprising: a positivization module, configured to determine whether the indicator feature is a negative indicator and, if the indicator feature is a negative indicator, to positivize the indicator values under the negative indicator.
16. The user value classification device according to claim 15, characterized in that the positivization module positivizes the indicator values under the negative indicator as follows:

obtaining the maximum indicator value under the negative indicator;

taking the difference between the maximum indicator value and the indicator value as the positivized indicator value under the negative indicator.
17. The user value classification device according to claim 11, characterized by further comprising: a standardization module, configured to standardize the indicator values.
18. The user value classification device according to claim 17, characterized in that the standardization module standardizes the indicator values as follows:

obtaining the maximum value and the minimum value of an indicator feature;

taking the ratio of the difference between the indicator value and the minimum value to the difference between the maximum value and the minimum value as the standardized indicator value under that indicator feature.
19. The user value classification device according to claim 11, characterized in that the cluster module determines the number of cluster centers using the elbow rule.
20. The user value classification device according to claim 11, characterized in that performing cluster analysis on the composite feature vectors of the users by the cluster module comprises:

selecting an initial center value for the center point of each cluster;

updating the center value of each center point iteratively, wherein in each iteration: the distance between each user and each center point is determined based on the composite feature vector of each user and the center value of each center point, and each user is assigned to the class of the center point nearest to it; and the center value of each center point is updated;

if the center value of each center point remains unchanged before and after the update, the iteration terminates.
21. A user value classification terminal based on cluster analysis, characterized by comprising:

one or more processors;

a storage device, configured to store one or more programs,

wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 10.
22. A computer-readable medium storing a computer program, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 10.