CN107480187A - User value classification method and apparatus based on cluster analysis
- Publication number: CN107480187A
- Application number: CN201710555480.1A
- Authority: CN (China)
- Prior art keywords: user, value, indicator, indicator value, factor
- Legal status: Pending (the listed status is an assumption, not a legal conclusion)
Classifications
- G06F16/285: Clustering or classification (G06F16/00 Information retrieval; database structures therefor; file system structures therefor > G06F16/28 Databases characterised by their database models > G06F16/284 Relational databases)
- G06F18/23213: Non-hierarchical clustering techniques using statistics or function optimisation with a fixed number of clusters, e.g. K-means clustering (G06F18/00 Pattern recognition > G06F18/23 Clustering techniques)
- G06Q30/0201: Market modelling; Market analysis; Collecting market data (G06Q30/00 Commerce > G06Q30/02 Marketing; Price estimation or determination; Fundraising)
Landscapes
- Engineering & Computer Science (AREA)
- Theoretical Computer Science (AREA)
- Data Mining & Analysis (AREA)
- Business, Economics & Management (AREA)
- Physics & Mathematics (AREA)
- Finance (AREA)
- General Physics & Mathematics (AREA)
- Strategic Management (AREA)
- Databases & Information Systems (AREA)
- Development Economics (AREA)
- Accounting & Taxation (AREA)
- General Engineering & Computer Science (AREA)
- Entrepreneurship & Innovation (AREA)
- General Business, Economics & Management (AREA)
- Marketing (AREA)
- Economics (AREA)
- Game Theory and Decision Science (AREA)
- Probability & Statistics with Applications (AREA)
- Life Sciences & Earth Sciences (AREA)
- Artificial Intelligence (AREA)
- Bioinformatics & Cheminformatics (AREA)
- Bioinformatics & Computational Biology (AREA)
- Computer Vision & Pattern Recognition (AREA)
- Evolutionary Biology (AREA)
- Evolutionary Computation (AREA)
- Management, Administration, Business Operations System, And Electronic Commerce (AREA)
Abstract
The invention discloses a user value classification method and apparatus based on cluster analysis, and relates to the field of computer technology. One embodiment of the method includes: determining an original feature vector for each user according to the user's indicator values under each indicator feature; performing factor analysis on the original feature vectors of the users, and taking the common factors whose factor scores satisfy a set rule as factor variables; determining a composite feature vector for each user based on the user's values under the factor variables; and performing cluster analysis on the composite feature vectors of the users to determine the category of each user. The embodiment prevents the indicator features from becoming overly cumbersome, reduces the number of factors in the cluster analysis, and reduces the time cost and database resource consumption of user value classification; it avoids empirically assigning a weight to each indicator feature, which improves the accuracy of weight allocation and thereby the accuracy of user value classification; and it improves the objectivity and accuracy of user value grouping.
Description
Technical field
The present invention relates to the field of computer technology, and in particular to a user value classification method and apparatus based on cluster analysis.
Background
With the continuous development of relationship marketing theory, customer relationship management has attracted the attention of more and more enterprises, and enterprises have one after another shifted their management focus from products to customers. How to manage a huge customer base effectively has become an unavoidable problem for many enterprises. Effectively identifying customer value and characteristics, and implementing targeted management on that basis, is the core and key issue of customer relationship management. Customers are the foundation of an enterprise's survival and development; as customer relationship management matures, acquiring and retaining high-quality customers has become a focus for enterprises and, consequently, for marketing researchers. In research on acquiring and retaining customers, how to measure customer value is a key problem. Most existing user value models multiply a user's indicator value under each indicator feature by an empirically assigned weight for that feature, add the products to obtain the user's value score, and divide the scores at certain quantiles to obtain the user's final value group.
In the course of realizing the present invention, the inventors found that the prior art has at least the following problems:
Existing user value models are overly cumbersome in their selection of indicator features, so every run consumes a large amount of time and database resources;
Weights are assigned to indicator features empirically, which easily leads to inaccurate user value grouping when the weight of an indicator feature is set too high or too low;
Existing user value models divide users into value groups at certain quantiles, which is overly subjective and arbitrary.
Summary of the invention
In view of this, embodiments of the present invention provide a user value classification method and apparatus based on cluster analysis that can classify user value objectively and accurately.
To achieve the above object, according to one aspect of the embodiments of the present invention, a user value classification method based on cluster analysis is provided, including:
determining an original feature vector for each user according to the user's indicator values under each indicator feature;
performing factor analysis on the original feature vectors of the users, and taking the common factors whose factor scores satisfy a set rule as factor variables; determining a composite feature vector for each user based on the user's values under the factor variables;
performing cluster analysis on the composite feature vectors of the users to determine the category of each user.
Optionally, the common factors are sorted in descending order of factor score to obtain a common factor sequence, and the first n common factors in the sequence are taken as the factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.
Optionally, before factor analysis is performed on the original feature vectors of the users, the method further includes: identifying, through data cleaning, whether an indicator value is an outlier, and, if it is, removing that indicator value.
Optionally, data cleaning is performed using a box plot.
Optionally, before factor analysis is performed on the original feature vectors of the users, the method further includes: determining whether an indicator feature is a negative indicator, and, if it is, applying forward transformation to the indicator values under that negative indicator.
Optionally, forward transformation is applied to the indicator values under the negative indicator as follows:
obtaining the maximum indicator value under the negative indicator;
taking the difference between the maximum indicator value and each indicator value as the forward-transformed indicator value under the negative indicator.
Optionally, before factor analysis is performed on the original feature vectors of the users, the method further includes: standardizing the indicator values.
Optionally, the indicator values are standardized as follows:
obtaining the maximum and minimum values of an indicator feature;
taking the ratio of (indicator value − minimum value) to (maximum value − minimum value) as the standardized indicator value under that indicator feature.
Optionally, the number of cluster centers is determined using the elbow method.
Optionally, performing cluster analysis on the composite feature vectors of the users includes:
selecting an initial center value for each cluster center;
updating the center values iteratively, where in each iteration: the distance between each user and each center is determined from the user's composite feature vector and the center values, each user is assigned to the class of the center nearest to it, and the center values are updated;
terminating the iteration when the center values remain unchanged before and after an update.
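The iteration described above corresponds to the standard K-means (Lloyd) procedure. The following is a minimal pure-Python sketch of that procedure, not the patent's own implementation. It assumes Euclidean distance and uses the first k points as initial centers (a hypothetical choice; random or k-means++ initialization is common in practice). The helper `distortion` is also hypothetical: it computes the elbow-method cost, the sum of squared distances from each point to its class center, so that K can be chosen by looking for the bend in that cost curve.

```python
import math

def _assign(points, centers):
    # Each user joins the class of its nearest center (Euclidean distance).
    return [min(range(len(centers)), key=lambda j: math.dist(p, centers[j])) for p in points]

def kmeans(points, k, max_iter=100):
    """Lloyd-style K-means on a list of equal-length numeric tuples."""
    centers = [tuple(p) for p in points[:k]]  # hypothetical initialization: first k points
    for _ in range(max_iter):
        labels = _assign(points, centers)
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # move the center to the mean of its members
                new_centers.append(tuple(sum(col) / len(members) for col in zip(*members)))
            else:        # an empty class keeps its previous center
                new_centers.append(centers[j])
        if new_centers == centers:  # centers unchanged before and after the update: stop
            break
        centers = new_centers
    # Final assignment is consistent with the returned centers.
    return _assign(points, centers), centers

def distortion(points, labels, centers):
    """Elbow-method cost: sum of squared distances from each point to its class center."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))

# Usage with made-up 2-D composite feature vectors:
points = [(1, 1), (1.5, 2), (1, 0.5), (8, 8), (8.5, 9), (9, 8)]
labels, centers = kmeans(points, k=2)
costs = {k: distortion(points, *kmeans(points, k)) for k in range(1, 5)}
```

Plotting `costs` against k and picking the point where the decrease flattens out is the elbow heuristic; the candidate range of k is left to the analyst.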
According to another aspect of the present invention, a user value classification apparatus based on cluster analysis is also provided, including:
an acquisition module, configured to determine an original feature vector for each user according to the user's indicator values under each indicator feature;
an analysis module, configured to perform factor analysis on the original feature vectors of the users, take the common factors whose factor scores satisfy a set rule as factor variables, and determine a composite feature vector for each user based on the user's values under the factor variables;
a clustering module, configured to perform cluster analysis on the composite feature vectors of the users to determine the category of each user.
Optionally, the analysis module sorts the common factors in descending order of factor score to obtain a common factor sequence, and takes the first n common factors in the sequence as the factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.
Optionally, the user value classification apparatus of the present invention further includes a cleaning module, configured to identify whether an indicator value is an outlier and, if it is, remove that indicator value.
Optionally, the cleaning module performs data cleaning using a box plot.
Optionally, the user value classification apparatus of the present invention further includes a forward transformation module, configured to determine whether an indicator feature is a negative indicator and, if it is, apply forward transformation to the indicator values under that negative indicator.
Optionally, the forward transformation module applies forward transformation to the indicator values under the negative indicator as follows:
obtaining the maximum indicator value under the negative indicator;
taking the difference between the maximum indicator value and each indicator value as the forward-transformed indicator value under the negative indicator.
Optionally, the user value classification apparatus of the present invention further includes a standardization module, configured to standardize the indicator values.
Optionally, the standardization module standardizes the indicator values as follows:
obtaining the maximum and minimum values of an indicator feature;
taking the ratio of (indicator value − minimum value) to (maximum value − minimum value) as the standardized indicator value under that indicator feature.
Optionally, the clustering module determines the number of cluster centers using the elbow method.
Optionally, the clustering module performs cluster analysis on the composite feature vectors of the users by:
selecting an initial center value for each cluster center;
updating the center values iteratively, where in each iteration: the distance between each user and each center is determined from the user's composite feature vector and the center values, each user is assigned to the class of the center nearest to it, and the center values are updated;
terminating the iteration when the center values remain unchanged before and after an update.
According to another aspect of the present invention, a user value classification terminal based on cluster analysis is provided, including:
one or more processors;
a storage device, configured to store one or more programs,
where, when the one or more programs are executed by the one or more processors, the one or more processors implement the user value classification method based on cluster analysis of the present invention.
According to still another aspect of the present invention, a computer-readable medium is provided, on which a computer program is stored, where the program, when executed by a processor, implements the user value classification method based on cluster analysis of the present invention.
An embodiment of the above invention has the following advantages or beneficial effects:
Performing factor analysis on the original feature vectors of the users and determining each user's composite feature vector from the user's values under the factor variables prevents the indicator features from becoming overly cumbersome, reduces the number of factors in the cluster analysis, and reduces the time cost and database resource consumption of user value classification;
Taking the common factors whose factor scores satisfy a set rule as factor variables avoids empirically assigning a weight to each indicator feature, improves the accuracy of weight allocation, and thereby improves the accuracy of user value classification;
Performing cluster analysis on the composite feature vectors of the users avoids dividing users into value groups at fixed quantiles and improves the objectivity and accuracy of user value grouping.
Further effects of the above non-conventional optional implementations are described below in connection with the embodiments.
Brief description of the drawings
The accompanying drawings are provided for a better understanding of the present invention and do not constitute an undue limitation on it. In the drawings:
Fig. 1 is a schematic diagram of the main flow of a user value classification method based on cluster analysis according to an embodiment of the present invention;
Fig. 2 is a schematic diagram of the main steps of a user value classification method based on cluster analysis according to an embodiment of the present invention;
Fig. 3 is a schematic diagram of the main modules of a user value classification apparatus based on cluster analysis according to an embodiment of the present invention;
Fig. 4 is a diagram of an exemplary system architecture to which an embodiment of the present invention can be applied;
Fig. 5 is a schematic structural diagram of a computer system suitable for implementing a terminal device or server of an embodiment of the present invention.
Detailed description of the embodiments
Exemplary embodiments of the present invention are described below with reference to the accompanying drawings. Various details of the embodiments are included to aid understanding and should be regarded as merely exemplary. Those of ordinary skill in the art will therefore recognize that various changes and modifications can be made to the embodiments described herein without departing from the scope and spirit of the present invention. Likewise, for clarity and conciseness, descriptions of well-known functions and structures are omitted below.
Fig. 1 is a schematic diagram of the main flow of a user value classification method based on cluster analysis according to an embodiment of the present invention. As shown in Fig. 1, the method proceeds as follows.
In step S101, an original feature vector is first determined for each user according to the user's indicator values under each indicator feature.
Those skilled in the art can decide how users and indicator features are selected according to the actual situation and the purpose of the user value analysis. Taking an e-commerce platform as an example, accounts that have placed orders within the last year can be selected as users, and any one, two, or more of the following data can be selected as indicator features:
The total amount spent on all orders the user validly completed in the last year;
Maximum order amount in the last year: the largest amount of any order the user validly completed in the last year;
Average order value in the last year: the average amount per order of the orders the user validly completed in the last year;
Order count in the last year: the total number of orders the user validly completed in the last year;
Number of top-level categories purchased in the last year: the number of top-level product categories the user purchased in orders validly completed in the last year;
Discount rate of products purchased in the last year: the discount rate of the products the user validly purchased in the last year;
Overall gross margin in the last year: the overall gross margin of all products the user validly purchased in the last year;
Returns and exchanges in the last year: the number of returns made by the user in the last year;
Rejected orders in the last year: the number of orders the user refused on delivery in the last year;
Login days: the cumulative number of days on which the user has logged in since registration;
Order-sharing posts: the cumulative number of posts in which the user has shown off purchased orders;
Products followed in the last year: the number of products the user followed in the last year;
Reviews in the last year: the number of times the user posted reviews in the last year.
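As an illustration of step S101, the sketch below builds original feature vectors for four of the indicators listed above (total spend, maximum order amount, average order value, and order count). The input format, a list of `(user_id, amount)` pairs for validly completed orders in the last year, and the function name are hypothetical; a real platform would pull these from its own order tables.

```python
from collections import defaultdict

def original_feature_vectors(orders):
    """Build per-user vectors [total spend, max order amount, average order value, order count].

    orders: iterable of (user_id, amount) pairs for validly completed orders
    in the last year (a hypothetical input format).
    """
    amounts = defaultdict(list)
    for user_id, amount in orders:
        amounts[user_id].append(amount)
    vectors = {}
    for user_id, values in amounts.items():
        total = sum(values)
        count = len(values)
        # Average order value = total spend / number of orders for this user.
        vectors[user_id] = [total, max(values), total / count, count]
    return vectors

orders = [("u1", 100.0), ("u1", 50.0), ("u2", 30.0)]
print(original_feature_vectors(orders))
# {'u1': [150.0, 100.0, 75.0, 2], 'u2': [30.0, 30.0, 30.0, 1]}
```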
Of course, the user value classification method of the present invention can also be applied to other fields. Taking analysis of China's social development as an example, each province, autonomous region, or municipality directly under the central government can be treated as a user, with per-capita gross domestic product (GDP), annual per-capita disposable income of urban residents, new fixed assets, the number of institutions of higher education, the number of medical and health institutions, and so on as indicator features.
When classifying user value, some data may be extracted from multiple business systems, so it is inevitable that some data are erroneous and some conflict with one another; owing to the data acquisition method or other reasons, the indicator values obtained under an indicator feature may also be abnormal. For ease of description, the present invention refers to such abnormal indicator values as outliers. For example, when the number of orders a user places is an indicator feature and a user maliciously inflates orders (order brushing), that user's order count is an outlier; when a region's per-capita GDP is an indicator feature and the region's economic statistics contain some degree of statistical error, that region's per-capita GDP is an outlier; and when the amount of a chemical reagent added is an indicator feature and the analytical balance used to weigh the reagent has a large systematic error, the reagent amount measured with that balance is an outlier, and so on. To avoid, as far as possible, the adverse effects these outliers may have on user value classification, in some preferred embodiments, before factor analysis is performed on the original feature vectors of the users, data cleaning can be used to identify whether an indicator value is an outlier and, if it is, to remove it.
The prior art screens outliers using the 3σ rule or the standard score (z-score) method. As is well known, both methods assume that the data follow a normal distribution, but real data often do not strictly follow one. Moreover, both judge outliers against the mean and standard deviation of the indicator values, and the mean and standard deviation have very low resistance: outliers themselves can strongly influence both statistics, so the number of outliers identified will not exceed 0.7% of the total. Clearly, the validity of judging outliers this way in non-normal data is limited.
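A small illustration of the low-resistance problem, using Python's standard `statistics` module and made-up data. With the sample standard deviation, no z-score in a sample of size n can exceed (n - 1)/sqrt(n), which is below 3 for n = 10, so the 3σ rule cannot flag anything in such a sample however extreme one value is. The function name `three_sigma_outliers` is hypothetical.

```python
import math
from statistics import mean, stdev

def three_sigma_outliers(values):
    """Flag values lying more than 3 sample standard deviations from the mean (3-sigma rule)."""
    mu, sd = mean(values), stdev(values)
    return [v for v in values if abs(v - mu) > 3 * sd]

data = [10, 11, 12, 10, 11, 12, 10, 11, 12, 1000]  # one obviously abnormal value
flagged = three_sigma_outliers(data)
# The outlier inflates the mean and standard deviation enough to mask itself:
# with the sample standard deviation, |z| <= (n - 1) / sqrt(n), about 2.85 for n = 10.
z_bound = (len(data) - 1) / math.sqrt(len(data))
print(flagged)  # []
```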
In some preferred embodiments of the present invention, data cleaning is performed using a box plot. A box plot is drawn from the actual indicator values and does not require the indicator values of a feature to follow any particular distribution in advance; it therefore imposes no restrictive requirements on the indicator values, shows their distribution truthfully and intuitively, and identifies outliers objectively.
For example, for a given indicator feature, the indicator values under that feature can first be sorted in ascending order and divided into four equal parts. The value at the 25% position after sorting is the first quartile (Q1), also called the lower quartile; the value at the 50% position is the second quartile (Q2), also called the median; and the value at the 75% position is the third quartile (Q3), also called the upper quartile. The difference between the third and first quartiles is the interquartile range (IQR). Indicator values less than (Q1 - 1.5 IQR) or greater than (Q3 + 1.5 IQR) are then identified as outliers and removed. The range of outliers can be set according to the actual situation, and the present invention does not specifically limit it. In the above embodiment, the criterion is based on the quartiles and the interquartile range, and quartiles have a degree of resistance: up to 25% of the indicator values can become arbitrarily extreme without greatly disturbing them. The quartile method therefore has certain advantages for identifying outliers.
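A minimal sketch of the box-plot (IQR) rule using Python's standard library. The helper names are hypothetical, and the quartiles come from `statistics.quantiles` with its default 'exclusive' method; other quartile conventions give slightly different fences. The multiplier `k=1.5` follows the text and can be adjusted.

```python
from statistics import quantiles

def iqr_fences(values, k=1.5):
    """Return the box-plot fences (Q1 - k*IQR, Q3 + k*IQR)."""
    q1, _median, q3 = quantiles(values, n=4)  # default 'exclusive' method
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr

def remove_outliers(values, k=1.5):
    """Keep only the indicator values that lie inside the fences."""
    lower, upper = iqr_fences(values, k)
    return [v for v in values if lower <= v <= upper]

orders = [3, 5, 4, 6, 5, 4, 5, 120]  # made-up order counts; 120 looks like order brushing
print(iqr_fences(orders))            # (1.375, 8.375)
print(remove_outliers(orders))       # [3, 5, 4, 6, 5, 4, 5]
```

Unlike the 3σ rule, the fences here barely move when the extreme value grows, because Q1 and Q3 do not depend on it.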
For some indicator features, larger values are better, while for others they are not, for example the number of returns or rejected orders on an e-commerce platform. For ease of description, the present invention defines these two kinds of indicator features as positive indicators and negative indicators respectively, without specifically limiting which concrete types each covers. For example, an indicator feature for which larger values are better can be defined as a positive indicator and one for which smaller values are better as a negative indicator; of course, the opposite convention can also be used. To avoid the adverse effect that negative indicators may have on user value classification, before factor analysis is performed on the original feature vectors of the users, it can be determined whether an indicator feature is a negative indicator and, if it is, forward transformation can be applied to the indicator values under that negative indicator.
Those skilled in the art can apply forward transformation to a negative indicator by taking the opposite (negated) value of each of its indicator values; of course, other methods can also be used, and the present invention does not specifically limit the forward transformation method. In some embodiments, forward transformation can be applied to the indicator values under a negative indicator as follows:
obtaining the maximum indicator value max(x) under the negative indicator;
taking the difference x_new = max(x) - x between the maximum indicator value max(x) and the indicator value x as the forward-transformed indicator value under the negative indicator.
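The max-minus-value transformation described above takes only a few lines. This sketch uses a hypothetical function name and made-up data; unlike simple negation, it keeps the transformed values non-negative, with the worst user mapped to 0.

```python
def forward_transform(values):
    """Turn a negative indicator (smaller is better) into a positive one: x_new = max(x) - x."""
    top = max(values)
    return [top - x for x in values]

returns = [0, 2, 5, 1]             # made-up return counts per user (smaller is better)
print(forward_transform(returns))  # [5, 3, 0, 4]
```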
In some cases, the indicator values of some indicator features are hard to compare directly. For example, for the same indicator feature, two indicator values measured in different units are difficult to compare directly; likewise, for the same indicator feature, values recorded on a percentage scale are difficult to compare directly with values recorded on a ten-point scale; and so on. To make the indicator values of the same indicator feature, or of different indicator features, comparable, the method may preferably further include standardizing the indicator values before factor analysis is performed on the original feature vectors of the users.
In some embodiments, min-max normalization (deviation standardization) can be used; for example, the indicator values are standardized as follows: obtain the maximum and minimum values of an indicator feature, and take the ratio of (indicator value - minimum value) to (maximum value - minimum value) as the standardized indicator value under that feature. Min-max normalization is a linear transformation of the original indicator values. Let minA and maxA be the minimum and maximum values of indicator feature A; min-max normalization maps an original value x of A to a value x′ in the interval [0, 1] by:

$$x' = \frac{x - \mathrm{min}A}{\mathrm{max}A - \mathrm{min}A}$$
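A direct transcription of the min-max formula above. The function name is hypothetical, and so is the handling of a constant indicator (maxA = minA), where the formula would divide by zero; this sketch maps such a column to 0.0, which is one possible convention, not something the text specifies.

```python
def min_max_normalize(values):
    """Map each value x of one indicator feature to (x - minA) / (maxA - minA)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # Hypothetical convention: a constant indicator carries no information, map it to 0.
        return [0.0 for _ in values]
    return [(x - lo) / (hi - lo) for x in values]

print(min_max_normalize([2, 4, 6]))  # [0.0, 0.5, 1.0]
```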
In other embodiments, z-score standardization (zero-mean normalization), also called standard deviation standardization, can be used. Data standardized in this way have a mean of 0 and a standard deviation of 1 (the shape of the distribution is unchanged, so the result follows the standard normal distribution only if the original data were normal). The conversion function is:

$$x' = \frac{x - \mu}{\sigma}$$

where μ is the mean of all indicator values of an indicator feature, σ is the standard deviation of all indicator values of that feature, x is the indicator value before standardization, and x′ is the indicator value after standardization.
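A minimal sketch of the z-score conversion above, using Python's standard `statistics` module. Whether σ should be the population or the sample standard deviation is not specified in the text; this sketch assumes the population version (`pstdev`). The function name is hypothetical.

```python
from statistics import mean, pstdev

def z_score(values):
    """Standardize one indicator feature: x' = (x - mu) / sigma (population sigma assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(x - mu) / sigma for x in values]

print(z_score([2, 4, 4, 4, 5, 5, 7, 9]))
# [-1.5, -0.5, -0.5, -0.5, 0.0, 0.0, 1.0, 2.0]
```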
The above are only brief examples of standardization methods; those skilled in the art can choose other standardization methods according to the actual situation, and the present invention does not limit this.
In step S102, factor analysis is performed on the original feature vectors of the users, and the common factors whose factor scores satisfy a set rule are taken as factor variables; a composite feature vector is then determined for each user based on the user's values under the factor variables.
Existing user value classification models often screen a large number of indicator features. On the one hand, the indicator features become overly cumbersome, and every run consumes a large amount of time and database resources; on the other hand, the more indicator features there are, the more the correlations among the variables (whether strong or weak) impair the clustering, and too many participating indicator features make the subsequent clustering very complicated. The present invention uses the idea of dimensionality reduction: starting from the dependence structure within the correlation matrix of the original variables, it performs factor analysis on each user's original feature vector and determines each user's composite feature vector from the user's values under the factor variables. The more important composite features selected in this way are used for cluster analysis, so that the original indicators are combined into fewer indicators and variables with intricate interrelationships are reduced to a few composite factors. This prevents the indicator features from becoming overly cumbersome, reduces the number of factors in the cluster analysis, and reduces the time and database resources consumed by user value classification.
In the prior art, weights are often assigned to indicator features empirically, which easily leads to inaccurate user value grouping when the weight of an indicator feature is set too high or too low. By taking the common factors whose factor scores satisfy a set rule as factor variables, the present invention avoids empirically assigning weights to indicator features, improves the accuracy of weight allocation, and thereby improves the accuracy of user value classification.
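In certain embodiments, the mathematical model of factor analysis is:

$$
\begin{cases}
x_1 = a_{11}F_1 + a_{12}F_2 + \cdots + a_{1k}F_k + \varepsilon_1 \\
x_2 = a_{21}F_1 + a_{22}F_2 + \cdots + a_{2k}F_k + \varepsilon_2 \\
\quad\vdots \\
x_p = a_{p1}F_1 + a_{p2}F_2 + \cdots + a_{pk}F_k + \varepsilon_p
\end{cases}
$$

The model can also be expressed in matrix form as:

$$
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pk}
\end{pmatrix}
\begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix}
$$

which is abbreviated as:

$$X = AF + \varepsilon$$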
where X is the original variable matrix of the factor analysis, A is the factor loading matrix, F is the common factor matrix, p is the number of indicator features, k is the number of common factors with k ≤ p, and ε is the specific factor.
The factor score of each common factor can be regarded as that factor's variance contribution ratio; the higher a common factor's factor score, the more important the factor. In an optional embodiment of the present invention, the common factors whose factor scores satisfy a set rule are taken as factor variables, for example, the common factors whose factor scores exceed a set threshold. Alternatively, the common factors can be sorted in descending order of factor score to obtain a common factor sequence, and the first n common factors in the sequence are taken as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers. The value of n can be determined according to the actual situation, for example n = 5. Provided the required accuracy of user value classification is met, appropriately reducing n reduces the computation and processing time of the cluster analysis and improves the efficiency of user value classification.
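The text does not say how the common factors are extracted. The sketch below assumes the principal-component extraction method on the correlation matrix, which is one common choice, and it needs NumPy. Under that assumption the loadings of factor j are sqrt(λ_j)·e_j and its variance contribution ratio (the "factor score" in the sense used above) is λ_j / p. Each user's values under the kept factors, that is, the composite feature vector, are computed with the regression method, which for these loadings simplifies to Z·e_j / sqrt(λ_j). The function name, the top-n rule as the chosen "set rule", and the synthetic data are all illustrative assumptions.

```python
import numpy as np

def factor_variables(X, n):
    """Extract common factors (principal-component method) and keep the top n.

    X: array of shape (num_users, p), one column per indicator feature.
    Returns (user_values, contribution): user_values has shape (num_users, n) and holds
    each user's composite feature vector; contribution holds the variance contribution
    ratio of each kept factor, used here as its "factor score".
    """
    X = np.asarray(X, dtype=float)
    Z = (X - X.mean(axis=0)) / X.std(axis=0)   # z-score each indicator (assumes no constant column)
    R = np.corrcoef(Z, rowvar=False)           # p x p correlation matrix of the indicators
    eigvals, eigvecs = np.linalg.eigh(R)       # eigh returns eigenvalues in ascending order
    order = np.argsort(eigvals)[::-1]          # sort common factors by contribution, high to low
    eigvals, eigvecs = eigvals[order], eigvecs[:, order]
    contribution = eigvals / eigvals.sum()     # trace(R) = p, so this is lambda_j / p
    # Loadings a_j = sqrt(lambda_j) * e_j; regression-method scores reduce to Z e_j / sqrt(lambda_j).
    user_values = Z @ eigvecs[:, :n] / np.sqrt(eigvals[:n])
    return user_values, contribution[:n]

# Synthetic example: four indicators driven by two hidden factors plus small noise.
rng = np.random.default_rng(0)
base = rng.normal(size=(200, 2))
noise = 0.1 * rng.normal(size=(200, 4))
X = np.column_stack([base[:, 0], base[:, 0], base[:, 1], base[:, 1]]) + noise
values, contrib = factor_variables(X, n=2)
print(values.shape, contrib.sum())  # (200, 2) and a total contribution close to 1
```

The resulting `values` rows can be fed straight into the clustering step; using a threshold on `contribution` instead of a fixed n would implement the alternative rule mentioned in the text.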
In step S103, cluster analysis is performed on the composite feature vectors of the users to determine the category of each user. Clustering is the process of assigning data objects to different classes, or clusters, so that data objects in the same class are highly similar and data objects in different classes are highly dissimilar. Clustering the users' composite feature vectors avoids dividing users into value groups at fixed quantiles and improves the objectivity and accuracy of user value grouping.
To assign data objects to classes, cluster-analysis methods such as partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods can be used. Each of these families has widely used clustering algorithms, for example the K-means algorithm among partitioning methods, agglomerative hierarchical clustering among hierarchical methods, and neural-network-based clustering among model-based methods. Of course, to avoid rigidly assigning a data object to a single class, fuzzy cluster analysis can also be used; for example, a fuzzy clustering algorithm uses a membership function to determine the degree to which each data object belongs to each class. Those skilled in the art can select a suitable clustering method and algorithm according to actual conditions, and the present invention does not specifically limit this.
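For the fuzzy variant mentioned above, one widely used membership function is the one from fuzzy c-means (FCM), u_j = 1 / Σ_k (d_j / d_k)^(2/(m−1)). The patent does not name its fuzzy algorithm, so treating it as FCM and defaulting the fuzzifier to m = 2 are assumptions; `fcm_memberships` is a hypothetical helper that only computes memberships for fixed centers, not the full FCM iteration.

```python
import math

def fcm_memberships(point, centers, m=2.0):
    """Fuzzy c-means membership of one point in each cluster (memberships sum to 1).

    m is the fuzzifier (m > 1); larger m gives softer memberships.
    """
    dists = [math.dist(point, c) for c in centers]
    # A point lying exactly on a center belongs fully to it (split evenly on ties).
    on_center = [j for j, d in enumerate(dists) if d == 0.0]
    if on_center:
        share = 1.0 / len(on_center)
        return [share if j in on_center else 0.0 for j in range(len(centers))]
    power = 2.0 / (m - 1.0)
    return [1.0 / sum((dj / dk) ** power for dk in dists) for dj in dists]
```

For example, a point at distance 1 from one center and 2 from another gets memberships 0.8 and 0.2 with m = 2.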
The number of cluster centers can be determined from experience or obtained through model training. In an optional embodiment of the present invention, the number K of cluster centers is determined using the elbow method. For example, a range for K can be pre-estimated, typically 2 to 15, and the cost-function value for each K is then plotted. The cost function is the sum of the distortions of all classes and can be written as:
$$ J = \sum_{m=1}^{K} \sum_{y_i \in C_m} \lVert y_i - \mu_m \rVert^2 $$
where C_m is the set of all points in the m-th class, y_i is any point in the m-th class, and μ_m is the center position of the m-th class. The distortion of each class equals the sum of the squared distances between the class center and each point in the class. The more compact the points within a class, the smaller its distortion; conversely, the more dispersed the members of a class, the larger its distortion. As K increases the average distortion falls; the K value at the bend of the curve, where this fall is most pronounced before leveling off, is the elbow, and that K value is taken as the number of cluster centers.
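The cost function J and the elbow reading can be sketched in plain Python. The patent reads the elbow off a plot; picking the K with the largest second difference of J, as the hypothetical `elbow_k` does, is only one common numerical heuristic and an assumption here. The hypothetical `distortion` takes clusters that have already been formed (for example by K-means) and uses each class mean as μ_m.

```python
def distortion(clusters):
    """J = sum over classes m of sum over y in C_m of ||y - mu_m||^2 (mu_m = class mean)."""
    total = 0.0
    for points in clusters:
        dim = len(points[0])
        mu = [sum(p[d] for p in points) / len(points) for d in range(dim)]
        total += sum(sum((p[d] - mu[d]) ** 2 for d in range(dim)) for p in points)
    return total

def elbow_k(costs):
    """K at the sharpest bend of the cost curve (largest second difference).

    costs: dict mapping consecutive K values to J(K).
    """
    ks = sorted(costs)
    best_k, best_bend = ks[0], float("-inf")
    for prev, k, nxt in zip(ks, ks[1:], ks[2:]):
        # How much the drop before k exceeds the drop after k.
        bend = (costs[prev] - costs[k]) - (costs[k] - costs[nxt])
        if bend > best_bend:
            best_k, best_bend = k, bend
    return best_k
```

For costs {1: 100, 2: 40, 3: 30, 4: 25, 5: 22}, the drop shrinks from 60 to 10 at K = 2, so `elbow_k` returns 2.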
Optionally, performing cluster analysis on the composite feature vector of each user includes:
Selecting an initial center value for the center point of each cluster;
Updating the center value of each center point iteratively, where in each iteration: the distance between each user and each center point is determined from the user's composite feature vector and the center value of each center point, and each user is assigned to the class of the center point nearest to it; then the center value of each center point is updated;
If the center value of each center point remains unchanged before and after the update, the iteration ends.
In the iterative process of the above embodiment, "remaining unchanged" can mean that the center values before and after an iteration are equal, or that the change in the center values across an iteration does not exceed a preset range.
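The iterative center update above is the standard K-means (Lloyd) loop. The sketch below assumes Euclidean distance, random selection of the initial centers from the data, and a tolerance `tol` standing in for the "preset range"; `kmeans` and its parameters are hypothetical names, not the patent's.

```python
import math
import random

def kmeans(vectors, k, tol=1e-6, max_iter=100, seed=0):
    """Plain K-means over composite feature vectors (sequences of floats).

    Stops when no center moves more than tol between iterations, or after
    max_iter iterations. Returns (labels, centers).
    """
    rng = random.Random(seed)
    centers = [list(v) for v in rng.sample(vectors, k)]   # initial center values
    labels = [0] * len(vectors)
    for _ in range(max_iter):
        # Assign each user to the class of its nearest center.
        labels = [min(range(k), key=lambda j: math.dist(v, centers[j])) for v in vectors]
        new_centers = []
        for j in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == j]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(centers[j])             # keep an empty cluster's center
        moved = max(math.dist(a, b) for a, b in zip(centers, new_centers))
        centers = new_centers
        if moved <= tol:                                   # centers unchanged: stop
            break
    return labels, centers
```

Setting `tol=0.0` corresponds to the "center values are equal" variant of the stopping rule; a positive `tol` corresponds to the "does not exceed a preset range" variant.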
Fig. 2 is a schematic diagram of the main steps of the cluster-analysis-based user value classification method according to an embodiment of the present invention. As shown in Fig. 2, the method includes:
1. Obtain the indicator value of each user under each indicator feature;
2. Determine whether the indicator feature is a negative indicator: if so, apply forward transformation (positivization) to the indicator values under that feature and then go to the next step; if not, go directly to the next step;
3. Standardize each indicator value;
4. Determine whether each standardized indicator value is an outlier: if so, remove it and then go to the next step; if not, go directly to the next step;
5. Determine the original feature vector of each user;
6. Perform factor analysis on the original feature vectors of the users, and take the common factors whose factor scores satisfy a preset rule as factor variables; determine the composite feature vector of each user from the user's values under the factor variables;
7. Perform cluster analysis on the composite feature vectors of the users to determine the category of each user.
Fig. 3 is a schematic diagram of the main modules of a cluster-analysis-based user value classification device 300 according to an embodiment of the present invention. As shown in Fig. 3, the device includes:
Acquisition module 301, configured to determine the original feature vector of each user according to the user's indicator values under each indicator feature;
Analysis module 305, configured to perform factor analysis on the original feature vector of each user, take the common factors whose factor scores satisfy a preset rule as factor variables, and determine the composite feature vector of each user from the user's values under the factor variables;
Cluster module 306, configured to perform cluster analysis on the composite feature vector of each user to determine the category of each user.
Those skilled in the art can decide how users and indicator features are selected according to the actual conditions and the purpose of the user value analysis. Taking an e-commerce platform as an example, accounts that have placed an order within the last year can be selected as users, and any one, two, or more of the following data can be selected as indicator features:
The sum of the amounts spent on all orders the user has validly completed in the last year;
Maximum order amount in the last year: the largest amount spent on a single order the user has validly completed in the last year;
Average order value in the last year: the average order value of the orders the user has validly completed in the last year;
Order count in the last year: the total number of orders the user has validly completed in the last year;
Top-level categories purchased in the last year: the number of top-level product categories in the user's valid purchases in the last year;
Discount rate of goods purchased in the last year: the discount rate of the goods the user has validly purchased in the last year;
Overall gross margin in the last year: the overall gross margin of the goods the user has validly purchased in the last year;
Returns and exchanges in the last year: the number of returns made by the user in the last year;
Rejected orders in the last year: the number of orders the user refused to accept in the last year;
Login days: the cumulative number of days the user has logged in since registration;
Purchase-sharing posts: the cumulative number of posts in which the user has shared a purchase;
Products followed in the last year: the number of products the user has followed in the last year;
Reviews in the last year: the number of times the user has posted reviews in the last year.
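As an illustration of how a few of the listed e-commerce indicators could be computed per user to form original feature vectors, the sketch below assumes hypothetical order records with `user`, `amount`, `day`, and `completed` fields and treats "the last year" as the 365 days up to `today`; neither the field names, the helper `build_original_vectors`, nor the window definition come from the patent.

```python
from datetime import date, timedelta

def build_original_vectors(orders, today):
    """Per-user [total_spend, max_order, avg_order_value, order_count] for the last year.

    orders: iterable of dicts with hypothetical keys
            'user', 'amount', 'day' (datetime.date), 'completed' (bool).
    Only users with at least one completed order in the window are returned.
    """
    start = today - timedelta(days=365)             # assumed "last year" window
    per_user = {}
    for o in orders:
        if o["completed"] and start <= o["day"] <= today:
            per_user.setdefault(o["user"], []).append(o["amount"])
    vectors = {}
    for user, amounts in per_user.items():
        total = sum(amounts)
        vectors[user] = [total, max(amounts), total / len(amounts), len(amounts)]
    return vectors
```

Uncompleted orders and orders outside the window are ignored, which matches the "validly completed in the last year" wording of the indicator definitions.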
Of course, the user value classification device of the present invention can also be applied in other fields. Taking an analysis of social development in China as an example, each province, autonomous region, or municipality directly under the Central Government can be treated as a user, with per-capita GDP, per-capita annual disposable income of urban residents, newly added fixed assets, number of higher-education institutions, number of health and medical institutions, and so on as indicator features.
When performing user value classification, some data may be extracted from multiple business systems, so it is inevitable that some data are erroneous or that some data conflict with one another; in addition, owing to the data-acquisition method or other reasons, the acquired indicator values under an indicator feature may be abnormal. For ease of description, the present invention refers to such abnormal indicator values as outliers. For example, when the number of orders placed is used as an indicator feature and a user maliciously places fake orders, that user's order count is an outlier; likewise, when the per-capita GDP of a region is used as an indicator feature and the region's economic data contain a certain degree of statistical error, that region's per-capita GDP value is an outlier; and when the amount of a chemical reagent added is used as an indicator feature and the analytical balance used to weigh the reagent has a large systematic error, the reagent amounts measured with that balance are outliers, and so on. To minimize the negative effect these outliers may have on user value classification, in some preferred embodiments the user value classification device of the present invention further includes a cleaning module 304, configured to perform data cleaning, identify whether an indicator value is an outlier, and remove the indicator value if it is.
In the prior art, outliers are screened using the 3σ rule or the standard-score (z-score) method. As is well known, both methods assume that the data are normally distributed, but real data often do not strictly follow a normal distribution. Moreover, both methods judge outliers against the mean and standard deviation of the values, which have very low resistance: the outliers themselves can strongly influence both statistics, so the number of outliers detected in this way will not exceed about 0.7% of the total. Clearly, judging outliers this way in non-normal data has limited effectiveness.
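The low resistance of the mean and standard deviation is easy to see on a small sample. In the sketch below (`three_sigma_outliers` is a hypothetical helper using the population σ), a single extreme value inflates σ enough to hide itself: for n values, no z-score computed with the population σ can exceed √(n − 1), so with 10 values the 3σ rule cannot flag anything.

```python
import statistics

def three_sigma_outliers(values):
    """Values farther than 3 population standard deviations from the mean."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    return [v for v in values if abs(v - mu) > 3 * sigma]
```

On the values 1 to 9 plus 100, the extreme value 100 has a z-score of about 2.99 and is not flagged; on thirty zeros plus a single 10, the 10 (z ≈ 5.5) is flagged.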
In some preferred embodiments of the present invention, the cleaning module 304 performs data cleaning using a box plot. A box plot is drawn from the actual indicator values and does not require assuming in advance that the indicator values follow a specific distribution; it therefore imposes no restrictive requirement on the indicator values, shows their distribution truthfully and intuitively, and identifies outliers objectively.
For example, for a given indicator feature, the indicator values can first be sorted in ascending order and divided into four equal parts. After sorting, the value at the 25% position is defined as the first quartile (Q1), also called the lower quartile; the value at the 50% position is defined as the second quartile (Q2), also called the median; and the value at the 75% position is defined as the third quartile (Q3), also called the upper quartile. The difference between the third and first quartiles is called the interquartile range (IQR). Indicator values less than (Q1 − 1.5 IQR) or greater than (Q3 + 1.5 IQR) are then defined as outliers and removed. The range that defines outliers can be set according to actual conditions, and the present invention does not specifically limit it. In the above embodiment, the rule is based on the quartiles and the interquartile range, and quartiles are resistant: up to 25% of the indicator values can become arbitrarily extreme without greatly disturbing them. Hence the quartile method has certain advantages in identifying outliers.
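The box-plot rule above can be sketched with Python's standard `statistics.quantiles`. Interpolating the quartiles with `method="inclusive"` is an assumption (the text only says "the value at the 25% position"), and `iqr_filter` with its multiplier `k` is a hypothetical helper.

```python
import statistics

def iqr_filter(values, k=1.5):
    """Split values into (kept, outliers) using the fences Q1 - k*IQR and Q3 + k*IQR."""
    q1, _, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    kept = [v for v in values if low <= v <= high]
    outliers = [v for v in values if v < low or v > high]
    return kept, outliers
```

On the values 1 to 9 plus 100, Q1 = 3.25 and Q3 = 7.75, so the upper fence is 14.5 and only 100 is rejected, because the single extreme value barely moves the quartiles.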
For some indicator features larger values are better, but for others larger values are not better, for example the number of returns or the number of rejected orders on an e-commerce platform. For ease of description, the present invention defines these two kinds of indicator features as positive indicators and negative indicators respectively, without specifically limiting which concrete types count as positive or negative: for example, features for which larger is better can be defined as positive indicators and features for which smaller is better as negative indicators, or, of course, features for which smaller is better can be defined as positive indicators and features for which larger is better as negative indicators. To avoid the adverse effect that negative indicators may have on user value classification, the user value classification device of the present invention may further include a forward transformation module 302, configured to determine whether an indicator feature is a negative indicator and, if it is, apply forward transformation to the indicator values under that negative indicator.
Those skilled in the art can implement the forward transformation of a negative indicator by taking the opposite number (negation) of its indicator values; of course, other methods can also be used, and the present invention does not specifically limit the forward transformation method. In certain embodiments, the forward transformation module 302 can transform the indicator values under a negative indicator as follows:
Obtain the maximum indicator value max(x) under the negative indicator;
Take the difference between max(x) and the indicator value x, x_new = max(x) − x, as the transformed indicator value under the negative indicator.
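A minimal sketch of the max(x) − x transformation (the helper name `forward_transform` is hypothetical). After the transformation the best original value, the smallest one, becomes the largest, and all transformed values are non-negative.

```python
def forward_transform(values):
    """Turn a negative indicator into a positive one: x_new = max(x) - x."""
    top = max(values)
    return [top - x for x in values]
```

For return counts [3, 0, 5], the user with no returns now scores highest: [2, 5, 0].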
In some cases, the indicator values of certain indicator features are hard to compare. For example, for the same indicator feature, two values measured in different units are difficult to compare directly; likewise, for the same indicator feature, a value scored on a 100-point scale is hard to compare directly with one scored on a 10-point scale; and so on. To make indicator values comparable, both within the same indicator feature and across different indicator features, the user value classification device of the present invention preferably further includes a standardization module 303, configured to standardize the indicator values.
In certain embodiments, the standardization module 303 can use min-max normalization (deviation standardization), for example standardizing indicator values as follows: obtain the maximum and minimum values of an indicator feature, and take the ratio of (indicator value − minimum) to (maximum − minimum) as the standardized indicator value under that feature. Min-max normalization is a linear transformation of the original indicator values. If minA and maxA are the minimum and maximum of indicator feature A, min-max normalization maps an original value x of feature A to a value x' in the interval [0, 1] by the formula:
$$ x' = \frac{x - \min A}{\max A - \min A} $$
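A direct transcription of the min-max formula as a hypothetical helper `min_max`. Mapping a constant feature (max A = min A) to 0.0 is an assumption, since the formula is undefined there.

```python
def min_max(values):
    """Min-max normalization of one indicator feature into [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]      # assumption: constant feature maps to 0
    return [(x - lo) / (hi - lo) for x in values]
```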
In other embodiments, the standardization module 303 can use z-score standardization (zero-mean normalization), also called standard-deviation standardization. Data processed this way are rescaled to a mean of 0 and a standard deviation of 1, as in the standard normal distribution, and the transformation function is:
$$ x' = \frac{x - \mu}{\sigma} $$
where μ is the mean of all indicator values of an indicator feature, σ is the standard deviation of all indicator values of that feature, x is the indicator value before standardization, and x' is the indicator value after standardization.
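A sketch of the z-score transformation using only the standard library. Using the population standard deviation (`statistics.pstdev`) rather than the sample one, and returning zeros for a constant feature, are assumptions; `z_score` is a hypothetical helper.

```python
import statistics

def z_score(values):
    """Z-score standardization: x' = (x - mu) / sigma for one indicator feature."""
    mu = statistics.fmean(values)
    sigma = statistics.pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]      # assumption: constant feature maps to 0
    return [(x - mu) / sigma for x in values]
```

Note that this only shifts and scales the values; if the original indicator is skewed, the standardized values are equally skewed.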
The above is only a brief list of the methods the standardization module 303 can use; those skilled in the art can also select other standardization methods according to actual conditions, and the present invention does not specifically limit this.
Existing user value classification models often screen a large number of indicator features. On the one hand, too many indicator features make every processing run cost a great deal of time and database resources; on the other hand, the more indicator features there are, the more the correlations among variables impair the clustering, and too many indicator features make subsequent clustering very complex. The present invention applies the idea of dimensionality reduction: based on the dependence structure within the correlation matrix of the original variables, it performs factor analysis on the original feature vector of each user and determines each user's composite feature vector from the user's values under the factor variables, so that the more important composite features selected can be used for cluster analysis. In this way the original indicators are synthesized into fewer indicators and variables with intricate relationships are reduced to a few composite factors, which avoids an overly cumbersome set of indicator features, reduces the number of factors in the cluster analysis, and lowers the time cost and database resource consumption of user value classification.
In the prior art, weights are often assigned to each indicator feature empirically, which easily leads to inaccurate user value grouping when the weight assigned to an indicator feature is too large or too small. By taking the common factors whose factor scores satisfy a preset rule as factor variables, the present invention avoids assigning weights to indicator features empirically, improves the accuracy of weight allocation, and thus improves the accuracy of user value classification.
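In certain embodiments, the mathematical model of factor analysis is:
$$
\begin{cases}
x_1 = a_{11}F_1 + a_{12}F_2 + \cdots + a_{1k}F_k + \varepsilon_1 \\
x_2 = a_{21}F_1 + a_{22}F_2 + \cdots + a_{2k}F_k + \varepsilon_2 \\
\quad \vdots \\
x_p = a_{p1}F_1 + a_{p2}F_2 + \cdots + a_{pk}F_k + \varepsilon_p
\end{cases}
$$
The above factor-analysis model can also be expressed in matrix form:
$$
\begin{pmatrix} x_1 \\ x_2 \\ \vdots \\ x_p \end{pmatrix}
=
\begin{pmatrix}
a_{11} & a_{12} & \cdots & a_{1k} \\
a_{21} & a_{22} & \cdots & a_{2k} \\
\vdots & \vdots & \ddots & \vdots \\
a_{p1} & a_{p2} & \cdots & a_{pk}
\end{pmatrix}
\begin{pmatrix} F_1 \\ F_2 \\ \vdots \\ F_k \end{pmatrix}
+
\begin{pmatrix} \varepsilon_1 \\ \varepsilon_2 \\ \vdots \\ \varepsilon_p \end{pmatrix}
$$
which is abbreviated as:
$$ X = AF + \varepsilon $$
where X is the original variable matrix of the factor analysis, A is the factor loading matrix, F is the common factor matrix, p is the number of indicator features, k is the number of common factors with k ≤ p, and ε is the specific factor.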
The factor score of each common factor can be regarded as the variance contribution ratio of that common factor: the higher the factor score, the more important the common factor. In an optional embodiment of the present invention, the analysis module 305 takes the common factors whose factor scores satisfy a preset rule as factor variables; for example, the analysis module 305 takes the common factors whose factor scores exceed a set threshold as factor variables. Of course, the analysis module 305 can also sort the common factors in descending order of factor score to obtain a common-factor sequence and take the first n common factors in the sequence as factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers. The value of n can be determined according to actual conditions, for example n = 5. Provided that the required accuracy of user value classification is still met, appropriately reducing n reduces the computation and processing time of the cluster analysis and improves the efficiency of user value classification.
The cluster module 306 performs cluster analysis on the composite feature vector of each user to determine the category of each user. Clustering is the process of grouping data objects into different classes (clusters) such that data objects in the same class are highly similar and data objects in different classes differ greatly. Clustering the composite feature vectors of the users avoids grouping user value by fixed quantile cut points, which improves the objectivity and accuracy of user value grouping.
To assign data objects to classes, cluster-analysis methods such as partitioning methods, hierarchical methods, density-based methods, grid-based methods, and model-based methods can be used. Each of these families has widely used clustering algorithms, for example the K-means algorithm among partitioning methods, agglomerative hierarchical clustering among hierarchical methods, and neural-network-based clustering among model-based methods. Of course, to avoid rigidly assigning a data object to a single class, fuzzy cluster analysis can also be used; for example, a fuzzy clustering algorithm uses a membership function to determine the degree to which each data object belongs to each class. Those skilled in the art can select a suitable clustering method and algorithm according to actual conditions, and the present invention does not specifically limit this.
The number of cluster centers can be determined from experience or obtained through model training. In an optional embodiment of the present invention, the cluster module 306 determines the number K of cluster centers using the elbow method. For example, a range for K can be pre-estimated, typically 2 to 15, and the cost-function value for each K is then plotted. The cost function is the sum of the distortions of all classes and can be written as:
$$ J = \sum_{m=1}^{K} \sum_{y_i \in C_m} \lVert y_i - \mu_m \rVert^2 $$
where C_m is the set of all points in the m-th class, y_i is any point in the m-th class, and μ_m is the center position of the m-th class. The distortion of each class equals the sum of the squared distances between the class center and each point in the class. The more compact the points within a class, the smaller its distortion; conversely, the more dispersed the members of a class, the larger its distortion. As K increases the average distortion falls; the K value at the bend of the curve, where this fall is most pronounced before leveling off, is the elbow, and that K value is taken as the number of cluster centers.
Optionally, the cluster module 306 clusters the composite feature vectors of the users as follows:
Selecting an initial center value for the center point of each cluster;
Updating the center value of each center point iteratively, where in each iteration: the distance between each user and each center point is determined from the user's composite feature vector and the center value of each center point, and each user is assigned to the class of the center point nearest to it; then the center value of each center point is updated;
If the center value of each center point remains unchanged before and after the update, the iteration ends.
In the iterative process of the above embodiment, "remaining unchanged" can mean that the center values before and after an iteration are equal, or that the change in the center values across an iteration does not exceed a preset range.
Fig. 4 shows an exemplary system architecture 400 to which the user value classification method or the user value classification device of embodiments of the present invention can be applied.
As shown in Fig. 4, the system architecture 400 can include terminal devices 401, 402, and 403, a network 404, and a server 405. The network 404 is the medium that provides communication links between the terminal devices 401, 402, 403 and the server 405, and can include various connection types, such as wired or wireless communication links or fiber-optic cables.
A user can use the terminal devices 401, 402, 403 to interact with the server 405 through the network 404 to receive or send messages and the like. Various communication client applications can be installed on the terminal devices 401, 402, 403, such as shopping applications, web browser applications, search applications, instant messaging tools, email clients, and social platform software.
The terminal devices 401, 402, 403 can be various electronic devices that have a display screen and support web browsing, including but not limited to smartphones, tablet computers, laptop computers, and desktop computers.
The server 405 can be a server that provides various services, for example a back-end management server that supports the shopping websites users browse with the terminal devices 401, 402, 403. The back-end management server can analyze and otherwise process received data such as product information query requests, and feed the processing results back to the terminal devices.
It should be noted that the user value classification method provided by embodiments of the present invention is generally executed by the server 405; accordingly, the user value classification device is generally arranged in the server 405.
It should be understood that the numbers of terminal devices, networks, and servers in Fig. 4 are merely illustrative. There can be any number of terminal devices, networks, and servers according to implementation needs.
According to another aspect of the present invention, a cluster-analysis-based user value classification terminal is provided, including:
one or more processors; and
a storage device for storing one or more programs,
wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the cluster-analysis-based user value classification method of the present invention.
Referring now to Fig. 5, which shows a schematic structural diagram of a computer system 500 suitable for implementing a terminal device of embodiments of the present invention. The terminal device shown in Fig. 5 is only an example and should not impose any limitation on the functions or scope of use of embodiments of the present invention.
As shown in Fig. 5, the computer system 500 includes a central processing unit (CPU) 501, which can perform various appropriate actions and processing according to a program stored in a read-only memory (ROM) 502 or a program loaded from a storage portion 508 into a random access memory (RAM) 503. The RAM 503 also stores various programs and data required for the operation of the system 500. The CPU 501, the ROM 502, and the RAM 503 are connected to one another via a bus 504. An input/output (I/O) interface 505 is also connected to the bus 504.
The following components are connected to the I/O interface 505: an input portion 506 including a keyboard, a mouse, and the like; an output portion 507 including, for example, a cathode ray tube (CRT) or liquid crystal display (LCD) and a speaker; a storage portion 508 including a hard disk and the like; and a communication portion 509 including a network interface card such as a LAN card or a modem. The communication portion 509 performs communication processing via a network such as the Internet. A drive 510 is also connected to the I/O interface 505 as needed. A removable medium 511, such as a magnetic disk, an optical disc, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 510 as needed, so that a computer program read from it can be installed into the storage portion 508 as needed.
In particular, according to embodiments disclosed by the present invention, the processes described above with reference to the flowcharts can be implemented as computer software programs. For example, embodiments disclosed by the present invention include a computer program product comprising a computer program carried on a computer-readable medium, the computer program containing program code for executing the methods shown in the flowcharts. In such embodiments, the computer program can be downloaded and installed from a network via the communication portion 509 and/or installed from the removable medium 511. When the computer program is executed by the central processing unit (CPU) 501, the above functions defined in the system of the present invention are performed.
It should be noted that the computer-readable medium shown in the present invention can be a computer-readable signal medium, a computer-readable storage medium, or any combination of the two. A computer-readable storage medium can be, for example but not limited to, an electrical, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the above. More specific examples of computer-readable storage media can include but are not limited to: an electrical connection with one or more wires, a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), optical fiber, portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the above. In the present invention, a computer-readable storage medium can be any tangible medium that contains or stores a program that can be used by or in connection with an instruction execution system, apparatus, or device. In the present invention, a computer-readable signal medium can include a data signal propagated in baseband or as part of a carrier wave, carrying computer-readable program code. Such a propagated data signal can take various forms, including but not limited to an electromagnetic signal, an optical signal, or any suitable combination of the above. A computer-readable signal medium can also be any computer-readable medium other than a computer-readable storage medium that can send, propagate, or transmit a program for use by or in connection with an instruction execution system, apparatus, or device. Program code contained on a computer-readable medium can be transmitted by any appropriate medium, including but not limited to wireless, wire, optical cable, RF, or any suitable combination of the above.
The flowcharts and block diagrams in the accompanying drawings illustrate the possible architecture, functions, and operations of systems, methods, and computer program products according to various embodiments of the present invention. In this regard, each block in a flowchart or block diagram can represent a module, program segment, or portion of code that contains one or more executable instructions for implementing the specified logical function. It should also be noted that in some alternative implementations, the functions noted in the blocks can occur in a different order from that shown in the drawings. For example, two blocks shown in succession can in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each block of the block diagrams or flowcharts, and combinations of such blocks, can be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
The modules involved in embodiments of the present invention can be implemented in software or in hardware. The described modules can also be arranged in a processor; for example, it can be described as: a processor includes a sending module, an acquisition module, a determining module, and a first processing module. The names of these modules do not, in some cases, constitute a limitation on the modules themselves; for example, the sending module can also be described as "a module that sends a picture acquisition request to a connected server".
In another aspect, the present invention further provides a computer-readable medium. The medium may be included in the device described in the above embodiments, or it may exist separately without being assembled into that device. The computer-readable medium carries one or more programs which, when executed by the device, cause the device to implement the user value classification method based on cluster analysis of the present invention.
The above specific embodiments do not limit the scope of protection of the present invention. Those skilled in the art should understand that various modifications, combinations, sub-combinations, and substitutions may occur depending on design requirements and other factors. Any modification, equivalent substitution, or improvement made within the spirit and principles of the present invention shall fall within the scope of protection of the present invention.
Claims (22)
- 1. A user value classification method based on cluster analysis, characterized by comprising: determining an original feature vector for each user according to the user's index values under each index feature; performing factor analysis on the original feature vectors of the users, and taking the common factors whose factor scores satisfy a set rule as factor variables; determining a composite feature vector for each user based on the user's values under the factor variables; and performing cluster analysis on the composite feature vectors of the users to determine the category of each user.
- 2. The user value classification method according to claim 1, characterized in that the common factors are sorted in descending order of factor score to obtain a common factor sequence, and the first n common factors in the common factor sequence are taken as the factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.
- 3. The user value classification method according to claim 1, characterized by further comprising, before performing factor analysis on the original feature vector of each user: identifying, through data cleansing, whether an index value is an outlier, and, if the index value is an outlier, discarding the index value.
- 4. The user value classification method according to claim 3, characterized in that the data cleansing is performed using a box plot.
- 5. The user value classification method according to claim 1, characterized by further comprising, before performing factor analysis on the original feature vector of each user: determining whether an index feature is a negative index, and, if the index feature is a negative index, performing positivization on the index values under the negative index.
- 6. The user value classification method according to claim 5, characterized in that positivization is performed on the index values under the negative index as follows: obtaining the maximum index value under the negative index; and taking the difference between the maximum index value and the index value as the index value under the negative index after positivization.
- 7. The user value classification method according to claim 1, characterized by further comprising, before performing factor analysis on the original feature vector of each user: standardizing the index values.
- 8. The user value classification method according to claim 7, characterized in that the index values are standardized as follows: obtaining the maximum value and the minimum value of an index feature; and taking the ratio of the difference between the index value and the minimum value to the difference between the maximum value and the minimum value as the standardized index value under that index feature.
- 9. The user value classification method according to claim 1, characterized in that the number of cluster center points is determined using the elbow method.
- 10. The user value classification method according to claim 1, characterized in that performing cluster analysis on the composite feature vectors of the users comprises: selecting a center value for the center point of each cluster; and updating the center value of each center point iteratively, wherein, in each iteration, the distance between each user and each center point is determined based on the user's composite feature vector and the center value of that center point, and each user is assigned to the cluster of the center point nearest to it; the center value of each center point is then updated; and if the center value of every center point remains unchanged before and after the update, the iteration ends.
- 11. A user value classification device based on cluster analysis, characterized by comprising: an acquisition module, configured to determine an original feature vector for each user according to the user's index values under each index feature; an analysis module, configured to perform factor analysis on the original feature vectors of the users, take the common factors whose factor scores satisfy a set rule as factor variables, and determine a composite feature vector for each user based on the user's values under the factor variables; and a cluster module, configured to perform cluster analysis on the composite feature vectors of the users to determine the category of each user.
- 12. The user value classification device according to claim 11, characterized in that the analysis module sorts the common factors in descending order of factor score to obtain a common factor sequence, and takes the first n common factors in the common factor sequence as the factor variables, where 1 ≤ n < k, k is the number of common factors, and n and k are integers.
- 13. The user value classification device according to claim 11, characterized by further comprising a cleaning module, configured to identify whether an index value is an outlier and, if the index value is an outlier, to discard the index value.
- 14. The user value classification device according to claim 13, characterized in that the cleaning module performs data cleansing using a box plot.
- 15. The user value classification device according to claim 11, characterized by further comprising a positivization module, configured to determine whether an index feature is a negative index and, if the index feature is a negative index, to perform positivization on the index values under the negative index.
- 16. The user value classification device according to claim 15, characterized in that the positivization module performs positivization on the index values under the negative index as follows: obtaining the maximum index value under the negative index; and taking the difference between the maximum index value and the index value as the index value under the negative index after positivization.
- 17. The user value classification device according to claim 11, characterized by further comprising a standardization module, configured to standardize the index values.
- 18. The user value classification device according to claim 17, characterized in that the standardization module standardizes the index values as follows: obtaining the maximum value and the minimum value of an index feature; and taking the ratio of the difference between the index value and the minimum value to the difference between the maximum value and the minimum value as the standardized index value under that index feature.
- 19. The user value classification device according to claim 11, characterized in that the cluster module determines the number of cluster center points using the elbow method.
- 20. The user value classification device according to claim 11, characterized in that the cluster module performs cluster analysis on the composite feature vectors of the users by: selecting a center value for the center point of each cluster; and updating the center value of each center point iteratively, wherein, in each iteration, the distance between each user and each center point is determined based on the user's composite feature vector and the center value of that center point, and each user is assigned to the cluster of the center point nearest to it; the center value of each center point is then updated; and if the center value of every center point remains unchanged before and after the update, the iteration ends.
- 21. A user value classification terminal based on cluster analysis, characterized by comprising: one or more processors; and a storage device for storing one or more programs, wherein, when the one or more programs are executed by the one or more processors, the one or more processors implement the method according to any one of claims 1 to 10.
- 22. A computer-readable medium having a computer program stored thereon, characterized in that the program, when executed by a processor, implements the method according to any one of claims 1 to 10.
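The preprocessing of claims 3 to 8 (box-plot cleansing, positivization of negative indices, and min-max standardization) can be sketched in plain Python. This sketch rests on several assumptions that are not in the source:

- The box-plot fences use the conventional 1.5 × IQR whisker rule.
- An outlier causes the whole user row to be dropped, so that the remaining feature vectors stay aligned.
- A "negative index" is an index for which a smaller value is better.
- A constant column is standardized to zeros.

The function names are hypothetical.

```python
from statistics import quantiles


def boxplot_fences(values, whisker=1.5):
    """Lower/upper box-plot fences; assumes the conventional 1.5 * IQR whisker rule."""
    q1, _, q3 = quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - whisker * iqr, q3 + whisker * iqr


def clean_rows(rows, whisker=1.5):
    """Claims 3-4 (sketch): drop every user whose value in any index column lies
    outside that column's box-plot fences. Dropping the whole row is an
    assumption made so that the remaining feature vectors stay aligned."""
    fences = [boxplot_fences(col, whisker) for col in zip(*rows)]
    return [row for row in rows
            if all(lo <= v <= hi for v, (lo, hi) in zip(row, fences))]


def positivize(values):
    """Claim 6: x' = max(x) - x for a negative (smaller-is-better) index."""
    top = max(values)
    return [top - v for v in values]


def min_max(values):
    """Claim 8: x' = (x - min) / (max - min)."""
    lo, hi = min(values), max(values)
    if hi == lo:  # assumption: a constant column maps to all zeros
        return [0.0] * len(values)
    return [(v - lo) / (hi - lo) for v in values]


def preprocess(rows, negative_cols):
    """Clean outliers, positivize negative index columns, then standardize every column."""
    rows = clean_rows(rows)
    cols = [list(col) for col in zip(*rows)]
    for j in negative_cols:
        cols[j] = positivize(cols[j])
    cols = [min_max(col) for col in cols]
    return [list(row) for row in zip(*cols)]


users = [[1, 10], [2, 20], [3, 30], [4, 40], [100, 50]]
print(preprocess(users, negative_cols=[1]))
```

In the example, the fifth user is dropped because 100 lies above the upper fence of the first column. The second column is treated as a negative index, so it is reversed before scaling and the lowest raw value ends up at 1.0.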
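Claims 1 and 2 select the factor variables by ranking the common factors by factor score and keeping the first n, with 1 ≤ n < k. Each user's composite feature vector is then formed from the user's values under those factors. The minimal sketch below assumes the factor scores and per-user factor values have already been produced by some factor-analysis step, which the claims do not pin down. The factor names such as `F1` are hypothetical.

```python
def select_factor_variables(factor_scores, n):
    """Claim 2 (sketch): sort common factors by score, high to low, keep the first n.

    factor_scores maps a (hypothetical) factor name to its factor score; how the
    score is computed is outside this sketch.
    """
    k = len(factor_scores)
    if not 1 <= n < k:
        raise ValueError(f"need 1 <= n < k, got n={n}, k={k}")
    ranked = sorted(factor_scores, key=factor_scores.get, reverse=True)
    return ranked[:n]


def composite_vectors(user_factor_values, selected):
    """Claim 1 (sketch): a user's composite feature vector is the user's
    values under the selected factor variables, in ranked order."""
    return {user: [values[f] for f in selected]
            for user, values in user_factor_values.items()}


scores = {"F1": 0.2, "F2": 0.5, "F3": 0.3}
chosen = select_factor_variables(scores, 2)
print(chosen, composite_vectors({"u1": {"F1": 1, "F2": 2, "F3": 3}}, chosen))
```

The guard mirrors the claim's constraint 1 ≤ n < k, so at least one factor is always discarded.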
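Claims 9 and 10 describe a K-means-style loop: users are assigned to the nearest center, the centers are recomputed, and the loop stops when the centers no longer change. The number of clusters is chosen with the elbow method. The sketch below rests on several assumptions, none of which the claims specify:

- The distance is Euclidean (`math.dist`, Python 3.8+).
- Initial centers are chosen deterministically by farthest-first traversal.
- An empty cluster keeps its previous center.
- The elbow is located automatically as the largest second difference of the within-cluster SSE curve. In practice the elbow is often read off a plot.

`elbow_k` needs 3 ≤ `k_max` ≤ the number of users.

```python
import math


def init_centers(points, k):
    """Assumption: deterministic farthest-first initial centers."""
    centers = [list(points[0])]
    while len(centers) < k:
        far = max(points, key=lambda p: min(math.dist(p, c) for c in centers))
        centers.append(list(far))
    return centers


def kmeans(points, k, max_iter=100):
    """Claim 10 (sketch): assign to nearest center, update centers, stop when unchanged."""
    centers = init_centers(points, k)
    labels = []
    for _ in range(max_iter):
        # Assignment step: index of the nearest center (Euclidean distance).
        labels = [min(range(k), key=lambda c: math.dist(p, centers[c])) for p in points]
        # Update step: each center becomes the mean of its members.
        new_centers = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centers.append([sum(dim) / len(members) for dim in zip(*members)])
            else:
                new_centers.append(centers[c])  # empty cluster keeps its center
        if new_centers == centers:  # unchanged before and after the update
            break
        centers = new_centers
    return labels, centers


def sse(points, labels, centers):
    """Within-cluster sum of squared distances to the assigned centers."""
    return sum(math.dist(p, centers[lab]) ** 2 for p, lab in zip(points, labels))


def elbow_k(points, k_max):
    """Claim 9 (sketch): pick k at the sharpest bend of the SSE curve.
    Assumption: the bend is the largest second difference of SSE."""
    curve = []
    for k in range(1, k_max + 1):
        labels, centers = kmeans(points, k)
        curve.append(sse(points, labels, centers))
    best = max(range(1, len(curve) - 1),
               key=lambda i: curve[i - 1] - 2 * curve[i] + curve[i + 1])
    return best + 1, curve


pts = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10), (20, 0), (20, 1), (21, 0)]
k, curve = elbow_k(pts, k_max=5)
print(k, kmeans(pts, k)[0])
```

On three well-separated groups of users, the SSE drops sharply up to k = 3 and then flattens, so the heuristic selects three clusters. Each group ends up sharing one label.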
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710555480.1A CN107480187A (en) | 2017-07-10 | 2017-07-10 | User's value category method and apparatus based on cluster analysis |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710555480.1A CN107480187A (en) | 2017-07-10 | 2017-07-10 | User's value category method and apparatus based on cluster analysis |
Publications (1)
Publication Number | Publication Date |
---|---|
CN107480187A (en) | 2017-12-15 |
Family
ID=60594946
Family Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201710555480.1A Pending CN107480187A (en) | 2017-07-10 | 2017-07-10 | User's value category method and apparatus based on cluster analysis |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN107480187A (en) |
Cited By (21)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197224A (en) * | 2017-12-28 | 2018-06-22 | 广州虎牙信息科技有限公司 | User group sorting technique, storage medium and terminal |
CN108664653A (en) * | 2018-05-18 | 2018-10-16 | 拓普暨达(广州)基因精准医疗科技有限公司 | A kind of Medical Consumption client's automatic classification method based on K-means |
CN109063885A (en) * | 2018-05-29 | 2018-12-21 | 国网天津市电力公司 | A kind of substation's exception metric data prediction technique |
CN109711484A (en) * | 2019-01-10 | 2019-05-03 | 哈步数据科技(上海)有限公司 | A kind of classification method and system of customer |
CN109949068A (en) * | 2019-01-09 | 2019-06-28 | 深圳北斗应用技术研究院有限公司 | A kind of real time pooling vehicle method and apparatus based on prediction result |
CN110858313A (en) * | 2018-08-24 | 2020-03-03 | 国信优易数据有限公司 | Crowd classification method and crowd classification system |
CN111464583A (en) * | 2019-01-22 | 2020-07-28 | 阿里巴巴集团控股有限公司 | Computing resource allocation method, device, server and storage medium |
CN111767956A (en) * | 2020-06-30 | 2020-10-13 | 苏州科达科技股份有限公司 | Image tampering detection method, electronic device, and storage medium |
CN111797848A (en) * | 2019-04-09 | 2020-10-20 | 成都鼎桥通信技术有限公司 | User classification method, device, equipment and storage medium |
CN112183885A (en) * | 2020-10-21 | 2021-01-05 | ***股份有限公司 | Position determination method and device |
CN112446644A (en) * | 2020-12-11 | 2021-03-05 | 福州数据技术研究院有限公司 | Method and device for improving quality of network questionnaire |
CN112907035A (en) * | 2021-01-27 | 2021-06-04 | 厦门卫星定位应用股份有限公司 | K-means-based transportation subject credit rating method and device |
CN113159829A (en) * | 2021-03-19 | 2021-07-23 | 北京京东拓先科技有限公司 | Virtual asset allocation method and device and electronic equipment |
CN113256693A (en) * | 2021-06-04 | 2021-08-13 | 武汉工控仪器仪表有限公司 | Multi-view registration method based on K-means and normal distribution transformation |
CN113532801A (en) * | 2021-06-24 | 2021-10-22 | 四川九洲电器集团有限责任公司 | High/multispectral camera dead pixel detection method and system based on distribution quantile |
CN113706182A (en) * | 2020-05-20 | 2021-11-26 | 北京沃东天骏信息技术有限公司 | User classification method and device |
CN113779275A (en) * | 2021-09-18 | 2021-12-10 | 中国平安人寿保险股份有限公司 | Feature extraction method, device and equipment based on medical data and storage medium |
CN113837780A (en) * | 2020-06-23 | 2021-12-24 | 上海莉莉丝科技股份有限公司 | Information delivery method, system, device and medium |
CN114662629A (en) * | 2022-03-23 | 2022-06-24 | 中国邮电器材集团有限公司 | Method and device for identifying industrial code in multi-level node structure |
CN115409549A (en) * | 2022-08-23 | 2022-11-29 | 中国民航信息网络股份有限公司 | Data processing method, system, electronic equipment and computer storage medium |
CN116796075A (en) * | 2023-08-24 | 2023-09-22 | 四维世景科技(北京)有限公司 | Method and device for analyzing problem data |
Citations (6)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN103295079A (en) * | 2013-06-09 | 2013-09-11 | 国家电网公司 | Electric power multi-objective decision support method based on intelligent data mining model |
CN104899331A (en) * | 2015-06-24 | 2015-09-09 | Tcl集团股份有限公司 | Television used behavior data clustering method and device and Spark big data platform |
CN105761110A (en) * | 2016-02-19 | 2016-07-13 | 北京京东尚科信息技术有限公司 | Cross-equipment user value analysis method and cross-equipment user value analysis device |
CN106022800A (en) * | 2016-05-16 | 2016-10-12 | 北京百分点信息科技有限公司 | User feature data processing method and device |
CN106294882A (en) * | 2016-08-30 | 2017-01-04 | 北京京东尚科信息技术有限公司 | Data digging method and device |
CN106408008A (en) * | 2016-09-08 | 2017-02-15 | 国网江西省电力公司赣州供电分公司 | Load curve distance and shape-based load classification method |
- 2017-07-10: Application CN201710555480.1A filed in CN (publication CN107480187A), status: active, Pending
Cited By (28)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN108197224B (en) * | 2017-12-28 | 2020-11-20 | 广州虎牙信息科技有限公司 | User group classification method, storage medium and terminal |
CN108197224A (en) * | 2017-12-28 | 2018-06-22 | 广州虎牙信息科技有限公司 | User group sorting technique, storage medium and terminal |
CN108664653A (en) * | 2018-05-18 | 2018-10-16 | 拓普暨达(广州)基因精准医疗科技有限公司 | A kind of Medical Consumption client's automatic classification method based on K-means |
CN109063885A (en) * | 2018-05-29 | 2018-12-21 | 国网天津市电力公司 | A kind of substation's exception metric data prediction technique |
CN110858313A (en) * | 2018-08-24 | 2020-03-03 | 国信优易数据有限公司 | Crowd classification method and crowd classification system |
CN110858313B (en) * | 2018-08-24 | 2023-01-31 | 国信优易数据股份有限公司 | Crowd classification method and crowd classification system |
CN109949068A (en) * | 2019-01-09 | 2019-06-28 | 深圳北斗应用技术研究院有限公司 | A kind of real time pooling vehicle method and apparatus based on prediction result |
CN109711484A (en) * | 2019-01-10 | 2019-05-03 | 哈步数据科技(上海)有限公司 | A kind of classification method and system of customer |
CN111464583A (en) * | 2019-01-22 | 2020-07-28 | 阿里巴巴集团控股有限公司 | Computing resource allocation method, device, server and storage medium |
CN111797848A (en) * | 2019-04-09 | 2020-10-20 | 成都鼎桥通信技术有限公司 | User classification method, device, equipment and storage medium |
CN111797848B (en) * | 2019-04-09 | 2023-10-24 | 成都鼎桥通信技术有限公司 | User classification method, device, equipment and storage medium |
CN113706182A (en) * | 2020-05-20 | 2021-11-26 | 北京沃东天骏信息技术有限公司 | User classification method and device |
CN113837780A (en) * | 2020-06-23 | 2021-12-24 | 上海莉莉丝科技股份有限公司 | Information delivery method, system, device and medium |
CN111767956A (en) * | 2020-06-30 | 2020-10-13 | 苏州科达科技股份有限公司 | Image tampering detection method, electronic device, and storage medium |
CN112183885A (en) * | 2020-10-21 | 2021-01-05 | ***股份有限公司 | Position determination method and device |
CN112446644A (en) * | 2020-12-11 | 2021-03-05 | 福州数据技术研究院有限公司 | Method and device for improving quality of network questionnaire |
CN112907035A (en) * | 2021-01-27 | 2021-06-04 | 厦门卫星定位应用股份有限公司 | K-means-based transportation subject credit rating method and device |
CN112907035B (en) * | 2021-01-27 | 2022-08-05 | 厦门卫星定位应用股份有限公司 | K-means-based transportation subject credit rating method and device |
CN113159829A (en) * | 2021-03-19 | 2021-07-23 | 北京京东拓先科技有限公司 | Virtual asset allocation method and device and electronic equipment |
CN113256693A (en) * | 2021-06-04 | 2021-08-13 | 武汉工控仪器仪表有限公司 | Multi-view registration method based on K-means and normal distribution transformation |
CN113532801A (en) * | 2021-06-24 | 2021-10-22 | 四川九洲电器集团有限责任公司 | High/multispectral camera dead pixel detection method and system based on distribution quantile |
CN113779275A (en) * | 2021-09-18 | 2021-12-10 | 中国平安人寿保险股份有限公司 | Feature extraction method, device and equipment based on medical data and storage medium |
CN113779275B (en) * | 2021-09-18 | 2024-02-09 | 中国平安人寿保险股份有限公司 | Feature extraction method, device, equipment and storage medium based on medical data |
CN114662629A (en) * | 2022-03-23 | 2022-06-24 | 中国邮电器材集团有限公司 | Method and device for identifying industrial code in multi-level node structure |
CN115409549A (en) * | 2022-08-23 | 2022-11-29 | 中国民航信息网络股份有限公司 | Data processing method, system, electronic equipment and computer storage medium |
CN115409549B (en) * | 2022-08-23 | 2024-05-14 | 中国民航信息网络股份有限公司 | Data processing method, system, electronic equipment and computer storage medium |
CN116796075A (en) * | 2023-08-24 | 2023-09-22 | 四维世景科技(北京)有限公司 | Method and device for analyzing problem data |
CN116796075B (en) * | 2023-08-24 | 2023-10-31 | 四维世景科技(北京)有限公司 | Method and device for analyzing problem data |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN107480187A (en) | User's value category method and apparatus based on cluster analysis | |
Elliott et al. | Efficient tests for general persistent time variation in regression coefficients | |
CN108133418A (en) | Real-time credit risk management system | |
Chanta et al. | The minimum p-envy location problem: a new model for equitable distribution of emergency resources | |
CN110135901A (en) | A kind of enterprise customer draws a portrait construction method, system, medium and electronic equipment | |
CN107220217A (en) | Characteristic coefficient training method and device that logic-based is returned | |
CN109840730B (en) | Method and device for data prediction | |
Kim et al. | A patent-based approach for the identification of technology-based service opportunities | |
CN107729915A (en) | For the method and system for the key character for determining machine learning sample | |
CN107833063A (en) | Pharmacy member is lost in early warning and intelligent interfering system and method | |
CN106294882A (en) | Data digging method and device | |
CN113742492A (en) | Insurance scheme generation method and device, electronic equipment and storage medium | |
CN107885784A (en) | The method and apparatus for extracting user characteristic data | |
Sharma et al. | Financial fluctuations anchored to economic fundamentals: A mesoscopic network approach | |
CN115375177A (en) | User value evaluation method and device, electronic equipment and storage medium | |
CN113130052A (en) | Doctor recommendation method, doctor recommendation device, terminal equipment and storage medium | |
CN115983900A (en) | Method, apparatus, device, medium, and program product for constructing user marketing strategy | |
CN113450141A (en) | Intelligent prediction method and device based on electricity selling quantity characteristics of large-power customer groups | |
Simons et al. | A cross-disciplinary technology transfer for search-based evolutionary computing: from engineering design to software engineering design | |
WO2023185125A1 (en) | Product resource data processing method and apparatus, electronic device and storage medium | |
CN115759014A (en) | Dynamic intelligent analysis method and system and electronic equipment | |
CN113496236B (en) | User tag information determining method, device, equipment and storage medium | |
CN113379456A (en) | Credit interest rate differential pricing method, device, equipment and storage medium | |
Koole | An Introduction to Business Analytics | |
CN112288482A (en) | Virtual resource pool construction method, system, equipment and storage medium |
Legal Events
Code | Title | Description |
---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20171215 |