CN107103336A - Mixed-attribute data clustering method based on density peaks - Google Patents

Mixed-attribute data clustering method based on density peaks (Download PDF)

Info

Publication number
CN107103336A
CN107103336A CN201710294126.8A CN201710294126A CN107103336A CN 107103336 A CN107103336 A CN 107103336A CN 201710294126 A CN201710294126 A CN 201710294126A CN 107103336 A CN107103336 A CN 107103336A
Authority
CN
China
Prior art keywords
point
data
clustered
mixed attributes
distance
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201710294126.8A
Other languages
Chinese (zh)
Inventor
刘世华
叶展翔
周炳忠
张�浩
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Wenzhou Polytechnic
Original Assignee
Wenzhou Polytechnic
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Wenzhou Polytechnic filed Critical Wenzhou Polytechnic
Priority to CN201710294126.8A priority Critical patent/CN107103336A/en
Publication of CN107103336A publication Critical patent/CN107103336A/en
Pending legal-status Critical Current

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00Pattern recognition
    • G06F18/20Analysing
    • G06F18/23Clustering techniques
    • G06F18/232Non-hierarchical techniques

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the invention discloses a mixed-attribute data clustering method based on density peaks, comprising: obtaining a mixed-attribute data set to be clustered, and calculating the distance between every two data points in the data set as well as the cutoff distance of the data set; obtaining the local density of each data point from the pairwise distances and the cutoff distance, and calculating each point's relative distance; defining the γ parameter curve formed by the local density and relative distance of each data point and obtaining the γ parameter values; building a knee-point index matrix from the sequence number, γ parameter value and relative distance of each data point, and obtaining the cluster center points with a preset knee-point algorithm; and, according to the cluster center points, producing and outputting the clustering result of the mixed-attribute data set to be clustered. Compared with the traditional k-prototypes algorithm, the embodiment of the invention clusters better, runs more efficiently, finds the number of clusters automatically and is insensitive to outliers.

Description

Mixed-attribute data clustering method based on density peaks
Technical field
The present invention relates to the field of computer data mining and processing technology, and more particularly to a mixed-attribute data clustering method based on density peaks.
Background technology
Clustering analysis has long been one of the research hotspots in data mining and machine learning. With the development of the big-data era, all kinds of data emerge endlessly, and most of them contain several attribute types at the same time, such as numerical and categorical attributes, whereas traditional clustering algorithms such as K-Means are designed mainly for numerical attribute data. To handle the clustering of mixed-attribute data, researchers have proposed various solutions, which can be roughly divided by technical route into type-conversion methods, cluster-ensemble methods, prototype-based methods, density-based methods and hierarchy-based methods.
Type-conversion methods convert one kind of attribute into another and then cluster the converted data. For example, the SpectralCAT algorithm proposed by David and Averbuch first converts numerical attributes into categorical attributes and then processes the converted data with a spectral clustering method.
The idea of cluster ensembles is to partition one group of objects with several algorithms and then merge the results of the different algorithms with a consensus function to obtain the final clustering. It was first proposed by A. Strehl and J. Ghosh in 2002 and has since become one of the mainstream approaches to mixed-attribute clustering. Zhao Yu et al. proposed CEMC, a mixed-attribute clustering algorithm based on cluster ensembles, introducing the ensemble methodology into the mixed-attribute clustering problem. He et al. proposed CEBMDC, a mixed-attribute clustering algorithm based on cluster ensembles and the Squeezer algorithm, in which both the clustering of the categorical attribute subset and the final ensemble step are carried out with Squeezer.
The k-prototypes algorithm proposed by Huang in 1997 follows the basic idea of k-means: it combines the cluster center of the numerical attributes with the modes of the categorical attributes to form a new mixed-attribute data center called a prototype, and constructs a prototype-based distance measure and cost function for mixed-attribute data, so that the mixed attributes are clustered directly with a k-means-like procedure. Prototype-based methods are simple in idea and efficient; their key lies in the definition of the distance measure between data tuples. Yiu-ming Cheung et al. proposed a unified similarity metric, which normalizes the distance measure of the numerical part so that the similarity value is constrained to the interval [0,1], assigns a weight to the similarity measure of each categorical attribute and normalizes it, and finally obtains a unified distance measure formula. Based on this formula they proposed an iterative algorithm, OCIL, to cluster mixed-attribute data; by further introducing a competition and penalization mechanism, they improved OCIL and proposed a mixed-attribute clustering algorithm able to determine the number of clusters automatically (PCL-OC). Their experiments comparing OCIL with k-prototypes show a considerable improvement in clustering accuracy, but the computation of the unified metric value is relatively expensive.
Li and Biswas proposed the SBAC (Similarity Based Agglomerative Clustering) algorithm [i], an agglomerative hierarchical clustering algorithm based on the Goodall similarity. The method works fairly well, but its computational complexity is higher than O(n^2 log n).
The RDBC_M algorithm proposed by Huang et al. uses a dimension-oriented distance formula that computes the distance for each dimension independently: Euclidean distance is used for numerical attributes, while for categorical attributes a distance matrix, in which the similarity between the different values of an attribute is defined through expert scoring, is used to measure the per-dimension distance; building this matrix requires manual scoring.
The MDCDen and DC-MDACC algorithms proposed by Chen Jinyin et al. divide mixed-attribute data into three classes, numerically dominated, categorically dominated and balanced mixed-attribute data, and then define a different distance metric function for each class. They require a dominance analysis of the data set first.
The prototype-based methods above still suffer from drawbacks such as the need to specify the number of clusters, sensitivity to the choice of initial cluster centers, the inability to find clusters of arbitrary shape and sensitivity to abnormal points. Hierarchy-based methods have high time and space complexity and an irreversible clustering process. The similarity measure of categorical attributes in the RDBC_M algorithm requires evaluation and assignment by domain experts, and the MDCDen algorithm needs three parameters to be tuned to obtain good results.
In 2014, Alex Rodriguez and Alessandro Laio published in the journal Science a clustering algorithm that quickly searches for and finds density peaks (referred to herein as the DPC algorithm). The algorithm clusters well, is efficient, has few parameters, can discover the number of clusters, can cluster data of different shapes and identifies outliers automatically. The input of the DPC algorithm is the distance matrix between data points, so once the distance measurement between the data points of mixed-attribute data is solved, the algorithm can be used directly for clustering analysis; however, no research report on clustering mixed-attribute data with the DPC algorithm has been found so far.
Therefore, a reasonable method for computing the distance between mixed-attribute data points and a clustering method for mixed-attribute data are urgently needed that cluster better than the traditional k-prototypes algorithm, run efficiently, find the number of clusters automatically and are insensitive to outliers.
Summary of the invention
The purpose of the embodiments of the present invention is to provide a mixed-attribute data clustering method based on density peaks that clusters better than the traditional k-prototypes algorithm, runs efficiently, finds the number of clusters automatically and is insensitive to outliers.
In order to solve the above technical problem, an embodiment of the invention provides a mixed-attribute data clustering method based on density peaks, the method comprising:
S1, obtaining a mixed-attribute data set to be clustered, and, according to the mixed-attribute data set to be clustered, calculating the distance between every two data points in the data set as well as the cutoff distance of the data set;
S2, obtaining the local density of each data point in the mixed-attribute data set to be clustered from the calculated pairwise distances and cutoff distance, and further calculating the relative distance of each data point in the data set from the obtained local densities;
S3, defining the γ parameter curve formed by the local density of each data point in the mixed-attribute data set to be clustered and its corresponding relative distance, and determining the γ parameter value of each data point in the data set;
S4, building a knee-point index matrix from the sequence number, γ parameter value and relative distance of each data point in the mixed-attribute data set to be clustered, solving the built knee-point index matrix with a preset knee-point algorithm, and obtaining the cluster center points of the data set;
S5, according to the obtained cluster center points of the mixed-attribute data set to be clustered, producing and outputting the clustering result of the data set; wherein every data point of the data set other than the obtained cluster center points is assigned to the cluster of its nearest neighbor with higher local density, completing the representation and output of the clustering result.
Wherein the distance between every two data points in the mixed-attribute data set to be clustered is computed by the formula D(Xi,Xj) = d(Xi,Xj)^r + d(Xi,Xj)^c, where d(Xi,Xj)^r denotes the distance of the numerical attribute part of the data set and d(Xi,Xj)^c the distance of the categorical attribute part;
wherein d(Xi,Xj)^r is realized by a formula that takes the Euclidean distance between the numerical attributes of data points Xi and Xj after normalization, so that the distance value d(Xi,Xj)^r lies in the interval [0,1];
wherein d(Xi,Xj)^c is realized by an entropy-weighted matching formula, in which the matching distance of data points Xi and Xj on the t-th categorical attribute is weighted by the entropy weight of that attribute, and p(a_ts) is the probability of the s-th value (s = 1, 2, ..., m_t) appearing when the total number of distinct values of the t-th categorical attribute is m_t.
Wherein the γ parameter value of each data point in the mixed-attribute data set to be clustered is obtained by the formula γi = ρi × δi, where γi is the γ parameter value of the i-th data point, ρi is the local density of the i-th data point and δi is the relative distance of the i-th data point.
Wherein the step S4 specifically includes:
determining the sequence number, γ parameter value and relative distance of each data point in the mixed-attribute data set to be clustered, and further forming a sequence-number set, a γ parameter-value set and a relative-distance set, respectively; wherein the sequence-number set is I = [1, 2, ..., n], the γ parameter-value set is γ = [γ1, γ2, ..., γn] and the relative-distance set is δ = [δ1, δ2, ..., δn], n being the total number of data points in the mixed-attribute data set to be clustered and a positive integer;
building, from the formed sequence-number set, γ parameter-value set and relative-distance set, the knee-point index matrix CT = [I; γ; δ]; wherein CT = [I; γ; δ] is the 3×n matrix whose row vectors are the sequence numbers, γ parameter values and relative distances of the data points and whose number of columns is the total number of data points n;
rearranging the knee-point index matrix CT = [I; γ; δ] so that its columns are sorted first by the γ row in descending order and then by the δ row in descending order, obtaining the adjusted knee-point index matrix; computing the second derivative of the γ row of the adjusted matrix and taking the resulting position as the knee point; and retaining, in the adjusted matrix, all column vectors whose column index is less than or equal to the knee point, forming a candidate center set;
judging whether the number of column vectors in the candidate center set is less than or equal to 2;
if so, taking the data points whose sequence numbers appear in the candidate center set as the cluster center points;
if not, further computing the second derivative of the δ row of the candidate center set, taking the resulting position as the secondary knee point, and taking the data points whose sequence numbers appear in the columns of the candidate center set up to and including the secondary knee point as the cluster center points.
Implementing the embodiments of the present invention has the following beneficial effects:
The embodiments of the present invention provide a unified distance measure formula for mixed-attribute data points, build the knee-point index matrix of the mixed-attribute data points on this basis, and then propose an automatic cluster-center determination method based on the knee point; the method therefore clusters better than the traditional k-prototypes algorithm, runs efficiently, finds the number of clusters automatically and is insensitive to outliers.
Brief description of the drawings
In order to explain the embodiments of the present invention or the technical solutions in the prior art more clearly, the drawings needed in the description of the embodiments or the prior art are briefly introduced below. Obviously, the drawings in the following description are only some embodiments of the present invention; for those of ordinary skill in the art, other drawings obtained from these drawings without creative effort still fall within the scope of the present invention.
Fig. 1 is a flow chart of the mixed-attribute data clustering method based on density peaks provided by an embodiment of the present invention.
Detailed description of the embodiments
In order to make the purpose, technical solution and advantages of the present invention clearer, the present invention is further elaborated below in conjunction with the drawings and embodiments. It should be understood that the specific embodiments described here are only used to explain the present invention and are not intended to limit it.
As shown in Fig. 1, an embodiment of the present invention proposes a mixed-attribute data clustering method based on density peaks, the method comprising:
Step S101, obtaining a mixed-attribute data set to be clustered, and, according to the data set, calculating the distance between every two data points in the data set as well as the cutoff distance of the mixed-attribute data set to be clustered;
The specific process is as follows. The mixed-attribute data set to be clustered is determined; let S = {X1, X2, ..., Xn} be a d-dimensional mixed-attribute data set of n data points to be clustered, in which the j-th data point is represented as Xj = [xj1, xj2, ..., xjd]. Suppose that in S the numerical attributes occupy dr dimensions and the categorical attributes occupy dc dimensions, so that dr + dc = d; for example, the first dr attributes are numerical attributes and the last dc attributes are categorical attributes. The cutoff distance of the mixed-attribute data set S to be clustered is also determined at this step.
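Once the pairwise distances described in the next paragraphs have been computed, the cutoff distance mentioned here can be chosen. A minimal Python sketch is given below; the percentile heuristic (roughly 2% of all pairwise distances below the cutoff) comes from the original DPC paper rather than from this patent, and the function name and default value are assumptions.

```python
import numpy as np

def cutoff_distance(dist, percent=2.0):
    """Cutoff distance chosen so that roughly `percent` percent of all pairwise
    distances fall below it (the rule of thumb from the original DPC paper).
    `dist` is the symmetric n x n matrix of mixed-attribute distances."""
    n = dist.shape[0]
    pairwise = np.sort(dist[np.triu_indices(n, k=1)])  # each unordered pair once
    pos = max(int(round(len(pairwise) * percent / 100.0)) - 1, 0)
    return pairwise[pos]
```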
A unified distance measure is defined for the mixed-attribute data set S to be clustered, and the distance between every two data points in S is calculated. For any two data points Xi and Xj in S, their distance can be obtained by formula (1):
D(Xi,Xj) = d(Xi,Xj)^r + d(Xi,Xj)^c    (1);
In formula (1), d(Xi,Xj)^r denotes the distance of the numerical attribute part of the mixed-attribute data set to be clustered and d(Xi,Xj)^c the distance of the categorical attribute part;
d(Xi,Xj)^r can be expressed by formula (2):
In formula (2), d(Xi,Xj)^r is the Euclidean distance between the numerical attributes of data points Xi and Xj after normalization; because the Euclidean distance is non-negative and the attributes are normalized, the distance value d(Xi,Xj)^r of the numerical attribute part is guaranteed to lie in the interval [0,1].
d(Xi,Xj)^c can be expressed by formula (3):
Formula (3) uses an entropy-weighted matching approach: the matching distance of data points Xi and Xj on the t-th categorical attribute is weighted by the entropy weight of that attribute, and p(a_ts) is the probability of the s-th value (s = 1, 2, ..., m_t) appearing when the total number of distinct values of the t-th categorical attribute is m_t.
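Because formulas (2) and (3) are referred to above only through their descriptions, the Python sketch below gives one plausible reading of the unified distance D = d^r + d^c. The min-max scaling, the division by the square root of the number of numerical dimensions (to keep d^r inside [0,1]) and the use of the raw Shannon entropy of each categorical attribute as its weight are assumptions of the sketch, not formulas confirmed by the text.

```python
import numpy as np

def entropy_weight(column):
    """Shannon entropy of one categorical column, used here as its weight (assumption)."""
    _, counts = np.unique(column, return_counts=True)
    p = counts / counts.sum()
    return float(-(p * np.log(p)).sum())

def mixed_distance(xi_num, xj_num, xi_cat, xj_cat, num_min, num_max, cat_weights):
    """D(Xi, Xj) = d_r + d_c for one pair of mixed-attribute data points.
    num_min / num_max: per-attribute minima and maxima over the whole data set,
    used for min-max scaling of the numerical part.
    cat_weights: one entropy weight per categorical attribute."""
    lo, hi = np.asarray(num_min, float), np.asarray(num_max, float)
    rng = np.where(hi - lo == 0, 1.0, hi - lo)
    diff = (np.asarray(xi_num, float) - np.asarray(xj_num, float)) / rng
    # scaled Euclidean distance divided by sqrt(d_r) so that d_r stays in [0, 1]
    d_r = float(np.sqrt((diff ** 2).sum() / len(diff))) if len(diff) else 0.0
    # entropy-weighted simple matching: 1 wherever the categorical values differ
    mismatch = (np.asarray(xi_cat) != np.asarray(xj_cat)).astype(float)
    d_c = float((np.asarray(cat_weights, float) * mismatch).sum())
    return d_r + d_c
```

Looping mixed_distance over all pairs (i, j) fills the distance matrix that the density-peaks steps below consume.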
Step S102, obtaining the local density of each data point in the mixed-attribute data set to be clustered from the calculated pairwise distances and cutoff distance, and further calculating the relative distance of each data point in the data set from the obtained local densities;
The specific process is as follows. According to the distance D(Xi,Xj) between every two data points in the mixed-attribute data set S to be clustered and the cutoff distance dc, the local density ρi of each data point in S is calculated by formula (4):
In formula (4), ρi is the local density of the i-th data point;
then, by formula (5), the relative distance δi of each data point in S is calculated:
In formula (5), δi is the relative distance of the i-th data point: when the local density of a data point Xi is not the maximum density, its δi is the minimum of the distances from Xi to all points whose density is higher than its own; otherwise, δi is taken as the maximum of the distances from Xi to all other points.
In summary, from the decision graph built from the local density ρi and the distance δi of each data point, the user can clearly discover the number of clusters and select the center points.
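A minimal sketch of step S102, assuming the cutoff-kernel local density of the original DPC paper for formula (4); the relative distance follows the description of formula (5) above. The helper also records each point's nearest denser neighbour, which the assignment in step S105 reuses.

```python
import numpy as np

def density_and_delta(dist, d_c):
    """rho_i: number of points within the cutoff distance d_c (cutoff kernel, assumption).
    delta_i: distance to the nearest point of higher density; for the densest point,
    the maximum distance to any other point, as described above."""
    n = dist.shape[0]
    rho = (dist < d_c).sum(axis=1) - 1          # subtract 1 to exclude the point itself
    delta = np.zeros(n)
    nearest_denser = np.full(n, -1)             # index of each point's nearest denser neighbour
    order = np.argsort(-rho)                    # indices from densest to sparsest
    for rank, i in enumerate(order):
        if rank == 0:
            delta[i] = dist[i].max()            # densest point: farthest distance
        else:
            denser = order[:rank]               # all points denser than point i
            j = denser[np.argmin(dist[i, denser])]
            delta[i] = dist[i, j]
            nearest_denser[i] = j
    return rho, delta, nearest_denser
```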
Step S103, defining the γ parameter curve formed by the local density of each data point in the mixed-attribute data set to be clustered and its corresponding relative distance, and determining the γ parameter value of each data point;
The specific process is as follows. To determine the cluster centers automatically, the γ parameter value γi = ρi × δi is first defined to build the γ parameter curve, and the γ values γi of the curve are arranged in descending order; the points with larger γi must then be points with a larger local density ρi or a larger relative distance δi. Here γi is the γ parameter value of the i-th data point.
Step S104, building a knee-point index matrix from the sequence number, γ parameter value and relative distance of each data point in the mixed-attribute data set to be clustered, solving the built knee-point index matrix with a preset knee-point algorithm, and obtaining the cluster center points of the data set;
The specific process is as follows. The cluster center points can be determined by calculating the two knee points of γi and δi: the center points lying before the knee point all satisfy that both the local density ρi and the distance δi are relatively large. The knee point can therefore be solved according to the definition of the inflection point of a function, by computing the second derivative f''(x) of a function f(x) and finding the point x0 such that f''(x0) = 0 and the sign of f''(x) differs on the two sides of x0. The specific steps are as follows:
The sequence number, γ parameter value and relative distance of each data point in the mixed-attribute data set S to be clustered are determined, and a sequence-number set, a γ parameter-value set and a relative-distance set are further formed respectively; the sequence-number set is I = [1, 2, ..., n], the γ parameter-value set is γ = [γ1, γ2, ..., γn] and the relative-distance set is δ = [δ1, δ2, ..., δn], where n is the total number of data points in S and is a positive integer.
From the sequence-number set, γ parameter-value set and relative-distance set, the knee-point index matrix CT = [I; γ; δ] is built; CT = [I; γ; δ] is the 3×n matrix whose row vectors are the sequence numbers, γ parameter values and relative distances of the data points and whose number of columns is the total number of data points n.
The columns of CT = [I; γ; δ] are rearranged so that they are sorted first by the γ row in descending order and then by the δ row in descending order, giving the adjusted knee-point index matrix; the second derivative of the γ row of the adjusted matrix is computed, the resulting position is taken as the knee point, and all column vectors of the adjusted matrix whose column index is less than or equal to the knee point are retained, forming the candidate center set.
Whether the number of column vectors in the candidate center set is less than or equal to 2 is then judged.
If so, the data points whose sequence numbers appear in the candidate center set are taken as the cluster center points.
If not, the second derivative of the δ row of the candidate center set is further computed and the resulting position is taken as the secondary knee point; the data points whose sequence numbers appear in the columns of the candidate center set up to and including the secondary knee point are taken as the cluster center points.
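Putting steps S103 and S104 together, the sketch below builds the CT matrix and selects the center points. Approximating the second-derivative test by the position of the largest discrete second difference is an assumption; the patent only states that the knee point is obtained from the second derivative.

```python
import numpy as np

def knee_position(values):
    """1-based position of the knee of a descending sequence, taken here as the point
    with the largest discrete second difference (one reading of the test above)."""
    return int(np.argmax(np.diff(values, n=2))) + 2

def select_centers(rho, delta):
    """Build CT = [I; gamma; delta], sorted by gamma (then delta) in descending order,
    and return the 1-based sequence numbers of the selected cluster centre points."""
    gamma = rho * delta
    order = np.lexsort((-delta, -gamma))        # primary key gamma, secondary key delta
    CT = np.vstack([order + 1.0, gamma[order], delta[order]])
    k1 = knee_position(CT[1])                   # knee of the gamma row
    candidates = CT[:, :k1]                     # columns up to and including the knee
    if candidates.shape[1] <= 2:
        return candidates[0].astype(int)
    k2 = knee_position(candidates[2])           # secondary knee on the delta row
    return candidates[0, :k2].astype(int)
```

The bank-credit example that follows illustrates the same two-stage selection.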
As an example, taking bank credit data, the adjusted knee-point index matrix (10 columns are shown) is given in Table 1 below:
Table 1
The second derivative of the γ row of the adjusted knee-point index matrix is solved, as shown in Table 2 below:
Table 2
From γ'' it can be seen that the knee point appears at the 8th column, so the first 8 columns are taken as the candidate center set HSCT; since the knee point position of 8 is greater than 2, the second derivative of the δ row of the candidate center set HSCT is solved in turn, as shown in Table 3 below:
Table 3
From Table 3 it can be seen that the δ knee point appears at the 2nd column, so the number of cluster centers of this data set is 2, i.e. the 407th and 127th data points are taken as the cluster center points.
Step S105, according to the obtained cluster center points of the mixed-attribute data set to be clustered, producing and outputting the clustering result of the data set; wherein every data point of the data set other than the obtained cluster center points is assigned to the cluster of its nearest neighbor with higher local density, completing the representation and output of the clustering result.
The specific process is as follows. According to the cluster center points, the clustering result of the mixed-attribute data set to be clustered is represented and output: the non-center points are assigned, in order, to the same category as their nearest higher-density neighbor, so that one pass of clustering is completed and the clustering result is output; that is, every data point in the data set S other than the cluster center points is assigned to the cluster of its nearest neighbor with higher local density, and the representation and output of the clustering result are completed.
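A minimal sketch of this single-pass assignment, reusing the nearest-denser-neighbour indices from the density sketch above; the function name and the label convention are illustrative.

```python
import numpy as np

def assign_labels(rho, nearest_denser, center_ids):
    """Assign every non-centre point to the cluster of its nearest neighbour of higher
    local density, visiting points in order of decreasing density so that each point's
    denser neighbour is already labelled when it is reached. `center_ids` holds 1-based
    sequence numbers, as returned by select_centers; the densest point is expected to
    be one of the centres."""
    labels = np.full(len(rho), -1)
    for k, seq in enumerate(center_ids):
        labels[int(seq) - 1] = k                # centre points seed their own clusters
    for i in np.argsort(-rho):                  # densest points first
        if labels[i] == -1:
            labels[i] = labels[nearest_denser[i]]
    return labels
```

Chaining the sketches, labels = assign_labels(rho, nearest_denser, select_centers(rho, delta)) completes one clustering pass.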
In the embodiment of the present invention, for a data set of n data points, the space complexity of the algorithm comes mainly from the storage of the distance matrix, which requires 3 × n(n−1)/2 storage units: the distance matrix is stored as three rows, the first and second rows holding the sequence numbers of the two data points and the third row holding their distance.
Step S103 needs three arrays of length n to store the local density ρ, the distance δ and their product γ, so, together with the distance matrix above, the space complexity is O(n^2); computing the local densities, the relative distances and their products has time complexity O(n^2). In step S104 the sorting time in the knee-point computation depends on the sorting algorithm used, at best O(n log n) and at most O(n^2), so its time complexity does not exceed O(n^2); in step S105 the assignment of data points for the representation and output of the clustering result has time complexity O(n). The total complexity of the algorithm is therefore O(n^2); it clusters better than the traditional k-prototypes algorithm, runs efficiently and finds the number of clusters automatically.
Implementing the embodiments of the present invention has the following beneficial effects:
The embodiments of the present invention provide a unified distance measure formula for mixed-attribute data points, build the knee-point index matrix of the mixed-attribute data points on this basis, and then propose an automatic cluster-center determination method based on the knee point; the method therefore clusters better than the traditional k-prototypes algorithm, runs efficiently, finds the number of clusters automatically and is insensitive to outliers.
One of ordinary skill in the art will appreciate that all or part of the steps in the method of the above embodiment can be completed by a program instructing the relevant hardware, and the program can be stored in a computer-readable storage medium such as a ROM/RAM, a magnetic disk or an optical disc.
The foregoing is merely a preferred embodiment of the present invention and is not intended to limit the invention; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall be included within the scope of protection of the present invention.

Claims (4)

1. A mixed-attribute data clustering method based on density peaks, characterized in that the method comprises:
S1, obtaining a mixed-attribute data set to be clustered, and, according to the mixed-attribute data set to be clustered, calculating the distance between every two data points in the data set as well as the cutoff distance of the data set;
S2, obtaining the local density of each data point in the mixed-attribute data set to be clustered from the calculated pairwise distances and cutoff distance, and further calculating the relative distance of each data point in the data set from the obtained local densities;
S3, defining the γ parameter curve formed by the local density of each data point in the mixed-attribute data set to be clustered and its corresponding relative distance, and determining the γ parameter value of each data point in the data set;
S4, building a knee-point index matrix from the sequence number, γ parameter value and relative distance of each data point in the mixed-attribute data set to be clustered, solving the built knee-point index matrix with a preset knee-point algorithm, and obtaining the cluster center points of the data set;
S5, according to the obtained cluster center points of the mixed-attribute data set to be clustered, producing and outputting the clustering result of the data set; wherein every data point of the data set other than the obtained cluster center points is assigned to the cluster of its nearest neighbor with higher local density, completing the representation and output of the clustering result.
2. The mixed-attribute data clustering method based on density peaks according to claim 1, characterized in that the distance between every two data points in the mixed-attribute data set to be clustered is computed by the formula D(Xi,Xj) = d(Xi,Xj)^r + d(Xi,Xj)^c, where d(Xi,Xj)^r denotes the distance of the numerical attribute part of the data set and d(Xi,Xj)^c the distance of the categorical attribute part;
wherein d(Xi,Xj)^r is realized by a formula that takes the Euclidean distance between the numerical attributes of data points Xi and Xj after normalization, so that the distance value d(Xi,Xj)^r lies in the interval [0,1];
wherein d(Xi,Xj)^c is realized by an entropy-weighted matching formula, in which the matching distance of data points Xi and Xj on the t-th categorical attribute is weighted by the entropy weight of that attribute, and p(a_ts) is the probability of the s-th value (s = 1, 2, ..., m_t) appearing when the total number of distinct values of the t-th categorical attribute is m_t.
3. The mixed-attribute data clustering method based on density peaks according to claim 2, characterized in that the γ parameter value of each data point in the mixed-attribute data set to be clustered is obtained by the formula γi = ρi × δi, where γi is the γ parameter value of the i-th data point, ρi is the local density of the i-th data point and δi is the relative distance of the i-th data point.
4. The mixed-attribute data clustering method based on density peaks according to claim 3, characterized in that the step S4 specifically comprises:
determining the sequence number, γ parameter value and relative distance of each data point in the mixed-attribute data set to be clustered, and further forming a sequence-number set, a γ parameter-value set and a relative-distance set, respectively; wherein the sequence-number set is I = [1, 2, ..., n], the γ parameter-value set is γ = [γ1, γ2, ..., γn] and the relative-distance set is δ = [δ1, δ2, ..., δn], n being the total number of data points in the mixed-attribute data set to be clustered and a positive integer;
building, from the formed sequence-number set, γ parameter-value set and relative-distance set, the knee-point index matrix CT = [I; γ; δ]; wherein CT = [I; γ; δ] is the 3×n matrix whose row vectors are the sequence numbers, γ parameter values and relative distances of the data points and whose number of columns is the total number of data points n;
rearranging the knee-point index matrix CT = [I; γ; δ] so that its columns are sorted first by the γ row in descending order and then by the δ row in descending order, obtaining the adjusted knee-point index matrix; computing the second derivative of the γ row of the adjusted matrix and taking the resulting position as the knee point; and retaining, in the adjusted matrix, all column vectors whose column index is less than or equal to the knee point, forming a candidate center set;
judging whether the number of column vectors in the candidate center set is less than or equal to 2;
if so, taking the data points whose sequence numbers appear in the candidate center set as the cluster center points;
if not, further computing the second derivative of the δ row of the candidate center set, taking the resulting position as the secondary knee point, and taking the data points whose sequence numbers appear in the columns of the candidate center set up to and including the secondary knee point as the cluster center points.
CN201710294126.8A 2017-04-28 2017-04-28 A kind of mixed attributes data clustering method based on density peaks Pending CN107103336A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710294126.8A CN107103336A (en) 2017-04-28 2017-04-28 A kind of mixed attributes data clustering method based on density peaks

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710294126.8A CN107103336A (en) 2017-04-28 2017-04-28 A kind of mixed attributes data clustering method based on density peaks

Publications (1)

Publication Number Publication Date
CN107103336A true CN107103336A (en) 2017-08-29

Family

ID=59656642

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710294126.8A Pending CN107103336A (en) 2017-04-28 2017-04-28 A kind of mixed attributes data clustering method based on density peaks

Country Status (1)

Country Link
CN (1) CN107103336A (en)

Cited By (16)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108334898A (en) * 2018-01-23 2018-07-27 华中科技大学 A kind of multi-modal industrial process modal identification and Fault Classification
CN111209347A (en) * 2018-11-02 2020-05-29 北京京东尚科信息技术有限公司 Method and device for clustering mixed attribute data
CN111209347B (en) * 2018-11-02 2024-04-16 北京京东振世信息技术有限公司 Method and device for clustering mixed attribute data
CN110320894A (en) * 2019-08-01 2019-10-11 陕西工业职业技术学院 A kind of accurate Coal Pulverizing System of Thermal Power Plant fault detection method for dividing overlapping area data category
CN110320894B (en) * 2019-08-01 2022-04-15 陕西工业职业技术学院 Thermal power plant pulverizing system fault detection method capable of accurately dividing aliasing area data categories
CN111257905B (en) * 2020-02-07 2022-03-04 中国地质大学(武汉) Slice self-adaptive filtering algorithm based on single photon laser point cloud density segmentation
CN111257905A (en) * 2020-02-07 2020-06-09 中国地质大学(武汉) Slice self-adaptive filtering algorithm based on single photon laser point cloud density segmentation
CN111339294A (en) * 2020-02-11 2020-06-26 普信恒业科技发展(北京)有限公司 Client data classification method and device and electronic equipment
CN113158817B (en) * 2021-03-29 2023-07-18 南京信息工程大学 Objective weather typing method based on rapid density peak clustering
CN113158817A (en) * 2021-03-29 2021-07-23 南京信息工程大学 Objective weather typing method based on rapid density peak clustering
CN113743457A (en) * 2021-07-29 2021-12-03 暨南大学 Quantum density peak value clustering method based on quantum Grover search technology
CN113743457B (en) * 2021-07-29 2023-07-28 暨南大学 Quantum density peak clustering method based on quantum Grover search technology
CN113923043A (en) * 2021-10-27 2022-01-11 温州职业技术学院 User entity behavior analysis method based on density peak value adaptive clustering
CN113923043B (en) * 2021-10-27 2024-02-09 温州职业技术学院 User entity behavior analysis method based on density peak value self-adaptive clustering
CN116434880A (en) * 2023-03-06 2023-07-14 哈尔滨理工大学 High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration
CN116434880B (en) * 2023-03-06 2023-09-08 哈尔滨理工大学 High-entropy alloy hardness prediction method based on fuzzy self-consistent clustering integration

Similar Documents

Publication Publication Date Title
CN107103336A (en) A kind of mixed attributes data clustering method based on density peaks
Ding et al. An entropy-based density peaks clustering algorithm for mixed type data employing fuzzy neighborhood
Michalski et al. Automated construction of classifications: Conceptual clustering versus numerical taxonomy
Greene et al. Producing a unified graph representation from multiple social network views
Zhang et al. Multilevel projections with adaptive neighbor graph for unsupervised multi-view feature selection
CN105930862A (en) Density peak clustering algorithm based on density adaptive distance
Nguyen et al. SLINT: a schema-independent linked data interlinking system
Zhang et al. Local community detection based on network motifs
Xu et al. An improved k-means clustering algorithm
CN109726749A (en) A kind of Optimal Clustering selection method and device based on multiple attribute decision making (MADM)
CN109492022A (en) The searching method of semantic-based improved k-means algorithm
Zhou et al. Relevance feature mapping for content-based multimedia information retrieval
Zhou et al. ECMdd: Evidential c-medoids clustering with multiple prototypes
CN101980251A (en) Remote sensing classification method for binary tree multi-category support vector machines
CN109858518A (en) A kind of large data clustering method based on MapReduce
Niu et al. Overlapping community detection with adaptive density peaks clustering and iterative partition strategy
CN106156795A (en) A kind of determination method and device of suspicious money laundering account
Melendez-Melendez et al. An improved algorithm for partial clustering
Xue et al. GOMES: A group-aware multi-view fusion approach towards real-world image clustering
CN107392048A (en) Differential privacy protection method in data visualization and evaluation index thereof
CN109697471A (en) A kind of density peaks clustering method based on KNN
Shen et al. A dimensionality reduction framework for detection of multiscale structure in heterogeneous networks
Chen et al. PurTreeClust: A purchase tree clustering algorithm for large-scale customer transaction data
CN110516741A (en) Classification based on dynamic classifier selection is overlapped unbalanced data classification method
CN109886332A (en) Improvement DPC clustering algorithm and system based on symmetrical neighborhood

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication
RJ01 Rejection of invention patent application after publication

Application publication date: 20170829