CN106682915A - User cluster analysis method in customer care system - Google Patents
- Publication number
- CN106682915A CN106682915A CN201611212713.XA CN201611212713A CN106682915A CN 106682915 A CN106682915 A CN 106682915A CN 201611212713 A CN201611212713 A CN 201611212713A CN 106682915 A CN106682915 A CN 106682915A
- Authority
- CN
- China
- Prior art keywords
- sigma
- data
- node
- delta
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
Abstract
The invention discloses a user cluster analysis method for a customer care system. The method establishes a system architecture based on the customer care system, an application workflow for data mining, and a new clustering algorithm. It overcomes the shortcomings of the user clustering methods used in current customer care systems, and it is scientifically sound, highly accurate, broadly applicable, effective, simple, and reliable.
Description
Technical field
The present invention relates to the field of management information systems, and more particularly to a user clustering analysis method for customer relationship management (CRM) systems.
Background technology
Customer relationship management (CRM) uses modern information technology to track and analyze customer demand in depth, retain existing customers, attract new ones, and detect in time those loyal customers whose behavior may be about to change, so that measures can be taken promptly to protect the enterprise's interests. As business models shift from being product-centered to customer-centered, the importance of CRM becomes ever more apparent. CRM is the comprehensive, customer-centric marketing solution developed in industrialized countries: starting from actual customer needs and using modern information tools, it strengthens an enterprise's capabilities in customer service, customer development, and customer retention, thereby improving both enterprise profitability and customer satisfaction and realizing a win-win management philosophy for customers and the enterprise.
A CRM system is a new type of management system intended to improve and perfect the relationship between an enterprise and its customers. A CRM system must not only manage customer information; more importantly, it should apply data mining to customer relationship data in order to better understand market structure, customer preferences, and so on, so as to develop new products and adjust marketing strategies.
At present, enterprises lack deep mining of the large volumes of customer data they accumulate, leaving customer service in a backward state. To study customer relationships more effectively, it is necessary to introduce data mining into the CRM system. Data mining can uncover latent customer demand patterns and consumption behavior models, helping enterprises improve customer satisfaction, reduce churn, raise sales performance, and truly embody the "customer-centric" service philosophy.
A domestic patent search found the following. Application No. 201510919909.1, "A method and device for maintaining key-account relationships based on back-end data mining", mainly applies data mining to customer relationship system data to establish a method and procedure for identifying key accounts, focusing on the system architecture level. Application No. 201310204340.1, "A customer relationship management method and system based on data mining", mainly concerns mainstream techniques for building a data-mining-based CRM system. Application No. 201210445332.1, "Providing enterprise resource planning functions from a customer relationship management client application", focuses on techniques and devices for accessing an enterprise resource planning (ERP) system from a CRM application. To date, no document or practical application identical to the technical solution of the present invention has been found.
Summary of the invention
The object of the present invention is to remedy the defects of user clustering methods in prior-art CRM systems by scientifically establishing a system architecture based on the CRM system, a data mining application workflow, and a new algorithm, and to propose a user clustering analysis method for CRM systems that is highly accurate, broadly applicable, effective, simple, and reliable.
The technical scheme adopted to achieve this object is a user clustering analysis method for a CRM system comprising, in order: definition of the business problem; preparation and screening of data; cleaning and preprocessing of data (ETL); extraction of correct, reliable data; data mining; pattern assembly; model selection and construction; and model evaluation and interpretation. When the result is satisfactory, it guides the enterprise's practical management activities, and feedback about customers and the market is fed back into the data warehouse in time so that the enterprise can react quickly; when the result is unsatisfactory, the model is optimized and returned to model selection and construction. The method is characterized by the following concrete steps:
(1) Acquire and classify the CRM system data: to ensure that the trained model generalizes well in subsequent clustering, the system data should be classified according to the system users' specification, taking at least 100 groups of data from each distinct interval as training samples;
(2) Normalize the CRM system data: let the domain of datum x_i be d_i = [m_i, M_i], and let r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the dimensionless value that the model assigns to the attribute value x_i, with r_i ∈ [0, 1], where u_{d_i} is the canonical (membership) function of d_i; after normalization, each datum takes a value in [0, 1];
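The normalization of step (2) amounts to scaling each attribute onto [0, 1] over its domain [m_i, M_i]. A minimal sketch, assuming a linear membership function and using the observed minimum and maximum as the domain bounds (the patent does not give its canonical function explicitly):

```python
def normalize(values):
    """Min-max scale a list of attribute values onto [0, 1].

    The observed minimum m and maximum M serve as the domain [m, M];
    a linear membership (canonical) function is assumed here.
    """
    m, M = min(values), max(values)
    if M == m:                      # degenerate domain: constant attribute
        return [0.0 for _ in values]
    return [(x - m) / (M - m) for x in values]

scaled = normalize([10.0, 15.0, 20.0])   # → [0.0, 0.5, 1.0]
```

Each attribute column of the training sample would be normalized independently before clustering.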
(3) Using the normalized data, initialize the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) Update V(t) to V(t+1) by formula (2);
(5) Using V(t+1), update U(t) to U(t+1) by formula (3);
(6) When |J(t+1) − J(t)| < ε, or the iteration count t exceeds the maximum number of iterations M, the algorithm terminates; otherwise return to step (2);
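Steps (3)–(6) follow the standard fuzzy c-means (FCM) iteration: alternately update the memberships from the centers and the centers from the memberships until the objective J stabilizes. A minimal sketch of that loop on 1-D data, using the classical FCM update rules as stand-ins for the patent's modified formulas (2) and (3), which are given only as images in the original; the evenly spread center initialization is an assumption:

```python
def fcm(data, c, m=2.0, eps=1e-4, max_iter=100):
    """Classical fuzzy c-means on 1-D data (stand-in for formulas (2)-(3))."""
    n = len(data)
    lo, hi = min(data), max(data)
    # simple deterministic initialization: centers evenly spread over the range
    V = [lo + k * (hi - lo) / (c - 1) for k in range(c)] if c > 1 else [lo]
    U = [[0.0] * c for _ in range(n)]
    J_prev = float("inf")
    for _ in range(max_iter):
        # membership update: u_kj = 1 / sum_l (d_kj / d_lj)^(2/(m-1))
        for j in range(n):
            d = [abs(data[j] - V[k]) or 1e-12 for k in range(c)]
            for k in range(c):
                U[j][k] = 1.0 / sum((d[k] / d[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
        # center update: v_k = sum_j u_kj^m x_j / sum_j u_kj^m
        for k in range(c):
            den = sum(U[j][k] ** m for j in range(n))
            V[k] = sum((U[j][k] ** m) * data[j] for j in range(n)) / den
        # objective J = sum u_kj^m d_kj^2; stop when it changes by less than eps
        J = sum((U[j][k] ** m) * (data[j] - V[k]) ** 2
                for j in range(n) for k in range(c))
        if abs(J_prev - J) < eps:
            break
        J_prev = J
    return U, V
```

On two well-separated groups, the centers converge to the group means and each membership row sums to 1.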
(7) Clustering algorithm: the mathematical model of the algorithm, with the constraint introduced, is

equivalent to the optimization problem

where d_kj = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is a parameter regulating the degree of influence of the class centers.
The other parameters are as defined in formula (1). Compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in data space during each clustering pass;
(8) Introduce into the clustering algorithm a semi-supervised compensation term Ψ for the membership degrees, describing the supervision information; its expression is

The between-class separation function Φ describes the dispersion between different classes; its expression is

It is desired that the hyperplane separation between classes be as large as possible. The known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality and accuracy higher than clustering from random initialization. Therefore, formula (3) is amended: the semi-supervised compensation term and the between-class separation function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is

where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3).
For model (6), the solution is obtained with the Lagrange multiplier method. The Lagrange function is constructed as

Setting the partial derivatives to zero yields the iterative formulas for the optimal solution:

where the supervision information term equals the known information if the datum x_j is a known-information sample and is zero otherwise; these terms form a known c × n membership matrix;
(9) Let PN be the training set, containing p positive examples and n negative examples. For a sample set, the probability of the positive-example set PE is p/(p+n) and the probability of the negative-example set NE is n/(p+n).
A decision tree can be viewed as an information source over the positive and negative example sets; the expected information of the messages it produces is:

If attribute A takes values {A1, A2, …, Am}, PN is partitioned into k subsets {PN1, PN2, …, PNk}. If PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information needed for each subtree, i.e.:

The information gain obtained by classifying with A at the root of the tree is:
Gain(A) = I(p, n) − E(A)   (12)
The attribute with the largest Gain(A) is selected as the branching attribute of the node; this principle is applied at every node of the decision tree until the complete decision tree has been built;
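The quantities I(p, n), E(A), and Gain(A) above are the classical ID3 entropy-based measures. A minimal sketch of their computation (the example counts at the end are hypothetical illustration data, not taken from the patent):

```python
from math import log2

def info(p, n):
    """Expected information I(p, n) of a source with p positives, n negatives."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # 0·log2(0) is taken as 0
            f = count / total
            result -= f * log2(f)
    return result

def gain(p, n, subsets):
    """Gain(A) = I(p, n) - E(A), where subsets is [(p_i, n_i), ...] per value of A."""
    e = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in subsets)
    return info(p, n) - e

# hypothetical attribute splitting 9 positives / 5 negatives into 3 subsets
g = gain(9, 5, [(2, 3), (4, 0), (3, 2)])   # ≈ 0.247 bits
```

The attribute with the largest such gain becomes the branching attribute of the current node.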
(10) The decision tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the set of candidate attributes.
Output: a decision tree.
(a) create node N;
(b) IF all samples in S are in one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A: // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the set of samples of S on that branch; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
After the information gain of each attribute is obtained, a correction function is applied to the information gain, which serves as the partition module for attribute selection and for splitting the sample subsets. Samples with missing attribute values are adjusted using relative frequencies as random probabilities.
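The pseudocode of step (10) can be sketched as a recursive function. A minimal sketch, assuming each sample is a dict of categorical attribute values plus a "class" label (the dict layout and key names are assumptions for illustration); the empty-branch case (k)–(l) cannot arise here because branches are grown only for values actually observed in S:

```python
from collections import Counter
from math import log2

def _entropy(samples):
    counts = Counter(s["class"] for s in samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def _gain(samples, attr):
    """Information gain of splitting `samples` on attribute `attr`."""
    remainder = 0.0
    for value in {s[attr] for s in samples}:
        subset = [s for s in samples if s[attr] == value]
        remainder += len(subset) / len(samples) * _entropy(subset)
    return _entropy(samples) - remainder

def build_tree(samples, attributes):
    """Recursive ID3: returns a class label (leaf) or an (attribute, branches) node."""
    classes = [s["class"] for s in samples]
    if len(set(classes)) == 1:               # (b)-(c): all one class -> leaf
        return classes[0]
    if not attributes:                       # (d)-(e): no attributes -> majority vote
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: _gain(samples, a))   # (f)
    branches = {}
    for value in {s[best] for s in samples}:  # (h)-(j): one branch per observed value
        subset = [s for s in samples if s[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, rest)            # (m): recurse
    return (best, branches)
```

On a toy sample set whose class is determined by one attribute, the tree roots at that attribute with pure leaves below it.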
The concrete algorithm steps, in order, are: count the total number of samples; compute the information gain Gain(Q) of each training-sample attribute value; compute the corrected gain Gain'(Q); partition the current samples according to the attribute value with the maximum Gain'(Q); create the root node A corresponding to that attribute value; determine the next-level nodes of root node A; when all leaf nodes have been obtained, generate the decision tree; if not all leaf nodes have been obtained, return to computing Gain(Q).
To analyze the clustering result with a decision tree, analysis criteria for customer value must first be defined; these criteria are stored as data tables in a database or in an XML file. Once the customer-value criteria are defined, the algorithm generates a decision tree that explains the structure of the cluster analysis;
(11) Take the database data of the CRM system as input, cluster it with the model established in steps (1) to (10), and obtain the corresponding clustering result.
In the user clustering analysis method for a CRM system of the present invention, data mining is applied in the CRM system by extracting the data in the data warehouse and using it as the modeling sample under study; the data are then mined with various data mining algorithms, the mining results are analyzed and modeled, and the model is continuously optimized to obtain the corresponding clustering result. The advantages are as follows:
1. Intelligent computational models are used to cluster the CRM system data, achieving effective clustering, compensating for the deficiencies of existing methods, and usefully complementing them;
2. It is simple: no change to the CRM system is required, and no new equipment is needed;
3. It fuses several artificial-intelligence computational models, reducing the probability that a single model produces a large clustering error and making the clustering result more stable;
4. It is scientifically sound, highly accurate, broadly applicable, and effective.
Description of the drawings
Fig. 1 is the data mining flowchart of the user clustering analysis method in a CRM system;
Fig. 2 is the algorithm step block diagram of the user clustering analysis method in a CRM system;
Fig. 3 shows the curves of the performance index versus the supervision information ratio.
Specific embodiments:
The user clustering analysis method in a CRM system of the present invention is further described below with reference to the drawings and embodiments.
With reference to Fig. 1, the method comprises, in order: definition of the business problem; preparation and screening of data; cleaning and preprocessing of data (ETL); extraction of correct, reliable data; data mining; pattern assembly; model selection and construction; and model evaluation and interpretation. When the result is satisfactory, it guides the enterprise's practical activities, and feedback about customers and the market is fed back into the data warehouse in time so that the enterprise can react quickly; when the result is unsatisfactory, the model is optimized and returned to model selection and construction.
To verify the soundness of the algorithm, experiments were run on the Iris, Wine, and Balance-scale data sets from the UCI machine learning repository, which are commonly used for testing clustering methods; the data set information is listed in Table 1.
Table 1. Experimental data set information
For each data set, 10%, 20%, 30%, and 40% of the total samples were randomly selected as the test set. For an objective comparison of the performance of the different algorithms, the parameters were set to m = 2 and η = 0.0001.
The concrete steps are as follows:
(1) Acquire and classify the customer relationship system data: to ensure that the trained model generalizes well in subsequent clustering, the system data should be classified according to the system users' specification, taking data from each distinct interval of the data sets above as training samples.
(2) Normalize the CRM system data: let the domain of datum x_i be d_i = [m_i, M_i], and let r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the dimensionless value that the model assigns to the attribute value x_i, with r_i ∈ [0, 1], where u_{d_i} is the canonical (membership) function of d_i. After normalization, each datum takes a value in [0, 1].
(3) Using the normalized data, initialize the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) Update V(t) to V(t+1) by formula (2);
(5) Using V(t+1), update U(t) to U(t+1) by formula (3);
(6) When |J(t+1) − J(t)| < ε, or the iteration count t exceeds the maximum number of iterations M, the algorithm terminates; otherwise return to step (2);
(7) The mathematical model of the clustering algorithm, with the constraint introduced, is

equivalent to the optimization problem

where d_kj = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is a parameter regulating the degree of influence of the class centers.
The other parameters are as defined in formula (1). Compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in data space during each clustering pass.
(8) Introduce into the clustering algorithm a semi-supervised compensation term Ψ for the membership degrees, describing the supervision information; its expression is

The between-class separation function Φ describes the dispersion between different classes; its expression is

It is desired that the hyperplane separation between classes be as large as possible. The known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality and accuracy higher than clustering from random initialization. Therefore, formula (3) is amended: the semi-supervised compensation term and the between-class separation function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is

where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3).
For model (6), the solution is obtained with the Lagrange multiplier method. The Lagrange function is constructed as

Setting the partial derivatives to zero yields the iterative formulas for the optimal solution:

where the supervision information term equals the known information if the datum x_j is a known-information sample and is zero otherwise; these terms form a known c × n membership matrix.
(9) Let PN be the training set, containing p positive examples and n negative examples. For a sample set, the probability of the positive-example set PE is p/(p+n) and the probability of the negative-example set NE is n/(p+n).
A decision tree can be viewed as an information source over the positive and negative example sets; the expected information of the messages it produces is:

If attribute A takes values {A1, A2, …, Am}, PN is partitioned into k subsets {PN1, PN2, …, PNk}. If PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information needed for each subtree, i.e.:

The information gain obtained by classifying with A at the root of the tree is:
Gain(A) = I(p, n) − E(A)   (12)
The attribute with the largest Gain(A) is selected as the branching attribute of the node; this principle is applied at every node of the decision tree until the complete decision tree has been built.
(10) The decision tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the set of candidate attributes.
Output: a decision tree.
Method:
(a) create node N;
(b) IF all samples in S are in one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A: // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the set of samples of S on that branch; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
After the information gain of each attribute is obtained, a correction function is applied to the information gain, which serves as the partition module for attribute selection and for splitting the sample subsets. Samples with missing attribute values are adjusted using relative frequencies as random probabilities.
As shown in Fig. 2, the concrete algorithm steps, in order, are: count the total number of samples; compute the information gain Gain(Q) of each training-sample attribute value; compute the corrected gain Gain'(Q); partition the current samples according to the attribute value with the maximum Gain'(Q); create the root node A corresponding to that attribute value; determine the next-level nodes of root node A; when all leaf nodes have been obtained, generate the decision tree; if not all leaf nodes have been obtained, return to computing Gain(Q).
To analyze the clustering result with a decision tree, analysis criteria for customer value must first be defined; these criteria are stored as data tables in a database or in an XML file. Once the customer-value criteria are defined, the algorithm can generate a decision tree that explains the structure of the cluster analysis.
(11) Take the database data of the CRM system as input, cluster it with the model established in steps (1) to (10), and obtain the corresponding clustering result.
The performance evaluation index is RI = n0/n, where n0 is the mean number of correctly classified samples obtained by comparing the test-set clustering result with the reference labels, and n is the total number of samples in the test set; the larger the RI value, the higher the clustering accuracy and the better the clustering effect. Each experiment was repeated 5 times; the mean RI values are listed in Table 2. Table 2 shows that clustering accuracy tends to increase as the amount of supervision information grows, indicating that the supervised data have a guiding effect. The curves of the performance index versus the supervision information ratio on the Iris, Wine, and Balance-scale data sets are shown in Fig. 3. Fig. 3 shows that on all data sets the RI value increases as the supervision ratio increases; although the rate of increase of clustering accuracy does not vary simply with the amount of supervision information, it consistently remains above the clustering precision of the original clustering algorithm, which verifies the soundness and effectiveness of the algorithm.
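The index RI = n0/n above counts the fraction of test samples whose cluster assignment matches the reference labels, once the best correspondence between cluster ids and class labels has been chosen. A minimal sketch (the label sequences in the example are hypothetical, and an exhaustive cluster-to-class matching is assumed, which is fine for small numbers of clusters):

```python
from itertools import permutations

def ri(predicted, truth):
    """RI = n0 / n: best-match accuracy of cluster labels against reference labels."""
    clusters = sorted(set(predicted))
    classes = sorted(set(truth))
    best = 0
    # try every mapping of cluster ids onto class labels, keep the best match count
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        matches = sum(mapping[p] == t for p, t in zip(predicted, truth))
        best = max(best, matches)
    return best / len(truth)

score = ri([0, 0, 1, 1, 1], ["a", "a", "b", "b", "a"])   # → 0.8
```

With 4 of 5 samples correctly matched under the best mapping, RI = 0.8.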
Table 2. Comparison of the experimental RI results
In summary, the clustering algorithm of the present invention uses known sample information to reduce the waste of information during clustering and considers both within-class compactness and between-class dispersion, effectively reducing the blindness of the original clustering method. Simulation experiments with the method of the present invention on the UCI data sets show that the proposed new algorithm generally outperforms the other clustering algorithms.
The computational conditions, legends, and the like in the embodiments of the present invention serve only to further illustrate the invention; they are not exhaustive and do not limit the scope of the claims. Other substantially equivalent alternatives that persons skilled in the art would arrive at without creative work, in light of the teaching of the embodiments, fall within the scope of the present invention.
Claims (1)
1. A user clustering analysis method in a CRM system, comprising, in order: definition of the business problem; preparation and screening of data; cleaning and preprocessing of data (ETL); extraction of correct, reliable data; data mining; pattern assembly; model selection and construction; and model evaluation and interpretation, wherein when the result is satisfactory it guides the enterprise's practical management activities and feedback about customers and the market is fed back in time into the data warehouse so that the enterprise can react quickly, and when the result is unsatisfactory the model is optimized and returned to model selection and construction, characterized by the following concrete steps:
(1) acquiring and classifying the CRM system data: to ensure that the trained model generalizes well in subsequent clustering, the system data should be classified according to the system users' specification, taking at least 100 groups of data from each distinct interval as training samples;
(2) normalizing the CRM system data: letting the domain of datum x_i be d_i = [m_i, M_i], and letting r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the dimensionless value that the model assigns to the attribute value x_i, with r_i ∈ [0, 1], where u_{d_i} is the canonical (membership) function of d_i; after normalization, each datum takes a value in [0, 1];
(3) using the normalized data, initializing the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) updating V(t) to V(t+1) by formula (2);
(5) using V(t+1), updating U(t) to U(t+1) by formula (3);
(6) when |J(t+1) − J(t)| < ε, or the iteration count t exceeds the maximum number of iterations M, terminating the algorithm; otherwise returning to step (2);
(7) clustering algorithm: the mathematical model of the algorithm, with the constraint introduced, is

equivalent to the optimization problem

where d_kj = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is a parameter regulating the degree of influence of the class centers; the other parameters are as defined in formula (1); compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in data space during each clustering pass;
(8) introducing into the clustering algorithm a semi-supervised compensation term Ψ for the membership degrees, describing the supervision information, whose expression is

the between-class separation function Φ describes the dispersion between different classes, and its expression is

it is desired that the hyperplane separation between classes be as large as possible; the known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality and accuracy higher than clustering from random initialization; therefore formula (3) is amended: the semi-supervised compensation term and the between-class separation function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is

where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3);
for model (6), the solution is obtained with the Lagrange multiplier method; the Lagrange function is constructed as

setting the partial derivatives to zero yields the iterative formulas for the optimal solution:

where the supervision information term equals the known information if the datum x_j is a known-information sample and is zero otherwise; these terms form a known c × n membership matrix;
(9) letting PN be the training set, containing p positive examples and n negative examples, wherein for a sample set the probability of the positive-example set PE is p/(p+n) and the probability of the negative-example set NE is n/(p+n);
a decision tree can be viewed as an information source over the positive and negative example sets, and the expected information of the messages it produces is:

if attribute A takes values {A1, A2, …, Am}, PN is partitioned into k subsets {PN1, PN2, …, PNk}; if PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information needed for each subtree, i.e.:

the information gain obtained by classifying with A at the root of the tree is:
Gain(A) = I(p, n) − E(A)   (12)
the attribute with the largest Gain(A) is selected as the branching attribute of the node, and this principle is applied at every node of the decision tree until the complete decision tree has been built;
(10) the decision tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the set of candidate attributes.
Output: a decision tree.
(a) create node N;
(b) IF all samples in S are in one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A: // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the set of samples of S on that branch; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
After the information gain for obtaining each attribute, parameter is calculated using function, go to correct the information gain, as attribute
Selection and the division module of sample dividing subset, it is general at random using relative frequency for those samples for lacking property value
Rate number is adjusted,
The steps of the specific algorithm are, in order: count the total number of samples; calculate the information gain Gain(Q) of the training sample table attribute values; correct the information gain to obtain Gain'(Q); sort the current sample values according to MAXGain'(Q); create the root node A corresponding to the attribute value with MAXGain'(Q); determine the next-level nodes of root node A; when all leaf nodes have been obtained, generate the decision tree; when not all leaf nodes have been obtained, return to calculating the information gain Gain(Q) of the training sample table attribute values;
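The correction function producing Gain'(Q) is not spelled out in the patent; a common choice (the C4.5 gain ratio) divides the gain by the split information of the partition. A sketch under that assumption:

```python
import math

def split_info(subset_sizes):
    """SplitInfo(Q): entropy of the partition itself; it penalises attributes
    that split the data into many small subsets."""
    total = sum(subset_sizes)
    return -sum(s / total * math.log2(s / total) for s in subset_sizes if s)

def corrected_gain(gain, subset_sizes):
    """Gain'(Q) = Gain(Q) / SplitInfo(Q) (gain-ratio style correction)."""
    si = split_info(subset_sizes)
    return gain / si if si else 0.0

# An attribute splitting 14 samples into subsets of sizes 5, 4 and 5:
g_prime = corrected_gain(0.2467, [5, 4, 5])
```

The attribute with the maximum corrected gain then becomes the root node A of the current subtree.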
In order to analyze the clustering results using the decision tree, analytical criteria for customer value must first be defined; these criteria are stored as data tables in a database or in an XML file. After the customer-value criteria have been defined, the algorithm generates a decision tree to explain the structure of the cluster analysis;
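The patent only states that the customer-value criteria are stored as data tables or in an XML file; the element and attribute names below are hypothetical, chosen to illustrate loading such criteria:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for the customer-value analytical criteria.
CRITERIA_XML = """
<criteria>
  <criterion name="monthly_spend" operator="ge" threshold="500" label="high_value"/>
  <criterion name="complaints" operator="le" threshold="1" label="satisfied"/>
</criteria>
"""

def load_criteria(xml_text):
    """Parse the criteria file into a list of dictionaries usable by the
    decision-tree explanation step."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": c.get("name"),
            "operator": c.get("operator"),
            "threshold": float(c.get("threshold")),
            "label": c.get("label"),
        }
        for c in root.findall("criterion")
    ]

criteria = load_criteria(CRITERIA_XML)
```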
(11) Using the database data of the customer relationship management (CRM) system as input, clustering is performed with the model built in steps (1) to (10), and the corresponding clustering results are obtained.
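Step (11) feeds CRM database records into the clustering model. The table layout, centroids, and the nearest-centroid assignment below are illustrative stand-ins (the patent's actual model is the one built in steps (1)-(10)); an in-memory SQLite database substitutes for the CRM database:

```python
import sqlite3

# Stand-in for the CRM system's database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, monthly_spend REAL, visits REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 50.0, 2.0), (2, 60.0, 3.0), (3, 800.0, 20.0), (4, 900.0, 25.0)],
)

rows = conn.execute("SELECT id, monthly_spend, visits FROM customers").fetchall()

# One nearest-centroid assignment pass, standing in for the full model.
centroids = {"low_value": (55.0, 2.5), "high_value": (850.0, 22.5)}

def assign(row):
    """Assign a customer row to the nearest centroid (squared distance)."""
    _, spend, visits = row
    return min(
        centroids,
        key=lambda c: (spend - centroids[c][0]) ** 2 + (visits - centroids[c][1]) ** 2,
    )

clusters = {row[0]: assign(row) for row in rows}
```

The resulting cluster labels are what the decision tree of step (10) then explains in terms of the customer-value criteria.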
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611212713.XA CN106682915A (en) | 2016-12-25 | 2016-12-25 | User cluster analysis method in customer care system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106682915A true CN106682915A (en) | 2017-05-17 |
Family
ID=58870536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611212713.XA Pending CN106682915A (en) | 2016-12-25 | 2016-12-25 | User cluster analysis method in customer care system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106682915A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909178A (en) * | 2017-08-31 | 2018-04-13 | 上海壹账通金融科技有限公司 | Electronic device, lost contact repair rate Forecasting Methodology and computer-readable recording medium |
CN107909178B (en) * | 2017-08-31 | 2021-06-08 | 深圳壹账通智能科技有限公司 | Electronic device, loss of association repair rate prediction method, and computer-readable storage medium |
CN108966448A (en) * | 2018-05-31 | 2018-12-07 | 淮阴工学院 | A kind of light dynamic regulation method based on adaptive fuzzy decision tree |
CN109885597A (en) * | 2019-01-07 | 2019-06-14 | 平安科技(深圳)有限公司 | Tenant group processing method, device and electric terminal based on machine learning |
CN109885597B (en) * | 2019-01-07 | 2023-05-30 | 平安科技(深圳)有限公司 | User grouping processing method and device based on machine learning and electronic terminal |
CN112348583A (en) * | 2020-11-04 | 2021-02-09 | 贝壳技术有限公司 | User preference generation method and generation system |
CN112348583B (en) * | 2020-11-04 | 2022-12-06 | 贝壳技术有限公司 | User preference generation method and generation system |
CN112508074A (en) * | 2020-11-30 | 2021-03-16 | 深圳市飞泉云数据服务有限公司 | Visualization display method and system and readable storage medium |
CN112508074B (en) * | 2020-11-30 | 2024-05-14 | 深圳市飞泉云数据服务有限公司 | Visual display method, system and readable storage medium |
CN115019078A (en) * | 2022-08-09 | 2022-09-06 | 阿里巴巴(中国)有限公司 | Data clustering method and device |
CN115019078B (en) * | 2022-08-09 | 2023-01-24 | 阿里巴巴(中国)有限公司 | Vehicle image processing method, computing device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106682915A (en) | User cluster analysis method in customer care system | |
Liu et al. | A multimodal multiobjective evolutionary algorithm using two-archive and recombination strategies | |
Pandey et al. | A decision tree algorithm pertaining to the student performance analysis and prediction | |
Charoen-Ung et al. | Sugarcane yield grade prediction using random forest with forward feature selection and hyper-parameter tuning | |
Alvarez et al. | An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization | |
CN109461025A (en) | A kind of electric energy substitution potential customers' prediction technique based on machine learning | |
Limsathitwong et al. | Dropout prediction system to reduce discontinue study rate of information technology students | |
Hu et al. | A niching backtracking search algorithm with adaptive local search for multimodal multiobjective optimization | |
Gajowniczek et al. | Comparison of decision trees with Rényi and Tsallis entropy applied for imbalanced churn dataset | |
Wang et al. | Design of the Sports Training Decision Support System Based on the Improved Association Rule, the Apriori Algorithm. | |
Kurniawan et al. | C5. 0 algorithm and synthetic minority oversampling technique (SMOTE) for rainfall forecasting in Bandung regency | |
CN115018357A (en) | Farmer portrait construction method and system for production performance improvement | |
Fayaz et al. | An adaptive gradient boosting model for the prediction of rainfall using ID3 as a base estimator | |
CN105930531A (en) | Method for optimizing cloud dimensions of agricultural domain ontological knowledge on basis of hybrid models | |
Wang et al. | Research on the factors affecting the innovation performance of China’s new energy type enterprises from the perspective of industrial policy | |
CN103020864A (en) | Corn fine breed breeding method | |
Lai | Segmentation study on enterprise customers based on data mining technology | |
Hassani et al. | On the application of data mining to official data | |
Rattan et al. | Applying SMOTE with decision tree classifier for campus placement prediction | |
Hou et al. | Prediction of learners' academic performance using factorization machine and decision tree | |
Sreerama et al. | A machine learning approach to crop yield prediction | |
Sang | English teaching comprehensive ability evaluation system based on K-means clustering algorithm | |
Li | Application of Fuzzy K‐Means Clustering Algorithm in the Innovation of English Teaching Evaluation Method | |
He et al. | A study on evaluation of farmland fertility levels based on optimization of the decision tree algorithm | |
Tamrakar | Student Performance Prediction by means of Multiple Regression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20170517