CN106682915A - User cluster analysis method in customer care system - Google Patents


Info

Publication number
CN106682915A
Authority
CN
China
Prior art date
Legal status: Pending (assumed; not a legal conclusion)
Application number
CN201611212713.XA
Other languages
Chinese (zh)
Inventor
王欣
张毅
薛雯
王燕涛
王姣
郑荣
刘碧莹
张磊
齐林林
刘宇航
刘蔚
郑红刚
Current Assignee (listed assignees may be inaccurate): Northeast Electric Power University
Original Assignee: Northeast Dianli University
Application filed by Northeast Dianli University filed Critical Northeast Dianli University
Priority to CN201611212713.XA priority Critical patent/CN106682915A/en
Publication of CN106682915A publication Critical patent/CN106682915A/en
Pending legal-status Critical Current


Classifications

    • G06F16/2246 — Physics; Computing; Electric digital data processing; Information retrieval of structured data; Indexing; Data structures; Trees, e.g. B+ trees
    • G06F16/285 — Physics; Computing; Electric digital data processing; Databases characterised by their database models; Relational databases; Clustering or classification
    • G06Q30/01 — ICT specially adapted for administrative, commercial, financial, managerial or supervisory purposes; Commerce; Customer relationship services


Abstract

The invention discloses a user cluster analysis method for a customer care system. The method is characterized by a system architecture built on the customer care system, an application workflow for data mining, and a new clustering algorithm. It overcomes the defects of the user clustering methods in current customer care systems; it is scientific and sound, highly accurate, broadly applicable, effective, simple, and reliable.

Description

A user clustering analysis method for a customer relationship management (CRM) system
Technical field
The present invention relates to the field of management information systems, and in particular to a user clustering analysis method for a customer relationship management (CRM) system.
Background technology
Customer relationship management (CRM) uses modern information technology to track and analyze customers' needs in depth, retain existing customers, attract new ones, and detect in time those customers whose loyalty may be changing, so that timely measures can be taken to protect the enterprise's interests. As the business model shifts from being product-centered to customer-centered, the importance of customer relationship management becomes ever more apparent. CRM is the industrialized world's comprehensive solution for customer-centric marketing: starting from customers' actual needs and using modern information tools, it strengthens an enterprise's capabilities in customer service, customer development, and customer retention, thereby improving the enterprise's profitability and customer satisfaction, and realizing a win-win management philosophy for customers and the enterprise.
A CRM system is a new kind of management system intended to perfect and improve the relationship between an enterprise and its customers. Beyond managing customer information, it is even more important that the system applies data mining to customer relationships, so as to better understand market structure, customer preferences, and so on, and thereby develop new products and adjust marketing strategies.
At present, enterprises lack deep mining of the large volumes of customer data they have accumulated, so customer service remains in a backward state. To study customer relationships better, it is necessary to introduce data mining into the CRM system. Data mining can uncover latent customer demand patterns and consumption behavior models, helping the enterprise improve customer satisfaction, reduce customer churn, and increase sales, truly embodying the enterprise's "customer-centric" service philosophy.
A domestic patent search finds: application No. 201510919909.1, titled "Method and device for maintaining key-account relationships based on back-end data mining", which mainly performs data mining on customer relationship system data to establish methods and steps for identifying key accounts, focusing on the system architecture level; application No. 201310204340.1, titled "Customer relationship management method and system based on data mining", which concerns the mainstream techniques for building a data-mining-based CRM system; and application No. 201210445332.1, titled "Providing enterprise resource planning functions from a customer relationship management client application", which focuses on techniques and devices for accessing an enterprise resource planning (ERP) system from CRM. To date there is no literature report or practical application identical to the technical solution of the present invention.
The content of the invention
The object of the present invention is to remedy the defects of user clustering methods in prior-art CRM systems. By scientifically establishing a system architecture based on the CRM system, an application workflow for data mining, and a new algorithm, the invention proposes a user clustering analysis method for a CRM system that is highly accurate, broadly applicable, effective, simple, and reliable.
The technical scheme adopted to achieve the object of the invention is a user clustering analysis method in a CRM system, comprising in order the steps of: defining the business problem; preparing and screening the data; cleaning and pre-processing the data (ETL); extracting correct, reliable data; mining the data; assembling pattern sets; selecting and building models; and evaluating and interpreting the models. When the results are satisfactory, they guide the enterprise's practical activities, and feedback information about customers and the market is sent to the data warehouse in time so the enterprise can react quickly; when the results are unsatisfactory, the models are optimized and returned to model selection and building. The concrete steps are:
(1) Collect and classify the CRM system data: to ensure that the trained model adapts well in subsequent clustering, the system data should be classified according to the system users' specification, and at least 100 groups of data are taken from each interval as training samples;
(2) Normalize the CRM system data: let the domain of a data attribute x_i be d_i = [m_i, M_i], and let r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the model's dimensionless value for attribute x_i, with r_i ∈ [0, 1],
where u_{d_i} is the normalization function for d_i; after normalization, each data value lies in [0, 1];
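The min-max scaling of step (2) can be sketched as follows. This is a minimal illustration, assuming the canonical function u_{d_i} is the plain linear map (x_i − m_i)/(M_i − m_i); the helper name `normalize` is ours, not the patent's:

```python
import numpy as np

def normalize(X):
    """Min-max normalize each attribute column of X into [0, 1].

    Sketch of step (2): r_i = (x_i - m_i) / (M_i - m_i),
    where [m_i, M_i] is the observed domain of attribute i.
    """
    X = np.asarray(X, dtype=float)
    mins = X.min(axis=0)
    spans = X.max(axis=0) - mins
    spans[spans == 0] = 1.0          # constant columns map to 0
    return (X - mins) / spans

X = np.array([[1.0, 200.0], [2.0, 400.0], [3.0, 300.0]])
R = normalize(X)
print(R)   # every value lies in [0, 1]
```

After this step every attribute is dimensionless, so the Euclidean distances used later are not dominated by attributes with large raw ranges.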
(3) Using the normalized data, initialize the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) Update V(t) to V(t+1) by formula (2);
(5) From V(t+1), update U(t) to U(t+1) by formula (3);
(6) When |J(t+1) − J(t)| < ε, or when the iteration count t exceeds the maximum number of iterations M, the algorithm terminates; otherwise go to step (2);
(7) Clustering algorithm: the mathematical model of the algorithm, with constraints introduced, is formula (2), which is equivalent to the optimization problem of formula (3),
where d_{kj} = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is the adjustment factor parameter for the degree of influence of the class centers;
the other parameters are as defined in formula (1). Compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in the data space during each clustering pass;
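The alternating updates of steps (3)-(6) follow the familiar fuzzy c-means pattern. As a hedged illustration only, the sketch below implements the plain unsupervised core (i.e. the objective with η = 0 and no known memberships û), not the patent's full semi-supervised model; all names are ours:

```python
import numpy as np

def fcm(X, c, m=2.0, eps=1e-5, max_iter=100, seed=0):
    """Plain fuzzy c-means: the unsupervised skeleton of steps (3)-(6).

    Alternates center updates and membership updates until the
    objective change |J(t+1) - J(t)| falls below eps or max_iter
    is reached. Returns memberships U (c x n) and centers V (c x d).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    U = rng.random((c, n))
    U /= U.sum(axis=0)                            # columns sum to 1
    J_prev = np.inf
    for _ in range(max_iter):
        Um = U ** m
        V = (Um @ X) / Um.sum(axis=1, keepdims=True)   # center update
        d2 = ((X[None, :, :] - V[:, None, :]) ** 2).sum(-1) + 1e-12
        U = d2 ** (-1.0 / (m - 1))                # membership update
        U /= U.sum(axis=0)
        J = (U ** m * d2).sum()
        if abs(J_prev - J) < eps:                 # termination test, step (6)
            break
        J_prev = J
    return U, V
```

The semi-supervised variant of the patent would replace the membership update with formula (9), pinning known samples through the û terms; this sketch only shows the iteration structure.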
(8) A semi-supervised compensation term Ψ describing the supervision information is introduced into the membership degrees of the clustering algorithm; its expression is formula (4).
The separation-degree function Φ describes the dispersion between different classes; its expression is formula (5).
The hyperplanes between classes should be separated as widely as possible. Known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality as accurate as possible, higher than clustering from random initialization. Therefore formula (3) is amended: the semi-supervised compensation term and the between-class separation-degree function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is formula (6),
where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3).
Model (6) is solved with the Lagrange multiplier method; the constructed Lagrange function is formula (7).
Setting the partial derivatives to zero, the iterative formulas for the optimal solution are formulas (8) and (9),
where û_{ij} is the supervision-information term: if data point x_j is a known-information sample, the value of û_{ij} equals the known information; otherwise it is zero. Û is the known c × n membership matrix formed by the û_{ij};
(9) Let PN be the training set, containing p positive examples and n negative examples. For a sample set, the probability of the positive-example set PE is p/(p+n), and the probability of the negative-example set NE is n/(p+n);
a decision tree can be viewed as an information source over the positive and negative example sets, and the expected information of the messages it produces is given by formula (10).
If attribute A takes the values {A_1, A_2, …, A_m}, PN is partitioned into k subsets {PN_1, PN_2, …, PN_k}. If PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information of the subtrees, i.e. formula (11).
The information gain obtained by branching on A at the root is:
Gain(A) = I(p, n) − E(A)    (12)
The attribute with the largest Gain(A) is selected as the branching attribute of the node; this rule is applied at every node of the decision tree until the complete tree is built;
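Formulas (10)-(12) can be computed directly; the sketch below is a minimal illustration with our own helper names, where `pairs` lists the (p_i, n_i) counts for each value of attribute A:

```python
from math import log2

def I(p, n):
    """Expected information of a set with p positive and n negative
    examples, formula (10)."""
    total = p + n
    bits = 0.0
    for k in (p, n):
        if k:
            bits -= (k / total) * log2(k / total)
    return bits

def gain(pairs, p, n):
    """Gain(A) = I(p, n) - E(A), formulas (11)-(12).

    `pairs` lists (p_i, n_i) for each value of attribute A.
    """
    total = p + n
    E = sum((pi + ni) / total * I(pi, ni) for pi, ni in pairs)
    return I(p, n) - E

# An attribute that separates the classes perfectly recovers
# the full entropy of the set:
print(gain([(3, 0), (0, 3)], p=3, n=3))   # 1.0
```

A non-informative attribute (each branch keeping the same positive/negative ratio) yields a gain of zero, which is why the branching rule picks the attribute with maximal Gain(A).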
(10) The decision-tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the candidate attribute set;
Output: a decision tree;
(a) create a node N;
(b) IF all samples in S belong to one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the subset of S satisfying A = a_i; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
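The Generate_decision_tree pseudocode above can be sketched compactly in Python. This is an illustrative ID3-style sketch under our own encoding (each sample is a (features dict, label) pair); the gain correction and missing-value handling described next are deliberately omitted:

```python
from collections import Counter
from math import log2

def id3(samples, attrs):
    """ID3 sketch of steps (a)-(m).

    `samples`: list of (features: dict, label) pairs.
    `attrs`: candidate attribute names.
    Returns a nested dict tree, or a bare label for a leaf.
    """
    labels = [y for _, y in samples]
    if len(set(labels)) == 1:                       # (b)-(c): pure node
        return labels[0]
    if not attrs:                                   # (d)-(e): majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: _gain(samples, a))   # (f)-(g)
    tree = {best: {}}
    for value in {x[best] for x, _ in samples}:     # (h)-(j): one branch per value
        subset = [(x, y) for x, y in samples if x[best] == value]
        tree[best][value] = id3(subset, [a for a in attrs if a != best])
    return tree

def _entropy(labels):
    n = len(labels)
    return -sum(c / n * log2(c / n) for c in Counter(labels).values())

def _gain(samples, attr):
    """Information gain of splitting `samples` on `attr`."""
    n = len(samples)
    rem = 0.0
    for value in {x[attr] for x, _ in samples}:
        sub = [y for x, y in samples if x[attr] == value]
        rem += len(sub) / n * _entropy(sub)
    return _entropy([y for _, y in samples]) - rem
```

For example, `id3([({'vip': 'yes'}, 'retain'), ({'vip': 'no'}, 'churn')], ['vip'])` returns a one-level tree branching on the single discriminating attribute.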
After the information gain of each attribute is obtained, a correction parameter is computed with a function and used to correct the information gain, which serves as the partition module for attribute selection and sample-subset division. Samples with missing attribute values are adjusted using relative frequencies as random probability estimates.
The concrete algorithm steps, in order, are: count the total number of samples; compute the information gain Gain(Q) of each attribute value in the training sample table; compute the corrected gain Gain′(Q); partition the current sample values by the attribute with maximal Gain′(Q); create the root node A corresponding to that attribute value; determine the next-level nodes of root A; obtain all leaf nodes and generate the decision tree; if not all leaf nodes have been obtained, return to computing Gain(Q).
To analyze the clustering results with a decision tree, analysis standards for customer value must first be defined. These standards are stored as data tables in a database or in XML files. Once the customer-value analysis standards are defined, the decision tree is generated algorithmically to explain the structure of the cluster analysis;
(11) With the database data of the CRM system as input, clustering is performed using the model established in steps (1) to (10), and the relevant clustering results are obtained.
In the user clustering analysis method for a CRM system of the present invention, data mining is applied in the CRM system by extracting data from the data warehouse and using those data as modeling samples; the data are then mined with various data mining algorithms, the mining results are analyzed and modeled, and the model is continuously optimized to obtain the relevant clustering results. The advantages are embodied as follows:
1. Cluster analysis of CRM system data with intelligent computational models achieves effective clustering, remedies the shortcomings of existing methods, and usefully complements them;
2. It is simple: no changes to the CRM system are needed, nor is any new equipment required;
3. Fusing several artificial-intelligence computational models reduces the probability that a single model produces a large clustering error, making the clustering results more stable;
4. It is scientific and sound, highly accurate, broadly applicable, and effective.
Description of the drawings
Fig. 1 is the data-mining flow chart of the user clustering analysis method in a CRM system;
Fig. 2 is the block diagram of the algorithm steps of the user clustering analysis method in a CRM system;
Fig. 3 shows the change curves of the performance index versus the proportion of supervision information.
Specific embodiment:
The user clustering analysis method in a CRM system of the present invention is further described below with reference to the drawings and embodiments.
With reference to Fig. 1, the user clustering analysis method in a CRM system of the present invention comprises in order the steps of: defining the business problem; preparing and screening the data; cleaning and pre-processing the data (ETL); extracting correct, reliable data; mining the data; assembling pattern sets; selecting and building models; and evaluating and interpreting the models. When the results are satisfactory, they guide the enterprise's practical activities, and feedback information about customers and the market is sent to the data warehouse in time so the enterprise can react quickly; when the results are unsatisfactory, the models are optimized and returned to model selection and building.
To verify the soundness of the algorithm, experiments are carried out on the Iris, Wine and Balance-scale data sets from the UCI machine learning repository, which are commonly used for evaluating clustering methods; the data set information is listed in Table 1.
Table 1. Experimental data set information
For each data set, 10%, 20%, 30% and 40% of the total samples are randomly selected as the test set. For an objective comparison of the performance of the different algorithms, the parameters are set to m = 2 and η = 0.0001.
Comprise the following steps that:
(1) Collect and classify the customer relationship system data: to ensure that the trained model adapts well in subsequent clustering, the system data should be classified according to the system users' specification. Data are taken from the different intervals of the data sets above as training samples.
(2) Normalize the CRM system data: let the domain of a data attribute x_i be d_i = [m_i, M_i], and let r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the model's dimensionless value for attribute x_i, with r_i ∈ [0, 1],
where u_{d_i} is the normalization function for d_i. After normalization, each data value lies in [0, 1].
(3) Using the normalized data, initialize the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) Update V(t) to V(t+1) by formula (2);
(5) From V(t+1), update U(t) to U(t+1) by formula (3);
(6) When |J(t+1) − J(t)| < ε, or when the iteration count t exceeds the maximum number of iterations M, the algorithm terminates; otherwise go to step (2);
(7) The mathematical model of the clustering algorithm, with constraints introduced, is formula (2), which is equivalent to the optimization problem of formula (3),
where d_{kj} = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is the adjustment factor parameter for the degree of influence of the class centers;
the other parameters are as defined in formula (1). Compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in the data space during each clustering pass.
(8) A semi-supervised compensation term Ψ describing the supervision information is introduced into the membership degrees of the clustering algorithm; its expression is formula (4).
The separation-degree function Φ describes the dispersion between different classes; its expression is formula (5).
The hyperplanes between classes should be separated as widely as possible. Known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality as accurate as possible, higher than clustering from random initialization. Therefore formula (3) is amended: the semi-supervised compensation term and the between-class separation-degree function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is formula (6),
where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3).
Model (6) is solved with the Lagrange multiplier method; the constructed Lagrange function is formula (7).
Setting the partial derivatives to zero, the iterative formulas for the optimal solution are formulas (8) and (9),
where û_{ij} is the supervision-information term: if data point x_j is a known-information sample, the value of û_{ij} equals the known information; otherwise it is zero. Û is the known c × n membership matrix formed by the û_{ij}.
(9) Let PN be the training set, containing p positive examples and n negative examples. For a sample set, the probability of the positive-example set PE is p/(p+n), and the probability of the negative-example set NE is n/(p+n).
A decision tree can be viewed as an information source over the positive and negative example sets, and the expected information of the messages it produces is given by formula (10).
If attribute A takes the values {A_1, A_2, …, A_m}, PN is partitioned into k subsets {PN_1, PN_2, …, PN_k}. If PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information of the subtrees, i.e. formula (11).
The information gain obtained by branching on A at the root is:
Gain(A) = I(p, n) − E(A)    (12)
The attribute with the largest Gain(A) is selected as the branching attribute of the node; this rule is applied at every node of the decision tree until the complete tree is built.
(10) The decision-tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the candidate attribute set.
Output: a decision tree.
Method:
(a) create a node N;
(b) IF all samples in S belong to one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the subset of S satisfying A = a_i; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
After the information gain of each attribute is obtained, a correction parameter is computed with a function and used to correct the information gain, which serves as the partition module for attribute selection and sample-subset division. Samples with missing attribute values are adjusted using relative frequencies as random probability estimates.
As shown in Fig. 2, the concrete algorithm steps, in order, are: count the total number of samples; compute the information gain Gain(Q) of each attribute value in the training sample table; compute the corrected gain Gain′(Q); partition the current sample values by the attribute with maximal Gain′(Q); create the root node A corresponding to that attribute value; determine the next-level nodes of root A; obtain all leaf nodes and generate the decision tree; if not all leaf nodes have been obtained, return to computing Gain(Q).
To analyze the clustering results with a decision tree, analysis standards for customer value must first be defined. These standards are stored as data tables in a database or in XML files. Once the customer-value analysis standards are defined, the decision tree is generated algorithmically to explain the structure of the cluster analysis.
(11) With the database data of the CRM system as input, clustering is performed using the model established in steps (1) to (10), and the relevant clustering results are obtained.
The performance evaluation index is RI = n0/n, where n0 is the average number of correctly classified samples obtained by comparing the clustering result on the test set with the standard data set, and n is the total number of samples in the test set; the larger the RI value, the higher the clustering accuracy and the better the clustering effect. The experiment is repeated 5 times, and the averages of the RI results are listed in Table 2. As Table 2 shows, the accuracy of the clustering tends to increase as the supervision information increases, showing that the supervised data have a guiding effect. The change curves of the performance index versus the proportion of supervision information on the Iris, Wine and Balance-scale data sets are shown in Fig. 3. Fig. 3 shows that on the different data sets the RI value grows as the proportion of supervision information grows; although the growth rate of the clustering accuracy does not change with the amount of supervision information, it consistently stays above the clustering precision of the original clustering algorithm, which verifies the soundness and effectiveness of the algorithm.
Table 2. Comparison of the experimental RI results
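The index RI = n0/n used in the experiments can be sketched as follows. This is an assumption-laden illustration: the patent does not specify how predicted cluster ids are matched to true class ids, so the sketch searches over all id permutations (practical only for small cluster counts); the name `ri` is ours:

```python
from itertools import permutations

def ri(labels_true, labels_pred):
    """Accuracy-style index RI = n0 / n.

    n0 is the number of samples whose predicted cluster, under the
    best relabeling of cluster ids to class ids, matches the true
    class; n is the total number of samples.
    """
    ids_true = sorted(set(labels_true))
    ids_pred = sorted(set(labels_pred))
    best = 0
    for perm in permutations(ids_pred):
        mapping = dict(zip(perm, ids_true))       # cluster id -> class id
        hits = sum(mapping.get(p) == t
                   for t, p in zip(labels_true, labels_pred))
        best = max(best, hits)
    return best / len(labels_true)

print(ri([0, 0, 1, 1], [1, 1, 0, 0]))   # 1.0
```

Cluster ids are arbitrary, so a clustering that swaps the two labels still scores 1.0; that is precisely why the relabeling search is needed before counting correct samples.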
In summary, the clustering algorithm of the invention uses known sample information to reduce the waste of information in the clustering process, while taking into account both within-class tightness and between-class dispersion, effectively mitigating the blindness of the original clustering method. Simulation experiments with the method of the invention on the UCI data sets show that the proposed new algorithm generally outperforms other clustering algorithms.
The computational conditions, legends and the like in the embodiments of the present invention are only intended to further illustrate the invention; they are not exhaustive and do not limit the scope of the claims. Alternatives that those skilled in the art, enlightened by the embodiments of the present invention, would arrive at without creative work and that are substantially equivalent fall within the scope of the present invention.

Claims (1)

1. A user clustering analysis method in a customer relationship management (CRM) system, comprising in order the steps of: defining the business problem; preparing and screening the data; cleaning and pre-processing the data (ETL); extracting correct, reliable data; mining the data; assembling pattern sets; selecting and building models; and evaluating and interpreting the models; when the results are satisfactory, guiding the enterprise's practical activities and sending feedback information about customers and the market to the data warehouse in time so the enterprise can react quickly; when the results are unsatisfactory, optimizing the models and returning to model selection and building; characterized in that the concrete steps are:
(1) Collect and classify the CRM system data: to ensure that the trained model adapts well in subsequent clustering, the system data should be classified according to the system users' specification, and at least 100 groups of data are taken from each interval as training samples;
(2) Normalize the CRM system data: let the domain of a data attribute x_i be d_i = [m_i, M_i], and let r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the model's dimensionless value for attribute x_i, with r_i ∈ [0, 1], where u_{d_i} is the normalization function for d_i; after normalization, each data value lies in [0, 1];
(3) Using the normalized data, initialize the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) Update V(t) to V(t+1) by formula (2);
(5) From V(t+1), update U(t) to U(t+1) by formula (3);
(6) When |J(t+1) − J(t)| < ε, or when the iteration count t exceeds the maximum number of iterations M, the algorithm terminates; otherwise go to step (2);
(7) Clustering algorithm: the mathematical model of the algorithm, with constraints introduced, is

$$\min J(U,V,\lambda)=\sum_{i=1}^{c}\sum_{j=1}^{n}u_{ij}^{m}d_{ij}^{2}-\sum_{i=1}^{c}\sum_{j=1}^{n}\lambda_{j}u_{ij}^{m}\ln u_{ij}^{m},\quad \text{s.t.}\ \sum_{i=1}^{c}u_{ij}=1,\ u_{ij}\in[0,1],\ \sum_{j=1}^{n}u_{ij}\in(0,n)\tag{2}$$

which is equivalent to the optimization problem

$$\min J(U,V)=\sum_{i=1}^{c}\sum_{j=1}^{n}\delta_{j}^{m}u_{ij}^{m}d_{ij}^{2},\quad \text{s.t.}\ \sum_{i=1}^{c}u_{ij}=1,\ u_{ij}\in[0,1],\ \sum_{j=1}^{n}u_{ij}\in(0,n)\tag{3}$$

where d_{kj} = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, η ∈ (0, 1) is the adjustment factor parameter for the degree of influence of the class centers, and

$$\delta_{j}=\Bigl(\sum_{k=1}^{c}d_{kj}^{2}\Bigr)^{-1};\qquad \lambda=(\lambda_{1},\lambda_{2},\ldots,\lambda_{n});$$

the other parameters are as defined in formula (1); compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in the data space during each clustering pass;
(8) A semi-supervised compensation term Ψ describing the supervision information is introduced into the membership degrees of the clustering algorithm; its expression is

$$\psi=\sum_{i=1}^{c}\sum_{j=1}^{n}\bigl(u_{ij}-\hat{u}_{ij}\bigr)^{m},\tag{4}$$

and the separation-degree function Φ describes the dispersion between different classes; its expression is

$$\phi=\eta\sum_{i=1}^{c}\sum_{h=1,h\neq i}^{c}\|v_{i}-v_{h}\|^{2};\tag{5}$$

the hyperplanes between classes should be separated as widely as possible; known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality as accurate as possible, higher than clustering from random initialization; therefore formula (3) is amended, the semi-supervised compensation term and the between-class separation-degree function are introduced into the membership degrees, and a new objective function is obtained, yielding the mathematical model of the clustering method, whose expression is

$$\min J(U,V)=\sum_{i=1}^{c}\sum_{j=1}^{n}\delta_{j}^{m}\bigl(u_{ij}-\hat{u}_{ij}\bigr)^{m}d_{ij}^{2}-\eta\sum_{i=1}^{c}\sum_{j=1}^{n}\delta_{j}^{m}\bigl(u_{ij}-\hat{u}_{ij}\bigr)^{m}\sum_{h=1,h\neq i}^{c}\|v_{i}-v_{h}\|^{2},\quad \text{s.t.}\ \sum_{i=1}^{c}u_{ij}=1,\ u_{ij}\in[0,1],\ \sum_{j=1}^{n}u_{ij}\in(0,n),\tag{6}$$

where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3);
Model (6) is solved with the Lagrange multiplier method; the constructed Lagrange function is

$$J=J(U,V)-\sum_{j=1}^{n}\lambda_{j}\Bigl(\sum_{i=1}^{c}u_{ij}-1\Bigr);\tag{7}$$

setting the partial derivatives to zero, the iterative formulas for the optimal solution are

$$v_{i}=\frac{\sum_{j=1}^{n}\delta_{j}^{m}\bigl(u_{ij}-\hat{u}_{ij}\bigr)^{m}x_{j}-\eta\sum_{j=1}^{n}\delta_{j}^{m}\bigl(u_{ij}-\hat{u}_{ij}\bigr)^{m}\sum_{h=1,h\neq i}^{c}v_{h}}{\sum_{j=1}^{n}\delta_{j}^{m}\bigl(u_{ij}-\hat{u}_{ij}\bigr)^{m}-\eta(c-1)\sum_{j=1}^{n}\delta_{j}^{m}\bigl(u_{ij}-\hat{u}_{ij}\bigr)^{m}}\tag{8}$$

$$u_{ij}=\hat{u}_{ij}+\frac{1-\sum_{k=1}^{c}\hat{u}_{kj}}{\sum_{k=1}^{c}\Bigl(\dfrac{\delta_{j}^{m}d_{ij}^{2}-\eta\delta_{j}^{m}\sum_{h=1,h\neq i}^{c}\|v_{i}-v_{h}\|^{2}}{\delta_{j}^{m}d_{kj}^{2}-\eta\delta_{j}^{m}\sum_{h=1,h\neq k}^{c}\|v_{k}-v_{h}\|^{2}}\Bigr)^{1/(m-1)}},\tag{9}$$

where û_{ij} is the supervision-information term: if data point x_j is a known-information sample, the value of û_{ij} equals the known information, otherwise it is zero; Û is the known c × n membership matrix formed by the û_{ij};
(9) PN being set as training set, there is p positive example and n counter-example in PN, for a sample set, the probability of the PE of positive example collection is P/ (p+n), counter-example integrates the probability of NE as n/ (p+n);
One decision tree can be seen as the message source with positive and negative example collection, and the message that message source is produced expects that information is:
$$I(p,n)=-\frac{p}{p+n}\log_2\frac{p}{p+n}-\frac{n}{p+n}\log_2\frac{n}{p+n}\qquad(10)$$
If attribute A takes the values {A_1, A_2, …, A_m}, PN is partitioned into k subsets {PN_1, PN_2, …, PN_k}. If PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree with A as root is the weighted average of the expected information required by each subtree, i.e.:
$$E(A)=\sum_{i=1}^{k}\frac{p_i+n_i}{p+n}I(p_i,n_i)\qquad(11)$$
The information gain obtained by classifying with A as the root of the tree is:
Gain(A) = I(p, n) − E(A)    (12)
The attribute with the maximum Gain(A) is selected as the branching attribute of the node; this principle is applied to every node of the decision tree until the complete decision tree has been built;
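A minimal illustration of formulas (10)–(12) (not taken from the patent's code; the function names are assumptions):

```python
from math import log2

def expected_info(p, n):
    """I(p, n) of eq. (10): expected information of a source with
    p positive and n negative examples (0 * log2(0) treated as 0)."""
    total = p + n
    info = 0.0
    for count in (p, n):
        if count:
            frac = count / total
            info -= frac * log2(frac)
    return info

def gain(p, n, subsets):
    """Gain(A) of eqs. (11)-(12); `subsets` lists the (p_i, n_i)
    pairs produced by splitting PN on attribute A."""
    e_a = sum((pi + ni) / (p + n) * expected_info(pi, ni)
              for pi, ni in subsets)
    return expected_info(p, n) - e_a
```

On Quinlan's classic weather data (9 positive, 5 negative; the Outlook attribute splits them as (2,3), (4,0), (3,2)), this gives I(9,5) ≈ 0.940 and Gain(Outlook) ≈ 0.247.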
(10) The decision tree generation algorithm Generate_decision_tree is as follows:
Input: S: training sample set, described by categorical attribute values;
Candidate-attribute: candidate attribute set;
Output: a decision tree;
(a) create a node N;
(b) IF all samples in S belong to one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the set of samples in S satisfying A = a_i; // one partition
(k) IF S_i is empty THEN
(l) add a leaf labeled with the most common class in S;
(m) ELSE add the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
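Steps (a)–(m) are the classic ID3 recursion; a compact stdlib-only sketch (illustrative only, not the patent's code — the dict-based sample rows and the tuple tree representation are assumptions) is:

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Expected information of a label list, as in eq. (10)."""
    total = len(labels)
    return -sum(c / total * log2(c / total) for c in Counter(labels).values())

def info_gain(rows, labels, attr):
    """Gain(A): entropy before the split minus the weighted entropy
    of the subsets induced by attribute `attr` (eqs. (11)-(12))."""
    total = len(rows)
    split = {}
    for row, lab in zip(rows, labels):
        split.setdefault(row[attr], []).append(lab)
    after = sum(len(s) / total * entropy(s) for s in split.values())
    return entropy(labels) - after

def id3(rows, labels, attrs):
    """Steps (a)-(m): returns a class label (leaf) or an
    (attribute, {value: subtree}) pair (internal node).  The empty
    branch of steps (k)-(l) cannot arise here because we branch only
    on attribute values actually observed in S."""
    if len(set(labels)) == 1:               # (b)-(c): pure node
        return labels[0]
    if not attrs:                           # (d)-(e): majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: info_gain(rows, labels, a))   # (f)
    tree = {}
    for value in {row[best] for row in rows}:                     # (h)-(j)
        idx = [k for k, row in enumerate(rows) if row[best] == value]
        tree[value] = id3([rows[k] for k in idx],
                          [labels[k] for k in idx],
                          [a for a in attrs if a != best])        # (m)
    return (best, tree)                                           # (g)
```

On a toy set where attribute 'a' determines the class and 'b' is noise, the recursion picks 'a' at the root and returns pure leaves.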
After the information gain of each attribute is obtained, a correction parameter is computed and used to adjust the information gain, which then serves as the criterion for attribute selection and for partitioning the samples into subsets; samples with missing attribute values are adjusted using relative frequencies as random probability estimates.
The steps of the specific algorithm are, in order: count the total number of samples; compute the information gain Gain(Q) of the attribute values in the training sample table; compute the corrected information gain Gain'(Q); sort the current sample values by MAXGain'(Q); create the root node A corresponding to the attribute value with MAXGain'(Q); determine the next-level nodes of root node A; if all leaf nodes have been obtained, generate the decision tree; if not, return to computing the training-sample attribute-value information gain Gain(Q).
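The patent does not spell out the correction function that turns Gain(Q) into Gain'(Q). A well-known candidate for such a correction is C4.5's gain ratio, which divides the gain by the split information to penalise attributes with many small subsets; it is shown here purely as an assumed stand-in, not as the patent's actual formula:

```python
from math import log2

def split_info(sizes):
    """SplitInfo: entropy of the partition sizes themselves."""
    total = sum(sizes)
    return -sum(s / total * log2(s / total) for s in sizes if s)

def gain_ratio(gain_value, sizes):
    """One plausible corrected gain Gain'(Q): gain divided by split
    information (the C4.5 gain ratio).  This is an assumption; the
    patent leaves its correction function unspecified."""
    si = split_info(sizes)
    return gain_value / si if si else 0.0
```

A split into two equal halves has SplitInfo = 1, so its gain passes through unchanged, while a split into many tiny subsets is penalised.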
To analyze the clustering result with a decision tree, analysis criteria for customer value must first be defined; these criteria are stored as data tables in a database or in an XML file. After the customer-value criteria have been defined, the algorithm generates a decision tree to explain the structure of the cluster analysis;
(11) The database data of the CRM system is taken as input and clustered with the model established in steps (1) to (10), giving the corresponding clustering result.
CN201611212713.XA 2016-12-25 2016-12-25 User cluster analysis method in customer care system Pending CN106682915A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201611212713.XA CN106682915A (en) 2016-12-25 2016-12-25 User cluster analysis method in customer care system


Publications (1)

Publication Number Publication Date
CN106682915A true CN106682915A (en) 2017-05-17

Family

ID=58870536



Cited By (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107909178A (en) * 2017-08-31 2018-04-13 上海壹账通金融科技有限公司 Electronic device, lost contact repair rate Forecasting Methodology and computer-readable recording medium
CN107909178B (en) * 2017-08-31 2021-06-08 深圳壹账通智能科技有限公司 Electronic device, loss of association repair rate prediction method, and computer-readable storage medium
CN108966448A (en) * 2018-05-31 2018-12-07 淮阴工学院 A kind of light dynamic regulation method based on adaptive fuzzy decision tree
CN109885597A (en) * 2019-01-07 2019-06-14 平安科技(深圳)有限公司 Tenant group processing method, device and electric terminal based on machine learning
CN109885597B (en) * 2019-01-07 2023-05-30 平安科技(深圳)有限公司 User grouping processing method and device based on machine learning and electronic terminal
CN112348583A (en) * 2020-11-04 2021-02-09 贝壳技术有限公司 User preference generation method and generation system
CN112348583B (en) * 2020-11-04 2022-12-06 贝壳技术有限公司 User preference generation method and generation system
CN112508074A (en) * 2020-11-30 2021-03-16 深圳市飞泉云数据服务有限公司 Visualization display method and system and readable storage medium
CN112508074B (en) * 2020-11-30 2024-05-14 深圳市飞泉云数据服务有限公司 Visual display method, system and readable storage medium
CN115019078A (en) * 2022-08-09 2022-09-06 阿里巴巴(中国)有限公司 Data clustering method and device
CN115019078B (en) * 2022-08-09 2023-01-24 阿里巴巴(中国)有限公司 Vehicle image processing method, computing device and storage medium


Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20170517