CN106682915A - User cluster analysis method in customer care system - Google Patents
- Publication number
- CN106682915A CN106682915A CN201611212713.XA CN201611212713A CN106682915A CN 106682915 A CN106682915 A CN 106682915A CN 201611212713 A CN201611212713 A CN 201611212713A CN 106682915 A CN106682915 A CN 106682915A
- Authority
- CN
- China
- Prior art keywords
- sigma
- data
- node
- delta
- attribute
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/22—Indexing; Data structures therefor; Storage structures
- G06F16/2228—Indexing structures
- G06F16/2246—Trees, e.g. B+trees
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F16/00—Information retrieval; Database structures therefor; File system structures therefor
- G06F16/20—Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
- G06F16/28—Databases characterised by their database models, e.g. relational or object models
- G06F16/284—Relational databases
- G06F16/285—Clustering or classification
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/01—Customer relationship services
Abstract
The invention discloses a user cluster analysis method for a customer care system. The method establishes a system architecture based on the customer care system, an application workflow for data mining, and a new clustering algorithm. It overcomes the shortcomings of the user clustering methods used in current customer care systems, and it is scientifically sound, highly accurate, broadly applicable, effective, simple, and reliable.
Description
Technical field
The present invention relates to the field of management information systems, and more particularly to a user clustering analysis method for customer relationship management (CRM) systems.
Background technology
Customer relationship management (CRM) uses modern information technology to track and analyze customer demand in depth, retain existing customers, attract new ones, and detect in time those loyal customers whose behavior may be about to change, so that measures can be taken promptly to protect the enterprise's interests. As business models shift from being product-centered to customer-centered, the importance of CRM becomes ever more apparent. CRM is the comprehensive, customer-centric marketing solution developed in industrialized countries: starting from actual customer needs and using modern information tools, it strengthens an enterprise's capabilities in customer service, customer development, and customer retention, thereby improving both enterprise profitability and customer satisfaction and realizing a win-win management philosophy for customers and the enterprise.
A CRM system is a new type of management system intended to improve and perfect the relationship between an enterprise and its customers. A CRM system must not only manage customer information; more importantly, it should apply data mining to customer relationship data in order to better understand market structure, customer preferences, and so on, so as to develop new products and adjust marketing strategies.
At present, enterprises lack deep mining of the large volumes of customer data they accumulate, leaving customer service in a backward state. To study customer relationships more effectively, it is necessary to introduce data mining into the CRM system. Data mining can uncover latent customer demand patterns and consumption behavior models, helping enterprises improve customer satisfaction, reduce churn, raise sales performance, and truly embody the "customer-centric" service philosophy.
A domestic patent search found the following. Application No. 201510919909.1, "A method and device for maintaining key-account relationships based on back-end data mining", mainly applies data mining to customer relationship system data to establish a method and procedure for identifying key accounts, focusing on the system architecture level. Application No. 201310204340.1, "A customer relationship management method and system based on data mining", mainly concerns mainstream techniques for building a data-mining-based CRM system. Application No. 201210445332.1, "Providing enterprise resource planning functions from a customer relationship management client application", focuses on techniques and devices for accessing an enterprise resource planning (ERP) system from a CRM application. To date, no document or practical application identical to the technical solution of the present invention has been found.
Summary of the invention
The object of the present invention is to remedy the defects of user clustering methods in prior-art CRM systems by scientifically establishing a system architecture based on the CRM system, a data mining application workflow, and a new algorithm, and to propose a user clustering analysis method for CRM systems that is highly accurate, broadly applicable, effective, simple, and reliable.
The technical scheme adopted to achieve this object is a user clustering analysis method for a CRM system comprising, in order: definition of the business problem; preparation and screening of data; cleaning and preprocessing of data (ETL); extraction of correct, reliable data; data mining; pattern assembly; model selection and construction; and model evaluation and interpretation. When the result is satisfactory, it guides the enterprise's practical management activities, and feedback about customers and the market is fed back into the data warehouse in time so that the enterprise can react quickly; when the result is unsatisfactory, the model is optimized and returned to model selection and construction. The method is characterized by the following concrete steps:
(1) Acquire and classify the CRM system data: to ensure that the trained model generalizes well in subsequent clustering, the system data should be classified according to the system users' specification, taking at least 100 groups of data from each distinct interval as training samples;
(2) Normalize the CRM system data: let the domain of datum x_i be d_i = [m_i, M_i], and let r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the dimensionless value that the model assigns to the attribute value x_i, with r_i ∈ [0, 1], where u_{d_i} is the canonical (membership) function of d_i; after normalization, each datum takes a value in [0, 1];
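The normalization of step (2) amounts to scaling each attribute onto [0, 1] over its domain [m_i, M_i]. A minimal sketch, assuming a linear membership function and using the observed minimum and maximum as the domain bounds (the patent does not give its canonical function explicitly):

```python
def normalize(values):
    """Min-max scale a list of attribute values onto [0, 1].

    The observed minimum m and maximum M serve as the domain [m, M];
    a linear membership (canonical) function is assumed here.
    """
    m, M = min(values), max(values)
    if M == m:                      # degenerate domain: constant attribute
        return [0.0 for _ in values]
    return [(x - m) / (M - m) for x in values]

scaled = normalize([10.0, 15.0, 20.0])   # → [0.0, 0.5, 1.0]
```

Each attribute column of the training sample would be normalized independently before clustering.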
(3) Using the normalized data, initialize the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) Update V(t) to V(t+1) by formula (2);
(5) Using V(t+1), update U(t) to U(t+1) by formula (3);
(6) When |J(t+1) − J(t)| < ε, or the iteration count t exceeds the maximum number of iterations M, the algorithm terminates; otherwise return to step (2);
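Steps (3)–(6) follow the standard fuzzy c-means (FCM) iteration: alternately update the memberships from the centers and the centers from the memberships until the objective J stabilizes. A minimal sketch of that loop on 1-D data, using the classical FCM update rules as stand-ins for the patent's modified formulas (2) and (3), which are given only as images in the original; the evenly spread center initialization is an assumption:

```python
def fcm(data, c, m=2.0, eps=1e-4, max_iter=100):
    """Classical fuzzy c-means on 1-D data (stand-in for formulas (2)-(3))."""
    n = len(data)
    lo, hi = min(data), max(data)
    # simple deterministic initialization: centers evenly spread over the range
    V = [lo + k * (hi - lo) / (c - 1) for k in range(c)] if c > 1 else [lo]
    U = [[0.0] * c for _ in range(n)]
    J_prev = float("inf")
    for _ in range(max_iter):
        # membership update: u_kj = 1 / sum_l (d_kj / d_lj)^(2/(m-1))
        for j in range(n):
            d = [abs(data[j] - V[k]) or 1e-12 for k in range(c)]
            for k in range(c):
                U[j][k] = 1.0 / sum((d[k] / d[l]) ** (2.0 / (m - 1.0))
                                    for l in range(c))
        # center update: v_k = sum_j u_kj^m x_j / sum_j u_kj^m
        for k in range(c):
            den = sum(U[j][k] ** m for j in range(n))
            V[k] = sum((U[j][k] ** m) * data[j] for j in range(n)) / den
        # objective J = sum u_kj^m d_kj^2; stop when it changes by less than eps
        J = sum((U[j][k] ** m) * (data[j] - V[k]) ** 2
                for j in range(n) for k in range(c))
        if abs(J_prev - J) < eps:
            break
        J_prev = J
    return U, V
```

On two well-separated groups, the centers converge to the group means and each membership row sums to 1.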
(7) Clustering algorithm: the mathematical model of the algorithm, with the constraint introduced, is

equivalent to the optimization problem

where d_kj = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is a parameter regulating the degree of influence of the class centers.
The other parameters are as defined in formula (1). Compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in data space during each clustering pass;
(8) Introduce into the clustering algorithm a semi-supervised compensation term Ψ for the membership degrees, describing the supervision information; its expression is

The between-class separation function Φ describes the dispersion between different classes; its expression is

It is desired that the hyperplane separation between classes be as large as possible. The known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality and accuracy higher than clustering from random initialization. Therefore, formula (3) is amended: the semi-supervised compensation term and the between-class separation function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is

where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3).
For model (6), the solution is obtained with the Lagrange multiplier method. The Lagrange function is constructed as

Setting the partial derivatives to zero yields the iterative formulas for the optimal solution:

where the supervision information term equals the known information if the datum x_j is a known-information sample and is zero otherwise; these terms form a known c × n membership matrix;
(9) Let PN be the training set, containing p positive examples and n negative examples. For a sample set, the probability of the positive-example set PE is p/(p+n) and the probability of the negative-example set NE is n/(p+n).
A decision tree can be viewed as an information source over the positive and negative example sets; the expected information of the messages it produces is:

If attribute A takes values {A1, A2, …, Am}, PN is partitioned into k subsets {PN1, PN2, …, PNk}. If PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information needed for each subtree, i.e.:

The information gain obtained by classifying with A at the root of the tree is:
Gain(A) = I(p, n) − E(A)   (12)
The attribute with the largest Gain(A) is selected as the branching attribute of the node; this principle is applied at every node of the decision tree until the complete decision tree has been built;
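The quantities I(p, n), E(A), and Gain(A) above are the classical ID3 entropy-based measures. A minimal sketch of their computation (the example counts at the end are hypothetical illustration data, not taken from the patent):

```python
from math import log2

def info(p, n):
    """Expected information I(p, n) of a source with p positives, n negatives."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                      # 0·log2(0) is taken as 0
            f = count / total
            result -= f * log2(f)
    return result

def gain(p, n, subsets):
    """Gain(A) = I(p, n) - E(A), where subsets is [(p_i, n_i), ...] per value of A."""
    e = sum((pi + ni) / (p + n) * info(pi, ni) for pi, ni in subsets)
    return info(p, n) - e

# hypothetical attribute splitting 9 positives / 5 negatives into 3 subsets
g = gain(9, 5, [(2, 3), (4, 0), (3, 2)])   # ≈ 0.247 bits
```

The attribute with the largest such gain becomes the branching attribute of the current node.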
(10) The decision tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the set of candidate attributes.
Output: a decision tree.
(a) create node N;
(b) IF all samples in S are in one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A: // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the set of samples of S on that branch; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
After the information gain of each attribute is obtained, a correction function is applied to the information gain, which serves as the partition module for attribute selection and for splitting the sample subsets. Samples with missing attribute values are adjusted using relative frequencies as random probabilities.
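The pseudocode of step (10) can be sketched as a recursive function. A minimal sketch, assuming each sample is a dict of categorical attribute values plus a "class" label (the dict layout and key names are assumptions for illustration); the empty-branch case (k)–(l) cannot arise here because branches are grown only for values actually observed in S:

```python
from collections import Counter
from math import log2

def _entropy(samples):
    counts = Counter(s["class"] for s in samples)
    total = len(samples)
    return -sum(c / total * log2(c / total) for c in counts.values())

def _gain(samples, attr):
    """Information gain of splitting `samples` on attribute `attr`."""
    remainder = 0.0
    for value in {s[attr] for s in samples}:
        subset = [s for s in samples if s[attr] == value]
        remainder += len(subset) / len(samples) * _entropy(subset)
    return _entropy(samples) - remainder

def build_tree(samples, attributes):
    """Recursive ID3: returns a class label (leaf) or an (attribute, branches) node."""
    classes = [s["class"] for s in samples]
    if len(set(classes)) == 1:               # (b)-(c): all one class -> leaf
        return classes[0]
    if not attributes:                       # (d)-(e): no attributes -> majority vote
        return Counter(classes).most_common(1)[0][0]
    best = max(attributes, key=lambda a: _gain(samples, a))   # (f)
    branches = {}
    for value in {s[best] for s in samples}:  # (h)-(j): one branch per observed value
        subset = [s for s in samples if s[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = build_tree(subset, rest)            # (m): recurse
    return (best, branches)
```

On a toy sample set whose class is determined by one attribute, the tree roots at that attribute with pure leaves below it.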
The concrete algorithm steps, in order, are: count the total number of samples; compute the information gain Gain(Q) of each training-sample attribute value; compute the corrected gain Gain'(Q); partition the current samples according to the attribute value with the maximum Gain'(Q); create the root node A corresponding to that attribute value; determine the next-level nodes of root node A; when all leaf nodes have been obtained, generate the decision tree; if not all leaf nodes have been obtained, return to computing Gain(Q).
To analyze the clustering result with a decision tree, analysis criteria for customer value must first be defined; these criteria are stored as data tables in a database or in an XML file. Once the customer-value criteria are defined, the algorithm generates a decision tree that explains the structure of the cluster analysis;
(11) Take the database data of the CRM system as input, cluster it with the model established in steps (1) to (10), and obtain the corresponding clustering result.
In the user clustering analysis method for a CRM system of the present invention, data mining is applied in the CRM system by extracting the data in the data warehouse and using it as the modeling sample under study; the data are then mined with various data mining algorithms, the mining results are analyzed and modeled, and the model is continuously optimized to obtain the corresponding clustering result. The advantages are as follows:
1. Intelligent computational models are used to cluster the CRM system data, achieving effective clustering, compensating for the deficiencies of existing methods, and usefully complementing them;
2. It is simple: no change to the CRM system is required, and no new equipment is needed;
3. It fuses several artificial-intelligence computational models, reducing the probability that a single model produces a large clustering error and making the clustering result more stable;
4. It is scientifically sound, highly accurate, broadly applicable, and effective.
Description of the drawings
Fig. 1 is the data mining flowchart of the user clustering analysis method in a CRM system;
Fig. 2 is the algorithm step block diagram of the user clustering analysis method in a CRM system;
Fig. 3 shows the curves of the performance index versus the supervision information ratio.
Specific embodiments:
The user clustering analysis method in a CRM system of the present invention is further described below with reference to the drawings and embodiments.
With reference to Fig. 1, the method comprises, in order: definition of the business problem; preparation and screening of data; cleaning and preprocessing of data (ETL); extraction of correct, reliable data; data mining; pattern assembly; model selection and construction; and model evaluation and interpretation. When the result is satisfactory, it guides the enterprise's practical activities, and feedback about customers and the market is fed back into the data warehouse in time so that the enterprise can react quickly; when the result is unsatisfactory, the model is optimized and returned to model selection and construction.
To verify the soundness of the algorithm, experiments were run on the Iris, Wine, and Balance-scale data sets from the UCI machine learning repository, which are commonly used for testing clustering methods; the data set information is listed in Table 1.
Table 1. Experimental data set information
For each data set, 10%, 20%, 30%, and 40% of the total samples were randomly selected as the test set. For an objective comparison of the performance of the different algorithms, the parameters were set to m = 2 and η = 0.0001.
The concrete steps are as follows:
(1) Acquire and classify the customer relationship system data: to ensure that the trained model generalizes well in subsequent clustering, the system data should be classified according to the system users' specification, taking data from each distinct interval of the data sets above as training samples.
(2) Normalize the CRM system data: let the domain of datum x_i be d_i = [m_i, M_i], and let r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the dimensionless value that the model assigns to the attribute value x_i, with r_i ∈ [0, 1], where u_{d_i} is the canonical (membership) function of d_i. After normalization, each datum takes a value in [0, 1].
(3) Using the normalized data, initialize the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) Update V(t) to V(t+1) by formula (2);
(5) Using V(t+1), update U(t) to U(t+1) by formula (3);
(6) When |J(t+1) − J(t)| < ε, or the iteration count t exceeds the maximum number of iterations M, the algorithm terminates; otherwise return to step (2);
(7) The mathematical model of the clustering algorithm, with the constraint introduced, is

equivalent to the optimization problem

where d_kj = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is a parameter regulating the degree of influence of the class centers.
The other parameters are as defined in formula (1). Compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in data space during each clustering pass.
(8) Introduce into the clustering algorithm a semi-supervised compensation term Ψ for the membership degrees, describing the supervision information; its expression is

The between-class separation function Φ describes the dispersion between different classes; its expression is

It is desired that the hyperplane separation between classes be as large as possible. The known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality and accuracy higher than clustering from random initialization. Therefore, formula (3) is amended: the semi-supervised compensation term and the between-class separation function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is

where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3).
For model (6), the solution is obtained with the Lagrange multiplier method. The Lagrange function is constructed as

Setting the partial derivatives to zero yields the iterative formulas for the optimal solution:

where the supervision information term equals the known information if the datum x_j is a known-information sample and is zero otherwise; these terms form a known c × n membership matrix.
(9) Let PN be the training set, containing p positive examples and n negative examples. For a sample set, the probability of the positive-example set PE is p/(p+n) and the probability of the negative-example set NE is n/(p+n).
A decision tree can be viewed as an information source over the positive and negative example sets; the expected information of the messages it produces is:

If attribute A takes values {A1, A2, …, Am}, PN is partitioned into k subsets {PN1, PN2, …, PNk}. If PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information needed for each subtree, i.e.:

The information gain obtained by classifying with A at the root of the tree is:
Gain(A) = I(p, n) − E(A)   (12)
The attribute with the largest Gain(A) is selected as the branching attribute of the node; this principle is applied at every node of the decision tree until the complete decision tree has been built.
(10) The decision tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the set of candidate attributes.
Output: a decision tree.
Method:
(a) create node N;
(b) IF all samples in S are in one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A: // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the set of samples of S on that branch; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
After the information gain of each attribute is obtained, a correction function is applied to the information gain, which serves as the partition module for attribute selection and for splitting the sample subsets. Samples with missing attribute values are adjusted using relative frequencies as random probabilities.
As shown in Fig. 2, the concrete algorithm steps, in order, are: count the total number of samples; compute the information gain Gain(Q) of each training-sample attribute value; compute the corrected gain Gain'(Q); partition the current samples according to the attribute value with the maximum Gain'(Q); create the root node A corresponding to that attribute value; determine the next-level nodes of root node A; when all leaf nodes have been obtained, generate the decision tree; if not all leaf nodes have been obtained, return to computing Gain(Q).
To analyze the clustering result with a decision tree, analysis criteria for customer value must first be defined; these criteria are stored as data tables in a database or in an XML file. Once the customer-value criteria are defined, the algorithm can generate a decision tree that explains the structure of the cluster analysis.
(11) Take the database data of the CRM system as input, cluster it with the model established in steps (1) to (10), and obtain the corresponding clustering result.
The performance evaluation index is RI = n0/n, where n0 is the mean number of correctly classified samples obtained by comparing the test-set clustering result with the reference labels, and n is the total number of samples in the test set; the larger the RI value, the higher the clustering accuracy and the better the clustering effect. Each experiment was repeated 5 times; the mean RI values are listed in Table 2. Table 2 shows that clustering accuracy tends to increase as the amount of supervision information grows, indicating that the supervised data have a guiding effect. The curves of the performance index versus the supervision information ratio on the Iris, Wine, and Balance-scale data sets are shown in Fig. 3. Fig. 3 shows that on all data sets the RI value increases as the supervision ratio increases; although the rate of increase of clustering accuracy does not vary simply with the amount of supervision information, it consistently remains above the clustering precision of the original clustering algorithm, which verifies the soundness and effectiveness of the algorithm.
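The index RI = n0/n above counts the fraction of test samples whose cluster assignment matches the reference labels, once the best correspondence between cluster ids and class labels has been chosen. A minimal sketch (the label sequences in the example are hypothetical, and an exhaustive cluster-to-class matching is assumed, which is fine for small numbers of clusters):

```python
from itertools import permutations

def ri(predicted, truth):
    """RI = n0 / n: best-match accuracy of cluster labels against reference labels."""
    clusters = sorted(set(predicted))
    classes = sorted(set(truth))
    best = 0
    # try every mapping of cluster ids onto class labels, keep the best match count
    for perm in permutations(classes, len(clusters)):
        mapping = dict(zip(clusters, perm))
        matches = sum(mapping[p] == t for p, t in zip(predicted, truth))
        best = max(best, matches)
    return best / len(truth)

score = ri([0, 0, 1, 1, 1], ["a", "a", "b", "b", "a"])   # → 0.8
```

With 4 of 5 samples correctly matched under the best mapping, RI = 0.8.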
Table 2. Comparison of the experimental RI results
In summary, the clustering algorithm of the present invention uses known sample information to reduce the waste of information during clustering and considers both within-class compactness and between-class dispersion, effectively reducing the blindness of the original clustering method. Simulation experiments with the method of the present invention on the UCI data sets show that the proposed new algorithm generally outperforms the other clustering algorithms.
The computational conditions, legends, and the like in the embodiments of the present invention serve only to further illustrate the invention; they are not exhaustive and do not limit the scope of the claims. Other substantially equivalent alternatives that persons skilled in the art would arrive at without creative work, in light of the teaching of the embodiments, fall within the scope of the present invention.
Claims (1)
1. A user clustering analysis method in a CRM system, comprising, in order: definition of the business problem; preparation and screening of data; cleaning and preprocessing of data (ETL); extraction of correct, reliable data; data mining; pattern assembly; model selection and construction; and model evaluation and interpretation, wherein when the result is satisfactory it guides the enterprise's practical management activities and feedback about customers and the market is fed back in time into the data warehouse so that the enterprise can react quickly, and when the result is unsatisfactory the model is optimized and returned to model selection and construction, characterized by the following concrete steps:
(1) acquiring and classifying the CRM system data: to ensure that the trained model generalizes well in subsequent clustering, the system data should be classified according to the system users' specification, taking at least 100 groups of data from each distinct interval as training samples;
(2) normalizing the CRM system data: letting the domain of datum x_i be d_i = [m_i, M_i], and letting r_i = u_{d_i}(x_i) (i = 1, 2, 3, …, n) be the dimensionless value that the model assigns to the attribute value x_i, with r_i ∈ [0, 1], where u_{d_i} is the canonical (membership) function of d_i; after normalization, each datum takes a value in [0, 1];
(3) using the normalized data, initializing the membership matrix U(t) and the cluster centers V(t), where t is the iteration count;
(4) updating V(t) to V(t+1) by formula (2);
(5) using V(t+1), updating U(t) to U(t+1) by formula (3);
(6) when |J(t+1) − J(t)| < ε, or the iteration count t exceeds the maximum number of iterations M, terminating the algorithm; otherwise returning to step (2);
(7) clustering algorithm: the mathematical model of the algorithm, with the constraint introduced, is

equivalent to the optimization problem

where d_kj = ‖x_j − v_k‖ is the Euclidean distance from sample point x_j to class center v_k, and η ∈ (0, 1) is a parameter regulating the degree of influence of the class centers; the other parameters are as defined in formula (1); compared with formula (1), formula (3) clearly takes into account the actual distribution characteristics of the data in data space during each clustering pass;
(8) introducing into the clustering algorithm a semi-supervised compensation term Ψ for the membership degrees, describing the supervision information, whose expression is

the between-class separation function Φ describes the dispersion between different classes, and its expression is

it is desired that the hyperplane separation between classes be as large as possible; the known-information samples have the ability to guide clustering, and their influence on the membership values makes the final clustering quality and accuracy higher than clustering from random initialization; therefore formula (3) is amended: the semi-supervised compensation term and the between-class separation function are introduced into the membership degrees, yielding a new objective function and hence the mathematical model of the clustering method, whose expression is

where 0 < η < 1 and η is the class-center influence factor parameter; the other parameters are the same as in formula (3);
for model (6), the solution is obtained with the Lagrange multiplier method; the Lagrange function is constructed as

setting the partial derivatives to zero yields the iterative formulas for the optimal solution:

where the supervision information term equals the known information if the datum x_j is a known-information sample and is zero otherwise; these terms form a known c × n membership matrix;
(9) letting PN be the training set, containing p positive examples and n negative examples, wherein for a sample set the probability of the positive-example set PE is p/(p+n) and the probability of the negative-example set NE is n/(p+n);
a decision tree can be viewed as an information source over the positive and negative example sets, and the expected information of the messages it produces is:

if attribute A takes values {A1, A2, …, Am}, PN is partitioned into k subsets {PN1, PN2, …, PNk}; if PN_i contains p_i positive examples and n_i negative examples, the expected information required for subtree PN_i is I(p_i, n_i), and the expected information required for the tree rooted at A is the weighted mean of the expected information needed for each subtree, i.e.:

the information gain obtained by classifying with A at the root of the tree is:
Gain(A) = I(p, n) − E(A)   (12)
the attribute with the largest Gain(A) is selected as the branching attribute of the node, and this principle is applied at every node of the decision tree until the complete decision tree has been built;
(10) the decision tree algorithm is as follows:
Input: S: the training sample set, described mainly by categorical attribute values;
Candidate-attribute: the set of candidate attributes.
Output: a decision tree.
(a) create node N;
(b) IF all samples in S are in one class C THEN
(c) return N as a leaf node labeled with class C;
(d) IF Candidate-attribute is empty THEN
(e) return N as a leaf node labeled with the most common class in S; // majority voting
(f) select the attribute A in Candidate-attribute with the highest information gain;
(g) label node N with A;
(h) FOR each known value a_i of A: // partition the training samples
(i) grow from node N a branch for the condition A = a_i;
(j) let S_i be the set of samples of S on that branch; // one partition
(k) IF S_i is empty THEN
(l) attach a leaf labeled with the most common class in S;
(m) ELSE attach the node returned by Generate_decision_tree(S_i, Candidate-attribute − A);
After the information gain for obtaining each attribute, parameter is calculated using function, go to correct the information gain, as attribute
Selection and the division module of sample dividing subset, it is general at random using relative frequency for those samples for lacking property value
Rate number is adjusted,
The steps of the specific algorithm are, in order: count the total number of samples; calculate the information gain Gain(Q) of the training sample table attribute values; correct the information gain to obtain Gain'(Q); sort the current sample values according to MAXGain'(Q); create the root node A corresponding to the attribute value with MAXGain'(Q); determine the next-level nodes of root node A; when all leaf nodes have been obtained, generate the decision tree; when not all leaf nodes have been obtained, return to calculating the information gain Gain(Q) of the training sample table attribute values;
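The correction function producing Gain'(Q) is not spelled out in the patent; a common choice (the C4.5 gain ratio) divides the gain by the split information of the partition. A sketch under that assumption:

```python
import math

def split_info(subset_sizes):
    """SplitInfo(Q): entropy of the partition itself; it penalises attributes
    that split the data into many small subsets."""
    total = sum(subset_sizes)
    return -sum(s / total * math.log2(s / total) for s in subset_sizes if s)

def corrected_gain(gain, subset_sizes):
    """Gain'(Q) = Gain(Q) / SplitInfo(Q) (gain-ratio style correction)."""
    si = split_info(subset_sizes)
    return gain / si if si else 0.0

# An attribute splitting 14 samples into subsets of sizes 5, 4 and 5:
g_prime = corrected_gain(0.2467, [5, 4, 5])
```

The attribute with the maximum corrected gain then becomes the root node A of the current subtree.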
In order to analyze the clustering results using the decision tree, analytical criteria for customer value must first be defined; these criteria are stored as data tables in a database or in an XML file. After the customer-value criteria have been defined, the algorithm generates a decision tree to explain the structure of the cluster analysis;
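The patent only states that the customer-value criteria are stored as data tables or in an XML file; the element and attribute names below are hypothetical, chosen to illustrate loading such criteria:

```python
import xml.etree.ElementTree as ET

# Hypothetical XML layout for the customer-value analytical criteria.
CRITERIA_XML = """
<criteria>
  <criterion name="monthly_spend" operator="ge" threshold="500" label="high_value"/>
  <criterion name="complaints" operator="le" threshold="1" label="satisfied"/>
</criteria>
"""

def load_criteria(xml_text):
    """Parse the criteria file into a list of dictionaries usable by the
    decision-tree explanation step."""
    root = ET.fromstring(xml_text)
    return [
        {
            "name": c.get("name"),
            "operator": c.get("operator"),
            "threshold": float(c.get("threshold")),
            "label": c.get("label"),
        }
        for c in root.findall("criterion")
    ]

criteria = load_criteria(CRITERIA_XML)
```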
(11) Using the database data of the customer relationship management (CRM) system as input, clustering is performed with the model built in steps (1) to (10), and the corresponding clustering results are obtained.
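Step (11) feeds CRM database records into the clustering model. The table layout, centroids, and the nearest-centroid assignment below are illustrative stand-ins (the patent's actual model is the one built in steps (1)-(10)); an in-memory SQLite database substitutes for the CRM database:

```python
import sqlite3

# Stand-in for the CRM system's database; the schema is an assumption.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE customers (id INTEGER, monthly_spend REAL, visits REAL)")
conn.executemany(
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, 50.0, 2.0), (2, 60.0, 3.0), (3, 800.0, 20.0), (4, 900.0, 25.0)],
)

rows = conn.execute("SELECT id, monthly_spend, visits FROM customers").fetchall()

# One nearest-centroid assignment pass, standing in for the full model.
centroids = {"low_value": (55.0, 2.5), "high_value": (850.0, 22.5)}

def assign(row):
    """Assign a customer row to the nearest centroid (squared distance)."""
    _, spend, visits = row
    return min(
        centroids,
        key=lambda c: (spend - centroids[c][0]) ** 2 + (visits - centroids[c][1]) ** 2,
    )

clusters = {row[0]: assign(row) for row in rows}
```

The resulting cluster labels are what the decision tree of step (10) then explains in terms of the customer-value criteria.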
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201611212713.XA CN106682915A (en) | 2016-12-25 | 2016-12-25 | User cluster analysis method in customer care system |
Publications (1)
Publication Number | Publication Date |
---|---|
CN106682915A true CN106682915A (en) | 2017-05-17 |
Family
ID=58870536
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201611212713.XA Pending CN106682915A (en) | 2016-12-25 | 2016-12-25 | User cluster analysis method in customer care system |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106682915A (en) |
Cited By (11)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107909178A (en) * | 2017-08-31 | 2018-04-13 | 上海壹账通金融科技有限公司 | Electronic device, lost contact repair rate Forecasting Methodology and computer-readable recording medium |
CN107909178B (en) * | 2017-08-31 | 2021-06-08 | 深圳壹账通智能科技有限公司 | Electronic device, loss of association repair rate prediction method, and computer-readable storage medium |
CN108966448A (en) * | 2018-05-31 | 2018-12-07 | 淮阴工学院 | A kind of light dynamic regulation method based on adaptive fuzzy decision tree |
CN109885597A (en) * | 2019-01-07 | 2019-06-14 | 平安科技(深圳)有限公司 | Tenant group processing method, device and electric terminal based on machine learning |
CN109885597B (en) * | 2019-01-07 | 2023-05-30 | 平安科技(深圳)有限公司 | User grouping processing method and device based on machine learning and electronic terminal |
CN112348583A (en) * | 2020-11-04 | 2021-02-09 | 贝壳技术有限公司 | User preference generation method and generation system |
CN112348583B (en) * | 2020-11-04 | 2022-12-06 | 贝壳技术有限公司 | User preference generation method and generation system |
CN112508074A (en) * | 2020-11-30 | 2021-03-16 | 深圳市飞泉云数据服务有限公司 | Visualization display method and system and readable storage medium |
CN112508074B (en) * | 2020-11-30 | 2024-05-14 | 深圳市飞泉云数据服务有限公司 | Visual display method, system and readable storage medium |
CN115019078A (en) * | 2022-08-09 | 2022-09-06 | 阿里巴巴(中国)有限公司 | Data clustering method and device |
CN115019078B (en) * | 2022-08-09 | 2023-01-24 | 阿里巴巴(中国)有限公司 | Vehicle image processing method, computing device and storage medium |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106682915A (en) | User cluster analysis method in customer care system | |
Liu et al. | A multimodal multiobjective evolutionary algorithm using two-archive and recombination strategies | |
Pandey et al. | A decision tree algorithm pertaining to the student performance analysis and prediction | |
Charoen-Ung et al. | Sugarcane yield grade prediction using random forest with forward feature selection and hyper-parameter tuning | |
Alvarez et al. | An evolutionary algorithm to discover quantitative association rules from huge databases without the need for an a priori discretization | |
CN109461025A (en) | A kind of electric energy substitution potential customers' prediction technique based on machine learning | |
Limsathitwong et al. | Dropout prediction system to reduce discontinue study rate of information technology students | |
Hu et al. | A niching backtracking search algorithm with adaptive local search for multimodal multiobjective optimization | |
Gajowniczek et al. | Comparison of decision trees with Rényi and Tsallis entropy applied for imbalanced churn dataset | |
Wang et al. | Design of the Sports Training Decision Support System Based on the Improved Association Rule, the Apriori Algorithm. | |
Kurniawan et al. | C5. 0 algorithm and synthetic minority oversampling technique (SMOTE) for rainfall forecasting in Bandung regency | |
CN115018357A (en) | Farmer portrait construction method and system for production performance improvement | |
Fayaz et al. | An adaptive gradient boosting model for the prediction of rainfall using ID3 as a base estimator | |
CN105930531A (en) | Method for optimizing cloud dimensions of agricultural domain ontological knowledge on basis of hybrid models | |
Wang et al. | Research on the factors affecting the innovation performance of China’s new energy type enterprises from the perspective of industrial policy | |
CN103020864A (en) | Corn fine breed breeding method | |
Lai | Segmentation study on enterprise customers based on data mining technology | |
Hassani et al. | On the application of data mining to official data | |
Rattan et al. | Applying SMOTE with decision tree classifier for campus placement prediction | |
Hou et al. | Prediction of learners' academic performance using factorization machine and decision tree | |
Sreerama et al. | A machine learning approach to crop yield prediction | |
Sang | English teaching comprehensive ability evaluation system based on K-means clustering algorithm | |
Li | Application of Fuzzy K‐Means Clustering Algorithm in the Innovation of English Teaching Evaluation Method | |
He et al. | A study on evaluation of farmland fertility levels based on optimization of the decision tree algorithm | |
Tamrakar | Student Performance Prediction by means of Multiple Regression |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
WD01 | Invention patent application deemed withdrawn after publication | | Application publication date: 20170517