CN108734217A - A customer segmentation method and device based on cluster analysis
- Authority
- CN
- China
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Pending
Classifications
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06F—ELECTRIC DIGITAL DATA PROCESSING
- G06F18/00—Pattern recognition
- G06F18/20—Analysing
- G06F18/23—Clustering techniques
- G06F18/232—Non-hierarchical techniques
- G06F18/2321—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
- G06F18/23213—Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
-
- G—PHYSICS
- G06—COMPUTING; CALCULATING OR COUNTING
- G06Q—INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
- G06Q30/00—Commerce
- G06Q30/02—Marketing; Price estimation or determination; Fundraising
- G06Q30/0201—Market modelling; Market analysis; Collecting market data
Abstract
The invention discloses a customer segmentation method and device based on cluster analysis. The method includes: obtaining a raw customer information data set and numerically preprocessing it to obtain data samples; performing dimensionality reduction and feature extraction on the data samples with an autoencoder; computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient of variation method, and computing the distance between sample points with a weighted Euclidean distance formula; computing the average distance between all data samples, traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order to determine k initial cluster center points; and clustering the remaining data according to the weighted Euclidean distance formula, which completes the customer segmentation.
Description
Technical field
The invention belongs to the technical field of market statistics and marketing, and relates to a customer segmentation method and device based on cluster analysis.
Background art
With the rapid development of science and technology and the widespread use of computers, networks have quietly permeated every aspect of daily life. Using data mining to find valuable information in every field has become increasingly important: it can both summarize past developments and predict future trends in the data. Customer segmentation is an important research area within this field. Cluster analysis divides customers into different classes according to their similarities and differences, which helps an enterprise identify different types of customers, formulate differentiated sales plans, and increase its profit. How to segment customers has therefore become key to an enterprise obtaining greater profit. At present, customer segmentation systems mainly have the following problems:
First, when processing customer information, a customer segmentation system faces a large volume of data with many attributes and high dimensionality. Clustering the raw data directly makes the segmentation inefficient and the computation cumbersome, and it also makes the segmentation take too long.
Second, the traditional k-means clustering algorithm is one of the most commonly used algorithms in customer segmentation research, but it treats all attributes of the data samples equally during clustering and does not consider the differences between attributes. In practice, different attributes differ in importance and affect the clustering result to different degrees; in general, important attributes have a larger effect on the clustering result.
Third, the traditional k-means algorithm is sensitive to the choice of initial cluster center points, and choosing them at random is a blind process. The quality of the initial cluster center points generally has a large effect on the clustering result: a poor choice may drive the clustering to a local optimum rather than the global optimum, and it also increases the number of iterations and slows the convergence of the algorithm.
In conclusion for the problem of how preferably carrying out customer segmentation in the prior art, still lack effective solution
Scheme.
Summary of the invention
To address the deficiencies of the prior art and the problem of how to perform customer segmentation better, the present invention provides a customer segmentation method and device based on cluster analysis. In customer segmentation, customers are grouped into different classes according to their consumption habits, which provides a basis for proposing different marketing strategies for different classes of customers.
The first object of the present invention is to provide a customer segmentation method based on cluster analysis.
To achieve this object, the present invention adopts the following technical solution:
A customer segmentation method based on cluster analysis, the method including:
obtaining a customer information data set and numerically preprocessing it to obtain data samples, and performing dimensionality reduction and feature extraction on the data samples with an autoencoder;
computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient of variation method, and computing the distance between sample points with the weighted Euclidean distance formula;
computing the average distance between all data samples, traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order, determining the initial cluster center points, and clustering the remaining data points to obtain the different classes of customers, which completes the customer segmentation.
As a further preferred solution, the numerical preprocessing in the method specifically includes:
converting non-numeric data to numeric form;
processing the numeric data with the standardization formula;
processing the standardized data with the normalization formula to obtain the data samples.
As a further preferred solution, performing dimensionality reduction and feature extraction on the data samples with the autoencoder specifically includes:
feeding the unlabeled original data samples into the encoder of the autoencoder for compression encoding to obtain the code;
decoding the code with the decoder of the autoencoder to obtain new data samples;
computing the error between the new data samples and the original data samples, adjusting the weight parameters of the encoder and the decoder of the autoencoder according to the error, and performing dimensionality reduction and feature extraction on the data samples with the autoencoder after its parameters have been adjusted.
As a further preferred solution, the improved k-means algorithm in the method includes:
computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient of variation method, computing the distance between samples with the weighted Euclidean distance formula, and computing the average distance between all data samples;
traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order, and determining the initial cluster center points.
As a further preferred solution, in the improved k-means algorithm, computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient of variation method specifically includes:
obtaining the attribute value matrix of the data samples processed by the autoencoder;
computing the coefficient of variation of each attribute dimension in the attribute value matrix;
computing the weight of each attribute feature from the coefficients of variation obtained for the attribute dimensions.
As a further preferred solution, in the improved k-means algorithm, the coefficient of variation of each attribute dimension is computed from the standard deviation and the mean of that dimension's values in the attribute value matrix.
As a further preferred solution, in the improved k-means algorithm, computing the distance between sample points with the weighted Euclidean distance formula specifically includes:
weighting the Euclidean distance with the computed weight of each attribute dimension;
computing the distance between data sample points with the weighted Euclidean distance formula.
As a further preferred solution, in the improved k-means algorithm, determining the initial cluster center points specifically includes:
choosing any data sample point, finding all sample points whose distance to it is less than the average distance and taking them as the neighbor points of that data sample point, and counting the neighbor points;
traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order;
selecting the sample point with the most neighbor points as the first initial cluster center point, ignoring any sample point that is a neighbor point of an already selected initial cluster center point, and traversing all sample points in this way until k initial cluster center points have been determined.
The second object of the present invention is to provide a computer-readable storage medium.
To achieve this object, the present invention adopts the following technical solution:
A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device and to execute the customer segmentation method based on cluster analysis.
The third object of the present invention is to provide a terminal device.
To achieve this object, the present invention adopts the following technical solution:
A terminal device including a processor and a computer-readable storage medium, the processor being configured to execute instructions, and the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor and to execute the customer segmentation method based on cluster analysis.
Beneficial effects of the present invention:
1. The customer segmentation method and device based on cluster analysis of the present invention introduce an autoencoder to perform dimensionality reduction and feature extraction on the data samples, so that the features obtained after dimensionality reduction represent the characteristics of the original data samples to the greatest extent. This makes feature extraction from the raw data effective, handles high-dimensional data more efficiently given the characteristics of streaming data, and improves customer information processing.
2. The customer segmentation method and device based on cluster analysis of the present invention introduce the coefficient of variation method to reflect the importance of different attributes to the clustering result. An attribute with a larger degree of dispersion plays a larger role in clustering. Starting from the data of the sample set, the invention obtains the weight of each attribute feature from its coefficient of variation, applies these weights to the Euclidean distance formula as the weighting coefficients of the attributes, and computes the distance between samples with the weighted Euclidean distance. Clustering with this formula can improve the clustering result.
3. The customer segmentation method and device based on cluster analysis of the present invention use a new method of selecting cluster center points. It avoids the randomness of choosing the initial cluster center points and reflects the distribution of the data samples, so it is less likely to produce a locally optimal clustering and compensates for the deficiency of the traditional k-means algorithm.
Brief description of the drawings
The accompanying drawings, which form a part of this application, are provided for further understanding of the application. The illustrative embodiments of the application and their descriptions are used to explain the application and do not unduly limit it.
Fig. 1 is a flow chart of the customer segmentation method based on cluster analysis of the present invention;
Fig. 2 is a flow chart of the autoencoder of the present invention.
Detailed description:
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are evidently only some of the embodiments of the present invention, not all of them. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that the following detailed description is illustrative and is intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used in this embodiment have the same meaning as commonly understood by a person of ordinary skill in the technical field of the application.
It should be noted that the terminology used herein is only for describing specific embodiments and is not intended to limit the exemplary embodiments of the application. As used herein, unless the context clearly indicates otherwise, the singular forms are intended to include the plural forms as well. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the flowcharts and block diagrams in the drawings show the possible architecture, functions, and operations of methods and systems according to various embodiments of the present disclosure. Each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which may include one or more executable instructions for implementing the logical functions specified in the embodiments. It should also be noted that in some alternative implementations the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. It should also be noted that each box in the flowcharts and/or block diagrams, and combinations of boxes in the flowcharts and/or block diagrams, may be implemented by a dedicated hardware-based system that performs the specified functions or operations, or by a combination of dedicated hardware and computer instructions.
In the absence of conflict, the embodiments in the present application and the features in the embodiments may be combined with each other. The invention is further described below with reference to the drawings and embodiments.
Embodiment 1:
The purpose of Embodiment 1 is to provide a customer segmentation method based on cluster analysis.
To achieve this purpose, the present invention adopts the following technical solution:
As shown in Fig. 1,
A customer segmentation method based on cluster analysis, the method including:
Step (1): obtain the raw customer information data set, numerically preprocess it to obtain data samples, and perform dimensionality reduction and feature extraction on the data samples with the autoencoder;
Step (2): process the autoencoder-processed data with the improved k-means algorithm to obtain the clustering result, which completes the customer segmentation;
Step (2-1): compute the weight of each attribute feature of the autoencoder-processed data samples with the coefficient of variation method, and compute the distance between samples with the weighted Euclidean distance formula;
Step (2-2): compute the average distance between all data samples, traverse the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, count the neighbor points of every sample and sort the counts in descending order, and determine the initial cluster center points; then cluster the remaining data points according to the weighted Euclidean distance, which completes the customer segmentation.
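The clustering stage at the end of step (2-2) can be sketched as follows. This is a minimal illustration, not the patent's implementation: it assumes a standard k-means loop (assign each sample to the nearest center, then move each center to the mean of its members), takes the initial centers and attribute weights as given inputs, and uses the hypothetical names `weighted_distance` and `weighted_kmeans`.

```python
import math

def weighted_distance(a, b, weights):
    """Weighted Euclidean distance between two samples (assumed form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def weighted_kmeans(samples, initial_centers, weights, max_iter=100):
    """Assign samples to the nearest center, then recompute centers as means."""
    centers = [list(c) for c in initial_centers]
    labels = [-1] * len(samples)
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda c: weighted_distance(s, centers[c], weights))
            for s in samples
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [s for s, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers

# Illustrative data: two attributes, equal weights, two given initial centers.
samples = [[0.0, 0.0], [0.1, 0.2], [0.9, 1.0], [1.0, 0.8]]
labels, centers = weighted_kmeans(samples, [samples[0], samples[3]], [0.5, 0.5])
```

In the full method the weights would come from the coefficient of variation step and the initial centers from the neighbor-count selection, rather than being supplied by hand as here.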
In step (1) of this embodiment, the numerical preprocessing specifically includes:
First step: most data sets contain not only numeric data but also other types of data such as character data, so the non-numeric data are first converted to numeric form. For example, the gender attribute has two values, male and female. After numeric conversion, male is represented by 0 and female by 1, so that gender = 0 denotes male and gender = 1 denotes female.
Second step: process the numeric data with the standardization formula. The standardization formula is:
$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{\theta_j}$$
where $x'_{ij}$ is the standardized result, $x_{ij}$ is the numeric data to be processed, $\bar{x}_j$ is the mean of the numeric data to be processed, and $\theta_j$ is the standard deviation of the numeric data to be processed.
Third step: process the standardized data with the normalization formula. The normalization formula is:
$$x''_{ij} = \frac{x'_{ij} - \min\{x'_{ij}\}}{\max\{x'_{ij}\} - \min\{x'_{ij}\}}$$
where $x''_{ij}$ is the normalized result, $\min\{x'_{ij}\}$ is the minimum of the standardized results, and $\max\{x'_{ij}\}$ is the maximum of the standardized results.
The resulting data set is more convenient for the feature extraction in the subsequent step.
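The three preprocessing steps can be sketched in a few lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, the standard deviation divides by n (the source does not say whether n or n - 1 is used), and constant columns are mapped to 0 to avoid division by zero.

```python
import math

def encode_categories(values, mapping):
    """Step 1: map non-numeric attribute values to numbers."""
    return [float(mapping[v]) for v in values]

def standardize(column):
    """Step 2: subtract the column mean and divide by its standard deviation."""
    n = len(column)
    mean = sum(column) / n
    # Population standard deviation (divide by n); this choice is an assumption.
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    if std == 0:
        return [0.0 for _ in column]  # constant column: nothing to scale
    return [(x - mean) / std for x in column]

def min_max_normalize(column):
    """Step 3: rescale the standardized values to the range [0, 1]."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]

def preprocess(columns):
    """Apply steps 2 and 3 to every attribute column."""
    return [min_max_normalize(standardize(col)) for col in columns]

# Illustrative data: gender encoded as in the text (male = 0, female = 1).
gender = encode_categories(["male", "female", "female", "male"],
                           {"male": 0, "female": 1})
age = [23.0, 35.0, 47.0, 59.0]
processed = preprocess([gender, age])
```

After this step every attribute lies in [0, 1], so no single attribute dominates later distance computations purely because of its units.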
In step (1) of this embodiment, performing dimensionality reduction and feature extraction on the data samples with the autoencoder specifically includes the following.
An autoencoder consists mainly of two parts, an encoder network and a decoder network. The encoder compresses and encodes the input samples, with the aim of representing the high-dimensional raw data as fully as possible with a lower-dimensional vector. The decoder decodes the resulting new samples, restoring them to the raw data as closely as possible.
Their working process can be described as:
First step: feed the unlabeled original data samples into the encoder for encoding to obtain the code.
Second step: decode the code with the decoder.
Third step: compute the error between the new sample information and the original sample information, and adjust the weight parameters of the encoder and the decoder according to the error so that the reconstruction error is minimized. The code obtained at that point is the feature representation of the original samples.
The implementation process of the autoencoder is shown in Fig. 2.
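The encode, decode, and adjust cycle can be illustrated with a deliberately small sketch. The assumptions are significant and not taken from the source: the encoder and decoder are single linear layers without biases or activation functions, training is plain stochastic gradient descent on the squared reconstruction error, and `train_autoencoder` and all hyperparameters are hypothetical. A practical autoencoder would normally use a deep learning library and nonlinear layers.

```python
import random

def train_autoencoder(samples, hidden_dim, epochs=300, lr=0.05, seed=0):
    """Train a linear autoencoder; return encode, decode, and both errors."""
    rng = random.Random(seed)
    m = len(samples[0])
    enc = [[rng.uniform(-0.1, 0.1) for _ in range(m)] for _ in range(hidden_dim)]
    dec = [[rng.uniform(-0.1, 0.1) for _ in range(hidden_dim)] for _ in range(m)]

    def encode(x):  # compress an m-dimensional sample to hidden_dim values
        return [sum(w * xi for w, xi in zip(row, x)) for row in enc]

    def decode(code):  # reconstruct an m-dimensional sample from the code
        return [sum(w * c for w, c in zip(row, code)) for row in dec]

    def mean_error():  # mean squared reconstruction error over all samples
        total = 0.0
        for x in samples:
            x_hat = decode(encode(x))
            total += sum((a - b) ** 2 for a, b in zip(x_hat, x))
        return total / len(samples)

    initial_error = mean_error()
    for _ in range(epochs):
        for x in samples:
            code = encode(x)
            x_hat = decode(code)
            err = [a - b for a, b in zip(x_hat, x)]
            # Gradient with respect to the code, using the current decoder.
            grad_code = [sum(err[i] * dec[i][h] for i in range(m))
                         for h in range(hidden_dim)]
            for i in range(m):
                for h in range(hidden_dim):
                    dec[i][h] -= lr * err[i] * code[h]
            for h in range(hidden_dim):
                for j in range(m):
                    enc[h][j] -= lr * grad_code[h] * x[j]
    return encode, decode, initial_error, mean_error()

# Illustrative data: three attributes that all vary along one direction,
# so a one-dimensional code can represent them.
samples = [[0.1, 0.2, 0.3], [0.2, 0.4, 0.6], [0.3, 0.6, 0.9], [0.05, 0.1, 0.15]]
encode, decode, initial_error, final_error = train_autoencoder(samples, hidden_dim=1)
features = [encode(x) for x in samples]  # reduced representation for clustering
```

The `features` list plays the role of the code described above: it is what the later clustering steps would consume in place of the raw attributes.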
This embodiment introduces the autoencoder. In today's era of big data, we constantly face massive, real-time, high-dimensional streaming data. As one of the effective methods of deep learning, the autoencoder performs dimensionality reduction and feature extraction on the data samples, so that the features obtained after dimensionality reduction represent the characteristics of the original data samples to the greatest extent. Introducing the autoencoder makes feature extraction from the raw data effective and, given the characteristics of streaming data, improves results in the field of market user segmentation.
In step (2-1) of this embodiment, the coefficient of variation method is introduced to reflect the importance of different attributes to the clustering result. In general, an attribute with a larger degree of dispersion plays a larger role in clustering. Starting from the data of the sample set, the invention therefore obtains the weight of each attribute feature from its coefficient of variation, applies these weights to the Euclidean distance formula as the weighting coefficients of the attributes, and computes the distance between samples with the weighted Euclidean distance. The details are as follows:
Let the data set X be the set of data samples to be clustered, consisting of n data objects of dimension m. Its attribute value matrix is expressed as:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
The i-th data sample in X can be written as $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{ij}, \ldots, x_{im})$, where $x_{ij}$ is the value of the j-th attribute of the i-th data object, i = 1, 2, ..., n; j = 1, 2, ..., m.
1. First compute the coefficient of variation of each attribute. The coefficient of variation is the ratio of the standard deviation to the mean. Let $v_j$ denote the coefficient of variation of the j-th attribute; its formula is:
$$v_j = \frac{\sigma_j}{\bar{x}_j}$$
where $\bar{x}_j$ and $\sigma_j$ are the mean and the standard deviation of the j-th attribute over the n data samples.
2. Then compute the weight $w_j$ of each attribute from the coefficients of variation obtained for the attribute dimensions. The formula is:
$$w_j = \frac{v_j}{\sum_{j=1}^{m} v_j}$$
where 1 ≤ j ≤ m.
3. Finally, weight the Euclidean distance with the computed weight of each attribute dimension, and compute the distance between data sample points with the weighted Euclidean distance. The weighted Euclidean distance can be expressed as:
$$d_w(x_a, x_b) = \sqrt{\sum_{j=1}^{m} w_j \left(x_{aj} - x_{bj}\right)^2}$$
where $x_a$ and $x_b$ are two data sample points.
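The weighting scheme above can be tried on a small example. This is a sketch under assumptions not fixed by the source: the standard deviation divides by n, every attribute has a nonzero mean, the weight multiplies the squared difference inside the square root, and the function names are hypothetical.

```python
import math

def coefficient_of_variation(column):
    """Ratio of standard deviation to mean for one attribute column."""
    n = len(column)
    mean = sum(column) / n
    # Population standard deviation (divide by n); this choice is an assumption.
    std = math.sqrt(sum((x - mean) ** 2 for x in column) / n)
    return std / mean  # assumes the mean is nonzero

def attribute_weights(samples):
    """Normalize the coefficients of variation so the weights sum to 1."""
    cvs = [coefficient_of_variation(col) for col in zip(*samples)]
    total = sum(cvs)
    return [v / total for v in cvs]

def weighted_euclidean(a, b, weights):
    """Euclidean distance with each squared difference scaled by its weight."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

# Illustrative data: the first attribute is relatively more dispersed
# than the second, so it receives the larger weight.
samples = [[1.0, 10.0], [2.0, 12.0], [3.0, 14.0]]
weights = attribute_weights(samples)
distance = weighted_euclidean(samples[0], samples[2], weights)
```

Here the first attribute has three times the coefficient of variation of the second, so the weights come out as 0.75 and 0.25 and the distance between the first and last samples is the square root of 7.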
In step (2-2) of this embodiment, because the traditional k-means algorithm is sensitive to the choice of initial cluster center points and blind random selection can produce poor clustering results, a new method of selecting the initial cluster center points is proposed. The detailed process is as follows:
1. Compute the distance between every two data samples in the data set, using the weighted Euclidean distance presented above.
2. Compute the average distance between all data samples according to the following formula:
$$Dis_{(Average)} = \frac{1}{A_n^2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} d_w(x_i, x_j)$$
where n is the number of data sample points and $A_n^2 = n(n-1)$ is the number of arrangements (permutations) of 2 samples taken from the data set.
3. Choose any data sample point $x_i$ (1 ≤ i ≤ n) and find all sample points whose distance to $x_i$ is less than the average distance $Dis_{(Average)}$. These sample points are called the neighbor points of $x_i$; count them. Repeat this to count the neighbor points of every sample in the data set, and sort the samples by their neighbor counts in descending order.
4. Select the sample point with the most neighbor points as the first initial cluster center point and the sample point ranked second as the second initial cluster center point, continuing down the ranking in order. If the sample $x_j$ (1 ≤ j ≤ n) ranked p-th by neighbor count is a neighbor point of an already selected cluster center point, ignore it and check the sample point $x_z$ (1 ≤ z ≤ n) ranked (p+1)-th; if $x_z$ is not a neighbor point of any existing cluster center point, take $x_z$ as an initial cluster center point. Continue in this way until all k initial cluster center points have been found.
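The four-step selection procedure can be sketched as below. Several details are assumptions rather than statements from the source: the average is taken over all n(n-1) ordered pairs of distinct samples, ties in the neighbor count are broken by original sample order, the weights are supplied as an input, and `select_initial_centers` is a hypothetical name.

```python
import math

def weighted_euclidean(a, b, weights):
    """Weighted Euclidean distance between two samples (assumed form)."""
    return math.sqrt(sum(w * (x - y) ** 2 for w, x, y in zip(weights, a, b)))

def select_initial_centers(samples, weights, k):
    """Return indices of up to k initial centers and the average distance."""
    n = len(samples)
    # Step 1: pairwise weighted distances.
    dist = [[weighted_euclidean(samples[i], samples[j], weights)
             for j in range(n)] for i in range(n)]
    # Step 2: average over the n * (n - 1) ordered pairs of distinct samples.
    avg = sum(dist[i][j] for i in range(n) for j in range(n) if i != j) / (n * (n - 1))
    # Step 3: neighbors are samples closer than the average distance.
    neighbors = [{j for j in range(n) if j != i and dist[i][j] < avg}
                 for i in range(n)]
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    # Step 4: walk down the ranking, skipping neighbors of chosen centers.
    centers = []
    for i in order:
        if any(i in neighbors[c] for c in centers):
            continue
        centers.append(i)
        if len(centers) == k:
            break
    return centers, avg

# Illustrative data: one group of three nearby points and one group of two.
samples = [[0.0, 0.0], [0.1, 0.0], [0.0, 0.1], [1.0, 1.0], [1.1, 1.0]]
centers, avg = select_initial_centers(samples, [0.5, 0.5], k=2)
```

With this data the first center falls in the denser group and the second in the other group, because every point of the first group is skipped as a neighbor of the first center. If fewer than k non-neighboring samples exist, the sketch returns fewer than k centers; the source does not describe that case.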
Embodiment 2:
The purpose of Embodiment 2 is to provide a computer-readable storage medium.
To achieve this purpose, the present invention adopts the following technical solution:
A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device and to execute the following processing:
Step (1): obtain the raw customer information data set, numerically preprocess it to obtain data samples, and perform dimensionality reduction and feature extraction on the data samples with the autoencoder;
Step (2): process the autoencoder-processed data with the improved k-means algorithm to obtain the clustering result, which completes the customer segmentation;
Step (2-1): compute the weight of each attribute feature of the autoencoder-processed data samples with the coefficient of variation method, and compute the distance between samples with the weighted Euclidean distance formula;
Step (2-2): compute the average distance between all data samples, traverse the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, count the neighbor points of every sample and sort the counts in descending order, and determine the initial cluster center points; then cluster the remaining data points according to the weighted Euclidean distance, which completes the customer segmentation.
Embodiment 3:
The purpose of Embodiment 3 is to provide a customer segmentation device based on cluster analysis.
To achieve this purpose, the present invention adopts the following technical solution:
A customer segmentation device based on cluster analysis, including a processor and a computer-readable storage medium, the processor being configured to execute instructions, and the computer-readable storage medium being configured to store a plurality of instructions adapted to be loaded by the processor and to execute the following processing:
Step (1): obtain the raw customer information data set, numerically preprocess it to obtain data samples, and perform dimensionality reduction and feature extraction on the data samples with the autoencoder;
Step (2): process the autoencoder-processed data with the improved k-means algorithm to obtain the clustering result, which completes the customer segmentation;
Step (2-1): compute the weight of each attribute feature of the autoencoder-processed data samples with the coefficient of variation method, and compute the distance between samples with the weighted Euclidean distance formula;
Step (2-2): compute the average distance between all data samples, traverse the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, count the neighbor points of every sample and sort the counts in descending order, and determine the initial cluster center points; then cluster the remaining data points according to the weighted Euclidean distance, which completes the customer segmentation.
When run in a device, these computer-executable instructions cause the device to execute the methods or processes described in the embodiments of the present disclosure.
In this embodiment, the computer program product may include a computer-readable storage medium carrying computer-readable program instructions for executing various aspects of the present disclosure. The computer-readable storage medium may be a tangible device that can hold and store instructions used by an instruction execution device. The computer-readable storage medium may be, for example, but is not limited to, an electrical storage device, a magnetic storage device, an optical storage device, an electromagnetic storage device, a semiconductor storage device, or any suitable combination of the above. More specific examples (a non-exhaustive list) of computer-readable storage media include: a portable computer disk, a hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read-only memory (EPROM or flash memory), static random access memory (SRAM), portable compact disc read-only memory (CD-ROM), a digital versatile disc (DVD), a memory stick, a floppy disk, a mechanically encoded device such as a punch card or a raised structure in a groove with instructions stored on it, and any suitable combination of the above. The computer-readable storage medium as used here is not to be interpreted as a transient signal itself, such as a radio wave or other freely propagating electromagnetic wave, an electromagnetic wave propagating through a waveguide or other transmission medium (for example, a light pulse through a fiber-optic cable), or an electrical signal transmitted through a wire.
The computer-readable program instructions described herein may be downloaded from a computer-readable storage medium to each computing/processing device, or downloaded to an external computer or external storage device over a network such as the Internet, a local area network, a wide area network, and/or a wireless network. The network may include copper transmission cables, optical fiber transmission, wireless transmission, routers, firewalls, switches, gateway computers, and/or edge servers. A network adapter or network interface in each computing/processing device receives the computer-readable program instructions from the network and forwards them for storage in a computer-readable storage medium in that computing/processing device.
The computer program instructions for executing the operations of the present disclosure may be assembly instructions, instruction set architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including object-oriented programming languages such as C++ and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. Where a remote computer is involved, the remote computer may be connected to the user's computer through any kind of network, including a local area network (LAN) or a wide area network (WAN), or it may be connected to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, an electronic circuit, such as a programmable logic circuit, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is customized using the state information of the computer-readable program instructions, and the electronic circuit can execute the computer-readable program instructions to implement various aspects of the present disclosure.
It should be noted that although several modules or submodules of the device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules described above may be embodied in a single module. Conversely, the features and functions of a single module described above may be further divided among multiple modules.
The foregoing describes only preferred embodiments of the present application and is not intended to limit it; those skilled in the art may make various modifications and variations to the application. Any modification, equivalent replacement, or improvement made within the spirit and principles of the present application shall fall within its scope of protection. Accordingly, the present invention is not limited to the embodiments shown herein but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.
Claims (10)
1. A customer segmentation method based on cluster analysis, characterized in that the method comprises:
obtaining a raw customer information data set, performing numerical preprocessing to obtain data samples, and performing dimensionality reduction and feature extraction on the data samples with an autoencoder;
computing the weights of the attribute features of the autoencoder-processed data samples using the coefficient of variation method, and computing the distances between sample points using the weighted Euclidean distance formula;
computing the average distance between all data samples, traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order, determining the initial cluster center points, and clustering the remaining data by weighted Euclidean distance to complete the customer segmentation.
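The final step of claim 1, clustering the remaining data by weighted Euclidean distance from given initial centers, can be sketched as follows. The claim does not spell out the iteration, so this sketch assumes the standard k-means loop (assign each sample to the nearest center, then move each center to the mean of its members); the names `weighted_distance` and `weighted_kmeans` are hypothetical.

```python
from math import sqrt


def weighted_distance(x, y, weights):
    """Weighted Euclidean distance (assumed form: weights multiply squared differences)."""
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def weighted_kmeans(samples, centers, weights, max_iter=100):
    """Assign samples to the nearest center, then recompute centers as member means.

    The mean update is an assumption borrowed from standard k-means; the
    patent text only states that clustering uses the weighted distance.
    """
    centers = [list(c) for c in centers]
    labels = None
    for _ in range(max_iter):
        new_labels = [
            min(range(len(centers)),
                key=lambda c: weighted_distance(x, centers[c], weights))
            for x in samples
        ]
        if new_labels == labels:  # assignments stopped changing
            break
        labels = new_labels
        for c in range(len(centers)):
            members = [x for x, lab in zip(samples, labels) if lab == c]
            if members:  # keep the old center if a cluster is empty
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels, centers
```

Calling `weighted_kmeans(samples, initial_centers, weights)` returns one cluster label per sample together with the final centers.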
2. The method of claim 1, characterized in that the numerical preprocessing comprises:
converting non-numeric data into numeric form;
processing the numeric data with a standardization formula;
processing the standardized data with a normalization formula to obtain the data samples.
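The three preprocessing steps of claim 2 can be sketched as below. The claim does not name the formulas, so this sketch assumes integer label encoding for non-numeric values, z-score standardization, and min-max normalization; all function names are hypothetical.

```python
from statistics import mean, pstdev


def encode_categories(values):
    """Map each distinct non-numeric value to an integer code (order of first appearance)."""
    codes = {}
    return [codes.setdefault(v, len(codes)) for v in values]


def standardize(column):
    """Z-score standardization (an assumed choice of 'standardization formula')."""
    mu, sigma = mean(column), pstdev(column)
    if sigma == 0:
        return [0.0 for _ in column]
    return [(x - mu) / sigma for x in column]


def min_max(column):
    """Min-max normalization to [0, 1] (an assumed choice of 'normalization formula')."""
    lo, hi = min(column), max(column)
    if hi == lo:
        return [0.0 for _ in column]
    return [(x - lo) / (hi - lo) for x in column]


def preprocess(rows, categorical):
    """rows: equal-length records; categorical: indices of non-numeric columns."""
    processed = []
    for j, col in enumerate(zip(*rows)):
        col = encode_categories(col) if j in categorical else list(col)
        processed.append(min_max(standardize([float(x) for x in col])))
    return [list(record) for record in zip(*processed)]
```

For example, `preprocess([["gold", 10], ["silver", 20], ["gold", 30]], {0})` encodes the first column and rescales both columns to the unit interval.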
3. The method of claim 1, characterized in that performing dimensionality reduction and feature extraction on the data samples with the autoencoder comprises:
inputting the unlabeled original data samples into the encoder of the autoencoder for compression encoding to obtain a code;
decoding the code with the decoder of the autoencoder to obtain new data samples;
computing the error between the new data samples and the original data samples, adjusting the weight parameters of the encoder and the decoder of the autoencoder according to the error, and performing dimensionality reduction and feature extraction on the data samples with the autoencoder after parameter adjustment.
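The encode, decode, and adjust loop of claim 3 can be illustrated with a deliberately small sketch. It assumes a linear autoencoder without biases, a squared reconstruction error, and full-batch gradient descent; the claim fixes none of these, and the names `train_autoencoder`, `encode`, and `reconstruction_error` are hypothetical.

```python
import random


def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]


def encode(enc, x):
    """Compress a sample into its lower-dimensional code."""
    return matvec(enc, x)


def reconstruction_error(samples, enc, dec):
    """Mean squared error between decoded samples and the originals."""
    total = 0.0
    for x in samples:
        recon = matvec(dec, encode(enc, x))
        total += sum((r - xi) ** 2 for r, xi in zip(recon, x))
    return total / len(samples)


def train_autoencoder(samples, hidden, epochs=500, lr=0.1, seed=0):
    """Adjust encoder/decoder weights from the reconstruction error (assumed: gradient descent)."""
    rng = random.Random(seed)
    n, d = len(samples), len(samples[0])
    enc = [[rng.uniform(-0.5, 0.5) for _ in range(d)] for _ in range(hidden)]
    dec = [[rng.uniform(-0.5, 0.5) for _ in range(hidden)] for _ in range(d)]
    for _ in range(epochs):
        g_enc = [[0.0] * d for _ in range(hidden)]
        g_dec = [[0.0] * hidden for _ in range(d)]
        for x in samples:
            code = encode(enc, x)
            err = [r - xi for r, xi in zip(matvec(dec, code), x)]
            for i in range(d):
                for k in range(hidden):
                    g_dec[i][k] += 2 * err[i] * code[k] / n
            for k in range(hidden):
                back = sum(2 * err[i] * dec[i][k] for i in range(d))
                for j in range(d):
                    g_enc[k][j] += back * x[j] / n
        for i in range(d):
            for k in range(hidden):
                dec[i][k] -= lr * g_dec[i][k]
        for k in range(hidden):
            for j in range(d):
                enc[k][j] -= lr * g_enc[k][j]
    return enc, dec
```

After training, `encode(enc, x)` gives the reduced-dimension features that the later clustering steps would consume.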
4. The method of claim 1, characterized in that the method uses an improved k-means algorithm comprising:
computing the weights of the attribute features of the autoencoder-processed data samples using the coefficient of variation method, computing the distances between samples using the weighted Euclidean distance formula, and computing the average distance between all data samples;
traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order, and determining the initial cluster center points.
5. The method of claim 4, characterized in that, in the improved k-means algorithm, computing the weights of the attribute features of the autoencoder-processed data samples using the coefficient of variation method comprises:
obtaining the attribute value matrix of the data samples processed by the autoencoder;
computing the coefficient of variation of each attribute dimension in the attribute value matrix;
computing the weight of each attribute feature from the coefficients of variation so obtained.
6. The method of claim 4, characterized in that, in the improved k-means algorithm, the coefficient of variation of each attribute dimension is computed from the standard deviation and the mean of that dimension's values in the attribute value matrix.
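Claims 5 and 6 can be illustrated with a short sketch. The coefficient of variation of a dimension is its standard deviation divided by its mean, as claim 6 states; normalizing the coefficients so that the weights sum to 1 is an assumption, since the claims do not give the weight formula. The name `cv_weights` is hypothetical.

```python
from statistics import mean, pstdev


def cv_weights(samples):
    """Return one weight per attribute dimension from coefficients of variation.

    Assumes weight_j = cv_j / sum(cv), with cv_j = std_j / |mean_j|.
    """
    cvs = []
    for col in zip(*samples):  # iterate over attribute dimensions
        mu = mean(col)
        if mu == 0:
            raise ValueError("coefficient of variation is undefined for a zero mean")
        cvs.append(pstdev(col) / abs(mu))
    total = sum(cvs)
    if total == 0:
        raise ValueError("all attributes are constant; weights are undefined")
    return [cv / total for cv in cvs]
```

Dimensions that vary more relative to their mean receive larger weights, and a constant dimension receives weight 0.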
7. The method of claim 4, characterized in that, in the improved k-means algorithm, computing the distances between sample points using the weighted Euclidean distance formula comprises:
weighting the Euclidean distance according to the computed weight of each attribute dimension;
computing the distances between data sample points using the weighted Euclidean distance formula.
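A minimal sketch of the weighted distance in claim 7 follows. It assumes each weight multiplies the squared difference of its dimension inside the square root, which is one common form; the claim does not state the formula explicitly, and the name `weighted_euclidean` is hypothetical.

```python
from math import sqrt


def weighted_euclidean(x, y, weights):
    """sqrt(sum_j w_j * (x_j - y_j)^2), an assumed form of the weighted distance."""
    if not (len(x) == len(y) == len(weights)):
        raise ValueError("x, y, and weights must have the same length")
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))
```

With all weights equal to 1 this reduces to the ordinary Euclidean distance, and a weight of 0 removes a dimension from the comparison.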
8. The method of claim 4, characterized in that, in the improved k-means algorithm, determining the initial cluster center points comprises:
selecting any data sample point, finding all sample points whose distance to it is less than the average distance as the neighbor points of that data sample point, and counting the neighbor points;
traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order;
selecting the sample point with the highest neighbor count as the first initial cluster center point, ignoring any sample point that is a neighbor point of an initial cluster center point, and continuing in this way through all sample points until k initial cluster center points are determined.
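The center selection of claim 8 can be sketched as follows. The sketch assumes the weighted distance takes the form sqrt(sum of w_j times squared difference) and that ties in neighbor count are broken by original sample order; the names `dist` and `initial_centers` are hypothetical.

```python
from itertools import combinations
from math import sqrt


def dist(x, y, weights):
    """Weighted Euclidean distance (assumed form)."""
    return sqrt(sum(w * (a - b) ** 2 for w, a, b in zip(weights, x, y)))


def initial_centers(samples, weights, k):
    """Pick up to k initial centers by descending neighbor count.

    A neighbor is any other sample closer than the average pairwise distance.
    Samples that are neighbors of an already chosen center are skipped.
    Fewer than k centers are returned if the candidates run out.
    """
    n = len(samples)
    pair_dist = {
        (i, j): dist(samples[i], samples[j], weights)
        for i, j in combinations(range(n), 2)
    }
    average = sum(pair_dist.values()) / len(pair_dist)
    neighbors = {i: set() for i in range(n)}
    for (i, j), d in pair_dist.items():
        if d < average:
            neighbors[i].add(j)
            neighbors[j].add(i)
    # Stable sort: equal counts keep their original order.
    order = sorted(range(n), key=lambda i: len(neighbors[i]), reverse=True)
    chosen = []
    for i in order:
        if any(i in neighbors[c] for c in chosen):
            continue
        chosen.append(i)
        if len(chosen) == k:
            break
    return [samples[i] for i in chosen]
```

Because the procedure is deterministic for a given data set, it avoids the random initialization of plain k-means.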
9. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by a processor of a terminal device to execute the method of any one of claims 1-8.
10. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions and the computer-readable storage medium being configured to store a plurality of instructions, characterized in that the instructions are for executing the method of any one of claims 1-8.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201810496620.7A CN108734217A (en) | 2018-05-22 | 2018-05-22 | A kind of customer segmentation method and device based on clustering |
Publications (1)
Publication Number | Publication Date |
---|---|
CN108734217A (en) | 2018-11-02 |
Family
ID=63938814
Family Applications (1)
Application Number | Status | Publication | Priority Date | Filing Date | Title |
---|---|---|---|---|---|
CN201810496620.7A | Pending | CN108734217A (en) | 2018-05-22 | 2018-05-22 | A kind of customer segmentation method and device based on clustering |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN108734217A (en) |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN106127519A (en) * | 2016-06-24 | 2016-11-16 | 武汉斗鱼网络科技有限公司 | A kind of live platform user divided method based on K Means algorithm and system |
CN107784518A (en) * | 2017-09-20 | 2018-03-09 | 国网浙江省电力公司电力科学研究院 | A kind of power customer divided method based on multidimensional index |
US20180121942A1 (en) * | 2016-11-03 | 2018-05-03 | Adobe Systems Incorporated | Customer segmentation via consensus clustering |
Non-Patent Citations (1)
Title |
---|
XINGANG WANG ET AL.: "Research on Intrusion Detection Based on Feature Extraction of Autoencoder and the Improved K-means Algorithm", 《2017 10TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN》 * |
Cited By (22)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN109495476B (en) * | 2018-11-19 | 2020-11-20 | 中南大学 | Data stream differential privacy protection method and system based on edge calculation |
CN109495476A (en) * | 2018-11-19 | 2019-03-19 | 中南大学 | A kind of data flow difference method for secret protection and system based on edge calculations |
US11790013B2 (en) | 2018-12-21 | 2023-10-17 | Visa International Service Association | Systems and methods for generating transaction profile tags |
US11386165B2 (en) | 2018-12-21 | 2022-07-12 | Visa International Service Association | Systems and methods for generating transaction profile tags |
CN109903082A (en) * | 2019-01-24 | 2019-06-18 | 平安科技(深圳)有限公司 | Clustering method, electronic device and storage medium based on user's portrait |
CN109903082B (en) * | 2019-01-24 | 2022-10-28 | 平安科技(深圳)有限公司 | Clustering method based on user portrait, electronic device and storage medium |
CN109829018A (en) * | 2019-01-28 | 2019-05-31 | 华南理工大学 | A kind of super divided method of mobile client based on deep learning |
CN109948701A (en) * | 2019-03-19 | 2019-06-28 | 太原科技大学 | A kind of data clustering method based on space-time relationship between track |
CN109948701B (en) * | 2019-03-19 | 2022-08-16 | 太原科技大学 | Data clustering method based on space-time correlation among tracks |
CN110133488A (en) * | 2019-04-09 | 2019-08-16 | 上海电力学院 | Switchgear health status evaluation method and device based on optimal number of degrees |
CN110133488B (en) * | 2019-04-09 | 2021-10-08 | 上海电力学院 | Switch cabinet health state evaluation method and device based on optimal grade number |
CN110059118A (en) * | 2019-04-26 | 2019-07-26 | 迪爱斯信息技术股份有限公司 | Weighing computation method and device, the terminal device of characteristic attribute |
CN110414569A (en) * | 2019-07-03 | 2019-11-05 | 北京小米智能科技有限公司 | Cluster realizing method and device |
US11501099B2 (en) | 2019-07-03 | 2022-11-15 | Beijing Xiaomi Intelligent Technology Co., Ltd. | Clustering method and device |
CN110866782A (en) * | 2019-11-06 | 2020-03-06 | 中国农业大学 | Customer classification method and system and electronic equipment |
CN110866782B (en) * | 2019-11-06 | 2022-09-16 | 中国农业大学 | Customer classification method and system and electronic equipment |
CN111339294A (en) * | 2020-02-11 | 2020-06-26 | 普信恒业科技发展(北京)有限公司 | Client data classification method and device and electronic equipment |
CN112905863A (en) * | 2021-03-19 | 2021-06-04 | 青岛檬豆网络科技有限公司 | Automatic customer classification method based on K-Means clustering |
CN112800476A (en) * | 2021-03-25 | 2021-05-14 | 全球能源互联网研究院有限公司 | Data desensitization method and device and electronic equipment |
CN113111924A (en) * | 2021-03-26 | 2021-07-13 | 邦道科技有限公司 | Electric power customer classification method and device |
CN113781108A (en) * | 2021-08-30 | 2021-12-10 | 武汉理工大学 | E-commerce platform customer segmentation method and device, electronic equipment and storage medium |
CN114841285A (en) * | 2022-05-19 | 2022-08-02 | 中国电信股份有限公司 | Data clustering method and device, electronic equipment and storage medium |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
PB01 | Publication | |
SE01 | Entry into force of request for substantive examination | |
RJ01 | Rejection of invention patent application after publication | Application publication date: 20181102 |