CN108734217A - Customer segmentation method and device based on cluster analysis - Google Patents

Customer segmentation method and device based on cluster analysis

Info

Publication number
CN108734217A
Authority
CN
China
Prior art keywords
sample
data
distance
data sample
point
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201810496620.7A
Other languages
Chinese (zh)
Inventor
王新刚
王琳琳
孙涛
姜雪松
耿玉水
鲁芹
李爱民
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Qilu University of Technology
Original Assignee
Qilu University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Qilu University of Technology
Priority to CN201810496620.7A
Publication of CN108734217A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/23 Clustering techniques
    • G06F18/232 Non-hierarchical techniques
    • G06F18/2321 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions
    • G06F18/23213 Non-hierarchical techniques using statistics or function optimisation, e.g. modelling of probability density functions with fixed number of clusters, e.g. K-means clustering
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/02 Marketing; Price estimation or determination; Fundraising
    • G06Q30/0201 Market modelling; Market analysis; Collecting market data

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Business, Economics & Management (AREA)
  • Physics & Mathematics (AREA)
  • Strategic Management (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Accounting & Taxation (AREA)
  • Theoretical Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Entrepreneurship & Innovation (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Computation (AREA)
  • General Engineering & Computer Science (AREA)
  • Evolutionary Biology (AREA)
  • Probability & Statistics with Applications (AREA)
  • Game Theory and Decision Science (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • General Business, Economics & Management (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a customer segmentation method and device based on cluster analysis. The method comprises: obtaining a raw customer information data set and applying numerical preprocessing to obtain data samples; performing dimensionality reduction and feature extraction on the data samples with an autoencoder; computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient-of-variation method, and computing the distance between sample points with a weighted Euclidean distance formula; computing the average distance between all data samples; traversing the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance; counting the neighbor points of every sample and sorting the counts in descending order; determining k initial cluster center points; and clustering the remaining data according to the weighted Euclidean distance formula, which completes the customer segmentation.

Description

Customer segmentation method and device based on cluster analysis
Technical field
The invention belongs to the technical field of market statistics and marketing, and relates to a customer segmentation method and device based on cluster analysis.
Background
With the rapid development of science and technology and the widespread use of computers, networks have quietly permeated every aspect of daily life. Using data mining to find valuable information in every field has become increasingly important: it can both summarize past development and predict future trends in the data. Customer segmentation is one important research area. Cluster analysis divides customers into different classes according to their similarities and differences, which helps an enterprise identify different types of customers, formulate differentiated sales plans, and increase its profit. How to segment customers has therefore become key to an enterprise's profitability. Current customer segmentation systems have the following main problems:
First, when processing customer information, a customer segmentation system faces large data volumes, many attributes, and high data dimensionality. Clustering the raw data directly makes segmentation inefficient, the computation cumbersome, and the running time excessive.
Second, the traditional k-means clustering algorithm is one of the most commonly applied algorithms in customer segmentation research, but it treats all attributes of the data samples equally during clustering and ignores the differences between attributes. Different attributes differ in importance and affect the clustering result differently; in general, important attributes have a larger effect on the clustering result.
Third, the traditional k-means algorithm is sensitive to the choice of initial cluster center points, which it selects blindly at random. The quality of the initial cluster center points strongly affects the clustering result: a poor choice may lead the clustering to a local optimum rather than the global optimum, increase the number of iterations, and slow convergence.
In summary, the prior art still lacks an effective solution to the problem of how to segment customers better.
Summary of the invention
To address the deficiencies of the prior art and solve the problem of how to segment customers better, the present invention provides a customer segmentation method and device based on cluster analysis. Customers are grouped into different classes according to their consumption habits, which provides a basis for proposing different marketing strategies for different classes of customers.
The first object of the present invention is to provide a customer segmentation method based on cluster analysis.
To achieve this object, the present invention adopts the following technical solution:
A customer segmentation method based on cluster analysis, the method comprising:
obtaining a customer profile data set and applying numerical preprocessing to obtain data samples, then performing dimensionality reduction and feature extraction on the data samples with an autoencoder;
computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient-of-variation method, and computing the distance between sample points with a weighted Euclidean distance formula;
computing the average distance between all data samples, traversing the data samples to find for each sample point the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting in descending order, determining the initial cluster center points, and clustering the remaining data points to obtain different classes of customers, which completes the customer segmentation.
As a further preferred solution, the numerical preprocessing in the method comprises the following steps:
converting non-numeric data to numeric form;
processing the numeric data with the standardization formula;
processing the standardized data with the normalization formula to obtain the data samples.
As a further preferred solution, performing dimensionality reduction and feature extraction on the data samples with the autoencoder comprises the following steps:
feeding the unlabeled raw data samples into the encoder of the autoencoder for compressed encoding, which yields the code;
decoding the code with the decoder of the autoencoder to obtain new data samples;
computing the error between the new data samples and the raw data samples, adjusting the weight parameters of the encoder and the decoder of the autoencoder according to the error, and performing dimensionality reduction and feature extraction on the data samples with the adjusted autoencoder.
As a further preferred solution, the improved k-means algorithm in the method comprises:
computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient-of-variation method, computing the distance between samples with the weighted Euclidean distance formula, and computing the average distance between all data samples;
traversing the data samples to find for each sample point the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample and sorting in descending order, and determining the initial cluster center points.
As a further preferred solution, in the improved k-means algorithm, computing the weight of each attribute feature of the autoencoder-processed data samples with the coefficient-of-variation method comprises the following steps:
obtaining the attribute value matrix of the autoencoder-processed data samples;
computing the coefficient of variation of each attribute dimension in the attribute value matrix;
computing the weight of each attribute feature from the coefficients of variation obtained.
As a further preferred solution, in the improved k-means algorithm, the coefficient of variation of each attribute dimension is computed from the standard deviation and the mean of that attribute's values in the attribute value matrix.
As a further preferred solution, in the improved k-means algorithm, computing the distance between sample points with the weighted Euclidean distance formula comprises the following steps:
weighting the Euclidean distance with the computed weight of each attribute dimension;
computing the distance between data sample points with the weighted Euclidean distance formula.
As a further preferred solution, in the improved k-means algorithm, determining the initial cluster center points comprises the following steps:
taking any data sample point and finding all sample points whose distance to it is less than the average distance, which are the neighbor points of that data sample point, and counting them;
traversing the data samples to find for each sample point the neighbor points whose distance to it is less than the average distance, counting the neighbor points of every sample, and sorting in descending order;
selecting the sample point with the highest neighbor count as the first initial cluster center point, ignoring any sample point that is a neighbor point of an already selected initial cluster center point, and continuing in this way through all sample points until k initial cluster center points are determined.
The second object of the present invention is to provide a computer-readable storage medium.
To achieve this object, the present invention adopts the following technical solution:
A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device to execute the customer segmentation method based on cluster analysis.
The third object of the present invention is to provide a terminal device.
To achieve this object, the present invention adopts the following technical solution:
A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to execute instructions and the computer-readable storage medium being configured to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the customer segmentation method based on cluster analysis.
Beneficial effects of the present invention:
1. The customer segmentation method and device based on cluster analysis introduce the autoencoder to reduce the dimensionality of the data samples and extract their features, so that the features obtained after dimensionality reduction represent the characteristics of the raw data samples as fully as possible. This makes feature extraction from the raw data effective, handles high-dimensional data more efficiently given the characteristics of streaming data, and improves customer information processing.
2. The customer segmentation method and device based on cluster analysis introduce the coefficient-of-variation method to reflect the importance of different attributes to the clustering result. Attributes with greater dispersion play a greater role in clustering. Starting from the data of the sample set, the invention obtains the weight of each attribute feature from its coefficient of variation and applies the weights in the Euclidean distance formula as the weighting coefficients of the attributes. The distance between samples is computed with this weighted Euclidean distance, and clustering with this formula improves the clustering result.
3. The customer segmentation method and device based on cluster analysis use a new method of selecting cluster center points. It avoids the randomness of choosing initial cluster center points and reflects the distribution of the data samples, so it is less prone to locally optimal clustering and compensates for the deficiency of the traditional k-means algorithm.
Brief description of the drawings
The accompanying drawings, which form a part of this application, provide a further understanding of the application. The illustrative embodiments of the application and their descriptions explain the application and do not unduly limit it.
Fig. 1 is a flow chart of the customer segmentation method based on cluster analysis of the present invention;
Fig. 2 is a flow chart of the autoencoder of the present invention.
Detailed description:
The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the drawings in the embodiments. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by a person of ordinary skill in the art from the embodiments of the present invention without creative effort fall within the protection scope of the present invention.
It should be noted that the following detailed description is illustrative and intended to provide further explanation of the application. Unless otherwise indicated, all technical and scientific terms used in this embodiment have the same meaning as commonly understood by a person of ordinary skill in the technical field of the application.
It should be noted that the terminology used here is only for describing specific implementations and is not intended to limit the exemplary embodiments of the application. As used here, unless the context clearly indicates otherwise, the singular form is intended to include the plural form as well. It should further be understood that the terms "comprising" and/or "including", when used in this specification, indicate the presence of features, steps, operations, devices, components, and/or combinations thereof.
It should be noted that the flowcharts and block diagrams in the drawings show the architecture, functions, and operations of possible implementations of methods and systems according to various embodiments of the present disclosure. Each box in a flowchart or block diagram may represent a module, a program segment, or a portion of code, which may include one or more executable instructions for implementing the logic function specified in each embodiment. In some alternative implementations, the functions marked in the boxes may occur in an order different from that marked in the drawings. For example, two boxes shown in succession may in fact be executed substantially in parallel, or sometimes in the reverse order, depending on the functions involved. Each box in the flowcharts and/or block diagrams, and combinations of such boxes, can be implemented with a dedicated hardware-based system that performs the specified functions or operations, or with a combination of dedicated hardware and computer instructions.
Where no conflict arises, the embodiments of the application and the features in the embodiments may be combined with each other. The invention is further described below with reference to the drawings and embodiments.
Embodiment 1:
The purpose of Embodiment 1 is to provide a customer segmentation method based on cluster analysis.
To achieve this object, the present invention adopts the following technical solution:
As shown in Fig. 1,
a customer segmentation method based on cluster analysis comprises:
Step (1): Obtain the raw customer information data set, apply numerical preprocessing to obtain data samples, and perform dimensionality reduction and feature extraction on the data samples with the autoencoder;
Step (2): Process the autoencoder output with the improved k-means algorithm to obtain the clustering result and complete the customer segmentation;
Step (2-1): Compute the weight of each attribute feature of the autoencoder-processed data samples with the coefficient-of-variation method, and compute the distance between samples with the weighted Euclidean distance formula;
Step (2-2): Compute the average distance between all data samples; traverse the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance; count the neighbor points of every sample and sort in descending order; determine the initial cluster center points; then cluster the remaining data points according to the weighted Euclidean distance to complete the customer segmentation.
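Step (2-2) ends by clustering the remaining data points with the weighted Euclidean distance, but the text does not spell out the iteration. The following is a minimal sketch that assumes the usual k-means assign-and-update loop, with the attribute weights and the initial cluster center points passed in as inputs. The function name `weighted_kmeans`, its parameters, and the toy data are hypothetical and are not taken from the patent.

```python
import numpy as np

def weighted_kmeans(X, w, init_centers, max_iter=100):
    """Assign/update loop of k-means under a weighted Euclidean distance.

    X: (n, m) samples, w: (m,) attribute weights,
    init_centers: (k, m) starting centers (assumed to be given).
    """
    X = np.asarray(X, dtype=float)
    centers = np.array(init_centers, dtype=float)  # copy, so the input is not modified
    labels = None
    for _ in range(max_iter):
        # Weighted Euclidean distance from every sample to every center.
        diff = X[:, None, :] - centers[None, :, :]
        dist = np.sqrt(np.sum(w * diff ** 2, axis=2))
        new_labels = dist.argmin(axis=1)
        if labels is not None and np.array_equal(new_labels, labels):
            break  # assignments stopped changing
        labels = new_labels
        # Move each center to the mean of its members (empty clusters keep their center).
        for c in range(len(centers)):
            members = X[labels == c]
            if len(members):
                centers[c] = members.mean(axis=0)
    return labels, centers

# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
w = np.array([0.5, 0.5])
labels, centers = weighted_kmeans(X, w, init_centers=X[[0, 3]])
```

Because the weights enter only through the distance, the same loop reduces to ordinary k-means when all weights are equal.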
In step (1) of this embodiment, the numerical preprocessing comprises the following steps:
Step 1: Most data sets contain not only numeric data but also other types such as character data, so the non-numeric data are first converted to numeric form. For example, the gender attribute takes two values, male and female. After conversion, male is represented by 0 and female by 1, so gender = 0 denotes male and gender = 1 denotes female.
Step 2: The numeric data are processed with the standardization formula:
$$x'_{ij} = \frac{x_{ij} - \bar{x}_j}{\theta_j}$$
where $x'_{ij}$ is the standardized result, $x_{ij}$ is the numeric data to be processed, $\bar{x}_j$ is the mean of the numeric data to be processed, and $\theta_j$ is the standard deviation of the numeric data to be processed.
Step 3: The standardized data are processed with the normalization formula:
$$x''_{ij} = \frac{x'_{ij} - \min\{x'_{ij}\}}{\max\{x'_{ij}\} - \min\{x'_{ij}\}}$$
where $\min\{x'_{ij}\}$ is the minimum of the standardized results and $\max\{x'_{ij}\}$ is the maximum of the standardized results.
The resulting data set is more convenient for the feature extraction in the subsequent steps.
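The three preprocessing steps can be sketched as follows. This is an illustration under stated assumptions rather than the patent's implementation: the helper names, the `GENDER_CODES` mapping (taken from the male = 0, female = 1 example above), the age and spend columns, and the use of the population standard deviation are all assumptions.

```python
import numpy as np

# Coding from the example in the text: male -> 0, female -> 1.
GENDER_CODES = {"male": 0, "female": 1}

def encode_categorical(values, codes):
    """Step 1: replace each category label with its numeric code."""
    return np.array([codes[v] for v in values], dtype=float)

def standardize(X):
    """Step 2: column-wise (x - mean) / std. Constant columns map to 0."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)
    std = X.std(axis=0)  # population standard deviation (an assumption)
    std = np.where(std == 0, 1.0, std)
    return (X - mean) / std

def min_max_normalize(Z):
    """Step 3: column-wise rescaling to [0, 1]. Constant columns map to 0."""
    Z = np.asarray(Z, dtype=float)
    lo, hi = Z.min(axis=0), Z.max(axis=0)
    span = np.where(hi - lo == 0, 1.0, hi - lo)
    return (Z - lo) / span

# Hypothetical customer records: gender, age, yearly spend.
gender = encode_categorical(["male", "female", "female", "male"], GENDER_CODES)
age = np.array([25.0, 40.0, 31.0, 58.0])
spend = np.array([1200.0, 300.0, 860.0, 4100.0])

raw = np.column_stack([gender, age, spend])
samples = min_max_normalize(standardize(raw))
```

After both steps every column lies in [0, 1], which keeps attributes with large raw ranges (such as spend) from dominating the later distance computations.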
In step (1) of this embodiment, performing dimensionality reduction and feature extraction on the data samples with the autoencoder comprises the following steps:
The autoencoder consists mainly of two parts, an encoder network and a decoder network. The encoder compresses and encodes the input sample so that a lower-dimensional vector represents the high-dimensional raw data as fully as possible. The decoder decodes the resulting new sample and restores it to the raw data as closely as possible.
Their working process can be described as follows:
Step 1: The unlabeled raw data samples are fed into the encoder and encoded, which yields the code.
Step 2: The code is decoded by the decoder.
Step 3: The error between the new sample information and the original sample information is computed, and the weight parameters of the encoder and decoder are adjusted according to the error to minimize the reconstruction error. The code at that point is the feature representation of the original sample.
The implementation process of the autoencoder is shown in Fig. 2.
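The encode, decode, and adjust cycle described above can be illustrated with a very small autoencoder. The patent does not specify the network architecture, activation, loss, or optimizer, so everything below is an assumption: one tanh encoder layer, a linear decoder, mean squared reconstruction error, and plain gradient descent, written with NumPy only. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def train_autoencoder(X, code_dim, epochs=1000, lr=0.1, seed=0):
    """Train a one-hidden-layer autoencoder; return the encoder and the loss history."""
    rng = np.random.default_rng(seed)
    n, m = X.shape
    W_enc = rng.normal(0.0, 0.1, (m, code_dim))
    b_enc = np.zeros(code_dim)
    W_dec = rng.normal(0.0, 0.1, (code_dim, m))
    b_dec = np.zeros(m)
    losses = []
    for _ in range(epochs):
        code = np.tanh(X @ W_enc + b_enc)   # Step 1: encode to the low-dimensional code
        recon = code @ W_dec + b_dec        # Step 2: decode back to the input space
        err = recon - X                     # Step 3: reconstruction error
        losses.append(float(np.mean(err ** 2)))
        # Gradients of the mean squared error with respect to each parameter.
        g_recon = 2.0 * err / err.size
        g_W_dec = code.T @ g_recon
        g_b_dec = g_recon.sum(axis=0)
        g_pre = (g_recon @ W_dec.T) * (1.0 - code ** 2)  # back through tanh
        g_W_enc = X.T @ g_pre
        g_b_enc = g_pre.sum(axis=0)
        # Adjust encoder and decoder weights according to the error.
        W_enc -= lr * g_W_enc
        b_enc -= lr * g_b_enc
        W_dec -= lr * g_W_dec
        b_dec -= lr * g_b_dec
    return (W_enc, b_enc), losses

def encode(X, encoder):
    """Map samples to their codes, i.e. the reduced feature representation."""
    W_enc, b_enc = encoder
    return np.tanh(X @ W_enc + b_enc)

# Synthetic 6-dimensional data that really lives on 2 latent factors.
rng = np.random.default_rng(1)
X = rng.random((50, 2)) @ rng.random((2, 6))
X = X / X.max()
encoder, losses = train_autoencoder(X, code_dim=2)
codes = encode(X, encoder)
```

In practice a deep learning framework would usually replace the hand-written gradients; the sketch only makes the three steps explicit.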
This embodiment introduces the autoencoder. In the era of big data we constantly face massive, real-time, high-dimensional streaming data. As one of the effective methods of deep learning, the autoencoder reduces the dimensionality of the data samples and extracts their features, so that the features obtained after dimensionality reduction represent the characteristics of the raw data samples as fully as possible. Introducing the autoencoder makes feature extraction from the raw data effective and, given the characteristics of streaming data, gives better results in the field of market user segmentation.
In step (2-1) of this embodiment, the coefficient-of-variation method is introduced to reflect the importance of different attributes to the clustering result. In general, attributes with greater dispersion play a greater role in clustering. Starting from the data of the sample set, the invention therefore obtains the weight of each attribute feature from its coefficient of variation and applies the weights in the Euclidean distance formula as the weighting coefficients of the attributes, so that the distance between samples is computed with the weighted Euclidean distance. The details are as follows:
Suppose the data set X is the set of data samples to be clustered and consists of n data objects of dimension m. Its attribute value matrix is:
$$X = \begin{pmatrix} x_{11} & x_{12} & \cdots & x_{1m} \\ x_{21} & x_{22} & \cdots & x_{2m} \\ \vdots & \vdots & \ddots & \vdots \\ x_{n1} & x_{n2} & \cdots & x_{nm} \end{pmatrix}$$
The i-th data sample in X can be written $x_i = (x_{i1}, x_{i2}, x_{i3}, \ldots, x_{ij}, \ldots, x_{im})$, where $x_{ij}$ is the value of the j-th attribute of the i-th data object, i = 1, 2, ..., n; j = 1, 2, ..., m.
1. First compute the coefficient of variation of each attribute. The coefficient of variation is the ratio of the standard deviation to the mean. With $v_j$ denoting the coefficient of variation of the j-th attribute:
$$v_j = \frac{\theta_j}{\bar{x}_j}$$
where $\theta_j$ is the standard deviation and $\bar{x}_j = \frac{1}{n}\sum_{i=1}^{n} x_{ij}$ is the mean of the j-th attribute.
2. Then compute the weight $w_j$ of each attribute from the coefficients of variation obtained:
$$w_j = \frac{v_j}{\sum_{t=1}^{m} v_t}$$
where 1 ≤ j ≤ m.
3. Finally, weight the Euclidean distance with the computed attribute weights and use the weighted Euclidean distance to compute the distance between data sample points. The weighted Euclidean distance can be expressed as:
$$d(x_a, x_b) = \sqrt{\sum_{j=1}^{m} w_j \,(x_{aj} - x_{bj})^2}$$
where $x_a$ and $x_b$ are two data sample points.
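The three steps above can be sketched in a few lines. The function names are hypothetical, the population standard deviation is an assumption, and the code assumes every attribute has a non-zero (here positive) mean, since the coefficient of variation is undefined otherwise.

```python
import numpy as np

def cv_weights(X):
    """Attribute weights from the coefficient of variation (std / mean) of each column."""
    X = np.asarray(X, dtype=float)
    mean = X.mean(axis=0)        # assumed non-zero for every attribute
    std = X.std(axis=0)          # population standard deviation (an assumption)
    v = std / mean               # coefficient of variation per attribute
    return v / v.sum()           # normalize so the weights sum to 1

def weighted_euclidean(a, b, w):
    """Weighted Euclidean distance between two samples."""
    a, b = np.asarray(a, dtype=float), np.asarray(b, dtype=float)
    return float(np.sqrt(np.sum(w * (a - b) ** 2)))

# Small worked example: the first attribute is relatively more dispersed.
X = np.array([[1.0, 10.0],
              [2.0, 12.0],
              [3.0, 14.0]])
w = cv_weights(X)                       # coefficients of variation are in ratio 3 : 1
d = weighted_euclidean(X[0], X[2], w)   # sqrt(0.75 * 4 + 0.25 * 16)
```

In this example the first attribute has three times the coefficient of variation of the second, so it receives weight 0.75 against 0.25, even though its absolute spread is smaller.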
In step (2-2) of this embodiment, because the traditional k-means algorithm is sensitive to the choice of initial cluster center points and blind random selection can produce poor clustering, a new method of selecting the initial cluster center points is proposed. The procedure is as follows:
1. Compute the distance between every two data samples in the data set, using the weighted Euclidean distance given above.
2. Compute the average distance between all data samples:
$$Dis_{(Average)} = \frac{1}{A_n^2} \sum_{i=1}^{n} \sum_{\substack{j=1 \\ j \neq i}}^{n} d(x_i, x_j)$$
where n is the number of data sample points, $d(x_i, x_j)$ is the weighted Euclidean distance between $x_i$ and $x_j$, and $A_n^2 = n(n-1)$ is the number of permutations of 2 samples taken from the data set.
3. Take any data sample point $x_i$ (1 ≤ i ≤ n) and find all sample points whose distance to $x_i$ is less than the average distance $Dis_{(Average)}$. These sample points are called the neighbor points of $x_i$; count them. Proceed in the same way to count the neighbor points of every sample in the data set, and sort the samples by neighbor count from high to low.
4. Select the sample point with the highest neighbor count as the 1st initial cluster center point; the sample point with the 2nd highest count is the 2nd initial cluster center point, and so on down the ranking. If the sample $x_j$ (1 ≤ j ≤ n) ranked P-th by neighbor count is a neighbor point of an already selected cluster center point, ignore it and check the sample point $x_z$ (1 ≤ z ≤ n) ranked (P+1)-th; if $x_z$ is not a neighbor point of an existing cluster center point, take $x_z$ as an initial cluster center point. Continue in this way until all k initial cluster center points are found.
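The four steps above can be sketched as follows. The function names are hypothetical; ties in neighbor count are broken by sample index, and the function simply returns fewer than k centers if the ranking is exhausted first, two details the text does not address.

```python
import numpy as np

def pairwise_weighted_distances(X, w):
    """Step 1: weighted Euclidean distance between every pair of samples."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt(np.sum(w * diff ** 2, axis=2))

def select_initial_centers(X, w, k):
    """Pick up to k initial centers by neighbor count, skipping neighbors of chosen centers."""
    X = np.asarray(X, dtype=float)
    n = len(X)
    D = pairwise_weighted_distances(X, w)
    # Step 2: average over the n(n-1) ordered pairs (the diagonal contributes zero).
    avg = D.sum() / (n * (n - 1))
    # Step 3: neighbors are other samples closer than the average distance.
    neighbors = (D < avg) & ~np.eye(n, dtype=bool)
    counts = neighbors.sum(axis=1)
    order = np.argsort(-counts, kind="stable")  # descending count, ties by index
    # Step 4: walk down the ranking, ignoring neighbors of centers already chosen.
    centers = []
    for i in order:
        if any(neighbors[i, c] for c in centers):
            continue
        centers.append(int(i))
        if len(centers) == k:
            break
    return centers, counts, avg

# Toy data: two well-separated groups of three points each.
X = np.array([[0, 0], [0, 1], [1, 0], [10, 10], [10, 11], [11, 10]], dtype=float)
w = np.array([0.5, 0.5])
centers, counts, avg = select_initial_centers(X, w, k=2)
```

On this toy data every point has exactly two neighbors, so the first point becomes the first center, its two neighbors are skipped, and the first point of the other group becomes the second center.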
Embodiment 2:
The purpose of Embodiment 2 is to provide a computer-readable storage medium.
To achieve this object, the present invention adopts the following technical solution:
A computer-readable storage medium storing a plurality of instructions, the instructions being adapted to be loaded by a processor of a terminal device to execute the following processing:
Step (1): Obtain the raw customer information data set, apply numerical preprocessing to obtain data samples, and perform dimensionality reduction and feature extraction on the data samples with the autoencoder;
Step (2): Process the autoencoder output with the improved k-means algorithm to obtain the clustering result and complete the customer segmentation;
Step (2-1): Compute the weight of each attribute feature of the autoencoder-processed data samples with the coefficient-of-variation method, and compute the distance between samples with the weighted Euclidean distance formula;
Step (2-2): Compute the average distance between all data samples; traverse the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance; count the neighbor points of every sample and sort in descending order; determine the initial cluster center points; then cluster the remaining data points according to the weighted Euclidean distance to complete the customer segmentation.
Embodiment 3:
The purpose of Embodiment 3 is to provide a customer segmentation device based on cluster analysis.
To achieve this object, the present invention adopts the following technical solution:
A customer segmentation device based on cluster analysis, comprising a processor and a computer-readable storage medium, the processor being configured to execute instructions and the computer-readable storage medium being configured to store a plurality of instructions, the instructions being adapted to be loaded by the processor to execute the following processing:
Step (1): Obtain the raw customer information data set, apply numerical preprocessing to obtain data samples, and perform dimensionality reduction and feature extraction on the data samples with the autoencoder;
Step (2): Process the autoencoder output with the improved k-means algorithm to obtain the clustering result and complete the customer segmentation;
Step (2-1): Compute the weight of each attribute feature of the autoencoder-processed data samples with the coefficient-of-variation method, and compute the distance between samples with the weighted Euclidean distance formula;
Step (2-2): Compute the average distance between all data samples; traverse the data samples to find, for each sample point, the neighbor points whose distance to it is less than the average distance; count the neighbor points of every sample and sort in descending order; determine the initial cluster center points; then cluster the remaining data points according to the weighted Euclidean distance to complete the customer segmentation.
These computer-executable instructions, when run in a device, cause the device to execute the methods or processes described in the embodiments of the present disclosure.
In the present embodiment, computer program product may include computer readable storage medium, containing for holding The computer-readable program instructions of row various aspects of the disclosure.Computer readable storage medium can be kept and store By the tangible device for the instruction that instruction execution equipment uses.Computer readable storage medium for example can be-- but it is unlimited In-- storage device electric, magnetic storage apparatus, light storage device, electromagnetism storage device, semiconductor memory apparatus or above-mentioned Any appropriate combination.The more specific example (non exhaustive list) of computer readable storage medium includes:Portable computing Machine disk, hard disk, random access memory (RAM), read-only memory (ROM), erasable programmable read only memory (EPROM or Flash memory), static RAM (SRAM), Portable compressed disk read-only memory (CD-ROM), digital versatile disc (DVD), memory stick, floppy disk, mechanical coding equipment, the punch card for being for example stored thereon with instruction or groove internal projection structure, with And above-mentioned any appropriate combination.Computer readable storage medium used herein above is not interpreted instantaneous signal itself, The electromagnetic wave of such as radio wave or other Free propagations, the electromagnetic wave propagated by waveguide or other transmission mediums (for example, Pass through the light pulse of fiber optic cables) or pass through electric wire transmit electric signal.
Computer-readable program instructions described herein can be downloaded to from computer readable storage medium it is each calculate/ Processing equipment, or outer computer or outer is downloaded to by network, such as internet, LAN, wide area network and/or wireless network Portion's storage device.Network may include copper transmission cable, optical fiber transmission, wireless transmission, router, fire wall, interchanger, gateway Computer and/or Edge Server.Adapter or network interface in each calculating/processing equipment are received from network to be counted Calculation machine readable program instructions, and the computer-readable program instructions are forwarded, for the meter being stored in each calculating/processing equipment In calculation machine readable storage medium storing program for executing.
The computer program instructions for carrying out the operations of the present disclosure may be assembler instructions, instruction-set-architecture (ISA) instructions, machine instructions, machine-dependent instructions, microcode, firmware instructions, state-setting data, or source code or object code written in any combination of one or more programming languages, including an object-oriented programming language such as C++ and conventional procedural programming languages such as the "C" language or similar programming languages. The computer-readable program instructions may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer, or entirely on a remote computer or server. In the case involving a remote computer, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet service provider). In some embodiments, electronic circuitry, such as programmable logic circuitry, a field-programmable gate array (FPGA), or a programmable logic array (PLA), is personalized by utilizing state information of the computer-readable program instructions; the electronic circuitry can then execute the computer-readable program instructions to implement the various aspects of the present disclosure.
It should be noted that although several modules or sub-modules of the device are mentioned in the detailed description above, this division is merely exemplary and not mandatory. In fact, according to embodiments of the present disclosure, the features and functions of two or more of the modules described above may be embodied in a single module. Conversely, the features and functions of one module described above may be further divided and embodied by multiple modules.
The foregoing describes only preferred embodiments of the present application and is not intended to limit it. For those skilled in the art, the present application may have various modifications and variations. Any modification, equivalent replacement, improvement, and the like made within the spirit and principles of the present application shall fall within its scope of protection. Accordingly, the present invention is not intended to be limited to the embodiments shown herein, but is to be accorded the widest scope consistent with the principles and novel features disclosed herein.

Claims (10)

1. A customer segmentation method based on cluster analysis, characterized in that the method comprises:
obtaining a raw customer information data set, performing numerical preprocessing to obtain data samples, and performing dimensionality reduction and feature extraction on the data samples by means of an autoencoder;
calculating the weights of the attribute features of the autoencoder-processed data samples using the coefficient of variation method, and calculating the distances between sample points using a weighted Euclidean distance formula;
calculating the average distance between all data samples, traversing the data samples to find, for each sample point, the neighbor points whose distance from it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order, determining the initial cluster center points, and clustering the remaining data according to the weighted Euclidean distance, thereby completing the customer segmentation.
2. The method according to claim 1, characterized in that the specific steps of the numerical preprocessing comprise:
converting non-numeric data into numeric form;
processing the numeric data using a standardization formula;
processing the standardized data using a normalization formula to obtain the data samples.
3. The method according to claim 1, characterized in that the specific steps of performing dimensionality reduction and feature extraction on the data samples by means of the autoencoder comprise:
inputting the unlabeled original data samples into the encoder of the autoencoder for compression encoding to obtain a code;
decoding the code using the decoder of the autoencoder to obtain new data samples;
calculating the error between the new data samples and the original data samples, adjusting the weight parameters of the encoder and the decoder of the autoencoder according to the error, and performing dimensionality reduction and feature extraction on the data samples with the parameter-adjusted autoencoder.
4. The method according to claim 1, characterized in that, in the method, the improved k-means algorithm comprises:
calculating the weights of the attribute features of the autoencoder-processed data samples using the coefficient of variation method, calculating the distances between samples using the weighted Euclidean distance formula, and calculating the average distance between all data samples;
traversing the data samples to find, for each sample point, the neighbor points whose distance from it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order, and determining the initial cluster center points.
5. The method according to claim 4, characterized in that, in the improved k-means algorithm, the specific steps of calculating the weights of the attribute features of the autoencoder-processed data samples using the coefficient of variation method comprise:
obtaining the attribute value matrix of the data samples processed by the autoencoder;
calculating the coefficient of variation of each attribute dimension in the attribute value matrix;
calculating the weight of each attribute feature from the coefficients of variation so obtained.
6. The method according to claim 4, characterized in that, in the improved k-means algorithm, the coefficient of variation of each attribute dimension is calculated from the standard deviation and the mean of that attribute's values in the attribute value matrix.
7. The method according to claim 4, characterized in that, in the improved k-means algorithm, the specific steps of calculating the distances between sample points using the weighted Euclidean distance formula comprise:
weighting the Euclidean distance with the calculated weight of each attribute dimension;
calculating the distances between data sample points using the weighted Euclidean distance formula.
8. The method according to claim 4, characterized in that, in the improved k-means algorithm, the specific steps of determining the initial cluster center points comprise:
selecting an arbitrary data sample point, finding all sample points whose distance from it is less than the average distance as the neighbor points of that data sample point, and calculating the number of neighbor points;
traversing the data samples to find, for each sample point, the neighbor points whose distance from it is less than the average distance, counting the neighbor points of every sample and sorting the counts in descending order;
selecting the sample point with the largest number of neighbor points as the first initial cluster center point, ignoring any sample point that is a neighbor point of an initial cluster center point, and traversing all sample points in this manner until k initial cluster center points are determined.
9. A computer-readable storage medium storing a plurality of instructions, characterized in that the instructions are adapted to be loaded by a processor of a terminal device to execute the method according to any one of claims 1-8.
10. A terminal device comprising a processor and a computer-readable storage medium, the processor being configured to implement instructions and the computer-readable storage medium being configured to store a plurality of instructions, characterized in that the instructions are for executing the method according to any one of claims 1-8.
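
By way of non-limiting illustration only, the clustering steps recited in claims 4 to 8 might be sketched in Python with NumPy as follows. The claims do not fix the exact formulas, so several details here are assumptions: the weight of each attribute is taken as its coefficient of variation divided by the sum of all coefficients, the weighted distance is taken as the square root of the weighted sum of squared differences, the average distance is taken over distinct pairs of samples, and the function names (`cv_weights`, `weighted_distances`, `initial_centers`, `weighted_kmeans`) are hypothetical. The input `X` stands for the attribute value matrix that the autoencoder would produce; the autoencoder itself is not shown.

```python
import numpy as np

def cv_weights(X):
    """Attribute weights from the coefficient of variation (std / |mean|).

    Assumption: weights are the coefficients normalised to sum to 1,
    and no attribute has a zero mean.
    """
    cv = X.std(axis=0) / np.abs(X.mean(axis=0))
    return cv / cv.sum()

def weighted_distances(X, w):
    """Pairwise distances d(i, k) = sqrt(sum_j w_j * (x_ij - x_kj)^2)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.sqrt((w * diff ** 2).sum(axis=2))

def initial_centers(D, k):
    """Pick up to k initial centers by descending neighbor count.

    A neighbor of a point is any other point closer than the average
    pairwise distance. Neighbors of an already chosen center are skipped.
    Fewer than k indices are returned if the candidates run out.
    """
    n = D.shape[0]
    avg = D[np.triu_indices(n, k=1)].mean()       # mean over distinct pairs
    neighbors = (D < avg) & ~np.eye(n, dtype=bool)
    counts = neighbors.sum(axis=1)
    order = np.argsort(-counts, kind="stable")    # descending neighbor count
    excluded = np.zeros(n, dtype=bool)
    centers = []
    for i in order:
        if excluded[i]:
            continue
        centers.append(int(i))
        excluded[i] = True
        excluded |= neighbors[i]                  # ignore this center's neighbors
        if len(centers) == k:
            break
    return centers

def weighted_kmeans(X, k, n_iter=100):
    """k-means using the weighted distance and the neighbor-count seeding."""
    w = cv_weights(X)
    centers = X[initial_centers(weighted_distances(X, w), k)].copy()
    for _ in range(n_iter):
        diff = X[:, None, :] - centers[None, :, :]
        labels = np.sqrt((w * diff ** 2).sum(axis=2)).argmin(axis=1)
        new_centers = np.array([
            X[labels == c].mean(axis=0) if np.any(labels == c) else centers[c]
            for c in range(len(centers))
        ])
        if np.allclose(new_centers, centers):     # assignments have stabilised
            break
        centers = new_centers
    return labels, centers
```

In this sketch the seeding is deterministic, since ties in neighbor count are broken by sample order, which is one possible reading of the descending sort in claim 8. A caller would pass the reduced feature matrix and the desired number of customer segments, for example `labels, centers = weighted_kmeans(X, 3)`.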
CN201810496620.7A 2018-05-22 2018-05-22 A kind of customer segmentation method and device based on clustering Pending CN108734217A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201810496620.7A CN108734217A (en) 2018-05-22 2018-05-22 A kind of customer segmentation method and device based on clustering

Publications (1)

Publication Number Publication Date
CN108734217A true CN108734217A (en) 2018-11-02

Family

ID=63938814

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201810496620.7A Pending CN108734217A (en) 2018-05-22 2018-05-22 A kind of customer segmentation method and device based on clustering

Country Status (1)

Country Link
CN (1) CN108734217A (en)

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106127519A (en) * 2016-06-24 2016-11-16 武汉斗鱼网络科技有限公司 A kind of live platform user divided method based on K Means algorithm and system
CN107784518A (en) * 2017-09-20 2018-03-09 国网浙江省电力公司电力科学研究院 A kind of power customer divided method based on multidimensional index
US20180121942A1 (en) * 2016-11-03 2018-05-03 Adobe Systems Incorporated Customer segmentation via consensus clustering

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
XINGANG WANG ET AL.: "Research on Intrusion Detection Based on Feature Extraction of Autoencoder and the Improved K-means Algorithm", 《2017 10TH INTERNATIONAL SYMPOSIUM ON COMPUTATIONAL INTELLIGENCE AND DESIGN》 *

Cited By (22)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109495476B (en) * 2018-11-19 2020-11-20 中南大学 Data stream differential privacy protection method and system based on edge calculation
CN109495476A (en) * 2018-11-19 2019-03-19 中南大学 A kind of data flow difference method for secret protection and system based on edge calculations
US11790013B2 (en) 2018-12-21 2023-10-17 Visa International Service Association Systems and methods for generating transaction profile tags
US11386165B2 (en) 2018-12-21 2022-07-12 Visa International Service Association Systems and methods for generating transaction profile tags
CN109903082A (en) * 2019-01-24 2019-06-18 平安科技(深圳)有限公司 Clustering method, electronic device and storage medium based on user's portrait
CN109903082B (en) * 2019-01-24 2022-10-28 平安科技(深圳)有限公司 Clustering method based on user portrait, electronic device and storage medium
CN109829018A (en) * 2019-01-28 2019-05-31 华南理工大学 A kind of super divided method of mobile client based on deep learning
CN109948701A (en) * 2019-03-19 2019-06-28 太原科技大学 A kind of data clustering method based on space-time relationship between track
CN109948701B (en) * 2019-03-19 2022-08-16 太原科技大学 Data clustering method based on space-time correlation among tracks
CN110133488A (en) * 2019-04-09 2019-08-16 上海电力学院 Switchgear health status evaluation method and device based on optimal number of degrees
CN110133488B (en) * 2019-04-09 2021-10-08 上海电力学院 Switch cabinet health state evaluation method and device based on optimal grade number
CN110059118A (en) * 2019-04-26 2019-07-26 迪爱斯信息技术股份有限公司 Weighing computation method and device, the terminal device of characteristic attribute
CN110414569A (en) * 2019-07-03 2019-11-05 北京小米智能科技有限公司 Cluster realizing method and device
US11501099B2 (en) 2019-07-03 2022-11-15 Beijing Xiaomi Intelligent Technology Co., Ltd. Clustering method and device
CN110866782A (en) * 2019-11-06 2020-03-06 中国农业大学 Customer classification method and system and electronic equipment
CN110866782B (en) * 2019-11-06 2022-09-16 中国农业大学 Customer classification method and system and electronic equipment
CN111339294A (en) * 2020-02-11 2020-06-26 普信恒业科技发展(北京)有限公司 Client data classification method and device and electronic equipment
CN112905863A (en) * 2021-03-19 2021-06-04 青岛檬豆网络科技有限公司 Automatic customer classification method based on K-Means clustering
CN112800476A (en) * 2021-03-25 2021-05-14 全球能源互联网研究院有限公司 Data desensitization method and device and electronic equipment
CN113111924A (en) * 2021-03-26 2021-07-13 邦道科技有限公司 Electric power customer classification method and device
CN113781108A (en) * 2021-08-30 2021-12-10 武汉理工大学 E-commerce platform customer segmentation method and device, electronic equipment and storage medium
CN114841285A (en) * 2022-05-19 2022-08-02 中国电信股份有限公司 Data clustering method and device, electronic equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication (application publication date: 20181102)