CN109903082B

CN109903082B - Clustering method based on user portrait, electronic device and storage medium

Info

Publication number: CN109903082B
Application number: CN201910068877.7A
Authority: CN
Inventors: 金戈; 徐亮
Original assignee: Ping An Technology Shenzhen Co Ltd
Current assignee: Ping An Technology Shenzhen Co Ltd
Priority date: 2019-01-24
Filing date: 2019-01-24
Publication date: 2022-10-28
Anticipated expiration: 2039-01-24
Also published as: CN109903082A; WO2020151152A1

Abstract

The invention relates to data analysis technology and provides a clustering method based on user portraits, which comprises the following steps: acquiring user features and feature variables of a plurality of users; converting the user features into word vectors; clustering the word vectors and determining the category of each user feature; dividing the feature variables into continuous variables and discrete variables; quantizing the discrete variables and the continuous variables; screening out the preferred user feature categories and assigning a weight greater than 1 to the quantized discrete and continuous variables of those categories; and clustering all the quantized discrete and continuous variables to obtain biased user feature clusters. The invention also provides an electronic device and a storage medium. The invention achieves targeted clustering while retaining all feature information.

Description

Clustering method based on user portrait, electronic device and storage medium

Technical Field

The present invention relates to the field of data analysis technologies, and in particular, to a user portrait based clustering method, an electronic device, and a storage medium.

Background

The concept of the user portrait arose to support precision marketing and deeper mining of potential business value. A user portrait is a labeling of user information; each label is usually a highly refined feature identifier, such as age, gender, or user preference. Combining all of a user's labels outlines a multi-dimensional portrait that abstracts the overall picture of the user's information. At present, when user portraits are clustered, the data sources are generally divided only into broad groups such as life attributes and behavior attributes, so accurate, targeted clustering cannot be achieved.

Disclosure of Invention

In view of the foregoing, it is an object of the present invention to provide a user-portrait-based clustering method, an electronic device, and a storage medium, which can perform targeted clustering while retaining all feature information.

To achieve the above object, the present invention provides an electronic device, comprising a memory and a processor, wherein the memory comprises a user-portrait-based clustering program, and the user-portrait-based clustering program, when executed by the processor, implements the following steps:

acquiring user characteristics of a plurality of users and characteristic variables corresponding to the user characteristics;

converting the user characteristics into word vectors;

clustering the word vectors, and determining the category of each user feature;

dividing feature variables corresponding to the user features into continuous variables and discrete variables, wherein the continuous variables are numerical variables with sequence attributes, and the discrete variables are non-numerical variables;

carrying out quantization processing on discrete variables and continuous variables;

screening out the category of the user characteristics with preference, and endowing the quantized discrete variable and continuous variable of the user characteristic category with preference with a weight value larger than 1, wherein the preference refers to the bias of a clustering process;

and clustering all the quantized discrete variables and continuous variables to obtain biased user characteristic clusters.

In addition, in order to achieve the above object, the present invention further provides a user portrait-based clustering method, including:

acquiring user characteristics of a plurality of users and corresponding characteristic variables thereof;

converting the user characteristics into word vectors;

clustering the word vectors, and determining the category of each user feature;

dividing the characteristic variables into continuous variables and discrete variables, wherein the continuous variables are numerical variables with sequence attributes, and the discrete variables are non-numerical variables;

screening out the category of the user characteristics with preference, and endowing the quantized discrete variable and continuous variable of the user characteristic category with preference with a weight value larger than 1, wherein the preference refers to the bias of a clustering process;

Preferably, the method for quantizing discrete variables and continuous variables includes:

converting discrete variables with order into numerical form;

converting discrete variables which do not have the orderliness and have the value number exceeding the set number into a high-order form;

encoding the discrete variable converted into the high-order form;

and normalizing the encoded discrete variables, the ordered discrete variables, and the continuous variables.

Preferably, the method for giving a weight greater than 1 to the quantized discrete variable and continuous variable of the preferred user feature category comprises:

counting the number n of categories after user feature clustering;

changing the weight of the characteristic variable of the category of the user characteristic with preference within the range of more than 1 and not more than n-1;

and determining the optimal weight according to the silhouette coefficient and/or the interpretability of the clusters after weighting.

Further, preferably, the method further comprises:

clustering results corresponding to the optimal weight are used as optimal biased user characteristic clustering, wherein the clustering comprises the following steps:

calculating the silhouette coefficient of each cluster according to the formula

wherein $s_i$ is the silhouette coefficient of the i-th cluster, and $a_i$ and $b_i$ are respectively the two feature variables with the largest distance that belong to different categories in the i-th clustering result;

and repeating the above steps to obtain a curve of the silhouette coefficient as a function of the weight, observing whether the curve has an extreme point, taking the weight corresponding to the maximum silhouette coefficient as the optimal weight, and taking the clustering result corresponding to the maximum silhouette coefficient as the optimal biased user feature clustering.
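
The formula referred to above is not reproduced in this text. For orientation only, and as an assumption rather than the patent's own formula, the silhouette coefficient in its commonly used form is defined for a sample $j$ as

$$s(j) = \frac{b(j) - a(j)}{\max\{a(j),\, b(j)\}}$$

where $a(j)$ is the mean distance from sample $j$ to the other samples of its own cluster and $b(j)$ is the smallest mean distance from sample $j$ to the samples of any other cluster. The value lies in $[-1, 1]$, and values closer to 1 indicate compact, well-separated clusters. These $a$ and $b$ are defined differently from the $a_i$ and $b_i$ described in the text, so the correspondence between the two is not certain.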

Furthermore, preferably, the category of the preferred user feature is one or more categories, and when the category of the preferred user feature is one category, the weight of the feature variable of the preferred user feature is in a range greater than 1 and not greater than n-1; when the preferred category is multiple categories, the weight value of the feature variable of one category of the user features of the multiple categories is more than 1, the sum of the weight values is not more than n-1, and n is the number of the categories after the user features are clustered.

Furthermore, preferably, the method for assigning a weight greater than 1 to the quantized discrete variable and the continuous variable of the preferred user feature class includes:

counting the total number of user features and the number of user features belonging to each user feature category;

assigning the preferred user feature category a weight that is greater than 1 and not greater than the value at which the weighted number of user features of that category equals the sum of the numbers of user features of the other categories.

In addition, in order to achieve the above object, the present invention further provides a computer readable storage medium, where the computer readable storage medium includes a user portrait based clustering program, and when the user portrait based clustering program is executed by a processor, the steps of the user portrait based clustering method are implemented.

The user portrait-based clustering method, the electronic device and the computer-readable storage medium can realize targeted clustering on the basis of retaining all feature information, and meanwhile, due to the ordered and unordered processing of discrete features, the overall precision is improved.

Drawings

FIG. 1 is a schematic diagram of an application environment of a preferred embodiment of a user profile-based clustering method according to the present invention;

FIG. 2 is a block diagram of a preferred embodiment of the user profile-based clustering routine of FIG. 1;

FIG. 3 is a flow chart of a preferred embodiment of the user profile-based clustering method of the present invention.

Detailed Description

It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.

Specific embodiments of the present invention will be described in detail below with reference to the accompanying drawings.

The invention provides a clustering method based on user portrait, which is applied to an electronic device 1. FIG. 1 is a schematic diagram of an application environment of a user portrait-based clustering method according to a preferred embodiment of the present invention.

In the present embodiment, the electronic device 1 may be a terminal client having an arithmetic function, such as a server, a mobile phone, a tablet computer, a portable computer, and a desktop computer.

The memory 11 includes at least one type of readable storage medium. The at least one type of readable storage medium may be a non-volatile storage medium such as a flash memory, a hard disk, a multimedia card, a card-type memory, and the like. In some embodiments, the readable storage medium may be an internal storage unit of the electronic apparatus 1, such as a hard disk of the electronic apparatus 1. In other embodiments, the readable storage medium may also be an external memory of the electronic device 1, such as a plug-in hard disk, a Smart Media Card (SMC), a Secure Digital (SD) Card, a Flash memory Card (Flash Card), and the like, which are provided on the electronic device 1.

In the present embodiment, the readable storage medium of the memory 11 is generally used for storing the user portrait based clustering program 10 and the like installed in the electronic device 1. The memory 11 may also be used for temporarily storing data that has been output or is to be output.

Processor 12, which in some embodiments may be a Central Processing Unit (CPU), microprocessor or other data Processing chip, operates program code or processes data stored in memory 11, such as executing user-portrait based clustering routine 10.

The network interface 13 may optionally comprise a standard wired interface, a wireless interface (e.g. WI-FI interface), typically used for establishing a communication connection between the electronic apparatus 1 and other electronic clients.

The communication bus 14 is used to enable connection communication between these components.

Fig. 1 only shows the electronic device 1 with components 11-14, but it is to be understood that not all of the shown components are required to be implemented, and that more or fewer components may alternatively be implemented.

Optionally, the electronic device 1 may further include a user interface. The user interface may include an input unit such as a keyboard, a voice input device such as a microphone or another client with a voice recognition function, and a voice output device such as a loudspeaker or a headset. Optionally, the user interface may further include a standard wired interface and a wireless interface.

Optionally, the electronic device 1 may further comprise a display, which may also be referred to as a display screen or a display unit.

In some embodiments, the display device may be an LED display, a liquid crystal display, a touch-sensitive liquid crystal display, an Organic Light-Emitting Diode (OLED) touch device, or the like. The display is used for displaying information processed in the electronic apparatus 1 and for displaying a visualized user interface.

Optionally, the electronic device 1 further comprises a touch sensor. The area provided by the touch sensor for the user to perform touch operation is called a touch area. Further, the touch sensor described herein may be a resistive touch sensor, a capacitive touch sensor, or the like. The touch sensor may include not only a contact type touch sensor but also a proximity type touch sensor. Further, the touch sensor may be a single sensor, or may be a plurality of sensors arranged in an array, for example.

Optionally, the electronic device 1 may further include logic gates, sensors, audio circuits, and the like, which are not described herein.

In the apparatus embodiment shown in FIG. 1, a memory 11, which is a type of computer storage medium, may include an operating system and a user profile-based clustering program 10; processor 12 implements the following steps when executing user portrait based clustering routine 10 stored in memory 11:

converting the user characteristics into word vectors;

clustering the word vectors, and determining the category of each user feature;

In other embodiments, the user-portrait-based clustering program 10 may be further partitioned into one or more modules, which are stored in the memory 11 and executed by the processor 12 to implement the present invention. A module as referred to herein is a set of computer program instruction segments capable of performing a specified function. Referring to FIG. 2, a functional block diagram of a preferred embodiment of the user-portrait-based clustering program 10 of FIG. 1 is shown. The user-portrait-based clustering program 10 may be partitioned into:

a user characteristic obtaining module 110, for obtaining user characteristics of a plurality of users and corresponding characteristic variables;

a conversion module 120, which converts the user characteristics into word vectors;

the first clustering module 130 is used for clustering the word vectors and determining the category of each user feature;

a dividing module 140 that divides the characteristic variables into continuous variables and discrete variables, wherein the continuous variables are numerical variables having sequence attributes, and the discrete variables are non-numerical variables;

a quantization module 150 for performing quantization processing on the discrete variable and the continuous variable;

the preference selection module 160 screens out the categories of the preferred user characteristics, and gives a weight greater than 1 to the quantized discrete variable and continuous variable of the preferred user characteristic categories, wherein the preference refers to the concerned user characteristics and is also the bias of the clustering process;

the second clustering module 170 clusters all the quantized discrete variables and continuous variables, and clusters the weighted feature variables of the user feature categories and the unweighted feature variables of the user feature categories to obtain biased user feature clusters.
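
As an illustration of what the conversion module 120 and the first clustering module 130 do, the following minimal sketch looks up a word vector for each user feature name and groups the features into categories. The vectors in `WORD_VECTORS` are invented toy values (a real dictionary would be trained with Word2Vec), and the greedy cosine-similarity grouping with a `threshold` is a simple stand-in chosen for brevity, not the clustering algorithm the patent prescribes.

```python
import math

# Hypothetical word-vector dictionary; values are toy numbers, not trained vectors.
WORD_VECTORS = {
    "age":         [0.9, 0.1, 0.0],
    "gender":      [0.8, 0.2, 0.1],
    "education":   [0.1, 0.9, 0.1],
    "certificate": [0.2, 0.8, 0.0],
    "family_size": [0.0, 0.1, 0.9],
}

def cosine(u, v):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_features(features, vectors, threshold=0.9):
    """Greedy grouping: a feature joins the first category whose seed
    vector is at least `threshold` similar, otherwise it starts a new one."""
    categories = []  # list of (seed_vector, [feature names])
    for name in features:
        vec = vectors[name]
        for seed, members in categories:
            if cosine(seed, vec) >= threshold:
                members.append(name)
                break
        else:
            categories.append((vec, [name]))
    return [members for _, members in categories]

categories = group_features(list(WORD_VECTORS), WORD_VECTORS)
print(categories)
```

With these toy vectors the features fall into three groups that could be read as personal attributes, professional ability, and family.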

In addition, the invention also provides a clustering method based on the user portrait. FIG. 3 is a flowchart illustrating a user portrait-based clustering method according to a preferred embodiment of the present invention. The method may be performed by an apparatus, which may be implemented by software and/or hardware.

In this embodiment, a user portrait-based clustering method includes:

Step S1, acquiring user features of a plurality of users and the feature variables corresponding to the user features; for example, the user features and their feature variables can be obtained from a network using web crawler technology, or from dedicated data sets. For example, the user feature is gender and the feature variable is female;

Step S2, converting the user features into word vectors, for example by looking up the word vector corresponding to each user feature in a word vector dictionary, wherein the dictionary is prepared in advance and trained with Word2Vec;

Step S3, clustering the word vectors and determining the category of each user feature; this step can be implemented with the scikit-learn (sklearn) module in Python. For example, name, gender, age, native place and the like can be clustered into personal attributes; educational background, certificates, work experience and the like can be clustered into professional ability; and birth order in the family, family structure, family happiness, family education and the like can be clustered into family responsibility;

Step S4, dividing the feature variables into continuous variables and discrete variables, wherein the continuous variables are numerical variables with sequence attributes and the discrete variables are non-numerical variables (such as place names and grade information); the feature variables can be distinguished automatically by a program;

Step S5, quantizing the discrete variables and the continuous variables;

Step S6, screening out the preferred user feature categories and assigning a weight greater than 1 to the quantized discrete and continuous variables of those categories, wherein the preference refers to the bias of the clustering process; for example, for a clustering biased toward personality, the proportion of the feature variables of the personality-related user features can be increased so that the clustering result differs more clearly in terms of personality;

Step S7, clustering all the quantized discrete variables and continuous variables, that is, clustering the feature variables of the weighted user feature categories together with those of the unweighted user feature categories (for example, by hierarchical clustering or K-Means clustering) to obtain biased user feature clusters. This step can be implemented with a K-Prototypes library in Python.
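
Step S4 notes that the split into continuous and discrete variables can be done automatically. A minimal sketch of one way to do so is given below, assuming each user is a dictionary of raw values; the function name `split_variables` and the sample records are illustrative and not taken from the patent.

```python
def split_variables(records):
    """Split feature columns into continuous (all values numeric) and
    discrete (any non-numeric value) by inspecting the value types."""
    continuous, discrete = [], []
    for name in records[0]:
        values = [r[name] for r in records]
        # bool is a subclass of int in Python, so it is excluded explicitly.
        is_numeric = all(
            isinstance(v, (int, float)) and not isinstance(v, bool)
            for v in values
        )
        (continuous if is_numeric else discrete).append(name)
    return continuous, discrete

# Illustrative user records (hypothetical data).
users = [
    {"age": 31, "income": 8500.0, "city": "Shenzhen", "level": "gold"},
    {"age": 45, "income": 12000.0, "city": "Beijing", "level": "silver"},
]
continuous, discrete = split_variables(users)
print(continuous, discrete)
```

A type-based rule like this treats numeric codes (for example, postal codes stored as integers) as continuous, so in practice a manual override list may be needed.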

The clustering method is an unsupervised classification method, a weighted clustering algorithm is established according to the user portrait characteristics, the user classification function can be modified in a weighted mode according to specific application scenes, and the preference of the clustering method can be increased according to business requirements.
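
The effect of the weighting in steps S6 and S7 can be seen in a minimal sketch. It assumes the variables are already quantized to numbers, applies the weight by scaling the columns of the preferred category, and uses a small hand-written k-means with deterministic farthest-point initialization; the helper names, the toy data and the choice of k-means are assumptions made for illustration. Scaling a column by w multiplies its contribution to the squared Euclidean distance by w squared, which is one possible reading of "weight" among several.

```python
def sq_dist(p, q):
    """Squared Euclidean distance."""
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, k, iters=20):
    """Tiny k-means with deterministic farthest-point initialization."""
    centers = [points[0]]
    while len(centers) < k:
        centers.append(max(points, key=lambda p: min(sq_dist(p, c) for c in centers)))
    labels = []
    for _ in range(iters):
        # Assign every point to its nearest center.
        labels = [min(range(k), key=lambda c: sq_dist(p, centers[c])) for p in points]
        # Move every non-empty center to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def apply_weights(points, column_category, category_weight):
    """Scale each column by the weight of its feature category (default 1)."""
    return [
        [v * category_weight.get(column_category[j], 1.0) for j, v in enumerate(p)]
        for p in points
    ]

# Toy quantized data: column 0 belongs to "personal", column 1 to "family".
users = [[0.0, 0.0], [0.0, 1.0], [0.6, 0.0], [0.6, 1.0]]
column_category = ["personal", "family"]

unbiased = kmeans(users, k=2)
biased = kmeans(apply_weights(users, column_category, {"personal": 3.0}), k=2)
print(unbiased, biased)
```

Without weighting, the users are separated along the "family" column; with a weight of 3 on "personal", the same users are separated along the "personal" column, while no column has been discarded.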

In step S5, the method for quantizing a discrete variable and a continuous variable includes:

converting discrete variables with order (such as levels) into numerical form;

converting discrete variables (such as place names and other information) which have no order and have the value number exceeding a set number (for example, 20) into high-order forms (such as identity, city grade and other information);

encoding the discrete variable converted into a higher-order form (e.g., one-hot encoding);
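
The quantization steps can be sketched as follows. The sketch is a minimal illustration under stated assumptions: the mapping tables `LEVEL_ORDER` and `CITY_TIER`, the column names, and min-max scaling as the normalization method are all hypothetical choices, and `max_distinct` plays the role of the "set number" (20 in the example above).

```python
# Hypothetical lookup tables for the illustration.
LEVEL_ORDER = {"bronze": 0, "silver": 1, "gold": 2}          # ordered discrete -> number
CITY_TIER = {"Shenzhen": "tier1", "Beijing": "tier1", "Dongguan": "tier2"}

def min_max(values):
    """Min-max normalization to [0, 1]."""
    lo, hi = min(values), max(values)
    return [0.0 if hi == lo else (v - lo) / (hi - lo) for v in values]

def one_hot(values):
    """One-hot encode a list of category labels."""
    categories = sorted(set(values))
    return [[1.0 if v == c else 0.0 for c in categories] for v in values], categories

def quantize(records, max_distinct=20):
    ages = min_max([r["age"] for r in records])                  # continuous
    levels = min_max([LEVEL_ORDER[r["level"]] for r in records])  # ordered discrete
    cities = [r["city"] for r in records]                        # unordered discrete
    if len(set(cities)) > max_distinct:
        # Too many distinct values: replace by the higher-level form.
        cities = [CITY_TIER[c] for c in cities]
    city_rows, city_names = one_hot(cities)
    columns = ["age", "level"] + ["city=" + c for c in city_names]
    rows = [[a, l] + onehot for a, l, onehot in zip(ages, levels, city_rows)]
    return rows, columns

users = [
    {"age": 20, "level": "bronze", "city": "Shenzhen"},
    {"age": 30, "level": "gold", "city": "Beijing"},
    {"age": 40, "level": "silver", "city": "Dongguan"},
]
# max_distinct is lowered to 2 only so that the toy data triggers the roll-up.
rows, columns = quantize(users, max_distinct=2)
print(columns)
print(rows)
```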

In one embodiment of the present invention, in step S6, the category of the preferred user feature is one or more categories, and when the category of the preferred user feature is one category, the weight of the feature variable of the preferred user feature is in a range greater than 1 and not greater than n-1; when the preferred category is multiple categories, the weight of the feature variable of the user feature of the multiple categories is more than 1, the sum of the weights is not more than n-1, and n is the number of the categories after the user feature clustering.

In another embodiment of the present invention, the preferred user features belong to one or more categories. When there is one preferred category, the weight of its feature variables is greater than 1 and not greater than the value at which the weighted number of user features of that category equals the sum of the numbers of user features of the other categories. When there are multiple preferred categories, the weight of the feature variables of each preferred category is greater than 1, and the sum of the weights is not greater than the value at which the total weighted number of user features of the preferred categories equals the total number of user features of the non-preferred categories. For example, if there are 800 user features in 4 user feature categories, the first to fourth categories contain 100, 300, 200 and 200 user features respectively, and the first category is the preferred one, then the weight of the first category is varied within the range greater than 1 and not greater than 7 (since 100 × 7 = 300 + 200 + 200).

The weight given to the user feature category with preference in the two embodiments can be changed within the above range to obtain different assignments, so as to obtain different clusters, and the optimal weight of the user feature category with preference can be obtained by one or more combinations in the following embodiments.
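
The upper bounds of the two embodiments for a single preferred category can be checked with a few lines of arithmetic. The function names below are illustrative; the numbers are those of the example above.

```python
def max_weight_by_category_count(n_categories):
    """First embodiment: the weight is greater than 1 and at most n - 1."""
    return n_categories - 1

def max_weight_by_feature_balance(feature_counts, preferred):
    """Second embodiment: the largest weight is the one at which the weighted
    feature count of the preferred category equals the count of all others."""
    others = sum(c for name, c in feature_counts.items() if name != preferred)
    return others / feature_counts[preferred]

counts = {"first": 100, "second": 300, "third": 200, "fourth": 200}
bound_a = max_weight_by_category_count(len(counts))
bound_b = max_weight_by_feature_balance(counts, "first")
print(bound_a, bound_b)
```

For the four categories of the example, the first embodiment allows weights up to 3 and the second up to 7.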

In an alternative embodiment, the method for giving a weight greater than 1 to the quantized discrete variable and continuous variable of the preferred user feature class includes:

counting the number n of categories after user feature clustering;

and determining the optimal weight according to the silhouette coefficient and/or the interpretability of the clusters after weighting.

Preferably, the method further comprises the following steps:

clustering results corresponding to the optimal weight values as optimal biased user characteristic clusters, wherein the clustering comprises the following steps:

calculating the silhouette coefficient of each cluster according to the formula

wherein $s_i$ is the silhouette coefficient of the i-th cluster, and $a_i$ and $b_i$ are respectively the two feature variables with the largest distance that belong to different categories in the i-th clustering result;

and repeating the above steps to obtain a curve of the silhouette coefficient as a function of the weight, observing whether the curve has an extreme point, taking the weight corresponding to the maximum silhouette coefficient as the optimal weight, and taking the clustering result corresponding to the maximum silhouette coefficient as the optimal biased user feature clustering.
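
The weight sweep described above can be sketched as follows. Because the formula is not reproduced in the text, the sketch assumes the commonly used silhouette definition (mean intra-cluster distance versus smallest mean distance to another cluster), which may differ from the patent's own formula. The function names, the toy data, and the placeholder clustering function `split_on_first_column` are hypothetical; a real run would plug in the actual clustering step.

```python
import math

def silhouette(points, labels):
    """Mean silhouette over all samples (standard definition, assumed).
    Requires at least two clusters; singleton clusters contribute 0."""
    def dist(p, q):
        return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))
    clusters = sorted(set(labels))
    total = 0.0
    for i, p in enumerate(points):
        mean_to = {}
        for c in clusters:
            ds = [dist(p, q) for j, q in enumerate(points) if labels[j] == c and j != i]
            if ds:
                mean_to[c] = sum(ds) / len(ds)
        own = labels[i]
        if own not in mean_to:
            continue
        a = mean_to[own]
        b = min(v for c, v in mean_to.items() if c != own)
        if max(a, b) > 0:
            total += (b - a) / max(a, b)
    return total / len(points)

def best_weight(points, preferred_cols, weights, cluster_fn):
    """Try each candidate weight, cluster, and keep the best silhouette."""
    scores = {}
    for w in weights:
        scaled = [[v * w if j in preferred_cols else v for j, v in enumerate(p)]
                  for p in points]
        scores[w] = silhouette(scaled, cluster_fn(scaled))
    return max(scores, key=scores.get), scores

def split_on_first_column(points):
    """Placeholder clustering: two groups split at the midpoint of column 0."""
    mid = (min(p[0] for p in points) + max(p[0] for p in points)) / 2
    return [0 if p[0] <= mid else 1 for p in points]

users = [[0.0, 0.0], [0.0, 1.0], [0.6, 0.0], [0.6, 1.0]]
best, scores = best_weight(users, {0}, [1.5, 2.0, 3.0], split_on_first_column)
print(best, scores)
```

In this toy setting the score grows monotonically with the weight, so the largest candidate wins; on real data the curve may instead show an interior maximum, which is the extreme point the text refers to.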

obtaining a quantization matrix composed of the quantized discrete variables and continuous variables of the one or more preferred user feature categories

$$B = (b_{ij})_{m \times n}$$

wherein $b_{ij}$ is the j-th feature variable of the i-th user feature;

constructing a combined weight matrix that assigns different weights, over different rounds of weighting, to the feature variables of the preferred user feature categories

$$F = W\Theta = [F_1\ F_2\ \cdots\ F_n]^T$$

$$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,l}\theta_l$$

wherein the matrix $W$ holds the weights given in the different rounds to the feature variables of the one or more preferred user feature categories; $\Theta$ is the vector of the linear coefficients of the rounds; $w_{n,l}$ is the weight given to the n-th feature variable in the l-th round, which is greater than 1 and not greater than n-1; n is the number of feature variables; l is the number of rounds of weighting; $w_l$ is the weight vector composed of the weights given in the l-th round, the sum of the weights in each weight vector being not greater than n-1; $\theta_l$ is the linear coefficient of the l-th round, with $\theta_k \ge 0$, $k = 1, 2, \ldots, l$,

$F_n$ is the combined weight of the n-th feature variable;

constructing a vector difference matrix $C$ using the vector matrix,

obtaining a weight evaluation model from the vector difference matrix and the combined weight matrix

$$M(F) = CF = CW\Theta;$$

and taking the solution of the combined weight matrix at which the first-order derivative of the weight evaluation model is zero as the optimal weight of each feature variable.
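
The combined weight $F = W\Theta$ is a plain matrix-vector product, which the following sketch computes for toy numbers. Only the constraint $\theta_k \ge 0$ stated in the text is checked; the values of `W` and `theta` are invented for illustration, and the construction of the matrix $C$ is not shown because its definition is not reproduced in the text.

```python
def combine_weights(W, theta):
    """F = W * Theta: row n of W holds the weights tried for feature
    variable n in each of the l rounds; theta holds one coefficient per round."""
    if any(t < 0 for t in theta):
        raise ValueError("linear coefficients must be non-negative")
    if any(len(row) != len(theta) for row in W):
        raise ValueError("each row of W needs one weight per round")
    return [sum(w * t for w, t in zip(row, theta)) for row in W]

# Three feature variables, two rounds of weighting (toy values).
W = [[1.5, 2.0],
     [2.0, 1.0],
     [1.0, 1.0]]
theta = [0.5, 0.5]
F = combine_weights(W, theta)
print(F)
```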

obtaining a quantization matrix composed of the quantized discrete variables and continuous variables of the one or more preferred user feature categories

$$B = (b_{ij})_{m \times n}$$

constructing a combined weight matrix that assigns different weights, over different rounds of weighting, to the feature variables of the preferred user feature categories

$$F = W\Theta = [F_1\ F_2\ \cdots\ F_n]^T$$

$$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,l}\theta_l$$

wherein the matrix $W$ holds the weights given in the different rounds to the feature variables of the one or more preferred user feature categories; $\Theta$ is the vector of the linear coefficients of the rounds; $w_{n,l}$ is the weight given to the n-th feature variable in the l-th round, which is greater than 1 and not greater than n-1; n is the number of feature variables; l is the number of rounds of weighting; $w_l$ is the weight vector composed of the weights given in the l-th round, the sum of the weights in each weight vector being not greater than n-1; $\theta_l$ is the linear coefficient of the l-th round, with $\theta_k \ge 0$, $k = 1, 2, \ldots, l$,

constructing a vector sum matrix $H$ using the vector matrix,

obtaining a weight evaluation model from the vector sum matrix and the combined weight matrix

$$M'(F) = HF = HW\Theta;$$

Constructing the weight evaluation model from the vector difference matrix reflects the differences between feature variables belonging to different user features, so that the categories are clearly distinguished when the feature variables are clustered and the result is more interpretable. Constructing the weight evaluation model from the vector sum matrix reflects the relations between different user features, so that the clustered feature variables have a good silhouette. The evaluation model can therefore also be constructed from a weighted combination of the vector difference matrix and the vector sum matrix.

In an embodiment of the present invention, the method for quantizing a discrete variable and a continuous variable includes:

determining the degree of dispersion of each discrete variable, which can be obtained by one or more of the range, the interquartile range, the variance, the standard deviation, the mean deviation, and the coefficient of variation of the word vector; for example, the degree of dispersion is evaluated by the average variance,

wherein $PC$ is the degree of dispersion of the discrete variable of a user feature, $N$ is the number of users, and $y_i$ and $o_i$ are respectively the discrete variable of the user feature of the i-th user and its expected value, the expected value being a set value that reduces the degree of dispersion;

and aggregating the discrete variables whose degree of dispersion exceeds a threshold (the threshold can be set; the higher the required clustering precision, the lower the threshold) until the degree of dispersion no longer exceeds the threshold. For example, the discrete feature of residential address can be aggregated from residential communities up to streets, and if the degree of dispersion after aggregation to streets still exceeds the threshold, the values can be further aggregated to districts/counties.
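
The aggregation loop can be sketched as follows. The dispersion formula itself is not reproduced in the text; one reading consistent with "average variance" and the symbols defined above would be $PC = \frac{1}{N}\sum_{i=1}^{N}(y_i - o_i)^2$, but that is an assumption. Since the example values here are place names rather than numbers, the sketch uses the share of distinct values as a stand-in dispersion measure and lets the caller supply another one. The roll-up tables and all names are hypothetical.

```python
# Hypothetical roll-up tables: residential community -> street -> district.
TO_STREET = {"Garden A": "Street 1", "Garden B": "Street 1",
             "Garden C": "Street 2", "Garden D": "Street 3"}
TO_DISTRICT = {"Street 1": "District X", "Street 2": "District X",
               "Street 3": "District Y"}

def distinct_ratio(values):
    """Stand-in dispersion measure: share of distinct values among all users."""
    return len(set(values)) / len(values)

def roll_up(values, mappings, threshold, dispersion=distinct_ratio):
    """Aggregate to coarser levels, one mapping at a time, until the
    dispersion no longer exceeds the threshold or no mapping is left."""
    for mapping in mappings:
        if dispersion(values) <= threshold:
            break
        values = [mapping[v] for v in values]
    return values

homes = ["Garden A", "Garden B", "Garden C", "Garden D"]
result = roll_up(homes, [TO_STREET, TO_DISTRICT], threshold=0.5)
print(result)
```

Here the community level (ratio 1.0) and the street level (ratio 0.75) both exceed the threshold of 0.5, so the values end up at district level.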

In an embodiment of the present invention, the method for clustering all quantized discrete variables and continuous variables to obtain biased user feature clusters includes:

giving different weights to perform multiple initial clustering;

constructing a tree structure from the results of the multiple initial clusterings, wherein, from the root node downward, the levels correspond in sequence to the first through the last initial clustering results, and the edge length is the proportion of the feature variables that the connected clustering results have in common to all the feature variables;

taking the ratio of the difference between the edge lengths of nodes to the longest and shortest edge lengths as the similarity between the nodes;

and clustering the nodes according to the similarity (for example, by the k-means method), and taking the intersection of the initial clusters in the clustering result as the optimal clustering result.

In addition, an embodiment of the present invention further provides a computer-readable storage medium, where the computer-readable storage medium includes a user-portrait-based clustering program, and when executed by a processor, the user-portrait-based clustering program implements the following steps:

converting the user characteristics into word vectors;

clustering the word vectors, and determining the category of each user feature;

and clustering all the discrete variables and continuous variables subjected to the quantization processing to obtain biased user characteristic clusters.

The specific implementation of the computer-readable storage medium of the present invention is substantially the same as the above-mentioned user-portrait-based clustering method and the electronic device, and will not be described herein again.

The user-portrait-based clustering method, the electronic device and the storage medium allow the weights of selected fields to be adjusted to values greater than 1 for targeted classification (for example, if a group of users is to be classified with a focus on personal attributes, the weight of that attribute category is increased), so that targeted clustering is realized.

It should be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, apparatus, article, or method that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, apparatus, article, or method. Without further limitation, an element defined by the phrase "comprising a ..." does not exclude the presence of another identical element in a process, apparatus, article, or method comprising the element.

The serial numbers of the above embodiments of the present invention are merely for description and do not represent the merits of the embodiments. From the above description of the embodiments, those skilled in the art will clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and can certainly also be implemented by hardware, although in many cases the former is the better implementation. Based on this understanding, the technical solution of the present invention, or the portion of it that contributes over the prior art, may be embodied in the form of a software product, which is stored in a storage medium (such as ROM/RAM, magnetic disk, or optical disk) as described above and includes several instructions for enabling a terminal client (which may be a mobile phone, a computer, a server, a network client, or the like) to execute the methods according to the embodiments of the present invention.

The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims

1. A user portrait based clustering method, comprising:

converting the user characteristics into word vectors;

clustering the word vectors, and determining the category of each user feature;

screening out the preferred category of user features, and assigning a weight value greater than 1 to the quantized discrete variables and continuous variables belonging to the preferred user feature category, wherein the preference refers to the bias of the clustering process;

clustering all the discrete variables and continuous variables subjected to the quantization processing to obtain biased user characteristic clusters,

the quantization processing of the discrete variable and the continuous variable comprises the following steps:

judging the discrete degree of the discrete variable, wherein the discrete degree is evaluated by the average variance,

wherein PC is the discrete degree of the discrete variable of a user characteristic, N is the number of users, and $y_i$ and $o_i$ are respectively the discrete variable of the user characteristic for the i-th user and its expected value, the expected value being a set value that reduces the discrete degree,

aggregating the discrete variables whose discrete degree exceeds a threshold until the discrete degree no longer exceeds the threshold, wherein the method for quantizing the discrete variables and the continuous variables comprises the following steps:

converting discrete variables with order into numerical form;

converting unordered discrete variables whose number of distinct values exceeds a set number into a high-order form;

encoding the discrete variable converted into the high-order form;

normalizing the ordered discrete variables, the encoded discrete variables and the continuous variables,

the method for assigning a weight value greater than 1 to the quantized discrete variables and continuous variables of the preferred user feature category comprises the following steps:

obtaining a quantization matrix formed by the quantized discrete variables and continuous variables of the one or more preferred categories of user features;

$$B = (b_{ij})_{m \times n}$$

m represents the number of user features;

constructing a combined weight matrix that assigns different weights, in different weighting rounds, to the feature variables of the preferred user feature categories;

$$F = W\Theta = [F_1 \; F_2 \; \cdots \; F_n]^{T}$$

$$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,p}\theta_p$$

wherein the matrix $W$ holds the weight values assigned in the successive weighting rounds to the feature variables of one or more categories of user features; $\Theta$ is the vector of linear coefficients of the weighting rounds; $w_{n,p}$ is the weight value assigned to the n-th feature variable in the p-th round, which is greater than 1 and not greater than n-1; n is the number of feature variables; p is the number of weighting rounds; $w_p$ is the weight vector composed of the weight values assigned in the p-th round, the sum of the weight values in each weight vector being not greater than n-1; $\theta_p$ is the linear coefficient of the p-th weighting round, with $\theta_k \ge 0$, $k = 1, 2, \ldots, p$;

$F_n$ is the combined weight of the n-th feature;

a vector difference matrix C is constructed using the quantization matrices,

obtaining a weight evaluation model according to the vector difference matrix and the combined weight matrix;

$$M(F) = CF = CW\Theta.$$
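The quantization steps recited in claim 1 can be sketched as follows. The formula for PC is not reproduced in this text, so the sketch assumes the usual mean squared deviation, $PC = \frac{1}{N}\sum_{i=1}^{N}(y_i - o_i)^2$. One-hot encoding and min-max normalization are likewise assumptions standing in for the unspecified encoding and normalization, and all function names and sample values are hypothetical.

```python
def discrete_degree(values, expected):
    """Assumed form of PC: mean squared deviation of y_i from its expected value o_i."""
    n = len(values)
    return sum((y - o) ** 2 for y, o in zip(values, expected)) / n

def encode_ordered(values, order):
    """Convert an ordered discrete variable to numerical form (its rank in `order`)."""
    rank = {level: i for i, level in enumerate(order)}
    return [rank[v] for v in values]

def one_hot(values):
    """Encode an unordered discrete variable as one-hot vectors (assumed encoding)."""
    levels = sorted(set(values))
    return [[1.0 if v == level else 0.0 for level in levels] for v in values]

def min_max(values):
    """Normalize numerical values to the range [0, 1] (assumed normalization)."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical ordered variable: education level.
education = ["primary", "bachelor", "master", "bachelor"]
ranks = encode_ordered(education, ["primary", "bachelor", "master"])
print(ranks)            # [0, 1, 2, 1]
print(min_max(ranks))   # [0.0, 0.5, 1.0, 0.5]

# Hypothetical unordered variable: city code.
print(one_hot(["SZ", "BJ", "SZ"]))   # [[0.0, 1.0], [1.0, 0.0], [0.0, 1.0]]

# Discrete degree of three values around an expected value of 2.
print(discrete_degree([1, 2, 3], [2, 2, 2]))
```

A variable whose `discrete_degree` exceeds the chosen threshold would be aggregated into coarser levels and checked again before encoding; the aggregation rule itself is not specified here and is left out of the sketch.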

2. The user portrait-based clustering method according to claim 1, wherein the method for assigning a weight value greater than 1 to the quantized discrete variables and continuous variables of the preferred user feature category comprises:

counting the number n of the characteristic variables after the user characteristic clustering;

3. The user portrait-based clustering method of claim 2, further comprising, after the step of determining the optimal weight according to the contour coefficient and/or the interpretability of the weighted clusters:

calculating the contour coefficient of each cluster according to the following formula

wherein $s_i$ is the contour coefficient of the i-th cluster, and $a_i$ and $b_i$ are respectively two feature variables with the maximum distance belonging to different categories in the i-th clustering result;
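The formula referred to in claim 3 is not reproduced in this text. As a hedged illustration only, the sketch below computes the commonly used silhouette coefficient (the usual English name for the contour coefficient), $s = (b - a) / \max(a, b)$, where $a$ is the mean distance from a point to the other members of its own cluster and $b$ is the smallest mean distance to the members of any other cluster. These definitions of $a$ and $b$ are the standard ones and may differ from the $a_i$ and $b_i$ recited above; the function name and the sample data are hypothetical.

```python
import math

def silhouette(points, labels):
    """Return the standard per-point silhouette values s = (b - a) / max(a, b)."""
    def dist(p, q):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(p, q)))

    scores = []
    for i, p in enumerate(points):
        # Collect the distances from point i to every other point, grouped by cluster.
        by_cluster = {}
        for j, q in enumerate(points):
            if j != i:
                by_cluster.setdefault(labels[j], []).append(dist(p, q))
        own = by_cluster.pop(labels[i], [])
        if not own or not by_cluster:
            scores.append(0.0)  # singleton cluster, or only one cluster overall
            continue
        a = sum(own) / len(own)
        b = min(sum(d) / len(d) for d in by_cluster.values())
        denom = max(a, b)
        scores.append((b - a) / denom if denom else 0.0)
    return scores

# Two well-separated hypothetical clusters on a line.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
labels = [0, 0, 1, 1]
scores = silhouette(points, labels)
print(sum(scores) / len(scores))  # mean silhouette; values near 1 indicate good separation
```

When several candidate weights are tried, the weighted clustering with the highest mean value would be the natural choice under this criterion, subject to the interpretability check mentioned in the claim.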

4. The user portrait-based clustering method according to claim 1, wherein there are one or more preferred categories of user features; when there is one preferred category, the weight of the feature variables of the preferred user features is greater than 1 and not greater than n-1; when there are multiple preferred categories, the weight of the feature variables of each preferred category is greater than 1 and the sum of the weights is not greater than n-1, where n is the number of categories obtained by clustering the user features.
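The bounds in claim 4 reduce to a simple check, sketched below under the assumption that one weight is supplied per preferred category; the function name `weights_are_valid` is hypothetical.

```python
def weights_are_valid(preferred_weights, n):
    """Check that every preferred weight is > 1 and that the weights sum to at most n - 1.

    `preferred_weights` holds one weight per preferred category and `n` is the
    number of categories obtained by clustering the user features.
    """
    if not preferred_weights:
        return False
    return all(w > 1 for w in preferred_weights) and sum(preferred_weights) <= n - 1

# With n = 5 categories the upper bound is 4.
print(weights_are_valid([3.0], 5))       # True: one preferred category, 1 < 3 <= 4
print(weights_are_valid([1.5, 2.0], 5))  # True: each weight > 1 and 3.5 <= 4
print(weights_are_valid([2.5, 2.0], 5))  # False: the sum 4.5 exceeds 4
```

For a single preferred category the same check covers the stated range, since a one-element sum is just the weight itself.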

5. The user portrait based clustering method of claim 4, wherein the method for assigning a weight greater than 1 to the quantized discrete variable and continuous variable of the preferred user feature category further comprises:

$$B = (b_{ij})_{m \times n}$$

$$F = W\Theta = [F_1 \; F_2 \; \cdots \; F_n]^{T}$$

$$F_n = w_{n,1}\theta_1 + w_{n,2}\theta_2 + \cdots + w_{n,p}\theta_p$$

wherein the matrix $W$ holds the weight values assigned in the successive weighting rounds to the feature variables of one or more categories of user features; $\Theta$ is the vector of linear coefficients of the weighting rounds; $w_{n,p}$ is the weight value assigned to the n-th feature variable in the p-th round, which is greater than 1 and not greater than n-1; n is the number of feature variables; p is the number of weighting rounds; $w_p$ is the weight vector composed of the weight values assigned in the p-th round, the sum of the weight values in each weight vector being not greater than n-1; $\theta_p$ is the linear coefficient of the p-th weighting round, with $\theta_k \ge 0$, $k = 1, 2, \ldots, p$;

$F_n$ is the combined weight of the n-th feature;

a vector sum matrix H is constructed using the quantization matrix,

obtaining a weight evaluation model according to the vector sum matrix and the combined weight matrix;

$$M'(F) = HF = HW\Theta.$$
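The combined weight $F = W\Theta$ and the evaluation $M'(F) = HF$ are plain matrix-vector products, as the following sketch shows. The construction of H from the quantization matrix is not detailed in this section, so H is treated as a caller-supplied matrix; the function names and all numerical values are hypothetical, and the bounds on the individual weights are left to the caller.

```python
def combined_weights(W, theta):
    """Compute F = W * Theta, i.e. F_n = w_{n,1}*theta_1 + ... + w_{n,p}*theta_p.

    W has one row per feature variable and one column per weighting round;
    theta holds one non-negative linear coefficient per round.
    """
    if any(t < 0 for t in theta):
        raise ValueError("every linear coefficient theta_k must be >= 0")
    if any(len(row) != len(theta) for row in W):
        raise ValueError("each row of W needs one weight per weighting round")
    return [sum(w * t for w, t in zip(row, theta)) for row in W]

def evaluate(H, F):
    """Compute M'(F) = H * F for a caller-supplied matrix H (assumed given)."""
    return [sum(h * f for h, f in zip(row, F)) for row in H]

# Two feature variables weighted in two rounds; the first one is preferred.
W = [[2.0, 1.5],
     [1.0, 1.0]]
theta = [0.5, 0.5]
F = combined_weights(W, theta)
print(F)                 # [1.75, 1.0]

# Placeholder H, used only to show the shape of the computation.
H = [[1.0, 0.0],
     [1.0, 1.0]]
print(evaluate(H, F))    # [1.75, 2.75]
```

The same `evaluate` function applies to $M(F) = CF$ when the vector difference matrix C is supplied in place of H.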

6. The user portrait-based clustering method according to claim 1, wherein the method for assigning a weight value greater than 1 to the quantized discrete variables and continuous variables of the preferred user feature category comprises:

the weight assigned to a preferred user feature category ranges from more than 1 up to the value at which the number of user features of that category equals the sum of the numbers of user features of the other categories.

7. An electronic device comprising a memory and a processor, the memory storing a user portrait-based clustering program which, when executed by the processor, implements the steps of the user portrait-based clustering method according to any one of claims 1 to 6.

8. A computer-readable storage medium comprising a user portrait-based clustering program which, when executed by a processor, implements the steps of the user portrait-based clustering method according to any one of claims 1 to 6.