CN116401567B - Clustering model training, user clustering and information pushing method and device


Info

Publication number
CN116401567B
CN116401567B (application CN202310653728.3A)
Authority
CN
China
Prior art keywords
cluster
center
user
class
parent
Prior art date
Legal status
Active
Application number
CN202310653728.3A
Other languages
Chinese (zh)
Other versions
CN116401567A (en)
Inventor
赵耀
卢星宇
马文琪
曾晓东
顾进杰
张冠男
Current Assignee
Alipay Hangzhou Information Technology Co Ltd
Original Assignee
Alipay Hangzhou Information Technology Co Ltd
Priority date
Filing date
Publication date
Application filed by Alipay Hangzhou Information Technology Co Ltd filed Critical Alipay Hangzhou Information Technology Co Ltd
Priority to CN202310653728.3A priority Critical patent/CN116401567B/en
Publication of CN116401567A publication Critical patent/CN116401567A/en
Application granted granted Critical
Publication of CN116401567B publication Critical patent/CN116401567B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/23 - Clustering techniques
    • G06F18/231 - Hierarchical techniques, i.e. dividing or merging pattern sets so as to obtain a dendrogram
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00 - Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90 - Details of database functions independent of the retrieved data types
    • G06F16/903 - Querying
    • G06F16/9035 - Filtering based on additional data, e.g. user or group profiles
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/21 - Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 - Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G - PHYSICS
    • G06 - COMPUTING; CALCULATING OR COUNTING
    • G06F - ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 - Pattern recognition
    • G06F18/20 - Analysing
    • G06F18/22 - Matching criteria, e.g. proximity measures
    • Y - GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 - TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D - CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D30/00 - Reducing energy consumption in communication networks
    • Y02D30/70 - Reducing energy consumption in communication networks in wireless communication networks

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Databases & Information Systems (AREA)
  • Computational Linguistics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiments of this specification provide a method and apparatus for training a clustering model, clustering users, and pushing information. The clustering model comprises multiple layers of class clusters, including parent class clusters and the child class clusters they contain. The variables to be learned of the clustering model include the end sub-class cluster centers. In one training iteration, the clustering model determines, from the class cluster centers of the multi-layer class clusters, the end sub-class cluster center matching the user features, thereby obtaining the end sub-class cluster to which the user sample belongs. A prediction loss is determined based on the similarity between the matched end sub-class cluster center and the user features, and the variables to be learned are updated based on that loss. After the clustering model is trained, the correspondence between class cluster identifiers and class cluster centers is derived from the model, and the model can determine the class cluster identifier to which a new user sample belongs. In an information-pushing scenario, this correspondence can be used to look up the class cluster center for a user identifier, and that cluster center serves as the feature vector of the user sample for information pushing.

Description

Clustering model training, user clustering and information pushing method and device
Technical Field
One or more embodiments of the present disclosure relate to the field of computer technologies, and in particular, to a method and apparatus for training a clustering model, clustering users, and pushing information.
Background
Feature clustering is the operation of grouping objects based on the correlation between their features. In the field of online services, a service platform typically provides services to users. To improve the quality of service, a service platform needs to cluster its users based on user features. For example, in a push scenario, the platform can push more relevant information to a user based on that user's cluster information. User features are obtained only where the user has granted authorization. When a platform serves a very large number of users, clustering them becomes a real challenge. In addition, service platforms pay increasing attention to protecting the private data among their service data, and private data must not be sent out in the clear.
Improved schemes are therefore desired that allow user features to be clustered more efficiently in large-scale scenarios.
Disclosure of Invention
One or more embodiments of the present specification describe a method and apparatus for training a clustering model, clustering users, and pushing information, so as to enable more efficient clustering of user features in large-scale scenarios. The specific technical scheme is as follows.
In a first aspect, an embodiment provides a method for training a cluster model, where the cluster model includes a multi-layer class cluster, and the multi-layer class cluster includes a parent class cluster and a child class cluster included in the parent class cluster; the variables to be learned of the clustering model comprise an end sub-cluster center; the clustering model is used for determining an end sub-class cluster matched with a user characteristic based on the multi-layer class cluster, and the method comprises the following steps:
acquiring user characteristics of a first user sample;
determining the center of an end sub-class cluster matched with the user characteristics based on the class cluster center of the multi-layer class cluster through the clustering model to obtain an end sub-class cluster to which the first user sample belongs;
determining a prediction loss based on a similarity between the matched end sub-cluster center and the user feature;
and updating the variable to be learned based on the prediction loss.
In one embodiment, after updating the variable to be learned, the method further comprises:
and updating the parent cluster center of each layer based on the updated end sub-class cluster center.
In one embodiment, the multi-layer cluster center is initialized in the following manner:
randomly generating a top-level parent cluster center;
and generating an initial value of the center of each layer of sub-cluster based on a preset offset variance of the center of each layer of sub-cluster relative to the center of the parent cluster.
In one embodiment, the step of determining the center of the end sub-class cluster matching the user feature based on the class cluster center of the multi-layer class cluster includes:
and matching the class cluster centers with the user features layer by layer in the sequence from the center of the top-level parent class cluster to the center of the tail-end sub-class cluster.
In one embodiment, the step of determining the predicted loss comprises:
determining a first prediction loss based on a similarity between the matched end sub-cluster center and the user feature;
determining a second predictive loss based on the similarity between the multi-layer cluster centers;
the predicted loss is determined based on the first predicted loss and the second predicted loss.
In one embodiment, the similarity between the multi-layer cluster centers includes one or more of the following:
similarity between the cluster center of the same layer and the other cluster centers of the same layer;
similarity between the cluster center and its parent cluster center;
similarity between a class cluster center and the sibling class cluster centers of its parent class cluster.
In one embodiment, the step of determining the first predictive loss includes:
and calculating the vector distance between the center of the matched terminal sub-class cluster and the user characteristic, and taking the vector distance as the similarity.
In one embodiment, the predicted loss comprises the first predicted loss and the second predicted loss; the step of updating the variable to be learned includes:
determining a first correction amount for the center of the corresponding end sub-cluster based on the first prediction loss;
determining a second correction amount for the center of the corresponding end sub-cluster based on the second prediction loss;
updating the center of the corresponding end sub-class cluster based on superposition of the first correction amount and the second correction amount.
In one embodiment, the variable to be learned is updated in the following manner:
determining cold class clusters and hot class clusters based on the number of first user samples contained in each parent class cluster;
obtaining a class cluster center vector of a hot class cluster;
and updating the center vectors of the sub-class clusters of each layer contained in a cold class cluster based on the difference between the hot class cluster center vector and the cold class cluster center vector.
In one embodiment, the variable to be learned further includes a parent cluster center; the step of updating the variable to be learned includes:
and after updating any first parent cluster center, updating the child cluster center contained in the first parent cluster center based on the updated first parent cluster center.
In one embodiment, the child cluster center is obtained based on superposition of its parent cluster center and the offset; the step of updating the variable to be learned includes:
and updating the top-layer class cluster center vectors directly and the offsets of the class cluster centers of the other layers, based on the corresponding prediction losses.
In one embodiment, the variable to be learned further includes a parent cluster center; the predicted loss includes the first predicted loss and the second predicted loss; the step of updating the variable to be learned includes:
determining a first correction amount for the center of the corresponding end sub-cluster based on the first prediction loss;
when the second prediction loss is determined based on the similarity between any second class cluster center and any third class cluster center, determining correction amounts for the second class cluster center and the third class cluster center, respectively, based on the second prediction loss;
and updating the sub-cluster center based on the correction amount of the sub-cluster center and the correction amount of the parent cluster center of each layer of the sub-cluster center for any sub-cluster center.
In one embodiment, after the cluster model is trained, the method further comprises:
acquiring a plurality of tail end sub-class clusters from the clustering model to obtain a first corresponding relation between a class cluster identifier and a class cluster center; and storing the first corresponding relation.
In a second aspect, an embodiment provides a method for determining a cluster of clusters of a user sample, including:
acquiring a second user sample of a cluster to be determined;
acquiring a trained cluster model, wherein the cluster model is trained by adopting the method provided by the first aspect;
and determining the center of a tail end sub-class cluster matched with the user characteristics of the second user sample based on the cluster center of the multi-layer cluster through the cluster model to obtain the cluster to which the second user sample belongs.
In one embodiment, after obtaining the cluster identifier of the cluster to which the plurality of user samples belong, the method further includes:
a second correspondence between the user identification of the user sample and the cluster-like identification is stored.
In a third aspect, an embodiment provides an information pushing method, including:
acquiring a user identifier of a third user sample of information to be pushed, and taking the user identifier as a first user identifier;
determining a first cluster identifier corresponding to the first user identifier from a second corresponding relation between the user identifier and the cluster identifier; the second corresponding relation is obtained by predicting a user sample through a clustering model, and the clustering model is trained by adopting the method provided by the first aspect;
Determining a class cluster center corresponding to the first class cluster identifier from a first corresponding relation between the class cluster identifier and the class cluster center, and taking the class cluster center as a sample characteristic of the third user sample; wherein the first correspondence is obtained from the cluster model;
and determining push information for the third user sample through an information push model by using the sample characteristics.
In a fourth aspect, an embodiment provides a training device of a cluster model, where the cluster model includes a multi-layer class cluster, and the multi-layer class cluster includes a parent class cluster and a child class cluster included in the parent class cluster; the variables to be learned of the clustering model comprise an end sub-cluster center; the clustering model is used for determining an end sub-class cluster matched with a user characteristic based on the multi-layer class cluster, and the device comprises:
the first acquisition module is configured to acquire user characteristics of a first user sample;
the first matching module is configured to determine the center of an end sub-class cluster matched with the user characteristics based on the cluster center of the multi-layer cluster through the clustering model to obtain an end sub-class cluster to which the first user sample belongs;
a first penalty module configured to determine a predicted penalty based on a similarity between the matched end sub-cluster center and the user feature;
And the first updating module is configured to update the variable to be learned based on the prediction loss.
In a fifth aspect, an embodiment provides a device for determining a cluster of a user sample, including:
the second acquisition module is configured to acquire a second user sample of the cluster to be determined;
the third acquisition module is configured to acquire a trained cluster model, and the cluster model is trained by adopting the method provided by the first aspect;
and the second matching module is configured to determine the center of the tail end sub-class cluster matched with the user characteristics of the second user sample based on the cluster centers of the multi-layer clusters through the clustering model to obtain the cluster to which the second user sample belongs.
In a sixth aspect, an embodiment provides an information pushing apparatus, including:
the fourth acquisition module is configured to acquire a user identifier of a third user sample of information to be pushed as a first user identifier;
the first determining module is configured to determine a first cluster identifier corresponding to the first user identifier from a second corresponding relation between the user identifier and the cluster identifier; the second corresponding relation is obtained by predicting a user sample through a clustering model, and the clustering model is trained by adopting the method provided by the first aspect;
The second determining module is configured to determine a class cluster center corresponding to the first class cluster identifier from a first corresponding relation between the class cluster identifier and the class cluster center, and the class cluster center is used as a sample characteristic of the third user sample; wherein the first correspondence is obtained from the cluster model;
and a third determining module configured to determine push information for the third user sample through an information push model using the sample features.
In a seventh aspect, an embodiment provides a computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of the first to third aspects.
In an eighth aspect, an embodiment provides a computing device comprising a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any one of the first to third aspects.
In the methods and apparatuses provided by the embodiments of this specification, the clustering model determines the class cluster matching the user features based on the multi-layer class clusters. After the model is trained, it can predict the class cluster corresponding to any user features to be clustered. In large-scale user scenarios, the clustering model is trained with the user features of only part of the users; not all users need to participate in the clustering process, and the clustering model can directly predict the class cluster a user belongs to, so that user features are clustered more efficiently and clustering efficiency is improved.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present invention, the drawings required for describing the embodiments are briefly introduced below. The drawings in the following description are evidently only some embodiments of the present invention; a person of ordinary skill in the art may derive other drawings from them without inventive effort.
FIG. 1 is a schematic illustration of an implementation scenario of an embodiment disclosed herein;
fig. 2 is a flow chart of a training method of a cluster model according to an embodiment;
FIG. 3 is a schematic diagram of a data relationship between multi-layer cluster centers;
fig. 4 is a flowchart illustrating a method for determining a cluster of a user sample according to an embodiment;
fig. 5 is a schematic flow chart of an information pushing method according to an embodiment;
FIG. 6 is a schematic block diagram of a training apparatus for a cluster model provided in an embodiment;
FIG. 7 is a schematic block diagram of a device for determining a cluster of a user sample according to an embodiment;
fig. 8 is a schematic block diagram of an information pushing device according to an embodiment.
Detailed Description
The following describes the scheme provided in the present specification with reference to the drawings.
Fig. 1 is a schematic diagram of an implementation scenario of an embodiment disclosed in the present specification. The clustering model contains class clusters from layer 0 to layer 2, and each box represents the class cluster center of one class cluster. Clusters 3 and 4 are child clusters of parent cluster 1, and clusters 5 and 6 are child clusters of parent cluster 3. During training of the clustering model, the user features of a user sample are input into the model, which matches them against the multi-layer class clusters in hierarchical order, from layer 0 downward to layer 2, until an end sub-class cluster is reached. The clustering model is then updated based on the similarity between the user features and the class cluster centers. Fig. 1 is merely an illustration of a multi-layer class cluster and is not intended to limit the embodiments of the present application.
The clustering model comprises multiple layers of class clusters, and each layer contains one or more class clusters. The multi-layer class clusters are divided into parent class clusters and child class clusters, and each parent class cluster contains its corresponding child class clusters. A child class cluster lies near its parent class cluster; that is, the similarity between the child cluster center and the parent cluster center, measured in this specification as a vector distance, is smaller than a preset, fairly small value. Viewed hierarchically, the multi-layer class clusters comprise top-level parent class clusters, middle-layer class clusters and end sub-class clusters. The top, the start of the hierarchy, is the top-level parent class cluster, i.e. the first-layer class cluster, which is also a first-level parent class cluster. The end, the last part of the hierarchy, is the end sub-class cluster, i.e. the last-layer child class cluster. A top-level class cluster is a parent class cluster, a middle-layer class cluster is both a parent class cluster and a child class cluster, and an end class cluster is a child class cluster. "Top" and "end" are illustrative names for the two ends of the multi-layer class cluster hierarchy.
A class cluster has a class cluster identifier (ID) and a class cluster center. The class cluster center may be represented as a vector or in another form. Class clusters are obtained by clustering a large number of user samples. Clustering divides the samples in a data set into different classes or clusters according to a specific criterion, so that samples within the same class cluster are as similar as possible while samples in different class clusters differ as much as possible. The specific criterion may be, for example, a distance criterion.
In scenarios with large-scale user samples, in order to reduce the computation required for clustering and improve clustering efficiency, the embodiments of the present specification treat the class cluster centers as variables to be learned, and gradually learn more reasonable and accurate class cluster centers through model iteration. The variables to be learned may also be referred to as learnable variables.
In the clustering model, the number of layers of class clusters and the number of child class clusters contained in each parent class cluster can be preset according to experience and requirements. The dimension of the class cluster center vector may also be preset. Suppose the number of cluster layers is h, each parent class cluster contains n child class clusters, and the class cluster center vectors have dimension d. When h is 3, the class cluster centers of the three layers can be expressed as tensors of shapes [n, d], [n, n, d] and [n, n, n, d].
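As an illustrative, non-limiting sketch (the names n, d and centers_l0/l1/l2 are assumptions made for this example, not part of the disclosure), the three layers of class cluster centers could be held as plain arrays:

    import numpy as np

    n, d = 4, 8                                  # illustrative sizes, h = 3 layers
    rng = np.random.default_rng(0)
    centers_l0 = rng.normal(size=(n, d))         # shape [n, d]: top-level parent centers
    centers_l1 = rng.normal(size=(n, n, d))      # shape [n, n, d]: middle-layer centers
    centers_l2 = rng.normal(size=(n, n, n, d))   # shape [n, n, n, d]: end sub-class cluster centers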
The clustering model is used to determine the end sub-class cluster that matches the user features based on the multi-layer class clusters. The matched end sub-class cluster is the class cluster to which the user sample belongs, so the end sub-class cluster centers need to be set as variables to be learned. In one embodiment, the parent cluster centers are not variables to be learned; they are updated based on the end sub-class cluster centers. For example, in the multi-layer class cluster of Fig. 1, the variables to be learned include the end sub-class cluster centers 5, 6 and 7, among others.
In various embodiments, the parent cluster center may also be set as the variable to be learned. For example, all class cluster centers in a multi-layer class cluster may be set as variables to be learned. For example, 1, 2, 3, 4, 5, 6, 7, and the like in fig. 1 may be set as variables to be learned.
In the following description of the embodiments, the description will be made with respect to the case of different variables to be learned.
Fig. 2 is a flow chart of a training method of a cluster model according to an embodiment. The cluster model contains a multi-layer class cluster that includes a parent class cluster and its contained child class clusters. The variables to be learned of the cluster model include the end sub-class cluster center. The method may be performed by a computing device. The computing device may be implemented by any means, device, platform, cluster of devices, etc. having computing, processing capabilities. The method comprises the following steps.
In step S210, a user characteristic F1 of the first user sample U1 is acquired.
The first user sample U1 may be a batch of samples or a single sample. For brevity and clarity of description, the first user sample is described as a single sample.
The user feature F1 may be an embedding vector derived from user attribute features, user behavior features and other user-related features. An embedding vector is obtained by mapping the discrete features of a user sample into a multi-dimensional vector; the mapped vector is called an embedding vector. It should be noted that the user features in this embodiment are used only with the user's explicit authorization, and are strictly privacy-protected so that no private data can be leaked.
In step S220, determining, by using a clustering model, a center of a terminal sub-class cluster matching with the user feature F1 based on a cluster center of the multi-layer cluster, to obtain a terminal sub-class cluster to which the first user sample U1 belongs.
In the initial state, the multi-layer class cluster centers may be initialized in a manner that suits both the case where all class cluster centers are variables to be learned and the case where only the end sub-class cluster centers are. In this manner, the top-level parent cluster centers are randomly generated. Then, initial values of the child cluster centers of each layer are generated based on a preset offset variance of each layer's child cluster centers relative to their parent cluster centers. The preset offset variance decreases gradually from higher layers to lower layers. Taking the multi-layer class cluster of Fig. 1 as an example, several vectors are randomly generated as the layer-0 class cluster centers. When generating a layer-1 class cluster center vector, a small offset can be superimposed on its parent cluster center. For example, the class cluster center vectors of clusters 3 and 4 can be obtained by superimposing small offsets on the class cluster center vector of their parent cluster 1, where the offsets of 3 and 4 are drawn according to the corresponding preset offset variances. When setting the offset variances, the variance of layer 0 may be set to 1, that of layer 1 to a value such as 0.1 or 0.05, and that of layer 2 to a value such as 0.001. Reasonable initialization values allow the training process of the clustering model to converge quickly.
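A minimal sketch of this initialization, assuming h = 3 layers, n children per parent, and the example variances above (the function name init_centers and all parameter names are assumptions):

    import numpy as np

    def init_centers(n, d, variances=(1.0, 0.05, 0.001), seed=0):
        rng = np.random.default_rng(seed)
        stds = [v ** 0.5 for v in variances]          # variance -> standard deviation
        # Layer 0: randomly generated top-level parent cluster centers.
        l0 = rng.normal(scale=stds[0], size=(n, d))
        # Layer 1: each child center = its parent center plus a small offset.
        l1 = l0[:, None, :] + rng.normal(scale=stds[1], size=(n, n, d))
        # Layer 2: the same construction, one level further down.
        l2 = l1[:, :, None, :] + rng.normal(scale=stds[2], size=(n, n, n, d))
        return l0, l1, l2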
When matching the multi-layer class cluster centers against the user feature F1, the class cluster centers can be matched with F1 layer by layer, in order from the top-level parent cluster centers to the end sub-class cluster centers. Once the class cluster center matching F1 is determined in one layer, matching continues among the child cluster centers contained in the matched cluster, until an end sub-class cluster center is matched. To determine the matching class cluster center within a layer, the similarity between each of that layer's candidate centers and F1 can be computed, and the most similar center is taken as the match.
For example, in Fig. 1, suppose the class cluster center matched from layer 0 is cluster 1. Matching then continues between the user feature F1 and the child clusters 3 and 4 of cluster 1; suppose the most similar center is cluster 3. Matching continues with clusters 5 and 6, and if the successfully matched center is 5, then cluster 5 is the end sub-class cluster to which the first user sample U1 belongs.
Layer-by-layer matching reaches the end sub-class cluster center quickly. With n child clusters per parent and h layers, it compares roughly h x n centers instead of all n^h end centers, so especially when there are many end sub-class cluster centers, layer-by-layer matching greatly shortens the matching process and improves matching efficiency.
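A sketch of this greedy descent, reusing the arrays from the initialization sketch above and using the squared Euclidean distance as the similarity measure, in line with the distance-based similarity described below (the function name match is an assumption):

    import numpy as np

    def match(user_feature, l0, l1, l2):
        # Pick the closest center at each layer, then descend into its children.
        i = int(np.argmin(((l0 - user_feature) ** 2).sum(axis=-1)))
        j = int(np.argmin(((l1[i] - user_feature) ** 2).sum(axis=-1)))
        k = int(np.argmin(((l2[i, j] - user_feature) ** 2).sum(axis=-1)))
        return (i, j, k), l2[i, j, k]    # index path and matched end center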
In step S230, a prediction loss L is determined based on the similarity between the matched end sub-cluster center and the user feature F1. In step S240, the variable to be learned is updated based on the prediction loss L.
In a specific implementation, the prediction loss L may be determined directly from the similarity between the matched end sub-class cluster center and the user feature F1; when the loss is computed, an end sub-class cluster center that is too far from F1 is penalized. In addition, the prediction loss L may also be determined based on the similarity between the various class cluster centers.
In one embodiment, a first prediction loss loss1 is determined based on the similarity between the matched end sub-class cluster center and the user feature F1, and a second prediction loss loss2 is determined based on the similarity between the multi-layer class cluster centers. The similarity between the multi-layer class cluster centers may include any of the following.
Similarity between a class cluster center and the other class cluster centers of the same layer. For example, in Fig. 1, the similarity between the layer-0 class cluster centers 1 and 2, between 3 and 4 in layer 1, between the layer-2 class cluster centers 5, 6 and 7, and so on. Class clusters that are too close together within the same layer are penalized when the loss is computed.
Similarity between a class cluster center and its parent class cluster center. For example, the similarity between class cluster centers 3 and 4 and their parent cluster center 1, between class cluster centers 5 and 6 and their parent cluster center 3, between class cluster center 7 and its parent cluster center 4, and so on. When the loss is computed, class clusters that are too far from their parent cluster are penalized.
Similarity between a class cluster center and the sibling class clusters of its parent. For example, the similarity between the class cluster centers 5 and 6 and the class cluster center 4, between class cluster center 7 and class cluster center 3, between class cluster centers 3 and 4 and class cluster center 2, and so on. When the loss is computed, class clusters that are too close to the sibling class clusters of their parent are penalized.
After the first prediction loss loss1 and the second prediction loss loss2 have been determined, the prediction loss L can be obtained from them, for example by taking loss1 and loss2 together as the prediction loss L.
The above similarities may be computed in a variety of ways. When the class cluster centers and the user features are both represented as vectors, the vector distance between the matched end sub-class cluster center and the user feature F1 can be computed and used as the similarity. The vector distance may be computed with functions such as the Euclidean distance, the cosine distance or the mean square error. The similarity may also be computed in various other ways.
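A sketch of one possible loss under these definitions; the squared Euclidean distance, the hinge-style margins and all parameter names are assumptions made for illustration, not values fixed by the disclosure:

    import numpy as np

    def prediction_loss(user_feature, path, l1, l2, margin_same=1.0, margin_uncle=1.0):
        i, j, k = path                              # index path from the matching step
        center = l2[i, j, k]
        # loss1: penalize an end center that is too far from the user feature.
        loss1 = ((center - user_feature) ** 2).sum()
        # loss2a: penalize same-layer siblings that are too close to each other
        # (shown here for the matched end center against its layer-2 siblings).
        siblings = np.delete(l2[i, j], k, axis=0)
        d_sib = ((siblings - center) ** 2).sum(axis=-1)
        loss2a = np.maximum(0.0, margin_same - d_sib).sum()
        # loss2b: penalize a child that drifts too far from its parent center.
        loss2b = ((center - l1[i, j]) ** 2).sum()
        # loss2c: penalize a child that is too close to its parent's siblings.
        uncles = np.delete(l1[i], j, axis=0)
        d_unc = ((uncles - center) ** 2).sum(axis=-1)
        loss2c = np.maximum(0.0, margin_uncle - d_unc).sum()
        return loss1 + loss2a + loss2b + loss2c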
Steps S210 to S240 constitute one iteration of training. In a specific implementation, the prediction losses for a batch of user samples may be determined through steps S210 to S230, and the clustering model updated based on the prediction losses of that batch. Training of the clustering model is complete when the prediction loss falls below a certain threshold or the number of training iterations reaches a set threshold.
A specific embodiment of step S240 is described below. When the variables to be learned of the clustering model comprise the end sub-class cluster centers, the parent cluster centers of each layer are determined from their corresponding child cluster centers. That is, the class cluster center of a parent class cluster is determined from the class cluster centers of the child class clusters it contains.
In this embodiment, every prediction loss translates into an adjustment of the end sub-class cluster centers. For example, when the prediction loss L includes the first prediction loss loss1 and the second prediction loss loss2, a first correction amount for the corresponding end sub-class cluster center may be determined based on loss1, a second correction amount for that center may be determined based on loss2, and the end sub-class cluster center is updated by superimposing the first and second correction amounts.
For example, suppose the end sub-class cluster center matching the first user sample U1 is class cluster center 5 in Fig. 1. When the distance between class cluster center 5 and the user feature F1 of U1 is greater than a preset first distance threshold, a correction amount 1 for center 5 may be determined based on that distance. When the distance between class cluster center 5 and class cluster center 3 is greater than a preset second distance threshold, a correction amount 2 for center 5 is determined based on that distance. When the distance between class cluster center 3 and class cluster center 4 is smaller than a preset third distance threshold, correction amounts for the child cluster centers contained in clusters 3 and 4 are determined, among them a correction amount 3 for center 5. Class cluster center 5 is then updated by superimposing correction amounts 1, 2 and 3.
In any iteration, after the end sub-class cluster centers have been updated, the parent cluster centers of each layer can be updated based on the updated end sub-class cluster centers. A parent cluster center may be determined as the average, or a weighted average, of the child cluster centers it contains. In Fig. 1, after the class cluster centers 5, 6 and 7 and so on are updated, the vectors of class cluster centers 3, 4, 1 and 2 can be recomputed from the updated vectors.
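A minimal sketch of this bottom-up refresh, assuming the unweighted average is used (the function name refresh_parents is an assumption):

    def refresh_parents(l2):
        # Each layer-1 center becomes the mean of its n layer-2 children,
        # and each layer-0 center the mean of its n layer-1 children.
        l1 = l2.mean(axis=2)
        l0 = l1.mean(axis=1)
        return l0, l1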
To train the variables to be learned faster, the following update can also be used. In any training iteration, cold class clusters and hot class clusters are determined based on the number of first user samples U1 contained in each parent class cluster. The class cluster center vector of a hot class cluster is then obtained, and the child cluster center vectors of each layer contained in a cold class cluster, including its end sub-class cluster center vectors, are updated based on the difference between the hot cluster center vector and the cold cluster center vector.
The number of user samples matched to each parent cluster center may be recorded while step S220 is performed. A parent class cluster is determined to be a hot cluster when this number is greater than a preset first value, and a cold cluster when it is smaller than a preset second value.
When the child cluster center vectors of each layer contained in a cold cluster are updated based on the difference between the hot cluster center vector and the cold cluster center vector, specifically, the vector difference between the two center vectors can be computed and superimposed onto the center vector of every child cluster of each layer contained in the cold cluster, including the end sub-class cluster centers.
In this embodiment, updating the class cluster center vectors of the cold clusters in this way moves a cold cluster into the vicinity of a hot cluster, which improves the utilization of the cold cluster and also alleviates the problem of the hot cluster containing too many samples.
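A sketch of this relocation, assuming the comparison is made among top-level parent clusters, that counts[i] is the number of samples matched to parent i in the current iteration, and illustrative thresholds (relocate_cold, hot_min and cold_max are assumed names and values):

    import numpy as np

    def relocate_cold(l0, l1, l2, counts, hot_min=1000, cold_max=10):
        counts = np.asarray(counts)
        hot = int(np.argmax(counts))
        if counts[hot] < hot_min:          # no hot cluster in this iteration
            return l0, l1, l2
        for c in np.where(counts < cold_max)[0]:
            shift = l0[hot] - l0[c]        # difference between hot and cold centers
            l0[c] += shift                 # superimpose onto the cold parent and
            l1[c] += shift                 # every sub-cluster center it contains,
            l2[c] += shift                 # down to the end sub-class cluster layer
        return l0, l1, l2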
Another embodiment of step S240 is described below. In this embodiment, the variables to be learned include both parent cluster centers and child cluster centers; in implementation, some or all of the parent cluster centers can be set as variables to be learned.
In this embodiment, in any iteration, after any first parent cluster center is updated, the child cluster centers it contains are updated based on the updated first parent cluster center. That is, in this embodiment the child cluster centers are updated based on the parent cluster centers. The update of the parent and child cluster centers within an iteration is described in detail below.
In this embodiment, a child cluster center is obtained by superimposing an offset on its parent cluster center. Fig. 3 is a schematic diagram of the data relationship between the multi-layer class cluster centers. The top-level class cluster centers 1 and 2 store their own class cluster center vectors, namely vector 1 and vector 2. The child cluster centers 3 and 4 store their offsets relative to parent cluster center 1, namely offsets d3 and d4. The vector of child cluster center 3 is the sum of the vector of parent cluster center 1 and offset d3, and the vector of child cluster center 4 is the sum of the vector of parent cluster center 1 and offset d4. The vector of child cluster center 5 is the sum of the vector of parent cluster center 3 and offset d5, the vector of child cluster center 6 is the sum of the vector of parent cluster center 3 and offset d6, the vector of child cluster center 7 is the sum of the vector of parent cluster center 4 and offset d7, and so on.
When the class cluster centers are updated, the top-level class cluster center vectors are updated directly, while for the other layers only the offsets are updated, each based on the corresponding prediction loss. In a logical sense, when the vector or offset of a parent cluster is updated, the class cluster centers of the child clusters it contains change correspondingly as well.
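A sketch of this offset parameterization, mirroring Fig. 3 (the class name OffsetCenters and its layout are assumptions; materializing a complete center vector simply sums along the path from the top-level parent):

    import numpy as np

    class OffsetCenters:
        def __init__(self, n, d, variances=(1.0, 0.05, 0.001), seed=0):
            rng = np.random.default_rng(seed)
            # Top layer stores full vectors; lower layers store only offsets.
            self.v0 = rng.normal(scale=variances[0] ** 0.5, size=(n, d))
            self.d1 = rng.normal(scale=variances[1] ** 0.5, size=(n, n, d))
            self.d2 = rng.normal(scale=variances[2] ** 0.5, size=(n, n, n, d))

        def center(self, i, j=None, k=None):
            v = self.v0[i]                 # top-level parent vector
            if j is not None:
                v = v + self.d1[i, j]      # plus the layer-1 offset
            if k is not None:
                v = v + self.d2[i, j, k]   # plus the layer-2 offset
            return v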
When the prediction loss L includes the first prediction loss loss1 and the second prediction loss loss2, updating the parent and child cluster centers among the variables to be learned in step S240 may include the following steps 1 to 3.
Step 1: determine a first correction amount for the corresponding end sub-class cluster center based on the first prediction loss loss1. The first prediction loss loss1 is based on the similarity between the end sub-class cluster center and the user feature F1.
Step 2: when the second prediction loss loss2 is determined based on the similarity between any second class cluster center C2 and third class cluster center C3, determine correction amounts for C2 and C3, respectively, based on loss2.
Step 3: for any child cluster center, update it based on its own correction amount and the correction amounts of its parent cluster centers at each layer. A top-level parent cluster center is updated directly according to its correction amount. The correction amount of a child cluster center is superimposed with the correction amounts of its parent cluster centers at each layer to obtain the total correction for that child cluster center, and the child cluster center is updated by superimposing the total correction onto it. Superimposing may mean direct addition, or addition after a change of sign, depending on the specific execution logic.
While the clustering model is still in training, only the offsets of the child cluster centers may be updated. When the complete vector of a class cluster center is needed, the top-level parent cluster center vector and the offsets along the path down to that cluster can be read separately and superimposed to obtain the complete vector. Steps 1 and 2 may be executed in either order. A sketch of applying such corrections under the offset parameterization follows.
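A minimal sketch, assuming gradient-style corrections of matching shapes (apply_corrections, corr0/corr1/corr2 and lr are assumed names; oc is an OffsetCenters instance from the sketch above). Because a child's materialized center is the sum along its path, a correction applied to a parent's vector or offset automatically moves all of its descendants:

    def apply_corrections(oc, corr0, corr1, corr2, lr=0.1):
        oc.v0 -= lr * corr0    # top-level parents: update the full vectors
        oc.d1 -= lr * corr1    # other layers: update only the stored offsets
        oc.d2 -= lr * corr2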
In the above embodiments, user samples are used to train the clustering model, and once training is complete the model can predict the class cluster corresponding to a user sample. That is, the clustering model clusters not only the user samples that participated in training, but also further user samples that did not. In large-scale user scenarios, not all users need to take part in the clustering process; the model can directly predict the class cluster a user belongs to, so user features are clustered more efficiently and clustering efficiency is improved. Moreover, the clustering model can be implemented with common neural-network libraries and hardware, and does not depend on a specific hardware or software version.
During or after training of the clustering model, the end sub-class clusters can be obtained from the model, yielding a first correspondence between class cluster IDs and class cluster centers, which is then stored. The first correspondence may be stored in a class cluster center table. The bottom-layer child cluster centers in the clustering model, i.e. the end sub-class cluster centers, are the cluster centers that ultimately need to be stored; from these cluster centers, the class cluster to which a user sample belongs can be determined.
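A sketch of exporting such a class cluster center table from the trained centers (the flat integer ID scheme and the function name export_cluster_table are assumptions):

    def export_cluster_table(l2):
        # l2 has shape [n, n, n, d]; enumerate every end sub-class cluster.
        n = l2.shape[0]
        table = {}
        for i in range(n):
            for j in range(n):
                for k in range(n):
                    cid = (i * n + j) * n + k     # illustrative class cluster ID
                    table[cid] = l2[i, j, k].copy()
        return table                              # first correspondence: ID -> center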
In large-scale sample scenarios, enough user samples can be used to train the clustering model and obtain enough class cluster centers. The clustering model can then be used to predict the class clusters of further new user samples, as in the embodiment below.
Fig. 4 is a flowchart illustrating a method for determining a cluster of a user sample according to an embodiment. The method may be performed by a computing device, including the following steps.
In step S410, a second user sample U2 whose class cluster is to be determined is obtained. The second user sample U2 may be a user sample that participated in the training of the clustering model, or a new sample that did not.
In step S420, a trained cluster model is acquired. The clustering model is trained using the method provided by the embodiment shown in fig. 2.
In step S430, the end sub-class cluster center matching the user feature F2 of the second user sample U2 is determined through the clustering model, based on the class cluster centers of the multi-layer class clusters, to obtain the class cluster to which U2 belongs. This step may be performed in a similar manner to step S220; the detailed process is not repeated.
Through step S430, the class cluster identifier to which the second user sample U2 belongs can be determined. After the class cluster identifiers of multiple user samples have been obtained through steps S410 to S430, a second correspondence between the user IDs of the user samples and the class cluster IDs may also be stored.
The embodiments of Figs. 2 and 4 above can be understood as the production phase of the class cluster centers: the first correspondence and the second correspondence are obtained in this phase. The consumption phase of the class cluster centers is described below.
In large-scale user scenarios, computing embedding vectors from users' discrete features can be costly because of the sheer number of users, and storing the embedding vectors of so many users also occupies too much memory. This is especially true in information-push scenarios, where pushing relevant information to large numbers of users requires their embedding vectors frequently. Using the class cluster center as the embedding vector of a user sample avoids large-scale computation of embedding vectors and reduces memory usage. See the embodiment below for details.
Fig. 5 is a flow chart of an information pushing method according to an embodiment. The method is performed by a computing device, including the following steps.
In step S510, a user identifier of a third user sample U3 of information to be pushed is obtained as a first user identifier UID1.
In step S520, a first cluster identifier CID1 corresponding to the first user identifier UID1 is determined from the second correspondence relationship between the user ID and the cluster ID. The second correspondence is obtained by predicting the user sample through a clustering model, and the clustering model is trained by using the method provided by the embodiment shown in fig. 2.
In step S530, from the first correspondence between the class cluster ID and the class cluster center, the class cluster center corresponding to the first class cluster identification CID1 is determined as the sample feature of the third user sample U3. Wherein the first correspondence is obtained from a cluster model.
In step S540, push information for the third user sample U3 is determined by the information push model using the sample features described above.
In this embodiment, the second correspondence may be imported into a dictionary, and the class cluster center table containing the first correspondence may be loaded into memory. When the sample feature of a user sample, i.e. its embedding vector, is needed, the corresponding class cluster ID can be looked up in the dictionary by user ID, and the corresponding class cluster center vector can then be looked up in the class cluster center table by class cluster ID. The class cluster center vector participates in the information-push computation as the sample feature of the user sample.
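A sketch of this two-stage lookup at push time (user_to_cluster, cluster_table and push_model are assumed to exist; the push model itself is outside the scope of this sketch):

    def feature_for_push(user_id, user_to_cluster, cluster_table):
        cid = user_to_cluster[user_id]    # second correspondence: user ID -> cluster ID
        return cluster_table[cid]         # first correspondence: cluster ID -> center

    # sample_feature = feature_for_push(uid1, user_to_cluster, cluster_table)
    # push_info = push_model(sample_feature)   # hypothetical push model call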
As the above embodiments show, the two-stage mapping, from user identifier to class cluster ID and from class cluster ID to embedding vector, handles embedding vectors and their clustering well in large-scale scenarios and improves the efficiency of information pushing.
In this specification, the words "first" and "second" in terms such as first user sample, first prediction loss and first parent cluster center are used merely for distinction and convenience of description, and are not intended as limitations of any kind.
The foregoing describes certain embodiments of the present disclosure, other embodiments being within the scope of the following claims. In some cases, the actions or steps recited in the claims can be performed in a different order than in the embodiments and still achieve desirable results. Furthermore, the processes depicted in the accompanying figures are not necessarily required to achieve the desired result in the particular order shown, or in a sequential order. In some embodiments, multitasking and parallel processing are also possible, or may be advantageous.
Fig. 6 is a schematic block diagram of a training apparatus for a cluster model according to an embodiment. The clustering model comprises a multi-layer class cluster, wherein the multi-layer class cluster comprises a parent class cluster and a child class cluster contained in the parent class cluster; the variables to be learned of the clustering model comprise an end sub-cluster center; the cluster model is used for determining an end sub-class cluster matched with the user features based on the multi-layer class clusters. The apparatus 600 is deployed in a computing device. A computing device may be implemented by any means, device, platform, cluster of devices, etc. having computing, processing capabilities. This embodiment of the device corresponds to the embodiment of the method shown in fig. 2. The apparatus 600 includes:
A first acquisition module 610 configured to acquire user characteristics of a first user sample;
a first matching module 620, configured to determine, through the clustering model, an end sub-class cluster center that matches the user feature based on a class cluster center of a multi-layer class cluster, to obtain an end sub-class cluster to which the first user sample belongs;
a first penalty module 630 configured to determine a predicted penalty based on a similarity between the matched end sub-cluster center and the user feature;
a first updating module 640 configured to update the variable to be learned based on the predictive loss.
In one embodiment, the first update module 640 is further configured to:
after updating the variables to be learned, updating the parent cluster centers of each layer based on the updated terminal sub-class cluster centers.
In one embodiment, the apparatus 600 further includes an initialization module (not shown in the figure) configured to initialize the multi-layer cluster center in the following manner:
randomly generating a top-level parent cluster center;
and generating an initial value of the center of each layer of sub-cluster based on a preset offset variance of the center of each layer of sub-cluster relative to the center of the parent cluster.
In one embodiment, the first matching module 620 is specifically configured to:
And matching the class cluster centers with the user features layer by layer in the sequence from the center of the top-level parent class cluster to the center of the tail-end sub-class cluster.
In one embodiment, the first loss module 630 includes a first loss sub-module, a second loss sub-module, and a third loss sub-module; (not shown in the drawings)
A first loss submodule configured to determine a first predicted loss based on a similarity between a matched end sub-class cluster center and the user feature;
a second loss submodule configured to determine a second predicted loss based on similarity between multi-tier class cluster centers;
a third loss submodule configured to determine the predicted loss based on the first predicted loss and the second predicted loss.
In one embodiment, the similarity between the multi-layer cluster centers includes one or more of the following:
similarity between the cluster center of the same layer and the other cluster centers of the same layer;
similarity between the cluster center and its parent cluster center;
similarity between a class cluster center and the sibling class cluster centers of its parent class cluster.
In one embodiment, the first loss submodule is specifically configured to:
and calculating the vector distance between the center of the matched terminal sub-class cluster and the user characteristic, and taking the vector distance as the similarity.
In one embodiment, the predicted loss comprises the first predicted loss and the second predicted loss; the first update module 640 includes a first determination sub-module, a second determination sub-module, and a first update sub-module; (not shown in the drawings)
A first determination submodule configured to determine a first modifier of a center of a corresponding end sub-class cluster based on the first predicted loss;
a second determination submodule configured to determine a second modifier of a center of the corresponding end sub-class cluster based on the second predicted loss;
and the first updating submodule is configured to update the center of the corresponding tail sub-class cluster based on superposition of the first correction amount and the second correction amount.
In one embodiment, the apparatus 600 further comprises a second updating module (not shown in the figure) configured to update the variable to be learned in the following manner:
determining cold class clusters and hot class clusters based on the number of first user samples contained in each parent class cluster;
obtaining a class cluster center vector of a hot class cluster;
and updating the center vectors of the sub-class clusters of each layer contained in a cold class cluster based on the difference between the hot class cluster center vector and the cold class cluster center vector.
In one embodiment, the variable to be learned further includes a parent cluster center; the first update module 640 is specifically configured to:
and after updating any first parent cluster center, updating the child cluster center contained in the first parent cluster center based on the updated first parent cluster center.
In one embodiment, the child cluster center is obtained based on superposition of its parent cluster center and the offset; the first update module 640 is specifically configured to:
and updating the top-layer class cluster center vectors directly and the offsets of the class cluster centers of the other layers, based on the corresponding prediction losses.
In one embodiment, the variable to be learned further includes parent class cluster centers; the prediction loss includes the first prediction loss and the second prediction loss; the first update module 640 includes a third determination submodule, a fourth determination submodule, and a second update submodule (not shown in the figure):
a third determination submodule configured to determine a first correction amount for the corresponding end sub-class cluster center based on the first prediction loss;
a fourth determination submodule configured to, when the second prediction loss is determined based on the similarity between any second class cluster center and third class cluster center, determine correction amounts for the second class cluster center and the third class cluster center, respectively, based on the second prediction loss;
a second update submodule configured to, for any sub-class cluster center, update that center based on its own correction amount and the correction amounts of its parent class cluster centers in each layer; a sketch follows.
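Accumulating a center's own correction with those of its ancestors might look like this (a sketch; `corrections` and `parent` are hypothetical mappings, and missing entries default to zero):

```python
def total_correction(node, corrections, parent):
    """Sum the correction amount of a sub-class cluster center with the
    correction amounts of its parent class cluster centers in each layer."""
    total = corrections.get(node, 0.0)
    while parent.get(node) is not None:
        node = parent[node]
        total = total + corrections.get(node, 0.0)
    return total
```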
In one embodiment, the apparatus 600 further comprises:
a first storage module (not shown in the figure) configured to, after the clustering model is trained, acquire the plurality of end sub-class clusters from the clustering model to obtain a first correspondence between class cluster identifiers and class cluster centers, and to store the first correspondence.
Fig. 7 is a schematic block diagram of an apparatus for determining the class cluster of a user sample according to an embodiment. The apparatus 700 is deployed in a computing device, which may be implemented by any apparatus, device, platform, or device cluster having computing and processing capabilities. This apparatus embodiment corresponds to the method embodiment shown in Fig. 4. The apparatus 700 includes:
a second obtaining module 710 configured to obtain a second user sample whose class cluster is to be determined;
a third obtaining module 720 configured to obtain a trained clustering model, the clustering model being trained by the method provided by the embodiment shown in Fig. 2;
a second matching module 730 configured to determine, through the clustering model, the end sub-class cluster center matched with the user features of the second user sample based on the class cluster centers of the multi-layer class clusters, thereby obtaining the class cluster to which the second user sample belongs; a sketch of this layer-by-layer matching follows.
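The matching used at inference time can be sketched as a greedy descent through the cluster tree; the data layout (`centers`, `children`, `top_ids`) and the Euclidean metric are assumptions for illustration:

```python
import numpy as np

def match_end_cluster(user_feature, centers, children, top_ids):
    """Greedy descent: at each layer keep the child whose center vector is
    closest to the user feature, until an end sub-class cluster is reached."""
    best = min(top_ids, key=lambda i: np.linalg.norm(centers[i] - user_feature))
    while children.get(best):
        best = min(children[best],
                   key=lambda i: np.linalg.norm(centers[i] - user_feature))
    return best  # identifier of the matched end sub-class cluster
```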
In one embodiment, the apparatus 700 further comprises:
a second storage module (not shown in the figure) configured to store a second correspondence between the user identifiers of the user samples and the class cluster identifiers, after the class cluster identifiers of the class clusters to which a plurality of user samples belong have been obtained.
Fig. 8 is a schematic block diagram of an information pushing apparatus according to an embodiment. The apparatus 800 is deployed in a computing device, which may be implemented by any apparatus, device, platform, or device cluster having computing and processing capabilities. This apparatus embodiment corresponds to the method embodiment shown in Fig. 5. The apparatus 800 includes:
a fourth obtaining module 810 configured to obtain the user identifier of a third user sample, to which information is to be pushed, as a first user identifier;
a first determining module 820 configured to determine a first class cluster identifier corresponding to the first user identifier from a second correspondence between user identifiers and class cluster identifiers, where the second correspondence is obtained by predicting user samples through a clustering model trained by the method provided by the embodiment shown in Fig. 2;
a second determining module 830 configured to determine, from a first correspondence between class cluster identifiers and class cluster centers, the class cluster center corresponding to the first class cluster identifier as the sample feature of the third user sample, where the first correspondence is obtained from the clustering model;
a third determining module 840 configured to determine push information for the third user sample through an information push model using the sample feature. The two table lookups are sketched below.
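The push-time flow reduces to two table lookups followed by a model call; the dictionary layout and the model interface below are illustrative assumptions:

```python
def sample_feature_for_push(user_id, user_to_cluster, cluster_to_center):
    """Resolve the second correspondence (user id -> class cluster id),
    then the first correspondence (class cluster id -> center vector);
    the returned center vector serves as the sample feature."""
    cluster_id = user_to_cluster[user_id]
    return cluster_to_center[cluster_id]

# Usage sketch with hypothetical identifiers and push model:
# feature = sample_feature_for_push("user_42", user_to_cluster, cluster_to_center)
# push_info = info_push_model.predict(feature)
```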
The foregoing apparatus embodiments correspond to the method embodiments and have the same technical effects; for details, refer to the descriptions in the corresponding method embodiments, which are not repeated here.
The present description also provides a computer-readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of fig. 1 to 5.
Embodiments of the present disclosure also provide a computing device including a memory having executable code stored therein and a processor that, when executing the executable code, implements the method of any one of fig. 1 to 5.
In this specification, the embodiments are described in a progressive manner; identical or similar parts of the embodiments reference one another, and each embodiment focuses on its differences from the others. In particular, the storage medium and computing device embodiments are described relatively briefly because they are substantially similar to the method embodiments; refer to the description of the method embodiments for the relevant parts.
Those skilled in the art will appreciate that in one or more of the examples described above, the functions described in the embodiments of the present invention may be implemented in hardware, software, firmware, or any combination thereof. When implemented in software, these functions may be stored on or transmitted over as one or more instructions or code on a computer-readable medium.
The foregoing detailed description further elaborates the objects, technical solutions and advantageous effects of the embodiments of the present invention. It should be understood that the foregoing is specific to these embodiments only and is not intended to limit the scope of the present invention; any modifications, equivalent substitutions, or improvements made on the basis of the technical solutions of the present invention shall fall within the scope of the present invention.

Claims (20)

1. A training method for a clustering model, wherein the clustering model comprises multi-layer class clusters, each layer of class clusters comprising parent class clusters and the child class clusters contained in them; the variables to be learned of the clustering model comprise end sub-class cluster centers; the clustering model is used for determining the end sub-class cluster matched with user features based on the multi-layer class clusters; the end sub-class cluster center is used as a sample feature of a user sample for pushing information to the user sample; the method comprises the following steps:
acquiring user features of a first user sample, the user features comprising embedding vectors obtained based on user attribute features, user behavior features and other user-related features;
determining, through the clustering model, the end sub-class cluster center matched with the user features based on the class cluster centers of the multi-layer class clusters, to obtain the end sub-class cluster to which the first user sample belongs;
determining a prediction loss based on the similarity between the matched end sub-class cluster center and the user features;
updating the variable to be learned based on the prediction loss;
wherein the variable to be learned is also updated in the following manner:
determining cold class clusters and hot class clusters based on the number of first user samples contained in each parent class cluster, wherein a hot class cluster is a parent class cluster in which the number of first user samples is greater than a preset first value, and a cold class cluster is a parent class cluster in which the number of first user samples is less than a preset second value;
updating the center vectors of all layers of sub-class clusters contained in a cold class cluster based on the difference between the center vector of a hot class cluster and the center vector of the cold class cluster.
2. The method of claim 1, further comprising, after updating the variable to be learned:
updating the parent class cluster centers of each layer based on the updated end sub-class cluster centers.
3. The method of claim 1, wherein the multi-layer class cluster centers are initialized in the following manner:
randomly generating a top-level parent class cluster center;
generating an initial value for each layer's sub-class cluster center based on a preset variance of the offset of that sub-class cluster center relative to its parent class cluster center.
4. The method of claim 1, wherein the step of determining the end sub-class cluster center matched with the user features based on the class cluster centers of the multi-layer class clusters comprises:
matching the class cluster centers against the user features layer by layer, in order from the top-level parent class cluster centers down to the end sub-class cluster centers.
5. The method of claim 1, wherein the step of determining a prediction loss comprises:
determining a first prediction loss based on the similarity between the matched end sub-class cluster center and the user features;
determining a second prediction loss based on the similarities between the multi-layer class cluster centers;
determining the prediction loss based on the first prediction loss and the second prediction loss.
6. The method of claim 5, wherein the similarities between the multi-layer class cluster centers comprise one or more of the following:
similarity between a class cluster center and the other class cluster centers in the same layer;
similarity between a class cluster center and its parent class cluster center;
similarity between a class cluster center and its sibling class cluster centers.
7. The method of claim 5, wherein the step of determining the first prediction loss comprises:
calculating the vector distance between the matched end sub-class cluster center and the user features, and taking the vector distance as the similarity.
8. The method of claim 5, wherein the prediction loss comprises the first prediction loss and the second prediction loss, and the step of updating the variable to be learned comprises:
determining a first correction amount for the corresponding end sub-class cluster center based on the first prediction loss;
determining a second correction amount for the corresponding end sub-class cluster center based on the second prediction loss;
updating the corresponding end sub-class cluster center based on a superposition of the first correction amount and the second correction amount.
9. The method of claim 1, wherein the variable to be learned further comprises parent class cluster centers, and the step of updating the variable to be learned comprises:
after updating any first parent class cluster center, updating the child class cluster centers contained in that first parent class cluster based on the updated first parent class cluster center.
10. The method of claim 9, wherein a child class cluster center is obtained by superposing its parent class cluster center and an offset, and the step of updating the variable to be learned comprises:
updating the top-layer class cluster center, and the offsets of the class cluster centers in the other layers, based on the corresponding prediction loss.
11. The method of claim 5, wherein the variable to be learned further comprises parent class cluster centers; the prediction loss comprises the first prediction loss and the second prediction loss; and the step of updating the variable to be learned comprises:
determining a first correction amount for the corresponding end sub-class cluster center based on the first prediction loss;
when the second prediction loss is determined based on the similarity between any second class cluster center and third class cluster center, determining correction amounts for the second class cluster center and the third class cluster center, respectively, based on the second prediction loss;
for any sub-class cluster center, updating that center based on its own correction amount and the correction amounts of its parent class cluster centers in each layer.
12. The method of claim 1, further comprising, after the clustering model is trained:
acquiring the plurality of end sub-class clusters from the clustering model to obtain a first correspondence between class cluster identifiers and class cluster centers;
storing the first correspondence.
13. A method for determining the class cluster of a user sample, comprising:
acquiring a second user sample whose class cluster is to be determined;
acquiring a trained clustering model, wherein the clustering model is trained by the method of claim 1;
determining, through the clustering model, the end sub-class cluster center matched with the user features of the second user sample based on the class cluster centers of the multi-layer class clusters, to obtain the class cluster to which the second user sample belongs.
14. The method of claim 13, further comprising, after obtaining the class cluster identifiers of the class clusters to which a plurality of user samples belong:
storing a second correspondence between the user identifiers of the user samples and the class cluster identifiers.
15. An information pushing method, comprising:
acquiring the user identifier of a third user sample, to which information is to be pushed, as a first user identifier;
determining a first class cluster identifier corresponding to the first user identifier from a second correspondence between user identifiers and class cluster identifiers, wherein the second correspondence is obtained by predicting user samples through a clustering model trained by the method of claim 1;
determining, from a first correspondence between class cluster identifiers and class cluster centers, the class cluster center corresponding to the first class cluster identifier as the sample feature of the third user sample, wherein the first correspondence is obtained from the clustering model;
determining push information for the third user sample through an information push model using the sample feature.
16. A training apparatus for a clustering model, wherein the clustering model comprises multi-layer class clusters, each layer of class clusters comprising parent class clusters and the child class clusters contained in them; the variables to be learned of the clustering model comprise end sub-class cluster centers; the clustering model is used for determining the end sub-class cluster matched with user features based on the multi-layer class clusters; the end sub-class cluster center is used as a sample feature of a user sample for pushing information to the user sample; the apparatus comprises:
a first obtaining module configured to obtain user features of a first user sample, the user features comprising embedding vectors obtained based on user attribute features, user behavior features and other user-related features;
a first matching module configured to determine, through the clustering model, the end sub-class cluster center matched with the user features based on the class cluster centers of the multi-layer class clusters, to obtain the end sub-class cluster to which the first user sample belongs;
a first loss module configured to determine a prediction loss based on the similarity between the matched end sub-class cluster center and the user features;
a first update module configured to update the variable to be learned based on the prediction loss;
wherein the variable to be learned is also updated in the following manner:
determining cold class clusters and hot class clusters based on the number of first user samples contained in each parent class cluster, wherein a hot class cluster is a parent class cluster in which the number of first user samples is greater than a preset first value, and a cold class cluster is a parent class cluster in which the number of first user samples is less than a preset second value;
updating the center vectors of all layers of sub-class clusters contained in a cold class cluster based on the difference between the center vector of a hot class cluster and the center vector of the cold class cluster.
17. An apparatus for determining the class cluster of a user sample, comprising:
a second obtaining module configured to obtain a second user sample whose class cluster is to be determined;
a third obtaining module configured to obtain a trained clustering model, the clustering model being trained by the method of claim 1;
a second matching module configured to determine, through the clustering model, the end sub-class cluster center matched with the user features of the second user sample based on the class cluster centers of the multi-layer class clusters, to obtain the class cluster to which the second user sample belongs.
18. An information pushing apparatus, comprising:
a fourth obtaining module configured to obtain the user identifier of a third user sample, to which information is to be pushed, as a first user identifier;
a first determining module configured to determine a first class cluster identifier corresponding to the first user identifier from a second correspondence between user identifiers and class cluster identifiers, wherein the second correspondence is obtained by predicting user samples through a clustering model trained by the method of claim 1;
a second determining module configured to determine, from a first correspondence between class cluster identifiers and class cluster centers, the class cluster center corresponding to the first class cluster identifier as the sample feature of the third user sample, wherein the first correspondence is obtained from the clustering model;
a third determining module configured to determine push information for the third user sample through an information push model using the sample feature.
19. A computer readable storage medium having stored thereon a computer program which, when executed in a computer, causes the computer to perform the method of any of claims 1-15.
20. A computing device comprising a memory having executable code stored therein and a processor, which when executing the executable code, implements the method of any of claims 1-15.
CN202310653728.3A 2023-06-02 2023-06-02 Clustering model training, user clustering and information pushing method and device Active CN116401567B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310653728.3A CN116401567B (en) 2023-06-02 2023-06-02 Clustering model training, user clustering and information pushing method and device

Publications (2)

Publication Number Publication Date
CN116401567A CN116401567A (en) 2023-07-07
CN116401567B (en) 2023-09-08

Family

ID=87009034


Citations (20)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPH07182222A (en) * 1993-12-24 1995-07-21 Fuji Xerox Co Ltd Device for managing objective-oriented data base
CN104731830A (en) * 2013-12-24 2015-06-24 腾讯科技(深圳)有限公司 Recommendation method, recommendation device and server
CN106408321A (en) * 2015-07-31 2017-02-15 华为技术有限公司 Management method and device of commodity template, and method and device for calling database, and system
CN106776827A (en) * 2016-11-24 2017-05-31 天津大学 Method for automating extension stratification ontology knowledge base
CN107103281A (en) * 2017-03-10 2017-08-29 中山大学 Face identification method based on aggregation Damage degree metric learning
CN107389536A (en) * 2017-07-31 2017-11-24 上海纳衍生物科技有限公司 Fluidic cell particle classifying method of counting based on density distance center algorithm
CN107451217A (en) * 2017-07-17 2017-12-08 广州特道信息科技有限公司 Information recommends method and device
CN108647293A (en) * 2018-05-07 2018-10-12 广州虎牙信息科技有限公司 Video recommendation method, device, storage medium and server
CN110263862A (en) * 2019-06-21 2019-09-20 北京字节跳动网络技术有限公司 Method for pushing, device, electronic equipment and the readable storage medium storing program for executing of information
CN110929161A (en) * 2019-12-02 2020-03-27 南京莱斯网信技术研究院有限公司 Large-scale user-oriented personalized teaching resource recommendation method
CN111125528A (en) * 2019-12-24 2020-05-08 三角兽(北京)科技有限公司 Information recommendation method and device
CN111814874A (en) * 2020-07-08 2020-10-23 东华大学 Multi-scale feature extraction enhancement method and module for point cloud deep learning
CN113190696A (en) * 2021-05-12 2021-07-30 百果园技术(新加坡)有限公司 Training method of user screening model, user pushing method and related devices
CN113590863A (en) * 2021-02-23 2021-11-02 腾讯科技(北京)有限公司 Image clustering method and device and computer readable storage medium
CN113590936A (en) * 2021-07-02 2021-11-02 支付宝(杭州)信息技术有限公司 Information pushing method and device
CN114267374A (en) * 2021-11-24 2022-04-01 北京百度网讯科技有限公司 Phoneme detection method and device, training method and device, equipment and medium
CN114595787A (en) * 2022-04-07 2022-06-07 杭州网易云音乐科技有限公司 Recommendation model training method, recommendation device, medium and equipment
CN115391589A (en) * 2022-08-03 2022-11-25 杭州网易云音乐科技有限公司 Training method and device for content recall model, electronic equipment and storage medium
CN115690883A (en) * 2022-11-07 2023-02-03 浙江大华技术股份有限公司 Method for obtaining target training sample set and related device
CN115795094A (en) * 2022-12-09 2023-03-14 北京字跳网络技术有限公司 Search cause method, search cause device, computer equipment and storage medium

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
A. Hamdulla et al., "A Hierarchical Clustering Based Relation Extraction Method for Domain Ontology," PAAP, pp. 36-40. *

Similar Documents

Publication Publication Date Title
Wang et al. Unifying user-based and item-based collaborative filtering approaches by similarity fusion
Han et al. Discrete optimal graph clustering
US20120124037A1 (en) Multimedia data searching method and apparatus and pattern recognition method
US10909442B1 (en) Neural network-based artificial intelligence system for content-based recommendations using multi-perspective learned descriptors
Smyth et al. Compound critiques for conversational recommender systems
CN110688974A (en) Identity recognition method and device
CN110598061A (en) Multi-element graph fused heterogeneous information network embedding method
CN108427756B (en) Personalized query word completion recommendation method and device based on same-class user model
WO2018166273A1 (en) Method and apparatus for matching high-dimensional image feature
JP7149976B2 (en) Error correction method and apparatus, computer readable medium
US11074274B2 (en) Large scale social graph segmentation
Hashimoto et al. Metric recovery from directed unweighted graphs
CN115293919A (en) Graph neural network prediction method and system oriented to social network distribution generalization
CN113642547A (en) Unsupervised domain adaptive character re-identification method and system based on density clustering
CN114330584A (en) Data clustering method and device, storage medium and electronic equipment
Lee et al. Optimizing generative dialog state tracker via cascading gradient descent
CN116401567B (en) Clustering model training, user clustering and information pushing method and device
CN110209895B (en) Vector retrieval method, device and equipment
CN105357583A (en) Method and device for discovering interest and preferences of intelligent television user
CN110968702B (en) Method and device for extracting rational relation
CN112528149A (en) Intelligent recommendation method fusing knowledge graph and Bayesian network
Zhao et al. Mixture modeling with pairwise, instance-level class constraints
US20230259761A1 (en) Transfer learning system and method for deep neural network
CN114898156B (en) Cross-modal semantic representation learning and fusion-based image classification method and system
US11244015B1 (en) Projecting queries into a content item embedding space

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant