CN114757704A

CN114757704A - User layering method and device

Info

Publication number: CN114757704A
Application number: CN202210395100.3A
Authority: CN
Inventors: 李佳璐; 包勇军; 颜伟鹏
Original assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Current assignee: Beijing Jingdong Century Trading Co Ltd; Beijing Wodong Tianjun Information Technology Co Ltd
Priority date: 2022-04-15
Filing date: 2022-04-15
Publication date: 2022-07-15

Abstract

The invention discloses a user layering method and device, and relates to the technical field of big data. One embodiment of the method comprises: obtaining basic feature data of a user to be predicted and order data within a preset time; obtaining item information from the order data and determining prior feature data of the user to be predicted from the item information; and inputting the basic feature data and the prior feature data into a trained user layering model to predict a layering result for the user to be predicted, thereby obtaining the user groups of each layer so as to achieve precise marketing and delivery. According to the embodiment of the invention, the prior feature data is obtained from the order data of the user to be predicted, and the layering result is predicted from both the basic feature data and the prior feature data, so that data affecting the layering effect is fully used and prediction accuracy is improved.

Description

User layering method and device

Technical Field

The invention relates to the technical field of big data, in particular to a user layering method and device.

Background

User layering is the basis of fine-grained big-data operations and precision marketing services. Existing user layering methods mainly include rule-based methods, maximum likelihood methods, traditional machine learning methods, deep learning methods, and the like.

However, the existing user layering methods have the following disadvantages: rule-based methods rely on business experience, are highly subjective, and are difficult to generalize; maximum likelihood methods have low prediction accuracy; and traditional machine learning and deep learning methods cannot fully mine the information in the data, and their model features are not targeted at the layering objective.

Disclosure of Invention

In view of this, embodiments of the present invention provide a user layering method and apparatus, where prior feature data is obtained according to item information in order data of a user to be predicted, a layering result of the user to be predicted is obtained by using basic feature data, the prior feature data and a user layering model, data that affects a layering effect is fully used, and accuracy of layering prediction is improved.

To achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a user layering method, including:

acquiring basic characteristic data of a user to be predicted and order data within preset time;

acquiring article information from the order data, and determining prior characteristic data of the user to be predicted according to the article information;

and inputting the basic characteristic data and the prior characteristic data into a trained user layering model, and determining a layering result of the user to be predicted.

Optionally, the user layering model is trained in the following manner:

obtaining a training user set, a prior user set and a test user set according to a plurality of users and the layering result of each user;

acquiring order data of each prior user in the prior user set within a first preset time, acquiring article information from the order data of each prior user, and constructing prior characteristics according to the article information and a layering result of each prior user;

acquiring basic feature data of each training user in the training user set and order data in second preset time, and determining the prior feature data of each training user according to the order data in the second preset time of each training user based on the prior features;

and performing model training according to the prior characteristic data and the basic characteristic data of each training user and the layering result of each training user, and performing model testing according to the prior characteristic data and the basic characteristic data of each testing user in the testing user set and the layering result of each testing user to obtain the user layering model.

Optionally, obtaining a training user set, a prior user set, and a testing user set according to a plurality of users and a hierarchical result of each user includes:

determining the number of users corresponding to each hierarchy according to the plurality of users and the hierarchy result of each user;

determining a layering sampling ratio according to the number of users corresponding to each layer;

respectively determining the number of users in the training user set, the prior user set and the testing user set according to a preset proportion;

and respectively determining each user in the training user set, the prior user set and the testing user set according to the number of the users in the training user set, the prior user set and the testing user set, the layered sampling proportion and the plurality of users.

Optionally, determining a hierarchical sampling ratio according to the number of users corresponding to each hierarchy includes:

determining the ratio of the number of the users corresponding to each layer according to the number of the users corresponding to each layer;

and adjusting the ratio of the number of the users corresponding to each layer in an undersampling mode to obtain the layer sampling ratio.

Optionally, respectively determining each user in the training user set, the prior user set, and the testing user set includes:

based on the number of the users in the training user set, acquiring each layered user from the plurality of users according to the layered sampling proportion so as to determine each user in the training user set;

based on the number of users in the test user set, acquiring each layered user from the plurality of users according to the layered sampling proportion so as to determine each user in the test user set;

and determining each user in the prior user set according to all users and each user in the training user set and the test user set.

Optionally, the item information includes an item identifier, and constructing a priori characteristics according to the item information and a hierarchical result of each of the priori users includes:

determining prior characteristics of multiple dimensions aiming at the item identification according to the layering result of each prior user, the item identification in the order data of each prior user and the order quantity containing the item identification in the order data of each prior user.

Optionally, the prior features include a first prior feature and a second prior feature, the item information includes an item identifier, and constructing the prior features according to the item information and the hierarchical result of each prior user includes:

for any item identifier,

determining first prior characteristics of multiple dimensions according to the total number of prior users including any article identifier in order data and the number of prior users corresponding to each layer including any article identifier in order data;

and determining second prior characteristics of multiple dimensions according to the order quantity of any item identifier contained in the order data of each layered prior user and the total order quantity of any item identifier contained in the order data of all prior users.

Optionally, the prior features further include a third prior feature and a fourth prior feature, and constructing the prior features according to the item information and the hierarchical result of each prior user further includes:

determining third prior characteristics of multiple dimensions according to the first prior characteristics, the number of prior users corresponding to each hierarchy and the total number of prior users in the prior user set;

and determining fourth prior characteristics of multiple dimensions according to the second prior characteristics, the number of the prior users corresponding to each hierarchy and the total number of the prior users in the prior user set.

Optionally, determining, based on the prior feature, prior feature data of each training user according to order data of each training user within a second preset time, where the determining includes:

determining prior characteristic values of a plurality of dimensions of each item identification in order data of each training user based on prior characteristics of the plurality of dimensions;

for any dimension of the multiple dimensions, determining the sum of the prior characteristic values of the article identifications in the any dimension as the prior characteristic data of the training user corresponding to the any dimension.

Optionally, determining prior feature data of the user to be predicted according to the item information includes:

determining prior characteristic values of multiple dimensions of each item identifier in order data of the user to be predicted based on the prior characteristics of the multiple dimensions;

for any dimension of the multiple dimensions, determining the sum of the prior characteristic values of the item identifiers in the any dimension as the prior characteristic data of the user to be predicted corresponding to the any dimension.
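The per-dimension summation described above can be sketched in Python as follows. This is a minimal illustration, not the patented implementation: the function name `user_prior_feature_data`, the `item_prior` lookup table, and its values are hypothetical, and two behaviors are assumptions not stated in the text, namely that an item identifier is counted once per occurrence in the input and that an item absent from the prior features contributes nothing.

```python
from typing import Dict, Iterable, List


def user_prior_feature_data(
    item_ids: Iterable[str],
    item_prior: Dict[str, List[float]],
    n_dims: int,
) -> List[float]:
    """Sum the per-dimension prior values of the items found in a user's orders."""
    totals = [0.0] * n_dims
    for item_id in item_ids:
        values = item_prior.get(item_id)
        if values is None:
            # Assumption: items unseen in the prior user set are skipped.
            continue
        for d in range(n_dims):
            totals[d] += values[d]
    return totals


# Hypothetical prior feature values for two items over three dimensions.
item_prior = {"A": [0.5, 0.25, 0.25], "B": [0.25, 0.5, 0.25]}
features = user_prior_feature_data(["A", "B", "Z"], item_prior, n_dims=3)
```

The same routine would apply to training users and to the user to be predicted, since both use the prior features built from the prior user set.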

According to still another aspect of the embodiments of the present invention, there is provided an apparatus for user layering, including:

the acquisition module is used for acquiring basic characteristic data of a user to be predicted and order data in a preset time;

the first determining module is used for acquiring article information from the order data and determining the prior characteristic data of the user to be predicted according to the article information;

and the second determining module is used for inputting the basic characteristic data and the prior characteristic data into a trained user layering model and determining the layering result of the user to be predicted.

According to another aspect of an embodiment of the present invention, there is provided an electronic device including:

one or more processors;

a storage device to store one or more programs,

when the one or more programs are executed by the one or more processors, the one or more processors are caused to implement the user layering method provided by the present invention.

According to a further aspect of an embodiment of the present invention, there is provided a computer-readable medium on which a computer program is stored, the program, when executed by a processor, implementing the user layering method provided by the present invention.

One embodiment of the above invention has the following advantages or benefits: basic feature data of a user to be predicted and order data within a preset time are obtained; item information is obtained from the order data and used to determine prior feature data of the user to be predicted; and the basic feature data and the prior feature data are input into a trained user layering model to predict a layering result for the user to be predicted, thereby obtaining the user groups of each layer so as to achieve precise marketing and delivery. According to the embodiment of the invention, the prior feature data is obtained from the order data of the user to be predicted, and the layering result is predicted from both the basic feature data and the prior feature data, so that data affecting the layering effect is fully used and prediction accuracy is improved.

Further effects of the above-mentioned non-conventional alternatives will be described below in connection with the embodiments.

Drawings

The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:

FIG. 1 is a schematic diagram of a main flow of a user layering method according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of the main flow of another user-layered method according to an embodiment of the present invention;

FIG. 3 is a flow chart illustrating a method for user stratification in accordance with an embodiment of the present invention;

FIG. 4 is a flow chart diagram of a method of constructing a prior feature in accordance with an embodiment of the present invention;

FIG. 5 is a schematic diagram of the main modules of a user-layered device according to an embodiment of the present invention;

FIG. 6 is an exemplary system architecture diagram in which embodiments of the present invention may be employed;

FIG. 7 is a schematic block diagram of a computer system suitable for use in implementing a terminal device or server of an embodiment of the invention.

Detailed Description

Exemplary embodiments of the present invention are described below with reference to the accompanying drawings, in which various details of embodiments of the invention are included to assist understanding, and which are to be considered as merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.

Fig. 1 is a schematic diagram of a main flow of a user layering method according to an embodiment of the present invention, as shown in fig. 1, the method includes the following steps:

step S101: acquiring basic characteristic data of a user to be predicted and order data within preset time;

step S102: acquiring article information from the order data, and determining prior characteristic data of a user to be predicted according to the article information;

step S103: and inputting the basic characteristic data and the prior characteristic data into the trained user layering model, and determining the layering result of the user to be predicted.

In the embodiment of the invention, the user layering method can be applied to the e-commerce field. User layering classifies users so that fine-grained operations and precision marketing can be targeted at the groups in different layers. Applications such as personalized recommendation, advertising systems, campaign marketing, content recommendation, and interest preferences are all based on user layering. Users may be layered along multiple dimensions, for example by life cycle, activity, or loyalty, such as dividing users into three layers of high, medium, and low loyalty.

In the embodiment of the invention, the user hierarchical model is obtained by training in the following way:

acquiring order data of each prior user in a prior user set within first preset time, acquiring article information from the order data of each prior user, and constructing prior characteristics according to the article information and a layering result of each prior user;

acquiring basic characteristic data of each training user in a training user set and order data in second preset time, and determining the prior characteristic data of each training user according to the order data in the second preset time of each training user based on the prior characteristics;

and performing model training according to the prior characteristic data and the basic characteristic data of each training user and the layering result of each training user, and performing model testing according to the prior characteristic data and the basic characteristic data of each testing user in the testing user set and the layering result of each testing user to obtain a user layering model.

In the embodiment of the present invention, a plurality of users and the layering result of each user are obtained first; the users that have a layering result may be referred to as seed users. For example, user data may be collected from an e-commerce shopping platform, including link information on online shopping behaviors such as orders, browsing, concerns, and shopping, together with the users' basic information. The collected data is then integrated and imported into a data warehouse through ETL (Extract-Transform-Load) for subsequent use.

In the embodiment of the present invention, as shown in fig. 2, after obtaining a plurality of users and a hierarchical result of each user, the method divides the plurality of users into a training user set, a prior user set, and a testing user set, and includes:

step S201: determining the number of users corresponding to each hierarchy according to the plurality of users and the hierarchy result of each user;

step S202: determining a layering sampling ratio according to the number of users corresponding to each layer;

step S203: respectively determining the number of users in a training user set, a prior user set and a test user set according to a preset proportion;

step S204: and respectively determining each user in the training user set, the prior user set and the test user set according to the number of the users in the training user set, the prior user set and the test user set, the hierarchical sampling proportion and a plurality of users.

The ratio of the numbers of users in the layers can be calculated from the number of users in each layer, and this ratio can be used directly as the layered sampling ratio. However, when the numbers of users in the layers differ greatly, for example when the ratio is 1:2:50:100:3:1, sampling at this ratio and then building a model makes the model tend to learn the features of the two layers with the highest proportions while struggling to learn the features of the layers with lower proportions, which lowers prediction accuracy.

In an optional implementation manner of the embodiment of the present invention, determining a hierarchical sampling ratio according to a number of users corresponding to each hierarchical layer includes:

and adjusting the ratio of the number of users corresponding to each layer in an undersampling mode to obtain the layer sampling ratio.

In the embodiment of the present invention, when the numbers of users corresponding to the layers differ greatly, the ratio of the numbers of users is adjusted by undersampling to obtain the layered sampling ratio. For example, the ratio 1:2:50:100:3:1 may be adjusted to 1:2:5:10:3:1, and sampling is then performed with the adjusted ratio as the layered sampling ratio; that is, the classes with a relatively large amount of data are randomly sampled down to the adjusted ratio, so that the numbers of users in the layers are more uniform and the problem of unbalanced data distribution is mitigated. The ratio may be adjusted according to the specific business meaning when determining the layered sampling ratio.
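As an illustration of random undersampling to an adjusted ratio, the following Python sketch draws users from each layer so that the layer sizes follow a given layered sampling ratio. It is a sketch under stated assumptions: the function name `undersample_layers`, the `unit` parameter (users per ratio unit), and the synthetic user identifiers are hypothetical, and the adjusted ratio itself is supplied by the caller because the text leaves that choice to business judgment.

```python
import random
from typing import Dict, List, Sequence


def undersample_layers(
    users_by_layer: Dict[int, List[str]],
    sampling_ratio: Sequence[int],
    unit: int,
    seed: int = 0,
) -> Dict[int, List[str]]:
    """Randomly keep share * unit users per layer, never more than the layer holds."""
    rng = random.Random(seed)
    sampled = {}
    for layer, share in zip(sorted(users_by_layer), sampling_ratio):
        users = users_by_layer[layer]
        target = min(len(users), share * unit)
        # random.Random.sample draws without replacement.
        sampled[layer] = rng.sample(users, target)
    return sampled


# Six layers whose sizes follow the unadjusted ratio 1:2:50:100:3:1.
sizes = [10, 20, 500, 1000, 30, 10]
users_by_layer = {i: [f"u{i}_{j}" for j in range(n)] for i, n in enumerate(sizes)}

# Sample down to the adjusted ratio 1:2:5:10:3:1.
sampled = undersample_layers(users_by_layer, [1, 2, 5, 10, 3, 1], unit=10)
layer_sizes = [len(sampled[i]) for i in range(6)]
```

Only the two oversized layers lose users here; the small layers are kept whole.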

In the embodiment of the present invention, after determining the hierarchical sampling proportion, determining each user in the training user set, the prior user set, and the testing user set respectively includes:

based on the number of users in the training user set, acquiring each layered user from a plurality of users according to a layered sampling proportion so as to determine each user in the training user set;

based on the number of users in the test user set, acquiring each layered user from a plurality of users according to a layered sampling proportion so as to determine each user in the test user set;

and determining each user in the prior user set according to all users and each user in the training user set and the testing user set.

In the embodiment of the invention, the numbers of users in the training user set, the prior user set, and the test user set are determined according to a preset proportion, which can be user-defined. The users in the training user set and the test user set are used to train and test the model, respectively, and the users in the prior user set are used to construct the prior features. The training user set and the prior user set may be given the same number of users, which is greater than the number of users in the test user set; for example, the preset proportion may be set to 4:4:2.

In the embodiment of the invention, after the numbers of users in the training user set, the prior user set, and the test user set are determined, the number of users for each layer in the training user set and in the test user set is calculated according to the layered sampling ratio. The corresponding number of users is then extracted from the corresponding layer of the plurality of users to obtain the users of the training user set and of the test user set, and the remaining users are the users of the prior user set. For example, suppose the ratio of the numbers of users in the layers is 1:2:50:100:3:1, the layered sampling ratio after undersampling adjustment is 1:2:5:10:3:1, and the preset proportion is 4:4:2. Then 40% of the total number of seed users is the number of users in the training user set; the number of users allocated to each layer of the training user set is calculated from the layered sampling ratio, that number of users is extracted from the corresponding layer, and the extracted users form the training user set. Similarly, 20% of the total number of seed users is the number of users in the test user set, whose users are extracted in the same way. The remaining users, 40% of the total number of seed users, form the prior user set.
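The split described above can be sketched in Python as follows. This is an illustration only: the function name `split_seed_users` and the synthetic user identifiers are hypothetical. Three choices are assumptions not stated in the text: quotas are rounded down with integer division, a quota is capped at the layer size, and the test users are drawn from users not already placed in the training set. The example population already follows the ratio 1:2:5:10:3:1 so that every quota fits.

```python
import random
from typing import Dict, List, Sequence, Tuple


def split_seed_users(
    users_by_layer: Dict[int, List[str]],
    sampling_ratio: Sequence[int],
    split: Tuple[int, int, int] = (4, 4, 2),  # training : prior : test
    seed: int = 0,
) -> Tuple[List[str], List[str], List[str]]:
    """Return (training, prior, test) user lists drawn layer by layer."""
    rng = random.Random(seed)
    total = sum(len(users) for users in users_by_layer.values())
    denom = sum(split) * sum(sampling_ratio)
    train_part, _prior_part, test_part = split
    train, prior, test = [], [], []
    for layer, share in zip(sorted(users_by_layer), sampling_ratio):
        pool = list(users_by_layer[layer])
        rng.shuffle(pool)
        # Per-layer quota = set size * layer share of the sampling ratio.
        n_train = min(len(pool), total * train_part * share // denom)
        n_test = min(len(pool) - n_train, total * test_part * share // denom)
        train.extend(pool[:n_train])
        test.extend(pool[n_train:n_train + n_test])
        # Everything not drawn for training or testing is a prior user.
        prior.extend(pool[n_train + n_test:])
    return train, prior, test


sizes = [100, 200, 500, 1000, 300, 100]
users_by_layer = {i: [f"u{i}_{j}" for j in range(n)] for i, n in enumerate(sizes)}
train, prior, test = split_seed_users(users_by_layer, [1, 2, 5, 10, 3, 1])
set_sizes = (len(train), len(prior), len(test))
```

With 2,200 seed users this yields 880 training users, 880 prior users, and 440 test users, matching the 4:4:2 proportion. When the seed population is much more skewed than the sampling ratio, the caps take effect and the prior user set absorbs the surplus of the large layers.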

The numbers of users or the order quantities of the layers in the prior user set obtained by this sampling method may be unbalanced. To make full use of the data of the users in the prior user set, prior features are constructed from that data so as to fully depict the prior information, reduce the influence of the uneven distribution of users or order quantities across layers, and mine the features related to layering.

When the prior feature is constructed by utilizing the prior user set, acquiring order data of each prior user in the prior user set within first preset time, wherein the first preset time can be set by user definition; and then extracting item information from the order data, wherein the item information optionally comprises an item identifier, and further comprises an item type identifier, a brand identifier and the like. The article identifier, the category identifier and the brand identifier are respectively article information with different granularities (commodity granularity, category granularity and brand granularity), and the article information can also comprise information with other granularities and can be set according to actual business requirements.

The prior characteristics are empirical characteristics related to the layers obtained from statistics of historical data, for example, the influence of a certain commodity on each layer can be scored according to a certain rule, and the higher the score is, the more likely the corresponding person in the layer is to purchase the commodity. For example, a class of merchandise may have the highest score among the high loyalty groups, indicating that the high loyalty groups are most likely to buy the class of merchandise as compared to other loyalty tiers, and also indicating that the score of the class of merchandise is somewhat indicative of loyalty, and therefore, the prior score of the class of merchandise may be used as a characteristic in loyalty tier prediction. Therefore, the item information of each granularity, such as each commodity, each item class, each brand and the like, can depict loyalty through the prior score, and can participate in the construction of the model, so that the characteristics of the model are richer, and the prediction result of the model is more accurate.

In the embodiment of the invention, the construction of the prior characteristics according to the article information and the layering result of each prior user comprises the following steps: determining prior characteristics of multiple dimensions aiming at the article identification according to the layering result of each prior user, the article identification in the order data of each prior user and the order quantity containing the article identification in the order data of each prior user.

In an optional implementation manner of the embodiment of the present invention, the prior feature includes a first prior feature and a second prior feature, the item information includes an item identifier, and the constructing the prior feature according to the item information and a hierarchical result of each prior user includes:

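for any item identifier,

determining first prior characteristics of multiple dimensions according to the total number of prior users containing any article identifier in order data and the number of prior users corresponding to each layer containing any article identifier in order data;

and determining second prior characteristics of multiple dimensions according to the order quantity of any item identifier contained in the order data of each layered prior user and the total order quantity of any item identifier contained in the order data of all prior users.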

In a further optional implementation manner of the embodiment of the present invention, the prior features further include a third prior feature and a fourth prior feature, and constructing the prior features according to the item information and the hierarchical result of each prior user further includes:

determining third prior characteristics of multiple dimensions according to the first prior characteristics, the number of prior users corresponding to each hierarchy and the total number of prior users in a prior user set;

and determining fourth prior characteristics of multiple dimensions according to the second prior characteristics, the number of the prior users corresponding to each hierarchy and the total number of the prior users in the prior user set.

In the embodiment of the invention, order data of each prior user in the prior user set within the first preset time is obtained first, and item information such as the item identifier is obtained from the order data. From the order data, the order quantity of each prior user, the order quantity containing each item identifier, and the layering result of each prior user are obtained, so that the relationship between the layers and the order quantity can be determined in subsequent steps.

Then, for any item identifier, the prior users whose order data contains the item identifier are determined; the number of such prior users in each layer is determined from the layering result of each prior user, and the order quantity containing the item identifier is determined for the prior users of each layer. Generally, the more users in a given layer who order an item and the larger that layer's order quantity, the more relevant the item is to that layer. For popular products, however, both the number of ordering users and the order quantity are large in every layer, so describing the correlation between products and layers by absolute counts hurts accuracy. The prior features of multiple dimensions can therefore be constructed from each layer's share of the total number of ordering users and of the total order quantity.

Based on the number of prior users whose order data contains a given item identifier and the total order quantity containing that identifier, the data is split by layer at item-identifier granularity and processed into the prior features of that item identifier for each layer.

The first prior feature and the second prior feature of the embodiment of the invention are constructed as ratios. Using ratios as prior features reduces the difference between hot and cold commodities: a hot commodity has many ordering users and many orders, which can interfere with the depiction of user layers. For example, suppose a best-selling item A has 500,000 orders and its high-loyalty prior feature, expressed as an absolute number, is 200,000, while a cold item B has only 10,000 orders and its high-loyalty prior feature is 3,000. Prior features of this kind have two disadvantages: first, they interfere with training, because the way commodities depict loyalty should be comparable from a business perspective; second, the data range is large, so subsequent normalization is needed. To overcome both disadvantages, the prior features of the embodiment of the invention use ratios instead of absolute numbers: the high-loyalty prior feature of best-selling item A is 200,000/500,000 = 0.4, and that of cold item B is 3,000/10,000 = 0.3. This removes the interference from an item's own popularity and also normalizes the data.
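The ratio construction described above can be sketched in Python as follows. This is a minimal sketch, not the patented implementation: the function name `build_ratio_priors`, the input layout (one `(user_id, item_id)` pair per order containing the item), and the sample data are all hypothetical. The first prior feature is computed over distinct ordering users and the second over order counts.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def build_ratio_priors(
    orders: Iterable[Tuple[str, str]],
    user_layer: Dict[str, int],
    n_layers: int,
) -> Tuple[Dict[str, List[float]], Dict[str, List[float]]]:
    """Return (first, second) prior features per item, one value per layer."""
    users = defaultdict(lambda: [set() for _ in range(n_layers)])
    qty = defaultdict(lambda: [0] * n_layers)
    for user_id, item_id in orders:
        layer = user_layer[user_id]
        users[item_id][layer].add(user_id)  # distinct ordering users per layer
        qty[item_id][layer] += 1            # orders per layer
    first, second = {}, {}
    for item_id, layer_qty in qty.items():
        user_counts = [len(s) for s in users[item_id]]
        first[item_id] = [c / sum(user_counts) for c in user_counts]
        second[item_id] = [q / sum(layer_qty) for q in layer_qty]
    return first, second


# Layer 0 = low loyalty, layer 1 = high loyalty (hypothetical labels).
user_layer = {"u1": 1, "u2": 1, "u3": 0, "u4": 0, "u5": 0}
orders = [("u1", "A"), ("u1", "A"), ("u2", "A"),
          ("u3", "A"), ("u4", "A"), ("u5", "A")]
first, second = build_ratio_priors(orders, user_layer, n_layers=2)
```

In this toy data two of the five users who ordered item A are in the high-loyalty layer, so the first prior feature for that layer is 0.4, while the high-loyalty layer accounts for three of the six orders, so the second prior feature is 0.5.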

Calculating ratios depicts the influence of commodities on user layering well and solves the problems of hot and cold commodities and of a large data range. However, when the sample sizes are unbalanced, for example in the number of users or the order quantity of each layer, the imbalance also interferes with the result. Unlike the conventional approach of discarding unbalanced data, the imbalance is handled here by adjusting the prior features. For example, if the sample contains especially many medium-loyalty users, then for every item the prior feature scores highest for medium loyalty, whether in the number-of-users dimension or the order-quantity dimension. Every item then appears strongly associated with medium loyalty, but only because there are many medium-loyalty users. Therefore the first prior feature and the second prior feature can be adjusted by dividing them by the proportion of each layer's number of users (or order quantity) in the total number of users (or total order quantity) of the prior user set, which yields the third prior feature and the fourth prior feature. These better avoid the influence of unbalanced sample sizes and better indicate the strength of the correlation between a commodity and a layer: when the value of the first prior feature is higher than the layer's share of the total number of users, or the value of the second prior feature is higher than the layer's share of the total order quantity, the third or fourth prior feature is higher and the commodity is strongly correlated with that layer.
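The adjustment described above amounts to dividing each layer's ratio by that layer's share of the prior user set. A minimal Python sketch follows; the function name `adjust_by_layer_share` and the numbers are hypothetical, and the same routine is assumed to serve for both the third prior feature (using user counts) and the fourth prior feature (using user counts or order quantities, depending on how the layer share is defined).

```python
from typing import List, Sequence


def adjust_by_layer_share(
    feature: Sequence[float],
    layer_totals: Sequence[int],
) -> List[float]:
    """Divide each layer's ratio by that layer's share of the overall total."""
    total = sum(layer_totals)
    return [f / (n / total) for f, n in zip(feature, layer_totals)]


# Prior user set with low, medium, and high loyalty layers; medium dominates.
layer_users = [250, 500, 250]

# Item X: its raw first prior feature simply mirrors the layer sizes.
third_x = adjust_by_layer_share([0.25, 0.5, 0.25], layer_users)

# Item Y: half of its ordering users come from the small low-loyalty layer.
third_y = adjust_by_layer_share([0.5, 0.25, 0.25], layer_users)
```

A value of 1.0 means the item is ordered by a layer in proportion to that layer's size. Item X scores highest for medium loyalty before adjustment but is neutral in every layer afterwards, whereas item Y stands out in the low-loyalty layer with an adjusted value of 2.0.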

Fig. 3 is a schematic flow chart of a method for constructing prior features according to an embodiment of the present invention. First, the user layer of each prior user, each commodity identifier in the order data, and the number of orders the user placed for each commodity are determined from the prior user set and the order data.

The data are then aggregated in two ways. The first is aggregation by commodity and layer: for each commodity identifier, the number of prior users in each layer who ordered the commodity and the order quantity of the commodity placed by the users of each layer are determined. The second is aggregation by commodity: for each commodity identifier, the total number of people who ordered the commodity and the total order quantity of the commodity are determined.

Next, with the commodity as the granularity, the aggregates are processed into a first prior feature and a second prior feature of multiple dimensions according to the user layers. The first prior feature comprises the ratio of the number of people in each layer who ordered the commodity to the total number of people who ordered the commodity, such as: number of first-layer people ordering the commodity / total number of people ordering the commodity, number of second-layer people ordering the commodity / total number of people ordering the commodity, and so on. The second prior feature comprises the ratio of the order quantity of the commodity placed by each layer to the total order quantity of the commodity, such as: order quantity of the commodity placed by first-layer users / total order quantity of the commodity, order quantity of the commodity placed by second-layer users / total order quantity of the commodity, and so on.

Then, still with the commodity as the granularity, the first prior feature and the second prior feature are adjusted by the user layering proportions to obtain a third prior feature and a fourth prior feature, respectively. The third prior feature is the ratio of the first prior feature to the proportion of the number of people in the corresponding layer to the total number of prior users, such as: (number of first-layer people ordering the commodity / total number of people ordering the commodity) / (number of first-layer people / total number of prior users), (number of second-layer people ordering the commodity / total number of people ordering the commodity) / (number of second-layer people / total number of prior users), and so on. The fourth prior feature is the ratio of the second prior feature to the proportion of the order quantity of the corresponding layer to the total order quantity, such as: (order quantity of the commodity placed by first-layer users / total order quantity of the commodity) / (order quantity of the first layer / total order quantity of the prior user set), (order quantity of the commodity placed by second-layer users / total order quantity of the commodity) / (order quantity of the second layer / total order quantity of the prior user set), and so on.

In this way, prior features of multiple dimensions are constructed at commodity granularity. Similarly, the method can construct prior features of multiple dimensions at category granularity, brand granularity, or any other granularity related to layering.
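The construction flow of Fig. 3 can be sketched as follows. This is a minimal illustration under stated assumptions, not the embodiment's implementation: the input layout (a user-to-layer mapping plus `(user_id, item_id, n_orders)` records), the function name `build_prior_features`, and the choice to return 0.0 for a layer with no orders are all hypothetical.

```python
from collections import defaultdict


def build_prior_features(user_layer, orders):
    """Return {item: {layer: (first, second, third, fourth)}}.

    user_layer: {user_id: layer} for the prior user set (assumed layout).
    orders: iterable of (user_id, item_id, n_orders) with n_orders >= 1.
    """
    buyers = defaultdict(lambda: defaultdict(set))  # item -> layer -> purchasers
    volume = defaultdict(lambda: defaultdict(int))  # item -> layer -> order quantity
    layer_orders = defaultdict(int)                 # layer -> order quantity overall
    for user, item, n in orders:
        layer = user_layer[user]
        buyers[item][layer].add(user)
        volume[item][layer] += n
        layer_orders[layer] += n

    layer_users = defaultdict(int)                  # layer -> number of prior users
    for layer in user_layer.values():
        layer_users[layer] += 1
    total_users = len(user_layer)
    total_orders = sum(layer_orders.values())

    features = {}
    for item in buyers:
        # Each user belongs to one layer, so the per-layer sets are disjoint.
        item_buyers = sum(len(users) for users in buyers[item].values())
        item_orders = sum(volume[item].values())
        features[item] = {}
        for layer in layer_users:
            first = len(buyers[item][layer]) / item_buyers
            second = volume[item][layer] / item_orders
            third = first / (layer_users[layer] / total_users)
            # Assumption: a layer with no orders at all gets 0.0.
            fourth = (second / (layer_orders[layer] / total_orders)
                      if layer_orders[layer] else 0.0)
            features[item][layer] = (first, second, third, fourth)
    return features


user_layer = {"u1": "high", "u2": "high", "u3": "low", "u4": "low"}
orders = [("u1", "A", 3), ("u2", "A", 1), ("u3", "A", 4),
          ("u3", "B", 2), ("u4", "B", 2)]
features = build_prior_features(user_layer, orders)
print(features["A"]["high"])
```

The same function could be applied at category or brand granularity by passing category or brand identifiers in place of item identifiers.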

The prior features are keyed by article identifier, category identifier, brand identifier, and the like, whereas model training, model testing, and model prediction all operate on users. The prior features of the article identifiers therefore need to be matched to each user to obtain that user's prior feature data in multiple dimensions.

In the embodiment of the present invention, based on the prior characteristics, determining the prior characteristic data of each training user according to the order data of each training user within the second preset time includes:

determining prior characteristic values of multiple dimensions of each item identifier in order data of each training user based on prior characteristics of the multiple dimensions;

and for any dimension of the multiple dimensions, determining the sum of the prior feature values of the item identifiers in that dimension as the prior feature data of the training user for that dimension.

For each training user in the training user set, the order data of the training user within a second preset time are obtained. The second preset time can be user-defined and may be the same as or different from the first preset time. All article identifiers in the order data are obtained, and the prior feature values of each article identifier are determined from the prior features of multiple dimensions, which comprise the first prior feature and the second prior feature and may further comprise the third prior feature and the fourth prior feature.

Optionally, for any training user, all article identifiers in the order data of that training user are obtained, and for any dimension, the prior feature values of all the article identifiers in that dimension are summed to serve as the prior feature data of the training user for that dimension. In this way, the prior feature data of each training user in the training user set are obtained for the multiple dimensions.

Optionally, the prior feature values of the article identifiers in any dimension may instead be combined by weighted summation to serve as the prior feature data of the training user for that dimension, so as to obtain the prior feature data of each training user in the training user set for the multiple dimensions. The weight of each article identifier may be user-defined, or the weights may be assigned according to time decay, with more recently ordered articles receiving larger weights.
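Matching item-level prior features to a user can be sketched as follows. The text only requires that more recent orders receive larger weights; the exponential half-life form, the function name `user_prior_data`, the `(item_id, days_since_order)` layout, and the choice to skip items that never appeared in the prior user set are assumptions made for illustration.

```python
def user_prior_data(order_items, item_priors, half_life_days=None):
    """Aggregate item-level prior feature values into one vector per user.

    order_items: list of (item_id, days_since_order) for one user (assumed layout).
    item_priors: non-empty {item_id: [value_dim0, value_dim1, ...]}.
    half_life_days: None for a plain sum; otherwise an order's weight halves
        every half_life_days days (one possible time-decay scheme).
    """
    n_dims = len(next(iter(item_priors.values())))
    totals = [0.0] * n_dims
    for item, age_days in order_items:
        values = item_priors.get(item)
        if values is None:
            # Assumption: items unseen in the prior user set contribute nothing.
            continue
        weight = 1.0 if half_life_days is None else 0.5 ** (age_days / half_life_days)
        for dim, value in enumerate(values):
            totals[dim] += weight * value
    return totals


item_priors = {"A": [0.5, 2.0], "B": [0.25, 1.0]}   # two dimensions per item
user_orders = [("A", 0), ("B", 30), ("C", 5)]        # "C" has no prior feature

plain_sum = user_prior_data(user_orders, item_priors)
decayed = user_prior_data(user_orders, item_priors, half_life_days=30)
print(plain_sum)  # [0.75, 3.0]
print(decayed)    # [0.625, 2.5]
```

With a 30-day half-life, the order for item B placed 30 days ago counts half as much as the order for item A placed today.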

Similarly, the prior characteristic data corresponding to any training user can be determined according to the prior characteristics of the category identifiers and the brand identifiers.

In the embodiment of the present invention, the basic feature data includes portrait feature data and/or user behavior feature data of the user, and the user behavior feature data may include data such as ordering, browsing, purchasing, paying attention, and the like.

Model training is performed with the basic feature data and prior feature data of each training user in the training user set, together with the layering result of each training user; the model parameters are adjusted and continuously optimized according to the model's performance to obtain the user layering model. Correspondingly, model verification and parameter tuning are performed with the basic feature data and prior feature data of each test user in the test user set, so that the parameters of the user layering model are further optimized and its effect is improved. The model is a multi-class classification model and may adopt one or more of a logistic regression model (for example, extended to multiple classes in a 1vs1 (one-vs-one) or 1vs rest (one-vs-rest) manner), a tree model, a random forest model, and a deep learning model (such as a DNN, Deep Neural Network). A suitable model can be selected according to business requirements, data characteristics, and the like.

In the embodiment of the present invention, the prior features of multiple dimensions include a user-quantity dimension and an order dimension. Prior features of other dimensions may also be constructed according to the actual business, such as a PV (Page View, i.e., the number of page views or clicks) dimension or a UV (Unique Visitor, i.e., the number of independent visitors, such as distinct computer clients accessing a website) dimension.

In the embodiment of the invention, the determining of the prior characteristic data of the user to be predicted according to the article information comprises the following steps:

determining prior characteristic values of multiple dimensions of each item identifier in order data of a user to be predicted based on prior characteristics of the multiple dimensions;

and for any dimension of the multiple dimensions, determining the sum of the prior feature values of the item identifiers in that dimension as the prior feature data of the user to be predicted for that dimension.

After the user layering model is obtained, it can be used to predict the layering result of a user to be predicted. The basic feature data of the user to be predicted and the order data within a preset time are obtained; the preset time may be the same as the first preset time and/or the second preset time. All article identifiers are obtained from the order data, and category identifiers, brand identifiers, and the like may also be included. Based on the prior features of multiple dimensions, the prior feature values of the multiple dimensions corresponding to each article identifier are determined. Then, for any dimension, the sum of the prior feature values of the article identifiers in that dimension is calculated; the sum, its average value, or a weighted sum can be used as the prior feature data of the user to be predicted for that dimension, so that prior feature data of multiple dimensions are obtained for the user to be predicted. The weight of each article identifier can be user-defined or assigned according to time decay, with more recently ordered articles receiving larger weights. The basic feature data and the prior feature data of the user to be predicted are then input into the user layering model to obtain the layering result of the user to be predicted. With layering results predicted for the users to be predicted, accurate marketing and delivery can be carried out for user groups of different layers.

Fig. 4 is a schematic flow chart of a user layering method according to an embodiment of the present invention. Basic feature data and order data of a plurality of users with loyalty layering results (i.e., seed users) are obtained from the site-wide users of an e-commerce platform. The numbers of users in the training user set, the prior user set, and the test user set are determined from the number of the plurality of users according to a ratio of 4:4:2, and a layered sampling ratio can be determined in an undersampling manner. Training users are extracted from the plurality of users as the training user set according to the layered sampling ratio and the number of users in the training user set; test users are extracted from the plurality of users as the test user set according to the layered sampling ratio and the number of users in the test user set; and the remaining users serve as the prior user set. Prior features are constructed from the data of the prior users in the prior user set and matched with users to obtain the users' prior feature data. Then, the basic feature data (such as user portrait feature data and user behavior feature data) and prior feature data of each training user in the training user set, together with the loyalty layer of each training user, are used as training samples for model training. Next, the basic feature data and prior feature data of each test user in the test user set, together with the loyalty layer of each test user, are used as test samples to verify the model and adjust its parameters, yielding the user layering model. Finally, prediction samples are obtained from the site-wide users to predict the layering results of the users to be predicted: the basic feature data and order data of a user to be predicted are obtained, article identifiers are obtained from the order data, the prior feature data of the user to be predicted are obtained based on the prior features and the article identifiers, and the basic feature data and prior feature data of the user to be predicted are input into the user layering model to obtain the loyalty layering result of the user to be predicted and, in turn, the loyalty-layered user groups.
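The three-way split described for Fig. 4 can be sketched as follows. This is only one possible reading: the function name `split_users`, the explicit `layer_ratio` argument (the undersampled layer proportions, which the text does not give a formula for), and the example numbers are assumptions.

```python
import random
from collections import Counter, defaultdict


def split_users(user_layer, layer_ratio, set_ratio=(4, 4, 2), seed=0):
    """Split seed users into (train, prior, test) lists of user ids.

    user_layer: {user_id: layer} for all users with a layering result.
    layer_ratio: {layer: weight}, the layered sampling ratio after undersampling
        (assumed to be supplied by the caller).
    set_ratio: train : prior : test sizes, 4:4:2 as in the text.
    """
    rng = random.Random(seed)
    by_layer = defaultdict(list)
    for user in sorted(user_layer):
        by_layer[user_layer[user]].append(user)

    total = len(user_layer)
    n_train = total * set_ratio[0] // sum(set_ratio)
    n_test = total * set_ratio[2] // sum(set_ratio)
    ratio_sum = sum(layer_ratio.values())

    train, prior, test = [], [], []
    for layer, users in by_layer.items():
        rng.shuffle(users)
        # Per-layer quotas follow the layered sampling ratio, capped by availability.
        k_train = min(n_train * layer_ratio[layer] // ratio_sum, len(users))
        k_test = min(n_test * layer_ratio[layer] // ratio_sum, len(users) - k_train)
        train += users[:k_train]
        test += users[k_train:k_train + k_test]
        prior += users[k_train + k_test:]  # the remaining users form the prior set
    return train, prior, test


# Hypothetical seed users with raw layer proportions 1:3:1.
user_layer = {}
for layer, n in (("high", 200), ("medium", 600), ("low", 200)):
    for i in range(n):
        user_layer[f"{layer}_{i}"] = layer

# The medium layer is undersampled from 1:3:1 to 1:2:1 (an assumed target).
train, prior, test = split_users(user_layer, {"high": 1, "medium": 2, "low": 1})
print(len(train), len(prior), len(test))  # 400 400 200
print(Counter(user_layer[u] for u in train))
```

Because the users left over after drawing the training and test sets go to the prior user set, the over-represented medium layer ends up mostly there, where the third and fourth prior features correct for the imbalance.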

The embodiment of the invention provides a user layering method. The numbers of users in the training user set, the test user set, and the prior user set are determined from a plurality of users with layering results, and the unbalanced distribution of the number of users across layers is addressed by undersampling to obtain the users of the training user set and the test user set. The remaining users serve as the prior user set, and prior features are constructed from the data of these prior users, so that the data of users with layering results are fully utilized, the interference of unbalanced data distribution is reduced, and the performance of the model is improved through efficient use of the data. The prior features are constructed from the order data of the prior users in the prior user set: indexes related to layering are mined through cross statistics of the layers against the distribution of users over article identifiers (including category identifiers and brand identifiers). The features are fully compressed in the process of constructing the prior features, so that sparse features are condensed into dense prior features that carry business meaning, providing high-quality prior feature data for the model and further improving its effect. The model is then trained with the basic feature data and the prior feature data, which enriches the features of the model and yields a user layering model with high prediction accuracy. The layering results of users to be predicted are predicted with the user layering model, and accurate delivery and marketing are achieved for user groups of different layers.

As shown in fig. 5, another aspect of the present invention provides an apparatus 500 for user layering, including:

an obtaining module 501, configured to obtain basic feature data of a user to be predicted and order data within a preset time;

a first determining module 502, configured to obtain article information from the order data and determine the prior feature data of the user to be predicted according to the article information;

a second determining module 503, configured to input the basic feature data and the prior feature data into the trained user layering model and determine the layering result of the user to be predicted.

In an embodiment of the present invention, the user layering apparatus further includes a model training module, configured to: obtaining a training user set, a prior user set and a test user set according to a plurality of users and the layering result of each user; acquiring order data of each prior user in a prior user set within first preset time, acquiring article information from the order data of each prior user, and constructing prior characteristics according to the article information and a layering result of each prior user; acquiring basic characteristic data of each training user in a training user set and order data in second preset time, and determining the prior characteristic data of each training user according to the order data in the second preset time of each training user based on the prior characteristics; and performing model training according to the prior characteristic data and the basic characteristic data of each training user and the layering result of each training user, and performing model testing according to the prior characteristic data and the basic characteristic data of each testing user in the testing user set and the layering result of each testing user to obtain a user layering model.

In an embodiment of the present invention, the model training module is further configured to: determining the number of users corresponding to each hierarchy according to the plurality of users and the hierarchy result of each user; determining a layering sampling ratio according to the number of users corresponding to each layer; respectively determining the number of users in a training user set, a prior user set and a test user set according to a preset proportion; and respectively determining each user in the training user set, the prior user set and the test user set according to the number of the users in the training user set, the prior user set and the test user set, the hierarchical sampling proportion and a plurality of users.

In an embodiment of the present invention, the model training module is further configured to: determining the ratio of the number of users corresponding to each hierarchy according to the number of users corresponding to each hierarchy; and adjusting the ratio of the number of users corresponding to each layer in an undersampling mode to obtain the layer sampling ratio.

In an embodiment of the present invention, the model training module is further configured to: based on the number of users in the training user set, acquiring each layered user from a plurality of users according to a layered sampling proportion so as to determine each user in the training user set; based on the number of users in the test user set, acquiring each layered user from a plurality of users according to a layered sampling proportion so as to determine each user in the test user set; and determining each user in the prior user set according to all users and each user in the training user set and the testing user set.

In an embodiment of the present invention, the article information includes an article identifier, and the model training module is further configured to: determining prior characteristics of multiple dimensions aiming at the article identification according to the layering result of each prior user, the article identification in the order data of each prior user and the order quantity containing the article identification in the order data of each prior user.

In an embodiment of the present invention, the prior characteristics include a first prior characteristic and a second prior characteristic, the article information includes an article identifier, and the model training module is further configured to: aiming at any article identifier, determining first prior characteristics of multiple dimensions according to the total number of prior users containing any article identifier in order data and the number of prior users corresponding to each layer containing any article identifier in order data; and determining second prior characteristics of multiple dimensions according to the order quantity of any item identifier contained in the order data of each layered prior user and the total order quantity of any item identifier contained in the order data of all prior users.

In an embodiment of the present invention, the prior characteristics further include a third prior characteristic and a fourth prior characteristic, and the model training module is further configured to: determine third prior characteristics of multiple dimensions according to the first prior characteristics, the number of prior users corresponding to each hierarchy, and the total number of prior users in the prior user set; and determine fourth prior characteristics of multiple dimensions according to the second prior characteristics, the number of prior users corresponding to each hierarchy, and the total number of prior users in the prior user set.

In an embodiment of the present invention, the model training module is further configured to: determining prior characteristic values of multiple dimensions of each item identification in order data of each training user based on prior characteristics of the multiple dimensions; and determining the sum of the prior characteristic values of the item identifications in any dimension aiming at any dimension in the plurality of dimensions, wherein the sum is used as the prior characteristic data corresponding to any dimension for training the user.

In this embodiment of the present invention, the second determining module 503 is further configured to: determining prior characteristic values of multiple dimensions of each item identifier in order data of a user to be predicted based on prior characteristics of the multiple dimensions; and determining the sum of the prior characteristic values of each item identifier in any dimension as prior characteristic data of the user to be predicted corresponding to any dimension aiming at any dimension in the plurality of dimensions.

In another aspect, an embodiment of the present invention provides an electronic device, including: one or more processors; and a storage device for storing one or more programs which, when executed by the one or more processors, cause the one or more processors to implement the user layering method of an embodiment of the present invention.

Yet another aspect of the embodiments of the present invention provides a computer-readable medium, on which a computer program is stored, where the computer program, when executed by a processor, implements the user layering method of the embodiments of the present invention.

Fig. 6 illustrates an exemplary system architecture 600 to which the user layering method or the user layering apparatus of embodiments of the present invention may be applied.

As shown in fig. 6, the system architecture 600 may include terminal devices 601, 602, 603, a network 604, and a server 605. The network 604 serves to provide a medium for communication links between the terminal devices 601, 602, 603 and the server 605. The network 604 may include various types of connections, such as wired or wireless communication links, or fiber optic cables.

A user may use the terminal devices 601, 602, 603 to interact with the server 605 via the network 604 to receive or send messages or the like. The terminal devices 601, 602, 603 may have installed thereon various communication client applications, such as shopping applications, web browser applications, search applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only).

The terminal devices 601, 602, 603 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.

The server 605 may be a server providing various services, such as a background management server (by way of example only) providing support for shopping websites browsed by users using the terminal devices 601, 602, 603. The background management server may analyze and otherwise process received data such as a product information query request, and feed back a processing result (for example, target push information or product information, by way of example only) to the terminal device.

It should be noted that the method for user hierarchy provided by the embodiment of the present invention is generally executed by the server 605, and accordingly, the device for user hierarchy is generally disposed in the server 605.

It should be understood that the number of terminal devices, networks, and servers in fig. 6 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for an implementation.

Referring now to FIG. 7, shown is a block diagram of a computer system 700 suitable for use with a terminal device implementing an embodiment of the present invention. The terminal device shown in fig. 7 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.

As shown in fig. 7, the computer system 700 includes a Central Processing Unit (CPU) 701, which can perform various appropriate actions and processes in accordance with a program stored in a Read Only Memory (ROM) 702 or a program loaded from a storage section 708 into a Random Access Memory (RAM) 703. In the RAM 703, various programs and data necessary for the operation of the system 700 are also stored. The CPU 701, the ROM 702, and the RAM 703 are connected to each other via a bus 704. An input/output (I/O) interface 705 is also connected to bus 704.

The following components are connected to the I/O interface 705: an input section 706 including a keyboard, a mouse, and the like; an output section 707 including components such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage section 708 including a hard disk and the like; and a communication section 709 including a network interface card such as a LAN card, a modem, or the like. The communication section 709 performs communication processing via a network such as the internet. A drive 710 is also connected to the I/O interface 705 as needed. A removable medium 711 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is mounted on the drive 710 as necessary, so that a computer program read out therefrom is installed into the storage section 708 as necessary.

In particular, according to the embodiments of the present disclosure, the processes described above with reference to the flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method illustrated in the flow chart. In such an embodiment, the computer program can be downloaded and installed from a network through the communication section 709, and/or installed from the removable medium 711. The computer program performs the above-described functions defined in the system of the present invention when executed by the Central Processing Unit (CPU) 701.

It should be noted that the computer readable medium shown in the present invention can be a computer readable signal medium or a computer readable storage medium or any combination of the two. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples of the computer readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the present invention, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, a computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.

The flowchart and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.

The modules described in the embodiments of the present invention may be implemented by software or hardware. The described modules may also be provided in a processor, which may be described as: a processor including an obtaining module, a first determining module, and a second determining module. In some cases the names of the modules do not limit the modules themselves; for example, the obtaining module may also be described as "a module for obtaining basic feature data of the user to be predicted and order data within a preset time".

As another aspect, the present invention also provides a computer-readable medium, which may be contained in the apparatus described in the above embodiments, or may exist separately without being incorporated into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform: acquiring basic feature data of a user to be predicted and order data within a preset time; acquiring article information from the order data, and determining prior feature data of the user to be predicted according to the article information; and inputting the basic feature data and the prior feature data into the trained user layering model, and determining the layering result of the user to be predicted.

According to the technical scheme of the embodiment of the invention, a user layering method is provided. The numbers of users in the training user set, the test user set, and the prior user set are determined from a plurality of users with layering results, and the unbalanced distribution of the number of users across layers is addressed by undersampling to obtain the users of the training user set and the test user set. The remaining users serve as the prior user set, and prior features are constructed from the data of these prior users, so that the data of users with layering results are fully utilized, the interference of unbalanced data distribution is reduced, and the performance of the model is improved through efficient use of the data. The prior features are constructed from the order data of the prior users in the prior user set: indexes related to layering are mined through cross statistics of the layers against the distribution of the prior users over article identifiers (including category identifiers and brand identifiers). The features are fully compressed in the process of constructing the prior features, so that sparse features are condensed into dense prior features that carry business meaning, providing high-quality data for the model and further improving its effect. The model is then trained with the basic feature data and the prior feature data, which enriches the features of the model and yields a user layering model with high prediction accuracy. The layering results of users to be predicted are predicted with the user layering model, and accurate marketing and delivery are achieved for user groups of different layers.

The above-described embodiments should not be construed as limiting the scope of the invention. It should be understood by those skilled in the art that various modifications, combinations, sub-combinations and substitutions may occur depending on design requirements and other factors. Any modification, equivalent replacement, and improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims

1. A method of user stratification, comprising:

2. The method of claim 1, wherein the user hierarchy model is trained in the following manner:

and performing model training according to the prior characteristic data and the basic characteristic data of each training user and the layering result of each training user, and performing model verification according to the prior characteristic data and the basic characteristic data of each testing user in the testing user set and the layering result of each testing user to obtain the user layering model.

3. The method of claim 1, wherein obtaining a training user set, a prior user set, and a testing user set according to a plurality of users and a hierarchical result of each user comprises:

4. The method of claim 3, wherein determining the hierarchical sampling ratio according to the number of users corresponding to each hierarchy comprises:

determining the ratio of the number of users corresponding to each hierarchy according to the number of users corresponding to each hierarchy;

and adjusting the ratio of the number of the users corresponding to each layer in an undersampling mode to obtain the layered sampling ratio.
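The two steps of claim 4 can be sketched as follows. The claim does not state how the ratio is adjusted, so the capping rule here (no layer may keep more than `max_ratio` times the users of the smallest layer) and the names `stratified_sampling_ratio` and `max_ratio` are hypothetical choices used only to show one form of undersampling.

```python
def stratified_sampling_ratio(layer_counts, max_ratio=3.0):
    """Derive a per-layer sampling proportion from raw user counts.

    layer_counts: dict mapping layer -> number of users with that layering result
    max_ratio: hypothetical cap on how much larger any layer may be than the smallest one
    """
    smallest = min(layer_counts.values())
    if smallest <= 0:
        raise ValueError("every layer needs at least one user")
    # Undersample: large layers are capped, small layers are kept in full.
    capped = {layer: min(n, max_ratio * smallest) for layer, n in layer_counts.items()}
    total = sum(capped.values())
    # Normalise so the proportions sum to 1.
    return {layer: c / total for layer, c in capped.items()}


# Toy counts, invented for the example: a heavily unbalanced distribution.
ratio = stratified_sampling_ratio({"high": 100, "mid": 1000, "low": 8900})
```

With these toy counts the raw proportion 1 : 10 : 89 becomes 1 : 3 : 3, so the majority layers no longer dominate the sampled sets.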

5. The method of claim 3, wherein determining each user in the training user set, the prior user set, and the testing user set separately comprises:

based on the number of users in the training user set, acquiring each layered user from the plurality of users according to the layered sampling proportion so as to determine each user in the training user set;

6. The method of claim 2, wherein the item information includes an item identification, and wherein constructing prior characteristics from the item information and the tiered results of each of the prior users comprises:

7. The method of claim 2, wherein the prior features include a first prior feature and a second prior feature, wherein the item information includes an item identification, and wherein constructing the prior features from the item information and the tiered results for each of the prior users comprises:

for any item identification:

8. The method of claim 7, wherein the prior features further include a third feature and a fourth feature, the constructing the prior features from the item information and the hierarchical results of each of the prior users further comprising:

9. The method according to any one of claims 6 to 8, wherein determining, based on the prior characteristics, prior characteristic data of each of the training users according to order data within a second preset time of each of the training users comprises:

determining prior characteristic values of a plurality of dimensions of each item identifier in order data of each training user based on prior characteristics of the plurality of dimensions;

10. The method according to claim 9, wherein determining a priori characteristic data of the user to be predicted from the item information comprises:

determining prior characteristic values of multiple dimensions of each item identifier in the order data of the user to be predicted based on the prior characteristics of the multiple dimensions;

and, for any dimension of the plurality of dimensions, determining the sum of the prior characteristic values of the item identifications in that dimension as the prior characteristic data of the user to be predicted corresponding to that dimension.
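The per-dimension summation of claim 10 can be sketched as below. The function name `user_prior_feature_data`, the dictionary layout, counting each distinct item identification once, and letting identifications without prior values contribute zero are assumptions, not details stated in the claim.

```python
def user_prior_feature_data(order_item_ids, item_prior_values, dimensions):
    """Sum, per dimension, the prior characteristic values of the items in a user's orders.

    order_item_ids: item identifications found in the user's order data
    item_prior_values: dict mapping item identification -> {dimension: prior value}
    dimensions: names of the prior characteristic dimensions
    """
    totals = {dim: 0.0 for dim in dimensions}
    # Assumption: each distinct item identification is counted once (order preserved).
    for item_id in dict.fromkeys(order_item_ids):
        values = item_prior_values.get(item_id, {})  # assumption: unseen items add 0
        for dim in dimensions:
            totals[dim] += values.get(dim, 0.0)
    return totals


# Toy values, invented for the example.
item_prior_values = {
    "sku_a": {"d1": 0.5, "d2": 0.25},
    "sku_b": {"d1": 0.25, "d2": 0.5},
}
data = user_prior_feature_data(["sku_a", "sku_b", "sku_a", "sku_x"], item_prior_values, ["d1", "d2"])
print(data)  # {'d1': 0.75, 'd2': 0.75}
```

The result has one value per dimension regardless of how many items the user ordered, so it can be concatenated with the basic characteristic data as model input.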

11. An apparatus for user stratification, comprising:

the acquisition module is used for acquiring basic characteristic data of a user to be predicted and order data in preset time;

and the second determination module is used for inputting the basic characteristic data and the prior characteristic data into a trained user hierarchical model and determining the hierarchical result of the user to be predicted.

12. An electronic device, comprising:

one or more processors;

a storage device for storing one or more programs,

wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method recited in any one of claims 1-10.

13. A computer-readable medium on which a computer program is stored, wherein the program, when executed by a processor, implements the method according to any one of claims 1-10.