CN108229590B - Method and device for acquiring multi-label user portrait - Google Patents

Method and device for acquiring multi-label user portrait

Info

Publication number
CN108229590B
CN108229590B (application CN201810148824.1A)
Authority
CN
China
Prior art keywords
user
users
classifier
label
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810148824.1A
Other languages
Chinese (zh)
Other versions
CN108229590A (en)
Inventor
张雅淋
李龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810148824.1A priority Critical patent/CN108229590B/en
Publication of CN108229590A publication Critical patent/CN108229590A/en
Priority to TW107146609A priority patent/TWI693567B/en
Priority to PCT/CN2019/073109 priority patent/WO2019157928A1/en
Application granted granted Critical
Publication of CN108229590B publication Critical patent/CN108229590B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the specification discloses a method and a device for training a user portrait classifier and a method and a device for acquiring a multi-label user portrait. The training method comprises the following steps: acquiring respective first feature vectors of a first group of users; obtaining respective first label values of the first group of users; training a first classifier by taking a set of values of a first feature vector and a first label of each user of the first group as a first training set; combining the respective first feature vectors of the first group of users with the values of the first labels to obtain respective second feature vectors of the first group of users; obtaining respective second label values of the first group of users; and training a second classifier by taking the respective second feature vector of the first group of users and the value set of the second label as a second training set.

Description

Method and device for acquiring multi-label user portrait
Technical Field
The invention relates to the field of machine learning, in particular to a method and a device for training a user portrait classifier and a method and a device for acquiring a multi-label user portrait.
Background
With the popularity and development of the internet, more and more data can be collected by various internet operators. For example, an e-commerce website can obtain information such as users' purchase records and browsing records; a search engine can obtain information such as users' search records and click records. In order to make better use of such information and provide more efficient and better services, user portraits are gaining attention. A user portrait is a labeled user model abstracted from information such as a user's social attributes, lifestyle habits, and consumption behaviors. The prior art includes methods for obtaining a user portrait based on a deep neural network, methods for obtaining a user portrait based on statistical data, and the like. However, these existing methods have shortcomings, so a more efficient scheme for acquiring a multi-label user portrait is desired.
Disclosure of Invention
Embodiments of the present disclosure aim to provide a more efficient scheme for obtaining a multi-tag user representation, so as to solve the deficiencies in the prior art.
To achieve the above object, one aspect of the present specification provides a method of training a user representation classifier, the classifier being a chain classifier including a first classifier and a second classifier, the user representation being a multi-label user representation, the method comprising: acquiring respective first feature vectors of a first group of users, wherein the first feature vectors correspond to user information, and the information comprises registration information of the users and operation history information of the users; obtaining respective first tag values of the first group of users, wherein the first tag values correspond to first tag information of the users; training a first classifier by taking a set of values of a first feature vector and a first label of each user of the first group as a first training set; combining the respective first feature vectors of the first group of users with the values of the first labels to obtain respective second feature vectors of the first group of users; obtaining respective values of second tags of the first group of users, wherein the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and training a second classifier by taking the respective second feature vector of the first group of users and the value set of the second label as a second training set.
In one embodiment, in the method for training the chain classifier, the information of the user includes label information of the user.
In one embodiment, in the method of training a chain classifier described above, the first label is age and the second label is purchase preference.
In one embodiment, in the method of training a chain classifier described above, the first label is purchase preference and the second label is purchasing power.
In another aspect of the present specification, there is provided a method of training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, wherein the first classifier is a first classifier trained by the above training method, and the user representation is a multi-label user representation, the method comprising: after training a first classifier, acquiring respective first feature vectors of a second group of users, wherein the second group of users comprises at least one user not belonging to the first group of users, the first feature vectors correspond to user information, and the information comprises registration information of the users and operation history information of the users; inputting the respective first feature vectors of the second group of users into the first classifier to obtain respective first label prediction values of the second group of users, and combining the first feature vectors and the first label prediction values of each user in the second group of users to obtain respective second feature vectors of the second group of users; obtaining respective values of second tags of a second group of users, wherein the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and training the second classifier by taking a set of values of a second feature vector and a second label of each user of the second group as a third training set.
Another aspect of the present specification provides a method for obtaining a multi-tag user representation, comprising: acquiring a first feature vector of a user based on user information; inputting the first feature vector into a first classifier obtained by training through the training method, and obtaining a first label predicted value of the user as a value of a first label of the user; combining the first feature vector with the value of the first label to obtain a second feature vector of the user; and inputting the second feature vector into a second classifier obtained by training through the training method, and obtaining a second label predicted value of the user as a value of a second label of the user.
In an embodiment, the method for obtaining a multi-label user portrait further includes: after obtaining the first feature vector of the user based on the user information, in a case where the first label information is included in the user information, replacing the first label predicted value with the corresponding preset value of that first label information as the value of the first label of the user.
In an embodiment, the method for obtaining a multi-label user portrait further includes: after obtaining the second feature vector of the user, in a case where the second label information is included in the user information, replacing the second label predicted value with the corresponding preset value of that second label information as the value of the second label of the user.
Another aspect of the present specification provides an apparatus for training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, the user representation being a multi-label user representation, the apparatus comprising: a first acquisition unit configured to acquire respective first feature vectors of a first group of users, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users; a second obtaining unit configured to obtain values of respective first tags of the first group of users, the values of the first tags corresponding to first tag information of the users; a first training unit configured to train a first classifier with a set of values of a first feature vector and a first label of each of the first group of users as a first training set; a third obtaining unit, configured to combine the first feature vector and the value of the first label of each of the first group of users to obtain a second feature vector of each of the first group of users; a fourth obtaining unit configured to obtain values of respective second tags of the first group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and a second training unit configured to train a second classifier with a set of values of a second feature vector and a second label of each of the first group of users as a second training set.
In another aspect of the present specification, there is provided an apparatus for training a user representation classifier, the classifier being a chain classifier including a first classifier and a second classifier, wherein the first classifier is a first classifier trained by the above training method, and the user representation is a multi-label user representation, the apparatus comprising: a fifth obtaining unit, configured to obtain respective first feature vectors of a second group of users, where the second group of users includes at least one user not belonging to the first group of users, the first feature vectors correspond to information of users, and the information includes registration information of the users and operation history information of the users; an input unit configured to input respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users; a combining unit configured to combine the first feature vector and the first label prediction value of each user in the second group of users to obtain a second feature vector of each user in the second group of users; a sixth obtaining unit configured to obtain values of respective second tags of a second group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and a third training unit configured to train the second classifier with a set of values of a second feature vector and a second label of each of the second group of users as a third training set.
Another aspect of the present specification provides an apparatus for obtaining a multi-tag user representation, comprising: a first acquisition unit configured to acquire a first feature vector of a user based on user information; a first input unit, configured to input the first feature vector into a first classifier obtained by training through the training method, and obtain a first label prediction value of the user as a value of a first label of the user; a second obtaining unit configured to combine the first feature vector with the value of the first tag to obtain a second feature vector of the user; and a second input unit configured to input the second feature vector into a second classifier obtained by training through the training method, and obtain a second label prediction value of the user as a value of a second label of the user.
By the scheme for acquiring the multi-label user portrait according to the embodiment of the specification, learning of each label of the user portrait is more accurate and reliable, and the acquired multi-label user portrait is more accurate.
Drawings
The embodiments of the present specification may be made clearer by describing them with reference to the attached drawings:
FIG. 1 shows a schematic diagram of a system 100 according to embodiments herein;
FIG. 2 illustrates a method of training a chain classifier in accordance with an embodiment of the present description;
FIG. 3 illustrates a method of obtaining a multi-tag user representation in accordance with an embodiment of the present description;
FIG. 4 illustrates an apparatus 400 for training a chain classifier in accordance with an embodiment of the present description; and
FIG. 5 illustrates an apparatus 500 for obtaining a multi-tag user representation in accordance with an embodiment of the present description.
Detailed Description
The embodiments of the present specification will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a system 100 according to an embodiment of the present description. As shown in fig. 1, the system 100 includes a classifier chain 11. In one embodiment, the classifier chain 11 includes a plurality of classifiers Cj, j = 1, …, n, each classifier Cj corresponding to one label of the user; these n classifiers are concatenated to form a chain. Each classifier Cj can be based on one of the following algorithms: decision tree, naive Bayes, support vector machine, association rule learning, neural network, or genetic algorithm. The n classifiers Cj may be based on the same algorithm or on different algorithms.
In one embodiment, as shown in FIG. 1, the classifier chain 11 includes 4 classifiers C1, C2, C3, and C4. For example, classifier C1 is a classifier corresponding to the gender label, classifier C2 corresponds to the age label, classifier C3 corresponds to the purchase-preference label, and classifier C4 corresponds to the purchasing-power label.
In training the classifier chain 11, first, a first training set t1 is input to the classifier C1; the training set t1 includes a plurality of feature vectors x1 corresponding to the information of respective users, and the label values λ1 of those users. In the case where C1 is a gender classifier, the label value λ1 corresponds to the gender of the user. C1 is trained on t1 to obtain a classifier C1 corresponding to the gender label. Thereafter, the training set t2 is input to the classifier C2. As shown in the figure, the training set t2 includes a plurality of feature vectors x2 corresponding to the information of respective users, and the label values λ2 of those users. In the case where C2 is an age classifier, the label value λ2 corresponds to the age bracket of the user. The feature vector x2 includes, in addition to the above-mentioned feature vector x1, the label value λ1 of each user, i.e., the value corresponding to the user's gender. C2 is trained with the training set t2, so that the classification of the user's age is associated with the user's gender label information. The later classifiers C3 and C4 are trained in the same way as C2: the feature vector x3 in t3 includes x2 and λ2, and the feature vector x4 in t4 includes x3 and λ3, thereby associating the respective labels of the user, which makes the learning of the sample labels more accurate and reliable. For example, in the case where C3 is a purchase-preference classifier, the label λ3 corresponds to the user's purchase preference, and the input feature vector x3 of C3 includes, in addition to the feature vector x2 used in C2, the label value λ2, i.e., the user's age label value.
After the training of all four classifiers C1-C4 is complete, the classifier chain 11 has been trained as a multi-label classification model that can be used to classify users with unknown labels. As shown in FIG. 1, the initial information of a user with unknown labels is expressed as a feature vector x1' and input to C1; the user information is classified by C1 to obtain a predicted value λ1' of the user's gender label. C1 passes the user information x1' and λ1' to C2, so that C2 classifies based on x1' and λ1' to obtain the predicted value λ2' of the user's age label. Thereafter, in the same manner as for C2, classifier C3 receives its feature vector x2' and λ2' from the preceding classifier C2, and classifies based on x2' and λ2' to obtain the predicted value λ3' of the purchase-preference label. Classifier C4 receives its feature vector x3' and λ3' from the preceding classifier C3, and classifies based on x3' and λ3' to obtain the predicted value λ4' of the purchasing-power label. In this way, a user portrait label set {λ1', λ2', λ3', λ4'} is obtained.
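The chained training and prediction flow described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: the toy MajorityClassifier base learner and the example data are assumptions, and any of the algorithms named above (decision tree, naive Bayes, SVM, etc.) could fill each slot in the chain.

```python
class MajorityClassifier:
    """Toy base learner: predicts the most frequent label seen in training."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict_one(self, x):
        return self.label

class ClassifierChain:
    def __init__(self, classifiers):
        self.classifiers = classifiers  # one classifier per label, in chain order

    def fit(self, X, Y):
        # Y[i] holds the label values (λ1..λn) of user i.
        X = [list(x) for x in X]  # work on copies; these grow as the chain advances
        for j, clf in enumerate(self.classifiers):
            y_j = [labels[j] for labels in Y]
            clf.fit(X, y_j)
            # Append the true j-th label value to each feature vector
            # (x2 = x1 + [λ1], x3 = x2 + [λ2], ...)
            for x, labels in zip(X, Y):
                x.append(labels[j])
        return self

    def predict(self, x):
        x = list(x)
        preds = []
        for clf in self.classifiers:
            lam = clf.predict_one(x)  # λj'
            preds.append(lam)
            x.append(lam)             # feed λj' to the next classifier in the chain
        return preds                  # the label set {λ1', λ2', λ3', λ4'}

# Train the chain on three toy users, each with four label values
# (gender, age bracket, purchase preference, purchasing power).
chain = ClassifierChain([MajorityClassifier() for _ in range(4)])
chain.fit([[0.1, 3], [0.2, 5], [0.9, 1]],
          [[1, 2, 1, 2], [1, 2, 1, 2], [0, 3, 2, 1]])
print(chain.predict([0.5, 2]))  # majority labels: [1, 2, 1, 2]
```

Each classifier thus sees the original features plus all earlier label values, which is what associates the later labels with the earlier ones.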
The following describes a method for training a chain classifier and a method for obtaining a multi-label user representation according to an embodiment of the present specification with reference to specific examples of the present specification.
FIG. 2 illustrates a method of training a user representation classifier that is a chain classifier including a first classifier and a second classifier, the user representation being a multi-label user representation, according to an embodiment of the present description. The method comprises the following steps: at step S21, acquiring respective first feature vectors of a first group of users, where the first feature vectors correspond to information of the users, and the information includes registration information of the users and operation history information of the users; at step S22, obtaining respective first label values of the first group of users, the first label values corresponding to the first label information of the users; in step S23, training a first classifier with a first training set of respective first feature vectors and sets of values of first labels of the first group of users; at step S24, combining the first feature vector and the value of the first label of each of the first group of users to obtain a second feature vector of each of the first group of users; at step S25, obtaining values of respective second tags of the first group of users, where the values of the second tags correspond to the second tag information of the users, and the second tags of the users are associated with the first tags of the users; and training a second classifier with the respective second feature vector of the first group of users and the set of values of the second label as a second training set in step S26.
First, in step S21, respective first feature vectors of a first group of users are acquired, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users. The first group of users includes a plurality of users, including, for example, on the order of tens of thousands of users. The first feature vector is a column vector in which the elements correspond to the values of the respective information fields of the user. The user information may include user original entry information, such as registration information of the user: mobile phone, mailbox, city, etc. The user information may also include operation history information of the user, such as a search and click record including, for example, description information of the product (product category, price, whether to reduce price), product advertisement, offer promotion, and the like. The user information may also include user tag information, such as gender, age, and the like.
After the user information is obtained, the corresponding information of the user is converted into numerical form, and the values are combined into a feature vector. For example, the city name in the user information may be converted into a preset corresponding number, for example, 1 for Beijing, 2 for Shanghai, and so on. In order to learn the classification of users accurately, the user information generally includes the user's operation history over a period of time, such as the user's search and click records over the past six months, three months, or one month.
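As a minimal sketch of the feature-vector construction just described (the field names, the city code table, and the history window are illustrative assumptions, not values taken from the patent):

```python
# Hypothetical preset mapping from city names to numbers, as in the example above.
CITY_CODES = {"Beijing": 1, "Shanghai": 2}

def build_first_feature_vector(user):
    """Convert a user's registration info and operation history into numbers."""
    return [
        CITY_CODES.get(user["city"], 0),         # registration info: city code
        user["clicks_last_3_months"],            # operation history: click count
        1 if user["clicked_promotions"] else 0,  # operation history: promo clicks
    ]

x1 = build_first_feature_vector(
    {"city": "Shanghai", "clicks_last_3_months": 42, "clicked_promotions": True}
)
print(x1)  # [2, 42, 1]
```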
In one embodiment, the user information is the user's initial information, i.e., it includes the user's input information and the user's operation history information.
Then, in step S22, the values of the respective first labels of the first group of users are obtained, the values of the first labels corresponding to the first label information of the users. The first label corresponds to the first classifier; for example, if the first classifier is a classifier that classifies the gender of the user, the first label is the user's gender. In another embodiment, the first classifier is a classifier that classifies the age of the user, and the first label is the user's age. For some labels, such as gender and age, the label information may be entered by the users themselves or obtained directly from a previous model's evaluation of the users. For other labels, such as purchase preference and purchasing power, the label information may be obtained from past model evaluations of the users.
In one embodiment, the first classifier is the classifier C1 shown in FIG. 1, and C1 is, for example, a gender classifier; thus, the value of the first label λ1 is a value corresponding to the user's gender information. For example, female may be preset to correspond to the number 0 and male to the number 1, so that when λ1 = 0 the gender label is female, and when λ1 = 1 the gender label is male.
In step S23, a first classifier is trained with the respective first feature vectors of the first group of users and the set of values of the first label as a first training set. In one embodiment, the first classifier may be any one of the classifiers C1, C2, C3 shown in FIG. 1, whose training set includes the respective first feature vectors xj of a plurality of users and the value λj of the first label of each of those users (j = 1, 2, 3).
In one embodiment, the first classifier is the classifier C1 of fig. 1, and the classifier C1 is, for example, a classifier that classifies the gender of the user. The feature vector x1 of the user can be established based on the original input information of the user and the click records of the user, and the value λ1 of the first label is a value corresponding to the user's gender (the true gender, or the gender predicted by a previous model). The classifier C1 is trained with the feature vectors x1 and the label values λ1 of a plurality of users, so that the classifier C1 can be used to classify the gender of the user.
In step S24, the first feature vector and the value of the first label of each of the first group of users are combined to obtain the second feature vector of each of the first group of users. That is, the value of the first label is added as one element to the first feature vector, thereby obtaining the second feature vector.
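Step S24 amounts to a single append; a sketch (the variable names and example values are illustrative):

```python
def combine(first_feature_vector, first_label_value):
    """Append the first label value as one more element (x2 = x1 + [λ1])."""
    return first_feature_vector + [first_label_value]

x1 = [2, 42, 1]         # first feature vector
lam1 = 0                # gender label value, e.g. 0 = female
x2 = combine(x1, lam1)
print(x2)  # [2, 42, 1, 0]
```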
In one embodiment, where the classifier C1 is a gender classifier, the user's gender label value λ1 is added as an element to the feature vector x1 for the training of the classifier C2.
In step S25, values of respective second tags of the first group of users are obtained, where the values of the second tags correspond to second tag information of the users. The second tag corresponds to the second classifier. For example, the second classifier may be a purchase preference classifier, and the second label is the user's purchase preference.
In one embodiment, the second classifier is the classifier C2 in FIG. 1, which may be, for example, an age classifier, so that the second label is a user age label. For example, the value of the second label λ2 can be preset to correspond to several age brackets of the user: when λ2 = 1, it corresponds to the age range of 5-10 years; when λ2 = 2, to the age range of 10-20 years; when λ2 = 3, to the age range of 20-30 years; and so on. The acquisition of the information corresponding to the value of the second label (i.e., age information) is similar to the acquisition of the information of the first label, and is not described here again.
In step S26, a second classifier is trained with the respective second feature vectors of the first group of users and the set of values of the second label as a second training set. In one embodiment, the second classifier may be any one of the classifiers C2, C3, C4 shown in FIG. 1, whose training set includes the respective second feature vectors xj of a plurality of users and the value λj of the second label of each of those users, where the second feature vector xj includes the label value λj-1 corresponding to the preceding classifier (j = 2, 3, 4).
In one embodiment, the second classifier is the classifier C2 of fig. 1, which is, for example, a classifier that classifies the age of the user. An element corresponding to gender (i.e., λ1) is added to the user's feature vector x1 to obtain the user's feature vector x2, and the value corresponding to the user's age (i.e., λ2) is taken as the value of the second label. The classifier C2 is trained with the feature vectors x2 and the label values λ2 of a plurality of users, so that the classifier C2 can be used to classify the age of the user. Thus, the training of the classifier C2 (i.e., the age classifier) is associated with the gender label.
In one embodiment, the chained classifier described above further includes the classifier C3 shown in FIG. 1, and the classifier C3 is, for example, a classifier that classifies the user's purchase preference. Thus, the user label corresponding to the classifier C3 is the user's purchase preference. The purchase-preference label value λ3 can be assigned according to the practical application. For example, purchase preferences can be categorized into several classes, such as daily necessities, electronic products, luxury goods, and school supplies, according to the purchasing characteristics of different groups of people, and λ3 is assigned by mapping the different classes of purchase preference to predetermined values. For example, daily necessities correspond to the number 1 and electronic products to the number 2, so that when λ3 = 1, the user's purchase preference is daily necessities.
In training C3, an element corresponding to age (λ2) is added to the feature vector x2 corresponding to the classifier C2, thereby obtaining the user's feature vector x3. Also, the user's purchase-preference information is acquired to obtain the label value λ3. The classifier C3 is trained with the feature vectors x3 and the label values λ3 of a plurality of users, so that the classifier C3 can be used to classify the user's purchase preference. In this training, the training of the classifier C3 is associated with the label corresponding to the classifier C2 (i.e., age). In addition, the feature vector x2 corresponding to the classifier C2 includes the label corresponding to the classifier C1 (i.e., gender), so the training of the classifier C3 is also associated with the gender label. In practice, a user's purchase preference is obviously related to gender and age, so the training method according to the embodiments of this specification makes full use of the user information, and the prediction of the multi-label user portrait is more accurate.
In one embodiment, the chain classifier further includes the classifier C4 shown in FIG. 1, and the classifier C4 is, for example, a classifier that classifies the user's purchasing power. Thus, the user label corresponding to the classifier C4 is the user's purchasing power. The purchasing-power label value λ4 can be assigned according to the practical application. For example, purchasing power can be divided into categories such as low, medium, and high, and λ4 is assigned by mapping the different categories of purchasing power to predetermined values, e.g., low corresponds to the number 1 and medium to the number 2, so that when λ4 = 2, the user's purchasing power is at a medium level.
In training C4, an element corresponding to the purchase-preference label value (λ3) is added to the feature vector x3 corresponding to the classifier C3, thereby obtaining the user's feature vector x4. Also, the user's purchasing-power information is acquired to obtain the label value λ4. The classifier C4 is trained with the feature vectors x4 and the label values λ4 of a plurality of users, so that the classifier C4 can be used to classify the user's purchasing power. In this training, the training of the classifier C4 is associated with the user's initial information, gender, age, and purchase preference, thereby making full use of the user information and making the prediction of the multi-label user portrait more accurate.
In one embodiment, when training a chain classifier comprising a plurality of classifiers, the learning order of the labels is determined by the difficulty of learning them: labels that are easy to learn are learned first, and harder labels are learned later. For example, in the chain classifier including C1, C2, C3, and C4 described above, the gender label has only two classes and is therefore relatively easy to learn, so the gender classifier is placed at the position of the first-learned classifier C1. The age label has few classes and is also relatively easy to determine, so it is placed at the position of classifier C2. Purchase preference has more classes and is less easy to determine, and a user's purchase preference is also related to the user's gender and age, so the purchase preference classifier is placed at the position of classifier C3. Finally, since the user's purchasing power is related to the user's gender, age, and purchase preference, the purchasing power classifier is placed at the position of classifier C4.
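One way to mechanize this ordering is a sort key combining the two criteria the text mentions. This heuristic is an assumption for illustration (the text fixes the order by hand): sort primarily by how many earlier labels a label depends on, breaking ties by class count (fewer classes = presumed easier).

```python
# Hypothetical difficulty metadata for the four labels in the example.
num_classes = {"gender": 2, "age_bracket": 4, "purchase_pref": 8, "buying_power": 4}
num_deps    = {"gender": 0, "age_bracket": 1, "purchase_pref": 2, "buying_power": 3}

# Labels with fewer upstream dependencies first; ties broken by class count.
order = sorted(num_classes, key=lambda n: (num_deps[n], num_classes[n]))
```

With these assumed values, `order` reproduces the C1 through C4 placement described above: gender, age bracket, purchase preference, purchasing power.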
In one embodiment, part of the tag information of some users is missing. For example, a second group of users whose gender tag information is missing includes at least one user, and that user does not belong to the first group of users. In this case, after the gender classifier C1 has been trained with the feature vectors x1 and gender tag values λ1 of the first group of users as described above, the feature vectors x1' of the second group of users are separately input into classifier C1 to obtain the respective gender tag prediction values λ1' of the second group of users. The gender tag prediction value λ1' is added as an element to the feature vector x1' to obtain the respective feature vector x2' of each user in the second group. Thereafter, the feature vectors x2' and age tag values λ2 of the second group of users may be used as a training set to train the age classifier C2. Likewise, second-group samples that include the gender tag prediction value λ1' may also be used to train the subsequent purchase preference classifier C3, the purchasing power classifier C4, and so on.
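A minimal sketch of this fill-in step, under the same illustrative assumptions as before (made-up feature values, a toy 1-NN base learner standing in for the unspecified one):

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

class OneNN:
    """Toy 1-nearest-neighbour classifier used as a stand-in base learner."""
    def fit(self, X, y):
        self.X_, self.y_ = [list(x) for x in X], list(y)
        return self
    def predict(self, X):
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: sq_dist(self.X_[i], x))]
                for x in X]

# First group: features x1 with known gender tags lam1 -> train C1.
x1_a, lam1_a = [[0.0, 1.0], [1.0, 0.0]], [1, 0]
C1 = OneNN().fit(x1_a, lam1_a)

# Second group: gender tag missing -> use C1's prediction lam1' instead.
x1_b = [[0.1, 0.9], [0.9, 0.1]]
lam1_b_pred = C1.predict(x1_b)                     # predicted gender tags
x2_b = [f + [g] for f, g in zip(x1_b, lam1_b_pred)]  # x2' = x1' + lam1'

# x2_b, together with the second group's known age tags, trains C2.
lam2_b = [2, 3]
C2 = OneNN().fit(x2_b, lam2_b)
```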
In one embodiment, the classifier chain 1 includes a plurality of classifiers Cj, j = 1, …, n, each classifier Cj corresponding to one label of the user, and the n classifiers are concatenated to form a chain. Similarly to the embodiments described above, each classifier Cj is associated with the label values corresponding to the preceding classifiers C1, C2, …, Cj-1, so that the user information is fully used and the prediction of the multi-label user portrait is more accurate.
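The general n-classifier case collapses into one training loop. Again a sketch under stated assumptions (toy 1-NN base learner, made-up data); the structure — each Cj trained on the original features plus the values of labels 1 through j-1 — is what the text describes:

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

class OneNN:
    """Toy 1-nearest-neighbour classifier used as a stand-in base learner."""
    def fit(self, X, y):
        self.X_, self.y_ = [list(x) for x in X], list(y)
        return self
    def predict(self, X):
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: sq_dist(self.X_[i], x))]
                for x in X]

def train_chain(X, label_columns, make_clf):
    """Train C_1..C_n; classifier j sees the original features plus the
    true values of labels 1..j-1 appended as extra feature elements."""
    chain, feats = [], [list(x) for x in X]
    for y in label_columns:                           # label j, all users
        chain.append(make_clf().fit(feats, y))
        feats = [f + [v] for f, v in zip(feats, y)]   # x_{j+1} = x_j + label_j
    return chain

X    = [[0.0, 1.0], [1.0, 0.0]]
cols = [[1, 0], [2, 3]]          # e.g. gender tags, then age-bracket tags
chain = train_chain(X, cols, OneNN)
```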
FIG. 3 illustrates a method of obtaining a multi-tag user representation, in accordance with an embodiment of the present description, including: at step S31, a first feature vector of the user is acquired based on the user information; in step S32, inputting the first feature vector into a first classifier obtained by training according to the training method, and obtaining a first label prediction value of the user as a value of a first label of the user; at step S33, combining the first feature vector with the value of the first label to obtain a second feature vector of the user; and in step S34, inputting the second feature vector into a second classifier obtained by training according to the training method, and obtaining a second label prediction value of the user as a value of a second label of the user.
For example, the first classifier is the gender classifier C1 described above, and the second classifier is the age classifier C2 described above. First, based on the user information, i.e., the user's entered information and operation history information, the feature vector x1 corresponding to classifier C1 is acquired. The feature vector x1 is input into classifier C1 to obtain the prediction value λ1' of the user's gender tag, which is taken as the gender tag value λ1. The feature vector x1 and the gender tag prediction value λ1' are combined, i.e., λ1' is added as an element to the feature vector x1, to obtain the user's feature vector x2. The feature vector x2 is input into the age classifier C2 to obtain the prediction value λ2' of the user's age tag, which is taken as the age tag value λ2.
In one embodiment, the obtained age tag value λ2 may also be added as an element to the feature vector x2 to obtain the user's feature vector x3. The feature vector x3 is input into the purchase preference classifier C3 to obtain the purchase preference tag prediction value λ3' of the user, which is taken as the user's purchase preference tag value λ3.
In one embodiment, the obtained purchase preference tag prediction value λ3' may also be added as an element to the feature vector x3 to obtain the user's feature vector x4. The feature vector x4 is input into the purchasing power classifier C4 to obtain the purchasing power tag prediction value λ4' of the user, which is taken as the user's purchasing power tag value λ4.
Thus, by the method of obtaining a multi-label user portrait according to the embodiments of this specification, a label set {λ1', λ2', λ3', λ4'} of the user portrait may be obtained. In this label set, the age tag prediction value λ2' is obtained in association with the user's initial information x1 and the gender tag value λ1; the purchase preference tag prediction value λ3' is obtained in association with the user's initial information x1, the gender tag value λ1, and the age tag value λ2; and the purchasing power tag prediction value λ4' is obtained in association with the user's initial information x1, the gender tag value λ1, the age tag value λ2, and the purchase preference tag value λ3. Therefore, the association relationships among the user's labels are fully considered when predicting them.
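The inference pass that produces this label set can be sketched as a single loop over the chain: each prediction is appended to the feature vector before the next classifier runs. As before, the base learner and data are illustrative assumptions, not the patent's implementation:

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

class OneNN:
    """Toy 1-nearest-neighbour classifier used as a stand-in base learner."""
    def fit(self, X, y):
        self.X_, self.y_ = [list(x) for x in X], list(y)
        return self
    def predict(self, X):
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: sq_dist(self.X_[i], x))]
                for x in X]

def predict_chain(chain, x1):
    """Return the label set {lambda_1', ..., lambda_n'} for one user."""
    labels, feats = [], list(x1)
    for clf in chain:
        lab = clf.predict([feats])[0]   # lambda_j'
        labels.append(lab)
        feats = feats + [lab]           # x_{j+1} = x_j + lambda_j'
    return labels

# A two-classifier chain (e.g. gender then age) trained on toy data.
C1 = OneNN().fit([[0.0, 1.0], [1.0, 0.0]], [1, 0])
C2 = OneNN().fit([[0.0, 1.0, 1], [1.0, 0.0, 0]], [2, 3])
tags = predict_chain([C1, C2], [0.1, 0.9])
```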
In one embodiment, the user's initial information may include partial user tag information. For example, the user's entered information may include age, gender, and so on. In that case, the tag prediction value is replaced with the corresponding preset value of that tag information, which is taken as the user's tag value. For example, when the age is included in the user's entered information, the age prediction value in the user portrait label set is replaced with the preset value of the age bracket corresponding to that age, as the user's age tag value.
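This override can be written as a small resolution step. The age brackets and their preset values below are assumptions chosen for illustration; the patent does not specify them:

```python
# Hypothetical age brackets: (low, high, preset tag value).
AGE_BRACKETS = [(0, 17, 1), (18, 35, 2), (36, 60, 3), (61, 200, 4)]

def resolve_age_tag(predicted, entered_age=None):
    """Prefer the preset value derived from entered info over the prediction."""
    if entered_age is None:
        return predicted            # nothing entered: keep the prediction
    for lo, hi, bracket in AGE_BRACKETS:
        if lo <= entered_age <= hi:
            return bracket          # preset value replaces the prediction
    return predicted
```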
FIG. 4 illustrates an apparatus 400 for training a user representation classifier, which is a chain classifier including a first classifier and a second classifier, the user representation being a multi-label user representation according to an embodiment of the present description. The apparatus 400 comprises: a first obtaining unit 41 configured to obtain respective first feature vectors of a first group of users, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users; a second obtaining unit 42, configured to obtain values of respective first tags of the first group of users, where the values of the first tags correspond to first tag information of the users; a first training unit 43 configured to train a first classifier with a set of values of a first feature vector and a first label of each of the first group of users as a first training set; a third obtaining unit 44, configured to combine the first feature vector and the value of the first label of each of the first group of users to obtain a second feature vector of each of the first group of users; a fourth obtaining unit 45, configured to obtain values of respective second tags of the first group of users, where the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and a second training unit 46 configured to train a second classifier with a set of values of a second feature vector and a second label of each of the first group of users as a second training set.
In one embodiment, an apparatus for training a user representation classifier is provided, the classifier being a chain classifier including a first classifier and a second classifier, wherein the first classifier is a first classifier trained by the above training method, and the user representation is a multi-label user representation, the apparatus comprising: a fifth obtaining unit, configured to obtain respective first feature vectors of a second group of users after training the first classifier, where the second group of users includes at least one user not belonging to the first group of users, the first feature vectors correspond to information of users, and the information includes registration information of the users and operation history information of the users; an input unit configured to input the respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users as values of first tags thereof; a combining unit configured to combine the first feature vector of each user in the second group of users with the value of the first tag to obtain a second feature vector of each user in the second group of users; a sixth obtaining unit configured to obtain values of respective second tags of a second group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and a third training unit configured to train the second classifier with a set of values of a second feature vector and a second label of each of the second group of users as a third training set.
FIG. 5 shows an apparatus 500 for obtaining a multi-tag user representation according to an embodiment of the present description, comprising: a first obtaining unit 51 configured to obtain a first feature vector of a user based on user information; a first input unit 52, configured to input the first feature vector into a first classifier obtained by training through the training method, and obtain a first label prediction value of the user as a value of a first label of the user; a second obtaining unit 53 configured to combine the first feature vector with the value of the first tag to obtain a second feature vector of the user; and a second input unit 54 configured to input the second feature vector into a second classifier obtained by training through the training method, and obtain a second label prediction value of the user as a value of a second label of the user.
In one embodiment, the apparatus for acquiring a multi-tag user representation further includes a third acquiring unit configured to replace the first tag prediction value with a preset value corresponding to the first tag information as a value of the first tag of the user after acquiring the first feature vector of the user.
In one embodiment, the apparatus for acquiring a multi-tag user representation further includes a fourth acquiring unit configured to replace the second tag prediction value with a preset value corresponding to the second tag information as the value of the second tag of the user after acquiring the second feature vector of the user.
Through the scheme for acquiring the multi-label user portrait according to the embodiment of the specification, label information of the user is transmitted among the chained classifiers, the relevance among the labels of the user is considered, the learning of the labels of the user portrait is more accurate and reliable, and the acquired multi-label user portrait is more accurate.
It will be further appreciated by those of ordinary skill in the art that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two, and that, to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above in general terms of their functionality. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method of training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, the user representation being a multi-labeled user representation, the method comprising:
acquiring respective first feature vectors of a first group of users, wherein the first feature vectors correspond to user information, and the information comprises registration information of the users and operation history information of the users;
obtaining respective first tag values of the first group of users, wherein the first tag values correspond to first tag information of the users;
training a first classifier by taking a set of values of a first feature vector and a first label of each user of the first group as a first training set;
acquiring respective first feature vectors of a second group of users, wherein the second group of users comprises at least one user not belonging to the first group of users;
inputting respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users;
combining the first feature vector and the first label prediction value of each user in the second group of users to obtain respective second feature vectors of the second group of users;
obtaining respective values of second tags of a second group of users, wherein the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and
training a second classifier by taking a set of values of a second feature vector and a second label of each user of the second group as a third training set.
2. A method of training a user representation classifier according to claim 1, wherein the user information includes user label information.
3. The method of training a user representation classifier of claim 1, wherein said first label is age and said second label is a purchase preference.
4. The method of training a user representation classifier of claim 1, wherein the first label is a purchasing preference and the second label is a purchasing power.
5. A method of obtaining a multi-tag user representation, comprising:
acquiring a first feature vector of a user based on user information;
inputting the first feature vector into a first classifier obtained by training according to the method of any one of claims 1-4, and obtaining a first label prediction value of the user as a value of a first label of the user;
combining the first feature vector with the value of the first label to obtain a second feature vector of the user; and
inputting the second feature vector into a second classifier obtained by training according to the method of any one of claims 1-4, and obtaining a second label prediction value of the user as a value of a second label of the user.
6. The method of retrieving a multi-tag user representation of claim 5, further comprising, after retrieving the first feature vector of the user based on the user information, in the case that the first tag information is included in the user information, replacing the first tag prediction value with the corresponding preset value of the first tag information as the value of the first tag of the user.
7. The method of retrieving a multi-tag user representation of claim 5, further comprising, after retrieving the second feature vector of the user, in the case that the second tag information is included in the user information, replacing the second tag prediction value with the corresponding preset value of the second tag information as the value of the second tag of the user.
8. An apparatus for training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, the user representation being a multi-label user representation, the apparatus comprising:
a first acquisition unit configured to acquire respective first feature vectors of a first group of users, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users;
a second obtaining unit configured to obtain values of respective first tags of the first group of users, the values of the first tags corresponding to first tag information of the users;
a first training unit configured to train a first classifier with a set of values of a first feature vector and a first label of each of the first group of users as a first training set;
a fifth obtaining unit, configured to obtain respective first feature vectors of a second group of users, where the second group of users includes at least one user not belonging to the first group of users;
an input unit configured to input respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users;
a combining unit configured to combine the first feature vector and the first label prediction value of each user in the second group of users to obtain a second feature vector of each user in the second group of users;
a sixth obtaining unit configured to obtain values of respective second tags of a second group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and
a third training unit configured to train the second classifier with a set of values of a second feature vector and a second label of each of the second group of users as a third training set.
9. An apparatus for training a user representation classifier according to claim 8, wherein said user information includes user label information.
10. The apparatus of claim 8, wherein said first label is an age and said second label is a purchase preference.
11. The apparatus for training a user representation classifier of claim 8, wherein said first label is a purchasing preference and said second label is a purchasing power.
12. An apparatus for obtaining a multi-tag user representation, comprising:
a first acquisition unit configured to acquire a first feature vector of a user based on user information;
a first input unit configured to input the first feature vector into a first classifier obtained by training according to the method of any one of claims 1 to 4, and obtain a first label prediction value of the user as a value of a first label of the user;
a second obtaining unit configured to combine the first feature vector with the value of the first tag to obtain a second feature vector of the user; and
a second input unit configured to input the second feature vector into a second classifier obtained by training according to the method of any one of claims 1 to 4, and obtain a second label prediction value of the user as a value of a second label of the user.
13. The apparatus for obtaining a multi-tag user representation according to claim 12, further comprising a first replacing unit configured to replace the first tag prediction value with a corresponding preset value of the first tag information as a value of a first tag of a user in a case where the first tag information is included in the user information after obtaining a first feature vector of the user based on the user information.
14. The apparatus for retrieving a multi-tag user representation according to claim 12, further comprising a second replacing unit configured to replace the second tag prediction value with a corresponding preset value of the second tag information as a value of a second tag of the user in a case where the second tag information is included in the user information after the second feature vector of the user is retrieved.



