CN108229590B - Method and device for acquiring multi-label user portrait - Google Patents

Method and device for acquiring multi-label user portrait

Info

Publication number
CN108229590B
CN108229590B (application CN201810148824.1A)
Authority
CN
China
Prior art keywords
user
users
classifier
label
tag
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201810148824.1A
Other languages
Chinese (zh)
Other versions
CN108229590A (en)
Inventor
张雅淋
李龙飞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Advanced New Technologies Co Ltd
Advantageous New Technologies Co Ltd
Original Assignee
Alibaba Group Holding Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Alibaba Group Holding Ltd filed Critical Alibaba Group Holding Ltd
Priority to CN201810148824.1A priority Critical patent/CN108229590B/en
Publication of CN108229590A publication Critical patent/CN108229590A/en
Priority to TW107146609A priority patent/TWI693567B/en
Priority to PCT/CN2019/073109 priority patent/WO2019157928A1/en
Application granted granted Critical
Publication of CN108229590B publication Critical patent/CN108229590B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G06F18/24 Classification techniques
    • G06F18/243 Classification techniques relating to the number of classes
    • G06F18/24323 Tree-organised classifiers

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Artificial Intelligence (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The embodiment of the specification discloses a method and a device for training a user portrait classifier and a method and a device for acquiring a multi-label user portrait. The training method comprises the following steps: acquiring respective first feature vectors of a first group of users; obtaining respective first label values of the first group of users; training a first classifier by taking a set of values of a first feature vector and a first label of each user of the first group as a first training set; combining the respective first feature vectors of the first group of users with the values of the first labels to obtain respective second feature vectors of the first group of users; obtaining respective second label values of the first group of users; and training a second classifier by taking the respective second feature vector of the first group of users and the value set of the second label as a second training set.

Description

Method and device for acquiring multi-label user portrait
Technical Field
The invention relates to the field of machine learning, in particular to a method and a device for training a user portrait classifier and a method and a device for acquiring a multi-label user portrait.
Background
With the popularity and development of the internet, more and more data can be collected by various internet operators. For example, an e-commerce website can obtain information such as users' purchase records and browsing records; a search engine can obtain information such as users' search records and click records. In order to make better use of such information and provide more efficient and better services, user portraits are gaining attention. A user portrait is a labeled user model abstracted from information such as a user's social attributes, lifestyle habits, and consumption behaviors. The prior art includes methods for obtaining a user portrait based on a deep neural network, methods for obtaining a user portrait based on statistical data, and the like. However, these existing methods have shortcomings, so a more efficient scheme for acquiring a multi-label user portrait is desired.
Disclosure of Invention
Embodiments of the present disclosure aim to provide a more efficient scheme for obtaining a multi-tag user representation, so as to solve the deficiencies in the prior art.
To achieve the above object, one aspect of the present specification provides a method of training a user representation classifier, the classifier being a chain classifier including a first classifier and a second classifier, the user representation being a multi-label user representation, the method comprising: acquiring respective first feature vectors of a first group of users, wherein the first feature vectors correspond to user information, and the information comprises registration information of the users and operation history information of the users; obtaining respective first tag values of the first group of users, wherein the first tag values correspond to first tag information of the users; training a first classifier by taking a set of values of a first feature vector and a first label of each user of the first group as a first training set; combining the respective first feature vectors of the first group of users with the values of the first labels to obtain respective second feature vectors of the first group of users; obtaining respective values of second tags of the first group of users, wherein the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and training a second classifier by taking the respective second feature vector of the first group of users and the value set of the second label as a second training set.
In one embodiment, in the method for training the chain classifier, the information of the user includes label information of the user.
In one embodiment, in the method of training a chain classifier described above, the first label is age and the second label is purchase preference.
In one embodiment, in the method of training a chain classifier described above, the first label is purchase preference and the second label is purchasing power.
In another aspect of the present specification, there is provided a method of training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, wherein the first classifier is a first classifier trained by the above training method, and the user representation is a multi-label user representation, the method comprising: after training a first classifier, acquiring respective first feature vectors of a second group of users, wherein the second group of users comprises at least one user not belonging to the first group of users, the first feature vectors correspond to user information, and the information comprises registration information of the users and operation history information of the users; inputting the respective first feature vectors of the second group of users into the first classifier to obtain respective first label prediction values of the second group of users, and combining the first feature vectors and the first label prediction values of each user in the second group of users to obtain respective second feature vectors of the second group of users; obtaining respective values of second tags of a second group of users, wherein the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and training the second classifier by taking a set of values of a second feature vector and a second label of each user of the second group as a third training set.
Another aspect of the present specification provides a method for obtaining a multi-tag user representation, comprising: acquiring a first feature vector of a user based on user information; inputting the first feature vector into a first classifier obtained by training through the training method, and obtaining a first label predicted value of the user as a value of a first label of the user; combining the first feature vector with the value of the first label to obtain a second feature vector of the user; and inputting the second feature vector into a second classifier obtained by training through the training method, and obtaining a second label predicted value of the user as a value of a second label of the user.
In an embodiment, the method for obtaining a multi-label user portrait further includes: after obtaining the first feature vector of the user based on the user information, in a case where the first label information is included in the user information, replacing the first label predicted value with the corresponding preset value of that first label information as the value of the first label of the user.
In an embodiment, the method for obtaining a multi-label user portrait further includes: after obtaining the second feature vector of the user, in a case where the second label information is included in the user information, replacing the second label predicted value with the corresponding preset value of that second label information as the value of the second label of the user.
Another aspect of the present specification provides an apparatus for training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, the user representation being a multi-label user representation, the apparatus comprising: a first acquisition unit configured to acquire respective first feature vectors of a first group of users, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users; a second obtaining unit configured to obtain values of respective first tags of the first group of users, the values of the first tags corresponding to first tag information of the users; a first training unit configured to train a first classifier with a set of values of a first feature vector and a first label of each of the first group of users as a first training set; a third obtaining unit, configured to combine the first feature vector and the value of the first label of each of the first group of users to obtain a second feature vector of each of the first group of users; a fourth obtaining unit configured to obtain values of respective second tags of the first group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and a second training unit configured to train a second classifier with a set of values of a second feature vector and a second label of each of the first group of users as a second training set.
In another aspect of the present specification, there is provided an apparatus for training a user representation classifier, the classifier being a chain classifier including a first classifier and a second classifier, wherein the first classifier is a first classifier trained by the above training method, and the user representation is a multi-label user representation, the apparatus comprising: a fifth obtaining unit, configured to obtain respective first feature vectors of a second group of users, where the second group of users includes at least one user not belonging to the first group of users, the first feature vectors correspond to information of users, and the information includes registration information of the users and operation history information of the users; an input unit configured to input respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users; a combining unit configured to combine the first feature vector and the first label prediction value of each user in the second group of users to obtain a second feature vector of each user in the second group of users; a sixth obtaining unit configured to obtain values of respective second tags of a second group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and a third training unit configured to train the second classifier with a set of values of a second feature vector and a second label of each of the second group of users as a third training set.
Another aspect of the present specification provides an apparatus for obtaining a multi-tag user representation, comprising: a first acquisition unit configured to acquire a first feature vector of a user based on user information; a first input unit, configured to input the first feature vector into a first classifier obtained by training through the training method, and obtain a first label prediction value of the user as a value of a first label of the user; a second obtaining unit configured to combine the first feature vector with the value of the first tag to obtain a second feature vector of the user; and a second input unit configured to input the second feature vector into a second classifier obtained by training through the training method, and obtain a second label prediction value of the user as a value of a second label of the user.
By the scheme for acquiring the multi-label user portrait according to the embodiment of the specification, learning of each label of the user portrait is more accurate and reliable, and the acquired multi-label user portrait is more accurate.
Drawings
The embodiments of the present specification may be made clearer by describing them with reference to the attached drawings:
FIG. 1 shows a schematic diagram of a system 100 according to embodiments herein;
FIG. 2 illustrates a method of training a chain classifier in accordance with an embodiment of the present description;
FIG. 3 illustrates a method of obtaining a multi-tag user representation in accordance with an embodiment of the present description;
FIG. 4 illustrates an apparatus 400 for training a chain classifier in accordance with an embodiment of the present description; and
FIG. 5 illustrates an apparatus 500 for obtaining a multi-tag user representation in accordance with an embodiment of the present description.
Detailed Description
The embodiments of the present specification will be described below with reference to the accompanying drawings.
Fig. 1 shows a schematic diagram of a system 100 according to an embodiment of the present description. As shown in fig. 1, the system 100 includes a classifier chain 11. In one embodiment, the classifier chain 11 includes a plurality of classifiers Cj, j = 1, …, n, each classifier Cj corresponding to one label of the user; these n classifiers are concatenated to form a chain. Each classifier Cj can be based on one of the following algorithms: decision tree, naive Bayes, support vector machine, association rule learning, neural network, or genetic algorithm. The n classifiers Cj may be based on the same algorithm or on different algorithms.
In one embodiment, as shown in FIG. 1, the classifier chain 11 includes 4 classifiers C1, C2, C3, and C4. For example, classifier C1 is a classifier corresponding to the gender label, classifier C2 corresponds to the age label, classifier C3 corresponds to the purchase-preference label, and classifier C4 corresponds to the purchasing-power label.
In training the classifier chain 11, first, a first training set t1 is input to the classifier C1; the training set t1 includes a plurality of feature vectors x1 corresponding to the information of respective users, and the label values λ1 of those users. In the case where C1 is a gender classifier, the label value λ1 corresponds to the gender of the user. C1 is trained on t1 to obtain a classifier C1 corresponding to the gender label. Thereafter, the training set t2 is input to the classifier C2. As shown in the figure, the training set t2 includes a plurality of feature vectors x2 corresponding to the information of respective users, and the label values λ2 of those users. In the case where C2 is an age classifier, the label value λ2 corresponds to the age bracket of the user. The feature vector x2 includes, in addition to the above-mentioned feature vector x1, the label value λ1 of each user, i.e., the value corresponding to the user's gender. C2 is trained with the training set t2, so that the classification of the user's age is associated with the user's gender label information. The later classifiers C3 and C4 are trained in the same way as C2: the feature vector x3 in t3 includes x2 and λ2, and the feature vector x4 in t4 includes x3 and λ3, thereby associating the respective labels of the user, which makes the learning of the sample labels more accurate and reliable. For example, in the case where C3 is a purchase-preference classifier, the label λ3 corresponds to the user's purchase preference, and the input feature vector x3 of C3 includes, in addition to the feature vector x2 used in C2, the label value λ2, i.e., the user's age label value.
After the training of all four classifiers C1-C4 is complete, the classifier chain 11 has been trained as a multi-label classification model that can be used to classify users with unknown labels. As shown in FIG. 1, the initial information of a user with unknown labels is expressed as a feature vector x1' and input to C1; the user information is classified by C1 to obtain a predicted value λ1' of the user's gender label. C1 passes the user information x1' and λ1' to C2, so that C2 classifies based on x1' and λ1' to obtain the predicted value λ2' of the user's age label. Thereafter, in the same manner as for C2, classifier C3 receives its feature vector x2' and λ2' from the preceding classifier C2, and classifies based on x2' and λ2' to obtain the predicted value λ3' of the purchase-preference label. Classifier C4 receives its feature vector x3' and λ3' from the preceding classifier C3, and classifies based on x3' and λ3' to obtain the predicted value λ4' of the purchasing-power label. In this way, a user portrait label set {λ1', λ2', λ3', λ4'} is obtained.
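The chained training and prediction flow described above can be sketched in a few lines of Python. This is an illustrative sketch, not the patent's implementation: the toy MajorityClassifier base learner and the example data are assumptions, and any of the algorithms named above (decision tree, naive Bayes, SVM, etc.) could fill each slot in the chain.

```python
class MajorityClassifier:
    """Toy base learner: predicts the most frequent label seen in training."""
    def fit(self, X, y):
        self.label = max(set(y), key=y.count)
        return self
    def predict_one(self, x):
        return self.label

class ClassifierChain:
    def __init__(self, classifiers):
        self.classifiers = classifiers  # one classifier per label, in chain order

    def fit(self, X, Y):
        # Y[i] holds the label values (λ1..λn) of user i.
        X = [list(x) for x in X]  # work on copies; these grow as the chain advances
        for j, clf in enumerate(self.classifiers):
            y_j = [labels[j] for labels in Y]
            clf.fit(X, y_j)
            # Append the true j-th label value to each feature vector
            # (x2 = x1 + [λ1], x3 = x2 + [λ2], ...)
            for x, labels in zip(X, Y):
                x.append(labels[j])
        return self

    def predict(self, x):
        x = list(x)
        preds = []
        for clf in self.classifiers:
            lam = clf.predict_one(x)  # λj'
            preds.append(lam)
            x.append(lam)             # feed λj' to the next classifier in the chain
        return preds                  # the label set {λ1', λ2', λ3', λ4'}

# Train the chain on three toy users, each with four label values
# (gender, age bracket, purchase preference, purchasing power).
chain = ClassifierChain([MajorityClassifier() for _ in range(4)])
chain.fit([[0.1, 3], [0.2, 5], [0.9, 1]],
          [[1, 2, 1, 2], [1, 2, 1, 2], [0, 3, 2, 1]])
print(chain.predict([0.5, 2]))  # majority labels: [1, 2, 1, 2]
```

Each classifier thus sees the original features plus all earlier label values, which is what associates the later labels with the earlier ones.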
The following describes a method for training a chain classifier and a method for obtaining a multi-label user representation according to an embodiment of the present specification with reference to specific examples of the present specification.
FIG. 2 illustrates a method of training a user representation classifier that is a chain classifier including a first classifier and a second classifier, the user representation being a multi-label user representation, according to an embodiment of the present description. The method comprises the following steps: at step S21, acquiring respective first feature vectors of a first group of users, where the first feature vectors correspond to information of the users, and the information includes registration information of the users and operation history information of the users; at step S22, obtaining respective first label values of the first group of users, the first label values corresponding to the first label information of the users; in step S23, training a first classifier with a first training set of respective first feature vectors and sets of values of first labels of the first group of users; at step S24, combining the first feature vector and the value of the first label of each of the first group of users to obtain a second feature vector of each of the first group of users; at step S25, obtaining values of respective second tags of the first group of users, where the values of the second tags correspond to the second tag information of the users, and the second tags of the users are associated with the first tags of the users; and training a second classifier with the respective second feature vector of the first group of users and the set of values of the second label as a second training set in step S26.
First, in step S21, respective first feature vectors of a first group of users are acquired, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users. The first group of users includes a plurality of users, including, for example, on the order of tens of thousands of users. The first feature vector is a column vector in which the elements correspond to the values of the respective information fields of the user. The user information may include user original entry information, such as registration information of the user: mobile phone, mailbox, city, etc. The user information may also include operation history information of the user, such as a search and click record including, for example, description information of the product (product category, price, whether to reduce price), product advertisement, offer promotion, and the like. The user information may also include user tag information, such as gender, age, and the like.
After the user information is obtained, the corresponding information of the user is converted into numerical form, and the values are combined into a feature vector. For example, the city name in the user information may be converted into a preset corresponding number, for example, 1 for Beijing, 2 for Shanghai, and so on. In order to learn the classification of users accurately, the user information generally includes the user's operation history over a period of time, such as the user's search and click records over the past six months, three months, or one month.
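As a minimal sketch of the feature-vector construction just described (the field names, the city code table, and the history window are illustrative assumptions, not values taken from the patent):

```python
# Hypothetical preset mapping from city names to numbers, as in the example above.
CITY_CODES = {"Beijing": 1, "Shanghai": 2}

def build_first_feature_vector(user):
    """Convert a user's registration info and operation history into numbers."""
    return [
        CITY_CODES.get(user["city"], 0),         # registration info: city code
        user["clicks_last_3_months"],            # operation history: click count
        1 if user["clicked_promotions"] else 0,  # operation history: promo clicks
    ]

x1 = build_first_feature_vector(
    {"city": "Shanghai", "clicks_last_3_months": 42, "clicked_promotions": True}
)
print(x1)  # [2, 42, 1]
```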
In one embodiment, the user information is the user's initial information, i.e., it includes the user's input information and the user's operation history information.
Then, in step S22, the values of the respective first labels of the first group of users are obtained, the values of the first labels corresponding to the first label information of the users. The first label corresponds to the first classifier; for example, if the first classifier is a classifier that classifies the gender of the user, the first label is the user's gender. In another embodiment, the first classifier is a classifier that classifies the age of the user, and the first label is the user's age. For some labels, such as gender and age, the label information may be entered by the users themselves or obtained directly from a previous model's evaluation of the users. For other labels, such as purchase preference and purchasing power, the label information may be obtained from past model evaluations of the users.
In one embodiment, the first classifier is the classifier C1 shown in FIG. 1, and C1 is, for example, a gender classifier; thus, the value of the first label λ1 is a value corresponding to the user's gender information. For example, female may be preset to correspond to the number 0 and male to the number 1, so that when λ1 = 0 the gender label is female, and when λ1 = 1 the gender label is male.
In step S23, a first classifier is trained with the respective first feature vectors of the first group of users and the set of values of the first label as a first training set. In one embodiment, the first classifier may be any one of the classifiers C1, C2, C3 shown in FIG. 1, whose training set includes the respective first feature vectors xj of a plurality of users and the value λj of the first label of each of those users (j = 1, 2, 3).
In one embodiment, the first classifier is the classifier C1 of fig. 1, and the classifier C1 is, for example, a classifier that classifies the gender of the user. The feature vector x1 of the user can be established based on the original input information of the user and the click records of the user, and the value λ1 of the first label is a value corresponding to the user's gender (the true gender, or the gender predicted by a previous model). The classifier C1 is trained with the feature vectors x1 and the label values λ1 of a plurality of users, so that the classifier C1 can be used to classify the gender of the user.
In step S24, the first feature vector and the value of the first label of each of the first group of users are combined to obtain the second feature vector of each of the first group of users. That is, the value of the first label is added as one element to the first feature vector, thereby obtaining the second feature vector.
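Step S24 amounts to a single append; a sketch (the variable names and example values are illustrative):

```python
def combine(first_feature_vector, first_label_value):
    """Append the first label value as one more element (x2 = x1 + [λ1])."""
    return first_feature_vector + [first_label_value]

x1 = [2, 42, 1]         # first feature vector
lam1 = 0                # gender label value, e.g. 0 = female
x2 = combine(x1, lam1)
print(x2)  # [2, 42, 1, 0]
```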
In one embodiment, where the classifier C1 is a gender classifier, the user's gender label value λ1 is added as an element to the feature vector x1 for the training of the classifier C2.
In step S25, values of respective second tags of the first group of users are obtained, where the values of the second tags correspond to second tag information of the users. The second tag corresponds to the second classifier. For example, the second classifier may be a purchase preference classifier, and the second label is the user's purchase preference.
In one embodiment, the second classifier is the classifier C2 in FIG. 1, which may be, for example, an age classifier, so that the second label is a user age label. For example, the value of the second label λ2 can be preset to correspond to several age brackets of the user: when λ2 = 1, it corresponds to the age range of 5-10 years; when λ2 = 2, to the age range of 10-20 years; when λ2 = 3, to the age range of 20-30 years; and so on. The acquisition of the information corresponding to the value of the second label (i.e., age information) is similar to the acquisition of the information of the first label, and is not described here again.
In step S26, a second classifier is trained with the respective second feature vectors of the first group of users and the set of values of the second label as a second training set. In one embodiment, the second classifier may be any one of the classifiers C2, C3, C4 shown in FIG. 1, whose training set includes the respective second feature vectors xj of a plurality of users and the value λj of the second label of each of those users, where the second feature vector xj includes the label value λj-1 corresponding to the preceding classifier (j = 2, 3, 4).
In one embodiment, the second classifier is the classifier C2 of fig. 1, which is, for example, a classifier that classifies the age of the user. An element corresponding to gender (i.e., λ1) is added to the user's feature vector x1 to obtain the user's feature vector x2, and the value corresponding to the user's age (i.e., λ2) is taken as the value of the second label. The classifier C2 is trained with the feature vectors x2 and the label values λ2 of a plurality of users, so that the classifier C2 can be used to classify the age of the user. Thus, the training of the classifier C2 (i.e., the age classifier) is associated with the gender label.
In one embodiment, the chained classifier described above further includes the classifier C3 shown in FIG. 1, and the classifier C3 is, for example, a classifier that classifies the user's purchase preference. Thus, the user label corresponding to the classifier C3 is the user's purchase preference. The purchase-preference label value λ3 can be assigned according to the practical application. For example, purchase preferences can be categorized into several classes, such as daily necessities, electronic products, luxury goods, and school supplies, according to the purchasing characteristics of different groups of people, and λ3 is assigned by mapping the different classes of purchase preference to predetermined values. For example, daily necessities correspond to the number 1 and electronic products to the number 2, so that when λ3 = 1, the user's purchase preference is daily necessities.
In training C3, an element corresponding to age (λ2) is added to the feature vector x2 corresponding to the classifier C2, thereby obtaining the user's feature vector x3. Also, the user's purchase-preference information is acquired to obtain the label value λ3. The classifier C3 is trained with the feature vectors x3 and the label values λ3 of a plurality of users, so that the classifier C3 can be used to classify the user's purchase preference. In this training, the training of the classifier C3 is associated with the label corresponding to the classifier C2 (i.e., age). In addition, the feature vector x2 corresponding to the classifier C2 includes the label corresponding to the classifier C1 (i.e., gender), so the training of the classifier C3 is also associated with the gender label. In practice, a user's purchase preference is obviously related to gender and age, so the training method according to the embodiments of this specification makes full use of the user information, and the prediction of the multi-label user portrait is more accurate.
In one embodiment, the chain classifier further includes the classifier C4 shown in FIG. 1, and the classifier C4 is, for example, a classifier that classifies the user's purchasing power. Thus, the user label corresponding to the classifier C4 is the user's purchasing power. The purchasing-power label value λ4 can be assigned according to the practical application. For example, purchasing power can be divided into categories such as low, medium, and high, and λ4 is assigned by mapping the different categories of purchasing power to predetermined values, e.g., low corresponds to the number 1 and medium to the number 2, so that when λ4 = 2, the user's purchasing power is at a medium level.
In training C4, an element corresponding to the purchase-preference label value (λ3) is added to the feature vector x3 corresponding to the classifier C3, thereby obtaining the user's feature vector x4. Also, the user's purchasing-power information is acquired to obtain the label value λ4. The classifier C4 is trained with the feature vectors x4 and the label values λ4 of a plurality of users, so that the classifier C4 can be used to classify the user's purchasing power. In this training, the training of the classifier C4 is associated with the user's initial information, gender, age, and purchase preference, thereby making full use of the user information and making the prediction of the multi-label user portrait more accurate.
In one embodiment, when training a chain classifier comprising a plurality of classifiers, the learning order of the labels is determined by the difficulty of learning them: labels that are easy to learn are learned first, and harder labels are learned later. For example, in the chain classifier including C1, C2, C3, and C4 described above, the gender label has only two classes and is therefore relatively easy to learn, so the gender classifier is placed at the position of the first-learned classifier C1. The age label has few classes and is also relatively easy to determine, so it is placed at the position of classifier C2. Purchase preference has more classes and is less easy to determine, and a user's purchase preference is also related to the user's gender and age, so the purchase preference classifier is placed at the position of classifier C3. Finally, since the user's purchasing power is related to the user's gender, age, and purchase preference, the purchasing power classifier is placed at the position of classifier C4.
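One way to mechanize this ordering is a sort key combining the two criteria the text mentions. This heuristic is an assumption for illustration (the text fixes the order by hand): sort primarily by how many earlier labels a label depends on, breaking ties by class count (fewer classes = presumed easier).

```python
# Hypothetical difficulty metadata for the four labels in the example.
num_classes = {"gender": 2, "age_bracket": 4, "purchase_pref": 8, "buying_power": 4}
num_deps    = {"gender": 0, "age_bracket": 1, "purchase_pref": 2, "buying_power": 3}

# Labels with fewer upstream dependencies first; ties broken by class count.
order = sorted(num_classes, key=lambda n: (num_deps[n], num_classes[n]))
```

With these assumed values, `order` reproduces the C1 through C4 placement described above: gender, age bracket, purchase preference, purchasing power.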
In one embodiment, part of the tag information of some users is missing. For example, a second group of users whose gender tag information is missing includes at least one user, and that user does not belong to the first group of users. In this case, after the gender classifier C1 has been trained with the feature vectors x1 and gender tag values λ1 of the first group of users as described above, the feature vectors x1' of the second group of users are separately input into classifier C1 to obtain the respective gender tag prediction values λ1' of the second group of users. The gender tag prediction value λ1' is added as an element to the feature vector x1' to obtain the respective feature vector x2' of each user in the second group. Thereafter, the feature vectors x2' and age tag values λ2 of the second group of users may be used as a training set to train the age classifier C2. Likewise, second-group samples that include the gender tag prediction value λ1' may also be used to train the subsequent purchase preference classifier C3, the purchasing power classifier C4, and so on.
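A minimal sketch of this fill-in step, under the same illustrative assumptions as before (made-up feature values, a toy 1-NN base learner standing in for the unspecified one):

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

class OneNN:
    """Toy 1-nearest-neighbour classifier used as a stand-in base learner."""
    def fit(self, X, y):
        self.X_, self.y_ = [list(x) for x in X], list(y)
        return self
    def predict(self, X):
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: sq_dist(self.X_[i], x))]
                for x in X]

# First group: features x1 with known gender tags lam1 -> train C1.
x1_a, lam1_a = [[0.0, 1.0], [1.0, 0.0]], [1, 0]
C1 = OneNN().fit(x1_a, lam1_a)

# Second group: gender tag missing -> use C1's prediction lam1' instead.
x1_b = [[0.1, 0.9], [0.9, 0.1]]
lam1_b_pred = C1.predict(x1_b)                     # predicted gender tags
x2_b = [f + [g] for f, g in zip(x1_b, lam1_b_pred)]  # x2' = x1' + lam1'

# x2_b, together with the second group's known age tags, trains C2.
lam2_b = [2, 3]
C2 = OneNN().fit(x2_b, lam2_b)
```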
In one embodiment, the classifier chain 1 includes a plurality of classifiers Cj, j = 1, …, n, each classifier Cj corresponding to one label of the user, and the n classifiers are concatenated to form a chain. Similarly to the embodiments described above, each classifier Cj is associated with the label values corresponding to the preceding classifiers C1, C2, …, Cj-1, so that the user information is fully used and the prediction of the multi-label user portrait is more accurate.
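The general n-classifier case collapses into one training loop. Again a sketch under stated assumptions (toy 1-NN base learner, made-up data); the structure — each Cj trained on the original features plus the values of labels 1 through j-1 — is what the text describes:

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

class OneNN:
    """Toy 1-nearest-neighbour classifier used as a stand-in base learner."""
    def fit(self, X, y):
        self.X_, self.y_ = [list(x) for x in X], list(y)
        return self
    def predict(self, X):
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: sq_dist(self.X_[i], x))]
                for x in X]

def train_chain(X, label_columns, make_clf):
    """Train C_1..C_n; classifier j sees the original features plus the
    true values of labels 1..j-1 appended as extra feature elements."""
    chain, feats = [], [list(x) for x in X]
    for y in label_columns:                           # label j, all users
        chain.append(make_clf().fit(feats, y))
        feats = [f + [v] for f, v in zip(feats, y)]   # x_{j+1} = x_j + label_j
    return chain

X    = [[0.0, 1.0], [1.0, 0.0]]
cols = [[1, 0], [2, 3]]          # e.g. gender tags, then age-bracket tags
chain = train_chain(X, cols, OneNN)
```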
FIG. 3 illustrates a method of obtaining a multi-tag user representation, in accordance with an embodiment of the present description, including: at step S31, a first feature vector of the user is acquired based on the user information; in step S32, inputting the first feature vector into a first classifier obtained by training according to the training method, and obtaining a first label prediction value of the user as a value of a first label of the user; at step S33, combining the first feature vector with the value of the first label to obtain a second feature vector of the user; and in step S34, inputting the second feature vector into a second classifier obtained by training according to the training method, and obtaining a second label prediction value of the user as a value of a second label of the user.
For example, the first classifier is the gender classifier C1 described above, and the second classifier is the age classifier C2 described above. First, based on the user information, i.e., the user's entered information and operation history information, the feature vector x1 corresponding to classifier C1 is acquired. The feature vector x1 is input into classifier C1 to obtain the prediction value λ1' of the user's gender tag, which is taken as the gender tag value λ1. The feature vector x1 and the gender tag prediction value λ1' are combined, i.e., λ1' is added as an element to the feature vector x1, to obtain the user's feature vector x2. The feature vector x2 is input into the age classifier C2 to obtain the prediction value λ2' of the user's age tag, which is taken as the age tag value λ2.
In one embodiment, the obtained age tag value λ2 may also be added as an element to the feature vector x2 to obtain the user's feature vector x3. The feature vector x3 is input into the purchase preference classifier C3 to obtain the purchase preference tag prediction value λ3' of the user, which is taken as the user's purchase preference tag value λ3.
In one embodiment, the obtained purchase preference tag prediction value λ3' may also be added as an element to the feature vector x3 to obtain the user's feature vector x4. The feature vector x4 is input into the purchasing power classifier C4 to obtain the purchasing power tag prediction value λ4' of the user, which is taken as the user's purchasing power tag value λ4.
Thus, by the method of obtaining a multi-label user portrait according to the embodiments of this specification, a label set {λ1', λ2', λ3', λ4'} of the user portrait may be obtained. In this label set, the age tag prediction value λ2' is obtained in association with the user's initial information x1 and the gender tag value λ1; the purchase preference tag prediction value λ3' is obtained in association with the user's initial information x1, the gender tag value λ1, and the age tag value λ2; and the purchasing power tag prediction value λ4' is obtained in association with the user's initial information x1, the gender tag value λ1, the age tag value λ2, and the purchase preference tag value λ3. Therefore, the association relationships among the user's labels are fully considered when predicting them.
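The inference pass that produces this label set can be sketched as a single loop over the chain: each prediction is appended to the feature vector before the next classifier runs. As before, the base learner and data are illustrative assumptions, not the patent's implementation:

```python
def sq_dist(a, b):
    return sum((u - v) ** 2 for u, v in zip(a, b))

class OneNN:
    """Toy 1-nearest-neighbour classifier used as a stand-in base learner."""
    def fit(self, X, y):
        self.X_, self.y_ = [list(x) for x in X], list(y)
        return self
    def predict(self, X):
        return [self.y_[min(range(len(self.X_)),
                            key=lambda i: sq_dist(self.X_[i], x))]
                for x in X]

def predict_chain(chain, x1):
    """Return the label set {lambda_1', ..., lambda_n'} for one user."""
    labels, feats = [], list(x1)
    for clf in chain:
        lab = clf.predict([feats])[0]   # lambda_j'
        labels.append(lab)
        feats = feats + [lab]           # x_{j+1} = x_j + lambda_j'
    return labels

# A two-classifier chain (e.g. gender then age) trained on toy data.
C1 = OneNN().fit([[0.0, 1.0], [1.0, 0.0]], [1, 0])
C2 = OneNN().fit([[0.0, 1.0, 1], [1.0, 0.0, 0]], [2, 3])
tags = predict_chain([C1, C2], [0.1, 0.9])
```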
In one embodiment, the user's initial information may include partial user tag information. For example, the user's entered information may include age, gender, and so on. In that case, the tag prediction value is replaced with the corresponding preset value of that tag information, which is taken as the user's tag value. For example, when the age is included in the user's entered information, the age prediction value in the user portrait label set is replaced with the preset value of the age bracket corresponding to that age, as the user's age tag value.
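This override can be written as a small resolution step. The age brackets and their preset values below are assumptions chosen for illustration; the patent does not specify them:

```python
# Hypothetical age brackets: (low, high, preset tag value).
AGE_BRACKETS = [(0, 17, 1), (18, 35, 2), (36, 60, 3), (61, 200, 4)]

def resolve_age_tag(predicted, entered_age=None):
    """Prefer the preset value derived from entered info over the prediction."""
    if entered_age is None:
        return predicted            # nothing entered: keep the prediction
    for lo, hi, bracket in AGE_BRACKETS:
        if lo <= entered_age <= hi:
            return bracket          # preset value replaces the prediction
    return predicted
```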
FIG. 4 illustrates an apparatus 400 for training a user representation classifier, which is a chain classifier including a first classifier and a second classifier, the user representation being a multi-label user representation according to an embodiment of the present description. The apparatus 400 comprises: a first obtaining unit 41 configured to obtain respective first feature vectors of a first group of users, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users; a second obtaining unit 42, configured to obtain values of respective first tags of the first group of users, where the values of the first tags correspond to first tag information of the users; a first training unit 43 configured to train a first classifier with a set of values of a first feature vector and a first label of each of the first group of users as a first training set; a third obtaining unit 44, configured to combine the first feature vector and the value of the first label of each of the first group of users to obtain a second feature vector of each of the first group of users; a fourth obtaining unit 45, configured to obtain values of respective second tags of the first group of users, where the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and a second training unit 46 configured to train a second classifier with a set of values of a second feature vector and a second label of each of the first group of users as a second training set.
In one embodiment, an apparatus for training a user representation classifier is provided, the classifier being a chain classifier including a first classifier and a second classifier, wherein the first classifier is a first classifier trained by the above training method, and the user representation is a multi-label user representation, the apparatus comprising: a fifth obtaining unit, configured to obtain respective first feature vectors of a second group of users after training the first classifier, where the second group of users includes at least one user not belonging to the first group of users, the first feature vectors correspond to information of users, and the information includes registration information of the users and operation history information of the users; an input unit configured to input the respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users as values of first tags thereof; a combining unit configured to combine the first feature vector of each user in the second group of users with the value of the first tag to obtain a second feature vector of each user in the second group of users; a sixth obtaining unit configured to obtain values of respective second tags of a second group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and a third training unit configured to train the second classifier with a set of values of a second feature vector and a second label of each of the second group of users as a third training set.
FIG. 5 shows an apparatus 500 for obtaining a multi-tag user representation according to an embodiment of the present description, comprising: a first obtaining unit 51 configured to obtain a first feature vector of a user based on user information; a first input unit 52, configured to input the first feature vector into a first classifier obtained by training through the training method, and obtain a first label prediction value of the user as a value of a first label of the user; a second obtaining unit 53 configured to combine the first feature vector with the value of the first tag to obtain a second feature vector of the user; and a second input unit 54 configured to input the second feature vector into a second classifier obtained by training through the training method, and obtain a second label prediction value of the user as a value of a second label of the user.
In one embodiment, the apparatus for acquiring a multi-tag user representation further includes a third acquiring unit configured to replace the first tag prediction value with a preset value corresponding to the first tag information as a value of the first tag of the user after acquiring the first feature vector of the user.
In one embodiment, the apparatus for acquiring a multi-tag user representation further includes a fourth acquiring unit configured to replace the second tag prediction value with a preset value corresponding to the second tag information as the value of the second tag of the user after acquiring the second feature vector of the user.
Through the scheme for acquiring the multi-label user portrait according to the embodiment of the specification, label information of the user is transmitted among the chained classifiers, the relevance among the labels of the user is considered, the learning of the labels of the user portrait is more accurate and reliable, and the acquired multi-label user portrait is more accurate.
It will be further appreciated by those of ordinary skill in the art that the units and algorithm steps of the examples described in connection with the embodiments disclosed herein can be implemented in electronic hardware, computer software, or a combination of the two, and that, to clearly illustrate the interchangeability of hardware and software, the components and steps of the examples have been described above in general terms of their functionality. Whether these functions are performed in hardware or software depends on the particular application and the design constraints of the technical solution. Skilled artisans may implement the described functionality in varying ways for each particular application, but such implementation decisions should not be interpreted as causing a departure from the scope of the present application.
The steps of a method or algorithm described in connection with the embodiments disclosed herein may be embodied directly in hardware, in a software module executed by a processor, or in a combination of the two. A software module may reside in random access memory (RAM), read-only memory (ROM), electrically programmable ROM, electrically erasable programmable ROM, registers, a hard disk, a removable disk, a CD-ROM, or any other form of storage medium known in the art.
The above-mentioned embodiments are intended to illustrate the objects, technical solutions and advantages of the present invention in further detail, and it should be understood that the above-mentioned embodiments are merely exemplary embodiments of the present invention, and are not intended to limit the scope of the present invention, and any modifications, equivalent substitutions, improvements and the like made within the spirit and principle of the present invention should be included in the scope of the present invention.

Claims (14)

1. A method of training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, the user representation being a multi-labeled user representation, the method comprising:
acquiring respective first feature vectors of a first group of users, wherein the first feature vectors correspond to user information, and the information comprises registration information of the users and operation history information of the users;
obtaining respective first tag values of the first group of users, wherein the first tag values correspond to first tag information of the users;
training a first classifier by taking a set of values of a first feature vector and a first label of each user of the first group as a first training set;
acquiring respective first feature vectors of a second group of users, wherein the second group of users comprises at least one user not belonging to the first group of users;
inputting respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users;
combining the first feature vector and the first label prediction value of each user in the second group of users to obtain respective second feature vectors of the second group of users;
obtaining respective values of second tags of a second group of users, wherein the values of the second tags correspond to second tag information of the users, and the second tags of the users are associated with the first tags of the users; and
training a second classifier by taking a set of values of a second feature vector and a second label of each user of the second group as a third training set.
2. A method of training a user representation classifier according to claim 1, wherein the user information includes user label information.
3. The method of training a user representation classifier of claim 1, wherein said first label is age and said second label is a purchase preference.
4. The method of training a user representation classifier of claim 1, wherein the first label is a purchasing preference and the second label is a purchasing power.
5. A method of obtaining a multi-tag user representation, comprising:
acquiring a first feature vector of a user based on user information;
inputting the first feature vector into a first classifier obtained by training according to the method of any one of claims 1-4, and obtaining a first label prediction value of the user as a value of a first label of the user;
combining the first feature vector with the value of the first label to obtain a second feature vector of the user; and
inputting the second feature vector into a second classifier obtained by training according to the method of any one of claims 1-4, and obtaining a second label prediction value of the user as a value of a second label of the user.
6. The method of retrieving a multi-tag user representation of claim 5, further comprising, after retrieving the first feature vector of the user based on the user information, in the case that the first tag information is included in the user information, replacing the first tag prediction value with the corresponding preset value of the first tag information as the value of the first tag of the user.
7. The method of retrieving a multi-tag user representation of claim 5, further comprising, after retrieving the second feature vector of the user, in the case that the second tag information is included in the user information, replacing the second tag prediction value with the corresponding preset value of the second tag information as the value of the second tag of the user.
8. An apparatus for training a user representation classifier, the classifier being a chain classifier comprising a first classifier and a second classifier, the user representation being a multi-label user representation, the apparatus comprising:
a first acquisition unit configured to acquire respective first feature vectors of a first group of users, the first feature vectors corresponding to information of the users, the information including registration information of the users and operation history information of the users;
a second obtaining unit configured to obtain values of respective first tags of the first group of users, the values of the first tags corresponding to first tag information of the users;
a first training unit configured to train a first classifier with a set of values of a first feature vector and a first label of each of the first group of users as a first training set;
a fifth obtaining unit, configured to obtain respective first feature vectors of a second group of users, where the second group of users includes at least one user not belonging to the first group of users;
an input unit configured to input respective first feature vectors of the second group of users into the first classifier to obtain respective first tag prediction values of the second group of users;
a combining unit configured to combine the first feature vector and the first label prediction value of each user in the second group of users to obtain a second feature vector of each user in the second group of users;
a sixth obtaining unit configured to obtain values of respective second tags of a second group of users, the values of the second tags corresponding to second tag information of the users, and the second tags of the users being associated with the first tags of the users; and
a third training unit configured to train the second classifier with a set of values of a second feature vector and a second label of each of the second group of users as a third training set.
9. An apparatus for training a user representation classifier according to claim 8, wherein said user information includes user label information.
10. The apparatus of claim 8, wherein said first label is an age and said second label is a purchase preference.
11. The apparatus for training a user representation classifier of claim 8, wherein said first label is a purchasing preference and said second label is a purchasing power.
12. An apparatus for obtaining a multi-tag user representation, comprising:
a first acquisition unit configured to acquire a first feature vector of a user based on user information;
a first input unit configured to input the first feature vector into a first classifier obtained by training according to the method of any one of claims 1 to 4, and obtain a first label prediction value of the user as a value of a first label of the user;
a second obtaining unit configured to combine the first feature vector with the value of the first tag to obtain a second feature vector of the user; and
a second input unit configured to input the second feature vector into a second classifier obtained by training according to the method of any one of claims 1 to 4, and obtain a second label prediction value of the user as a value of a second label of the user.
13. The apparatus for obtaining a multi-tag user representation according to claim 12, further comprising a first replacing unit configured to replace the first tag prediction value with a corresponding preset value of the first tag information as a value of a first tag of a user in a case where the first tag information is included in the user information after obtaining a first feature vector of the user based on the user information.
14. The apparatus for retrieving a multi-tag user representation according to claim 12, further comprising a second replacing unit configured to replace the second tag prediction value with a corresponding preset value of the second tag information as a value of a second tag of the user in a case where the second tag information is included in the user information after the second feature vector of the user is retrieved.



