CN112559853B - User tag generation method and device - Google Patents


Info

Publication number
CN112559853B
Authority
CN
China
Prior art keywords
keyword
topic
user
theme
weight
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201910917704.8A
Other languages
Chinese (zh)
Other versions
CN112559853A (en)
Inventor
李慧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Wodong Tianjun Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd, Beijing Wodong Tianjun Information Technology Co Ltd filed Critical Beijing Jingdong Century Trading Co Ltd
Priority to CN201910917704.8A priority Critical patent/CN112559853B/en
Publication of CN112559853A publication Critical patent/CN112559853A/en
Application granted granted Critical
Publication of CN112559853B publication Critical patent/CN112559853B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/90Details of database functions independent of the retrieved data types
    • G06F16/95Retrieval from the web
    • G06F16/953Querying, e.g. by the use of web search engines
    • G06F16/9535Search customisation based on user profiles and personalisation
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00Information retrieval; Database structures therefor; File system structures therefor
    • G06F16/20Information retrieval; Database structures therefor; File system structures therefor of structured data, e.g. relational data
    • G06F16/28Databases characterised by their database models, e.g. relational or object models
    • G06F16/284Relational databases
    • G06F16/288Entity relationship models

Landscapes

  • Engineering & Computer Science (AREA)
  • Databases & Information Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

The invention discloses a method and a device for generating a user tag, and relates to the technical field of computers. One embodiment of the method comprises: performing word segmentation on user information to obtain a word set of the user; obtaining a plurality of topics and the keywords of each topic according to the word set of the user; for each topic, calculating the weight of the topic according to operation information on the targets to which the keywords of the topic belong; and selecting a target topic from the plurality of topics according to the weight of each topic, and taking the target topic as a tag of the user. This embodiment improves the efficiency of user tag generation and the reusability of the user tag generation method.

Description

User tag generation method and device
Technical Field
The present invention relates to the field of computer technologies, and in particular, to a method and an apparatus for generating a user tag.
Background
Currently, the generation process of the user tag includes: according to a specific application scene, the behavior of a user is manually analyzed, the generation logic of the tag is defined according to the behavior of the user, and the user tag is generated by utilizing the generation logic of the tag.
In the process of implementing the present invention, the inventor finds that at least the following problems exist in the prior art:
The existing way of generating user tags consumes a great deal of manpower, material resources and time in analyzing user behavior and defining the tag generation logic, so user tags are generated inefficiently. In addition, if the application scenario changes, the existing generation method no longer suits the changed scenario, so the method has poor reusability.
Disclosure of Invention
In view of this, the embodiment of the invention provides a method and a device for generating a user tag, which can improve the generating efficiency of the user tag and the reusability of the generating method of the user tag.
In order to achieve the above object, according to an aspect of an embodiment of the present invention, there is provided a method for generating a user tag.
The method for generating the user tag comprises the following steps:
word segmentation is carried out on the user information to obtain a word set of the user;
obtaining a plurality of topics and keywords of each topic according to the word set of the user;
for each topic, calculating the weight of the topic according to the operation information on the targets to which the keywords of the topic belong;
and selecting a target topic from the plurality of topics according to the weight of each topic, and taking the target topic as a tag of the user.
In one embodiment, the operation information includes an operation time, an operation number, and an operation attribute value;
calculating the weight of the topic according to the operation information on the targets to which the keywords of the topic belong includes:
for each operation, multiplying the sum of the operation times of the targets to which the keywords of the topic belong by the time decay weight of the operation time, multiplying the sum of the operation numbers of the targets to which the keywords of the topic belong by the time decay weight of the operation number, and multiplying the sum of the operation attribute values of the targets to which the keywords of the topic belong by the time decay weight of the operation attribute value, and taking the sum of the three products as the weight of the operation;
and adding the weights of all the operations, the obtained value being the weight of the topic.
In one embodiment, the method for calculating the time decay weight of the operation time includes:
for each keyword of the topic, multiplying the absolute value of the difference between the current time and the operation time of the target to which the keyword belongs by a preset time decay factor, and adding 1 to the product to obtain the weight of the keyword;
for each keyword of the topic, dividing the sum of the operation times of the target to which the keyword belongs by the weight of the keyword to obtain a quotient; summing the quotients of all the keywords; and taking the logarithm of the sum, the obtained value being the time decay weight of the operation time.
In one embodiment, obtaining a plurality of topics and keywords of each topic according to the word set of the user includes:
extracting a plurality of topics from the word set of the user, and obtaining a first keyword of each topic;
and for each topic, obtaining a second keyword of the topic according to the topic, the first keyword of the topic and a preset word stock, wherein the keywords of the topic comprise the first keyword of the topic and the second keyword of the topic.
In one embodiment, obtaining the second keyword of the topic according to the topic, the first keyword of the topic and a preset word stock includes:
generating at least one target vector according to the topic and the first keyword of the topic, and calculating the average distance between the target vectors;
and generating a vector for each word in the preset word stock, and if the distance between the vector of a word and any one of the target vectors is greater than the average distance, including the word in the second keywords of the topic.
In one embodiment, selecting a target topic from a plurality of topics based on the weight of each topic includes:
sorting the topics by weight in descending order, and selecting the top three topics as target topics.
In one embodiment, the user information includes:
historical behavior information of a user, information of an item associated with the user, information of a home of the item.
In order to achieve the above object, according to another aspect of the embodiments of the present invention, there is provided a user tag generating apparatus.
The user tag generating device of the embodiment of the invention comprises:
the word segmentation unit is used for segmenting the user information to obtain a word set of the user;
the processing unit is used for obtaining a plurality of topics and keywords of each topic according to the word set of the user;
a calculating unit, configured to calculate, for each topic, a weight of the topic according to operation information on a target to which a keyword of the topic belongs;
and the selection unit is used for selecting a target topic from the plurality of topics according to the weight of each topic, and taking the target topic as a tag of the user.
In one embodiment, the operation information includes an operation time, an operation number, and an operation attribute value;
the computing unit is used for:
for each operation, multiplying the sum of the operation times of the targets to which the keywords of the topic belong by the time decay weight of the operation time, multiplying the sum of the operation numbers of the targets to which the keywords of the topic belong by the time decay weight of the operation number, and multiplying the sum of the operation attribute values of the targets to which the keywords of the topic belong by the time decay weight of the operation attribute value, and taking the sum of the three products as the weight of the operation;
and adding the weights of all the operations, the obtained value being the weight of the topic.
In one embodiment, the computing unit is configured to:
for each keyword of the topic, multiplying the absolute value of the difference between the current time and the operation time of the target to which the keyword belongs by a preset time decay factor, and adding 1 to the product to obtain the weight of the keyword;
for each keyword of the topic, dividing the sum of the operation times of the target to which the keyword belongs by the weight of the keyword to obtain a quotient; summing the quotients of all the keywords; and taking the logarithm of the sum, the obtained value being the time decay weight of the operation time.
In one embodiment, the processing unit is configured to:
extracting a plurality of topics from the word set of the user, and obtaining a first keyword of each topic;
and for each topic, obtaining a second keyword of the topic according to the topic, the first keyword of the topic and a preset word stock, wherein the keywords of the topic comprise the first keyword of the topic and the second keyword of the topic.
In one embodiment, the processing unit is configured to:
generating at least one target vector according to the topic and the first keyword of the topic, and calculating the average distance between the target vectors;
and generating a vector for each word in the preset word stock, and if the distance between the vector of a word and any one of the target vectors is greater than the average distance, including the word in the second keywords of the topic.
In one embodiment, the selection unit is configured to:
sorting the topics by weight in descending order, and selecting the top three topics as target topics.
In one embodiment, the user information includes:
historical behavior information of a user, information of an item associated with the user, information of a home of the item.
To achieve the above object, according to still another aspect of an embodiment of the present invention, there is provided an electronic apparatus.
An electronic device according to an embodiment of the present invention includes: one or more processors; and the storage device is used for storing one or more programs, and when the one or more programs are executed by the one or more processors, the one or more processors are enabled to realize the generation method of the user label provided by the embodiment of the invention.
To achieve the above object, according to still another aspect of an embodiment of the present invention, a computer-readable medium is provided.
The computer readable medium of the embodiment of the invention stores a computer program, and the program is executed by a processor to implement the method for generating the user label provided by the embodiment of the invention.
One embodiment of the above invention has the following advantages or benefits: word segmentation is performed on the user information to obtain a word set of the user; a plurality of topics and the keywords of each topic are obtained according to the word set of the user; for each topic, the weight of the topic is calculated according to the operation information on the targets to which the keywords of the topic belong; and a target topic is selected from the plurality of topics according to the weight of each topic and taken as a tag of the user. The tag generation logic therefore no longer needs to be analyzed manually; user tags are generated directly from the user information, which improves the efficiency of user tag generation. The method is also applicable to any application scenario, which improves its reusability.
Further effects of the above-described non-conventional alternatives are described below in connection with the embodiments.
Drawings
The drawings are included to provide a better understanding of the invention and are not to be construed as unduly limiting the invention. Wherein:
FIG. 1 is a schematic diagram of the main flow of a user tag generation method according to an embodiment of the present invention;
FIG. 2 is a schematic diagram of word segmentation in a method for generating a user tag according to an embodiment of the present invention;
FIG. 3 is a schematic diagram of user information in a method for generating a user tag according to an embodiment of the present invention;
FIG. 4 is a schematic diagram of calculating weights of topics in a method for generating a user tag according to an embodiment of the present invention;
FIG. 5 is a schematic diagram of an application scenario of a method for generating a user tag according to an embodiment of the present invention;
FIG. 6 is a schematic diagram of topics and keywords in a method for generating a user tag according to an embodiment of the present invention;
FIG. 7 is a schematic diagram of the main units of a user tag generating device according to an embodiment of the present invention;
FIG. 8 is an exemplary system architecture diagram in which embodiments of the present invention may be applied;
FIG. 9 is a schematic diagram of a computer system suitable for use in implementing an embodiment of the invention.
Detailed Description
Exemplary embodiments of the present invention will now be described with reference to the accompanying drawings, in which various details of the embodiments of the present invention are included to facilitate understanding, and are to be considered merely exemplary. Accordingly, those of ordinary skill in the art will recognize that various changes and modifications of the embodiments described herein can be made without departing from the scope and spirit of the invention. Also, descriptions of well-known functions and constructions are omitted in the following description for clarity and conciseness.
It is noted that embodiments of the invention and features of the embodiments may be combined with each other without conflict.
The following describes problems of the prior art by way of specific examples:
Example 1: for the tag of the night-owl group, it is necessary to determine manually, based on experience, that such users are used to purchasing, browsing, searching and so on late at night, and to define the tag generation logic manually according to this behavior, so as to generate the user tag.
Example 2: for the tag of pet owners, it is necessary to analyze manually, based on experience, what behaviors, habits and characteristics pet owners have, and to define the tag generation logic manually according to the users' behavior, e.g. whether cat food or dog food has been purchased, so as to generate the user tag.
Example 3: the 618 shopping interest group, or white interest preference.
For example 1 and example 2, a great deal of manpower, material resources and time is consumed, so efficiency is low. In addition, manual analysis does not consider all dimensions; in example 2, for instance, whether the user has purchased pet-related clothing or the like is not considered, so accuracy is not high. For example 3, the logic that generates the tag of the 618 shopping interest group can only be applied to the 618 shopping scenario and cannot be applied to other scenarios, so reusability is poor.
In order to solve the problems in the prior art, an embodiment of the present invention provides a method for generating a user tag, as shown in fig. 1, where the method includes:
Step S101, performing word segmentation on user information to obtain a word set of the user.
In a specific implementation of this step, the user information is acquired and segmented using the text word segmentation algorithm of a word segmenter, yielding the word set of the user.
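As a rough illustration of step S101, the following Python sketch builds a word set from a few user information records. The regular-expression tokenizer is only a stand-in assumed for this example; the text does not name a particular word segmenter, and a real system would likely use a dedicated one. The function names and sample records are hypothetical.

```python
import re

def segment(text):
    """Split text into lowercase word tokens (a stand-in for a real word segmenter)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def build_word_set(user_info):
    """Segment every record of user information and collect the distinct words."""
    words = set()
    for record in user_info:
        words.update(segment(record))
    return words

# Hypothetical user information: an evaluation of an item and an item name.
user_info = ["Good quality, high cost performance", "Casual sports board shoes"]
word_set = build_word_set(user_info)
```

The resulting set is what step S102 would consume when extracting topics and keywords.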
The user information includes: historical behavior information of a user, information of an item associated with the user, information of a home of the item. It should be understood that, without affecting the embodiment of the present invention, a person skilled in the art may flexibly set the content included in the user information, and in another embodiment, the user information further includes information of a terminal used by the user.
It should be noted that the method provided by the embodiment of the present invention may be used to generate user tags in the e-commerce field, the catering service field, the network entertainment field, and the like. In the e-commerce field, the item may be a commodity and the home may be a merchant. In the catering service field, the item may be a meal and the home may be a restaurant. In the network entertainment field, the item may be news and the home may be the publisher of the news.
As shown in fig. 2 and 3, the historical behavior information of the user may include: the user's communication with customer service about the item, ordering information, refund information, report and complaint information, the user's evaluation of the item (e.g., good quality, high cost performance, beautiful style, etc.), and the like. The information on the home of the item includes: the user's evaluation of the home of the item, the self-evaluation of the home of the item, shipping place information, and the like. The information on the item associated with the user includes: the name of the item (e.g., male shoe clover lovers style superstar summer recreational sports board shoes), the category name of the item (e.g., health article, fitness equipment article, etc.), the attributes of the item (e.g., tidal shoes, clover, fashion, young, leisure, conch head, neutral, antique, etc.; or health, small and health care, etc.), the ordering information of the item, and the like.
It should be noted that, when the user tag is generated, not only the user is considered, but also the items associated with the user and the homes of those items, so that user information is collected along more dimensions and the accuracy of the generated user tag is improved.
Step S102, obtaining a plurality of topics and keywords of each topic according to the word set of the user.
It should be noted that, the specific embodiments of this step are described in detail below, and are not described herein.
Step S103, for each topic, calculating the weight of the topic according to the operation information of the target to which the keyword of the topic belongs.
It should be noted that, the specific embodiments of this step are described in detail below, and are not described herein.
Step S104, selecting a target topic from the plurality of topics according to the weight of each topic, and taking the target topic as a tag of the user.
In a specific implementation, the topics are sorted by weight in descending order, the top three topics are selected as target topics, and the target topics are used as the tags of the user. It should be appreciated that one skilled in the art may flexibly set the number of target topics according to the specific application scenario; in another embodiment, the top five topics are selected as target topics. For example, user A, interest preference TOP5: [fashion, home, movie, life, homemade]; user B, interest preference TOP5: [pets, science and technology, stay up, beautify, keep in good health]. As another example, see table 1:
TABLE 1 user's tag
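The selection in step S104 can be sketched in Python as follows. The function name `select_tags` and the sample weights are hypothetical; the parameter `k` reflects the statement that the number of target topics may be set to three, five, or another value.

```python
def select_tags(topic_weights, k=3):
    """Return the k topics with the largest weights, heaviest first."""
    ranked = sorted(topic_weights.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, _ in ranked[:k]]

# Hypothetical topic weights for one user.
weights = {"fashion": 4.2, "home": 3.1, "movie": 2.7, "life": 1.9, "pets": 0.4}
tags = select_tags(weights)           # top three topics
top_five = select_tags(weights, k=5)  # the alternative embodiment with five topics
```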
In the embodiment of the present invention, as shown in fig. 4, the operation information includes an operation time, an operation number, and an operation attribute value;
calculating the weight of the topic according to the operation information on the targets to which the keywords of the topic belong includes:
for each operation, multiplying the sum of the operation times of the targets to which the keywords of the topic belong by the time decay weight of the operation time, multiplying the sum of the operation numbers of the targets to which the keywords of the topic belong by the time decay weight of the operation number, and multiplying the sum of the operation attribute values of the targets to which the keywords of the topic belong by the time decay weight of the operation attribute value, and taking the sum of the three products as the weight of the operation;
and adding the weights of all the operations, the obtained value being the weight of the topic.
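The two-level sum described above (three products per operation, then a sum over operations) might be sketched in Python as follows. The dictionary layout, the key names `time`, `count` and `value`, and all numbers are assumptions made for illustration; the sums are taken to be already dimensionless and the decay weights to be precomputed.

```python
def operation_weight(sums, decay):
    """Weight of one operation: each summed quantity times its time decay weight."""
    return sum(sums[key] * decay[key] for key in ("time", "count", "value"))

def topic_weight(operations):
    """Weight of a topic: the sum of the weights of all its operations."""
    return sum(operation_weight(op["sums"], op["decay"]) for op in operations.values())

# Hypothetical, already normalized sums and precomputed decay weights for one topic.
operations = {
    "search": {"sums": {"time": 0.5, "count": 0.25, "value": 0.0},
               "decay": {"time": 1.0, "count": 2.0, "value": 1.0}},
    "purchase": {"sums": {"time": 0.25, "count": 0.5, "value": 1.0},
                 "decay": {"time": 2.0, "count": 1.0, "value": 0.5}},
}
weight = topic_weight(operations)
```

Here the search operation contributes 1.0 and the purchase operation 1.5, so the topic weight is 2.5.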
In this embodiment, as shown in fig. 4, the operations (the "actions" in fig. 4) include: searching, clicking, browsing, following, purchasing, ordering, forwarding, commenting, sharing, and the like. The operation information may be acquired from various servers. Before the weight of the topic is calculated, all operation times, all operation numbers and all operation attribute values are made dimensionless, so that the influence of differing units is eliminated. The attribute value may be an amount, a value, or the like.
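The text does not say which dimensionless processing is used. One common choice is min-max scaling, sketched below in Python purely as an assumption; the function name and sample values are hypothetical.

```python
def min_max_scale(values):
    """Rescale values to the range [0, 1]; a constant list maps to all zeros."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical operation numbers for three targets.
counts = [2, 4, 10]
scaled = min_max_scale(counts)
```

After scaling, operation times, operation numbers and attribute values share the same unit-free range, so none of them dominates the weighted sum merely because of its unit.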
As shown in fig. 3, the target to which a keyword belongs may be an item, a merchant, a brand, or the like, and an operation on the target to which a keyword of the topic belongs may be purchasing the item to which the keyword belongs, searching for the item to which the keyword belongs, following the merchant to which the keyword belongs, or browsing the detail page of the item to which the keyword belongs.
The time decay weight of the operation number is calculated in a way similar to the time decay weight of the operation time, the difference being that the sum of the operation times of the target to which each keyword of the topic belongs is replaced by the sum of the operation numbers of that target. Similarly, the time decay weight of the operation attribute value is calculated in a way similar to the time decay weight of the operation time, except that the sum of the operation times of the target to which each keyword of the topic belongs is replaced by the sum of the operation attribute values of that target. The method for calculating the time decay weight of the operation time is described in detail below and is not repeated here.
It should be appreciated that one skilled in the art may flexibly set, according to specific requirements, whether the operation information includes the operation time, the operation number, or the operation attribute value. In another embodiment, the operation information includes an operation time and an operation number;
calculating the weight of the topic according to the operation information on the targets to which the keywords of the topic belong includes:
for each operation, multiplying the sum of the operation times of the targets to which the keywords of the topic belong by the time decay weight of the operation time, multiplying the sum of the operation numbers of the targets to which the keywords of the topic belong by the time decay weight of the operation number, and taking the sum of the two products as the weight of the operation;
and adding the weights of all the operations, the obtained value being the weight of the topic.
In this embodiment, the time decay weights of the various operations are taken into account when the weight of the topic is calculated, so that the tags of the user change in time as the user's behavior changes, which further improves the accuracy of the generated user tags.
In the embodiment of the invention, the method for calculating the time attenuation weight of the operation time comprises the following steps:
for each keyword of the topic, multiplying the absolute value of the difference between the current time and the operation time of the target to which the keyword belongs by a preset time decay factor, and adding 1 to the product to obtain the weight of the keyword;
for each keyword of the topic, dividing the sum of the operation times of the target to which the keyword belongs by the weight of the keyword to obtain a quotient; summing the quotients of all the keywords; and taking the logarithm of the sum, the obtained value being the time decay weight of the operation time.
In a specific implementation of this embodiment, the time decay weight of the operation time is expressed as:

$$w = \log\left(\sum_{i} \frac{mt_{Ai}}{1 + \alpha\,\lvert t - t_{Ai}\rvert}\right)$$

where α is the preset time decay factor, t is the current time, t_{Ai} is the operation time of the target to which the i-th keyword of the topic belongs, and mt_{Ai} is the sum of the operation times of the target to which the i-th keyword of the topic belongs.
The larger the absolute value of the difference between the operation time and the current time, i.e. the earlier the operation time, the smaller the time decay weight of the operation time, and the smaller the weight influence of the operation on the theme.
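The calculation described above (a per-keyword weight of 1 plus the decay factor times the elapsed time, a quotient per keyword, a sum of the quotients, then a logarithm) can be sketched in Python as follows. The natural logarithm, the day as time unit, the function name and the sample numbers are assumptions; the text does not fix the base of the logarithm.

```python
import math

def time_decay_weight(ops, now, alpha):
    """ops: one (operation_time, summed_quantity) pair per keyword of the topic.

    Each summed quantity is divided by 1 + alpha * |now - operation_time|,
    the quotients are added, and the logarithm of the total is returned.
    The total must be positive for the logarithm to be defined.
    """
    total = sum(m / (1.0 + alpha * abs(now - t)) for t, m in ops)
    return math.log(total)

# Hypothetical data: times in days, decay factor 0.5, current day 10.
recent = time_decay_weight([(8, 4.0)], now=10, alpha=0.5)   # log(4 / 2)
old = time_decay_weight([(2, 4.0)], now=10, alpha=0.5)      # log(4 / 5)
combined = time_decay_weight([(10, 2.0), (8, 4.0)], now=10, alpha=0.5)
```

With the same summed quantity, the operation performed eight days ago yields a smaller weight than the one performed two days ago, which matches the behavior described in the text.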
The time decay factor is generally defined in relation to the users' average shopping life cycle at the enterprise. If the decaying quantity is regarded as a set of discrete elements, the average length of time that an element remains in the set can be calculated; this is called the average lifetime (commonly just the lifetime) and is related to the decay rate. Time decay factor = 1 / average shopping life cycle. For example, if Jingdong users shop on average once every 3 days, the time decay factor = 1/3; the time decay factor thus depends on the shopping behavior of the Jingdong users.
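A minimal Python sketch of the relation "time decay factor = 1 / average shopping life cycle" follows. Estimating the cycle as the mean gap between consecutive purchase days is an assumption of this example, as are the function name and the sample data.

```python
def time_decay_factor(purchase_days):
    """Return 1 / (average gap, in days, between consecutive purchases).

    purchase_days must be sorted and contain at least two distinct days.
    """
    if len(purchase_days) < 2:
        raise ValueError("at least two purchases are needed to estimate a cycle")
    gaps = [later - earlier for earlier, later in zip(purchase_days, purchase_days[1:])]
    average_cycle = sum(gaps) / len(gaps)
    return 1.0 / average_cycle

# Hypothetical purchase history: one purchase every 3 days.
alpha = time_decay_factor([0, 3, 6, 9])
```

For this history the average cycle is 3 days, so the factor is 1/3, as in the example in the text.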
In an embodiment of the present invention, as shown in fig. 5, step S102 may include:
extracting a plurality of topics from the word set of the user, and obtaining a first keyword of each topic;
and for each topic, obtaining a second keyword of the topic according to the topic, the first keyword of the topic and a preset word stock, wherein the keywords of the topic comprise the first keyword of the topic and the second keyword of the topic.
In a specific implementation of this embodiment, a document topic generation model (Latent Dirichlet Allocation, LDA for short) or a probabilistic latent semantic analysis (Probabilistic Latent Semantic Analysis, pLSA) algorithm is used to extract a plurality of topics from the word set of the user and to obtain the first keywords of each topic. LDA works by judging the association and co-occurrence frequency between topics and words: a topic is a conditional probability distribution over words, and the more closely a word is related to the topic, the larger its conditional probability, and vice versa. For example, with p(tea | stay up) = 0.2 and p(health care | stay up) = 0.15, the topic is "stay up" and its keywords are "tea" and "health care", whereas a word with p(word | stay up) = 0.00000001 is essentially unrelated to the topic. This step replaces the manual summarizing of which commodities, categories and interest preferences correspond to orders with the automatic extraction of topics and first keywords from the information, which further improves the efficiency of user tag generation.
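The Python sketch below does not implement LDA or pLSA. It assumes that a fitted topic model has already produced the conditional probabilities p(word | topic) and only shows how the first keywords of a topic could be read off as the most probable words. The function name, the cut-off `top_n` and the word "sofa" are hypothetical; the other probabilities are the ones quoted in the text.

```python
def first_keywords(word_given_topic, top_n=2):
    """For each topic, keep the top_n words with the highest p(word | topic)."""
    result = {}
    for topic, probs in word_given_topic.items():
        ranked = sorted(probs, key=probs.get, reverse=True)
        result[topic] = ranked[:top_n]
    return result

# Conditional probabilities as a fitted topic model might report them.
word_given_topic = {
    "stay up": {"sofa": 0.00000001, "health care": 0.15, "tea": 0.2},
}
keywords = first_keywords(word_given_topic)
```

Words with a near-zero conditional probability, such as the hypothetical "sofa", fall outside the cut-off and are not treated as keywords of the topic.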
The topic and the first keywords are illustrated below with a specific example: the topic represents the abstract meaning of the first keywords; if the topic is "fashionable", the first keywords may be star, show, personality, sexy, tide, idol, fashion, and the like.
In addition, a second keyword of the theme is obtained according to the theme, the first keyword of the theme and a preset word stock, and the first keyword of the theme is expanded.
In this embodiment, the keywords of each topic include a first keyword and a second keyword, where the first keyword is obtained according to various information of the user, and the second keyword is obtained by expanding the first keyword, so that the keywords of each topic are more comprehensive, and the generation accuracy of the user tag is further improved.
It should be noted that, in another embodiment, step S102 may include: extracting a plurality of topics from the word set of the user, and obtaining first keywords of each topic, wherein the keywords of each topic only comprise the first keywords of each topic and do not comprise the second keywords of each topic. For example, as shown in FIG. 6, a plurality of topics are extracted from the user's vocabulary, the plurality of topics including: trend, stay up night, home and recreation. The first keywords of the trend, namely the keywords of the trend, comprise fashion, quality, same style, little fragrance and the like. The embodiment can also realize the effect of the invention, but because the first keyword of each theme is not expanded, the keyword of each theme is not comprehensive, and the generation accuracy of the user label is slightly lower.
In the embodiment of the invention, obtaining the second keyword of the theme according to the theme, the first keyword of the theme and a preset word stock comprises the following steps:
generating at least one target vector according to the topic and a first keyword of the topic, and calculating the average distance between the target vectors by adopting at least one target vector;
and generating a vector of each word in a preset word stock, and if the distance between the vector of each word and any one of the target vectors is greater than the average distance, the second keyword of the theme comprises the word.
In implementation, this embodiment adopts the word-to-vector method (Word to Vector, also called Word2vec, which maps words to K-dimensional real vectors through training). A vector of the topic is generated from the topic, and a vector of each first keyword of the topic is generated from that first keyword; the vector of the topic and the vectors of the first keywords are all target vectors. Word2vec may likewise be used to generate the vector of each word in the preset word stock.
The distance between the vector of a word and any target vector can be calculated using the Euclidean distance, the Manhattan distance, the Mahalanobis distance, the cosine of the included angle, or the like. For the distance measures, the smaller the distance between two vectors, the more similar the two words used to generate them; for the cosine, the larger the value, the more similar the two words.
The following describes a specific example of calculating the average distance between the target vectors: the target vectors include a first vector, a second vector and a third vector, and average distance = (distance between the first and second vectors + distance between the second and third vectors + distance between the first and third vectors) / 3. The distance between two vectors can be calculated using the cosine of the included angle or the like.
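The averaging and expansion steps can be sketched as below. This is a minimal illustration, not the patented implementation: it assumes the cosine of the included angle is used as the "distance" (so that a larger value means more similar, consistent with the rule that a word is kept when its value exceeds the average), and it uses tiny hand-written 2-dimensional vectors in place of trained Word2vec vectors. The function names and all vector values are hypothetical.

```python
import math
from itertools import combinations


def cosine(u, v):
    """Cosine of the included angle between two vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def average_pairwise(vectors, measure=cosine):
    """Average of the measure over every unordered pair of target vectors."""
    pairs = list(combinations(vectors, 2))
    if not pairs:
        raise ValueError("at least two target vectors are needed for an average")
    return sum(measure(u, v) for u, v in pairs) / len(pairs)


def expand_keywords(target_vectors, lexicon_vectors, measure=cosine):
    """Keep lexicon words whose measure to ANY target vector exceeds the average."""
    threshold = average_pairwise(list(target_vectors.values()), measure)
    return [
        word
        for word, vec in lexicon_vectors.items()
        if any(measure(vec, t) > threshold for t in target_vectors.values())
    ]


# Toy vectors (assumed values, not real embeddings).
targets = {"fashion": [1.0, 0.0], "trend": [0.8, 0.6], "idol": [0.6, 0.8]}
lexicon = {"retro": [0.7, 0.7], "design": [1.0, 0.1], "teapot": [0.0, -1.0]}

second_keywords = expand_keywords(targets, lexicon)
print(second_keywords)
```

With three target vectors the average is taken over three pairs, matching the worked formula above. If a true distance such as the Euclidean distance were used instead, the comparison direction would need to be reconsidered, since smaller distances then mean more similar words.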
The embodiment is described below with a specific example: the first keywords of the topic include: fashion, home and stay up late. A target vector for "fashion" is generated from fashion, a target vector for "home" is generated from home, and a target vector for "stay up late" is generated from stay up late. The distance between each target vector and the vector of each word in the preset word stock is shown in Table 2:
Table 2. Distance between vectors
Finally, the expansion words of "fashion" are: women's wear, men's wear, welt, design and retro; the expansion words of "home" are: women's dress, men's dress, weimi, furniture, appliances, teapots and health preserving devices; and the expansion words of "stay up late" are: women's dress, men's dress, weimi, furniture, appliances, teapots and health preserving. All the obtained expansion words are second keywords of the topic.
At present, large internet enterprises, especially in the e-commerce industry, have accumulated and applied a great deal of data storage and labeling. The technology for generating user labels is relatively mature, but how to better apply labels to services remains a common problem. For business personnel, the cost of applying labels is high. On the one hand, business personnel cannot directly see the logic the developers used when developing a label, so there is often a barrier to interpreting the label and understanding it in business terms. On the other hand, what business personnel need more urgently are labels that are directly related to the business, highly interpretable, highly aggregated, and of the interest preference type. For example, labels such as gender, a monthly income of 10,000, resident in Beijing, and senior white collar must be combined using business knowledge when applied; the cost of combining them is high, and it is unclear which group the combined labels describe. By contrast, interest preferences of the user, such as loves movies, loves life, loves science and technology, loves sports, and loves staying up late, are simple, easy to understand and have a clear meaning for business personnel. The method provided by the embodiment of the invention can automatically generate such labels for users.
If only keywords were used, the vocabulary would be huge and every keyword could become a label of a user, which is obviously unsuitable and meaningless. The method avoids this situation by extracting topics.
Text word segmentation algorithms fall into three major classes. The first class is based on string matching: the string is scanned, and a substring that is identical to a word in a dictionary counts as a match. The second class is based on statistics and machine learning: model parameters are trained on a labeled corpus to obtain a target model, the target model is used to calculate the probability of each candidate segmentation, and the segmentation with the highest probability is taken as the final result. The third class has the computer simulate a person's understanding of a sentence in order to recognize words; because Chinese semantics are complex and it is difficult to organize the various kinds of linguistic information into a machine-readable form, this class is still at the experimental stage.
Tag: a tag is a way of organizing Internet content. It is a strongly relevant keyword that helps people describe and classify content easily, so that the content is convenient to search and share.
Interest preferences: the type of item, the attribute of the item's home, the style of item category, and the like that the user likes.
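The first class, string matching against a dictionary, can be illustrated with forward maximum matching, one common variant of that class. The description does not state which matching strategy is used, so this choice, the function name `forward_max_match`, the `max_len` parameter and the tiny dictionary are all assumptions made for the sketch.

```python
def forward_max_match(text, dictionary, max_len=4):
    """Segment text by repeatedly taking the longest dictionary word
    that starts at the current position; unknown characters are
    emitted one at a time."""
    tokens = []
    i = 0
    while i < len(text):
        # Try the longest candidate first, then shrink it.
        for size in range(min(max_len, len(text) - i), 0, -1):
            candidate = text[i:i + size]
            if size == 1 or candidate in dictionary:
                tokens.append(candidate)
                i += size
                break
    return tokens


# Assumed dictionary: stay up late, health preserving, teapot.
dictionary = {"熬夜", "养生", "茶壶"}
print(forward_max_match("熬夜养生茶壶", dictionary))
```

Statistical segmenters (the second class) generally handle ambiguity better than this greedy scan, which is why the sketch should be read only as an illustration of the matching idea.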
The process of generating the user tag is described above with reference to fig. 1 to 6, and the apparatus for generating the user tag is described below with reference to fig. 7.
In order to solve the problems in the prior art, an embodiment of the present invention provides a device for generating a user tag, as shown in fig. 7, where the device includes:
A word segmentation unit 701, configured to segment the user information to obtain a word set of the user.
A processing unit 702, configured to obtain a plurality of topics and keywords of each topic according to the word set of the user.
A calculating unit 703, configured to calculate, for each topic, a weight of the topic according to operation information on a target to which a keyword of the topic belongs.
A selection unit 704, configured to select a target topic from the plurality of topics according to the weight of each topic, and use the target topic as a tag of the user.
In the embodiment of the invention, the operation information comprises operation time, number of operations and operation attribute values;
the computing unit 703 is configured to:
for each operation, multiplying the sum of the operation times of the targets to which the keywords of the topic belong by the time decay weight of the operation time; multiplying the sum of the numbers of operations on the targets to which the keywords of the topic belong by the time decay weight of the number of operations; multiplying the sum of the operation attribute values of the targets to which the keywords of the topic belong by the time decay weight of the operation attribute value; and taking the sum of the three products as the weight of the operation;
adding the weights of all operations, the obtained value being used as the weight of the topic.
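The weighting steps above can be sketched as follows. This is a minimal reading of the text, not a definitive implementation: it assumes that "each operation" means each kind of operation (the names "browse" and "purchase" are hypothetical), that the three sums and the three time decay weights have already been computed, and that all numbers are invented for illustration.

```python
from dataclasses import dataclass


@dataclass
class OperationStats:
    """Sums over the targets to which the topic's keywords belong."""
    time_sum: float    # sum of operation times
    count_sum: float   # sum of numbers of operations
    attr_sum: float    # sum of operation attribute values


@dataclass
class DecayWeights:
    """Time decay weight for each of the three quantities."""
    time: float
    count: float
    attr: float


def operation_weight(stats, decay):
    """Weight of one operation: the sum of the three products."""
    return (stats.time_sum * decay.time
            + stats.count_sum * decay.count
            + stats.attr_sum * decay.attr)


def topic_weight(per_operation):
    """Weight of the topic: the sum of the weights of all operations."""
    return sum(operation_weight(s, d) for s, d in per_operation.values())


# Assumed example data for one topic.
per_operation = {
    "browse": (OperationStats(10.0, 5.0, 2.0), DecayWeights(0.5, 0.2, 0.1)),
    "purchase": (OperationStats(4.0, 2.0, 100.0), DecayWeights(0.5, 0.5, 0.01)),
}
print(topic_weight(per_operation))
```

Here the browse operation contributes 5 + 1 + 0.2 and the purchase operation contributes 2 + 1 + 1, so the topic weight is about 10.2.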
In the embodiment of the present invention, the computing unit 703 is configured to:
for each keyword of the topic, multiplying the absolute value of the difference between the operation time of the target to which the keyword belongs and the current time by a preset time decay factor to obtain a product, and adding 1 to the product to obtain the weight of the keyword;
for each keyword of the topic, dividing the sum of the operation times of the target to which the keyword belongs by the weight of the keyword, so that a plurality of quotients are obtained; summing all the quotients; and taking the logarithm of the sum, the obtained value being used as the time decay weight of the operation time.
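One possible reading of these two steps is sketched below. Several details are assumptions, because the text leaves them open: the unit of time (days here), the base of the logarithm (natural logarithm here), and the exact meaning of "the sum of the operation time" (treated as a given number per keyword). The function and parameter names are hypothetical.

```python
import math


def keyword_weight(op_time, now, decay_factor):
    """1 + decay_factor * |op_time - now|: older operations get a larger divisor."""
    return 1.0 + decay_factor * abs(op_time - now)


def time_decay_weight(keyword_ops, now, decay_factor):
    """Logarithm of the summed quotients (operation-time sum / keyword weight).

    keyword_ops: list of (op_time, time_sum) pairs, one per keyword.
    """
    total = 0.0
    for op_time, time_sum in keyword_ops:
        total += time_sum / keyword_weight(op_time, now, decay_factor)
    return math.log(total)  # natural logarithm is an assumption


# Assumed data: times are day numbers, "now" is day 100.
ops = [(90, 6.0), (100, 2.0)]
print(time_decay_weight(ops, now=100, decay_factor=0.1))
```

In this toy case the first keyword has weight 1 + 0.1 * 10 = 2 and contributes 6 / 2 = 3, the second has weight 1 and contributes 2, so the result is log(5). The sum of quotients must be positive for the logarithm to be defined.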
In an embodiment of the present invention, the processing unit 702 is configured to:
extracting a plurality of topics from the word set of the user, and obtaining a first keyword of each topic;
and for each topic, obtaining a second keyword of the topic according to the topic, the first keyword of the topic and a preset word stock, wherein the keywords of the topic comprise the first keyword of the topic and the second keyword of the topic.
In an embodiment of the present invention, the processing unit 702 is configured to:
generating at least one target vector according to the topic and a first keyword of the topic, and calculating the average distance between the target vectors by adopting at least one target vector;
and generating a vector of each word in a preset word stock, and if the distance between the vector of each word and any one of the target vectors is greater than the average distance, the second keyword of the theme comprises the word.
In the embodiment of the present invention, the selecting unit 704 is configured to:
arranging all the topics in descending order of weight, and selecting the three top-ranked topics as target topics.
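The selection step amounts to a sort followed by a cut-off, which can be sketched as follows. The function name `select_target_topics` and the weights are hypothetical; the topic names are borrowed from the FIG. 6 example.

```python
def select_target_topics(topic_weights, top_n=3):
    """Sort topics by weight in descending order and keep the first top_n."""
    ranked = sorted(topic_weights.items(), key=lambda item: item[1], reverse=True)
    return [topic for topic, _ in ranked[:top_n]]


# Assumed weights for the four example topics.
weights = {"trend": 10.2, "stay up late": 7.5, "home": 3.1, "recreation": 8.8}
print(select_target_topics(weights))
```

The selected topics then serve directly as the user's tags.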
In the embodiment of the invention, the user information comprises:
historical behavior information of a user, information of an item associated with the user, information of a home of the item.
It should be understood that the functions executed by each component of the user tag generating apparatus provided in the embodiment of the present invention are described in detail in the method for generating a user tag in the foregoing embodiment, which is not described herein again.
Fig. 8 shows an exemplary system architecture 800 to which the user tag generation method or the user tag generation apparatus of the embodiment of the present invention may be applied.
As shown in fig. 8, a system architecture 800 may include terminal devices 801, 802, 803, a network 804, and a server 805. The network 804 serves as a medium for providing communication links between the terminal devices 801, 802, 803 and the server 805. The network 804 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
A user may interact with the server 805 through the network 804 using the terminal devices 801, 802, 803 to receive or send messages or the like. Various communication client applications such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 801, 802, 803.
The terminal devices 801, 802, 803 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 805 may be a server providing various services, such as a background management server (by way of example only) that provides support for shopping websites browsed by users using the terminal devices 801, 802, 803. The background management server may analyze and otherwise process received data such as a product information query request, and feed the processing result (for example, target push information or product information; only an example) back to the terminal device.
It should be noted that, the method for generating the user tag according to the embodiment of the present invention is generally executed by the server 805, and accordingly, the device for generating the user tag is generally disposed in the server 805.
It should be understood that the number of terminal devices, networks and servers in fig. 8 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
Referring now to FIG. 9, there is illustrated a schematic diagram of a computer system 900 suitable for use in implementing an embodiment of the present invention. The terminal device shown in fig. 9 is only an example, and should not impose any limitation on the functions and the scope of use of the embodiment of the present invention.
As shown in fig. 9, the computer system 900 includes a Central Processing Unit (CPU) 901, which can execute various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 902 or a program loaded from a storage section 908 into a Random Access Memory (RAM) 903. In the RAM 903, various programs and data necessary for the operation of the system 900 are also stored. The CPU 901, ROM 902, and RAM 903 are connected to each other through a bus 904. An input/output (I/O) interface 905 is also connected to the bus 904.
The following components are connected to the I/O interface 905: an input section 906 including a keyboard, a mouse, and the like; an output portion 907 including a display such as a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), and a speaker; a storage portion 908 including a hard disk or the like; and a communication section 909 including a network interface card such as a LAN card, a modem, or the like. The communication section 909 performs communication processing via a network such as the internet. The drive 910 is also connected to the I/O interface 905 as needed. A removable medium 911 such as a magnetic disk, an optical disk, a magneto-optical disk, a semiconductor memory, or the like is installed as needed on the drive 910 so that a computer program read out therefrom is installed into the storage section 908 as needed.
In particular, according to embodiments of the present disclosure, the processes described above with reference to flowcharts may be implemented as computer software programs. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable medium, the computer program comprising program code for performing the method shown in the flow chart. In such an embodiment, the computer program may be downloaded and installed from the network via the communication portion 909 and/or installed from the removable medium 911. The above-described functions defined in the system of the present invention are performed when the computer program is executed by a Central Processing Unit (CPU) 901.
The computer readable medium shown in the present invention may be a computer readable signal medium or a computer readable storage medium, or any combination of the two. The computer readable storage medium can be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or a combination of any of the foregoing. More specific examples of the computer-readable storage medium may include, but are not limited to: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device. In the present invention, however, the computer-readable signal medium may include a data signal propagated in baseband or as part of a carrier wave, with the computer-readable program code embodied therein. Such a propagated data signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination of the foregoing. A computer readable signal medium may also be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to: wireless, wire, fiber optic cable, RF, etc., or any suitable combination of the foregoing.
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present invention. In this regard, each block in the flowchart or block diagrams may represent a unit, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The units involved in the embodiments of the present invention may be implemented in software or in hardware. The described units may also be provided in a processor, for example, described as: a processor includes a word segmentation unit, a processing unit, a computing unit, and a selection unit. The names of these units do not in some cases limit the unit itself, for example, a word segmentation unit may also be described as "a unit that segments user information to obtain a word set of the user".
As another aspect, the present invention also provides a computer-readable medium that may be contained in the apparatus described in the above embodiments, or may exist alone without being fitted into the apparatus. The computer-readable medium carries one or more programs which, when executed by a device, cause the device to perform the following: segmenting the user information to obtain a word set of the user; obtaining a plurality of topics and keywords of each topic according to the word set of the user; for each topic, calculating the weight of the topic according to the operation information of the target to which the keyword of the topic belongs; and selecting a target topic from the plurality of topics according to the weight of each topic, and taking the target topic as a label of the user.
According to the technical scheme of the embodiment of the invention, the user information is segmented to obtain the word set of the user; a plurality of topics and keywords of each topic are obtained according to the word set of the user; for each topic, the weight of the topic is calculated according to the operation information of the target to which the keyword of the topic belongs; and a target topic is selected from the plurality of topics according to the weight of each topic and taken as a label of the user. Therefore, there is no need to analyze the generation logic of the labels; the user labels are generated directly from the user information, which improves the generation efficiency of the user labels. The method is also applicable to any application scenario, which improves its reusability.
The above embodiments do not limit the scope of the present invention. It will be apparent to those skilled in the art that various modifications, combinations, sub-combinations and alternatives can occur depending upon design requirements and other factors. Any modifications, equivalent substitutions and improvements made within the spirit and principles of the present invention should be included in the scope of the present invention.

Claims (8)

1. A method for generating a user tag, comprising:
word segmentation is carried out on the user information to obtain a word set of the user;
obtaining a plurality of topics and keywords of each topic according to the word set of the user;
for each topic, calculating the weight of the topic according to the operation information of the target to which the key word of the topic belongs;
selecting a target theme from a plurality of themes according to the weight of each theme, and taking the target theme as a label of the user;
the operation information comprises operation time, number of operations and operation attribute values;
calculating the weight of the topic according to the operation information of the target to which the keyword of the topic belongs includes:
for each operation, multiplying the sum of the operation times of the targets to which the keywords of the topic belong by the time decay weight of the operation time; multiplying the sum of the numbers of operations on the targets to which the keywords of the topic belong by the time decay weight of the number of operations; multiplying the sum of the operation attribute values of the targets to which the keywords of the topic belong by the time decay weight of the operation attribute value; and taking the sum of the three products as the weight of the operation;
adding the weights of all operations, the obtained value being used as the weight of the topic;
the method for calculating the time decay weight of the operation time comprises the following steps:
for each keyword of the topic, multiplying the absolute value of the difference between the operation time of the target to which the keyword belongs and the current time by a preset time decay factor to obtain a product, and adding 1 to the product to obtain the weight of the keyword;
for each keyword of the topic, dividing the sum of the operation times of the target to which the keyword belongs by the weight of the keyword, so that a plurality of quotients are obtained; summing all the quotients; and taking the logarithm of the sum, the obtained value being used as the time decay weight of the operation time.
2. The method of claim 1, wherein obtaining a plurality of topics and keywords for each topic from the set of words for the user comprises:
extracting a plurality of topics from the word set of the user, and obtaining a first keyword of each topic;
and for each topic, obtaining a second keyword of the topic according to the topic, the first keyword of the topic and a preset word stock, wherein the keywords of the topic comprise the first keyword of the topic and the second keyword of the topic.
3. The method of claim 2, wherein obtaining the second keyword of the topic from the topic, the first keyword of the topic, and a preset thesaurus comprises:
generating at least one target vector according to the topic and a first keyword of the topic, and calculating the average distance between the target vectors by adopting at least one target vector;
and generating a vector of each word in a preset word stock, and if the distance between the vector of each word and any one of the target vectors is greater than the average distance, the second keyword of the theme comprises the word.
4. The method of claim 1, wherein selecting a target topic from a plurality of topics based on the weight of each topic comprises:
arranging all the topics in descending order of weight, and selecting the three top-ranked topics as target topics.
5. The method of claim 1, wherein the user information comprises:
historical behavior information of a user, information of an item associated with the user, information of a home of the item.
6. A user tag generation apparatus, comprising:
a word segmentation unit, configured to segment the user information to obtain a word set of the user;
a processing unit, configured to obtain a plurality of topics and keywords of each topic according to the word set of the user;
a calculating unit, configured to calculate, for each topic, a weight of the topic according to operation information on a target to which a keyword of the topic belongs;
a selecting unit, configured to select a target topic from the plurality of topics according to the weight of each topic, and take the target topic as a label of the user;
the operation information comprises operation time, number of operations and operation attribute values;
the calculating unit is further configured to, for each operation, multiply the sum of the operation times for the keywords of the topic by the time decay weight of the operation time, multiply the sum of the numbers of operations for the keywords of the topic by the time decay weight of the number of operations, multiply the sum of the operation attribute values for the keywords of the topic by the time decay weight of the operation attribute value, and take the sum of the three products as the weight of the operation; and to add the weights of all operations, the obtained value being used as the weight of the topic;
the method for calculating the time decay weight of the operation time comprises the following steps:
for each keyword of the topic, multiplying the absolute value of the difference between the operation time of the target to which the keyword belongs and the current time by a preset time decay factor to obtain a product, and adding 1 to the product to obtain the weight of the keyword;
for each keyword of the topic, dividing the sum of the operation times of the target to which the keyword belongs by the weight of the keyword, so that a plurality of quotients are obtained; summing all the quotients; and taking the logarithm of the sum, the obtained value being used as the time decay weight of the operation time.
7. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs,
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any of claims 1-5.
8. A computer readable medium, on which a computer program is stored, characterized in that the program, when being executed by a processor, implements the method according to any of claims 1-5.
CN201910917704.8A 2019-09-26 2019-09-26 User tag generation method and device Active CN112559853B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201910917704.8A CN112559853B (en) 2019-09-26 2019-09-26 User tag generation method and device

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201910917704.8A CN112559853B (en) 2019-09-26 2019-09-26 User tag generation method and device

Publications (2)

Publication Number Publication Date
CN112559853A CN112559853A (en) 2021-03-26
CN112559853B CN112559853B (en) 2024-01-12

Family

ID=75029805

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201910917704.8A Active CN112559853B (en) 2019-09-26 2019-09-26 User tag generation method and device

Country Status (1)

Country Link
CN (1) CN112559853B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116779109B (en) * 2023-05-24 2024-04-02 纬英数字科技(广州)有限公司 Self-feature discovery method and device based on exploration scene guidance

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2011198393A (en) * 2011-06-29 2011-10-06 Yahoo Japan Corp User interest analyzing device, method, and program
CN106055538A (en) * 2016-05-26 2016-10-26 达而观信息科技(上海)有限公司 Automatic extraction method for text labels in combination with theme model and semantic analyses
CN106997382A (en) * 2017-03-22 2017-08-01 山东大学 Innovation intention label automatic marking method and system based on big data
CN107943895A (en) * 2017-11-16 2018-04-20 百度在线网络技术(北京)有限公司 Information-pushing method and device
CN108288229A (en) * 2018-03-02 2018-07-17 北京邮电大学 A kind of user's portrait construction method
CN110083774A (en) * 2019-05-10 2019-08-02 腾讯科技(深圳)有限公司 Using determination method, apparatus, computer equipment and the storage medium of recommendation list

Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
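Joint topic mining of microblogs based on comments and reposts; 赵臣升; 吴国文; 胡福玲; Intelligent Computer and Applications (01); full text *
Tag extraction and ranking in user reviews; 李丕绩; 马军; 张冬梅; 韩晓晖; Journal of Chinese Information Processing (05); full text *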

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant