WO2019019348A1 - Product information pushing method, apparatus, storage medium, and computer device - Google Patents

Product information pushing method, apparatus, storage medium, and computer device

Info

Publication number
WO2019019348A1
Authority
WO
WIPO (PCT)
Prior art keywords
user
behavior data
preset
topic
association
Prior art date
Application number
PCT/CN2017/103989
Other languages
English (en)
French (fr)
Inventor
柴春燕
Original Assignee
上海壹账通金融科技有限公司
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 上海壹账通金融科技有限公司
Publication of WO2019019348A1

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q30/00 Commerce
    • G06Q30/06 Buying, selling or leasing transactions
    • G06Q30/0601 Electronic shopping [e-shopping]
    • G06Q30/0631 Item recommendations

Definitions

  • The present application relates to the field of information processing technologies, and in particular to a product information pushing method, apparatus, storage medium, and computer device.
  • Instant messaging and social application platforms store a large amount of user information, including personal information such as a user's age, hobbies, and occupation, as well as behavioral information such as instant messages and articles sent by users. This information has important reference value for accurately delivering information such as merchandise advertisements.
  • A product information pushing method, apparatus, storage medium, and computer device are provided.
  • A product information pushing method includes: obtaining behavior data generated by each user within a preset time period; determining a preset topic to which each behavior data belongs, and establishing an association relationship between each behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences of the behavior data generated by each user in the preset time period on each preset topic; generating, according to the number of occurrences, a feature vector corresponding to the heat of the user for each preset topic; calculating a distance between the feature vectors of each user; and, after obtaining purchase information of a first user for a preset product, selecting a first number of second users having the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
  • A product information pushing device includes: a behavior data acquiring module, configured to acquire behavior data generated by each user within a preset time period; a topic determining module, configured to determine a preset topic to which each behavior data belongs and to establish an association relationship between each behavior data and the preset topic to which it belongs; a number calculation module, configured to calculate, according to the association relationship, the number of occurrences of the behavior data generated by each user in the preset time period on each preset topic; a feature vector generation module, configured to generate, according to the number of occurrences, a feature vector corresponding to the heat of the user for each preset topic; a distance calculation module, configured to calculate the distance between the feature vectors of each user; and an information pushing module, configured to, after obtaining purchase information of a first user for a preset product, select a first number of second users having the shortest distance from the first user and send product information of the product to the terminals of the second users.
  • One or more non-volatile storage media store computer readable instructions which, when executed by one or more processors, cause the one or more processors to perform the steps of: acquiring behavior data generated by each user within a preset time period; determining a preset topic to which each behavior data belongs, and establishing an association relationship between each behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences of the behavior data generated by each user in the preset time period on each preset topic; generating, according to the number of occurrences, a feature vector corresponding to the heat of the user for each preset topic; calculating a distance between the feature vectors of each user; and, after acquiring purchase information of a first user for a preset product, selecting a first number of second users having the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
  • A computer device includes a memory and a processor. The memory stores computer readable instructions which, when executed by the processor, cause the processor to perform the steps of: acquiring behavior data generated by each user within a preset time period; determining a preset topic to which each behavior data belongs, and establishing an association relationship between each behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences of the behavior data generated by each user in the preset time period on each preset topic; generating, according to the number of occurrences, a feature vector corresponding to the heat of the user for each preset topic; calculating a distance between the feature vectors of each user; and, after acquiring purchase information of a first user for a preset product, selecting a first number of second users having the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
  • FIG. 1 is an application environment diagram of a product information pushing method in an embodiment.
  • FIG. 2 is a flow chart of a product information pushing method in an embodiment.
  • FIG. 3 is a flow chart of determining the preset topic to which each behavior data belongs in an embodiment.
  • FIG. 4 is a flow chart of the calculation of the center vector in an embodiment.
  • FIG. 5 is a structural block diagram of a product information pushing device in an embodiment.
  • FIG. 6 is a structural block diagram of a product information pushing device in another embodiment.
  • FIG. 7 is a structural block diagram of a topic determining module in an embodiment.
  • FIG. 8 is a structural block diagram of a product information pushing device in still another embodiment.
  • FIG. 9 is a diagram of the internal structure of a computer device in an embodiment.
  • The terms "first", "second", and the like, as used herein, may be used to describe various elements, but these elements are not limited by these terms. These terms are only used to distinguish one element from another.
  • For example, a first user may be referred to as a second user, and similarly, a second user may be referred to as a first user, without departing from the scope of the present application.
  • Both the first user and the second user are users, but they are not the same user.
  • The product information pushing method provided by the embodiments of the present application can be applied to the application environment shown in FIG. 1.
  • The server 120 is connected to a plurality of user terminals 110-1 to 110-x via a network.
  • Each user terminal 110 can generate behavior data through the server 120.
  • The server 120 can receive and store the behavior data generated by each user terminal 110; determine the preset topic to which each behavior data belongs, and establish an association relationship between each behavior data and the topic to which it belongs; calculate, according to the association relationship, the number of occurrences of the behavior data generated by each user in a preset time period on each topic; generate, according to the number of occurrences, a feature vector corresponding to the heat of the user for each topic; calculate the distance between the feature vectors of the users; and, after obtaining purchase information of a first user for a preset product, select a first number of second users having the shortest distance from the first user and send product information of the product to the terminals of the second users.
  • A product information pushing method is provided, which may be applied to a server as shown in FIG. 1, or to a computer device such as a terminal. The method includes:
  • Step S202: acquiring behavior data generated by each user within a preset time period.
  • The behavior data is information published by the user through an instant messaging application or a social application, and includes data in the form of chat messages, articles, or logs sent by the user terminal through such an application.
  • When the behavior data is text such as an article or a log, each article or log can be treated as one piece of behavior data.
  • When the behavior data consists of separate messages, such as text chat messages or voice chat messages sent by the user, all messages generated within a suitable time period, such as each hour, each day, or each week, can be aggregated into one piece of behavior data.
  • Each piece of behavior data may carry the user identifier of the user who generated it, or an association relationship may be established between the behavior data and the user identifier, so that the user to whom each piece of behavior data belongs can be determined from the carried user identifier or the association relationship. The behavior data generated by each user within the preset time period is then distinguished and aggregated according to the determined users.
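  • The aggregation of separate chat messages into per-user behavior data can be sketched as follows. This is only an illustrative sketch: the tuple layout `(user_id, timestamp, text)`, the function name `aggregate_messages`, and the choice of one day as the aggregation period are assumptions, not details given in the application.

```python
from collections import defaultdict
from datetime import datetime


def aggregate_messages(messages):
    """Group (user_id, timestamp, text) messages into one item per user per day."""
    grouped = defaultdict(list)
    for user_id, timestamp, text in messages:
        # The key carries the user identifier, so each aggregated item
        # can later be attributed to the user who generated it.
        grouped[(user_id, timestamp.date())].append(text)
    # Join each day's messages into a single text used as one behavior-data item.
    return {key: " ".join(texts) for key, texts in grouped.items()}


# Hypothetical sample messages.
messages = [
    ("u1", datetime(2017, 9, 1, 9, 0), "looking at fund products"),
    ("u1", datetime(2017, 9, 1, 18, 30), "which insurance is good"),
    ("u2", datetime(2017, 9, 2, 12, 0), "new game released"),
]
behavior_data = aggregate_messages(messages)
```

  • Here the two messages sent by user `u1` on the same day become a single piece of behavior data, while the message from `u2` forms its own item.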
  • The preset time period can be any set time period; for example, it can be set to the most recent one or two years.
  • The behavior data may be pre-stored in a local memory of the computer device or stored on other devices. When it is stored locally, the computer device can read the behavior data of each user from the local memory; when it is stored on another device, the computer device sends a behavior data acquisition request to that device and receives the behavior data that the device returns in response to the request.
  • Step S204: determining a preset topic to which each behavior data belongs, and establishing an association relationship between each behavior data and the preset topic to which it belongs.
  • The computer device also presets various topics, which may include any of shopping, sales, house price, insurance, finance, love, games, health, and the like.
  • The server can perform semantic analysis on each piece of behavior data and determine the topic to which it belongs according to the parsed semantics. Specifically, word segmentation may be performed on each piece of behavior data, semantic parsing may be performed on each segmented word, and the topic to which the behavior data belongs is determined from the semantics of the parsed words.
  • A topic calculation model is also preset in the computer device. For example, the topic of each piece of behavior data can be calculated according to a preset LDA (Latent Dirichlet Allocation) topic model, and an association relationship between the behavior data and the topic to which it belongs is established. One piece of behavior data may belong to one or several topics, or to none of the preset topics.
  • Step S206: calculating, according to the association relationship, the number of occurrences of the behavior data generated by each user in the preset time period on each preset topic.
  • According to the association relationship between each behavior data and the topics, the computer device may count, for each user and each topic, the number of that user's behavior data items that are associated with the topic. This count is the number of occurrences, on that topic, of the behavior data generated by the user in the preset time period. The number of occurrences reflects the user's level of interest in each topic and can be used to reflect the characteristics of each user. It can be understood that the more behavior data there is, the more accurately the calculated number of occurrences reflects the user's characteristics.
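  • The counting in step S206 can be illustrated with a short sketch. The record layout `(user_id, behavior_id, topic)` and the function name `count_topic_occurrences` are hypothetical and chosen only for this example.

```python
from collections import Counter, defaultdict


def count_topic_occurrences(associations):
    """Count, per user and per topic, how many behavior-data items are associated."""
    counts = defaultdict(Counter)
    for user_id, _behavior_id, topic in associations:
        counts[user_id][topic] += 1
    return counts


# Hypothetical association records produced by the topic-determination step.
associations = [
    ("u1", "b1", "insurance"),
    ("u1", "b1", "finance"),  # one behavior-data item may belong to several topics
    ("u1", "b2", "insurance"),
    ("u2", "b3", "games"),
]
occurrences = count_topic_occurrences(associations)
```

  • A `Counter` returns 0 for topics that a user never discussed, which matches the idea that behavior data may belong to none of the preset topics.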
  • The behavior data may also be screened first, so that a preset topic is determined only for the behavior data retained after screening.
  • Whether a piece of behavior data is retained or excluded is determined according to factors such as the text content and the amount of text it contains. For example, when the number of words in a piece of behavior data is less than a certain amount, and/or the parsed information of the behavior data has no meaning relative to any preset topic, the behavior data may be excluded. A topic is then determined for the behavior data retained by the screening. Discarded behavior data is not associated with any topic, so it does not need to be parsed, which reduces the resource consumption of the computer device.
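  • A minimal sketch of the word-count screening described above is shown below. The threshold of 5 words, the whitespace-based word split, and the name `screen_behavior_data` are assumptions; text without spaces between words would need a real word segmenter instead of `str.split`.

```python
def screen_behavior_data(items, min_words=5):
    """Keep only behavior-data items with at least `min_words` whitespace-separated words."""
    return [text for text in items if len(text.split()) >= min_words]


# Hypothetical behavior-data items; the short ones are discarded before topic analysis.
items = ["ok", "comparing term life insurance quotes for my family", "lol nice"]
kept = screen_behavior_data(items)
```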
  • Step S208: generating, according to the number of occurrences, a feature vector corresponding to the heat of the user for each preset topic.
  • The computer device may generate a corresponding feature vector according to the number of occurrences on each topic, and establish an association relationship between the feature vector and the user identifier of the corresponding user.
  • The dimension of the feature vector is the number of topics, and each element of the feature vector reflects the heat of the corresponding user for one topic.
  • The corresponding feature vector can be written as $S_i = [s_1, s_2, s_3, \ldots, s_n]_i$.
  • The subscript $i$ indicates that the feature vector is the feature vector of user $i$, and $s_1$ to $s_n$ reflect the discussion heat of topic 1 to topic $n$, respectively, in the behavior data generated by user $i$.
  • The value of an element in the feature vector may be the number or frequency of discussions on the topic, and the like.
  • The number of occurrences on each topic can be used directly as the value of the corresponding element in the feature vector, or it can be processed first, with the processed value used as the value of the corresponding element.
  • For example, each number of occurrences can be normalized, and the normalized value is used as the value of the corresponding element.
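  • One possible way to build such a feature vector is sketched below. The application does not specify a normalization method, so dividing each count by the user's total count is an assumption made for illustration, as are the names `TOPICS` and `feature_vector`.

```python
# Topic list taken from the examples in the description; the order fixes the vector layout.
TOPICS = ["shopping", "sales", "house price", "insurance", "finance", "love", "games", "health"]


def feature_vector(topic_counts, topics=TOPICS, normalize=True):
    """Build a per-user vector with one element per preset topic."""
    raw = [float(topic_counts.get(topic, 0)) for topic in topics]
    total = sum(raw)
    if normalize and total > 0:
        # Assumed normalization: each element becomes the share of the user's occurrences.
        return [value / total for value in raw]
    return raw


vector = feature_vector({"insurance": 3, "finance": 1})
```

  • With normalization the elements of each vector sum to 1, so users with very different amounts of behavior data remain comparable.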
  • Step S210: calculating the distance between the feature vectors of each user.
  • The computer device can calculate the distance between the feature vector of each user and the feature vectors of the other users according to a method for calculating the distance between vectors. The distance reflects the similarity between two users: the smaller the distance, the greater the similarity.
  • The computer device can acquire the feature vectors of two users whose distance has not yet been calculated, subtract each parameter of one feature vector from the corresponding parameter of the other to obtain a difference, and square each difference; the resulting sum of squares is the distance between the feature vectors of the two users.
  • Consistent with this description, the distance $d_{kj}$ between the feature vector of user $k$ and the feature vector of user $j$ can be calculated by the formula $d_{kj} = \sum_{i=1}^{n} (s_{ki} - s_{ji})^2$, where $n$ represents the dimension of the feature vector, which is the number of preset topics, and $s_{ki}$ and $s_{ji}$ represent the $i$-th parameter of the feature vector of user $k$ and of the feature vector of user $j$, respectively.
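  • The distance calculation of step S210 can be sketched as below, following the verbal description in the text (differences of corresponding parameters, squared and summed). The function names are hypothetical. A Euclidean distance would additionally take the square root, which does not change the ordering of users by distance.

```python
def squared_distance(vector_k, vector_j):
    """Sum of squared differences between corresponding parameters of two feature vectors."""
    if len(vector_k) != len(vector_j):
        raise ValueError("feature vectors must have the same dimension")
    return sum((a - b) ** 2 for a, b in zip(vector_k, vector_j))


def pairwise_distances(vectors):
    """Compute the distance once for every unordered pair of users."""
    users = sorted(vectors)
    return {
        (k, j): squared_distance(vectors[k], vectors[j])
        for index, k in enumerate(users)
        for j in users[index + 1:]
    }


distances = pairwise_distances({"a": [0, 0], "b": [3, 4], "c": [0, 1]})
```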
  • Step S212: after acquiring purchase information of a first user for a preset product, selecting a first number of second users having the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
  • The first user is a user who has purchased the preset product.
  • The second user is a user to whom product information of the preset product is to be recommended.
  • The preset product can be any product, including physical products and virtual products.
  • The physical products may include any type of product, such as snacks, digital products, and clothing.
  • The virtual products may include virtual game products and financial products.
  • The financial products may be any products such as insurance, funds, stocks, and the like.
  • The product information includes the product name of the corresponding product, as well as product recommendation information such as the purchase method and purchase price.
  • The computer device may obtain the purchase information of the first user. The purchase information may be sent to the computer device by the first user, or may be obtained from other devices that store the purchase information of the first user.
  • The purchase information includes the product identifier of the purchased product and a user identifier, so that the feature vector of the corresponding user can be read according to the user identifier and the product information of the corresponding product can be obtained according to the product identifier.
  • The user identifiers of the first number of users having the shortest distance from the first user may then be read according to the user identifier of the first user; the read user identifiers are the user identifiers of the second users.
  • The first number may be a preset fixed value, or a value determined according to the distances. For example, the number of other feature vectors whose distance from the feature vector of the first user is less than a preset distance may be counted, and this count is set as the first number.
  • After step S210, the method further includes: sorting the other users according to the calculated distance between each user's feature vector and the other users' feature vectors. For example, the users can be sorted by distance. Suppose there are M users; for user i, the other users sorted by their distance from user i in ascending order are: user 3, user 9, user 6, user 80, and so on.
  • The computer device may read, from the ranking relative to the first user, the user identifiers of the first number of users with the smallest distance from the first user; the read user identifiers are the user identifiers of the second users.
  • For example, when the first user is user i and the first number is 3, the user identifiers of user 3, user 9, and user 6 can be read, and the product information of the corresponding product is sent to the terminals of user 3, user 9, and user 6.
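  • The selection of second users can be sketched as follows. The feature vectors are made-up values arranged so that the ordering matches the example above (user 3, user 9, user 6, user 80), and the function name `nearest_users` is hypothetical. The sketch uses the sum of squared differences as the distance.

```python
def nearest_users(first_user, vectors, first_number):
    """Return the `first_number` users whose feature vectors are closest to the first user's."""
    target = vectors[first_user]

    def distance(user_id):
        return sum((a - b) ** 2 for a, b in zip(target, vectors[user_id]))

    others = [user_id for user_id in vectors if user_id != first_user]
    # Ascending sort by distance; the head of the list holds the most similar users.
    return sorted(others, key=distance)[:first_number]


# Hypothetical feature vectors over three topics.
vectors = {
    "i": [0.5, 0.5, 0.0],
    "u3": [0.5, 0.4, 0.1],
    "u9": [0.4, 0.4, 0.2],
    "u6": [0.2, 0.4, 0.4],
    "u80": [0.0, 0.0, 1.0],
}
second_users = nearest_users("i", vectors, 3)
```

  • The product information would then be sent to the terminals of the returned users; the delivery mechanism itself is outside the scope of this sketch.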
  • The above product information pushing method obtains behavior data generated by each user in a preset time period and determines the preset topic to which each behavior data belongs, so as to calculate the number of occurrences of each user's behavior data on each preset topic. A feature vector corresponding to the heat of the user for each preset topic is generated according to the number of occurrences, and the distance between users is calculated from the generated feature vectors, so that the distance represents the degree of similarity between users. After purchase information of a first user for a preset product is acquired, a first number of second users having the shortest distance from the first user are selected, and product information of the product is sent to the terminals of the second users, thereby improving the accuracy of pushing product information to users.
  • Before step S212, the method further includes: acquiring personal information of each user, and calculating the similarity between users based on the personal information. Selecting the first number of second users having the shortest distance from the first user then includes: selecting, from the users whose similarity with the first user is greater than a similarity threshold, a first number of second users having the shortest distance from the first user.
  • The personal information includes the user's gender, age, occupation, hobbies, and the like.
  • The computer device may store the personal information of each user in a local memory in advance, or may send a request for the users' personal information to other devices and obtain the personal information from them.
  • The computer device can calculate the similarity between users based on each field in the personal information. The similarity reflects how alike the users are; it can be understood that users with the same or similar fields have a high similarity.
  • The similarity between users may be calculated according to any one or more of the gender, age, occupation, hobbies, and the like in the personal information, and users may be divided into user groups according to the similarity: users whose similarity exceeds a preset similarity threshold are placed in the same user group, so that the calculation of the distance between feature vectors and/or the selection of the second users is performed within the same user group as the first user.
  • The computer device may select, from among the users whose similarity is greater than the similarity threshold, a first number of second users having the shortest distance from the first user; that is, the first number of second users having the shortest distance from the first user can be selected within the same user group.
  • A preset number of categories of user groups may also be divided in advance, with the specific field characteristics required for each category of user group set, and it is determined whether the similarity between the fields in each user's personal information and each divided user group is greater than the similarity threshold.
  • For example, a corresponding number of user groups may be set according to the age group, the occupation category, the hobby category, and the gender, and the degree of matching between the age, occupation, hobbies, and gender in each user's personal information and each set user group is calculated.
  • The similarity between users is calculated according to their personal information, and the first number of second users having the shortest distance from the first user are selected from the users whose similarity with the first user is greater than the similarity threshold, so that the second users selected to receive the product information are chosen more accurately.
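  • A simple way to pre-filter candidates by personal information is sketched below. The application does not define how the similarity is computed, so the fraction of matching fields used here is an assumption, as are the field names and the function names `profile_similarity` and `similar_candidates`.

```python
FIELDS = ("gender", "age_group", "occupation", "hobby")


def profile_similarity(profile_a, profile_b, fields=FIELDS):
    """Assumed similarity measure: fraction of personal-information fields that match."""
    matches = sum(1 for field in fields if profile_a[field] == profile_b[field])
    return matches / len(fields)


def similar_candidates(first_user, profiles, threshold):
    """Users whose similarity with the first user is greater than the threshold."""
    base = profiles[first_user]
    return [
        user_id
        for user_id, profile in profiles.items()
        if user_id != first_user and profile_similarity(base, profile) > threshold
    ]


# Hypothetical personal information.
profiles = {
    "a": {"gender": "F", "age_group": "30s", "occupation": "teacher", "hobby": "reading"},
    "b": {"gender": "F", "age_group": "30s", "occupation": "nurse", "hobby": "reading"},
    "c": {"gender": "M", "age_group": "50s", "occupation": "driver", "hobby": "fishing"},
}
candidates = similar_candidates("a", profiles, threshold=0.5)
```

  • The nearest-distance selection would then be applied only to the returned candidates rather than to all users.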
  • Determining the preset topic to which each behavior data belongs includes:
  • Step S302: extracting keyword phrases from each piece of behavior data.
  • The computer device may perform word segmentation on the content of each piece of behavior data, filter the segmented words, and set the selected words as the keyword phrases of the behavior data.
  • The computer device can establish a semantic database in advance, which contains a large number of words and phrases (i.e., words and sentences).
  • Each sentence in the behavior data is divided into a corresponding number of words according to the rules of the corresponding syntax tree, in combination with the words and phrases recorded in the semantic database.
  • The part of speech of each word, its role in the sentence, and its position in the behavior data are then determined; for example, a word may be determined to be a noun and the subject of its sentence.
  • The position of a phrase in the behavior data includes the title, the body of the behavior data, the topic in the chapter, the author of the behavior data, the time of publication, and so on.
  • The words can be filtered according to the part of speech of the segmented words, in order to delete words that have little or no meaning for determining the topic of the behavior data.
  • For example, words such as stop words or auxiliary words can be deleted.
  • Stop words are, for example, "the", "is", "at", "that", "yes", and the like.
  • Auxiliary words are, for example, "also", "person", "like", and the like.
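  • The filtering part of step S302 can be illustrated with a small sketch. It assumes the text has already been segmented into words, uses only a few of the stop words listed above, and the name `extract_keywords` is hypothetical.

```python
# A small illustrative stop-word set; a real deployment would use a fuller list.
STOP_WORDS = {"the", "is", "at", "that"}


def extract_keywords(words, stop_words=STOP_WORDS):
    """Drop stop words from an already-segmented word list, ignoring case."""
    return [word for word in words if word.lower() not in stop_words]


keywords = extract_keywords(["The", "insurance", "premium", "is", "high"])
```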
  • Step S304: calculating a first degree of association between each keyword phrase and each preset topic.
  • The computer device may preset a plurality of topics. After acquiring a keyword phrase, it may first check whether the degree of association between that keyword phrase, or a phrase close to it, and each topic has already been calculated. If so, it does not need to be calculated again: the existing degree of association between the keyword phrase or the similar phrase and each topic can be used directly as the first degree of association between the keyword phrase and each preset topic. If not, each keyword phrase can be semantically parsed to obtain its first degree of association with each preset topic. The first degree of association may be expressed as a percentage, with a value between 0 and 1.
  • For example, the preset topics include shopping, sales, house price, insurance, finance, love, games, health, and the like. Then, for each keyword phrase obtained, its first degree of association with each of these topics may be calculated separately. Further, the first degrees of association between a keyword phrase and all the preset topics may form a corresponding first association degree vector. It can be understood that the first association degree vector has the same dimension as the feature vector described above, and each of its elements represents the degree of association between the keyword phrase and one of the preset topics.
  • The computer device may be pre-configured with a semantic analysis model, and may calculate the degree of association between each obtained keyword phrase and each topic according to that model.
  • The semantic analysis model may include a model built on Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), or Word2vec.
  • Step S306: calculating a second degree of association between the corresponding behavior data and each preset topic according to the first degrees of association.
  • For each topic, the computer device can compute a weighted sum of the first degrees of association of all the keyword phrases, which gives the second degree of association of the behavior data with that topic.
  • Equivalently, the elements in the same dimension of the first association degree vectors may be weighted and summed, and a second association degree vector may be generated from the weighted sums. Each element of the second association degree vector is the second degree of association between the behavior data and the corresponding topic.
  • The weight of each keyword phrase may be determined according to one or more factors, such as the detected position of the phrase in the behavior data, its part of speech, and the number of keyword phrases in the behavior data. For example, when the keyword phrase appears in the title of the behavior data, a relatively large weight may be set; when the behavior data contains a large number of keyword phrases, the weight set for each keyword phrase is relatively small.
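  • The weighted summation of step S306 can be sketched as follows. The weights and association values are made up for illustration, and the function name `second_association_vector` is hypothetical; how the weights are actually derived from position, part of speech, and phrase count is not specified in detail.

```python
def second_association_vector(first_vectors, weights):
    """Weighted sum, dimension by dimension, of the first association degree vectors."""
    if len(first_vectors) != len(weights):
        raise ValueError("one weight is required per keyword phrase")
    dimension = len(first_vectors[0])
    return [
        sum(weight * vector[d] for vector, weight in zip(first_vectors, weights))
        for d in range(dimension)
    ]


# Two keyword phrases scored against two topics (hypothetical values).
first_vectors = [[1.0, 0.0], [0.5, 0.5]]
weights = [0.75, 0.25]  # assumed: the first phrase appears in the title, so it weighs more
second = second_association_vector(first_vectors, weights)
```

  • Choosing weights that sum to 1 keeps each second degree of association within the same 0 to 1 range as the first degrees of association.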
  • Step S308: determining each preset topic whose second degree of association exceeds an association degree threshold as a preset topic to which the behavior data belongs.
  • A corresponding association degree threshold is also set in the computer device; the threshold may be a custom value.
  • The computer device may compare each second degree of association with the association degree threshold, extract the topics whose second degree of association exceeds the threshold, and associate them with the corresponding behavior data, so that they are set as the topics to which the behavior data belongs. For example, when the association degree threshold is 0.5, all topics whose second degree of association is greater than 0.5 may be associated with the behavior data and set as the topics to which the behavior data belongs.
  • Determining the topics to which the behavior data belongs from all of the first degrees of association can further improve the accuracy of topic attribution for the behavior data.
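  • The thresholding of step S308 can be sketched in a few lines. The second degrees of association below are invented values, and the function name `topics_for_behavior` is hypothetical; the threshold of 0.5 follows the example in the text.

```python
def topics_for_behavior(second_degrees, threshold=0.5):
    """Return the topics whose second degree of association exceeds the threshold."""
    return [topic for topic, degree in second_degrees.items() if degree > threshold]


assigned = topics_for_behavior({"insurance": 0.82, "finance": 0.55, "games": 0.10})
unassigned = topics_for_behavior({"insurance": 0.20, "games": 0.10})
```

  • The second call returns an empty list, which corresponds to behavior data that belongs to none of the preset topics.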
  • Generating a feature vector corresponding to the heat of the user for each preset topic according to the number of occurrences includes: calculating the feature vector of each user according to the number of occurrences of each user's behavior data on the preset topics to which it belongs and the second degrees of association with those topics.
  • For each topic, the computer device can count the number of items in each user's behavior data that belong to the topic; this count is the number of occurrences mentioned above. A corresponding weight is set in combination with the second degrees of association of the behavior data belonging to the topic, and the parameter value corresponding to the topic in the feature vector is calculated from the number of occurrences and the weight.
  • The parameter value can be the product of the weight and the number of occurrences.
  • The weight may be an average value calculated from the second degrees of association.
  • For example, if 10 pieces of a user's behavior data belong to topic i, the computer device may extract the calculated second degrees of association of those 10 pieces of behavior data with topic i, calculate the corresponding weight from the 10 extracted second degrees of association, and set the product of the weight and the number of occurrences as the value of the parameter $s_i$ in the user's feature vector, which reflects the user's heat for topic i.
  • The same method can be applied to the other topics, so that the feature vector of each user can be calculated.
  • Calculating the feature vector of a user by further combining the calculated second degrees of association makes the feature vector better able to express the user's heat for each topic, so that the second users determined from the feature vectors are more accurate.
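  • The weighted parameter value described above can be sketched for a single topic as follows, taking the weight to be the average second degree of association, which is one of the options the text mentions. The function name `weighted_heat` and the sample values are hypothetical.

```python
def weighted_heat(second_degrees):
    """Parameter value for one topic: (mean second degree of association) x (occurrence count)."""
    count = len(second_degrees)  # number of behavior-data items belonging to the topic
    if count == 0:
        return 0.0
    weight = sum(second_degrees) / count
    return weight * count


# Hypothetical second degrees of association of three behavior-data items with one topic.
heat = weighted_heat([0.5, 0.75, 1.0])
```

  • With the average as the weight, the product equals the sum of the second degrees of association, so strongly related behavior data contributes more than weakly related behavior data.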
  • step S208 further comprising generating a second number of clusters according to the distance, and dividing each user into a corresponding one of the clusters.
  • the computer device further presets a second number of categories for dividing each user into a corresponding one of the categories, and the users divided into the same category constitute a corresponding cluster.
  • the second quantity may be an arbitrarily set value that is less than the total number of users. The larger the second quantity, the finer the classification of the users.
  • the computer device may set a corresponding center vector for each cluster in advance. The center vector represents the feature information shared by the seed users of a given category. It has the same form and the same number of dimensions as the feature vector, and the parameter in each dimension has the same meaning as the corresponding parameter of the feature vector.
  • the computer device can calculate the distance between each feature vector and each center vector, and assign the feature vector to the cluster with the smallest corresponding distance, thereby completing the division of users into categories. Users whose feature vectors are assigned to the same cluster are in the same cluster.
  • the step of selecting the first number of second users that are the shortest distance from the first user comprises: selecting a first number of second users that are the shortest distance from the first user from the users in the same cluster as the first user.
  • after acquiring the first user's purchase information for the preset product, the computer device determines the cluster into which the first user is divided, and reads from that cluster the user IDs of the first number of users whose feature vectors are the shortest distance from the feature vector of the first user. By selecting from among the users in the same cluster, the accuracy of the user selection can be further improved.
  • the center vector of each cluster may be calculated from the feature vector of each user according to a preset clustering algorithm.
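Selection within a cluster can be sketched as below. This is an illustrative sketch only: the function name `nearest_in_cluster`, the dictionaries `vectors` and `cluster_of`, and the use of the sum of squared differences as the distance are assumptions made for the example.

```python
def nearest_in_cluster(first_user, vectors, cluster_of, first_number):
    """Return the IDs of the `first_number` users closest to `first_user`,
    considering only users assigned to the same cluster.

    vectors:    user ID -> feature vector (list of numbers)
    cluster_of: user ID -> cluster index
    """
    target = vectors[first_user]

    def squared_distance(user):
        return sum((a - b) ** 2 for a, b in zip(vectors[user], target))

    # Candidates are the other users that share the first user's cluster.
    candidates = [
        user for user in vectors
        if user != first_user and cluster_of[user] == cluster_of[first_user]
    ]
    return sorted(candidates, key=squared_distance)[:first_number]
```

Restricting the candidates first means the ranking only has to cover one cluster instead of every user.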
  • the clustering algorithm may be any clustering algorithm such as K-means, K-medoids or Clara. As shown in Figure 4, the calculation process of the center vector includes:
  • Step S402 determining an initial center vector of each cluster to be formed, and setting an initial center vector to a current center vector corresponding to the cluster to be formed.
  • the initial center vector may be a preset feature vector.
  • the initial center vector may be a feature vector set according to historical empirical values formed by the features of each cluster to reduce the amount of computation of the computer device.
  • a second quantity of feature vectors may be selected from each user's feature vector, and each selected feature vector is set to an initial center vector corresponding to one cluster, so that each cluster has a center vector.
  • the initial center vector will be determined as the current center vector for performing an iterative calculation of the following steps to determine the final cluster and the center vector of each determined final cluster.
  • Step S404 calculating a distance between each feature vector and each current center vector, and dividing the user of each feature vector into clusters corresponding to the minimum distance to generate a current cluster.
  • the distance between each feature vector and each current center vector may be calculated separately, and the users divided according to the calculated distances, thereby forming the second number of clusters.
  • for each feature vector, the distance from each current center vector may be calculated, the minimum among the second number of distances determined, and the user of that feature vector assigned to the cluster corresponding to the minimum distance. This process is repeated for every user's feature vector until all users have been assigned, which completes one round of cluster formation.
  • Step S406 calculating a new central vector of each cluster in the current cluster.
  • for the feature vectors of the users in each cluster, the center vector of the cluster is recalculated, and the newly calculated center vector replaces the previously calculated one so that the iteration can proceed.
  • the center vector of each cluster can be calculated according to a preset center vector calculation model. For example, for the feature vectors in a cluster, the parameters at the same position in each feature vector are averaged (or processed similarly) to give the value of that parameter in the cluster's new center vector.
  • Step S408: detecting whether each new center vector converges; if not, setting each new center vector as the current center vector and continuing to calculate the distance between each feature vector and each current center vector, until every new center vector converges.
  • when every new center vector has converged, the current clusters are set as the final clusters, and each new center vector is set as the center vector of its corresponding final cluster.
  • if not all center vectors have converged, the currently calculated center of each cluster may be used as the current center vector of the cluster to be formed, and the process returns to step S404; steps S404 to S408 are repeated to form new clusters and new center vectors until the center vector of every newly formed cluster is determined to have converged.
  • convergence may be judged from the difference between each new center vector and the center vector it replaces; the difference may be a distance value.
  • a distance threshold is preset in the computer device and used as the criterion for judging whether a center vector has converged. The calculated difference is compared with the distance threshold, and when it is less than the threshold, the corresponding center vector is determined to have converged. When one or more center vectors have not converged, the process may return to step S404, with the latest center vectors used as the current center vectors of the clusters to be formed, for re-clustering. If every difference is less than the distance threshold, the newly calculated center vectors are determined to have converged.
  • once every new center vector has converged, the iteration may be terminated: the current clusters are set as the final clusters, and each new center vector is set as the center vector of its corresponding final cluster. This completes the clustering of all users and the calculation of the center vector of each cluster.
  • the accuracy of the cluster formation can be improved, the accuracy of the calculated cluster center can be improved, and the accuracy of the second user selection can be improved.
  • a corresponding cluster is formed for each user group according to the foregoing clustering process, so as to further improve the accuracy of the user division and of the second user selection.
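The iteration in steps S402 to S408 can be sketched as a K-means style loop. This is a minimal sketch under stated assumptions, not the patented implementation: the names `cluster_users` and `squared_distance`, the default `threshold` and `max_iter` values, and the choice to keep the old center when a cluster ends up empty are all illustrative.

```python
def squared_distance(a, b):
    """Sum of squared differences between two vectors of equal length."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_users(vectors, initial_centers, threshold=1e-9, max_iter=100):
    """Assign each feature vector to a cluster and return (assignment, centers)."""
    # S402: the initial center vectors become the current center vectors.
    centers = [list(center) for center in initial_centers]
    assignment = []
    for _ in range(max_iter):
        # S404: assign each feature vector to the cluster with the nearest center.
        assignment = [
            min(range(len(centers)), key=lambda c: squared_distance(v, centers[c]))
            for v in vectors
        ]
        # S406: recompute each center as the per-dimension mean of its members.
        new_centers = []
        for index, old_center in enumerate(centers):
            members = [v for v, a in zip(vectors, assignment) if a == index]
            if members:
                new_centers.append([sum(col) / len(members) for col in zip(*members)])
            else:
                new_centers.append(old_center)  # assumption: empty cluster keeps its center
        # S408: converged when every center moved less than the distance threshold.
        converged = all(
            squared_distance(new, old) < threshold
            for new, old in zip(new_centers, centers)
        )
        centers = new_centers
        if converged:
            break
    return assignment, centers
```

The `max_iter` bound is a safety net that the text does not mention; it only prevents an endless loop if the threshold is never met.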
  • a product information pushing device comprising:
  • the behavior data acquisition module 502 is configured to acquire behavior data generated by each user within a preset time period.
  • the topic determining module 504 is configured to determine the preset topic to which each piece of behavior data belongs, and to associate each piece of behavior data with the preset topic to which it belongs.
  • the number calculation module 506 is configured to calculate, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user in the preset time period.
  • the feature vector generation module 508 is configured to generate, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic.
  • the distance calculation module 510 is configured to calculate the distance between the feature vectors of each user.
  • the information pushing module 512 is configured to, after acquiring the first user's purchase information for the preset product, select a first number of second users at the shortest distance from the first user, and send the product information of the product to the terminals of the second users.
  • the apparatus further includes:
  • the user division module 514 is configured to acquire personal information of each user; and calculate similarity between each user according to the personal information.
  • the information pushing module 512 is further configured to select a first number of second users that are the shortest distance from the first user from the users whose similarity with the first user is greater than the similarity threshold.
  • the topic determining module 504 includes:
  • a keyword sentence extracting unit 702, configured to extract the keyword sentences in each piece of behavior data;
  • an association degree calculation unit 704, configured to calculate a first degree of association between each keyword sentence and each preset topic, and to calculate, according to the first degrees of association, a second degree of association between the corresponding behavior data and each preset topic;
  • a topic determining unit 706, configured to determine each preset topic whose second degree of association exceeds the association threshold as a preset topic to which the behavior data belongs.
  • the feature vector generation module 508 is further configured to calculate the feature vector of each user according to the number of occurrences of each user's behavior data on the preset topics to which it belongs and the second degree of association with those preset topics.
  • the foregoing apparatus further includes:
  • the clustering module 516 is configured to generate a second number of clusters according to the distance, and divide each user into a corresponding one of the clusters.
  • the information pushing module 512 is further configured to select a first number of second users that are the shortest distance from the first user from among users who are in the same cluster as the first user.
  • the clustering module 516 is further configured to: determine an initial center vector of each cluster to be formed, and set the initial center vector as the current center vector of the corresponding cluster to be formed; calculate the distance between each feature vector and each current center vector, and divide the user of each feature vector into the cluster corresponding to the minimum distance, generating the current clusters; calculate a new center vector for each of the current clusters; detect whether each new center vector converges and, if not, set each new center vector as the current center vector and continue to calculate the distance between each feature vector and each current center vector until every new center vector converges; and, when every new center vector has converged, set the current clusters as the final clusters and set each new center vector as the center vector of its corresponding final cluster.
  • the network interface may be an Ethernet card or a wireless network card.
  • Each of the above modules and units may be embedded in or independent of the processor in the computer device, or may be stored in a memory in the computer device in a software form, so that the processor invokes the operations corresponding to the above modules.
  • the processor can be a central processing unit (CPU), a microprocessor, a microcontroller, or the like.
  • the product information push device provided by the present application can be implemented in the form of a computer program that can run on a computer device as shown in FIG. 9. The non-volatile storage medium of the computer device can store the program modules that constitute the product information push device.
  • for example, these may be the behavior data acquisition module 502, the topic determination module 504, the number calculation module 506, the feature vector generation module 508, the distance calculation module 510, and the information push module 512 shown in FIG.
  • Each program module includes computer readable instructions for causing the computer device to perform steps in a data processing method of various embodiments of the present application described in this specification.
  • the computer device may acquire the behavior data generated by each user within the preset time period through the behavior data acquiring module 502 in the product information pushing device shown in FIG.; determine, through the topic determination module 504, the preset topic to which each piece of behavior data belongs and associate each piece of behavior data with that preset topic; calculate, through the number calculation module 506 and according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user in the preset time period; generate, through the feature vector generation module 508 and according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic; calculate the distance between the feature vectors of each user through the distance calculation module 510; and, through the information pushing module 512, select the first number of second users at the shortest distance from the first user and send the product information of the product to the terminals of the second users.
  • one or more non-transitory computer-readable storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the steps of the product information pushing method provided by the above embodiments.
  • the computer-readable instructions, when executed by one or more processors, cause the processor to perform the steps of: acquiring behavior data generated by each user within a preset time period; determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user in the preset time period; generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic; calculating the distance between the feature vectors of each user; and, after acquiring the purchase information generated by a first user for a preset product, selecting a first number of second users at the shortest distance from the first user and sending the product information of the product to the terminals of the second users.
  • the computer-readable instructions, when executed by one or more processors, further cause the processor, before selecting the first number of second users at the shortest distance from the first user, to perform the steps of: acquiring personal information of each user; and calculating the similarity between users according to the personal information. Selecting the first number of second users at the shortest distance from the first user then comprises: selecting, from among the users whose similarity with the first user is greater than the similarity threshold, the first number of second users at the shortest distance from the first user.
  • determining the preset topic to which each piece of behavior data belongs, as performed by the processor, comprises: extracting the keyword sentences in each piece of behavior data; calculating a first degree of association between each keyword sentence and each preset topic; calculating, according to the first degrees of association, a second degree of association between the corresponding behavior data and each preset topic; and determining each preset topic whose second degree of association exceeds the association threshold as a preset topic to which the behavior data belongs.
  • generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic, as performed by the processor, comprises: calculating the feature vector of each user according to the number of occurrences of each user's behavior data on the preset topics to which it belongs and the second degree of association with those preset topics.
  • the computer-readable instructions, when executed by one or more processors, further cause the processor, after calculating the distance between the feature vectors of each user, to perform the step of: generating a second number of clusters according to the distances, and dividing each user into a corresponding one of the clusters. Selecting the first number of second users at the shortest distance from the first user then comprises: selecting, from among the users in the same cluster as the first user, the first number of second users at the shortest distance from the first user.
  • the computer-readable instructions, when executed by one or more processors, further cause the processor to perform the steps of: determining an initial center vector of each cluster to be formed, and setting the initial center vector as the current center vector of the corresponding cluster to be formed; calculating the distance between each feature vector and each current center vector, and dividing the user of each feature vector into the cluster corresponding to the minimum distance, generating the current clusters; calculating a new center vector for each of the current clusters; detecting whether each new center vector converges and, if not, setting each new center vector as the current center vector and continuing to calculate the distance between each feature vector and each current center vector until every new center vector converges; and, when every new center vector has converged, setting the current clusters as the final clusters and setting each new center vector as the center vector of its corresponding final cluster.
  • a computer device comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the steps of the product information pushing method provided by the above embodiments.
  • the computer device can be a terminal or a server. As shown in FIG. 9, an internal structure diagram of the computing device is provided.
  • the computer device includes a processor, memory, and network interface coupled by a system bus.
  • the processor is used to provide computing and control capabilities to support the operation of the entire computer device.
  • the memory is used to store data, instruction code, and the like. At least one computer readable instruction is stored on the memory, the computer readable instructions being executable by the processor to implement the product information push method applicable to the computer device provided in the embodiments of the present application.
  • the memory may include a non-volatile storage medium such as a magnetic disk, an optical disk, or a read-only memory (ROM).
  • the memory includes a non-volatile storage medium and an internal memory.
  • the non-volatile storage medium stores an operating system, a database, and computer readable instructions.
  • the database stores data related to a product information pushing method provided by the above various embodiments, for example, behavior data of a user may be stored.
  • the computer readable instructions are executable by a processor for implementing a product information push method provided by the various embodiments above.
  • the internal memory provides a cached operating environment for operating systems, databases, and computer readable instructions in a non-volatile storage medium.
  • the network interface may be an Ethernet card or a wireless network card, etc., for communicating with an external terminal or a computer device, such as sending the generated comparison result to a preset test terminal.
  • when the computer device is a server, it can be implemented as a standalone server or as a server cluster composed of multiple servers.
  • FIG. 9 is only a block diagram of the part of the structure related to the solution of the present application, and does not limit the computer device to which the solution is applied. A specific computer device may include more or fewer components than those shown in the figure, combine some components, or have a different arrangement of components.
  • when executed by the processor, the instructions cause the processor to perform the steps of: acquiring behavior data generated by each user within a preset time period; determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user in the preset time period; generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic; calculating the distance between the feature vectors of each user; and, after acquiring the first user's purchase information for the preset product, selecting the first number of second users at the shortest distance from the first user and sending the product information of the product to the terminals of the second users.
  • when executed by the processor, the instructions further cause the processor, before selecting the first number of second users at the shortest distance from the first user, to perform the steps of: acquiring personal information of each user; and calculating the similarity between users according to the personal information. Selecting the first number of second users at the shortest distance from the first user then comprises: selecting, from among the users whose similarity with the first user is greater than a similarity threshold, the first number of second users at the shortest distance from the first user.
  • determining the preset topic to which each piece of behavior data belongs, as performed by the processor, comprises: extracting the keyword sentences in each piece of behavior data; calculating a first degree of association between each keyword sentence and each preset topic; calculating, according to the first degrees of association, a second degree of association between the corresponding behavior data and each preset topic; and determining each preset topic whose second degree of association exceeds the association threshold as a preset topic to which the behavior data belongs.
  • generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic, as performed by the processor, comprises: calculating the feature vector of each user according to the number of occurrences of each user's behavior data on the preset topics to which it belongs and the second degree of association with those preset topics.
  • when executed by the processor, the instructions further cause the processor, after calculating the distance between the feature vectors of each user, to perform the step of: generating a second number of clusters according to the distances, and dividing each user into a corresponding one of the clusters. Selecting the first number of second users at the shortest distance from the first user then comprises: selecting, from among the users in the same cluster as the first user, the first number of second users at the shortest distance from the first user.
  • when executed by the processor, the instructions further cause the processor to perform the steps of: determining an initial center vector of each cluster to be formed, and setting the initial center vector as the current center vector of the corresponding cluster to be formed; calculating the distance between each feature vector and each current center vector, and dividing the user of each feature vector into the cluster corresponding to the minimum distance, generating the current clusters; calculating a new center vector for each of the current clusters; detecting whether each new center vector converges and, if not, setting each new center vector as the current center vector and continuing to calculate the distance between each feature vector and each current center vector until every new center vector converges; and, when every new center vector has converged, setting the current clusters as the final clusters and setting each new center vector as the center vector of its corresponding final cluster.
  • the readable storage medium which when executed, may include the flow of an embodiment of the methods as described above.
  • the storage medium may be a magnetic disk, an optical disk, a read-only memory (ROM), or the like.

Landscapes

  • Business, Economics & Management (AREA)
  • Accounting & Taxation (AREA)
  • Finance (AREA)
  • Development Economics (AREA)
  • Economics (AREA)
  • Marketing (AREA)
  • Strategic Management (AREA)
  • Physics & Mathematics (AREA)
  • General Business, Economics & Management (AREA)
  • General Physics & Mathematics (AREA)
  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

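A product information pushing method, comprising: acquiring behavior data generated by each user within a preset time period (S202); determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs (S204); calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period (S206); generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic (S208); calculating the distance between the feature vectors of each user (S210); and, after acquiring purchase information generated by a first user for a preset product, selecting a first number of second users at the shortest distance from the first user, and sending product information of the product to the terminals of the second users (S212).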

Description

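Product information pushing method, apparatus, storage medium, and computer device
This application claims priority to Chinese patent application No. 2017106262800, filed with the Chinese Patent Office on July 27, 2017 and entitled "Product information pushing method, apparatus, storage medium, and computer device", the entire contents of which are incorporated herein by reference.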
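Technical Field
The present application relates to the field of information processing technologies, and in particular to a product information pushing method, apparatus, storage medium, and computer device.
Background
With the development of big data applications, instant messaging and social application platforms store a large amount of user information, such as personal information (the user's age, hobbies, and occupation) and behavior information (instant messages and articles sent by the user). This user information is an important reference for the precise pushing of information such as product advertisements.
In conventional technical solutions, the large pool of acquired users is usually put through a simple screening: for example, users who have browsed information identical or similar to the product information that a service provider intends to push are selected as target users, and the provider's product push information is delivered to them. In practice, however, for many products, such as insurance, funds, and other financial products, users without a strong desire to buy rarely browse information about these or similar products, and users who have browsed such information do not necessarily intend to buy. Target users determined by this simple screening alone are therefore not precise enough.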
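Summary
According to various embodiments disclosed in the present application, a product information pushing method, apparatus, storage medium, and computer device are provided.
A product information pushing method, comprising: acquiring behavior data generated by each user within a preset time period; determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period; generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic; calculating the distance between the feature vectors of each user; and, after acquiring purchase information generated by a first user for a preset product, selecting a first number of second users at the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
A product information pushing apparatus, comprising: a behavior data acquisition module, configured to acquire behavior data generated by each user within a preset time period; a topic determining module, configured to determine the preset topic to which each piece of behavior data belongs and establish an association relationship between each piece of behavior data and the preset topic to which it belongs; a number calculation module, configured to calculate, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period; a feature vector generation module, configured to generate, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic; a distance calculation module, configured to calculate the distance between the feature vectors of each user; and an information pushing module, configured to, after acquiring purchase information generated by a first user for a preset product, select a first number of second users at the shortest distance from the first user and send product information of the product to the terminals of the second users.
One or more non-volatile storage media storing computer-readable instructions that, when executed by one or more processors, cause the one or more processors to perform the following steps: acquiring behavior data generated by each user within a preset time period; determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period; generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic; calculating the distance between the feature vectors of each user; and, after acquiring purchase information generated by a first user for a preset product, selecting a first number of second users at the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
A computer device, comprising a memory and a processor, the memory storing computer-readable instructions that, when executed by the processor, cause the processor to perform the following steps: acquiring behavior data generated by each user within a preset time period; determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period; generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic; calculating the distance between the feature vectors of each user; and, after acquiring purchase information generated by a first user for a preset product, selecting a first number of second users at the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
Details of one or more embodiments of the present application are set forth in the accompanying drawings and the description below. Other features, objects, and advantages of the present application will become apparent from the description, the drawings, and the claims.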
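Brief Description of the Drawings
To describe the technical solutions in the embodiments of the present application more clearly, the drawings needed for describing the embodiments are briefly introduced below. The drawings described below show only some embodiments of the present application, and a person of ordinary skill in the art may derive other drawings from them without creative effort.
FIG. 1 is a diagram of the application environment of a product information pushing method in one embodiment;
FIG. 2 is a flowchart of a product information pushing method in one embodiment;
FIG. 3 is a flowchart of determining the preset topic to which each piece of behavior data belongs in one embodiment;
FIG. 4 is a flowchart of calculating the center vector in one embodiment;
FIG. 5 is a structural block diagram of a product information pushing apparatus in one embodiment;
FIG. 6 is a structural block diagram of a product information pushing apparatus in another embodiment;
FIG. 7 is a structural block diagram of the topic determining module in one embodiment;
FIG. 8 is a structural block diagram of a product information pushing apparatus in yet another embodiment; and
FIG. 9 is a diagram of the internal structure of a computer device in one embodiment.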
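Detailed Description
To make the objectives, technical solutions, and advantages of the present application clearer, the application is described in further detail below with reference to the drawings and embodiments. It should be understood that the specific embodiments described here are intended only to explain the application and not to limit it. The terms "first", "second", and the like may be used herein to describe various elements, but the elements are not limited by these terms. The terms are used only to distinguish one element from another. For example, without departing from the scope of the present application, a first user may be called a second user, and similarly a second user may be called a first user. The first user and the second user are both users, but they are not the same user.
The product information pushing method provided by the embodiments of the present application may be applied in the application environment shown in FIG. 1. Referring to FIG. 1, a server 120 is connected to a plurality of user terminals 110-1 to 110-x through a network. Each user terminal 110 may generate behavior data through the server 120. The server 120 may receive and store the behavior data generated by each user terminal 110; determine the preset topic to which each piece of behavior data belongs and establish an association relationship between each piece of behavior data and the topic to which it belongs; calculate, according to the association relationship, the number of occurrences on each topic of the behavior data generated by each user within a preset time period; generate, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each topic; calculate the distance between the feature vectors of each user; and, after acquiring purchase information generated by a first user for a preset product, select a first number of second users at the shortest distance from the first user and send product information of the product to the terminals of the second users.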
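In one embodiment, as shown in FIG. 2, a product information pushing method is provided. The method may be applied to the server shown in FIG. 1, or to a computer device such as a terminal, and includes:
Step S202: acquiring behavior data generated by each user within a preset time period.
In this embodiment, behavior data is information published by a user through an instant messaging application, a social application, or the like. It includes data in forms such as chat messages, articles, or logs sent by the user terminal through such applications. When the behavior data takes the form of text such as articles or logs, each article or log may be treated as one piece of behavior data. When the behavior data takes the form of individual messages, such as text or voice chat messages sent by the user, all messages generated within a suitable period (for example each hour, day, or week) may be aggregated into one piece of behavior data. Each piece of behavior data may also carry the user identifier of the user who generated it, or an association relationship may be established between the behavior data and the user identifier, so that the user to whom each piece of behavior data belongs can be determined from the carried user identifier or the association relationship. The behavior data generated by each user within the preset time period is then separated and aggregated according to the determined user.
The preset time period may be any configured period, for example the most recent one or two years. The behavior data may be stored in advance in the local memory of the computer device, or stored on another device. When it is stored locally, the computer device may read each user's behavior data from the local memory; when it is stored on another device, the computer device may send a behavior data acquisition request to that device and receive the behavior data returned in response to the request.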
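Step S204: determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs.
In this embodiment, the computer device is also preconfigured with a number of topics, which may include any of shopping, sales, housing prices, insurance, finance, romance, games, health, and so on. The server may perform semantic parsing on each piece of behavior data and determine the topic to which it belongs from the parsed semantics. Specifically, each piece of behavior data may be divided into words and sentences, each of which is semantically parsed, and the topic to which the behavior data belongs is derived from the parsed semantics. The computer device is also preconfigured with a topic calculation model; for example, the topic to which each piece of behavior data belongs may be calculated according to a preset LDA (Latent Dirichlet Allocation) topic model, and an association relationship established between the behavior data and that topic. One piece of behavior data may belong to one or several topics, or to none of the preset topics.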
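Step S206: calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period.
In this embodiment, the computer device may use the established association relationships between behavior data and topics to count, for each topic and each user, how many of the user's pieces of behavior data are associated with that topic. This count is the number of occurrences on the topic of the behavior data generated by the user within the preset time period. The number of occurrences reflects the user's degree of interest in each topic and can therefore be used to reflect the user's characteristics. The more behavior data there is, the more accurately the calculated number of occurrences reflects those characteristics.
In one embodiment, the behavior data may also be screened, and the preset topic determined only for each piece of behavior data that passes the screening.
Specifically, whether to keep or discard a piece of behavior data may be decided from factors such as its text content and the amount of text. For example, when the amount of text in a piece of behavior data is below a certain number, and/or the information it expresses is meaningless with respect to every preset topic, the behavior data may be discarded. The topic is then determined only for the retained behavior data. Discarded behavior data need not be associated with any topic, so it does not have to be parsed, which reduces the resource consumption of the computer device.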
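Step S208: generating, according to the number of occurrences, a feature vector reflecting the corresponding user's heat for each preset topic.
In this embodiment, the computer device may generate a feature vector from the number of occurrences on each topic, and establish an association relationship between the feature vector and the user identifier of the corresponding user. The dimension of the feature vector equals the number of topics, and each element reflects the user's heat for the corresponding topic. When there are n preset topics, topic 1 to topic n, the corresponding feature vector S may be written as [s1, s2, s3 ... sn]i. The subscript i indicates that the vector is the feature vector of user i, and s1 to sn reflect the discussion heat of topic 1 to topic n in the behavior data generated by user i.
In one embodiment, the value of an element of the feature vector may be the number or frequency of discussions of the topic. For example, the number of occurrences of each topic may be used directly as the value of the corresponding element, or the count may be processed and the processed value used instead. For instance, each number of occurrences may be normalized and the normalized value used as the element value. Normalization reduces the effect of large differences in the amount of behavior data between users, which would otherwise make the distances calculated later differ too widely.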
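Steps S206 and S208 can be sketched together as below. This is an illustrative sketch only: the function names `count_occurrences` and `heat_vector` and the input layout (one set of topic indices per piece of behavior data) are assumptions, and dividing by the total count is just one possible normalization, since the text does not specify which one is used.

```python
def count_occurrences(topic_sets, num_topics):
    """Count, per topic, how many pieces of behavior data are associated with it.

    topic_sets holds one set of topic indices per piece of behavior data;
    a piece may belong to several topics or to none.
    """
    counts = [0] * num_topics
    for topics in topic_sets:
        for topic in topics:
            counts[topic] += 1
    return counts


def heat_vector(counts, normalize=True):
    """Turn occurrence counts into a feature vector of per-topic heat."""
    total = sum(counts)
    if not normalize or total == 0:
        return [float(c) for c in counts]
    # Assumed normalization: each element is the topic's share of all occurrences.
    return [c / total for c in counts]


counts = count_occurrences([{0, 1}, {0}, set(), {2}], num_topics=3)
vector = heat_vector(counts)
```

Normalizing by the total keeps a very active user and a quiet user comparable when their interests are spread across topics in the same proportions.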
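Step S210: calculating the distance between the feature vectors of each user.
In this embodiment, the computer device may calculate the distance between each user's feature vector and the feature vectors of the other users using a vector distance measure. The distance reflects the similarity between the two users: the smaller the distance, the greater the similarity. The computer device may take the feature vectors of two users whose distance has not yet been calculated, subtract each parameter of one vector from the corresponding parameter of the other to obtain the differences, and then sum the squares of the differences. The resulting sum of squares is the distance between the feature vectors of the two users.
Specifically, the distance d_kj between the feature vector of user k and the feature vector of user j may be calculated by the formula (Figure PCTCN2017103989-appb-000001), which, following the sum of squared differences described above, can be written as
$$d_{kj} = \sum_{i=1}^{n} \left(s_{ki} - s_{ji}\right)^{2}$$
Here, n denotes the dimension of the feature vector, which equals the number of preset topics, and s_ki and s_ji denote the i-th parameter of the feature vector of user k and of user j respectively. With this formula, the distance between the feature vectors of every pair of users can be calculated.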
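The pairwise distance calculation can be sketched as follows, assuming the sum-of-squared-differences distance described in the text. The function name `pairwise_distances` and the dictionary layout (user ID to feature vector) are illustrative assumptions.

```python
from itertools import combinations


def pairwise_distances(vectors):
    """Return {(k, j): distance} for every unordered pair of users.

    vectors maps a user ID to that user's feature vector. The distance is
    the sum of squared differences of corresponding parameters.
    """
    distances = {}
    for k, j in combinations(sorted(vectors), 2):
        distances[(k, j)] = sum(
            (a - b) ** 2 for a, b in zip(vectors[k], vectors[j])
        )
    return distances


distances = pairwise_distances({"u1": [1, 0], "u2": [0, 0], "u3": [1, 2]})
```

Each pair is computed once because the distance is symmetric, so the number of calculations grows with the number of user pairs rather than twice that.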
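Step S212: after acquiring purchase information generated by a first user for a preset product, selecting a first number of second users at the shortest distance from the first user, and sending product information of the product to the terminals of the second users.
In this embodiment, the first user is a user who has purchased the preset product. A second user is a user selected to receive the product information of the preset product. The preset product may be any product, physical or virtual. Physical products may include products of any type, such as snacks, digital devices, and clothing; virtual products may include virtual game products and financial products, and a financial product may be any of insurance, funds, stocks, and so on. The product information includes the product name together with recommendation information such as how to purchase it and its price.
The computer device may acquire the first user's purchase information, which may be purchase information sent by the first user to the computer device, or purchase information acquired from another device that stores it. The purchase information includes the product identifier of the purchased product and the user identifier, so that the feature vector of the corresponding user can be read from the user identifier and the product information of the corresponding product obtained from the product identifier.
After reading the user identifier of the first user, the computer device may read the user identifiers of the first number of users at the shortest distance from the first user. These are the user identifiers of the second users, and the product information is sent to the terminals of the second users. The first number may be a preset fixed value, or a value determined from the distances. For example, the number of other feature vectors whose distance from the first user's feature vector is less than a preset distance may be counted, and that number used as the first number.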
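In one embodiment, before step S212, the method further includes: sorting the other users according to the calculated distance between each user's feature vector and the feature vectors of the other users. For example, they may be sorted in order of distance. Suppose there are M users; for user i, the other users sorted from the smallest distance to the largest might be: user 3, user 9, user 6, user 80, and so on.
The computer device may read from this ranking the user identifiers of the first number of users with the smallest distance from the first user; these are the user identifiers of the second users. For example, when the first user is user i above and the first number is 3, the user identifiers of user 3, user 9, and user 6 are read, and the product information of the corresponding product is sent to the terminals of user 3, user 9, and user 6.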
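The ranking and selection can be sketched as below, covering both options the text gives for the first number: a fixed value, or the count of users closer than a preset distance. The function name `select_second_users` and its parameters are hypothetical, chosen for this example.

```python
def select_second_users(distance_to_first, first_number=None, max_distance=None):
    """Pick the second users from a mapping of user ID -> distance to the first user.

    Either pass a fixed `first_number`, or pass `max_distance` so that the
    first number becomes the count of users closer than that distance.
    """
    if first_number is None and max_distance is None:
        raise ValueError("provide first_number or max_distance")

    # Rank the other users from the smallest distance to the largest.
    ranked = sorted(distance_to_first, key=distance_to_first.get)

    if first_number is None:
        first_number = sum(1 for d in distance_to_first.values() if d < max_distance)
    return ranked[:first_number]


distances_from_i = {"user80": 0.9, "user3": 0.1, "user6": 0.4, "user9": 0.2}
top_three = select_second_users(distances_from_i, first_number=3)
within_range = select_second_users(distances_from_i, max_distance=0.3)
```

Storing the ranking once per user means a later purchase only needs a slice of that list, not a fresh sort.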
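In the product information pushing method described above, the behavior data generated by each user within a preset time period is acquired and the preset topic to which each piece of behavior data belongs is determined, so that the number of occurrences of each user's behavior data on each preset topic can be calculated. A feature vector reflecting the user's heat for each preset topic is generated from these counts, and the distance between users is calculated from the feature vectors, so that the distance expresses the similarity between users. After purchase information generated by a first user for a preset product is acquired, a first number of second users at the shortest distance from the first user are selected and the product information is sent to their terminals, which improves the precision of pushing product information to users.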
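In one embodiment, before step S212, the method further includes: obtaining the personal information of each user, and calculating the similarity between users according to the personal information. Selecting the first number of second users with the shortest distances to the first user then includes: selecting them from among the users whose similarity to the first user is greater than a similarity threshold.
In this embodiment, the personal information includes fields such as the user's gender, age, occupation and hobbies. The computer device may store each user's personal information in local storage in advance, or may send a personal information request to another device and obtain the information from it. The computer device may calculate the similarity between users from the fields of the personal information. The similarity reflects how alike users are; understandably, users with identical or similar fields have a higher similarity.
Specifically, the similarity between users may be calculated from any one or more of the gender, age, occupation and hobby fields, and users may be divided into user groups according to the similarity, with users whose similarity exceeds the preset similarity threshold placed in the same user group. The calculation of distances between feature vectors and/or the selection of second users is then carried out within the user group of the first user. The computer device may select, from the users whose similarity is greater than the similarity threshold, the first number of second users with the shortest distances to the first user; that is, the second users are selected from the same user group as the first user.
In one embodiment, a preset number of user group categories may be defined in advance, each with the specific field characteristics its members are expected to have. The degree of match between the fields in each user's personal information and the field characteristics of each user group is determined, the user is assigned to the group with the highest degree of match, and users in the same group are deemed to have a similarity greater than the similarity threshold. For example, user groups may be defined from one or a combination of age band, occupation category, hobby category and gender, and the degree of match between each user's age, occupation, hobbies and gender and each defined user group is then calculated.
In this embodiment, by further taking the users' personal information into account, calculating the similarity between users from it, and selecting the first number of second users with the shortest distances from among the users whose similarity to the first user exceeds the similarity threshold, the second users selected for product recommendation are more accurately targeted.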
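A minimal sketch of the personal-information filter is given below. The text does not specify how the similarity is computed, so the measure used here (the fraction of profile fields that match exactly) is an assumption, as are the field names and the function names.

```python
PROFILE_FIELDS = ("gender", "age_band", "occupation", "hobby")  # assumed field names


def profile_similarity(p, q, fields=PROFILE_FIELDS):
    """Assumed similarity: share of fields on which two profiles agree."""
    matches = sum(
        1 for field in fields
        if p.get(field) is not None and p.get(field) == q.get(field)
    )
    return matches / len(fields)


def candidates_by_profile(first_user, profiles, threshold):
    """Users whose profile similarity to first_user exceeds the threshold."""
    base = profiles[first_user]
    return [
        uid for uid, profile in profiles.items()
        if uid != first_user and profile_similarity(base, profile) > threshold
    ]


profiles = {
    "a": {"gender": "F", "age_band": "30s", "occupation": "engineer", "hobby": "chess"},
    "b": {"gender": "F", "age_band": "30s", "occupation": "engineer", "hobby": "tennis"},
    "c": {"gender": "M", "age_band": "50s", "occupation": "teacher", "hobby": "chess"},
}
candidates = candidates_by_profile("a", profiles, threshold=0.5)
```

The nearest-distance selection of second users would then run only over `candidates` instead of over all users.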
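In one embodiment, as shown in FIG. 3, determining the preset topic to which each piece of behavior data belongs includes:
Step S302: extract the key phrases (key words and sentences) from each piece of behavior data.
In this embodiment, the computer device may segment the content of each piece of behavior data into words and sentences, filter the segmented phrases, and set the phrases that remain as the key phrases of the behavior data.
The computer device may build a semantic database of phrases in advance, containing a large number of phrases (that is, words and sentences). According to the language of the behavior data, following the rules of the corresponding syntax tree and using the sentences recorded in the semantic database, each sentence in the behavior data is segmented into a number of phrases. The part of speech of each word in its sentence and its position in the behavior data are also determined, for example that a word is a noun and is the subject of the sentence. The position of a phrase in the behavior data may be the title, the body text, a section heading, the author field, the publication time, and so on.
During segmentation, if several consecutive words in a sentence correspond to one complete word in the database, they are combined into a single word so that the segmented phrase remains whole.
After segmentation, the phrases may be filtered by part of speech to delete those that interfere with, or contribute little to, determining the topic of the behavior data. Specifically, phrases whose part of speech is stop word or particle may be deleted. Stop words are, for example, "the", "is", "at", "that", "是" and "的"; particles are, for example, "也", "者" and "乎". Filtering the segmented phrases both reduces the amount of computation and removes the interference of the deleted phrases, improving the accuracy of topic determination.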
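The filtering step can be sketched as follows. The sketch assumes the behavior data has already been segmented into tokens; the segmentation itself (syntax tree plus semantic database) is outside its scope. The word lists contain only the examples given in the text, and the function name is hypothetical.

```python
# Stop words and particles taken from the examples in the text above.
STOP_WORDS = {"the", "is", "at", "that", "是", "的"}
PARTICLES = {"也", "者", "乎"}


def filter_key_phrases(tokens):
    """Drop stop words and particles from already-segmented tokens."""
    dropped = STOP_WORDS | PARTICLES
    return [token for token in tokens if token.lower() not in dropped]


english = filter_key_phrases(["The", "insurance", "is", "that", "cheap"])
chinese = filter_key_phrases(["保险", "的", "价格", "也", "合适"])
```

A production system would normally decide by part-of-speech tag rather than by a fixed list, as the embodiment describes.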
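Step S304: calculate the first association degree between each key phrase and each preset topic.
In this embodiment, the computer device may have a plurality of topics preset. After obtaining a key phrase, it may first check whether the association degree of that key phrase, or of a similar phrase, with each topic has already been calculated. If so, no recalculation is needed, and the stored association degree of the key phrase or similar phrase with each topic is used as the first association degree between the key phrase and each preset topic. If not, semantic analysis is performed on the key phrase to obtain its first association degree with each preset topic. The first association degree may be expressed as a percentage, with a value between 0 and 1. For example, if the preset topics include shopping, sales, housing prices, insurance, finance, romance, games and health, then for each key phrase obtained, its first association degree with each of these topics is calculated. Further, the first association degrees of a key phrase with all preset topics may form a first association degree vector. This vector has the same dimension as the feature vector described above, and each element represents the association degree of the key phrase with one preset topic.
Specifically, the computer device may be preset with a semantic analysis model and use it to calculate the association degree of each key phrase with each topic. The semantic analysis model may be one built on Latent Semantic Indexing (LSI), Latent Dirichlet Allocation (LDA), Word2vec or the like.
Step S306: calculate, from the first association degrees, the second association degree between the behavior data and each preset topic.
For each topic, the computer device may compute a weighted sum of the first association degrees between that topic and each key phrase, thereby obtaining the second association degree of the behavior data with that topic. Specifically, the elements in the same dimension of the first association degree vectors may be weighted and summed, and a second association degree vector generated from the weighted sums; each element of this vector is the second association degree between the behavior data and the corresponding topic. The weight of each key phrase may be determined by one or more factors such as the detected position of the phrase in the behavior data, its part of speech, and the number of key phrases in the behavior data. For example, a key phrase in the title may be given a relatively large weight; when the behavior data contains many key phrases, the weight given to each is relatively small.
Step S308: determine the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
In this embodiment, the computer device is also configured with an association degree threshold, which may be a user-defined value. The computer device may compare each second association degree with the threshold, extract the topics whose second association degree exceeds it, and establish a correspondence between those topics and the behavior data, so that they are set as the topics to which the behavior data belongs. For example, when the threshold is 0.5, every topic whose second association degree is greater than 0.5 is associated with the behavior data and set as a topic to which the behavior data belongs.
In this embodiment, by dividing the behavior data into key phrases, calculating the first association degree between each key phrase and each preset topic, and determining the topics to which the behavior data belongs from all the first association degrees, the accuracy of assigning behavior data to topics can be further improved.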
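Steps S306 and S308 can be sketched as follows. The first association degrees are taken as given inputs (in the embodiment they would come from an LSI, LDA or Word2vec based model). The weights, which here sum to 1 so that the result stays between 0 and 1, and all function names are assumptions for illustration.

```python
def second_association(first_vectors, weights):
    """Weighted sum, per topic, of the first association degree vectors.

    first_vectors: one vector per key phrase, each with one entry per topic.
    weights:       one weight per key phrase (assumed to sum to 1).
    """
    topic_count = len(first_vectors[0])
    return [
        sum(w * vec[i] for w, vec in zip(weights, first_vectors))
        for i in range(topic_count)
    ]


def assigned_topics(second_vector, topics, threshold=0.5):
    """Topics whose second association degree exceeds the threshold."""
    return [t for t, degree in zip(topics, second_vector) if degree > threshold]


topics = ["shopping", "insurance", "games"]
first_vectors = [
    [0.9, 0.1, 0.0],  # key phrase found in the title
    [0.2, 0.8, 0.1],  # key phrase found in the body
]
weights = [0.6, 0.4]  # the title phrase is weighted more heavily
second = second_association(first_vectors, weights)
belongs_to = assigned_topics(second, topics, threshold=0.5)
```

Here the behavior data is assigned only to "shopping": the weighted sums are about 0.62, 0.38 and 0.04, and only the first exceeds the threshold of 0.5.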
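In one embodiment, generating, according to the number of occurrences, the feature vector of a user's heat for each preset topic includes: calculating the feature vector of each user from the number of occurrences of each preset topic across the user's behavior data and from the second association degree of each piece of behavior data with the topic to which it belongs.
In this embodiment, for each topic the computer device may count the pieces of a user's behavior data that belong to that topic; this count is the number of occurrences described above. It then sets a weight from the second association degrees of the behavior data belonging to the topic, and calculates from the count and the weight the value of the parameter of the feature vector that corresponds to the topic. The parameter value may be the product of the weight and the count, and the weight may be the average of the second association degrees.
For example, suppose that for a given topic i, 10 pieces of user i's behavior data have been determined to belong to topic i, so that the number of occurrences is 10. The computer device may extract the second association degree of each of these 10 pieces of behavior data with topic i, calculate the weight from the 10 second association degrees, and set the product of the weight and the count as the value of the parameter si in user i's feature vector Si that reflects the heat for topic i. The same method may be applied to every user and every topic to calculate the feature vector of each user.
In this embodiment, by further incorporating the calculated second association degrees into the feature vector, the feature vector expresses each user's heat for each topic more quantitatively, so that the second users determined from it are more accurately targeted.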
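The weighted variant of the feature vector can be sketched as follows, using the average second association degree as the weight, which is the option named in the text. The input layout (a list of topic and association degree pairs) and the function name are assumptions.

```python
def weighted_feature_vector(assignments, topics):
    """Feature vector whose elements are count x mean second association degree.

    assignments: list of (topic, second_association_degree) pairs, one per
                 association between a piece of behavior data and a topic.
    """
    degrees_by_topic = {topic: [] for topic in topics}
    for topic, degree in assignments:
        degrees_by_topic[topic].append(degree)

    vector = []
    for topic in topics:
        degrees = degrees_by_topic[topic]
        count = len(degrees)                              # number of occurrences
        weight = sum(degrees) / count if count else 0.0   # mean association degree
        vector.append(count * weight)
    return vector


topics = ["shopping", "games"]
assignments = [("games", 0.6), ("games", 0.8), ("shopping", 0.7)]
vector = weighted_feature_vector(assignments, topics)
```

Note that with the mean as the weight, the product of count and weight is simply the sum of the second association degrees for the topic; a different weighting rule would break that equivalence.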
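In one embodiment, after step S208, the method further includes generating a second number of clusters according to the distances and assigning each user to one of the clusters.
In this embodiment, the computer device also has a second number of categories preset, and each user is assigned to one of them; the users assigned to the same category form a cluster. The second number may be any suitable value smaller than the total number of users; the larger the second number, the finer the division of users. The computer device may set a center vector for each cluster in advance. A center vector represents the characteristics shared by the seed users of a category; it has the same form and the same number of dimensions as a feature vector, and the parameter in each dimension has the same meaning as the corresponding parameter of a feature vector.
The computer device may calculate the distance between each feature vector and each center vector and assign the feature vector to the cluster with the smallest distance, which completes the division of users into categories. The users whose feature vectors are assigned to the same cluster are in the same cluster.
Selecting the first number of second users with the shortest distances to the first user then includes: selecting them from among the users in the same cluster as the first user.
After purchase information generated by the first user for the preset product is obtained, the cluster to which the first user was assigned is determined, and the user identifiers of the first number of users in that cluster whose feature vectors are closest to the first user's feature vector are read. Selecting from users in the same cluster can further improve the accuracy of user selection.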
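Selection restricted to the first user's cluster can be sketched as follows. The center vectors are taken as given, and the function names and sample data are hypothetical.

```python
def squared_distance(a, b):
    """Sum of squared differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def assign_cluster(vector, centers):
    """Index of the center vector closest to the given feature vector."""
    return min(range(len(centers)), key=lambda c: squared_distance(vector, centers[c]))


def nearest_in_cluster(first_user, vectors, centers, first_number):
    """Closest users to first_user among those in the same cluster."""
    labels = {uid: assign_cluster(vec, centers) for uid, vec in vectors.items()}
    peers = [
        uid for uid in vectors
        if uid != first_user and labels[uid] == labels[first_user]
    ]
    peers.sort(key=lambda uid: squared_distance(vectors[first_user], vectors[uid]))
    return peers[:first_number]


centers = [[0, 0], [10, 10]]
vectors = {"a": [0, 1], "b": [1, 0], "c": [2, 2], "d": [9, 9], "e": [10, 11]}
near_a = nearest_in_cluster("a", vectors, centers, first_number=2)
near_d = nearest_in_cluster("d", vectors, centers, first_number=2)
```

When a cluster holds fewer other users than the first number, as for user "d" here, the sketch simply returns all of them; the text does not say how that case should be handled.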
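In one embodiment, the center vector of each cluster may be calculated from the users' feature vectors using a preset clustering algorithm, which may be any clustering algorithm such as K-means, K-medoids or CLARA. As shown in FIG. 4, the calculation of the center vectors includes:
Step S402: determine an initial center vector for each cluster to be formed, and set the initial center vector as the current center vector of that cluster.
In this embodiment, the initial center vector may be a preset feature vector, for example one set from empirical values reflecting the characteristics of each cluster, which reduces the amount of computation. Alternatively, a second number of feature vectors may be selected from the users' feature vectors, again optionally guided by such experience, and each selected vector set as the initial center vector of one cluster, so that every cluster has a center vector. The initial center vectors are set as the current center vectors and used in the iterative calculation of the following steps to determine the final clusters and their center vectors.
Step S404: calculate the distance between each feature vector and each current center vector, and assign the user of each feature vector to the cluster with the smallest distance, generating the current clusters.
Once the current center vectors exist, the distance between each feature vector and each current center vector may be calculated and the users divided according to these distances, forming a second number of clusters. These are set as the current clusters and used in the iteration below. Specifically, for each user's feature vector, its distance to each current center vector is calculated, the smallest of the second number of distances is found, and the user is assigned to the corresponding cluster. Repeating this for every user's feature vector completes the division of all users, that is, one round of clustering.
Step S406: calculate a new center vector for each of the current clusters.
In this embodiment, from the most recently formed clusters, the center vector of each cluster may be calculated from the feature vectors of the users in it, and the newly calculated center vector replaces the previous one for the next iteration.
The center vector of each cluster may be calculated according to a preset center vector calculation model. For example, for the feature vectors in a cluster, the parameters at the same position may be averaged to give the value of that parameter in the cluster's new center vector.
Step S408: check whether each new center vector has converged. If not, set each new center vector as the current center vector and continue from the calculation of the distance between each feature vector and each current center vector, until every new center vector has converged. When every new center vector has converged, set the current clusters as the final clusters and the new center vectors as the center vectors of the final clusters.
In this embodiment, it may be determined whether the difference between each new center vector and the previous center vector is less than a preset value. If so, the calculated center vector is judged to have converged; otherwise it has not. When convergence has not been reached, the center vector currently calculated for each cluster is used as the current center vector of the cluster to be formed, and steps S404 to S408 are performed again to form new clusters and new center vectors, until the center vector of every newly formed cluster is judged to have converged.
Specifically, the difference may be a distance value. The computer device is preset with a distance threshold that serves as the criterion for convergence. The calculated difference is compared with the distance threshold, and a center vector is judged to have converged when the difference is below the threshold. When one or more center vectors have not converged, step S404 is performed again with the latest center vectors set as the current center vectors of the clusters to be formed, so that clustering is repeated. If every difference is below the distance threshold, the newly calculated center vectors are judged to have converged.
When the center vectors of all clusters have converged, the iteration may be terminated, the current clusters set as the final clusters, and the new center vectors set as the center vectors of the final clusters, completing the clustering of all users and the calculation of each cluster's center vector.
In this embodiment, it is determined whether the center vectors have converged; if not, the users are clustered again, the center vectors of the new clusters are calculated, and convergence is checked again, until every center vector has converged. The current clusters are then set as the final clusters and the new center vectors as their center vectors, completing the clustering of all users and the calculation of each cluster's center vector. This improves the accuracy of the clusters and of their centers, and in turn the accuracy of selecting second users.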
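Steps S402 to S408 correspond to a K-means style iteration, which can be sketched as follows. This is an illustrative sketch only: averaging as the center vector calculation model and a squared-distance convergence test follow the options named in the text, while the `max_iterations` safety cap, the handling of empty clusters and all names are assumptions.

```python
def squared_distance(a, b):
    """Sum of squared differences between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def cluster_users(vectors, initial_centers, distance_threshold=1e-6, max_iterations=100):
    """Iterate assignment (S404), center update (S406) and convergence check (S408)."""
    centers = [list(center) for center in initial_centers]   # S402
    clusters = [[] for _ in centers]
    for _ in range(max_iterations):                           # assumed safety cap
        # S404: assign each user to the cluster with the nearest current center.
        clusters = [[] for _ in centers]
        for uid, vec in vectors.items():
            nearest = min(range(len(centers)),
                          key=lambda c: squared_distance(vec, centers[c]))
            clusters[nearest].append(uid)
        # S406: new center = per-dimension mean of the members' feature vectors.
        new_centers = []
        for members, old in zip(clusters, centers):
            if members:
                new_centers.append([
                    sum(vectors[uid][i] for uid in members) / len(members)
                    for i in range(len(old))
                ])
            else:
                new_centers.append(old)  # assumed: keep the center of an empty cluster
        # S408: converged when every center moved less than the distance threshold.
        converged = all(squared_distance(new, old) < distance_threshold
                        for new, old in zip(new_centers, centers))
        centers = new_centers
        if converged:
            break
    return clusters, centers


vectors = {"a": [0, 0], "b": [0, 2], "c": [10, 10], "d": [10, 12]}
final_clusters, final_centers = cluster_users(vectors, [[0, 0], [10, 10]])
```

On this sample the first pass moves both centers, and the second pass leaves them unchanged, so the iteration stops after two rounds.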
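In one embodiment, for the user groups formed from the users' personal information, clusters are formed within each user group by the clustering process described above, further improving the accuracy of user division and of second user selection.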
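In one embodiment, as shown in FIG. 5, a product information pushing apparatus is provided, the apparatus including:
a behavior data acquisition module 502, configured to obtain the behavior data generated by each user within a preset time period;
a topic determination module 504, configured to determine the preset topic to which each piece of behavior data belongs and to establish an association relationship between each piece of behavior data and the preset topic to which it belongs;
a count calculation module 506, configured to calculate, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period;
a feature vector generation module 508, configured to generate, according to the number of occurrences, a feature vector of each user's heat for each preset topic;
a distance calculation module 510, configured to calculate the distances between the feature vectors of the users; and
an information pushing module 512, configured to, after purchase information generated by a first user for a preset product is obtained, select a first number of second users whose distances to the first user are shortest and send the product information of the product to the terminals of the second users.
In one embodiment, as shown in FIG. 6, the apparatus further includes:
a user division module 514, configured to obtain the personal information of each user and to calculate the similarity between users according to the personal information.
The information pushing module 512 is further configured to select the first number of second users with the shortest distances to the first user from among the users whose similarity to the first user is greater than a similarity threshold.
In one embodiment, as shown in FIG. 7, the topic determination module 504 includes:
a key phrase extraction unit 702, configured to extract the key phrases from each piece of behavior data;
an association degree calculation unit 704, configured to calculate the first association degree between each key phrase and each preset topic, and to calculate, from the first association degrees, the second association degree between the behavior data and each preset topic; and
a topic determination unit 706, configured to determine the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
In one embodiment, the feature vector generation module 508 is further configured to calculate the feature vector of each user from the number of occurrences of each preset topic across the user's behavior data and from the second association degree of each piece of behavior data with the preset topic to which it belongs.
In one embodiment, as shown in FIG. 8, the apparatus further includes:
a clustering module 516, configured to generate a second number of clusters according to the distances and to assign each user to one of the clusters.
The information pushing module 512 is further configured to select the first number of second users with the shortest distances to the first user from among the users in the same cluster as the first user.
In one embodiment, the clustering module 516 is further configured to: determine an initial center vector for each cluster to be formed and set it as the current center vector of that cluster; calculate the distance between each feature vector and each current center vector and assign the user of each feature vector to the cluster with the smallest distance, generating the current clusters; calculate a new center vector for each of the current clusters; check whether each new center vector has converged and, if not, set each new center vector as the current center vector and continue from the calculation of the distance between each feature vector and each current center vector until every new center vector has converged; and, when every new center vector has converged, set the current clusters as the final clusters and the new center vectors as the center vectors of the final clusters.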
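The modules and units of the above apparatus may be implemented wholly or partly in software, hardware or a combination of the two. The network interface may be an Ethernet card, a wireless network card or the like. The modules and units may be embedded in, or independent of, the processor of the computer device in hardware form, or stored in the memory of the computer device in software form so that the processor can call them to perform the corresponding operations. The processor may be a central processing unit (CPU), a microprocessor, a single-chip microcomputer or the like.
In one embodiment, the product information pushing apparatus provided by this application may be implemented in the form of a computer program that can run on the computer device shown in FIG. 9. The non-volatile storage medium of the computer device may store the program modules that make up the apparatus, for example the behavior data acquisition module 502, topic determination module 504, count calculation module 506, feature vector generation module 508, distance calculation module 510 and information pushing module 512 shown in FIG. 5. Each program module includes computer-readable instructions that cause the computer device to perform the steps of the product information pushing method of the embodiments described in this specification. For example, through the apparatus shown in FIG. 5, the computer device may: obtain, through the behavior data acquisition module 502, the behavior data generated by each user within a preset time period; determine, through the topic determination module 504, the preset topic to which each piece of behavior data belongs, and establish an association relationship between each piece of behavior data and the preset topic to which it belongs; calculate, through the count calculation module 506 and according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period; generate, through the feature vector generation module 508 and according to the number of occurrences, a feature vector of each user's heat for each preset topic; calculate, through the distance calculation module 510, the distances between the feature vectors of the users; and, through the information pushing module 512, after purchase information generated by a first user for a preset product is obtained, select a first number of second users whose distances to the first user are shortest and send the product information of the product to the terminals of the second users.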
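In one embodiment, one or more non-volatile computer-readable storage media containing computer-readable instructions are provided. When executed by one or more processors, the computer-readable instructions cause the processors to perform the steps of the product information pushing method provided by the embodiments above.
In one embodiment, when executed by one or more processors, the computer-readable instructions cause the processors to perform the following steps: obtaining the behavior data generated by each user within a preset time period; determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period; generating, according to the number of occurrences, a feature vector of each user's heat for each preset topic; calculating the distances between the feature vectors of the users; and, after purchase information generated by a first user for a preset product is obtained, selecting a first number of second users whose distances to the first user are shortest and sending the product information of the product to the terminals of the second users.
In one embodiment, when executed by one or more processors, the computer-readable instructions cause the processors, before selecting the first number of second users with the shortest distances to the first user, to further perform the following steps: obtaining the personal information of each user; and calculating the similarity between users according to the personal information. The selecting of the first number of second users with the shortest distances to the first user includes: selecting them from among the users whose similarity to the first user is greater than a similarity threshold.
In one embodiment, when the computer-readable instructions are executed by one or more processors, the determining of the preset topic to which each piece of behavior data belongs includes: extracting the key phrases from each piece of behavior data; calculating the first association degree between each key phrase and each preset topic; calculating, from the first association degrees, the second association degree between the behavior data and each preset topic; and determining the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
In one embodiment, when the computer-readable instructions are executed by one or more processors, the generating, according to the number of occurrences, of the feature vector of each user's heat for each preset topic includes: calculating the feature vector of each user from the number of occurrences of each preset topic across the user's behavior data and from the second association degree of each piece of behavior data with the preset topic to which it belongs.
In one embodiment, when executed by one or more processors, the computer-readable instructions cause the processors, after calculating the distances between the feature vectors of the users, to further perform the following step: generating a second number of clusters according to the distances and assigning each user to one of the clusters. The selecting of the first number of second users with the shortest distances to the first user includes: selecting them from among the users in the same cluster as the first user.
In one embodiment, when executed by one or more processors, the computer-readable instructions further cause the processors to perform the following steps: determining an initial center vector for each cluster to be formed and setting it as the current center vector of that cluster; calculating the distance between each feature vector and each current center vector and assigning the user of each feature vector to the cluster with the smallest distance, generating the current clusters; calculating a new center vector for each of the current clusters; checking whether each new center vector has converged and, if not, setting each new center vector as the current center vector and continuing from the calculation of the distance between each feature vector and each current center vector until every new center vector has converged; and, when every new center vector has converged, setting the current clusters as the final clusters and the new center vectors as the center vectors of the final clusters.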
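In one embodiment, a computer device is provided, including a memory and a processor. The memory stores computer-readable instructions which, when executed by the processor, cause the processor to perform the steps of the product information pushing method provided by the embodiments above.
The computer device may be a terminal or a server. FIG. 9 is a schematic diagram of an internal structure of the computer device. The computer device includes a processor, a memory and a network interface connected by a system bus. The processor provides computing and control capability and supports the operation of the whole computer device. The memory stores data, instruction code and the like. At least one computer-readable instruction is stored in the memory and can be executed by the processor to implement the product information pushing method provided in the embodiments of this application that is applicable to the computer device. The memory may include a non-volatile storage medium such as a magnetic disk, an optical disc or a read-only memory (ROM). For example, in one embodiment, the memory includes a non-volatile storage medium and an internal memory. The non-volatile storage medium stores an operating system, a database and computer-readable instructions. The database stores data related to the product information pushing method provided by the embodiments above, such as the users' behavior data. The computer-readable instructions can be executed by the processor to implement the product information pushing method provided by the embodiments above. The internal memory provides a cached running environment for the operating system, the database and the computer-readable instructions in the non-volatile storage medium. The network interface may be an Ethernet card, a wireless network card or the like, and is used to communicate with external terminals or computer devices, for example to send a generated comparison result to a preset test terminal. When the computer device is a server, it may be implemented as a standalone server or as a server cluster composed of multiple servers. Those skilled in the art will understand that the structure shown in FIG. 9 is only a block diagram of the part of the structure related to the solution of this application and does not limit the computer device to which the solution is applied; a specific computer device may include more or fewer components than shown, combine certain components, or have a different arrangement of components.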
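In one embodiment, when executed by the processor, the instructions cause the processor to perform the following steps: obtaining the behavior data generated by each user within a preset time period; determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs; calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period; generating, according to the number of occurrences, a feature vector of each user's heat for each preset topic; calculating the distances between the feature vectors of the users; and, after purchase information generated by a first user for a preset product is obtained, selecting a first number of second users whose distances to the first user are shortest and sending the product information of the product to the terminals of the second users.
In one embodiment, when executed by the processor, the instructions cause the processor, before selecting the first number of second users with the shortest distances to the first user, to further perform the following steps: obtaining the personal information of each user; and calculating the similarity between users according to the personal information. The selecting of the first number of second users with the shortest distances to the first user includes: selecting them from among the users whose similarity to the first user is greater than a similarity threshold.
In one embodiment, when the instructions are executed by the processor, the determining of the preset topic to which each piece of behavior data belongs includes: extracting the key phrases from each piece of behavior data; calculating the first association degree between each key phrase and each preset topic; calculating, from the first association degrees, the second association degree between the behavior data and each preset topic; and determining the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
In one embodiment, when the instructions are executed by the processor, the generating, according to the number of occurrences, of the feature vector of each user's heat for each preset topic includes: calculating the feature vector of each user from the number of occurrences of each preset topic across the user's behavior data and from the second association degree of each piece of behavior data with the preset topic to which it belongs.
In one embodiment, when executed by the processor, the instructions cause the processor, after calculating the distances between the feature vectors of the users, to further perform the following step: generating a second number of clusters according to the distances and assigning each user to one of the clusters. The selecting of the first number of second users with the shortest distances to the first user includes: selecting them from among the users in the same cluster as the first user.
In one embodiment, when executed by the processor, the instructions further cause the processor to perform the following steps: determining an initial center vector for each cluster to be formed and setting it as the current center vector of that cluster; calculating the distance between each feature vector and each current center vector and assigning the user of each feature vector to the cluster with the smallest distance, generating the current clusters; calculating a new center vector for each of the current clusters; checking whether each new center vector has converged and, if not, setting each new center vector as the current center vector and continuing from the calculation of the distance between each feature vector and each current center vector until every new center vector has converged; and, when every new center vector has converged, setting the current clusters as the final clusters and the new center vectors as the center vectors of the final clusters.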
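Those of ordinary skill in the art will understand that all or part of the processes of the method embodiments above may be carried out by computer-readable instructions directing the relevant hardware. The computer-readable instructions may be stored in a non-volatile computer-readable storage medium and, when executed, may include the processes of the method embodiments above. The storage medium may be a magnetic disk, an optical disc, a read-only memory (ROM) or the like.
The technical features of the embodiments above may be combined in any way. For brevity, not every possible combination of these technical features is described; however, any combination of them that involves no contradiction should be considered within the scope of this specification.
The embodiments above express only several implementations of this application. Their description is relatively specific and detailed, but it should not be construed as limiting the scope of the patent. It should be noted that those of ordinary skill in the art may make a number of variations and improvements without departing from the concept of this application, and these all fall within its scope of protection. The scope of protection of this patent application is therefore defined by the appended claims.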

Claims (20)

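  1. A product information pushing method, comprising:
    obtaining behavior data generated by each user within a preset time period;
    determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs;
    calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period;
    generating, according to the number of occurrences, a feature vector of each user's heat for each preset topic;
    calculating the distances between the feature vectors of the users; and
    after purchase information generated by a first user for a preset product is obtained, selecting a first number of second users whose distances to the first user are shortest, and sending product information of the product to the terminals of the second users.
  2. The method according to claim 1, characterized in that, before the selecting of the first number of second users whose distances to the first user are shortest, the method further comprises:
    obtaining personal information of each user; and
    calculating the similarity between users according to the personal information;
    wherein the selecting of the first number of second users whose distances to the first user are shortest comprises:
    selecting, from among the users whose similarity to the first user is greater than a similarity threshold, the first number of second users whose distances to the first user are shortest.
  3. The method according to claim 1, characterized in that the determining of the preset topic to which each piece of behavior data belongs comprises:
    extracting key phrases from each piece of behavior data;
    calculating a first association degree between each key phrase and each preset topic;
    calculating, according to the first association degree, a second association degree between the corresponding behavior data and each preset topic; and
    determining the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
  4. The method according to claim 3, characterized in that the generating, according to the number of occurrences, of the feature vector of each user's heat for each preset topic comprises:
    calculating the feature vector of each user according to the number of occurrences of each preset topic across the user's behavior data and the second association degree of each piece of behavior data with the preset topic to which it belongs.
  5. The method according to claim 1, characterized in that, after the calculating of the distances between the feature vectors of the users, the method further comprises:
    generating a second number of clusters according to the distances, and assigning each user to one of the clusters;
    wherein the selecting of the first number of second users whose distances to the first user are shortest comprises:
    selecting, from among the users in the same cluster as the first user, the first number of second users whose distances to the first user are shortest.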
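  6. A product information pushing apparatus, comprising:
    a behavior data acquisition module, configured to obtain behavior data generated by each user within a preset time period;
    a topic determination module, configured to determine the preset topic to which each piece of behavior data belongs, and to establish an association relationship between each piece of behavior data and the preset topic to which it belongs;
    a count calculation module, configured to calculate, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period;
    a feature vector generation module, configured to generate, according to the number of occurrences, a feature vector of each user's heat for each preset topic;
    a distance calculation module, configured to calculate the distances between the feature vectors of the users; and
    an information pushing module, configured to, after purchase information generated by a first user for a preset product is obtained, select a first number of second users whose distances to the first user are shortest, and send product information of the product to the terminals of the second users.
  7. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a user division module, configured to obtain personal information of each user and to calculate the similarity between users according to the personal information;
    wherein the information pushing module is further configured to select, from among the users whose similarity to the first user is greater than a similarity threshold, the first number of second users whose distances to the first user are shortest.
  8. The apparatus according to claim 6, characterized in that the topic determination module comprises:
    a key phrase extraction unit, configured to extract key phrases from each piece of behavior data;
    an association degree calculation unit, configured to calculate a first association degree between each key phrase and each preset topic, and to calculate, according to the first association degree, a second association degree between the corresponding behavior data and each preset topic; and
    a topic determination unit, configured to determine the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
  9. The apparatus according to claim 8, characterized in that the feature vector generation module is further configured to calculate the feature vector of each user according to the number of occurrences of each preset topic across the user's behavior data and the second association degree of each piece of behavior data with the preset topic to which it belongs.
  10. The apparatus according to claim 6, characterized in that the apparatus further comprises:
    a clustering module, configured to generate a second number of clusters according to the distances, and to assign each user to one of the clusters;
    wherein the information pushing module is further configured to select, from among the users in the same cluster as the first user, the first number of second users whose distances to the first user are shortest.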
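  11. One or more non-volatile computer-readable storage media storing computer-readable instructions which, when executed by one or more processors, cause the one or more processors to perform the following steps:
    obtaining behavior data generated by each user within a preset time period;
    determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs;
    calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period;
    generating, according to the number of occurrences, a feature vector of each user's heat for each preset topic;
    calculating the distances between the feature vectors of the users; and
    after purchase information generated by a first user for a preset product is obtained, selecting a first number of second users whose distances to the first user are shortest, and sending product information of the product to the terminals of the second users.
  12. The storage medium according to claim 11, characterized in that the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following steps:
    obtaining personal information of each user;
    calculating the similarity between users according to the personal information; and
    selecting, from among the users whose similarity to the first user is greater than a similarity threshold, the first number of second users whose distances to the first user are shortest.
  13. The storage medium according to claim 11, characterized in that the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following steps:
    extracting key phrases from each piece of behavior data;
    calculating a first association degree between each key phrase and each preset topic;
    calculating, according to the first association degree, a second association degree between the corresponding behavior data and each preset topic; and
    determining the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
  14. The storage medium according to claim 13, characterized in that the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following step:
    calculating the feature vector of each user according to the number of occurrences of each preset topic across the user's behavior data and the second association degree of each piece of behavior data with the preset topic to which it belongs.
  15. The storage medium according to claim 11, characterized in that the computer-readable instructions, when executed by one or more processors, further cause the one or more processors to perform the following steps:
    generating a second number of clusters according to the distances, and assigning each user to one of the clusters; and
    selecting, from among the users in the same cluster as the first user, the first number of second users whose distances to the first user are shortest.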
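  16. A computer device, comprising a memory and a processor, the memory storing computer-readable instructions which, when executed by the processor, cause the processor to perform the following steps:
    obtaining behavior data generated by each user within a preset time period;
    determining the preset topic to which each piece of behavior data belongs, and establishing an association relationship between each piece of behavior data and the preset topic to which it belongs;
    calculating, according to the association relationship, the number of occurrences on each preset topic of the behavior data generated by each user within the preset time period;
    generating, according to the number of occurrences, a feature vector of each user's heat for each preset topic;
    calculating the distances between the feature vectors of the users; and
    after purchase information generated by a first user for a preset product is obtained, selecting a first number of second users whose distances to the first user are shortest, and sending product information of the product to the terminals of the second users.
  17. The computer device according to claim 16, characterized in that the instructions, when executed by the processor, further cause the processor to perform the following steps:
    obtaining personal information of each user;
    calculating the similarity between users according to the personal information; and
    selecting, from among the users whose similarity to the first user is greater than a similarity threshold, the first number of second users whose distances to the first user are shortest.
  18. The computer device according to claim 16, characterized in that the instructions, when executed by the processor, further cause the processor to perform the following steps:
    extracting key phrases from each piece of behavior data;
    calculating a first association degree between each key phrase and each preset topic;
    calculating, according to the first association degree, a second association degree between the corresponding behavior data and each preset topic; and
    determining the preset topics whose second association degree exceeds an association degree threshold as the preset topics to which the behavior data belongs.
  19. The computer device according to claim 18, characterized in that the instructions, when executed by the processor, further cause the processor to perform the following step:
    calculating the feature vector of each user according to the number of occurrences of each preset topic across the user's behavior data and the second association degree of each piece of behavior data with the preset topic to which it belongs.
  20. The computer device according to claim 16, characterized in that the instructions, when executed by the processor, further cause the processor to perform the following steps:
    generating a second number of clusters according to the distances, and assigning each user to one of the clusters; and
    selecting, from among the users in the same cluster as the first user, the first number of second users whose distances to the first user are shortest.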
PCT/CN2017/103989 2017-07-27 2017-09-28 产品信息推送方法、装置、存储介质和计算机设备 WO2019019348A1 (zh)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN201710626280.0 2017-07-27
CN201710626280.0A CN107688984A (zh) 2017-07-27 2017-07-27 产品信息推送方法、装置、存储介质和计算机设备

Publications (1)

Publication Number Publication Date
WO2019019348A1 true WO2019019348A1 (zh) 2019-01-31

Family

ID=61152352

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2017/103989 WO2019019348A1 (zh) 2017-07-27 2017-09-28 产品信息推送方法、装置、存储介质和计算机设备

Country Status (2)

Country Link
CN (1) CN107688984A (zh)
WO (1) WO2019019348A1 (zh)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111932321A (zh) * 2020-09-23 2020-11-13 北京每日优鲜电子商务有限公司 针对用户的物品信息推送方法、装置、电子设备和介质

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108875043B (zh) * 2018-06-27 2022-02-25 腾讯科技(北京)有限公司 用户数据处理方法、装置、计算机设备和存储介质
CN109523344A (zh) * 2018-10-16 2019-03-26 深圳壹账通智能科技有限公司 产品信息推荐方法、装置、计算机设备和存储介质
TWI666558B (zh) * 2018-11-20 2019-07-21 財團法人資訊工業策進會 語意分析方法、語意分析系統及非暫態電腦可讀取媒體
CN109636479A (zh) * 2018-12-19 2019-04-16 武汉斗鱼鱼乐网络科技有限公司 一种广告推荐方法、装置、电子设备及存储介质
CN111797871A (zh) * 2019-04-09 2020-10-20 Oppo广东移动通信有限公司 信息处理方法、装置、存储介质及电子设备
CN110740166B (zh) * 2019-09-19 2022-06-17 平安科技(深圳)有限公司 基于距离的信息发送方法、装置、计算机设备和存储介质
CN112711699B (zh) * 2019-10-24 2023-04-07 上海哔哩哔哩科技有限公司 用户划分方法、系统、计算机设备及可读存储介质
CN111475719B (zh) * 2020-03-30 2023-04-07 招商局金融科技有限公司 基于数据挖掘的信息推送方法、装置及存储介质
CN112070524B (zh) * 2020-07-24 2024-02-13 广州阿凡提电子科技有限公司 广告业务推荐方法、装置
CN113902596B (zh) * 2021-09-17 2022-06-14 广州认真教育科技有限公司 一种利用信息匹配的课后服务方法及系统

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110213786A1 (en) * 2010-02-26 2011-09-01 International Business Machines Corporation Generating recommended items in unfamiliar domain
CN103377250A (zh) * 2012-04-27 2013-10-30 杭州载言网络技术有限公司 基于邻域的top-k推荐方法
CN103853763A (zh) * 2012-12-03 2014-06-11 腾讯科技(深圳)有限公司 获取信息的方法和装置
CN104281882A (zh) * 2014-09-16 2015-01-14 中国科学院信息工程研究所 基于用户特征的预测社交网络信息流行度的方法及系统
CN106355449A (zh) * 2016-08-31 2017-01-25 腾讯科技(深圳)有限公司 用户选取方法和装置

Family Cites Families (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103577400A (zh) * 2012-07-18 2014-02-12 三星电子(中国)研发中心 一种提供地点信息的方法及系统
CN104102662B (zh) * 2013-04-10 2018-05-22 阿里巴巴集团控股有限公司 一种用户兴趣偏好相似度确定方法及装置
CN103886090B (zh) * 2014-03-31 2018-01-02 北京搜狗科技发展有限公司 基于用户喜好的内容推荐方法及装置
CN104834967A (zh) * 2015-04-24 2015-08-12 南京邮电大学 泛在网络下基于用户相似度的业务行为预测方法
CN106445961B (zh) * 2015-08-10 2021-02-23 北京奇虎科技有限公司 新闻推送方法及装置
CN106447372B (zh) * 2015-08-10 2022-03-08 北京奇虎科技有限公司 产品信息推送方法及装置
CN105956146A (zh) * 2016-05-12 2016-09-21 腾讯科技(深圳)有限公司 一种物品信息的推荐方法及装置

Also Published As

Publication number Publication date
CN107688984A (zh) 2018-02-13

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 17919331

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 29.05.2020)

122 Ep: pct application non-entry in european phase

Ref document number: 17919331

Country of ref document: EP

Kind code of ref document: A1