WO2018112696A1 - Content recommendation method and content recommendation system - Google Patents

Content recommendation method and content recommendation system

Info

Publication number
WO2018112696A1
Authority
WO
WIPO (PCT)
Prior art keywords
content
vector
contents
user
target user
Prior art date
Application number
PCT/CN2016/110741
Other languages
English (en)
French (fr)
Inventor
王娜
王文君
高睿
汪景福
陈昭男
Original Assignee
深圳大学
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 深圳大学
Priority to PCT/CN2016/110741
Publication of WO2018112696A1

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F16/00: Information retrieval; Database structures therefor; File system structures therefor

Definitions

  • The present invention relates to the field of data analysis and processing technologies, and in particular to a content recommendation method and a content recommendation system for recommending content of interest to a target user.
  • A recommendation system is the mechanism by which an Internet site provides product information or suggestions to users, helping them discover their latent interests and needs and choose products.
  • Video-based collaborative filtering uses users' preferences for videos to discover the similarity between videos, and then recommends similar videos to a user based on that user's historical preference information.
  • Video-based collaborative filtering computes the similarity between videos from the user-video rating matrix, thereby determining the neighbor videos of a target video, and then recommends to the target user videos that are highly similar to the videos in that user's viewing history. Finding the neighbor videos of the videos a user has watched is the key step of the collaborative filtering algorithm.
  • Its advantage is that it depends on neither the user's attribute information nor the content information of the video: by analyzing a large amount of user behavior data on videos, it finds specific behavior patterns and uses them to predict the user's interests and make recommendations. It can produce satisfactory recommendations without rigorous modeling of videos or users.
  • The content-based recommendation algorithm finds correlations between videos from their description information; it was the most widely used recommendation mechanism when recommendation systems first appeared. Its core idea is to use the description information of videos to discover the correlations between them, and then to recommend similar videos to users based on their past preference records.
  • Its advantages are that it is easy to implement; that it needs no user data and therefore has no sparsity or cold-start problem; and that, because it relies on the features of the video itself, it does not over-recommend popular items.
  • The disadvantages of video-based collaborative filtering are: (1) the recommendation quality depends on the amount and accuracy of users' historical preference data; (2) user history and preferences are stored in a sparse matrix, computation on a sparse matrix is problematic, and the erroneous preferences of a small number of users have a large impact on accuracy; (3) because the numbers of users and videos are very large, computation on the user-video matrix is very expensive, which makes real-time recommendation difficult to implement.
  • The disadvantages of content-based recommendation are: (1) the description information of a video may be missing, making it impossible to extract video attributes; (2) the extracted video features must be both accurate and practically meaningful, otherwise the relevance of the recommendation results is hard to guarantee.
  • The technical problem to be solved by the present invention is to provide a content recommendation method and a content recommendation system that can recommend content accurately without using the description information or attributes of either the content or the user.
  • the present invention is implemented as follows:
  • a content recommendation method includes the following steps:
  • Step A: Obtain the content viewing history data of all users; each user's content viewing history data includes all the content viewed by that user and the viewing time of each content;
  • Step B: Sort all the content viewed by each user in order of viewing time to obtain each user's historical viewing content sequence;
  • Step C: Perform continuous bag-of-words (CBOW) model training on each user's historical viewing content sequence to obtain a CBOW model, thereby obtaining a content vector for each content;
  • Step D: Obtain the set of content that the target user viewed within a preset time window;
  • Step E: For each content in the set, extract from all the content viewed by all users a first preset number of contents whose content vectors are most similar to that content;
  • Step F: Calculate the target user's degree of interest in each of the extracted contents. Let the target user be u, the set be M, an extracted content be j, and N be the set of a second preset number of contents whose content vectors are most similar to content j; the degree of interest P_uj of target user u in content j is then calculated as:
  • $P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$
  • where W_ij denotes the content vector similarity between content i and content j, and P_ui denotes the degree of interest of target user u in content i;
  • Step G: According to the target user's degree of interest in each content, extract from all the extracted contents a third preset number of contents in which the target user's interest is highest, and recommend them to the target user.
  • Further, the content is a video, music, news or merchandise on the network, and viewing means clicking a link to the content.
  • Further, step C includes the following steps:
  • Step C1: Establish the input matrix V and the output matrix U of the CBOW model and randomly initialize them, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension;
  • Step C2: Select a content x_c from each user's historical viewing content sequence as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read. The one-hot codes of the 2m contents are denoted: $x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
  • Step C3: Multiply each of the 2m one-hot codes by the input matrix to obtain the input content vectors of the 2m contents: $v_{c-m} = V x^{(c-m)}, \ldots, v_{c-1} = V x^{(c-1)}, v_{c+1} = V x^{(c+1)}, \ldots, v_{c+m} = V x^{(c+m)}$, where v_i denotes the input content vector of content w_i;
  • Step C4: Average the input content vectors of the 2m contents: $\hat{v} = \frac{v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}}{2m}$
  • Step C5: Calculate the score vector from the average: $z = U\hat{v}$
  • Step C6: Convert the score vector into a probability distribution: $\hat{y} = \mathrm{softmax}(z)$
  • Step C7: Using cross entropy as the objective function, calculate the error between the content vector of the center content in the output matrix U and the probability distribution: $H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$, where $\hat{y}$ is the probability distribution obtained in step C6 and y is the content vector of the center content in the output matrix U;
  • Step C8: Obtain the final optimization objective function from the error: $\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{\top}\hat{v} + \log \sum_{j=1}^{|V|} \exp(u_j^{\top}\hat{v})$, where u_i denotes the output content vector of content w_i;
  • Step C9: Use gradient descent to update the content vector of the center content in the output matrix and the content vectors of the 2m contents in the input matrix, obtaining the final input matrix V and output matrix U and thereby the CBOW model.
  • A content recommendation system, including:
  • a content viewing history data obtaining module, configured to obtain the content viewing history data of all users and the set of content viewed by the target user within a preset time window; each user's content viewing history data includes all the content viewed by that user and the viewing time of each content;
  • a historical viewing content sequence generating module, configured to sort all the content viewed by each user in order of viewing time to obtain each user's historical viewing content sequence;
  • a continuous bag-of-words (CBOW) model training module, configured to perform CBOW model training on each user's historical viewing content sequence to obtain a CBOW model, thereby obtaining a content vector for each content;
  • a similar content extraction module, configured to extract, for each content in the set, a first preset number of contents whose content vectors are most similar to that content from all the content viewed by all users;
  • an interest degree calculation module, configured to calculate the target user's degree of interest in each of the extracted contents. Let the target user be u, the set be M, an extracted content be j, and N be the set of a second preset number of contents whose content vectors are most similar to content j; the degree of interest P_uj of target user u in content j is then calculated as:
  • $P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$
  • where W_ij denotes the content vector similarity between content i and content j, and P_ui denotes the degree of interest of target user u in content i;
  • a recommended content extraction module, configured to extract from all the extracted contents, according to the target user's degree of interest, a third preset number of contents in which the target user's interest is highest, and to recommend them to the target user.
  • Further, the content is a video, music, news or merchandise on the network, and viewing means clicking a link to the content.
  • Further, the CBOW model training module includes:
  • a matrix establishing module, configured to establish the input matrix V and the output matrix U of the CBOW model and randomly initialize them, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension;
  • a one-hot encoding module, configured to select a content x_c from each user's historical viewing content sequence as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read. The one-hot codes of the 2m contents are denoted: $x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
  • an input content vector calculation module, configured to multiply each of the 2m one-hot codes by the input matrix to obtain the input content vectors of the 2m contents: $v_{c-m} = V x^{(c-m)}, \ldots, v_{c-1} = V x^{(c-1)}, v_{c+1} = V x^{(c+1)}, \ldots, v_{c+m} = V x^{(c+m)}$, where v_i denotes the input content vector of content w_i;
  • a vector average calculation module, configured to average the input content vectors of the 2m contents: $\hat{v} = \frac{v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}}{2m}$
  • a score vector calculation module, configured to calculate the score vector from the average: $z = U\hat{v}$
  • a probability distribution conversion module, configured to convert the score vector into a probability distribution: $\hat{y} = \mathrm{softmax}(z)$
  • an error calculation module, configured to calculate, using cross entropy as the objective function, the error between the content vector of the center content in the output matrix U and the probability distribution: $H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$, where $\hat{y}$ is the probability distribution obtained in step C6 and y is the content vector of the center content in the output matrix U;
  • an optimization objective function generation module, configured to obtain the final optimization objective function from the error: $\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{\top}\hat{v} + \log \sum_{j=1}^{|V|} \exp(u_j^{\top}\hat{v})$, where u_i denotes the output content vector of content w_i;
  • a CBOW model generating module, configured to use gradient descent to update the content vector of the center content in the output matrix and the content vectors of the 2m contents in the input matrix, obtaining the final input matrix V and output matrix U and thereby the CBOW model.
  • The invention uses the continuous bag-of-words (CBOW) model from natural language processing: it trains a CBOW model on each user's historical viewing content sequence to obtain a content vector for each content, uses the content vectors to find contents similar to those the target user has viewed, calculates the target user's degree of interest in each similar content, and finally extracts a preset number of contents in which the target user's interest is highest and recommends them to the target user.
  • Compared with the prior art, the present invention uses no description information, attributes or tags of the content or the user, so missing content or user information does not degrade the robustness of the algorithm.
  • According to the description, the present invention computes far faster than collaborative filtering and content-based recommendation algorithms.
  • The present invention represents contents as vectors of equal length, which works with a variety of off-the-shelf similarity algorithms.
  • FIG. 1 is a schematic overall flow chart of the content recommendation method provided by Embodiment 1 of the present invention;
  • FIG. 2 is an example of how a user's degree of interest in content is calculated in the content recommendation method provided by Embodiment 1;
  • FIG. 3 is a schematic flow chart of the CBOW model training in the content recommendation method provided by Embodiment 1;
  • FIG. 4 is a schematic diagram of the overall composition of the content recommendation system provided by Embodiment 2 of the present invention;
  • FIG. 5 is a schematic diagram of the composition of the CBOW model training module in the content recommendation system provided by Embodiment 2.
  • As shown in FIG. 1, Embodiment 1 of the present invention provides a content recommendation method, including the following steps:
  • Step A: Obtain the content viewing history data of all users; each user's content viewing history data includes all the content viewed by that user and the viewing time of each content.
  • The content can be a video, music, news or merchandise on the network, and viewing means clicking a link to the content. When the content is a video or music, clicking its link plays it; when the content is news, clicking the news link displays the news; clicking a merchandise link displays the merchandise information.
  • The viewing time of a content is the moment at which that content was viewed.
  • Step B: Sort all the content viewed by each user in order of viewing time to obtain each user's historical viewing content sequence.
  • Step C: Perform continuous bag-of-words (CBOW) model training on each user's historical viewing content sequence to obtain a CBOW model, thereby obtaining a content vector for each content.
  • CBOW model training is the core of the whole method. It applies to the present invention an algorithm originally used in natural language processing.
  • A natural language processing algorithm obtains word vectors and a probability density function by learning from a training corpus.
  • A word vector is a multidimensional real-valued vector that captures the semantic and grammatical relations of natural language; the cosine distance between word vectors represents the similarity between words.
  • Each historical viewing content sequence is treated as a sentence in a natural language, and each content in the sequence is treated as a word in that sentence. After a language model has been trained on each user's historical viewing content sequence, a content vector is obtained for each content; the content vector is equivalent to the word vector obtained in natural language processing.
  • The language model used in this embodiment is the CBOW model, a bag-of-words model that can predict or generate the center word from the words before and after it in a sentence. Taking the sentence "The cat jump over the puddle" as an example, the CBOW model uses {"The", "cat", "over", "the", "puddle"} as the context to predict or generate the center word "jump".
  • Step D: Obtain the set of content that the target user viewed within a preset time window.
  • The time window is a period of time that can be preset as needed.
  • Step E: For each content in the set, extract from all the content viewed by all users a first preset number of contents whose content vectors are most similar to that content.
  • In this embodiment, content vector similarity stands for content similarity: a high content vector similarity means the contents are highly similar, and a low content vector similarity means they are not.
  • Step F: Calculate the target user's degree of interest in each of the extracted contents.
  • Let the target user be u, the set be M, an extracted content be j, and N be the set of a second preset number of contents whose content vectors are most similar to content j; the degree of interest P_uj of target user u in content j is then calculated as:
  • $P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$
  • where W_ij denotes the content vector similarity between content i and content j, and P_ui denotes the degree of interest of target user u in content i.
  • FIG. 2 takes video as the example content. Suppose the target user watched video A and video B within some time window. Because the target user watched both, the user's degree of interest in video A and in video B can simply be set to 1.
  • Video A and video B are fed into the trained model. The three videos most similar to video A are videos u, v and x, and the three videos most similar to video B are videos x, y and z. Video x is therefore similar to both video A and video B.
  • As shown in FIG. 2, the similarity of video A to video u is 0.7, to video v is 0.6, and to video x is 0.5; the similarity of video B to video x is 0.4, to video y is 0.5, and to video z is 0.6.
  • By the formula above, the target user's degree of interest in video u is 0.7*1=0.7, in video v is 0.6*1=0.6, in video x is 0.5*1+0.4*1=0.9, in video y is 0.5*1=0.5, and in video z is 0.6*1=0.6.
  • Ranked by the target user's degree of interest from high to low, the videos are therefore ordered x>u>v=z>y.
  • Step G: According to the target user's degree of interest in each content, extract from all the extracted contents a third preset number of contents in which the target user's interest is highest, and recommend them to the target user.
  • As shown in FIG. 3, step C includes the following steps:
  • Step C1: Establish the input matrix V and the output matrix U of the CBOW model and randomly initialize them, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension.
  • n can be chosen arbitrarily and is the dimension of the content vectors; V is the input matrix.
  • When content w_i (as a one-hot vector) is the model input, the i-th column of V is the n-dimensional content vector of w_i; this column is denoted v_i.
  • Similarly, U is the output matrix. When content w_i (as a one-hot vector) is the model output, the i-th row of U is the n-dimensional content vector of w_i; this row is denoted u_i.
  • Two content vectors are thus learned for each content w_i: the output content vector u_i and the input content vector v_i.
  • Step C2: Select a content x_c from each user's historical viewing content sequence as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read.
  • The one-hot codes of the 2m contents are denoted: $x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
  • Step C3: Multiply each of the 2m one-hot codes by the input matrix to obtain the input content vectors of the 2m contents.
  • The input content vectors of the 2m contents are: $v_{c-m} = V x^{(c-m)}, \ldots, v_{c-1} = V x^{(c-1)}, v_{c+1} = V x^{(c+1)}, \ldots, v_{c+m} = V x^{(c+m)}$
  • where v_i denotes the input content vector of content w_i.
  • Step C4: Average the input content vectors of the 2m contents: $\hat{v} = \frac{v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}}{2m}$
  • Step C5: Calculate the score vector from the average: $z = U\hat{v}$
  • Step C6: Convert the score vector into a probability distribution: $\hat{y} = \mathrm{softmax}(z)$
  • Step C7: Using cross entropy as the objective function, calculate the error between the content vector of the center content in the output matrix U and the probability distribution: $H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$, where $\hat{y}$ is the probability distribution obtained in step C6 and y is the content vector of the center content in the output matrix U.
  • Step C8: Obtain the final optimization objective function from the error: $\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{\top}\hat{v} + \log \sum_{j=1}^{|V|} \exp(u_j^{\top}\hat{v})$
  • where u_i denotes the output content vector of content w_i.
  • Step C9: Use gradient descent to update the content vector of the center content in the output matrix and the content vectors of the 2m contents in the input matrix, obtaining the final input matrix V and output matrix U and thereby the CBOW model.
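Steps C3 to C9 can be sketched as a single training update. The following is a minimal pure-Python illustration, not the patent's implementation. It assumes a full softmax over all contents and a one-hot target for the center content, and it stores each content's vector as a row, so `V_in` is the transpose of the V described above. The names `train_step`, `init_matrix`, `lr`, the toy sizes and the initialization range are all hypothetical.

```python
import math
import random

def softmax(z):
    """Numerically stable softmax over a list of scores."""
    shift = max(z)
    exps = [math.exp(v - shift) for v in z]
    total = sum(exps)
    return [e / total for e in exps]

def init_matrix(rows, cols, rng):
    """Random initialization (step C1); the range is an arbitrary assumption."""
    return [[rng.uniform(-0.5, 0.5) for _ in range(cols)] for _ in range(rows)]

def train_step(V_in, U_out, context_ids, center_id, lr):
    """One CBOW update. V_in[i] plays the role of v_i, U_out[i] of u_i."""
    n = len(V_in[0])
    k = len(context_ids)
    # C3-C4: look up the context vectors and average them.
    v_hat = [sum(V_in[i][d] for i in context_ids) / k for d in range(n)]
    # C5-C6: scores z = U v_hat, then softmax.
    z = [sum(u[d] * v_hat[d] for d in range(n)) for u in U_out]
    y_hat = softmax(z)
    # C7-C8: cross entropy against the one-hot center content.
    loss = -math.log(y_hat[center_id])
    # C9: the gradient of the loss with respect to the scores is y_hat - y.
    err = [p - (1.0 if j == center_id else 0.0) for j, p in enumerate(y_hat)]
    grad_v_hat = [sum(err[j] * U_out[j][d] for j in range(len(U_out)))
                  for d in range(n)]
    for j, u in enumerate(U_out):        # update output vectors
        for d in range(n):
            u[d] -= lr * err[j] * v_hat[d]
    for i in context_ids:                # update the context input vectors
        for d in range(n):
            V_in[i][d] -= lr * grad_v_hat[d] / k
    return loss

rng = random.Random(0)
vocab_size, dim = 5, 3
V_in = init_matrix(vocab_size, dim, rng)
U_out = init_matrix(vocab_size, dim, rng)
# Repeat the same (context, center) example to watch the loss fall.
losses = [train_step(V_in, U_out, [0, 1, 3, 4], 2, lr=0.5) for _ in range(50)]
```

Repeating one example is only a smoke test that the update lowers the loss; real training would iterate over every center position of every user's sequence.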
  • As shown in FIG. 4, Embodiment 2 of the present invention provides a content recommendation system, including a content viewing history data obtaining module 1, a historical viewing content sequence generating module 2, a CBOW model training module 3, a similar content extraction module 4, an interest degree calculation module 5 and a recommended content extraction module 6, where:
  • The content viewing history data obtaining module 1 is configured to obtain the content viewing history data of all users and the set of content that the target user viewed within a preset time window. Each user's content viewing history data includes all the content viewed by that user and the viewing time of each content.
  • The historical viewing content sequence generating module 2 is configured to sort all the content viewed by each user in order of viewing time to obtain each user's historical viewing content sequence.
  • The CBOW model training module 3 is configured to perform CBOW model training on each user's historical viewing content sequence to obtain a CBOW model, thereby obtaining a content vector for each content.
  • The similar content extraction module 4 is configured to extract, for each content in the set, a first preset number of contents whose content vectors are most similar to that content from all the content viewed by all users.
  • The interest degree calculation module 5 is configured to calculate the target user's degree of interest in each of the extracted contents. Let the target user be u, the set be M, an extracted content be j, and N be the set of a second preset number of contents whose content vectors are most similar to content j; the degree of interest P_uj of target user u in content j is then calculated as:
  • $P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$
  • where W_ij denotes the content vector similarity between content i and content j, and P_ui denotes the degree of interest of target user u in content i.
  • The recommended content extraction module 6 is configured to extract from all the extracted contents, according to the target user's degree of interest, a third preset number of contents in which the target user's interest is highest, and to recommend them to the target user.
  • The content can be a video, music, news or merchandise on the network, and viewing means clicking a link to the content.
  • As shown in FIG. 5, the CBOW model training module 3 includes a matrix establishing module 301, a one-hot encoding module 302, an input content vector calculation module 303, a vector average calculation module 304, a score vector calculation module 305, a probability distribution conversion module 306, an error calculation module 307, an optimization objective function generation module 308 and a CBOW model generating module 309.
  • The matrix establishing module 301 is configured to establish the input matrix V and the output matrix U of the CBOW model and randomly initialize them, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension.
  • The one-hot encoding module 302 is configured to select a content x_c from each user's historical viewing content sequence as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read.
  • The one-hot codes of the 2m contents are denoted: $x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
  • The input content vector calculation module 303 is configured to multiply each of the 2m one-hot codes by the input matrix to obtain the input content vectors of the 2m contents.
  • The input content vectors of the 2m contents are: $v_{c-m} = V x^{(c-m)}, \ldots, v_{c-1} = V x^{(c-1)}, v_{c+1} = V x^{(c+1)}, \ldots, v_{c+m} = V x^{(c+m)}$
  • where v_i denotes the input content vector of content w_i.
  • The vector average calculation module 304 is configured to average the input content vectors of the 2m contents: $\hat{v} = \frac{v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}}{2m}$
  • The score vector calculation module 305 is configured to calculate the score vector from the average: $z = U\hat{v}$
  • The probability distribution conversion module 306 is configured to convert the score vector into a probability distribution: $\hat{y} = \mathrm{softmax}(z)$
  • The error calculation module 307 is configured to calculate, using cross entropy as the objective function, the error between the content vector of the center content in the output matrix U and the probability distribution: $H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$, where $\hat{y}$ is the probability distribution obtained in step C6 and y is the content vector of the center content in the output matrix U.
  • The optimization objective function generation module 308 is configured to obtain the final optimization objective function from the error: $\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{\top}\hat{v} + \log \sum_{j=1}^{|V|} \exp(u_j^{\top}\hat{v})$
  • where u_i denotes the output content vector of content w_i.
  • The CBOW model generating module 309 is configured to use gradient descent to update the content vector of the center content in the output matrix and the content vectors of the 2m contents in the input matrix, obtaining the final input matrix V and output matrix U and thereby the CBOW model.
  • The modules of the content recommendation system provided in Embodiment 2 correspond one-to-one to the steps of the content recommendation method provided in Embodiment 1.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Databases & Information Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

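A content recommendation method and a content recommendation system for recommending content of interest to a target user, relating to the field of data analysis and processing technologies. Based on the continuous bag-of-words (CBOW) model from natural language processing, each user's historical viewing content sequence is used to train a CBOW model, from which a content vector is obtained for each content (C). The content vectors are then used to find contents similar to those the target user has viewed (E), the target user's degree of interest in each similar content is calculated (F), and finally a preset number of contents in which the target user's interest is highest are extracted and recommended to the target user (G). Because no description information, attributes or tags of the content or the user are used, missing content or user information does not degrade the robustness of the algorithm. The computation is also stated to be far faster than collaborative filtering and content-based recommendation algorithms. In addition, contents are represented as vectors of equal length, which works with a variety of off-the-shelf similarity algorithms.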

Description

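Content recommendation method and content recommendation system
Technical Field
The present invention relates to the field of data analysis and processing technologies, and in particular to a content recommendation method and a content recommendation system for recommending content of interest to a target user.
Background
As people move into the information age, the world finds itself in the midst of an information explosion and faces a severe information overload problem. In 2011 alone, the global volume of data reached 1.8 ZB, equivalent to more than 200 GB of data per person worldwide. This growth is still accelerating; by conservative estimates, data will keep growing at 50% per year over the next few years. Users of the major e-commerce, video and similar platforms now generate massive amounts of data every day, so how to make effective use of user-generated data is a problem that Internet companies urgently need to solve. Personalized recommendation systems emerged as a data mining tool to meet this need. A recommendation system is the mechanism by which an Internet site provides product information or suggestions to users, helping them discover their latent interests and needs and choose products.
Traditional related-video recommendation algorithms are video-based collaborative filtering and content-based recommendation. Video-based collaborative filtering uses users' preferences for videos to discover the similarity between videos, and then recommends similar videos to a user based on that user's historical preference information. It computes the similarity between videos from the user-video rating matrix, thereby determining the neighbor videos of a target video, and then recommends to the target user videos that are highly similar to the videos in that user's viewing history; finding the neighbor videos of the videos a user has watched is the key step of the collaborative filtering algorithm. Its advantage is that it depends on neither the user's attribute information nor the content information of the video: by analyzing a large amount of user behavior data on videos, it finds specific behavior patterns and uses them to predict the user's interests and make recommendations. It can produce satisfactory recommendations without rigorous modeling of videos or users. The content-based recommendation algorithm finds correlations between videos from their description information; it was the most widely used recommendation mechanism when recommendation systems first appeared. Its core idea is to use the description information of videos to discover the correlations between them, and then to recommend similar videos to users based on their past preference records. Its advantages are that it is easy to implement; that it needs no user data and therefore has no sparsity or cold-start problem; and that, because it relies on the features of the video itself, it does not over-recommend popular items.
The disadvantages of video-based collaborative filtering are: (1) the recommendation quality depends on the amount and accuracy of users' historical preference data; (2) user history and preferences are stored in a sparse matrix, computation on a sparse matrix is problematic, and the erroneous preferences of a small number of users have a large impact on accuracy; (3) because the numbers of users and videos are very large, computation on the user-video matrix is very expensive, which makes real-time recommendation difficult to implement. The disadvantages of content-based recommendation are: (1) the description information of a video may be missing, making it impossible to extract video attributes; (2) the extracted video features must be both accurate and practically meaningful, otherwise the relevance of the recommendation results is hard to guarantee.
The same defects also appear when recommending other content such as music, news and merchandise.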
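Summary of the Invention
The technical problem to be solved by the present invention is to provide a content recommendation method and a content recommendation system that can recommend content accurately without using the description information or attributes of either the content or the user. The present invention is implemented as follows:
A content recommendation method, including the following steps:
Step A: Obtain the content viewing history data of all users; each user's content viewing history data includes all the content viewed by that user and the viewing time of each content;
Step B: Sort all the content viewed by each user in order of viewing time to obtain each user's historical viewing content sequence;
Step C: Perform continuous bag-of-words (CBOW) model training on each user's historical viewing content sequence to obtain a CBOW model, thereby obtaining a content vector for each content;
Step D: Obtain the set of content that the target user viewed within a preset time window;
Step E: For each content in the set, extract from all the content viewed by all users a first preset number of contents whose content vectors are most similar to that content;
Step F: Calculate the target user's degree of interest in each of the extracted contents. Let the target user be u, the set be M, an extracted content be j, and N be the set of a second preset number of contents whose content vectors are most similar to content j; the degree of interest P_uj of target user u in content j is then calculated as:
$P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$
where W_ij denotes the content vector similarity between content i and content j, and P_ui denotes the degree of interest of target user u in content i;
Step G: According to the target user's degree of interest in each content, extract from all the extracted contents a third preset number of contents in which the target user's interest is highest, and recommend them to the target user.
Further, the content is a video, music, news or merchandise on the network, and viewing means clicking a link to the content.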
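Further, step C includes the following steps:
Step C1: Establish the input matrix V and the output matrix U of the CBOW model and randomly initialize them, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension;
Step C2: Select a content x_c from each user's historical viewing content sequence as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read. The one-hot codes of the 2m contents are denoted:
$x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
Step C3: Multiply each of the 2m one-hot codes by the input matrix to obtain the input content vectors of the 2m contents:
$v_{c-m} = V x^{(c-m)}, \ldots, v_{c-1} = V x^{(c-1)}, v_{c+1} = V x^{(c+1)}, \ldots, v_{c+m} = V x^{(c+m)}$, where v_i denotes the input content vector of content w_i;
Step C4: Average the input content vectors of the 2m contents:
$\hat{v} = \frac{v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}}{2m}$
Step C5: Calculate the score vector from the average:
$z = U\hat{v}$
Step C6: Convert the score vector into a probability distribution:
$\hat{y} = \mathrm{softmax}(z)$
Step C7: Using cross entropy as the objective function, calculate the error between the content vector of the center content in the output matrix U and the probability distribution:
$H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$
where $\hat{y}$ is the probability distribution obtained in step C6 and y is the content vector of the center content in the output matrix U;
Step C8: Obtain the final optimization objective function from the error:
$\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{\top}\hat{v} + \log \sum_{j=1}^{|V|} \exp(u_j^{\top}\hat{v})$
where u_i denotes the output content vector of content w_i;
Step C9: Use gradient descent to update the content vector of the center content in the output matrix and the content vectors of the 2m contents in the input matrix, obtaining the final input matrix V and output matrix U and thereby the CBOW model.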
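The gradient-descent update of step C9 can be made concrete with a short derivation. This is a sketch under stated assumptions, not text from the patent: it assumes the standard full-softmax CBOW objective $J = -u_c^{\top}\hat{v} + \log\sum_{j}\exp(u_j^{\top}\hat{v})$, a one-hot target $y$ for the center content, and a learning rate $\alpha$, a symbol introduced here for illustration.

```latex
\begin{aligned}
\hat{y}_j &= \frac{\exp(u_j^{\top}\hat{v})}{\sum_{k=1}^{|V|}\exp(u_k^{\top}\hat{v})},
\qquad y_j = \begin{cases} 1 & j = c \\ 0 & j \neq c \end{cases} \\[4pt]
\frac{\partial J}{\partial u_j} &= (\hat{y}_j - y_j)\,\hat{v} \\[4pt]
\frac{\partial J}{\partial \hat{v}} &= \sum_{j=1}^{|V|} (\hat{y}_j - y_j)\,u_j = U^{\top}(\hat{y} - y) \\[4pt]
\frac{\partial J}{\partial v_k} &= \frac{1}{2m}\,U^{\top}(\hat{y} - y)
\quad \text{for each context content } k \\[4pt]
u_j &\leftarrow u_j - \alpha\,(\hat{y}_j - y_j)\,\hat{v},
\qquad
v_k \leftarrow v_k - \frac{\alpha}{2m}\,U^{\top}(\hat{y} - y)
\end{aligned}
```

The text names only the center content's output vector as being updated. Under the full-softmax objective assumed here the gradient is nonzero for every row of U, so which rows are updated in practice is left open.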
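A content recommendation system, including:
a content viewing history data obtaining module, configured to obtain the content viewing history data of all users and the set of content viewed by the target user within a preset time window; each user's content viewing history data includes all the content viewed by that user and the viewing time of each content;
a historical viewing content sequence generating module, configured to sort all the content viewed by each user in order of viewing time to obtain each user's historical viewing content sequence;
a continuous bag-of-words (CBOW) model training module, configured to perform CBOW model training on each user's historical viewing content sequence to obtain a CBOW model, thereby obtaining a content vector for each content;
a similar content extraction module, configured to extract, for each content in the set, a first preset number of contents whose content vectors are most similar to that content from all the content viewed by all users;
an interest degree calculation module, configured to calculate the target user's degree of interest in each of the extracted contents. Let the target user be u, the set be M, an extracted content be j, and N be the set of a second preset number of contents whose content vectors are most similar to content j; the degree of interest P_uj of target user u in content j is then calculated as:
$P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$
where W_ij denotes the content vector similarity between content i and content j, and P_ui denotes the degree of interest of target user u in content i;
a recommended content extraction module, configured to extract from all the extracted contents, according to the target user's degree of interest, a third preset number of contents in which the target user's interest is highest, and to recommend them to the target user.
Further, the content is a video, music, news or merchandise on the network, and viewing means clicking a link to the content.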
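Further, the CBOW model training module includes:
a matrix establishing module, configured to establish the input matrix V and the output matrix U of the CBOW model and randomly initialize them, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension;
a one-hot encoding module, configured to select a content x_c from each user's historical viewing content sequence as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read. The one-hot codes of the 2m contents are denoted:
$x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
an input content vector calculation module, configured to multiply each of the 2m one-hot codes by the input matrix to obtain the input content vectors of the 2m contents:
$v_{c-m} = V x^{(c-m)}, \ldots, v_{c-1} = V x^{(c-1)}, v_{c+1} = V x^{(c+1)}, \ldots, v_{c+m} = V x^{(c+m)}$, where v_i denotes the input content vector of content w_i;
a vector average calculation module, configured to average the input content vectors of the 2m contents:
$\hat{v} = \frac{v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}}{2m}$
a score vector calculation module, configured to calculate the score vector from the average:
$z = U\hat{v}$
a probability distribution conversion module, configured to convert the score vector into a probability distribution:
$\hat{y} = \mathrm{softmax}(z)$
an error calculation module, configured to calculate, using cross entropy as the objective function, the error between the content vector of the center content in the output matrix U and the probability distribution:
$H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$
where $\hat{y}$ is the probability distribution obtained in step C6 and y is the content vector of the center content in the output matrix U;
an optimization objective function generation module, configured to obtain the final optimization objective function from the error:
$\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{\top}\hat{v} + \log \sum_{j=1}^{|V|} \exp(u_j^{\top}\hat{v})$
where u_i denotes the output content vector of content w_i;
a CBOW model generating module, configured to use gradient descent to update the content vector of the center content in the output matrix and the content vectors of the 2m contents in the input matrix, obtaining the final input matrix V and output matrix U and thereby the CBOW model.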
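The present invention uses the continuous bag-of-words (CBOW) model from natural language processing: it trains a CBOW model on each user's historical viewing content sequence to obtain a content vector for each content, uses the content vectors to find contents similar to those the target user has viewed, calculates the target user's degree of interest in each similar content, and finally extracts a preset number of contents in which the target user's interest is highest and recommends them to the target user. Compared with the prior art, the present invention uses no description information, attributes or tags of the content or the user, so missing content or user information does not degrade the robustness of the algorithm. The present invention is also stated to compute far faster than collaborative filtering and content-based recommendation algorithms. In addition, the present invention represents contents as vectors of equal length, which works with a variety of off-the-shelf similarity algorithms.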
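Brief Description of the Drawings
FIG. 1: schematic overall flow chart of the content recommendation method provided by Embodiment 1 of the present invention;
FIG. 2: example of how a user's degree of interest in content is calculated in the content recommendation method provided by Embodiment 1;
FIG. 3: schematic flow chart of the CBOW model training in the content recommendation method provided by Embodiment 1;
FIG. 4: schematic diagram of the overall composition of the content recommendation system provided by Embodiment 2 of the present invention;
FIG. 5: schematic diagram of the composition of the CBOW model training module in the content recommendation system provided by Embodiment 2.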
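Detailed Description
To make the objects, technical solutions and advantages of the present invention clearer, the invention is described in further detail below with reference to the drawings and embodiments.
As shown in Figure 1, Embodiment 1 of the present invention provides a content recommendation method comprising the following steps:
Step A: Acquire the content viewing history data of all users; the content viewing history data of each user includes all contents viewed by that user and the viewing time of each content. The content may be a video, a piece of music, a news item or a commodity on a network, and viewing means clicking a link to the content. When the content is a video or music, clicking its link plays it; when the content is a news item, clicking the link displays the news text; clicking a commodity link displays the commodity information. The viewing time of a content is the moment at which it was viewed.
Step B: Sort all contents viewed by each user in chronological order of viewing time, to obtain a historical viewed-content sequence for each user.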
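Steps A and B can be illustrated with a minimal Python sketch. The record layout `(user, content, timestamp)` and the function name `build_sequences` are assumptions made for illustration; the text does not prescribe a data format.

```python
from collections import defaultdict

def build_sequences(view_log):
    """Group (user, content, timestamp) records by user and order them by time."""
    per_user = defaultdict(list)
    for user, content, timestamp in view_log:
        per_user[user].append((timestamp, content))
    # sorted() is stable, so contents with equal timestamps keep their log order.
    return {
        user: [content for _, content in sorted(events, key=lambda e: e[0])]
        for user, events in per_user.items()
    }

# Hypothetical viewing log: user u1 viewed A, then B, then C.
log = [("u1", "B", 20), ("u1", "A", 10), ("u2", "C", 5), ("u1", "C", 30)]
sequences = build_sequences(log)
```

Each resulting list plays the role of one "sentence" in the training described in Step C.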
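Step C: Perform continuous bag-of-words (CBOW) model training on the historical viewed-content sequence of each user, to obtain a CBOW model and thereby a content vector for each content. CBOW training is the core of the whole method: it takes a natural language processing algorithm, originally used for language processing, and applies it to content recommendation. A natural language processing algorithm obtains word vectors and a probability density function by learning from a training corpus. A word vector is a multi-dimensional real-valued vector that captures semantic and syntactic relations in natural language, and the cosine distance between word vectors represents the similarity between words. Each historical viewed-content sequence is treated as a sentence in natural language, and each content in the sequence as a word in that sentence. Training a language model on each user's historical viewed-content sequence yields a content vector for each content; the content vector is equivalent to the word vector obtained in natural language processing. The language model used in this embodiment is the CBOW model, a bag-of-words model that predicts or generates the center word from the words before and after it in a sentence. Taking the sentence "The cat jump over the puddle" as an example, the CBOW model takes {"The", "cat", "over", "the", "puddle"} as context and predicts or generates the center word "jump".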
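The context/center split in the example sentence can be reproduced with a short Python sketch. The function name `context_windows` is hypothetical, and truncating the window at the ends of a sequence is an assumption; the text only speaks of m contents before and after the center.

```python
def context_windows(sequence, m):
    """Yield (context, center) pairs with up to m items on each side of the center."""
    for c, center in enumerate(sequence):
        before = sequence[max(0, c - m):c]
        after = sequence[c + 1:c + 1 + m]
        yield before + after, center

sentence = ["The", "cat", "jump", "over", "the", "puddle"]
pairs = list(context_windows(sentence, m=3))
context, center = pairs[2]  # the pair whose center is "jump"
```

For a viewing history, `sentence` would be one user's historical viewed-content sequence and each element a content identifier.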
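Step D: Acquire the set of contents viewed by the target user within a preset time window. The time window is a period of time that can be preset as needed.
Step E: From all contents viewed by all users, extract, for each content in the set, a first preset number of contents whose content vectors are most similar to that content. In this embodiment, content vector similarity stands for content similarity: a high content vector similarity means the contents are highly similar, and a low content vector similarity means they are not.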
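A minimal sketch of Step E, assuming cosine similarity between content vectors (the text mentions cosine distance for word vectors but does not fix the similarity measure for this step). The function names and the toy two-dimensional vectors are illustrative only.

```python
import math

def cosine(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def most_similar(target, vectors, k):
    """Return the k contents whose vectors are most similar to that of `target`."""
    scored = [
        (cosine(vectors[target], vec), name)
        for name, vec in vectors.items()
        if name != target
    ]
    scored.sort(reverse=True)
    return [(name, score) for score, name in scored[:k]]

# Hypothetical content vectors of dimension n = 2.
vectors = {"A": [1.0, 0.0], "x": [0.9, 0.1], "y": [0.0, 1.0], "z": [-1.0, 0.0]}
top2 = most_similar("A", vectors, k=2)
```

Here `k` corresponds to the first preset number; a brute-force scan like this is only practical for small catalogues.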
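Step F: Calculate the target user's degree of interest in each of the extracted contents. Let the target user be u, the set be M, an extracted content be j, and the set of a second preset number of contents whose content vectors are most similar to content j be N; then the degree of interest $P_{uj}$ of target user u in content j is calculated as:
$$P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$$
where $W_{ij}$ denotes the content vector similarity between content i and content j, and $P_{ui}$ denotes the degree of interest of target user u in content i.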
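Figure 2 takes video as an example. Suppose the target user watched video A and video B within a certain time window; since the target user has watched both, the user's degree of interest in video A and video B can simply be set to 1. Feeding video A and video B into the trained model yields the 3 videos most similar to video A, namely videos u, v and x, and the 3 videos most similar to video B, namely videos x, y and z. Video x is similar to both video A and video B. As shown in Figure 2, the similarity between video A and video u is 0.7, between A and v is 0.6, and between A and x is 0.5; the similarity between video B and video x is 0.4, between B and y is 0.5, and between B and z is 0.6. According to the formula above, the target user's degree of interest in video u is 0.7*1=0.7, in video v is 0.6*1=0.6, in video x is 0.5*1+0.4*1=0.9, in video y is 0.5*1=0.5, and in video z is 0.6*1=0.6. Ranked from highest to lowest degree of interest, the videos are therefore ordered x>u>v=z>y.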
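The Figure 2 arithmetic can be checked with a small Python sketch. It accumulates $W_{ij} P_{ui}$ over the neighbour lists of the viewed videos, which matches the worked example; the function name `interest_scores` and the alphabetical tie-break between v and z are assumptions for illustration.

```python
def interest_scores(viewed_interest, neighbours):
    """Accumulate W_ij * P_ui for every candidate j reachable from a viewed item i."""
    scores = {}
    for i, p_ui in viewed_interest.items():
        for j, w_ij in neighbours[i].items():
            scores[j] = scores.get(j, 0.0) + w_ij * p_ui
    return scores

# Values taken from the Figure 2 example: interest in viewed videos A and B is 1.
viewed = {"A": 1.0, "B": 1.0}
neighbours = {
    "A": {"u": 0.7, "v": 0.6, "x": 0.5},
    "B": {"x": 0.4, "y": 0.5, "z": 0.6},
}
scores = interest_scores(viewed, neighbours)
# Highest interest first; ties (v and z) are broken alphabetically.
ranking = sorted(scores, key=lambda j: (-scores[j], j))
```

Taking the first few entries of `ranking` corresponds to selecting the third preset number of contents in Step G.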
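Step G: From all the extracted contents, extract a third preset number of contents in which the target user's degree of interest is highest, and recommend them to the target user.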
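As shown in Figure 3, Step C comprises the following steps:
Step C1: Establish the input matrix V and the output matrix U of the continuous bag-of-words model and randomly initialize both, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, n denotes the vector dimension, and $|V|$ is the number of distinct contents in the training set. First, the known parameters of the model are set up: all contents in the training set are one-hot encoded, and a content sequence is represented as a series of one-hot vectors that serve as the model input, denoted $x^{(c)}$. The model has a single output, the center content, denoted y; in the English sentence above, y is the known center word "jump". Next, the unknown parameters of the model are defined as two matrices, $V \in \mathbb{R}^{n \times |V|}$ and $U \in \mathbb{R}^{|V| \times n}$, where n can be chosen freely and is the dimension of the content vectors. V is the input matrix: when content $w_i$ (as a one-hot vector) is the model input, the i-th column of V is the n-dimensional content vector of $w_i$, denoted $v_i$. Similarly, U is the output matrix: when content $w_j$ (as a one-hot vector) is the model output, the j-th row of U is the n-dimensional content vector of $w_j$, denoted $u_j$. Two content vectors are thus learned for each content $w_i$: the output content vector $u_i$ and the input content vector $v_i$.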
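The relationship between one-hot codes and the columns of V can be seen in a small pure-Python sketch. The sizes, the initialization range and the helper names `one_hot` and `matvec` are assumptions chosen for illustration.

```python
import random

def one_hot(index, size):
    """One-hot code of the content with the given index."""
    return [1.0 if k == index else 0.0 for k in range(size)]

def matvec(matrix, vector):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, vector)) for row in matrix]

random.seed(0)
n, vocab = 3, 5  # n: vector dimension; vocab: number of distinct contents, |V|
# Input matrix V is n x |V|; output matrix U is |V| x n; both randomly initialized.
V = [[random.uniform(-0.5, 0.5) for _ in range(vocab)] for _ in range(n)]
U = [[random.uniform(-0.5, 0.5) for _ in range(n)] for _ in range(vocab)]

i = 2
v_i = matvec(V, one_hot(i, vocab))  # input content vector: column i of V
u_i = U[i]                          # output content vector: row i of U
```

Because multiplying by a one-hot code only selects a column, an implementation can index the matrix directly instead of performing the multiplication.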
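Step C2: Select a content $x^{(c)}$ from the historical viewed-content sequence of each user as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read, to obtain their one-hot codes, denoted:
$x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
Step C3: Multiply each of the 2m one-hot codes by the input matrix, to obtain the input content vectors of the 2m contents, denoted:
$v_{c-m} = Vx^{(c-m)}, \ldots, v_{c-1} = Vx^{(c-1)}, v_{c+1} = Vx^{(c+1)}, \ldots, v_{c+m} = Vx^{(c+m)}$. $v_i$ denotes the input content vector of content $w_i$.
Step C4: Compute the average $\hat{v}$ of the 2m input content vectors:
$$\hat{v} = \frac{1}{2m}\left(v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}\right)$$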
Step C5: Compute the score vector from the average:
$$z = U\hat{v}$$
Step C6: Convert the score vector into a probability distribution:
$$\hat{y} = \mathrm{softmax}(z)$$
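The forward pass of Steps C4 to C6 (average, score vector, softmax) can be sketched in pure Python under the standard CBOW formulation. The function names and the toy vectors are assumptions; subtracting the maximum score before exponentiating is a common numerical-stability measure that is not mentioned in the text.

```python
import math

def average(vectors):
    """Element-wise mean of the 2m input content vectors (Step C4)."""
    return [sum(column) / len(vectors) for column in zip(*vectors)]

def score_vector(U, v_hat):
    """z = U v_hat: one score per content (Step C5)."""
    return [sum(u_k * v_k for u_k, v_k in zip(row, v_hat)) for row in U]

def softmax(z):
    """Turn scores into a probability distribution (Step C6)."""
    shift = max(z)  # does not change the result, avoids overflow in exp
    exps = [math.exp(s - shift) for s in z]
    total = sum(exps)
    return [e / total for e in exps]

context_vectors = [[1.0, 0.0], [0.0, 1.0]]  # 2m = 2 input vectors, n = 2
U = [[1.0, 1.0], [0.0, 0.0], [-1.0, -1.0]]  # |V| = 3 output vectors
v_hat = average(context_vectors)
z = score_vector(U, v_hat)
y_hat = softmax(z)
```

The content whose output vector points in the same direction as `v_hat` receives the largest probability.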
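Step C7: Use cross entropy as the objective function to calculate the error between the content vector of the center content in the output matrix U and the probability distribution:
$$H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$$
where $\hat{y}$ is the probability distribution obtained in Step C6, and y is the content vector of the center content in the output matrix U.
Step C8: Obtain the final optimization objective function from the error:
$$\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{T}\hat{v} + \log \sum_{j=1}^{|V|} \exp\left(u_j^{T}\hat{v}\right)$$
where $u_i$ denotes the output content vector of content $w_i$.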
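The step from the cross-entropy error of Step C7 to the objective of Step C8 can be made explicit. The sketch below assumes, as in the standard CBOW formulation, that y is the one-hot vector of the center content $w_c$ (so $y_c = 1$ and every other entry is 0); the text itself does not state this.

```latex
\begin{aligned}
H(\hat{y}, y) &= -\sum_{j=1}^{|V|} y_j \log \hat{y}_j = -\log \hat{y}_c \\
&= -\log \frac{\exp\left(u_c^{T}\hat{v}\right)}{\sum_{j=1}^{|V|} \exp\left(u_j^{T}\hat{v}\right)} \\
&= -u_c^{T}\hat{v} + \log \sum_{j=1}^{|V|} \exp\left(u_j^{T}\hat{v}\right)
\end{aligned}
```

Under that assumption, minimizing the cross entropy is the same as maximizing the predicted probability of the center content given its 2m context contents.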
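Step C9: Update, by gradient descent, the content vector of the center content in the output matrix and the content vectors corresponding to the 2m contents in the input matrix, to obtain the final input matrix V and output matrix U and thereby the continuous bag-of-words model.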
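One gradient-descent step can be sketched in pure Python as follows. Several choices here are assumptions rather than the text's method: the function name `cbow_step`, the learning rate, storing both matrices as one row per content (so `V_in` is the transpose of the n x |V| input matrix), treating y as a one-hot vector, and using the full softmax gradient, which adjusts every output vector rather than only that of the center content.

```python
import math
import random

def cbow_step(V_in, U_out, context_ids, center_id, lr):
    """One gradient-descent step on one (context, center) pair; returns the loss."""
    n = len(V_in[0])
    count = len(context_ids)
    # Forward pass: average, scores, softmax.
    v_hat = [sum(V_in[i][k] for i in context_ids) / count for k in range(n)]
    z = [sum(u[k] * v_hat[k] for k in range(n)) for u in U_out]
    shift = max(z)
    exps = [math.exp(s - shift) for s in z]
    total = sum(exps)
    y_hat = [e / total for e in exps]
    loss = -math.log(y_hat[center_id])
    # Backward pass: err = y_hat - y, with y one-hot at the center content.
    err = [p - (1.0 if j == center_id else 0.0) for j, p in enumerate(y_hat)]
    grad_v_hat = [sum(err[j] * U_out[j][k] for j in range(len(U_out))) for k in range(n)]
    for j, u in enumerate(U_out):
        for k in range(n):
            u[k] -= lr * err[j] * v_hat[k]
    for i in context_ids:
        for k in range(n):
            V_in[i][k] -= lr * grad_v_hat[k] / count
    return loss

random.seed(0)
vocab, n = 4, 3
V_in = [[random.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(vocab)]
U_out = [[random.uniform(-0.1, 0.1) for _ in range(n)] for _ in range(vocab)]
# Content 2 is the center; contents 1 and 3 are its context (m = 1).
losses = [cbow_step(V_in, U_out, [1, 3], 2, lr=0.1) for _ in range(200)]
```

In practice the step would be applied over all (context, center) pairs from all users' sequences, and the rows of `V_in` would then serve as the content vectors.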
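As shown in Figure 4, Embodiment 2 of the present invention provides a content recommendation system comprising a content viewing history data acquisition module 1, a historical viewed-content sequence generation module 2, a continuous bag-of-words model training module 3, a similar content extraction module 4, an interest calculation module 5 and a recommended content extraction module 6, where:
The content viewing history data acquisition module 1 is configured to acquire the content viewing history data of all users and the set of contents viewed by the target user within a preset time window. The content viewing history data of each user includes all contents viewed by that user and the viewing time of each content.
The historical viewed-content sequence generation module 2 is configured to sort all contents viewed by each user in chronological order of viewing time, to obtain a historical viewed-content sequence for each user.
The continuous bag-of-words model training module 3 is configured to perform continuous bag-of-words model training on the historical viewed-content sequence of each user, to obtain a continuous bag-of-words model and thereby a content vector for each content.
The similar content extraction module 4 is configured to extract, from all contents viewed by all users and for each content in the set, a first preset number of contents whose content vectors are most similar to that content.
The interest calculation module 5 is configured to calculate the target user's degree of interest in each of the extracted contents. Let the target user be u, the set be M, an extracted content be j, and the set of a second preset number of contents whose content vectors are most similar to content j be N; then the degree of interest $P_{uj}$ of target user u in content j is calculated as:
$$P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$$
where $W_{ij}$ denotes the content vector similarity between content i and content j, and $P_{ui}$ denotes the degree of interest of target user u in content i.
The recommended content extraction module 6 is configured to extract, from all the extracted contents, a third preset number of contents in which the target user's degree of interest is highest, and to recommend them to the target user.
The content may be a video, a piece of music, a news item or a commodity on a network, and viewing means clicking a link to the content.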
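As shown in Figure 5, the continuous bag-of-words model training module 3 comprises a matrix establishing module 301, a one-hot encoding module 302, an input content vector calculation module 303, a vector average calculation module 304, a score vector calculation module 305, a probability distribution conversion module 306, an error calculation module 307, an optimization objective function generation module 308 and a continuous bag-of-words model generation module 309, where:
The matrix establishing module 301 is configured to establish the input matrix V and the output matrix U of the continuous bag-of-words model and to randomly initialize both, where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension.
The one-hot encoding module 302 is configured to select a content $x^{(c)}$ from the historical viewed-content sequence of each user as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read, to obtain their one-hot codes, denoted:
$x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
The input content vector calculation module 303 is configured to multiply each of the 2m one-hot codes by the input matrix, to obtain the input content vectors of the 2m contents, denoted:
$v_{c-m} = Vx^{(c-m)}, \ldots, v_{c-1} = Vx^{(c-1)}, v_{c+1} = Vx^{(c+1)}, \ldots, v_{c+m} = Vx^{(c+m)}$. $v_i$ denotes the input content vector of content $w_i$.
The vector average calculation module 304 is configured to compute the average $\hat{v}$ of the 2m input content vectors:
$$\hat{v} = \frac{1}{2m}\left(v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}\right)$$
The score vector calculation module 305 is configured to compute the score vector from the average:
$$z = U\hat{v}$$
The probability distribution conversion module 306 is configured to convert the score vector into a probability distribution:
$$\hat{y} = \mathrm{softmax}(z)$$
The error calculation module 307 is configured to use cross entropy as the objective function to calculate the error between the content vector of the center content in the output matrix U and the probability distribution:
$$H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$$
where $\hat{y}$ is the probability distribution produced by the probability distribution conversion module 306 (Step C6 of Embodiment 1), and y is the content vector of the center content in the output matrix U.
The optimization objective function generation module 308 is configured to obtain the final optimization objective function from the error:
$$\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{T}\hat{v} + \log \sum_{j=1}^{|V|} \exp\left(u_j^{T}\hat{v}\right)$$
where $u_i$ denotes the output content vector of content $w_i$.
The continuous bag-of-words model generation module 309 is configured to update, by gradient descent, the content vector of the center content in the output matrix and the content vectors corresponding to the 2m contents in the input matrix, to obtain the final input matrix V and output matrix U and thereby the continuous bag-of-words model.
The modules of the content recommendation system of Embodiment 2 correspond to the steps of the content recommendation method of Embodiment 1; for their operation, refer to the description of the corresponding steps in Embodiment 1.
The above are merely preferred embodiments of the present invention and are not intended to limit it; any modification, equivalent substitution or improvement made within the spirit and principles of the present invention shall fall within its scope of protection.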

Claims (6)

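  1. A content recommendation method, characterized by comprising the following steps:
    Step A: acquiring the content viewing history data of all users, the content viewing history data of each user including all contents viewed by that user and the viewing time of each content;
    Step B: sorting all contents viewed by each user in chronological order of viewing time, to obtain a historical viewed-content sequence for each user;
    Step C: performing continuous bag-of-words model training on the historical viewed-content sequence of each user, to obtain a continuous bag-of-words model and thereby a content vector for each content;
    Step D: acquiring the set of contents viewed by a target user within a preset time window;
    Step E: extracting, from all contents viewed by all users and for each content in the set, a first preset number of contents whose content vectors are most similar to that content;
    Step F: calculating the target user's degree of interest in each of the extracted contents; letting the target user be u, the set be M, an extracted content be j, and the set of a second preset number of contents whose content vectors are most similar to content j be N, the degree of interest $P_{uj}$ of target user u in content j is calculated as:
    $$P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$$
    where $W_{ij}$ denotes the content vector similarity between content i and content j, and $P_{ui}$ denotes the degree of interest of target user u in content i;
    Step G: extracting, from all the extracted contents, a third preset number of contents in which the target user's degree of interest is highest, and recommending them to the target user.
  2. The content recommendation method according to claim 1, characterized in that the content is a video, a piece of music, a news item or a commodity on a network, and the viewing is clicking a link to the content.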
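  3. The content recommendation method according to claim 1, characterized in that Step C comprises the following steps:
    Step C1: establishing the input matrix V and the output matrix U of the continuous bag-of-words model, and randomly initializing the input matrix V and the output matrix U; where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension;
    Step C2: selecting a content $x^{(c)}$ from the historical viewed-content sequence of each user as the center content, reading the m contents before and the m contents after the center content, and one-hot encoding the 2m contents read, to obtain their one-hot codes, denoted:
    $x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
    Step C3: multiplying each of the 2m one-hot codes by the input matrix, to obtain the input content vectors of the 2m contents, denoted:
    $v_{c-m} = Vx^{(c-m)}, \ldots, v_{c-1} = Vx^{(c-1)}, v_{c+1} = Vx^{(c+1)}, \ldots, v_{c+m} = Vx^{(c+m)}$; $v_i$ denotes the input content vector of content $w_i$;
    Step C4: computing the average $\hat{v}$ of the 2m input content vectors:
    $$\hat{v} = \frac{1}{2m}\left(v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}\right)$$
    Step C5: computing the score vector z from the average:
    $$z = U\hat{v}$$
    Step C6: converting the score vector into a probability distribution $\hat{y}$:
    $$\hat{y} = \mathrm{softmax}(z)$$
    Step C7: using cross entropy as the objective function to calculate the error between the content vector of the center content in the output matrix U and the probability distribution:
    $$H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$$
    where $\hat{y}$ is the probability distribution obtained in Step C6, and y is the content vector of the center content in the output matrix U;
    Step C8: obtaining the final optimization objective function from the error:
    $$\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{T}\hat{v} + \log \sum_{j=1}^{|V|} \exp\left(u_j^{T}\hat{v}\right)$$
    where $u_i$ denotes the output content vector of content $w_i$;
    Step C9: updating, by gradient descent, the content vector of the center content in the output matrix and the content vectors corresponding to the 2m contents in the input matrix, to obtain the final input matrix V and output matrix U and thereby the continuous bag-of-words model.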
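  4. A content recommendation system, characterized by comprising:
    a content viewing history data acquisition module, configured to acquire the content viewing history data of all users and the set of contents viewed by a target user within a preset time window; the content viewing history data of each user includes all contents viewed by that user and the viewing time of each content;
    a historical viewed-content sequence generation module, configured to sort all contents viewed by each user in chronological order of viewing time, to obtain a historical viewed-content sequence for each user;
    a continuous bag-of-words model training module, configured to perform continuous bag-of-words model training on the historical viewed-content sequence of each user, to obtain a continuous bag-of-words model and thereby a content vector for each content;
    a similar content extraction module, configured to extract, from all contents viewed by all users and for each content in the set, a first preset number of contents whose content vectors are most similar to that content;
    an interest calculation module, configured to calculate the target user's degree of interest in each of the extracted contents; letting the target user be u, the set be M, an extracted content be j, and the set of a second preset number of contents whose content vectors are most similar to content j be N, the degree of interest $P_{uj}$ of target user u in content j is calculated as:
    $$P_{uj} = \sum_{i \in M \cap N} W_{ij} P_{ui}$$
    where $W_{ij}$ denotes the content vector similarity between content i and content j, and $P_{ui}$ denotes the degree of interest of target user u in content i;
    a recommended content extraction module, configured to extract, from all the extracted contents, a third preset number of contents in which the target user's degree of interest is highest, and to recommend them to the target user.
  5. The content recommendation system according to claim 4, characterized in that the content is a video, a piece of music, a news item or a commodity on a network, and the viewing is clicking a link to the content.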
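  6. The content recommendation system according to claim 4, characterized in that the continuous bag-of-words model training module comprises:
    a matrix establishing module, configured to establish the input matrix V and the output matrix U of the continuous bag-of-words model and to randomly initialize the input matrix V and the output matrix U; where $V \in \mathbb{R}^{n \times |V|}$, $U \in \mathbb{R}^{|V| \times n}$, and n denotes the vector dimension;
    a one-hot encoding module, configured to select a content $x^{(c)}$ from the historical viewed-content sequence of each user as the center content, read the m contents before and the m contents after the center content, and one-hot encode the 2m contents read, to obtain their one-hot codes, denoted:
    $x^{(c-m)}, \ldots, x^{(c-1)}, x^{(c+1)}, \ldots, x^{(c+m)}$
    an input content vector calculation module, configured to multiply each of the 2m one-hot codes by the input matrix, to obtain the input content vectors of the 2m contents, denoted:
    $v_{c-m} = Vx^{(c-m)}, \ldots, v_{c-1} = Vx^{(c-1)}, v_{c+1} = Vx^{(c+1)}, \ldots, v_{c+m} = Vx^{(c+m)}$; $v_i$ denotes the input content vector of content $w_i$;
    a vector average calculation module, configured to compute the average $\hat{v}$ of the 2m input content vectors:
    $$\hat{v} = \frac{1}{2m}\left(v_{c-m} + \cdots + v_{c-1} + v_{c+1} + \cdots + v_{c+m}\right)$$
    a score vector calculation module, configured to compute the score vector z from the average:
    $$z = U\hat{v}$$
    a probability distribution conversion module, configured to convert the score vector into a probability distribution $\hat{y}$:
    $$\hat{y} = \mathrm{softmax}(z)$$
    an error calculation module, configured to use cross entropy as the objective function to calculate the error between the content vector of the center content in the output matrix U and the probability distribution:
    $$H(\hat{y}, y) = -\sum_{j=1}^{|V|} y_j \log \hat{y}_j$$
    where $\hat{y}$ is the probability distribution produced by the probability distribution conversion module, and y is the content vector of the center content in the output matrix U;
    an optimization objective function generation module, configured to obtain the final optimization objective function from the error:
    $$\min J = -\log P(w_c \mid w_{c-m}, \ldots, w_{c-1}, w_{c+1}, \ldots, w_{c+m}) = -u_c^{T}\hat{v} + \log \sum_{j=1}^{|V|} \exp\left(u_j^{T}\hat{v}\right)$$
    where $u_i$ denotes the output content vector of content $w_i$;
    a continuous bag-of-words model generation module, configured to update, by gradient descent, the content vector of the center content in the output matrix and the content vectors corresponding to the 2m contents in the input matrix, to obtain the final input matrix V and output matrix U and thereby the continuous bag-of-words model.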
PCT/CN2016/110741 2016-12-19 2016-12-19 Content recommendation method and content recommendation system WO2018112696A1 (zh)

Priority Applications (1)

Application Number Priority Date Filing Date Title
PCT/CN2016/110741 WO2018112696A1 (zh) 2016-12-19 2016-12-19 Content recommendation method and content recommendation system

Publications (1)

Publication Number Publication Date
WO2018112696A1 true WO2018112696A1 (zh) 2018-06-28

Family

ID=62624046

Country Status (1)

Country Link
WO (1) WO2018112696A1 (zh)

Legal Events

Date Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application

Ref document number: 16924251

Country of ref document: EP

Kind code of ref document: A1

NENP Non-entry into the national phase

Ref country code: DE

32PN Ep: public notification in the ep bulletin as address of the adressee cannot be established

Free format text: NOTING OF LOSS OF RIGHTS PURSUANT TO RULE 112(1) EPC (EPO FORM 1205A DATED 06.09.2019)

122 Ep: pct application non-entry in european phase

Ref document number: 16924251

Country of ref document: EP

Kind code of ref document: A1