JP2014225091A

JP2014225091A - Recommendation server and feature dictionary generation method

Info

Publication number: JP2014225091A
Application number: JP2013103312A
Authority: JP
Inventors: 圭五十嵐; Kei Igarashi
Original assignee: NTT Docomo Inc
Current assignee: NTT Docomo Inc
Priority date: 2013-05-15
Filing date: 2013-05-15
Publication date: 2014-12-04

Abstract

PROBLEM TO BE SOLVED: To automatically and appropriately associate a feature word with each dimension of a feature vector in a case where a feature dictionary associating a plurality of feature words with each dimension of the feature vector is defined.

SOLUTION: A recommendation server performs: processing of calculating in-user history co-occurrence intensity with another word, for each in-user history co-occurrence source word extracted from data indicating content; processing of selecting one or more in-user history co-occurrence destination words subjected to in-user history co-occurrence with the in-user history co-occurrence source word in a descending order of in-user history co-occurrence intensity, and collectively setting them as a feature word corresponding to each dimension; and processing of generating a feature dictionary constituted by the set feature words.

Description

The present invention relates to a recommendation server that provides recommended content to a plurality of user terminals, and to a feature dictionary generation method for creating a feature dictionary used to determine the recommended content. Hereinafter, a user to whom recommended content is provided is referred to as a “target user”.

In recommendation technology, the preferences of each target user are learned from the content that the target user has used or viewed in the past, and content whose features are similar to those preferences is provided. One technique for achieving this is described in Non-Patent Document 1. An example of an existing recommendation technique is described below.

This recommendation technique provides recommended content by performing the following three steps.
(Step 1) The features of each content are expressed as a content feature vector.
(Step 2) The features of the target user are expressed as a user feature vector based on the target user's viewing history.
(Step 3) The similarity between each content feature vector and the user feature vector is calculated, and content with high similarity is selected as a recommendation target.

In Step 1 above, vectorization is performed to characterize the content. In this vectorization, as described in Non-Patent Document 1, feature words are defined by a set of information called a feature dictionary. The frequency with which each defined feature word appears in the metadata associated with each content is then quantified and vectorized. The resulting vector is called a content feature vector.

More specifically, suppose that, as shown in FIG. 1, a feature dictionary is defined that associates the feature words “drama”, “sports”, “town”, “tears”, and “theater” with feature IDs “1” to “5”, respectively.

Suppose the metadata attached to a content 1 contains the feature word “drama” once, “tears” twice, and “theater” twice. The content feature vector of content 1 is then {1/3, 0, 0, 2/3, 2/3}. Each component of a content feature vector is calculated as (number of occurrences of the feature word in the metadata) / (norm), where the norm is the square root of the sum of the squared occurrence counts of all feature words; here the norm is √(1² + 2² + 2²) = 3. Dividing by the norm normalizes the content feature vector to length 1.

If the metadata of another content 2 contains only the feature word “town”, the content feature vector of content 2 is {0, 0, 1, 0, 0}.
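The calculation for content 1 and content 2 can be reproduced with a short Python sketch. The function name `content_feature_vector` and the dictionary layout are illustrative assumptions rather than part of the described server, and the fifth feature word is written as "theater" to keep it distinct from the first.

```python
import math

# Single-word feature dictionary of FIG. 1: feature ID -> feature word.
FEATURE_DICTIONARY = {1: "drama", 2: "sports", 3: "town", 4: "tears", 5: "theater"}


def content_feature_vector(word_counts, dictionary=FEATURE_DICTIONARY):
    """Return the occurrence count per feature ID, scaled to unit length."""
    counts = [word_counts.get(dictionary[fid], 0) for fid in sorted(dictionary)]
    # Norm = square root of the sum of squared occurrence counts.
    norm = math.sqrt(sum(c * c for c in counts))
    if norm == 0:
        return [0.0] * len(counts)
    return [c / norm for c in counts]


# Content 1: "drama" once, "tears" twice, "theater" twice (norm = 3).
content_1 = content_feature_vector({"drama": 1, "tears": 2, "theater": 2})
# Content 2: only "town".
content_2 = content_feature_vector({"town": 1})
print(content_1)
print(content_2)
```

Returning an all-zero vector when no feature word occurs is a choice made here to avoid division by zero; the source does not discuss that case.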

In Step 2, the features of the user are vectorized based on the user's history. The resulting vector is called a user feature vector. Several methods of calculating user feature vectors have been proposed; one example, described in Non-Patent Document 2, uses an SVM (Support Vector Machine). The SVM-based calculation is described below.

For example, if user A has viewed content 1 and content 2, the user feature vector {0.58, 0, 0.58, 0.58, 0} is obtained. This vector has positive values in the first and fourth dimensions of the feature dictionary, which correspond to the feature words “drama” and “tears” contained in the metadata of content 1, and in the third dimension, which corresponds to the feature word “town” contained in the metadata of content 2.

If a user B has viewed only content 2, the user feature vector is {0, 0, 1, 0, 0}, which has a positive value only in the third dimension of the feature dictionary, corresponding to the feature word “town” of content 2.

Finally, in Step 3, the content to recommend is determined according to the features of the target user. The similarity between the user feature vector of the target user and the content feature vector of each content is calculated, and content with high similarity is selected. The similarity is calculated, for example, as the inner product of the user feature vector and the content feature vector.
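A minimal sketch of Step 3, assuming the inner product as the similarity measure, might look like the following. The `recommend` function, the rule of keeping only positive scores, and the optional `watched` filter are illustrative assumptions; the text only states that content with high similarity is selected.

```python
def inner_product(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def recommend(user_vector, content_vectors, watched=()):
    """Rank unwatched contents with positive similarity, best first."""
    scored = [
        (inner_product(user_vector, vec), cid)
        for cid, vec in content_vectors.items()
        if cid not in watched
    ]
    ranked = sorted(scored, key=lambda pair: (-pair[0], pair[1]))
    return [cid for score, cid in ranked if score > 0]


contents = {
    "content 1": [1 / 3, 0, 0, 2 / 3, 2 / 3],
    "content 2": [0, 0, 1, 0, 0],
}
user_b = [0, 0, 1, 0, 0]  # user B has viewed only content 2

print(recommend(user_b, contents))
print(recommend(user_b, contents, watched={"content 2"}))
```

For user B, only content 2 has a positive inner product, so the second call, which excludes already-viewed content, returns an empty list.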

Non-Patent Document 1: 脊戸柳昌宏, 深澤佑介, 野口勝広, 宮川聡, 高橋良之, “Recommendations Focusing on Similarity of Item Shapes in Social Games”, IEICE SIG Notes WI-2011-41, pp. 69-74, 2011.
Non-Patent Document 2: Y. Fukazawa, M. Hara, M. Onogi, H. Ueno, “Automatic Mobile Menu Customization Based on User Operation History”, Mobile HCI 2009.

A problem can arise, however, when determining the recommended content for user B described above. Consider a case where content 2 is the only recommendation candidate whose metadata contains the feature word “town”. As noted above, the user feature vector of user B is {0, 0, 1, 0, 0}. Content 2 is therefore the only content with high similarity, that is, the only content for which the inner product of user B's user feature vector and the content feature vector is positive. However, user B has already viewed content 2, so recommending it has no value as a service.

One way to solve this problem is to associate a plurality of feature words with each dimension of the feature vector. FIG. 2 shows a feature dictionary defined in this way. In this feature dictionary, “drama”, “domestic”, “overseas”, and “movie” are associated with feature ID 1 as feature words 1 to 4, and “sports”, “baseball”, “basketball”, and “table tennis” are associated with feature ID 2 as feature words 1 to 4. Because a plurality of feature words can be associated with a single dimension, each dimension of the feature vector can express the features of more content.
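The effect of grouping several feature words under one feature ID can be sketched as follows. Summing the occurrence counts of all words that share a dimension is an assumption made for illustration; the text does not specify how counts within one dimension are combined. The sample contents are hypothetical.

```python
import math

# Feature dictionary of FIG. 2: each feature ID groups several feature words.
FEATURE_DICTIONARY = {
    1: ["drama", "domestic", "overseas", "movie"],
    2: ["sports", "baseball", "basketball", "table tennis"],
}
WORD_TO_ID = {word: fid for fid, words in FEATURE_DICTIONARY.items() for word in words}


def content_feature_vector(word_counts):
    """Sum the counts of all words mapped to the same dimension, then normalize."""
    counts = {fid: 0 for fid in FEATURE_DICTIONARY}
    for word, n in word_counts.items():
        if word in WORD_TO_ID:
            counts[WORD_TO_ID[word]] += n
    norm = math.sqrt(sum(c * c for c in counts.values()))
    return [counts[fid] / norm if norm else 0.0 for fid in sorted(counts)]


# Two hypothetical contents that share no word but fall in the same dimension.
baseball_show = content_feature_vector({"baseball": 2})
table_tennis_show = content_feature_vector({"table tennis": 1, "sports": 1})
print(baseball_show, table_tennis_show)
```

Both vectors point along feature ID 2, so a user whose history contains only the baseball content would still have a positive inner product with the table tennis content.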

In the conventional method, however, the service operator manually grouped similar words and associated each group with a dimension of the feature dictionary. This work takes labor and time, and no method of grouping a plurality of feature words had been established.

Accordingly, an object of the present invention is to provide a recommendation server and a feature dictionary generation method that, when defining a feature dictionary that associates a plurality of feature words with each dimension of a feature vector, automatically and appropriately associate the feature words with each dimension of the feature vector.

A recommendation server according to the present invention comprises: user history co-occurrence strength calculation means for calculating, for each user history co-occurrence source word extracted from data representing content, a user history co-occurrence strength with each other word extracted from the data representing content; user history co-occurrence word setting means for selecting, in descending order of the user history co-occurrence strength calculated by the user history co-occurrence strength calculation means, one or more user history co-occurrence destination words that co-occur in user history with the user history co-occurrence source word, and for collectively setting the user history co-occurrence source word and the selected user history co-occurrence destination words as feature words corresponding to each dimension of a feature vector; and feature dictionary generation means for generating a feature dictionary composed of the feature words set by the user history co-occurrence word setting means as feature words corresponding to each dimension of the feature vector.

With this recommendation server, user history co-occurrence destination words that co-occur in user history with a user history co-occurrence source word are selected in descending order of user history co-occurrence strength and are set, together with the source word, as feature words corresponding to each dimension of the feature vector. A feature dictionary composed of the feature words set in this way is then generated. Because the generated feature dictionary maps groups of strongly related feature words to each dimension of the feature vector, feature words can be defined automatically and appropriately when defining a feature dictionary that associates a plurality of feature words with each dimension. Generating the feature dictionary in this way also makes it possible to recommend, to a target user who has viewed content whose data contains a given word, other content containing other words that are strongly related to (that is, co-occur in user history with) the given word, which widens the variety of content recommended to the target user.

The recommendation server may further comprise: inter-content co-occurrence strength calculation means for calculating, for each inter-content co-occurrence source word, which is a word not selected by the user history co-occurrence word setting means, an inter-content co-occurrence strength with each other word; and inter-content co-occurrence word setting means for selecting, in descending order of the inter-content co-occurrence strength calculated by the inter-content co-occurrence strength calculation means, one or more inter-content co-occurrence destination words that co-occur between contents with the inter-content co-occurrence source word, and for collectively setting the inter-content co-occurrence source word and the selected inter-content co-occurrence destination words as feature words corresponding to each dimension of the feature vector. The feature dictionary generation means may then generate a feature dictionary composed of the feature words set by the user history co-occurrence word setting means and the inter-content co-occurrence word setting means. In this case, a target user who has viewed content whose data contains a given word can also be recommended other content containing words that co-occur between contents with the given word, as words strongly related to it, which further widens the variety of content recommended to the target user.

In the recommendation server, the inter-content co-occurrence word setting means may select the inter-content co-occurrence source word and the inter-content co-occurrence destination words such that the number of contents whose data contains at least one of the words collectively set as feature words is equal to or greater than a predetermined threshold. In this case, the number of contents covered by each dimension of the feature vector is equal to or greater than the threshold, so each dimension can express the features of more content.

Likewise, the user history co-occurrence word setting means may select the user history co-occurrence source word and the user history co-occurrence destination words such that the number of contents whose data contains at least one of the words collectively set as feature words is equal to or greater than a predetermined threshold. In this case, the number of contents covered by each dimension of the feature vector is equal to or greater than the threshold, so each dimension can express the features of more content.
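One way to read the threshold condition is as a greedy loop: destination words are added in descending order of co-occurrence strength until the group covers at least the threshold number of contents. The sketch below assumes that reading. The function `group_words`, its stopping rules, and the sample data are hypothetical, and the same loop would apply to either kind of co-occurrence strength.

```python
def group_words(source, strengths, contents_by_word, min_contents):
    """Add destination words, strongest first, until enough contents are covered.

    strengths: destination word -> co-occurrence strength with `source`.
    contents_by_word: word -> set of content IDs whose data contains the word.
    """
    group = [source]
    covered = set(contents_by_word.get(source, ()))
    ranked = sorted(strengths.items(), key=lambda kv: (-kv[1], kv[0]))
    for word, strength in ranked:
        # At least one destination word is always selected before stopping.
        if len(group) > 1 and len(covered) >= min_contents:
            break
        if strength <= 0:
            break  # remaining words do not co-occur with the source word
        group.append(word)
        covered |= set(contents_by_word.get(word, ()))
    return group, covered


# Hypothetical sample data.
contents_by_word = {
    "drama": {"c1"},
    "tears": {"c1", "c2"},
    "town": {"c3"},
    "sports": {"c4"},
}
strengths = {"tears": 3, "town": 2, "sports": 0}

print(group_words("drama", strengths, contents_by_word, min_contents=3))
print(group_words("drama", strengths, contents_by_word, min_contents=1))
```

With a threshold of 3 the group grows to three words covering three contents; with a threshold of 1 it stops after the single strongest destination word.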

In the recommendation server, the user history co-occurrence strength may be the number of contents whose data contains the user history co-occurrence destination word, among the contents viewed by users whose viewing history includes content whose data contains the user history co-occurrence source word. In this case, feature words are grouped so that the number of contents whose data contains the destination words is large. The number of contents covered by each dimension of the feature vector therefore increases, making it possible to recommend more content to the user.

Alternatively, the user history co-occurrence strength may be the number of users whose viewing history includes content whose data contains the user history co-occurrence destination word, among the users whose viewing history includes content whose data contains the user history co-occurrence source word. In this case, feature words are grouped so that many users have viewed both content containing the source word and content containing the destination word. The more users have viewed two contents, the more strongly those two contents can be considered to be related. This configuration therefore makes it possible to recommend more strongly related content to the user.
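The two definitions of user history co-occurrence strength can be compared in a small sketch. The data layout (`views` mapping user ID to viewed content IDs, `words_by_content` mapping content ID to words) and the function names are assumptions for illustration. Counting each content only once in the first definition is also an assumption, since the text does not say whether a content viewed by several users counts more than once.

```python
def users_who_viewed_word(word, views, words_by_content):
    """Users whose viewing history includes a content containing `word`."""
    return {
        user
        for user, content_ids in views.items()
        if any(word in words_by_content[cid] for cid in content_ids)
    }


def strength_by_content_count(source, dest, views, words_by_content):
    """Contents containing `dest` among those viewed by users who viewed `source`."""
    users = users_who_viewed_word(source, views, words_by_content)
    viewed = set().union(*(views[user] for user in users))
    return sum(1 for cid in viewed if dest in words_by_content[cid])


def strength_by_user_count(source, dest, views, words_by_content):
    """Users who viewed content containing `source` and content containing `dest`."""
    return len(
        users_who_viewed_word(source, views, words_by_content)
        & users_who_viewed_word(dest, views, words_by_content)
    )


# Hypothetical sample data.
words_by_content = {
    "c1": {"drama", "tears"},
    "c2": {"town"},
    "c3": {"town", "sports"},
}
views = {"A": ["c1", "c2"], "B": ["c2"], "C": ["c1", "c3"]}

print(strength_by_content_count("drama", "town", views, words_by_content))
print(strength_by_user_count("drama", "town", views, words_by_content))
```

In the sample data, users A and C viewed a "drama" content; between them they viewed two contents containing "town" (c2 and c3), and both of them viewed at least one such content, so both definitions give 2 here.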

A feature dictionary generation method according to the present invention is a feature dictionary generation method executed by a recommendation server, and comprises: a user history co-occurrence strength calculation step of calculating, for each user history co-occurrence source word extracted from data representing content, a user history co-occurrence strength with each other word extracted from the data representing content; a user history co-occurrence word setting step of selecting, in descending order of the user history co-occurrence strength calculated in the user history co-occurrence strength calculation step, one or more user history co-occurrence destination words that co-occur in user history with the user history co-occurrence source word, and collectively setting the user history co-occurrence source word and the selected user history co-occurrence destination words as feature words corresponding to each dimension of a feature vector; and a feature dictionary generation step of generating a feature dictionary composed of the feature words set in the user history co-occurrence word setting step as feature words corresponding to each dimension of the feature vector.

In this specification, a co-occurrence source word and a co-occurrence destination word “co-occur in user history” when a user who has viewed content whose data contains the source word has also viewed content that contains the destination word. A source word and a destination word “co-occur between contents” when the data representing a single content contains both the source word and the destination word.

According to the present invention, a recommendation server and a feature dictionary generation method are provided that, when defining a feature dictionary that associates a plurality of feature words with each dimension of a feature vector, automatically and appropriately associate the feature words with each dimension of the feature vector.

A diagram showing an example of a feature dictionary.
A diagram showing an example of a feature dictionary in which a plurality of feature words are associated with each dimension.
A diagram showing a system including a recommendation server and user terminals according to an embodiment of the invention.
A functional block diagram of the recommendation server.
A flowchart showing the processing executed by the recommendation server.
A diagram showing feature word selection processing based on user history co-occurrence.
A diagram showing feature word selection processing based on inter-content co-occurrence.
A diagram showing an example of a content ID/word correspondence table.
A diagram showing an example of a viewing log table.
A diagram showing an example of a user history co-occurrence content table.
A diagram showing an example of a user history co-occurrence word table.
A diagram showing a user history co-occurrence word table from which words that do not appear in the user history have been deleted.
A diagram showing a sorted user history co-occurrence word table.
A diagram showing the sorted user history co-occurrence word table after the feature words to be grouped into the same dimension have been determined.
A diagram showing a modification of the feature word selection processing based on user history co-occurrence.
A diagram showing an example of an inter-content co-occurrence table.
A diagram showing an example of the hardware configuration of the recommendation server.

Embodiments of the present invention are described below with reference to the drawings. The following embodiment assumes that, as shown in FIG. 3, one recommendation server 100 communicates with a plurality of user terminals 11 to 18 (eight, as an example). The users of the user terminals 11 to 18 are assigned user IDs A to H in advance, and these users (users with user IDs A to H) are treated as the target users below. Communication between the recommendation server 100 and the user terminals 11 to 18 may take place over a wired or a wireless communication network.

[Configuration of recommendation server]
FIG. 4 shows the configuration of the recommendation server 100 in this embodiment. As shown in FIG. 4, the recommendation server 100 includes a morphological analysis unit 101, a metadata management unit 102, a content/word correspondence management unit 103, a user history storage unit 104, a user history co-occurrence word calculation unit 105 (user history co-occurrence strength calculation means), a user history co-occurrence word table creation unit 106 (user history co-occurrence word setting means), an inter-content co-occurrence word calculation unit 107 (inter-content co-occurrence strength calculation means), an inter-content co-occurrence word table creation unit 108 (inter-content co-occurrence word setting means), and a feature dictionary management unit 109 (feature dictionary generation means). The functions and operation of each unit are described below.

The metadata management unit 102 holds metadata, which is data that represents content and is associated with each content, and outputs the metadata to the morphological analysis unit 101. The metadata includes, for example, a content ID and information about the content (for example, the title, genre, performers, and a detailed description of the content).

The morphological analysis unit 101 receives metadata from the metadata management unit 102 and performs morphological analysis on it. The morphological analysis unit 101 then outputs the words extracted from the metadata by the morphological analysis to the content/word correspondence management unit 103, in association with the content ID of the content to which the metadata corresponds.

The content/word correspondence management unit 103 receives the content IDs and words output from the morphological analysis unit 101, and creates and manages a content ID/word correspondence table, described later.
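The flow from metadata to a content ID/word correspondence table can be sketched as below. Real morphological analysis of Japanese metadata would need a dedicated analyzer; the `extract_words` function here is only a whitespace-based stand-in, and the table layout, function names, and sample metadata are all assumptions, since the actual table format is described later in the specification.

```python
def extract_words(text):
    """Stand-in for morphological analysis: lowercase whitespace tokens."""
    tokens = {token.strip(".,!?\"'()").lower() for token in text.split()}
    return tokens - {""}


def build_correspondence_table(metadata):
    """Map each content ID to the set of words found in its metadata fields."""
    return {
        content_id: set().union(*(extract_words(value) for value in fields.values()))
        for content_id, fields in metadata.items()
    }


def invert(table):
    """Map each word to the set of content IDs whose metadata contains it."""
    index = {}
    for content_id, words in table.items():
        for word in words:
            index.setdefault(word, set()).add(content_id)
    return index


# Hypothetical metadata records keyed by content ID.
metadata = {
    "c1": {"title": "Tears of the Town", "genre": "Drama"},
    "c2": {"title": "Town Walk", "genre": "Travel"},
}
table = build_correspondence_table(metadata)
print(table)
print(invert(table)["town"])
```

The inverted index is a convenience for later co-occurrence counting and is not something the text attributes to the content/word correspondence management unit 103.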

The user history storage unit 104 stores user history information indicating which content each user has viewed, and outputs the user history information to the user history co-occurrence word calculation unit 105. The user history information is uploaded from the user terminals 11 to 18 and includes the user ID corresponding to each user terminal and the content IDs of the content the user has viewed on that terminal.

The user history co-occurrence word calculation unit 105 receives the content ID/word correspondence table from the content/word correspondence management unit 103 and the user history information from the user history storage unit 104. Taking each word in turn as a user history co-occurrence source word, the unit examines, for the users who have viewed content whose metadata contains that source word, which words other than the source word appear in the metadata of the contents listed in their user history information (that is, the contents those users have viewed). It then calculates the strength of co-occurrence between the source word and each other word (the user history co-occurrence strength) and outputs it to the user history co-occurrence word table creation unit 106.

The user history co-occurrence word table creation unit 106 receives the user history co-occurrence strength between words for each word from the user history co-occurrence word calculation unit 105, receives the content ID/word correspondence table from the content/word correspondence management unit 103, and organizes the user history co-occurrence strengths into a table. As described later, in this embodiment the user history co-occurrence strength is expressed as a number of contents. Also as described later, the unit selects one or more user history co-occurrence destination words that co-occur in user history with each user history co-occurrence source word, and collectively sets the source word and the selected destination words as feature words corresponding to each dimension of the feature vector of the feature dictionary. The unit outputs the feature words set in this way to the feature dictionary management unit 109.

コンテンツ間共起ワード算出部１０７は、ユーザ履歴内共起ワードテーブル作成部１０６によって選択されなかったワード（すなわち、ユーザ履歴内共起の観点から特徴語として選択されなかったワードであり、典型的には、ユーザ履歴情報における出現数の少ないワード）を受け取るとともに、コンテンツ・ワード対応管理部１０３からコンテンツＩＤ・ワード対応表を受け取る。そして、コンテンツ間共起ワード算出部１０７は、ユーザ履歴内共起ワードテーブル作成部１０６から受け取ったワードをコンテンツ間共起元ワードとして、各コンテンツ間共起元ワードについて、コンテンツＩＤ・ワード対応表をもとに他のワードとのコンテンツ間共起強度を算出し、算出結果をコンテンツ間共起ワードテーブル作成部１０８に出力する。ここで、２つのワード間のコンテンツ間共起強度とは、２つのワードの両方をメタデータに含むコンテンツの数を指す。 The inter-content co-occurrence word calculation unit 107 receives the words that were not selected by the user history co-occurrence word table creation unit 106 (that is, words not selected as feature words from the viewpoint of user history co-occurrence, typically words that appear only rarely in the user history information) and receives the content ID / word correspondence table from the content / word correspondence management unit 103. Treating each word received from the user history co-occurrence word table creation unit 106 as an inter-content co-occurrence source word, the inter-content co-occurrence word calculation unit 107 calculates, from the content ID / word correspondence table, the inter-content co-occurrence strength between that source word and each other word, and outputs the result to the inter-content co-occurrence word table creation unit 108. Here, the inter-content co-occurrence strength between two words is the number of contents whose metadata includes both words.

コンテンツ間共起ワードテーブル作成部１０８は、コンテンツ間共起ワード算出部１０７から各ワードについてのコンテンツ間共起強度を受け取るとともに、コンテンツ・ワード対応管理部１０３からコンテンツＩＤ・ワード対応表を受け取る。そして、コンテンツ間共起強度及びコンテンツＩＤ・ワード対応表に基づいて、コンテンツ間共起強度をテーブルとして整理する。さらに、コンテンツ間共起ワードテーブル作成部１０８は、後述のように、コンテンツ間共起強度の強い順に、コンテンツ間共起元ワードとコンテンツ間共起するコンテンツ間共起先ワードを１つ以上選択し、コンテンツ間共起元ワード及び選択されたコンテンツ間共起先ワードを、特徴辞書の特徴ベクトルの各次元に対応する特徴語としてまとめて設定する。コンテンツ間共起ワードテーブル作成部１０８は、まとめて設定された特徴語を特徴辞書管理部１０９に出力する。 The inter-content co-occurrence word table creation unit 108 receives the inter-content co-occurrence strength for each word from the inter-content co-occurrence word calculation unit 107 and receives the content ID / word correspondence table from the content / word correspondence management unit 103. Based on the inter-content co-occurrence strength and the content ID / word correspondence table, it organizes the inter-content co-occurrence strengths as a table. Further, as will be described later, the inter-content co-occurrence word table creation unit 108 selects, in descending order of inter-content co-occurrence strength, one or more inter-content co-occurrence destination words that co-occur with an inter-content co-occurrence source word, and collectively sets the source word and the selected destination words as the feature words corresponding to one dimension of the feature vector of the feature dictionary. The inter-content co-occurrence word table creation unit 108 outputs the feature words set in this way to the feature dictionary management unit 109.

特徴辞書管理部１０９は、ユーザ履歴内共起ワードテーブル作成部１０６及びコンテンツ間共起ワードテーブル作成部１０８により特徴ベクトルの各次元に対応する特徴語として設定された特徴語で構成された特徴辞書を生成し、生成された特徴辞書を管理する。レコメンドサーバ１００は、特徴辞書管理部１０９により生成され管理された特徴辞書に基づいて、ユーザ端末１１〜１８に対してレコメンドコンテンツの情報を提供する。 The feature dictionary management unit 109 includes a feature dictionary configured by feature words set as feature words corresponding to each dimension of a feature vector by the user history co-occurrence word table creation unit 106 and the inter-content co-occurrence word table creation unit 108. And manage the generated feature dictionary. The recommendation server 100 provides recommended content information to the user terminals 11 to 18 based on the feature dictionary generated and managed by the feature dictionary management unit 109.

図１７には、レコメンドサーバ１００のハードウェア構成例を示す。レコメンドサーバ１００には、物理的には、ＣＰＵ１００Ａ、主記憶装置であるＲＡＭ１００Ｂ及びＲＯＭ１００Ｃ、入力デバイスであるキーボード及びマウス等の入力装置１００Ｄ、ディスプレイ等の出力装置１００Ｅ、ネットワークカード等のデータ送受信デバイスである通信モジュール１００Ｆ、ハードディスク等の補助記憶装置１００Ｇなどを含むコンピュータシステムとして構成されている。図４を参照して説明した各部の機能は、図１５に示すＣＰＵ１００Ａ、ＲＡＭ１００Ｂ等のハードウェア上に所定のコンピュータソフトウェアを読み込ませて実行することにより、ＣＰＵ１００Ａの制御のもとで入力装置１００Ｄ、出力装置１００Ｅ、通信モジュール１００Ｆを動作させ、ＲＡＭ１００Ｂや補助記憶装置１００Ｇにおけるデータの読み出し及び書き込みを行うことで実現される。 FIG. 17 shows a hardware configuration example of the recommendation server 100. The recommendation server 100 is physically composed of a CPU 100A, RAMs 100B and ROM 100C as main storage devices, an input device 100D such as a keyboard and a mouse as input devices, an output device 100E such as a display, and a data transmission / reception device such as a network card. The computer system includes a communication module 100F, an auxiliary storage device 100G such as a hard disk, and the like. The functions of each unit described with reference to FIG. 4 are performed by reading predetermined computer software on hardware such as the CPU 100A and the RAM 100B shown in FIG. 15 and executing them, thereby controlling the input device 100D, This is realized by operating the output device 100E and the communication module 100F and reading and writing data in the RAM 100B and the auxiliary storage device 100G.

［特徴辞書の生成］ [Generation of the feature dictionary]
次に、本実施形態に係るレコメンドサーバ１００により実行される特徴辞書生成方法について説明する。 Next, a feature dictionary generation method executed by the recommendation server 100 according to the present embodiment will be described.

図５は、特徴辞書生成方法において行われる処理の全体を示すフローチャートである。まず、形態素解析部１０１が、メタデータ管理部１０２に蓄積されたコンテンツのメタデータに対して、形態素解析によるワード抽出処理を行う（ステップＳ１）。この形態素解析によるワード抽出処理では、形態素解析部１０１は、メタデータ管理部１０２から受け取ったメタデータから、形態素解析によって特徴語候補を抽出する。そして、形態素解析部１０１は、形態素解析の対象としたメタデータに対応するコンテンツのコンテンツＩＤと、抽出された特徴語候補であるワードとを対応付ける、図８に示されるコンテンツＩＤ・ワード対応表を作成する。図８の例では、例えば、コンテンツＩＤ＝００１の行において、ワード「野球」の列に丸が付されている。これは、コンテンツＩＤ＝００１のコンテンツのメタデータにワード「野球」が含まれており、他のワードは含まれていないことを意味する。 FIG. 5 is a flowchart showing the overall processing performed in the feature dictionary generation method. First, the morphological analysis unit 101 performs word extraction by morphological analysis on the content metadata stored in the metadata management unit 102 (step S1). In this word extraction, the morphological analysis unit 101 extracts feature word candidates, by morphological analysis, from the metadata received from the metadata management unit 102. The morphological analysis unit 101 then creates the content ID / word correspondence table shown in FIG. 8, which associates the content ID of the content whose metadata was analyzed with the words extracted as feature word candidates. In the example of FIG. 8, in the row for content ID = 001, the column for the word “baseball” is marked with a circle. This means that the metadata of the content with content ID = 001 includes the word “baseball” and no other word.
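The content ID / word correspondence table can be pictured as a mapping from content IDs to word sets, together with its inverse (word to content IDs), which later steps use to count how many contents a word covers. The following is a minimal sketch, not the implementation of the embodiment: the function name `invert` is hypothetical, morphological analysis itself is assumed to have been done already, and the table holds only an excerpt of the rows spelled out in this description.

```python
from collections import defaultdict

# Excerpt of the content ID / word correspondence table. Only rows whose
# full word set is stated in this description are listed here.
content_words = {
    "002": {"hot spring", "travel", "overseas"},
    "005": {"travel", "overseas"},
    "006": {"overseas", "news"},
    "007": {"mystery", "overseas"},
}


def invert(content_words):
    """Return a mapping: word -> set of content IDs whose metadata has it."""
    word_contents = defaultdict(set)
    for content_id, words in content_words.items():
        for word in words:
            word_contents[word].add(content_id)
    return dict(word_contents)


word_contents = invert(content_words)
```

With this excerpt, `word_contents["travel"]` is the set of content IDs 002 and 005, matching the "travel" example in this description.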

次に、ユーザ履歴内共起ワード算出部１０５及びユーザ履歴内共起ワードテーブル作成部１０６が、ユーザ履歴内共起による特徴語選択処理を行う（ステップＳ２）。次に、コンテンツ間共起ワード算出部１０７及びコンテンツ間共起ワードテーブル作成部１０８が、コンテンツ間共起による特徴語選択処理を行う（ステップＳ３）。ステップＳ２及びＳ３については、以下で詳述する。 Next, the co-occurrence word calculation unit 105 in the user history and the co-occurrence word table creation unit 106 in the user history perform a feature word selection process by co-occurrence in the user history (step S2). Next, the inter-content co-occurrence word calculation unit 107 and the inter-content co-occurrence word table creation unit 108 perform a feature word selection process based on the inter-content co-occurrence (step S3). Steps S2 and S3 will be described in detail below.

ステップＳ２のユーザ履歴内共起による特徴語選択処理について、図６を参照して説明する。まず、ユーザ履歴蓄積部１０４が、図９で示される視聴ログテーブルとして、ユーザ端末１１〜１８におけるユーザの視聴履歴を蓄積する（ステップＳ１１）。図９の例では、例えば、ユーザＩＤ＝Ａのユーザ（すなわちユーザ端末１１のユーザ）は、コンテンツＩＤ＝００１，００３，００４の３つのコンテンツを視聴している。 The feature word selection process by co-occurrence in the user history in step S2 will be described with reference to FIG. First, the user history storage unit 104 stores the user's viewing history in the user terminals 11 to 18 as the viewing log table shown in FIG. 9 (step S11). In the example of FIG. 9, for example, a user with user ID = A (that is, a user of user terminal 11) is viewing three contents with content ID = 001, 003, 004.

次に、ユーザ履歴内共起ワード算出部１０５及びユーザ履歴内共起ワードテーブル作成部１０６が、ユーザ履歴内共起ワードテーブルの作成を行う（ステップＳ１２：ユーザ履歴内共起強度算出ステップ）。この処理では、ユーザ履歴の観点から、ワードの互いの共起関係をまとめる処理が行われるとともに、ワード間のユーザ履歴内共起強度が算出される。 Next, the user history co-occurrence word calculation unit 105 and the user history co-occurrence word table creation unit 106 create a user history co-occurrence word table (step S12: user history co-occurrence intensity calculation step). In this process, from the viewpoint of the user history, a process for collecting the co-occurrence relationships of the words is performed, and the co-occurrence strength in the user history between the words is calculated.

ステップＳ１２の処理について、具体例を挙げて説明する。ここでは、まず、コンテンツの共起関係に着目する。図９の視聴ログテーブルから、例えばコンテンツＩＤ＝００１のコンテンツは、ユーザＡの視聴履歴情報内に存在する。このユーザＡは、コンテンツＩＤ＝００３，００４のコンテンツも視聴している。したがって、コンテンツＩＤ＝００１，００３，００４のコンテンツは、ユーザ履歴の観点から共起関係にあることが分かる。言い換えれば、コンテンツＩＤ＝００１のコンテンツをユーザ履歴内共起元コンテンツとし、コンテンツＩＤ＝００１，００３，００４のコンテンツをユーザ履歴内共起先コンテンツとした場合に、これらのユーザ履歴内共起元コンテンツ及びユーザ履歴内共起先コンテンツは共起関係にあるといえる。同様にして、ユーザ履歴内共起元コンテンツとユーザ履歴内共起先コンテンツとの共起関係の有無を調べることにより、ユーザ履歴内共起ワード算出部１０５は、図１０に示されるユーザ履歴内共起コンテンツテーブルを作成する。 The process of step S12 will be described with a specific example. Here, attention is first focused on the co-occurrence relationship of contents. From the viewing log table of FIG. 9, for example, the content with content ID = 001 exists in the viewing history information of the user A. This user A is also viewing content with content ID = 003,004. Therefore, it can be seen that the content with content ID = 001, 003, 004 has a co-occurrence relationship from the viewpoint of the user history. In other words, when the content with content ID = 001 is the co-occurrence source content in the user history and the content with content ID = 001, 003, 004 is the co-occurrence destination content in the user history, these co-occurrence content in the user history It can be said that the co-occurrence content in the user history has a co-occurrence relationship. Similarly, by checking the co-occurrence relationship between the co-occurrence source content in the user history and the co-occurrence destination content in the user history, the co-occurrence word calculation unit 105 in the user history makes the intra-user history co-occurrence word shown in FIG. Create a content table.
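The construction of the user history co-occurrence content table can be sketched as follows. This is an illustration under stated assumptions rather than the embodiment's implementation: the function name `cooccurring_contents` is hypothetical, the histories of users A and B are those stated in this description, and the history of user G is an assumption chosen to be consistent with the "travel" example in this description. The sketch also excludes a content from its own destination set, which is a design choice made here.

```python
from collections import defaultdict

# Viewing log: user ID -> set of viewed content IDs.
viewing_log = {
    "A": {"001", "003", "004"},  # stated in the description
    "B": {"002", "005"},         # stated in the description
    "G": {"005", "006", "007"},  # assumed for illustration
}


def cooccurring_contents(viewing_log):
    """Map each source content to the contents viewed by the same users."""
    table = defaultdict(set)
    for viewed in viewing_log.values():
        for source in viewed:
            # Every other content in the same history co-occurs with source.
            table[source] |= viewed - {source}
    return dict(table)


table = cooccurring_contents(viewing_log)
```

Under these assumptions, content 002 co-occurs with 005 only, while content 005 co-occurs with 002, 006, and 007.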

ここで、図８に示したコンテンツＩＤ・ワード対応表に記載されたワードのうち、「旅行」というワードに着目する。コンテンツＩＤ・ワード対応表から、「旅行」というワードは、コンテンツＩＤ＝００２，００５のコンテンツのメタデータに含まれる。図９の視聴ログテーブルを参照すると、コンテンツＩＤ＝００２，００５のコンテンツを視聴しているユーザは、ユーザＢ及びユーザＧの２名である。このうち、ユーザＢは、コンテンツＩＤ＝００２，００５のコンテンツを視聴した履歴を有する。図１０のユーザ履歴内共起コンテンツテーブルから、ユーザＢが視聴した２つのコンテンツのうち、コンテンツＩＤ＝００２のコンテンツと共起するコンテンツは、コンテンツＩＤ＝００５のコンテンツである。また、ユーザＢが視聴した２つのコンテンツのうち、コンテンツＩＤ＝００５のコンテンツと共起するコンテンツは、コンテンツＩＤ＝００２，００６，００７のコンテンツである。以上より、ユーザ履歴の観点から「旅行」というワードと共起するコンテンツは、コンテンツＩＤ＝００２，００５，００６，００７の４コンテンツとなる。 Here, attention is focused on the word “travel” among the words described in the content ID / word correspondence table shown in FIG. From the content ID / word correspondence table, the word “travel” is included in the content metadata of content ID = 002,005. Referring to the viewing log table of FIG. 9, there are two users B and G who are viewing the content with content ID = 002,005. Among these, the user B has a history of viewing content with content ID = 002,005. Of the two contents viewed by user B from the co-occurrence content table in the user history of FIG. 10, the content that co-occurs with the content with content ID = 002 is the content with content ID = 005. Of the two contents viewed by user B, the content that co-occurs with the content with content ID = 005 is the content with content ID = 002, 006, 007. As described above, the contents co-occurring with the word “travel” from the viewpoint of the user history are four contents of content ID = 002, 005, 006, 007.

図８のコンテンツＩＤ・ワード対応表を参照すると、これらのコンテンツＩＤ＝００２，００５，００６，００７の４コンテンツのうち、コンテンツＩＤ＝００２のコンテンツのメタデータに含まれるワードは「温泉」、「旅行」、「海外」であり、コンテンツＩＤ＝００５のコンテンツのメタデータに含まれるワードは「旅行」、「海外」であり、コンテンツＩＤ＝００６のコンテンツのメタデータに含まれるワードは「海外」、「ニュース」であり、コンテンツＩＤ＝００７のコンテンツのメタデータに含まれるワードは「ミステリー」、「海外」である。これらをまとめると、ユーザ履歴の観点から、ユーザ履歴内共起元ワードである「旅行」と共起するユーザ履歴内共起先ワードは、「旅行」、「温泉」、「海外」、「ニュース」、「ミステリー」である。 Referring to the content ID / word correspondence table of FIG. 8, among the four content ID = 002, 005, 006, 007, the word included in the metadata of the content with content ID = 002 is “hot spring”, “ “Travel” and “Overseas”, and the word included in the metadata of the content with content ID = 005 is “Travel” and “Overseas”, and the word included in the metadata of the content with content ID = 006 is “Overseas”. , “News”, and the words included in the content metadata of content ID = 007 are “Mystery” and “Overseas”. In summary, from the viewpoint of the user history, the co-occurrence destination word in the user history that co-occurs with “travel” in the user history is “travel”, “hot spring”, “overseas”, “news”. , "Mystery".

次に、ユーザ履歴内共起ワード算出部１０５は、これらのユーザ履歴内共起元ワード及びユーザ履歴内共起先ワードの、ユーザ履歴内共起強度を調べる。ユーザ履歴内共起強度としては、例えば、メタデータにユーザ履歴内共起元ワードが含まれるコンテンツの視聴履歴を有するユーザが視聴したコンテンツのうち、メタデータにユーザ履歴内共起先ワードが含まれるコンテンツの個数が用いられる。先に挙げた具体例を用いて説明すると、ユーザ履歴内共起強度は、「旅行」というユーザ履歴内共起元ワードをメタデータに含むコンテンツとユーザ履歴の観点から共起するコンテンツＩＤ＝００２，００５，００６，００７のコンテンツのメタデータにおいて、ユーザ履歴内共起先ワード「旅行」、「温泉」、「海外」、「ニュース」、「ミステリー」というワードを含むコンテンツの個数である。 Next, the user history co-occurrence word calculation unit 105 checks the user history co-occurrence strength of the user history co-occurrence source word and the user history co-occurrence destination word. As the co-occurrence intensity in the user history, for example, among the contents viewed by the user who has the viewing history of the content including the co-occurrence source word in the user history, the metadata includes the co-occurrence destination word in the user history. The number of contents is used. The user history co-occurrence strength will be described using the specific example given above. The content ID co-occurs from the viewpoint of the user history and the content including the user history co-occurrence word “travel” in the metadata. , 005, 006, 007 in the content metadata, the number of contents including the words “travel”, “hot spring”, “overseas”, “news”, “mystery” in the user history.

図８のコンテンツＩＤ・ワード対応表から、コンテンツＩＤ＝００２，００５，００６，００７のコンテンツのうち、ユーザ履歴内共起先ワード「旅行」がメタデータに含まれるのはコンテンツＩＤ＝００２，００５の２コンテンツであり、ユーザ履歴内共起先ワード「温泉」がメタデータに含まれるのはコンテンツＩＤ＝００２の１コンテンツであり、ユーザ履歴内共起先ワード「海外」がメタデータに含まれるのはコンテンツＩＤ＝００２，００５，００６，００７の４コンテンツであり、ユーザ履歴内共起先ワード「ニュース」がメタデータに含まれるのはコンテンツＩＤ＝００６の１コンテンツであり、ユーザ履歴内共起先ワード「ミステリー」がメタデータに含まれるのはコンテンツＩＤ＝００７の１コンテンツである。以上のようにして、ユーザ履歴内共起ワード算出部１０５は、ユーザ履歴内共起元ワード「旅行」とユーザ履歴内共起先ワード「旅行」、「温泉」、「海外」、「ニュース」、「ミステリー」とのユーザ履歴内共起強度をそれぞれ２，１，４，１，１と計算する。 From the content ID / word correspondence table of FIG. 8, among the contents with content ID = 002, 005, 006, 007, the user history co-occurrence destination word “travel” is included in the metadata of two contents (content ID = 002, 005), “hot spring” in one content (content ID = 002), “overseas” in four contents (content ID = 002, 005, 006, 007), “news” in one content (content ID = 006), and “mystery” in one content (content ID = 007). In this way, the user history co-occurrence word calculation unit 105 calculates the user history co-occurrence strengths between the source word “travel” and the destination words “travel”, “hot spring”, “overseas”, “news”, and “mystery” as 2, 1, 4, 1, and 1, respectively.
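The count-based strength for the source word "travel" can be reproduced with a short sketch. The function name `user_history_strength` is hypothetical, the metadata rows are the ones listed in this description, and user G's viewing history is an assumption (the description states only that users B and G viewed the "travel" contents and that the co-occurring contents are 002, 005, 006, and 007).

```python
# Metadata excerpt: content ID -> words (rows as listed in the description).
content_words = {
    "002": {"hot spring", "travel", "overseas"},
    "005": {"travel", "overseas"},
    "006": {"overseas", "news"},
    "007": {"mystery", "overseas"},
}

# Viewing log: user ID -> viewed content IDs (user G's set is assumed).
viewing_log = {
    "A": {"001", "003", "004"},
    "B": {"002", "005"},
    "G": {"005", "006", "007"},
}


def user_history_strength(source_word, content_words, viewing_log):
    """Count, per destination word, the co-occurring contents that contain it."""
    source_contents = {c for c, ws in content_words.items() if source_word in ws}
    # Contents viewed by any user who viewed a content containing source_word.
    co_contents = set()
    for viewed in viewing_log.values():
        if viewed & source_contents:
            co_contents |= viewed
    strength = {}
    for content_id in co_contents:
        for word in content_words.get(content_id, ()):
            strength[word] = strength.get(word, 0) + 1
    return strength


travel = user_history_strength("travel", content_words, viewing_log)
```

Here `travel` maps "travel" to 2, "hot spring" to 1, "overseas" to 4, "news" to 1, and "mystery" to 1, in agreement with the per-word content counts enumerated above.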

ユーザ履歴内共起ワード算出部１０５は、上述したユーザ履歴内共起強度の計算を、他のワードをユーザ履歴内共起元ワードとして繰り返し、計算されたユーザ履歴内共起強度をユーザ履歴内共起ワードテーブル作成部１０６に出力する。ユーザ履歴内共起ワードテーブル作成部１０６は、ユーザ履歴内共起ワード算出部１０５から出力されたユーザ履歴内共起強度を、ユーザ履歴内共起元ワード及びユーザ履歴内共起先ワードと対応付けて、図１１に示されるユーザ履歴内共起ワードテーブルを作成する。 The user history co-occurrence word calculation unit 105 repeats the above-described calculation of the user history co-occurrence intensity using another word as the user history co-occurrence source word, and calculates the calculated user history co-occurrence intensity in the user history. The result is output to the co-occurrence word table creation unit 106. The user history co-occurrence word table creation unit 106 associates the user history co-occurrence strength output from the user history co-occurrence word calculation unit 105 with the user history co-occurrence source word and the user history co-occurrence destination word. Thus, the user history co-occurrence word table shown in FIG. 11 is created.

そして、ユーザ履歴内共起ワードテーブル作成部１０６は、作成されたユーザ履歴内共起ワードテーブルに基づき、図６に示される処理、すなわちワードを特徴ベクトルの同次元に対応する特徴語にまとめる処理を行う。 Then, the user history co-occurrence word table creation unit 106 performs processing shown in FIG. 6 based on the created user history co-occurrence word table, that is, processing that combines words into feature words corresponding to the same dimension of the feature vector. I do.

まず、ユーザ履歴内共起ワードテーブル作成部１０６は、図１１のユーザ履歴内共起ワードテーブルから、ユーザ履歴に出現しないワードを削除し（ステップＳ１３）、図１２に示されるようにユーザ履歴内共起ワードテーブルを変形する。ユーザ履歴に出現しないワードを削除する理由は、ユーザ履歴に出現しないワードは、ユーザの興味が低いワードであるため、特徴語としての重要度も低いと考えられるからである。図１１のユーザ履歴内共起ワードテーブルの例では、ユーザ履歴内に一度も出現しない特徴語として、「時事」、「釣り」、「フェリー」、「家庭」という４つのワードがある。そこで、ユーザ履歴内共起ワードテーブル作成部１０６は、図１１のユーザ履歴内共起ワードテーブルから、上記４つのワードに対応する行及び列を削除し、図１２に示される変形済みユーザ履歴内共起ワードテーブルを得る。 First, the user history co-occurrence word table creation unit 106 deletes words that do not appear in the user history from the user history co-occurrence word table of FIG. 11 (step S13), transforming the table as shown in FIG. 12. Such words are deleted because a word that does not appear in the user history is of low interest to users and is therefore considered to have low importance as a feature word. In the example of FIG. 11, four words never appear in the user history: “current affairs”, “fishing”, “ferry”, and “home”. The user history co-occurrence word table creation unit 106 therefore deletes the rows and columns corresponding to these four words from the table of FIG. 11 and obtains the modified user history co-occurrence word table shown in FIG. 12.
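The filtering of step S13 amounts to keeping only the words found in the metadata of at least one viewed content. The sketch below illustrates this with invented data: the function name `words_in_user_history`, content ID "010", and the assignment of "fishing" and "ferry" to that content are all hypothetical.

```python
def words_in_user_history(content_words, viewing_log):
    """Return the words found in the metadata of any viewed content."""
    viewed = set().union(*viewing_log.values())
    return {word for c in viewed for word in content_words.get(c, ())}


# Hypothetical data: content "010" is never viewed by any user.
content_words = {
    "002": {"hot spring", "travel", "overseas"},
    "005": {"travel", "overseas"},
    "010": {"fishing", "ferry"},
}
viewing_log = {"B": {"002", "005"}}

kept = words_in_user_history(content_words, viewing_log)
all_words = set().union(*content_words.values())
removed = all_words - kept  # rows and columns to delete from the table
```

In this toy data, "fishing" and "ferry" end up in `removed`, mirroring how words absent from the user history are dropped before the grouping steps.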

次に、ユーザ履歴内共起ワードテーブル作成部１０６は、共起元ワードをメタデータに含むコンテンツ数が最も少なくなるように、共起元ワードを選択する（ステップＳ１４）。このステップＳ１４の処理は、例えば、図１２の変形済みユーザ履歴内共起ワードテーブルを、共起元ワードをメタデータに含むコンテンツの数の順にソートすることにより、行うことができる。 Next, the co-occurrence word table creation unit 106 in the user history selects the co-occurrence source word so that the number of contents including the co-occurrence source word in the metadata is minimized (step S14). The process of step S14 can be performed, for example, by sorting the co-occurrence word table in the modified user history of FIG. 12 in the order of the number of contents including the co-occurrence source word in the metadata.

具体例で説明すると、例えばワード「野球」は、図８のコンテンツＩＤ・ワード対応表から、コンテンツＩＤ＝００１，００３のコンテンツのメタデータに含まれているため、当該ワードをメタデータに含むコンテンツの数は２となる。同様に他のユーザ履歴内共起元ワードについても、当該ユーザ履歴内共起元ワードをメタデータに含むコンテンツの数を計算し、コンテンツの数の小さい順にソートすると、図１３に示すソート済みユーザ履歴内共起ワードテーブルが得られる。 As a specific example, the word “baseball” is included in the metadata of the contents with content ID = 001, 003 according to the content ID / word correspondence table of FIG. 8, so the number of contents whose metadata includes this word is 2. Calculating this number in the same way for the other user history co-occurrence source words and sorting in ascending order of the number of contents yields the sorted user history co-occurrence word table shown in FIG. 13.

以下、ユーザ履歴内共起ワードテーブル作成部１０６は、図１３のソート済みユーザ履歴内共起ワードテーブルをもとに、コンテンツ数の小さいユーザ履歴内共起元ワードから順に特徴語のまとめ処理を行う。コンテンツ数の小さいユーザ履歴内共起元ワードから処理を行う理由は、特徴ベクトルの１つの次元あたりでカバーできるコンテンツの数、すなわち、当該次元に対応する特徴語をメタデータに含むコンテンツの数を増加させることが、レコメンデーション技術においては好ましいからである。 Thereafter, the co-occurrence word table creation unit 106 in the user history performs the process of collecting the feature words in order from the co-occurrence source words in the user history with the smallest number of contents based on the sorted user history co-occurrence word table of FIG. Do. The reason for processing from the user history co-occurrence source word with a small number of contents is that the number of contents that can be covered per dimension of the feature vector, that is, the number of contents that include the feature word corresponding to the dimension in the metadata. This is because it is preferable in the recommendation technique.

従って、本例では、コンテンツ数が最小である１のユーザ履歴内共起元ワードから順に処理することになる。図１３のソート済みユーザ履歴内共起ワードテーブルでは、ユーザ履歴内共起元ワードをメタデータに含むコンテンツ数が１のユーザ履歴内共起元ワードとして「感動」、「サッカー」、「温泉」があるが、ここでは、「感動」を選択する。 Accordingly, in this example, processing starts from the user history co-occurrence source words whose content count is the minimum, 1. In the sorted user history co-occurrence word table of FIG. 13, the source words included in the metadata of exactly one content are “impressed”, “soccer”, and “hot spring”; here, “impressed” is selected.

ここで、予め、特徴ベクトルの１次元あたりでカバーするコンテンツの数の最低値を閾値Ｎとして定めておく。本実施形態では、閾値Ｎを５とする。 Here, the minimum value of the number of contents covered per one dimension of the feature vector is determined in advance as the threshold value N. In the present embodiment, the threshold value N is 5.

次に、ユーザ履歴内共起ワードテーブル作成部１０６が、ユーザ履歴内共起元ワードと最もユーザ履歴内共起強度の高いワードを選択する（ステップＳ１５）。そして、ユーザ履歴内共起ワードテーブル作成部１０６が、ユーザ履歴内共起元ワード及びステップＳ１５で選択されたワードをメタデータに含むコンテンツ数が閾値Ｎ以上であるか否かを判定する（ステップＳ１６）。このコンテンツ数が閾値Ｎ未満であれば（ステップＳ１６：ＮＯ）、ユーザ履歴内共起ワードテーブル作成部１０６は、他にユーザ履歴内共起先ワードがあるか否かを判定する（ステップＳ１７）。他にユーザ履歴内共起先ワードがあれば（ステップＳ１７：ＹＥＳ）、ユーザ履歴内共起ワードテーブル作成部１０６は、まだ選択されていないユーザ履歴内共起先ワードのうち、最も共起強度の大きいユーザ履歴内共起先ワードを選択し（ステップＳ１８）、再びステップＳ１６に戻る。 Next, the user history co-occurrence word table creation unit 106 selects the word with the highest user history co-occurrence strength with the user history co-occurrence source word (step S15). The user history co-occurrence word table creation unit 106 then determines whether the number of contents whose metadata includes the source word or the word selected in step S15 is equal to or greater than the threshold N (step S16). If this number of contents is less than the threshold N (step S16: NO), the user history co-occurrence word table creation unit 106 determines whether there is another user history co-occurrence destination word (step S17). If there is another destination word (step S17: YES), the user history co-occurrence word table creation unit 106 selects, from the destination words not yet selected, the one with the highest co-occurrence strength (step S18) and returns to step S16.

ステップＳ１６において、ユーザ履歴内共起元ワード及びステップＳ１５で選択されたユーザ履歴内共起先ワードをメタデータに含むコンテンツ数が閾値Ｎ以上である場合（ステップＳ１６：ＹＥＳ）、又はステップＳ１７において、他にユーザ履歴内共起先ワードがない場合（ステップＳ１７：ＮＯ）は、ユーザ履歴内共起ワードテーブル作成部１０６が、ここまで選択された特徴語を特徴ベクトルの１次元にまとめ、まとめられたワードをユーザ履歴内共起ワードテーブルから削除する（ステップＳ１９）。そして、ユーザ履歴内共起ワードテーブル作成部１０６が、ユーザ履歴内共起ワードテーブルが空になったか否かを判定する（ステップＳ２０）。ユーザ履歴内共起ワードテーブルが空になっていれば（ステップＳ２０：ＹＥＳ）、ユーザ履歴内共起による特徴語選択処理を終了する。ユーザ履歴内共起ワードテーブルが空になっていなければ（ステップＳ２０：ＮＯ）、再びＳ１４に戻り、他の共起元ワードを選択する。なお、上述のステップＳ１４〜Ｓ２０が、本発明の特徴辞書生成方法におけるユーザ履歴内共起ワード設定ステップに対応する。 In step S16, when the number of contents including the user history co-occurrence source word and the user history co-occurrence destination word selected in step S15 in the metadata is greater than or equal to the threshold N (step S16: YES), or in step S17, If there is no other co-occurrence word in the user history (step S17: NO), the co-occurrence word table creation unit 106 in the user history summarizes the feature words selected so far into one dimension of the feature vector. The word is deleted from the co-occurrence word table in the user history (step S19). Then, the user history co-occurrence word table creation unit 106 determines whether or not the user history co-occurrence word table has become empty (step S20). If the user history co-occurrence word table is empty (step S20: YES), the feature word selection process by the user history co-occurrence is terminated. If the co-occurrence word table in the user history is not empty (step S20: NO), the process returns to S14 again to select another co-occurrence source word. The above-described steps S14 to S20 correspond to the user history co-occurrence word setting step in the feature dictionary generation method of the present invention.
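Steps S14 to S20 form a greedy grouping loop, which can be sketched as follows. This is an illustration under assumptions, not the embodiment's implementation: the function name `group_feature_words`, the alphabetical tie-breaking, and the restriction of candidates to words with a positive strength are choices made here. In the toy data, the strengths for "impressed" (2 with "mystery", 1 with "drama") and the content set for "drama" follow this description, while the individual content sets for "impressed", "mystery", and "soccer" and the strength between "soccer" and "baseball" are assumed.

```python
def group_feature_words(strength, word_contents, threshold_n):
    """Greedily merge words into feature-vector dimensions (steps S14-S20)."""
    remaining = set(word_contents)
    dimensions = []
    while remaining:  # S20: repeat until the table is empty
        # S14: pick the source word contained in the fewest contents.
        source = min(remaining, key=lambda w: (len(word_contents[w]), w))
        row = strength.get(source, {})
        # Destination words still in the table, strongest first.
        candidates = sorted(
            (w for w in remaining if w != source and row.get(w, 0) > 0),
            key=lambda w: (-row[w], w),
        )
        group = [source]
        covered = set(word_contents[source])
        for dest in candidates:  # S15, then S18 on later iterations
            group.append(dest)
            covered |= word_contents[dest]
            if len(covered) >= threshold_n:  # S16: enough contents covered
                break
        dimensions.append(group)  # S19: merge into one dimension
        remaining -= set(group)   # S19: delete merged words from the table
    return dimensions


word_contents = {
    "impressed": {"009"},                 # assumed
    "mystery": {"007", "008", "013"},     # assumed
    "drama": {"003", "008", "013"},       # from the description
    "soccer": {"004"},                    # assumed
    "baseball": {"001", "003"},           # from the description
}
strength = {
    "impressed": {"mystery": 2, "drama": 1},
    "soccer": {"baseball": 2},            # assumed value
}

dimensions = group_feature_words(strength, word_contents, threshold_n=5)
```

With the threshold N = 5, the first dimension stops growing once "drama" raises the coverage to five contents, whereas the second dimension is closed with only three contents covered because no destination word remains.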

ステップＳ１４〜Ｓ２０について、具体例を挙げて説明する。前述の通り、各ユーザ履歴内共起元ワードをメタデータに含むコンテンツ数が１のユーザ履歴内共起元ワードとして「感動」が選択されている。図１３のソート済みユーザ履歴内共起ワードテーブルを参照すると、「感動」というワードと最もユーザ履歴内共起強度の高い共起先ワードは、ユーザ履歴内共起強度が２の「ミステリー」である。図８のコンテンツＩＤ・ワード対応表を参照すると、ワード「感動」及び「ミステリー」でカバーできるコンテンツ（すなわちメタデータに「感動」又は「ミステリー」を含むコンテンツ）は、コンテンツＩＤ＝００７，００８，００９，０１３の４つのコンテンツであることが分かる。しかしながら、このように「感動」と「ミステリー」を１次元にまとめただけでは、この１次元によりカバーできるコンテンツの個数が４であり、閾値５に満たない。そして、図１４を参照すると、他にユーザ履歴内共起元ワード「感動」と共起するユーザ履歴内共起先ワードとして、「ドラマ」、「海外」、「笑い」がある。ここで、図１４に示される通り、ユーザ履歴内共起先ワード「ドラマ」、「海外」、「笑い」のユーザ履歴内共起元ワード「感動」とのユーザ履歴内共起強度は、いずれも１である。したがって、どのユーザ履歴内共起先ワードを選択してもよいが、ここでは「ドラマ」を選択することとする。 Steps S14 to S20 will be described with a specific example. As described above, “impressed” has been selected as a user history co-occurrence source word that is included in the metadata of exactly one content. Referring to the sorted user history co-occurrence word table of FIG. 13, the co-occurrence destination word with the highest user history co-occurrence strength with “impressed” is “mystery”, with a strength of 2. Referring to the content ID / word correspondence table of FIG. 8, the contents that can be covered by the words “impressed” and “mystery” (that is, contents whose metadata includes “impressed” or “mystery”) are the four contents with content ID = 007, 008, 009, 013. With only “impressed” and “mystery” combined into one dimension, however, that dimension covers four contents, which is below the threshold of 5. Referring to FIG. 14, the other user history co-occurrence destination words that co-occur with the source word “impressed” are “drama”, “overseas”, and “laughter”. As shown in FIG. 14, the user history co-occurrence strength of each of “drama”, “overseas”, and “laughter” with the source word “impressed” is 1. Any of these destination words may therefore be selected; here, “drama” is selected.

図８に示されるコンテンツＩＤ・ワード対応表から、「ドラマ」は、コンテンツＩＤ＝００３，００８，０１３のコンテンツのメタデータに含まれる。このため、ユーザ履歴内共起元ワード「感動」及びこれまでに選択されたユーザ履歴内共起先ワード「ミステリー」に加え、新たに「ドラマ」を選択することにより、ここまで選択されたワード「感動」、「ミステリー」、「ドラマ」により、コンテンツＩＤ＝００３，００７，００８，００９，０１３の５個のコンテンツをカバーできることになる。したがって、カバーされるコンテンツの数が閾値「５」以上となる。このため、ユーザ履歴内共起ワードテーブル作成部１０６は、ユーザ履歴内共起元ワード「感動」とユーザ履歴内共起先ワード「ミステリー」及び「ドラマ」とを、特徴ベクトル上の同一次元の特徴語としてまとめることを決定する。そこで、ユーザ履歴内共起ワードテーブル作成部１０６は、これらの３つのワードの行及び列を図１３に示されるユーザ履歴内共起ワードテーブルから削除する。削除される行及び列は、図１４において、太枠で囲んで示されている。 From the content ID / word correspondence table shown in FIG. 8, “drama” is included in the metadata of the content with content ID = 003, 008, 013. For this reason, in addition to the user history co-occurrence source word “impression” and the user history co-occurrence destination word “mystery” selected so far, by newly selecting “drama”, the word “ The five contents of content ID = 003, 007, 008, 009, 013 can be covered by “impression”, “mystery”, and “drama”. Therefore, the number of contents to be covered is equal to or greater than the threshold “5”. For this reason, the user history co-occurrence word table creation unit 106 combines the user history co-occurrence source word “impressed” and the user history co-occurrence destination words “mystery” and “drama” on the feature vector in the same dimension. Decide to put it together as a word. Therefore, the user history co-occurrence word table creation unit 106 deletes these three word rows and columns from the user history co-occurrence word table shown in FIG. The rows and columns to be deleted are shown surrounded by a thick frame in FIG.

この時点で、まだユーザ履歴内共起ワードテーブルが空にはなっていない。そこで、再度ステップＳ１４に戻り、処理を続行する。当該ワードをメタデータに含むコンテンツ数が最も少ない、ここでは１のユーザ履歴内共起元ワードとして、「サッカー」を選択する。次に、ステップＳ１５で、ワード「サッカー」と最もユーザ履歴内共起強度の高いワードを選択するが、この場合、図１４から、「サッカー」と共起するワードは「野球」のみである。そこで、「サッカー」及び「野球」を特徴語としてまとめる。これにより、図８から、ユーザ履歴内共起元ワード「サッカー」及びユーザ履歴内共起先ワード「野球」により、コンテンツＩＤ＝００１，００３，００４の３個のコンテンツをカバーできる。このコンテンツの数３は、閾値「５」に達していない。しかしながら、図１４から、削除されていないユーザ履歴内共起先ワードが他に存在しない。そこで、ここまで選択されたワード「サッカー」及び「野球」を、特徴ベクトル上の同一次元の特徴語としてまとめることを決定する。 At this time, the co-occurrence word table in the user history is not yet empty. Then, it returns to step S14 again and continues a process. “Soccer” is selected as the co-occurrence source word in the user history having the smallest number of contents including the word in the metadata. Next, in step S15, the word “soccer” and the word having the highest co-occurrence strength in the user history are selected. In this case, the word that co-occurs with “soccer” is only “baseball” from FIG. Therefore, “soccer” and “baseball” are summarized as feature words. Accordingly, from FIG. 8, the three contents of content ID = 001, 003, 004 can be covered by the co-occurrence source word “soccer” in the user history and the co-occurrence destination word “baseball” in the user history. The number 3 of contents does not reach the threshold “5”. However, from FIG. 14, there is no other user history co-occurrence destination word that has not been deleted. Therefore, it is determined to combine the words “soccer” and “baseball” selected so far as feature words of the same dimension on the feature vector.

このようにして各ワードについて処理を繰り返すと、最終的には、特徴ベクトルの各次元に対応する特徴語として、以下のようにワードがまとめられる。
｛感動、ドラマ、ミステリー｝
｛サッカー、野球｝
｛温泉、海外、旅行｝
｛アニメ、笑い｝
｛ニュース｝
When the processing is repeated for each word in this way, the words are finally grouped as follows as the feature words corresponding to each dimension of the feature vector.
{impressed, drama, mystery}
{soccer, baseball}
{hot spring, overseas, travel}
{anime, laughter}
{news}

そして、特徴辞書管理部１０９は、上記のようにユーザ履歴内共起ワードテーブル作成部１０６によりまとめて設定された特徴語で構成された特徴辞書を生成する（特徴辞書生成ステップ）。 Then, the feature dictionary management unit 109 generates a feature dictionary composed of the feature words grouped as described above by the user history co-occurrence word table creation unit 106 (feature dictionary generation step).

次に、図５のステップＳ３のコンテンツ間共起による特徴語選択処理について、図７を参照して説明する。この処理は、ユーザ履歴に出現しなかったワードを特徴語として選定する処理として行われる。 Next, the feature word selection process by content co-occurrence in step S3 of FIG. 5 will be described with reference to FIG. This process is performed as a process of selecting a word that has not appeared in the user history as a feature word.

まず、コンテンツ間共起ワード算出部１０７及びコンテンツ間共起ワードテーブル作成部１０８が、コンテンツ間共起ワードテーブルを作成する（ステップＳ３１：コンテンツ間共起強度算出ステップ）。この処理では、複数のコンテンツ間の関係の観点から、ワードの互いの共起関係をまとめる処理が行われるとともに、ワード間のコンテンツ間共起強度が算出される。次に、コンテンツ間共起ワードテーブル作成部１０８が、各ワードをメタデータに含むコンテンツ数の最も少ないワードをコンテンツ間共起元ワードとして選択する（ステップＳ３２）。 First, the inter-content co-occurrence word calculation unit 107 and the inter-content co-occurrence word table creation unit 108 create an inter-content co-occurrence word table (step S31: inter-content co-occurrence strength calculation step). In this process, the co-occurrence relationships among words are compiled from the viewpoint of the relationships among a plurality of contents, and the inter-content co-occurrence strength between words is calculated. Next, the inter-content co-occurrence word table creation unit 108 selects, as the inter-content co-occurrence source word, the word that is included in the metadata of the fewest contents (step S32).

次に、コンテンツ間共起ワードテーブル作成部１０８が、コンテンツ間共起元ワードと最もコンテンツ間共起強度の強いワードをコンテンツ間共起先ワードとして選択する（ステップＳ３３）。ここで、２つのワードの間のコンテンツ間共起強度としては、２つのワードの両方をメタデータに含むコンテンツの数を用いる。 Next, the inter-content co-occurrence word table creation unit 108 selects the inter-content co-occurrence source word and the word having the strongest inter-content co-occurrence strength as the inter-content co-occurrence destination word (step S33). Here, as the content co-occurrence strength between two words, the number of contents including both the two words in the metadata is used.
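Because the inter-content co-occurrence strength is defined as the number of contents whose metadata includes both words, it reduces to a simple count. The sketch below uses invented data: the function name `inter_content_strength`, the content IDs 101 to 103, and the assignment of words to those contents are hypothetical; only the definition of the strength comes from this description.

```python
def inter_content_strength(word_a, word_b, content_words):
    """Count the contents whose metadata includes both word_a and word_b."""
    return sum(
        1
        for words in content_words.values()
        if word_a in words and word_b in words
    )


# Hypothetical metadata for words that never appear in the user history.
content_words = {
    "101": {"fishing", "ferry"},
    "102": {"fishing", "ferry", "home"},
    "103": {"current affairs"},
}

fishing_ferry = inter_content_strength("fishing", "ferry", content_words)
fishing_home = inter_content_strength("fishing", "home", content_words)
affairs_home = inter_content_strength("current affairs", "home", content_words)
```

In this toy data, "fishing" and "ferry" share two contents, so "ferry" would be chosen ahead of "home" as the destination word for the source word "fishing".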

Next, the inter-content co-occurrence word table creation unit 108 determines whether the number of contents whose metadata includes the inter-content co-occurrence source word or the word selected in step S33 is equal to or greater than a threshold N (step S34). If this number is less than the threshold N (step S34: NO), the inter-content co-occurrence word table creation unit 108 determines whether there is another inter-content co-occurrence destination word (step S35). If there is another inter-content co-occurrence destination word (step S35: YES), the inter-content co-occurrence word table creation unit 108 selects, from the inter-content co-occurrence destination words not yet selected, the one with the largest co-occurrence strength (step S36), and the process returns to step S34.

If, in step S34, the number of contents whose metadata includes the inter-content co-occurrence source word or the selected inter-content co-occurrence destination word is equal to or greater than the threshold N (step S34: YES), or if, in step S35, there is no other inter-content co-occurrence destination word (step S35: NO), the inter-content co-occurrence word table creation unit 108 groups the feature words selected so far into one dimension of the feature vector and deletes the grouped words from the inter-content co-occurrence word table (step S37). The inter-content co-occurrence word table creation unit 108 then determines whether the inter-content co-occurrence word table has become empty (step S38). If the table is empty (step S38: YES), the feature word selection process based on inter-content co-occurrence ends. If the table is not empty (step S38: NO), the process returns to step S32 and another co-occurrence source word is selected. Steps S34 to S38 described above correspond to the inter-content co-occurrence word setting step in the feature dictionary generation method of the present invention.
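The loop of steps S31 to S38 can be sketched roughly as follows. This is only an illustrative reading of the flow, not the implementation of the embodiment: the function names, the alphabetical tie-breaking, the reading of "covered" as "metadata contains at least one word of the group", and the sample metadata are all assumptions introduced here. Words with no co-occurring partner simply form a group of their own in this sketch.

```python
from typing import Dict, List, Set


def strength(metadata: Dict[str, Set[str]], a: str, b: str) -> int:
    """Inter-content co-occurrence strength: contents whose metadata has both words."""
    return sum(1 for words in metadata.values() if a in words and b in words)


def coverage(metadata: Dict[str, Set[str]], group: Set[str]) -> int:
    """Number of contents whose metadata has at least one word of the group (assumed reading)."""
    return sum(1 for words in metadata.values() if words & group)


def select_feature_words(
    metadata: Dict[str, Set[str]], candidates: Set[str], threshold_n: int
) -> List[Set[str]]:
    table = set(candidates)            # S31: words still in the co-occurrence table
    groups: List[Set[str]] = []
    while table:                       # S38: repeat until the table is empty
        # S32: source word = word contained in the fewest contents
        # (ties broken alphabetically, an assumption of this sketch).
        source = min(sorted(table), key=lambda w: coverage(metadata, {w}))
        # Destination candidates, strongest co-occurrence first.
        ranked = sorted(
            (w for w in table if w != source and strength(metadata, source, w) > 0),
            key=lambda w: (-strength(metadata, source, w), w),
        )
        group = {source}
        for dest in ranked:            # S33 / S36: take the next strongest word
            group.add(dest)
            if coverage(metadata, group) >= threshold_n:  # S34: YES
                break                  # otherwise S35 decides whether to continue
        groups.append(group)           # S37: one dimension of the feature vector
        table -= group                 # S37: delete grouped words from the table
    return groups


# Hypothetical metadata (not the table of FIG. 8).
sample = {
    "c1": {"fishing", "ferry", "home"},
    "c2": {"ferry", "home"},
    "c3": {"home"},
    "c4": {"current affairs"},
}
words = {"fishing", "ferry", "home", "current affairs"}
result = select_feature_words(sample, words, threshold_n=5)
```

With this sample data and a threshold of 5, the threshold is never reached, so each group grows until no co-occurring word remains.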

This process will be described with a specific example. In this example, the words “current affairs”, “fishing”, “ferry”, and “home” are the targets.

The inter-content co-occurrence word calculation unit 107 calculates the inter-content co-occurrence strength as follows. According to the content ID / word correspondence table of FIG. 8, the word “ferry” is included in the metadata of two contents, content ID = 014 and 015. The metadata of both of these contents also includes the word “home”. Therefore, the inter-content co-occurrence strength between the word “ferry” and the word “home” is 2.
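The count described above can be written as a short function. The sketch below is illustrative only: the function name and the dictionary layout are assumptions, and the sample data reproduces just the two rows mentioned in the text rather than the full table of FIG. 8.

```python
from typing import Dict, Set


def inter_content_strength(metadata: Dict[str, Set[str]], a: str, b: str) -> int:
    """Count the contents whose metadata includes both word a and word b."""
    return sum(1 for words in metadata.values() if a in words and b in words)


# Only the rows stated in the text; other words in these rows are omitted.
excerpt = {
    "014": {"ferry", "home"},
    "015": {"ferry", "home"},
}
ferry_home = inter_content_strength(excerpt, "ferry", "home")
print(ferry_home)  # 2
```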

Similarly, the inter-content co-occurrence word calculation unit 107 calculates the inter-content co-occurrence strength for the words “fishing” and “home”. The inter-content co-occurrence word calculation unit 107 then outputs the calculated inter-content co-occurrence strengths to the inter-content co-occurrence word table creation unit 108. Based on the strengths received from the inter-content co-occurrence word calculation unit 107, the inter-content co-occurrence word table creation unit 108 creates the inter-content co-occurrence word table shown in FIG. 16.

As can be seen from the content ID / word correspondence table of FIG. 8, among the words “fishing”, “ferry”, and “home”, the word included in the metadata of the fewest contents is “fishing”. The feature words to be grouped with the word “fishing” are therefore determined. The word “fishing” co-occurs with “ferry” and “home”, and the words “fishing”, “ferry”, and “home” together cover the three contents with content ID = 010, 014, and 015. In this example, there is no other inter-content co-occurrence destination word, so “fishing”, “ferry”, and “home” are grouped as feature words. The feature word selection process based on inter-content co-occurrence then ends.

Note that the word “current affairs” is included only in the metadata of the content with content ID = 010. However, the metadata of this content includes none of the words “fishing”, “ferry”, and “home”. The word “current affairs” can therefore represent only the content with content ID = 010 and is considered to have low importance as a word indicating the features of content. Accordingly, “current affairs” is excluded from the feature word candidates.

Then, the feature dictionary management unit 109 adds the feature words {fishing, ferry, home}, grouped and set as described above by the inter-content co-occurrence word table creation unit 108, to the feature dictionary (feature dictionary generation step).

With the feature word selection process based on user history co-occurrence and the feature word selection process based on inter-content co-occurrence described above, a feature dictionary in which feature words are grouped for each dimension of the feature vector is obtained as follows.
{impression, drama, mystery}
{soccer, baseball}
{onsen, overseas, travel}
{anime, laughter}
{news}
{fishing, ferry, home}
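As a rough illustration of how such a dictionary relates words to dimensions, the sketch below stores one set of feature words per dimension and maps a set of metadata words to a vector. The binary encoding (1 if any word of the dimension is present) and the names `FEATURE_DICTIONARY` and `to_feature_vector` are assumptions made for this example; this passage does not specify how the vector values are computed.

```python
from typing import List, Set

# One set of feature words per dimension of the feature vector.
FEATURE_DICTIONARY: List[Set[str]] = [
    {"impression", "drama", "mystery"},
    {"soccer", "baseball"},
    {"onsen", "overseas", "travel"},
    {"anime", "laughter"},
    {"news"},
    {"fishing", "ferry", "home"},
]


def to_feature_vector(words: Set[str], dictionary: List[Set[str]]) -> List[int]:
    """Assumed binary encoding: a dimension is 1 if any of its feature words is present."""
    return [1 if words & group else 0 for group in dictionary]


vector = to_feature_vector({"drama", "travel"}, FEATURE_DICTIONARY)
print(vector)  # [1, 0, 1, 0, 0, 0]
```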

According to the embodiment described above, user history co-occurrence destination words that co-occur in the user history with a user history co-occurrence source word are selected in descending order of user history co-occurrence strength and are set, together with the user history co-occurrence source word, as the feature words corresponding to a dimension of the feature vector. Likewise, inter-content co-occurrence destination words that co-occur between contents with an inter-content co-occurrence source word are selected in descending order of inter-content co-occurrence strength and are set, together with the inter-content co-occurrence source word, as the feature words corresponding to a dimension of the feature vector. A feature dictionary composed of the feature words set in this way is then generated. Because the resulting feature dictionary groups one or more strongly related feature words for each dimension of the feature vector, feature words can be defined automatically and appropriately when defining a feature dictionary that associates a plurality of feature words with each dimension of the feature vector. Generating the feature dictionary in this way also makes it possible to recommend, to a target user who has viewed content whose metadata includes a given word, other content whose metadata includes other words strongly related to that word (that is, words that co-occur with it in the user history or between contents), which broadens the variety of recommended content for the target user.

In this embodiment, metadata is used as the data representing content, but the data representing content is not limited to metadata. For example, when the content is text data, the data representing the content may be the content itself, that is, the full text of the text data.

In the feature word selection process based on user history co-occurrence of this embodiment, the process shown in FIG. 6 may be modified as shown in FIG. 15: after NO is determined in step S17 and before the process moves to step S19, the user history co-occurrence source word and the selected user history co-occurrence destination words may be grouped together with the feature words of a group containing a word that co-occurs with the user history co-occurrence source word and has already been deleted from the user history co-occurrence word table (step S51). For example, in the specific example given in the description of this embodiment, the processing for the user history co-occurrence source word “soccer” ended once “soccer” and “baseball” had been grouped; instead, the following processing may be performed. The word “soccer” co-occurs with the word “drama”, which is enclosed in a thick frame in FIG. 14 (that is, already deleted). Therefore, the two words “soccer” and “baseball” may be added to the three words “impression”, “mystery”, and “drama” grouped as described above, so that a total of five words are grouped as feature words corresponding to the same dimension of the feature vector. In this case, according to the content ID / word correspondence table of FIG. 8, these five words cover the seven contents with content ID = 001, 003, 004, 007, 008, 009, and 013.

With the modified process that adds step S51, the number of contents covered per dimension of the feature vector increases. Without step S51, on the other hand, the words are not merged into one dimension but are split across two dimensions of the feature vector. Content can therefore be represented in more dimensions, which enables a more fine-grained recommendation service.
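The merging variant can be sketched as below. This is a simplified illustration under stated assumptions: the function name `merge_or_append`, the in-place update of the group list, and the representation of co-occurrence as a set of unordered word pairs are all hypothetical, and only the single co-occurrence mentioned in the text (“soccer” with “drama”) is modeled.

```python
from typing import FrozenSet, List, Set


def merge_or_append(
    groups: List[Set[str]],
    new_group: Set[str],
    source: str,
    cooccurring_pairs: Set[FrozenSet[str]],
) -> None:
    """Merge new_group into an existing group that contains a word co-occurring
    with the source word; otherwise add new_group as a separate dimension."""
    for group in groups:
        if any(frozenset({source, w}) in cooccurring_pairs for w in group):
            group |= new_group   # variant with step S51: same dimension
            return
    groups.append(set(new_group))  # no co-occurring deleted word: new dimension


pairs = {frozenset({"soccer", "drama"})}

# With the merging step: one dimension of five words.
merged = [{"impression", "mystery", "drama"}]
merge_or_append(merged, {"soccer", "baseball"}, "soccer", pairs)

# Without the merging step: two separate dimensions.
separate = [{"impression", "mystery", "drama"}, {"soccer", "baseball"}]
```

The merged list has fewer, broader dimensions, while the separate list keeps more, narrower dimensions, which mirrors the trade-off described above.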

In the specific example of the feature word selection process based on inter-content co-occurrence of this embodiment, the three words “fishing”, “ferry”, and “home” were grouped into one dimension. However, the threshold for the number of covered contents was set to “5”, and these words cover only three contents, which is below this threshold. Accordingly, “fishing”, “ferry”, and “home” may all be regarded as having low importance as words representing the features of content, and none of them need be adopted as feature words.
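This optional filtering can be illustrated with a short sketch. The function names and the sample metadata are hypothetical (the sample is not the table of FIG. 8), and "covered" is read here as "metadata contains at least one word of the group", which is an assumption.

```python
from typing import Dict, List, Set


def coverage(metadata: Dict[str, Set[str]], group: Set[str]) -> int:
    """Number of contents whose metadata includes at least one word of the group."""
    return sum(1 for words in metadata.values() if words & group)


def drop_low_coverage(
    metadata: Dict[str, Set[str]], groups: List[Set[str]], threshold: int
) -> List[Set[str]]:
    """Keep only the groups that cover at least `threshold` contents."""
    return [g for g in groups if coverage(metadata, g) >= threshold]


# Hypothetical metadata in which the group covers three contents.
sample = {
    "c1": {"fishing", "ferry", "home"},
    "c2": {"ferry", "home"},
    "c3": {"home"},
}
kept = drop_low_coverage(sample, [{"fishing", "ferry", "home"}], threshold=5)
print(kept)  # []
```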

Furthermore, in the content server 100 of this embodiment, the inter-content co-occurrence word calculation unit 107 and the inter-content co-occurrence word table creation unit 108 may be omitted. Likewise, in the feature dictionary generation method of this embodiment, the feature word selection process based on inter-content co-occurrence (step S3; see FIG. 5) may be omitted.

Description of reference signs: 100 ... content server; 105 ... user history co-occurrence word calculation unit (user history co-occurrence strength calculating means); 106 ... user history co-occurrence word table creation unit (user history co-occurrence word setting means); 107 ... inter-content co-occurrence word calculation unit (inter-content co-occurrence strength calculating means); 108 ... inter-content co-occurrence word table creation unit (inter-content co-occurrence word setting means); 109 ... feature dictionary management unit (feature dictionary generating means).

Claims

A recommendation server comprising:
user history co-occurrence strength calculating means for calculating, for each user history co-occurrence source word extracted from data representing content, a user history co-occurrence strength with another word extracted from the data representing content;
user history co-occurrence word setting means for selecting, in descending order of the user history co-occurrence strength calculated by the user history co-occurrence strength calculating means, one or more user history co-occurrence destination words that co-occur in the user history with the user history co-occurrence source word, and collectively setting the user history co-occurrence source word and the selected user history co-occurrence destination words as feature words corresponding to a dimension of a feature vector; and
feature dictionary generating means for generating a feature dictionary composed of the feature words set by the user history co-occurrence word setting means as feature words corresponding to each dimension of the feature vector.

The recommendation server according to claim 1, further comprising:
inter-content co-occurrence strength calculating means for calculating, for each inter-content co-occurrence source word, which is a word not selected by the user history co-occurrence word setting means, an inter-content co-occurrence strength with another word; and
inter-content co-occurrence word setting means for selecting, in descending order of the inter-content co-occurrence strength calculated by the inter-content co-occurrence strength calculating means, one or more inter-content co-occurrence destination words that co-occur between contents with the inter-content co-occurrence source word, and collectively setting the inter-content co-occurrence source word and the selected inter-content co-occurrence destination words as feature words corresponding to a dimension of the feature vector,
wherein the feature dictionary generating means generates a feature dictionary composed of the feature words set by the user history co-occurrence word setting means and the inter-content co-occurrence word setting means as feature words corresponding to each dimension of the feature vector.

The recommendation server according to claim 2,
wherein the inter-content co-occurrence word setting means selects the inter-content co-occurrence source word and the inter-content co-occurrence destination words such that the number of contents whose data representing the content includes at least one of the inter-content co-occurrence source word and the inter-content co-occurrence destination words collectively set as feature words is equal to or greater than a predetermined threshold.

The recommendation server according to any one of claims 1 to 3,
wherein the user history co-occurrence word setting means selects the user history co-occurrence source word and the user history co-occurrence destination words such that the number of contents whose data representing the content includes at least one of the user history co-occurrence source word and the user history co-occurrence destination words collectively set as feature words is equal to or greater than a predetermined threshold.

The recommendation server according to any one of claims 1 to 4,
wherein the user history co-occurrence strength is the number of contents whose data representing the content includes the user history co-occurrence destination word, among the contents viewed by users who have a viewing history of content whose data representing the content includes the user history co-occurrence source word.

The recommendation server according to any one of claims 1 to 4,
wherein the user history co-occurrence strength is the number of users who have a viewing history of content whose data representing the content includes the user history co-occurrence destination word, among the users who have a viewing history of content whose data representing the content includes the user history co-occurrence source word.

A feature dictionary generation method executed by a recommendation server, the method comprising:
a user history co-occurrence strength calculating step of calculating, for each user history co-occurrence source word extracted from data representing content, a user history co-occurrence strength with another word extracted from the data representing content;
a user history co-occurrence word setting step of selecting, in descending order of the user history co-occurrence strength calculated in the user history co-occurrence strength calculating step, one or more user history co-occurrence destination words that co-occur in the user history with the user history co-occurrence source word, and collectively setting the user history co-occurrence source word and the selected user history co-occurrence destination words as feature words corresponding to a dimension of a feature vector; and
a feature dictionary generating step of generating a feature dictionary composed of the feature words set in the user history co-occurrence word setting step as feature words corresponding to each dimension of the feature vector.