JP6235386B2

JP6235386B2 - Information presenting apparatus, information presenting method, and program

Info

Publication number: JP6235386B2
Application number: JP2014057009A
Authority: JP
Inventors: 浜田 伸一郎 (Shinichiro Hamada)
Original assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Current assignee: Toshiba Corp; Toshiba Digital Solutions Corp
Priority date: 2014-03-19
Filing date: 2014-03-19
Publication date: 2017-11-22
Anticipated expiration: 2034-03-19
Also published as: JP2015179441A

Description

Embodiments described herein relate generally to an information presentation apparatus, an information presentation method, and a program.

Many EC systems that provide electronic commerce (EC) services have a product recommendation function. While a user is viewing a product, this function also presents other products related to it. Product recommendation falls broadly into two types:

- Competitive recommendation presents products of the same kind as the product the user is viewing (hereinafter, the "first product") as alternative purchase options.
- Harmonious recommendation introduces a different product that goes well with the first product (hereinafter, the "second product"), to encourage so-called add-on purchases.

Harmonious recommendation is often realized by presenting, as the second product, a product that is statistically highly correlated with the first product.

In harmonious recommendation, it is important that the user recognizes the combination effect of the first product and the second product. If the user does not know what the two products achieve together, simply presenting the second product alongside the first gives the user no reason to buy the second product as an add-on.

Consider "miso potato", an inexpensive local specialty ("B-class gourmet") of Chichibu that became popular. To a user who has never heard of it, seeing "potato" offered alongside "miso" only suggests an odd pairing of ingredients. That user has no desire to buy potatoes along with the miso.

Therefore, when presenting the second product, the system should also present a recommendation reason that includes information on the combination effect of the first and second products. This is considered effective in enhancing the sales-promotion effect of harmonious recommendation.

Existing EC systems may have a mechanism for presenting recommendation reasons for a single product, such as a review display function. However, they have no mechanism for presenting a recommendation reason that includes information on the combination effect of multiple products. There is therefore a demand for such a mechanism.

JP 2006-190127 A

The problem to be solved by the present invention is to provide an information presentation apparatus, an information presentation method, and a program with the following capability. They appropriately present a recommendation reason that includes information on the combination effect of a first product and a second product. This enhances the sales-promotion effect of harmonious recommendation.

An information presentation apparatus according to an embodiment recommends a second product that harmonizes with a first product the user is viewing. When it does so, it presents a recommendation reason that includes information on the combination effect of the two products. The apparatus includes the following units:

- The first score calculation unit extracts, from the document collection to be searched, a first document group related to the first product. For each word in that group, it calculates a first score representing the word's relevance to the first product.
- The second score calculation unit extracts a second document group related to the second product. For each word in that group, it calculates a second score representing the word's relevance to the second product.
- The third score calculation unit extracts a third document group related to both the first product and the second product. For each word in that group, it calculates a third score representing the word's relevance to both products.
- The integrated score calculation unit calculates, for each word in the third document group, an integrated score by subtracting the first score and the second score from the third score.
- The presentation unit presents at least one of the following as the recommendation reason: (a) one or more important words selected according to a predetermined criterion based on the integrated score, or (b) one or more texts in the third document group that contain those important words.
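The summary defines the integrated score as the third score minus the first and second scores. The sketch below shows that subtraction in Python. It assumes each score is held in a dict keyed by normalized word. The first embodiment also brings in a fourth score through formula (1), which is not reproduced here, so this sketch follows only the summary's definition. The function name `integrated_scores` is hypothetical.

```python
from typing import Dict


def integrated_scores(
    score1: Dict[str, float],
    score2: Dict[str, float],
    score3: Dict[str, float],
) -> Dict[str, float]:
    """Integrated score = third score - first score - second score,
    computed for every word in the third (A∩B) document group."""
    result: Dict[str, float] = {}
    for word, s3 in score3.items():
        # Every document in the third group also belongs to the first and
        # second groups, so every word here also has a first and second score.
        result[word] = s3 - score1[word] - score2[word]
    return result
```

Consider a word that is frequent in documents about both products but comparatively rare in documents about either product alone. Such a word ends up with a high integrated score.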

FIG. 1 is a diagram illustrating a configuration example of the information presentation apparatus according to the first embodiment.
FIG. 2 is a flowchart showing the processing procedure of the A document group extractor.
FIG. 3 is a diagram illustrating an example of a synonym dictionary used for word expression normalization.
FIG. 4 is a flowchart showing the processing procedure of the all document group extractor.
FIG. 5 is a flowchart showing the processing procedure of the word relevance evaluator.
FIG. 6 is a flowchart showing the processing procedure of the word importance evaluator.
FIG. 7 is a flowchart showing the processing procedure of the integrated score calculation unit.
FIG. 8 is a flowchart showing the processing procedure of the unique sentence output unit.
FIG. 9 is a diagram illustrating a configuration example of the information presentation apparatus according to the second embodiment.
FIG. 10 is a flowchart showing the processing procedure of the A document group extractor.
FIG. 11 is a flowchart showing the processing procedure of the A∩B document group extractor.
FIG. 12 is a diagram explaining an example of a determination made by the A∩B document group extractor.
FIG. 13 is a flowchart showing the processing procedure of the word relevance evaluator.
FIG. 14 is a block diagram illustrating an example hardware configuration of the information presentation apparatus.

Hereinafter, an information presentation apparatus, an information presentation method, and a program according to embodiments will be described in detail with reference to the drawings.

The information presentation apparatus according to the embodiments recommends a second product that harmonizes with the first product the user is viewing. When it does so, it presents a recommendation reason that includes information on the combination effect of the two products. Writing such recommendation reasons by hand in advance for every combination of products is impractical.

However, information on the combination effects of products does exist in document collections such as Web pages, SNS (Social Networking Service) posts, and blogs. The present embodiment therefore proceeds in three steps:

1. Find the documents in such a collection that concern both products.
2. Identify passages in them suitable as a recommendation reason, such as mentions of the combination effect.
3. Present those passages to the user.

For simplicity, the following description uses these terms:

- "Product A" is the first product, and an "A document" is a document that mentions it.
- "Product B" is the second product, and a "B document" is a document that mentions it.
- An "A∩B document" is a document that mentions both products.

<First Embodiment>
First, the information presentation apparatus according to the first embodiment will be described. FIG. 1 is a diagram illustrating a configuration example of this apparatus. As shown in FIG. 1, the apparatus includes:

- a first score calculation unit 10,
- a second score calculation unit 20,
- a third score calculation unit 30,
- a fourth score calculation unit 40,
- an integrated score calculation unit 50, and
- a presentation unit 60.

The apparatus obtains, from a document DB (database) 100, a recommendation reason that includes information on the combination effect of the first and second products. It displays this reason on a screen 200 for a user of the EC system's service.

The information presentation apparatus of this embodiment is assumed to be implemented as part of the functions of the EC system. It is not limited to this, however, and may, for example, be configured as an independent system or apparatus that operates in conjunction with the EC system.

The document DB 100 is the document collection searched in this embodiment. It can be any collection and is assumed to contain Web pages, SNS posts, blogs, and the like. The screen 200 is assumed to be shown on the terminal device of a user of the EC system's service. It is typically a Web page displayed by a Web browser on that device.

The first score calculation unit 10 includes an A document group extractor 11 and a word relevance evaluator 12.

The A document group extractor 11 runs a word-based search on the document DB 100. It extracts every A document (every document that mentions product A) to obtain an A document group 15.

The word relevance evaluator 12 creates a histogram of the words in the A document group 15, that is, data listing the frequency of each word. For each word, it calculates a first score based on how often the word appears in the A document group 15.

Before counting, each word is normalized with a dictionary to absorb orthographic variation. Examples are full-width versus half-width characters, Japanese versus English spellings, and okurigana variants.

Each word's frequency is divided by the total number of words, and the logarithm of that ratio is the first score. Because the ratio is at most 1, the first score is never positive. The more often a word appears in the A document group 15, the closer its first score is to 0.

The second score calculation unit 20 includes a B document group extractor 21 and a word relevance evaluator 22.

The B document group extractor 21 runs a word-based search on the document DB 100. It extracts every B document (every document that mentions product B) to obtain a B document group 25.

The word relevance evaluator 22 creates a histogram of the words in the B document group 25. For each word, it calculates a second score based on how often the word appears in the B document group 25.

Words are normalized with a dictionary in the same way as for the first score. Each word's frequency is divided by the total number of words, and the logarithm of that ratio is the second score. The second score is therefore never positive. The more often a word appears in the B document group 25, the closer its second score is to 0.

The third score calculation unit 30 includes an A∩B document group extractor 31 and a word relevance evaluator 32.

The A∩B document group extractor 31 runs a word-based search on the document DB 100. It extracts every A∩B document (every document that mentions both product A and product B) to obtain an A∩B document group 35.

The word relevance evaluator 32 creates a histogram of the words in the A∩B document group 35. For each word, it calculates a third score based on how often the word appears in the A∩B document group 35.

Words are normalized with a dictionary in the same way as for the first score. Each word's frequency is divided by the total number of words, and the logarithm of that ratio is the third score. The third score is therefore never positive. The more often a word appears in the A∩B document group 35, the closer its third score is to 0.

The fourth score calculation unit 40 includes an all document group extractor 41 and a word importance evaluator 42.

The all document group extractor 41 extracts every document from the document DB 100 to obtain an all document group 45.

The word importance evaluator 42 creates a histogram that records, for each word, how many documents in the all document group 45 contain it. From this it calculates a fourth score for each word, based on how often documents containing the word appear in the all document group 45.

Words are normalized with a dictionary in the same way as above. For each word, the number of documents containing it is divided by the total number of documents. That ratio is converted to a logarithm and negated to give the fourth score. The fourth score is therefore non-negative. The less often documents containing the word appear, the higher its fourth score.

For each word in the A∩B document group 35, the integrated score calculation unit 50 calculates an integrated score. It combines the third, first, second, and fourth scores using formula (1), described later. The integrated score indicates how specific a word is to the topic concerning both product A and product B. The more specific a word is to that topic, the higher its integrated score.

The presentation unit 60 includes a unique word output unit 61 and a unique sentence output unit 62.

Based on the integrated score, the unique word output unit 61 selects one or more important words (unique words). These are words that are highly specific to the topic concerning both product A and product B. The unit outputs them to the screen 200 as a word-based recommendation reason 65. When words alone are enough as the recommendation reason, the screen 200 displays this word-based recommendation reason 65.
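The text leaves the "predetermined criterion" for selecting important words unspecified. The sketch below assumes the simplest choice: take the k words with the highest integrated score. The function name `select_unique_words` and the default `k=3` are hypothetical. A score threshold would be an equally plausible criterion.

```python
from typing import Dict, List


def select_unique_words(integrated: Dict[str, float], k: int = 3) -> List[str]:
    """Return the k words with the highest integrated score (assumed criterion).
    Ties are broken alphabetically so the output is deterministic."""
    ranked = sorted(integrated.items(), key=lambda item: (-item[1], item[0]))
    return [word for word, _ in ranked[:k]]
```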

The unique sentence output unit 62 selects one or more sentences from the A∩B document group 35. It picks sentences that contain many of the important words (unique words) chosen by the unique word output unit 61. It outputs them to the screen 200 as a sentence-based recommendation reason 66. When the recommendation reason must be given as sentences, the screen 200 displays this sentence-based recommendation reason 66. The screen 200 may also display both the word-based recommendation reason 65 and the sentence-based recommendation reason 66.
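The sketch below shows one way to select sentences that "contain many important words". It assumes each sentence is already split into normalized words. It also assumes "many" means the number of distinct important words in the sentence. The text does not say whether repeated occurrences count, so that is an assumption too. The function name `select_unique_sentences` is hypothetical, and it returns sentence indices.

```python
from typing import Iterable, List, Sequence


def select_unique_sentences(
    sentences: Sequence[List[str]],
    important_words: Iterable[str],
    n: int = 1,
) -> List[int]:
    """Return the indices of the n sentences containing the most distinct
    important words; earlier sentences win ties."""
    important = set(important_words)
    counts = [len(important & set(sentence)) for sentence in sentences]
    order = sorted(range(len(sentences)), key=lambda i: (-counts[i], i))
    return order[:n]
```

Switching the processing unit to phrases or paragraphs, as the next paragraph allows, only changes how the text is split before calling the function.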

In this embodiment, the unique sentence output unit 62 works in units of sentences. It may instead use phrases, passages, paragraphs, or similar units. In that case only the processing unit changes, and the same processing displays the desired text on the screen 200 as the recommendation reason.

Next, the processing performed by each of the above units of the information presentation apparatus of this embodiment will be described in detail.

First, the processing procedure of the A document group extractor 11 will be described. Its purpose is to find every A document in the document DB 100. A documents can be extracted by a word-based search using a conventional method, for example.

Search systems usually build an index of the target document collection in advance. To keep the description simple, however, this embodiment uses a grep-style search that scans the documents without an index.

FIG. 2 is a flowchart showing the processing procedure of the A document group extractor 11. The A document group extractor 11 first retrieves the product name from the metadata of product A and uses it as the search query (step S101).

Next, the A document group extractor 11 normalizes the expression of the query (step S102). This has two parts:

1. It absorbs orthographic variation in the query, such as full-width versus half-width characters, Japanese versus English spellings, and okurigana variants.
2. It uses a synonym dictionary like the one shown in FIG. 3 to replace the query (here, the product name of product A) with its representative expression.

For example, the query 「スマホ」 (sumaho, a colloquial abbreviation) is replaced with 「スマートフォン」 (smartphone). The query 「パソコン」 (pasokon, personal computer) is replaced with "PC".
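The sketch below shows the two-stage normalization. It uses Python's standard `unicodedata.normalize("NFKC", ...)` to handle part of the variation: it folds full-width Latin letters and half-width katakana. A small synonym dictionary then maps variants to their representative form. The dictionary holds only the two examples from the text and stands in for FIG. 3. The names `SYNONYMS` and `normalize_word` are hypothetical. NFKC does not handle Japanese/English or okurigana variants, so those would need additional dictionary entries.

```python
import unicodedata
from typing import Dict

# Hypothetical synonym dictionary in the spirit of FIG. 3:
# each variant maps to its representative expression.
SYNONYMS: Dict[str, str] = {
    "スマホ": "スマートフォン",
    "パソコン": "PC",
}


def normalize_word(word: str, synonyms: Dict[str, str] = SYNONYMS) -> str:
    # NFKC folds full-width letters (e.g. "ＰＣ") to half-width ("PC")
    # and half-width katakana to full-width katakana.
    folded = unicodedata.normalize("NFKC", word)
    # Replace the folded form with its representative expression, if listed.
    return synonyms.get(folded, folded)
```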

Next, the A document group extractor 11 retrieves one document from the document DB 100 (step S103). It normalizes the expression of each word in that document using the same method as in step S102 (step S104).

Next, the A document group extractor 11 checks whether the document normalized in step S104 contains the query normalized in step S102, that is, the product name of product A. If it does, the extractor adds the document to the A document group 15 to be output (step S105).

Next, the A document group extractor 11 checks whether any document in the document DB 100 has not yet been retrieved (step S106). If one remains (step S106: Yes), the process returns to step S103 and repeats the subsequent steps. Once steps S103 to S105 have been performed on every document in the document DB 100 (step S106: No), the extractor outputs the A document group 15 (step S107) and ends the processing.
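Steps S101 to S107 amount to a linear, index-free scan. The sketch below rests on several assumptions:

- Each document is already split into words. Real Japanese text would first need a morphological analyzer.
- The product name is a single token.
- NFKC plus a synonym lookup stands in for the normalization of step S102.
- The matching documents are returned in their normalized form.

The names `normalize` and `extract_documents` are hypothetical.

```python
import unicodedata
from typing import Dict, List

Document = List[str]  # assumption: each document is already split into words


def normalize(word: str, synonyms: Dict[str, str]) -> str:
    folded = unicodedata.normalize("NFKC", word)
    return synonyms.get(folded, folded)


def extract_documents(
    documents: List[Document], product_name: str, synonyms: Dict[str, str]
) -> List[Document]:
    """grep-style scan (no index) keeping every document that contains
    the normalized product name."""
    query = normalize(product_name, synonyms)              # S101-S102
    result: List[Document] = []
    for doc in documents:                                  # S103, loop via S106
        words = [normalize(w, synonyms) for w in doc]      # S104
        if query in words:                                 # S105
            result.append(words)
    return result                                          # S107
```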

The purpose of the B document group extractor 21 is to find every B document in the document DB 100. As with A documents, B documents are extracted by a word-based search. The B document group extractor 21 works like the A document group extractor 11 described above, with two differences: the search query is the product name of product B, and the output is the B document group 25. A detailed description is therefore omitted.

The purpose of the A∩B document group extractor 31 is to find every A∩B document in the document DB 100. As with A and B documents, A∩B documents are extracted by a word-based search. The A∩B document group extractor 31 works like the A document group extractor 11 and the B document group extractor 21 described above, with two differences: the search query is an AND condition on the product names of products A and B, and the output is the A∩B document group 35. A detailed description is therefore omitted.
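The AND condition can be sketched as a filter that keeps a document only if both product names occur in it. The sketch assumes documents and queries are already normalized, as in the A document extraction. The function name `extract_both` is hypothetical. The result equals the set intersection of the A and B document groups, which is where the name A∩B comes from.

```python
from typing import List, Sequence


def extract_both(
    documents: Sequence[List[str]], query_a: str, query_b: str
) -> List[List[str]]:
    """Keep documents whose (already normalized) words contain both queries."""
    result: List[List[str]] = []
    for doc in documents:
        vocabulary = set(doc)
        if query_a in vocabulary and query_b in vocabulary:  # AND condition
            result.append(doc)
    return result
```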

The all document group extractor 41 retrieves every document from the document DB 100. It normalizes the expression of each word in each document for subsequent processing.

FIG. 4 is a flowchart showing the processing procedure of the all document group extractor 41. The extractor first retrieves one document from the document DB 100 (step S201). It normalizes the expression of each word in that document using the same method as in step S102 of FIG. 2 (step S202). It then adds the document to the all document group 45 to be output (step S203).

Next, the all document group extractor 41 checks whether any document in the document DB 100 has not yet been retrieved (step S204). If one remains (step S204: Yes), the process returns to step S201 and repeats the subsequent steps. Once steps S201 to S203 have been performed on every document in the document DB 100 (step S204: No), the extractor outputs the all document group 45 (step S205) and ends the processing.

Next, the processing procedure of the word relevance evaluator 12 will be described. Its purpose is to calculate, for each word in the A document group 15, a first score representing the word's relevance to product A.

In this embodiment, the number of occurrences of each word in the A document group 15 is divided by the total number of words and converted to a logarithm. The result is the log probability of the word, which serves as the first score. This measures how often each word appears per unit amount of text. It is equivalent to the logarithm of tf (term frequency) normalized by text length, where tf is an index commonly used in information retrieval.

FIG. 5 is a flowchart showing the processing procedure of the word relevance evaluator 12. The word relevance evaluator 12 first initializes an aggregate histogram for counting the occurrences of each word (step S301).

Next, the word relevance evaluator 12 retrieves one document from the A document group 15 (step S302). It creates a histogram of the words in that document (step S303) and adds this histogram to the aggregate histogram (step S304).

Next, the word relevance evaluator 12 checks whether any document in the A document group 15 has not yet been retrieved (step S305). If one remains (step S305: Yes), the process returns to step S302 and repeats the subsequent steps. Once steps S302 to S304 have been performed on every document in the A document group 15 (step S305: No), the evaluator computes the log probability of each word from the aggregate histogram (step S306).

Specifically, let x be the frequency of a word in the aggregate histogram and y the total number of words in the A document group 15. The log probability is then log(x/y). The evaluator outputs the log probability from step S306 as each word's first score (step S307) and ends the processing.

When x = 0, the log probability is −∞. Because ∞ and −∞ are awkward to handle directly in computation, one option is to substitute an extremely large or extremely small finite value. The same approach may be used wherever ∞ or −∞ arises below.
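Steps S301 to S307 can be sketched with `collections.Counter` as the aggregate histogram. The sketch assumes the input documents are already split into normalized words, and the function name `log_probability_scores` is hypothetical. The same function serves for the first, second, and third scores, depending on which document group is passed in.

Words with x = 0 never enter the histogram, so they are absent from the result. A caller can look them up with a finite floor such as `scores.get(word, -1e9)`, in line with the substitution the text suggests.

```python
import math
from collections import Counter
from typing import Dict, List


def log_probability_scores(documents: List[List[str]]) -> Dict[str, float]:
    """Score each word as log(x / y): x is the word's count in the
    document group, y is the total number of words in the group."""
    histogram: Counter = Counter()           # S301: initialize aggregate histogram
    for doc in documents:                    # S302, loop via S305
        histogram.update(Counter(doc))       # S303-S304: per-document histogram, then add
    total = sum(histogram.values())          # y
    # S306-S307: log probability of each word that occurred at least once
    return {word: math.log(count / total) for word, count in histogram.items()}
```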

The purpose of the word relevance evaluator 22 is to calculate, for each word in the B document group 25, a second score representing the word's relevance to product B. Like the first score, the second score is the log probability of each word, here within the B document group 25. The word relevance evaluator 22 works like the word relevance evaluator 12 described above, with two differences: its input is the B document group 25, and it outputs each word's log probability as the second score. A detailed description is therefore omitted.

単語関連度評価器３２の処理は、Ａ∩Ｂ文書群３５に含まれる各単語について、商品Ａおよび商品Ｂの双方との関連性を表す第３スコアを算出することを目的とする。第３スコアは、第１スコアや第２スコアと同様に、Ａ∩Ｂ文書群３５に含まれる各単語の対数確率である。単語関連度評価器３２の処理は、与えられる文書セットがＡ∩Ｂ文書群３５に置き換わり、Ａ∩Ｂ文書群３５に含まれる各単語の対数確率を第３スコアとして出力するだけで、上述した単語関連度評価器１２の処理と同様であるため、詳細な説明は省略する。 The word relevance evaluator 32 calculates, for each word in the A∩B document group 35, a third score representing that word's relevance to both product A and product B. Like the first and second scores, the third score is the log probability of the word in the A∩B document group 35. The processing of the word relevance evaluator 32 is identical to that of the word relevance evaluator 12 described above, except that the input document set is the A∩B document group 35 and the resulting log probabilities are output as third scores. A detailed description is therefore omitted.

次に、単語重要度評価器４２の処理手順を説明する。単語重要度評価器４２の処理は、文書ＤＢ１００内の各単語が持つ一般的な重要性を表す第４スコアを算出することを目的とする。本実施形態では、単語の重要性の指標として情報検索などでよく用いられるｉｄｆ（Inverse Document Frequency）を求めて、これを各単語の第４スコアとする。ある単語のｉｄｆは，当該単語を含む文書の負の対数確率である。つまり、当該単語を含む文書数をｘ、全文書数をｙとすると、ｉｄｆ＝−ｌｏｇ（ｘ/ｙ）である。一般的に、めったに出現しない単語（すなわち出現確率の低い単語）は、出現した際に読者に与える情報量が多く重要であると考えられるが、この場合、ｉｄｆは高い値を示す。 Next, the processing procedure of the word importance degree evaluator 42 will be described. The word importance degree evaluator 42 calculates a fourth score representing the general importance of each word in the document DB 100. In this embodiment, the fourth score of each word is its idf (inverse document frequency), a measure of word importance commonly used in information retrieval. The idf of a word is the negative log probability that a document contains the word: if x is the number of documents containing the word and y is the total number of documents, then idf = −log(x/y). A word that rarely appears (that is, a word with a low probability of appearing) is generally considered important because it conveys a large amount of information to the reader when it does appear. Such a word has a high idf.

図６は、単語重要度評価器４２の処理手順を示すフローチャートである。単語重要度評価器４２は、まず、各単語の出現数を集計するための集計用ヒストグラムを初期化する（ステップＳ４０１）。 FIG. 6 is a flowchart showing the processing procedure of the word importance degree evaluator 42. First, the word importance degree evaluator 42 initializes an aggregation histogram for counting the occurrences of each word (step S401).

次に、単語重要度評価器４２は、全文書群４５から文書を１つ取り出す（ステップＳ４０２）。そして、単語重要度評価器４２は、ステップＳ４０２で取り出した文書内に含まれる単語の２値ヒストグラムを作成し（ステップＳ４０３）、得られたヒストグラムを集計用ヒストグラムに加算する（ステップＳ４０４）。２値ヒストグラムは、１か０の頻度値しか持たないヒストグラムであり、文書内に出現する単語に対して、出現数にかかわらず１が与えられる。 Next, the word importance degree evaluator 42 takes one document from the entire document group 45 (step S402). It then creates a binary histogram of the words contained in that document (step S403) and adds the result to the aggregation histogram (step S404). A binary histogram holds only the frequency values 1 and 0. Every word that appears in the document is assigned 1, regardless of how many times it appears.

次に、単語重要度評価器４２は、全文書群４５から取り出していない文書があるか否かを判定し（ステップＳ４０５）、全文書群４５から取り出していない文書があれば（ステップＳ４０５：Ｙｅｓ）、ステップＳ４０２に戻って以降の処理を繰り返す。一方、全文書群４５のすべての文書に対してステップＳ４０２〜ステップＳ４０４の処理を行っていれば（ステップＳ４０５：Ｎｏ）、単語重要度評価器４２は、集計用ヒストグラムから各単語を含む文書の負の対数確率を割り出す（ステップＳ４０６）。具体的には、集計用ヒストグラムが示す各単語の頻度をｘ、全文書群４５の総文書数をｙとすると、負の対数確率は−ｌｏｇ（ｘ／ｙ）である。そして、単語重要度評価器４２は、各単語のそれぞれについて、ステップＳ４０６で算出した当該単語を含む文書の負の対数確率を、各単語の第４スコアとして出力し（ステップＳ４０７）、一連の処理を終了する。 Next, the word importance degree evaluator 42 determines whether any document remains that has not yet been taken from the entire document group 45 (step S405). If one remains (step S405: Yes), the process returns to step S402 and the subsequent steps are repeated. Once steps S402 to S404 have been performed on every document in the entire document group 45 (step S405: No), the word importance degree evaluator 42 uses the aggregation histogram to compute, for each word, the negative log probability of a document containing that word (step S406). Let x be the frequency of a word in the aggregation histogram, which is the number of documents containing it, and let y be the total number of documents in the entire document group 45. The negative log probability is then −log(x/y). Finally, the word importance degree evaluator 42 outputs, for each word, the negative log probability computed in step S406 as that word's fourth score (step S407), and the process ends.
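The following is a corresponding sketch of steps S401 to S407, again assuming pre-tokenized documents. Converting each document's tokens to a set plays the role of the binary histogram, so the aggregated count x is the number of documents that contain each word.

```python
import math
from collections import Counter

def inverse_document_frequency(documents):
    """Sketch of steps S401-S407: idf(w) = -log(x / y).

    x: number of documents containing w; y: total number of documents.
    """
    histogram = Counter()                 # aggregation histogram (S401)
    for tokens in documents:              # S402-S405
        histogram.update(set(tokens))     # binary histogram: 1 per distinct word
    y = len(documents)
    return {w: -math.log(x / y) for w, x in histogram.items()}   # S406-S407
```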

次に、統合スコア算出部５０の処理手順を説明する。統合スコア算出部５０の処理は、Ａ∩Ｂ文書群３５内の各単語について、商品Ａと商品Ｂとの双方に関する話題に対する固有性（つまり、Ａ∩Ｂ文書群３５にのみ顕著に出現する単語であるかどうかの度合い）を表す指標となる統合スコアを算出することを目的とする。これにより、商品Ａと商品Ｂとの組み合わせに関する説明にふさわしい単語を見つけることができるようになる。 Next, the processing procedure of the integrated score calculation unit 50 will be described. For each word in the A∩B document group 35, the integrated score calculation unit 50 calculates an integrated score. This score indicates how specific the word is to topics involving both product A and product B, that is, the degree to which the word appears prominently only in the A∩B document group 35. This makes it possible to find words suited to explaining the combination of product A and product B.

本実施形態では、統合スコアの計算に下記式（１）を用いるものとする。ただし、下記式（１）のｗは単語、ｎｔｆ（ｗ）は与えられた文書セットにおける単語ｗの対数確率、ｉｄｆは全文書群４５における単語ｗを含む文書の負の対数確率である。

In this embodiment, the integrated score is calculated using formula (1). In formula (1), w is a word, ntf(w) is the log probability of the word w in a given document set, and idf is the negative log probability that a document in the entire document group 45 contains the word w.
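Formula (1) can be assembled from the description of its four terms below. The first term (the third score) is doubled, the second term (the first score) and the third term (the second score) are subtracted, and the result is multiplied by the fourth term (idf). The subscripts naming each document group are notation introduced here for readability and may differ from the original figure.

```latex
\mathrm{score}(w) = \bigl(2\,\mathrm{ntf}_{A \cap B}(w) - \mathrm{ntf}_{A}(w) - \mathrm{ntf}_{B}(w)\bigr) \times \mathrm{idf}(w) \tag{1}
```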

式（１）の第１項は、Ａ∩Ｂ文書群３５における単語ｗの対数確率を示しており、単語関連度評価器３２が出力する第３スコアに相当する。この第１項の値（第３スコア）が高いほど、当該単語ｗが、Ａ∩Ｂ文書群３５において多く出現していることを示している。 The first term of Expression (1) indicates the log probability of the word w in the A∩B document group 35, and corresponds to the third score output by the word relevance evaluator 32. The higher the value of the first term (third score), the more the word w appears in the A∩B document group 35.

式（１）の第２項は、Ａ文書群１５における単語ｗの対数確率を示しており、単語関連度評価器１２が出力する第１スコアに相当する。この第２項の値（第１スコア）が高いほど、当該単語ｗがＡ文書群１５において多く出現していることを示している。 The second term of Equation (1) indicates the log probability of the word w in the A document group 15 and corresponds to the first score output by the word relevance evaluator 12. The higher the value of the second term (first score), the more the word w appears in the A document group 15.

式（１）の第３項は、Ｂ文書群２５における単語ｗの対数確率を示しており、単語関連度評価器２２が出力する第２スコアに相当する。この第３項の値（第２スコア）が高いほど、当該単語ｗがＢ文書群２５において多く出現していることを示している。 The third term of Equation (1) indicates the log probability of the word w in the B document group 25 and corresponds to the second score output by the word relevance evaluator 22. The higher the value of the third term (second score), the more the word w appears in the B document group 25.

式（１）の第４項は、全文書群４５における単語ｗの希少性を示しており、単語重要度評価器４２が出力する第４スコアに相当する。この第４項の値（第４スコア）が高いほど、当該単語ｗは希少性があり、出現したときの情報量が多く重要な単語であることを示している。 The fourth term of formula (1) indicates the rarity of the word w in the entire document group 45 and corresponds to the fourth score output by the word importance degree evaluator 42. The higher the value of the fourth term (fourth score), the rarer the word w is, and therefore the more information it carries when it appears and the more important it is.

式（１）は、第１項から第２項および第３項を減算して統合スコアを求める計算式となっている。これにより、Ａ∩Ｂ文書群３５において多く出現し、かつ、Ａ文書群１５やＢ文書群２５ではあまり出現していない単語に対し、高い値の統合スコアが与えられることとなる。このことから、統合スコアは、商品Ａや商品Ｂの個別説明ではなく、両商品にまたがる説明にふさわしい度合いを示していると考えられる。なお、第１項を２倍しているのは、第１項から減算している項が２つあるためである。Ａ∩Ｂ文書群３５、Ａ文書群１５、Ｂ文書群２５のそれぞれで同頻度で出現する単語の固有性は０と考えられるが、式（１）のように第１項を２倍しておくことで、この場合の統合スコアを０とすることができる。ただし、第１項を２倍することは必須ではなく、第１項を２倍せずに第２項および第３項を減算してもよい。 Formula (1) obtains the integrated score by subtracting the second and third terms from the first term. As a result, words that appear frequently in the A∩B document group 35 but rarely in the A document group 15 or the B document group 25 receive a high integrated score. The integrated score can therefore be regarded as indicating how well a word suits a description that spans both products, rather than an individual description of product A or product B. The first term is doubled because two terms are subtracted from it. A word that appears with the same frequency in the A∩B document group 35, the A document group 15, and the B document group 25 can be considered to have a specificity of 0. Doubling the first term, as in formula (1), makes the integrated score of such a word exactly 0. Doubling the first term is not essential, however, and the second and third terms may instead be subtracted from the undoubled first term.
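The effect of doubling can be checked directly. Suppose a word w has the same log probability c in all three document groups. Because c is a log probability, c ≤ 0, and idf(w) ≥ 0. The bracket of formula (1) with and without the doubled first term then gives:

```latex
\bigl(2c - c - c\bigr)\,\mathrm{idf}(w) = 0,
\qquad
\bigl(c - c - c\bigr)\,\mathrm{idf}(w) = -c\,\mathrm{idf}(w) \ge 0 .
```

With the doubled first term, the score is exactly zero. Without doubling, the same word would receive a non-negative score that grows as the word becomes rarer in all three groups (c more negative). As a result, "equally frequent everywhere" no longer maps to zero.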

また、式（１）は、第１項から第２項および第３項を減算した値に、さらに第４項を乗算して統合スコアを求める計算式となっている。これにより、各単語の一般的な観点での重要性を加味した統合スコアが得られる。つまり、Ａ文書群１５の文書数、Ｂ文書群２５の文書数、およびＡ∩Ｂ文書群３５の文書数が十分でない場合、第４項を乗算せずに各単語の統合スコアを算出すると統合スコアが過適応してしまうリスクがあるが、第４項を乗算することで、このリスクを回避できる。ただし、第４項の乗算は必須ではなく、第４項を乗算せずに統合スコアを算出してもよい。 Formula (1) also multiplies the result of subtracting the second and third terms from the first term by the fourth term. This yields an integrated score that also reflects the general importance of each word. Suppose the A document group 15, the B document group 25, and the A∩B document group 35 do not contain enough documents. In that case, an integrated score computed without the fourth term risks overfitting, and multiplying by the fourth term avoids this risk. Multiplying by the fourth term is not essential, however, and the integrated score may be calculated without it.

図７は、統合スコア算出部５０の処理手順を示すフローチャートである。統合スコア算出部５０は、まず、Ａ∩Ｂ文書群３５から単語を１つ取り出す（ステップＳ５０１）。 FIG. 7 is a flowchart showing a processing procedure of the integrated score calculation unit 50. First, the integrated score calculation unit 50 extracts one word from the A∩B document group 35 (step S501).

次に、統合スコア算出部５０は、ステップＳ５０１で取り出した単語について、単語関連度評価器３２が出力した第３スコアの値を、式（１）の第１項に当てはめる（ステップＳ５０２）。 Next, the integrated score calculation unit 50 applies the value of the third score output by the word relevance evaluator 32 to the first term of Expression (1) for the word extracted in Step S501 (Step S502).

次に、統合スコア算出部５０は、ステップＳ５０１で取り出した単語について、単語関連度評価器１２が出力した第１スコアの値を、式（１）の第２項に当てはめる（ステップＳ５０３）。 Next, the integrated score calculation unit 50 applies the value of the first score output by the word relevance evaluator 12 to the second term of Expression (1) for the word extracted in Step S501 (Step S503).

次に、統合スコア算出部５０は、ステップＳ５０１で取り出した単語について、単語関連度評価器２２が出力した第２スコアの値を、式（１）の第３項に当てはめる（ステップＳ５０４）。 Next, the integrated score calculation unit 50 applies the value of the second score output by the word relevance evaluator 22 to the third term of Expression (1) for the word extracted in Step S501 (Step S504).

次に、統合スコア算出部５０は、ステップＳ５０１で取り出した単語について、単語重要度評価器４２が出力した第４スコアの値を、式（１）の第４項に当てはめる（ステップＳ５０５）。 Next, the integrated score calculation unit 50 applies the value of the fourth score output by the word importance degree evaluator 42 to the fourth term of Expression (1) for the word extracted in Step S501 (Step S505).

次に、統合スコア算出部５０は、式（１）を用いて、ステップＳ５０１で取り出した単語の統合スコアを算出する（ステップＳ５０６）。 Next, the integrated score calculation unit 50 calculates the integrated score of the word extracted in step S501 using equation (1) (step S506).

次に、統合スコア算出部５０は、Ａ∩Ｂ文書群３５から取り出していない単語があるか否かを判定し（ステップＳ５０７）、Ａ∩Ｂ文書群３５から取り出していない単語があれば（ステップＳ５０７：Ｙｅｓ）、ステップＳ５０１に戻って以降の処理を繰り返す。一方、Ａ∩Ｂ文書群３５に含まれるすべての単語に対してステップＳ５０１〜ステップＳ５０６の処理を行っていれば（ステップＳ５０７：Ｎｏ）、統合スコア算出部５０は、各単語の統合スコアを出力し（ステップＳ５０８）、一連の処理を終了する。 Next, the integrated score calculation unit 50 determines whether any word remains that has not yet been taken from the A∩B document group 35 (step S507). If one remains (step S507: Yes), the process returns to step S501 and the subsequent steps are repeated. Once steps S501 to S506 have been performed for every word in the A∩B document group 35 (step S507: No), the integrated score calculation unit 50 outputs the integrated score of each word (step S508), and the process ends.
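Steps S501 to S508 amount to evaluating formula (1) word by word. The sketch below uses a hypothetical helper, not the patent's code. The four score tables are plain dictionaries from word to value. The two flags expose the optional variants described above: an undoubled first term, and no multiplication by the fourth term. A word absent from the A or B group falls back to the same hypothetical −∞ stand-in used for x = 0. This is an assumption that matters mainly when an A∩B document need not belong to both groups.

```python
NEG_INF_SUBSTITUTE = -1e9  # hypothetical stand-in for -inf, as in the x = 0 case

def integrated_scores(ntf_ab, ntf_a, ntf_b, idf, double_first_term=True, use_idf=True):
    """Evaluate formula (1) for every word of the A∩B document group (S501-S508).

    ntf_ab: third scores; ntf_a: first scores; ntf_b: second scores
    (each maps word -> log probability). idf: fourth scores (word -> idf).
    """
    coeff = 2.0 if double_first_term else 1.0
    scores = {}
    for w, first_term in ntf_ab.items():                        # S501, S502
        second_term = ntf_a.get(w, NEG_INF_SUBSTITUTE)          # S503
        third_term = ntf_b.get(w, NEG_INF_SUBSTITUTE)           # S504
        fourth_term = idf[w] if use_idf else 1.0                # S505
        scores[w] = (coeff * first_term - second_term - third_term) * fourth_term  # S506
    return scores                                               # S508
```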

次に、固有語出力器６１の処理手順を説明する。固有語出力器６１の処理は、Ａ∩Ｂ文書群３５に含まれる単語のうち、商品Ａと商品Ｂとの双方に関する話題に対する固有性の高い単語（固有語）を重要単語として選択して出力することを目的とする。本実施形態では、Ａ∩Ｂ文書群３５に含まれる単語のうち、統合スコアが高い上位ｋ個の単語を重要単語として出力するものとする。 Next, the processing procedure of the proper word output unit 61 will be described. From the words in the A∩B document group 35, the proper word output unit 61 selects and outputs, as important words, the words (proper words) that are highly specific to topics involving both product A and product B. In this embodiment, the k words with the highest integrated scores among the words in the A∩B document group 35 are output as important words.

すなわち、固有語出力器６１は、統合スコア算出部５０から出力された統合スコアを値が高い順にソートし、統合スコアの値が高い順に上記ｋ個の単語を重要単語として選択して出力する。Ｂ商品の推薦理由が単語のみでよい場合は、この固有語出力器６１が出力する重要単語が、単語ベースの推薦理由６５として画面２００に表示される。また、推薦理由を文とすることが要求される場合は、固有語出力器６１が出力する重要単語が、固有文出力器６２に渡される。 That is, the proper word output unit 61 sorts the integrated scores output by the integrated score calculation unit 50 in descending order, then selects and outputs the top k words as important words. When words alone suffice as the recommendation reason for product B, the important words output by the proper word output unit 61 are displayed on the screen 200 as the word-based recommendation reason 65. When the recommendation reason must be a sentence, the important words output by the proper word output unit 61 are passed to the specific sentence output unit 62.
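Selecting the top k words can be sketched with the standard library's `heapq.nlargest`. It returns the same words as sorting all scores in descending order and keeping the first k, but it does not have to sort the whole list. The text does not specify k, so the caller supplies it.

```python
import heapq

def top_k_words(integrated, k):
    """Sketch of the proper word output unit 61: the k highest-scoring words."""
    return [w for w, _ in heapq.nlargest(k, integrated.items(), key=lambda item: item[1])]
```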

次に、固有文出力器６２の処理手順を説明する。固有文出力器６２の処理は、Ａ∩Ｂ文書群３５から重要単語を多く含む文を見つけ出し、文ベースの推薦理由６６として画面２００に出力することを目的とする。本実施形態では、重要単語を最も多く含むＡ∩Ｂ文書群３５中の文をベスト文として見つけ出し、文ベースの推薦理由６６として画面２００に出力するものとする。なお、上述したように、文の代わりにフレーズ、パッセージ、パラグラフなどを推薦理由として画面２００に表示させるようにしてもよい。 Next, the processing procedure of the specific sentence output unit 62 will be described. The specific sentence output unit 62 finds a sentence in the A∩B document group 35 that contains many important words and outputs it to the screen 200 as the sentence-based recommendation reason 66. In this embodiment, the sentence in the A∩B document group 35 that contains the most important words is found as the best sentence and output to the screen 200 as the sentence-based recommendation reason 66. As described above, a phrase, passage, paragraph, or the like may be displayed on the screen 200 as the recommendation reason instead of a sentence.

図８は、固有文出力器６２の処理手順を示すフローチャートである。固有文出力器６２は、まず、ベスト文およびベストスコアを初期化する（ステップＳ６０１）。つまり、文ベースの推薦理由６６として最終的に出力するベスト文を空文とし、そのベスト文に含まれる各単語の統合スコアの合計値であるベストスコアを−∞にする。 FIG. 8 is a flowchart showing the processing procedure of the specific sentence output unit 62. The unique sentence output unit 62 first initializes the best sentence and the best score (step S601). That is, the best sentence to be finally output as the sentence-based recommendation reason 66 is an empty sentence, and the best score that is the total value of the integrated scores of the words included in the best sentence is set to −∞.

次に、固有文出力器６２は、Ａ∩Ｂ文書群３５から文を１つ取り出す（ステップＳ６０２）。そして、固有文出力器６２は、ステップＳ６０２で取り出した文に含まれる各単語の統合スコアを合計したものを当該文のスコアとする（ステップＳ６０３）。 Next, the specific sentence output unit 62 extracts one sentence from the A∩B document group 35 (step S602). Then, the unique sentence output unit 62 sets the total score of the words included in the sentence extracted in step S602 as the score of the sentence (step S603).

次に、固有文出力器６２は、ステップＳ６０３で求めた文のスコアがベストスコアを上回っているか確認し、ベストスコアを上回っていれば、ベスト文およびベストスコアを、当該文とそのスコアで置き換える（ステップＳ６０４）。 Next, the specific sentence output unit 62 confirms whether the score of the sentence obtained in step S603 exceeds the best score. If the score exceeds the best score, the best sentence and the best score are replaced with the sentence and the score. (Step S604).

次に、固有文出力器６２は、Ａ∩Ｂ文書群３５から取り出していない文があるか否かを判定し（ステップＳ６０５）、Ａ∩Ｂ文書群３５から取り出していない文があれば（ステップＳ６０５：Ｙｅｓ）、ステップＳ６０２に戻って以降の処理を繰り返す。一方、Ａ∩Ｂ文書群３５に含まれるすべての文に対してステップＳ６０２〜ステップＳ６０４の処理を行っていれば（ステップＳ６０５：Ｎｏ）、固有文出力器６２は、ベスト文を文ベースの推薦理由６６として出力し（ステップＳ６０６）、一連の処理を終了する。 Next, the specific sentence output unit 62 determines whether any sentence remains that has not yet been taken from the A∩B document group 35 (step S605). If one remains (step S605: Yes), the process returns to step S602 and the subsequent steps are repeated. Once steps S602 to S604 have been performed for every sentence in the A∩B document group 35 (step S605: No), the specific sentence output unit 62 outputs the best sentence as the sentence-based recommendation reason 66 (step S606), and the process ends.
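Steps S601 to S606 form a single linear scan that keeps the best sentence seen so far. The sketch assumes each sentence of the A∩B document group 35 is already a list of word tokens, because sentence splitting is not described here. It also assumes that words without an integrated score contribute 0.

```python
def best_sentence(sentences, integrated):
    """Sketch of steps S601-S606.

    sentences: list of token lists, one per sentence of the A∩B document group.
    integrated: word -> integrated score (missing words count as 0, an assumption).
    """
    best, best_score = [], float("-inf")                      # S601: empty sentence, -inf
    for tokens in sentences:                                  # S602, S605
        score = sum(integrated.get(w, 0.0) for w in tokens)   # S603: sum of word scores
        if score > best_score:                                # S604: replace if higher
            best, best_score = tokens, score
    return best, best_score                                   # S606
```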

以上、具体的な例を挙げながら説明したように、本実施形態の情報提示装置によれば、商品Ａと商品Ｂとの双方に関する話題に対する固有性が高い単語、あるいはその単語を含む文を特定して単語ベースの推薦理由６５、あるいは文ベースの推薦理由６６として画面２００に表示させる。したがって、この情報提示装置を用いることによって、ＥＣシステムを利用するユーザに対して、商品Ａと商品Ｂとの組み合わせ効果に関する情報を含む推薦理由を適切に提示して、調和型推薦による販売促進の効果を高めることができる。すなわち、ＥＣシステムを利用するユーザにとっては、本実施形態の情報提示装置により提示される推薦理由を参照することでＢ商品を購入する動機付けが生まれ、新体験を伴う商品購入がしやすくなり、店舗にとっては販売機会を増やすことができる。 As described above with specific examples, the information presentation apparatus of this embodiment identifies words that are highly specific to topics involving both product A and product B, or sentences containing such words. It displays them on the screen 200 as the word-based recommendation reason 65 or the sentence-based recommendation reason 66. With this apparatus, users of an EC system can therefore be shown appropriate recommendation reasons that include information about the combined effect of product A and product B. This enhances the sales-promotion effect of harmonized recommendation. Reading the recommendation reason gives users of the EC system a motivation to buy product B and makes it easier for them to make purchases that bring a new experience. Stores, in turn, gain more sales opportunities.

＜第２実施形態＞ Second Embodiment
次に、第２実施形態の情報提示装置について説明する。本実施形態では、ＥＣシステムを利用したユーザによるレビュー記事など、ある商品について記載されていることが事前に予測される文書を、検索対象の文書群として用いる。ＥＣシステムは、商品ページごとにユーザによるレビュー記事を管理していることが多い。このようなレビュー記事は、それぞれの商品に対する感想などを記載した文書であるため、推薦理由を見つけ出す対象として有効に利用できる。ただし、各レビュー記事は、レビュー対象の商品ＩＤ（商品識別情報）およびレビュー記事を記載したユーザの購入ログがメタデータとして紐付けられているとする。以下、商品ＩＤおよび購入ログと紐付けられたレビュー記事をラベル付き文書と呼ぶ。 Next, an information presentation apparatus according to the second embodiment will be described. In this embodiment, the document group to be searched consists of documents that can be expected in advance to describe a particular product, such as review articles written by users of the EC system. EC systems often manage user review articles for each product page. Because such review articles record users' impressions of each product, they are well suited as sources of recommendation reasons. Each review article is assumed to be linked, as metadata, to the product ID (product identification information) of the reviewed product and to the purchase log of the user who wrote it. Hereinafter, a review article linked to a product ID and a purchase log is referred to as a labeled document.

第１実施形態では、一般的な文書を検索対象としていたため、Ａ文書、Ｂ文書、Ａ∩Ｂ文書を検索する手がかりとして、文書内に商品名が含まれているかどうかを用いていた。これに対し本実施形態では、検索対象とする各文書に付与されたレビュー対象の商品ＩＤ（レビュー記事に商品名が紐付けられている場合は商品名でもよい）を用いて検索する方法を取る。このため、文書検索エラーを排除できる（第１実施形態では表現揺れなどによるエラーのリスクがある）ほか、単に「おいしかった！また買います」などのように商品名が含まれていない文書であっても、メタデータを用いることで簡単に仕分けを行うことができるメリットがある。ただし、文書に紐付けられている商品ＩＤは１つであるため、Ａ∩Ｂ文書を判定するのには工夫が必要である。そこで、本実施形態では、近いタイミングで商品Ａと商品Ｂの両商品を購入したユーザがこれらの商品の購入から近いタイミングで記載したレビュー記事は、両商品への言及を含むレビュー記事である可能性が高いという仮説に基づいて、Ａ∩Ｂ文書を特定するようにしている。 In the first embodiment, general documents were searched, so the clue for finding A documents, B documents, and A∩B documents was whether a document contains a product name. In this embodiment, by contrast, documents are searched using the product ID of the reviewed product attached to each document (or the product name, if one is linked to the review article). This eliminates document search errors; in the first embodiment, errors can arise from variations in how a product name is written, among other causes. Using metadata has a further advantage: even a document that contains no product name, such as one that simply says “It was delicious! I'll buy it again,” can easily be sorted. However, because each document is linked to only one product ID, identifying A∩B documents requires some ingenuity. This embodiment therefore identifies A∩B documents on the basis of a hypothesis. Suppose a user bought product A and product B at close times and wrote a review article shortly after those purchases. That review article is likely to mention both products.

図９は、第２実施形態の情報提示装置の構成例を示す図である。第２実施形態の情報提示装置は、図９に示すように、第１実施形態の第１スコア算出部１０、第２スコア算出部２０および第３スコア算出部３０（図１参照）に代えて、第１スコア算出部７０、第２スコア算出部８０および第３スコア算出部９０を備えている。また、第２実施形態の情報提示装置は、検索対象の文書集合として、第１実施形態の文書ＤＢ１００（図１参照）に代えて、ラベル付き文書ＤＢ３００を用いる。ラベル付き文書ＤＢ３００は、上述したように、例えばＥＣシステムを利用したユーザによるレビュー記事の集合であり、各レビュー記事は商品ＩＤおよび購入ログ４００と紐付けられている。なお、第２実施形態の情報提示装置におけるその他の構成は、上述した第１実施形態と同様であるため、以下、第１実施形態と共通の構成要素については同一の符号を付して、重複した説明を適宜省略する。 FIG. 9 is a diagram illustrating a configuration example of the information presentation apparatus according to the second embodiment. As shown in FIG. 9, this apparatus includes a first score calculation unit 70, a second score calculation unit 80, and a third score calculation unit 90. These replace the first score calculation unit 10, the second score calculation unit 20, and the third score calculation unit 30 of the first embodiment (see FIG. 1). The apparatus also searches a labeled document DB 300 instead of the document DB 100 of the first embodiment (see FIG. 1). As described above, the labeled document DB 300 is, for example, a collection of review articles written by users of an EC system, each linked to a product ID and to the purchase log 400. The rest of the configuration is the same as in the first embodiment. Components shared with the first embodiment are therefore given the same reference numerals below, and duplicate descriptions are omitted as appropriate.

第１スコア算出部７０は、Ａ文書群抽出器７１と、単語関連度評価器１２とを含む。Ａ文書群抽出器７１は、商品Ａの商品ＩＤを用いてラベル付き文書ＤＢ３００に対する検索を行い、ラベル付き文書ＤＢ３００からＡ文書をすべて抽出してＡ文書群１５を得る。単語関連度評価器１２は、第１実施形態と共通である。 The first score calculation unit 70 includes an A document group extractor 71 and a word relevance evaluator 12. The A document group extractor 71 searches the labeled document DB 300 using the product ID of the product A, extracts all A documents from the labeled document DB 300, and obtains the A document group 15. The word relevance evaluator 12 is common to the first embodiment.

第２スコア算出部８０は、Ｂ文書群抽出器８１と、単語関連度評価器２２とを含む。Ｂ文書群抽出器８１は、商品Ｂの商品ＩＤを用いてラベル付き文書ＤＢ３００に対する検索を行い、ラベル付き文書ＤＢ３００からＢ文書をすべて抽出してＢ文書群２５を得る。単語関連度評価器２２は、第１実施形態と共通である。 The second score calculation unit 80 includes a B document group extractor 81 and a word relevance evaluator 22. The B document group extractor 81 searches the labeled document DB 300 using the product ID of the product B, extracts all the B documents from the labeled document DB 300, and obtains the B document group 25. The word relevance evaluator 22 is common to the first embodiment.

第３スコア算出部９０は、Ａ∩Ｂ文書群抽出器９１と、単語関連度評価器９２とを含む。 The third score calculation unit 90 includes an A∩B document group extractor 91 and a word relevance evaluator 92.

Ａ∩Ｂ文書群抽出器９１は、商品Ａの商品ＩＤおよび商品Ｂの商品ＩＤを用いてラベル付き文書ＤＢ３００に対する検索を行い、ラベル付き文書ＤＢ３００からＡ∩Ｂ文書を抽出して確信度付きＡ∩Ｂ文書群９５を得る。ここでラベル付き文書ＤＢ３００から抽出されるＡ∩Ｂ文書は、上述した仮説に基づいて抽出されるレビュー記事などのラベル付き文書であり、その文書に商品Ａと商品Ｂとの双方に関する記述が含まれていることの確信度が与えられたものである。 The A∩B document group extractor 91 searches the labeled document DB 300 using the product IDs of product A and product B. It extracts A∩B documents from the DB and obtains the A∩B document group 95 with confidence values. The A∩B documents extracted here are labeled documents, such as review articles, selected on the basis of the hypothesis described above. Each is assigned a confidence that it contains descriptions of both product A and product B.

単語関連度評価器９２は、確信度付きＡ∩Ｂ文書群９５に含まれる各単語のそれぞれについて、第１実施形態の単語関連度評価器３２と同様に、出現頻度に応じた第３スコアを算出する。ただし、本実施形態では、Ａ∩Ｂ文書のそれぞれに商品Ａと商品Ｂとの双方に関する記述が含まれていることの確信度が与えられており、各単語の頻度が、その単語が出現する文書の確信度を用いて計算される点が第１実施形態とは異なる。 Like the word relevance evaluator 32 of the first embodiment, the word relevance evaluator 92 calculates a third score for each word in the A∩B document group 95 with confidence values, based on the word's frequency of appearance. This embodiment differs from the first embodiment in one respect. Each A∩B document carries a confidence that it describes both product A and product B, and the frequency of each word is calculated using the confidences of the documents in which it appears.
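The text states only that word frequencies are computed using the confidences of the documents in which the words appear. One plausible reading, sketched below as an assumption, is to let each occurrence count for its document's confidence instead of 1. Log probabilities are then taken as in the first embodiment.

```python
import math
from collections import defaultdict

NEG_INF_SUBSTITUTE = -1e9  # hypothetical stand-in for -inf

def weighted_log_probabilities(docs_with_confidence):
    """Confidence-weighted log probabilities (an assumed reading of evaluator 92).

    docs_with_confidence: list of (tokens, confidence) pairs, confidence in [0, 1].
    """
    weighted = defaultdict(float)
    for tokens, confidence in docs_with_confidence:
        for w in tokens:
            weighted[w] += confidence      # each occurrence counts `confidence`, not 1
    total = sum(weighted.values())         # weighted total number of words
    return {w: math.log(x / total) if x > 0 else NEG_INF_SUBSTITUTE
            for w, x in weighted.items()}
```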

次に、本実施形態の情報提示装置において、第１実施形態とは異なる部分の処理手順の詳細について説明する。 Next, in the information presentation apparatus according to the present embodiment, details of a processing procedure of a portion different from the first embodiment will be described.

まず、Ａ文書群抽出器７１の処理手順を説明する。Ａ文書群抽出器７１の処理は、ラベル付き文書ＤＢ３００からすべてのＡ文書を見つけ出すことが目的である。 First, the processing procedure of the A document group extractor 71 will be described. The A document group extractor 71 aims to find all A documents in the labeled document DB 300.

図１０は、Ａ文書群抽出器７１の処理手順を示すフローチャートである。Ａ文書群抽出器７１は、まず、Ａ商品に関するメタデータからＡ商品の商品ＩＤを取り出して、これを検索のクエリとする（ステップＳ７０１）。 FIG. 10 is a flowchart showing a processing procedure of the A document group extractor 71. The A document group extractor 71 first extracts the product ID of the A product from the metadata related to the A product, and uses it as a search query (step S701).

次に、Ａ文書群抽出器７１は、ラベル付き文書ＤＢ３００から文書を１つ取り出す（ステップＳ７０２）。そして、Ａ文書群抽出器７１は、ステップＳ７０２で取り出した文書のラベルがクエリの商品ＩＤと一致するか確認し、一致していれば、出力するＡ文書群１５に当該文書を追加する（ステップＳ７０３）。 Next, the A document group extractor 71 takes one document from the labeled document DB 300 (step S702). It then checks whether the label of the document taken in step S702 matches the product ID in the query. If it does, the extractor adds the document to the A document group 15 to be output (step S703).

次に、Ａ文書群抽出器７１は、ラベル付き文書ＤＢ３００から取り出していない文書があるか否かを判定し（ステップＳ７０４）、ラベル付き文書ＤＢ３００から取り出していない文書があれば（ステップＳ７０４：Ｙｅｓ）、ステップＳ７０２に戻って以降の処理を繰り返す。一方、ラベル付き文書ＤＢ３００のすべての文書に対してステップＳ７０２およびステップＳ７０３の処理を行っていれば（ステップＳ７０４：Ｎｏ）、Ａ文書群抽出器７１は、Ａ文書群１５を出力し（ステップＳ７０５）、一連の処理を終了する。 Next, the A document group extractor 71 determines whether any document remains that has not yet been taken from the labeled document DB 300 (step S704). If one remains (step S704: Yes), the process returns to step S702 and the subsequent steps are repeated. Once steps S702 and S703 have been performed on every document in the labeled document DB 300 (step S704: No), the A document group extractor 71 outputs the A document group 15 (step S705), and the process ends.
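Steps S701 to S705 reduce to a filter on the product-ID label. The `LabeledDocument` record below is hypothetical. Its fields mirror the metadata the text describes: the reviewed product ID and the reviewing user, whose purchase log 400 is linked to the article.

```python
from dataclasses import dataclass
from datetime import datetime

@dataclass
class LabeledDocument:
    """Hypothetical record for one review article in the labeled document DB 300."""
    text: str
    product_id: str       # label: ID of the reviewed product
    user_id: str          # author, whose purchase log 400 is linked to the article
    written_at: datetime  # when the review was written

def extract_document_group(labeled_db, query_product_id):
    """Sketch of steps S701-S705: collect documents whose label matches the query ID."""
    return [doc for doc in labeled_db if doc.product_id == query_product_id]
```

Calling the same function with product B's ID gives the B document group 25, which mirrors the B document group extractor 81.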

Ｂ文書群抽出器８１の処理は、ラベル付き文書ＤＢ３００からすべてのＢ文書を見つけ出すことが目的である。Ｂ文書群抽出器８１の処理は、検索に用いるクエリが商品Ｂの商品ＩＤに置き換わり、出力する文書群がＢ文書群２５となるだけで、上述したＡ文書群抽出器７１の処理と同様であるため、詳細な説明は省略する。 The process of the B document group extractor 81 is to find all B documents from the labeled document DB 300. The process of the B document group extractor 81 is the same as the process of the A document group extractor 71 described above, except that the query used for the search is replaced with the product ID of the product B and the output document group is the B document group 25. Therefore, detailed description is omitted.

次に、Ａ∩Ｂ文書群抽出器９１の処理手順を説明する。Ａ∩Ｂ文書群抽出器９１の処理は、ラベル付き文書ＤＢ３００からＡ∩Ｂ文書を見つけ出すことが目的である。ラベル付き文書ＤＢ３００内の各ラベル付き文書は１つの商品ＩＤにしか結びついていないため、そのラベル付き文書が商品Ａと商品Ｂとの双方に関する記述を含んでいるかどうかをメタデータだけから判定することはできない。ここで視点を変えて、商品Ａと商品Ｂとを同時あるいは近いタイミングで購入したユーザは、両商品の組合せに意図を持っており、そのようなユーザがそれに近いタイミングで記載したレビュー文書には、両商品の組合せに関する記述が含まれている可能性が高いと考えられる。そこで、本実施形態では、購入ログ４００を用いてこの仮説に適合するユーザを選び、このユーザが記載したレビュー記事から、この仮説に適合するレビュー記事を、Ａ∩Ｂ文書として抽出する。さらに、このように抽出されたＡ∩Ｂ文書群に対し、商品Ａと商品Ｂとの双方に関する記述が含まれていることの確信度を与えて、確信度付きＡ∩Ｂ文書群９５を得る。 Next, the processing procedure of the A∩B document group extractor 91 will be described. The A∩B document group extractor 91 aims to find A∩B documents in the labeled document DB 300. Each labeled document in the labeled document DB 300 is linked to only one product ID, so the metadata alone cannot tell whether a labeled document describes both product A and product B. Looking at the problem from a different angle, however, a user who bought product A and product B at the same time or at close times presumably intends to combine the two products. A review that such a user writes shortly afterward is therefore likely to describe that combination. In this embodiment, the purchase log 400 is used to select users who fit this hypothesis. Among the review articles written by those users, the ones that fit the hypothesis are extracted as A∩B documents. Each A∩B document extracted in this way is then assigned a confidence that it describes both product A and product B, which yields the A∩B document group 95 with confidence values.

図１１は、Ａ∩Ｂ文書群抽出器９１の処理手順を示すフローチャートである。Ａ∩Ｂ文書群抽出器９１は、まず、購入ログ４００から１人のユーザを選択する（ステップＳ８０１）。 FIG. 11 is a flowchart showing the processing procedure of the A∩B document group extractor 91. The A∩B document group extractor 91 first selects one user from the purchase log 400 (step S801).

次に、Ａ∩Ｂ文書群抽出器９１は、ステップＳ８０１で選択したユーザが、所定の第１期間内に商品Ａと商品Ｂを購入していることを示す購入ログのペアをすべて抜き出す（ステップＳ８０２）。このときの判定例を図１２（ａ）に示す。上記の第１期間を２日とすると、図１２（ａ）の判定例１のように、ユーザＸの購入ログのうち、「１１／７　１５：２０　商品Ａ購入」と「１１／７　１８：２０　商品Ｂ購入」のペアは、両商品を購入した時間差が２日以内のため、ステップＳ８０２の処理で抜き出される。一方、「１１／７　１８：２０　商品Ｂ購入」と「１１／１０　９：５０　商品Ａ購入」のペアは、両商品を購入した時間差が２日を超えるため、ステップＳ８０２の処理では抜き出されない。この購入ログのペアの購入時刻の時間差を以下では「購入時間差」と呼ぶ。 Next, the A∩B document group extractor 91 extracts every pair of purchase logs showing that the user selected in step S801 bought product A and product B within a predetermined first period (step S802). FIG. 12(a) shows an example of this determination. Suppose the first period is two days, as in determination example 1 of FIG. 12(a). The pair “11/7 15:20 purchase of product A” and “11/7 18:20 purchase of product B” in user X's purchase log is extracted in step S802, because the two purchases were made within two days of each other. In contrast, the pair “11/7 18:20 purchase of product B” and “11/10 9:50 purchase of product A” is not extracted in step S802, because the two purchases are more than two days apart. The difference between the purchase times of a pair of purchase logs is hereinafter called the “purchase time difference.”
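Step S802 for a single user can be sketched as follows. Purchase-log entries are assumed to be (product ID, timestamp) tuples. FIG. 12(a) gives only month, day, and time, so any year used with this code is arbitrary.

```python
from __future__ import annotations
from datetime import datetime, timedelta
from itertools import product

def purchase_pairs(purchases: list[tuple[str, datetime]], product_a: str, product_b: str,
                   first_period: timedelta) -> list[tuple[datetime, datetime]]:
    """Sketch of step S802 for the user selected in step S801.

    Returns every (time A was bought, time B was bought) pair whose
    purchase time difference is at most first_period.
    """
    times_a = [t for pid, t in purchases if pid == product_a]
    times_b = [t for pid, t in purchases if pid == product_b]
    return [(ta, tb) for ta, tb in product(times_a, times_b)
            if abs(ta - tb) <= first_period]
```

With a two-day first period, the entries of determination example 1 (11/7 15:20 A, 11/7 18:20 B, 11/10 9:50 A) yield only the first pair.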

Next, the A∩B document group extractor 91 takes out one of the purchase log pairs extracted in step S802 (step S803). It then retrieves from the labeled document DB 300 every document (review article) that meets all of the following conditions (step S804):
- the document is labeled with the product ID of product A or product B;
- it was written by the user selected in step S801;
- it was written within a predetermined second period from the later of the two purchase times in the pair taken out in step S803.

FIG. 12(b) shows an example of this determination, with the second period set to three days. In determination example 2, user X wrote "11/9 12:00 review article on product A". This review is retrieved in step S804 because it was written within three days of the purchase time of the log entry "11/7 18:20 purchase of product B". User X also wrote "11/11 12:00 review article on product A". That review was written more than three days after the same purchase time and is therefore not retrieved. The difference between a purchase time in the purchase log and the time a review was written is hereinafter called the "review time difference".
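
The step S804 window test can be sketched in Python as follows. This sketch makes several assumptions that do not come from the source:
- Reviews are `(written_at, product_id, text)` tuples already restricted to the selected user.
- A review written before the later purchase is excluded, because the text measures the window "from" that purchase time.
- The function name `reviews_in_window` and the year 2013 are hypothetical.

```python
from datetime import datetime, timedelta


def reviews_in_window(reviews, pair, product_a, product_b,
                      second_period=timedelta(days=3)):
    """Step S804 (sketch): keep reviews about product A or B that were written
    within second_period after the later purchase time of `pair`."""
    later = max(pair)  # the later of the two purchase times
    selected = []
    for written_at, product, text in reviews:
        review_time_difference = written_at - later
        if (product in (product_a, product_b)
                and timedelta(0) <= review_time_difference <= second_period):
            selected.append((written_at, product, text))
    return selected


# Determination example 2 of FIG. 12(b), with an assumed year.
reviews_x = [
    (datetime(2013, 11, 9, 12, 0), "A", "review written 11/9"),
    (datetime(2013, 11, 11, 12, 0), "A", "review written 11/11"),
]
pair = (datetime(2013, 11, 7, 15, 20), datetime(2013, 11, 7, 18, 20))
selected = reviews_in_window(reviews_x, pair, "A", "B")
```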

Next, the A∩B document group extractor 91 assigns a confidence to each document retrieved in step S804 (step S805). The confidence depends on the purchase time difference of the purchase log pair taken out in step S803, and it falls as that difference grows. Example values:
- 100% if the two purchases were made in the same session;
- 90% if they were made within one hour of each other;
- 80% if they were made within two hours;
- 50% if they were made on the same day.

In the present embodiment, each document retrieved from the labeled document DB 300 receives a confidence based on the purchase time difference of the pair that caused it to be retrieved. The confidence may, however, be assigned in other ways. For example, a document may receive a confidence that decreases as its review time difference increases. Alternatively, it may receive a confidence that takes both the purchase time difference and the review time difference into account.
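
The example tiers for step S805 map naturally onto a step function. The Python sketch below makes three assumptions:
- The caller supplies a `same_session` flag, because the purchase log format is not specified.
- "Same day" means the same calendar date.
- The `fallback` value for pairs bought on different days and more than two hours apart is hypothetical, since the text gives no tier for that case.

The function name `purchase_confidence` is also illustrative.

```python
from datetime import datetime, timedelta


def purchase_confidence(t1, t2, same_session=False, fallback=0.3):
    """Step S805 (sketch): confidence that falls as the purchase time
    difference grows, following the example tiers in the text.
    `fallback` is a hypothetical value for pairs outside every tier."""
    diff = abs(t2 - t1)
    if same_session:
        return 1.0
    if diff <= timedelta(hours=1):
        return 0.9
    if diff <= timedelta(hours=2):
        return 0.8
    if t1.date() == t2.date():  # bought on the same calendar day
        return 0.5
    return fallback


t = datetime(2013, 11, 7, 15, 20)
c_session = purchase_confidence(t, t, same_session=True)                 # 1.0
c_fig12 = purchase_confidence(t, datetime(2013, 11, 7, 18, 20))          # 0.5
```

The tiers are checked from narrowest to widest. As a result, two purchases 30 minutes apart across midnight still receive 0.9.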

Next, the A∩B document group extractor 91 adds the confidence-annotated documents obtained in step S805 to the A∩B document group 95 with confidence that it will output (step S806).

Next, the A∩B document group extractor 91 determines whether any purchase log pair has not yet been taken out in step S803 (step S807). If one remains (step S807: Yes), the process returns to step S803 and repeats the subsequent steps. Once steps S803 to S806 have been performed for all purchase log pairs (step S807: No), the extractor determines whether any user has not yet been selected in step S801 (step S808). If one remains (step S808: Yes), the process returns to step S801 and repeats the subsequent steps.

When every user in the purchase log has been selected and steps S802 to S806 have been performed for each (step S808: No), the A∩B document group extractor 91 outputs the A∩B document group 95 with confidence (step S809) and ends the series of processes.
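
The complete loop of steps S801 to S809 can be condensed into the Python sketch below. It makes the following assumptions:
- `purchase_log` maps each user to `(timestamp, product_id)` tuples.
- `review_log` maps each user to `(written_at, product_id, text)` tuples.
- The confidence rule is passed in as a function of the purchase time difference.
- The function names and the two-tier `example_confidence` rule are hypothetical.

```python
from datetime import datetime, timedelta


def extract_a_and_b_documents(purchase_log, review_log, product_a, product_b,
                              confidence_fn,
                              first_period=timedelta(days=2),
                              second_period=timedelta(days=3)):
    """Steps S801-S809 (sketch): return a list of (review_text, confidence)."""
    result = []
    for user, purchases in purchase_log.items():           # S801, repeated via S808
        a_times = [t for t, p in purchases if p == product_a]
        b_times = [t for t, p in purchases if p == product_b]
        # S802: every A/B purchase pair within the first period
        pairs = [(ta, tb) for ta in a_times for tb in b_times
                 if abs(tb - ta) <= first_period]
        for ta, tb in pairs:                                # S803, repeated via S807
            later = max(ta, tb)
            confidence = confidence_fn(abs(tb - ta))        # S805
            for written_at, product, text in review_log.get(user, []):
                # S804: reviews of A or B within the second period after the later purchase
                if (product in (product_a, product_b)
                        and timedelta(0) <= written_at - later <= second_period):
                    result.append((text, confidence))       # S806
    return result                                           # S809


def example_confidence(purchase_diff):
    # Hypothetical two-tier rule standing in for the step S805 tiers.
    return 0.9 if purchase_diff <= timedelta(hours=1) else 0.5


purchase_log = {"X": [(datetime(2013, 11, 7, 15, 20), "A"),
                      (datetime(2013, 11, 7, 18, 20), "B"),
                      (datetime(2013, 11, 10, 9, 50), "A")]}
review_log = {"X": [(datetime(2013, 11, 9, 12, 0), "A", "review written 11/9"),
                    (datetime(2013, 11, 11, 12, 0), "A", "review written 11/11")]}
docs = extract_a_and_b_documents(purchase_log, review_log, "A", "B", example_confidence)
```

As in the flowchart, a review that falls inside the windows of several purchase pairs is added once per pair. The text does not say whether such duplicates are merged.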

Next, the processing procedure of the word relevance evaluator 92 is described. The word relevance evaluator 92 works like the word relevance evaluator 32 of the first embodiment. For each word in the A∩B document group 95 with confidence, it calculates a third score representing the word's relevance to both product A and product B. Because each A∩B document carries a confidence, however, its processing differs from that of the word relevance evaluator 32.

FIG. 13 is a flowchart of the processing procedure of the word relevance evaluator 92. The word relevance evaluator 92 first initializes two values (step S901): a tally histogram for counting the occurrences of each word, and a total word count. As described below, the total word count is the number of words in the A∩B document group 95 with confidence, adjusted according to the confidence of each document.

Next, the word relevance evaluator 92 takes out one document from the A∩B document group 95 with confidence (step S902). It then creates a histogram of the words in that document (step S903). In this histogram, the frequency given to each word is its actual frequency multiplied by the document's confidence. Suppose, for example, that a document with 50% confidence contains word A 10 times, word B 6 times, and word C 4 times. The frequencies given to words A, B, and C are then 5, 3, and 2, respectively.
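
Step S903 can be sketched in a few lines of Python. The sketch assumes the document is already a list of word tokens; for Japanese text this would require a morphological analyzer, which is outside the sketch. The function name `weighted_histogram` is hypothetical.

```python
from collections import Counter


def weighted_histogram(tokens, confidence):
    """Step S903 (sketch): per-document word histogram in which each raw
    count is multiplied by the document's confidence (0.0 to 1.0)."""
    return {word: count * confidence for word, count in Counter(tokens).items()}


# The worked example from the text: a document with 50% confidence.
doc_tokens = ["A"] * 10 + ["B"] * 6 + ["C"] * 4
hist = weighted_histogram(doc_tokens, 0.5)  # {'A': 5.0, 'B': 3.0, 'C': 2.0}
```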

Next, the word relevance evaluator 92 adds the histogram obtained in step S903 to the tally histogram (step S904). It also adds the document's word count, multiplied by the document's confidence, to the total word count (step S905). For example, if the document contains 1,000 words and its confidence is 50%, 500 is added to the total word count.
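
Steps S904 and S905 accumulate the confidence-weighted counts across documents. In the Python sketch below, each document is assumed to be a `(tokens, confidence)` pair. The function name `accumulate_weighted_counts` and the sample tokens are hypothetical.

```python
from collections import Counter


def accumulate_weighted_counts(docs_with_confidence):
    """Steps S901-S905 (sketch): return (tally histogram, total word count),
    with every count scaled by the confidence of the document it came from."""
    tally = Counter()   # S901: tally histogram
    total_words = 0.0   # S901: confidence-adjusted total word count
    for tokens, confidence in docs_with_confidence:    # S902
        for word, count in Counter(tokens).items():    # S903
            tally[word] += count * confidence           # S904
        total_words += len(tokens) * confidence         # S905
    return tally, total_words


docs = [
    (["miso"] * 400 + ["potato"] * 600, 0.5),  # 1,000 words at 50% adds 500
    (["miso"] * 10, 1.0),                      # 10 words at 100% adds 10
]
tally, total_words = accumulate_weighted_counts(docs)
```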

Next, the word relevance evaluator 92 determines whether any document in the A∩B document group 95 with confidence has not yet been taken out (step S906). If one remains (step S906: Yes), the process returns to step S902 and repeats the subsequent steps. Once steps S902 to S905 have been performed for every document in the group (step S906: No), the evaluator computes each word's log probability from the tally histogram (step S907). Let x be a word's frequency in the tally histogram, and let y be the total word count of the A∩B document group 95 with confidence (the total accumulated in step S905). The log probability is then log(x/y). Finally, the word relevance evaluator 92 outputs each word's log probability from step S907 as that word's third score (step S908) and ends the series of processes.
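
Steps S907 and S908 reduce to one log-ratio per word. The Python sketch below makes three assumptions:
- The natural logarithm is used; the text does not fix a base.
- Words whose weighted count is zero (only possible when a confidence is 0) are skipped to avoid log(0).
- The function name `third_scores` is hypothetical.

```python
import math


def third_scores(tally, total_words):
    """Steps S907-S908 (sketch): third score log(x / y) for each word, where x
    is the word's weighted frequency and y the weighted total word count."""
    if total_words <= 0:
        raise ValueError("total word count must be positive")
    return {word: math.log(x / total_words)
            for word, x in tally.items() if x > 0}


scores = third_scores({"miso": 210.0, "potato": 300.0}, 510.0)
```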

Suppose the word relevance evaluator 92 uses the confidence-based processing described above, with confidences derived from the purchase time difference or the review time difference. In that case, the A∩B document group extractor 91 need not apply threshold processing based on the first and second periods when extracting the A∩B document group. Without thresholding, the extractor also extracts review articles with very large purchase or review time differences, but such articles receive very small confidences. Skipping the thresholding increases computation, because more review articles are extracted. In exchange, it avoids losing review articles that the thresholds would have excluded.
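
Dropping the thresholds requires a confidence that shrinks smoothly instead of stepping down to zero. The Python sketch below uses exponential decay, which is a swapped-in choice: the text only requires that the confidence fall as the time differences grow. Other assumptions:
- Both half-lives are illustrative values.
- Purchase and review decays are combined by multiplication, one way to "take both differences into account".
- The function name `decayed_confidence` is hypothetical.

```python
from datetime import timedelta


def decayed_confidence(purchase_diff, review_diff,
                       purchase_half_life=timedelta(hours=12),
                       review_half_life=timedelta(days=2)):
    """Threshold-free confidence in (0, 1] (sketch). It halves every time the
    purchase time difference grows by purchase_half_life, and likewise for
    the review time difference. Decay form and half-lives are illustrative."""
    p = 0.5 ** (abs(purchase_diff) / purchase_half_life)
    r = 0.5 ** (abs(review_diff) / review_half_life)
    return p * r


c_now = decayed_confidence(timedelta(0), timedelta(0))          # 1.0
c_late = decayed_confidence(timedelta(days=30), timedelta(days=30))
```

A review written a month after purchases made a month apart still enters the A∩B document group. Its confidence is so close to zero, however, that it barely moves the third score.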

The other processes of the information presentation apparatus of the present embodiment are the same as in the first embodiment. In this embodiment as well:
- the integrated score calculation unit 50 calculates an integrated score for each word in the A∩B document group 95 with confidence;
- the unique word output unit 61 outputs important words with high integrated scores to the screen 200 as word-based recommendation reasons 65;
- the unique sentence output unit 62 outputs sentences containing many important words to the screen 200 as sentence-based recommendation reasons 66.
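
The integration and presentation steps can be sketched in Python. The sketch relies only on the combination rule stated in claim 1 (integrated score = third score − first score − second score), because the first embodiment's details are not in this section. It makes the following assumptions:
- `unseen_log_prob` is a hypothetical floor for words absent from the product A or product B document group.
- The optional fourth (rarity) score is omitted.
- "Contains many important words" is read as the count of distinct important words in a sentence.
- All function names and the English sample words are hypothetical.

```python
import math


def integrated_scores(s1, s2, s3, unseen_log_prob=-20.0):
    """Integrated score (sketch) = third - first - second, for every word in
    the A∩B group. unseen_log_prob is a hypothetical floor for missing words."""
    return {w: s3[w] - s1.get(w, unseen_log_prob) - s2.get(w, unseen_log_prob)
            for w in s3}


def important_words(scores, k):
    """Word-based reason (sketch): the k words with the highest integrated score."""
    return sorted(scores, key=scores.get, reverse=True)[:k]


def best_sentence(sentences, keywords):
    """Sentence-based reason (sketch): the tokenized sentence containing the
    most distinct important words."""
    keys = set(keywords)
    return max(sentences, key=lambda tokens: len(keys & set(tokens)))


# Log probabilities: "skewer" is rare in A-only and B-only reviews but common
# in A∩B reviews; "good" is equally common everywhere.
s1 = {"skewer": math.log(0.001), "miso": math.log(0.1), "good": math.log(0.05)}
s2 = {"skewer": math.log(0.001), "miso": math.log(0.001), "good": math.log(0.05)}
s3 = {"skewer": math.log(0.02), "miso": math.log(0.1), "good": math.log(0.05)}
scores = integrated_scores(s1, s2, s3)
words = important_words(scores, 2)  # ['skewer', 'miso']
sentence = best_sentence([["good", "value"], ["miso", "on", "a", "skewer"],
                          ["miso", "soup"]], words)
```

Because the scores are log probabilities, subtracting the first and second scores favors words that are much more likely in A∩B reviews than in reviews of either product alone. This matches the aim of surfacing the combined effect.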

Accordingly, the information presentation apparatus of the present embodiment can present appropriate recommendation reasons to users of the EC system, including information on the combined effect of product A and product B. This increases the sales-promotion effect of harmonized recommendation. For users of the EC system, the presented recommendation reasons create a motivation to buy product B and make it easier to buy products that offer a new experience. For stores, they increase sales opportunities.

Each function of the information presentation apparatus of the first or second embodiment described above can be implemented, for example, by executing a predetermined program on the apparatus. In this case, as shown in FIG. 14, the apparatus can use the hardware configuration of an ordinary computer. This configuration includes a processor such as a CPU (Central Processing Unit) 510 and storage devices such as a ROM (Read Only Memory) 520 and a RAM (Random Access Memory) 530. It also includes an input/output I/F 540 to which a display and various operation devices are connected, a communication I/F 550 that connects to a network, and a bus 560 that connects these units.

The program executed by the information presentation apparatus described above is provided as a computer program product, for example as an installable or executable file recorded on a computer-readable recording medium. Such media include a CD-ROM (Compact Disk Read Only Memory), a flexible disk (FD), a CD-R (Compact Disk Recordable), and a DVD (Digital Versatile Disc).

The program executed by the information presentation apparatus described above may also be stored on a computer connected to a network such as the Internet and provided by download over the network. The program executed by the information presentation apparatus of the present embodiment may also be provided or distributed over a network such as the Internet.

The program executed by the information presentation apparatus described above may also be provided preinstalled in the ROM 520 or the like.

The program executed by the information presentation apparatus described above has a module configuration containing the apparatus's processing units:
- the first score calculation units 10 and 70;
- the second score calculation units 20 and 80;
- the third score calculation units 30 and 90;
- the fourth score calculation unit 40;
- the integrated score calculation unit 50;
- the presentation unit 60.

In actual hardware, for example, the CPU 510 (processor) reads the program from the recording medium and executes it. The processing units are thereby loaded into, and generated on, the RAM 530 (main memory). Some or all of the processing units of the apparatus may instead be implemented with dedicated hardware, such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field-Programmable Gate Array).

Embodiments of the present invention have been described above, but they are presented as examples and are not intended to limit the scope of the invention. The novel embodiments described here can be implemented in various other forms, and various omissions, replacements, and changes can be made without departing from the gist of the invention. These embodiments and their modifications fall within the scope and gist of the invention, and within the inventions described in the claims and their equivalents.

10, 70 First score calculation unit
20, 80 Second score calculation unit
30, 90 Third score calculation unit
40 Fourth score calculation unit
50 Integrated score calculation unit
60 Presentation unit

Claims

1. An information presentation apparatus that presents a recommendation reason including information on the combined effect of a first product and a second product when recommending the second product, which harmonizes with the first product that a user is viewing, the apparatus comprising:
a first score calculation unit that extracts, from a document group to be searched, a first document group relating to the first product, and calculates, for each word included in the first document group, a first score representing the word's relevance to the first product;
a second score calculation unit that extracts, from the document group to be searched, a second document group relating to the second product, and calculates, for each word included in the second document group, a second score representing the word's relevance to the second product;
a third score calculation unit that extracts, from the document group to be searched, a third document group relating to both the first product and the second product, and calculates, for each word included in the third document group, a third score representing the word's relevance to both the first product and the second product;
an integrated score calculation unit that calculates an integrated score for each word included in the third document group by subtracting the first score and the second score from the third score; and
a presentation unit that presents, as the recommendation reason, at least one of one or more important words selected according to a predetermined criterion based on the integrated score and one or more texts in the third document group that contain the important words.

2. The information presentation apparatus according to claim 1, wherein
the first score calculation unit extracts, from the document group to be searched, the first document group containing descriptions representing the first product, and calculates, for each word included in the first document group, the first score, whose value is higher the more frequently the word appears in the first document group;
the second score calculation unit extracts, from the document group to be searched, the second document group containing descriptions representing the second product, and calculates, for each word included in the second document group, the second score, whose value is higher the more frequently the word appears in the second document group; and
the third score calculation unit extracts, from the document group to be searched, the third document group containing both a description representing the first product and a description representing the second product, and calculates, for each word included in the third document group, the third score, whose value is higher the more frequently the word appears in the third document group.

3. The information presentation apparatus according to claim 2, further comprising a fourth score calculation unit that calculates, for each word included in the document group to be searched, a fourth score whose value is higher the less frequently documents containing the word appear in the document group to be searched, wherein
the integrated score calculation unit calculates the integrated score for each word included in the third document group by subtracting the first score and the second score from the third score and then either multiplying the result by the fourth score or adding the fourth score to it.

4. The information presentation apparatus according to claim 1, wherein
the document group to be searched is a group of documents associated with product identification information;
the first score calculation unit extracts, from the document group to be searched, the first document group associated with the identification information of the first product, and calculates, for each word included in the first document group, the first score, whose value is higher the more frequently the word appears in the first document group;
the second score calculation unit extracts, from the document group to be searched, the second document group associated with the identification information of the second product, and calculates, for each word included in the second document group, the second score, whose value is higher the more frequently the word appears in the second document group; and
the third score calculation unit extracts, from the document group to be searched, the third document group that is associated with the identification information of the first product or of the second product and that was written by users who purchased both the first product and the second product, and calculates, for each word included in the third document group, the third score, whose value is higher the more frequently the word appears in the third document group.

5. The information presentation apparatus according to claim 4, wherein the third score calculation unit extracts, from the document group to be searched, the third document group that is associated with the identification information of the first product or of the second product and that was written by users who purchased both the first product and the second product within a predetermined first period, the writing taking place within a predetermined second period from the purchase of the first product or the second product; and calculates, for each word included in the third document group, the third score, whose value is higher the more frequently the word appears in the third document group.

6. The information presentation apparatus according to claim 4 or 5, wherein the third score calculation unit sets, for each document included in the third document group, a confidence that the document contains descriptions of both the first product and the second product, the confidence being based on either the difference between the purchase times of the first product and the second product or the time difference between the purchase of the first product or the second product and the time the document was written; and calculates the third score for each word included in the third document group by multiplying a score corresponding to the frequency of the word in the third document group by the confidence set for the document containing the word, or by adding that confidence to the score.

7. An information presentation method executed by an information presentation apparatus that presents a recommendation reason including information on the combined effect of a first product and a second product when recommending the second product, which harmonizes with the first product that a user is viewing, the method comprising:
a step in which the information presentation apparatus extracts, from a document group to be searched, a first document group relating to the first product, and calculates, for each word included in the first document group, a first score representing the word's relevance to the first product;
a step in which the information presentation apparatus extracts, from the document group to be searched, a second document group relating to the second product, and calculates, for each word included in the second document group, a second score representing the word's relevance to the second product;
a step in which the information presentation apparatus extracts, from the document group to be searched, a third document group relating to both the first product and the second product, and calculates, for each word included in the third document group, a third score representing the word's relevance to both the first product and the second product;
a step in which the information presentation apparatus calculates an integrated score for each word included in the third document group by subtracting the first score and the second score from the third score; and
a step in which the information presentation apparatus presents, as the recommendation reason, at least one of one or more important words selected according to a predetermined criterion based on the integrated score and one or more texts in the third document group that contain the important words.

8. A program for causing a computer to implement:
a function of extracting, from a document group to be searched, a first document group relating to a first product that a user is viewing, and calculating, for each word included in the first document group, a first score representing the word's relevance to the first product;
a function of extracting, from the document group to be searched, a second document group relating to a second product that harmonizes with the first product, and calculating, for each word included in the second document group, a second score representing the word's relevance to the second product;
a function of extracting, from the document group to be searched, a third document group relating to both the first product and the second product, and calculating, for each word included in the third document group, a third score representing the word's relevance to both the first product and the second product;
a function of calculating an integrated score for each word included in the third document group by subtracting the first score and the second score from the third score; and
a function of presenting at least one of one or more important words selected according to a predetermined criterion based on the integrated score and one or more texts in the third document group that contain the important words, as a recommendation reason including information on the combined effect of the first product and the second product.