JPH08305711A - Method and device for retrieving information

Info

Publication number: JPH08305711A
Application number: JP7113078A
Authority: JP
Inventors: Kazuhiro Hayakawa; 和宏早川; Hiroshi Hamada; 洋浜田; Koji Tsurumaki; 宏治鶴巻
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1995-05-11
Filing date: 1995-05-11
Publication date: 1996-11-22

Abstract

PURPOSE: To automatically assign to each search target a feature value that reflects how that target is actually used, and thereby enable retrieval that matches the user's intuition. CONSTITUTION: During ordinary use of a database, a user operates a user terminal 10 to input a search target identifier (ID) 101 and a user ID 102. A vector generation unit 20 receives both IDs 101 and 102 and generates, as the feature value of each element to be searched, a search target vector 103 whose value in each dimension is the usage status of each user. A similarity calculation unit 30 calculates the similarity between the vectors, and a similarity holding unit 40 stores similarity data 104, each record consisting of a similarity and the search target IDs to which it corresponds. A search unit 50 retrieves the stored similarity data using as a key the search target ID 105 for which retrieval is requested, and outputs the similar search target IDs as a search result 108.

Description

Detailed Description of the Invention

[0001]

[Field of Industrial Application] The present invention relates to an information retrieval method and apparatus suited to performing searches that match the user's intuition in database services and the like that distribute and retrieve information.

[0002]

[Prior Art] Conventionally, there have been two broad approaches to information retrieval. In the first, a person manually assigns keywords or the like in advance to the documents, images, audio, and other items to be searched. In the second, various feature values of the information to be searched are extracted automatically and used as search keys. The former is in wide use today. Examples of the latter include techniques that use the frequency of occurrence of words in a text for text retrieval, and techniques that associate the color distribution of an image with adjectives such as "warm" or "cool" for image retrieval.

[0003]

[Problems to Be Solved by the Invention] Manually assigning keywords makes it possible to obtain feature values, such as value judgments, that are difficult to extract mechanically from data, but extracting uniform feature values for a large volume of information requires a great deal of human labor.

[0004] Automatic extraction of the feature values of the search targets, on the other hand, has the advantage of requiring no manual work. With conventional techniques, however, the extracted feature values are not necessarily suited to retrieval, and they had to be linked to other feature values depending on the intended use.

[0005] An object of the present invention is to provide an information retrieval method and apparatus that, without manual work such as keyword assignment on individual data items, automatically assign feature values that match human perception and thereby enable searches that match human perception.

[0006]

[Means for Solving the Problems] The information retrieval method of the present invention is characterized by representing each element to be searched as a vector whose value in each dimension is the usage status of that element by each user; calculating the similarity between elements to be searched as the distance, inner product, or angle between the two corresponding vectors; and, in response to a search request from a user, outputting one or more search targets similar to the search target of the request on the basis of the similarity between elements.

[0007] The information retrieval apparatus of the present invention is characterized by comprising: means for obtaining the elements to be searched and each individual user's usage status for those elements, and generating, for each element to be searched, a vector representation whose value in each dimension is the usage status of that element by each user; means for calculating the similarity between elements to be searched as the distance, inner product, or angle between the two corresponding generated vectors; means for holding similarity data that contains each calculated similarity together with the two corresponding elements, and for searching the similarity data using as a key the element that corresponds to a search request from a user; and means for sorting by similarity, and outputting, those elements in the retrieved similarity data that are not the element corresponding to the user's search request.

[0008]

[Operation] A vector representation whose value in each dimension is each individual user's usage status for an element to be searched treats the fact that each user did or did not use that element as the feature value of the search target. The more users use a given element, the more finely its vector can describe it. The similarity between two vectors can be obtained as a distance, inner product, or angle, and a similarity obtained in this way can be regarded as the similarity between the search targets that correspond to the vectors. When a user supplies one search target as a search request, another search target similar to it is output on the basis of this similarity. The invention thus makes it possible to automatically assign to search targets feature values that match human perception and to search using them.

[0009]

[Embodiment] An embodiment of the present invention is described below with reference to the drawings.

[0010] FIG. 1 is a block diagram showing an embodiment of the present invention. In the figure, 10 is a user terminal that, in response to user operations, handles input and output of search requests, search results, and the like. 20 is a vector generation unit that generates vectors representing the feature values of search targets (search target vectors). 30 is a similarity calculation unit that forms every pair of two vectors from the plurality of search target vectors and calculates their similarity. 40 is a similarity holding unit that holds the similarity data calculated by the similarity calculation unit 30 and, when given a search target identifier, uses it as a search key to retrieve and output similarity data. 50 is a search unit that outputs the search target identifier from the user terminal 10 to the similarity holding unit 40, accumulates the similarity data returned by the similarity holding unit 40, and, of the search target identifiers contained in that similarity data, sorts those that are not the identifier received from the user in order of similarity and outputs them. 101 is a search target identifier given as a data distribution request from the user terminal 10 to the database, and 102 is a user identifier that identifies the user. 103 is a search target vector output from the vector generation unit 20. 104 is similarity data output from the similarity calculation unit 30, containing the search target identifiers of two search target vectors and their similarity. 105 is the search target identifier with which the search unit 50 requests similarity data from the similarity holding unit 40. 106 is the similarity data output from the similarity holding unit 40. 107 is the search target identifier sent as a search request from the user terminal 10 to the search unit 50. 108 is the list of search target identifiers sent as the search result from the search unit 50 to the user terminal 10.

[0011] Next, the operation of each part of the embodiment shown in FIG. 1 is described. Ordinarily, to receive distributed information, the user terminal 10 outputs a search target identifier 101 and a user identifier 102 to the database as a distribution request in accordance with the user's operation. The database (not shown) receives this distribution request and returns the requested information to the user terminal 10, which outputs it. This is the same as in a conventional database system, so no further explanation is given.

[0012] The vector generation unit 20 takes in each pair of search target identifier 101 and user identifier 102 output by the user terminal 10 and accumulates it as a pair. FIG. 2 schematically shows the usage history table represented by these pairs. In the figure, the search target identifier specifies the row of the table and the user identifier specifies the column. Each cell is initially blank or 0; when a user uses a search target, the corresponding cell is set to 1 or to the number of uses (the usage status). In this embodiment the vector generation unit 20 accumulates pairs of search target identifier and user identifier from the outset, rather than the table form of FIG. 2, because a table would contain a very large number of blank cells and would be expected to use memory inefficiently.
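The pair-based storage described in this paragraph can be illustrated with a minimal Python sketch. The names `usage_log` and `record_use` and the sample identifiers are hypothetical and do not come from the patent; a `Counter` is one possible container, chosen here because it keeps a use count per pair while treating absent pairs as zero.

```python
from collections import Counter

# Hypothetical usage log: one entry per (search target ID, user ID) pair.
# Only pairs that actually occurred are stored; every other cell of the
# conceptual table in FIG. 2 is an implicit zero.
usage_log = Counter()

def record_use(item_id: str, user_id: str) -> None:
    """Record one distribution request by user_id for item_id."""
    usage_log[(item_id, user_id)] += 1

record_use("paper1", "userA")
record_use("paper1", "userB")
record_use("paper2", "userA")
record_use("paper1", "userA")  # a repeated use increments the count

print(len(usage_log))                   # 3 stored pairs
print(usage_log[("paper1", "userA")])   # 2
print(usage_log[("paper2", "userB")])   # 0 (never stored)
```

With many search targets and users but few actual uses, memory grows with the number of recorded pairs rather than with the full table size.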

[0013] The vector generation unit 20 generates, for each search target identifier, a search target vector 103 whose value in each dimension is the usage status of each individual user, and outputs it together with the corresponding search target identifier. Conceptually, as shown in FIG. 2, a search target vector is generated by extracting from the usage history table the row that corresponds to the search target identifier. Because this embodiment accumulates pairs of search target identifier and user identifier instead of holding the usage history table directly, the value of each cell is taken to be 1 if the corresponding pair of search target identifier and user identifier exists and 0 otherwise.

[0014] The vector generation unit 20 generates the search target vectors when, for example, a fixed period has elapsed since the previous generation. The appropriate timing depends on how the database is used, so other possibilities include generating vectors each time new information is given to the vector generation unit 20, or once a certain amount of unprocessed information has accumulated in the vector generation unit 20.
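The timing options listed here could be combined into a single trigger check, sketched below. The function name, parameter names, and default thresholds are hypothetical; the patent does not specify any concrete values.

```python
def should_regenerate(now, last_run, pending_pairs,
                      max_age=24 * 3600, max_pending=1000):
    """Return True when the search target vectors should be rebuilt.

    now, last_run: times in seconds; pending_pairs: number of usage pairs
    not yet reflected in the vectors. Both thresholds are assumed values.
    """
    elapsed_trigger = (now - last_run) >= max_age      # fixed period elapsed
    backlog_trigger = pending_pairs >= max_pending     # enough unprocessed data
    return elapsed_trigger or backlog_trigger

print(should_regenerate(now=100_000, last_run=0, pending_pairs=10))    # True
print(should_regenerate(now=3_600, last_run=0, pending_pairs=10))      # False
print(should_regenerate(now=3_600, last_run=0, pending_pairs=1_500))   # True
```

Setting `max_pending=1` approximates the option of regenerating every time new information arrives.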

[0015] Once all the search target vectors 103 have been output from the vector generation unit 20, the similarity calculation unit 30 calculates the similarity between them. The similarity calculation unit 30 first accumulates the search target vectors 103 output from the vector generation unit 20 and then calculates the similarity for each pair of search target vectors in turn.

[0016] FIG. 3 shows an example of the processing flow by which the similarity calculation unit 30 calculates the similarity between the accumulated search target vectors. First, serial numbers are assigned to the search target identifiers and their maximum value m is set. Then, using the serial numbers, the combinations (i, j) that select two of the search target identifiers are generated in turn; for each combination, the corresponding search target vectors v_i and v_j are retrieved, the similarity s is calculated, and the pair of search target identifiers is output together with s. When no combinations remain, processing ends.
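The loop over all combinations (i, j) might look like the following Python sketch. It is not the flowchart of FIG. 3 itself: the function names are hypothetical, and the inner product is used as a stand-in similarity so that the block stays self-contained.

```python
from itertools import combinations

def inner_product(u, v):
    """Stand-in similarity for this sketch; any measure can be passed in."""
    return sum(a * b for a, b in zip(u, v))

def pairwise_similarities(vectors, similarity=inner_product):
    """Yield (id_i, id_j, s) for every unordered pair of search targets."""
    ids = sorted(vectors)   # list position plays the role of the serial number
    m = len(ids)            # number of search targets
    for i, j in combinations(range(m), 2):   # every (i, j) with i < j
        yield ids[i], ids[j], similarity(vectors[ids[i]], vectors[ids[j]])

# Hypothetical search target vectors.
vectors = {
    "paper1": [1, 1, 1, 0, 0],
    "paper2": [1, 1, 0, 0, 0],
    "paper3": [0, 0, 1, 1, 0],
}
for record in pairwise_similarities(vectors):
    print(record)
# ('paper1', 'paper2', 2)
# ('paper1', 'paper3', 1)
# ('paper2', 'paper3', 0)
```

For m search targets this produces m(m-1)/2 records, so the cost grows quadratically with the number of targets.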

[0017] In FIG. 3, the similarity s is given, for example, by the following equation (1).

[0018]

[Equation 1]

$$ s = \frac{v_i \cdot v_j}{\lVert v_i \rVert \, \lVert v_j \rVert} \qquad (1) $$

[0019] Equation (1) computes the cosine of the angle between the two vectors. The inner product, the Euclidean distance, or the Hamming distance may be used instead as the similarity calculation. With equation (1), the result is 0 when the two search targets have no similarity at all and 1 when they can be regarded as identical.
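The four measures named in this paragraph can be written in a few lines of Python, shown here as an illustrative sketch. The function names and sample vectors are hypothetical, and returning 0.0 when either vector is all zeros is an assumption of this sketch, since the text does not address that case.

```python
import math

def inner_product(u, v):
    return sum(a * b for a, b in zip(u, v))

def cosine_similarity(u, v):
    """Cosine of the angle between u and v."""
    norm = math.sqrt(inner_product(u, u) * inner_product(v, v))
    # Assumption: a target nobody has used (zero vector) is similar to nothing.
    return inner_product(u, v) / norm if norm else 0.0

def euclidean_distance(u, v):
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def hamming_distance(u, v):
    """Number of dimensions (users) in which the two vectors differ."""
    return sum(1 for a, b in zip(u, v) if a != b)

a = [1, 1, 1, 0, 0]
b = [1, 1, 0, 0, 0]
c = [0, 0, 0, 1, 1]

print(inner_product(a, b))                  # 2
print(round(cosine_similarity(a, b), 3))    # 0.816
print(cosine_similarity(a, c))              # 0.0 (no users in common)
print(euclidean_distance(a, b))             # 1.0
print(hamming_distance(a, b))               # 1
```

Note that the cosine and inner product grow with similarity, whereas the two distances shrink, so the sort direction used when ranking results has to match the measure chosen.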

[0020] The similarity holding unit 40 receives and accumulates the similarity data 104 output from the similarity calculation unit 30, each record consisting of two search target identifiers and the calculated similarity. FIG. 4 schematically shows the similarity table represented by the similarity data. In the figure, the two search target identifiers specify the row and column of the table. The value in each cell is the similarity calculated by the similarity calculation unit 30 from the two corresponding search target vectors.

[0021] In practice, a very large part of the similarity table has similarity 0, so accumulating similarities in the table form of FIG. 4 would be expected to use memory inefficiently. This embodiment therefore accumulates the similarity data, each record containing two search target identifiers and the calculated similarity, as is.
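One way to hold similarity records without a full table is sketched below in Python. The class name `SimilarityStore`, its methods, and the `skip_zero` option are hypothetical; in particular, dropping zero-similarity records is an assumption suggested by the memory argument above, not something the text states explicitly.

```python
from collections import defaultdict

class SimilarityStore:
    """Holds (id_a, id_b, similarity) records, retrievable by either ID."""

    def __init__(self, skip_zero=True):
        # Assumption: pairs with similarity 0 need not be kept at all.
        self.skip_zero = skip_zero
        self._records_by_id = defaultdict(list)

    def add(self, id_a, id_b, similarity):
        if self.skip_zero and similarity == 0:
            return
        record = (id_a, id_b, similarity)
        # Index the same record under both identifiers for lookup by key.
        self._records_by_id[id_a].append(record)
        self._records_by_id[id_b].append(record)

    def lookup(self, search_id):
        """Return every stored record that contains search_id."""
        return list(self._records_by_id.get(search_id, []))

store = SimilarityStore()
store.add("paper1", "paper2", 0.5)
store.add("paper1", "paper3", 0.0)    # dropped because skip_zero is True
store.add("paper2", "paper3", 0.25)

print(store.lookup("paper1"))  # [('paper1', 'paper2', 0.5)]
print(store.lookup("paper3"))  # [('paper2', 'paper3', 0.25)]
```

Indexing each record under both identifiers trades some memory for a direct lookup when a search target identifier arrives as the key.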

[0022] The above operations take place while users make ordinary use of the database. The operation when a search is performed is described next.

[0023] To perform a search, the user enters at the user terminal 10, as the search request, the search target identifier of an item whose content is thought to be close to what the user is looking for. The user terminal 10 sends the entered search target identifier 107 to the search unit 50.

[0024] The search unit 50 receives and stores the search target identifier 107 from the user terminal 10. It then outputs this identifier to the similarity holding unit 40 as the search target identifier 105, which serves as the search request.

[0025] On receiving the search target identifier 105 as a search request, the similarity holding unit 40 uses it as a search key to search the similarity data it holds, and returns the similarity data 106 that contains the search target identifier 105 to the search unit 50.

[0026] The search unit 50 receives the similarity data 106 from the similarity holding unit 40. For each received record, it takes the one of the two search target identifiers that is not the search target identifier 107 received from the user terminal 10, sorts these identifiers in order of the similarity in their records, and outputs them to the user terminal 10 as the search result 108, a list of search target identifiers.
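The sorting step performed by the search unit can be sketched as follows. The function name `rank_similar` and the sample records are hypothetical, and the descending sort assumes a measure such as the cosine, for which a larger value means more similar.

```python
def rank_similar(query_id, records):
    """Return the 'other' identifier of each record, most similar first.

    records: iterable of (id_a, id_b, similarity) tuples containing query_id.
    """
    others = []
    for id_a, id_b, s in records:
        if id_a == query_id:
            others.append((id_b, s))
        elif id_b == query_id:
            others.append((id_a, s))
    # Assumption: larger similarity means more similar (e.g. cosine).
    others.sort(key=lambda pair: pair[1], reverse=True)
    return [other_id for other_id, _ in others]

# Hypothetical similarity data returned for the key "paper1".
records = [
    ("paper1", "paper2", 0.5),
    ("paper3", "paper1", 0.75),
    ("paper1", "paper4", 0.25),
]
print(rank_similar("paper1", records))  # ['paper3', 'paper2', 'paper4']
```

If a distance were used as the measure instead, the sort would need to be ascending.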

[0027] FIG. 5 shows a search of a paper database as a concrete processing example of the embodiment.

[0028] FIG. 5(a) is the usage history table described with reference to FIG. 2: when user X reads paper Y using the paper database, 1 is written in the corresponding cell (X, Y). The number of uses may be written instead. Here, for example, the search target vector corresponding to paper 1 is expressed as [1 1 1 0 0 ...]. The same applies to the search target vectors for papers 2, 3, and 4.

[0029] Calculating the similarity between papers from the usage history table of FIG. 5(a) using equation (1) gives the results in FIG. 5(b). For convenience, FIG. 5(b) shows only the calculations for paper 1, that is, its similarity to each of papers 2, 3, and 4. In the same way, the similarity of paper 2 to papers 3 and 4, and of paper 3 to paper 4, can be calculated.

[0030] FIG. 5(c) is the similarity table described with reference to FIG. 4, with papers 1, 2, 3, and 4 as rows and columns and each similarity from FIG. 5(b) written in the corresponding cell.

[0031] FIG. 5(d) shows that when the user enters paper 1 as the search request, the similarity table of FIG. 5(c) is searched with paper 1 as the key, and FIG. 5(e) shows the resulting list of search results. The search result in FIG. 5(e) is one that matches human perception for the search targets (papers 2, 3, and 4).
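The whole flow of this example, from usage table to ranked result, can be traced with a small Python sketch. The actual values of FIG. 5 are not reproduced in the text, so the table below is hypothetical: only the leading values [1, 1, 1, 0, 0] of the paper 1 vector come from the description, and the rows for papers 2 to 4 are invented for illustration.

```python
import math
from itertools import combinations

# Rows are papers, columns are five hypothetical users. Only paper1's
# values follow the text; the other rows are made up for this sketch.
usage = {
    "paper1": [1, 1, 1, 0, 0],
    "paper2": [1, 1, 0, 0, 0],
    "paper3": [0, 0, 1, 1, 0],
    "paper4": [0, 0, 0, 0, 1],
}

def cosine(u, v):
    """Cosine of the angle between u and v (0.0 if either is all zeros)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u) * sum(b * b for b in v))
    return dot / norm if norm else 0.0

# Similarity data: one (id_a, id_b, s) record per pair of papers.
similarity_data = [
    (a, b, cosine(usage[a], usage[b]))
    for a, b in combinations(sorted(usage), 2)
]

def search(query_id):
    """Return (other paper, similarity) pairs, most similar first."""
    hits = [
        (b if a == query_id else a, s)
        for a, b, s in similarity_data
        if query_id in (a, b)
    ]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits

for paper, s in search("paper1"):
    print(f"{paper}: {s:.3f}")
# paper2: 0.816
# paper3: 0.408
# paper4: 0.000
```

In this made-up data, paper 2 shares two readers with paper 1 and ranks first, while paper 4 shares none and ranks last, which illustrates how shared usage drives the ranking.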

[0032]

[Effects of the Invention] As described above, the information retrieval method and apparatus of the present invention make it possible, without manual work such as keyword assignment on individual items of information, to automatically assign feature values that reflect how the information is actually used, and to perform searches that match the user's intuition.

[Brief Description of the Drawings]

[FIG. 1] A block diagram showing an embodiment of the present invention.

[FIG. 2] A diagram schematically showing the usage history table held by the vector generation unit and how vectors are generated from it.

[FIG. 3] A flowchart describing the operation of the similarity calculation unit.

[FIG. 4] A diagram showing the similarity table held by the similarity holding unit.

[FIG. 5] A diagram showing a concrete example of information retrieval according to the present invention.

[Explanation of Symbols]

10 user terminal
20 vector generation unit
30 similarity calculation unit
40 similarity holding unit
50 search unit

Claims

[Claims]

[Claim 1] An information retrieval method characterized by: representing each element to be searched as a vector whose value in each dimension is the usage status of that element by each user; calculating the similarity between elements to be searched as the distance, inner product, or angle between the two corresponding vectors; and, in response to a search request from a user, outputting one or more search targets similar to the search target of the search request on the basis of said similarity between elements to be searched.

[Claim 2] An information retrieval apparatus characterized by comprising:
means for obtaining the elements to be searched and each individual user's usage status for those elements, and generating, for each element to be searched, a vector representation whose value in each dimension is the usage status of that element by each user;
means for calculating the similarity between elements to be searched as the distance, inner product, or angle between the two corresponding generated vectors;
means for holding similarity data that contains each calculated similarity together with the two corresponding elements to be searched, and for searching said similarity data using as a key the element that corresponds to a search request from a user; and
means for sorting on the basis of similarity, and outputting, those elements contained in the retrieved similarity data that are not the element corresponding to the search request from the user.