JP5367632B2

JP5367632B2 - Knowledge amount estimation apparatus and program

Info

Publication number: JP5367632B2
Application number: JP2010091772A
Authority: JP
Inventors: 大祐佐藤; 宜仁安田; 由美子松浦; 良治片岡
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2010-04-12
Filing date: 2010-04-12
Publication date: 2013-12-11
Anticipated expiration: 2030-04-12
Also published as: JP2011221872A

Description

The present invention relates to a knowledge amount estimation apparatus and program, and in particular to a knowledge amount estimation apparatus and program that estimate, for electronic document search, a user's background such as the amount and depth of the user's knowledge in each field.

In document search, a user looking for information runs a search by specifying keywords that are likely to be related to that information. It is known that acquiring information about the background of the searching user is important for improving search accuracy. For this reason, there are conventional techniques that estimate a preference field, that is, which fields the user is interested in, from the history of keywords the user has specified or the history of electronic documents the user has viewed (see, for example, Patent Document 1).

Patent Document 1: JP 2000-148773 A

However, preference estimation of this kind can estimate the fields in which the user frequently looks up information, but it cannot estimate how well versed the user is in those fields.

For example, a user with little knowledge of personal computers may be given, as a search result, a document written for users who are well versed in personal computers, and may be unable to understand it because the terms are unfamiliar. Conversely, a traveler who has visited Kyoto many times may be given a document written for travelers visiting Kyoto for the first time, which is of no use to that user. Using the user preference field information estimated by the prior art cannot address these problems.

The present invention has been made in view of the above points, and its object is to provide a knowledge amount estimation apparatus and program capable of estimating the amount and depth of knowledge in each field while taking the user's preferences into account.

FIG. 1 is a diagram showing the principle configuration of the present invention.

The present invention (claim 1) is a knowledge amount estimation apparatus for estimating a user's amount of knowledge in each field in a search system, comprising:
query log storage means 110 storing user identifiers and a query log;
phrase/field storage means 120 storing information that associates phrases with fields;
special phrase storage means 140 storing special phrases from which a user's amount of knowledge can be estimated, together with an effect value corresponding to each special phrase;
user/query/field storage means 130 storing user identifiers, queries, and fields in association with one another;
user/field/special-phrase effect value storage means 150 storing user identifiers, fields, and special phrase effect values in association with one another;
first knowledge amount estimation element value storage means 170 storing a first knowledge amount estimation element value;
second knowledge amount estimation element value storage means 180 storing a second knowledge amount estimation element value;
field reference means 10 that refers to the phrase/field storage means 120 on the basis of a user identifier and query read from the query log storage means 110 and, if a matching phrase exists, stores the query and the phrase in the user/query/field storage means 130 for each user identifier;
special phrase effect value reference means 20 that refers to the special phrase storage means 140 on the basis of each query in the user/query/field storage means 130 and, if the query contains a special phrase, obtains the effect value corresponding to that special phrase and stores it in the user/field/special-phrase effect value storage means 150;
first knowledge amount estimation element value calculation means 30 that reads the data in the user/query/field storage means 130, obtains, for each user in each field, the sum of the degrees of expertise of the queries, divides that sum by the number of times phrases belonging to the field were entered as queries by the user, and stores the result in the first knowledge amount estimation element value storage means 170 as the first knowledge amount estimation element value;
second knowledge amount estimation element value calculation means 40 that reads the data in the user/field/special-phrase effect value storage means 150 and stores, for each field, the average of the query effect values in the second knowledge amount estimation element value storage means 180 as the second knowledge amount estimation element value; and
knowledge amount estimation means 50 that obtains a weighted average of the first knowledge amount estimation element value in the first knowledge amount estimation element value storage means 170 and the second knowledge amount estimation element value in the second knowledge amount estimation element value storage means 180, and outputs it as a knowledge amount estimate.

The present invention (claim 2) is the knowledge amount estimation apparatus of claim 1,
further comprising browsing history storage means 160 storing queries and the browsing history of up to k items preceding the time each query was entered,
wherein the first knowledge amount estimation element value calculation means 30 includes
means that refers to the browsing history storage means 160 on the basis of a query, sets the correction constant for that user and query to a positive constant less than 1 if the query is contained in the browsing history and to 1 otherwise, and multiplies the first knowledge amount estimation element value by the correction constant.

The present invention (claim 3) is the first knowledge amount estimation element value calculation means of claim 1 or claim 2, wherein,
when obtaining the first knowledge amount estimation element value, the reciprocal of the proportion of all queries in each field accounted for by the query's frequency is used as the degree of expertise of the query.

The present invention (claim 4) is the first knowledge amount estimation element value calculation means according to any one of claims 1 to 3, wherein,
when obtaining the first knowledge amount estimation element value, the average of the sum of the degrees of expertise of the queries is calculated.

The present invention (claim 5) is a knowledge amount estimation program for causing a computer to function as each of the means constituting the knowledge amount estimation apparatus according to any one of claims 1 to 4.

When estimating the amount of knowledge, the present invention calculates the degree of expertise of each query, field by field, from the queries in the query log, and takes the sum of the query expertise per genre as the first knowledge amount estimation element value for the user in that field. It also refers to the effect values of special phrases that indicate a level of expertise, such as "introduction" (入門) and "enthusiast" (マニア), calculates the average of the query effect values per field, and takes this as the second knowledge amount estimation element value. The amount of knowledge is obtained as the weighted average of these two element values. Because the user's amount of knowledge in each field can thereby be estimated with high accuracy and obtained as a numerical value, the invention contributes to providing information that better suits the user and matches the user's amount of knowledge in a given field.

To explain concretely with the "Kyoto" example from the problem statement, suppose that a user has in the past used the search system a sufficient number of times to search for and view electronic documents about Kyoto. The prior art reveals that this user is interested in "Kyoto". With the present invention, if the user has frequently searched for electronic documents about "Kyoto" using queries whose field can be identified from the database and which other users rarely choose as queries, the user is given a high knowledge amount estimate for the field "Kyoto", which shows that the user has extensive knowledge of "Kyoto".

FIG. 1 is a diagram showing the principle configuration of the present invention.
FIG. 2 is a configuration diagram of the knowledge amount estimation apparatus in an embodiment of the present invention.
FIG. 3 is an example of the query log DB in an embodiment of the present invention.
FIG. 4 is an example of the phrase/field DB in an embodiment of the present invention.
FIG. 5 is an example of the user/query/field DB in an embodiment of the present invention.
FIG. 6 is an example of the special phrase DB in an embodiment of the present invention.
FIG. 7 is an example of the user/field/special-phrase effect value DB in an embodiment of the present invention.
FIG. 8 is an example of the browsing history DB in an embodiment of the present invention.
FIG. 9 is an example of the knowledge amount estimation element value C1DB in an embodiment of the present invention.
FIG. 10 is an example of the knowledge amount estimation element value C2DB in an embodiment of the present invention.
FIG. 11 is a flowchart of the creation of the knowledge amount estimation element value C1DB in an embodiment of the present invention.
FIG. 12 is a flowchart of the creation of the knowledge amount estimation element value C2DB in an embodiment of the present invention.
FIG. 13 is a flowchart of the knowledge amount estimation unit in an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.

FIG. 2 shows the configuration of the estimation apparatus according to an embodiment of the present invention.

The estimation apparatus shown in the figure comprises a field reference unit 10, a special phrase effect value reference unit 20, a knowledge amount estimation element value C1 database (DB) creation unit 30, a knowledge amount estimation element value C2 database (DB) creation unit 40, a knowledge amount estimation unit 50, a query log DB 110, a phrase/field DB 120, a user/query/field DB 130, a special phrase DB 140, a user/field/special-phrase effect value DB 150, a browsing history DB 160, a knowledge amount estimation element value C1DB 170, and a knowledge amount estimation element value C2DB 180.

As shown in FIG. 3, the query log DB 110 consists of users' search system usage history. The present invention uses a log of the queries that users enter when looking for electronic documents with the search system. This is called the query log. The query log contains two items: the query that was typed and information identifying the user who typed it (the user identifier).

As shown in FIG. 4, the phrase/field DB 120 holds a list of correspondences between phrases and fields, used to look up which field each phrase belongs to.

As shown in FIG. 5, the user/query/field DB 130 stores, for each user, the queries and their fields.

FIG. 6 shows an example of the special phrase DB 140. When several phrases are searched as a single query, those phrases may include one that indicates the user's amount of knowledge. The special phrase DB 140 holds a list of special phrases from which the user's amount of knowledge can be estimated, together with a special phrase effect value set in advance for each. For example, phrases indicating a relatively small amount of knowledge include "introduction" (入門) and "beginner" (初心者), while phrases indicating a large amount of knowledge include "enthusiast" (マニア) and "otaku" (おたく).

As shown in FIG. 7, the user/field/special-phrase effect value DB 150 stores, for each user, fields and special phrase effect values.

As shown in FIG. 8, the browsing history DB 160 contains the user, the query, and the text data of the documents viewed up to k items (k is a constant) before the query was entered.

The knowledge amount estimation element value C1DB 170 is generated by the knowledge amount estimation element value C1DB creation unit 30 and, as shown in FIG. 9, stores the knowledge amount estimation element value C₁ of each field for each user.

The knowledge amount estimation element value C2DB 180 is generated by the knowledge amount estimation element value C2DB creation unit 40 and, as shown in FIG. 10, stores the knowledge amount estimation element value C₂ of each field for each user.

Each component of the estimation apparatus is described below.

The field reference unit 10 reads data consisting of user name and query pairs from the query log DB 110 and performs the following processing on each pair. Using the query as a key, it refers to the phrase/field DB 120. If a phrase matching the query exists, the field to which that phrase belongs is taken as the field of the query, and the result is stored in the user/query/field DB 130. If no phrase matches, the query is not stored in the user/query/field DB 130. When the query log DB 110 contains the same user and query pair more than once, two approaches are conceivable: keeping the duplicates, or merging them into a single entry.
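The lookup performed by the field reference unit can be sketched as follows. This is a minimal Python illustration, not the patented implementation: it assumes that "matching" means an exact string match between query and phrase, and the function name `build_user_query_field`, the `dedupe` flag, and the sample data are hypothetical.

```python
def build_user_query_field(query_log, phrase_to_field, dedupe=False):
    """Return (user, query, field) rows for queries that match a known phrase.

    query_log: iterable of (user, query) pairs (stand-in for query log DB 110).
    phrase_to_field: dict phrase -> field (stand-in for phrase/field DB 120).
    dedupe: if True, identical (user, query) pairs are merged into one row;
            if False, duplicates are kept. The source allows either approach.
    """
    rows = []
    seen = set()
    for user, query in query_log:
        field = phrase_to_field.get(query)  # assumption: exact match
        if field is None:
            continue  # no matching phrase: the query is not stored
        if dedupe:
            if (user, query) in seen:
                continue
            seen.add((user, query))
        rows.append((user, query, field))
    return rows


query_log = [
    ("u1", "two-seam"),
    ("u1", "two-seam"),
    ("u2", "weather"),
    ("u2", "pro baseball"),
]
phrase_to_field = {"two-seam": "baseball", "pro baseball": "baseball"}

rows_all = build_user_query_field(query_log, phrase_to_field)
rows_unique = build_user_query_field(query_log, phrase_to_field, dedupe=True)
```

Whether duplicates are kept matters later, because the row counts feed directly into the frequency statistics used for the first element value.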

The special phrase effect value reference unit 20 performs the following processing on each query in the user/query/field DB 130. It obtains a query from the user/query/field DB 130 and refers to the special phrase DB 140 on the basis of that query. If the query contains a special phrase, the value looked up is stored in the user/field/special-phrase effect value DB 150 as the special phrase effect value for the field to which the query belongs. If the query contains no special phrase, nothing is stored in the user/field/special-phrase effect value DB 150.
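A minimal Python sketch of this lookup follows. It assumes that "contains" means a substring match and that one row is stored per special phrase found in a query; the source specifies neither. The function name `lookup_special_effects` and the effect values are hypothetical, since the source gives no concrete values.

```python
def lookup_special_effects(rows, special_effects):
    """Return (user, field, effect_value) rows for queries containing a special phrase.

    rows: list of (user, query, field) (stand-in for user/query/field DB 130).
    special_effects: dict special phrase -> effect value (stand-in for DB 140).
    """
    out = []
    for user, query, field in rows:
        for phrase, effect in special_effects.items():
            if phrase in query:  # assumption: substring match
                out.append((user, field, effect))
    return out


# Hypothetical effect values: low for beginner-type phrases, high for expert-type ones.
special_effects = {"introduction": 0.2, "enthusiast": 0.9}
rows = [
    ("u1", "camera introduction", "camera"),
    ("u1", "camera lens", "camera"),
    ("u2", "camera enthusiast", "camera"),
]
effects = lookup_special_effects(rows, special_effects)
```

The query "camera lens" contains no special phrase, so it contributes no row, mirroring the rule that such queries are not stored.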

The knowledge amount estimation element value C1DB creation unit 30 performs the processing shown in FIG. 11.

Step 101) First, the knowledge amount estimation element value C1DB creation unit 30 reads the user/query/field DB 130 and obtains a user name u, a field name d, and a query t.

Step 102) It determines whether the total number of queries (N_d) belonging to target field d in the user/query/field DB 130 has already been counted. If not, processing proceeds to step 103; if so, it proceeds to step 104.

Step 103) It counts N_d, the number of rows in the user/query/field DB 130 that contain d (N_d = N_d + 1).

Step 104) It counts qf(t), the number of rows in the user/query/field DB 130 that contain query t (qf(t) = qf(t) + 1).

Step 105) To obtain an index of the user's expertise in each field, the IQF (Inverse Query Frequency) is calculated. The IQF here is based on the reciprocal of the proportion of all queries in the field that is accounted for by the query's frequency. The formula for iqf_d(t), the IQF of query t in field d, is given below.

Here, N_d is the number of all queries belonging to target field d in the user/query/field DB 130, and qf(t) is the number of occurrences of query t in the user/query/field DB 130.
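The formula referred to in step 105 is not reproduced in this text. Taken literally, the stated definition (the reciprocal of the share of field d's queries accounted for by query t) suggests the form below. Whether the original applies a further transformation, such as a logarithm, cannot be determined from the surrounding text, so this should be read as an assumed reconstruction:

```latex
\mathrm{iqf}_d(t) = \frac{N_d}{qf(t)}
```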

The higher iqf_d(t) is, the more specialized phrase t is judged to be with respect to field d. This rests on the assumption that a user who searches with phrases that are rarely entered as queries in a given field has a high level of expertise in that field. For example, in the field "baseball", a person who chooses rarely entered phrases such as "quick motion" or "two-seam" as queries is considered to know more about baseball than a person who enters queries such as "pro baseball" or "high school baseball", which are likely to be frequent in a query log covering all users.
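The counting in steps 102 to 105 can be sketched in Python as follows. The sketch assumes iqf_d(t) = N_d / qf(t), the literal reading of "reciprocal of the query's share of the field's queries", and counts qf per (field, query) pair, which coincides with the source's qf(t) when each query maps to one field. The function name `iqf_by_field` and the sample rows are hypothetical.

```python
from collections import Counter


def iqf_by_field(rows):
    """Compute an assumed IQF, N_d / qf(t), for every (field, query) pair.

    rows: list of (user, query, field) (stand-in for user/query/field DB 130).
    """
    n_d = Counter(field for _, _, field in rows)            # N_d per field
    qf = Counter((field, query) for _, query, field in rows)  # qf(t) per field
    return {key: n_d[key[0]] / count for key, count in qf.items()}


rows = [
    ("u1", "pro baseball", "baseball"),
    ("u2", "pro baseball", "baseball"),
    ("u3", "pro baseball", "baseball"),
    ("u4", "two-seam", "baseball"),
]
iqf = iqf_by_field(rows)
# "two-seam" is entered once out of four baseball queries, so it scores higher
# than "pro baseball", which is entered three times.
```

Under this reading the rare query "two-seam" receives a score three times that of "pro baseball", which matches the intuition in the baseball example.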

Step 106) It adds 1 to N_{u,d}, the number of rows in the user/query/field DB 130 that contain user u and field d.

Step 107) If query t is contained in the browsing history of up to k items preceding the time query t was entered, processing proceeds to step 108; otherwise it proceeds to step 109.

Step 108) Even a user who is not actually familiar with a field may enter a highly specialized term from that unfamiliar field as a query because of a document viewed just beforehand. A correction constant is therefore introduced to take the most recent browsing history into account. If query t is contained in the browsing history of up to k items (k is a constant) preceding the time query t was entered, the correction constant cor_{u,t} for user u and query t takes a positive constant less than 1.

Step 109) If query t is not contained in the browsing history DB 160, the correction constant cor_{u,t} is 1.
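Steps 107 to 109 amount to a windowed membership test. A minimal Python sketch follows, assuming the browsing history is a list of document texts ordered oldest to newest and that "contained" means a substring match. The function name `correction_constant` and the penalty of 0.5 are hypothetical; the source only requires a positive constant below 1.

```python
def correction_constant(query, recent_documents, k, penalty=0.5):
    """Return cor_{u,t}: `penalty` if the query appears in the last k viewed
    documents, otherwise 1.0.

    recent_documents: document texts viewed before the query, oldest first.
    penalty: hypothetical value; any constant in (0, 1) satisfies the source.
    """
    if not 0.0 < penalty < 1.0:
        raise ValueError("penalty must be a positive constant less than 1")
    window = recent_documents[-k:] if k > 0 else []  # guard: [-0:] would be the whole list
    return penalty if any(query in doc for doc in window) else 1.0


history = ["a guide to the two-seam fastball", "weather report", "train timetable"]
cor_recent = correction_constant("two-seam", history, k=3)  # term seen within the window
cor_stale = correction_constant("two-seam", history, k=2)   # term seen too long ago
```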

Step 110) If qf(t) has been obtained for all rows containing user u and field d, processing proceeds to step 111; otherwise it returns to step 101.

Step 111) In the user/query/field DB 130, let Q_{u,d} be the set of IQFs of all phrases belonging to a field d for a user u. The IQF-based knowledge amount estimation element value C₁(Q_{u,d}) of the user in that field is then obtained. An example of a formula for C₁(Q_{u,d}) is given below.

Here, N_{u,d} is the number of times phrases belonging to target field d were entered as queries, as recorded in the query log of user u. The example above takes the arithmetic mean of iqf_d(t), but a geometric mean or a harmonic mean could be used instead. When N_{u,d} is small, the estimate of the amount of knowledge is considered less accurate, so it is also possible to exclude outliers or to set the value to 0 until N_{u,d} exceeds a preset threshold.
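Since the formula for C₁ is not reproduced in this text, the following Python sketch shows one reading consistent with the description: the arithmetic mean over the user's queries in the field of cor_{u,t} · iqf_d(t), with the optional rule that the value stays 0 until N_{u,d} exceeds a threshold. Applying the correction constant per query inside the sum is an assumption, and the name `c1` and its `threshold` parameter are hypothetical.

```python
def c1(entries, threshold=0):
    """Assumed form: C1 = (1 / N_ud) * sum(cor * iqf) over the user's queries in a field.

    entries: list of (iqf, cor) pairs, one per query entered by user u in field d.
    threshold: C1 is 0 until N_ud exceeds this value (optional rule in the source).
    """
    n_ud = len(entries)
    if n_ud == 0 or n_ud <= threshold:
        return 0.0
    return sum(iqf * cor for iqf, cor in entries) / n_ud


# Three queries by one user in one field; the second was seen in a recently
# viewed document, so it carries a correction constant below 1.
entries = [(4.0, 1.0), (2.0, 0.5), (1.0, 1.0)]
value = c1(entries)                      # (4.0 + 1.0 + 1.0) / 3
value_gated = c1(entries, threshold=3)   # N_ud = 3 does not exceed 3
```

Swapping the arithmetic mean for a geometric or harmonic mean, as the text suggests, would only change the final line of the function.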

Step 112) The calculated C₁(Q_{u,d}) is stored in the knowledge amount estimation element value C1DB 170 together with user u and field d.

Step 113) If all rows of the user/query/field DB 130 have been read, the processing ends; if unread rows remain, processing returns to step 101.

The knowledge amount estimation element value C2DB creation unit 40 performs the processing shown in FIG. 12.

Step 201) The knowledge amount estimation element value C2DB creation unit 40 reads the user/field/special-phrase effect value DB 150 and obtains a user name u, a field name d, and a special phrase effect value S_{u,d}.

Step 202) Next, it adds 1 to N_{u,d,s}, the number of rows in the user/field/special-phrase effect value DB 150 that contain user u and field d (N_{u,d,s} = N_{u,d,s} + 1).

Step 203) It determines whether all rows containing u and d have been read from the user/field/special-phrase effect value DB 150. If so, processing proceeds to step 204; if not, it returns to step 201.

Step 204) Let S_{u,d} be the set of special phrase effect values s for user u in field d. The special-phrase-based knowledge amount estimation element value C₂(S_{u,d}) is calculated by the following formula.

In the above formula, N_{u,d,s} is the number of times phrases that belong to target field d and have a defined special phrase effect value were entered as queries, as recorded in the query log of user u. The example above takes the arithmetic mean, but a geometric mean or a harmonic mean could be used instead.
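The description of C₂ as an arithmetic mean of the effect values over N_{u,d,s} rows can be sketched in Python as follows. The function name `c2`, the choice to return `None` when the user has no special-phrase rows in the field, and the sample values are assumptions made for illustration.

```python
def c2(effect_values):
    """Arithmetic mean of the special phrase effect values for one user and field.

    effect_values: list of effect values s in S_{u,d}; its length is N_{u,d,s}.
    Returns None when there are no rows, so a caller can tell "no value" from 0.
    """
    if not effect_values:
        return None
    return sum(effect_values) / len(effect_values)


# Hypothetical effect values collected for one user in one field.
mean_effect = c2([0.25, 0.75, 0.5])
no_effect = c2([])
```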

Step 205) The knowledge amount estimation element value C₂(S_{u,d}) obtained above is stored in the knowledge amount estimation element value C2DB 180 together with user name u and field name d.

Step 206) If all rows of the user/field/special-phrase effect value DB 150 have been read, the processing ends; if rows remain, processing returns to step 201.

The knowledge amount estimation unit 50 performs the processing shown in FIG. 13.

Step 301) The knowledge amount estimation unit 50 obtains a user u and a field d.

Step 302) It determines whether a row containing user name u and field d exists in the knowledge amount estimation element value C1DB 170. If no such row exists, processing proceeds to step 304.

Step 303) It obtains the value of the knowledge amount estimation element value C₁(Q_{u,d}) from the row containing user name u and field d in the knowledge amount estimation element value C1DB 170.

Step 304) The knowledge amount estimation unit 50 looks up the knowledge amount estimation element value C2DB 180 using user u and field d as the key. If no row containing u and d exists, processing proceeds to step 306.

Step 305) If a row containing u and d exists in the knowledge amount estimation element value C2DB 180, it obtains the knowledge amount estimation element value C₂(S_{u,d}).

Step 306) It determines whether at least one of the knowledge amount estimation element values C₁ and C₂ has been obtained. If so, processing proceeds to step 307; if not, the processing ends.

Step 307) Using the knowledge amount estimation element values C₁(Q_{u,d}) and C₂(S_{u,d}), it calculates and outputs the knowledge amount estimate by the following formula. If step 306 found a value for only one of the two, the missing element value is treated as 0.

Here, w₁ and w₂ are constants tuned in advance that represent the weights applied to C₁ and C₂, respectively.
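Steps 306 and 307 can be sketched in Python as follows. The formula itself is not reproduced in this text, so the sketch assumes the usual weighted average, (w₁C₁ + w₂C₂) / (w₁ + w₂); an unnormalized weighted sum would be an equally plausible reading. The function name `estimate_knowledge` and the default weights are hypothetical.

```python
def estimate_knowledge(c1, c2, w1=0.5, w2=0.5):
    """Combine the two element values into a knowledge amount estimate.

    c1, c2: element values, or None when no row exists for the user and field.
    w1, w2: pre-tuned weights (hypothetical defaults).
    Returns None when neither value exists (step 306 ends without output).
    """
    if c1 is None and c2 is None:
        return None
    c1 = 0.0 if c1 is None else c1  # step 307: a missing element counts as 0
    c2 = 0.0 if c2 is None else c2
    return (w1 * c1 + w2 * c2) / (w1 + w2)  # assumed normalization


both = estimate_knowledge(2.0, 0.5, w1=3, w2=1)      # (6.0 + 0.5) / 4
only_c1 = estimate_knowledge(2.0, None, w1=1, w2=1)  # (2.0 + 0.0) / 2
neither = estimate_knowledge(None, None)
```

Note that treating a missing element as 0 lowers the estimate relative to using the available element alone, which is what the text of step 307 prescribes.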

Conventionally, the approach taken was to estimate the background regarding a user's preferences by using the search logs of search system users. In this embodiment, by using the query log, which is part of the search system's user log, together with the user's browsing history, the background regarding the user's amount of knowledge can be estimated with high accuracy.

The operation of each component of the knowledge amount estimation apparatus shown in FIG. 2 can be built as a program and installed on a computer used as the knowledge amount estimation apparatus, or distributed over a network.

The program so built can also be stored on a hard disk or on a portable storage medium such as a flexible disk or CD-ROM, and installed on a computer or distributed.

The present invention is not limited to the embodiment described above, and various modifications and applications are possible within the scope of the claims.

10 Field reference means, field reference unit
20 Special phrase effect value reference means, special phrase effect value reference unit
30 First knowledge amount estimation element value calculation means, knowledge amount estimation element value C1DB creation unit
40 Second knowledge amount estimation element value calculation means, knowledge amount estimation element value C2DB creation unit
50 Knowledge amount estimation means, knowledge amount estimation unit
110 Query log storage means, query log DB
120 Phrase/field storage means, phrase/field DB
130 User/query/field storage means, user/query/field DB
140 Special phrase storage means, special phrase DB
150 User/field/special-phrase effect value storage means, user/field/special-phrase effect value DB
160 Browsing history storage means, browsing history DB
170 First knowledge amount estimation element value storage means, knowledge amount estimation element value C1DB
180 Second knowledge amount estimation element value storage means, knowledge amount estimation element value C2DB

Claims

A knowledge amount estimation device for estimating, in a search system, a user's amount of knowledge in each field, the device comprising:
query log storage means storing user identifiers and a query log;
phrase/field storage means storing information that associates phrases with fields;
special phrase storage means storing special phrases from which a user's amount of knowledge can be estimated, together with an effect value corresponding to each special phrase;
user/query/field storage means for storing user identifiers, queries, and fields in association with one another;
user/field/special phrase effect value storage means for storing user identifiers, fields, and special phrase effect values in association with one another;
first knowledge amount estimation element value storage means for storing a first knowledge amount estimation element value;
second knowledge amount estimation element value storage means for storing a second knowledge amount estimation element value;
field reference means for referring to the phrase/field storage means on the basis of the user identifier and the query read from the query log storage means and, when a matching phrase exists, storing the query and the phrase in the user/query/field storage means for each user identifier;
special phrase effect value reference means for referring to the special phrase storage means on the basis of each query in the user/query/field storage means and, when the query contains a special phrase, acquiring the effect value corresponding to that special phrase and storing it in the user/field/special phrase effect value storage means;
first knowledge amount estimation element value calculation means for reading the data in the user/query/field storage means, obtaining, for each user in each field, the sum of the degrees of expertise of the queries, dividing that sum by the number of times all phrases belonging to the field were input as queries by the user, and storing the result in the first knowledge amount estimation element value storage means as the first knowledge amount estimation element value;
second knowledge amount estimation element value calculation means for reading the data in the user/field/special phrase effect value storage means and storing, for each field, the average of the effect values of the queries in the second knowledge amount estimation element value storage means as the second knowledge amount estimation element value; and
knowledge amount estimation means for obtaining a weighted average of the first knowledge amount estimation element value in the first knowledge amount estimation element value storage means and the second knowledge amount estimation element value in the second knowledge amount estimation element value storage means, and outputting the weighted average as a knowledge amount estimate.
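
The calculation recited in this claim can be illustrated with a minimal Python sketch. It is not the patented implementation: the function name `estimate_knowledge`, the in-memory dictionaries standing in for the storage means, the `expertise` lookup table, the default `weight` of 0.5, and the choice to treat the second element value as 0 when a user entered no special phrase in a field are all assumptions made for illustration.

```python
from collections import defaultdict


def estimate_knowledge(query_log, phrase_field, expertise, special_effect, weight=0.5):
    """Return {(user, field): estimate} for a list of (user, query) pairs.

    All parameter names are hypothetical stand-ins for the storage means.
    """
    expertise_sum = defaultdict(float)  # per (user, field): sum of expertise degrees
    query_count = defaultdict(int)      # per (user, field): matching queries seen
    effect_values = defaultdict(list)   # per (user, field): special phrase effect values

    for user, query in query_log:
        # Field reference step: keep only queries that match a known phrase.
        field = phrase_field.get(query)
        if field is None:
            continue
        key = (user, field)
        expertise_sum[key] += expertise.get(query, 0.0)
        query_count[key] += 1
        # Special phrase step: record the effect value of each contained phrase.
        for phrase, effect in special_effect.items():
            if phrase in query:
                effect_values[key].append(effect)

    estimates = {}
    for key, count in query_count.items():
        c1 = expertise_sum[key] / count  # first element value
        effects = effect_values.get(key, [])
        # Assumption: no special phrase queries means a second element value of 0.
        c2 = sum(effects) / len(effects) if effects else 0.0
        estimates[key] = weight * c1 + (1.0 - weight) * c2  # weighted average
    return estimates
```

The sketch uses an exact match for the phrase lookup and a substring test for special phrases, mirroring the claim's distinction between a "matching phrase" and a query that "contains" a special phrase.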

The knowledge amount estimation device according to claim 1, further comprising browsing history storage means storing a query together with the browsing history of the k items preceding the time at which the query was input,
wherein the first knowledge amount estimation element value calculation means includes
means for referring to the browsing history storage means on the basis of the query, setting a correction constant for the user's query to a positive constant less than 1 when the query is contained in the browsing history and to 1 when it is not, and multiplying the first knowledge amount estimation element value by the correction constant.
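
The correction described in this claim can be sketched as follows. This is one possible reading, not the patented implementation: the function names, the representation of the browsing history as a list of page texts, the value 0.5 for the constant `alpha`, and the decision to apply the constant to each query's expertise term before averaging are assumptions.

```python
def correction_constant(query, recent_pages, alpha=0.5):
    """Return alpha if the query text occurs in any recently browsed page, else 1.

    recent_pages is a hypothetical list holding the text of the k pages
    browsed before the query was entered.
    """
    if not 0.0 < alpha < 1.0:
        raise ValueError("alpha must be a positive constant less than 1")
    return alpha if any(query in page for page in recent_pages) else 1.0


def corrected_first_element(queries, expertise, histories, alpha=0.5):
    """Average the expertise degrees of the queries after applying the correction.

    queries[i] was entered after browsing histories[i]; expertise maps a
    query to its degree of expertise (both are illustrative inputs).
    """
    if not queries:
        return 0.0
    total = 0.0
    for query, pages in zip(queries, histories):
        total += correction_constant(query, pages, alpha) * expertise.get(query, 0.0)
    return total / len(queries)
```

With this reading, a query whose wording appeared in a page the user had just browsed contributes less to the first element value than the same query entered without that exposure.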

The knowledge amount estimation device according to claim 1 or 2,
wherein, when obtaining the first knowledge amount estimation element value, the first knowledge amount estimation element value calculation means uses, as the degree of expertise of a query, the reciprocal of the proportion that the query's frequency represents among all queries in each field.
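
The reciprocal of the frequency proportion can be computed as in the short sketch below. The function name and the input format (a flat list of all queries logged for one field, across all users) are assumptions made for illustration.

```python
from collections import Counter


def expertise_by_inverse_frequency(field_queries):
    """Map each query to 1 / (its share of all queries in the field).

    field_queries is a hypothetical list of every query logged for one field.
    """
    counts = Counter(field_queries)
    total = len(field_queries)
    # share = n / total, so the reciprocal is total / n
    return {query: total / n for query, n in counts.items()}
```

Under this definition a query that makes up a small share of the field's queries receives a high degree of expertise, while a query that everyone enters receives a value close to 1.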

The knowledge amount estimation device according to any one of claims 1 to 3,
wherein, when obtaining the first knowledge amount estimation element value, the first knowledge amount estimation element value calculation means calculates the average of the sum of the degrees of expertise of the queries.

A knowledge amount estimation program for causing a computer to function as each of the means constituting the knowledge amount estimation device according to any one of claims 1 to 4.