JP2016133824A

JP2016133824A - Document search apparatus, document search method, and program

Info

Publication number: JP2016133824A
Application number: JP2015006014A
Authority: JP
Inventors: Yugo Nishikawa; Kazuhisa Ono; Masuhiro Ozawa; Seiji Matsumoto; Osamu Nakagawa
Original assignee: Dai Nippon Printing Co Ltd
Current assignee: Dai Nippon Printing Co Ltd
Priority date: 2015-01-15
Filing date: 2015-01-15
Publication date: 2016-07-25
Anticipated expiration: 2035-01-15
Also published as: JP6524668B2

Abstract

PROBLEM TO BE SOLVED: To provide a document search apparatus, a document search method, and a program that search a plurality of documents for documents highly related to a word selected by a user and display them according to their degree of relevance. SOLUTION: When a word representing a word or a document displayed on a terminal is selected, a document search server stores the selected word in an array. The document search system takes each word stored in the array (the selected word or a history word), searches an inter-word relevance DB, an inter-document relevance DB, and a document word relevance DB for words or documents highly related to that word, and acquires their degrees of relevance to calculate scores. The scores of the words or documents obtained for all the words stored in the array are merged to generate a score list of words or documents. On the basis of this score list, the terminal outputs to its display section words representing those words or documents, arranged at distances from the selected word that are based on their scores. SELECTED DRAWING: Figure 9

Description

The present invention relates to a document search apparatus and the like for searching a plurality of documents for highly relevant documents or document information.

Search engines for finding desired information among many pieces of information (for example, Google (registered trademark)) have long existed.

Patent Document 1 describes a search system that, each time the user inputs a word, repeatedly searches for other words related to the input word and, based on the word space generated in this way, presents words or word information lying between the input words.

Patent Document 1: JP 2012-123639 A

However, in the technique described in Patent Document 1, the search results presented to the user are limited to what lies between the input words, so the user never obtained new information that he or she had not anticipated at all.

The present invention has been made in view of the above problem, and its object is to provide a document search apparatus and the like that searches for documents and the like highly relevant to a word selected by the user and displays them according to the degree of relevance.

In order to solve the above problem, a first invention is a document search apparatus comprising: storage means for storing a plurality of pieces of text data and the degree of association between the pieces of text data; first display means for displaying at least a part of each of a plurality of the pieces of text data on a screen; extraction means for extracting other text data for text data selected on the screen by referring to the degree of association; and second display means for displaying, on the screen, at least a part of the selected text data and of the text data extracted by the extraction means.
According to the first invention, the results of searching for documents and the like highly related to the word selected by the user can be displayed according to their degree of association with the selected word.
In the present invention, text data includes at least one of a document, a word appearing in a document, and information about a document (such as the author name of the document).

Preferably, the extraction means extracts a predetermined number of pieces of text data having a high degree of association.
In this way, a predetermined number of documents and the like highly related to the word selected by the user can be extracted and presented to the user. The user can then select a desired document or the like from among the retrieved results and perform a further document search.
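Extracting "a predetermined number" of highly associated items amounts to a top-N lookup over the stored degrees of association. The sketch below is a minimal illustration under stated assumptions, not the patent's implementation: the relevance table is a hypothetical in-memory dict keyed by item pairs, each pair is stored once, and the association is treated as symmetric.

```python
from typing import Dict, List, Tuple

# Hypothetical in-memory relevance table: (item_a, item_b) -> degree of association.
# Each pair is stored once; the relation is treated as symmetric.
Relevance = Dict[Tuple[str, str], float]


def extract_top_n(selected: str, relevance: Relevance, n: int) -> List[Tuple[str, float]]:
    """Return up to n items most strongly associated with `selected`, best first."""
    candidates: Dict[str, float] = {}
    for (a, b), degree in relevance.items():
        if a == selected:
            other = b
        elif b == selected:
            other = a
        else:
            continue
        # If a pair happens to be stored in both directions, keep the higher degree.
        candidates[other] = max(degree, candidates.get(other, float("-inf")))
    ranked = sorted(candidates.items(), key=lambda kv: kv[1], reverse=True)
    return ranked[:n]


example_table: Relevance = {
    ("printing", "ink"): 0.8,
    ("printing", "paper"): 0.6,
    ("ink", "pen"): 0.7,
    ("press", "printing"): 0.4,
}
print(extract_top_n("printing", example_table, 2))  # [('ink', 0.8), ('paper', 0.6)]
```

A real system would more likely issue an ORDER BY ... LIMIT query against the relevance DBs. The dict scan here only shows the ranking logic.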

Preferably, the first display means and/or the second display means change the display method according to the degree of association.
This makes it possible to present the degree of association between the displayed documents and the like to the user in a visually easy-to-understand manner.

Preferably, the first display means and/or the second display means change the display method according to the frequency of occurrence of the text data.
This makes it possible to present how frequently the displayed documents and the like occur to the user in a visually easy-to-understand manner.

Preferably, the second display means changes the display method by multiplying the degree of association of the extracted text data by n.
This makes it possible to present the degree of association between the selected word and the retrieved documents and the like to the user in a visually easy-to-understand manner.

Preferably, the storage means records a history of the selected text data, and the extraction means extracts text data by referring to the degree of association with respect to every piece of text data that has ever been selected.
This makes it possible to search for highly relevant documents and the like using the history of previously selected words, and thus to present information close to the user's search purpose.

Preferably, the second display means changes the display method by multiplying the degree of association of the most recently selected text data by n.
This makes it possible to search for documents and the like highly relevant to the currently selected word, which is expected to be closest to the user's search purpose, and to present them to the user.
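The history-based extraction, the weighting of the last-selected word by n, and the merged score list mentioned in the abstract can be combined in one short sketch. It rests on several assumptions, none of them confirmed by the text: merging means summing, n is applied as a score weight (the text describes it as a display adjustment), the related-item lookup is a hypothetical dict standing in for the three relevance DBs, and already-selected words are not filtered out.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def merged_score_list(history: List[str],
                      related: Dict[str, Dict[str, float]],
                      n: float = 2.0) -> List[Tuple[str, float]]:
    """Merge relevance over every word in the selection history into one score list.

    history: selected words, oldest first; the last entry is the current selection.
    related: hypothetical lookup, word -> {related word or document: degree}.
    n:       assumed weight applied to the most recently selected word.
    """
    totals: Dict[str, float] = defaultdict(float)
    last = len(history) - 1
    for i, word in enumerate(history):
        weight = n if i == last else 1.0
        for item, degree in related.get(word, {}).items():
            totals[item] += weight * degree  # merge by summing (an assumption)
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)


related = {
    "printing": {"ink": 0.75, "paper": 0.5},
    "ink": {"pen": 0.75, "paper": 0.25},
}
print(merged_score_list(["printing", "ink"], related))
# [('pen', 1.5), ('paper', 1.0), ('ink', 0.75)]
```

In this example, "paper" collects score from both history words. "pen" ranks first because it is related to the current selection, which carries weight n.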

Preferably, relevance calculation means for calculating the degree of association between pieces of text data is provided.
This makes it possible to calculate the degree of association between documents, between words appearing in documents, or between a document and a word.

A second invention is a document search method comprising: a storage step of storing a plurality of pieces of text data and the degree of association between the pieces of text data; a first display step of displaying at least a part of each of a plurality of the pieces of text data on a screen; an extraction step of extracting other text data for text data selected on the screen by referring to the degree of association; and a second display step of displaying, on the screen, at least a part of the selected text data and of the text data extracted in the extraction step.
According to the second invention, the results of searching for documents and the like highly related to the word selected by the user can be displayed according to their degree of association with the selected word.

A third invention is a program that causes a computer to function as a document search apparatus comprising: storage means for storing a plurality of pieces of text data and the degree of association between the pieces of text data; first display means for displaying at least a part of each of a plurality of the pieces of text data on a screen; extraction means for extracting other text data for text data selected on the screen by referring to the degree of association; and second display means for displaying, on the screen, at least a part of the selected text data and of the text data extracted by the extraction means.
According to the third invention, the results of searching for documents and the like highly related to the word selected by the user can be displayed according to their degree of association with the selected word.

The present invention can provide a document search apparatus and the like that searches for documents and the like highly relevant to a word selected by the user and displays them according to the degree of relevance.

FIG. 1 is a system configuration diagram showing a configuration example of the document search system.
FIG. 2 is a block diagram showing a hardware configuration example of the document search server and the terminal.
FIG. 3 is a flowchart showing the flow of the document registration process.
FIG. 4 shows an example of the data stored in the document DB, the word DB, and the document word frequency DB.
FIG. 5 is a flowchart showing the flow of the inter-word relevance data generation process.
FIG. 6 is a flowchart showing the flow of the inter-document relevance data generation process.
FIG. 7 is a flowchart showing the flow of the document word relevance data generation process.
FIG. 8 shows an example of the data stored in the inter-word relevance DB, the inter-document relevance DB, and the document word relevance DB.
FIG. 9 is a flowchart showing the flow of the document search process.
FIG. 10 shows an example of the first screen.
FIG. 11 shows an example of the array that holds the history.
FIG. 12 shows an example of the score list.
FIG. 13 shows an example of the second screen.
FIG. 14 shows an example of the selection screen.

Preferred embodiments of the present invention will now be described in detail with reference to the drawings.
In the present invention, text data includes at least one of a document, a word appearing in a document, and information about a document (such as the author name of the document).

FIG. 1 shows a configuration example of a document search system 200 according to the present embodiment. As shown in FIG. 1, the system consists of a document search server 100 and one or more terminals 101 used by users, communicatively connected to each other via a network 102.

The document search server 100 is a server device that stores various databases (see FIGS. 3 and 5). Details will be described later.

The terminal 101 is a computer used by a user and displays search results and the like transmitted from the document search server 100 via the network 102. The terminal 101 also accepts the user's selection of a word on the first screen (FIG. 10) or the second screen (FIG. 13), described later, and transmits it to the document search server 100. Instead of a general-purpose computer, the terminal 101 may be a mobile phone, a mobile terminal, or the like.

The functions of the document search server 100 and the terminal 101 described below may be integrated so that a single computer realizes the functions of the document search system 200. Conversely, the functions of the document search server 100 need not be built on a single computer and may be distributed over separate computers connected by the network 102.

The document search system 200 in this embodiment searches for document information highly relevant to a word selected by the user and presents it on the terminal 101.

FIG. 2 is a block diagram showing a hardware configuration example of the document search server 100 (terminal 101) according to the present embodiment. As shown in FIG. 2, the document search server 100 (terminal 101) is configured by connecting, for example, a control unit 11, a storage unit 12, a media input/output unit 13, a communication control unit 14, an input unit 15, a display unit 16, and a peripheral device I/F unit 17 via a bus 18.

The control unit 11 comprises a CPU (Central Processing Unit), a ROM (Read Only Memory), a RAM (Random Access Memory), and the like.
The CPU loads a program stored in the storage unit 12, the ROM, a storage medium, or the like into a work memory area on the RAM and executes it, drives and controls each device connected via the bus 18, and thereby realizes the processes, described later, performed by the document search server 100 (terminal 101). The ROM is non-volatile memory that permanently holds the computer's boot program, programs such as the BIOS, data, and the like. The RAM is volatile memory that temporarily holds loaded programs, data, and the like, and provides a work area used by the control unit 11 for each process.

The storage unit 12 is an HDD (Hard Disk Drive) or the like and stores the programs executed by the control unit 11, the data needed to execute them, an OS (Operating System), and the like. This program code is read by the control unit 11 as needed, transferred to the RAM, and read and executed by the CPU.

The media input/output unit 13 is a media input/output device such as a CD drive, a DVD drive, an MO drive, or a floppy (registered trademark) disk drive, and inputs and outputs data.
The communication control unit 14 has a communication control device, a communication port, and the like. It is a communication interface that mediates communication between the computer and a network, and controls communication with other devices via the network.

The input unit 15 is used to input data and has input devices such as a keyboard, a pointing device such as a mouse, and a numeric keypad. It outputs the input data to the control unit 11.
The display unit 16 comprises a display device such as a CRT monitor or a liquid crystal panel and a logic circuit (such as a video adapter) that performs display processing in cooperation with the display device, and displays on the display device the display information input under the control of the control unit 11.
The input unit 15 and the display unit 16 may be integrated, for example as a display with a touch panel.

The peripheral device I/F (interface) unit 17 is a port for connecting peripheral devices to the computer; the computer exchanges data with peripheral devices via the peripheral device I/F unit 17. The peripheral device I/F unit 17 is configured with USB, IEEE 1394, RS-232C, or the like, and usually has a plurality of peripheral device I/Fs. Peripheral devices may be connected by wire or wirelessly.
The bus 18 is a path that mediates the exchange of control signals, data signals, and the like between the devices.

Next, the document registration process executed when the document search server 100 accepts registration of a new document will be described with reference to FIGS. 3 and 4. FIG. 3 is a flowchart showing the flow of the document registration process. FIG. 4 shows an example of the data stored in the document DB, the word DB, and the document word frequency DB.

On receiving input of a document from the input unit 15 or the like (step S101), the control unit 11 of the document search server 100 assigns a new document ID 21 and registers the document in the document DB (step S102).

FIG. 4(a) shows an example of the document DB. The document DB holds document information, and data is added to it each time registration of a new document is accepted. The document DB stores information such as a document ID 21, a data type 22, a heading 23, an author 24, a theme name 25, and a body text 26. The document ID 21 uniquely identifies a document and is assigned each time registration of a document is accepted.

In the present embodiment, documents include news items, commentaries on themes, books, magazines, articles, papers, and other written materials. The data type 22 records which of these kinds of document a given document is.

The control unit 11 of the document search server 100 performs morphological analysis on the newly registered document (its heading 23, author 24, theme name 25, body text 26, and so on) (step S103) and extracts important words (step S104). General-purpose software can be used, for example, for the morphological analysis and important-word extraction.

The control unit 11 of the document search server 100 determines whether each extracted important word is already registered in the word DB. If it is not, the control unit 11 assigns a new word ID 31 and registers the word's notation 32 in the word DB (step S105).

FIG. 4(b) shows an example of the word DB. The word DB holds word information, storing each word ID 31 in association with its notation 32. New data is added when an important word extracted in step S104 of the document registration process is not yet registered in the word DB. The word ID 31 uniquely identifies a word and is assigned each time data is added.

The control unit 11 of the document search server 100 counts, for each important word extracted in step S104, the number of times it appears in the newly registered document. The control unit 11 then registers the document ID 21 of the document, the word ID 31 of the important word, and the number of appearances in the document word frequency DB (step S107), and ends the process.

FIG. 4(c) shows an example of the document word frequency DB. The document word frequency DB stores the frequency (appearance frequency 43) with which the important word with word ID 42 appears in the document with document ID 41. The document ID 41 corresponds to the document ID 21, and the word ID 42 corresponds to the word ID 31.

As described above, on accepting registration of a new document, the document search server 100 registers document information such as the data type, heading, author, and body text in the document DB and extracts the document's important words. It registers the extracted important words in the word DB, counts the number of times each appears in the document, and registers the counts in the document word frequency DB. Each execution of the document registration process therefore adds as many new records to the document word frequency DB as there are extracted important words.
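The registration flow (S101 to S107) can be sketched with three in-memory tables that mirror FIG. 4. This is an illustration under assumptions, not the patent's implementation. The `extract_important_words` helper is a hypothetical stand-in for morphological analysis: it splits lower-cased alphabetic tokens and drops a tiny stop list. The record field names (`heading`, `author`, `theme`, `body`) and sequential integer IDs are also illustrative.

```python
import re
from collections import Counter
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Assumed stand-in for morphological analysis and important-word extraction
# (steps S103-S104): lower-cased alphabetic tokens minus a tiny stop list.
STOPWORDS = {"a", "an", "and", "for", "of", "the"}


def extract_important_words(text: str) -> List[str]:
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]


@dataclass
class SearchStore:
    documents: Dict[int, dict] = field(default_factory=dict)  # document DB: doc ID -> record
    words: Dict[str, int] = field(default_factory=dict)       # word DB: notation -> word ID
    doc_word_freq: Dict[Tuple[int, int], int] = field(default_factory=dict)  # (doc ID, word ID) -> count

    def register(self, record: dict) -> int:
        doc_id = len(self.documents) + 1  # S102: assign a new document ID
        self.documents[doc_id] = record
        text = " ".join(str(record.get(k, "")) for k in ("heading", "author", "theme", "body"))
        counts = Counter(extract_important_words(text))  # S103-S104 plus per-word counting
        for notation, count in counts.items():
            if notation not in self.words:  # S105: register unseen words
                self.words[notation] = len(self.words) + 1
            self.doc_word_freq[(doc_id, self.words[notation])] = count  # S107
        return doc_id


store = SearchStore()
store.register({"heading": "Printing ink", "body": "Ink for offset printing"})
store.register({"heading": "Paper and ink"})
print(store.words)  # {'printing': 1, 'ink': 2, 'offset': 3, 'paper': 4}
```

As the text states, each registration adds one frequency record per distinct important word. The second document reuses word ID 2 for "ink" instead of creating a new entry.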

Next, with reference to FIGS. 5 to 8, the following describes how the document search server 100 uses the DBs shown in FIG. 4 to calculate, and store in advance, the degree of association between words, between documents, and between documents and words.

FIG. 5 is a flowchart showing the flow of the inter-word relevance data generation process.
The control unit 11 of the document search server 100 selects and inputs a word A and a word B from the word DB (FIG. 4(b)) stored in the storage unit 12 (step S201). The control unit 11 then calculates the inter-word relevance between the word A and the word B (step S202).

As the inter-word relevance, for example, mutual information, a measure of how strongly the word A and the word B co-occur in documents, is calculated. Specifically, the number of documents in which the word A appears, the number in which the word B appears, and the number in which both appear are each counted using the document word frequency DB (FIG. 4(c)), and the mutual information of the word A and the word B is calculated from these values.
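The text does not give the exact mutual-information formula. One common form built from the same three document counts is document-level pointwise mutual information:

PMI(A, B) = log2(N × n_AB / (n_A × n_B))

The sketch below assumes this form. It also needs the total number of documents N, which the text does not mention. The table layout `(document ID, word ID) -> count` mirrors FIG. 4(c).

```python
import math
from typing import Dict, Set, Tuple

DocWordFreq = Dict[Tuple[int, int], int]  # (document ID, word ID) -> appearance count


def docs_with(freq: DocWordFreq, word_id: int) -> Set[int]:
    """IDs of the documents in which `word_id` appears at least once."""
    return {doc for (doc, word), count in freq.items() if word == word_id and count > 0}


def word_pmi(freq: DocWordFreq, word_a: int, word_b: int, total_docs: int) -> float:
    """Document-level pointwise mutual information of two words (log base 2)."""
    docs_a, docs_b = docs_with(freq, word_a), docs_with(freq, word_b)
    both = len(docs_a & docs_b)
    if both == 0:
        return float("-inf")  # the words never co-occur
    return math.log2(total_docs * both / (len(docs_a) * len(docs_b)))


freq: DocWordFreq = {(1, 1): 2, (1, 2): 1, (2, 3): 1, (3, 3): 1, (4, 3): 2}
print(word_pmi(freq, 1, 2, total_docs=4))  # 2.0
```

Words 1 and 2 each appear only in document 1. They therefore co-occur four times more often than independence would predict, which gives log2(4) = 2.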

The control unit 11 of the document search server 100 adds the word ID of the word A, the word ID of the word B, and the calculated inter-word relevance to the inter-word relevance DB as a new record (step S203), and ends the process.

As described above, the document search server 100 calculates the degree of association between two different words registered in the word DB and stores it in the inter-word relevance DB.

FIG. 8 shows an example of the data stored in the inter-word relevance DB, the inter-document relevance DB, and the document word relevance DB.

The inter-word relevance DB shown in FIG. 8(a) stores the relevance 53 between different words (word ID 51 and word ID 52) registered in the word DB. Each time the document search server 100 executes the inter-word relevance data generation process shown in FIG. 5, new data is added to the inter-word relevance DB.

FIG. 6 is a flowchart showing the flow of the inter-document relevance data generation process.
The control unit 11 of the document search server 100 selects and inputs a document A and a document B from the document DB (FIG. 4(a)) stored in the storage unit 12 (step S301). The control unit 11 then calculates the inter-document relevance between the document A and the document B (step S302).

Here, as the inter-document relevance, for example, the cosine similarity between the document A and the document B is calculated. Specifically, the words appearing in the document A and their appearance frequencies, and the words appearing in the document B and their appearance frequencies, are acquired from the document word frequency DB (FIG. 4(c)). From these, a document vector A and a document vector B representing the features of the two documents are generated, and the cosine similarity is calculated from these vectors.
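Cosine similarity between two documents can be sketched as follows. Each document vector uses raw appearance frequencies keyed by word ID. The text does not say whether the frequencies are weighted (for example by TF-IDF), so raw counts are an assumption here.

```python
import math
from typing import Dict, Tuple

DocWordFreq = Dict[Tuple[int, int], int]  # (document ID, word ID) -> appearance count


def document_vector(freq: DocWordFreq, doc_id: int) -> Dict[int, float]:
    """Sparse vector for one document: word ID -> raw appearance frequency."""
    return {word: float(count) for (doc, word), count in freq.items() if doc == doc_id}


def cosine_similarity(vec_a: Dict[int, float], vec_b: Dict[int, float]) -> float:
    dot = sum(value * vec_b.get(word, 0.0) for word, value in vec_a.items())
    norm_a = math.sqrt(sum(v * v for v in vec_a.values()))
    norm_b = math.sqrt(sum(v * v for v in vec_b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # an empty document is treated as unrelated
    return dot / (norm_a * norm_b)


freq: DocWordFreq = {(1, 1): 3, (1, 2): 4, (2, 1): 6, (2, 2): 8, (3, 3): 5}
print(cosine_similarity(document_vector(freq, 1), document_vector(freq, 2)))  # 1.0
```

Documents 1 and 2 use the same words in the same proportions, so their similarity is 1.0 even though document 2 is twice as long. This length independence is the usual reason to choose cosine similarity over a raw dot product.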

The control unit 11 of the document search server 100 adds the document ID of the document A, the document ID of the document B, and the calculated inter-document relevance to the inter-document relevance DB as a new record (step S303), and ends the process.

As described above, the document search server 100 calculates the degree of association between two different documents registered in the document DB and stores it in the inter-document relevance DB.

The inter-document relevance DB shown in FIG. 8(b) stores the relevance 63 between different documents (document ID 61 and document ID 62) registered in the document DB. Each time the document search server 100 executes the inter-document relevance data generation process shown in FIG. 6, new data is added to the inter-document relevance DB.

FIG. 7 is a flowchart showing the flow of the document word relevance data generation process.
The control unit 11 of the document search server 100 selects a document A from the document DB (FIG. 4(a)) and a word B from the word DB (FIG. 4(b)), both stored in the storage unit 12, and inputs them (step S401). The control unit 11 then calculates the document word relevance between the document A and the word B (step S402).

Here, as the document word relevance, for example, the cosine similarity between the document A and the word B is calculated. Specifically, the appearance frequency of the word B in the document A is acquired from the document word frequency DB (FIG. 4(c)), a document vector of the document A is generated, and the cosine similarity between the document A and the word B is calculated.
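The text does not say how the word B is turned into a vector. A natural reading is a one-hot vector with a single 1 at B's word ID. In that case the cosine reduces to B's frequency in the document divided by the document vector's length. The sketch below assumes that reading and uses raw counts from the frequency table.

```python
import math
from typing import Dict, Tuple

DocWordFreq = Dict[Tuple[int, int], int]  # (document ID, word ID) -> appearance count


def document_word_relevance(freq: DocWordFreq, doc_id: int, word_id: int) -> float:
    """Cosine similarity between a document vector and a one-hot vector for one word."""
    vec = {word: count for (doc, word), count in freq.items() if doc == doc_id}
    norm = math.sqrt(sum(c * c for c in vec.values()))
    if norm == 0.0:
        return 0.0  # unknown or empty document
    # The one-hot word vector has norm 1, so the dot product is simply the
    # word's own frequency in the document.
    return vec.get(word_id, 0) / norm


freq: DocWordFreq = {(1, 1): 3, (1, 2): 4}
print(document_word_relevance(freq, 1, 2))  # 0.8
```

Under this reading, a word scores highly when it dominates the document's vocabulary. That roughly matches the intent of linking a document to its most characteristic words.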

The control unit 11 of the document search server 100 adds the document ID of the document A, the word ID of the word B, and the calculated document word relevance to the document word relevance DB as a new record (step S403), and ends the process.

As described above, the document search server 100 calculates the degree of association between a document registered in the document DB and a word registered in the word DB, and stores it in the document word relevance DB.

The document word relevance DB shown in FIG. 8(c) stores the relevance 73 between a document (document ID 71) registered in the document DB and a word (word ID 72) registered in the word DB. Each time the document search server 100 executes the document word relevance data generation process shown in FIG. 7, new data is added to the document word relevance DB.

Next, the document search process executed by the document search system 200 will be described with reference to FIGS. 9 to 14. It is assumed that the document search server 100 has already executed the document registration process of FIG. 3 and the relevance data generation processes of FIGS. 5 to 7. The DBs shown in FIG. 4 and FIG. 8 are therefore stored in advance in the storage unit 12 of the document search server 100.

On the pre-transition screen displayed on the terminal 101, the terminal 101 accepts a word selection from the user. The document search server 100 then searches for documents and words highly relevant to the selected word and transmits the search results to the terminal 101. On the post-transition screen displaying the search results, the terminal 101 accepts a new word selection from the user, and the document search server 100 again searches for documents and words highly relevant to the newly selected word. The document search process is thus executed repeatedly.

FIG. 9 is a flowchart showing the flow of the document search process.
The control unit 11 of the document search server 100 transmits to the terminal 101 a first screen (pre-transition screen) displaying search keys on which words are placed, and the control unit 11 of the terminal 101 displays the received first screen on the display unit 16 (step S501).

FIG. 10 shows an example of the first screen 80. As illustrated, the first screen 80 displays a plurality of search buttons 81a, 81b, 81c, and 81d on which words are placed. The words placed on the search buttons 81 are, for example, the notation of a word or the author of a document. The words displayed on the initial screen may be selected by the document search server 100 as appropriate, according to the tendencies of the users of the document search system 200.

The size of each search button 81 represents the number of occurrences or the importance of the word or document associated with its word, and the document search server 100 can calculate the size of the search button 81 using the document-word frequency DB. This lets the user recognize at a glance how frequent or important the document or other item indicated by a displayed word is. The color, shape, or other attributes of a search button 81 may also be changed depending on whether the word placed on it is associated with a word or with a document.
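The description states only that button size reflects frequency or importance and may be derived from the document-word frequency DB. The following Python sketch assumes a simple linear mapping from occurrence count to pixel size, plus a two-color scheme that distinguishes word-linked labels from document-linked ones. `SearchKey`, `button_style`, `style_all`, the pixel bounds, and the colors are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass
class SearchKey:
    label: str      # text shown on the button (word notation, document author, ...)
    kind: str       # "word" or "document": what the label is linked to
    frequency: int  # occurrence count, e.g. taken from a document-word frequency table

# Hypothetical styling constants; the description specifies neither sizes nor colors.
MIN_PX, MAX_PX = 40, 120
COLOURS = {"word": "#4a90d9", "document": "#e07b39"}

def button_style(key: SearchKey, min_freq: int, max_freq: int) -> Dict[str, object]:
    """Scale the button linearly with frequency and color it by what the label refers to."""
    if max_freq == min_freq:
        ratio = 1.0  # all keys equally frequent: draw them at full size
    else:
        ratio = (key.frequency - min_freq) / (max_freq - min_freq)
    size = round(MIN_PX + ratio * (MAX_PX - MIN_PX))
    return {"label": key.label, "size_px": size, "colour": COLOURS[key.kind]}

def style_all(keys: List[SearchKey]) -> List[Dict[str, object]]:
    """Style every search key relative to the frequency range of the current screen."""
    freqs = [k.frequency for k in keys]
    return [button_style(k, min(freqs), max(freqs)) for k in keys]
```

Scaling relative to the range on the current screen keeps the smallest and largest buttons at fixed sizes however large the raw counts are; a logarithmic scale would be an equally plausible choice for heavily skewed frequencies.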

When the user selects a search button 81 on the first screen 80, the document search server 100 searches for words or documents highly relevant to the word placed on the selected search button 81 (hereinafter, the selected word). The display then transitions to the second screen (FIG. 13), which displays search buttons bearing words that indicate those words or documents.

Returning to FIG. 9, the control unit 11 of the terminal 101 determines whether the selection of a word has been detected (step S502). If no selection is detected (NO in step S502), the control unit 11 of the terminal 101 repeats step S502.

If the selection of a word is detected (YES in step S502), the control unit 11 of the terminal 101 transmits the selected word to the document search server 100. The control unit 11 of the document search server 100 stores the received selected word in an array as search history, held in the storage unit 12 or in the RAM of the control unit 11 (step S503).

FIG. 11 shows an example of the array 78 that holds the history. As shown in FIG. 11, the array 78 stores words previously selected by the user (hereinafter, history words). By storing each newly received selected word at the head of the array 78, the document search server 100 can keep it distinguishable from the history words.
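A minimal Python sketch of the history array, assuming an in-memory list in which index 0 always holds the current selected word. The class name `SelectionHistory` is hypothetical. The description does not say whether a re-selected word is de-duplicated, so this sketch keeps duplicates. It also does not specify the extraction order for step S504, so this sketch iterates from the head.

```python
from typing import Iterator, List, Optional

class SelectionHistory:
    """Hypothetical model of array 78: index 0 is the selected word, the rest are history words."""

    def __init__(self) -> None:
        self._words: List[str] = []

    def push(self, word: str) -> None:
        # Step S503: store the newly received selected word at the head of the array.
        # Re-selected words are kept as duplicates (an assumption).
        self._words.insert(0, word)

    @property
    def selected(self) -> Optional[str]:
        # The most recent selection, or None before anything has been selected.
        return self._words[0] if self._words else None

    @property
    def history_words(self) -> List[str]:
        # Every earlier selection, newest first.
        return self._words[1:]

    def __iter__(self) -> Iterator[str]:
        # Iterate from the head, so the selected word is extracted first.
        return iter(self._words)
```

Because position alone marks the selected word, the later weighting step only needs to know whether it is looking at index 0.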

The control unit 11 of the document search server 100 extracts a word from the array (step S504), searches for words and documents related to the extracted word, and acquires their degrees of relevance (step S505).

Specifically, the document search server 100 determines whether the word extracted from the array (hereinafter, the extracted word) represents a word or a document. If it represents a word, the document search server 100 uses the inter-word relevance DB to retrieve the top N words (N being an arbitrary natural number) with the highest relevance to the extracted word, together with their degrees of relevance. It also uses the document-word relevance DB to retrieve the top N documents with the highest relevance to the extracted word, together with their degrees of relevance.

Similarly, if the extracted word represents a document, the document search server 100 uses the inter-document relevance DB and the document-word relevance DB to retrieve the top N documents and the top N words with the highest relevance to the extracted word, together with their degrees of relevance.
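A minimal Python sketch of step S505, assuming each relevance DB is held in memory as a nested dictionary (key mapped to related keys and their degrees of relevance), and assuming the document-word relevance DB is keyed by document ID as in FIG. 8(c). The DB layouts, the `kind` flag, and the function names are hypothetical. The standard-library `heapq.nlargest` performs the top-N selection.

```python
import heapq
from typing import Dict, List, Tuple

Relevance = Dict[str, Dict[str, float]]  # key -> {related key: degree of relevance}

def top_n(candidates: Dict[str, float], n: int) -> List[Tuple[str, float]]:
    # Keep only the N entries with the highest degree of relevance.
    return heapq.nlargest(n, candidates.items(), key=lambda kv: kv[1])

def lookup(extracted: str, kind: str, n: int, word_word: Relevance,
           doc_doc: Relevance, doc_word: Relevance
           ) -> Tuple[List[Tuple[str, float]], List[Tuple[str, float]]]:
    """Return (top-N related words, top-N related documents) for one extracted word."""
    if kind == "word":
        words = word_word.get(extracted, {})
        # The document-word DB is keyed by document, so scan it for this word.
        docs = {d: ws[extracted] for d, ws in doc_word.items() if extracted in ws}
    elif kind == "document":
        docs = doc_doc.get(extracted, {})
        words = doc_word.get(extracted, {})
    else:
        raise ValueError(f"unknown kind: {kind}")
    return top_n(words, n), top_n(docs, n)
```

A production system would more likely keep a reverse index from word to documents rather than scanning the document-word table on every lookup. The linear scan is used here only to keep the sketch short.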

The control unit 11 of the document search server 100 determines whether the extracted word is the selected word (step S506). If it is (YES in step S506), the control unit 11 multiplies each acquired degree of relevance by n (n being an arbitrary value) to obtain the score of the retrieved document or word (step S507).

Because the selected word is the word currently selected, it is considered closer to the user's search purpose than the other words in the array (the history words). The degrees of relevance of the words and documents retrieved for the selected word are therefore weighted when the scores are calculated, so that words and documents highly relevant to the selected word can be presented to the user.

If the extracted word is not the selected word (NO in step S506), the control unit 11 of the document search server 100 uses each acquired degree of relevance unchanged as the score of the retrieved document or word (step S508).
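Steps S506 to S508 reduce to choosing a weight per extracted word. The following Python sketch assumes results arrive as (item, relevance) pairs. The function name and the default n = 2.0 are hypothetical, since the description leaves n arbitrary.

```python
from typing import List, Tuple

def score_results(results: List[Tuple[str, float]], is_selected_word: bool,
                  n: float = 2.0) -> List[Tuple[str, float]]:
    """Turn (item, relevance) pairs into (item, score) pairs.

    Relevance is multiplied by n for the selected word (step S507) and passed
    through unchanged for history words (step S508).
    """
    weight = n if is_selected_word else 1.0
    return [(item, relevance * weight) for item, relevance in results]
```

Choosing n > 1 lets results tied to the current selection outrank results reachable only through older history words.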

The control unit 11 of the document search server 100 determines whether the array contains a next word (step S509). If so (YES in step S509), the control unit 11 returns to step S504. That is, steps S504 to S508 are repeated once for each word stored in the array, so that for every stored word the document search server 100 retrieves highly relevant documents or words and acquires their scores.

If there is no next word (NO in step S509), the control unit 11 of the document search server 100 generates the post-transition screen information (a document score list, a word score list, and so on) based on the scores and transmits it to the terminal 101. The control unit 11 of the terminal 101 then displays the post-transition screen (second screen) on the display unit 16 based on the received information (step S510). The control unit 11 of the terminal 101 then returns to step S502 and again accepts a word selection on the second screen.

The score lists transmitted from the document search server 100 to the terminal 101 in step S510 are described next. FIG. 12 shows an example of a word score list. As shown in FIG. 12, the word score list stores highly relevant words (related words) and their scores. Likewise, the document score list (not shown) stores highly relevant documents and their scores.

The document search server 100 merges the scores of the words or documents retrieved for each word stored in the array to generate the word or document score list. In other words, the document search server 100 builds the score list from words or documents highly relevant not only to the currently selected word but also to the history words.
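The description does not specify how scores for the same item are merged. The following Python sketch assumes they are summed, so that an item related to several words in the array ranks higher. Taking the maximum would be an equally valid reading. The function name is hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple

def merge_scores(per_word_results: Iterable[List[Tuple[str, float]]]) -> List[Tuple[str, float]]:
    """Merge the scored results for every word in the history array into one score list."""
    totals: Dict[str, float] = defaultdict(float)
    for results in per_word_results:
        for item, score in results:
            totals[item] += score  # assumption: merge by summing scores of the same item
    # Highest score first, like the score list of FIG. 12.
    return sorted(totals.items(), key=lambda kv: kv[1], reverse=True)
```

For example, if the weighted results for the selected word are `[("utility costs", 1.5), ("energy saving", 0.5)]` and a history word yields `[("energy saving", 0.25), ("Division", 1.0)]`, the merged list ranks "energy saving" at 0.75, below "Division".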

Related documents or words can therefore be displayed using the history of words the user has selected in the past, and search results closer to the purpose of the user's search can be presented. This approach may also suggest new ideas to the user.

FIG. 13 shows an example of the second screen 82, namely the post-transition screen displayed when the user selects "microorganism" (81a) on the first screen 80 of FIG. 10. The second screen 82 is divided into two areas, a left pane 83 and a right pane 84. As on the first screen 80, the left pane 83 displays a plurality of search buttons 86a, 86b, 86c, and 86e on which words are placed. "Utility costs" (86b) and "energy saving" (86c), which have high relevance (scores) to the most recently selected word "microorganism" (86a), remain on the screen. "Healthcare" (86d), which has low relevance, fades out, and "△△ Division" (86e) fades in as a new word with high relevance.
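The fade-out and fade-in behavior can be expressed as set differences between the labels shown before and after the transition. The following Python sketch assumes the new screen shows the selected word plus the top k entries of the merged score list. `k` and both function names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

def visible_labels(selected: str, score_list: List[Tuple[str, float]], k: int) -> Set[str]:
    # The selected word stays on screen; the k highest-scoring items join it.
    return {selected} | {label for label, _ in score_list[:k]}

def transition(previous: Set[str], current: Set[str]) -> Dict[str, Set[str]]:
    """Classify buttons as staying on screen, fading out, or fading in."""
    return {
        "stay": previous & current,
        "fade_out": previous - current,
        "fade_in": current - previous,
    }
```

Applied to the example of FIG. 13, "healthcare" lands in `fade_out` and the newly ranked division lands in `fade_in`.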

The distance between the search button 86a for the selected word and each of the other search buttons 86b, 86c, and 86e reflects the corresponding score. In the illustrated example, "utility costs" (86b), "energy saving" (86c), and "△△ Division" (86e) have scores of 0.9, 0.3, and 0.8 respectively, and the positions of the search buttons 86 in the left pane 83 are determined accordingly. This lets the user recognize at a glance how closely each word displayed on a search button 86 is related to the selected word.
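The description says only that distance reflects score. The following Python sketch rests on three assumptions: a higher score means a shorter distance, distances are scaled linearly between hypothetical bounds `min_r` and `max_r` relative to the highest score on screen, and the buttons are spread at equal angles around the selected word. Normalizing by the highest score matters because weighted and merged scores can exceed 1.

```python
import math
from typing import Dict, List, Tuple

def layout(center: Tuple[float, float], scored: List[Tuple[str, float]],
           min_r: float = 60.0, max_r: float = 240.0) -> Dict[str, Tuple[float, float]]:
    """Place each button so that a higher score gives a shorter distance to the selected word."""
    if not scored:
        return {}
    s_max = max(s for _, s in scored)
    cx, cy = center
    positions: Dict[str, Tuple[float, float]] = {}
    for i, (label, s) in enumerate(scored):
        rel = s / s_max if s_max > 0 else 0.0       # 1.0 for the best-scoring item
        r = min_r + (max_r - min_r) * (1.0 - rel)   # best item sits at min_r
        theta = 2 * math.pi * i / len(scored)       # equal angular spacing
        positions[label] = (cx + r * math.cos(theta), cy + r * math.sin(theta))
    return positions
```

With the FIG. 13 scores (0.9, 0.3, 0.8), "utility costs" is placed at distance 60, "△△ Division" at about 80, and "energy saving" at about 180 from the selected word.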

The words or documents indicated by the words displayed in the left pane 83 are selected as appropriate by the terminal 101 or the document search server 100, based on the word or document score list.

When the user selects a search button 86 on the second screen 82, the document search server 100 again searches for words or documents highly relevant to the word placed on the selected search button 86. The display then transitions to a screen showing search buttons bearing words that indicate them. The screen thus transitions, and the displayed search buttons 86 are replaced, every time a word is selected.

The right pane 84 displays the headings or theme names 88a, 88b, 88c, 88d, and 88e of documents with high relevance (scores), according to the document score list received from the document search server 100. As illustrated, a specific part or the background of each document's display area may be color-coded according to the data type of the document. The right pane 84 may also display Internet news and the like related to the selected word, retrieved via the network 102.

When the user selects one of the documents 88 displayed in the right pane 84, the control unit 11 of the terminal 101 accesses the document search server 100 and outputs to the display unit 16 a selection screen presenting detailed information on the selected document (the information held in the document DB).

FIG. 14 shows an example of the selection screen 90. As shown in FIG. 14, the selection screen 90 contains detailed information 91 on the selected document and a return button 92. When the user selects the return button 92, the display returns to the second screen 82. Detailed information on a retrieved, highly relevant document can thus be presented to the user.

As described above, the document search system 200 accepts the selection of a word, displayed on the terminal 101, that indicates a word or a document. The document search server 100 then stores the selected word in the array and extracts each word stored in the array (the selected word or a history word). It retrieves words or documents highly relevant to the extracted word, together with their degrees of relevance, from the inter-word relevance DB or the inter-document relevance DB and from the document-word relevance DB. If the extracted word is the selected word, the acquired degree of relevance is weighted to give the score of the word or document. If it is a history word, the acquired degree of relevance is used unchanged as the score. The scores acquired for all words stored in the array are merged to create the word or document score list. Based on this score list, the terminal 101 places the words indicating the listed words or documents at distances from the selected word that reflect their scores, and outputs the result to the display unit 16.

In this way, highly relevant documents or words are retrieved from a plurality of documents, and a score serving as an index of relevance is calculated for each retrieved document or word. The retrieved words are then displayed with their relevance to the selected word expressed as distance. The score is calculated from the relevance not only to the most recently selected word but also to the history of words previously selected by the user. Words closer to the user's search purpose can therefore be presented.

The document search system 200 of this embodiment can be used, for example, as a communication tool within a company. Multiple types of documents from different sources, such as news from inside and outside the company and in-house development themes, can be searched from various perspectives according to the purpose, and highly relevant documents or words can be presented to the user in a visually clear way. The user can easily continue searching from the screen showing the results, for example by touch-panel operation.

Preferred embodiments of the document search system 200 and the like according to the present invention have been described above with reference to the accompanying drawings, but the present invention is not limited to these examples. Those skilled in the art will clearly be able to conceive of various changes and modifications within the scope of the technical idea disclosed in the present application, and these are naturally understood to belong to the technical scope of the present invention.

DESCRIPTION OF SYMBOLS
100: Document search server
101: Terminal
102: Network
200: Document search system
11: Control unit
12: Storage unit
13: Media input/output unit
14: Communication control unit
15: Input unit
16: Display unit
17: Peripheral device I/F unit
80: First screen
82: Second screen
90: Selection screen

Claims

1. A document search apparatus comprising:
storage means for storing a plurality of pieces of text data and degrees of relevance between the pieces of text data;
first display means for displaying, on a screen, at least part of each of a plurality of pieces of the text data;
extraction means for extracting, for text data selected on the screen, other text data by referring to the degrees of relevance; and
second display means for displaying, on the screen, at least part of the selected text data and of the text data extracted by the extraction means.

2. The document search apparatus according to claim 1, wherein the extraction means extracts a predetermined number of pieces of text data having the highest degrees of relevance.

3. The document search apparatus according to claim 1 or 2, wherein the first display means and/or the second display means changes the display method according to the degree of relevance.

4. The document search apparatus according to any one of claims 1 to 3, wherein the first display means and/or the second display means changes the display method according to the frequency of the text data.

5. The document search apparatus according to claim 3 or 4, wherein the second display means changes the display method by multiplying the degree of relevance of the extracted text data by n.

6. The document search apparatus according to any one of claims 1 to 5, wherein
the storage means records a history of the selected text data, and
the extraction means extracts text data by referring to the degrees of relevance for all text data that have ever been selected.

7. The document search apparatus according to any one of claims 1 to 6, wherein the second display means changes the display method by multiplying the degree of relevance of the most recently selected text data by n.

8. The document search apparatus according to claim 7, further comprising relevance calculation means for calculating the degrees of relevance between the pieces of text data.

9. A document search method comprising:
a storage step of storing a plurality of pieces of text data and degrees of relevance between the pieces of text data;
a first display step of displaying, on a screen, at least part of each of a plurality of pieces of the text data;
an extraction step of extracting, for text data selected on the screen, other text data by referring to the degrees of relevance; and
a second display step of displaying, on the screen, at least part of the selected text data and of the text data extracted in the extraction step.

10. A program causing a computer to function as a document search apparatus comprising:
storage means for storing a plurality of pieces of text data and degrees of relevance between the pieces of text data;
first display means for displaying, on a screen, at least part of each of a plurality of pieces of the text data;
extraction means for extracting, for text data selected on the screen, other text data by referring to the degrees of relevance; and
second display means for displaying, on the screen, at least part of the selected text data and of the text data extracted by the extraction means.