JPH05342255A

JPH05342255A - Document retrieving method/device using syntax information of natural language

Info

Publication number: JPH05342255A
Application number: JP4177378A
Authority: JP
Inventors: Hideyuki Maki; 牧　　秀行; Ikuo Matsuba; 育雄松葉
Original assignee: Hitachi Ltd
Current assignee: Hitachi Ltd
Priority date: 1992-06-11
Filing date: 1992-06-11
Publication date: 1993-12-24

Abstract

PURPOSE: To reduce, in natural-language full-text search, retrieval omissions caused by variant notations and synonyms, and retrieval noise caused by the specified character strings happening to occur together. CONSTITUTION: A syntax analysis device 107 parses the search condition character string and outputs its syntax information (the appearance certainty of each word and the inter-word distances). A goodness-of-fit calculation device 108 then determines: (1) the inter-word proximity of the search condition character string, from the syntax information of that string, using the connection weights in a connection weight file 106; (2) the inter-word proximity of a search target sentence, from the syntax information of that sentence held in a document database 104, again using the connection weights; (3) the associative inter-word proximity, from the search-target-sentence inter-word proximity and the inter-word association strengths in a synonym dictionary file 105; (4) the word-pair goodness of fit, from the search-condition inter-word proximity and the associative inter-word proximity, using the connection weights; and (5) the goodness of fit of the search target sentence, by integrating the word-pair goodness-of-fit values of the individual word pairs.

Description

Detailed Description of the Invention

[0001]

[Industrial Field of Application] The present invention relates to a document retrieval method and system that perform a full-text search on a document database stored in natural-language form in order to retrieve target documents.

[0002]

[Prior Art] Text-based document retrieval methods include JP-A-2-253474, JP-A-3-20866, and JP-A-3-130873 (all Nippon Telegraph and Telephone Corporation), JP-A-3-148765 (Matsushita Electric Industrial Co., Ltd.), and JP-A-2-62668 (NEC Corporation). Of these, JP-A-3-148765 is a document retrieval method that uses the syntax information of sentences, and JP-A-2-62668 relates to a method of searching a database that contains semantic information. Document retrieval using neural networks is described in JP-A-2-163876, JP-A-3-122769, and JP-A-3-122770 (all Ricoh Co., Ltd.) and in JP-A-2-224068 (Toshiba Corporation). All of these concern keyword search.

[0003]

[Problem to Be Solved by the Invention] In document retrieval by full-text search, one conceivable method is to specify several character strings as the search condition and to output every sentence in which the specified strings appear as a matching sentence. With simple matching between the words in the specified strings and the words in the search target sentence, however, both retrieval omissions and retrieval noise become large. For example, suppose we search for sentences about 「高い建物」 ("tall building") with the condition "sentences in which both of the strings 「高い」 ("tall") and 「建物」 ("building") appear". A sentence containing the string 「高い建造物」 ("tall structure") does not match and is missed. Conversely, a sentence containing the string 「高い山の麓に小さい建物ができた」 ("a small building was built at the foot of a tall mountain") is not about a tall building, yet it matches because it contains both 「高い」 and 「建物」; this is retrieval noise. If the condition is instead "sentences in which the string 「高い建物」 appears", then a sentence such as 「この建物は高い」 ("this building is tall") is missed. The object of the present invention is to reduce retrieval omissions caused by variant notations and synonyms, and retrieval noise caused by the specified strings happening to occur together.
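The omission and noise cases described in paragraph [0003] can be reproduced with a few lines of naive string matching. The Python sketch below is purely illustrative: the function names `match_all` and `match_phrase` are hypothetical, and the three sentences are the examples quoted in the text.

```python
def match_all(sentence, terms):
    """Naive condition: every specified string occurs somewhere in the sentence."""
    return all(term in sentence for term in terms)


def match_phrase(sentence, phrase):
    """Naive condition: the exact phrase occurs in the sentence."""
    return phrase in sentence


synonym_case = "高い建造物"                    # "tall structure"
noise_case = "高い山の麓に小さい建物ができた"  # tall mountain, small building
reordered_case = "この建物は高い"              # "this building is tall"

# Retrieval omission: 建造物 is not matched by the string 建物.
missed_synonym = not match_all(synonym_case, ["高い", "建物"])
# Retrieval noise: both strings occur, but they do not describe the same thing.
false_hit = match_all(noise_case, ["高い", "建物"])
# Retrieval omission: the phrase condition fails when the word order differs.
missed_reordered = not match_phrase(reordered_case, "高い建物")
```

All three flags evaluate to true, which is exactly the behavior the invention sets out to avoid.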

[0004]

[Means for Solving the Problem] The document retrieval method of the present invention comprises: a step of obtaining the search-condition inter-word proximity from the syntax information of the input search condition character string, namely the appearance certainty of the words in that string and the inter-word distances; a step of obtaining the search-target-sentence inter-word proximity from the syntax information of a search target sentence in the document database, namely the appearance certainty of the words in that sentence and the inter-word distances; a step of obtaining the associative inter-word proximity from the search-target-sentence inter-word proximity and the association strengths between words; a step of obtaining the word-pair goodness of fit from the search-condition inter-word proximity and the associative inter-word proximity; and a step of obtaining the goodness of fit of the search target sentence by integrating the word-pair goodness-of-fit values obtained for the individual word pairs.

[0005] The document retrieval system of the present invention comprises a document database that is stored in natural-language form and whose every sentence is a search target sentence, a search condition input/output device, a syntax analysis device, a goodness-of-fit calculation device, a synonym dictionary file, a connection weight file, and an output device. The syntax analysis device has means for parsing the input search condition character string and outputting its syntax information, namely the appearance certainty of the words in the string and the inter-word distances. The goodness-of-fit calculation device has means for performing each of the steps of the document retrieval method described above. The output device outputs the goodness of fit of the search target sentence. A neural network is applied to at least two of these means. Furthermore, a teacher value input device is provided, and the neural network is trained on the basis of teacher values.

[0006]

[Operation] As syntax information, the appearance certainty of the words and the inter-word distances within the search condition character string are used for the search condition, and the appearance certainty of the words and the inter-word distances within the search target sentence are used for the search target. In addition, the association strength between words, which indicates how confidently one word is associated with another, is used to find matching search target sentences. The search can therefore reduce retrieval omissions caused by variant notations and synonyms, and retrieval noise caused by the specified strings happening to occur together. Moreover, because neural networks are used, the system can learn and can perform searches that match the user's requirements.

[0007]

[Embodiment] An embodiment of the present invention will now be described. FIG. 1 shows its configuration. The document database 104 holds the search target sentences together with their syntax information. FIG. 2 shows examples of search target sentences. All ten sentences concern buildings (建物), but sentences 6 to 10 in particular concern tall buildings (高い建物). Let WORD be the set of words handled by the goodness-of-fit calculation means 108, and let N be the number of its elements. For simplicity, the words handled by the goodness-of-fit calculation means 108 are limited here to the following four, so N = 4.

[0008]

[Equation 1]
WORD = {word_i | i = 1, ..., N} = {建物, ビル, 家, 高い}
where word_1 = 建物 ("building"), word_2 = ビル ("building", a loanword), word_3 = 家 ("house"), and word_4 = 高い ("tall").

The syntax information held in the document database 104 consists of, for each search target sentence, the appearance certainty Atext_i of each word and the inter-word distance Dtext_ij. Let TEXT be the set of words in WORD that appear in the search target sentence. For sentence 1 in FIG. 2, for example,

[0009]

[Equation 2]
TEXT = {建物, 高い}

and for sentence 6,

[0010]

[Equation 3]
TEXT = {家, 高い}

Dtext_ij is defined as follows.

[0011]

[Equation 4]

[0012] Here, the syntactic relative distance between word_i and word_j is obtained as follows. FIG. 3 shows examples of the syntax trees obtained by parsing. For the words 高い ("tall") and 建物 ("building") in sentence 1 of FIG. 3, for example, the shortest path on the syntax tree from the node representing 高い to the node representing 建物 passes through four branches. We say that the syntactic absolute distance between 高い and 建物 is 4. The syntactic absolute distance between words is first obtained in this way. Next, the maximum syntactic absolute distance is found. In sentence 1 of FIG. 3, the absolute distance between その ("that") and 高い is 5, the largest in the sentence, so this is the maximum. Each inter-word absolute distance is then normalized by dividing it by this maximum, and the normalized distance is called the syntactic relative distance. The syntactic relative distance between 高い and 建物 is therefore 4/5 = 0.8. This relative distance is used as syntax information. For 高い and 家 ("house") in sentence 6 of FIG. 3, the absolute distance is 1 and the maximum absolute distance is 4, between 向い ("facing") and either 日当たり ("sunlight") or 悪く ("poor"), so the syntactic relative distance between 高い and 家 is 1/4 = 0.25. Atext_i is defined as follows.

[0013]

[Equation 5]

[0014] FIG. 4 shows an example of the syntax information held in the document database 104. An asterisk (*) in the Dtext table indicates that the corresponding Dtext is undefined. Because the way the relative distance is obtained makes it clear that Dtext_ij = Dtext_ji, the lower-left part of the table is unnecessary and is left blank.
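The relative-distance calculation described in paragraph [0012] can be sketched in Python as follows. This is a minimal illustration, not the patent's implementation: the function names `tree_distance` and `relative_distances` are hypothetical, and the small tree in the usage lines is invented for the example because the trees of FIG. 3 are not reproduced in this text. The sketch assumes the maximum is taken over all word pairs of the sentence, as in the その/高い example.

```python
from collections import deque
from itertools import combinations


def tree_distance(adjacency, start, goal):
    """Number of branches on the shortest path between two nodes (breadth-first search)."""
    queue = deque([(start, 0)])
    seen = {start}
    while queue:
        node, dist = queue.popleft()
        if node == goal:
            return dist
        for nxt in adjacency[node]:
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, dist + 1))
    raise ValueError("nodes are not connected")


def relative_distances(edges, words):
    """Absolute distance of every word pair divided by the largest absolute distance."""
    adjacency = {}
    for a, b in edges:
        adjacency.setdefault(a, set()).add(b)
        adjacency.setdefault(b, set()).add(a)
    absolute = {frozenset(pair): tree_distance(adjacency, *pair)
                for pair in combinations(words, 2)}
    longest = max(absolute.values())
    return {pair: dist / longest for pair, dist in absolute.items()}


# Hypothetical parse tree: S -> NP VP, NP -> tall building, VP -> stands
edges = [("S", "NP"), ("S", "VP"), ("NP", "tall"), ("NP", "building"), ("VP", "stands")]
rel = relative_distances(edges, ["tall", "building", "stands"])
```

In this toy tree "tall" and "building" are 2 branches apart and the longest pair is 4 branches apart, so their relative distance is 2/4 = 0.5. Keying the result by `frozenset` reflects the symmetry Dtext_ij = Dtext_ji noted in paragraph [0014].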

[0015] The synonym dictionary file 105 holds information on how confidently each word in WORD calls another word to mind. This degree of confidence is called the association strength. FIG. 5 shows an example of the association strengths held in the synonym dictionary file 105. For example, the association strength from 建物 to 家 is the value in the 建物 row and the 家 column, which is 0.9. In general, the association strength from word i to word j is not equal to the association strength from word j to word i. An association strength of 1 means the words are exact synonyms, and an association strength of 0 means their meanings differ. An association strength of 0.9 from 建物 to 家 can also be read as saying that the word 建物 in a search target sentence may be replaced by the word 家 with certainty 0.9. The synonym association means 109 performs synonym association using these strengths. The connection weight file 106 holds the connection weights of the neural networks that make up the goodness-of-fit calculation means 108.
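One way to picture the association-strength table of paragraph [0015] is as a nested mapping from a source word (row) to an associated word (column). In the Python sketch below, only the value 0.9 for 建物 to 家 comes from the text; every other number is a made-up placeholder, since FIG. 5 is not reproduced here, and the helper name `strength` is hypothetical.

```python
# association[src][dst]: how confidently word src calls word dst to mind.
# Only association["建物"]["家"] = 0.9 is taken from the text; the rest is invented.
association = {
    "建物": {"建物": 1.0, "ビル": 0.8, "家": 0.9, "高い": 0.0},
    "ビル": {"建物": 0.9, "ビル": 1.0, "家": 0.3, "高い": 0.0},
    "家":   {"建物": 0.6, "ビル": 0.2, "家": 1.0, "高い": 0.0},
    "高い": {"建物": 0.0, "ビル": 0.0, "家": 0.0, "高い": 1.0},
}


def strength(src, dst):
    """Association strength from word src (row) to word dst (column)."""
    return association[src][dst]


forward_strength = strength("建物", "家")   # 0.9, from the text
backward_strength = strength("家", "建物")  # placeholder, deliberately different
```

Storing the table by row and column rather than by unordered pair keeps the two directions separate, which matters because the strengths are generally not symmetric.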

[0016] The user enters the character string to be searched for into the search condition input/output means 101. The syntax analysis means 107 parses the entered string and produces syntax information of the same kind as that held in the document database 104. Although the goodness-of-fit calculation means 108 handles only the words that are elements of the set WORD, the syntax analysis means 107 is assumed to handle any word, independently of this restriction. The resulting syntax information is displayed to the user by the search condition input/output means 101 for confirmation, and the user can correct it if necessary. FIG. 6 shows an example of a search condition character string, its parse result, and its syntax information. Let KEY be the set of words handled by the goodness-of-fit calculation means 108 that appear in the search condition character string. For the search condition character string shown in FIG. 6, for example,

[0017]

[Equation 6]
KEY = {建物, 高い}

As with a search target sentence, the syntax information for the search condition character string consists of the appearance certainty Akey_i of each word and the inter-word distance Dkey_ij. Dkey_ij is defined as follows.

[0018]

[Equation 7]

[0019] For search target sentences the syntactic relative distance was used as the inter-word distance, whereas for the search condition character string the syntactic absolute distance is used. Akey_i is defined as follows.

[0020]

[Equation 8]

[0021] Once confirmed, the syntax information Akey_i and Dkey_ij is sent from the search condition input/output means 101 to the goodness-of-fit calculation means 108. The syntax information of the search target sentences is read from the document database 104 one sentence at a time and input to the goodness-of-fit calculation means 108 together with the syntax information of the search condition character string. The goodness-of-fit calculation means 108 compares the syntax information of each search target sentence with that of the search condition character string and calculates the goodness of fit of that sentence. The calculated goodness of fit is sent to the output means 103. The user can specify an output format to the output means 103, which outputs the results from the goodness-of-fit calculation means 108 in that format. Possible output formats include displaying the goodness of fit of every search target sentence, or displaying only those search target sentences whose goodness of fit exceeds a threshold. If the user is dissatisfied with the output goodness of fit, the user can present the desired goodness of fit to the goodness-of-fit calculation means 108 as a teacher value through the teacher value input means 102. On receiving a teacher value, the goodness-of-fit calculation means 108 enters a learning operation and corrects its internal calculation mechanism so as to output a goodness of fit closer to the teacher value. Specifically, the mechanism that is corrected is the set of neural network connection weights described later. The corrected connection weights are stored in the synonym dictionary file 105 and the connection weight file 106 and are used in subsequent goodness-of-fit calculations. The search condition input/output means 101, the teacher value input means 102, and the output means 103 may share a single input/output device.

[0022] FIGS. 7 and 8 show an example of the internal configuration of the goodness-of-fit calculation means 108. The associative inter-word proximity calculation unit 708 corresponds to the synonym association means 109. In this embodiment, the search condition character string inter-word proximity conversion unit 706, the search target sentence inter-word proximity conversion unit 707, and the word-pair goodness-of-fit calculation unit 805 are each implemented as a multilayer neural network, and the associative inter-word proximity calculation unit 708 is implemented as a recurrent neural network. In general, a neural network is a computing device made up of many relatively simple multi-input, single-output computing units connected through connection weights. The input/output characteristic of a computing unit is given by the following equations.

[0023]

[Equation 9]
o_i = f(net_i)
net_i = a(w_ij o_j, θ_i),  j = 1, ..., M
where
M: number of units that feed unit i
o_i: output value of computing unit i
w_ij: connection weight between the output of unit j and the input of unit i
θ_i: bias of unit i
a(): input function
f(): output function

As shown in FIG. 23, a multilayer neural network is a neural network in which an input layer, several intermediate layers, and an output layer are connected in sequence. In a multilayer neural network, signals normally travel between layers from the input-layer side to the output-layer side and never in the reverse direction. Computing units within the same layer are normally not connected to one another either. The connections between computing units therefore contain no loops, and signals travel in one direction only, from units on the input side to units on the output side. Such a network is sometimes specifically called a feedforward neural network. In multilayer neural networks, the following summation and sigmoid function are commonly used as the input and output functions of the computing units.

[0024]

[Equation 10]
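The image for Equation 10 is not reproduced in this text. Assuming the conventional definitions, which agree with steps 1005 to 1008 of FIG. 10 (the bias is added to the weighted sum of the inputs, and f is then applied), the summation input function and the sigmoid output function would read as follows. This is a reconstruction under that assumption, not a transcription of the original figure.

```latex
a(w_{ij} o_j, \theta_i) = \sum_{j=1}^{M} w_{ij}\, o_j + \theta_i ,
\qquad
f(\mathrm{net}_i) = \frac{1}{1 + e^{-\mathrm{net}_i}}
```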

[0025] For a neural network to perform the desired computation, the values of its connection weights and biases must be determined. In general these cannot be determined easily, but a neural network has the property that it can adjust these values automatically without the user supplying them explicitly. This is called neural network learning. Error backpropagation is a well-known learning algorithm for multilayer neural networks. It is a supervised learning method: when a pair consisting of an input value and the output value that the network should produce for that input is presented as a teacher value, the connection weights and biases are changed so that the error between the teacher output value and the network's output value decreases. The change Δw_ij in connection weight w_ij and the change Δθ_i in bias θ_i are given by the following equations.

[0026]

[Equation 11]
Δw_ij = −η δ_i o_j
Δθ_i = −η δ_i

Here η is a positive constant called the learning constant, and o_j is the output of unit j. δ_i is the error of unit i and is given by the following equations.

[0027]

[Equation 12]
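The image for Equation 12 is not reproduced in this text. A reconstruction consistent with the minus sign in Equation 11 and with the description of t_i, f'() and unit j in paragraph [0028] is given below. It assumes the usual squared-error criterion, which the text does not state, so it should be read as a plausible form rather than the original equation.

```latex
\delta_i =
\begin{cases}
(o_i - t_i)\, f'(\mathrm{net}_i) & \text{if unit } i \text{ is an output unit} \\[4pt]
f'(\mathrm{net}_i) \displaystyle\sum_{j} w_{ji}\, \delta_j & \text{otherwise}
\end{cases}
```

In the lower case the sum runs over the units j that receive the output of unit i, so the weight involved is w_ji, the connection from unit i to unit j.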

[0028] Here t_i is the teacher output value for unit i, and f'() is the derivative of the output function f(). In the lower equation of Equation 12, unit j is a computing unit that receives the output value of unit i as an input. Because error backpropagation is a steepest-descent method based on iterative calculation, learning proceeds by repeatedly presenting an input value and its teacher output value to the network and changing the connection weights according to Equations 11 and 12. In contrast to a multilayer neural network, a recurrent neural network is a neural network whose connections between computing units contain loops.
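The iterative update of Equation 11 can be illustrated on the smallest possible case, a single sigmoid output unit. The Python sketch below is an assumption-laden illustration: it uses the standard output-unit error δ = (o − t) f'(net), which matches the sign convention of Equation 11 but is not confirmed by the text, and the names `train_step`, `eta`, and the target value 0.9 are chosen for the example.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def train_step(weights, bias, inputs, target, eta=0.5):
    """One steepest-descent update of a single sigmoid output unit."""
    net = bias + sum(w * x for w, x in zip(weights, inputs))
    out = sigmoid(net)
    # Assumed output-unit error: (o - t) * f'(net), with f'(net) = o * (1 - o) for the sigmoid.
    delta = (out - target) * out * (1.0 - out)
    new_weights = [w - eta * delta * x for w, x in zip(weights, inputs)]  # Δw = −η δ o_j
    new_bias = bias - eta * delta                                          # Δθ = −η δ
    return new_weights, new_bias, out


weights, bias = [0.0], 0.0
errors = []
for _ in range(200):
    weights, bias, out = train_step(weights, bias, inputs=[1.0], target=0.9)
    errors.append(abs(out - 0.9))
```

Repeating the presentation of the same input and teacher value drives the output toward the teacher value, so the entries of `errors` shrink over the iterations.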

[0029] The goodness-of-fit calculation means 108 has a goodness-of-fit calculation function and a learning function. With reference to FIGS. 7 and 8, the calculation function is described first, followed by the learning function. Inside the goodness-of-fit calculation means 108, the syntax information 704 of the search condition character string and the syntax information 705 of one search target sentence are input to the search condition character string inter-word proximity conversion unit 706 and the search target sentence inter-word proximity conversion unit 707, respectively. Each of these units is a multilayer neural network with one input unit, one output unit, and an arbitrary number of intermediate units, as shown in FIG. 9, and their connection weights 701 and 702 are read from the connection weight file 106. The input/output function of the computing units used in the multilayer neural networks of this embodiment is the sigmoid function shown in Equation 10.

[0030] FIG. 10 is a PAD diagram showing the operation of the multilayer neural network. The PAD diagram is explained below.

1001: For each i from 1 to N_1, perform 1002; then go to 1003.
1002: Assign the network input value I_i to the output value o_i1 of first-layer unit i.
1003: For each k from 2 to L, perform 1004; then go to 1009.
1004: For each i from 1 to N_k, perform 1005 to 1008.
1005: Assign the bias θ_ik to the variable net.
1006: For each j from 1 to N_(k−1), perform 1007; then go to 1008.
1007: Add w_ijk · o_j(k−1) to net.
1008: Assign f(net) to o_ik.
1009: For each i from 1 to N_L, perform 1010.
1010: Assign o_iL to the network output value O_i.

Here,
L: number of layers
N_k: number of units in layer k
o_ik: output of unit i in layer k
θ_ik: bias of unit i in layer k
w_ijk: connection weight from unit j in layer k−1 to unit i in layer k
f(net): input/output function of a unit
I_i: input to input unit i
O_i: output of output unit i
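Steps 1001 to 1010 of FIG. 10 amount to an ordinary layer-by-layer forward pass. The Python sketch below follows those steps; the data layout (one weight matrix and one bias list per non-input layer), the function name `forward`, and the example weights are assumptions made for illustration.

```python
import math


def sigmoid(net):
    return 1.0 / (1.0 + math.exp(-net))


def forward(inputs, weights, biases, f=sigmoid):
    """Forward pass of a multilayer network.

    weights[k][i][j] is the weight from unit j of one layer to unit i of the next,
    biases[k][i] is the bias of unit i of that next layer.
    """
    outputs = list(inputs)                           # 1001-1002: first layer copies the inputs
    for layer_w, layer_b in zip(weights, biases):    # 1003: layers 2 to L
        next_outputs = []
        for w_row, theta in zip(layer_w, layer_b):   # 1004: each unit i of the layer
            net = theta                              # 1005: start from the bias
            for w, o in zip(w_row, outputs):         # 1006-1007: add w_ijk * o_j(k-1)
                net += w * o
            next_outputs.append(f(net))              # 1008: apply the unit function
        outputs = next_outputs
    return outputs                                   # 1009-1010: last layer is the output


# Hypothetical 1-input, 2-hidden, 1-output network in the shape described for FIG. 9.
example_weights = [[[1.0], [-1.0]], [[0.5, -0.5]]]
example_biases = [[0.0, 0.0], [0.0]]
result = forward([0.0], example_weights, example_biases)
```

With a zero input both hidden units output 0.5, their contributions cancel at the output unit, and the network returns 0.5.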

【００３１】検索条件文字列単語間近接度変換部７０６
及び検索対象文単語間近接度変換部７０７ではそれぞれ
Ｄkey_ij、Ｄtext_ijを多層型ニューラルネットワークの
入力値Ｉ₁として入力し、出力値Ｏ₁をＣkey_ij、Ｃtext
_ijとする。図１１は検索条件文字列単語間近接度変換部
７０６及び検索対象文単語間近接度変換部７０７で行う
処理を表すＰＡＤ図である。以下に、該ＰＡＤ図の説明
をする。１１０１：ｉが１からＮ−１まで、ｊがｉ＋１からＮま
でのそれぞれの場合について１１０２と１１０６の処理
を行う。１１０２：word_i∈ＫＥＹかつword_j∈ＫＥＹならば１１
０３から１１０５の処理を行う。そうでなければ１１０
６へ行く。１１０３：Ｄkey_ijを検索条件文字列単語間近接度変換
部７０６を構成するニューラルネットワークの入力値Ｉ
₁に代入する。１１０４：図１０により、多層型ニューラルネットワー
クの計算を行う。１１０５：ニューラルネットワークの出力値ｏ₁をＣkey
_ijに代入する。１１０６：word_i∈ＴＥＸＴかつword_j∈ＴＥＸＴならば
１１０７から１１０９の処理を行う。１１０７：Ｄtext_ijを検索対象文単語間近接度変換部７
０７を構成するニューラルネットワークの入力値Ｉ₁に
代入する。１１０８：図１０により、多層型ニューラルネットワー
クの計算を行う。１１０９：ニューラルネットワークの出力値ｏ₁をＣtex
t_ijに代入する。ここで、Ｉ₁：図１０のＩ_i ｏ₁：図１０のｏ_i であり、 word_iは（数１）ＫＥＹは（数６）ＴＥＸＴは（数２）、（数３）をそれぞれ参照。Ｃkey_ijを検索条件文字列単語間近接
度、Ｃtext_ijを検索対象文単語間近接度と呼ぶことにす
る。図１２に検索条件文字列単語間近接度変換部７０６
の出力の例、図１３に検索対象文単語間近接度変換部７
０７の出力の例を示す。図４、図６のＤkey_ij、Ｄtext
_ijのかわりにＣkey_ij、Ｃtext_ijが出力されている。wor
d_iとword_jの構文上の距離が大きいほどＤkey_ij、Ｄtext
_ijの値は大きく、Ｃkey_ij、Ｃtext_ijの値は小さくな
る。Search condition character string inter-word proximity conversion unit 706
and the search target sentence inter-word proximity conversion unit 707 respectively take Dkey_ij and Dtext_ij as the input value I₁ of the multilayer neural network and output Ckey_ij and Ctext_ij as its output value O₁. FIG. 11 is a PAD diagram showing the processing performed by the search condition character string inter-word proximity conversion unit 706 and the search target sentence inter-word proximity conversion unit 707. The PAD diagram is described below.
1101: For each i from 1 to N−1 and each j from i+1 to N, perform steps 1102 and 1106.
1102: If word_i ∈ KEY and word_j ∈ KEY, perform steps 1103 to 1105. Otherwise go to 1106.
1103: Assign Dkey_ij to the input value I₁ of the neural network that constitutes the search condition character string inter-word proximity conversion unit 706.
1104: Run the multilayer neural network according to FIG. 10.
1105: Assign the output value O₁ of the neural network to Ckey_ij.
1106: If word_i ∈ TEXT and word_j ∈ TEXT, perform steps 1107 to 1109.
1107: Assign Dtext_ij to the input value I₁ of the neural network that constitutes the search target sentence inter-word proximity conversion unit 707.
1108: Run the multilayer neural network according to FIG. 10.
1109: Assign the output value O₁ of the neural network to Ctext_ij.
Here, I₁ and O₁ are the I_i and O_i of FIG. 10, word_i is defined by Equation 1, KEY by Equation 6, and TEXT by Equations 2 and 3. Ckey_ij is called the search condition character string inter-word proximity, and Ctext_ij is called the search target sentence inter-word proximity. FIG. 12 shows an example of the output of the search condition character string inter-word proximity conversion unit 706, and FIG. 13 shows an example of the output of the search target sentence inter-word proximity conversion unit 707. Ckey_ij and Ctext_ij are output in place of the Dkey_ij and Dtext_ij shown in FIGS. 4 and 6. The larger the syntactic distance between word_i and word_j, the larger the values of Dkey_ij and Dtext_ij and the smaller the values of Ckey_ij and Ctext_ij.
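The pair loop of steps 1101 to 1109 can be sketched as follows. This is a minimal illustration, not the patented implementation: the network of FIG. 10 is not reproduced in this section, so the sigmoid units, the single hidden unit, and the weight values in `params` are all assumptions chosen only so that a larger distance yields a smaller proximity.

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def convert_distance(d, w_hidden, b_hidden, w_out, b_out):
    """One-input, one-output network: distance d -> proximity in (0, 1).
    The layer sizes and the sigmoid activation are assumptions."""
    hidden = [sigmoid(w * d + b) for w, b in zip(w_hidden, b_hidden)]
    return sigmoid(sum(w * h for w, h in zip(w_out, hidden)) + b_out)

def proximities(dist, members, params):
    """Steps 1101-1109 for one word set (KEY or TEXT): convert the
    distance of every pair i < j whose words both belong to the set."""
    result = {}
    words = sorted(members)
    for pos, i in enumerate(words):
        for j in words[pos + 1:]:
            result[(i, j)] = convert_distance(dist[(i, j)], *params)
    return result

# Hypothetical weights: (w_hidden, b_hidden, w_out, b_out).
params = ([-1.0], [2.0], [4.0], -2.0)
dkey = {(1, 2): 1.0, (1, 3): 3.0, (2, 3): 2.0}   # syntactic distances
ckey = proximities(dkey, {1, 2, 3}, params)
```

With these made-up weights the pair at distance 1 receives the highest proximity and the pair at distance 3 the lowest, which is the monotone behavior the paragraph describes.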

【００３２】Ｃtext_ijは次に連想単語間近接度計算部７
０８に入力される。連想単語間近接度計算部７０８は図
１４に示すようなリカレント型ニューラルネットワーク
で構成されている。このニューラルネットワークはＮ個
のユニットを持ち、それぞれのユニットがＷＯＲＤの要
素である単語１つに対応している。そして、連想強度７
０３をその結合重みとしている。連想強度７０３は類義
語辞書ファイル１０５から読み込まれる。図１４に示す
ように全てのユニットの出力は他の全てのユニットの入
力に結合しており、一般に結合重みは対称ではない。す
なわち、ｗ_ij≠ｗ_jiである。また、全てのユニットは場
合により入力ユニットにも、出力ユニットにもなる。こ
こで使用する計算ユニットの入出力関数は次式に示す最
大値関数である。Next, Ctext_ij is input to the associative word proximity calculation unit 708. The associative word proximity calculation unit 708 consists of a recurrent neural network as shown in FIG. 14. This neural network has N units, each corresponding to one word that is an element of WORD, and it uses the associative strengths 703 as its connection weights. The associative strengths 703 are read from the synonym dictionary file 105. As shown in FIG. 14, the output of every unit is connected to the inputs of all other units, and the connection weights are in general not symmetric; that is, w_ij ≠ w_ji. Depending on the case, any unit can serve as an input unit or as an output unit. The input/output function of the calculation units used here is the maximum value function given by the following equation.

【００３３】[0033]

【数１３】ａ(ｗ_ijｏ_j，θ_i)＝max_j(ｗ_ijｏ_j) ｆ(net_i)＝net_i 本実施例では、外部からリカレント型ニューラルネット
ワ−クへの入力値を〔０，１〕の範囲に制限する。ま
た、ユニット間の結合重みも同様に〔０，１〕に制限さ
れている。したがって、全てのユニットの出力値は
〔０，１〕の範囲内にあることが保証される。ユニット
間の結合重みが〔０，１〕に制限されていることと、ユ
ニットの入出力関数が最大値関数であることにより、リ
カレント型ネットワークではあるが、信号の伝達経路に
はループが存在せず、後で述べる誤差逆伝播学習法が可
能となる。このリカレント型ニューラルネットワ−クの
動作を表すＰＡＤ図を図１５に示す。以下に、該ＰＡＤ
図の説明をする。１５０１：ｉが１からＮまでのそれぞれの場合について
１５０２の処理を行い、その後、１５０３へ行く。１５０２：計算ユニットの出力値ｏ_iに０を代入する。１５０３：入力ユニットＩの出力値ｏ_IにＡ_Iを代入す
る。また、変数src_IにＩを代入する。１５０４：変数MXCHGの値が０になるまで１５０５の処
理を繰り返す。１５０５：ｉが１からＮまで（ただし、Ｉを除く）のそ
れぞれの場合について１５０６から１５１４の処理を行
う。１５０６：src_iに０を代入する。１５０７：変数MXINに０を代入する。１５０８：ｊが１からＮまで（ただし、ｉは除く）のそ
れぞれの場合について１５０９から１５１０の処理を行
い、その後、１５１３へ行く。１５０９：変数INにｗ_ij・ｏ_jを代入する。１５１０：MXIN＜INならば１５１１と１５１２の処理を
行う。１５１１：MXINにINの値を代入する。１５１２：src_iにｊを代入する。１５１３：変数CHGに｜ｏ_i−IN｜を代入する。１５１４：MXCHG＜CHGならば１５１５の処理を行う。１５１５：MXCHGにCHGの値を代入し、１５０４へ行く。ここで、Ｎ：ユニット数Ｉ：入力ユニットにするユニットの番号Ａ_I ：入力値ｏ_i ：ユニットｉの出力値ｗ_ij：ユニットｊからユニットｉへの結合重み src_i：ユニットｉへの最大入力値を出力したユニットの
番号である。連想単語間近接度計算部７０８では連想単語間
近接度ＡＣtext_ijを計算する。ＡＣtext_ijを計算するた
めにはまず連想単語出現確度ＡＡ_iを計算する。ＡＡ_iは
次式で定義される。[Equation 13] a(w_ij o_j, θ_i) = max_j (w_ij o_j), f(net_i) = net_i. In this embodiment, the values input to the recurrent neural network from outside are restricted to the range [0, 1]. The connection weights between units are likewise restricted to [0, 1]. The output values of all units are therefore guaranteed to lie within [0, 1]. Because the connection weights are restricted to [0, 1] and the input/output function of each unit is the maximum value function, the signal transmission path contains no loop even though the network is recurrent, which makes the error backpropagation learning method described later applicable. FIG. 15 is a PAD diagram showing the operation of this recurrent neural network. The PAD diagram is described below.
1501: For each i from 1 to N, perform step 1502; then go to 1503.
1502: Assign 0 to the output value o_i of the calculation unit.
1503: Assign A_I to the output value o_I of input unit I. Also assign I to the variable src_I.
1504: Repeat step 1505 until the value of the variable MXCHG becomes 0.
1505: For each i from 1 to N (excluding I), perform steps 1506 to 1514.
1506: Assign 0 to src_i.
1507: Assign 0 to the variable MXIN.
1508: For each j from 1 to N (excluding i), perform steps 1509 and 1510; then go to 1513.
1509: Assign w_ij · o_j to the variable IN.
1510: If MXIN < IN, perform steps 1511 and 1512.
1511: Assign the value of IN to MXIN.
1512: Assign j to src_i.
1513: Assign |o_i − IN| to the variable CHG.
1514: If MXCHG < CHG, perform step 1515.
1515: Assign the value of CHG to MXCHG and go to 1504.
Here, N is the number of units, I is the number of the unit used as the input unit, A_I is the input value, o_i is the output value of unit i, w_ij is the connection weight from unit j to unit i, and src_i is the number of the unit that supplied the maximum input value to unit i. The associative word proximity calculation unit 708 calculates the associative word proximity ACtext_ij. To calculate ACtext_ij, the associative word appearance likelihood AA_i is calculated first. AA_i is defined by the following equation.

【００３４】[0034]

【数１４】 [Equation 14]

【００３５】ここで、ＡＳＣ_ijはword_iがword_jからどの
程度の確からしさで連想されるかを表すもので、ユニッ
トｊを入力ユニットとしてリカレント型ネットワークを
動作させたときのユニットｉの出力値をＡＳＣ_ijとす
る。ＡＣtext_ijはＡＡ_iを用いて次のように定義され
る。Here, ASC_ij expresses how confidently word_i is associated from word_j: ASC_ij is the output value of unit i when the recurrent network is run with unit j as the input unit. ACtext_ij is defined using AA_i as follows.

【００３６】[0036]

【数１５】ＡＣtext_ij＝Ｃtext_mn・ＡＡ_i・ＡＡ_j ただし、ｍ＝MAX_i，ｎ＝MAX_j ここで、MAX_iは次式で与えられる。[Equation 15] ACtext_ij = Ctext_mn · AA_i · AA_j, where m = MAX_i and n = MAX_j. Here, MAX_i is given by the following equation.

【００３７】[0037]

【数１６】 [Equation 16]
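Equations 14 and 16 did not survive in this text. Steps 1610 to 1612 of FIG. 16 assign to AA_i the largest ASC_ij over j and to MAX_i the index j that attains it, so the two equations plausibly have the following form. This is a reconstruction from those steps, not the published equations.

```latex
\mathrm{AA}_i = \max_{j} \mathrm{ASC}_{ij},
\qquad
\mathrm{MAX}_i = \operatorname*{arg\,max}_{j} \mathrm{ASC}_{ij}
```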

【００３８】連想単語間近接度計算部７０８はこのＡＣ
text_ijを出力する。図１６は連想単語間近接度計算部７
０８の処理を表すＰＡＤ図である。以下に、該ＰＡＤ図
の説明をする。１６０１：ｉが１からＮまでのそれぞれの場合について
１６０２の処理を行い、その後、１６０９へ行く。１６０２：word_i∈ＴＥＸＴならば１６０３から１６０
５の処理を行い、そうでなければ１６０７の処理を行
う。１６０３：変数Ｉにｉを代入する。また、変数Ａ_iにＡt
ext_iを代入する。１６０４：図１５により、リカレント型ニューラルネッ
トワークの計算を行う。１６０５：ｊが１からＮまでのそれぞれの場合について
１６０６の処理を行う。１６０６：変数ＡＳＣ_jiにニューラルネットワークの出
力値ｏ_jを代入する。１６０７：ｊが１からＮまでのそれぞれの場合について
１６０８の処理を行う。１６０８：変数ＡＳＣ_jiに０を代入する。１６０９：ｉが１からＮまでのそれぞれの場合について
１６１０の処理を行い、その後、１６１３へ行く。１６１０：ＡＳＣ_ikのｋについての最大値がＡＳＣ_ijな
らば１６１１、１６１２の処理を行う。１６１１：変数ＡＡ_iにＡＳＣ_ijの値を代入する。１６１２：変数MAX_iにｊを代入する。１６１３：ｉが１からＮ−１まで、ｊがｉ＋１からＮま
でのそれぞれの場合について１６１４と１６１５の処理
を行う。１６１４：変数ｍにMAX_iの値を、変数ｎにMAX_jの値をそ
れぞれ代入する。１６１５：Ｃtext_mn・ＡＡ_i・ＡＡ_jの値をＣtext_ijに代
入する。ここで、Ｉ：図１５のＩＡ_i：図１５のＡ_i ｏ_i：図１５のｏ_i である。図１７は連想単語間近接度計算部７０８の出力
の例である。図１３のＡtext_iのかわりにＡＡ_iが計算さ
れ、Ｃtext_ijのかわりにＡＣtext_ijが出力される。な
お、ＡＡ_iは今後の処理には使用しないので出力しなく
てもよい。The associative word proximity calculation unit 708 outputs this ACtext_ij. FIG. 16 is a PAD diagram showing the processing of the associative word proximity calculation unit 708. The PAD diagram is described below.
1601: For each i from 1 to N, perform step 1602; then go to 1609.
1602: If word_i ∈ TEXT, perform steps 1603 to 1605; otherwise perform step 1607.
1603: Assign i to the variable I. Also assign Atext_i to the variable A_i.
1604: Run the recurrent neural network according to FIG. 15.
1605: For each j from 1 to N, perform step 1606.
1606: Assign the output value o_j of the neural network to the variable ASC_ji.
1607: For each j from 1 to N, perform step 1608.
1608: Assign 0 to the variable ASC_ji.
1609: For each i from 1 to N, perform step 1610; then go to 1613.
1610: If the maximum of ASC_ik over k is ASC_ij, perform steps 1611 and 1612.
1611: Assign the value of ASC_ij to the variable AA_i.
1612: Assign j to the variable MAX_i.
1613: For each i from 1 to N−1 and each j from i+1 to N, perform steps 1614 and 1615.
1614: Assign the value of MAX_i to the variable m and the value of MAX_j to the variable n.
1615: Assign the value of Ctext_mn · AA_i · AA_j to ACtext_ij (Equation 15).
Here, I, A_i, and o_i are the I, A_i, and o_i of FIG. 15. FIG. 17 is an example of the output of the associative word proximity calculation unit 708. AA_i is calculated in place of the Atext_i of FIG. 13, and ACtext_ij is output in place of Ctext_ij. AA_i need not be output, because it is not used in later processing.
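The procedures of FIG. 15 and FIG. 16 can be sketched together as below. Several points are assumptions rather than statements of the patent: the PAD text does not show where o_i is updated or where MXCHG is reset, so the sketch stores the largest input in o_i on every sweep and resets the change tracker per sweep; units are numbered from 0; a word is treated as belonging to TEXT when its appearance likelihood is positive; and a proximity of 0 is used when both words map to the same sentence word, a case the text does not cover.

```python
def run_recurrent(w, input_unit, input_value, max_sweeps=100):
    """Max-propagation network in the spirit of FIG. 15.

    w[i][j] is the connection weight from unit j to unit i, in [0, 1].
    Returns (o, src): unit outputs and, for each unit, the unit that
    supplied its largest input (None if no input was positive).
    """
    n = len(w)
    o = [0.0] * n
    src = [None] * n
    o[input_unit] = input_value
    src[input_unit] = input_unit
    for _ in range(max_sweeps):
        max_change = 0.0                      # assumed per-sweep reset
        for i in range(n):
            if i == input_unit:
                continue
            best, best_j = 0.0, None
            for j in range(n):
                if j != i and w[i][j] * o[j] > best:
                    best, best_j = w[i][j] * o[j], j
            max_change = max(max_change, abs(o[i] - best))
            o[i], src[i] = best, best_j       # assumed output update
        if max_change == 0.0:
            break
    return o, src

def associative_proximity(w, atext, ctext):
    """Steps 1601-1615: returns (AA, MAX, ACtext) with 0-based indices."""
    n = len(w)
    asc = [[0.0] * n for _ in range(n)]       # asc[j][i]: output of j, input i
    for i in range(n):
        if atext[i] > 0:                      # assumed test for word_i in TEXT
            o, _ = run_recurrent(w, i, atext[i])
            for j in range(n):
                asc[j][i] = o[j]
    aa = [max(row) for row in asc]
    arg = [row.index(max(row)) for row in asc]
    actext = {}
    for i in range(n - 1):
        for j in range(i + 1, n):
            m, k = sorted((arg[i], arg[j]))
            actext[(i, j)] = ctext.get((m, k), 0.0) * aa[i] * aa[j]
    return aa, arg, actext

# Word 0 is absent from the sentence but associated from word 1 (weight 0.9).
w = [[0.0, 0.9, 0.0], [0.0, 0.0, 0.0], [0.0, 0.0, 0.0]]
atext = [0.0, 1.0, 1.0]
ctext = {(1, 2): 0.8}
aa, arg, actext = associative_proximity(w, atext, ctext)
```

In this made-up example the absent word 0 inherits the sentence proximity of words 1 and 2, scaled by its associative strength, which is how a synonym that never appears in the sentence can still contribute to the fitness.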

【００３９】検索条件文字列単語間近接度変換部７０６
の出力と連想単語間近接度計算部７０８の出力は単語対
選択部８０４に入力される。単語対選択部８０４は検索
条件文字列単語間近接度変換部７０６の出力であるＡke
y_iを参照し、Ａkey_i＝１かつＡkey_j＝１かつｉ≠ｊであ
る単語対を検索し、その単語対についてのＣkey_ijとＡ
Ｃtext_ijを単語対適合度計算部８０５に入力する。単語
対適合度計算部８０５は図１８に示すような２個の入力
ユニット、１個の出力ユニットと任意の数の中間ユニッ
トを持つ多層型ニューラルネットワークで構成され、そ
の結合重み８０１は結合重みファイル１０６から読み込
まれる。単語対適合度計算部８０５はニューラルネット
ワークの２個の入力ユニットに入力値Ｉ₁、Ｉ₂としてそ
れぞれＣkey_ijとＡＣtext_ijを入力し、出力ユニットの
出力値Ｏ₁をＳ_ijとする。Ｓ_ijを単語対適合度と呼ぶこ
とにする。単語対適合度計算部８０５はＡkey_i＝１かつ
Ａkey_j＝１かつｉ≠ｊを満たす単語対すべてについてＳ
_ijを計算し、適合度統合部８０６に入力する。適合度統
合部８０６は入力された単語対ごとの適合度を統合し、
検索対象文の適合度ＳＣＲを計算する。ＳＣＲは次式で
与えられる。The output of the search condition character string inter-word proximity conversion unit 706 and the output of the associative word proximity calculation unit 708 are input to the word pair selection unit 804. The word pair selection unit 804 refers to Akey_i, which is an output of the search condition character string inter-word proximity conversion unit 706, finds the word pairs for which Akey_i = 1, Akey_j = 1, and i ≠ j, and inputs Ckey_ij and ACtext_ij for each such pair to the word pair fitness calculation unit 805. The word pair fitness calculation unit 805 consists of a multilayer neural network with two input units, one output unit, and an arbitrary number of intermediate units, as shown in FIG. 18; its connection weights 801 are read from the connection weight file 106. The word pair fitness calculation unit 805 feeds Ckey_ij and ACtext_ij to the two input units as the input values I₁ and I₂, respectively, and takes the output value O₁ of the output unit as S_ij. S_ij is called the word pair fitness. The word pair fitness calculation unit 805 calculates S_ij for every word pair satisfying Akey_i = 1, Akey_j = 1, and i ≠ j, and inputs the results to the fitness integration unit 806. The fitness integration unit 806 integrates the fitness values of the individual word pairs and calculates the fitness SCR of the search target sentence. SCR is given by the following equation.

【００４０】[0040]

【数１７】ＳＣＲ＝Ｆscr(Ｓ_ij) ただし、ｉ，ｊはＡkey_i＝１かつＡkey_j＝１かつｉ≠ｊ
を満たすｉ，ｊここで、Ｆscrは任意の関数を用いてよ
いが、本実施例では最小値関数を用いた。すなわち、[Equation 17] SCR = Fscr(S_ij), where i and j range over the values satisfying Akey_i = 1, Akey_j = 1, and i ≠ j. Any function may be used for Fscr; this embodiment uses the minimum value function. That is,

【００４１】[0041]

【数１８】Ｆscr(Ｓ_ij)＝min_i,j(Ｓ_ij) である。以降の説明ではＦscrに最小値関数を用いてい
るものとする。適合度統合部８０６はＳＣＲを適合度８
０２として出力する。図１９は単語対選択部８０４、単
語対適合度計算部８０５、適合度統合部８０６の処理を
表すＰＡＤ図である。以下に、該ＰＡＤ図の説明をす
る。１９０１：ｉが１からＮ−１まで、ｊがｉ＋１からＮま
でのそれぞれの場合について１９０２の処理を行い、そ
の後、１９０６へ行く。１９０２：word_i∈ＫＥＹかつword_j∈ＫＥＹならば１９
０３から１９０５の処理を行う。１９０３：ニューラルネットワークの入力値Ｉ₁にＣkey
_ij、Ｉ₂にＡＣtext_ijの値を代入する。１９０４：図１０によって、ニューラルネットワークの
計算を行う。１９０５：変数Ｓ_ijにニューラルネットワークの出力値
Ｏ₁を代入する。[Equation 18] Fscr(S_ij) = min_{i,j}(S_ij). The following description assumes that the minimum value function is used for Fscr. The fitness integration unit 806 outputs SCR as the fitness 802. FIG. 19 is a PAD diagram showing the processing of the word pair selection unit 804, the word pair fitness calculation unit 805, and the fitness integration unit 806. The PAD diagram is described below.
1901: For each i from 1 to N−1 and each j from i+1 to N, perform step 1902; then go to 1906.
1902: If word_i ∈ KEY and word_j ∈ KEY, perform steps 1903 to 1905.
1903: Assign the value of Ckey_ij to the neural network input value I₁ and the value of ACtext_ij to I₂.
1904: Run the neural network according to FIG. 10.
1905: Assign the output value O₁ of the neural network to the variable S_ij.

【００４２】１９０６：Ｆscr(Ｓ_ij)をＳＣＲに代入す
る。ここで、Ｉ_i：図１０のＩ_i Ｏ_i：図１０のＯ_i である。図２０は出力された適合度の例である。適合度
は〔０，１〕の範囲の実数で表される。図２０では、文
番号１の文は適合度が０．１０と小さく、検索条件に合
っていないことがわかる。文番号６の文は適合度０．６
５で、検索条件に比較的合っている。1906: Assign Fscr(S_ij) to SCR. Here, I_i and O_i are the I_i and O_i of FIG. 10. FIG. 20 is an example of the output fitness. The fitness is a real number in the range [0, 1]. In FIG. 20, sentence number 1 has a low fitness of 0.10, which shows that it does not match the search condition. Sentence number 6 has a fitness of 0.65 and matches the search condition relatively well.
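Steps 1901 to 1906 amount to scoring every pair of search-condition words and keeping the minimum. The sketch below illustrates that flow under stated assumptions: `pair_net` is a hypothetical stand-in for the trained two-input network of FIG. 18 (here simply one minus the absolute difference of its inputs), and the proximity values are invented for illustration.

```python
def sentence_fitness(key_words, ckey, actext, pair_net):
    """Score each pair of search-condition words and integrate with min.

    Returns (scr, worst_pair, scores). worst_pair is the pair that
    produced the minimum word pair fitness.
    """
    scores = {}
    words = sorted(key_words)
    for pos, i in enumerate(words):
        for j in words[pos + 1:]:
            scores[(i, j)] = pair_net(ckey[(i, j)], actext[(i, j)])
    worst_pair = min(scores, key=scores.get)
    return scores[worst_pair], worst_pair, scores

def pair_net(c, a):
    # Hypothetical stand-in for the network of FIG. 18, not the real one.
    return 1.0 - abs(c - a)

ckey = {(0, 2): 0.9, (0, 3): 0.5, (2, 3): 0.7}
actext = {(0, 2): 0.72, (0, 3): 0.45, (2, 3): 0.1}
scr, worst_pair, scores = sentence_fitness({0, 2, 3}, ckey, actext, pair_net)
```

Using the minimum means a sentence scores well only if every pair of search-condition words is matched, so a single badly matched pair is enough to lower the fitness of the whole sentence.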

【００４３】次に、学習方法について述べる。使用者は
例えば、図２０の文番号６の適合度０．６５が不満足の
場合、教師値入力手段１０２を用いて望ましい適合度、
例えば１．０を教師値として適合度計算手段１０８に提
示する。適合度計算手段１０８の内部では教師値は適合
度統合部８０６に入力される。学習は、適合度統合部８
０６が出力する適合度８０２と教師値８０３との出力誤
差を出力側から入力側へ伝達する誤差逆伝播法を用い
る。図７、図８で破線で示したのが学習時の誤差の流れ
である。適合度統合部８０６の出力誤差は次式で与えら
れる。Next, the learning method is described. If the user is dissatisfied with, for example, the fitness of 0.65 for sentence number 6 in FIG. 20, the user presents a desired fitness, for example 1.0, to the fitness calculation means 108 as a teacher value through the teacher value input means 102. Inside the fitness calculation means 108, the teacher value is input to the fitness integration unit 806. Learning uses the error backpropagation method, which propagates the output error between the fitness 802 output by the fitness integration unit 806 and the teacher value 803 from the output side toward the input side. The broken lines in FIGS. 7 and 8 show the flow of the error during learning. The output error of the fitness integration unit 806 is given by the following equation.

【００４４】[0044]

【数１９】Ｅ＝(Ｔ−ＳＣＲ)²／２Ｅ：出力誤差Ｔ：教師値ＳＣＲ：適合度統合部８０６の出力学習は検索条件文字列単語間近接度変換部７０６、検索
対象文単語間近接度変換部７０７、単語対適合度計算部
８０５、連想単語間近接度計算部７０８を構成するニュ
ーラルネットワークの結合重みおよび計算ユニットの持
つバイアスを変更することによって行われる。結合重み
およびバイアスの変更量の計算にはニューラルネットワ
ークの各ユニットの出力値が必要となるので、適合度計
算時の各ユニットの出力値を学習時に参照できるように
なんらかの方法によって保存しておく。誤差はまず適合
度統合部８０６から単語対適合度計算部８０５へ伝播さ
れる。単語対適合度計算部８０５が受け取る誤差は次式
で与えられる。[Equation 19] E = (T − SCR)² / 2, where E is the output error, T is the teacher value, and SCR is the output of the fitness integration unit 806. Learning is performed by changing the connection weights of the neural networks that constitute the search condition character string inter-word proximity conversion unit 706, the search target sentence inter-word proximity conversion unit 707, the word pair fitness calculation unit 805, and the associative word proximity calculation unit 708, together with the biases of their calculation units. Because computing the changes to the connection weights and biases requires the output value of each unit of the neural networks, the unit outputs produced during the fitness calculation are saved by some means so that they can be referenced during learning. The error is first propagated from the fitness integration unit 806 to the word pair fitness calculation unit 805. The error received by the word pair fitness calculation unit 805 is given by the following equation.

【００４５】[0045]

【数２０】errＳ_ij＝∂Ｅ／∂Ｓ_ij ＝−(Ｔ−ＳＣＲ)・∂ＳＣＲ／∂Ｓ_ij errＳ_ijは単語対word_i，word_jについての単語対適合度
Ｓ_ijの誤差である。数１８と数２０から、次式が導かれ
る。[Equation 20] errS_ij = ∂E/∂S_ij = −(T − SCR) · ∂SCR/∂S_ij. errS_ij is the error of the word pair fitness S_ij for the word pair word_i, word_j. The following equation is derived from Equations 18 and 20.

【００４６】[0046]

【数２１】 [Equation 21]
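Equation 21 did not survive in this text. Since SCR is the minimum of the S_ij (Equation 18), its derivative with respect to S_ij is 1 for the pair that attains the minimum and 0 for every other pair, so substituting into Equation 20 plausibly gives the form below. This is a reconstruction, not the published equation.

```latex
\mathrm{err}S_{ij} =
\begin{cases}
-(T - \mathrm{SCR}) & \text{if } S_{ij} = \min_{k,l} S_{kl} \\
0 & \text{otherwise}
\end{cases}
```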

【００４７】したがって、最小の適合度を出力した単語
対についてのみ誤差が単語対適合度計算部８０５へ伝播
され、以後この単語対の単語対適合度について学習が行
われる。この単語対のことを学習単語対と呼ぶことにす
る。また、学習単語対をなす２つの単語をword_A、word_B
と表すことにする。単語対適合度計算部８０５の学習に
は多層型ニューラルネットワークで通常用いられる誤差
逆伝播法を用いる。結合重みの変化量は次式で与えられ
る。Therefore, the error is propagated to the word pair fitness calculation unit 805 only for the word pair that produced the minimum fitness, and learning is subsequently performed on the word pair fitness of that pair. This word pair is called the learning word pair, and its two words are denoted word_A and word_B. The word pair fitness calculation unit 805 is trained with the error backpropagation method ordinarily used for multilayer neural networks. The change in a connection weight is given by the following equation.

【００４８】[0048]

【数２２】Δｗ_ijk＝−ηδ_ikｏ_jk-1 ｗ_ijk ：第ｋ−１層ユニットｊから第ｋ層ユニットｉ
への結合重み Δｗ_ijk：結合重みｗ_ijkの変化量 η ：学習定数 δ_ik ：第ｋ層のユニットｉの誤差ｏ_jk-1 ：第ｋ−１層ユニットｊの出力ここで、誤差δ_ikは次式で与えられる。[Equation 22] Δw_ijk = −η δ_ik o_{j,k−1}, where w_ijk is the connection weight from unit j of layer k−1 to unit i of layer k, Δw_ijk is the change in the connection weight w_ijk, η is the learning constant, δ_ik is the error of unit i of layer k, and o_{j,k−1} is the output of unit j of layer k−1. The error δ_ik is given by the following equation.

【００４９】[0049]

【数２３】 [Equation 23]
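Equation 23 did not survive in this text. Steps 2102 and 2105 to 2108 of FIG. 21 compute the output-layer error from err_i and each earlier layer's error from the layer after it, which suggests the form below. It is a reconstruction from those steps, not the published equation; the factor o(1 − o) is the derivative of a sigmoid unit.

```latex
\delta_{iL} = \mathrm{err}_i \, o_{iL} \, (1 - o_{iL}),
\qquad
\delta_{ik} = \Bigl( \sum_{j=1}^{N_{k+1}} w_{ji,k+1} \, \delta_{j,k+1} \Bigr) \, o_{ik} \, (1 - o_{ik})
\quad (k < L)
```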

【００５０】なお、入力層を第１層とし、出力に近いほ
ど層番号が大きくなるものとする。図２１はこの処理を
表すＰＡＤ図である。以下に、該ＰＡＤ図の説明をす
る。２１０１：ｉが１からＮ_Lまでのそれぞれの場合につい
て２１０２の処理を行い、その後、２１０３へ行く。２１０２：err_i・ｏ_iL・(１−ｏ_iL)をδ_iLに代入する。２１０３：ｋがＬ−１から１までのそれぞれの場合につ
いて２１０４の処理を行い、その後、２１０９へ行く。２１０４：ｉが１からＮ_kまでのそれぞれの場合につい
て２１０５から２１０８の処理を行う。２１０５：δ_ikに０を代入する。２１０６：ｊが１からＮ_k+1までのそれぞれの場合につ
いて２１０７の処理を行い、その後、２１０８へ行く。２１０７：δ_ikにｗ_jik+1・δ_jk+1を加える。２１０８：δ_ik・ｏ_ik・(１−ｏ_ik)の値をδ_ikに代入す
る。２１０９：ｋが２からＬまでのそれぞれの場合について
２１１０の処理を行う。２１１０：ｉが１からＮ_kのそれぞれの場合について２
１１１と２１１３の処理を行う。２１１１：ｊが１からＮ_k-1までのそれぞれの場合につ
いて２１１２の処理を行い、その後、２１１３へ行く。２１１２：Δｗ_ijkにηδ_ikｏ_jk-1の値を代入する。２１１３：Δθ_ikにηδ_ikの値を代入する。ここで、Ｌ：全層数Ｎ_L ：第Ｌ層にあるユニットの総数Ｎ_k ：第ｋ層にあるユニットの総数ｏ_iL ：第Ｌ層内にあるｉ番目のユニットの出力値ｏ_ik ：第ｋ層内にあるｉ番目のユニットの出力値 δ_ik ：第ｋ層ユニットｉの誤差 Δｗ_ijk：結合重みｗ_ijkの変更量 Δθ_ik ：バイアスθ_ikの変更量 err_i ：出力ユニットｉの出力誤差である。The input layer is taken as layer 1, and layer numbers increase toward the output. FIG. 21 is a PAD diagram showing this processing. The PAD diagram is described below.
2101: For each i from 1 to N_L, perform step 2102; then go to 2103.
2102: Assign err_i · o_iL · (1 − o_iL) to δ_iL.
2103: For each k from L−1 down to 1, perform step 2104; then go to 2109.
2104: For each i from 1 to N_k, perform steps 2105 to 2108.
2105: Assign 0 to δ_ik.
2106: For each j from 1 to N_{k+1}, perform step 2107; then go to 2108.
2107: Add w_{ji,k+1} · δ_{j,k+1} to δ_ik.
2108: Assign the value of δ_ik · o_ik · (1 − o_ik) to δ_ik.
2109: For each k from 2 to L, perform step 2110.
2110: For each i from 1 to N_k, perform steps 2111 and 2113.
2111: For each j from 1 to N_{k−1}, perform step 2112; then go to 2113.
2112: Assign the value of η δ_ik o_{j,k−1} to Δw_ijk.
2113: Assign the value of η δ_ik to Δθ_ik.
Here, L is the total number of layers, N_L is the number of units in layer L, N_k is the number of units in layer k, o_iL is the output value of the i-th unit in layer L, o_ik is the output value of the i-th unit in layer k, δ_ik is the error of unit i of layer k, Δw_ijk is the change in the connection weight w_ijk, Δθ_ik is the change in the bias θ_ik, and err_i is the output error of output unit i.
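The steps of FIG. 21 can be sketched as below. The data layout (nested lists, layers numbered from 0) and the function name are assumptions made for illustration. The sketch follows steps 2112 and 2113 literally when computing the change amounts; Equation 22 carries a minus sign, so the sign convention used when applying the changes is left to the caller.

```python
def backprop_changes(outputs, weights, err, eta):
    """Error backpropagation following steps 2101-2113.

    outputs[k][i]   : saved output of unit i in layer k (k = 0 is the input layer)
    weights[k][i][j]: weight from unit j of layer k-1 to unit i of layer k
                      (weights[0] is unused)
    err[i]          : output error of output unit i
    Returns (delta, dw, dtheta); dw[0] and dtheta[0] are None.
    """
    n_layers = len(outputs)
    delta = [[0.0] * len(layer) for layer in outputs]
    # 2101-2102: error of the output layer.
    for i, o in enumerate(outputs[-1]):
        delta[-1][i] = err[i] * o * (1.0 - o)
    # 2103-2108: propagate the error toward the input layer.
    for k in range(n_layers - 2, -1, -1):
        for i, o in enumerate(outputs[k]):
            total = sum(weights[k + 1][j][i] * delta[k + 1][j]
                        for j in range(len(outputs[k + 1])))
            delta[k][i] = total * o * (1.0 - o)
    # 2109-2113: change amounts for weights and biases.
    dw = [None] * n_layers
    dtheta = [None] * n_layers
    for k in range(1, n_layers):
        dw[k] = [[eta * delta[k][i] * outputs[k - 1][j]
                  for j in range(len(outputs[k - 1]))]
                 for i in range(len(outputs[k]))]
        dtheta[k] = [eta * d for d in delta[k]]
    return delta, dw, dtheta

# Two inputs feeding one output unit; all numbers are made up.
outputs = [[0.5, 0.8], [0.6]]
weights = [None, [[1.0, -2.0]]]
delta, dw, dtheta = backprop_changes(outputs, weights, err=[0.5], eta=0.1)
```

The errors left on the input layer (`delta[0]`) are what the text goes on to hand to the upstream units as errCkey_AB and errACtext_AB.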

【００５１】変更された結合重みは結合重みファイル１
０６に保存される。単語対適合度計算部８０５を構成す
るニューラルネットワークの入力ユニットまで逆伝播さ
れてきた誤差は単語対選択部８０４を介し、検索条件文
字列単語間近接度変換部７０６と連想単語間近接度計算
部７０８へ伝播される。誤差の関係を次式で示す。The changed connection weights are saved in the connection weight file 106. The error backpropagated to the input units of the neural network that constitutes the word pair fitness calculation unit 805 is propagated through the word pair selection unit 804 to the search condition character string inter-word proximity conversion unit 706 and the associative word proximity calculation unit 708. The relationship between the errors is given by the following equations.

【００５２】[0052]

【数２４】 errＣkey_AB＝δ_1,1(AB) errＡＣtext_AB＝δ_2,1(AB) errＣkey_AB ：検索条件文字列単語間近接度変換部７
０６におけるＣkey_ABの誤差 errＡＣtext_AB：連想単語間近接度計算部７０８におけ
るＡＣtext_ABの誤差 δ_1,1(AB) ：誤差errＳ_ABを逆伝播させた結果の入力
ユニット１の誤差 δ_2,1(AB) ：誤差errＳ_ABを逆伝播させた結果の入力
ユニット２の誤差Ａ、Ｂは学習単語対をなす２単語の単語番号である。な
お、ここでは単語対適合度計算部８０５を構成するニュ
ーラルネットワークの２つの入力ユニットのうち検索条
件文字列単語間近接度を入力する方をユニット１、連想
単語間近接度を入力する方をユニット２と呼んでいる。
連想単語近接度計算部７０８では伝達されてきた誤差er
rＡＣtext_ABを小さくするように学習単語対をなす２つ
の単語の連想に関するリカレント型ニューラルネットワ
ークの結合重みを変更する。連想単語近接度計算部７０
８の出力ＡＣtext_ABは数１５により求められる。そこ
で、数１５の右辺の３つの因数についての誤差を次のよ
うに求める。[Equation 24] errCkey_AB = δ_{1,1}(AB), errACtext_AB = δ_{2,1}(AB), where errCkey_AB is the error of Ckey_AB at the search condition character string inter-word proximity conversion unit 706, errACtext_AB is the error of ACtext_AB at the associative word proximity calculation unit 708, δ_{1,1}(AB) is the error of input unit 1 obtained by backpropagating the error errS_AB, and δ_{2,1}(AB) is the error of input unit 2 obtained by backpropagating the error errS_AB. A and B are the word numbers of the two words that form the learning word pair. Of the two input units of the neural network that constitutes the word pair fitness calculation unit 805, the one that receives the search condition character string inter-word proximity is called unit 1 here, and the one that receives the associative word proximity is called unit 2. The associative word proximity calculation unit 708 changes the connection weights of the recurrent neural network that relate to the association of the two words of the learning word pair so as to reduce the propagated error errACtext_AB. The output ACtext_AB of the associative word proximity calculation unit 708 is obtained from Equation 15. The errors for the three factors on the right-hand side of Equation 15 are therefore obtained as follows.

【００５３】[0053]

【数２５】 errＣtext_kl＝errＡＣtext_AB・∂ＡＣtext_AB／∂Ｃtext_kl ＝errＡＣtext_AB・ＡＡ_A・ＡＡ_B ただし、ｋ＝MAX_A，ｌ＝MAX_B ここで、MAX_A、MAX_Bは数１６で定義される。[Equation 25] errCtext_kl = errACtext_AB · ∂ACtext_AB/∂Ctext_kl = errACtext_AB · AA_A · AA_B, where k = MAX_A and l = MAX_B. Here, MAX_A and MAX_B are defined by Equation 16.

【００５４】[0054]

【数２６】 errＡＡ_A＝errＡＣtext_AB・∂ＡＣtext_AB／∂ＡＡ_A ＝errＡＣtext_AB・Ｃtext_kl・ＡＡ_B [Equation 26] errAA_A = errACtext_AB · ∂ACtext_AB/∂AA_A = errACtext_AB · Ctext_kl · AA_B

【００５５】[0055]

【数２７】 errＡＡ_B＝errＡＣtext_AB・∂ＡＣtext_AB／∂ＡＡ_B ＝errＡＣtext_AB・Ｃtext_kl・ＡＡ_A errＣtext_klはＣtext_klの誤差、errＡＡ_AはＡＡ_Aの誤
差、errＡＡ_BはＡＡ_Bの誤差である。ＡＡ_A、ＡＡ_Bは数
１４で求められるので、次の誤差が求められる。[Equation 27] errAA_B = errACtext_AB · ∂ACtext_AB/∂AA_B = errACtext_AB · Ctext_kl · AA_A. errCtext_kl is the error of Ctext_kl, errAA_A is the error of AA_A, and errAA_B is the error of AA_B. Because AA_A and AA_B are obtained from Equation 14, the following errors are obtained.

【００５６】[0056]

【数２８】 [Equation 28]

【００５７】[0057]

【数２９】 [Equation 29]
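Equations 28 and 29 did not survive in this text. If AA_A is the maximum of ASC_Aj over j, as steps 1610 to 1612 of FIG. 16 indicate, then only the ASC value that attains the maximum receives the error, which suggests the form below. This is a reconstruction under that assumption, not the published equations.

```latex
\mathrm{errASC}_{Aj} = \mathrm{errAA}_A \cdot \frac{\partial \mathrm{AA}_A}{\partial \mathrm{ASC}_{Aj}} =
\begin{cases}
\mathrm{errAA}_A & \text{if } j = \mathrm{MAX}_A \\
0 & \text{otherwise}
\end{cases}
\qquad
\mathrm{errASC}_{Bj} =
\begin{cases}
\mathrm{errAA}_B & \text{if } j = \mathrm{MAX}_B \\
0 & \text{otherwise}
\end{cases}
```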

【００５８】errＡＳＣ_Aj、errＡＳＣ_BjはそれぞれＡＳ
Ｃ_Aj、ＡＳＣ_Bjの誤差である。ＡＳＣ_Aj、ＡＳＣ_Bjは連
想単語間近接度計算部７０８を構成するリカレント型ニ
ューラルネットワークの出力なので、errＡＳＣ_Aj、err
ＡＳＣ_Bjがリカレント型ニューラルネットワークの出力
ユニットの誤差となる。ＡＳＣ_Ajを計算するときはユニ
ットｊが入力ユニット、ユニットＡが出力ユニットとな
る。同様に、ＡＳＣ_Bjを計算するときにはユニットｊが
入力ユニット、ユニットＢが出力ユニットとなる。errASC_Aj and errASC_Bj are the errors of ASC_Aj and ASC_Bj, respectively. Because ASC_Aj and ASC_Bj are outputs of the recurrent neural network that constitutes the associative word proximity calculation unit 708, errASC_Aj and errASC_Bj become the errors of the output units of that network. When ASC_Aj is calculated, unit j is the input unit and unit A is the output unit. Likewise, when ASC_Bj is calculated, unit j is the input unit and unit B is the output unit.

【００５９】リカレント型ネットワークは一般には信号
の伝達経路にループがあるため単純な誤差逆伝播は行え
ないが、本実施例で用いるリカレント型ネットワークは
計算ユニットの入出力関数が最大値関数であることと、
ユニット間の結合重みが〔０，１〕に制限されているた
め、信号の伝達経路にループが存在せず、多層型ネット
ワークと同様に出力から入力への誤差逆伝播ができる。
出力ユニットでないユニットの誤差δ_iは次の手順で計
算される。ただし、Ｏは出力ユニットのユニット番号、errは出力
誤差、src_iは図１５で定義されるユニットｉへの最大入
力値を出力したユニットの番号である。このとき−１≦
δ_i≦１である。結合重みの変化量は次式で与えられるA recurrent network in general has loops in its signal transmission path, so simple error backpropagation cannot be applied. In the recurrent network used in this embodiment, however, the input/output function of the calculation units is the maximum value function and the connection weights between units are restricted to [0, 1], so the signal transmission path contains no loop and the error can be backpropagated from output to input just as in a multilayer network. The error δ_i of a unit other than the output unit is calculated by the following procedure. Here, O is the unit number of the output unit, err is the output error, and src_i is the number of the unit that supplied the maximum input value to unit i, as defined in FIG. 15. In this case −1 ≤ δ_i ≤ 1. The change in a connection weight is given by the following equation.

【００６０】[0060]

【数３０】 [Equation 30]

【００６１】数３０中のｄ_iは次式で計算される。d_i in Equation 30 is calculated by the following equation.

【００６２】[0062]

【数３１】 [Equation 31]
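Equations 30 and 31 did not survive in this text. Steps 2208 to 2214 of FIG. 22 clip η·δ to [−1, 1] and then scale the change by either (1 − w) or w, which suggests the form below. It is a reconstruction from those steps, written with the indices s and a used in FIG. 22, not the published equations.

```latex
\Delta w_{sa} =
\begin{cases}
(1 - w_{sa}) \, d \, o_a & \text{if } d > 0 \\
w_{sa} \, d \, o_a & \text{otherwise}
\end{cases}
\qquad
d = \max\bigl(-1, \; \min(1, \; \eta \, \delta_a)\bigr)
```

Assuming the update is applied as w ← w + Δw, this form keeps the weight in [0, 1]: with 0 ≤ o_a ≤ 1 and |d| ≤ 1, a positive change is at most 1 − w_sa and a negative change is at most w_sa in magnitude.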

【００６３】数３０、数３１によれば結合重みが〔０，
１〕となることが保証される。連想単語近接度計算部７
０８では以上の結合重み変化量をＡＳＣ_Aj、ＡＳＣ_Bjの
計算についてそれぞれ求め、その平均を最終的な変化量
とする。Equations 30 and 31 guarantee that the connection weights remain within [0, 1]. The associative word proximity calculation unit 708 obtains the above connection weight changes separately for the calculation of ASC_Aj and for the calculation of ASC_Bj, and takes their average as the final change.

【００６４】図２２にリカレント型ニューラルネットワ
ークの学習法を表すＰＡＤ図を示す。以下に、該ＰＡＤ
図の説明をする。２２０１：ｉが１からＮまでのそれぞれの場合について
２２０２の処理を行い、その後、２２０３へ行く。２２０２：δ_iに０を代入する。２２０３：変数ｓに出力ユニット番号Ｏを代入する。２２０４：δ_sに出力誤差errを代入する。２２０５：変数ａにsrc_sを代入する。２２０６：ａ＝ｓになるまで２２０７から２２１６の処
理を繰り返す。２２０７：δ_aにδ_sｗ_saを代入する。２２０８：変数ｄにηδ_aを代入する。２２０９：ｄ＜−１ならば２２１０の処理を行い、その
後、２２１２へ行く。ｄ＞１ならば２２１１の処理を行
い、その後、２２１２へ行く。２２１０：ｄに−１を代入する。２２１１：ｄに１を代入する。２２１２：ｄ＞０ならば２２１３の処理を行い、その
後、２２１５へ行く。そうでなければ、２２１４の処理
を行い、その後、２２１５へ行く。２２１３：(１−ｗ_sa)・ｄ・ｏ_aの値をΔｗ_saに代入す
る。２２１４：ｗ_sa・ｄ・ｏ_aの値をΔｗ_saに代入する。２２１５：ｓにａの値を代入する。２２１６：ａにsrc_sを代入し、２２０６へ行く。ここで、 Δｗ_sa：結合重みｗ_saの変更量Ｏ：出力ユニットに指定されたユニットの番号 err ：出力ユニットの出力誤差Ｎ：ユニット数 δ_i ：ユニットｉの誤差 src_i ：（図１５）で定義されるである。FIG. 22 is a PAD diagram showing the learning method of the recurrent neural network. The PAD diagram is described below.
2201: For each i from 1 to N, perform step 2202; then go to 2203.
2202: Assign 0 to δ_i.
2203: Assign the output unit number O to the variable s.
2204: Assign the output error err to δ_s.
2205: Assign src_s to the variable a.
2206: Repeat steps 2207 to 2216 until a = s.
2207: Assign δ_s · w_sa to δ_a.
2208: Assign η δ_a to the variable d.
2209: If d < −1, perform step 2210 and then go to 2212. If d > 1, perform step 2211 and then go to 2212.
2210: Assign −1 to d.
2211: Assign 1 to d.
2212: If d > 0, perform step 2213 and then go to 2215. Otherwise, perform step 2214 and then go to 2215.
2213: Assign the value of (1 − w_sa) · d · o_a to Δw_sa.
2214: Assign the value of w_sa · d · o_a to Δw_sa.
2215: Assign the value of a to s.
2216: Assign src_s to a and go to 2206.
Here, Δw_sa is the change in the connection weight w_sa, O is the number of the unit designated as the output unit, err is the output error of the output unit, N is the number of units, δ_i is the error of unit i, and src_i is as defined in FIG. 15.
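The walk from the output unit back to the input unit in steps 2201 to 2216 can be sketched as below. The function name, the 0-based unit numbering, and the use of None for a unit with no source are assumptions. The sketch only computes the change amounts and does not apply them, since the text does not state here how Δw is added to the stored weight.

```python
def recurrent_weight_changes(w, o, src, output_unit, err, eta):
    """Follow the src chain from the output unit toward the input unit.

    w[i][j]: weight from unit j to unit i; o[i]: saved unit outputs;
    src[i]: unit that supplied the largest input to unit i (the input
    unit points to itself). Returns (delta, changes), where changes
    maps (s, a) to the change amount for weight w[s][a].
    """
    delta = [0.0] * len(w)                            # 2201-2202
    changes = {}
    s = output_unit                                   # 2203
    delta[s] = err                                    # 2204
    a = src[s]                                        # 2205
    while a is not None and a != s:                   # 2206
        delta[a] = delta[s] * w[s][a]                 # 2207
        d = max(-1.0, min(1.0, eta * delta[a]))       # 2208-2211
        if d > 0:                                     # 2212
            changes[(s, a)] = (1.0 - w[s][a]) * d * o[a]   # 2213
        else:
            changes[(s, a)] = w[s][a] * d * o[a]           # 2214
        s, a = a, src[a]                              # 2215-2216
    return delta, changes

# Chain 0 -> 1 -> 2 with unit 0 as the input unit; values are made up.
w = [[0.0, 0.0, 0.0], [0.9, 0.0, 0.0], [0.0, 0.5, 0.0]]
o = [1.0, 0.9, 0.45]
src = [0, 0, 1]
delta, changes = recurrent_weight_changes(w, o, src, output_unit=2,
                                          err=0.2, eta=0.5)
```

Only the weights on the path that actually carried the maximum signal are touched, which is why the recurrent network can be trained like a feed-forward chain.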

【００６５】変更された結合重みは連想強度として類義
語辞書ファイル１０５に書き出される。連想単語間近接
度計算部７０８は数２４で求めた誤差errＣtext_klを検
索対象文単語間近接度変換部７０７へ伝播する。検索条
件文字列単語間近接度変換部７０６の出力ユニットが受
け取る誤差は数２４で求めたerrＣkey_ABだが、検索対象
文単語間近接度変換部７０７の出力ユニットが受け取る
誤差は数２５で求めたerrＣtext_klである。したがっ
て、検索条件文字列単語間近接度変換部７０６では学習
単語対word_A、word_Bについて学習を行う。検索対象文単
語間近接度変換部７０７では単語対word_k、word_lについ
て学習を行う。検索条件文字列単語間近接度変換部７０
６と検索対象文単語間近接度変換部７０７は多層型ネッ
トワークなので、図２１に示した誤差逆伝播法で学習を
行う。変更された結合重みは結合重みファイル１０６に
書き出される。The changed connection weights are written to the synonym dictionary file 105 as associative strengths. The associative word proximity calculation unit 708 propagates the error errCtext_kl obtained from Equation 25 to the search target sentence inter-word proximity conversion unit 707. The error received by the output unit of the search condition character string inter-word proximity conversion unit 706 is errCkey_AB obtained from Equation 24, whereas the error received by the output unit of the search target sentence inter-word proximity conversion unit 707 is errCtext_kl obtained from Equation 25. Accordingly, the search condition character string inter-word proximity conversion unit 706 learns on the learning word pair word_A, word_B, while the search target sentence inter-word proximity conversion unit 707 learns on the word pair word_k, word_l. Because the search condition character string inter-word proximity conversion unit 706 and the search target sentence inter-word proximity conversion unit 707 are multilayer networks, they are trained with the error backpropagation method shown in FIG. 21. The changed connection weights are written to the connection weight file 106.

【００６６】本実施例で用いたニューラルネットワーク
の学習法は繰返し計算に基づく再急降下法である。した
がって、図２１、図２２に示した結合重み変更処理を何
度も繰り返して行う必要がある。また、学習を行うべき
検索対象文が複数ある場合はそれぞれの検索対象文につ
いて図２１、図２２の処理を何度も繰り返す。このよう
に、適合度統合部８０６に教師値８０３を入力すると、
単語対適合度計算部８０５、連想単語間近接度計算部７
０８、検索条件文字列単語間近接度変換部７０６、検索
対象文単語間近接度変換部７０７の４つのニューラルネ
ットワークの学習を一貫して行うことが出来る。また、
それぞれのニューラルネットワークの学習定数ηを別々
に設定することによって、４つのニューラルネットワー
クの学習の割合を変えることが出来る。例えば、連想単
語間近接度計算部７０８だけ学習させたり、単語間近接
度変換部７０６、７０７を他より大きく学習させるなど
ができる。学習定数、学習繰返し回数等の、学習を制御
するパラメータの値は教師値入力手段１０２によって教
師値とともに使用者が指定できる。この学習機能によっ
て、望ましい適合度を提示すれば適合度計算手段１０８
が計算方法を獲得するので、適合度計算方法をあらかじ
め正確に決定する必要がない。The learning method used for the neural networks in this embodiment is the steepest descent method based on iterative calculation. The connection weight update processing shown in FIGS. 21 and 22 must therefore be repeated many times. When there are several search target sentences to learn from, the processing of FIGS. 21 and 22 is repeated many times for each of them. In this way, inputting the teacher value 803 to the fitness integration unit 806 trains all four neural networks end to end: the word pair fitness calculation unit 805, the associative word proximity calculation unit 708, the search condition character string inter-word proximity conversion unit 706, and the search target sentence inter-word proximity conversion unit 707. In addition, setting the learning constant η of each neural network separately changes the relative amount of learning among the four networks. For example, only the associative word proximity calculation unit 708 can be trained, or the inter-word proximity conversion units 706 and 707 can be trained more strongly than the others. The values of the parameters that control learning, such as the learning constant and the number of learning iterations, can be specified by the user together with the teacher value through the teacher value input means 102. With this learning function, the fitness calculation means 108 acquires the calculation method once desired fitness values are presented, so the fitness calculation method need not be determined precisely in advance.

【００６７】[0067]

【発明の効果】異表記や類義語による検索漏れを減少さ
せることができる。また、検索条件文字列を構成する単
語と検索対象文を構成する単語が偶然一致することによ
る検索雑音を減少させることができる。EFFECTS OF THE INVENTION Search omissions caused by variant notation and synonyms can be reduced. Search noise caused by words of the search condition character string coincidentally matching words of the search target sentence can also be reduced.

【図面の簡単な説明】[Brief description of drawings]

【図１】本発明のブロック構成を示す図である。FIG. 1 is a diagram showing a block configuration of the present invention.

【図２】実施例で用いた検索対象文の例を示す図であ
る。FIG. 2 is a diagram showing an example of a search target sentence used in an embodiment.

【図３】構文解析して得られる構文木の例を示す図であ
る。FIG. 3 is a diagram showing an example of a syntax tree obtained by syntax analysis.

【図４】文書データベース１０４に保持される構文情報
の例を示す図である。FIG. 4 is a diagram showing an example of syntax information held in a document database 104.

【図５】類義語辞書ファイル１０５に保持される連想強
度の例を示す図である。FIG. 5 is a diagram showing an example of the associative strengths held in the synonym dictionary file 105.

【図６】検索条件文字列とその構文解析木、構文情報の
例を示す図である。FIG. 6 is a diagram showing an example of a search condition character string, its syntax analysis tree, and syntax information.

【図７】適合度計算手段１０８の内部構成を示す図であ
る。FIG. 7 is a diagram showing the internal configuration of the fitness calculation means 108.

【図８】図７に続く適合度計算手段１０８の内部構成を
示す図である。FIG. 8 is a diagram showing an internal configuration of a fitness calculation means 108 following FIG. 7;

【図９】単語間近接度変換部７０６、７０７を構成する
１入力１出力の多層型ニューラルネットワークを示す図
である。FIG. 9 is a diagram showing a 1-input 1-output multi-layered neural network that constitutes inter-word proximity conversion units 706 and 707.

【図１０】多層型ニューラルネットワークの計算処理を
表すＰＡＤ図である。FIG. 10 is a PAD diagram showing a calculation process of a multilayer neural network.

【図１１】単語間近接度変換部７０６、７０７の処理を
表すＰＡＤ図である。FIG. 11 is a PAD diagram showing processing of inter-word proximity conversion units 706 and 707.

【図１２】検索条件文字列単語間近接度変換部７０６の
出力例を示す図である。FIG. 12 is a diagram showing an output example of a search condition character string inter-word proximity conversion unit 706.

【図１３】検索対象文単語間近接度変換部７０７の出力
例を示す図である。FIG. 13 is a diagram illustrating an output example of a search target sentence inter-word proximity conversion unit 707.

【図１４】連想単語間近接度計算部７０８を構成するリ
カレント型ニューラルネットワークを示す図である。FIG. 14 is a diagram showing a recurrent neural network that constitutes an associative word proximity calculation unit 708.

【図１５】リカレント型ニューラルネットワークの計算
処理を表すＰＡＤ図である。FIG. 15 is a PAD diagram showing a calculation process of a recurrent neural network.

【図１６】連想単語間近接度計算部７０８の処理を表す
ＰＡＤ図である。FIG. 16 is a PAD diagram showing processing of an associative word proximity degree calculation unit 708.

【図１７】連想単語間近接度計算部７０８の出力例を示
す図である。FIG. 17 is a diagram showing an output example of an associative word proximity calculation section 708.

【図１８】単語対適合度計算部８０５を構成する２入力
１出力の多層型ニューラルネットワークを示す図であ
る。FIG. 18 is a diagram showing a two-input, one-output multi-layered neural network which constitutes a word pair fitness calculation unit 805.

【図１９】単語対選択部８０４、単語対適合度計算部８
０５、適合度統合部８０６の処理を表すＰＡＤ図であ
る。FIG. 19 is a PAD diagram showing the processing of the word pair selection unit 804, the word pair fitness calculation unit 805, and the fitness integration unit 806.

【図２０】検索対象文の適合度の出力例を示す図であ
る。FIG. 20 is a diagram showing an output example of the fitness of search target sentences.

【図２１】多層型ニューラルネットワークの誤差逆伝播
学習法の処理を表すＰＡＤ図である。FIG. 21 is a PAD diagram showing a process of an error backpropagation learning method of a multilayer neural network.

【図２２】リカレント型ニューラルネットワークの学習
法の処理を表すＰＡＤ図である。FIG. 22 is a PAD diagram showing the processing of the learning method of the recurrent neural network.

【図２３】多層型ニューラルネットワークを説明するた
めの図である。FIG. 23 is a diagram for explaining a multilayer neural network.

【符号の説明】[Explanation of symbols]

１０１ 検索条件入出力手段 101 search condition input/output means
１０２ 教師値入力手段 102 teacher value input means
１０３ 出力手段 103 output means
１０４ 文書データベース 104 document database
１０５ 類義語辞書ファイル 105 synonym dictionary file
１０６ 結合重みファイル 106 connection weight file
１０７ 構文解析手段 107 syntax analysis means
１０８ 適合度計算手段 108 fitness calculation means
１０９ 類義語連想手段 109 synonym association means
７０１ 検索条件文字列単語間近接度変換部７０６を構成する多層型ニューラルネットワークの結合重み 701 connection weights of the multilayer neural network constituting the search condition character string inter-word proximity conversion unit 706
７０２ 検索対象文単語間近接度変換部７０７を構成する多層型ニューラルネットワークの結合重み 702 connection weights of the multilayer neural network constituting the search target sentence inter-word proximity conversion unit 707
７０３ 類義語の連想強度 703 association strengths of synonyms
７０４ 検索条件文字列の構文情報 704 syntax information of the search condition character string
７０５ 検索対象文の構文情報 705 syntax information of the search target sentence
７０６ 検索条件文字列単語間近接度変換部 706 search condition character string inter-word proximity conversion unit
７０７ 検索対象文単語間近接度変換部 707 search target sentence inter-word proximity conversion unit
７０８ 連想単語間近接度計算部 708 associative word proximity calculation unit
８０１ 単語対適合度計算部８０５を構成する多層型ニューラルネットワークの結合重み 801 connection weights of the multilayer neural network constituting the word pair fitness calculation unit 805
８０２ 適合度出力 802 fitness output
８０３ 教師値入力手段１０２から入力される教師値 803 teacher value input from the teacher value input means 102
８０４ 単語対選択部 804 word pair selection unit
８０５ 単語対適合度計算部 805 word pair fitness calculation unit
８０６ 適合度統合部 806 fitness integration unit

Claims

【特許請求の範囲】[Claims]

【請求項１】自然言語の形で保存され、データベース
内の全文を検索対象文とする文書データベースから、入
力された検索条件文字列に適合する対象文を検索する文
書検索方法において、入力された検索条件文字列の構文情報である検索条件文
字列内の単語の出現確度および単語間距離を基に検索条
件文字列単語間近接度を求める過程と、文書データベース内の検索対象文の構文情報である検索
対象文内の単語の出現確度および単語間距離を基に検索
対象文単語間近接度を求める過程と、該検索対象文単語間近接度と単語間の連想強度とを基に
連想単語間近接度を求める過程と、前記検索条件文字列単語間近接度と連想単語間近接度と
を基に単語対適合度を求める過程と、求められた単語対ごとの単語対適合度を統合して検索対
象文の適合度を求める過程を備えることを特徴とする文
書検索方法。1. A document search method for retrieving, from a document database that is stored in natural language form and in which every sentence in the database is a search target sentence, target sentences that match an input search condition character string, the method comprising: a step of obtaining a search condition character string inter-word proximity based on the appearance probability of words and the inter-word distances in the search condition character string, which are the syntax information of the input search condition character string; a step of obtaining a search target sentence inter-word proximity based on the appearance probability of words and the inter-word distances in a search target sentence, which are the syntax information of the search target sentence in the document database; a step of obtaining an associative inter-word proximity based on the search target sentence inter-word proximity and the association strengths between words; a step of obtaining a word pair fitness based on the search condition character string inter-word proximity and the associative inter-word proximity; and a step of obtaining the fitness of the search target sentence by integrating the word pair fitness obtained for each word pair.
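The five steps of claim 1 can be sketched end to end as follows. This is only an illustration of the data flow: in the embodiment each step is carried out by a neural network, whereas the sketch substitutes simple hand-written formulas (a product of appearance probabilities divided by distance, a best-match product for association, an absolute difference for word pair fitness, and an average for integration). Those formulas, the function names, and the sample words and values are all assumptions made for the example.

```python
from itertools import combinations

# Minimal sketch of the claim 1 data flow. Every formula below is a
# hypothetical stand-in for a neural network of the embodiment.


def pairwise_proximity(probability, distance):
    """Steps 1 and 2: inter-word proximity from appearance probability and distance."""
    result = {}
    for a, b in combinations(sorted(probability), 2):
        d = distance[frozenset((a, b))]
        # Stand-in rule: nearer and more certain word pairs score higher.
        result[frozenset((a, b))] = probability[a] * probability[b] / (1.0 + d)
    return result


def associative_proximity(query_pair, target_prox, assoc):
    """Step 3: carry target-sentence proximity over to query words via association."""
    q1, q2 = sorted(query_pair)
    best = 0.0
    for pair, prox in target_prox.items():
        t1, t2 = sorted(pair)
        for a, b in ((t1, t2), (t2, t1)):
            best = max(best, assoc(q1, a) * assoc(q2, b) * prox)
    return best


def sentence_fitness(query_prox, target_prox, assoc):
    """Steps 4 and 5: word pair fitness per query pair, then integration."""
    scores = []
    for pair, pq in query_prox.items():
        pa = associative_proximity(pair, target_prox, assoc)
        scores.append(1.0 - abs(pq - pa))  # stand-in word pair fitness
    return sum(scores) / len(scores) if scores else 0.0


# Hypothetical association strengths, as a synonym dictionary might hold them.
SYNONYMS = {
    frozenset(("document", "text")): 0.8,
    frozenset(("search", "retrieval")): 0.9,
}


def assoc(a, b):
    return 1.0 if a == b else SYNONYMS.get(frozenset((a, b)), 0.0)


query_prox = pairwise_proximity(
    {"document": 1.0, "search": 1.0},
    {frozenset(("document", "search")): 1},
)
related_prox = pairwise_proximity(
    {"text": 1.0, "retrieval": 1.0, "noise": 1.0},
    {
        frozenset(("text", "retrieval")): 1,
        frozenset(("text", "noise")): 3,
        frozenset(("retrieval", "noise")): 2,
    },
)
unrelated_prox = pairwise_proximity(
    {"weather": 1.0, "report": 1.0},
    {frozenset(("weather", "report")): 1},
)

fitness_related = sentence_fitness(query_prox, related_prox, assoc)
fitness_unrelated = sentence_fitness(query_prox, unrelated_prox, assoc)
```

With these sample values the sentence containing synonyms of both query words receives a higher fitness than the unrelated sentence, even though neither sentence contains the query words themselves, which is the behaviour the claim is aiming at.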

【請求項２】請求項１記載の文書検索方法において、
求められた前記適合度が所定の閾値を越える検索対象文
を出力する過程を備えることを特徴とする文書検索方
法。2. The document search method according to claim 1, further comprising a step of outputting search target sentences whose obtained fitness exceeds a predetermined threshold.

【請求項３】請求項１記載の文書検索方法において、
使用者の指定する出力様式により前記検索対象文の適合
度を出力する過程を備えることを特徴とする文書検索方
法。3. The document search method according to claim 1, further comprising a step of outputting the fitness of the search target sentences in an output format designated by the user.

【請求項４】請求項１記載の文書検索方法において、
前記文書データベースは自然言語文と共にその構文情報
を保持していることを特徴とする文書検索方法。4. The document search method according to claim 1, wherein the document database holds natural language sentences together with their syntax information.

【請求項５】請求項１記載の文書検索方法において、
前記連想単語間近接度を求める過程と前記単語対適合度
を求める過程をニューラルネットワークにより行なうこ
とを特徴とする文書検索方法。5. The document search method according to claim 1, wherein the step of obtaining the associative inter-word proximity and the step of obtaining the word pair fitness are performed by neural networks.

【請求項６】請求項５記載の文書検索方法において、
使用者が入力する教師値に基づき、教師値により近い適
合度を得るようにニューラルネットワークに学習させる
過程を備えることを特徴とする文書検索方法。6. The document search method according to claim 5, further comprising a step of training the neural networks, based on teacher values input by the user, so as to obtain a fitness closer to the teacher values.

【請求項７】自然言語の形で保存され、データベース
内の全文を検索対象文とする文書データベースと、検索
条件入出力装置と、構文解析装置と、適合度計算装置
と、類義語辞書ファイルと、結合重みファイルと、出力
装置を備え、入力された検索条件文字列に適合する対象
文を検索する文書検索システムであって、前記構文解析装置は、入力された検索条件文字列を解析
して構文情報である検索条件文字列内の単語の出現確度
および単語間距離を出力する手段を備え、前記適合度計算装置は、前記検索条件文字列内の単語の出現確度および単語間距
離を基に前記結合重みファイル内の結合重みを用いて検
索条件文字列単語間近接度を求める手段と、文書データベース内の検索対象文の構文情報である検索
対象文内の単語の出現確度および単語間距離を基に前記
結合重みファイル内の結合重みを用いて検索対象文単語
間近接度を求める手段と、前記検索対象文単語間近接度と前記類義語辞書ファイル
内の単語間の連想強度とを基に連想単語間近接度を求め
る手段と、前記検索条件文字列単語間近接度と連想単語間近接度と
を基に前記結合重みファイル内の結合重みを用いて単語
対適合度を求める手段と、求められた単語対ごとの単語対適合度を統合して検索対
象文の適合度を求める手段とを備え、前記出力装置により前記求められた検索対象文の適合度
を出力することを特徴とする文書検索システム。7. A document search system comprising a document database that is stored in natural language form and in which every sentence in the database is a search target sentence, a search condition input/output device, a syntax analysis device, a fitness calculation device, a synonym dictionary file, a connection weight file, and an output device, the system searching for target sentences that match an input search condition character string, wherein the syntax analysis device comprises means for analyzing the input search condition character string and outputting, as syntax information, the appearance probability of words and the inter-word distances in the search condition character string; the fitness calculation device comprises: means for obtaining a search condition character string inter-word proximity based on the appearance probability of words and the inter-word distances in the search condition character string, using connection weights in the connection weight file; means for obtaining a search target sentence inter-word proximity based on the appearance probability of words and the inter-word distances in a search target sentence, which are the syntax information of the search target sentence in the document database, using connection weights in the connection weight file; means for obtaining an associative inter-word proximity based on the search target sentence inter-word proximity and the association strengths between words in the synonym dictionary file; means for obtaining a word pair fitness based on the search condition character string inter-word proximity and the associative inter-word proximity, using connection weights in the connection weight file; and means for obtaining the fitness of the search target sentence by integrating the word pair fitness obtained for each word pair; and the output device outputs the obtained fitness of the search target sentence.

【請求項８】請求項７記載の文書検索システムにおい
て、前記適合度計算装置は求められた前記適合度が所定
の閾値を越える検索対象文を出力する手段を備えること
を特徴とする文書検索システム。8. The document search system according to claim 7, wherein the fitness calculation device comprises means for outputting search target sentences whose obtained fitness exceeds a predetermined threshold.

【請求項９】請求項７記載の文書検索システムにおい
て、前記適合度計算装置は使用者の指定する出力様式に
より前記検索対象文の適合度を出力する手段を備えるこ
とを特徴とする文書検索システム。9. The document search system according to claim 7, wherein the fitness calculation device comprises means for outputting the fitness of the search target sentences in an output format designated by the user.

【請求項１０】請求項７記載の文書検索システムにお
いて、前記文書データベースは自然言語文と共にその構
文情報を保持していることを特徴とする文書検索システ
ム。10. The document search system according to claim 7, wherein the document database holds natural language sentences and syntactic information thereof.

【請求項１１】請求項７記載の文書検索システムにお
いて、使用者が入力した検索条件文字列の構文解析結果
から得られた構文情報を前記検索条件入出力装置により
使用者が修正することができるようにしたことを特徴と
する文書検索システム。11. The document search system according to claim 7, wherein the syntax information obtained from the syntax analysis result of the search condition character string input by the user can be corrected by the user through the search condition input/output device.

【請求項１２】請求項７記載の文書検索システムにお
いて、前記連想単語間近接度を求める手段と前記単語対
適合度を求める手段をニューラルネットワークにより構
成したことを特徴とする文書検索システム。12. The document search system according to claim 7, wherein the means for obtaining the associative inter-word proximity and the means for obtaining the word pair fitness are constituted by neural networks.

【請求項１３】請求項７記載の文書検索システムにお
いて、前記検索条件文字列単語間近接度を求める手段と
前記検索対象文単語間近接度を求める手段と前記連想単
語間近接度を求める手段と前記単語対適合度を求める手
段をニューラルネットワークにより構成したことを特徴
とする文書検索システム。13. The document search system according to claim 7, wherein the means for obtaining the search condition character string inter-word proximity, the means for obtaining the search target sentence inter-word proximity, the means for obtaining the associative inter-word proximity, and the means for obtaining the word pair fitness are constituted by neural networks.

【請求項１４】請求項１２または請求項１３記載の文
書検索システムにおいて、教師値入力装置を備え、該教
師値入力装置から教師値を受けたニューラルネットワー
クは該教師値に基づく学習により前記結合重みを変更し
かつ前段のニューラルネットワークに伝達する誤差値を
求めて前段に伝達し、前段のニューラルネットワークは
伝達された誤差値に基づく学習により前記結合重みを変
更しかつさらに前段のニューラルネットワークに伝達す
る誤差値を求めてさらに前段に伝達することにより格段
のニューラルネットワークが学習をして前記結合重みを
変更することを特徴とする文書検索システム。14. The document search system according to claim 12 or claim 13, further comprising a teacher value input device, wherein a neural network that receives a teacher value from the teacher value input device changes its connection weights by learning based on the teacher value, and also obtains an error value to be passed to the preceding-stage neural network and passes it to that stage; the preceding-stage neural network changes its connection weights by learning based on the error value passed to it, and also obtains an error value to be passed to the neural network one stage further back and passes it on; and in this way the neural networks of every stage learn and change their connection weights.
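The stage-by-stage hand-off of error values described in claim 14 can be sketched with a chain of very small networks. The sketch assumes each stage is a single linear weight trained by gradient descent on a squared error; the class name, the two-stage chain, the learning constants, and the input and teacher values are hypothetical simplifications, not the multilayer and recurrent networks of the embodiment.

```python
# Minimal sketch of staged error propagation, not the patented networks.
# Assumption: each stage is one linear weight, output = weight * input,
# and the loss is 0.5 * (output - teacher) ** 2.


class LinearStage:
    def __init__(self, weight, eta):
        self.weight = weight
        self.eta = eta      # learning constant of this stage
        self.last_input = 0.0

    def forward(self, x):
        self.last_input = x
        return self.weight * x

    def learn(self, error):
        """Update this stage and return the error value for the preceding stage."""
        # The error passed back uses the weight as it was during the forward pass.
        error_for_previous = error * self.weight
        self.weight -= self.eta * error * self.last_input
        return error_for_previous


def train_step(stages, x, teacher):
    out = x
    for stage in stages:
        out = stage.forward(out)
    error = out - teacher            # error value at the final stage
    for stage in reversed(stages):   # final stage first, then each preceding stage
        error = stage.learn(error)
    return out


stages = [LinearStage(0.5, 0.1), LinearStage(0.5, 0.1)]
first_output = train_step(stages, x=1.0, teacher=1.0)
weights_after_first_step = [stage.weight for stage in stages]

for _ in range(200):
    last_output = train_step(stages, x=1.0, teacher=1.0)
```

Only the final stage sees the teacher value directly; the earlier stage learns solely from the error value handed back to it, yet after repeated steps the output of the whole chain approaches the teacher value.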

【請求項１５】請求項１４記載の文書検索システムに
おいて、前記教師値入力装置からは各ニューラルネット
ワークの学習定数が入力され、各ニューラルネットワー
クはそれぞれ異なる学習定数をとりうることを特徴とす
る文書検索システム。15. The document search system according to claim 14, wherein a learning constant for each neural network is input from the teacher value input device, and the neural networks can each take a different learning constant.