JP3553795B2

JP3553795B2 - Synonym calculation apparatus and method, and medium recording synonym calculation program

Info

Publication number: JP3553795B2
Application number: JP11857998A
Authority: JP
Inventors: 雅且大久保; 孝史井上; 正之杉崎; 一男田中
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 1998-04-28
Filing date: 1998-04-28
Publication date: 2004-08-11
Anticipated expiration: 2018-04-28
Also published as: JPH11312168A

Description

【０００１】
【発明の属する技術分野】
本発明は、単語同士の関連度が定義された複数の辞書を用いて、同義語とみなせる単語をグループ化する同義語計算装置及びその方法並びに同義語計算プログラムを記録した媒体に関するものである。
【０００２】
【従来の技術】
従来、表記は異なるが同じ意味を持つ単語をまとめた辞書として、同義語辞書があった。
【０００３】
同義語辞書は、例えば情報検索において１つの単語を検索語として入力した際に、同義語辞書を用いてその検索語を補完して検索することによって、利用者の意図する情報を簡単に検索できるようにするために使用される。あるいは、情報検索サービスにおいて、さまざまな情報に対する利用者からの要求の強さは検索語の使用回数によって測定することができるが、その際、同義語に対する使用回数をまとめて集計することによって、より正確な値を求めることができる。
【０００４】
しかし、単語同士の関係は、時と共に変化する。例えば「オリンピック」という検索語は、オリンピックの歴史や競技種目を検索する場合、オリンピック会場へのアクセス方法や入場チケットについて検索する場合、オリンピックの競技結果の速報を知りたい場合、次に開催されるオリンピックについて知りたい場合等、時期に応じて様々な用途で使用される。
【０００５】
従来の同義語辞書は、このような時と共に変化する関係は考慮されておらず、このため、同義語辞書を用いて検索しても、常に同じ検索結果しか得られないという問題があった。また、情報要求の強さについても、従来の同義語辞書では正確な測定ができないという問題があった。
【０００６】
【発明が解決しようとする課題】
ところで、本出願人が先に提案した検索ログを解析する情報関連づけ装置（特願平９−１４８５１９号）を用いることにより、時と共に変化する関係に自動的に追随した関連度辞書を構築できる。しかし、このようにして作成された関連度辞書では、単語同士の関連度は求められるが、どの単語とどの単語が、その時に同義語として利用されているかを正確に判定することはできない。関連度が所定の閾値以上のものを同義語とみなすことも考えられるが、この方法では誤差が大きくなることが懸念される。
【０００７】
例えば、３つの単語Ｗ１，Ｗ２，Ｗ３があって、Ｗ１とＷ２、Ｗ２とＷ３の間の関連度がそれぞれ閾値より大きかったとしても、Ｗ１とＷ３の関連度が閾値より大きいとは限らない。即ち、単純に閾値だけによる判定では、このような連鎖による誤判定を招いてしまう。また、前記の情報関連付け装置では、異なる２種類の観点からの関連度を求めることができるが、この両者を効果的に組み合わせて同義語辞書を構築する手段については述べていなかった。
【０００８】
本発明の目的は、上記のような問題点に鑑みてなされたものであり、少なくとも２種類の辞書を効果的に組み合わせて同義語とみなされる単語を自動的にグループ化し得る、同義語計算装置及びその方法並びに同義語計算プログラムを記録した媒体を提供することにある。
【０００９】
【課題を解決するための手段】
上記目的を達成するため、本発明では、少なくとも２種類の関連度辞書を用い、一の関連度辞書に基づいて単語グループを初期化するとともに、各関連度辞書に基づいて単語グループを併合処理することによって同義語グループを作成するため、少なくとも２種類の関連度を反映した同義語辞書を作成することができる。また、関連度辞書として所定期間の検索ログを解析して作成した辞書を用いることにより、一般的な同義語ではなく、その時期に同義語的に用いられた関連語を集約できるので、現在の情報ニーズを反映した同義語辞書を作成することができる。
【００１０】
【発明の実施の形態】
以下、本発明を図面に基づいて詳述する。
【００１１】
図１は本発明の実施の形態の一例を示すもので、図中、１は間隔関連度辞書、２は時系列関連度辞書、３は単語グループ初期化部、４は単語グループ化部である。
【００１２】
間隔関連度辞書１は、本出願人が先に提案した、検索ログを解析する情報関連づけ装置（特願平９−１４８５１９号）を用いて、同一利用者による検索の時間間隔に基づいて単語同士の関連度を定義・作成したもので、図２にその一例を示す。
【００１３】
図２において、単語Ｗ１と単語Ｗ２，Ｗ３，……との関連度（間隔関連度と呼ぶ。）は、それぞれ、Ｉｒ（１，２），Ｉｒ（１，３），……であることを示している。また、Ｉｒ（１，２）＝Ｉｒ（２，１），Ｉｒ（１，３）＝Ｉｒ（３，１），……である。
【００１４】
時系列関連度辞書２は、本出願人が先に提案した、検索ログを解析する情報関連づけ装置（特願平９−１４８５１９号）を用いて、各検索語の使用頻度の時系列の相関係数に基づいて単語同士の関連度を定義・作成したもので、図３にその一例を示す。
【００１５】
図３において、単語Ｗ１と単語Ｗ２，Ｗ３，……との関連度（時系列関連度と呼ぶ。）は、それぞれ、Ｃｒ（１，２），Ｃｒ（１，３），……であることを示している。また、Ｃｒ（１，２）＝Ｃｒ（２，１），Ｃｒ（１，３）＝Ｃｒ（３，１），……である。
【００１６】
単語グループ初期化部３は、各単語が属するグループの初期値を設定するもので、図４に初期値設定フローチャートの一例を示す。図４において、Ｇ［Ｗｉ］は、単語Ｗｉが属するグループの名前を表しており、初期値としてＧ［Ｗｉ］＝ｉとしている。即ち、各単語はそれぞれ、その単語のみからなるグループに属するように設定される。
【００１７】
単語グループ化部４は、各単語間の間隔関連度と時系列関連度に基づいて単語をグループ化するもので、図５に単語グループ化処理の基本フローチャートの一例を示す。図５では、２つの単語ＷｊとＷｋの基準となる関連度をＲ（ｊ，ｋ）、Ｒ（ｊ，ｋ）の閾値をＴＨとしている。
【００１８】
処理の流れは、
Ｒ（ｊ，ｋ）の最も大きな組（ｊ，ｋ）を取り出し（ステップＳ１）、Ｒ（ｊ，ｋ）＞ＴＨでなければ終了する（ステップＳ２）。Ｒ（ｊ，ｋ）＞ＴＨであれば、Ｇ［Ｗｊ］に属する単語（要素）Ｗｐ、Ｇ［Ｗｋ］に属する単語（要素）Ｗｑをそれぞれ取り出し（ステップＳ３）、ＷｐとＷｑとがグループ化条件を満たすかどうかを検査する（ステップＳ４）。
【００１９】
条件を満たす場合、Ｇ［Ｗｊ］内の全ての要素Ｗｐと、Ｇ［Ｗｋ］内の全ての要素Ｗｑとについて検査し（ステップＳ５）、これらがグループ化条件を満たす場合、Ｇ［Ｗｋ］をＧ［Ｗｊ］に併合して１つのグループとする、即ちＧ［Ｗｊ］＝Ｇ［Ｗｊ］＋Ｇ［Ｗｋ］とする（ステップＳ６）。
【００２０】
全ての（ｊ，ｋ）についてステップＳ２からＳ６までの処理を行っていれば終了し（ステップＳ７）、そうでなければ、次に大きなＲ（ｊ，ｋ）値を持つ（ｊ，ｋ）の組を取り出し（ステップＳ８）、これがＧ［Ｗｊ］＝Ｇ［Ｗｋ］でなければ（ステップＳ９）、ステップＳ２からＳ６まで繰り返す。
【００２１】
単語グループ化部４では、上記基本フローチャートに基づいて単語のグループ化処理を行う。この際に、Ｒ（ｊ，ｋ）としてＩｒ（ｊ，ｋ）、閾値としてＴＨＩ１を用い、また、ステップＳ４におけるグループ化条件として、
Ｉｒ（ｐ，ｑ）＞ＴＨＩ２またはＣｒ（ｐ，ｑ）＞ＴＨＣＲ１……（条件１）
を用いる。
【００２２】
この結果、各単語はグループに分けることができ、各グループに含まれる単語間には（条件１）が成り立つ。即ち、各グループ内の単語は間隔関連度がＴＨＩ２より大きいか、または時系列関連度がＴＨＣＲ１より大きくなっており、それらの単語同士を同義語として出力する。
【００２３】
図６は単語グループ初期化部３の別の例を示すもので、図１と同一構成部分は同一符号をもって表す。即ち、１は間隔関連度辞書、５は単独グループ生成部、６はコアグループ生成部である。
【００２４】
単独グループ生成部５は、図４のフローチャートに基づいて各単語のみからのグループを生成する。また、コアグループ生成部６は間隔関連度に基づいてコアとなる単語グループを生成する。この処理は図５の基本フローチャートにおいて、Ｒ（ｊ，ｋ）としてＩｒ（ｊ，ｋ）、閾値としてＴＨＩ３を用い、また、ステップＳ４におけるグループ化条件として、
Ｉｒ（ｐ，ｑ）＞ＴＨＩ３ ……（条件２）
を用いて行う。この結果、各グループ内の単語は（条件２）が成り立ち、このコアグループを初期値として前記のグループ化処理を行う。
【００２５】
図７は単語グループ初期化部３のさらに別の例を示すもので、図６と同一構成部分は同一符号をもって表す。即ち、１は間隔関連度辞書、５は単独グループ生成部、７は余弦計算部、８はコアグループ生成部である。
【００２６】
余弦計算部７は、単語ＷｊとＷｋとの間の余弦値Ｃｏｓ（ｊ，ｋ）を以下のようにして計算する。即ち、間隔関連度辞書１に基づいて、Ｗｊ＝｛Ｉｒ（ｊ，１），Ｉｒ（ｊ，２），……Ｉｒ（ｊ，ｎ）｝、Ｗｋ＝｛Ｉｒ（ｋ，１），Ｉｒ（ｋ，２），……Ｉｒ（ｋ，ｎ）｝とする時（但し、ｊ＝１，２，……ｎ、ｋ＝１，２，……ｎ、ｊ≠ｋ）、

として計算する。
【００２７】
上記式で求められたＣｏｓ（ｊ，ｋ）は、各Ｗｊ，Ｗｋをｎ次元空間で表した時のコサイン値（余弦値）に等しい。ここで、各間隔関連度は全て０以上の値であるので、Ｃｏｓ（ｊ，ｋ）の値は、０から１までの間の値となる。即ち、Ｃｏｓ（ｊ，ｋ）の値が大きいほどＷｊとＷｋの角度は小さくなる。
【００２８】
また、コアグループ生成部８では、各単語間の余弦値に基づいてコアとなる単語グループを生成する。この処理は図５の基本フローチャートにおいて、Ｒ（ｊ，ｋ）としてＣｏｓ（ｊ，ｋ）、閾値としてＴＨＣＯＳ１を用い、また、ステップＳ４におけるグループ化条件として、
Ｃｏｓ（ｐ，ｑ）＞ＴＨＣＯＳ１ ……（条件３）
を用いて行う。この結果、各グループ内の単語は（条件２）が成り立ち、このコアグループを初期値として単語のグループ化処理を行う。
【００２９】
単語グループ初期化部を図７の構成とした時、単語グループ化部４の処理は、図５の基本フローチャートにおいて、Ｒ（ｊ，ｋ）としてＩｒ（ｊ，ｋ）、閾値としてＴＨＩ４を用い、また、ステップＳ４におけるグループ化条件として、
Ｉｒ（ｐ，ｑ）＞ＴＨＩ４またはＣｒ（ｐ，ｑ）＞ＴＨＣＲ２
またはＣｏｓ（ｐ，ｑ）＞ＴＨＣＯＳ２ ……（条件４）
を用いて行う。
【００３０】
この結果、各単語はグループに分けることができ、各グループに含まれる単語間には（条件４）が成り立つ。即ち、各グループ内の単語は間隔関連度がＴＨＩ４より大きいか、または時系列関連度がＴＨＣＲ２より大きいか、あるいは余弦値がＴＨＣＯＳ２より大きくなっており、それらの単語同士を同義語として出力する。
【００３１】
このように、性格の異なる２種類の関連度をもとにして、各単語が同義語としてみなすことができるかどうかを判定し、同義語同士を同じグループに入れることができる。また、関連度辞書として、本出願人が先に提案した検索ログを解析する情報関連づけ装置（特願平９−１４８５１９号）によって生成された辞書を用いることにより、「その時点で同義語的に使用された単語」をグループ化することができるため、時代の流れに即した同義語辞書を自動的に生成できる。
【００３２】
なお、関連度辞書の構成としては、各単語間の関連度が記述してあればその構成は任意であることはいうまでもない。
【００３３】
【発明の効果】
以上説明したように、本発明によれば、単語同士の関連度を定義した少なくとも２種類の辞書を用いることによって同義語を検出しグループ化するため、同義語辞書を自動的に作成することができる。
【図面の簡単な説明】
【図１】本発明の実施の形態の一例を示すブロック図
【図２】間隔関連度辞書の一例を示す説明図
【図３】時系列関連度辞書の一例を示す説明図
【図４】単語グループ初期化部における処理のフローチャート
【図５】関連度に基づいて単語をグループ化する処理のフローチャート
【図６】単語グループ初期化部の別の例を示すブロック図
【図７】単語グループ初期化部のさらに別の例を示すブロック図
【符号の説明】
１：間隔関連度辞書、２：時系列関連度辞書、３：単語グループ初期化部、４：単語グループ化部、５：単独グループ生成部、６，８：コアグループ生成部、７：余弦計算部。
[0001]
[Technical field of the invention]
The present invention relates to a synonym calculation apparatus and method for grouping words that can be regarded as synonyms using a plurality of dictionaries in which the degree of association between words is defined, and a medium storing a synonym calculation program.
[0002]
[Prior art]
Conventionally, synonym dictionaries have existed as dictionaries that bring together words that are written differently but have the same meaning.
[0003]
A synonym dictionary is used, for example, in information retrieval: when a single word is entered as a search term, the search term is supplemented using the synonym dictionary so that the user can easily find the intended information. Alternatively, in an information search service, the strength of user demand for various kinds of information can be measured by the number of times each search term is used; in that case, a more accurate value can be obtained by totaling the usage counts of synonyms together.
[0004]
However, the relationships between words change over time. For example, the search term "Olympics" is used for various purposes depending on the time: to look up the history or events of the Olympics, to look up access routes to Olympic venues or admission tickets, to get breaking results of Olympic events, or to learn about the next Olympics to be held.
[0005]
Conventional synonym dictionaries do not take such time-varying relationships into account, so a search using such a dictionary always returns the same results. Conventional synonym dictionaries also cannot accurately measure the strength of information demand.
[0006]
[Problems to be solved by the invention]
By using the information associating apparatus for analyzing search logs previously proposed by the present applicant (Japanese Patent Application No. 9-148519), a relevance dictionary that automatically follows relationships that change over time can be constructed. However, although a relevance dictionary created in this way gives the relevance between words, it cannot accurately determine which words are being used as synonyms of which other words at that time. Words whose relevance is equal to or greater than a predetermined threshold could be regarded as synonyms, but this approach is likely to produce large errors.
[0007]
For example, suppose there are three words W1, W2, and W3. Even if the relevance between W1 and W2 and the relevance between W2 and W3 are each greater than the threshold, the relevance between W1 and W3 is not necessarily greater than the threshold. That is, a determination based simply on the threshold alone leads to erroneous determinations caused by such chaining. Furthermore, the above information associating apparatus can obtain relevance from two different viewpoints, but it did not describe any means for effectively combining the two to construct a synonym dictionary.
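The chaining problem described in this paragraph can be illustrated with a small sketch. The relevance values, the threshold of 0.5, and the function name `naive_groups` are hypothetical and chosen only for illustration; the code shows the naive threshold-only approach that the text warns against, not the method of the invention.

```python
# Hypothetical relevance values for three words (not taken from the patent).
relevance = {("W1", "W2"): 0.9, ("W2", "W3"): 0.8, ("W1", "W3"): 0.1}
TH = 0.5  # hypothetical threshold


def naive_groups(words, relevance, th):
    """Link every pair whose relevance is at least th and return the linked sets."""
    parent = {w: w for w in words}

    def find(w):
        # Follow parent links up to the representative of w's set.
        while parent[w] != w:
            w = parent[w]
        return w

    for (a, b), r in relevance.items():
        if r >= th:
            parent[find(b)] = find(a)

    linked = {}
    for w in words:
        linked.setdefault(find(w), []).append(w)
    return sorted(sorted(g) for g in linked.values())


groups = naive_groups(["W1", "W2", "W3"], relevance, TH)
print(groups)
```

With these values all three words end up in one set, even though the relevance between W1 and W3 (0.1) is far below the threshold.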
[0008]
The present invention has been made in view of the above problems, and its object is to provide a synonym calculation apparatus, a method therefor, and a medium storing a synonym calculation program that can effectively combine at least two types of dictionaries and automatically group words regarded as synonyms.
[0009]
[Means for Solving the Problems]
To achieve the above object, the present invention uses at least two types of relevance dictionaries, initializes word groups based on one relevance dictionary, and creates synonym groups by merging word groups based on each relevance dictionary, so that a synonym dictionary reflecting at least two types of relevance can be created. In addition, by using as a relevance dictionary a dictionary created by analyzing search logs for a predetermined period, related words that were used synonymously during that period, rather than general synonyms, can be aggregated, so that a synonym dictionary reflecting current information needs can be created.
[0010]
[Embodiments of the invention]
Hereinafter, the present invention will be described in detail with reference to the drawings.
[0011]
FIG. 1 shows an example of an embodiment of the present invention. In the figure, 1 is an interval relevance dictionary, 2 is a time-series relevance dictionary, 3 is a word group initialization unit, and 4 is a word grouping unit.
[0012]
The interval relevance dictionary 1 defines the relevance between words based on the time intervals between searches by the same user, and is created using the information associating apparatus for analyzing search logs previously proposed by the present applicant (Japanese Patent Application No. 9-148519). FIG. 2 shows an example.
[0013]
FIG. 2 shows that the relevance (called the interval relevance) between the word W1 and the words W2, W3, ... is Ir(1,2), Ir(1,3), ..., respectively. Also, Ir(1,2) = Ir(2,1), Ir(1,3) = Ir(3,1), and so on.
[0014]
The time-series relevance dictionary 2 defines the relevance between words based on the correlation coefficient between the time series of the usage frequencies of the search terms, and is created using the information associating apparatus for analyzing search logs previously proposed by the present applicant (Japanese Patent Application No. 9-148519). FIG. 3 shows an example.
[0015]
FIG. 3 shows that the relevance (called the time-series relevance) between the word W1 and the words W2, W3, ... is Cr(1,2), Cr(1,3), ..., respectively. Also, Cr(1,2) = Cr(2,1), Cr(1,3) = Cr(3,1), and so on.
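Because both Ir and Cr are symmetric in their two arguments, one possible way to hold such a dictionary in memory is a table keyed by unordered word pairs. The following is a minimal sketch under that assumption; the class name `RelevanceDictionary`, its methods, the sample values, and the default of 0.0 for unlisted pairs are hypothetical and do not come from the patent, which leaves the dictionary layout open.

```python
class RelevanceDictionary:
    """Symmetric word-pair table: value(Wj, Wk) == value(Wk, Wj)."""

    def __init__(self):
        self._values = {}

    def set(self, wj, wk, value):
        # An unordered key makes the stored value independent of argument order.
        self._values[frozenset((wj, wk))] = value

    def get(self, wj, wk, default=0.0):
        # Returning `default` for unlisted pairs is an assumption of this sketch.
        return self._values.get(frozenset((wj, wk)), default)


ir = RelevanceDictionary()  # interval relevance Ir
cr = RelevanceDictionary()  # time-series relevance Cr
ir.set("W1", "W2", 0.7)
cr.set("W1", "W3", 0.4)
print(ir.get("W2", "W1"), cr.get("W3", "W1"))
```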
[0016]
The word group initialization unit 3 sets the initial value of the group to which each word belongs. FIG. 4 shows an example of a flowchart for setting the initial values. In FIG. 4, G[Wi] represents the name of the group to which the word Wi belongs, and the initial value is G[Wi] = i. That is, each word is set to belong to a group consisting of only that word.
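The initialization G[Wi] = i can be written in a few lines. This is only a sketch: the function name `init_single_groups` and the use of a Python dict for G are assumptions made for illustration.

```python
def init_single_groups(words):
    """Return G as a dict: the i-th word (1-based) starts in group i."""
    return {w: i for i, w in enumerate(words, start=1)}


G = init_single_groups(["W1", "W2", "W3"])
print(G)
```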
[0017]
The word grouping unit 4 groups words based on the interval relevance and the time-series relevance between words. FIG. 5 shows an example of a basic flowchart of the word grouping process. In FIG. 5, the relevance used as the reference for two words Wj and Wk is denoted R(j, k), and the threshold for R(j, k) is denoted TH.
[0018]
The processing proceeds as follows. The pair (j, k) with the largest R(j, k) is taken (step S1), and the process ends unless R(j, k) > TH (step S2). If R(j, k) > TH, a word (element) Wp belonging to G[Wj] and a word (element) Wq belonging to G[Wk] are taken (step S3), and it is checked whether Wp and Wq satisfy the grouping condition (step S4).
[0019]
If the condition is satisfied, every element Wp in G[Wj] is checked against every element Wq in G[Wk] (step S5). If all of them satisfy the grouping condition, G[Wk] is merged into G[Wj] to form a single group, that is, G[Wj] = G[Wj] + G[Wk] (step S6).
[0020]
If steps S2 to S6 have been performed for all (j, k), the process ends (step S7). Otherwise, the pair (j, k) with the next largest R(j, k) is taken (step S8), and unless G[Wj] = G[Wk] (step S9), steps S2 to S6 are repeated.
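Steps S1 to S9 can be sketched in Python as follows. This is one reading of the flowchart, not a definitive implementation: the function name `group_words`, the tuple-keyed dict `R`, the predicate argument `condition`, and all numeric values are assumptions made for illustration. Sorting the pairs once in decreasing order of R stands in for the repeated "take the next largest pair" of steps S1 and S8.

```python
def group_words(words, R, TH, condition):
    """Merge single-word groups following steps S1 to S9.

    R maps (Wj, Wk) tuples to the reference relevance R(j, k).
    condition(Wp, Wq) returns True when the pair satisfies the grouping
    condition of step S4.
    """
    group_of = {w: {w} for w in words}  # plays the role of G[W]
    # S1 / S8: visit pairs in decreasing order of R(j, k).
    ordered = sorted(R.items(), key=lambda item: item[1], reverse=True)
    for (wj, wk), r in ordered:
        if not r > TH:  # S2: stop once the threshold is no longer exceeded
            break
        gj, gk = group_of[wj], group_of[wk]
        if gj is gk:  # S9: both words are already in one group
            continue
        # S3 to S5: every cross pair must satisfy the grouping condition.
        if all(condition(wp, wq) for wp in gj for wq in gk):
            gj |= gk  # S6: G[Wj] = G[Wj] + G[Wk]
            for w in gk:
                group_of[w] = gj
    unique = {id(g): g for g in group_of.values()}
    return sorted(sorted(g) for g in unique.values())


# Hypothetical values for illustration only.
R = {("W1", "W2"): 0.9, ("W2", "W3"): 0.8, ("W1", "W3"): 0.1}


def above_half(wp, wq):
    value = R.get((wp, wq), R.get((wq, wp), 0.0))
    return value > 0.5


result = group_words(["W1", "W2", "W3"], R, 0.5, above_half)
print(result)
```

In this example W1 and W2 are merged, but W3 is not added to their group because the pair (W1, W3) fails the grouping condition, so the all-pairs check of step S5 blocks the merge.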
[0021]
The word grouping unit 4 groups words according to the above basic flowchart. Here, Ir(j, k) is used as R(j, k), THI1 is used as the threshold, and the grouping condition in step S4 is
Ir(p, q) > THI2 or Cr(p, q) > THCR1 ... (condition 1)
[0022]
As a result, the words are divided into groups, and (condition 1) holds between the words in each group. That is, any two words in a group have an interval relevance greater than THI2 or a time-series relevance greater than THCR1, and these words are output as synonyms of one another.
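(Condition 1) is a logical OR over the two dictionaries and can be expressed as a small predicate. In this sketch the factory name `make_condition1`, the frozenset-keyed dicts, the threshold values, and the treatment of unlisted pairs as 0.0 are all assumptions made for illustration.

```python
def make_condition1(ir, cr, thi2, thcr1):
    """Build the step S4 predicate: Ir(p, q) > THI2 or Cr(p, q) > THCR1."""

    def condition1(wp, wq):
        key = frozenset((wp, wq))
        # Pairs missing from a dictionary are treated as relevance 0.0 (assumption).
        return ir.get(key, 0.0) > thi2 or cr.get(key, 0.0) > thcr1

    return condition1


# Hypothetical dictionaries keyed by unordered word pairs.
ir = {frozenset(("W1", "W2")): 0.8, frozenset(("W1", "W3")): 0.2}
cr = {frozenset(("W1", "W3")): 0.9, frozenset(("W2", "W3")): 0.1}
condition1 = make_condition1(ir, cr, thi2=0.5, thcr1=0.7)
print(condition1("W1", "W2"), condition1("W3", "W1"), condition1("W2", "W3"))
```

Here W1 and W2 qualify through the interval relevance, W1 and W3 qualify through the time-series relevance alone, and W2 and W3 qualify through neither.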
[0023]
FIG. 6 shows another example of the word group initialization unit 3, and the same components as those in FIG. 1 are denoted by the same reference numerals. That is, 1 is the interval relevance dictionary, 5 is a single group generation unit, and 6 is a core group generation unit.
[0024]
The single group generation unit 5 generates, for each word, a group consisting of only that word, based on the flowchart of FIG. 4. The core group generation unit 6 generates core word groups based on the interval relevance. This process follows the basic flowchart of FIG. 5, using Ir(j, k) as R(j, k), THI3 as the threshold, and, as the grouping condition in step S4,
Ir(p, q) > THI3 ... (condition 2)
As a result, (condition 2) holds for the words in each group, and the grouping process described above is performed with these core groups as the initial values.
[0025]
FIG. 7 shows still another example of the word group initialization unit 3, and the same components as those in FIG. 6 are denoted by the same reference numerals. That is, 1 is an interval relevance dictionary, 5 is a single group generation unit, 7 is a cosine calculation unit, and 8 is a core group generation unit.
[0026]
The cosine calculation unit 7 calculates the cosine value Cos(j, k) between the words Wj and Wk as follows. Based on the interval relevance dictionary 1, let Wj = {Ir(j,1), Ir(j,2), ..., Ir(j,n)} and Wk = {Ir(k,1), Ir(k,2), ..., Ir(k,n)} (where j = 1, 2, ..., n; k = 1, 2, ..., n; j ≠ k). Then Cos(j, k) is calculated as
\[ \mathrm{Cos}(j,k) = \frac{\sum_{i=1}^{n} Ir(j,i)\, Ir(k,i)}{\sqrt{\sum_{i=1}^{n} Ir(j,i)^{2}}\,\sqrt{\sum_{i=1}^{n} Ir(k,i)^{2}}} \]
[0027]
Cos(j, k) obtained by the above equation is equal to the cosine of the angle between Wj and Wk when each is represented as a vector in n-dimensional space. Since every interval relevance is a value of 0 or more, Cos(j, k) takes a value between 0 and 1. That is, the larger the value of Cos(j, k), the smaller the angle between Wj and Wk.
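The cosine calculation can be sketched as follows. The function name `cosine`, the frozenset-keyed dict, and the sample values are hypothetical. Two further assumptions are made because the text does not settle them: entries missing from the dictionary, including a word's relevance to itself, are taken as 0.0, and the cosine is defined as 0.0 when either vector is all zeros.

```python
import math


def cosine(ir, words, wj, wk):
    """Cosine between the interval-relevance vectors of wj and wk."""

    def vector(w):
        # Component i is Ir(w, words[i]); missing entries, including the
        # word's relevance to itself, are taken as 0.0 (assumption).
        return [ir.get(frozenset((w, other)), 0.0) for other in words]

    vj, vk = vector(wj), vector(wk)
    dot = sum(a * b for a, b in zip(vj, vk))
    norm = math.sqrt(sum(a * a for a in vj)) * math.sqrt(sum(b * b for b in vk))
    # An all-zero vector has no direction; returning 0.0 is an assumption.
    return dot / norm if norm > 0 else 0.0


words = ["W1", "W2", "W3"]
# Hypothetical values: W1 and W2 are each related only to W3.
ir = {frozenset(("W1", "W3")): 1.0, frozenset(("W2", "W3")): 1.0}
print(cosine(ir, words, "W1", "W2"), cosine(ir, words, "W1", "W3"))
```

In this example W1 and W2 have no direct interval relevance, yet their cosine is 1 because they relate to the other words in the same way. This is the kind of indirect similarity the cosine value adds on top of Ir itself.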
[0028]
The core group generation unit 8 generates core word groups based on the cosine values between words. This process follows the basic flowchart of FIG. 5, using Cos(j, k) as R(j, k), THCOS1 as the threshold, and, as the grouping condition in step S4,
Cos(p, q) > THCOS1 ... (condition 3)
As a result, (condition 2) holds for the words in each group, and the word grouping process is performed with these core groups as the initial values.
[0029]
When the word group initialization unit is configured as shown in FIG. 7, the process of the word grouping unit 4 follows the basic flowchart of FIG. 5, using Ir(j, k) as R(j, k), THI4 as the threshold, and, as the grouping condition in step S4,
Ir(p, q) > THI4 or Cr(p, q) > THCR2
or Cos(p, q) > THCOS2 ... (condition 4)
[0030]
As a result, the words are divided into groups, and (condition 4) holds between the words in each group. That is, any two words in a group have an interval relevance greater than THI4, a time-series relevance greater than THCR2, or a cosine value greater than THCOS2, and these words are output as synonyms of one another.
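The two-stage flow of the FIG. 7 configuration (core groups from the cosine values, then merging under condition 4) might be sketched as below. This is an illustrative reading only: the helper names `lookup` and `merge_pass`, the four words, every relevance value, and every threshold are hypothetical, and the cosine values are supplied as a ready-made table rather than computed.

```python
from itertools import combinations


def lookup(table):
    """Symmetric lookup; unlisted pairs count as 0.0 (assumption)."""
    return lambda a, b: table.get(frozenset((a, b)), 0.0)


def merge_pass(groups, score, th, ok):
    """One pass of the basic flowchart, starting from an existing partition."""
    groups = [set(g) for g in groups]
    words = sorted(w for g in groups for w in g)
    pairs = sorted(combinations(words, 2), key=lambda p: score(*p), reverse=True)
    for wj, wk in pairs:
        if not score(wj, wk) > th:  # stop once the reference relevance is too low
            break
        gj = next(g for g in groups if wj in g)
        gk = next(g for g in groups if wk in g)
        # Merge only if every cross pair satisfies the grouping condition.
        if gj is not gk and all(ok(p, q) for p in gj for q in gk):
            gj |= gk
            groups.remove(gk)
    return groups


pair = lambda a, b: frozenset((a, b))
ir = lookup({pair("A", "B"): 0.9, pair("A", "C"): 0.6, pair("B", "C"): 0.1, pair("C", "D"): 0.2})
cr = lookup({pair("B", "C"): 0.8})
cos = lookup({pair("A", "B"): 0.95, pair("C", "D"): 0.3})
THCOS1, THI4, THCR2, THCOS2 = 0.9, 0.5, 0.7, 0.9  # hypothetical thresholds


def condition3(p, q):
    return cos(p, q) > THCOS1


def condition4(p, q):
    return ir(p, q) > THI4 or cr(p, q) > THCR2 or cos(p, q) > THCOS2


core = merge_pass([{w} for w in "ABCD"], cos, THCOS1, condition3)
final = merge_pass(core, ir, THI4, condition4)
print(sorted(sorted(g) for g in core), sorted(sorted(g) for g in final))
```

With these values the core stage joins A and B, and the second stage adds C: the pair (A, C) passes through the interval relevance and the pair (B, C) passes through the time-series relevance, which shows how the OR in condition 4 lets different dictionaries vouch for different pairs.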
[0031]
In this way, whether words can be regarded as synonyms is determined on the basis of two types of relevance that differ in character, and synonyms can be placed in the same group. In addition, by using as the relevance dictionaries the dictionaries generated by the information associating apparatus for analyzing search logs previously proposed by the present applicant (Japanese Patent Application No. 9-148519), "words that were used synonymously at that point in time" can be grouped, so that a synonym dictionary that keeps pace with the times can be generated automatically.
[0032]
It goes without saying that the configuration of the relevance dictionary is arbitrary as long as the relevance between words is described.
[0033]
[Effects of the invention]
As described above, according to the present invention, synonyms are detected and grouped by using at least two types of dictionaries that define the relevance between words, so that a synonym dictionary can be created automatically.
[Brief description of the drawings]
FIG. 1 is a block diagram showing an example of an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing an example of an interval relevance dictionary.
FIG. 3 is an explanatory diagram showing an example of a time-series relevance dictionary.
FIG. 4 is a flowchart of the processing in the word group initialization unit.
FIG. 5 is a flowchart of the process for grouping words based on relevance.
FIG. 6 is a block diagram showing another example of the word group initialization unit.
FIG. 7 is a block diagram showing still another example of the word group initialization unit.
[Explanation of symbols]
1: interval relevance dictionary, 2: time-series relevance dictionary, 3: word group initialization unit, 4: word grouping unit, 5: single group generation unit, 6, 8: core group generation unit, 7: cosine calculation unit.

Claims

単語同士の関連度をそれぞれ異なる観点から定義した少なくとも２種類の関連度辞書と、
一の関連度辞書に含まれる全ての単語についてそれらが属するグループを初期設定する単語グループ初期化部と、
一のグループに属する単語と他のグループに属する単語とが同一グループに属するとみなせるか否かを、前記少なくとも２種類の関連度辞書に定義された単語同士の関連度に基づいて判定し、一のグループに属する全ての単語と他のグループに属する全ての単語との全ての組み合わせについて同一グループに属するとみなせる場合、一のグループと他のグループとを併合する単語グループ化部とを備えた
ことを特徴とする同義語計算装置。
1. A synonym calculation apparatus comprising:
at least two types of relevance dictionaries, each defining the relevance between words from a different viewpoint;
a word group initialization unit that initially sets, for all the words included in one of the relevance dictionaries, the groups to which they belong; and
a word grouping unit that determines, based on the relevance between words defined in the at least two types of relevance dictionaries, whether a word belonging to one group and a word belonging to another group can be regarded as belonging to the same group, and that merges the one group with the other group when every combination of a word belonging to the one group and a word belonging to the other group can be regarded as belonging to the same group.

一の関連度辞書に含まれる全ての単語についてその単語のみを含むグループを初期設定する単語グループ初期化部を備えたことを特徴とする請求項１記載の同義語計算装置。
2. The synonym calculation apparatus according to claim 1, comprising a word group initialization unit that initially sets, for each of the words included in one relevance dictionary, a group containing only that word.

一の関連度辞書に定義された単語同士の関連度が所定の閾値より大きい単語のみを含むグループを初期設定する単語グループ初期化部を備えたことを特徴とする請求項１記載の同義語計算装置。
3. The synonym calculation apparatus according to claim 1, comprising a word group initialization unit that initially sets groups each containing only words whose mutual relevance, as defined in one relevance dictionary, is greater than a predetermined threshold.

一の関連度辞書に定義された単語同士の関連度を用いて作成したベクトル間の余弦値が所定の閾値より大きい単語のみを含むグループを初期設定する単語グループ初期化部を備えたことを特徴とする請求項１記載の同義語計算装置。
4. The synonym calculation apparatus according to claim 1, comprising a word group initialization unit that initially sets groups each containing only words for which the cosine value between vectors created from the relevance between words defined in one relevance dictionary is greater than a predetermined threshold.

いずれかの関連度辞書に定義された単語同士の関連度が予め辞書毎に設定された特定の閾値より大きい場合、一のグループに属する単語と他のグループに属する単語とが同一グループに属すると見なして処理する単語グループ化部を備えたことを特徴とする請求項１乃至４いずれか記載の同義語計算装置。
5. The synonym calculation apparatus according to any one of claims 1 to 4, comprising a word grouping unit that treats a word belonging to one group and a word belonging to another group as belonging to the same group when the relevance between the words defined in any of the relevance dictionaries is greater than a specific threshold set in advance for each dictionary.

いずれかの関連度辞書に定義された単語同士の関連度が予め辞書毎に設定された特定の閾値より大きい場合もしくは一の関連度辞書に定義された単語同士の関連度を用いて作成したベクトル間の余弦値が特定の閾値より大きい場合、一のグループに属する単語と他のグループに属する単語とは同一グループに属すると見なして処理する単語グループ化部を備えたことを特徴とする請求項１乃至４いずれか記載の同義語計算装置。
6. The synonym calculation apparatus according to any one of claims 1 to 4, comprising a word grouping unit that treats a word belonging to one group and a word belonging to another group as belonging to the same group when the relevance between the words defined in any of the relevance dictionaries is greater than a specific threshold set in advance for each dictionary, or when the cosine value between vectors created from the relevance between words defined in one relevance dictionary is greater than a specific threshold.

所定期間の検索ログを解析して作成した関連度辞書を備えたことを特徴とする請求項１乃至６いずれか記載の同義語計算装置。
7. The synonym calculation apparatus according to any one of claims 1 to 6, comprising a relevance dictionary created by analyzing search logs for a predetermined period.

単語同士の関連度をそれぞれ異なる観点から定義した少なくとも２種類の関連度辞書のうちの一の関連度辞書に含まれる全ての単語についてそれらが属するグループを初期設定し、
一のグループに属する単語と他のグループに属する単語とが同一グループに属するとみなせるか否かを、前記少なくとも２種類の関連度辞書に定義された単語同士の関連度に基づいて判定し、
一のグループに属する全ての単語と他のグループに属する全ての単語との全ての組み合わせについて同一グループに属するとみなせる場合、一のグループと他のグループとを併合する
ことを特徴とする同義語計算方法。
8. A synonym calculation method comprising:
initially setting, for all the words included in one of at least two types of relevance dictionaries each defining the relevance between words from a different viewpoint, the groups to which they belong;
determining, based on the relevance between words defined in the at least two types of relevance dictionaries, whether a word belonging to one group and a word belonging to another group can be regarded as belonging to the same group; and
merging the one group with the other group when every combination of a word belonging to the one group and a word belonging to the other group can be regarded as belonging to the same group.

一の関連度辞書に含まれる全ての単語についてその単語のみを含むグループを初期設定することを特徴とする請求項８記載の同義語計算方法。
9. The synonym calculation method according to claim 8, wherein, for each of the words included in one relevance dictionary, a group containing only that word is initially set.

一の関連度辞書に定義された単語同士の関連度が所定の閾値より大きい単語のみを含むグループを初期設定することを特徴とする請求項８記載の同義語計算方法。
10. The synonym calculation method according to claim 8, wherein groups each containing only words whose mutual relevance, as defined in one relevance dictionary, is greater than a predetermined threshold are initially set.

一の関連度辞書に定義された単語同士の関連度を用いて作成したベクトル間の余弦値が所定の閾値より大きい単語のみを含むグループを初期設定することを特徴とする請求項８記載の同義語計算方法。
11. The synonym calculation method according to claim 8, wherein groups each containing only words for which the cosine value between vectors created from the relevance between words defined in one relevance dictionary is greater than a predetermined threshold are initially set.

いずれかの関連度辞書に定義された単語同士の関連度が予め辞書毎に設定された特定の閾値より大きい場合、一のグループに属する単語と他のグループに属する単語とが同一グループに属するとみなすことを特徴とする請求項８乃至１１いずれか記載の同義語計算方法。
12. The synonym calculation method according to any one of claims 8 to 11, wherein a word belonging to one group and a word belonging to another group are regarded as belonging to the same group when the relevance between the words defined in any of the relevance dictionaries is greater than a specific threshold set in advance for each dictionary.

いずれかの関連度辞書に定義された単語同士の関連度が予め辞書毎に設定された特定の閾値より大きい場合もしくは一の関連度辞書に定義された単語同士の関連度を用いて作成したベクトル間の余弦値が特定の閾値より大きい場合、一のグループに属する単語と他のグループに属する単語とは同一グループに属するとみなすことを特徴とする請求項８乃至１１いずれか記載の同義語計算方法。
13. The synonym calculation method according to any one of claims 8 to 11, wherein a word belonging to one group and a word belonging to another group are regarded as belonging to the same group when the relevance between the words defined in any of the relevance dictionaries is greater than a specific threshold set in advance for each dictionary, or when the cosine value between vectors created from the relevance between words defined in one relevance dictionary is greater than a specific threshold.

所定期間の検索ログを解析して作成した関連度辞書を用いることを特徴とする請求項８乃至１３いずれか記載の同義語計算方法。
14. The synonym calculation method according to any one of claims 8 to 13, wherein a relevance dictionary created by analyzing search logs for a predetermined period is used.

計算装置に、
単語同士の関連度をそれぞれ異なる観点から定義した少なくとも２種類の関連度辞書のうちの一の関連度辞書に含まれる全ての単語についてそれらが属するグループを初期設定させる手順と、
一のグループに属する単語と他のグループに属する単語とが同一グループに属するとみなせるか否かを、前記少なくとも２種類の関連度辞書に定義された単語同士の関連度に基づいて判定させる手順と、
一のグループに属する全ての単語と他のグループに属する全ての単語との全ての組み合わせについて同一グループに属するとみなせる場合、一のグループと他のグループとを併合する手順とを実行させるための同義語計算プログラムを記録した媒体。
15. A medium storing a synonym calculation program for causing a computing device to execute:
a procedure of initially setting, for all the words included in one of at least two types of relevance dictionaries each defining the relevance between words from a different viewpoint, the groups to which they belong;
a procedure of determining, based on the relevance between words defined in the at least two types of relevance dictionaries, whether a word belonging to one group and a word belonging to another group can be regarded as belonging to the same group; and
a procedure of merging the one group with the other group when every combination of a word belonging to the one group and a word belonging to the other group can be regarded as belonging to the same group.

一の関連度辞書に含まれる全ての単語についてその単語のみを含むグループを初期設定させる手順を含むことを特徴とする請求項１５記載の同義語計算プログラムを記録した媒体。
16. The medium storing the synonym calculation program according to claim 15, wherein the program includes a procedure of initially setting, for each of the words included in one relevance dictionary, a group containing only that word.

一の関連度辞書に定義された単語同士の関連度が所定の閾値より大きい単語のみを含むグループを初期設定させる手順を含むことを特徴とする請求項１５記載の同義語計算プログラムを記録した媒体。
17. The medium storing the synonym calculation program according to claim 15, wherein the program includes a procedure of initially setting groups each containing only words whose mutual relevance, as defined in one relevance dictionary, is greater than a predetermined threshold.

一の関連度辞書に定義された単語同士の関連度を用いて作成したベクトル間の余弦値が所定の閾値より大きい単語のみを含むグループを初期設定させる手順を含むことを特徴とする請求項１５記載の同義語計算プログラムを記録した媒体。
18. The medium storing the synonym calculation program according to claim 15, wherein the program includes a procedure of initially setting groups each containing only words for which the cosine value between vectors created from the relevance between words defined in one relevance dictionary is greater than a predetermined threshold.

いずれかの関連度辞書に定義された単語同士の関連度が予め辞書毎に設定された特定の閾値より大きい場合、一のグループに属する単語と他のグループに属する単語とが同一グループに属するとみなすことを特徴とする請求項１５乃至１８いずれか記載の同義語計算プログラムを記録した媒体。
19. The medium storing the synonym calculation program according to any one of claims 15 to 18, wherein a word belonging to one group and a word belonging to another group are regarded as belonging to the same group when the relevance between the words defined in any of the relevance dictionaries is greater than a specific threshold set in advance for each dictionary.

いずれかの関連度辞書に定義された単語同士の関連度が予め辞書毎に設定された特定の閾値より大きい場合もしくは一の関連度辞書に定義された単語同士の関連度を用いて作成したベクトル間の余弦値が特定の閾値より大きい場合、一のグループに属する単語と他のグループに属する単語とは同一グループに属するとみなすことを特徴とする請求項１５乃至１８いずれか記載の同義語計算プログラムを記録した媒体。
20. The medium storing the synonym calculation program according to any one of claims 15 to 18, wherein a word belonging to one group and a word belonging to another group are regarded as belonging to the same group when the relevance between the words defined in any of the relevance dictionaries is greater than a specific threshold set in advance for each dictionary, or when the cosine value between vectors created from the relevance between words defined in one relevance dictionary is greater than a specific threshold.

所定期間の検索ログを解析して作成した関連度辞書を用いることを特徴とする請求項１５乃至２０いずれか記載の同義語計算プログラムを記録した媒体。
21. The medium storing the synonym calculation program according to any one of claims 15 to 20, wherein a relevance dictionary created by analyzing search logs for a predetermined period is used.