JP2018185661A - Text analysis and presentation method - Google Patents


Info

Publication number
JP2018185661A
Authority
JP
Japan
Prior art keywords
word
sentence
group
words
answer
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
JP2017087216A
Other languages
Japanese (ja)
Inventor
Masatomo Kamata (鎌田 正智)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Toyota Motor Corp
Original Assignee
Toyota Motor Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Toyota Motor Corp filed Critical Toyota Motor Corp
Priority to JP2017087216A priority Critical patent/JP2018185661A/en
Publication of JP2018185661A publication Critical patent/JP2018185661A/en
Pending legal-status Critical Current

Landscapes

  • Information Retrieval, Db Structures And Fs Structures Therefor (AREA)

Abstract

PROBLEM TO BE SOLVED: To extract and present appropriate words over a wider range, regardless of word appearance frequency.
SOLUTION: A text analysis and presentation method comprises the steps of: classifying the sentences contained in a given text into groups of similar sentences on the basis of word information about the words contained in the text and word relation information indicating relations between words, and calculating the degree of influence of each word on each group; calculating the degree of influence of each sentence on each group on the basis of the calculated word influences; extracting, on the basis of the per-group word influences, the group on which the purpose word, constraint word, and answer word contained in the user's question have the greatest influence; searching, on the basis of the per-group word influences, for sentences whose influence with respect to the extracted group is at or above a prescribed value; and extracting, on the basis of the word relation information, words indicating an answer to the question from the retrieved sentences.
SELECTED DRAWING: Figure 14

Description

The present invention relates to a text analysis and presentation method that analyzes a given text, extracts words suited to a user's question from the sentences contained in that text, and presents the extracted words.

A technique is known that calculates the importance of each word on the basis of its appearance frequency and selects words whose calculated importance is high (see, for example, Patent Document 1).

JP 2003-099427 A

With the above technique, when words suited to a user's question are extracted from a given text, words with a low appearance frequency receive low importance. Such low-frequency words are therefore not extracted, and extraction is confined to a narrow, limited range. Yet a word suited to a user's question may well be one that appears infrequently, such as a word that leads to a new insight. In that case, an appropriate word may fail to be extracted and presented.

The present invention has been made to solve this problem, and its main object is to provide a text analysis and presentation method capable of extracting and presenting appropriate words over a wider range, regardless of word appearance frequency.

One aspect of the present invention for achieving the above object is a text analysis and presentation method that analyzes a given text, extracts words suited to a user's question from the sentences contained in the text, and presents the extracted words, the method comprising the steps of:
classifying the sentences contained in the text into groups of similar sentences on the basis of word information about the words contained in the text and word relation information indicating relations between the words, and calculating the degree of influence of each word on each group;
calculating the degree of influence of each sentence on each group on the basis of the calculated degree of influence of each word on each group;
extracting, on the basis of the calculated degree of influence of each word on each group, the group on which the purpose word indicating the purpose of the question, the constraint word indicating a constraint on the range of answers to the question, and the answer word indicating the answer sought by the question, among the words contained in the user's question, have the greatest influence;
searching, on the basis of the calculated degree of influence of each word on each group, for sentences whose influence with respect to the extracted group is at or above a prescribed value; and
extracting, on the basis of the word relation information, words indicating an answer to the question from the retrieved sentences, and presenting the extracted words.

According to the present invention, a text analysis and presentation method can be provided that extracts and presents appropriate words over a wider range, regardless of word appearance frequency.

FIG. 1 is a block diagram showing the schematic system configuration of a text analysis and presentation device according to an embodiment of the present invention.
FIG. 2 is a diagram showing an example of word information.
FIG. 3 is a diagram showing an example of word vectors.
FIG. 4 is a diagram showing an example of word relation information.
FIG. 5 is a diagram showing an example of sentence vectors.
FIG. 6 is a diagram showing an example of the degree of influence of each word on each group.
FIG. 7 is a diagram showing an example of the degree of influence of each sentence ID on each group.
FIG. 8 is a diagram showing an example of similar words and similarity values.
FIG. 9 is a diagram showing an example of answer word information.
FIG. 10 is a diagram showing an example of answer word information.
FIG. 11 is a diagram showing an example of a network between words.
FIG. 12 is a diagram showing an example of relation information between surrounding words.
FIG. 13 is a diagram showing an example of a network between surrounding words.
FIG. 14 is a flowchart showing an example of the flow of a text analysis and presentation method according to an embodiment of the present invention.

Embodiments of the present invention are described below with reference to the drawings.
A text analysis and presentation device according to an embodiment of the present invention extracts words and sentences suited to a user's question from a given text (a paper, patent document, or the like), organizes the extracted words and sentences, and presents them.

FIG. 1 is a block diagram showing the schematic system configuration of a text analysis and presentation device according to an embodiment of the present invention. The text analysis and presentation device 1 according to this embodiment includes, for example, a text analysis unit 2 that analyzes text, a storage unit 3, an input unit 4, a display output unit 5 that displays words, sentences, and the like, and a word/sentence extraction unit 6 that extracts words and sentences.

In hardware terms, the text analysis and presentation device 1 is built around a microcomputer comprising, for example, a CPU (Central Processing Unit) that performs arithmetic processing, a memory made up of ROM (Read Only Memory) and RAM (Random Access Memory) storing the programs executed by the CPU, and an interface unit (I/F) that exchanges signals with the outside. The CPU, memory, and interface unit are interconnected via a data bus or the like.

The text analysis unit 2 preprocesses the given text, which has been converted to plain text, and generates word information and the like. As shown in FIG. 2, for example, the text analysis unit 2 assigns an ID (hereinafter, sentence ID) to each sentence contained in the given text, which is a collection of sentences. For each sentence ID, the text analysis unit 2 generates word information that associates each word appearing in that sentence (e.g. "ippanteki", "ni", "wa") with its part of speech (noun, particle, etc.), its part-of-speech detail (general, case particle, binding particle, etc.), and its base form. At this time, the text analysis unit 2 may merge synonyms, which allows the word information to be consulted more efficiently.
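The table generation above can be sketched as follows. This is a minimal illustration only: the sentence IDs, tokens, and part-of-speech labels are invented, and a real implementation would obtain the tuples from a morphological analyzer (e.g. MeCab for Japanese) rather than from hand-written input.

```python
from dataclasses import dataclass

@dataclass
class WordInfo:
    sentence_id: int
    surface: str     # word as it appears in the sentence
    pos: str         # part of speech
    pos_detail: str  # e.g. "general", "case particle"
    base_form: str   # dictionary (base) form

def build_word_info(tokenized_sentences):
    """Build a word-information table (one row per word occurrence)
    from pre-tokenized sentences keyed by sentence ID."""
    table = []
    for sid, tokens in tokenized_sentences.items():
        for surface, pos, detail, base in tokens:
            table.append(WordInfo(sid, surface, pos, detail, base))
    return table

# Hypothetical pre-tokenized input for two sentence IDs.
sentences = {
    10001: [("ionic", "adjective", "general", "ionic"),
            ("conductivity", "noun", "general", "conductivity")],
    10002: [("electrolytes", "noun", "general", "electrolyte")],
}
table = build_word_info(sentences)
```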

The given text may be stored in advance in the storage unit 3 or the like, or may be input to the text analysis unit 2 as needed via the input unit 4. Likewise, the word information may be stored in advance in the storage unit 3, or input to the text analysis unit 2 via the input unit 4. The storage unit 3 is composed of, for example, the memory described above. The input unit 4 is composed of, for example, a keyboard, mouse, or touch screen, and allows the user to enter numerical values, character information, and the like.

Using a quantification technique that represents words as vectors (Word2Vec or the like), the text analysis unit 2 computes a word vector, a distributed representation, for each word in each sentence. As shown in FIG. 3, for example, it computes the vector value of each dimension (first dimension, second dimension, ...) for each word. The text analysis unit 2 may store the generated word information and word vectors in the storage unit 3.
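As a rough stand-in for the distributed representation described above, the sketch below derives a vector for each word from its co-occurrence counts within a small window. An actual implementation would use Word2Vec or a similar model; the two sample sentences are invented for illustration.

```python
from collections import Counter

def cooccurrence_vectors(sentences, window=2):
    """Very small stand-in for Word2Vec: represent each word by its
    co-occurrence counts with every vocabulary word inside a window."""
    vocab = sorted({w for s in sentences for w in s})
    counts = {w: Counter() for w in vocab}
    for s in sentences:
        for i, w in enumerate(s):
            lo, hi = max(0, i - window), min(len(s), i + window + 1)
            for j in range(lo, hi):
                if j != i:
                    counts[w][s[j]] += 1
    # One vector per word, one dimension per vocabulary word.
    return vocab, {w: [counts[w][v] for v in vocab] for w in vocab}

sentences = [["ion", "conductivity", "of", "the", "electrolyte"],
             ["electrolyte", "composition", "affects", "conductivity"]]
vocab, vectors = cooccurrence_vectors(sentences)
```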

The display output unit 5 displays, for example, the word information and word vectors generated by the text analysis unit 2 in response to a user instruction. It is composed of, for example, a liquid crystal display or an organic EL (Electro-Luminescence) display.

On the basis of the word information in the storage unit 3 and the like, the text analysis unit 2 performs syntactic and semantic analysis and derives relations between words (word relation information). For example, it produces word relation information that pairs words standing in causal, parallel, synonym, or referential relations. As shown in FIG. 4, the text analysis unit 2 computes word relation information for each sentence ID.

In FIG. 4, word A and word B stand in a relation such as a causal, parallel, synonym, or referential relation. The text analysis unit 2 may store the computed per-sentence-ID word relation information in the storage unit 3. Alternatively, the word relation information may be stored in the storage unit 3 in advance, or input to the text analysis unit 2 via the input unit 4 as needed.

This per-sentence-ID word relation information records the relations between words comprehensively and systematically, rather than in terms of word appearance frequency. By using it to extract the question-answer words described later, even words with a low appearance frequency can be extracted properly, without being overlooked.

On the basis of the word information and word relation information in the storage unit 3, the text analysis unit 2 classifies the sentences of the given text into groups of similar sentences. For example, as shown in FIG. 5, it computes a sentence vector (matrix) for the sentence of each sentence ID from the word information. Based on the similarity of these sentence vectors, the text analysis unit 2 performs cluster analysis, matrix decomposition, or the like to classify the sentence IDs into groups of similar sentences and tags them (classification No. 1, classification No. 2, classification No. 3, ...), and, as shown in FIG. 6, calculates the degree of influence of each word on each group.

Based on the calculated degree of influence of each word on each group, the text analysis unit 2 calculates, as shown in FIG. 7, the degree of influence of each sentence (sentence ID) on each tagged group. Note that multiplying the per-group word influences by the per-group sentence-ID influences reproduces the sentence vectors above. The text analysis unit 2 may store both sets of influence values in the storage unit 3.

By computing the influence of each word on each group and of each sentence ID on each group in advance and storing them in the storage unit 3, as described above, the sentence IDs that strongly influence the group best representing the purpose word, constraint word, and answer word expressing the question intent (described later) can be searched for quickly and efficiently using the stored values.

The word/sentence extraction unit 6 extracts the words contained in the user's question that indicate the question's intent: among the words in the question, the purpose word indicating the purpose of the question, the constraint word indicating a constraint on the range of answers to the question, and the answer word indicating the answer the user wants to obtain. Words expressing the intent of the user's question can thereby be extracted accurately.

For example, the purpose word, constraint word, and answer word may be entered into the word/sentence extraction unit 6 via the input unit 4, and the unit may adopt the entered words as the purpose word, constraint word, and answer word. Alternatively, the word/sentence extraction unit 6 may machine-learn purpose, constraint, and answer words using a machine learning model such as a neural network, and use the learning result to extract them from the user's question. With a prior learning result, the three words can be extracted automatically.

For example, from the words contained in the user's question "What is the composition of the electrolyte solution that affects ion conductivity?", the word/sentence extraction unit 6 extracts the purpose word "ion conductivity", the constraint word "electrolyte solution", and the answer word "composition".
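A minimal rule-based sketch of this extraction is shown below. The single question pattern is hypothetical and covers only the shape of the example question; as noted above, the embodiment also allows the three words to be entered directly or obtained from a learned model.

```python
import re

# Hypothetical pattern for questions of the shape
# "What is the <answer> of the <constraint> that affects <purpose>?"
PATTERN = re.compile(
    r"what is the (?P<answer>[\w ]+?) of the (?P<constraint>[\w ]+?) "
    r"that affects (?P<purpose>[\w ]+?)\?", re.IGNORECASE)

def extract_intent(question):
    """Return (purpose word, constraint word, answer word), or None
    when the question does not match the pattern."""
    m = PATTERN.search(question)
    if not m:
        return None
    return m.group("purpose"), m.group("constraint"), m.group("answer")

intent = extract_intent(
    "What is the composition of the electrolyte solution "
    "that affects ion conductivity?")
```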

The word/sentence extraction unit 6 then searches for the sentence IDs that strongly influence the group best representing the extracted purpose, constraint, and answer words. For example, based on the influence of each word on each group stored in the storage unit 3 (FIG. 6), it extracts the group best representing the influence of those three words, and based on the influence of each sentence ID on each group (FIG. 7), it searches for the sentence IDs that strongly influence the extracted group.

More specifically, from the per-group word influences stored in the storage unit 3, the word/sentence extraction unit 6 extracts the group on which the extracted purpose word, constraint word, and answer word have the greatest influence. Then, from the per-group sentence-ID influences, it searches for the sentence IDs whose influence on the extracted group is at or above a prescribed value.
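The two-stage selection can be sketched as follows, using invented stand-ins for the FIG. 6 and FIG. 7 influence tables; the group labels, sentence IDs, influence values, and threshold are all illustrative assumptions.

```python
word_influence = {            # group -> {word: influence}  (cf. FIG. 6)
    "G1": {"ion conductivity": 0.9, "electrolyte solution": 0.8,
           "composition": 0.7},
    "G2": {"ion conductivity": 0.1, "electrolyte solution": 0.2,
           "composition": 0.1},
}
sentence_influence = {        # group -> {sentence ID: influence}  (cf. FIG. 7)
    "G1": {10001: 0.9, 10002: 0.7, 10005: 0.2, 10007: 0.8},
    "G2": {10001: 0.1, 10002: 0.2, 10005: 0.9, 10007: 0.1},
}

def best_group(words):
    """Group on which the purpose, constraint, and answer words
    have the largest combined influence."""
    return max(word_influence,
               key=lambda g: sum(word_influence[g].get(w, 0.0) for w in words))

def search_sentences(group, threshold):
    """Sentence IDs whose influence on the chosen group is at or
    above the prescribed threshold."""
    return sorted(sid for sid, infl in sentence_influence[group].items()
                  if infl >= threshold)

group = best_group(["ion conductivity", "electrolyte solution", "composition"])
hits = search_sentences(group, threshold=0.5)
```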

In this way, the sentence IDs on which the purpose, constraint, and answer words have a high degree of influence, and which are strongly related to those words, can be searched for quickly and over a wider range, unaffected by word appearance frequency.

For example, from the sentence IDs shown in FIG. 7, the word/sentence extraction unit 6 finds sentence IDs 10001, 10002, and 10007 as those strongly influencing the group that best represents the purpose word "ion conductivity", the constraint word "electrolyte solution", and the answer word "composition" extracted from the question "What is the composition of the electrolyte solution that affects ion conductivity?". The word/sentence extraction unit 6 outputs the sentences corresponding to the found sentence IDs to the display output unit 5, which may display them as appropriate.

In some cases, because some or all of the purpose, constraint, and answer words expressing the question intent are absent from the given text, the word/sentence extraction unit 6 cannot find sentence IDs that strongly influence the group best representing those words. In that case, it extracts similar words resembling the purpose, constraint, and answer words, and may search for sentence IDs strongly influencing the group that best represents the extracted similar words.

Thus, even when some or all of the purpose, constraint, and answer words expressing the question intent are not contained in the given text, sentence IDs can still be searched for using words similar to them.

For example, based on the word vectors in the storage unit 3 (FIG. 3), the word/sentence extraction unit 6 identifies, among the sentences of the given text, those containing words similar to the purpose, constraint, and answer words. From the identified sentences it extracts similar words with high similarity to the purpose, constraint, and answer words, for example words whose similarity is at or above a prescribed value.

As shown in FIG. 8, for example, the word/sentence extraction unit 6 extracts the similar words "ionic conduction" and "ion conduction" for the purpose word "ion conductivity", the similar words "electrolytic substance" and "electrolyte solution" for the constraint word "electrolyte", and the similar words "component" and "material" for the answer word "composition".
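The similarity-based step can be sketched with cosine similarity over word vectors. The three-dimensional vectors below are invented stand-ins for the FIG. 3 table, and the 0.9 threshold is an arbitrary choice.

```python
import math

# Toy word vectors; real ones would come from Word2Vec or similar.
vectors = {
    "ion conductivity": [0.9, 0.1, 0.0],
    "ionic conduction": [0.8, 0.2, 0.1],
    "viscosity":        [0.0, 0.9, 0.3],
}

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def similar_words(word, threshold):
    """Other words whose cosine similarity to `word` meets the threshold."""
    base = vectors[word]
    return [w for w, v in vectors.items()
            if w != word and cosine(base, v) >= threshold]

sims = similar_words("ion conductivity", threshold=0.9)
```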

Based on the found sentence IDs and the per-sentence-ID word relation information stored in the storage unit 3, the word/sentence extraction unit 6 extracts, from the sentences of the found sentence IDs, question-answer words indicating the answer to the question. Words suited to the user's question can thereby be extracted over a wider range, regardless of word appearance frequency, from the sentences on which the purpose, constraint, and answer words have a high degree of influence and to which they are strongly related.

For example, from a single found sentence ID (10001) and the stored per-sentence-ID word relation information, the word/sentence extraction unit 6 generates answer word information in which the corresponding word relations (word A and word B) are extracted and arranged, as shown in FIG. 9. In FIG. 9, the words are arranged in the order purpose word, constraint word, answer word. Word A and word B stand in a relation such as a causal, parallel, synonym, or referential relation.

The word/sentence extraction unit 6 extracts question-answer words from the generated answer word information. For example, as shown in FIG. 9, from among the word B entries in the answer word information it extracts the words corresponding to the answer word "composition", such as "(CF3SO2)2)" and "(FSO2)2NLi", as question-answer words.

The word/sentence extraction unit 6 may also, based on the found sentence IDs and the stored per-sentence-ID word relation information, generate answer word information covering the word relations (word A and word B) of two or more found sentence IDs (10001, 10007, etc.), as shown in FIG. 10, and extract question-answer words from this wider range of answer word information. Words suited to the user's question can thereby be extracted over an even wider range.

Furthermore, based on the generated answer word information, the word/sentence extraction unit 6 may generate, as shown in FIG. 11, a network between the words matching the purpose, constraint, and answer words expressing the question intent, and extract question-answer words from this network. In the network, words are connected in the order purpose word, constraint word, answer word, and words standing in a causal, parallel, synonym, or referential relation are connected to one another. For example, the word/sentence extraction unit 6 extracts the question-answer words "(CF3SO2)2)", "(FSO2)2NLi", and so on from the network shown in FIG. 11.
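The network-based extraction can be sketched as a walk over an adjacency list ordered purpose word, constraint word, answer word. The edges below are an invented stand-in for the FIG. 11 network, reusing the example words from the text.

```python
# Toy adjacency list: edges link words in causal / parallel / synonym /
# referential relations, ordered purpose -> constraint -> answer.
network = {
    "ion conductivity": ["electrolyte solution"],
    "electrolyte solution": ["composition"],
    "composition": ["(CF3SO2)2)", "(FSO2)2NLi"],
}

def answer_candidates(purpose, constraint, answer):
    """Follow purpose -> constraint -> answer through the network and
    return the words related to the answer word as question-answer
    candidates; return an empty list when the chain is broken."""
    if constraint not in network.get(purpose, []):
        return []
    if answer not in network.get(constraint, []):
        return []
    return list(network.get(answer, []))

candidates = answer_candidates(
    "ion conductivity", "electrolyte solution", "composition")
```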

The word/sentence extraction unit 6 may also machine-learn the relationship between the purpose, constraint, and answer words and the question-answer words using a machine learning model such as a neural network, and use the learning result to extract question-answer words from the sentences of the found sentence IDs.

The word/sentence extraction unit 6 outputs the answer word information, the network between words, and the question-answer words to the display output unit 5, which may display them as appropriate. The user thus not only obtains question-answer words appropriate as answers to the question; by consulting the answer word information and the word network, the relations between words become visually clearer, so the user can also obtain words that lead to new insights.

Based on the per-sentence-ID word relation information stored in the storage unit 3, the word/sentence extraction unit 6 generates, as shown in FIG. 12, relations between the purpose, constraint, and answer words expressing the question intent and their surrounding words (surrounding-word relation information). In FIG. 12, word B lies in the vicinity of word A, and word A and word B stand in a close relation such as a causal, parallel, synonym, or referential relation.

Furthermore, based on the generated surrounding-word relation information, the word/sentence extraction unit 6 generates a network between surrounding words, as shown in FIG. 13. It outputs the surrounding-word relations and the surrounding-word network to the display output unit 5, which may display them as appropriate.

By consulting these neighboring-word relationships and the neighboring-word network, the user can obtain words that lead to new insights. Moreover, even when no appropriate question-answer word is obtained, the user can still obtain the words surrounding the would-be question-answer word by consulting the neighboring-word relationships and the neighboring-word network.

FIG. 14 is a flowchart showing an example of the flow of the text analysis and presentation method according to the present embodiment.

The sentence analysis unit 2 generates, for each sentence ID, word information on the words appearing in that sentence, and stores the generated word information in the storage unit 3 (step S101).
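Step S101 amounts to building a per-sentence word table. The sketch below is a minimal stand-in: whitespace splitting substitutes for the morphological analysis a real implementation would use, and the corpus and sentence IDs are hypothetical.

```python
from collections import Counter

def build_word_info(sentences):
    """Generate, per sentence ID, the words appearing in that sentence and
    their counts (cf. step S101). Whitespace tokenization stands in for a
    proper morphological analyzer."""
    return {sid: Counter(text.lower().split())
            for sid, text in sentences.items()}

# Hypothetical corpus keyed by sentence ID.
corpus = {"s1": "engine noise increases at high speed",
          "s2": "the engine mount transmits noise"}
word_info = build_word_info(corpus)
```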

Based on the word information and the like in the storage unit 3, the sentence analysis unit 2 performs syntactic analysis, semantic analysis, and so on, computes the word relation information, and stores the computed word relation information in the storage unit 3 (step S102).

Based on the similarity of the computed sentence vectors, the sentence analysis unit 2 performs cluster analysis, matrix factorization, or the like to classify each sentence ID into groups and tag it, computes the degree of influence of each word on each group, and stores the computed influence degrees in the storage unit 3 (step S103).
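For step S103, the following sketch assumes the grouping has already been produced (e.g., by clustering) and scores each word's influence on a group as its relative frequency within the group's sentences. This is only an illustrative stand-in for the cluster analysis / matrix factorization the patent describes; the word counts and group assignments are hypothetical.

```python
from collections import Counter

def word_influence_per_group(word_info, groups):
    """Score each word's influence on each group as its relative frequency
    within the group's sentences -- a simple stand-in for step S103."""
    influence = {}
    for gid, sids in groups.items():
        counts = Counter()
        for sid in sids:
            counts.update(word_info[sid])
        total = sum(counts.values())
        influence[gid] = {w: c / total for w, c in counts.items()}
    return influence

# Hypothetical per-sentence word counts and grouping.
word_info = {"s1": Counter({"engine": 1, "noise": 1}),
             "s2": Counter({"engine": 1, "mount": 1}),
             "s3": Counter({"brake": 1, "squeal": 1})}
groups = {"g1": ["s1", "s2"], "g2": ["s3"]}
infl = word_influence_per_group(word_info, groups)
```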

Based on the computed influence degree of each word on each group, the sentence analysis unit 2 computes the influence degree of each sentence ID on each group and stores the computed influence degrees in the storage unit 3 (step S104).
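One simple way to derive sentence influence from word influence, as in step S104, is to average the group influence of a sentence's words. This is a hedged sketch under that assumption, not the patent's actual formula, and the input tables are hypothetical.

```python
def sentence_influence_per_group(word_info, word_infl):
    """Score each sentence's influence on each group as the mean group
    influence of its words (cf. step S104); sentences made of a group's
    influential words score high for that group."""
    result = {}
    for sid, words in word_info.items():
        result[sid] = {}
        for gid, infl in word_infl.items():
            scores = [infl.get(w, 0.0) for w in words]
            result[sid][gid] = sum(scores) / len(scores) if scores else 0.0
    return result

# Hypothetical inputs for illustration.
word_info = {"s1": ["engine", "noise"], "s2": ["brake", "squeal"]}
word_infl = {"g1": {"engine": 0.5, "noise": 0.25},
             "g2": {"brake": 0.5, "squeal": 0.5}}
sent_infl = sentence_influence_per_group(word_info, word_infl)
```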

From the words contained in the user's question, the word sentence extraction unit 6 extracts the target word indicating the purpose of the question, the constraint word indicating a restriction on the question's answer range, and the answer word indicating the answer the question seeks (step S105).

Based on the influence degree of each word on each group stored in the storage unit 3, the word sentence extraction unit 6 extracts the group on which the extracted target word, constraint word, and answer word indicating the question intent have the greatest influence (step S106).
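Selecting the group in step S106 can be sketched as an argmax over the summed influence of the question words. The aggregation by simple summation is an assumption for illustration; the influence table and question words below are hypothetical.

```python
def most_influenced_group(word_infl, question_words):
    """Pick the group on which the question's target, constraint, and
    answer words together have the greatest influence (cf. step S106).
    Words absent from a group contribute zero."""
    return max(word_infl,
               key=lambda gid: sum(word_infl[gid].get(w, 0.0)
                                   for w in question_words))

# Hypothetical per-group word influence; question words from the query.
word_infl = {"g1": {"engine": 0.5, "noise": 0.25},
             "g2": {"brake": 0.5, "squeal": 0.5}}
group = most_influenced_group(word_infl, ["noise", "engine", "cause"])
```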

Based on the influence degree of each sentence ID on each group, the word sentence extraction unit 6 searches for sentence IDs whose influence degree on the extracted group is at least a predetermined value (step S107).
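The threshold search of step S107 is a straightforward filter over the sentence-influence table. A minimal sketch, with hypothetical influence values and an arbitrary threshold:

```python
def search_sentences(sent_infl, gid, threshold):
    """Return the sentence IDs whose influence on the extracted group is
    at least the given threshold (cf. step S107)."""
    return [sid for sid, per_group in sent_infl.items()
            if per_group.get(gid, 0.0) >= threshold]

# Hypothetical sentence-influence table for illustration.
sent_infl = {"s1": {"g1": 0.375, "g2": 0.0},
             "s2": {"g1": 0.0, "g2": 0.5},
             "s3": {"g1": 0.2, "g2": 0.1}}
hits = search_sentences(sent_infl, "g1", 0.3)
```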

Based on the searched sentence IDs and the word relation information stored for each sentence ID in the storage unit 3, the word sentence extraction unit 6 extracts, from the sentences of the searched sentence IDs, a question-answer word indicating the answer to the question (step S108).
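Step S108 can be sketched as collecting, from the relation triples of the searched sentences, the words directly related to the answer word. The triple format and the sample relations are assumptions for illustration, but the sketch shows the key point: candidates come from stored word relations, not from frequency, so rare words can surface.

```python
def extract_answer_words(hit_sentence_relations, answer_word):
    """From the relation triples of the searched sentences, collect the
    words directly related to the answer word (cf. step S108)."""
    answers = []
    for triples in hit_sentence_relations.values():
        for a, _rel, b in triples:
            if a == answer_word and b not in answers:
                answers.append(b)
            elif b == answer_word and a not in answers:
                answers.append(a)
    return answers

# Hypothetical per-sentence relation triples for illustration.
relations = {"s1": [("noise", "causal", "resonance"),
                    ("resonance", "causal", "mount stiffness")],
             "s3": [("noise", "parallel", "vibration")]}
answers = extract_answer_words(relations, "noise")
```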

As described above, in the text analysis and presentation method according to the present embodiment, the sentences contained in a predetermined text are classified into groups of mutually similar sentences based on the word information and the word relation information, and the degree of influence of each word on each group is computed. Based on the computed influence degree of each word on each group, the influence degree of each sentence on each group is computed. Based on the computed influence degree of each word on each group, the group on which the target word indicating the purpose of the question, the constraint word indicating a restriction on the question's answer range, and the answer word indicating the answer the question seeks, among the words contained in the user's question, have the greatest influence is extracted. Based on the computed influence degree of each sentence on each group, sentences whose influence degree on the extracted group is at least a predetermined value are searched for. Based on the word relation information, a question-answer word indicating the answer to the question is extracted from the searched sentences, and the extracted question-answer word is presented.

In this way, the influence degree of each word on each group and the influence degree of each sentence on each group are used to search quickly and efficiently for the sentences bearing most strongly on the group that best represents the influence of the target word, constraint word, and answer word expressing the user's question intent. Question-answer words are then extracted from the searched sentences using the word relation information, in which the relationships between words are stored comprehensively and systematically, rather than by word appearance frequency, so that even words of low appearance frequency are extracted appropriately and without being overlooked. That is, according to the present embodiment, appropriate words can be extracted and presented over a wider range regardless of word appearance frequency.

Note that the present invention is not limited to the above embodiment and may be modified as appropriate without departing from its spirit.

For example, in the above embodiment, the text analysis and presentation device 1 displays the sentences of the searched sentence IDs, the extracted question-answer words, and so on, on a display or the like; however, the output is not limited to this, and may instead be produced, for example, as audio through a speaker or the like.

The present invention can also be realized by, for example, causing a CPU to execute a computer program that performs the processing shown in FIG. 14.

The program may be stored using any of various types of non-transitory computer-readable media and supplied to a computer. Non-transitory computer-readable media include various types of tangible storage media. Examples of non-transitory computer-readable media include magnetic recording media (e.g., flexible disks, magnetic tape, hard disk drives), magneto-optical recording media (e.g., magneto-optical disks), CD-ROM (Read Only Memory), CD-R, CD-R/W, and semiconductor memory (e.g., mask ROM, PROM (Programmable ROM), EPROM (Erasable PROM), flash ROM, and RAM (Random Access Memory)).

The program may also be supplied to a computer via any of various types of transitory computer-readable media. Examples of transitory computer-readable media include electrical signals, optical signals, and electromagnetic waves. A transitory computer-readable medium can supply the program to the computer via a wired communication path such as an electric wire or optical fiber, or via a wireless communication path.

DESCRIPTION OF REFERENCE SIGNS: 1 text analysis and presentation device, 2 sentence analysis unit, 3 storage unit, 4 input unit, 5 display output unit, 6 word sentence extraction unit

Claims (1)

A text analysis and presentation method for analyzing a predetermined text, extracting, from a plurality of sentences contained in the predetermined text, a word suited to a user's question, and presenting the extracted word, the method comprising the steps of:
classifying the sentences contained in the predetermined text into groups of mutually similar sentences based on word information on the words contained in the predetermined text and word relation information indicating the relationships between the words, and computing a degree of influence of each word on each group;
computing a degree of influence of each sentence on each group based on the computed influence degree of each word on each group;
extracting, based on the computed influence degree of each word on each group, the group on which a target word indicating the purpose of the question, a constraint word indicating a restriction on the answer range of the question, and an answer word indicating the answer the question seeks, among the words contained in the user's question, have the greatest influence;
searching, based on the computed influence degree of each sentence on each group, for sentences whose influence degree on the extracted group is at least a predetermined value; and
extracting, from the searched sentences, a word indicating an answer to the question based on the word relation information, and presenting the extracted word.
JP2017087216A 2017-04-26 2017-04-26 Text analysis and presentation method Pending JP2018185661A (en)

Priority Applications (1)

Application Number | Priority Date | Filing Date | Title
JP2017087216A (published as JP2018185661A) | 2017-04-26 | 2017-04-26 | Text analysis and presentation method

Applications Claiming Priority (1)

Application Number | Priority Date | Filing Date | Title
JP2017087216A (published as JP2018185661A) | 2017-04-26 | 2017-04-26 | Text analysis and presentation method

Publications (1)

Publication Number Publication Date
JP2018185661A true JP2018185661A (en) 2018-11-22

Family

ID=64355880

Family Applications (1)

Application Number | Title | Priority Date | Filing Date
JP2017087216A (Pending) | Text analysis and presentation method | 2017-04-26 | 2017-04-26

Country Status (1)

Country Link
JP (1) JP2018185661A (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number | Priority date | Publication date | Assignee | Title
CN113051375A | 2019-12-27 | 2021-06-29 | 阿里巴巴集团控股有限公司 | Question-answering data processing method and device based on question-answering equipment
US20220067298A1 | 2020-08-31 | 2022-03-03 | Recruit Co., Ltd. | Systems and methods for unsupervised paraphrase mining
US11741312B2 | 2020-08-31 | 2023-08-29 | Recruit Co., Ltd. | Systems and methods for unsupervised paraphrase mining

Similar Documents

Publication Publication Date Title
US11763193B2 (en) Systems and method for performing contextual classification using supervised and unsupervised training
KR102288249B1 (en) Information processing method, terminal, and computer storage medium
US20180157636A1 (en) Methods and systems for language-agnostic machine learning in natural language processing using feature extraction
AU2019201244B2 (en) Natural language processing and artificial intelligence based search system
CN110245232B (en) Text classification method, device, medium and computing equipment
Rohini et al. Domain based sentiment analysis in regional Language-Kannada using machine learning algorithm
CN109582954A (en) Method and apparatus for output information
JP2018041336A (en) Computer and response generation method
Tabak et al. Comparison of emotion lexicons
CN110555205A (en) negative semantic recognition method and device, electronic equipment and storage medium
US9558462B2 (en) Identifying and amalgamating conditional actions in business processes
Toh et al. Improving twitter named entity recognition using word representations
JP2018185661A (en) Text analysis and presentation method
CN110750297A (en) Python code reference information generation method based on program analysis and text analysis
Parada-Cabaleiro et al. Perception and classification of emotions in nonsense speech: Humans versus machines
Blinov et al. Blinov: Distributed representations of words for aspect-based sentiment analysis at semeval 2014
Panagakis et al. Automatic music tagging via PARAFAC2
Patel et al. Sentiment analysis using maximum entropy algorithm in big data
Pappu et al. Predicting tasks in goal-oriented spoken dialog systems using semantic knowledge bases
CN113705198B (en) Scene graph generation method and device, electronic equipment and storage medium
CN114842982A (en) Knowledge expression method, device and system for medical information system
Praveena et al. Chunking based malayalam paraphrase identification using unfolding recursive autoencoders
Cuizon et al. Lexicon-based Sentence Emotion Detection Utilizing Polarity-Intensity Unit Circle Mapping and Scoring Algorithm
US20150026553A1 (en) Analyzing a document that includes a text-based visual representation
JP6804913B2 (en) Table structure estimation system and method