JP6775366B2

JP6775366B2 - Selection device and selection method

Info

Publication number: JP6775366B2
Application number: JP2016182362A
Authority: JP
Inventors: 祐宮崎; 隼人小林; 香里谷尾; 晃平菅原; 正樹野口
Original assignee: Yahoo Japan Corp
Current assignee: Yahoo Japan Corp
Priority date: 2016-09-16
Filing date: 2016-09-16
Publication date: 2020-10-28
Anticipated expiration: 2036-09-16
Also published as: JP2018045657A

Description

The present invention relates to a selection device and a selection method.

Conventionally, techniques are known that search for or generate information related to input information based on the result of analyzing that input, and output the retrieved or generated information as a response. One example is a natural language processing technique that converts the words, sentences, and context contained in an input text into multidimensional vectors, analyzes them, infers from the analysis result a text similar to the input text or a text that follows the input text, and outputs the inferred result.

Japanese Unexamined Patent Publication No. 2000-353160

Yasukazu Nishio, "Natural Language Processing with word2vec" (word2vecによる自然言語処理), May 2014, ISBN 978-4-87311-683-9

However, the prior art described above may be unable to output information that assists in understanding a predetermined concept.

For example, the prior art merely outputs information the user could have predicted, such as a text similar to the input text or a text that follows it. As a result, it may be unable to output information that helps the user understand, such as a parable.

The present application has been made in view of the above, and its object is to output information that assists in understanding a predetermined concept.

A selection device according to the present application includes: an extraction unit that extracts a group of words included in a sentence having a predetermined structure; a generation unit that generates, as the vector corresponding to the sentence, either a vector obtained by concatenating a plurality of vectors, each of which individually vectorizes one word of the word group, in the order in which the words appear in the sentence, or a vector that is the tensor product of those vectors taken in the order in which the words appear in the sentence; and a selection unit that selects another sentence whose concept is similar to that of a given sentence by comparing the vectors generated by the generation unit for each sentence.

According to one aspect of the embodiment, information that assists in understanding a predetermined concept can be output.

FIG. 1 is a diagram showing an example of the learning process executed by the learning device according to the embodiment.
FIG. 2 is a diagram showing a configuration example of the learning device according to the embodiment.
FIG. 3 is a diagram showing an example of information registered in the correct answer data database according to the embodiment.
FIG. 4 is a diagram showing an example of information registered in the abstract concept space database according to the embodiment.
FIG. 5 is a diagram showing an example of the structures compared by the learning device according to the embodiment.
FIG. 6 is a flowchart illustrating an example of the flow of the learning process according to the embodiment.
FIG. 7 is a flowchart illustrating an example of the flow of the measurement process according to the embodiment.
FIG. 8 is a diagram showing an example of a hardware configuration.

Hereinafter, modes for carrying out the selection device and the selection method according to the present application (hereinafter referred to as "embodiments") will be described in detail with reference to the drawings. The selection device and the selection method according to the present application are not limited by these embodiments. In each of the following embodiments, the same parts are designated by the same reference numerals, and duplicate description is omitted.

[Embodiment]
[1-1. Example of the learning device]
First, an example of the processing executed by the learning device will be described with reference to FIG. 1. FIG. 1 is a diagram showing an example of the learning process executed by the learning device according to the embodiment. In FIG. 1, the learning device 10 is an information processing device that executes the learning process described below, and is realized by, for example, a server device or a cloud system.

More specifically, the learning device 10 can communicate with any device, such as an input device 100 or an information processing device 200 (see, for example, FIG. 2), via a predetermined network N such as the Internet.

Here, the input device 100 acquires the user's utterance using a voice acquisition device such as a microphone. The input device 100 then converts the utterance into text data using any speech recognition technique and transmits the converted text data to the learning device 10. The information processing device 200 reads the text data received from the learning device 10 aloud using a voice output device such as a speaker. The information processing device 200 may also display the text data received from the learning device 10 on a predetermined display device.

The input device 100 and the information processing device 200 are realized by information processing devices such as smart devices (smartphones, tablets, and the like), desktop PCs (Personal Computers), notebook PCs, or server devices. The input device 100 and the information processing device 200 may also be realized by the same information processing device, or by a device such as a robot.

[1-2. Processing of the learning device]
If a parable can be generated for a certain matter, one can be considered to understand the concept of that matter. Likewise, the higher the degree to which the matter can be abstracted, the better it can be considered to be understood. Therefore, if the learning device 10 can abstract a matter to a higher degree, it can be said to understand (that is, to have learned) the concept of that matter. Furthermore, if the learning device 10 can express the concept of a matter as a parable, it may be able to help the user understand that matter more easily.

Accordingly, to extract the input matter accurately, the learning device 10 executes the following learning process. First, the learning device 10 extracts a plurality of words (hereinafter referred to as a "word group") included in a sentence having a predetermined structure. The learning device 10 then learns a concept space representing the concept of the sentence by associating, with the position indicated in a predetermined vector space by the vector representing the concept of a given word in the word group, a vector space containing the vectors of the other words in the word group. For example, the learning device 10 trains the model so that the concept spaces extracted from sentences with similar structures are similar.

For example, assume that the learning device 10 extracts a first word, a second word, and a third word from a sentence input as correct answer data. In this case, the learning device 10 converts each word into a distributed representation (vector) using a technique such as word2vec. Then, in the vector space (distributed representation space) containing the vector of the first word, the learning device 10 associates the vector space containing the vector of the second word with the tip of the first word's vector. That is, by associating the vector space containing the second word's vector with the position indicated by the first word's vector in the predetermined vector space, the learning device 10 embeds the second word's vector at the tip of the first word's vector.

The learning device 10 then further embeds the third word's vector at the tip of the second word's vector, which is itself embedded at the tip of the first word's vector. That is, the learning device 10 associates the vector space containing the second word's vector with the tip of the first word's vector, and associates the vector space containing the third word's vector with the tip of the second word's vector. For example, suppose the learning device 10 converts the first, second, and third words into n-dimensional vectors and obtains the first word's vector "(a1, a2, ..., an)", the second word's vector "(b1, b2, ..., bn)", and the third word's vector "(c1, c2, ..., cn)". In this case, as the vector representing the abstracted concept of the sentence containing these three words, the learning device 10 generates the 3n-dimensional vector "(a1, a2, ..., an, b1, b2, ..., bn, c1, c2, ..., cn)", which is the concatenation of the word vectors. Instead of this concatenation, the learning device 10 may, for example, use the tensor product space of the vectors to generate the space representing the abstracted concept of the sentence (hereinafter referred to as the "abstract concept space").
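The concatenation and the tensor-product alternative described above can be sketched in Python as follows. This is a minimal, hypothetical sketch: it assumes each word has already been converted to an n-dimensional list of floats (for example with word2vec, which is not shown), and the function names are illustrative, not taken from the patent.

```python
from typing import List

Vector = List[float]


def concat_sentence_vector(word_vectors: List[Vector]) -> Vector:
    """Concatenate per-word vectors in the order the words appear.

    k words of dimension n give a (k * n)-dimensional vector; three words
    give (a1..an, b1..bn, c1..cn), i.e. 3n dimensions.
    """
    result: Vector = []
    for vec in word_vectors:
        result.extend(vec)
    return result


def tensor_sentence_vector(word_vectors: List[Vector]) -> Vector:
    """Flattened tensor (outer) product a (x) b (x) c ..., in word order.

    k words of dimension n give an n**k-dimensional vector whose entries
    are products of one component taken from each word vector.
    """
    result: Vector = [1.0]
    for vec in word_vectors:
        # Row-major flattening: every existing entry times every component.
        result = [r * v for r in result for v in vec]
    return result


if __name__ == "__main__":
    a, b, c = [1.0, 2.0], [3.0, 4.0], [5.0, 6.0]  # toy 2-dimensional word vectors
    print(len(concat_sentence_vector([a, b, c])))  # 6 (= 3 words * 2 dims)
    print(len(tensor_sentence_vector([a, b, c])))  # 8 (= 2 ** 3)
```

Both functions accept any number of words, and both depend on word order: swapping two words generally changes the resulting vector, which is how the ordering of the word group is preserved. The tensor-product variant grows exponentially with the number of words, whereas concatenation grows linearly.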

When the distributed representations of sentences are compared in the abstract concept space, the abstracted concepts of the sentences can be compared while taking sentence structure into account. For example, consider the first sentence "'Monday' is the 'first' of the 'days of the week'" and the second sentence "'January 1' is the 'first day' of the 'year'". For the first sentence, the learning device 10 takes "Monday" as the first word, "days of the week" as the second word, and "first" as the third word, embeds the second word at the tip of the first word's vector, and embeds the third word at the tip of the second word's vector, thereby mapping the first sentence into the abstract concept space. Similarly, for the second sentence, the learning device 10 takes "January 1" as the first word, "year" as the second word, and "first day" as the third word, and embeds them in the same way, thereby mapping the second sentence into the abstract concept space.

In the abstract concept space, the Euclidean distance between the vector of the first sentence and that of the second sentence is not necessarily small. However, the two sentences have similar structures: in both, the first word is a date-related word, the second word is a word that contains that date, and the third word indicates what the first word means within the second word. The structure of the first sentence's vector and that of the second sentence's vector in the abstract concept space are therefore expected to be similar. As a result, the cosine similarity between the two sentence vectors is expected to be close to 1, or to fall within a predetermined range.
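The contrast between Euclidean distance and cosine similarity can be illustrated with a small Python sketch. The 6-dimensional vectors below are toy values, not real embeddings, and the 0.9 threshold standing in for the "predetermined range" is a hypothetical choice.

```python
import math
from typing import List


def cosine_similarity(u: List[float], v: List[float]) -> float:
    """Cosine of the angle between u and v (1.0 means the same direction)."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0.0 or norm_v == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


def euclidean_distance(u: List[float], v: List[float]) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))


def structurally_similar(u: List[float], v: List[float], threshold: float = 0.9) -> bool:
    """Hypothetical check: cosine similarity within [threshold, 1]."""
    return cosine_similarity(u, v) >= threshold


# Toy sentence vectors: the second points in the same direction as the
# first but is twice as long.
first_sentence = [1.0, 0.5, 2.0, 0.0, 1.0, 1.5]
second_sentence = [2.0, 1.0, 4.0, 0.0, 2.0, 3.0]

print(round(cosine_similarity(first_sentence, second_sentence), 6))   # 1.0
print(round(euclidean_distance(first_sentence, second_sentence), 3))  # 2.915
```

The two vectors are far apart in Euclidean terms yet have cosine similarity 1, which is the situation the text describes: the comparison looks at the direction (structure) of the sentence vectors rather than their absolute positions.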

Therefore, if the learning device 10, for example, maps various correct-answer sentences into the abstract concept space in advance and then outputs the sentence indicated by a vector similar to the vector of the sentence to be processed (for example, a vector whose cosine similarity with it is close to 1), it is considered able to output a sentence whose structure is similar to that of the sentence to be processed, that is, a parable of the sentence to be processed.
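The retrieval step described above can be sketched as a brute-force nearest-neighbour lookup by cosine similarity. The dictionary store, the sentence IDs, and the toy vectors are assumptions for illustration; a linear scan is used only for clarity.

```python
import math
from typing import Dict, List, Tuple

# Hypothetical in-memory store: sentence ID -> (sentence text, sentence vector).
Database = Dict[str, Tuple[str, List[float]]]


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0  # treat zero vectors as dissimilar


def most_similar(query: List[float], database: Database) -> Tuple[str, str, float]:
    """Return (sentence ID, text, score) of the stored vector most similar in direction."""
    if not database:
        raise ValueError("database is empty")
    best_id = max(database, key=lambda sid: cosine_similarity(query, database[sid][1]))
    text, vector = database[best_id]
    return best_id, text, cosine_similarity(query, vector)


db: Database = {
    "ID#1": ("sentence #1", [0.0, 1.0, 1.0, 0.0]),
    "ID#3": ("sentence #3", [2.0, 0.0, 0.0, 2.0]),
}
print(most_similar([1.0, 0.0, 0.0, 1.0], db)[0])  # ID#3
```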

The learning device 10 therefore executes the following measurement process. First, using the model trained by the learning process described above, the learning device 10 selects a word group whose structure in the abstract concept space is similar to that of a sentence received from the user, and uses the selected word group to generate a sentence having the same structure as the received sentence. That is, the learning device 10 generates a sentence containing another word group whose chain of relationships is similar to that of the word group in the sentence received from the user. The learning device 10 then outputs the generated sentence; more specifically, it outputs the generated sentence as a parable illustrating the concept of the sentence received from the user.

[1-3. Examples of using the learning process and the measurement process]
The learning device 10 may execute the parable-output process described above for any purpose. For example, the learning device 10 may use the learning process and the measurement process to teach a concept to the user. As a more specific example, the learning device 10 may help a person understand a concept efficiently by generating a parable in a field in which the user has knowledge.

For example, the learning device 10 receives from the user the designation of a sentence A and a field D. In this case, the learning device 10 extracts, from among the vectors of sentences belonging to field D in the abstract concept space, a vector similar to that of sentence A. The learning device 10 may then output a parable in the user-designated field D by outputting the sentence B indicated by the extracted vector. More specifically, the learning device 10 may output a response such as "The relationship among the words in sentence A is like the relationship among the words in sentence B."
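A minimal Python sketch of the field-restricted lookup and the response wording might look like this. The `Entry` record, the field labels, the toy vectors, and the `parable_in_field` helper are all hypothetical; the patent does not specify how fields are stored.

```python
import math
from typing import List, NamedTuple


class Entry(NamedTuple):
    sentence_id: str
    field: str            # hypothetical field label, e.g. "calendar"
    text: str
    vector: List[float]   # sentence vector in the abstract concept space


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def parable_in_field(sentence_a: str, vector_a: List[float], field_d: str,
                     entries: List[Entry]) -> str:
    """Pick the field-D sentence most similar to A and phrase it as a parable."""
    candidates = [e for e in entries if e.field == field_d]
    if not candidates:
        raise LookupError(f"no stored sentences in field {field_d!r}")
    best = max(candidates, key=lambda e: cosine_similarity(vector_a, e.vector))
    return (f'The relationship among the words in "{sentence_a}" is like '
            f'the relationship among the words in "{best.text}".')


entries = [
    Entry("ID#2", "sports", "a sports sentence", [1.0, 0.0, 0.0, 1.0]),
    Entry("ID#1", "calendar", "January 1 is the first day of the year", [1.0, 0.2, 0.0, 1.0]),
    Entry("ID#4", "calendar", "an unrelated calendar sentence", [0.0, 1.0, 1.0, 0.0]),
]
print(parable_in_field("Monday is the first of the days of the week",
                       [1.0, 0.0, 0.0, 1.0], "calendar", entries))
```

Note that the "sports" entry is the closest overall but is ignored because the user asked for field D = "calendar".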

In this way, the learning device 10 learns not only the words that make up sentence A but also the structure of sentence A (the relationships among its words) as components of the concept of sentence A, extracts a concept with similar components (that is, the concept of sentence B) in the abstract concept space, and uses the components of the extracted concept to generate a parable for the concept of sentence A.

The learning device 10 may also accept conditions for the parable from the user. For example, together with the designation of sentence A and field D, the learning device 10 accepts the designation of a word C to serve as the basis of the parable. In this case, from among the vectors of sentences belonging to field D, the learning device 10 selects a vector that is similar to that of sentence A and includes the vector of word C (that is, the concept of word C), and outputs the sentence indicated by the selected vector. As a result, the learning device 10 can output a parable that expresses the relationships among the words in sentence A in terms of word C, which can promote the user's understanding.
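The word-C condition can be sketched as an extra filter before the similarity ranking. As a simplifying assumption, "includes the vector of word C" is approximated here as "word C appears in the candidate's word group" (with concatenated sentence vectors, word C's vector is then one segment of the sentence vector). The record layout and helper name are hypothetical.

```python
import math
from typing import List, NamedTuple, Optional


class Entry(NamedTuple):
    sentence_id: str
    field: str
    words: List[str]      # word group in order of appearance
    vector: List[float]   # sentence vector in the abstract concept space


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def select_with_word(vector_a: List[float], field_d: str, word_c: str,
                     entries: List[Entry]) -> Optional[Entry]:
    """Most similar field-D entry whose word group contains word C, or None."""
    candidates = [e for e in entries if e.field == field_d and word_c in e.words]
    if not candidates:
        return None
    return max(candidates, key=lambda e: cosine_similarity(vector_a, e.vector))


entries = [
    # Closest to the query, but does not contain word C.
    Entry("ID#1", "calendar", ["January 1", "year", "first day"], [1.0, 0.0, 0.0, 1.0]),
    Entry("ID#5", "calendar", ["December 31", "year", "last day"], [1.0, 0.3, 0.0, 1.0]),
]
best = select_with_word([1.0, 0.0, 0.0, 1.0], "calendar", "December 31", entries)
print(best.sentence_id if best else None)  # ID#5
```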

On the other hand, the accuracy of such parables can serve as an index of how well the input sentence has been understood. The learning device 10 may therefore use the measurement process described above to output, as a response, a sentence whose chain of relationships among words is similar to that of the sentence received from the user, and thereby learn efficiently through dialogue with the user.

For example, the learning device 10 receives the designation of sentence A and field D from the user. In this case, the learning device 10 extracts as candidates, from among the vectors of sentences belonging to field D in the abstract concept space, a plurality of vectors whose structure is similar to that of sentence A's vector. The learning device 10 then generates a parable using the candidate vector most likely to be the correct answer and outputs the generated parable. If the user indicates that the parable is correct, the process ends. If instead the user indicates that the parable is wrong, the learning device 10 excludes that vector from the candidates, reselects the vector most likely to be the correct answer from the remaining candidates, regenerates a parable from it, and outputs the regenerated parable. While repeating this process, the algorithm used to select the vector most likely to be the correct answer may be corrected step by step.
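The accept/reject dialogue loop can be sketched as follows. Assumptions: cosine similarity serves as the "most likely to be correct" score, user feedback is modelled as a callback returning `True` or `False`, and the step-by-step correction of the selection algorithm is not modelled. All names are hypothetical.

```python
import math
from typing import Callable, Dict, List, Optional, Tuple


def cosine_similarity(u: List[float], v: List[float]) -> float:
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


def interactive_parable(query: List[float],
                        candidates: Dict[str, Tuple[str, List[float]]],
                        user_accepts: Callable[[str], bool]) -> Optional[str]:
    """Offer the best remaining candidate until the user accepts one.

    candidates maps a sentence ID to (parable text, sentence vector).
    Returns the accepted ID, or None if every candidate is rejected.
    """
    remaining = dict(candidates)
    while remaining:
        best_id = max(remaining,
                      key=lambda sid: cosine_similarity(query, remaining[sid][1]))
        if user_accepts(remaining[best_id][0]):
            return best_id
        del remaining[best_id]  # rejected: exclude it and reselect
    return None


cands = {
    "A": ("parable A", [1.0, 0.0]),
    "B": ("parable B", [0.8, 0.6]),
    "C": ("parable C", [0.0, 1.0]),
}
shown: List[str] = []


def simulated_user(text: str) -> bool:
    shown.append(text)
    return text == "parable B"  # the simulated user rejects A, then accepts B


print(interactive_parable([1.0, 0.0], cands, simulated_user))  # B
```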

[1-4. Learning process]
The learning device 10 may map a sentence containing any number of words into the abstract concept space. For example, by successively embedding the vector of one word at the tip of another word's vector, the learning device 10 can map a sentence containing any number of words into the abstract concept space. The learning device 10 may also convert each word into a distributed representation with any number of dimensions.

The learning device 10 may use any sentence as correct answer data for the learning process described above, provided that its word group has a predetermined structure, that is, that its words have a predetermined relationship with one another. In doing so, the learning device 10 may learn from any sentence as correct answer data regardless of the field to which its content belongs (for example, the medical field or a technical field).

[1-5. Example of processing executed by the learning device 10]
Next, an example of the learning process and the measurement process executed by the learning device 10 will be described with reference to FIG. 1. First, the learning device 10 executes the learning process using sentences serving as correct answer data. More specifically, the learning device 10 extracts the word group contained in a correct-answer sentence and successively embeds the vectors of the other words at the tip of the extracted word's vector, thereby forming an abstract concept space representing the concept of the sentence (step S1).

For example, when the learning device 10 extracts word #1, word #2, and word #3 from sentence #1, it converts each word into a distributed representation and associates the vector space containing the vector of word #2 with the tip of the vector of word #1. The learning device 10 further associates the vector space containing the vector of word #3 with the tip of the vector of word #2. As a result, the learning device 10 can generate an abstract concept space onto which the structure of the relationships among the words in sentence #1 is projected.

As the measurement process, the learning device 10 uses the abstract concept space trained by the learning process to select another word group whose structure of inter-word relationships is similar to that of the word group of an input sentence, and outputs a sentence composed of the selected word group, that is, a parable of the input sentence. First, the learning device 10 receives the user's utterance A as input (step S2). In this case, the learning device 10 maps the word group contained in sentence #2, which corresponds to utterance A, into the abstract concept space (step S3). For example, by executing the same processing as in the learning process, the learning device 10 extracts word group #2 from sentence #2, vectorizes each word in word group #2, and embeds the vectors of the other words at the tip of a given word's vector, thereby generating vector #2 of sentence #2 in the abstract concept space.

Next, the learning device 10 selects from the abstract concept space the word group that makes up a similar vector (step S4). For example, the learning device 10 selects a vector similar to vector #2, that is, a vector whose structure of relationships among words is similar. Here, if vector #1 and vector #3 exist in the abstract concept space and vector #3 is similar to vector #2, the learning device 10 selects vector #3 and the word group #3 that makes up vector #3.

The learning device 10 then uses the selected word group to generate a parable of the concept of the input sentence (step S5). For example, the learning device 10 generates from word group #3 a sentence #3 that is a parable of sentence #2, and outputs the generated sentence #3 as a parable (step S6). As a result, the information processing device 200, such as a robot, can output sentence #3 by voice as utterance C.
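The text does not say how sentence #3 is produced from word group #3 in step S5. One simple assumption, sketched below, is a fixed template per sentence structure; the template and the `generate_parable` helper are hypothetical and cover only the three-word structure used in the earlier "Monday" / "January 1" example.

```python
from typing import Sequence

# Hypothetical template for the three-word structure:
# (first word, second word, third word) -> "<first> is the <third> of <second>."
TEMPLATE = "{first} is the {third} of {second}."


def generate_parable(word_group: Sequence[str]) -> str:
    """Fill the template with a selected word group, in order of appearance."""
    if len(word_group) != 3:
        raise ValueError("this template expects exactly three words")
    first, second, third = word_group
    return TEMPLATE.format(first=first, second=second, third=third)


print(generate_parable(["January 1", "the year", "first day"]))
# January 1 is the first day of the year.
```

An alternative under the same caveat would be to return the stored correct-answer sentence text for the selected vector instead of regenerating it.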

[2. Configuration of the learning device]
An example of the functional configuration of the learning device 10 that realizes the learning process described above will now be described. FIG. 2 is a diagram showing a configuration example of the learning device according to the embodiment. As shown in FIG. 2, the learning device 10 includes a communication unit 20, a storage unit 30, and a control unit 40.

The communication unit 20 is realized by, for example, a NIC (Network Interface Card). The communication unit 20 is connected to the network N by wire or wirelessly and transmits and receives information to and from the input device 100 and the information processing device 200.

The storage unit 30 is realized by, for example, a semiconductor memory element such as a RAM (Random Access Memory) or a flash memory, or by a storage device such as a hard disk or an optical disk. The storage unit 30 stores a correct answer data database 31 and an abstract concept space database 32.

Sentences serving as correct answer data are registered in the correct answer data database 31. FIG. 3 is a diagram showing an example of information registered in the correct answer data database according to the embodiment. In the example shown in FIG. 3, information having items such as "sentence ID (Identifier)", "sentence data", "first word", and "second word" is registered in the correct answer data database 31.

Here, the "sentence ID (Identifier)" is information for identifying a sentence that serves as correct answer data. The "sentence data" is the text data of the sentence. The "first word" is the word that appears first in the sentence among the word group contained in the associated "sentence data", and the "second word" is the word that appears second. In addition to the "first word" and the "second word", the remaining words contained in the sentence are also registered in the correct answer data database 31 in order of appearance.

For example, in the example shown in FIG. 3, the sentence ID "ID#1", the sentence data "sentence data #1", the first word "word #1-1", and the second word "word #1-2" are registered in association with each other. This information indicates that the sentence identified by sentence ID "ID#1" is "sentence data #1" and that this sentence contains the first word "word #1-1" and the second word "word #1-2" in that order.

In the example shown in FIG. 3, conceptual values such as "sentence data #1", "word #1-1", and "word #1-2" are shown, but in practice the text data of the sentence and of the words are registered.

The vectors of sentences projected into the abstract concept space, that is, the vectors of the sentences serving as correct answer data, are registered in the abstract concept space database 32. FIG. 4 is a diagram showing an example of information registered in the abstract concept space database according to the embodiment. In the example shown in FIG. 4, information having items such as "sentence ID", "sentence data", and "sentence vector" is registered in the abstract concept space database 32.

Here, the "sentence vector" is the vector obtained when the sentence indicated by the associated sentence ID is projected onto the abstract concept space. It is generated by embedding the vector of another word at the tip of the vector of a given word among the words included in that sentence. For example, a "sentence vector" is generated by sequentially concatenating the vectors of the individual words.

For example, in the example shown in FIG. 4, the sentence ID "ID#1", the sentence data "sentence data #1", and the sentence vector "vector #1" are registered in association with one another. This information indicates that the sentence identified by "ID#1" is "sentence data #1". It also indicates that, when this sentence is mapped into the abstract concept space, it becomes the sentence vector "vector #1". In the example shown in FIG. 4, conceptual values such as "vector #1" are shown. In practice, numerical values representing the vector are registered.

Returning to FIG. 2, the description continues. The control unit 40 is a controller. It is realized, for example, by a processor such as a CPU (Central Processing Unit) or an MPU (Micro Processing Unit) executing various programs stored in a storage device inside the learning device 10, using a RAM or the like as a work area. The control unit 40 may also be realized by an integrated circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array).

As shown in FIG. 2, the control unit 40 includes an extraction unit 41, a learning unit 42, a selection unit 43, a generation unit 44, and an output unit 45. The extraction unit 41 and the learning unit 42 execute the learning process described above. The selection unit 43 through the output unit 45 execute the measurement process described above.

The extraction unit 41 extracts the group of words included in a sentence having a predetermined structure. For example, when the extraction unit 41 receives a sentence as correct answer data from an arbitrary device (not shown), it extracts the words included in the sentence by morphological analysis or the like. The extraction unit 41 then registers the received sentence and its word group in the correct answer data database 31. For example, it registers each word of the group in the order in which that word appears in the sentence.
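The following is a rough illustration of the extraction and registration described above. It assumes a regular-expression tokenizer in place of morphological analysis, and a plain Python dictionary in place of the correct answer data database 31. The names `extract_words`, `register`, and `Record` are hypothetical. The tokenizer only suits space-separated text; Japanese input would need a real morphological analyzer.

```python
import re
from typing import Dict, List, TypedDict


class Record(TypedDict):
    text: str         # the sentence data
    words: List[str]  # word group in order of appearance: words[0] is the first word


def extract_words(sentence: str) -> List[str]:
    # Stand-in for morphological analysis: lowercase, then take runs of word characters.
    return re.findall(r"\w+", sentence.lower())


def register(db: Dict[str, Record], sentence_id: str, sentence: str) -> None:
    # Store the sentence together with its words, keeping their order of appearance.
    db[sentence_id] = {"text": sentence, "words": extract_words(sentence)}


correct_answer_db: Dict[str, Record] = {}
register(correct_answer_db, "ID#1", "The cat sleeps soundly.")
print(correct_answer_db["ID#1"]["words"])  # ['the', 'cat', 'sleeps', 'soundly']
```

Because the list preserves the order of appearance, `words[0]` and `words[1]` play the roles of the "first word" and "second word" in FIG. 3.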

The learning unit 42 learns a concept space representing the concept of a sentence. In a predetermined vector space, it takes the position indicated by the vector representing the concept of a given word of the word group, and associates with that position a vector space containing the vector of another word of the word group. That is, the learning unit 42 generates the abstract concept space database 32.

For example, the learning unit 42 converts the word group of each sentence registered in the correct answer data database 31 into distributed representations using a technique such as w2v (word2vec). The learning unit 42 then generates the sentence vector by chaining together the distributed representations, that is, the vectors, of the words in the same sentence. In other words, it embeds the vector of another word at the tip of the vector of a certain word in the sentence, and then embeds the vector of yet another word at the tip of that other word's vector. Put differently, the learning unit 42 generates the sentence vector by multiplexing the vectors of the words included in the sentence. It then registers the generated sentence vector in the abstract concept space database 32.
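The claims of this publication recite two ways of turning the ordered word vectors into a sentence vector. One is to concatenate them in order of appearance; the other is to take their tensor product in that order. The following is a minimal sketch of both, assuming a toy embedding table in place of word2vec output. The names `concat`, `tensor`, and `build_concept_space` are hypothetical, and the dictionary merely stands in for the abstract concept space database 32.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]


def concat(vectors: Sequence[Vector]) -> Vector:
    # Chain the word vectors one after another, in order of appearance.
    return [x for v in vectors for x in v]


def tensor(vectors: Sequence[Vector]) -> Vector:
    # Flattened (row-major) tensor product v1 ⊗ v2 ⊗ ... in order of appearance.
    out: Vector = [1.0]
    for v in vectors:
        out = [a * b for a in out for b in v]
    return out


def build_concept_space(word_groups: Dict[str, List[str]],
                        table: Dict[str, Vector],
                        combine: Callable[[Sequence[Vector]], Vector] = concat,
                        ) -> Dict[str, Vector]:
    """Map each sentence ID to its sentence vector (hypothetical layout for DB 32)."""
    return {sid: combine([table[w] for w in words])
            for sid, words in word_groups.items()}


# Toy embeddings standing in for word2vec output.
table: Dict[str, Vector] = {"a": [1.0, 2.0], "b": [3.0, 4.0]}
groups = {"ID#1": ["a", "b"]}
print(build_concept_space(groups, table)["ID#1"])          # [1.0, 2.0, 3.0, 4.0]
print(build_concept_space(groups, table, tensor)["ID#1"])  # [3.0, 4.0, 6.0, 8.0]
```

Both variants are order-sensitive: swapping two words changes the resulting vector. That sensitivity is what lets the order of embedding encode relationships such as configurations #1-1 and #1-2. The concatenated vector's length is the sum of the word-vector dimensions. The tensor product's length is their product, so it grows exponentially with sentence length.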

In the abstract concept space generated by this learning process, comparing vectors with one another makes it possible to compare the structures of the relationships between the words in different sentences. For example, FIG. 5 is a diagram showing an example of structures compared by the learning device according to the embodiment. In the example shown in FIG. 5, sentence #1 contains words #1-1 to #1-3, and sentence #2 contains words #2-1 to #2-3.

Here, sentence #1 has a configuration #1-1 in which word #1-1 indicates the base concept of sentence #1 and word #1-2 indicates a variation of that base. Sentence #1 is also assumed to have a configuration #1-2 in which word #1-3 modifies the concept formed by word #1-1 and word #1-2. Suppose sentence #1 is mapped into the abstract concept space generated by the learning unit 42. Multiplexing the words is then considered to carry into the space not only the concept of each word, but also the structure of the relationships between the words, that is, configurations #1-1 and #1-2. For example, the learning device 10 may embed the vector of word #1-2 at the tip of the vector of word #1-1, and the vector of word #1-3 at the tip of the vector of word #1-2. In that case, configurations #1-1 and #1-2 can be reproduced by the order of embedding.

Similarly, sentence #2 has a configuration #2-1 in which word #2-1 indicates the base concept of sentence #2 and word #2-2 indicates a variation of that base. Sentence #2 is also assumed to have a configuration #2-2 in which word #2-3 modifies the concept formed by word #2-1 and word #2-2. Suppose sentence #2 is mapped into the abstract concept space, and configurations #2-1 and #2-2 are similar to configurations #1-1 and #1-2. The vector of sentence #2 as a whole is then considered similar to the vector of sentence #1 as a whole, even if the individual words #2-1 to #2-3 are not similar to words #1-1 to #1-3.

Therefore, when the vector of sentence #1 and the vector of sentence #2 are similar, the learning device 10 infers that the two sentences have the same structure. It then determines that sentence #2 can be used as a parable for the concept expressed by the structure of sentence #1. For example, the learning device 10 generates a parable for sentence #1 by executing the measurement process described below.

Returning to FIG. 2, the description continues. The selection unit 43 extracts a word group from a sentence received from the input device 100. By the same processing as the learning unit 42, it generates a vector in which the vectors of those words are chained together, that is, the sentence vector of the input sentence. The selection unit 43 then refers to the abstract concept space database 32 and searches for a sentence vector similar to that of the input sentence. For example, it searches for the sentence vector with the smallest cosine distance to the input sentence's vector. Finally, the selection unit 43 identifies the sentence ID associated with the retrieved sentence vector, and selects the word group associated with that sentence ID from the correct answer data database 31.
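The search performed by the selection unit 43 can be sketched as a brute-force nearest-neighbour lookup by cosine distance. The sketch assumes, hypothetically, that both databases are in-memory dictionaries keyed by sentence ID, and that all compared vectors have the same length. The function names are illustrative only.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_distance(u: Sequence[float], v: Sequence[float]) -> float:
    """Return 1 - cosine similarity of two equal-length, non-zero vectors."""
    if len(u) != len(v):
        raise ValueError("sentence vectors must have the same length")
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    if norm == 0.0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - sum(a * b for a, b in zip(u, v)) / norm


def select_word_group(query: Sequence[float],
                      concept_space: Dict[str, List[float]],
                      word_db: Dict[str, List[str]]) -> Tuple[str, List[str]]:
    # Find the stored sentence vector with the smallest cosine distance to the query...
    best_id = min(concept_space,
                  key=lambda sid: cosine_distance(query, concept_space[sid]))
    # ...then look up the word group registered under that sentence ID.
    return best_id, word_db[best_id]


concept_space = {"ID#1": [1.0, 0.0, 0.0, 1.0], "ID#2": [0.0, 1.0, 1.0, 0.0]}
word_db = {"ID#1": ["moon", "wanes"], "ID#2": ["tide", "ebbs"]}
print(select_word_group([0.9, 0.1, 0.0, 1.0], concept_space, word_db))
# ('ID#1', ['moon', 'wanes'])
```

A linear scan keeps the sketch simple; the text does not say how the search is indexed.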

The generation unit 44 uses the selected word group to generate a sentence having the same structure as the sentence received from the user. For example, from the word group selected by the selection unit 43, the generation unit 44 generates a sentence whose structure matches that of the sentence received from the input device 100.

The output unit 45 outputs the sentence generated by the generation unit 44 as a parable illustrating the concept of the sentence received from the user. For example, the output unit 45 sends the generated sentence to the information processing device 200. It then instructs the information processing device 200 to output that sentence as a parable for the sentence received from the input device 100. As a result, the information processing device 200 can output the parable.

[3. Example of the Flow of Processing Executed by the Learning Device]
Next, an example of the flow of the learning process executed by the learning device 10 will be described with reference to FIG. 6. FIG. 6 is a flowchart illustrating an example of the flow of the learning process according to the embodiment. First, when the learning device 10 acquires a sentence serving as correct answer data (step S101), it extracts a word group from the acquired sentence (step S102). The learning device 10 then vectorizes each word of the word group (step S103). Next, it maps the sentence into the abstract concept space by successively associating the vector space of another vector with the tip of a given vector (step S104), and the process ends.
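Steps S101 to S104 can be strung together as in the following sketch. It rests on several assumptions, none of which come from the text:

- a regular-expression tokenizer;
- a small hypothetical embedding table in place of word2vec;
- concatenation as the chaining operation;
- words without an embedding are simply skipped.

```python
import re
from typing import Dict, List

Vector = List[float]

# Hypothetical toy embeddings standing in for word2vec output.
EMBEDDINGS: Dict[str, Vector] = {
    "moon": [1.0, 0.0], "wanes": [0.0, 1.0],
    "tide": [0.9, 0.1], "ebbs": [0.1, 0.9],
}


def learn(sentences: Dict[str, str]) -> Dict[str, Vector]:
    """Map each correct-answer sentence ID to its sentence vector."""
    space: Dict[str, Vector] = {}
    for sid, text in sentences.items():                 # S101: acquire each sentence
        words = re.findall(r"\w+", text.lower())        # S102: extract the word group
        vectors = [EMBEDDINGS[w] for w in words
                   if w in EMBEDDINGS]                  # S103: vectorize (unknown words skipped)
        space[sid] = [x for v in vectors for x in v]    # S104: chain vectors tip to tail
    return space


space = learn({"ID#1": "The moon wanes.", "ID#2": "Tide ebbs"})
print(space["ID#1"])  # [1.0, 0.0, 0.0, 1.0]
```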

Next, an example of the flow of the measurement process executed by the learning device 10 will be described with reference to FIG. 7. FIG. 7 is a flowchart illustrating an example of the flow of the measurement process according to the embodiment. First, the learning device 10 acquires a sentence input by the user, such as a user's remark (step S201). It then maps the sentence into the abstract concept space by multiplexing the vectors of the words included in the sentence (step S202). Next, in the abstract concept space, the learning device 10 selects the word group corresponding to a vector similar to the vector of the input sentence (step S203). That is, it searches the abstract concept space for the vector of a sentence whose word-relationship structure is similar to that of the input sentence, and selects the word group that makes up the retrieved vector. The learning device 10 then uses the selected word group to generate a parable for the acquired sentence (step S204). Finally, it outputs the generated parable (step S205), and the process ends.
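A compact sketch of steps S201 to S205 follows. It uses hypothetical toy data in place of the two databases built by the learning process. The text does not describe how the generation unit 44 phrases the parable (step S204), so the `parable` function simply joins the selected word group. That placeholder, the tokenizer, and all names here are assumptions.

```python
import math
import re
from typing import Dict, List, Tuple

Vector = List[float]

# Hypothetical toy data: embeddings plus the outputs of a prior learning process.
EMBEDDINGS: Dict[str, Vector] = {
    "prices": [1.0, 0.0], "rise": [0.0, 1.0],
    "tide": [0.9, 0.1], "rises": [0.1, 0.9],
    "ship": [0.0, 1.0], "sinks": [1.0, 0.0],
}
WORD_DB: Dict[str, List[str]] = {"ID#1": ["tide", "rises"], "ID#2": ["ship", "sinks"]}
CONCEPT_SPACE: Dict[str, Vector] = {
    "ID#1": [0.9, 0.1, 0.1, 0.9], "ID#2": [0.0, 1.0, 1.0, 0.0],
}


def to_vector(text: str) -> Vector:
    # S201-S202: tokenize and chain the known word vectors in order of appearance.
    words = [w for w in re.findall(r"\w+", text.lower()) if w in EMBEDDINGS]
    return [x for w in words for x in EMBEDDINGS[w]]


def cosine_similarity(u: Vector, v: Vector) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def measure(text: str) -> Tuple[str, List[str]]:
    query = to_vector(text)
    # S203: only vectors of the same length are comparable; pick the most similar one.
    candidates = [sid for sid, v in CONCEPT_SPACE.items() if len(v) == len(query)]
    best = max(candidates, key=lambda sid: cosine_similarity(query, CONCEPT_SPACE[sid]))
    return best, WORD_DB[best]


def parable(text: str) -> str:
    # S204: the text does not say how the parable is phrased; joining the
    # selected word group is a hypothetical placeholder.
    _, words = measure(text)
    return " ".join(words)


print(parable("Prices rise."))  # S205: output -> tide rises
```

Restricting candidates to vectors of the same length reflects, in simplified form, claim 2's requirement that the compared vectors have the same order. If the input's length matches no stored vector, `max` raises `ValueError`.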

[4. Modifications]
The above describes an example of the learning process and the measurement process performed by the learning device 10. However, the embodiment is not limited to this example. Variations of the learning process executed by the learning device 10 are described below.

[4-1. Learning Process]
In the above example, the learning device 10 learned the abstract concept space by vectorizing each word of a sentence received as correct answer data and multiplexing the vectors in order. The learning device 10 may adopt any sentence as correct answer data, provided that the sentence has a predetermined structure.

The learning device 10 may also change the order in which the vectors are multiplexed according to the structure of the relationships between the words in a sentence. For example, when mapping a sentence having a first structure into the abstract concept space, the learning device 10 multiplexes the word vectors in the order in which the words appear in the document. A second structure, by contrast, forms a concept opposite to that of the first structure. When mapping a sentence having this second structure, the learning device 10 may multiplex the word vectors in the reverse of the order in which the words appear in the document.
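The order switch described above can be sketched as follows, assuming concatenation as the multiplexing operation. The structure labels `"first"` and `"second"` are hypothetical. The text does not say how a sentence's structure is identified, so the caller is assumed to supply the label.

```python
from typing import Dict, List, Sequence

Vector = List[float]


def multiplex(words: Sequence[str], table: Dict[str, Vector], structure: str) -> Vector:
    """Chain word vectors in an order chosen by the sentence's structure.

    "first":  order of appearance in the sentence.
    "second": reverse order (a structure forming the opposite concept).
    """
    if structure == "first":
        ordered = list(words)
    elif structure == "second":
        ordered = list(reversed(words))
    else:
        raise ValueError(f"unknown structure label: {structure!r}")
    return [x for w in ordered for x in table[w]]


table: Dict[str, Vector] = {"x": [1.0, 0.0], "y": [0.0, 2.0]}
print(multiplex(["x", "y"], table, "first"))   # [1.0, 0.0, 0.0, 2.0]
print(multiplex(["x", "y"], table, "second"))  # [0.0, 2.0, 1.0, 0.0]
```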

[4-2. Device Configuration]
In the above example, the learning device 10 executed both the learning process and the measurement process internally. However, the embodiment is not limited to this. For example, the learning device 10 may execute only the learning process, and another device may execute the measurement process. In that case, an information processing device other than the learning device 10 may realize the measurement process described above by executing a program. That program uses, as program parameters, the abstract concept space generated by the learning device 10 through the learning process. The learning device 10 may also store the correct answer data database 31 and the abstract concept space database 32 in an external storage server.

[4-3. Others]
Of the processes described in the above embodiment, all or part of those described as being performed automatically can also be performed manually. Likewise, all or part of those described as being performed manually can also be performed automatically by a known method. In addition, unless otherwise specified, the following may be changed arbitrarily: the processing procedures, specific names, and information including various data and parameters shown in the text above and in the drawings. For example, the various kinds of information shown in each figure are not limited to the illustrated information.

Each component of each illustrated device is functional and conceptual, and need not be physically configured as illustrated. That is, the specific form in which each device is distributed or integrated is not limited to the one illustrated. All or part of each device can be functionally or physically distributed or integrated in arbitrary units according to various loads, usage conditions, and the like.

The embodiments described above can also be combined as appropriate, to the extent that their processing contents do not contradict each other.

[5. Program]
The learning device 10 according to the embodiment described above is realized by, for example, a computer 1000 configured as shown in FIG. 8. FIG. 8 is a diagram showing an example of a hardware configuration. The computer 1000 is connected to an output device 1010 and an input device 1020. Within the computer 1000, a bus 1090 connects the following components: an arithmetic unit 1030, a primary storage device 1040, a secondary storage device 1050, an output IF (Interface) 1060, an input IF 1070, and a network IF 1080.

The arithmetic unit 1030 operates based on programs stored in the primary storage device 1040 or the secondary storage device 1050, programs read from the input device 1020, and the like, and executes various processes. The primary storage device 1040 is a memory device, such as a RAM, that temporarily stores data used by the arithmetic unit 1030 for various computations. The secondary storage device 1050 is a storage device in which various databases, and data used by the arithmetic unit 1030 for various computations, are registered. It is realized by a ROM (Read Only Memory), an HDD, a flash memory, or the like.

The output IF 1060 is an interface for transmitting information to be output to the output device 1010, such as a monitor or a printer, which outputs various kinds of information. It is realized by a connector conforming to a standard such as USB (Universal Serial Bus), DVI (Digital Visual Interface), or HDMI (registered trademark) (High Definition Multimedia Interface). The input IF 1070 is an interface for receiving information from various input devices 1020 such as a mouse, a keyboard, and a scanner. It is realized by, for example, USB.

The input device 1020 may be, for example, a device that reads information from any of the following:

- an optical recording medium such as a CD (Compact Disc), a DVD (Digital Versatile Disc), or a PD (Phase change rewritable Disk);
- a magneto-optical recording medium such as an MO (Magneto-Optical disk);
- a tape medium;
- a magnetic recording medium;
- a semiconductor memory.

The input device 1020 may also be an external storage medium such as a USB memory.

The network IF 1080 receives data from other devices via the network N and sends it to the arithmetic unit 1030. It also transmits data generated by the arithmetic unit 1030 to other devices via the network N.

The arithmetic unit 1030 controls the output device 1010 and the input device 1020 via the output IF 1060 and the input IF 1070. For example, the arithmetic unit 1030 loads a program from the input device 1020 or the secondary storage device 1050 onto the primary storage device 1040, and executes the loaded program.

For example, when the computer 1000 functions as the learning device 10, the arithmetic unit 1030 of the computer 1000 realizes the functions of the control unit 40 by executing the program loaded onto the primary storage device 1040.

[6. Effects]
As described above, the learning device 10 extracts the word group included in a sentence having a predetermined structure. It then learns an abstract concept space representing the concept of the sentence. To do so, it takes the position indicated by the vector representing the concept of a given word of the word group in a predetermined vector space, and associates with that position a vector space containing the vectors of the other words of the word group. The learning device 10 can therefore generate an abstract concept space in which the structures of the relationships among the words of different sentences can be compared with one another. As a result, it can output information that assists the user's understanding, such as a parable.

Further, the learning device 10 uses the learned abstract concept space to select a word group whose word-relationship structure is similar to that of a sentence received from the user. It then uses the selected word group to generate a sentence having the same structure as the received sentence, and outputs the generated sentence. For example, the learning device 10 outputs the generated sentence as a parable illustrating the concept of the sentence received from the user. The learning device 10 can therefore output information that assists the user's understanding.

Some of the embodiments of the present application have been described above in detail with reference to the drawings. These are examples, however. The present invention can be carried out in other forms, with various modifications and improvements based on the knowledge of those skilled in the art, including the aspects described in the disclosure of the invention section.

The term "unit" (section, module, unit) used above can be read as "means" or "circuit". For example, the generation unit can be read as generation means or a generation circuit.

20 Communication unit
30 Storage unit
31 Correct answer data database
32 Abstract concept space database
40 Control unit
41 Extraction unit
42 Learning unit
43 Selection unit
44 Generation unit
45 Output unit
100 Input device
200 Information processing device

Claims

A selection device comprising:
an extraction unit that extracts a group of words included in a sentence having a predetermined structure;
a generation unit that generates, as a vector corresponding to the sentence, either (a) a vector obtained by concatenating a plurality of vectors, each obtained by individually vectorizing a word included in the word group, in the order in which the words appear in the sentence, or (b) a vector that is the tensor product of the plurality of vectors taken in the order in which the words appear in the sentence; and
a selection unit that selects another sentence whose concept is similar to that of a predetermined sentence by comparing the vectors generated by the generation unit for the respective sentences.

The selection device according to claim 1, wherein
the generation unit generates, for each of a sentence received from a user and a plurality of sentences to be compared, a vector that corresponds to that sentence and has the same order, and
the selection unit selects, from among the plurality of sentences to be compared, a sentence whose corresponding vector is similar to the vector corresponding to the sentence received from the user.

The selection device according to claim 2, which outputs a sentence generated using the word group included in the sentence selected by the selection unit, as a parable illustrating the concept of the sentence received from the user.

A selection method executed by a selection device, the method comprising:
an extraction step of extracting a group of words included in a sentence having a predetermined structure;
a generation step of generating, as a vector corresponding to the sentence, either (a) a vector obtained by concatenating a plurality of vectors, each obtained by individually vectorizing a word included in the word group, in the order in which the words appear in the sentence, or (b) a vector that is the tensor product of the plurality of vectors taken in the order in which the words appear in the sentence; and
a selection step of selecting another sentence whose concept is similar to that of a predetermined sentence by comparing the vectors generated for the respective sentences in the generation step.