JP6467893B2

JP6467893B2 - Information processing system, information processing method, and program

Info

Publication number: JP6467893B2
Application number: JP2014245097A
Authority: JP
Inventors: ダニエルアンドラーデ; 正明土田; 晃裕田村
Original assignee: NEC Corp
Current assignee: NEC Corp
Priority date: 2014-12-03
Filing date: 2014-12-03
Publication date: 2019-02-13
Anticipated expiration: 2034-12-03
Also published as: JP2016110284A

Description

The present invention relates to an information processing system, an information processing method, and a program, and in particular to an information processing system, an information processing method, and a program for generating a Word Embedding, which is a vector representing information about a word.

In recent years, Word Embeddings have been used for various purposes in natural language processing. A Word Embedding is a low-dimensional vector that represents syntactic and semantic information about a word. It is known that the distance between the Word Embeddings of semantically similar words is small.

Non-Patent Document 1 discloses that, when Word Embeddings are used for document (text) classification, it is useful to incorporate information for estimating the class into the Word Embeddings (that is, to learn Word Embeddings from which the class can be estimated). For example, when Word Embeddings are used to classify whether a text is about a "(romantic) date", the word "flowers" should be close to "tulips", "present", and "chocolate". On the other hand, when they are used to classify whether a text is about an "environment issue", the word "flowers" should be close to "grass" and "trees".

Improving the quality of Word Embeddings generally requires a large amount of learning data, for example on the order of several million items. However, the learning data labeled with classes that is needed to learn Word Embeddings for text classification as described above is much scarcer, for example on the order of several thousand items. Furthermore, because Word Embeddings are learned from large-scale data, the amount of computation needed to generate them is generally enormous.

Therefore, as another way of generating Word Embeddings from which the class can be estimated, techniques for adapting existing Word Embeddings into such Word Embeddings are disclosed in, for example, Non-Patent Document 2 and Non-Patent Document 3.

The technique disclosed in Non-Patent Document 2 treats the Word Embeddings as parameters and learns Word Embeddings from which the class can be estimated by solving an optimization problem that includes a regularization term for keeping them close to the original Word Embeddings.

The technique disclosed in Non-Patent Document 3 sets the original Word Embeddings as the initial parameters of a neural network and learns Word Embeddings from which the class can be estimated using labeled learning data.

Non-Patent Document 1: Duyu Tang et al., "Learning sentiment-specific word embedding for twitter sentiment classification", Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics, pages 1555-1565, June 2014
Non-Patent Document 2: Igor Labutov, Hod Lipson, "Re-embedding words", Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics, pages 489-493, August 2013
Non-Patent Document 3: Remi Lebret, Ronan Collobert, "Word embeddings through hellinger pca", Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics, pages 482-490, April 2014

However, the techniques disclosed in Non-Patent Documents 2 and 3 target only the Word Embeddings of words that appear in the labeled learning data; they cannot incorporate information for estimating the class into the Word Embeddings of words that do not appear in the learning data. As a result, the Word Embedding of a word that appears in the learning data cannot be compared correctly with that of a word that does not. For example, suppose that the original Word Embeddings of the words "like" and "love" are close to each other. If only the word "like" appears in the learning data and has class information incorporated into it, the distance between the two Word Embeddings increases, even though the similarity between "like" and "love" should ideally be preserved.

An object of the present invention is to solve the above technical problem and to provide an information processing system, an information processing method, and a program that can also convert the Word Embeddings of words that do not appear in the learning data into Word Embeddings from which the class can be estimated.

As technical means for solving the above technical problem, an information processing system of the present invention includes: WE acquisition means for acquiring, for each of a plurality of texts, information on the class to which the text belongs and a first Word Embedding (WE), which is a vector representing information about each word included in the text; and conversion function learning means for learning, based on the information on the class to which the text belongs and the first WE of each word included in the text, a conversion function for converting the first WE into a second WE from which the class of a text that includes the word having the second WE can be estimated.

An information processing method of the present invention acquires, for each of a plurality of texts, information on the class to which the text belongs and a first Word Embedding (WE), which is a vector representing information about each word included in the text, and learns, based on the information on the class to which the text belongs and the first WE of each word included in the text, a conversion function for converting the first WE into a second WE from which the class of a text that includes the word having the second WE can be estimated.

A program of the present invention causes a computer to execute processing that acquires, for each of a plurality of texts, information on the class to which the text belongs and a first Word Embedding (WE), which is a vector representing information about each word included in the text, and learns, based on the information on the class to which the text belongs and the first WE of each word included in the text, a conversion function for converting the first WE into a second WE from which the class of a text that includes the word having the second WE can be estimated.

The technical effect of the present invention is that the Word Embeddings of words that do not appear in the learning data can be converted into Word Embeddings from which the class can be estimated.

FIG. 1 is a block diagram showing the characteristic configuration of the embodiment of the present invention.
FIG. 2 is a diagram showing the configuration of the WE (Word Embedding) learning system 100 in the embodiment of the present invention.
FIG. 3 is a block diagram showing the configuration of the WE learning system 100 implemented by a computer in the embodiment of the present invention.
FIG. 4 is a flowchart showing the operation of the WE learning system 100 in the embodiment of the present invention.
FIG. 5 is a flowchart showing an example of the learning process for the transformation matrix T in the embodiment of the present invention.
FIG. 6 is a diagram showing an example of the learning data stored in the learning data storage unit 110 in the embodiment of the present invention.
FIG. 7 is a diagram showing an example of the Word Embeddings stored in the WE storage unit 120 in the embodiment of the present invention.
FIG. 8 is a diagram showing an example of the converted Word Embeddings stored in the converted WE storage unit 160 in the embodiment of the present invention.

First, the configuration of the embodiment of the present invention will be described. FIG. 2 is a diagram showing the configuration of a WE (Word Embedding) learning system 100 in the embodiment of the present invention. The WE learning system 100 is one embodiment of the information processing system of the present invention.

Referring to FIG. 2, the WE learning system 100 includes a learning data storage unit 110, a WE storage unit 120, a WE acquisition unit 130, a conversion function learning unit 140, a WE conversion unit 150, and a converted WE storage unit 160.

The learning data storage unit 110 stores, as learning data, texts that have been classified into classes (labeled).

FIG. 6 is a diagram showing an example of the learning data stored in the learning data storage unit 110 in the embodiment of the present invention. In the example of FIG. 6, the class to which a text belongs and the text itself are associated with the identifier of each item of learning data.

The WE storage unit 120 stores, for each word, the (original) Word Embedding (first Word Embedding) before information for estimating the class is incorporated. The original Word Embeddings are generated in advance, for example, by learning from a large amount of text that has not been classified into classes (unlabeled). The Word Embeddings stored in the WE storage unit 120 include those of words that appear in the learning data as well as those of words that do not.

FIG. 7 is a diagram showing an example of the Word Embeddings stored in the WE storage unit 120 in the embodiment of the present invention. In the example of FIG. 7, the original Word Embedding of each word is associated with the identifier of that word.

The WE acquisition unit 130 acquires, for each item of learning data in the learning data storage unit 110, the class to which the text of that learning data belongs. The WE acquisition unit 130 also acquires, from the WE storage unit 120, the Word Embedding of each word included in the text.

The conversion function learning unit 140 learns (generates) a conversion function for converting an original Word Embedding into a Word Embedding (second Word Embedding) that contains information for estimating the class (that is, from which the class can be estimated). The conversion function learning unit 140 learns the conversion function using the class of the text of each item of learning data and the Word Embedding of each word in that text, both acquired by the WE acquisition unit 130.

The WE conversion unit 150 uses the conversion function generated by the conversion function learning unit 140 to convert the original Word Embeddings stored in the WE storage unit 120 into Word Embeddings from which the class can be estimated.

The converted WE storage unit 160 stores, for each word, the Word Embedding from which the class can be estimated (the converted Word Embedding).

FIG. 8 is a diagram showing an example of the converted Word Embeddings stored in the converted WE storage unit 160 in the embodiment of the present invention. In the example of FIG. 8, the converted Word Embedding of each word is associated with the identifier of that word.

The WE learning system 100 may be a computer that includes a CPU (Central Processing Unit) and a storage medium storing a program, and that operates under control based on the program.

FIG. 3 is a block diagram showing the configuration of the WE learning system 100 implemented by a computer in the embodiment of the present invention.

The WE learning system 100 includes a CPU 101, a storage device (storage medium) 102 such as a hard disk or memory, a communication device 103 that communicates with other devices, an input device 104 such as a mouse or keyboard, and an output device 105 such as a display.

The CPU 101 executes a computer program for implementing the functions of the WE acquisition unit 130, the conversion function learning unit 140, and the WE conversion unit 150. The storage device 102 stores the data of the learning data storage unit 110, the WE storage unit 120, and the converted WE storage unit 160. The communication device 103 may acquire the learning data and the original Word Embeddings from another device, and may output the converted Word Embeddings to another device. Alternatively, the input device 104 may acquire the learning data and the original Word Embeddings from a user, and the output device 105 may output the converted Word Embeddings to the user.

The WE learning system 100 may also be configured by distributing the components shown in FIG. 2 across a plurality of physical devices connected by wire or wirelessly.

Next, the method of learning the conversion function in the embodiment of the present invention will be described.

In the embodiment of the present invention, a transformation matrix T is learned as the conversion function. To learn the transformation matrix T, an optimization problem with an objective function that includes T as a parameter is formulated as follows.

First, let the original Word Embedding of the word word_j (j = 1, 2, ...) be a d-dimensional vector e_j^orig. The converted Word Embedding e_j is then given by a linear transformation with a d × d transformation matrix T: e_j = T · e_j^orig. Here, Formula 1 is defined for the converted Word Embedding.

Then, as the feature representing the text text_i of the i-th item of learning data, the average of f_j over all words included in text_i, for example, is defined as in Formula 2.

Here, n_i is the number of words included in the text text_i.
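Formulas 1 and 2 are not reproduced in this text, so the following is only a minimal sketch of how the text feature might be computed. It assumes that Formula 1 is f_j = tanh(e_j), which is inferred from the later remark that tanh appears in Formula 1, and that Formula 2 is the plain average of f_j over the n_i words of a text. The function name `text_feature` and the sample vectors are hypothetical.

```python
import numpy as np

def text_feature(T, word_vectors):
    """Average of tanh(T @ e_orig) over the words of one text."""
    E = np.asarray(word_vectors, dtype=float)   # shape (n_i, d), one row per word
    converted = E @ T.T                         # e_j = T e_j^orig, applied row-wise
    f = np.tanh(converted)                      # assumed form of Formula 1
    return f.mean(axis=0)                       # assumed Formula 2: mean over n_i words

d = 3
T = np.eye(d)                                   # identity matrix as a placeholder for T
words = [[0.1, -0.2, 0.3], [0.5, 0.0, -0.1]]    # hypothetical original Word Embeddings
x = text_feature(T, words)
```

With T set to the identity matrix, the feature reduces to the average of tanh applied to the original vectors, which makes the example easy to check by hand.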

When logistic regression is used for text classification, the probability that the class y of a text with feature x is estimated to be c (c ∈ C, where C is the set of labels representing the classes) is given by the softmax function shown in Formula 3.

Here, w_k (a d-dimensional vector) and b_k are regression parameters, and n_c is the number of classes.
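The softmax probability described here can be sketched as follows. Formula 3 itself is not reproduced in this text, so the standard softmax form with scores w_k · x + b_k is assumed. The function name `class_probabilities` and the sample values are hypothetical.

```python
import numpy as np

def class_probabilities(x, W, b):
    """Softmax over n_c classes; W has shape (n_c, d) and b has shape (n_c,)."""
    scores = W @ x + b                  # one score w_k . x + b_k per class
    scores = scores - scores.max()      # stabilizes exp without changing the result
    exp_scores = np.exp(scores)
    return exp_scores / exp_scores.sum()

x = np.array([0.2, -0.1, 0.4])                      # feature of one text
W = np.array([[1.0, 0.0, -1.0], [0.5, 0.5, 0.5]])   # rows are w_k for n_c = 2 classes
b = np.array([0.0, 0.1])
p = class_probabilities(x, W, b)                    # p[k] is the estimated probability of class k
```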

The transformation matrix T and the regression parameters w_k and b_k are obtained by solving the optimization problem with the objective function shown in Formula 4, which uses the estimated probability of Formula 3.

The first term on the right-hand side of Formula 4 is a loss function (the negative log-likelihood) and indicates how poor the class estimation accuracy is. The second term on the right-hand side is a regularization term for preventing overfitting. ||·|| denotes the magnitude of a matrix (the Frobenius norm (L2-norm)). A predetermined value set in advance is used for the coefficient λ_1.
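Formulas 1 to 4 appear only as images in the original publication and are missing from this text. The following is a hedged reconstruction from the surrounding description, not the published formulas. In particular, the use of tanh in Formula 1, the label symbol c_i for the class of text_i, and the assumption that the second term of Formula 4 penalizes the regression weights w_k are all inferred.

```latex
\begin{align}
f_j &= \tanh(e_j) = \tanh\left(T\, e_j^{\mathrm{orig}}\right) \tag{1} \\
x_i &= \frac{1}{n_i} \sum_{word_j \in text_i} f_j \tag{2} \\
p(y = c \mid x) &= \frac{\exp\left(w_c^{\top} x + b_c\right)}{\sum_{k=1}^{n_c} \exp\left(w_k^{\top} x + b_k\right)} \tag{3} \\
\min_{T,\, w,\, b} \; & -\sum_i \log p(y = c_i \mid x_i) + \lambda_1 \sum_{k=1}^{n_c} \lVert w_k \rVert^2 \tag{4}
\end{align}
```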

Furthermore, to preserve the similarity between the original and converted Word Embeddings, the objective function shown in Formula 5 may be used, which adds to Formula 4 a regularization term Reg(T) for preserving that similarity.

Here, a predetermined value set in advance is used for the coefficient λ_2. Formula 6, for example, is used as the regularization term Reg(T).

Here, the matrix B is the d × d orthonormal matrix associated with the transformation matrix T.
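Formulas 5 and 6 are likewise missing from this text. A plausible reconstruction, assuming that Formula 4 consists of a negative log-likelihood plus an L2 penalty on the regression weights and that Formula 6 measures the squared Frobenius distance between T and B, is given below. Whether the published Formula 6 squares the norm is not known from this text.

```latex
\begin{align}
g(x; w, T) &= -\sum_i \log p(y = c_i \mid x_i) + \lambda_1 \sum_{k=1}^{n_c} \lVert w_k \rVert^2 + \lambda_2\, \mathrm{Reg}(T) \tag{5} \\
\mathrm{Reg}(T) &= \lVert T - B \rVert^2 \tag{6}
\end{align}
```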

As shown in Formula 7, when different Word Embeddings are linearly transformed by the orthonormal matrix B, the distance (L2-distance) between them is the same before and after the transformation.

Therefore, the regularization term shown in Formula 6 brings the transformation matrix T close to the orthonormal matrix B, which yields a solution that preserves the similarity between the original and converted Word Embeddings. The orthonormal matrix B closest to the transformation matrix T is given by B = U · V^T, where U and V are the left and right singular matrices obtained by the singular value decomposition (SVD) of T.
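The two properties described above can be checked numerically. The sketch below computes B = U · V^T from the SVD of a sample matrix T, confirms that B is orthonormal, and confirms that the L2 distance between two vectors is unchanged by B. The regularizer value uses the squared Frobenius distance, which is an assumed form of Formula 6. The function name `nearest_orthonormal` and the random data are hypothetical.

```python
import numpy as np

def nearest_orthonormal(T):
    """Orthonormal matrix closest to T, obtained as U @ V^T from the SVD of T."""
    U, _, Vt = np.linalg.svd(T)         # T = U @ diag(s) @ Vt
    return U @ Vt

rng = np.random.default_rng(0)
T = np.eye(4) + 0.1 * rng.standard_normal((4, 4))   # a matrix near the identity
B = nearest_orthonormal(T)

e1, e2 = rng.standard_normal(4), rng.standard_normal(4)
dist_before = np.linalg.norm(e1 - e2)
dist_after = np.linalg.norm(B @ e1 - B @ e2)         # equal to dist_before up to rounding

reg = np.linalg.norm(T - B, "fro") ** 2              # assumed form of Reg(T)
```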

Next, the operation of the embodiment of the present invention will be described.

FIG. 4 is a flowchart showing the operation of the WE learning system 100 in the embodiment of the present invention.

First, the WE acquisition unit 130 acquires learning data from the learning data storage unit 110 (step S101).

For example, the WE acquisition unit 130 acquires learning data such as that shown in FIG. 6.

For each item of learning data in the learning data storage unit 110, the WE acquisition unit 130 acquires the class to which the text of that learning data belongs, and acquires from the WE storage unit 120 the Word Embedding of each word included in the text (step S102).

For example, for each item of learning data in FIG. 6, the WE acquisition unit 130 acquires the class to which the text of that learning data belongs, and acquires, from among the Word Embeddings in FIG. 7, the original Word Embedding of each word included in the text.

The conversion function learning unit 140 generates an objective function for learning the conversion function, using the class to which the text of each item of learning data belongs and the Word Embedding of each word included in the text (step S103).

For example, the conversion function learning unit 140 applies the class of the text of each item of learning data in FIG. 6, and the original Word Embedding of each word included in that text acquired from the Word Embeddings in FIG. 7, to the objective function of Formula 5.

The conversion function learning unit 140 learns the conversion function by solving the generated objective function (step S104).

For example, the conversion function learning unit 140 learns the transformation matrix T and the parameters w_k and b_k by solving the objective function of Formula 5.

Because the singular value decomposition in Formula 5 is computationally expensive, the conversion function learning unit 140 may learn the transformation matrix T approximately by separating the singular value decomposition of T from the update of T and the parameters w_k and b_k, for example as follows.

FIG. 5 is a flowchart showing an example of the learning process for the transformation matrix T in the embodiment of the present invention.

The conversion function learning unit 140 initializes the transformation matrix T with a predetermined orthonormal matrix, such as the identity matrix (step S201).

The conversion function learning unit 140 obtains the singular matrices U and V from the singular value decomposition T = U · S · V^T of the transformation matrix T, and sets the orthonormal matrix B to U · V^T (step S202).

The conversion function learning unit 140 updates the transformation matrix T and the parameters w_k and b_k using, for example, gradient descent (step S203).

The conversion function learning unit 140 repeats the processing from step S202 until the result of step S203 converges (step S204). Convergence is judged to have occurred when the difference between the current value of the objective function g(x; w, T) of Formula 5 and its previous value is less than or equal to a predetermined threshold.

Instead of performing step S202 in every iteration, the conversion function learning unit 140 may perform it once every predetermined number of iterations to reduce the computation required for the singular value decomposition. The conversion function learning unit 140 may also perform steps S201 and S202 only the first time, which further reduces that computation. In this case, if the identity matrix is used as the predetermined orthonormal matrix in step S201, the transformation matrix T is updated so as to approach the identity matrix.
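Steps S201 to S204 can be sketched as an alternating loop. This is an illustration under stated assumptions, not the patented implementation: the objective is assumed to be a negative log-likelihood plus λ_1 times the squared norm of the regression weights plus λ_2 times the squared Frobenius distance between T and B, the text feature is assumed to be the mean of tanh(T · e_j^orig), and plain full-batch gradient descent with a fixed step size is used. All function names, hyperparameter values, and the synthetic data are hypothetical.

```python
import numpy as np

def softmax_rows(Z):
    Z = Z - Z.max(axis=1, keepdims=True)
    P = np.exp(Z)
    return P / P.sum(axis=1, keepdims=True)

def objective_and_grads(T, W, b, texts, labels, B, lam1, lam2):
    """Assumed objective (Formula 5) and its gradients, with B held fixed."""
    N = len(texts)
    F = [np.tanh(E @ T.T) for E in texts]            # per-text f_j, shape (n_i, d)
    X = np.stack([f.mean(axis=0) for f in F])        # text features, shape (N, d)
    P = softmax_rows(X @ W.T + b)                    # class probabilities, shape (N, n_c)
    nll = -np.log(P[np.arange(N), labels]).sum()
    value = nll + lam1 * (W ** 2).sum() + lam2 * ((T - B) ** 2).sum()
    D = P.copy()
    D[np.arange(N), labels] -= 1.0                   # gradient of nll w.r.t. the scores
    gW = D.T @ X + 2 * lam1 * W
    gb = D.sum(axis=0)
    gX = D @ W                                       # gradient of nll w.r.t. each x_i
    gT = 2 * lam2 * (T - B)
    for E, f, gx in zip(texts, F, gX):
        gT += ((gx * (1 - f ** 2)).T @ E) / len(E)   # chain rule through tanh and the mean
    return value, gT, gW, gb

def learn_transformation(texts, labels, n_c, lam1=0.01, lam2=0.1,
                         lr=0.05, tol=1e-6, max_iter=300):
    d = texts[0].shape[1]
    T = np.eye(d)                                    # step S201: start from the identity
    W, b = np.zeros((n_c, d)), np.zeros(n_c)
    previous = np.inf
    for _ in range(max_iter):
        U, _, Vt = np.linalg.svd(T)                  # step S202: B = U V^T
        B = U @ Vt
        value, gT, gW, gb = objective_and_grads(T, W, b, texts, labels, B, lam1, lam2)
        if abs(previous - value) <= tol:             # step S204: convergence check
            break
        previous = value
        T, W, b = T - lr * gT, W - lr * gW, b - lr * gb   # step S203: gradient step
    return T, W, b

rng = np.random.default_rng(0)
# Synthetic learning data: 8 texts of 5 words each, two classes with shifted embeddings.
texts = [rng.standard_normal((5, 3)) + (1.0 if i % 2 else -1.0) for i in range(8)]
labels = np.array([i % 2 for i in range(8)])
T, W, b = learn_transformation(texts, labels, n_c=2)
```

Recomputing B only every few iterations, as the text suggests, would amount to moving the SVD call behind a counter check. The sketch recomputes it every iteration for simplicity.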

Next, the WE conversion unit 150 uses the conversion function learned by the conversion function learning unit 140 to convert each of the original Word Embeddings stored in the WE storage unit 120 into a Word Embedding from which the class can be estimated (step S105). The WE conversion unit 150 stores the converted Word Embeddings in the converted WE storage unit 160.

For example, the WE conversion unit 150 uses the learned transformation matrix T to convert each Word Embedding e_j^orig in FIG. 7 into a Word Embedding e_j from which the class can be estimated, as shown in FIG. 8.
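Step S105 can be sketched as applying the same matrix to every stored word, whether or not the word appeared in the learning data. The matrix values, the word vectors, and the function name `convert_embeddings` below are hypothetical and serve only as an illustration.

```python
import numpy as np

def convert_embeddings(T, original):
    """Apply e_j = T @ e_j^orig to every stored word, seen in the learning data or not."""
    return {word: T @ np.asarray(vec, dtype=float) for word, vec in original.items()}

T = np.array([[0.9, 0.1], [-0.1, 0.9]])                # hypothetical learned matrix
original = {"like": [0.8, 0.2], "love": [0.7, 0.3]}    # "love" assumed absent from the learning data
converted = convert_embeddings(T, original)
```

Because a single matrix T is shared by all words, "love" is converted in the same way as "like" even though only "like" contributed to learning T.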

Thereafter, a text classification unit (not shown) or another classification device (not shown) classifies the texts of new test data using the converted Word Embeddings stored in the converted WE storage unit 160.

Here, the text classification unit or other classification device may classify the text of the test data using logistic regression. In this case, classification is performed, for example, based on the probability obtained by applying the transformation matrix T and the parameters w_k and b_k learned by the conversion function learning unit 140 to Formula 3. The text classification unit or other classification device may also classify the text of the test data using an analysis method other than logistic regression.

This completes the operation of the embodiment of the present invention.

Next, variations of the embodiment of the present invention will be described.

(Variation 1 of the regularization)
In the above description, Formula 6 was used in Formula 5 as the regularization term for preserving the similarity between the original and converted Word Embeddings. Instead of Formula 6, Formula 8, in which the orthonormal matrix B is multiplied by a predetermined factor α, may be used as the regularization term.

As shown in Formula 9, when different Word Embeddings are linearly transformed by αB, the relative distances between them are the same before and after the transformation.

When text classification is performed with a transformation matrix T obtained with a sufficiently large |α|, tanh in Formula 1 approaches −1 or +1, so discrete vectors are obtained. This reduces the amount of computation in the text classification processing.
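The effect of a large |α| can be illustrated numerically. The sketch below considers the idealized case in which T equals αB exactly, which the regularization only encourages approximately, and assumes that Formula 1 applies tanh to T · e^orig. All values are hypothetical.

```python
import numpy as np

e1 = np.array([0.3, -0.5, 0.1])
e2 = np.array([-0.2, 0.4, 0.6])
B = np.eye(3)            # any orthonormal matrix works; the identity keeps the example short
alpha = 100.0
T = alpha * B            # idealized case in which T equals alpha * B exactly

# Every pairwise distance is scaled by the same factor |alpha|,
# so relative distances between Word Embeddings are unchanged.
ratio = np.linalg.norm(T @ e1 - T @ e2) / np.linalg.norm(e1 - e2)

f1 = np.tanh(T @ e1)     # saturates toward -1 or +1 for large |alpha|
discrete = np.sign(f1)   # the discrete vector that f1 approaches
```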

(Variation 2 of the regularization)
In the above description, the original Word Embedding e^orig was converted into a Word Embedding from which the class can be estimated by the linear transformation T · e^orig using the transformation matrix T. However, e^orig may instead be converted by a nonlinear transformation such as tanh(T · e^orig). In this case, Formula 10, for example, may be used in place of the regularization term of Formula 6.

Here, S is a matrix for reconstructing the original Word Embedding from the converted Word Embedding, and is learned by the gradient descent described above.

(Modification 3 of regularization)
The original Word Embedding e^orig may also be converted by a nonlinear transformation such as α·tanh((1/α)·T·e^orig). In this case, Equation 11 is used instead of Equation 1.

Here, α is a real number with α ≫ 1. In this case, as the transformation matrix T approaches the identity matrix I, the converted Word Embedding approaches the original Word Embedding. Therefore, Equation 12 is used instead of the regularization term of Equation 6.
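The reason the converted Word Embedding approaches the original one is that tanh(x) is close to x for small x, so α·tanh(x/α) is close to x when α is large. A minimal sketch, using an illustrative embedding and an illustrative α = 1000 that are not taken from the source:

```python
import math

def scaled_tanh(alpha, T, e):
    """Compute alpha * tanh((1/alpha) * T e) element-wise."""
    Te = [sum(a * b for a, b in zip(row, e)) for row in T]
    return [alpha * math.tanh(x / alpha) for x in Te]

identity = [[1.0, 0.0],
            [0.0, 1.0]]
e_orig = [0.5, -0.8]  # illustrative original Word Embedding

# With T equal to the identity matrix and alpha >> 1, the output is
# almost identical to the input.
converted = scaled_tanh(1000.0, identity, e_orig)
max_diff = max(abs(c - o) for c, o in zip(converted, e_orig))
```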

(Modification 4 of regularization)
Alternatively, instead of the regularization term of Equation 6, the magnitude of the transformation matrix T (Frobenius norm (L2-norm)) may be used, as in Equation 13. In this case, a solution in which the magnitude of the transformation matrix T is small is obtained.

Furthermore, instead of the regularization term of Equation 6, the trace norm of the transformation matrix T may be used, as in Equation 14. In this case, a solution in which the rank of the transformation matrix T is low is obtained.
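The difference between the two norms can be seen on a small example. The sketch below is restricted to 2×2 matrices so that the trace norm (the sum of the singular values) can be computed in closed form. The two matrices are illustrative and are not taken from the source. Both have the same Frobenius norm, but the rank-1 matrix has the smaller trace norm, which is why penalizing the trace norm tends to favor low-rank solutions.

```python
import math

def frobenius_norm(T):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in T for x in row))

def trace_norm_2x2(T):
    """Sum of the singular values of a 2x2 matrix.

    Uses (s1 + s2)^2 = s1^2 + s2^2 + 2*s1*s2 = ||T||_F^2 + 2*|det T|.
    """
    det = T[0][0] * T[1][1] - T[0][1] * T[1][0]
    return math.sqrt(frobenius_norm(T) ** 2 + 2.0 * abs(det))

# Two illustrative matrices with the same Frobenius norm, sqrt(2).
full_rank = [[1.0, 0.0],
             [0.0, 1.0]]
rank_one = [[math.sqrt(2.0), 0.0],
            [0.0, 0.0]]

fro_full, fro_low = frobenius_norm(full_rank), frobenius_norm(rank_one)
tr_full, tr_low = trace_norm_2x2(full_rank), trace_norm_2x2(rank_one)
```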

(Modification of the conversion function)
In the above description, a d×d transformation matrix T is used as the conversion function. However, the conversion function is not limited to this and may be another conversion function or another transformation matrix, as long as it can convert Word Embeddings. For example, the conversion function learning unit 140 may use, as the conversion function, a p×d (p > d) transformation matrix T that maps the original Word Embedding to a Word Embedding in a higher-dimensional space. In this case, for example, a matrix B′ as in Equation 15 is used instead of the orthonormal matrix B in the regularization term of Equation 6.

Here, 0 is a zero matrix of size (p−d)×d. It can also be easily shown that the linear transformation by the matrix B′ does not change the distances between the vectors before transformation.
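A numerical check of this property is sketched below. Equation 15 is not reproduced here, so the sketch assumes that B′ is formed by stacking the d×d orthonormal matrix B on top of the (p−d)×d zero matrix. The values p = 3 and d = 2, the rotation matrix, and the vectors are illustrative.

```python
import math

def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]

def dist(u, v):
    """Euclidean distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Hypothetical orthonormal matrix B (d x d): a rotation by 30 degrees.
theta = math.pi / 6
B = [[math.cos(theta), -math.sin(theta)],
     [math.sin(theta),  math.cos(theta)]]

# Assumed form of B' (p x d): B with (p - d) zero rows stacked below it.
p, d = 3, 2
B_prime = B + [[0.0] * d for _ in range(p - d)]

e1, e2 = [0.5, -0.8], [0.1, 0.9]  # illustrative Word Embeddings
h1, h2 = matvec(B_prime, e1), matvec(B_prime, e2)

# The mapped vectors live in p dimensions but keep their distance.
dist_before = dist(e1, e2)
dist_after = dist(h1, h2)
```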

(Modification of the objective function)
The above description uses, as an example, the case where each piece of learning data is given a label representing the class to which the text belongs. However, the learning data is not limited to this, and each piece of learning data may instead be given the probability p_i(y_i = c) that the text belongs to each class c (c ∈ C). In this case, the conversion function learning unit 140 learns the conversion function using, for example, the objective function of Equation 16 instead of Equation 5.
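One common way to use class probabilities instead of hard labels is a cross-entropy loss weighted by the given probabilities. The sketch below assumes that form purely for illustration. Equation 16 is not reproduced here and may differ, and the function name `soft_label_loss` and all numbers are hypothetical. With one-hot probabilities the loss reduces to the usual negative log-likelihood of the labeled class.

```python
import math

def soft_label_loss(true_probs, predicted_probs):
    """Cross-entropy between given class probabilities and predictions.

    true_probs[i][c]      : probability p_i(y_i = c) attached to text i
    predicted_probs[i][c] : classifier's estimated probability of class c
    """
    total = 0.0
    for p_i, q_i in zip(true_probs, predicted_probs):
        # Skip zero-probability classes so log() is never weighted by 0.
        total -= sum(p * math.log(q) for p, q in zip(p_i, q_i) if p > 0.0)
    return total

# Illustrative predictions for two texts over two classes.
predicted = [[0.7, 0.3], [0.2, 0.8]]

hard = [[1.0, 0.0], [0.0, 1.0]]  # labels expressed as one-hot probabilities
soft = [[0.9, 0.1], [0.4, 0.6]]  # class-membership probabilities

hard_loss = soft_label_loss(hard, predicted)
soft_loss = soft_label_loss(soft, predicted)
```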

Finally, the characteristic configuration of the embodiment of the present invention is described. FIG. 1 is a block diagram showing the characteristic configuration of the embodiment of the present invention.

Referring to FIG. 1, the WE learning system 100 (information processing system) includes a WE acquisition unit 130 and a conversion function learning unit 140.

For each of a plurality of texts, the WE acquisition unit 130 acquires information on the class to which the text belongs and a first Word Embedding (WE), which is a vector representing information on each word included in the text.

Based on the information on the class to which each text belongs and the first WE of each word included in the text, the conversion function learning unit 140 learns a conversion function for converting the first WE into a second WE from which the class of a text containing words having the second WE can be estimated.

According to the embodiment of the present invention, the Word Embedding of a word that does not appear in the learning data can also be converted into a Word Embedding from which the class can be estimated. This is because the conversion function learning unit 140 learns, based on the information on the class to which each text belongs and the first Word Embedding of each word included in the text, a conversion function for converting the first Word Embedding into a second Word Embedding from which the class can be estimated.

Thus, the Word Embedding of any word can be converted, using the conversion function, into a Word Embedding that contains information for estimating the class (from which the class can be estimated). When text is classified, Word Embeddings converted by the conversion function can therefore also be used for words that do not appear in the learning data. As a result, text classification accuracy improves.
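The point that the conversion applies to any word, including words absent from the learning data, can be sketched as follows. The matrix T below is a made-up stand-in for a learned transformation matrix, the embeddings are illustrative, and `convert_embeddings` is a hypothetical helper name. None of these values come from the source.

```python
def convert_embeddings(T, embeddings):
    """Apply the linear conversion T e to every word's original embedding."""
    return {
        word: [sum(a * b for a, b in zip(row, e)) for row in T]
        for word, e in embeddings.items()
    }

# Stand-in for a transformation matrix learned from labeled texts.
T = [[0.9, 0.1],
     [-0.2, 1.1]]

# Original Word Embeddings for all words, including one ("orchids")
# that is assumed not to occur in the learning data.
original = {
    "flowers": [0.5, -0.8],
    "tulips": [0.4, -0.7],
    "orchids": [0.45, -0.75],
}
training_vocabulary = {"flowers", "tulips"}

# T depends only on the embedding space, so every word can be converted.
converted = convert_embeddings(T, original)
unseen = [w for w in converted if w not in training_vocabulary]
```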

The present invention has been described above with reference to the embodiments, but it is not limited to those embodiments. Various changes that those skilled in the art can understand may be made to the configuration and details of the present invention within its scope.

DESCRIPTION OF SYMBOLS
100 WE learning system
101 CPU
102 Storage device
103 Communication device
104 Input device
105 Output device
110 Learning data storage unit
120 WE storage unit
130 WE acquisition unit
140 Conversion function learning unit
150 WE conversion unit
160 Converted WE storage unit

Claims

An information processing system comprising:
WE acquisition means for acquiring, for each of a plurality of texts, information on a class to which the text belongs and a first Word Embedding (WE), which is a vector representing information on each word included in the text; and
conversion function learning means for learning, based on the information on the class to which the text belongs and the first WE of each word included in the text, a conversion function for converting the first WE into a second WE from which the class of a text including words having the second WE can be estimated.

The information processing system according to claim 1, wherein the conversion function learning means learns the conversion function such that the accuracy of estimation using the second WE becomes higher.

The information processing system according to claim 1 or 2, wherein the conversion function learning means learns the conversion function by finding a solution of an objective function that relates to the estimation accuracy and is expressed using the conversion function.

The information processing system according to claim 3, wherein the objective function includes a regularization term related to the conversion function.

The information processing system according to claim 4, wherein the regularization term related to the conversion function is a regularization term for maintaining the similarity between the first WE and the second WE.

The information processing system according to claim 5, wherein the regularization term related to the conversion function indicates the magnitude of the difference between the conversion function and an orthonormal matrix related to the conversion function.

The information processing system according to claim 5, wherein the regularization term related to the conversion function indicates the magnitude of the difference between the conversion function and the identity matrix.

The information processing system according to any one of claims 1 to 7, wherein the information on the class is either the class to which the text belongs or the probability that the text belongs to each class.

The information processing system according to any one of claims 1 to 8, further comprising WE conversion means for converting the first WE into the second WE using the learned conversion function.

An information processing method comprising:
acquiring, by WE (Word Embedding) acquisition means provided in a computer, for each of a plurality of texts, information on a class to which the text belongs and a first WE, which is a vector representing information on each word included in the text; and
learning, by conversion function learning means provided in the computer, based on the information on the class to which the text belongs and the first WE of each word included in the text, a conversion function for converting the first WE into a second WE from which the class of a text including words having the second WE can be estimated.

A program causing a computer to execute processing comprising:
acquiring, for each of a plurality of texts, information on a class to which the text belongs and a first Word Embedding (WE), which is a vector representing information on each word included in the text; and
learning, based on the information on the class to which the text belongs and the first WE of each word included in the text, a conversion function for converting the first WE into a second WE from which the class of a text including words having the second WE can be estimated.