JP2015152661A

JP2015152661A - Weighted finite state automaton creation device, symbol string conversion device, voice recognition device, methods thereof and programs

Info

Publication number: JP2015152661A
Application number: JP2014024129A
Authority: JP
Inventors: Takaaki Hori (堀 貴明); Yotaro Kubo (久保 陽太郎); Atsushi Nakamura (中村 篤)
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2014-02-12
Filing date: 2014-02-12
Publication date: 2015-08-24
Anticipated expiration: 2034-02-12
Also published as: JP6235922B2

Abstract

PROBLEM TO BE SOLVED: To provide a method of generating a WFSA (Weighted Finite State Automaton) from an RNN (Recurrent Neural Network) language model. SOLUTION: The method includes the steps of: acquiring a state that serves as a transition source state, and a current first input symbol; if no transition destination state is set for the current first input symbol from the transition source state, creating a new state, setting the newly created state as the transition destination state, and assigning the current first input symbol to the newly created state; if no transition destination state is set for the current first input symbol from the transition source state and the appearance probability of the current first input symbol has not been calculated, calculating that appearance probability using an RNN model; and creating a state transition that includes the transition source state, the transition destination state, the current first input symbol, and, as a weight, the appearance probability of the current first input symbol or a function that takes the appearance probability as an argument.

Description

The present invention relates to a technique for creating a weighted finite state automaton (hereinafter also referred to as "WFSA"), which diagrams a finite set of states that can change and the state transitions caused by inputs; to a symbol string conversion technique that uses a weighted finite state automaton created by that creation method; and to a speech recognition technique.

A WFSA is a diagrammatic representation of a finite set of states that can change and of the state transitions caused by inputs.

A weighted finite-state transducer (hereinafter also referred to as "WFST") is an extension of the WFSA in which each state transition has an output in addition to an input and a weight. A WFST can also be said to express, by means of states and state transitions, a symbol string conversion rule for converting symbol strings.

Conventionally, the following method is known in speech recognition: a WFST that takes as input a symbol string representing the acoustic patterns of the input speech and outputs the word string corresponding to those acoustic patterns is composed with a WFST of a language model represented by an N-gram model, and symbol string conversion is performed with the result, so that the input is converted into a word string (speech recognition result) that is plausible both acoustically and linguistically (see Non-Patent Document 1 and Patent Document 1).

Besides the N-gram model, there is the recurrent neural network language model. A recurrent neural network (hereinafter also referred to as "RNN") is a kind of multilayer neural network characterized by recurrent connections among the neurons of an intermediate layer. The RNN language model is known to greatly improve speech recognition accuracy when used together with an N-gram language model (Non-Patent Document 2).

Patent Document 1: JP 2007-66237 A

Non-Patent Document 1: Hori, Tsukada, "Speech recognition by weighted finite state transducer", Journal of Information Processing Society of Japan, 2004, Vol. 45, No. 10, pp. 1020-1026.
Non-Patent Document 2: T. Mikolov, M. Karafiat, L. Burget, J. Cernocky, S. Khudanpur, "Recurrent neural network based language model", Proceedings of the International Conference Interspeech 2010, 2010, pp. 1045-1048.

Non-Patent Document 1 and Patent Document 1 disclose a symbol string conversion method and a speech recognition method that use a WFST of an N-gram language model. However, no method was known for converting the RNN language model of Non-Patent Document 2 into a WFST, so the RNN language model could not be applied to efficient WFST-based symbol string conversion. Conventionally, a symbol string conversion method using an RNN language model first obtains multiple output symbol string candidates by symbol string conversion with an N-gram language model, then rescores each candidate with the RNN language model, and outputs the candidate with the highest resulting score. However, the multiple output symbol string candidates cannot be produced until the input symbol string has been read to the end, so scoring of the candidates by the RNN language model cannot begin until the input symbol string has been read completely. In speech recognition, this causes a delay between the end of an utterance and the output of the recognition result. The system therefore responds late, is awkward to use as an online system, and has a limited range of applications.

An object of the present invention is to provide a method for generating a WFSA from an RNN language model.

To solve the above problem, according to one aspect of the present invention, the following are assumed. A model that has one input layer, one or more intermediate layers, and one output layer, and that has recurrent connections in which neurons are connected to one another within at least one intermediate layer, is called a recurrent neural network (hereinafter "RNN"). A vector representing a symbol input to the RNN is called a first input symbol. An RNN model that outputs the appearance probability distribution of the current first input symbol, given a first input symbol string that is the sequence of first input symbols from the first one up to the one immediately preceding the current one, is stored in an RNN model storage unit. The weighted finite state automaton creation method includes an RNN model WFSA state transition set acquisition step in which an RNN model WFSA state transition set acquisition unit converts the RNN model into a first WFSA, which is a weighted finite state automaton (hereinafter also "WFSA") expressing a finite set of states that can change and the state transitions caused by inputs.

The RNN model WFSA state transition set acquisition step includes: a step of acquiring a state that serves as a transition source state and the current first input symbol; a step of, when no transition destination state is set for the current first input symbol from the transition source state, creating a new state, setting the newly created state as the transition destination state, and assigning the current first input symbol to the newly created state; a step of, when no transition destination state is set for the current first input symbol from the transition source state and the appearance probability of the current first input symbol has not been calculated, calculating the appearance probability of the current first input symbol using the RNN model; and a step of creating a state transition that includes the transition source state, the transition destination state, the current first input symbol, and, as a weight, the appearance probability of the current first input symbol or a function that takes it as an argument.

To solve the above problem, according to another aspect of the present invention, a model that has one input layer, one or more intermediate layers, and one output layer, and that has recurrent connections in which neurons are connected to one another within at least one intermediate layer, is called a recurrent neural network (hereinafter "RNN"), and a vector representing a symbol input to the RNN is called a first input symbol. The weighted finite state automaton creation device includes: an RNN model storage unit that stores an RNN model that outputs the appearance probability distribution of the current first input symbol, given a first input symbol string that is the sequence of first input symbols from the first one up to the one immediately preceding the current one; and an RNN model WFSA state transition set acquisition unit that converts the RNN model into a first WFSA, which is a weighted finite state automaton (hereinafter also "WFSA") expressing a finite set of states that can change and the state transitions caused by inputs.

The RNN model WFSA state transition set acquisition unit acquires a state that serves as a transition source state and the current first input symbol. When no transition destination state is set for the current first input symbol from the transition source state, it creates a new state, sets the newly created state as the transition destination state, and assigns the current first input symbol to the newly created state. When no transition destination state is set for the current first input symbol from the transition source state and the appearance probability of the current first input symbol has not been calculated, it calculates the appearance probability of the current first input symbol using the RNN model. It then creates a state transition that includes the transition source state, the transition destination state, the current first input symbol, and, as a weight, the appearance probability of the current first input symbol or a function that takes it as an argument.

According to the present invention, a WFSA can be generated from an RNN language model.

FIG. 1 is a diagram showing an example of a WFST.
FIG. 2 is a diagram showing a WFST as a table.
FIG. 3 is a diagram showing an example of symbol string conversion using one WFST.
FIG. 4 is a diagram showing the symbol string conversion procedure using a WFST.
FIG. 5 is a diagram for explaining the RNN language model.
FIG. 6 is a functional block diagram of the symbol string conversion device according to the first embodiment.
FIG. 7 is a diagram showing an example of the processing flow of the RNN language model WFST state transition set acquisition unit.
FIG. 8 is a diagram showing an example of the states and state transitions of a WFST converted from the RNN language model WFST.
FIG. 9 is a diagram showing an example of the states and state transitions of a WFST converted from the RNN language model WFST in the symbol string conversion device according to the first embodiment.
FIG. 10 is a functional block diagram of the speech recognition device according to the second embodiment.
FIG. 11 is a diagram for explaining the effect of the speech recognition device according to the second embodiment.

Embodiments of the present invention are described below. In the drawings used in the following description, components having the same function and steps performing the same processing are given the same reference numerals, and duplicate description is omitted. Unless otherwise noted, processing performed on an element of a vector or matrix is applied to all elements of that vector or matrix.

<First embodiment>
<Key points of the first embodiment>
In this embodiment, the RNN language model is converted into a WFST by the following procedure, and symbol string conversion is performed.
(1) First, an initial state is created. A symbol representing the beginning of the sequence of input symbols (hereinafter also referred to as the "input symbol string") may be assigned to it at this point.
(2) A new state is created as the transition destination state reached from a transition source state by an input symbol, and the input symbol is assigned to the newly created state. The appearance probability of the input symbol is calculated using the RNN model. A state transition is created that includes the transition source state, the transition destination state, the input symbol, an output symbol equal to the input symbol, and, as a weight, the appearance probability of the input symbol or a function that takes it as an argument.
(3) When an input symbol string is given, each state and each state transition for a given input symbol that is needed during its conversion is converted into the WFST by procedure (2) and retrieved at the time it is needed.
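The on-demand construction in points (1) to (3) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: `next_symbol_probs` is a hypothetical stand-in for the RNN model (any callable mapping a symbol history to a probability table), `toy_model` is an invented fixed distribution, and the negative log probability is used as one possible "function that takes the probability as an argument".

```python
import math


class LazyRnnWfst:
    """Creates states and transitions only when they are first requested."""

    def __init__(self, next_symbol_probs, start_symbol="<s>"):
        # Hypothetical stand-in for the RNN model: history tuple -> {symbol: prob}.
        self._probs_fn = next_symbol_probs
        # (1) The initial state is state 0, with a start symbol assigned to it.
        self.symbol_of = {0: start_symbol}
        self._history = {0: (start_symbol,)}
        self._arcs = {}    # (source state, symbol) -> transition tuple
        self._probs = {}   # source state -> cached probability table
        self.rnn_calls = 0

    def transition(self, src, symbol):
        """Return (src, dst, input, output, weight), creating it if needed."""
        key = (src, symbol)
        if key not in self._arcs:
            # (2) No destination is set yet: create a state and assign the symbol.
            dst = len(self.symbol_of)
            self.symbol_of[dst] = symbol
            self._history[dst] = self._history[src] + (symbol,)
            # Query the model only if this source state has not been scored yet.
            if src not in self._probs:
                self._probs[src] = self._probs_fn(self._history[src])
                self.rnn_calls += 1
            p = self._probs[src][symbol]
            # Output symbol equals input symbol; weight is -log p (an assumption).
            self._arcs[key] = (src, dst, symbol, symbol, -math.log(p))
        return self._arcs[key]


def toy_model(history):
    # Invented fixed distribution; a real RNN would depend on the history.
    return {"a": 0.5, "b": 0.25, "c": 0.25}


wfst = LazyRnnWfst(toy_model)
first = wfst.transition(0, "a")    # creates state 1 and scores state 0 once
again = wfst.transition(0, "a")    # reuses the stored transition
other = wfst.transition(0, "b")    # creates state 2 without a new model call
```

Caching the probability table per source state means the stand-in model is queried once per state, however many symbols are later requested from that state.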

<Prerequisite knowledge>
Before the first embodiment is described, the knowledge it presupposes is explained.

A WFST is defined by a set of (1) states, (2) state transitions indicating that a transition from one state to another is possible, (3) the input symbol accepted in each state transition, (4) the output symbol emitted at that time, and (5) the weight of the state transition. A WFST is a model that, given an input symbol string, starts from the initial state and repeatedly makes state transitions that accept the symbols of the input symbol string in order, emitting output symbols as it goes, and terminates when it reaches an end state. Formally, a WFST is defined by the following 8-tuple (Q, Σ, Δ, i, F, E, λ, ρ).
1. Q is a finite set of states.
2. Σ is a finite set of input symbols.
3. Δ is a finite set of output symbols.
4. i ∈ Q is the initial state.
5. F ⊆ Q is the set of end states.
6. E ⊆ Q × Σ × Δ × Q is the set of state transitions, each of which moves from the current state to the next state on an input symbol while emitting an output symbol.
7. λ is the initial weight.
8. ρ(q) is the end weight of end state q, where q ∈ F.
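As an illustration, the 8-tuple can be held in a small data structure such as the one below. This is only a sketch: the class and field names are invented here, the weight is stored on each transition for convenience, and the example instance contains only the arcs of FIG. 1 that can be read off from the text (the arcs through state 2 are not given in the text and are left out).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Transition:
    src: int       # current state
    isym: str      # input symbol accepted by the transition
    osym: str      # output symbol emitted by the transition
    weight: float  # weight of the transition
    dst: int       # next state


@dataclass
class WFST:
    states: set                  # Q
    isyms: set                   # Sigma
    osyms: set                   # Delta
    initial: int                 # i
    finals: dict                 # F together with rho: end state -> end weight
    transitions: list            # E
    initial_weight: float = 0.0  # lambda, usually omitted

    def arcs(self, state, isym):
        """Transitions leaving `state` that accept `isym`."""
        return [e for e in self.transitions
                if e.src == state and e.isym == isym]


# Only the part of FIG. 1 that the text spells out.
fig1_partial = WFST(
    states={0, 1, 2, 3},
    isyms={"a", "b", "c"},
    osyms={"b", "c", "d"},
    initial=0,
    finals={3: 0.5},
    transitions=[
        Transition(0, "a", "d", 0.5, 0),
        Transition(0, "b", "c", 0.3, 1),
        Transition(1, "c", "b", 1.0, 3),
    ],
)
```

Storing F and ρ together as a mapping from end state to end weight keeps the two consistent, since ρ is defined only on members of F.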

FIG. 1 shows an example of a WFST.

In FIG. 1, reference numeral 10 denotes a state, drawn as a circle ("○"); the number inside the circle is the state number. Reference numeral 11 denotes an end state, drawn as a double circle ("◎"); the label inside the double circle gives the state number and the end weight that is accumulated last, when the state transitions finish, in the form "(state number)/(end weight)". Hereinafter, a state is referred to by its number, as in "state 0" or "state 3". Reference numeral 12 denotes a state transition, drawn as an arrow ("→") connecting states; the symbols and numbers attached to each state transition give its input symbol, output symbol, and weight in the form "(input symbol):(output symbol)/(weight)".

As shown in FIG. 2, the WFST of FIG. 1 can also be defined by a table. In FIG. 2, each row represents one state transition and lists the state number of the transition source (current state), the state number of the transition destination (next state), the input symbol, the output symbol, and the weight. For the final state (state 3 in FIG. 1), the transition destination, input symbol, and output symbol are left empty, and the weight accumulated when the state transitions finish (the end weight) is listed. In general, the initial state of a WFST is taken to be state 0, and the initial weight λ is often omitted. This embodiment likewise takes state 0 as the initial state and omits the initial weight.

The WFST of FIG. 1 can, for example, convert the input symbol string a, a, b, c into the output symbol string d, d, c, b. Expressed as a sequence of state numbers, the state transition process is 0, 0, 1, 3, and the accumulated weight (hereinafter "cumulative weight") is 0.5 + 0.5 + 0.3 + 1 + 0.5 = 2.8. In the WFST of FIG. 1, however, two state transition processes are possible for the input symbol string a, a, b, c: 0, 0, 1, 3 and 0, 0, 2, 3. In general, when several state transitions are possible for an input symbol string (this is called nondeterminism), the state transition process whose cumulative weight is the minimum or the maximum is selected, and the output symbol string corresponding to that process is selected. If larger weights are assigned to more likely state transitions, the output symbol string corresponding to the state transition process with the maximum cumulative weight is selected; if smaller weights are assigned to more likely state transitions, the output symbol string corresponding to the state transition process with the minimum cumulative weight is selected. In the example of FIG. 1 as well, the state transition process 0, 0, 1, 3, which has the smallest cumulative weight for the input symbol string a, a, b, c, is chosen, and the conversion result is d, d, c, b.
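The arithmetic for the path 0, 0, 1, 3 can be checked with a few lines of code. The sketch below assumes that the terms of the sum correspond, in order, to the four transitions and then the end weight of state 3, which is how the sum in the text reads; the tuple layout and function names are illustrative only.

```python
import math

# Transitions along the path 0, 0, 1, 3 as (src, input, output, weight, dst).
# The weights are taken, in order, from the sum given in the text.
PATH_0_0_1_3 = [
    (0, "a", "d", 0.5, 0),
    (0, "a", "d", 0.5, 0),
    (0, "b", "c", 0.3, 1),
    (1, "c", "b", 1.0, 3),
]
END_WEIGHT_STATE_3 = 0.5


def cumulative_weight(path, end_weight):
    """Sum of the transition weights plus the end weight of the last state."""
    return sum(weight for _, _, _, weight, _ in path) + end_weight


def output_string(path):
    """Output symbols emitted along the path, in order."""
    return [osym for _, _, osym, _, _ in path]


total = cumulative_weight(PATH_0_0_1_3, END_WEIGHT_STATE_3)
outputs = output_string(PATH_0_0_1_3)
```

Because the weights are floating-point numbers, comparing `total` with 2.8 is best done with a tolerance (for example `math.isclose`) rather than with `==`.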

Given a weighted finite state transducer A and a symbol string X supplied to it as the input symbol string, finding the output symbol string with the minimum cumulative weight (that is, the symbol string conversion result) requires computing the following minimum cumulative weight W(X):

W(X) = min_Y W(X→Y; A)

Here, W(X→Y; A) denotes the cumulative weight of the state transition process by which the weighted finite state transducer A converts the symbol string X into the symbol string Y. The minimum W(X) of this cumulative weight W(X→Y; A) is computed, and the symbol string Y that attains the minimum is the symbol string conversion result. The result is found by searching for the state transition process whose cost (cumulative weight) from the initial state to an end state under the input symbol string is the minimum or the maximum. This procedure is disclosed, for example, in Patent Document 1.
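To make the minimization concrete, the sketch below enumerates every accepting state transition process for X by brute force and keeps the one with the smallest cumulative weight. It is an illustration only, not the search of Patent Document 1. The arcs through state 1 follow the example in the text; the two arcs through state 2 (their outputs and weights) are hypothetical values invented here, because the text does not give them.

```python
import math

# (src, input, output, weight, dst)
ARCS = [
    (0, "a", "d", 0.5, 0),
    (0, "b", "c", 0.3, 1),
    (1, "c", "b", 1.0, 3),
    (0, "b", "a", 1.0, 2),  # hypothetical arc, not taken from the text
    (2, "c", "a", 1.0, 3),  # hypothetical arc, not taken from the text
]
END_WEIGHTS = {3: 0.5}  # end state -> end weight


def accepting_paths(arcs, end_weights, state, symbols):
    """Yield (outputs, cumulative weight) for each path that reads `symbols`
    from `state` and finishes in an end state."""
    if not symbols:
        if state in end_weights:
            yield [], end_weights[state]
        return
    for src, isym, osym, weight, dst in arcs:
        if src == state and isym == symbols[0]:
            for outs, rest in accepting_paths(arcs, end_weights, dst, symbols[1:]):
                yield [osym] + outs, weight + rest


def best_conversion(arcs, end_weights, initial, symbols):
    """Return (Y, W(X)) for the minimum-weight path, or None if none accepts."""
    paths = accepting_paths(arcs, end_weights, initial, symbols)
    return min(paths, key=lambda path: path[1], default=None)


all_paths = list(accepting_paths(ARCS, END_WEIGHTS, 0, ["a", "a", "b", "c"]))
best = best_conversion(ARCS, END_WEIGHTS, 0, ["a", "a", "b", "c"])
```

Enumeration grows exponentially with the amount of nondeterminism, which is why the hypothesis-based procedure described later merges hypotheses that reach the same state.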

FIG. 3 shows an example of a functional block diagram of a symbol string conversion device that uses one WFST.

First, in this specification, a "hypothesis" denotes one possible state transition process obtained when the symbols making up a symbol string are input (read) in order and, for the input symbol string read so far, state transitions are repeated in the WFST from the initial state according to that input symbol string.

The symbol string input unit 103 reads (acquires) the symbols making up the input symbol string one at a time from the beginning and sends them to the hypothesis expansion unit 104.

The hypothesis expansion unit 104, following the symbol acquired by the symbol string input unit 103 and the WFST read from the WFST storage unit 101, generates new hypotheses by updating the state transition process of each hypothesis in the set of hypotheses for the symbol string read so far with the newly received symbol, and sends them to the hypothesis narrowing unit 105.

The hypothesis narrowing unit 105 narrows down the set of hypotheses received from the hypothesis expansion unit 104 by deleting, among the hypotheses that have reached the same state, every hypothesis other than the one with the minimum or maximum cumulative weight. If the input symbol string has been read to the end, the hypothesis narrowing unit 105 sends the output symbol string corresponding to the hypothesis with the minimum or maximum cumulative weight to the symbol string output unit 106. If the input symbol string has not been read to the end, it sends the hypotheses to the hypothesis expansion unit 104.

The symbol string output unit 106 outputs the output symbol string received from the hypothesis narrowing unit 105 as the symbol string conversion result.

Next, an example of the procedure for converting a symbol string with this configuration is shown.

First, when a state transition of the WFST is denoted by e, n[e] is defined as its transition destination state (next state), i[e] as its input symbol, o[e] as its output symbol, and w[e] as its weight. When a hypothesis is denoted by h, s[h] is the state reached in its state transition process, W[h] is the cumulative weight of that process, and O[h] is the symbol string output in that process.

In this procedure, hypotheses are managed with a list of hypotheses (hereinafter the "hypothesis list"). Hypotheses can be inserted into and taken out of the hypothesis list. When a hypothesis is inserted, however, if the hypothesis list already contains a hypothesis that has reached the same state, only the one with the smaller (or the larger) cumulative weight is kept in the hypothesis list, which narrows down the hypotheses.
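A hypothesis list with this insertion rule can be sketched as a mapping from state to the best hypothesis seen for that state. The class and method names below are illustrative, and the sketch fixes the "smaller cumulative weight wins" convention; the "larger wins" convention would flip the comparison.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hypothesis:
    state: int     # s[h]: state reached so far
    weight: float  # W[h]: cumulative weight
    output: tuple  # O[h]: output symbols emitted so far


class HypothesisList:
    """Keeps at most one hypothesis per state: the one with the smaller weight."""

    def __init__(self):
        self._best = {}  # state -> Hypothesis

    def insert(self, hyp):
        kept = self._best.get(hyp.state)
        if kept is None or hyp.weight < kept.weight:
            self._best[hyp.state] = hyp

    def pop(self):
        """Take one hypothesis out of the list."""
        _, hyp = self._best.popitem()
        return hyp

    def __len__(self):
        return len(self._best)


hyps = HypothesisList()
hyps.insert(Hypothesis(3, 3.0, ("d", "d", "a", "a")))
hyps.insert(Hypothesis(3, 2.3, ("d", "d", "c", "b")))  # replaces the entry above
hyps.insert(Hypothesis(1, 1.3, ("d", "d", "c")))       # different state, kept
```

Keying the list by state makes the check for "a hypothesis that has reached the same state" a single dictionary lookup.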

FIG. 4 shows the symbol string conversion procedure using a WFST.

The following describes, with reference to FIG. 4, the procedure by which each unit in the example of symbol string conversion using a WFST (FIG. 3) performs its processing.

The procedure starts at step S101. As initialization, empty hypothesis lists H and H' are generated in step S102. In step S103, an initial hypothesis h is generated (h denotes a hypothesis before it is updated in the hypothesis expansion unit 104), with state s[h] = 0 (the initial state of the WFST), cumulative weight W[h] = 0, and output symbol string O[h] = φ (here φ denotes the empty symbol string), and it is inserted into the hypothesis list H.

In step S104, the symbol string input unit 103 reads one symbol of the input symbol string, assigns it to x, and outputs it to the hypothesis expansion unit 104. The subsequent steps S105 to S108 are executed in the hypothesis expansion unit 104.

In step S105, one hypothesis is taken from the hypothesis list H and assigned to h, and a list E (hereinafter also the "state transition list") of the state transitions leaving state s[h] whose input symbol equals x is prepared.

In step S106, if the state transition list E = φ (here φ denotes the empty list), the process proceeds to S110. Otherwise, the process proceeds to S107, where one state transition is taken from the state transition list E and assigned to e.

In step S108, a new hypothesis f is generated (f denotes a hypothesis after it is updated in the hypothesis expansion unit 104), with state s[f] = n[e], cumulative weight W[f] = W[h] + w[e], and output symbol string O[f] = O[h]・o[e], and it is output to the hypothesis narrowing unit 105. Here, "・" denotes the operation that concatenates two symbols or symbol strings into one symbol string.
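The update in step S108 is a pure function of the old hypothesis h and the state transition e, and can be sketched as below. The type and field names are illustrative; the two calls reproduce the first two expansions for the input a, a on the self-loop <0→0, a:d/0.5> of state 0.

```python
from typing import NamedTuple


class Transition(NamedTuple):
    next_state: int  # n[e]
    isym: str        # i[e]
    osym: str        # o[e]
    weight: float    # w[e]


class Hypothesis(NamedTuple):
    state: int       # s[h]
    weight: float    # W[h]
    output: tuple    # O[h]


def expand(h, e):
    """Step S108: build the new hypothesis f from hypothesis h and transition e."""
    return Hypothesis(
        state=e.next_state,            # s[f] = n[e]
        weight=h.weight + e.weight,    # W[f] = W[h] + w[e]
        output=h.output + (e.osym,),   # O[f] = O[h] . o[e]
    )


loop = Transition(next_state=0, isym="a", osym="d", weight=0.5)
h0 = Hypothesis(state=0, weight=0.0, output=())  # initial hypothesis (0, phi, 0)
f1 = expand(h0, loop)                            # (0, d, 0.5)
f2 = expand(f1, loop)                            # (0, dd, 1)
```

Using immutable tuples for hypotheses means an expansion never modifies h, so the same h can be expanded along several transitions when the WFST is nondeterministic.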

Step S109 is executed in the hypothesis narrowing unit 105, which narrows down the hypotheses by inserting the hypothesis f into the hypothesis list H'. For example, if the hypothesis list H' already contains a hypothesis that has reached the same state, only the one with the smaller (or the larger) cumulative weight is kept in the hypothesis list H'.

From step S109, the process returns to S106 and expands a hypothesis for the next state transition.

In step S110, if the hypothesis list H = φ (all hypotheses have been expanded), the process proceeds to S111. Otherwise, the process returns to S106 and expands the next hypothesis h.

In step S111, all elements of the newly generated hypothesis list H' are moved to the hypothesis list H, which is empty by this point, and the process proceeds to S112.

In step S112, if the symbol string input unit 103 has a next input symbol, the process returns to S104. Otherwise, the input symbol string is judged to have been read completely, and the process proceeds to S113.

In step S113, for each hypothesis in the hypothesis list H that has reached an end state, the end weight of that end state is added to the cumulative weight of the hypothesis. Then, among the hypotheses that have reached an end state, the hypothesis h with the minimum cumulative weight W[h] is selected, and the symbol string output unit 106 outputs its output symbol string O[h] as the symbol string conversion result.

In step S114, the symbol string conversion procedure using a WFST ends.
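Taken together, steps S101 to S114 amount to a loop over input symbols that expands and merges hypotheses. The sketch below is one compact way to write it under stated assumptions: the function name and data layout are invented, the minimum-weight convention is used, and the two arcs through state 2 are hypothetical values added only so that the nondeterministic case is exercised (the text does not give them).

```python
import math

# (src, input, output, weight, dst)
ARCS = [
    (0, "a", "d", 0.5, 0),
    (0, "b", "c", 0.3, 1),
    (1, "c", "b", 1.0, 3),
    (0, "b", "a", 1.0, 2),  # hypothetical arc, not taken from the text
    (2, "c", "a", 1.0, 3),  # hypothetical arc, not taken from the text
]
END_WEIGHTS = {3: 0.5}  # end state -> end weight


def convert(arcs, end_weights, symbols, initial=0):
    """Return (cumulative weight, output tuple) of the best accepting
    hypothesis, or None if no hypothesis reaches an end state."""
    # S102-S103: hypothesis list H holds the initial hypothesis (0, phi, 0).
    H = {initial: (0.0, ())}                 # state -> (W[h], O[h])
    for x in symbols:                        # S104 / S112: read next symbol
        H_new = {}                           # hypothesis list H'
        for s, (W, O) in H.items():          # S105 / S110: each hypothesis h
            for src, isym, osym, w, dst in arcs:
                if src != s or isym != x:    # S106-S107: transitions from s[h] on x
                    continue
                f = (W + w, O + (osym,))     # S108: new hypothesis f
                # S109: keep only the smaller weight per reached state.
                if dst not in H_new or f[0] < H_new[dst][0]:
                    H_new[dst] = f
        H = H_new                            # S111: move H' into H
    # S113: add end weights and pick the minimum among end-state hypotheses.
    finished = [(W + end_weights[s], O) for s, (W, O) in H.items()
                if s in end_weights]
    return min(finished, key=lambda item: item[0], default=None)


result = convert(ARCS, END_WEIGHTS, ["a", "a", "b", "c"])
```

Because hypotheses that reach the same state are merged at every symbol, the number of live hypotheses never exceeds the number of states, unlike explicit enumeration of all paths.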

[Symbol string conversion example]
Following this symbol string conversion procedure, the process of obtaining the output symbol string when the input symbol string a, a, b, c is given to the WFST of FIG. 1 is described step by step. Here, a hypothesis with current state number s, output symbol string O, and cumulative weight W is written as (s, O, W). A state transition of the WFST (current state number s, next state number n, input symbol x, output symbol y, weight w) is written as <s→n, x:y/w>. In this example, the output symbol string corresponding to the state transition process with the minimum cumulative weight is selected as the symbol string conversion result.

Ｓ１０１から開始し、Ｓ１０２で空の仮説リストＨ及びＨ’を作る。 Starting from S101, empty hypothesis lists H and H 'are created in S102.

Ｓ１０３により仮説リストＨの中の仮説（０，φ，０）を挿入する。 In S103, the hypothesis (0, φ, 0) is inserted into the hypothesis list H.

（記号“ａ”読み込み） (Reading symbol “a”)
Ｓ１０４で記号ａを読み込みｘと置く。Ｓ１０５において仮説リストＨから仮説（０，φ，０）を取り出す。この仮説の現状態０から入力記号がａに等しい状態遷移＜０→０，ａ：ｄ／０．５＞を含む状態遷移リストＥを作る（図１参照）。 In S104, the symbol a is read and set as x. In S105, the hypothesis (0, φ, 0) is taken out of the hypothesis list H. A state transition list E is created containing the state transition <0 → 0, a: d / 0.5>, which leaves the current state 0 of this hypothesis and whose input symbol equals a (see FIG. 1).

Ｓ１０６で状態遷移リストＥ＝φではないのでＳ１０７に進み、状態遷移＜０→０，ａ：ｄ／０．５＞を取り出し、Ｓ１０８で新たな仮説（０，ｄ，０．５）を生成し、Ｓ１０９で仮説リストＨ’に挿入する。 Since the state transition list E is not empty (E ≠ φ) in S106, the process proceeds to S107, where the state transition <0 → 0, a: d / 0.5> is taken out; a new hypothesis (0, d, 0.5) is generated in S108 and inserted into the hypothesis list H′ in S109.

Ｓ１０６に戻り、状態遷移リストＥ＝φであるためＳ１１０に進み、仮説リストＨ＝φであるためＳ１１１に進む。仮説リストＨ’の要素（０，ｄ，０．５）を仮説リストＨに移し、Ｓ１１２で次の入力記号が存在するのでＳ１０４に戻る。 Returning to S106, since the state transition list E = φ, the process proceeds to S110, and since the hypothesis list H = φ, the process proceeds to S111. The element (0, d, 0.5) of the hypothesis list H ′ is moved to the hypothesis list H. Since the next input symbol exists in S112, the process returns to S104.

続いて、Ｓ１０４で記号ａを読み込みｘと置く。Ｓ１０５において仮説リストＨから仮説（０，ｄ，０．５）を取り出す。この仮説の現状態０から入力記号がａに等しい状態遷移＜０→０，ａ：ｄ／０．５＞を含む状態遷移リストＥを生成する。 In step S104, the symbol a is read and set as x. In S105, a hypothesis (0, d, 0.5) is extracted from the hypothesis list H. A state transition list E including a state transition <0 → 0, a: d / 0.5> whose input symbol is equal to a is generated from the current state 0 of this hypothesis.

Ｓ１０６でＥ＝φではないのでＳ１０７に進み、状態遷移リストＥから状態遷移＜０→０，ａ：ｄ／０．５＞を取り出す。Ｓ１０８で新たな仮説（０，ｄｄ，１）を生成し、Ｓ１０９で仮説リストＨ’に挿入する。 Since E = φ is not satisfied in S106, the process proceeds to S107, and the state transition <0 → 0, a: d / 0.5> is extracted from the state transition list E. A new hypothesis (0, dd, 1) is generated in S108, and inserted into the hypothesis list H 'in S109.

Ｓ１０６に戻り、状態遷移リストＥ＝φであるためＳ１１０に進み、仮説リストＨ＝φであるためＳ１１１に進む。仮説リストＨ’の要素（０，ｄｄ，１）を仮説リストＨに移し、Ｓ１１２で次の入力記号が存在するのでＳ１０４に戻る。 Returning to S106, since the state transition list E = φ, the process proceeds to S110, and since the hypothesis list H = φ, the process proceeds to S111. The element (0, dd, 1) of the hypothesis list H ′ is moved to the hypothesis list H, and since the next input symbol exists in S112, the process returns to S104.

（記号“ｂ”読み込み） (Reading symbol “b”)
続いて、Ｓ１０４で記号ｂを読み込みｘと置く。Ｓ１０５において仮説リストＨから仮説（０，ｄｄ，１）を取り出す。この仮説の現状態０から入力記号がｂに等しい状態遷移＜０→１，ｂ：ｃ／０．３＞と＜０→２，ｂ：ｂ／１＞とを含む状態遷移リストＥを作る。 Next, in S104, the symbol b is read and set as x. In S105, the hypothesis (0, dd, 1) is taken out of the hypothesis list H. A state transition list E is created containing the state transitions <0 → 1, b: c / 0.3> and <0 → 2, b: b / 1>, which leave the current state 0 of this hypothesis and whose input symbol equals b.

Ｓ１０６で状態遷移リストＥ＝φではないのでＳ１０７に進み、状態遷移リストＥから、一つ目の状態遷移＜０→１，ｂ：ｃ／０．３＞を取り出す。Ｓ１０８で新たな仮説（１，ｄｄｃ，１．３）を生成し、Ｓ１０９で仮説リストＨ’に挿入する。 Since the state transition list E is not equal to φ in S106, the process proceeds to S107, and the first state transition <0 → 1, b: c / 0.3> is extracted from the state transition list E. A new hypothesis (1, ddc, 1.3) is generated in S108, and inserted into the hypothesis list H 'in S109.

Ｓ１０６に戻り、状態遷移リストＥ＝φではないのでＳ１０７に進み、状態遷移リストＥから二つ目の状態遷移＜０→２，ｂ：ｂ／１＞を取り出す。Ｓ１０８で新たな仮説（２，ｄｄｂ，２）を生成して、Ｓ１０９で仮説リストＨ’に挿入する。 Returning to S106, since the state transition list E is not equal to φ, the process proceeds to S107, and the second state transition <0 → 2, b: b / 1> is extracted from the state transition list E. A new hypothesis (2, ddb, 2) is generated in S108, and inserted into the hypothesis list H 'in S109.

Ｓ１０６に戻り状態遷移リストＥ＝φであるためＳ１１０に進み、仮説リストＨ＝φであるためＳ１１１に進み、仮説リストＨ’の要素（１，ｄｄｃ，１．３）と（２，ｄｄｂ，２）とは仮説リストＨに移され、Ｓ１１２で次の入力記号が存在するのでＳ１０４に戻る。 Returning to S106, since the state transition list E = φ, the process proceeds to S110, and since the hypothesis list H = φ, the process proceeds to S111, and the elements (1, ddc, 1.3) and (2, ddb, 2) of the hypothesis list H ′. ) Is moved to the hypothesis list H, and since the next input symbol exists in S112, the process returns to S104.

（記号“ｃ”読み込み） (Reading symbol “c”)
続いて、Ｓ１０４で記号ｃを読み込みｘと置く。Ｓ１０５において仮説リストＨから一つ目の仮説（１，ｄｄｃ，１．３）を取り出す。この仮説の現状態１から入力記号がｃに等しい状態遷移＜１→３，ｃ：ｂ／１＞を含む状態遷移リストＥを作る。 Next, in S104, the symbol c is read and set as x. In S105, the first hypothesis (1, ddc, 1.3) is taken out of the hypothesis list H. A state transition list E is created containing the state transition <1 → 3, c: b / 1>, which leaves the current state 1 of this hypothesis and whose input symbol equals c.

Ｓ１０６でＥ＝φではないのでＳ１０７に進み、状態遷移リストＥから状態遷移＜１→３，ｃ：ｂ／１＞を取り出す。Ｓ１０８で新たな仮説（３，ｄｄｃｂ，２．３）を生成し、Ｓ１０９で仮説リストＨ’に挿入する。 Since E is not empty in S106, the process proceeds to S107, and the state transition <1 → 3, c: b / 1> is taken out of the state transition list E. A new hypothesis (3, ddcb, 2.3), whose state is the destination state 3 of that transition, is generated in S108 and inserted into the hypothesis list H′ in S109.

Ｓ１０６に戻り、状態遷移リストＥ＝φであるためＳ１１０に進み、仮説リストＨ≠φであるためＳ１０５に戻り、仮説リストＨから二つ目の仮説（２，ｄｄｂ，２）を取り出す。この仮説の現状態２から入力記号がｃに等しい状態遷移＜２→３，ｃ：ａ／０．６＞を含む状態遷移リストＥを作る。 Returning to S106, since the state transition list E = φ, the process proceeds to S110, and since the hypothesis list H ≠ φ, the process returns to S105 to extract the second hypothesis (2, ddb, 2) from the hypothesis list H. A state transition list E including a state transition <2 → 3, c: a / 0.6> whose input symbol is equal to c is created from the current state 2 of this hypothesis.

Ｓ１０６で状態遷移リストＥ＝φではないのでＳ１０７に進み、状態遷移リストＥから状態遷移＜２→３，ｃ：ａ／０．６＞を取り出す。Ｓ１０８で新たな仮説（３，ｄｄｂａ，２．６）を生成し、Ｓ１０９で仮説リストＨ’に挿入する。このとき、仮説リストＨ’の中には既に仮説（３，ｄｄｃｂ，２．３）が含まれており、仮説（３，ｄｄｂａ，２．６）は同じ状態３に到達しているので、累積重みの小さい仮説（３，ｄｄｃｂ，２．３）を残し、仮説（３，ｄｄｂａ，２．６）は仮説リストＨ’から削除する。 Since the state transition list E is not equal to φ in S106, the process proceeds to S107, and the state transition <2 → 3, c: a / 0.6> is extracted from the state transition list E. A new hypothesis (3, ddba, 2.6) is generated at S108, and inserted into the hypothesis list H 'at S109. At this time, the hypothesis list H ′ already includes the hypothesis (3, ddcb, 2.3), and the hypothesis (3, ddba, 2.6) has reached the same state 3, so The hypothesis (3, ddcb, 2.3) having a small weight is left, and the hypothesis (3, ddba, 2.6) is deleted from the hypothesis list H ′.

Ｓ１０６に戻り、状態遷移リストＥ＝φであるため、Ｓ１１０に進み、仮説リストＨ＝φであるためＳ１１１に進む。Ｓ１１１で仮説リストＨ’の要素（３，ｄｄｃｂ，２．３）を仮説リストＨに移し、Ｓ１１２で次の入力記号が存在しないのでＳ１１３に進む。 Returning to S106, since the state transition list E = φ, the process proceeds to S110, and since the hypothesis list H = φ, the process proceeds to S111. In S111, the element (3, ddcb, 2.3) of the hypothesis list H ′ is moved to the hypothesis list H. In S112, since there is no next input symbol, the process proceeds to S113.

Ｓ１１３で、仮説リストＨ内の仮説（３，ｄｄｃｂ，２．３）の到達状態３は終了状態であるため、終了重みを加えて（３，ｄｄｃｂ，２．８）とし、この仮説が終了状態に到達した唯一の仮説であり、累積重みが最小となるので、その出力記号列ｄｄｃｂを変換結果として出力し、Ｓ１１４で記号列変換処理を終了する。 In S113, since the arrival state 3 of the hypothesis (3, ddcb, 2.3) in the hypothesis list H is an end state, the end weight is added to (3, ddcb, 2.8), and this hypothesis is an end state Is the only hypothesis that has been reached, and since the accumulated weight is the smallest, the output symbol string ddcb is output as a conversion result, and the symbol string conversion process is terminated in S114.
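The walkthrough above can be reproduced with a short Python sketch. It is illustrative only and makes several assumptions: the transition table contains only the transitions named in the walkthrough (FIG. 1 may contain others), the end weight 0.5 of state 3 is inferred from the change from 2.3 to 2.8 in S113, and the names `TRANSITIONS`, `END_WEIGHTS`, and `convert` are hypothetical.

```python
# Transitions named in the walkthrough, keyed by (state, input symbol).
# Each entry is (next state, output symbol, weight). FIG. 1 may hold more.
TRANSITIONS = {
    (0, "a"): [(0, "d", 0.5)],
    (0, "b"): [(1, "c", 0.3), (2, "b", 1.0)],
    (1, "c"): [(3, "b", 1.0)],
    (2, "c"): [(3, "a", 0.6)],
}
# Assumption: end weight inferred from 2.3 -> 2.8 in S113.
END_WEIGHTS = {3: 0.5}
INITIAL_STATE = 0


def convert(symbols):
    """Return (output string, cumulative weight) of the best hypothesis, or None."""
    # S103: hypothesis list H, stored as state -> (output string, weight).
    hyps = {INITIAL_STATE: ("", 0.0)}
    for x in symbols:                                      # S104
        new_hyps = {}                                      # hypothesis list H'
        for s, (out, w) in hyps.items():                   # S105
            for n, y, tw in TRANSITIONS.get((s, x), []):   # S106 to S108
                cand = (out + y, w + tw)
                # S109: keep only the lightest hypothesis per reached state.
                if n not in new_hyps or cand[1] < new_hyps[n][1]:
                    new_hyps[n] = cand
        hyps = new_hyps                                    # S111
    # S113: add end weights and pick the lightest finished hypothesis.
    finished = [(w + END_WEIGHTS[s], out)
                for s, (out, w) in hyps.items() if s in END_WEIGHTS]
    if not finished:
        return None
    best_weight, best_out = min(finished)
    return best_out, best_weight


result = convert(["a", "a", "b", "c"])
```

Under these assumptions, `result` holds the output string `ddcb` with a cumulative weight of about 2.8, matching the walkthrough; the competing hypothesis `ddba` is discarded in S109 because it reaches the same state 3 with the larger weight 2.6.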

このような記号列変換では、記号列の出現確率を求めるモデル（言語モデル）を利用することが多い。一般には、記号のＮ個連鎖確率に基づいて記号列の出現確率を計算するＮグラム言語モデルをＷＦＳＴに変換して利用する。 In such symbol string conversion, a model (language model) for obtaining the appearance probability of the symbol string is often used. In general, an N-gram language model that calculates the appearance probability of a symbol string based on the N-chain probability of symbols is converted into WFST and used.

＜ＲＮＮ言語モデル＞ <RNN language model>
一方、記号列の出現確率を計算する言語モデルとして、ＲＮＮ言語モデルがある。このモデルはＮグラム言語モデルとは異なり、記号のＮ個連鎖確率を用いるのではなく、記号列を初めから順に現在の一つ前の記号まで読み込んだときの記号列（全履歴）に対して、その次に出現する現在の記号の確率を予測する。 Another language model for computing the appearance probability of a symbol string is the RNN language model. Unlike the N-gram language model, this model does not use N-symbol chain probabilities; instead, it predicts the probability of the current symbol given the symbol string read in order from the beginning up to the symbol immediately before the current one (the full history).

ＲＮＮは、一つの入力層、一つ以上の中間層、および一つの出力層を持ち、少なくとも一つの中間層の中でニューロンが相互に結合された再帰結合を持つ。そして、ＲＮＮ言語モデルは、ＲＮＮに入力記号列の各記号を順次入力し、現在の一つ前の記号を表すベクトルと、その時の中間層の各ニューロンの活性度を用いて、現在の記号の出現確率を計算する。 An RNN has one input layer, one or more intermediate layers, and one output layer, and at least one intermediate layer has recurrent connections in which its neurons are connected to one another. The RNN language model feeds the symbols of the input symbol string into the RNN one at a time and computes the appearance probability of the current symbol from the vector representing the immediately preceding symbol and the activities of the intermediate-layer neurons at that time.

各層には複数のニューロンがあり、それぞれ上位や下位、もしくは同じ層にあるニューロンと結合されている。各ニューロンは、発火している度合を表す活性度（実数値）を持つ。結合されたニューロン間には結合の強さを表す結合重み（実数値）が割り当てられる。各ニューロンの活性度は、結合重みを掛けた値として結合先のニューロンに伝播される。 Each layer has a plurality of neurons that are connected to neurons in the upper, lower, or the same layer. Each neuron has an activity (real value) indicating the degree of firing. A connection weight (real value) representing the strength of the connection is assigned between the connected neurons. The activity of each neuron is propagated to the connection destination neuron as a value multiplied by the connection weight.

次に、ＲＮＮ言語モデルによって記号列の出現確率を計算する方法を説明する。 Next, a method for calculating the appearance probability of the symbol string using the RNN language model will be described.

まず、Ｌ層からなるＲＮＮがあり、１層目が入力層、２〜Ｌ−１層目が中間層、Ｌ層目が出力層である。また、ｎ番目の層（１≦ｎ≦Ｌ）にはＨ_ｎ個のニューロンが含まれるものとする。そして、ｍ番目の層のｊ番目のニューロンからｎ番目の層のｋ番目のニューロンへの結合重みをｗ_{（ｍ，ｎ）}［ｊ，ｋ］で表すものとする。但し、１≦ｍ≦ｎ≦Ｌ，１≦ｊ≦Ｈ_ｍ，１≦ｋ≦Ｈ_ｎとする。なお、本実施形態では、ＲＮＮ言語モデルは、入力層側から出力層側に（下位から上位に）向かって結合する。入力記号はベクトルで表現され、その各要素の値を入力層のニューロンの活性度とする。従って、各記号を表すベクトルの次元数と入力層のニューロンの数は同一である。例えば、入力として取りうる値の種類数を、入力記号のベクトルの次元数とし、入力に対応する要素を１とし、他の要素を０とするベクトルを入力記号とする。入力層は、そのベクトル（入力記号）の次元数と同一のニューロンを持ち、入力層の各ニューロンと入力記号の各要素とが対応し、各要素の値を対応するニューロンの活性度とする。なお、中間層の各ニューロンと入力記号の各要素とは対応しない。 First, there is an RNN composed of L layers. The first layer is an input layer, the 2nd to (L-1) th layers are intermediate layers, and the Lth layer is an output layer. In addition, it is assumed that the nth layer (1 ≦ n ≦ L) includes H _n neurons. The connection weight from the j-th neuron in the m-th layer to the k-th neuron in the n-th layer is represented by w _{(m, n)} [j, k]. However, 1 ≦ m ≦ n ≦ L, 1 ≦ j ≦ H _m , and 1 ≦ k ≦ H _n . In the present embodiment, the RNN language model is coupled from the input layer side to the output layer side (from lower to higher). The input symbol is expressed as a vector, and the value of each element is defined as the activity of the neurons in the input layer. Therefore, the number of dimensions of the vector representing each symbol is the same as the number of neurons in the input layer. For example, the number of types of values that can be taken as the input is the number of dimensions of the vector of the input symbol, an element corresponding to the input is 1 and a vector in which the other elements are 0 is the input symbol. The input layer has neurons having the same number of dimensions as the vector (input symbol), each neuron of the input layer corresponds to each element of the input symbol, and the value of each element is defined as the activity of the corresponding neuron. Note that each neuron in the intermediate layer does not correspond to each element of the input symbol.

入力記号列Ｘ＝ｘ_１，…，ｘ_ｔ，…，ｘ_Ｔがあり、その１番目から順にｔ番目の入力記号ｘ_ｔを読み込んだ時、入力層のｉ番目のニューロンの活性度ｈ_１ ^（ｔ）［ｉ］は、 Given an input symbol string X = x_1, …, x_t, …, x_T, when the symbols have been read in order from the first up to the t-th input symbol x_t, the activity h_1^{(t)}[i] of the i-th neuron in the input layer is

$$h_1^{(t)}[i] = x_t[i] \tag{1}$$

となる。なお、ｘ_ｔ［ｉ］はｔ番目の入力記号ｘ_ｔの第ｉ次元目の要素の値を表す。 Here, x_t[i] denotes the value of the i-th element of the t-th input symbol x_t.

そして、ｎ番目の層（１＜ｎ≦Ｌ）のｋ番目のニューロンの活性度ｈ_ｎ ^（ｔ）［ｋ］は、そのニューロンに結合されたｎ−１番目の層に存在する全てのニューロンに対して、その活性度に結合重みを掛けて総和をとることで次式のように計算される。 The activity h_n^{(t)}[k] of the k-th neuron in the n-th layer (1 < n ≤ L) is computed by multiplying the activity of every neuron in layer n−1 connected to that neuron by the corresponding connection weight and summing the results, as in the following equation.

$$h_n^{(t)}[k] = f\left(\sum_{j=1}^{H_{n-1}} w_{(n-1,n)}[j,k]\, h_{n-1}^{(t)}[j]\right) \tag{2}$$

ここで、ｆ（ｘ）は活性化関数と呼ばれ、通常は活性度を０と１の間に正規化するためのシグモイド関数 Here, f(x) is called the activation function; normally the sigmoid function, which normalizes the activity to a value between 0 and 1,

$$f(x) = \frac{1}{1 + \exp(-x)} \tag{3}$$

を用いる。但し、出力層の活性度を求める場合は一般に活性度を確率と見なすためにソフトマックス関数 is used. For the output layer, however, the softmax function

$$f(z_n[k]) = \frac{\exp(z_n[k])}{\sum_{k'=1}^{H_n} \exp(z_n[k'])} \tag{4}$$

が用いられる。ここで、分母は活性度を確率と見なすための正規化項であり、ｚ_ｎ［ｋ］はｋ番目のニューロンに対してｎ−１層目から結合されたニューロンの活性度の重み付き和を表し、 is generally used so that the activities can be regarded as probabilities. Here, the denominator is a normalization term that allows the activities to be regarded as probabilities, and z_n[k] denotes the weighted sum of the activities of the neurons in layer n−1 connected to the k-th neuron, computed as

$$z_n[k] = \sum_{j=1}^{H_{n-1}} w_{(n-1,n)}[j,k]\, h_{n-1}^{(t)}[j] \tag{5}$$

のように計算される。これは式（２）のシグモイド関数ｆ（）の中身と同じである。 This is the same as the argument of the sigmoid function f() in equation (2).

一方、中間層において同じ層内のニューロンとの再帰的な結合がある場合は、ｔ−１番目の記号ｘ_ｔ−１を読み込んだときの中間層における活性度ｈ_ｎ ^{（ｔ−１）}を与える。すなわち、 On the other hand, when an intermediate layer has recurrent connections to neurons in the same layer, the activity h_n^{(t−1)} of that intermediate layer, obtained when the (t−1)-th symbol x_{t−1} was read, is also supplied. That is,

$$h_n^{(t)}[k] = f\left(\sum_{j=1}^{H_{n-1}} w_{(n-1,n)}[j,k]\, h_{n-1}^{(t)}[j] + \sum_{j=1}^{H_n} w_{(n,n)}[j,k]\, h_n^{(t-1)}[j]\right) \tag{6}$$

のように、右辺のシグモイド関数（）内の第２項には、ｔ−１の添え字が付いた活性度ｈ_ｎ ^{（ｔ−１）}［ｋ］が再帰的な活性度として同じ層内のニューロンに重み付きで伝搬される。要は、中間層において同じ層内のニューロンとの再帰的な結合がある場合は、その中間層内に存在するニューロンの活性度ｈ_ｎ ^（ｔ）を式（６）により求める。 As the second term inside the sigmoid function on the right-hand side shows, the activities h_n^{(t−1)}[k] carrying the index t−1 are propagated, with weights, to the neurons in the same layer as recurrent activities. In short, when an intermediate layer has recurrent connections to neurons in the same layer, the activities h_n^{(t)} of the neurons in that layer are obtained by equation (6).

ＲＮＮ言語モデルでは、出力層の個々のニューロンは固有の記号に対応している。例えば、入力として取りうる値の種類数と出力として取りうる値の種類数とが同じ場合には、入力層と出力層のニューロンの総数を同じとし、出力層の各ニューロンを、入力層の各ニューロンに対応するものとすればよい。ＲＮＮ言語モデルでは、予測される次の記号の出現確率は、その記号に対応するニューロンの活性度として求められる。すなわち、入力記号列ｘ_１…ｘ_ｔを読み込んだ後で、記号ｖ_ｋが出現する確率は、 In the RNN language model, each neuron in the output layer corresponds to a unique symbol. For example, when the number of distinct values that can appear as input equals the number of distinct values that can appear as output, the input and output layers can be given the same number of neurons, with each output-layer neuron corresponding to an input-layer neuron. In the RNN language model, the appearance probability of the predicted next symbol is obtained as the activity of the neuron corresponding to that symbol. That is, after the input symbol string x_1 … x_t has been read, the probability that the symbol v_k appears is

$$P(v_k \mid x_1, \ldots, x_t) = h_L^{(t)}[k] \tag{7}$$

となる。但し、記号ｖ_ｋは出力層のｋ番目のニューロンに対応する記号を表す。 Here, v_k denotes the symbol corresponding to the k-th neuron in the output layer.
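The forward computation described above (one-hot input, sigmoid intermediate layer with recurrence, softmax output) can be sketched in plain Python for a three-layer network. This is a minimal illustration, not the patent's implementation: the function names, the layer sizes, and every weight value are assumptions invented for the example, and the weights are not trained.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def softmax(zs):
    m = max(zs)  # subtract the maximum for numerical stability
    exps = [math.exp(z - m) for z in zs]
    total = sum(exps)
    return [e / total for e in exps]


def one_hot(index, size):
    return [1.0 if i == index else 0.0 for i in range(size)]


def rnn_step(x, h_prev, w_in, w_rec, w_out):
    """One time step of a three-layer RNN language model.

    x:      one-hot input vector (the input-layer activities)
    h_prev: intermediate-layer activities after the previous symbol
            (all zeros when the first symbol is read)
    w_*[j][k]: connection weight from neuron j to neuron k
    Returns (intermediate activities, next-symbol probabilities).
    """
    hidden = [
        sigmoid(
            sum(w_in[j][k] * x[j] for j in range(len(x)))
            + sum(w_rec[j][k] * h_prev[j] for j in range(len(h_prev)))
        )
        for k in range(len(h_prev))
    ]
    z_out = [
        sum(w_out[j][k] * hidden[j] for j in range(len(hidden)))
        for k in range(len(w_out[0]))
    ]
    return hidden, softmax(z_out)


# Toy weights, invented purely for illustration.
VOCAB = ["A", "B", "C"]
W_IN = [[0.2, -0.4], [0.7, 0.1], [-0.3, 0.5]]   # 3 inputs  -> 2 hidden
W_REC = [[0.1, 0.3], [-0.2, 0.4]]               # 2 hidden  -> 2 hidden
W_OUT = [[0.5, -0.1, 0.2], [0.3, 0.8, -0.6]]    # 2 hidden  -> 3 outputs

h = [0.0, 0.0]
for symbol in ["B", "C"]:
    h, probs = rnn_step(one_hot(VOCAB.index(symbol), len(VOCAB)), h,
                        W_IN, W_REC, W_OUT)
```

After the loop, `probs[k]` plays the role of the probability that `VOCAB[k]` follows the history B, C, and `h` carries the history forward to the next step.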

なお、ＲＮＮのパラメタは、各ニューロンを繋ぐ結合重みであり、記号列の学習データを用いて誤差逆伝搬法を用いて推定される。 Note that the RNN parameter is a connection weight that connects the neurons, and is estimated using the error back propagation method using the learning data of the symbol string.

図５は、入力層、中間層、出力層を各一層ずつ持つＲＮＮ言語モデルを表している。ここで、中間層は自分自身に戻る再帰的な結合を持っている。入力層の各ニューロンには、入力記号の各要素の値がそれぞれ活性度として与えられる。なお、ＲＮＮ言語モデルでは、入力記号は一般に０または１の値を要素とするベクトルとして表現される。例えば、考慮する全ての記号の数（語彙サイズ）と同じだけのニューロンを用意しておき、入力記号に対応するニューロンの活性度だけが１、他のニューロンの活性度は０を取るように設定することができる。この場合、仮に考慮する入力記号の種類をＡ，Ｂ，Ｃとすると、入力層のニューロンは３つ必要であり、記号Ａ、記号Ｂ、記号Ｃに対応する入力記号（ベクトル）は、それぞれ、 FIG. 5 shows an RNN language model with one input layer, one intermediate layer, and one output layer. The intermediate layer has recurrent connections back to itself. Each neuron in the input layer is given the value of the corresponding element of the input symbol as its activity. In the RNN language model, an input symbol is generally expressed as a vector whose elements are 0 or 1. For example, as many neurons as there are symbols under consideration (the vocabulary size) can be prepared, with the activity of the neuron corresponding to the input symbol set to 1 and the activities of all other neurons set to 0. In this case, if the input symbols under consideration are A, B, and C, three input-layer neurons are needed, and the input symbols (vectors) corresponding to the symbols A, B, and C can be written, respectively, as

$$\mathrm{A}: (1, 0, 0)^{\top}, \quad \mathrm{B}: (0, 1, 0)^{\top}, \quad \mathrm{C}: (0, 0, 1)^{\top}$$

のように表せる。但し、ベクトルの１次元目がＡ、２次元目がＢ、３次元目がＣに対応するものとする。また、図５では入力層のニューロンの左側から順にベクトルの１、２、３次元目の要素が活性度になるように対応している。 Here, the first dimension of the vector corresponds to A, the second to B, and the third to C. In FIG. 5, the input-layer neurons are arranged so that, from left to right, their activities are the first, second, and third elements of the vector.

中間層には再帰的な結合があるので、各ニューロンの活性度は式（６）に従って計算する。但し、最初の記号を読み込んだとき、すなわちｔ＝１のときはｈ_ｎ ^{（ｔ−１）}［ｋ］＝０とし、ｈ_ｎ ^{（ｔ−１）}［ｋ］がない式（２）に従って活性度を計算する。出力層のニューロンの活性度は式（２）に従って計算し、活性化関数には式（４）のソフトマックス関数を用いる。出力層のニューロンは、左から順に記号Ａ，Ｂ，Ｃに対応している。 Since the intermediate layer has recurrent connections, the activity of each of its neurons is computed by equation (6). When the first symbol is read, that is, when t = 1, h_n^{(t−1)}[k] is set to 0, so the activity is computed by equation (2), which has no h_n^{(t−1)}[k] term. The activities of the output-layer neurons are computed by equation (2), using the softmax function of equation (4) as the activation function. The output-layer neurons correspond, from left to right, to the symbols A, B, and C.

ＲＮＮの中間層のニューロンの活性度は再帰的な結合により再び中間層のニューロンへ伝搬されることから、中間層のニューロンの活性度には、現在までに読み込んだ入力記号列の特徴が記憶される。従って、ＲＮＮ言語モデルは入力記号列の最初から現在までの履歴に依存した入力記号の出現確率を求めることができる。これは、過去のＮ−１個の記号のみから次の記号を予測するＮグラムモデル（Ｎは高々３か４）よりも長い文脈を考慮した記号出現確率を求めることが可能なモデルとなっている。 Because the activities of the intermediate-layer neurons are propagated back to the intermediate-layer neurons through the recurrent connections, those activities store features of the input symbol string read so far. The RNN language model can therefore obtain the appearance probability of an input symbol conditioned on the history from the beginning of the input symbol string up to the present. It is thus a model that can take a longer context into account than an N-gram model (with N at most 3 or 4), which predicts the next symbol from only the previous N−1 symbols.

なお、ＲＮＮの中間層のニューロンの総数は少なすぎると精度が悪くなることがあり、多すぎると学習が上手くいかないことがあるので、実験等により予め適切な値を求めておけばよい。例えば２００〜３００個程度に設定する。また、中間層の総数も実験等により予め適切な値を求めておけばよく、例えば、１層に設定する。 Note that if the total number of neurons in the intermediate layer of the RNN is too small, the accuracy may deteriorate, and if it is too large, learning may not be successful. Therefore, an appropriate value may be obtained in advance by experiments or the like. For example, it is set to about 200 to 300. The total number of intermediate layers may be determined in advance by an experiment or the like, and is set to one layer, for example.

＜第一実施形態＞ <First embodiment>
図６は本実施形態の記号列変換装置の機能ブロック図である。記号列入力部１０３、仮説展開部１０４、仮説絞込み部１０５、記号列出力部１０６の機能構成は図１の記号列変換装置と同様である。ＷＦＳＴを格納する格納部１０１に代えて、本実施形態では、ＲＮＮ言語モデル格納部６０７と初期状態取得部６０８、ＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９、および終了状態判定部６１０を含む。このような構成により、ＷＦＳＴを参照する代わりに、状態遷移集合取得部６０９によってＲＮＮ言語モデルから必要な部分のＷＦＳＴの状態および状態遷移を取得する処理に置き換えられている点が異なる。これにより、ＲＮＮ言語モデルを用いた効率的な記号列変換が可能である。以下、各部について説明する。 FIG. 6 is a functional block diagram of the symbol string conversion device of this embodiment. The functional configurations of the symbol string input unit 103, the hypothesis expansion unit 104, the hypothesis narrowing unit 105, and the symbol string output unit 106 are the same as in the symbol string conversion device of FIG. 1. In place of the storage unit 101 that stores the WFST, this embodiment includes an RNN language model storage unit 607, an initial state acquisition unit 608, an RNN language model WFST state transition set acquisition unit 609, and an end state determination unit 610. The difference is that, instead of referring to a stored WFST, the state transition set acquisition unit 609 obtains only the necessary portion of the WFST states and state transitions from the RNN language model. This enables efficient symbol string conversion using the RNN language model. Each unit is described below.

＜ＲＮＮ言語モデル格納部６０７＞ <RNN language model storage unit 607>
ＲＮＮ言語モデル格納部６０７には、ＲＮＮ言語モデルが格納されている。例えば、ＲＮＮの構造に関する情報（層の数、各層のニューロンの数など）やパラメタ（ニューロン間の結合重み）が記憶されている。 The RNN language model storage unit 607 stores the RNN language model. For example, it stores information on the structure of the RNN (the number of layers, the number of neurons in each layer, and so on) and its parameters (the connection weights between neurons).

＜記号列入力部１０３＞ <Symbol string input unit 103>
記号列入力部１０３は、入力記号列を構成する記号を先頭から順に一つずつ読み込み（取得し）、仮説展開部１０４に送る。 The symbol string input unit 103 reads (acquires) the symbols of the input symbol string one at a time, in order from the beginning, and sends them to the hypothesis expansion unit 104.

＜仮説展開部１０４＞ <Hypothesis expansion unit 104>
仮説展開部１０４は、記号列入力部１０３から記号ｘを受け取る。仮説展開部１０４は、記号列入力部１０３で取得した記号ｘとＷＦＳＴ格納部１０１から読み込んだＷＦＳＴに従って、これまで読み込んだ記号列に対する仮説の集合を新たに受け取った記号ｘを用いて各仮説の状態遷移過程を更新することにより新たな仮説を生成し、仮説絞込み部１０５に送る。例えば、以下のようにして仮説を展開する。 The hypothesis expansion unit 104 receives the symbol x from the symbol string input unit 103. In accordance with the symbol x acquired by the symbol string input unit 103 and the WFST read from the WFST storage unit 101, the hypothesis expansion unit 104 generates new hypotheses by using the newly received symbol x to update the state transition process of each hypothesis in the set of hypotheses for the symbol string read so far, and sends them to the hypothesis narrowing unit 105. For example, hypotheses are expanded as follows.

まだ仮説を生成していない場合（言い換えると、１番目の記号を受け取った場合）には、仮説展開部１０４は、まず、後述する初期状態取得部６０８に、ＷＦＳＴの初期状態を取得するように指示し、初期状態取得部６０８からＷＦＳＴの初期状態の状態番号を取得する。仮説展開部１０４は、次に、取得した初期状態の状態番号と記号列入力部１０３から受け取った記号ｘとを後述するＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９に送り、初期状態から記号ｘにより遷移可能な状態の集合を取得する。そして、受け取った状態遷移の集合を用いて仮説を生成する。 When no hypothesis has been generated yet (in other words, when the first symbol has been received), the hypothesis expansion unit 104 first instructs the initial state acquisition unit 608, described later, to acquire the initial state of the WFST, and obtains the state number of the initial state from the initial state acquisition unit 608. The hypothesis expansion unit 104 then sends the obtained initial state number and the symbol x received from the symbol string input unit 103 to the RNN language model WFST state transition set acquisition unit 609, described later, and obtains the set of states reachable from the initial state on the symbol x. It then generates hypotheses using the received set of state transitions.

既にこれまでに入力された記号に対応する仮説が生成されている場合（言い換えると、２番目以降の記号を受け取った場合）は、仮説展開部１０４は、記号列入力部１０３から受け取った記号ｘと、現在の仮説が到達している状態に対応する状態番号ｐとをＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９に送り、現在の仮説が到達している状態ｐから、記号列入力部１０３から受け取った新たな記号ｘにより遷移可能な状態の集合を取得する。そして、ＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９から取得した遷移可能な状態の集合を用いて、現在の仮説の状態遷移過程を更新することにより新たな仮説の集合を生成する。 When a hypothesis corresponding to a symbol input so far has already been generated (in other words, when the second and subsequent symbols are received), the hypothesis expansion unit 104 receives the symbol x received from the symbol string input unit 103. And the state number p corresponding to the state where the current hypothesis has reached are sent to the RNN language model WFST state transition set acquisition unit 609, and from the state p where the current hypothesis has arrived, from the symbol string input unit 103 A set of states that can be transitioned by the received new symbol x is acquired. Then, using the set of transitionable states acquired from the RNN language model WFST state transition set acquisition unit 609, a new hypothesis set is generated by updating the state transition process of the current hypothesis.

仮説展開部１０４は、生成した仮説の集合の各仮説についての累積重みを算出する。そして、生成した仮説の集合を終了状態判定部６１０へ送り、各仮説の到達している状態が終了状態であるか否かを判定することにより、各仮説の累積重みを更新する。具体的には、終了状態に到達している仮説の累積重みに、その終了状態の終了重みを加えることで、仮説の累積重みを更新する。 The hypothesis developing unit 104 calculates a cumulative weight for each hypothesis in the generated hypothesis set. Then, the generated hypothesis set is sent to the end state determination unit 610, and it is determined whether or not the state reached by each hypothesis is the end state, thereby updating the cumulative weight of each hypothesis. Specifically, the cumulative weight of the hypothesis is updated by adding the final weight of the final state to the cumulative weight of the hypothesis that has reached the final state.

そして、生成した仮説の集合とその累積重みを仮説絞込み部１０５へ送る。 Then, the generated hypothesis set and its accumulated weight are sent to the hypothesis narrowing unit 105.

＜初期状態取得部６０８＞ <Initial state acquisition unit 608>
初期状態取得部６０８では、ＷＦＳＴの初期状態を生成する。これは、図４のＳ１０４における初期状態の要求ｓ［ｈ］＝ｉに対応する処理であり、一つの状態番号（例えばｉ＝０）を返す。このとき、状態集合Ｑ＝｛０｝、状態数｜Ｑ｜＝１になる。そして、生成した初期状態の状態番号ｓ［ｈ］を、仮説展開部１０４を介して、ＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９に送る。なお、このとき、入力記号列の始まりを表す記号を初期状態に割り当ててもよい。 The initial state acquisition unit 608 generates the initial state of the WFST. This corresponds to the request for the initial state, s[h] = i, in S104 of FIG. 4, and returns one state number (for example, i = 0). At this point, the state set is Q = {0} and the number of states is |Q| = 1. The generated initial state number s[h] is then sent to the RNN language model WFST state transition set acquisition unit 609 via the hypothesis expansion unit 104. A symbol representing the beginning of the input symbol string may be assigned to the initial state at this time.

＜ＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９＞ <RNN language model WFST state transition set acquisition unit 609>
ＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９は、入力された状態番号ｐと入力記号ｘとを仮説展開部１０４から受け取り、これらの値を用いて、ＲＮＮ言語モデルをＷＦＳＴに変換する。本実施形態では、状態番号ｐの状態から入力記号ｘで遷移可能な状態遷移の集合Ｅを求める。 The RNN language model WFST state transition set acquisition unit 609 receives the state number p and the input symbol x from the hypothesis expansion unit 104 and uses them to convert the RNN language model into a WFST. In this embodiment, it obtains the set E of state transitions that can be taken from the state with state number p on the input symbol x.

図７は、ＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９の処理フローの例を示す。 FIG. 7 shows an example of the processing flow of the RNN language model WFST state transition set acquisition unit 609.

まずステップＳ７０１より開始し、ステップＳ７０２で状態番号ｐと現在の入力記号ｘとを取得する。なお、状態番号ｐは、遷移元となる状態（遷移元状態）に対応する。 The process starts at step S701, and in step S702 the state number p and the current input symbol x are acquired. The state number p corresponds to the state from which the transition is made (the transition source state).

ステップＳ７０３では、状態番号ｐに対応する遷移元状態から現在の入力記号ｘによる遷移先状態δ（ｐ，ｘ）が未設定か否かを判定する。 In step S703, it is determined from the transition source state corresponding to the state number p whether or not the transition destination state δ (p, x) by the current input symbol x is not set.

遷移先状態δ（ｐ，ｘ）が未設定の場合、ステップＳ７０４に進み、そこで新たな状態ｑを作成する。例えば、その新たな状態の状態番号ｑを現在の状態数｜Ｑ｜に設定し、ｑを状態集合Ｑに追加する。そして、遷移先状態δ（ｐ，ｘ）として新たな状態ｑを設定し、状態ｑに現在の入力記号ｘを割り当て、ステップＳ７０６に進む。状態ｑに割り当てた入力記号を特にｘ_ｑとも表す。 If the transition destination state δ(p, x) has not been set, the process proceeds to step S704, where a new state q is created. For example, the state number q of the new state is set to the current number of states |Q|, and q is added to the state set Q. The new state q is then set as the transition destination state δ(p, x), the current input symbol x is assigned to state q, and the process proceeds to step S706. The input symbol assigned to state q is also written x_q.

遷移先状態δ（ｐ，ｘ）が未設定でない（設定済み）場合、ステップＳ７０５に進み、状態番号ｑに遷移先状態δ（ｐ，ｘ）を代入し、ステップＳ７０８に進む。 If the transition destination state δ(p, x) has already been set, the process proceeds to step S705, where δ(p, x) is assigned to the state number q, and then to step S708.

ステップＳ７０６では、現在の入力記号ｘの出現確率が計算されているか否かを判定する。現在の入力記号ｘの出現確率は、言い換えると、ＲＮＮ言語モデルの出力層の現在の入力記号ｘに対応するニューロンの活性度ｈ_Ｌ ^（ｐ）［ｋ（ｘ）］である。ただし、ｋ（ｘ）は現在の入力記号ｘに対応するニューロンを指す番号を表すものとする。 In step S706, it is determined whether the appearance probability of the current input symbol x has been calculated. In other words, the appearance probability of the current input symbol x is the activity h _L ^(p) [k (x)] of the neuron corresponding to the current input symbol x in the output layer of the RNN language model. Here, k (x) represents a number indicating a neuron corresponding to the current input symbol x.

現在の入力記号ｘの出現確率が未計算の場合、ステップＳ７０７で、ＲＮＮ言語モデルを用いて、入力記号ｘに対応するニューロンの活性度ｈ_Ｌ ^（ｐ）［ｋ（ｘ）］を計算する。ただし、活性度ｈ_Ｌ ^（ｐ）［ｋ（ｘ）］は、式（１）と式（２）とは一部異なり、入力記号列の何番目かを表すインデックスｔの代わりに状態番号ｐに依存している。従って、 If the appearance probability of the current input symbol x has not yet been computed, then in step S707 the activity h_L^{(p)}[k(x)] of the neuron corresponding to the input symbol x is computed using the RNN language model. Unlike in equations (1) and (2), however, the activity h_L^{(p)}[k(x)] depends on the state number p rather than on the index t, which gives the position in the input symbol string. Therefore,
入力層では、ニューロンの活性度ｈ_１ ^（ｐ）［ｋ（ｘ）］を in the input layer, the neuron activities h_1^{(p)} are obtained as

$$h_1^{(p)}[i] = x_p[i] \tag{8}$$

として求める。但し、ｘ_ｐ［ｉ］は、状態番号ｐに割り当てられた入力記号ｘ_ｐの第ｉ次元目の要素を表す。 Here, x_p[i] denotes the i-th element of the input symbol x_p assigned to state number p.

再帰結合のない中間層（１＜ｎ＜Ｌ）及び出力層（ｎ＝Ｌ）では、ニューロンの活性度をｈ_ｎ ^（ｐ）［ｋ（ｘ）］を In an intermediate layer without recurrent connections (1 < n < L) and in the output layer (n = L), the neuron activities h_n^{(p)} are obtained as

$$h_n^{(p)}[k] = f\left(\sum_{j=1}^{H_{n-1}} w_{(n-1,n)}[j,k]\, h_{n-1}^{(p)}[j]\right) \tag{9}$$

として求める。ただし、活性化関数ｆ（）として、再帰結合のない中間層では式（３）のシグモイド関数を用い、出力層では式（４）のソフトマックス関数を用いる。 As the activation function f(), the sigmoid function of equation (3) is used in intermediate layers without recurrent connections, and the softmax function of equation (4) is used in the output layer.

再帰結合のある中間層（１＜ｎ＜Ｌ）では、ニューロンの活性度をｈ_ｎ ^（ｐ）［ｋ（ｘ）］を In an intermediate layer with recurrent connections (1 < n < L), the neuron activities h_n^{(p)} are obtained as

$$h_n^{(p)}[k] = f\left(\sum_{j=1}^{H_{n-1}} w_{(n-1,n)}[j,k]\, h_{n-1}^{(p)}[j] + \sum_{j=1}^{H_n} w_{(n,n)}[j,k]\, h_n^{(p^{(-1)})}[j]\right) \tag{10}$$

として求める。ただし、ｐ^（−１）は状態ｐに遷移する直前の状態（木構造の親ノード）を表し、活性化関数ｆ（）として式（３）のシグモイド関数を用いる。式（８）〜（１０）を用いて、入力記号ｘに対応する出力層のニューロンの活性度ｈ_Ｌ ^（ｐ）［ｋ（ｘ）］を求める。 Here, p^{(−1)} denotes the state immediately before the transition to state p (its parent node in the tree structure), and the sigmoid function of equation (3) is used as the activation function f(). Using equations (8) to (10), the activity h_L^{(p)}[k(x)] of the output-layer neuron corresponding to the input symbol x is obtained.

このような処理により、本実施形態におけるＲＮＮ言語モデルＷＦＳＴ状態遷移集合取得部６０９では、ただ一つの入力記号列を考慮するのではなく、あらゆる記号列を状態遷移で表すことができる。そして、ＲＮＮ言語モデルにおける記号の出現確率は入力記号列の始めから現在までの入力記号列に依存することから、木構造のＷＦＳＴとして構成される。よって、各状態が記号列の固有の履歴に対応することから、状態番号ｐは任意の記号列の任意番目の記号に一意に対応し、各ニューロンの活性度も状態番号ｐに依存して記録される。 By such processing, the RNN language model WFST state transition set acquisition unit 609 in the present embodiment can represent every symbol string as a state transition, instead of considering only one input symbol string. Since the appearance probability of the symbol in the RNN language model depends on the input symbol string from the beginning of the input symbol string to the present, it is configured as a tree-structured WFST. Therefore, since each state corresponds to a unique history of the symbol string, the state number p uniquely corresponds to the arbitrary symbol of the arbitrary symbol string, and the activity of each neuron is recorded depending on the state number p. Is done.

In step S708, a state transition is created that contains the transition source state p, the transition destination state q, the current first input symbol x, an output symbol x equal to the current first input symbol, and, as its weight, either the appearance probability h_n^(p)[k(x)] of the current first input symbol or a function that takes this probability as its argument. One such function is the negative logarithm of the appearance probability, -log(h_n^(p)[k(x)]). Thus, for example, the state transition <p→q, x:x/-log(h_n^(p)[k(x)])> is created, and a state transition set E containing it as its only element is generated. Finally, the process proceeds to step S709, where the generated set E is output to the hypothesis expansion unit 104 and the processing of the state transition set acquisition unit 609 ends.
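The creation of a single weighted transition in step S708 can be sketched as follows. This is a minimal illustration, not the patented implementation: the names `Arc`, `softmax`, `make_arc` and `index_of` are hypothetical, and the output-layer scores are toy values rather than the output of a trained RNN.

```python
import math
from typing import NamedTuple


class Arc(NamedTuple):
    src: int       # transition source state p
    dst: int       # transition destination state q
    ilabel: str    # input symbol x
    olabel: str    # output symbol (equal to the input symbol here)
    weight: float  # -log of the appearance probability


def softmax(scores):
    """Numerically stable softmax over a list of output-layer scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def make_arc(p, q, x, probs, index_of):
    """Build <p -> q, x:x / -log(prob of x)> from output-layer probabilities."""
    prob = probs[index_of[x]]
    return Arc(p, q, x, x, -math.log(prob))


# Toy example: three symbols with equal scores, i.e. a uniform distribution.
index_of = {"A": 0, "B": 1, "C": 2}   # hypothetical symbol-to-neuron map k(x)
probs = softmax([0.0, 0.0, 0.0])
E = [make_arc(0, 1, "B", probs, index_of)]  # set E with a single transition
```

With a uniform distribution over three symbols, the weight of the only transition in `E` is -log(1/3).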

Following the procedure of the present embodiment, the RNN language model of FIG. 5 is constructed as part of a tree-structured WFST such as the one shown in FIG. 8. As shown in FIG. 8, each state of the WFST corresponds to a symbol string starting from initial state 0; for example, state 9 corresponds to the symbol string B, C. In the procedure of the present embodiment, however, only the states and state transitions corresponding to the input symbol string are created, so state transitions are not created from each state for every possible symbol as they are in FIG. 8. If every symbol string were actually represented in a tree structure, the number of states would grow exponentially with the length of the symbol string (the depth of the tree), so converting the RNN language model into a WFST in advance is impractical. Because the present embodiment creates only the states and state transitions needed for symbol string conversion, the number of states does not increase rapidly. For example, when the input symbol string B, C, A is read, the state transitions corresponding to states 0, 1, 2, and 3 in FIG. 9 are created. When the input symbol string B, A is subsequently read, a transition from state 1 to state 4 is newly created. The computation is efficient because the activities need not be recomputed for the weight of a state transition that has already been created. The number of states nevertheless grows gradually as various input symbol strings are read, so states and state transitions may be deleted at suitable times to reduce the storage area.
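The on-demand construction and the reuse of already created transitions can be sketched as follows. This is a simplified illustration under stated assumptions: `LazyTreeWfst` and `score_fn` are hypothetical names, and `score_fn` stands in for the RNN computation of a transition weight from a symbol history.

```python
class LazyTreeWfst:
    """Tree-structured WFST whose states and arcs are created only when needed."""

    def __init__(self, score_fn):
        self.score_fn = score_fn   # hypothetical: (history tuple, symbol) -> weight
        self.history = {0: ()}     # state number -> symbol history; 0 is initial
        self.arcs = {}             # (source state, symbol) -> (destination, weight)
        self.score_calls = 0       # counts how often a weight is actually computed

    def step(self, p, x):
        """Return (q, weight) for symbol x from state p, creating the arc if absent."""
        key = (p, x)
        if key not in self.arcs:
            q = len(self.history)                      # next unused state number
            self.history[q] = self.history[p] + (x,)
            self.score_calls += 1
            self.arcs[key] = (q, self.score_fn(self.history[p], x))
        return self.arcs[key]

    def read(self, symbols):
        """Follow a symbol string from the initial state; return (path, total weight)."""
        p, total, path = 0, 0.0, [0]
        for x in symbols:
            p, w = self.step(p, x)
            total += w
            path.append(p)
        return path, total


wfst = LazyTreeWfst(lambda hist, x: 1.0)   # constant toy weight
path1, _ = wfst.read(["B", "C", "A"])      # creates states 1, 2, 3
path2, _ = wfst.read(["B", "A"])           # reuses 0 -> 1, creates state 4
```

Reading B, C, A and then B, A yields the paths 0, 1, 2, 3 and 0, 1, 4, and the weight function is called four times rather than five, because the transition for B out of state 0 is reused.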

<End state determination unit 810>
The end state determination unit 810 determines whether the state s[h] reached by each hypothesis h in the input set of hypotheses (hypothesis list) H is an end state, that is, whether s[h] ∈ F. For example, s[h] may be judged to be an end state if the symbol just read is the last symbol of the input symbol string, and not an end state otherwise. Alternatively, s[h] may be judged to be an end state if it is the state corresponding to the special symbol </s> that marks the end of a symbol string, and not an end state otherwise. This end state determination corresponds to the processing of S113 in FIG. 4.

<Hypothesis narrowing unit 105>
The hypothesis narrowing unit 105 narrows down the hypotheses by deleting, among hypotheses that have reached the same state, all but the one with the smallest cumulative weight. Hypotheses whose cumulative weight is relatively large among the existing hypotheses may also be deleted. For example, a threshold may be set to the minimum cumulative weight among the existing hypotheses plus a fixed value, and every hypothesis whose cumulative weight exceeds this threshold may be deleted (this narrowing method is also called "pruning"). If the input symbol string has been read to the end, the hypothesis h with the smallest cumulative weight W[h] is selected from among the hypotheses that have reached an end state, and its output symbol string O[h] is sent to the symbol string output unit 106 as the symbol string conversion result. If the input symbol string has not been read to the end, the narrowed set of hypotheses is sent to the hypothesis expansion unit 104.
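The narrowing described above (keeping the best hypothesis per state, then pruning against a threshold) can be sketched as follows. The names `Hyp`, `narrow`, `best_final` and the parameter `beam` are hypothetical and chosen for illustration only.

```python
from typing import NamedTuple


class Hyp(NamedTuple):
    state: int      # s[h]: state reached by the hypothesis
    weight: float   # W[h]: cumulative weight
    output: tuple   # O[h]: output symbol string


def narrow(hyps, beam):
    """Keep the lowest-weight hypothesis per state, then apply threshold pruning."""
    best = {}
    for h in hyps:
        if h.state not in best or h.weight < best[h.state].weight:
            best[h.state] = h
    survivors = list(best.values())
    if not survivors:
        return []
    # Threshold = minimum cumulative weight plus a fixed value (the beam).
    threshold = min(h.weight for h in survivors) + beam
    return [h for h in survivors if h.weight <= threshold]


def best_final(hyps, final_states):
    """Pick the lowest-weight hypothesis that has reached an end state, if any."""
    finals = [h for h in hyps if h.state in final_states]
    return min(finals, key=lambda h: h.weight) if finals else None


hyps = [
    Hyp(1, 2.0, ("a",)),
    Hyp(1, 3.5, ("b",)),   # same state as above, larger weight: removed
    Hyp(2, 2.5, ("c",)),
    Hyp(3, 9.0, ("d",)),   # exceeds 2.0 + 5.0: pruned
]
kept = narrow(hyps, beam=5.0)
result = best_final(kept, final_states={2})
```

Here `kept` contains the hypotheses for states 1 and 2, and `result` is the hypothesis whose output is ("c",).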

<Symbol string output unit 106>
The symbol string output unit 106 outputs the output symbol string received from the hypothesis narrowing unit 105.

<Effect>
With this configuration, a WFST can be generated directly from the RNN language model without going through an N-gram language model, and symbol string conversion can be performed based on the WFST corresponding to the RNN language model.

<Modification>
In the present embodiment, the state transition set E contains only one state transition, from which only one hypothesis is generated. The hypothesis list, the state transition list, and the like therefore need not be prepared, and processing such as S102, S106, S109, S110, and S111 in FIG. 4 may be omitted.

In the present embodiment, the RNN language model storage unit 607, the initial state acquisition unit 608, the RNN language model WFST state transition set acquisition unit 609, and the end state determination unit 610 are provided as part of the symbol string conversion device. Alternatively, these four units may be configured as a weighted finite state transducer creation device. The weighted finite state transducer creation device takes the input symbol x as input, converts the RNN language model into a WFST, and outputs the resulting WFST. For example, the RNN language model is converted into a WFST by obtaining, as in the present embodiment, the state transitions that can be taken from the state with state number p on input symbol x. The state numbers obtained by the initial state acquisition unit 608 and the RNN language model WFST state transition set acquisition unit 609 may be used as the state numbers.

The weighted finite state transducer creation device may also take the input symbol x as input, convert the RNN language model into a weighted finite state automaton (WFSA), and output the resulting WFSA. A WFSA is obtained easily by obtaining state transitions that carry no output symbol, so obtaining the WFST amounts to obtaining the WFSA at the same time. In this case, the RNN language model WFST state transition set acquisition unit may be called an RNN language model WFSA state transition set acquisition unit. Conversely, the obtained WFSA can be converted into the WFST of the present embodiment by attaching to each transition an output symbol identical to its input symbol. In other words, the obtained WFSA corresponds to a WFST in which each output symbol is identical to the input symbol.
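The correspondence between the WFSA and the WFST with identical input and output symbols can be sketched with plain tuples. The tuple layouts and the function names `wfsa_to_wfst` and `wfst_to_wfsa` are assumptions made for this illustration.

```python
def wfsa_to_wfst(arcs):
    """(src, dst, symbol, weight) -> (src, dst, input, output, weight), output = input."""
    return [(p, q, x, x, w) for (p, q, x, w) in arcs]


def wfst_to_wfsa(arcs):
    """Drop the output symbol from each transition."""
    return [(p, q, i, w) for (p, q, i, _o, w) in arcs]


wfsa = [(0, 1, "B", 1.2), (1, 2, "C", 0.7)]
wfst = wfsa_to_wfst(wfsa)
```

Because the output symbol always equals the input symbol, the round trip `wfst_to_wfsa(wfsa_to_wfst(wfsa))` returns the original list of transitions.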

The initial state may also be supplied separately (for example, manually, or as a predetermined initial state). In that case, the symbol string conversion device and the weighted finite state transducer creation device need not include the initial state acquisition unit 608. The end state determination unit 610 may likewise be provided in the symbol string conversion device or as a separate device, in which case the weighted finite state transducer creation device need not include the end state determination unit 610.

Although an RNN language model is used as the RNN model, the RNN model is not necessarily limited to a language model. In short, any RNN model that receives the symbols of an input symbol string one at a time and computes the appearance probability of the current symbol from the vector representing the immediately preceding symbol and the activities of the intermediate-layer neurons at that time can be converted into a WFST, even if it is not an RNN language model.

<Second embodiment>
The description focuses on the differences from the first embodiment. In the present embodiment, the present invention is applied to speech recognition.

FIG. 10 is a functional block diagram of the speech recognition device according to the second embodiment. As in the first embodiment, the present embodiment has an RNN language model storage unit, an initial state acquisition unit, an RNN language model WFST state transition set acquisition unit, and an end state determination unit.

That is, the RNN language model storage unit 1007, the initial state acquisition unit 1008, the RNN language model WFST state transition set acquisition unit 1009, and the end state determination unit 1010 are used to generate state transition sets of the RNN language model WFST as needed, so that symbol string conversion is performed as if the entire WFST for the RNN language model (see FIG. 8) existed.

The processing of the units other than the RNN language model storage unit 1007, the initial state acquisition unit 1008, the state transition set acquisition unit 1009, and the end state determination unit 1010 is described in detail in Patent Document 1, so only an outline is given here.

<Speech signal input unit 1003 and speech feature symbol string extraction unit 1004>
The speech signal sent from the speech signal input unit 1003, which receives speech, is converted into an acoustic feature symbol string by the speech feature symbol string extraction unit 1004, which extracts the time series of short-time acoustic patterns of the speech as a symbol string. The acoustic feature symbol string is then sent to the symbol string conversion unit 1005, which performs symbol string conversion on it.

<Symbol string conversion unit 1005>
The symbol string conversion unit 1005 includes a hypothesis expansion unit 1006, a hypothesis correction unit 1011, and a hypothesis narrowing unit 1012.

The symbol string conversion unit 1005 reads, from the acoustic model storage unit 1001, an acoustic model that holds the characteristics of the standard acoustic pattern sequences of fixed speech units (for example, phonemes) and that gives the similarity between each fixed speech unit and an arbitrary acoustic pattern.

Acoustic patterns used for speech recognition include mel-frequency cepstral coefficients (MFCC), delta MFCC, LPC cepstra, and log power, obtained by analyzing the speech signal at short intervals (for example, every 10 milliseconds).

As an acoustic model holding the standard features of various fixed speech units (for example, phonemes), the mainstream approach is the hidden Markov model method (Hidden Markov Model, hereinafter HMM), which models sets of sequences of these acoustic patterns on the basis of probability and statistical theory. Details of the HMM method are disclosed in, for example, Seiichi Nakagawa, "Speech Recognition by Probabilistic Models" (確率モデルによる音声認識), Institute of Electronics, Information and Communication Engineers. Other conventional techniques may be used as the acoustic model.

Further, the hypothesis expansion unit 1006 reads, from the word dictionary WFST storage unit 1002, the word dictionary WFST, which converts a sequence of the fixed speech units into a sequence of words having that pronunciation. The symbol string conversion unit 1005 also uses the RNN language model WFST generation unit 100 to generate state transition sets of the RNN language model WFST, reads the acoustic feature symbol string sent from the speech feature symbol string extraction unit 1004, obtains the output symbol string with the minimum or maximum cumulative weight, and sends it to the symbol string output unit 1013.

(Hypothesis expansion unit 1006)
The hypothesis expansion unit 1006 reads the symbols of the acoustic feature symbol string sent from the speech feature symbol string extraction unit 1004 one at a time. Using the word dictionary WFST, it then adds a new state transition for the acoustic feature symbol just read to each hypothesis in the current set of hypotheses, thereby expanding new hypotheses.

The scores of acoustic feature symbols (acoustic patterns) computed by the acoustic model are used as the weights of the word dictionary WFST. Because a larger score indicates that the input acoustic pattern is closer to the fixed speech unit represented by the acoustic model, the negated acoustic score is used as the weight. In computing acoustic scores with a hidden Markov model, probability values based on Gaussian distributions are used, for example.

(Hypothesis correction unit 1011)
The hypothesis correction unit 1011 receives the set of hypotheses to which new state transitions have been added. Using the initial state acquisition unit 1008, the state transition set acquisition unit 1009, and the end state determination unit 1010, it corrects the cumulative weight of each hypothesis received from the hypothesis expansion unit 1006.

Specifically, the word string output by the state transition process of each hypothesis received from the hypothesis expansion unit 1006 is treated as an input symbol string, and the same processing as in the first embodiment is performed on it. The cumulative weight corresponding to the resulting output symbol string, that is, the cumulative weight of the state transition process whose cumulative weight is smallest among the possible state transition processes, is added to the cumulative weight of the hypothesis, thereby correcting the cumulative weight of each hypothesis. In other words, the WFST corresponding to the RNN language model is used to correct the state transition weights of the word string output by the state transition process of each hypothesis received from the hypothesis expansion unit 1006. Put yet another way, the set of hypotheses is obtained by converting the acoustic feature symbol string (a symbol string different from the input symbol string of the WFST corresponding to the RNN language model) with the word dictionary WFST, which is separate from the WFST corresponding to the RNN language model that the language model WFST generation unit 100 generates partially. The hypothesis correction unit 1011 takes the symbol string output by the state transition process of each hypothesis in that set as the input symbol string of the WFST corresponding to the RNN language model and performs the same processing as in the first embodiment.
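The correction of cumulative weights can be sketched as follows. This is a rough illustration, not the patented procedure: `ToyLm` is a hypothetical stand-in for the on-demand RNN language model WFST (it assigns a uniform probability to every word), and the dictionary-based hypothesis layout with the keys `weight`, `lm_state`, `words` and `new_words` is an assumption.

```python
import math


class ToyLm:
    """Stand-in for the on-demand language model WFST (uniform toy probabilities)."""

    def __init__(self, vocab):
        self.vocab = vocab
        self.arcs = {}          # (state, word) -> (next state, weight)
        self.num_states = 1     # state 0 is the initial state

    def step(self, state, word):
        key = (state, word)
        if key not in self.arcs:
            weight = -math.log(1.0 / len(self.vocab))
            self.arcs[key] = (self.num_states, weight)
            self.num_states += 1
        return self.arcs[key]


def correct(hyps, lm):
    """Add the language model weight of newly output words to each hypothesis."""
    corrected = []
    for h in hyps:
        lm_state, weight = h["lm_state"], h["weight"]
        for word in h["new_words"]:
            lm_state, w = lm.step(lm_state, word)
            weight += w
        corrected.append({
            "weight": weight,
            "lm_state": lm_state,
            "words": h["words"] + h["new_words"],
            "new_words": [],
        })
    return corrected


lm = ToyLm(["a", "b"])
hyps = [
    {"weight": 1.0, "lm_state": 0, "words": [], "new_words": ["a"]},
    {"weight": 2.0, "lm_state": 0, "words": [], "new_words": []},
]
out = correct(hyps, lm)
```

A hypothesis that has output a new word moves to a new language model state and has its weight increased by the corresponding transition weight; a hypothesis that has output no word is left unchanged.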

(Hypothesis narrowing unit 1012)
The hypothesis narrowing unit 1012 receives the set of hypotheses generated by the hypothesis correction unit 1011 and narrows it down by keeping, among hypotheses that have reached the same state, only a predetermined number of hypotheses counted from the one with the minimum or maximum cumulative weight, and deleting the rest. If the input symbol string has been read to the end, the hypothesis with the minimum or maximum cumulative weight is selected from among the hypotheses that have reached an end state, and its output symbol string is sent to the symbol string output unit 1013 as the symbol string conversion result. If the input symbol string has not been read to the end, a predetermined number of hypotheses with large cumulative weights are deleted, and the remaining set of hypotheses is sent to the hypothesis expansion unit 1006.

The hypothesis expansion unit 1006 reads the next symbol of the acoustic feature symbol string, and the symbol string conversion unit 1005 repeats the same processing until the entire input acoustic feature symbol string has been read.

After the last acoustic feature symbol has been read, the hypothesis narrowing unit 1012 obtains the hypothesis with the minimum cumulative weight and its output symbol string and sends them to the symbol string output unit 1013.

The symbol string conversion unit 1005 thus determines, from among one or more word strings, the word string corresponding to the state transition process with the minimum or maximum cumulative corrected weight as the speech recognition result and outputs it to the symbol string output unit 1013.

<Symbol string output unit 1013>
The symbol string output unit 1013 outputs the received word string as the speech recognition result.

With this configuration, the present invention can be used for speech recognition.

<Modification>
When the RNN language model WFST state transition set acquisition unit 609 of the first embodiment obtains the state transition weight -log(h_L^(p)[k(x)]) (step S708 in FIG. 7), linear interpolation with a probability obtained from an N-gram language model may be performed. For example, the state transition weight can be calculated as

Here, P(x | x_{p^(-N+2)}, ..., x_{p^(-1)}, x_p) is the N-gram probability computed by the N-gram language model, where p^(-1) is the parent of state p as defined above and p^(-N+2) is the ancestor N-2 transitions before p. The sequence x_{p^(-N+2)}, ..., x_{p^(-1)}, x_p is the sequence of symbols assigned to the last N-1 states of the state transition process leading to state p. The N-gram probability thus depends on the symbol string of length N-1 immediately preceding the symbol x. λ is a coefficient that balances the symbol appearance probability obtained from the RNN language model against the N-gram probability obtained from the N-gram language model, with 0 ≤ λ ≤ 1.
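A plausible form of the interpolated weight, consistent with the description above, is sketched below. Which of the two probabilities is multiplied by λ and which by 1-λ is an assumption made here; the text states only that λ balances the two. The symbol w for the resulting weight is likewise introduced for illustration.

```latex
% Assumption: \lambda multiplies the RNN probability and (1-\lambda) the N-gram
% probability; the opposite assignment is equally compatible with the text.
\[
  w = -\log\Bigl\{ \lambda\, h_L^{(p)}[k(x)]
      + (1-\lambda)\, P\bigl(x \mid x_{p^{(-N+2)}}, \ldots, x_{p^{(-1)}}, x_p\bigr) \Bigr\}
\]
```

The interpolation is taken between probabilities, inside the logarithm. For λ = 0.5 the two possible assignments of λ coincide.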

<Experimental results>
A speech recognition device was built in the form shown in FIG. 10. For the acoustic model, HMMs were prepared for 51 phonemes, each with three states. Each state was assigned one of 2,546 probability density distributions of acoustic patterns according to the context of the phoneme (which phoneme precedes it and which follows it). The ID numbers of these probability density distributions were used as the fixed speech units.

The sequence of acoustic patterns of the speech signal is extracted as an input sequence of 39-dimensional vectors combining the following: 12 MFCC dimensions obtained by analyzing the speech signal every 10 milliseconds; 12 delta MFCC dimensions, which are the first-order regression coefficients of each MFCC dimension along the time axis over the two preceding and two following frames; 12 delta-delta MFCC dimensions, which are the first-order regression coefficients of each dimension along the time axis over the two preceding and two following frames; and log power.
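The first-order regression coefficient over two preceding and two following frames can be sketched as follows. The formula used is the common delta-feature regression; the function name `delta` and the handling of the first and last frames (repeating the edge frame) are assumptions not stated in the text.

```python
def delta(frames, window=2):
    """First-order regression coefficients over +/- `window` frames.

    frames: list of feature vectors (lists of floats), one per analysis frame.
    Edge frames are handled by repeating the first/last frame (an assumption).
    """
    denom = 2 * sum(k * k for k in range(1, window + 1))  # 10 when window == 2
    num_frames = len(frames)
    out = []
    for t in range(num_frames):
        row = []
        for d in range(len(frames[t])):
            num = 0.0
            for k in range(1, window + 1):
                prev = frames[max(t - k, 0)][d]
                nxt = frames[min(t + k, num_frames - 1)][d]
                num += k * (nxt - prev)
            row.append(num / denom)
        out.append(row)
    return out


# Toy one-dimensional feature that rises by 1 per frame.
ramp = [[float(t)] for t in range(6)]
d1 = delta(ramp)        # delta features
d2 = delta(d1)          # delta-delta features: the same regression applied again
```

For a feature that increases by one per frame, the regression coefficient away from the edges equals the slope, 1.0.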

Using 100,000 words and their pronunciations as the dictionary, a WFST that converts a sequence of fixed speech units into a word string was built.

The RNN language model was trained on word strings transcribed from 104 lectures of the Massachusetts Institute of Technology English lecture corpus. An N-gram language model (N = 3) was trained in the same way, and during speech recognition the state transition weights of the WFST were obtained with Equation (11) of the modification described above, with λ set to 0.5.

FIG. 11 shows the word error rate, recognition processing time, and delay time for three methods: a speech recognition method using only the N-gram language model (conventional method (1)); a speech recognition method in which recognition with the N-gram language model outputs up to 1000 candidate word strings per utterance, each candidate is rescored with the RNN language model, and the candidate with the highest score is reselected (conventional method (2)); and the speech recognition method of the present embodiment.

The recognition processing time is measured as a real-time factor, that is, the recognition processing time divided by the duration of the speech actually spoken; smaller values indicate faster processing. It was obtained by running speech recognition on an Intel Xeon X5570 2.54GHz processor and measuring the time taken to recognize 8 lectures (7.8 hours in total). The word error rate is the proportion of actually spoken words that were recognized incorrectly; smaller values indicate higher recognition accuracy. The delay time is the time from the end of input of each utterance until the result is output, averaged over all recognized utterances.
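The two measures can be sketched as follows. The word error rate here is computed with the usual edit-distance definition (substitutions, deletions and insertions divided by the number of reference words), which is an assumption: the text describes the measure only as the proportion of misrecognized words. The function names are hypothetical.

```python
def real_time_factor(processing_seconds, audio_seconds):
    """Recognition processing time divided by the duration of the speech."""
    return processing_seconds / audio_seconds


def word_error_rate(reference, hypothesis):
    """Edit distance between word lists, divided by the reference length."""
    ref_len, hyp_len = len(reference), len(hypothesis)
    prev = list(range(hyp_len + 1))          # distances for an empty reference
    for i in range(1, ref_len + 1):
        cur = [i] + [0] * hyp_len
        for j in range(1, hyp_len + 1):
            substitution = prev[j - 1] + (reference[i - 1] != hypothesis[j - 1])
            deletion = prev[j] + 1
            insertion = cur[j - 1] + 1
            cur[j] = min(substitution, deletion, insertion)
        prev = cur
    return prev[hyp_len] / ref_len


rtf = real_time_factor(45.0, 100.0)
wer = word_error_rate("a b c d".split(), "a x c".split())
```

In the toy example, one substitution and one deletion against four reference words give a word error rate of 0.5.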

In the results of FIG. 11, conventional method (1) uses only the N-gram language model, and its word error rate of 26.8% is higher than the 24.7% of conventional method (2) and of the present embodiment, both of which use the RNN language model. The recognition processing time (real-time factor) is smallest for conventional method (1) (0.38), followed by the present embodiment (0.45) and then conventional method (2) (0.58). The delay time of conventional method (2) is 0.36 seconds, because it must output multiple candidates and rescore them with the RNN language model. In contrast, the present embodiment keeps the delay time to a much smaller 0.02 seconds. These results show that the present embodiment reduces the word error rate by using the RNN language model while requiring less recognition processing time and a far smaller delay time than conventional method (2), which performs rescoring.

<Other modifications>
The hypothesis expansion unit 1006 and the hypothesis correction unit 1011 may together be called a state transition composition unit. The state transition composition unit can be said to compose the state transitions of the WFST based on the RNN language model with the state transitions of the word dictionary WFST and to generate a set of composed state transitions. In that case, the symbol string conversion unit 1005 can be said to convert the acoustic feature symbol string into a word string by referring to the set of composed state transitions.

The configuration of the present embodiment is not limited to a speech recognition device and can be used as a symbol string conversion device that converts an input symbol string into an output symbol string. In short, the state transition composition unit composes the state transitions of the WFST based on the RNN model with the state transitions of another WFST and generates a set of composed state transitions. The symbol string conversion unit refers to the set of composed state transitions and converts an input symbol string, which is different from the input symbol string of the WFST based on the RNN model, into the output symbol string of the WFST based on the RNN model.

The present invention is not limited to the embodiments and modifications described above. For example, the various processes described above need not be executed in time series in the order described; they may be executed in parallel or individually according to the processing capability of the device executing them or as needed. Other changes may be made as appropriate without departing from the spirit of the present invention.

<Program and recording medium>
The various processing functions of the devices described in the above embodiments and modifications may be realized by a computer. In that case, the processing content of the functions that each device should have is described by a program, and the various processing functions of each device are realized on the computer by executing this program on the computer.

この処理内容を記述したプログラムは、コンピュータで読み取り可能な記録媒体に記録しておくことができる。コンピュータで読み取り可能な記録媒体としては、例えば、磁気記録装置、光ディスク、光磁気記録媒体、半導体メモリ等どのようなものでもよい。 The program describing the processing contents can be recorded on a computer-readable recording medium. As the computer-readable recording medium, for example, any recording medium such as a magnetic recording device, an optical disk, a magneto-optical recording medium, and a semiconductor memory may be used.

また、このプログラムの流通は、例えば、そのプログラムを記録したＤＶＤ、ＣＤ−ＲＯＭ等の可搬型記録媒体を販売、譲渡、貸与等することによって行う。さらに、このプログラムをサーバコンピュータの記憶装置に格納しておき、ネットワークを介して、サーバコンピュータから他のコンピュータにそのプログラムを転送することにより、このプログラムを流通させてもよい。 The program is distributed by selling, transferring, or lending a portable recording medium such as a DVD or CD-ROM in which the program is recorded. Further, the program may be distributed by storing the program in a storage device of the server computer and transferring the program from the server computer to another computer via a network.

このようなプログラムを実行するコンピュータは、例えば、まず、可搬型記録媒体に記録されたプログラムもしくはサーバコンピュータから転送されたプログラムを、一旦、自己の記憶部に格納する。そして、処理の実行時、このコンピュータは、自己の記憶部に格納されたプログラムを読み取り、読み取ったプログラムに従った処理を実行する。また、このプログラムの別の実施形態として、コンピュータが可搬型記録媒体から直接プログラムを読み取り、そのプログラムに従った処理を実行することとしてもよい。さらに、このコンピュータにサーバコンピュータからプログラムが転送されるたびに、逐次、受け取ったプログラムに従った処理を実行することとしてもよい。また、サーバコンピュータから、このコンピュータへのプログラムの転送は行わず、その実行指示と結果取得のみによって処理機能を実現する、いわゆるＡＳＰ（Application Service Provider）型のサービスによって、上述の処理を実行する構成としてもよい。なお、プログラムには、電子計算機による処理の用に供する情報であってプログラムに準ずるもの（コンピュータに対する直接の指令ではないがコンピュータの処理を規定する性質を有するデータ等）を含むものとする。 A computer that executes such a program first stores, for example, the program recorded on a portable recording medium or the program transferred from a server computer in its own storage unit. When executing the processing, the computer reads the program stored in its own storage unit and executes processing according to the read program. As another embodiment of this program, the computer may read the program directly from the portable recording medium and execute processing according to it. Further, each time a program is transferred from the server computer to this computer, the computer may sequentially execute processing according to the received program. Alternatively, the above-described processing may be executed by a so-called ASP (Application Service Provider) type service, which does not transfer the program from the server computer to this computer and realizes the processing functions only through execution instructions and result acquisition. Note that the program includes information that is provided for processing by an electronic computer and is equivalent to a program (for example, data that is not a direct command to the computer but has the property of defining the computer's processing).

また、コンピュータ上で所定のプログラムを実行させることにより、各装置を構成することとしたが、これらの処理内容の少なくとも一部をハードウェア的に実現することとしてもよい。 In addition, although each device is configured by executing a predetermined program on a computer, at least a part of these processing contents may be realized by hardware.

Claims

一つの入力層、一つ以上の中間層、および一つの出力層を持ち、少なくとも一つの中間層の中でニューロンが相互に結合された再帰結合を持つモデルをリカレントニューラルネットワーク（以下、ＲＮＮと呼ぶ）とし、ＲＮＮに入力される記号を表すベクトルを第一入力記号とし、最初から現在の一つ前までの第一入力記号の系列である第一入力記号列に対して、現在の第一入力記号の出現確率分布を出力するＲＮＮモデルがＲＮＮモデル格納部に格納されているものとし、
ＲＮＮモデルＷＦＳＡ状態遷移集合取得部が、変化しうる有限の状態と、入力による状態の遷移を表現する重み付き有限状態オートマトン(以下ＷＦＳＡともいう)である第一ＷＦＳＡにＲＮＮモデルを変換するＲＮＮモデルＷＦＳＡ状態遷移集合取得ステップを含み、
前記ＲＮＮモデルＷＦＳＡ状態遷移集合取得ステップは、
遷移元状態となる状態と現在の第一入力記号とを取得するステップと、
前記遷移元状態から前記現在の第一入力記号による遷移先状態が未設定の場合、新たな状態を作成し、遷移先状態として新たに作成した状態を設定し、新たに作成した状態に前記現在の第一入力記号を割り当てるステップと、
前記遷移元状態から前記現在の第一入力記号による遷移先状態が未設定であって、かつ、前記現在の第一入力記号の出現確率が計算されていない場合、前記ＲＮＮモデルを用いて、前記現在の第一入力記号の出現確率を計算するステップと、
前記遷移元状態、前記遷移先状態、前記現在の第一入力記号、前記現在の第一入力記号の出現確率もしくはそれを引数に取る関数を重みとして含む状態遷移を作成するステップとを含む、
重み付き有限状態オートマトン作成方法。 A weighted finite state automaton creation method, wherein a model that has one input layer, one or more intermediate layers, and one output layer, and that has recurrent connections in which neurons are mutually connected within at least one intermediate layer, is referred to as a recurrent neural network (hereinafter referred to as RNN), a vector representing a symbol input to the RNN is referred to as a first input symbol, and an RNN model that outputs an appearance probability distribution of the current first input symbol for a first input symbol string, which is the sequence of first input symbols from the first one up to the one immediately preceding the current one, is stored in an RNN model storage unit,
the method including an RNN model WFSA state transition set acquisition step in which an RNN model WFSA state transition set acquisition unit converts the RNN model into a first WFSA, which is a weighted finite state automaton (hereinafter also referred to as WFSA) that represents a finite number of states that can change and the state transitions caused by inputs,
wherein the RNN model WFSA state transition set acquisition step includes:
a step of acquiring a state serving as a transition source state and the current first input symbol;
a step of, when no transition destination state is set for the current first input symbol from the transition source state, creating a new state, setting the newly created state as the transition destination state, and assigning the current first input symbol to the newly created state;
a step of, when no transition destination state is set for the current first input symbol from the transition source state and the appearance probability of the current first input symbol has not been calculated, calculating the appearance probability of the current first input symbol using the RNN model; and
a step of creating a state transition that includes the transition source state, the transition destination state, the current first input symbol, and, as a weight, the appearance probability of the current first input symbol or a function that takes the appearance probability as an argument.
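The on-demand state creation recited in the claim above can be illustrated with a minimal Python sketch. This is not the patent's implementation: `ToyRNN`, its fixed pseudo-weights, the four-symbol `VOCAB`, the `State` fields, and the choice of negative log probability as the weight function are all assumptions introduced for illustration. Each state keeps the symbol assigned to it and the hidden vector needed to evaluate the RNN, and the next-symbol distribution is computed only the first time a transition out of that state is requested.

```python
import math
from dataclasses import dataclass, field

VOCAB = ["<s>", "a", "b", "</s>"]  # hypothetical symbol set


class ToyRNN:
    """Stand-in for a trained RNN model (weights are arbitrary, not learned)."""

    def __init__(self, hidden_size=3):
        v = len(VOCAB)
        self.w_in = [[math.sin(i + 2 * j) for j in range(v)] for i in range(hidden_size)]
        self.w_rec = [[math.cos(i - j) / 2 for j in range(hidden_size)] for i in range(hidden_size)]
        self.w_out = [[math.sin(3 * i + j) for j in range(hidden_size)] for i in range(v)]
        self.h0 = [0.0] * hidden_size

    def step(self, hidden, symbol):
        """Feed one symbol; return (new hidden vector, next-symbol distribution)."""
        k = VOCAB.index(symbol)
        new_h = [
            math.tanh(self.w_in[i][k] + sum(r * h for r, h in zip(self.w_rec[i], hidden)))
            for i in range(len(hidden))
        ]
        logits = [sum(o * h for o, h in zip(row, new_h)) for row in self.w_out]
        z = sum(math.exp(x) for x in logits)
        return new_h, {s: math.exp(x) / z for s, x in zip(VOCAB, logits)}


@dataclass
class State:
    symbol: str          # symbol assigned to this state
    prev_hidden: list    # hidden vector before this symbol is fed to the RNN
    hidden: list = None  # hidden vector after feeding the symbol (lazy)
    probs: dict = None   # next-symbol distribution at this state (lazy)
    arcs: dict = field(default_factory=dict)  # symbol -> (destination id, weight)


class LazyWfsa:
    def __init__(self, rnn):
        self.rnn = rnn
        self.states = [State("<s>", rnn.h0)]  # state 0 is the initial state
        self.rnn_calls = 0

    def transition(self, src, symbol):
        """Return (destination state id, weight), creating both on demand."""
        s = self.states[src]
        if symbol in s.arcs:          # destination already set: reuse it
            return s.arcs[symbol]
        if s.probs is None:           # distribution not yet calculated: run the RNN once
            s.hidden, s.probs = self.rnn.step(s.prev_hidden, s.symbol)
            self.rnn_calls += 1
        self.states.append(State(symbol, s.hidden))  # new destination state
        dst = len(self.states) - 1
        weight = -math.log(s.probs[symbol])          # a function of the probability
        s.arcs[symbol] = (dst, weight)
        return s.arcs[symbol]


wfsa = LazyWfsa(ToyRNN())
q1, w1 = wfsa.transition(0, "a")   # creates state 1 and evaluates the RNN
q2, w2 = wfsa.transition(0, "b")   # creates state 2, reuses the cached distribution
```

In this sketch the second call from state 0 does not invoke the RNN again, because the distribution stored on the source state already covers every symbol; only the set of explicit states and transitions grows.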

請求項１の重み付き有限状態オートマトン作成方法で作成した第一ＷＦＳＡに、前記現在の第一入力記号に等しい出力記号を付加した重み付き有限状態変換器である第一ＷＦＳＴを用いる記号列変換方法であって、
記号列変換部が、前記第一入力記号とは別の入力記号の系列を、出力記号の系列に変換する記号列変換ステップを含み、
前記記号列変換ステップは、
状態遷移合成部が、前記第一ＷＦＳＴの状態遷移と、前記第一ＷＦＳＴとは別のＷＦＳＴである第二ＷＦＳＴの状態遷移とを合成し、合成した状態遷移の集合を生成する状態遷移合成ステップを含み、
前記合成した状態遷移の集合を参照して、前記第一入力記号とは別の入力記号である第二入力記号の系列を、前記出力記号の系列に変換する、
記号列変換方法。 A symbol string conversion method using a first WFST, which is a weighted finite state transducer obtained by adding an output symbol equal to the current first input symbol to the first WFSA created by the weighted finite state automaton creation method of claim 1,
the method including a symbol string conversion step in which a symbol string conversion unit converts a sequence of input symbols different from the first input symbols into a sequence of output symbols,
wherein the symbol string conversion step
includes a state transition synthesis step in which a state transition synthesis unit synthesizes the state transitions of the first WFST with the state transitions of a second WFST, which is a WFST different from the first WFST, and generates a set of synthesized state transitions, and
refers to the set of synthesized state transitions to convert a sequence of second input symbols, which are input symbols different from the first input symbols, into the sequence of output symbols.

請求項２の記号列変換方法を用いる音声認識方法であって、
前記ＲＮＮモデルはＲＮＮ言語モデルであり、
前記第二ＷＦＳＴは単語辞書ＷＦＳＴであり、
前記第二入力記号の系列は、音響特徴記号列であり、
前記状態遷移合成ステップは、
仮説展開部が、前記単語辞書ＷＦＳＴを用いて、音響特徴記号列の音響特徴記号から現在の仮設の集合の各々に新しい状態遷移を追加し新たな仮説を展開する仮説展開ステップと、
仮説補正部が、新たな仮説の状態遷移過程から出力される単語列を前記第一入力記号列とし、前記第一ＷＦＳＴを用いて、前記単語列の状態遷移の重みを補正する仮説補正ステップと、を含み、
前記記号列変換ステップは、
１つ以上の単語列の中から補正後の重みの累積重みが最小または最大の状態遷移過程に対応する単語列を音声認識結果として決定する、
音声認識方法。 A speech recognition method using the symbol string conversion method of claim 2, wherein
the RNN model is an RNN language model,
the second WFST is a word dictionary WFST,
the sequence of second input symbols is an acoustic feature symbol string,
the state transition synthesis step includes:
a hypothesis development step in which a hypothesis development unit uses the word dictionary WFST to add, based on an acoustic feature symbol of the acoustic feature symbol string, a new state transition to each hypothesis in the current set of hypotheses and thereby develops new hypotheses; and
a hypothesis correction step in which a hypothesis correction unit takes the word string output from the state transition process of a new hypothesis as the first input symbol string and corrects the weights of the state transitions of the word string using the first WFST, and
the symbol string conversion step
determines, as the speech recognition result, the word string, among one or more word strings, that corresponds to the state transition process whose accumulated corrected weight is the minimum or the maximum.
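The hypothesis development and hypothesis correction steps of the claim above can be sketched as a small beam search. Everything concrete here is an assumption made for illustration: the phone-like acoustic symbols, the two-state `LEXICON`, and in particular `TOY_LM`, a fixed table keyed by the previous word that merely stands in for the RNN-based first WFST. Weights are negative log probabilities, so the result is the hypothesis with the minimum accumulated weight.

```python
import math

EPS = ""  # marks an arc that outputs no word

# Hypothetical word dictionary WFST (the "second WFST"):
# state -> list of (acoustic symbol, output word or EPS, weight, next state).
# State 0 is both the initial state and the word-boundary (final) state.
LEXICON = {
    0: [("ay", "I", 0.0, 0), ("ay", "eye", 0.0, 0), ("s", EPS, 0.0, 1)],
    1: [("iy", "see", 0.0, 0), ("iy", "sea", 0.0, 0)],
}

# Stand-in for the RNN-based first WFST: probability of a word given the
# language model state, which in this toy is simply the previous word.
TOY_LM = {
    ("<s>", "I"): 0.6, ("<s>", "eye"): 0.1,
    ("I", "see"): 0.5, ("I", "sea"): 0.05,
    ("eye", "see"): 0.1, ("eye", "sea"): 0.1,
}


def lm_step(lm_state, word):
    """Return (next language model state, weight) for one word."""
    prob = TOY_LM.get((lm_state, word), 1e-4)  # small floor for unseen pairs
    return word, -math.log(prob)


def decode(symbols, beam=10):
    """Return the best word tuple for the symbol sequence, or None."""
    # A hypothesis is (accumulated weight, lexicon state, LM state, words).
    hyps = [(0.0, 0, "<s>", ())]
    for sym in symbols:
        new_hyps = []
        for weight, lex_state, lm_state, words in hyps:
            # Hypothesis development: follow matching dictionary arcs.
            for in_sym, word, arc_w, nxt in LEXICON[lex_state]:
                if in_sym != sym:
                    continue
                w, new_lm, new_words = weight + arc_w, lm_state, words
                if word != EPS:
                    # Hypothesis correction: add the language model weight
                    # as soon as the dictionary outputs a word.
                    new_lm, lm_w = lm_step(lm_state, word)
                    w += lm_w
                    new_words = words + (word,)
                new_hyps.append((w, nxt, new_lm, new_words))
        hyps = sorted(new_hyps)[:beam]  # keep the lowest-weight hypotheses
    finished = [h for h in hyps if h[1] == 0]  # must end at a word boundary
    return min(finished)[3] if finished else None


print(decode(["ay", "s", "iy"]))  # prints ('I', 'see')
```

The dictionary alone cannot separate the homophones, since all of its arcs carry weight 0.0; the ranking comes entirely from the correction weights, which is the role the claim assigns to the first WFST.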

一つの入力層、一つ以上の中間層、および一つの出力層を持ち、少なくとも一つの中間層の中でニューロンが相互に結合された再帰結合を持つモデルをリカレントニューラルネットワーク（以下、ＲＮＮと呼ぶ）とし、ＲＮＮに入力される記号を表すベクトルを第一入力記号とし、
最初から現在の一つ前までの第一入力記号の系列である第一入力記号列に対して、現在の第一入力記号の出現確率分布を出力するＲＮＮモデルが格納されるＲＮＮモデル格納部と、
変化しうる有限の状態と、入力による状態の遷移を表現する重み付き有限状態オートマトン(以下ＷＦＳＡともいう)である第一ＷＦＳＡにＲＮＮモデルを変換するＲＮＮモデルＷＦＳＡ状態遷移集合取得部とを含み、
前記ＲＮＮモデルＷＦＳＡ状態遷移集合取得部は、
遷移元状態となる状態と現在の第一入力記号とを取得し、
前記遷移元状態から前記現在の第一入力記号による遷移先状態が未設定の場合、新たな状態を作成し、遷移先状態として新たに作成した状態を設定し、新たに作成した状態に前記現在の第一入力記号を割り当て、
前記遷移元状態から前記現在の第一入力記号による遷移先状態が未設定であって、かつ、前記現在の第一入力記号の出現確率が計算されていない場合、前記ＲＮＮモデルを用いて、前記現在の第一入力記号の出現確率を計算し、
前記遷移元状態、前記遷移先状態、前記現在の第一入力記号、前記現在の第一入力記号の出現確率もしくはそれを引数に取る関数を重みとして含む状態遷移を作成する、
重み付き有限状態オートマトン作成装置。 A weighted finite state automaton creation device, wherein a model that has one input layer, one or more intermediate layers, and one output layer, and that has recurrent connections in which neurons are mutually connected within at least one intermediate layer, is referred to as a recurrent neural network (hereinafter referred to as RNN), and a vector representing a symbol input to the RNN is referred to as a first input symbol, the device including:
an RNN model storage unit that stores an RNN model that outputs an appearance probability distribution of the current first input symbol for a first input symbol string, which is the sequence of first input symbols from the first one up to the one immediately preceding the current one; and
an RNN model WFSA state transition set acquisition unit that converts the RNN model into a first WFSA, which is a weighted finite state automaton (hereinafter also referred to as WFSA) that represents a finite number of states that can change and the state transitions caused by inputs,
wherein the RNN model WFSA state transition set acquisition unit
acquires a state serving as a transition source state and the current first input symbol,
when no transition destination state is set for the current first input symbol from the transition source state, creates a new state, sets the newly created state as the transition destination state, and assigns the current first input symbol to the newly created state,
when no transition destination state is set for the current first input symbol from the transition source state and the appearance probability of the current first input symbol has not been calculated, calculates the appearance probability of the current first input symbol using the RNN model, and
creates a state transition that includes the transition source state, the transition destination state, the current first input symbol, and, as a weight, the appearance probability of the current first input symbol or a function that takes the appearance probability as an argument.

請求項４の重み付き有限状態オートマトン作成装置で作成した第一ＷＦＳＡに、前記現在の第一入力記号に等しい出力記号を付加した重み付き有限状態変換器である第一ＷＦＳＴを用いる記号列変換装置であって、
前記第一入力記号とは別の入力記号の系列を、出力記号の系列に変換する記号列変換部を含み、
前記記号列変換部は、
前記第一ＷＦＳＴの状態遷移と、前記第一ＷＦＳＴとは別のＷＦＳＴである第二ＷＦＳＴの状態遷移とを合成し、合成した状態遷移の集合を生成する状態遷移合成部を含み、
前記合成した状態遷移の集合を参照して、前記第一入力記号とは別の入力記号である第二入力記号の系列を、前記出力記号の系列に変換する、
記号列変換装置。 A symbol string conversion device using a first WFST, which is a weighted finite state transducer obtained by adding an output symbol equal to the current first input symbol to the first WFSA created by the weighted finite state automaton creation device of claim 4,
the device including a symbol string conversion unit that converts a sequence of input symbols different from the first input symbols into a sequence of output symbols,
wherein the symbol string conversion unit
includes a state transition synthesis unit that synthesizes the state transitions of the first WFST with the state transitions of a second WFST, which is a WFST different from the first WFST, and generates a set of synthesized state transitions, and
refers to the set of synthesized state transitions to convert a sequence of second input symbols, which are input symbols different from the first input symbols, into the sequence of output symbols.

請求項５の記号列変換装置を用いる音声認識装置であって、
前記ＲＮＮモデルはＲＮＮ言語モデルであり、
前記第二ＷＦＳＴは単語辞書ＷＦＳＴであり、
前記第二入力記号の系列は、音響特徴記号列であり、
前記状態遷移合成部は、
仮説展開部が、前記単語辞書ＷＦＳＴを用いて、音響特徴記号列の音響特徴記号から現在の仮設の集合の各々に新しい状態遷移を追加し新たな仮説を展開する仮説展開部と、
仮説補正部が、新たな仮説の状態遷移過程から出力される単語列を前記第一入力記号列とし、前記第一ＷＦＳＴを用いて、前記単語列の状態遷移の重みを補正する仮説補正部と、を含み、
前記記号列変換部は、
１つ以上の単語列の中から補正後の重みの累積重みが最小または最大の状態遷移過程に対応する単語列を音声認識結果として決定する、
音声認識装置。 A speech recognition device using the symbol string conversion device of claim 5, wherein
the RNN model is an RNN language model,
the second WFST is a word dictionary WFST,
the sequence of second input symbols is an acoustic feature symbol string,
the state transition synthesis unit includes:
a hypothesis development unit that uses the word dictionary WFST to add, based on an acoustic feature symbol of the acoustic feature symbol string, a new state transition to each hypothesis in the current set of hypotheses and thereby develops new hypotheses; and
a hypothesis correction unit that takes the word string output from the state transition process of a new hypothesis as the first input symbol string and corrects the weights of the state transitions of the word string using the first WFST, and
the symbol string conversion unit
determines, as the speech recognition result, the word string, among one or more word strings, that corresponds to the state transition process whose accumulated corrected weight is the minimum or the maximum.

請求項１の重み付き有限状態オートマトン作成方法、または、請求項２の記号列変換方法、または、請求項３の音声認識方法の各ステップをコンピュータに実行させるためのプログラム。
A program for causing a computer to execute the steps of the weighted finite state automaton creation method of claim 1, the symbol string conversion method of claim 2, or the speech recognition method of claim 3.