JP6674876B2

JP6674876B2 - Correction device, correction method, and correction program

Info

Publication number: JP6674876B2
Application number: JP2016184754A
Authority: JP
Inventors: 智恵子西澤; 豊明渡
Original assignee: Toyota Technical Development Corp
Current assignee: Toyota Technical Development Corp
Priority date: 2016-09-21
Filing date: 2016-09-21
Publication date: 2020-04-01
Anticipated expiration: 2036-09-21
Also published as: JP2018049165A

Description

The present invention relates to a speech recognition technique for converting a speech signal into text.

Speech recognition technology that takes speech as input and converts it into text data has long been known. Such technology analyzes the speech using an acoustic model and a language model and converts it into text data. However, a device performing such speech recognition may fail to analyze the speech accurately, for example when the input speech is muffled. As a result, the device may output an erroneous recognition result.

To address this, Patent Literature 1 discloses a speech input support system that converts input speech into text. When the words in the text include a word with low reliability, the system corrects the text by performing predictive conversion based on a character string made up of one or more high-reliability words contained in the text.

Patent Literature 1: JP 2012-78650 A

However, the speech input support system of Patent Literature 1 merely corrects the text on the basis of the reliability of the words in the text. Whether its prediction result can be regarded as reliable therefore depends on the case.

The present invention has been made in view of this problem. An object of the invention is to provide a correction device that can correct a speech recognition result using a technique different from that of Patent Literature 1.

A correction device according to the present invention includes the following units:
- a storage unit that stores inter-phrase related information indicating relationships between phrases (bunsetsu);
- an input unit that receives recognition result information indicating one or more character strings produced by speech recognition processing, which converts a speech signal into text data;
- an extraction unit that extracts phrase candidates from the character string indicated by the recognition result information;
- a specifying unit that uses the inter-phrase related information to identify phrases that may constitute the character string, on the basis of dependency relations among the phrase candidates extracted by the extraction unit;
- a correction unit that corrects the character string indicated by the recognition result information using the phrase candidates identified by the specifying unit.

In the correction device, the specifying unit may use as its reference the phrase, among those extracted by the extraction unit, whose reliability is higher than that of the other phrases. It may then identify one of two groups as phrases that may constitute the character string:
- one or more head phrase candidates, which are candidates for the phrase that the high-reliability phrase modifies;
- one or more dependent phrase candidates, which are candidates for a phrase that modifies the high-reliability phrase.
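The following is a minimal sketch of this anchoring step. It assumes the inter-phrase related information is a plain list of (modifying phrase, modified phrase) string pairs, and the reliability values are invented for illustration. The function name `related_candidates` is hypothetical and does not come from the patent.

```python
from typing import Dict, List, Tuple


def related_candidates(
    reliabilities: Dict[str, float],
    dependency_pairs: List[Tuple[str, str]],
) -> Tuple[str, List[str], List[str]]:
    """Return the anchor phrase, its head candidates, and its dependent candidates.

    reliabilities: extracted phrase candidate -> reliability (hypothetical scale)
    dependency_pairs: (modifying phrase, modified phrase) pairs, a hypothetical
        encoding of the inter-phrase related information
    """
    # Anchor on the phrase whose reliability is highest.
    anchor = max(reliabilities, key=reliabilities.__getitem__)
    # Phrases the anchor may modify (head candidates).
    heads = sorted({head for mod, head in dependency_pairs if mod == anchor})
    # Phrases that may modify the anchor (dependent candidates).
    dependents = sorted({mod for mod, head in dependency_pairs if head == anchor})
    return anchor, heads, dependents


pairs = [("ミカンが", "あります"), ("机の", "上に"), ("上に", "あります"), ("公演が", "終わります")]
print(related_candidates({"ミカンが": 0.92, "終わります": 0.41}, pairs))
# ('ミカンが', ['あります'], [])
```

In this toy pair list, the anchor ミカンが has exactly one head candidate, あります, and no dependents.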

The specifying unit may also choose the phrases that may constitute the character string from among several head phrase candidates or several dependent phrase candidates. The choice is based on how similar each candidate sounds to the character string indicated by the recognition result information.
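The patent does not state how sound similarity is measured. One simple stand-in, assumed here only for illustration, is the Levenshtein edit distance between romanized readings, normalized to a score between 0 and 1. The readings and helper names below are hypothetical.

```python
from typing import Dict, List


def edit_distance(a: str, b: str) -> int:
    """Levenshtein distance, computed one row of the DP table at a time."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(
                prev[j] + 1,               # delete ca
                cur[j - 1] + 1,            # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free when equal)
            ))
        prev = cur
    return prev[-1]


def sound_similarity(a: str, b: str) -> float:
    """Edit distance normalized to [0, 1]; 1.0 means identical readings."""
    longest = max(len(a), len(b)) or 1
    return 1.0 - edit_distance(a, b) / longest


def rank_by_sound(recognized: str, readings: Dict[str, str]) -> List[str]:
    """Order candidate phrases by how closely their readings match `recognized`."""
    return sorted(readings, key=lambda p: sound_similarity(recognized, readings[p]), reverse=True)


# Hypothetical head candidates with romanized readings.
print(rank_by_sound("owarimasu", {"ありません": "arimasen", "あります": "arimasu"}))
# ['あります', 'ありません']
```

Here あります (arimasu) outranks ありません (arimasen) because its reading needs fewer edits to match the recognized "owarimasu".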

The storage unit may further store inter-word related information indicating relationships between words. In that case, the correction unit may work in three steps:
1. Divide each phrase identified by the specifying unit into word candidates that may make up that phrase.
2. Use the inter-word related information to identify, from among those word candidates, the words likely to make up the phrase.
3. Perform the correction with those words.

The correction unit may correct the character string as follows. Starting from the beginning of the sentence, it selects a word with a high likelihood of constituting the character string. It then repeatedly selects a word that may follow the selected word. Each selection is made from at least one of three sources: the words obtained by dividing the phrases, the words contained in the inter-word related information, and the words in the recognition result information.
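The repeated left-to-right selection can be sketched as a greedy walk over a bigram table. This is a simplification, not the patent's procedure, and it makes three assumptions:
- successors are scored by corpus frequency alone, and the sound-similarity criterion is ignored;
- a hypothetical candidate pool, `allowed`, stands in for the words gathered from the phrases, the inter-word related information, and the recognition result;
- the toy counts come from hand-segmented versions of two example sentences.

```python
from collections import Counter
from typing import Dict, List, Set

BOS = "（文頭）"  # sentence-head token, treated as a word


def greedy_correct(bigrams: Dict[str, Counter], allowed: Set[str], max_len: int = 20) -> List[str]:
    """Build a sentence left to right, starting from the sentence head.

    At each step, choose the most frequent successor of the previous word that
    is also in the candidate pool `allowed`; stop when no such successor exists.
    `max_len` bounds the walk in case the table contains cycles.
    """
    sentence: List[str] = []
    prev = BOS
    for _ in range(max_len):
        successors = bigrams.get(prev, Counter())
        options = [(count, word) for word, count in successors.items() if word in allowed]
        if not options:
            break
        _, word = max(options)  # highest count; ties broken by comparing the word strings
        sentence.append(word)
        prev = word
    return sentence


# Toy bigram counts from hand-segmented "ミカン が あり ます" and "来週 で 公演 が 終わり ます".
bigrams = {
    BOS: Counter({"ミカン": 1, "来週": 1}),
    "ミカン": Counter({"が": 1}),
    "が": Counter({"あり": 1, "終わり": 1}),
    "あり": Counter({"ます": 1}),
    "終わり": Counter({"ます": 1}),
    "来週": Counter({"で": 1}),
    "で": Counter({"公演": 1}),
    "公演": Counter({"が": 1}),
}
print(greedy_correct(bigrams, {"ミカン", "が", "あり", "ます"}))
# ['ミカン', 'が', 'あり', 'ます']
```

This rule looks only at frequency. If 終わり were added to `allowed`, the two successors of が would tie, and the string tie-break would choose 終わり. A sound-based term is therefore still needed to separate such cases.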

The inter-phrase related information may be generated from a text database of example user utterances. It associates a character string representing a modifying (dependent) phrase with a character string representing the phrase it modifies (the head).
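The following sketch makes this data structure concrete by turning a dependency-annotated corpus into counted (modifying phrase, modified phrase) pairs. It assumes the syntactic analysis has already been done. The phrase split and head indices for 「ミカンが机の上にあります。」 are annotated by hand, and no particular Japanese parser is implied. `build_phrase_pairs` is a hypothetical name, and a `Counter` is just one way to record how often each pair occurs.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def build_phrase_pairs(parsed: Iterable[Tuple[List[str], List[int]]]) -> Counter:
    """Count (modifying phrase, modified phrase) pairs over a parsed corpus.

    Each item holds a sentence's phrases and, for each phrase, the index of
    the phrase it modifies; -1 marks the sentence's final (root) phrase.
    """
    pairs: Counter = Counter()
    for phrases, heads in parsed:
        for phrase, head in zip(phrases, heads):
            if head >= 0:
                pairs[(phrase, phrases[head])] += 1
    return pairs


corpus = [
    # 「ミカンが机の上にあります。」, split and annotated by hand for illustration
    (["ミカンが", "机の", "上に", "あります"], [3, 2, 3, -1]),
]
for (modifier, head), count in build_phrase_pairs(corpus).items():
    print(modifier, "->", head, count)
```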

The inter-word related information may be generated from a text database of example user utterances. It associates one or more word sequences with the words that follow those sequences.
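A comparable sketch for the inter-word related information builds an n-gram table that maps each preceding word sequence to counts of the word that follows it. This rests on assumptions:
- the word segmentation is done by hand;
- n = 3 is chosen arbitrarily;
- `（文頭）` serves as a sentence-head marker and is treated like an ordinary word;
- `build_ngrams` is a hypothetical name.

```python
from collections import Counter, defaultdict
from typing import Dict, Iterable, List, Tuple

BOS = "（文頭）"  # sentence-head marker, handled like an ordinary word


def build_ngrams(sentences: Iterable[List[str]], n: int = 3) -> Dict[Tuple[str, ...], Counter]:
    """Map every preceding word sequence of length 1..n-1 to counts of the next word."""
    table: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for words in sentences:
        seq = [BOS] + list(words)
        for i in range(1, len(seq)):
            # Contexts that end just before position i, shortest first.
            for k in range(1, min(n - 1, i) + 1):
                table[tuple(seq[i - k:i])][seq[i]] += 1
    return dict(table)


# 「ミカンが机の上にあります。」, segmented by hand for illustration.
table = build_ngrams([["ミカン", "が", "机", "の", "上", "に", "あり", "ます"]])
print(table[(BOS,)])       # Counter({'ミカン': 1})
print(table[("が", "机")])  # Counter({'の': 1})
```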

The correction device may further include an output unit that outputs the correction result produced by the correction unit.

A correction method according to the present invention includes the following steps:
- an input step of receiving recognition result information indicating one or more character strings produced by speech recognition processing, which converts a speech signal into text data;
- an extraction step of extracting phrase candidates from the character string indicated by the recognition result information;
- a specifying step of identifying phrases that may constitute the character string, using inter-phrase related information that indicates relationships between phrases, on the basis of dependency relations among the phrase candidates extracted in the extraction step;
- a correction step of correcting the character string indicated by the recognition result information using the phrase candidates identified in the specifying step.

A correction program according to the present invention runs on a computer that can access a memory storing inter-phrase related information, which indicates relationships between phrases. The program causes the computer to implement the following functions:
- an input function that receives recognition result information indicating one or more character strings produced by speech recognition processing, which converts a speech signal into text data;
- an extraction function that extracts phrase candidates from the character string indicated by the recognition result information;
- a specifying function that uses the inter-phrase related information to identify phrases that may constitute the character string, on the basis of dependency relations among the phrase candidates extracted by the extraction function;
- a correction function that corrects the character string indicated by the recognition result information using the phrase candidates identified by the specifying function.

According to the correction device of one aspect of the present invention, the two systems that perform parallel processing are divided at the unit of an integration operation that both systems use. Parallel processing can therefore be achieved without a buffer for passing data between the systems. Consequently, the correction runs faster than with sequential processing, without the parallelization error caused by the delay that a buffer would introduce.

FIG. 1 is a block diagram showing a configuration example of the correction device.
FIG. 2 is a diagram showing the flow of correcting a speech recognition result according to the embodiment.
FIG. 3 is a block diagram showing a detailed configuration example of the correction device.
FIG. 4 is a flowchart showing the operation of the correction device.
FIG. 5 is a diagram showing a specific example of the operation of the correction device.
FIG. 6 is a flowchart showing the operation of the correction device.
FIG. 7 is a flowchart showing the operation of the correction device following FIG. 6.
FIG. 8 is a diagram showing a specific example of the operation of the correction device: (a) is a specific example of the first iteration, and (b) is a specific example of the n-th iteration.
FIG. 9 is a diagram showing a specific example of the operation of the correction device.
FIG. 10 is a block diagram showing a configuration example of the correction device.

A correction device according to an embodiment of the present invention is described in detail below with reference to the drawings.

<Embodiment>
As shown in FIG. 1, the correction device 100 according to the present invention includes an input unit 110, a storage unit 120, an extraction unit 131, a specifying unit 132, and a correction unit 133. The correction device 100 receives recognition result information 220 indicating a speech recognition result and corrects the text data of that result.

The storage unit 120 stores inter-phrase related information 122 indicating relationships between phrases. The storage unit 120 is a storage medium that can store various kinds of information and is implemented by, for example, an HDD, an SSD, or flash memory. The inter-phrase related information 122 defines dependency relations between phrases. It associates the wording of a modifying (dependent) phrase with the wording of the phrase it modifies (the head).

The input unit 110 receives recognition result information 220, which is the result of speech recognition processing that converts a speech signal into text data. The input unit 110 is an input interface that accepts input from outside the correction device 100. It is implemented by, for example, an input port such as USB, a communication port such as a wireless LAN, or a port that accepts input from an input device such as a keyboard or touch panel.

The recognition result information 220 contains at least the text data obtained by converting the speech signal into text. It may also contain the speech signal itself that was the input to speech recognition. It may further contain information that expresses the speech signal as phonemes (phonological information only, written for example in romaji).

The extraction unit 131 extracts phrase candidates from the character string indicated by the recognition result information 220 received by the input unit 110. To do this, it applies commonly known syntactic analysis to the character string of the text data indicated by the recognition result information 220. The resulting phrase candidates are character strings that could serve as phrases. The extraction unit 131 is implemented by, for example, a processor, a dedicated circuit, or a reconfigurable circuit such as an FPGA.

The specifying unit 132 identifies phrases that may constitute the character string indicated by the recognition result information 220. It uses the inter-phrase related information 122 stored in the storage unit 120 and bases the identification on dependency relations among the phrase candidates extracted by the extraction unit 131. A phrase that may constitute the character string is one that may serve as a correction candidate for the character string of the recognized text data.

The specifying unit 132 applies the dependency-related phrase pairs given by the inter-phrase related information 122 to combinations of the extracted phrase candidates, and thereby identifies the candidates that may constitute the character string. It is implemented by, for example, a processor, a dedicated circuit, or a reconfigurable circuit such as an FPGA.

The correction unit 133 corrects the character string indicated by the recognition result information 220 using the phrase candidates identified by the specifying unit 132. It bases the correction on two factors:
- the likelihood that the phrase candidates form a sentence, evaluated in order from the beginning of the sentence;
- the sound similarity to the speech signal contained in the recognition result information 220.

The correction unit 133 is implemented by, for example, a processor, a dedicated circuit, or a reconfigurable circuit such as an FPGA.
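The patent does not give a formula for combining sentence likelihood with sound similarity. The sketch below assumes one: a weighted sum of two terms.
- Fluency is the geometric-mean bigram probability, read from the sentence head.
- Sound similarity is the `difflib.SequenceMatcher` ratio between a candidate's reading and the recognized phoneme string.

The weight, the probability floor, the readings, the counts, and the phoneme string are all hypothetical.

```python
import math
from collections import Counter
from difflib import SequenceMatcher
from typing import Dict, List

BOS = "（文頭）"  # sentence-head token


def sentence_score(
    words: List[str],
    readings: Dict[str, str],
    bigrams: Dict[str, Counter],
    phonemes: str,
    weight: float = 0.5,
    floor: float = 1e-3,
) -> float:
    """Blend n-gram fluency (read from the sentence head) with sound similarity."""
    log_prob = 0.0
    prev = BOS
    for word in words:
        successors = bigrams.get(prev, Counter())
        total = sum(successors.values())
        prob = successors[word] / total if total else 0.0
        log_prob += math.log(max(prob, floor))  # floor avoids log(0) for unseen pairs
        prev = word
    # Geometric-mean transition probability, in (0, 1].
    fluency = math.exp(log_prob / max(len(words), 1))
    # Similarity between the candidate's reading and the recognized phonemes.
    reading = "".join(readings[w] for w in words)
    sound = SequenceMatcher(None, reading, phonemes).ratio()
    return weight * fluency + (1.0 - weight) * sound


bigrams = {
    BOS: Counter({"ミカン": 1, "来週": 1}),
    "ミカン": Counter({"が": 1}),
    "が": Counter({"あり": 1, "終わり": 1}),
    "あり": Counter({"ます": 1}),
    "終わり": Counter({"ます": 1}),
}
readings = {"ミカン": "mikan", "が": "ga", "あり": "ari", "ます": "masu", "終わり": "owari"}
good = sentence_score(["ミカン", "が", "あり", "ます"], readings, bigrams, "mikangaarimasu")
bad = sentence_score(["ミカン", "終わり", "ます"], readings, bigrams, "mikangaarimasu")
print(round(good, 3), round(bad, 3))
```

With these toy values, ミカン/が/あり/ます scores higher than ミカン/終わり/ます on both terms. In practice, the relative weight of the two terms would need tuning.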

The correction device 100, which functions as a device for correcting the results of speech recognition processing, is described in more detail below.

FIG. 2 shows the flow of converting speech into text data and gives an overview of the correction device 100 according to the present invention. The correction device 100 corrects text data obtained by speech recognition processing.

The speech recognition device 200 performs conventionally known speech recognition processing. It receives speech 210 and converts it into a speech signal 201. The speech signal 201 may be information indicating how the sound changes over time, or a frequency signal obtained by applying a Fourier transform to the input speech. The speech recognition device 200 analyzes the speech signal 201, converts it into text data, and outputs the text data as recognition result information 220.
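As background on the frequency-signal option, the following naive discrete Fourier transform converts a sampled waveform into a magnitude for each frequency bin. It is a textbook O(N²) DFT written only for illustration. Practical recognizers typically apply an FFT to short, windowed frames, and nothing here is taken from the patent.

```python
import cmath
import math
from typing import List, Sequence


def magnitude_spectrum(samples: Sequence[float]) -> List[float]:
    """Naive discrete Fourier transform; returns |X[k]| for k = 0..N-1."""
    n = len(samples)
    return [
        abs(sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(samples)))
        for k in range(n)
    ]


# A cosine that completes 3 cycles over 32 samples.
samples = [math.cos(2 * math.pi * 3 * t / 32) for t in range(32)]
spectrum = magnitude_spectrum(samples)
peak = max(range(len(samples) // 2 + 1), key=spectrum.__getitem__)
print(peak, round(spectrum[peak], 6))  # 3 16.0
```

For this input, the magnitude peaks at bin 3, mirrored at bin 29. The peak value is N/2 = 16.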

First, the speech recognition device 200 receives speech input from the user through a speech input device such as a microphone. It converts the input into a speech signal waveform and extracts features from that waveform.

The speech recognition device 200 then performs speech recognition processing on the extracted features, that is, it converts the speech signal into text data. For this it refers to a stored acoustic model 202 and language model 203.

The acoustic model 202 indicates the frequency characteristics of the sound of each individual unit used in the target language. It may also include information on the frequency characteristics of words defined in that language. The language model 203 is a collection of text data indicating words used in the target language. It may also include expressions used in that language and text data of concrete example sentences.

The correction device 100 receives the recognition result information 220 output by the speech recognition device 200. The recognition result information 220 is text data indicating the recognized character string, and has the following contents:
- the speech signal 201 input by the user, a frequency signal generated from it, or phoneme information expressing the phonemes of the speech signal 201;
- optionally, more than one recognition result;
- the reliability of each word in the recognition result.

The speech recognition device 200 calculates word reliability according to a predetermined, conventionally known algorithm. The calculation takes into account factors such as how plausible the sequence of words is in the recognized text and how plausible each word is given the input speech.

The correction device 100 analyzes the received recognition result information 220 using the inter-phrase related information 122. From the character string indicated by the recognition result information 220, it extracts the character strings (phrases) that make it up. In this way, the correction device 100 picks out, from the recognized character string, the phrases to be used in correcting it. The correction device 100 then divides the picked-out phrases into words, which become candidates for the words of the corrected character string.
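The patent does not say how a phrase is divided into words; a Japanese morphological analyzer would be a typical choice. As a self-contained stand-in, the sketch below uses greedy longest-match segmentation against a small hypothetical vocabulary. Any character not covered by the vocabulary is emitted on its own.

```python
from typing import List, Set


def longest_match_split(phrase: str, vocabulary: Set[str]) -> List[str]:
    """Split a phrase left to right, always taking the longest vocabulary entry.

    Characters not covered by the vocabulary are emitted one at a time, so the
    loop always makes progress.
    """
    words: List[str] = []
    max_len = max((len(w) for w in vocabulary), default=1)
    i = 0
    while i < len(phrase):
        for size in range(min(max_len, len(phrase) - i), 0, -1):
            piece = phrase[i:i + size]
            if size == 1 or piece in vocabulary:
                words.append(piece)
                i += size
                break
    return words


vocab = {"ミカン", "が", "あり", "ます", "机", "の", "上", "に"}  # hypothetical dictionary
print(longest_match_split("ミカンが", vocab))  # ['ミカン', 'が']
print(longest_match_split("あります", vocab))  # ['あり', 'ます']
```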

From the extracted words, the correction device 100 identifies character strings that are correction candidates, and finally outputs the corrected character string. It draws on two sources for this:
- the inter-word related information 123, which records how one or more word sequences connect to the words that follow them;
- the phonemes of the speech signal.

A word sequence of one or more words is a character string containing at least one word, formed by words that appear consecutively in a sentence or similar text.

In other words, the correction device 100 corrects the result of speech recognition processing, and can also be regarded as a device that estimates what the user said.

In the example shown in FIG. 2, the user said 「ミカンがあります」 (mikan ga arimasu, "There are mandarin oranges"). The speech recognition device 200 misrecognized the speech 210 and output recognition result information 220 reading 「ミカン終わります」 (mikan owarimasu, roughly "mandarin oranges, it ends"). The correction device 100 then corrects this recognition result information 220 to 「ミカンがあります」.

FIG. 3 is a block diagram showing the detailed functional configuration of the correction device 100. As shown in FIG. 3, the correction device 100 comprises an input unit 110, a storage unit 120, a control unit 130, and an output unit 140. The correction device 100 is a computer system with an input interface and an output interface. Its input unit 110, storage unit 120, control unit 130, and output unit 140 are connected to one another via a bus.

The input unit 110 is an input interface that receives input from external devices. Here, it receives recognition result information 220, the result produced by the speech recognition device. The recognition result information 220 contains the following:
- one or more pieces of text data obtained by the speech recognition processing (character strings that the speech recognition device 200 judged to be possible recognition results);
- numerical information indicating the reliability of each word in the text data;
- phoneme information expressing the speech signal as phonemes;
- optionally, the original speech signal.

When the recognition result information 220 contains several pieces of text data, it also indicates which one is the most likely recognition result. Word reliabilities are then included only for the words of the text data that the speech recognition device 200 judged most likely.

The input unit 110 need only accept information equivalent to the recognition result information 220. For example, the input unit 110 may be connected to the microphone attached to the speech recognition device 200 and acquire the speech the user inputs. The user then types the recognized character string directly on an input device such as a keyboard, and the input unit 110 receives this as the recognition result information 220.
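The following sketch shows one way the contents of the recognition result information 220 might be held in memory. The field names and validation rules are assumptions derived from the description: one or more hypotheses, a marker for the most likely one, per-word reliability for that hypothesis only, phonemes, and an optional signal. The patent does not define a format, and all example values are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence


@dataclass
class RecognitionResult:
    """Hypothetical container for the contents of recognition result information 220."""

    hypotheses: List[List[str]]      # one or more candidate texts, already split into words
    best_index: int                  # index of the hypothesis judged most likely
    word_reliability: List[float]    # one score per word of the most likely hypothesis only
    phonemes: str                    # the speech expressed as phonemes (e.g. romaji)
    audio: Optional[Sequence[float]] = None  # original speech signal, if supplied

    def __post_init__(self) -> None:
        if not 0 <= self.best_index < len(self.hypotheses):
            raise ValueError("best_index does not point at a hypothesis")
        if len(self.word_reliability) != len(self.best):
            raise ValueError("expected one reliability score per word of the best hypothesis")

    @property
    def best(self) -> List[str]:
        """The hypothesis the recognizer ranked first."""
        return self.hypotheses[self.best_index]

    def least_reliable_word(self) -> str:
        """Word of the best hypothesis with the lowest reliability score."""
        index = min(range(len(self.best)), key=self.word_reliability.__getitem__)
        return self.best[index]


result = RecognitionResult(
    hypotheses=[["ミカン", "終わり", "ます"], ["ミカン", "が", "あり", "ます"]],
    best_index=0,
    word_reliability=[0.9, 0.3, 0.8],
    phonemes="mikangaarimasu",
)
print(result.least_reliable_word())  # 終わり
```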

The storage unit 120 is a storage medium that stores the programs and data the correction device 100 needs to operate. It can be implemented by, for example, an HDD (Hard Disk Drive), an SSD (Solid State Drive), or flash memory.

The storage unit 120 stores a user utterance corpus 121, inter-phrase related information 122, and inter-word related information 123.

The user utterance corpus 121 is a data set of user utterances converted into text data. The users may be the user who inputs the speech signal, or multiple users unrelated to that user.
- A corpus that collects only the utterances of the user who inputs the speech signal yields a correction device 100 specialized for that user.
- A corpus that collects utterances from various users yields a general-purpose correction device 100. Such a corpus may also include utterances of the user who inputs the speech signal.

The user utterance corpus 121 is a set of sentences spoken by users. It contains many concrete sentences such as 「ミカンがあります。」 ("There are mandarin oranges."), 「リンゴが机の上にあります。」 ("There is an apple on the desk."), and 「来週で公演が終わります。」 ("The performances end next week.").

The inter-phrase related information 122 is created from the user utterance corpus 121 and indicates dependencies between phrases. As described above, the user utterance corpus 121 consists of user utterances in text form, so it also carries information about dependencies between phrases. Syntactic analysis of the user utterance corpus 121 can therefore extract phrase-to-phrase dependencies. Accordingly, for each phrase pair in a dependency relation, the inter-phrase related information 122 associates the wording of the modifying phrase with the wording of the modified phrase.

Table 1 is a conceptual diagram showing an example data structure of the inter-phrase related information 122. The inter-phrase related information 122 may be created by the control unit 130 through syntactic analysis of the user utterance corpus 121, or it may be created in advance and stored in the storage unit 120. The same combination of phrases may be registered more than once. Alternatively, each combination in Table 1 below may be associated with the number of times it appears in the user utterance corpus 121. For example, if the user utterance corpus 121 contains the utterance 「ミカンが机の上にあります。」 ("There are mandarin oranges on the desk."), the inter-phrase related information 122 includes dependency information such as that shown in Table 1 below.

The inter-word related information 123 is created from the user utterance corpus 121 and indicates dependencies between words. As described above, the user utterance corpus 121 is the user's utterances converted into text data, so it contains strings of one or more words together with the words that follow them. Parsing the user utterance corpus 121 therefore yields, for each string of one or more words, the words that follow it. The inter-word related information 123 associates each such string of one or more words (hereinafter sometimes called the modifier-side word string) with the words that follow it. Information that pairs a string of one or more words with a following word is sometimes called an n-gram model. Table 2 is a conceptual diagram showing an example data configuration of the inter-word related information 123. The inter-word related information 123 is created from the sentences appearing in the user utterance corpus 121, and the same combination of word string and following word may be registered more than once. Alternatively, each combination in Table 2 below may instead be associated with its number of appearances. The inter-word related information 123 may be created by the control unit 130, or it may be created in advance and stored in the storage unit 120. For example, if the user utterance corpus 121 contains the utterance 「ミカンが机の上にあります。」, the inter-word related information 123 includes correspondence information such as that shown in Table 2 below. Note that 「（文頭）」 (sentence head) in Table 2 is an identifier marking the beginning of a sentence; the correction device 100 treats it as a single word when generating the character string to be corrected.
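The following sketch shows one way such n-gram style information could be built. It assumes the corpus is already split into words (the segmentation shown is hypothetical), caps the preceding word string at a hypothetical `max_context` words, and adds a hypothetical sentence-end token alongside the 「（文頭）」 identifier described above.

```python
from collections import Counter, defaultdict

BOS = "（文頭）"  # sentence-head identifier, treated as one word (Table 2)
EOS = "（文末）"  # hypothetical sentence-end identifier (assumption)

def build_inter_word_relations(sentences, max_context=2):
    """Map each preceding word string (a tuple of 1..max_context words)
    to a Counter of the words that follow it in the corpus."""
    relations = defaultdict(Counter)
    for words in sentences:
        seq = [BOS] + list(words) + [EOS]
        for i in range(1, len(seq)):
            # Register every context length that ends just before position i.
            for k in range(1, min(max_context, i) + 1):
                context = tuple(seq[i - k:i])
                relations[context][seq[i]] += 1
    return dict(relations)

# Hypothetical word segmentation of 「ミカンが机の上にあります。」
corpus = [["ミカン", "が", "机", "の", "上", "に", "あり", "ます"]]
rel = build_inter_word_relations(corpus)
print(rel[("ミカン", "が")]["机"])  # 1
```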

The control unit 130 is a processor that controls each unit of the correction device 100. It includes an extraction unit 131, a specifying unit 132, and a correction unit 133. By executing the correction program stored in the storage unit 120, the control unit 130 outputs character strings that are correction candidates for the character string indicated by the input recognition result information 220. If there is no correction candidate, none is output; instead, information indicating that no correction candidate was found is output. The correction of the character string indicated by the recognition result information 220 proceeds in two broad stages: in the first stage, the extraction unit 131 and the specifying unit 132 identify the phrases and words that may make up the character string; in the second stage, correction candidate sentences are generated from the identified phrases and words.

The extraction unit 131 extracts, from the character string indicated by the recognition result information 220, the phrases that may be contained in it. Because the recognition result information 220 presents the character string split into words, the extraction unit 131 first joins the words into one continuous character string and then parses it to extract phrase candidates. Any conventional parsing technique may be used for this.

The specifying unit 132 identifies the dependency combinations among the phrase candidates extracted by the extraction unit 131. It then calculates the reliability of each phrase in every phrase pair found to be in a dependency relation. This reliability is derived from the reliability of each word included in the recognition result information: the reliability of a phrase is the average of the reliabilities of the words that make it up.
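The averaging rule is simple enough to state directly in code. This sketch assumes the per-word reliabilities are available as a list of floats; the sample values are those of the worked example for 「終わり」 and 「ます」 later in this section.

```python
from statistics import mean

def phrase_reliability(word_reliabilities):
    """Reliability of a phrase = mean reliability of the words in it."""
    return mean(word_reliabilities)

# Word reliabilities of 「終わり」 and 「ます」 from the worked example
print(round(phrase_reliability([0.255, 0.529]), 3))  # 0.392
```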

Of the two phrases in a pair, the specifying unit 132 takes the one with the higher calculated reliability as the reference phrase, and then identifies the phrases corresponding to the reference phrase from the inter-phrase related information 122. Specifically, when the more reliable phrase is the modifier, the specifying unit 132 finds, among the modifier phrases of the inter-phrase related information 122, those whose sound similarity to the reference phrase exceeds a predetermined threshold, and then identifies the character strings of the head phrases associated with them. Conversely, when the more reliable phrase is the head, it identifies from the inter-phrase related information 122 the character strings of the modifier phrases associated with that head. For example, if the highly reliable phrase is 「ミカンが」 (mikan ga, "the mandarin orange" as subject), then in the example of Table 1 above the phrases 「あります」 (arimasu, "is"), 「机の」 (tsukue no, "of the desk"), and 「上に」 (ue ni, "on") are identified as corresponding phrases. These identified phrases are the correction candidate phrases.
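The choice of reference phrase and the lookup of its counterparts can be sketched as follows. This is an illustration under stated assumptions: the relation information is a plain list of (modifier, head) pairs, and the lookup uses exact string matching, so the sound-similarity widening of the modifier side described above is left out.

```python
def choose_reference(modifier, head, reliability):
    """Pick the more reliable phrase of a dependency pair as the reference.

    Returns (reference phrase, True if it is the modifier). A tie goes to
    the head, matching the NO branch of step S405.
    """
    if reliability[modifier] > reliability[head]:
        return modifier, True
    return head, False

def counterparts(reference, is_modifier, relations):
    """Phrases registered on the opposite side of `reference`.

    `relations` is an iterable of (modifier, head) pairs. Exact string
    matching only; sound-similarity matching is omitted in this sketch.
    """
    if is_modifier:
        return sorted({head for mod, head in relations if mod == reference})
    return sorted({mod for mod, head in relations if head == reference})

relations = [("ミカンが", "あります"), ("机の", "上に"), ("上に", "あります")]
ref, is_mod = choose_reference("ミカン", "終わります",
                               {"ミカン": 0.667, "終わります": 0.392})
print(ref, is_mod)  # ミカン True
print(counterparts("あります", False, relations))  # ['ミカンが', '上に']
```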

The specifying unit 132 calculates the sound similarity between each identified phrase and the speech signal indicated by the recognition result information 220, and decides whether the phrase becomes a correction candidate according to whether that similarity exceeds a predetermined threshold. The predetermined threshold is a preset value against which the sound similarity is compared to judge whether it is at or above a certain level. When the sound similarity between an identified phrase and the speech signal indicated by the recognition result information 220 exceeds the threshold, the specifying unit 132 adopts that phrase as a correction phrase candidate and stores the words that make up the phrase in the storage unit 120 as words that may form the corrected sentence. The threshold is set in advance, for example by simulation, to a value at which suitable correction phrases can be found. Sound similarity here means the phonological similarity of the phrases (or words) being compared. For example, to compare phrase A with phrase B, each phrase is written in romaji (Latin letters) and the similarity between the two resulting letter strings is calculated. The sound similarity may instead be calculated by comparing speech signals directly; in that case, speech signals for the phrases and words in the user utterance corpus 121 must be stored in the storage unit 120 or obtained over a network or the like.
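A sketch of the romaji-based comparison is shown below. The source does not name a specific string-similarity measure, so `difflib.SequenceMatcher.ratio()` is used here as a stand-in, and romanization is assumed to have been done beforehand.

```python
from difflib import SequenceMatcher

def sound_similarity(romaji_a, romaji_b):
    """Similarity in [0, 1] between two phrases written in romaji.

    SequenceMatcher.ratio() is a stand-in measure (assumption); the
    source only says letter strings in romaji are compared.
    """
    return SequenceMatcher(None, romaji_a, romaji_b).ratio()

# 「あります」 vs 「終わります」, and 「食べた」 vs 「終わります」
print(sound_similarity("arimasu", "owarimasu"))            # 0.875
print(round(sound_similarity("tabeta", "owarimasu"), 3))   # 0.267
```

With this measure, "arimasu" scores high against "owarimasu" because it is a contiguous substring of it, which mirrors the worked example in which 「あります」 is chosen as the correction for 「終わります」.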

The correction unit 133 generates character strings that are correction candidates for the character string indicated by the recognition result information 220, based on the word candidates extracted by the extraction unit 131 and the specifying unit 132 (the words stored in the storage unit 120 as possibly forming the corrected sentence). Starting from the head of the sentence, the correction unit 133 picks, from the word group stored in the storage unit 120, the words that can stand at the beginning of a sentence, selects suitable words according to their sound similarity to the speech signal (phoneme information) indicated by the recognition result information 220, and builds the corrected sentence by determining words one at a time from the head of the sentence onward. Further details of the corrected-sentence generation process performed by the correction unit 133 are described later with reference to flowcharts.

The output unit 140 is an output interface that outputs correction result information 230, which indicates the correction result produced by the control unit 130, to the outside of the correction device 100. For example, the output unit 140 may output the correction result information 230 by displaying the corrected character string on a monitor connected to the correction device 100, or by transmitting it to a user terminal.

This concludes the description of the configuration of the correction device 100.

<Operation>
The operation of the correction device 100 will now be described with reference to the flowcharts of FIGS. 4, 6, and 7. Specific correction examples will be described with reference to FIGS. 5, 8, and 9.

FIGS. 4, 6, and 7 show the processing performed by the correction device 100 from receiving the recognition result information 220, which is the result of speech recognition by the speech recognition device 200, until outputting the correction result. FIG. 4 shows the first-stage processing, in which the extraction unit 131 and the specifying unit 132 extract phrases from the character string indicated by the recognition result information 220 and identify word candidates. FIGS. 6 and 7 show the second-stage processing, in which the correction unit 133 generates correction sentence candidates.

<First-stage processing>
(Step S401)
In step S401, the input unit 110 of the correction device 100 receives the recognition result information 220 and stores it in the storage unit 120. The process then proceeds to step S402.

(Step S402)
In step S402, the extraction unit 131 of the control unit 130 parses the character string of the text data indicated by the recognition result information 220 and extracts the phrases that can be formed from it.

(Step S403)
The specifying unit 132 of the control unit 130 executes steps S403 to S413 for each phrase pair in a dependency relation.

(Step S404)
In step S404, the specifying unit 132 calculates the reliability of each phrase in the pair. As described above, the reliability of a phrase is calculated based on the reliabilities of the words included in the recognition result information 220. That is, for one phrase of the pair, the specifying unit 132 calculates how frequently that phrase appears on the modifier side or the head side of the inter-phrase related information 122 and uses this as that phrase's reliability. It does the same for the other phrase of the pair. Once the reliabilities of both phrases have been calculated, the process proceeds to step S405.
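The frequency-based reading of step S404 can be sketched as a raw count. This assumes the inter-phrase related information is a list of (modifier, head) pairs with duplicates allowed; the source does not say whether the count is normalized, so no normalization is applied here.

```python
def appearance_frequency(phrase, relations):
    """Number of times `phrase` appears on the modifier or head side.

    `relations` is a list of (modifier, head) pairs in which duplicate
    registrations are allowed; the raw count is returned (assumption).
    """
    return sum((mod == phrase) + (head == phrase) for mod, head in relations)

relations = [("ミカンが", "あります"), ("机の", "上に"),
             ("上に", "あります"), ("ミカンが", "あります")]
print(appearance_frequency("あります", relations))  # 3
```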

(Step S405)
In step S405, the specifying unit 132 compares the reliabilities of the two phrases in the pair and determines whether the reliability of the modifier phrase is higher than that of the head phrase. If it is (YES), the process proceeds to step S406; if it is not (NO), the process proceeds to step S407.

(Step S406)
In step S406, the specifying unit 132 sets the modifier phrase as the reference phrase. The reference phrase is a phrase that can serve as a candidate for a phrase of the character string indicated by the recognition result information 220. The process then proceeds to step S408.

(Step S407)
In step S407, the specifying unit 132 sets the head phrase as the reference phrase. The process then proceeds to step S408.

(Step S408)
In step S408, the specifying unit 132 uses sound similarity to list reference phrases. Specifically, it calculates the sound similarity between the wording of the reference phrase and the wording of the phrases in the inter-phrase related information 122 that occupy the same dependency role (modifier or head) as the reference phrase, and lists the wordings whose similarity reaches the predetermined threshold as reference phrases. For example, when 「ミカンが」 (mikan ga) is set as the reference phrase, phrases such as 「ミカンを」 (mikan o, "the mandarin orange" as object) and 「未完の」 (mikan no, "unfinished") are extracted from the inter-phrase related information 122 and added to the reference phrase list. The process then proceeds to step S409.
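Step S408 can be sketched as a threshold filter over romanized phrases. The romanizations, the threshold of 0.7, and the use of `difflib.SequenceMatcher` as the similarity measure are all assumptions made for illustration; `registered` stands for the phrases on the same dependency side as the reference phrase.

```python
from difflib import SequenceMatcher

def list_reference_phrases(reference, registered, threshold=0.7):
    """Keep registered phrases (in romaji) whose sound similarity to the
    reference phrase reaches `threshold` (hypothetical value)."""
    return [phrase for phrase in registered
            if SequenceMatcher(None, reference, phrase).ratio() >= threshold]

# 「ミカンが」「ミカンを」「未完の」「机の」 in romaji (assumed romanization)
registered = ["mikanga", "mikanwo", "mikanno", "tsukueno"]
print(list_reference_phrases("mikanga", registered))
# ['mikanga', 'mikanwo', 'mikanno']
```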

(Step S409)
The specifying unit 132 repeats steps S409 to S411 for every reference phrase placed in the list in step S408.

(Step S410)
In step S410, the specifying unit 132 identifies, from the inter-phrase related information 122, the modifier or head phrase corresponding to one reference phrase in the list from step S408. That is, if the reference phrase is a modifier, the corresponding head phrase in the inter-phrase related information 122 is identified; conversely, if the reference phrase is a head, the corresponding modifier phrase is identified. For example, if 「ミカンが」 is in the list, 「あります」 is identified as its head phrase. The specifying unit 132 then calculates the sound similarity between the identified phrase and the speech signal indicated by the recognition result information 220, and selects correction phrase candidates according to whether that similarity exceeds the predetermined threshold.
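Step S410 can be sketched as follows, again over romanized phrases and with `difflib.SequenceMatcher` standing in for the unspecified similarity measure. The relation pairs and the threshold are hypothetical; the `reference_is_modifier` flag covers both directions of the lookup described above.

```python
from difflib import SequenceMatcher

def correction_phrase_candidates(listed, relations, recognized,
                                 reference_is_modifier=True, threshold=0.7):
    """For each listed reference phrase, look up the phrases on the
    opposite dependency side in `relations` ((modifier, head) pairs) and
    keep those whose sound similarity to the recognized phrase exceeds
    `threshold` (hypothetical value)."""
    candidates = []
    for ref in listed:
        for mod, head in relations:
            key, other = (mod, head) if reference_is_modifier else (head, mod)
            if key != ref or other in candidates:
                continue
            if SequenceMatcher(None, other, recognized).ratio() > threshold:
                candidates.append(other)
    return candidates

# Romaji for 「ミカン」「ミカンが」 paired with 「買った」「あります」「食べた」 (hypothetical)
relations = [("mikan", "katta"), ("mikanga", "arimasu"), ("mikan", "tabeta")]
print(correction_phrase_candidates(["mikan", "mikanga"], relations, "owarimasu"))
# ['arimasu']
```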

(Step S411)
In step S411, the specifying unit 132 determines whether step S410 has been executed for all reference phrases listed in step S408. If not, it executes step S410 for a listed reference phrase that has not yet been processed. Once step S410 has been executed for every reference phrase in the list, the process proceeds to step S412.

(Step S412)
In step S412, the specifying unit 132 divides each correction phrase candidate into words and ends the processing.

(Step S413)
In step S413, the specifying unit 132 determines whether steps S404 to S412 have been executed for all phrase pairs in a dependency relation, and ends the processing once it determines that all phrase pairs have been processed.

By executing the processing of FIG. 4, the correction device 100 can extract the words that are likely to make up the character string indicated by the recognition result information 220. This completes the first-stage processing. This extraction of likely words is now described using a specific example.

<Specific example of first-stage processing>
The processing shown in FIG. 4 is described below using a specific example.

Suppose the correction device 100 receives, as the recognition result information 220, the character strings 「ミカン」 (mikan, "mandarin orange"), 「終わり」 (owari, "end"), and 「ます」 (masu, polite verb ending).

The extraction unit 131 of the correction device 100 first joins the words indicated by the recognition result information 220 into one continuous character string. In this case, it obtains 「ミカン終わります」 (mikan owarimasu, "the mandarin orange ends").

The extraction unit 131 parses the joined character string (dependency analysis) and extracts the character strings that can be phrases. For 「ミカン終わります」, it extracts the phrase 「ミカン」 (character string 513) and the phrase 「終わります」 (character string 514). The processing so far corresponds to steps S401 to S403 in FIG. 4.

The specifying unit 132 calculates the reliability of each of these phrases from the reliabilities of the words included in the recognition result information 220. When a phrase consists of several words, the average of their reliabilities is used. In data 515, for example, a reliability of 0.667 is obtained for the phrase 「ミカン」. For the phrase 「終わります」, the constituent word 「終わり」 has a reliability of 0.255 and the word 「ます」 has a reliability of 0.529, so the phrase 「終わります」 receives their average, 0.392 ((0.255 + 0.529) / 2). This processing corresponds to step S404 in FIG. 4.

The specifying unit 132 compares the reliabilities of the phrases in the dependency relation, which corresponds to step S405 in FIG. 4, and finds that the modifier 「ミカン」, at 0.667, is the more reliable. Accordingly, as in step S406 of FIG. 4, the specifying unit 132 sets 「ミカン」 as the reference phrase.

Next, for the reference phrase (the modifier 「ミカン」), the specifying unit 132 refers to the inter-phrase related information 122, compares it with the phrases registered as modifiers, and identifies the modifier phrases with high sound similarity. This corresponds to step S408 in FIG. 4. In the example of FIG. 5, as shown in data 517, the specifying unit 132 lists the phrases 「ミカン」 and 「ミカンが」.

Next, the specifying unit 132 obtains from the inter-phrase related information 122 the phrases registered as heads of the listed phrases. For example, for the 「ミカン」 modifier phrases 521 and 524, it obtains associated head phrases such as 「買った」 (katta, "bought"), 「あります」 (arimasu, "is"), and 「食べた」 (tabeta, "ate"). It then calculates the sound similarity between each obtained phrase and 「終わります」 (owarimasu), the head phrase obtained from the initial parse of the recognition result information 220. That is, the specifying unit 132 calculates the sound similarity between phrase 525 「買った」 and phrase 526 「終わります」 and determines whether it is at or above the threshold. Likewise, it calculates the sound similarity between phrase 522 「あります」 and phrase 523 「終わります」, and between 「食べた」 and 「終わります」, in each case checking whether it is at or above the threshold.

As a result of this determination, the specifying unit 132 identifies 「あります」, whose sound similarity to 「終わります」 is at or above the predetermined level, as a correction phrase candidate. The specifying unit 132 divides the candidate phrase into words and stores them as words that may make up the character string used to generate the corrected sentence.

In this way, the words that serve as material for correcting the recognition result are identified.

<Second-stage processing>
Next, the correction device 100 generates corrected sentences from the extracted words. This processing is executed by the correction unit 133 of the correction device 100 and is described in detail with reference to the flowcharts of FIGS. 6 and 7. FIG. 7 shows the processing that continues from FIG. 6.

(Step S601)
The correction unit 133 repeats steps S601 to S619 until a predetermined number of candidate sentences have been found, or until no further candidate sentence can be found.

(Step S602)
The correction unit 133 repeats steps S602 to S608 for each current word string, that is, each word string currently being processed. When there is no current word string, as in the first iteration, the following processing selects the word that begins the sentence; otherwise, it selects a word that follows the word string. The word that follows a word string may also be the identifier marking the end of the sentence.

(Step S603)
In step S603, the correction unit 133 uses the inter-word related information 123 to identify the next candidate words that follow the current word string. The correction unit 133 determines whether the current string of one or more words matches a word string registered as a preceding word string in the inter-word related information 123 and, if so, takes the words registered as following it as the next candidate words. The next candidate words may include short words (such as particles) and fillers. If the current word string does not appear on the modifier side of the inter-word related information 123, the words that may come next are instead chosen from the words identified in the first-stage processing and the words used in the recognition results included in the recognition result information 220. Once the next candidate words have been identified, the process proceeds to step S604.
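Step S603 can be sketched as a lookup with a fallback. This sketch assumes the inter-word related information is a dict mapping preceding-word tuples to follow-word counts, and it prefers the longest registered context ending at the current word; the source only says that the current word string is matched against registered word strings, so the longest-suffix rule and the `max_context` parameter are assumptions.

```python
BOS = "（文頭）"  # sentence-head identifier treated as a word

def next_candidate_words(current, relations, fallback_words, max_context=2):
    """Words that may follow the current word string.

    current: tuple of words chosen so far (empty at the sentence head).
    relations: dict mapping a preceding-word tuple to a dict of follow-word counts.
    fallback_words: words from stage one and from the recognition results.
    """
    seq = (BOS,) + tuple(current)
    # Try the longest registered context first (assumption).
    for k in range(min(max_context, len(seq)), 0, -1):
        context = seq[-k:]
        if context in relations:
            return sorted(relations[context])
    # No registered context: fall back to stage-one and recognition words.
    return sorted(set(fallback_words))

relations = {(BOS,): {"ミカン": 1}, ("ミカン",): {"が": 2, "を": 1}}
print(next_candidate_words(("ミカン",), relations, ["あり", "ます"]))  # ['が', 'を']
```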

(Step S604)
The correction unit 133 repeats steps S604 to S607 for each next candidate word identified in step S603.

(Step S605)
In step S605, the correction unit 133 sets, as the next provisional word string, the current word string combined with one of the next candidate words identified in step S603. The process then proceeds to step S606.

(Step S606)
In step S606, the correction unit 133 calculates the probability that the next provisional word string is plausible as a word string. This probability can be regarded as a measure of how readily the words in the provisional string connect to one another. The correction unit 133 calculates the probability that the next provisional word string is a valid word string (one that makes sense as a sentence) from the frequency with which it appears among the combinations of preceding word string and following word registered in the inter-word related information 123. In other words, the probability of the next provisional word string is calculated from the co-occurrence frequency, in the user utterance corpus 121, of the word strings and words that make it up.
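One way to turn these co-occurrence frequencies into a probability is the maximum-likelihood n-gram estimate: multiply, for each word, its count after the preceding words divided by the total count for that context. This formula is an assumption, not one stated in the source; the sketch uses the longest registered context, applies no smoothing, and returns 0 for an unattested word string.

```python
BOS = "（文頭）"  # sentence-head identifier treated as a word

def word_string_probability(words, relations, max_context=2):
    """Product of relative frequencies P(word | preceding words).

    relations: dict mapping a preceding-word tuple to a dict of follow-word
    counts. The longest registered context is used; no smoothing.
    """
    seq = (BOS,) + tuple(words)
    prob = 1.0
    for i in range(1, len(seq)):
        for k in range(min(max_context, i), 0, -1):
            context = seq[i - k:i]
            if context in relations:
                counts = relations[context]
                prob *= counts.get(seq[i], 0) / sum(counts.values())
                break
        else:
            return 0.0  # no registered context at all: treat as unattested
    return prob

relations = {(BOS,): {"ミカン": 2}, ("ミカン",): {"が": 3, "を": 1}}
print(word_string_probability(("ミカン", "が"), relations))  # 0.75
```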

(Step S607)
In step S607, the correction unit 133 determines whether steps S605 and S606 have been executed for all next candidate words. If not, it executes steps S605 and S606 for the remaining next candidate words. Once they have been executed for all next candidate words, the process proceeds to step S608.

(Step S608)
In step S608, the correction unit 133 determines whether steps S603 to S607 have been executed for all current word strings. If not, it executes steps S603 to S607 for the remaining word strings. Once they have been executed for all word strings, the process proceeds to step S609.

(Step S609)
In step S609, the correction unit 133 sorts the next word strings in descending order of the probability calculated in step S606. The process then proceeds to step S610.

(Step S610)
The correction unit 133 repeats steps S610 to S615 for each next provisional word string.

(Step S611)
In step S611, the correction unit 133 calculates the sound similarity between the next provisional word string and the phoneme information indicated by the recognition result information 220. The process then proceeds to step S612.

(Step S612)
In step S612, the correction unit 133 determines whether the sound similarity calculated in step S611 exceeds a predetermined threshold. If it does (YES), the process proceeds to step S613; if it does not (NO), the process proceeds to step S614.

(Step S613)
In step S613, the correction unit 133 confirms the next provisional word string whose sound similarity exceeds the predetermined threshold as a next word string, and proceeds to step S614 in FIG. 7.
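Steps S609 to S613 amount to a sort followed by a threshold filter. The Python sketch below takes the similarity measure as a function argument, because the flowchart does not fix it at this point. The name `filter_by_sound` and the threshold value are hypothetical. Python's `sorted` is stable, so strings with equal probability keep their original order.

```python
from typing import Callable, List, Tuple

# A scored word string: (list of words, probability of the whole string).
Scored = Tuple[List[str], float]


def filter_by_sound(provisional: List[Scored],
                    similarity: Callable[[List[str]], float],
                    threshold: float) -> List[Scored]:
    """Sketch of steps S609-S613: sort by probability, keep acoustic matches."""
    # S609: descending order of the probability computed in S606.
    ranked = sorted(provisional, key=lambda item: item[1], reverse=True)
    confirmed: List[Scored] = []
    for words, prob in ranked:                # S610: one pass per string
        if similarity(words) > threshold:     # S611-S612: must strictly exceed
            confirmed.append((words, prob))   # S613: becomes a next word string
    return confirmed
```

Strings that fail the threshold are simply not confirmed. This matches the NO branch of step S612, which skips step S613.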

(Step S614)
In step S614, the correction unit 133 determines whether the number of word strings created so far, that is, the number of candidate sentences, is within a predetermined maximum. If it is, the process proceeds to step S615; if not, the process proceeds to step S616. The predetermined maximum keeps the number of candidate sentences for correction from growing so large that the processing becomes excessive. It is set by the operator of the correction device 100.

(Step S615)
In step S615, the correction unit 133 determines whether the processing of steps S611 to S614 has been executed for all of the next provisional word strings. If not, it executes steps S611 to S614 for each next provisional word string that has not yet been processed.

(Step S616)
In step S616, the correction unit 133 determines whether the word string has reached the end of the sentence. If it has (YES), the process proceeds to step S617; if not (NO), the process proceeds to step S619. This determination is based on whether the word string carries an end-of-sentence identifier.

(Step S617)
In step S617, the correction unit 133 determines whether the word string satisfies the conditions for a candidate sentence. If it does (YES), the process proceeds to step S618; if not (NO), the process proceeds to step S619. These conditions serve to find sentences that not only make sense as sentences but also do not differ greatly from the speech signal. In this embodiment there are two conditions. The first is that the difference between the number of words indicated by the recognition result information 220 and the number of words in the candidate sentence is equal to or less than a predetermined allowable word-count difference. The second is that the similarity between the sound of the entire speech signal indicated by the recognition result information 220 and the sound of the entire candidate sentence exceeds a predetermined threshold. When both conditions are satisfied, the correction unit 133 determines that the word string satisfies the candidate-sentence conditions; otherwise, it determines that it does not.
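The checks of steps S616 and S617 can be written as two predicates. Several details in this minimal Python sketch are assumptions:

- the spellings `<s>` and `</s>` for the sentence-start and end-of-sentence identifiers;
- the function names;
- leaving those identifiers out of the word count.

The whole-sentence sound similarity is passed in as a function, because the patent leaves the measure open.

```python
from typing import Callable, List

START = "<s>"   # hypothetical spelling of the "(head of sentence)" identifier
END = "</s>"    # hypothetical spelling of the "(end of sentence)" identifier


def reached_end(words: List[str]) -> bool:
    """Step S616 (sketch): the string ends with the end-of-sentence identifier."""
    return bool(words) and words[-1] == END


def is_candidate_sentence(words: List[str], recognized_words: List[str],
                          sentence_similarity: Callable[[List[str]], float],
                          max_word_gap: int, sim_threshold: float) -> bool:
    """Step S617 (sketch): both conditions of the embodiment must hold."""
    # Count only real words, not the start/end identifiers (assumption).
    content = [w for w in words if w not in (START, END)]
    gap_ok = abs(len(content) - len(recognized_words)) <= max_word_gap
    sound_ok = sentence_similarity(content) > sim_threshold
    return gap_ok and sound_ok
```

With a recognized result of three words and an allowed gap of one, a one-word string such as "(head of sentence) mikan (end of sentence)" fails the word-count condition. This mirrors the rejection described in the worked example.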

(Step S618)
In step S618, having determined that the word string satisfies the candidate-sentence conditions, the correction unit 133 confirms the word string as a candidate sentence. It then increments the candidate-sentence count, which indicates the number of candidate sentences created, and proceeds to step S619.

(Step S619)
In step S619, the correction unit 133 determines whether the processing of steps S603 to S618 has been executed for all of the current word strings. If not, it executes steps S603 to S618 for the next current word string. If the processing has been executed for all of the current word strings, the process proceeds to step S620.

(Step S620)
In step S620, the correction unit 133 determines whether a candidate sentence for correction has been found, based on whether any word string satisfied the conditions in step S617. If one has been found (YES), the process proceeds to step S621; if not (NO), the process proceeds to step S622.

(Step S621)
In step S621, the correction unit 133 takes the candidate sentences found in the repeated processing of steps S601 to S619. It selects the one with the highest word-string probability as the correction result, outputs the correction result, and ends the processing.

(Step S622)
In step S622, since no candidate sentence was found, the correction unit 133 determines that the text data indicated by the recognition result information 220 is not to be corrected. It then causes the output unit 140 to output information indicating that no correction was made.
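Steps S620 to S622 reduce to taking the maximum over the candidate sentences found, or reporting that there is no correction. The Python sketch below uses a hypothetical function name and returns `None` as an assumed "no correction" signal. The example probabilities in the check are the ones the worked example gives for data 837.

```python
from typing import List, Optional, Tuple

# A scored candidate sentence: (list of words, word-string probability).
Scored = Tuple[List[str], float]


def choose_correction(candidates: List[Scored]) -> Optional[List[str]]:
    """Steps S620-S622 (sketch): pick the most probable candidate sentence,
    or return None to signal "no correction" when none was found."""
    if not candidates:                                         # S620 NO -> S622
        return None
    best_words, _ = max(candidates, key=lambda item: item[1])  # S621
    return best_words
```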

Through the processing described above, the correction device 100 can accept the recognition result produced by a speech recognition device and output a corrected version of its character string. The speech recognition device 200 may misrecognize an utterance and output a result that differs from what the user said. Even then, the correction device 100 can output a corrected version of the character string indicated by the recognition result information 220, and so correct the speech recognition result.

<Specific example of the second-stage processing>
A specific example follows of how the correction device 100 operates when it executes the flowcharts shown in FIGS. 6 and 7. FIGS. 8 and 9 illustrate the data collected by the correction unit 133 and how that data is processed. The processing shown in FIGS. 6 and 7 is repeated until the sentence reaches its end. The iterations from the first through the n-th (where the end of the sentence is reached) are described below.

(First iteration)
FIG. 8(a) shows the first iteration of the repeated processing of FIGS. 6 and 7 by the correction unit 133. As shown in data set 811, assume that the correction device 100 receives the speech recognition result in the recognition result information 220. It consists of the words "mikan" (ミカン, mandarin orange), "owari" (終わり, end), and "masu" (ます, a polite verb ending).

Assume that the extraction unit 131 and the specifying unit 132 have performed the processing shown in FIG. 4 on the recognition result information 220. This yields data 812, the group of correction-candidate words. As shown in data 812, from the sentence "mikan owarimasu" (ミカン終わります) the extraction unit 131 extracts conversion candidates. These include "mikan" (ミカン, orange), "owari" (終わり, end), "masu" (ます), "o" (を, object particle), "wari" (割り, split), "mikan" (未完, unfinished), "ari" (あり, exist), "mi" (実, fruit), and so on. Everything up to this point corresponds to the first-stage processing (see FIG. 4). The correction unit 133 then builds candidate sentences for correction from the word group contained in data 812.

As shown in data 813, the correction unit 133 first selects a word to place at the beginning of the sentence from the word group in data 812. This selection corresponds to step S603 in FIG. 6. The words are taken in turn from the candidate words, that is, the words constituting the recognized character string in the recognition result information 220 and the words of the candidate phrases determined by the processing shown in FIG. 4. For each choice, the correction unit 133 calculates the probability that the selected word connects to the "(head of sentence)" identifier, that is, how readily the combination of "(head of sentence)" and the selected word occurs. This corresponds to step S606 in FIG. 6.

By the processing of step S609, the correction unit 133 sorts the next provisional word strings in descending order of the calculated probabilities, as shown in data 814 of FIG. 8.

The correction unit 133 calculates the sound similarity between the speech signal indicated by the recognition result information 220 and each of the next provisional word strings "(head of sentence) + mikan (orange)", "(head of sentence) + mikan (unfinished)", "(head of sentence) + owari (end)", and so on. This corresponds to step S611 in FIG. 6. The correction unit 133 then decides whether to use each string as a next word string according to whether its similarity exceeds the predetermined threshold. As shown in data 815 of FIG. 8, it decides to use "(head of sentence) + mikan (orange)" and "(head of sentence) + mikan (unfinished)" as next word strings because their similarity is high. It decides not to use "(head of sentence) + owari (end)" because its sound similarity is low and does not exceed the threshold.

Finally, the correction unit 133 determines whether the end of the sentence has been reached. None of the next word strings confirmed so far has reached the end of the sentence, since none carries the end-of-sentence identifier. The processing therefore moves on to the second iteration. This corresponds to the NO branch of step S616 in FIG. 7, which leads to step S619.

(Second iteration)
FIG. 8(b) shows how the data changes during the second iteration of the repeated processing of FIGS. 6 and 7 by the correction unit 133. Through the processing shown in FIG. 8(a), the correction unit 133 has obtained two word strings: one with "mikan (orange)" after the sentence start and one with "mikan (unfinished)" after the sentence start. The word group constituting the character string also remains available for use. The result is data 821. The current word strings in steps S602 and S603 are "(head of sentence) mikan (orange)" and "(head of sentence) mikan (unfinished)".

The correction unit 133 specifies the next candidate words that may follow the current word string. These are the words contained in data 812, together with short words, fillers, and the like that may follow the word string. As shown in data 822, the correction unit 133 selects particles such as "ga", "wa", "mo", "no", and "o" as the short words to add. The words shown in the upper part of data 821 in FIG. 8 and the words shown in data 822 are therefore selected as next candidate words. This corresponds to step S603. In addition, an "(end of sentence)" identifier is included among the words that may follow the current word string, in place of a word marking the end of the sentence. This identifier indicates that the sentence ends there.
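The pool of next candidate words described above could be assembled as in the following Python sketch. It combines the words of data 812, short words such as particles, fillers, and the end-of-sentence identifier. The function name, the `</s>` spelling of the identifier, and the de-duplication that keeps first occurrences are assumptions made for illustration.

```python
from typing import Iterable, List

END = "</s>"  # hypothetical spelling of the "(end of sentence)" identifier


def next_candidate_words(extracted: Iterable[str], short_words: Iterable[str],
                         fillers: Iterable[str]) -> List[str]:
    """Pool of words that may follow the current word string (sketch)."""
    pool: List[str] = []
    for word in [*extracted, *short_words, *fillers, END]:
        if word not in pool:  # keep the first occurrence, drop duplicates
            pool.append(word)
    return pool
```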

The correction unit 133 combines the current word string "(head of sentence) mikan (orange)" with each next candidate word from data 822 and data 812 to generate next word strings. This corresponds to step S605. It then treats each result as a next word string candidate and calculates its probability. For example, it combines "(head of sentence) mikan" with "ga" and calculates the probability of the resulting string. In this way, the correction unit 133 calculates the probabilities of the next provisional word strings shown in data 823. This corresponds to step S606.

Then, as shown in data 824, the correction unit 133 sorts the next provisional word strings in descending order of the calculated probabilities. This corresponds to step S609. Next, it calculates the similarity between each next provisional word string and the sound of the speech signal in the recognition result information 220. The sound similarity may be calculated using the Levenshtein distance, for example. Alternatively, it may be the character match rate between the romanized spelling of the next provisional word string and the romanized spelling indicated by the phoneme information in the recognition result information 220. As shown in data 825, the correction unit 133 confirms the strings whose sound similarity exceeds the predetermined threshold as next word strings. It then detects that each confirmed next word string either has not reached the end of the sentence or, if it has, does not satisfy the candidate-sentence conditions, and moves on to the next iteration. Here, the word string "(head of sentence) mikan (end of sentence)" has reached the end of the sentence but does not satisfy the candidate-sentence conditions. For example, the difference between the number of words in the word string and the number of words in the recognition result information 220 is larger than the predetermined threshold.
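One of the options mentioned above, a Levenshtein-based sound similarity over romanized text, might look like the following Python sketch. Normalizing the edit distance by the length of the longer string to get a score in [0, 1] is an assumption, as are the function names; the patent does not specify the normalization. The romanizations used in the checks are also illustrative: "mikanowarimasu" for ミカン終わります and "mikangaarimasu" for ミカンがあります.

```python
def levenshtein(a: str, b: str) -> int:
    """Classic edit distance (insertion, deletion, substitution each cost 1)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        cur = [i]
        for j, cb in enumerate(b, start=1):
            cur.append(min(prev[j] + 1,                # delete ca
                           cur[j - 1] + 1,             # insert cb
                           prev[j - 1] + (ca != cb)))  # substitute or match
        prev = cur
    return prev[-1]


def sound_similarity(candidate_romaji: str, recognized_romaji: str) -> float:
    """Map edit distance to [0, 1]; 1.0 means identical romanized strings."""
    longest = max(len(candidate_romaji), len(recognized_romaji))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(candidate_romaji, recognized_romaji) / longest
```

"mikangaarimasu" and "mikanowarimasu" differ by two substitutions over 14 characters, which gives a similarity of about 0.857.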

In this way, the correction unit 133 repeats the processing shown in FIGS. 6 and 7 until a next word string reaches the end of the sentence and satisfies the candidate-sentence conditions. A specific example of the iteration in which the correction result information 230 can finally be output is described below with reference to FIG. 9.

(n-th iteration)
FIG. 9 shows an example of the n-th iteration, the one in which the end of the sentence is reached. It follows the iterations shown in FIG. 8.

Assume that n-1 iterations have formed current word strings such as the following, as shown in data 831 of FIG. 9: "(head of sentence) mikan ga arimasu" (ミカンがあります, there are oranges), "(head of sentence) mikan o warimasu" (ミカンを割ります, I split the orange), and "(head of sentence) mikan ga owarimasu" (未完が終わります, the unfinished ends).

The correction unit 133 specifies the words that may follow these current word strings. It then forms next provisional word strings by combining each current word string with a word that may come next. For each next provisional word string, as shown in data 832, it calculates a probability indicating how readily the current word string connects to the appended word.

As shown in data 833, the correction unit 133 generates a list of the scored next provisional word strings sorted in descending order of probability. It then calculates, for each next provisional word string, its similarity to the sound of the speech signal indicated by the recognition result information 220, and determines whether that similarity exceeds the predetermined threshold. In the example of data 834, the sound similarity of "mikan ga arimasu" and "mikan o warimasu" exceeds the threshold. That of "mikan ga owarimasu mikan" (未完が終わりますミカン) does not.

For each next provisional word string whose sound similarity exceeds the threshold, the correction unit 133 checks whether it carries the "(end of sentence)" identifier. A string that carries it has reached the end of the sentence. Data 835 shows an example in which the word strings "mikan ga arimasu" and "mikan o warimasu" are determined to have reached the end of the sentence.

The correction unit 133 then determines whether each word string that has reached the end of the sentence satisfies the candidate-sentence conditions. First, it checks whether the difference between the number of words in the candidate sentence and the number of words in the recognition result information 220 is equal to or less than a predetermined threshold. Second, it checks whether the similarity between the sound of the candidate sentence and the sound of the speech signal indicated by the recognition result information 220 is equal to or greater than a predetermined threshold. Data 836 shows an example in which both next provisional word strings that reached the end of the sentence satisfy the candidate-sentence conditions.

The correction unit 133 obtains the word-string probability of each next provisional word string that satisfies the candidate-sentence conditions. In the example shown in data 837, the probability of the word string "mikan ga arimasu" is 0.0036, and that of "mikan o warimasu" is 0.003.

Accordingly, as shown in data 838, the correction unit 133 outputs the word string "mikan ga arimasu" as the correction result information 230. This string has reached the end of the sentence, satisfies the candidate-sentence conditions, and has the highest word-string probability.

As FIGS. 8 and 9 show, the correction device 100 thus changes the speech recognition result "mikan owarimasu" (ミカン終わります, the orange ends) into "mikan ga arimasu" (ミカンがあります, there are oranges), a sentence with a natural meaning.

<Summary>
As described above, the correction device 100 can correct the result of speech recognition on three bases:

- the dependency relations between phrases;
- how readily one or more word strings connect to the word that follows them;
- the similarity between the sound of a correction-candidate sentence and the sound of the speech signal used for recognition.

The conventional correction method uses only the reliability of the word strings in the speech recognition result, so the correction device 100 can correct more accurately than that method.

<Supplement>
The correction device according to the above embodiment is not limited to that embodiment and may, of course, be realized by other techniques. Various modifications are described below.

(1) Although not specifically described in the above embodiment, the speech signal may contain fillers such as "nn", "ee...", or "eetto" (um, er, well), or lack short words such as "ga" and "wa". Even then, the correction device 100 can output a corrected sentence (correction result information 230) that takes these filler expressions and short words into account. The user utterance corpus 121 may contain filler expressions. The inter-word related information 123 registers one or more word strings together with the words that follow them, so it also contains fillers whenever the user utterance corpus 121 does. As a result, the correction device 100 can also output sentences that include the filler expressions contained in the speech signal. Filler expressions and short words are ignored when the probability of a word string is calculated, so they do not affect that probability. The correction device 100 can therefore correct the recognition result in the recognition result information 220 regardless of whether the speech signal contains short words or fillers.
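Ignoring fillers and short words when scoring a word string could be sketched as below. The filler and short-word sets, the bigram table, and product-of-bigrams scoring are all assumptions for illustration. The patent states only that such words are ignored when the probability is calculated.

```python
from typing import Dict, List, Tuple

# Hypothetical sets; the patent only names examples such as "ga", "wa", "eetto".
FILLERS = {"nn", "ee", "eetto"}
SHORT_WORDS = {"ga", "wa", "mo", "no", "o"}


def string_probability(words: List[str],
                       bigram: Dict[Tuple[str, str], float]) -> float:
    """Word-string probability that skips fillers and short words (sketch)."""
    content = [w for w in words if w not in FILLERS and w not in SHORT_WORDS]
    prob = 1.0
    for prev, nxt in zip(content, content[1:]):  # consecutive content words
        prob *= bigram.get((prev, nxt), 0.0)
    return prob
```

With this scoring, "mikan ga ari masu" and "eetto mikan ari masu" receive the same probability, because both reduce to the content words "mikan ari masu".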

(2) In the above embodiment, the correction device 100 uses the word reliabilities contained in the recognition result information 220, but this is not limiting. For example, the correction device 100 may calculate the reliability of each word from how often it appears in the user utterance corpus 121, or the user may set the reliability in advance.
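For the frequency-based alternative in (2), one simple reading is to use each word's relative frequency in a tokenized corpus as its reliability. This is an assumption: the patent does not say how frequency is converted into reliability, and the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable, List


def frequency_reliability(corpus: Iterable[List[str]]) -> Dict[str, float]:
    """Reliability of each word = its relative frequency in a tokenized corpus."""
    counts = Counter(word for sentence in corpus for word in sentence)
    total = sum(counts.values())
    # An empty corpus yields no reliabilities rather than dividing by zero.
    return {word: n / total for word, n in counts.items()} if total else {}
```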

(3) In the above embodiment, the correction device 100 decides whether a created candidate sentence satisfies the candidate-sentence conditions by checking whether it satisfies both of two conditions (see step S617), but this is not limiting. Only one of the two conditions may be used for the determination.

(4) The correction device 100 described in the above embodiment may be incorporated into the speech recognition device 200.

(5) In the above embodiment, the speech recognition result is corrected by the processor of the correction device executing a correction program or the like. Alternatively, the correction may be realized by a logic circuit (hardware) or a dedicated circuit formed on an integrated circuit (IC chip), an LSI (Large Scale Integration), or the like in the device. These circuits may be realized by one or more integrated circuits, and the functions of the plurality of functional units described in the above embodiment may be realized by a single integrated circuit. Depending on the degree of integration, an LSI may also be called a VLSI, super LSI, or ultra LSI. That is, as shown in FIG. 10, the correction device 100 may be composed of an input circuit 110a, a storage circuit 120a, an extraction circuit 131a, a specifying circuit 132a, and a correction circuit 133a. Each of these has the same function as the functional unit of the same name in the correction device 100 described in the above embodiment.

The correction program may be stored on a processor-readable storage medium. A "non-transitory tangible medium" can serve as that medium, for example a tape, disk, card, semiconductor memory, or programmable logic circuit. The correction program may also be supplied to the processor via any transmission medium capable of transmitting it, such as a communication network or a broadcast wave. The present invention can also be realized in the form of a data signal embedded in a carrier wave, in which the correction program is embodied by electronic transmission.

The correction program can be implemented using, for example, a scripting language such as ActionScript or JavaScript (registered trademark), an object-oriented programming language such as Objective-C or Java (registered trademark), or a markup language such as HTML5.

(6) The configurations described in the above embodiment and in the supplementary notes may be combined as appropriate.

Reference Signs List
100 Correction device
110 Input unit
120 Storage unit
130 Control unit
131 Extraction unit
132 Specifying unit
133 Correction unit
140 Output unit

Claims

A correction device comprising:
a storage unit that stores inter-phrase related information indicating the relevance between phrases;
an input unit that receives input of recognition result information indicating one or more character strings resulting from speech recognition processing that converts a speech signal into text data;
an extraction unit that extracts phrase candidates from the character string indicated by the recognition result information;
a specifying unit that, using the inter-phrase related information, specifies phrases that may constitute the character string on the basis of the dependency relations between the phrase candidates extracted by the extraction unit; and
a correction unit that corrects the character string indicated by the recognition result information using the phrase candidates specified by the specifying unit.

前記特定部は、前記抽出部が抽出した文節のうち信頼度が他の文節よりも高い文節を基準として、当該信頼度が高い文節の係り先の候補となる１以上の係り先文節候補、または、他の文節からの係りを受ける係り元の候補となる１以上の係り元文節候補を、前記文字列を構成する可能性のある文節として特定する
ことを特徴とする請求項１に記載の補正装置。 The correction device according to claim 1, wherein the specifying unit takes as a reference a phrase whose reliability is higher than that of the other phrases extracted by the extraction unit, and specifies, as phrases that may constitute the character string, one or more dependency-destination phrase candidates that are candidates for the phrase modified by the high-reliability phrase, or one or more dependency-source phrase candidates that are candidates for the source of a modification that the high-reliability phrase receives from another phrase.
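The anchor-based lookup in the claim above can be illustrated with a minimal sketch. It assumes, purely for illustration, that the inter-phrase related information is a dictionary mapping each modifying phrase to the set of phrases it can modify, that each extracted phrase carries a numeric reliability score from the recognizer, and that phrases are written as romanized strings. The function name `candidates_around_anchor` and the sample data are hypothetical.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical representation of the inter-phrase related information:
# modifying (source) phrase -> set of phrases it may modify (destinations).
InterPhraseInfo = Dict[str, Set[str]]


def candidates_around_anchor(
    phrases: List[Tuple[str, float]], relations: InterPhraseInfo
) -> Tuple[str, Set[str], Set[str]]:
    """Return (anchor, destination candidates, source candidates).

    The anchor is the extracted phrase with the highest reliability score.
    Destination candidates are phrases the anchor may modify; source
    candidates are phrases that may modify the anchor.
    """
    anchor, _score = max(phrases, key=lambda p: p[1])
    destinations = set(relations.get(anchor, set()))
    sources = {src for src, dests in relations.items() if anchor in dests}
    return anchor, destinations, sources


if __name__ == "__main__":
    relations = {
        "ashita no": {"tenki wa", "yotei wa"},
        "tokyo no": {"tenki wa"},
        "tenki wa": {"hare desu"},
    }
    recognized = [("ashita no", 0.9), ("tenki wa", 0.95), ("hare desu", 0.4)]
    print(candidates_around_anchor(recognized, relations))
```

Here "tenki wa" is the most reliable phrase, so the sketch proposes "hare desu" as a phrase it may modify and "ashita no" or "tokyo no" as phrases that may modify it.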

前記特定部は、複数の係り先文節候補、又は、複数の係り元文節候補の中から、前記認識結果情報で示される文字列の音の類似度に基づいて、前記文字列を構成する可能性のある文節を特定する
ことを特徴とする請求項２に記載の補正装置。 The correction device according to claim 2, wherein the specifying unit specifies, from among a plurality of dependency-destination phrase candidates or a plurality of dependency-source phrase candidates, a phrase that may constitute the character string on the basis of the similarity in sound to the character string indicated by the recognition result information.
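The claim does not say how sound similarity is measured. As one possible stand-in, the sketch below compares romanized readings with the standard-library `difflib.SequenceMatcher` ratio and keeps the closest candidate; a real system would more likely compare phoneme or kana sequences. The function name and the sample readings are hypothetical.

```python
from difflib import SequenceMatcher
from typing import Iterable


def most_similar_sounding(recognized: str, candidates: Iterable[str]) -> str:
    """Pick the candidate whose reading is closest to the recognized reading.

    Assumption: readings are romanized strings, and "similarity in sound"
    is approximated by SequenceMatcher's ratio (0.0 to 1.0). Ties go to
    the candidate that appears first.
    """
    return max(
        candidates,
        key=lambda cand: SequenceMatcher(None, recognized, cand).ratio(),
    )


if __name__ == "__main__":
    # "hale desu" stands for a misrecognized reading of "hare desu".
    print(most_similar_sounding("hale desu", ["ame desu", "hare desu", "kumori desu"]))
```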

前記記憶部は、さらに、単語と単語との間の関連性を示す単語間関連情報を記憶し、
前記補正部は、前記特定部が特定した前記文字列を構成する可能性のある文節を、当該文節を構成している可能性のある単語候補に分割し、前記単語間関連情報を用いて、前記単語候補の中から前記文字列を構成する可能性のある文節を構成する可能性のある単語を特定して、前記補正を行う
ことを特徴とする請求項１〜３のいずれか１項に記載の補正装置。 The correction device according to any one of claims 1 to 3, wherein the storage unit further stores inter-word related information indicating relations between words, and
the correction unit divides each phrase specified by the specifying unit as possibly constituting the character string into word candidates that may constitute that phrase, uses the inter-word related information to specify, from among the word candidates, words that may constitute that phrase, and thereby performs the correction.

前記補正部は、前記文字列を構成する可能性のある単語を文章の文頭から確度の高い単語を選択し、選択した単語に続く可能性のある単語を、少なくとも前記文節を分割して得られる単語及び前記単語間関連情報に含まれる単語及び前記認識結果情報のいずれかの単語の中から選択することを繰り返すことで、前記文字列を補正する
ことを特徴とする請求項４に記載の補正装置。 The correction device according to claim 4, wherein the correction unit corrects the character string by selecting, starting from the beginning of the sentence, a highly probable word among the words that may constitute the character string, and repeatedly selecting a word that may follow the selected word from among at least the words obtained by dividing the phrases, the words included in the inter-word related information, and the words in the recognition result information.
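The left-to-right selection in the claim above resembles greedy decoding over a word-transition table. The sketch below assumes, for illustration only, that sentence-initial likelihood is a score per word, that the inter-word related information is a table of follow-on counts (`{word: {next_word: count}}`), and that `pool` is the union of words from the split phrases, the table, and the recognition result. All names and counts are hypothetical, and a length cap guards against cycles in the table.

```python
from typing import Dict, List, Set

# Hypothetical inter-word related information: word -> {next word: count}.
Transitions = Dict[str, Dict[str, int]]


def greedy_correct(
    start_scores: Dict[str, int],
    pool: Set[str],
    transitions: Transitions,
    max_len: int = 20,
) -> List[str]:
    """Build a word sequence from the sentence start, one word at a time.

    Start with the highest-scoring sentence-initial word, then repeatedly
    append the pool word that most often follows the last chosen word.
    Stop when no pool word is a known successor or max_len is reached.
    Ties are broken alphabetically so the result is deterministic.
    """
    current = max(sorted(start_scores), key=lambda w: start_scores[w])
    result = [current]
    while len(result) < max_len:
        followers = {
            w: c for w, c in transitions.get(current, {}).items() if w in pool
        }
        if not followers:
            break
        current = max(sorted(followers), key=lambda w: followers[w])
        result.append(current)
    return result


if __name__ == "__main__":
    transitions = {
        "ashita": {"no": 5},
        "no": {"tenki": 4, "tenkai": 1},
        "tenki": {"wa": 6},
        "wa": {"hare": 2},
    }
    pool = {"ashita", "no", "tenki", "tenkai", "wa", "hare"}
    print(greedy_correct({"ashita": 3, "asita": 1}, pool, transitions))
```

A greedy choice keeps the sketch short; a beam search over the same table would be a natural alternative if early mistakes turn out to be costly.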

前記単語間関連情報は、ユーザの発話例を示すテキストデータベースに基づいて生成された、１以上の単語列と、当該１以上の単語列に連なる単語とを対応付けた情報であることを特徴とする請求項４又は５に記載の補正装置。 The correction device according to claim 4 or 5, wherein the inter-word related information is information, generated on the basis of a text database of example user utterances, that associates one or more word strings with words that follow the one or more word strings.
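The inter-word related information described in the claim above is close to an n-gram table. A minimal sketch, assuming the example-utterance database has already been split into word lists and that "one or more word strings" means a fixed-length preceding context; the function name and the `context` parameter are hypothetical.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Tuple


def build_inter_word_info(
    sentences: List[List[str]], context: int = 1
) -> Dict[Tuple[str, ...], Counter]:
    """Map each run of `context` consecutive words to counts of the next word."""
    table: Dict[Tuple[str, ...], Counter] = defaultdict(Counter)
    for words in sentences:
        for i in range(context, len(words)):
            preceding = tuple(words[i - context:i])
            table[preceding][words[i]] += 1
    return dict(table)


if __name__ == "__main__":
    utterances = [
        ["ashita", "no", "tenki", "wa"],
        ["ashita", "no", "yotei", "wa"],
    ]
    for key, followers in build_inter_word_info(utterances).items():
        print(key, dict(followers))
```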

前記文節間関連情報は、ユーザの発話例を示すテキストデータベースに基づいて生成された、係り側文節を示す文字列と受け側文節を示す文字列とを対応付けた情報であることを特徴とする請求項１〜６のいずれか１項に記載の補正装置。 The correction device according to any one of claims 1 to 6, wherein the inter-phrase related information is information, generated on the basis of a text database of example user utterances, that associates a character string representing a modifying phrase with a character string representing the phrase it modifies.
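One way to derive the inter-phrase related information described in the claim above is to collect (modifying phrase, modified phrase) pairs from dependency-annotated example utterances. The sketch assumes, hypothetically, that each example is already segmented into phrases and annotated with head indices, where `-1` marks the phrase that modifies nothing (the sentence root). The annotation format and function name are assumptions, not taken from the source.

```python
from collections import defaultdict
from typing import Dict, List, Set, Tuple

# (phrases, head index for each phrase; -1 means the phrase has no head)
Example = Tuple[List[str], List[int]]


def build_inter_phrase_info(examples: List[Example]) -> Dict[str, Set[str]]:
    """Map each modifying phrase to the set of phrases it was seen modifying."""
    relations: Dict[str, Set[str]] = defaultdict(set)
    for phrases, heads in examples:
        for phrase, head in zip(phrases, heads):
            if head >= 0:
                relations[phrase].add(phrases[head])
    return dict(relations)


if __name__ == "__main__":
    examples = [
        (["ashita no", "tenki wa", "hare desu"], [1, 2, -1]),
        (["tokyo no", "tenki wa", "ame desu"], [1, 2, -1]),
    ]
    print(build_inter_phrase_info(examples))
```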

前記補正装置は、さらに、
前記補正部が補正した補正結果を出力する出力部を備える
ことを特徴とする請求項１〜７のいずれか１項に記載の補正装置。 The correction device according to any one of claims 1 to 7, further comprising an output unit that outputs the correction result produced by the correction unit.

音声信号をテキストデータに変換する音声認識処理の結果である１以上の文字列を示す認識結果情報の入力を受け付ける入力ステップと、
前記認識結果情報で示される文字列から文節の候補となる文節候補を抽出する抽出ステップと、
文節と文節との間の関連性を示す文節間関連情報を用いて、前記抽出ステップにおいて抽出した文節候補同士の係り受けに基づいて、前記文字列を構成する可能性のある文節を特定する特定ステップと、
前記特定ステップにおいて特定した文節候補を用いて、前記認識結果情報で示される文字列を補正する補正ステップとを含む補正方法。 A correction method comprising: an input step of receiving input of recognition result information indicating one or more character strings obtained as the result of speech recognition processing that converts a speech signal into text data;
an extraction step of extracting, from the character string indicated by the recognition result information, phrase candidates that are candidates for phrases;
a specifying step of using inter-phrase related information indicating relations between phrases to specify phrases that may constitute the character string, on the basis of the dependency relations between the phrase candidates extracted in the extraction step; and
a correction step of correcting the character string indicated by the recognition result information using the phrase candidates specified in the specifying step.

文節と文節との間の関連性を示す文節間関連情報を記憶するメモリにアクセス可能なコンピュータに、
音声信号をテキストデータに変換する音声認識処理の結果である１以上の文字列を示す認識結果情報の入力を受け付ける入力機能と、
前記認識結果情報で示される文字列から文節の候補となる文節候補を抽出する抽出機能と、
前記文節間関連情報を用いて、前記抽出機能が抽出した文節候補同士の係り受けに基づいて、前記文字列を構成する可能性のある文節を特定する特定機能と、
前記特定機能が特定した文節候補を用いて、前記認識結果情報で示される文字列を補正する補正機能とを実現させる補正プログラム。
A correction program that causes a computer capable of accessing a memory storing inter-phrase related information indicating relations between phrases to implement:
an input function of receiving input of recognition result information indicating one or more character strings obtained as the result of speech recognition processing that converts a speech signal into text data;
an extraction function of extracting, from the character string indicated by the recognition result information, phrase candidates that are candidates for phrases;
a specifying function of using the inter-phrase related information to specify phrases that may constitute the character string, on the basis of the dependency relations between the phrase candidates extracted by the extraction function; and
a correction function of correcting the character string indicated by the recognition result information using the phrase candidates specified by the specifying function.