JP3727173B2 - Speech recognition method and apparatus

Info

Publication number: JP3727173B2
Application number: JP17246998A
Authority: JP
Inventors: 均岩見田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1998-06-19
Filing date: 1998-06-19
Publication date: 2005-12-14
Anticipated expiration: 2018-06-19
Also published as: JP2000010583A

Description

【０００１】
【発明の属する技術分野】
本発明は、音声入力装置において音声を認識する方法及び装置に関する。
【０００２】
【従来の技術】
不特定話者を対象とした音声認識においては、入力された音声をどのような言葉と認識するかをあらかじめ文字列によって、認識語彙として与えておく方法が良く用いられている。したがって、入力された音声が認識語彙として事前に与えられていないものであった場合には、認識することができないという不都合があった。
【０００３】
かかる弊害を回避するべく、多くの認識語彙を事前に登録しておく必要が生じる。この場合に、音声を認識語彙に変換するのに使用する変換用辞書に「読みがな」と「表記文字」を１：１で登録しておくことも考えられるが、語彙の総数を考慮すると現実的ではないため、様々な工夫が行われている。特開平７−７３１７５号公報においては、複数の「読みがな」に対して一つの「表記文字」を指し示す方法が開示されている。
【０００４】
【発明が解決しようとする課題】
しかし、上記方法においても、不特定話者が読み方を誤った場合等には依然として対応できず、誤った「表記文字」を認識文字として出力するか、認識できない状態となるという問題点があった。
【０００５】
本発明は、かかる問題点を解決するために、不特定話者が読み方を誤った場合においても、意図した「表記文字」を認識文字として出力することが可能な音声を認識する方法及び装置を提供することを目的とする。
【０００６】
【課題を解決するための手段】
上記課題を解決するために本発明にかかる音声認識方法は、一つの項目が表記文字列と読みがな列とで構成される認識語彙リストを音声入力の認識語彙として与える音声認識方法において、入力された音声を分析する音響分析工程と、表記文字列の一部である部分表記文字列の読みがなである部分読みがな列に対応付けて当該部分表記文字列の読み変え読みがな列があらかじめ登録されている読みがな列変換表に従って、前記認識語彙リストの読みがな列に含まれる部分読みがな列を、前記音響分析工程による分析結果に含まれる読み変え読みがな列で置換した新たな読みがな列を一つ又は複数生成し、前記新たな読みがな列を読み変え読みがな列として前記認識語彙リストに追加する読みがな列変換工程と、前記音響分析工程による分析結果を前記認識語彙リストの読みがな列及び読み変え読みがな列と照合し、照合度合の指標である照合スコアを算出し、前記照合スコアの最も良い前記項目を選択する照合工程とを含むことを特徴とする。
【０００７】
かかる構成により、表記文字列と読みがな列とで構成される認識語彙リストに認識語彙として登録されていない音声入力である場合であっても、読み変え読みがな列として生成されていれば認識語彙として登録されている場合と同様の認識結果を出力することができる。したがって、読みがなを誤って音声入力した場合であっても、正しい表記文字列として認識することが可能となる。また、読みがなを一意に定めることが困難な文字列、例えば住所や人名等において読みがなを定めることが困難な場合であっても、複数の読みがな列を用いて照合することができることから、正しい表記文字列を認識する可能性をより高くすることができる。
【０００８】
次に、本発明にかかる音声認識装置は、一つの項目が表記文字列と読みがな列とで構成される認識語彙リストを音声入力の認識語彙として与える音声認識装置において、入力された音声を分析する音響分析部と、表記文字列の一部である部分表記文字列の読みがなである部分読みがな列に対応付けて当該部分表記文字列の読み変え読みがな列があらかじめ登録されている読みがな列変換表に従って、前記認識語彙リストの読みがな列に含まれる部分読みがな列を、前記音響分析部による分析結果に含まれる読み変え読みがな列で置換した新たな読みがな列を一つ又は複数生成し、前記新たな読みがな列を読み変え読みがな列として前記認識語彙リストに追加する読みがな列変換部と、前記音響分析部による分析結果を前記認識語彙リストの読みがな列及び読み変え読みがな列と照合し、照合度合の指標である照合スコアを算出し、前記照合スコアの最も良い前記項目を選択する照合部を含むことを特徴とする。
【０００９】
かかる構成により、表記文字列と読みがな列とで構成される認識語彙リストに認識語彙として登録されていない音声入力である場合であっても、読み変え読みがな列として生成されていれば認識語彙として登録されている場合と同様の認識結果を出力することができる。したがって、読みがなを誤って音声入力した場合であっても、正しい表記文字列として認識することが可能となる。また、読みがなを一意に定めることが困難な文字列、例えば住所や人名等において読みがなを定めることが困難な場合であっても、複数の読みがな列を用いて照合することができることから、正しい表記文字列を認識する可能性をより高くすることができる。
【００１０】
次に、本発明にかかるコンピュータに実行させるプログラムを記録したコンピュータ読み取り可能な記録媒体は、一つの項目が表記文字列と読みがな列とで構成される認識語彙リストを音声入力の認識語彙として与えるコンピュータに実行させるプログラムを記録したコンピュータ読み取り可能な記録媒体において、入力された音声を分析する音響分析手順と、表記文字列の一部である部分表記文字列の読みがなである部分読みがな列に対応付けて当該部分表記文字列の読み変え読みがな列があらかじめ登録されている読みがな列変換表に従って、前記認識語彙リストの読みがな列に含まれる部分読みがな列を、前記音響分析手順による分析結果に含まれる読み変え読みがな列で置換した新たな読みがな列を一つ又は複数生成し、前記新たな読みがな列を読み変え読みがな列として前記認識語彙リストに追加する読みがな列変換手順と、前記音響分析手順による分析結果を前記認識語彙リストの読みがな列及び読み変え読みがな列と照合し、照合度合の指標である照合スコアを算出し、前記照合スコアの最も良い前記項目を選択する照合手順をコンピュータに実行させるプログラムを記録したことを特徴とする。
【００１１】
かかる構成により、コンピュータ上へ当該プログラムをロードさせ実行することで、表記文字列と読みがな列とで構成される認識語彙リストに認識語彙として登録されていない音声入力である場合であっても、読み変え読みがな列として生成されていれば認識語彙として登録されている場合と同様の認識結果を出力することができる。したがって、読みがなを誤って音声入力した場合であっても、正しい表記文字列として認識することが可能となる音声認識装置を実現することができる。
【００１２】
【発明の実施の形態】
以下、本発明の実施形態にかかる音声認識方法及び装置について、図１を参照しながら説明する。図１は本発明の実施形態にかかる音声認識装置１の構成図を示す。
【００１３】
図１において、１１は音響分析部を示し、不特定話者から入力された音声を分析する役割を果たす。音響分析部１１については従来からの音声認識装置に適用されているものと違いはない。
【００１４】
１２は読みがな列変換部を示す。読みがな列変換部１２は、一つの項目が表記文字列と読みがな列とで構成される認識語彙リスト１５の中に、読みがな列変換表１４に記述された部分読みがな列が存在し、当該部分読みがな列に対応する部分表記文字列が、読みがな列変換表１４に記述された部分表記文字列と一致する場合に、当該部分読みがな列を読み変え読みがな列に変換した新たな読みがな列を生成し、新たな読みがな列を読み変え読みがな列として語彙認識リスト１５の当該項目に追加する。したがって、認識語彙リスト１５を自動的に更新する役割を果たす。すなわち、認識語彙リスト１５は表記文字列と読みがな列で構成されるものであるが、さらに読み変え読みがな列を追加することが可能となる。追加される読み変え読みがな列は一つに限定されるものではなく、二以上の複数であっても構わない。かかる構成とすることで、新たに追加された読み変え読みがな列を用いることで認識語彙リスト１５が拡張され、認識語彙として事前に登録されていない読みがな列についても新たな認識語彙として用いることが可能となる。
【００１５】
１３は照合部を示し、音響分析部１１で分析された音声を読みがな列変換部１２で作成された認識語彙リスト１５と照合することで、最も音声入力に近い表記文字列を選択して、音声認識結果として出力する役割を果たす。具体的には、音響分析部１１で分析された音声を認識語彙リスト１５と照合して、照合している程度を表わす指標である照合スコア値を算出して、照合スコア値の良い、すなわち照合度合の高い表記文字列から順に音声認識結果として出力することになる。
【００１６】
以上のように本実施形態によれば、登録されていない読みがな列についても、読み変え読みがな列が追加されることにより、住所や氏名等のように読みがなを特定することが比較的困難なものが対象であっても、より正しい認識語彙を選択することができる。
【００１７】
次に本発明の一実施例について、図２を参照しながら説明する。図２は本発明の一実施例における音声認識装置の構成図を示す。図２において、不特定話者が音声を入力すると、音響分析部１１によって入力した音声が分析される。例えば、「みなとまち」という音声を入力する。
【００１８】
一方、あらかじめ認識語彙としては、読みがな列「みなとちょう」に対して表記文字列「港町」が登録されているが、読みがな列「みなとまち」に対して表記文字列「港町」が登録されていない場合には、従来の音声認識方法によれば、認識語彙なしと判断される。
【００１９】
本発明にかかる音声認識装置においては、項目検索部１２１において、表記文字列と読みがな列とで構成される認識語彙リスト１５の中に、読みがな列変換表１４に記述された部分読みがな列が存在し、当該部分読みがな列に対応する部分表記文字列が、読みがな列変換表１４に記述された部分表記文字列と一致する場合に、読み変え読みがな列追加部１２２において、当該部分読みがな列を部分読みがな列に変換した新たな読みがな列を生成し、新たな読みがな列を読み変え読みがな列として認識語彙リスト１５の当該項目に追加することができる。例えば、読みがな列変換表が（表１）に示すように記述されている場合、与えられた認識語彙リスト（表２）に対して、表記文字列「港町」の読み変え読みがな列「みなとまち」と表記文字列「大和東」の読み変え読みがな列「だいわひがし」を自動的に追加した、新しい認識語彙リスト（表３）が生成される。
【００２０】
【表１】

【００２１】
【表２】

【００２２】
【表３】

【００２３】
音声入力された「みなとまち」という言葉は、照合部１３で認識語彙リスト１５に指定されている認識語彙と照合を行う。照合した結果どの程度照合しているのか判断する指標として、スコア算出部１３１で照合スコアが算出される。ここで、照合スコアとは、ある音声入力に対して認識語彙の各項目ごとに計算されるものであり、入力音声がどの程度その項目らしいかを示す尺度を意味する。
【００２４】
照会スコア算出方法の代表的な方法として、ある項目の周波数の特徴、例えば「ま」という読みの周波数の特徴を時系列にパターン化したものと、入力された音声の周波数の特徴を時系列に表したものとを比較して、双方のベクトル間のユークリッド距離を時系列方向に累積したものをスコアとする方法がある。この方法によると、双方がまったく同じパターンであればユークリッド距離はゼロとなるため、スコアはゼロとなる。逆にパターンの相違が大きいほどスコア値が大きく、すなわち双方のパターンが一致しないと判断される。
【００２５】
もちろん、かかる方法に限定されるわけではなく、他の方法でスコアを算出しても良い。例えば、標準的な音声パターンの出現確率を時系列方向に乗算したものも考えられる。
【００２６】
そして、各項目の照合スコアを項目スコア決定部１３２で決定して、上位項目選択部１３３において、照合スコア値の優れた語彙から順に順位付けを行う。かかる順位付けの最も高い語彙が認識結果として出力される。
【００２７】
以上のように、本実施例によれば、認識語彙としてすべての読みがな列を登録すること無く、読み変え読みがな列を追加することにより、効率的な音声認識装置を構成することが可能となる。
【００２８】
次に、本発明の実施形態にかかる音声認識装置を実現するプログラムの処理の流れについて説明する。図３に本発明の実施形態にかかる音声認識装置を実現するプログラムの処理の流れ図を示す。
【００２９】
まず、入力された音声に対して、読みがな変換表に登録されている各項目の読みがな列の照合スコア値の計算を行う（ステップ３１１）。この場合は、従来と同様、認識語彙として登録されていないものについては、照合スコア値は最低となる。
【００３０】
次に、認識語彙リストに登録されている各項目の読み変え読みがな列の照合スコア値の計算を行う（ステップ３１２）。このステップで、認識語彙としての登録が無くても、読み変え読みがな列として追加されていれば、自動的に認識語彙が生成されることにより認識語彙として判断される範囲が拡大して、認識語彙が登録されているのと同様の照合スコア値が算出される。
【００３１】
そして、ステップ３１１及びステップ３１２の算出結果に基づいて、算出対象となっている項目についての照合スコアが決定する（ステップ３２１）。以上の処理を認識語彙リストに登録されている全項目について繰り返し行う（ステップ３２２）。そして、全項目の照合スコア値が算出されたところで、照合スコア値の優れたものから順に選択して入力音声に対する認識結果として出力する（ステップ３３１）。
【００３２】
また、本発明の実施形態にかかる音声認識装置を実現するプログラムを記憶した記録媒体は、図４に示す記録媒体の例に示すように、ＣＤ−ＲＯＭやフロッピーディスク等の可搬型記録媒体だけでなく、通信回線の先に備えられた他の記憶装置や、コンピュータのハードディスクやＲＡＭ等の記録媒体のいずれでも良く、プログラム実行時には、プログラムはローディングされ、主メモリ上で実行される。
【００３３】
また、本発明の実施形態にかかる音声認識装置により生成された読みがな列変換表等を記録した記録媒体も、図４に示す記録媒体の例に示すように、ＣＤ−ＲＯＭやフロッピーディスク等の可搬型記録媒体だけでなく、通信回線の先に備えられた他の記憶装置や、コンピュータのハードディスクやＲＡＭ等の記録媒体のいずれでも良く、例えば本発明にかかる音声認識装置を利用する際にコンピュータにより読み取られる。
【００３４】
【発明の効果】
以上のように本発明にかかる音声認識方法によれば、認識語彙としてすべての読みがな列を事前に登録すること無く、音声入力が正しい読みがなの通りにされなかった場合においても、希望する認識結果を得ることが可能となる。
【図面の簡単な説明】
【図１】本発明の実施形態にかかる音声認識装置の概略構成図
【図２】本発明の一実施例における音声認識装置の概略構成図
【図３】本発明の実施形態における音声認識装置の処理の流れ図
【図４】記録媒体の例示図
【符号の説明】
１音声認識装置
１１音響分析部
１２読みがな列変換部
１３照合部
１４読みがな列変換表
１５認識語彙リスト
４１回線先の記憶装置
４２ＣＤ−ＲＯＭやフロッピーディスク等の可搬型記録媒体
４２−１ＣＤ−ＲＯＭ
４２−２フロッピーディスク
４３コンピュータ
４４コンピュータ上のＲＡＭ／ハードディスク等の記録媒体
１２１項目検索部
１２２読み変え読みがな列追加部
１３１スコア算出部
１３２項目スコア決定部
１３３上位項目選択部
[0001]
[Technical field of the invention]
The present invention relates to a method and apparatus for recognizing speech in a speech input device.
[0002]
[Prior art]
In speaker-independent speech recognition, a commonly used approach is to specify in advance, as character strings, the words that input speech may be recognized as; these strings form the recognition vocabulary. Consequently, input speech that was not supplied in advance as recognition vocabulary cannot be recognized.
[0003]
To avoid this drawback, a large recognition vocabulary must be registered in advance. One could register each “reading” (yomigana) and its “written form” one-to-one in the conversion dictionary used to convert speech into recognition vocabulary, but given the total number of words this is impractical, so various techniques have been devised. Japanese Patent Application Laid-Open No. 7-73175 discloses a method in which a plurality of “readings” point to a single “written form”.
[0004]
[Problems to be solved by the invention]
However, even the above method still cannot cope with cases such as a speaker misreading a word: the system either outputs an incorrect “written form” as the recognized text or fails to recognize the input at all.
[0005]
To solve this problem, an object of the present invention is to provide a speech recognition method and apparatus that can output the intended “written form” as the recognized text even when a speaker misreads the word.
[0006]
[Means for Solving the Problems]
To solve the above problem, the speech recognition method according to the present invention is a speech recognition method in which a recognition vocabulary list, each item of which consists of a written character string and a reading string, is given as the recognition vocabulary for speech input, the method comprising: an acoustic analysis step of analyzing the input speech; a reading string conversion step of, in accordance with a reading string conversion table in which an alternate reading string for a partial written character string (a part of a written character string) is registered in advance in association with the partial reading string that is the reading of that partial written character string, generating one or more new reading strings by replacing a partial reading string contained in a reading string of the recognition vocabulary list with the alternate reading string contained in the analysis result of the acoustic analysis step, and adding each new reading string to the recognition vocabulary list as an alternate reading string; and a collation step of collating the analysis result of the acoustic analysis step against the reading strings and alternate reading strings of the recognition vocabulary list, calculating a collation score that is an index of the degree of match, and selecting the item with the best collation score.
[0007]
With this configuration, even when the speech input is not registered as recognition vocabulary in the recognition vocabulary list of written character strings and reading strings, the same recognition result as for registered vocabulary can be output, provided the input has been generated as an alternate reading string. Therefore, even when a word is spoken with the wrong reading, it can be recognized as the correct written character string. Moreover, for character strings whose reading is hard to determine uniquely, such as addresses and personal names, collation can use a plurality of reading strings, which raises the likelihood of recognizing the correct written character string.
[0008]
Next, the speech recognition apparatus according to the present invention is a speech recognition apparatus in which a recognition vocabulary list, each item of which consists of a written character string and a reading string, is given as the recognition vocabulary for speech input, the apparatus comprising: an acoustic analysis unit that analyzes the input speech; a reading string conversion unit that, in accordance with a reading string conversion table in which an alternate reading string for a partial written character string (a part of a written character string) is registered in advance in association with the partial reading string that is the reading of that partial written character string, generates one or more new reading strings by replacing a partial reading string contained in a reading string of the recognition vocabulary list with the alternate reading string contained in the analysis result of the acoustic analysis unit, and adds each new reading string to the recognition vocabulary list as an alternate reading string; and a collation unit that collates the analysis result of the acoustic analysis unit against the reading strings and alternate reading strings of the recognition vocabulary list, calculates a collation score that is an index of the degree of match, and selects the item with the best collation score.
[0009]
With this configuration, even when the speech input is not registered as recognition vocabulary in the recognition vocabulary list of written character strings and reading strings, the same recognition result as for registered vocabulary can be output, provided the input has been generated as an alternate reading string. Therefore, even when a word is spoken with the wrong reading, it can be recognized as the correct written character string. Moreover, for character strings whose reading is hard to determine uniquely, such as addresses and personal names, collation can use a plurality of reading strings, which raises the likelihood of recognizing the correct written character string.
[0010]
Next, the computer-readable recording medium according to the present invention records a program to be executed by a computer in which a recognition vocabulary list, each item of which consists of a written character string and a reading string, is given as the recognition vocabulary for speech input, the program causing the computer to execute: an acoustic analysis procedure of analyzing the input speech; a reading string conversion procedure of, in accordance with a reading string conversion table in which an alternate reading string for a partial written character string (a part of a written character string) is registered in advance in association with the partial reading string that is the reading of that partial written character string, generating one or more new reading strings by replacing a partial reading string contained in a reading string of the recognition vocabulary list with the alternate reading string contained in the analysis result of the acoustic analysis procedure, and adding each new reading string to the recognition vocabulary list as an alternate reading string; and a collation procedure of collating the analysis result of the acoustic analysis procedure against the reading strings and alternate reading strings of the recognition vocabulary list, calculating a collation score that is an index of the degree of match, and selecting the item with the best collation score.
[0011]
With this configuration, by loading the program onto a computer and executing it, the same recognition result as for registered vocabulary can be output even when the speech input is not registered as recognition vocabulary in the recognition vocabulary list of written character strings and reading strings, provided the input has been generated as an alternate reading string. It is therefore possible to realize a speech recognition apparatus that recognizes the correct written character string even when a word is spoken with the wrong reading.
[0012]
[Embodiments of the invention]
Hereinafter, a speech recognition method and apparatus according to an embodiment of the present invention will be described with reference to FIG. FIG. 1 shows a configuration diagram of a speech recognition apparatus 1 according to an embodiment of the present invention.
[0013]
In FIG. 1, reference numeral 11 denotes an acoustic analysis unit, which plays a role of analyzing speech input from an unspecified speaker. The acoustic analysis unit 11 is not different from that applied to a conventional speech recognition apparatus.
[0014]
Reference numeral 12 denotes a reading string conversion unit. When the recognition vocabulary list 15, each item of which consists of a written character string and a reading string, contains a partial reading string described in the reading string conversion table 14, and the partial written character string corresponding to that partial reading string matches the partial written character string described in the reading string conversion table 14, the reading string conversion unit 12 generates a new reading string by converting the partial reading string into the alternate reading string, and adds the new reading string to that item of the recognition vocabulary list 15 as an alternate reading string. The unit thus updates the recognition vocabulary list 15 automatically. That is, although the recognition vocabulary list 15 consists of written character strings and reading strings, alternate reading strings can additionally be appended to it. The number of alternate reading strings added is not limited to one; two or more may be added. With this configuration, the newly added alternate reading strings extend the recognition vocabulary list 15, so that reading strings not registered in advance as recognition vocabulary can also be used as new recognition vocabulary.
[0015]
Reference numeral 13 denotes a collation unit, which collates the speech analyzed by the acoustic analysis unit 11 against the recognition vocabulary list 15 created by the reading string conversion unit 12, selects the written character string closest to the speech input, and outputs it as the speech recognition result. Specifically, it collates the analyzed speech against the recognition vocabulary list 15, calculates a collation score value that is an index of the degree of match, and outputs written character strings as speech recognition results in order from the best collation score value, that is, from the highest degree of match.
[0016]
As described above, according to this embodiment, alternate reading strings are added to cover reading strings that are not registered, so a more correct recognition vocabulary item can be selected even for targets whose reading is relatively hard to pin down, such as addresses and personal names.
[0017]
Next, a working example of the present invention will be described with reference to FIG. 2. FIG. 2 shows a configuration diagram of the speech recognition apparatus in this working example. In FIG. 2, when an arbitrary speaker inputs speech, the acoustic analysis unit 11 analyzes the input speech. For example, suppose the speech “Minatomachi” (みなとまち) is input.
[0018]
Meanwhile, suppose that the written character string 港町 is registered in advance as recognition vocabulary for the reading string “Minatocho” (みなとちょう), but not for the reading string “Minatomachi” (みなとまち). A conventional speech recognition method would then determine that there is no matching recognition vocabulary.
[0019]
In the speech recognition apparatus according to the present invention, the item search unit 121 checks whether the recognition vocabulary list 15, which consists of written character strings and reading strings, contains a partial reading string described in the reading string conversion table 14, and whether the partial written character string corresponding to that partial reading string matches the partial written character string described in the reading string conversion table 14. When both conditions hold, the alternate reading string addition unit 122 generates a new reading string by converting the partial reading string into the alternate reading string, and adds the new reading string to that item of the recognition vocabulary list 15 as an alternate reading string. For example, when the reading string conversion table is as shown in Table 1, then for the given recognition vocabulary list (Table 2), a new recognition vocabulary list (Table 3) is generated in which the alternate reading string “Minatomachi” (みなとまち) for the written character string 港町 and the alternate reading string “Daiwahigashi” (だいわひがし) for the written character string 大和東 have been added automatically.
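The expansion performed by the item search unit 121 and the alternate reading string (読み変え読みがな列) addition unit 122 can be sketched roughly as follows. This is a minimal illustration, not the patented implementation: the class and function names are hypothetical, the single rule (町, ちょう, まち) is an assumed entry because Table 1 is not reproduced in this text, and the substring test is a simplification that does not verify character-level alignment between the written form and its reading.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class VocabItem:
    written: str                      # written character string, e.g. "港町"
    reading: str                      # registered reading string
    alt_readings: List[str] = field(default_factory=list)


@dataclass(frozen=True)
class ConversionRule:
    partial_written: str              # partial written character string
    partial_reading: str              # its registered partial reading string
    alt_reading: str                  # alternate reading string to substitute


def expand(items: List[VocabItem], rules: List[ConversionRule]) -> List[VocabItem]:
    """Append alternate readings to every item matched by a rule.

    Simplification: an item matches when its written form contains the
    rule's partial written string and its reading contains the rule's
    partial reading string; only the first occurrence is replaced.
    """
    for item in items:
        for rule in rules:
            if (rule.partial_written in item.written
                    and rule.partial_reading in item.reading):
                new_reading = item.reading.replace(
                    rule.partial_reading, rule.alt_reading, 1)
                if new_reading != item.reading and new_reading not in item.alt_readings:
                    item.alt_readings.append(new_reading)
    return items


# Assumed rule and list contents (Tables 1 and 2 are not available here).
rules = [ConversionRule("町", "ちょう", "まち")]
vocab = [VocabItem("港町", "みなとちょう")]
expand(vocab, rules)
print(vocab[0].alt_readings)
```

Keeping the alternate readings on the same item, rather than creating a new item, mirrors the description: every reading of an item resolves to the same written character string.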
[0020]
[Table 1]

[0021]
[Table 2]

[0022]
[Table 3]

[0023]
The spoken input “Minatomachi” is collated by the collation unit 13 against the recognition vocabulary specified in the recognition vocabulary list 15. The score calculation unit 131 calculates a collation score as an index of how well the input matches. Here, the collation score is calculated for each item of the recognition vocabulary for a given speech input, and is a measure of how likely the input speech is to be that item.
[0024]
A typical method of calculating the collation score compares the frequency features of an item (for example, the frequency features of the reading “ma” (ま)) arranged as a time-series pattern with the frequency features of the input speech expressed as a time series, and takes as the score the Euclidean distance between the two vectors accumulated along the time axis. With this method, if the two patterns are exactly the same, the Euclidean distance is zero and so the score is zero. Conversely, the more the patterns differ, the larger the score value, which is interpreted as the two patterns not matching.
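The accumulated Euclidean distance described above can be illustrated with a short sketch. It assumes that the template and the input have already been converted into equal-length sequences of feature vectors and are compared frame by frame; the text does not say how frames are aligned, so that choice, like the function name and the toy vectors, is an assumption.

```python
import math


def accumulated_distance(template, observed):
    """Sum of frame-wise Euclidean distances along the time axis.

    0.0 means the two patterns are identical; larger values mean a
    worse match. Assumes both sequences have the same number of frames.
    """
    if len(template) != len(observed):
        raise ValueError("this sketch assumes equal-length frame sequences")
    return sum(math.dist(t, o) for t, o in zip(template, observed))


# Toy two-frame, two-dimensional feature sequences (made-up values).
template = [(1.0, 2.0), (3.0, 4.0)]
same = accumulated_distance(template, template)
shifted = accumulated_distance(template, [(1.0, 2.0), (6.0, 8.0)])
print(same, shifted)
```

In this scheme a smaller score is a better score, so “best collation score” means the minimum over the candidate items.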
[0025]
Of course, the score calculation is not limited to this method; other methods may be used. For example, the score may be the product, along the time axis, of the occurrence probabilities of standard speech patterns.
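The probability-based alternative can be sketched in the same spirit. The per-frame probabilities below are made-up inputs; how they would be obtained from standard speech patterns is not specified in the text. Summing logarithms instead of multiplying directly is a common numerical precaution added here, not something the source prescribes.

```python
import math


def likelihood_score(frame_probs):
    """Product of per-frame occurrence probabilities along the time axis.

    Computed as exp(sum(log p)) to limit underflow on long sequences.
    Unlike a distance score, a larger value means a better match.
    """
    if any(p <= 0.0 for p in frame_probs):
        return 0.0
    return math.exp(sum(math.log(p) for p in frame_probs))


good = likelihood_score([0.9, 0.8, 0.9])
poor = likelihood_score([0.2, 0.1, 0.3])
print(good > poor)
```

Because the direction of “better” is reversed relative to the distance-based score, the ranking step has to sort in the opposite order when this kind of score is used.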
[0026]
The item score determination unit 132 then determines the collation score of each item, and the top item selection unit 133 ranks the vocabulary in order from the best collation score value. The highest-ranked vocabulary item is output as the recognition result.
[0027]
As described above, according to this working example, an efficient speech recognition apparatus can be configured by adding alternate reading strings, without registering every reading string as recognition vocabulary.
[0028]
Next, a process flow of a program that realizes the speech recognition apparatus according to the embodiment of the present invention will be described. FIG. 3 shows a flowchart of processing of a program that realizes the speech recognition apparatus according to the embodiment of the present invention.
[0029]
First, for the input speech, the collation score value of the reading string of each item registered in the reading conversion table is calculated (step 311). At this point, as in the conventional art, input that is not registered as recognition vocabulary receives the worst collation score value.
[0030]
Next, the collation score value of the alternate reading string of each item registered in the recognition vocabulary list is calculated (step 312). In this step, even if an input is not registered as recognition vocabulary, as long as it has been added as an alternate reading string, the automatically generated vocabulary widens the range treated as recognition vocabulary, and a collation score value equivalent to that of registered recognition vocabulary is calculated.
[0031]
Then, based on the results of step 311 and step 312, the collation score for the item under calculation is determined (step 321). This processing is repeated for every item registered in the recognition vocabulary list (step 322). Once the collation score values of all items have been calculated, items are selected in order from the best collation score value and output as the recognition result for the input speech (step 331).
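The flow of steps 311 to 331 can be sketched as a loop over the vocabulary items. Several things here are assumptions rather than statements from the text: the item score in step 321 is taken as the better (smaller) of the two results, the scoring function is a toy character-mismatch count standing in for real acoustic scoring, and the registered reading やまとひがし for 大和東 is a guess because Table 2 is not reproduced.

```python
def recognize(observed, items, score_fn):
    """Rank vocabulary items for one input; a lower score is better.

    items: iterable of (written, reading, alt_readings) tuples.
    """
    ranked = []
    for written, reading, alt_readings in items:               # step 322: every item
        base = score_fn(observed, reading)                     # step 311
        alts = [score_fn(observed, r) for r in alt_readings]   # step 312
        ranked.append((min([base, *alts]), written))           # step 321 (assumed: best wins)
    ranked.sort(key=lambda pair: pair[0])                      # step 331: best first
    return ranked


def toy_score(observed, reading):
    """Stand-in for acoustic scoring: mismatched positions plus length gap."""
    mismatches = sum(a != b for a, b in zip(observed, reading))
    return mismatches + abs(len(observed) - len(reading))


items = [
    ("港町", "みなとちょう", ["みなとまち"]),
    ("大和東", "やまとひがし", ["だいわひがし"]),  # registered reading is assumed
]
result = recognize("みなとまち", items, toy_score)
print(result[0][1])
```

With the alternate reading present, the item 港町 scores as well as if みなとまち had been registered directly, which is the effect the description attributes to step 312.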
[0032]
As shown in the example of recording media in FIG. 4, the recording medium storing the program that implements the speech recognition apparatus according to the embodiment of the present invention is not limited to a portable recording medium such as a CD-ROM or floppy disk; it may be another storage device provided at the far end of a communication line, or a recording medium such as a computer's hard disk or RAM. At execution time, the program is loaded and executed in main memory.
[0033]
Likewise, as shown in the example of recording media in FIG. 4, the recording medium on which the reading string conversion table and similar data generated by the speech recognition apparatus according to the embodiment of the present invention are recorded is not limited to a portable recording medium such as a CD-ROM or floppy disk; it may be another storage device provided at the far end of a communication line, or a recording medium such as a computer's hard disk or RAM. The data is read by the computer, for example, when the speech recognition apparatus according to the present invention is used.
[0034]
[Effects of the invention]
As described above, the speech recognition method according to the present invention makes it possible to obtain the desired recognition result even when the speech input does not follow the correct reading, without registering every reading string as recognition vocabulary in advance.
[Brief description of the drawings]
[FIG. 1] Schematic configuration diagram of a speech recognition apparatus according to an embodiment of the present invention
[FIG. 2] Schematic configuration diagram of a speech recognition apparatus in a working example of the present invention
[FIG. 3] Flowchart of processing in the speech recognition apparatus according to the embodiment of the present invention
[FIG. 4] Illustration of example recording media
[Explanation of symbols]
1 Speech recognition apparatus
11 Acoustic analysis unit
12 Reading string conversion unit
13 Collation unit
14 Reading string conversion table
15 Recognition vocabulary list
41 Storage device at the far end of a communication line
42 Portable recording media such as CD-ROM and floppy disk
42-1 CD-ROM
42-2 Floppy disk
43 Computer
44 Recording medium such as RAM or hard disk on the computer
121 Item search unit
122 Alternate reading string addition unit
131 Score calculation unit
132 Item score determination unit
133 Top item selection unit

Claims

一つの項目が表記文字列と読みがな列とで構成される認識語彙リストを音声入力の認識語彙として与える音声認識方法において、
入力された音声を分析する音響分析工程と、
表記文字列の一部である部分表記文字列の読みがなである部分読みがな列に対応付けて当該部分表記文字列の読み変え読みがな列があらかじめ登録されている読みがな列変換表に従って、前記認識語彙リストの読みがな列に含まれる部分読みがな列を、前記音響分析工程による分析結果に含まれる読み変え読みがな列で置換した新たな読みがな列を一つ又は複数生成し、前記新たな読みがな列を読み変え読みがな列として前記認識語彙リストに追加する読みがな列変換工程と、
前記音響分析工程による分析結果を前記認識語彙リストの読みがな列及び読み変え読みがな列と照合し、照合度合の指標である照合スコアを算出し、前記照合スコアの最も良い前記項目を選択する照合工程とを含むことを特徴とする音声認識方法。

A speech recognition method in which a recognition vocabulary list, each item of which consists of a written character string and a reading string, is given as the recognition vocabulary for speech input, the method comprising:
an acoustic analysis step of analyzing the input speech;
a reading string conversion step of, in accordance with a reading string conversion table in which an alternate reading string for a partial written character string, which is a part of a written character string, is registered in advance in association with the partial reading string that is the reading of that partial written character string, generating one or more new reading strings by replacing a partial reading string contained in a reading string of the recognition vocabulary list with the alternate reading string contained in the analysis result of the acoustic analysis step, and adding each new reading string to the recognition vocabulary list as an alternate reading string; and
a collation step of collating the analysis result of the acoustic analysis step against the reading strings and alternate reading strings of the recognition vocabulary list, calculating a collation score that is an index of the degree of match, and selecting the item with the best collation score.

一つの項目が表記文字列と読みがな列とで構成される認識語彙リストを音声入力の認識語彙として与える音声認識装置において、
入力された音声を分析する音響分析部と、
表記文字列の一部である部分表記文字列の読みがなである部分読みがな列に対応付けて当該部分表記文字列の読み変え読みがな列があらかじめ登録されている読みがな列変換表に従って、前記認識語彙リストの読みがな列に含まれる部分読みがな列を、前記音響分析部による分析結果に含まれる読み変え読みがな列で置換した新たな読みがな列を一つ又は複数生成し、前記新たな読みがな列を読み変え読みがな列として前記認識語彙リストに追加する読みがな列変換部と、
前記音響分析部による分析結果を前記認識語彙リストの読みがな列及び読み変え読みがな列と照合し、照合度合の指標である照合スコアを算出し、前記照合スコアの最も良い前記項目を選択する照合部を含むことを特徴とした音声認識装置。

A speech recognition apparatus in which a recognition vocabulary list, each item of which consists of a written character string and a reading string, is given as the recognition vocabulary for speech input, the apparatus comprising:
an acoustic analysis unit that analyzes the input speech;
a reading string conversion unit that, in accordance with a reading string conversion table in which an alternate reading string for a partial written character string, which is a part of a written character string, is registered in advance in association with the partial reading string that is the reading of that partial written character string, generates one or more new reading strings by replacing a partial reading string contained in a reading string of the recognition vocabulary list with the alternate reading string contained in the analysis result of the acoustic analysis unit, and adds each new reading string to the recognition vocabulary list as an alternate reading string; and
a collation unit that collates the analysis result of the acoustic analysis unit against the reading strings and alternate reading strings of the recognition vocabulary list, calculates a collation score that is an index of the degree of match, and selects the item with the best collation score.

一つの項目が表記文字列と読みがな列とで構成される認識語彙リストを音声入力の認識語彙として与えるコンピュータに実行させるプログラムを記録したコンピュータ読み取り可能な記録媒体において、
入力された音声を分析する音響分析手順と、
表記文字列の一部である部分表記文字列の読みがなである部分読みがな列に対応付けて当該部分表記文字列の読み変え読みがな列があらかじめ登録されている読みがな列変換表に従って、前記認識語彙リストの読みがな列に含まれる部分読みがな列を、前記音響分析手順による分析結果に含まれる読み変え読みがな列で置換した新たな読みがな列を一つ又は複数生成し、前記新たな読みがな列を読み変え読みがな列として前記認識語彙リストに追加する読みがな列変換手順と、
前記音響分析手順による分析結果を前記認識語彙リストの読みがな列及び読み変え読みがな列と照合し、照合度合の指標である照合スコアを算出し、前記照合スコアの最も良い前記項目を選択する照合手順をコンピュータに実行させるプログラムを記録したコンピュータ読み取り可能な記録媒体。

A computer-readable recording medium recording a program to be executed by a computer in which a recognition vocabulary list, each item of which consists of a written character string and a reading string, is given as the recognition vocabulary for speech input, the program causing the computer to execute:
an acoustic analysis procedure of analyzing the input speech;
a reading string conversion procedure of, in accordance with a reading string conversion table in which an alternate reading string for a partial written character string, which is a part of a written character string, is registered in advance in association with the partial reading string that is the reading of that partial written character string, generating one or more new reading strings by replacing a partial reading string contained in a reading string of the recognition vocabulary list with the alternate reading string contained in the analysis result of the acoustic analysis procedure, and adding each new reading string to the recognition vocabulary list as an alternate reading string; and
a collation procedure of collating the analysis result of the acoustic analysis procedure against the reading strings and alternate reading strings of the recognition vocabulary list, calculating a collation score that is an index of the degree of match, and selecting the item with the best collation score.