JPH0567240B2


Info

Publication number: JPH0567240B2
Application number: JP29512185A
Authority: JP
Inventors: Hiroshi Mobara
Original assignee: Tokyo Shibaura Electric Co Ltd
Current assignee: Toshiba Corp
Priority date: 1985-12-25
Filing date: 1985-12-25
Publication date: 1993-09-24
Also published as: JPS62150396A

Description

DETAILED DESCRIPTION OF THE INVENTION

[Technical field of the invention]

The present invention relates to a speech recognition device, and in particular to a device that recognizes the speech of unspecified speakers.

[Prior art and its problems]

Speech recognition devices are generally divided into devices for specific speakers and devices for unspecified speakers. In the former, the user registers his or her own voice patterns in advance to create a dictionary, and recognition is performed by comparing the voice patterns registered in this dictionary with the voice pattern actually input, using a recognition algorithm for specific speakers. This approach has the advantage that the user can freely define the words to be registered in the dictionary, but it has the serious drawback that only the voice of the specific speaker who performed the registration can be recognized.

In the latter, by contrast, the manufacturer prepares common voice pattern data, that is, the voice pattern commonly extracted when unspecified persons utter a specific word or the like, derived from the voice pattern data of many people, and registers it in the dictionary in advance. Unlike the former, the speech of unspecified speakers is recognized using these common voice patterns and a recognition algorithm for unspecified speakers.

FIG. 2 is a block diagram of a conventional speech recognition device of this kind for unspecified speakers. Speech input through a voice input section 1, for example a microphone, is converted into a voice pattern signal and supplied to a voice recognition section 2.

The dictionary section 3, on the other hand, consists of a storage device such as a ROM, in which common voice pattern data for predetermined words is registered in advance. The voice recognition section 2 searches the dictionary section 3 for a similar voice pattern based on the voice pattern signal supplied from the voice input section 1, and thereby performs speech recognition. In FIG. 2, the portion enclosed by the dash-dot line constitutes a single integrated device. This device has the advantage of being able to recognize the speech of unspecified speakers. However, the contents of the dictionary section 3 are fixed by the manufacturer and cannot be freely defined by the user, so the device is difficult to apply to equipment that requires the registration of particular words such as proper nouns.
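
The dictionary search in this conventional arrangement can be sketched as a nearest-pattern lookup over a read-only table. The source does not specify the feature representation or the similarity measure, so the fixed-length feature vectors, the Euclidean distance, the example words, and every name below are assumptions made purely for illustration.

```python
import math
from types import MappingProxyType

# Hypothetical common voice patterns fixed by the manufacturer.
# MappingProxyType gives a read-only view, standing in for a ROM.
# The numbers are made up; real patterns would come from many speakers.
ROM_DICTIONARY = MappingProxyType({
    "yes": (0.9, 0.1, 0.3),
    "no": (0.2, 0.8, 0.5),
})


def recognize(pattern, dictionary):
    """Return the word whose stored pattern is closest to the input pattern.

    Euclidean distance is an assumed similarity measure, not the patent's.
    """
    return min(dictionary, key=lambda word: math.dist(pattern, dictionary[word]))


print(recognize((0.85, 0.2, 0.25), ROM_DICTIONARY))
```

Because the table is read-only, an attempt to add a proper noun fails, which mirrors the drawback described above: the user cannot extend the dictionary.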

[Purpose of the invention]

Accordingly, an object of the present invention is to provide a speech recognition device that allows the user to define the dictionary contents easily while still being able to recognize the speech of unspecified speakers.

[Structure of the invention]

To achieve the above object, the speech recognition device of the present invention has: a voice input section that converts the speech of an unspecified speaker into a voice pattern signal; a rewritable dictionary section that stores a plurality of common voice patterns, each extracted as the voice pattern common to multiple utterances when multiple speakers utter a specific word; and a voice recognition section that, based on the voice pattern signal supplied from the voice input section, searches the dictionary section for a similar voice pattern using a recognition algorithm for unspecified speakers. The device is characterized by further comprising: an information recording card that is supplied with data of various common voice patterns, to be used for recognizing the speech of unspecified speakers, recorded on it in advance; and a reading section that reads the common voice patterns from the information recording card and stores them in the dictionary section.

[Embodiments of the invention]

The present invention will be described below based on the embodiment shown in FIG. 1. In this embodiment, the invention is applied to an automatic telephone dialing device. Push-button telephones and the like have conventionally provided a function for storing frequently used telephone numbers in association with abbreviated numbers. With the present device, even dialing the abbreviated number becomes unnecessary: the caller need only speak the name of the other party.

As in the conventional device, speech input through the voice input section 1 (for example, a microphone) is supplied to the voice recognition section 2 as a voice pattern signal, and the voice recognition section 2 searches the dictionary section 3' for a similar voice pattern based on this signal and performs speech recognition. Unlike the dictionary section 3 of the conventional device, however, this dictionary section 3' consists of a rewritable storage device such as a RAM or EPROM. In a speech recognition device for specific speakers, the user can write voice patterns directly into the dictionary section through a voice input section such as a microphone. In a device for unspecified speakers, however, a common voice pattern applicable to unspecified speakers must be generated, so it is very difficult for the user to prepare such patterns. The present invention therefore provides a reading section 4, which reads a voice pattern from an information recording card 5 on which common voice pattern data has been recorded in advance and registers it in the dictionary section 3'. This registration operation may be performed using, for example, a keyboard 6.
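
The relationship between the information recording card 5, the reading section 4 and the rewritable dictionary section 3' can be sketched as follows. This is a minimal illustration only: the class names, the slot numbering, and the feature vectors are all hypothetical, and the names "Suzuki" and "Tanaka" are borrowed from the worked example given later in this description.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class InfoCard:
    """Hypothetical information recording card with one pre-recorded common pattern."""
    word: str
    pattern: tuple


@dataclass
class RewritableDictionary:
    """Stands in for dictionary section 3' (RAM, EPROM, etc.): entries can be replaced."""
    entries: dict = field(default_factory=dict)

    def store(self, slot, word, pattern):
        # Writing to an occupied slot overwrites it, as rewritable memory would.
        self.entries[slot] = (word, pattern)


class CardReader:
    """Stands in for reading section 4: copies a card's pattern into the dictionary."""

    def __init__(self, dictionary):
        self.dictionary = dictionary

    def load(self, card, slot):
        self.dictionary.store(slot, card.word, card.pattern)


dictionary = RewritableDictionary()
reader = CardReader(dictionary)
reader.load(InfoCard("Suzuki", (0.9, 0.1, 0.3)), slot=1)  # made-up features
reader.load(InfoCard("Tanaka", (0.2, 0.8, 0.5)), slot=2)
print(dictionary.entries[1][0])
```

The point of the design is that the user never generates a pattern: the card carries a pattern prepared by the manufacturer, and the reader only copies it into rewritable storage.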

The operation will now be described assuming, for example, that the device adds a voice calling function for the telephone numbers of 10 people. In a telephone such as a push-button telephone, the actual telephone numbers are registered in correspondence with abbreviated numbers #1 to #10. The function required of the speech recognition device is therefore simply to assign one of the abbreviated numbers #1 to #10 to the spoken name of each of the 10 people. When the voice recognition section 2 outputs one of the abbreviated numbers #1 to #10 as the recognition result, the telephone can place an outside call based on it.

First, the user decides the name to be registered for each abbreviated number, for example #1: Suzuki, #2: Tanaka, and so on. To perform this registration, the user prepares the common voice data for each name. In this example, card 51 holds the common voice data pattern for "Suzuki" and card 52 holds that for "Tanaka". The registration operation begins by pressing the REGST key 62 on the keyboard 6, which places the device in registration mode. Card 51 is then inserted into the reading section 4 so that the common voice data pattern for "Suzuki" is read. The user then presses the "1" key on the keyboard 6, registering the word "Suzuki" under number #1. Card 52 is then inserted and the word "Tanaka" is registered under number #2 in the same way. When all registrations are complete, pressing the REC key 61 places the device in speech recognition mode. In this mode, when the caller says "Suzuki", for example, the voice pattern is supplied to the voice recognition section 2 via the voice input section 1. The voice recognition section 2 searches the dictionary section 3', recognizes that the voice pattern corresponds to #1, and outputs this to the telephone as the recognition result. Registered data can be cancelled with the CAN key 63. Alternatively, instead of entering numeric keys on the keyboard 6, the numbers #1 to #10 may be assigned in the order in which the reading section 4 reads the cards.
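
The key sequence described above (REGST, card, number key, REC, then speech) can be sketched as a small mode-driven controller. This is an illustrative sketch, not the patented implementation: the class and method names, the feature vectors, and the use of Euclidean distance are assumptions, and the scope of the CAN key is guessed because the source does not say whether it cancels one entry or all of them.

```python
import math
from enum import Enum


class Mode(Enum):
    IDLE = "idle"
    REGISTER = "register"
    RECOGNIZE = "recognize"


class VoiceDialer:
    """Hypothetical controller for keyboard 6, reading section 4 and dictionary 3'."""

    def __init__(self):
        self.mode = Mode.IDLE
        self.dictionary = {}  # abbreviated number -> common voice pattern
        self.pending = None   # pattern read from a card, awaiting a number key

    def press(self, key):
        if key == "REGST":
            self.mode = Mode.REGISTER
        elif key == "REC":
            self.mode = Mode.RECOGNIZE
            self.pending = None
        elif key == "CAN":
            # Assumption: CAN clears every registered entry.
            self.dictionary.clear()
            self.pending = None
        elif key.isdigit() and self.mode is Mode.REGISTER and self.pending is not None:
            self.dictionary[int(key)] = self.pending
            self.pending = None

    def insert_card(self, pattern):
        # Cards are only read while in registration mode.
        if self.mode is Mode.REGISTER:
            self.pending = pattern

    def hear(self, pattern):
        """Return the abbreviated number of the closest registered pattern, or None."""
        if self.mode is not Mode.RECOGNIZE or not self.dictionary:
            return None
        return min(self.dictionary, key=lambda n: math.dist(pattern, self.dictionary[n]))


dialer = VoiceDialer()
dialer.press("REGST")
dialer.insert_card((0.9, 0.1, 0.3))  # card 51, "Suzuki" (made-up features)
dialer.press("1")
dialer.insert_card((0.2, 0.8, 0.5))  # card 52, "Tanaka" (made-up features)
dialer.press("2")
dialer.press("REC")
print(dialer.hear((0.85, 0.2, 0.25)))  # prints 1, the abbreviated number for "Suzuki"
```

The returned abbreviated number is what the device would pass to the telephone, which then looks up and dials the actual number.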

The information recording card may be a card made of paper or plastic on which voice patterns are written magnetically or optically. Alternatively, an IC card may be used. Such cards are less expensive and easier to handle than media such as hard disks and floppy disks. The manufacturer may supply information recording cards carrying various voice patterns as a software library, and users can purchase and use the cards they need.

[Effect of the invention]

As described above, in the present invention the necessary cards are selected from information recording cards that are supplied with data of various common voice patterns for recognizing unspecified speakers recorded on them in advance, and their voice patterns are read into the dictionary that stores voice patterns before the speech recognition device is operated. The desired common voice patterns can therefore be registered in the dictionary easily, and the speech of unspecified speakers can be recognized, without the troublesome operations involved when users register their own voices on a card or in the device's dictionary and without the quiet environment needed for recording.

[Brief description of the drawings]

FIG. 1 is a block diagram of a speech recognition device according to the present invention, and FIG. 2 is a block diagram of a conventional speech recognition device.

1: voice input section; 2: voice recognition section; 3, 3': dictionary section; 4: reading section; 5: information recording card; 6: keyboard.

Claims

1. A speech recognition device having: a voice input section that converts the speech of an unspecified speaker into a voice pattern signal; a rewritable dictionary section that stores a plurality of common voice patterns, each extracted as the voice pattern common to multiple utterances when multiple speakers utter a specific word; and a voice recognition section that searches the dictionary section for a similar common voice pattern based on the voice pattern signal supplied from the voice input section, the speech recognition device being characterized by comprising: an information recording card that is supplied with data of various common voice patterns, to be used for recognizing the speech of unspecified speakers, recorded on it in advance; and a reading section that reads the common voice patterns from the information recording card and stores them in the dictionary section.

2. The speech recognition device according to claim 1, wherein the information recording card is made of paper or plastic, and the voice patterns are recorded on it magnetically or optically.

3. The speech recognition device according to claim 1, wherein the information recording card is an IC card, and the voice patterns are recorded in a memory element on the IC card.

4. The speech recognition device according to any one of claims 1 to 3, wherein the reading section has a keyboard and stores the voice pattern in the area of the dictionary section designated by the keyboard.