JPH0713598A

JPH0713598A - Specific task speech data base generating device

Info

Publication number: JPH0713598A
Application number: JP5153177A
Authority: JP
Inventors: Hiroshi Kurokawa; 寛黒川; Shingo Fujiwara; 紳吾藤原
Original assignee: Osaka Gas Co Ltd
Current assignee: Osaka Gas Co Ltd
Priority date: 1993-06-24
Filing date: 1993-06-24
Publication date: 1995-01-17

Abstract

PURPOSE: To provide a specific-task speech database generating device that can greatly reduce the labor required to record speech data. CONSTITUTION: The device comprises the following means:
- a first appearance frequency calculating means 1, which calculates the first appearance frequency of each phoneme chain in a target sentence 7 of a specific task;
- a selecting means 2, which selects subsets of sentences from a general speech database 8;
- a second appearance frequency calculating means 3, which calculates the second appearance frequency of each phoneme chain in a selected subset;
- a correlation value calculating means 4, which calculates the correlation value between the second and first appearance frequencies;
- a specifying means 5, which uses the calculated correlation values to specify the subset of sentences whose appearance frequency is closer to the first appearance frequency;
- a database generating means 6, which selects the speech information in the general speech database 8 that corresponds to the specified sentences and generates a speech database 9 for the specific task.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION
1. Field of the Invention
The present invention relates to a specific-task speech database generating device. The device generates a speech database for a specific task from a general speech database, so that a speech recognition model used in speech recognition processing can be adapted to that task.

[0002]

2. Description of the Related Art
Speech recognition models used in speech recognition processing and similar applications are generally designed to give a reasonable level of recognition performance on any task. However, when speech recognition is known to be used only for a specific task, recognition performance improves if the model is adapted to that task. This adaptation requires speech data in which typical sentences and words of the task are spoken.

Conventionally, such speech data have been collected through a series of work steps:
- writing the sentences to be read for the task;
- securing speakers;
- having the task sentences read aloud and recording them;
- checking the recorded data for errors;
- setting up the recording environment.

[0003]

However, with this conventional method, recording speech data takes a great deal of labor (time and cost). The work covers preparing the recording, reading out the task sentences, and inspecting the recorded data, and the burden on the speakers is particularly heavy.

[0004]

SUMMARY OF THE INVENTION
The present invention addresses these problems of conventional speech data recording. Its object is to provide a specific-task speech database generating device that can greatly reduce the labor required to record speech data.

[0005]

The present invention of claim 1 is a specific-task speech database generating device comprising the following means:
- a first appearance frequency calculating means, which calculates a first appearance frequency of each predetermined phoneme chain in an input target sentence relating to a specific task;
- a selecting means, which, following a predetermined rule, selects a plurality of kinds of subsets, each consisting of a plurality of sentences, from the sentence set of a general speech database containing sentences that correspond to many kinds of tasks;
- a second appearance frequency calculating means, which calculates a second appearance frequency of each phoneme chain in each selected subset;
- a correlation value calculating means, which calculates a plurality of correlation values between the second appearance frequencies and the first appearance frequency;
- a specifying means, which uses the calculated correlation values to specify the subset of sentences whose appearance frequency is closer to the first appearance frequency;
- a database generating means, which, based on the specification result, selects the speech information in the general speech database that corresponds to those sentences and generates a speech database for the specific task.

[0006]

The present invention of claim 3 is a specific-task speech database generating device comprising the following means:
- a phoneme chain detecting means, which detects the appearing phoneme chains, that is, those predetermined phoneme chains whose appearance frequency in an input target sentence relating to a predetermined task is not 0;
- a selecting means, which, following a predetermined rule, selects a plurality of kinds of subsets, each consisting of a plurality of sentences, from the sentence set of a general speech database containing sentences that correspond to many kinds of tasks;
- an entropy calculating means, which calculates, for each selected subset, the entropy relating to the appearing phoneme chains;
- a specifying means, which uses the calculated entropies to specify the subset of sentences having the maximum entropy;
- a database generating means, which, based on the specification result, selects the speech information in the general speech database that corresponds to those sentences and generates a speech database for the specific task.

[0007]

In operation, the present invention proceeds as follows:
1. The first appearance frequency calculating means calculates the first appearance frequency of each phoneme chain in the target sentence of the specific task.
2. The selecting means selects a plurality of kinds of sentence subsets from the sentence set of the general speech database.
3. The second appearance frequency calculating means calculates the second appearance frequency of each phoneme chain in each selected subset.
4. The correlation value calculating means calculates a plurality of correlation values between those second appearance frequencies and the first appearance frequency.
5. The specifying means uses the calculated correlation values to specify the subset of sentences whose appearance frequency is closer to the first appearance frequency.
6. Based on the specification result, the database generating means selects the speech information in the general speech database that corresponds to those sentences and generates the speech database for the specific task.

[0008]

Also in the present invention, operation proceeds as follows:
1. The phoneme chain detecting means detects the appearing phoneme chains, that is, those phoneme chains whose appearance frequency in the target sentence is not 0.
2. The selecting means selects a plurality of kinds of sentence subsets from the sentence set of the general speech database.
3. The entropy calculating means calculates the entropy relating to the appearing phoneme chains in each selected subset.
4. The specifying means uses the calculated entropies to specify the subset of sentences having the maximum entropy.
5. Based on the specification result, the database generating means selects the speech information in the general speech database that corresponds to those sentences and generates the speech database for the specific task.

[0009]

DESCRIPTION OF THE PREFERRED EMBODIMENTS
The present invention will be described below with reference to the drawings showing its embodiments.

[0010]

FIG. 1 is a block diagram of the specific-task speech database generating device according to the first embodiment of the present invention. The device has the following components:
- A first appearance frequency calculating means 1 receives a predetermined target sentence 7 for a specific task. It calculates the appearance frequency of each of various phoneme chains in that sentence (hereinafter, the first appearance frequency).
- A selecting means 2 selects an arbitrary subset of sentences from the sentence set of a general speech database 8. The database is composed of many sentences so that it can cover a wide range of tasks.
- The selecting means 2 is connected to a second appearance frequency calculating means 3. This means calculates the appearance frequency of each phoneme chain in the selected subset (hereinafter, the second appearance frequency).
- The second appearance frequency calculating means 3 and the first appearance frequency calculating means 1 are both connected to a correlation value calculating means 4. This means calculates the correlation value between the first and second appearance frequencies.
- The correlation value calculating means 4 is connected to a specifying means 5. The specifying means uses the calculated correlation values, identifying the one with the higher correlation, to specify the subset of sentences whose appearance frequency is closer to the first appearance frequency.
- The specifying means 5 is connected to a database generating means 6. This means selects the speech information corresponding to the specified subset from the general speech database 8 and generates a speech database 9 for the specific task.

[0011]

Next, the operation of the specific-task speech database generating device of the first embodiment will be described with reference to FIG. 2.

[0012]

First, the target sentence 7 is input (step S1). The first appearance frequency calculating means 1 calculates the first appearance frequency, that is, the appearance frequency of each phoneme chain in the target sentence 7 (step S2). Meanwhile, the selecting means 2 selects a subset of sentences from the general speech database 8 (step S3). The second appearance frequency calculating means 3 then calculates the second appearance frequency, that is, the appearance frequency of each phoneme chain in the selected subset (step S4).
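Steps S2 and S4 both turn text into appearance frequencies of phoneme chains. The text does not say how long a "predetermined phoneme chain" is or which phoneme inventory is used. The sketch below therefore assumes, for illustration only, that each sentence is already transcribed as a list of phoneme symbols and that chains are phoneme n-grams (bigrams by default). `count_phoneme_chains` and the sample transcriptions are hypothetical.

```python
from collections import Counter
from typing import Iterable, Sequence


def count_phoneme_chains(sentences: Iterable[Sequence[str]], n: int = 2) -> Counter:
    """Appearance frequency of every phoneme n-gram ("phoneme chain").

    Each sentence is a sequence of phoneme symbols (assumed input format);
    chains are counted inside each sentence only, never across a boundary.
    """
    counts: Counter = Counter()
    for phonemes in sentences:
        for i in range(len(phonemes) - n + 1):
            counts[tuple(phonemes[i:i + n])] += 1
    return counts


# Hypothetical phoneme transcriptions, for illustration only.
target = [["k", "o", "N", "n", "i", "ch", "i", "w", "a"]]          # target sentence S
subset = [["k", "o", "r", "e"], ["w", "a", "t", "a", "sh", "i"]]  # candidate subset S0

T = count_phoneme_chains(target)    # first appearance frequency (step S2)
T0 = count_phoneme_chains(subset)   # second appearance frequency (step S4)
print(T[("k", "o")], T0[("w", "a")])
```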

[0013]

The first and second appearance frequencies calculated in this way are input to the correlation value calculating means 4, which calculates the correlation value between them (step S5). Next, it is determined from the correlation value whether the target sentence subset has been specified (step S6).

For the first subset selected, there is no correlation value to compare against. That value is therefore stored, and processing proceeds unconditionally to step S7. In step S7, one sentence in the subset is replaced by another sentence selected from the general speech database 8.

For the modified subset, the following steps are then carried out:
- The second appearance frequency is calculated as before (step S4).
- The correlation value between this second appearance frequency and the first appearance frequency is calculated (step S5).
- The specifying means 5 compares the new correlation value with the previous one and keeps the higher one as the reference for the next comparison (step S6).
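Step S7 replaces one sentence of the current subset with a sentence that is not yet in it; per [0015], the newcomer comes from the rest of the general database. The following is a minimal sketch under these assumptions:
- sentences are addressed by integer index;
- both the outgoing position and the incoming sentence are drawn at random (the worked example in FIG. 4(b) instead replaces the upper sentence);
- `replace_one_sentence` is a hypothetical name.

```python
import random
from typing import List, Sequence


def replace_one_sentence(subset: Sequence[int], pool_size: int,
                         rng: random.Random) -> List[int]:
    """Sketch of step S7: swap one member of `subset` for a sentence outside it.

    Sentences are indices 0 .. pool_size-1 into the general speech database;
    the incoming sentence is drawn only from sentences not already in the subset.
    """
    members = set(subset)
    outside = [i for i in range(pool_size) if i not in members]
    if not outside:
        raise ValueError("no sentence left to swap in")
    new_subset = list(subset)
    new_subset[rng.randrange(len(new_subset))] = rng.choice(outside)
    return new_subset


rng = random.Random(0)
s0 = [3, 7]                                          # current subset S0 (two sentences)
s1 = replace_one_sentence(s0, pool_size=10, rng=rng)
print(s0, "->", s1)                                  # exactly one position differs
```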

[0014]

The above processing (steps S4, S5, S6 and S7) is repeated until one of the following occurs: the correlation value becomes sufficiently high, the number of sentence changes exceeds a predetermined number, or there are no more sentences to change. When a correlation value sufficient to specify the sentence subset has been obtained (step S6), the corresponding sentence subset is specified (step S8). Next, the database generating means 6 selects the speech information corresponding to the specified subset from the general speech database 8 (step S9). It then generates the speech database for the specific task (step S10).
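Taken together, steps S3 to S8 amount to a greedy local search. The loop keeps the better of the current and the modified subset, and stops on a good-enough score, on a change budget, or when the pool is exhausted. The sketch below is one way to express that loop under the following assumptions:
- the scoring function is passed in; in the first embodiment it would be the correlation with the target's phoneme-chain frequencies;
- swaps are random;
- `search_subset`, `good_enough` and `max_changes` are hypothetical names for the "sufficiently high" value and the "predetermined number" of changes.

```python
import random
from typing import Callable, List, Sequence, Tuple


def search_subset(pool_size: int, k: int,
                  score: Callable[[Sequence[int]], float],
                  good_enough: float, max_changes: int = 1000,
                  seed: int = 0) -> Tuple[List[int], float]:
    """Greedy one-sentence-swap search, sketching steps S3 to S8 of FIG. 2.

    `score` stands in for the correlation value of steps S4-S5. The loop stops
    when the score reaches `good_enough`, after `max_changes` swaps, or when no
    sentence is left outside the subset.
    """
    rng = random.Random(seed)
    best = rng.sample(range(pool_size), k)          # S3: initial subset
    best_score = score(best)                        # S4-S5
    for _ in range(max_changes):
        if best_score >= good_enough:
            break
        members = set(best)
        outside = [i for i in range(pool_size) if i not in members]
        if not outside:
            break
        candidate = list(best)                      # S7: change one sentence
        candidate[rng.randrange(k)] = rng.choice(outside)
        cand_score = score(candidate)               # S4-S5 on the new subset
        if cand_score > best_score:                 # S6: keep the higher score
            best, best_score = candidate, cand_score
    return best, best_score                         # S8: specified subset


# Toy score for illustration: favours high indices, maximum 1.0 at {8, 9}.
subset, value = search_subset(10, 2, lambda s: sum(s) / 17, good_enough=1.0)
print(sorted(subset), value)
```

The toy score only exists to make the example checkable; it has nothing to do with phoneme chains.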

[0015]

FIGS. 3 and 4 illustrate this embodiment more concretely:
- FIG. 3 shows an example of the target sentence 7 (denoted S) and the appearance frequency of each phoneme chain in S (denoted T).
- FIG. 4(a) shows an example of a sentence subset arbitrarily selected from the general speech database 8 (denoted S₀; two sentences are selected here). It also shows the appearance frequency of each phoneme chain in S₀ (denoted T₀).
- FIG. 4(b) shows the subset obtained when the upper of the two sentences in S₀ is replaced by another sentence from the general speech database 8 (denoted S₁). It also shows the appearance frequency of each phoneme chain in S₁ (denoted T₁). The replacement sentence is chosen from the sentences of the general speech database 8 that remain after the subset S₀ is excluded.

[0016]

The sentence subset is specified as follows. First, the appearance frequency T in the target sentence S is correlated with the appearance frequency T₀ in the sentence subset S₀. Computing the correlation value with, for example, Spearman's rank correlation gives r₀ = 0.046. Similarly, correlating T with the appearance frequency T₁ in the subset S₁ (with one sentence changed) gives r₁ = 0.534.
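Spearman's rank correlation is the Pearson correlation of the two rank vectors. When there are no ties it reduces to ρ = 1 − 6Σd²/(n(n² − 1)), where d is the rank difference for each phoneme chain. The sketch below computes it for two frequency tables. It makes three assumptions the text does not state:
- the tables are aligned on a common list of phoneme chains, by default the union of both, with missing chains counted as 0;
- tied counts receive average ranks;
- a constant table yields 0.0.

The values r₀ = 0.046 and r₁ = 0.534 depend on data in FIGS. 3 and 4 that is not reproduced here, so the example uses made-up tables. `scipy.stats.spearmanr` computes the same tie-corrected statistic, although it returns `nan` rather than 0.0 for constant input.

```python
from typing import Dict, Hashable, Iterable, List, Optional, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of the ranks they occupy."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for m in range(i, j + 1):
            ranks[order[m]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(t: Dict[Hashable, int], t_sub: Dict[Hashable, int],
             chains: Optional[Iterable[Hashable]] = None) -> float:
    """Spearman rank correlation between two phoneme-chain frequency tables.

    `chains` fixes the inventory of chains to compare; by default the union of
    both tables is used, with missing chains counted as 0 (an assumption).
    """
    keys = list(chains) if chains is not None else list(set(t) | set(t_sub))
    n = len(keys)
    if n < 2:
        return 0.0
    x = average_ranks([t.get(c, 0) for c in keys])
    y = average_ranks([t_sub.get(c, 0) for c in keys])
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # constant table: no rank information (assumed convention)
    return cov / (vx * vy) ** 0.5


T = {"ko": 3, "oN": 1, "wa": 2}
print(spearman(T, {"ko": 30, "oN": 10, "wa": 20}))  # 1.0 (same ordering)
print(spearman(T, {"ko": 1, "oN": 3, "wa": 2}))     # -1.0 (reversed ordering)
```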

[0017]

Next, the calculated correlation values r₀ and r₁ are compared:
- If r₀ < r₁, the subset S₁ becomes the subset S₀ used in the next comparison, and the same processing is applied to it.
- If r₀ ≥ r₁, the subset S₀ is kept as it is and the same processing is performed.

The subset is changed in this way a predetermined number of times, or until all subsets under a certain restriction have been selected. The subset of sentences with the highest correlation with the appearance frequency T is then finally specified.

[0018]

FIG. 5 is a block diagram of the specific-task speech database generating device according to the second embodiment of the present invention. The device of this embodiment has the following components:
- A phoneme chain detecting means 10 receives a target sentence 7 for a specific task. It detects, among various phoneme chains, those whose appearance frequency in the input target sentence 7 is not 0 (hereinafter, appearing phoneme chains).
- As in the first embodiment, a selecting means 2 selects an arbitrary subset of sentences from the sentence set of the general speech database 8.
- The selecting means 2 and the phoneme chain detecting means 10 are connected to an entropy calculating means 11. This means calculates, for the selected sentence subset, the entropy relating to the appearing phoneme chains.
- The entropy calculating means 11 is connected to a specifying means 5, which uses the calculated entropies to specify the sentence subset having the maximum entropy.
- As in the first embodiment, the specifying means 5 is connected to a database generating means 6. This means selects the speech information corresponding to the specified subset from the general speech database 8 and generates a speech database 9 for the specific task.

[0019]

Next, the operation of the specific-task speech database generating device of the second embodiment will be described with reference to FIG. 6.

[0020]

First, the target sentence 7 is input (step S11). The phoneme chain detecting means 10 detects the appearing phoneme chains, that is, the phoneme chains whose appearance frequency in the target sentence 7 of the specific task is not 0 (step S12). Meanwhile, the selecting means 2 selects a subset of sentences from the general speech database 8 (step S13). For the selected subset, the entropy calculating means 11 calculates the entropy relating to the detected appearing phoneme chains (step S14).

Next, it is determined whether the calculated entropy allows the subset to be specified, that is, whether the entropy is the maximum (step S15). This is done by comparing the entropy with the entropies of other sentence subsets and finding the largest. Following the same idea as in the first embodiment, one sentence in the subset is replaced by another sentence selected from the general speech database 8 (step S16). The entropy of the modified subset is then calculated in the same way. The entropy is calculated, for example, as follows.

[0021]

Assume that the target sentence 7, the sentence subset, and the subset with one sentence changed are the same as in FIGS. 3, 4(a) and 4(b) of the first embodiment. Let U denote the appearing phoneme chains, that is, those whose appearance frequency in the target sentence S (7) is not 0. The entropy E₀ of the appearing phoneme chains U in the sentence subset S₀ is first calculated by Equation 1 below.

[0022]

[Equation 1]

[0023]

Next, the entropy E₁ of the appearing phoneme chains U is calculated by Equation 2 below. It is computed in the subset S₁, which is obtained by changing one sentence of the subset S₀.

[0024]

[Equation 2]
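Equations 1 and 2 have not survived in this copy of the document, so the exact definitions of E₀ and E₁ are not known here. For illustration only, the sketch below assumes the Shannon entropy of the subset's relative frequencies over the appearing phoneme chains U:

E = −Σ_{u∈U} p(u) log₂ p(u), with p(u) = c(u) / Σ_{v∈U} c(v),

where c(u) is the count of chain u in the subset. Under this assumption, a subset scores higher when it covers the target's chains more evenly. `appearing_chains` and `chain_entropy` are hypothetical names.

```python
import math
from typing import Dict, Hashable, Set


def appearing_chains(target_counts: Dict[Hashable, int]) -> Set[Hashable]:
    """U: phoneme chains whose appearance frequency in the target is non-zero."""
    return {chain for chain, c in target_counts.items() if c > 0}


def chain_entropy(subset_counts: Dict[Hashable, int], u: Set[Hashable]) -> float:
    """ASSUMED form of Equations 1/2: Shannon entropy (bits) of the subset's
    frequency distribution restricted to the chains in U."""
    counts = [subset_counts.get(chain, 0) for chain in u]
    total = sum(counts)
    if total == 0:
        return 0.0
    # p * log2(1/p) is the same as -p * log2(p) but avoids printing -0.0
    return sum((c / total) * math.log2(total / c) for c in counts if c > 0)


U = appearing_chains({"ko": 1, "wa": 2, "ri": 0})   # "ri" does not appear: U = {"ko", "wa"}
E0 = chain_entropy({"ko": 4, "te": 9}, U)           # covers only "ko"
E1 = chain_entropy({"ko": 5, "wa": 5, "te": 1}, U)  # covers U evenly
print(E0, E1)  # E0 < E1, so S1 would replace S0 as in paragraph [0025]
```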

[0025]

The specifying means 5 compares the entropies E₀ and E₁ calculated in this way and specifies the sentence subset with the maximum entropy (step S15):
- If E₀ < E₁, the subset S₁ becomes the subset S₀ used next, and the same processing is applied to it.
- If E₀ ≥ E₁, the subset S₀ is kept as it is and the same processing is performed.
If so, the same processing is performed while leaving the sentence subset (S0) as it is.

[0026]

The above processing is repeated until the entropy becomes sufficiently high or there are no more sentences to change. Once the maximum entropy (or an entropy sufficient to specify the subset) has been obtained, the corresponding sentence subset is specified (step S17). Next, the database generating means 6 selects the speech information corresponding to the specified subset from the general speech database 8 (step S18). It then generates the speech database for the specific task (step S19).
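Steps S9 and S18 reduce to a lookup: every recording in the general speech database whose sentence belongs to the specified subset is copied into the task-specific database. The text does not describe how the general database stores its speech information, so the sketch rests on two assumptions:
- the record layout (`Utterance`, with a sentence index, speaker and audio path) is a hypothetical stand-in;
- all speakers' recordings of a chosen sentence are kept.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Utterance:
    """One recording in the general speech database (hypothetical layout)."""
    sentence_id: int
    speaker: str
    audio_path: str


def build_task_database(general_db: Iterable[Utterance],
                        chosen_ids: Iterable[int]) -> List[Utterance]:
    """Sketch of steps S9/S18: keep every recording of every chosen sentence."""
    wanted = set(chosen_ids)
    return [u for u in general_db if u.sentence_id in wanted]


general_db = [
    Utterance(0, "spk1", "spk1/0000.wav"),
    Utterance(1, "spk1", "spk1/0001.wav"),
    Utterance(0, "spk2", "spk2/0000.wav"),
]
task_db = build_task_database(general_db, chosen_ids=[0])
print([u.audio_path for u in task_db])
```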

[0027]

As described above, the correlation value or the entropy is used to select, from a general speech database, the speech information for the sentences that appear closest to the target sentences of a specific task. A speech database for the specific task can therefore be generated solely from an existing general speech database, with no work to record speech data for that task. This makes it possible to build a speech recognition model adapted to the specific task.

[0028]

In the above embodiments, the sentence subset is specified by comparing correlation values or entropies with one another, but the invention is not limited to this. For example, a threshold may be set in advance for the correlation value or the entropy. The sentence subset is then specified when the calculated correlation value or entropy exceeds that threshold; until it does, sentence changes continue.
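The threshold variant described in [0028], and in claim 2, drops the pairwise comparison. The current subset is accepted as soon as its score exceeds a preset threshold; otherwise sentences keep being changed. A minimal sketch follows, assuming random one-sentence swaps and a caller-supplied score. `search_until_threshold` and its `max_changes` cap are hypothetical; the cap is added so that the loop always terminates.

```python
import random
from typing import Callable, List, Optional, Sequence


def search_until_threshold(pool_size: int, k: int,
                           score: Callable[[Sequence[int]], float],
                           threshold: float, max_changes: int = 1000,
                           seed: int = 0) -> Optional[List[int]]:
    """Sketch of the threshold variant: accept the first subset scoring above
    `threshold`, otherwise swap one sentence and check again.

    Returns None if the threshold is never exceeded within `max_changes` swaps
    or if no sentence is left to swap in.
    """
    rng = random.Random(seed)
    subset = rng.sample(range(pool_size), k)
    for _ in range(max_changes):
        if score(subset) > threshold:
            return subset
        members = set(subset)
        outside = [i for i in range(pool_size) if i not in members]
        if not outside:
            return None
        subset[rng.randrange(k)] = rng.choice(outside)   # continue changing
    return subset if score(subset) > threshold else None


# Toy score: passes only when sentence 5 is in the subset.
found = search_until_threshold(10, 3, lambda s: 1.0 if 5 in s else 0.0, threshold=0.5)
print(found)
```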

[0029]

In the above embodiments, the target sentence consists of one sentence and the sentence subset consists of two sentences, but the invention is not limited to this.

[0030]

In the above embodiments, only one sentence of the subset is changed at a time, but the invention is not limited to this. Other changes are possible:
- Both sentences may be changed.
- When the subset contains three or more sentences, two or more of them, or all of them, may be changed.
- Other modification methods may also be used, although a method that lets the subset be specified as quickly as possible is preferable.
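The generalized change step of [0030] can be sketched as swapping m sentences at once. With m = 1 it is the single-sentence change used in the embodiments; with m equal to the subset size it replaces the whole subset. The random choice of positions and newcomers, and the function name `replace_m_sentences`, are illustrative assumptions.

```python
import random
from typing import List, Sequence


def replace_m_sentences(subset: Sequence[int], pool_size: int, m: int,
                        rng: random.Random) -> List[int]:
    """Swap m distinct members of `subset` for m distinct outside sentences.

    m = 1 is the single-sentence change of the embodiments;
    m = len(subset) replaces the whole subset.
    """
    members = set(subset)
    outside = [i for i in range(pool_size) if i not in members]
    if not 1 <= m <= min(len(subset), len(outside)):
        raise ValueError("m must be between 1 and min(len(subset), sentences left)")
    new_subset = list(subset)
    for pos, sentence in zip(rng.sample(range(len(new_subset)), m),
                             rng.sample(outside, m)):
        new_subset[pos] = sentence
    return new_subset


rng = random.Random(1)
s0 = [0, 1, 2]
two_changed = replace_m_sentences(s0, pool_size=10, m=2, rng=rng)
all_changed = replace_m_sentences(s0, pool_size=10, m=3, rng=rng)
print(two_changed, all_changed)
```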

[0031]

In the above embodiments, the correlation values or entropies are calculated and compared for two subsets at a time, but the invention is not limited to this. Two alternatives are possible:
- The correlation values or entropies of all selectable subsets may be calculated first, and the largest one specified among them.
- Three or more subsets may be grouped together. The largest is selected within each group in the same way, and the largest among the group winners is finally specified.
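The two alternatives in [0031] can be sketched as follows. The first scores every candidate subset up front and takes the maximum. The second takes the maximum inside each group of candidates, then the maximum of the group winners. Exhaustive enumeration visits C(N, k) subsets of an N-sentence database, which is practical only for small pools or under tight restrictions. Function names are hypothetical, and `score` stands for the correlation value or entropy.

```python
from itertools import combinations
from typing import Callable, List, Sequence, Tuple

Score = Callable[[Sequence[int]], float]


def best_subset_exhaustive(pool_size: int, k: int,
                           score: Score) -> Tuple[Tuple[int, ...], float]:
    """Score every k-sentence subset first, then return the best (subset, score)."""
    scored = ((s, score(s)) for s in combinations(range(pool_size), k))
    return max(scored, key=lambda pair: pair[1])


def best_subset_grouped(candidates: Sequence[Sequence[int]], group_size: int,
                        score: Score) -> Sequence[int]:
    """Take the best candidate within each group, then the best group winner."""
    winners: List[Sequence[int]] = [
        max(candidates[i:i + group_size], key=score)
        for i in range(0, len(candidates), group_size)
    ]
    return max(winners, key=score)


print(best_subset_exhaustive(5, 2, score=sum))
print(best_subset_grouped([[0, 1], [2, 3], [1, 4], [3, 4]], group_size=2, score=sum))
```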

[0032]

In the above embodiments, the predetermined rule for selecting sentence subsets from the general speech database is arbitrary selection, but the invention is not limited to this. For example, if the general speech database is organized by task, sentences representative of each task may be selected.
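When the general database is organized by task, as in [0032], the initial subset can contain one representative sentence per task. The text does not define "representative", so the sketch below leaves that choice to a caller-supplied `key`. The example criterion (number of distinct phoneme chains in a sentence), the task names and the function name are all hypothetical.

```python
from typing import Callable, Dict, List, Sequence


def representatives_per_task(tasks: Dict[str, Sequence[int]],
                             key: Callable[[int], float]) -> List[int]:
    """Initial subset: from each task category, the sentence maximizing `key`.

    `key` is a hypothetical, caller-supplied notion of "representative";
    tasks are visited in sorted name order so the result is deterministic.
    """
    return [max(ids, key=key) for _, ids in sorted(tasks.items()) if ids]


# Hypothetical criterion: number of distinct phoneme chains per sentence index.
distinct_chains = [5, 2, 1, 9, 3]
tasks = {"weather": [0, 1], "banking": [2, 3, 4]}
print(representatives_per_task(tasks, key=lambda i: distinct_chains[i]))
```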

[0033]

In the above embodiments, each processing means is implemented by dedicated hardware. Alternatively, the same functions may be implemented in software on a computer.

[0034]

As is apparent from the above description, the present invention greatly reduces the labor required to record speech data when a speech database for a specific task is generated.

[Brief Description of the Drawings]

FIG. 1 is a block diagram of the specific-task speech database generating device according to the first embodiment of the present invention.

FIG. 2 is a flowchart explaining the operation of the specific-task speech database generating device of the first embodiment.

FIG. 3 shows an example of the target sentence and the appearance frequency of each phoneme chain in the first embodiment.

FIG. 4(a) shows an example of a sentence subset and the appearance frequency of each phoneme chain in the first embodiment. FIG. 4(b) shows the same for the subset after one of its sentences has been changed.

FIG. 5 is a block diagram of the specific-task speech database generating device according to the second embodiment of the present invention.

FIG. 6 is a flowchart explaining the operation of the specific-task speech database generating device of the second embodiment.

[Explanation of Symbols]

1 First appearance frequency calculating means
2 Selecting means
3 Second appearance frequency calculating means
4 Correlation value calculating means
5 Specifying means
6 Database generating means
7 Target sentence
8 General speech database
9 Speech database for specific task
10 Phoneme chain detecting means
11 Entropy calculating means

Claims

[Claims]

1. A specific-task speech database generating device comprising:
a first appearance frequency calculating means for calculating a first appearance frequency of each predetermined phoneme chain in an input target sentence relating to a specific task;
a selecting means for selecting, in accordance with a predetermined rule, a plurality of kinds of subsets, each consisting of a plurality of sentences, from the sentence set of a general speech database having a plurality of sentences corresponding to many kinds of tasks;
a second appearance frequency calculating means for calculating a second appearance frequency of each said phoneme chain in each selected subset;
a correlation value calculating means for calculating a plurality of correlation values between the second appearance frequencies and the first appearance frequency;
a specifying means for specifying, by using the calculated correlation values, a subset of sentences having an appearance frequency closer to the first appearance frequency; and
a database generating means for selecting, based on the specification result, the speech information in the general speech database corresponding to those sentences, and generating the speech database for the specific task.

2. The specific-task speech database generating device according to claim 1, wherein the specifying means compares a predetermined threshold value with the calculated correlation value, such that:
if the correlation value is lower than the threshold value, the specifying means causes the selecting means to select another subset of sentences; and
if the correlation value is higher than the threshold value, the specifying means specifies the subset of sentences corresponding to the second appearance frequency associated with that high correlation value.

3. A specific-task speech database generating device comprising:
a phoneme chain detecting means for detecting appearing phoneme chains, namely those predetermined phoneme chains whose appearance frequency in an input target sentence relating to a predetermined task is not 0;
a selecting means for selecting, in accordance with a predetermined rule, a plurality of kinds of subsets, each consisting of a plurality of sentences, from the sentence set of a general speech database having a plurality of sentences corresponding to many kinds of tasks;
an entropy calculating means for calculating, for each selected subset, the entropy relating to the appearing phoneme chains;
a specifying means for specifying, by using the calculated entropies, the subset of sentences having the maximum entropy; and
a database generating means for selecting, based on the specification result, the speech information in the general speech database corresponding to those sentences, and generating the speech database for the specific task.