JPS6154240B2

Info

Publication number: JPS6154240B2
Application number: JP55173079A
Authority: JP
Inventors: Junichi Ichikawa; Hidekazu Shiratori; Osamu Terao; Yasuo Sato; Takayuki Ooyama
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1980-12-10
Filing date: 1980-12-10
Publication date: 1986-11-21
Also published as: JPS5797594A

Description

DETAILED DESCRIPTION OF THE INVENTION

The present invention relates to a method for forming a standard pattern of speech, and in particular to a method of forming an averaged standard pattern by using dynamic programming to match two speech patterns.

In general, to recognize speech by pattern matching, a standard speech pattern for each word is registered in memory as a word dictionary, and the input speech pattern is matched against the standard patterns. The standard pattern in the word dictionary should be averaged as far as possible with respect to the pronunciation of the word. However, when a word is spoken, the duration of the word and the lengths of the phonemes within it generally vary with the speaker and with the circumstances of the utterance. A known way of obtaining a standard pattern by averaging two speech patterns of the same word that differ in duration and phoneme length is to match the two patterns by dynamic programming and to form the averaged standard pattern from the result.

The conventional method of forming a standard pattern by dynamic programming is described in detail later. In outline, the durations of two speech patterns A and B are divided into equal time units at times t_1, t_2, …, t_I and t_1, t_2, …, t_J, respectively, and these times serve as the scales of the vertical and horizontal axes of a plane. The frequency spectra a_1, a_2, …, a_I and b_1, b_2, …, b_J of the speech patterns at each time are obtained, and the optimal path that maximizes the similarity between the two speech patterns A and B is found on the plane. The frequency spectra corresponding to the optimal path at each time on the vertical axis or the horizontal axis are then averaged by a suitable averaging means to obtain the standard pattern.

However, in the conventional method the frequency spectra are averaged with respect to only one of the two axes of the plane on which the optimal path was found. The resulting standard pattern therefore depends heavily on only one of the speech patterns A and B, and its duration is equal to the duration of that one pattern. Consequently, the standard pattern cannot be said to be a sufficient average of the two speech patterns A and B.

In view of the above problem of the conventional method, the object of the present invention is to obtain a standard pattern in which not only the phoneme lengths but also the durations of the two speech patterns A and B are averaged. This is based on the idea of dividing the optimal path obtained by dynamic programming into partial paths with a group of parallel straight lines, spaced at a predetermined interval and forming a fixed angle with the horizontal or vertical axis, and averaging the frequency spectra corresponding to each partial path.

The gist of the method provided by the present invention is as follows. The duration of each of two speech patterns A and B is divided into segments of equal time units, and the frequency spectra a_1, a_2, …, a_I and b_1, b_2, …, b_J of the speech patterns in all of the segments are obtained. The speech patterns are expressed as the vector time series A = (a_1, a_2, …, a_I) and B = (b_1, b_2, …, b_J). An optimal correspondence (a_{m(τ)} : b_{n(τ)}), where τ = 1, 2, …, M, m(1) = n(1) = 1, m(M) = I, and n(M) = J, is determined between the frequency spectra of these vector time series by dynamic programming, and an optimal path is formed on the m-n plane from this correspondence in order to form the vector time series of an averaged standard pattern C.

The method is characterized in that the optimal path on the m-n plane is divided into N-1 partial paths by a predetermined number of parallel straight lines pm + qn = r_i (i = 1, 2, …, N); partial vector time series A_i = (a_{k_i1}, …, a_{k_i2}) and B_i = (b_{l_i1}, …, b_{l_i2}) are obtained from the optimal correspondences (a_{k_i1} : b_{l_i1}), …, (a_{k_i2} : b_{l_i2}) that lie on each partial path; these partial vector time series are averaged by a predetermined averaging method to obtain the frequency spectrum c_i of the standard pattern C; and the vector time series C = (c_1, c_2, …, c_{N-1}) of the standard pattern C is thereby formed.

A method for forming a standard pattern of speech according to an embodiment of the present invention is described below with reference to the accompanying drawings.

FIG. 1 is a graph for explaining a method for forming a standard pattern of speech according to one embodiment of the present invention. In FIG. 1, two speech patterns A and B are averaged to create a standard pattern C. The duration of speech pattern A is T_A, and the duration of speech pattern B is T_B. The durations T_A and T_B are divided into equal time units at times t_1, t_2, …, t_I and t_1, t_2, …, t_J, respectively. The frequency spectra of speech pattern A at times t_1, t_2, …, t_I are vector quantities denoted a_1, a_2, …, a_I, and the frequency spectra of speech pattern B at times t_1, t_2, …, t_J are denoted b_1, b_2, …, b_J.

Dynamic programming is known as a method of finding the similarity relationship between the frequency spectra of speech patterns A and B with as little computation as possible. With dynamic programming, an m-n plane is formed with the time axes of speech patterns A and B as the vertical and horizontal axes, respectively, and the optimal path that maximizes the similarity between the two patterns is found on that plane. More specifically, the start and end points of the time axis of the resulting standard pattern C correspond to the start point t_1 and end point t_I of the time axis of speech pattern A, and also to the start point t_1 and end point t_J of the time axis of speech pattern B. The start and end points of the optimal path are therefore made to correspond to the coordinate origin (t_1, t_1) and to the point (t_I, t_J) on the m-n plane, respectively.

The principle for finding the optimal path by dynamic programming can be summarized in two assumptions. When the optimal path from the origin (t_1, t_1) to a coordinate (t_p, t_q) is sought, it is assumed first that the optimal paths to the points temporally preceding (t_p, t_q) have already been found, and second that the optimal path to (t_p, t_q) follows the best of the possible paths from the several immediately preceding points. For example, in FIG. 1, three paths can lead to (t_p, t_q): from (t_p, t_{q-1}), from (t_{p-1}, t_{q-1}), or from (t_{p-1}, t_q). Since the optimal paths to (t_p, t_{q-1}), (t_{p-1}, t_{q-1}), and (t_{p-1}, t_q) are assumed to be known, the similarity between speech patterns A and B is computed for each of the three candidate paths and compared, and the optimal path passes through the candidate with the maximum similarity. In the figure, the point immediately preceding (t_p, t_q) on the optimal path is (t_p, t_{q-1}). By substituting the start and end points of the optimal path into a predetermined recurrence formula, the optimal path l is obtained on the basis of the dynamic programming described above.

As the similarity, for example, the reciprocal of the absolute value of the difference between each of the frequency spectra a_1, a_2, …, a_I of speech pattern A and each of the frequency spectra b_1, b_2, …, b_J of speech pattern B is used, and these values are summed along the optimal path. Accordingly, once the optimal path l has been found, the similarity D is expressed as follows.

[Equation missing from this text.]
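The equation for D does not survive in this text. A form consistent with the description (the reciprocal of the absolute spectral difference, summed along the optimal path) would be the following. It is a reconstruction under that reading, using the index functions m(τ) and n(τ) from the claims; the exact expression in the original publication may differ.

```latex
D = \sum_{\tau=1}^{M} \frac{1}{\left| a_{m(\tau)} - b_{n(\tau)} \right|}
```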

Here, a_{m(1)} = a_1, b_{n(1)} = b_1, a_{m(M)} = a_I, and b_{n(M)} = b_J.
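The dynamic-programming search described above can be sketched in Python as follows. This is a minimal illustration and not the recurrence used in the patent, which this text does not specify. The names `similarity` and `optimal_path`, the constant `eps` (added to avoid division by zero), the summed absolute difference used as the spectral distance, and the unnormalized sum of similarities are all assumptions made for the sketch.

```python
from typing import List, Sequence, Tuple

Spectrum = Sequence[float]


def similarity(a: Spectrum, b: Spectrum, eps: float = 1e-9) -> float:
    """Reciprocal of the summed absolute difference between two spectra.

    eps is an assumption of this sketch; it only prevents division by zero.
    """
    return 1.0 / (sum(abs(x - y) for x, y in zip(a, b)) + eps)


def optimal_path(A: Sequence[Spectrum],
                 B: Sequence[Spectrum]) -> Tuple[float, List[Tuple[int, int]]]:
    """Return (D, path), where path is a list of 1-based (m, n) pairs from (1, 1) to (I, J)."""
    I, J = len(A), len(B)
    score = [[float("-inf")] * J for _ in range(I)]
    back = [[None] * J for _ in range(I)]
    for m in range(I):
        for n in range(J):
            s = similarity(A[m], B[n])
            if m == 0 and n == 0:
                score[m][n] = s
                continue
            # The three admissible predecessors named in the text.
            best, arg = float("-inf"), None
            for pm, pn in ((m, n - 1), (m - 1, n - 1), (m - 1, n)):
                if pm >= 0 and pn >= 0 and score[pm][pn] > best:
                    best, arg = score[pm][pn], (pm, pn)
            score[m][n] = best + s
            back[m][n] = arg
    # Trace back from the end point (I, J) to the origin (1, 1).
    path = []
    node = (I - 1, J - 1)
    while node is not None:
        path.append((node[0] + 1, node[1] + 1))
        node = back[node[0]][node[1]]
    path.reverse()
    return score[I - 1][J - 1], path
```

Because every similarity value is positive, an unnormalized sum tends to favor longer paths. Slope constraints or path-length normalization would normally be part of the recurrence formula, which the text leaves unspecified.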

Conventionally, an averaged standard pattern was obtained directly from the optimal path l found in this way. That is, the frequency spectra of the two speech patterns corresponding to the optimal path at each time were averaged with respect to only the vertical axis or only the horizontal axis of the m-n plane. For example, when averaging with respect to the vertical axis, the average of the frequency spectra corresponding to the optimal path l is taken at each of the times t_1, t_2, …, t_I that divide the duration T_A on the vertical axis into equal intervals. At time t_1, the frequency spectra a_{m(1)} = a_1 and b_{n(1)} = b_1 correspond, so the average of a_1 and b_1 is taken as the frequency spectrum c_1' of the standard pattern at time t_1. At time t_2, the frequency spectra a_{m(2)} = a_{m(3)} = a_{m(4)} = a_2 correspond to b_{n(2)} = b_2, b_{n(3)} = b_3, and b_{n(4)} = b_4, so c_2' = (3a_2 + b_2 + b_3 + b_4) / 6 is taken as the frequency spectrum of the standard pattern at time t_2. By obtaining the frequency spectrum of the standard pattern in the same way for each time up to t_I, the standard pattern is obtained as C' = (c_1', c_2', …, c_I'). When averaging with respect to the horizontal axis, the standard pattern is obtained in the same way as C'' = (c_1'', c_2'', …, c_J'').
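The conventional axis-wise averaging can be sketched as follows. The function name `average_along_axis` and the list-of-lists representation of the spectra are assumptions for illustration; the path is the one used in the text's example, (1, 1), (2, 2), (2, 3), (2, 4).

```python
from collections import defaultdict


def average_along_axis(path, A, B, axis=0):
    """Average the matched spectra at each time of one axis only.

    path: 1-based (m, n) pairs on the optimal path.
    axis=0 groups by m (time axis of pattern A), axis=1 groups by n (pattern B).
    """
    groups = defaultdict(list)
    for m, n in path:
        key = m if axis == 0 else n
        # Every path point contributes one spectrum from A and one from B.
        groups[key].append(A[m - 1])
        groups[key].append(B[n - 1])
    result = []
    for key in sorted(groups):
        vecs = groups[key]
        dim = len(vecs[0])
        result.append([sum(v[d] for v in vecs) / len(vecs) for d in range(dim)])
    return result


path = [(1, 1), (2, 2), (2, 3), (2, 4)]
A = [[2.0], [6.0]]                    # a_1, a_2 (one-dimensional for brevity)
B = [[4.0], [0.0], [6.0], [12.0]]     # b_1 .. b_4
C_vertical = average_along_axis(path, A, B, axis=0)    # length I = 2
C_horizontal = average_along_axis(path, A, B, axis=1)  # length J = 4
```

With these values the second vertical-axis element is (3·6 + 0 + 6 + 12) / 6 = 6.0, matching the form of c_2' in the text. The result always has as many elements as the chosen axis has time points, which is the limitation the invention addresses.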

However, as noted above, the standard pattern obtained by this conventional method depends heavily on only one of the speech patterns A and B, and its duration is equal to either T_A or T_B, the duration of that one pattern. It therefore cannot be said to be a sufficient average of the two speech patterns.

According to the embodiment of the present invention, the optimal path l formed on the m-n plane is divided into (N-1) partial paths l_1, l_2, …, l_{N-1} by a group s of N parallel straight lines that are parallel to neither the horizontal axis nor the vertical axis, and the partial patterns A_i = (a_{k_i1}, …, a_{k_i2}) and B_i = (b_{l_i1}, …, b_{l_i2}) of speech patterns A and B corresponding to each partial path are obtained. For example, the frequency spectra of speech pattern A corresponding to partial path l_1 are a_{m(1)} = a_1 and a_{m(2)} = a_2, so A_1 = (a_1, a_2); likewise, the frequency spectra of speech pattern B corresponding to partial path l_1 are b_{n(1)} = b_1 and b_{n(2)} = b_2, so B_1 = (b_1, b_2). For partial path l_2, similarly, A_2 = (a_2, a_2) and B_2 = (b_3, b_4). The frequency components c_i of the standard pattern C are obtained by averaging the partial patterns obtained in this way with a suitable averaging means.

According to one embodiment of the averaging means, when the group of parallel straight lines is written as pm + qn = r_i (i = 1, 2, …, N), the frequency spectrum c_i of the standard pattern C is given by a formula in p and q [equation missing from this text]. For the case shown in the figure, c_1 and c_2 are:

c_1 = [p(a_1 + a_2) + q(b_1 + b_2)] / [2(p + q)]
c_2 = [2p·a_2 + q(b_3 + b_4)] / [2(p + q)]

The remaining spectra, c_3 through c_{N-1}, are obtained in the same way. The number of sampling points can be changed freely by increasing or decreasing the number N of parallel straight lines. Because oblique lines are used, the duration of the standard pattern C can be made equal to a weighted average of the durations T_A and T_B of speech patterns A and B, and the phoneme lengths can also be influenced equally by both patterns.
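The partitioning by oblique lines and the averaging within each partial path can be sketched as follows. The general formula for c_i is missing from this text, so the sketch uses a form inferred from the worked values of c_1 and c_2: c_i is the mean, over the path points in l_i, of (p·a_{m(τ)} + q·b_{n(τ)}) / (p + q). The function name, the explicit list `r` of line constants r_1, …, r_N, and the convention that partial path l_i holds the points with r_i < pm + qn ≤ r_{i+1} are assumptions of this sketch.

```python
from bisect import bisect_left


def average_along_oblique_lines(path, A, B, p, q, r):
    """Average matched spectra within each partial path cut by the lines p*m + q*n = r_i.

    path: 1-based (m, n) pairs on the optimal path.
    r:    ascending line constants r_1..r_N (hypothetical input; the text does
          not say how they are chosen).
    Returns the list c_1..c_{N-1}; partial paths with no points are skipped.
    """
    n_seg = len(r) - 1
    sums = [None] * n_seg
    counts = [0] * n_seg
    for m, n in path:
        k = p * m + q * n
        i = bisect_left(r, k)  # number of lines with r_j < k, i.e. the segment index
        if not 1 <= i <= n_seg:
            raise ValueError(f"path point ({m}, {n}) lies outside the line group")
        contrib = [(p * a + q * b) / (p + q) for a, b in zip(A[m - 1], B[n - 1])]
        if sums[i - 1] is None:
            sums[i - 1] = contrib
        else:
            sums[i - 1] = [s + c for s, c in zip(sums[i - 1], contrib)]
        counts[i - 1] += 1
    return [[s / cnt for s in vec] for vec, cnt in zip(sums, counts) if cnt]


path = [(1, 1), (2, 2), (2, 3), (2, 4)]
A = [[2.0], [6.0]]
B = [[4.0], [0.0], [6.0], [12.0]]
C = average_along_oblique_lines(path, A, B, p=1, q=1, r=[1.5, 4.5, 7.5])
```

With p = q = 1 and these illustrative line constants, the path splits into l_1 = {(1, 1), (2, 2)} and l_2 = {(2, 3), (2, 4)}, as in the text, and the result is c_1 = 3.0 and c_2 = 7.5. The length of the result is set by the number of lines rather than by I or J.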

FIG. 2 is a block diagram showing one embodiment of a standard pattern forming apparatus for carrying out the method for forming a standard pattern of speech according to the present invention, as explained with reference to the graph of FIG. 1. In FIG. 2, 101 is a first memory that stores the coordinates [m(τ), n(τ)], τ = 1, 2, …, M, on the m-n plane of the optimal path obtained by matching speech patterns A and B. 102 is a first arithmetic circuit that computes pm + qn from the values m and n sent from memory 101 and the constants p and q sent from a second memory 201, which stores those constants. 103 is a comparison circuit that compares the value r_{i+1} sent from the second arithmetic circuit with the value pm + qn from the first arithmetic circuit. 104 is a counter whose count is incremented by 1 when the comparison in circuit 103 shows that pm + qn is greater than r_{i+1}. 105 is a register that holds the count i of the counter, and the second arithmetic circuit 106 computes the value r_i of the parallel line group from the value in register 105. A third memory 202 stores the frequency spectra (a_1, a_2, …, a_I) and (b_1, b_2, …, b_J) of the two speech patterns A and B. 107 is a third arithmetic circuit that computes the frequency spectrum c_i of the standard pattern C from the frequency spectra a_{m(τ)} and b_{n(τ)} stored in memory 202; the count i is supplied to the third arithmetic circuit 107 from register 105. The control device 108 initializes register 105 and issues the commands for reading the coordinates [m(τ), n(τ)] in sequence from the first memory 101. The frequency spectra of the standard pattern C are stored in a fourth memory 203.

FIG. 3 is a flowchart for explaining the operation of the apparatus of FIG. 2. Referring to FIGS. 2 and 3, in the first step the control device 108 initializes to zero the values τ, i, and x, where x indicates the number of coordinate pairs in the current partial pattern that have already been averaged. In the second step, τ is incremented by 1. In the third step, the first arithmetic circuit computes k = p·m(τ) + q·n(τ), and in the fourth step the comparison circuit 103 compares k with r_{i+1}. If k is greater than r_{i+1}, the value i held in register 105 is incremented by 1 in the fifth step, x is reset to zero in the sixth step, and in the seventh step the third arithmetic unit 107 computes the average of the c_i obtained up to the previous cycle and the c_i of the current cycle. If k is less than or equal to r_{i+1} in the fourth step, the coordinates m(τ), n(τ) lie within the same partial path, so the fifth and sixth steps are skipped and the average is computed directly in the seventh step. In the eighth step the control device 108 increments x by 1, and in the ninth step τ is compared with M. If τ is less than M, processing returns to the second step and the same processing is repeated. If τ is greater than or equal to M, processing ends.
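The nine steps of the flowchart map onto a single loop. The sketch below follows that loop under stated assumptions: the line constants are passed in as a list `r` (the text does not give the formula used by circuit 106), each path point contributes (p·a + q·b) / (p + q), and step 7 is read as a running mean weighted by the count x. The function name is hypothetical.

```python
def form_standard_pattern(path, A, B, p, q, r):
    """Follow the flowchart of FIG. 3 with a running mean per partial path.

    path: 1-based (m, n) pairs for tau = 1..M; r: ascending list r_1..r_N,
    so r[i] (0-based) holds r_{i+1}. Assumes r_1 < p*m(1) + q*n(1).
    """
    if not path:
        return []
    M = len(path)
    tau, i, x = 0, 0, 0                       # step 1: initialize
    C = {}                                    # c_i keyed by the counter value i
    while True:
        tau += 1                              # step 2
        m, n = path[tau - 1]
        k = p * m + q * n                     # step 3
        if i < len(r) and k > r[i]:           # step 4: is k > r_{i+1}?
            i += 1                            # step 5: move to the next partial path
            x = 0                             # step 6: nothing averaged yet in it
        new = [(p * a + q * b) / (p + q) for a, b in zip(A[m - 1], B[n - 1])]
        if x == 0:                            # step 7: update the running mean
            C[i] = new
        else:
            C[i] = [(x * c + v) / (x + 1) for c, v in zip(C[i], new)]
        x += 1                                # step 8
        if tau >= M:                          # step 9
            break
    return [C[j] for j in sorted(C)]


path = [(1, 1), (2, 2), (2, 3), (2, 4)]
A = [[2.0], [6.0]]
B = [[4.0], [0.0], [6.0], [12.0]]
C = form_standard_pattern(path, A, B, p=1, q=1, r=[1.5, 4.5, 7.5])
```

Because the counter advances by at most one per path point, as in the flowchart, this loop assumes that consecutive path points never skip over an entire partial path.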

As is clear from the above description, according to the present invention the optimal path obtained by dynamic programming is divided into partial paths by a group of parallel straight lines that form a fixed angle with the horizontal or vertical axis, and the frequency spectra corresponding to each partial path are averaged. Both the phoneme lengths and the durations of the two speech patterns are thereby averaged, so a sufficiently averaged standard pattern is obtained.

The averaging means may use another averaging formula in place of the formula of the embodiment described above.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a graph for explaining a method for forming a standard pattern of speech according to one embodiment of the present invention. FIG. 2 is a block diagram showing one embodiment of a standard pattern forming apparatus for carrying out the method explained with reference to the graph of FIG. 1. FIG. 3 is a flowchart for explaining the operation of the apparatus of FIG. 2.

A, B: speech patterns; a_1, a_2, …, a_I: frequency spectra of speech pattern A; b_1, b_2, …, b_J: frequency spectra of speech pattern B; C: standard pattern; c_1, c_2, …, c_{N-1}: frequency spectra of standard pattern C; l: optimal path; l_1, l_2, …, l_{N-1}: partial paths; s: group of parallel straight lines; 101: memory for storing the coordinates of the optimal path; 102: arithmetic circuit for computing pm + qn; 103: comparison circuit; 104: counter; 105: register; 106: arithmetic circuit for computing r_i; 107: arithmetic circuit for computing c_i; 108: control device; 201: memory for storing p and q; 202: memory for storing the vector time series of speech patterns A and B; 203: memory for storing the vector time series of standard pattern C.

Claims

1. A method for forming a standard pattern of speech, in which the duration of each of two speech patterns A and B is divided into segments of equal time units; the frequency spectra a_1, a_2, …, a_I and b_1, b_2, …, b_J of the speech patterns in all of the segments are obtained; the speech patterns are expressed as vector time series A = (a_1, a_2, …, a_I) and B = (b_1, b_2, …, b_J), respectively; an optimal correspondence (a_{m(τ)} : b_{n(τ)}) (where τ = 1, 2, …, M, m(1) = n(1) = 1, m(M) = I, n(M) = J) is determined between the frequency spectra of the vector time series by dynamic programming; and an optimal path is formed on the m-n plane from the optimal correspondence in order to form the vector time series of an averaged standard pattern C; the method being characterized in that the optimal path on the m-n plane is divided into N-1 partial paths by a predetermined number of parallel straight lines pm + qn = r_i (where i = 1, 2, …, N); partial vector time series A_i = (a_{k_i1}, …, a_{k_i2}) and B_i = (b_{l_i1}, …, b_{l_i2}) are obtained from the optimal correspondences (a_{k_i1} : b_{l_i1}), …, (a_{k_i2} : b_{l_i2}) lying on each of the partial paths; the partial vector time series are averaged by a predetermined averaging method to obtain the frequency spectrum c_i of the standard pattern C; and the vector time series C = (c_1, c_2, …, c_{N-1}) of the standard pattern C is thereby formed.

2. The method for forming a standard pattern of speech according to claim 1, characterized in that the following is used as the averaging method: [equation missing from this text].