JPS58209794A - Pattern matching system - Google Patents
Info
- Publication number
- JPS58209794A (application JP57092824A)
- Authority
- JP
- Japan
- Prior art keywords
- pattern
- time series
- matching
- series pattern
- standard
- Prior art date
- Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
- Granted
Abstract
(57) [Abstract] This publication contains application data filed before electronic filing, so no abstract data is recorded.
Description
[Detailed Description of the Invention]
(a) Technical Field of the Invention
The present invention relates to a speech recognition system, and in particular to an improvement in the matching of patterns represented as a time series of spectra expressing the features of speech.
(b) Prior Art and Its Problems
To recognize speech by pattern matching, standard speech patterns are generally registered in memory as a word dictionary, word by word, and an input speech pattern is matched against these standard patterns. In this case it is desirable that the input speech pattern and the standard pattern be normalized on the time axis, as far as possible, with respect to the pronunciation of the word. However, when a single word is uttered, the duration of the word and the lengths of the phonemes within it generally vary with the speaker and with the circumstances of the utterance. Dynamic programming (DP matching) and the discriminant function method are cited as the two representative approaches to this variation in speech. Dynamic programming is effective for normalizing nonlinear expansion and contraction of the time axis, but it is difficult for it to cope with stochastic variation of the parameters. The discriminant function method, conversely, is effective against stochastic parameter variation but has the drawback that normalization along the time axis is problematic.
(c) Object of the Invention
The object of the present invention is to eliminate the above drawbacks by varying the weight of the distance calculation along the time axis and the frequency axis, thereby combining dynamic programming and the discriminant function method and providing a pattern matching system that has the advantages of both.
(d) Constitution of the Invention
The present invention comprises means for matching an input time-series pattern against a standard time-series pattern by dynamic programming, and means for creating, from the input time-series pattern, a new time-series pattern for matching based on the result of the matching means; pattern matching is then performed by carrying out a distance calculation between the new time-series pattern for matching and the standard pattern using weights that differ along the time axis and along the frequency axis.
(e) Embodiment of the Invention
Fig. 1 is a block diagram illustrating one embodiment of the present invention. The duration of a speech pattern is divided into equal time units, and the frequency spectrum of the speech pattern is obtained in every one of these units. For the input time series the spectra are denoted by the vectors a1, a2, ..., aI, and for the standard time series by the vectors b1, b2, ..., bJ. Memory 1 then stores the input time-series pattern A = (a_i), i = 1 to I, and memory 3 stores the standard time-series pattern B = (b_j), j = 1 to J. In DP matching section 2, the input time-series pattern A and the standard time-series pattern B are read from memories 1 and 3 and matched by dynamic programming.
Details of matching by dynamic programming are given in the Journal of the Acoustical Society of Japan, vol. 27, No. 9, pp. 483-490, September 1971. In outline, the durations of the two speech patterns A and B are each divided into equal time units, times 1 to I and times 1 to J, which are taken as the scales of the vertical and horizontal axes of a plane, and the frequency spectrum of each pattern at each time is obtained, i.e. the vectors a1, a2, ..., aI and b1, b2, ..., bJ. An optimal path 10 is then found such that the distance between the two speech patterns A and B is minimized, and the matching distance is obtained as the cumulative distance along this optimal path 10. That is, first the distance g(1,1) = ‖a1 − b1‖ is set; then, as shown in Fig. 2, g(i,j) is obtained by adding d(i,j), the distance between the corresponding vectors of speech patterns A and B at point (i,j), to the minimum of the three values g(i−1,j), g(i,j−1) and g(i−1,j−1). By computing g(i,j) while successively increasing i and j, the optimal matching distance between speech patterns A and B is finally obtained as g(I,J); and by recording in P(i,j), at each grid point of the matrix, which of the three immediately preceding points the optimal path passed through, the optimal matching path is obtained at the same time.
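The DP recurrence described above can be sketched in Python as follows. The function name `dp_match`, the '+'/'0'/'-' back-pointer tags, and the Euclidean local distance are illustrative assumptions; the patent does not fix the local distance measure.

```python
import math

def dp_match(A, B):
    """DP matching between spectral sequences A (length I) and B (length J).

    Returns the cumulative distance g(I,J) and a back-pointer matrix P, where
    P[i][j] records which predecessor the optimal path used:
    '+' for (i-1,j), '0' for (i-1,j-1), '-' for (i,j-1).
    """
    def d(x, y):  # local spectral distance (Euclidean, an assumption)
        return math.sqrt(sum((xi - yi) ** 2 for xi, yi in zip(x, y)))

    I, J = len(A), len(B)
    g = [[math.inf] * J for _ in range(I)]
    P = [[None] * J for _ in range(I)]
    g[0][0] = d(A[0], B[0])          # g(1,1) = ||a1 - b1||
    for i in range(I):
        for j in range(J):
            if i == 0 and j == 0:
                continue
            cands = []
            if i > 0:
                cands.append((g[i - 1][j], '+'))      # from (i-1, j)
            if i > 0 and j > 0:
                cands.append((g[i - 1][j - 1], '0'))  # from (i-1, j-1)
            if j > 0:
                cands.append((g[i][j - 1], '-'))      # from (i, j-1)
            best, P[i][j] = min(cands)
            g[i][j] = best + d(A[i], B[j])
    return g[I - 1][J - 1], P
```

Tracing P back from (I,J) then recovers the optimal path 10 of Fig. 2 without a second pass over the distance matrix.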
From the optimal matching distance g(I,J) and the optimal path P(i,j) obtained in matrix 4, conversion section 5 derives from the input time-series pattern A a new pattern C for matching. As shown in the flowchart of Fig. 3, the optimal path is traced back from point (I,J) of matrix 4 through j = J, J−1, ..., 2, 1; the vector c_j is set to the vector a_i corresponding to j, or, when several vectors a_i correspond to the same j, to their average, giving the matching pattern C. Here M is a counter used for averaging when several a_i correspond to one j. At the starting point i = I, j = J, M = 0 and c_J = a_I. Let the optimal path P(i,j) be marked '+' when, as shown in Fig. 2, the path reached point (i,j) from point (i−1,j), '0' when it reached (i,j) from point (i−1,j−1), and '−' when it reached (i,j) from point (i,j−1). When P(i,j) is '+', M becomes M+1 at the preceding grid point; when P(i,j) is '0' or '−', M = 0. Tracing back one step from the starting point (I,J): if P(i,j) is in the '0' direction, then i = I−1, j = J−1 and M = 0, so c_{J−1} = a_{I−1}; if it is in the '−' direction, c_{J−1} = a_I; and if it is in the '+' direction, then i = I−1 and M = M+1 = 1, so c_J = (a_I + a_{I−1})/2. If P(i,j) is traced back one further point in the '+' direction, then i = I−2 and M = M+1 = 2, so c_J = (2·(a_I + a_{I−1})/2 + a_{I−2})/3 = (a_I + a_{I−1} + a_{I−2})/3, which shows that three vectors are averaged into c_J.
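The backtracking procedure of Fig. 3 can be sketched as follows. The helper name `warp_to_standard` is hypothetical; the '+'/'0'/'−' tags follow the convention above, and the running-mean update with counter M reproduces the averaging c_J = (a_I + a_{I−1} + ... )/(M+1).

```python
def warp_to_standard(A, P):
    """Trace the back-pointer matrix P from (I,J) back to (1,1), building the
    matching pattern C of length J: c_j is the a_i aligned with b_j, averaged
    (running mean with counter M) when several a_i map to the same j."""
    I, J = len(P), len(P[0])
    i, j = I - 1, J - 1
    C = [None] * J
    C[j] = list(A[i])        # starting point: c_J = a_I
    M = 0                    # extra a_i already folded into c_j
    while i > 0 or j > 0:
        step = P[i][j]
        if step == '+':      # came from (i-1, j): same j, fold in a_{i-1}
            i -= 1
            M += 1
            C[j] = [(c * M + a) / (M + 1) for c, a in zip(C[j], A[i])]
        elif step == '0':    # came from (i-1, j-1): move diagonally
            i -= 1; j -= 1
            M = 0
            C[j] = list(A[i])
        else:                # '-': came from (i, j-1): reuse a_i for c_{j-1}
            j -= 1
            M = 0
            C[j] = list(A[i])
    return C
```

With this update, a '+' step at M = 1 yields (2·(c_j) + a)/3, matching the three-vector average worked through above.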
The matching pattern C = (c_j), j = 1 to J, obtained in conversion section 5 as described above is stored in memory 6. In computation section 7, to weight the distance calculation between the new matching pattern C and the standard time-series pattern B, the distance D is computed using the weight vectors w1, w2, ..., wJ of the weights W = (w_j), j = 1 to J, stored in memory 8, and the final matching distance is output. The distance D is, for example,

D = Σ_{j=1…J} Σ_{n=1…N} w_{j,n} · (c_{j,n} − b_{j,n})²

where N is the number of dimensions of the spectral vectors of the speech pattern.
As for the weights W, first an average pattern is obtained from a large number of utterance patterns of a given word and taken as the standard pattern B; then, taking each individual utterance pattern in turn as A, the pattern C is obtained for it by the method of the present invention described above, and the weights W are determined from the elements of these patterns C over the number of utterances made. By carrying out the distance calculation with weighting that differs on the time axis and on the frequency axis as described above, matching can be performed that makes full use of dynamic programming while also coping with stochastic variation of the parameters. That is, the distance is calculated by weighting the central spectra of the parameters heavily and weighting comparatively unimportant spectra lightly.
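The weight-estimation step can be sketched under an inverse-variance assumption. The patent states only that W is derived from the patterns C over the training utterances, so the formula below is an assumed reading consistent with weighting stable (central) spectra heavily and highly variable ones lightly:

```python
def estimate_weights(Cs, B, eps=1e-6):
    """Estimate weights from K warped training utterances Cs (each a J-by-N
    pattern C for one utterance of the word) and the average pattern B.

    Assumption: w[j][n] is the inverse of the variance of channel n at frame j
    across utterances, so channels that vary little weigh heavily.
    """
    K = len(Cs)
    J, N = len(B), len(B[0])
    W = [[0.0] * N for _ in range(J)]
    for j in range(J):
        for n in range(N):
            var = sum((Ck[j][n] - B[j][n]) ** 2 for Ck in Cs) / K
            W[j][n] = 1.0 / (var + eps)   # eps guards zero-variance channels
    return W
```

With weights of this kind, the weighted distance above behaves like a per-frame discriminant score while the DP stage has already absorbed the time-axis warping.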
(f) Effect of the Invention
As described in detail above, the present invention allows the weight of the distance calculation to be varied on the time axis and on the frequency axis, and can therefore provide a pattern matching system that combines the pattern matching method based on dynamic programming, effective for normalizing nonlinear expansion and contraction of the time axis, with the advantages of the discriminant function method, effective against stochastic variation of parameters. It can thus cope effectively with pattern variation, and at the same time the number of standard patterns can be effectively reduced, which is a considerable effect.
Fig. 1 is a block diagram illustrating one embodiment of the present invention; Fig. 2 is a diagram illustrating the selection of the matching path; Fig. 3 is a flowchart for obtaining the new matching pattern. 1, 3, 6 and 8 are memories; 2 is a DP matching section; 5 is a conversion section; and 7 is a computation section.
Agent: Patent Attorney Koshiro Matsuoka
Claims (1)
A pattern matching system characterized by comprising: means for matching an input time-series pattern against a standard time-series pattern by dynamic programming; and means for creating, based on the result of said matching means, a new time-series pattern for matching from the input time-series pattern; wherein pattern matching is performed by carrying out a distance calculation between said new time-series pattern for matching and the standard pattern using weights that differ on the time axis and on the frequency axis.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP57092824A JPS58209794A (en) | 1982-05-31 | 1982-05-31 | Pattern matching system |
Applications Claiming Priority (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
JP57092824A JPS58209794A (en) | 1982-05-31 | 1982-05-31 | Pattern matching system |
Publications (2)
Publication Number | Publication Date |
---|---|
JPS58209794A true JPS58209794A (en) | 1983-12-06 |
JPH0361955B2 JPH0361955B2 (en) | 1991-09-24 |
Family
ID=14065176
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
JP57092824A Granted JPS58209794A (en) | 1982-05-31 | 1982-05-31 | Pattern matching system |
Country Status (1)
Country | Link |
---|---|
JP (1) | JPS58209794A (en) |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007279742A (en) * | 2006-04-06 | 2007-10-25 | Toshiba Corp | Speaker authentication recognition method and device |
Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5525091A (en) * | 1978-08-14 | 1980-02-22 | Nippon Electric Co | Voice characteristic pattern comparator |
- 1982-05-31: JP application JP57092824A granted as JPS58209794A (active)
Patent Citations (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JPS5525091A (en) * | 1978-08-14 | 1980-02-22 | Nippon Electric Co | Voice characteristic pattern comparator |
Cited By (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
JP2007279742A (en) * | 2006-04-06 | 2007-10-25 | Toshiba Corp | Speaker authentication recognition method and device |
Also Published As
Publication number | Publication date |
---|---|
JPH0361955B2 (en) | 1991-09-24 |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
Nishimura et al. | Singing Voice Synthesis Based on Deep Neural Networks. | |
Zen et al. | Statistical parametric speech synthesis using deep neural networks | |
Zhang et al. | Transfer learning from speech synthesis to voice conversion with non-parallel training data | |
US20220013106A1 (en) | Multi-speaker neural text-to-speech synthesis | |
JP3114975B2 (en) | Speech recognition circuit using phoneme estimation | |
CN110570876B (en) | Singing voice synthesizing method, singing voice synthesizing device, computer equipment and storage medium | |
Nakamura et al. | Singing voice synthesis based on convolutional neural networks | |
CN112489629A (en) | Voice transcription model, method, medium, and electronic device | |
CN113539232A (en) | Muslim class voice data set-based voice synthesis method | |
Li et al. | Styletts-vc: One-shot voice conversion by knowledge transfer from style-based tts models | |
KR20190016889A (en) | Method of text to speech and system of the same | |
CN111599339A (en) | Speech splicing synthesis method, system, device and medium with high naturalness | |
Gao et al. | Personalized Singing Voice Generation Using WaveRNN. | |
JP3311460B2 (en) | Voice recognition device | |
Chen et al. | The USTC System for Voice Conversion Challenge 2016: Neural Network Based Approaches for Spectrum, Aperiodicity and F0 Conversion. | |
JP2898568B2 (en) | Voice conversion speech synthesizer | |
Liu et al. | Controllable accented text-to-speech synthesis | |
JPS58209794A (en) | Pattern matching system | |
JP3102195B2 (en) | Voice recognition device | |
Chandra et al. | Towards the development of accent conversion model for (l1) bengali speaker using cycle consistent adversarial network (cyclegan) | |
Bhatia et al. | Speaker accent recognition by MFCC using K-nearest neighbour algorithm: a different approach | |
Al-Radhi et al. | Nonparallel expressive tts for unseen target speaker using style-controlled adaptive layer and optimized pitch embedding | |
JP2886474B2 (en) | Rule speech synthesizer | |
Sung et al. | Factored maximum likelihood kernelized regression for HMM-based singing voice synthesis. | |
JP3438293B2 (en) | Automatic Word Template Creation Method for Speech Recognition |