JPS58219600A

JPS58219600A - Speaker verification method

Info

Publication number: JPS58219600A
Application number: JP57103525A
Authority: JP
Inventors: 滝波　孝治; 丹一安藤
Original assignee: Tateisi Electronics Co; Omron Tateisi Electronics Co
Current assignee: Omron Corp
Priority date: 1982-06-15
Filing date: 1982-06-15
Publication date: 1983-12-21
Also published as: JPH0337198B2

Abstract

(57) [Abstract] This publication contains application data from before electronic filing, so no abstract data is recorded.

Description

DETAILED DESCRIPTION OF THE INVENTION

(a) Technical field of the invention

This invention relates to a verification method for improving the verification rate in a speaker verification system that performs speaker verification automatically.

(b) Prior art and its drawbacks

In a speaker verification system, a weighted distance is generally used to compute the distance between an individual's standard vector and an unknown sample vector. The covariance matrix of the feature vectors is generally used to compute this weighted distance, but for the covariance matrix to be positive definite, the number of training samples must exceed the number of dimensions of the feature vector. Because feature vectors generally have several tens of dimensions, it is difficult in a practical speaker verification system to collect enough training samples at registration time to satisfy this condition. The conventional countermeasure has been to determine in advance, by experiment, the average of the covariance matrices of feature vectors obtained from many people under identical conditions. However, the covariance matrix of a particular individual can differ greatly from this average covariance matrix, which lowers the verification rate.
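The sample-count condition above can be checked on a toy case. The following is a minimal sketch, not part of the patent: it assumes the sample covariance with divisor N, and the helper names `sample_covariance` and `determinant` and the toy data are hypothetical. With three samples in three dimensions the covariance matrix is singular; a fourth, affinely independent sample makes it positive definite.

```python
from fractions import Fraction


def sample_covariance(samples):
    """Sample covariance matrix (divisor N) of equal-length vectors, in exact arithmetic."""
    count = len(samples)
    dim = len(samples[0])
    mean = [sum(Fraction(s[k]) for s in samples) / count for k in range(dim)]
    return [
        [
            sum((Fraction(s[i]) - mean[i]) * (Fraction(s[j]) - mean[j]) for s in samples) / count
            for j in range(dim)
        ]
        for i in range(dim)
    ]


def determinant(matrix):
    """Exact determinant by Gaussian elimination over Fractions."""
    a = [row[:] for row in matrix]
    size = len(a)
    det = Fraction(1)
    for col in range(size):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, size) if a[r][col] != 0), None)
        if pivot is None:
            return Fraction(0)  # no pivot available: the matrix is singular
        if pivot != col:
            a[col], a[pivot] = a[pivot], a[col]
            det = -det  # a row swap flips the sign of the determinant
        det *= a[col][col]
        for r in range(col + 1, size):
            factor = a[r][col] / a[col][col]
            for c in range(col, size):
                a[r][c] -= factor * a[col][c]
    return det


# Toy data (assumed for illustration): 3-dimensional feature vectors.
three_samples = [[1, 0, 2], [0, 1, 1], [2, 2, 0]]
four_samples = three_samples + [[1, 3, 3]]

print(determinant(sample_covariance(three_samples)))  # zero: not positive definite
print(determinant(sample_covariance(four_samples)))   # positive: usable as a weight
```

Exact `Fraction` arithmetic is used so that the singular case yields exactly zero instead of a small rounding residue.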

(c) Object of the invention

The object of this invention is to provide a speaker verification method that can perform speaker verification with a high verification rate.

(d) Structure and effects of the invention

In summary, this invention stores the verified feature vector each time verification is performed. While the number of stored feature vectors does not exceed the number of dimensions of the feature vector, speaker verification is performed using a predetermined covariance matrix. When the number of stored feature vectors reaches the number of dimensions, speaker verification is performed using the covariance matrix obtained from the stored feature vectors. In each verification after the number of stored feature vectors has reached the number of dimensions, the oldest stored feature vector is erased, the newly verified feature vector is stored, and the covariance matrix obtained from the feature vectors stored at that point is used for the next speaker verification.
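The selection rule in this summary can be sketched as follows. This is an illustrative reading, not the patent's implementation: the function names `covariance` and `select_covariance`, the parameter `required_count` (the stored-vector count at which the speaker's own matrix takes over), and the divisor-N covariance are all assumptions.

```python
def covariance(vectors):
    """Sample covariance matrix (divisor N, an assumption) of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / count for k in range(dim)]
    return [
        [sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / count for j in range(dim)]
        for i in range(dim)
    ]


def select_covariance(stored_vectors, required_count, default_covariance):
    """Choose the weighting matrix for the next verification.

    Until enough verified vectors have been stored, fall back to the
    predetermined (population-average) matrix; afterwards use the matrix
    computed from the speaker's own stored vectors.
    """
    if len(stored_vectors) < required_count:
        return default_covariance
    return covariance(stored_vectors)


# Toy usage with 2-dimensional vectors and a hypothetical required count of 4.
default = [[1.0, 0.0], [0.0, 1.0]]
print(select_covariance([[1.0, 2.0]], 4, default))                        # still the default
print(select_covariance([[0, 0], [4, 0], [0, 2], [4, 2]], 4, default))    # speaker-specific
```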

With the speaker verification method of this invention, the oldest feature vector is erased and the newly verified feature vector is added at each verification, so the covariance matrix can be updated from the speaker's own feature vectors. This has the advantage that speaker verification can be performed with a higher verification rate.

(e) Description of the embodiment

FIG. 1 is a block diagram showing a speaker verification system for carrying out speaker verification according to this invention.

This speaker verification system uses a personal verification card (hereinafter, iD card) that stores feature vectors. The iD card may carry any element that can be written and erased electrically, magnetically, or optically, and preferably has a large storage capacity.

The speaker verification system consists of a computer 4 that includes an arithmetic unit for calculating the weighted distance; a reader/writer 6 that reads feature vectors from and writes them to the iD card; a keyboard 5 for entering into the computer 4 a verification or registration instruction, the registration number of the speaker (denoted by the symbol r), and the like; and a microphone 1, a low-pass filter 2, and an A/D converter 3 for inputting the speaker's voice into the computer 4.

As shown in FIG. 2(A), the memory of the iD card has storage areas (300, 301, ...) for storing the feature vectors θri (1 ≤ i ≤ n, where n is a number at least larger than the number of dimensions of the feature vector), a storage area 400 for storing the covariance matrix Vr, described later, calculated by the computer 4, and a storage area 410 for storing the iD registration number of speaker r and the like.

The memory included in the computer 4 also contains, as shown in FIG. 2(B), a storage area 200 for feature vectors, a storage area 220 that stores the threshold R and the number n of feature vectors that can be written to the iD card, and a storage area 230 for the speech program and the system control program.

Next, the operation is described.

FIG. 3 is a flowchart showing the verification process in this speaker verification system.

First, the speaker inserts the iD card into the reader/writer 6 and uses the keyboard 5 to enter the speaker's registration number and whether registration or verification is requested. Next, in step n1 (hereinafter, step ni is written simply as ni), the speaker utters one or more predetermined words into the microphone 1. The speech input to the microphone 1 is digitized through the low-pass filter 2 and the A/D converter 3 and taken into the computer 4. The computer 4 analyzes this speech input and calculates its feature vector θri (n2). Any one of the generally well-known analysis methods may be used to analyze the speech data.

When registration has been requested, the calculated feature vector is written to storage area 300 of the iD card (n4). Registration of feature vectors on the iD card is complete once this has been repeated m times (n5, n17). Normally, the registration operation is repeated m times at intervals of a suitable number of days. Here, m is at least 1 and smaller than n, the number of feature vectors that can be written to the iD card.

When verification has been requested, the process moves to step n6 and the steps that follow.

As at registration, speaker r inserts the iD card into the reader/writer 6, notifies the system of the verification request, and then utters the same words as at registration into the microphone 1 (n1 to n3). The computer 4 calculates the feature vector Zu of the unknown sample from the speech input to the microphone 1 and stores it in storage area 210. It then reads the l feature vectors θrj (1 ≤ j ≤ l) from the iD card and temporarily stores them in storage area 200 (n6). It further calculates the standard vector θr, given by the following formula, and stores it in storage area 215 (n7).

[Formula not reproduced in the source text.]
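One plausible form of the standard vector, stated here only as an assumption and not taken from the patent, is the mean of the l feature vectors read from the iD card:

```latex
\theta_r = \frac{1}{l} \sum_{j=1}^{l} \theta_{rj}
```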

From the standard vector θr obtained in this way and the feature vector Zu of the unknown sample, the weighted distance S(θr, Zu) is calculated (n8). An example of the weighted distance is given by the following formula.

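$$S(\theta_r, Z_u) = |V_r|^{(\cdots)} \, (Z_u - \theta_r)^{\mathsf{T}} \, V_r^{-1} \, (Z_u - \theta_r)$$

[The exponent on |Vr| is illegible in the source text.]

Here, Vr is the covariance matrix used for weighting and is given by the following formula.

[Formula not reproduced in the source text.]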
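A common definition that would fit the description, given here as an assumption only, is the sample covariance of the n stored feature vectors about the standard vector θr. Whether the original divides by n or by n - 1 cannot be recovered from this text.

```latex
V_r = \frac{1}{n} \sum_{j=1}^{n} (\theta_{rj} - \theta_r)(\theta_{rj} - \theta_r)^{\mathsf{T}}
```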

Until the number of stored feature vectors reaches n, Vr is not obtained from the above formula; instead, the weighted distance S(θr, Zu) is calculated using the average covariance matrix of many people, determined in advance by experiment. Next, at n9, the weighted distance S(θr, Zu) is compared with the predetermined threshold R, and the speaker is judged to be the genuine person when S is smaller than R.
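Steps n8 and n9 can be sketched as follows, assuming the weighted distance is the covariance-weighted quadratic form (Zu - θr)^T Vr^-1 (Zu - θr); any additional scaling factor in the patent's formula is not modeled. The names `solve`, `weighted_distance`, and `is_genuine` and the toy values are hypothetical.

```python
def solve(matrix, rhs):
    """Solve matrix * x = rhs by Gaussian elimination with partial pivoting."""
    size = len(rhs)
    aug = [list(map(float, matrix[i])) + [float(rhs[i])] for i in range(size)]
    for col in range(size):
        pivot = max(range(col, size), key=lambda r: abs(aug[r][col]))
        if abs(aug[pivot][col]) < 1e-12:
            raise ValueError("covariance matrix is singular")
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, size):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, size + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution on the upper-triangular system.
    x = [0.0] * size
    for i in reversed(range(size)):
        tail = sum(aug[i][j] * x[j] for j in range(i + 1, size))
        x[i] = (aug[i][size] - tail) / aug[i][i]
    return x


def weighted_distance(standard, unknown, cov):
    """Quadratic form diff^T * cov^-1 * diff, computed without forming the inverse."""
    diff = [u - s for u, s in zip(unknown, standard)]
    weighted = solve(cov, diff)
    return sum(d * w for d, w in zip(diff, weighted))


def is_genuine(standard, unknown, cov, threshold):
    """Step n9: accept the speaker when the distance is below the threshold."""
    return weighted_distance(standard, unknown, cov) < threshold


# Toy values (assumed): 2-dimensional vectors and a diagonal covariance matrix.
cov = [[4.0, 0.0], [0.0, 1.0]]
print(weighted_distance([0.0, 0.0], [2.0, 1.0], cov))
print(is_genuine([0.0, 0.0], [2.0, 1.0], cov, threshold=3.0))
```

Solving the linear system instead of inverting the matrix also surfaces the singular case explicitly, which is the situation the fallback to the average covariance matrix is meant to avoid.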

When the weighted distance S is larger than R, on the other hand, the speaker is prompted to input speech again or is instructed to contact the system administrator (n13).

When the speaker is judged to be the genuine person, n10 determines whether the number l of feature vectors on the iD card is smaller than n. When l is smaller than n, that is, when there is still room to write a feature vector on the iD card, the reader/writer 6 writes the current feature vector Zu to the iD card as θr,l+1 (n11).

The system then executes the requested service at n12.

When l is equal to n, the other feature vectors are each shifted forward by one position and the current feature vector Zu is written to the area of the iD card where θrn had been stored (n14). At n14, therefore, the oldest feature vector is erased from the iD card. Next, the covariance matrix Vr' specific to speaker r is calculated from the n feature vectors, including the current one (n15), and the old Vr in storage area 400 is erased and replaced with Vr' (n16). The process then proceeds to execution of the requested service at n12.
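The card update in steps n10, n11, and n14 to n16 can be sketched as follows. The card is modeled as a plain dictionary, which is an assumption made for illustration: `"vectors"` stands in for storage areas 300, 301, ... and `"covariance"` for storage area 400. The function names and the divisor-N covariance are likewise hypothetical.

```python
def covariance(vectors):
    """Sample covariance matrix (divisor N, an assumption) of equal-length vectors."""
    count = len(vectors)
    dim = len(vectors[0])
    mean = [sum(v[k] for v in vectors) / count for k in range(dim)]
    return [
        [sum((v[i] - mean[i]) * (v[j] - mean[j]) for v in vectors) / count for j in range(dim)]
        for i in range(dim)
    ]


def update_card(card, verified_vector, capacity):
    """Store a verified feature vector on the card after a successful verification."""
    vectors = card["vectors"]
    if len(vectors) < capacity:
        # n10 -> n11: a free slot remains, so append as the next vector.
        vectors.append(list(verified_vector))
        return card
    # n14: shift the others forward by one (dropping the oldest) and write the newest last.
    vectors.pop(0)
    vectors.append(list(verified_vector))
    # n15, n16: recompute the speaker-specific matrix and overwrite the stored one.
    card["covariance"] = covariance(vectors)
    return card


# Toy usage: 2-dimensional vectors, hypothetical capacity n = 4.
card = {"vectors": [[0, 0], [4, 0], [0, 2]], "covariance": None}
update_card(card, [4, 2], capacity=4)   # fills the last free slot
update_card(card, [8, 2], capacity=4)   # card is full: oldest vector [0, 0] is dropped
print(card["vectors"])
print(card["covariance"])
```

Keeping the vectors in arrival order means the oldest entry is always at index 0, which mirrors the shift-by-one write described for n14.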

As described above, once the number of verifications reaches or exceeds the number of dimensions of the feature vector, the elements of the feature vector set are updated at every verification, and the covariance matrix is therefore also updated at every verification. As the number of verifications increases, the covariance matrix approaches the speaker's own, so speaker verification can be performed with a high verification rate.

[Brief description of the drawings]

FIG. 1 is a block diagram showing a speaker verification system for carrying out speaker verification according to this invention; FIG. 2(A) shows the storage areas of the memory included in the iD card used in the system; FIG. 2(B) shows the storage areas of the memory included in the computer 4 of the system; and FIG. 3 is a flowchart showing the verification process in the system.

1: microphone; 4: computer; 6: reader/writer.

Applicant: Tateisi Electronics Co. Agent: Patent Attorney Hisao Komori

Claims

[Claims]

(1) A speaker verification method in which speaker verification is performed using a distance, weighted by a covariance matrix, between a standard vector determined from the speaker's feature vectors and the feature vector of an unknown sample, characterized in that: the verified feature vector is stored each time verification is performed; while the number of stored feature vectors does not exceed the number of dimensions of the feature vector, speaker verification is performed using a predetermined covariance matrix; when the number of stored feature vectors reaches the number of dimensions, speaker verification is performed using the covariance matrix obtained from the stored feature vectors; and in each verification after the number of stored feature vectors has reached the number of dimensions, the oldest stored feature vector is erased, the verified feature vector is stored, and the next speaker verification is performed using the covariance matrix obtained from the feature vectors stored at that time.