KR100560425B1

KR100560425B1 - Apparatus for registrating and identifying voice and method thereof

Info

Publication number: KR100560425B1
Application number: KR1020030084213A
Authority: KR
Inventors: 최우용; 이경희; 반성범
Original assignee: 한국전자통신연구원 (Electronics and Telecommunications Research Institute, ETRI)
Priority date: 2003-11-25
Filing date: 2003-11-25
Publication date: 2006-03-13
Also published as: KR20050050466A

Abstract

A speaker registration and authentication system using an SVM (Support Vector Machine), and a method thereof, according to the present invention comprise: a voice input unit that receives the voice of a user to be registered; a preprocessing unit that performs predetermined preprocessing on the input voice and then extracts at least one predetermined voice feature; and a control unit that maps each extracted voice feature to a chromosome to form a feature set, applies a genetic algorithm to the feature set to obtain an optimal feature set consisting of the optimal chromosome, and builds a speaker model by generating an SVM with the optimal feature set as its input vector. During speaker registration, a feature set with high discriminating power is selected for each individual; during speaker authentication, only the feature set selected in the learning process at registration is used. This reduces the memory consumed by unnecessary information and the amount of computation needed for speaker authentication, so that identity authentication through speaker authentication becomes possible even in resource-constrained environments such as a USB token or a smart card.

SVM, speaker authentication, genetic algorithm

Description

Speaker registration and authentication system using SVM and method thereof {Apparatus for registrating and identifying voice and method thereof}

FIG. 1 is a block diagram showing the configuration of a speaker registration and authentication system using an SVM according to the present invention.

FIG. 2 is a flowchart showing the speaker registration method according to the present invention.

FIG. 3 is a detailed flowchart of the process that measures the fitness of each chromosome in the speaker registration method according to the present invention.

FIG. 4 is a flowchart showing the speaker authentication method according to the present invention.

The present invention relates to a system and method for user registration and authentication using biometric information. More particularly, it relates to a system and method that, in a resource-constrained environment, select a feature set with high discriminating power for each speaker during speaker registration using voice information, and perform speaker authentication using only the feature set selected in that learning process.

An identity authentication system is generally a system introduced to admit only pre-registered persons to a specific area or building that requires security. Smart card systems, which record a user's personal information and a password for access to a specific building on a smart card and grant access only to registered users whose personal information and password match, are widely used as general identity authentication systems. However, such smart card systems are relatively easy for others to steal, forge, or tamper with.

Accordingly, biometric recognition technology, which uses biometric information unique to the user (such as the retina, iris, fingerprint, signature, face, or voice) for identity authentication, has recently attracted great attention as a new kind of identity authentication system because of its excellent security. As security has become a prominent issue throughout society, its use is growing rapidly despite high installation costs, driven by users who want to build more reliable security systems.

Conventional biometric authentication systems that use the fingerprint, retina, iris, and the like give users the impression that their biometric information is being leaked to the outside, and therefore meet strong user resistance. Using voice has the advantage of greatly reducing that resistance.

Representative approaches in prior research on speaker authentication are the HMM (Hidden Markov Model) and DTW (Dynamic Time Warping). The HMM expresses the frequency characteristics of each speech frame as a statistical model. Because it computes probability values for every frame, its recognition performance is excellent, but the heavy computation makes data processing slow.

DTW is a template-matching method that compensates for the frame mismatch caused by differences in speaking rate by applying a piecewise-linear transformation along the time axis. Because DTW stores the feature vectors of the registered voice without a separate training process, registration is fast, but authentication is slow and memory usage is high. These methods are therefore difficult to use in environments where memory and computation are limited.
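To make the time-axis alignment described above concrete, the following is a minimal sketch of the textbook DTW recurrence on one-dimensional sequences. It is not taken from the patent; the function name `dtw_distance` and the absolute-difference local cost are assumptions chosen for illustration.

```python
def dtw_distance(a, b):
    """Textbook dynamic-time-warping distance between two 1-D sequences."""
    inf = float("inf")
    n, m = len(a), len(b)
    # cost[i][j] is the best accumulated cost of aligning a[:i] with b[:j].
    cost = [[inf] * (m + 1) for _ in range(n + 1)]
    cost[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = abs(a[i - 1] - b[j - 1])
            # A frame may be matched, repeated, or skipped along the time axis.
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[n][m]


# The same contour spoken more slowly (one frame repeated) aligns at zero cost.
slow_vs_fast = dtw_distance([1, 2, 3], [1, 2, 2, 3])
```

The full `(n + 1) x (m + 1)` cost table and the stored template are what make this approach expensive in memory and time on a constrained device, which is the drawback the passage points out.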

A technical object of the present invention is to provide a system and method for registering a user by selecting a feature set with high discriminating power using a genetic algorithm and an SVM.

Another technical object of the present invention is to provide a system and method for performing authentication using an SVM and the feature set selected at speaker registration.

To achieve the above technical object, a speaker registration system using an SVM according to the present invention comprises: a voice input unit that receives the voice of a user to be registered; a preprocessing unit that performs predetermined preprocessing on the input voice and then extracts at least one predetermined voice feature; and a control unit that maps each extracted voice feature to a chromosome to form a feature set, applies a genetic algorithm to the feature set to obtain an optimal feature set consisting of the optimal chromosome, and builds a speaker model by generating an SVM (Support Vector Machine) with the optimal feature set as its input vector.

To achieve the above technical object, a speaker registration method using an SVM according to the present invention comprises the steps of: receiving the voice of a user to be registered; performing predetermined preprocessing on the input voice and then extracting at least one predetermined voice feature; mapping each extracted voice feature to a chromosome to form a feature set; and applying a genetic algorithm to the feature set to obtain an optimal feature set consisting of the optimal chromosome, and building a speaker model that represents the user by generating an SVM (Support Vector Machine) with the optimal feature set as its input vector.

To achieve the other technical object, a speaker authentication system using an SVM according to the present invention holds, for each registered user, a speaker model consisting of SVs (Support Vectors), weight information, and optimal feature values, and comprises: a voice input unit that receives the voice of a user requesting authentication; a preprocessing unit that performs predetermined preprocessing on the input voice and then extracts at least one predetermined voice feature; a personal information input unit that receives the personal information of the user requesting authentication; and a control unit that performs speaker authentication based on the SVs, weight information, and optimal feature values in the speaker model corresponding to the personal information.

To achieve the other technical object, a speaker authentication method using an SVM according to the present invention is based on a speaker model that consists, for each registered user, of SVs (Support Vectors), weight information, and optimal feature values, and comprises the steps of: receiving the voice of a user requesting authentication and extracting predetermined voice features after predetermined preprocessing; receiving the personal information of the user requesting authentication; reading the SVs, weight information, and optimal feature values from the speaker model based on the personal information; and generating an authentication SVM from the SVs and weight information, and then performing speaker authentication using the extracted voice features that correspond to the optimal feature values as the input vector of the authentication SVM.

Hereinafter, a preferred embodiment of the present invention is described in detail with reference to the accompanying drawings. FIG. 1 is a block diagram showing the configuration of a speaker registration and authentication system using an SVM (Support Vector Machine) according to the present invention. The operation of the system is first outlined with reference to FIG. 1 and then described in detail with reference to FIGS. 2 to 4. The speaker registration block is considered first. The speaker registration unit 100 is the functional block that registers a user's own voice in response to the user's registration request, and consists of a registration voice input unit 101, a first preprocessing unit 102, a first control unit 103, an SVM classifier 104, and a speaker model 105.

In the operation of the speaker registration unit 100, the first preprocessing unit 102 performs endpoint detection and preprocessing on the user's registration voice received by the registration voice input unit 101 and passes the result to the first control unit 103. The first control unit 103 extracts voice features from the preprocessed voice, maps the extracted voice features to chromosomes to form feature sets represented as chromosomes, and then measures the fitness of each chromosome through the SVM classifier 104.

The first control unit 103 uses a genetic algorithm to evolve the chromosome values through crossover, mutation, and the like, and finds the chromosome that best represents each person's voice characteristics. It then generates an SVM (Support Vector Machine) with that chromosome's features as the input vector, thereby creating and storing a speaker model 105 for each user. The genetic algorithm itself is well understood by anyone of ordinary skill in the art to which the present invention belongs, so its description is omitted.
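The patent leaves the genetic operators unspecified. As a hedged illustration only, the sketch below assumes a chromosome is a list of bits, one per candidate voice feature (1 means the feature is selected), and uses single-point crossover and independent bit-flip mutation. The names `crossover` and `mutate` and these operator choices are assumptions, not the patent's.

```python
import random


def crossover(parent_a, parent_b, rng):
    """Single-point crossover of two equal-length bit lists."""
    point = rng.randint(1, len(parent_a) - 1)
    return parent_a[:point] + parent_b[point:], parent_b[:point] + parent_a[point:]


def mutate(chromosome, rate, rng):
    """Flip each bit independently with probability `rate`."""
    return [bit ^ 1 if rng.random() < rate else bit for bit in chromosome]


rng = random.Random(0)               # seeded so the run is repeatable
all_features = [1, 1, 1, 1, 1, 1]    # chromosome selecting every feature
no_features = [0, 0, 0, 0, 0, 0]     # chromosome selecting none
child_a, child_b = crossover(all_features, no_features, rng)
```

With these two parents, each bit position ends up set in exactly one of the two children, which shows that crossover only recombines existing selections while mutation is what introduces new ones.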

The speaker authentication block is considered next. The speaker authentication unit 110 is the block that decides whether to authenticate a user in response to the user's authentication request, and consists of an authentication voice input unit 111, a personal information input unit 114, a second preprocessing unit 112, a second control unit 113, the SVM classifier 104, and the speaker model 105.

In the operation of the speaker authentication unit 110, the user's authentication voice and personal information are received from the authentication voice input unit 111 and the personal information input unit 114, respectively. The second preprocessing unit 112 detects the endpoints in the voice of the user requesting authentication, obtained by the authentication voice input unit 111, performs preprocessing, and passes the result to the second control unit 113. Using the personal identification information that the user requesting identity authentication enters through the personal information input unit 114, the second control unit 113 finds the corresponding user among those registered in the authentication system and, for that user's speaker authentication, reads the SV values, the weight information, and the feature selection values with high individual discriminating power stored in the speaker model 105.

The second control unit 113 then generates an authentication SVM from the SV values and the weight information, uses the feature selection values to extract the feature values that belong to the highly discriminating feature set stored by the speaker registration system, and performs speaker authentication by feeding the extracted values as the input vector to the authentication SVM through the SVM classifier 104.

The personal information input unit 114 may be a user information input interface with a number of numeric keys, so that the user requesting speaker authentication can enter personal identification information as digits, or it may be implemented as an input interface for a USB security token to which a key is assigned per user. The personal information input unit 114 may also be an identification card reader that recognizes the identification (ID) cards now widely used for identity authentication. Even where resources are limited, as with an IC card such as a smart card used as the ID card, the per-user SV values, weight information, and highly discriminating feature selection values needed to build the SVM classifier 104 for speaker authentication can be stored on the card in advance so that speaker authentication is performed on the IC card itself, which further improves security.
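As a rough illustration of how compactly the three stored items (SV values, weights, feature selection values) could be laid out for a card, the sketch below packs them into a byte string with Python's `struct` module. The layout (a three-field header, one byte per feature flag, 32-bit floats) and the names `pack_model` and `unpack_model` are assumptions made for this example; the patent does not define a storage format.

```python
import struct


def pack_model(mask, support_vectors, weights):
    """Serialise a speaker model: header, feature flags, then (weight, SV) rows."""
    dim = sum(mask)  # number of selected features
    header = struct.pack("<HHH", len(mask), len(support_vectors), dim)
    rows = b"".join(
        struct.pack(f"<{dim + 1}f", w, *sv) for sv, w in zip(support_vectors, weights)
    )
    return header + bytes(mask) + rows


def unpack_model(blob):
    """Inverse of pack_model."""
    n_mask, n_sv, dim = struct.unpack_from("<HHH", blob, 0)
    offset = 6
    mask = list(blob[offset:offset + n_mask])
    offset += n_mask
    support_vectors, weights = [], []
    for _ in range(n_sv):
        w, *sv = struct.unpack_from(f"<{dim + 1}f", blob, offset)
        weights.append(w)
        support_vectors.append(list(sv))
        offset += 4 * (dim + 1)
    return mask, support_vectors, weights


# Hypothetical model: 4 candidate features, 3 selected, 2 support vectors.
blob = pack_model([1, 0, 1, 1], [[0.5, 1.0, -2.0], [0.25, 0.0, 4.0]], [1.5, -1.5])
```

In this layout only the selected dimensions of each support vector are stored, so the size of the stored model shrinks with the number of features the registration step keeps.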

The speaker registration and authentication system using an SVM outlined above is described in more detail below.

FIG. 2 shows the flow of operations in which a speaker registration system having an SVM classifier according to an embodiment of the present invention generates an SVM using the selected feature values of the person being registered as the input vector, stores the SV values and weights that represent the generated SVM, and also stores the selected feature information. An embodiment of the present invention is described in detail below with reference to FIGS. 1 and 2.

First, to generate per-user feature values for the users to be registered in the speaker registration system, the first control unit 103 controls the registration voice input unit 101 so that it receives the voice of the user to be registered (step S200). The first preprocessing unit 102 then detects the start point and end point in the user's input voice output by the registration voice input unit 101 and extracts the voice with the silent sections removed (step S201). The first preprocessing unit 102 next applies pre-emphasis filtering to the extracted voice, emphasizing the high-frequency region of the voice signal, as preprocessing for voice feature extraction (step S202).
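Steps S201 and S202 can be sketched as follows. The patent does not say how endpoints are detected or which filter is used, so this example assumes a simple frame-energy threshold for endpoint detection and the common first-order pre-emphasis filter y[n] = x[n] - alpha * x[n-1]. The function names, the frame length, the threshold, and the default alpha = 0.97 are all illustrative assumptions.

```python
def trim_silence(samples, frame_len, threshold):
    """Keep the span from the first to the last frame whose mean energy exceeds threshold."""
    frames = [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]
    active = [k for k, f in enumerate(frames) if sum(s * s for s in f) / len(f) > threshold]
    if not active:
        return []  # nothing but silence
    return samples[active[0] * frame_len:(active[-1] + 1) * frame_len]


def pre_emphasis(samples, alpha=0.97):
    """First-order high-pass filter: y[n] = x[n] - alpha * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - alpha * samples[n - 1] for n in range(1, len(samples))]


# Toy signal: silence, a short burst of "speech", silence.
signal = [0.0] * 4 + [1.0, -1.0, 1.0, -1.0] + [0.0] * 4
voiced = trim_silence(signal, frame_len=4, threshold=0.1)
emphasised = pre_emphasis(voiced, alpha=0.5)
```

A slowly varying signal is attenuated by the filter while a rapidly alternating one is amplified, which is the high-frequency emphasis that step S202 refers to.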

Next, in steps S203 to S208, the first control unit 103 uses an evolutionary process based on a genetic algorithm to select, from each individual's voice feature information, only the features with high discriminating power for that individual, and thereby computes an SVM feature vector whose dimension is specific to the individual. The process is as follows. The first control unit 103 first divides the voice of the user to be registered and the voices of other people registered in the system into training data and tuning data, and computes in advance, for both the training data and the tuning data used in the learning process, the values of every entry in the full feature list. In other words, for the user's voice and the other people's voices used as training and tuning data, the first control unit 103 precomputes and stores all the feature values that any chromosome may refer to. This prevents feature values from being recomputed repeatedly during the feature-set selection by the genetic algorithm described next.
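The precomputation described above amounts to building one full feature table per data set and then letting each chromosome pick columns from it. The sketch below assumes a chromosome is a bit list with one bit per feature; the toy extractors (`min`, `max`, `len`) merely stand in for real voice features, and the function names are hypothetical.

```python
def precompute_features(utterances, extractors):
    """Evaluate every extractor once per utterance (rows: utterances, columns: features)."""
    return [[extract(u) for extract in extractors] for u in utterances]


def project(feature_table, chromosome):
    """Keep only the columns whose chromosome bit is 1; no feature is recomputed."""
    return [[v for v, bit in zip(row, chromosome) if bit] for row in feature_table]


# Toy stand-ins: two "utterances" and three cheap "features".
utterances = [[1.0, 3.0], [2.0, 6.0, 4.0]]
extractors = [min, max, len]
table = precompute_features(utterances, extractors)   # computed once
subset = project(table, [1, 0, 1])                     # reused for every chromosome
```

Because `project` only indexes into the stored table, evaluating many chromosomes over many generations costs no additional feature extraction.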

To select, by means of the genetic algorithm, the optimal feature set that best represents the voice characteristics, the first control unit 103 then extracts the feature values for each chromosome from the precomputed per-person voice feature values and forms the SVM input vector (step S204). It measures the fitness of the chromosome that represents each selected feature set and evolves the chromosomes through crossover and mutation until the fitness or the number of generations reaches a set value (step S205). The details of obtaining the fitness in step S205 are described with reference to FIG. 3.

FIG. 3 shows the subroutine in which the first control unit 103 measures the fitness of a chromosome. Referring to FIG. 3, the first control unit 103 extracts from the feature values of the training data the values that correspond to the features selected by the chromosome, forms them into input vectors, and generates an SVM (step S300). It then extracts from the feature values of the tuning data the values that correspond to the same selected features, feeds them to the generated SVM, evaluates the SVM's separating ability from the number of successful authentications, and uses that result as the fitness (step S301).
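The fitness subroutine of steps S300 and S301 can be sketched as "train on the training split, count correct decisions on the tuning split". The patent does not specify the SVM type or its training procedure, so the example substitutes a small linear SVM trained by full-batch subgradient descent on the regularised hinge loss. That trainer, its hyperparameters, the labels (+1 for the enrolling user, -1 for other speakers), and the toy data are all assumptions.

```python
def project(rows, chromosome):
    """Keep only the feature columns whose chromosome bit is 1."""
    return [[v for v, bit in zip(row, chromosome) if bit] for row in rows]


def train_linear_svm(X, y, epochs=200, lr=0.1, lam=0.01):
    """Full-batch subgradient descent on the L2-regularised hinge loss."""
    w, b = [0.0] * len(X[0]), 0.0
    for _ in range(epochs):
        grad_w, grad_b = [lam * wi for wi in w], 0.0
        for xi, yi in zip(X, y):
            score = sum(wj * xj for wj, xj in zip(w, xi)) + b
            if yi * score < 1:  # sample is inside the margin or misclassified
                grad_w = [g - yi * xj / len(X) for g, xj in zip(grad_w, xi)]
                grad_b -= yi / len(X)
        w = [wi - lr * g for wi, g in zip(w, grad_w)]
        b -= lr * grad_b
    return w, b


def fitness(chromosome, train_X, train_y, tune_X, tune_y):
    """Fraction of tuning samples decided correctly (S300, then S301)."""
    if not any(chromosome):
        return 0.0  # an empty feature set cannot separate anything
    w, b = train_linear_svm(project(train_X, chromosome), train_y)
    correct = 0
    for xi, yi in zip(project(tune_X, chromosome), tune_y):
        predicted = 1 if sum(wj * xj for wj, xj in zip(w, xi)) + b > 0 else -1
        correct += predicted == yi
    return correct / len(tune_y)


# Toy data: feature 0 separates the two speakers, feature 1 does not.
train_X = [[2.0, 0.3], [1.5, -0.2], [-2.0, 0.1], [-1.5, -0.4]]
train_y = [1, 1, -1, -1]
tune_X = [[1.8, -0.5], [-1.7, 0.5]]
tune_y = [1, -1]
good = fitness([1, 0], train_X, train_y, tune_X, tune_y)
poor = fitness([0, 1], train_X, train_y, tune_X, tune_y)
```

On this toy data the chromosome that keeps only the discriminating feature scores higher than the one that keeps only the uninformative feature, which is the signal the genetic algorithm uses to steer selection.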

Whether to terminate the genetic algorithm is then decided by checking whether the chromosome fitness measured by the subroutine exceeds a preset reference value or whether the evolution has reached a preset number of generations (step S206). If the reference value has been exceeded or the generation limit reached, an SVM is generated again using the selected feature values of the chromosome with the highest measured fitness as the input vector (step S207), and the SV values and weights are stored so that the generated SVM is preserved. In step S208, the feature set represented by that best chromosome is stored, which completes the speaker registration of the user.
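The loop formed by steps S204 to S206 can be sketched as below: evolve bit-mask chromosomes until the best fitness reaches a target or a generation limit is hit. The population size, truncation selection, elitism, mutation rate, and the use of `>=` for the fitness test are assumptions, and `toy_fitness` stands in for the SVM-based fitness of FIG. 3.

```python
import random


def evolve(fitness_fn, n_features, pop_size=8, max_generations=30,
           target_fitness=1.0, mutation_rate=0.1, seed=0):
    """Return (best chromosome, its fitness) after the GA terminates."""
    rng = random.Random(seed)
    population = [[rng.randint(0, 1) for _ in range(n_features)] for _ in range(pop_size)]
    best = max(population, key=fitness_fn)
    for _ in range(max_generations):          # generation limit (S206)
        if fitness_fn(best) >= target_fitness:
            break                              # fitness criterion met (S206)
        ranked = sorted(population, key=fitness_fn, reverse=True)
        parents = ranked[:pop_size // 2]       # keep the fitter half as parents
        children = [best]                      # elitism: never lose the best so far
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            point = rng.randint(1, n_features - 1)
            child = a[:point] + b[point:]      # single-point crossover
            children.append([bit ^ 1 if rng.random() < mutation_rate else bit
                             for bit in child])  # bit-flip mutation
        population = children
        best = max(population, key=fitness_fn)
    return best, fitness_fn(best)


# Stand-in fitness: agreement with a hypothetical "ideal" feature mask.
ideal = [1, 0, 1, 1, 0, 0]


def toy_fitness(chromosome):
    return sum(a == b for a, b in zip(chromosome, ideal)) / len(ideal)


initial_best, initial_score = evolve(toy_fitness, 6, max_generations=0)
best, score = evolve(toy_fitness, 6)
```

The returned chromosome is what step S208 would store alongside the SV values and weights of the SVM retrained on it in step S207. Because the best chromosome is always carried over, the final score can never be lower than the best score in the initial population.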

FIG. 4 shows the flow of the speaker authentication process in a speaker authentication system having an SVM classifier according to an embodiment of the present invention. An embodiment of the speaker authentication method is described in detail below with reference to FIGS. 1 and 4.

First, when any user issues a speaker authentication request, the second control unit 113 has the user's voice received through the authentication voice input unit 111. The second preprocessing unit 112 detects the start point and end point in the voice entered by the user to be authenticated and extracts the voice with the silent sections removed (step S401). The extracted voice then goes through pre-emphasis, which amplifies the high-frequency region of the voice signal, and is output to the second control unit 113 (step S402). The personal information input unit 114 also receives personal identification information from the user requesting authentication, as described above. Using the personal identification information entered through the personal information input unit 114, the second control unit 113 finds the corresponding user among those registered as members in the authentication system and reads from the speaker model 105 the SV values, weight information, and highly discriminating feature selection values already stored for that user's speaker authentication. For the input voice output by the second preprocessing unit 112, it then extracts the feature values that correspond to the feature selection values read from the speaker model 105 (step S403) and generates an authentication SVM from the SV values and weight information read from the speaker model 105 (step S404). The second control unit 113 performs speaker authentication by applying the extracted feature values as the input vector to the generated authentication SVM (step S405) and outputs the authentication result on the screen (step S406).
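Steps S403 to S405 can be sketched as a lookup followed by a kernel decision function. The patent names only the stored SV values, weight information, and feature selection values; the RBF kernel, the `gamma` and `bias` fields, the dictionary layout, and the identifier `user-001` are assumptions introduced for illustration.

```python
import math


def rbf(u, v, gamma):
    """Radial-basis-function kernel between two equal-length vectors."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def authenticate(user_id, full_features, speaker_models):
    """Return True when the claimed identity's SVM accepts the utterance."""
    model = speaker_models.get(user_id)
    if model is None:
        return False  # no enrolled model for this identity
    # S403: keep only the features selected at registration.
    x = [v for v, bit in zip(full_features, model["mask"]) if bit]
    # S404 and S405: rebuild the decision function from SVs and weights, then apply it.
    score = model["bias"] + sum(
        w * rbf(sv, x, model["gamma"])
        for sv, w in zip(model["support_vectors"], model["weights"])
    )
    return score > 0


# Hypothetical enrolled model: 4 candidate features, 2 selected, 2 support vectors.
speaker_models = {
    "user-001": {
        "mask": [1, 0, 1, 0],
        "support_vectors": [[0.0, 0.0], [2.0, 2.0]],
        "weights": [1.0, -1.0],   # signed weights, one per support vector
        "bias": 0.0,
        "gamma": 0.5,
    }
}
accepted = authenticate("user-001", [0.1, 9.0, 0.1, -9.0], speaker_models)
rejected = authenticate("user-001", [1.9, 9.0, 2.1, -9.0], speaker_models)
```

The unselected features (positions 1 and 3) never enter the computation, so they need not be extracted at authentication time at all.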

Thus, by selecting for each individual a feature set with high discriminating power, the speaker registration and authentication method using an SVM according to the present invention uses resources efficiently during authentication and makes it possible to implement a high-performance authentication system even with limited resources.

The speaker registration method and the speaker authentication method have so far been described separately, but it is evident that the two can be combined and implemented as a single speaker registration and authentication method, so that description is omitted.

The speaker registration or speaker authentication method using an SVM according to the present invention can also be implemented as computer-readable code on a computer-readable recording medium. Computer-readable recording media include every kind of recording device that stores data readable by a computer system. Examples include ROM, RAM, CD-ROM, magnetic tape, hard disks, floppy disks, flash memory, and optical data storage devices, as well as implementations in the form of a carrier wave (for example, transmission over the Internet). The computer-readable recording medium can also be distributed over computer systems connected by a computer network, so that the code is stored and executed in a distributed fashion. In addition, the font ROM data structure according to the present invention can be implemented as computer-readable code on a recording medium such as a computer-readable ROM, RAM, CD-ROM, magnetic tape, hard disk, floppy disk, flash memory, or optical data storage device.

The present invention has been described above on the basis of a preferred embodiment, but the embodiment is intended to illustrate the invention, not to limit it. It will be apparent to those skilled in the art to which the present invention belongs that various changes, modifications, or adjustments to the above embodiment are possible without departing from the technical idea of the invention. The scope of protection of the invention is therefore limited only by the appended claims and should be construed to include all such changes, modifications, and adjustments.

As described above, in the speaker registration and authentication system using an SVM according to the present invention, and in the method thereof, the speaker registration process selects for each individual a feature set with high discriminative power, and the speaker authentication process uses only the feature set selected during the learning process at registration. This reduces the memory consumed by unnecessary information and reduces the amount of computation needed for speaker authentication, which makes identity authentication through speaker authentication possible even in resource-constrained environments such as a USB token or a smart card. In addition, using highly discriminative feature information improves authentication performance, and because authentication is performed by an SVM built on the optimal feature set obtained through learning, the time required for speaker authentication is also shortened.

Claims

A voice input unit that receives the voice of a user to be registered;

a preprocessing unit that extracts the voice excluding silent sections from the input voice, emphasizes the high-frequency region of the voice from which the silence has been excluded, and then extracts at least one predetermined voice feature; and

a control unit that constructs a feature set by mapping each extracted voice feature to a chromosome, obtains the optimal feature set by applying to the feature set a chromosome evolution algorithm based on chromosome crossover and mutation, and constructs a speaker model by generating an SVM (Support Vector Machine) with the optimal feature set as its input vector; the system being a speaker registration system using an SVM, characterized by comprising the foregoing units.
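The chromosome evolution recited in the claim above can be illustrated with a minimal sketch. It assumes a binary chromosome in which each gene marks one voice feature as selected or not, one-point crossover, bit-flip mutation, and elitism. The function name `evolve_feature_mask`, the population size, the rates, and the toy fitness function are all hypothetical choices made for illustration; the claim does not specify them, and a real fitness would come from an SVM rather than from `toy_fitness`.

```python
import random

def evolve_feature_mask(n_features, fitness, pop_size=20, generations=30,
                        crossover_rate=0.8, mutation_rate=0.05, seed=0):
    """Evolve a 0/1 feature mask (chromosome) that maximizes `fitness`."""
    rng = random.Random(seed)  # seeded so the sketch is reproducible
    population = [[rng.randint(0, 1) for _ in range(n_features)]
                  for _ in range(pop_size)]
    best = max(population, key=fitness)
    for _ in range(generations):
        ranked = sorted(population, key=fitness, reverse=True)
        parents = ranked[:pop_size // 2]   # keep the fitter half as parents
        children = [ranked[0][:]]          # elitism: carry over the best mask
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            if n_features > 1 and rng.random() < crossover_rate:
                cut = rng.randint(1, n_features - 1)  # one-point crossover
                child = a[:cut] + b[cut:]
            else:
                child = a[:]
            # Bit-flip mutation, applied independently to each gene.
            child = [1 - g if rng.random() < mutation_rate else g
                     for g in child]
            children.append(child)
        population = children
        best = max(population + [best], key=fitness)
    return best

# Hypothetical stand-in fitness: features 0, 3 and 5 are "informative",
# and every selected feature carries a small cost.
INFORMATIVE = {0, 3, 5}

def toy_fitness(mask):
    return sum(mask[i] for i in INFORMATIVE) - 0.1 * sum(mask)

best_mask = evolve_feature_mask(8, toy_fitness)
```

Because the best chromosome is carried over unchanged in every generation, the best fitness never decreases from one generation to the next.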

(Deleted)

The system of claim 1, wherein the control unit

divides the output signal of the preprocessing unit into a learning voice and a tuning voice, generates the feature set for each of the learning voice and the tuning voice, generates an SVM based on the learning voice feature set, and then inputs the values of the tuning voice feature set to the SVM to obtain a fitness value.
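The learning/tuning split in the claim above amounts to scoring a candidate feature mask on data that was not used to fit the classifier. The following sketch shows that evaluation pattern only. It deliberately swaps in a nearest-centroid classifier as a stand-in for the SVM to stay dependency-free, and the function names, the label convention (+1 for the enrolling speaker, -1 for others), and the sample data are assumptions made for illustration.

```python
def select(vec, mask):
    """Keep only the features whose gene in the chromosome is 1."""
    return [v for v, keep in zip(vec, mask) if keep]

def centroid(rows):
    n = len(rows)
    return [sum(col) / n for col in zip(*rows)]

def sq_dist(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def fitness(mask, learn, tune):
    """Fit on the learning set, return accuracy on the tuning set.

    learn / tune: lists of (feature_vector, label) pairs.
    NOTE: nearest-centroid is a stand-in here, not the SVM of the claim.
    """
    if not any(mask):
        return 0.0  # an empty feature set cannot discriminate
    pos = centroid([select(x, mask) for x, y in learn if y == 1])
    neg = centroid([select(x, mask) for x, y in learn if y == -1])
    correct = 0
    for x, y in tune:
        z = select(x, mask)
        pred = 1 if sq_dist(z, pos) < sq_dist(z, neg) else -1
        correct += (pred == y)
    return correct / len(tune)

# Illustrative data: feature 0 separates the classes, feature 1 does not.
learn = [([1.0, 5.0], 1), ([1.2, -5.0], 1),
         ([-1.0, 5.0], -1), ([-1.1, -5.0], -1)]
tune = [([0.9, 0.0], 1), ([-0.8, 1.0], -1)]
```

With this data, a mask that keeps only feature 0 scores higher on the tuning set than a mask that keeps only feature 1, which is the signal a genetic search would use to prefer the former.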

A system for authenticating a speaker, having for each registered user a speaker model composed of SVs (Support Vectors), weight information, and optimal feature values, the system comprising:

a voice input unit that receives the voice of a user requiring authentication;

a preprocessing unit that extracts the voice excluding silent sections from the input voice, amplifies the high-frequency region of the voice from which the silence has been excluded, performs a predetermined preprocessing process, and then extracts at least one predetermined voice feature;

a personal information input unit that receives personal information of the user requesting authentication; and

a control unit that generates an authentication SVM based on the SVs and weight information in the speaker model corresponding to the personal information, extracts from the input voice of the user requiring authentication the feature values corresponding to the optimal feature values, and performs authentication using the extracted feature values as the input vector of the authentication SVM; the system being a speaker authentication system using an SVM, characterized by comprising the foregoing units.
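As a rough illustration of how an authentication SVM might be rebuilt from a stored speaker model, the sketch below evaluates the standard kernel SVM decision function, a bias plus a weighted sum of kernel values between each support vector and the input. The RBF kernel, the `gamma` value, the dictionary layout of `model`, the bias term, and the zero threshold are assumptions; the claim states only that the model holds SVs, weight information, and optimal feature values.

```python
import math

def rbf_kernel(a, b, gamma=0.5):
    """Gaussian (RBF) kernel; the kernel choice is an assumption."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def authenticate(model, features, threshold=0.0):
    """Accept the claimant when the SVM decision score exceeds `threshold`.

    model["feature_indices"]: positions of the features chosen at enrolment
    model["support_vectors"]: stored SVs, already restricted to those features
    model["weights"]:         one signed weight per SV (alpha_i * y_i)
    model["bias"]:            decision-function offset
    """
    # Use only the features that were selected when the speaker registered.
    x = [features[i] for i in model["feature_indices"]]
    score = model["bias"] + sum(
        w * rbf_kernel(sv, x)
        for sv, w in zip(model["support_vectors"], model["weights"]))
    return score > threshold

# Hypothetical stored model that uses features 0 and 2 of a 3-feature vector.
model = {
    "feature_indices": [0, 2],
    "support_vectors": [[1.0, 1.0], [-1.0, -1.0]],
    "weights": [1.0, -1.0],
    "bias": 0.0,
}

accepted = authenticate(model, [0.9, 7.0, 1.1])    # close to the positive SV
rejected = authenticate(model, [-1.0, 7.0, -0.8])  # close to the negative SV
```

Only the selected features and the support vectors need to be stored and evaluated, which is consistent with the reduced memory and computation that the description attributes to the selected feature set.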

(Deleted)

(a) receiving the voice of a user to be registered;

(b) extracting the voice excluding silent sections from the input voice, amplifying the high-frequency region of the voice from which the silent sections have been excluded, and then extracting at least one predetermined voice feature;

(c) constructing a feature set by mapping each extracted voice feature to a chromosome; and

(d) dividing the feature set into a predetermined learning feature set and a tuning feature set, applying a genetic algorithm to the learning feature set and the tuning feature set to generate optimal features representing the user's voice features, generating a registered-voice SVM with the optimal features as its input vector, and inputting the tuning feature set as an input vector to the registered-voice SVM to obtain a fitness value, thereby constructing a speaker model representing the user; the method being a speaker registration method using an SVM, characterized by comprising the foregoing steps.
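Step (b) of the method above, removing silence and then boosting high frequencies, can be sketched as follows. The frame-energy test for silence, the first-order pre-emphasis filter y[n] = x[n] - a·x[n-1], the coefficient 0.97, the frame length, and the energy threshold are common signal-processing choices assumed here for illustration; the claim does not fix any of them.

```python
def remove_silence(samples, frame_len=4, energy_threshold=0.01):
    """Drop frames whose mean energy is at or below the threshold."""
    voiced = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / len(frame)
        if energy > energy_threshold:
            voiced.extend(frame)
    return voiced

def pre_emphasize(samples, coeff=0.97):
    """First-order high-pass filter: y[n] = x[n] - coeff * x[n-1]."""
    if not samples:
        return []
    return [samples[0]] + [samples[n] - coeff * samples[n - 1]
                           for n in range(1, len(samples))]

# Illustrative signal: silence, a rapidly alternating burst, near-silence.
signal = [0.0] * 4 + [0.5, -0.5, 0.5, -0.5] + [0.001] * 4
voiced = remove_silence(signal)
emphasized = pre_emphasize(voiced)
```

In this example only the middle burst survives silence removal, and because it alternates sign on every sample (a high-frequency pattern), pre-emphasis increases its amplitude after the first sample.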

(Deleted)

A method of authenticating a speaker based on a speaker model composed, for each registered user, of SVs (Support Vectors), weight information, and optimal feature values, the method comprising:

(a) receiving the voice of a user requesting authentication, extracting the voice excluding silent sections, amplifying the high-frequency region of the voice from which the silent sections have been excluded, and then extracting predetermined voice features;

(b) receiving personal information of the user requesting authentication;

(c) reading the SVs, weight information, and optimal feature values in the speaker model on the basis of the personal information; and

(d) generating an authentication SVM from the SVs and weight information, and then performing speaker authentication using the extracted voice features that correspond to the optimal feature values as the input vector of the authentication SVM; the method being a speaker authentication method using an SVM, characterized by comprising the foregoing steps.

(Deleted)

The method of claim 11, wherein step (d)

further comprises outputting the speaker authentication result to an image processing apparatus and reporting the authentication result to the user requesting authentication.