KR20180049422A

KR20180049422A - Speaker authentication system and method

Info

Publication number: KR20180049422A
Application number: KR1020160144756A
Authority: KR
Inventors: 오선호; 김건우; 한승완
Original assignee: 한국전자통신연구원
Priority date: 2016-11-01
Filing date: 2016-11-01
Publication date: 2018-05-11
Also published as: KR102604319B1

Abstract

Disclosed are a speaker authentication system and a method thereof. According to an embodiment of the present invention, the speaker authentication system comprises: an information collection unit that presents a word arrangement structure, in which different words are randomly arranged each time speaker authentication is performed, and collects speaker voice data; a pattern recognition unit that checks whether a speaker pattern in the word arrangement structure, derived from the word order recognized in the voice data, matches a pre-registered user unique pattern; a voice recognition unit that compares a voice feature of the speaker, extracted by analyzing the voice data, with a pre-registered user unique voice to check whether they match; and a control unit that permits authentication if the speaker pattern and the voice feature match the pre-registered user unique pattern and unique voice.

Description

SPEAKER AUTHENTICATION SYSTEM AND METHOD

The present invention relates to a speaker authentication system and a method thereof, and more particularly to a speaker authentication system and method that use a user voice pattern on a smartphone.

In general, various recognition technologies are used to authenticate legitimate users in personal information and communication terminals such as smartphones and computers, gate access systems, client-server network systems, and online payment services.

Conventional authentication, and the recognition technology behind it, supplements credentials that are vulnerable to personal information leaks, such as user ID/password (ID/PW), resident registration number, and mobile phone number, by combining mobile phone authentication, card authentication (e.g., random number card, RF tag, credit card), and OTP (One Time Password) authentication. This strengthens the user authentication procedure at the cost of added complexity.

However, performing such a strengthened authentication procedure requires installing authentication programs for complex Internet and mobile environments, imposes the inconvenience of a complicated authentication procedure, and has the drawback that the user must carry a card or OTP device for authentication.

Meanwhile, recognition technologies based on the user's unique biometric information, such as fingerprint, face, and voice, are being developed. However, fingerprint recognition has the drawback that fingerprints can be copied and misused, and face recognition has not yet reached the recognition performance that industry requires.

In addition, voice recognition technology has been criticized because the security word can leak when the speaker utters it and can be misused through recording.

To address these drawbacks, Patent Document 1 (Korean Patent No. 1181060) discloses an authentication technique that generates an arbitrary security sentence containing a security word and authenticates the user according to whether the security sentence and security word uttered by the user match.

However, in Patent Document 1 the user must utter the entire security sentence containing the security word and match all of it for authentication. The longer the security sentence, the higher the probability of human error by the speaker and of speech recognition failure, which degrades recognition performance.

Patent Document 1: Korean Patent No. 1181060 (published September 7, 2012)

An embodiment of the present invention aims to provide a speaker authentication system and method that perform user authentication by recognizing that a speaker pronounces randomly presented words according to a pre-registered user unique pattern.

According to an aspect of the present invention, a speaker authentication system includes: an information collection unit that collects speaker voice data by presenting a word arrangement structure in which different words are randomly arranged each time speaker authentication is executed; a pattern recognition unit that checks whether a speaker pattern in the word arrangement structure, derived from the word order recognized in the voice data, matches a pre-registered user unique pattern; a voice recognition unit that compares a voice feature of the speaker, extracted by analyzing the voice data, with a pre-registered user unique voice to check whether they match; and a control unit that permits authentication when the speaker pattern and the voice feature both match the pre-registered user unique pattern and unique voice.

The system may further include: a word recognition unit that recognizes input words by segmenting the syllable intervals of the speech uttered by the speaker in the voice data; and a database unit that stores the unique pattern and the unique voice as user authentication information for speaker authentication.

The words may include at least one of human-pronounceable characters, letters, numbers, alphabet letters, and symbols.

The unique voice may include a voice feature with discriminating power in at least one of the user's own intonation, syllables, timbre, dialect, and tone.

The unique pattern may be information that determines the order in which the speaker must utter words from among the words randomly arranged in the word arrangement structure.

The information collection unit may collect the unique pattern and the unique voice through a user interface and a voice interface and register them as user authentication information.

The information collection unit may register a unique pattern whose shape in the row-and-column word arrangement structure is either contiguous or separated.

The pattern recognition unit may obtain the word arrangement structure presented at random each time speaker authentication is executed, and recognize a speaker pattern based on the order of the words input by the speaker's pronunciation within the word arrangement structure.

The pattern recognition unit may compare a first word order, based on the unique pattern in the word arrangement structure, with a second word order, input based on the speaker pattern, to check whether they match.

The voice recognition unit may determine that the voices match when the similarity of the speaker's voice feature to the registered user unique voice exceeds a predetermined ratio, and determine that authentication has failed when the similarity is equal to or less than the predetermined ratio.
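The threshold rule for the unique voice can be illustrated with a minimal sketch. The source specifies only that the similarity must exceed a predetermined ratio; the use of cosine similarity, the plain-list feature vectors, and the default threshold of 0.8 are assumptions made here for illustration.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length feature vectors (assumed measure)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)

def voice_matches(speaker_features, enrolled_features, threshold=0.8):
    """Match only when the similarity strictly exceeds the threshold.

    A similarity equal to or below the threshold counts as a failure,
    mirroring the 'exceeds' / 'equal to or less than' wording of the text.
    The threshold value itself is a hypothetical choice.
    """
    return cosine_similarity(speaker_features, enrolled_features) > threshold
```

The strict comparison matters at the boundary: a similarity exactly equal to the predetermined ratio is treated as a failed match.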

Meanwhile, according to an aspect of the present invention, a speaker authentication method of a terminal includes: collecting speaker voice data by presenting a word arrangement structure in which different words are randomly arranged each time speaker authentication is executed; checking whether a speaker pattern in the word arrangement structure, derived from the word order recognized in the voice data, matches a pre-registered user unique pattern; checking whether a voice feature of the speaker, extracted by analyzing the voice data, matches a pre-registered user unique voice; and permitting authentication when the speaker pattern and the voice feature both match the pre-registered user unique pattern and unique voice.

The step of checking whether the unique pattern matches may include sequentially recognizing the input words by segmenting the syllable intervals of the speech in the voice data.

The step of checking whether the unique voice matches may include analyzing the speaker's voice data to extract a speaker voice feature with discriminating power in at least one of intonation, syllables, timbre, dialect, and tone.

After the step of permitting authentication, the method may further include setting at least one of new registration, modification, validity period renewal, and deletion of the user unique pattern in accordance with the granted user authentication.

Before the step of collecting the speaker voice data, the method may further include: receiving from the user a unique pattern to be used for speaker authentication; presenting sample words, receiving the speech pronounced by the user, and extracting a unique voice; and registering the unique pattern and the unique voice as user authentication information for speaker authentication.

In the step of receiving the unique pattern, the unique pattern may be received from the user through at least one of a touchscreen, a keyboard, a keypad, and a mouse.

Meanwhile, according to an aspect of the present invention, a speaker authentication method of a server that interworks with a user terminal for speaker authentication includes: collecting speaker voice data by presenting, on the user terminal, a word arrangement structure in which different words are randomly arranged each time speaker authentication is executed; checking whether a speaker pattern in the word arrangement structure, derived from the word order recognized in the voice data, matches a pre-registered user unique pattern; checking whether a voice feature of the speaker, extracted by analyzing the voice data, matches a pre-registered user unique voice; and notifying the user terminal of authentication success when the speaker pattern and the voice feature both match the pre-registered user unique pattern and unique voice.

Before the step of collecting the speaker voice data, the method may further include: receiving, from the user terminal, a user unique pattern and user voice data accompanying a request to register user authentication information; and registering, as user authentication information, the unique pattern and a user unique voice obtained by detecting voice features in the user voice data.

After the step of notifying the user terminal of authentication success, the method may further include setting at least one of new registration, modification, validity period renewal, and deletion of the user unique pattern in accordance with the successful user authentication.

The step of notifying the user terminal of authentication success may include, when either the speaker pattern or the speaker voice feature does not match, determining that authentication has failed and either requesting re-input or terminating the authentication procedure.
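The failure handling described here, re-input or termination, can be sketched as a bounded retry loop. This is only an illustration under stated assumptions: the `attempt` callable and the `max_attempts` limit are hypothetical, and the source does not say how many re-inputs are allowed before the procedure ends.

```python
def run_authentication(attempt, max_attempts=3):
    """Repeat authentication attempts until one succeeds or the limit is hit.

    attempt is a hypothetical callable that presents a fresh word
    arrangement, collects the utterance, and returns a pair
    (pattern_ok, voice_ok). max_attempts is an assumed policy value.
    Returns True on success and False once the procedure is terminated.
    """
    for _ in range(max_attempts):
        pattern_ok, voice_ok = attempt()
        if pattern_ok and voice_ok:
            return True   # both checks passed: report success
        # either check failed: loop again, which corresponds to re-input
    return False          # attempts exhausted: end the procedure
```

Success requires both checks in the same attempt; passing the pattern check in one attempt and the voice check in another does not count.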

According to an embodiment of the present invention, speaker authentication is performed through one-time word input that uses a unique pattern for recognizing the user's voice. Even if another person overhears or records the one-time words pronounced by the speaker, they cannot be used to gain access, which prevents misuse.

In addition, strengthening authentication requires neither carrying a separate object nor following a complicated procedure. The user's unique pattern and speaker recognition based on the unique voice are checked simultaneously from the speaker's voice data alone, which improves the security of voice authentication even if the unique pattern is leaked.

Furthermore, unlike conventional voice recognition, no unnecessary sentence input is required. Speaker authentication with improved security is performed by pronouncing only the words needed for authentication, which reduces the probability of speech recognition failure.

FIG. 1 is a network configuration diagram to which a speaker authentication system according to an embodiment of the present invention can be applied.
FIG. 2 is a block diagram schematically showing the configuration of a speaker authentication system according to an embodiment of the present invention.
FIG. 3 is a screen for explaining a user unique pattern input method according to an embodiment of the present invention.
FIG. 4 is a screen for explaining a user unique voice registration method according to an embodiment of the present invention.
FIG. 5 shows a speaker voice input menu for explaining a speaker authentication method according to an embodiment of the present invention.
FIG. 6 is a flowchart showing a method of registering user authentication information for speaker authentication according to an embodiment of the present invention.
FIG. 7 is a flowchart schematically showing a speaker authentication method according to an embodiment of the present invention.
FIG. 8 is a flowchart schematically showing a speaker authentication method based on the interworking of a plurality of information and communication devices according to an embodiment of the present invention.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings so that those of ordinary skill in the art to which the present invention pertains can readily carry them out. The present invention may, however, be implemented in many different forms and is not limited to the embodiments described here. To describe the present invention clearly, parts unrelated to the description are omitted from the drawings, and similar parts are given similar reference numerals throughout the specification.

Throughout the specification, when a part is said to "include" a component, this means that it may further include other components rather than excluding them, unless specifically stated otherwise. In addition, terms such as "... unit," "... device," and "module" used in the specification denote a unit that processes at least one function or operation, and such a unit may be implemented in hardware, software, or a combination of hardware and software.

Throughout the specification, terms such as first and second may be used to describe various components, but the components should not be limited by these terms. The terms serve only to distinguish one component from another. For example, without departing from the scope of rights according to the concept of the present invention, a first component may be called a second component, and similarly a second component may be called a first component.

Throughout the specification, 'user' and 'user authentication information' refer to a legitimate user and the registration information used to authenticate that user, while 'speaker' and 'speaker authentication information' refer to a person whose status as a legitimate user has not yet been authenticated. Accordingly, the same person may be called a 'user' during registration of user authentication information, a 'speaker' while requesting user authentication, and a 'user' again after the 'speaker' has been authenticated as a legitimate user.

A speaker authentication system and method according to an embodiment of the present invention are now described in detail with reference to the drawings.

FIG. 1 is a network configuration diagram to which a speaker authentication system according to an embodiment of the present invention can be applied.

Referring to the attached FIG. 1, the speaker authentication system 100 according to an embodiment of the present invention is a system with improved voice recognition performance that authenticates a legitimate user by recognizing that the speaker pronounces randomly presented words according to the pre-registered user's unique pattern and unique voice.

The speaker authentication system 100 may be included in a user terminal 10, a service server 20, and an authentication server 30, and depending on the implementation and service type it may be installed so that all or some of its functions interwork across them.

For example, the user terminal 10 may be a personal information and communication terminal such as a smartphone, mobile phone, computer, laptop, tablet, or wearable device, and the speaker authentication system 100 mounted on it may perform authentication for use of that terminal.

The service server 20 may be an information and communication terminal that provides client-based services over a network.

The service server 20 may interwork with the user terminal 10 to register the user's unique pattern and unique voice as user authentication information for access and provision of the service, and may perform the speaker authentication.

Likewise, the authentication server 30 may register the user's unique pattern and unique voice as user authentication information and perform speaker authentication for access and service provision between the user terminal 10 and the service server 20.

Meanwhile, the configuration of the speaker authentication system 100 is described below with reference to FIG. 2. For convenience of description and ease of understanding, the description focuses on the system as applied to the user terminal 10 unless otherwise stated.

FIG. 2 is a block diagram schematically showing the configuration of a speaker authentication system according to an embodiment of the present invention.

Referring to the attached FIG. 2, the speaker authentication system 100 according to an embodiment of the present invention includes an information collection unit 110, a word recognition unit 120, a pattern recognition unit 130, a voice recognition unit 140, a database unit 150, and a control unit 160.

The information collection unit 110 collects the user's unique pattern and unique voice through the various interfaces provided on the user terminal 10.

The information collection unit 110 may include a user interface (Graphical User Interface, GUI) that displays an input menu on the screen for interaction with the user, and a voice interface that records speech uttered by the user.

To enable speaker authentication, the information collection unit 110 collects the user unique pattern and unique voice in advance and registers them in the database unit 150 as user authentication information.

Here, the user unique pattern is information that determines the order in which the speaker must utter words (that is, the word utterance order) from among the words presented at random for speaker authentication. The unique voice may include voice features with discriminating power, expressed as parameters such as intonation, syllables, timbre, dialect, and tone that follow the user's own voice pattern.

For example, FIG. 3 is a screen for explaining a user unique pattern input method according to an embodiment of the present invention.

Referring to the attached FIG. 3, the information collection unit 110 displays a user unique pattern registration menu on the screen through the GUI and receives a unique pattern from the user. The user unique pattern may be entered through a touchscreen, a keyboard (keypad), a mouse, or the like.

For example, in the illustrated pattern input 1, when a row-and-column word arrangement structure is displayed on the screen, as with an ordinary smartphone lock pattern, the user can register a pattern with a contiguous shape by drag input.

Referring to the illustrated pattern input 2, the user can enter the numbers shown in the word arrangement structure to register a separated pattern whose shape is not contiguous. Compared with the connected shape of pattern input 1, the pattern input 2 method has the advantage that a wider variety of unique patterns can be registered.
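The difference between the two input styles can be made concrete with a small sketch. It assumes a representation the source does not specify: a unique pattern stored as an ordered list of (row, column) cells, with "contiguous" taken to mean that each cell is a horizontal, vertical, or diagonal neighbor of the previous one. The two example patterns are hypothetical.

```python
def is_contiguous(pattern):
    """Return True when every consecutive pair of cells is adjacent.

    Adjacency here is an assumed definition: horizontal, vertical,
    or diagonal neighbors on the grid.
    """
    for (r1, c1), (r2, c2) in zip(pattern, pattern[1:]):
        if max(abs(r1 - r2), abs(c1 - c2)) != 1:
            return False
    return True

# Hypothetical patterns on a 3x3 grid, as ordered (row, column) cells.
l_shape = [(0, 0), (1, 0), (2, 0), (2, 1)]    # drag-style, connected cells
separated = [(0, 0), (2, 2), (0, 2), (1, 1)]  # typed by number, cells may jump
```

Under this representation a drag-entered pattern always satisfies `is_contiguous`, while a pattern typed by cell number may or may not, which is why the second method admits more distinct patterns.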

Next, FIG. 4 is a screen for explaining a user unique voice registration method according to an embodiment of the present invention.

Referring to the attached FIG. 4, the information collection unit 110 displays a user unique voice registration menu on the screen through the GUI and presents sample characters, words, and sentences for unique voice registration.

The user voice data input at this time is processed by the word recognition unit 120 and the voice recognition unit 140, described later, and can be registered as a unique voice that distinguishes the voice features of that user alone.

Meanwhile, FIG. 5 shows a speaker voice input menu for explaining a speaker authentication method according to an embodiment of the present invention.

Referring to the attached FIG. 5, the information collection unit 110 presents different words in a random arrangement each time speaker authentication is executed, using the word arrangement structure of the speaker voice input menu. The words are characters that a human can pronounce (utter) and may include letters, numbers, alphabet letters, symbols, and the like.
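A minimal sketch of generating a fresh random arrangement for each authentication run follows. The vocabulary, the 3x3 grid size, and the function name `make_word_grid` are hypothetical; the source states only that different words are arranged at random on every run.

```python
import random

def make_word_grid(vocabulary, rows, cols, rng=random):
    """Return a rows x cols grid of distinct words drawn at random.

    rng may be the random module or any object with a compatible
    sample() method; the vocabulary must hold at least rows * cols words.
    """
    cells = rng.sample(vocabulary, rows * cols)  # distinct words, random order
    return [cells[r * cols:(r + 1) * cols] for r in range(rows)]

# Hypothetical vocabulary mixing words, letters, digits, and a symbol name.
VOCABULARY = ["gold", "2", "pig", "sharp", "A", "7", "tree", "K", "river",
              "9", "moon", "Q"]
grid = make_word_grid(VOCABULARY, 3, 3)
```

Python's default `random` generator is not intended for security purposes; in a real deployment one might pass `secrets.SystemRandom()` as `rng` instead, which is a design choice outside what the source describes.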

The speaker identifies the word utterance order that the user's registered unique pattern defines in the word arrangement structure and pronounces those words in order. Even if another person hears the words the speaker pronounces, for example in a public place, the words are merely a one-time authentication means arranged at random and the registered unique pattern is not revealed, so the security of voice recognition technology can be improved.

For example, when the user's unique pattern is entered in an 'L' shape as in pattern registration 1 of FIG. 3, the speaker can pronounce gold (금), 2, pig (돼지), and sharp (#) in order.

The information collection unit 110 may receive and record the speech pronounced by the speaker through the voice interface and pass the recorded voice data to the word recognition unit 120.

The word recognition unit 120 receives from the information collection unit 110 the voice data pronounced by the speaker while the randomly arranged characters were presented for speaker authentication.

The word recognition unit 120 segments the syllable intervals of the speech uttered by the speaker in the voice data, recognizes the input words, and passes them to the pattern recognition unit 130.

As in the example of FIG. 5, the word recognition unit 120 can recognize the words gold (금), 2, pig (돼지), and sharp (#) in order, following the speaker's pronunciation order.

The pattern recognition unit 130 checks whether the speaker pattern derived from the recognized word order matches the pre-registered user unique pattern.

Here, the method of checking whether the speaker pattern and the unique pattern match can be described more specifically as including the following two approaches.

In the first approach, the pattern recognition unit 130 recognizes the speaker pattern from the order of the words uttered by the speaker and compares it with the unique pattern to check whether they match.

Each time speaker authentication is executed, the pattern recognition unit 130 may obtain the word arrangement structure presented at random through the speaker voice input menu and recognize the speaker pattern based on the order of the words input by the speaker's pronunciation within that structure.

The pattern recognition unit 130 may then compare the recognized speaker pattern with the pre-registered unique pattern to check whether the two patterns match, and pass the result to the control unit 160.
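The first approach, recovering the speaker pattern from the spoken words and comparing it with the registered pattern, can be sketched as below. The grid contents, the (row, column) pattern representation, and the function names are assumptions for illustration; the sketch also assumes every word appears only once in the grid.

```python
def speaker_pattern(grid, spoken_words):
    """Map each recognized word to its (row, column) cell in the presented grid.

    Returns None when a spoken word does not appear in the grid.
    """
    position = {word: (r, c)
                for r, row in enumerate(grid)
                for c, word in enumerate(row)}
    try:
        return [position[w] for w in spoken_words]
    except KeyError:
        return None

def pattern_matches(grid, spoken_words, registered_pattern):
    """Compare the recovered speaker pattern with the registered unique pattern."""
    return speaker_pattern(grid, spoken_words) == registered_pattern

# Hypothetical grid for one authentication run and an L-shaped registered pattern.
grid = [["gold", "7", "A"],
        ["2", "tree", "K"],
        ["pig", "sharp", "9"]]
registered = [(0, 0), (1, 0), (2, 0), (2, 1)]
```

With this grid, speaking "gold", "2", "pig", "sharp" traces the registered L shape, while the same words in a different order trace a different pattern and are rejected.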

In the second method, the pattern recognition unit 130 may compare a first word order, which must be input for authentication according to the unique pattern in the word arrangement structure, with a second word order, which was actually input according to the speaker pattern.

Each time speaker authentication is executed, the pattern recognition unit 130 may apply the user unique pattern to the word arrangement structure randomly presented through the speaker voice input menu in order to determine the first word order that must be input for authentication.

The pattern recognition unit 130 may also determine the second word order that was input according to the speaker pattern in the word arrangement structure.

The pattern recognition unit 130 then compares the first word order based on the unique pattern with the second word order based on the speaker pattern, checks whether they match, and may pass the check result to the control unit 160.
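The second method can be sketched as follows: the registered pattern is applied to the arrangement presented for this attempt to obtain the words that must be spoken, and that list is compared with the recognized words. As before, representing the unique pattern as ordered (row, column) cells is an assumption, and the function names are hypothetical.

```python
def expected_word_order(grid, unique_pattern):
    """First word order: words selected by the unique pattern in this arrangement."""
    return [grid[row][col] for row, col in unique_pattern]


def matches_second_method(grid, spoken_words, unique_pattern):
    """Compare the expected (first) word order with the spoken (second) word order."""
    return expected_word_order(grid, unique_pattern) == list(spoken_words)


# Two hypothetical arrangements presented on different attempts.
attempt_1 = [["gold", "7"], ["2", "pig"]]
attempt_2 = [["pig", "2"], ["7", "gold"]]
pattern = [(0, 0), (1, 1)]  # same registered pattern for both attempts

print(expected_word_order(attempt_1, pattern))
print(expected_word_order(attempt_2, pattern))
```

The same pattern yields a different word sequence for each arrangement, which is why a sequence overheard on one attempt does not carry over to the next.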

Meanwhile, the voice recognition unit 140 analyzes the speaker's voice data and extracts discriminative voice features such as intonation, syllables, timbre, dialect, and tone.

The voice recognition unit 140 compares the extracted voice feature of the speaker with the pre-registered user unique voice to check whether the two belong to the same person.

For example, the voice recognition unit 140 may extract the speaker voice feature through voice pattern recognition using MFCC (Mel Frequency Cepstral Coefficients). The invention is not limited to this, and various voice recognition techniques with improved voice feature recognition may be applied.

At this time, the voice recognition unit 140 may pass to the control unit 160 a recognition result that depends on whether the similarity between the speaker voice feature and the user unique voice reaches a predetermined ratio.

For example, the voice recognition unit 140 may determine a match if the similarity of the speaker voice feature to the registered user unique voice exceeds a predetermined ratio (e.g., 80%), and may determine a mismatch, and therefore a failure, if the similarity is at or below the predetermined ratio.
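The threshold rule can be illustrated with a small sketch. The specification does not say how similarity is computed, so cosine similarity between fixed-length feature vectors is used here purely as an assumption; the function names and the default threshold of 0.80 (taken from the 80% example) are likewise illustrative.

```python
import math


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length feature vectors (assumed metric)."""
    if len(a) != len(b):
        raise ValueError("feature vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def voice_matches(speaker_feature, unique_voice, threshold=0.80):
    """Match only if similarity strictly exceeds the threshold.

    A similarity at or below the threshold counts as a mismatch,
    following the rule stated in the text.
    """
    return cosine_similarity(speaker_feature, unique_voice) > threshold
```

Note the strict inequality: a similarity exactly equal to the predetermined ratio is treated as a failure.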

The database unit 150 stores the programs and information needed to operate the speaker authentication system 100 according to an embodiment of the present invention, as well as the information generated during its operation.

The control unit 160 controls the overall operation of each of the above units for speaker authentication according to an embodiment of the present invention, and performs user recognition using the user's authentication information registered in the database unit 150.

The control unit 160 may be the central processing unit of the information communication device on which the speaker authentication system 100 is installed, and may control the actual operation of each unit by executing the hardware and programs provided for it.

That is, the control unit 160 may control the whole sequence described above, from providing the menu for registering user authentication information through the registration process, and from providing the menu for speaker authentication through the authentication process.

The control unit 160 decides whether to authenticate the speaker according to the pattern check result of the pattern recognition unit 130 and the voice recognition result of the voice recognition unit 140 described above.

At this time, the control unit 160 permits authentication as the legitimate user only when the speaker pattern and the voice feature input through the speaker's voice data both match the registered user's unique pattern and unique voice.

Conversely, if either the speaker pattern or the voice feature input by the speaker does not match, the control unit 160 determines that user authentication of the speaker has failed.

Next, a speaker authentication method according to an embodiment of the present invention, based on the configuration of the speaker authentication system 100 described above, is explained with reference to FIGS. 6 and 7.

However, because the components of the speaker authentication system 100 may be further subdivided or integrated into a single system according to their functions, the speaker authentication method is described below with the speaker authentication system 100 as the acting subject.

First, FIG. 6 is a flowchart illustrating a method of registering user authentication information for speaker authentication according to an embodiment of the present invention.

Referring to FIG. 6, the speaker authentication method according to an embodiment of the present invention is described on the assumption that the user is already logged in through an existing authentication means such as a password (PW), pattern information, voice recognition information, a behavior pattern, or a token.

The speaker authentication system 100 displays a user unique pattern registration menu on the screen and receives from the user the unique pattern to be used for speaker authentication (S101). The user may input the unique pattern through a touch screen, a keyboard (keypad), a mouse, or the like.

The speaker authentication system 100 displays a user unique voice registration menu on the screen to present sample words, and receives the voice produced when the user pronounces the presented words (S102).

The speaker authentication system 100 detects the words in the input voice and extracts unique voice information specific to the user from the voice features observed when those words are pronounced (S103).

The speaker authentication system 100 registers the user's unique pattern and unique voice as the user authentication information for speaker authentication (S104).
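The registration steps S101 to S104 might be organized along the following lines. This is only a sketch: the `Enrollment` record, the in-memory dictionary standing in for the database unit, the caller-supplied `extract_feature` function, and the choice to average the per-sample feature vectors into one unique voice are all assumptions not stated in the specification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Enrollment:
    unique_pattern: tuple  # ordered (row, column) cells (assumed encoding)
    unique_voice: tuple    # averaged feature vector (assumed representation)


def register_user(db, user_id, unique_pattern, voice_samples, extract_feature):
    """Register a user's unique pattern and unique voice.

    unique_pattern  -- pattern entered via touch screen, keyboard, etc. (S101)
    voice_samples   -- recordings of the presented sample words (S102)
    extract_feature -- hypothetical function: sample -> fixed-length feature vector
    """
    features = [extract_feature(sample) for sample in voice_samples]  # S103
    if not features:
        raise ValueError("at least one voice sample is required")
    dim = len(features[0])
    mean = tuple(sum(f[i] for f in features) / len(features) for i in range(dim))
    enrollment = Enrollment(tuple(unique_pattern), mean)
    db[user_id] = enrollment  # S104: store as user authentication information
    return enrollment
```

In practice `extract_feature` would wrap whatever feature extraction the voice recognition unit uses, such as an MFCC-based front end.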

Next, FIG. 7 is a flowchart schematically illustrating a speaker authentication method according to an embodiment of the present invention.

Referring to FIG. 7, when speaker authentication starts, the speaker authentication system 100 according to an embodiment of the present invention presents a randomly arranged word arrangement structure through the speaker voice input menu (S201).
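Step S201 could be sketched as drawing distinct words at random and laying them out in rows and columns. The vocabulary, the grid dimensions, and the function name below are hypothetical; the use of `secrets.SystemRandom` is a design choice for this sketch, made so that the arrangement is hard to predict, and is not something the specification prescribes.

```python
import secrets


def make_word_grid(vocabulary, rows, cols, rng=None):
    """Return a rows x cols arrangement of distinct, randomly chosen words."""
    rng = rng or secrets.SystemRandom()
    unique_words = list(dict.fromkeys(vocabulary))  # drop duplicates, keep order
    if len(unique_words) < rows * cols:
        raise ValueError("vocabulary too small for the requested grid")
    chosen = rng.sample(unique_words, rows * cols)
    return [chosen[r * cols:(r + 1) * cols] for r in range(rows)]


# Hypothetical vocabulary mixing words, digits, letters, and symbols.
vocabulary = ["gold", "pig", "star", "moon", "tree", "2", "7", "A", "#", "river"]
grid = make_word_grid(vocabulary, rows=3, cols=3)
for row in grid:
    print(row)
```

Calling the function again for the next attempt yields a fresh arrangement, so the words the user must speak change each time.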

At this point, the speaker can work out the word utterance order that the user's registered unique pattern dictates within the word arrangement structure, and pronounce the corresponding words in that order.

The speaker authentication system 100 acquires the voice data recorded from the speaker's pronunciation (S202), segments the syllable intervals of the voice in the voice data, and recognizes the input words sequentially (S203).

The speaker authentication system 100 checks whether the speaker pattern derived from the sequentially recognized word order matches the user unique pattern registered in advance (S204). The speaker pattern can be recognized from the order of the words that the speaker pronounced within the presented word arrangement structure.

If the speaker pattern matches the user unique pattern (S205; Yes), the speaker authentication system 100 analyzes the speaker's voice data and extracts a discriminative speaker voice feature such as intonation, syllables, timbre, dialect, and tone (S206).

The speaker authentication system 100 compares the extracted speaker voice feature with the pre-registered user unique voice to check whether they match (S207).

If the speaker voice feature matches the user unique voice (S208; Yes), the speaker authentication system 100 authenticates the speaker as the legitimate user (S209).

That is, the speaker authentication system 100 may permit authentication as the legitimate user only when both the speaker pattern and the speaker voice feature match.

Thereafter, once the speaker has been authenticated as the legitimate user, the speaker authentication system 100 may, on request, newly register, modify, renew the validity period of, or delete the user unique pattern.

Conversely, if either the speaker pattern in step S205 or the speaker voice feature in step S208 does not match the registered user authentication information (S205; No, S208; No), the system determines that authentication has failed and may request re-input or terminate the authentication procedure.
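The ordering of steps S204 to S209 can be sketched as a two-stage check in which the voice feature is analyzed only after the pattern check has passed. The result names, the lazy `voice_similarity` callable, the (row, column) pattern encoding, and the 0.80 threshold are assumptions for illustration.

```python
from enum import Enum


class AuthResult(Enum):
    AUTHENTICATED = "authenticated"
    PATTERN_MISMATCH = "pattern_mismatch"
    VOICE_MISMATCH = "voice_mismatch"


def authenticate(grid, spoken_words, unique_pattern, voice_similarity, threshold=0.80):
    """Run the pattern check first, then the voice check.

    voice_similarity -- hypothetical zero-argument callable that extracts the
                        speaker voice feature and returns its similarity to the
                        registered unique voice; it is called only if the
                        pattern check succeeds.
    """
    # S204/S205: compare the recognized word order with the expected one.
    expected = [grid[row][col] for row, col in unique_pattern]
    if list(spoken_words) != expected:
        return AuthResult.PATTERN_MISMATCH

    # S206-S208: extract and compare the voice feature.
    if voice_similarity() <= threshold:
        return AuthResult.VOICE_MISMATCH

    # S209: both checks passed.
    return AuthResult.AUTHENTICATED
```

On either mismatch result, the caller can request re-input or end the procedure, as the text describes.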

The speaker authentication method described above with reference to FIGS. 6 and 7 was explained as operating in a speaker authentication system 100 installed in a single information communication device. However, embodiments of the present invention are not limited to this, and speaker authentication may be performed as a service in which the user terminal 10 and the servers 20 and 30 interwork, as described earlier with reference to FIG. 1.

For example, FIG. 8 is a flowchart schematically illustrating a speaker authentication method in which a plurality of information communication devices interwork, according to an embodiment of the present invention.

Referring to FIG. 8, one embodiment shows a flow in which the user terminal 10 and the server 40 interwork so that the user's authentication information is registered in the server 40 and the server 40 performs speaker authentication accordingly; parts already described with reference to FIGS. 6 and 7 are omitted. Here, the server 40 may be the service server 20, which registers the user authentication information and provides a service, or the authentication server 30, which interworks to manage user authentication for providing the service.

Looking first at the user authentication information registration process in FIG. 8, the user terminal 10 requests registration of user authentication information and, to that end, receives the user's unique pattern and voice data as input and transmits them to the server 40 (S301).

Upon receiving the request from the user terminal 10, the server 40 registers, as user authentication information, the unique pattern together with the unique voice obtained by detecting voice features in the received user voice data (S302).

The server 40 then completes the registration process by notifying the user terminal 10 that the requested user authentication information has been registered (S303).

Looking next at the user recognition process in FIG. 8, when speaker authentication starts, the server 40 presents a randomly arranged word arrangement structure via the user terminal 10 through the speaker voice input menu (S304).

The user terminal 10 requests speaker authentication, attaching the voice data recorded from the speaker's pronunciation (S305). The user terminal 10 may request speaker authentication on events such as service access to the server 40 or online payment.

Upon receiving the speaker authentication request from the user terminal 10, the server 40 sequentially recognizes the words contained in the received speaker voice data (S306), recognizes the speaker pattern from the order of those words, and may check whether it matches the registered user unique pattern (S307).

The server 40 may also check whether the speaker voice feature obtained by analyzing the voice data matches the registered user unique voice (S308).

If both the speaker pattern and the speaker voice feature match, the server 40 may notify the user terminal 10 of speaker authentication success; if either the speaker pattern or the speaker voice feature does not match, it may notify the user terminal 10 of speaker authentication failure (S309).
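For the terminal and server arrangement, the server has to remember which word arrangement it presented in S304 so that it can evaluate the response in S306 to S309. The sketch below keeps each presented arrangement under a one-time challenge identifier and discards it on first use. The class, its method names, the challenge identifier, and the idea of passing already-recognized words and a precomputed similarity score into `verify` are all assumptions; the specification does not describe a message format.

```python
import secrets


class AuthServer:
    """Hypothetical server-side sketch of steps S304 to S309."""

    def __init__(self, patterns, vocabulary, rows=3, cols=3, threshold=0.80):
        self._patterns = patterns  # user_id -> ordered (row, col) cells
        self._vocabulary = list(dict.fromkeys(vocabulary))
        if len(self._vocabulary) < rows * cols:
            raise ValueError("vocabulary too small for the requested grid")
        self._rows, self._cols = rows, cols
        self._threshold = threshold
        self._pending = {}  # challenge_id -> (user_id, grid)

    def issue_challenge(self, user_id):
        """S304: present a fresh random word arrangement."""
        if user_id not in self._patterns:
            raise KeyError(user_id)
        words = secrets.SystemRandom().sample(self._vocabulary,
                                              self._rows * self._cols)
        grid = [words[r * self._cols:(r + 1) * self._cols]
                for r in range(self._rows)]
        challenge_id = secrets.token_hex(16)
        self._pending[challenge_id] = (user_id, grid)
        return challenge_id, grid

    def verify(self, challenge_id, recognized_words, voice_similarity):
        """S306-S309: check word order and voice; each challenge is single-use."""
        pending = self._pending.pop(challenge_id, None)
        if pending is None:
            return False  # unknown or already used challenge
        user_id, grid = pending
        expected = [grid[r][c] for r, c in self._patterns[user_id]]
        return (list(recognized_words) == expected
                and voice_similarity > self._threshold)
```

Removing the arrangement from the pending table on first use means a response cannot be submitted twice against the same challenge.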

As described above, according to an embodiment of the present invention, speaker authentication is performed through one-time word input that makes use of a unique pattern for recognizing the user's voice. Even if another person overhears or records the one-time words pronounced by the speaker, access using them is not possible, so misuse can be prevented.

In addition, speaker recognition using both the user's unique pattern and unique voice is performed simultaneously from the speaker's voice data alone, without the need to carry a separate object or follow complicated procedures to strengthen authentication, so security in voice authentication is improved even if the unique pattern is leaked.

Furthermore, authentication with enhanced security is performed using only the pronunciation of the words needed for speaker authentication, without requiring input of the unnecessary sentences used in conventional voice recognition, which reduces the probability of voice recognition failure.

Embodiments of the present invention are not implemented only through the apparatus and/or method described above. They may also be implemented through a program that realizes functions corresponding to the configuration of the embodiments, a recording medium on which that program is recorded, and the like, and such implementations can be readily achieved by a person skilled in the art to which the present invention pertains from the description of the embodiments above.

Although embodiments of the present invention have been described in detail above, the scope of the present invention is not limited to them. Various modifications and improvements made by those skilled in the art using the basic concept of the present invention as defined in the following claims also fall within the scope of the present invention.

10: user terminal 20: service server
30: authentication server 100: speaker authentication system
110: information collection unit 120: word recognition unit
130: pattern recognition unit 140: voice recognition unit
150: database unit 160: control unit

Claims

A speaker authentication system comprising:
an information collection unit that presents, each time speaker authentication is executed, a word arrangement structure in which different words are randomly arranged, and collects speaker voice data;
a pattern recognition unit that checks whether a speaker pattern in the word arrangement structure, according to the word order recognized in the voice data, matches a pre-registered user unique pattern;
a voice recognition unit that compares a voice feature of the speaker, extracted by analyzing the voice data, with a pre-registered user unique voice to check whether they match; and
a control unit that permits authentication if the speaker pattern and the voice feature both match the pre-registered unique pattern and unique voice of the user.

The speaker authentication system of claim 1, further comprising:
a word recognition unit that segments the syllable intervals of the voice uttered by the speaker in the voice data and recognizes the input words; and
a database unit that stores the unique pattern and the unique voice as user authentication information for speaker authentication.

The speaker authentication system of claim 1,
wherein the words include at least one of characters, letters, numbers, alphabet letters, and symbols that a human can pronounce.

The speaker authentication system of claim 1,
wherein the unique voice includes a discriminative voice feature of at least one of the user's intonation, syllables, timbre, dialect, and tone.

The speaker authentication system of claim 1,
wherein the unique pattern is information that determines the word input order that the speaker must utter from among the words randomly arranged in the word arrangement structure.

The speaker authentication system of claim 1,
wherein the information collection unit collects the unique pattern and the unique voice through a user interface and a voice interface and registers them as user authentication information.

The speaker authentication system of claim 6,
wherein the information collection unit registers the unique pattern in which, within the row-and-column word arrangement structure, the word arrangement form is either continuous or separated.

The speaker authentication system of claim 1,
wherein the pattern recognition unit acquires the word arrangement structure randomly presented each time the speaker authentication is executed, and recognizes a speaker pattern based on the order of the words input by the speaker's pronunciation in the word arrangement structure.

The speaker authentication system of claim 1,
wherein the pattern recognition unit compares a first word order based on the unique pattern in the word arrangement structure with a second word order input based on the speaker pattern to check whether they match.

The speaker authentication system of claim 1,
wherein the voice recognition unit determines a match if the similarity of the speaker voice feature to the registered user unique voice exceeds a predetermined ratio, and determines that authentication has failed if the similarity is at or below the predetermined ratio.

A speaker authentication method of a terminal, the method comprising:
presenting, each time speaker authentication is executed, a word arrangement structure in which different words are randomly arranged, and collecting speaker voice data;
checking whether a speaker pattern in the word arrangement structure, according to the word order recognized in the voice data, matches a pre-registered user unique pattern;
checking whether a voice feature of the speaker, extracted by analyzing the voice data, matches a pre-registered user unique voice; and
permitting authentication if the speaker pattern and the voice feature both match the pre-registered unique pattern and unique voice of the user.

The speaker authentication method of claim 11,
wherein the step of checking whether the unique pattern matches includes
segmenting the syllable intervals of the voice in the voice data and sequentially recognizing the input words.

The speaker authentication method of claim 11,
wherein the step of checking whether the unique voice matches includes
analyzing the speaker's voice data to extract a discriminative speaker voice feature of at least one of intonation, syllables, timbre, dialect, and tone.

The speaker authentication method of claim 11, further comprising,
after the step of permitting authentication,
setting at least one of new registration, modification, validity period renewal, and deletion of the user unique pattern in accordance with the user authentication permission.

The speaker authentication method of claim 11, further comprising,
before the step of collecting the speaker voice data:
receiving from a user a unique pattern to be used for speaker authentication;
presenting sample words, receiving the voice pronounced by the user, and extracting a unique voice; and
registering the unique pattern and the unique voice as user authentication information for speaker authentication.

The speaker authentication method of claim 15,
wherein the step of receiving the unique pattern includes
receiving the unique pattern from the user through at least one of a touch screen, a keyboard, a keypad, and a mouse.

A speaker authentication method of a server that interworks with a user terminal for speaker authentication, the method comprising:
presenting to the user terminal, each time speaker authentication is executed, a word arrangement structure in which different words are randomly arranged, and collecting speaker voice data;
checking whether a speaker pattern in the word arrangement structure, according to the word order recognized in the voice data, matches a pre-registered user unique pattern;
checking whether a voice feature of the speaker, extracted by analyzing the voice data, matches a pre-registered user unique voice; and
notifying the user terminal of authentication success if the speaker pattern and the voice feature both match the pre-registered unique pattern and unique voice of the user.

The speaker authentication method of claim 17, further comprising,
before the step of collecting the speaker voice data:
receiving from the user terminal a user unique pattern and user voice data accompanying a request to register user authentication information; and
registering, as user authentication information, the unique pattern together with a user unique voice obtained by detecting voice features in the user voice data.

The speaker authentication method of claim 11, further comprising,
after the step of notifying the user terminal of authentication success,
setting at least one of new registration, modification, validity period renewal, and deletion of the user unique pattern in accordance with the success of user authentication.

The speaker authentication method of claim 11,
wherein the step of notifying the user terminal of authentication success includes
determining that authentication has failed if either the speaker pattern or the speaker voice feature does not match, and requesting re-input or terminating the authentication procedure.