KR100777551B1

KR100777551B1 - Voice recognition system and method for changing member's configuration as a channel capacity

Info

Publication number: KR100777551B1
Application number: KR1020010038428A
Authority: KR
Inventors: 전호현
Original assignee: 주식회사 케이티
Priority date: 2001-06-29
Filing date: 2001-06-29
Publication date: 2007-11-16
Also published as: KR20030002728A

Abstract

The present invention relates to a voice recognition system capable of variable configuration according to channel capacity, and to a method therefor. Its object is to provide such a system and method in which the voice recognition unit and the call processing unit are packaged together, so that the program of the call processing unit need not be modified even if the voice recognition engine is changed.

To configure a voice recognition system efficiently, the present invention packages the voice recognition unit together with the call processing unit in a single system and uses a LAN for data transmission between the two. A special header is prepended to the transmitted data so that the kinds of data sent from the call processing unit to the voice recognition unit can be varied. As a result, the program of the call processing unit need not be modified even if the vendor of the recognition engine changes.

When the present invention is applied, the preprocessing unit, which performs call processing, and the voice recognition unit, which performs voice recognition, are configured as a package in which they can communicate with each other. Because the two communicate over sockets, the system can very easily be expanded package by package according to channel capacity while voice recognition performance is maintained, and it is not tied to a particular voice recognition engine vendor, so the engine can be changed selectively when necessary.

Description

Voice recognition system capable of variable configuration according to channel capacity, and method therefor {VOICE RECOGNITION SYSTEM AND METHOD FOR CHANGING MEMBER'S CONFIGURATION AS A CHANNEL CAPACITY}

FIG. 1 is a schematic diagram showing the overall configuration of a voice recognition system capable of variable configuration according to channel capacity, according to an embodiment of the present invention.

FIG. 2 is a diagram illustrating the packet processing between the preprocessing unit and the voice recognition unit in the voice recognition system capable of variable configuration according to channel capacity, according to an embodiment of the present invention.

* Description of reference numerals for the main parts of the drawings *

100: preprocessing unit    110: scenario thread

120: TCP/IP client    130: TCP/IP server

200: voice recognition unit    210: TCP/IP server

220: input data    230: recognition processing search thread

240: TCP/IP client

The present invention relates to a voice recognition system capable of variable configuration according to channel capacity and a method therefor, and more particularly to such a system and method whose configuration can be changed according to the number of service channels.

As is well known, a voice recognition system receives a human voice as input and derives a recognition result through a recognizer. Various services using this recognition technology have been developed and commercialized.

In general, a voice recognition system comprises a telephone terminal through which a speaker inputs voice data, an exchange (Private Branch eXchange) that relays the voice data from the telephone terminal to a call processing unit, the call processing unit, which performs the call processing function, and a single voice recognition unit that performs voice recognition based on the voice data supplied via the call processing unit.

For small channel capacities, however, all functions required for the service, including voice recognition, have been handled within a single system. The system configuration therefore differs somewhat from design to design, but data is generally exchanged between modules through memory or files.

When the number of lines requiring voice recognition is large and a single system cannot provide the service in real time, the computationally expensive voice recognition unit is separated out to run on a separate system, and the part that provides the service scenario and the system that performs voice recognition exchange the necessary data over a LAN. As the line capacity for which voice recognition must be handled grows, the number of systems performing recognition must grow with it, so the system that performs voice recognition differs depending on the input channel.

In this case, unless there is an easy way to identify the preprocessing unit that handles the service and the voice recognition processing unit for the assigned channel, expanding the system takes a great deal of time and effort, and maintenance also becomes difficult.

The present invention has been made in view of the above circumstances of the prior art. Its object is to provide a voice recognition system capable of variable configuration according to channel capacity, and a method therefor, in which the voice recognition unit and the call processing unit are packaged together so that the program of the call processing unit need not be modified even if the voice recognition engine is changed.

Another object of the present invention is to provide a voice recognition system capable of variable configuration according to channel capacity, and a method therefor, in which specific data is attached in the header portion of the voice recognition data so that the kinds of data transmitted from the call processing unit to the voice recognition unit can be varied.

To achieve the above objects, a preferred embodiment of the present invention provides a voice recognition system that recognizes speech from voice input and is capable of variable configuration according to channel capacity. In this system, a preprocessing unit, which finds the start and end points of the input voice and transmits it or converts it into frequency-component data and transmits that, and a voice recognition unit, which uses the data from the preprocessing unit to search a registered vocabulary list for the vocabulary item judged most similar, are configured to communicate over sockets, so that replacement and expansion can be carried out in package units.

Preferably, there is provided a voice recognition system capable of variable configuration according to channel capacity in which the communication data between the preprocessing unit and the voice recognition unit is carried over a socket with a header describing the communication attached.

More preferably, there is provided a voice recognition system capable of variable configuration according to channel capacity in which, when the preprocessing unit and the voice recognition unit are deployed separately on two or more systems, the IP address of the corresponding recognition system is entered and specified in the preprocessing unit and the voice recognition unit according to the assigned channel.

The present invention also provides a voice recognition method capable of variable configuration according to channel capacity, for a voice recognition system that processes input voice to perform recognition. In this method, voice recognition data is processed by having a preprocessing unit, which finds the start and end points of the input voice and transmits it or converts it into frequency-component data and transmits that, and a voice recognition unit, which uses the data from the preprocessing unit to search a registered vocabulary list for the vocabulary item judged most similar, communicate with each other over sockets.

Preferably, there is provided a voice recognition method capable of variable configuration according to channel capacity that includes a step of specifying the data format according to the equipment specifications at voice recognition initialization, for mutual communication between the preprocessing unit and the voice recognition unit.

More preferably, there is provided a voice recognition method capable of variable configuration according to channel capacity in which the preprocessing unit selects one of two options: transmitting the input voice data to the voice recognition unit without processing, or extracting frequency-component data and transmitting that.

Hereinafter, the present invention is described in detail with reference to the drawings.

FIG. 1 is a schematic diagram showing the overall configuration of a voice recognition system capable of variable configuration according to channel capacity, according to an embodiment of the present invention.

Referring to FIG. 1, the voice recognition system capable of variable configuration according to channel capacity configures the preprocessing unit (100) and the voice recognition unit (200) as one package. This yields a flexible system that can be relocated to and installed on another system.

To this end, the part that controls the voice recognition service and the part that performs voice recognition communicate by socket communication over a LAN. Even when the two parts (the preprocessing unit for service control and the voice recognition unit) are not split across separate systems, communicating through sockets makes the system easy to expand.

Accordingly, the present invention defines the communication data format between the two ends on the premise that the voice recognition process is divided between a preprocessing unit (100), which finds the start and end points of the input voice and transmits it or converts it into frequency-component data and transmits that, and a voice recognition unit (200), which uses the data from the preprocessing unit to search a registered vocabulary list for the vocabulary item judged most similar.

In more detail, the preprocessing unit (100) contains a scenario thread (110), a TCP/IP client (120), and a TCP/IP server (130). The voice recognition unit (200) contains a TCP/IP server (210) that receives the data transmitted from the TCP/IP client (120) of the preprocessing unit (100), input data (220), a recognition processing search thread (230) that looks up the input data in a pre-registered vocabulary list, and a TCP/IP client (240).

With the above configuration, when a service is provided through the voice recognition system capable of variable configuration according to channel capacity, the preprocessing unit (100) first receives voice from the user and finds its start and end points. Depending on the system structure, it then generates frequency-component data for the detected segment and transmits it to the voice recognition unit (200), which performs the voice recognition function.

Because communication between the preprocessing unit (100) and the voice recognition unit (200) uses sockets, the user can deploy the same program regardless of whether the two units run on one system or on separate systems.

That is, when the preprocessing unit (100) and the voice recognition unit (200) are on one system, they use the same IP address, so it need not be specified explicitly. When they are split across two or more systems, only the IP address of the recognition system needs to be entered for the assigned channel.
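The addressing rule above can be sketched as a per-channel lookup. This is a minimal illustration rather than the patent's implementation: the function name `recognizer_address`, the `channel_to_ip` mapping, and port 5000 are all hypothetical, and falling back to the loopback address is just one way to express "same system, no address needs to be specified".

```python
LOOPBACK = "127.0.0.1"   # used when both units run on the same system
DEFAULT_PORT = 5000      # hypothetical port; the source does not name one


def recognizer_address(channel, channel_to_ip, port=DEFAULT_PORT):
    """Return the (host, port) of the recognition system assigned to a channel.

    Channels without an entry are assumed to be served by a voice
    recognition unit on the same system as the preprocessing unit.
    """
    return (channel_to_ip.get(channel, LOOPBACK), port)


# Example: channel 1 stays local, channel 3 is served by a separate system.
assignments = {3: "192.168.0.1"}
for ch in (1, 3):
    print(ch, recognizer_address(ch, assignments))
```

The returned tuple can be passed directly to Python's `socket.create_connection`, so the client code stays the same whether the recognizer is local or remote.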

Hereinafter, the data processing between the preprocessing unit and the voice recognition unit of the voice recognition system configured as described above is described in detail with reference to the accompanying drawings.

FIG. 2 is a diagram illustrating the packet processing between the preprocessing unit and the voice recognition unit in the voice recognition system capable of variable configuration according to channel capacity, according to an embodiment of the present invention.

Referring to FIG. 2, the voice recognition processing performed through the preprocessing unit (100) and the voice recognition unit (200) is carried out as shown in FIG. 2, using the communication data formats defined in Tables 1, 2, 3, and 4.

Table 1. Voice recognition initialization request (P1)
Seq.  Description
1     Start
2     0X01 (voice recognition initialization)
3     System ID of the 'preprocessing unit' device
4     IP address of the 'preprocessing unit' device
5     Voice coding type indicator or voice feature extraction value
6     Sample value size
7     Sample rate
8     Frame unit
9     Bytes per frame
10    End

Table 2. Voice recognition initialization response (P2)
Seq.  Description
1     Start
2     Acknowledge
3     System ID of the 'preprocessing unit' device
4     OK(0) or NOK(1)
5     End

Table 3. Voice recognition processing request (P3)
Seq.  Description
1     Start
2     Voice coding type indicator or voice feature extraction value
3     System ID of the 'preprocessing unit' device (01-99)
4     IP address string of the 'preprocessing unit' device (ex: "192.168.0.1")
5     Message ID, assigned by the 'preprocessing unit'
6     Step ID of the recognition processing step in the scenario
7     Size of the input data for voice recognition processing according to CMD
8     Input data for voice recognition processing according to CMD
9     End

Table 4. Voice recognition processing result
Seq.  Description
1     Start
2     System ID of the 'call processing and feature extraction unit' device (01-99)
3     Message ID
4     Step ID of the recognition processing step in the scenario
5     Voice recognition search result (index code)
6     Voice recognition search result (the matching word string)
7     End
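As a rough illustration of how the initialization request in Table 1 might be framed on the wire, the sketch below packs its ten fields with Python's `struct` module. The source gives only the field order, so everything about the byte layout is an assumption: the start and end marker values, the field widths, the fixed 15-byte IP string, and treating row 5 as the 'DATA_MODE' value that the description refers to.

```python
import struct

START, END = 0x02, 0x03   # assumed marker bytes; the source only says "Start" / "End"
CMD_INIT = 0x01           # Table 1, row 2: 0X01 = voice recognition initialization

# Assumed layout (network byte order, no padding):
# start, command, system ID, 15-byte IP string, data mode,
# sample size in bits, sample rate, frame unit, bytes per frame, end
P1_FORMAT = "!BBB15sBBHHHB"


def pack_init_request(sys_id, ip, data_mode, sample_bits,
                      sample_rate, frame_unit, frame_bytes):
    """Build a P1 packet under the assumed layout above."""
    return struct.pack(P1_FORMAT, START, CMD_INIT, sys_id, ip.encode("ascii"),
                       data_mode, sample_bits, sample_rate,
                       frame_unit, frame_bytes, END)


def unpack_init_request(packet):
    """Parse a P1 packet and return its fields as a dict."""
    (start, cmd, sys_id, ip, data_mode, sample_bits,
     sample_rate, frame_unit, frame_bytes, end) = struct.unpack(P1_FORMAT, packet)
    if start != START or end != END or cmd != CMD_INIT:
        raise ValueError("not a P1 initialization request")
    return {
        "sys_id": sys_id,
        "ip": ip.rstrip(b"\0").decode("ascii"),  # strip struct's null padding
        "data_mode": data_mode,
        "sample_bits": sample_bits,
        "sample_rate": sample_rate,
        "frame_unit": frame_unit,
        "frame_bytes": frame_bytes,
    }


# Illustrative values only; none of these numbers come from the source.
packet = pack_init_request(1, "192.168.0.1", 0x08, 16, 8000, 10, 160)
print(unpack_init_request(packet))
```

A fixed-width layout keeps the example short; a real header could equally use length-prefixed or text fields, which the source does not rule out.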

In the embodiment of the present invention, the voice recognition initialization performed through the preprocessing unit (100) and the voice recognition unit (200) is defined by the data formats of Table 1 and Table 2.

That is, the specific format of the input data of the voice recognition unit (200) is specified at initialization by the value of 'DATA_MODE' given in Table 1. Depending on that value, the equipment specifications of the preprocessing unit (100) and the voice recognition unit (200) can be chosen flexibly.

For example, when the vocabulary to be recognized is of moderate size but a multi-channel voice recognition apparatus is to be built, 'DATA_MODE' can be set to one of the three values 0x01, 0x02, or 0x04. The preprocessing unit (100) is then a simple unit responsible only for call processing, and the voice recognition processor in the voice recognition unit (200) is additionally given the feature extraction function that finds frequency components in the voice.

When 'DATA_MODE' is 0x08, the voice feature extraction function is implemented in the preprocessing unit (100), and the voice recognition unit (200) can be simple equipment that implements only voice recognition processing.
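The division of labor selected by 'DATA_MODE' can be summarized in a small lookup. The values 0x01, 0x02, 0x04, and 0x08 come from the description above; the function name and the returned labels are hypothetical, and rejecting any other value is an assumption, since the source does not say how unknown modes are handled.

```python
RAW_VOICE_MODES = {0x01, 0x02, 0x04}  # preprocessing unit only does call processing
FEATURE_MODE = 0x08                   # preprocessing unit also extracts features


def feature_extraction_site(data_mode):
    """Return which unit performs feature extraction for a DATA_MODE value."""
    if data_mode in RAW_VOICE_MODES:
        return "voice recognition unit"
    if data_mode == FEATURE_MODE:
        return "preprocessing unit"
    raise ValueError(f"unsupported DATA_MODE: {data_mode:#04x}")


for mode in (0x01, 0x08):
    print(f"{mode:#04x} ->", feature_extraction_site(mode))
```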

Once the initial environment has been set, voice recognition proceeds as follows: the preprocessing unit (100) requests recognition from the voice recognition unit (200) as defined in Table 3, and the voice recognition unit (200) returns the result in the data format of Table 4.

In the process shown in FIG. 2, the user requesting voice recognition processing can be identified exactly by the values of 'SYS_ID' and 'MSG_ID' in Tables 3 and 4.

Likewise, when a particular user is using the service of the voice recognition apparatus, the user's position in the scenario can be determined from 'RECOG_STEP' in Tables 3 and 4.
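One way to picture this matching is a table of outstanding requests keyed by ('SYS_ID', 'MSG_ID'), with 'RECOG_STEP' checked when the result arrives. The sketch below is an assumption about how a preprocessing unit could do the bookkeeping; the class `PendingRequests`, its methods, and the notion of a `channel` value stored per request are hypothetical and not taken from the source.

```python
class PendingRequests:
    """Tracks recognition requests until their results come back."""

    def __init__(self):
        self._pending = {}  # (sys_id, msg_id) -> (channel, recog_step)

    def register(self, sys_id, msg_id, channel, recog_step):
        """Record a request sent to the voice recognition unit."""
        self._pending[(sys_id, msg_id)] = (channel, recog_step)

    def resolve(self, sys_id, msg_id, recog_step):
        """Match a result to its request and return the originating channel.

        Raises KeyError for an unknown (sys_id, msg_id) pair and
        ValueError if the scenario step does not match the request.
        """
        channel, expected_step = self._pending[(sys_id, msg_id)]
        if recog_step != expected_step:
            raise ValueError("result is for a different scenario step")
        del self._pending[(sys_id, msg_id)]
        return channel


pending = PendingRequests()
pending.register(sys_id=1, msg_id=42, channel=7, recog_step=3)
print(pending.resolve(sys_id=1, msg_id=42, recog_step=3))  # channel of the caller
```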

Accordingly, because the preprocessing unit (100) and the voice recognition unit (200) are configured as a package in which they can communicate with each other, the voice recognition system capable of variable configuration according to channel capacity can very easily be expanded package by package according to channel capacity, and it is not tied to a particular voice recognition engine vendor, so the engine can be changed selectively when necessary.

The voice recognition system capable of variable configuration according to channel capacity and the method therefor are not limited to the embodiment described above, and various modifications are possible without departing from the technical gist of the invention.

As described above, in the voice recognition system capable of variable configuration according to channel capacity and the method therefor, the preprocessing unit, which performs call processing, and the voice recognition unit, which performs voice recognition, are configured as a package in which they can communicate with each other. Because the two communicate over sockets, the system can very easily be expanded package by package according to channel capacity while voice recognition performance is maintained, and it is not tied to a particular voice recognition engine vendor, so the engine can be changed selectively when necessary.

Claims

A voice recognition system capable of variable configuration according to channel capacity, comprising a preprocessing unit and a voice recognition unit, wherein,

when the format of the input data from the preprocessing unit to the voice recognition unit is specified according to channel capacity at voice recognition initialization, for mutual communication between the preprocessing unit and the voice recognition unit,

the preprocessing unit performs, according to the specified input data format, a feature extraction function of finding the start and end points of the input voice and transmitting it, or converting it into frequency-component data and transmitting that,

the voice recognition unit performs, according to the specified input data format, a voice recognition processing function of using the data from the preprocessing unit to search a registered vocabulary list for the vocabulary item judged most similar, and

the preprocessing unit and the voice recognition unit are configured to communicate over sockets, so that replacement and expansion can be carried out in package units.

The voice recognition system according to claim 1,

wherein the preprocessing unit is selectable, according to the specified input data format, between performing the feature extraction function and performing a simple call processing function of transmitting the input voice data to the voice recognition unit without processing.

The voice recognition system according to claim 1,

wherein the voice recognition unit is selectable, according to the specified input data format, between performing the voice recognition processing function alone and performing the voice recognition processing function and the feature extraction function together.

The voice recognition system according to any one of claims 1 to 3,

wherein the communication data between the preprocessing unit and the voice recognition unit is carried over a socket with a header describing the communication attached.

A voice recognition system capable of variable configuration according to channel capacity, wherein, when the preprocessing unit and the voice recognition unit are deployed separately on two or more systems, the IP address of the corresponding recognition system is entered and specified in the preprocessing unit and the voice recognition unit according to the assigned channel.

A voice recognition method capable of variable configuration according to channel capacity, comprising:

a first step of specifying, according to channel capacity at voice recognition initialization, the format of the input data from a preprocessing unit to a voice recognition unit, for mutual communication between the preprocessing unit and the voice recognition unit;

a second step in which the preprocessing unit performs, according to the specified input data format, a feature extraction function of finding the start and end points of the input voice and transmitting it, or converting it into frequency-component data and transmitting that; and

a third step in which the voice recognition unit performs, according to the specified input data format, a voice recognition processing function of using the data from the preprocessing unit to search a registered vocabulary list for the vocabulary item judged most similar, and in which voice recognition data is processed by having the preprocessing unit and the voice recognition unit communicate with each other over sockets.

The voice recognition method according to claim 6,

wherein, in the second step,

the preprocessing unit is selectable, according to the specified input data format, between performing the feature extraction function and performing a simple call processing function of transmitting the input voice data to the voice recognition unit without processing.

The voice recognition method according to claim 6,

wherein, in the third step,

the voice recognition unit is selectable, according to the specified input data format, between performing the voice recognition processing function and the feature extraction function together and performing the voice recognition processing function alone.