KR100667522B1

KR100667522B1 - Speech Recognition Method of Mobile Communication Terminal Using LPC Coefficient

Info

Publication number: KR100667522B1
Application number: KR1019980056241A
Authority: KR
Inventors: 성 희 박; 남 호 정; 창 업 황; 오 일 권
Original assignee: 주식회사 현대오토넷
Priority date: 1998-12-18
Filing date: 1998-12-18
Publication date: 2007-05-17
Also published as: KR20000040579A

Abstract

The present invention relates to a speech recognition method for a mobile communication terminal using LPC coefficients, in which linear predictive coding (LPC) coefficients for speech recognition are extracted from the line spectrum pair (LSP) code output by the terminal's vocoder rather than directly from pulse code modulation (PCM) speech data. The method comprises: a step (S1) of receiving data input in the form of fixed-format packets and extracting only the LSP code information from the packets; a step (S2) of converting the LSP code information extracted in step S1 into LSP frequencies; a step (S3) of receiving the signal output from step S2 and passing only its low-frequency components through a low-pass filter; a step (S4), performed after step S3, of converting the LSP frequencies into LPC coefficients so that the speech recognition unit can readily recognize the speech signal; and a step (S5) of receiving the signal converted into LPC coefficients in step S4 and converting it into an LPC cepstrum. The method thereby markedly reduces the amount of computation and improves data processing speed.

Description

Speech Recognition Method of Mobile Communication Terminal Using LPC Coefficient

The present invention relates to speech recognition in a mobile communication terminal, and more particularly to a speech recognition method for a mobile communication terminal using LPC coefficients, in which LPC (linear predictive coding) coefficients used for speech recognition are extracted from the LSP code (line spectrum pair code) output by the terminal's vocoder, without directly using PCM (pulse code modulation) speech data.

LPC cepstrum coefficients, which are widely used as speech feature coefficients in conventional speech recognition, are usually obtained from PCM data. Doing so requires computing autocorrelation coefficients, which considerably increases the amount of computation and therefore slows down the system.

To solve the above problem, an object of the present invention is to provide a speech recognition method for a mobile communication terminal using LPC coefficients, in which the speech feature coefficients are extracted from the LSP code output by the vocoder and the speech signal is recognized using the rate information of the packets.

To achieve the above object, the speech recognition method for a mobile communication terminal using LPC coefficients according to the present invention comprises: a step (S1) of receiving data input in the form of fixed-format packets and extracting only the LSP code information from the packets; a step (S2) of converting the LSP code information extracted in step S1 into LSP frequencies; a step (S3) of receiving the signal output from step S2 and passing only its low-frequency components through a low-pass filter; a step (S4), performed after step S3, of converting the LSP frequencies into LPC coefficients so that the speech recognition unit can readily recognize the speech signal; and a step (S5) of receiving the signal converted into LPC coefficients in step S4 and converting it into an LPC cepstrum.

The speech recognition method for a mobile communication terminal using LPC coefficients according to the present invention is described in detail below with reference to the accompanying drawings.

FIG. 1 is a schematic block diagram of the present invention. The apparatus comprises: a microphone (11) that receives a speech signal from outside and converts it into an electrical signal; a vocoder (12) that PCM-codes the electrical signal from the microphone (11) and then outputs it as packets of a fixed format in order to reduce the data size of the speech signal; a microcomputer (14) that receives the packet data output from the vocoder (12), detects speech sections, and converts the LSP coefficient values, which are codes for voice communication, into the LPC coefficients needed for speech recognition; a speech recognition unit (16) that receives the LPC values from the microcomputer (14), compares them with the speech signals stored in it, and recognizes the most similar word; and a driving unit (18) that, when the output of the speech recognition unit (16) is applied to the microcomputer (14), performs an operation such as dialing according to the control signal that the microcomputer (14) outputs based on the content of the speech signal.

The packet data output from the vocoder (12) includes rate, LSP, pitch, and codebook data, and the rate is determined by the following conditions.

If R(0) > TH3, the rate is 1;

if R(0) > TH2 and R(0) < TH3, the rate is 1/2;

if R(0) > TH1 and R(0) < TH2, the rate is 1/4; and

if R(0) < TH1, the rate is 1/8.

Here, TH1, TH2, and TH3 are thresholds for determining the rate and are determined according to the value of R(0), where R(0) is the energy of the current frame.

A rate of 1 indicates speech; a rate of 1/8 indicates silence or background sound; and a rate of 1/2 or 1/4 indicates a transition section between speech and silence.
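The rate conditions and their interpretation can be sketched as follows. This is a minimal illustration, not the vocoder's actual implementation: the function names `select_rate` and `frame_kind` are hypothetical, the thresholds are passed in as plain numbers, and because the text uses strict inequalities only, assigning a frame whose energy equals a threshold to the lower rate is an assumption.

```python
from fractions import Fraction


def select_rate(r0, th1, th2, th3):
    """Pick the packet rate from the frame energy r0 (thresholds th1 < th2 < th3).

    Assumption: when r0 equals a threshold exactly, the lower rate is chosen;
    the source text does not specify this case.
    """
    if r0 > th3:
        return Fraction(1)
    if r0 > th2:
        return Fraction(1, 2)
    if r0 > th1:
        return Fraction(1, 4)
    return Fraction(1, 8)


def frame_kind(rate):
    """Map a rate to the frame type described in the text."""
    if rate == Fraction(1):
        return "speech"
    if rate == Fraction(1, 8):
        return "silence_or_background"
    return "transition"  # rates 1/2 and 1/4
```

For example, with thresholds 1, 2, and 3, a frame energy of 2.5 selects rate 1/2, which `frame_kind` labels as a transition frame.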

The operation and effects of the present invention configured as above are described below with reference to the accompanying drawings.

First, when a speech signal is input from outside through the microphone (11), the microphone (11) converts it into an electrical signal, which is input to the vocoder (12), sampled at the sampling rate, and formed into packets of data carrying rate, LSP, pitch, codebook, and other information. The rate varies with the values of R(0), TH1, TH2, and TH3, and the amount of data varies accordingly. The packets are then input to the microcomputer (14).
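The packet contents listed above, and the extraction of only the LSP code information in step S1, might be modeled as in the sketch below. The text does not give the packet's bit layout, so the class `VocoderPacket`, its field names and types, and the function `extract_lsp_codes` are all hypothetical.

```python
from dataclasses import dataclass
from fractions import Fraction
from typing import List


@dataclass
class VocoderPacket:
    """Hypothetical in-memory view of one vocoder packet (layout assumed)."""
    rate: Fraction        # 1, 1/2, 1/4 or 1/8
    lsp_codes: List[int]  # quantized LSP codes for the frame
    pitch: int            # pitch data (representation assumed)
    codebook: int         # codebook data (representation assumed)


def extract_lsp_codes(packets):
    """Step S1: keep only the LSP code information from each packet."""
    return [packet.lsp_codes for packet in packets]
```

Pitch and codebook data are simply ignored here, which reflects the text's point that only the LSP codes are needed for recognition.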

The microcomputer (14) then receives the data input in the form of fixed-format packets as described above and extracts only the LSP code information from the packets (S1). The LSP code is converted into an LSP frequency (S2) by an equation in the following quantities: X is the LSP frequency, Q(X) is the LSP code, Q^max is the maximum quantization level, Q^min is the minimum quantization level, and N is the number of quantization bits.
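The conversion equation itself does not appear in this text, so the following is only one plausible reading of step S2: it assumes a uniform scalar quantizer in which the N-bit code Q(X) indexes equally spaced levels between Q^min and Q^max. The function name `lsp_code_to_frequency` and the choice of 2^N - 1 intervals are assumptions, not the patent's stated formula.

```python
def lsp_code_to_frequency(code, q_min, q_max, n_bits):
    """Map an N-bit LSP code to an LSP frequency.

    Assumption: uniform quantization, so code 0 maps to q_min and the
    largest code (2**n_bits - 1) maps to q_max.
    """
    top = (1 << n_bits) - 1
    if not 0 <= code <= top:
        raise ValueError("code does not fit in n_bits")
    return q_min + code * (q_max - q_min) / top
```

Under this assumption the conversion costs one multiply, one divide, and one add per LSP code.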

After the LSP frequencies are obtained in the second step (S2), a low-pass filter (not shown) removes the high-frequency components and passes only the low-frequency components (S3).
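The text does not specify the low-pass filter of step S3. One possible reading, sketched below, is a first-order recursive smoother applied to each LSP frequency along successive frames. The filter type, the smoothing factor `alpha`, the frame-by-frame application, and the function name `smooth_lsp_tracks` are all assumptions made for illustration.

```python
def smooth_lsp_tracks(frames, alpha=0.5):
    """Low-pass each LSP frequency track across frames.

    frames: list of per-frame LSP frequency lists, all the same length.
    Assumed filter: y[t] = alpha * x[t] + (1 - alpha) * y[t - 1],
    with the first frame passed through unchanged.
    """
    smoothed = []
    previous = None
    for frame in frames:
        if previous is None:
            current = list(frame)
        else:
            current = [alpha * x + (1.0 - alpha) * y
                       for x, y in zip(frame, previous)]
        smoothed.append(current)
        previous = current
    return smoothed
```

A smaller `alpha` suppresses frame-to-frame jitter more strongly at the cost of slower response to real spectral changes.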

After the third step (S3) is performed, the LSP frequency signal is converted into LPC coefficients so that the speech recognition unit (16) can readily recognize the speech signal (S4). The speech signal converted into LPC coefficients in step S4 is then converted into an LPC cepstrum (S5) according to two cases, where i and j denote indices and N is the LPC order:

1) when i ≤ N, Cep[i] = − [equation not reproduced]; and

2) when i > N, Cep[i] = [equation not reproduced].

The order of the LPC cepstrum is usually set higher than the LPC order.
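Steps S4 and S5 can be illustrated with the standard textbook procedures, which may or may not match the patent's exact equations since those are not reproduced in this text. The sketch assumes an even LPC order, LSP frequencies given in radians in ascending order, and the sign convention A(z) = 1 + a_1 z^-1 + ... + a_N z^-N, which is consistent with the leading minus sign in case 1. The function names are hypothetical.

```python
import math


def _poly_mul(a, b):
    """Multiply two polynomials given as coefficient lists in z^-1."""
    out = [0.0] * (len(a) + len(b) - 1)
    for i, x in enumerate(a):
        for j, y in enumerate(b):
            out[i + j] += x * y
    return out


def lsp_to_lpc(lsp):
    """Step S4 (standard method): LSP frequencies in radians -> a_1..a_N."""
    order = len(lsp)
    if order % 2 != 0:
        raise ValueError("this sketch handles even LPC orders only")
    p_poly = [1.0, 1.0]    # symmetric polynomial, trivial root at z = -1
    q_poly = [1.0, -1.0]   # antisymmetric polynomial, trivial root at z = +1
    for k, w in enumerate(lsp):
        factor = [1.0, -2.0 * math.cos(w), 1.0]
        if k % 2 == 0:     # 1st, 3rd, ... frequencies belong to P(z)
            p_poly = _poly_mul(p_poly, factor)
        else:              # 2nd, 4th, ... frequencies belong to Q(z)
            q_poly = _poly_mul(q_poly, factor)
    # A(z) = (P(z) + Q(z)) / 2; drop the leading 1
    return [0.5 * (p_poly[k] + q_poly[k]) for k in range(1, order + 1)]


def lpc_to_cepstrum(a, n_cep):
    """Step S5 (standard recursion): a_1..a_N -> Cep[1]..Cep[n_cep]."""
    n = len(a)
    cep = [0.0] * (n_cep + 1)  # cep[0] is unused
    for i in range(1, n_cep + 1):
        acc = 0.0
        for j in range(max(1, i - n), i):
            acc += j * cep[j] * a[i - j - 1]
        cep[i] = -acc / i
        if i <= n:             # case 1 adds the -a_i term; case 2 does not
            cep[i] -= a[i - 1]
    return cep[1:]
```

Neither function touches PCM samples or autocorrelation values, which is where the claimed computational saving comes from.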

The signal thus converted from LPC coefficients into an LPC cepstrum is input to the speech recognition unit (16), which compares it with the speech signals it already stores, determines whether they match, and outputs the result to the microcomputer (14).
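The text does not say how the speech recognition unit compares the input with its stored signals. As a minimal sketch only, the comparison could be a nearest-template search over fixed-length cepstral vectors using Euclidean distance. The metric, the fixed-length representation, the `templates` dictionary, and the function name `recognize` are all assumptions; a practical recognizer would also need time alignment across frames.

```python
import math


def recognize(features, templates):
    """Return (word, distance) for the stored template closest to features.

    templates: dict mapping a word to a reference cepstral vector of the
    same length as features (a hypothetical storage format).
    """
    best_word, best_dist = None, math.inf
    for word, reference in templates.items():
        dist = math.dist(features, reference)  # Euclidean distance
        if dist < best_dist:
            best_word, best_dist = word, dist
    return best_word, best_dist
```

The returned word would then be passed to the microcomputer, which issues the corresponding control signal.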

The microcomputer (14) then receives the comparison result output from the speech recognition unit (16) and outputs a control signal that operates the driving unit (18) according to the content of the speech signal.

The speech recognition method for a mobile communication terminal using LPC coefficients according to the present invention, configured as above, extracts LPC cepstrum coefficients, which are speech feature coefficients, directly from the LSP code, which is the terminal's communication code, and uses them for speech recognition. Without any change in hardware, it markedly reduces the amount of computation compared with extracting the feature coefficients from PCM speech data.

FIG. 1 is a schematic block diagram of a speech recognition apparatus for a mobile communication terminal using LPC coefficients according to the present invention, and

FIG. 2 is a flowchart of the speech recognition method for a mobile communication terminal using LPC coefficients according to the present invention.

<Description of reference numerals for the main parts of the drawings>

11: microphone    12: vocoder

14: microcomputer    16: speech recognition unit

18: driving unit

Claims

A speech recognition method for a mobile communication terminal using LPC coefficients, the method comprising:

a step (S1) of receiving packet data including rate, LSP, pitch, and codebook data and extracting only the LSP code information from the packets;

a step (S2) of converting the LSP code information extracted in step S1 into LSP frequencies;

a step (S3) of receiving the signal output from step S2 and passing only its low-frequency components through a low-pass filter;

a step (S4), performed after step S3, of converting the LSP frequencies into LPC coefficients so that the speech recognition unit can readily recognize the speech signal; and

a step (S5) of receiving the signal converted into LPC coefficients in step S4 and converting it into an LPC cepstrum,

wherein the rate is 1 when R(0) > TH3, 1/2 when R(0) > TH2 and R(0) < TH3, 1/4 when R(0) > TH1 and R(0) < TH2, and 1/8 when R(0) < TH1 (where TH1, TH2, and TH3 are thresholds for determining the rate and are determined according to the value of R(0), and R(0) is the energy of the current frame).