KR100397435B1

KR100397435B1 - Method for processing language model using classes capable of processing registration of new words in speech recognition system

Info

Publication number: KR100397435B1
Application number: KR1019960029444A
Authority: KR
Inventors: 홍준모
Original assignee: 엘지전자 주식회사
Priority date: 1996-07-20
Filing date: 1996-07-20
Publication date: 2003-12-24
Also published as: KR980011006A

Abstract

PURPOSE: A method for processing a language model using classes that can handle the registration of new words in a speech recognition system is provided. By generating registration classes as new words are added, the method improves the recognition rate without additional training on sentences. CONSTITUTION: When a new word is input through a keyboard, it is applied to a CPU and the CPU registers the new word (301). The CPU checks whether this is the first new word to be registered (302). If it is not, the CPU checks whether a new registration class is to be set (303). If not, the CPU inserts the new word into an existing class (304). If so, the new registration class is registered (305). Whenever a new registration class is generated, its connection probabilities with the other classes and the occurrence probabilities of words are determined (306). The sentence probability with which the inserted word occurs within the class is then determined (307).

Description

Language Model Processing Method Using Classes Capable of Handling Newly Registered Words in a Speech Recognition System

The present invention relates to a method for processing a language model in a speech recognition system by generating a registration class when a new word is added.

In general, a speech recognition system processes a language model in order to perform speech recognition. A language model can be defined as the formula that computes the probability that a given sequence of words occurs, together with the series of steps needed to obtain that probability. The usual operation of obtaining the probability that the sequence of words making up a sentence occurs is described with reference to FIG. 1. First, in step 101, the range and number of words to be recognized are determined using the keyboard, and sentences containing these words are constructed. In step 102, the connection probabilities between words are determined using the formula of the language model and the information in the given training sentences. Then, in step 103, once the connection probabilities between words have been determined, they are used to calculate the probability of a sentence. The language model is used to find the most plausible candidate among the various word sequences obtained through acoustic knowledge. A language model that assumes the previous N words influence the probability of the current word is called an N-gram language model. That is, assuming that words w1 through wN appear in sequence, the probability of that sequence is given by Equation <1>.
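Steps 101 to 103 can be illustrated for the bigram case with a minimal sketch. It is not the patent's Equation <1>, which does not appear in this text; it uses the standard relative-frequency bigram estimate, and the function names, the `<s>` sentence-start token, and the toy training sentences are all assumptions made for illustration.

```python
from collections import Counter

START = "<s>"  # hypothetical sentence-start marker, not from the patent


def train_bigram(sentences):
    """Step 102 (sketch): estimate Pr(cur | prev) by relative frequency."""
    history = Counter()  # how often each word appears as a predecessor
    pairs = Counter()    # how often each (prev, cur) pair appears
    for words in sentences:
        tokens = [START] + words
        for prev, cur in zip(tokens, tokens[1:]):
            history[prev] += 1
            pairs[(prev, cur)] += 1
    return {pair: n / history[pair[0]] for pair, n in pairs.items()}


def sentence_probability(model, words):
    """Step 103 (sketch): multiply the connection probabilities along the sentence."""
    tokens = [START] + words
    prob = 1.0
    for prev, cur in zip(tokens, tokens[1:]):
        prob *= model.get((prev, cur), 0.0)  # unseen pairs get probability 0
    return prob


# Step 101 (sketch): a toy vocabulary and training sentences.
training = [["call", "home"], ["call", "office"], ["dial", "home"]]
model = train_bigram(training)
p_seen = sentence_probability(model, ["call", "home"])      # 2/3 * 1/2
p_unseen = sentence_probability(model, ["dial", "office"])  # pair never observed
```

In this toy data the sentence "dial office" receives probability zero because the pair never occurs in training, which is the sparsity problem the description raises for word-level bigrams and trigrams.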

Among N-gram models, the bigram model, in which the probability is determined by the single immediately preceding word, and the trigram model, in which it is determined by the two immediately preceding words, are widely used. However, as the number of target words grows, it becomes difficult to handle all bigrams and trigrams. Because the training sentences cannot contain every possible word sequence, many bigrams and trigrams are simply absent, and the amount of information to be handled is too large to implement effectively on current systems. A technique was therefore introduced that groups words into sets called classes and uses these classes to obtain the occurrence probability of a word. This operation is described with reference to FIG. 2. In step 201, the range and number of words to be recognized are determined using the keyboard, and training sentences containing these words are constructed. In step 202, a fixed number of classes is defined for the training sentences, and the connection probabilities between classes and the probability that a word occurs within its class are determined by Equation <2>. As Equation <2> expresses, the probability that word w₂ follows word w₁ is the probability that class g₂, to which w₂ belongs, follows class g₁, to which w₁ belongs, multiplied by the probability that w₂ occurs within class g₂.
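The class-based rule described above, Pr(w₂ | w₁) = Pr(g₂ | g₁) · Pr(w₂ | g₂), can be sketched as follows. The relative-frequency estimates, the function names, and the toy word-to-class assignment are assumptions for illustration; the patent does not specify how the classes are chosen or how the counts are taken.

```python
from collections import Counter


def train_class_bigram(sentences, word_to_class):
    """Sketch of step 202: class-to-class and word-within-class probabilities."""
    class_history = Counter()  # times a class appears as a predecessor
    class_pairs = Counter()    # times (g1, g2) appear in sequence
    class_total = Counter()    # times any word of a class appears
    word_count = Counter()     # times each word appears
    for words in sentences:
        classes = [word_to_class[w] for w in words]
        for w, g in zip(words, classes):
            word_count[w] += 1
            class_total[g] += 1
        for g1, g2 in zip(classes, classes[1:]):
            class_history[g1] += 1
            class_pairs[(g1, g2)] += 1
    transition = {p: n / class_history[p[0]] for p, n in class_pairs.items()}
    emission = {w: n / class_total[word_to_class[w]] for w, n in word_count.items()}
    return transition, emission


def word_bigram_probability(w1, w2, word_to_class, transition, emission):
    """Pr(w2 | w1) = Pr(g2 | g1) * Pr(w2 | g2), as described for Equation <2>."""
    g1, g2 = word_to_class[w1], word_to_class[w2]
    return transition.get((g1, g2), 0.0) * emission.get(w2, 0.0)


# Hypothetical classes and training sentences.
word_to_class = {"call": "VERB", "dial": "VERB", "home": "PLACE", "office": "PLACE"}
training = [["call", "home"], ["call", "office"], ["dial", "home"]]
transition, emission = train_class_bigram(training, word_to_class)
p = word_bigram_probability("dial", "office", word_to_class, transition, emission)
```

Here "dial office" never appears in training, yet it receives a nonzero probability because the class pair (VERB, PLACE) was observed. Far fewer parameters need to be stored than for a full word bigram table.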

Then, in step 204, the sentence probability is obtained using the word occurrence probabilities determined in this way. The conventional method described above has a problem, however: if a new word is added after the word occurrence probabilities have been determined, the occurrence probability of the new word cannot be obtained from the classes that have already been fixed.

Accordingly, an object of the present invention is to provide a method of processing a language model in a speech recognition system by generating a registration class when a new word is added.

Another object of the present invention is to provide a language model processing method that can calculate the occurrence probability of a word sequence when a new word is added.

To achieve the above objects, the present invention is characterized in that, when a new word is registered, a new registration class is set, and the connection probabilities of the registration class and the occurrence probabilities of words are determined in order to process the language model.

Hereinafter, a preferred embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 3 is a block diagram of a speech recognition system according to an embodiment of the present invention.

Keyboard 10 generates the various function-key inputs used for speech recognition and applies them to CPU 12. ROM 14 stores a program for determining the occurrence probability of a word according to the classes registered for speech recognition. RAM 16 temporarily stores the various data used to determine the occurrence probability of a word. CPU 12 receives a newly registered word, generates a new class, and controls the calculation of the probability of the word added to that class. Microphone 18 converts the speech signal to be recognized into an electrical signal. A/D converter 20 converts the speech signal, now in electrical form, into a digital signal and applies it to CPU 12. Monitor 22 displays various data under the control of CPU 12.

FIG. 4 is a control flowchart of the language model processing applied to an embodiment of the present invention.

The operation of a preferred embodiment of the present invention will now be described in detail with reference to FIGS. 3 and 4.

First, in step 301, when a new word is input through keyboard 10, it is applied to CPU 12, and CPU 12 registers the new word in RAM 16. Then, in step 302, CPU 12 checks whether this is the first new word to be registered. If it is not the first word registration, the process proceeds to step 303, where CPU 12 checks whether a new registration class is to be set. If a new registration class is not to be set, the process proceeds to step 304: when the grammatical properties of the word are the same as those of an existing registration class, no new class needs to be created, so the word is inserted into the existing class and the process proceeds to step 306. If step 303 determines that a new registration class is to be set, the process proceeds to step 305 and the new registration class is registered. That is, assuming that a set of words has been fixed and the sentences have been trained with the words divided into K classes, K classes currently exist, so when the first new word is registered a (K+1)-th class is created. Next, in step 306, each time a registration class is newly created, its connection probabilities with all other classes and the occurrence probabilities of words are determined. That is, if a registration class g_R1 has been newly added, then in the bigram case Pr(g_i | g_R1) and Pr(g_R1 | g_i) are obtained for every existing class i. There are several ways to obtain these probabilities; for example, the one class among the existing K classes whose properties are most similar can be copied, or the average over several classes can be taken. Then, in step 307, for the word added to the class, the sentence probability with which the added word occurs within that class is determined.
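The flow of steps 301 to 307 can be sketched as below. This is an illustrative reading, not the patent's implementation. The class name `ClassLanguageModel`, the `similar` and `matching_class` parameters, and the uniform within-class word probability are all assumptions. In particular, the text does not say how step 303 decides whether a new class is needed, nor how step 307 assigns the probability. The sketch seeds the new class by averaging over caller-chosen similar classes (averaging over a single class is the same as copying it) and does not renormalize the existing rows.

```python
class ClassLanguageModel:
    def __init__(self, classes, transition, emission):
        self.classes = list(classes)        # the K classes obtained from training
        self.transition = dict(transition)  # (g_prev, g_next) -> Pr(g_next | g_prev)
        self.emission = {g: dict(ws) for g, ws in emission.items()}  # g -> {word: Pr(word | g)}
        self.registration_classes = []      # classes created for registered words

    def add_registration_class(self, name, similar):
        """Steps 305-306 (sketch): create a class and seed its connection probabilities."""
        n = len(similar)
        for g in list(self.classes):
            # Pr(g | new) and Pr(new | g), averaged over the similar classes.
            self.transition[(name, g)] = sum(self.transition.get((s, g), 0.0) for s in similar) / n
            self.transition[(g, name)] = sum(self.transition.get((g, s), 0.0) for s in similar) / n
        self.classes.append(name)
        self.registration_classes.append(name)
        self.emission[name] = {}

    def insert_word(self, word, cls):
        """Steps 304/307 (sketch): add the word; uniform Pr(word | class) is an assumption."""
        members = self.emission[cls]
        members[word] = 0.0
        uniform = 1.0 / len(members)
        for w in members:
            members[w] = uniform

    def register_word(self, word, similar, matching_class=None):
        """Steps 301-307 (sketch). `matching_class` stands in for the step 303 decision."""
        first_word = not self.registration_classes        # step 302
        if first_word or matching_class is None:          # step 303
            cls = f"g_R{len(self.registration_classes) + 1}"
            self.add_registration_class(cls, similar)     # steps 305-306
        else:
            cls = matching_class                          # step 304
        self.insert_word(word, cls)                       # step 307
        return cls


# Hypothetical trained model with K = 2 classes.
lm = ClassLanguageModel(
    classes=["VERB", "PLACE"],
    transition={("VERB", "PLACE"): 1.0, ("PLACE", "VERB"): 1.0},
    emission={"VERB": {"call": 0.5, "dial": 0.5}, "PLACE": {"home": 0.5, "office": 0.5}},
)
first = lm.register_word("school", similar=["PLACE"])
second = lm.register_word("library", similar=["PLACE"], matching_class="g_R1")
p_school_after_dial = lm.transition[("VERB", "g_R1")] * lm.emission["g_R1"]["school"]
```

With this seeding, a registered word immediately gets bigram probabilities against every trained class without any new training sentences. A practical system would probably also renormalize each row of connection probabilities after a class is added, which the sketch omits.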

As described above, when the word-to-word connection probabilities for a newly added word are determined in a speech recognition system, the present invention generates a registration class as the new word is added, and thus has the advantage of improving the speech recognition rate without separate training on sentences.

FIG. 1 is a control flowchart for obtaining the probability that the sequence of words making up a sentence occurs, in the general case.

FIG. 2 is a control flowchart for obtaining the occurrence probability of a word using classes, in the general case.

FIG. 3 is a block diagram of a speech recognition system according to an embodiment of the present invention.

FIG. 4 is a control flowchart of the language model processing applied to an embodiment of the present invention.

Claims

A language model processing method using classes capable of handling newly registered words in a speech recognition system,

the method being characterized in that, when a new word is registered, a new registration class is set, and the connection probabilities of the registration class and the occurrence probabilities of words are determined in order to process the language model.

checking, when a word is registered by a user, whether it is the first new word;

generating a new registration class if the registered word is the first new word;

detecting, if the registered word is not the first new word, whether a new registration class is to be set;

generating a new registration class if the new registration class is to be set; and

determining, after the new registration class has been generated, the connection probabilities of the registration class and the occurrence probabilities of words.