KR20010064247A

KR20010064247A - Method of using multi-level recognition unit for speech recognition

Info

Publication number: KR20010064247A
Application number: KR1019990062397A
Authority: KR
Inventors: 황규웅; 권오욱; 박준
Original assignee: 오길록; 한국전자통신연구원
Priority date: 1999-12-27
Filing date: 1999-12-27
Publication date: 2001-07-09

Abstract

PURPOSE: A method of using multi-level speech recognition units is provided that derives a language model from statistics covering all connection relations among recognition units of various levels, and that uses the recognition units of all those levels during recognition. CONSTITUTION: In a language model construction method using an n-gram, which specifies which words appear after a given word, an input sentence is divided into language units of various levels, and a language model is constructed in consideration of all connection relations among those units. In addition, in a speech recognition search method using an n-gram, a language model is first constructed and stored in the same way. Input speech is then divided into language units of various levels, and a matching sentence is searched for in the stored language model in consideration of all connection relations among those units.

Description

Method of using multi-level recognition unit for speech recognition

The present invention relates to a method of using recognition units for speech recognition, and more particularly to a method that derives a language model from statistics covering all connection relations among recognition units of several levels, and that also uses the recognition units of all those levels during search.

Conventionally, speech recognition has often been performed using either the eojeol (the space-delimited word unit of Korean) or the pseudo-morpheme as the recognition unit. An example from abroad is a method that, in addition to using conventional words as recognition units, treats a set phrase as a single recognition unit.

However, in an agglutinative language such as Korean, an eojeol, which corresponds to a word, can take many different endings and particles, so the number of distinct eojeol is very large. Methods using pseudo-morphemes and the like have therefore been proposed, but they still perform worse than methods that use the eojeol as the recognition unit.

The present invention was devised to solve the above problems of the prior art. Its object is to provide a method of using speech recognition units that derives a language model from statistics covering all connection relations among recognition units of several levels, and that also uses the recognition units of all those levels during search.

FIG. 1 is a block diagram of a speech recognition apparatus to which the present invention is applied;

FIG. 2 is a diagram conceptually explaining the role of the language model in speech recognition as applied to the present invention;

FIG. 3 is an example illustrating a method of obtaining a language model according to the prior art; and

FIG. 4 is a diagram conceptually illustrating a method of obtaining a language model according to an embodiment of the present invention.

According to the present invention for achieving the above object, there is provided a method of building a language model using an n-gram, which specifies which words appear after a given word, wherein an input sentence is divided into language units of various levels and the language model is then built in consideration of all connection relations among those language units.

There is also provided a speech recognition search method using an n-gram, which specifies which words appear after a given word, the method comprising: a first step of dividing an input sentence into language units of various levels and then building and storing a language model in consideration of all connection relations among those language units; and a second step of dividing input speech into language units of various levels and then searching the language model stored in the first step for a matching sentence in consideration of all connection relations among those language units.

There is also provided a computer-readable recording medium storing a program that causes a computer to execute: a first step of dividing an input sentence into language units of various levels and then building and storing a language model in consideration of all connection relations among those language units; and a second step of dividing input speech into language units of various levels and then searching the language model stored in the first step for a matching sentence in consideration of all connection relations among those language units.

More specifically, there is provided a computer-readable recording medium storing a program characterized in that the input sentence and the input speech are divided into the language units of phrase, eojeol, and morpheme.

A preferred embodiment of the present invention is described in detail below with reference to the accompanying drawings.

The main components needed for a computer to recognize speech are an acoustic model that models the speech, a word dictionary that defines the target words to be recognized, and a language model that describes how the words in the dictionary are connected to form sentences.

FIG. 1 is a block diagram of a speech recognition apparatus to which the present invention is applied. The apparatus comprises a microphone (102) as the input means, an analog-to-digital converter (A/D converter, 103), a feature extraction unit (104), an acoustic model training unit (107), a language model training unit (108), and an optimal model search unit (109).

When speech data is input through the microphone (102), the analog-to-digital converter (103) performs analog-to-digital conversion, and the feature extraction unit (104) extracts the features needed for speech recognition from the converted digital speech information.

The acoustic model and the language model used for speech recognition are obtained in a training process. When training speech data is input to the acoustic model training unit (107), it trains and stores the acoustic model; when training text data is input to the language model training unit (108), it trains and stores the language model. The stored language model is then used as connection information for the acoustic models.

The optimal model search unit (109) searches for the optimal model, and the word represented by that model is the recognition result.

The present invention is applied to language model training, where morpheme-to-eojeol connection probabilities are obtained in addition to eojeol-to-eojeol connection probabilities. In the optimal model search as well, the optimal model is found by considering the mixed morpheme-eojeol connection relations in addition to the eojeol-to-eojeol connection relations used conventionally.

FIG. 2 conceptually illustrates the role of the language model in speech recognition as applied to the present invention. It shows with what probability each word can appear after a given word. This information greatly helps speech recognition determine which word was spoken.

In FIG. 2, P(w1|w1) is the probability that w1 follows the word w1, P(w2|w1) is the probability that w2 follows the word w1, and P(wn|w1) is the probability that wn follows the word w1.

When the number of words to be recognized is small and their connection relations are simple and clear, a person describes the relations directly using a context-free grammar. When the number of target words is large and clear relations are hard to find, however, the grammar is generated by statistically extracting word connection relations from a large number of sentences in the target domain. The representative statistical method is the n-gram, which is based on how often sequences of n words occur.

For example, if only the connection between two words is modeled, the probability that w2 follows the word w1 is obtained by dividing Freq(w1, w2), the number of times w2 follows w1 in the training sentences, by Freq(w1), the total number of times w1 appears.

This is expressed as [Equation 1] below.

[Equation 1] Probability that w2 follows the word w1: P(w2|w1) = Freq(w1, w2) / Freq(w1)
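The estimate in Equation 1 can be illustrated with a minimal Python sketch. The function name `bigram_probabilities` and the second training sentence are assumptions made for illustration; they do not appear in the patent.

```python
from collections import Counter


def bigram_probabilities(sentences):
    """Estimate P(w2 | w1) = Freq(w1, w2) / Freq(w1) from tokenized sentences."""
    unigram = Counter()  # Freq(w1): total occurrences of each unit
    bigram = Counter()   # Freq(w1, w2): occurrences of w2 right after w1
    for tokens in sentences:
        unigram.update(tokens)
        bigram.update(zip(tokens, tokens[1:]))
    return {(w1, w2): n / unigram[w1] for (w1, w2), n in bigram.items()}


# Toy corpus; the second sentence is a hypothetical addition.
corpus = [
    ["나는", "밥을", "먹는다"],
    ["나는", "물을", "마신다"],
]
probs = bigram_probabilities(corpus)
print(probs[("나는", "밥을")])  # 0.5: '나는' occurs twice, once followed by '밥을'
```

Because the sketch divides by the total count of w1, as Equation 1 states, the probabilities leaving a unit that also ends a sentence sum to less than 1. Practical systems usually add smoothing for unseen pairs, which the patent does not discuss.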

FIG. 3 is an example illustrating a method of obtaining a language model according to the prior art, which is described in detail as follows.

First, an example in which the eojeol is used as the unit of recognition and language modeling is as follows.

Sentence: 나는 밥을 먹는다. ("I eat rice.")

Recognition units: '나는', '밥을', '먹는다'

Connection relations: 나는 --> 밥을

밥을 --> 먹는다

Next, an example in which a conventional method uses the morpheme as the unit of recognition and language modeling is as follows.

Sentence: 나는 밥을 먹는다. ("I eat rice.")

Recognition units: '나', '는', '밥', '을', '먹', '는다'

Connection relations: 나 --> 는

는 --> 밥

밥 --> 을

을 --> 먹

먹 --> 는다

FIG. 4 conceptually illustrates a method of obtaining a language model according to an embodiment of the present invention. The present invention uses both eojeol and morphemes as units of recognition and language modeling. An example is as follows.

Sentence: 나는 밥을 먹는다. ("I eat rice.")

Recognition units: '나는', '밥을', '먹는다', '나', '는', '밥', '을', '먹', '는다'

Connection relations: 나는 --> 밥을

밥을 --> 먹는다

나 --> 는

는 --> 밥

밥 --> 을

을 --> 먹

먹 --> 는다

나는 --> 밥

는 --> 밥을

밥을 --> 먹

을 --> 먹는다

The method proposed in the present invention regards '먹는다' as consisting of the pseudo-morphemes '먹' and '는다', and also takes into account the connection relations between these pseudo-morphemes and the eojeol.

That is, for this sentence a conventional language model uses either the three eojeol '나는', '밥을', '먹는다' as recognition units or, in the morpheme-based method, the six morphemes '나', '는', '밥', '을', '먹', '는다'. The present invention includes both, using all nine units '나는', '밥을', '먹는다', '나', '는', '밥', '을', '먹', '는다' as recognition units, and it also considers all of their connection relations.
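The nine units and eleven connection relations listed for the example sentence can be reproduced with a short Python sketch. The boundary rule used here is inferred from the listed example and is not stated explicitly in the patent: at each eojeol boundary, the left side is either the whole eojeol or its last morpheme, and the right side is either the whole next eojeol or its first morpheme. The function names and the hand-written segmentation are hypothetical; in practice a morphological analyzer would supply the segmentation.

```python
def multilevel_units(sentence):
    """Collect eojeol and morphemes as one unit list, eojeol first, without duplicates."""
    eojeols = [surface for surface, _ in sentence]
    morphemes = [m for _, ms in sentence for m in ms]
    return list(dict.fromkeys(eojeols + morphemes))


def multilevel_connections(sentence):
    """Enumerate connection pairs across eojeol and morpheme levels.

    Assumed rule (inferred from the patent's example, not stated in it):
    morphemes inside one eojeol connect in order, and at each eojeol
    boundary {eojeol, last morpheme} connects to {next eojeol, first morpheme}.
    """
    pairs = set()
    for _, morphemes in sentence:
        pairs.update(zip(morphemes, morphemes[1:]))  # within one eojeol
    for (left, left_ms), (right, right_ms) in zip(sentence, sentence[1:]):
        for a in (left, left_ms[-1]):
            for b in (right, right_ms[0]):
                pairs.add((a, b))
    return pairs


# Hand-segmented example sentence: (eojeol, [morphemes]).
sentence = [
    ("나는", ["나", "는"]),
    ("밥을", ["밥", "을"]),
    ("먹는다", ["먹", "는다"]),
]
units = multilevel_units(sentence)
pairs = multilevel_connections(sentence)
print(len(units), len(pairs))  # 9 11
```

Counting these pairs over a training corpus and applying Equation 1 would then give connection probabilities over the mixed unit set.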

In other words, the present invention regards a sentence as composed of phrases, a phrase as composed of eojeol, and an eojeol as composed of morphemes. It considers the connection relations among all of these and uses all of them as recognition units. The formula for obtaining the language model is the same as in the conventional method, but it is applied to units of various levels, such as eojeol and morphemes.

The present invention as described above can be recorded on a computer-readable recording medium and processed by a computer.

As described in detail above, the present invention obtains a more accurate language model by using multi-level recognition units, which improves the performance of speech recognition based on that model. It can also improve performance in other fields, such as language processing, in which a language model is applied to perform tasks automatically.

The technical idea of the present invention has been described above with reference to the accompanying drawings, but this description presents the most preferred embodiment by way of example and does not limit the invention. It is also evident that anyone with ordinary skill in this technical field can make various modifications and imitations without departing from the scope of the technical idea of the present invention.

Claims

A method of building a language model using an n-gram that specifies which words appear after a given word,

wherein an input sentence is divided into language units of various levels, and the language model is then built in consideration of all connection relations among the language units of the various levels.

The method of claim 1,

wherein the input sentence is divided into the language units of phrase, eojeol, and morpheme.

A speech recognition search method using an n-gram that specifies which words appear after a given word, the method comprising:

a first step of dividing an input sentence into language units of various levels, and then building and storing a language model in consideration of all connection relations among the language units of the various levels; and

a second step of dividing input speech into language units of various levels, and then searching the language model stored in the first step for a matching sentence in consideration of all connection relations among the language units of the various levels.

The method of claim 3,

wherein the input sentence and the input speech are divided into the language units of phrase, eojeol, and morpheme.

A computer-readable recording medium storing a program that causes a computer to execute:

a second step of dividing input speech into language units of various levels, and then searching the language model stored in the first step for a matching sentence in consideration of all connection relations among the language units of the various levels.

The recording medium of claim 5,

wherein the program divides the input sentence and the input speech into the language units of phrase, eojeol, and morpheme.