KR20030010979A

KR20030010979A - Continuous speech recognization method utilizing meaning-word-based model and the apparatus

Info

Publication number: KR20030010979A
Application number: KR1020010045687A
Authority: KR
Inventors: 강현석
Original assignee: Samsung Electronics Co., Ltd. (삼성전자주식회사)
Priority date: 2001-07-28
Filing date: 2001-07-28
Publication date: 2003-02-06

Abstract

PURPOSE: A continuous speech recognition method and system using a meaning-word unit model are provided. They improve the speech recognition rate by using a meaning-word model in addition to a phoneme model. CONSTITUTION: A feature vector to be used for speech recognition is extracted from an input speech signal. The recognizer (320) performs pattern matching on the extracted feature vector using a phoneme unit model and a meaning-word unit model, which treats meaning words essential to constructing a sentence as separate recognition units. An optimal sentence is then selected from the sentences produced by pattern matching, using the information carried by the meaning words.

Description

Continuous speech recognition method and apparatus using a meaning-word unit model

The present invention relates to a speech recognition method and apparatus and, more particularly, to a speech recognition method and apparatus using a meaning-word unit model.

Conventional speech recognition methods are divided by recognition unit into word-based and phoneme-based speech recognition. A word-based method sets the recognition target in units of words: the recognizer learns the words and, for a given input, outputs the most similar word. Word-based recognition can give excellent results when it is well trained, but word-unit recognition requires trained models and training data for tens of thousands of words. Training all word models equally well is very difficult, and keeping tens of thousands of word models also takes a large amount of memory. In short, word-based recognition is hard to train sufficiently and is costly.

A phoneme-based method recognizes phonemes and assembles them into words. It performs relatively well at a lower cost than the word-based method, but a phoneme recognition failure can be fatal to word formation.

A function-word model has also been used to combine the advantages of word-based and phoneme-based recognition. Function words appear frequently in sentences but are hard to hear in spoken conversation, for example 'a', 'an', 'in', and 'and' in English; the function-word approach gives each of them its own word model as a recognition unit. Because the recognition unit models are built for function words, that is, prepositions and conjunctions in English and postpositional particles in Korean, the recognizer outputs the most similar word found by pattern matching against these pre-trained word models. In other words, when speech is input, ordinary words are recognized in phoneme units and then assembled into words by consulting the pronunciation dictionary, whereas function words are recognized directly as whole words without consulting the dictionary. Because whole words are compared and analyzed at once, the recognition results are highly reliable, but training the word models optimally is very difficult.
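The split between whole-word function-word units and dictionary-expanded phoneme units can be sketched as follows. This is a minimal Python illustration, not the patent's implementation: the function-word set, the toy pronunciation dictionary, and the `<word>` naming for whole-word units are all hypothetical.

```python
# Hypothetical unit inventory: function words are recognized as whole-word
# units; every other word is expanded into phoneme units via a dictionary.
FUNCTION_WORDS = {"a", "an", "in", "and"}

# Toy pronunciation dictionary (ARPAbet-like symbols, illustrative only).
PRONUNCIATION_DICT = {
    "book": ["B", "UH", "K"],
    "room": ["R", "UW", "M"],
}


def expand_to_units(words):
    """Return the sequence of recognition-unit models a decoder would chain."""
    units = []
    for word in words:
        if word in FUNCTION_WORDS:
            # One whole-word model; no pronunciation-dictionary lookup.
            units.append(f"<{word}>")
        else:
            # Concatenate phoneme models taken from the dictionary.
            units.extend(PRONUNCIATION_DICT[word])
    return units


print(expand_to_units(["book", "a", "room"]))
# ['B', 'UH', 'K', '<a>', 'R', 'UW', 'M']
```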

Many papers have shown that a function-word unit model improves the recognition rate compared with not using one. It also avoids the training problem of word-unit recognition. However, the method is limited to function words, so recognition errors remain for the words that carry the information most important to constructing an actual sentence.

Moreover, whereas isolated-word recognition can recognize a desired word with only trained phoneme models and a pronunciation dictionary, continuous speech recognition additionally uses a word network that links the recognized words into a sentence. How the word network is constructed determines the size of the search space, which strongly affects the recognizer's performance, so the word network plays an important role in recognizing the correct sentence.

Word networks are mainly implemented as word-pair grammars, N-grams, or finite state automata (FSA). The main idea is to link each word to the words that may follow it, either by fixed rules or with statistical probabilities. A word-pair grammar links only the words allowed to follow a particular word. For example, "먹고" ("eat") followed by "싶습니다" ("want to") can be connected in that order, but not in reverse. An example of a word network built from a word-pair grammar is shown in FIG. 2. An N-gram uses statistical probabilities for word-to-word links: the probability that one word follows another is estimated from a training corpus, and the search proceeds toward the more probable words. An FSA bundles all the sentences it can form into a single network. This makes recognition faster, but any sentence outside the built-in set cannot be recognized.
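The rule-based and statistical link styles described above can be contrasted in a short Python sketch. The toy grammar and corpus are invented for illustration, and the bigram estimator is plain maximum-likelihood counting with no smoothing, which a practical N-gram model would need.

```python
from collections import Counter, defaultdict

# Word-pair grammar: a fixed rule set listing the words allowed to follow each word.
WORD_PAIRS = {
    "먹고": {"싶습니다"},  # "eat" may be followed by "want to"
    "싶습니다": set(),     # nothing may follow "want to" in this toy grammar
}


def allowed_by_word_pairs(words):
    """True if every adjacent pair is licensed by the word-pair grammar."""
    return all(nxt in WORD_PAIRS.get(cur, set())
               for cur, nxt in zip(words, words[1:]))


def train_bigram(corpus):
    """Bigram case of an N-gram: P(next | current) by relative frequency."""
    pair_counts = defaultdict(Counter)
    for sentence in corpus:
        for cur, nxt in zip(sentence, sentence[1:]):
            pair_counts[cur][nxt] += 1
    return {cur: {nxt: n / sum(counts.values()) for nxt, n in counts.items()}
            for cur, counts in pair_counts.items()}


print(allowed_by_word_pairs(["먹고", "싶습니다"]))  # True
print(allowed_by_word_pairs(["싶습니다", "먹고"]))  # False

bigram = train_bigram([["I", "want", "tea"], ["I", "want", "coffee"], ["I", "need", "tea"]])
# The search prefers the more probable successor of "I".
print(max(bigram["I"], key=bigram["I"].get))  # want
```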

Existing word networks are therefore limited in how far they can shrink the search space while still allowing free sentence construction. Shrinking the search space reduces the freedom of the sentences that can be recognized (especially for Korean, whose word order is free); conversely, recognizing sentences in many different word orders enlarges the search space.

To solve these problems, the present invention builds on phoneme-unit speech recognition while exploiting the advantages of word-unit recognition. Unlike function-word unit recognition, it designs the sentence components and key words that play an important role in grasping sentence structure and understanding meaning as separate recognition units (a "meaning-word model") in order to improve the recognition rate.

The present invention also reduces the search space by using a word network built around meaning words, improving recognition performance, and uses semantic information to correct errors in the recognition results and derive the optimal result.

FIG. 1A is a block diagram of a speech recognizer based on a conventional function-word unit recognition model.

FIG. 1B is a block diagram of a speech recognizer based on a conventional phoneme unit recognition model.

FIG. 2 is a diagram of a word network implemented with a conventional word-pair grammar.

FIG. 3 is a schematic block diagram of a continuous speech recognition system according to the present invention.

FIG. 4A is a conceptual diagram of a conventional phoneme-unit HMM.

FIG. 4B is a conceptual diagram of a meaning-word unit HMM according to the present invention.

FIG. 5 is a conceptual diagram of a meaning-word-centered word network according to the present invention.

* Explanation of reference numerals for the main parts of the drawings *

300: preprocessor

310: learner

320: recognizer

360: post-processor

To solve the technical problems above, the present invention treats sentence components that play an important role in grasping the structure and understanding the meaning of a spoken sentence, such as verbs and key words, as separate speech recognition unit models; builds the word network around meaning words; and, in the post-processor as well, uses semantic information related to the meaning words to find the optimal recognition result.

One aspect of the present invention relates to a speech recognition method comprising: extracting, from an input speech signal, a feature vector to be used for recognition; and performing pattern matching on the extracted feature vector using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units.

Another aspect of the present invention relates to a continuous speech recognition method comprising: extracting, from an input speech signal, a feature vector to be used for recognition; performing pattern matching on the extracted feature vector using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units, together with a meaning-word-centered word network constructed from the semantic information of the meaning words; and selecting an optimal sentence from the sentences resulting from the pattern matching, using the information carried by the meaning words.

Another aspect of the present invention relates to a speech recognizer comprising: a preprocessor that extracts, from an input speech signal, a feature vector to be used for recognition; and a recognizer that performs pattern matching on the feature vector output from the preprocessor using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units.

Yet another aspect of the present invention relates to a continuous speech recognition system comprising: a preprocessor that extracts, from an input speech signal, a feature vector to be used for recognition; a recognizer that performs pattern matching on the feature vector output from the preprocessor using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units, together with a meaning-word-centered word network constructed from the semantic information of the meaning words; and a post-processor that selects an optimal sentence from the recognizer's pattern-matching result sentences, using the information carried by the meaning words.

Preferably, in the speech recognition method and speech recognizer, the meaning words include verbs or key words that play an important role in sentence construction.

Also preferably, in the speech recognition method and speech recognizer, the input speech signal includes conversational speech related to reservations or securities services.

The present invention will now be described in detail with reference to FIGS. 3 to 5.

FIG. 3 shows an example of a speech recognition apparatus according to the present invention. The apparatus includes a preprocessor 300, a learner 310, a recognizer 320, and a post-processor 360.

The preprocessor 300 extracts, from the incoming analog speech signal, the feature vectors to be used for training and recognition. The speech signal entering the upper input of the preprocessor is data for recognition, and the speech data entering the lower input is data for training.
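The preprocessor's framing step can be illustrated with a minimal Python sketch. The patent does not specify the feature type, so the 25 ms frames with a 10 ms hop, the Hamming window, and the single log-energy value per frame are assumptions chosen for brevity; a practical front end would compute richer spectral features per frame. The input is assumed to be already digitized.

```python
import math


def frame_log_energies(samples, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Split a digitized signal into overlapping Hamming-windowed frames and
    return one feature vector (here only log energy) per frame."""
    frame_len = int(sample_rate * frame_ms / 1000)  # 400 samples at 16 kHz
    hop = int(sample_rate * hop_ms / 1000)           # 160 samples at 16 kHz
    window = [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
              for n in range(frame_len)]
    features = []
    for start in range(0, len(samples) - frame_len + 1, hop):
        frame = samples[start:start + frame_len]
        energy = sum((s * w) ** 2 for s, w in zip(frame, window))
        features.append([math.log(energy + 1e-10)])  # floor avoids log(0)
    return features


# One second of a 440 Hz tone sampled at 16 kHz.
tone = [math.sin(2 * math.pi * 440 * t / 16000) for t in range(16000)]
feats = frame_log_energies(tone)
print(len(feats))  # 98
```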

The learner 310 statistically trains the basic units of recognition, which may be phonemes, morphemes, or words. In the present invention, the learner generates a phoneme model and a meaning-word unit model as the recognition unit models used by the recognizer. That is, the learner 310 starts from a phoneme unit model but treats main sentence components, verbs, and key words as separate recognition units, i.e., meaning-word units, and trains models for them. Training uses the method of an existing speech recognizer, but the training data is adjusted so that the meaning-word models and the phoneme unit models are trained equally. The number of meaning-word models could be quite large, but if the conversation domain is restricted, for example to hotel reservations or securities services, only a small number of meaning-word models is needed and they are easy to train. Meaning-word models raise the recognition rate for the words that play an important role in constructing a sentence and giving it meaning, which enables accurate semantic analysis of the spoken sentence.
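One possible way to adjust the training data so that meaning-word and phoneme models are trained equally is inverse-frequency weighting, sketched below in Python. This is an assumption: the patent does not say how the adjustment is made, and resampling or capping the examples per unit would be equally plausible. The `<예약하->` label is a hypothetical name for a meaning-word unit.

```python
from collections import Counter


def balancing_weights(unit_labels):
    """Per-token weights so that every recognition unit, phoneme or meaning
    word, contributes the same total weight to training."""
    counts = Counter(unit_labels)
    target = len(unit_labels) / len(counts)  # equal share per unit
    return {unit: target / n for unit, n in counts.items()}


# Toy label stream: frequent phonemes and one rarer meaning-word unit.
labels = ["AA"] * 6 + ["K"] * 3 + ["<예약하->"] * 1
weights = balancing_weights(labels)
# The rare meaning-word unit is up-weighted relative to the frequent phoneme.
print(weights["<예약하->"] > weights["AA"])  # True
```

Each unit's weight times its count equals the same target share, so a frequent phoneme cannot dominate a rare meaning word during training.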

The recognizer 320 performs pattern matching on the feature vectors of the speech signal output from the preprocessor, using the recognition unit models 330 trained by the learner 310, the pronunciation dictionary 340, and the meaning-word-centered word network 350. If the recognizer is implemented with a hidden Markov model (HMM), a phoneme unit model typically has a three-state left-to-right topology, as shown in FIG. 4A, while a meaning-word unit model has a left-to-right topology with more states, as shown in FIG. 4B.
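The two topologies of FIGS. 4A and 4B can be sketched as left-to-right transition matrices, scored with the Viterbi algorithm as an HMM recognizer would do during pattern matching. This toy Python illustration rests on assumptions: the self-loop probability, the discrete emission function, and sizing a meaning-word model at three states per phoneme are hypothetical; the patent says only that a meaning-word model has more states than the three-state phoneme model.

```python
import math


def left_to_right_transitions(n_states, self_loop=0.6):
    """Left-to-right HMM topology: each state either repeats or advances by one."""
    A = [[0.0] * n_states for _ in range(n_states)]
    for i in range(n_states - 1):
        A[i][i] = self_loop
        A[i][i + 1] = 1.0 - self_loop
    A[-1][-1] = 1.0  # the last state only self-loops in this toy model
    return A


def viterbi_log_score(A, emit_logprob, observations):
    """Best-path log likelihood, starting in state 0 and ending in the last state."""
    n = len(A)
    neg_inf = float("-inf")
    delta = [neg_inf] * n
    delta[0] = emit_logprob(0, observations[0])
    for obs in observations[1:]:
        new_delta = [neg_inf] * n
        for j in range(n):
            best = max((delta[i] + math.log(A[i][j])
                        for i in range(n) if A[i][j] > 0.0), default=neg_inf)
            if best > neg_inf:
                new_delta[j] = best + emit_logprob(j, obs)
        delta = new_delta
    return delta[-1]


def toy_emission(state, symbol):
    # Discrete toy emission: state k mostly emits symbol k.
    return math.log(0.9) if symbol == state else math.log(0.1)


phoneme_hmm = left_to_right_transitions(3)        # FIG. 4A style, 3 states
meaning_word_hmm = left_to_right_transitions(12)  # assumed: 3 states x 4 phonemes
print(viterbi_log_score(phoneme_hmm, toy_emission, [0, 0, 1, 2, 2]))
```

The best path for the sample observations is states 0, 0, 1, 2, 2, so the score equals five matching emissions plus one self-loop, two advances, and a final self-loop of probability 1.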

The present invention also builds the word network 350 used for continuous speech recognition around meaning words, using the intention and meaning information each meaning word carries. For example, given a meaning-word model for "예약하-" ("reserve"), its intention concerns a reservation, and the information it carries includes the subject, date, object, and number of the reservation. From this information a meaning-word-centered word network can be built, as shown in FIG. 5. Once a meaning word is detected in such a network, the categories to which the words produced during recognition belong can be used to check whether the sentence is being recognized correctly. For example, if a meaning word for "reservation" has been detected but a category entirely unrelated to reservations appears during recognition, the sentence has been misrecognized and recognition need not continue along that path. In this way the search space can be reduced.
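The pruning rule above can be sketched in Python as a filter over partial hypotheses. The slot names follow the "예약하-" example in the text, but the category lexicon, the word segmentation, and the `consistent` and `prune` helpers are hypothetical; a real decoder would apply such a check inside its beam search rather than on complete word lists.

```python
# Hypothetical slot categories licensed by each meaning word (stem form).
MEANING_WORD_SLOTS = {
    "예약하": {"subject", "date", "object", "count"},  # "reserve"
}

# Hypothetical lexicon mapping ordinary words to word-network categories.
WORD_CATEGORY = {
    "제가": "subject",      # "I"
    "내일": "date",         # "tomorrow"
    "방": "object",         # "room"
    "두": "count",          # "two"
    "주가": "stock_price",  # "stock price"
}


def consistent(hypothesis):
    """False as soon as a word's category lies outside the slots of the
    detected meaning word; such a partial path can be dropped from the search."""
    meaning = next((w for w in hypothesis if w in MEANING_WORD_SLOTS), None)
    if meaning is None:
        return True  # no meaning word detected yet, so nothing to check against
    slots = MEANING_WORD_SLOTS[meaning]
    return all(WORD_CATEGORY[w] in slots for w in hypothesis if w in WORD_CATEGORY)


def prune(hypotheses):
    """Keep only partial hypotheses consistent with their meaning word."""
    return [h for h in hypotheses if consistent(h)]


beam = [["제가", "내일", "방", "예약하"], ["주가", "예약하"], ["제가", "내일"]]
print(prune(beam))  # [['제가', '내일', '방', '예약하'], ['제가', '내일']]
```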

The post-processor 360 finds, among the N recognition result sentences output by the recognizer 320, the sentence with the fewest errors, again using the information carried by the meaning words to find the sentence that best fits the meaning. When several meaning words appear in a recognized sentence, it ranks them by priority to identify the word category best suited to constructing the sentence. For example, if the sentence "제가 어제 예약한 것 확인해 주세요" ("Please confirm what I reserved yesterday") contains the two meaning words "예약한" ("reserved") and "확인해" ("confirm"), the meaning word that plays the important role here is "확인해". The post-processor therefore needs to check whether the sentence passed through the word network for the meaning word "확인하다" ("to confirm"); if it did not, the post-processor searches the N candidate sentences for the optimal one.
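The post-processing step can be sketched in Python as an N-best selection loop. The fixed priority table, slot sets, category lexicon, and scores are hypothetical; the patent states only that meaning words are prioritized and that a candidate is checked against the word network of its dominant meaning word, with the N-best list searched otherwise.

```python
# Hypothetical priority table: a higher value means the meaning word is more
# likely to carry the sentence's main intent.
PRIORITY = {"확인하": 2, "예약하": 1}  # "confirm" outranks "reserve"

# Hypothetical slot sets and word categories for each meaning word's network.
SLOTS = {"확인하": {"subject", "date", "object"},
         "예약하": {"subject", "date", "object", "count"}}
CATEGORY = {"제가": "subject", "어제": "date", "것": "object", "주가": "stock_price"}


def dominant_meaning_word(words):
    found = [w for w in words if w in PRIORITY]
    return max(found, key=PRIORITY.get) if found else None


def fits_network(words, meaning):
    return all(CATEGORY[w] in SLOTS[meaning] for w in words if w in CATEGORY)


def select_best(nbest):
    """nbest: list of (words, score). Return the highest-scoring candidate that
    passes the network of its dominant meaning word; else the top score."""
    ranked = sorted(nbest, key=lambda cand: cand[1], reverse=True)
    for words, _ in ranked:
        meaning = dominant_meaning_word(words)
        if meaning is not None and fits_network(words, meaning):
            return words
    return ranked[0][0]


# Stems stand in for inflected forms ("예약한" -> "예약하", "확인해" -> "확인하").
nbest = [(["제가", "어제", "예약하", "것", "주가", "확인하"], -10.0),
         (["제가", "어제", "예약하", "것", "확인하", "주세요"], -12.0)]
print(select_best(nbest))  # ['제가', '어제', '예약하', '것', '확인하', '주세요']
```

The top-scoring candidate is rejected because "주가" (stock price) falls outside the "confirm" network, so the second candidate is chosen.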

According to the present invention as described above, speech recognition performance can be improved by using as recognition unit models not only a phoneme unit model but also a meaning-word unit model built on meaning words, such as verbs, that are central to sentence construction.

Further, according to the present invention, constructing the word network around meaning words reduces the search space.

Further, according to the present invention, applying meaning-word information to the recognition candidates output through the word network makes it possible to find the sentence with the fewest errors.

Claims

A speech recognition method, comprising:

extracting, from an input speech signal, a feature vector to be used for recognition; and

performing pattern matching on the extracted feature vector using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units.

A continuous speech recognition method, comprising:

extracting, from an input speech signal, a feature vector to be used for recognition;

performing pattern matching on the extracted feature vector using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units, together with a meaning-word-centered word network constructed from the semantic information of the meaning words; and

selecting an optimal sentence from the sentences resulting from the pattern matching, using the information carried by the meaning words.

The speech recognition method according to claim 1 or 2,

wherein the meaning words include verbs or key words that are important to sentence construction.

The speech recognition method, wherein the input speech signal includes conversational speech related to reservations or securities services.

A speech recognizer, comprising:

a preprocessor that extracts, from an input speech signal, a feature vector to be used for recognition; and

a recognizer that performs pattern matching on the feature vector output from the preprocessor using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units.

A continuous speech recognizer, comprising:

a recognizer that performs pattern matching on the feature vector output from the preprocessor using, as recognition unit models, a phoneme unit model and a meaning-word unit model in which meaning words that play an important role in sentence construction are separate recognition units, together with a meaning-word-centered word network constructed from the semantic information of the meaning words; and

a post-processor that selects an optimal sentence from the recognizer's pattern-matching result sentences, using the information carried by the meaning words.

The speech recognizer according to claim 5 or 6,

wherein the meaning words include verbs or key words that are important to sentence construction.

The speech recognizer, wherein the input speech signal includes conversational speech related to reservations or securities services.