KR20040072104A

KR20040072104A - Method for enhancing search speed of speech recognition system

Info

Publication number: KR20040072104A
Application number: KR1020030008038A
Authority: KR
Inventors: 김지환
Original assignee: LG Electronics Inc. (엘지전자 주식회사)
Priority date: 2003-02-08
Filing date: 2003-02-08
Publication date: 2004-08-18
Also published as: KR100480039B1

Abstract

PURPOSE: A method for improving the search speed of a speech recognizer is provided. Instead of computing the likelihood of the input speech for every path, the method computes an upper bound on the maximum likelihood at the current node, and it builds a reduced lexical tree on which a preliminary search is run, thereby reducing the total search time. CONSTITUTION: State merging is defined so that each mixture weight of the merged state is the maximum of the corresponding weights of the states being merged, and the transition probability of the merged state is the largest of their transition probabilities. The method defines the maximum likelihood among all paths reaching a state, and the likelihood obtained for the remaining input after merging that state with all of its descendant nodes. A state that satisfies a predetermined condition is pruned so that it is not searched further. To reduce the lexical tree while preserving the order dependency among states, state merging is applied only to adjacent states rather than to all descendant nodes.

Description

METHOD FOR ENHANCING SEARCH SPEED OF SPEECH RECOGNITION SYSTEM

The present invention relates to a technique for improving search speed in a speech recognizer by using state merging. In particular, it relates to a method that uses state merging to speed up the search over the paths that generate probability values in a speech recognition system based on a semi-continuous hidden Markov model or a discrete hidden Markov model.

A speech recognizer outputs the registered word (or sequence of registered words) that is probabilistically most similar to the input speech. In continuous speech recognition, therefore, the probability of the input speech being uttered must be obtained for every combination of registered words.

The process of finding, among the paths that can generate a probability value, the path that generates the highest probability value for the input speech is called search. Searching every possible path is possible in theory, but as the number of words registered in the pronunciation dictionary grows, time and memory constraints make an exhaustive search practically impossible.

When a network is built from the possible paths, removing specific paths from that network is called pruning. FIG. 1 shows the beam pruning method, which is widely used.

In beam pruning, the truly optimal path for the input can be pruned, and the amount of pruning grows as the beam width shrinks. In particular, when the speech signal is distorted by noise or the like, the likelihood of the truly optimal path may fall outside the beam early in the search, which raises the chance that the optimal path is pruned. The beam width therefore imposes a trade-off between recognition time and recognition rate.
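The trade-off described above can be illustrated with a minimal sketch of beam pruning in its common textbook form, which keeps only hypotheses whose log likelihood lies within a fixed width of the current best. FIG. 1 is not reproduced here, so the function name `beam_prune`, the path labels, and the scores are all hypothetical.

```python
def beam_prune(log_scores, beam_width):
    """Keep hypotheses whose log likelihood is within beam_width of the best one."""
    best = max(log_scores.values())
    return {path: score for path, score in log_scores.items()
            if score >= best - beam_width}


# Hypothetical partial-path log likelihoods at one time step.
scores = {"path_a": -10.0, "path_b": -12.5, "path_c": -30.0}

wide = beam_prune(scores, beam_width=25.0)    # large beam: slower, fewer losses
narrow = beam_prune(scores, beam_width=2.0)   # small beam: faster, more pruning
print(sorted(wide), sorted(narrow))
```

With the narrow beam, `path_b` is discarded even though it could still overtake `path_a` on later frames, which is the failure mode the text attributes to noisy input.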

Thus, a conventional speech recognizer determines the optimal path only after computing the likelihood of the input speech for every path that can be formed from the descendant nodes, and it therefore has the drawback that the path search takes a long time.

Accordingly, a first object of the present invention is to improve pruning by computing, at the current node, an upper bound on the maximum likelihood, instead of computing the likelihood of the input speech for every path that can be formed from the descendant nodes during the search.

A second object of the present invention is to build a lexical tree smaller than the given lexical tree and to run a preliminary search on it, thereby reducing the total search time.

FIG. 1 is an explanatory diagram of a beam pruning method according to the prior art.

FIG. 2 is an explanatory diagram of how the mixture weights are determined by the state merging of the present invention.

FIG. 3 is an explanatory diagram of the pruning method according to the present invention.

FIG. 4 shows an example of applying the tree size reduction method according to the present invention.

According to a first feature of the present invention, a processing method for merging two or more states is provided. This method is applicable to speech recognition systems that use a semi-continuous hidden Markov model (SCHMM: Semi-Continuous HMM) or a discrete hidden Markov model (DHMM: Discrete HMM).

According to a second feature of the present invention, building on the merging method of the first feature, a processing method is provided that determines, using only the merge result, an upper bound on the maximum likelihood of the input speech over all of the merged states.

According to a third feature of the present invention, a processing method is provided that generates a reduced lexical tree based on the merging method of the first feature.

The method for enhancing the search speed of a speech recognizer according to the present invention comprises: a first step of defining state merging such that each mixture weight of the merged state is the maximum of the weights of the corresponding mixture in the states before merging, and the transition probability of the merged state is the largest of the transition probabilities among the states before merging; a second step of defining the maximum likelihood $\delta_t(j)$ among all paths reaching state j at time t, and an upper bound $\hat{\beta}_t(j)$ on the likelihood, obtained for the input from time t+1 to time T after merging state j with all descendant nodes of state j; a third step of pruning, where K is the lower bound on the log likelihood of the optimal path obtained using a small beam, so that no further search is performed for any state i satisfying $\log \delta_t(i) + \log \hat{\beta}_t(i) < K$; and a fourth step of reducing the lexical tree while preserving the order dependency among states by performing state merging not on all descendant nodes but only on neighboring states, or only at points where the tree branches, and the like. This method is described in detail below with reference to the accompanying FIGS. 2 to 4.

For a semi-continuous hidden Markov model (SCHMM), the output probability $b_j(o_t)$ of the input speech $o_t$ at time t in state j is calculated as in [Equation 1]:

$$b_j(o_t) = \sum_{m} c_{jm}\, N_m(o_t) \qquad \text{[Equation 1]}$$

Here, $c_{jm}$ is the weight of mixture m in state j, and $N_m(o_t)$ is the Gaussian probability value of $o_t$ computed from mixture m.

Because the SCHMM is a semi-continuous HMM whose mixtures are shared, $N_m(o_t)$ is independent of the state. Therefore, during the search, $b_j(o_t)$ is determined by the weights $c_{jm}$ of the mixtures in state j.
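As a minimal sketch of the output probability just described, the code below evaluates the shared Gaussian mixtures once per frame and then reuses those values for every state, so that only the per-state weights differ. It assumes scalar observations and one-dimensional Gaussians for brevity; the names `gaussian_pdf`, `mixture_probs`, `output_prob`, and the codebook values are hypothetical.

```python
import math


def gaussian_pdf(x, mean, var):
    """Density of a one-dimensional Gaussian (scalar features assumed)."""
    return math.exp(-(x - mean) ** 2 / (2.0 * var)) / math.sqrt(2.0 * math.pi * var)


def mixture_probs(x, codebook):
    """Evaluate every shared mixture once; the result does not depend on the state."""
    return [gaussian_pdf(x, mean, var) for mean, var in codebook]


def output_prob(weights, shared_probs):
    """State output probability: weighted sum of the shared mixture values."""
    return sum(c * p for c, p in zip(weights, shared_probs))


# Hypothetical shared codebook of (mean, variance) pairs and two states.
codebook = [(0.0, 1.0), (3.0, 1.0)]
shared = mixture_probs(0.5, codebook)         # computed once for this frame
b_state1 = output_prob([0.8, 0.2], shared)    # state weighted toward mixture 0
b_state2 = output_prob([0.1, 0.9], shared)    # state weighted toward mixture 1
print(b_state1, b_state2)
```

Because `shared` is computed once per frame, the per-state cost reduces to a weighted sum, which is what makes bounding the weights alone sufficient to bound the output probability.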

Let the weights of the mixtures of state A be $c_{Am}$ and the weights of the mixtures of state B be $c_{Bm}$. When the weights of a state C are given by [Equation 2],

$$c_{Cm} = \min(c_{Am}, c_{Bm}) \qquad \text{[Equation 2]}$$

the output probability for $o_t$ computed with the weights of state C satisfies $b_C(o_t) \le b_A(o_t)$ and $b_C(o_t) \le b_B(o_t)$. That is, the output probabilities for $o_t$ obtained from state A and state B are both greater than or equal to $b_C(o_t)$. In the same way, when the weights of a state D are given by [Equation 3],

$$c_{Dm} = \max(c_{Am}, c_{Bm}) \qquad \text{[Equation 3]}$$

the output probability for $o_t$ computed with the weights of state D satisfies $b_D(o_t) \ge b_A(o_t)$ and $b_D(o_t) \ge b_B(o_t)$. That is, the output probabilities for $o_t$ obtained from state A and state B are both less than or equal to $b_D(o_t)$.
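The two bounds follow term by term from the non-negativity of the shared mixture values. The short derivation below spells this out, assuming the notation $c_{Am}$, $c_{Bm}$ for the weights of states A and B, $N_m(o_t) \ge 0$ for the value of shared mixture m, and $b_X(o_t)$ for the output probability of state X; this notation is an assumption made for illustration.

```latex
\begin{aligned}
b_C(o_t) &= \sum_m \min(c_{Am}, c_{Bm})\, N_m(o_t)
          \;\le\; \sum_m c_{Am}\, N_m(o_t) = b_A(o_t), \\
b_A(o_t) &= \sum_m c_{Am}\, N_m(o_t)
          \;\le\; \sum_m \max(c_{Am}, c_{Bm})\, N_m(o_t) = b_D(o_t).
\end{aligned}
```

The same argument holds with B in place of A. Note that when the weights of A and B each sum to one, the weights of C sum to at most one and those of D to at least one, so C and D act as bounding devices rather than as normalized states.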

In the present invention, the merging of two or more states is defined as follows. First, each mixture weight of the merged state is the maximum of the weights of the corresponding mixture in the states before merging. Second, the transition probability of the merged state is the largest of the transition probabilities among the states before merging.

FIG. 2 shows how the mixture weights of a state are determined by merging according to the present invention. The state merging method described above applies equally to a speech recognizer that uses a discrete hidden Markov model (DHMM).
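The merging rule can be sketched in a few lines. The function below takes one weight list per state and the transition probabilities among those states, and returns the per-mixture maximum together with the largest transition probability. The function name `merge_states`, its argument layout, and the sample numbers are hypothetical.

```python
def merge_states(weights_per_state, transition_probs):
    """Merge states: per-mixture maximum of the weights, maximum transition probability.

    weights_per_state: one list of mixture weights per state (all the same length).
    transition_probs:  transition probabilities among the states being merged.
    """
    merged_weights = [max(column) for column in zip(*weights_per_state)]
    merged_transition = max(transition_probs)
    return merged_weights, merged_transition


# Hypothetical example: two states sharing three mixtures.
state_a = [0.6, 0.3, 0.1]
state_b = [0.2, 0.3, 0.5]
merged_weights, merged_transition = merge_states([state_a, state_b], [0.7, 0.3, 0.8])
print(merged_weights, merged_transition)
```

The merged weights generally sum to more than one, which is expected: the merged state only serves to bound the likelihood from above. For a DHMM, one would presumably pass the per-codeword output probabilities in place of the mixture weights; that mapping is an assumption here.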

$\delta_t(j)$ and $\hat{\beta}_t(j)$ are defined as follows.

$\delta_t(j)$: the maximum likelihood among all paths that reach state j at time t.

$\hat{\beta}_t(j)$: the likelihood obtained for the input from time t+1 to time T after merging state j with all descendant nodes of state j.

By the definition of state merging, for a state j in the lexical tree, every likelihood obtained from state j and the descendant nodes of state j for the input from time t+1 to time T is smaller than $\hat{\beta}_t(j)$.
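The bounding property can be checked numerically on a toy model. The sketch below assumes that the merged state is evaluated as a single state with one transition probability applied at every frame, which is one plausible reading of the definition rather than something the text spells out. It compares that bound with a brute-force maximum over all state sequences; all names and numbers are hypothetical.

```python
import itertools
import math


def output_prob(weights, probs):
    """Weighted sum of shared mixture values for one frame."""
    return sum(c * p for c, p in zip(weights, probs))


def log_upper_bound(merged_weights, merged_trans, frames):
    """Log likelihood of the remaining frames under the single merged state."""
    return sum(math.log(merged_trans * output_prob(merged_weights, probs))
               for probs in frames)


def best_path_log_likelihood(weights, trans, start, frames):
    """Brute-force maximum over all state sequences (for checking only)."""
    best = -math.inf
    for seq in itertools.product(weights, repeat=len(frames)):
        prev, total = start, 0.0
        for state, probs in zip(seq, frames):
            a = trans.get((prev, state), 0.0)
            if a == 0.0:            # transition not allowed in the tree
                total = -math.inf
                break
            total += math.log(a * output_prob(weights[state], probs))
            prev = state
        best = max(best, total)
    return best


# Hypothetical state A with one descendant B, two shared mixtures.
weights = {"A": [0.7, 0.3], "B": [0.4, 0.6]}
trans = {("A", "A"): 0.6, ("A", "B"): 0.4, ("B", "B"): 0.9}
frames = [[0.5, 0.1], [0.2, 0.4]]   # shared mixture values for frames t+1..T

merged_weights = [max(column) for column in zip(*weights.values())]
merged_trans = max(trans.values())
bound = log_upper_bound(merged_weights, merged_trans, frames)
exact = best_path_log_likelihood(weights, trans, "A", frames)
print(exact, bound)
```

Each factor of any real path is no larger than the matching factor of the merged state, so `exact` cannot exceed `bound`, while the bound itself needs no expansion of the descendants.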

As the beam width increases, more accurate recognition results can be found, but the time and memory required also increase. The pruning method proposed in the present invention first obtains a lower bound on the likelihood of the optimal path using a small beam, and is then carried out using this value together with the state merging method proposed in the present invention.

Let K be the lower bound on the log likelihood of the optimal path obtained using a small beam. Using K, no further search is performed for any state i that satisfies [Equation 4] (see FIG. 3):

$$\log \delta_t(i) + \log \hat{\beta}_t(i) < K \qquad \text{[Equation 4]}$$

FIG. 3 shows an embodiment of the pruning method according to the present invention. Here, $\delta_t(i)$ is the maximum likelihood among all paths reaching state i at time t, $\hat{\beta}_t(i)$ is the likelihood obtained for the input from time t+1 to time T after merging state i with all descendant nodes of state i, and K is the lower bound on the log likelihood of the optimal path obtained using a small beam.

Consequently, owing to the properties of merging, the proposed pruning method guarantees that the optimal path is never pruned, and it can decide whether to prune without expanding the search space to the descendant nodes during the search. The proposed pruning method can also be used together with beam pruning.
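The pruning decision itself reduces to a single comparison. In the sketch below, `log_delta` stands for the log of the best likelihood of reaching state i at time t, `log_bound` for the log of the merged-state upper bound on the remaining input, and `k` for the lower bound obtained from the small-beam pass; the function name and all values are hypothetical. The guarantee holds because any path through state i scores at most `log_delta + log_bound`, so if that sum is below `k`, and `k` does not exceed the optimal score, no such path can be optimal.

```python
def should_prune(log_delta, log_bound, k):
    """True when even the most optimistic completion cannot reach the lower bound k."""
    return log_delta + log_bound < k


# Hypothetical values: k comes from a preliminary pass with a small beam.
k = -40.0
hopeless = should_prune(log_delta=-25.0, log_bound=-20.0, k=k)   # sum is -45.0
promising = should_prune(log_delta=-25.0, log_bound=-10.0, k=k)  # sum is -35.0
print(hopeless, promising)
```

A looser bound never causes a wrong pruning decision; it only lets more states survive, so the quality of the bound affects speed rather than correctness.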

The upper bound estimated in this way has the drawback of ignoring the order dependency among the states before merging. Consequently, the lower the height in the tree (that is, the closer to the root) and the smaller the value of t, the larger the gap between this bound and the maximum likelihood that would be obtained by actually expanding the search space.

If state merging is performed not on all descendant nodes but only on neighboring states, or only at points where branching occurs in the tree, a lexical tree smaller than the given lexical tree can be built while preserving the order dependency among states. Running a preliminary search on this reduced lexical tree reduces the total search time.

FIG. 4 shows an example of applying the tree size reduction method according to the present invention. Here, state merging is performed at the child nodes of the states where branching occurs in the tree, provided that these child nodes are not final states.
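FIG. 4 is not reproduced here, so the following is only one possible reading of the reduction: at every node that branches in the original tree, its non-final children are merged into a single state using the per-mixture maximum, and final states are left untouched. The `Node` class, `reduce_tree`, and the toy tree are hypothetical, and transition probabilities are omitted for brevity.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    weights: list
    is_final: bool = False
    children: list = field(default_factory=list)


def count_nodes(node):
    """Number of states in the subtree rooted at node."""
    return 1 + sum(count_nodes(child) for child in node.children)


def reduce_tree(node):
    """Merge the non-final children of every originally branching node."""
    # Post-order: reduce the subtrees first so that merged nodes are not re-merged.
    for child in node.children:
        reduce_tree(child)
    if len(node.children) > 1:
        mergeable = [c for c in node.children if not c.is_final]
        finals = [c for c in node.children if c.is_final]
        if len(mergeable) > 1:
            merged = Node(
                weights=[max(col) for col in zip(*(c.weights for c in mergeable))],
                children=[g for c in mergeable for g in c.children],
            )
            node.children = [merged] + finals
    return node


# Hypothetical tree: the root branches into x and y, each followed by a final state.
x = Node([0.5, 0.5], children=[Node([0.1, 0.9], is_final=True)])
y = Node([0.2, 0.8], children=[Node([0.3, 0.7], is_final=True)])
root = Node([0.9, 0.1], children=[x, y])

before = count_nodes(root)
reduce_tree(root)
after = count_nodes(root)
print(before, after, root.children[0].weights)
```

Because only states at the same depth are merged, the left-to-right order of states along each path is kept, which is the order dependency the text says the reduction preserves.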

Note that both the state transformation and the lexical tree reduction proposed in the present invention can be carried out at compile time.

As described in detail above, when a speech recognizer is implemented using a semi-continuous hidden Markov model (SCHMM) or a discrete hidden Markov model (DHMM), the present invention improves pruning by computing, at the current node, an upper bound on the maximum likelihood instead of computing the likelihood of the input speech for every path that can be formed. In addition, building a lexical tree smaller than the given lexical tree and running a preliminary search on it reduces the total search time.

Claims

A method for enhancing the search speed of a speech recognizer, comprising: a first step of defining state merging such that each mixture weight of the merged state is the maximum of the corresponding weights before merging, and the transition probability of the merged state is the largest of the transition probabilities among the states before merging; a second step of defining the maximum likelihood $\delta_t(j)$ among all paths reaching state j at time t, and the likelihood $\hat{\beta}_t(j)$ obtained for the input from time t+1 to time T after merging state j with all descendant nodes of state j; a third step of pruning, where K is the lower bound on the log likelihood of the optimal path obtained using a small beam, so that no further search is performed for any state i that satisfies a predetermined condition; and a fourth step of reducing the lexical tree while preserving the order dependency among states by performing state merging not on all descendant nodes but only on neighboring states, or only at points where the tree branches, and the like.

The method of claim 1, wherein, in the second step, by the definition of state merging, for a state j in the lexical tree, every likelihood obtained from state j and the descendant nodes of state j for the input from time t+1 to time T is smaller than $\hat{\beta}_t(j)$.

The method of claim 1, wherein the predetermined condition of the third step is $\log \delta_t(i) + \log \hat{\beta}_t(i) < K$.