KR102062694B1

KR102062694B1 - Apparatus and method for postprocessing of decoded speech

Info

Publication number: KR102062694B1
Application number: KR1020170104421A
Authority: KR
Inventors: 최승호; 이정훈; 윤덕규; 김홍국
Original assignee: 국방과학연구소
Priority date: 2017-08-17
Filing date: 2017-08-17
Publication date: 2020-01-06
Also published as: KR20190019468A

Abstract

The present invention relates to a post-processing apparatus for a speech decoding system. The apparatus may include: a signal converter that converts original speech and the decoded speech of the original speech into frequency-domain signals, respectively; a spectral ratio calculator that calculates the frequency spectral ratio between the frequency-domain original speech and decoded speech; a weight vector calculator that calculates a weight vector for the original speech and decoded speech based on the frequency spectral ratio; and a weight applier that applies the weight vector to the decoded speech of the post-processing target speech.

Description

Post-processing apparatus and method for a speech decoding system {APPARATUS AND METHOD FOR POSTPROCESSING OF DECODED SPEECH}

The present invention relates to a post-processing technique for a speech decoding system.

Speech decoders used in digital voice communication synthesize (that is, decode) speech using the parameters of a speech production model, and most speech decoders allocate only a limited number of bits to those speech parameters.

In addition, attempts have been made to improve speech intelligibility, to a limited degree, by applying post-processing techniques such as LPC (Linear Predictive Coding) and LSF (Line Spectral Frequency) to the speech decoder.

Korean Patent Laid-Open Publication No. 2011-0057596 (published June 1, 2011)

An embodiment of the present invention proposes a post-processing technique for a speech decoding system that can improve the intelligibility of decoded speech.

According to an embodiment of the present invention, there may be provided a post-processing apparatus for a speech decoding system, including: a signal converter that converts original speech and the decoded speech of the original speech into frequency-domain signals, respectively; a spectral ratio calculator that calculates the frequency spectral ratio between the frequency-domain original speech and decoded speech; a weight vector calculator that calculates a weight vector for the original speech and decoded speech based on the frequency spectral ratio; and a weight applier that applies the weight vector to the decoded speech of the post-processing target speech.

According to an embodiment of the present invention, there may also be provided a post-processing method for a speech decoding system that outputs the decoded speech of original speech, the method including: converting the original speech and the decoded speech into frequency-domain signals; calculating the frequency spectral ratio between the frequency-domain original speech and decoded speech; calculating a weight vector for the original speech and decoded speech according to the frequency spectral ratio; and applying the weight vector to the decoded speech of the post-processing target speech.

According to an embodiment of the present invention, there may also be provided a post-processing apparatus for a speech decoding system, including: a signal converter that converts the decoded speech of the post-processing target speech into a frequency-domain signal; and a weight applier that applies a weight vector to the frequency-domain decoded speech of the post-processing target speech, wherein the weight vector is calculated based on the frequency spectral ratio between original speech and decoded speech.

According to an embodiment of the present invention, the intelligibility of decoded speech can be improved by using the spectral ratio between original speech and decoded speech to compensate for the spectral distortion between them. In particular, better intelligibility measurements were obtained than with conventional post-processing techniques.

FIG. 1 is a block diagram of a post-processing apparatus for calculating a weight vector in a speech decoding system according to an embodiment of the present invention.
FIG. 2 is a table illustrating the frequency range of each frequency band that the spectral ratio calculator 104 of FIG. 1 uses to obtain the frequency spectral ratio.
FIG. 3 is a block diagram of a post-processing apparatus for applying a weight vector in a speech decoding system according to an embodiment of the present invention.
FIG. 4 is a flowchart illustrating a post-processing procedure for calculating a weight vector in a speech decoding system according to an embodiment of the present invention.
FIG. 5 is a flowchart illustrating a post-processing procedure for applying a weight vector in a speech decoding system according to an embodiment of the present invention.

The advantages and features of the present invention, and the methods of achieving them, will become apparent from the embodiments described in detail below together with the accompanying drawings. The present invention is not, however, limited to the embodiments disclosed below and may be implemented in various forms. The embodiments are provided only to make the disclosure complete and to fully convey the scope of the invention to those of ordinary skill in the art to which the invention pertains; the scope of the invention is defined only by the claims.

In describing the embodiments of the present invention, detailed descriptions of well-known functions or configurations are omitted unless they are actually needed. The terms used below are defined in consideration of their functions in the embodiments and may vary according to the intentions or customs of users and operators. Their definitions should therefore be based on the content of this specification as a whole.

An embodiment of the present invention addresses the loss of intelligibility of decoded speech in a speech decoding system that synthesizes speech using the parameters of a speech production model. For each speech decoder, a weight vector is obtained statistically from the spectral ratio between original speech and decoded speech, and the resulting weight vector is applied to the decoded speech of the post-processing target speech. In other words, a weight is obtained for each speech decoder, and post-processing is performed using the weight obtained for the speech decoder that produced the decoded speech to be post-processed.

Hereinafter, embodiments of the present invention are described in detail with reference to the accompanying drawings.

FIG. 1 is a block diagram of a post-processing apparatus for calculating a weight vector in a speech decoding system according to an embodiment of the present invention. The apparatus may include a silence remover 100, a signal converter 102, a spectral ratio calculator 104, and a weight vector calculator 106. The inputs x(n) and y(n) to the silence remover 100 denote, respectively, the original speech fed into the speech decoding system and the decoded speech that has passed through it.

As shown in FIG. 1, the silence remover 100 may remove silence from the input original speech and decoded speech. Silence is removed so that the frequency spectrum carries no information about silent segments during the signal conversion and spectral ratio calculation described below, which minimizes the loss of intelligibility caused by spectral distortion. Here, decoded speech means speech that a speech decoder has synthesized using the parameters of a speech production model.
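The source does not say how silence is detected. As a minimal sketch only, the following assumes a simple frame-energy threshold; the function name `remove_silence`, the frame length of 160 samples, and the threshold value are all hypothetical choices made for illustration.

```python
def remove_silence(samples, frame_len=160, threshold=1e-4):
    """Keep only frames whose mean energy reaches `threshold`.

    Assumption: an energy threshold stands in for the unspecified
    silence detector. A trailing partial frame is discarded.
    """
    kept = []
    for start in range(0, len(samples) - frame_len + 1, frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(s * s for s in frame) / frame_len
        if energy >= threshold:
            kept.extend(frame)
    return kept


speech = [0.5, -0.5] * 80   # one 160-sample frame with mean energy 0.25
silence = [0.0] * 160       # one all-zero frame
trimmed = remove_silence(silence + speech + silence)
print(len(trimmed))  # 160
```

When the weight vector is being trained, it would presumably be necessary to drop the same frames from both x(n) and y(n) so that the two signals stay aligned; the source does not address this point.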

The signal converter 102 may convert the original speech and decoded speech, from which the silence remover 100 has removed silence, into frequency-domain signals. That is, it converts the time-domain original speech and decoded speech into their frequency-domain counterparts. The signal converter 102 may use, for example, the discrete Fourier transform (DFT).
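As an illustration of this conversion step, the sketch below splits a signal into frames and takes a direct DFT of each frame. The non-overlapping rectangular framing and the frame length are assumptions; the source names only the DFT and, in the claims, frame-wise processing.

```python
import cmath
import math


def dft(frame):
    """Direct O(N^2) discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def frame_spectra(samples, frame_len=8):
    """Split into non-overlapping frames (an assumption) and transform each."""
    return [
        dft(samples[s:s + frame_len])
        for s in range(0, len(samples) - frame_len + 1, frame_len)
    ]


# A cosine with exactly 2 cycles per 8-sample frame lands in bins 2 and 6.
tone = [math.cos(2 * math.pi * 2 * t / 8) for t in range(16)]
spectra = frame_spectra(tone)
mags = [abs(c) for c in spectra[0]]
```

A production implementation would normally use an FFT with a window function and overlapping frames; the direct form is used here only to keep the example free of dependencies.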

The spectral ratio calculator 104 may calculate the frequency spectral ratio between the frequency-domain original speech and decoded speech. For example, it may obtain the spectral average of the original speech and of the decoded speech in each frequency band and then calculate the frequency spectral ratio band by band.
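A minimal sketch of the per-band ratio for a single frame follows. It assumes that the spectral average is taken over magnitudes and that the ratio is original over decoded; the source states neither explicitly. The names `band_means`, `band_ratios`, and `band_bins`, and the small `eps` guard, are hypothetical.

```python
def band_means(magnitudes, band_bins):
    """Average the magnitude spectrum over the bins of each band."""
    return [sum(magnitudes[k] for k in bins) / len(bins) for bins in band_bins]


def band_ratios(orig_mag, dec_mag, band_bins, eps=1e-12):
    """Per-band ratio of original to decoded spectral average (assumed direction)."""
    x = band_means(orig_mag, band_bins)
    y = band_means(dec_mag, band_bins)
    # eps guards against division by zero in an empty band (an added safeguard).
    return [xi / max(yi, eps) for xi, yi in zip(x, y)]


band_bins = [[0, 1], [2, 3]]     # two toy bands of two bins each
orig = [2.0, 4.0, 1.0, 1.0]      # band means 3.0 and 1.0
dec = [1.0, 2.0, 2.0, 2.0]       # band means 1.5 and 2.0
ratios = band_ratios(orig, dec, band_bins)
print(ratios)  # [2.0, 0.5]
```

A ratio above 1 marks a band where the decoder attenuated the signal, and a ratio below 1 marks a band it boosted.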

FIG. 2 is a table illustrating the frequency range of each frequency band used to obtain this frequency spectral ratio.

As illustrated in FIG. 2, in an embodiment of the present invention the original speech and decoded speech are divided into 20 frequency bands, each covering a fixed frequency range.

The frequency bands may have, for example, frequency ranges divided on a 1/3 octave scale. In FIG. 2, the frequency range of 1 to 4000 Hz is divided into 20 frequency bands on a 1/3 octave scale, so that the ratio between the lower and upper cutoff frequencies of each band is constant. As FIG. 2 shows, on the 1/3 octave scale the bands are divided so that the upper cutoff frequency of each band is approximately 1.26 times (2^(1/3)) its lower cutoff frequency. For reference, on the 1/1 octave scale the bands are divided with a ratio of 2.

Dividing the frequency bands on the 1/3 octave scale in this way yields a more precise spectral average of the original and decoded speech across the audible frequency range.
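The constant ratio of about 1.26 can be checked with a short sketch that generates 1/3 octave band edges. The actual edges in FIG. 2 are not reproduced in this text, so the starting frequency of 40 Hz below is a hypothetical value chosen only so that 20 bands end near 4 kHz.

```python
def third_octave_edges(f_low, n_bands):
    """Return (lower, upper) edges for n_bands consecutive 1/3 octave bands."""
    ratio = 2 ** (1 / 3)  # about 1.26: three bands span one octave
    edges = [f_low * ratio ** b for b in range(n_bands + 1)]
    return list(zip(edges[:-1], edges[1:]))


# f_low = 40.0 Hz is an assumption, not a value taken from FIG. 2.
bands = third_octave_edges(f_low=40.0, n_bands=20)
for lo, hi in bands[:3]:
    print(f"{lo:8.2f} Hz to {hi:8.2f} Hz")
```

Every third band edge doubles in frequency, which is why the lower bands are narrow and the upper bands are wide.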

The weight vector calculator 106 may calculate the weight vector for the original speech and decoded speech based on the per-band frequency spectral ratio obtained by the spectral ratio calculator 104.

The weight vector calculator 106 may obtain the weight vector by taking the geometric mean of the frequency spectral ratios calculated for each frequency band, and may convert the resulting per-band weight vector into a weight vector indexed by frequency bin.
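The two operations in this paragraph can be sketched as follows, under the assumption that the geometric mean is taken over frames for each band and that every DFT bin inside a band simply inherits that band's weight. The function names and the default weight of 1.0 for bins outside all bands are hypothetical.

```python
import math


def geometric_mean_weights(ratios):
    """ratios[i][l] is the ratio for frame i, band l (all values must be > 0).

    Returns one weight per band: the geometric mean over frames,
    computed in the log domain for numerical stability.
    """
    n_frames = len(ratios)
    n_bands = len(ratios[0])
    return [
        math.exp(sum(math.log(ratios[i][l]) for i in range(n_frames)) / n_frames)
        for l in range(n_bands)
    ]


def expand_to_bins(band_weights, band_bins, n_bins, default=1.0):
    """Copy each band weight to the bins in that band; other bins keep `default`."""
    weights = [default] * n_bins
    for weight, bins in zip(band_weights, band_bins):
        for k in bins:
            weights[k] = weight
    return weights


ratios = [[2.0, 1.0], [8.0, 0.25]]          # 2 frames, 2 bands
band_w = geometric_mean_weights(ratios)     # approximately [4.0, 0.5]
bin_w = expand_to_bins(band_w, [[1, 2], [3]], n_bins=5)
```

The geometric mean suits ratio data because a boost of 2 in one frame and an attenuation of 1/2 in another average out to exactly 1.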

FIG. 3 is a block diagram of a post-processing apparatus for applying the weight vector in a speech decoding system according to an embodiment of the present invention. The apparatus may include a silence remover 200, a signal converter 202, a weight applier 206, a spectrum equalizer (not shown), and an inverse signal converter 208.

The post-processing apparatus of FIG. 3 is the configuration that actually applies the weight vector obtained by the post-processing apparatus of FIG. 1 to the decoded speech of the post-processing target speech.

As shown in FIG. 3, the silence remover 200 may remove silence from the decoded speech of the post-processing target speech. Silence is removed so that the frequency spectrum carries no information about silent segments during the signal conversion described below, which minimizes the loss of intelligibility caused by spectral distortion.

The signal converter 202 may convert the decoded speech of the post-processing target speech, from which the silence remover 200 has removed silence, into a frequency-domain signal. That is, the signal converter 202 converts the time-domain decoded speech of the post-processing target speech into its frequency-domain counterpart. The signal converter 202 may use, for example, the discrete Fourier transform.

The weight applier 206 may apply the weight vector obtained by the weight vector calculator 106 of FIG. 1 to the frequency-domain decoded speech of the post-processing target speech. For example, the weight applier 206 may apply the per-frequency-index weight vector to the decoded speech of the post-processing target speech to compensate for its spectral distortion.

The inverse signal converter 208 may convert the frequency-domain signal of the decoded speech, whose spectral distortion has been compensated by the per-frequency-index weight vector, into a time-domain signal. That is, it converts the decoded speech corrected in the frequency domain into corrected decoded speech in the time domain. The inverse signal converter 208 may use, for example, the inverse discrete Fourier transform (IDFT).

Hereinafter, the post-processing procedure of the speech decoding system according to an embodiment of the present invention is described in more detail, together with the configuration above, with reference to the flowcharts of FIGS. 4 and 5.

First, FIG. 4 is a flowchart illustrating a post-processing procedure for calculating a weight vector in a speech decoding system according to an embodiment of the present invention.

As illustrated in FIG. 4, the silence remover 100 may remove any silence contained in the input original speech and decoded speech (S100).

The signal converter 102 may then convert the original speech and decoded speech, from which the silence remover 100 has removed silence, into frequency-domain signals (S102). That is, it converts the time-domain original speech and decoded speech into their frequency-domain counterparts.

The spectral ratio calculator 104 may calculate the frequency spectral ratio between the original speech and decoded speech that the signal converter 102 has converted into frequency-domain signals (S104). For example, it may obtain the spectral average of the original speech and of the decoded speech in each frequency band and then calculate the frequency spectral ratio band by band.

The frequency spectral ratio for each frequency band may be denoted r(i, l) and may be obtained by [Equation 1].

In [Equation 1], i and l are the frame index and the 1/3 octave band index, respectively, and the two remaining terms are the per-band spectral average of the original speech and the per-band spectral average of the decoded speech.
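The equation image itself is not preserved in this text. Based only on the definitions given here, [Equation 1] plausibly has the following form. The symbols for the two band averages and the direction of the ratio (original over decoded) are assumptions, chosen so that multiplying the decoded spectrum by the result moves it toward the original.

```latex
% Hypothetical reconstruction; \bar{X} and \bar{Y} are assumed symbols.
% \bar{X}(i,l): spectral average of the original speech in frame i, band l
% \bar{Y}(i,l): spectral average of the decoded speech in frame i, band l
r(i, l) = \frac{\bar{X}(i, l)}{\bar{Y}(i, l)}
```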

The weight vector calculator 106 may calculate the weight vector for the original speech and decoded speech based on the per-band frequency spectral ratio obtained by the spectral ratio calculator 104 (S106).

The weight vector may be obtained by taking the geometric mean of the frequency spectral ratios calculated for each frequency band of FIG. 2, and the weight vector obtained through this geometric averaging may be expressed as [Equation 2] and [Equation 3].

Here, M is the total number of frames in the training speech database and L is the total number of frequency bands.
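The images for [Equation 2] and [Equation 3] are likewise missing. One form consistent with the text (a geometric mean over the M training frames for each band, collected into a vector of L band weights) is sketched below. The symbol w and the split between the two equations are assumptions.

```latex
% Hypothetical reconstruction from the surrounding definitions.
% Geometric mean of the band-l ratio over all M training frames:
w(l) = \left( \prod_{i=1}^{M} r(i, l) \right)^{1/M}, \qquad l = 1, \dots, L

% Weight vector collecting the L band weights:
\mathbf{w} = \left[\, w(1),\; w(2),\; \dots,\; w(L) \,\right]
```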

FIG. 5 is a flowchart illustrating a post-processing procedure for applying the weight vector obtained in FIG. 4 to the decoded speech of the post-processing target speech.

As illustrated in FIG. 5, the silence remover 200 may remove silence from the decoded speech of the post-processing target speech. Silence is removed so that the frequency spectrum carries no information about silent segments during the signal conversion described below, which minimizes the loss of intelligibility caused by spectral distortion.

The signal converter 202 may then convert the decoded speech of the post-processing target speech, from which the silence remover 200 has removed silence, into a frequency-domain signal (S202). That is, the signal converter 202 converts the time-domain decoded speech of the post-processing target speech into its frequency-domain counterpart.

The weight applier 206 may apply the weight vector obtained in FIG. 4 to the frequency-domain decoded speech of the post-processing target speech (S204). For example, the weight applier 206 may convert the per-band weight vector into a per-frequency-index weight vector and apply the converted weight vector to the decoded speech of the post-processing target speech, which may be expressed as [Equation 4] and [Equation 5].

By applying the per-frequency-index weight vector to the decoded speech of the post-processing target speech as in [Equation 5], the spectral distortion of that decoded speech can be compensated.
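[Equation 5] is not reproduced in this text. A minimal sketch of the weighting step, assuming it is a bin-by-bin multiplication of the decoded spectrum by real-valued weights, could look like this; `apply_weights` is a hypothetical name.

```python
def apply_weights(spectrum, bin_weights):
    """Scale each complex DFT bin by its real weight.

    A real weight changes the magnitude of a bin and leaves its phase intact.
    """
    return [w * c for w, c in zip(bin_weights, spectrum)]


decoded_spectrum = [1 + 1j, 2 + 0j, 0 - 3j]
weights = [1.0, 0.5, 2.0]
compensated = apply_weights(decoded_spectrum, weights)
print(compensated)
```

For a full-length DFT of a real signal, the weights would also need to be mirrored onto the negative-frequency bins so that the inverse transform stays real.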

Finally, the inverse signal converter 208 may convert the frequency-domain signal of the decoded speech, whose spectral distortion has been compensated by the per-frequency-index weight vector, into a time-domain signal (S206). That is, it converts the frequency-domain decoded speech of the post-processing target speech into time-domain decoded speech. The inverse signal converter 208 may use, for example, the inverse discrete Fourier transform.
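To illustrate the return to the time domain, the sketch below runs one frame through a DFT, a uniform weight, and an inverse DFT. The direct-form transforms and the uniform weight of 2.0 are illustrative assumptions, used because the result is easy to verify by hand.

```python
import cmath


def dft(frame):
    """Direct discrete Fourier transform of one frame."""
    n = len(frame)
    return [
        sum(frame[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def idft(spectrum):
    """Inverse DFT, returning the real part of each time-domain sample."""
    n = len(spectrum)
    return [
        (sum(spectrum[k] * cmath.exp(2j * cmath.pi * k * t / n) for k in range(n)) / n).real
        for t in range(n)
    ]


frame = [0.0, 1.0, 0.0, -1.0]
weights = [2.0] * 4                       # same gain in every bin
weighted = [w * c for w, c in zip(weights, dft(frame))]
restored = idft(weighted)                 # approximately [0.0, 2.0, 0.0, -2.0]
```

With a uniform weight the output is just a scaled copy of the input; frequency-dependent weights reshape the spectrum instead.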

[Equation 6] expresses the final output of the decoded speech of the post-processing target speech after the inverse signal converter 208 has converted it into a time-domain signal.

In [Equation 6], LSD (Log Spectral Distance) is a value (in dB) that represents the difference between spectra; a lower LSD means less distortion between the original speech and the decoded speech.
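[Equation 6] is not reproduced here, so the exact LSD definition used in the source is unknown. The sketch below uses one common definition: the root-mean-square difference, in dB, between two magnitude spectra for a single frame. The function name and the `eps` floor are assumptions.

```python
import math


def log_spectral_distance(orig_mag, dec_mag, eps=1e-12):
    """RMS difference in dB between two magnitude spectra (one frame).

    This is a commonly used LSD definition, assumed here; it is not
    taken from the source. eps avoids log of zero.
    """
    diffs = [
        20 * math.log10(max(x, eps) / max(y, eps))
        for x, y in zip(orig_mag, dec_mag)
    ]
    return math.sqrt(sum(d * d for d in diffs) / len(diffs))


lsd_same = log_spectral_distance([1.0, 2.0], [1.0, 2.0])     # identical spectra
lsd_10x = log_spectral_distance([10.0, 10.0], [1.0, 1.0])    # 20 dB apart in every bin
print(lsd_same, lsd_10x)
```

A reported figure for a whole utterance is typically the average of this per-frame value over all frames.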

In an embodiment of the present invention, the post-processing result for the decoded speech of the post-processing target speech was simulated in, for example, an LPC (Linear Predictive Coding) based speech decoding environment. [Table 1] compares the LSD values obtained in the LPC decoding environment with conventional post-processing methods and with the weighted post-processing method according to the embodiment.

[Table 1]
Decoder    Conventional 1    Conventional 2    Present invention
LPC-10e    13.11             13.17             12.55

[Table 1] confirms that the LSD is lower when, as in the embodiment of the present invention, a weight based on the frequency spectral ratio between original speech and decoded speech is applied to the decoded speech of the post-processing target speech.

According to the embodiment of the present invention described above, a weight vector is obtained statistically for each speech decoder from the spectral ratio between original speech and decoded speech, and the resulting weight vector is applied to the decoded speech of the post-processing target speech, thereby improving the intelligibility of decoded speech in a speech decoding system.

100: silence remover
102: signal converter
104: spectral ratio calculator
106: weight vector calculator
206: weight applier
208: inverse signal converter

Claims

A post-processing apparatus for a speech decoding system, comprising:
a silence remover that removes silence from input original speech and from the decoded speech of the original speech;
a signal converter that converts the original speech and the decoded speech of the original speech, from which the silence has been removed, into frequency-domain signals, respectively;
a spectral ratio calculator that obtains a spectral average for each frequency band of the frequency-domain original speech and decoded speech and, using the spectral average for each frequency band, calculates the frequency spectral ratio between the original speech and the decoded speech for each frequency band;
a weight vector calculator that calculates a weight vector by taking the geometric mean of the frequency spectral ratios calculated for each frequency band; and
a weight applier that applies the weight vector to the decoded speech of a post-processing target speech,
wherein the weight vector is calculated by Equations 1 and 2
[Equation 1]

[Equation 2]

(where M is the total number of frames in the training speech database and L is the total number of frequency bands).

(Deleted)

The apparatus of claim 1,
wherein the weight vector calculator
calculates the weight vector for each frequency band.

The apparatus of claim 4,
wherein the frequency band has a frequency range divided on a 1/3 octave scale.

The apparatus of claim 4,
wherein the weight vector calculator
converts the weight vector calculated for each frequency band into a weight vector for each frequency index.

The apparatus of claim 6,
wherein the weight applier
applies the weight vector for each frequency index to the decoded speech of the post-processing target speech.

A post-processing method for a speech decoding system that outputs the decoded speech of original speech, the method comprising:
removing silence from the input original speech and the decoded speech;
converting the original speech and the decoded speech into frequency-domain signals;
obtaining a spectral average for each frequency band of the frequency-domain original speech and decoded speech;
calculating, using the spectral average for each frequency band, the frequency spectral ratio between the original speech and the decoded speech for each frequency band;
calculating a weight vector by taking the geometric mean of the frequency spectral ratios calculated for each frequency band; and
applying the weight vector to the decoded speech of a post-processing target speech,
wherein the weight vector is calculated by Equations 1 and 2
[Equation 1]

[Equation 2]

(where M is the total number of frames in the training speech database and L is the total number of frequency bands).

(Deleted)

The method of claim 8,
wherein calculating the weight vector includes
converting the weight vector calculated for each frequency band into a weight vector for each frequency index.

The method of claim 11,
wherein the applying includes
applying the weight vector for each frequency index to the decoded speech of the post-processing target speech.

The method of claim 8,
wherein calculating the weight vector includes
calculating the weight vector for each frequency band.

The method of claim 13,
wherein the frequency band has a frequency range divided on a 1/3 octave scale.

The method of claim 8,
wherein converting into the frequency-domain signals includes
dividing the original speech and decoded speech, from which the silence has been removed, into frames and then converting them into the frequency-domain signals.

A post-processing apparatus for a speech decoding system, comprising:
a silence remover that removes silence from input original speech and from the decoded speech of the original speech;
a signal converter that converts the decoded speech of a post-processing target speech into a frequency-domain signal; and
a weight applier that applies a weight vector to the frequency-domain decoded speech of the post-processing target speech,
wherein the weight vector
is calculated by taking the geometric mean of the frequency spectral ratios, calculated for each frequency band, between the original speech and the decoded speech of the original speech,
wherein the frequency spectral ratio
is calculated for each frequency band by obtaining a spectral average for each frequency band of the frequency-domain original speech and decoded speech and using the spectral average for each frequency band, and
wherein the weight vector is calculated by Equations 1 and 2
[Equation 1]

[Equation 2]

(where M is the total number of frames in the training speech database and L is the total number of frequency bands).

(Deleted)