KR100394146B1

KR100394146B1 - The duplex multilevel adaptive compression voice signal transmission method over internet protocol

Info

Publication number: KR100394146B1
Application number: KR10-2000-0084480A
Authority: KR
Inventors: 김종호
Original assignee: 디지털시스 주식회사; 김종호
Priority date: 2000-12-28
Filing date: 2000-12-28
Publication date: 2003-08-09
Also published as: KR20020055123A

Abstract

The present invention relates to a "Duplex Multilevel Adaptive Compression Algorithm transmission method" for real-time voice signals over the Internet Protocol. The method comprises: determining an analysis level by measuring the network delay characteristics at the time of signal input; adaptively decomposing the signal into duplex multiresolution through a perfect-reconstruction quadrature mirror filter (QMF) and quantization-level control; adaptively quantizing the decomposed signal by sub-band adaptive differential pulse code modulation (SB-ADPCM); transmitting the decomposed duplex multiresolution real-time voice data and the resolution state information over a main channel and an auxiliary data channel; applying inverse adaptive quantization to the transmitted signal; and synthesizing the original signal through inverse quantization-level control and an inverse quadrature mirror filter. In other words, the real-time voice coding and transmission method proposed by the invention changes both the coding method and the transmission rate, in two dimensions, according to the delay state of the network, and thereby transmits voice adaptively using multiple resolutions. The invention provides a markedly improved method of transmitting real-time voice data in network environments where packet delay is a serious problem.

Description

The duplex multilevel adaptive compression voice signal transmission method over internet protocol

The present invention proposes a special adaptive voice compression algorithm and its transmission method for implementing real-time voice communication over the Internet, that is, Voice over Internet Protocol (VoIP), a technology expected to play an important part in the communications revolution under today's rapidly changing communication environment. The invention is named the Duplex Multilevel Adaptive Compression Algorithm and provides a more efficient and more robust method of transmitting real-time voice signals when Internet network delay is severe.

Voice over Internet Protocol (VoIP) refers to packet-based voice transmission over the Internet, and the term is used interchangeably with Internet telephony, IP telephony, packet voice, and packetized voice. With the advent of VoIP technology, integration of the Internet and the telephone network is becoming possible.

An Internet Protocol (IP) network was designed primarily to carry ordinary, non-real-time data and is not well suited to delivering real-time information such as voice. Voice traffic sent in real time over an IP network is affected by the volume of other data traffic, so voice data may be lost or not delivered properly. In other words, the User Datagram Protocol (UDP) used on the Internet for real-time information provides basic end-to-end delivery but does not guarantee that delivery succeeded; if a problem arises in the network during transmission, the real-time information becomes unusable and is ultimately discarded.

In this respect, transmitting voice information effectively over the Internet is a difficult technical problem that requires advanced coding techniques for voice signals and efficient transmission techniques.

Nevertheless, delivering real-time voice over the Internet is far more economical than using the existing telephone network, and the Internet is so widely deployed that it can be used conveniently anywhere. For these reasons, voice delivery techniques and devices based on IP networks have been developed continuously.

These efforts began with communication between computers that have Internet Protocol (IP) addresses, that is, the PC-to-PC method. Because the PC-to-PC method requires the other party to be connected to the Internet, it was not widely adopted. Soon, however, with advances in technologies that connect the Internet to the public telephone network and with the rapid spread of the Internet, services that interwork the two, namely the PC-to-Phone and Phone-to-Phone methods, became common.

Factors common to current VoIP services that determine call quality include the delay of packets carrying voice information, delay variation (jitter), packet loss, and echo. Delay variation between packets is handled by techniques such as adjusting the buffer size, and many echo cancellation techniques have been developed for echo. Packet loss directly affects decoding at the receiver and, when severe, degrades the call quality heard by the listener. At the current state of the art, high-quality voice can be restored even if about 10% of voice packets are lost, but beyond that the voice cannot be recovered and problems such as dropouts occur. Packet loss is closely tied to network reliability and stability, which are improving steadily as the Internet expands, and this benefits VoIP implementations.

Packet delay varies greatly with the amount of traffic on the network. It is directly affected by network capacity, the routing algorithm, and the voice coding method, and it grows as the number of Internet users increases. Packet delay on the Internet typically stays between about 70 and 160 ms. Because of the asynchronous nature of the Internet the delay is not constant, and under heavy load it goes well beyond 300 ms. Most listeners do not notice a delay below 100 ms; between 100 ms and 300 ms they perceive slight hesitation in the voice; and above 300 ms they clearly perceive degraded call quality. The problem has become more serious recently as the information carried over the Internet shifts rapidly from low-volume text to high-volume graphics and video.

The fundamental solution to packet delay is to invest more in network infrastructure and widen the bandwidth. Within an existing environment, however, various methods are used to minimize packet delay, such as reducing the packet size to 10-30 bytes or improving the voice coding scheme.

Even with these existing methods, call quality still drops markedly under excessive packet delay, and when network conditions deteriorate the voice information itself is cut off and calls become almost impossible.

Therefore, when network delay becomes severe, for example because of increased traffic, the existing methods are unlikely to be an adequate solution for delivering voice in real time.

The first object of the present invention is to solve the problem that real-time voice transmission is not carried out properly when data is delayed by increased network traffic.

Another object of the present invention is to provide a method that monitors the traffic of an Internet Protocol network, takes into account the packet delay that follows from increases and decreases in that traffic, and delivers voice data of better-than-normal quality in real time when traffic conditions improve.

Other objects of the present invention will become apparent from the embodiments described below.

FIG. 1 shows the structure of the analysis QMF.

FIG. 2 shows the structure of the synthesis QMF.

FIG. 3 shows the frequency response of the low-pass QMF.

FIG. 4 shows the frequency response of the high-pass QMF.

FIG. 5 shows the multiresolution decomposition of a one-dimensional signal.

FIG. 6 shows the tree structure of a one-dimensional signal decomposed into multiple resolutions.

FIG. 7 shows the waveform of the voice signal "chat.wav".

FIG. 8 shows the signal decomposed in three stages.

FIG. 9 is a block diagram of the voice signal decomposition.

FIG. 10 shows the measured network delay data.

FIG. 11 is a block diagram of the voice signal synthesis.

FIG. 12 shows part of the input voice signal waveform.

FIG. 13 shows the SNR when the G.722 transmission scheme is applied.

FIG. 14 shows the SNR measured with fixed-level decomposition in the time domain and 7-bit fixed quantization in the frequency domain.

FIG. 15 shows the SNR measured with adaptive decomposition in the time domain and 7-bit fixed quantization in the frequency domain.

FIG. 16 shows the SNR measured with no decomposition in the time domain and adaptive quantization in the frequency domain.

FIG. 17 shows the SNR measured with adaptive decomposition in the time domain and adaptive quantization in the frequency domain.

To achieve the above objects, the present invention comprises: determining an analysis level by measuring the network delay characteristics at the time of signal input; adaptively decomposing the signal into duplex multiresolution through a perfect-reconstruction quadrature mirror filter (QMF) and quantization-level control; adaptively quantizing the decomposed signal by sub-band adaptive differential pulse code modulation (SB-ADPCM); transmitting the decomposed duplex multiresolution real-time voice data and the resolution state information over a main channel and an auxiliary data channel; applying inverse adaptive quantization to the transmitted signal; and synthesizing the original signal through inverse quantization-level control and an inverse quadrature mirror filter. In other words, the real-time voice coding and transmission method proposed by the invention changes both the coding method and the transmission rate, in two dimensions, according to the delay state of the network, and thereby transmits voice adaptively using multiple resolutions. The invention provides a markedly improved method of transmitting real-time voice data in network environments where packet delay is a serious problem.

The basic design of the quadrature mirror filter (QMF) used for signal decomposition and synthesis is described first, followed by a step-by-step description of the duplex multiresolution adaptive compressed voice signal transmission method over the Internet Protocol according to the present invention.

1. Design of the quadrature mirror filter

Signal decomposition

The lattice-structured quadrature mirror filter with no band loss is as follows. As shown in the figure, the transfer function Hp(z) used to decompose the input signal is given by Equation 1.

(Equation 1)

Here A'_k is a 2x2 orthogonal matrix, and D(z) is given by Equation 2.

(Equation 2)

Because A'_k is an orthogonal matrix with constant gain, it can be written as Equation 3.

(Equation 3)

Rearranging Hp(z) gives a lattice-structured QMF with constant gain, as in Equation 4.

(Equation 4)

Here c is the overall gain of the filter, given by the product of the gains of the individual stages.

(Equation 5)

Signal synthesis

The inverse matrix for reconstruction can be obtained as follows.

(Equation 6)

However, considering Equation 7,

(Equation 7)

and since D^-1(z) = D(z^-1), it can be written as follows.

(Equation 8)

The lattice structure of the reconstruction filter can therefore be written as follows; the structure is shown in FIG. 2.

(Equation 9)

Summarizing the above, the simplest quadrature mirror filter can be written as in Equation 10.

(Equation 10)

Here the low-pass filter part is c[1 1] and the high-pass filter part is c[1 -1].
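
The two-tap filters c[1 1] and c[1 -1] can be illustrated with a minimal Python sketch. It assumes c = 1/sqrt(2) (the source only names the gain c), applies the filters to consecutive sample pairs with downsampling by two, and uses the hypothetical function names `analyze` and `synthesize`; it is an illustration of perfect reconstruction for this simplest case, not the lattice implementation of the patent.

```python
import math

# Assumed gain: the source calls it c without giving a value here.
C = 1 / math.sqrt(2)

def analyze(x):
    """Split an even-length sequence into low-band and high-band halves."""
    low = [C * (x[i] + x[i + 1]) for i in range(0, len(x), 2)]   # c[1 1]
    high = [C * (x[i] - x[i + 1]) for i in range(0, len(x), 2)]  # c[1 -1]
    return low, high

def synthesize(low, high):
    """Rebuild the original sequence from the two half-rate bands."""
    out = []
    for l, h in zip(low, high):
        out.append(C * (l + h))  # recovers the even-indexed sample
        out.append(C * (l - h))  # recovers the odd-indexed sample
    return out

signal = [1.0, 3.0, -2.0, 4.0, 0.5, 0.5, 7.0, -1.0]
low, high = analyze(signal)
restored = synthesize(low, high)
```

With this choice of c, `restored` matches `signal` up to floating-point rounding, and each band holds half as many samples as the input.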

Frequency response of the quadrature mirror filter

The frequency responses of the filters designed above are shown in FIGS. 3 and 4. FIG. 5 shows the multiresolution decomposition of a one-dimensional signal, and FIG. 6 shows the tree structure of a one-dimensional signal decomposed into multiple resolutions.

2. Duplex Multilevel Adaptive Compression Algorithm

Overview

ITU-T G.722 basically splits the input signal into two bands, a low band (0-4 kHz) and a high band (4-8 kHz), using a quadrature mirror filter, applies differential quantization to each band, and then combines the low and high bands for transmission. The QMF structure must satisfy perfect reconstruction and must also have linear phase. To obtain such a structure, suitable filter coefficients A_k must be chosen. If each coefficient is set to 1 for simplicity, the overall gain becomes 2, so the energy doubles at every stage. Because this can be compensated in the quantization stage, the gain is not considered here.
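
The energy doubling can be checked directly for the two-tap case. The following assumes the unit-coefficient filters [1 1] and [1 -1] applied to consecutive sample pairs, followed by downsampling by two; the symbols x, y_L, and y_H are introduced here for illustration only.

```latex
\begin{aligned}
y_L[n] &= x[2n] + x[2n+1], \qquad y_H[n] = x[2n] - x[2n+1],\\
y_L[n]^2 + y_H[n]^2 &= 2\left(x[2n]^2 + x[2n+1]^2\right).
\end{aligned}
```

Summing over n, the combined energy of the two bands is twice the input energy, so scaling both filters by 1/\sqrt{2} would leave the energy unchanged.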

The present invention substantially changes the conventional G.722 scheme: the low-band signal that has been processed once is decomposed again, repeatedly and in two dimensions, up to a certain level depending on the traffic state, so that the signal is analyzed in a structure with duplex multiresolution. In the multiresolution decomposition and the quantization, the sampling frequency and the quantization step are each changed repeatedly, up to a certain level, according to network conditions. The focus is on preventing degradation of real-time voice quality in an Internet environment where transmission quality is not guaranteed, particularly degradation caused by packet delay, and on delivering relatively higher-quality real-time voice when traffic conditions improve. To that end, the invention monitors the network load and variably adjusts the rate at which data to be transmitted is generated, providing a design and transmission method for an efficient voice compression algorithm that responds appropriately to network conditions.

Voice signal decomposition

The waveform of the "chat.wav" voice data used to explain the decomposition of the voice signal is shown in FIG. 7.

This signal was decomposed into multiple resolutions over three stages, and the low-pass output of each stage is shown in FIG. 8. As FIG. 8 shows, each additional QMF stage removes more high-band content and leaves only the low-band signal. The signal is downsampled at every stage, so its data rate is halved each time.

As in FIG. 8, which shows the decomposition process in the time domain, the input signal is decomposed into several levels of resolution, and the resolution used is determined by the network delay. For fast transmission, the high-pass signals are not transmitted in the present invention.
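
The repeated low-band decomposition can be sketched as follows. This is a minimal illustration under stated assumptions: the low-pass stage is the two-tap [1 1] filter with an assumed gain of 1/2 (a pairwise average), the input sampling rate of 16000 Hz is chosen only for the example, and the names `lowpass_decimate` and `decompose` are hypothetical.

```python
def lowpass_decimate(x):
    """One stage: average each pair of samples and keep one output per pair.

    Assumes the two-tap [1 1] low-pass filter with gain 1/2; the high band
    is simply not computed, since it is not transmitted.
    """
    return [(x[i] + x[i + 1]) / 2 for i in range(0, len(x) - 1, 2)]

def decompose(x, level):
    """Apply `level` low-pass stages in a row, halving the length each time."""
    approx = list(x)
    for _ in range(level):
        approx = lowpass_decimate(approx)
    return approx

SAMPLE_RATE_HZ = 16000  # assumed input rate, for illustration only
signal = [float(n % 8) for n in range(64)]
for level in range(4):
    approx = decompose(signal, level)
    # Sample count and effective sampling rate both halve per stage.
    print(level, len(approx), SAMPLE_RATE_HZ // 2 ** level)
```

Each stage halves both the number of samples and the effective sampling rate, which is the time-domain half of the rate reduction described in the text.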

FIG. 9 shows the block diagram of the voice signal decomposition.

Delay measurement on the network

The present invention adaptively transmits voice using duplex multiresolution by changing the signal transmission rate in view of the network delay state. Information about the delay occurring on the network is exchanged over an auxiliary channel, and when the delay exceeds a certain level the data rate is limited using this information. The network model used in the experiments was built by repeatedly measuring the Yahoo site on the real network with "Ping.exe". The results are shown in FIG. 10.

As FIG. 10 shows, the transmission speed of the network changes constantly. Transmitting data in real time when such random delay leaves transmission quality unguaranteed can severely degrade the transmitted data. To improve this, the present invention divides the transmission delay on the network by a threshold and takes the quotient as the analysis level. The thresholds used were 300 ms, 200 ms, 100 ms, and 50 ms; in each case, every time the analysis level increased, the data generation rate was halved in both the time domain and the frequency domain. In the time domain, decimation was used to reduce the sampling frequency; in the frequency domain, the quantization step of the low-pass signal was enlarged to reduce the data generation rate.
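
The level rule stated above (delay divided by a threshold, keeping the quotient) can be sketched in a few lines. The function names are hypothetical, the cap of 6 is the maximum level given in the example section of this document, and the 16000 Hz input rate is an assumption for illustration. Only the time-domain (decimation) effect is computed here.

```python
def analysis_level(delay_ms, threshold_ms, max_level=6):
    """Integer quotient of measured delay and threshold, capped at max_level."""
    if threshold_ms <= 0:
        raise ValueError("threshold_ms must be positive")
    return min(int(delay_ms // threshold_ms), max_level)

def decimated_rate_hz(input_rate_hz, level):
    """Sampling rate after `level` halvings by decimation."""
    return input_rate_hz / 2 ** level

for threshold in (300, 200, 100, 50):
    level = analysis_level(250, threshold)
    print(threshold, level, decimated_rate_hz(16000, level))
```

A smaller threshold makes the scheme react earlier: the same 250 ms delay gives level 0 with a 300 ms threshold and level 5 with a 50 ms threshold.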

Voice signal synthesis

The transmitted signal is synthesized into the original signal through the inverse quantizer and the quadrature mirror filter. The high-band signals that were not transmitted are ignored, and the original signal is synthesized from the low-band signal alone. The block diagram of this synthesis is shown in FIG. 11.
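
Synthesis from the low band alone can be illustrated by treating every missing high-band sample as zero. The sketch below assumes the simplest two-tap filter pair with gain c = 1/sqrt(2) and omits quantization entirely; `analyze_low`, `synthesize_low_only`, and `round_trip` are hypothetical names.

```python
import math

C = 1 / math.sqrt(2)  # assumed filter gain

def analyze_low(x):
    """Low band of one analysis stage: c[1 1] on sample pairs, decimated by 2."""
    return [C * (x[i] + x[i + 1]) for i in range(0, len(x) - 1, 2)]

def synthesize_low_only(low):
    """One synthesis stage with the high band taken as zero.

    Both reconstructed samples of a pair then equal c * low, which is the
    average of the original pair.
    """
    out = []
    for l in low:
        out.extend([C * l, C * l])
    return out

def round_trip(x, level):
    """Decompose `level` times keeping only the low band, then rebuild."""
    approx = list(x)
    for _ in range(level):
        approx = analyze_low(approx)
    for _ in range(level):
        approx = synthesize_low_only(approx)
    return approx

rebuilt = round_trip([2.0, 4.0, 6.0, 6.0], 1)
```

The rebuilt signal has the original length, but detail within each block of 2^level samples is replaced by the block average, which is where the quality loss at higher levels comes from.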

The present invention is described in more detail through the following example. The invention is not limited to what is described in the example, which is given for illustrative purposes only.

Example

The theme song of the film Titanic, "My Heart Will Go On", was used as the signal; part of the input signal is shown in FIG. 12. To observe the signal-to-noise ratio (SNR) when the signal is transmitted in the same way as conventional G.722, the input signal was coded in G.722 mode 2: 56 kbps, 1-level QMF decomposition, 5 bits for the low band and 2 bits for the high band. The signal was sent through the experimental network model and then passed through inverse quantization and QMF synthesis. The SNR curve of the synthesized signal is shown in FIG. 13, and it shows relatively even SNR values. However, because data is generated at a constant rate without regard to network delay, the voice heard by the user has discontinuities caused by the delay.
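
The 56 kbps figure follows from the bit allocation. The short calculation below assumes the G.722 sub-band sample rate of 8000 samples per second per band (a 16 kHz input split into two bands), which comes from the G.722 standard rather than from this document; the function name is hypothetical.

```python
def subband_bitrate_bps(subband_rate_hz, low_bits, high_bits):
    """Bit rate when each sub-band sample pair carries low_bits + high_bits."""
    return subband_rate_hz * (low_bits + high_bits)

# G.722 mode 2 as described in the text: 5-bit low band, 2-bit high band.
rate_bps = subband_bitrate_bps(8000, 5, 2)
print(rate_bps)
```

Here 8000 x (5 + 2) gives 56000 bit/s, matching the 56 kbps stated for mode 2.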

The formula for determining the analysis level, the key variable used to control the data generation rate in the present invention, is given by Equation 11.

(Equation 11)

After the signal was decomposed according to the analysis level determined in this way, it was quantized with N bits, where N was determined by Equation 12. The maximum analysis level was set to 6, because at levels of 7 and above the multiresolution character almost disappears and the signal is heavily distorted.

(Equation 12)

FIG. 14 shows the SNR obtained by decomposing and synthesizing the voice signal at each given fixed analysis level, ignoring the delay characteristics of the network. In the frequency domain, 7-bit fixed quantization was used.
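
The document does not state how the SNR values were computed. A minimal sketch using the usual definition, 10 log10 of signal energy over error energy, is given below; the function name `snr_db` and the sample values are illustrative assumptions.

```python
import math

def snr_db(original, reconstructed):
    """SNR in dB: signal energy divided by reconstruction-error energy."""
    signal_energy = sum(x * x for x in original)
    noise_energy = sum((x - y) ** 2 for x, y in zip(original, reconstructed))
    if noise_energy == 0:
        return math.inf  # identical signals: no error at all
    return 10 * math.log10(signal_energy / noise_energy)

value = snr_db([10.0, 0.0], [9.0, 0.0])  # energy 100, error energy 1
```

Curves such as those in FIGS. 13 to 17 would presumably be obtained by evaluating a measure of this kind over successive frames of the signal.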

FIG. 15 shows the result when the analysis level is determined from the transmission delay occurring on the network, decimation is applied in the time domain (adaptive decomposition), and 7-bit fixed quantization is used in the frequency domain to control the data generation rate.

FIG. 16 shows the result when no time-domain decomposition is applied and data is generated by adjusting the quantization step according to the analysis level in the frequency domain only (adaptive quantization). Compared with FIG. 15, the signal degradation is relatively large.
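
The effect of enlarging the quantization step can be illustrated with a plain uniform quantizer. This is a stand-in chosen for simplicity, not the SB-ADPCM quantizer of the invention, and the mapping from analysis level to bit count (Equation 12) is not reproduced here; `quantize`, `full_scale`, and the sample values are assumptions.

```python
def quantize(x, bits, full_scale=1.0):
    """Uniform mid-rise quantizer over [-full_scale, +full_scale]."""
    levels = 2 ** bits
    step = 2 * full_scale / levels
    out = []
    for v in x:
        index = int((v + full_scale) / step)    # cell number, counted from the bottom
        index = max(0, min(levels - 1, index))  # clamp out-of-range inputs
        out.append(-full_scale + (index + 0.5) * step)  # cell midpoint
    return out

samples = [-0.9, -0.3, 0.0, 0.25, 0.8]
for bits in (7, 6, 5):
    step = 2.0 / 2 ** bits
    worst = max(abs(a - b) for a, b in zip(samples, quantize(samples, bits)))
    print(bits, step, worst)
```

Dropping one bit doubles the step, halves the bits per sample, and doubles the worst-case error bound (half a step), which is consistent with the larger degradation reported when only the quantizer is adapted.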

The voice signal transmission method provided by the present invention applies decomposition and synthesis simultaneously in the time domain and the frequency domain so that the data generation rate can be adjusted adaptively according to the analysis level, which reflects the delay characteristics of the network. The SNR measured with this method is shown in FIG. 17. Compared at the same compression ratio, the method attempted in the present invention showed a more stable SNR than the other algorithms in a transmission-delay environment. In addition, even when the network load worsened, it adapted quickly to the change and provided perceptible voice at a low bit rate, mitigating voice dropouts.

The present invention decomposes the input signal into duplex multiresolution using a perfect-reconstruction quadrature mirror filter, substantially changes the format used in the G.722 standard to perform quantization with 7 bits or fewer, and exchanges timing and control information about transmission delay over a 1-bit auxiliary data channel, thereby achieving low-delay, high-quality transmission. Rather than being limited to a 300 ms delay environment, the invention sets thresholds at several steps (300 ms, 200 ms, 100 ms, 50 ms) and, each time the delay exceeds a multiple of the threshold, increases the analysis level of the quadrature mirror filter and the quantization step, reducing overall data generation. In an Internet environment, SNR must be sacrificed to transmit signals quickly. However, by exploiting the characteristics of voice signals and dynamically varying their resolution without the listener noticing, relatively low-noise sound quality was obtained at the same bit rate even under constantly changing network delay. Each time the network delay exceeded a given level, the transmitted information was halved, and by reducing the data generation rate in the time domain and the frequency domain simultaneously, the goal of a better signal-to-noise ratio at the same bit rate was achieved.

By adaptively changing the decomposition resolution and the quantization step in the time and frequency domains, the voice signal transmission method provided by the present invention keeps the sound quality of the transmitted signals relatively even, even in an environment such as the Internet where QoS is not guaranteed.

By providing an algorithm that can effectively transmit real-time, high-quality voice signals beyond the telephone-speech range over the Internet, the invention prevents severe sound-quality degradation in network environments with poor delay characteristics and keeps the sound quality even overall.

The voice signal transmission method provided by the present invention applies decomposition and synthesis simultaneously in the time domain and the frequency domain, so that the data generation rate can be adjusted adaptively according to the decomposition level, which reflects the delay characteristics of the network. The SNR changes measured with this method are shown in FIG. 17. Compared at the same compression ratio, the method according to the present invention showed a more stable SNR under transmission delay than the other algorithms. In addition, even when the network load worsened, the method adapted quickly to the change and delivered intelligible voice at a low bit rate, which mitigated interruptions in the voice stream.
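The time-domain side of this decomposition and synthesis can be sketched with the simplest perfect-reconstruction two-band filter pair, the two-tap Haar pair. This filter is a stand-in chosen for brevity. The text does not give the coefficients of the patent's QMF, and all function names here are hypothetical. Each level splits the low band again and decimates by 2, so the low band shrinks by half per level.

```python
import math

_S = 1.0 / math.sqrt(2.0)


def analyze(x):
    """Split an even-length signal into low and high bands, each decimated by 2."""
    if len(x) % 2:
        raise ValueError("signal length must be even")
    low = [(x[i] + x[i + 1]) * _S for i in range(0, len(x), 2)]
    high = [(x[i] - x[i + 1]) * _S for i in range(0, len(x), 2)]
    return low, high


def synthesize(low, high):
    """Invert analyze(): interleave the reconstructed sample pairs."""
    out = []
    for l, h in zip(low, high):
        out.append((l + h) * _S)
        out.append((l - h) * _S)
    return out


def decompose(x, levels):
    """Apply analyze() repeatedly to the low band, once per decomposition level."""
    low, highs = list(x), []
    for _ in range(levels):
        low, high = analyze(low)
        highs.append(high)
    return low, highs


def reconstruct(low, highs):
    """Rebuild the signal from the coarsest low band and the stored high bands."""
    for high in reversed(highs):
        low = synthesize(low, high)
    return low


if __name__ == "__main__":
    signal = [3.0, 1.0, -2.0, 4.0, 0.5, 0.5, 7.0, -1.0]
    low, highs = decompose(signal, levels=2)
    print(len(low), reconstruct(low, highs))
```

When no band is discarded or coarsely quantized, the synthesis stage returns the input up to floating-point rounding, which is the perfect-reconstruction property the text relies on.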

Claims

A dual multi-resolution adaptive compressed voice signal transmission method over the Internet protocol, comprising: adaptively decomposing a signal into dual multi-resolution through a quadrature mirror filter (QMF) that guarantees perfect reconstruction and through quantization level control; and synthesizing the dual multi-resolution signal into the original signal through inverse quantization level control and an inverse quadrature mirror filter, wherein voice is transmitted adaptively by varying the signal transmission rate according to the delay state of the network.

The method of claim 1, wherein the signal transmission rate is varied according to the network delay state by applying decomposition and synthesis simultaneously in the time domain and the frequency domain, so that the data generation rate can be adjusted adaptively according to a decomposition level representing the delay characteristics of the network.

The method of claim 2, wherein, to halve the data generation rate in both the time domain and the frequency domain each time the decomposition level increases, decimation is applied repeatedly in the time domain to lower the sampling frequency, and the quantization interval of the low-pass-filtered signal is repeatedly increased by a constant amount in the frequency domain to reduce the data generation rate.

The method of claim 2, wherein the decomposition level is determined by dividing the transmission delay on the network by a threshold and taking the quotient.

The method of claim 4, wherein the threshold is selected from the group consisting of 300 ms, 200 ms, 100 ms, and 50 ms.

The method of claim 2, wherein the decomposition level is determined by the following formula:

[formula not present in the extracted text]

The method of claim 1, wherein the value N in the adaptive quantization step is determined by the following formula:

[formula not present in the extracted text]

The method of claim 7, wherein the maximum value of the decomposition level in the formula for determining N is 6.