KR20010026290A

KR20010026290A - Method for automatically detecting pitch points of voice signals

Info

Publication number: KR20010026290A
Application number: KR1019990037536A
Authority: KR
Inventors: 류승표; 안종영
Original assignee: 박종섭; 현대전자산업 주식회사
Priority date: 1999-09-04
Filing date: 1999-09-04
Publication date: 2001-04-06

Abstract

PURPOSE: A method for automatically detecting pitch points of a voice signal is provided to detect the pitch accurately by automatically detecting the pitch points of an input voice signal. CONSTITUTION: An AMDF signal is extracted from the voice signal for each predetermined section (S10). The pitch period in each section of the voice signal is detected using the extracted AMDF signal (S11). The detected pitch periods are arranged in time order and a pitch envelope is drawn (S12). The pitch points in the voice signal are detected based on the drawn pitch envelope (S13). Step S13 includes a step of searching for the point with the greatest amplitude in the voice signal waveform and a step of setting that point as the first pitch point.

Description

METHOD FOR AUTOMATICALLY DETECTING PITCH POINTS OF VOICE SIGNALS

The present invention relates to a speech recognition apparatus and, more particularly, to a method for automatically detecting pitch points of a voice signal.

In general, in a speech recognition apparatus, the pitch frequency of the input voice signal is a factor that determines synthesized sound quality. Therefore, when speech recognition and speech synthesis are performed, the periodic pitch points in a human voice signal must be detected beforehand as a preliminary step for maintaining accurate synthesized sound quality.

Conventionally, however, a person had to inspect the voice signal visually on a display and detect the pitch points of the input voice signal one by one. As a result, it was difficult to chart the pitch characteristics of the voice signal, the work took a long time, and the pitch could not be detected accurately.

Accordingly, the present invention is intended to overcome the conventional problems described above, and an object of the present invention is to provide a method for automatically detecting pitch points of a voice signal, in which the pitch points of an input voice signal are detected automatically.

To achieve this object, the method for automatically detecting pitch points of a voice signal comprises the steps of: extracting an AMDF signal from the voice signal for each predetermined section; detecting the pitch period in each section of the voice signal using the AMDF signal extracted for each section; drawing a pitch envelope by arranging the detected pitch periods in time order; and detecting the pitch points in the voice signal with reference to the completed pitch envelope.

Accordingly, the method of the present invention can detect pitch points automatically, and does so more accurately and quickly than the conventional method of detecting the pitch points of a voice signal by hand.

FIG. 1 is a flowchart showing the method for automatically detecting pitch points of a voice signal according to the present invention;

FIG. 2 is a graph of an AMDF signal;

FIG. 3 is a graph showing the voice waveform of the utterance "말하세요" ("please speak");

FIG. 4 is a graph showing the pitch envelope corresponding to the voice waveform of FIG. 3;

FIG. 5 is a graph showing a voice signal waveform in which a fixed pattern repeats at periodic intervals; and

FIG. 6 is a flowchart showing the operation of the pitch point detection step of FIG. 1.

Hereinafter, an embodiment of the present invention will be described in detail with reference to the accompanying drawings.

FIG. 1 is a flowchart showing the method for automatically detecting pitch points of a voice signal according to the present invention.

Step S10 extracts an AMDF signal from the voice signal for each predetermined section.

The AMDF (average magnitude difference function) signal represents, for each section of the voice signal, the average of the amplitude differences.
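The patent does not write out the AMDF formula. One common textbook definition, given here as an assumption for a section of N samples and a lag k, is:

$$
D(k) = \frac{1}{N-k}\sum_{n=0}^{N-1-k} \bigl|\, s(n) - s(n+k) \,\bigr|, \qquad 0 \le k < N .
$$

For a signal that is periodic with period T, D(k) drops toward zero near k = T, 2T, and so on, which is the property the pitch period detection relies on.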

Step S11 detects the pitch period in each section of the voice signal using the AMDF signal extracted for each predetermined section.

Step S12 draws a pitch envelope by arranging the detected pitch periods of the voice signal in time order.

Step S13 detects the pitch points in the voice signal with reference to the completed pitch envelope.

The method for automatically detecting pitch points of a voice signal according to the present invention, configured as described above, operates as follows.

First, in the step of extracting an AMDF signal from the voice signal for each predetermined section (S10), an AMDF operation is performed on the voice signal s(n), which has n subsections, to obtain an AMDF signal graph such as the one shown in FIG. 2.
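Step S10 can be sketched in a few lines. The following is a minimal illustration, not the patent's implementation: it assumes the AMDF at lag k is the mean of |s(n) - s(n+k)| over the overlapping samples of one section, and the names `amdf` and `max_lag` are hypothetical.

```python
import numpy as np

def amdf(frame, max_lag):
    """Average magnitude difference function for lags 0 .. max_lag-1.

    Assumed definition: mean of |s(n) - s(n+k)| over the overlapping
    part of one section (frame). The patent does not give the formula.
    """
    frame = np.asarray(frame, dtype=float)
    n = len(frame)
    if not 0 < max_lag <= n:
        raise ValueError("max_lag must be in 1..len(frame)")
    out = np.empty(max_lag)
    for k in range(max_lag):
        # Compare the frame with a copy of itself shifted by k samples.
        out[k] = np.mean(np.abs(frame[: n - k] - frame[k:]))
    return out
```

Applied to a section that contains several pitch periods, the result is zero at lag 0 and dips again near each multiple of the pitch period, which is the shape FIG. 2 is described as showing.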

Once the AMDF operation has been performed and an AMDF signal has been extracted for each predetermined section, the pitch period in each section of the voice signal is detected using those AMDF signals (S11).

As shown in FIG. 2, the pitch period can be obtained by measuring the distance between the origin of the AMDF signal extracted for a section and the first minimum that appears after the origin.

In particular, when the pitch period is obtained in this way, the present invention judges that there is no pitch if the AMDF signal has excessive ripple or if the energy of the section is very small.
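The rule above (first minimum after the origin, with "no pitch" for rippled or low-energy sections) might look like the following sketch. The patent gives no thresholds, so `min_lag`, `depth`, `energy_floor`, and `max_minima` are all illustrative assumptions, as is the use of a count of local minima as the ripple measure.

```python
import numpy as np

def pitch_period(d, frame_energy, min_lag=20, depth=0.4,
                 energy_floor=1e-4, max_minima=8):
    """Return the lag of the first deep AMDF minimum, or 0 for "no pitch".

    d            : AMDF values indexed by lag (d[0] is the origin)
    frame_energy : mean squared amplitude of the section
    All thresholds are assumptions, not values taken from the patent.
    """
    d = np.asarray(d, dtype=float)
    if d.size < 3 or frame_energy < energy_floor:
        return 0  # section energy too small to carry a pitch
    k = np.arange(1, len(d) - 1)
    # Local minima: lower than the left neighbour, not above the right one.
    minima = k[(d[k] < d[k - 1]) & (d[k] <= d[k + 1])]
    minima = minima[minima >= min_lag]
    if len(minima) > max_minima:
        return 0  # excessive ripple: treat as unvoiced
    deep = minima[d[minima] < depth * d.max()]
    return int(deep[0]) if len(deep) else 0
```

Requiring the minimum to be "deep" relative to the AMDF maximum is one way to skip shallow dips near the origin; other criteria would fit the patent's wording equally well.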

Once the pitch period of each predetermined section of the voice signal has been obtained, the detected pitch periods are arranged in time order to draw the pitch envelope (S12).

To make the graph easier to read, the pitch envelope is drawn using the pitch frequency, which is the reciprocal of the pitch period. The pitch envelope is then used for linear comparison with each sample of the voice signal during pitch point detection.
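As a rough sketch of step S12, the per-section pitch periods can be turned into a pitch-frequency envelope and then stretched to one value per sample. Reading "linear comparison with each sample" as linear interpolation is an assumption, and `pitch_envelope`, `envelope_per_sample`, `sample_rate`, and `hop` are hypothetical names.

```python
import numpy as np

def pitch_envelope(periods, sample_rate):
    """Pitch frequency (Hz) per section; 0 where no pitch was found."""
    periods = np.asarray(periods, dtype=float)
    freq = np.zeros_like(periods)
    voiced = periods > 0
    # Frequency is the reciprocal of the period (period given in samples).
    freq[voiced] = sample_rate / periods[voiced]
    return freq

def envelope_per_sample(freq, hop, num_samples):
    """Linearly interpolate the per-section envelope to every sample.

    Assumes section i starts at sample i * hop.
    """
    starts = np.arange(len(freq)) * hop
    return np.interp(np.arange(num_samples), starts, freq)
```

Interpolating across sections marked as having no pitch blends toward zero; a real implementation would probably treat those sections separately.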

For example, the pitch envelope corresponding to the voice waveform of the utterance "말하세요" ("please speak") shown in FIG. 3 appears as shown in FIG. 4.

Once the pitch envelope has been drawn, the pitch points in the voice signal are detected with reference to the completed pitch envelope (S13).

For example, for a voice signal waveform such as the one shown in FIG. 5, in which a fixed pattern repeats at periodic intervals (the pitch period) and a highest point (the pitch point) appears in each repetition, the pitch points are detected by the procedure shown in FIG. 6.

In FIG. 6, s[n] is a voice sample, L is the number of voice samples, and P[n] is the pitch period at the corresponding pitch point. P[0] is the initial pitch point, a negative value inside [ ] means that the point lies to the left of the initial pitch point, and the final result is P[ ].

In the pitch point detection step (S13), the point of greatest amplitude in the voice signal waveform is found first (S20), and that point is set as the initial pitch point (S21).

Once the initial pitch point has been set, the pitch period at the initial pitch point is calculated and the position is moved backward by one pitch period. At the new position, the pitch period of that position is calculated, and the second highest-amplitude point is searched for in the region extending half a pitch period before and after it. The pitch period at this second pitch point is then calculated, the position is moved backward by one pitch period again, and the third highest-amplitude point is searched for in the same way. This operation is repeated until the position, which started from the initial pitch point, reaches the origin of the voice signal waveform (S22 to S25).

In other words, the pitch points of the voice signal waveform of FIG. 5 are detected while moving to the left from the pitch point that was set as the initial pitch point.

After all pitch points to the left of the initial pitch point have been detected, pitch points are detected while moving to the right from the initial pitch point.

In this case, the pitch period at the initial pitch point is calculated and the position is moved forward by one pitch period. At the new position, the pitch period of that position is calculated, and the second highest-amplitude point is searched for in the region extending half a pitch period before and after it. The pitch period at this second pitch point is then calculated, the position is moved forward by one pitch period again, and the third highest-amplitude point is searched for in the same way. This operation is repeated until the position, which started from the initial pitch point, reaches the end point of the voice signal waveform, and the procedure then returns (S26 to S31).
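The left-then-right search of steps S20 to S31 can be sketched as below. This is an illustration under stated assumptions rather than the flowchart of FIG. 6: `period_at` is a hypothetical per-sample array of pitch periods in samples (for example, derived from the pitch envelope), "greatest amplitude" is taken as the largest positive sample, and the search in a direction stops when it meets a position with no pitch.

```python
import numpy as np

def detect_pitch_points(s, period_at):
    """Return sorted sample indices of the detected pitch points.

    s         : voice samples
    period_at : pitch period in samples at every sample index (0 = no pitch)
    """
    s = np.asarray(s, dtype=float)
    period_at = np.asarray(period_at)
    total = len(s)
    first = int(np.argmax(s))          # S20/S21: largest peak is the start
    points = [first]
    for direction in (-1, 1):          # first leftward, then rightward
        cur = first
        while True:
            p = int(period_at[cur])
            if p <= 0:
                break
            guess = cur + direction * p    # jump one pitch period
            if guess < 0 or guess >= total:
                break                      # reached origin or end point
            half = int(period_at[guess]) // 2
            if half <= 0:
                break
            lo = max(guess - half, 0)
            hi = min(guess + half + 1, total)
            nxt = lo + int(np.argmax(s[lo:hi]))  # peak within +/- half period
            if (nxt - cur) * direction <= 0:
                break                      # no progress: stop this direction
            points.append(nxt)
            cur = nxt
    return sorted(points)
```

Searching half a period on either side of the predicted position lets the detected points follow small period-to-period jitter instead of drifting on a fixed grid.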

As described above, because the method according to the present invention detects the pitch points of an input voice signal automatically, it can detect pitch points more accurately and quickly than the conventional method of detecting them by hand.

What has been described above is only one embodiment for carrying out the method for automatically detecting pitch points of a voice signal according to the present invention. The present invention is not limited to this embodiment, and anyone with ordinary skill in the art to which the invention pertains can make various modifications without departing from the gist of the invention as claimed in the following claims.

Claims

A method for automatically detecting pitch points of a voice signal, comprising the steps of:

extracting an AMDF signal from the voice signal for each predetermined section (S10);

detecting the pitch period in each section of the voice signal using the AMDF signal extracted for each predetermined section (S11);

drawing a pitch envelope by arranging the detected pitch periods of the voice signal in time order (S12); and

detecting the pitch points in the voice signal with reference to the completed pitch envelope (S13).

The method of claim 1, wherein the pitch point detection step (S13) comprises the steps of:

finding the point of greatest amplitude in the voice signal waveform (S20);

setting the point of greatest amplitude as the initial pitch point (S21);

calculating the pitch period at the initial pitch point and moving backward by one pitch period, calculating the pitch period at the new position and finding the second highest-amplitude point in the region extending half a pitch period before and after it, calculating the pitch period at the second pitch point and moving backward by one pitch period, and calculating the pitch period at the new position and finding the third highest-amplitude point in the region extending half a pitch period before and after it, this operation being repeated until the position, which started from the initial pitch point, reaches the origin of the voice signal waveform (S22 to S25); and

calculating the pitch period at the initial pitch point and moving forward by one pitch period, calculating the pitch period at the new position and finding the second highest-amplitude point in the region extending half a pitch period before and after it, calculating the pitch period at the second pitch point and moving forward by one pitch period, and calculating the pitch period at the new position and finding the third highest-amplitude point in the region extending half a pitch period before and after it, this operation being repeated until the position, which started from the initial pitch point, reaches the end point of the voice signal waveform, and then returning (S26 to S31).