KR940005042B1

KR940005042B1 - Synthesis method and apparatus of the Korean language

Info

Publication number: KR940005042B1
Application number: KR1019910025702A
Authority: KR
Inventors: 정광균; 이윤근
Original assignee: 주식회사 금성사; 이헌조
Priority date: 1991-12-31
Filing date: 1991-12-31
Publication date: 1994-06-10
Also published as: KR930014271A

Abstract

The synthesis method uses formant synthesis to reduce memory requirements, allow unlimited speech data to be stored, and allow the speech signal data to be adjusted. The device includes an RS-232C interface (10), a controller (20) that controls the speech synthesis device, a digital signal processor (30) that processes the output data of the RS-232C interface (10), a program memory (40), a data memory (50), a digital-to-analog converter (60) that converts the digital signal of the digital signal processor (30) into an analog signal, a low-pass filter (70) that filters the analog signal, and an amplifier (80) that outputs the signal through a speaker (90).

Description

Korean Synthesis Method and Apparatus Using Formants

FIG. 1 is a block diagram of a conventional system.

FIG. 2 is a hardware system configuration diagram of the present invention.

FIG. 3 is a flowchart of the control operation of the present invention.

FIG. 4 is a configuration diagram of the Klatt synthesizer applied to the present invention.

FIG. 5 is a waveform diagram for explaining the operation of the present invention.

* Explanation of reference numerals for the main parts of the drawings

10: RS-232C   20: control unit

30: digital signal processor   40, 50: memory

60: digital-to-analog converter   70: low-pass filter

80: amplifier   90: speaker

The present invention relates to a speech synthesis method and apparatus, and more particularly to a speech synthesis method and apparatus that use formants to synthesize Korean without limit.

Conventionally, as shown in FIG. 1, a system consisted of a speech analysis section and a speech synthesis section connected to a speaker (9). The analysis section comprised a speech signal (1), a low-pass filter (2), an analog-to-digital converter (3) that converts the analog signal to digital, an analysis algorithm (4), and an analysis data storage unit (5). The synthesis section comprised the analysis data storage unit (5), a synthesis algorithm (6), a digital-to-analog converter (7), and a low-pass filter (8).

Speech synthesis of this kind necessarily requires an analysis unit (4); the analyzed data is stored in a memory (5), and the synthesis unit (6) then uses the stored data to synthesize speech. Speech synthesis therefore requires both an analysis algorithm and a synthesis algorithm, and a number of synthesis methods exist that differ in these two algorithms.

Among these, the conventional waveform coding method converts the speech signal itself from analog to digital, compresses it by a factor of 2 to 4, and stores it in memory; at synthesis time the stored data is decompressed and played back.

Consequently, only predetermined sentences can be synthesized, and a considerable amount of memory is required. The analysis-synthesis method, in turn, models the vocal tract per word or per syllable, stores the model coefficients in memory, and uses the stored coefficients at synthesis time.

This method also requires a large amount of memory and has difficulty controlling the timbre, gender, and speed of the voice.

Accordingly, the present invention was devised to solve the above problems of the prior art. Its object is to provide a speech synthesis method and apparatus using formant synthesis, which requires little memory, makes it easy to vary the sound, can store unlimited speech data, and allows the speech data to be modified freely.

A preferred embodiment for achieving the above object is described in detail below with reference to the accompanying drawings.

FIG. 2 is a hardware system configuration diagram of the present invention. Data received at the input (11) of the RS-232C (10) is passed through the control unit (20) to the digital signal processor (30), the main central processing unit that executes the speech synthesis program.

The control unit (20) controls the digital signal processor (30), the RS-232C (10), the program memory (40), the data memory (50), and the digital-to-analog converter (60).

The program memory (40) holds the control program and the speech synthesis program.

The data memory (50) consists of the speech synthesis database (DBASE), the dictionaries, and the working memory required by the digital signal processor (30).

The digital-to-analog converter (60) converts the synthesized digital signal into an analog signal.

The output signal of the digital-to-analog converter (60) is output to the speaker (90) through the low-pass filter (70) and the amplifier (80).

When operation starts in this configuration, the system checks whether all of the data to be synthesized has been received. If not, the data received so far is stored in a buffer; once everything has been received, the synthesis program is executed.

The operation is described in detail below with reference to FIGS. 3 to 5.

When operation starts (101), the syllables, words, sentences, and the like of the speech to be synthesized are received (102) through the data input (11).

The syllables, words, and sentences are input as 2-byte combination-type codes, which can be separated into initial, medial, and final sounds. English text and symbols are then processed (103).
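The patent does not spell out the bit layout of the 2-byte combination-type code. As a rough sketch, assuming the common johab-style layout (one flag bit followed by three 5-bit fields for the initial, medial, and final sounds), the split could look like this. The function name `split_combination_code` is hypothetical, and mapping the field values to individual jamo is left out.

```python
def split_combination_code(code):
    """Split a 16-bit combination-type Hangul code into three 5-bit fields.

    Assumed layout (not stated in the patent): bit 15 is a flag set to 1,
    bits 14-10 hold the initial, bits 9-5 the medial, bits 4-0 the final.
    """
    if not (0x8000 <= code <= 0xFFFF):
        raise ValueError("expected a 16-bit code with the top bit set")
    initial = (code >> 10) & 0x1F
    medial = (code >> 5) & 0x1F
    final = code & 0x1F
    return initial, medial, final


# Example: the code 0x8861 splits into the field values (2, 3, 1).
fields = split_combination_code(0x8861)
```

Working on the separated fields is what lets the later steps (stress, phonological change, nasalization) inspect the initial and final consonants of each syllable independently.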

English and symbol processing converts English text and symbols into their Korean pronunciation. English words are looked up in the English and symbol dictionary (104) and converted to Hangul.

Next, long-sound processing (105) is performed.

In Korean, a word may contain a syllable that is longer than the others; this is called a long sound. Naturalness drops sharply if long and short sounds are not distinguished in pronunciation. Each such word and the position of its long syllable are therefore stored in a dictionary (106); each input word is compared with the registered words, and if it contains a long sound, a special mark is placed before that syllable (for example, 바른자세 → 바른∼자세).
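The dictionary lookup described above can be sketched as follows. This is only an illustration: the dictionary structure (word mapped to the index of its long syllable) and the names `LONG_SOUND_DICT` and `mark_long_sound` are assumptions, and the single entry is taken from the example in the text.

```python
LONG_MARK = "∼"

# Hypothetical long-sound dictionary: word -> index of the long syllable.
# Only the 바른자세 entry comes from the example in the text.
LONG_SOUND_DICT = {"바른자세": 2}


def mark_long_sound(word, dictionary=LONG_SOUND_DICT):
    """Insert the long-sound mark before the long syllable of a registered word."""
    position = dictionary.get(word)
    if position is None:
        return word  # not registered: leave the word unchanged
    return word[:position] + LONG_MARK + word[position:]
```

Calling `mark_long_sound("바른자세")` returns the marked form from the example, while unregistered words pass through unchanged.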

After the long-sound processing (105), stress processing (107) is performed.

Stress is the phenomenon in which one syllable in a word sounds louder than the others. A stressed syllable has greater energy, higher pitch, and longer duration. The rules for primary stress (') and secondary stress (") are defined as follows.

ⅰ) A word consisting of a single syllable is always stressed.

ⅱ) A long sound is always stressed.

ⅲ) In a word of two or three syllables, the first syllable that has a final consonant is stressed; if no syllable has a final consonant, the last syllable is stressed.

ⅳ) In a word of four or more syllables, the first or second syllable is stressed, following rules ⅱ) and ⅲ).

ⅴ) A word of three or more syllables also carries a secondary stress. Secondary stresses do not fall on consecutive syllables and otherwise follow the rules above.

ⅵ) For tense and aspirated consonants, the rules are applied as if the preceding syllable had a final consonant.

Example: 바른∼자세 → 바'른∼'자세
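A minimal sketch of rules ⅰ), ⅱ), ⅲ), and ⅵ) for words of one to three syllables is given below. It assumes the input is precomposed Unicode Hangul rather than the 2-byte combination code, it does not attempt rules ⅳ) and ⅴ), and it treats a long syllable as stressed in addition to whatever rule ⅲ) selects. Those choices, and the names `decompose` and `primary_stress`, are assumptions for illustration.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
TENSE_OR_ASPIRATED = set("ㄲㄸㅃㅆㅉㅊㅋㅌㅍ")


def decompose(syllable):
    """Return (initial jamo, has_final) for a precomposed Hangul syllable."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset < 11172:
        raise ValueError("not a precomposed Hangul syllable")
    return INITIALS[offset // 588], offset % 28 != 0


def primary_stress(word, long_positions=()):
    """Indices of stressed syllables for a word of one to three syllables."""
    n = len(word)
    if not 1 <= n <= 3:
        raise ValueError("this sketch covers words of one to three syllables")
    stressed = set(long_positions)  # rule ii: long sounds are stressed
    if n == 1:
        stressed.add(0)  # rule i: a single syllable is stressed
        return stressed
    parts = [decompose(s) for s in word]
    for i, (_, has_final) in enumerate(parts):
        # rule vi: a following tense or aspirated initial counts as a final
        strong_next = i + 1 < n and parts[i + 1][0] in TENSE_OR_ASPIRATED
        if has_final or strong_next:
            stressed.add(i)  # rule iii: first syllable with a final consonant
            return stressed
    stressed.add(n - 1)  # rule iii: no final consonants, stress the last one
    return stressed
```

For instance, `primary_stress("바른")` selects the second syllable because 른 is the first syllable with a final consonant.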

Next, phonological change processing (108) is performed.

When phonemes are joined to form words and phrases, each phoneme changes its properties depending on its position or on the nature of the adjacent phonemes; this is called phonological change.

The phonological change rules are listed below. Cases not covered by the rules (such as 'ㄴ' insertion and some instances of tensification) are handled with a dictionary (109).

That is, a dictionary of particles and endings was built to separate function morphemes (endings and particles) from each word.

ⅰ) 'ㅎ' deletion, aspiration

ⅱ) Final-consonant reduction (귀착), tensification, liaison

ⅲ) Consonant assimilation 1

ⅳ) Consonant assimilation 2

ⅴ) Palatalization

When the final consonant is 'ㄷ' or 'ㅌ' and the medial vowel of the preceding syllable is 'ㅣ', the final consonant becomes 'ㅈ' or 'ㅊ'.

After the phonological change processing (108), prosody implementation and phonological phenomenon processing (110) are performed.

Phonological phenomenon processing is carried out as follows.

ⅰ) Nasalization

If an initial or final consonant adjacent to a vowel is a nasal, that vowel is nasalized.

A nasalized vowel is handled by assigning specific values to the nasal pole (RNP) and nasal zero (RNZ) of the synthesizer. To apply nasalization during synthesis, the frequencies (RNP, RNZ) and bandwidths (Bwnp, Bwnz) of the nasal pole and nasal zero are determined.

These values are obtained as follows, using the formants and segment lengths taken from the formant database (DBASE).

In addition, the first formant (F1) of the nasalized vowel is raised by a fixed amount.

Examples: 'ㅜ' in '눈'; 'ㅕ' and 'ㅜ' in '경우'.
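The nasalization condition can be illustrated with a small detector that flags which vowels are adjacent to a nasal consonant. This sketch assumes precomposed Unicode Hangul input and treats adjacency as extending across syllable boundaries when the intervening initial is the silent 'ㅇ', which is an interpretation of the '경우' example. It only marks the vowels; the RNP, RNZ, and F1 adjustments are not modeled. The names `split` and `nasalized_vowels` are hypothetical.

```python
INITIALS = "ㄱㄲㄴㄷㄸㄹㅁㅂㅃㅅㅆㅇㅈㅉㅊㅋㅌㅍㅎ"
FINALS = ["", "ㄱ", "ㄲ", "ㄳ", "ㄴ", "ㄵ", "ㄶ", "ㄷ", "ㄹ", "ㄺ", "ㄻ", "ㄼ", "ㄽ", "ㄾ",
          "ㄿ", "ㅀ", "ㅁ", "ㅂ", "ㅄ", "ㅅ", "ㅆ", "ㅇ", "ㅈ", "ㅊ", "ㅋ", "ㅌ", "ㅍ", "ㅎ"]
NASAL_INITIALS = {"ㄴ", "ㅁ"}       # initial ㅇ is silent, so it is not nasal
NASAL_FINALS = {"ㄴ", "ㅁ", "ㅇ"}


def split(syllable):
    """Return (initial jamo, final jamo or '') for a precomposed syllable."""
    offset = ord(syllable) - 0xAC00
    if not 0 <= offset < 11172:
        raise ValueError("not a precomposed Hangul syllable")
    return INITIALS[offset // 588], FINALS[offset % 28]


def nasalized_vowels(word):
    """Return one flag per syllable: True if its vowel is next to a nasal."""
    parts = [split(s) for s in word]
    flags = []
    for i, (initial, final) in enumerate(parts):
        own = initial in NASAL_INITIALS or final in NASAL_FINALS
        # vowel directly follows a nasal final across a silent initial ㅇ
        prev = i > 0 and initial == "ㅇ" and parts[i - 1][1] in NASAL_FINALS
        # open syllable directly followed by a nasal initial
        nxt = (i + 1 < len(parts) and final == ""
               and parts[i + 1][0] in NASAL_INITIALS)
        flags.append(own or prev or nxt)
    return flags
```

With these assumptions, both vowels of '경우' are flagged, matching the example in the text.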

ⅱ) Voicing

A voiceless consonant lying between adjacent voiced sounds may become voiced. Tense and aspirated consonants are excluded from this rule.

In general, a voiced sound differs from its voiceless counterpart in energy and length, so it is processed as follows.

① For ㄱ, ㄷ, ㅂ, ㅈ

The excitation signal of the voiceless portion is replaced with a glottal pulse, and the energy is reduced to 70 to 80%.

② For ㅎ

If the final consonant of the preceding syllable is a voiced consonant, 'ㅎ' is dropped; if the preceding syllable has no final consonant, the energy is reduced to 70% and the length is shortened.

③ Monophthongization of '위'

'위' is sometimes classified as a monophthong and sometimes as a diphthong. It tends to be pronounced as a monophthong when a consonant precedes it and as a diphthong otherwise.

④ Shortening of identical consonants

When 'ㄴ', 'ㅁ', 'ㅇ', or 'ㄹ' appears consecutively as a final consonant and the following initial consonant, the length of the final consonant is shortened by the length of the following initial consonant.

Using the final formant information as modified by ① to ④ above, the formant contour, that is, the trajectory between successive formants, is computed.

The formant connection between segments is either continuous (Count) or disconnected (discn), and is implemented as follows.

(F(s), BW(s), AMP(s): formant frequency, bandwidth, and amplitude at the start of the s-th segment;

f(s,m), bw(s,m), amp(s,m): formant frequency, bandwidth, and amplitude of the m-th frame of the s-th segment)

Next comes prosody implementation, in which natural prosody is produced from the following four kinds of information.

The information used here is stored in the pitch contour database (DBase) and the energy contour database (DBase), which are built in advance by experimentally analyzing suitable values for each sound.

① Energy envelope and level: Energy represents the loudness of the synthesized sound, and the energy information is the product of the energy envelope and the energy level.

a. Initial consonants

Plosives: ㄱ, ㄲ, ㅋ, ㄷ, ㄸ, ㅌ, ㅂ, ㅃ, ㅍ (see (a) of FIG. 5)

Fricatives, affricates, and the aspirate: ㅅ, ㅆ, ㅈ, ㅉ, ㅎ (see (a) of FIG. 5)

Voiced consonants: ㄴ, ㅁ, ㅇ, ㄹ (see (c) of FIG. 5)

b. Final consonants

Unreleased stops: ㄱ, ㄷ, ㅂ (see (d) of FIG. 5)

Voiced consonants (see (e) of FIG. 5)

The energy level represents the loudness of a syllable, which is affected by stress.

· Primary stress: ×1.4

· Secondary stress: ×1.2

· No stress: ×1.0
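Since the energy information is the product of the envelope and the level, the stress factors above can be applied as a simple scaling. The sketch below assumes the listed values are multipliers, and it uses a made-up three-point envelope because the actual envelope shapes are given only in FIG. 5. The names `STRESS_LEVEL` and `energy_contour` are hypothetical.

```python
# Stress multipliers from the list above, read as scale factors (assumption).
STRESS_LEVEL = {"primary": 1.4, "secondary": 1.2, "none": 1.0}


def energy_contour(envelope, stress="none"):
    """Scale an energy envelope by the level that corresponds to the stress."""
    level = STRESS_LEVEL[stress]
    return [sample * level for sample in envelope]


# Hypothetical envelope; the real shapes come from FIG. 5.
contour = energy_contour([0.0, 0.5, 1.0], stress="primary")
```

Keeping the envelope and the level separate means one stored envelope per consonant class can serve all three stress conditions.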

② Pitch envelope and level

The pitch period determines the pitch height, which affects prosody and stress.

The pitch information is the sum of the pitch envelope and the pitch level.

- Envelope

a. Last syllable of a word ending with a period (.) (see (e) of FIG. 5)

b. Last syllable of a word ending with a question mark (?) (see (f) of FIG. 5)

c. Last syllable of a word ending with a comma (,) or an exclamation mark (!) (see (g) of FIG. 5)

- Level

a. Within one utterance unit, the pitch period increases by 0.3 msec for each additional word.

From the third word onward it is held constant.

b. If the last word ends with a period (.) or a question mark (?), the pitch period of that word is increased by 1 msec, and that of its last syllable is increased by a further 1.8 msec (for '.') or 0.5 msec (for '?').

c. The pitch period of a stressed syllable is decreased by 0.1 msec.
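The level rules a to c can be combined into a single offset that is added to a base pitch period. The sketch below is one possible reading: it counts words from zero and stops adding 0.3 msec at the third word, and it applies the 1.8 msec increment to sentences ending in a period. Both readings are interpretations, and the function name and parameters are hypothetical.

```python
def pitch_period_offset_ms(word_index, is_last_word=False,
                           is_last_syllable=False, end_mark=None,
                           stressed=False):
    """Offset in msec added to the base pitch period for one syllable."""
    # a. +0.3 msec per additional word, held from the third word onward
    offset = 0.3 * min(word_index, 2)
    # b. final word of an utterance ending in '.' or '?'
    if is_last_word and end_mark in (".", "?"):
        offset += 1.0
        if is_last_syllable:
            offset += 1.8 if end_mark == "." else 0.5
    # c. a stressed syllable gets a shorter period, i.e. a higher pitch
    if stressed:
        offset -= 0.1
    return offset
```

A longer pitch period means a lower pitch, so rules a and b lower the voice toward the end of an utterance while rule c raises it on stressed syllables.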

③ Syllable length

The length of a syllable varies with its character, so it is adjusted according to the nature of the syllable's medial vowel, the number of syllables, stress, the position of the syllable, and so on.

The syllable length (L) generally consists of Ltr1, the transition region from the preceding syllable; Lst, the steady-state interval; and Ltr2, the transition region to the following syllable.

For a diphthong, a delay interval Ltrd is added.

④ Pause duration

In some cases natural-sounding speech is obtained only when a pause of suitable duration is inserted between syllables.

Depending on the nature of the adjoining sounds, experimentally determined pauses are inserted as follows.

Next, data package construction (112) is carried out.

Once the synthesis parameters have been extracted with reference to the DBASE (111) in the prosody implementation and phonological phenomenon processing step (110), the synthesizer reads these parameters and synthesizes the speech signal.

For this purpose the synthesis parameters must be arranged in a fixed format. This arrangement is called the data package.

Here, items ① to ⑥ are syllable-level information and items ⑦ to ⑩ are segment-level information.

The data package is assembled from ① to ⑩ and sent to the synthesizer in the next step (115), which produces the synthesized sound.

The synthesizer of the synthesis step (114) is Klatt's parallel-cascade synthesizer, simplified to suit Korean; its configuration is shown in FIG. 4.

In FIG. 4, the glottal pulse generator (220), which is distinct from the impulse generator (210), is driven by the pitch information of the data package described above. The Gaussian random number generator (230) produces the sound source for voiceless sounds, using the pitch information and random numbers.

That is, the glottal pulse and the random-number signal are multiplied by the energy-information gains Ar (240) and Au (250) in the multipliers (241) and (251), respectively, and then summed in the adder (260) to form the source signal.
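The gain-and-sum stage can be sketched in a few lines. This is a simplified illustration rather than the patent's implementation: a plain impulse train stands in for the glottal pulse shape, the noise comes from Python's `random.gauss`, and the helper names are hypothetical. The two gains correspond to Ar and Au.

```python
import random


def impulse_train(n_samples, period_samples):
    """Voiced stand-in: one unit impulse at the start of every pitch period."""
    return [1.0 if i % period_samples == 0 else 0.0 for i in range(n_samples)]


def gaussian_noise(n_samples, seed=0):
    """Voiceless source: zero-mean, unit-variance Gaussian samples."""
    rng = random.Random(seed)
    return [rng.gauss(0.0, 1.0) for _ in range(n_samples)]


def mix_source(voiced, noise, gain_voiced, gain_noise):
    """Multiply each source by its gain and add them sample by sample."""
    if len(voiced) != len(noise):
        raise ValueError("sources must have the same length")
    return [gain_voiced * v + gain_noise * u for v, u in zip(voiced, noise)]


# 8 samples with a pitch period of 4 samples, mostly voiced with a little noise
source = mix_source(impulse_train(8, 4), gaussian_noise(8), 1.0, 0.1)
```

Setting one gain to zero yields a purely voiced or purely voiceless source, which is how the same structure covers both kinds of sound.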

The source signal passes through the parallel branch (270) or the cascade branch (280), depending on the nature of the syllable, to synthesize the speech signal.

In the cascade branch (280), RNP is the nasal pole, RNZ is the nasal zero, and R is a band-pass filter (BPF). For a nasalized sound, specific values are assigned to RNP and RNZ; otherwise the signal passes through them unchanged.

The filter value y(nT) of the band-pass filter is given in terms of the following quantities:

Bw is the bandwidth,

T is the period, and

f is the formant frequency.

The nasal pole and nasal zero are modeled as follows.
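The filter equations themselves are not reproduced in this text. For orientation, the sketch below uses the second-order resonator and anti-resonator commonly published for Klatt-style formant synthesizers, with coefficients computed from the formant frequency f, the bandwidth Bw, and the sampling period T. Whether the patent uses exactly these expressions is an assumption, and the class names are hypothetical.

```python
import math


class Resonator:
    """Pole pair: y[n] = A*x[n] + B*y[n-1] + C*y[n-2]."""

    def __init__(self, freq_hz, bandwidth_hz, sample_rate_hz):
        t = 1.0 / sample_rate_hz  # sampling period T
        self.c = -math.exp(-2.0 * math.pi * bandwidth_hz * t)
        self.b = (2.0 * math.exp(-math.pi * bandwidth_hz * t)
                  * math.cos(2.0 * math.pi * freq_hz * t))
        self.a = 1.0 - self.b - self.c  # gives unity gain at 0 Hz
        self.y1 = self.y2 = 0.0

    def step(self, x):
        y = self.a * x + self.b * self.y1 + self.c * self.y2
        self.y2, self.y1 = self.y1, y
        return y


class AntiResonator:
    """Zero pair: y[n] = A'*x[n] + B'*x[n-1] + C'*x[n-2]."""

    def __init__(self, freq_hz, bandwidth_hz, sample_rate_hz):
        r = Resonator(freq_hz, bandwidth_hz, sample_rate_hz)
        # Coefficients of the exact inverse of the matching resonator
        self.a = 1.0 / r.a
        self.b = -r.b / r.a
        self.c = -r.c / r.a
        self.x1 = self.x2 = 0.0

    def step(self, x):
        y = self.a * x + self.b * self.x1 + self.c * self.x2
        self.x2, self.x1 = self.x1, x
        return y
```

In this form a nasal pole and a nasal zero set to the same frequency and bandwidth cancel each other, which is consistent with the statement that non-nasalized sounds pass through RNP and RNZ unchanged.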

After the synthesizer coefficients have been set from the synthesis parameters in the data package as described above, passing the source signal through cascade or parallel synthesis produces the desired synthesized speech.

Meanwhile, the databases (DBASE) used in the prosody implementation and phonological phenomenon processing step and in the data package construction step are organized as follows.

① Pitch contour DBASE

The pitch contour DBASE holds the initial pitch together with the pitch level and envelope, and is organized into three types using the pitch envelope and level rules of step (110).

② Energy contour DBASE

Like the pitch contour DBASE, the energy contour DBASE is organized into five types, plus stress, using the energy envelope and level rules of step (110).

③ Demisyllable-unit formant DBASE

This DBASE is organized into ten groups according to the place of articulation of each phoneme, as follows.

Here, within the ㄱ, ㄲ, ㅋ + vowel DBASE, ㄱ + vowel, ㄲ + vowel, and ㅋ + vowel are distinguished by segment length. The ㄴ, ㄷ, ㄸ, ㅌ + vowel group, the ㅁ, ㅂ, ㅃ, ㅍ + vowel group, and so on are likewise distinguished by segment length.

The DBASE for final consonants is obtained by reversing the DBASE for initial consonants.

If the ㄱ + vowel (initial) DBASE is (f1, bw1, f2, bw2, f3, bw3, f4, bw4), the vowel + ㄱ (final) DBASE is (f4, bw4, f3, bw3, f2, bw2, f1, bw1).

Here, f1 to f4 are the frequencies of formants 1 to 4, and

bw1 to bw4 are the bandwidths of formants 1 to 4.
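Taken literally, the reversal above swaps the order of the (frequency, bandwidth) pairs. A small sketch of that literal reading follows; the function name `final_from_initial` is hypothetical, and the entry is represented as a flat tuple as in the text.

```python
def final_from_initial(entry):
    """Reverse the order of (frequency, bandwidth) pairs in a DBASE entry."""
    if len(entry) % 2:
        raise ValueError("entry must hold frequency/bandwidth pairs")
    pairs = [entry[i:i + 2] for i in range(0, len(entry), 2)]
    return tuple(value for pair in reversed(pairs) for value in pair)


initial_entry = ("f1", "bw1", "f2", "bw2", "f3", "bw3", "f4", "bw4")
final_entry = final_from_initial(initial_entry)
```

Deriving the final-consonant entries this way means only the initial-consonant entries need to be stored, which fits the stated goal of reducing memory.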

In total, the formant DBASE contains 10 (groups by place of articulation) × 21 (number of vowels) = 210 kinds of entries.

The segment length DBASE is organized as follows.

Here, the frequencies, bandwidths, and segment lengths in each group's DBASE are values obtained by pronouncing the sounds and measuring them with speech analysis equipment.

As described in detail above, conventional Korean synthesis was practical only for predetermined sentences, made sentences difficult to change, and made large volumes of text difficult to synthesize. The present invention makes it easy both to change sentences and to synthesize large volumes of text, and has the advantage of being applicable to any system that speaks Korean.

Claims

A Korean synthesis apparatus using formants, comprising: a data memory (50) storing a speech synthesis database, dictionaries containing the necessary content, and working data that enables digital signal processing; a program memory (40) containing a speech synthesis program; a control unit (20) that, when output data of an RS-232C (10) is applied, generates control signals to control the speech synthesis apparatus; a digital signal processor (30) that, under control of the control unit (20), processes the output data of the RS-232C (10) using the speech synthesis program in the program memory (40) and the dictionary contents in the data memory (50); a digital-to-analog converter (60) that converts the digital signal synthesized by the digital signal processor (30) and output from the control unit (20) into an analog signal; a low-pass filter (70) that filters the low-frequency signal from the analog signal output by the digital-to-analog converter (60); and an amplifier (80) that amplifies the signal output from the low-pass filter (70) to a predetermined level and sends it out through a speaker (90).

A Korean synthesis method using formants, characterized by sequentially executing: a first step (102) of receiving syllables, words, sentences, and the like to be synthesized through a data input; a second step (103) of processing English text and symbols in the input data using an English and symbol dictionary (104); a third step (105) of long-sound processing, in which each input word is compared with registered words and, if it contains a long sound, a special mark is placed before the long syllable; a fourth step (107) of processing the stress of those parts of the input words that have greater energy, higher pitch, and longer syllable length; a fifth step (108) of processing the phonological changes that arise, according to the position or nature of the phonemes, when phonemes of the input words are joined to form words and phrases; a sixth step (110) of phonological phenomenon processing, which handles nasalization and voicing of the syllables of the input words, and of prosody implementation; a seventh step (112) of building a data package, once the synthesis parameters have been extracted after the sixth step, by arranging the parameters in a fixed format to be read by the synthesizer; and an eighth step (114) of synthesizing speech from the data package thus built.

The Korean synthesis method using formants of claim 2, wherein the syllables, words, and sentences of the first step (102) are input as 2-byte combination-type codes that can be separated into initial, medial, and final sounds.

The Korean synthesis method using formants of claim 2, wherein the English and symbol processing of the second step (103) converts English text and symbols into Korean pronunciation using the English and symbol dictionary.

The Korean synthesis method using formants of claim 2, wherein the phonological change processing of the fifth step (108) processes phonological changes using a dictionary of endings and particles for the input words and an irregular phonological change dictionary (109).

The Korean synthesis method using formants of claim 2, wherein the processing of the sixth step (110) uses a pitch contour database and an energy contour database.

The Korean synthesis method using formants of claim 2, wherein the processing of the seventh step (112) uses a demisyllable-unit formant database.