JPH032319B2 - Google Patents

Info

Publication number
JPH032319B2
Authority
JP
Japan
Prior art keywords
pause
sentences
conversational
understanding
fundamental frequency
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired
Application number
JP60173274A
Other languages
Japanese (ja)
Other versions
JPS6234200A (en)
Inventor
Eiji Oohira
Akio Komatsu
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
National Institute of Advanced Industrial Science and Technology AIST
Original Assignee
Agency of Industrial Science and Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Agency of Industrial Science and Technology
Priority to JP60173274A
Publication of JPS6234200A
Publication of JPH032319B2
Granted

Description

[Detailed Description of the Invention]

[Field of Application of the Invention]

The present invention relates to a conversational speech understanding system that understands naturally uttered conversational sentences and responds according to the result of that understanding, and in particular to a method of dividing conversational sentences into units that represent semantic groups.

[Background of the Invention]

Conventionally, systems that use speech as an input means have targeted isolated word speech or continuous speech uttered in a reading style. However, conversational sentences uttered naturally rather than in a reading style (hereinafter simply called conversational sentences) are produced while the speaker assembles his or her thoughts, so grammatically ill-formed sentences arise from misstatements, elliptical expressions, and the like, and multiple sentences are input in succession. Moreover, these sentences are not separated by punctuation. In understanding conversational sentences, therefore, the sentences must first be divided into units that represent semantic groups so that language processing becomes possible. As for methods of dividing input speech, there are proposals such as Japanese Patent Application Laid-Open No. 48-30302, but these concern methods of dividing a limited set of words into phonemes or of dividing grammatically well-formed reading-style sentences into phrases ("bunsetsu"), and they give no consideration to the division of natural conversational sentences.

[Object of the Invention]

An object of the present invention is to provide a conversational speech understanding method that achieves highly reliable understanding with a small amount of processing by dividing conversational sentences, in which grammatically ill-formed sentences exist and multiple sentences are input in succession, into semantically coherent units.

[Summary of the Invention]

To achieve this object, the present invention is characterized in that conversational sentences are divided into semantic groups using prosodic information (speech power, fundamental frequency, and the like) such as the intonation and stress of the speech. Prosodic information arises rationally and naturally from the content of an utterance. Intonation in particular is universal information independent of language: when the content of an utterance is interrogative, the pitch of the voice at the end of the sentence is raised in every country.

[Embodiments of the Invention]

An embodiment of the present invention is shown in Fig. 1. Fig. 1 is a block diagram of a conversational speech understanding system that applies a system for understanding written sentences input as kana characters from a keyboard or the like. In written-sentence understanding, a kana character string is input to the morphological analysis unit 6. The morphological analysis unit 6 detects phrases using the dictionary memory 7 and outputs phrase candidates. The syntactic analysis unit 8 then detects chains of phrase candidates that satisfy the syntax, the semantic analysis unit 10 further detects chains that are semantically natural, and the most certain chain is output as the solution. In a conversational speech understanding system the input means is speech, so the speech must be converted into kana characters. For this purpose the system is provided with a feature extraction unit 1, which obtains the phonemic and prosodic information of the speech, and a speech recognition unit 4, which converts the input speech into kana characters by matching against standard patterns 5.

In written-sentence understanding, the unit of processing is the sentence delimited by periods, and understanding proceeds according to syntactic information and the like based on that unit. In conversational sentences, however, grammatically ill-formed sentences exist and multiple sentences may be input in succession, so understanding them as they are would require preparing syntactic information 9 and the like covering many variant forms. The amount of processing would therefore increase and the reliability of understanding would fall. For this reason conversational sentences must be divided into semantic groups. As a general marker for dividing text into semantic groups, the period of a written sentence can be cited. One characteristic of the locations in a conversational sentence that correspond to period positions is that a pause occurs as the speaker takes a breath. Accordingly, period positions can be detected by treating as a pause any silent interval (an interval in which the speech power is at or below the noise level Pθ) whose length is at least a threshold Pλ (for example, 300 ms). In conversational sentences, however, the speaker talks while thinking, so long pauses also occur after misstatements and slips of thought.
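Purely as an illustration of the thresholded silence detection just described, the following Python sketch finds pauses in a per-frame power contour. Only the 300 ms value for Pλ comes from the text; the frame step, the dB value chosen for the noise level Pθ, and all names are assumptions made here.

```python
import numpy as np

FRAME_MS = 10        # assumed analysis frame step (not specified in the text)
P_THETA_DB = -50.0   # noise level Pθ; this concrete value is an assumption
P_LAMBDA_MS = 300    # pause threshold Pλ; example value quoted in the text

def detect_pauses(power_db: np.ndarray) -> list[tuple[int, int]]:
    """Return (start, end) frame-index pairs of every silent run whose
    duration is at least Pλ, i.e. the pauses described above."""
    min_frames = P_LAMBDA_MS // FRAME_MS
    pauses, run_start = [], None
    for i, p in enumerate(power_db):
        if p <= P_THETA_DB:          # frame counts as silent (power <= Pθ)
            if run_start is None:
                run_start = i
        elif run_start is not None:
            if i - run_start >= min_frames:
                pauses.append((run_start, i))
            run_start = None
    if run_start is not None and len(power_db) - run_start >= min_frames:
        pauses.append((run_start, len(power_db)))
    return pauses
```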

Another piece of prosodic information that characterizes period positions is intonation, the rise and fall of the pitch of the voice. Intonation rises rapidly at the beginning of a sentence and then falls gradually toward the end of the sentence, where it approaches the speaker's lowest fundamental frequency. At locations where a pause arises from a misstatement or a slip of thought, however, the utterance breaks off with the fundamental frequency still high, the fundamental frequency after the pause starts from roughly the same height as before the pause, and the speaker tends to continue the sentence.
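The division method presupposes a per-frame fundamental frequency contour from the feature extraction unit 1. The patent does not say how F0 is obtained; purely for illustration, the following sketch uses a conventional autocorrelation estimator, with the sampling rate, search range, and voicing threshold all assumed.

```python
import numpy as np

def estimate_f0(frame: np.ndarray, sr: int = 8000,
                fmin: float = 60.0, fmax: float = 400.0) -> float:
    """Crude autocorrelation F0 estimate for one windowed frame;
    returns 0.0 when the frame looks unvoiced."""
    frame = np.asarray(frame, dtype=float)
    frame = frame - frame.mean()
    ac = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    if ac[0] <= 0.0:
        return 0.0
    lo = int(sr / fmax)                      # shortest plausible pitch period
    hi = min(int(sr / fmin), len(ac) - 1)    # longest plausible pitch period
    if lo >= hi:
        return 0.0
    lag = lo + int(np.argmax(ac[lo:hi]))
    if ac[lag] < 0.3 * ac[0]:                # weak periodicity: call it unvoiced
        return 0.0
    return sr / lag
```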

The conversational sentence division unit 2 divides conversational sentences into semantic groups using the above characteristics of the positions that correspond to periods. The division method is explained concretely below with reference to Figs. 2 and 3. Fig. 2 shows examples of the shape of the prosodic information near a pause, and Fig. 3 shows a flowchart of the method.

(1) First, in order to detect pauses, locations where a silent interval continues for Pλ or longer are detected and taken as division candidates.

(2) Of the locations where a pause is detected, only those at which the fundamental frequency Fe of the speech immediately before the pause is at or below the speaker's lower-limit frequency are kept as candidates; the rest are judged to be sentence-internal.

(3) Further, if ΔF, the difference between Fe and the fundamental frequency Fs that takes the maximum value after the pause, is at least a threshold (for example, 40 to 50 Hz for male speakers), that position is taken as a division point.

Here, the speaker's lower-limit frequency is the lowest frequency that the speaker currently using the system can produce, multiplied by a constant (for example, 1.1 to 1.2). This information is extracted by the speaker information learning unit 3 and registered in advance as speaker information. The speaker information is obtained from a declarative sentence of several dozen syllables (for example, a greeting).
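Combining steps (1) to (3) with the speaker information learning just described, the following Python sketch shows one possible realization of the division procedure; it assumes detect_pauses from the earlier sketch and a NumPy F0 contour with 0 marking unvoiced frames. The frame step, the search window for Fs, and the concrete constants are illustrative choices within the ranges quoted in the text (ΔF of 40 to 50 Hz, a multiplier of 1.1 to 1.2).

```python
import numpy as np

FRAME_MS = 10               # assumed frame step, as in the earlier sketch
DELTA_F_HZ = 45.0           # ΔF threshold; 40-50 Hz is quoted for male speakers
LOWER_LIMIT_FACTOR = 1.15   # constant multiplier; 1.1-1.2 is quoted in the text
WINDOW_MS = 500             # assumed search window for Fs after the pause

def speaker_lower_limit(f0_declarative: np.ndarray) -> float:
    """Speaker information learning (unit 3): derive the lower-limit
    frequency from a pre-registered declarative utterance."""
    voiced = f0_declarative[f0_declarative > 0]
    return float(voiced.min()) * LOWER_LIMIT_FACTOR

def division_points(f0: np.ndarray, pauses: list[tuple[int, int]],
                    f_lower: float) -> list[int]:
    """Apply steps (2) and (3) to the step (1) candidates in `pauses`."""
    window = WINDOW_MS // FRAME_MS
    points = []
    for start, end in pauses:
        before = f0[:start]
        before = before[before > 0]
        if before.size == 0:
            continue
        fe = before[-1]              # Fe: F0 immediately before the pause
        if fe > f_lower:             # step (2): sentence-internal pause
            continue
        after = f0[end:end + window]
        after = after[after > 0]
        if after.size and after.max() - fe >= DELTA_F_HZ:
            points.append(end)       # step (3): ΔF = Fs - Fe >= threshold
    return points
```

Under these assumptions, division_points(f0, detect_pauses(power_db), speaker_lower_limit(f0_reg)) would yield the frame positions at which the input conversational speech is split into semantically coherent units.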

[Effects of the Invention]

According to the present invention, conversational sentences that contain grammatically ill-formed sentences and in which multiple sentences are input in succession can be divided into semantically coherent units. The subsequent understanding processing is thereby simplified, the amount of processing is reduced, and the reliability of understanding is also improved.

[Brief Description of the Drawings]

Fig. 1 is a block diagram of a conversational speech understanding system, Fig. 2 is a diagram for explaining the prosodic information near a pause, and Fig. 3 is a flowchart of the present method. Explanation of reference numerals: 1, feature extraction unit; 2, conversational sentence division unit; 3, speaker information learning unit.

Claims (1)

[Claims]

1. In a conversational speech understanding method for understanding conversational sentences uttered by voice, a conversational speech understanding method utilizing prosodic information, characterized in that prosodic information is extracted from the input conversational speech; a pause is detected from the length of a silent interval using the extracted prosodic information; candidates for division points are selected by comparing the fundamental frequency immediately before the detected pause with a speaker's lower-limit frequency registered in advance; and the candidate division points are made division points according to the difference between the fundamental frequency immediately before the pause and the fundamental frequency that takes the maximum value immediately after the pause, whereby the input conversational speech is divided into semantically coherent units.
JP60173274A 1985-08-08 1985-08-08 Conversation voice understanding system utilizing meter information Granted JPS6234200A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP60173274A JPS6234200A (en) 1985-08-08 1985-08-08 Conversation voice understanding system utilizing meter information

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
JP60173274A JPS6234200A (en) 1985-08-08 1985-08-08 Conversation voice understanding system utilizing meter information

Publications (2)

Publication Number Publication Date
JPS6234200A JPS6234200A (en) 1987-02-14
JPH032319B2 1991-01-14

Family

ID=15957406

Family Applications (1)

Application Number Title Priority Date Filing Date
JP60173274A Granted JPS6234200A (en) 1985-08-08 1985-08-08 Conversation voice understanding system utilizing meter information

Country Status (1)

Country Link
JP (1) JPS6234200A (en)

Families Citing this family (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JPS62141731U (en) * 1986-02-28 1987-09-07
JP2007032373A (en) * 2005-07-25 2007-02-08 Ebara Corp Structure of casing of horizontal shaft pump for pump gate, horizontal shaft pump for pump gate, and pump gate facility
JP5141695B2 (en) * 2008-02-13 2013-02-13 日本電気株式会社 Symbol insertion device and symbol insertion method
JPWO2019087811A1 (en) * 2017-11-02 2020-09-24 ソニー株式会社 Information processing device and information processing method

Also Published As

Publication number Publication date
JPS6234200A (en) 1987-02-14


Legal Events

Date Code Title Description
EXPY Cancellation because of completion of term