JP6548938B2

JP6548938B2 - Speech processing apparatus and speech processing method

Info

Publication number: JP6548938B2
Application number: JP2015074832A
Authority: JP
Inventors: 正司最上; 渡邉　秀雄
Original assignee: 寿通信機株式会社
Priority date: 2014-01-17
Filing date: 2015-04-01
Publication date: 2019-07-24
Anticipated expiration: 2035-04-01
Also published as: JP2015135267A; JP2016110050A

Description

The present invention relates to a speech processing device, a speech clarification device, and a speech processing method for making speech emitted from the loudspeakers of audio equipment, public-address systems, and the like clear and easy to hear.

Places where many people gather, such as office buildings, public halls, shopping centers, and sports facilities, are generally equipped with public-address (in-house broadcast) systems for providing information and making emergency announcements. In such a system, loudspeakers are installed in, for example, the ceilings or walls of the building, and an audio signal sent from the broadcast room is converted to sound by the loudspeakers. Depending on conditions such as resonance of the ceiling panels or interference with sound from other loudspeakers, however, the content of the announcement can be hard to make out.

Elderly people whose hearing has declined with age, and people with hearing loss, can often hear that someone is speaking without understanding what is being said. Human speech consists of vowels and consonants, and the difficulty is thought to arise because consonants are hard to hear even when vowels are heard relatively easily. Simply raising the loudspeaker volume or the sensitivity of a hearing aid therefore does not readily solve the problem, because it makes the vowels louder along with the consonants.

For this reason, Patent Document 1 below, for example, proposes a speech correction device that corrects the frequency characteristics and level of the sound output from a receiver or similar transducer to match the age-related change in hearing of a user of audio equipment such as a mobile phone, television, or stereo. Patent Documents 2 and 3 below propose methods that detect the speaker's formants and transform them into formants that are easier to hear.

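Patent Document 1: Japanese Unexamined Patent Application Publication No. 2000-209698
Patent Document 2: Japanese Unexamined Patent Application Publication No. 2008-116534
Patent Document 3: Japanese Unexamined Patent Application Publication No. H01-93796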

To make the sound from the loudspeakers of a public-address system such as the one described above easier to hear, one could replace the broadcast equipment and loudspeakers with higher-quality units or rethink the number and placement of the loudspeakers, but doing so is very costly. Meanwhile, the correction methods aimed at elderly people and people with hearing loss can change the timbre (sound quality) of the voice so much that the speech sounds unnatural.

The present invention was devised to solve these problems. One object is to provide a novel speech processing device, speech clarification device, and speech processing method that make speech from the loudspeakers clear and easy to hear even with existing audio equipment or public-address systems. Another object is to provide a novel speech processing device, speech clarification device, and speech processing method that let even elderly people with reduced hearing and people with hearing loss hear speech that sounds natural.

In human hearing, language is understood when sound waves received at the eardrum are converted to electrical signals by the auditory organs and transmitted to the brain, where they are matched against memories built from past experience and recognized as words. The auditory organs are extremely complex, however, and how they work is not well understood.

In recent years, hearing aids and sound collectors, passive devices that apply special assistive speech processing for an individual user, have become widespread among people with hearing loss. These devices pick up speech and other sounds arriving from outside, and many people hesitate to use them because wearing them is tiring or because of how they look. Higher-performance models are also more expensive, which is a significant financial burden.

According to research on the mechanism of speech production, air whose strength and volume have been regulated is released from the vocal cords as a vowel. This is called the first formant. Resonance produced by adjusting what follows, namely the airway, the volume and shape of the oral cavity, the nasal cavity, the vibration and shape of the tongue, and the volume opened by relaxing the upper and lower jaws, then creates several high-energy peaks in the frequency response above the first formant; these peaks are called formants. The peaks are named the second formant, third formant, fourth formant, ..., n-th formant in order from low to high frequency, and their combination is heard as the individual's own timbre.

In general, a listener understands a speech signal by detecting the important frequency components in the flow of the voice, together with their magnitudes, and passing them to the brain. Research on this process is comparatively advanced in linguistics, where the first through fourth formants are regarded as especially important frequency components.

A first invention for solving the above problems is a speech processing device comprising: a band-pass filter that, from one branch of a picked-up speech signal that has been split in two, removes low-frequency components at or below the low-order formants and high-frequency components above the high-order formants; a phase corrector that corrects the phase of the other branch so that it matches the phase of the speech signal that has passed through the band-pass filter; and an adder that adds the speech signal that has passed through the band-pass filter to the speech signal that has passed through the phase corrector and outputs the sum.

With this configuration, only the formant components important for identifying language, roughly the second through fourth formants, are extracted from the original speech signal and added back to it before output. For listeners with normal hearing, the second through fourth formant components that have been attenuated by poor conditions are emphasized, so speech can be heard clearly even over a public-address system operating under poor conditions. For listeners who have particular difficulty hearing the second through fourth formant components, such as elderly people with reduced hearing and people with hearing loss, the content of the speech can be heard clearly without raising the overall volume.

In the present invention, the frequency band from which "low-frequency components at or below the low-order formants and high-frequency components above the high-order formants have been removed" is not particularly limited. It refers, for example, to a band containing the second through fourth formants, which are important for recognizing language, and more preferably the second through fifth formants (the same applies hereinafter).

A second invention is a speech processing device comprising: a band-pass filter that removes, from the left and right speech signals of a picked-up stereo signal, low-frequency components at or below the low-order formants and high-frequency components above the high-order formants; a pair of phase correctors that correct the phases of the left and right speech signals, respectively, to match the phase of the speech signal that has passed through the band-pass filter; and a pair of adders that add the speech signal that has passed through the band-pass filter to the left and right speech signals that have passed through the respective phase correctors and output the sums. With this configuration, clear and easy-to-hear speech can be output for stereo audio just as in the first invention.

A third invention is the speech processing device of the first or second invention in which the band-pass filter comprises a high-pass filter that removes low-frequency components at or below the low-order formants from the one branch of the speech signal, and a low-pass filter that removes high-frequency components above the high-order formants from the speech signal that has passed through the high-pass filter. With this configuration, only the second through fourth or so formant components, which are important for identifying language, can be reliably extracted from the original speech signal.

A fourth invention is the speech processing device of any of the first through third inventions, further comprising an effect adjuster that adjusts the level of the speech signal that has passed through the band-pass filter before it is combined. The attenuation of the second through fourth formant components, which are important for identifying speech, is not uniform; it depends on the conditions and on the performance of the public-address system. For elderly people with reduced hearing and people with hearing loss, the ability to hear those components also varies greatly from person to person. By using the effect adjuster to raise or lower the level of the second through fourth formant components extracted by the band-pass filter as appropriate, speech that is clear and does not sound unnatural can be output.

A fifth invention is a speech clarification device in which any of the first through fourth speech processing devices is housed in a portable enclosure, the surface of the enclosure having an input connector to which a sound-pickup microphone is detachably connected and an output connector through which the speech signal processed by the speech processing device is output to a public-address system. With this configuration, the device of the invention can easily be added to an existing public-address system and therefore delivers its benefit at low cost. Because it is easy to carry, it can also readily be used with the public-address system at an outdoor event venue or with other broadcast equipment.

A sixth invention is a speech processing method comprising: a first step of splitting a picked-up speech signal into two branches; a second step of removing, from one of the branches, low-frequency components at or below the first formant and high-frequency components above the high-order formants; a third step of correcting the phase of the other branch so that it matches the phase of the speech signal processed in the second step; and a fourth step of combining the speech signal processed in the second step with the speech signal processed in the third step and outputting the result. With this configuration, clear and easy-to-hear speech can be output, as in the first invention.

According to the present invention, only the formant components important for identifying language, roughly the second through fourth formants, are extracted from the original speech signal and added back to it before output. For listeners with normal hearing, the second through fourth formant components that have been attenuated by poor conditions are emphasized, so speech can be heard clearly even over a public-address system operating under poor conditions. For listeners who have particular difficulty hearing the second through fourth formant components, such as elderly people with reduced hearing and people with hearing loss, the content of the speech can be heard clearly without raising the overall volume.

FIG. 1 is a diagram showing an example of a frequency analysis of running speech.
FIG. 2 is a diagram illustrating the process of language recognition for a sample taken from running speech.
FIG. 3 is a diagram showing an example in which the level of the speech sound V is raised by 10 dB for a listener with sensorineural hearing loss.
FIG. 4 is a diagram showing the state in which the formants in region G, other than the first formant h1, are extracted by a band-pass filter, amplified, and added as harmonic component signals to the main speech signal.
FIG. 5 is a diagram showing an embodiment of the speech processing device 100 according to the present invention.
FIG. 6 is a circuit diagram of phase corrector 3.
FIG. 7 is a frequency response diagram for input buffer 2 and band-pass filter 30 (4, 5).
FIG. 8 is a diagram of the phase shift characteristic of band-pass filter 30 (4, 5).
FIG. 9 is a diagram showing the phase difference T when phase corrector 3 is absent.
FIG. 10 is a frequency response diagram showing the effect of the phase difference.
FIG. 11 is a diagram showing the variable phase range of phase corrector 3.
FIG. 12 is a diagram showing the output characteristic of effect adjuster 6.
FIG. 13 is a diagram showing in-phase addition when phase corrector 3 is present.
FIG. 14 is a diagram showing the configuration of an existing wired public-address system.
FIG. 15 is a diagram showing the configuration of an existing wireless public-address system.
FIG. 16 is a diagram showing an embodiment of the speech clarification device 200 according to the present invention.
FIG. 17 is a diagram showing another embodiment (stereo audio) of the speech processing device 100 according to the present invention.

Embodiments of the present invention are described below with reference to the accompanying drawings. FIG. 1 shows an example of a frequency analysis of running speech. As noted above, when a person speaks, resonance causes several orders of harmonics, called formants, to be emitted at the same time. Of these, the fundamental h1, which mainly affects the vowels, is called the first formant; the harmonic h2 with the next higher frequency is the second formant, h3 the third formant, h4 the fourth formant, and in general the harmonic hn is the n-th formant in order of increasing frequency. The level falls as the formant order rises, and the second through fifth or so formant components have been found to be especially important for understanding language.

According to research in linguistics, the auditory system detects each formant with high sensitivity, and the ability to perceive words is learned and memorized in early childhood. Consequently, even if part of the linguistic information obtained through hearing is missing, the gap is filled in from accumulated past experience, and this is said to cause no problem in ordinary everyday conversation.

To recognize an incoming sound as words, however, the auditory system must detect these formants, and if the important formant components among them cannot be detected, the sound cannot be recognized as words. For example, experiments have been reported in which removing the 500 Hz to 3 kHz band made speech completely unintelligible.

Hearing loss is broadly classified into three types: conductive hearing loss, caused by disorders of the outer or middle ear; sensorineural hearing loss, caused by disorders of the inner ear, auditory nerve, or brain; and mixed hearing loss, caused by both. Conductive hearing loss, which is a mild form, can be dealt with by, for example, turning up the television or having the other person speak close to the ear. Sensorineural hearing loss progresses with age, and even if the incoming sound is made louder it is heard only as a noise like "gohon, gohon," so that the content of the words cannot be understood. This is thought to be because the ability to detect the mid-level formants, the second through fourth, has declined and is not restored by raising the volume.

Hearing also has the property that when an excessively loud sound arrives, other quieter sounds near its frequency are hard to detect, or that one sound is obscured by another and becomes hard to hear. This is called the masking effect and is said to result from the brain protecting itself against excessive sound. A typical example occurs in a public-address system installed in a large venue: the sound-pressure level of the low-frequency range in the sound from a loudspeaker recessed in the ceiling can rise to about two to three times its original value because of the baffle effect of the ceiling panel.

As a result, the formant components at and above the second formant in the sound from that loudspeaker are masked, and the announcement frequently cannot be understood. In elderly listeners, the ability to detect speech is reduced, so when the level of the first formant, the fundamental of speech, rises, the second and higher formants are masked and the words can no longer be made out.

From the above, it is considered that, to convey language accurately, these hearing problems can be solved by amplifying the harmonic components at and above the second formant so that they exceed the masking level.

FIG. 2 illustrates the process of language recognition for a sample taken from running speech. The horizontal axis is frequency (Hz) and the vertical axis is sound pressure (dB); the figure shows the formant components contained in the spoken sound "V" and their levels. The first formant h1 has the lowest frequency (the vowel) but the highest sound pressure. From the second formant h2 to the n-th formant hn, the sound pressure falls progressively.

In this figure, line MK indicates the masking range for a listener with normal hearing, and line MK' indicates the masking range for a listener with hearing loss. As shown, masking line MK lies below all of the formants, so the listener with normal hearing is little affected by masking and hears the sound V clearly. Masking line MK' lies above every formant except the first formant h1, so the masking effect is severe and the listener with hearing loss cannot make out the sound V.

FIG. 3 shows the case in which the level of the sound V is raised by 10 dB for a listener with sensorineural hearing loss. In the figure, h1+ to hn+ denote the formants h1 to hn each raised by 10 dB. The masking line MK' of the listener with hearing loss simply shifts upward in parallel by the same amount, which shows that raising the volume alone cannot make the speech intelligible.

FIG. 4 shows the state in which the formants in region G other than the first formant h1 (the second formant h2 through the n-th formant hn), which are important for speech recognition, are extracted by a band-pass filter, amplified in amplitude to give the harmonic component signals h2++ to hn++, and added to the main speech signal. As shown, with this processing every formant component rises above the masking line MK' of the listener with hearing loss, so the sound V can be understood.
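The contrast between FIG. 3 and FIG. 4 can be sketched numerically. The following is only an illustration under stated assumptions: the formant levels, the flat masking line, and its 10 dB offset below h1 are hypothetical values chosen for the example and are not taken from the figures; the one property borrowed from the description is that the masking line moves up together with the first formant.

```python
# Hypothetical formant levels in dB for one utterance (not from the patent figures).
FORMANTS_DB = {"h1": 70.0, "h2": 55.0, "h3": 48.0, "h4": 42.0}
# Assumed: the masking line MK' is flat and sits this many dB below h1.
MASK_OFFSET_DB = 10.0


def audible(levels_db, mask_offset_db=MASK_OFFSET_DB):
    """Return the formants whose level reaches the masking line tied to h1."""
    mask_db = levels_db["h1"] - mask_offset_db
    return [name for name, level in levels_db.items() if level >= mask_db]


def uniform_gain(levels_db, gain_db):
    """Raise every formant by the same amount (turning up the volume)."""
    return {name: level + gain_db for name, level in levels_db.items()}


def selective_gain(levels_db, gain_db):
    """Raise only the formants above h1 (the band-pass and add approach)."""
    return {
        name: level + (0.0 if name == "h1" else gain_db)
        for name, level in levels_db.items()
    }


print(audible(FORMANTS_DB))                        # only h1 clears the line
print(audible(uniform_gain(FORMANTS_DB, 10.0)))    # the line moved up too
print(audible(selective_gain(FORMANTS_DB, 20.0)))  # h2 to h4 now clear it
```

In this toy model a uniform boost never changes which formants are audible, because the masking line follows h1, whereas boosting only h2 and above lifts them over the line.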

Conventional hearing aids and sound collectors mainly gather speech produced at a distance. While the speech propagates through the air, however, the high-order formant components are attenuated by ambient noise, spreading, and the mass of the air, which severely impairs intelligibility, so even more complex speech processing is likely to be required.

The speech processing device of the present invention is mainly intended for speech picked up by a microphone close to the speaker's mouth, that is, clean speech with little propagation loss through the air and with the high-order formant components that are important for understanding language still unattenuated. It was devised on the finding that, if the high-order formant components are extracted from such speech and added back to the original speech, the content can be heard clearly even over a public-address system operating under poor conditions or by a listener with hearing loss.

FIG. 5 shows an embodiment of the speech processing device 100 according to the present invention. In the figure, reference numeral 1 denotes a sound-pickup microphone; 2 denotes a buffer amplifier that amplifies the very small voltage signal generated by microphone 1 by a factor of several so that it can be processed efficiently; 3 denotes a phase corrector that corrects the phase of the speech signal amplified by buffer amplifier 2; and 30 denotes a band-pass filter that removes, from the speech signal amplified by buffer amplifier 2, frequency components at or below the low-order formants and frequency components at or above the high-order formants.

Reference numeral 6 denotes an effect adjuster that adjusts the level of the speech signal that has passed through band-pass filter 30; 7 denotes an adder that adds the speech signal adjusted by effect adjuster 6 to the speech signal corrected by phase corrector 3; 8 denotes an output buffer that adjusts the output of the speech signal combined by the adder; 9 denotes an attenuator that brings the speech signal to the output sensitivity of microphone 1; and 10 denotes an output terminal.

Band-pass filter 30 consists of a high-pass filter 4 and a low-pass filter 5. High-pass filter 4 removes the low-frequency components at or below the first formant h1 from the speech signal, and low-pass filter 5 removes the high-frequency components above the high-order formants from the signal that has passed through high-pass filter 4, so that the frequency components of the second through fifth formants are extracted.

As shown in FIG. 6, phase corrector 3 has a circuit made up of an input terminal 15, an input ground terminal 16, an output terminal 17, an output ground terminal 18, an operational amplifier 19, a capacitor, and resistors R, R1, and R2, and can delay the phase of the speech signal by any amount in the range of 0 to -180 degrees, as described later.
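A 0 to -180 degree range with unity gain is what a first-order op-amp all-pass stage provides, and the component list (one operational amplifier, one capacitor, resistors R, R1, R2) is consistent with that topology. The text does not state the transfer function, so the sketch below is an assumption: it models the phase corrector as H(jw) = (1 - jwRC) / (1 + jwRC), and the component values are hypothetical.

```python
import cmath
import math


def allpass_response(freq_hz, r_ohm, c_farad):
    """Complex response of an assumed first-order all-pass stage."""
    x = 2 * math.pi * freq_hz * r_ohm * c_farad
    return (1 - 1j * x) / (1 + 1j * x)


def allpass_phase_deg(freq_hz, r_ohm, c_farad):
    """Phase shift in degrees; runs from 0 toward -180 as frequency rises."""
    return math.degrees(cmath.phase(allpass_response(freq_hz, r_ohm, c_farad)))


# Hypothetical values: R = 10 kOhm, C = 10 nF, so -90 degrees falls near 1.6 kHz.
R_OHM = 10_000.0
C_FARAD = 10e-9
F_90_HZ = 1 / (2 * math.pi * R_OHM * C_FARAD)

for f in (100.0, F_90_HZ, 10_000.0):
    print(round(f), round(allpass_phase_deg(f, R_OHM, C_FARAD), 1))
```

In this model the gain stays at 1 for every frequency, and changing R moves the frequency at which the lag reaches -90 degrees, which is one way a circuit of this kind could be made adjustable.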

The operation of the speech processing device 100 configured in this way is described below. As shown in FIG. 5, speech produced at the speaker's mouth is picked up directly by microphone 1 with almost no attenuation, converted to an electrical signal (speech signal), and amplified by a factor of several by buffer amplifier 2. The amplified speech signal is then split. One branch travels along main speech path L1 to phase corrector 3, where its phase is corrected as described later.

The other branch, sent along sub speech path L2, passes through band-pass filter 30, which extracts the frequency components of the second through fifth formants. Specifically, high-pass filter 4 of band-pass filter 30 suppresses the first formant and harmful low-frequency components, such as the wind noise caused by breath blowing on microphone 1, and low-pass filter 5 then suppresses the harsh high-frequency noise generated by the amplifying elements in each stage. This gives the kind of wide band-pass characteristic shown by curve na in FIG. 7 and enhances the improvement.

High-pass filter 4 also acts as an amplifier with a gain of about 5 so that the degree of clarity can be set as desired, and low-pass filter 5 improves the signal-to-noise ratio by removing high-frequency components, including very-high-order formants that contribute little to language recognition. The frequency response at output point z of band-pass filter 30 is shown by curve bp in FIG. 7. In this embodiment, the passband of band-pass filter 30, set on the basis of various verification tests, is 400 Hz to 7 kHz, about four octaves, and the response rolls off at roughly -12 dB/oct at both the low and high ends.
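The "about four octaves" figure and the -12 dB/oct slopes can be checked with a short calculation. The sketch below assumes second-order Butterworth sections for the high-pass and low-pass filters, which give a 12 dB/oct roll-off; the actual filter type is not stated in the text, and the gain of about 5 in high-pass filter 4 is left out so that the passband sits at 0 dB.

```python
import math

F_LOW_HZ = 400.0    # lower band edge from the description
F_HIGH_HZ = 7000.0  # upper band edge from the description


def octave_span(f_low_hz, f_high_hz):
    """Number of octaves between two frequencies."""
    return math.log2(f_high_hz / f_low_hz)


def bandpass_gain_db(freq_hz):
    """Magnitude in dB of cascaded 2nd-order Butterworth HPF and LPF (assumed)."""
    hp = 1.0 / math.sqrt(1.0 + (F_LOW_HZ / freq_hz) ** 4)
    lp = 1.0 / math.sqrt(1.0 + (freq_hz / F_HIGH_HZ) ** 4)
    return 20.0 * math.log10(hp * lp)


print(round(octave_span(F_LOW_HZ, F_HIGH_HZ), 2))   # a little over 4 octaves
for f in (25.0, 50.0, 2000.0, 28_000.0, 56_000.0):
    print(f, round(bandpass_gain_db(f), 1))
```

Well outside the passband, each doubling or halving of frequency changes the gain by close to 12 dB, matching the slope quoted above.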

The audio signal leaving the band-pass filter 30 has its level adjusted by the effect adjuster 6 and is then added to the audio signal on the main audio path L1 by the adder-synthesizer 7. The output of the combined signal is adjusted as needed by the output buffer 8 and the attenuator 9 and then delivered from the output terminal 10 to existing broadcasting equipment or the like.

The band-pass filter 30 has an inherent property: as shown in FIG. 8, a signal input at point x in FIG. 5 appears at point z with a phase lag T of up to about 90 degrees within the voice band. Suppose the signal at point z were added by the adder-synthesizer 7 directly to the main audio signal on the main audio path L1. As shown in FIG. 9, let Vx be the main audio signal voltage and Vz the output voltage of the band-pass filter 30. Because of its phase lag, Vz can be regarded as −Vz, so that Vx + (−Vz) = Vk, and the output of the adder-synthesizer 7 becomes the voltage difference between the two signals.

As a result, as shown in FIG. 10, a dead zone appears in part of the frequency range in the form of the "deep valley" marked d on the frequency characteristic. The level in the vicinity of the second and third formants, which are especially important for understanding speech, becomes extremely low, and this is a serious obstacle to improving clarity. FIG. 9 illustrates this cancellation in the adder-synthesizer 7 when the phases of the two audio signals are taken into account.
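The cancellation can be illustrated with phasors. The sketch below adds two sinusoids of the same frequency, the main signal Vx and the filtered signal Vz, for several phase offsets. When the two are in opposite phase the sum reduces to the difference Vx − Vz = Vk described above. The amplitudes and the function name `summed_amplitude` are hypothetical values chosen for illustration, not measurements from the patent.

```python
import cmath
import math


def summed_amplitude(vx, vz, lag_deg):
    """Amplitude of Vx + Vz when Vz lags Vx by lag_deg degrees (same frequency)."""
    vz_phasor = vz * cmath.exp(-1j * math.radians(lag_deg))
    return abs(vx + vz_phasor)


VX = 1.0   # hypothetical main-path amplitude
VZ = 0.8   # hypothetical band-pass-path amplitude

for lag in (0, 90, 180):
    print(f"lag {lag:3d} deg -> summed amplitude {summed_amplitude(VX, VZ, lag):.3f}")
```

The closer the offset comes to opposite phase, the deeper the dip in the summed level, which is why the relative phase of the two paths matters as much as their levels.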

To eliminate this dead zone, the present invention places the phase corrector 3 on the main audio path L1. As shown in FIG. 13, the phase corrector 3 applies a phase delay equivalent to the phase lag that arises in the band-pass filter 30 (4, 5) used for high-order formant extraction. The audio signal that has passed through the band-pass filter 30 (4, 5) is then added to this corrected signal, which avoids the dead zone.

The phase corrector 3 on the main audio path L1 therefore plays a key role in realizing the speech processing apparatus 100 of the present invention and is one of its principal features. In the circuit that constitutes the phase corrector 3, shown in FIG. 6, the angular frequency is determined by ω = 1/CR, and the phase delay can be set freely within the range of 0 to −180 degrees. The circuit also has the advantage that, with suitable choices of C and R, the phase lag can be brought arbitrarily close to 0 degrees even at low frequencies while the gain stays constant. Its phase characteristic is shown in FIG. 11.

In FIG. 6, the audio signal applied to the input terminal 15 is fed to the inverting terminal (−) of the operational amplifier 19, so an inverted signal, opposite in phase to the input, appears at the output terminal 17. The input terminal 15 is also connected through the capacitor C to the non-inverting terminal (+) of the operational amplifier 19. The impedance of the capacitor C varies with frequency: its magnitude is Z = 1/ωC, which is inversely proportional to frequency, so the higher the frequency, the lower the impedance.

Consequently, the higher the frequency, the larger the signal reaching the non-inverting input terminal. The operational amplifier 19 combines the two inputs: when the non-inverting signal is large, the phase delay of the signal at the output terminal 17 is small. At lower frequencies the impedance of the capacitor rises, the signal at the non-inverting input terminal shrinks, and a signal with a large delay appears at the output terminal 17. The amount of phase delay at a desired frequency can be set freely through the values of the resistor R and the capacitor C. The gain of the phase corrector 3 is determined by r2/r1; it can be set as required and always remains constant.
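The behavior described above, constant gain with a frequency-dependent phase shift, matches a first-order all-pass stage. The sketch below assumes the textbook transfer function H(jω) = (jωRC − 1)/(jωRC + 1) with r1 = r2 (unity gain). The patent gives neither a transfer function nor component values, so the formula, the values of R and C, and the function names are assumptions. Because sign conventions for phase differ between texts, the sketch reports only the magnitude of the phase shift.

```python
import cmath
import math

R = 10_000.0   # hypothetical resistance in ohms
C = 10e-9      # hypothetical capacitance in farads


def allpass(f, r=R, c=C):
    """Assumed first-order all-pass response at frequency f in Hz."""
    x = 1j * 2 * math.pi * f * r * c
    return (x - 1) / (x + 1)


def phase_shift_deg(f):
    """Magnitude of the phase shift in degrees (between 0 and 180)."""
    return abs(math.degrees(cmath.phase(allpass(f))))


# Frequency at which omega = 1/(C*R), as in the text.
f0 = 1 / (2 * math.pi * R * C)

for f in (f0 / 100, f0, f0 * 100):
    h = allpass(f)
    print(f"{f:10.1f} Hz  gain {abs(h):.3f}  phase shift {phase_shift_deg(f):6.1f} deg")
```

With these values the gain stays at 1 across the whole range, while the phase shift is close to 180 degrees well below f0, 90 degrees at f0, and close to 0 degrees well above it. Raising R or C lowers f0 and so moves the small-shift region down to lower frequencies.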

The high-order formant components in range G of FIG. 4, which are important for understanding speech, are fed to the adder-synthesizer 7 through the effect adjuster 6, which can be set so that they exceed the masking line MK'. As shown in FIGS. 12(A) to 12(C), the frequency versus output level curve at point t, the combined output of the output buffer 8, depends on the setting of the effect adjuster 6 ("minimum", "center" or "maximum" position). The setting can be chosen freely to give the best effect for the place of use and the listener's degree of hearing loss.

Furthermore, because the speech processing apparatus 100 of the present invention has the attenuator 9 on its output side, its output level can be matched to the output sensitivity of the microphone 1. It can therefore be incorporated easily into existing wired and wireless broadcasting equipment such as that shown in FIGS. 14 and 15. The existing equipment needs no modification, so an excellent improvement in the sound quality of in-house broadcasts is obtained at low cost.

In this case, as shown in FIG. 16, the speech processing apparatus 100 may also be housed in a portable casing (metal case) 33 to form a unit-type speech clarifying apparatus 200. The surface of the casing 33 carries an input-side connection port 34, to which the sound-collecting microphone 1 is detachably connected, and an output-side connection port 35, which outputs the audio signal processed by the speech processing apparatus 100 to the broadcasting equipment. Such a unit is easy to attach, remove, carry and store, so it can readily be used with the broadcasting equipment of outdoor event venues and with other broadcasting equipment.

As another embodiment of the speech processing apparatus 100 according to the present invention, a left and right pair of main audio paths L1 may be provided as shown in FIG. 17. The audio signals of the main audio paths L1-R and L1-L are band-pass filtered on the sub audio path L2 in the same way as described above, and the extracted signal (second to fifth formants) is added to the audio signals of the main audio paths L1-R and L1-L by the respective adder-synthesizers 7a and 7b. The same effect is thereby obtained for stereo audio.

That is, as shown in FIG. 17, the left and right audio signals of a stereo signal, input at the pair of signal input terminals 1a and 1b, are amplified by the input buffer devices 2a and 2b and converted to a low-impedance output so that the subsequent stages operate correctly. The left and right audio signals (stereo signal) leaving the input buffer devices 2a and 2b each branch to the corresponding main audio path L1-R or L1-L and to the input adder 31.

The left and right audio signals (stereo signal) branched to the input adder 31 are merged there and converted into a monaural signal by the mixing buffer 32. The band-pass filter 30, comprising the high-pass filter 4 and the low-pass filter 5, amplifies this signal and extracts the high-order formants, and the result is sent to the effect adjuster 6, where its level is adjusted.

Meanwhile, the left and right audio signals (stereo signal) passing along the main audio paths L1-R and L1-L are corrected by the phase correctors 3a and 3b, as described above, for the delay corresponding to the phase lag that inevitably arises in the band-pass filter 30. Now in phase, each is added in the adder-synthesizer 7a or 7b to the monaural audio signal whose level has been set by the effect adjuster 6. The left and right audio signals (stereo signal) then have their impedance lowered by the output buffers 8a and 8b and are output from the signal output terminals 10a and 10b.
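The stereo signal flow of FIG. 17 can be sketched sample by sample as follows. This is a structural illustration only: `extract_formants` stands in for the band-pass filter 30, `phase_correct` for the phase correctors 3a and 3b, and `effect` for the setting of the effect adjuster 6. All three names and their placeholder bodies are hypothetical and perform no real filtering.

```python
def extract_formants(mono):
    """Placeholder for band-pass filter 30 (identity here; no real filtering)."""
    return list(mono)


def phase_correct(channel):
    """Placeholder for phase correctors 3a/3b (identity here; no real phase shift)."""
    return list(channel)


def process_stereo(left, right, effect=0.5):
    """Mix L and R to mono, extract the formant band, add it back to each channel."""
    # Input adder 31 and mixing buffer 32: sum the two channels to mono.
    mono = [l + r for l, r in zip(left, right)]
    # Band-pass filter 30 followed by effect adjuster 6 (a simple scale factor here).
    enhanced = [effect * s for s in extract_formants(mono)]
    # One adder-synthesizer per channel (7a, 7b): main path plus enhanced mono signal.
    out_left = [a + b for a, b in zip(phase_correct(left), enhanced)]
    out_right = [a + b for a, b in zip(phase_correct(right), enhanced)]
    return out_left, out_right


left_out, right_out = process_stereo([1.0, 0.0], [0.0, 1.0], effect=0.5)
print(left_out, right_out)
```

The point of the structure is that a single monaural enhancement signal is shared by both channels, so the stereo image carried by the main paths is left as it was.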

In television broadcasting, speech is generally transmitted in monaural, while the ambience components are separated into the left and right channels to create the sound field. The output of the input adder 31 and the mixing buffer 32 therefore reaches the band-pass filter 30 with the speech component at a level about 6 dB higher than the ambience components. Clarity is achieved by extracting from this speech component the second to fifth formants, which are important for speech recognition, and adding them to the original stereo signal.
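One way to read the figure of about 6 dB is as the result of summing two identical signals: a speech component present equally in both channels doubles in amplitude when the channels are added, whereas a component present in only one channel does not. The check below uses hypothetical unit amplitudes and is an interpretation, not a calculation given in the patent.

```python
import math

speech = 1.0     # hypothetical amplitude of the speech, identical in L and R
ambience = 1.0   # hypothetical amplitude of a component present in one channel only

summed_speech = speech + speech   # L + R for the common (monaural) component
summed_ambience = ambience        # only one channel contributes

advantage_db = 20 * math.log10(summed_speech / summed_ambience)
print(f"speech level advantage after summing: {advantage_db:.2f} dB")
```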

By building this processing into wireless transmitters, TV audio assist devices, music players and similar equipment, the clarified, easy-to-hear audio signal can be heard clearly even by elderly people with reduced hearing and by people with hearing loss, and it is also useful in daily life in very noisy environments. Field tests in which the apparatus 100 of the present invention was actually built into existing wireless transmitters, TV audio assist devices, music players and the like have reportedly shown that difficulties arising in daily life between people with normal hearing and people with hearing loss could be removed.

The speech processing apparatus 100 of the present invention and the easily portable speech clarifying apparatus 200 can be applied to almost any existing broadcasting equipment and readily provide an excellent speech-clarifying effect. Examples include paging systems in hospitals and similar facilities, municipal emergency broadcast systems, telephone transmitters, radio communication equipment, speech-recognition conference systems, announcement systems at event venues, television receivers, radio receivers, announcement systems in public facilities, broadcast systems in care facilities for elderly and disabled people, school broadcast systems, on-board announcements in trains and buses, and in-house broadcasts at stations, shopping centers, department stores, cinemas and other places where many people gather.

According to the speech processing apparatus 100 and the speech clarifying apparatus 200 of the present invention, speech that was hard to hear is not only made clear; the individual timbre of the speaker's voice is also preserved, so natural speech free of any sense of incongruity can be produced.

Description of symbols
100 … Speech processing apparatus
200 … Speech clarifying apparatus
1 … Microphone
2 … Input buffer device
3 … Phase corrector
4 … High-pass filter
5 … Low-pass filter
6 … Effect adjuster
7 … Adder-synthesizer
8 … Output buffer
9 … Attenuator
10 … Audio output terminal
11 … Audio processing unit
12 … Power amplifier
13 … Speaker
15 … Input terminal
16 … Input ground terminal
17 … Output terminal
18 … Output ground terminal
19 … Operational amplifier
20 … Modulator unit equipped with the apparatus of the present invention
21 … Wireless transmitting unit
22 … Transmitting antenna or infrared emitter
23 … Receiving antenna or infrared receiver
24 … Wireless signal receiving unit
30 … Band-pass filter
31 … Input adder
32 … Mixing buffer
33 … Casing
34 … Input-side connection port
35 … Output-side connection port

Claims

A speech processing apparatus comprising:
a microphone that collects the pure voice of a speaker;
a band-pass filter that, from one of the branches into which the audio signal collected by the microphone is split, removes low-frequency components at or below the low-order formant and high-frequency components above the high-order formants;
a phase corrector that corrects the phase of the other branched audio signal so that it matches the phase of the audio signal that has passed through the band-pass filter;
an effect adjuster that adjusts the level of each formant component of the audio signal that has passed through the band-pass filter so that each formant component exceeds a frequency masking line; and
an adder-synthesizer that adds the audio signal that has passed through the effect adjuster to the audio signal that has passed through the phase corrector and outputs the result,
wherein the band-pass filter comprises a high-pass filter that removes, from the one branched audio signal, low-frequency components below 400 Hz including the low-order formant, and
a low-pass filter that removes, from the audio signal that has passed through the high-pass filter, high-frequency components above 7 kHz including high-order formants.

A speech processing method comprising:
a first step of collecting the pure voice of a speaker;
a second step of splitting the audio signal collected in the first step and band-pass filtering one of the branched audio signals by removing low-frequency components at or below the low-order formant and high-frequency components above the high-order formants;
a third step of correcting the phase of the other branched audio signal so that it matches the phase of the band-pass-filtered audio signal;
a fourth step of adjusting the level of each formant component of the band-pass-filtered audio signal so that each formant component exceeds a frequency masking line; and
a fifth step of adding the audio signal adjusted in the fourth step to the audio signal whose phase was corrected in the third step and outputting the result,
wherein the second step performs the band-pass filtering using a high-pass filter that removes, from the one branched audio signal, low-frequency components below 400 Hz including the low-order formant, and
a low-pass filter that removes, from the audio signal that has passed through the high-pass filter, high-frequency components above 7 kHz including high-order formants.