JP4150795B2

JP4150795B2 - Hearing assistance device, audio signal processing method, audio processing program, computer-readable recording medium, and recorded apparatus

Info

Publication number: JP4150795B2
Application number: JP2005014568A
Authority: JP
Inventors: 則男赤松
Original assignee: University of Tokushima
Current assignee: University of Tokushima
Priority date: 2005-01-21
Filing date: 2005-01-21
Publication date: 2008-09-17
Anticipated expiration: 2025-01-21
Also published as: JP2006203683A

Abstract

PROBLEM TO BE SOLVED: To provide a hearing assistance device and the like that can correct sound so that it is easy to hear, using low-load arithmetic processing. SOLUTION: The hearing assistance device comprises an uneven waveform conversion unit 14 that quantizes the amplitude of a sound signal input through a sound input unit 10, adds, for the quantized data at each point, the amplitude values of the data in a prescribed adjacent range, divides the sum by the number of added data to obtain a partial average value centered on that data, compares the amplitude value at each point with its partial average value, and converts the signal into an uneven waveform based on whether the comparison is true or false; a voice component extraction unit 16 that extracts voice components corresponding to a human voice from the uneven waveform of the sound signal; and an emphasis processing unit 18 that emphasizes the voice components by raising their convex portions and lowering their concave portions, thereby generating an emphasized voice waveform. COPYRIGHT: (C)2006,JPO&NCIPI

Description

The present invention relates to a hearing assistance device, such as a hearing aid, that assists the hearing of hearing-impaired people, for example elderly people whose hearing has declined or people who are hard of hearing, and to an audio signal processing method, an audio processing program, a computer-readable recording medium, and a device on which the program is recorded.

Hearing aids are devices that assist hearing in order to give hearing-impaired people a better living environment. Some hearing aids consist of, for example, a small microphone, an amplifier, and an earphone. Because such a hearing aid simply amplifies and outputs the sound picked up by the microphone, its output contains a great deal of noise, and the voice of a conversation partner or sounds that demand attention (important environmental sounds) can be buried in that noise. Such hearing aids were therefore not sufficient to assist the hearing of hearing-impaired people.

Hearing aids have therefore been developed that exploit the fact that human speech is concentrated in a specific frequency band (the midrange): the sound input to the microphone is passed through a bandpass filter that extracts the midrange and is then amplified. Even with such hearing aids, however, the voice of a conversation partner and sounds that demand attention could hardly be said to be heard comfortably and clearly.

Meanwhile, recent advances in digital signal processing devices have made it possible to make digital circuits and processors extremely small, and this technology is also being applied to hearing aids. In a hearing aid that uses digital signal processing, the analog audio signal is A/D converted into a digital signal, and this digital signal is then subjected to digital signal processing such as digital filtering, noise removal, and frequency-domain processing to improve audibility.

FIG. 16 shows the configuration of an example of a hearing aid as a conventional hearing assistance device. In this hearing aid, a microphone 301 first picks up surrounding voices and other sounds, converts them into an electrical signal, and outputs it as an original audio signal A11. The original audio signal A11 is supplied to an analog filter 108, which passes only the midrange, where the frequency distribution of human speech is concentrated, and cuts everything else. The analog filter 108 thus outputs a midrange audio signal A12. The midrange audio signal A12 is supplied to an A/D converter 109, where it is A/D converted into an audio signal A13 as a digital signal.

The audio signal A13 is supplied to a memory 302 and stored temporarily. The memory 302 is connected via a signal bus to a digital signal processor (DSP) 303, which applies processing such as digital filtering, noise removal, frequency component decomposition such as FFT (fast Fourier transform), and frequency-domain processing to the audio signal stored in the memory 302. The audio signal processed in this way is supplied from the memory 302 to a D/A converter 117 as a processed audio signal A15. The D/A converter 117 converts the processed audio signal A15, which is a digital signal, into an analog audio signal A16. The analog audio signal A16 is supplied to an amplifier 118 and amplified. The amplified audio signal A17 is then supplied from the amplifier 118 to an earphone 304 and output from it. In this way, the sound input to the microphone 301 reaches the ear of the user (a hearing-impaired person).

However, a hearing aid such as the one described above improves audibility by extracting, from the sound input to a single microphone 301, the frequency components considered to correspond to human speech. The voice of the conversation partner and the voices of other people are therefore amplified together, which makes the voice the user is trying to hear difficult to make out. In addition, human speech and external sounds with similar frequency components are amplified without distinction, which again makes the sound the user wants difficult to hear. Furthermore, sounds such as car horns, alarms, and telephone bells are environmental sounds that matter in daily life (important sounds) and should ideally be audible at all times, yet a user of the hearing aid described above may miss them.

To address these problems, the hearing assistance device disclosed in Patent Document 1 was developed. As shown in FIG. 17, in this device the environmental sound input to omnidirectional microphones 102L and 102R and the voice of the conversation partner input to a directional microphone 107 are processed independently by a processor unit 150, and one of the two is amplified and output from ear speakers 119R and 119L, so that the voice of the conversation partner and sounds that demand attention (important sounds) can be heard comfortably and clearly.

However, because an audio processing circuit such as a DSP performs a Fourier transform in this hearing aid, complex arithmetic such as floating-point operations is required. This calls for a fast, high-performance arithmetic circuit and makes the device large and expensive. A hearing aid must process sound in real time, which demands hardware capable of high-speed processing, yet it should also be portable and as small and light as possible.
Patent Document 1: JP-A-8-79897

The present invention has been made to solve these problems. Its main object is to provide a hearing assistance device, an audio signal processing method, an audio processing program, a computer-readable recording medium, and a device on which the program is recorded, in which the arithmetic involved in audio processing is simplified so that processing is fast and low-load and therefore easy to implement and embed.

To achieve the above object, a first hearing assistance device of the present invention comprises: an audio input unit for inputting an audio signal; an uneven waveform conversion unit that quantizes the amplitude of the audio signal input through the audio input unit, adds, for the quantized data at each point, the amplitude values of the data in a predetermined adjacent range, divides the sum by the number of added data to obtain a partial average value centered on that data, compares the amplitude value at each point with its partial average value, and converts the signal into an uneven (concave-convex) waveform based on whether the comparison is true or false; a voice component extraction unit that extracts, from the uneven waveform obtained by the uneven waveform conversion unit, a voice component corresponding to a human voice in accordance with pre-registered patterns of human voices; an emphasis processing unit that emphasizes the voice component extracted by the voice component extraction unit by raising its convex portions and lowering its concave portions, thereby generating an emphasized voice waveform; and an audio output unit for outputting the emphasized voice waveform. This makes it possible to extract the voice component from the audio signal and correct it so that it is easy to hear, without converting the audio signal into a frequency spectrum by Fourier transform, so a comfortable hearing assistance device that does not raise the volume of noise can be realized.

In a second hearing assistance device of the present invention, the uneven waveform conversion unit sets the number of samples used for conversion into the uneven waveform to a power of 2, and the voice component extraction unit extracts the voice component by adjusting the exponent. The processing that extracts the voice component can then be carried out with bit-shift operations, so it can be performed with a low load.

In a third hearing assistance device of the present invention, the voice component extraction unit extracts the voice component by setting separate exponents for the frequency component corresponding to the upper range of the human voice and for the component corresponding to the lower range. This cuts the high and low ranges that contain no human voice, so only the voice component corresponding to a human voice is extracted effectively.

In a fourth hearing assistance device of the present invention, when the uneven waveform conversion unit calculates the partial average value, the number of data to be added in the predetermined range is set to a power of 2, and the division by that number is performed as a bit-shift operation. The division can thus be done with a bit shift, which further simplifies the arithmetic and contributes to higher speed.
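The power-of-two division can be illustrated with a minimal Python sketch. The function name `partial_mean_shift`, the sample values, and the window placement (samples k−n through k+n−1) are assumptions made for illustration; the text specifies only that the divisor is a power of 2 and that the division is a bit shift.

```python
def partial_mean_shift(samples, k, exponent):
    """Integer mean of the 2**exponent samples around index k, divided by a right shift."""
    window = 1 << exponent            # N = 2**exponent
    half = window >> 1                # n = N / 2
    start = k - half                  # assumed window: samples[k-n : k+n]
    if start < 0 or start + window > len(samples):
        raise IndexError("window does not fit inside the signal")
    total = sum(samples[start:start + window])
    return total >> exponent          # equals total // window for integers


samples = [3, 5, 8, 12, 9, 4, 2, 6]   # hypothetical quantized amplitudes
mean_at_4 = partial_mean_shift(samples, 4, 2)   # averages samples[2:6]
```

Because the shift discards the low bits, the result is a floor-rounded integer mean, which is usually acceptable when the mean only serves as a comparison threshold.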

In a fifth hearing assistance device of the present invention, when the uneven waveform conversion unit calculates the partial average value, it retains the sum of the amplitude values in the predetermined range that was computed to obtain the average for one data point. To obtain the sum for the next data point, it subtracts the amplitude value that is no longer needed from the retained sum and adds the newly required amplitude value. Each sum is thus derived from the sum computed for the previous data point by swapping only the data that changed, which greatly simplifies the additions and further speeds up the processing.
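The retained-sum update can be sketched as follows. This is a minimal illustration rather than the patented implementation; `sliding_sums` and the sample data are hypothetical names and values.

```python
def sliding_sums(samples, window):
    """Return the sum of every length-`window` run, reusing the previous sum each time."""
    if window <= 0 or window > len(samples):
        raise ValueError("window must be between 1 and len(samples)")
    total = sum(samples[:window])      # only the first sum is computed in full
    sums = [total]
    for i in range(window, len(samples)):
        total -= samples[i - window]   # drop the value that left the range
        total += samples[i]            # add the value that entered the range
        sums.append(total)
    return sums


samples = [3, 5, 8, 12, 9, 4, 2, 6]    # hypothetical quantized amplitudes
running = sliding_sums(samples, 4)
```

Each step costs one subtraction and one addition regardless of the window length, whereas recomputing every sum from scratch costs one addition per sample in the window.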

In a sixth hearing assistance device of the present invention, the average value α_k over the interval N (= 2n) spanning the n points before and after point k is expressed as

(equation not reproduced in this text)

and, in computing the averages, α_k is calculated from the average value at the preceding point (k−1) as

(equation not reproduced in this text)

Each average value can thus be obtained successively from the preceding one, which greatly reduces the amount of computation and realizes fast, low-load extraction of audio signal feature quantities.
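The two formulas for the sixth device are not available in this text. One reading consistent with the description, assuming the N = 2n samples run from index k−n to k+n−1 (the exact summation bounds are an assumption, not taken from the source), would be:

```latex
\alpha_k = \frac{1}{N} \sum_{i=k-n}^{k+n-1} x_i , \qquad N = 2n

\alpha_k = \alpha_{k-1} + \frac{1}{N} \left( x_{k+n-1} - x_{k-n-1} \right)
```

Under that assumption the second line follows from the first: the windows for points k−1 and k share every sample except x_{k−n−1}, which leaves the range, and x_{k+n−1}, which enters it.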

In a seventh hearing assistance device of the present invention, for the voice component extracted by the voice component extraction unit, the volume output from the audio output unit is increased when a consonant is recognized, and when a vowel is recognized after the consonant, the volume amplification is released a predetermined time after the vowel. This makes consonants easier to hear, so a hearing assistance device is realized with which speech can be heard clearly without raising the overall volume.
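A rough sketch of this gain schedule, assuming the signal has already been labeled frame by frame as consonant ("C"), vowel ("V"), or other ("-"). The labels, the `boost` factor, and the `hold_frames` release delay are all hypothetical parameters chosen for illustration; the text does not specify them.

```python
def consonant_gain(labels, boost=2.0, hold_frames=2):
    """Return a per-frame gain: boosted from a consonant until shortly after the next vowel."""
    gains = []
    boosted = False
    vowel_run = 0                      # vowel frames seen since the boost began
    for label in labels:
        if label == "C":
            boosted = True             # consonant recognized: raise the volume
            vowel_run = 0
        elif label == "V" and boosted:
            vowel_run += 1
            if vowel_run > hold_frames:
                boosted = False        # predetermined time after the vowel: release
        gains.append(boost if boosted else 1.0)
    return gains


labels = ["-", "C", "C", "V", "V", "V", "V", "-"]   # hypothetical frame labels
gains = consonant_gain(labels)
```

In this sketch a frame that is neither consonant nor vowel leaves the current state unchanged; a real device would need its own rule for silence and noise frames.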

An audio signal processing method of the present invention corrects and outputs a human voice based on an input audio signal, and comprises the steps of: inputting an audio signal; quantizing the amplitude of the input audio signal, adding, for the quantized data at each point, the amplitude values of the data in a predetermined adjacent range, dividing the sum by the number of added data to obtain a partial average value centered on that data, comparing the amplitude value at each point with its partial average value, converting the signal into an uneven waveform based on whether the comparison is true or false, and extracting a voice component corresponding to a human voice from the resulting uneven waveform; emphasizing the extracted voice component to generate an emphasized voice waveform; and reproducing the emphasized voice waveform. The voice component can thus be extracted from the audio signal and corrected so that it is easy to hear without converting the audio signal into a frequency spectrum by Fourier transform, so comfortable audio reproduction that does not raise the volume of noise can be realized.

An audio signal feature quantity extraction program of the present invention is an audio signal processing program that corrects and outputs a human voice based on an input audio signal, and causes a computer to implement: a function of inputting an audio signal; a function of quantizing the amplitude of the input audio signal, adding, for the quantized data at each point, the amplitude values of the data in a predetermined adjacent range, dividing the sum by the number of added data to obtain a partial average value centered on that data, comparing the amplitude value at each point with its partial average value, converting the signal into an uneven waveform based on whether the comparison is true or false, and extracting a voice component corresponding to a human voice from the resulting uneven waveform; a function of emphasizing the extracted voice component to generate an emphasized voice waveform; and a function of reproducing the emphasized voice waveform. The voice component can thus be extracted from the audio signal and corrected so that it is easy to hear without converting the audio signal into a frequency spectrum by Fourier transform, so comfortable audio reproduction that does not raise the volume of noise can be realized.

The computer-readable recording medium or recorded device of the present invention stores the above program. Recording media include magnetic disks, optical disks, magneto-optical disks, semiconductor memory, and other media capable of storing a program, such as CD-ROM, CD-R, CD-RW, flexible disk, magnetic tape, MO, DVD-ROM, DVD-RAM, DVD-R, DVD+R, DVD-RW, DVD+RW, Blu-ray disc, and HD DVD (AOD). The program may be distributed stored on such a recording medium or by download over a network such as the Internet. The recorded device includes general-purpose or dedicated equipment in which the program is installed in an executable state as software, firmware, or the like. Each process and function included in the program may be executed by program software running on a computer, or the processing of each unit may be implemented in hardware such as a predetermined gate array (FPGA, ASIC), or in a form that mixes program software with partial hardware modules that implement some of the elements in hardware.

According to the hearing assistance device, audio signal processing method, audio processing program, computer-readable recording medium, and recorded device of the present invention, a hearing assistance device is realized that can emphasize a human voice so that it is easy to hear and output it. This is because the emphasis processing is applied to the human voice component extracted from the audio signal. The volume of the voice component can thus be adjusted without amplifying noise and the like, yielding a comfortable hearing assistance device.

Embodiments of the present invention are described below with reference to the drawings. The embodiments shown below merely exemplify a hearing assistance device, an audio signal processing method, an audio processing program, a computer-readable recording medium, and a recorded device that embody the technical idea of the present invention; the present invention does not limit these to what is described below. Nor does this specification in any way limit the members recited in the claims to the members of the embodiments. In particular, unless specifically stated otherwise, the dimensions, materials, shapes, relative arrangements, and the like of the components described in the embodiments are illustrative examples only and are not intended to limit the scope of the invention. The sizes and positional relationships of the members shown in the drawings may be exaggerated for clarity. In the following description, the same names and reference numerals denote the same or equivalent members, and detailed description is omitted where appropriate. Furthermore, a single member may serve as several of the elements constituting the present invention, or conversely the function of one member may be shared among several members.

In this specification, the hearing assistance device or audio signal processing system communicates with computers, printers, external storage devices, and other peripherals used for operation, control, input/output, display, and other processing through an electrical connection, for example a serial connection such as IEEE 1394, RS-232x, RS-422, RS-423, RS-485, or USB, a parallel connection, or a network such as 10BASE-T, 100BASE-TX, or 1000BASE-T. The connection is not limited to a wired physical connection and may be a wireless connection using a wireless LAN such as IEEE 802.1x or an OFDM scheme, radio waves such as Bluetooth, infrared, optical communication, or the like. Memory cards, magnetic disks, optical disks, magneto-optical disks, semiconductor memory, and the like can be used as recording media for storing the audio data to be recognized, the audio data after recognition, settings, and so on.

FIG. 1 shows a block diagram of a hearing aid as an example of the hearing assistance device of the present invention. The hearing assistance device 100 shown in FIG. 1(a) includes an audio input unit 10, an A/D (analog/digital) converter 12, an uneven waveform conversion unit 14, a voice component extraction unit 16, an emphasis processing unit 18, and an audio output unit 20. A microphone, an audio input terminal, or the like can be used as the audio input unit 10. When the device is used as a hearing aid in particular, the audio signal input through the microphone is converted into a digital audio signal by the A/D converter 12 and sent to the voice component extraction unit 16. Alternatively, digital audio data may be input directly from the audio input unit 10, or audio data may be input from an external device connected over a network.

The audio signal captured by the microphone 10 is input to a noise removal device such as an analog filter, where it is frame-analyzed at a period of about 10 ms and noise from the surrounding environment and transmission-characteristic noise of the microphone and transmission path are removed. The signal is then converted into a digital audio signal by the A/D converter 12, quantized by the uneven waveform conversion unit 14, and converted into an uneven waveform. The voice component extraction unit 16 extracts the voice component corresponding to a human voice from this uneven waveform, the emphasis processing unit 18 corrects it into an emphasized voice waveform, and the result is output from the audio output unit 20. The audio output unit 20 is a speaker, an audio output terminal, or the like. The arithmetic processing members such as the uneven waveform conversion unit 14, the voice component extraction unit 16, and the emphasis processing unit 18 can be implemented in hardware such as a microprocessor (MPU), CPU, LSI, or a gate array such as an FPGA or ASIC, in software, or in a mixture of these. The components need not be identical to the configuration shown in FIG. 1: configurations whose functions are substantially the same, and configurations in which one element provides the functions of several elements of FIG. 1, are included in the present invention.

Cepstrum coefficients are generally used as feature quantities: a logarithmic spectrum is obtained by logarithmic conversion, and the coefficients are calculated and extracted by applying an inverse Fourier transform or inverse cosine transform. Because this method requires conversion to a frequency spectrum and imposes a heavy processing load, the present embodiment instead extracts the features of each vowel (the five vowels) from the amplitude waveform. Extracting features from the amplitude waveform eliminates the computation needed for conversion to a frequency spectrum and keeps the number of operations relatively small. Using the obtained feature quantities, the region is divided by a discrete Voronoi diagram, the boundary coordinates between different categories are calculated, and the discrimination boundary is determined by the least-squares method.

One standard model is the hidden Markov model (HMM), which represents the time series of feature quantities for each of a number of recognition-target words as probabilistic transitions. With an HMM, the model is trained in advance on time series of phoneme and word feature quantities that reflect individual differences, and input speech is recognized by measuring, as a probability, how close it is to the model. Alternatively, a representative time series chosen from the feature-quantity time series of each recognition-target word may serve as the standard model, as may a normalized time series obtained by normalizing (stretching or compressing) the feature-quantity time series in time or frequency. For example, DP matching (dynamic programming) is a method of normalizing to an arbitrary length on the time axis, and it can normalize a time series of temporal feature quantities according to predetermined correspondence rules.
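As an illustration of DP matching, the following minimal sketch computes a textbook dynamic-programming alignment cost between two sequences of different lengths. It is the generic technique, not the specific correspondence rules of this embodiment, which the text does not give; `dp_matching_distance` and the sequences are hypothetical.

```python
def dp_matching_distance(a, b):
    """Dynamic-programming alignment cost between two numeric sequences."""
    inf = float("inf")
    rows, cols = len(a), len(b)
    cost = [[inf] * (cols + 1) for _ in range(rows + 1)]
    cost[0][0] = 0.0
    for i in range(1, rows + 1):
        for j in range(1, cols + 1):
            local = abs(a[i - 1] - b[j - 1])
            # extend the cheapest of: stretch a, stretch b, or advance both
            cost[i][j] = local + min(cost[i - 1][j], cost[i][j - 1], cost[i - 1][j - 1])
    return cost[rows][cols]


template = [0, 1, 1, 0]           # hypothetical registered uneven pattern
stretched = [0, 0, 1, 1, 1, 0]    # the same pattern spoken more slowly
d = dp_matching_distance(template, stretched)
```

The stretched input aligns to the template at zero cost, which is the sense in which DP matching normalizes differences in duration.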

In the present embodiment, any of these standard models can be used. Whichever model is built, however, multiple sets of audio data for building it must be prepared in advance, and the same processing must be applied to the amplitude of that audio so that it is converted into uneven waveforms and registered.

As shown in FIG. 1(b), the uneven waveform conversion unit 14 includes an amplitude calculation unit 22, an average value calculation unit 24, a comparison unit 26, and a conversion unit 28. The amplitude calculation unit 22 quantizes the input based on its amplitude waveform. The average value calculation unit 24 then obtains an average value at each point of the quantized data. Here, as shown in FIG. 2(a), the average is taken over N data centered on the sample point. That is, with sample point k at the center, the amplitude values of the preceding n (= N/2) data and the following n data are added and the sum is divided by N. Specifically, with N = 2 × n, the average value α_k obtained from the sample values at the n points before and after x_k is calculated by Equation 5.

Here, the subscript k denotes the sample point currently being referenced, and x_k is the amplitude value at point k. N is the width of the window used to compute the threshold from which the uneven waveform is derived. Setting N to a power of two is preferable because, in binary arithmetic, the division can then be performed as a bit shift, which simplifies the computation in the average value calculation unit 24.

The comparison unit 26 then compares the average value obtained for each point by the average value calculation unit 24 with the amplitude value at that point. Specifically, the amplitude value x_k of each point is compared with its average value α_k, and the comparison result is output as in Equation 6 below.

In this way, an average value is calculated for each point of the amplitude waveform, and the comparison unit 26 outputs a signal determined by whether the sample value (amplitude value) at point k is larger or smaller than the corresponding average. As the comparison result, the comparison unit 26 outputs a if x_k is equal to or greater than the average value and b if it is equal to or less than the average value. The conversion unit 28 outputs this result as an uneven waveform. For example, with a = 1 and b = 0, the amplitude waveform is expressed as an uneven waveform of valleys and peaks (0 or 1). Within the N-point region over which the average was calculated, the uneven waveform indicates whether x_k is at or above the average, meaning the waveform is convex there, or at or below the average, meaning it is concave. Changing N therefore expresses the unevenness of the original amplitude waveform more coarsely or more finely, and several feature quantities can be extracted by varying N as a parameter. Reducing the amplitude waveform to its unevenness alone in this way yields the necessary features, which can be applied to voice component extraction and to speech recognition models.
In particular, features expressed only as unevenness allow consonant and vowel parts to be segmented from the speech waveform even with modest information processing capability. When the audio output is processed on the basis of the extracted voice component, its level can also be changed at practical time intervals.
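The conversion described above can be sketched in a few lines of Python. This is only an illustrative sketch, not the patented implementation: the function name `uneven_waveform` is hypothetical, the window for point k is assumed to cover indices k−n to k+n−1, indices outside the signal are clamped to the ends, and a sample equal to its average is mapped to a. The division by N is done with a right shift, as the text suggests for N a power of two.

```python
def uneven_waveform(samples, shift, a=1, b=0):
    """Convert integer amplitude samples to a two-level (convex/concave) waveform.

    Assumptions (not fixed by the text): the window for point k covers
    indices k-n .. k+n-1, out-of-range indices are clamped to the ends of
    the signal, and ties (x_k equal to the average) map to `a`.
    """
    N = 1 << shift            # window width, a power of two
    n = N >> 1                # half width, n = N / 2
    last = len(samples) - 1
    out = []
    for k in range(len(samples)):
        total = 0
        for i in range(k - n, k + n):
            total += samples[min(max(i, 0), last)]
        average = total >> shift          # integer division by N via bit shift
        out.append(a if samples[k] >= average else b)
    return out


if __name__ == "__main__":
    # N = 2 (shift = 1): each sample is compared with the mean of itself
    # and its predecessor.
    print(uneven_waveform([0, 10, 0, 10, 0, 10], shift=1))
```

Calling the function with several values of `shift` gives coarser or finer two-level waveforms from the same input, which is how N acts as a parameter in the description.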

Furthermore, the averaging can be simplified by reusing values already computed at adjacent positions when summing the amplitude waveform. That is, the sum of the amplitude values of the N (= 2n) data used to obtain the average for one point is retained. The average value α_k can then be calculated as in Equation 7 below.

Here, the average value α_k can be rewritten as in Equation 8 below.

On the other hand, since the average value α_k is given by Equation 5 above, it can also be written as in Equation 9 below.

Therefore, replacing k with k−1, the average value α_{k−1} for the data at k−1, that is, the data immediately before k, can be written as in Equation 10 below.

Rearranging α_{k−1} further gives Equation 11 below.

Writing Equation 11 for the average value α_k instead of α_{k−1} gives Equation 12 below.

Restating α_{k−1} gives Equation 13 below.

Combining Equations 12 and 13 gives Equation 14 below.

From Equation 14, the average value α_k can be calculated easily from the average value α_{k−1} computed in the preceding step. That is, once α_{k−1} is known, α_k is obtained by adding the second term of Equation 14, shown as Equation 15 below, to α_{k−1}.

In this way, α_{k+1}, α_{k+2}, ..., α_{k+n} can be obtained one after another. FIG. 2(b) illustrates the calculation. As the figure shows, the data used to calculate α_k and α_{k−1} share a common region. The sum over this common region needs to be computed only once; if the result is stored in a storage means such as a memory, it can be reused in the next calculation, which shortens the overall computation time. By simplifying the calculation and reducing the number of operations in this way, the processing load becomes very low, and an algorithm is obtained that finds every average value α_k with very simple arithmetic. Features for voice component extraction can therefore be obtained from the speech waveform quickly and simply, which makes the method highly practical.
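The reuse of the previous sum can be illustrated with a running total. The Python sketch below is an assumption-laden illustration, not the patent's own equations: `sliding_averages` is a hypothetical name, the window for point k is assumed to cover indices k−n to k+n−1, and indices outside the signal are clamped to the ends. Moving from k−1 to k then adds one entering sample and subtracts one leaving sample, so each average costs one addition, one subtraction, and one shift.

```python
def sliding_averages(samples, shift):
    """Windowed averages computed from a running sum.

    Assumed window for point k: indices k-n .. k+n-1 with N = 2**shift and
    n = N / 2; out-of-range indices are clamped to the ends of the signal.
    """
    if not samples:
        return []
    N = 1 << shift
    n = N >> 1
    last = len(samples) - 1

    def at(i):
        # Clamp the index so the window is always N samples wide.
        return samples[min(max(i, 0), last)]

    total = sum(at(i) for i in range(-n, n))   # full sum only for k = 0
    averages = [total >> shift]
    for k in range(1, len(samples)):
        # The windows for k-1 and k overlap in all but one sample at each end.
        total += at(k + n - 1) - at(k - n - 1)
        averages.append(total >> shift)
    return averages


if __name__ == "__main__":
    print(sliding_averages([3, 1, 4, 1, 5, 9, 2, 6], shift=2))
```

Only the first window is summed in full; every later average reuses the stored total, which is the saving the description attributes to the common region of α_k and α_{k−1}.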

Next, as an example, FIGS. 3 to 10 show the results of a vowel recognition experiment that used feature quantities extracted by computer simulation.
(Voice data)

In this example, a dedicated integrated circuit (IC) that combines analog and digital processing and is designed around the characteristics of speech was used as the feature extraction unit 14 that extracts feature quantities from the speech data. The relevant characteristics are that a speech waveform is generally asymmetric between its positive and negative regions, and that the pressure delivered from the vocal cords arises from the generation and decay of pulse signals. With these points in mind, the IC measures the speech signal from the microphone 10 as a voltage and, while holding the maximum positive and negative values for a fixed time, measures the time until the next positive or negative voltage value is detected, thereby detecting the pitch. The IC can thus detect both the speech waveform and the pitch. FIG. 3 shows a speech waveform and pitch information acquired by the IC.

Classification is performed by voting over pairs of vowels drawn from the five vowels, and features effective for recognition are extracted from the speech amplitude waveform and analyzed. In this example, 17 sets of speech data covering 67 phonemes were acquired from one adult male at a sampling frequency of 81.92 kHz. Both natural and conscious utterances were recorded: natural utterances were spoken at any time of day, whereas conscious utterances were phonemes enunciated clearly during quiet night hours. In terms of length, the conscious utterances contain about 1.5 times as many pitch periods as the natural utterances.
(Vowel waveform extraction and preprocessing)

To identify vowels, a section in which the vowel is regarded as stationary is extracted from the phoneme data by referring to the pitch obtained by the IC. The pitch period located two-thirds of the way through the total number of pitch periods is taken as the center, and one period before and one period after it are extracted as well, so that a signal of three periods in total is used as the stationary vowel waveform for feature extraction. An uneven waveform is then generated from the extracted three-period vowel waveform. FIG. 4 shows the stationary three-period vowel waveform, which is the original amplitude waveform; FIG. 5 shows one period of the uneven waveform extracted from FIG. 4 with N = 256; and FIG. 6 shows one period of the uneven waveform extracted from FIG. 4 with N = 64. In these figures, a = 0.8 and b = 0.2 in Equation 6 above. Comparing FIG. 5 and FIG. 6, the smaller N of FIG. 6 yields a finer uneven waveform that captures the peaks and valleys of the amplitude waveform in detail, whereas the larger N of FIG. 5 captures them coarsely. In this example, the recognition experiment uses the uneven waveforms obtained with these two settings, N = 256 and N = 64. These window widths were determined empirically, and other values can of course be used.

Using the amplitude waveform and the uneven waveforms obtained in this way, vowel feature quantities are extracted on the time axis. The features are extracted mainly from a one-period waveform. To select that period, the similarity (Euclidean distance) is calculated from the start point of the three-period waveform, the two periods with the smallest distance are chosen, and of these the one that occurs earlier in time is selected. Because the vowel waveform is extracted from the latter half of the utterance, a period located earlier is considered to preserve the characteristics of the vowel better than one located later. Features are extracted from the one-period waveform selected in this way and from the three-period uneven waveform. The feature quantities for identifying vowels are described next. Five features are extracted in total. The proposed system decides the vowel by counting votes: for each pairing of two of the five vowels one vowel is selected, and the vowel with the most votes is taken as the result. The features that separate two vowels differ from pair to pair, so choosing for each pair the features that best separate it is expected to give a relatively high identification rate.
The five feature quantities to be extracted are as follows.

(1) The width of the first convex portion found by searching from the start of one period of the vowel waveform, with reference to the N = 256 uneven waveform.
(2) The area of the amplitude waveform within the convex width detected for feature 1.
(3) The variance of the amplitude waveform within the convex width detected for feature 1.
(4) The similarity between the amplitude waveform within the convex width detected for feature 1, normalized to the range 0 to 1, and a sine wave generated over that convex width.
(5) The number of convex portions in the N = 64 uneven waveform over three periods.
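Features 1, 2, 3, and 5 can be read off a two-level waveform with simple run-length scans. The Python sketch below shows one possible reading and is not the authors' implementation: all function names are hypothetical, "area" is assumed to be the plain sum of the samples in the run, "variance" is assumed to be the population variance, and the first convex run in the input is used as the convex portion of feature 1.

```python
from statistics import pvariance


def convex_runs(uneven, convex=1):
    """Return (start, end) pairs, end exclusive, of consecutive convex samples."""
    runs, start = [], None
    for i, value in enumerate(uneven):
        if value == convex and start is None:
            start = i                      # a convex run begins
        elif value != convex and start is not None:
            runs.append((start, i))        # the run ended just before i
            start = None
    if start is not None:
        runs.append((start, len(uneven)))
    return runs


def first_convex_features(amplitude, uneven_256):
    """Features 1-3 for the first convex run (assumed definitions, see lead-in)."""
    runs = convex_runs(uneven_256)
    if not runs:
        return None
    start, end = runs[0]
    segment = amplitude[start:end]
    return {
        "width": end - start,              # feature 1
        "area": sum(segment),              # feature 2 (assumed: sum of samples)
        "variance": pvariance(segment),    # feature 3 (assumed: population variance)
    }


def convex_count(uneven_64):
    """Feature 5: number of convex runs in the finer uneven waveform."""
    return len(convex_runs(uneven_64))
```

Because every feature is a count, a sum, or a variance over a short run, the extraction stays within integer-friendly arithmetic, in keeping with the low-load goal stated in the description.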

Feature 4 is now described in detail. To extract feature 4, a sine wave is first generated as follows.
(1) The amplitude values in the convex section are normalized to the range 0 to 1, and the position of the maximum amplitude value is detected.
(2) A sine wave running from 0 to π/2 is generated from the start point to the position of the maximum, and a sine wave running from π/2 to π is generated from the position of the maximum to the end point.
FIGS. 7 and 8 show examples of the generated sine wave together with the amplitude waveform: FIG. 7 shows the convex portion of an /e/ phoneme waveform and the generated sine wave, and FIG. 8 shows the convex portion of an /o/ phoneme waveform and the generated sine wave. For feature 4, angles are calculated successively at four points from these two waveforms, and the sum of the differences is used as the feature quantity. Most of the features are thus derived from the convex portion at the beginning of one period. This portion is where the (stationary) vowel waveform changes most, and it is considered to reflect the characteristics of each vowel. The convex portion at the beginning is detected from one period of the N = 256 uneven waveform, as the convex portion that appears immediately after the widest concave portion.
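The two steps for building the reference sine wave can be sketched as follows. This Python sketch covers only normalization, peak detection, and generation of the two-part sine wave; the function names are hypothetical, and the angle comparison over four points is left out because the text does not specify it in enough detail to reproduce.

```python
import math


def normalize(segment):
    """Scale a segment linearly to the range 0..1 (all zeros if it is flat)."""
    lo, hi = min(segment), max(segment)
    if hi == lo:
        return [0.0] * len(segment)
    return [(v - lo) / (hi - lo) for v in segment]


def piecewise_sine(length, peak):
    """Sine reference whose phase runs 0..pi/2 up to `peak` and pi/2..pi after it."""
    reference = []
    for i in range(length):
        if i <= peak:
            # Rising part: start point to the position of the maximum.
            phase = (math.pi / 2) * (i / peak) if peak > 0 else math.pi / 2
        else:
            # Falling part: position of the maximum to the end point.
            phase = math.pi / 2 + (math.pi / 2) * ((i - peak) / (length - 1 - peak))
        reference.append(math.sin(phase))
    return reference


def sine_reference_for(segment):
    """Return the normalized segment and the sine wave aligned to its maximum."""
    norm = normalize(segment)
    peak = norm.index(max(norm))
    return norm, piecewise_sine(len(norm), peak)
```

Aligning the crest of the sine wave with the detected maximum means an asymmetric convex portion is compared against an equally asymmetric reference, so the comparison reflects shape rather than peak position.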
(Vowel recognition experiment)

Next, vowels are identified by applying a discrete Voronoi diagram and the least squares method to the extracted features. A discrete Voronoi diagram represents the regions of influence of a number of generating points placed in a space: the space containing the generating points is discretized, and it is partitioned according to which generating point each pixel is closest to. The discrete Voronoi diagrams in this example are built by the sequential addition method, in which generating points are added one at a time to the discretized space and only the new Voronoi region is constructed each time, so the diagram is produced quickly. The discrete Voronoi region used in this example measures 5160 × 5160, and feature values are mapped onto it by dividing the range from 0 to the maximum value among the generating points into equal steps.
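As a rough illustration of a discrete Voronoi diagram, the Python sketch below labels every cell of a small grid with the index of its nearest generating point, adding the points one at a time. It is a simplified stand-in, not the sequential addition method itself: each added point triggers a scan of the whole grid, whereas the method described above builds only the new region. The names `to_cell` and `discrete_voronoi` are hypothetical, and the mapping in `to_cell` is one assumed way of dividing the range 0 to the maximum into equal steps for non-negative values.

```python
def to_cell(value, max_value, size):
    """Map a feature value in 0..max_value to a grid index in 0..size-1 (assumed mapping)."""
    return int(value / max_value * (size - 1))


def discrete_voronoi(width, height, seeds):
    """Label each grid cell with the index of its nearest seed.

    Uses squared Euclidean distance; ties keep the seed that was added first.
    """
    owner = [[-1] * width for _ in range(height)]
    best = [[float("inf")] * width for _ in range(height)]
    for index, (sx, sy) in enumerate(seeds):       # add generating points one by one
        for y in range(height):
            for x in range(width):
                d = (x - sx) ** 2 + (y - sy) ** 2
                if d < best[y][x]:                 # the new point takes over this cell
                    best[y][x] = d
                    owner[y][x] = index
    return owner


if __name__ == "__main__":
    for row in discrete_voronoi(8, 4, [(1, 1), (6, 2)]):
        print(row)
```

Once the grid is labeled, classifying a new feature pair is a table lookup at the cell given by `to_cell`, which corresponds to the option of keeping the diagram itself rather than a fitted boundary line.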
(Vowel recognition algorithm)

The identification algorithm works on pairs of vowels (₅C₂ = 10 pairs). For each pair, one vowel is selected using the features suited to that pair, and the vowel to be identified is decided from the votes collected over all pairs. Table 1 shows the features used for identification in each pair.

In Table 1, which lists the feature combinations used for identification, the upper row entries such as /a/−/i/ denote vowel pairs, and the lower row entries such as 1-2 give the numbers of the features used, here (1) and (2). The features for separating each pair of vowels were chosen empirically after examining the distribution of the features extracted from all the data and confirming it visually. Two features are used for each pair for the following reason. Identification with a single feature reduces to simple thresholding and needs little processing time, but because the distribution of even a single vowel is spread out, data near the boundary are likely to be misrecognized. Using two features also makes the data distribution easy to inspect visually and allows nonlinear discrimination rather than linear thresholding, while keeping the number of operations as small as possible.

In this method, vowel identification finds the boundary line between the two classes of each vowel pair using a discrete Voronoi diagram. The extracted features are mapped and divided into Voronoi regions, regions of the same category are merged, and the coordinates of the boundary between the two classes are obtained. A function for the boundary line formed by these coordinates is then calculated by the least squares method. Using these boundary functions, vowels are identified by voting between the two classes of each pair. An input is identified as a vowel only when that vowel alone has the largest number of votes; if several vowels share the largest number of votes, the input is discarded. For all pairs other than /i/ and /u/, the extracted features are used as they are. For /i/ and /u/, feature 1 is converted by taking its common logarithm, because its values are on a much larger scale than those of feature 5 and the conversion makes it easier to fit the boundary function by the least squares method.
As a comparative experiment, recognition using the Mahalanobis distance was also performed, with the vowel in each pair selected on the basis of the Mahalanobis distance. The Mahalanobis distance is the distance from the center of each group with the variance taken into account. Because the vowel features extracted in this example have skewed distributions, identification by Mahalanobis distance is considered effective. There are 188 data items per vowel for conscious utterances and 178 per vowel for natural utterances. Tables 2 and 3 show the identification results for natural and conscious utterances obtained with the Voronoi diagram and with the Mahalanobis distance, using the features suited to each pair. Table 2 gives the recognition rates for the sole first candidate using the discrete Voronoi diagram, and Table 3 gives those using the Mahalanobis distance.
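The pairwise voting rule, including the rejection of ties, can be sketched in Python as below. The sketch assumes a hypothetical mapping `pair_classifiers` from each vowel pair to a function that returns the winning vowel of that pair; in the described system that decision would come from the fitted boundary function (or the Mahalanobis distance), which is not modeled here.

```python
from collections import Counter
from itertools import combinations

VOWELS = ("a", "i", "u", "e", "o")


def identify_vowel(features, pair_classifiers):
    """Return the vowel with the sole highest vote count, or None on a tie.

    `pair_classifiers` (hypothetical) maps each pair from
    combinations(VOWELS, 2) to a callable that takes the features and
    returns one of the two vowels in the pair.
    """
    votes = Counter()
    for pair in combinations(VOWELS, 2):          # the 10 two-vowel pairs
        votes[pair_classifiers[pair](features)] += 1
    ranked = votes.most_common()
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return None                               # top count is shared: discard
    return ranked[0][0]
```

With five vowels each vowel takes part in four pairings, so a vowel that wins all four is always the sole leader, while cyclic outcomes in which every vowel wins twice are rejected.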

As these results show, obtaining the boundary line with the discrete Voronoi diagram gives a higher recognition rate than the experiment using the Mahalanobis distance. The improvement is attributed to the clear boundary lines that the discrete Voronoi diagram makes it possible to derive. As Table 3 shows, with the Mahalanobis distance the identification results for /u/ and /e/ are lower than those for the other vowels in both natural and conscious utterances. This is because the extracted features of /i/ and /u/, and of /e/ and /o/, overlap across the boundary. For vowels whose features lie near the boundary, the Mahalanobis distance takes the variance into account when setting the boundary, but correct identification is apparently still not achieved. Possible reasons are that an accurate variance for each vowel cannot be calculated from the data currently used, or that using sound pressure data skews the feature distributions of similar vowels.

FIGS. 9 and 10 show the distributions of the features used for conscious utterances: FIG. 9 shows the feature distribution for /i/ and /u/, and FIG. 10 shows that for /e/ and /o/. The results show that conscious utterances are recognized better than natural utterances. Because the speaker vocalizes deliberately, more one-period waveforms (pitch periods) appear in the vowel portion, and a stable vowel waveform can be extracted. Conscious utterance therefore gives relatively good identification results. In addition, feature extraction and identification by Mahalanobis distance use only simple arithmetic, so they should be realizable with a small hardware configuration, although their recognition accuracy falls short of that obtained with the discrete Voronoi diagram. If the discrete Voronoi space can be set up appropriately, small-scale hardware using the discrete Voronoi diagram should therefore also be feasible. Applying this embodiment is thus expected to give relatively good recognition accuracy. It is also possible to retain the diagram of regions itself, instead of deriving boundary lines, and to determine the vowel by looking up the features in that diagram.

As described above, this example makes a speech recognition system for mobile devices feasible. In particular, with small hardware in mind, features were extracted from vowel amplitude waveforms and the five vowels were identified using relatively simple arithmetic, and the effectiveness of the approach was verified.

Furthermore, normalizing the scale of the features, for example by their average values, allows the discrete Voronoi diagram to be applied in a smaller space and reduces the computation time. The approach can also be applied to extracting features that allow clearer discrimination. For example, an SVM, a technique well suited to two-class classification, can be used to obtain the decision boundary. In addition, because the vowel waveform differs from period to period, investigating how to specify the vowel waveform regarded as stationary and how the pitch width varies from the beginning to the end of a vowel, and registering the results as a standard model, would further improve the extraction of features effective for identification and the computation time.
(Hearing aid function)

The present invention also has a wide range of applications as preprocessing for audio signal processing. Beyond voice component extraction and speech recognition for an input audio signal, it can be used for processing at output, for example shaping the sound so that it is easier to hear. Applied in this way, it can serve as a hearing assistance function in hearing aids and similar devices.

Conventional hearing aids simply amplify the entire detected audio signal to raise the volume. When the output volume is raised, however, noise other than the target sound is also heard loudly, so the sound pounds in the ear and can cause headaches or discomfort. With a small speaker, the sound also distorts and the sound quality degrades. Raising the volume further increases power consumption and makes prolonged use difficult. It also calls for a larger ear speaker, which adds weight and enlarges the hearing aid as a whole.

In contrast, by applying the feature extraction function of the present invention, the components of a voice that are hard to hear can be processed so that they are easier to hear. That is, by emphasizing only the parts that distinguish one sound from another, the speech can be made easier to tell apart without amplifying the noise component. This method does not require discrimination as sophisticated as speech recognition; extracting the voice component is sufficient, so accuracy can be improved and the processing simplified further.
(Audio signal acquisition procedure)

The flowchart of FIG. 11 shows the procedure of this audio signal processing method. As shown in FIG. 11, after the voice input unit 10 acquires the audio signal waveform, the voice component extraction unit 16 extracts the human voice component, the emphasis processing unit 18 emphasizes the uneven waveform to generate an emphasized voice waveform, and the voice output unit 20 outputs the corrected voice based on that waveform. The procedure by which the uneven waveform conversion unit 14 and the voice component extraction unit 16 extract the voice component from the audio signal waveform is described below with reference to FIGS. 12 and 13, which show acquired audio signal waveforms. FIG. 12 shows the waveform corresponding to the lowest frequency of the voice, with period T_1. FIG. 13 shows the waveform corresponding to the highest frequency of the voice, with period T_2. Even for men, the lowest frequency f_1 (= 1/T_1) contained in the human voice is normally no lower than about 100 Hz, and there are almost no components below 100 Hz. The period T_1 is then 1/(100 Hz) = 0.01 s = 10 ms, and the positive or negative half cycle of one period is half of that, 10 ms / 2 = 5 ms.
Accordingly, if the window used when sampling the audio signal waveform is set to a width of about 5 ms (0.005 s), even the lowest frequency f_1 of the voice can be handled. In this embodiment, the window is set to 6 ms to allow a margin for frequencies near 100 Hz. In other words, a component whose half wavelength is longer than 6 ms can be classified as something other than the human voice (that is, noise), so the component corresponding to the (low range of the) human voice can be extracted from the audio signal.

For example, sampling at the same quality as a music CD means f = 44.1 kHz, that is, 44100 samples per second. At this rate, the number of samples acquired in a 6 ms window is 44100 samples/s × 0.006 s = 264.6 samples, so N would be set to about 264.6. As noted above, if N is instead set to a power of two, the division can be done by bit shift, and the processing can be carried out quickly and with a low load using integer arithmetic without floating-point operations. Writing N = 2^n (here n is the exponent, not the half width N/2 used earlier), the choice N = 2^8 = 256 gives almost the same setting, so n = 8 is adopted. Adopting n = 8 for the low range therefore accommodates the low-frequency end of the audio signal.
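The arithmetic behind the choice of n = 8 can be checked with a short Python sketch. The helper names are hypothetical, and rounding to the nearest power of two (ties to the smaller exponent) is an assumption about how "almost the same setting" is chosen.

```python
def half_period_ms(frequency_hz):
    """Half of one period, in milliseconds, for the given frequency."""
    return 1000.0 / (2.0 * frequency_hz)


def samples_in_window(sample_rate_hz, window_ms):
    """Number of samples that fall inside a window of the given length."""
    return sample_rate_hz * window_ms / 1000.0


def nearest_power_of_two_exponent(count):
    """Exponent n such that 2**n is closest to count (ties go to the smaller n)."""
    if count < 1:
        raise ValueError("count must be at least 1")
    n = int(count).bit_length() - 1       # 2**n <= count < 2**(n + 1)
    lower, upper = 1 << n, 1 << (n + 1)
    return n if (count - lower) <= (upper - count) else n + 1


if __name__ == "__main__":
    window_ms = 6                                   # 5 ms half period plus margin
    count = samples_in_window(44100, window_ms)     # about 264.6 samples
    n = nearest_power_of_two_exponent(count)
    print(half_period_ms(100), count, n, 1 << n)
```

The same helpers can be reused for other sampling rates, for instance to see how the exponent changes at the 81.92 kHz rate used in the recognition experiment.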

Similarly, for the high frequency f₂ (= 1/T₂), any one of n = 1, 2, or 3 can be adopted. Frequency components higher than the high-frequency components of the human voice can thereby be distinguished as non-voice noise. As a result, the low and high ranges that contain no human voice are cut and only the human voice is extracted. In this example, the audio signal is divided into a high-frequency signal, a mid-frequency signal, and a low-frequency signal, with n = 8 for the low range, n = 5 or 6 for the mid range, and n = 1, 2, or 3 for the high range. Which value of n to use in each range is set according to the usage environment, the user's hearing characteristics, and so on. Besides dividing into three ranges, a method using only two ranges (high and low) or a method distinguishing four or more ranges can be adopted as appropriate for the application and the user.
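
The role of the exponent n can be illustrated with a minimal Python sketch of the uneven waveform conversion, in which each sample is compared with the average of a 2^n-sample window around it and the division is a right shift by n bits. The function name `uneven_waveform`, the clamping of the window at the ends of the signal, and the strict comparison are assumptions made for this sketch; the embodiment does not specify them.

```python
def uneven_waveform(samples, n):
    """Return 1 where a sample exceeds its local mean over 2**n samples, else 0.

    samples: list of integer amplitudes (already quantized).
    n:       exponent of the window size, N = 2**n.
    """
    size = 1 << n              # window length N = 2^n
    half = size >> 1           # samples taken before the centre point
    last = len(samples) - 1
    bits = []
    for k in range(len(samples)):
        acc = 0
        for j in range(k - half, k - half + size):
            idx = min(max(j, 0), last)   # assumption: clamp indices at the edges
            acc += samples[idx]
        mean = acc >> n                  # divide by N with a bit shift
        bits.append(1 if samples[k] > mean else 0)
    return bits

pattern = uneven_waveform([0, 10, 0, 10, 0, 10, 0, 10], 1)
flat = uneven_waveform([5, 5, 5, 5, 5, 5], 2)
```

In this example the alternating input yields an alternating pattern of 0 and 1, and the constant input yields all zeros. The inner loop recomputes each sum for clarity; a running sum could replace it.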

As described above, by changing the value of n for each of the high, mid, and low ranges, a high-frequency signal, a mid-frequency signal, and a low-frequency signal can each be extracted from the acquired audio signal as audio signals containing the human voice. That is, changing the value of n during sampling makes it possible to extract only the components corresponding to the human voice.
In other words, the above steps cut the low and high tones from the audio signal waveform to extract the human voice component, and digitally process it into an uneven waveform. Because the uneven pattern obtained in this way contains the human voice component, it is then corrected to make the voice clearer.
(Generation of emphasized voice data)

Next, the enhancement processing unit 18 applies correction processing to the audio signal of each band acquired in this way to make the voice easier to hear. The specific processing is described with reference to FIGS. 14 and 15. Conventionally, hearing assistance devices such as hearing aids have made sound easier to hear by uniformly raising the volume of the acquired sound. However, simply raising the volume also amplifies the noise contained in the acquired sound, so playback is merely loud at the ear and can hardly be called comfortable. In the present embodiment, therefore, only the human voice is extracted from the audio signal acquired as a digital signal, and further processing is applied to make that voice easier to hear, so that listening becomes easier without changing the volume. Specifically, as shown in FIG. 14, the convex portions of the waveform are raised and the concave portions are lowered, converting the uneven waveform shown by the solid line into the emphasized voice waveform shown by the wavy line. This sharpens the contrast of the sound and corrects it into a voice that is easy to hear. Setting the correction amount to be large in the low range and small in the high range yields a voice that is easier still to hear.

As the specific calculation, each convex and concave portion of the uneven waveform detected by the above method is multiplied by a predetermined coefficient. Floating-point arithmetic with fractional values would complicate this processing and raise the required hardware specification, particularly in a hearing assistance device such as a hearing aid, which requires real-time processing. A bit shift technique is therefore applied so that the calculation can be performed with integer addition and multiplication only.

First, to emphasize a convex portion, the value is multiplied by (1 + 1/2^U). For example, when U = 3, 1 + 1/2^3 = 1 + 1/8 = 1 + 0.125 = 1.125, so shifting the integer value right by 3 bits and adding the result to the original value emphasizes it by a factor of 1.125. Likewise, U = 0 gives 1 + 1/2^0 = 1 + 1 = 2; U = 1 gives 1 + 1/2^1 = 1 + 0.5 = 1.5; U = 2 gives 1 + 1/2^2 = 1 + 0.25 = 1.25; U = 4 gives 1 + 1/2^4 = 1 + 0.0625 = 1.0625; and U = 5 gives 1 + 1/2^5 = 1 + 0.03125 = 1.03125. The degree of emphasis can thus be adjusted by changing the value of U. Combining integer addition with bit shift operations in this way achieves fast, low-load waveform correction.
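
The shift-and-add multiplication described above can be sketched in Python as follows. The function name `emphasize_convex` is an assumption for illustration, and the sketch assumes non-negative integer amplitudes so that the right shift behaves as a plain division by 2^U.

```python
def emphasize_convex(amplitude: int, u: int) -> int:
    """Multiply amplitude by (1 + 1/2**u) using one shift and one addition."""
    return amplitude + (amplitude >> u)

# Gains for an amplitude of 800 and U = 0..5
gains = [emphasize_convex(800, u) for u in range(6)]
```

For an amplitude of 800 the results are 1600, 1200, 1000, 900, 850, and 825, matching the factors 2, 1.5, 1.25, 1.125, 1.0625, and 1.03125 given in the text.
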
(Correction amount of emphasized voice waveform)

In the present embodiment, the correction amount is set large in the low range, small in the high range, and intermediate in the mid range. Specifically, U = 0 or 1 is adopted for the low range, U = 2 for the mid range, and U = 3 or 4 for the high range. This converts the uneven waveform into an easy-to-hear audio waveform with its unevenness emphasized. As shown in FIG. 15, sampling is performed at a predetermined period for each of the high, mid, and low ranges, and spectra such as those shown at t₁ to t₅ are obtained. The convex and concave portions of the waveform shown by the solid line are each detected, and the unevenness is emphasized to generate the emphasized voice waveform shown by the wavy line. When the audio signal is output from an output unit such as a speaker according to this emphasized voice waveform, components such as noise are not emphasized; only the key portions of the voice that aid listening are emphasized, by varying their loudness, so the reproduced voice is very easy to hear. Because noise is not emphasized, the result is a characteristic well suited to a hearing aid with a good S/N ratio. In particular, the conventional method of simply raising the volume also emphasizes background sound, producing an unpleasant state in which sound blares at the ear. In the present embodiment, by contrast, the volume is not raised appreciably; the signal is only slightly adjusted into a form that is easy to hear, so the device is very comfortable to use. Needless to say, a function for adjusting the volume after correction can be added as necessary.
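
As one possible illustration of the band-dependent correction, the Python sketch below applies a different shift amount U to each band. The table `BAND_U` picks one of the permitted values per band, and the rule for concave portions (subtracting the shifted value, that is, multiplying by 1 − 1/2^U) is an assumption; the text gives the coefficient only for convex portions. Amplitudes are assumed to be non-negative integers.

```python
# Assumed choice of U per band, taken from the ranges named in the text.
BAND_U = {"low": 1, "mid": 2, "high": 3}

def emphasize_band(samples, bits, band):
    """Raise convex samples (bit 1) and lower concave samples (bit 0).

    samples: non-negative integer amplitudes of one band.
    bits:    uneven waveform of the same length (1 = convex, 0 = concave).
    band:    "low", "mid", or "high".
    """
    u = BAND_U[band]
    out = []
    for x, b in zip(samples, bits):
        step = x >> u                      # x / 2^U by bit shift
        out.append(x + step if b else x - step)
    return out

low = emphasize_band([800, 800], [1, 0], "low")
high = emphasize_band([800, 800], [1, 0], "high")
```

The low band is changed by a larger step than the high band, in line with the larger correction amount set for the low range.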

As described above, according to the present embodiment, frequency components equivalent to those of a Fourier series can be obtained without performing a Fourier transform of the audio signal waveform. The calculation requires only integer addition and multiplication, so processing is very fast and light, real-time processing is possible even with inexpensive hardware and software, and the system can be made small. In addition, the bit shift can be implemented with registers or in wired logic, giving an inexpensive and simple configuration. This makes the method well suited to devices that must be small and light, such as hearing aids. It also has the advantage of being easy to implement in mobile phones and the like.
(Consonant enhancement)

The processing described so far corrects the audio signal without distinguishing vowels from consonants. Emphasizing consonants more than vowels, however, can make speech easier still to hear. In general, making the consonant components rather than the vowels easier to hear makes the hard-to-hear parts of speech audible and makes sounds easier to tell apart. Speech divides into consonants and vowels; vowels are relatively loud and easy to hear, whereas consonants tend to be short and quiet and are therefore hard to hear. Emphasizing the consonant portions thus makes speech easier to hear. To distinguish consonants from vowels in the detected audio signal, the speech recognition technique described above may be applied. In this case it is not necessary to identify the specific sound uttered; it is enough to distinguish vowels from consonants, so accuracy can be improved and the processing simplified further. Consonants and vowels are distinguished in this way, and the audio signal is then processed before playback. Vowels and consonants can also be distinguished on the basis of formants and the like. For example, a vowel can be identified by lengthening the sampling period to obtain an uneven waveform and counting the number of convex and concave portions.
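
Counting the unevenness of an uneven waveform can be sketched as counting the runs of convex samples. The function name `count_convex_runs` is an assumption for illustration; how the count is mapped to a vowel or consonant decision is not specified here and is left out of the sketch.

```python
def count_convex_runs(bits):
    """Count the convex portions, i.e. the runs of consecutive 1s, in bits."""
    count = 0
    prev = 0
    for b in bits:
        if b and not prev:     # a 0 -> 1 transition starts a new convex run
            count += 1
        prev = b
    return count

runs = count_convex_runs([0, 1, 1, 0, 1, 0, 0, 1])
```

In this example the pattern contains three convex portions.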

Here, as an algorithm for adjusting the volume almost in real time during playback, the volume is raised when a consonant portion arrives, which is hard to hear and short in duration, and the remaining portions are kept at the normal volume. Because the vowel portion that appears immediately after a consonant is generally loud, speech is recognized from the consonant and the vowel that immediately follows it. Recognition of the vowel portion is completed within a short-time waveform of one to several periods. Once this short-time recognition by hearing and brain function is complete, what follows is a continuation of the vowel, and the audio output signal is made very small during that interval. When a consonant is input again, the audio output is raised in the same way. This operation makes the audio output perceptually louder, yet the user is not listening to loud sound all the time, so discomfort is greatly reduced. The volume of the vowels may also be turned down as necessary.
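
The volume control described above can be sketched as a small state machine over frames already labeled as consonant or vowel. The function name `gain_schedule`, the frame labels "C" and "V", and the parameters `hold`, `boost`, and `normal` are hypothetical names and values chosen for this sketch; the embodiment does not fix them.

```python
def gain_schedule(labels, hold=2, boost=2, normal=1):
    """Return one gain per frame.

    labels: sequence of "C" (consonant) or "V" (vowel) frame labels.
    hold:   number of vowel frames after a consonant that stay boosted.
    boost:  gain applied to consonants and the held vowel frames.
    normal: gain applied once the boost has been released.
    """
    gains = []
    boosted = False
    vowel_run = 0
    for label in labels:
        if label == "C":
            boosted = True             # a consonant switches the boost on
            vowel_run = 0
            gains.append(boost)
        else:
            vowel_run += 1
            if boosted and vowel_run <= hold:
                gains.append(boost)    # vowel onset right after the consonant
            else:
                boosted = False        # vowel continues: release the boost
                gains.append(normal)
    return gains

schedule = gain_schedule(list("CVVVVCV"), hold=2)
```

Setting `normal` below the ordinary playback gain would correspond to reducing the output during the continuation of a vowel.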

The key point of the above audio signal processing algorithm is that the audio output is raised at the moment a consonant is input, from a state in which the output is normal or reduced. In conventional systems, speech recognition itself requires a large amount of processing, and adding such volume-change processing increases the time and processing needed to detect consonants, so practical use is not easy. The algorithm of the present invention, in contrast, can detect consonant and vowel portions with integer arithmetic alone, so consonants can be detected by simple signal processing in a very short time. Because the amount of computation is kept to a level that permits high-speed operation, the algorithm is easy to incorporate into small systems such as hearing aids and portable electrical devices such as mobile phones. In particular, by providing an IC that performs dedicated processing and having it carry out the above digital signal processing, segmentation of the speech waveform into consonant and vowel portions can be achieved even with the modest information processing capability of a portable device, and the audio output level can be changed at a practical processing speed. Thus, according to the present embodiment, the processing can be executed quickly and simply with integer arithmetic alone, without floating-point arithmetic, which requires long processing time and a large-scale integrated circuit; its practical value is therefore high. A DSP (Digital Signal Processor) or the like is normally used for audio signal processing, but floating-point arithmetic takes a long time and increases the scale of the integrated circuit, making it difficult to mount in a mobile phone or similar device. The above algorithm, by contrast, can be processed at high speed with integer and bit shift operations only, and can be built into a circuit as a small-scale integrated circuit. The ability to mount it in small devices such as portable devices and hearing aids is a major advantage for practical use.

Furthermore, the frequency characteristics can be adjusted to suit the user of the hearing aid. By adjusting the frequency characteristics of the sound output from the hearing aid's speaker to match the hearing characteristics of the user's ear, frequencies that are hard to hear can be compensated, and each user can listen to sound that is appropriate and easy to hear.

Thus, the present invention can realize a high-performance portable hearing aid that makes speech easy to hear. A small hearing aid in particular is limited in the size of battery it can use and must have low power consumption, so a hearing assistance device that needs little computation and little power, as in the present invention, is well suited to this use.

The present invention is also easy to implement in portable devices other than hearing aids. In devices equipped with a speaker for reproducing sound, such as mobile phones and PHS handsets, applying the above processing makes the content of a call easier to hear. Mobile phones in particular face strong demands for smaller size, lighter weight, and longer continuous operating time while also being expected to offer high performance, so the ability to reproduce speech clearly at low power consumption, as in the present invention, is highly practical. As above, by adjusting the playback settings to frequency characteristics suited to the user, a mobile phone can be realized on which each user hears speech in the state best suited to that user. Even people who do not use a hearing aid can thereby hear mobile phone audio more easily and use the phone conveniently. Ordinary mobile phones have low speaker output and are designed for users with normal hearing, so they have sometimes been hard to use for elderly people whose hearing has declined; the present invention makes it possible to realize a mobile phone for elderly and senior users on which speech is easy to hear.
(Mobile phone with hearing aid function)

The hearing aid function can also be built into a portable electrical device such as a mobile phone. Such a mobile phone with a hearing aid function can be used as a hearing aid by, for example, switching the operating mode of the phone. It can then be used as a hearing aid in the same posture as when using a mobile phone, that is, with the handset held to the ear, so use looks natural, people nearby are unlikely to notice that a hearing aid is in use, and the user's reluctance is eased. In recent years mobile phones have become widespread, and using one has become an established, ordinary behavior regardless of age or gender; as a result, people nearby show no particular reaction to someone holding a mobile phone to the ear and regard it as a matter of course. By contrast, taking out a specially shaped hearing aid and holding it to the ear usually seems unnatural, the way people speak changes somewhat, both the user and those nearby tend to become self-conscious, and natural conversation may be hindered. In view of this, when a hearing aid is built into a mobile phone, it cannot be told from the outside that a phone held to the ear is being used as a hearing aid; its use blends into everyday scenes, and the psychological pressure on the hearing aid user is reduced.

Integrating the hearing aid with a mobile phone brings many advantages. For example, the phone's speaker can double as the hearing aid's speaker, so there is no need to carry an ear speaker dedicated to the hearing aid. Because the hearing aid is always at hand whenever the phone is carried, the user has less sense of specially carrying a hearing aid, which reduces reluctance to use one. When the hearing aid is not in use it need not be worn, so the physical burden on the user is small. An integrated circuit with the hearing aid function can be incorporated into the phone's digital processing integrated circuit, so no separate hearing-aid integrated circuit has to be manufactured, and the cost is lower than owning a hearing aid and a mobile phone separately. Long-life rechargeable batteries have been developed for mobile phones in recent years, and these can be used, reducing concern about insufficient power for the hearing aid. Beyond the battery, the body can also be based on the latest mobile phone, so the latest phone models can be adopted in both function and design, giving high convenience at low manufacturing cost. In particular, a mold made solely for a hearing aid would be expensive, but sharing the mold with a mobile phone saves mold costs.

In this case, the mobile phone is preferably provided with two or more sound collectors such as microphones: one that the user uses for calls and one for picking up ambient sound.

Furthermore, besides processing on the playback side of the audio signal to make it easier to hear, as in the hearing aid function, the same processing can be performed on the input side. For example, if the present invention is applied to the microphone side of a mobile phone as well as the speaker side, and the consonant emphasis processing described above is applied to the audio signal transmitted to the other party, the conversation becomes easier to hear not only for the phone's user but also for the other party. The invention can likewise be applied not only to telephones but also to devices that reproduce sound through speakers or earphones, such as TVs, radios, videophones, and video conferencing systems.

The hearing assistance device, audio signal processing method, audio processing program, computer-readable recording medium, and recorded apparatus of the present invention can be suitably applied to preprocessing and postprocessing that correct speech so that it is easier to hear in hearing aids and the like.

FIG. 1 is a block diagram showing the configuration of the electron beam imaging unit of a hearing assistance device according to an embodiment of the present invention.
FIG. 2 is an explanatory diagram showing how the average value α_k is obtained for N data points of an amplitude waveform.
FIG. 3 is an explanatory diagram showing an input speech waveform and its pitch information.
FIG. 4 is a graph showing a vowel waveform of three periods regarded as stationary.
FIG. 5 is a graph showing the uneven waveform for one period extracted from FIG. 4 with N = 256.
FIG. 6 is a graph showing the uneven waveform for one period extracted from FIG. 4 with N = 64.
FIG. 7 is a graph showing the /e/ phoneme waveform of a convex portion and the generated sine wave.
FIG. 8 is a graph showing the /o/ phoneme waveform of a convex portion and the generated sine wave.
FIG. 9 is a graph showing the feature distribution for /i/ and /u/ used in conscious utterances.
FIG. 10 is a graph showing the feature distribution for /e/ and /o/ used in conscious utterances.
FIG. 11 is a flowchart showing the procedure of the audio signal processing method.
FIG. 12 is a graph showing the waveform corresponding to the lowest frequency of the voice.
FIG. 13 is a graph showing the waveform corresponding to the highest frequency of the voice.
FIG. 14 is a graph showing how the unevenness of an uneven waveform is emphasized to convert it into an emphasized voice waveform.
FIG. 15 is a graph showing an example of an emphasized voice waveform corrected based on the uneven waveform.
FIG. 16 is a block diagram showing the configuration of an example of a conventional hearing aid.
FIG. 17 is a block diagram showing a configuration example of the audio processing circuit of a conventional hearing aid.

Explanation of symbols

100 … hearing assistance device
10 … audio input unit
12 … A/D converter
14 … uneven waveform conversion unit
16 … voice component extraction unit
18 … enhancement processing unit
20 … audio output unit
22 … amplitude calculation unit
24 … average value calculation unit
26 … comparison unit
28 … conversion unit

Claims

A hearing assistance device comprising:
an audio input unit for inputting an audio signal;
an uneven waveform conversion unit for quantizing the amplitude of the audio signal input by the audio input unit, adding, for the quantized data at each point, the amplitude values of the data in a predetermined adjacent range, dividing the sum by the number of data added to obtain a partial average value centered on that data, comparing the amplitude value of the data at each point with the corresponding partial average value, and converting the signal into an uneven waveform based on whether the comparison result is true or false;
a voice component extraction unit for extracting, from the uneven waveform of the audio signal obtained by the uneven waveform conversion unit, a voice component corresponding to a human voice in accordance with a pre-registered pattern relating to the human voice;
an enhancement processing unit for emphasizing the voice component extracted by the voice component extraction unit by raising its convex portions and lowering its concave portions, thereby generating an emphasized voice waveform; and
an audio output unit for outputting the emphasized voice waveform.

The hearing assistance device according to claim 1,
wherein the uneven waveform conversion unit sets the number of samples used for conversion into the uneven waveform to a power of 2, and the voice component extraction unit extracts the voice component by adjusting the exponent.

The hearing assistance device according to claim 1 or 2,
wherein the voice component extraction unit extracts the voice component by setting separate exponents for the frequency component corresponding to the high range of the human voice and the frequency component corresponding to the low range.

The hearing assistance device according to any one of claims 1 to 3,
wherein, when the uneven waveform conversion unit calculates the partial average value, the number of data to be added in the predetermined range is set to a power of 2, and the division by the number of data added in the predetermined range is performed by a bit shift operation.

The hearing assistance device according to claim 4,
wherein, when the uneven waveform conversion unit calculates the partial average value, the sum obtained by adding the amplitude values of the data in the predetermined range to find the average for one data point is retained, and the sum for the next data point is calculated by subtracting from the retained sum the amplitude value no longer needed and adding the amplitude value newly needed.

The hearing assistance device according to claim 5, wherein, when the average value α_k over a section N (= 2n) extending n points before and after point k as the center is expressed as
the average value α_k is calculated, using the average value α_(k−1) at the preceding point (k−1), by

The hearing aid device according to any one of claims 1 to 6,
wherein, for the voice component extracted by the voice component extraction unit, the volume output from the audio output unit is increased when a consonant is recognized, and, when a vowel is recognized after the consonant, the volume amplification is released a predetermined time after the vowel.
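The gain behaviour in this claim can be sketched as a small state machine. The claim does not say how consonants and vowels are recognized, so the sketch assumes a hypothetical upstream recognizer that labels each frame "C" (consonant), "V" (vowel), or "-" (neither). The names `consonant_gain`, `boost`, and `hold_frames` are likewise illustrative only.

```python
def consonant_gain(labels, boost=2.0, hold_frames=2):
    """Per-frame gain: boosted from a consonant until hold_frames after a vowel starts."""
    gains = []
    boosted = False
    remaining = None   # frames left before release, set once a vowel is seen
    for label in labels:
        if label == "C":
            boosted = True          # a consonant (re)starts the amplification
            remaining = None
        elif label == "V" and boosted and remaining is None:
            remaining = hold_frames  # a vowel starts the release timer
        if boosted and remaining is not None:
            if remaining == 0:
                boosted = False      # the predetermined time has elapsed
                remaining = None
            else:
                remaining -= 1
        gains.append(boost if boosted else 1.0)
    return gains
```

With the default parameters, the label sequence `["-", "C", "C", "V", "V", "V", "V"]` yields gains `[1.0, 2.0, 2.0, 2.0, 2.0, 1.0, 1.0]`: the boost starts at the first consonant and is released two frames into the vowel. In this sketch a consonant that is never followed by a vowel stays boosted, a case the claim does not address.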

An audio signal processing method for correcting and outputting a human voice based on an input audio signal, comprising:
a step of inputting an audio signal;
a step of quantizing the amplitude of the input audio signal, adding, for the quantized data at each point, the amplitude values of the data in an adjacent predetermined range, dividing the sum by the number of added data items to obtain a partial average value centered on that data, comparing the amplitude value of the data at each point with the respective partial average value, converting the signal into a concavo-convex waveform based on whether the comparison result is true or false, and extracting a voice component corresponding to a human voice from the resulting concavo-convex waveform of the audio signal;
a step of generating an emphasized voice waveform by emphasizing the extracted voice component; and
a step of reproducing the emphasized voice waveform.
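The conversion step of this method (compare each sample with its partial average and keep only the truth value) can be sketched as below. This is an illustration under assumptions, not the patented implementation: the name `to_concavo_convex`, the window from k − n to k + n − 1, the use of "greater than" as the comparison, and the 1/0 encoding are all hypothetical, and the input is assumed to be already quantized to integers. Voice component extraction and emphasis are not shown.

```python
def to_concavo_convex(samples, n):
    """Return 1 where a sample exceeds its partial average, else 0.

    Only points whose full window of 2*n samples fits in the signal are converted.
    """
    if n < 1:
        raise ValueError("n must be at least 1")
    count = 2 * n
    out = []
    for k in range(n, len(samples) - n + 1):
        window = samples[k - n:k + n]          # 2*n samples around point k
        average = sum(window) // count         # partial average (integer division)
        out.append(1 if samples[k] > average else 0)
    return out
```

For the alternating input `[0, 10, 0, 10, 0, 10]` with `n = 1`, every partial average is 5, so the result is `[1, 0, 1, 0, 1]`.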

An audio signal processing program for correcting and outputting a human voice based on an input audio signal, the program causing a computer to realize:
a function of inputting an audio signal;
a function of quantizing the amplitude of the input audio signal, adding, for the quantized data at each point, the amplitude values of the data in an adjacent predetermined range, dividing the sum by the number of added data items to obtain a partial average value centered on that data, comparing the amplitude value of the data at each point with the respective partial average value, converting the signal into a concavo-convex waveform based on whether the comparison result is true or false, and extracting a voice component corresponding to a human voice from the resulting concavo-convex waveform of the audio signal;
a function of generating an emphasized voice waveform by emphasizing the extracted voice component; and
a function of reproducing the emphasized voice waveform.

A computer-readable recording medium storing the program according to claim 9, or a device on which the program is recorded.