JP2018010207A - Speech signal processing device and speech signal processing program

Info

Publication number: JP2018010207A
Application number: JP2016139753A
Authority: JP
Inventors: Kaori Endo (遠藤 香緒里)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2016-07-14
Filing date: 2016-07-14
Publication date: 2018-01-18
Anticipated expiration: 2036-07-14
Also published as: JP6677110B2

Abstract

PROBLEM TO BE SOLVED: To reduce the load of speech signal processing in wideband speech signal processing without degrading sound quality. SOLUTION: A first band division unit (23) divides a low frequency region of a speech signal, converted from a time domain representation to a frequency domain representation, into a plurality of first bands using a first bandwidth. A bandwidth determination unit (24) determines a second bandwidth, greater than or equal to the first bandwidth, for dividing a high frequency region of the speech signal, whose frequencies are higher than those of the low frequency region, based on the degree of importance of the high frequency region. A second band division unit (25) divides the high frequency region of the speech signal into a plurality of second bands using the second bandwidth determined by the bandwidth determination unit. A speech signal adjustment unit (26) performs speech signal adjustment processing on each of the plurality of first bands and each of the plurality of second bands. SELECTED DRAWING: Figure 1

Description

The present invention relates to an audio signal processing device and an audio signal processing program.

Achieving high sound quality requires support for wideband audio signal processing. However, because wideband processing handles more information, the processing load increases. For example, there is a technique that divides an input audio signal into a low-band audio signal containing low-frequency noise components and a high-band audio signal containing high-frequency noise components, and downsamples the low-band audio signal, in which the power of the input audio signal is large. This allows more advanced noise suppression to be applied to the low-band audio signal with a small amount of computation. For the high-band audio signal, in which the power of the input audio signal is small, noise suppression simpler than that applied to the low-band audio signal is performed, so that audio distortion is reduced with less computation and noise is removed without degrading sound quality. The load of audio signal processing can therefore be reduced.

However, the frequency characteristics of an audio signal that contains noise change depending on the situation. For example, when the high-band audio signal carries much of the information on speech features, as in consonant segments, or when the high-frequency component of the noise is not very stationary, applying simple noise suppression to the high-band audio signal may increase audio distortion and fail to remove the noise sufficiently.

There is also a technique that, to reduce the load of audio signal processing, converts the input audio signal into band-divided signals split into predetermined frequency bands and filters acoustic signals such as noise, environmental sound, and musical sound according to a feature amount of each frequency band. The output signal is synthesized by adjusting the mix of the filtered band-divided signal and the input audio signal according to the intelligibility of the filtered band-divided signal. This generates speech whose quality is not degraded to a degree that makes the user uncomfortable, yet is difficult for third parties to hear.

JP 2006-201622 A
JP 2009-75160 A
Japanese Patent No. 3309895
Japanese Patent No. 4533427
Japanese Patent No. 5453740

However, the frequency characteristics of speech and ambient noise change over time. With the related techniques, which divide the signal into predetermined frequency bands, it is difficult to perform audio signal processing with a bandwidth appropriate to those changes.

In one aspect, an object of the present invention is to reduce the load of audio signal processing in wideband audio signal processing without degrading sound quality.

In one embodiment, a first band dividing unit divides the low frequency region of an audio signal, converted from a time domain representation to a frequency domain representation, into a plurality of first bands with a first bandwidth. A bandwidth determining unit determines a second bandwidth, equal to or greater than the first bandwidth, for dividing the high frequency region of the audio signal, whose frequencies are higher than those of the low frequency region, based on the degree of importance of the high frequency region. A second band dividing unit divides the high frequency region of the audio signal into a plurality of second bands with the second bandwidth determined by the bandwidth determining unit. An audio signal adjustment unit executes audio signal adjustment processing on each of the plurality of first bands and each of the plurality of second bands.

In one aspect, the present invention makes it possible to reduce the load of audio signal processing in wideband audio signal processing without degrading sound quality.

A block diagram showing an example of the main functions of the audio signal processing device according to the first to fifth embodiments.
A block diagram showing an example of the hardware configuration of the audio signal processing device according to the first to fifth embodiments.
A conceptual diagram for explaining the outline of the audio signal processing according to the first to fifth embodiments.
A conceptual diagram for explaining the outline of the audio signal processing according to the first to fifth embodiments.
A flowchart showing an example of the flow of the audio signal processing according to the first to fourth embodiments.
A flowchart showing an example of the flow of the audio signal analysis processing according to the first embodiment.
A diagram for explaining the calculation of the number of bands in the high frequency region according to the first embodiment.
A conceptual diagram for explaining the number of bands in the high frequency region according to the first to fifth embodiments.
A conceptual diagram for explaining band merging in the high frequency region according to the first to fifth embodiments.
A conceptual diagram for explaining band merging in the high frequency region according to the first to fifth embodiments.
A flowchart showing an example of the band merging processing according to the first to fifth embodiments.
A flowchart showing an example of the audio signal adjustment processing according to the first to fifth embodiments.
A flowchart showing an example of gain distribution according to the first to fifth embodiments.
A flowchart showing an example of gain distribution according to the first to fifth embodiments.
A conceptual diagram for explaining the principle of the first to fifth embodiments.
A conceptual diagram for explaining the principle of the first to fifth embodiments.
A conceptual diagram for explaining the principle of the first to fifth embodiments.
A flowchart showing an example of the flow of the audio signal analysis processing according to the second embodiment.
A diagram for explaining the calculation of the number of bands in the high frequency region according to the second embodiment.
A flowchart showing an example of the flow of the audio signal analysis processing according to the third embodiment.
A flowchart showing an example of the flow of the speech presence/absence determination processing according to the third and fourth embodiments.
A flowchart showing an example of the flow of the fundamental frequency calculation processing according to the third and fourth embodiments.
A diagram for explaining the calculation of the number of bands in the high frequency region according to the third embodiment.
A flowchart showing an example of the flow of the audio signal analysis processing according to the fourth embodiment.
A flowchart showing an example of the flow of the audio signal processing according to the fifth embodiment.
A conceptual diagram for explaining the outline of the audio signal processing according to the fifth embodiment.
A conceptual diagram for explaining the outline of the audio signal processing according to the fifth embodiment.
A conceptual diagram for explaining the outline of the audio signal processing according to the fifth embodiment.
A conceptual diagram for explaining the outline of the audio signal processing according to the fifth embodiment.
A flowchart showing an example of the flow of the lower-limit change processing for the high frequency region according to the fifth embodiment.

[First Embodiment]
Hereinafter, an example of the first embodiment will be described in detail with reference to the drawings.

The audio signal processing device 10 shown in FIG. 1 includes an audio input unit 21, a frequency domain conversion unit 22, a first band dividing unit 23, a bandwidth determining unit 24, a second band dividing unit 25, an audio signal adjustment unit 26, a time domain conversion unit 27, and an audio output unit 28. The audio input unit 21 detects sound and converts the detected sound into an audio signal.

The frequency domain conversion unit 22 converts the audio signal from a time domain representation to a frequency domain representation. For example, it uses a Fourier transform to convert an audio signal whose level varies with time into an audio signal whose level varies with frequency. The first band dividing unit 23 divides the low frequency region of the audio signal converted into the frequency domain representation into a plurality of first bands with the first bandwidth. The bandwidth determining unit 24 determines a second bandwidth, equal to or greater than the first bandwidth, for dividing the high frequency region of the audio signal, whose frequencies are higher than those of the low frequency region, based on the degree of importance of the high frequency region.

The low frequency region is generally a region of high importance. The high frequency region is generally less important than the low frequency region, but its importance can be high depending on the characteristics of the audio signal it contains. To avoid degrading sound quality in that case, the bandwidth used to divide the high frequency region is made narrower and the number of bands larger as the importance increases, which raises the accuracy of audio signal processing in the high frequency region.

The bandwidth determining unit 24 can include a coefficient determining unit 29, which determines a coefficient based on the degree of importance of the high frequency region. In this case, the bandwidth determining unit 24 determines the second bandwidth by multiplying the first bandwidth by the determined coefficient. Alternatively, the bandwidth determining unit 24 may determine the second bandwidth by adding together as many first bandwidths as the determined coefficient indicates.
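The two ways of deriving the second bandwidth described above can be sketched as follows. This is only an illustration: the function names are hypothetical, and the sketch assumes the coefficient is an integer of at least 1 so that the second bandwidth is never narrower than the first.

```python
def second_bandwidth_by_multiplication(first_bandwidth_hz: float, coefficient: int) -> float:
    """Second bandwidth as coefficient x first bandwidth (assumed coefficient >= 1)."""
    if coefficient < 1:
        raise ValueError("coefficient must be at least 1")
    return coefficient * first_bandwidth_hz


def second_bandwidth_by_addition(first_bandwidth_hz: float, coefficient: int) -> float:
    """Second bandwidth as the sum of `coefficient` copies of the first bandwidth."""
    if coefficient < 1:
        raise ValueError("coefficient must be at least 1")
    total = 0.0
    for _ in range(coefficient):
        total += first_bandwidth_hz
    return total
```

With an integer coefficient both forms give the same width; the additive form simply corresponds to merging adjacent first-bandwidth bands.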

The second band dividing unit 25 divides the high frequency region of the audio signal into a plurality of second bands with the second bandwidth determined by the bandwidth determining unit 24. The audio signal adjustment unit 26 performs audio signal adjustment processing on each of the plurality of first bands and each of the plurality of second bands. The time domain conversion unit 27 converts the audio signal from the frequency domain representation back to the time domain representation. The audio output unit 28 converts the audio signal into sound and outputs it.

As an example, as shown in FIG. 2, the audio signal processing device 10 includes a CPU (Central Processing Unit) 31, which is an example of a processor, a primary storage unit 32, a secondary storage unit 33, an external interface 34, a microphone 35, a speaker 36, and a communication unit 37. The CPU 31, the primary storage unit 32, the secondary storage unit 33, the external interface 34, the microphone 35, the speaker 36, and the communication unit 37 are connected to one another via a bus 39.

The primary storage unit 32 is a volatile memory such as a RAM (Random Access Memory). The secondary storage unit 33 is a non-volatile memory such as an HDD (Hard Disk Drive) or an SSD (Solid State Drive).

The secondary storage unit 33 includes a program storage area 33A and a data storage area 33B. As an example, the program storage area 33A stores programs such as the audio signal processing program. As an example, the data storage area 33B stores audio signals and intermediate data generated while the audio signal processing program is running.

The CPU 31 reads the audio signal processing program from the program storage area 33A and loads it into the primary storage unit 32. By executing the audio signal processing program, the CPU 31 operates as the frequency domain conversion unit 22, the first band dividing unit 23, the bandwidth determining unit 24, the second band dividing unit 25, the audio signal adjustment unit 26, the time domain conversion unit 27, and the coefficient determining unit 29 of FIG. 1.

Programs such as the audio signal processing program may instead be stored on an external server and loaded into the primary storage unit 32 via a network. They may also be stored on a non-transitory recording medium such as a DVD (Digital Versatile Disc) and loaded into the primary storage unit 32 via a recording medium reading device.

The microphone 35 is an example of the audio input unit 21; it detects speech uttered by the user, background noise, and the like, and converts them into an audio signal. The speaker 36 is an example of the audio output unit 28; it converts an audio signal into sound and outputs it. The communication unit 37 is an example of both the audio input unit 21 and the audio output unit 28; it transmits and receives audio signals via a wired or wireless communication line.

An external device is connected to the external interface 34, which handles the exchange of various information between the external device and the CPU 31. The description above covers an example in which the microphone 35, the speaker 36, and the communication unit 37 are included in the audio signal processing device 10. However, all or some of the microphone 35, the speaker 36, and the communication unit 37 may be external devices connected via the external interface 34.

The audio signal processing device 10 may be, for example, a smartphone, but the present embodiment is not limited to this. For example, the audio signal processing device 10 may be any device usable for voice communication, such as a mobile phone, a tablet, or a personal computer. Part or all of the audio signal processing device 10 may also be a computer that is physically separate from the microphone 35, the speaker 36, the communication unit 37, and the like, and is reached, for example, via a network.

When a computer reached via a network serves as the audio signal processing device 10, the audio signal processing program is stored on that computer, which acts as a server. The audio signal is acquired by the user's information terminal, which includes the microphone 35, the speaker 36, the communication unit 37, and the like.

The server performs audio signal processing on the audio signal transmitted from the information terminal and transmits the result of the processing to the information terminal of the other party to the call. Alternatively, the audio signal is acquired by the other party's information terminal, which includes the microphone 35, the speaker 36, the communication unit 37, and the like. The server then performs audio signal processing on the audio signal transmitted from that terminal and transmits the result to the user's information terminal.

Next, the principle of the audio signal processing will be described. As illustrated in FIG. 3A, when a wideband audio signal having the bandwidth WBA2 is divided with a predetermined first bandwidth WB1, the number of bands becomes large and the processing load increases. The first bandwidth WB1 may be, for example, the frequency resolution used when converting the audio signal from the time domain representation to the frequency domain representation. In the present embodiment, as illustrated in FIG. 3B, the first bandwidth WB1 used to divide the low frequency region of the audio signal is left unchanged, and the second bandwidth WB2 used to divide the high frequency region is set wider than the first bandwidth WB1. This reduces the total number of bands and thereby the load of audio signal processing.
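As a rough illustration of how widening only the high frequency bands reduces the total band count, the following sketch counts bands for a given split. The function name and the region widths and bandwidths used in the test (8000 Hz, 24000 Hz, 31.25 Hz, 250 Hz) are illustrative assumptions for this sketch, not the settings of the embodiment.

```python
import math


def total_band_count(low_width_hz: float, high_width_hz: float,
                     wb1_hz: float, wb2_hz: float) -> int:
    """Count bands when the low region uses WB1 and the high region uses WB2."""
    if wb2_hz < wb1_hz:
        raise ValueError("WB2 is assumed to be at least WB1")
    low_bands = math.ceil(low_width_hz / wb1_hz)
    high_bands = math.ceil(high_width_hz / wb2_hz)
    return low_bands + high_bands
```

Under these assumed numbers, dividing both regions with 31.25 Hz gives 1024 bands, while widening only the high frequency bands to 250 Hz gives 352.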

The high frequency region of an audio signal is generally less important than the low frequency region, because features such as the fundamental frequency of speech are usually contained in the low frequency region. Depending on the importance of the high frequency region, however, widening the second bandwidth WB2 to reduce the number of bands may degrade the sound quality after audio signal processing. To address this, the second bandwidth WB2 used to divide the high frequency region is determined based on the degree of importance of the high frequency region of the audio signal. This prevents the sound quality from degrading after audio signal processing.

The degree of importance of the high frequency region of the audio signal is determined based on the ratio of the power of the audio signal in the high frequency region to its power in the low frequency region, the non-stationarity of the power in the high frequency region, and the fundamental frequency of the audio signal. It is also determined based on whether or not the audio signal corresponds to a consonant. The degree of importance may be determined based on a combination of at least two of these.

The degree of importance of the high frequency region is raised as the ratio of the power of the audio signal in the high frequency region to its power in the low frequency region increases, and as the non-stationarity of the power in the high frequency region increases. Alternatively, the degree of importance is raised as the fundamental frequency of the audio signal increases, and is set higher when the audio signal corresponds to a consonant than when it does not.

The present embodiment describes an example in which the degree of importance of the high frequency region is raised as the ratio of the power of the audio signal in the high frequency region to its power in the low frequency region increases.

Next, the operation of the audio signal processing device 10 will be described. FIG. 4 shows an example of the audio signal processing. For example, when the user turns on the power of the audio signal processing device 10, the CPU 31 reads one frame of the audio signal in step 101. One frame may be, for example, 20 milliseconds of the audio signal. The audio signal may be one converted from sound detected by the microphone 35, or one received by the communication unit 37 from the other party's information terminal via a wired or wireless communication line.

In step 102, the CPU 31 converts the audio signal from the time domain representation to the frequency domain representation. For example, it uses a Fourier transform to convert an audio signal whose level varies with time into an audio signal whose level varies with frequency. In the following, until the audio signal is converted back from the frequency domain representation to the time domain representation in step 107, the audio signal converted into the frequency domain representation is simply called the audio signal.
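As a minimal sketch of the conversion in step 102, the following computes a direct discrete Fourier transform of one frame using only the standard library. It is deliberately naive (quadratic cost); a practical implementation would presumably use an FFT, and the function name `dft` is chosen here only for illustration.

```python
import cmath
import math
from typing import List, Sequence


def dft(frame: Sequence[float]) -> List[complex]:
    """Direct DFT: one complex value (real part R[k], imaginary part I[k]) per bin k."""
    n = len(frame)
    spectrum = []
    for k in range(n):
        acc = 0j
        for t, sample in enumerate(frame):
            # Correlate the frame with a complex exponential at bin k.
            acc += sample * cmath.exp(-2j * math.pi * k * t / n)
        spectrum.append(acc)
    return spectrum
```

For a real-valued frame of N samples taken at sampling rate fs, bin k corresponds to the frequency k × fs / N, so the spacing fs / N plays the role of the frequency resolution.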

In step 103, the CPU 31 performs the audio signal analysis processing described later, which calculates the degree of importance of the high frequency region. In step 104, as described later, the CPU 31 calculates the number of bands in the high frequency region so that it decreases as the degree of importance of the high frequency region decreases and increases as the degree of importance increases.

In step 105, as described later, the CPU 31 calculates the second bandwidth WB2 by dividing the bandwidth of the entire high frequency region by the number of high frequency bands calculated in step 104. The CPU 31 also merges bands of the first bandwidth WB1 in the high frequency region to generate bands of the second bandwidth WB2, as described later. That is, the average of the audio signals of the first-bandwidth bands that fall within each second-bandwidth band is taken as the audio signal of that second-bandwidth band, so that the high frequency region is divided into bands of the second bandwidth WB2.
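The merging in step 105 can be sketched as follows: the high frequency bins are grouped into the requested number of bands and each group is replaced by its average. The function name is hypothetical, and the way leftover bins are spread when the bin count is not an exact multiple of the band count is an assumption of this sketch, not something taken from the embodiment.

```python
from typing import List, Sequence, Tuple


def merge_high_bands(spectrum: Sequence[complex], hs: int,
                     num_high_bands: int) -> Tuple[List[complex], List[complex]]:
    """Return (low bins unchanged, averaged high bands).

    spectrum: per-bin values at the first bandwidth WB1.
    hs: index of the first bin of the high frequency region.
    num_high_bands: number of second-bandwidth bands to produce.
    """
    low = list(spectrum[:hs])
    high = list(spectrum[hs:])
    n = len(high)
    if not 1 <= num_high_bands <= n:
        raise ValueError("band count must be between 1 and the number of high bins")
    merged = []
    for b in range(num_high_bands):
        # Integer boundaries spread any remainder across the bands (assumption).
        start = b * n // num_high_bands
        end = (b + 1) * n // num_high_bands
        group = high[start:end]
        merged.append(sum(group) / len(group))
    return low, merged
```

For example, eight bins with `hs = 4` and two high bands leave the first four bins untouched and average the remaining four in pairs.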

In step 106, as described later, the CPU 31 executes the audio signal adjustment processing on each of the bands of the low frequency region divided with the first bandwidth WB1 and each of the bands of the high frequency region divided with the second bandwidth WB2. In step 107, the CPU 31 converts the audio signal from the frequency domain representation to the time domain representation using, for example, an inverse Fourier transform. In step 108, the CPU 31 outputs the audio signal. The audio signal may be converted into sound and output from the speaker 36, or it may be output to the communication unit 37 and transmitted to the other party's information terminal via a wired or wireless communication line.

In step 109, the CPU 31 determines whether an unprocessed audio signal remains. If, for example, the user has turned off the power of the audio signal processing device 10 and the CPU 31 determines that no unprocessed audio signal remains, the CPU 31 ends the audio signal processing. If the CPU 31 determines in step 109 that an unprocessed audio signal remains, it returns to step 101.
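The per-frame loop of steps 101 to 109 can be outlined as below. This is a structural sketch only: every stage is passed in as a callable with a hypothetical name, and the sketch simplifies the flow by handing the adjusted bands directly to the inverse transform.

```python
from typing import Any, Callable, Iterable, Iterator, Sequence


def process_frames(
    frames: Iterable[Sequence[float]],
    to_frequency: Callable[[Sequence[float]], Any],
    analyze: Callable[[Any], float],
    count_high_bands: Callable[[float], int],
    merge_bands: Callable[[Any, int], Any],
    adjust: Callable[[Any], Any],
    to_time: Callable[[Any], Any],
) -> Iterator[Any]:
    """Yield one processed frame per input frame."""
    for frame in frames:                          # step 101 (loop ends at step 109)
        spectrum = to_frequency(frame)            # step 102
        importance = analyze(spectrum)            # step 103
        n_high = count_high_bands(importance)     # step 104
        bands = merge_bands(spectrum, n_high)     # step 105
        adjusted = adjust(bands)                  # step 106
        yield to_time(adjusted)                   # steps 107 and 108
```

Running out of input frames stands in for the check in step 109 that no unprocessed audio signal remains.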

FIG. 5 illustrates the details of the audio signal analysis processing in step 103. The present embodiment describes an example in which the importance of the high frequency region of the audio signal is raised as the ratio of the power of the audio signal in the high frequency region to its power in the low frequency region increases.

In step 121, the CPU 31 calculates the power of the audio signal in the low frequency region. First, the entire audio signal is assumed to have been divided in step 102 of FIG. 4 into bands of the first bandwidth WB1, which corresponds to the frequency resolution obtained when the audio signal was converted from the time domain representation to the frequency domain representation, and an index i is assigned to each band. For example, when the maximum frequency of the audio signal is 32000 Hz and the first bandwidth WB1 is 31.25 Hz, there are 1024 bands (= 32000 Hz / 31.25 Hz) and the index runs from 0 to 1023.

次に、高周波数領域の下限周波数である境界周波数に対応する帯域のインデックスである高周波数領域の下限インデックスＨＳを定める。例えば、境界周波数を８０３１．２５Ｈｚに設定する場合、高周波数領域の下限インデックスＨＳは、２５７（＝８０３１．２５Ｈｚ／３１．２５Ｈｚ）である。 Next, a lower limit index HS of the high frequency region that is an index of a band corresponding to the boundary frequency that is the lower limit frequency of the high frequency region is determined. For example, when the boundary frequency is set to 8031.25 Hz, the lower limit index HS in the high frequency region is 257 (= 8031.25 Hz / 31.25 Hz).
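The index arithmetic above can be sketched as follows. This is only an illustration: the function names `band_index` and `band_frequency` are hypothetical, and the default bandwidth of 31.25 Hz is the example value of WB1 from the text.

```python
def band_index(freq_hz, wb1_hz=31.25):
    """Index of the band whose lower edge is freq_hz, for bands of width WB1."""
    return int(round(freq_hz / wb1_hz))


def band_frequency(index, wb1_hz=31.25):
    """Lower-edge frequency in Hz of the band with the given index."""
    return index * wb1_hz


# Boundary frequency 8031.25 Hz gives the lower limit index HS of the high region.
HS = band_index(8031.25)
```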

As illustrated in Expression (1), the CPU 31 calculates the power LP of the audio signal in the low frequency region by summing the power P[i] of the audio signal in each band from the lower limit index LS to the upper limit index LE (= HS − 1) of the low frequency region.
$$\mathrm{LP} = \sum_{i=\mathrm{LS}}^{\mathrm{LE}} P[i] \qquad (1)$$

As illustrated in Expression (2), the power P[i] of the audio signal in the band corresponding to index i is calculated by adding the square of the real part R[i] and the square of the imaginary part I[i] of the audio signal in that band.
$$P[i] = R[i]^2 + I[i]^2 \qquad (2)$$
For example, the lower limit index LS of the low frequency region may be 3 (93.75 Hz = 31.25 Hz × 3), and the upper limit index LE may be 256 (8000 Hz = 31.25 Hz × 256).

In step 122, the CPU 31 calculates the power HP of the audio signal in the high frequency region. As illustrated in Expression (3), HP is calculated by summing the power P[i] of the audio signal in each band from the lower limit index HS to the upper limit index HE of the high frequency region.
$$\mathrm{HP} = \sum_{i=\mathrm{HS}}^{\mathrm{HE}} P[i] \qquad (3)$$
For example, the lower limit index HS of the high frequency region may be 257 (8031.25 Hz = 31.25 Hz × 257), and the upper limit index HE of the high frequency region may be 1023 (31968.75 Hz = 31.25 Hz × 1023).

In step 123, the CPU 31 calculates the ratio Hrt of the power HP of the audio signal in the high frequency region to the power LP of the audio signal in the low frequency region. As illustrated in Expression (4), Hrt can be calculated by subtracting the logarithm of LP from the logarithm of HP.
$$\mathrm{Hrt} = 10\log_{10}\mathrm{HP} - 10\log_{10}\mathrm{LP} \qquad (4)$$
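Steps 121 to 123 can be sketched as follows, assuming the spectrum is available as two lists `R` and `I` of real and imaginary parts indexed by band. The function names and the small constant `eps`, which only guards against taking the logarithm of zero, are assumptions for illustration and do not appear in the description.

```python
import math


def band_power(re, im):
    """Expression (2): P[i] = R[i]^2 + I[i]^2."""
    return re * re + im * im


def power_ratio_db(R, I, LS, LE, HS, HE, eps=1e-12):
    """Ratio Hrt in dB of the high-region power HP to the low-region power LP."""
    # Expression (1): sum of band powers over the low frequency region.
    LP = sum(band_power(R[i], I[i]) for i in range(LS, LE + 1))
    # Expression (3): sum of band powers over the high frequency region.
    HP = sum(band_power(R[i], I[i]) for i in range(HS, HE + 1))
    # Expression (4): difference of the two powers on a decibel scale.
    return 10.0 * math.log10(HP + eps) - 10.0 * math.log10(LP + eps)
```

A strongly negative result means that the high frequency region carries little power relative to the low frequency region.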

次に、図４のステップ１０４の詳細について説明する。ステップ１０４では、ステップ１０３で算出した低周波数領域の音声信号のパワーに対する高周波数領域の音声信号のパワーの比率Ｈｒｔに基づいて、図７に例示する高周波数領域の帯域数Ｈｎｍを算出する。比率Ｈｒｔが大きくなるにしたがって、高周波数領域の重要度は高くなる。したがって、比率Ｈｒｔが大きくなるにしたがって、帯域数Ｈｎｍが大きくなるように設定する。即ち、比率Ｈｒｔが大きくなるにしたがって、高周波数領域の帯域の各々の帯域幅である第２帯域幅ＷＢ２は狭くなり、第１帯域幅ＷＢ１に近付く。第２帯域幅ＷＢ２については後述する。 Next, details of step 104 in FIG. 4 will be described. In step 104, the number of bands Hnm in the high frequency region illustrated in FIG. 7 is calculated based on the ratio Hrt of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region calculated in step 103. As the ratio Hrt increases, the importance of the high frequency region increases. Therefore, the band number Hnm is set to increase as the ratio Hrt increases. That is, as the ratio Hrt increases, the second bandwidth WB2 that is the bandwidth of each band in the high frequency region becomes narrower and approaches the first bandwidth WB1. The second bandwidth WB2 will be described later.

Specifically, for example, the number of bands Hnm in the high frequency region is obtained from the ratio Hrt using Expressions (5) to (7). FIG. 6 illustrates the relationship between the ratio Hrt and the number of bands Hnm given by Expressions (5) to (7). In FIG. 6, the horizontal axis represents the ratio Hrt, and the vertical axis represents the number of bands Hnm in the high frequency region.
$$\mathrm{Hnm} = \mathrm{Hnmn} \quad \text{if } \mathrm{Hrt} < \mathrm{HrtL} \qquad (5)$$
$$\mathrm{Hnm} = \mathrm{Hnmn} + \frac{\mathrm{Hnmx} - \mathrm{Hnmn}}{\mathrm{HrtH} - \mathrm{HrtL}} \times (\mathrm{Hrt} - \mathrm{HrtL}) \quad \text{if } \mathrm{HrtL} \le \mathrm{Hrt} < \mathrm{HrtH} \qquad (6)$$
$$\mathrm{Hnm} = \mathrm{Hnmx} \quad \text{if } \mathrm{Hrt} \ge \mathrm{HrtH} \qquad (7)$$

例えば、併合前の高周波数領域の帯域数が２５６（＝ＨＥ−ＨＳ＋１）である場合、Ｈｎｍｘ＝２５６、Ｈｎｍｎ＝１、ＨｒｔＨ＝−１０［ｄＢ］、ＨｒｔＬ＝−５０［ｄＢ］であってよい。 For example, when the number of bands in the high frequency region before merging is 256 (= HE−HS + 1), Hnmx = 256, Hnmn = 1, HrtH = −10 [dB], and HrtL = −50 [dB]. .
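The piecewise-linear mapping of Expressions (5) to (7) might be written as below. The function name `band_count` is hypothetical, and the defaults are the example values Hnmn = 1, Hnmx = 256, HrtL = −50 dB, and HrtH = −10 dB. The description does not state how a fractional Hnm is converted to an integer, so this sketch returns the unrounded value.

```python
def band_count(hrt, hnm_min=1, hnm_max=256, hrt_low=-50.0, hrt_high=-10.0):
    """Number of high-region bands Hnm as a function of the ratio Hrt (dB)."""
    if hrt < hrt_low:        # Expression (5): clamp to the minimum
        return float(hnm_min)
    if hrt >= hrt_high:      # Expression (7): clamp to the maximum
        return float(hnm_max)
    # Expression (6): linear interpolation between the two limits.
    slope = (hnm_max - hnm_min) / (hrt_high - hrt_low)
    return hnm_min + slope * (hrt - hrt_low)
```

With these defaults, a ratio halfway between the two thresholds (−30 dB) yields 128.5 bands before rounding.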

次に、図４のステップ１０５の帯域併合処理の詳細について説明する。ステップ１０５の帯域併合処理では、図８Ａ及び図８Ｂに例示するように、高周波数領域の音声信号を、ステップ１０４で算出した高周波数領域の帯域数Ｈｎｍの帯域に分割するため、第１帯域幅ＷＢ１で分割された帯域を併合帯域数Ｎ毎に併合する。併合帯域数Ｎは、高周波数領域の重要度の高さに基づいて決定される係数の一例である。 Next, details of the band merging process in step 105 of FIG. 4 will be described. In the band merging process in step 105, as illustrated in FIG. 8A and FIG. 8B, the first bandwidth is used to divide the audio signal in the high frequency region into the band of the number of high frequency regions Hnm calculated in step 104. The bands divided by WB1 are merged every merged band number N. The number N of merged bands is an example of a coefficient that is determined based on the importance of the high frequency region.

Specifically, in step 131 of FIG. 9, the CPU 31 calculates the number N of merged bands. N is determined so that it becomes smaller as the importance of the high frequency region becomes higher, with a minimum value of 1. As illustrated in Expression (8), N is calculated by dividing the number of indices in the high frequency region, that is, the upper limit index minus the lower limit index plus 1, by the number of bands Hnm.
$$N = (\mathrm{HE} - \mathrm{HS} + 1) / \mathrm{Hnm} \qquad (8)$$
N is converted to an integer by rounding to the nearest integer, rounding up, or rounding down.

即ち、高周波数領域は、第２帯域幅ＷＢ２（＝第１帯域幅ＷＢ１×併合帯域数Ｎ）で、帯域数Ｈｎｍの帯域に分割される。次に、併合前のＮ個の帯域の音声信号の平均値を対応する併合後の帯域の音声信号として設定する。 That is, the high frequency region is divided into bands of the number of bands Hnm by the second bandwidth WB2 (= first bandwidth WB1 × number of merged bands N). Next, the average value of the audio signals in N bands before merging is set as the corresponding audio signal in the band after merging.

In step 132, the CPU 31 sets the variable j, which counts the bands after merging, to 0. In step 133, the CPU 31 adds 1 to j. In step 134, the CPU 31 sets the variable k, which counts the bands being merged, to 0. In step 135, the CPU 31 calculates the index m of the first band among the bands to be merged. As illustrated in Expression (9), m is calculated by adding the lower limit index HS of the high frequency region to the product of (j − 1) and the number N of merged bands.
$$m = \mathrm{HS} + (j - 1) \times N \qquad (9)$$

ＣＰＵ３１は、ステップ１３６で、Ｎ個分の併合前の帯域の音声信号の実部の累積を記憶する変数ｔＲ及びＮ個分の併合前の帯域の音声信号の虚部の累積を記憶する変数ｔＩに０を設定する。ＣＰＵ３１は、ステップ１３７で、変数ｋに１を加算する。ＣＰＵ３１は、ステップ１３８で、インデックスｍ＋ｋ−１に対応する帯域の音声信号の実部Ｒ［ｍ＋ｋ−１］を変数ｔＲに加算し、インデックスｍ＋ｋ−１に対応する帯域の音声信号の虚部Ｉ［ｍ＋ｋ−１］を変数ｔＩに加算する。 In step 136, the CPU 31 stores a variable tR that stores the accumulation of the real part of the audio signal in the band before N merges and a variable tI that stores the accumulation of the imaginary part of the audio signal in the band before N merges. Set to 0. In step 137, the CPU 31 adds 1 to the variable k. In step 138, the CPU 31 adds the real part R [m + k−1] of the audio signal in the band corresponding to the index m + k−1 to the variable tR, and the imaginary part I [ m + k−1] is added to the variable tI.

ＣＰＵ３１は、ステップ１３９で、変数ｋが併合帯域数Ｎより小さく、かつ、インデックスｍに変数ｋを加算した値が高周波数領域の上限インデックスより小さいか否か判定する。判定が肯定された場合、即ち、併合帯域数分の帯域がまだ併合されておらず、かつ、未処理のインデックスに対応する帯域がまだ存在する場合、ＣＰＵ３１は、ステップ１３７に戻る。一方、ステップ１３９の判定が否定された場合、即ち、併合帯域数分の帯域が併合されたか、または、未処理のインデックスに対応する帯域が存在しなくなった場合、ＣＰＵ３１はステップ１４０に進む。 In step 139, the CPU 31 determines whether or not the variable k is smaller than the merged band number N and the value obtained by adding the variable k to the index m is smaller than the upper limit index in the high frequency region. If the determination is affirmative, that is, if the bands corresponding to the number of merged bands have not yet been merged and there is still a band corresponding to an unprocessed index, the CPU 31 returns to step 137. On the other hand, if the determination in step 139 is negative, that is, if the bands corresponding to the number of merged bands have been merged, or if there is no band corresponding to the unprocessed index, the CPU 31 proceeds to step 140.

In step 140, as illustrated in Expression (10), the CPU 31 divides the real-part values of the audio signal accumulated in the variable tR by the number N of merged bands to obtain their average, and stores the average in mR[LE + j].
$$mR[\mathrm{LE} + j] = tR / N \qquad (10)$$

Likewise, as illustrated in Expression (11), the CPU 31 divides the imaginary-part values of the audio signal accumulated in the variable tI by the number N of merged bands to obtain their average, and stores the average in mI[LE + j].
$$mI[\mathrm{LE} + j] = tI / N \qquad (11)$$

ＣＰＵ３１は、ステップ１４１で、変数ｊが高周波数領域の帯域数Ｈｎｍを越えたか否か判定し、判定が否定された場合、即ち、まだ併合されていない帯域が高周波数領域に存在する場合、ＣＰＵ３１は、ステップ１３３に戻る。一方、判定が肯定された場合、即ち、併合されていない帯域が高周波数領域に存在しない場合、ＣＰＵ３１は、帯域併合処理を終了する。 In step 141, the CPU 31 determines whether or not the variable j has exceeded the number of bands Hnm in the high frequency region. If the determination is negative, that is, if a band that has not yet been merged exists in the high frequency region, the CPU 31. Returns to step 133. On the other hand, if the determination is affirmative, that is, if a band that has not been merged does not exist in the high frequency region, the CPU 31 ends the band merge process.

When the number N of merged bands in the high frequency region is 1, instead of performing steps 132 to 141 of FIG. 9, the real part R[p] of the audio signal may be stored in mR[p] and the imaginary part I[p] in mI[p], as illustrated in Expressions (12) and (13). Here p corresponds to the index of a band after merging; when N is 1, p increases by 1 from the lower limit index HS to the upper limit index HE of the high frequency region.
$$mR[p] = R[p] \qquad (12)$$
$$mI[p] = I[p] \qquad (13)$$
In the low frequency region, where band merging is not performed, the real part R[p] of the audio signal is likewise stored in mR[p] and the imaginary part I[p] in mI[p]. In the low frequency region, p increases by 1 from the lower limit index LS to the upper limit index LE.
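The band merging of steps 131 to 141 can be sketched as follows. This is a simplified illustration rather than a transcription of the flowchart: it returns the merged high-region values as new lists instead of writing them to mR[LE + j] and mI[LE + j], it rounds N up (one of the three rounding options mentioned), and the function names are hypothetical. As in Expressions (10) and (11), the accumulated sum is divided by N even when the last group contains fewer than N bands.

```python
import math


def merged_band_width(HS, HE, hnm):
    """Expression (8), rounded up and kept at 1 or more."""
    return max(1, math.ceil((HE - HS + 1) / hnm))


def merge_high_bands(R, I, HS, HE, N):
    """Average every N consecutive high-region bands into one merged band."""
    mR, mI = [], []
    m = HS                                # first index of the current group
    while m <= HE:
        tR = tI = 0.0                     # accumulators for real and imaginary parts
        k = 0
        while k < N and m + k <= HE:      # stop at N bands or at the last index
            tR += R[m + k]
            tI += I[m + k]
            k += 1
        mR.append(tR / N)                 # Expression (10)
        mI.append(tI / N)                 # Expression (11)
        m += N
    return mR, mI
```

With N = 1 the function returns the high-region values unchanged, which matches Expressions (12) and (13).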

次に、図４のステップ１０６の音声信号調整処理について説明する。図１０にステップ１０６の音声信号調整処理の詳細を例示する。音声信号調整処理では、高周波数領域の帯域を併合した後の帯域毎にゲインを算出し、併合前の帯域にゲインを配分し、併合前の帯域毎にゲインを適用することで、調整された音声信号を取得する。ＣＰＵ３１は、ステップ１５１で、変数ｐに０を設定する。 Next, the audio signal adjustment process in step 106 of FIG. 4 will be described. FIG. 10 illustrates details of the audio signal adjustment processing in step 106. In the audio signal adjustment processing, the gain is calculated for each band after merging the bands in the high frequency region, the gain is distributed to the band before merging, and the gain is applied to each band before merging. Acquire an audio signal. In step 151, the CPU 31 sets 0 to the variable p.

ＣＰＵ３１は、ステップ１５２で、変数ｐに１を加算し、ステップ１５３で、併合後の帯域の音声信号ｍＲ［ｐ］及びｍＩ［ｐ］に、既知の手法を適用して、併合後の帯域毎の騒音抑圧ゲインＧを算出する。ＣＰＵ３１は、ステップ１５４で、併合後の帯域毎の騒音抑圧ゲインＧを対応する併合前のＮ個の帯域の各々に分配する。 In step 152, the CPU 31 adds 1 to the variable p. In step 153, the CPU 31 applies a known technique to the audio signals mR [p] and mI [p] of the merged bands, and sets each band after the merge. The noise suppression gain G is calculated. In step 154, the CPU 31 distributes the noise suppression gain G for each band after merging to each of the corresponding N bands before merging.

図１１Ａに例示するように、併合後の帯域の騒音抑圧ゲインＧがｇである場合、図１１Ｂに例示するように、対応する併合前のＮ個の帯域の騒音抑圧ゲインＧはｇに設定される。ＣＰＵ３１は、ステップ１５５で、併合前の帯域毎の音声信号に騒音抑圧ゲインＧを適用することで、騒音を抑圧した音声信号を算出する。ＣＰＵ３１は、ステップ１５６で、ｐが低域周波数領域の帯域数Ｌｎｍ（＝ＬＥ−ＬＳ＋１）と高域周波数領域の併合後の帯域数Ｈｎｍとの和、即ち、併合後の全帯域数より小さいか否か判定する。ステップ１５６の判定が肯定された場合、即ち、併合後の帯域の全てについて処理が終了していない場合、ＣＰＵ３１は、ステップ１５２に戻る。一方、ステップ１５６の判定が否定された場合、即ち、併合後の帯域の全てについて処理が終了した場合、ＣＰＵ３１は、音声信号調整処理を終了する。 As illustrated in FIG. 11A, when the noise suppression gain G in the band after merging is g, as shown in FIG. 11B, the corresponding noise suppression gains G in the N bands before merging are set to g. The In step 155, the CPU 31 calculates a sound signal in which noise is suppressed by applying the noise suppression gain G to the sound signal for each band before merging. In step 156, the CPU 31 determines whether p is less than the sum of the number of bands Lnm (= LE-LS + 1) in the low frequency range and the number of bands Hnm after merging in the high frequency range, ie, the total number of bands after merging. Judge whether or not. If the determination in step 156 is affirmative, that is, if the processing has not been completed for all the bands after merging, the CPU 31 returns to step 152. On the other hand, when the determination in step 156 is negative, that is, when the processing is completed for all the bands after merging, the CPU 31 ends the audio signal adjustment processing.
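The distribution of a merged-band gain to its N pre-merge bands (steps 154 and 155) might look like the sketch below. The gains themselves are taken as an input because the description only says that a known technique computes them. The function name `apply_merged_gains` and the list-based interface are assumptions, and only the high frequency region is handled here.

```python
def apply_merged_gains(R, I, HS, HE, N, gains):
    """Apply each merged-band gain to the N pre-merge bands it was computed from."""
    outR, outI = list(R), list(I)
    for j, g in enumerate(gains):
        start = HS + j * N                          # first pre-merge index of group j
        for i in range(start, min(start + N, HE + 1)):
            outR[i] = g * R[i]                      # same gain g for every band in the group
            outI[i] = g * I[i]
    return outR, outI
```

The gain is computed once per merged band but applied at the original resolution, so the number of bands in the output spectrum is unchanged.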

なお、音声信号調整処理の一例として、騒音抑圧処理を行う例を使用したが、本実施形態はこれに限定されない。例えば、エコー抑圧処理、または音声強調処理などが行われてもよい。 In addition, although the example which performs a noise suppression process was used as an example of an audio | voice signal adjustment process, this embodiment is not limited to this. For example, echo suppression processing or speech enhancement processing may be performed.

When an audio signal with the bandwidth WBA1 illustrated in FIG. 12A is divided with the first bandwidth WB1, the number of bands is WBA1/WB1. When an audio signal with the bandwidth WBA2 illustrated in FIG. 12B is divided with the first bandwidth WB1, the number of bands is WBA2/WB1. That is, the audio signal with the bandwidth WBA2 has WBA2/WBA1 times as many bands as the audio signal with the bandwidth WBA1, and the number of bands increases as the bandwidth of the audio signal becomes wider.

音声信号の帯域幅、即ち、サンプリング周波数を増大することで、高音質化を実現することができる。しかしながら、上記したように、帯域数も増大し、騒音抑圧処理などの音声信号調整処理の負担が増大する。音声信号調整処理による負担を低減するためには、音声信号を分割する帯域幅を広くして、帯域数を低減すればよい。 Higher sound quality can be achieved by increasing the bandwidth of the audio signal, that is, the sampling frequency. However, as described above, the number of bands also increases, and the burden of audio signal adjustment processing such as noise suppression processing increases. In order to reduce the burden caused by the audio signal adjustment processing, the bandwidth for dividing the audio signal may be widened to reduce the number of bands.

しかしながら、一般的に、音声信号の低周波数領域は、音声の基本周波数などの特徴を含むため、分割する帯域幅を広くして帯域数を低減することは、音声信号処理後の音質を劣化させる原因となり得る。したがって、本実施形態では、図１２Ｃに例示するように、音声信号の高周波数領域を分割する第２帯域幅ＷＢ２を、低周波数領域を分割する第１帯域幅ＷＢ１より広くして、高周波数領域の帯域数を低減することで、音声信号全体として帯域数を低減する。 However, in general, since the low frequency region of an audio signal includes features such as the fundamental frequency of the audio, reducing the number of bands by widening the divided bandwidth deteriorates the sound quality after the audio signal processing. It can be a cause. Therefore, in this embodiment, as illustrated in FIG. 12C, the second bandwidth WB2 for dividing the high frequency region of the audio signal is made wider than the first bandwidth WB1 for dividing the low frequency region, so that the high frequency region is divided. By reducing the number of bands, the number of bands as a whole audio signal is reduced.

しかしながら、高周波数領域の重要度によっては、高周波数領域を分割する第２帯域幅ＷＢ２を広くして帯域数を低減することで、音声信号処理後の音質が劣化する虞もある。この問題に対処するため、音声信号の高周波数領域の重要度の高さに基づいて、高周波数領域を分割する第２帯域幅ＷＢ２を決定する。即ち、高周波数領域の重要度の高さが高くなるにしたがって狭くなるように、第１帯域幅ＷＢ１以上の帯域幅である第２帯域幅ＷＢ２を決定する。これにより、音声信号処理後の音質が劣化しないようにすることができる。 However, depending on the importance of the high frequency region, there is a possibility that the sound quality after the audio signal processing is deteriorated by widening the second bandwidth WB2 that divides the high frequency region to reduce the number of bands. In order to cope with this problem, the second bandwidth WB2 for dividing the high frequency region is determined based on the importance of the high frequency region of the audio signal. That is, the second bandwidth WB2 that is a bandwidth equal to or larger than the first bandwidth WB1 is determined so as to become narrower as the importance of the high frequency region becomes higher. Thereby, it is possible to prevent the sound quality after the sound signal processing from being deteriorated.

本実施形態では、第１帯域分割部が、時間領域表現から周波数領域表現に変換した音声信号の低周波数領域を第１帯域幅で複数の第１帯域に分割する。帯域幅決定部が、音声信号の低周波数領域の周波数より周波数が高い高周波数領域の重要度の高さに基づいて、高周波数領域を分割するための第１帯域幅以上の第２帯域幅を決定する。第２帯域分割部が、帯域幅決定部で決定された第２帯域幅で、音声信号の高周波数領域を複数の第２帯域に分割する。音声信号調整部が、複数の第１帯域の各々及び複数の第２帯域の各々に対して音声信号調整処理を実行する。 In the present embodiment, the first band dividing unit divides the low frequency region of the audio signal converted from the time domain representation into the frequency domain representation into a plurality of first bands with the first bandwidth. The bandwidth determination unit determines a second bandwidth equal to or higher than the first bandwidth for dividing the high frequency region based on the importance of the high frequency region having a frequency higher than the frequency of the low frequency region of the audio signal. decide. The second band dividing unit divides the high frequency region of the audio signal into a plurality of second bands with the second bandwidth determined by the bandwidth determining unit. The audio signal adjustment unit executes an audio signal adjustment process for each of the plurality of first bands and each of the plurality of second bands.

本実施形態では、広帯域の音声信号処理において、音質を劣化させず、かつ、音声信号処理による負担を低減することを可能とする。 In the present embodiment, it is possible to reduce the burden due to the audio signal processing without deteriorating the sound quality in the wideband audio signal processing.

[Second Embodiment]
Next, an example of the second embodiment will be described. Description of configurations and operations that are the same as in the first embodiment is omitted. The second embodiment differs from the first embodiment in that, in the audio signal analysis processing of step 103 of FIG. 4, the importance of the high frequency region of the audio signal is made higher as the non-stationarity of the power of the audio signal in the high frequency region becomes higher. The second embodiment also differs from the first embodiment in that, in step 104, the number of bands in the high frequency region is calculated based on the non-stationarity of the power of the audio signal.

図４のステップ１０３の第２実施形態における詳細を図１３に例示する。ＣＰＵ３１は、ステップ１６１で、第１帯域幅ＷＢ１で分割された帯域毎の高周波数領域のパワーＰ［ｈｉ］（ｈｉ＝ＨＳ，…，ＨＥ）を算出する。パワーＰ［ｈｉ］の算出については、上述したパワーＰ［ｉ］の算出と同様であるため、説明を省略する。ＣＰＵ３１は、ステップ１６２で、帯域毎の高周波数領域の平均パワーＰａｖ［ｈｉ］を更新する。 Details of the second embodiment of step 103 in FIG. 4 are illustrated in FIG. In step 161, the CPU 31 calculates the power P [hi] (hi = HS,..., HE) in the high frequency region for each band divided by the first bandwidth WB1. Since the calculation of the power P [hi] is the same as the calculation of the power P [i] described above, the description thereof is omitted. In step 162, the CPU 31 updates the average power Pav [hi] in the high frequency region for each band.

As illustrated in Expression (14), the average power Pav[hi] can be obtained by adding two terms: the average power PavB[hi] up to the previous frame of the audio signal in the band corresponding to index hi, multiplied by 1 minus the contribution coefficient c1 of the current frame; and the power P[hi] of the audio signal in that band, multiplied by c1.
$$Pav[hi] = (1 - c1) \times PavB[hi] + c1 \times P[hi] \qquad (14)$$

寄与係数ｃ１は、０〜１の値であり、例えば、０．０１であってよい。また、最初のフレームについて平均パワーＰａｖ［ｈｉ］を計算する場合の、１つ前のフレームの平均パワーＰａｖＢ［ｈｉ］は０［ｄＢ］としてもよい。 The contribution coefficient c1 is a value between 0 and 1, for example, 0.01. Further, when the average power Pav [hi] is calculated for the first frame, the average power PavB [hi] of the previous frame may be set to 0 [dB].

In step 163, the CPU 31 calculates the non-stationarity Hst of the power in the high frequency region, which can be calculated as illustrated in Expression (15). In Expression (15), the absolute value of the power P[hi] minus the average power Pav[hi] is first summed from the lower limit index HS to the upper limit index HE of the high frequency region. This sum is then divided by the number of indices in the high frequency region, that is, the upper limit index HE minus the lower limit index HS plus 1. The logarithm of the result is the non-stationarity Hst.
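Steps 162 and 163 can be sketched as follows, with `p` and `pav` holding only the high-region values (indices HS to HE). The function names are hypothetical. The 10·log10 scaling is an assumption made here because the thresholds for Hst are given in dB; the description itself only says "logarithm". The constant `eps` is likewise an assumption that guards against the logarithm of zero.

```python
import math


def update_average_power(pav_prev, p, c1=0.01):
    """Expression (14): per-band exponential moving average of the power."""
    return [(1.0 - c1) * a + c1 * x for a, x in zip(pav_prev, p)]


def nonstationarity_db(p, pav, eps=1e-12):
    """Log of the mean absolute deviation of the power from its running average."""
    mean_dev = sum(abs(x - a) for x, a in zip(p, pav)) / len(p)
    return 10.0 * math.log10(mean_dev + eps)   # dB scaling is an assumption
```

A small c1 such as the example value 0.01 makes the running average change slowly, so short bursts of power in the high frequency region show up as a large deviation.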

次に、図４のステップ１０４の詳細について説明する。本実施形態のステップ１０４では、ステップ１０３で算出した高周波数領域のパワーの非定常性Ｈｓｔに基づいて、図１４に例示する高周波数領域の帯域数Ｈｎｍを算出する。非定常性Ｈｓｔが高くなるにしたがって、高周波数領域の重要度は高くなる。したがって、非定常性Ｈｓｔが高くなるにしたがって、帯域数Ｈｎｍが大きくなるように設定する。即ち、非定常性Ｈｓｔが高くなるにしたがって、高周波数領域の帯域の各々の帯域幅である第２帯域幅ＷＢ２は狭くなる。 Next, details of step 104 in FIG. 4 will be described. In step 104 of the present embodiment, the number of bands Hnm in the high frequency region illustrated in FIG. 14 is calculated based on the power nonstationarity Hst in the high frequency region calculated in step 103. As the nonstationary Hst increases, the importance of the high frequency region increases. Therefore, the band number Hnm is set to increase as the non-stationary property Hst increases. That is, as the non-stationary property Hst increases, the second bandwidth WB2 that is the bandwidth of each band in the high frequency region becomes narrower.

ステップ１０４では、ステップ１０３で算出した高周波数領域のパワーの非定常性Ｈｓｔに基づいて、高周波数領域の帯域数Ｈｎｍを算出する。詳細には、例えば、式（１６）〜式（１８）を使用して、高周波数領域の帯域数Ｈｎｍを取得する。式（１６）〜式（１８）の高周波数領域の音声信号のパワーの非定常性Ｈｓｔと高周波数領域の帯域数Ｈｎｍの関係を図１４に例示する。 In step 104, the number of bands Hnm in the high frequency region is calculated based on the non-stationarity Hst of the power in the high frequency region calculated in step 103. Specifically, for example, the number of bands Hnm in the high frequency region is acquired using Expressions (16) to (18). FIG. 14 illustrates the relationship between the non-stationary power Hst of the audio signal in the high frequency region and the number of bands Hnm in the high frequency region in Expression (16) to Expression (18).

In FIG. 14, the horizontal axis represents the non-stationarity Hst of the power of the audio signal in the high frequency region, and the vertical axis represents the number of bands Hnm in the high frequency region.
$$\mathrm{Hnm} = \mathrm{Hnmn} \quad \text{if } \mathrm{Hst} < \mathrm{HstL} \qquad (16)$$
$$\mathrm{Hnm} = \mathrm{Hnmn} + \frac{\mathrm{Hnmx} - \mathrm{Hnmn}}{\mathrm{HstH} - \mathrm{HstL}} \times (\mathrm{Hst} - \mathrm{HstL}) \quad \text{if } \mathrm{HstL} \le \mathrm{Hst} < \mathrm{HstH} \qquad (17)$$
$$\mathrm{Hnm} = \mathrm{Hnmx} \quad \text{if } \mathrm{Hst} \ge \mathrm{HstH} \qquad (18)$$

例えば、併合前の高周波数領域の帯域数が２５６（＝ＨＥ−ＨＳ＋１）である場合、Ｈｎｍｘ＝２５６、Ｈｎｍｎ＝１、ＨｓｔＨ＝６［ｄＢ］、ＨｓｔＬ＝１［ｄＢ］であってよい。 For example, when the number of bands in the high frequency region before merging is 256 (= HE−HS + 1), Hnmx = 256, Hnmn = 1, HstH = 6 [dB], and HstL = 1 [dB].

[Third Embodiment]
Next, an example of the third embodiment will be described. Description of configurations and operations that are the same as in the first or second embodiment is omitted. The third embodiment differs from the first and second embodiments in that, in the audio signal analysis processing of step 103 of FIG. 4, the importance of the high frequency region of the audio signal is made higher as the fundamental frequency of the audio signal becomes higher. The third embodiment also differs from the first and second embodiments in that, in step 104, the number of bands in the high frequency region is calculated based on the fundamental frequency of the audio signal.

図４のステップ１０３の本実施形態における詳細を図１５に例示する。ＣＰＵ３１は、ステップ１７１で後述する音声有無判定処理を実行する。ステップ１７２で、ステップ１７１の音声有無判定処理の結果に基づいて、音声の有無を判定する。ステップ１７２の判定が否定された場合、即ち、音声信号がユーザの発話による音声を含まないと判定された場合、音声信号分析処理を終了する。音声信号がユーザの発話による音声を含まない、即ち、雑音であれば、基本周波数を算出する必要はないためである。 Details of step 103 of FIG. 4 in this embodiment are illustrated in FIG. In step 171, the CPU 31 executes a voice presence / absence determination process described later. In step 172, the presence or absence of sound is determined based on the result of the sound presence / absence determination process in step 171. If the determination in step 172 is negative, that is, if it is determined that the voice signal does not include the voice of the user's utterance, the voice signal analysis process ends. This is because it is not necessary to calculate the fundamental frequency if the voice signal does not include the voice of the user's speech, that is, if it is noise.

If the determination in step 172 is affirmative, that is, if the audio signal is determined to include speech uttered by the user, the CPU 31 calculates the fundamental frequency B in step 173 by executing the fundamental frequency calculation processing described later. In step 174, the CPU 31 updates the average Bav of the fundamental frequency. As illustrated in Expression (19), Bav can be updated with the sum of two terms: the average BavB of the fundamental frequency up to the previous frame, multiplied by 1 minus the contribution coefficient c2; and the fundamental frequency B of the current frame, multiplied by c2.
$$Bav = (1 - c2) \times BavB + c2 \times B \qquad (19)$$

When the average Bav of the fundamental frequency is updated for the first time, the average BavB of the fundamental frequency up to the previous frame may be 300 [Hz]. The contribution coefficient c2 represents the contribution of the fundamental frequency of the current frame to the average Bav; it takes a value from 0 to 1 and may be, for example, 0.01.

FIG. 16 illustrates details of the voice presence/absence determination processing in step 171 of FIG. 15. In step 181, the CPU 31 calculates the power PA of the audio signal. As illustrated in Expression (20), PA is the sum of the power P[i] of the audio signal in the band corresponding to index i, taken from index 0 to index HE, that is, the upper limit index of the high frequency region.

In step 182, the CPU 31 calculates the provisional average tNav of the noise power. As illustrated in Expression (21), tNav can be calculated by adding two terms: the average NavB of the noise up to the previous frame, multiplied by 1 minus the contribution coefficient c3; and the power PA of the audio signal, multiplied by c3.
$$tNav = (1 - c3) \times NavB + c3 \times PA \qquad (21)$$
The contribution coefficient c3 represents the contribution of the audio signal of the current frame to the provisional average tNav of the noise power; it takes a value from 0 to 1 and may be, for example, 0.01. When tNav is calculated before any frame has been determined not to include speech uttered by the user, NavB may be 40 [dB].

ＣＰＵ３１は、ステップ１８３で、音声信号のパワーＰＡとノイズのパワー仮平均ｔＮａｖとの差が閾値Ｔｈ１を越えるか否か判定する。Ｔｈ１は、例えば、６［ｄＢ］であってよい。ステップ１８３の判定が肯定された場合、ＣＰＵ３１は、ステップ１８４で、フラグＶＦに音声信号が発話による音声を含むことを表す値１を設定し、音声有無判定処理を終了する。ステップ１８３の判定は、音声信号のパワーＰＡとノイズのパワー仮平均ｔＮａｖとの差が閾値Ｔｈ１を越えて、音声信号がユーザの発話による音声を含むと判定された場合、肯定される。 In step 183, the CPU 31 determines whether or not the difference between the audio signal power PA and the noise power temporary average tNav exceeds a threshold Th1. Th1 may be 6 [dB], for example. When the determination in step 183 is affirmed, in step 184, the CPU 31 sets the flag VF to a value 1 indicating that the voice signal includes voice due to speech, and ends the voice presence / absence determination process. The determination in step 183 is affirmed when the difference between the power PA of the audio signal and the temporary power average tNav of the noise exceeds the threshold value Th1 and it is determined that the audio signal includes the voice of the user's utterance.

The determination in step 183 is negative when the difference between the audio signal power PA and the temporary noise power average tNav is equal to or less than the threshold Th1, that is, when the audio signal is determined not to include the user's speech. In that case, in step 185, the CPU 31 sets the flag VF to 0, indicating that the audio signal does not include the user's speech. In step 186, the CPU 31 sets the noise power average Nav to the temporary noise power average tNav calculated in step 182 and ends the voice presence determination process. Nav is updated here because the current frame is a noise frame that contains no speech from the user.
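
The voice presence determination in steps 182 to 186 can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the implementation of the embodiment: the function name `voice_presence` and its parameter names are hypothetical, all powers are assumed to be in dB, and the defaults reuse the example values from the text (c3 = 0.01, Th1 = 6 dB, initial NavB = 40 dB). Leaving Nav unchanged for speech frames is also an assumption, since the text only describes updating Nav in step 186.

```python
def voice_presence(pa_db, nav_prev_db=40.0, c3=0.01, th1_db=6.0):
    """Return (vf, nav) for one frame; names and defaults are illustrative.

    pa_db       -- audio signal power PA of the current frame [dB]
    nav_prev_db -- noise power average NavB up to the previous frame [dB]
    """
    # Step 182: temporary noise power average, Expression (21).
    t_nav = (1.0 - c3) * nav_prev_db + c3 * pa_db

    # Step 183: compare PA - tNav with the threshold Th1.
    if pa_db - t_nav > th1_db:
        # Step 184: speech present. Nav is kept as is (assumption).
        return 1, nav_prev_db

    # Steps 185 and 186: no speech, so the frame updates the noise average.
    return 0, t_nav
```

Feeding the returned `nav` back in as `nav_prev_db` for the next frame reproduces the frame-by-frame update described above.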

In step 172 of FIG. 15, the audio signal is determined to include the user's speech when the flag VF is set to 1, and determined not to include it when the flag VF is set to 0.

FIG. 17 illustrates details of the fundamental frequency calculation process in step 173 of FIG. 15. In step 191, the CPU 31 calculates the power P[i] of the audio signal. The calculation of P[i] has been described above and is not repeated here. In step 192, the CPU 31 calculates the autocorrelation SR. The autocorrelation SR can be obtained by applying an inverse Fourier transform to the power spectrum P[i].

In step 193, the CPU 31 calculates the fundamental frequency B. Specifically, in the autocorrelation SR of the audio signal, the fundamental period τ is the smallest positive shift time at which the autocorrelation value reaches a local maximum. The fundamental frequency B is then obtained by dividing the sampling frequency Fs by the fundamental period τ, with τ expressed in samples.
B = Fs / τ … (22)
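
Steps 191 to 193 can be sketched in Python as follows. This is an illustrative sketch, not the implementation of the embodiment. It assumes that P[i] is a linear (not dB) power spectrum, so that its inverse transform gives the circular autocorrelation of the frame, and it uses a naive O(n²) DFT only to stay within the standard library. The names `dft` and `fundamental_frequency` are hypothetical.

```python
import cmath
import math


def dft(x, inverse=False):
    """Naive discrete Fourier transform (illustration only, O(n^2))."""
    n = len(x)
    sign = 1 if inverse else -1
    out = []
    for k in range(n):
        acc = 0j
        for t in range(n):
            acc += x[t] * cmath.exp(sign * 2j * math.pi * k * t / n)
        out.append(acc / n if inverse else acc)
    return out


def fundamental_frequency(frame, fs):
    """Estimate B = Fs / tau from one frame of time-domain samples."""
    # Step 191: power spectrum P[i] (assumed linear power).
    power = [abs(c) ** 2 for c in dft(frame)]
    # Step 192: autocorrelation SR as the inverse transform of P[i].
    sr = [c.real for c in dft(power, inverse=True)]
    # Step 193: smallest positive lag at which SR has a local maximum.
    for lag in range(1, len(sr) // 2):
        if sr[lag] > sr[lag - 1] and sr[lag] >= sr[lag + 1]:
            return fs / lag  # Expression (22), tau in samples
    return None  # no local maximum found in the searched range
```

For a frame holding a pure tone with a period of 16 samples at Fs = 8000 Hz, the first local maximum of SR falls at lag 16, giving B = 500 Hz.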

Next, details of step 104 in FIG. 4 will be described. In step 104 of this embodiment, the number of bands Hnm in the high frequency region, illustrated in FIG. 18, is calculated based on the average fundamental frequency Bav calculated in step 103. The higher the average fundamental frequency Bav, the higher the importance of the high frequency region. Accordingly, the number of bands Hnm is set to increase as Bav increases. In other words, as Bav increases, the second bandwidth WB2, which is the width of each band in the high frequency region, becomes narrower.

In step 104, the number of bands Hnm in the high frequency region is calculated based on the average fundamental frequency Bav calculated in step 103. Specifically, Hnm is obtained using, for example, Expressions (23) to (25). FIG. 18 illustrates the relationship between Bav and Hnm given by Expressions (23) to (25).

In FIG. 18, the horizontal axis represents the average fundamental frequency Bav, and the vertical axis represents the number of bands Hnm in the high frequency region.
Hnm = Hnmn, when Bav < BavL … (23)
Hnm = Hnmn + ((Hnmx − Hnmn) / (BavH − BavL)) × (Bav − BavL), when BavL ≤ Bav < BavH … (24)
Hnm = Hnmx, when Bav ≥ BavH … (25)

For example, when the number of bands in the high frequency region before merging is 256 (= HE − HS + 1), the parameters may be Hnmx = 256, Hnmn = 1, BavH = 400 [Hz], and BavL = 70 [Hz]. When it is determined in step 172 of FIG. 15 that the audio signal does not include the user's speech, that is, when the audio signal is determined to include noise, Hnm may be set to 1, or to the same number of bands Hnm as in the previous frame. This is because, in this embodiment, the importance of the high frequency region is low for an audio signal that does not include the user's speech.
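
Expressions (23) to (25) describe a clamped linear interpolation, which can be sketched in Python as follows. The function name `high_band_count` is hypothetical, and the defaults are the example values given above. The sketch returns a floating-point value; how Hnm is rounded to an integer is not stated in the text and is left to the caller.

```python
def high_band_count(bav, hnmn=1, hnmx=256, bav_l=70.0, bav_h=400.0):
    """Number of high-frequency bands Hnm for an average fundamental frequency Bav [Hz]."""
    if bav < bav_l:
        return float(hnmn)              # Expression (23)
    if bav >= bav_h:
        return float(hnmx)              # Expression (25)
    slope = (hnmx - hnmn) / (bav_h - bav_l)
    return hnmn + slope * (bav - bav_l)  # Expression (24)
```

With the example parameters, Bav = 235 Hz lies midway between BavL and BavH and yields Hnm = 128.5 before rounding.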

[Fourth Embodiment]
Next, an example of the fourth embodiment will be described. Descriptions of configurations and operations that are the same as in the first to third embodiments are omitted. The fourth embodiment differs from the first to third embodiments in that, in the audio signal analysis process in step 103 of FIG. 4, the importance of the high frequency region is set higher when the audio signal corresponds to a consonant than when it does not. The fourth embodiment also differs in that, in step 104, the number of bands in the high frequency region is calculated based on whether the audio signal corresponds to a consonant.

FIG. 19 illustrates details of step 103 in FIG. 4 in this embodiment. In step 201, the CPU 31 executes the voice presence determination process, and in step 202, it determines whether speech is present. Steps 201 and 202 are the same as steps 171 and 172 in FIG. 15, so their description is omitted. When the determination in step 202 is negative, that is, when the audio signal is determined not to include the user's speech, the CPU 31 sets the flag CF to 0, indicating that the signal is not a consonant, and ends the audio signal analysis process.

When the determination in step 202 is affirmative, that is, when the audio signal is determined to include the user's speech, the CPU 31 executes the fundamental frequency calculation process in step 203. Step 203 is the same as step 173 in FIG. 15, so its description is omitted. In step 204, the CPU 31 determines whether the fundamental frequency exceeds a predetermined threshold Th2. The threshold Th2 may be, for example, 1000 [Hz]. When the determination in step 204 is negative, that is, when the fundamental frequency does not exceed Th2, the CPU 31 sets the flag CF to 0 in step 210, indicating that the signal is not a consonant, and ends the audio signal analysis process.

When the determination in step 204 is affirmative, that is, when the fundamental frequency exceeds the threshold Th2, the CPU 31 calculates, in steps 205 to 207, the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region. Steps 205 to 207 are the same as steps 121 to 123 in FIG. 5, so their description is omitted. In step 208, the CPU 31 determines whether this ratio exceeds a predetermined threshold Th3. When the determination in step 208 is negative, that is, when the ratio does not exceed Th3, the CPU 31 sets the flag CF to 0 in step 210, indicating that the signal is not a consonant, and ends the audio signal analysis process.

When the determination in step 208 is affirmative, that is, when the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region exceeds the predetermined threshold Th3, the CPU 31 sets the flag CF to 1 in step 209, indicating that the signal is a consonant, and ends the audio signal analysis process.

Next, details of step 104 in FIG. 4 will be described. In step 104, the number of bands Hnm in the high frequency region after merging is calculated. The CPU 31 calculates Hnm based on the value of the flag CF set in step 103 (specifically, in step 209 or step 210 of FIG. 19).

For example, when the flag CF is set to 0, that is, when the audio signal does not correspond to a consonant, the number of bands Hnm is set to a small value close to 1. When the flag CF is set to 1, that is, when the audio signal corresponds to a consonant, Hnm is set to a value close to HE − HS + 1, the number of bands in the high frequency region before merging.

More specifically, assuming that the number of bands in the high frequency region before merging is 256 (= HE − HS + 1), Hnm is set to, for example, 8 when the flag CF is 0 (the audio signal does not correspond to a consonant) and to, for example, 256 when the flag CF is 1 (the audio signal corresponds to a consonant).
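
The consonant decision of FIG. 19 and the resulting choice of Hnm can be sketched in Python as follows. The function and parameter names are hypothetical. The text gives an example value for Th2 (1000 Hz) but none for Th3, so `th3` is a required argument here rather than a default.

```python
def consonant_flag(speech_present, fundamental_hz, high_low_ratio, th3, th2_hz=1000.0):
    """Return the flag CF: 1 if the frame is treated as a consonant, else 0."""
    if not speech_present:          # step 202 negative
        return 0
    if fundamental_hz <= th2_hz:    # step 204 negative -> step 210
        return 0
    if high_low_ratio <= th3:       # step 208 negative -> step 210
        return 0
    return 1                        # step 209


def bands_from_flag(cf, hnm_non_consonant=8, hnm_consonant=256):
    """Step 104 of the fourth embodiment, using the example values 8 and 256."""
    return hnm_consonant if cf == 1 else hnm_non_consonant
```

All three conditions must hold for CF to become 1, so a frame without speech never reaches the fundamental frequency or power ratio checks.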

Any two or more of the first to fourth embodiments may be combined to calculate the number of bands Hnm in the high frequency region. Possible combinations include: the first and second embodiments; the first and third embodiments; the first and fourth embodiments; the second and third embodiments; the second and fourth embodiments; the third and fourth embodiments; the first, second, and third embodiments; and the first, second, and fourth embodiments. Combinations of the second, third, and fourth embodiments, and of all of the first to fourth embodiments, are also possible.

For example, the first to fourth embodiments are combined to calculate the number of bands Hnm in the high frequency region. Let Hnm1 be the number of bands calculated in the first embodiment based on the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region, and let Hnm2 be the number of bands calculated in the second embodiment based on the non-stationarity of the high frequency region. Let Hnm3 be the number of bands calculated in the third embodiment based on the average fundamental frequency, and let Hnm4 be the number of bands calculated in the fourth embodiment based on whether the audio signal corresponds to a consonant.

In this case, the number of bands Hnm can be calculated as shown in Expression (26).
Hnm = d1 × Hnm1 + d2 × Hnm2 + d3 × Hnm3 + d4 × Hnm4 … (26)
d1 to d4 are contribution coefficients, each taking a value from 0 to 1, with d1 + d2 + d3 + d4 = 1. For example, d1 = 0.25, d2 = 0.2, d3 = 0.25, and d4 = 0.3.
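
Expression (26) is a weighted average of the four per-embodiment band counts. A minimal Python sketch follows. The function name `combine_band_counts` is hypothetical, and the explicit validation of the weights is an added safeguard rather than something the text prescribes.

```python
def combine_band_counts(counts, weights=(0.25, 0.2, 0.25, 0.3)):
    """Combine (Hnm1, Hnm2, Hnm3, Hnm4) using contribution coefficients (d1..d4)."""
    if len(counts) != 4 or len(weights) != 4:
        raise ValueError("expected four band counts and four weights")
    if any(w < 0.0 or w > 1.0 for w in weights):
        raise ValueError("each contribution coefficient must be in [0, 1]")
    if abs(sum(weights) - 1.0) > 1e-9:
        raise ValueError("contribution coefficients must sum to 1")
    # Expression (26): Hnm = d1*Hnm1 + d2*Hnm2 + d3*Hnm3 + d4*Hnm4
    return sum(d * h for d, h in zip(weights, counts))
```

Setting a weight to 0 removes the corresponding embodiment from the combination, for example `weights=(0.5, 0.5, 0.0, 0.0)` to use only the first and second embodiments.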

When the first and second embodiments are combined to calculate the number of bands Hnm in the high frequency region, d3 = d4 = 0. When the first and third embodiments are combined, d2 = d4 = 0. When the first and fourth embodiments are combined, d2 = d3 = 0.

When the second and third embodiments are combined to calculate the number of bands Hnm in the high frequency region, d1 = d4 = 0. When the second and fourth embodiments are combined, d1 = d3 = 0. When the third and fourth embodiments are combined, d1 = d2 = 0.

When the first, second, and third embodiments are combined to calculate the number of bands Hnm in the high frequency region, d4 = 0. When the first, second, and fourth embodiments are combined, d3 = 0. When the second, third, and fourth embodiments are combined, d1 = 0.

[Fifth Embodiment]

Next, an example of the fifth embodiment will be described. FIG. 20 shows an example of the fifth embodiment. The fifth embodiment differs from the first to fourth embodiments in that, in step 225, the lower limit index of the high frequency region, that is, the boundary frequency that is the lower limit frequency of the high frequency region, is changed.

When the lower limit index HS1 of the high frequency region corresponds to the band shown in FIG. 21A, band merging is performed on the high frequency region Harea1 shown in FIGS. 21A and 21B. The high frequency region Harea1 includes the bands from the lower limit index HS1 to the upper limit index HE.

In this embodiment, when the total number of bands after merging, illustrated in FIG. 21B, exceeds a predetermined maximum number of bands, the lower limit index of the high frequency region is changed to HS2. That is, the boundary frequency, which is the lower limit frequency of the high frequency region, is lowered. As a result, band merging is performed on the high frequency region Harea2 shown in FIG. 21C, which is wider than the high frequency region Harea1, and the total number of bands after merging is reduced. Specifically, by widening the second bandwidth WB2 of the high frequency region after merging, the number of bands Hnm in the high frequency region stays the same while the number of bands in the low frequency region decreases by HS1 − HS2.

Steps 221 to 224 in FIG. 20 are the same as steps 101 to 104 in FIG. 4, and steps 226 to 230 in FIG. 20 are the same as steps 105 to 109 in FIG. 4, so their description is omitted.

FIG. 22 illustrates details of step 225 in FIG. 20. In step 231, the CPU 31 determines whether the sum of the lower limit index HS of the high frequency region and the number of bands Hnm of the high frequency region calculated in step 224 exceeds a predetermined maximum number of bands Amx. When the determination in step 231 is negative, that is, when the total number of bands after merging does not exceed Amx, the CPU 31 ends the high frequency region lower limit changing process.

When the determination in step 231 is affirmative, that is, when the total number of bands after merging exceeds the predetermined maximum number of bands Amx, the CPU 31 lowers the lower limit index HS in step 232. Specifically, as shown in Expression (27), the lower limit index HS of the high frequency region is set to the value obtained by subtracting the number of bands Hnm in the high frequency region from the maximum number of bands Amx.
HS = Amx − Hnm … (27)
That is, the number of bands HS (= LE + 1) in the low frequency region is reduced from HS1 to Amx − Hnm (= HS2). As illustrated in FIG. 21C, the number of bands in the low frequency region decreases by HS1 − HS2 while the number of bands in the high frequency region remains Hnm, so the total number of bands is reduced by HS1 − HS2.
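
Steps 231 and 232 can be sketched in Python as follows. The function name `adjust_lower_limit` is hypothetical, and rejecting the case Hnm > Amx is an added safeguard that the text does not discuss.

```python
def adjust_lower_limit(hs, hnm, amx):
    """Return the lower limit index HS of the high frequency region after step 225.

    hs  -- current lower limit index (also the number of low-frequency bands)
    hnm -- number of high-frequency bands after merging
    amx -- maximum total number of bands
    """
    if hnm > amx:
        raise ValueError("Hnm alone exceeds Amx; Expression (27) would be negative")
    if hs + hnm <= amx:   # step 231 negative: nothing to change
        return hs
    return amx - hnm      # step 232, Expression (27)
```

After the adjustment, the returned index plus Hnm equals Amx exactly, so the total band count sits at the allowed maximum.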

In the above, an example has been described in which the value of the number of bands Hnm in the high frequency region calculated in step 104 of FIG. 4 is not changed. That is, as illustrated in FIG. 21C, the second bandwidth WB2 after merging is widened, which means the number of merged bands N is increased. However, this embodiment is not limited to this example. For example, the number of bands Hnm may instead be increased beyond the value calculated in step 104 so that the number of merged bands N derived from the Hnm calculated in step 104 remains unchanged.

Specifically, as shown in Expression (28), the lower limit index HS of the high frequency region is adjusted so that the sum of the number of bands HS (= LE + 1) in the low frequency region and the number of bands Hnm in the high frequency region is equal to or less than the predetermined maximum number of bands Amx.
HS + Hnm ≤ Amx … (28)

That is, as shown in Expression (29), the lower limit index HS is set to be equal to or less than the following value: the product of the maximum number of bands Amx and the number of merged bands N, minus the upper limit index HE of the high frequency region plus 1, all divided by N − 1.
HS ≤ (Amx × N − (HE + 1)) / (N − 1) … (29)
The lower limit index HS is rounded down to an integer.

Expression (29) is derived as follows. Substituting Expression (30) for the number of bands Hnm in Expression (28) yields Expression (31). Expression (30) states that the number of bands Hnm in the high frequency region after the lower limit index HS is lowered equals the upper limit index HE of the high frequency region minus the lowered lower limit index HS plus 1, divided by the number of merged bands N. The quantity HE − HS + 1 is the number of bands in the high frequency region after HS is lowered and before merging.

Hnm = (HE − HS + 1) / N … (30)
The method for calculating the number of merged bands N in the high frequency region is the same as in step 131 of FIG. 9, so its description is omitted.
HS + (HE − HS + 1) / N ≤ Amx … (31)
Multiplying both sides of Expression (31) by N and collecting the HS terms gives HS × (N − 1) ≤ Amx × N − (HE + 1). Dividing by N − 1 (for N greater than 1) isolates HS on the left-hand side and yields Expression (29).
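
The bound in Expression (29) can be checked numerically against Expression (31) with a short Python sketch. The function names and the sample values (Amx = 200, N = 4, HE = 511) are hypothetical and chosen only for illustration. The sketch assumes N is an integer greater than 1, since Expression (29) divides by N − 1.

```python
def lower_limit_upper_bound(amx, n, he):
    """Largest integer HS allowed by Expression (29), rounded down."""
    if n <= 1:
        raise ValueError("Expression (29) requires N > 1")
    return (amx * n - (he + 1)) // (n - 1)


def total_bands(hs, n, he):
    """Left-hand side of Expression (31): HS + (HE - HS + 1) / N."""
    return hs + (he - hs + 1) / n


# Illustrative values only (not taken from the source text).
AMX, N, HE = 200, 4, 511
hs_max = lower_limit_upper_bound(AMX, N, HE)
within_limit = total_bands(hs_max, N, HE) <= AMX       # bound is satisfied
exceeds_limit = total_bands(hs_max + 1, N, HE) > AMX   # one more index is too many
```

With these values the bound is HS ≤ 96: at HS = 96 the total is exactly 200 bands, while HS = 97 would give 200.75 and violate Expression (31).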

In this case, the bands from the adjusted lower limit index HS (HS2) of the high frequency region to the pre-adjustment lower limit index HS minus 1 (HS1 − 1), which belonged to the low frequency region before the adjustment, belong to the high frequency region after the adjustment and are merged in groups of N, as illustrated in FIG. 21D. That is, the number of bands after merging for the range HS2 to HS1 − 1 becomes 1/N of the number before the lower limit index HS was adjusted, so the total number of bands after the adjustment is reduced.

In addition, in this embodiment, the boundary frequency is lowered so that the sum of the number of bands in the low frequency region and the number of bands in the high frequency region does not exceed the maximum number of bands.

This embodiment makes it possible to keep the load of audio signal processing at or below a predetermined amount.

This embodiment may be applied to any one of the first to fourth embodiments, or to a combination of at least two of the first to fourth embodiments.

In the first to fifth embodiments, the low frequency region has been described as being divided at the frequency resolution used when the audio signal is converted into the frequency domain representation, but the first to fifth embodiments are not limited to this. For example, when a further reduction in the audio signal processing load is desired, the low frequency region may be divided into bands of a first bandwidth that is M times the frequency resolution (M being a natural number of 2 or more).

In the first to fifth embodiments, an example has been described in which the number of bands Hnm after merging in the high frequency region is calculated for every frame, but the first to fifth embodiments are not limited to this. The number of bands Hnm may be calculated once every L frames, and the high frequency region may be divided using the same Hnm for the following L − 1 frames. L may be, for example, 50 to 100. This is because an audio signal tends to exhibit similar characteristics continuously for some time.
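
Recomputing Hnm only once every L frames can be sketched in Python as follows. The function name `band_counts_every_l_frames` and the callback `compute_hnm` are hypothetical. The callback stands in for whichever of the per-frame calculations described above is used.

```python
def band_counts_every_l_frames(frames, compute_hnm, l=50):
    """Return one Hnm per frame, recomputing it only on every L-th frame."""
    if l < 1:
        raise ValueError("L must be at least 1")
    hnm = None
    result = []
    for index, frame in enumerate(frames):
        if index % l == 0:
            hnm = compute_hnm(frame)  # fresh calculation on frames 0, L, 2L, ...
        result.append(hnm)            # reused for the following L - 1 frames
    return result
```

With L = 50 the analysis runs on 2% of the frames, which is where the additional reduction in processing load comes from.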

FIGS. 6, 14, and 18 and Expressions (1) to (31) are examples, and the first to fifth embodiments are not limited to them. The order of the steps in the flowcharts of FIGS. 4, 5, 9, 10, 13, 15, 16, 17, 19, 20, and 22 is also an example, and the first to fifth embodiments are not limited to that order. The first to fifth embodiments may be applied to real-time processing of audio data such as a voice call, or to audio data stored in advance in a storage device.

The following appendices are further disclosed in relation to the embodiments described above.

(Appendix 1)
An audio signal processing device comprising:
a first band dividing unit that divides a low frequency region of an audio signal, converted from a time domain representation into a frequency domain representation, into a plurality of first bands each having a first bandwidth;
a bandwidth determination unit that determines, based on the importance of a high frequency region of the audio signal whose frequencies are higher than those of the low frequency region, a second bandwidth that is equal to or greater than the first bandwidth and is used to divide the high frequency region;
a second band dividing unit that divides the high frequency region of the audio signal into a plurality of second bands each having the second bandwidth determined by the bandwidth determination unit; and
an audio signal adjustment unit that executes audio signal adjustment processing on each of the plurality of first bands and each of the plurality of second bands.
(Appendix 2)
The audio signal processing device according to Appendix 1, wherein
the bandwidth determination unit determines the bandwidth so that it becomes narrower as the importance of the high frequency region becomes higher.
(Appendix 3)
The audio signal processing device according to Appendix 1 or Appendix 2, wherein
the importance of the high frequency region of the audio signal is determined based on at least one of: the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region; the non-stationarity of the power of the audio signal in the high frequency region; the fundamental frequency of the audio signal; and whether the audio signal corresponds to a consonant, and
the importance is determined so as to
become higher as the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region increases,
become higher as the non-stationarity of the power of the audio signal in the high frequency region increases,
become higher as the fundamental frequency of the audio signal increases, and
be higher when the audio signal corresponds to a consonant than when it does not.
(Appendix 4)
The audio signal processing device according to any one of Appendices 1 to 3, wherein
the bandwidth determination unit includes a coefficient determination unit that determines a coefficient based on the importance of the high frequency region, and
determines the second bandwidth by multiplying the first bandwidth by the coefficient determined by the coefficient determination unit.
(Appendix 5)
The audio signal processing device according to Appendix 4, wherein
the coefficient is determined so as to become smaller as the importance of the high frequency region becomes higher and to be 1 when it is smallest.
(Appendix 6)
The audio signal processing device according to Appendix 4 or Appendix 5, wherein
the coefficient is a natural number.
(Appendix 7)
The high frequency region is a frequency region whose frequency is equal to or higher than a predetermined boundary frequency,
The low frequency region is a frequency region whose frequency is lower than the boundary frequency;
The boundary frequency is reduced so that the sum of the number of the first bands divided by the first band dividing unit and the number of the second bands divided by the second band dividing unit does not exceed a maximum number of bands,
The audio signal processing device according to any one of appendix 1 to appendix 6.
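One possible reading of the boundary adjustment in appendix 7 is sketched below: the boundary frequency is lowered in steps of the first bandwidth until the total band count fits within the limit. The step size, the helper names `count_bands` and `reduce_boundary`, and the stopping rule at 0 Hz are assumptions for this example; the specification may use a different procedure.

```python
import math


def count_bands(boundary_hz, max_hz, bw1_hz, bw2_hz):
    """Number of first bands below the boundary plus second bands above it."""
    low = math.ceil(boundary_hz / bw1_hz)
    high = math.ceil((max_hz - boundary_hz) / bw2_hz)
    return low + high


def reduce_boundary(boundary_hz, max_hz, bw1_hz, bw2_hz, max_bands):
    """Lower the boundary until the band count does not exceed max_bands.

    Assumes bw2_hz >= bw1_hz, so each step removes one narrow band and
    adds at most one wide band. If bw2_hz == bw1_hz the count cannot
    shrink, and the loop simply stops at 0 Hz.
    """
    while boundary_hz > 0 and count_bands(boundary_hz, max_hz, bw1_hz, bw2_hz) > max_bands:
        boundary_hz = max(0, boundary_hz - bw1_hz)
    return boundary_hz
```

For example, with an 8000 Hz signal bandwidth, a first bandwidth of 250 Hz, and a second bandwidth of 1000 Hz, a 4000 Hz boundary yields 20 bands; lowering the boundary to 2500 Hz brings the count down to 16.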
(Appendix 8)
Dividing the low frequency region of the audio signal converted from the time domain representation into the frequency domain representation into a plurality of first bands with a first bandwidth;
Determining, based on the importance of a high frequency region of the audio signal whose frequency is higher than that of the low frequency region, a second bandwidth that is equal to or greater than the first bandwidth and is used for dividing the high frequency region;
Dividing the high-frequency region of the audio signal into a plurality of second bands with the determined second bandwidth;
Performing an audio signal adjustment process on each of the plurality of first bands and each of the plurality of second bands;
A program for causing a computer to execute audio signal processing.
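The four steps of appendix 8 can be sketched for a single frame as follows, assuming the frame has already been converted to a power spectrum indexed by frequency bin. Bandwidths are expressed in bins, and the per-band adjustment is modeled as a gain returned by a caller-supplied function `adjust`; these choices, and the names `split_bands` and `process_frame`, are assumptions for this example rather than details from the specification.

```python
def split_bands(start_bin, stop_bin, width_bins):
    """Return (lo, hi) bin ranges of the given width covering [start_bin, stop_bin)."""
    return [(lo, min(lo + width_bins, stop_bin))
            for lo in range(start_bin, stop_bin, width_bins)]


def process_frame(power_spectrum, boundary_bin, first_width, coefficient, adjust):
    """Divide one frame into first and second bands and adjust each band.

    adjust is a hypothetical callback that receives the bins of one band
    and returns a gain to apply to that band (e.g. a noise-suppression gain).
    """
    # Second bandwidth is the first bandwidth times a natural-number coefficient.
    second_width = first_width * coefficient
    bands = (split_bands(0, boundary_bin, first_width)
             + split_bands(boundary_bin, len(power_spectrum), second_width))
    out = list(power_spectrum)
    for lo, hi in bands:
        gain = adjust(power_spectrum[lo:hi])
        for k in range(lo, hi):
            out[k] = power_spectrum[k] * gain
    return bands, out
```

Calling `process_frame([1.0] * 16, 8, 2, 2, lambda band: 0.5)` divides the 8 low bins into four 2-bin bands and the 8 high bins into two 4-bin bands, so the adjustment runs six times instead of eight.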
(Appendix 9)
Determining the second bandwidth so that it becomes narrower as the importance of the high frequency region becomes higher;
The program according to appendix 8.
(Appendix 10)
The importance of the high frequency region of the audio signal is
determined based on at least one of the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region, the non-stationarity of the power of the audio signal in the high frequency region, the fundamental frequency of the audio signal, and whether or not the audio signal corresponds to a consonant,
so as to become higher as the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region increases,
to become higher as the non-stationarity of the power of the audio signal in the high frequency region increases,
to become higher as the fundamental frequency of the audio signal increases, and
to be higher when the audio signal corresponds to a consonant than when it does not,
The program according to appendix 8 or appendix 9.
(Appendix 11)
The audio signal processing
further includes determining a coefficient based on the importance of the high frequency region, and
determining the second bandwidth by multiplying the first bandwidth by the determined coefficient;
The program according to any one of appendix 8 to appendix 10.
(Appendix 12)
The coefficient is determined so as to become smaller as the importance of the high frequency region becomes higher and to be 1 at its smallest,
The program according to appendix 11.
(Appendix 13)
The coefficient is a natural number,
The program according to appendix 11 or appendix 12.
(Appendix 14)
The high frequency region is a frequency region whose frequency is equal to or higher than a predetermined boundary frequency,
The low frequency region is a frequency region whose frequency is lower than the boundary frequency;
Reducing the boundary frequency so that the sum of the number of the first bands to be divided and the number of the second bands to be divided does not exceed the maximum number of bands;
The program according to any one of appendix 8 to appendix 13.

10 Audio signal processing device
23 First band dividing unit
24 Bandwidth determination unit
25 Second band dividing unit
31 CPU
32 Primary storage unit
33 Secondary storage unit

Claims

An audio signal processing device comprising:
a first band dividing unit that divides a low frequency region of an audio signal, converted from a time domain representation into a frequency domain representation, into a plurality of first bands with a first bandwidth;
a bandwidth determination unit that determines, based on the importance of a high frequency region of the audio signal whose frequency is higher than that of the low frequency region, a second bandwidth that is equal to or greater than the first bandwidth and is used for dividing the high frequency region;
a second band dividing unit that divides the high frequency region of the audio signal into a plurality of second bands with the second bandwidth determined by the bandwidth determination unit; and
an audio signal adjustment unit that executes an audio signal adjustment process on each of the plurality of first bands and each of the plurality of second bands.

The audio signal processing device according to claim 1, wherein
the bandwidth determination unit determines the second bandwidth so that it becomes narrower as the importance of the high frequency region becomes higher.

The audio signal processing device according to claim 1 or claim 2, wherein
the importance of the high frequency region of the audio signal is determined based on at least one of the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region, the non-stationarity of the power of the audio signal in the high frequency region, the fundamental frequency of the audio signal, and whether or not the audio signal corresponds to a consonant, such that the importance:
becomes higher as the ratio of the power of the audio signal in the high frequency region to the power of the audio signal in the low frequency region increases;
becomes higher as the non-stationarity of the power in the high frequency region increases;
becomes higher as the fundamental frequency of the audio signal increases; and
is higher when the audio signal corresponds to a consonant than when it does not.

The audio signal processing device according to any one of claims 1 to 3, wherein
the bandwidth determination unit includes a coefficient determination unit that determines a coefficient based on the importance of the high frequency region, and
the bandwidth determination unit determines the second bandwidth by multiplying the first bandwidth by the coefficient determined by the coefficient determination unit.

The audio signal processing device according to claim 4, wherein
the coefficient is determined so as to become smaller as the importance of the high frequency region becomes higher and to be 1 at its smallest.

The audio signal processing device according to claim 4 or claim 5, wherein
the coefficient is a natural number.

The audio signal processing device according to any one of claims 1 to 6, wherein
the high frequency region is a frequency region whose frequency is equal to or higher than a predetermined boundary frequency,
the low frequency region is a frequency region whose frequency is lower than the boundary frequency, and
the boundary frequency is reduced so that the sum of the number of the first bands divided by the first band dividing unit and the number of the second bands divided by the second band dividing unit does not exceed a maximum number of bands.

A program for causing a computer to execute audio signal processing comprising:
dividing a low frequency region of an audio signal, converted from a time domain representation into a frequency domain representation, into a plurality of first bands with a first bandwidth;
determining, based on the importance of a high frequency region of the audio signal whose frequency is higher than that of the low frequency region, a second bandwidth that is equal to or greater than the first bandwidth and is used for dividing the high frequency region;
dividing the high frequency region of the audio signal into a plurality of second bands with the determined second bandwidth; and
performing an audio signal adjustment process on each of the plurality of first bands and each of the plurality of second bands.