JP2000181477A

JP2000181477A - Voice processor

Info

Publication number: JP2000181477A
Application number: JP10354545A
Authority: JP
Inventors: Hideyuki Takahashi (高橋 秀享)
Original assignee: Olympus Optical Co Ltd
Current assignee: Olympus Corp
Priority date: 1998-12-14
Filing date: 1998-12-14
Publication date: 2000-06-30

Abstract

PROBLEM TO BE SOLVED: To automatically adjust the sound volume to a level appropriate for the listener. SOLUTION: The processor comprises an audio determination unit 30A that determines the voiced and silent sections of the audio data to be processed; an average frame energy calculation unit 30B that calculates the average frame energy of each voiced section determined by the determination unit 30A; an audio level adjustment gain calculation unit 30C that calculates, from the average frame energy of each voiced section calculated by the calculation unit 30B, an audio level adjustment gain for adjusting the audio level of that section; and an audio level adjustment gain writing unit 30D that writes the gain found by the calculation unit 30C at a predetermined position in the audio data.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical Field of the Invention] The present invention relates to an audio processing device.

[0002]

[Description of the Related Art] An audio processing device such as an audio recording and playback device is usually provided with a volume control, and the user can listen at a desired volume by operating it.

[0003] However, speech has a wide dynamic range, so the perceived loudness may be low or high even at the same volume setting. When such fluctuations make the audio hard to follow, the user has had to readjust the volume each time. In particular, when the distance between the recording device and each speaker differs, as in a recording of a conversation, only the voice of a speaker close to the recording device is heard loudly, while the voice of a speaker far from the recording device may be quiet and difficult to hear.

[0004] To overcome this problem, Japanese Unexamined Patent Publication No. Hei 9-232892 discloses a volume control device that adjusts the volume based on the level of the immediately preceding input audio.

[0005]

[Problems to be Solved by the Invention] Loudness is itself an important piece of information. If the volume is changed frequently, naturalness is lost and sound quality may actually degrade. Volume adjustment should preserve naturalness, and to that end the volume needs to be adjusted in relation to the whole of the audio to be processed.

[0006] However, the device of Japanese Unexamined Patent Publication No. Hei 9-232892 adjusts the volume based only on the level of the immediately preceding audio. Although this has the effect of keeping the volume constant, it can in some cases degrade sound quality instead.

[0007] The present invention has been made in view of this problem, and its object is to provide an audio processing device capable of automatically adjusting the volume to a more appropriate level for the listener.

[0008]

[Means for Solving the Problems] To achieve the above object, an audio processing device according to a first invention comprises: audio determination means for determining voiced sections and silent sections of audio data to be processed; average frame energy calculation means for calculating, for the voiced sections determined by the audio determination means, the average frame energy of each voiced section; audio level adjustment gain calculation means for calculating, based on the average frame energy of each voiced section calculated by the average frame energy calculation means, an audio level adjustment gain for adjusting the audio level of each voiced section; and audio level adjustment gain writing means for writing the audio level adjustment gain obtained by the audio level adjustment gain calculation means at a predetermined position in the audio data.

[0009] An audio processing device according to a second invention is the audio processing device according to the first invention, wherein the audio level adjustment gain calculation means calculates the audio level adjustment gain based on the average, over all voiced sections, of the average frame energy of each voiced section.

[0010] An audio processing device according to a third invention is the audio processing device according to the first or second invention, further having a selection unit for allowing the user to select, when the audio data is played back, whether or not to multiply the audio data by the audio level adjustment gain corresponding to each frame.

[0011]

[Embodiments of the Invention] An embodiment of the present invention will be described in detail below with reference to the drawings.

[0012] FIG. 1(A) is a diagram for explaining one form of audio processing according to the present embodiment. In FIG. 1(A), audio data recorded by a digital recorder 10 serving as an audio recording device is stored on a miniature card 11 detachably mounted in the digital recorder 10. The miniature card 11 is removed from the digital recorder 10, mounted in a PC card adapter 12, and in that state loaded into a personal computer 13. A control program 14 that performs processing such as audio playback and information display is installed on the personal computer 13.

[0013] FIG. 1(B) is a diagram showing the configuration of the control unit of the personal computer 13, which has various functions for performing the audio processing according to the present embodiment. Specifically, the audio determination unit 30A determines the voiced sections and silent sections of the entire audio data to be processed. The average frame energy calculation unit 30B calculates, for the voiced sections determined by the audio determination unit 30A, the average frame energy of each voiced section. The audio level adjustment gain calculation unit 30C calculates, based on the average frame energy of each voiced section calculated by the average frame energy calculation unit 30B, an audio level adjustment gain for adjusting the audio level of each voiced section. The audio level adjustment gain writing unit 30D writes the audio level adjustment gain calculated by the audio level adjustment gain calculation unit 30C at a predetermined position in the audio data. The selection processing unit 30E lets the user select, when the audio data is played back, whether or not to multiply by the audio level adjustment gain written by the audio level adjustment gain writing unit 30D, and performs processing according to that selection. The user can make the selection from, for example, the keyboard of the personal computer 13.

[0014] FIG. 2 is a diagram showing the data structure of the audio data. In FIG. 2, the audio data is organized as a single file consisting of a file header area 20-1 and a frame data area 20-2. The file header area 20-1 holds information such as the recording date and time and the recording length. The frame data area 20-2 consists of a plurality of frames (frames 1, 2, 3, ..., n), and each frame consists of a frame header area 20-3 and an audio data area 20-4. The audio level adjustment gain is written into the frame header area 20-3 by the audio level adjustment gain writing unit 30D described above; here, 1 is assumed to be set as its initial value. A predetermined length of audio data is recorded in the audio data area 20-4.
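The layout in FIG. 2 can be sketched as a pair of record types. This is only an illustration under stated assumptions: the names `Frame`, `AudioFile`, `split_into_frames` and all field names are hypothetical, and the actual on-disk byte layout is not specified in the text.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Frame:
    # Frame header area 20-3: holds the audio level adjustment gain (initial value 1).
    gain: float = 1.0
    # Audio data area 20-4: a fixed-length run of samples.
    samples: List[int] = field(default_factory=list)


@dataclass
class AudioFile:
    # File header area 20-1: example fields for recording date/time and length.
    recorded_at: str = ""
    duration_s: float = 0.0
    # Frame data area 20-2: frames 1, 2, 3, ..., n.
    frames: List[Frame] = field(default_factory=list)


def split_into_frames(samples, frame_len):
    """Cut a sample sequence into fixed-length frames; a short tail is dropped."""
    count = len(samples) // frame_len
    return [
        Frame(samples=list(samples[k * frame_len:(k + 1) * frame_len]))
        for k in range(count)
    ]
```

Every frame starts with a gain of 1, so a file that has not yet been processed plays back unchanged even when gain multiplication is enabled.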

[0015] The audio data imported into the personal computer 13 undergoes the audio level adjustment of the present embodiment before playback. This is described below with reference to flowcharts.

[0016] FIG. 3 is a flowchart showing an outline of the audio level adjustment processing according to the present embodiment. Here, this adjustment processing is performed by the control unit of the personal computer 13.

[0017] In FIG. 3, a threshold for the voiced/silent section determination is first calculated for the audio data to be processed (step S1), and the voiced/silent section determination is performed based on that threshold (step S2). Next, the average frame energy of each voiced section determined in step S2 is calculated (step S3), and based on the resulting average frame energy of each voiced section, the audio level adjustment gain for each sample in each voiced section is calculated (step S4). The audio level adjustment gain writing unit 30D then writes this gain at a predetermined position in the audio data (step S5); here, it is written into the frame header area 20-3 shown in FIG. 2.

[0018] The processing of each of steps S1 to S5 is described in detail below.

[0019] FIG. 4 is a flowchart showing the threshold calculation processing of step S1, which is used for the voiced/silent section determination. When this processing starts, a variable f holding the frame number count is first initialized to 0 (step S6).

[0020] Next, the frame energy e(f) of frame f is calculated using the following equation (step S7).

[0021]

[Equation 1]

[0022] In the equation, s(i) denotes the sample at sample position i within one frame, and N denotes the number of samples constituting one frame.
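Equation 1 itself is not reproduced in this text. A minimal sketch of step S7, assuming the common definition of frame energy as the mean of the squared samples, e(f) = (1/N) * sum of s(i)^2 over the N samples of frame f, is shown below. The normalization by N is an assumption, and the function name `frame_energy` is hypothetical.

```python
def frame_energy(samples):
    """Mean squared sample value of one frame (assumed form of Equation 1)."""
    n = len(samples)
    if n == 0:
        # An empty frame carries no energy; this guard is an addition of the sketch.
        return 0.0
    return sum(s * s for s in samples) / n
```

If the original equation omits the 1/N factor, every energy is scaled by the same constant, which leaves the threshold comparison and the gain ratios in the later steps unchanged as long as all frames have the same length.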

[0023] Next, it is determined whether the value of the variable f is 0, that is, whether this is the first frame (step S8). If f is 0, the variable min holding the minimum frame energy is set to e(f) (= e(0)) (step S9).

[0024] If f is not 0 in step S8, it is determined whether the frame energy e(f) is smaller than the variable min (step S10). If it is smaller, min is set to e(f) (step S9); if not, the processing proceeds to step S11 without doing anything.

[0025] In step S11, it is determined whether the end of the file has been reached. If not, the variable f is incremented (step S12), the next frame data is read, and the processing returns to step S7 and repeats.

[0026] If it is determined in step S11 that the end of the file has been reached, the threshold trs is set to the value obtained by multiplying the variable min by a predetermined value α (for example, 1.2) (step S13), and the processing exits.

[0027] This threshold-setting method takes advantage of the fact that the audio data has already been recorded: because the threshold can be determined from the minimum energy of the entire file, the voiced/silent section determination can be performed with few errors. Although the minimum over all loaded sections (that is, all frames constituting the audio file) is obtained above, the present invention is not limited to this. For example, taking the memory capacity of the personal computer into account, the file may instead be divided into sections of a certain length and processed section by section rather than using the minimum over all sections.
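Steps S6 to S13 amount to a running minimum over the frame energies followed by a scaling by α. A minimal sketch is given below, assuming the frame energies have already been computed into a list; the function name `silence_threshold` and its parameter names are hypothetical.

```python
def silence_threshold(energies, alpha=1.2):
    """Return trs = alpha * min(e(f)) over all frames (steps S6 to S13)."""
    if not energies:
        raise ValueError("at least one frame is required")
    minimum = energies[0]          # step S9 for the first frame (f == 0)
    for e in energies[1:]:         # steps S10 to S12 for the remaining frames
        if e < minimum:
            minimum = e
    return alpha * minimum         # step S13
```

For the section-by-section variant mentioned in the text, the same function could be called on each slice of the energy list, giving one threshold per slice.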

[0028] FIG. 5 is a flowchart showing the voiced/silent section determination processing in step S2 of FIG. 3.

[0029] When this processing starts, the variable f holding the frame number count, the variable Prev holding the voiced/silent state of the immediately preceding frame, and the variable VsCnt holding the count of voiced sections in the audio data are each initialized to 0, and the control variable HangCnt, which defers the transition from a voiced section to a silent section, is set to, for example, 10 (step S14).

[0030] Next, it is determined whether the frame energy e(f) calculated in FIG. 4 is greater than the threshold trs calculated in FIG. 4 (step S15). If e(f) is greater than trs, HangCnt is set to 0 (step S16), and it is then determined whether the variable Prev, which holds the voiced/silent state of the immediately preceding frame, is 0 (step S17). A Prev value of 0 indicates that the immediately preceding frame was a silent frame, and a value of 1 indicates that it was a voiced frame. If Prev is 0, the variable vb(VsCnt) holding the start frame of the voiced section is set to the current frame f, Prev is set to 1, and the variable VsCnt holding the voiced section count is incremented (step S18). If Prev is not 0, the processing proceeds to step S19 without doing anything.

[0031] It is then determined whether the end of the file has been reached (step S19). If not, the variable f is incremented (step S20), the next frame is read, and the processing returns to step S15 and repeats.

[0032] If it is determined in step S19 that the end of the file has been reached, the processing exits.

[0033] If it is determined in step S15 that e(f) is not greater than the threshold trs, it is determined whether HangCnt is greater than, for example, 9 (step S21). If HangCnt is not greater than 9, HangCnt is incremented (step S22) and the processing proceeds to step S17. If HangCnt is greater than 9, it is determined whether Prev is 1, that is, whether the immediately preceding frame was voiced (step S23). If Prev is 1, the variable ve(VsCnt) holding the end frame of the voiced section is set to the current frame f and Prev is set to 0 (step S24), and the processing proceeds to step S19. If Prev is 0, the processing proceeds to step S19 without doing anything.

[0034] As a result of this processing, the voiced and silent sections in the file are determined, and the frame at which each voiced section starts and the frame at which it ends are stored in the variables vb(VsCnt) and ve(VsCnt), respectively. The variable VsCnt counts the number of voiced sections and, when this processing exits, holds the total number of voiced sections in the file.

[0035] Because setting the variable HangCnt as in this processing defers the transition from a voiced section to a silent section by a predetermined number of frames, it avoids erroneously judging the tail end of a word as silence.
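The determination of FIG. 5 can be sketched as a small state machine with a hangover counter. The sketch below is an illustration, not the flowchart itself: it returns the (vb, ve) pairs as a list instead of indexing arrays by VsCnt, and it closes a section that is still open at the end of the file, a case the text does not describe. The function and parameter names are hypothetical.

```python
def find_voiced_sections(energies, trs, hang_limit=10):
    """Return a list of (vb, ve) frame-index pairs, one per voiced section."""
    sections = []
    prev = 0               # Prev: 0 = previous frame silent, 1 = voiced
    hang = hang_limit      # HangCnt starts at its limit, so the file begins "silent"
    start = None
    for f, e in enumerate(energies):
        if e > trs:                    # step S15: energy above threshold
            hang = 0                   # step S16
            voiced = True
        elif hang < hang_limit:        # step S21: HangCnt not yet greater than 9
            hang += 1                  # step S22: still inside the hangover period
            voiced = True
        else:
            voiced = False
        if voiced and prev == 0:       # steps S17, S18: a voiced section starts
            start = f
            prev = 1
        elif not voiced and prev == 1: # steps S23, S24: the section ends at frame f
            sections.append((start, f))
            prev = 0
    if prev == 1:
        # Assumption: a section still open at end of file ends at the last frame.
        sections.append((start, len(energies) - 1))
    return sections
```

With `hang_limit=10`, up to ten consecutive low-energy frames after a voiced frame are still treated as voiced, which is what keeps quiet word endings inside the section.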

[0036] FIG. 6 is a flowchart showing the per-section average frame energy calculation processing in step S3 of FIG. 3.

[0037] When this processing starts, the variable Cnt holding the voiced section count is first initialized to 0 (step S25).

[0038] Next, it is determined whether the total number of voiced sections VsCnt obtained by the processing of FIG. 5 is greater than 0 (step S26). If VsCnt is 0 or less, it is judged that the file contains no voiced section, and the processing exits without doing anything. If VsCnt is greater than 0, the number of frames vn in voiced section Cnt is calculated using the following equation (step S27).

[0039]

[Equation 2]

[0040] Next, the average frame energy Esec(Cnt) of voiced section Cnt is calculated by the following equation (step S28).

[0041]

[Equation 3]

[0042] It is then determined whether Cnt is less than VsCnt-1, that is, whether the calculation of the average frame energy Esec(Cnt) has not yet been completed for all voiced sections (step S29). If Cnt is less than VsCnt-1, Cnt is incremented (step S30) and the processing returns to step S27 to calculate the average frame energy Esec(Cnt) of the next voiced section. If it is determined in step S29 that Cnt is VsCnt-1 or more, the processing exits.
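Equations 2 and 3 are not reproduced in this text. A minimal sketch of steps S25 to S30 follows, assuming that the frame count is vn = ve(Cnt) - vb(Cnt) + 1 and that Esec(Cnt) is the arithmetic mean of e(f) over frames vb(Cnt) to ve(Cnt) inclusive. Both forms are assumptions inferred from the variable descriptions, and the function name is hypothetical.

```python
def section_average_energies(energies, sections):
    """Return Esec for each (vb, ve) section under the assumed Equations 2 and 3."""
    averages = []
    for vb, ve in sections:
        vn = ve - vb + 1                          # assumed Equation 2: frames in section
        total = sum(energies[vb:ve + 1])          # sum of e(f) for f = vb .. ve
        averages.append(total / vn)               # assumed Equation 3: mean frame energy
    return averages
```

An empty `sections` list yields an empty result, which corresponds to the early exit at step S26 when VsCnt is 0.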

[0043] FIG. 7 is a flowchart showing the per-section gain calculation processing in step S4 of FIG. 3.

[0044] When this processing starts, the variable Cnt holding the voiced section count is first initialized to 0 (step S31). Next, it is determined whether the total number of voiced sections VsCnt obtained by the processing of FIG. 5 is greater than 0 (step S32). If VsCnt is 0 or less, it is judged that the file contains no voiced section, and the processing exits without doing anything. If VsCnt is greater than 0, the average Eavr, over all voiced sections in the file, of the average frame energies Esec(Cnt) obtained in step S28 of FIG. 6 is calculated by the following equation (step S33).

[0045]

[Equation 4]

[0046] Next, the gain G(Cnt) for voiced section Cnt is calculated using the following equation (step S34).

[0047]

[Equation 5]

[0048] In the equation, sqrt() denotes the square root of the expression in parentheses. It is then determined whether Cnt is less than VsCnt-1, that is, whether the calculation of the gain G(Cnt) has not yet been completed for all voiced sections (step S35). If Cnt is less than VsCnt-1, Cnt is incremented (step S36) and the processing returns to step S34 to calculate the gain G(Cnt) of the next voiced section. If it is determined in step S35 that Cnt is VsCnt-1 or more, the processing exits.
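Equations 4 and 5 are not reproduced in this text. One reading consistent with the description, offered here only as an assumption, is that Eavr is the arithmetic mean of Esec over all voiced sections and that G(Cnt) = sqrt(Eavr / Esec(Cnt)), so that multiplying a section's samples by its gain brings its energy to the file-wide average. The function name and the zero-energy guard are additions of this sketch.

```python
import math


def section_gains(section_energies):
    """Return one gain per voiced section under the assumed Equations 4 and 5."""
    if not section_energies:
        return []                                  # no voiced sections (step S32)
    e_avr = sum(section_energies) / len(section_energies)   # assumed Equation 4
    gains = []
    for e_sec in section_energies:
        if e_sec > 0:
            gains.append(math.sqrt(e_avr / e_sec))  # assumed Equation 5
        else:
            gains.append(1.0)                       # guard added here: leave section as is
    return gains
```

Under this reading a quiet section receives a gain above 1 and a loud section a gain below 1. The square root appears because energy scales with the square of the sample amplitude.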

[0049] FIG. 8 is a flowchart showing the processing in step S5 of FIG. 3 that writes the gain at a predetermined position in the audio data.

[0050] In FIG. 8, when this processing starts, the variable f holding the frame number count and the variable Cnt holding the voiced section count are first initialized to 0 (step S37).

[0051] Next, it is determined whether the total number of voiced sections VsCnt obtained by the processing of FIG. 5 is greater than 0 (step S38). If VsCnt is 0 or less, it is judged that the file contains no voiced section, and the processing exits without doing anything.

[0052] If VsCnt is greater than 0 in step S38, it is determined whether f is greater than or equal to vb(Cnt) (step S39). If f is less than vb(Cnt), it is determined whether the end of the file has been reached (step S44); if not, f is incremented (step S45), the next frame data is read, and the processing returns to step S39. If it is determined in step S44 that the end of the file has been reached, the processing exits.

[0053] If it is determined in step S39 that f is greater than or equal to vb(Cnt), it is determined whether f is less than or equal to ve(Cnt) (step S40). If the determinations in steps S39 and S40 are both yes, the current frame f lies within a voiced section.

[0054] If f is determined to be less than or equal to ve(Cnt), the value in the frame header is rewritten to G(Cnt) (step S41). If it is determined in step S40 that f is greater than ve(Cnt), Cnt is incremented (step S42).

[0055] Next, it is determined whether Cnt is less than VsCnt-1, that is, whether rewriting of the frame headers has not yet been completed for all voiced sections (step S43). If Cnt is less than VsCnt-1, it is determined whether the end of the file has been reached (step S44). If Cnt is VsCnt-1 or more, the processing exits.
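The net effect of FIG. 8 is that every frame inside voiced section Cnt ends up with G(Cnt) in its header, while frames outside any voiced section keep their initial value. The sketch below reaches the same result with a direct loop over sections rather than the frame-by-frame walk of the flowchart. The function name is hypothetical, and the frame headers are modelled as a plain list of gain values.

```python
def write_gains(frame_gains, sections, gains):
    """Overwrite the header gain of every frame in each (vb, ve) section, in place."""
    last = len(frame_gains) - 1
    for (vb, ve), g in zip(sections, gains):
        # Clamp to the last frame so a section end past the file is harmless.
        for f in range(vb, min(ve, last) + 1):
            frame_gains[f] = g        # corresponds to step S41
    return frame_gains
```

Frames in silent sections keep a gain of 1, so only the detected voiced sections are rescaled at playback.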

[0056] The means described above make it possible to set a gain value for each frame. Thereafter, when the audio data is played back, if the user has selected to multiply the audio of each frame by the gain value in that frame's header, audio data whose audio level has been automatically adjusted is played back. If the user has selected not to multiply the audio of each frame by the gain value in that frame's header, the original audio data is played back with its audio level unadjusted. In this way, in the present embodiment, the user can select whether to play back audio data whose audio level has been automatically adjusted or the original audio data.
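Because the gain is stored in the frame header rather than applied to the samples, playback can switch between adjusted and original audio without touching the recorded data. A minimal sketch of the per-frame playback step follows. The function name is hypothetical, and the rounding and clipping to a signed 16-bit range are assumptions, since the text does not state the sample format.

```python
def render_frame(samples, gain, apply_gain, lo=-32768, hi=32767):
    """Return the samples to play for one frame, with or without the header gain."""
    if not apply_gain:
        return list(samples)          # user chose the original, unadjusted audio
    # Scale each sample, round to an integer, and clip to the assumed sample range.
    return [max(lo, min(hi, int(round(s * gain)))) for s in samples]
```

Here `apply_gain` stands for the user's selection handled by the selection processing unit 30E.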

[0057] According to the embodiment described above, the audio level of each voiced section of already-recorded audio data is adjusted automatically in relation to the audio data as a whole, so the user can listen at a constant loudness without adjusting the volume.

[0058] Likewise, when speech recognition is to be performed on already-recorded audio data, performing the audio level adjustment processing of the present embodiment before the speech recognition processing keeps the audio level constant, which makes stable speech recognition possible.

[0059] Furthermore, although the audio level adjustment processing is performed by the personal computer 13 in the present embodiment, it may instead be implemented within the digital recorder 10 serving as the audio recording device.

[0060] The present invention is not limited to the embodiment described above, and various modifications and applications are of course possible without departing from the gist of the invention.

[0061]

[Effects of the Invention] According to the invention of claim 1 or 2, it is possible to provide an audio processing device capable of automatically adjusting the volume to a more appropriate level for the listener.

[0062] According to the invention of claim 3, the user can select whether to play back audio data whose audio level has been automatically adjusted or the original audio data.

[Brief Description of the Drawings]

FIG. 1(A) is a diagram for explaining one form of audio processing according to the present embodiment, and FIG. 1(B) is a diagram showing the configuration of the control unit.

FIG. 2 is a diagram showing the frame structure.

FIG. 3 is a flowchart showing an outline of the audio level adjustment processing according to the present invention.

FIG. 4 is a flowchart showing the threshold calculation processing of step S1 in FIG. 3, which is used for the voiced/silent section determination.

FIG. 5 is a flowchart showing the voiced/silent section determination processing in step S2 of FIG. 3.

FIG. 6 is a flowchart showing the per-section average frame energy calculation processing in step S3 of FIG. 3.

FIG. 7 is a flowchart showing the per-section gain calculation processing in step S4 of FIG. 3.

FIG. 8 is a flowchart showing the processing in step S5 of FIG. 3 that writes the gain at a predetermined position in the audio data.

[Explanation of Reference Numerals]

10: digital recorder; 11: miniature card; 12: PC card adapter; 13: personal computer; 14: control program; 30A: audio determination unit; 30B: average frame energy calculation unit; 30C: audio level adjustment gain calculation unit; 30D: audio level adjustment gain writing unit; 30E: selection processing unit.

Claims

[Claims]

1. An audio processing device comprising: audio determination means for determining voiced sections and silent sections of audio data to be processed; average frame energy calculation means for calculating, for the voiced sections determined by the audio determination means, the average frame energy of each voiced section; audio level adjustment gain calculation means for calculating, based on the average frame energy of each voiced section calculated by the average frame energy calculation means, an audio level adjustment gain for adjusting the audio level of each voiced section; and audio level adjustment gain writing means for writing the audio level adjustment gain obtained by the audio level adjustment gain calculation means at a predetermined position in the audio data.

2. The audio processing device according to claim 1, wherein the audio level adjustment gain calculation means calculates the audio level adjustment gain based on the average, over all voiced sections, of the average frame energy of each voiced section.

3. The audio processing device according to claim 1 or 2, further comprising a selection unit for allowing a user to select, when the audio data is played back, whether or not to multiply the audio data by the audio level adjustment gain corresponding to each frame.