JPH10143193A

JPH10143193A - Speech signal processor

Info

Publication number: JPH10143193A
Application number: JP8296104A
Authority: JP
Inventors: Sachihiro Yamashita; 祥宏山下; Shoichi Goto; 昌一後藤; Shuhei Taniguchi; 周平谷口; Atsushi Ishizu; 厚石津
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 1996-11-08
Filing date: 1996-11-08
Publication date: 1998-05-29

Abstract

PROBLEM TO BE SOLVED: To suppress the absence of a speech signal containing an important key word and enable easy-to-hear fast-forward speech reproduction by providing a temporary buffer managing means which discards speech frames flexibly. SOLUTION: A filter bank 3 divides inputted speech frames into frequency bands by using a filter bank equipped with plural band-dividing filters. A voiced sound decision object band selecting means 4 which inputs the band-divided speech signal selects the speech signal and inputs it to a voiced sound frame decision means 5. The voiced sound frame decision means 5 compares the spectrum amplitude level of a band-limited speech signal with a threshold level. A temporary buffer managing means 7 rearranges management information in the decreasing order of decided score values by referring to the decided score values for the respective speech frames in the temporary buffer management information. After the rearrangement, specific speech frames are selected in the decreasing order and left in a temporary buffer means 6, and the remainders are discarded.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

Technical field of the invention: The present invention relates to fast-forward reproduction of an audio signal.

[0002]

Description of the related art: In recent video equipment and the like, fast-forward playback technology for audio has developed alongside fast-forward playback technology for video.

[0003] Fast-forward playback of video is achieved by dropping frames to an extent that is not visually noticeable and playing back the thinned-out video.

[0004] Audio, on the other hand, is fast-forwarded by thinning out data along the time axis to shorten its duration. There are two ways to thin out the time data: simply removing a fixed length regardless of the content of the audio data, or detecting silent portions and removing them. A combination of the two is also used.

[0005] As a conventional example, fast-forward audio reproduction that combines the two methods, simple thinning and thinning by detection of silent portions, will be described with reference to FIG. 11.

[0006] In FIG. 11, reference numeral 101 denotes an A/D converter, and 102 denotes an audio frame generating means that divides the audio data converted by the A/D converter 101 into units of a fixed time. Hereinafter, an audio signal divided at fixed time intervals in this way is called an audio frame.

[0007] Reference numeral 103 denotes an amplitude level detecting means that receives the audio frames generated by the audio frame generating means 102 and detects the absolute value of the amplitude level of each audio frame. Reference numeral 104 denotes a threshold comparing means that receives the absolute value of the amplitude level detected by the amplitude level detecting means 103, compares it with a preset threshold level, and selects audio frames. Reference numeral 105 denotes a temporary buffer means that accumulates, in time series, the audio frames selected by the threshold comparing means 104. Reference numeral 106 denotes a temporary buffer management means that manages the audio frames accumulated in the temporary buffer means 105. Reference numeral 107 denotes a D/A converter that receives the audio signal from the temporary buffer means 105.

[0008] Next, a method of generating a fast-forward audio signal will be described concretely, taking 1.25-times speed as an example.

[0009] First, the input audio signal is converted to digital form by the A/D converter 101, and the audio frame generating means 102 divides it every 20 milliseconds, a length close to the voice pitch period, to generate audio frames.

[0010] Each generated audio frame is input to the amplitude level detecting means 103, which detects the absolute value of the amplitude level of the audio frame.

[0011] Next, the absolute value of the amplitude level of the audio frame is input to the threshold comparing means 104, which compares it with the threshold level to determine whether the audio frame is a voiced audio frame or a silent audio frame. The threshold level is set to a value close to the silent state. If the absolute value of the amplitude level is larger than the threshold level, the frame is determined to be a voiced audio frame; if it is smaller, the frame is determined to be a silent audio frame.
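The conventional voiced/silent decision described above can be sketched as follows. This is only an illustration under stated assumptions: the function name `is_voiced_by_amplitude` is hypothetical, the "absolute value of the amplitude level" is taken to be the peak absolute sample value of the frame, and a frame exactly at the threshold is treated as silent, a case the text does not specify.

```python
def is_voiced_by_amplitude(frame, threshold):
    """Return True if the frame counts as voiced in the conventional scheme.

    Assumption: the frame's amplitude level is its peak absolute sample value.
    """
    peak = max((abs(sample) for sample in frame), default=0)
    # Larger than the threshold -> voiced; otherwise -> silent.
    return peak > threshold


# Hypothetical sample values, for illustration only.
print(is_voiced_by_amplitude([0, -5, 3], threshold=4))
print(is_voiced_by_amplitude([0, 1, -1], threshold=4))
```

Because the threshold sits close to the silent state, almost any frame of real programme audio passes this test, which is the weakness the following paragraphs point out.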

[0012] In this way, the audio frames determined to be voiced are selected by the threshold comparing means 104 and accumulated in time series in the temporary buffer means 105. The number of accumulated audio frames increases as time passes.

[0013] The temporary buffer management means 106 receives the number of audio frames generated by the audio frame generating means 102 and adjusts the number of audio frames accumulated in the temporary buffer means 105.

[0014] The method by which the temporary buffer management means 106 adjusts the number of audio frames is as follows. For the purpose of explanation, the capacity of the temporary buffer means 105 is assumed to be ten times the size of an audio frame generated by the audio frame generating means 102, that is, 10 audio frames.

[0015] To realize fast-forward audio reproduction at 1.25-times speed, the signal must be compressed to 8/10 along the time axis. The temporary buffer management means 106 therefore has to adjust the number of audio frames accumulated in the temporary buffer means 105 to 8 frames once every period corresponding to 10 audio frames.

[0016] At the time of adjustment, if 10 audio frames are present in the temporary buffer means 105, two of the 10 accumulated frames are discarded, counting back from the oldest audio frame in accumulation order. If 9 frames are present, the last one frame is discarded in the same manner. Audio frames are thus simply thinned out regardless of their content.

[0017] If 8 frames are present, nothing is discarded. If 7 or fewer frames are present, audio frames whose content is silent audio data are appended after the last audio frame until there are 8 frames. In this way the buffer is managed so that it always holds 8 frames at each adjustment.
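The conventional adjustment to 8 frames can be sketched as below. The function name `adjust_conventional` and the `frame_len` parameter are hypothetical. The text is ambiguous about which end of the buffer is trimmed, so the sketch assumes the trailing frames are dropped, in line with "the last one frame" in the 9-frame case.

```python
def adjust_conventional(frames, target=8, frame_len=160):
    """Force the buffer to hold exactly `target` frames, ignoring content.

    Assumption: surplus frames are removed from the end of the buffer.
    """
    if len(frames) > target:
        # 10 frames -> drop 2, 9 frames -> drop 1.
        return frames[:target]
    # 7 frames or fewer -> pad with silent (all-zero) frames after the last one.
    padding = [[0] * frame_len for _ in range(target - len(frames))]
    return frames + padding


# Ten hypothetical 4-sample frames reduced to eight.
buffer_105 = [[i] * 4 for i in range(1, 11)]
print(len(adjust_conventional(buffer_105, target=8, frame_len=4)))
```

Nothing in this routine looks at what a frame contains, so a frame carrying a keyword is as likely to be dropped as any other.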

[0018] By managing the number of audio frames in the temporary buffer means 105 to 8 frames at each adjustment in this way, 8/10 compression along the time axis is achieved.

[0019] These eight audio frames are then input in order to the D/A converter 107, converted into an analog signal, and reproduced, which achieves fast-forward audio reproduction at 1.25 times the speed of the input audio signal.

[0020] When an actual audio signal, for example the audio of a television program, is used as the input, silent scenes are rare, and few audio frames are determined to be silent by the threshold comparing means 104.

[0021] Consequently, the audio frames generated by the audio frame generating means 102 are mostly determined to be voiced by the threshold comparing means 104, and almost all of them are accumulated in the temporary buffer means 105. Discarding of silent audio frames by the threshold comparing means 104 therefore cannot be relied on, and discarding of audio frames by the temporary buffer management means 106 occurs frequently. If an important keyword is contained in a discarded audio frame, the fast-forward audio reproduction becomes difficult for the listener to understand.

[0022]

Problem to be solved by the invention: When television audio or the like is used as the input audio signal, silent audio frames are rare, and in the conventional example the temporary buffer management means 106 then discards audio frames frequently. If an important keyword was contained in a discarded audio frame, the resulting fast-forward audio reproduction was difficult to understand.

[0023] In view of this point, the present invention provides an audio signal processing device equipped with a flexible means of determining silent and voiced audio frames, in order to improve fast-forward audio reproduction whose content is difficult to understand because audio frames containing important keywords are discarded when no silent audio frames are present.

[0024]

Means for solving the problem: To achieve the above object, an audio signal processing device according to the present invention comprises: an audio signal input means that receives an audio signal; an audio frame generating means that receives the audio signal from the audio signal input means and divides it; a filter bank that receives the audio signal divided by the audio frame generating means and divides it into arbitrary frequency bandwidths; a voiced-sound determination target band selecting means that receives the audio signal divided into frequency bands by the filter bank and selects, from it, an audio signal containing an arbitrary frequency band; a voiced frame determination means that receives the band-limited audio signal selected by the voiced-sound determination target band selecting means, determines whether it is voiced or silent, and selects the audio signal from the audio frame generating means; a temporary buffer means that receives the audio signal selected by the voiced frame determination means and temporarily accumulates it in time series; a temporary buffer management means that manages the audio signal accumulated in the temporary buffer means; and an audio signal output means that outputs the audio signal from the temporary buffer means.

[0025] Further, to achieve the above object, the audio signal processing device of the present invention comprises a voiced frame selecting means that selects voiced audio signals from the audio frame generating means and, in place of the above temporary buffer management means, a temporary buffer management means that manages the divided audio signals accumulated in the temporary buffer means, in units of divided audio signals, on the basis of the number of divided audio signals accumulated in the temporary buffer means.

[0026]

BEST MODE FOR CARRYING OUT THE INVENTION

(Embodiment 1) Embodiment 1 of the present invention will be described with reference to FIGS. 1, 2, 3, 4, and 5.

[0027] FIG. 1 is a block diagram showing Embodiment 1 of the audio signal processing device of the present invention. In FIG. 1, reference numeral 1 denotes an audio signal input means, which includes an A/D converter. Reference numeral 2 denotes an audio frame generating means that divides the input audio signal from the audio signal input means 1 at fixed time intervals to generate audio frames. Reference numeral 3 denotes a filter bank that receives the audio frames from the audio frame generating means 2 and divides them into a plurality of frequency bands. Reference numeral 4 denotes a voiced-sound determination target band selecting means that receives the band-divided audio frames from the filter bank 3 and selects a band-limited audio signal containing the frequency band to be used for voiced-sound determination. Reference numeral 5 denotes a voiced frame determination means that receives the band-limited audio signal selected by the voiced-sound determination target band selecting means 4 and determines whether it is voiced or silent. Reference numeral 6 denotes a temporary buffer means that accumulates, in time series, the audio frames generated by the audio frame generating means 2. Reference numeral 7 denotes a temporary buffer management means that manages the audio frames accumulated in the temporary buffer means 6, in units of audio frames, on the basis of the determination result of the voiced frame determination means 5 and the number of audio frames accumulated in the temporary buffer means 6. Reference numeral 8 denotes an audio signal output means that receives the audio signal from the temporary buffer means 6 and includes a D/A converter.

[0028] Next, the operation up to achieving fast-forward audio reproduction will be described, taking 1.25-times speed as an example.

[0029] The input audio signal is converted into a digital signal by the A/D converter of the audio signal input means 1 and input to the audio frame generating means 2. As shown in FIG. 2, the audio frame generating means 2 divides the input audio signal every 20 milliseconds, a length close to the voice pitch period, to generate audio frames.
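The framing step can be sketched in a few lines. This is a minimal illustration rather than the patent's implementation: the function name `split_into_frames`, the list-of-samples representation, and the 8 kHz sampling rate in the usage lines are assumptions. Only the 20 ms frame length comes from the text.

```python
def split_into_frames(samples, sample_rate_hz, frame_ms=20):
    """Cut a sequence of digitised samples into consecutive fixed-length frames."""
    frame_len = sample_rate_hz * frame_ms // 1000
    if frame_len <= 0:
        raise ValueError("frame length must be at least one sample")
    # The final frame may be shorter when the input is not a whole number of frames.
    return [samples[i:i + frame_len] for i in range(0, len(samples), frame_len)]


# 50 ms of hypothetical samples at an assumed 8 kHz rate: two full frames and a partial one.
frames = split_into_frames(list(range(400)), sample_rate_hz=8000)
print([len(frame) for frame in frames])
```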

[0030] Next, the audio frames generated by the audio frame generating means 2 are input to the filter bank 3. The filter bank 3, which comprises a plurality of band-division filters as shown in FIG. 3, divides each input audio frame into a plurality of frequency bands.

[0031] The voiced-sound determination target band selecting means 4, which receives the audio signal band-divided by the filter bank 3, selects, as shown in FIG. 3, an audio signal whose frequency bandwidth contains the frequencies from 100 Hz to 200 Hz, giving priority to the reproducibility of the human voice, and inputs it to the voiced frame determination means 5.

[0032] As shown in FIG. 4, the voiced frame determination means 5 compares the spectral amplitude level of the band-limited audio signal with a threshold level. The threshold is set to a level at which the spectral amplitude is close to 0.

[0033] From this comparison, the number of samples whose spectral amplitude level is at or above the threshold level is calculated as the judgment score. The magnitude of the judgment score indicates whether the frame is silent or voiced. In the example shown in FIG. 4, the judgment score is 12.
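One way to picture the judgment score is the sketch below. It is an assumption-laden stand-in, not the patented circuit: a naive discrete Fourier transform replaces the filter bank and band selection, and the function name `judgment_score`, the default band edges, the threshold value, and the 8 kHz rate in the usage lines are all hypothetical. It counts the spectral samples inside the selected band whose amplitude is at or above the threshold.

```python
import cmath
import math


def judgment_score(frame, sample_rate_hz, band_hz=(100.0, 200.0), threshold=1e-3):
    """Count spectral samples in `band_hz` with amplitude at or above `threshold`.

    Assumption: a plain DFT stands in for the filter bank and band selection.
    """
    n = len(frame)
    if n == 0:
        return 0
    score = 0
    for k in range(n // 2 + 1):
        freq = k * sample_rate_hz / n
        if not (band_hz[0] <= freq <= band_hz[1]):
            continue  # outside the band chosen for voiced-sound determination
        coeff = sum(x * cmath.exp(-2j * math.pi * k * t / n) for t, x in enumerate(frame))
        if abs(coeff) / n >= threshold:
            score += 1
    return score


# A 150 Hz tone over one 20 ms frame at an assumed 8 kHz rate, versus digital silence.
tone = [math.sin(2 * math.pi * 150 * t / 8000) for t in range(160)]
print(judgment_score(tone, 8000), judgment_score([0.0] * 160, 8000))
```

With a 20 ms frame the DFT resolves only 50 Hz steps, so a band of 100 Hz to 200 Hz holds three spectral samples here. The score of 12 in FIG. 4 implies a wider band or a finer spectrum than this sketch uses.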

[0034] The judgment score is calculated by the voiced frame determination means 5 as described above, and the audio frames output from the audio frame generating means 2 are accumulated one after another in the temporary buffer means 6.

[0035] Next, the method by which the temporary buffer management means 7 adjusts, at fixed time intervals, the number of audio frames accumulated in the temporary buffer means 6 will be described.

[0036] The temporary buffer management means 7 receives the number of audio frames generated by the audio frame generating means 2 and determines the timing of adjustment.

[0037] Embodiment 1 is described for the case where adjustment is performed once every period of 10 audio frames, which is the capacity of the temporary buffer means 6.

[0038] FIG. 5 shows 10 audio frames accumulated in the temporary buffer means 6.

[0039] Each audio frame has a judgment score calculated by the voiced frame determination means 5 and a unique index managed by the temporary buffer management means 7. Together these form one set, which the temporary buffer management means 7 manages as temporary buffer management information.

[0040] In the example of FIG. 5, the lowercase letters a to j correspond, as indices, to the audio frames A to J, respectively. The judgment score for each audio frame is as shown in FIG. 5.

[0041] The temporary buffer management means 7 refers to the judgment score of each audio frame in the temporary buffer management information and sorts the temporary buffer management information in descending order of judgment score, as shown in FIG. 5. When judgment scores are equal, the audio frame accumulated earlier in the temporary buffer means 6 is given priority in the sort.

[0042] To achieve fast-forward audio reproduction at 1.25-times speed, the number of audio frames in the temporary buffer means 6 must be adjusted from 10 frames to 8 frames. Therefore, after the temporary buffer management information is sorted, eight audio frames are selected in descending order of judgment score, as shown in FIG. 5, and left in the temporary buffer means 6, and the remaining two frames are discarded. In the example of FIG. 5, the two audio frames D and G, whose indices are d and g, are discarded. After the discard, the audio frames remaining in the temporary buffer means 6 are input to the audio signal output means 8, as shown in FIG. 5.
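The score-based selection can be sketched as follows. The function name `select_frames` is hypothetical, and the scores in the usage lines are invented for illustration, since the values of FIG. 5 are not reproduced in the text. They are chosen only so that frames D and G end up discarded. The sketch also assumes that the surviving frames are played back in their original time order.

```python
def select_frames(scores, keep=8):
    """Return the buffer positions to keep, in time order.

    `scores` lists the judgment score of each buffered frame in storage order.
    """
    # Rank by descending score; on a tie the earlier-stored frame ranks first.
    ranked = sorted(range(len(scores)), key=lambda i: (-scores[i], i))
    # Keep the top `keep` positions and restore time order for playback.
    return sorted(ranked[:keep])


labels = "ABCDEFGHIJ"
hypothetical_scores = [12, 9, 7, 1, 10, 8, 0, 5, 11, 6]
kept = select_frames(hypothetical_scores, keep=8)
print("".join(labels[i] for i in kept))  # the two lowest scores, D and G, are dropped
```

Ranking on the pair (negative score, position) makes the tie-break rule explicit instead of relying on the stability of the sort.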

[0043] The audio frames in the temporary buffer means 6 adjusted in this way are input in order to the D/A converter of the audio signal output means 8 to obtain an output audio signal, and reproducing this signal achieves fast-forward audio reproduction at 1.25-times speed.

[0044] In the conventional example, the audio frames accumulated in the temporary buffer means 6 were discarded regardless of their content, in order starting from the frame that is old in accumulation order, that is, the audio frame that is newer in time. As a result, when an important keyword was contained in a discarded audio frame, the fast-forward audio reproduction was difficult to understand.

[0045] In the present invention, the audio frames accumulated in the temporary buffer means 6 are discarded in order starting from the one with the smallest judgment score, that is, the one closest to a silent audio frame, so audio frames containing important keywords are discarded less often than in the conventional example. This makes possible fast-forward audio reproduction whose content is easy to understand.

[0046] In Embodiment 1, the voiced-sound determination target band selecting means 4 selects an audio signal containing the band from 100 Hz to 200 Hz. Similar fast-forward audio reproduction is also possible by selecting a plurality of band-divided audio signals according to the audio signal of interest and having the voiced frame determination means 5 calculate the judgment score from them.

[0047] In Embodiment 1 the length of the audio frame is 20 milliseconds. Similar fast-forward audio reproduction is also possible by calculating the voice pitch period with a known technique such as the cepstrum method and generating audio frames of that length.

[0048] Embodiment 1 has been described for fast-forward audio reproduction at 1.25-times speed, but fast-forward audio reproduction at an arbitrary speed is likewise possible by changing the period at which the temporary buffer management means 7 adjusts the audio frames and the number of audio frames discarded.

[0049] In Embodiment 1, the voiced frame determination means calculates the judgment score as the number of samples whose spectral amplitude is at or above the threshold. Conversely, similar fast-forward audio reproduction is possible when the number of samples at or below the threshold is used as the judgment score.

[0050] (Embodiment 2) Embodiment 2 of the present invention will be described with reference to FIGS. 6, 7, 8, 9, and 10.

[0051] In FIG. 6, components having the same functions as those in FIG. 1 are given the same reference numerals, and their description is omitted.

[0052] Reference numeral 9 in FIG. 6 denotes a voiced frame selecting means that refers to the judgment score calculated by the voiced frame determination means 5 and selects among the audio frame data from the audio frame generating means 2: a silent audio frame is discarded, and a voiced audio frame is passed to the temporary buffer means 6.

[0053] Reference numeral 10 in FIG. 6 denotes a temporary buffer management means that receives the judgment score from the voiced frame determination means 5, the number of audio frames discarded by the voiced frame selecting means 9, and the number of accumulated audio frames from the temporary buffer means 6, and manages the audio frame data accumulated in the temporary buffer means in units of audio frames.

[0054] In Embodiment 1, audio frames were discarded only by the temporary buffer management means described there. In Embodiment 2, audio frames determined to be silent are discarded in advance by the voiced frame selecting means 9, and the number of audio frames in the temporary buffer means 6 is then further adjusted by the temporary buffer management means 10 of FIG. 6 to achieve fast-forward audio reproduction. As in Embodiment 1, the description takes fast-forward audio reproduction at 1.25-times speed as an example.

[0055] The operation from input of the audio signal to the audio signal input means 1 up to calculation of the judgment score by the voiced frame determination means 5 is the same as described in Embodiment 1, and its description is omitted.

[0056] The voiced frame selecting means 9 refers to the judgment score calculated by the voiced frame determination means 5 and selects audio frames as shown in FIG. 7. When the judgment score is positive, that is, when the frame can be determined to be a voiced audio frame, the audio frame is input to the temporary buffer means 6. When the judgment score is 0, that is, when the frame can be determined to be a silent audio frame, the frame is discarded. Audio frames are selected in this way.

[0057] The number of discarded audio frames is input to the temporary buffer management means 10. The audio frames selected in this way are accumulated in time series in the temporary buffer means 6.
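The pre-selection of Embodiment 2 can be sketched as a simple pass over scored frames. The function name `select_voiced` and the tuple representation are hypothetical. The rule itself, keep when the score is positive and discard when it is 0 while counting the discards, follows the two paragraphs above.

```python
def select_voiced(frames_with_scores):
    """Split scored frames into a time-ordered buffer and a count of discards."""
    buffer = []
    discarded = 0
    for frame, score in frames_with_scores:
        if score > 0:
            buffer.append(frame)  # voiced: accumulate in time series
        else:
            discarded += 1        # silent: drop and report the count
    return buffer, discarded


# Hypothetical frames labelled by name, paired with invented judgment scores.
kept, dropped = select_voiced([("A", 4), ("B", 0), ("C", 7), ("D", 0), ("E", 1)])
print(kept, dropped)
```

The discard count is what the temporary buffer management means needs later to decide whether more frames have to be removed from the buffer.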

[0058] Next, the method by which the temporary buffer management means 10 adjusts the audio frames in the temporary buffer means 6 will be described with reference to FIGS. 8, 9, and 10.

[0059] Adjustment begins when a preset number of audio frames has been accumulated in the temporary buffer means 6.

[0060] As an example, consider the case where adjustment is performed when 20 audio frames have been accumulated. If, at the start of adjustment, the voiced frame selecting means 9 has discarded 5 audio frames against the 20 frames in the temporary buffer means 6, the ratio is 4 to 1 and the signal is compressed to 8/10 along the time axis; inputting these 20 frames to the audio signal output means and reproducing them achieves fast-forward audio reproduction at 1.25-times speed. In other words, fast-forward audio reproduction at 1.25-times speed is achieved by adjusting the ratio of the number of audio frames left in the temporary buffer means to the number of audio frames ultimately discarded to 4 to 1.
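The arithmetic behind the 4 to 1 ratio can be checked with exact fractions. The helper names `time_compression` and `playback_speed` are hypothetical and serve only to spell out the relation between kept frames, discarded frames, and speed.

```python
from fractions import Fraction


def time_compression(kept, discarded):
    """Fraction of the original duration that remains after discarding."""
    return Fraction(kept, kept + discarded)


def playback_speed(kept, discarded):
    """Speed-up factor: original duration divided by the remaining duration."""
    return Fraction(kept + discarded, kept)


# 20 frames kept and 5 discarded: 8/10 compression and 1.25-times speed.
print(time_compression(20, 5), float(playback_speed(20, 5)))
# The 8-of-10 adjustment of Embodiment 1 gives the same factor.
print(float(playback_speed(8, 2)))
```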

[0061] The number of audio frames discarded by the voiced frame selecting means before 20 audio frames have been accumulated in the temporary buffer means 6 depends on the content of the audio frames and is not constant. At the time of adjustment, the number of audio frames in the temporary buffer means and the number of audio frames discarded by the voiced frame selecting means can be in one of the two states shown in FIGS. 8 and 9.

[0062] FIG. 8 shows the case where 20 audio frames are accumulated in the temporary buffer means and 7 audio frames have been discarded as silent audio frames by the voiced frame selecting means.

[0063] FIG. 9 shows the case where 20 audio frames are likewise accumulated in the temporary buffer means and 3 audio frames have been discarded as silent audio frames by the voiced frame selecting means.

[0064] The minimum silent frame count in FIGS. 8 and 9 is a constant of 5 frames, one quarter of the 20 frames in the temporary buffer means.

【００６５】まず図８に示す場合の一時バッファ管理手
段による調整の流れを図１０を用いて説明する。First, the flow of adjustment by the temporary buffer management means in the case shown in FIG. 8 will be described with reference to FIG.

[0066] FIG. 10 shows the flow of adjustment by the temporary buffer management means. Adjustment starts when 20 audio frames have accumulated in the temporary buffer means. In the case of FIG. 8, steps 201, 202, 203, and 204 are executed.

[0067] First, in step 201, the remainder of the number of silent frames that was set in step 204 during the previous adjustment is added to the current number of silent frames.

[0068] Here, the remainder of the number of silent frames is, in the case of FIG. 8, the number of audio frames by which the minimum number of silent frames is exceeded. The current number of silent frames shown in FIG. 8, 7 frames, includes the remainder from the previous adjustment. In FIG. 8, the remainder to be added next time is 2 frames. The initial value of the number of silent frames is 0 frames.

[0069] Step 201 thus determines the number of silent frames including the previous remainder. The next step, step 202, compares the number of silent frames with the minimum number of silent frames. In the case of FIG. 8 (7 frames against a minimum of 5) the comparison is true, and processing proceeds to step 203.

[0070] In step 203, all 20 audio frames stored in the temporary buffer means are input to the audio signal output means. No audio frames in the temporary buffer means are discarded.

[0071] In the next step, step 204, the remainder of the number of silent frames to be added in step 201 at the next adjustment is calculated. In FIG. 8 it is 2 frames.

[0072] As shown in FIG. 8, when the number of audio frames discarded by the voiced frame selection means is equal to or greater than the minimum number of silent frames, the 20 audio frames in the temporary buffer means are input unchanged to the audio signal output means and reproduced against the minimum of 5 silent frames, so that fast-forward audio reproduction at 1.25 times speed is achieved. The audio in this case is 1.25-times-speed audio from which only audio frames with a judgment score value of 0 have been discarded.
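The FIG. 8 path (steps 201 to 204) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented implementation: the names `MIN_SILENT_FRAMES` and `adjust_enough_silence` are hypothetical, and the comparison in step 202 is taken to be "greater than or equal", as the text describes for FIG. 8.

```python
from typing import Tuple

MIN_SILENT_FRAMES = 5  # one quarter of the 20 buffered frames, as in the text


def adjust_enough_silence(discarded_now: int, carry: int) -> Tuple[bool, int]:
    """Sketch of steps 201, 202 and 204 for the FIG. 8 case.

    discarded_now: frames dropped as silent since the last adjustment.
    carry: remainder of silent frames set by the previous adjustment.
    Returns (buffer_is_output_unchanged, carry_for_next_adjustment).
    """
    silent = discarded_now + carry            # step 201: add previous remainder
    if silent >= MIN_SILENT_FRAMES:           # step 202: compare with the minimum
        # Step 203 would output all 20 buffered frames unchanged.
        return True, silent - MIN_SILENT_FRAMES   # step 204: new remainder
    # The FIG. 9 path (steps 205 to 207) is not modelled here; it ends by
    # resetting the remainder to 0.
    return False, 0


# FIG. 8: 7 silent frames in total (here 5 new plus a carry of 2) leave 2 over.
print(adjust_enough_silence(5, 2))  # (True, 2)
```

Carrying the surplus forward means that silence removed beyond the minimum in one adjustment counts toward the next one, which keeps the long-run ratio of kept to discarded frames at 4:1.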

[0073] Next, the case of FIG. 9, in which the number of audio frames discarded by the voiced frame selection means is smaller than the minimum number of silent frames, is described. Steps 201 and 202 are the same as above, so their description is omitted and the flow from step 205 is described.

[0074] FIG. 9 is the case where the minimum number of silent frames is larger than the number of silent frames, which is 3 frames. If the 20 audio frames in the temporary buffer means were input to the audio signal output means as they are, 20 audio frames would be reproduced against only 3 silent frames, giving fast-forward audio reproduction slower than 1.25 times speed. Therefore, in step 205, as many new audio frames as the shortfall from the minimum number of silent frames are processed, and the voiced frame selection means either discards each of them or stores it in the temporary buffer means. FIG. 9 shows the case where, of the 2 new frames, 1 frame is discarded, raising the number of silent frames from 3 to 4, and 1 frame is stored in the temporary buffer means, so that 21 audio frames are held in the temporary buffer means.

[0075] After the new audio frames have been processed in step 205, step 206 discards, from the audio frames stored in the temporary buffer means, as many frames as were newly stored there. In the case of FIG. 9, one audio frame was newly stored, so one audio frame is discarded. The discard uses the method described in Embodiment 1 with reference to FIG. 5: one frame is discarded on the basis of the judgment score values.
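The score-based discard of step 206 can be sketched as below. The helper `discard_lowest_scores` and the `(frame_id, score)` tuple layout are hypothetical. The sketch assumes that the frames with the lowest judgment scores are the ones dropped (consistent with keeping frames in decreasing order of score) and that ties are broken in favour of dropping the earlier frame, which the text does not specify.

```python
from typing import List, Tuple

Frame = Tuple[int, int]  # (frame_id, judgment_score); layout is an assumption


def discard_lowest_scores(frames: List[Frame], count: int) -> List[Frame]:
    """Drop `count` frames with the lowest scores, preserving playback order."""
    if count <= 0:
        return list(frames)
    # Indices ordered by ascending score; sorted() is stable, so among equal
    # scores the earlier frame is dropped first (tie-break is an assumption).
    order = sorted(range(len(frames)), key=lambda i: frames[i][1])
    drop = set(order[:count])
    return [frame for i, frame in enumerate(frames) if i not in drop]


buffered = [(0, 3), (1, 1), (2, 2), (3, 1)]
print(discard_lowest_scores(buffered, 1))  # [(0, 3), (2, 2), (3, 1)]
```

Selecting by index rather than sorting the frames themselves keeps the surviving frames in their original time order, which playback requires.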

[0076] After the audio frames are discarded in step 206, 20 audio frames remain in the temporary buffer means, and these 20 audio frames are input to the audio signal output means.

[0077] Finally, in step 207, the remainder of silent frames to be added next time is set to 0.

[0078] In the case of FIG. 9, where few audio frames are discarded by the voiced frame selection means, audio frames are discarded on the basis of the judgment score values as described in Embodiment 1 with reference to FIG. 5. As in the case of FIG. 8, a total of 5 audio frames end up discarded against the 20 audio frames that are input to the audio signal output means and reproduced, so fast-forward audio reproduction at 1.25 times speed is achieved.
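The FIG. 9 path (steps 205 to 207) can be traced with a small simulation. Everything here is an illustrative sketch: `adjust_short_of_silence` and the `(frame_id, score)` layout are hypothetical, a score of 0 is taken to mark a silent frame (the text says only score-0 frames are dropped by the selection stage), and the lowest-scoring buffered frames are assumed to be the ones discarded in step 206.

```python
from typing import Iterator, List, Tuple

Frame = Tuple[int, int]  # (frame_id, judgment_score); layout is an assumption


def adjust_short_of_silence(buffer: List[Frame], silent: int,
                            incoming: Iterator[Frame],
                            min_silent: int = 5) -> Tuple[List[Frame], int, int]:
    """Sketch of steps 205 to 207; returns (kept, total_discarded, carry)."""
    buffer = list(buffer)
    newly_stored = 0
    for _ in range(min_silent - silent):      # step 205: process the shortfall
        frame = next(incoming)
        if frame[1] == 0:                     # silent frame: discard it
            silent += 1
        else:                                 # voiced frame: store it
            buffer.append(frame)
            newly_stored += 1
    # Step 206: drop as many buffered frames as were newly stored, lowest
    # score first, keeping the survivors in playback order.
    order = sorted(range(len(buffer)), key=lambda i: buffer[i][1])
    drop = set(order[:newly_stored])
    kept = [f for i, f in enumerate(buffer) if i not in drop]
    carry = 0                                 # step 207: reset the remainder
    return kept, silent + newly_stored, carry


# FIG. 9: 20 buffered frames, 3 silent so far, 2 more frames to process.
start = [(i, 1 + i % 3) for i in range(20)]
kept, discarded, carry = adjust_short_of_silence(
    start, silent=3, incoming=iter([(20, 0), (21, 2)]))
print(len(kept), discarded, carry)  # 20 5 0
```

The trace reproduces the numbers in the text: one new frame is dropped as silent (3 becomes 4), one is stored (21 in the buffer), one buffered frame is then dropped by score, leaving 20 frames to play against 5 discarded.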

[0079] In Embodiment 1 described above, a predetermined number of audio frames is discarded on the basis of the judgment score values to achieve variable-speed audio reproduction. Compared with the conventional method, which simply discards the temporally newest audio frames, this approach chooses the frames to discard from the content of the audio frames, that is, from the judgment score values, and so achieves fast-forward audio reproduction with fewer dropped keywords.

[0080] Although Embodiment 2 has been described for fast-forward audio reproduction at 1.25 times speed, fast-forward audio reproduction at an arbitrary speed is possible, as in Embodiment 1, by changing the number of audio frames adjusted by the temporary buffer management means 7.
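The text does not give a formula for other speeds, but one can be derived from the 1.25-times example under the assumption that the speed equals (buffered frames + silent frames) / buffered frames. The function `min_silent_frames` below is a hypothetical helper built on that assumption.

```python
from fractions import Fraction


def min_silent_frames(buffer_frames: int, speed) -> int:
    """Minimum silent-frame count for a target speed (derived, not from the text).

    Solves speed = (buffer_frames + n) / buffer_frames for n and requires the
    result to be a whole number of frames.
    """
    speed = Fraction(speed)
    if speed < 1:
        raise ValueError("fast-forward speed must be at least 1")
    n = buffer_frames * (speed - 1)
    if n.denominator != 1:
        raise ValueError("speed does not give a whole number of frames")
    return int(n)


print(min_silent_frames(20, Fraction(5, 4)))  # 5
print(min_silent_frames(20, "1.5"))           # 10
```

Exact fractions are used instead of floats so that speeds such as 1.1 do not produce rounding errors when checking for a whole number of frames.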

[0081] In FIG. 10, the same fast-forward audio reproduction is achieved even if steps 203 and 204 are interchanged, and likewise even if steps 207 and 208 are interchanged.

[0082]

EFFECTS OF THE INVENTION As described above, providing a temporary buffer management means that discards audio frames flexibly keeps the loss of audio signals containing important keywords to a minimum and achieves fast-forward audio reproduction that is easy to listen to.

BRIEF DESCRIPTION OF THE DRAWINGS

FIG. 1 is a block diagram showing the configuration of the audio processing device in Embodiment 1.

FIG. 2 is a schematic diagram showing the process of generating audio frames in FIG. 1.

FIG. 3 is a schematic diagram showing the operation of the filter bank 3 and the voiced determination target band selection means 4 in FIG. 1.

FIG. 4 is an explanatory diagram of the judgment score value in FIG. 1.

FIG. 5 is a schematic diagram showing the adjustment of the number of audio frames by the temporary buffer management means 7 in FIG. 1.

FIG. 6 is a block diagram of the audio processing device in Embodiment 2.

FIG. 7 is a schematic diagram showing the selection of audio frames by the voiced frame selection means 9 in FIG. 6.

FIG. 8 is a schematic diagram showing the adjustment of the number of audio frames by the temporary buffer management means 10 in FIG. 6.

FIG. 9 is a schematic diagram showing the adjustment of the number of audio frames by the temporary buffer management means 10 in FIG. 6.

FIG. 10 is a flowchart showing the adjustment of the number of audio frames by the temporary buffer management means 10 in FIG. 6.

FIG. 11 is a block diagram showing a conventional example.

REFERENCE SIGNS LIST

1 audio signal input means
2 audio frame generation means
3 filter bank
4 voiced determination target band selection means
5 voiced frame selection means
6 temporary buffer means
7 temporary buffer management means
8 audio signal output means

Continuation of front page: (72) Inventor: Atsushi Ishizu, c/o Matsushita Electric Industrial Co., Ltd., 1006 Oaza Kadoma, Kadoma-shi, Osaka

Claims


1. An audio signal processing device comprising: audio signal input means that receives an audio signal as input; audio frame generation means that divides the audio signal from the audio signal input means; a filter bank that divides the audio signal divided by the audio frame generation means into arbitrary frequency bandwidths; voiced determination target band selection means that selects, from the audio signal divided into frequency bands by the filter bank, an audio signal containing an arbitrary frequency band; voiced frame determination means that determines whether the band-limited audio signal selected by the voiced determination target band selection means is voiced or silent; temporary buffer means that temporarily stores the audio signal from the audio frame generation means; temporary buffer management means that manages the audio signal stored in the temporary buffer means, in units of divided audio signals, on the basis of the determination result of the voiced frame determination means and the number of divided audio signals stored in the temporary buffer means; and audio signal output means that outputs the audio signal from the temporary buffer means.

2. The audio signal processing device according to claim 1, further comprising voiced frame selection means that selects voiced audio signals from the audio frame generation means on the basis of the determination result of the voiced frame determination means, and comprising, in place of said temporary buffer management means, temporary buffer management means that manages the divided audio signals stored in the temporary buffer means, in units of divided audio signals, on the basis of the number of divided audio signals stored in the temporary buffer means.