JP3189597B2

JP3189597B2 - Audio time base converter

Info

Publication number: JP3189597B2
Application number: JP26020694A
Authority: JP
Inventors: 正之三崎; 武志則松; 和彦佐藤
Original assignee: Panasonic Corp; Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Corp; Panasonic Holdings Corp
Priority date: 1994-10-25
Filing date: 1994-10-25
Publication date: 2001-07-16
Anticipated expiration: 2016-07-16
Also published as: JPH08123483A

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Field of the Invention] The present invention relates to an audio time base conversion device that can expand the time axis of audio by an arbitrary amount, as is required when audio is played back at low speed on a video tape recorder (VTR) or similar device.

[0002]

[Description of the Related Art] Audio time base conversion devices that play back an audio signal recorded at one speed at a different speed have long existed. A tape recorder, for example, can raise or lower the playback speed by adjusting the tape transport speed. However, the pitch changes together with the playback speed, which makes the content hard to follow. Audio time base conversion devices that can change the playback speed without changing the pitch have therefore been proposed.

[0003] A conventional audio time base conversion device is described below with reference to the drawings. FIG. 4 is a block diagram showing the configuration of a conventional audio time base conversion device. In FIG. 4, 1 is a recording/reproducing unit that records and reproduces an acoustic signal, 2 is an A/D converter that converts the reproduced analog signal into a digital signal, 3 is a buffer memory that stores the digital data, 4 is a D/A converter, 5 is a write control unit that controls writing of data into the buffer memory, and 6 is a read control unit that controls reading of data from the memory.

[0004] The operation of the audio time base conversion device configured as above is described below. The case described here is one in which an audio signal is played back at or below the speed at which it was written to the recording medium, and the device restores the pitch to that at the time of recording.

[0005] First, the recording/reproducing unit 1 reproduces the acoustic signal at M times (0 < M < 1) the recording speed. The recording/reproducing unit here is, for example, a VTR or a tape recorder. Next, the acoustic signal reproduced by the recording/reproducing unit 1 is converted into a digital signal by the A/D converter 2 with a sampling period T/M, which is inversely proportional to the playback speed. Here, T is a sampling period that satisfies the sampling theorem for the acoustic signal at the time of recording; for an acoustic signal played back at M-times speed, the period becomes 1/M times T. The write control unit 5 sequentially writes these A/D-converted digital signals into the buffer memory 3 at period T/M. If each digital signal stored in the buffer memory 3 were read out and played back with period T, the original pitch would be restored; however, there would not be enough input data to keep the output continuous, and gaps would appear in time. The read control unit 6 therefore compensates for the missing data by providing intervals in which the digital signal stored in the buffer memory 3 is read out twice in succession, in frame units of several tens of milliseconds. The D/A converter 4 converts the digital signal read out by the read control unit 6 into an analog signal with sampling period T. Through this series of processing, audio time base conversion is achieved without changing the pitch. The technique of converting only the speed while keeping the pitch constant, as described here, is explained in detail in, for example, Kosaka, Yokobori, and Fujita, "A Tape Recorder That Compresses/Expands the Time Axis of Conversation," Nikkei Electronics (July 26, 1976).

[0006] FIG. 5 shows an example of processing at 1/2 speed. (a) shows the data at the time of recording, and (b) shows the temporal positions of the data as they are stored in the buffer memory. Playing back each block of (b) twice in succession with sampling period T yields the data sequence (c), which has the same pitch as the data sequence (a) but a time scale twice as long.
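The block repetition of FIG. 5 can be sketched in a few lines. This is a minimal illustration, assuming audio held as a Python list of integer samples; the function name `repeat_frames` is hypothetical, and the two-sample frames in the example stand in for the frames of several tens of milliseconds used in practice.

```python
def repeat_frames(samples, frame_len, repeats=2):
    """Conventional expansion: output every frame `repeats` times in a row.

    With repeats=2 the duration doubles (the 1/2-speed case of FIG. 5),
    while each frame is still played at the recording sampling period T,
    so the pitch is unchanged.
    """
    out = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        for _ in range(repeats):
            out.extend(frame)
    return out


# Frames [1, 2] and [3, 4] are each played twice.
print(repeat_frames([1, 2, 3, 4], frame_len=2))  # [1, 2, 1, 2, 3, 4, 3, 4]
```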

[0007]

[Problems to Be Solved by the Invention] In the conventional example above, the pitch at the time of recording is preserved, and the audio speed equals the playback speed of the recording medium, that is, it is slower than at recording. When the user slows down the playback speed of the recording medium, for example to examine image information slowly and in detail on a VTR, the conventional time base conversion device slows the audio down in the same way as the picture. The range over which human speaking rate can be varied without sounding unnatural is said to be roughly 0.75 to 1.5 times. Consequently, if the playback speed of the recording medium is made very slow, mainly so that the image information can be viewed slowly and in detail, the reproduced audio becomes slower than necessary, sounds unnatural, and is actually harder to understand. Conversely, if one tries to avoid this by listening to the audio at a speed higher than the current playback speed of the recording medium, there is not enough audio data in time to play. In that case, intervals with missing audio data occur periodically, and even if these intervals are filled with, for example, silent data, the audio signal becomes discontinuous and the reproduced sound is extremely unnatural.
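The size of this shortage follows from simple bookkeeping. In wall-clock time t the medium delivers M·t seconds of recorded content, while playing audio at speed v consumes v·t seconds of it, so for v > M a fraction 1 − M/v of the output time has no data. A small sketch of that arithmetic (the function name is illustrative, not from the source):

```python
from fractions import Fraction


def missing_fraction(m, v):
    """Fraction of output time with no audio data when the medium runs at
    speed m (relative to recording) but audio is played at speed v."""
    m, v = Fraction(m), Fraction(v)
    if v <= m:
        return Fraction(0)  # supply keeps up with demand
    return 1 - m / v


# Tape at 1/2 speed, speech at normal speed: half the output time is empty.
print(missing_fraction(Fraction(1, 2), 1))  # 1/2
```

Even at the slow end of the comfortable range, v = 0.75 with M = 1/2, one third of the output time would have no data, which is why simply playing the audio faster than the medium does not work.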

[0008] The present invention solves the above problems. Its object is to provide an audio time base conversion device that, when a recording medium is read at a playback speed equal to or lower than the recording speed, does not slow the audio down more than necessary and produces no discontinuities, so that the listener hears audio that is easy to understand.

[0009]

[Means for Solving the Problems] To solve the above problems, the audio time base conversion device according to claim 1 comprises: a reproducing unit that reads an acoustic signal stored on a recording medium at M times (0 < M < 1) the recording speed; an A/D converter that converts the analog signal read by the reproducing unit into a digital signal; an input buffer that stores the output data of the A/D converter; a sound/silence determination unit that determines sound intervals and silent intervals of the acoustic signal; a time axis control unit that performs time-axis expansion on the audio data in the input buffer; an output buffer that stores the output data of the time axis control unit; a data remaining amount monitoring unit that measures the remaining amount of data stored in the output buffer; an expansion ratio control unit that determines the expansion ratio of the time-axis expansion independently for sound intervals and silent intervals according to the remaining data amount obtained from the data remaining amount monitoring unit; and a D/A converter that converts the audio data stored in the output buffer into an analog signal. Based on the output signal of the sound/silence determination unit and the output signal of the expansion ratio control unit, the time axis control unit expands the audio data in the input buffer along the time axis with an expansion ratio for silent intervals that is larger than the expansion ratio for sound intervals.

【００１０】[0010]

[0011] The audio time base conversion device according to claim 2 comprises a time axis control unit that expands the audio data in the input buffer along the time axis with an expansion ratio of 1/M or more for silent intervals and an expansion ratio of at least 1.0 and at most 1/M for sound intervals.

[0012] The audio time base conversion device according to claim 3 comprises a time axis control unit that, when the remaining amount of data in the output buffer is equal to or less than a predetermined value, expands the audio data in the input buffer along the time axis with an expansion ratio of 1/M or more for silent intervals and an expansion ratio of 1/M for sound intervals.
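The ratio ranges of claims 2 and 3 can be expressed as a small selection function. This is a sketch under stated assumptions: `select_ratios`, the nominal ratios, and the threshold value are hypothetical, and the default silent ratio of 2/M is only an illustrative choice; the claims fix only the ranges.

```python
from fractions import Fraction


def select_ratios(remaining, threshold, m, voiced_nominal=Fraction(1),
                  silent_nominal=None):
    """Pick (sound_ratio, silent_ratio) within the ranges of claims 2 and 3.

    remaining: samples left in the output buffer.
    threshold: the 'predetermined value' of claim 3 (hypothetical number).
    m: playback speed relative to recording, 0 < m < 1.
    """
    inv_m = 1 / Fraction(m)
    if silent_nominal is None:
        silent_nominal = 2 * inv_m  # illustrative choice only
    # Claim 2: silent >= 1/M and 1.0 <= sound <= 1/M.
    silent = max(Fraction(silent_nominal), inv_m)
    sound = min(max(Fraction(voiced_nominal), Fraction(1)), inv_m)
    # Claim 3: when the buffer is nearly empty, sound intervals also get 1/M.
    if remaining <= threshold:
        sound = inv_m
    return sound, silent


print(select_ratios(1000, 200, Fraction(2, 3)))
print(select_ratios(100, 200, Fraction(2, 3)))
```

Exact `Fraction` arithmetic is used so that 1/M for M = 2/3 is exactly 3/2 rather than a rounded float.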

[0013]

[Operation] With the above configuration, time-axis expansion is performed with a larger expansion ratio for silent intervals than for sound intervals, based on the result of the sound/silence determination, and the result is then written into the buffer memory. At this time the remaining amount of data held in the buffer memory is measured, and the expansion ratio is raised as the remaining amount decreases; the expansion ratio is also adjusted automatically when silent intervals make up only a small proportion of the signal, so that the buffer memory always holds enough data. As a result, sound intervals can be played back at a speed as close as possible to the recording speed, and a slow-playback sound that is easy to understand is obtained.

[0014] When the remaining data amount, that is, the number of data samples left in the buffer memory, is extremely small, sound intervals are also expanded along the time axis with the expansion ratio 1/M to prevent dropouts; otherwise, the expansion ratio of silent intervals is adjusted according to the remaining data amount. In this way the audio is played back at a predetermined fixed speed, while the output signal is never interrupted by the buffer memory running empty, giving a natural reproduced sound with no sense of incongruity.

[0015]

[Embodiments] A first embodiment of the present invention is described below with reference to the drawings.

[0016] FIG. 1 is a block diagram showing the configuration of the audio time base conversion device according to the first embodiment of the present invention.

[0017] In FIG. 1, 101 is a recording/reproducing unit that records and reproduces an acoustic signal; 102 is an A/D converter that converts the analog signal reproduced by the recording/reproducing unit 101 into a digital signal; 103 is an input buffer that temporarily stores the A/D-converted acoustic signal; 104 is a sound/silence determination unit that determines whether the digital signal sequence read out of the input buffer is a sound interval or a silent interval; 105 is a time axis control unit that performs time-axis expansion on the signal read out of the input buffer with a given expansion ratio; 106 is a read control unit that controls reading of data from the input buffer and the read addresses; 107 is a write control unit that controls writing of data into the output buffer and the write addresses; 108 is an output buffer that temporarily stores the data processed by the time axis control unit; 109 is a data remaining amount monitoring unit that monitors the size of the data temporarily held in the output buffer; 110 is an expansion ratio control unit that determines the expansion ratio of the time axis control unit according to the output of the data remaining amount monitoring unit; and 111 is a D/A converter that converts the digital data stored in the output buffer into an analog signal.

[0018] The operation of the audio time base conversion device configured as above is described in detail below with reference to FIG. 1.

[0019] First, the acoustic signal is read from the recording/reproducing unit 101 at M times (0 < M < 1) the recording speed. Hereafter, "speed" denotes speed relative to the recording speed. If the sampling period at the time of recording in the recording/reproducing unit 101 is T, the acoustic signal reproduced at M-times speed by the recording/reproducing unit 101 is successively converted by the A/D converter 102 into a digital signal sequence with sampling period T/M and written into the input buffer 103. The D/A converter 111, on the other hand, converts to an analog signal with the same sampling period T as at recording, so per unit time the output buffer must always have ready 1/M times as many samples as the input delivers. The basic idea is not to apply the same time-axis expansion ratio to the whole input signal, but to expand silent intervals with a larger ratio than sound intervals, thereby lowering the expansion ratio of the sound intervals.
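The rate bookkeeping in this paragraph can be made concrete. A minimal sketch, assuming a recording sampling rate of 48 kHz purely as an example value (the source does not state one); both function names are hypothetical. It shows the A/D and D/A rates, and the net change of the output buffer while a block of input samples arrives: a ratio of exactly 1/M leaves the buffer level unchanged, a smaller ratio drains it, a larger one fills it.

```python
from fractions import Fraction


def sample_rates(fs, m):
    """Return (A/D rate, D/A rate) in Hz for recording rate fs and speed m.

    The A/D converter samples with period T/M, i.e. at rate m * fs, while
    the D/A converter runs with the recording period T, i.e. at rate fs.
    """
    return Fraction(m) * fs, Fraction(fs)


def buffer_growth(n_in, ratio, m):
    """Net change of the output buffer while n_in input samples arrive.

    The time axis control unit writes ratio * n_in samples, and during the
    same wall-clock time the D/A converter drains n_in / m samples.
    """
    return ratio * n_in - Fraction(n_in) / Fraction(m)


m = Fraction(2, 3)
print(sample_rates(48000, m))           # A/D at 32 kHz, D/A at 48 kHz
print(buffer_growth(100, 1, m))         # ratio 1.0 drains the buffer
print(buffer_growth(100, 1 / m, m))     # ratio 1/M keeps it level
```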

[0020] From the signal sequence read out of the input buffer, the sound/silence determination unit 104 determines whether the sample sequence belongs to a sound interval or a silent interval. This sound/silence determination can easily be made with known techniques. Based on the determination result, the time axis control unit 105 applies time-axis expansion to the data read out of the input buffer and outputs it to the output buffer 108. Silent intervals are expanded with the silent-interval expansion ratio, and sound intervals with the sound-interval expansion ratio. The expansion ratio control unit 110 sets these expansion ratios based on the remaining data amount obtained by the data remaining amount monitoring unit 109.
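The text leaves the sound/silence determination to known techniques. One common and simple stand-in is a short-term energy threshold; the sketch below assumes that approach, with a hypothetical function name, frame length, and threshold. It is not presented as the method used in the embodiment.

```python
def classify_frames(samples, frame_len, threshold):
    """Label each frame True (sound interval) or False (silent interval).

    Uses mean-square energy against a fixed threshold, which is only one
    simple stand-in for the known techniques the text refers to.
    """
    labels = []
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        energy = sum(x * x for x in frame) / len(frame)
        labels.append(energy > threshold)
    return labels


# A quiet frame followed by a loud one.
print(classify_frames([0, 0, 0, 0, 100, -100, 90, -90], 4, 10.0))
```

Each label would then select either the sound-interval or the silent-interval expansion ratio for that frame.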

[0021] The data remaining amount monitoring unit monitors the amount of data that has been written into the output buffer but not yet output to the D/A converter 111, and the sound-interval and silent-interval expansion ratios are determined from this remaining data amount. Adjusting the expansion ratios according to how much data has accumulated in the output buffer thus prevents the output buffer from running empty.
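"Written but not yet output" is naturally the difference between a write counter and a read counter. The sketch below models the output buffer as a ring buffer under that assumption; the class and method names are hypothetical, and the exceptions mark the two failure cases the text wants to avoid (overflowing the capacity, and running empty).

```python
class OutputBuffer:
    """Ring buffer whose fill level plays the role of the remaining data
    amount watched by the data remaining amount monitoring unit (109)."""

    def __init__(self, capacity):
        self.capacity = capacity
        self._data = [0] * capacity
        self._write = 0  # total samples written via the write control unit
        self._read = 0   # total samples handed to the D/A converter

    def remaining(self):
        """Samples written but not yet output."""
        return self._write - self._read

    def write(self, samples):
        if self.remaining() + len(samples) > self.capacity:
            raise OverflowError("output buffer would overflow")
        for x in samples:
            self._data[self._write % self.capacity] = x
            self._write += 1

    def read(self, n):
        if n > self.remaining():
            raise RuntimeError("underrun: output would be interrupted")
        out = [self._data[(self._read + i) % self.capacity] for i in range(n)]
        self._read += n
        return out


buf = OutputBuffer(4)
buf.write([1, 2, 3])
print(buf.remaining(), buf.read(2))  # 3 [1, 2]
```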

[0022] The relationship between the remaining data amount and the expansion ratio may be given by a linear function as in FIG. 2(a), for example, or may change stepwise. In the example of FIG. 2(a), the closer the output buffer is to empty, the larger the expansion ratio, so that data accumulates more readily in the output buffer. The expansion ratio of silent intervals in particular is made large, so that the output buffer does not run empty even when the expansion ratio of sound intervals is lowered. In the example of FIG. 2(b), sound intervals are played back with an expansion ratio of 1, that is, at the same audio speed as at recording, unless the remaining data amount reaches 0. With the sound-interval expansion ratio fixed at 1, the remaining data in the output buffer drops rapidly whenever sound intervals continue, so the expansion ratio of silent intervals is set generally larger to help data accumulate in the output buffer. Time-axis expansion can increase the amount of data so that the output buffer does not run empty, but an unnecessarily large expansion ratio would exceed the capacity of the output buffer, and the continuity of the output signal could no longer be maintained. The expansion ratio is therefore kept smaller as the remaining data amount increases.
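Both shapes of FIG. 2 can be sketched as lookup functions. The figure's actual values are not given in the text, so every number here is hypothetical: the linear version interpolates between an "empty" and a "full" setting, with the silent ratio falling to 1/M at a full buffer so the buffer cannot grow further; the stepwise version keeps the sound ratio at 1 unless the buffer is empty and reads the silent ratio from a table of (upper bound, ratio) steps.

```python
def linear_ratios(fill, m, silent_empty, sound_full=1.0):
    """FIG. 2(a)-style mapping: ratios fall linearly as the buffer fills.

    fill: remaining data / buffer capacity, clamped to [0, 1].
    Empty buffer: sound ratio 1/M, silent ratio silent_empty (> 1/M).
    Full buffer: sound ratio sound_full, silent ratio 1/M.
    """
    inv_m = 1.0 / m
    fill = min(max(fill, 0.0), 1.0)
    sound = inv_m + (sound_full - inv_m) * fill
    silent = silent_empty + (inv_m - silent_empty) * fill
    return sound, silent


def stepwise_ratios(remaining, m, steps):
    """FIG. 2(b)-style mapping: sound ratio stays 1 unless the buffer is
    empty; silent ratio comes from ascending (upper_bound, ratio) steps."""
    sound = 1.0 / m if remaining == 0 else 1.0
    for upper, ratio in steps:
        if remaining <= upper:
            return sound, ratio
    return sound, steps[-1][1]


# Hypothetical table for M = 2/3 and a 2000-sample buffer.
steps = [(0, 3.0), (500, 2.5), (1500, 2.0), (2000, 1.5)]
print(stepwise_ratios(800, 2 / 3, steps))  # (1.0, 2.0)
```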

[0023] The operation is described below taking as an example the case where the playback speed of the recording medium is 2/3 of the recording speed (M = 2/3).

[0024] First, in the expansion ratio setting table of FIG. 2, the expansion ratio of sound intervals is set to 1.5 (= 1/M) when the remaining data amount is 0, which prevents the output buffer from running empty even if the input signal contains sound. When the remaining data amount is nearly equal to the output buffer capacity, the expansion ratio of silent intervals must be kept at 1.5 or less.

[0025] FIG. 3 schematically shows, along the time axis, the processing when silent intervals and sound intervals are expanded with different time-axis expansion ratios. (b) shows the case in which the audio is reproduced from the recording medium at 2/3 of the speed of the input signal at recording time shown in (a). Here, the expansion ratios of silent and sound intervals must be determined according to the proportion of silent intervals in the input signal. (c) and (d) show two examples with different proportions of silent intervals. Of input segments 1 to 6, example (c) treats 1, 2, and 3 as silent intervals and 4, 5, and 6 as sound intervals, while example (d) treats 1 and 2 as silent intervals and 3, 4, 5, and 6 as sound intervals. In both examples the expansion ratio of the sound intervals is 1.0, so the expansion ratio of the silent intervals is 2.0 in example (c) and 2.5 in example (d). If the proportion of silent intervals can be estimated in advance, as in these examples, the expansion ratios may simply be fixed, because the output buffer can then keep supplying output data to the D/A converter with neither excess nor shortage. However, the proportion of silence varies with the type of source being played back. Therefore, by monitoring the amount of data stored in the output buffer, determining the expansion ratios from that value, and letting the output buffer absorb temporary excesses and shortages of output data, the expansion ratios of silent and sound intervals can be set independently even for audio whose proportion of silence cannot be predicted.
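The figures 2.0 and 2.5 follow from requiring the output to be exactly 1/M times as long as the input. With six segments and M = 2/3 the output must span 9 segment lengths; the sound segments contribute their count times the sound ratio, and the silent segments must supply the rest. A short check of that arithmetic (the function name is illustrative):

```python
from fractions import Fraction


def silent_ratio_needed(n_silent, n_sound, sound_ratio, m):
    """Silent-interval ratio that makes the output exactly 1/M times longer
    than the input, given the sound-interval ratio (FIG. 3 bookkeeping)."""
    total_out = Fraction(n_silent + n_sound) / Fraction(m)
    return (total_out - sound_ratio * n_sound) / n_silent


m = Fraction(2, 3)
print(silent_ratio_needed(3, 3, 1, m))  # FIG. 3(c): 2
print(silent_ratio_needed(2, 4, 1, m))  # FIG. 3(d): 5/2
```

The formula needs the silence proportion in advance, which is exactly what is unknown for arbitrary sources; that is the motivation for steering the ratios from the buffer level instead.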

[0026] Time-axis conversion processing is described in detail in, for example, Suzuki and Misaki, "DSP Implementation of a High-Quality Speech Rate Conversion Method," IEICE Technical Committee on Speech, SP90-34 (August 23, 1990).
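The cited paper describes a higher-quality conversion method that is not reproduced here. As a crude stand-in only, the sketch below expands a segment by an arbitrary ratio of 1 or more by repeating whole frames, using a running target length to decide when to repeat; the function name and frame length are hypothetical, and frame repetition without smoothing can produce audible discontinuities in real audio.

```python
def expand_segment(samples, ratio, frame_len):
    """Stretch a segment to about ratio * len(samples) samples (ratio >= 1)
    by repeating whole frames, spreading the repeats evenly."""
    if ratio < 1:
        raise ValueError("this sketch only expands (ratio >= 1)")
    out = []
    target = 0.0  # output length the segment should have reached so far
    for start in range(0, len(samples), frame_len):
        frame = samples[start:start + frame_len]
        target += ratio * len(frame)
        out.extend(frame)  # every frame is played at least once
        # Repeat while that brings the output length closer to the target.
        while len(out) + len(frame) / 2 <= target:
            out.extend(frame)
    return out


print(len(expand_segment(list(range(8)), 1.5, 2)))  # 12
```

With ratio 2.0 this reduces to the conventional block repetition of FIG. 5; with ratio 1.0 the segment passes through unchanged.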

[0027] With this control of the expansion ratios, the expansion ratio used for time-axis expansion varies somewhat with the proportion of silent intervals, but the audio signal can be heard at a speed no higher than the audio speed at recording and higher than the playback speed of the recording medium.

[0028] As described above, according to this embodiment, the time-axis expansion ratios for sound and silent intervals are set independently based on the remaining data amount. When the remaining data amount is below a predetermined amount, the sound-interval expansion ratio is set to 1/M to prevent the output signal from being interrupted; otherwise the expansion ratios are controlled so that sound intervals play as close as possible to the audio speed at recording. As a result, even when the playback speed of the recording medium is slowed down, a reproduced sound that is natural and easy to understand is obtained.
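The whole loop can be checked with a frame-by-frame model. This is a sketch under assumptions: all numbers (frame length, capacity, threshold, initial fill, maximum silent ratio) are hypothetical, writing and draining are modeled as happening once per input frame, and the silent ratio follows a linear rule falling from `silent_empty` to 1/M as the buffer fills. The sound ratio is 1.0 unless the buffer is at or below the threshold, where claim 3 switches it to 1/M.

```python
from fractions import Fraction


def simulate(labels, m, frame_len, capacity, threshold, initial, silent_empty):
    """Frame-by-frame model of the FIG. 1 control loop (hypothetical numbers).

    labels: True for a sound input frame, False for a silent one.
    Each input frame of frame_len samples is expanded by the selected ratio
    and written to the output buffer; meanwhile the D/A converter drains
    frame_len / m samples. Returns the remaining data after every frame.
    """
    inv_m = 1 / Fraction(m)
    drain = frame_len * inv_m
    remaining = Fraction(initial)
    history = []
    for is_sound in labels:
        fill = remaining / capacity
        if is_sound:
            # Claim 3: fall back to 1/M when the buffer is nearly empty.
            ratio = inv_m if remaining <= threshold else Fraction(1)
        else:
            # Silent ratio falls linearly from silent_empty to 1/M.
            ratio = silent_empty + (inv_m - silent_empty) * fill
        remaining += round(ratio * frame_len)  # samples written to the buffer
        remaining -= drain                     # samples consumed by the D/A
        history.append(remaining)
    return history


labels = [True] * 20 + [False] * 10 + [True] * 30
history = simulate(labels, Fraction(2, 3), frame_len=100, capacity=2000,
                   threshold=200, initial=600, silent_empty=Fraction(3))
print(min(history), max(history))
```

In this model a sound frame at ratio 1.0 lowers the buffer by 50 samples, so any threshold of at least 50 keeps the level from ever going negative, while the silent rule adds less than the free space at every step, so the capacity is never exceeded.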

[0029]

[Effects of the Invention] As described above, in the present invention the sound/silence determination unit divides an acoustic signal reproduced at M times (0 < M < 1) the recording speed into sound intervals and silent intervals; the time axis control unit expands it along the time axis with expansion ratios set independently for sound and silent intervals and stores the result in the output buffer; and the expansion ratios of silent and sound intervals are determined by predetermined rules according to the remaining amount of data in the output buffer. By setting the silent-interval expansion ratio to 1/M or more and the sound-interval expansion ratio to at least 1.0 and at most 1/M, and varying each ratio independently, the audio speed in sound intervals can be made higher than the playback speed. Furthermore, by setting the sound-interval expansion ratio to 1/M when the remaining data amount is equal to or below a predetermined value and to a specified fixed value otherwise, and by determining the silent-interval expansion ratio, in the range of 1/M or more, from a conversion rule tied to the remaining data amount, the audio can be played back at a constant speed higher than the playback speed. Sound intervals can therefore be output at a speed closer to that at recording. In addition, because the data remaining amount monitoring unit is provided so that both the silent and sound expansion ratios, or the silent expansion ratio alone, can be adjusted according to the proportion of silent intervals, the output signal can be reproduced without interruption whatever input signal is given.

[0030] Thus, according to the present invention, even if the playback speed of the recording medium is slowed so that the picture can be viewed slowly, there is no need to listen to the audio at an unnecessarily slow speed, and an audio time base conversion device that enables natural, easy-to-understand slow playback can be provided.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the configuration of the audio time base conversion device according to an embodiment of the present invention.

FIG. 2 is an explanatory diagram of the expansion ratio setting table of the embodiment.

FIG. 3 is a schematic diagram of the time-axis expansion processing of the embodiment.

FIG. 4 is a block diagram of a conventional audio time base conversion device.

FIG. 5 is a schematic diagram of conventional time-axis expansion processing.

[Explanation of Symbols]

101 Recording/reproducing unit
102 A/D converter
103 Input buffer
104 Sound/silence determination unit
105 Time axis control unit
106 Read control unit
107 Write control unit
108 Output buffer
109 Data remaining amount monitoring unit
110 Expansion ratio control unit
111 D/A converter

Continuation of the front page
(56) References cited: JP-A-7-191695 (JP, A); JP-A-7-192392 (JP, A); JP-A-6-289895 (JP, A); JP-A-5-73089 (JP, A); JP-A-3-205656 (JP, A)
(58) Fields searched (Int. Cl.⁷, DB name): G10L 21/00 - 21/06

Claims

(57) [Claims]

1. An audio time base conversion device comprising: a reproducing unit that reads an acoustic signal stored on a recording medium at M times (0 < M < 1) the recording speed; an A/D converter that converts the analog signal read by the reproducing unit into a digital signal; an input buffer that stores the output data of the A/D converter; a sound/silence determination unit that determines sound intervals and silent intervals of the acoustic signal; a time axis control unit that performs time-axis expansion on the audio data in the input buffer; an output buffer that stores the output data of the time axis control unit; a data remaining amount monitoring unit that measures the remaining amount of data stored in the output buffer; an expansion ratio control unit that determines the expansion ratio of the time-axis expansion independently for sound intervals and silent intervals according to the remaining data amount obtained from the data remaining amount monitoring unit; and a D/A converter that converts the audio data stored in the output buffer into an analog signal; wherein the time axis control unit, based on the output signal of the sound/silence determination unit and the output signal of the expansion ratio control unit, expands the audio data in the input buffer along the time axis with an expansion ratio for silent intervals that is larger than the expansion ratio for sound intervals.

2. The audio time base conversion device according to claim 1, wherein the time axis control unit expands the audio data in the input buffer along the time axis with an expansion ratio of 1/M or more for silent intervals and an expansion ratio of at least 1.0 and at most 1/M for sound intervals.

3. The audio time base conversion device according to claim 1 or 2, wherein, when the remaining amount of data in the output buffer is equal to or less than a predetermined value, the time axis control unit expands the audio data in the input buffer along the time axis with an expansion ratio of 1/M or more for silent intervals and an expansion ratio of 1/M for sound intervals.