JPH08255000A

JPH08255000A - Voice signal reproducing device

Info

Publication number: JPH08255000A
Application number: JP7058719A
Authority: JP
Inventors: Teruo Hoshi; 照雄法師; Masanori Miyatake; 正典宮武; Junichi Umemoto; 順一梅本
Original assignee: Sanyo Electric Co Ltd
Current assignee: Sanyo Electric Co Ltd
Priority date: 1995-03-17
Filing date: 1995-03-17
Publication date: 1996-10-01

Abstract

PURPOSE: To shorten the time required to play back recorded voice while keeping it easy to listen to. CONSTITUTION: A DSP 22 detects the silent parts of the input voice data and writes the data into a ring memory 24 with those silent parts removed. Because data are read out of the memory 24 at a fixed speed, the reproduced voice is output from a speaker 28 at its normal pitch. When there are few silent parts, the amount of data written into the memory 24 becomes larger than the amount read out. In this case, the amount of data input to the DSP 22 is reduced by slowing the motor 34 and lowering the sampling frequency of the A/D converter 16 in step with the rotational speed of the motor 34. The amount of data written into the ring memory 24 therefore decreases, which prevents data from being overwritten in the ring memory 24 before they have been read out.

Description

Detailed Description of the Invention

[0001]

BACKGROUND OF THE INVENTION

1. Field of the Invention

The present invention relates to a voice signal reproducing device for efficiently playing back, in a short time, voice recorded at a conference or the like.

[0002]

2. Description of the Related Art

Recording devices such as tape recorders have long been used to record lectures, seminars, meetings, and the like. Cassette tape recorders that use compact cassettes are especially common for this purpose, because both the tapes and the recorders themselves are inexpensive and small.

[0003] The recorded tape is later played back. The person who made the recording uses it to check the content, and people who were not present use it to learn what was said. In actual playback, these recordings turn out to contain a considerable number of unnecessary pauses (periods with no voice input).

[0004] There is also a demand to play such tapes back in as short a time as possible. For this reason, devices with a voice-activation function, which record only while voice is being input, are known. With this function silent portions are not recorded, so playback takes less time. However, recording starts only after the device recognizes that voice is being input. The beginning of each utterance is therefore lost, and the content cannot be fully understood.

[0005] Alternatively, playing the tape at fast-forward speed shortens the playback time. However, simply fast-forwarding shifts the pitch of the voice upward (toward higher frequencies), which makes it very difficult to understand.

[0006] A speech speed conversion technique is therefore also known. It reproduces the voice at normal pitch and at a relatively slow rate even while the tape runs at fast-forward speed. In this technique, the pitch of the audio signal obtained by fast-forwarding the tape is returned to its original frequency, and the silent portions of the audio signal are detected and deleted. If enough material can be deleted, the voice can be played at the normal speaking rate despite fast-forward playback.

[0007] Even when playing at normal speed, listeners may want to hear some material more slowly, for example English conversation. In that case too, if the silent portions of the audio signal are detected and deleted, the voice can be stretched by the time those portions occupied, and so it can be played back more slowly.

[0008]

However, with the conventional approach above, when few silent portions are detected, the time-axis compression ratio of the voiced portions must be raised so that the output keeps up with the fast-forward playback speed. In general, compressing speech by 30% or more makes its meaning hard to follow. In addition, when playing at normal speed, the voice can hardly be stretched at all, so the speech speed conversion has little effect.
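The sketch below makes the 30% figure concrete by computing how much the voiced portions must be compressed at a given playback speed. It relies on two simplifying assumptions that the source does not state: every silent portion can be removed completely, and the output must last exactly 1/speed of the recorded time. The function name is hypothetical.

```python
def required_voiced_compression(speed: float, silence_fraction: float) -> float:
    """Fraction by which voiced audio must be shortened (hypothetical helper).

    Assumes a recording of unit length in which `silence_fraction` is silent,
    all of that silence is removed, and the output must last 1/speed of the
    recording. Returns 0.0 when removing the silence alone is enough.
    """
    if speed < 1.0 or not 0.0 <= silence_fraction < 1.0:
        raise ValueError("speed must be >= 1 and silence_fraction in [0, 1)")
    voiced = 1.0 - silence_fraction   # voiced time left after dropping silence
    target = 1.0 / speed              # time available for the output
    return max(0.0, 1.0 - target / voiced)


for s in (0.5, 0.3, 0.2, 0.1):
    ratio = required_voiced_compression(2.0, s)
    print(f"silence {s:.0%}: compress voiced parts by {ratio:.1%}")
```

Under these assumptions, double-speed playback of a recording that is less than about 29% silent already needs more than 30% compression of the voiced parts.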

SUMMARY OF THE INVENTION

[0009] The present invention was made to solve the above problems. Its object is to provide a voice reproducing device that can either shorten the playback time while keeping the speaking rate easy to follow, or convert the speaking rate into one that is easy to follow while keeping the playback speed unchanged.

[0010]

DISCLOSURE OF THE INVENTION

The present invention comprises three means:

- audio signal reproducing means, which reproduces and outputs an audio signal stored on an audio signal recording medium;
- time compression/expansion means, which compresses or expands the duration of the reproduced audio signal;
- reproduction control means, which controls the reproduction speed of the audio signal reproducing means according to the state of the compression/expansion processing in the time compression/expansion means.

[0011] The present invention is further characterized in that the time compression/expansion means performs silence omission processing, which compresses the silent portions of the audio signal.

[0012] The present invention is further characterized in that the audio signal reproducing means outputs an analog audio signal, and the device also has A/D conversion means that converts this analog audio signal into a digital signal. The digital audio signal output from the A/D conversion means is supplied to the time compression/expansion means.

[0013] The present invention is further characterized in that the A/D conversion means converts the analog signal into a digital signal according to a predetermined sampling clock. The device also has sampling clock control means that changes this sampling clock according to the reproduction speed of the audio signal reproducing means.

[0014] The present invention is further characterized as follows. The audio signal recording medium is a magnetic tape, and the audio signal reproducing means includes a tape feed motor that feeds the magnetic tape. The sampling clock control means controls the sampling clock according to the rotation speed of the tape feed motor.

[0015] The present invention is further characterized in that the time compression/expansion means has three parts. A silence omission processing section detects silent portions in the digital audio data from the A/D conversion means and omits them. An audio memory always holds the most recent predetermined period of audio data from which silent portions have been omitted. A read output section reads out and outputs the audio data stored in this audio memory.

[0016] The present invention is further characterized in that the time compression/expansion means includes remaining memory recognition means. This means recognizes the remaining memory capacity, defined as the amount of audio data in the audio memory that has already been read out. The reproduction control means controls the reproduction speed according to the recognized remaining memory capacity.

[0017] The present invention is further characterized in that the time compression/expansion means performs time-axis compression/expansion on voiced portions. It either thins out part of the repetitive waveform of the audio data or adds further repetitions of that waveform.

[0018] The present invention further has speech rate input means, which receives a speech rate command from outside, and speech rate detection means, which detects the speech rate of the audio signal reproduced by the audio signal reproducing means. The time compression/expansion means thins out or adds waveform data so that the speech rate after conversion matches the rate specified by the input command.

[0019] The present invention is further characterized in that the time compression/expansion means has speech rate detection means, which detects the speech rate of the audio signal reproduced by the audio signal reproducing means. The target speech rate is then adjusted before time-axis compression/expansion is performed. When the detected speech rate is fast, the target is set somewhat faster than a predetermined rate. When the detected speech rate is slow, the target is set somewhat slower than the predetermined rate.

[0020] The present invention is further characterized in that the speech rate detection means detects the speech rate by counting the number of vowels produced per unit time.

[0021] The present invention is further characterized in that the speech rate detection means detects the speech rate by measuring how long a single vowel is sustained.

[0022]

The audio signal reproducing means reproduces and outputs the audio signal stored on the recording medium. The time compression/expansion means applies time compression/expansion to the reproduced audio signal. The reproduction control means then controls the reproduction speed of the reproducing means according to how much compression or expansion is taking place. For example, suppose the time compression/expansion means omits silent portions. The reproduction speed is then raised when there are many silent portions, and lowered when there are few silent portions and little can be omitted. As a result, the amount of data output by the time compression/expansion means approaches a constant value.

[0023] Because the A/D conversion means supplies a digital signal to the time compression/expansion means, the time compression/expansion can be carried out with ordinary memory and similar components.

[0024] The sampling clock of the A/D conversion means is synchronized with the reproduction speed. As a result, the rate at which the recorded input signal is converted into digital samples stays constant even when the reproduction speed changes.

[0025] When the audio signal recording medium is a magnetic tape, the reproduction speed is controlled by controlling the rotation speed of the motor that drives the tape. A pulse generated according to the motor rotation speed can serve as the sampling clock of the A/D conversion means.

[0026] Audio data that has undergone time compression/expansion, such as the omission of silent portions, is stored sequentially in the audio memory. Reading it out at a predetermined frequency yields audio data of shortened duration.

[0027] Time-compressed/expanded audio data is written into the audio memory and read out again at a predetermined speed. The read speed must be constant, because the data must be read out so that the pitch comes out correct. The following quantities are therefore equivalent:

- the amount of data in the audio memory that has already been read;
- the amount of space in the audio memory that can be written;
- the remaining memory capacity.

This quantity varies with the degree of time compression/expansion applied to the incoming audio signal. In the present invention, the reproduction speed is controlled according to the remaining memory capacity, which keeps that capacity at a roughly constant value.

[0028] Time compression/expansion can also be achieved by speech speed conversion, which deletes silent portions and compresses or expands the voiced portions along the time axis.

[0029] Time-axis compression/expansion by speech speed conversion allows the voice to be played back at any desired speed. The preferred speed differs from user to user. The user therefore specifies the degree of speech speed conversion, and playback runs at the speed the user specified.

[0030] The speech rate of the audio signal reproduced by the audio signal reproducing means can also be detected and taken into account. Time-axis compression then preserves the local fast and slow passages of that signal (for example, the signal obtained by playing a recorded tape). Normally, the target speech rate for time-axis compression is a predetermined, preset rate. Here, however, the target is adjusted according to the detected speech rate. If the detected rate is fast, the target rate is made faster; if the detected rate is slow, the target rate is made slower.
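The source states only the direction of this adjustment, not the rule. The sketch below uses an assumed power-law blend to show one way a detected rate could shift the target. All names and the `alpha` parameter are hypothetical. `time_scale` gives the ratio of output duration to input duration for the voiced audio.

```python
def target_rate(detected: float, preset: float, nominal: float,
                alpha: float = 0.5) -> float:
    """Target speech rate nudged toward the speaker's local rate (assumed rule).

    detected: local rate measured from the tape (e.g. vowels per second)
    preset:   the normally used target rate
    nominal:  an assumed 'average' speaking rate
    alpha:    0 ignores the detected rate; 1 scales the target fully with it
    """
    return preset * (detected / nominal) ** alpha


def time_scale(detected: float, preset: float, nominal: float,
               alpha: float = 0.5) -> float:
    """Output duration / input duration for voiced audio (< 1 means compression)."""
    return detected / target_rate(detected, preset, nominal, alpha)


# A fast talker gets a faster target than the preset, a slow talker a slower one.
print(target_rate(8.0, preset=9.0, nominal=6.0))
print(target_rate(4.0, preset=9.0, nominal=6.0))
```

With `alpha` between 0 and 1, a fast passage still ends up faster than a slow one after conversion, which is the property [0030] asks for.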

[0031] The speech rate of the audio signal can also be detected by counting the number of vowels per unit time.

[0032] The speech rate can also be detected by measuring the duration of a single vowel. A vowel usually consists of one repetitive waveform repeated many times. Counting the repetitions of this waveform therefore shows how fast or slow the speech is, that is, the speech rate. The compression/expansion in speech speed conversion is preferably performed by thinning out or adding data in units of this waveform.
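Waveform-unit thinning and addition can be sketched as follows. This is a minimal illustration only. It assumes the repetition period (the pitch period, in samples) is already known; estimating it is not shown. Whole periods are dropped or repeated with no smoothing at the joins, whereas a practical implementation would typically cross-fade there. Function names are hypothetical.

```python
from typing import List, Sequence


def split_periods(signal: Sequence[float], period: int) -> List[List[float]]:
    """Cut a vowel segment into consecutive one-period blocks."""
    return [list(signal[i:i + period]) for i in range(0, len(signal), period)]


def count_periods(signal: Sequence[float], period: int) -> int:
    """Number of complete repetitions: a proxy for how long the vowel lasts."""
    return len(signal) // period


def thin_periods(signal: Sequence[float], period: int, drop_every: int) -> List[float]:
    """Compress: drop every `drop_every`-th whole period."""
    out: List[float] = []
    for n, block in enumerate(split_periods(signal, period), start=1):
        if n % drop_every != 0:
            out.extend(block)
    return out


def add_periods(signal: Sequence[float], period: int, repeat_every: int) -> List[float]:
    """Expand: play every `repeat_every`-th whole period twice."""
    out: List[float] = []
    for n, block in enumerate(split_periods(signal, period), start=1):
        out.extend(block)
        if n % repeat_every == 0:
            out.extend(block)
    return out


vowel = [float(i % 10) for i in range(40)]   # toy vowel: 4 periods of 10 samples
print(count_periods(vowel, 10), len(thin_periods(vowel, 10, 2)), len(add_periods(vowel, 10, 2)))
```

Because only whole periods are removed or repeated, the local waveform shape, and hence the pitch, is unchanged; only the number of repetitions, and so the duration, changes.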

[0033]

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

An embodiment of the voice reproducing apparatus according to the present invention is described below with reference to the drawings. The recording tape 10 is a compact C-cassette of the kind normally used for recording ordinary audio, and an audio signal has been recorded on it in the usual way. That is, it was recorded on an ordinary recorder at the standard tape speed of 4.75 cm/sec, and the recording format on the tape is also standard.

[0034] The magnetic head 12 is placed close to the running surface of the recording tape 10 and outputs the recorded voice signal as an electrical signal (analog audio signal). An A/D converter 16 is connected to the magnetic head 12 through a preamplifier 14. The analog audio signal picked up by the magnetic head 12 is amplified to a predetermined level and supplied to the A/D converter 16. The A/D converter 16 samples the analog audio signal at a sampling frequency (for example, 16 kHz) set by an externally supplied sampling clock. It converts the signal into digital data (audio data), for example in 16-bit PCM format.

[0035] The digital data from the A/D converter 16 is input to and stored in the input buffer 18. The input buffer 18 consists of two frame memories that take turns receiving one frame of audio data (one frame is usually about 10 msec). While one frame memory is being filled, data is read from the other and processed as described below.
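A double ("ping-pong") buffer like input buffer 18 can be modeled in a few lines. This is a sketch only. The 160-sample frame size assumes the 16 kHz sampling frequency and the 10 msec frame mentioned above, and the class and method names are hypothetical.

```python
from typing import List, Optional

FRAME_SAMPLES = 160  # 10 ms at 16 kHz (assumed from the example figures)


class PingPongBuffer:
    """Two frame memories filled alternately (hypothetical model of buffer 18)."""

    def __init__(self, frame_samples: int = FRAME_SAMPLES) -> None:
        self.frame_samples = frame_samples
        self.frames: List[List[int]] = [[], []]
        self.fill_index = 0  # which frame memory is currently receiving samples

    def push(self, sample: int) -> Optional[List[int]]:
        """Store one sample; return the completed frame when one fills up."""
        frame = self.frames[self.fill_index]
        frame.append(sample)
        if len(frame) < self.frame_samples:
            return None
        # Swap roles: the full frame goes to the processor (the DSP)
        # while the other frame memory starts receiving new samples.
        self.fill_index ^= 1
        self.frames[self.fill_index] = []
        return frame


buf = PingPongBuffer()
ready = [f for f in (buf.push(n) for n in range(480)) if f is not None]
print(len(ready), [len(f) for f in ready])
```

Each completed frame is handed over as a separate list, so the processor can work on it while the next frame fills.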

[0036] A DSP (digital signal processor) 22 is connected to the input buffer 18 through a data bus 20, and a ring memory 24 is also connected to the data bus 20. The DSP 22 compresses or expands the output voice in time while keeping its pitch. In this embodiment it performs so-called speech speed conversion, which combines two kinds of processing:

- silence omission, which detects silent portions and thins them out;
- time-axis compression/expansion, which compresses or expands part of the voiced portions.

The time-compressed/expanded data is then stored in the ring memory 24. In this example, the ring memory 24 has a capacity of 256 kbits.

[0037] A D/A converter 26 is also connected to the data bus 20. Data read from the ring memory 24 is converted into an analog audio signal by the D/A converter 26 and supplied to a speaker 28. The speaker 28 thus outputs sound based on audio data whose duration has been compressed or expanded relative to the input audio signal. Although not shown, an amplifier or similar stage is normally placed between the D/A converter 26 and the speaker 28.

[0038] A memory controller 30 is also connected to the data bus 20 and controls the writing of data to, and the reading of data from, the ring memory 24. The memory controller 30 contains a write address counter and a read address counter, and their count values determine the write and read addresses in the ring memory 24.

[0039] Each time the DSP 22 supplies audio data that has been time-compressed/expanded, the write address is incremented by the amount of data written. When it reaches the last address of the ring memory 24, it returns to the first address. In this way incoming audio data can be stored in the ring memory 24 one block after another. The read address, by contrast, is always incremented by a constant clock. It too returns to the first address on the next increment after reaching the last address of the ring memory 24.

[0040] The read address advances at a constant rate, but how fast the write address advances depends on the time compression/expansion in the DSP 22. The difference between the write address and the read address therefore changes with the state of that processing. This address difference determines the amount of audio data that has already been read, that is, the amount of space into which new audio data can still be written. This amount is called the remaining memory capacity of the ring memory 24.
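The address bookkeeping in [0038] to [0040] can be sketched as follows. The sketch keeps unwrapped counters and derives the physical address with a modulo, so the remaining capacity is a simple subtraction. In the two failure cases described in [0050] and [0053], it raises an error instead of silently corrupting playback. Class and method names are hypothetical. The 16,384-sample size assumes that 256 kbit means 262,144 bits holding 16-bit samples.

```python
from typing import Iterable


class RingMemory:
    """Sketch of ring memory 24 with write/read counters (memory controller 30)."""

    def __init__(self, size: int) -> None:
        self.size = size
        self.data = [0] * size
        self.write_count = 0  # total samples written; address = count % size
        self.read_count = 0   # total samples read; advanced by the fixed read clock

    def remaining(self) -> int:
        """Slots that may be overwritten: already read, or never used."""
        return self.size - (self.write_count - self.read_count)

    def write(self, samples: Iterable[int]) -> None:
        for s in samples:
            if self.remaining() <= 0:
                raise OverflowError("unread data would be overwritten")
            self.data[self.write_count % self.size] = s
            self.write_count += 1

    def read(self) -> int:
        """Called once per tick of the fixed read clock."""
        if self.read_count == self.write_count:
            raise LookupError("reader caught up with writer (would re-read old data)")
        s = self.data[self.read_count % self.size]
        self.read_count += 1
        return s


mem = RingMemory(16_384)
mem.write(range(100))
first = [mem.read() for _ in range(40)]
print(mem.remaining())
```

With 100 samples written and 40 read, 60 samples are still unread, so the remaining capacity is 16,384 - 60.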

[0041] In this embodiment, the memory controller 30 recognizes the remaining memory capacity and sends a signal indicating it to a servo amplifier 32. The servo amplifier 32 then controls the rotation of the tape feed motor 34 according to the remaining memory capacity.

[0042] In this embodiment, an FG pulse generator 36 is attached to the motor 34 and generates FG pulses whose frequency follows the motor rotation speed. These FG pulses are supplied to a PLL (phase-locked loop) made up of a phase comparator 38, a voltage-controlled oscillator 40, and a 1/N frequency divider 42. The frequency signal output by the voltage-controlled oscillator 40 is supplied to the A/D converter 16 as its sampling clock. The sampling clock of the A/D converter 16 therefore follows the rotation speed of the motor 34.

[0043] When audio is played back with the apparatus of this embodiment, the motor 34 is first driven at double speed: 16 rps (revolutions per second), a tape feed speed of 9.5 cm/sec, and a tape feed roller diameter of 2 mm.

[0044] The FG pulse generator 36 is assumed to generate 320 Hz FG pulses (20 pulses per revolution) at a motor speed of 16 rps, and the 1/N frequency divider 42 is a 1/50 divider. Therefore, when the motor 34 turns at 16 rps, a 16 kHz sampling clock is supplied to the A/D converter 16.
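A short calculation confirms the figures in [0043] and [0044]. It also illustrates the point of [0024]: because the clock is derived from the motor, the number of samples taken per second of originally recorded audio stays at 8,000 whatever the motor speed. The sketch assumes that tape speed is exactly proportional to motor speed (no slip) and that the PLL is ideally locked. The constant and function names are hypothetical.

```python
FG_PULSES_PER_REV = 20       # FG pulse generator 36: 20 pulses per revolution
PLL_DIVIDE_N = 50            # 1/N frequency divider 42 with N = 50
STANDARD_TAPE_SPEED = 4.75   # cm/sec, speed at which the tape was recorded
CM_PER_REV = 9.5 / 16        # tape advance per motor revolution (9.5 cm/sec at 16 rps)


def sampling_clock_hz(motor_rps: float) -> float:
    """VCO output when the PLL is locked: FG frequency multiplied by N."""
    fg_hz = motor_rps * FG_PULSES_PER_REV
    return fg_hz * PLL_DIVIDE_N


def samples_per_recorded_second(motor_rps: float) -> float:
    """A/D samples taken per second of audio as originally recorded."""
    speed_factor = motor_rps * CM_PER_REV / STANDARD_TAPE_SPEED
    return sampling_clock_hz(motor_rps) / speed_factor


for rps in (8, 16, 32):  # 1x, 2x and 4x tape speed
    print(rps, sampling_clock_hz(rps), samples_per_recorded_second(rps))
```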

[0045] The A/D converter 16 therefore performs A/D conversion at a sampling frequency of 16 kHz, and the result is stored in the input buffer 18.

[0046] In this embodiment, the time compression/expansion performed by the DSP 22 consists of two kinds of processing:

- silence omission, which detects silent portions in the data and deletes part of them;
- speech speed conversion, which compresses or expands part of the repetitive waveform of the audio signal, changing the time axis without changing the pitch, and thereby controls the speech rate.

The following description, however, first assumes silence omission alone.

[0047] The DSP 22 classifies a frame as silent when the average power of one frame (10 msec) of audio data stored in the input buffer 18 is at or below a predetermined threshold. The threshold is set to a value that separates the power of speech from the power of the ambient noise around the speaker. When 51 or more silent frames occur in a row, the 51st and all later frames of that run are deleted.

[0048] Silent portions should be as short as possible. If they are made too short, however, the pauses disappear and the speech becomes hard to follow. With the processing above, each silent portion lasts at most about 0.5 seconds. A pause of about 1 second is normally considered sufficient, but shorter is preferable. Pauses of 0.3 seconds or less, on the other hand, are thought to be too short. A limit of about 0.5 seconds is therefore considered preferable, even in fast-forward playback.
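The silence omission in [0047] and [0048] can be sketched as a generator over frames. The sketch assumes that "power" means the mean squared sample value. The threshold is left as a parameter, since the source gives no number. Function names are hypothetical.

```python
from typing import Iterable, Iterator, Sequence

MAX_SILENT_FRAMES = 50  # keep at most 50 consecutive silent frames (about 0.5 s)


def frame_power(frame: Sequence[int]) -> float:
    """Average power of one frame, taken here as the mean squared amplitude."""
    return sum(x * x for x in frame) / len(frame)


def omit_silence(frames: Iterable[Sequence[int]], threshold: float,
                 max_silent: int = MAX_SILENT_FRAMES) -> Iterator[Sequence[int]]:
    """Yield frames, deleting silent frames beyond the max_silent-th of each run."""
    run = 0  # length of the current run of silent frames
    for frame in frames:
        if frame_power(frame) <= threshold:   # at or below threshold: silent
            run += 1
            if run > max_silent:
                continue                      # 51st and later silent frames are deleted
        else:
            run = 0
        yield frame


voiced = [1000] * 160
silent = [0] * 160
stream = [voiced] * 3 + [silent] * 120 + [voiced] * 2
kept = list(omit_silence(stream, threshold=100.0))
print(len(stream), len(kept))
```

A 1.2-second pause (120 frames) is cut to 50 frames, while the voiced frames on either side pass through untouched.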

[0049] This processing writes data with shortened silent portions into the ring memory 24. Reading from the ring memory 24, on the other hand, uses a clock fixed at 8 kHz. The sampling frequency of the A/D converter 16 is 16 kHz when the tape speed is 9.5 cm/sec. The recording tape 10 holds audio recorded at the normal 4.75 cm/sec, so double-speed playback on its own would produce audio at twice the original frequency. In this embodiment, reading from the ring memory 24 uses a clock (8 kHz) at half the sampling frequency (16 kHz) of the A/D converter 16, which restores the audio signal to its normal frequency.
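A one-line relationship shows why reading at half the sampling clock restores the pitch. A tone recorded at f comes off the head at f times the tape speed factor. Sampling it at one clock and reading the samples back at another scales its frequency by the ratio of the two clocks. This is a minimal sketch: the function name is hypothetical, and the signal chain is treated as ideal.

```python
def output_frequency(recorded_hz: float, tape_speed_factor: float,
                     sample_clock_hz: float, read_clock_hz: float) -> float:
    """Pitch heard at the speaker for a tone recorded at recorded_hz."""
    played_hz = recorded_hz * tape_speed_factor        # pitch coming off the head
    return played_hz * read_clock_hz / sample_clock_hz  # rescaled by read/sample clocks


# 2x tape speed, 16 kHz sampling, 8 kHz reading: pitch restored.
print(output_frequency(440.0, 2.0, 16_000.0, 8_000.0))
# If the sampling clock did NOT follow the motor when it slows to 1x,
# the same tone would come out an octave low.
print(output_frequency(440.0, 1.0, 16_000.0, 8_000.0))
```

The second case shows why the PLL must tie the sampling clock to the motor: the ratio of sampling clock to tape speed has to stay fixed for the pitch to stay correct.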

[0050] If the silent portions are reasonably evenly distributed and make up exactly half of the signal, this processing continues without problems. When there are few silent portions, however, a large amount of data is written into the ring memory 24 while the read speed stays constant. The difference (time difference) between the write address and the read address in the ring memory 24 therefore grows. A growing address difference means that a large amount of data has not yet been read, so the capacity that may be overwritten (the remaining memory capacity) is small. If the remaining memory capacity falls below 0, data is overwritten before it is read, and correct playback becomes impossible.

[0051] The memory controller 30 therefore sends a signal indicating the remaining memory capacity to the servo amplifier 32. When the remaining memory capacity becomes small, the servo amplifier lowers the rotation speed of the motor 34. This slows the feed of the recording tape 10 and hence the playback. At the same time, the PLL lowers the sampling frequency of the A/D converter 16 in proportion to the tape feed speed. Digital data is therefore written into the input buffer 18 with a slower sampling clock. The data still keep the relationship of a 16 kHz sampling clock for playback at 9.5 cm/sec. In other words, the ratio of sampling clock to tape feed speed, which corresponds to the sampling rate of the data, is preserved.

[0052] This also reduces the amount of data left after the DSP 22 has time-compressed the silent portions, so that the remaining memory in the ring memory 24 is kept within a predetermined value. For example, if the tape feed speed is set to 4.75 cm/sec and the sampling frequency to 8 kHz, reproduction proceeds as is, with zero time-axis compression, even when there are no silent portions at all.

[0053] Conversely, when the remaining memory in the ring memory 24 becomes large, for example exceeding 100%, previously read data would be read again, and normal reproduction is again impossible. In this case, the rotation speed of the motor 34 should be increased, the reverse of the case above, to increase the amount of data written. Furthermore, if the amount of data deleted by the DSP 22 when time-compressing silent portions is limited to 3/4 of all data, and the tape feed speed can be raised up to 4x, at least 1/4 of the silent data is always written into the ring memory 24, so no problem arises in the reproduced sound even when a silent portion continues.
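The 1/2 and 3/4 figures follow from a rate balance. The sketch below assumes, as those figures imply although this section does not state it, that data are read from the ring memory 24 at a constant 8 kHz (the 1x sampling clock). At k times normal tape speed the A/D converter delivers k × 8 kHz, so a fraction 1 - 1/k must be deleted for writes to match reads. The constant and function names are hypothetical.

```python
READ_CLOCK_HZ = 8_000.0   # assumption: constant read-out rate equal to the 1x sampling clock
ONE_X_CLOCK_HZ = 8_000.0  # sampling clock at 4.75 cm/sec (1x), from the text
MAX_DELETION = 0.75       # cap on deleted silent data described above


def balancing_deletion(speed_multiple: float) -> float:
    """Fraction of input samples to delete so that writes equal reads."""
    return 1.0 - READ_CLOCK_HZ / (ONE_X_CLOCK_HZ * speed_multiple)


def write_rate_hz(speed_multiple: float, silent_fraction: float) -> float:
    """Samples/sec written into the ring memory after silence omission (capped at 3/4)."""
    deleted = min(silent_fraction, MAX_DELETION)
    return ONE_X_CLOCK_HZ * speed_multiple * (1.0 - deleted)
```

balancing_deletion(2) is 0.5, the silence share the 2x baseline can absorb, and write_rate_hz(4, 1.0) equals the 8 kHz read rate, so an all-silent passage at 4x neither starves nor overflows the memory. At 2x the same passage writes only 4 kHz, which is what drives the tape speed up.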

[0054] Thus, in this embodiment, with 2x-speed reproduction as the baseline, the tape feed speed is varied between 1x and 4x according to the amount of silence in the audio signal recorded on the recording tape 10. This keeps the remaining memory in the ring memory 24 at a predetermined level and enables effective time-compressed/expanded reproduction. For example, as shown in FIG. 2, the tape speed is changed from 1x to 4x according to the remaining memory.
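A closed-loop version of this behaviour can be sketched by feeding the remaining memory into a tape-speed map. FIG. 2 is not reproduced here, so the straight line from 1x at 0% remaining to 4x at 100% remaining, the 16,000-sample capacity, the 8 kHz read clock and the Euler time step are all assumptions; the sketch only shows that such a loop settles where writes equal reads.

```python
def tape_speed_multiple(remaining: float) -> float:
    """Hypothetical FIG. 2 line: 1x with no free memory, 4x with all memory free."""
    r = min(max(remaining, 0.0), 1.0)
    return 1.0 + 3.0 * r


def simulate(silent_fraction: float, capacity: int = 16_000, read_hz: float = 8_000.0,
             dt: float = 0.01, steps: int = 4_000, remaining: float = 0.5):
    """Euler simulation of remaining memory; returns (remaining, tape speed multiple)."""
    deleted = min(silent_fraction, 0.75)  # deletion cap of 3/4 described above
    for _ in range(steps):
        speed = tape_speed_multiple(remaining)
        write_hz = 8_000.0 * speed * (1.0 - deleted)  # 1x clock scaled by tape speed
        remaining -= (write_hz - read_hz) * dt / capacity
    return remaining, tape_speed_multiple(remaining)
```

Under these assumptions simulate(0.5) settles near 2x, simulate(0.0) near 1x and simulate(1.0) near 4x, mirroring the 1x to 4x range around the 2x baseline.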

[0055] Further, the above description covered time compression/expansion by the DSP 22 only for silent portions, but it can also be combined with processing that controls the speech rate without changing the pitch, by thinning out or adding part of the repetitive waveform of the audio signal.

[0056] This time-axis compression processing is described next. First, regardless of the tape reproduction speed, the input buffer 18 stores audio data whose speech rate is twice that of the original audio (the audio recorded on the recording tape 10). For time-axis compression/expansion, the repetitive waveform of each vowel must first be detected.

[0057] To this end, the DSP 22 detects changes in power, spectrum, periodicity and the like in the audio data stored in the input buffer 18, detects vowel sections in the audio data based on these results, and extracts those vowel sections. It then recognizes the repetitive waveform from the periodicity within each extracted vowel section and deletes or adds the desired number of repetitive waveforms to perform time-axis compression/expansion.

[0058] The detection of vowel sections and of the repetition period based on changes in power, spectrum and periodicity is described below.

[0059] First, the power of the audio data is a function of the square of the sample values, and vowels have more power than consonants. Sections where the squared values of the sampled audio data are large are therefore estimated to be vowel sections. Next, a fast Fourier transform is applied to the audio data to perform successive spectral analysis. In a vowel section the intensity at particular frequencies stays high for some time, so vowel sections can also be estimated from the spectrum. In addition, a fixed waveform repeats within a vowel section, so the autocorrelation of the audio data shows a large peak there. Vowel portions can therefore be detected from the magnitude of the autocorrelation peak, and a change of vowel can be detected from a shift in the position of that peak.
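Two of the three cues (short-time power and the autocorrelation peak) can be sketched in plain Python; the FFT spectral cue is omitted. The thresholds, the lag range of 40 to 120 samples (roughly 133 to 400 Hz pitch at an assumed 16 kHz sampling rate) and all function names are illustrative assumptions, not values from the patent.

```python
def frame_power(frame):
    """Mean of squared samples (power as a function of the square of the data)."""
    return sum(x * x for x in frame) / len(frame)


def autocorr(frame, lag):
    """Unbiased autocorrelation at `lag`, normalized so a periodic frame gives about 1."""
    n = len(frame)
    num = sum(frame[i] * frame[i + lag] for i in range(n - lag)) / (n - lag)
    den = frame_power(frame)
    return num / den if den else 0.0


def autocorr_peak(frame, min_lag, max_lag):
    """(lag, value) of the strongest autocorrelation peak; the lag estimates the pitch period."""
    lag = max(range(min_lag, max_lag + 1), key=lambda k: autocorr(frame, k))
    return lag, autocorr(frame, lag)


def is_vowel_like(frame, power_threshold, peak_threshold=0.5, min_lag=40, max_lag=120):
    """Vowel candidate: loud AND strongly periodic."""
    _, peak = autocorr_peak(frame, min_lag, max_lag)
    return frame_power(frame) > power_threshold and peak > peak_threshold
```

A loud periodic frame passes both tests, a quiet one fails on power, and loud noise fails on periodicity; the returned lag is the repetition period that the later thinning step works with.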

[0060] A vowel section may be detected from any one of these results, but it is preferable to evaluate them together. Next, the detected vowel section is extracted, and the repetitive waveform of each vowel and its number of repetitions are recognized. The repetitive waveforms can be located from the positions of the autocorrelation peaks described above. Deleting or adding one or more of the recognized repetitive waveforms then achieves time-axis compression/expansion by thinning out or adding repetitive waveforms.

[0061] Typically, a single vowel consists of roughly 5 to 30 repetitions of nearly identical waveforms. Thus, for example, for a vowel with about 12 consecutive repetitive waveforms, deleting one waveform out of every three consecutive ones (4 waveforms in total) compresses the duration of that vowel to 2/3.
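The 12-to-8 example can be checked with a sketch that treats a vowel as a list of pitch periods (each a list of samples) and drops one period in every three. The mirrored add_periods function, which repeats a period instead, is an illustrative assumption for the expansion direction; both names are hypothetical.

```python
def thin_periods(periods, group=3):
    """Drop the last period of every `group` consecutive pitch periods (compression)."""
    return [p for i, p in enumerate(periods) if (i + 1) % group != 0]


def add_periods(periods, group=3):
    """Repeat the last period of every `group` consecutive pitch periods (expansion)."""
    out = []
    for i, p in enumerate(periods):
        out.append(p)
        if (i + 1) % group == 0:
            out.append(p)
    return out
```

For 12 periods, thin_periods keeps 8 (indices 0, 1, 3, 4, 6, 7, 9, 10), i.e. 2/3 of the original duration, while add_periods yields 16 periods, 4/3 of it.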

[0062] Rather than simply deleting waveforms, it is also preferable to merge two waveforms into one as follows. To connect the waveforms before and after the deleted or added waveform, an overlap-add method is used: the sample values of the one waveform before the junction, multiplied by a window varying linearly from 0 to 1 (a ramp), are added to those of the one waveform after the junction, multiplied by a window varying linearly from 1 to 0 (a reverse ramp). This connection method allows the time axis to be compressed or expanded while preserving waveform continuity, so the frequency of the speech waveform (the pitch of the voice) does not change. As a result, the time axis can be compressed or expanded (converted to an optimal speed) while the pitch is maintained.
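Merging two adjacent periods with linear windows can be sketched as below. The sketch assumes equal-length periods and uses the conventional orientation in which the earlier period's weight falls from 1 to 0 while the later period's rises from 0 to 1, so the merged period starts like the first and ends like the second; the text above states the window directions relative to the waveforms before and after the junction, so treat the orientation here as an assumption. Function names are hypothetical.

```python
def merge_two_periods(a, b):
    """Overlap-add two equal-length pitch periods into one with a linear cross-fade."""
    if len(a) != len(b):
        raise ValueError("periods must have equal length")
    n = len(a)
    out = []
    for i in range(n):
        w = i / (n - 1) if n > 1 else 1.0  # rises linearly from 0 to 1 across the period
        out.append((1.0 - w) * a[i] + w * b[i])
    return out


def replace_pair_with_merge(periods, i):
    """Replace periods i and i+1 by their merge, shortening the vowel by one period."""
    merged = merge_two_periods(periods[i], periods[i + 1])
    return periods[:i] + [merged] + periods[i + 2:]
```

Because the merged period begins at the first period's values and ends at the second's, the splice joins smoothly to its neighbours and the period length, and hence the pitch, is unchanged.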

[0063] To perform processing such as vowel-section detection, repetitive-waveform recognition and thinning, it is preferable to store audio data covering a certain period of time and process them together. The input buffer 18 is therefore preferably a ring memory holding about 100 frames rather than a simple frame memory. That is, in the input buffer 18, incoming audio data successively overwrite the oldest data, so the most recent 100 frames are always stored. This allows periodicity and the like to be detected over a considerable time span, enabling more accurate processing.
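A ring memory that always holds the newest 100 frames maps naturally onto a bounded deque. In this sketch the 10 msec frame length (taken from Modification 2 below) makes 100 frames about one second; the class and method names are illustrative.

```python
from collections import deque

FRAME_MS = 10       # frame length used in Modification 2
FRAMES_KEPT = 100   # about one second of audio


class InputBuffer:
    """Keeps only the newest `frames_kept` frames; older frames are overwritten."""

    def __init__(self, frames_kept: int = FRAMES_KEPT) -> None:
        self.frames = deque(maxlen=frames_kept)

    def push(self, frame) -> None:
        self.frames.append(frame)  # the oldest frame is dropped automatically when full

    def window(self):
        """All buffered samples, oldest first, for periodicity analysis."""
        return [s for f in self.frames for s in f]

    def seconds(self) -> float:
        return len(self.frames) * FRAME_MS / 1000
```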

[0064] In this way, the DSP 22 of this embodiment combines time-axis compression/expansion with silence omission, which deletes silent portions, to perform more aggressive speech-rate conversion.

[0065] The preferred degree of this speech-rate conversion varies from user to user. It is therefore preferable to provide a user-operated switch S, as shown in the figure, so that the user can select the degree of speech-rate conversion and enter a command. For example, three levels may be provided, selecting one of the speech-rate characteristics S1 to S3 as a function of remaining memory, as shown in FIG. 2. That is, a relationship such as that in FIG. 2 is stored in advance as a map, and the degree of speech-rate conversion (the rate at which the waveform is compressed or expanded) is determined by referring to it. Through the combination of tape speed and speech-rate conversion, control is performed so that the remaining memory settles at the intersection of the speech-rate characteristic line in FIG. 2 (one of three, selected by the user) and the tape-speed line.

[0066] [Modification 1] In the example above, speech-rate conversion is referenced to the audio as reproduced unchanged from the original recording tape. However, recorded speech may be fast or slow, and the degree of conversion that is easy to listen to depends on how fast (or slow) the recorded speaker talks. In this example, therefore, speech-rate conversion is referenced to the absolute speech rate of the original audio.

[0067] That is, in this example, the ratio between a fixed absolute speech rate and the speaker's original speech rate is used to determine the compression/expansion ratio of the vowels' repetitive waveforms in the time-axis compression/expansion processing, so that the audio data output from the DSP 22 have the absolute speech rate.

[0068] As a result, the speech rate is always constant, eliminating drawbacks such as fast speakers being made even faster or slow speakers being shortened too little. Moreover, because the tape speed is also changed according to the remaining memory in the ring memory 24, stable operation of the system as a whole is ensured.

[0069] It is also preferable to let the user specify this absolute speech rate (the degree of fast talking) by external input. For example, buttons for fast-talking levels 1, 2, 3 and so on are provided, and the user operates them to set the level. In the speech-rate conversion processing in the DSP 22, the number of repetitions of each vowel's repetitive waveform is then determined according to the specified level, and the repetition count of the input speech's repetitive waveforms is adjusted to match it. Speech can thus be reproduced at the rate the user wants. The DSP 22 can recognize the button operation if the buttons are connected to the data bus 20 through an interface.
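The button-driven variant can be sketched as follows. The level-to-repetition table is invented for illustration, and fit_repeats resamples a vowel's list of pitch periods by index mapping, so the same function thins (dropping evenly spaced periods) or pads (repeating them) as needed. This is one simple way to adjust the repetition count to match the selected level, not necessarily the patent's; all names are hypothetical.

```python
# Hypothetical table: fast-talking level (button 1, 2, 3) -> periods kept per vowel.
TARGET_REPEATS = {1: 10, 2: 8, 3: 6}


def fit_repeats(periods, target):
    """Return exactly `target` periods, evenly thinning or repeating the input ones."""
    n = len(periods)
    if n == 0 or target <= 0:
        return []
    return [periods[(j * n) // target] for j in range(target)]


def apply_level(vowels, level):
    """Fit every vowel (a list of periods) to the repetition count for `level`."""
    target = TARGET_REPEATS[level]
    return [fit_repeats(v, target) for v in vowels]
```

With level 2, a 12-period vowel keeps periods 0, 1, 3, 4, 6, 7, 9, 10 (the same one-in-three thinning as in the earlier example), while a 6-period vowel is padded to 8 by repeating two periods.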

[0070] [Modification 2] In Modification 1 above, the user specifies the absolute speech rate for reproduction. However, speakers generally tend to talk relatively slowly in important passages and where they want to make sure their intent gets across. Keeping the absolute speech rate constant, as in Modification 1, may therefore make the nuances of the speech harder to follow.

[0071] In this example, therefore, the number of repetitions of each vowel's repetitive waveform is counted over a predetermined period (one second) of input audio data to measure the absolute speech rate (the repetition count of the vowel waveforms), and the average repetition count is obtained. Next, a reduction rate is calculated so that this average becomes the target repetition count (the count corresponding to the speech rate desired for reproduction). If the calculated reduction rate is 30%, for example, the corresponding factor 0.7 is applied to determine, for each vowel, the target repetition count of its repetitive waveform in the time-axis compression/expansion processing of the DSP 22.
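The reduction-rate arithmetic can be sketched with the one-second window represented simply as a list of per-vowel repetition counts. Clamping the rate at zero (no expansion) and keeping at least one period per vowel are assumptions added to keep the sketch well defined; the function names are hypothetical.

```python
def reduction_rate(repeat_counts, target_average):
    """Share of periods to remove so the window's average count reaches the target."""
    average = sum(repeat_counts) / len(repeat_counts)
    return max(0.0, 1.0 - target_average / average)


def target_repeats(count, reduction):
    """Target period count for one vowel, e.g. count x 0.7 for a 30% reduction."""
    return max(1, round(count * (1.0 - reduction)))
```

For counts of 8, 12 and 10 (average 10) and a target of 7, the reduction rate is 30%, so a vowel with 20 periods is cut to 14.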

[0072] The DSP 22 then performs speech-rate conversion by thinning out 30% of each vowel's repetitive waveforms in every incoming frame (10 msec) of audio. In effect, using the ratio of the speaker's real-time speech rate (per 10 msec) to the speaker's average speech rate (over one second), the vowel repetitive waveforms are thinned out in the time-axis compression/expansion processing so that the output has a variable speech rate equal to the fixed absolute speech rate multiplied by this ratio. In this way, the target speech rate of the time compression processing (for example, a suitable rate that is neither too fast nor too slow) varies within a relatively small range according to the speech rate detected from moment to moment.
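The variable target can be sketched as follows, with speech rate expressed as vowels per second (an assumption; the text measures it through repetition counts, which vary inversely with rate). Working it through shows that the keep fraction per frame equals average rate / absolute rate regardless of the local rate, which is why a single 30% thinning applies to every frame while fast and slow frames still get different targets. Function names are hypothetical.

```python
def variable_target_rate(absolute_rate, local_rate, average_rate):
    """Target = fixed absolute rate scaled by (local rate / one-second average rate)."""
    return absolute_rate * local_rate / average_rate


def frame_keep_fraction(local_rate, target_rate):
    """Share of vowel periods to keep so a frame spoken at local_rate plays at target_rate."""
    return local_rate / target_rate
```

With an absolute rate of 8 and a one-second average of 5.6, frames at local rates of 4.2, 5.6 and 7.0 all keep 70% of their periods, yet the 7.0 frame is targeted faster than the 4.2 frame.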

[0073] As in the example above, the target absolute speech rate is preferably selectable by the user. This processing also requires the input buffer 18 to have a capacity of one second (100 frames).

[0074] In this way, as shown in FIG. 3, the slower the absolute speech rate, the higher the time-axis compression ratio of the speech-rate conversion. The overall time compression ratio is held at a predetermined value while locally faster and slower passages are preserved, so suitable time-axis compression can be performed without losing the nuances of the speech.

[0075] [Further Modifications] In the examples above, the speech rate is determined from the number of repetitions of each vowel's repetitive waveform, but an essentially equivalent determination can be made from the number of vowels occurring per predetermined time. The degree of fast talking may therefore be judged from the number of vowel occurrences instead of the repetition count of the vowel waveforms. That is, the DSP 22 may count the vowels occurring within a predetermined time to detect the speech rate.
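Counting vowel occurrences is straightforward once each 10 msec frame has been flagged as vowel or not; the per-frame flags are an assumed input from whatever vowel detector is used. The sketch also computes the mean vowel duration, the alternative cue named in the claims. Function names are hypothetical.

```python
def vowel_onsets(voiced):
    """Count False->True transitions in a per-frame vowel flag sequence."""
    count, prev = 0, False
    for cur in voiced:
        if cur and not prev:
            count += 1
        prev = cur
    return count


def vowels_per_second(voiced, frame_ms=10):
    """Speech rate as vowel occurrences per second."""
    seconds = len(voiced) * frame_ms / 1000
    return vowel_onsets(voiced) / seconds if seconds else 0.0


def mean_vowel_ms(voiced, frame_ms=10):
    """Average duration of a vowel, i.e. of a run of vowel frames, in msec."""
    n = vowel_onsets(voiced)
    return sum(voiced) * frame_ms / n if n else 0.0
```

One second of alternating 50 msec vowel and 50 msec non-vowel frames gives 10 vowels per second with a mean vowel duration of 50 msec.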

[0076] The time compression/expansion processing may consist of only the silence-omission processing or only the time-axis compression/expansion processing, or it may be speech-rate conversion using both.

[0077] In the example above, only a compact cassette tape was described as the recording medium, but the invention applies equally to other recording tapes. It can also be suitably applied to a compact disc (CD), a solid-state recording memory such as a semiconductor memory, and the like by controlling their reproduction speed.

[0078] Further, in the example above the DSP 22 performs all of the speech-rate conversion, but a separate microcomputer or the like may be provided to carry out the speech-rate conversion, the processing of speech-rate commands entered by the user, and so on.

[0079]

[Effects of the Invention] As described above, according to the present invention, speech-rate conversion processing allows the speech rate to be optimized. Further, because the reproduction speed from the recording medium is controlled according to the state of the speech-rate conversion processing, it is possible to prevent data from being lost after they have been reproduced but before they have been output. Furthermore, omitting silent portions shortens the reproduction time without adversely affecting the voiced portions.

[0080] Performing the silence omission digitally also enables efficient processing using memory and the like.

[0081] By changing the sampling clock of the A/D conversion according to the reproduction speed from the storage medium, the conversion rate between the reproduced audio signal and the digital data can be kept constant, which simplifies processing at audio output.

[0082] Ordinary magnetic tape recorded on an ordinary cassette deck can also be played back with suitable speech-rate conversion.

[0083] Storing the speech-rate-converted audio data in an audio memory and reading them out from there allows the converted speech to be supplied to a speaker or the like at a predetermined speed.

[0084] Controlling the reproduction speed according to the remaining capacity of the audio memory prevents data in the audio memory from being overwritten before they are read, so suitable speech-rate conversion can be performed.

[0085] Letting the user specify the degree of speech-rate conversion allows reproduction at a speech rate matching the user's preference.

[0086] Holding the average speech rate constant while varying it locally according to the speech rate of the input audio signal optimizes the speech rate while preserving the nuances of the speech.

[0087] The speech rate of the input audio can easily be detected by detecting the number of vowels occurring per unit time or the duration of a single vowel.

[Brief Description of the Drawings]

FIG. 1 is a block diagram showing the overall configuration of the embodiment.

FIG. 2 is a characteristic diagram showing the relationship of tape speed and speech rate to remaining memory.

FIG. 3 is a characteristic diagram showing the relationship between absolute speech rate and speech-rate conversion ratio.

[Description of Reference Numerals]

10 recording tape, 12 magnetic head, 16 A/D converter, 18 input buffer, 20 data bus, 22 DSP, 24 ring memory, 26 D/A converter, 28 speaker, 30 memory controller, 34 motor, 36 FG pulse generator, 38 phase comparator, 40 voltage-controlled oscillator, 42 1/N divider.

Claims


1. An audio signal reproducing device, comprising: audio signal reproducing means for reproducing and outputting an audio signal stored on an audio signal recording medium; time compression/expansion means for compressing or expanding the duration of the reproduced audio signal; and reproduction control means for controlling the reproduction speed of the audio signal reproducing means according to the state of the compression/expansion processing in the time compression/expansion means.

2. The audio signal reproducing device according to claim 1, wherein the time compression/expansion means performs silence omission processing that compresses silent portions of the audio signal.

3. The audio signal reproducing device according to claim 1 or 2, wherein the audio signal reproducing means outputs an analog audio signal, the device further comprising A/D conversion means for converting the analog audio signal into a digital signal, and wherein the digital audio signal output from the A/D conversion means is supplied to the time compression/expansion means.

4. The audio signal reproducing device according to claim 3, wherein the A/D conversion means converts the analog signal into a digital signal according to a predetermined sampling clock, the device further comprising sampling clock control means for changing the sampling clock according to the reproduction speed of the audio signal reproducing means.

5. The audio signal reproducing device according to claim 4, wherein the audio signal recording medium is a magnetic tape, the audio signal reproducing means includes a tape feed motor that feeds the magnetic tape, and the sampling clock control means controls the sampling clock according to the rotation speed of the tape feed motor.

6. The audio signal reproducing device according to any one of claims 3 to 5, wherein the time compression/expansion means comprises: a silence omission processing unit that detects silent portions in the digital audio data from the A/D conversion means and omits them; an audio memory that always stores a predetermined duration of the most recent audio data from which the silent portions have been omitted; and a read output unit that reads out and outputs the audio data stored in the audio memory.

7. The audio signal reproducing device according to claim 6, wherein the time compression/expansion means includes remaining memory recognition means for recognizing the remaining memory, which is the amount of already-read audio data in the audio memory, and the reproduction control means controls the reproduction speed according to the recognized remaining memory.

8. The audio signal reproducing device according to any one of claims 1 to 7, wherein the time compression/expansion means performs time-axis compression/expansion processing that thins out part of the repetitive waveform of the audio data in voiced portions or adds repetitive waveforms.

9. The audio signal reproducing device according to claim 8, further comprising: speech-rate input means for externally inputting a speech-rate command; and speech-rate detection means for detecting the speech rate of the audio signal reproduced and output by the audio signal reproducing means, wherein the time compression/expansion means performs the thinning or adding processing so that the speech rate after the time-axis compression/expansion processing matches the speech rate of the input speech-rate command.

10. The audio signal reproducing device according to claim 8, wherein the time compression/expansion means has speech-rate detection means for detecting the speech rate of the audio signal reproduced and output by the audio signal reproducing means, and performs the time-axis compression/expansion processing with the target speech rate set relatively faster than a predetermined speech rate when the detected speech rate is fast, and relatively slower than the predetermined speech rate when the detected speech rate is slow.

11. The audio signal reproducing device according to claim 9 or 10, wherein the speech-rate detection means detects the speech rate by counting the number of vowels occurring per unit time.

12. The audio signal reproducing device according to claim 9 or 10, wherein the speech-rate detection means detects the speech rate by detecting the utterance duration of a single vowel.