JPH0962300A - Speech decoding device

Info

Publication number: JPH0962300A
Application number: JP7220745A
Authority: JP
Inventors: Shiyuuichi Kawama; 修一河間
Original assignee: Sharp Corp
Current assignee: Sharp Corp
Priority date: 1995-08-29
Filing date: 1995-08-29
Publication date: 1997-03-07
Anticipated expiration: 2015-08-29
Also published as: JP3285472B2

Abstract

PROBLEM TO BE SOLVED: To provide a speech decoding device with a variable-speed function that can be applied even when the pitch period is longer than the frame length and that causes little degradation in sound quality. SOLUTION: A reproduction speed control part 2 generates control signals that cause a synthesis filter 6 to synthesize speech from an excitation synp'(n). This excitation is obtained by deleting or repeating, in units of the pitch period, sections of the excitation to which a pitch synthesis filter 4 has added a pitch component. In accordance with a control signal from the reproduction speed control part 2, the synthesis filter 6 performs speech synthesis on the excitation synp'(n) output from the pitch synthesis filter 4, using the linear prediction coefficients α^k supplied from a linear prediction coefficient memory 5, and thereby produces synthesized speech syn(n) in which sections have been deleted or repeated. Because deletion and repetition are performed in units of the pitch period, variable-speed reproduction with little sound quality degradation is achieved, and the method can be applied even when the pitch period is longer than the frame length.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a speech decoding device capable of changing the reproduction speed when reproducing encoded speech.

[0002]

[Prior Art] Recently, speech encoding/decoding devices that operate at several kbps (kilobits per second) and are implemented as ICs (integrated circuits) have come into practical use. CELP (Code Excited Linear Prediction), which offers excellent reproduced sound quality, is often used as the coding method in these devices. Such encoding/decoding devices are increasingly used in speech storage (recording and playback) devices such as telephones with an answering function.

[0003] The speech storage device of a telephone with an answering function is required to play messages back at high speed, so that needed information can be found quickly among the recorded messages, and to play them back slowly, so that the content of a rapidly spoken message can be understood correctly. For this reason, the reproduction speed is usually changed in one of two ways: the decoded speech data is passed through a reproduction speed conversion device (time-axis compression/expansion, sometimes called "fast listening" or "slow listening"), or a reproduction speed conversion function is added to the decoder of the speech encoding/decoding device itself.

[0004] A common reproduction speed conversion method that causes little sound quality degradation and leaves the pitch unchanged is to delete or insert waveform segments at intervals of the pitch period of the speech. If deleting or inserting a segment creates a discontinuity at the splice point and thereby distorts the waveform, the sound quality degrades. To make such distortion less likely, the splice point is adjusted to fall on a zero-crossing point (a point where the amplitude is 0 or close to 0), or the waveforms near the splice point are multiplied by fade-in and fade-out window functions and added together. Distortion caused by a waveform discontinuity is more easily perceived the more stationary the original waveform is.
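The fade-in/fade-out splice described above can be illustrated with a minimal sketch. It assumes a linear cross-fade window and a pitch period that is already known; the function name `delete_one_period` and the test signal are hypothetical and are not taken from the source.

```python
import math

def delete_one_period(x, start, period):
    """Remove `period` samples beginning at `start`, cross-fading the removed
    segment (fade-out) into the period that follows it (fade-in)."""
    if period <= 0 or start < 0 or start + 2 * period > len(x):
        raise ValueError("need two full periods after start")
    out = list(x[:start])
    for i in range(period):
        w = i / period                    # linear fade-in weight, 0 -> 1
        a = x[start + i]                  # segment that is faded out
        b = x[start + period + i]         # following period, faded in
        out.append((1.0 - w) * a + w * b)
    out.extend(x[start + 2 * period:])
    return out

# Perfectly periodic test signal (assumed period of 40 samples).
P = 40
x = [math.sin(2 * math.pi * n / P) for n in range(5 * P)]
y = delete_one_period(x, start=P, period=P)   # one period shorter, same pitch
```

For a perfectly periodic input the two blended segments are identical, so the splice is inaudible in principle; for real speech the window only reduces the discontinuity.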

[0005] One known method of adding a reproduction speed conversion function to the decoder of the speech encoding/decoding device itself adds a speech rate conversion function to a pitch-predictive multi-pulse speech encoding/decoding scheme (Japanese Patent Laid-Open No. 2-93700). The speech decoder in this pitch-predictive multi-pulse speech encoding/decoding scheme is described below.

[0006] First, the pitch-predictive multi-pulse encoding/decoding scheme is briefly described. The scheme essentially models the mechanism of speech production. Speech is produced when an airflow with a pitch period, generated at the vocal cords, is articulated in the space from the throat to the mouth and nose (the vocal tract). At encoding time, the input speech signal is divided into frames of about 20 milliseconds, the spectral envelope and pitch period within each frame are obtained, and a multi-pulse excitation (sound source) consisting of several pulses with differing amplitudes and spacings is created. This excitation is passed through a pitch synthesis filter to produce an excitation with a pitch component (corresponding to the airflow with a pitch period in speech production), and then through a synthesis filter that imparts the spectral envelope (corresponding to articulation in speech production) to synthesize speech. The pulse amplitudes and pulse positions of the multi-pulse excitation are adjusted so that the error between the synthesized speech and the input speech is minimized, and the resulting pulse amplitude and position information (excitation information), pitch information, and spectral envelope information are encoded.

[0007] At decoding time, the encoded excitation information, pitch information, and spectral envelope information are decoded frame by frame. An excitation is created from the decoded excitation information, passed through a pitch synthesis filter based on the pitch information to give it a pitch component, and then passed through a synthesis filter based on the spectral envelope information to produce speech.

[0008] Let exc(n) (0 ≤ n < Lf, where Lf is the frame length) be the excitation at each time n in the frame, and let the pitch information consist of the pitch period P and the pitch synthesis filter coefficients βi (−Lp ≤ i ≤ Lp, where i is the order). The output synp(n) of the pitch synthesis filter is then given by equation (1).

[Equation 1]

Here, Lp is generally 0 or 1. When the pitch period P and the filter coefficients βi are optimal, the output synp(n) of the pitch synthesis filter is given by equation (2).

[Equation 2]

[0009] Further, when linear prediction coefficients αi (0 ≤ i ≤ Ls, where i is the prediction order) are used as the spectral envelope information, the output of the synthesis filter (that is, the synthesized speech signal) syn(n) is given by equation (3).

[Equation 3]
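As a rough illustration of the two filters referred to by equations (1) and (3), the sketch below uses the textbook recursive forms synp(n) = exc(n) + Σ βi·synp(n − P − i) and syn(n) = synp(n) + Σ αi·syn(n − i) for i = 1..Ls. These forms, their sign conventions, and the function names are assumptions made for illustration; the equations of the source itself are not reproduced in this text and may differ in detail.

```python
def pitch_synthesis(exc, P, beta, history):
    """Assumed form: synp(n) = exc(n) + sum_i beta[i] * synp(n - P - i).
    `beta` maps tap offset i (-Lp..Lp) to a coefficient; `history` holds
    past outputs (most recent last) covering at least P + Lp samples."""
    Lp = max((abs(i) for i in beta), default=0)
    if P <= Lp or len(history) < P + Lp:
        raise ValueError("need P > Lp and at least P + Lp samples of history")
    buf = list(history)
    base = len(buf)
    for n, e in enumerate(exc):
        acc = e
        for i, b in beta.items():
            acc += b * buf[base + n - P - i]   # feedback from one period ago
        buf.append(acc)
    return buf[base:]

def lpc_synthesis(synp, alpha, history):
    """Assumed form: syn(n) = synp(n) + sum_{i=1..Ls} alpha[i-1] * syn(n - i).
    `history` holds at least the last Ls output samples."""
    if len(history) < len(alpha):
        raise ValueError("need at least Ls samples of history")
    buf = list(history)
    base = len(buf)
    for n, s in enumerate(synp):
        acc = s
        for i, a in enumerate(alpha, start=1):
            acc += a * buf[base + n - i]
        buf.append(acc)
    return buf[base:]

# A single pulse rings once per pitch period (P = 4, one tap of 0.5).
synp = pitch_synthesis([1.0] + [0.0] * 11, P=4, beta={0: 0.5}, history=[0.0] * 4)
syn = lpc_synthesis([1.0, 0.0, 0.0], alpha=[0.5], history=[0.0])
```

Because both filters feed their own past output back in, each output sample depends on the filter state left by earlier samples, which is the property the later discussion of the IIR pitch synthesis filter relies on.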

[0010] Next, the method of changing the speech rate in the pitch-predictive multi-pulse speech encoding/decoding scheme is described. In this scheme, the synthesized speech signal is generated by equations (2) and (3). To reproduce at R times normal speed, decoding is performed with the decoding frame length set to 1/R times the encoding frame length Lf. When the decoding frame is made longer than the encoding frame (R < 1), zero-amplitude pulses are appended after the multi-pulse excitation, the pitch component is generated through the pitch synthesis filter, and the articulation component is then generated through the synthesis filter. Conversely, when the decoding frame is made shorter than the encoding frame (R > 1), only the first 1/R of the multi-pulse excitation created from the excitation information is used as the decoding-time multi-pulse excitation, which is passed through the pitch synthesis filter to generate the pitch component and then through the synthesis filter. In this way, speech can be reproduced at R times the speed while the pitch period remains unchanged. In practice, the decoding frame length is made an integer multiple of the pitch period in order to reduce distortion at frame boundaries.
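The prior-art frame resizing described above can be sketched as follows. The rounding rule (nearest multiple of the pitch period, at least one period) is an assumption, since the text only states that the decoding frame length is an integer multiple of the pitch period; the function names are hypothetical.

```python
def decode_frame_length(Lf, R, P):
    """Decoding frame length for R-times speed: about Lf / R, snapped to a
    multiple of the pitch period P (assumed rule: nearest multiple, >= 1)."""
    target = Lf / R
    if P <= 0:                        # no pitch information: no snapping
        return round(target)
    k = max(1, round(target / P))
    return k * P

def resize_excitation(exc, new_len):
    """Truncate the multi-pulse excitation (R > 1) or append zero-amplitude
    samples after it (R < 1)."""
    if new_len <= len(exc):
        return exc[:new_len]
    return exc + [0.0] * (new_len - len(exc))

fast = decode_frame_length(160, 2.0, 35)   # 160 / 2 = 80 -> 2 periods
slow = decode_frame_length(160, 0.5, 35)   # 160 / 0.5 = 320 -> 9 periods
```

Note that the excitation is resized before the pitch synthesis filter runs, so the filter sees a modified input.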

[0011]

[Problems to be Solved by the Invention] However, the speech decoder in the pitch-predictive multi-pulse speech encoding/decoding scheme described above has the following problems.

[0012] As equation (1) shows, the pitch synthesis filter used in the speech decoder is of the IIR (infinite impulse response) type: the past output synp(n−P), one pitch period P earlier, affects the current output synp(n). Consequently, in R-times-speed reproduction by this speech decoder, forming the decoding-time multi-pulse excitation by appending zero components to the encoding-time multi-pulse excitation, or by truncating it, adversely affects the pitch synthesis filter output one pitch period P later, and as a result degrades the synthesized speech.

[0013] In the pitch-predictive multi-pulse encoding/decoding scheme described above, the encoding frame length of 20 milliseconds is long enough that the R-times-speed reproduction described above can handle even the lowest pitch (the long pitch period of a low male voice). However, when the frame length is short and the pitch period is longer than the frame length, the following difficulty arises in R-times-speed reproduction. The pitch period is obtained for each frame. Therefore, when a pitch period spans several frames, the correspondence between pitch periods and frames becomes ambiguous, and it is unclear which frame's pitch period should be used for R-times-speed reproduction. In particular, when half pitch or double pitch occurs in a frame, the pitch period of that frame differs greatly from that of the adjacent frames. Taking such cases into account, it is also not possible to use, for example, the average of the pitch periods of several frames as the pitch period.

[0014] Here, half pitch means that the pitch period is twice the true value, that is, the pitch frequency is half the true value. Double pitch means that the pitch period is half the true value, that is, the pitch frequency is twice the true value. The existence of half pitch and double pitch is one cause of erroneous values being obtained when the pitch period is determined at encoding time.
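A small numerical illustration shows why averaging per-frame pitch periods fails when a half-pitch or double-pitch estimate is present. The values below are invented for illustration only (an assumed true period of 50 samples).

```python
def mean_period(periods):
    """Plain average of per-frame pitch period estimates."""
    return sum(periods) / len(periods)

true_period = 50
half_pitch_frames = [50, 100, 50]     # middle frame: period estimated as doubled
double_pitch_frames = [50, 25, 50]    # middle frame: period estimated as halved

avg_half = mean_period(half_pitch_frames)       # about 66.7 samples
avg_double = mean_period(double_pitch_frames)   # about 41.7 samples
```

Neither average is a multiple of the true period, so deleting or repeating a segment of that length would splice the waveform at mismatched phases.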

[0015] Accordingly, an object of the present invention is to provide a speech decoding device with a variable-speed function that can be applied even when the pitch period is longer than the frame length and that causes little degradation in sound quality.

[0016]

[Means for Solving the Problems] To achieve the above object, the invention according to claim 1 is a speech decoding device having: an excitation generation unit that generates an excitation signal based on excitation information obtained by decoding a code sequence produced by a speech coding method using pitch prediction and linear prediction; a pitch synthesis filter that adds a pitch component to the excitation signal based on pitch prediction information obtained by decoding the code sequence; and a speech synthesis filter that synthesizes a speech signal from the excitation signal with the added pitch component, based on linear prediction information obtained by decoding the code sequence. The device is characterized in that it comprises a reproduction speed control unit that outputs a control signal for controlling the speech reproduction speed based on a reproduction speed factor, and in that the pitch synthesis filter, on receiving the control signal, either deletes or repeats a section, in units of the pitch period, of the excitation signal with the added pitch component and sends the result to the speech synthesis filter.

[0017] In the above configuration, the speech code sequence is decoded to obtain excitation information, pitch prediction information, and linear prediction information. When the excitation generation unit generates an excitation signal based on the excitation information, the pitch synthesis filter adds a pitch component to the excitation signal based on the pitch prediction information. The reproduction speed control unit then outputs a control signal corresponding to the reproduction speed factor. On receiving this control signal, the pitch synthesis filter deletes or repeats a section, in units of the pitch period, of the excitation signal with the added pitch component and sends the result to the speech synthesis filter. The speech synthesis filter then synthesizes a speech signal, based on the linear prediction information, from the pitch-bearing excitation signal on which the deletion or repetition has been performed.

[0018] Thus, sections of the excitation signal to which the pitch synthesis filter has added a pitch component are deleted or repeated in pitch-period units, and only then is the speech signal synthesized by the speech synthesis filter. As a result, speech is synthesized with its pitch unchanged and with the sound quality degradation caused by varying the reproduction speed kept small.

[0019] The invention according to claim 2 is the speech decoding device according to claim 1, characterized as follows. The pitch synthesis filter has excitation signal holding means that holds a predetermined section of the excitation signal with the added pitch component. The reproduction speed control unit has repeat/delete section detection means and repeat/delete processing means. The repeat/delete section detection means detects that the time length of the excitation signal held in the excitation signal holding means is equal to or greater than the pitch period of the current frame, and thereby detects that the held excitation signal contains a section eligible for deletion or repetition. The repeat/delete processing means outputs the control signal when the repeat/delete section detection means detects such a section. On receiving the control signal, the pitch synthesis filter either deletes or repeats a section of the held excitation signal, in units of the pitch period, and sends the result to the speech synthesis filter.

[0020] In the above configuration, a predetermined section of the excitation signal to which the pitch synthesis filter has added a pitch component is held in the excitation signal holding means. The reproduction speed control unit then uses the repeat/delete section detection means to detect that the time length of the held excitation signal is equal to or greater than the pitch period of the current frame, and thereby detects that the held excitation signal contains a section eligible for deletion or repetition. The repeat/delete processing means then outputs the control signal. When the pitch synthesis filter receives this control signal, the deletion section is deleted from the held excitation signal, or the repetition section is repeated in it, and the result is sent to the speech synthesis filter. That is, provided that the predetermined section length held in the excitation signal holding means is set, in whole frames, to at least the maximum pitch period, deletion and repetition in pitch-period units are reliably performed on the pitch-bearing excitation signal even when the pitch period is longer than the frame length.
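The holding and detection behaviour described above can be sketched as a small buffer class. This is only an illustrative reading: the class name `HeldExcitation`, its methods, and the choice to act on the oldest held samples are assumptions, not details given in the text.

```python
class HeldExcitation:
    """Holds pitch-bearing excitation samples that have not yet been sent
    to the speech synthesis filter (hypothetical helper)."""

    def __init__(self):
        self.samples = []

    def push_frame(self, frame):
        """Append one decoded frame of pitch-bearing excitation."""
        self.samples.extend(frame)

    def has_section(self, P):
        """A deletable/repeatable section exists once the held length is at
        least the current pitch period (P = 0 means no pitch information)."""
        return P > 0 and len(self.samples) >= P

    def delete_section(self, P):
        """Drop the oldest P samples; nothing is sent on."""
        del self.samples[:P]
        return []

    def repeat_section(self, P):
        """Send the oldest P samples twice."""
        seg = self.samples[:P]
        del self.samples[:P]
        return seg + seg

# Frame length 4, pitch period 10: a full period only exists after 3 frames.
held = HeldExcitation()
ready_after = []
for frame in ([0.0] * 4, [0.0] * 4, [1.0] * 4):
    held.push_frame(frame)
    ready_after.append(held.has_section(10))
sent = held.repeat_section(10)
```

Holding several frames is what lets a pitch period longer than one frame be handled as a single unit.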

[0021] The invention according to claim 3 is the speech decoding device according to claim 1, characterized in that the reproduction speed control unit has reproduction time difference detection means that detects the difference between the desired reproduction time up to the present, which is based on the reproduction speed factor, and the time actually reproduced up to the present. When the reproduction time difference detection means determines that the time actually reproduced has not yet reached the desired reproduction time, the reproduction speed control unit outputs the control signal so as to bring the difference to 0.

[0022] According to the above configuration, the reproduction speed control unit controls the reproduction speed so that the time actually reproduced up to the present matches the desired reproduction time based on the reproduction speed factor. As a result, speech is synthesized at the reproduction speed given by the reproduction speed factor.
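One way to picture this time-difference control is the sketch below, which tracks the desired and actual reproduced lengths in samples and acts whenever they differ by at least one pitch period. The sign convention (difference = actual − desired), the one-period threshold, and the function name `plan_frames` are assumptions for illustration, and the sketch ignores whether a full period is currently held.

```python
def plan_frames(frame_len, pitch_periods, R):
    """Return (actions, actual, desired) after processing one frame per entry
    of `pitch_periods`. Lengths are in samples."""
    desired = 0.0     # output length wanted so far: input length / R
    actual = 0        # output length produced so far
    actions = []
    for P in pitch_periods:
        desired += frame_len / R
        actual += frame_len
        diff = actual - desired
        if P > 0 and diff >= P:        # too much output: drop one period
            actual -= P
            actions.append("delete")
        elif P > 0 and diff <= -P:     # too little output: repeat one period
            actual += P
            actions.append("repeat")
        else:
            actions.append("pass")
    return actions, actual, desired

# Pitch period (50) longer than the frame (40), double-speed playback.
fast_actions, fast_actual, fast_desired = plan_frames(40, [50] * 10, 2.0)
# Same signal at half speed.
slow_actions, slow_actual, slow_desired = plan_frames(40, [50] * 5, 0.5)
```

In both runs the accumulated difference returns to 0, so the long-term output rate follows R even though each individual edit is a whole pitch period.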

[0023] The invention according to claim 4 is the speech decoding device according to claim 3, characterized as follows. When the reproduction speed is being slowed and the reproduction time difference detection means determines that the difference between the desired reproduction time and the time actually reproduced has become equal to or less than a predetermined negative value, the reproduction speed control unit outputs a control signal that causes a repetition section, in units of the pitch period, of the pitch-bearing excitation signal to be repeated multiple times and sent to the speech synthesis filter, so that the difference between the desired reproduction time and the time actually reproduced quickly approaches 0.

[0024] According to the above configuration, when the time actually reproduced up to the present fails to approach the desired reproduction time, so that the difference between the desired reproduction time and the time actually reproduced becomes equal to or less than the predetermined negative value, the repetition section of the pitch-bearing excitation signal is repeated multiple times, and the actual reproduction time is controlled optimally so as to reach the desired reproduction time.

[0025] The invention according to claim 5 is a speech decoding device having: an excitation generation unit that generates an excitation signal based on excitation information obtained by decoding a code sequence produced by a speech coding method using pitch prediction and linear prediction; a pitch synthesis filter that adds a pitch component to the excitation signal based on pitch prediction information obtained by decoding the code sequence; and a speech synthesis filter that synthesizes a speech signal from the excitation signal with the added pitch component, based on linear prediction information obtained by decoding the code sequence. The device is characterized in that it comprises: a reproduction speed factor determination unit that determines whether the value of the reproduction speed factor is 1 or more and outputs a signal representing the result; a deletion processing unit that outputs a first control signal on receiving, from the reproduction speed factor determination unit, a signal indicating that the value of the reproduction speed factor is 1 or more; and a repetition processing unit that outputs a second control signal on receiving, from the reproduction speed factor determination unit, a signal indicating that the value of the reproduction speed factor is less than 1. On receiving the first control signal, the pitch synthesis filter deletes a deletion section, in units of the pitch period, of the pitch-bearing excitation signal and sends the result to the speech synthesis filter. On receiving the second control signal, the pitch synthesis filter repeats a repetition section, in units of the pitch period, of the pitch-bearing excitation signal and sends the result to the speech synthesis filter.

[0026] In the above configuration, when the reproduction speed factor determination unit determines that the value of the reproduction speed factor is 1 or more, the deletion processing unit outputs the first control signal. The pitch synthesis filter that receives this first control signal deletes a deletion section, in units of the pitch period, of the pitch-bearing excitation signal and sends the result to the synthesis filter. Conversely, when the reproduction speed factor determination unit determines that the value of the reproduction speed factor is less than 1, the repetition processing unit outputs the second control signal. The pitch synthesis filter that receives this second control signal repeats a repetition section, in units of the pitch period, of the pitch-bearing excitation signal and sends the result to the synthesis filter. In this way, depending on the value of the reproduction speed factor, processing is switched between fast-listening processing, which makes the reproduction speed faster than normal, and slow-listening processing, which makes it slower.
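The switching between the first and second control signals can be sketched as below. The string constants, the function names, and the decision to act on the first held period are hypothetical choices made for illustration.

```python
FIRST_CONTROL = "delete"    # stands in for the first control signal (R >= 1)
SECOND_CONTROL = "repeat"   # stands in for the second control signal (R < 1)

def select_control(R):
    """Choose the control signal from the reproduction speed factor R."""
    if R <= 0:
        raise ValueError("R must be positive")
    return FIRST_CONTROL if R >= 1.0 else SECOND_CONTROL

def apply_control(control, held, P):
    """Apply one deletion or repetition of length P to the held,
    pitch-bearing excitation and return what is sent to the synthesis filter."""
    if P <= 0 or len(held) < P:
        return list(held)             # no full period available: pass through
    if control == FIRST_CONTROL:
        return held[P:]               # fast listening: drop one period
    return held[:P] + held            # slow listening: play one period twice

faster = apply_control(select_control(2.0), [1, 2, 3, 4, 5], 2)
slower = apply_control(select_control(0.5), [1, 2, 3, 4, 5], 2)
```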

[0027]

[Embodiments of the Invention] The present invention is described in detail below with reference to the illustrated embodiment. FIG. 1 is a block diagram of the speech decoding device of this embodiment. The coding scheme applied to this speech decoding device is a speech coding scheme that models the mechanism of speech production, such as the CELP coding scheme or the pitch-predictive multi-pulse coding scheme described above.

[0028] The CELP coding scheme uses a short frame length, typically about 5 milliseconds. Like the pitch-predictive multi-pulse coding scheme, CELP is a speech coding scheme that models the mechanism of speech production, and it uses a noise component as the excitation containing no pitch component. That is, the encoder and the decoder share a common noise codebook as the excitation source, and at encoding time the optimum noise is searched for as the excitation.

[0029] The demultiplexer 1 decodes and extracts, from the code sequence produced by an encoding device (not shown) corresponding to this decoding device, the excitation information (pulse amplitudes and positions), the pitch information (pitch period P and pitch prediction coefficient β), and the spectral envelope information (linear prediction coefficients α), all in frame units (frame length = Lf). The reproduction speed control unit 2 controls the pitch synthesis filter 4, the linear prediction coefficient memory 5, and the synthesis filter 6, based on the pitch period P and the pitch prediction coefficient β from the demultiplexer 1, so as to produce reproduced speech in accordance with the reproduction speed factor R. Part of the excitation synp(n) with the added pitch component, which the pitch synthesis filter 4 generates as described in detail later, is deleted or repeated to create the excitation synp'(n) (0 ≤ n < L, where L is the length of the excitation), and speech synthesis is performed based on this pitch-bearing excitation synp'(n). The excitation length L does not depend on the frame length Lf.

[0030] The section that the reproduction speed control unit 2 deletes or repeats is determined so as to equal the pitch period P. Depending on the coding scheme, there may be frames that do not use pitch information, for example when unvoiced sound is synthesized. In such cases the pitch period is set to P = 0 so that processing proceeds without difficulty; that is, the length of the section to be deleted or repeated is treated as 0. The reproduction speed control unit 2 is described in more detail later.

[0031] The pitchless excitation generation unit 3 generates an excitation exc(n) (0 ≤ n < Lf) containing no pitch component, based on the excitation information from the demultiplexer 1. When the coding scheme is the CELP coding scheme, the pitchless excitation generation unit 3 has a noise codebook, extracts a noise signal from this codebook according to the excitation information, and outputs it as the excitation containing no pitch component. When the coding scheme is the multi-pulse coding scheme, the pitchless excitation generation unit 3 generates a pulse train from the amplitudes and positions given by the excitation information. The pitch synthesis filter 4 generates, from the pitchless excitation exc(n) produced by the pitchless excitation generation unit 3, the excitation synp(n) (0 ≤ n < Lf) with the pitch component added. The pitch synthesis filter 4 is also described in detail later.

[0032] The linear prediction coefficient memory 5 stores the linear prediction coefficients α for (F_max + 1) frames, namely the current frame and the past F_max frames. F_max is obtained from equation (4):

F_max = ceil(P_max / Lf) … (4)

where P_max is the maximum value of the pitch period and ceil(x) is the smallest integer not less than x.
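As a minimal sketch of equation (4), the following Python function computes F_max with integer arithmetic. The function name and the numeric values (P_max = 147, Lf = 40) are assumptions chosen only for illustration; the source does not state concrete values here.

```python
def history_frame_count(p_max: int, lf: int) -> int:
    """Equation (4): F_max = ceil(P_max / Lf), computed without floats."""
    if p_max < 0 or lf <= 0:
        raise ValueError("expected p_max >= 0 and lf > 0")
    return -(-p_max // lf)


# Hypothetical figures, not taken from the source: P_max = 147, Lf = 40.
f_max = history_frame_count(147, 40)
stored_frames = f_max + 1  # current frame plus the past F_max frames
```

Integer ceiling division is used so that the result does not depend on floating-point rounding.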

[0033] Let the stored linear prediction coefficients be α^k (k: an integer with −F_max ≤ k ≤ 0, where the current frame is k = 0). In accordance with the control signal from the reproduction speed control unit 2, the linear prediction coefficient memory 5 outputs the linear prediction coefficient α^k corresponding to the sound source synp'(n) to the synthesis filter 6. Then, when the operation of the synthesis filter 6 under the control of the reproduction speed control unit 2 has finished, the stored linear prediction coefficients α^k are updated by equation (5):

α^(k−1) ← α^k (−F_max + 1 ≤ k ≤ 0) … (5)
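The coefficient memory and the shift of equation (5) might be modelled as below. This is a sketch only: the class name, the slot layout, and the choice to leave α^0 in place after the shift until the next frame overwrites it are assumptions, not details given in the source.

```python
class CoefficientMemory:
    """Holds alpha^k for -F_max <= k <= 0 (k = 0 is the current frame)."""

    def __init__(self, f_max: int, order: int):
        self.f_max = f_max
        # slots[i] holds alpha^k with k = i - f_max, so slots[-1] is alpha^0.
        self.slots = [[0.0] * order for _ in range(f_max + 1)]

    def store_current(self, alpha):
        """Record the coefficients of the current frame as alpha^0."""
        self.slots[-1] = list(alpha)

    def get(self, k: int):
        """Return alpha^k for -F_max <= k <= 0."""
        if not -self.f_max <= k <= 0:
            raise IndexError("k must satisfy -F_max <= k <= 0")
        return self.slots[k + self.f_max]

    def shift(self):
        """Equation (5): alpha^(k-1) <- alpha^k for -F_max + 1 <= k <= 0."""
        self.slots = self.slots[1:] + [list(self.slots[-1])]
```

After each call to `shift`, the oldest entry is discarded and every remaining entry moves one frame further into the past.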

[0034] Based on the control signal from the reproduction speed control unit 2, the synthesis filter 6 receives the sound source synp'(n), obtained by deleting or repeating part of the pitch-bearing sound source synp(n) from the pitch synthesis filter 4, and receives the linear prediction coefficient α^k suited to this sound source synp'(n) from the linear prediction coefficient memory 5. It then performs filtering according to equation (3) to generate the synthesized speech syn(n) (0 ≤ n < L). As noted above, the synthesis filter 6 does not perform its filtering in frame units. Therefore, when real-time reproduction is required, a buffer memory such as a FIFO (First In First Out) memory is needed that temporarily stores the output data of the synthesis filter 6 and can output the stored data one item at a time at fixed intervals.
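The role of such a buffer can be illustrated with a small software FIFO. This is a sketch under assumptions: the class and method names are hypothetical, and returning a default value when the buffer is empty is an illustrative choice rather than behaviour described in the source.

```python
from collections import deque


class OutputFifo:
    """Decouples variable-length synthesis output from fixed-rate playback."""

    def __init__(self):
        self._samples = deque()

    def push_segment(self, segment):
        """Store one synthesized segment; its length L may differ from Lf."""
        self._samples.extend(segment)

    def pop_sample(self, default=0.0):
        """Return the oldest stored sample, called once per output period."""
        return self._samples.popleft() if self._samples else default

    def __len__(self):
        return len(self._samples)
```

The producer side may push segments of any length, including empty ones for frames in which synthesis is withheld, while the consumer side always takes exactly one sample per tick.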

[0035] Next, the pitch synthesis filter 4 is described in detail. FIG. 2 is a detailed block diagram of the pitch synthesis filter 4. The pitch synthesis filter 4 comprises an internal filter memory 11, a filter output memory 12, a multiplier 13, and an adder 14. In the present embodiment, the maximum value Lp of the order i in equation (1) is set to Lp = 0. That is, the pitch synthesis filter 4 performs the pitch synthesis filtering calculation according to equation (6):

synp(n) = exc(n) + β·synp(n−P) (0 ≤ n < Lf) … (6)

The pitch-bearing sound source synp(n) for one frame calculated in this way (Lf data items) is stored in the filter output memory 12. The sound source synp(n) of the current frame stored in the filter output memory 12 is transferred to the internal filter memory 11 when the sound source synp(n) of the next frame is calculated. The internal filter memory 11 thus holds the pitch-bearing sound source synp(n) for F_max frames. In other words, the pitch synthesis filter 4 calculates the sound source synp(n) of the current frame according to equation (6), using the sound source synp(n−Lf) of the previous frame read from the internal filter memory 11, and stores it in the filter output memory 12.

[0036] The internal filter memory 11 therefore needs the length Lim given by equation (7):

Lim = F_max·Lf … (7)

As shown in FIG. 2, the addresses of the internal filter memory 11 are assigned in order as −Lim, −Lim+1, …, −1. Each address stores the pitch-bearing sound source synp(n) at time n of an earlier frame. For example, the sound source synp(−1) stored at address "−1" of the internal filter memory 11 is the sound source synp(Lf−1) calculated at time "Lf−1" of the previous frame. The filter output memory 12, on the other hand, only needs to hold one frame, so its length equals the frame length Lf, and its addresses are assigned in order as 0, 1, …, (Lf−1). In FIG. 2 a single memory is divided into the internal filter memory 11 and the filter output memory 12, but separate memories may of course be used instead.
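The length of equation (7) and the negative addressing scheme can be sketched as follows. The function names and the example values are assumptions; the mapping simply places address −Lim at array index 0 and address −1 at index Lim − 1.

```python
def internal_memory_length(p_max: int, lf: int) -> int:
    """Equation (7): Lim = F_max * Lf, with F_max = ceil(P_max / Lf)."""
    f_max = -(-p_max // lf)
    return f_max * lf


def address_to_index(address: int, lim: int) -> int:
    """Map an internal-memory address in [-Lim, -1] to an array index."""
    if not -lim <= address < 0:
        raise IndexError("address must satisfy -Lim <= address < 0")
    return address + lim


# Hypothetical figures, not taken from the source: P_max = 147, Lf = 40.
lim = internal_memory_length(147, 40)
```

With these example values Lim is a whole number of frames that covers P_max, which is the property the description relies on.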

[0037] The major difference between the pitch synthesis filter 4 of the present embodiment and the pitch synthesis filter of a conventional speech decoding device with a variable-speed function is the length of the internal filter memory 11. In the conventional speech decoding device that deletes or inserts waveforms in pitch period units, the internal filter memory of the pitch synthesis filter has a length equal to the maximum pitch period P_max. In the present speech decoding device, by contrast, the pitch synthesis filtering result synp(n) of the current frame is sometimes not used in the current frame but used later, as detailed below, so the internal filter memory 11 also serves as a buffer for that case. The length required of that buffer is Lim, the smallest multiple of the frame length that is not less than the maximum pitch period P_max. With a buffer of length Lim, no distortion arises in the waveform at the junction when a waveform is deleted or inserted in pitch period units. The internal filter memory 11 could of course be given the length P_max, as in the conventional internal filter memory, with a separate buffer of length Lim provided, but that would clearly be wasteful.

[0038] The pitch synthesis filter 4 configured as above performs the pitch synthesis filtering calculation as follows. To carry out the computation of equation (6), it receives the pitch information, namely the pitch period P and the pitch prediction coefficient β, from the demultiplexer 1 in frame units. At time n (0 ≤ n < Lf), if n is smaller than the pitch period P, it reads the data synp(n−P) from the internal filter memory 11; if n is equal to or greater than P, time (n−P) lies within the current frame, so it reads synp(n−P) from the filter output memory 12. The read data synp(n−P) is multiplied by β in the multiplier 13 and then added by the adder 14 to the sound source exc(n) generated by the pitchless sound source generation unit 3. The pitch-bearing sound source synp(n) calculated in this way is stored at address n of the filter output memory 12. When this operation is complete, the contents of the internal filter memory 11 are updated by the following expression in preparation for the calculation of the next frame:

synp(n) ← synp(n+Lf) (−Lim ≤ n < 0)
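The per-frame operation described above can be sketched in Python as follows. The function names are hypothetical, plain lists stand in for the two memories, and the sketch assumes 1 ≤ P ≤ Lim; frames without pitch information (P = 0) are not handled here.

```python
def pitch_synthesis_frame(exc, internal, beta, p):
    """Equation (6): synp(n) = exc(n) + beta * synp(n - P) for 0 <= n < Lf.

    `internal` models the internal filter memory: a list of length Lim
    holding synp(-Lim) .. synp(-1). The returned list models the filter
    output memory (addresses 0 .. Lf - 1).
    """
    lim = len(internal)
    if not 1 <= p <= lim:
        raise ValueError("this sketch assumes 1 <= P <= Lim")
    out = []
    for n in range(len(exc)):
        if n < p:
            past = internal[lim + (n - p)]  # address n - P is negative
        else:
            past = out[n - p]               # n - P lies in the current frame
        out.append(exc[n] + beta * past)
    return out


def update_internal_memory(internal, out):
    """synp(n) <- synp(n + Lf) for -Lim <= n < 0 (drop the oldest Lf items)."""
    return (internal + out)[len(out):]
```

Keeping the update as a separate function mirrors the description, where the internal filter memory is refreshed only after the frame has been computed.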

[0039] The processing of the reproduction speed control unit 2 differs depending on whether the reproduction speed magnification R is greater or less than 1. The description below is therefore divided into the case R ≥ 1 (fast-listening reproduction and normal reproduction) and the case R ≤ 1 (slow-listening reproduction and normal reproduction). The reproduction speed magnification R relates to the reproduction time as follows: if the reproduction time at normal speed is Ln, the desired reproduction time Lh at magnification R is Lh = Ln/R.

[0040] (a) When the reproduction speed magnification R ≥ 1 (fast-listening reproduction and normal reproduction)
The reproduction speed control unit 2 searches the pitch-bearing sound source synp(n), obtained from the pitch synthesis filtering calculation in the pitch synthesis filter 4, for a section that can be deleted, as follows. From equation (2), the sound source synp(n) (0 ≤ n < Lf) resulting from the pitch synthesis filtering calculation in the current frame is similar to the sound source synp(n−P) one pitch period P earlier. The sound source synp(n) can therefore also be regarded as similar in the vicinity of time n = 0 and in the vicinity of time n = −P. Moreover, the more stationary the sound source synp(n) is, the closer the pitch prediction coefficient β is to 1. Accordingly, synp(n) (−P ≤ n < 0) is taken as the deletion section.
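Why deleting exactly one pitch period leaves a smooth junction can be seen on an idealized, perfectly periodic signal. The sketch below is illustrative only: the function name, the sawtooth test signal, and the values P = 5 and Lf = 4 are assumptions, and a real sound source is only approximately periodic.

```python
def delete_one_pitch_period(history, current, p):
    """Drop synp(n) for -P <= n < 0 and join the remainder to the current frame.

    `history` holds synp(n) for n < 0 (its last element is n = -1);
    `current` holds synp(n) for 0 <= n < Lf.
    """
    if not 0 <= p <= len(history):
        raise ValueError("P must not exceed the stored history length")
    return history[:len(history) - p] + current


# Idealized example: a sawtooth with period P = 5 and a frame of Lf = 4.
P, LF = 5, 4
history = [i % P for i in range(3 * P)]
current = [(3 * P + j) % P for j in range(LF)]
joined = delete_one_pitch_period(history, current, P)
```

For this ideal signal the joined sequence is itself periodic with period P, so the junction is indistinguishable from the rest of the waveform.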

[0041] After this deletion, if the pitch prediction coefficient β is not close to 1, the vicinity of the current frame is a non-stationary section to begin with, so distortion caused by the discontinuity at the junction left by the deletion is hard to perceive. If β is close to 1, the vicinity of the current frame is a stationary section to begin with, so distortion caused by a discontinuity at the junction would be easy to perceive. However, the discontinuity produced by joining time n = −P to time n = 0 is small, so the distortion is small as well, and as a result it is hard to perceive.

[0042] A specific example for the case where the pitch prediction coefficient β is close to 1 is described below. FIG. 6(a) shows the sound source waveform generated by the pitch synthesis filter 4, and FIG. 6(b) shows the sound source waveform after deletion. A sound source actually consists of pulses and noise components, but in FIG. 6 the waveform is drawn as a triangular wave to make the continuity after deletion easier to see. In FIG. 6(a), the section most similar to the sound source waveform of the current frame, 0 ≤ n < Lf, is the section −P ≤ n < (−P + Lf) one pitch period P away, and the waveform near time −P resembles the waveform near time 0. Therefore, when the waveform in the section −P ≤ n < 0 of FIG. 6(a) is deleted and the waveform of the current frame is left intact, as in FIG. 6(b), the distortion at junction A in the waveform after deletion can be kept small. The waveform shown in FIG. 6 is for the case Lf/P < 1. The waveform for the case Lf/P ≥ 1 is as shown in FIG. 7.

[0043] In the present embodiment, as detailed later, sound source deletion is not performed on fixed-length sections at fixed intervals. The reproduction time must therefore be adjusted during speech synthesis so that it matches the reproduction speed magnification R. A variable rem for adjusting the reproduction time length is used for this purpose. Each time the reproduction speed control unit 2 has the pitch synthesis filter 4 send a sound source synp'(n) to the synthesis filter 6, it calculates the adjustment variable rem by equation (8):

rem(x) = rem(x−1) + Lpb·(R − 1) − Lc … (8)

where Lpb is the time length of the section actually reproduced, Lc is the time length of the section deleted, and x is the number of times rem has been calculated.

[0044] That is, the adjustment variable rem represents R times the difference between the length of time for which speech has actually been reproduced so far and the desired reproduction time length; its initial value rem(0) is 0. Thus, when rem is less than 0, the actual reproduction time is shorter than the desired reproduction time, and when rem is greater than 0, the actual reproduction time is longer than the desired reproduction time. When rem is 0, the actual and desired reproduction times coincide.

[0045] In the present embodiment, the adjustment variable rem is calculated each time a deletion process removes sound source data in units of the pitch period P from the pitch-bearing sound source synp(n) generated by the pitch synthesis filter 4, in order to monitor whether the length of time for which speech has actually been reproduced so far has reached the desired reproduction time length.

[0046] Also in the present embodiment, in order to shorten the reproduction time while speech synthesis proceeds, speech synthesis is temporarily withheld over several frames whose total length exceeds the pitch period P. A deletion section is searched for within the frames for which speech synthesis has been withheld, and if one exists, the sound source is deleted in units of the pitch period P from the sound source synp(n) of the preceding frames stored in the internal filter memory 11 (length Lim), and the result is output as the sound source synp'(n).

[0047] FIG. 8 is a flowchart of the fast-listening reproduction processing executed by the demultiplexer 1, the reproduction speed control unit 2, the pitchless sound source generation unit 3, and the pitch synthesis filter 4. The fast-listening reproduction processing is described below with reference to FIG. 8, focusing mainly on the operation of the reproduction speed control unit 2. All steps in the flowchart other than step S2 are performed by the reproduction speed control unit 2.

[0048] In step S1, the adjustment variable rem and the count cnt of frames for which speech synthesis has not been performed are each initialized to 0. In step S2, the demultiplexer 1, the pitchless sound source generation unit 3, and the pitch synthesis filter 4 generate the pitch-bearing sound source synp(n) for one frame, which is stored in the filter output memory 12 of the pitch synthesis filter 4. At this stage, however, the pitch synthesis filter 4 does not yet update the contents of the internal filter memory 11. The linear prediction coefficient memory 5 stores the linear prediction coefficient α^0 of the current frame.

[0049] In step S3, it is determined whether the value of the adjustment variable rem is positive. If it is positive, the process proceeds to step S5; otherwise it proceeds to step S4. This step determines whether a deletion process needs to be applied to the sound source synp(n) generated by the pitch synthesis filter 4. A positive rem means that the actual reproduction time so far is longer than the desired reproduction time and that more of the generated sound source synp(n) needs to be deleted. When rem is positive, therefore, the process moves to step S5 to determine whether a deletable section exists in the section for which speech synthesis has not yet been performed (that is, the section whose sound source synp(n) is stored in the internal filter memory 11 of the pitch synthesis filter 4). When rem is 0 or less, it is judged that nothing needs to be deleted from the generated sound source synp(n), and the process moves to step S4 to perform speech synthesis for the current frame.

[0050] In step S4, a control signal is output to the pitch synthesis filter 4 causing it to read the sound source synp(n) (0 ≤ n < Lf) of the current frame, generated in step S2 and stored in the filter output memory 12, and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5 causing it to send the linear prediction coefficient α^0 of the current frame to the synthesis filter 6. A control signal is then output to the synthesis filter 6 causing it to synthesize the speech of the current frame. After that, the frame count cnt is set to 0, the adjustment variable rem is calculated according to equation (8), and the process proceeds to step S18. In this case, the sound source length (data length) L = actual reproduction time length Lpb = frame length Lf, the deletion section time length Lc = 0, and the frame count cnt = 0. The synthesis filter 6 therefore performs speech synthesis at normal speed.

[0051] In step S5, it is determined whether the following condition holds:

P ≤ cnt·Lf

If the condition holds, the process proceeds to step S7; otherwise it proceeds to step S6. This step determines whether the total length of the frames for which speech synthesis has not been performed has reached the pitch period P or more. If it has, then, as shown in FIG. 6(a), a section similar to the sound source synp(n) of the current frame (0 ≤ n < Lf) generated in step S2 is stored in the internal filter memory 11 of the pitch synthesis filter 4 (−P ≤ n < (−P + Lf)). The similar section is therefore judged to be deletable, and the process moves to step S7, where the similar section is deleted from the section not yet synthesized (the section whose sound source is stored in the internal filter memory 11) and speech is synthesized. If the total length is less than the pitch period P, it is judged that the frames not yet synthesized contain no deletable section, and the process moves to step S6 to wait until the total length of the frames not synthesized reaches the pitch period P or more.

[0052] In step S6, the count cnt of frames for which speech synthesis has not been performed is incremented. The process then proceeds to step S18. That is, no speech synthesis is performed for the current frame.
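The branching of steps S3 and S5 can be summarized in a few lines. This is a rough sketch of the decision logic only, with a hypothetical function name and string labels standing in for the flowchart steps of FIG. 8; it performs no synthesis.

```python
def next_step(rem, cnt, p, lf):
    """Return the flowchart step reached from step S3 for the current frame."""
    if rem <= 0:
        return "S4"  # nothing to delete: synthesize the current frame
    if p <= cnt * lf:
        return "S7"  # withheld frames span at least one pitch period
    return "S6"      # keep withholding: cnt is incremented, no synthesis
```

For example, with P = 60 and Lf = 40, a positive rem leads to S6 until two frames have been withheld, after which the same call returns S7.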

[0053] In step S7, the minimum value k_min of the integers k (0 < k ≤ cnt) satisfying the condition

P ≤ k·Lf

is obtained. In step S8, it is determined whether the minimum value k_min obtained in step S7 equals the count cnt of frames not synthesized. If they are equal, the process proceeds to step S10; otherwise it proceeds to step S9. That is, steps S7 and S8 determine whether the section not yet synthesized consists only of frames containing the deletion section (−P ≤ n < 0), as in FIG. 6(a), or also includes a frame that does not contain the deletion section (−2Lf ≤ n < −Lf), as in FIG. 7(a). If there is a frame that does not contain the deletion section, the process moves to step S9 to perform speech synthesis for the frames that do not contain the deletion section.
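The search of step S7 and the comparison of step S8 might look like this in Python. The function names are hypothetical, and returning None when no admissible k exists is an assumption added for completeness; in the flowchart, step S5 ensures that a valid k exists before step S7 is reached.

```python
def smallest_frame_chain(p, lf, cnt):
    """Step S7: smallest integer k with 0 < k <= cnt and P <= k * Lf."""
    for k in range(1, cnt + 1):
        if p <= k * lf:
            return k
    return None  # not reached when step S5 has already confirmed P <= cnt * Lf


def has_frames_without_deletion(p, lf, cnt):
    """Step S8: True when k_min differs from cnt, so step S9 is needed."""
    k_min = smallest_frame_chain(p, lf, cnt)
    return k_min is not None and k_min != cnt
```

With P = 60 and Lf = 40, two withheld frames give k_min = cnt = 2 and the process goes to step S10, whereas three withheld frames leave one frame outside the deletion section for step S9.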

[0054] In step S9, speech synthesis of those frames in the not-yet-synthesized section that do not contain the deletion section is controlled as follows. A control signal is output to the pitch synthesis filter 4 causing it to read the data synp(n) (−k·Lf ≤ n < −(k−1)·Lf) from the internal filter memory 11 in the order k = cnt, (cnt−1), …, (k_min+1) and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5 causing it to read the linear prediction coefficient α^k corresponding to the read data and send it to the synthesis filter 6. A control signal is then output to the synthesis filter 6 causing it to synthesize the speech of the frames that do not contain the deletion section. As a result, the synthesis filter 6 synthesizes the speech of those frames. The data length L in this case is Lf.

[0055] In step S10, k_min·Lf is calculated from the minimum value k_min obtained in step S7, and it is determined whether k_min·Lf equals the pitch period P. If they are equal, the process proceeds to step S12; otherwise it proceeds to step S11. That is, this step determines whether the time length of the shortest chain of frames whose length is at least the pitch period P is exactly equal to P. If it is, the section not yet synthesized coincides with the deletion section, so the speech synthesis processing ends there. If it is not, the section not yet synthesized extends beyond the deletion section, so the process moves to step S11 to perform speech synthesis for the frame containing the deletion section.

[0056] In step S11, speech synthesis of the frame in the not-yet-synthesized section that contains the deletion section is controlled as follows. A control signal is output to the pitch synthesis filter 4 causing it to read the data synp(n) (−k_min·Lf ≤ n < −P) from the internal filter memory 11 (corresponding to the sound source waveform in the section −2Lf ≤ n < −P of FIG. 6(a)) and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5 causing it to read the linear prediction coefficient α^kmin corresponding to the read data (kmin here denotes k_min above) and send it to the synthesis filter 6. A control signal is then output to the synthesis filter 6 causing it to synthesize the speech of the frame containing the deletion section. As a result, the synthesis filter 6 synthesizes the speech of that frame. The data length L in this case is (k_min·Lf − P).
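Steps S9 to S11 can be summarized by listing the address ranges of the internal filter memory that are sent to the synthesis filter. The sketch below is an illustration with a hypothetical function name; it returns half-open (start, end) ranges and assumes that step S5 has already confirmed P ≤ cnt·Lf.

```python
def segments_to_synthesize(p, lf, cnt):
    """Address ranges [start, end) read from the internal filter memory."""
    k_min = next(k for k in range(1, cnt + 1) if p <= k * lf)  # step S7
    segments = []
    for k in range(cnt, k_min, -1):       # step S9: k = cnt, ..., k_min + 1
        segments.append((-k * lf, -(k - 1) * lf))
    if k_min * lf != p:                   # step S10 decides on step S11
        segments.append((-k_min * lf, -p))
    return segments


def total_length(segments):
    """Total number of samples covered by the listed ranges."""
    return sum(end - start for start, end in segments)
```

The total length of the returned ranges equals cnt·Lf − P, which is the value used for Lpb when rem is recalculated after the deletion.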

[0057] In step S12, the adjustment variable rem is calculated according to equation (8), with Lpb = cnt·Lf − P and Lc = P. In step S13, it is determined whether the adjustment variable rem calculated in step S12 is positive. If it is positive, the process proceeds to step S15; otherwise it proceeds to step S14. In step S14, as in step S4, speech synthesis at normal speed is performed for the current frame, the frame count cnt is reset to its initial value, and the adjustment variable rem is calculated.

【００５８】ステップＳ15で、次の条件が成立するか否
かが判別される。Ｐ≦cnt・Ｌf その結果、上記条件が成立する場合にはステップＳ17に
進み、そうでなければステップＳ16に進む。このステッ
プでは、音声合成処理を行っていないフレームの長さが
ピッチ周期Ｐ以上であるか否かを判定することによっ
て、現フレームにおける削除区間の有無の判定を行うの
である。すなわち、ピッチ周期Ｐ以上である場合には、
現フレームのデータ内にも削除区間が存在する。つま
り、図７(a)に示すように、ピッチ周期で見ると、synp
(n-Ｐ)(−Ｐ≦ｎ＜０)と synp(n)((ｍ−１)Ｐ≦ｎ＜ｍ
Ｐ)とが相似であることを意味している。ここで、ｍは
１≦ｍ＜Ｌf/Ｐの範囲内にある整数である。そこで、ピ
ッチ周期Ｐ以上である場合には、上記ステップＳ17に移
行して、ｍの最大値をｍ_max(すなわち、現フレーム内に
含まれるピッチ周期Ｐの数)として、現フレーム内の相
似区間内にあるｍ個(０≦ｍ≦ｍ_max)のピッチ周期Ｐに
属する音源synp(n)を削除するのである。その際に、ｍ
が大きいほど(つまり、現フレーム内の削除区間を大き
く取るほど)上記削除区間を含むフレームと現フレーム
との接続箇所でずれが生じ易く、再生音声の音質は劣化
しやすい。しなしながら、式(８)による再生速度の調節
はし易くなる。そこで、本実施の形態では、ｍ＝ｍ_max
とする。In step S15, it is determined whether the following condition holds: P ≦ cnt·Lf. If the condition holds, the process proceeds to step S17; otherwise it proceeds to step S16. This step judges whether the current frame contains a deletion section by checking whether the length of the frames not yet subjected to voice synthesis is equal to or longer than the pitch period P. If that length is equal to or longer than P, a deletion section also exists in the data of the current frame. In other words, as shown in FIG. 7(a), in terms of the pitch period, synp(n−P) (−P ≦ n < 0) and synp(n) ((m−1)P ≦ n < mP) are similar. Here, m is an integer in the range 1 ≦ m < Lf/P. Therefore, when the length is equal to or longer than the pitch period P, the process proceeds to step S17, where, with the maximum value of m denoted m_max (that is, the number of pitch periods P contained in the current frame), the sound source synp(n) belonging to m pitch periods P (0 ≦ m ≦ m_max) within the similar section of the current frame is deleted. The larger m is (that is, the larger the deletion section taken in the current frame), the more easily a mismatch arises at the junction between the frame containing the deletion section and the current frame, and the more easily the quality of the reproduced voice deteriorates. On the other hand, the reproduction speed becomes easier to adjust by equation (8). In the present embodiment, therefore, m = m_max.

【００５９】ステップＳ16で、音声合成処理を行わない
フレーム数cntに“１"がセットされてステップＳ18に進
む。つまり、これまで音声合成処理を行っていない区間
がピッチ周期Ｐより小さいので現フレームのデータには
削除区間はないとし、現フレームに対する音声合成処理
は行わずに、次フレームと一緒に行うのである。In step S16, the number of frames cnt not subjected to voice synthesis is set to "1", and the process proceeds to step S18. That is, because the section not yet subjected to voice synthesis is shorter than the pitch period P, the data of the current frame is regarded as containing no deletion section, and voice synthesis for the current frame is deferred and performed together with the next frame.

【００６０】ステップＳ17で、上記ピッチ合成フィルタ
４に対して、フィルタ出力メモリ１２からsynp(n)(ｍ
_max・Ｐ≦ｎ＜Ｌf)のデータを読み出して合成フィルタ６
に送出させる制御信号が出力される。また、線形予測係
数メモリ５に対して、現フレームの線形予測係数α⁰を
合成フィルタ６に送出させる制御信号が出力される。そ
して、合成フィルタ６に対して、現フレームにおける削
除区間以外の区間の音声合成を行わせる制御信号が出力
されるのである。その結果、上記合成フィルタ６によっ
て、現フレームにおける削除区間以外の区間の音声合成
処理が行われる。尚、その際におけるデータ長Ｌは(Ｌf
−ｍ_max・P)である。ここで、データ長Ｌ＝実際の再生時
間長Ｌpb、削除期間の時間長Ｌc＝ｍ_max・Ｐである。ま
た、音声合成処理を行わないフレーム数cntに０がセッ
トされる。そうした後、式(８)に従って調整用変数rem
の計算が行われる。In step S17, a control signal is output to the pitch synthesis filter 4, causing it to read the data synp(n) (m_max·P ≦ n < Lf) from the filter output memory 12 and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5, causing it to send the linear prediction coefficient α⁰ of the current frame to the synthesis filter 6. Then, a control signal is output to the synthesis filter 6, causing it to perform voice synthesis of the section of the current frame other than the deletion section. As a result, the synthesis filter 6 synthesizes the section of the current frame other than the deletion section. The data length L in this case is (Lf − m_max·P). Here, the data length L equals the actual reproduction time length Lpb, and the time length of the deleted section is Lc = m_max·P. In addition, the number of frames cnt not subjected to voice synthesis is set to 0. After that, the adjustment variable rem is calculated according to equation (8).
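The split of the current frame in step S17 can be illustrated with a short sketch. It assumes integer sample counts and takes m_max as the number of whole pitch periods in the frame, i.e. floor(Lf/P); the function name `s17_current_frame` is hypothetical.

```python
def s17_current_frame(P, Lf):
    """Split the current frame 0 <= n < Lf for step S17 with m = m_max.

    Returns (m_max, kept_range, Lpb, Lc), where kept_range is half-open.
    """
    m_max = Lf // P        # whole pitch periods contained in the frame (assumption)
    Lc = m_max * P         # deleted length: section 0 <= n < m_max*P
    Lpb = Lf - Lc          # reproduced length: section m_max*P <= n < Lf
    return m_max, (Lc, Lf), Lpb, Lc


# FIG. 7(a)-like case: P <= Lf (the numbers are made up for illustration).
print(s17_current_frame(P=25, Lf=40))   # (1, (25, 40), 15, 25)
```

The returned `Lpb` and `Lc` are the two quantities that the text feeds into equation (8) after this step.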

【００６１】ステップＳ18で、現フレームに係る再生速
度制御および音声合成の処理が終了したので、ピッチ合
成フィルタ４の内部フィルタメモリ１１および線形予測
係数メモリ５の記憶内容の更新が行われる。ステップＳ
19で、現フレームは最終フレームであるか否かが判別さ
れる。その結果、最終フレームでなければ上記ステップ
Ｓ2に戻って次のフレームの処理に移行する。一方、最
終フレームであれば、早聞き再生処理動作を終了する。In step S18, since the reproduction speed control and voice synthesis processing for the current frame have been completed, the contents of the internal filter memory 11 of the pitch synthesis filter 4 and of the linear prediction coefficient memory 5 are updated. In step S19, it is determined whether the current frame is the final frame. If it is not the final frame, the process returns to step S2 and moves on to the next frame. If it is the final frame, the fast-listening reproduction processing operation ends.

【００６２】次に、以上の早聞き再生および通常速度再
生処理を、図６(a),(b)を用いて更に具体的に説明す
る。ここで、時点０においては、cnt＝２(直前２フレー
ム(−２Ｌf≦ｎ＜０)では音声処理を行っていない)、re
m＞０(削除区間を探索する状態)とする。先ず、ステッ
プＳ2において、現フレーム(０≦ｎ＜Ｌf)に係るピッチ
成分を有する音源synp(n)が生成される。そして、ステ
ップＳ3においてrem＞０であると判別されてステップＳ
5に進む。さらに、図６(a)よりＰ＜cnt・Ｌf＝２Ｌfであ
るから、ステップＳ7に進む。上記ステップＳ7において
はＰ≦ｋ・Ｌf を満足するｋの最小値ｋ_minは“２"であるから、ｋ_min
＝cnt＝２となる。したがって、音声合成処理を行って
いない区間に削除区間を含まないフレームは存在しない
と判断してステップＳ10に進む。そして、図６(a)より
ｋ_min・Ｌf＝２Ｌf≠ＰであるからステップＳ11に進んで
synp(n)(−２Ｌf≦ｎ＜−Ｐ)（すなわち、削除区間を含
むフレームにおける削除区間以外の区間の音源)を用い
て、区間−Ｐ≦ｎ＜０を削除した音声合成処理を行うの
である。そして、ステップＳ12において算出された調整
用変数remの値はrem≦０であるとするとステップＳ14に
進み、synp(n)(０≦ｎ＜Ｌf)（すなわち、現フレームの
音源)を用いて現フレームに係る通常速度での音声合成
処理が行われる。。その結果、音声合成処理を行う音源
の区間は図６(b)に示すようになる。こうして、１ピッ
チ周期Ｐ分の音源波形を削除することによって、ピッチ
周期Ｐがフレーム長Ｌfより長くても、音質の劣化が少
なくて音の高さが変わらない早聞き再生処理が行われる
のである。Next, the fast-listening reproduction and normal-speed reproduction processing described above will be explained more concretely with reference to FIGS. 6(a) and 6(b). Assume that at time 0, cnt = 2 (voice processing has not been performed for the two immediately preceding frames (−2Lf ≦ n < 0)) and rem > 0 (the state in which a deletion section is searched for). First, in step S2, the sound source synp(n) having a pitch component is generated for the current frame (0 ≦ n < Lf). In step S3, it is determined that rem > 0, and the process proceeds to step S5. Since P < cnt·Lf = 2Lf according to FIG. 6(a), the process proceeds to step S7. In step S7, the minimum value k_min of k satisfying P ≦ k·Lf is "2", so k_min = cnt = 2. It is therefore judged that, among the sections not yet subjected to voice synthesis, there is no frame that does not contain the deletion section, and the process proceeds to step S10. Since k_min·Lf = 2Lf ≠ P according to FIG. 6(a), the process proceeds to step S11, where voice synthesis with the section −P ≦ n < 0 deleted is performed using synp(n) (−2Lf ≦ n < −P) (that is, the sound source of the section other than the deletion section in the frames containing the deletion section). Then, assuming that the adjustment variable rem calculated in step S12 satisfies rem ≦ 0, the process proceeds to step S14, where voice synthesis at normal speed for the current frame is performed using synp(n) (0 ≦ n < Lf) (that is, the sound source of the current frame). As a result, the sections of the sound source subjected to voice synthesis are as shown in FIG. 6(b). In this way, by deleting the sound source waveform for one pitch period P, fast-listening reproduction with little deterioration in sound quality and no change in pitch is achieved even when the pitch period P is longer than the frame length Lf.
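The worked example for FIG. 6 can be traced at the level of sample indices. The sketch below is illustrative only: it assumes Lf < P < 2Lf, cnt = 2 and rem ≦ 0 after step S12, uses made-up values for Lf and P, and the function name `fast_listen_fig6` is hypothetical.

```python
def fast_listen_fig6(Lf, P):
    """Sample indices that get synthesized in the FIG. 6(b)-style example."""
    pending = list(range(-2 * Lf, 0))       # two frames not yet synthesized
    current = list(range(0, Lf))            # current frame
    kept = [n for n in pending if n < -P]   # step S11: drop the section -P <= n < 0
    return kept + current                   # step S14: current frame at normal speed


out = fast_listen_fig6(Lf=40, P=60)
print(len(out))           # 60 (samples played for 120 samples of input)
print(out[19], out[20])   # -61 0 (the junction created by the deletion)
```

Exactly one pitch period (60 samples here) is removed, so the sample just before the deletion section is followed directly by the first sample of the current frame.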

【００６３】尚、図７(a)の如く、Ｌf/Ｐ≧１である音
源波形の場合（ｋ_min＝１,cnt＝２)には、ステップＳ13
において調整用変数remの値はrem＞０であると判断され
たとするとステップＳ15に進む。そして、cnt＝２であ
るから、条件“Ｐ≦cnt・Ｌf＝２Ｌf"が成立してステッ
プＳ17に進み、１≦ｍ≦Ｌf/Ｐなる整数ｍはｍ＝ｍ_max
＝１であるからsynp(n)(Ｐ≦ｎ＜Ｌf)（すなわち、現フ
レームの削除区間以外の区間の音源)を用いて、現フレ
ームから区間０≦ｎ＜Ｐを削除した音声合成処理を行う
のである。その結果、音声合成処理を行う音源の区間は
図７(b)に示すようになる。こうして、２ピッチ周期２
Ｐ分の音源波形を削除した早聞き再生処理が行われるの
である。Incidentally, for a sound source waveform with Lf/P ≧ 1 as in FIG. 7(a) (k_min = 1, cnt = 2), if the adjustment variable rem is judged to satisfy rem > 0 in step S13, the process proceeds to step S15. Since cnt = 2, the condition "P ≦ cnt·Lf = 2Lf" holds and the process proceeds to step S17. Because the integer m satisfying 1 ≦ m ≦ Lf/P is m = m_max = 1, voice synthesis with the section 0 ≦ n < P deleted from the current frame is performed using synp(n) (P ≦ n < Lf) (that is, the sound source of the section of the current frame other than the deletion section). As a result, the sections of the sound source subjected to voice synthesis are as shown in FIG. 7(b). In this way, fast-listening reproduction is performed with the sound source waveform for two pitch periods (2P) deleted.

【００６４】図３は、図８に示す早聞き再生および通常
再生処理を実行するための再生速度制御部２の具体的な
構成例を示す。この再生速度制御部２は、再生時間差検
出部２１,削除区間検出部２２および削除処理部２３を
有する。そして、再生時間差検出部２１は、外部から入
力される再生速度倍率Ｒと削除処理部２３からの実際に
再生する区間の時間長Ｌpbおよび削除する区間の時間長
Ｌcとに基づいて、式(８)に従って調整用変数remを算出
する。そして、図８に示す早聞き再生処理動作のフロー
チャートにおける上記ステップＳ3およびステップＳ13
の判断処理を行う。FIG. 3 shows a concrete configuration example of the reproduction speed control unit 2 for executing the fast-listening reproduction and normal reproduction processing shown in FIG. 8. The reproduction speed control unit 2 has a reproduction time difference detection unit 21, a deletion section detection unit 22, and a deletion processing unit 23. The reproduction time difference detection unit 21 calculates the adjustment variable rem according to equation (8), based on the reproduction speed magnification R input from outside and on the time length Lpb of the section actually reproduced and the time length Lc of the section deleted, both supplied by the deletion processing unit 23. It also performs the judgments of steps S3 and S13 in the flowchart of the fast-listening reproduction processing operation shown in FIG. 8.

【００６５】また、上記削除区間検出部２２は、デマル
チプレクサ１からのピッチ周期Ｐと削除処理部２３から
の音声合成処理を行わないフレーム数cntと内部メモリ
に格納されたフレーム長Ｌfに基づいて、図８に示す早
聞き再生処理動作のフローチャートにおける上記ステッ
プＳ5およびステップＳ15の判断処理を行って、ピッチ
合成フィルタ４で生成されて内部フィルタメモリ１１お
よびフィルタ出力メモリ１２に格納されているピッチ成
分を有する音源synp(n)の中で削除できる区間の有無を
検出するのである。また、上記削除処理部２３は、図８
に示す早聞き再生処理動作のフローチャートにおけるデ
マルチプレクサ１,ピッチ無し音源生成部３,ピッチ合成
フィルタ４,再生時間差検出部２１および削除区間検出
部２２による処理以外の処理を行って、音源synp(n)の
中で削除できる区間をピッチ単位で削除した音源synp'
(n)を生成して音声再生するためにピッチ合成フィルタ
４,線形予測係数メモリ５および合成フィルタ６を制御
する制御信号を生成する。The deletion section detection unit 22 performs the judgments of steps S5 and S15 in the flowchart of the fast-listening reproduction processing operation shown in FIG. 8, based on the pitch period P from the demultiplexer 1, the number of frames cnt not subjected to voice synthesis from the deletion processing unit 23, and the frame length Lf stored in its internal memory. It thereby detects whether the sound source synp(n) having a pitch component, which is generated by the pitch synthesis filter 4 and stored in the internal filter memory 11 and the filter output memory 12, contains a section that can be deleted. The deletion processing unit 23 performs the processing in the flowchart of FIG. 8 other than that performed by the demultiplexer 1, the pitchless sound source generation unit 3, the pitch synthesis filter 4, the reproduction time difference detection unit 21, and the deletion section detection unit 22. It generates control signals that control the pitch synthesis filter 4, the linear prediction coefficient memory 5, and the synthesis filter 6 so that the sound source synp'(n), obtained by deleting the deletable sections from the sound source synp(n) in pitch units, is generated and reproduced as voice.

【００６６】上述のように、本実施の形態における音声
復号化装置は再生速度制御部２を有し、この再生速度制
御部２には再生時間差検出部２１,削除区間検出部２２
および削除処理部２３を設けている。そして、再生時間
差検出部２１は、削除処理毎に、再生速度倍率Ｒと削除
処理結果とに基づいて調整用変数remを算出する。そし
て、得られた調整用変数remの正否によって、次の削除
処理を行うか否かを判定する。その結果、削除処理を行
う場合には、削除区間検出部２２によって、これまで音
声合成処理を行っていない区間および現フレーム中にお
けるピッチ周期Ｐ単位での削除区間の有無を検出する。
そして、この検索結果に従って、削除処理部２３によっ
て、ピッチ合成フィルタ４によって生成されたピッチ成
分を有する音源synp(n)中から削除区間を削除して音源s
ynp'(n)を生成するための制御信号を出力するようにし
ている。As described above, the speech decoding device of the present embodiment has the reproduction speed control unit 2, which is provided with the reproduction time difference detection unit 21, the deletion section detection unit 22, and the deletion processing unit 23. Each time a deletion is performed, the reproduction time difference detection unit 21 calculates the adjustment variable rem based on the reproduction speed magnification R and the result of the deletion. Whether the next deletion is to be performed is then decided by whether the resulting adjustment variable rem is positive. When deletion is to be performed, the deletion section detection unit 22 detects whether a deletion section in units of the pitch period P exists in the sections not yet subjected to voice synthesis and in the current frame. According to the detection result, the deletion processing unit 23 outputs control signals for generating the sound source synp'(n) by deleting the deletion section from the sound source synp(n) having a pitch component generated by the pitch synthesis filter 4.

【００６７】したがって、本実施の形態によれば、ピッ
チ予測と線形予測を用いた符号化・復号化方式による音
声復号化装置において、ピッチ周期Ｐ単位で音源synp
(n)に対する削除処理を行うことができ、再生音声の音
質劣化が少なく、且つ、ピッチ周期Ｐがフレーム長Ｌf
より長い場合でも対処できる音声復号化装置を実現でき
る。また、音声合成処理を行いながら調整用変数remに
よって次の削除処理の実行の可否を判定しているので、
再生速度を指定された再生速度倍率Ｒになるように正し
く制御できる。Therefore, according to the present embodiment, in a speech decoding device based on a coding/decoding scheme that uses pitch prediction and linear prediction, deletion can be applied to the sound source synp(n) in units of the pitch period P. This yields a speech decoding device in which the reproduced voice suffers little deterioration in quality and which can cope even when the pitch period P is longer than the frame length Lf. In addition, because the adjustment variable rem is used to decide, while voice synthesis is in progress, whether the next deletion should be executed, the reproduction speed can be controlled accurately to match the specified reproduction speed magnification R.

【００６８】(ｂ) 再生速度倍率Ｒ≦１の場合（遅聞き
再生および通常再生の場合）上記再生速度制御部２は、ピッチ合成フィルタ４におい
てピッチ合成フィルタリング計算の結果得られたピッチ
成分を有する音源synp(n)の中から、繰り返すことがで
きる区間を次のようにして探す。すなわち、上述したＲ
≧１の場合と同様に、音源synp(n)(０≦ｎ≦Ｌf)はピッ
チ周期Ｐだけ溯った音源synp(n-Ｐ)と相似であるからsy
np(n)(−Ｐ≦ｎ＜０)を繰り返し区間とするのである。(b) When the reproduction speed magnification R ≦ 1 (slow-listening reproduction and normal reproduction): The reproduction speed control unit 2 searches the sound source synp(n) having a pitch component, obtained from the pitch synthesis filtering calculation in the pitch synthesis filter 4, for a section that can be repeated, as follows. As in the case of R ≧ 1 described above, the sound source synp(n) (0 ≦ n ≦ Lf) is similar to the sound source synp(n−P) one pitch period P earlier, so synp(n) (−P ≦ n < 0) is taken as the repetition section.

【００６９】この繰り返しの結果、上記繰り返し区間を
挿入することによって繰り返し区間の前後に２つの接続
箇所が生ずる。そのうち、後の接続箇所での接続は元々
連続する区間同士の接続であるために、連続性は完全に
保証される。これに対して、前の接続箇所での接続は、
繰り返し区間自身の最後尾の時点と先頭の時点とが接続
される。ところが、ピッチ予測係数βが１に近い値でな
い場合には、現フレーム付近は非定常区間であるために
接続箇所での不連続に起因する歪みは知覚され難い。こ
れに対して、ピッチ予測係数βが１に近い値の場合に
は、繰り返し区間の先頭近傍は現フレームのsynp(n)(０
≦ｎ＜Ｐ)の先頭付近と殆ど合同であるので接続箇所で
の不連続性は小さくなる。As a result of this repetition, inserting the repetition section produces two junctions, one before and one after the repetition section. At the later junction, the sections joined were originally contiguous, so continuity is fully guaranteed. At the earlier junction, by contrast, the end of the repetition section is joined to its own beginning. When the pitch prediction coefficient β is not close to 1, however, the vicinity of the current frame is a non-stationary section, so distortion caused by the discontinuity at the junction is hard to perceive. When the pitch prediction coefficient β is close to 1, the beginning of the repetition section is almost congruent with the beginning of synp(n) (0 ≦ n < P) of the current frame, so the discontinuity at the junction is small.
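The argument about the earlier junction can be checked numerically on an idealized signal. The sketch below assumes a perfectly periodic sound source with period P (the limiting case that β close to 1 approximates); the sine waveform, the value of P and the helper name `repeat_last_period` are illustrative assumptions, not taken from the patent.

```python
import math


def repeat_last_period(x, P):
    """Append a copy of the last P samples of x (repetition of -P <= n < 0)."""
    return x + x[-P:]


P = 50
# Idealized, perfectly periodic sound source (an assumption standing in for beta = 1).
past = [math.sin(2 * math.pi * n / P) for n in range(-2 * P, 0)]
current = [math.sin(2 * math.pi * n / P) for n in range(0, P)]

spliced = repeat_last_period(past, P) + current

# Earlier junction: sample n = -1 followed by the first repeated sample n = -P.
jump = abs(spliced[2 * P] - spliced[2 * P - 1])
# Step that would occur without any repetition: n = -1 followed by n = 0.
natural = abs(current[0] - past[-1])
print(abs(jump - natural) < 1e-9)
```

For a periodic signal the sample at n = −P equals the sample at n = 0, so the step across the earlier junction matches the step the original waveform would have had at time 0.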

【００７０】以下、上記ピッチ予測係数βが１に近い場
合の具体例について述べる。図６(a)はピッチ合成フィ
ルタ４からの音源波形であり、図６(c)は繰り返し後の
音源波形である。図６(a)における区間−Ｐ≦ｎ＜０の
波形を時点０の間に挿入することによって、区間−Ｐ≦
ｎ＜０の波形が繰り返されて、図６(c)に示す波形とな
る。このときの先の接続箇所Ｂの歪みは小さい。尚、図
６に示す音源波形は、Ｌf/Ｐ＜１である場合の波形であ
る。これに対して、Ｌf/Ｐ≧１である場合の音源波形は
図７のようになる。A concrete example for the case where the pitch prediction coefficient β is close to 1 is described below. FIG. 6(a) shows the sound source waveform from the pitch synthesis filter 4, and FIG. 6(c) shows the sound source waveform after repetition. By inserting the waveform of the section −P ≦ n < 0 in FIG. 6(a) at time 0, the waveform of the section −P ≦ n < 0 is repeated, giving the waveform shown in FIG. 6(c). The distortion at the earlier junction B is small in this case. The sound source waveforms shown in FIG. 6 are for the case Lf/P < 1. The sound source waveforms for the case Lf/P ≧ 1 are as shown in FIG. 7.

【００７１】本実施の形態においても、Ｒ≧１の場合と
同様に、一定間隔で一定区間の音源の挿入(繰り返し)を
行わない。したがって、音声の合成処理を行いながら再
生時間を調節する必要がある。そこで、Ｒ≦１の場合に
は式(９)によって調整用変数remを算出するのである。 rem(x)＝rem(x-1)＋Ｌn(１/Ｒ−１)−Ｌr …（９）ここで、Ｌn：通常速度で再生する区間の時間長Ｌr：繰り返して再生する区間の時間長ｘ：remの算出回数Also in this embodiment, as in the case of R ≧ 1, insertion (repetition) of the sound source is not performed on a fixed section at fixed intervals. The reproduction time must therefore be adjusted while voice synthesis is in progress. For R ≦ 1, the adjustment variable rem is calculated by equation (9):

rem(x) = rem(x-1) + Ln(1/R − 1) − Lr … (9)

where
Ln: time length of the section reproduced at normal speed
Lr: time length of the section reproduced repeatedly
x: number of times rem has been calculated
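Equation (9) is a running balance between desired and actual playback time, which a few lines of code make concrete. This is a minimal sketch; the function name `update_rem_slow` and the numeric values are illustrative assumptions.

```python
def update_rem_slow(rem_prev, Ln, Lr, R):
    """Equation (9): rem(x) = rem(x-1) + Ln*(1/R - 1) - Lr, for 0 < R <= 1.

    Ln : length of the section reproduced at normal speed
    Lr : length of the section reproduced repeatedly
    R  : reproduction speed magnification
    """
    return rem_prev + Ln * (1.0 / R - 1.0) - Lr


# R = 0.5 (half speed): one 40-sample frame played normally, nothing repeated.
rem = update_rem_slow(0.0, Ln=40, Lr=0, R=0.5)
print(rem)   # 40.0 (playback is 40 samples shorter than desired)

# Next, 80 samples played normally plus a 60-sample repetition.
rem = update_rem_slow(rem, Ln=80, Lr=60, R=0.5)
print(rem)   # 60.0 (still positive, so further repetition is called for)
```

At R = 1 the middle term vanishes, so rem only decreases when something is repeated, which matches normal-speed reproduction needing no insertion.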

【００７２】すなわち、上記調整用変数remは現時点ま
での希望再生時間長と実際に再生した時間長との差を表
している。この調整用変数remが０より小さい場合には
実際の再生時間は希望再生時間より長く、０より大きい
場合には実際の再生時間は希望再生時間より短いことを
示す。That is, the adjustment variable rem represents the difference between the desired reproduction time length up to the present time and the actual reproduction time length. If the adjustment variable rem is smaller than 0, the actual reproduction time is longer than the desired reproduction time, and if it is larger than 0, the actual reproduction time is shorter than the desired reproduction time.

【００７３】本実施の形態においては、ピッチ合成フィ
ルタ４で生成されたピッチ成分を有する音源synp(n)に
対してピッチ周期Ｐ単位で音源を繰り返す繰り返し処理
を行う毎に調整用変数remを算出して、現時点までに実
際に音声を再生した時間長が希望再生時間長になったか
を監視するのである。In the present embodiment, each time repetition processing that repeats the sound source in units of the pitch period P is applied to the sound source synp(n) having a pitch component generated by the pitch synthesis filter 4, the adjustment variable rem is calculated to monitor whether the time length of voice actually reproduced so far has reached the desired reproduction time length.

【００７４】図９は、上記デマルチプレクサ１,再生速
度制御部２,ピッチ無し音源生成部３およびピッチ合成
フィルタ４によって実行される遅聞き再生処理動作のフ
ローチャートである。以下、図９に従って、遅聞き再生
処理動作について説明しつつ、主に遅聞き再生処理時に
おける再生速度制御部２の動作を説明する。尚、上記遅
聞き再生処理動作のフローチャートにおけるステップＳ
22以外は、総て再生速度制御部２による処理動作であ
る。FIG. 9 is a flowchart of the slow-listening reproduction processing operation executed by the demultiplexer 1, the reproduction speed control unit 2, the pitchless sound source generation unit 3, and the pitch synthesis filter 4. The slow-listening reproduction processing operation is described below with reference to FIG. 9, focusing mainly on the operation of the reproduction speed control unit 2 during slow-listening reproduction. Note that all steps in this flowchart other than step S22 are processing operations of the reproduction speed control unit 2.

【００７５】ステップＳ21〜ステップＳ26で、図８に示
す早聞き再生処理動作のフローチャートにおける上記ス
テップＳ1〜ステップＳ6と同様にして、調整用変数rem
および音声合成処理を行わないフレームcntの初期値セ
ット、現フレームにおけるピッチ成分を有する音源synp
(n)の生成処理、調整用変数remの正否判別、現フレーム
の音声合成制御とフレーム数cntの初期値セットおよび
調整用変数rem算出、条件Ｐ≦cnt・Ｌfの判別、フレーム
数cntのインクリメントが行われる。ここで、上記ステ
ップＳ24において行われる調整用変数rem算出は、式
(９)を用いて行われる。但し、Ｌn＝Ｌf、Ｌc＝０、cnt
＝０である。In steps S21 to S26, in the same manner as steps S1 to S6 in the flowchart of the fast-listening reproduction processing operation shown in FIG. 8, the following are performed: setting of the initial values of the adjustment variable rem and of the number of frames cnt not subjected to voice synthesis; generation of the sound source synp(n) having a pitch component for the current frame; determination of whether the adjustment variable rem is positive; voice synthesis control for the current frame together with initialization of the frame count cnt and calculation of the adjustment variable rem; determination of the condition P ≦ cnt·Lf; and incrementing of the frame count cnt. The calculation of the adjustment variable rem in step S24 uses equation (9), with Ln = Lf, Lc = 0, and cnt = 0.

【００７６】上記ステップＳ25における判定の結果上記
条件Ｐ≦cnt・Ｌfを満たしている場合には、図６(a)ある
いは図７(a)に示すように、上記ステップＳ22において
生成された現フレーム(０≦ｎ＜Ｌf)における音源synp
(n)に相似な区間がピッチ合成フィルタ４の内部フィル
タメモリ１１に格納されている（図６(a)では−Ｐ≦ｎ
＜(−Ｐ＋Ｌf)、図７(a)では−Ｐ≦ｎ＜０)。したがっ
て、この相似な区間を繰り返すことによって遅聞き再生
が可能となる。そこで、上記条件を満たしている場合に
は、ステップＳ27に移行して、繰り返し区間の挿入を行
うのである。When the determination in step S25 shows that the condition P ≦ cnt·Lf is satisfied, a section similar to the sound source synp(n) of the current frame (0 ≦ n < Lf) generated in step S22 is stored in the internal filter memory 11 of the pitch synthesis filter 4, as shown in FIG. 6(a) or FIG. 7(a) (−P ≦ n < (−P + Lf) in FIG. 6(a), −P ≦ n < 0 in FIG. 7(a)). Slow-listening reproduction therefore becomes possible by repeating this similar section. When the condition is satisfied, the process proceeds to step S27 and the repetition section is inserted.

【００７７】ステップＳ27で、これまで音声合成処理を
行っていないフレームの音声合成の際の制御が以下のよ
うにして行われる。すなわち、上記ピッチ合成フィルタ
４に対して、ｋ＝ cnt,(cnt−１),…,１の順に、内部フ
ィルタメモリ１１からsynp(n)（−ｋ・Ｌf≦ｎ＜−(ｋ−
１)・Ｌf)のデータを読み出して合成フィルタ６に送出さ
せる制御信号が出力される。また、線形予測係数メモリ
５に対して、上記読み出されたデータに対応する線形予
測係数α^kを読み出して合成フィルタ６に送出させる制
御信号が出力される。そうした後、合成フィルタ６に対
して、音声合成処理を行っていないフレームの音声合成
を行わせる制御信号が出力されるのである。その結果、
上記合成フィルタ６によって、これまで音声合成処理を
行っていないフレームの音声合成処理が行われる。尚、
その際におけるデータ長ＬはＬfである。In step S27, voice synthesis of the frames not yet subjected to voice synthesis is controlled as follows. A control signal is output to the pitch synthesis filter 4, causing it to read the data synp(n) (−k·Lf ≦ n < −(k−1)·Lf) from the internal filter memory 11 in the order k = cnt, (cnt−1), …, 1 and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5, causing it to read the linear prediction coefficient α^k corresponding to the read data and send it to the synthesis filter 6. After that, a control signal is output to the synthesis filter 6, causing it to perform voice synthesis of the frames not yet subjected to voice synthesis. As a result, the synthesis filter 6 synthesizes those frames. The data length L in this case is Lf.
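The read order in step S27 can be written out as a list of index ranges. This is a sketch under the assumption of integer sample counts; the function name `s27_frame_ranges` is hypothetical.

```python
def s27_frame_ranges(cnt, Lf):
    """Half-open ranges [-k*Lf, -(k-1)*Lf) read in step S27, for k = cnt, ..., 1.

    Each entry is (k, start, stop); k also selects the coefficient set alpha^k.
    """
    return [(k, -k * Lf, -(k - 1) * Lf) for k in range(cnt, 0, -1)]


print(s27_frame_ranges(cnt=2, Lf=40))   # [(2, -80, -40), (1, -40, 0)]
```

Frames are emitted oldest first, each with data length Lf, so the pending frames reach the synthesis filter in their original time order.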

【００７８】ステップＳ28で、上記繰り返し区間の音声
合成の際の制御が以下のようにして行われる。すなわ
ち、先ず、Ｐ≦ｋ・Ｌf を満足する整数ｋ(０＜ｋ≦cnt)の最小値ｋ_minが求めら
れる。そして、ピッチ合成フィルタ４に対して、上記内
部フィルタメモリ１１からsynp(n)（−Ｐ≦ｎ＜−(ｋ
_min−１)・Ｌf)のデータを読み出して合成フィルタ６に
送出させる制御信号が出力される。また、線形予測係数
メモリ５に対して、上記読み出されたデータに対応する
線形予測係数α^kmin(但し、kminはｋ_minのことである)
を読み出して合成フィルタ６に送出させる制御信号が出
力される。そして、合成フィルタ６に対して、繰り返し
区間の音声合成を行わせる制御信号が出力されるのであ
る。その結果、上記合成フィルタ６によって、図６(a)
の如く繰り返し区間(−Ｐ≦ｎ＜０)が前フレーム区間
(−Ｌf≦ｎ＜０)を越える場合には、繰り返し区間のう
ち前フレーム区間を越える区間(−Ｐ≦ｎ＜−Ｌf)の音
声合成処理が行われる。これに対して、図７(a)に示す
如く繰り返し区間(−Ｐ≦ｎ＜０)が前フレーム区間(−
Ｌf≦ｎ＜０)を越えない場合には、繰り返し区間全体の
音声合成処理が行われるのである。尚、その際における
データ長Ｌは(Ｐ−(ｋ_min−１)・Ｌf)である。In step S28, voice synthesis of the repetition section is controlled as follows. First, the minimum value k_min of the integer k (0 < k ≦ cnt) satisfying P ≦ k·Lf is obtained. A control signal is then output to the pitch synthesis filter 4, causing it to read the data synp(n) (−P ≦ n < −(k_min−1)·Lf) from the internal filter memory 11 and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5, causing it to read the linear prediction coefficient α^kmin corresponding to the read data (where kmin denotes k_min) and send it to the synthesis filter 6. Then, a control signal is output to the synthesis filter 6, causing it to perform voice synthesis of the repetition section. As a result, when the repetition section (−P ≦ n < 0) extends beyond the previous frame section (−Lf ≦ n < 0) as in FIG. 6(a), the synthesis filter 6 synthesizes the part of the repetition section that lies beyond the previous frame section (−P ≦ n < −Lf). When the repetition section (−P ≦ n < 0) does not extend beyond the previous frame section (−Lf ≦ n < 0) as shown in FIG. 7(a), the entire repetition section is synthesized. The data length L in this case is (P − (k_min−1)·Lf).

【００７９】次に、上記ピッチ合成フィルタ４に対し
て、ｋ＝ｋ_min−１,ｋ_min−２,…,１の順に内部フィル
タメモリ１１からsynp(n)(−ｋ・Ｌf≦ｎ＜−(ｋ−１)・
Ｌf)のデータを読み出して合成フィルタ６に送出させる
制御信号が出力される。また、線形予測係数メモリ５に
対して、上記読み出されたデータに対応する線形予測係
数α^kを読み出して合成フィルタ６に送出させる制御信
号が出力される。そうした後、合成フィルタ６に対し
て、繰り返し区間の音声合成を行わせる制御信号が出力
されるのである。その結果、上記合成フィルタ６によっ
て、繰り返し区間のうち前フレーム区間(図６(a)におけ
る区間−Ｌf≦ｎ＜０)の音声合成処理が行われるのであ
る。尚、その際におけるデータ長ＬはＬfである。Next, a control signal is output to the pitch synthesis filter 4, causing it to read the data synp(n) (−k·Lf ≦ n < −(k−1)·Lf) from the internal filter memory 11 in the order k = k_min−1, k_min−2, …, 1 and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5, causing it to read the linear prediction coefficient α^k corresponding to the read data and send it to the synthesis filter 6. After that, a control signal is output to the synthesis filter 6, causing it to perform voice synthesis of the repetition section. As a result, the synthesis filter 6 synthesizes the part of the repetition section that lies in the previous frame section (the section −Lf ≦ n < 0 in FIG. 6(a)). The data length L in this case is Lf.
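Step S28 cuts the repetition section −P ≦ n < 0 at frame boundaries, so that each piece can be synthesized with the coefficients of the frame it belongs to. The sketch below illustrates that segmentation under the assumption of integer sample counts; the function name `s28_repeat_segments` and the error handling are hypothetical.

```python
def s28_repeat_segments(P, Lf, cnt):
    """Half-open segments of the repetition section -P <= n < 0, in read order."""
    k_min = -(-P // Lf)                  # smallest k with P <= k*Lf
    if not 0 < k_min <= cnt:
        raise ValueError("repetition section is not fully buffered (P > cnt*Lf)")
    # Leading partial segment, data length P - (k_min - 1)*Lf.
    segments = [(-P, -(k_min - 1) * Lf)]
    # Whole frames k = k_min-1, ..., 1, each of data length Lf.
    segments += [(-k * Lf, -(k - 1) * Lf) for k in range(k_min - 1, 0, -1)]
    return segments


print(s28_repeat_segments(P=60, Lf=40, cnt=2))   # [(-60, -40), (-40, 0)]
print(s28_repeat_segments(P=25, Lf=40, cnt=2))   # [(-25, 0)]
```

The first call resembles FIG. 6(a), where the repetition section spills over into the frame before the previous one; the second resembles FIG. 7(a), where it fits inside the previous frame.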

【００８０】ステップＳ29〜ステップＳ33で、図８に示
す早聞き再生処理動作のフローチャートにおける上記ス
テップＳ12〜16と同様にして、調整用変数remの算出、
調整用変数remの正否判別、現フレームに係る通常速度
での音声合成処理とフレーム数cntの初期値設定および
調整用変数rem算出、条件Ｐ≦cnt・Ｌfの成立判別、フレ
ーム数cntへの“１"のセットが行われる。その際に、上
記ステップＳ29およびステップＳ31における調整用変数
remの算出は式(９)によって行われる。但し、上記ステ
ップＳ29ではＬn＝cnt・Ｌf、Ｌr＝Ｐである。また、上
記ステップＳ31ではＬn＝Ｌf、Ｌc＝０、cnt＝０であ
る。In steps S29 to S33, in the same manner as steps S12 to S16 in the flowchart of the fast-listening reproduction processing operation shown in FIG. 8, the following are performed: calculation of the adjustment variable rem; determination of whether the adjustment variable rem is positive; voice synthesis at normal speed for the current frame together with initialization of the frame count cnt and calculation of the adjustment variable rem; determination of whether the condition P ≦ cnt·Lf holds; and setting of the frame count cnt to "1". The calculation of the adjustment variable rem in steps S29 and S31 uses equation (9). In step S29, Ln = cnt·Lf and Lr = P. In step S31, Ln = Lf, Lc = 0, and cnt = 0.

【００８１】上記ステップＳ32における判別の結果上記
条件Ｐ≦cnt・Ｌfが成立する場合には、(ａ)における上
記早聞き再生および通常再生で述べたように、図７(a)
に示す如く、ピッチ周期で見ると、synp(n-Ｐ)(−Ｐ≦
ｎ＜０)とsynp(n)((ｍ−１)Ｐ≦ｎ＜ｍＰ)とが相似であ
る。ここで、ｍは１≦ｍ＜Ｌf/Ｐの範囲内の整数。そこ
で、現フレームにおける上記相似区間synp(n)((ｍ−１)
Ｐ≦ｎ＜ｍＰ、１≦ｍ≦ｍ_max)を繰り返し区間として利
用できるのである。そこで、ステップＳ34に移行して、
現フレームによる繰り返し音声合成処理を行うのであ
る。When the determination in step S32 shows that the condition P ≦ cnt·Lf holds, then, as described for fast-listening and normal reproduction in (a), synp(n−P) (−P ≦ n < 0) and synp(n) ((m−1)P ≦ n < mP) are similar in terms of the pitch period, as shown in FIG. 7(a). Here, m is an integer in the range 1 ≦ m < Lf/P. The similar sections synp(n) ((m−1)P ≦ n < mP, 1 ≦ m ≦ m_max) of the current frame can therefore be used as repetition sections. The process accordingly proceeds to step S34, where repeated voice synthesis using the current frame is performed.

【００８２】ステップＳ34で、先ず、ピッチ合成フィル
タ４に対して、フィルタ出力メモリ１２よりｍ＝１から
順に現フレームの繰り返し区間synp(n)((ｍ−１)Ｐ≦ｎ
＜ｍ・Ｐ)のデータ(データ長Ｌ＝Ｐ)を読み出して合成フ
ィルタ６に２回送出させる制御信号が出力される。ま
た、線形予測係数メモリ５に対して、現フレームの線形
予測係数α⁰を合成フィルタ６に送出させる制御信号が
出力される。そして、合成フィルタ６に対して、現フレ
ームにおける繰り返し区間の音声合成処理を行わせる制
御信号が出力される。その結果、上記合成フィルタ６に
よって、現フレームの繰り返し区間(図７(a)における区
間０≦ｎ＜Ｐ)の音声合成処理が２回行われるのであ
る。In step S34, first, a control signal is output to the pitch synthesis filter 4, causing it to read from the filter output memory 12 the data of the repetition sections synp(n) ((m−1)P ≦ n < m·P) of the current frame (data length L = P), in order starting from m = 1, and send each to the synthesis filter 6 twice. A control signal is also output to the linear prediction coefficient memory 5, causing it to send the linear prediction coefficient α⁰ of the current frame to the synthesis filter 6. Then, a control signal is output to the synthesis filter 6, causing it to perform voice synthesis of the repetition sections of the current frame. As a result, the synthesis filter 6 synthesizes the repetition section of the current frame (the section 0 ≦ n < P in FIG. 7(a)) twice.

【００８３】次に、上記ピッチ合成フィルタ４に対し
て、上記フィルタ出力メモリ１２から現フレームの残り
区間synp(n)(ｍ_max・Ｐ≦ｎ＜Ｌf)のデータ(データ長Ｌ
＝Ｌf−ｍ_max・Ｐ)を読み出して合成フィルタ６に送出さ
せる制御信号が出力される。また、線形予測係数メモリ
５に対して、現フレームの線形予測係数α⁰を合成フィ
ルタ６に送出させる制御信号が出力される。そして、合
成フィルタ６に対して、現フレームにおける残り区間の
音声合成処理を行わせる制御信号が出力される。その結
果、上記合成フィルタ６によって、現フレームの残り区
間(図７(a)における区間Ｐ≦ｎ＜Ｌf)の音声合成処理が
行われるのである。そうした後、上記音声合成処理を行
わないフレーム数cntが初期設定され、式（９)によって
調整用変数remの算出が行われる。ここで、Ｌn＝Ｌf、
Ｌc＝ｍ_max・Ｐである。Next, a control signal is output to the pitch synthesis filter 4, causing it to read from the filter output memory 12 the data of the remaining section synp(n) (m_max·P ≦ n < Lf) of the current frame (data length L = Lf − m_max·P) and send it to the synthesis filter 6. A control signal is also output to the linear prediction coefficient memory 5, causing it to send the linear prediction coefficient α⁰ of the current frame to the synthesis filter 6. Then, a control signal is output to the synthesis filter 6, causing it to perform voice synthesis of the remaining section of the current frame. As a result, the synthesis filter 6 synthesizes the remaining section of the current frame (the section P ≦ n < Lf in FIG. 7(a)). After that, the number of frames cnt not subjected to voice synthesis is initialized, and the adjustment variable rem is calculated by equation (9). Here, Ln = Lf and Lc = m_max·P.
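The order in which step S34 feeds the current frame to the synthesis filter can be sketched as below. The sketch reads "sent twice" as each pitch period being emitted twice in turn, which is one interpretation of the text, and takes m_max as floor(Lf/P); the function name `s34_read_order` is hypothetical.

```python
def s34_read_order(P, Lf):
    """Half-open ranges of the current frame in the order they reach the synthesis filter."""
    m_max = Lf // P                        # whole pitch periods in the frame (assumption)
    order = []
    for m in range(1, m_max + 1):
        period = ((m - 1) * P, m * P)      # repetition section, data length P
        order += [period, period]          # sent twice (interpretation of the text)
    if m_max * P < Lf:
        order.append((m_max * P, Lf))      # remaining section, data length Lf - m_max*P
    return order


# FIG. 7(a)-like case with made-up numbers: one pitch period fits in the frame.
print(s34_read_order(P=25, Lf=40))   # [(0, 25), (0, 25), (25, 40)]
```

The total emitted length is Lf + m_max·P, i.e. the frame is stretched by the same amount that the text assigns to Lc after this step.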

【００８４】ステップＳ35およびステップＳ36で、図８
に示す早聞き再生処理動作のフローチャートにおける上
記ステップＳ18およびステップＳ19と同様にして、ピッ
チ合成フィルタ４の内部フィルタメモリ１１および線形
予測係数メモリ５の更新、最終フレーム判別が行われ
る。そして現フレームが最終フレームであると判別され
ると遅聞き再生処理動作を終了する。In steps S35 and S36, in the same manner as steps S18 and S19 in the flowchart of the fast-listening reproduction processing operation shown in FIG. 8, the internal filter memory 11 of the pitch synthesis filter 4 and the linear prediction coefficient memory 5 are updated and the final-frame determination is made. When the current frame is determined to be the final frame, the slow-listening reproduction processing operation ends.

【００８５】[0085]

Next, the slow-listening reproduction and normal reproduction processing described above is explained more concretely with reference to FIGS. 6(a) and 6(c). Assume that at time 0, cnt = 2 (no speech synthesis has been performed for the two immediately preceding frames, −2Lf ≤ n < 0) and rem > 0 (the state in which a repetition section is being searched for). First, in step S22, the sound source synp(n) having a pitch component is generated for the current frame (0 ≤ n < Lf). Then, in step S23, rem > 0 is determined and the process proceeds to step S25. Since P < cnt·Lf = 2Lf from FIG. 6(a), the process proceeds to step S27. In step S27, speech synthesis is performed frame by frame using synp(n) (−2Lf ≤ n < −Lf and −Lf ≤ n < 0), that is, the sound source of the frames for which speech synthesis has not yet been performed. Then, in step S28, speech synthesis is performed using synp(n) (−P ≤ n < 0), that is, the sound source of the repetition section. This synthesis is divided into two sections: −P ≤ n < −Lf (the part of the repetition section that extends beyond the previous frame) and −Lf ≤ n < 0 (the part of the repetition section within the previous frame).

【００８６】[0086]

If the value of the adjustment variable rem calculated in step S29 is rem ≤ 0, the process proceeds to step S31, where speech synthesis for the current frame is performed at normal speed using synp(n) (0 ≤ n < Lf), that is, the sound source of the current frame. As a result, the sound source sections subjected to speech synthesis are as shown in FIG. 6(c). In this way, by inserting a sound source waveform of one pitch period P, slow-listening reproduction is performed with little deterioration in sound quality and no change in the pitch of the voice.
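The order in which sound source samples reach the synthesis filter in the FIG. 6 walkthrough can be sketched as follows. This is only an illustration under stated assumptions: the function name `fig6_order` and the sample values are hypothetical, indices are relative to time 0 as in the text, and the case Lf < P < 2Lf with cnt = 2 is assumed.

```python
def fig6_order(Lf, P):
    """Sample indices in synthesis order for the FIG. 6 case (hypothetical helper).

    Assumes cnt = 2 held frames and a pitch period longer than one frame
    but shorter than two frames (Lf < P < 2*Lf).
    """
    if not (Lf < P < 2 * Lf):
        raise ValueError("sketch covers only Lf < P < 2*Lf")
    held = list(range(-2 * Lf, 0))   # step S27: the two frames not yet synthesized
    repeat = list(range(-P, 0))      # step S28: one pitch period, synthesized again
    current = list(range(0, Lf))     # step S31: current frame at normal speed
    return held + repeat + current

order = fig6_order(Lf=160, P=200)
print(len(order))  # three frames of source material plus one inserted pitch period
```

The output is longer than the three source frames by exactly P samples, which is the one-pitch-period insertion described above.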

【００８７】[0087]

For a sound source waveform with Lf/P ≥ 1, as in FIG. 7(a) (k_min = 1, cnt = 2), if the value of the adjustment variable rem is determined to be rem > 0 in step S30, the process proceeds to step S32. Since cnt = 2, the condition "P ≤ cnt·Lf = 2Lf" holds and the process proceeds to step S34. Because the integer m satisfying 1 ≤ m ≤ Lf/P is m = m_max = 1, speech synthesis of the repetition section of the current frame is performed twice using synp(n) (0 ≤ n < P), that is, the sound source of the repetition section of the current frame. After that, speech synthesis of the remaining section of the current frame is performed using synp(n) (P ≤ n < Lf), that is, the sound source of the remaining section of the current frame. As a result, the sound source sections subjected to speech synthesis are as shown in FIG. 7(c). In this way, slow-listening reproduction is performed with a sound source waveform of two pitch periods, 2P, inserted.
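The step S34 handling of the current frame can be sketched as below. The helper name `slow_frame_order` and the sample values are hypothetical. Treating the repetition section as 0 ≤ n < m_max·P for general m_max is an assumption, based on the remaining section being given as m_max·P ≤ n < Lf; the text itself only walks through m_max = 1.

```python
def slow_frame_order(Lf, P):
    """Sample indices of the current frame in synthesis order (hypothetical helper)."""
    if P > Lf:
        raise ValueError("no whole pitch period fits inside the frame")
    m_max = Lf // P                  # largest integer m with 1 <= m <= Lf/P
    rep_len = m_max * P              # assumed length of the repetition section
    repeated = list(range(0, rep_len))
    remaining = list(range(rep_len, Lf))
    # The repetition section is synthesized twice, then the rest of the frame.
    return repeated + repeated + remaining

order = slow_frame_order(Lf=160, P=100)
print(len(order))  # frame length plus one repeated section
```

Using integer division keeps the repeated span a whole number of pitch periods, so the repetition does not break the periodicity of the sound source.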

【００８８】[0088]

FIG. 4 shows a specific configuration example of the reproduction speed control unit 2 for executing the slow-listening reproduction and normal reproduction processing shown in FIG. 9. This reproduction speed control unit 2 has a reproduction time difference detection unit 25, a repetition section detection unit 26, and a repetition processing unit 27. The reproduction time difference detection unit 25 calculates the adjustment variable rem according to equation (9), based on the reproduction speed magnification R input from outside and on the time length Ln of the section reproduced at normal speed and the time length Lr of the section reproduced repeatedly, both supplied by the repetition processing unit 27. It then performs the determinations of steps S23 and S30 in the flowchart of the slow-listening reproduction processing operation shown in FIG. 9.

【００８９】[0089]

The repetition section detection unit 26 performs the determinations of steps S25 and S32 in the flowchart of the slow-listening reproduction processing operation shown in FIG. 9. It does so based on the pitch period P from the demultiplexer 1, the number of frames cnt for which speech synthesis has not been performed (supplied by the repetition processing unit 27), and the frame length Lf stored in the internal memory. It thereby detects whether a repeatable section exists in the sound source synp(n) having a pitch component, which is generated by the pitch synthesis filter 4 and stored in the internal filter memory 11 and the filter output memory 12. The repetition processing unit 27 performs the processing in the FIG. 9 flowchart other than that performed by the demultiplexer 1, the pitchless sound source generation unit 3, the pitch synthesis filter 4, the reproduction time difference detection unit 25, and the repetition section detection unit 26. It generates control signals for the pitch synthesis filter 4, the linear prediction coefficient memory 5, and the synthesis filter 6 so that the sound source synp'(n), in which the repeatable sections of synp(n) are repeated in pitch units, is generated and reproduced as speech.

【００９０】[0090]

In the slow-listening reproduction and normal reproduction operation described above, when the value of the reproduction speed magnification R falls to about 0.7 or less, the amount of repetition sections becomes insufficient and the actual reproduction time (Ln + Lr) becomes shorter than the desired reproduction time (Ln/R). In that case, the number of repetitions of the repetition section may be increased. Letting the number of repetitions be r, equation (9) becomes

rem(x) = rem(x−1) + Ln(1/R − 1) − r·Lr … (10)

The adjustment variable rem is then brought close to 0 by adjusting r in equation (10). However, since the reproduced sound quality deteriorates as the number of repetitions r increases, the maximum value r_max of r must be determined in advance.
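The effect of the repetition count in equation (10) can be checked numerically. The sketch below only evaluates the equation as written; the function name `rem_after_repeat` and the sample values (Ln = 160, Lr = 50, R = 0.5) are hypothetical and chosen for illustration.

```python
def rem_after_repeat(rem_prev, Ln, Lr, R, r=1):
    """Equation (10): rem(x) = rem(x-1) + Ln*(1/R - 1) - r*Lr."""
    return rem_prev + Ln * (1.0 / R - 1.0) - r * Lr

# With R = 0.5, a single repetition leaves rem well above 0;
# each additional repetition lowers it by Lr.
for r in (1, 2, 3, 4):
    print(r, rem_after_repeat(0.0, Ln=160, Lr=50, R=0.5, r=r))
```

With these values rem stays positive for r = 1 to 3 and first drops below 0 at r = 4, which illustrates why one repetition per section is not enough at small R.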

【００９１】[0091]

Specifically, a threshold Srem of 0 or less is set for the adjustment variable rem. In steps S28 and S34 of the flowchart of the slow-listening reproduction processing operation shown in FIG. 9, the adjustment variable rem is first calculated using equation (9), before speech synthesis is performed. If the calculated rem satisfies rem < Srem, it is judged that the normal repetition processing described above leaves the reproduction time insufficient, and the minimum number of repetitions r for which rem ≥ 0 is obtained using equation (10). The smaller of this r and the maximum value r_max is taken as the number of repetitions. The repetition section is then repeated that number of times in the speech synthesis of each step. After that, the adjustment variable rem is updated with that number as r in equation (10). If the threshold is Srem = 0, the determination "rem < Srem" is not made and the number of repetitions is always obtained.

【００９２】[0092]

As described above, the speech decoding device of the present embodiment has the reproduction speed control unit 2, which is provided with the reproduction time difference detection unit 25, the repetition section detection unit 26, and the repetition processing unit 27. For each repetition process, the reproduction time difference detection unit 25 calculates the adjustment variable rem based on the reproduction speed magnification R and the result of the repetition process, and determines from the sign of rem whether to perform the next repetition process. When repetition is to be performed, the repetition section detection unit 26 detects whether a repetition section in units of the pitch period P exists in the sections not yet subjected to speech synthesis and in the current frame. According to this detection result, the repetition processing unit 27 outputs control signals for generating the sound source synp'(n) by inserting the repetition section into the sound source synp(n) having a pitch component generated by the pitch synthesis filter 4.

【００９３】[0093]

Therefore, according to the present embodiment, repetition processing can be applied to the sound source synp(n) in units of the pitch period P. This realizes a speech decoding device that causes little deterioration in the quality of the reproduced speech and can cope even when the pitch period P is longer than the frame length Lf. Moreover, because whether to execute the next repetition process is determined from the adjustment variable rem while speech synthesis proceeds, the reproduction speed can be correctly controlled to match the specified reproduction speed magnification R.

【００９４】[0094]

The above has described the case of fast-listening reproduction and normal reproduction with R ≥ 1 and the case of slow-listening reproduction and normal reproduction with R ≤ 1. Both cases include R = 1, which indicates that normal reproduction is possible in either case. When R = 1, the adjustment variable rem is always 0 in both equation (8) and equation (9). In the present embodiment, the reproduction speed is adjusted based on the adjustment variable rem given by equation (8) or equation (9) so that the actual reproduction speed matches the desired one. The reproduction speed magnification R in equations (8) and (9) need not be a constant. Therefore, the reproduction speed can be changed even during reproduction by changing the magnification R input to the reproduction speed control unit 2.

【００９５】[0095]

However, in the fast-listening and normal reproduction case and in the slow-listening and normal reproduction case described above, the reproduction speed control unit 2 has the structure of FIG. 3 or FIG. 4, respectively, and operates according to the different flowcharts of FIG. 8 or FIG. 9. Therefore, in the fast-listening and normal reproduction case, the reproduction speed magnification R cannot be changed from R > 1 to R < 1. Likewise, in the slow-listening and normal reproduction case, it cannot be changed from R < 1 to R > 1. In other words, only one of fast-listening reproduction and slow-listening reproduction is possible.

【００９６】[0096]

(c) Fast-listening reproduction, slow-listening reproduction, and normal reproduction

This embodiment describes a reproduction speed control unit 2 that can handle both fast-listening and slow-listening reproduction. To allow the reproduction speed magnification R to be changed during reproduction, the reproduction speed control unit 2 calculates the adjustment variable rem using equation (8) or equation (9), depending on whether R is larger or smaller than 1. In doing so, however, the dimensions of equations (8) and (9) must be matched. As described above, the adjustment variable rem calculated by equation (8) represents R times the difference between the time length actually reproduced up to the present and the desired reproduction time length. In this embodiment, therefore, the dimension of equation (8) is matched to that of equation (9) (the rem calculated by equation (9) is the difference between the desired reproduction time length up to the present and the time length actually reproduced). That is, in this embodiment, when R ≥ 1, the adjustment variable rem is calculated by equation (11):

rem(x) = rem(x−1) + (Lpb(R − 1) − Lc)/R … (11)

If the reproduction speed magnification R is changed not during reproduction but only when the reproduction processing reaches a break, equation (8) may be used as is.

【００９７】[0097]

FIGS. 10 and 11 are flowcharts of the speech reproduction processing operation executed by the demultiplexer 1, the reproduction speed control unit 2, the pitchless sound source generation unit 3, and the pitch synthesis filter 4. This flowchart contains, in parallel, the sequence of the fast-listening reproduction flowchart of FIG. 8 and the sequence of the slow-listening reproduction flowchart of FIG. 9. At each branch point into the two sequences, the reproduction speed magnification R is compared with 1 to decide which sequence to follow. The speech reproduction processing operation that can handle both fast-listening and slow-listening reproduction is described below with reference to FIGS. 10 and 11.

【００９８】[0098]

In steps S41 to S43, in the same manner as steps S1 to S3 in the flowchart of the fast-listening reproduction processing operation shown in FIG. 8, the initial values of the adjustment variable rem and of the number of frames cnt for which speech synthesis is not performed are set, the sound source synp(n) having a pitch component is generated for the current frame, and the sign of rem is determined. In step S44, it is determined whether the reproduction speed magnification R is 1 or more. If it is, the process proceeds to step S45 and shifts to the fast-listening and normal reproduction processing operation. If it is smaller than 1, the process proceeds to step S46 and shifts to the slow-listening reproduction processing operation. In step S45, in the same manner as step S4 in the FIG. 8 flowchart, speech synthesis control for the current frame, setting of the initial value of cnt, and calculation of rem are performed; rem is calculated using equation (11). The process then proceeds to step S68. In step S46, in the same manner as step S24 in the flowchart of the slow-listening reproduction processing operation shown in FIG. 9, speech synthesis control for the current frame, setting of the initial value of cnt, and calculation of rem are performed; rem is calculated using equation (9). The process then proceeds to step S68.

【００９９】[0099]

In steps S47 and S48, in the same manner as steps S5 and S6 in the FIG. 8 flowchart, the condition P ≤ cnt·Lf is tested and the number of frames cnt is incremented. In step S49, it is determined whether the reproduction speed magnification R is 1 or more. If it is, the process proceeds to step S50 and shifts to the fast-listening and normal reproduction processing operation. If it is smaller than 1, the process proceeds to step S56 and shifts to the slow-listening reproduction processing operation. In steps S50 to S55, in the same manner as steps S7 to S12 in the FIG. 8 flowchart, the following are performed: calculation of the minimum value k_min of the integer k satisfying P ≤ k·Lf; determination of whether k_min = cnt; speech synthesis of those frames, within the sections not yet synthesized, that do not contain the deletion section; determination of whether k_min·Lf = P; speech synthesis of those frames, within the sections not yet synthesized, that contain the deletion section; and calculation of rem. Here rem is calculated using equation (11). In steps S56 to S59, in the same manner as steps S27 to S30 in the FIG. 9 flowchart, the following are performed: speech synthesis of the frames not yet synthesized; speech synthesis of the repetition section; calculation of rem; and determination of the sign of rem. Here rem is calculated using equation (9).
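The two frame-count quantities used in steps S47 and S50 can be computed with integer arithmetic alone. The sketch below is illustrative; the function names `k_min` and `section_available` and the sample values are hypothetical, and P and Lf are assumed to be positive integer sample counts.

```python
def k_min(P, Lf):
    """Smallest integer k with P <= k*Lf (ceiling division on positive integers)."""
    return -(-P // Lf)

def section_available(P, cnt, Lf):
    """Condition tested in steps S47 and S63: P <= cnt*Lf."""
    return P <= cnt * Lf

# A pitch period of 400 samples spans three 160-sample frames.
print(k_min(400, 160), section_available(400, 2, 160), section_available(400, 3, 160))
```

When the condition fails, the flowchart increments cnt and holds the frame, so a pitch period longer than the frame length is handled by accumulating frames until one whole period is available.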

【０１００】[0100]

In steps S60 to S62, in the same manner as steps S44 to S46, the following are performed: determination of whether R ≥ 1; speech synthesis control for the current frame, setting of the initial value of cnt, and calculation of rem by equation (11); and speech synthesis control for the current frame, setting of the initial value of cnt, and calculation of rem by equation (9). The process then proceeds to step S68.

【０１０１】[0101]

In steps S63 and S64, in the same manner as steps S15 and S16 in the FIG. 8 flowchart, the condition P ≤ cnt·Lf is tested and the number of frames cnt is set to 1. In step S65, it is determined whether the reproduction speed magnification R is 1 or more. If it is, the process proceeds to step S66 and shifts to the fast-listening and normal reproduction processing operation. If it is smaller than 1, the process proceeds to step S67 and shifts to the slow-listening reproduction processing operation. In step S66, in the same manner as step S17 in the FIG. 8 flowchart, speech synthesis of the sections of the current frame other than the deletion section, setting of the initial value of cnt, and calculation of rem by equation (11) are performed. In step S67, in the same manner as step S34 in the FIG. 9 flowchart, speech synthesis of the repetition section and the remaining section of the current frame, setting of the initial value of cnt, and calculation of rem by equation (9) are performed.

【０１０２】[0102]

In steps S68 and S69, in the same manner as steps S18 and S19 in the FIG. 8 flowchart, the internal filter memory 11 of the pitch synthesis filter 4 and the linear prediction coefficient memory 5 are updated and the final-frame determination is made. When the current frame is determined to be the final frame, the speech reproduction processing operation ends.

【０１０３】[0103]

FIG. 5 shows a specific configuration example of the reproduction speed control unit 2 for executing the speech reproduction processing shown in FIGS. 10 and 11. This reproduction speed control unit 2 has a reproduction time difference detection unit 31, a repetition/deletion section detection unit 32, a deletion processing unit 33, a repetition processing unit 34, and a reproduction speed magnification determination unit 35. The reproduction time difference detection unit 31 calculates the adjustment variable rem according to equation (9) or equation (11), based on the reproduction speed magnification R input from outside, Lpb and Lc from the deletion processing unit 33, and Ln and Lr from the repetition processing unit 34. It then performs the determinations of steps S43 and S59 in the flowcharts of the speech reproduction processing operation shown in FIGS. 10 and 11.

【０１０４】[0104]

The repetition/deletion section detection unit 32 performs the determinations of steps S47 and S63 in the flowcharts of FIGS. 10 and 11. It does so based on the pitch period P from the demultiplexer 1, the number of frames cnt for which speech synthesis has not been performed (supplied by the deletion processing unit 33 and the repetition processing unit 34), and the frame length Lf stored in the internal memory. It thereby detects a section that can be deleted or repeated in the sound source synp(n) having a pitch component, which is generated by the pitch synthesis filter 4 and stored in the internal filter memory 11 and the filter output memory 12. The reproduction speed magnification determination unit 35 performs the determinations of steps S44, S49, S60, and S65 in the flowcharts of FIGS. 10 and 11 to decide whether the reproduction processing to be executed is fast-listening or slow-listening reproduction. Based on the result, it informs the reproduction time difference detection unit 31 which of equation (9) and equation (11) to use when calculating the adjustment variable rem.

【０１０５】[0105]

The deletion processing unit 33 and the repetition processing unit 34 perform the processing in the flowcharts of FIGS. 10 and 11 other than that performed by the demultiplexer 1, the pitchless sound source generation unit 3, the pitch synthesis filter 4, the reproduction time difference detection unit 31, the repetition/deletion section detection unit 32, and the reproduction speed magnification determination unit 35. They generate control signals for the pitch synthesis filter 4, the linear prediction coefficient memory 5, and the synthesis filter 6 so that the sound source synp'(n), in which the sections of synp(n) that can be deleted or repeated are deleted or repeated in pitch units, is generated and reproduced as speech.

【０１０６】[0106]

As described above, the speech decoding device of the present embodiment has the reproduction speed control unit 2, which is provided with the reproduction time difference detection unit 31, the repetition/deletion section detection unit 32, the deletion processing unit 33, the repetition processing unit 34, and the reproduction speed magnification determination unit 35. For each deletion or repetition process, the reproduction time difference detection unit 31 calculates the adjustment variable rem based on the reproduction speed magnification R and the result of the deletion or repetition process, and determines from the sign of rem whether to perform the next deletion or repetition process. If so, the reproduction speed magnification determination unit 35 determines whether R ≥ 1 or R < 1 and thereby decides whether deletion or repetition is to be performed. When deletion is to be performed, the deletion processing unit 33, in the same manner as the deletion processing unit 23 in FIG. 3, outputs control signals for generating the sound source synp'(n) by deleting in units of the pitch period P. When repetition is to be performed, the repetition processing unit 34, in the same manner as the repetition processing unit 27 in FIG. 4, outputs control signals for generating the sound source synp'(n) by repeating in units of the pitch period P.

【０１０７】したがって、本実施の形態によれば、音声
再生処理中に再生速度倍率Ｒの値をＲ＞１←→Ｒ＝１←
→Ｒ＜１に切り換えることによって早聞き再生処理,通
常再生処理および遅聞き再生処理に切り替えることがで
きる。すなわち、音声再生時の再生速度を無段階で変更
することが可能になるのである。このことは、例えば、
非常に長時間記録された音声情報の中から重要な箇所を
探す場合に、不必要な箇所を早聞きし、必要な箇所が近
づくと低速の早聞きを行い、重要箇所は遅聞きして十分
に内容を把握することを、音声再生中に再生速度倍率Ｒ
を変更するだけで容易に行うことができ、非常に有効で
ある。Therefore, according to the present embodiment, switching the value of the reproduction speed magnification R among R > 1, R = 1 and R < 1 during audio reproduction switches among fast-listening reproduction, normal reproduction and slow-listening reproduction. That is, the reproduction speed can be changed steplessly during audio reproduction. This is very effective because, for example, when searching for an important passage in audio information recorded over a very long time, one can fast-listen through unnecessary passages, switch to low-speed fast-listening as the necessary passage approaches, and slow-listen to the important passage to grasp its content fully, simply by changing the reproduction speed magnification R during audio reproduction.

【０１０８】尚、本実施の形態では、式(８)と式(９)と
の次元を揃えるために、式(８)における(Ｌpb(Ｒ−１)
−Ｌc)を１/Ｒ倍した式(１１)を式(８)に変えて用いて
いる。しかしながら、式(９)における(Ｌn(１/Ｒ−１)
−Ｌr)をＲ倍した式を式(９)に変えて使用しても差し支
えない。また、この発明における上記早聞き再生処理動
作,遅聞き再生処理動作および音声再生処理動作のアル
ゴリズムは、図８,図９あるいは図１０〜図１１のフロ
ーチャートに限定されるものではない。In the present embodiment, to make the dimensions of equations (8) and (9) consistent, equation (11), obtained by multiplying (Lpb(R−1) − Lc) in equation (8) by 1/R, is used in place of equation (8). However, an equation obtained by multiplying (Ln(1/R−1) − Lr) in equation (9) by R may equally be used in place of equation (9). Further, the algorithms of the fast-listening reproduction processing operation, the slow-listening reproduction processing operation, and the audio reproduction processing operation in the present invention are not limited to the flowcharts of FIG. 8, FIG. 9, or FIGS. 10 to 11.
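As a small worked illustration of the two rescalings mentioned in this paragraph, the bracketed terms expand as shown below. Only the terms quoted in the paragraph are used; the complete forms of equations (8), (9) and (11) are given earlier in the description and are not reproduced here.

```latex
\frac{1}{R}\bigl(L_{pb}(R-1) - L_c\bigr) = L_{pb}\Bigl(1 - \frac{1}{R}\Bigr) - \frac{L_c}{R},
\qquad
R\Bigl(L_n\Bigl(\frac{1}{R} - 1\Bigr) - L_r\Bigr) = L_n(1 - R) - R\,L_r .
```

Presumably either rescaling expresses the deletion-side and repetition-side quantities on a common time base, which is why the paragraph treats the choice of which side to rescale as free.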

【０１０９】[0109]

【発明の効果】以上より明らかなように、請求項１に係
る発明の音声復号化装置は、ピッチ予測と線形予測を用
いた音声符号化方法による符号列を復号化して合成音声
を生成するに際に、再生速度制御部によって再生速度倍
率に基づく制御信号を出力し、この制御信号を受けたピ
ッチ合成フィルタによって、ピッチ成分が付加された音
源信号に対してピッチ周期を単位とする削除あるいは繰
り返しの何れか一方を行って音声合成フィルタに送出す
るので、上記音声合成フィルタによって、上記ピッチ成
分を有する音源信号をピッチ周期単位で削除あるいは繰
り返した音源信号に基づいて音声信号が合成される。し
たがって、上記ピッチ成分を付加する前の音源信号を削
除したり繰り返したりする場合に比較して、音源信号の
削除あるいは繰り返しによる音質の劣化が少なくて音の
高さが変わらない再生速度可変を実現できる。As is clear from the above, in the speech decoding apparatus of the invention according to claim 1, when a code string produced by a speech coding method using pitch prediction and linear prediction is decoded to generate synthesized speech, the reproduction speed control unit outputs a control signal based on the reproduction speed magnification, and the pitch synthesis filter receiving this control signal either deletes or repeats, in units of the pitch period, the sound source signal to which the pitch component has been added, and sends the result to the speech synthesis filter. The speech synthesis filter therefore synthesizes the speech signal from a sound source signal that has the pitch component and has been deleted or repeated in pitch-period units. Consequently, compared with deleting or repeating the sound source signal before the pitch component is added, variable-speed reproduction can be realized with little sound-quality degradation caused by the deletion or repetition and without changing the pitch of the sound.

【０１１０】また、請求項２に係る発明の音声復号化装
置は、上記ピッチ合成フィルタには生成されたピッチ成
分が付加された音源信号を所定区間保持する音源信号保
持手段を設け、上記再生速度制御部には、上記音源信号
保持手段に保持されている保持音源信号中に削除区間あ
るいは繰り返し区間が存在することを検出する繰り返し
・削除区間検出手段と、上記保持音源信号中における削
除区間あるいは繰り返し区間の存在が検出されると制御
信号を出力する繰り返し・削除処理手段を設けて、この
制御信号を受けたピッチ合成フィルタによって、上記保
持音源信号に対してピッチ周期を単位とする区間の削除
あるいは繰り返しを行って音声合成フィルタに送出する
ので、上記ピッチ合成フィルタにおいてピッチ成分が付
加された音源信号に対するピッチ周期単位での削除や繰
り返しを容易に実現できる。特に、上記音源信号保持手
段に保持できる音源信号の所定区間長を最大ピッチ周期
以上のフレーム単位に設定しておけば、上記ピッチ周期
がフレーム長より長い場合でも、上記ピッチ成分を有す
る音源信号に対するピッチ周期単位での削除や繰り返し
を確実に行うことができる。Further, in the speech decoding apparatus of the invention according to claim 2, the pitch synthesis filter is provided with sound source signal holding means that holds a predetermined section of the generated sound source signal to which the pitch component has been added. The reproduction speed control unit is provided with repeat/delete section detection means that detects that a deletion section or a repetition section exists in the held sound source signal held by the sound source signal holding means, and with repeat/delete processing means that outputs a control signal when the existence of a deletion section or a repetition section in the held sound source signal is detected. The pitch synthesis filter receiving this control signal deletes or repeats a section of the held sound source signal in units of the pitch period and sends the result to the speech synthesis filter, so deletion or repetition in pitch-period units of the sound source signal to which the pitch component has been added in the pitch synthesis filter can be realized easily. In particular, if the length of the predetermined section that the sound source signal holding means can hold is set, in units of frames, to at least the maximum pitch period, deletion and repetition in pitch-period units of the sound source signal having the pitch component can be performed reliably even when the pitch period is longer than the frame length.
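The buffering idea in this paragraph can be illustrated with a minimal sketch. Everything here is an assumption made for the example: the class name `ExcitationBuffer`, the method names, and the numbers used in the usage note (a 40-sample frame and a 147-sample maximum pitch period) are illustrative and are not taken from the patent.

```python
class ExcitationBuffer:
    """Stand-in for the 'sound source signal holding means'.

    Holds pitch-added excitation across frames so that a whole pitch period
    is available even when the pitch period is longer than one frame.
    """

    def __init__(self, max_pitch, frame_len):
        # Capacity: smallest whole number of frames covering the maximum pitch period
        frames = -(-max_pitch // frame_len)  # ceiling division
        self.capacity = frames * frame_len
        self.samples = []

    def push_frame(self, frame):
        """Append one frame of pitch-added excitation."""
        if len(self.samples) + len(frame) > self.capacity:
            raise OverflowError("buffer full: take a section out first")
        self.samples.extend(frame)

    def has_section(self, pitch_period):
        """Detection step: is at least one full pitch period being held?"""
        return len(self.samples) >= pitch_period

    def take_section(self, pitch_period):
        """Remove and return the oldest pitch-period section."""
        section = self.samples[:pitch_period]
        del self.samples[:pitch_period]
        return section
```

For example, with 40-sample frames and a current pitch period of 100 samples, `has_section(100)` stays false after one and two frames and becomes true after the third, at which point a full pitch-period section can be deleted or repeated.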

【０１１１】また、請求項３に係る発明の音声復号化装
置は、上記再生速度制御部には、上記再生速度倍率に基
づく現時点までの希望再生時間と現時点までの実際に再
生した時間との差を検出する再生時間差検出手段を設け
て、上記実際に再生した時間が未だ希望再生時間に至っ
ていない場合には上記制御信号を出力して上記差の値を
０にするので、上記差の値に基づいて実際の再生時間が
希望再生時間に近づいているかを常に監視して再生速度
を制御できる。したがって、上記ピッチ成分を有する音
源信号に対する削除や繰り返しの区間長が常に一定でな
くとも、上記再生速度倍率での再生速度になるように再
生速度を制御できる。Further, in the speech decoding apparatus of the invention according to claim 3, the reproduction speed control unit is provided with reproduction time difference detection means that detects the difference between the desired reproduction time up to the present, based on the reproduction speed magnification, and the time actually reproduced up to the present. When the actually reproduced time has not yet reached the desired reproduction time, the control signal is output to bring the value of the difference to 0. The reproduction speed can therefore be controlled while constantly monitoring, from the value of the difference, whether the actual reproduction time is approaching the desired reproduction time. Consequently, even if the length of the section deleted from or repeated in the sound source signal having the pitch component is not always constant, the reproduction speed can be controlled so as to match the reproduction speed magnification.

【０１１２】また、請求項４に係る発明の音声復号化装
置における上記再生速度制御部は、再生速度を遅くする
場合に、上記再生時間差検出手段によって希望再生時間
と実際に再生した時間との差の値が負の所定値以下にな
ったと判定されると、上記ピッチ成分が付加された音源
信号のピッチ周期を単位とする繰り返し区間を複数回繰
り返して音声合成フィルタに送出させる制御信号を出力
するので、上記再生速度倍率が小さいために実際に再生
した時間が希望再生時間に近づかない場合であっても、
実際の再生時間が希望再生時間になるように最適に制御
できる。Further, in the speech decoding apparatus of the invention according to claim 4, when the reproduction speed is to be slowed and the reproduction time difference detection means determines that the difference between the desired reproduction time and the actually reproduced time has fallen to or below a predetermined negative value, the reproduction speed control unit outputs a control signal that causes a repetition section, in units of the pitch period, of the sound source signal to which the pitch component has been added to be repeated a plurality of times and sent to the speech synthesis filter. Therefore, even when the reproduction speed magnification is so small that the actually reproduced time does not approach the desired reproduction time, the actual reproduction time can be optimally controlled so as to reach the desired reproduction time.

【０１１３】また、請求項５に係る発明の音声復号化装
置は、再速度制御部に再生速度倍率判定部,削除処理部
および繰り返し処理部を設けて、再生速度倍率が１以上
の場合には、上記削除処理部からの第１の制御信号を受
けたピッチ合成フィルタによって、ピッチ成分が付加さ
れた音源信号のピッチ周期を単位とする削除区間を削除
して音声合成フィルタに送出する一方、上記再生速度倍
率が１より小さい場合には、上記繰り返し処理部からの
第２の制御信号を受けた上記ピッチ合成フィルタによっ
て、ピッチ成分が付加された音源信号のピッチ周期を単
位とする繰り返し区間を繰り返して上記合成フィルタに
送出するので、上記再生速度倍率の値に応じて、再生速
度を通常の再生速度より速める早聞き処理と遅める遅聞
き処理と通常の再生速度での通常処理とを切り替え実行
できる。Further, in the speech decoding apparatus of the invention according to claim 5, the reproduction speed control unit is provided with a reproduction speed magnification determination unit, a deletion processing unit and a repetition processing unit. When the reproduction speed magnification is 1 or more, the pitch synthesis filter receiving the first control signal from the deletion processing unit deletes a deletion section, in units of the pitch period, from the sound source signal to which the pitch component has been added and sends the result to the speech synthesis filter. When the reproduction speed magnification is smaller than 1, the pitch synthesis filter receiving the second control signal from the repetition processing unit repeats a repetition section, in units of the pitch period, of the sound source signal to which the pitch component has been added and sends the result to the speech synthesis filter. Depending on the value of the reproduction speed magnification, it is therefore possible to switch among fast-listening processing, which makes reproduction faster than normal, slow-listening processing, which makes it slower, and normal processing at the normal reproduction speed.

【０１１４】したがって、音声再生中であっても上記再
生速度倍率を変更することができ、無段階変速が可能に
なる。すなわち、この発明によれば、例えば、非常に長
時間記録された音声情報の中における不必要な箇所を高
速で早聞きし、必要な箇所が近づくと低速の早聞きを行
い、重要箇所は遅聞きして内容を十分に把握する動作
を、音声再生中に再生速度倍率変更するという簡単な処
理だけで行うことができるのである。Therefore, the reproduction speed magnification can be changed even during audio reproduction, making stepless speed change possible. That is, according to the present invention, for example, the operation of fast-listening at high speed through unnecessary passages of audio information recorded over a very long time, fast-listening at low speed as the necessary passage approaches, and slow-listening to the important passage to grasp its content fully can be carried out by the simple process of changing the reproduction speed magnification during audio reproduction.

【図面の簡単な説明】[Brief description of drawings]

【図１】この発明の音声復号化装置における一実施の形
態を示すブロック図である。FIG. 1 is a block diagram showing an embodiment of a speech decoding apparatus of the present invention.

【図２】図１におけるピッチ合成フィルタの詳細なブロ
ック図である。FIG. 2 is a detailed block diagram of the pitch synthesis filter in FIG. 1.

【図３】図１における再生速度制御部の早聞き再生およ
び通常再生処理を実現するためのブロック図である。FIG. 3 is a block diagram of the reproduction speed control unit in FIG. 1 for realizing fast-listening reproduction and normal reproduction processing.

【図４】図１における再生速度制御部の遅聞き再生およ
び通常再生処理を実現するためのブロック図である。FIG. 4 is a block diagram of the reproduction speed control unit in FIG. 1 for realizing slow-listening reproduction and normal reproduction processing.

【図５】図１における再生速度制御部の早聞き再生,通
常再生および遅聞き再生処理を実現するためのブロック
図である。FIG. 5 is a block diagram of the reproduction speed control unit in FIG. 1 for realizing fast-listening reproduction, normal reproduction, and slow-listening reproduction processing.

【図６】ピッチ成分を付加した音源波形の削除と繰り返
しの一例を示す模式図である。FIG. 6 is a schematic diagram showing an example of deletion and repetition of a sound source waveform to which a pitch component is added.

【図７】図６とは異なる音源波形の削除と繰り返し例を
示す模式図である。FIG. 7 is a schematic diagram showing an example of deletion and repetition of a sound source waveform different from that in FIG. 6.

【図８】図３に示す再生速度制御部２の制御の下に行わ
れる早聞き再生処理動作フローチャートである。FIG. 8 is a flowchart of the fast-listening reproduction processing operation performed under the control of the reproduction speed control unit 2 shown in FIG. 3.

【図９】図４に示す再生速度制御部２の制御の下に行わ
れる遅聞き再生処理動作フローチャートである。FIG. 9 is a flowchart of the slow-listening reproduction processing operation performed under the control of the reproduction speed control unit 2 shown in FIG. 4.

【図１０】図５に示す再生速度制御部２の制御の下に行
われる音声再生処理動作フローチャートである。FIG. 10 is a flowchart of the audio reproduction processing operation performed under the control of the reproduction speed control unit 2 shown in FIG. 5.

【図１１】図１０に続く音声再生処理動作のフローチャ
ートでである。FIG. 11 is a flowchart of the audio reproduction processing operation continuing from FIG. 10.

【符号の説明】[Explanation of symbols]

１…デマルチプレクサ、２…再生速度制御
部、３…ピッチ無し音源生成部、４…ピッチ合
成フィルタ、５…線形予測係数メモリ、６…
合成フィルタ、１１…内部フィルタメモリ、１
２…フィルタ出力メモリ、１３…乗算器、
１４…加算器、２１,２５,３１…再生時間差検
出部、２２…削除区間検出部、２３,３３
…削除処理部、２６…繰り返し区間検出部、２
７,３４…繰り返し処理部、３２…繰り返し・削除区間検
出部、３５…再生速度倍率判定部。1 ... Demultiplexer, 2 ... Reproduction speed control unit, 3 ... Pitchless sound source generation unit, 4 ... Pitch synthesis filter, 5 ... Linear prediction coefficient memory, 6 ... Synthesis filter, 11 ... Internal filter memory, 12 ... Filter output memory, 13 ... Multiplier, 14 ... Adder, 21, 25, 31 ... Reproduction time difference detection unit, 22 ... Deletion section detection unit, 23, 33 ... Deletion processing unit, 26 ... Repetition section detection unit, 27, 34 ... Repetition processing unit, 32 ... Repeat/delete section detection unit, 35 ... Reproduction speed magnification determination unit.

Claims

【特許請求の範囲】[Claims]

【請求項１】ピッチ予測と線形予測を用いた音声符号
化方法による符号列を復号化して得られた音源情報に基
づいて音源信号を生成する音源生成部と、上記符号列を
復号化して得られたピッチ予測情報に基づいて上記音源
信号にピッチ成分を付加するピッチ合成フィルタと、上
記符号列を復号化して得られた線形予測情報に基づいて
上記ピッチ成分が付加された音源信号から音声信号を合
成する音声合成フィルタを有する音声復号化装置におい
て、再生速度倍率に基づいて、音声の再生速度を制御するた
めの制御信号を出力する再生速度制御部を備えて、上記ピッチ合成フィルタは、上記制御信号を受けて、上
記ピッチ成分が付加された音源信号に対してピッチ周期
を単位とする区間の削除あるいは繰り返しの何れか一方
を行って上記音声合成フィルタに送出することを特徴と
する音声復号化装置。1. A speech decoding apparatus having: a sound source generation unit that generates a sound source signal based on sound source information obtained by decoding a code string produced by a speech coding method using pitch prediction and linear prediction; a pitch synthesis filter that adds a pitch component to the sound source signal based on pitch prediction information obtained by decoding the code string; and a speech synthesis filter that synthesizes a speech signal from the sound source signal to which the pitch component has been added, based on linear prediction information obtained by decoding the code string, the apparatus being characterized by comprising a reproduction speed control unit that outputs, based on a reproduction speed magnification, a control signal for controlling the reproduction speed of the speech, wherein the pitch synthesis filter, on receiving the control signal, performs either deletion or repetition of a section, in units of the pitch period, on the sound source signal to which the pitch component has been added, and sends the result to the speech synthesis filter.

【請求項２】請求項１に記載の音声復号化装置におい
て、上記ピッチ合成フィルタは、生成されたピッチ成分が付
加された音源信号における所定区間を保持する音源信号
保持手段を有すると共に、上記再生速度制御部は、上記音源信号保持手段に保持されている保持音源信号の
時間長が現フレームのピッチ周期以上であることを検知
して上記保持音源信号中に削除あるいは繰り返しの対象
となる区間が存在することを検出する繰り返し・削除区
間検出手段と、上記繰り返し・削除区間検出手段によって上記削除ある
いは繰り返しの対象となる区間の存在が検出されると、
上記制御信号を出力する繰り返し・削除処理手段を有し
て、上記ピッチ合成フィルタは、上記制御信号を受けると、
上記保持音源信号に対してピッチ周期を単位とする区間
の削除あるいは繰り返しの何れか一方を行って上記音声
合成フィルタに送出するようになっていることを特徴と
する音声復号化装置。2. The speech decoding apparatus according to claim 1, wherein the pitch synthesis filter has sound source signal holding means that holds a predetermined section of the generated sound source signal to which the pitch component has been added; the reproduction speed control unit has repeat/delete section detection means that, by detecting that the time length of the held sound source signal held in the sound source signal holding means is equal to or longer than the pitch period of the current frame, detects that a section to be deleted or repeated exists in the held sound source signal, and repeat/delete processing means that outputs the control signal when the repeat/delete section detection means detects the existence of the section to be deleted or repeated; and the pitch synthesis filter, on receiving the control signal, performs either deletion or repetition of a section, in units of the pitch period, on the held sound source signal and sends the result to the speech synthesis filter.

【請求項３】請求項１に記載の音声復号化装置におい
て、上記再生速度制御部は、上記再生速度倍率に基づく現時
点までの希望再生時間と現時点までの実際に再生した時
間との差を検出する再生時間差検出手段を有して、この
再生時間差検出手段によって上記実際に再生した時間が
未だ希望再生時間に至っていないと判定された場合に上
記制御信号を出力して上記差の値を０にするようになっ
ていることを特徴とする音声復号化装置。3. The speech decoding apparatus according to claim 1, wherein the reproduction speed control unit has reproduction time difference detection means that detects the difference between the desired reproduction time up to the present, based on the reproduction speed magnification, and the time actually reproduced up to the present, and, when the reproduction time difference detection means determines that the actually reproduced time has not yet reached the desired reproduction time, outputs the control signal so as to bring the value of the difference to 0.

【請求項４】請求項３に記載の音声復号化装置におい
て、上記再生速度制御部は、再生速度を遅くする場合に、上
記再生時間差検出手段によって希望再生時間と実際に再
生した時間との差の値が負の所定値以下になったと判定
すると、上記ピッチ成分が付加された音源信号のピッチ
周期を単位とする繰り返し区間を複数回繰り返して上記
音声合成フィルタに送出させる制御信号を出力して、上
記希望再生時間と実際に再生した時間との差を速やかに
０に近づけるようになっていることを特徴とする音声復
号化装置。4. The speech decoding apparatus according to claim 3, wherein, when the reproduction speed is to be slowed and the reproduction time difference detection means determines that the value of the difference between the desired reproduction time and the actually reproduced time has fallen to or below a predetermined negative value, the reproduction speed control unit outputs a control signal that causes a repetition section, in units of the pitch period, of the sound source signal to which the pitch component has been added to be repeated a plurality of times and sent to the speech synthesis filter, so that the difference between the desired reproduction time and the actually reproduced time is quickly brought close to 0.

【請求項５】ピッチ予測と線形予測を用いた音声符号
化方法による符号列を復号化して得られた音源情報に基
づいて音源信号を生成する音源生成部と、上記符号列を
復号化して得られたピッチ予測情報に基づいて上記音源
信号にピッチ成分を付加するピッチ合成フィルタと、上
記符号列を復号化して得られた線形予測情報に基づいて
上記ピッチ成分が付加された音源信号から音声信号を合
成する音声合成フィルタを有する音声復号化装置におい
て、再生速度倍率の値が１以上であるか否かを判定して、判
定結果を表す信号を出力する再生速度倍率判定部と、上記再生速度倍率判定部からの上記再生速度倍率の値が
１以上であることを表す信号を受けて、第１の制御信号
を出力する削除処理部と、上記再生速度倍率判定部からの上記再生速度倍率の値が
１より小さいことを表す信号を受けて、第２の制御信号
を出力する繰り返し処理部を備えて、上記ピッチ合成フィルタは、上記第１の制御信号を受け
た場合には、上記ピッチ成分が付加された音源信号のピ
ッチ周期を単位とする削除区間を削除して上記音声合成
フィルタに送出する一方、上記第２の制御信号を受けた
場合には、上記ピッチ成分が付加された音源信号のピッ
チ周期を単位とする繰り返し区間を繰り返して上記音声
合成フィルタに送出することを特徴とする音声復号化装
置。5. A speech decoding apparatus having: a sound source generation unit that generates a sound source signal based on sound source information obtained by decoding a code string produced by a speech coding method using pitch prediction and linear prediction; a pitch synthesis filter that adds a pitch component to the sound source signal based on pitch prediction information obtained by decoding the code string; and a speech synthesis filter that synthesizes a speech signal from the sound source signal to which the pitch component has been added, based on linear prediction information obtained by decoding the code string, the apparatus being characterized by comprising: a reproduction speed magnification determination unit that determines whether the value of a reproduction speed magnification is 1 or more and outputs a signal representing the determination result; a deletion processing unit that outputs a first control signal on receiving, from the reproduction speed magnification determination unit, a signal indicating that the value of the reproduction speed magnification is 1 or more; and a repetition processing unit that outputs a second control signal on receiving, from the reproduction speed magnification determination unit, a signal indicating that the value of the reproduction speed magnification is smaller than 1, wherein the pitch synthesis filter, on receiving the first control signal, deletes a deletion section, in units of the pitch period, from the sound source signal to which the pitch component has been added and sends the result to the speech synthesis filter, and, on receiving the second control signal, repeats a repetition section, in units of the pitch period, of the sound source signal to which the pitch component has been added and sends the result to the speech synthesis filter.