JP2003046490A

JP2003046490A - Voice transmission device

Info

Publication number: JP2003046490A
Application number: JP2001228905A
Authority: JP
Inventors: Wataru Fushimi (伏見 渉); Shigeaki Suzuki (鈴木 茂明)
Original assignee: Mitsubishi Electric Corp
Current assignee: Mitsubishi Electric Corp
Priority date: 2001-07-30
Filing date: 2001-07-30
Publication date: 2003-02-14

Abstract

PROBLEM TO BE SOLVED: To provide a voice transmission device that absorbs the clock difference between a transmitting-side device and a receiving-side device with high voice quality and high precision at low cost. SOLUTION: Based on the amount of voice signal stored in a buffer section 14, as monitored by a buffer capacity monitor section 16, and on the voiced/silent information detected by a voice detection section 11, a buffer control section 15 either inserts a silent voice signal generated by a silence-coded voice signal generating section 13 into the voice signal stored in the buffer section 14, or discards a silent voice signal from the voice signal stored in the buffer section 14. This reduces fluctuation in the amount of voice signal stored in the buffer section 14 and eliminates voice transmission defects caused by the clock difference between the transmitting-side and receiving-side devices.

Description

Detailed Description of the Invention

[0001]

[Technical Field of the Invention] The present invention relates to a voice transmission device that transmits signals such as voice signals using IP packets.

[0002]

[Prior Art] Conventionally, when a voice transmission device operating over an IP network is used to provide a service that requires real-time transmission, such as an Internet telephone service, synchronization of the transmission clocks between devices is not taken into account. As a result, clock error between the devices causes an excess or shortage of received data in the receiving-side device. To address this problem, a very high-precision clock is used as the clock mounted in each device.
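
To make the scale of the problem concrete, the following is a small worked calculation, not taken from the patent. The sample rate, clock difference, and frame size are illustrative assumptions only, and the function name is hypothetical.

```python
def seconds_per_frame_slip(sample_rate_hz: float,
                           clock_diff_ppm: float,
                           frame_samples: int) -> float:
    """Time until two free-running clocks drift apart by one whole frame.

    All three inputs are assumptions chosen for illustration; the patent
    text does not give numeric values.
    """
    if clock_diff_ppm <= 0:
        raise ValueError("clock_diff_ppm must be positive")
    # Samples of excess (or shortage) that accumulate at the receiver per second.
    drift_samples_per_second = sample_rate_hz * clock_diff_ppm / 1e6
    return frame_samples / drift_samples_per_second


# Assumed example: 8 kHz sampling, 100 ppm total clock difference,
# 160-sample (20 ms) frames.
slip = seconds_per_frame_slip(8000, 100, 160)
print(f"one frame of excess or shortage roughly every {slip:.0f} s")
```

Under these assumed numbers the receiver gains or loses about one frame every few minutes, which is why some absorbing mechanism is needed on long calls unless the clocks are very precise.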

[0003] FIG. 19 is a configuration diagram of the conventional voice transmission device disclosed in Japanese Patent Laid-Open No. 5-103012. In the conventional voice transmission device of FIG. 19, the output signal 126 of the voice packet inverse conversion unit 101, to which the voice packet input line 111 is connected, is input to the compensation pattern generation unit 104, the voiced/silent determination unit 102, and the selection circuit 105. The output signal 127 of the compensation pattern generation unit 104 and the selection signal 122 of the queue loading control unit 103 are also input to the selection circuit 105. The determination result signal 123 of the voiced/silent determination unit 102 is input to the queue loading control unit 103. The output signal 125 of the selection circuit 105 is input to the voice frame queue 106, which outputs its voice frames to the voice frame output line 112. The loading instruction signal 120 of the queue loading control unit 103 is input to the voice frame queue 106 and the counting circuit 107. The counting circuit 107 outputs its count to the queue loading control unit 103 as the count value signal 124. The timing generator 108 outputs the timing signal 121 to the counting circuit 107 and the voice frame queue 106.

[0004] Next, the operation is described. The voice packet inverse conversion unit 101 reconstructs voice frames from the voice packets input from the voice packet input line 111 and outputs them to the voiced/silent determination unit 102, the compensation pattern generation unit 104, and the selection circuit 105. The voiced/silent determination unit 102 determines whether each received voice frame is silent or voiced and outputs the result to the queue loading control unit 103 as the determination result signal 123. The queue loading control unit 103 holds this determination result signal 123 until the determination result signal 123 for the next voice frame is input.

[0005] The compensation pattern generation unit 104 generates a compensation voice frame from the voice frame input from the voice packet inverse conversion unit 101 and outputs it to the selection circuit 105. Based on the selection signal 122 from the queue loading control unit 103, the selection circuit 105 selects either the compensation voice frame or the voice frame from the voice packet inverse conversion unit 101 and outputs it to the voice frame queue 106. The voice frame queue 106 loads the voice frame from the selection circuit 105 into the queue. The counting circuit 107 increments its count value by one at each loading operation.

[0006] In accordance with the constant-period timing signal 121 generated by the timing generator 108, the voice frame queue 106 outputs voice frames in order, starting from the frame loaded earliest, to the codec side (not shown) via the voice frame output line 112. Each time the voice frame queue 106 outputs a voice frame, the counting circuit 107 decrements its count value by one in response to the timing signal 121.

[0007] The queue loading control unit 103 constantly monitors the count value signal 124 input from the counting circuit 107. When it detects that the count value (that is, the number of voice frames remaining in the voice frame queue 106) is at or below a predetermined lower threshold B, it determines whether the last voice frame in the voice frame queue 106 was silent. If it was silent, the queue loading control unit 103 controls the selection circuit 105 so that the compensation voice frame output by the compensation pattern generation unit 104 is selected and output to the voice frame queue 106, and it outputs to the voice frame queue 106 the loading instruction signal 120 that causes the compensation voice frame to be loaded. Conversely, if the last voice frame in the voice frame queue 106 was voiced, the voice frame from the voice packet inverse conversion unit 101 is loaded into the voice frame queue 106, and no compensation voice frame from the compensation pattern generation unit 104 is loaded until a silent frame is input. The queue loading control unit 103 continues this control until the count value signal 124 exceeds the lower threshold B.

[0008] If the value of the count value signal 124 is at or below a predetermined upper threshold A, the queue loading control unit 103 outputs the selection signal 122 and the loading instruction signal 120 so that the voice frames input from the voice packet inverse conversion unit 101 are loaded unconditionally into the voice frame queue 106 via the selection circuit 105, regardless of whether they are voiced or silent.

[0009] However, if the value of the count value signal 124 exceeds the upper threshold A, the queue loading control unit 103 controls the selection circuit 105 so that only silent voice frames from the voice packet inverse conversion unit 101 are loaded into the voice frame queue 106. That is, once the number of voice frames loaded in the queue exceeds the upper threshold A, voiced voice frames from the voice packet inverse conversion unit 101 are always discarded and are not loaded into the voice frame queue 106.
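
The threshold behaviour described for the conventional device can be summarized in a minimal sketch. This is only an illustration of the decision logic as worded above. The function name and return labels are hypothetical, and two points are assumptions: that the lower threshold B is smaller than the upper threshold A, and that the lower-threshold rule takes precedence where both rules apply.

```python
def choose_frame(count: int,
                 last_queued_silent: bool,
                 incoming_voiced: bool,
                 lower_b: int,
                 upper_a: int) -> str:
    """Decide what the conventional queue loading control would load.

    Returns one of "compensation", "incoming", or "discard".
    Assumes lower_b < upper_a (not stated explicitly in the text).
    """
    if count <= lower_b:
        # Queue is running low: pad with a compensation frame, but only
        # when the last queued frame was silent.
        return "compensation" if last_queued_silent else "incoming"
    if count <= upper_a:
        # Normal range: load every received frame unconditionally.
        return "incoming"
    # Above the upper threshold: only silent frames are loaded,
    # so voiced frames are thrown away.
    return "discard" if incoming_voiced else "incoming"


print(choose_frame(count=2, last_queued_silent=True, incoming_voiced=False,
                   lower_b=2, upper_a=8))
print(choose_frame(count=9, last_queued_silent=False, incoming_voiced=True,
                   lower_b=2, upper_a=8))
```

The last branch makes the drawback visible: above threshold A the frames that are lost are exactly the voiced ones.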

[Problems to be Solved by the Invention]

[0010] Conventional voice transmission devices use a very high-precision clock, which raises the device cost. In addition, because control is applied to the input of the queue buffer that absorbs the clock error, voice signals already stored in the queue buffer cannot be controlled; queue buffer control becomes impossible if no silent voice frame appears after a threshold is exceeded; and when the queue buffer tends to overflow, voiced voice frames containing meaningful voice signals are discarded. Furthermore, because control is performed in units of voice frames of a certain duration, control in units smaller than a voice frame is impossible.

[0011] The present invention has been made to solve the above problems, and its object is to obtain a voice transmission device that achieves high voice call quality and high-precision clock error absorption at low cost.

[0012]

[Means for Solving the Problems] The first invention comprises: an IP packet receiving unit that extracts a voice signal from a received IP packet; a voice detection unit that detects voiced/silent information indicating the voiced and silent sections of the voice signal extracted by the IP packet receiving unit; a buffer unit that stores the voice signal extracted by the IP packet receiving unit; a buffer monitoring unit that monitors the amount of voice signal stored in the buffer unit; a buffer control unit that, based on the stored amount monitored by the buffer monitoring unit and the voiced/silent information detected by the voice detection unit, either inserts a new voice signal into the voice signal stored in the buffer unit or discards voice signal stored in the buffer unit; and a decoding unit that decodes a second voice signal, namely the voice signal after insertion or discarding by the buffer control unit.

[0013] The second invention comprises a silent voice signal generation unit that generates a silent voice signal to be inserted into the voice signal stored in the buffer unit, and a marker adding unit that adds a marker indicating the voiced/silent information to the voice signal extracted by the IP packet receiving unit. The buffer unit stores the voice signal to which the marker has been added by the marker adding unit. Based on the stored amount monitored by the buffer monitoring unit and the marker added to the voice signal, the buffer control unit either inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal stored in the buffer unit, or discards a silent voice signal in a silent section of the voice signal stored in the buffer unit. The decoding unit decodes a second voice signal obtained by removing the marker from the voice signal into which the silent voice signal has been inserted, or from which it has been discarded, by the buffer control unit.

[0014] The third invention comprises a buffer control unit that, when the amount of voice signal stored in the buffer is at or below a preset lower limit, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal stored in the buffer unit, and that, when the stored amount is at or above a preset upper limit, discards a silent voice signal in a silent section of the voice signal stored in the buffer unit.
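
The insert-or-discard rule of the third invention can be illustrated with a minimal sketch. It is not the patented implementation. The frame representation, the names `adjust_buffer`, `SILENT`, and `VOICED`, and the choice to act on the first silent frame found are all assumptions made for illustration.

```python
from typing import List, Tuple

# Hypothetical frame representation: (marker, payload).
Frame = Tuple[str, bytes]
SILENT, VOICED = "silent", "voiced"


def adjust_buffer(frames: List[Frame], lower: int, upper: int,
                  silence_payload: bytes) -> List[Frame]:
    """Return a copy of the buffer after at most one insert or discard.

    Acts only inside a silent section, so voiced frames are never touched.
    Choosing the first silent frame is an assumption of this sketch.
    """
    out = list(frames)
    silent_idx = next(
        (i for i, (marker, _) in enumerate(out) if marker == SILENT), None)
    if silent_idx is None:
        return out  # no silent section available: leave the buffer as is
    if len(out) <= lower:
        # Buffer is running low: lengthen the silent section by one frame.
        out.insert(silent_idx + 1, (SILENT, silence_payload))
    elif len(out) >= upper:
        # Buffer is filling up: shorten the silent section by one frame.
        del out[silent_idx]
    return out


buf = [(VOICED, b"a"), (SILENT, b"s"), (VOICED, b"b")]
print(adjust_buffer(buf, lower=3, upper=6, silence_payload=b"z"))
print(adjust_buffer(buf, lower=1, upper=3, silence_payload=b"z"))
```

Because the adjustment operates on frames already in the buffer, it addresses the conventional device's limitation that stored signals could not be controlled.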

[0015] The fourth invention comprises a silence duration measuring unit that measures the duration of a silent section of the voice signal stored in the buffer. When the amount of voice signal stored in the buffer is at or below a preset lower limit, the buffer control unit inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal stored in the buffer unit, based on the duration and the marker added to the voice signal. When the stored amount is at or above a preset upper limit, the buffer control unit discards a silent voice signal in a silent section of the voice signal stored in the buffer unit, based on the duration and the marker added to the voice signal.

[0016] The fifth invention comprises a marker adding unit that adds to the voice signal markers indicating a front hangover section, which is the silent section extending back a fixed time from the point at which the signal changes from silent to voiced, and a hangover section, which is the silent section extending a fixed time after the point at which the signal changes from voiced to silent. When the amount of voice signal stored in the buffer is at or below a preset lower limit, the buffer control unit inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal that is neither a front hangover section nor a hangover section. When the stored amount is at or above any of a plurality of preset upper limits, the buffer control unit discards, according to which upper limit has been reached, a silent voice signal in a front hangover section, in a hangover section, or in a silent section that is neither.
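
The two kinds of hangover marker can be sketched as a labelling pass over per-frame voiced/silent flags. This is an illustrative assumption of how such markers might be derived, not the method of the patent. The function name, the label strings, the frame-count parameter `hang`, and the rule that a hangover label wins when a short silent gap qualifies for both are all hypothetical.

```python
from typing import List


def label_frames(voiced: List[bool], hang: int) -> List[str]:
    """Label each frame as voiced, hangover, front_hangover, or silent.

    `hang` is the fixed hangover length, expressed here in frames.
    """
    n = len(voiced)
    labels = ["voiced" if v else "silent" for v in voiced]
    for i in range(1, n):
        if voiced[i - 1] and not voiced[i]:
            # Voiced -> silent at frame i: mark up to `hang` following
            # silent frames as the hangover section.
            j = i
            while j < n and j < i + hang and not voiced[j]:
                labels[j] = "hangover"
                j += 1
        if not voiced[i - 1] and voiced[i]:
            # Silent -> voiced at frame i: mark up to `hang` preceding
            # silent frames as the front hangover section, stopping at
            # frames already labelled as hangover.
            j = i - 1
            while j >= 0 and j >= i - hang and labels[j] == "silent":
                labels[j] = "front_hangover"
                j -= 1
    return labels


flags = [False, False, False, True, True, False, False, False]
for flag, label in zip(flags, label_frames(flags, hang=1)):
    print(flag, label)
```

With labels of this kind, insertion at the lower limit would be restricted to frames labelled plain `silent`, as the paragraph above states. Which label is discarded at which upper limit is not specified in this paragraph, so the sketch leaves that mapping out.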

[0017] The sixth invention comprises a received data discrimination unit that determines whether the voice signal extracted by the IP packet receiving unit is a conversational voice signal, and a selector that, based on the result of that determination, selects either a signal of the IP packet receiving unit other than the voice signal or the voice signal stored in the buffer. Based on the selection made by the selector, the decoding unit decodes a third voice signal obtained by removing the marker from the voice signal into which the silent voice signal has been inserted or from which it has been discarded.

[0018] The seventh invention comprises a facsimile protocol analysis unit that determines, based on the result from the received data discrimination unit, whether the voice signal of the IP packet receiving unit is a facsimile signal and, if so, analyzes the protocol of that facsimile signal. When a facsimile signal is stored in the buffer unit, the buffer control unit uses the analysis information from the facsimile protocol analysis unit to insert the silent voice signal into, or discard a silent voice signal from, a silent section where insertion or discarding causes no problem under the protocol of the stored facsimile signal.

[0019] The eighth invention comprises: an IP packet receiving unit that extracts a voice signal from a received IP packet; a decoding unit that decodes the voice signal extracted by the IP packet receiving unit; a voice detection unit that detects, from the third voice signal decoded by the decoding unit, voiced/silent information indicating its voiced and silent sections; a buffer unit that stores the third voice signal decoded by the decoding unit; a buffer monitoring unit that monitors the amount of the third voice signal stored in the buffer unit; and a buffer control unit that, based on the stored amount monitored by the buffer monitoring unit and the voiced/silent information detected by the voice detection unit, either inserts a new voice signal into the third voice signal stored in the buffer unit or discards third voice signal stored in the buffer unit.

[0020] The ninth invention comprises a silent voice signal generation unit that generates a silent voice signal to be inserted into the third voice signal stored in the buffer unit, and a marker adding unit that adds a marker indicating voiced/silent information to the third voice signal decoded by the decoding unit. The buffer unit stores the third voice signal to which the marker has been added by the marker adding unit. Based on the stored amount of the third voice signal monitored by the buffer monitoring unit and the marker added to the third voice signal, the buffer control unit either inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal stored in the buffer unit, or discards a silent voice signal in a silent section of the third voice signal stored in the buffer unit.

[0021] The tenth invention comprises a buffer control unit that, when the amount of the third voice signal stored in the buffer is at or below a preset lower limit, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal stored in the buffer unit, and that, when the stored amount of the third voice signal is at or above a preset upper limit, discards a silent voice signal in a silent section of the third voice signal stored in the buffer unit.

[0022] The eleventh invention comprises a buffer control unit that, when the amount of the third voice signal stored in the buffer is at or below any of a plurality of preset lower limits, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal according to which lower limit has been reached, and that, when the stored amount of the third voice signal is at or above any of a plurality of preset upper limits, discards a silent voice signal in a silent section of the third voice signal according to which upper limit has been reached.

[0023] The twelfth invention comprises a buffer control unit that, when the stored amount is at or below the lowest of the plurality of lower limits and the third voice signal stored in the buffer contains no silent section, performs interpolation processing that inserts a signal into the portion of a voiced section of the third voice signal where the signal is missing.

[0024] The thirteenth invention comprises a silence duration measuring unit that measures the duration of a silent section of the third voice signal stored in the buffer. When the amount of the third voice signal stored in the buffer is at or below a preset lower limit, the buffer control unit inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal stored in the buffer unit, based on the duration and the marker added to the third voice signal. When the stored amount of the third voice signal is at or above a preset upper limit, the buffer control unit discards a silent voice signal in a silent section of the third voice signal stored in the buffer unit, based on the duration and the marker added to the third voice signal.

[0025] The fourteenth invention comprises a marker adding unit that adds to the third voice signal markers indicating a front hangover section, which is the silent section extending back a fixed time from the point at which the signal changes from silent to voiced, and a hangover section, which is the silent section extending a fixed time after the point at which the signal changes from voiced to silent. When the amount of the third voice signal stored in the buffer is at or below a preset lower limit, the buffer control unit inserts the silent voice signal into a silent section of the third voice signal that is neither a front hangover section nor a hangover section. When the stored amount of the third voice signal is at or above any of a plurality of preset upper limits, the buffer control unit discards, according to which upper limit has been reached, a silent voice signal in a front hangover section, in a hangover section, or in a silent section that is neither.

[0026] The fifteenth invention comprises a received data discrimination unit that determines whether the third voice signal decoded by the decoding unit is a conversational voice signal, and a selector that, based on the result of that determination, selects either a signal of the IP packet receiving unit other than the third voice signal or the third voice signal stored in the buffer.

[0027] The sixteenth invention comprises a facsimile protocol analysis unit that determines, based on the result from the received data discrimination unit, whether the third voice signal decoded by the decoding unit is a facsimile signal and, if so, analyzes the protocol of that facsimile signal. When a facsimile signal is stored in the buffer unit, the buffer control unit uses the analysis information from the facsimile protocol analysis unit to insert the silent voice signal into, or discard a silent voice signal from, a silent section of the third voice signal where insertion or discarding causes no problem under the protocol of the stored facsimile signal.

[0028]

[Embodiments of the Invention] Embodiment 1. Embodiment 1 is described below with reference to the drawings. FIG. 1 is a configuration diagram of the voice transmission device of Embodiment 1. In FIG. 1, 10 is an IP packet receiving unit that extracts an encoded voice signal from a received voice IP packet; 11 is a voice detection unit that detects and determines the voiced/silent state of the encoded voice signal output from the IP packet receiving unit 10; 12 is a marker adding unit that, based on information from the voice detection unit 11, adds a marker indicating voiced/silent to the encoded voice signal from the IP packet receiving unit 10; 13 is a silence-encoded voice signal generation unit that generates and outputs a silence-encoded voice signal on instruction from the buffer control unit 15; 14 is a buffer unit that temporarily stores the encoded voice signal input via the marker adding unit 12; 15 is a buffer control unit that, based on information from the buffer amount monitoring unit 16, inserts the silence-encoded voice signal from the silence-encoded voice signal generation unit 13 into the encoded voice signal temporarily stored in the buffer unit 14, or discards encoded voice signal in the buffer unit 14; 16 is a buffer amount monitoring unit that monitors the amount of encoded voice signal stored in the buffer; and 17 is a decoding unit that decodes the encoded voice signal output periodically from the buffer unit 14 according to the device's internal clock.

【００２９】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、音声検出部１１及びマーカー付与部１２に出力され
る。音声検出部１１では、符号化音声信号に含まれる音
声符号化パラメータや簡易復号処理によって得られた音
声信号の音声レベルなどから該当符号化音声信号が有音
状態であるか無音状態であるかを検出・判定してその結
果をマーカー付与部１２に出力する。マーカー付与部１
２では、音声検出部１１からの有音／無音情報に基づい
てＩＰパケット受信部１０から入力した符号化音声信号
に対して、例えば符号化音声信号のヘッダ情報として有
音状態であるか無音状態であるかを示すマーカーを付与
しバッファ部１４へ出力する。Next, the operation will be described. The IP packet receiving unit 10 extracts the coded voice signal stored in each input voice IP packet and outputs it to the voice detection unit 11 and the marker assigning unit 12. The voice detection unit 11 detects and judges whether the coded voice signal is in a voiced state or a silent state from, for example, the voice coding parameters contained in the coded voice signal or the voice level of the voice signal obtained by simplified decoding, and outputs the result to the marker assigning unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker assigning unit 12 attaches to the coded voice signal input from the IP packet receiving unit 10 a marker indicating whether it is voiced or silent, for example as header information of the coded voice signal, and outputs it to the buffer unit 14.

【００３０】無音符号化音声信号生成部１３では、バッ
ファ制御部１５からの指示にしたがってバッファ部１４
に入力される符号化音声信号と同じ符号化方式が施され
た無音符号化音声信号を生成してバッファ部１４に出力
する。バッファ部１４では、マーカー付与部１２を介し
て入力された符号化音声信号を一時蓄積してバッファ制
御部１５により符号化音声信号の廃棄や挿入が行われた
後に受信側装置のクロックに基づいて定期的にマーカー
情報を除いた符号化音声信号が復号部１７へ出力され
る。バッファ１４内に蓄積された符号化音声信号の順序
が乱れることは無く、挿入や廃棄はされるが入力された
順で出力される。The silence coded voice signal generation unit 13, on instruction from the buffer control unit 15, generates a silence coded voice signal coded with the same coding scheme as the coded voice signal input to the buffer unit 14, and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the coded voice signal input via the marker assigning unit 12; after the buffer control unit 15 has discarded or inserted coded voice signal, the coded voice signal with the marker information removed is output to the decoding unit 17 periodically, based on the clock of the receiving side device. The order of the coded voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【００３１】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の符号化音声信号の蓄積量をバッファ
制御部１５に通知する。復号部１７では、バッファ部１
４から出力された符号化音声信号を復号して音声信号と
して出力する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, while output from the buffer unit 14 is driven by the clock of the receiving side device. Therefore, if the source clock is faster than the clock of the receiving side device, the amount stored in the buffer unit 14 tends to increase; conversely, if the source clock is slower than the clock of the receiving side device, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input and output status and the insertion and discard status of the buffer unit 14, and notifies the buffer control unit 15 of the amount of coded voice signal stored in the buffer unit 14. The decoding unit 17 decodes the coded voice signal output from the buffer unit 14 and outputs it as a voice signal.
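As a rough illustration of the drift described in paragraph [0031], the following Python sketch estimates how long it takes for the stored amount to change by one voice frame. The function name, the 20 ms frame length, and the clock offsets expressed in parts per million are assumptions chosen for illustration; the patent gives no numerical values.

```python
def seconds_per_frame_of_drift(frame_ms: float, offset_ppm: float) -> float:
    """Time in seconds for the buffer occupancy to change by one frame.

    offset_ppm is the (hypothetical) relative clock offset in parts per
    million: positive if the source clock is faster than the receiver
    clock (buffer fills), negative if it is slower (buffer drains).
    """
    if offset_ppm == 0:
        return float("inf")  # identical clocks: occupancy never drifts
    # Per second of real time the buffer gains or loses
    # |offset_ppm| * 1e-3 milliseconds of audio, so one frame takes
    # frame_ms / (|offset_ppm| * 1e-3) = frame_ms * 1000 / |offset_ppm| seconds.
    return frame_ms * 1000.0 / abs(offset_ppm)
```

Under these assumed figures, a 100 ppm offset with 20 ms frames shifts the occupancy by one frame every 200 seconds, which suggests that single-frame corrections are needed only occasionally.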

【００３２】バッファ制御部１５の動作について説明す
る。図２はバッファ制御部１５の動作を示したフローチ
ャートである。バッファ制御部１５はバッファ量監視部
１６からの情報に基づいてバッファ部１４の蓄積量を確
認し（ステップＳ１）、その蓄積量が予め決められた下
限値以下か否かを判断する（ステップＳ２）。The operation of the buffer control unit 15 will be described. FIG. 2 is a flowchart showing the operation of the buffer control unit 15. The buffer control unit 15 checks the amount stored in the buffer unit 14 based on information from the buffer amount monitoring unit 16 (step S1), and judges whether the stored amount is at or below a predetermined lower limit (step S2).

【００３３】もし下限値以下であれば、バッファ部１４
内に蓄積されている符号化音声信号に付与されているマ
ーカーを調査して無音区間を見つけてその無音区間に無
音符号化音声信号生成部１３に指示して生成させた無音
符号化音声信号を１音声フレーム分挿入することでバッ
ファ１４内蓄積量を増やす処理を行う（ステップＳ
３）。もし下限値以下でなければ、その蓄積量が予め決
められた上限値以上か否かを判断する（ステップＳ
４）。もし上限値以上であれば、バッファ部１４内に蓄
積されている符号化音声信号に付与されたマーカーを調
査して無音区間を見つけてその無音区間の無音符号化音
声信号を１音声フレーム分廃棄することでバッファ部１
４内蓄積量を減らす処理を行う（ステップＳ５）。もし
上限値以上でなければ処理は行わない。If the stored amount is at or below the lower limit, the buffer control unit 15 examines the markers attached to the coded voice signals stored in the buffer unit 14 to find a silent section, and inserts into that silent section one voice frame of silence coded voice signal, which it instructs the silence coded voice signal generation unit 13 to generate, thereby increasing the amount stored in the buffer unit 14 (step S3). If the stored amount is not at or below the lower limit, the buffer control unit 15 judges whether it is at or above a predetermined upper limit (step S4). If it is at or above the upper limit, the buffer control unit 15 examines the markers attached to the coded voice signals stored in the buffer unit 14 to find a silent section, and discards one voice frame of silence coded voice signal from that silent section, thereby reducing the amount stored in the buffer unit 14 (step S5). If it is not at or above the upper limit, no processing is performed.
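The flow of FIG. 2 (steps S1 to S5) can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the patented implementation: the names `Frame`, `find_silent_index`, and `control_step` are hypothetical, the stored amount is counted in frames, the first silent frame found is used, and nothing is done when the buffer contains no silent frame (the description does not say what happens in that case).

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass
class Frame:
    payload: bytes
    silent: bool  # voiced/silent marker attached before buffering


def find_silent_index(buffer: List[Frame]) -> Optional[int]:
    """Return the index of the first frame marked silent, or None."""
    for i, frame in enumerate(buffer):
        if frame.silent:
            return i
    return None


def control_step(buffer: List[Frame], lower: int, upper: int,
                 make_silence_frame: Callable[[], Frame]) -> str:
    """One pass of the buffer control flow; mutates buffer in place."""
    amount = len(buffer)                      # S1: check stored amount
    if amount <= lower:                       # S2: at or below lower limit?
        i = find_silent_index(buffer)
        if i is not None:
            buffer.insert(i, make_silence_frame())  # S3: insert one frame
            return "inserted"
    elif amount >= upper:                     # S4: at or above upper limit?
        i = find_silent_index(buffer)
        if i is not None:
            del buffer[i]                     # S5: discard one silent frame
            return "discarded"
    return "none"                             # otherwise no processing
```

Because frames are only inserted into or removed from the list in place, the remaining frames keep their input order, matching the ordering property stated for the buffer unit 14.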

【００３４】なお、ここでは符号化音声信号を例に説明
したが、以下の実施の形態も含め、符号化音声信号に限
定するものではなく、符号化されていない音声信号につ
いても適用するものである。そのため各実施の形態中に
記載の符号化音声信号は音声信号、無音符号化音声信号
は無音音声信号であることを表している。Although the description here uses a coded voice signal as an example, the invention, including the embodiments below, is not limited to coded voice signals and also applies to voice signals that are not coded. Accordingly, in each embodiment "coded voice signal" may be read as "voice signal" and "silence coded voice signal" as "silence voice signal".

【００３５】以上のように、受信した符号化音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の符号
化音声信号を挿入／廃棄することで、送信元（送信側装
置）のクロックと送信先（受信側装置）のクロックとの
差を吸収することができ、且つ、より高品質な音声通話
品質及びより高精度なクロック差吸収機能を安価に実現
できる。As described above, a marker indicating voiced/silent is attached to each received coded voice signal, the signal is temporarily stored in a buffer, and coded voice signal is inserted into or discarded from the buffer according to the amount stored. This absorbs the difference between the clock of the transmission source (transmitting side device) and the clock of the transmission destination (receiving side device), and achieves higher voice call quality and a more precise clock difference absorbing function at low cost.

【００３６】実施の形態２．以下、実施の形態２を図を
参照して説明する。図３は、実施の形態２の音声伝送装
置の構成図である。図３において、図１と同一符号は同
一または相当部分を示しているので説明を省略する。１
８はバッファ部１４に入力された無音区間の継続時間を
測定する無音継続測定部である。Embodiment 2. The second embodiment is described below with reference to the drawings. FIG. 3 is a configuration diagram of the voice transmission device according to the second embodiment. In FIG. 3, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Reference numeral 18 denotes a silence duration measuring unit that measures the duration of the silent section input to the buffer unit 14.

【００３７】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、音声検出部１１及びマーカー付与部１２に出力され
る。音声検出部１１では、符号化音声信号に含まれる音
声符号化パラメータや簡易復号処理によって得られた音
声信号の音声レベルなどから該当符号化音声信号が有音
状態であるか無音状態であるかを検出・判定してその結
果をマーカー付与部１２に出力する。マーカー付与部１
２では、音声検出部１１からの有音／無音情報に基づい
てＩＰパケット受信部１０から入力した符号化音声信号
に対して、例えば符号化音声信号のヘッダ情報として有
音状態であるか無音状態であるかを示すマーカーを付与
しバッファ部１４へ出力する。Next, the operation will be described. The IP packet receiving unit 10 extracts the coded voice signal stored in each input voice IP packet and outputs it to the voice detection unit 11 and the marker assigning unit 12. The voice detection unit 11 detects and judges whether the coded voice signal is in a voiced state or a silent state from, for example, the voice coding parameters contained in the coded voice signal or the voice level of the voice signal obtained by simplified decoding, and outputs the result to the marker assigning unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker assigning unit 12 attaches to the coded voice signal input from the IP packet receiving unit 10 a marker indicating whether it is voiced or silent, for example as header information of the coded voice signal, and outputs it to the buffer unit 14.

【００３８】無音継続測定部１８では、バッファ部１４
に入力される符号化音声信号に付与されたマーカーを監
視してバッファ部１４に入力された符号化音声信号の無
音状態継続時間を測定してその結果をバッファ制御部１
５に通知する。無音符号化音声信号生成部１３では、バ
ッファ制御部１５からの指示にしたがってバッファ部１
４に入力される符号化音声信号と同じ符号化方式が施さ
れた無音符号化音声信号を生成してバッファ部１４に出
力する。バッファ部１４では、マーカー付与部１２を介
して入力された符号化音声信号を一時蓄積してバッファ
制御部１５により符号化音声信号の廃棄や挿入が行われ
た後に受信側装置のクロックに基づいて定期的にマーカ
ー情報を除いた符号化音声信号が復号部１７へ出力され
る。バッファ１４内に蓄積された符号化音声信号の順序
が乱れることは無く、挿入や廃棄はされるが入力された
順で出力される。The silence duration measuring unit 18 monitors the markers attached to the coded voice signals input to the buffer unit 14, measures the duration of the silent state of the coded voice signal input to the buffer unit 14, and notifies the buffer control unit 15 of the result. The silence coded voice signal generation unit 13, on instruction from the buffer control unit 15, generates a silence coded voice signal coded with the same coding scheme as the coded voice signal input to the buffer unit 14, and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the coded voice signal input via the marker assigning unit 12; after the buffer control unit 15 has discarded or inserted coded voice signal, the coded voice signal with the marker information removed is output to the decoding unit 17 periodically, based on the clock of the receiving side device. The order of the coded voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【００３９】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, while output from the buffer unit 14 is driven by the clock of the receiving side device. Therefore, if the source clock is faster than the clock of the receiving side device, the amount stored in the buffer unit 14 tends to increase; conversely, if the source clock is slower than the clock of the receiving side device, the amount stored tends to decrease.

【００４０】バッファ量監視部１６では、バッファ部１
４の入力や出力状況及び挿入や廃棄状況を監視してバッ
ファ部１４内の符号化音声信号の蓄積量をバッファ制御
部１５に通知する。復号部１７では、バッファ部１４か
ら出力された符号化音声信号を復号して音声信号として
出力する。The buffer amount monitoring unit 16 monitors the input and output status and the insertion and discard status of the buffer unit 14, and notifies the buffer control unit 15 of the amount of coded voice signal stored in the buffer unit 14. The decoding unit 17 decodes the coded voice signal output from the buffer unit 14 and outputs it as a voice signal.

【００４１】バッファ制御部１５の動作について説明す
る。図４はバッファ制御部１５の動作を示したフローチ
ャートである。バッファ制御部１５はバッファ量監視部
１６からの情報に基づいてバッファ部１４の蓄積量を確
認し（ステップＳ１１）、その蓄積量が予め決められた
下限値以下か否かを判断する（ステップＳ１２）。The operation of the buffer control unit 15 will be described. FIG. 4 is a flowchart showing the operation of the buffer control unit 15. The buffer control unit 15 checks the amount stored in the buffer unit 14 based on information from the buffer amount monitoring unit 16 (step S11), and judges whether the stored amount is at or below a predetermined lower limit (step S12).

【００４２】もし下限値以下であれば、無音継続測定部
１８からの情報に基づいて無音状態の継続時間を確認し
（ステップＳ１３）、その無音状態継続時間が予め決め
られた閾値より短いか否かを判断する（ステップＳ１
４）。もし閾値よりも短ければ、バッファ部１４内に蓄
積されている符号化音声信号に付与されているマーカー
を調査して無音区間を見つけてその無音区間に無音符号
化音声信号生成部１３に指示して生成させた無音符号化
音声信号をＮ個音声フレーム分挿入することでバッファ
１４内蓄積量を増やす処理を行う（ステップＳ１５）。
もし閾値よりも短くなければ、バッファ部１４内に蓄積
されている符号化音声信号に付与されているマーカーを
調査して無音区間を見つけてその無音区間に無音符号化
音声信号生成部１３に指示して生成させた無音符号化音
声信号をＭ個音声フレーム分挿入することでバッファ１
４内蓄積量を増やす処理を行う（ステップＳ１６）。こ
こで、ＮはＭよりも小さいとする。If the stored amount is at or below the lower limit, the buffer control unit 15 checks the duration of the silent state based on information from the silence duration measuring unit 18 (step S13) and judges whether that duration is shorter than a predetermined threshold (step S14). If it is shorter than the threshold, the buffer control unit 15 examines the markers attached to the coded voice signals stored in the buffer unit 14 to find a silent section, and inserts into that silent section N voice frames of silence coded voice signal, which it instructs the silence coded voice signal generation unit 13 to generate, thereby increasing the amount stored in the buffer unit 14 (step S15). If it is not shorter than the threshold, the buffer control unit 15 likewise finds a silent section from the markers and inserts into it M voice frames of silence coded voice signal generated by the silence coded voice signal generation unit 13, thereby increasing the amount stored in the buffer unit 14 (step S16). Here, N is smaller than M.

【００４３】また、バッファ蓄積量が下限値以下でなけ
れば、その蓄積量が予め決められた上限値以上か否かを
判断する（ステップＳ１７）。もし上限値以上であれ
ば、無音継続測定部１８からの情報に基づいて無音状態
の継続時間を確認し（ステップＳ１８）、その無音状態
継続時間が予め決められた閾値より短いか否かを判断す
る（ステップＳ１９）。もし閾値よりも短ければ、バッ
ファ部１４内に蓄積されている符号化音声信号に付与さ
れたマーカーを調査して無音区間を見つけてその無音区
間の無音符号化音声信号をＸ音声フレーム分廃棄するこ
とでバッファ部１４内蓄積量を減らす処理を行う（ステ
ップＳ２０）。もし閾値よりも短くなければ、バッファ
部１４内に蓄積されている符号化音声信号に付与された
マーカーを調査して無音区間を見つけてその無音区間の
無音符号化音声信号をＹ音声フレーム分廃棄することで
バッファ部１４内蓄積量を減らす処理を行う（ステップ
Ｓ２１）。ここで、ＸはＹよりも小さいとする。また、
バッファ蓄積量が上限値以上でなければ処理は行わな
い。If the stored amount is not at or below the lower limit, the buffer control unit 15 judges whether it is at or above a predetermined upper limit (step S17). If it is at or above the upper limit, the buffer control unit 15 checks the duration of the silent state based on information from the silence duration measuring unit 18 (step S18) and judges whether that duration is shorter than a predetermined threshold (step S19). If it is shorter than the threshold, the buffer control unit 15 examines the markers attached to the coded voice signals stored in the buffer unit 14 to find a silent section, and discards X voice frames of silence coded voice signal from that silent section, thereby reducing the amount stored in the buffer unit 14 (step S20). If it is not shorter than the threshold, the buffer control unit 15 likewise finds a silent section from the markers and discards Y voice frames of silence coded voice signal from it, thereby reducing the amount stored in the buffer unit 14 (step S21). Here, X is smaller than Y. If the stored amount is not at or above the upper limit, no processing is performed.
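The decision in FIG. 4 (steps S11 to S21) reduces to choosing how many frames to insert or discard. The Python sketch below is one way to express it under stated assumptions: the function name `frames_to_adjust` and the signed return convention are hypothetical, the stored amount is counted in frames, and a single duration threshold is used for both branches although the description does not say whether the two thresholds are equal.

```python
def frames_to_adjust(amount: int, lower: int, upper: int,
                     silence_ms: float, threshold_ms: float,
                     n: int, m: int, x: int, y: int) -> int:
    """Signed frame count: positive = insert, negative = discard, 0 = none.

    n < m are the insert counts and x < y the discard counts for a
    short and a long silent section respectively.
    """
    if not (n < m and x < y):
        raise ValueError("expected n < m and x < y")
    short_silence = silence_ms < threshold_ms   # S14 / S19
    if amount <= lower:                         # S12
        return n if short_silence else m        # S15 / S16
    if amount >= upper:                         # S17
        return -(x if short_silence else y)     # S20 / S21
    return 0                                    # no processing
```

A longer silent section tolerates a larger correction, so the larger counts M and Y are selected when the measured silence is not shorter than the threshold.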

【００４４】以上のように、受信した符号化音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の符号
化音声信号を挿入／廃棄すると共に、挿入／廃棄を行う
無音区間の長さに応じて挿入／廃棄を行う量を調整する
ことで、送信元のクロックと受信側装置のクロックとの
差を吸収することができ、且つ、より高品質な音声通話
品質及びより高精度なクロック差吸収機能を安価に実現
できる。As described above, a marker indicating voiced/silent is attached to each received coded voice signal, the signal is temporarily stored in a buffer, coded voice signal is inserted into or discarded from the buffer according to the amount stored, and the amount inserted or discarded is adjusted according to the length of the silent section in which the insertion or discard takes place. This absorbs the difference between the clock of the transmission source and the clock of the receiving side device, and achieves higher voice call quality and a more precise clock difference absorbing function at low cost.

【００４５】実施の形態３．以下、実施の形態３を図を
参照して説明する。図５は、実施の形態３の音声伝送装
置の構成図である。図５において、図１と同一符号は同
一または相当部分を示しているので説明を省略する。１
９は音声検出部１１からの情報に基づいてフロントハン
グオーバー及びハングオーバーを示すマーカーをマーカ
ー付与部１２を介して入力される符号化音声信号に付与
する第２マーカー付与部である。Embodiment 3. The third embodiment is described below with reference to the drawings. FIG. 5 is a configuration diagram of the voice transmission device according to the third embodiment. In FIG. 5, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Reference numeral 19 denotes a second marker assigning unit that, based on information from the voice detection unit 11, attaches markers indicating front hangover and hangover to the coded voice signal input via the marker assigning unit 12.

【００４６】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、音声検出部１１及びマーカー付与部１２に出力され
る。音声検出部１１では、符号化音声信号に含まれる音
声符号化パラメータや簡易復号処理によって得られた音
声信号の音声レベルなどから該当符号化音声信号が有音
状態であるか無音状態であるかを検出・判定してその結
果をマーカー付与部１２に出力する。マーカー付与部１
２では、音声検出部１１からの有音／無音情報に基づい
てＩＰパケット受信部１０から入力した符号化音声信号
に対して、例えば符号化音声信号のヘッダ情報として有
音状態であるか無音状態であるかを示すマーカーを付与
して第２マーカー付与部１９へ出力する。Next, the operation will be described. The IP packet receiving unit 10 extracts the coded voice signal stored in each input voice IP packet and outputs it to the voice detection unit 11 and the marker assigning unit 12. The voice detection unit 11 detects and judges whether the coded voice signal is in a voiced state or a silent state from, for example, the voice coding parameters contained in the coded voice signal or the voice level of the voice signal obtained by simplified decoding, and outputs the result to the marker assigning unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker assigning unit 12 attaches to the coded voice signal input from the IP packet receiving unit 10 a marker indicating whether it is voiced or silent, for example as header information of the coded voice signal, and outputs it to the second marker assigning unit 19.

【００４７】第２マーカー付与部１９では、音声検出部
１１からの有音／無音情報に基づいて、無音状態から有
音状態に変化した時点よりもある一定時間前の部分をフ
ロントハングオーバーとして、有音状態から無音状態へ
変化した時点からある一定時間後の部分をハングオーバ
ーとして、マーカー付与部１２を介して入力された符号
化音声信号に対して、マーカー付与部１２と同様に例え
ば符号化音声信号のヘッダ情報としてフロントハングオ
ーバー部分であるかハングオーバー部分であるかを示す
第２のマーカーを付与してバッファ部１４へ出力する。Based on the voiced/silent information from the voice detection unit 11, the second marker assigning unit 19 treats the portion a fixed time before the point of change from the silent state to the voiced state as front hangover, and the portion a fixed time after the point of change from the voiced state to the silent state as hangover. To the coded voice signal input via the marker assigning unit 12 it attaches a second marker indicating whether the signal is a front hangover portion or a hangover portion, for example as header information of the coded voice signal in the same way as the marker assigning unit 12, and outputs it to the buffer unit 14.

【００４８】ここで、フロントハングオーバー、ハング
オーバーについて説明する。図６は、音声信号の大きさ
と有音／無音の判定に使う閾値、及び、その有音／無音
判定結果を模式的に表した図である。一般的に有音／無
音の判定は音声の大きさ（音圧レベル）を閾値と比較し
てその閾値以上であれば有音、閾値以下であれば無音と
する。実際の音声では、言葉の初めや終わりの部分で音
の立ち上がりや立下りの過程として閾値以下ではある
が、音として重要な部分がある。話頭や語頭、また、話
尾や語尾と称される。この部分は、図６において、音声
としては存在するが有音閾値以下であるために無音とさ
れている斜線部分に相当する。この部分をカバーするよ
うに、無音から有音に変化した時点からある一定時間前
の部分をフロントハングオーバーとし、また、有音から
無音に変化した時点からある一定時間後の部分をハング
オーバーとして、無音部分と区別することとした。Front hangover and hangover are explained here. FIG. 6 schematically shows the level of a voice signal, the threshold used for the voiced/silent judgment, and the result of that judgment. In general, the voiced/silent judgment compares the loudness of the voice (sound pressure level) with a threshold: the signal is judged voiced if it is at or above the threshold and silent if it is at or below it. In actual speech, however, the rise and fall of the sound at the beginning and end of a word fall below the threshold yet are important parts of the sound. These are called the beginning of an utterance or word and the end of an utterance or word. In FIG. 6 they correspond to the hatched portions, which exist as voice but are judged silent because they are below the voiced threshold. To cover them, the portion a fixed time before the point of change from silent to voiced is treated as front hangover, and the portion a fixed time after the point of change from voiced to silent is treated as hangover, so that both are distinguished from the silent portion.
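The labelling described in paragraphs [0047] and [0048] can be illustrated with a short Python sketch. It is a sketch under stated assumptions rather than the patented method: the function name `label_frames` and the label strings are hypothetical, the fixed times are expressed as frame counts `front` and `tail`, the labels are computed over an already buffered sequence of per-frame voiced/silent decisions, and where a short silent gap qualifies as both hangover and front hangover the front hangover label wins (an arbitrary choice the description does not address).

```python
from typing import List


def label_frames(voiced: List[bool], front: int, tail: int) -> List[str]:
    """Label each frame as voiced, silent, front_hangover or hangover."""
    labels = ["voiced" if v else "silent" for v in voiced]
    for i in range(1, len(voiced)):
        if voiced[i] and not voiced[i - 1]:
            # Silent -> voiced at frame i: walk back over up to `front`
            # silent frames immediately preceding the change.
            for j in range(i - 1, max(-1, i - 1 - front), -1):
                if voiced[j]:
                    break
                labels[j] = "front_hangover"
        elif not voiced[i] and voiced[i - 1]:
            # Voiced -> silent at frame i: mark up to `tail` silent
            # frames starting at the change.
            for j in range(i, min(len(voiced), i + tail)):
                if voiced[j]:
                    break
                labels[j] = "hangover"
    return labels
```

With `front=1` and `tail=2`, the sequence silent, silent, silent, voiced, voiced, silent, silent, silent is labelled silent, silent, front_hangover, voiced, voiced, hangover, hangover, silent.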

【００４９】無音符号化音声信号生成部１３では、バッ
ファ制御部１５からの指示にしたがってバッファ部１４
に入力される符号化音声信号と同じ符号化方式が施され
た無音符号化音声信号を生成してバッファ部１４に出力
する。バッファ部１４では、マーカー付与部１２及び第
２マーカー付与部１９を介して入力された符号化音声信
号を一時蓄積してバッファ制御部１５により符号化音声
信号の廃棄や挿入が行われた後に受信側装置のクロック
に基づいて定期的にマーカー情報を除いた符号化音声信
号が復号部１７へ出力される。バッファ１４内に蓄積さ
れた符号化音声信号の順序が乱れることは無く、挿入や
廃棄はされるが入力された順で出力される。The silence coded voice signal generation unit 13, on instruction from the buffer control unit 15, generates a silence coded voice signal coded with the same coding scheme as the coded voice signal input to the buffer unit 14, and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the coded voice signal input via the marker assigning unit 12 and the second marker assigning unit 19; after the buffer control unit 15 has discarded or inserted coded voice signal, the coded voice signal with the marker information removed is output to the decoding unit 17 periodically, based on the clock of the receiving side device. The order of the coded voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【００５０】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の符号化音声信号の蓄積量をバッファ
制御部１５に通知する。復号部１７では、バッファ部１
４から出力された符号化音声信号を復号して音声信号と
して出力する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, while output from the buffer unit 14 is driven by the clock of the receiving side device. Therefore, if the source clock is faster than the clock of the receiving side device, the amount stored in the buffer unit 14 tends to increase; conversely, if the source clock is slower than the clock of the receiving side device, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input and output status and the insertion and discard status of the buffer unit 14, and notifies the buffer control unit 15 of the amount of coded voice signal stored in the buffer unit 14. The decoding unit 17 decodes the coded voice signal output from the buffer unit 14 and outputs it as a voice signal.

【００５１】バッファ制御部１５の動作について説明す
る。図７はバッファ制御部１５の動作を示したフローチ
ャートである。バッファ制御部１５はバッファ量監視部
１６からの情報に基づいてバッファ部１４の蓄積量を確
認し（ステップＳ３１）、その蓄積量が予め決められた
下限値以下か否かを判断する（ステップＳ３２）。The operation of the buffer control unit 15 will be described. FIG. 7 is a flowchart showing the operation of the buffer control unit 15. The buffer control unit 15 checks the amount stored in the buffer unit 14 based on information from the buffer amount monitoring unit 16 (step S31), and judges whether the stored amount is at or below a predetermined lower limit (step S32).

【００５２】もし下限値以下であれば、バッファ部１４
内に蓄積されている符号化音声信号に付与されているマ
ーカーを調査してフロントハングオーバー区間でもなく
ハングオーバー区間でもない無音区間を見つけてその無
音区間に無音符号化音声信号生成部１３に指示して生成
させた無音符号化音声信号を１音声フレーム分挿入する
ことでバッファ１４内蓄積量を増やす処理を行う（ステ
ップＳ３３）。もし下限値以下でなければ、その蓄積量
が予め決められた第１上限値以上か否かを判断する（ス
テップＳ３４）。もし第１上限値以上であれば、その蓄
積量が予め決められた第２上限値以上か否かを判断する
（ステップＳ３５）。もし第２上限値以上であれば、そ
の蓄積量が予め決められた第３上限値以上か否かを判断
する（ステップＳ３７）。If the stored amount is at or below the lower limit, the buffer control unit 15 examines the markers attached to the coded voice signals stored in the buffer unit 14 to find a silent section that is neither a front hangover section nor a hangover section, and inserts into that silent section one voice frame of silence coded voice signal, which it instructs the silence coded voice signal generation unit 13 to generate, thereby increasing the amount stored in the buffer unit 14 (step S33). If the stored amount is not at or below the lower limit, the buffer control unit 15 judges whether it is at or above a predetermined first upper limit (step S34). If it is at or above the first upper limit, the buffer control unit 15 judges whether it is at or above a predetermined second upper limit (step S35). If it is at or above the second upper limit, the buffer control unit 15 judges whether it is at or above a predetermined third upper limit (step S37).

【００５３】もし第３上限値以上であれば、バッファ部
１４内に蓄積されている符号化音声信号に付与されたマ
ーカーを調査してフロントハングオーバー区間を見つけ
てそのフロントハングオーバー区間にある符号化音声信
号を１音声フレーム分廃棄することでバッファ部１４内
蓄積量を減らす処理を行う（ステップＳ３９）。もし第
３上限値以上でなければ、即ち第２上限値以上で第３上
限値未満であれば、バッファ部１４内に蓄積されている
符号化音声信号に付与されたマーカーを調査してハング
オーバー区間を見つけてそのハングオーバー区間にある
符号化音声信号を１音声フレーム分廃棄することでバッ
ファ部１４内蓄積量を減らす処理を行う（ステップＳ３
８）。If the stored amount is at or above the third upper limit, the buffer control unit 15 examines the markers attached to the coded voice signals stored in the buffer unit 14 to find a front hangover section, and discards one voice frame of coded voice signal from that front hangover section, thereby reducing the amount stored in the buffer unit 14 (step S39). If it is not at or above the third upper limit, that is, if it is at or above the second upper limit and below the third upper limit, the buffer control unit 15 examines the markers to find a hangover section and discards one voice frame of coded voice signal from that hangover section, thereby reducing the amount stored in the buffer unit 14 (step S38).

【００５４】もし第２上限値以上でなければ、即ち、第
１上限値以上で第２上限値未満であれば、バッファ部１
４内に蓄積されている符号化音声信号に付与されたマー
カーを調査してフロントハングオーバー区間でもなくハ
ングオーバー区間でもない無音区間を見つけてその無音
区間にある符号化音声信号を１音声フレーム分廃棄する
ことでバッファ部１４内蓄積量を減らす処理を行う（ス
テップＳ３６）。もし第１上限値以上でなければ、処理
は行わない。If the stored amount is not at or above the second upper limit, that is, if it is at or above the first upper limit and below the second upper limit, the buffer control unit 15 examines the markers attached to the coded voice signals stored in the buffer unit 14 to find a silent section that is neither a front hangover section nor a hangover section, and discards one voice frame of coded voice signal from that silent section, thereby reducing the amount stored in the buffer unit 14 (step S36). If it is not at or above the first upper limit, no processing is performed.
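The threshold ladder of FIG. 7 (steps S31 to S39) can be summarised as a lookup from the stored amount to an action and a frame type. The Python sketch below assumes strictly increasing limits and counts the stored amount in frames; the function name `select_action` and the returned label strings are hypothetical, and "silent" here means a silent section that is neither front hangover nor hangover.

```python
from typing import Optional, Tuple


def select_action(amount: int, lower: int, upper1: int,
                  upper2: int, upper3: int) -> Optional[Tuple[str, str]]:
    """Return (operation, frame type) for one frame, or None for no action."""
    if not (lower < upper1 < upper2 < upper3):
        raise ValueError("expected lower < upper1 < upper2 < upper3")
    if amount <= lower:                       # S32
        return ("insert", "silent")           # S33
    if amount < upper1:                       # S34: below first upper limit
        return None                           # no processing
    if amount < upper2:                       # S35: first <= amount < second
        return ("discard", "silent")          # S36
    if amount < upper3:                       # S37: second <= amount < third
        return ("discard", "hangover")        # S38
    return ("discard", "front_hangover")      # S39: at or above third
```

The ordering reflects the stated priority: plain silence is discarded first, hangover frames only once the buffer has grown further, and front hangover frames only at the highest limit.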

【００５５】以上のように、受信した符号化音声信号に
有音／無音を示すマーカー、及び、フロントハングオー
バー区間、ハングオーバー区間を示すマーカーを付与し
てバッファに一時蓄積し、そのバッファの蓄積量に応じ
て、且つ、バッファ内の有音／無音／フロントハングオ
ーバー／ハングオーバーの符号化音声信号の種別に応じ
て、バッファ内の符号化音声信号を挿入／廃棄すること
で、送信元のクロックと受信側装置のクロックとの差を
吸収することができ、且つ、より高品質な音声通話品質
及びより高精度なクロック差吸収機能を安価に実現でき
る。As described above, markers indicating voiced/silent and markers indicating front hangover and hangover sections are attached to each received coded voice signal, the signal is temporarily stored in a buffer, and coded voice signal is inserted into or discarded from the buffer according to both the amount stored and the type of coded voice signal in the buffer (voiced, silent, front hangover, or hangover). This absorbs the difference between the clock of the transmission source and the clock of the receiving side device, and achieves higher voice call quality and a more precise clock difference absorbing function at low cost.

【００５６】なお、本実施の形態で説明した、バッファ
内の有音／無音／フロントハングオーバー／ハングオー
バーの符号化音声信号の種別に応じて、また、バッファ
部に蓄積された符号化音声信号に対する複数の上限値に
応じて、バッファ内の符号化音声信号を廃棄すること
は、符号化音声信号の中で、廃棄しても問題の少ない符
号化音声信号をより上限値の低いところから廃棄するこ
とができるようにしたものである。本実施の形態の例で
は、上限値の一番低い第１上限値以上で第２上限値未
満、第２上限値以上で第３上限値未満、第３上限値以上
について、廃棄しても問題の少ないフロントハングオー
バー区間でもなくハングオーバー区間でもない無音区
間、ハングオーバー区間、フロントハングオーバー区間
の順でそれぞれの区間の符号化音声信号を廃棄すること
について説明したものである。The discarding described in this embodiment, which depends on the type of coded voice signal in the buffer (voiced, silent, front hangover, or hangover) and on a plurality of upper limits for the coded voice signal stored in the buffer unit, allows the coded voice signals whose loss causes the least harm to be discarded starting from the lowest upper limit. In the example of this embodiment, for the ranges from the lowest (first) upper limit to below the second upper limit, from the second upper limit to below the third upper limit, and at or above the third upper limit, the coded voice signal is discarded, in order of least harm, from a silent section that is neither a front hangover section nor a hangover section, from a hangover section, and from a front hangover section, respectively.

【００５７】実施の形態４．以下、実施の形態４を図を
参照して説明する。図８は、実施の形態４の音声伝送装
置の構成図である。図８において、図１と同一符号は同
一または相当部分を示しているので説明を省略する。２
０はＩＰパケット受信部１０からの符号化音声信号が通
常の会話等の音声信号なのか、ファクシミリ信号などの
音声信号以外の信号なのかを判別する受信データ判別
部、２１は受信データ判別部２０の判別結果に基づき、
復号器１７への出力をＩＰパケット受信部１０からの入
力か、バッファ部１４からの入力かを選択するセレクタ
である。Embodiment 4. The fourth embodiment is described below with reference to the drawings. FIG. 8 is a configuration diagram of the voice transmission device according to the fourth embodiment. In FIG. 8, the same reference numerals as in FIG. 1 denote the same or corresponding parts, and their description is omitted. Reference numeral 20 denotes a received data discriminating unit that determines whether the coded voice signal from the IP packet receiving unit 10 is a voice signal such as ordinary conversation or a non-voice signal such as a facsimile signal, and 21 denotes a selector that, based on the result from the received data discriminating unit 20, selects either the input from the IP packet receiving unit 10 or the input from the buffer unit 14 as the output to the decoding unit 17.

【００５８】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、音声検出部１１及びマーカー付与部１２に出力され
る。音声検出部１１では、符号化音声信号に含まれる音
声符号化パラメータや簡易復号処理によって得られた音
声信号の音声レベルなどから該当符号化音声信号が有音
状態であるか無音状態であるかを検出・判定してその結
果をマーカー付与部１２に出力する。マーカー付与部１
２では、音声検出部１１からの有音／無音情報に基づい
てＩＰパケット受信部１０から入力した符号化音声信号
に対して、例えば符号化音声信号のヘッダ情報として有
音状態であるか無音状態であるかを示すマーカーを付与
しバッファ部１４へ出力する。Next, the operation will be described. The IP packet receiving unit 10 extracts the coded voice signal stored in each input voice IP packet and outputs it to the voice detection unit 11 and the marker assigning unit 12. The voice detection unit 11 detects and judges whether the coded voice signal is in a voiced state or a silent state from, for example, the voice coding parameters contained in the coded voice signal or the voice level of the voice signal obtained by simplified decoding, and outputs the result to the marker assigning unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker assigning unit 12 attaches to the coded voice signal input from the IP packet receiving unit 10 a marker indicating whether it is voiced or silent, for example as header information of the coded voice signal, and outputs it to the buffer unit 14.

【００５９】受信データ判別部２０では、ＩＰパケット
受信部１０から入力した符号化音声信号が通常の会話な
どの音声信号なのかファクシミリ信号等の音声信号以外
の信号なのかを判別してその結果をセレクタ２１へ出力
する。セレクタ２１では、受信データ判別部２０からの
指示にしたがって、音声信号と判別されたならばバッフ
ァ１４からの入力を選択して復号部１７へ出力し、ファ
クシミリ信号などの音声信号以外の信号と判別されたな
らばＩＰパケット受信部１０からの入力を選択して復号
部１７へ出力する。無音符号化音声信号生成部１３で
は、バッファ制御部１５からの指示にしたがってバッフ
ァ部１４に入力される符号化音声信号と同じ符号化方式
が施された無音符号化音声信号を生成してバッファ部１
４に出力する。The received data discriminating unit 20 discriminates whether the encoded voice signal input from the IP packet receiving unit 10 is a voice signal such as ordinary conversation or a signal other than a voice signal, such as a facsimile signal, and outputs the result to the selector 21. Following the instruction from the received data discriminating unit 20, the selector 21 selects the input from the buffer unit 14 and outputs it to the decoding unit 17 when the signal is discriminated as a voice signal; when it is discriminated as a signal other than a voice signal, such as a facsimile signal, the selector 21 selects the input from the IP packet receiving unit 10 and outputs it to the decoding unit 17. The silence coded voice signal generation unit 13, following an instruction from the buffer control unit 15, generates a silence coded voice signal encoded with the same coding scheme as the encoded voice signal input to the buffer unit 14, and outputs it to the buffer unit 14.
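The routing performed by the selector 21 can be sketched as follows. This is only a minimal illustration under stated assumptions, not the patented implementation: the function name `select_decoder_input` and the boolean flag `is_voice` are hypothetical and do not appear in the source.

```python
def select_decoder_input(is_voice, from_buffer, from_receiver):
    """Return the frame that selector 21 forwards to decoding unit 17.

    is_voice      -- result of received data discriminating unit 20
                     (assumed here to be a simple boolean)
    from_buffer   -- frame read out of buffer unit 14
    from_receiver -- frame taken directly from IP packet receiving unit 10
    """
    # Voice takes the buffer path, where insertion or discard may occur.
    # Non-voice signals such as facsimile bypass the buffer untouched.
    return from_buffer if is_voice else from_receiver
```

Bypassing the buffer for non-voice data keeps insertion and discard away from signals whose timing must be preserved.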

【００６０】バッファ部１４では、マーカー付与部１２
を介して入力された符号化音声信号を一時蓄積してバッ
ファ制御部１５により符号化音声信号の廃棄や挿入が行
われた後に受信側装置のクロックに基づいて定期的にマ
ーカー情報を除いた符号化音声信号が復号部１７へ出力
される。バッファ１４内に蓄積された符号化音声信号の
順序が乱れることは無く、挿入や廃棄はされるが入力さ
れた順で出力される。The buffer unit 14 temporarily stores the encoded voice signal input via the marker adding unit 12. After the buffer control unit 15 has discarded or inserted encoded voice signals, the encoded voice signal, with the marker information removed, is output to the decoding unit 17 at regular intervals based on the clock of the receiving-side device. The order of the encoded voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【００６１】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の符号化音声信号の蓄積量をバッファ
制御部１５に通知する。復号部１７では、バッファ部１
４から出力された符号化音声信号を復号して音声信号と
して出力する。バッファ制御部１５の動作については、
実施の形態１にて図２を用いて説明したものと同等であ
るため、説明を省略する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, while output from the buffer unit 14 is driven by the clock of the receiving-side device. Therefore, if the source clock is faster than the receiving-side clock, the amount stored in the buffer unit 14 tends to increase; conversely, if the source clock is slower, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input, output, insertion, and discard status of the buffer unit 14 and notifies the buffer control unit 15 of the amount of encoded voice signal stored in the buffer unit 14. The decoding unit 17 decodes the encoded voice signal output from the buffer unit 14 and outputs it as a voice signal. The operation of the buffer control unit 15 is the same as that described in the first embodiment with reference to FIG. 2, and its description is omitted.
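To give a feel for how quickly a clock difference changes the buffer level, the following sketch computes the net drift in samples per second. The function name, the use of parts-per-million offsets, and the example figures (8000 Hz, ±50 ppm) are illustrative assumptions; the source does not give numerical clock tolerances.

```python
def buffer_drift_per_second(sample_rate_hz, sender_ppm, receiver_ppm):
    """Approximate net samples gained (+) or lost (-) in the buffer per second.

    sender_ppm / receiver_ppm are the clock frequency offsets of the
    transmission source and the receiving-side device from nominal,
    in parts per million (assumed inputs, not taken from the source).
    """
    return sample_rate_hz * (sender_ppm - receiver_ppm) * 1e-6


# Example: sender 50 ppm fast, receiver 50 ppm slow, 8 kHz sampling.
drift = buffer_drift_per_second(8000, 50, -50)
print(f"{drift:.2f} samples/s")
```

Under these assumed figures the buffer gains a little under one sample per second, which is why occasional insertion or discard of a small amount of silence is enough to keep the level within bounds.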

【００６２】以上のように、受信した符号化音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の符号
化音声信号を挿入／廃棄することで、送信元のクロック
と受信側装置のクロックとの差を吸収することができ、
且つ、より高品質な音声通話品質及びより高精度なクロ
ック差吸収機能を安価に実現できる。また、通常の会話
などの音声信号以外のファクシミリ信号などには、音声
信号と同様の処理は不適切である場合もあり、音声信号
とそれ以外を区別することで、トータルな高通話品質を
提供できる。As described above, a marker indicating voiced/silent is attached to the received encoded voice signal, which is temporarily stored in a buffer, and encoded voice signals in the buffer are inserted or discarded according to the amount stored. This absorbs the difference between the clock of the transmission source and the clock of the receiving-side device, and realizes higher voice call quality and a more accurate clock difference absorbing function at low cost. In addition, processing suited to voice signals may be inappropriate for non-voice signals such as facsimile signals; by distinguishing voice signals from other signals, high call quality can be provided overall.

【００６３】実施の形態５．以下、実施の形態５を図を
参照して説明する。図９は、実施の形態５の音声伝送装
置の構成図である。図９において、図８と同一符号は同
一または相当部分を示しているので説明を省略する。２
２はＩＰパケット受信部１０からの符号化音声信号がフ
ァクシミリ信号か否かを判定し、ファクシミリ信号であ
るならばそのプロトコルを解析するファクシミリプロト
コル解析部である。Embodiment 5. Embodiment 5 will be described below with reference to the drawings. FIG. 9 is a configuration diagram of the voice transmission device according to the fifth embodiment. In FIG. 9, the same reference numerals as those in FIG. 8 indicate the same or corresponding portions, and their description is omitted. Reference numeral 22 denotes a facsimile protocol analysis unit that determines whether the encoded voice signal from the IP packet receiving unit 10 is a facsimile signal and, if it is, analyzes its protocol.

【００６４】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、音声検出部１１及びマーカー付与部１２に出力され
る。音声検出部１１では、符号化音声信号に含まれる音
声符号化パラメータや簡易復号処理によって得られた音
声信号の音声レベルなどから該当符号化音声信号が有音
状態であるか無音状態であるかを検出・判定してその結
果をマーカー付与部１２に出力する。マーカー付与部１
２では、音声検出部１１からの有音／無音情報に基づい
てＩＰパケット受信部１０から入力した符号化音声信号
に対して、例えば符号化音声信号のヘッダ情報として有
音状態であるか無音状態であるかを示すマーカーを付与
しバッファ部１４へ出力する。Next, the operation will be described. From an input voice IP packet, the IP packet receiving unit 10 extracts the encoded voice signal stored in the packet and outputs it to the voice detection unit 11 and the marker adding unit 12. The voice detection unit 11 detects and determines whether the encoded voice signal is in a voiced state or a silent state from, for example, the voice coding parameters contained in the encoded voice signal or the voice level of the voice signal obtained by simple decoding, and outputs the result to the marker adding unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker adding unit 12 attaches to the encoded voice signal input from the IP packet receiving unit 10 a marker indicating whether it is voiced or silent, for example as header information of the encoded voice signal, and outputs it to the buffer unit 14.

【００６５】受信データ判別部２０では、ＩＰパケット
受信部１０から入力した符号化音声信号が通常の会話な
どの音声信号なのかファクシミリ信号等の音声信号以外
の信号なのかを判別してその結果をファクシミリプロト
コル解析部２２へ出力する。ファクシミリプロトコル解
析部２２では、受信データ判別部２０の判別結果に基づ
いて音声信号以外であればＩＰパケット受信部１０から
の符号化音声信号の簡易復号を行ってファクシミリ信号
か否かを判定し、ファクシミリ信号である場合にはその
ファクシミリ信号のプロトコルの解析を行いバッファ監
視部１５へ通知する。The received data discriminating unit 20 discriminates whether the encoded voice signal input from the IP packet receiving unit 10 is a voice signal such as ordinary conversation or a signal other than a voice signal, such as a facsimile signal, and outputs the result to the facsimile protocol analysis unit 22. Based on the discrimination result of the received data discriminating unit 20, if the signal is not a voice signal, the facsimile protocol analysis unit 22 performs simple decoding of the encoded voice signal from the IP packet receiving unit 10 to determine whether it is a facsimile signal; if it is, the unit analyzes the protocol of the facsimile signal and notifies the buffer control unit 15.

【００６６】無音符号化音声信号生成部１３では、バッ
ファ制御部１５からの指示にしたがってバッファ部１４
に入力される符号化音声信号と同じ符号化方式が施され
た無音符号化音声信号を生成してバッファ部１４に出力
する。バッファ部１４では、マーカー付与部１２を介し
て入力された符号化音声信号を一時蓄積してバッファ制
御部１５により符号化音声信号の廃棄や挿入が行われた
後に受信側装置のクロックに基づいて定期的にマーカー
情報を除いた符号化音声信号が復号部１７へ出力され
る。バッファ１４内に蓄積された符号化音声信号の順序
が乱れることは無く、挿入や廃棄はされるが入力された
順で出力される。The silence coded voice signal generation unit 13, following an instruction from the buffer control unit 15, generates a silence coded voice signal encoded with the same coding scheme as the encoded voice signal input to the buffer unit 14, and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the encoded voice signal input via the marker adding unit 12. After the buffer control unit 15 has discarded or inserted encoded voice signals, the encoded voice signal, with the marker information removed, is output to the decoding unit 17 at regular intervals based on the clock of the receiving-side device. The order of the encoded voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【００６７】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の符号化音声信号の蓄積量をバッファ
制御部１５に通知する。復号部１７では、バッファ部１
４から出力された符号化音声信号を復号して音声信号と
して出力する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, while output from the buffer unit 14 is driven by the clock of the receiving-side device. Therefore, if the source clock is faster than the receiving-side clock, the amount stored in the buffer unit 14 tends to increase; conversely, if the source clock is slower, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input, output, insertion, and discard status of the buffer unit 14 and notifies the buffer control unit 15 of the amount of encoded voice signal stored in the buffer unit 14. The decoding unit 17 decodes the encoded voice signal output from the buffer unit 14 and outputs it as a voice signal.

【００６８】ファクシミリ信号には、例えば、ＤＣＳ信
号とＥＰＴ信号との間の無音区間は７５±２０ｍ秒、Ｅ
ＰＴ信号とトレーニング信号との間は２０〜２５ｍ秒と
細かに規定されている。したがって、バッファ部１４内
符号化音声信号の無音部分に対して挿入／廃棄の制御を
行う際に、ファクシミリ信号である場合には上記のよう
なプロトコル上クリティカルな区間は避けて処理するこ
とが望ましい。In a facsimile signal, timing is specified in detail: for example, the silent section between the DCS signal and the EPT signal is 75 ± 20 ms, and the interval between the EPT signal and the training signal is 20 to 25 ms. Therefore, when insertion or discard is applied to silent portions of the encoded voice signal in the buffer unit 14, it is desirable, in the case of a facsimile signal, to avoid such protocol-critical sections.

【００６９】バッファ制御部１５では、このことを踏ま
え、ファクシミリプロトコル解析部２２からの情報に基
づいて、バッファ部１４に一時蓄積されている符号化音
声信号がファクシミリ信号の場合には、ファクシミリプ
ロトコル上、挿入／廃棄を行っても問題無い無音区間に
対して処理を行うよう制御する。これ以外のバッファ制
御部１５の動作については、実施の形態１にて図２を用
いて説明したものと同等であるため、説明を省略する。With this in mind, the buffer control unit 15 uses the information from the facsimile protocol analysis unit 22 so that, when the encoded voice signal temporarily stored in the buffer unit 14 is a facsimile signal, insertion or discard is applied only to silent sections where it causes no problem under the facsimile protocol. The other operations of the buffer control unit 15 are the same as those described in the first embodiment with reference to FIG. 2, and their description is omitted.
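One way to picture this facsimile-aware control is as a filter over candidate silent sections. The sketch below is an assumption-laden illustration: the `SilentSection` record, its `critical` flag, and the function `editable_sections` are hypothetical names, and the source does not say how the facsimile protocol analysis unit 22 represents its findings.

```python
from dataclasses import dataclass


@dataclass
class SilentSection:
    start: int      # index of the first silent frame in the buffer
    length: int     # number of silent frames
    critical: bool  # True if the fax protocol fixes this gap's duration
                    # (hypothetical flag supplied by the protocol analysis)


def editable_sections(sections, is_fax):
    """Return the silent sections where insertion or discard is allowed."""
    if not is_fax:
        # Ordinary voice: every silent section is a candidate.
        return list(sections)
    # Facsimile: skip gaps whose timing is constrained by the protocol,
    # such as the DCS-to-EPT gap mentioned in the text.
    return [s for s in sections if not s.critical]
```

The buffer control would then pick its insertion or discard point only from the returned list.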

【００７０】以上のように、受信した符号化音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の符号
化音声信号を挿入／廃棄することで、送信元のクロック
と受信側装置のクロックとの差を吸収することができ、
且つ、より高品質な音声通話品質及びより高精度なクロ
ック差吸収機能を安価に実現できる。また、通常の会話
などの音声信号以外のファクシミリ信号には、ファクシ
ミリプロトコルも考慮した制御を行うことで、トータル
な高通話品質を提供できる。As described above, a marker indicating voiced/silent is attached to the received encoded voice signal, which is temporarily stored in a buffer, and encoded voice signals in the buffer are inserted or discarded according to the amount stored. This absorbs the difference between the clock of the transmission source and the clock of the receiving-side device, and realizes higher voice call quality and a more accurate clock difference absorbing function at low cost. In addition, for non-voice signals such as facsimile signals, performing control that also takes the facsimile protocol into account provides high call quality overall.

【００７１】実施の形態６．以下、実施の形態６を図を
参照して説明する。図１０は、実施の形態６の音声伝送
装置の構成図である。図１０において、１０は受信した
音声ＩＰパケットから符号化音声信号を抽出するＩＰパ
ケット受信部、１１は復号部１７から出力された音声信
号の有音／無音状態を検出・判定する音声検出部、１２
は音声検出部１１からの情報に基づき復号部１７からの
音声信号に対して有音／無音を示すマーカーを付与する
マーカー付与部、２３はバッファ制御部１５からの指示
により無音音声信号を生成し出力する無音音声信号生成
部、１４はマーカー付与部１２を介して入力される音声
信号を一時蓄積するバッファ部、１５はバッファ量監視
部１６からの情報に基づいてバッファ部１４内に一時蓄
積されている音声信号に対して無音音声信号生成部２３
からの無音音声信号を挿入、及びバッファ部１４内音声
信号の廃棄を行うバッファ制御部、１６はバッファ内の
音声信号の蓄積量を監視するバッファ量監視部、１７は
ＩＰパケット受信部１０から出力された符号化音声信号
を復号する復号部である。Sixth Embodiment. Embodiment 6 will be described below with reference to the drawings. FIG. 10 is a configuration diagram of the voice transmission device according to the sixth embodiment. In FIG. 10, 10 is an IP packet receiving unit that extracts an encoded voice signal from a received voice IP packet; 11 is a voice detection unit that detects and determines the voiced/silent state of the voice signal output from the decoding unit 17; 12 is a marker adding unit that, based on the information from the voice detection unit 11, attaches a marker indicating voiced/silent to the voice signal from the decoding unit 17; 23 is a silent voice signal generation unit that generates and outputs a silent voice signal on instruction from the buffer control unit 15; 14 is a buffer unit that temporarily stores the voice signal input via the marker adding unit 12; 15 is a buffer control unit that, based on the information from the buffer amount monitoring unit 16, inserts the silent voice signal from the silent voice signal generation unit 23 into the voice signal temporarily stored in the buffer unit 14 and discards voice signals in the buffer unit 14; 16 is a buffer amount monitoring unit that monitors the amount of voice signal stored in the buffer; and 17 is a decoding unit that decodes the encoded voice signal output from the IP packet receiving unit 10.

【００７２】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、復号部１７に出力される。復号部１７では、ＩＰパ
ケット受信部１０から出力された符号化音声信号を復号
して音声信号として出力する。音声検出部１１では、復
号部１７からの音声信号の音声レベルなどから該当音声
信号が有音状態であるか無音状態であるかを検出・判定
してその結果をマーカー付与部１２に出力する。マーカ
ー付与部１２では、音声検出部１１からの有音／無音情
報に基づいて復号部１７から入力した音声信号に対し
て、例えば音声信号が８ビットの情報であれば９ビット
目の情報として有音状態であるか無音状態であるかを示
すマーカーを付与しバッファ部１４へ出力する。Next, the operation will be described. From an input voice IP packet, the IP packet receiving unit 10 extracts the encoded voice signal stored in the packet and outputs it to the decoding unit 17. The decoding unit 17 decodes the encoded voice signal output from the IP packet receiving unit 10 and outputs it as a voice signal. The voice detection unit 11 detects and determines whether the voice signal is in a voiced state or a silent state from, for example, the voice level of the voice signal from the decoding unit 17, and outputs the result to the marker adding unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker adding unit 12 attaches to the voice signal input from the decoding unit 17 a marker indicating whether it is voiced or silent, for example as a ninth bit when the voice signal is 8-bit information, and outputs it to the buffer unit 14.
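The ninth-bit marker mentioned for an 8-bit voice sample can be sketched with simple bit operations. The bit polarity (1 = voiced) and the helper names `add_marker`, `is_voiced`, and `strip_marker` are assumptions for illustration; the source only states that a ninth bit may carry the voiced/silent information.

```python
VOICED_FLAG = 0x100  # ninth bit; polarity (1 = voiced) is an assumption


def add_marker(sample8, voiced):
    """Attach the voiced/silent marker to an 8-bit sample (marker adding unit 12)."""
    if not 0 <= sample8 <= 0xFF:
        raise ValueError("sample must fit in 8 bits")
    return sample8 | (VOICED_FLAG if voiced else 0)


def is_voiced(word9):
    """Read the marker back, as the buffer control would when scanning the buffer."""
    return bool(word9 & VOICED_FLAG)


def strip_marker(word9):
    """Remove the marker before the sample leaves buffer unit 14."""
    return word9 & 0xFF
```

Because the marker occupies a bit outside the sample itself, stripping it on output restores the original 8-bit value unchanged.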

【００７３】無音音声信号生成部２３では、バッファ制
御部１５からの指示にしたがってバッファ部１４に入力
される音声信号と同じフォーマットの無音音声信号を生
成してバッファ部１４に出力する。バッファ部１４で
は、マーカー付与部１２を介して入力された音声信号を
一時蓄積してバッファ制御部１５により音声信号の廃棄
や挿入が行われた後に受信側装置のクロックに基づいて
定期的にマーカー情報を除いた音声信号が出力される。
バッファ１４内に蓄積された音声信号の順序が乱れるこ
とは無く、挿入や廃棄はされるが入力された順で出力さ
れる。The silent voice signal generation unit 23, following an instruction from the buffer control unit 15, generates a silent voice signal in the same format as the voice signal input to the buffer unit 14 and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the voice signal input via the marker adding unit 12. After the buffer control unit 15 has discarded or inserted voice signals, the voice signal, with the marker information removed, is output at regular intervals based on the clock of the receiving-side device. The order of the voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【００７４】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の音声信号の蓄積量をバッファ制御部
１５に通知する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, while output from the buffer unit 14 is driven by the clock of the receiving-side device. Therefore, if the source clock is faster than the receiving-side clock, the amount stored in the buffer unit 14 tends to increase; conversely, if the source clock is slower, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input, output, insertion, and discard status of the buffer unit 14 and notifies the buffer control unit 15 of the amount of voice signal stored in the buffer unit 14.

【００７５】バッファ制御部１５の動作について説明す
る。図１１はバッファ制御部１５の動作を示したフロー
チャートである。バッファ制御部１５はバッファ量監視
部１６からの情報に基づいてバッファ部１４の蓄積量を
確認し（ステップＳ４１）、その蓄積量が予め決められ
た下限値以下か否かを判断する（ステップＳ４２）。The operation of the buffer control unit 15 will now be described. FIG. 11 is a flowchart showing the operation of the buffer control unit 15. The buffer control unit 15 checks the amount stored in the buffer unit 14 based on the information from the buffer amount monitoring unit 16 (step S41) and determines whether that amount is at or below a predetermined lower limit value (step S42).

【００７６】もし下限値以下であれば、バッファ部１４
内に蓄積されている音声信号に付与されているマーカー
を調査して無音区間を見つけてその無音区間に無音音声
信号生成部２３に指示して生成させた無音音声信号を１
音声サンプル分挿入することでバッファ１４内蓄積量を
増やす処理を行う（ステップＳ４３）。もし下限値以下
でなければ、その蓄積量が予め決められた上限値以上か
否かを判断する（ステップＳ４４）。もし上限値以上で
あれば、バッファ部１４内に蓄積されている音声信号に
付与されたマーカーを調査して無音区間を見つけてその
無音区間の無音音声信号を１音声サンプル分廃棄するこ
とでバッファ部１４内蓄積量を減らす処理を行う（ステ
ップＳ４５）。もし上限値以上でなければ処理は行わな
い。If the amount is at or below the lower limit value, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to find a silent section, and increases the amount stored in the buffer unit 14 by inserting into that silent section one voice sample of the silent voice signal generated on instruction by the silent voice signal generation unit 23 (step S43). If the amount is not at or below the lower limit value, the buffer control unit 15 determines whether it is at or above a predetermined upper limit value (step S44). If it is at or above the upper limit value, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to find a silent section, and reduces the amount stored in the buffer unit 14 by discarding one voice sample of the silent voice signal in that section (step S45). If it is not at or above the upper limit value, no processing is performed.
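The flow of FIG. 11 (steps S41 to S45) can be sketched as a single control pass over the buffer. This is a minimal illustration under assumptions: the buffer is modeled as a Python list of `(sample, voiced)` pairs, the silent sample value `SILENCE = 0` is hypothetical, and taking no action when no silent section exists is an assumption, since the text does not cover that case for this flow.

```python
SILENCE = 0  # assumed value of one silent voice sample


def find_silent_index(buf):
    """Return the index of the first entry marked silent, or None."""
    for i, (_, voiced) in enumerate(buf):
        if not voiced:
            return i
    return None


def control_step(buf, lower, upper):
    """One pass of the FIG. 11 flow over buf, a list of (sample, voiced) pairs."""
    level = len(buf)                          # S41: check stored amount
    if level <= lower:                        # S42: at or below lower limit?
        i = find_silent_index(buf)
        if i is not None:
            buf.insert(i, (SILENCE, False))   # S43: insert one silent sample
            return "insert"
    elif level >= upper:                      # S44: at or above upper limit?
        i = find_silent_index(buf)
        if i is not None:
            del buf[i]                        # S45: discard one silent sample
            return "discard"
    return "none"
```

Inserting or deleting in place keeps the remaining samples in their original order, matching the statement that the buffer order is never disturbed.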

【００７７】また、バッファ制御部１５の別の動作につ
いて説明する。図１２はバッファ制御部１５の動作を示
したフローチャートである。バッファ制御部１５はバッ
ファ量監視部１６からの情報に基づいてバッファ部１４
の蓄積量を確認し（ステップＳ５１）、その蓄積量が予
め決められた第１下限値以下か否かを判断する（ステッ
プＳ５２）。もし第１下限値以下であれば、更に、その
蓄積量が予め決められた第２下限値以下か否かを判断す
る（ステップＳ５３）。ここで、第１下限値は第２下限
値よりも大きい。もし第２下限値以下であれば、バッフ
ァ部１４に蓄積されている音声信号に付与されているマ
ーカーを調査して無音区間があるか否かを判断する（ス
テップＳ５５）。Another operation of the buffer control unit 15 will be described. FIG. 12 is a flowchart showing this operation of the buffer control unit 15. The buffer control unit 15 checks the amount stored in the buffer unit 14 based on the information from the buffer amount monitoring unit 16 (step S51) and determines whether that amount is at or below a predetermined first lower limit value (step S52). If it is at or below the first lower limit value, it further determines whether the amount is at or below a predetermined second lower limit value (step S53). Here, the first lower limit value is larger than the second lower limit value. If the amount is at or below the second lower limit value, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to determine whether there is a silent section (step S55).

【００７８】無音区間があれば、バッファ部１４内に蓄
積されている音声信号の無音区間に無音音声信号生成部
２３に指示して生成させた無音音声信号を１音声サンプ
ル分挿入することでバッファ１４内蓄積量を増やす処理
を行う（ステップＳ５６）。無音区間が無ければ、バッ
ファ部１４に蓄積されている音声信号に対して補間処理
を行って１音声サンプルを生成して挿入することでバッ
ファ１４内蓄積量を増やす処理を行う（ステップＳ５
７）。ここで、補間処理とは音声信号の有音区間中の信
号欠落部分をこの信号欠落部分の前後の信号状態から補
う処理を示すものである。If there is a silent section, the amount stored in the buffer unit 14 is increased by inserting, into the silent section of the voice signal stored in the buffer unit 14, one voice sample of the silent voice signal generated on instruction by the silent voice signal generation unit 23 (step S56). If there is no silent section, the amount stored is increased by performing interpolation on the voice signal stored in the buffer unit 14 to generate one voice sample and inserting it (step S57). Here, interpolation means processing that fills a missing portion of the signal within a voiced section from the signal states before and after that portion.

【００７９】もし第２下限値以下でなければ、バッファ
部１４に蓄積されている音声信号に付与されているマー
カーを調査して無音区間があるか否かを判断する（ステ
ップＳ５４）。無音区間があれば、上記と同様にバッフ
ァ部１４内に蓄積されている音声信号の無音区間に無音
音声信号生成部２３に指示して生成させた無音音声信号
を１音声サンプル分挿入することでバッファ１４内蓄積
量を増やす処理を行う（ステップＳ５６）。無音区間が
なければ処理は行わない。もし第１下限値以下でなけれ
ば、その蓄積量が予め決められた第１上限値以上か否か
を判断する（ステップＳ５８）。もし第１上限値以上で
あれば、更に、その蓄積量が予め決められた第２上限値
以上か否かを判断する（ステップＳ５９）。ここで、第
１上限値は第２上限値よりも小さい。If the amount is not at or below the second lower limit value, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to determine whether there is a silent section (step S54). If there is a silent section, then, as above, the amount stored in the buffer unit 14 is increased by inserting into the silent section one voice sample of the silent voice signal generated on instruction by the silent voice signal generation unit 23 (step S56). If there is no silent section, no processing is performed. If the amount is not at or below the first lower limit value, the buffer control unit 15 determines whether it is at or above a predetermined first upper limit value (step S58). If it is at or above the first upper limit value, it further determines whether the amount is at or above a predetermined second upper limit value (step S59). Here, the first upper limit value is smaller than the second upper limit value.

【００８０】もし第２上限値以上であれば、バッファ部
１４に蓄積されている音声信号に付与されているマーカ
ーを調査して無音区間があるか否かを判断する（ステッ
プＳ６１）。無音区間があれば、バッファ部１４内に蓄
積されている無音音声信号を１音声サンプル分廃棄する
ことでバッファ部１４内蓄積量を減らす処理を行う（ス
テップＳ６２）。無音区間が無ければ、バッファ部１４
に蓄積されている音声信号に対して間引き処理を行って
１音声サンプルを廃棄することでバッファ１４内蓄積量
を減らす処理を行う（ステップＳ６３）。If the amount is at or above the second upper limit value, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to determine whether there is a silent section (step S61). If there is a silent section, the amount stored in the buffer unit 14 is reduced by discarding one voice sample of the silent voice signal stored in the buffer unit 14 (step S62). If there is no silent section, the amount stored is reduced by performing thinning on the voice signal stored in the buffer unit 14 and discarding one voice sample (step S63).

【００８１】もし第２上限値以上でなければ、バッファ
部１４に蓄積されている音声信号に付与されているマー
カーを調査して無音区間があるか否かを判断する（ステ
ップＳ６０）。無音区間があれば、上記と同様にバッフ
ァ部１４内に蓄積されている無音音声信号を１音声サン
プル分廃棄することでバッファ部１４内蓄積量を減らす
処理を行う（ステップＳ６２）。無音区間がなければ処
理は行わない。もし第１上限値以上でなければ処理は行
わない。If the amount is not at or above the second upper limit value, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to determine whether there is a silent section (step S60). If there is a silent section, then, as above, the amount stored in the buffer unit 14 is reduced by discarding one voice sample of the silent voice signal stored in the buffer unit 14 (step S62). If there is no silent section, no processing is performed. If the amount is not at or above the first upper limit value, no processing is performed.
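The two-level flow of FIG. 12 (steps S51 to S63) can be sketched as below. This is a hedged illustration, not the patented implementation: the buffer is modeled as a list of `(sample, voiced)` pairs, and the specific interpolation (average of the first two samples) and thinning (drop the middle sample) are stand-in choices, since the text does not specify where or how these are applied.

```python
def control_step_two_level(buf, low1, low2, up1, up2, silence=0):
    """One pass of the FIG. 12 flow. Expects low2 < low1 < up1 < up2.

    buf is a list of (sample, voiced) pairs; silence is an assumed
    value for one silent voice sample.
    """
    level = len(buf)                                         # S51
    silent = next((i for i, (_, v) in enumerate(buf) if not v), None)
    if level <= low1:                                        # S52
        if silent is not None:                               # S54 / S55
            buf.insert(silent, (silence, False))             # S56
            return "insert_silence"
        if level <= low2 and level >= 2:                     # S53: urgent case
            mid = (buf[0][0] + buf[1][0]) // 2               # S57 (assumed linear)
            buf.insert(1, (mid, True))
            return "interpolate"
    elif level >= up1:                                       # S58
        if silent is not None:                               # S60 / S61
            del buf[silent]                                  # S62
            return "discard_silence"
        if level >= up2:                                     # S59: urgent case
            del buf[level // 2]                              # S63 (assumed position)
            return "thin"
    return "none"
```

Between the first and second limits, only silent sections are edited; voiced samples are touched only once the second, more urgent limit is reached, which limits audible artifacts to cases where the buffer would otherwise run dry or overflow.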

【００８２】以上のように、受信し復号した音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の音声
信号を挿入／廃棄することで、送信元のクロックと受信
側装置のクロックとの差を吸収することができ、且つ、
より高品質な音声通話品質及びより高精度なクロック差
吸収機能を安価に実現できる。また、復号音声信号に対
して処理することで、ある時間長を持つ音声フレーム単
位で処理する場合よりも、細かな制御が可能となり、更
なる高品質な音声通話品質及びより高精度なクロック差
吸収機能を安価に実現できる。As described above, a marker indicating voiced/silent is attached to the received and decoded voice signal, which is temporarily stored in a buffer, and voice signals in the buffer are inserted or discarded according to the amount stored. This absorbs the difference between the clock of the transmission source and the clock of the receiving-side device, and realizes higher voice call quality and a more accurate clock difference absorbing function at low cost. In addition, because processing is applied to the decoded voice signal, finer control is possible than when processing in units of voice frames having a certain time length, so that still higher voice call quality and a more accurate clock difference absorbing function can be realized at low cost.

【００８３】実施の形態７．以下、実施の形態７を図を
参照して説明する。図１３は、実施の形態７の音声伝送
装置の構成図である。図１３において、図１０と同一符
号は同一または相当部分を示しているので説明を省略す
る。１８はバッファ部１４に入力された無音区間の継続
時間を測定する無音継続測定部である。Seventh Embodiment Hereinafter, the seventh embodiment will be described with reference to the drawings. FIG. 13 is a configuration diagram of the voice transmission device of the seventh embodiment. In FIG. 13, the same reference numerals as those in FIG. 10 indicate the same or corresponding portions, and thus the description thereof will be omitted. Reference numeral 18 denotes a silence continuation measurement unit that measures the duration of the silence section input to the buffer unit 14.

【００８４】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、復号部１７に出力される。復号部１７では、ＩＰパ
ケット受信部１０から出力された符号化音声信号を復号
して音声信号として出力する。音声検出部１１では、復
号部１７からの音声信号の音声レベルなどから該当音声
信号が有音状態であるか無音状態であるかを検出・判定
してその結果をマーカー付与部１２に出力する。マーカ
ー付与部１２では、音声検出部１１からの有音／無音情
報に基づいて復号部１７から入力した音声信号に対し
て、例えば音声信号が８ビットの情報であれば９ビット
目の情報として有音状態であるか無音状態であるかを示
すマーカーを付与しバッファ部１４へ出力する。Next, the operation will be described. From an input voice IP packet, the IP packet receiving unit 10 extracts the encoded voice signal stored in the packet and outputs it to the decoding unit 17. The decoding unit 17 decodes the encoded voice signal output from the IP packet receiving unit 10 and outputs it as a voice signal. The voice detection unit 11 detects and determines whether the voice signal is in a voiced state or a silent state from, for example, the voice level of the voice signal from the decoding unit 17, and outputs the result to the marker adding unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker adding unit 12 attaches to the voice signal input from the decoding unit 17 a marker indicating whether it is voiced or silent, for example as a ninth bit when the voice signal is 8-bit information, and outputs it to the buffer unit 14.

【００８５】無音継続測定部１８では、バッファ部１４
に入力される音声信号に付与されたマーカーを監視して
バッファ部１４に入力された音声信号の無音状態継続時
間を測定してその結果をバッファ制御部１５に通知す
る。無音音声信号生成部２３では、バッファ制御部１５
からの指示にしたがってバッファ部１４に入力される音
声信号と同じフォーマットの無音音声信号を生成してバ
ッファ部１４に出力する。バッファ部１４では、マーカ
ー付与部１２を介して入力された音声信号を一時蓄積し
てバッファ制御部１５により音声信号の廃棄や挿入が行
われた後に受信側装置のクロックに基づいて定期的にマ
ーカー情報を除いた音声信号が出力される。バッファ１
４内に蓄積された音声信号の順序が乱れることは無く、
挿入や廃棄はされるが入力された順で出力される。The silence continuation measurement unit 18 monitors the markers attached to the voice signals input to the buffer unit 14, measures how long the silent state of the voice signal input to the buffer unit 14 has continued, and notifies the buffer control unit 15 of the result. The silent voice signal generation unit 23, following an instruction from the buffer control unit 15, generates a silent voice signal in the same format as the voice signal input to the buffer unit 14 and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the voice signal input via the marker adding unit 12. After the buffer control unit 15 has discarded or inserted voice signals, the voice signal, with the marker information removed, is output at regular intervals based on the clock of the receiving-side device. The order of the voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.
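The measurement performed by the silence continuation measurement unit 18 amounts to counting consecutive silent markers. The sketch below assumes per-sample markers and an 8 kHz sampling rate; the class name `SilenceDurationMeter` and its `push` method are hypothetical, and the source does not state the measurement unit or rate.

```python
class SilenceDurationMeter:
    """Tracks how long the current silent run has lasted, in milliseconds."""

    def __init__(self, sample_rate_hz=8000):  # rate is an assumed default
        self.sample_rate_hz = sample_rate_hz
        self.run = 0  # consecutive silent samples seen so far

    def push(self, voiced):
        """Feed one sample's marker; return the current silent duration in ms."""
        # A voiced sample ends the silent run; a silent one extends it.
        self.run = 0 if voiced else self.run + 1
        return self.run * 1000.0 / self.sample_rate_hz
```

The buffer control can compare the returned duration against its threshold to decide how many samples to insert or discard.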

【００８６】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の音声信号の蓄積量をバッファ制御部
１５に通知する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, while output from the buffer unit 14 is driven by the clock of the receiving-side device. Therefore, if the source clock is faster than the receiving-side clock, the amount stored in the buffer unit 14 tends to increase; conversely, if the source clock is slower, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input, output, insertion, and discard status of the buffer unit 14 and notifies the buffer control unit 15 of the amount of voice signal stored in the buffer unit 14.

【００８７】バッファ制御部１５の動作について説明す
る。図１４はバッファ制御部１５の動作を示したフロー
チャートである。バッファ制御部１５はバッファ量監視
部１６からの情報に基づいてバッファ部１４の蓄積量を
確認し（ステップＳ７１）、その蓄積量が予め決められ
た下限値以下か否かを判断する（ステップＳ７２）。The operation of the buffer control unit 15 will now be described. FIG. 14 is a flowchart showing the operation of the buffer control unit 15. The buffer control unit 15 checks the amount stored in the buffer unit 14 based on the information from the buffer amount monitoring unit 16 (step S71) and determines whether that amount is at or below a predetermined lower limit value (step S72).

【００８８】もし下限値以下であれば、無音継続測定部
１８からの情報に基づいて無音状態の継続時間を確認し
（ステップＳ７３）、その無音状態継続時間が予め決め
られた閾値より短いか否かを判断する（ステップＳ７
４）。もし閾値よりも短ければ、バッファ部１４内に蓄
積されている音声信号に付与されているマーカーを調査
して無音区間を見つけてその無音区間に無音音声信号生
成部２３に指示して生成させた無音音声信号をＮ個音声
サンプル分挿入することでバッファ１４内蓄積量を増や
す処理を行う（ステップＳ７５）。もし閾値よりも短く
なければ、バッファ部１４内に蓄積されている音声信号
に付与されているマーカーを調査して無音区間を見つけ
てその無音区間に無音音声信号生成部２３に指示して生
成させた無音音声信号をＭ個音声サンプル分挿入するこ
とでバッファ１４内蓄積量を増やす処理を行う（ステッ
プＳ７６）。ここで、ＮはＭよりも小さいとする。If the amount is at or below the lower limit value, the buffer control unit 15 checks the duration of the silent state based on the information from the silence continuation measurement unit 18 (step S73) and determines whether that duration is shorter than a predetermined threshold (step S74). If it is shorter than the threshold, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to find a silent section, and increases the amount stored in the buffer unit 14 by inserting into that section N voice samples of the silent voice signal generated on instruction by the silent voice signal generation unit 23 (step S75). If it is not shorter than the threshold, the buffer control unit 15 likewise finds a silent section and increases the amount stored by inserting M voice samples of the silent voice signal generated on instruction by the silent voice signal generation unit 23 (step S76). Here, N is smaller than M.

【００８９】また、バッファ蓄積量が下限値以下でなけ
れば、その蓄積量が予め決められた上限値以上か否かを
判断する（ステップＳ７７）。もし上限値以上であれ
ば、無音継続測定部１８からの情報に基づいて無音状態
の継続時間を確認し（ステップＳ７８）、その無音状態
継続時間が予め決められた閾値より短いか否かを判断す
る（ステップＳ７９）。もし閾値よりも短ければ、バッ
ファ部１４内に蓄積されている音声信号に付与されたマ
ーカーを調査して無音区間を見つけてその無音区間の無
音音声信号をＸ音声サンプル分廃棄することでバッファ
部１４内蓄積量を減らす処理を行う（ステップＳ８
０）。もし閾値よりも短くなければ、バッファ部１４内
に蓄積されている音声信号に付与されたマーカーを調査
して無音区間を見つけてその無音区間の無音音声信号を
Ｙ音声サンプル分廃棄することでバッファ部１４内蓄積
量を減らす処理を行う（ステップＳ８１）。ここで、Ｘ
はＹよりも小さいとする。また、バッファ蓄積量が上限
値以上でなければ処理は行わない。If the buffer storage amount is not less than or equal to the lower limit, the buffer control unit 15 determines whether the amount is greater than or equal to a predetermined upper limit (step S77). If it is, the buffer control unit 15 checks the duration of the silent state based on information from the silence duration measuring unit 18 (step S78), and determines whether that duration is shorter than a predetermined threshold (step S79). If it is shorter than the threshold, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to find a silent section and discards X voice samples of the silent voice signal in that section, thereby reducing the amount stored in the buffer unit 14 (step S80). If the duration is not shorter than the threshold, the same procedure is followed except that Y voice samples are discarded (step S81). Here, X is assumed to be smaller than Y. If the buffer storage amount is not greater than or equal to the upper limit, no processing is performed.
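The decision flow of steps S71 to S81 can be sketched as a small function. This is only an illustrative sketch: the names `Thresholds` and `plan_adjustment`, the use of sample counts as the unit for every quantity, and the numeric values in the usage note are assumptions, not part of the patent text. The only constraints taken from the description are N < M and X < Y.

```python
from dataclasses import dataclass


@dataclass
class Thresholds:
    """Hypothetical parameter set; all values are counted in voice samples."""
    lower: int        # lower limit of the buffer storage amount (S72)
    upper: int        # upper limit of the buffer storage amount (S77)
    silence_len: int  # threshold on the silent-state duration (S74, S79)
    n: int            # samples inserted when the silence is short (S75)
    m: int            # samples inserted when the silence is long (S76), n < m
    x: int            # samples discarded when the silence is short (S80)
    y: int            # samples discarded when the silence is long (S81), x < y


def plan_adjustment(occupancy: int, silence_duration: int, t: Thresholds) -> int:
    """Return the signed number of samples to adjust in a silent section.

    A positive value means "insert that many silent samples", a negative
    value means "discard that many silent samples", and 0 means no action.
    """
    short_silence = silence_duration < t.silence_len
    if occupancy <= t.lower:          # S72: buffer is running low
        return t.n if short_silence else t.m
    if occupancy >= t.upper:          # S77: buffer is running high
        return -(t.x if short_silence else t.y)
    return 0                          # between the limits: do nothing
```

For example, with `Thresholds(lower=80, upper=400, silence_len=160, n=1, m=4, x=1, y=4)`, a low buffer during a long silence yields `4`, while a high buffer during a short silence yields `-1`. Longer silent sections take larger adjustments because a change there is less likely to be audible.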

【００９０】以上のように、受信し復号した音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の音声
信号を挿入／廃棄すると共に、挿入／廃棄を行う無音区
間の長さに応じて挿入／廃棄を行う量を調整すること
で、送信元のクロックと受信側装置のクロックとの差を
吸収することができ、且つ、より高品質な音声通話品質
及びより高精度なクロック差吸収機能を安価に実現でき
る。また、復号音声信号に対して処理することで、ある
時間長を持つ音声フレーム単位で処理する場合よりも、
細かな制御が可能となり、更なる高品質な音声通話品質
及びより高精度なクロック差吸収機能を安価に実現でき
る。As described above, a marker indicating voiced/silent is attached to each received and decoded voice signal, which is then temporarily stored in the buffer; voice signals are inserted into or discarded from the buffer according to the amount stored, and the amount inserted or discarded is adjusted according to the length of the silent section in which the insertion or discarding takes place. This absorbs the difference between the clock of the transmission source and the clock of the receiving-side device, and realizes higher voice call quality and a more accurate clock difference absorption function at low cost. In addition, because the processing is applied to the decoded voice signal, finer control is possible than when processing is performed in units of voice frames having a certain time length, which further improves voice call quality and the accuracy of the clock difference absorption function at low cost.

【００９１】実施の形態８．以下、実施の形態８を図を
参照して説明する。図１５は、実施の形態８の音声伝送
装置の構成図である。図１５において、図１０と同一符
号は同一または相当部分を示しているので説明を省略す
る。１９は音声検出部１１からの情報に基づいてフロン
トハングオーバー及びハングオーバーを示すマーカーを
マーカー付与部１２を介して入力される符号化音声信号
に付与する第２マーカー付与部である。Embodiment 8. Embodiment 8 will be described below with reference to the drawings. FIG. 15 is a configuration diagram of the voice transmission device of Embodiment 8. In FIG. 15, the same reference numerals as in FIG. 10 denote the same or corresponding parts, and their description is omitted. Reference numeral 19 denotes a second marker adding unit that, based on information from the voice detection unit 11, attaches a marker indicating a front hangover or a hangover to the encoded voice signal input via the marker adding unit 12.

【００９２】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、復号部１７に出力される。復号部１７では、ＩＰパ
ケット受信部１０から出力された符号化音声信号を復号
して音声信号として出力する。音声検出部１１では、復
号部１７からの音声信号の音声レベルなどから該当音声
信号が有音状態であるか無音状態であるかを検出・判定
してその結果をマーカー付与部１２に出力する。マーカ
ー付与部１２では、音声検出部１１からの有音／無音情
報に基づいて復号部１７から入力した音声信号に対し
て、例えば音声信号が８ビットの情報であれば９ビット
目の情報として有音状態であるか無音状態であるかを示
すマーカーを付与して第２マーカー付与部１９へ出力す
る。Next, the operation will be described. The IP packet receiving unit 10 extracts the encoded voice signal stored in each input voice IP packet and outputs it to the decoding unit 17. The decoding unit 17 decodes the encoded voice signal output from the IP packet receiving unit 10 and outputs it as a voice signal. The voice detection unit 11 detects and determines, from the voice level and other properties of the voice signal from the decoding unit 17, whether that voice signal is in a voiced state or a silent state, and outputs the result to the marker adding unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker adding unit 12 attaches to the voice signal input from the decoding unit 17 a marker indicating whether it is in a voiced state or a silent state (for example, as a ninth bit if the voice signal is 8-bit information), and outputs the result to the second marker adding unit 19.
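The ninth-bit marker described above can be illustrated with simple bit operations. The sketch below assumes that a set bit means "voiced" and a cleared bit means "silent"; the description does not fix the polarity, and the function names are hypothetical.

```python
VOICED_BIT = 1 << 8   # ninth bit, directly above the 8-bit sample (assumed polarity: 1 = voiced)
SAMPLE_MASK = 0xFF    # the original 8-bit voice sample


def add_marker(sample: int, voiced: bool) -> int:
    """Attach the voiced/silent marker to an 8-bit sample as a ninth bit."""
    if not 0 <= sample <= SAMPLE_MASK:
        raise ValueError("sample must fit in 8 bits")
    return sample | (VOICED_BIT if voiced else 0)


def is_voiced(word: int) -> bool:
    """Read the marker back from a 9-bit buffer word."""
    return bool(word & VOICED_BIT)


def strip_marker(word: int) -> int:
    """Recover the 8-bit sample, as done when the buffer outputs a sample."""
    return word & SAMPLE_MASK
```

Because the marker sits outside the 8 sample bits, stripping it on output returns the decoded sample unchanged.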

【００９３】第２マーカー付与部１９では、音声検出部
１１からの有音／無音情報に基づいて、無音状態から有
音状態に変化した時点よりもある一定時間前の部分をフ
ロントハングオーバーとして、有音状態から無音状態へ
変化した時点からある一定時間後の部分をハングオーバ
ーとして、マーカー付与部１２を介して入力された音声
信号に対して、マーカー付与部１２と同様に例えば音声
信号が８ビットの情報で９ビット目がマーカー付与部１
２でのマーカー情報であれば１０ビット目の情報として
フロントハングオーバー部分であるかハングオーバー部
分であるかを示す第２のマーカーを付与してバッファ部
１４へ出力する。Based on the voiced/silent information from the voice detection unit 11, the second marker adding unit 19 treats the portion extending a fixed time before the point at which the silent state changes to the voiced state as a front hangover, and the portion extending a fixed time after the point at which the voiced state changes to the silent state as a hangover. In the same manner as the marker adding unit 12, it attaches to the voice signal input via the marker adding unit 12 a second marker indicating whether the signal belongs to a front hangover portion or a hangover portion (for example, as a tenth bit if the voice signal is 8-bit information and the ninth bit carries the marker from the marker adding unit 12), and outputs the result to the buffer unit 14.
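The front hangover and hangover labelling can be sketched as follows. The function name, the string labels, and the choice to express the "fixed time" in samples are assumptions made for illustration. The sketch works on a complete list of per-sample voiced flags, so it is an offline illustration rather than a streaming implementation. When a silent gap is shorter than the two windows combined, the later front hangover label overwrites the hangover label; that tie-break is also an assumption.

```python
def label_samples(voiced, front, hang):
    """Label each sample as voiced, silence, front_hangover, or hangover.

    voiced: list of bools, one per sample (True = voiced state).
    front:  number of silent samples before a silent-to-voiced transition
            that count as the front hangover.
    hang:   number of silent samples after a voiced-to-silent transition
            that count as the hangover.
    """
    n = len(voiced)
    labels = ["voiced" if v else "silence" for v in voiced]
    for i in range(1, n):
        if voiced[i] and not voiced[i - 1]:
            # Silent-to-voiced transition at i: walk backwards over silence.
            j = i - 1
            while j >= 0 and i - j <= front and not voiced[j]:
                labels[j] = "front_hangover"
                j -= 1
        elif not voiced[i] and voiced[i - 1]:
            # Voiced-to-silent transition at i: walk forwards over silence.
            j = i
            while j < n and j - i < hang and not voiced[j]:
                labels[j] = "hangover"
                j += 1
    return labels
```

Samples left with the plain `silence` label correspond to the sections that are neither front hangover nor hangover.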

【００９４】無音音声信号生成部２３では、バッファ制
御部１５からの指示にしたがってバッファ部１４に入力
される音声信号と同じフォーマットの無音音声信号を生
成してバッファ部１４に出力する。バッファ部１４で
は、マーカー付与部１２及び第２マーカー付与部１９を
介して入力された音声信号を一時蓄積してバッファ制御
部１５により音声信号の廃棄や挿入が行われた後に受信
側装置のクロックに基づいて定期的にマーカー情報を除
いた音声信号が出力される。バッファ１４内に蓄積され
た音声信号の順序が乱れることは無く、挿入や廃棄はさ
れるが入力された順で出力される。The silent voice signal generation unit 23 generates, in accordance with an instruction from the buffer control unit 15, a silent voice signal in the same format as the voice signal input to the buffer unit 14, and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the voice signal input via the marker adding unit 12 and the second marker adding unit 19; after the buffer control unit 15 has discarded or inserted voice signals, the buffer unit 14 periodically outputs the voice signal, with the marker information removed, based on the clock of the receiving-side device. The order of the voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【００９５】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の音声信号の蓄積量をバッファ制御部
１５に通知する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, whereas output from the buffer unit 14 is driven by the clock of the receiving-side device. Therefore, if the clock of the transmission source is faster than that of the receiving-side device, the amount stored in the buffer unit 14 tends to increase; conversely, if the clock of the transmission source is slower, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input and output status and the insertion and discard status of the buffer unit 14, and notifies the buffer control unit 15 of the amount of voice signals stored in the buffer unit 14.
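To give a sense of scale for this drift, the following sketch computes the net change in buffered samples from a clock mismatch. The 8 kHz sampling rate and the parts-per-million clock errors are illustrative assumptions; the description gives no numeric clock tolerances, and the function name is hypothetical.

```python
def occupancy_drift(sample_rate_hz: float, sender_ppm: float,
                    receiver_ppm: float, seconds: float) -> float:
    """Net change in buffered samples after `seconds` of continuous speech.

    The sender writes at its own clock rate and the receiver reads at its
    own clock rate; a positive result means the buffer grows, a negative
    result means it drains.
    """
    written = sample_rate_hz * (1 + sender_ppm * 1e-6) * seconds
    read = sample_rate_hz * (1 + receiver_ppm * 1e-6) * seconds
    return written - read
```

Under these assumptions, a sender running 50 ppm fast and a receiver running 50 ppm slow at a nominal 8 kHz differ by 0.8 samples per second, so `occupancy_drift(8000, 50, -50, 60)` comes to about 48 samples per minute. Drift of this size is slow enough that adjusting a few samples at a time inside silent sections can keep up with it.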

【００９６】バッファ制御部１５の動作について説明す
る。図１６はバッファ制御部１５の動作を示したフロー
チャートである。バッファ制御部１５はバッファ量監視
部１６からの情報に基づいてバッファ部１４の蓄積量を
確認し（ステップＳ９１）、その蓄積量が予め決められ
た下限値以下か否かを判断する（ステップＳ９２）。The operation of the buffer control unit 15 will now be described. FIG. 16 is a flowchart showing the operation of the buffer control unit 15. The buffer control unit 15 checks the amount stored in the buffer unit 14 based on information from the buffer amount monitoring unit 16 (step S91), and determines whether that amount is less than or equal to a predetermined lower limit (step S92).

【００９７】もし下限値以下であれば、バッファ部１４
内に蓄積されている音声信号に付与されているマーカー
を調査してフロントハングオーバー区間でもなくハング
オーバー区間でもない無音区間を見つけてその無音区間
に無音音声信号生成部２３に指示して生成させた無音音
声信号を１音声サンプル分挿入することでバッファ１４
内蓄積量を増やす処理を行う（ステップＳ９３）。もし
下限値以下でなければ、その蓄積量が予め決められた第
１上限値以上か否かを判断する（ステップＳ９４）。も
し第１上限値以上であれば、その蓄積量が予め決められ
た第２上限値以上か否かを判断する（ステップＳ９
５）。もし第２上限値以上であれば、その蓄積量が予め
決められた第３上限値以上か否かを判断する（ステップ
Ｓ９７）。If the amount is less than or equal to the lower limit, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to find a silent section that is neither a front hangover section nor a hangover section, instructs the silent voice signal generation unit 23 to generate a silent voice signal, and inserts one voice sample of that signal into the silent section, thereby increasing the amount stored in the buffer unit 14 (step S93). If the amount is not less than or equal to the lower limit, the buffer control unit 15 determines whether it is greater than or equal to a predetermined first upper limit (step S94). If it is greater than or equal to the first upper limit, the buffer control unit 15 determines whether it is greater than or equal to a predetermined second upper limit (step S95). If it is greater than or equal to the second upper limit, the buffer control unit 15 determines whether it is greater than or equal to a predetermined third upper limit (step S97).

【００９８】もし第３上限値以上であれば、バッファ部
１４内に蓄積されている音声信号に付与されたマーカー
を調査してフロントハングオーバー区間を見つけてその
フロントハングオーバー区間にある音声信号を１音声サ
ンプル分廃棄することでバッファ部１４内蓄積量を減ら
す処理を行う（ステップＳ９９）。もし第３上限値以上
でなければ、バッファ部１４内に蓄積されている音声信
号に付与されたマーカーを調査してハングオーバー区間
を見つけてそのハングオーバー区間にある音声信号を１
音声サンプル分廃棄することでバッファ部１４内蓄積量
を減らす処理を行う（ステップＳ９８）。If the amount is greater than or equal to the third upper limit, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to find a front hangover section and discards one voice sample of the voice signal in that section, thereby reducing the amount stored in the buffer unit 14 (step S99). If the amount is not greater than or equal to the third upper limit, the buffer control unit 15 examines the markers to find a hangover section and discards one voice sample of the voice signal in that section, thereby reducing the amount stored in the buffer unit 14 (step S98).

【００９９】もし第２上限値以上でなければ、バッファ
部１４内に蓄積されている音声信号に付与されたマーカ
ーを調査してフロントハングオーバー区間でもなくハン
グオーバー区間でもない無音区間を見つけてその無音区
間にある音声信号を１音声サンプル分廃棄することでバ
ッファ部１４内蓄積量を減らす処理を行う（ステップＳ
９６）。もし第１上限値以上でなければ、処理は行わな
い。If the amount is not greater than or equal to the second upper limit, the buffer control unit 15 examines the markers attached to the voice signals stored in the buffer unit 14 to find a silent section that is neither a front hangover section nor a hangover section and discards one voice sample of the voice signal in that silent section, thereby reducing the amount stored in the buffer unit 14 (step S96). If the amount is not greater than or equal to the first upper limit, no processing is performed.
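The tiered decision of steps S91 to S99 can be sketched as below. The function name, the returned labels, and the ordering lower < first upper limit <= second upper limit <= third upper limit are assumptions made for illustration; the description names the limits but does not state their ordering explicitly.

```python
from typing import Optional, Tuple


def choose_action(occupancy: int, lower: int, upper1: int,
                  upper2: int, upper3: int) -> Optional[Tuple[str, str]]:
    """Pick the one-sample adjustment for the current buffer occupancy.

    Returns (operation, section type) or None when no action is needed.
    Assumes lower < upper1 <= upper2 <= upper3.
    """
    if occupancy <= lower:                    # S92
        return ("insert", "silence")          # S93: plain silent section only
    if occupancy < upper1:                    # S94 fails
        return None                           # no processing
    if occupancy < upper2:                    # S95 fails
        return ("discard", "silence")         # S96
    if occupancy < upper3:                    # S97 fails
        return ("discard", "hangover")        # S98
    return ("discard", "front_hangover")      # S99
```

The tiers reflect an escalation: plain silence is touched first, hangover sections only once the buffer is fuller, and front hangover sections only at the highest occupancy.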

【０１００】以上のように、受信し復号した音声信号に
有音／無音を示すマーカー、及び、フロントハングオー
バー区間、ハングオーバー区間を示すマーカーを付与し
てバッファに一時蓄積し、そのバッファの蓄積量に応じ
て、且つ、バッファ内の有音／無音／フロントハングオ
ーバー／ハングオーバーの音声信号の種別に応じて、バ
ッファ内の音声信号を挿入／廃棄することで、送信元の
クロックと受信側装置のクロックとの差を吸収すること
ができ、且つ、より高品質な音声通話品質及びより高精
度なクロック差吸収機能を安価に実現できる。また、復
号音声信号に対して処理することで、ある時間長を持つ
音声フレーム単位で処理する場合よりも、細かな制御が
可能となり、更なる高品質な音声通話品質及びより高精
度なクロック差吸収機能を安価に実現できる。As described above, markers indicating voiced/silent and markers indicating the front hangover section and the hangover section are attached to the received and decoded voice signal, which is then temporarily stored in the buffer; voice signals are inserted into or discarded from the buffer according to the amount stored and according to the type of voice signal in the buffer (voiced, silent, front hangover, or hangover). This absorbs the difference between the clock of the transmission source and the clock of the receiving-side device, and realizes higher voice call quality and a more accurate clock difference absorption function at low cost. In addition, because the processing is applied to the decoded voice signal, finer control is possible than when processing is performed in units of voice frames having a certain time length, which further improves voice call quality and the accuracy of the clock difference absorption function at low cost.

【０１０１】実施の形態９．以下、実施の形態９を図を
参照して説明する。図１７は、実施の形態９の音声伝送
装置の構成図である。図１７において、図１０と同一符
号は同一または相当部分を示しているので説明を省略す
る。２０は復号部１７からの音声信号が通常の会話等の
音声信号なのか、ファクシミリ信号などの音声信号以外
の信号なのかを判別する受信データ判別部、２１は受信
データ判別部２０の判別結果に基づき、出力を復号部１
７からの入力か、バッファ部１４からの入力かを選択す
るセレクタである。Embodiment 9. Embodiment 9 will be described below with reference to the drawings. FIG. 17 is a configuration diagram of the voice transmission device of Embodiment 9. In FIG. 17, the same reference numerals as in FIG. 10 denote the same or corresponding parts, and their description is omitted. Reference numeral 20 denotes a received data discriminating unit that determines whether the voice signal from the decoding unit 17 is a voice signal such as ordinary conversation or a non-voice signal such as a facsimile signal. Reference numeral 21 denotes a selector that, based on the determination result of the received data discriminating unit 20, selects either the input from the decoding unit 17 or the input from the buffer unit 14 as its output.

【０１０２】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、復号部１７に出力される。復号部１７では、ＩＰパ
ケット受信部１０出力された符号化音声信号を復号して
音声信号として出力する。音声検出部１１では、復号部
１７からの音声信号の音声レベルなどから該当音声信号
が有音状態であるか無音状態であるかを検出・判定して
その結果をマーカー付与部１２に出力する。マーカー付
与部１２では、音声検出部１１からの有音／無音情報に
基づいて復号部１７から入力した音声信号に対して、例
えば音声信号が８ビットの情報であるならば９ビット目
の情報として有音状態であるか無音状態であるかを示す
マーカーを付与しバッファ部１４へ出力する。Next, the operation will be described. The IP packet receiving unit 10 extracts the encoded voice signal stored in each input voice IP packet and outputs it to the decoding unit 17. The decoding unit 17 decodes the encoded voice signal output from the IP packet receiving unit 10 and outputs it as a voice signal. The voice detection unit 11 detects and determines, from the voice level and other properties of the voice signal from the decoding unit 17, whether that voice signal is in a voiced state or a silent state, and outputs the result to the marker adding unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker adding unit 12 attaches to the voice signal input from the decoding unit 17 a marker indicating whether it is in a voiced state or a silent state (for example, as a ninth bit if the voice signal is 8-bit information), and outputs the result to the buffer unit 14.

【０１０３】受信データ判別部２０では、復号部１７か
ら入力した音声信号が通常の会話などの音声信号なのか
ファクシミリ信号等の音声信号以外の信号なのかを判別
してその結果をセレクタ２１へ出力する。セレクタ２１
では、受信データ判別部２０からの指示にしたがって、
音声信号と判別されたならばバッファ１４からの入力を
選択して出力し、ファクシミリ信号などの音声信号以外
の信号と判別されたならば復号部１７からの入力を選択
して出力する。無音音声信号生成部２３では、バッファ
制御部１５からの指示にしたがってバッファ部１４に入
力される音声信号と同じフォーマットの無音音声信号を
生成してバッファ部１４に出力する。The received data discriminating unit 20 determines whether the voice signal input from the decoding unit 17 is a voice signal such as ordinary conversation or a non-voice signal such as a facsimile signal, and outputs the result to the selector 21. In accordance with the instruction from the received data discriminating unit 20, the selector 21 selects and outputs the input from the buffer unit 14 if the signal is determined to be a voice signal, and selects and outputs the input from the decoding unit 17 if the signal is determined to be a non-voice signal such as a facsimile signal. The silent voice signal generation unit 23 generates, in accordance with an instruction from the buffer control unit 15, a silent voice signal in the same format as the voice signal input to the buffer unit 14, and outputs it to the buffer unit 14.

【０１０４】バッファ部１４では、マーカー付与部１２
を介して入力された音声信号を一時蓄積してバッファ制
御部１５により音声信号の廃棄や挿入が行われた後に受
信側装置のクロックに基づいて定期的にマーカー情報を
除いた音声信号がセレクタ２１へ出力される。バッファ
１４内に蓄積された音声信号の順序が乱れることは無
く、挿入や廃棄はされるが入力された順で出力される。The buffer unit 14 temporarily stores the voice signal input via the marker adding unit 12; after the buffer control unit 15 has discarded or inserted voice signals, the buffer unit 14 periodically outputs the voice signal, with the marker information removed, to the selector 21 based on the clock of the receiving-side device. The order of the voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.
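The buffer behaviour described here (markers kept only inside the buffer, order preserved, insertion and discarding limited to silent samples) can be sketched with a double-ended queue. The class name, the method names, the choice of the first silent sample as the adjustment point, and the silent sample value of 0 are all assumptions for illustration; the actual silent code value depends on the voice signal format.

```python
from collections import deque


class MarkedBuffer:
    """FIFO of (sample, voiced) pairs; the marker never leaves the buffer."""

    def __init__(self):
        self._items = deque()

    def __len__(self):
        return len(self._items)

    def push(self, sample, voiced):
        """Store a decoded sample together with its voiced/silent marker."""
        self._items.append((sample, voiced))

    def pop(self):
        """Output the oldest sample with the marker information removed."""
        sample, _marker = self._items.popleft()
        return sample

    def _first_silent_index(self):
        for i, (_sample, voiced) in enumerate(self._items):
            if not voiced:
                return i
        return None

    def insert_silence(self, silent_sample=0):
        """Insert one silent sample next to an existing silent sample."""
        i = self._first_silent_index()
        if i is None:
            return False          # no silent section available
        self._items.insert(i, (silent_sample, False))
        return True

    def discard_silence(self):
        """Discard one silent sample; voiced samples are never touched."""
        i = self._first_silent_index()
        if i is None:
            return False
        del self._items[i]
        return True
```

Both adjustments leave the relative order of all remaining samples unchanged, which matches the requirement that signals are output in the order in which they were input.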

【０１０５】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の音声信号の蓄積量をバッファ制御部
１５に通知する。バッファ制御部１５の動作について
は、実施の形態６にて図１１を用いて説明したものと同
等であるため、説明を省略する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, whereas output from the buffer unit 14 is driven by the clock of the receiving-side device. Therefore, if the clock of the transmission source is faster than that of the receiving-side device, the amount stored in the buffer unit 14 tends to increase; conversely, if the clock of the transmission source is slower, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input and output status and the insertion and discard status of the buffer unit 14, and notifies the buffer control unit 15 of the amount of voice signals stored in the buffer unit 14. The operation of the buffer control unit 15 is the same as that described with reference to FIG. 11 in Embodiment 6, and its description is therefore omitted.

【０１０６】以上のように、受信し復号した音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の音声
信号を挿入／廃棄することで、送信元のクロックと受信
側装置のクロックとの差を吸収することができ、且つ、
より高品質な音声通話品質及びより高精度なクロック差
吸収機能を安価に実現できる。また、通常の会話などの
音声信号以外のファクシミリ信号などには、音声信号と
同様の処理は不適切である場合もあり、音声信号とそれ
以外を区別することで、トータルな高通話品質を提供で
きる。また、復号音声信号に対して処理することで、あ
る時間長を持つ音声フレーム単位で処理する場合より
も、細かな制御が可能となり、更なる高品質な音声通話
品質及びより高精度なクロック差吸収機能を安価に実現
できる。As described above, a marker indicating voiced/silent is attached to each received and decoded voice signal, which is then temporarily stored in the buffer, and voice signals are inserted into or discarded from the buffer according to the amount stored. This absorbs the difference between the clock of the transmission source and the clock of the receiving-side device, and realizes higher voice call quality and a more accurate clock difference absorption function at low cost. In addition, the processing applied to voice signals may be inappropriate for signals other than ordinary conversational voice, such as facsimile signals; distinguishing voice signals from other signals therefore provides high overall call quality. In addition, because the processing is applied to the decoded voice signal, finer control is possible than when processing is performed in units of voice frames having a certain time length, which further improves voice call quality and the accuracy of the clock difference absorption function at low cost.

【０１０７】実施の形態１０．以下、実施の形態１０を
図を参照して説明する。図１８は、実施の形態１０の音
声伝送装置の構成図である。図１８において、図１７と
同一符号は同一または相当部分を示しているので説明を
省略する。２２は復号部１７からの音声信号がファクシ
ミリ信号か否かを判定し、ファクシミリ信号であるなら
ばそのプロトコルを解析するファクシミリプロトコル解
析部である。Embodiment 10. Embodiment 10 will be described below with reference to the drawings. FIG. 18 is a configuration diagram of the voice transmission device of Embodiment 10. In FIG. 18, the same reference numerals as in FIG. 17 denote the same or corresponding parts, and their description is omitted. Reference numeral 22 denotes a facsimile protocol analysis unit that determines whether the voice signal from the decoding unit 17 is a facsimile signal and, if it is, analyzes its protocol.

【０１０８】次に動作について説明する。入力された音
声ＩＰパケットはＩＰパケット受信部１０にて音声ＩＰ
パケットに格納されている符号化音声信号が抽出され
て、復号部１７に出力される。復号部１７では、ＩＰパ
ケット受信部１０から出力された符号化音声信号を復号
して音声信号として出力する。音声検出部１１では、復
号部１７からの音声信号の音声レベルなどから該当音声
信号が有音状態であるか無音状態であるかを検出・判定
してその結果をマーカー付与部１２に出力する。マーカ
ー付与部１２では、音声検出部１１からの有音／無音情
報に基づいて復号部１７から入力した音声信号に対し
て、例えば音声信号が８ビットの情報であるならば９ビ
ット目の情報として有音状態であるか無音状態であるか
を示すマーカーを付与しバッファ部１４へ出力する。Next, the operation will be described. The IP packet receiving unit 10 extracts the encoded voice signal stored in each input voice IP packet and outputs it to the decoding unit 17. The decoding unit 17 decodes the encoded voice signal output from the IP packet receiving unit 10 and outputs it as a voice signal. The voice detection unit 11 detects and determines, from the voice level and other properties of the voice signal from the decoding unit 17, whether that voice signal is in a voiced state or a silent state, and outputs the result to the marker adding unit 12. Based on the voiced/silent information from the voice detection unit 11, the marker adding unit 12 attaches to the voice signal input from the decoding unit 17 a marker indicating whether it is in a voiced state or a silent state (for example, as a ninth bit if the voice signal is 8-bit information), and outputs the result to the buffer unit 14.

【０１０９】受信データ判別部２０では、復号部１７か
ら入力した音声信号が通常の会話などの音声信号なのか
ファクシミリ信号等の音声信号以外の信号なのかを判別
してその結果をファクシミリプロトコル解析部２２へ出
力する。ファクシミリプロトコル解析部２２では、受信
データ判別部２０の判別結果に基づいて音声信号以外で
あれば復号部１７からの音声信号がファクシミリ信号か
否かを判定し、ファクシミリ信号である場合にはそのフ
ァクシミリ信号のプロトコルの解析を行いバッファ監視
部１５へ通知する。The received data discriminating unit 20 determines whether the voice signal input from the decoding unit 17 is a voice signal such as ordinary conversation or a non-voice signal such as a facsimile signal, and outputs the result to the facsimile protocol analysis unit 22. Based on the determination result of the received data discriminating unit 20, if the signal is not a voice signal, the facsimile protocol analysis unit 22 determines whether the voice signal from the decoding unit 17 is a facsimile signal; if it is, the unit analyzes the protocol of that facsimile signal and notifies the buffer control unit 15 of the result.

【０１１０】無音音声信号生成部２３では、バッファ制
御部１５からの指示にしたがってバッファ部１４に入力
される音声信号と同じフォーマットの無音音声信号を生
成してバッファ部１４に出力する。バッファ部１４で
は、マーカー付与部１２を介して入力された音声信号を
一時蓄積してバッファ制御部１５により音声信号の廃棄
や挿入が行われた後に受信側装置のクロックに基づいて
定期的にマーカー情報を除いた音声信号が出力される。
バッファ１４内に蓄積された符号化音声信号の順序が乱
れることは無く、挿入や廃棄はされるが入力された順で
出力される。The silent voice signal generation unit 23 generates, in accordance with an instruction from the buffer control unit 15, a silent voice signal in the same format as the voice signal input to the buffer unit 14, and outputs it to the buffer unit 14. The buffer unit 14 temporarily stores the voice signal input via the marker adding unit 12; after the buffer control unit 15 has discarded or inserted voice signals, the buffer unit 14 periodically outputs the voice signal, with the marker information removed, based on the clock of the receiving-side device. The order of the encoded voice signals stored in the buffer unit 14 is never disturbed: signals may be inserted or discarded, but they are output in the order in which they were input.

【０１１１】ここで、バッファ部１４への入力はＩＰパ
ケット送信元のクロックに基づいて行われ、バッファ部
１４からの出力は受信側装置のクロックに基づいて行わ
れるため、もし、送信元のクロックが受信側装置のクロ
ックよりも早い場合にはバッファ部１４の蓄積量が増加
する傾向となり、逆に送信元のクロックが受信側装置の
クロックよりも遅い場合にはバッファ部１４の蓄積量は
減る傾向となる。バッファ量監視部１６では、バッファ
部１４の入力や出力状況及び挿入や廃棄状況を監視して
バッファ部１４内の音声信号の蓄積量をバッファ制御部
１５に通知する。Here, input to the buffer unit 14 is driven by the clock of the IP packet transmission source, whereas output from the buffer unit 14 is driven by the clock of the receiving-side device. Therefore, if the clock of the transmission source is faster than that of the receiving-side device, the amount stored in the buffer unit 14 tends to increase; conversely, if the clock of the transmission source is slower, the amount stored tends to decrease. The buffer amount monitoring unit 16 monitors the input and output status and the insertion and discard status of the buffer unit 14, and notifies the buffer control unit 15 of the amount of voice signals stored in the buffer unit 14.

【０１１２】ファクシミリ信号には、例えば、ＤＣＳ信
号とＥＰＴ信号との間の無音区間は７５±２０ｍ秒、Ｅ
ＰＴ信号とトレーニング信号との間は２０〜２５ｍ秒と
細かに規定されている。したがって、バッファ部１４内
音声信号の無音部分に対して挿入／廃棄の制御を行う際
に、ファクシミリ信号である場合には上記のようなプロ
トコル上クリティカルな区間は避けて処理することが望
ましい。バッファ制御部１５では、このことを踏まえ、
ファクシミリプロトコル解析部２２からの情報に基づい
て、バッファ部１４に一時蓄積されている音声信号がフ
ァクシミリ信号の場合には、ファクシミリプロトコル
上、挿入／廃棄を行っても問題無い無音区間に対して処
理を行うよう制御する。これ以外のバッファ制御部１５
の動作については、実施の形態６にて図１１を用いて説
明したものと同等であるため、説明を省略する。For facsimile signals, silent intervals are specified precisely: for example, the silent interval between the DCS signal and the EPT signal is 75 ± 20 ms, and the interval between the EPT signal and the training signal is 20 to 25 ms. Therefore, when controlling insertion or discarding in silent portions of the voice signal in the buffer unit 14, it is desirable, for a facsimile signal, to avoid such protocol-critical intervals. With this in mind, when the voice signal temporarily stored in the buffer unit 14 is a facsimile signal, the buffer control unit 15 uses the information from the facsimile protocol analysis unit 22 to restrict processing to silent sections where insertion or discarding causes no problem for the facsimile protocol. The rest of the operation of the buffer control unit 15 is the same as that described with reference to FIG. 11 in Embodiment 6, and its description is therefore omitted.
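The idea of skipping protocol-critical silent intervals can be sketched as a filter over candidate gaps. Everything in the sketch is hypothetical: the `SilentGap` record, the signal labels, and the assumption that the facsimile protocol analysis unit reports which signals bracket each gap. Only the two protected pairs (DCS to EPT, EPT to training) come from the description; whether any other gap is actually safe to adjust would depend on the protocol analysis.

```python
from typing import List, NamedTuple, Set, Tuple


class SilentGap(NamedTuple):
    """A silent interval in the buffered signal (hypothetical record)."""
    start_ms: int
    end_ms: int
    before: str   # signal that ends at the start of the gap
    after: str    # signal that begins at the end of the gap


# Gaps whose length is tightly specified by the facsimile protocol,
# per the examples in the description.
PROTECTED: Set[Tuple[str, str]] = {("DCS", "EPT"), ("EPT", "TRAINING")}


def adjustable_gaps(gaps: List[SilentGap],
                    protected: Set[Tuple[str, str]] = PROTECTED) -> List[SilentGap]:
    """Keep only the gaps where insertion or discarding may be attempted."""
    return [g for g in gaps if (g.before, g.after) not in protected]
```

Insertion and discarding would then be applied only to the gaps this filter returns, leaving the timing-sensitive intervals untouched.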

【０１１３】以上のように、受信し復号した音声信号に
有音／無音を示すマーカーを付与してバッファに一時蓄
積し、そのバッファの蓄積量に応じてバッファ内の音声
信号を挿入／廃棄することで、送信元のクロックと受信
側装置のクロックとの差を吸収することができ、且つ、
より高品質な音声通話品質及びより高精度なクロック差
吸収機能を安価に実現できる。また、通常の会話などの
音声信号以外のファクシミリ信号には、ファクシミリプ
ロトコルも考慮した制御を行うことで、トータルな高通
話品質を提供できる。また、復号音声信号に対して処理
することで、ある時間長を持つ音声フレーム単位で処理
する場合よりも、細かな制御が可能となり、更なる高品
質な音声通話品質及びより高精度なクロック差吸収機能
を安価に実現できる。As described above, a marker indicating voiced/silent is attached to each received and decoded voice signal, which is then temporarily stored in the buffer, and voice signals are inserted into or discarded from the buffer according to the amount stored. This absorbs the difference between the clock of the transmission source and the clock of the receiving-side device, and realizes higher voice call quality and a more accurate clock difference absorption function at low cost. In addition, for facsimile signals, as distinct from ordinary conversational voice signals, control that also takes the facsimile protocol into account provides high overall call quality. In addition, because the processing is applied to the decoded voice signal, finer control is possible than when processing is performed in units of voice frames having a certain time length, which further improves voice call quality and the accuracy of the clock difference absorption function at low cost.

【０１１４】[0114]

【発明の効果】この発明は、以上説明したように構成さ
れているので、以下に示すような効果を奏する。[Effects of the Invention] Since the present invention is configured as described above, it provides the following effects.

【０１１５】第１〜３のの発明では、バッファ部に蓄積
された音声信号の蓄積量に基づいて、この蓄積された音
声信号に音声信号を挿入するか廃棄することにより、送
信元のクロックと送信先のクロックとの差を吸収するこ
とができるので、より高品質な音声通話品質及びより高
精度なクロック差吸収機能を安価に実現できる。In the first to third inventions, voice signals are inserted into or discarded from the voice signals stored in the buffer unit based on the amount stored, which absorbs the difference between the clock of the transmission source and the clock of the transmission destination; higher voice call quality and a more accurate clock difference absorption function can therefore be realized at low cost.

【０１１６】第４の発明では、音声信号の無音区間の継
続時間、即ち無音区間の長さに応じて音声信号を挿入又
は廃棄を行うことにより、送信元のクロックと受信側装
置のクロックとの差を吸収することができので、より高
品質な音声通話品質及びより高精度なクロック差吸収機
能を安価に実現できる。In the fourth invention, voice signals are inserted or discarded according to the duration of the silent section of the voice signal, that is, the length of the silent section, which absorbs the difference between the clock of the transmission source and the clock of the receiving-side device; higher voice call quality and a more accurate clock difference absorption function can therefore be realized at low cost.

【０１１７】第５の発明では、バッファ部に蓄積された
音声信号の有音無音区間、フロントハングオーバー区
間、ハングオーバー区間応じて、バッファ部の音声信号
を挿入又は廃棄を行うことにより、廃棄しても問題の少
ない音声信号から順に廃棄をすることができるので、よ
り高品質な音声通話品質及びより高精度なクロック差吸
収機能を安価に実現できる。In the fifth invention, voice signals in the buffer unit are inserted or discarded according to the voiced and silent sections, front hangover sections, and hangover sections of the voice signals stored in the buffer unit, so that voice signals can be discarded in order starting with those whose loss causes the least problem; higher voice call quality and a more accurate clock difference absorption function can therefore be realized at low cost.

【０１１８】第６の発明では、音声信号が会話による音
声信号か否かを判別することにより、音声信号と音声信
号以外の信号とを区別することができ、高通話品質を提
供できる。In the sixth invention, determining whether the voice signal is a conversational voice signal makes it possible to distinguish voice signals from other signals, so that high call quality can be provided.

【０１１９】第７の発明では、音声信号がファクシミリ
信号か否かを判定すると共に、このファクシミリ信号の
プロトコルを解析することにより、音声信号以外のファ
クシミリ信号にはファクシミリプロトコルを考慮した制
御を行うことができるので、高通話品質を提供できる。In the seventh invention, whether the voice signal is a facsimile signal is determined and the protocol of the facsimile signal is analyzed, so that facsimile signals, as distinct from voice signals, can be controlled with the facsimile protocol taken into account; high call quality can therefore be provided.

【０１２０】第８〜１２の発明では、バッファ部に蓄積
された復号された音声信号の蓄積量に基づいて、この蓄
積された復号された音声信号に音声信号を挿入するか廃
棄することにより、送信元のクロックと送信先のクロッ
クとの差を吸収することができるので、より高品質な音
声通話品質及びより高精度なクロック差吸収機能を安価
に実現できる。In the eighth to twelfth inventions, voice signals are inserted into or discarded from the decoded voice signals stored in the buffer unit based on the amount of decoded voice signals stored, which absorbs the difference between the clock of the transmission source and the clock of the transmission destination; higher voice call quality and a more accurate clock difference absorption function can therefore be realized at low cost.

【０１２１】第１３の発明では、復号された音声信号の
無音区間の継続時間、即ち無音区間の長さに応じて音声
信号を挿入又は廃棄を行うことにより、送信元のクロッ
クと受信側装置のクロックとの差を吸収することができ
ので、より高品質な音声通話品質及びより高精度なクロ
ック差吸収機能を安価に実現できる。In the thirteenth invention, voice signals are inserted or discarded according to the duration of the silent section of the decoded voice signal, that is, the length of the silent section, which absorbs the difference between the clock of the transmission source and the clock of the receiving-side device; higher voice call quality and a more accurate clock difference absorption function can therefore be realized at low cost.

【０１２２】第１４の発明では、バッファ部に蓄積され
た復号された音声信号の有音無音区間、フロントハング
オーバー区間、ハングオーバー区間応じて、バッファ部
の復号された音声信号に音声信号を挿入又は廃棄を行う
ことにより、廃棄しても問題の少ない復号された音声信
号から順に廃棄をすることができるので、より高品質な
音声通話品質及びより高精度なクロック差吸収機能を安
価に実現できる。In the fourteenth invention, a voice signal is inserted into or discarded from the decoded voice signal in the buffer unit according to the voiced/silent sections, the front hangover section, and the hangover section of the decoded voice signal accumulated in the buffer unit. Decoded voice signals can therefore be discarded in order, starting with those whose loss causes the least problem, so that higher voice call quality and a more accurate clock difference absorption function can be realized at low cost.
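
The idea of discarding the least harmful frames first can be sketched in Python with several upper limits, one per section type. The ordering used in the example (plain silence first, then the hangover section, then the front hangover section) and the limit values are assumptions for illustration; the Section enum and both function names are hypothetical.

```python
from enum import Enum


class Section(Enum):
    VOICED = "voiced"
    FRONT_HANGOVER = "front_hangover"  # silence just before speech starts
    HANGOVER = "hangover"              # silence just after speech ends
    SILENCE = "silence"                # any other silent section


def discardable_sections(fill, limits):
    """Return the section types that may be discarded at fill level `fill`.

    `limits` is a list of (upper_limit, Section) pairs. Sections with lower
    limits come first, so they are preferred when choosing what to discard.
    """
    ordered = sorted(limits, key=lambda pair: pair[0])
    return [section for limit, section in ordered if fill >= limit]


def choose_discard_index(markers, fill, limits):
    """Return the index of the frame to discard, or None if nothing qualifies."""
    for section in discardable_sections(fill, limits):
        for i, marker in enumerate(markers):
            if marker is section:
                return i
    return None
```

With limits such as [(10, Section.SILENCE), (14, Section.HANGOVER), (18, Section.FRONT_HANGOVER)], a moderately full buffer gives up only plain silence, and the hangover and front hangover sections are touched only as the fill level rises further.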

【０１２３】第１５の発明では、復号された音声信号が
会話による音声信号か否かを判別することにより、復号
された音声信号と音声信号以外の信号とを区別すること
ができ、高通話品質を提供できる。According to the fifteenth invention, by determining whether or not the decoded voice signal is a voice signal from conversation, the decoded voice signal can be distinguished from signals other than voice signals, and high call quality can be provided.

【０１２４】第１６の発明では、復号された音声信号が
ファクシミリ信号か否かを判定すると共に、このファク
シミリ信号のプロトコルを解析することにより、復号さ
れた音声信号以外のファクシミリ信号にはファクシミリ
プロトコルを考慮した制御を行うことができるので、高
通話品質を提供できる。According to the sixteenth invention, it is determined whether or not the decoded voice signal is a facsimile signal, and the protocol of this facsimile signal is analyzed, so that facsimile signals, as opposed to decoded voice signals, can be controlled in consideration of the facsimile protocol. As a result, high call quality can be provided.

【図面の簡単な説明】[Brief description of drawings]

【図１】実施の形態１の音声伝送装置の構成図。FIG. 1 is a configuration diagram of a voice transmission device according to a first embodiment.

【図２】実施の形態１におけるバッファ制御部１５の
動作を示したフローチャート。FIG. 2 is a flowchart showing the operation of the buffer control unit 15 in the first embodiment.

【図３】実施の形態２の音声伝送装置の構成図。FIG. 3 is a configuration diagram of a voice transmission device according to a second embodiment.

【図４】実施の形態２におけるバッファ制御部１５の
動作を示したフローチャート。FIG. 4 is a flowchart showing the operation of the buffer control unit 15 in the second embodiment.

【図５】実施の形態３の音声伝送装置の構成図。FIG. 5 is a configuration diagram of a voice transmission device according to a third embodiment.

【図６】実施の形態３において音声信号の大きさ、有
音／無音の判定に使う閾値及び有音／無音判定結果を模
式的に表した図。FIG. 6 is a diagram schematically showing the magnitude of a voice signal, a threshold value used for determining whether there is sound or no sound, and the result of determining whether there is sound or no sound in the third embodiment.

【図７】実施の形態３におけるバッファ制御部１５の
動作を示したフローチャート。FIG. 7 is a flowchart showing the operation of the buffer control unit 15 in the third embodiment.

【図８】実施の形態４の音声伝送装置の構成図。FIG. 8 is a configuration diagram of a voice transmission device according to a fourth embodiment.

【図９】実施の形態５の音声伝送装置の構成図。FIG. 9 is a configuration diagram of a voice transmission device according to a fifth embodiment.

【図１０】実施の形態６の音声伝送装置の構成図。FIG. 10 is a configuration diagram of a voice transmission device according to a sixth embodiment.

【図１１】実施の形態６におけるバッファ制御部１５
の動作を示したフローチャート。FIG. 11 is a flowchart showing the operation of the buffer control unit 15 in the sixth embodiment.

【図１２】実施の形態６におけるバッファ制御部１５
の別の動作を示したフローチャート。FIG. 12 is a flowchart showing another operation of the buffer control unit 15 in the sixth embodiment.

【図１３】実施の形態７の音声伝送装置の構成図。FIG. 13 is a configuration diagram of a voice transmission device according to a seventh embodiment.

【図１４】実施の形態７におけるバッファ制御部１５
の動作を示したフローチャート。FIG. 14 is a flowchart showing the operation of the buffer control unit 15 in the seventh embodiment.

【図１５】実施の形態８の音声伝送装置の構成図。FIG. 15 is a configuration diagram of a voice transmission device according to an eighth embodiment.

【図１６】実施の形態８におけるバッファ制御部１５
の動作を示したフローチャート。FIG. 16 is a flowchart showing the operation of the buffer control unit 15 in the eighth embodiment.

【図１７】実施の形態９の音声伝送装置の構成図。FIG. 17 is a configuration diagram of a voice transmission device according to a ninth embodiment.

【図１８】実施の形態１０の音声伝送装置の構成図。FIG. 18 is a configuration diagram of a voice transmission device according to a tenth embodiment.

【図１９】従来の音声伝送装置の構成図。FIG. 19 is a configuration diagram of a conventional voice transmission device.

【符号の説明】[Explanation of symbols]

１０ＩＰパケット受信部、１１音声検出部、１２
マーカー付与部、１３無音符号化音声信号生成部、１４
バッファ部、１５バッファ制御部、１６バッファ量
監視部、１７復号部。10: IP packet receiving unit, 11: voice detection unit, 12: marker assigning unit, 13: silence coded voice signal generation unit, 14: buffer unit, 15: buffer control unit, 16: buffer amount monitoring unit, 17: decoding unit.

Claims

【特許請求の範囲】[Claims]

【請求項１】受信したＩＰパケットから音声信号を抽
出するＩＰパケット受信部と、上記ＩＰパケット受信部により抽出された音声信号の有
音無音区間を示す有音無音情報を検出する音声検出部
と、上記ＩＰパケット受信部により抽出された音声信号を蓄
積するバッファ部と、上記バッファ部に蓄積された音声信号の蓄積量を監視す
るバッファ監視部と、上記バッファ監視部により監視された音声信号の蓄積量
と上記音声検出部により検出された有音無音情報に基づ
いて、上記バッファ部に蓄積された音声信号に新たな音
声信号を挿入するか又は上記バッファ部に蓄積された音
声信号を廃棄するバッファ制御部と、上記バッファ制御部により音声信号が挿入又は廃棄され
た第２の音声信号を復号する復号部と、を備えたことを
特徴とする音声伝送装置。1. A voice transmission device comprising: an IP packet receiving unit that extracts a voice signal from a received IP packet; a voice detection unit that detects voiced/silent information indicating the voiced and silent sections of the voice signal extracted by the IP packet receiving unit; a buffer unit that accumulates the voice signal extracted by the IP packet receiving unit; a buffer monitoring unit that monitors the accumulated amount of the voice signal in the buffer unit; a buffer control unit that, based on the accumulated amount of the voice signal monitored by the buffer monitoring unit and the voiced/silent information detected by the voice detection unit, inserts a new voice signal into the voice signal accumulated in the buffer unit or discards the voice signal accumulated in the buffer unit; and a decoding unit that decodes a second voice signal resulting from the insertion or discarding of the voice signal by the buffer control unit.

【請求項２】上記バッファ部に蓄積された音声信号に
挿入する無音音声信号を生成する無音音声信号生成部
と、上記ＩＰパケット受信部により抽出された音声信号に上
記有音無音情報を示すマーカーを付与するマーカー付与
部を備え、上記バッファ部は、上記マーカー付与部によりマーカー
が付与された音声信号を蓄積し、上記バッファ制御部は、上記バッファ監視部により監視
された音声信号の蓄積量と上記音声信号に付与されたマ
ーカーに基づいて、上記バッファ部に蓄積された音声信
号の無音区間に上記無音音声信号生成部により生成され
た無音音声信号を挿入するか又は上記バッファ部に蓄積
された音声信号の無音区間の無音音声信号を廃棄し、上記復号部は、上記バッファ制御部により無音音声信号
が挿入又は廃棄された音声信号からこの音声信号に付与
されたマーカーを除去した第２の音声信号を復号するこ
とを特徴とする請求項１記載の音声伝送装置。2. The voice transmission device according to claim 1, further comprising: a silent voice signal generation unit that generates a silent voice signal to be inserted into the voice signal accumulated in the buffer unit; and a marker assigning unit that assigns a marker indicating the voiced/silent information to the voice signal extracted by the IP packet receiving unit, wherein the buffer unit accumulates the voice signal to which the marker has been assigned by the marker assigning unit, the buffer control unit, based on the accumulated amount of the voice signal monitored by the buffer monitoring unit and the marker assigned to the voice signal, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal accumulated in the buffer unit or discards a silent voice signal in a silent section of the voice signal accumulated in the buffer unit, and the decoding unit decodes a second voice signal obtained by removing the assigned marker from the voice signal in which the silent voice signal has been inserted or discarded by the buffer control unit.

【請求項３】上記バッファ制御部は、上記バッファに
蓄積された音声信号の蓄積量が予め設定された下限値以
下のときには、上記バッファ部に蓄積された音声信号の
無音区間に上記無音音声信号生成部により生成された無
音音声信号を挿入し、上記音声信号の蓄積量が予め設定
された上限値以上のときには、上記バッファ部に蓄積さ
れた音声信号の無音区間の無音音声信号を廃棄すること
を特徴とする請求項２記載の音声伝送装置。3. The voice transmission device according to claim 2, wherein the buffer control unit inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal accumulated in the buffer unit when the accumulated amount of the voice signal in the buffer is equal to or less than a preset lower limit value, and discards a silent voice signal in a silent section of the voice signal accumulated in the buffer unit when the accumulated amount of the voice signal is equal to or more than a preset upper limit value.

【請求項４】上記バッファに蓄積された音声信号の無
音区間の継続時間を測定する無音継続測定部を備え、上記バッファ制御部は、上記バッファに蓄積された音声
信号の蓄積量が予め設定された下限値以下のときには、
上記継続時間と上記音声信号に付与されたマーカーに基
づいて、上記バッファ部に蓄積された音声信号の無音区
間に上記無音音声信号生成部により生成された無音音声
信号を挿入し、上記バッファに蓄積された音声信号の蓄
積量が予め設定された上限値以上のときには、上記継続
時間と上記音声信号に付与されたマーカーに基づいて、
上記バッファ部に蓄積された音声信号の無音区間の無音
音声信号を廃棄することを特徴とする請求項２記載の音
声伝送装置。4. The voice transmission device according to claim 2, further comprising a silence duration measuring unit that measures the duration of a silent section of the voice signal accumulated in the buffer, wherein the buffer control unit, when the accumulated amount of the voice signal in the buffer is equal to or less than a preset lower limit value, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal accumulated in the buffer unit based on the duration and the marker assigned to the voice signal, and, when the accumulated amount of the voice signal in the buffer is equal to or more than a preset upper limit value, discards a silent voice signal in a silent section of the voice signal accumulated in the buffer unit based on the duration and the marker assigned to the voice signal.

【請求項５】無音から有音に変化した時点から一定時
間前までの無音区間により構成されるフロントハングオ
ーバー区間と、有音から無音に変化した時点から一定時
間後までの無音区間により構成されるハングオーバー区
間とを示すマーカーを上記音声信号に付与するマーカー
付与部を備え、上記バッファ制御部は、上記バッファに蓄積された音声
信号の蓄積量が予め設定された下限値以下のときには、
上記フロントハングオーバー区間でなく又上記ハングオ
ーバー区間でない音声信号の無音区間に上記無音音声信
号生成部により生成された無音音声信号を挿入し、上記
バッファに蓄積された音声信号の蓄積量が予め設定され
た複数の上限値以上のときには、この複数の上限値に応
じて上記フロントハングオーバー区間、又は上記ハング
オーバー区間、又は上記フロントハングオーバー区間で
なく又上記ハングオーバー区間でない無音区間の無音音
声信号を廃棄することを特徴とする請求項３記載の音声
伝送装置。5. The voice transmission device according to claim 3, comprising a marker assigning unit that assigns to the voice signal a marker indicating a front hangover section, consisting of the silent section extending from a fixed time before a change from silence to voice up to that change, and a hangover section, consisting of the silent section extending from a change from voice to silence until a fixed time after that change, wherein the buffer control unit, when the accumulated amount of the voice signal in the buffer is equal to or less than a preset lower limit value, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the voice signal that is neither the front hangover section nor the hangover section, and, when the accumulated amount of the voice signal in the buffer is equal to or more than a plurality of preset upper limit values, discards, according to those upper limit values, a silent voice signal in the front hangover section, in the hangover section, or in a silent section that is neither the front hangover section nor the hangover section.

【請求項６】上記ＩＰパケット受信部により抽出され
た音声信号が会話による音声信号か否かを判別する受信
データ判別部と、上記受信データ判別部により判別された判別結果に基づ
いて上記ＩＰパケット受信部の音声信号以外の信号を選
択するか又は上記バッファに蓄積された音声信号を選択
するセレクタとを備え、上記復号部は、上記セレクタの選択結果に基づいて上記
無音音声信号が挿入又は廃棄された音声信号からこの音
声信号に付与されたマーカーを除去した第３の音声信号
を復号することを特徴とする請求項３記載の音声伝送装
置。6. The voice transmission device according to claim 3, further comprising: a received data discriminating unit that discriminates whether or not the voice signal extracted by the IP packet receiving unit is a voice signal from conversation; and a selector that, based on the discrimination result of the received data discriminating unit, selects either a signal other than a voice signal from the IP packet receiving unit or the voice signal accumulated in the buffer, wherein the decoding unit decodes, based on the selection result of the selector, a third voice signal obtained by removing the assigned marker from the voice signal in which the silent voice signal has been inserted or discarded.

【請求項７】上記受信データ判別部の判別結果に基づ
いて上記ＩＰパケット受信部の音声信号がファクシミリ
信号か否かを判定し、上記音声信号がファクシミリ信号
のときにはこのファクシミリ信号のプロトコルを解析す
るファクシミリプロトコル解析部を備え、上記バッファ制御部は、上記バッファ部にファクシミリ
信号が蓄積されているときには、上記ファクシミリプロ
トコル解析部により解析された解析情報に基づいて上記
バッファ部に蓄積されたファクシミリ信号のプロトコル
上、上記無音音声信号の挿入又は廃棄を行っても問題の
ない音声信号の無音区間に上記無音音声信号を挿入する
か又は上記音声信号の無音区間の無音音声信号を廃棄す
ることを特徴とする請求項６記載の音声伝送装置。7. The voice transmission device according to claim 6, further comprising a facsimile protocol analysis unit that determines, based on the discrimination result of the received data discriminating unit, whether or not the voice signal from the IP packet receiving unit is a facsimile signal and, when the voice signal is a facsimile signal, analyzes the protocol of this facsimile signal, wherein the buffer control unit, when a facsimile signal is accumulated in the buffer unit, uses the analysis information obtained by the facsimile protocol analysis unit to insert the silent voice signal into, or discard a silent voice signal from, a silent section of the voice signal in which such insertion or discarding causes no problem under the protocol of the facsimile signal accumulated in the buffer unit.

【請求項８】受信したＩＰパケットから音声信号を抽
出するＩＰパケット受信部と、上記ＩＰパケット受信部により抽出された音声信号を復
号する復号部と、上記復号部により復号された第３の音声信号から有音無
音区間を示す有音無音情報を検出する音声検出部と、上記復号部により復号された第３の音声信号を蓄積する
バッファ部と、上記バッファ部に蓄積された第３の音声信号の蓄積量を
監視するバッファ監視部と、上記バッファ監視部により監視された第３の音声信号の
蓄積量と上記音声検出部により検出された有音無音情報
に基づいて、上記バッファ部に蓄積された第３の音声信
号に新たな音声信号を挿入するか又は上記バッファ部に
蓄積された第３の音声信号を廃棄するバッファ制御部
と、を備えたことを特徴とする音声伝送装置。8. A voice transmission device comprising: an IP packet receiving unit that extracts a voice signal from a received IP packet; a decoding unit that decodes the voice signal extracted by the IP packet receiving unit; a voice detection unit that detects, from a third voice signal decoded by the decoding unit, voiced/silent information indicating voiced and silent sections; a buffer unit that accumulates the third voice signal decoded by the decoding unit; a buffer monitoring unit that monitors the accumulated amount of the third voice signal in the buffer unit; and a buffer control unit that, based on the accumulated amount of the third voice signal monitored by the buffer monitoring unit and the voiced/silent information detected by the voice detection unit, inserts a new voice signal into the third voice signal accumulated in the buffer unit or discards the third voice signal accumulated in the buffer unit.

【請求項９】上記バッファ部に蓄積された第３の音声
信号に挿入する無音音声信号を生成する無音音声信号生
成部と上記復号部により復号された第３の音声信号に有音無音
情報を示すマーカーを付与するマーカー付与部を備え、上記バッファ部は、上記マーカー付与部によりマーカー
が付与された第３の音声信号を蓄積し、上記バッファ制御部は、上記バッファ監視部により監視
された第３の音声信号の蓄積量と上記第３の音声信号に
付与されたマーカーに基づいて、上記バッファ部に蓄積
された第３の音声信号の無音区間に上記無音音声信号生
成部により生成された無音音声信号を挿入するか又は上
記バッファ部に蓄積された第３の音声信号の無音区間の
無音音声信号を廃棄することを特徴とする請求項８記載
の音声伝送装置。9. The voice transmission device according to claim 8, further comprising: a silent voice signal generation unit that generates a silent voice signal to be inserted into the third voice signal accumulated in the buffer unit; and a marker assigning unit that assigns a marker indicating the voiced/silent information to the third voice signal decoded by the decoding unit, wherein the buffer unit accumulates the third voice signal to which the marker has been assigned by the marker assigning unit, and the buffer control unit, based on the accumulated amount of the third voice signal monitored by the buffer monitoring unit and the marker assigned to the third voice signal, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal accumulated in the buffer unit or discards a silent voice signal in a silent section of the third voice signal accumulated in the buffer unit.

【請求項１０】上記バッファ制御部は、上記バッファ
に蓄積された第３の音声信号の蓄積量が予め設定された
下限値以下のときには、、上記バッファ部に蓄積された
第３の音声信号の無音区間に上記無音音声信号生成部に
より生成された無音音声信号を挿入し、上記第３の音声
信号の蓄積量が予め設定された上限値以上のときには、
上記バッファ部に蓄積された第３の音声信号の無音区間
の無音音声信号を廃棄することを特徴とする請求項９記
載の音声伝送装置。10. The voice transmission device according to claim 9, wherein the buffer control unit inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal accumulated in the buffer unit when the accumulated amount of the third voice signal in the buffer is equal to or less than a preset lower limit value, and discards a silent voice signal in a silent section of the third voice signal accumulated in the buffer unit when the accumulated amount of the third voice signal is equal to or more than a preset upper limit value.

【請求項１１】上記バッファ制御部は、上記バッファ
に蓄積された第３の音声信号の蓄積量が予め設定された
複数の下限値以下のときには、この複数の下限値に応じ
て上記第３の音声信号の無音区間に上記無音音声信号生
成部により生成された無音音声信号を挿入し、又上記バ
ッファに蓄積された第３の音声信号の蓄積量が予め設定
された複数の上限値以上のときには、この複数の上限値
に応じて上記第３の音声信号の無音区間の無音音声信号
を廃棄することを特徴とする請求項９記載の音声伝送装
置。11. The voice transmission device according to claim 9, wherein the buffer control unit, when the accumulated amount of the third voice signal in the buffer is equal to or less than a plurality of preset lower limit values, inserts, according to those lower limit values, the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal, and, when the accumulated amount of the third voice signal in the buffer is equal to or more than a plurality of preset upper limit values, discards, according to those upper limit values, a silent voice signal in a silent section of the third voice signal.

【請求項１２】上記バッファ制御部は、上記複数の下
限値が最低下限値以下でかつ上記バッファに蓄積された
第３の音声信号に無音区間がないときには、上記第３の
音声信号の有音区間の信号欠落部分の信号を挿入する補
間処理を行うことを特徴とする請求項１１記載の音声伝
送装置。12. The voice transmission device according to claim 11, wherein the buffer control unit, when the plurality of lower limit values are equal to or less than a minimum lower limit value and the third voice signal accumulated in the buffer has no silent section, performs interpolation processing that inserts a signal for a signal-missing portion in a voiced section of the third voice signal.

【請求項１３】上記バッファに蓄積された第３の音声
信号の無音区間の継続時間を測定する無音継続測定部を
備え、上記バッファ制御部は、上記バッファに蓄積された第３
の音声信号の蓄積量が予め設定された下限値以下のとき
には、上記継続時間と上記第３の音声信号に付与された
マーカーに基づいて、上記バッファ部に蓄積された第３
の音声信号の無音区間に上記無音音声信号生成部により
生成された無音音声信号を挿入し、上記バッファに蓄積
された第３の音声信号の蓄積量が予め設定された上限値
以上のときには、上記継続時間と上記第３の音声信号に
付与されたマーカーに基づいて、上記バッファ部に蓄積
された第３の音声信号の無音区間の無音音声信号を廃棄
することを特徴とする請求項９記載の音声伝送装置。13. The voice transmission device according to claim 9, further comprising a silence duration measuring unit that measures the duration of a silent section of the third voice signal accumulated in the buffer, wherein the buffer control unit, when the accumulated amount of the third voice signal in the buffer is equal to or less than a preset lower limit value, inserts the silent voice signal generated by the silent voice signal generation unit into a silent section of the third voice signal accumulated in the buffer unit based on the duration and the marker assigned to the third voice signal, and, when the accumulated amount of the third voice signal in the buffer is equal to or more than a preset upper limit value, discards a silent voice signal in a silent section of the third voice signal accumulated in the buffer unit based on the duration and the marker assigned to the third voice signal.

【請求項１４】無音から有音に変化した時点から一定
時間前までの無音区間により構成されるフロントハング
オーバー区間と、有音から無音に変化した時点から一定
時間後までの無音区間により構成されるハングオーバー
区間とを示すマーカーを上記第３の音声信号に付与する
マーカー付与部を備え、上記バッファ制御部は、上記バッファに蓄積された第３
の音声信号の蓄積量が予め設定された下限値以下のとき
には、上記フロントハングオーバー区間でなく又上記ハ
ングオーバー区間でない第３の音声信号の無音区間に上
記無音音声信号を挿入し、上記バッファに蓄積された第
３の音声信号の蓄積量が予め設定された複数の上限値以
上のときには、この複数の上限値に応じて上記フロント
ハングオーバー区間、又は上記ハングオーバー区間、又
は上記フロントハングオーバー区間でなく又上記ハング
オーバー区間でない無音区間の無音音声信号を廃棄する
ことを特徴とする請求項１０記載の音声伝送装置。14. The voice transmission device according to claim 10, comprising a marker assigning unit that assigns to the third voice signal a marker indicating a front hangover section, consisting of the silent section extending from a fixed time before a change from silence to voice up to that change, and a hangover section, consisting of the silent section extending from a change from voice to silence until a fixed time after that change, wherein the buffer control unit, when the accumulated amount of the third voice signal in the buffer is equal to or less than a preset lower limit value, inserts the silent voice signal into a silent section of the third voice signal that is neither the front hangover section nor the hangover section, and, when the accumulated amount of the third voice signal in the buffer is equal to or more than a plurality of preset upper limit values, discards, according to those upper limit values, a silent voice signal in the front hangover section, in the hangover section, or in a silent section that is neither the front hangover section nor the hangover section.

【請求項１５】上記復号部により復号された第３の音
声信号が会話による音声信号か否かを判別する受信デー
タ判別部と、上記受信データ判別部により判別された判別結果に基づ
いて上記ＩＰパケット受信部の第３の音声信号以外の信
号を選択するか又は上記バッファに蓄積された第３の音
声信号を選択するセレクタとを備えたことを特徴とする
請求項１０記載の音声伝送装置。15. The voice transmission device according to claim 10, further comprising: a received data discriminating unit that discriminates whether or not the third voice signal decoded by the decoding unit is a voice signal from conversation; and a selector that, based on the discrimination result of the received data discriminating unit, selects either a signal other than the third voice signal from the IP packet receiving unit or the third voice signal accumulated in the buffer.

【請求項１６】上記受信データ判別部による判別結果
に基づいて上記復号部により復号された第３の音声信号
がファクシミリ信号か否かを判定し、上記第３の音声信
号がファクシミリ信号のときにはこのファクシミリ信号
のプロトコルを解析するファクシミリプロトコル解析部
を備え、上記バッファ制御部は、上記バッファ部にファクシミリ
信号が蓄積されているときには、上記ファクシミリプロ
トコル解析部により解析された解析情報に基づいて上記
バッファ部に蓄積されたファクシミリ信号のプロトコル
上、上記無音音声信号の挿入又は廃棄を行っても問題の
ない第３の音声信号の無音区間に上記無音音声信号を挿
入するか又は上記第３の音声信号の無音区間の無音音声
信号を廃棄することを特徴とする請求項１５記載の音声
伝送装置。16. The voice transmission device according to claim 15, further comprising a facsimile protocol analysis unit that determines, based on the discrimination result of the received data discriminating unit, whether or not the third voice signal decoded by the decoding unit is a facsimile signal and, when the third voice signal is a facsimile signal, analyzes the protocol of this facsimile signal, wherein the buffer control unit, when a facsimile signal is accumulated in the buffer unit, uses the analysis information obtained by the facsimile protocol analysis unit to insert the silent voice signal into, or discard a silent voice signal from, a silent section of the third voice signal in which such insertion or discarding causes no problem under the protocol of the facsimile signal accumulated in the buffer unit.