JP3604817B2 - Voice transmission system and receiving terminal

Info

Publication number: JP3604817B2
Application number: JP17199496A
Authority: JP
Inventors: 朋子久野
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 1996-07-02
Filing date: 1996-07-02
Publication date: 2004-12-22
Anticipated expiration: 2016-07-02
Also published as: JPH1023067A

Description

[0001]
[Technical field of the invention]
The present invention relates to a voice transmission system and a receiving terminal, and more specifically to a voice transmission system and a receiving terminal for transmitting voice between, for example, remote locations.
[0002]
[Prior art]
In a voice transmission system that transmits voice between remote locations, encoded voice data is transmitted over a communication line, and the receiving side decodes it and outputs it as voice.
[0003]
[Problems to be solved by the invention]
In the conventional example, the receiving side simply decodes the received audio information and outputs the audio, and does not consider transmission delay at all. That is, in the conventional example, when a delay occurs in the audio transmission, the audio is reproduced in a delayed state.
[0004]
An object of the present invention is to solve this problem and to provide a voice transmission system and a receiving terminal that eliminate transmission delay.
[0005]
Another object of the present invention is to provide a voice transmission system and a receiving terminal that mitigate or suppress the influence of delay even when delays of differing lengths occur within a single voice stream.
[0006]
[Means for Solving the Problems]
In the voice transmission system according to the present invention, a plurality of terminals on a network each comprise voice encoding means, voice decoding means, a buffer for temporarily storing received voice information, and silence/speech determination means for determining whether received encoded voice data is silent, and voice is transmitted among the plurality of terminals. The system is characterized in that a transmitting terminal adds, in advance, time information on the start time of voice transmission to the voice information and sends it to the network, and a receiving terminal bypasses decoding of silent data when the difference between the time information added to the received voice information and real time is equal to or greater than a predetermined value.
The receiving terminal according to the present invention receives encoded voice data transmitted from a transmitting terminal on a network and is characterized by comprising: a buffer for temporarily storing the received encoded voice data; voice decoding means for decoding the encoded voice data stored in the buffer; determining means for determining whether the received encoded voice data is silent; and control means for bypassing decoding of the data determined to be silent by the determining means when the difference between real-time information and the time information on the start time of voice transmission, added to the received voice information by the transmitting terminal, is equal to or greater than a predetermined value.
[0007]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described in detail with reference to the drawings.
[0008]
FIG. 1 is a schematic block diagram of an embodiment of the present invention. Voice communication terminals 20 and 40 are connected to a communication network 10 such as a local area network (LAN). The voice communication terminals 20 and 40 have the same configuration: microphones 24 and 44 and speakers 26 and 46 are connected to computers 22 and 42, and the computers 22 and 42 include voice buffers 28 and 48 for temporarily storing received voice, silence/speech determination means 30 and 50 for determining whether received voice is silent or voiced, voice encoding means 32 and 52, and voice decoding means 34 and 54. Each of the means 30, 32, 34 and 50, 52, 54 is implemented in software, in hardware, or in a combination of the two. The voice buffers 28 and 48 are basically FIFO (first-in, first-out) memories, and their capacity can be changed freely within the available memory capacity.
[0009]
The computers 22 and 42 also include a CPU for controlling the whole, a ROM for storing various programs and data, a data transmitting / receiving unit for the network 10, and an operating unit operated by a user. Audio is captured by the PCM method.
[0010]
Although the terminal 20 and the terminal 40 can transmit voice to each other in both directions, the operation when voice is transmitted from the terminal 20 to the terminal 40 is described here. The voice captured by the microphone 24 of the terminal 20 is encoded by the voice encoding means 32. In the present embodiment, one second of voice data forms one packet, and the encoded voice data is transferred to the terminal 40 via the network 10 one packet at a time.
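The packetization described above can be illustrated with a minimal Python sketch. The 8000 Hz sampling rate and the function name packetize are assumptions made for illustration only; the description states only that one second of PCM voice data forms one packet.

```python
from typing import List, Sequence

# Assumed sampling rate; the description does not specify one.
SAMPLE_RATE_HZ = 8000


def packetize(samples: Sequence[int],
              sample_rate_hz: int = SAMPLE_RATE_HZ) -> List[List[int]]:
    """Split PCM samples into packets that each hold one second of audio.

    The last packet may be shorter than one second if the input does not
    divide evenly.
    """
    return [
        list(samples[start:start + sample_rate_hz])
        for start in range(0, len(samples), sample_rate_hz)
    ]


if __name__ == "__main__":
    packets = packetize([0] * 20000)
    print([len(p) for p in packets])
```

Each resulting packet would then be passed to the voice encoding means before transmission.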
[0011]
The terminal 40 stores packet data received from the network 10 in the buffer 48. The silence/speech determination means 50 checks whether the packet data stored in the buffer 48 is silent. Specifically, when the terminal 20 captures voice, it calculates the variance of the voice data for each packet and, when the variance is equal to or less than a predetermined value, treats that packet's portion as silent and puts the data '0' in the packet. Accordingly, the silence/speech determination means 50 of the receiving terminal 40 regards a received packet as a silent packet if its data is '0'. FIG. 2 shows an example of the relationship between a voice signal to be transmitted and the packets; in that example the fourth packet is silent.
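The sender-side variance test and the receiver-side check for '0' might look roughly like the following Python sketch. The threshold value, the byte string used for the '0' marker, and the 16-bit packing that stands in for the voice encoding means are all assumptions; the description gives none of these details.

```python
import struct
from statistics import pvariance
from typing import Sequence

SILENCE_MARKER = b"0"        # assumed encoding of the data '0'
VARIANCE_THRESHOLD = 100.0   # assumed "predetermined value"


def encode_packet(samples: Sequence[int],
                  threshold: float = VARIANCE_THRESHOLD) -> bytes:
    """Sender side: mark low-variance packets as silent.

    Non-silent packets are packed as little-endian 16-bit PCM, which is
    only a placeholder for the real voice encoding means. The input is
    assumed to be non-empty.
    """
    if pvariance(samples) <= threshold:
        return SILENCE_MARKER
    return struct.pack(f"<{len(samples)}h", *samples)


def is_silent(packet: bytes) -> bool:
    """Receiver side: a packet whose data is '0' is a silent packet."""
    return packet == SILENCE_MARKER


if __name__ == "__main__":
    quiet = [3, -2, 1, 0] * 10
    loud = [1000, -1000] * 10
    print(is_silent(encode_packet(quiet)), is_silent(encode_packet(loud)))
```

Because the placeholder encoding always produces an even number of bytes, it cannot collide with the one-byte marker; a real codec would need its own unambiguous silence flag.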
[0012]
FIG. 3 is a flowchart of the voice reproduction process in the terminal 40. The terminal waits until one or more packets are stored in the buffer 48 (S1) and, once there is at least one packet (S1), checks whether there is exactly one packet (S2). When no delay has occurred in voice transmission, the buffer 48 always holds exactly one voice packet. If there is one packet (S2), the encoded voice data of that packet is supplied to the voice decoding means 54 (S4). The voice decoding means 54 decodes the input encoded voice data and supplies it to the speaker 46 for voice output. The packet that has been output is deleted from the buffer 48 (S5), and the process returns to S1.
[0013]
If there are two or more packets (S2), voice reproduction is delayed, so voice output of silent packets is skipped. That is, the silence/speech determination means 50 checks whether the head packet is silent (S3). If it is silent, the packet is deleted from the buffer 48 (S5) and the process returns to S1. If it is voiced (S3), the encoded voice data of the head packet is decoded by the voice decoding means 54 and output from the speaker 46 (S4); the packet that has been output is then deleted from the buffer 48 (S5), and the process returns to S1.
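One pass through steps S1 to S5 of FIG. 3 can be sketched in Python as follows. The function name playback_step, the play callback that stands in for decoding and speaker output, and the byte string used for the '0' marker are hypothetical names chosen for this illustration.

```python
from collections import deque
from typing import Callable, Deque

SILENCE_MARKER = b"0"  # assumed encoding of the data '0'


def playback_step(buffer: Deque[bytes],
                  play: Callable[[bytes], None]) -> str:
    """Handle the head packet of the FIFO buffer once.

    Returns "wait" when the buffer is empty, "skipped" when a silent
    packet was dropped because of a backlog, and "played" otherwise.
    """
    if not buffer:                        # S1: nothing to reproduce yet
        return "wait"
    backlog = len(buffer) >= 2            # S2: two or more packets means delay
    packet = buffer[0]
    if backlog and packet == SILENCE_MARKER:   # S3: silent head packet
        action = "skipped"
    else:
        play(packet)                      # S4: decode and output
        action = "played"
    buffer.popleft()                      # S5: delete the packet
    return action


if __name__ == "__main__":
    buf = deque([b"aa", b"0", b"bb"])
    out = []
    while buf:
        print(playback_step(buf, out.append))
```

Note that a silent packet is still reproduced when it is the only packet in the buffer, since in that case no delay is assumed to exist.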
[0014]
In this embodiment, when voice reproduction is delayed, the reproduction output of silent data is skipped, so the delay in voice reproduction is reduced by the duration of the skipped silence.
[0015]
In the above embodiment, the transmitting side embeds specific data in the packets of silent portions, but time information may also be put in each packet to be transmitted. The time information may be clock-time information or information on the time elapsed since the start of voice transmission. In this case, as shown in FIG. 4, between the input of the voice to be transmitted at the terminal 20 and the reproduction output of the received voice at the terminal 40 there is a small delay corresponding to the time required for voice encoding on the transmitting side, transmission, and voice decoding on the receiving side. This delay is denoted δt.
[0016]
FIG. 5 shows a flowchart of audio processing in the audio receiving terminal 40 in such a modification.
[0017]
The head packet is taken from the buffer 48 (S11) and its time information is compared with real time. If the difference is equal to or less than the small time δt (S12), no serious delay has occurred in voice reproduction, so the encoded voice data of the packet is decoded and output as voice (S14), the packet is deleted from the buffer 48 (S15), and the process returns to the processing of the next packet (S11).
[0018]
If the difference is larger than δt (S12), a serious delay has occurred, and it is checked whether the head packet is silent (S13). If the packet is silent (S13), it is deleted from the buffer 48 without being decoded (S15), and the process returns to the processing of the next packet (S11). If the packet is not silent (S13), its encoded voice data is decoded and output as voice (S14), the packet is deleted from the buffer 48 (S15), and the process returns to the processing of the next packet (S11).
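The timestamp-based variant of FIG. 5 (S11 to S15) might be sketched in Python as below. The Packet type, the field name sent_at, the caller-supplied clock value now, and the "wait" result for an empty buffer are assumptions for illustration; the description does not say how the two clocks are synchronized or what happens when the buffer is empty.

```python
from collections import deque
from typing import Callable, Deque, NamedTuple

SILENCE_MARKER = b"0"  # assumed encoding of the data '0'


class Packet(NamedTuple):
    sent_at: float   # time information added by the transmitting terminal
    payload: bytes   # encoded voice data, or the silence marker


def timestamp_step(buffer: Deque[Packet], now: float, delta_t: float,
                   play: Callable[[bytes], None]) -> str:
    """Handle the head packet once, following S11 to S15 of FIG. 5."""
    if not buffer:
        return "wait"                      # assumed behaviour when empty
    packet = buffer[0]                     # S11: take the head packet
    late = (now - packet.sent_at) > delta_t          # S12
    if late and packet.payload == SILENCE_MARKER:    # S13
        action = "skipped"                 # not decoded
    else:
        play(packet.payload)               # S14: decode and output
        action = "played"
    buffer.popleft()                       # S15: delete the packet
    return action


if __name__ == "__main__":
    buf = deque([Packet(0.0, b"aa"), Packet(1.0, b"0"), Packet(2.0, b"bb")])
    out = []
    for clock in (0.1, 2.5, 2.5):
        print(timestamp_step(buf, clock, 0.2, out.append))
```

In this sketch a late voiced packet is still reproduced, so the delay shrinks only when silent packets are encountered, which matches the behaviour described above.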
[0019]
As described above, since silent packets are not decoded (and not output as voice), the voice delay is reduced by the length of each skipped silent section.
[0020]
[Effects of the invention]
As can be easily understood from the above description, according to the present invention, the delay of sound reproduction due to transmission or the like can be effectively eliminated without affecting the reproduced sound output.
[Brief description of the drawings]
FIG. 1 is a schematic block diagram of an embodiment of the present invention.
FIG. 2 is a schematic diagram showing correspondence between voice and packets.
FIG. 3 is a flowchart of a sound reproduction process according to the embodiment.
FIG. 4 is an explanatory diagram of a delay such as a transmission delay.
FIG. 5 is a flowchart of a sound reproduction process according to a modification of the embodiment.
[Explanation of symbols]
10: Communication network
20, 40: Voice communication terminals
22, 42: Computers
24, 44: Microphones
26, 46: Speakers
28, 48: Voice buffers
30, 50: Silence/speech determination means
32, 52: Voice encoding means
34, 54: Voice decoding means

Claims

A voice transmission system in which a plurality of terminals on a network each comprise voice encoding means, voice decoding means, a buffer for temporarily storing received voice information, and silence/speech determination means for determining whether received encoded voice data is silent, and in which voice is transmitted among the plurality of terminals, wherein a transmitting terminal adds, in advance, time information on the start time of voice transmission to voice information and sends it to the network, and a receiving terminal bypasses decoding of silent data when the difference between the time information added to the received voice information and real time is equal to or greater than a predetermined value.

A receiving terminal for receiving encoded voice data transmitted from a transmitting terminal on a network, comprising:
a buffer for temporarily storing the received encoded voice data;
voice decoding means for decoding the encoded voice data stored in the buffer;
determining means for determining whether the received encoded voice data is silent; and
control means for bypassing decoding of the data determined to be silent by the determining means when the difference between real-time information and the time information on the start time of voice transmission, added to the received voice information by the transmitting terminal, is equal to or greater than a predetermined value.