JP2004253856A

JP2004253856A - Communication processing method

Info

Publication number: JP2004253856A
Application number: JP2003039321A
Authority: JP
Inventors: Masakazu Yano; 雅一矢野
Original assignee: Matsushita Electric Industrial Co Ltd
Current assignee: Panasonic Holdings Corp
Priority date: 2003-02-18
Filing date: 2003-02-18
Publication date: 2004-09-09

Abstract

PROBLEM TO BE SOLVED: To provide a communication processing method that allows a call over an Internet phone to proceed without loss of communication efficiency even when the voice is interrupted or delayed.

SOLUTION: When an information terminal 119 calls an information terminal 149 to talk, a voice data generating means 102 of the information terminal 119 generates voice data from the input voice signal. A voice recognition means 103 converts the generated voice data into character data, and a character data generating means 104 generates the character data. The information terminal 119 transmits the generated character data to the information terminal 149 together with the voice data. The information terminal 149 receives the character data from the information terminal 119 together with the voice data. A voice output means 138 outputs voice from the received voice data, and a character data display means 137 displays the character data on its screen together with the voice output.

COPYRIGHT: (C)2004, JPO&NCIPI

Description

【０００１】
【発明の属する技術分野】
本発明は、インターネットを用いて音声・映像データの双方向通信を行うインターネット電話の通信処理方法に関する。
【０００２】
【従来の技術】
インターネットの発展により、ＩＰ網を利用して音声伝送を行うインターネット電話が普及しつつある。最近では、高圧縮映像符号化技術の進歩により、映像伝送を伴うインターネットテレビ電話も登場している。
【０００３】
インターネット電話の最もシンプルな形は、ＩＰ網に接続された２台のＰＣ（パーソナルコンピュータ）間で通話を行うものである。それ以外に、両側とも公衆網に接続された電話機を使用しているが、途中の伝送路にＩＰ網を使用している場合や、片側がＰＣで片側が電話機の形態もある。相互接続するための通信プロトコルにはＨ．３２３が採用されている。Ｈ．３２３は、ＩＴＵ−Ｔ（国際電気通信連合−標準化セクタ）において１９９６年に制定された通信プロトコルであり、サービス保証を行わないＬＡＮによるマルチメディア通信の端末機器およびサービスについて規定している。Ｈ．３２３では呼接続などは信頼性のあるトランスポート（ＴＣＰなど）を利用しているが、音声・映像伝送は遅延時間の増加を防ぐため、信頼性のないトランスポート（ＵＤＰなど）を利用している（例えば非特許文献１参照）。
【０００４】
インターネット電話では、音声もＩＰパケットという形で規格化されているため、データと混在してＩＰ網上で送受信できる。このためデータ通信と別に用意していた音声用通信回線の設備をデータ通信網に統合することが可能である。また、ホワイトボードやアプリケーション共有といったネットワークでの協調作業を支援する機能も搭載し、聴覚と視覚とによって情報伝達することも可能である。
【０００５】
一方、インターネット電話の課題として、音質の保証ができないという点がある。現在、インターネットなどのＩＰ網ではＱｏＳ（ＱｕａｌｉｔｙｏｆＳｅｒｖｉｃｅ）が保証されていないので、トラヒックが上がると、遅延が大きくなったり、パケットが消失する確率が上がり音質が悪化する。さらに、遅延により普段は知覚されないエコーも気になり出す。また、パケットの消失が起きると、音が途切れたりすることもあり、相手の声を聞きとり難い場合が生じる。
【０００６】
従来、相手の声を聞きとり難いあいまいな場合でもコミュニケーション効率を向上する手段として、受信した音声信号を文字情報に変換し視覚的に表示する方法が提案されている（例えば特許文献１参照）。
【０００７】
【非特許文献１】
ＩＴＵ−ＴＲｅｃｏｍｍｅｎｄａｔｉｏｎＨ．３２３， ”ＶｉｓｕａｌＴｅｌｅｐｈｏｎｅＳｙｓｔｅｍｓａｎｄＥｑｕｉｐｍｅｎｔｆｏｒＬｏｃａｌＡｒｅａＮｅｔｗｏｒｋｓＷｈｉｃｈＰｒｏｖｉｄｅａＮｏｎ−ＧｕａｒａｎｔｅｅｄＱｕａｌｉｔｙｏｆＳｅｒｖｉｃｅ，”ＩＴＵ−Ｔ，１９９６．
【特許文献１】
特開平２−１８５１５６号公報（第１図）
【０００８】
【発明が解決しようとする課題】
しかしながら、従来の方法では、受信した音声信号を文字変換して視覚的に表示するので、音声信号を受信し損ねた場合、正確な内容を文字表示できないという課題がある。そこで、本発明では音声信号を受信し損ねた場合でも、コミュニケーション効率を低下させることなく通話できる通信処理方法を提供することを目的とする。
【０００９】
【課題を解決するための手段】
この課題を解決するために本発明は、以下のような特徴を有する。
【００１０】
請求項１に記載の発明は、音声入力手段と、入力された音声信号から音声データを作成する音声データ作成手段と、音声データを文字データに変換する音声認識手段と、文字データ作成手段と、音声データを送受信する音声データ送受信手段と、音声データと共に前記文字データ作成手段により作成された文字データを送受信する文字データ送受信手段と、受信した音声データを出力する音声出力手段と、受信した文字データを表示する文字データ表示手段とを備えた端末間で所定のネットワークを介して通信することを特徴とする通信処理方法である。
【００１１】
送信端末では、音声認識手段により変換した文字データを音声データと共に送信し、受信端末では、音声データと共に文字データを受信する。受信した文字データは音声出力と共に画面に表示される。文字データは音声データより比較的データ量が少ないのでより確実に受信することができる。したがって、音声通話を行う際に、音の途切れや遅延が発生した場合でも、送信端末のユーザが話した内容が画面に文字表示されるので、コミュニケーション効率を低下させることなく通話することができる。
【００１２】
請求項２に記載の発明は、映像入力手段と、入力された映像信号から映像データを作成する映像データ作成手段と、映像データを文字データに変換する映像認識手段と、文字データ作成手段と、映像データを送受信する映像データ送受信手段と、映像データと共に前記文字データ作成手段により作成された文字データを送受信する文字データ送受信手段と、受信した映像データを出力する映像出力手段と、受信した文字データを表示する文字データ表示手段とを備えた端末間で所定のネットワークを介して通信することを特徴とする通信処理方法である。
【００１３】
送信端末では、映像認識手段により変換した文字データを映像データと共に送信し、受信端末では、映像データと共に文字データを受信する。受信した文字データは映像出力と共に画面に表示される。文字データは映像データより比較的データ量が少ないのでより確実に受信することができる。したがって、手話による会話を行う際に、映像の途切れや遅延が発生した場合でも、映像出力と共に送信端末のユーザが伝えたい内容が画面に文字表示されるので、コミュニケーション効率を低下させることなく通話することができる。
【００１４】
請求項３に記載の発明は、音声入力手段と、入力された音声信号から音声データを作成する音声データ作成手段と、映像入力手段と、入力された映像信号から映像データを作成する映像データ作成手段と、音声データを文字データに変換する音声認識手段と、文字データ作成手段と、音声データを送受信する音声データ送受信手段と、映像データを送受信する映像データ送受信手段と、音声データと共に前記文字データ作成手段により作成された文字データを送受信する文字データ送受信手段と、受信した音声データを出力する音声出力手段と、受信した映像データを出力する映像出力手段と、受信した文字データを表示する文字データ表示手段とを備えた端末間で所定のネットワークを介して通信することを特徴とする通信処理方法である。
【００１５】
送信端末では、音声データ、映像データと共に音声認識手段により変換した文字データを送信し、受信端末では、音声データ、映像データと共に文字データを受信する。受信した文字データは音声出力、映像出力と共に画面に表示される。文字データは音声データより比較的データ量が少ないのでより確実に受信することができる。したがって、テレビ電話を行う際に、音声の途切れや遅延が発生した場合でも、音声出力、映像出力と共に送信端末のユーザが話した内容が画面に文字表示されるので、コミュニケーション効率を低下させることなく通話することができる。
【００１６】
請求項４に記載の発明は、音声入力手段と、入力された音声信号から音声データを作成する音声データ作成手段と、映像入力手段と、入力された映像信号から映像データを作成する映像データ作成手段と、映像データを文字データに変換する映像認識手段と、文字データ作成手段と、音声データを送受信する音声データ送受信手段と、映像データを送受信する映像データ送受信手段と、映像データと共に前記文字データ作成手段により作成された文字データを送受信する文字データ送受信手段と、受信した音声データを出力する音声出力手段と、受信した映像データを出力する映像出力手段と、受信した文字データを表示する文字データ表示手段とを備えた端末間で所定のネットワークを介して通信することを特徴とする通信処理方法である。
【００１７】
送信端末では、音声データ、映像データと共に映像認識手段により変換した文字データを送信し、受信端末では、音声データ、映像データと共に文字データを受信する。受信した文字データは音声出力、映像出力と共に画面に表示される。文字データは映像データより比較的データ量が少ないのでより確実に受信することができる。したがって、テレビ電話で手話会話を行う際に、映像のコマ落ちや遅延が発生した場合でも、音声出力、映像出力と共に送信端末のユーザが伝えたい内容が受信端末に文字表示されるので、コミュニケーション効率を低下させることなく通話することができる。
【００１８】
請求項５に記載の発明は、端末が、文字データの言語を翻訳する言語翻訳手段を備え、言語翻訳手段により翻訳された文字データを送信、または受信した文字データを言語翻訳手段により翻訳することを特徴とする請求項１ないし請求項４いずれかに記載の通信処理方法である。
【００１９】
受信端末のユーザが使用する言語に応じて、送信端末で言語翻訳手段により文字データを言語変換して送信し、受信端末で、受信した文字データを画面表示することで、受信端末のユーザは、自国語で文字を読むことができるので、コミュニケーション効率をさらに向上させることができる。また、受信端末で、受信した文字データを言語翻訳手段により言語変換して画面表示することで、コミュニケーション効率をさらに向上させることもできる。
【００２０】
請求項６に記載の発明は、端末が、受信した音声データのデータ紛失の有無を検知する音声データ紛失検知手段と、受信した文字データを音声データに変換する音声合成手段とを備え、音声データ紛失検知手段により音声データの紛失を検知した場合、前記音声合成手段により作成された音声データを音声出力することを特徴とする請求項１または請求項３に記載の通信処理方法である。
【００２１】
受信端末で、受信した音声データに紛失があった場合に、受信した文字データから音声合成手段により音声データを作成して音声出力する。紛失した音声データを補い音声出力することができるので、コミュニケーション効率をさらに向上させることができる。
【００２２】
請求項７に記載の発明は、端末が、受信した映像データのデータ紛失の有無を検知する映像データ紛失検知手段と、受信した文字データを映像データに変換する映像合成手段とを備え、映像データ紛失検知手段により映像データの紛失を検知した場合、前記映像合成手段により作成された映像データを映像出力することを特徴とする請求項２または請求項４に記載の通信処理方法である。
【００２３】
受信端末で、受信した映像データに紛失があった場合に、受信した文字データから映像合成手段により映像データを作成して映像出力する。紛失した映像データを補い映像出力することができるので、コミュニケーション効率をさらに向上させることができる。
【００２４】
請求項８に記載の発明は、端末が、文字データの言語を翻訳する言語翻訳手段を備え、言語翻訳手段により翻訳された文字データを送信、または受信した文字データを言語翻訳手段により翻訳することを特徴とする請求項６または請求項７に記載の通信処理方法である。
【００２５】
受信端末のユーザが使用する言語に応じて、送信端末で言語翻訳手段により文字データを言語変換して送信し、受信端末では、受信した文字データを画面表示する。さらに音声データに紛失があった場合には、受信した文字データから音声合成手段により音声データを作成して音声出力するか、または映像データに紛失があった場合には受信した文字データから映像合成手段により映像データを作成して映像出力する。受信端末のユーザは、自国語で文字を読むことができ、かつ音声データに紛失があった場合には、自国語で音声出力され、また映像データに紛失があった場合に、映像データを補い映像出力することができるので、コミュニケーション効率をさらに向上させることができる。また、受信端末で、受信した文字データを言語翻訳手段により言語変換して画面表示し、さらに音声データに紛失があった場合に、言語変換された文字データから音声合成手段により音声データを作成して音声出力するか、または映像データに紛失があった場合には受信した文字データから映像合成手段により映像データを作成して映像出力することで、コミュニケーション効率をさらに向上させることもできる。
【００２６】
【発明の実施の形態】
以下、本発明の実施の形態について、図１を用いて説明する。
【００２７】
（実施の形態１）
図１は本発明の一実施の形態による通信処理方法を示す構成図である。図１において、１０１、１３１は音声入力手段、１０２、１３２は音声データ作成手段、１０３、１３３は音声データを文字データに変換する音声認識手段、１０４、１３４は文字データ作成手段、１０５、１３５は音声データ送受信手段、１０６、１３６は文字データ送受信手段、１０７、１３７は文字データ表示手段、１０８、１３８は音声出力手段、１０９、１３９は映像入力手段、１１０、１４０は映像データ作成手段、１１１、１４１は映像データを文字データに変換する映像認識手段、１１２、１４２は映像データ送受信手段、１１３、１４３は映像出力手段、１１４、１４４は文字データの言語を翻訳する言語翻訳手段、１１５、１４５は音声データ紛失検知手段、１１６、１３６は映像データ紛失検知手段、１１７、１３７は音声合成手段、１１８、１３８は映像合成手段、１１９、１４９は同じ手段を有している情報端末であり、それぞれの手段を選択的に使用できる。情報端末１１９と情報端末１４９は、インターネット１２１を介して接続されている。
【００２８】
以上のように構成された本発明の通信処理方法について、その動作を説明する。
【００２９】
情報端末１１９から情報端末１４９を呼び出して通話を行いたいとする。
【００３０】
まず、情報端末１１９において、音声入力手段１０１、音声データ作成手段１０２、音声認識手段１０３、文字データ作成手段１０４、音声データ送受信手段１０５、文字データ送受信手段１０６、文字データ表示手段１０７、音声出力手段１０８を用い、情報端末１４９において、音声入力手段１３１、音声データ作成手段１３２、音声認識手段１３３、文字データ作成手段１３４、音声データ送受信手段１３５、文字データ送受信手段１３６、文字データ表示手段１３７、音声出力手段１３８を用いた場合について説明する。情報端末１１９では、音声入力手段１０１により音声信号が入力され、入力された音声信号から音声データ作成手段１０２により音声データが作成される。作成された音声データは音声認識手段１０３により文字データに変換され、文字データ作成手段１０４により文字データが作成される。音声データは、音声データ送受信手段１０５により情報端末１４９に送信され、文字データは、文字データ送受信手段１０６により音声データと共に情報端末１４９に送信される。
【００３１】
情報端末１４９では、音声データ送受信手段１３５と文字データ送受信手段１３６により、音声データと共に文字データが受信される。受信した音声データは、音声出力手段１３８により音声出力される。受信した文字データは、文字データ表示手段１３７により音声出力と共に画面に表示される。文字データは音声データより比較的データ量が少ないのでより確実に受信することができる。したがって、音声通話を行う際に、音の途切れや遅延が発生した場合でも、音声出力と共に送信端末のユーザが話した内容が画面に表示されるので、コミュニケーション効率を低下させることなく通話することができる。音声認識技術は公知の技術であるので説明を省略する。
【００３２】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示することにより、情報端末１４９のユーザは、自国語で文字を読むことができる。言語翻訳技術は公知の技術であるので説明を省略する。
【００３３】
また、情報端末１４９で、情報端末１１９のユーザが使用する言語に応じて、受信した文字データを言語翻訳手段１４４により言語変換して画面表示することにより、情報端末１４９のユーザは、自国語で文字を読むことができる。
【００３４】
また、情報端末１４９で音声データ紛失検知手段１４５と、音声合成手段１４７を用いた場合には、音声データ紛失検知手段１４５により、受信した音声データの紛失を検知した場合に受信した文字データを音声合成手段１１７により音声データに変換し音声出力する。紛失した音声データを補い音声出力することができる。音声データ紛失検知手段１４５は、音声データパケットのヘッダに格納されたシーケンス番号により検知することができる。
【００３５】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示し、さらに、情報端末１４９で音声データ紛失検知手段１４５と音声合成手段１４７を用いて、音声データ紛失検知手段１４５により受信した音声データの紛失を検知した場合に、受信した文字データを音声合成手段１４７により音声データに変換して音声出力することで、情報端末１４９のユーザは、自国語で文字を読むことができ、かつ音声データに紛失があった場合には、自国語で音声出力することができる。
【００３６】
次に、情報端末１１９において、音声入力手段１０１、音声データ作成手段１０２、音声認識手段１０３、文字データ作成手段１０４、音声データ送受信手段１０５、文字データ送受信手段１０６、文字データ表示手段１０７、音声出力手段１０８、映像入力手段１０９、映像データ作成手段１１０、映像データ送受信手段１１２を用い、情報端末１４９において、音声入力手段１３１、音声データ作成手段１３２、音声認識手段１３３、文字データ作成手段１３４、音声データ送受信手段１３５、文字データ送受信手段１３６、文字データ表示手段１３７、音声出力手段１３８、映像入力手段１３９、映像データ作成手段１４０、映像データ送受信手段１４２を用いた場合について説明する。情報端末１１９では、音声入力手段１０１により音声信号が入力され、入力された音声信号から音声データ作成手段１０２により音声データが作成される。また、映像入力手段１０９により映像信号が入力され、入力された映像信号から映像データ作成手段１１０により映像データが作成される。作成された音声データは音声認識手段１０３により文字データに変換され、文字データ作成手段１０４により文字データが作成される。音声データは音声データ送受信手段１０５により情報端末１４９に送信され、映像データは映像データ送受信手段１１２により情報端末１４９に送信され、文字データは文字データ送受信手段１０６により音声データ、映像データと共に情報端末１４９に送信される。
【００３７】
情報端末１４９では、音声データ送受信手段１３５、映像データ送受信手段１４２と文字データ送受信手段１３６により、音声データ、映像データと共に文字データが受信される。音声データは、音声出力手段１３８により音声出力される。映像データは映像出力手段１４３により映像出力される。文字データは、文字データ表示手段１３７により音声出力、映像出力と共に画面に表示される。文字データは音声データより比較的データ量が少ないのでより確実に受信することができる。したがって、テレビ電話を行う際に、音の途切れや遅延が発生した場合でも、送信端末のユーザが話した内容が画面に文字表示されるので、テレビ電話を行う場合に、コミュニケーション効率を低下させることなく通話することができる。
【００３８】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示することで、受信端末のユーザは、自国語で文字を読むことができる。
【００３９】
また、情報端末１４９で、情報端末１１９のユーザが使用する言語に応じて、受信した文字データを言語翻訳手段１４４により言語変換して画面表示することにより、情報端末１４９のユーザは、自国語で文字を読むことができる。
【００４０】
また、情報端末１４９で音声データ紛失検知手段１４５と、音声合成手段１４７を用いた場合には、音声データ紛失検知手段１４５により、受信した音声データの紛失を検知した場合に受信した文字データを音声合成手段１１７により音声データに変換し音声出力することで、紛失した音声データを補い音声出力することができる。
【００４１】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示し、さらに、情報端末１４９で音声データ紛失検知手段１４５と音声合成手段１４７を用いて、音声データ紛失検知手段１４５により受信した音声データの紛失を検知した場合に、受信した文字データを音声合成手段１４７により音声データに変換して音声出力することで、情報端末１４９のユーザは、自国語で文字を読むことができ、かつ音声データに紛失があった場合には、自国語で音声出力することができる。
【００４２】
次に、情報端末１１９において、文字データ作成手段１０４、文字データ送受信手段１０６、文字データ表示手段１０７、映像入力手段１０９、映像データ作成手段１１０、映像認識手段１１１、映像データ送受信手段１１２を用い、情報端末１４９において、文字データ作成手段１３４、音声データ送受信手段１３５、文字データ送受信手段１３６、文字データ表示手段１３７、映像入力手段１３９、映像データ作成手段１４０、映像認識手段１４１、映像データ送受信手段１４２を用いた場合について説明する。情報端末１１９では、映像入力手段１０９により映像信号が入力される。入力された映像信号から映像データ作成手段１１０により映像データが作成される。作成された映像データは映像認識手段１１１により文字データに変換され、文字データ作成手段１０４により文字データが作成される。映像データは、映像データ送受信手段１１２により情報端末１４９に送信され、文字データは文字データ送受信手段１０６により映像データと共に情報端末１４９に送信される。
【００４３】
情報端末１４９では、映像データ送受信手段１４２と文字データ送受信手段１３６により、映像データと共に文字データが受信される。映像データは映像出力手段１４３により映像出力される。文字データは文字データ表示手段１３７により映像出力と共に画面に表示される。文字データは映像データより比較的データ量が少ないのでより確実に受信することができる。したがって、手話で会話を行う際に、映像のコマ落ちや遅延が発生した場合でも、情報端末１１９のユーザが伝えたい内容が情報端末１４９に文字表示されるので、コミュニケーション効率を低下させることなく通話することができる。映像認識技術は公知の技術であるので説明を省略する。
【００４４】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示することにより、受信端末のユーザは、自国語で文字を読むことができる。
【００４５】
また、情報端末１４９で、情報端末１１９のユーザが使用する言語に応じて、受信した文字データを言語翻訳手段１４４により言語変換して画面表示することにより、情報端末１４９のユーザは、自国語で文字を読むことができる。
【００４６】
また、情報端末１４９で映像データ紛失検知手段１４６と、映像合成手段１４８を用いた場合には、映像データ紛失検知手段１４６により、受信した映像データの紛失を検知した場合に受信した文字データを映像合成手段１４８より映像データに変換し映像出力することにより、紛失した映像データを補い映像出力することができる。映像データ紛失検知手段は、映像データパケットのヘッダに格納されたシーケンス番号により検知することができる。
【００４７】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示し、さらに、情報端末１４９で映像データ紛失検知手段１４６と映像合成手段１４８を用いて、映像データ紛失検知手段１４６により受信した映像データの紛失を検知した場合に、受信した文字データを映像合成手段１４８により映像データに変換して映像出力することで、情報端末１４９のユーザは、自国語で文字を読むことができ、かつ映像データに紛失があった場合には、紛失した映像データを補い映像出力することができる。
【００４８】
最後に、情報端末１１９において、音声入力手段１０１、音声データ作成手段１０２、文字データ作成手段１０４、音声データ送受信手段１０５、文字データ送受信手段１０６、文字データ表示手段１０７、音声出力手段１０８、映像入力手段１０９、映像データ作成手段１１０、映像認識手段１１１、映像データ送受信手段１１２を用い、情報端末１４９において、音声入力手段１３１、音声データ作成手段１３２、音声認識手段１３３、文字データ作成手段１３４、音声データ送受信手段１３５、文字データ送受信手段１３６、文字データ表示手段１３７、音声出力手段１３８、映像入力手段１３９、映像データ作成手段１４０、映像認識手段１４１、映像データ送受信手段１４２を用いた場合について説明する。情報端末１１９では、音声入力手段１０１により音声信号が入力され、入力された音声信号から音声データ作成手段１０２により音声データが作成される。また、映像入力手段１０９により映像信号が入力され、入力された映像信号から映像データ作成手段１１０により映像データが作成される。作成された映像データは映像認識手段１１１により文字データに変換され、文字データ作成手段１０４により文字データが作成される。音声データは音声データ送受信手段１０５により情報端末１４９に送信され、映像データは映像データ送受信手段１１２により情報端末１４９に送信され、文字データは文字データ送受信手段１０６により音声データ、映像データと共に情報端末１４９に送信される。
【００４９】
情報端末１４９では、音声データ送受信手段１３５、映像データ送受信手段１４２と文字データ送受信手段１３６により、音声データ、映像データと共に文字データが受信される。データは、音声出力手段１３８により音声出力される。映像データは映像出力手段１４３により映像出力される。文字データは、文字データ表示手段１３７により音声出力、映像出力と共に画面に表示される。文字データは映像データより比較的データ量が少ないのでより確実に受信することができる。したがって、音の途切れや遅延が発生した場合でも、送信端末のユーザが伝えたい内容が画面に文字表示されるので、テレビ電話で音声を伴う手話会話を行う場合に、コミュニケーション効率を低下させることなく通話することができる。
【００５０】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示することで、受信端末のユーザは、自国語で文字を読むことができる。
【００５１】
また、情報端末１４９で、情報端末１１９のユーザが使用する言語に応じて、文字データ送受信手段１３６により受信した文字データを言語翻訳手段１４４により言語変換し、文字データ表示手段１３７により画面表示することで、情報端末１４９のユーザは、自国語で文字を読むことができる。
【００５２】
また、情報端末１４９で映像データ紛失検知手段１４６と、映像合成手段１４８を用いた場合には、映像データ紛失検知手段１４６により、受信した映像データの紛失を検知した場合に受信した文字データを映像合成手段１４８より映像データに変換し映像出力することにより、紛失した映像データを補い映像出力することができる。
【００５３】
また、情報端末１１９で、情報端末１４９のユーザが使用する言語に応じて、文字データを言語翻訳手段１１４により言語変換して送信し、情報端末１４９で、受信した文字データを文字データ表示手段１３７により画面表示し、さらに、情報端末１４９で映像データ紛失検知手段１４６と映像合成手段１４８を用いて、映像データ紛失検知手段１４６により受信した映像データの紛失を検知した場合に、受信した文字データを映像合成手段１４８により映像データに変換して映像出力することで、受信端末のユーザは、自国語で文字を読むことができ、かつ映像データに紛失があった場合には、映像データを補い映像出力することができる。
【００５４】
なお、本実施の形態では、文字データ表示手段１３７が、画面に出力するものについて示したが、文字データ表示手段１３７がプリンタに表示する構成にしても良い。
【００５５】
【発明の効果】
以上のように本発明（請求項１）によれば、送信端末では、音声データと共に音声認識手段により変換した文字データを送信し、受信端末では、音声データと共に文字データを受信する。受信した文字データは音声出力と共に画面に表示される。文字データは音声データより比較的データ量が少ないのでより確実に受信することができる。したがって、音の途切れや遅延が発生した場合でも、音声出力と共に送信端末のユーザが話した内容が画面に文字表示されるので、音声通話を行う場合に、コミュニケーション効率を低下させることなく通話することができる。
【００５６】
本発明（請求項２）によれば、送信端末では、映像データと共に映像認識手段により変換した文字データを送信し、受信端末では、映像データと共に文字データを受信する。受信した文字データは映像出力と共に画面に表示される。文字データは映像データより比較的データ量が少ないのでより確実に受信することができる。したがって、映像の途切れや遅延が発生した場合でも、映像出力と共に送信端末のユーザが伝えたい内容が画面に表示されるので、手話による会話を行う場合に、コミュニケーション効率を低下させることなく通話することができる。
【００５７】
本発明（請求項３）によれば、送信端末では、音声データ、映像データと共に音声認識手段により変換した文字データを送信し、受信端末では、音声データ、映像データと共に文字データを受信する。受信した文字データは音声出力、映像出力と共に画面に表示される。文字データは音声データより比較的データ量が少ないのでより確実に受信することができる。したがって、音声の途切れや遅延が発生した場合でも、音声出力、映像出力と共に送信端末のユーザが話した内容が画面に文字表示されるので、テレビ電話を行う場合に、コミュニケーション効率を低下させることなく通話することができる。
【００５８】
本発明（請求項４）によれば、送信端末では、音声データ、映像データと共に映像認識手段により変換した文字データを送信し、受信端末では、音声データ、映像データと共に文字データを受信する。受信した文字データは音声出力、映像出力と共に画面に表示される。文字データは映像データより比較的データ量が少ないのでより確実に受信することができる。したがって、映像のコマ落ちや遅延が発生した場合でも、音声出力、映像出力と共に送信端末のユーザが伝えたい内容が受信端末に文字表示されるので、テレビ電話で音声を伴う手話会話を行う場合に、コミュニケーション効率を低下させることなく通話することができる。
【００５９】
本発明（請求項５）によれば、受信端末のユーザが使用する言語に応じて、送信端末で言語翻訳手段により文字データを言語変換して送信し、受信端末で受信した文字データを画面表示することで、受信端末のユーザは、自国語で文字を読むことができるので、コミュニケーション効率をさらに向上させることができる。また、受信端末で、受信した文字データを言語翻訳手段により言語変換して画面表示することで、コミュニケーション効率をさらに向上させることもできる。
【００６０】
本発明（請求項６）によれば、受信端末で、受信した音声データに紛失があった場合に、受信した文字データから音声合成手段により音声データを作成して音声出力する。紛失した音声データを補い音声出力することができるので、コミュニケーション効率をさらに向上させることができる。
【００６１】
本発明（請求項７）によれば、受信端末で、受信した映像データに紛失があった場合に、受信した文字データから映像合成手段により映像データを作成して映像出力する。紛失した映像データを補い映像出力することができるので、コミュニケーション効率をさらに向上させることができる。
【００６２】
本発明（請求項８）によれば、受信端末のユーザが使用する言語に応じて、送信端末で言語翻訳手段により文字データを言語変換して送信し、受信端末では、受信した文字データを画面表示する。さらに音声データに紛失があった場合には、受信した文字データから音声合成手段により音声データを作成して音声出力するか、または映像データに紛失があった場合には受信した文字データから映像合成手段により映像データを作成して映像出力する。受信端末のユーザは、自国語で文字を読むことができ、かつ音声データに紛失があった場合には、自国語で音声出力され、また映像データに紛失があった場合に、映像データを補い映像出力されるので、コミュニケーション効率をさらに向上させることができる。また、受信端末で、受信した文字データを言語翻訳手段により言語変換して画面表示し、さらに音声データに紛失があった場合に、言語変換された文字データから音声合成手段により音声データを作成して音声出力するか、または映像データに紛失があった場合には受信した文字データから映像合成手段により映像データを作成して映像出力することで、コミュニケーション効率をさらに向上させることもできる。
【図面の簡単な説明】
【図１】本発明の一実施の形態による通信処理方法を示す構成図
【符号の説明】
１０１、１３１音声入力手段
１０２、１３２音声データ作成手段
１０３、１３３音声認識手段
１０４、１３４文字データ作成手段
１０５、１３５音声データ送受信手段
１０６、１３６文字データ送受信手段
１０７、１３７文字データ表示手段
１０８、１３８音声出力手段
１０９、１３９映像入力手段
１１０、１４０映像データ作成手段
１１１、１４１映像認識手段
１１２、１４２映像データ送受信手段
１１３、１４３映像出力手段
１１４、１４４言語翻訳手段
１１５、１４５音声データ紛失検知手段
１１６、１４６映像データ紛失検知手段
１１７、１４７音声合成手段
１１８、１４８映像合成手段
１１９、１４９情報端末
１２１インターネット

[0001]
[Technical field to which the invention belongs]
The present invention relates to a communication processing method for Internet telephony, in which audio and video data are exchanged bidirectionally over the Internet.
[0002]
[Prior art]
With the development of the Internet, Internet telephones that perform voice transmission using an IP network are becoming widespread. Recently, with the advancement of high-compression video encoding technology, Internet videophones with video transmission have also appeared.
[0003]
The simplest form of Internet telephony is a call between two PCs (personal computers) connected to an IP network. Other configurations also exist: both parties may use telephones connected to the public network while an IP network carries the call over part of the transmission path, or one party may use a PC and the other a telephone. H.323 is adopted as the communication protocol for interconnection. H.323 is a communication protocol established in 1996 by the ITU-T (International Telecommunication Union, Telecommunication Standardization Sector) that specifies terminal equipment and services for multimedia communication over LANs that provide no guaranteed quality of service. In H.323, call connection and similar control functions use a reliable transport (such as TCP), whereas audio and video transmission uses an unreliable transport (such as UDP) to avoid increasing the delay (see, for example, Non-Patent Document 1).
[0004]
In Internet telephony, voice is also carried in the standardized form of IP packets, so it can be sent and received over an IP network alongside other data. This makes it possible to merge voice communication lines, which were previously provisioned separately from data communication, into the data network. Internet telephones also provide functions that support collaborative work over the network, such as whiteboards and application sharing, so information can be conveyed both aurally and visually.
[0005]
On the other hand, one problem with Internet telephony is that sound quality cannot be guaranteed. At present, QoS (Quality of Service) is not guaranteed on IP networks such as the Internet, so as traffic rises, delay increases and packets are more likely to be lost, which degrades sound quality. Delay also makes echoes that normally go unnoticed become noticeable. In addition, when packets are lost the sound may break up, making the other party's voice hard to hear.
[0006]
Conventionally, as a means of improving communication efficiency even when the other party's voice is unclear and hard to hear, a method has been proposed in which the received voice signal is converted into character information and displayed visually (see, for example, Patent Document 1).
[0007]
[Non-patent document 1]
ITU-T Recommendation H.323, "Visual Telephone Systems and Equipment for Local Area Networks Which Provide a Non-Guaranteed Quality of Service," ITU-T, 1996.
[Patent Document 1]
JP-A-2-185156 (FIG. 1)
[0008]
[Problems to be solved by the invention]
However, because the conventional method converts the received voice signal into text and displays it visually, it cannot display the content accurately when part of the voice signal fails to arrive. The object of the present invention is therefore to provide a communication processing method that allows a call to proceed without loss of communication efficiency even when part of the voice signal is not received.
[0009]
[Means for Solving the Problems]
In order to solve this problem, the present invention has the following features.
[0010]
The invention according to claim 1 is a communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: voice input means; voice data creation means for creating voice data from an input voice signal; voice recognition means for converting the voice data into character data; character data creation means; voice data transmitting/receiving means for transmitting and receiving voice data; character data transmitting/receiving means for transmitting and receiving, together with the voice data, the character data created by the character data creation means; voice output means for outputting received voice data; and character data display means for displaying received character data.
[0011]
The transmitting terminal transmits the character data converted by the voice recognition means together with the voice data, and the receiving terminal receives the character data together with the voice data. The received character data is displayed on the screen together with the audio output. Since the character data has a relatively smaller data amount than the voice data, it can be received more reliably. Therefore, even if the sound is interrupted or delayed when making a voice call, the contents spoken by the user of the transmitting terminal are displayed on the screen in characters, so that the call can be made without lowering the communication efficiency.
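As a rough illustration of the data-volume premise in the paragraph above, the following sketch compares the bytes needed for a stretch of telephone-quality voice with the bytes needed for the text of the same speech. All constants are assumptions chosen for illustration and do not come from the specification: 64 kbit/s voice (8 kHz, 8-bit PCM), a speaking rate of about 5 characters per second, and 3 bytes per character as a UTF-8 upper bound for Japanese text.

```python
# Illustrative comparison only; every constant is an assumption, not a value from the specification.
VOICE_BITRATE_BPS = 64_000   # assumed: 8 kHz x 8-bit PCM telephone audio
CHARS_PER_SECOND = 5         # assumed speaking rate in characters per second
BYTES_PER_CHAR = 3           # assumed UTF-8 size of one Japanese character


def voice_bytes(seconds: float) -> int:
    """Bytes needed to carry `seconds` of voice at the assumed bitrate."""
    return int(VOICE_BITRATE_BPS / 8 * seconds)


def text_bytes(seconds: float) -> int:
    """Bytes needed to carry the recognized text of `seconds` of speech."""
    return int(CHARS_PER_SECOND * BYTES_PER_CHAR * seconds)


if __name__ == "__main__":
    for s in (1, 10, 60):
        v, t = voice_bytes(s), text_bytes(s)
        print(f"{s:>3} s: voice {v} bytes, text {t} bytes, ratio about {v // t}:1")
```

Under these assumptions the character data is a few hundred times smaller than the voice data; a compressed voice codec would narrow the gap but not close it.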
[0012]
The invention according to claim 2 is a communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: video input means; video data creation means for creating video data from an input video signal; video recognition means for converting the video data into character data; character data creation means; video data transmitting/receiving means for transmitting and receiving video data; character data transmitting/receiving means for transmitting and receiving, together with the video data, the character data created by the character data creation means; video output means for outputting received video data; and character data display means for displaying received character data.
[0013]
The transmitting terminal transmits the character data converted by the video recognition means together with the video data, and the receiving terminal receives the character data together with the video data. The received character data is displayed on the screen together with the video output. Because character data is relatively small compared with video data, it can be received more reliably. Therefore, even if the video is interrupted or delayed during a sign language conversation, what the user of the transmitting terminal wants to convey is displayed as text on the screen together with the video output, so the conversation can proceed without loss of communication efficiency.
[0014]
The invention according to claim 3 is a communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: voice input means; voice data creation means for creating voice data from an input voice signal; video input means; video data creation means for creating video data from an input video signal; voice recognition means for converting the voice data into character data; character data creation means; voice data transmitting/receiving means for transmitting and receiving voice data; video data transmitting/receiving means for transmitting and receiving video data; character data transmitting/receiving means for transmitting and receiving, together with the voice data, the character data created by the character data creation means; voice output means for outputting received voice data; video output means for outputting received video data; and character data display means for displaying received character data.
[0015]
The transmitting terminal transmits the character data converted by the voice recognition means together with the voice data and the video data, and the receiving terminal receives the character data together with the voice data and the video data. The received character data is displayed on the screen together with the voice output and the video output. Because character data is relatively small compared with voice data, it can be received more reliably. Therefore, even if the voice is interrupted or delayed during a videophone call, what the user of the transmitting terminal said is displayed as text on the screen together with the voice output and the video output, so the call can proceed without loss of communication efficiency.
[0016]
The invention according to claim 4 is a communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: voice input means; voice data creation means for creating voice data from an input voice signal; video input means; video data creation means for creating video data from an input video signal; video recognition means for converting the video data into character data; character data creation means; voice data transmitting/receiving means for transmitting and receiving voice data; video data transmitting/receiving means for transmitting and receiving video data; character data transmitting/receiving means for transmitting and receiving, together with the video data, the character data created by the character data creation means; voice output means for outputting received voice data; video output means for outputting received video data; and character data display means for displaying received character data.
[0017]
The transmitting terminal transmits the character data converted by the video recognition means together with the voice data and the video data, and the receiving terminal receives the character data together with the voice data and the video data. The received character data is displayed on the screen together with the voice output and the video output. Because character data is relatively small compared with video data, it can be received more reliably. Therefore, even if video frames are dropped or delayed during a sign language conversation over a videophone, what the user of the transmitting terminal wants to convey is displayed as text on the receiving terminal together with the voice output and the video output, so the call can proceed without loss of communication efficiency.
[0018]
The invention according to claim 5 is the communication processing method according to any one of claims 1 to 4, characterized in that the terminal comprises language translation means for translating the language of the character data, and either transmits character data translated by the language translation means or translates received character data with the language translation means.
[0019]
In accordance with the language used by the user of the receiving terminal, the transmitting terminal translates the character data with the language translation means and transmits it, and the receiving terminal displays the received character data on the screen. The user of the receiving terminal can thus read the text in his or her native language, which further improves communication efficiency. Alternatively, the receiving terminal can translate the received character data with the language translation means and display it on the screen, which likewise further improves communication efficiency.
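The two placements of translation described above (at the transmitting terminal or at the receiving terminal) can be sketched as follows. This is a minimal illustration, not the method of the specification: the phrase table stands in for a real language translation means, and the language tag sent along with the text is an assumption introduced here so that the receiver knows whether translation is still needed.

```python
from typing import Dict, Tuple

# Toy phrase table standing in for language translation means 114/144 (hypothetical).
PHRASES: Dict[Tuple[str, str], Dict[str, str]] = {
    ("ja", "en"): {"こんにちは": "Hello"},
}


def translate(text: str, src: str, dst: str) -> str:
    """Translate `text` from `src` to `dst`; unknown phrases pass through unchanged."""
    if src == dst:
        return text
    return PHRASES.get((src, dst), {}).get(text, text)


def sender_prepare(text: str, sender_lang: str, receiver_lang: str,
                   translate_at_sender: bool) -> Tuple[str, str]:
    """Return the (text, language tag) pair the transmitting terminal sends."""
    if translate_at_sender:
        return translate(text, sender_lang, receiver_lang), receiver_lang
    return text, sender_lang


def receiver_display(text: str, text_lang: str, receiver_lang: str) -> str:
    """Return the text the receiving terminal shows, translating only if needed."""
    return translate(text, text_lang, receiver_lang)
```

Either way the receiving user sees the same text; the choice only moves the translation work from one terminal to the other.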
[0020]
The invention according to claim 6 is the communication processing method according to claim 1 or claim 3, characterized in that the terminal comprises voice data loss detection means for detecting whether any of the received voice data has been lost, and voice synthesis means for converting the received character data into voice data, and in that, when the voice data loss detection means detects a loss of voice data, the voice data created by the voice synthesis means is output as voice.
[0021]
In the receiving terminal, if the received voice data is lost, voice data is created from the received character data by voice synthesis means and output as voice. Since lost voice data can be supplemented and output as voice, communication efficiency can be further improved.
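The fallback described above can be sketched as follows. The embodiment states that loss can be detected from the sequence number stored in the voice packet header; everything else here is an assumption made for illustration, in particular that each piece of character data is tagged with the sequence number of the voice packet it corresponds to, and that `fake_synthesize` stands in for a real voice synthesis means.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass
class VoicePacket:
    seq: int        # sequence number carried in the packet header
    payload: bytes  # encoded voice samples


def find_lost_sequences(received: List[VoicePacket], first: int, last: int) -> List[int]:
    """Return the sequence numbers in [first, last] that never arrived."""
    seen = {p.seq for p in received}
    return [n for n in range(first, last + 1) if n not in seen]


def build_playout(received: List[VoicePacket], text_by_seq: Dict[int, str],
                  first: int, last: int,
                  synthesize: Callable[[str], bytes]) -> List[bytes]:
    """Use received voice where available; synthesize speech from text for the gaps."""
    by_seq = {p.seq: p.payload for p in received}
    playout: List[bytes] = []
    for n in range(first, last + 1):
        if n in by_seq:
            playout.append(by_seq[n])
        elif n in text_by_seq:
            playout.append(synthesize(text_by_seq[n]))
        # If neither voice nor text arrived, nothing can be played for this slot.
    return playout


def fake_synthesize(text: str) -> bytes:
    """Stand-in for a real speech synthesizer (hypothetical)."""
    return b"TTS:" + text.encode("utf-8")
```

For example, with packets 1 and 3 received and text available for packet 2, `build_playout` returns the two received payloads with the synthesized segment between them.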
[0022]
The invention according to claim 7 is the communication processing method according to claim 2 or claim 4, characterized in that the terminal comprises video data loss detection means for detecting whether any of the received video data has been lost, and video synthesis means for converting the received character data into video data, and in that, when the video data loss detection means detects a loss of video data, the video data created by the video synthesis means is output as video.
[0023]
In the receiving terminal, when the received video data is lost, video data is created by the video synthesizing unit from the received character data, and the video data is output. Since lost video data can be supplemented and output as a video, communication efficiency can be further improved.
[0024]
The invention according to claim 8 is the communication processing method according to claim 6 or claim 7, characterized in that the terminal comprises language translation means for translating the language of the character data, and either transmits character data translated by the language translation means or translates received character data with the language translation means.
[0025]
In accordance with the language used by the user of the receiving terminal, the transmitting terminal translates the character data with the language translation means and transmits it, and the receiving terminal displays the received character data on the screen. In addition, if voice data has been lost, voice data is created from the received character data by the voice synthesis means and output as voice; or, if video data has been lost, video data is created from the received character data by the video synthesis means and output as video. The user of the receiving terminal can read the text in his or her native language; moreover, lost voice data is replaced by voice output in the native language, and lost video data is supplemented by synthesized video output, so communication efficiency can be further improved. Alternatively, the receiving terminal may translate the received character data with the language translation means and display it on the screen and, if voice data has been lost, create voice data from the translated character data by the voice synthesis means and output it as voice, or, if video data has been lost, create video data from the received character data by the video synthesis means and output it as video; this too further improves communication efficiency.
[0026]
[Embodiments of the invention]
An embodiment of the present invention will be described below with reference to FIG. 1.
[0027]
(Embodiment 1)
FIG. 1 is a configuration diagram showing a communication processing method according to an embodiment of the present invention. In FIG. 1, 101 and 131 are voice input means; 102 and 132 are voice data creation means; 103 and 133 are voice recognition means for converting voice data into character data; 104 and 134 are character data creation means; 105 and 135 are voice data transmitting/receiving means; 106 and 136 are character data transmitting/receiving means; 107 and 137 are character data display means; 108 and 138 are voice output means; 109 and 139 are video input means; 110 and 140 are video data creation means; 111 and 141 are video recognition means for converting video data into character data; 112 and 142 are video data transmitting/receiving means; 113 and 143 are video output means; 114 and 144 are language translation means for translating the language of the character data; 115 and 145 are voice data loss detection means; 116 and 146 are video data loss detection means; 117 and 147 are voice synthesis means; 118 and 148 are video synthesis means; and 119 and 149 are information terminals that have the same set of means, each of which can be used selectively. The information terminal 119 and the information terminal 149 are connected via the Internet 121.
[0028]
The operation of the communication processing method of the present invention configured as described above will be described.
[0029]
It is assumed that the information terminal 119 calls the information terminal 149 to make a call.
[0030]
First, the case will be described in which the information terminal 119 uses the voice input means 101, the voice data creation means 102, the voice recognition means 103, the character data creation means 104, the voice data transmitting/receiving means 105, the character data transmitting/receiving means 106, the character data display means 107, and the voice output means 108, and the information terminal 149 uses the voice input means 131, the voice data creation means 132, the voice recognition means 133, the character data creation means 134, the voice data transmitting/receiving means 135, the character data transmitting/receiving means 136, the character data display means 137, and the voice output means 138. In the information terminal 119, a voice signal is input by the voice input means 101, and voice data is created from the input voice signal by the voice data creation means 102. The created voice data is converted into character data by the voice recognition means 103, and the character data is created by the character data creation means 104. The voice data is transmitted to the information terminal 149 by the voice data transmitting/receiving means 105, and the character data is transmitted to the information terminal 149 together with the voice data by the character data transmitting/receiving means 106.
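The transmitting-side flow described above can be sketched as follows. This is only an illustrative sketch, not the implementation of the embodiment: the `Packet` type, the `build_outgoing` function, the idea of tagging the character data with the same sequence number as the voice frame it came from, and the lookup-table recognizer are all assumptions introduced for the example.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Packet:
    kind: str       # "voice" or "text"
    seq: int        # assumption: text reuses the sequence number of its voice frame
    payload: bytes


def build_outgoing(frames: List[bytes], recognize: Callable[[bytes], str]) -> List[Packet]:
    """Pair each voice frame with the character data recognized from it."""
    packets: List[Packet] = []
    for seq, frame in enumerate(frames):
        text = recognize(frame)                        # role of voice recognition means 103
        packets.append(Packet("voice", seq, frame))    # sent by transmitting/receiving means 105
        if text:                                       # character data creation means 104
            packets.append(Packet("text", seq, text.encode("utf-8")))  # sent by means 106
    return packets


# Hypothetical stand-in for a real speech recognizer: a fixed lookup table.
FAKE_TRANSCRIPTS = {b"\x01\x02": "hello", b"\x03\x04": "world"}


def fake_recognize(frame: bytes) -> str:
    return FAKE_TRANSCRIPTS.get(frame, "")


out = build_outgoing([b"\x01\x02", b"\x03\x04"], fake_recognize)
print([(p.kind, p.seq) for p in out])
```

Keeping the voice and text packets separate mirrors the separate voice data and character data transmitting/receiving means in the description; a real terminal could equally multiplex them in another way.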
[0031]
In the information terminal 149, the voice data transmitting/receiving means 135 and the character data transmitting/receiving means 136 receive the character data together with the voice data. The received voice data is output as voice by the voice output means 138. The received character data is displayed on the screen by the character data display means 137 together with the voice output. Since the character data has a relatively smaller data amount than the voice data, it can be received more reliably. Therefore, even when the voice is interrupted or delayed during a voice call, the content spoken by the user of the transmitting terminal is displayed on the screen in characters together with the voice output, so that the call can be made without lowering the communication efficiency. Since the voice recognition technique is a known technique, its description is omitted.
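To give a rough sense of how much smaller character data is than voice data, the following back-of-the-envelope calculation may help. All three constants are assumptions chosen for illustration (a 64 kbit/s voice codec, a conversational speaking rate, and single-byte English text); none of them comes from the embodiment, and packet header overhead is ignored.

```python
# Assumed figures for illustration only.
AUDIO_BITRATE_BPS = 64_000   # assumption: 64 kbit/s voice stream (G.711-style)
WORDS_PER_MINUTE = 150       # assumption: conversational speaking rate
BYTES_PER_WORD = 6           # assumption: 5 letters plus a space, 1 byte each

audio_bytes_per_sec = AUDIO_BITRATE_BPS / 8
text_bytes_per_sec = WORDS_PER_MINUTE * BYTES_PER_WORD / 60
ratio = audio_bytes_per_sec / text_bytes_per_sec

print(f"audio: {audio_bytes_per_sec:.0f} B/s")
print(f"text:  {text_bytes_per_sec:.0f} B/s")
print(f"audio is roughly {ratio:.0f} times larger")
```

Under these assumptions the text stream is several hundred times smaller than the voice stream. Text in a multi-byte encoding or a lower-bitrate codec would narrow the gap, but the character data would still be much the smaller of the two.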
[0032]
The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it. The information terminal 149 displays the received character data on the character data display means 137, so that the user of the information terminal 149 can read the characters in his or her own language. Since the language translation technique is a known technique, its description is omitted.
[0033]
Alternatively, the information terminal 149 translates the received character data by the language translating means 144, in accordance with the language used by the user of the information terminal 119, and displays it on the screen, so that the user of the information terminal 149 can read the characters in his or her own language.
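The two placements of the translation step, at the transmitting terminal or at the receiving terminal, can be contrasted with a small sketch. The `deliver` function, the `translate_at` parameter, and the one-entry toy dictionary are hypothetical names introduced only for this illustration; they stand in for the language translating means 114 and 144 and do not represent a real translation engine.

```python
from typing import Dict, Tuple

# Hypothetical toy dictionary standing in for language translating means 114/144.
TOY_DICTIONARY: Dict[Tuple[str, str, str], str] = {
    ("ja", "en", "こんにちは"): "hello",
}


def toy_translate(text: str, src: str, dst: str) -> str:
    """Translate via the toy dictionary; unknown text is passed through unchanged."""
    if src == dst:
        return text
    return TOY_DICTIONARY.get((src, dst, text), text)


def deliver(text: str, sender_lang: str, receiver_lang: str, translate_at: str) -> Tuple[str, str]:
    """Return (text sent over the network, text shown on the receiver's screen)."""
    if translate_at == "sender":        # translation before transmission (means 114)
        wire = toy_translate(text, sender_lang, receiver_lang)
        shown = wire
    elif translate_at == "receiver":    # translation after reception (means 144)
        wire = text
        shown = toy_translate(wire, sender_lang, receiver_lang)
    else:
        raise ValueError("translate_at must be 'sender' or 'receiver'")
    return wire, shown


print(deliver("こんにちは", "ja", "en", "sender"))
print(deliver("こんにちは", "ja", "en", "receiver"))
```

In both cases the receiving user sees text in his or her own language; the only difference is which terminal does the work and which language travels over the network.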
[0034]
When the information terminal 149 uses the voice data loss detecting means 145 and the voice synthesizing means 147, and the voice data loss detecting means 145 detects a loss in the received voice data, the received character data is converted into voice data by the voice synthesizing means 147 and output as voice. The lost voice data can thus be supplemented and output as voice. The voice data loss detecting means 145 can detect a loss of voice data from the sequence number stored in the header of each voice data packet.
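A minimal sketch of sequence-number-based loss detection with a synthesized fallback is shown below. It assumes, beyond what the description states, that each piece of character data carries the sequence number of the voice packet it corresponds to; `find_lost`, `fill_gaps`, and the fake synthesizer are hypothetical names, and a real terminal would call an actual speech synthesizer.

```python
from typing import Callable, Dict, List


def find_lost(received_seqs: List[int]) -> List[int]:
    """Sequence numbers missing between the lowest and highest received ones."""
    if not received_seqs:
        return []
    seen = set(received_seqs)
    return [s for s in range(min(seen), max(seen) + 1) if s not in seen]


def fill_gaps(
    voice: Dict[int, bytes],
    text: Dict[int, str],
    synthesize: Callable[[str], bytes],
) -> Dict[int, bytes]:
    """Replace each lost voice packet with audio synthesized from its text, if any."""
    filled = dict(voice)
    for seq in find_lost(sorted(voice)):
        if seq in text:                      # assumption: text is keyed by the same seq
            filled[seq] = synthesize(text[seq])
    return filled


def fake_synthesize(t: str) -> bytes:
    """Hypothetical stand-in for voice synthesizing means 147."""
    return b"TTS:" + t.encode("utf-8")


received_voice = {0: b"frame0", 2: b"frame2"}   # packet 1 was lost in transit
received_text = {0: "good", 1: "morning", 2: "everyone"}
print(find_lost(sorted(received_voice)))
print(fill_gaps(received_voice, received_text, fake_synthesize))
```

This sketch only finds gaps between packets that did arrive; a loss at the very end of a stream would need a timeout or a similar mechanism, which is outside what the description covers.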
[0035]
These functions can also be combined. The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it, and the information terminal 149 displays the received character data on the character data display means 137. When the information terminal 149 also uses the voice data loss detecting means 145 and the voice synthesizing means 147, and a loss in the received voice data is detected by the voice data loss detecting means 145, the received character data is converted into voice data by the voice synthesizing means 147 and output as voice. The user of the information terminal 149 can thus read the characters in his or her own language and, if voice data is lost, hear the voice output in that language.
[0036]
Next, the case will be described in which the information terminal 119 uses the voice input means 101, the voice data creation means 102, the voice recognition means 103, the character data creation means 104, the voice data transmitting/receiving means 105, the character data transmitting/receiving means 106, the character data display means 107, the voice output means 108, the video input means 109, the video data creation means 110, and the video data transmitting/receiving means 112, and the information terminal 149 uses the voice input means 131, the voice data creation means 132, the voice recognition means 133, the character data creation means 134, the voice data transmitting/receiving means 135, the character data transmitting/receiving means 136, the character data display means 137, the voice output means 138, the video input means 139, the video data creation means 140, and the video data transmitting/receiving means 142. In the information terminal 119, a voice signal is input by the voice input means 101, and voice data is created from the input voice signal by the voice data creation means 102. In addition, a video signal is input by the video input means 109, and video data is created from the input video signal by the video data creation means 110. The created voice data is converted into character data by the voice recognition means 103, and the character data is created by the character data creation means 104. The voice data is transmitted to the information terminal 149 by the voice data transmitting/receiving means 105, the video data is transmitted to the information terminal 149 by the video data transmitting/receiving means 112, and the character data is transmitted to the information terminal 149 by the character data transmitting/receiving means 106 together with the voice data and the video data.
[0037]
In the information terminal 149, the voice data transmitting/receiving means 135, the video data transmitting/receiving means 142, and the character data transmitting/receiving means 136 receive the character data together with the voice data and the video data. The voice data is output as voice by the voice output means 138. The video data is output by the video output means 143. The character data is displayed on the screen by the character data display means 137 together with the voice output and the video output. Since the character data has a relatively smaller data amount than the voice data, it can be received more reliably. Therefore, even if the voice is interrupted or delayed during a videophone call, the content spoken by the user of the transmitting terminal is displayed on the screen in characters, so that the videophone call can be made without lowering the communication efficiency.
[0038]
The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it. The information terminal 149 displays the received character data on the character data display means 137, so that the user of the receiving terminal can read the characters in his or her own language.
[0039]
Alternatively, the information terminal 149 translates the received character data by the language translating means 144, in accordance with the language used by the user of the information terminal 119, and displays it on the screen, so that the user of the information terminal 149 can read the characters in his or her own language.
[0040]
When the information terminal 149 uses the voice data loss detecting means 145 and the voice synthesizing means 147, and the voice data loss detecting means 145 detects a loss in the received voice data, the received character data is converted into voice data by the voice synthesizing means 147 and output as voice, so that the lost voice data can be supplemented and output as voice.
[0041]
These functions can also be combined. The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it, and the information terminal 149 displays the received character data on the character data display means 137. When the information terminal 149 also uses the voice data loss detecting means 145 and the voice synthesizing means 147, and a loss in the received voice data is detected by the voice data loss detecting means 145, the received character data is converted into voice data by the voice synthesizing means 147 and output as voice. The user of the information terminal 149 can thus read the characters in his or her own language and, if voice data is lost, hear the voice output in that language.
[0042]
Next, the case will be described in which the information terminal 119 uses the character data creation means 104, the character data transmitting/receiving means 106, the character data display means 107, the video input means 109, the video data creation means 110, the video recognition means 111, and the video data transmitting/receiving means 112, and the information terminal 149 uses the character data creation means 134, the voice data transmitting/receiving means 135, the character data transmitting/receiving means 136, the character data display means 137, the video input means 139, the video data creation means 140, the video recognition means 141, and the video data transmitting/receiving means 142. In the information terminal 119, a video signal is input by the video input means 109, and video data is created from the input video signal by the video data creation means 110. The created video data is converted into character data by the video recognition means 111, and the character data is created by the character data creation means 104. The video data is transmitted to the information terminal 149 by the video data transmitting/receiving means 112, and the character data is transmitted to the information terminal 149 together with the video data by the character data transmitting/receiving means 106.
[0043]
In the information terminal 149, the video data transmitting/receiving means 142 and the character data transmitting/receiving means 136 receive the character data together with the video data. The video data is output by the video output means 143. The character data is displayed on the screen by the character data display means 137 together with the video output. Since the character data has a relatively smaller data amount than the video data, it can be received more reliably. Therefore, even when video frames are dropped or delayed during a conversation in sign language, the content that the user of the information terminal 119 wants to convey is displayed in characters on the information terminal 149, so that the conversation can be carried out without lowering the communication efficiency. Since the video recognition technique is a known technique, its description is omitted.
[0044]
The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it. The information terminal 149 displays the received character data on the character data display means 137, so that the user of the receiving terminal can read the characters in his or her own language.
[0045]
Alternatively, the information terminal 149 translates the received character data by the language translating means 144, in accordance with the language used by the user of the information terminal 119, and displays it on the screen, so that the user of the information terminal 149 can read the characters in his or her own language.
[0046]
When the information terminal 149 uses the video data loss detecting means 146 and the video synthesizing means 148, and the video data loss detecting means 146 detects a loss in the received video data, the received character data is converted into video data by the video synthesizing means 148 and output as video, so that the lost video data can be supplemented and output as video. The video data loss detecting means 146 can detect a loss of video data from the sequence number stored in the header of each video data packet.
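Packet sequence numbers in practice are usually fixed-width counters that wrap around, so a loss detector typically has to compare them modulo the counter size. The sketch below assumes a 16-bit sequence number (the width used by RTP); the description itself does not state the width, and `count_lost` and `total_lost` are hypothetical helper names. Invoking the video synthesizing means when a loss is found is left out.

```python
from typing import Iterable, Optional

SEQ_MOD = 1 << 16  # assumption: 16-bit sequence numbers that wrap around


def count_lost(prev_seq: int, curr_seq: int) -> int:
    """Packets missing between two packets received one after the other."""
    gap = (curr_seq - prev_seq) % SEQ_MOD
    if gap == 0 or gap >= SEQ_MOD // 2:
        return 0          # duplicate, or a late/reordered packet: not counted as loss
    return gap - 1


def total_lost(seqs: Iterable[int]) -> int:
    """Accumulate losses over a stream of received sequence numbers."""
    lost = 0
    last: Optional[int] = None
    for seq in seqs:
        if last is None:
            last = seq
            continue
        lost += count_lost(last, seq)
        if (seq - last) % SEQ_MOD < SEQ_MOD // 2:
            last = seq    # only move forward; late packets do not rewind the counter
    return lost


print(count_lost(5, 9))                    # packets 6, 7 and 8 are missing
print(count_lost(65534, 1))                # wraps around: 65535 and 0 are missing
print(total_lost([65534, 65535, 1, 2]))    # only packet 0 is missing
```

A late packet that fills an earlier gap is simply ignored here rather than subtracted from the loss count, which keeps the sketch short at the cost of overcounting when packets are reordered.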
[0047]
These functions can also be combined. The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it, and the information terminal 149 displays the received character data on the character data display means 137. When the information terminal 149 also uses the video data loss detecting means 146 and the video synthesizing means 148, and a loss in the received video data is detected by the video data loss detecting means 146, the received character data is converted into video data by the video synthesizing means 148 and output as video. The user of the information terminal 149 can thus read the characters in his or her own language and, if video data is lost, the video data can be supplemented and output as video.
[0048]
Finally, the case will be described in which the information terminal 119 uses the voice input means 101, the voice data creation means 102, the character data creation means 104, the voice data transmitting/receiving means 105, the character data transmitting/receiving means 106, the character data display means 107, the voice output means 108, the video input means 109, the video data creation means 110, the video recognition means 111, and the video data transmitting/receiving means 112, and the information terminal 149 uses the voice input means 131, the voice data creation means 132, the voice recognition means 133, the character data creation means 134, the voice data transmitting/receiving means 135, the character data transmitting/receiving means 136, the character data display means 137, the voice output means 138, the video input means 139, the video data creation means 140, the video recognition means 141, and the video data transmitting/receiving means 142. In the information terminal 119, a voice signal is input by the voice input means 101, and voice data is created from the input voice signal by the voice data creation means 102. In addition, a video signal is input by the video input means 109, and video data is created from the input video signal by the video data creation means 110. The created video data is converted into character data by the video recognition means 111, and the character data is created by the character data creation means 104. The voice data is transmitted to the information terminal 149 by the voice data transmitting/receiving means 105, the video data is transmitted to the information terminal 149 by the video data transmitting/receiving means 112, and the character data is transmitted to the information terminal 149 by the character data transmitting/receiving means 106 together with the voice data and the video data.
[0049]
In the information terminal 149, the voice data transmitting/receiving means 135, the video data transmitting/receiving means 142, and the character data transmitting/receiving means 136 receive the character data together with the voice data and the video data. The voice data is output as voice by the voice output means 138. The video data is output by the video output means 143. The character data is displayed on the screen by the character data display means 137 together with the voice output and the video output. Since the character data has a relatively smaller data amount than the video data, it can be received more reliably. Therefore, even if the video is interrupted or delayed, the content that the user of the transmitting terminal wants to convey is displayed on the screen in characters, so that a sign language conversation accompanied by voice can be carried out on a videophone without lowering the communication efficiency.
[0050]
The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it. The information terminal 149 displays the received character data on the character data display means 137, so that the user of the receiving terminal can read the characters in his or her own language.
[0051]
Alternatively, the information terminal 149 translates the character data received by the character data transmitting/receiving means 136 by the language translating means 144, in accordance with the language used by the user of the information terminal 119, and displays it on the character data display means 137, so that the user of the information terminal 149 can read the characters in his or her own language.
[0052]
When the information terminal 149 uses the video data loss detecting means 146 and the video synthesizing means 148, and the video data loss detecting means 146 detects a loss in the received video data, the received character data is converted into video data by the video synthesizing means 148 and output as video, so that the lost video data can be supplemented and output as video.
[0053]
These functions can also be combined. The information terminal 119 translates the character data by the language translating means 114 according to the language used by the user of the information terminal 149 and transmits it, and the information terminal 149 displays the received character data on the character data display means 137. When the information terminal 149 also uses the video data loss detecting means 146 and the video synthesizing means 148, and a loss in the received video data is detected by the video data loss detecting means 146, the received character data is converted into video data by the video synthesizing means 148 and output as video. The user of the receiving terminal can thus read the characters in his or her own language and, if video data is lost, the video data can be supplemented and output as video.
[0054]
In the present embodiment, the character data display means 137 outputs the character data to the screen. However, the character data display means 137 may instead output the character data to a printer.
[0055]
[Effect of the Invention]
As described above, according to the present invention (claim 1), the transmitting terminal transmits the character data converted by the voice recognition means together with the voice data, and the receiving terminal receives the character data together with the voice data. The received character data is displayed on the screen together with the voice output. Since the character data has a relatively smaller data amount than the voice data, it can be received more reliably. Therefore, even if the voice is interrupted or delayed, the content spoken by the user of the transmitting terminal is displayed on the screen in characters together with the voice output, so that a voice call can be made without lowering the communication efficiency.
[0056]
According to the present invention (claim 2), the transmitting terminal transmits the character data converted by the video recognition means together with the video data, and the receiving terminal receives the character data together with the video data. The received character data is displayed on the screen together with the video output. Since the character data has a relatively smaller data amount than the video data, it can be received more reliably. Therefore, even if the video is interrupted or delayed, the content that the user of the transmitting terminal wants to convey is displayed on the screen in characters together with the video output, so that a conversation in sign language can be carried out without lowering the communication efficiency.
[0057]
According to the present invention (claim 3), the transmitting terminal transmits the character data converted by the voice recognition means together with the voice data and the video data, and the receiving terminal receives the character data together with the voice data and the video data. The received character data is displayed on the screen together with the voice output and the video output. Since the character data has a relatively smaller data amount than the voice data, it can be received more reliably. Therefore, even if the voice is interrupted or delayed, the content spoken by the user of the transmitting terminal is displayed on the screen in characters together with the voice output and the video output, so that a videophone call can be made without lowering the communication efficiency.
[0058]
According to the present invention (claim 4), the transmitting terminal transmits the character data converted by the video recognition means together with the voice data and the video data, and the receiving terminal receives the character data together with the voice data and the video data. The received character data is displayed on the screen together with the voice output and the video output. Since the character data has a relatively smaller data amount than the video data, it can be received more reliably. Therefore, even when video frames are dropped or delayed, the content that the user of the transmitting terminal wants to convey is displayed in characters on the receiving terminal together with the voice output and the video output, so that the call can be made without lowering the communication efficiency.
[0059]
According to the present invention (claim 5), the transmitting terminal translates the character data by the language translating means according to the language used by the user of the receiving terminal and transmits it, and the receiving terminal displays the received character data on the screen. The user of the receiving terminal can thus read the characters in his or her own language, so that the communication efficiency can be further improved. The communication efficiency can likewise be further improved when the receiving terminal translates the received character data by the language translating means and displays it on the screen.
[0060]
According to the present invention (claim 6), when the received voice data is lost at the receiving terminal, voice data is created by voice synthesis means from the received character data and output as voice. Since lost voice data can be supplemented and output as voice, communication efficiency can be further improved.
[0061]
According to the present invention (claim 7), when the received video data is lost at the receiving terminal, video data is created from the received character data by the video synthesizing means, and the video data is output. Since lost video data can be supplemented and output as a video, communication efficiency can be further improved.
[0062]
According to the present invention (claim 8), the transmitting terminal translates the character data by the language translating means according to the language used by the user of the receiving terminal and transmits it, and the receiving terminal displays the received character data on the screen. Furthermore, if voice data is lost, voice data is created from the received character data by the voice synthesizing means and output as voice; if video data is lost, video data is created from the received character data by the video synthesizing means and output as video. The user of the receiving terminal can therefore read the characters in his or her own language, hear the voice in that language when voice data is lost, and see supplemented video when video data is lost, so that the communication efficiency can be further improved. The same improvement is obtained when the translation is performed at the receiving terminal: the received character data is translated by the language translating means and displayed on the screen; if voice data is lost, voice data is created by the voice synthesizing means from the translated character data and output as voice; and if video data is lost, video data is created from the received character data by the video synthesizing means and output as video.
[Brief description of the drawings]
FIG. 1 is a configuration diagram showing a communication processing method according to an embodiment of the present invention;
[Explanation of symbols]
101, 131 Voice input means
102, 132 Voice data creation means
103, 133 Voice recognition means
104, 134 Character data creation means
105, 135 Voice data transmitting/receiving means
106, 136 Character data transmitting/receiving means
107, 137 Character data display means
108, 138 Voice output means
109, 139 Video input means
110, 140 Video data creation means
111, 141 Video recognition means
112, 142 Video data transmitting/receiving means
113, 143 Video output means
114, 144 Language translation means
115, 145 Voice data loss detecting means
116, 146 Video data loss detecting means
117, 147 Voice synthesizing means
118, 148 Video synthesizing means
119, 149 Information terminal
121 Internet

Claims

A communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: voice input means; voice data creation means for creating voice data from an input voice signal; voice recognition means for converting the voice data into character data; character data creation means; voice data transmitting/receiving means for transmitting and receiving voice data; character data transmitting/receiving means for transmitting and receiving, together with the voice data, the character data created by the character data creation means; voice output means for outputting received voice data; and character data display means for displaying received character data.

A communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: video input means; video data creation means for creating video data from an input video signal; video recognition means for converting the video data into character data; character data creation means; video data transmitting/receiving means for transmitting and receiving video data; character data transmitting/receiving means for transmitting and receiving, together with the video data, the character data created by the character data creation means; video output means for outputting received video data; and character data display means for displaying received character data.

A communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: voice input means; voice data creation means for creating voice data from an input voice signal; video input means; video data creation means for creating video data from an input video signal; voice recognition means for converting the voice data into character data; character data creation means; voice data transmitting/receiving means for transmitting and receiving voice data; video data transmitting/receiving means for transmitting and receiving video data; character data transmitting/receiving means for transmitting and receiving, together with the voice data, the character data created by the character data creation means; voice output means for outputting received voice data; video output means for outputting received video data; and character data display means for displaying received character data.

A communication processing method characterized in that communication is performed via a predetermined network between terminals each comprising: voice input means; voice data creation means for creating voice data from an input voice signal; video input means; video data creation means for creating video data from an input video signal; video recognition means for converting the video data into character data; character data creation means; voice data transmitting/receiving means for transmitting and receiving voice data; video data transmitting/receiving means for transmitting and receiving video data; character data transmitting/receiving means for transmitting and receiving, together with the video data, the character data created by the character data creation means; voice output means for outputting received voice data; video output means for outputting received video data; and character data display means for displaying received character data.

The communication processing method according to any one of claims 1 to 4, characterized in that the terminal comprises language translation means for translating the language of character data, and either transmits character data translated by the language translation means or translates received character data by the language translation means.

The communication processing method according to claim 1 or claim 3, characterized in that the terminal comprises voice data loss detecting means for detecting whether received voice data has been lost and voice synthesizing means for converting received character data into voice data, and, when the voice data loss detecting means detects a loss of voice data, outputs as voice the voice data created by the voice synthesizing means.

The communication processing method according to claim 2 or claim 4, characterized in that the terminal comprises video data loss detecting means for detecting whether received video data has been lost and video synthesizing means for converting received character data into audio data, and, when the video data loss detecting means detects a loss of video data, outputs as video the video data created by the video synthesizing means.

The communication processing method according to claim 6 or claim 7, characterized in that the terminal comprises language translation means for translating the language of character data, and either transmits character data translated by the language translation means or translates received character data by the language translation means.