JP2002010138A

JP2002010138A - Method for processing information and device therefor

Info

Publication number: JP2002010138A
Application number: JP2000184069A
Authority: JP
Inventors: Kazuhiko Yamamori; 和彦山森; Kenji Kogure; 賢司木暮
Original assignee: Nippon Telegraph and Telephone Corp
Current assignee: Nippon Telegraph and Telephone Corp
Priority date: 2000-06-20
Filing date: 2000-06-20
Publication date: 2002-01-11

Abstract

PROBLEM TO BE SOLVED: To provide an information processing method capable of immediately announcing emergency news to viewers during a broadcast program or lecture that undergoes information processing such as speech recognition or translation. SOLUTION: The device has a processing path that receives an audio signal and a video signal, delays the video signal, and superimposes text information obtained by speech recognition on the delayed video signal, and a bypass path that bypasses the processing path. When the audio signal carries emergency information, the output is switched to the bypass path.

Description

DETAILED DESCRIPTION OF THE INVENTION

[0001]

[Technical field of the invention] The present invention relates to an information processing method and apparatus for tasks such as real-time speech recognition of audio in live broadcasts such as television, simultaneous interpretation of lectures and talks delivered in a foreign language from a remote location or of radio broadcasts, and real-time translation of text information services in a foreign language. In particular, it relates to an information processing method and apparatus capable of notifying viewers in real time when emergency news occurs.

[0002]

[Prior art] If the various speech on television could be recognized in real time and converted into media such as text, the content could be stored and processed in many ways, which would be extremely useful. People with hearing impairments could also understand spoken information in real time, a significant welfare benefit. It would also help people follow the information in environments where sound cannot be played.

[0003] FIG. 6 shows a configuration example of a speech recognition processing system that recognizes an announcer's speech in a television broadcast in real time. The video signal from the camera and the audio signal from the microphone are input to the delay processing unit 11 and delayed for a fixed time (the time the speech recognition unit 13 needs to recognize the audio signal and generate a character string signal). The audio signal is also input to the speech recognition unit 13 (pre-processing unit, speech recognition processing unit, character output concatenation unit, character output display unit, and operator correction), where it is recognized and converted into a character string signal. The screen synthesis unit 12 synchronizes the converted character string signal with the delayed video signal and superimposes the two to create the video output signal.

[0004] The video output signal thus created and the audio output signal delayed by the delay processing unit 11 are output as a television broadcast wave.

[0005]

[Problem to be solved by the invention] In the conventional example above, when emergency news (for example, earthquake or fire information) occurs, the announcer's voice reporting it and the recognized character string are broadcast a fixed time after the announcer actually spoke, which is a problem when urgency is required. An object of the present invention is to provide an information processing method and apparatus that can notify viewers of emergency news without a time delay.

[0006]

[Means for solving the problem] To solve the above problem, the present invention comprises a path that performs information processing, in which an audio signal and a video signal are input, the video signal is delayed, and text information obtained by speech recognition is superimposed on the delayed video signal, and a path that bypasses this information processing path. When the audio signal is urgent information, the output is switched to the bypass path.

[0007] The present invention also comprises a path that performs information processing, in which text information or an audio signal in one language is input, the text information or audio signal is delayed, the text information or audio signal is translated, and the resulting text information or audio signal in a different language is output simultaneously with the delayed text information or audio signal in the original language, and a path that bypasses this information processing path. When the text information or audio signal in the original language is urgent information, the output is switched to the bypass path.

[0008]

[Embodiments of the invention] (Embodiment 1) FIG. 1 shows a configuration example in which the present invention is applied to television broadcasting. In FIG. 1, 1 is a studio unit that captures the announcer's voice and image in a studio or the like, 2 is an audio circuit unit including the microphone that picks up the announcer's speech, 3 is a video circuit unit including the camera that captures the announcer's image, and 4 is a stored material unit holding programs, commercials, and other material stored on VTR or other media.

[0009] 5 is a switch that selects whether the program to be broadcast comes from the studio unit 1 or from the stored material unit 4, and 6 is a speech recognition unit that recognizes the audio from the audio circuit unit 2 or the stored material unit 4. 7 is a delay processing unit: in the present invention, a program subject to speech recognition is started ahead of the actual broadcast (ON AIR) to allow time for recognition, so this unit delays the video and audio signals by the time difference between program start and ON AIR. 8 is a screen synthesis unit that superimposes the character string output recognized by the speech recognition unit 6 on the video signal delayed by the delay processing unit 7. 9 is a switch that selects whether the program to be broadcast is the one delayed by the delay processing unit 7 or the normal program passed straight from the switch 5. 10 is a control unit that includes a clock and controls the operation of the switch 5, the delay processing unit 7, the screen synthesis unit 8, and the switch 9. 20 is a radio wave output unit that puts the output of the switch 9, that is, the program to be broadcast, ON AIR.

[0010] The information processing apparatus consists of the switch 5, the information processing unit, the switch 9, and the control unit 10; the information processing unit consists of the speech recognition unit 6, the delay processing unit 7, and the screen synthesis unit 8. Actual television broadcasting equipment is more complex than FIG. 1, which shows in simplified form only the parts needed to explain the present invention. The operation of the system of FIG. 1 configured in this way is described below.

[0011] FIG. 2 is a time chart showing the operation of the system of FIG. 1. First, the case is described in which the announcer's video and audio from the studio unit 1 are displayed with the speech recognition result superimposed on the screen as text. In FIG. 1, the switch 5 is set to select the program from the studio unit 1, and the switch 9 is set to select the program that passes through the delay processing unit 7 and the speech recognition unit 6.

[0012] In this state, the program is started in the studio unit 1 slightly before the ON AIR start time. In FIG. 2, row (a), the speech recognition target program, is a time chart of the state of this program, that is, the program to be recognized, at point a in FIG. 1 (the output of the switch 5); the program starts at time A. Row (d) of FIG. 2, ON AIR, is a time chart of when the program is actually ON AIR, and shows that the program of row (a) goes ON AIR at time B. In other words, the program goes ON AIR a time t₁ after its actual start.

[0013] In FIG. 1, the program that has passed through the switch 5 has both its video and audio delayed by the time t₁ in the delay processing unit 7 and is sent to the switch 9 via the screen synthesis unit 8. Meanwhile, the audio signal of the program that has passed through the switch 5 is also sent to the speech recognition unit 6, where speech recognition is performed. Row (b) of FIG. 2 is a time chart of the speech recognition operation at point b in FIG. 1 (the input side of the speech recognition unit 6); recognition starts at the same time as the program of row (a).
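The fixed delay of t₁ applied by the delay processing unit 7 can be pictured as a first-in, first-out buffer. The following is a minimal sketch in Python, assuming the signal is handled as discrete frames and that t₁ corresponds to a whole number of frames. The class name `DelayLine` and the frame representation are illustrative and do not appear in the patent.

```python
from collections import deque


class DelayLine:
    """FIFO that releases each frame a fixed number of frames after it was pushed."""

    def __init__(self, delay_frames):
        if delay_frames < 0:
            raise ValueError("delay_frames must be non-negative")
        self.delay_frames = delay_frames
        self._buffer = deque()

    def push(self, frame):
        """Store a frame; return the frame pushed delay_frames earlier, or None while filling."""
        self._buffer.append(frame)
        if len(self._buffer) > self.delay_frames:
            return self._buffer.popleft()
        return None


# Hypothetical usage: a delay of 3 frames stands in for t1.
line = DelayLine(3)
outputs = [line.push(f) for f in ["f0", "f1", "f2", "f3", "f4"]]
```

While the buffer is filling, nothing is released, which mirrors the interval between program start (time A) and ON AIR (time B).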

[0014] Row (c) of FIG. 2 is a time chart of when the text output of the recognition result is obtained, and represents the output at point c in FIG. 1 (the output side of the speech recognition unit 6). It shows that the recognition result is obtained after a delay of time t₂. The speech recognition unit 6 may have, for example, the configuration and operation shown in FIG. 6, and the recognition processing can use inexpensive speech recognition means such as software that runs on an ordinary personal computer.

[0015] Here, t₁ and t₂ may be determined as follows. When the speech recognition unit 6 is configured as in FIG. 6 and uses inexpensive speech recognition means in its speech recognition processing unit, the time t₂ should be set equal to or longer than the time needed for recognition to finish and for the operator to correct recognition errors in the result. The time t₁, on the other hand, exists to compensate for the recognition delay of inexpensive speech recognition means, so its lower bound is equal to or slightly shorter than t₂.

[0016] t₂ varies with the kind of speech input and the time the operator takes for corrections. If t₂ is shorter than t₁, the recognition result is shown to viewers before they hear the speech, which is unnatural, so t₁ is preferably set slightly shorter than the minimum of the variation in t₂. As for the upper bound of t₁, starting the program long before ON AIR causes problems such as the announcer's script not being ready, editing of related programs not being finished, and inability to handle unscheduled interruptions, so t₁ must be shorter than the point at which these occur.
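The rule for choosing t₁ can be illustrated with a small calculation. This is a sketch only: the function name `choose_t1`, the `margin` parameter, and the sample values are assumptions introduced for illustration, and the patent leaves the actual values to the broadcaster.

```python
def choose_t1(t2_samples, t1_upper_bound, margin=0.5):
    """Pick t1 slightly below the smallest observed t2, capped by an upper bound.

    t2_samples: observed recognition-plus-correction delays in seconds.
    t1_upper_bound: longest head start the production workflow tolerates, in seconds.
    margin: how far below min(t2) to set t1, in seconds (illustrative value).
    """
    if not t2_samples:
        raise ValueError("need at least one t2 sample")
    t1 = min(t2_samples) - margin
    t1 = max(t1, 0.0)  # a negative delay is meaningless
    return min(t1, t1_upper_bound)


# Hypothetical measurements of t2 in seconds.
t1 = choose_t1([8.0, 10.0, 12.5], t1_upper_bound=60.0)
```

Keeping t₁ below the smallest t₂ means the caption never appears before the corresponding speech is heard.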

[0017] In practice, a value from several seconds to several minutes is desirable. However, the lengths of t₁ and t₂ and which of them is larger may be set freely, without being bound by the above, according to the approach to program production and the speech recognition means used. In actual broadcasting, not every program is subject to speech recognition, and always starting programs ahead of ON AIR causes its own problems, so normal broadcasting, in which the program goes ON AIR the moment it starts, is also needed as appropriate.

[0018] This is also useful as a fallback when the speech recognition operation malfunctions and as a rest period for the speech recognition system. In FIG. 2, the portion from time E onward illustrates this state: when the ON AIR of the speech recognition target program ends at time E, a commercial, for example, can be put ON AIR immediately. In FIG. 1, this is done by setting the switch 5 to pass the video and audio of the stored material unit 4 and setting the switch 9 to pass the output of the switch 5 through unchanged.

[0019] In FIG. 2, the speech recognition output continues until time F, which is after time E. However, the recognition result is text superimposed on the video and audio, and its output period includes time for reading the text, so the timing can be set so that switching the video and audio to the next program at time E causes no inconvenience to viewers. To perform speech recognition again, the program is started at time G, ahead of time H, as shown in FIG. 2, and the same operation is carried out.

[0020] In FIG. 2, when emergency news occurs at time J, the control unit 10 switches the switch 5 and the switch 9 to the circuit that performs no delay processing (the path that bypasses the speech recognition path), and the announcer reads out the emergency news. In this way, emergency news can be reported to viewers immediately even during a speech recognition target program. At time K, when the emergency news ends, the control unit 10 switches the switch 5 and the switch 9 back, and broadcasting of the speech recognition target program continues.
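The switching behaviour around times J and K can be sketched as a small state holder. The class and method names below (`ControlUnit`, `set_emergency`, `route`) are hypothetical. How the emergency is detected or signalled is not specified in the patent, so it is modelled here as an explicit flag. The sketch also assumes that each caption is already paired with its frame and that buffered frames are simply kept during the bypass; both are simplifications.

```python
from collections import deque


class ControlUnit:
    """Routes frames through the delayed, captioned path or the bypass path."""

    def __init__(self, delay_frames):
        self.delay_frames = delay_frames
        self._delayed = deque()  # stands in for delay processing unit 7
        self.emergency = False

    def set_emergency(self, active):
        """Called with True at time J and with False at time K."""
        self.emergency = active

    def route(self, frame, caption=None):
        """Return what goes ON AIR for this input frame, or None if nothing is ready."""
        if self.emergency:
            # Bypass path: no delay and no caption superimposition.
            return {"frame": frame, "caption": None, "path": "bypass"}
        self._delayed.append((frame, caption))
        if len(self._delayed) > self.delay_frames:
            old_frame, old_caption = self._delayed.popleft()
            return {"frame": old_frame, "caption": old_caption, "path": "processed"}
        return None
```

A design point worth noting is that the bypass branch never touches the delay buffer, so the emergency frame reaches the output in the same call that receives it.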

[0021] The description above covered performing speech recognition, or going straight ON AIR, in units of whole programs, but the choice can of course also be made for individual parts within one program. Next, the case is described in which the stored material unit 4 supplies video and audio that contains sounds other than speech: the audio is recognized and the result is superimposed on the screen as text. In FIG. 1, the switch 5 operates to select the program from the stored material unit 4, and the switch 9 operates to select the program that passes through the delay processing unit 7 and the speech recognition unit 6. The speech recognition unit 6 may be configured as in FIG. 6, for example, with the pre-processing unit removing sounds other than speech. If the pre-processing unit does not remove non-speech sounds sufficiently, the recognition rate is lower than for the audio from the studio unit 1 described above, but in the present invention this can be compensated for by making the time t₂ longer.
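The switch settings described so far for FIG. 1 can be collected into a lookup table. The sketch below is an illustrative summary: the mode names and position labels are invented for this example, and the position of the switch 5 during an emergency is an assumption (the text says only that the announcer reads the news, which suggests the studio).

```python
# Switch positions per operating mode, as read from the description of FIG. 1.
# Mode names and position labels are illustrative, not terms from the patent.
SWITCH_SETTINGS = {
    "studio_recognized": {"switch_5": "studio", "switch_9": "processed"},
    "stored_recognized": {"switch_5": "stored", "switch_9": "processed"},
    "stored_direct": {"switch_5": "stored", "switch_9": "direct"},
    # Assumption: the emergency announcement comes from the studio.
    "emergency": {"switch_5": "studio", "switch_9": "direct"},
}


def switch_positions(mode):
    """Return the positions of switches 5 and 9 for a named mode."""
    try:
        return SWITCH_SETTINGS[mode]
    except KeyError:
        raise ValueError(f"unknown mode: {mode}") from None
```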

[0022] The description above assumed inexpensive speech recognition means in the speech recognition unit, but a high-performance, expensive system like the one described under the prior art can of course be used as well, in which case the delay in the recognition result described above is reduced. Furthermore, according to the present invention, the speech recognition unit 6 can be implemented entirely by human labor, without any machine speech recognition means. This takes more labor but eliminates the equipment and development costs of machine recognition.

(Embodiment 2) The case in which the present invention is applied to, for example, a real-time lecture delivered in a foreign language to a remote location is described next.

[0023] FIG. 3 shows an embodiment in which the present invention is applied to a lecture or the like in a foreign language. In FIG. 3, the speech recognition unit 6, the delay processing unit 7, and the screen synthesis unit 8 that make up the information processing unit have the same configuration and operation as in FIG. 1. 21 is a translation processing unit that translates the text output of the speech recognition unit 6, 22 is a character output display unit that displays the translation result of the translation processing unit 21, and 23 is operator correction, in which an operator constantly monitors the output of the character output display unit 22 and corrects any translation errors. 24 is a translation unit comprising these elements.

[0024] The operation of FIG. 3 is the same as in Embodiment 1 (FIG. 1): the foreign-language speech input is translated and superimposed on the screen as text output. Here, the delay time applied to the lecture or the like in the delay processing unit 7 may be determined from the total time required for the speech recognition pre-processing. FIG. 3 describes recognizing the speech, outputting it as text, and then translating that text, but the configuration of FIG. 6, which is an example of the speech recognition unit 6, and the translation unit 24 of FIG. 3 may be integrated so that part of the processing is shared. FIG. 3 also shows the translation result being output as text, but with the configuration of FIG. 4 described below the translation result can easily be obtained as synthesized speech instead.

(Embodiment 3) FIG. 4 shows an embodiment in which the present invention is applied to a foreign-language broadcast on an audio medium such as radio to provide simultaneous interpretation.

[0025] In FIG. 4, the speech recognition unit 6, the delay processing unit 7, the translation processing unit 21, the character output display unit 22, and the operator correction 23 have the same configuration and operation as in the embodiment of FIG. 3. 25 is a speech synthesis unit that converts the text of the translation result into audio, 26 is a level adjustment that adjusts the volume so that the interpretation is easy to hear when the original foreign-language speech and the interpreted speech are superimposed for the listener, and 27 is an audio superimposition unit that superimposes the foreign-language speech and the interpreted speech.
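The level adjustment 26 and audio superimposition unit 27 amount to attenuating the original speech and summing it with the interpreted speech. A minimal sketch follows, assuming both signals are equal-length sequences of floating-point samples in the range -1.0 to 1.0. The function name `mix_with_ducking` and the gain value are illustrative assumptions, not taken from the patent.

```python
def mix_with_ducking(original, interpreted, original_gain=0.25):
    """Sum attenuated original speech with interpreted speech, clipped to [-1.0, 1.0]."""
    if len(original) != len(interpreted):
        raise ValueError("signals must have the same number of samples")
    mixed = []
    for o, i in zip(original, interpreted):
        sample = original_gain * o + i
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed


# Hypothetical three-sample signals.
mixed = mix_with_ducking([0.5, -1.0, 1.0], [0.25, 0.0, 1.0])
```

Lowering only the original speech keeps the interpretation at full level, which matches the stated aim of making the interpreted result easy to hear.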

[0026] If the listener is to hear only the interpreted speech, the level adjustment 26 and the audio superimposition unit 27 are unnecessary, and the output of the speech synthesis unit 25 is simply played. In this case, the original input speech and the processed speech are delivered over separate routes (for example, to the right ear and the left ear). The operation of FIG. 4 is the same as in Embodiment 1; here the delay time applied to the input (foreign-language) speech in the delay processing unit 7 may be determined from the total time required for the information processing in the speech recognition unit 6 and the translation processing unit 21.

(Embodiment 4) FIG. 5 shows an embodiment in which the present invention is applied to teletext broadcasting in a foreign language to provide simultaneous translation.

[0027] In FIG. 5, the delay processing unit 7, the translation processing unit 21, the character output display unit 22, the operator correction 23, and the screen synthesis unit 8 have the same configuration and operation as in the embodiment of FIG. 3. The operation of FIG. 5 is the same as in the embodiment of FIG. 3; here the delay time applied to the input (foreign-language) text information in the delay processing unit 7 may be determined from the time required for the information processing in the translation processing unit 21.

[0028] FIG. 5 describes outputting the translation result as text, but with a configuration like the embodiment of FIG. 4 and means for listening to audio, output as synthesized speech is equally easy to implement. Also, in FIG. 5, if only the translation result is to be displayed to the recipient, the screen synthesis unit 8 is unnecessary, and the output of the character output display unit 22 is used as the output. In this case, the original input text information and the information processing result are displayed over separate routes.

[0029] The description above took as an example the case where the delay processing of the present invention and information processing such as speech recognition and translation are performed on the sender side, but they can be performed on the recipient side in the same way with the same effect. In that case, the delay time can be set individually according to the performance of the information processing means used by each recipient. The delay processing and information processing may also be performed by a third party separate from the sender and the recipient of the information, with the information before and after processing exchanged by communication or similar means. Because this configuration allows information processing means such as speech recognition and translation to be shared, high-performance, expensive means can be used, and information processing services such as speech recognition and translation can be used only by those who want them, or only when they want them.

[0030] Likewise, the delay processing and the information processing can each be performed at separate locations.

[0031]

[Effect of the invention] As described above, the present invention makes it possible to notify viewers of emergency news immediately in broadcast programs, lectures, and talks that undergo information processing such as speech recognition or translation.

[Brief description of the drawings]

FIG. 1 is a diagram showing a configuration example in which the present invention is applied to television broadcasting.

FIG. 2 is a diagram showing an example of a time chart of the information processing apparatus according to the present invention.

FIG. 3 is a diagram showing a configuration example of an information processing unit in which the present invention is applied to a lecture or the like in a foreign language.

FIG. 4 is a diagram showing a configuration example of an information processing unit in which the present invention is applied to simultaneous interpretation of a foreign-language broadcast on radio or the like.

FIG. 5 is a diagram showing a configuration example of an information processing unit in which the present invention is applied to translation of a teletext broadcast in a foreign language.

FIG. 6 is a diagram showing a configuration example of a conventional real-time speech recognition processing unit.

[Explanation of symbols]

1 Studio unit
2 Audio circuit unit
3 Video circuit unit
4 Stored material unit
5 Switch
6 Speech recognition unit
7 Delay processing unit
8 Screen synthesis unit
9 Switch
10 Control unit
11 Delay processing unit
21 Translation processing unit
22 Character output display unit
23 Operator correction
24 Translation unit
25 Speech synthesis unit
26 Level adjustment
27 Audio superimposition unit

Continued from the front page. F-terms (reference): 5C023 AA18 BA09 BA19 CA01 CA04 CA05 EA13; 5C025 AA29 BA28 CA09 CA19; 5D015 KK04

Claims

[Claims]

[Claim 1] An information processing method characterized by providing a path that performs information processing in which an audio signal and a video signal are input, the video signal is delayed, and text information obtained by speech recognition is superimposed on the delayed video signal, and a path that bypasses the information processing path, and by switching to the bypass path when the audio signal is urgent information.

[Claim 2] An information processing method characterized by providing a path that performs information processing in which text information or an audio signal in one language is input, the text information or audio signal is delayed, the text information or audio signal is translated, and the resulting text information or audio signal in a different language is output simultaneously with the delayed text information or audio signal in the original language, and a path that bypasses the information processing path, and by switching to the bypass path when the text information or audio signal in the original language is urgent information.

[Claim 3] An information processing apparatus characterized by comprising: a delay processing unit that receives an audio signal and a video signal and delays the video signal; a speech recognition unit that recognizes the audio signal and generates the converted text information; a screen synthesis unit that superimposes the text information on the delayed video signal; and a switch that switches between the input audio and video signals and the output signal of the screen synthesis unit, wherein, when the audio signal is urgent information, the switch switches from the output signal of the screen synthesis unit to the input audio and video signals and outputs them.

[Claim 4] An information processing apparatus characterized by comprising: a delay processing unit that receives text information or an audio signal in one language and delays the text information or audio signal; a translation processing unit that translates the text information or audio signal to generate text information or an audio signal in a different language; an information processing unit that simultaneously outputs the text information or audio signal in the different language and the delayed text information or audio signal in the original language; and a switch that, when the text information or audio signal in the original language is urgent information, switches from the output of the information processing unit to the input text information or audio signal in the original language and outputs it.