JP2006072130A

JP2006072130A - Information processor and information processing method

Info

Publication number: JP2006072130A
Application number: JP2004257426A
Authority: JP
Inventors: Masaaki Yamada; 雅章山田; Kazue Kaneko; 和恵金子; Kohei Yamada; 耕平山田
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2004-09-03
Filing date: 2004-09-03
Publication date: 2006-03-16

Abstract

PROBLEM TO BE SOLVED: To provide a technique that facilitates synthesizing sound by synchronizing a background sound with a response sound.

SOLUTION: The device generates a response message by synthesizing a response sound and a background sound. It comprises:
- an acquisition means that obtains the data of the response sound and of the background sound;
- an output means that outputs the background sound;
- a start point instructing means that instructs the output start point of the response sound; and
- a synchronization information holding means that holds synchronization information between the response sound and the background sound. This information is computed from the output background sound and the instructed output start point of the response sound.

COPYRIGHT: (C)2006, JPO&NCIPI

Description

The present invention relates to a technique for creating or outputting a response message for a telephone or the like.

A message response function has conventionally been implemented as a telephone feature. It lets the called party play a response message, prepared in advance, to the caller. It is used for answering machines and for on-hold messages.

Usually, the response message is created by recording what the user says into a microphone. Alternatively, it may be created by speech synthesis instead of a live voice.

A background sound (background music, hereinafter referred to as BGM) may also be superimposed on the recorded or synthesized voice to form the response message (see, for example, Patent Documents 1 to 4).
Patent Document 1: JP-A-6-291829
Patent Document 2: JP-A-7-183941
Patent Document 3: JP-A-2002-354111
Patent Document 4: Japanese Patent No. 3306080

However, when BGM is superimposed, it is difficult to synchronize its timing with the recorded or synthesized voice.

An object of the present invention is to solve the above problem. It does so by providing a technique that makes it easier to synthesize a background sound in synchronization with a response sound.

To solve the above problem and achieve the object, an information processing apparatus of the present invention creates a response message by synthesizing a response sound and a background sound. It comprises:
- acquisition means for acquiring the data of the response sound and of the background sound;
- output means for outputting the background sound;
- start point instruction means for instructing an output start point of the response sound; and
- synchronization information holding means for holding synchronization information between the response sound and the background sound, calculated from the output background sound and the instructed output start point of the response sound.

Further, an information processing method of the present invention outputs a response message by synthesizing a response sound and a background sound. It comprises:
- an acquisition step of acquiring the data of the response sound and of the background sound;
- an output step of outputting the background sound;
- a start point instruction step of instructing an output start point of the response sound; and
- a synchronization information holding step of holding synchronization information between the response sound and the background sound, calculated from the output background sound and the instructed output start point of the response sound.

The present invention can also be applied to a program that causes a computer to execute the above message creation method and message output method, to a recording medium storing the program, and the like.

As described above, the present invention allows the timing of BGM and a recorded voice, or of BGM and a synthesized voice, to be synchronized through a simple user interface.

The best mode for carrying out the present invention will be described below in detail with reference to the accompanying drawings.

The embodiments described below are examples of means for realizing the present invention. They should be modified or changed as appropriate according to the configuration of the apparatus to which the invention is applied and to various conditions. The present invention is not limited to the following embodiments.

The present invention can also be achieved as follows. A system or apparatus is supplied with a storage medium (or recording medium) storing software program code that implements the message creation function of the embodiments described below. A computer (or CPU or MPU) of that system or apparatus then reads and executes the program code stored in the storage medium.
[First Embodiment]
FIG. 1 is a hardware configuration diagram of the message creation device according to the first embodiment of the present invention.

In FIG. 1, reference numeral 1 denotes a central processing unit that performs processing such as numerical calculation and control, following the procedures described later.

Reference numeral 2 denotes an audio output device that presents voice, BGM, or BGM-superimposed voice to the user.

Reference numeral 3 denotes an output device that presents information to the user. A typical example is an image output device such as a liquid crystal display. The output device 3 may also double as the audio output device 2, or it may be as simple as a blinking lamp.

Reference numeral 4 denotes an audio input device through which the user inputs voice and BGM.

Reference numeral 5 denotes an input device such as a touch panel, keyboard, or mouse, which the user uses to give operation instructions to the device. In the case of a telephone, push buttons, the hook switch, or the like can also serve as the input device.

Reference numeral 6 denotes a communication device connected to a telephone line or the like, which communicates with external devices. The content it communicates may be an analog signal such as voice, or digitized data.

Reference numeral 7 denotes an external storage device such as a disk drive or non-volatile memory, which holds BGM data 701 and speech synthesis content 702. Among the various information held in the RAM 9, the external storage device 7 also holds the information that must be kept permanently. The external storage device 7 may be a portable storage device such as a CD-ROM or memory card, which can improve convenience.

Reference numeral 8 denotes a read-only memory (ROM) that stores program code 801 for implementing the present invention, fixed data (not shown), and the like. In this embodiment, however, the choice between the external storage device 7 and the ROM 8 is arbitrary. For example, the program code 801 may be installed in the external storage device 7 instead of the ROM 8.

Reference numeral 9 denotes a memory such as a RAM that holds temporary information. This includes a speech synthesis start type 901, a BGM volume suppression flag 902, a speech synthesis start point 903, a BGM output end type 904, a BGM output end waiting time 905, a speech rate adjustment flag 906, a fade-out flag 907, a BGM output end point 908, a speech rate 909, and other temporary data and flags.

The central processing unit 1 through the RAM 9 are connected by a bus.
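As an illustration only, the temporary information held in the RAM 9 can be pictured as a single record. The sketch below is in Python. The class names, the field names, and the use of seconds as the time unit are hypothetical and are not taken from the specification.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class StartOrEndType(Enum):
    """How a time point is determined (cf. the types 901 and 904)."""
    ABSOLUTE = "absolute value setting"   # value entered numerically
    USER = "instruction by the user"      # value recorded from a button press


@dataclass
class ResponseMessageParams:
    """Hypothetical container mirroring the items 901-909 held in RAM 9."""
    speech_start_type: StartOrEndType = StartOrEndType.ABSOLUTE   # 901
    bgm_volume_suppression: bool = False                          # 902
    speech_start_point: Optional[float] = None                    # 903, seconds (assumed unit)
    bgm_end_type: StartOrEndType = StartOrEndType.ABSOLUTE        # 904
    bgm_end_wait_time: Optional[float] = None                     # 905, seconds (assumed unit)
    speech_rate_adjustment: bool = False                          # 906
    fade_out: bool = False                                        # 907
    bgm_end_point: Optional[float] = None                         # 908, seconds (assumed unit)
    speech_rate: float = 1.0                                      # 909, ratio to the standard rate
```

The default `speech_rate` of 1.0 corresponds to the value set by speech rate initialization step S3.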

Next, the response message creation processing of the embodiment will be described with reference to the flowcharts of FIGS. 2 to 4.

In FIG. 2, BGM data is acquired in BGM data acquisition step S1. The BGM data may come from any of the following sources:
- music or the like recorded through the audio input device 4 and stored in the external storage device 7;
- encoded music data, or music data written in some description language, downloaded from the Internet or the like via the communication device 6;
- standard BGM data prepared in advance and read from the ROM 8.

The BGM need not be music. It may be, for example, an environmental sound such as birdsong.

Next, in speech synthesis content acquisition step S2, the text to be uttered by speech synthesis is acquired. It may be input by the user from the input device 5 or downloaded via the communication device 6. Alternatively, standard speech synthesis content prepared in advance may be read from the ROM 8.

FIG. 7 is an example of the user interface screen output to the output device 3 in this step when the user inputs the speech synthesis content from the input device 5.

Next, in speech rate initialization step S3, the speech rate 909 is initialized. In this embodiment, the speech rate is expressed as a ratio to the standard speech rate, so its value after initialization is 1.

Next, in speech synthesis start method acquisition step S4, the parameters that determine how speech synthesis starts are acquired. Specifically, these are the speech synthesis start type 901 and the BGM volume suppression flag 902. If the speech synthesis start type is not “instruction by the user”, the speech synthesis start point 903 is also acquired.

FIG. 8 is an example of the user interface screen output to the output device 3 in this step. In the figure, 1001 and 1002 are mutually exclusive buttons, and one of them is selected via the input device 5.
- When 1001 is selected, the speech synthesis start type becomes “absolute value setting”. The numeric input field 1003 is then enabled, and the speech synthesis start point is entered there.
- When 1002 is selected, the speech synthesis start type becomes “instruction by the user”.

When the check box 1004 is checked, the BGM volume suppression flag 902 is set.

After speech synthesis start method acquisition step S4, the process proceeds to BGM output end method acquisition step S5. In this step, the parameters that determine how BGM output ends are acquired. Specifically, these are the BGM output end type 904 and the fade-out flag 907. If the BGM output end type is not “instruction by the user”, the BGM output end waiting time 905 is also acquired. If the speech synthesis start type is “instruction by the user”, the speech rate adjustment flag 906 is also acquired.

FIG. 9 is an example of the user interface screen output to the output device 3 in this step. In the figure, 1101 and 1102 are mutually exclusive buttons, and one of them is selected via the input device 5.
- When 1101 is selected, the BGM output end type becomes “absolute value setting”. The numeric input field 1103 is then enabled, and the BGM output end waiting time is entered there.
- When 1102 is selected, the BGM output end type becomes “instruction by the user”. The check box 1104 is then enabled, and checking it sets the speech rate adjustment flag 906.

When the check box 1105 is checked, the fade-out flag 907 is set.

After BGM output end method acquisition step S5, the process proceeds to step S6. Step S6 uses the parameters acquired in steps S4 and S5 to determine whether an operation instructed by the user is required.
- If either the speech synthesis start type 901 or the BGM output end type 904 is “instruction by the user”, a user-instructed operation is required, and the process proceeds to BGM output start instruction acquisition step S7.
- If neither is “instruction by the user”, the process proceeds to audition start instruction acquisition step S201.
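The branch in step S6 reduces to a single membership test. The minimal sketch below models the two types as an enumeration. The names `TimingType`, `needs_user_operation`, and `next_step_after_s6` are hypothetical.

```python
from enum import Enum


class TimingType(Enum):
    ABSOLUTE = "absolute value setting"
    USER = "instruction by the user"


def needs_user_operation(speech_start_type: TimingType, bgm_end_type: TimingType) -> bool:
    """Step S6: a live session (S7 onward) is needed if either point is user-instructed."""
    return TimingType.USER in (speech_start_type, bgm_end_type)


def next_step_after_s6(speech_start_type: TimingType, bgm_end_type: TimingType) -> str:
    """Return the label of the step that follows S6."""
    if needs_user_operation(speech_start_type, bgm_end_type):
        return "S7"    # BGM output start instruction acquisition
    return "S201"      # audition start instruction acquisition
```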

In BGM output start instruction acquisition step S7, the user's instruction to start BGM output is acquired. FIG. 10 shows an example of the user interface used for this. In this step, every button in the figure except the BGM output start button 1201 is disabled. When the BGM output start button 1201 is pressed, the instruction is regarded as given, and the process proceeds to the following steps.

Next, in timing start step S8, time measurement is started. A dedicated timing device such as a timer may be used. Alternatively, the time may be measured indirectly from the sample number of the BGM data being output.
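When time is measured indirectly from the sample number of the BGM being output, the elapsed time is the sample index divided by the sample rate. This minimal sketch assumes that playback starts at sample 0 and runs at a constant, known sample rate. The function name is hypothetical.

```python
def elapsed_seconds(sample_index: int, sample_rate_hz: int) -> float:
    """Elapsed time since BGM output began, derived from the sample currently being output.

    Assumes output started at sample 0 and plays at a constant sample rate.
    """
    if sample_rate_hz <= 0:
        raise ValueError("sample rate must be positive")
    return sample_index / sample_rate_hz


print(elapsed_seconds(88200, 44100))  # 2.0
```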

Next, in BGM output start step S9, output of the BGM is started.

Next, in step S10, it is determined whether the speech synthesis start type 901 is “instruction by the user”. If it is, the process proceeds to speech synthesis start instruction acquisition step S12. Otherwise, the process proceeds to speech synthesis start point waiting step S11.

In speech synthesis start point waiting step S11, the process waits until the time elapsed since the start of timing reaches the speech synthesis start point 903, and then proceeds to step S14.

In speech synthesis start instruction acquisition step S12, the user's instruction to start speech synthesis is acquired. In this step, every button in FIG. 10 except the speech synthesis start button 1202 is disabled. When the speech synthesis start button 1202 is pressed, the instruction is regarded as given, and the process proceeds to the following steps.

In speech synthesis start point recording step S13, the time elapsed since the start of timing is recorded as the speech synthesis start point 903.

Next, in step S14, it is determined whether the BGM volume suppression flag 902 is set. If it is set, the process proceeds to BGM volume suppression step S15. Otherwise, the process proceeds to speech synthesis start step S16.

In BGM volume suppression step S15, the output volume of the BGM is reduced.

Next, in speech synthesis start step S16, speech synthesis is started, and the synthesized speech is output superimposed on the BGM.

Next, in synthesized speech time length calculation step S101, the time length of the speech synthesis content 702 when synthesized at the standard speed is calculated.

Next, in step S102, it is determined whether the BGM output end type 904 is “instruction by the user”. If it is, the process proceeds to BGM output end instruction acquisition step S103. Otherwise, the process proceeds to BGM output end point calculation step S111.

In BGM output end instruction acquisition step S103, the user's instruction to end BGM output is acquired. In this step, every button in FIG. 10 except the BGM output end button 1203 is disabled. When the BGM output end button 1203 is pressed, the instruction is regarded as given, and the process proceeds to the following steps.

Next, in BGM output end point recording step S104, the time elapsed since the start of timing is recorded as the BGM output end point 908.

Next, in step S105, it is determined whether the speech rate adjustment flag 906 is set. If it is set, the process proceeds to speech rate calculation step S106. Otherwise, the process proceeds to step S107.

In speech rate calculation step S106, the adjusted speech rate is calculated as follows:

(adjusted speech rate) = ((BGM output end point) − (speech synthesis start point)) ÷ (synthesized speech time length)

The result is stored as the speech rate 909. After this step, the process proceeds to step S107.
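Step S106 can be sketched as follows, with times assumed to be in seconds. The function name and the input validation are assumptions; the formula itself is the one stated above. Multiplying the standard-speed time length by the computed value gives exactly the interval from the speech synthesis start point to the BGM output end point. The stored value therefore acts as a scale factor on the duration.

```python
def adjusted_speech_rate(bgm_end_point: float, speech_start_point: float,
                         synthesized_length: float) -> float:
    """Step S106: ((BGM output end point) - (speech synthesis start point)) / (synthesized speech time length)."""
    if synthesized_length <= 0:
        raise ValueError("synthesized speech time length must be positive")
    window = bgm_end_point - speech_start_point
    if window <= 0:
        raise ValueError("BGM output end point must come after the speech synthesis start point")
    return window / synthesized_length


rate = adjusted_speech_rate(bgm_end_point=14.0, speech_start_point=2.0, synthesized_length=8.0)
print(rate)  # 1.5, since 8.0 s * 1.5 spans the 12.0 s window
```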

In step S107, it is determined whether the fade-out flag 907 is set. If it is set, the process proceeds to BGM fade-out step S108. Otherwise, the process proceeds to BGM output stop step S109.

In BGM fade-out step S108, the BGM output is faded out.

Next, in BGM output stop step S109, the BGM output is stopped.

Next, in first speech synthesis end waiting step S110, the process waits for speech synthesis to end. When it has ended, the process proceeds to audition start instruction acquisition step S201 in FIG. 4.

In BGM output end point calculation step S111, the end point of the BGM output is calculated as follows:

(BGM output end point) = (synthesized speech time length) + (speech synthesis start point)

Next, in step S112, the sign of the BGM output end waiting time 905 is checked. If the waiting time is positive, the process proceeds to second speech synthesis end waiting step S113. If it is negative, the process proceeds to BGM output end point waiting step S115.

In second speech synthesis end waiting step S113, the process waits for speech synthesis to end.

Next, in BGM volume restoration step S114, the BGM volume is returned to the normal level.

Next, in BGM output end point waiting step S115, the process waits until the time elapsed since the start of timing reaches the BGM output end point 908, and then proceeds to step S107.
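Steps S111 to S115 can be summarized as below. The end point follows the formula given for step S111. The sign of the BGM output end waiting time 905 selects which waiting steps run before step S107. This is a hypothetical sketch with illustrative function names. The text does not cover a waiting time of exactly zero; treating it as the negative case is an assumption.

```python
def bgm_end_point(synthesized_length: float, speech_start_point: float) -> float:
    """Step S111: (BGM output end point) = (synthesized speech time length) + (speech synthesis start point)."""
    return synthesized_length + speech_start_point


def steps_after_s112(bgm_end_wait_time: float) -> list[str]:
    """Steps S112-S115: the steps taken before S107, chosen by the sign of waiting time 905.

    A zero waiting time is not covered by the text; it is treated as negative here (assumption).
    """
    if bgm_end_wait_time > 0:
        # wait for speech to end, restore BGM volume, then wait for the BGM end point
        return ["S113", "S114", "S115"]
    # wait for the BGM end point directly
    return ["S115"]
```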

In audition start instruction acquisition step S201, the user's instruction to start an audition (trial playback) is acquired. FIG. 11 shows an example of the user interface screen output to the output device 3 in this step.

Next, in step S202, the response message playback subroutine is called. Its processing is described later.

Next, in audition result acquisition step S203, the user's audition result is acquired. FIG. 12 shows an example of the user interface screen output to the output device 3 in this step.

Next, in step S204, it is determined whether the audition result acquired in step S203 is “redo”. If it is, the process returns to speech rate initialization step S3. Otherwise, the process proceeds to parameter saving step S205.

In parameter saving step S205, the parameters in the RAM 9 relating to the speech start and to the BGM output end are copied to the external storage device 7.

When parameter saving step S205 ends, the response message creation process ends.

Next, the processing flow of the response message playback subroutine will be described with reference to FIG. 5.

In this embodiment, the response message playback subroutine is described as event-driven processing. It handles three events:
- a timer event when the speech synthesis start point is reached;
- a timer event when the BGM end point is reached;
- a software interrupt when speech synthesis ends.

First, in timer start step S301, the timer used for timer events is started.

Next, in BGM output start step S302, output of the BGM is started.

Next, in event waiting step S303, the process waits for an event. When any of the above events occurs, it proceeds to the next step.

Next, in step S304, the next step is chosen according to the type of event obtained in event waiting step S303:
- for the timer event at the speech synthesis start point, the process proceeds to step S305;
- for the timer event at the BGM end point, it proceeds to step S309;
- for the software interrupt at the end of speech synthesis, it proceeds to step S313.

In step S305, it is determined whether the BGM volume suppression flag 902 is set. If it is set, the process proceeds to BGM volume suppression step S306. Otherwise, the process proceeds to speech rate setting step S307.

In BGM volume suppression step S306, the output volume of the BGM is reduced.

In speech rate setting step S307, the speech rate used for speech synthesis is changed to the value of the speech rate 909.

Next, in speech synthesis start step S308, speech synthesis is started and the synthesized speech is output superimposed on the BGM. The process then returns to event waiting step S303.

In step S309, it is determined whether the fade-out flag 907 is set. If it is set, the process proceeds to BGM fade-out step S310. Otherwise, the process proceeds to BGM output stop step S311.

In BGM fade-out step S310, the BGM output is faded out.

Next, in BGM output stop step S311, the BGM output is stopped.

Next, in step S312, it is determined whether speech synthesis has already ended. If it has, the response message playback subroutine ends. Otherwise, the process returns to event waiting step S303.

In BGM volume restoration step S313, the BGM volume is returned to the normal level.

Next, in step S314, it is determined whether BGM output has already ended. If it has, the response message playback subroutine ends. Otherwise, the process returns to event waiting step S303.
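The event-driven subroutine of FIG. 5 can be simulated offline with a priority queue of timed events. The sketch below is hypothetical. It replaces real audio and timer hardware with a timestamped action log. It also assumes that the duration of the synthesized speech is known in advance, so that the end-of-synthesis interrupt can be scheduled when synthesis starts.

```python
import heapq
from dataclasses import dataclass


@dataclass
class PlaybackSettings:
    speech_start_point: float   # 903, seconds after the timer starts
    bgm_end_point: float        # 908, seconds after the timer starts
    speech_duration: float      # assumed known length of the synthesized speech
    suppress_bgm: bool = False  # 902
    fade_out: bool = False      # 907
    speech_rate: float = 1.0    # 909


def play_response_message(s: PlaybackSettings) -> list[tuple[float, str]]:
    """Simulate the event-driven subroutine of FIG. 5 and return a timestamped action log."""
    log: list[tuple[float, str]] = [(0.0, "S301 start timer"), (0.0, "S302 start BGM")]
    events = [(s.speech_start_point, "speech_start"), (s.bgm_end_point, "bgm_end")]
    heapq.heapify(events)
    speech_done = bgm_done = False
    while events:
        t, kind = heapq.heappop(events)          # S303: wait for the next event
        if kind == "speech_start":               # S304 -> S305
            if s.suppress_bgm:
                log.append((t, "S306 suppress BGM volume"))
            log.append((t, f"S307 set speech rate {s.speech_rate}"))
            log.append((t, "S308 start speech synthesis"))
            # simulated end-of-synthesis interrupt
            heapq.heappush(events, (t + s.speech_duration, "speech_end"))
        elif kind == "bgm_end":                  # S304 -> S309
            if s.fade_out:
                log.append((t, "S310 fade out BGM"))
            log.append((t, "S311 stop BGM"))
            bgm_done = True
            if speech_done:                      # S312
                break
        else:                                    # "speech_end": S304 -> S313
            log.append((t, "S313 restore BGM volume"))
            speech_done = True
            if bgm_done:                         # S314
                break
    return log
```

For instance, with a start point of 2.0 s, a BGM end point of 10.0 s, and 6.0 s of speech, the log records the volume being restored at 8.0 s and the BGM stopping at 10.0 s.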

This completes the processing flow for creating a response message.

Next, the processing for outputting a response message in an answering machine or the like will be described with reference to FIG. 6.

FIG. 6 is a flowchart showing the response message output processing of this embodiment.

In FIG. 6, an instruction to play the response message is first detected in response message playback start instruction step S401. For the answering machine function, this step corresponds to detecting an incoming call.

Next, in parameter loading step S402, the parameters relating to the speech start and the BGM output end, saved in parameter saving step S205, are copied to the RAM 9.
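Steps S205 and S402 only require that the parameters survive between message creation and message output. The minimal sketch below assumes the parameters are kept as a dictionary and serialized to a JSON file. The file name, the format, and the helper names are assumptions and are not part of the specification.

```python
import json
import tempfile
from pathlib import Path


def save_parameters(params: dict, path: Path) -> None:
    """Step S205 (sketch): copy the parameters held in RAM to persistent storage."""
    path.write_text(json.dumps(params, ensure_ascii=False), encoding="utf-8")


def load_parameters(path: Path) -> dict:
    """Step S402 (sketch): copy the saved parameters back into working memory."""
    return json.loads(path.read_text(encoding="utf-8"))


def round_trip(params: dict) -> dict:
    """Save and then reload, as S205 followed later by S402 would."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "response_message_params.json"  # hypothetical file name
        save_parameters(params, path)
        return load_parameters(path)
```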

Next, in output destination setting step S403, the output destination of the synthesized speech and the BGM is set. For the answering machine function, the output destination is the communication device 6.

Next, in step S404, the response message is played back. This playback follows the same processing flow as the response message playback subroutine described above. The only difference is the output destination of the synthesized speech and the BGM.

This completes the processing for outputting a response message in an answering machine or the like.
[Second Embodiment]
In the first embodiment, the content of the response message was described as being generated by speech synthesis, but speech synthesis is not essential. For example, the response message may instead be based on playback of a recorded voice of the user or another person, recorded in advance.

この場合、上記第１の実施形態における発声速度調整は、話速変換技術に基づくものとなる。話速変換技術として、特開平５−２５７４９０号公報等、簡便なものから多くの信号処理を伴うものまで様々な手法が提案されているが、本発明に関しては、話速変換技術の種類は問わない。 In this case, the utterance speed adjustment of the first embodiment is based on a speech rate conversion technique. Various speech rate conversion techniques have been proposed, such as the one in Japanese Patent Application Laid-Open No. 5-257490, ranging from simple methods to methods involving extensive signal processing; the present invention does not depend on which technique is used.
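Since the text leaves the technique open, the following is only one concrete illustration: a naive overlap-add (OLA) time-scale modification that changes duration without resampling, so each frame keeps its local waveform (and, roughly, its pitch). It is not the method of JP H5-257490, and on real speech it produces audible artifacts; WSOLA or PSOLA variants are usually preferred. The frame length of 256 and the periodic Hann window are assumptions.

```python
import math
from typing import List


def ola_time_stretch(x: List[float], ratio: float, frame_len: int = 256) -> List[float]:
    """Naive overlap-add time-scale modification.

    ratio > 1 lengthens the signal (slower speech), ratio < 1 shortens it.
    Frames are read with hop (frame_len/2)/ratio and written with hop frame_len/2.
    """
    if ratio <= 0:
        raise ValueError("ratio must be positive")
    hop_out = frame_len // 2
    hop_in = hop_out / ratio
    # Periodic Hann window: at 50 % overlap the windows sum to a constant.
    win = [0.5 - 0.5 * math.cos(2 * math.pi * n / frame_len) for n in range(frame_len)]
    out_len = int(round(len(x) * ratio))
    acc = [0.0] * (out_len + frame_len)
    norm = [0.0] * (out_len + frame_len)
    k = 0
    while k * hop_out < out_len:
        src = int(round(k * hop_in))
        dst = k * hop_out
        for n in range(frame_len):
            s = x[src + n] if src + n < len(x) else 0.0  # zero-pad past the end
            acc[dst + n] += win[n] * s
            norm[dst + n] += win[n]
        k += 1
    # Divide by the summed window so edges and hop rounding do not change the level.
    return [a / w if w > 1e-9 else 0.0 for a, w in zip(acc[:out_len], norm[:out_len])]
```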

以下、応答メッセージが録音音声の再生に基づく場合の実施形態について具体的に説明する。なお、以下の説明において、第１の実施形態と同等の機能を持つ要素については、同一の符号を付して示している。 Hereinafter, an embodiment in the case where the response message is based on the reproduction of the recorded voice will be specifically described. In the following description, elements having functions equivalent to those of the first embodiment are denoted by the same reference numerals.

図１３は本発明に係る第２の実施形態の音声データ作成装置のハードウェア構成図である。 FIG. 13 is a hardware configuration diagram of an audio data creation apparatus according to the second embodiment of the present invention.

中央処理装置１乃至通信装置６およびＲＯＭ８は、第１の実施形態と同様である。 The central processing unit 1 through the communication device 6, and the ROM 8, are the same as in the first embodiment.

７は、機能面では第１の実施形態と同様であり、ＢＧＭデータ７０１・オリジナル録音音声７０３・録音音声７０４が保持される。 The external storage device 7 is functionally the same as in the first embodiment and holds the BGM data 701, the original recorded voice 703, and the recorded voice 704.

９は、機能面では第１の実施形態と同様であり、音声再生開始タイプ９１０・ＢＧＭ音量抑制フラグ９０２・音声再生開始点９１１・ＢＧＭ出力終了タイプ９０４・ＢＧＭ出力終了待ち時間９０５・発声速度調整フラグ９０６・フェードアウトフラグ９０７・ＢＧＭ出力終了点９０８およびその他の一時的なデータや各種フラグ等が保持される。 The RAM 9 is functionally the same as in the first embodiment and holds the audio reproduction start type 910, the BGM volume suppression flag 902, the audio reproduction start point 911, the BGM output end type 904, the BGM output end waiting time 905, the utterance speed adjustment flag 906, the fade-out flag 907, the BGM output end point 908, and other temporary data, flags, and the like.
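The parameters held in the RAM 9 can be pictured as one record. The Python sketch below is hypothetical: the field names, the use of seconds, and the collapsing of every non-user start/end option into a single `PRESET` value are assumptions for illustration; only the reference numerals in the comments come from the text.

```python
from dataclasses import dataclass, asdict
from enum import Enum
from typing import Optional


class Trigger(Enum):
    USER = "instruction by the user"  # point is set by a key press during audition
    PRESET = "preset"                 # hypothetical: stands for any non-user option


@dataclass
class ResponseMessageParams:
    """Working parameters held in RAM 9 (reference numerals in comments)."""
    voice_start_type: Trigger = Trigger.USER      # audio reproduction start type 910
    suppress_bgm_volume: bool = False             # BGM volume suppression flag 902
    voice_start_point_s: Optional[float] = None   # audio reproduction start point 911
    bgm_end_type: Trigger = Trigger.USER          # BGM output end type 904
    bgm_end_wait_s: Optional[float] = None        # BGM output end waiting time 905 (sign matters)
    adjust_speech_rate: bool = False              # utterance speed adjustment flag 906
    fade_out: bool = False                        # fade-out flag 907
    bgm_end_point_s: Optional[float] = None       # BGM output end point 908
```

`asdict(params)` gives a plain dictionary that could be written out in a parameter-saving step such as S205.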

上記中央処理装置１乃至ＲＡＭ９は、バスで接続されている。 The central processing unit 1 to RAM 9 are connected by a bus.

次に、本実施形態の応答メッセージ作成処理について、図１４乃至図１６のフローチャートを参照して説明する。 Next, the response message creation processing of this embodiment will be described with reference to the flowcharts of FIGS. 14 to 16.

まず、ＢＧＭデータ取得ステップＳ１で、ＢＧＭデータを取得する。ＢＧＭデータは、音声入力装置４から録音された音楽等を外部記憶装置７に記録するような形態であっても良いし、通信装置６を介してインターネット等に接続し、符号化された音楽データや何らかの記述言語で記述された音楽データをダウンロードするような形態であっても良い。あるいは、あらかじめ用意された標準ＢＧＭデータをＲＯＭ８から読み込んで用いても良い。 First, in BGM data acquisition step S1, BGM data is acquired. The BGM data may be music or the like recorded through the voice input device 4 and stored in the external storage device 7, or it may be encoded music data, or music data written in some description language, downloaded from the Internet or the like via the communication device 6. Alternatively, standard BGM data prepared in advance may be read from the ROM 8 and used.

次に、オリジナル録音音声取得ステップＳ５０１で、応答メッセージの内容として再生される音声を取得する。オリジナル録音音声は、音声入力装置４からユーザが入力する形態であっても良いし、通信装置６を介してダウンロードするような形態であっても良い。あるいは、あらかじめ標準として用意された標準録音音声をＲＯＭ８から読み込んで用いても良い。 Next, in original recorded voice acquisition step S501, the voice to be reproduced as the content of the response message is acquired. The original recorded voice may be input by the user through the voice input device 4 or downloaded via the communication device 6. Alternatively, a standard recorded voice prepared in advance may be read from the ROM 8 and used.

次に、録音音声初期化ステップＳ５０２で、録音音声７０４をオリジナル録音音声７０３に初期化する。 Next, the recorded voice 704 is initialized to the original recorded voice 703 in a recorded voice initialization step S502.

次に、音声再生開始方法取得ステップＳ５０３で、音声再生の開始方法に関するパラメータを取得する。音声再生の開始方法に関するパラメータは、具体的には、音声再生開始タイプ９１０・ＢＧＭ音量抑制フラグ９０２を取得する。さらに、音声再生開始タイプが「ユーザによる指示」でない場合には、音声再生開始点９１１も取得する。 Next, in audio reproduction start method acquisition step S503, the parameters relating to how audio reproduction starts are acquired. Specifically, the audio reproduction start type 910 and the BGM volume suppression flag 902 are acquired. In addition, when the audio reproduction start type is not “instruction by the user”, the audio reproduction start point 911 is also acquired.

次に、ＢＧＭ出力終了方法取得ステップＳ５で、ＢＧＭ出力の終了方法に関するパラメータを取得する。ＢＧＭ出力の終了方法に関するパラメータは、具体的には、ＢＧＭ出力終了タイプ９０４・フェードアウトフラグ９０７を取得する。さらに、ＢＧＭ出力終了タイプが「ユーザによる指示」でない場合には、ＢＧＭ出力終了待ち時間９０５も取得する。さらに、音声再生開始タイプが「ユーザによる指示」である場合には、発声速度調整フラグ９０６も取得する。 Next, in BGM output end method acquisition step S5, the parameters relating to how BGM output ends are acquired. Specifically, the BGM output end type 904 and the fade-out flag 907 are acquired. In addition, when the BGM output end type is not “instruction by the user”, the BGM output end waiting time 905 is also acquired. Furthermore, when the audio reproduction start type is “instruction by the user”, the utterance speed adjustment flag 906 is also acquired.

次に、ステップＳ６で、上記音声再生開始方法取得ステップＳ５０３およびＢＧＭ出力終了方法取得ステップＳ５で取得されたパラメータをもとに、ユーザの指示による操作が必要かどうかを判定する。具体的には、上記音声再生開始タイプ９１０あるいはＢＧＭ出力終了タイプ９０４のいずれかが「ユーザによる指示」の場合に、ユーザの指示による操作が必要であると判定し、処理をＢＧＭ出力開始指示取得ステップＳ７に進め、上記音声再生開始タイプ９１０およびＢＧＭ出力終了タイプ９０４のいずれも「ユーザによる指示」でない場合に、処理を試聴開始指示取得ステップＳ２０１に進める。 Next, in step S6, it is determined whether or not an operation according to a user instruction is necessary based on the parameters acquired in the audio reproduction start method acquisition step S503 and the BGM output end method acquisition step S5. Specifically, when either of the audio reproduction start type 910 or the BGM output end type 904 is “instructed by the user”, it is determined that an operation according to the user's instruction is necessary, and the process acquires the BGM output start instruction. Proceeding to step S7, if neither the audio reproduction start type 910 nor the BGM output end type 904 is “instructed by the user”, the process proceeds to trial listening start instruction obtaining step S201.
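The conditional acquisition in S503 and S5 and the branch in S6 reduce to a few boolean checks. The Python sketch below mirrors the conditions as stated in the text; the parameter name strings and the single `"preset"` stand-in for every non-user option are hypothetical.

```python
from typing import Set

USER = "instruction by the user"


def parameters_to_acquire(voice_start_type: str, bgm_end_type: str) -> Set[str]:
    """Names of the parameters collected in S503 and S5 (names are illustrative)."""
    params = {"voice_start_type", "bgm_volume_suppression"}      # S503: 910, 902
    if voice_start_type != USER:
        params.add("voice_start_point")                          # 911
    params |= {"bgm_end_type", "fade_out"}                       # S5: 904, 907
    if bgm_end_type != USER:
        params.add("bgm_end_wait_time")                          # 905
    if voice_start_type == USER:
        params.add("speech_rate_adjustment")                     # 906, as stated in S5
    return params


def needs_user_operation(voice_start_type: str, bgm_end_type: str) -> bool:
    """S6: True -> go to S7 (BGM output start instruction); False -> go to S201."""
    return voice_start_type == USER or bgm_end_type == USER
```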

ＢＧＭ出力開始指示取得ステップＳ７では、ユーザによるＢＧＭ出力開始の指示を取得する。 In a BGM output start instruction acquisition step S7, an instruction to start BGM output by the user is acquired.

次に、ステップＳ５０４において、上記音声再生開始タイプ９１０が「ユーザによる指示」かどうかを判定し、「ユーザによる指示」であるならば音声再生開始指示取得ステップＳ５０６に処理を進め、「ユーザによる指示」でない時は音声再生開始点待機ステップＳ５０５に処理を進める。 Next, in step S504, it is determined whether or not the voice reproduction start type 910 is "user instruction". If it is "user instruction", the process proceeds to voice reproduction start instruction acquisition step S506. If not, the process proceeds to the voice reproduction start point standby step S505.

音声再生開始点待機ステップＳ５０５では、上記計時開始後の時間が音声再生開始点９１１に達するまで待機し、計時開始後の時間が音声再生開始点９１１に達した後にステップＳ１４に処理を進める。 In audio reproduction start point waiting step S505, the process waits until the time elapsed since timing started reaches the audio reproduction start point 911, and then proceeds to step S14.

音声再生開始指示取得ステップＳ５０６では、ユーザによる音声再生開始の指示を取得する。 In an audio reproduction start instruction acquisition step S506, an instruction to start audio reproduction by the user is acquired.

音声再生開始点記録ステップＳ５０７では、上記計時開始後の時間を音声再生開始点９１１として記録する。 In audio reproduction start point recording step S507, the time elapsed since timing started is recorded as the audio reproduction start point 911.
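The synchronization information amounts to offsets measured from the moment timing starts. As a sketch under stated assumptions, the Python class below (hypothetical name `SyncRecorder`) uses a monotonic clock so wall-clock changes cannot distort the offsets, records a start point on a user key press (S507), and waits for a preset one (S505). The clock and sleep functions are injectable so the logic can be tested without real delays.

```python
import time
from typing import Callable, Optional


class SyncRecorder:
    """Records offsets in seconds relative to the moment timing starts."""

    def __init__(self, clock: Callable[[], float] = time.monotonic):
        self._clock = clock
        self._t0: Optional[float] = None

    def start_timing(self) -> None:
        # Called together with the start of BGM output.
        self._t0 = self._clock()

    def elapsed(self) -> float:
        if self._t0 is None:
            raise RuntimeError("timing has not been started")
        return self._clock() - self._t0

    def mark(self) -> float:
        # S507: the elapsed time at the user's key press becomes start point 911.
        return self.elapsed()

    def wait_until(self, offset_s: float,
                   sleep: Callable[[float], None] = time.sleep) -> None:
        # S505: block until the elapsed time reaches a preset start point 911.
        remaining = offset_s - self.elapsed()
        if remaining > 0:
            sleep(remaining)
```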

次に、ステップＳ１４において、ＢＧＭ音量抑制フラグ９０２が設定されているか判定し、ＢＧＭ音量抑制フラグ９０２が設定されている場合には処理をＢＧＭ音量抑制ステップＳ１５に進め、ＢＧＭ音量抑制フラグ９０２が設定されていない場合には処理を音声再生開始ステップＳ５０８に進める。 Next, in step S14, it is determined whether or not the BGM volume suppression flag 902 is set. If the BGM volume suppression flag 902 is set, the process proceeds to the BGM volume suppression step S15, and the BGM volume suppression flag 902 is set. If not, the process proceeds to the audio reproduction start step S508.

次に、音声再生開始ステップＳ５０８では、録音音声をＢＧＭに重畳して出力する。 Next, in a sound reproduction start step S508, the recorded sound is superimposed on the BGM and output.
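Superimposing the recorded voice on the BGM, with the optional volume suppression of S15, can be sketched as a sample-wise sum. The text does not say how far the BGM is lowered or when it is restored, so the 0.3 gain and the choice to keep it lowered for the rest of the BGM are assumptions, as are the float samples in [-1, 1].

```python
from typing import List


def mix_voice_over_bgm(bgm: List[float], voice: List[float], start: int,
                       duck: bool, duck_gain: float = 0.3) -> List[float]:
    """Overlay `voice` onto `bgm` starting at sample index `start` (S14, S15, S508).

    duck_gain is an assumed value; the text only says the BGM volume is suppressed.
    """
    n = max(len(bgm), start + len(voice))
    out = [0.0] * n
    for i, b in enumerate(bgm):
        # S15: lower the BGM from the voice start onward when flag 902 is set.
        gain = duck_gain if duck and i >= start else 1.0
        out[i] = b * gain
    for i, v in enumerate(voice):
        out[start + i] += v
    # Clip so the sum stays inside the [-1, 1] sample range.
    return [max(-1.0, min(1.0, s)) for s in out]
```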

次に、録音音声時間長計算ステップＳ６０１では、録音音声７０４の時間長を計算する。 Next, in recording voice time length calculation step S601, the time length of the recording voice 704 is calculated.

ＢＧＭ出力終了指示取得ステップＳ１０３では、ユーザによるＢＧＭ出力終了の指示を取得する。 In the BGM output end instruction acquisition step S103, a BGM output end instruction by the user is acquired.

次に、ステップＳ１０５で、発声速度調整フラグ９０６が設定されているか判定し、発声速度調整フラグ９０６が設定されている場合には処理を話速変換ステップＳ６０２に進め、発声速度調整フラグ９０６が設定されていない場合には処理をステップＳ１０７に進める。 Next, in step S105, it is determined whether the speech rate adjustment flag 906 is set. If the speech rate adjustment flag 906 is set, the process proceeds to the speech rate conversion step S602, where the speech rate adjustment flag 906 is set. If not, the process proceeds to step S107.

話速変換ステップＳ６０２では、録音音声７０４の話速を変換する。話速変換後の録音音声の時間長は、（話速変換後の時間長）＝（ＢＧＭ出力終了点）−（音声再生開始点）となる。 In the speech speed conversion step S602, the speech speed of the recorded voice 704 is converted. The time length of the recorded voice after the speech speed conversion is (time length after the speech speed conversion) = (BGM output end point) − (voice reproduction start point).
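The target duration in S602 directly gives the stretch factor that a speech rate converter needs. A minimal Python sketch with a hypothetical function name, using seconds for all quantities:

```python
def speech_rate_ratio(voice_len_s: float, voice_start_s: float, bgm_end_s: float) -> float:
    """Return target/original duration: > 1 slows the speech down, < 1 speeds it up."""
    # (length after conversion) = (BGM output end point) - (voice playback start point)
    target = bgm_end_s - voice_start_s
    if target <= 0:
        raise ValueError("BGM output end point must come after the voice start point")
    if voice_len_s <= 0:
        raise ValueError("recorded voice must have a positive duration")
    return target / voice_len_s
```

For example, a 4 s recording started at 2 s that must end with the BGM at 8 s gives a ratio of 1.5; the ratio would then be handed to whatever speech rate conversion method is used.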

次に、第一音声再生終了待機ステップＳ６０３で音声再生の終了を待つ。音声再生が終了したら、処理を試聴開始指示取得ステップＳ２０１に移す。 Next, the end of the audio reproduction is awaited in the first audio reproduction end waiting step S603. When the audio reproduction is completed, the process proceeds to a trial listening start instruction acquisition step S201.

ＢＧＭ出力終了点計算ステップＳ１１１では、ＢＧＭ出力の終了点を計算する。ＢＧＭ出力終了点は、（ＢＧＭ出力終了点）＝（録音音声時間長）＋（音声再生開始点）として計算される。 In BGM output end point calculation step S111, the end point of BGM output is calculated as (BGM output end point) = (recorded voice duration) + (voice playback start point).

次に、ステップＳ１１２でＢＧＭ出力終了待ち時間９０５の正負を判定し、ＢＧＭ出力終了待ち時間９０５が正であれば第二音声再生終了待機ステップＳ６０４に処理を進め、ＢＧＭ出力終了待ち時間９０５が負であればＢＧＭ出力終了点待機ステップＳ１１５に処理を進める。 Next, in step S112, it is determined whether the BGM output end waiting time 905 is positive or negative. If the BGM output end waiting time 905 is positive, the process proceeds to the second audio reproduction end waiting step S604, and the BGM output end waiting time 905 is negative. If so, the process proceeds to BGM output end point waiting step S115.
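Steps S111 and S112 can be sketched as two small functions. The text only specifies the positive and negative cases of the waiting time 905; treating zero like the negative case is an assumption of this sketch, and the returned step labels are illustrative.

```python
def bgm_end_point(voice_len_s: float, voice_start_s: float) -> float:
    # S111: (BGM output end point) = (recorded voice duration) + (voice playback start point)
    return voice_len_s + voice_start_s


def next_step_after_s112(bgm_end_wait_s: float) -> str:
    # S112: positive waiting time 905 -> wait for voice playback to finish (S604);
    # negative -> wait for the BGM output end point (S115). Zero is assumed to go to S115.
    return "S604" if bgm_end_wait_s > 0 else "S115"
```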

第二音声再生終了待機ステップＳ６０４では、音声再生の終了を待つ。 In the second audio reproduction end waiting step S604, the end of the audio reproduction is awaited.

試聴開始指示取得ステップＳ２０１では、ユーザからの試聴開始の指示を取得する。 In the trial listening start instruction obtaining step S201, a trial listening start instruction from the user is obtained.

次に、ステップＳ７０１で応答メッセージ再生サブルーチンを呼び出す。応答メッセージ再生サブルーチンの処理内容については後述する。 In step S701, a response message reproduction subroutine is called. The processing contents of the response message reproduction subroutine will be described later.

次に、試聴結果取得ステップＳ２０３で、ユーザの試聴結果を取得する。 Next, in the audition result acquisition step S203, the user's audition result is acquired.

次に、ステップＳ２０４において、上記試聴結果取得ステップＳ２０３で取得された試聴結果が「やり直し」であるか判定し、試聴結果が「やり直し」であれば処理を上記録音音声初期化ステップＳ５０２に移し、試聴結果が「やり直し」でなければ処理をパラメータセーブステップＳ２０５に移す。 Next, in step S204, it is determined whether the audition result acquired in audition result acquisition step S203 is “redo”. If it is, the process returns to recorded voice initialization step S502; otherwise, the process proceeds to parameter saving step S205.

パラメータセーブステップＳ２０５終了後、応答メッセージ作成の処理を終了する。 After the parameter saving step S205 ends, the response message creation process ends.

次に、応答メッセージ再生サブルーチンの処理フローについて図１７を参照して説明する。 Next, the processing flow of the response message reproduction subroutine will be described with reference to FIG. 17.

本実施形態では、応答メッセージ再生サブルーチンの処理をイベント駆動型の処理として説明している。対象となるイベントは、「音声再生開始点に到達した際のタイマイベント」・「ＢＧＭ終了点に到達した際のタイマイベント」・「音声再生終了のソフトウェア割り込み」である。 In the present embodiment, the processing of the response message reproduction subroutine is described as event-driven processing. The target events are “timer event when reaching the audio playback start point”, “timer event when reaching the BGM end point”, and “software interrupt at the end of audio playback”.

次に、ステップＳ８０１で、上記イベント待機ステップＳ３０３で得られたイベント種別に基づいて処理の遷移先を決定する。イベント種別が「音声再生開始点に到達した際のタイマイベント」であればステップＳ３０５に処理を移し、イベント種別が「ＢＧＭ終了点に到達した際のタイマイベント」であればステップＳ３０９に処理を移し、イベント種別が「音声再生終了のソフトウェア割り込み」であればステップＳ３１３に処理を移す。 Next, in step S801, the branch destination is determined based on the event type obtained in event waiting step S303. If the event type is “timer event when the audio playback start point is reached”, the process proceeds to step S305; if it is “timer event when the BGM end point is reached”, the process proceeds to step S309; and if it is “software interrupt at the end of audio playback”, the process proceeds to step S313.

次に、音声再生開始ステップＳ８０２では、録音音声をＢＧＭに重畳して出力し、イベント待機ステップＳ３０３に処理を戻す。 Next, in a sound reproduction start step S802, the recorded sound is superimposed on the BGM and output, and the process returns to the event waiting step S303.

次に、ステップＳ８０３で、既に音声再生が終了しているか判定し、音声再生が終了していれば応答メッセージ再生サブルーチンを終了し、音声再生が終了していなければイベント待機ステップＳ３０３に処理を戻す。 Next, in step S803, it is determined whether audio reproduction has already ended. If it has, the response message reproduction subroutine ends; otherwise, the process returns to event waiting step S303.
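The event-driven subroutine can be simulated on a virtual clock with a priority queue of pending events. In this Python sketch the two timer events are scheduled up front and the voice-finished interrupt is scheduled when playback starts; the loop returns once both the BGM and the voice have ended, reflecting the S314 and S803 checks. Event labels and the simulation itself are illustrative assumptions, not a real-time implementation.

```python
import heapq
from typing import List, Tuple

VOICE_START = "timer: audio playback start point reached"
BGM_END = "timer: BGM end point reached"
VOICE_DONE = "software interrupt: audio playback finished"


def run_playback_subroutine(voice_start_s: float, voice_len_s: float,
                            bgm_end_s: float) -> List[Tuple[float, str]]:
    """Simulate the event loop and return the handled events in order."""
    queue = [(voice_start_s, VOICE_START), (bgm_end_s, BGM_END)]
    heapq.heapify(queue)
    log: List[Tuple[float, str]] = []
    voice_done = bgm_done = False
    while queue:
        t, event = heapq.heappop(queue)        # S303: wait for the next event
        log.append((t, event))
        if event == VOICE_START:               # S305 -> S802: overlay the voice
            heapq.heappush(queue, (t + voice_len_s, VOICE_DONE))
        elif event == BGM_END:                 # S309: stop (or fade out) the BGM
            bgm_done = True
            if voice_done:                     # S803: voice already finished?
                break
        elif event == VOICE_DONE:              # S313
            voice_done = True
            if bgm_done:                       # S314: BGM already finished?
                break
    return log
```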

なお、留守番電話等において応答メッセージを出力する際の処理については、実施形態１と同様であるため省略する。 The processing for outputting a response message in an answering machine or the like is the same as in the first embodiment, and its description is therefore omitted.

［他の実施形態］ [Other Embodiments]

上記各実施形態において、合成音声とＢＧＭあるいは録音音声とＢＧＭを応答メッセージ再生時に重畳するものとして説明したが、合成音声とＢＧＭあるいは録音音声とＢＧＭを重畳したデータを外部記憶装置７に保持し、応答メッセージ再生時には外部記憶装置７に保持されたデータを再生するようにしても良い。これにより、応答メッセージ再生時の計算量を削減することが可能である。特に、ＢＧＭがサウンドプロセッサ等の専用プロセッサを用いない場合（例えば、符号化されたＰＣＭデータ等）に有効である。 In each of the embodiments described above, the synthesized voice or the recorded voice is superimposed on the BGM when the response message is reproduced. Alternatively, data in which the synthesized voice or the recorded voice has already been superimposed on the BGM may be held in the external storage device 7, and that stored data may be reproduced when the response message is output. This reduces the amount of computation required during response message reproduction. It is particularly effective when the BGM is not rendered by a dedicated processor such as a sound processor (for example, when the BGM is encoded PCM data).
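Pre-mixing once and storing the result means answering only needs plain playback of a stored file. As a minimal sketch, the Python code below mixes float buffers and writes them with the standard `wave` module; the 16-bit mono WAV format and the use of any writable binary file in place of the external storage device 7 are assumptions.

```python
import io
import struct
import wave
from typing import BinaryIO, List


def premix(bgm: List[float], voice: List[float], start: int) -> List[float]:
    """Superimpose the voice on the BGM, starting at sample index `start`."""
    out = [0.0] * max(len(bgm), start + len(voice))
    for i, b in enumerate(bgm):
        out[i] += b
    for i, v in enumerate(voice):
        out[start + i] += v
    return out


def save_premixed(samples: List[float], rate: int, f: BinaryIO) -> None:
    """Store the mixed message as 16-bit mono PCM WAV so answering only plays it back."""
    pcm = [int(max(-1.0, min(1.0, s)) * 32767) for s in samples]
    with wave.open(f, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)          # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(struct.pack("<%dh" % len(pcm), *pcm))
```

Passing an `io.BytesIO` instead of a real file is convenient for checking the result in memory.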

また、上記各実施形態において、ＢＧＭ出力開始点が音声合成開始点あるいは音声再生開始点に先立つものとして説明したが、音声合成開始点あるいは音声再生開始点がＢＧＭ出力開始点に先立つことを許すような実装を行っても良い。 In each of the above embodiments, the BGM output start point was described as preceding the speech synthesis start point or the audio playback start point. However, an implementation may also allow the speech synthesis start point or the audio playback start point to precede the BGM output start point.

また、上記各実施形態において、ＢＧＭ出力終了点の後にフェードアウトが起こるように説明したが、フェードアウト終了とＢＧＭ出力終了点が同時になるような実装を行っても良い。 In each of the above embodiments, the fade-out was described as taking place after the BGM output end point. However, an implementation may instead make the end of the fade-out coincide with the BGM output end point.
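The two fade-out placements differ only in where the fade begins. This Python sketch assumes a linear fade of length `fade_len` seconds (the text does not specify the fade shape) and returns the BGM gain at time `t`.

```python
def bgm_gain(t: float, end_point: float, fade_len: float,
             fade_ends_at_end_point: bool) -> float:
    """Linear fade-out gain for the BGM at time t (seconds).

    False: the fade starts at the BGM output end point and runs after it.
    True:  the fade is shifted so that it finishes exactly at the end point.
    """
    fade_start = end_point - fade_len if fade_ends_at_end_point else end_point
    if t <= fade_start:
        return 1.0
    if t >= fade_start + fade_len:
        return 0.0
    return 1.0 - (t - fade_start) / fade_len
```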

また、上記各実施形態において、音声合成開始点と実際に音声合成が開始される時刻は必ずしも完全に一致させる必要はない。例えば、音声合成開始点近辺で音声合成を開始するのに適切な箇所を探索し、探索された箇所から音声合成を開始しても良い。音声合成を開始するのに適切な箇所の例として、ＢＧＭの音量が小さい箇所・ＢＧＭが定常的な箇所等が挙げられる。さらに、ＢＧＭが何らかの記述言語で記述されている場合、音符間の境界・小節間の境界等も音声合成を開始するのに適切な箇所の例として挙げられる。さらに、ＢＧＭ出力終了点と実際にＢＧＭ出力が終了する時刻についても同様である。 In each of the above embodiments, the speech synthesis start point and the time at which speech synthesis actually starts need not coincide exactly. For example, a location suitable for starting speech synthesis may be searched for near the speech synthesis start point, and speech synthesis may be started from the location found. Examples of such locations include points where the BGM volume is low and points where the BGM is steady. When the BGM is written in some description language, boundaries between notes or between measures are further examples. The same applies to the BGM output end point and the time at which BGM output actually ends.
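For sampled BGM, the "low BGM volume" criterion can be sketched as a search for the quietest frame near the nominal start point. The search radius, frame length, RMS as the loudness measure, and tie-breaking toward the nominal point are all assumptions of this Python sketch; it does not cover the note or measure boundaries mentioned for BGM written in a description language.

```python
import math
from typing import List


def quietest_start(bgm: List[float], nominal: int, search: int, frame: int) -> int:
    """Return the sample index within +/- `search` of `nominal` whose following
    `frame` samples have the lowest RMS level (a low-BGM-volume spot)."""
    lo = max(0, nominal - search)
    hi = min(len(bgm) - frame, nominal + search)
    if hi < lo:
        return nominal  # not enough BGM around the point; keep the nominal start

    def rms(i: int) -> float:
        seg = bgm[i:i + frame]
        return math.sqrt(sum(x * x for x in seg) / frame)

    # Ties go to the candidate closest to the nominal point.
    return min(range(lo, hi + 1), key=lambda i: (rms(i), abs(i - nominal)))
```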

【図１】本発明に係る第１の実施形態のメッセージ作成装置のハードウェア構成を示すブロック図である。 FIG. 1 is a block diagram showing the hardware configuration of the message creation apparatus according to the first embodiment of the present invention.
【図２】第１の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 2 is a flowchart showing the message creation processing of the first embodiment.
【図３】第１の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 3 is a flowchart showing the message creation processing of the first embodiment.
【図４】第１の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 4 is a flowchart showing the message creation processing of the first embodiment.
【図５】第１の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 5 is a flowchart showing the message creation processing of the first embodiment.
【図６】第１の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 6 is a flowchart showing the message creation processing of the first embodiment.
【図７】第１の実施形態の音声合成内容取得時のユーザインタフェース画面例を示す図である。 FIG. 7 shows an example user interface screen for acquiring the speech synthesis content in the first embodiment.
【図８】第１の実施形態の音声合成開始方法取得時のユーザインタフェース画面例を示す図である。 FIG. 8 shows an example user interface screen for acquiring the speech synthesis start method in the first embodiment.
【図９】第１の実施形態の音声合成終了方法取得時のユーザインタフェース画面例を示す図である。 FIG. 9 shows an example user interface screen for acquiring the speech synthesis end method in the first embodiment.
【図１０】第１の実施形態のＢＧＭと音声合成の同期情報を取得する際のユーザインタフェース画面例を示す図である。 FIG. 10 shows an example user interface screen for acquiring the synchronization information between the BGM and speech synthesis in the first embodiment.
【図１１】第１の実施形態の試聴開始指示を取得する際のユーザインタフェース画面例を示す図である。 FIG. 11 shows an example user interface screen for acquiring the audition start instruction in the first embodiment.
【図１２】第１の実施形態のユーザの試聴結果を取得する際のユーザインタフェース画面例を示す図である。 FIG. 12 shows an example user interface screen for acquiring the user's audition result in the first embodiment.
【図１３】第２の実施形態のメッセージ作成装置のハードウェア構成を示すブロック図である。 FIG. 13 is a block diagram showing the hardware configuration of the message creation apparatus of the second embodiment.
【図１４】第２の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 14 is a flowchart showing the message creation processing of the second embodiment.
【図１５】第２の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 15 is a flowchart showing the message creation processing of the second embodiment.
【図１６】第２の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 16 is a flowchart showing the message creation processing of the second embodiment.
【図１７】第２の実施形態のメッセージ作成処理を示すフローチャートである。 FIG. 17 is a flowchart showing the message creation processing of the second embodiment.

Claims

応答音と背景音とを合成して応答メッセージを作成する情報処理装置であって、
前記応答音及び背景音の各データを取得する取得手段と、
前記背景音を出力する出力手段と、
前記応答音の出力開始点を指示する開始点指示手段と、
前記出力された背景音と前記指示された応答音の出力開始点とに基づいて算出した、当該応答音と背景音間の同期情報を保持する同期情報保持手段とを具備することを特徴とする情報処理装置。
An information processing apparatus that creates a response message by synthesizing a response sound and a background sound, comprising:
acquisition means for acquiring data of each of the response sound and the background sound;
output means for outputting the background sound;
start point instruction means for instructing an output start point of the response sound; and
synchronization information holding means for holding synchronization information between the response sound and the background sound, the synchronization information being calculated on the basis of the output background sound and the instructed output start point of the response sound.

前記背景音の出力終了点を指示する終了点指示手段と、前記指示された背景音の出力終了点を保持する終了点保持手段とを更に備えることを特徴とする請求項１に記載の情報処理装置。 The information processing apparatus according to claim 1, further comprising: end point instruction means for instructing an output end point of the background sound; and end point holding means for holding the instructed output end point of the background sound.

前記応答音の出力開始点及び前記背景音の出力終了点から当該応答音の出力時間長を算出する手段を更に備えることを特徴とする請求項２に記載の情報処理装置。 The information processing apparatus according to claim 2, further comprising means for calculating an output time length of the response sound from an output start point of the response sound and an output end point of the background sound.

前記応答音は記号で表現され、音声合成により出力されることを特徴とする請求項１乃至３のいずれか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 1 to 3, wherein the response sound is expressed by symbols and output by speech synthesis.

前記応答音は音声データで表現され、前記応答音は当該音声データにより再生されることを特徴とする請求項１乃至３のいずれか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 1 to 3, wherein the response sound is expressed as audio data and is reproduced from the audio data.

前記同期情報に基づいて、前記応答音の出力開始点を判定する判定手段と、
前記判定結果に基づいて、前記応答音を前記背景音に重畳させて音声出力する音声出力手段とを具備することを特徴とする請求項１記載の情報処理装置。
The information processing apparatus according to claim 1, further comprising:
determination means for determining the output start point of the response sound on the basis of the synchronization information; and
audio output means for outputting the response sound superimposed on the background sound on the basis of the determination result.

前記背景音の出力終了点を指示する終了点指示手段と、前記指示された背景音の出力終了点を保持する終了点保持手段とを更に備え、
前記音声出力手段は、前記背景音の出力終了点以降は応答音のみを音声出力するように前記背景音の出力を終了することを特徴とする請求項６に記載の情報処理装置。
The information processing apparatus according to claim 6, further comprising end point instruction means for instructing an output end point of the background sound, and end point holding means for holding the instructed output end point of the background sound,
wherein the audio output means ends the output of the background sound so that only the response sound is output after the output end point of the background sound.

前記背景音の出力終了点を指示する終了点指示手段と、前記指示された背景音の出力終了点を保持する終了点保持手段とを更に備え、
前記音声出力手段は、前記背景音の出力終了点以降は背景音の音声出力を変更することを特徴とする請求項６に記載の情報処理装置。
The information processing apparatus according to claim 6, further comprising end point instruction means for instructing an output end point of the background sound, and end point holding means for holding the instructed output end point of the background sound,
wherein the audio output means changes the audio output of the background sound after the output end point of the background sound.

前記応答音の出力開始点及び前記背景音の出力終了点から当該応答音の出力時間長を算出する手段を更に備え、
前記音声出力手段は、前記出力時間長に基づいて前記応答音を出力することを特徴とする請求項７に記載の情報処理装置。
The information processing apparatus according to claim 7, further comprising means for calculating an output time length of the response sound from the output start point of the response sound and the output end point of the background sound,
wherein the audio output means outputs the response sound on the basis of the output time length.

前記同期情報に基づいて、前記背景音の出力音量を変更する手段を更に備えることを特徴とする請求項７に記載の情報処理装置。 The information processing apparatus according to claim 7, further comprising a unit that changes an output volume of the background sound based on the synchronization information.

前記応答音は記号で表現され、音声合成により出力されることを特徴とする請求項６乃至１０のいずれか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 6 to 10, wherein the response sound is expressed by symbols and output by speech synthesis.

前記応答音は音声データで表現され、前記応答音は当該音声データにより再生されることを特徴とする請求項６乃至１０のいずれか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 6 to 10, wherein the response sound is expressed as audio data and is reproduced from the audio data.

前記応答音は音声データで表現され、前記音声出力手段は、前記出力時間長に基づいて前記音声データを話速変換したデータを再生することを特徴とする請求項９に記載の情報処理装置。 The information processing apparatus according to claim 9, wherein the response sound is expressed as audio data, and the audio output means reproduces data obtained by converting the speech rate of the audio data on the basis of the output time length.

応答音と背景音とを合成して応答メッセージを作成する情報処理方法であって、
前記応答音及び背景音の各データを取得する取得工程と、
前記背景音を出力する出力工程と、
前記応答音の出力開始点を指示する開始点指示工程と、
前記出力された背景音と前記指示された応答音の出力開始点とに基づいて算出した、当該応答音と背景音間の同期情報を保持する同期情報保持工程とを備えることを特徴とする情報処理方法。
An information processing method for creating a response message by synthesizing a response sound and a background sound, comprising:
an acquisition step of acquiring data of each of the response sound and the background sound;
an output step of outputting the background sound;
a start point instruction step of instructing an output start point of the response sound; and
a synchronization information holding step of holding synchronization information between the response sound and the background sound, the synchronization information being calculated on the basis of the output background sound and the instructed output start point of the response sound.

請求項１４に記載の方法を、コンピュータに実行させるためのプログラム。 A program for causing a computer to execute the method according to claim 14.

請求項１５に記載のプログラムを記憶したコンピュータ読み取り可能な記憶媒体。 A computer-readable storage medium storing the program according to claim 15.