JP4042580B2

JP4042580B2 - Terminal device for speech synthesis using pronunciation description language

Info

Publication number: JP4042580B2
Application number: JP2003018891A
Authority: JP
Inventors: 清志山木
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2003-01-28
Filing date: 2003-01-28
Publication date: 2008-02-06
Anticipated expiration: 2023-01-28
Also published as: KR20060093089A; KR20040069270A; JP2004234096A; HK1064786A1; CN1517978A; CN1267888C; KR100754571B1

Description

【０００１】
【発明の属する技術分野】
本発明は、電子メールの送受信が可能な端末装置に関し、特に、発音記述言語による音声合成をする端末装置に関する。
【０００２】
【従来の技術】
従来より、テキストで記述された文章を音声合成する技術が利用されている。現在では、さらに、合成する音声にその抑揚（イントネーション）等を付加し、より自然な人の発音ができる技術が開発されている。
一方、携帯電話機やパーソナル・コンピュータ等では、電子メールの着信を音声合成により知らせたり、テキストで記述された電子メールを音声合成して読み上げることができるようになっている。例えば、特許文献１には、電子メールの本文に添付ファイル参照用コード／文字列を挿入し、テキスト本文の読み上げに続き、挿入された所定のコード／文字列に基づき、該所定のコード／文字列に対応づけられた音楽データまたは画像データからなる添付ファイルを参照して該添付ファイルデータの再生及び／または表示をさせることで、伝わりにくい感情、楽しさ等を伝える技術が開示されている。また、この特許文献１では、所定のコード／文字列に対応する音楽データまたは画像データを、送信すべき電子メールに添付ファイルとして自動添付するものとなっている。
【０００３】
【特許文献１】
特開２００２−０７３５０７号公報
【０００４】
【発明が解決しようとする課題】
しかしながら、上記特許文献１に記載の技術は、電子メールの本文中に挿入された所定のコード等に対応する、予め用意された音楽データや画像データからなる添付ファイルを生成し、これを電子メールとともに送信し、受信側で、メール本文に挿入された所定のコード等に基づき、添付ファイルの音楽データや画像データを再生するもので、音声合成に関しては、テキスト本文を音声合成するのみで、これに音楽データや画像データの再生機能を付加したにすぎない。すなわち、送信側のユーザが音声合成させる言葉やその抑揚等を任意に指定し、受信側で再生させるものではない。
【０００５】
また、音楽データや画像データを添付ファイルに含めることで、メール本文中には、所定のコードや文字列が挿入されるのみですむものとなっているが、挿入される所定のコード等が意味のある文字列である場合はともかく、ユーザにとって意味のない制御コードである場合には、文字化けと勘違いされたり、見た目もよくない。特に、受信側が、携帯電話機等の小型の携帯型端末装置である場合には、その表示画面が小さいことから、より好ましくない。また、挿入される文字列が意味あるものであって、見た目に問題なくとも、この文字列に対応して再生されるのは予め用意された音楽データや画像データである。
【０００６】
本発明は、上記の点に鑑みてなされたもので、テキストで記述される音声発音用の発音記述言語により、電子メールの受信側にて、指定された言葉のその抑揚をも含む音声合成ができ、さらに、受信側に送る電子メール自体は、読み易く読み手に不快感を与えないものとすることができる、発音記述言語による音声合成をする端末装置を提供するものである。
【０００７】
【課題を解決するための手段】
請求項１に記載の発明は、電子メールの送受信が可能な端末装置において、テキストで記述され、指定された言葉を音声化する際の抑揚をも規定する音声発音用の発音記述言語による特殊文字列を解釈し、該特殊文字列で規定される音声を音声合成する音声合成手段と、前記電子メールを表示する表示手段とを備え、発信動作においては、作成された電子メールに前記特殊文字列が記述されている場合、前記電子メールに記述された前記特殊文字列を識別し、前記特殊文字列のみを記述した別ファイルを作成し、これを前記電子メールの添付ファイルとするとともに、前記電子メールから該特殊文字列を削除し、受信動作においては、受信した電子メールの内容を前記表示手段に表示し、該電子メールに添付された前記添付ファイルが開かれると、該添付ファイルに記述された前記特殊文字列を前記音声合成手段によって解釈し音声合成することを特徴としている。
【０００８】
本発明では、送信側の端末装置にて、ユーザが、テキストで記述される上記発音記述言語を用いて、送信する電子メールに、音声合成させたい言葉とその抑揚（イントネーション）等とを規定する特殊文字列を記述し、この電子メールを受信者宛に送信すると、受信側の端末装置は、受信した電子メールに含まれる前記特殊文字列を解釈し音声合成する。この発明では、従来の電子メールの表示に加え、上記特殊文字列を用いた音声合成によるさらなる表現効果を提供することができる。
なお、上記発音記述言語は、発音させる言葉（文字列）と、この文字列を構成する各かな文字のそれぞれに対し、抑揚を付加する場合にはこの抑揚を規定する記号をその文字に対応して付加することで上記特殊文字列を記述する構文とすることが好ましい。このようにすることで、１文字単位に抑揚を付加することができる。
【００１０】
本発明では、発信者側にて作成された電子メール内に記述された特殊文字列が所定の方法で記述されている場合、発信側の端末装置が、この特殊文字列を記述した別ファイルを作成し、この電子メールからこの特殊文字列を削除する。そして、この別ファイルを電子メールの添付ファイルとして、電子メールとともに受信側に向け送信する。受信側の端末装置は、受信した電子メールに添付された添付ファイルが開かれると、この添付ファイルに記録された特殊文字列を解釈し音声合成する。
このように、発信側の端末装置では、上記特殊文字列を、電子メールから添付ファイルに移動（添付ファイルに記述し記録するとともに、電子メールから削除）させ、受信側の端末装置では、添付ファイルを開くことによりその音声合成をすることで、受信側で電子メールを表示する際、上記特殊文字列の表示を除くことができ、電子メールの表示に係る見た目を損なうことがない。
【００１１】
また、請求項２に記載の発明は、請求項１に記載の端末装置において、前記特殊文字列の前後に該特殊文字列を識別するための専用制御文字を記述することを特徴としている。
本発明では、専用制御文字を特殊文字列の前後に記述することにより、特殊文字列を識別できるようにしている。
【００１２】
また、請求項３に記載の発明は、請求項２に記載の端末装置において、発信側の端末装置は、前記専用制御文字からなる記述を、予め決められた汎用の絵文字に置換することを特徴としている。
本発明では、専用制御文字からなる記述を、予め決められた汎用の絵文字等に置換することにより、受信側の端末装置では、送信側で記述された電子メール内の専用制御文字部分が、汎用の絵文字等となって表示される。すなわち、制御上の意味しかもたない専用制御文字に代えて、遊び心のある楽しい表現をすることができる。
【００１３】
また、請求項４に記載の発明は、請求項１に記載の端末装置において、前記添付ファイルが添付された電子メールが開かれると、これに応じて前記添付ファイルを開くことを特徴としている。
本発明では、電子メールが開かれた時点で、添付ファイルの特殊文字列が解釈され音声合成されるので、受信側のユーザは、自身で添付ファイルを開く必要がなく、また、特殊文字列による記述が別ファイル（添付ファイル）となっていても、電子メールの内容の表示と特殊文字列による音声合成とをほぼ同時に行わせることができる。
【００１４】
【発明の実施の形態】
以下、本発明の実施の形態を、図面を参照して説明する。
図１は、本発明を適用した一実施の形態である携帯電話機の構成を示すブロック図である。なお、本発明は、携帯電話機に限らず、ＰＨＳ（登録商標）（Ｐｅｒｓｏｎａｌｈａｎｄｙｐｈｏｎｅｓｙｓｔｅｍ）や、携帯情報端末（ＰＤＡ：ＰｅｒｓｏｎａｌＤｉｇｉｔａｌＡｓｓｉｓｔａｎｔ）、パーソナル・コンピュータ等にも適用できるものである。
【００１５】
図１において、符号１１は、ＣＰＵ（中央処理装置）であり、各種プログラムを実行することにより携帯端末装置１の各部の動作を制御する。
符号１２は、通信部であり、この通信部１２に備わるアンテナ１２ａで受信された信号の復調を行うとともに、送信する信号を変調してアンテナ１２ａに供給している。
上記ＣＰＵ・１１は、通信部１２で復調されたインターネット等のネットワークからの信号を、所定のプロトコルに従って復号化をし、この信号に含まれる情報（例えば、電子メール）を下記の表示部２１に表示させるとともに、電子メール（またはその添付ファイル）に後述する特殊文字列がある場合、この特殊文字列で規定される音声を下記の音声合成機能付音源部１６に音声合成させる。また、送信する電子メール等のデータは、ＣＰＵ・１１により所定のプロトコルによる符号化が行われ、通信部１２にて変調された後、送信先のサーバ（電子メールの場合は、メール・サーバ）に向け、アンテナ１２ａから基地局に送信される。
【００１６】
符号１３は、音声処理部である。通信部１２で復調された電話回線の音声信号は、この音声処理部１３において復号され、スピーカ１４から出力される。一方、マイク１５から入力された音声信号はデジタル化され音声処理部１３において圧縮符号化される。そして、通信部１２にて変調されアンテナ１２ａから携帯電話網の基地局へ出力される。音声処理部１３は、例えばＣＥＬＰ（ＣｏｄｅＥｘｃｉｔｅｄＬＰＣ）系やＡＤＰＣＭ（適応差分ＰＣＭ符号化）方式により、音声データを高能率圧縮符号化／復号化している。
【００１７】
符号１６は、音声合成機能付音源部であり、着信音として選択された楽曲データを再生しスピーカ１７から放音する。また、発音させる言葉を構成する各文字の各音素に対応する所定の音声データ（この音声データは、声質・音程等に影響するパラメータを含む）を受けた場合には、ＣＰＵ・１１からの制御を受けて、これを音声合成しスピーカ１７から発音（発声）する。この音声合成機能付音源部１６による音声合成方式は任意であるが、例えば、特公昭５８−５３３５１号公報に開示されたＣＳＭ音声合成の技術をＦＭ音源に適用することで実現できる。
また、符号１８は、操作部であり、携帯電話機１の本体に設けられた英数字のボタンを含む各種ボタン（図示せず）やその他の入力デバイスからの入力を検知する入力手段である。
【００１８】
符号１９は、ＲＡＭ（ＲａｎｄｏｍＡｃｃｅｓｓＭｅｍｏｒｙ）であり、ＣＰＵ・１１のワークエリアや、ダウンロードされた楽曲データや伴奏データ（これらは着信メロディの再生等に用いる）の格納エリアや、受信した電子メールのデータが格納されるメールデータ格納エリア等がさらに設定される。
符号２０は、ＲＯＭ（ＲｅａｄＯｎｌｙＭｅｍｏｒｙ）である。このＲＯＭ・２０は、ＣＰＵ・１１が実行する、発信・着信等の制御をする各種電話機能プログラムや楽曲再生処理を補助するプログラムや、電子メールの送受信を制御するメール送受信機能プログラムや、音声合成処理を補助するプログラム等のプログラムの他、各音素毎の音声データや楽音データ等の各種データが格納されている。
【００１９】
また、表示部２１は、ＬＣＤ（ＬｉｑｕｉｄＣｒｙｓｔａｌＤｉｓｐｌａｙ）等からなり、ＣＰＵ・１１の制御により、メニュー等の表示や、電子メールの内容等の表示や、操作部１８の操作に応じた表示をする表示器である。
符号２２は、着信時に着信音に代えて携帯電話機１の本体を振動させることにより、着信をユーザに知らせるバイブレータである。
なお、各機能ブロックはバス３０を介してデータや命令の授受を行っている。
【００２０】
ここで、言葉を音声化する際の抑揚等をも規定する音声発音用の発音記述言語による特殊文字列について説明する。
図２に、特殊文字列を含む文章（電子メールの本文）の一例を示している。この例では、同図の符号▲１▼の専用制御文字で挟まれた「か＿３さがほ＾５し＿４い’４ね＄２ー」の部分が特殊文字列を含む文章であり、他の部分が通常のテキスト文である。この特殊文字列を含む文章「か＿３さがほ＾５し＿４い’４ね＄２ー」は、「かさがほしいねー」という言葉にイントネーションを付加して音声合成させるための発音記述言語による記述である。この例に記述された記号「’」、「＾」、「＿」、「＄」等は、文字（かな文字）に付加するイントネーションの種別を示すもので、この記号の後の文字（この記号の直後に数値がある場合は、この数値に続く文字）に対して、所定のアクセントを付加するものである。
【００２１】
図３（ａ）に、本発音記述言語における各記号（代表例）の意味を示している。すなわち「’」は、語頭でピッチを上げ（図３（ｂ）▲１▼参照）、「＾」では発音中ピッチを上げ（図３（ｃ）▲３▼参照）、「＿」は、語頭でピッチを下げ（図３（ｂ）▲２▼参照）、「＄」では、発音中ピッチを下げるように（図３（ｃ）▲４▼参照）音声合成を行うことを意味している。
また、上記記号の直後に数値が付加される場合は、その数値は付加するアクセントの変化量を指定するものである。例えば、「か＿３さが」では、「さ」を語頭でピッチを３の量だけ下げることを示し、「が」をその下げたピッチで発音し、「か」は、標準の高さで発音することを示す。
【００２２】
このように、本発音記述言語は、発音させる言葉に含まれる文字にアクセント（イントネーション）を付加する場合に、その文字の直前に、図２に示すような記号（さらには、イントネーションの変化量を示す数値）を付加する記述をする構文となっている。なお、本実施の形態ではピッチを制御する記号についてのみ説明したが、これら以外に音の強弱、速度、音質等を制御する記号を用いることもできる。また、この特殊文字列は、図２に示す例のように、電子メールの本文中に記述しても、電子メールのタイトル部分に記述してもよいし、あるいは、所定の添付ファイル（例えば、その拡張子により、特殊文字列が含まれていることが識別できる添付ファイル）の中に記述し、送信する電子メールに添付するようにしてもよい。
【００２３】
次に、このように構成された本実施形態の携帯電話機１の動作について説明する。なお、通常の電話機能による発信・着信時の動作や電子メールの送受信等に係る動作については、周知の技術でありその説明は省略する。また、以下では、上記発音記述言語による特殊文字列にて記載された文をＨＶ−Ｓｃｒｉｐｔと称する。
【００２４】
［送信側の動作］
まず、送信者は、メール文作成時に、発音させたいＨＶ−Ｓｃｒｉｐｔを電子メールの本文中のいずれかの場所に（タイトル欄でもよい）、テキストで記述する（図４（ａ）参照）。このとき、送信者は、さらに専用制御文字（ここでは、図４（ｂ）の▲１▼に示す文字とする）で、このＨＶ−Ｓｃｒｉｐｔを挟むように入力する。このようにして、ＨＶ−Ｓｃｒｉｐｔを含む電子メール文が作成される。このように、文章としては読みづらい、特殊な文字列をメール文作成時には記述するが、このＨＶ−Ｓｃｒｉｐｔを添付ファイルに移動することで、電子メール自体は相手に読みやすいメール文としている。これにより、本実施の形態の携帯電話機１のみならず、本実施の形態の機能をもたない一般の携帯電話機上でも、通常のメール文と同様に表示され、読み手に不快感を与えないものとなる。
【００２５】
次に、電子メール文の作成が終わると、送信者はこの電子メールを送信する操作をする。以下は、送信側携帯電話機１の動作である（図５参照）。
送信側の携帯電話機１は、ステップＳ０１にて、メール送信操作が有るか否か判断し、メール送信操作が行われるまで待機している。
上記のように、送信者によりメール送信操作が行われると、ステップＳ０１の判断でＹｅｓと判定され、ステップＳ０２に移行する。
ステップＳ０２では、メール内に専用制御文字が有るか否かの判断をする。メール内に専用制御文字がない場合は、この判断でＮｏと判定され、ステップＳ０３に移行し、ステップＳ０３にて電子メールの送信がなされ終了する。
ここでは、上記のように、メール文中に専用制御文字が記述されているので、ステップＳ０２の判断でＹｅｓと判定され、ステップＳ０４に移行する。
【００２６】
ステップＳ０４では、新規に添付ファイルを作成する。この添付ファイルのファイル名は適宜付けられるが、拡張子には、ＨＶ−Ｓｃｒｉｐｔを含むファイルであることを示すために専用の拡張子（例えば、．ｈｖｓ）を付ける。
そして、ステップＳ０５にて、メール文中の専用制御文字に挟まれたＨＶ−Ｓｃｒｉｐｔを添付ファイルに移動させる（ここでは、専用制御文字に挟まれた文字列をＨＶ−Ｓｃｒｉｐｔとみなし、このＨＶ−Ｓｃｒｉｐｔをメール文中から抽出して、添付ファイルに記述し記録するとともに、この電子メールから削除する）。
【００２７】
次に、ステップＳ０６では、さらに、メール文中の専用制御文字の組を、所定の汎用絵文字（ここでは、図４（ｃ）の▲２▼に示す絵文字とする）に変更（置換）する。
そして、ステップＳ０７にて、ステップＳ０６にて変更を加えられた電子メールと、ステップＳ０４，Ｓ０５にて作成された添付ファイルを、指定されたアドレス送信する。なお、ステップＳ０５，Ｓ０６の処理は、別の所定の操作受けることにより、この処理のみ別途行うようにしてもよい。また、送信側の携帯電話機１に残す電子メールは、入力時のものであっても、上記処理後の電子メールと添付ファイルの組であってもよい。
【００２８】
［受信側の動作］
次に、受信側の携帯電話機１の動作を、図６のフローチャートを用いて説明する。
受信側の携帯電話機１は、ステップＳ１１にて、メールの表示操作が有るか否か判断し、ユーザによるメールの表示操作が行われるまで待機している。
ここで、受信者（ユーザ）により、メールの表示操作がなされたとする。すると、ステップＳ１１の判断で、Ｙｅｓと判定されステップＳ１２に移行する。
ステップＳ１２では、受信した電子メールの内容を表示部２１に表示する。
【００２９】
次に、ステップＳ１３にて、メール内にＨＶ−Ｓｃｒｉｐｔが有るか否か判断する。ここでは、メール内に専用制御文字の組が有るか否かでこの判断を行う。この判断で、Ｙｅｓ、すなわちメール内に専用制御文字の組が有り、ＨＶ−Ｓｃｒｉｐｔが有ると判定されると、ステップＳ１４に移行する。
ステップＳ１４では、このＨＶ−Ｓｃｒｉｐｔで指定された言葉を構成する各文字の各音素に対応する音声データを、ＲＯＭ・２０から読み出し、さらにこのＨＶ−Ｓｃｒｉｐｔに記述された、言葉にアクセントを付加する記号に基づき、発音させる音声にイントネーションをもたせるように、音声合成機能付音源１６に音声データ（このときこの音声データ自体を、その音程等を変えるために加工する場合もある）を与えるとともに所定の制御をし、音声合成をさせる。このように、メール文中にＨＶ−Ｓｃｒｉｐｔがある場合は、これを解釈し、すぐに音声合成をして発音する。
【００３０】
ステップＳ１３の判断で、Ｎｏと判定された場合、すなわち、メール内にＨＶ−Ｓｃｒｉｐｔの記述がない場合、ステップＳ１５に移行する。
ステップＳ１５では、「．ｈｖｓ」の拡張子をもつ添付ファイルが添付されているか判断する。この判断でＮｏと判定された場合、すなわち、ＨＶ−Ｓｃｒｉｐｔが全く無い場合、音声合成はされずそのまま終了する。
一方、ステップＳ１５の判断で、Ｙｅｓと判定された場合、すなわち「．ｈｖｓ」の拡張子をもつ添付ファイルが添付されている場合、ステップＳ１６に移行する。
【００３１】
ステップＳ１６では、使用するメーラーにて、「．ｈｖｓ」の拡張子をもつ添付ファイルを自動で開く設定になっているか判断する。
この判断で、Ｙｅｓと判定された場合、すなわち自動で添付ファイルを開く設定となっている場合、ステップＳ１７に移行し、ファイルを開く。
一方、この判断で、Ｎｏと判定された場合、すなわち自動で添付ファイルを開く設定となっていない場合、ステップＳ１８に移行する。
【００３２】
ステップＳ１８では、受信者による添付ファイルの展開操作が有るか否かの判断をし、受信者が添付ファイルを開く操作をするのを待機する。そして、受信者が添付ファイルを開く操作をすると、Ｙｅｓと判定されステップＳ１７に移行し、ファイルを開く。
そして、ステップＳ１９では、ステップＳ１４と同様にして、ＨＶ−Ｓｃｒｉｐｔに基づき音声合成をし、指定された言葉に指定されたアクセントを付加して発音する。
なお、上記で説明した各動作フローは一例であり、本発明は、上記の処理の流れに限定されるものではない。
【００３３】
以上、この発明の実施形態を、図面を参照して詳述してきたが、この発明の具体的な構成はこの実施形態に限られるものではなく、この発明の要旨を逸脱しない範囲の構成等も含まれる。例えば、上記実施の形態では、直後の文字のアクセントを規定する記号とそのアクセントの変化量を規定する数値を用いた構文からなる発音記述言語を用いているが、この発音記述言語で規定する構文は、これに限るものではない。例えば、これらの記号等を文字の直後に記述するような構文であってもよい。
また、メール作成時に、メール文中にＨＶ−Ｓｃｒｉｐｔ（特殊文字列）を記述し、この特殊文字列を添付ファイルに移動させているが、発信者が自身で、この添付ファイルを別に作成するようにしてもよい。
また、受信者側において添付ファイルを自動で展開する設定は、メーラーでの設定に限らず、添付ファイル自体にその設定が含まれていてもよい。
【００３４】
【発明の効果】
以上、詳細に説明したように、本発明によれば、従来の電子メールの表示に加え、上記特殊文字列を用いることで音声合成によるさらなる表現効果を提供することができる。
また、本発明によれば、送信側で、上記特殊文字列を記述した別ファイルを作成し、これを電子メールの添付ファイルとするとともに、この電子メールから該特殊文字列を削除し、受信側で、添付ファイルを開くことによりその音声合成をするので、受信側で電子メールを表示する際、上記特殊文字列の表示を除くことができ、電子メールの表示に係る見た目を損なうことがなく、読み手に不快感を与えない。
【００３５】
また、本発明によれば、特殊文字列を識別する専用制御文字からなる記述を、汎用の絵文字等に置換するので、遊び心のある楽しい表現をすることができる。また、本発明によれば、電子メールが開かれた時点で、添付ファイルの特殊文字列が解釈され音声合成されるので、受信側のユーザは、自身で添付ファイルを開く必要がなく、また、特殊文字列による記述が別ファイル（添付ファイル）となっていても、電子メールの内容の表示と特殊文字列による音声合成とをほぼ同時に行わせることができる。
【図面の簡単な説明】
【図１】本発明を適用した一実施の形態である携帯電話機の構成を示すブロック図である。
【図２】同実施の形態における特殊文字列を含む文章（電子メールの本文）の一例である。
【図３】同実施の形態における発音記述言語における各記号（代表例）の意味を説明する図である。
【図４】同実施の形態において、送信側におけるメール文の作成からその変換と添付ファイルの作成までの、メール文と添付ファイルの内容例である。
【図５】同実施の形態の送信側携帯電話機の動作を説明するフローチャートである。
【図６】同実施の形態の受信側携帯電話機の動作を説明するフローチャートである。
【符号の説明】
１…携帯電話機（端末装置）、１１…ＣＰＵ、１２…通信部、１２ａ…アンテナ、１３…音声処理部、１４，１７…スピーカ、１５…マイク、１６…音声合成機能付音源、１８…操作部、１９…ＲＡＭ、２０…ＲＯＭ、２１…表示部、２２…バイブレータ、３０…バス
[0001]
[Technical field of the invention]
The present invention relates to a terminal device capable of sending and receiving electronic mail, and more particularly to a terminal device that performs speech synthesis using a pronunciation description language.
[0002]
[Prior art]
Conventionally, techniques for synthesizing speech from sentences written as text have been used. More recently, techniques have been developed that add intonation and the like to the synthesized speech, enabling more natural, human-like pronunciation.
Meanwhile, mobile phones, personal computers, and the like can announce the arrival of an e-mail by speech synthesis, or read aloud an e-mail written as text by speech synthesis. For example, Patent Document 1 discloses a technique in which an attached-file reference code/character string is inserted into the body of an e-mail and, following the reading aloud of the text body, an attached file made up of music data or image data associated with the inserted predetermined code/character string is referenced and its data is played back and/or displayed, thereby conveying emotions, enjoyment, and the like that are otherwise difficult to communicate. In Patent Document 1, the music data or image data corresponding to the predetermined code/character string is also automatically attached, as an attached file, to the e-mail to be transmitted.
[0003]
[Patent Document 1]
Japanese Patent Laid-Open No. 2002-073507
[0004]
[Problems to be solved by the invention]
However, the technique described in Patent Document 1 generates an attached file made up of music data or image data prepared in advance that corresponds to a predetermined code or the like inserted in the body of the e-mail, transmits it together with the e-mail, and, on the receiving side, plays back the music data or image data of the attached file based on the predetermined code or the like inserted in the mail body. As far as speech synthesis is concerned, it merely synthesizes the text body and adds a playback function for music data and image data. In other words, it does not let the sending user freely specify the words to be synthesized, their inflection, and so on, and have them reproduced on the receiving side.
[0005]
In addition, because the music data or image data is carried in the attached file, only a predetermined code or character string needs to be inserted into the mail body. That is acceptable when the inserted code or the like is a meaningful character string, but when it is a control code that is meaningless to the user, it may be mistaken for garbled text, and it looks unattractive. This is all the more undesirable when the receiving side is a small portable terminal device such as a mobile phone, because its display screen is small. Moreover, even when the inserted character string is meaningful and poses no problem in appearance, what is played back in response to it is still only music data or image data prepared in advance.
[0006]
The present invention has been made in view of the above points. It provides a terminal device that performs speech synthesis using a pronunciation description language, in which a pronunciation description language for speech, written as text, enables the receiving side of an e-mail to synthesize specified words together with their inflection, and in which the e-mail itself sent to the receiving side can be kept easy to read and free of anything that would annoy the reader.
[0007]
[Means for Solving the Problems]
The invention according to claim 1 is a terminal device capable of transmitting and receiving e-mail, comprising: speech synthesis means that interprets a special character string written as text in a pronunciation description language for speech, which also defines the inflection to be used when voicing specified words, and synthesizes the speech defined by the special character string; and display means that displays the e-mail. In the sending operation, when the special character string is written in a created e-mail, the terminal device identifies the special character string written in the e-mail, creates a separate file containing only the special character string, makes this file an attached file of the e-mail, and deletes the special character string from the e-mail. In the receiving operation, the terminal device displays the content of a received e-mail on the display means and, when the attached file attached to the e-mail is opened, interprets the special character string written in the attached file with the speech synthesis means and synthesizes speech.
[0008]
In the present invention, on the sending terminal device, the user uses the above pronunciation description language, which is written as text, to write in the e-mail to be sent a special character string defining the words to be synthesized and their inflection (intonation) and the like. When this e-mail is sent to the recipient, the receiving terminal device interprets the special character string included in the received e-mail and synthesizes speech. The invention can thus provide, in addition to the conventional display of an e-mail, a further expressive effect through speech synthesis using the special character string.
The pronunciation description language preferably has a syntax in which the special character string is written as the words (character string) to be pronounced and, for each kana character of that character string to which inflection is to be added, a symbol defining the inflection attached in association with that character. This allows inflection to be added one character at a time.
[0010]
In the present invention, when a special character string written in an e-mail created by the sender is written in a predetermined manner, the sending terminal device creates a separate file in which the special character string is written and deletes the special character string from the e-mail. The separate file is then sent to the receiving side together with the e-mail, as an attached file of the e-mail. When the attached file attached to the received e-mail is opened, the receiving terminal device interprets the special character string recorded in the attached file and synthesizes speech.
In this way, the sending terminal device moves the special character string from the e-mail to the attached file (writes and records it in the attached file and deletes it from the e-mail), and the receiving terminal device synthesizes the speech when the attached file is opened. As a result, the special character string is not shown when the e-mail is displayed on the receiving side, and the appearance of the displayed e-mail is not impaired.
[0011]
The invention according to claim 2 is the terminal device according to claim 1, characterized in that dedicated control characters for identifying the special character string are written before and after the special character string.
In the present invention, the special character string can be identified because the dedicated control characters are written before and after it.
[0012]
The invention according to claim 3 is the terminal device according to claim 2, characterized in that the sending terminal device replaces the description made up of the dedicated control characters with a predetermined general-purpose pictogram.
In the present invention, because the description made up of the dedicated control characters is replaced with a predetermined general-purpose pictogram or the like, the portion of the e-mail where the sender wrote the dedicated control characters is displayed on the receiving terminal device as a general-purpose pictogram or the like. That is, a playful, fun expression can be used in place of dedicated control characters that have meaning only for control.
[0013]
The invention according to claim 4 is the terminal device according to claim 1, characterized in that, when an e-mail to which the attached file is attached is opened, the attached file is opened in response.
In the present invention, the special character string in the attached file is interpreted and synthesized at the moment the e-mail is opened, so the receiving user does not need to open the attached file personally. Even though the special character string is written in a separate file (the attached file), the display of the e-mail content and the speech synthesis based on the special character string can be performed almost simultaneously.
[0014]
[Embodiments of the invention]
Hereinafter, embodiments of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a mobile phone as one embodiment to which the present invention is applied. The present invention is not limited to mobile phones; it can also be applied to a PHS (registered trademark) (Personal Handy-phone System), a personal digital assistant (PDA), a personal computer, and the like.
[0015]
In FIG. 1, reference numeral 11 denotes a CPU (Central Processing Unit), which controls the operation of each unit of the mobile terminal device 1 by executing various programs.
Reference numeral 12 denotes a communication unit that demodulates a signal received by an antenna 12a included in the communication unit 12, and modulates a signal to be transmitted and supplies the modulated signal to the antenna 12a.
The CPU 11 decodes signals from a network such as the Internet, demodulated by the communication unit 12, in accordance with a predetermined protocol, and causes information contained in the signal (for example, an e-mail) to be displayed on the display unit 21 described below. When the e-mail (or its attached file) contains a special character string, described later, the CPU 11 causes the sound source unit with speech synthesis function 16, described below, to synthesize the speech defined by the special character string. Data to be transmitted, such as an e-mail, is encoded by the CPU 11 according to a predetermined protocol, modulated by the communication unit 12, and then transmitted from the antenna 12a to a base station, addressed to the destination server (a mail server in the case of e-mail).
[0016]
Reference numeral 13 denotes an audio processing unit. The voice signal of the telephone line demodulated by the communication unit 12 is decoded by the voice processing unit 13 and output from the speaker 14. On the other hand, the audio signal input from the microphone 15 is digitized and compressed and encoded by the audio processing unit 13. Then, the signal is modulated by the communication unit 12 and output from the antenna 12a to the base station of the mobile phone network. The voice processing unit 13 performs high-efficiency compression coding / decoding of voice data by, for example, a CELP (Code Excited LPC) system or an ADPCM (Adaptive Differential PCM Coding) method.
[0017]
Reference numeral 16 denotes a sound source unit with a speech synthesis function, which plays back the music data selected as the ring tone and emits the sound from the speaker 17. When it receives predetermined voice data corresponding to each phoneme of each character making up the words to be pronounced (this voice data includes parameters that affect voice quality, pitch, and so on), it synthesizes speech from the data under the control of the CPU 11 and utters it from the speaker 17. Any speech synthesis method may be used by the sound source unit 16; for example, it can be realized by applying the CSM speech synthesis technique disclosed in Japanese Patent Publication No. 58-53351 to an FM sound source.
Reference numeral 18 denotes an operation unit, which is an input unit that detects inputs from various buttons (not shown) including alphanumeric buttons provided on the main body of the mobile phone 1 and other input devices.
[0018]
Reference numeral 19 denotes a RAM (Random Access Memory), in which are set a work area for the CPU 11, a storage area for downloaded music data and accompaniment data (used for playing ring-tone melodies and the like), a mail data storage area for storing the data of received e-mails, and so on.
Reference numeral 20 denotes a ROM (Read Only Memory). The ROM 20 stores the programs executed by the CPU 11, such as various telephone function programs for controlling outgoing and incoming calls, a program that assists music playback processing, a mail transmission/reception function program that controls the sending and receiving of e-mail, and a program that assists speech synthesis processing, as well as various data such as the voice data for each phoneme and musical tone data.
[0019]
The display unit 21 is a display device made up of an LCD (Liquid Crystal Display) or the like which, under the control of the CPU 11, displays menus and the like, the content of e-mails and the like, and output corresponding to operation of the operation unit 18.
Reference numeral 22 denotes a vibrator that informs the user of an incoming call by vibrating the main body of the mobile phone 1 instead of a ringing tone when an incoming call is received.
Each functional block exchanges data and commands via the bus 30.
[0020]
Here, a special character string in a pronunciation description language for phonetic pronunciation that also defines inflection when a word is made into speech will be described.
FIG. 2 shows an example of text (the body of an e-mail) that includes a special character string. In this example, the portion 「か＿３さがほ＾５し＿４い’４ね＄２ー」, sandwiched between the dedicated control characters indicated by symbol (1) in the figure, is the text containing the special character string, and the rest is ordinary text. This text is a description in the pronunciation description language for synthesizing the phrase 「かさがほしいねー」 (kasa ga hoshii nee) with intonation added. The symbols 「’」, 「＾」, 「＿」, 「＄」, and so on in this example indicate the type of intonation to be added to a character (kana character); each applies a predetermined accent to the character that follows the symbol (or, if a numerical value immediately follows the symbol, to the character that follows that value).
[0021]
FIG. 3(a) shows the meaning of each symbol (representative examples) in the pronunciation description language. 「’」 raises the pitch at the beginning of the sound (see (1) in FIG. 3(b)), 「＾」 raises the pitch during the sound (see (3) in FIG. 3(c)), 「＿」 lowers the pitch at the beginning of the sound (see (2) in FIG. 3(b)), and 「＄」 lowers the pitch during the sound (see (4) in FIG. 3(c)).
When a numerical value immediately follows one of these symbols, the value specifies the amount of change of the accent being added. For example, 「か＿３さが」 indicates that 「さ」 is pronounced with the pitch lowered by an amount of 3 at its beginning, 「が」 is pronounced at that lowered pitch, and 「か」 is pronounced at the standard pitch.
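The symbol-and-value syntax described here can be illustrated with a small parser. The following is a minimal Python sketch, not the implementation disclosed in the patent: the names `PITCH_OPS`, `Mora`, and `parse_hv_script` and the operation labels are hypothetical, and leaving the amount as `None` when a symbol has no numerical value is an assumption, since the text does not state a default.

```python
import unicodedata
from typing import List, NamedTuple, Optional

# Symbol meanings follow the description of FIG. 3(a); the labels are illustrative.
PITCH_OPS = {
    "'": "onset_up",        # raise pitch at the beginning of the sound
    "\u2019": "onset_up",   # typographic apostrophe, as printed in the example
    "^": "glide_up",        # raise pitch during the sound
    "_": "onset_down",      # lower pitch at the beginning of the sound
    "$": "glide_down",      # lower pitch during the sound
}

class Mora(NamedTuple):
    char: str               # the kana character to be pronounced
    op: Optional[str]       # pitch operation attached to it, if any
    amount: Optional[int]   # amount of change, if a number was written

def parse_hv_script(text: str) -> List[Mora]:
    """Attach each symbol (and optional number) to the character that follows it."""
    # NFKC folds full-width symbols and digits (＿ ＾ ＄ ３) to their ASCII forms.
    text = unicodedata.normalize("NFKC", text)
    result: List[Mora] = []
    op: Optional[str] = None
    amount: Optional[int] = None
    i = 0
    while i < len(text):
        ch = text[i]
        if ch in PITCH_OPS:
            op = PITCH_OPS[ch]
            i += 1
            start = i
            while i < len(text) and text[i] in "0123456789":
                i += 1
            amount = int(text[start:i]) if i > start else None
            continue
        result.append(Mora(ch, op, amount))
        op, amount = None, None   # a symbol applies to one character only
        i += 1
    return result                  # a trailing symbol with no character is ignored

moras = parse_hv_script("か＿３さがほ＾５し＿４い’４ね＄２ー")
print("".join(m.char for m in moras))
```

Joining the parsed characters recovers the plain phrase, while each `Mora` keeps the operation and amount that the sound source would need.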
[0022]
As described above, the syntax of the pronunciation description language is such that, to add an accent (intonation) to a character in the words to be pronounced, a symbol such as those shown in FIG. 2 (and, optionally, a numerical value indicating the amount of intonation change) is written immediately before that character. Although only symbols that control pitch have been described in this embodiment, symbols that control loudness, speed, sound quality, and the like can also be used. The special character string may be written in the body of the e-mail, as in the example shown in FIG. 2, or in the title of the e-mail, or it may be written in a predetermined attached file (for example, an attached file whose extension identifies it as containing a special character string) that is attached to the e-mail to be sent.
[0023]
Next, the operation of the mobile phone 1 of the present embodiment configured as described above will be described. Note that operations related to outgoing / incoming calls using normal telephone functions and operations related to transmission / reception of e-mails are well-known techniques and will not be described. Hereinafter, a sentence written in a special character string in the pronunciation description language is referred to as HV-Script.
[0024]
[Sender operation]
First, when composing the mail, the sender writes the HV-Script to be pronounced, as text, anywhere in the body of the e-mail (the title field is also acceptable) (see FIG. 4(a)). The sender also enters dedicated control characters (here, the characters indicated by (1) in FIG. 4(b)) so that they sandwich the HV-Script. An e-mail text containing HV-Script is created in this way. Although this special character string, which is hard to read as prose, is written when the mail is composed, moving the HV-Script to an attached file makes the e-mail itself easy for the other party to read. As a result, the mail is displayed like an ordinary mail text, and does not annoy the reader, not only on the mobile phone 1 of this embodiment but also on ordinary mobile phones that lack the functions of this embodiment.
[0025]
Next, when the creation of the e-mail text is completed, the sender performs an operation of transmitting the e-mail. The following is the operation of the transmitting-side mobile phone 1 (see FIG. 5).
In step S01, the transmitting-side mobile phone 1 determines whether or not there is a mail transmission operation, and waits until the mail transmission operation is performed.
As described above, when a mail transmission operation is performed by the sender, “Yes” is determined in the determination in step S01, and the process proceeds to step S02.
In step S02, it is determined whether or not there is a dedicated control character in the mail. If there is no dedicated control character in the mail, it is determined No in this determination, the process proceeds to step S03, and the e-mail is transmitted in step S03 and the process is terminated.
Here, as described above, since the dedicated control character is described in the mail text, it is determined Yes in the determination in step S02, and the process proceeds to step S04.
[0026]
In step S04, a new attached file is created. The file may be given any suitable name, but it is given a dedicated extension (for example, .hvs) to indicate that it contains HV-Script.
In step S05, the HV-Script sandwiched between the dedicated control characters in the mail text is moved to the attached file (here, the character string sandwiched between the dedicated control characters is regarded as HV-Script; it is extracted from the mail text, written and recorded in the attached file, and deleted from the e-mail).
[0027]
Next, in step S06, the set of dedicated control characters in the mail text is further changed (replaced) to a predetermined general-purpose pictogram (here, the pictogram shown by (2) in FIG. 4C).
In step S07, the e-mail as modified in step S06 and the attached file created in steps S04 and S05 are sent to the designated address. The processing of steps S05 and S06 may also be performed separately, on its own, in response to another predetermined operation. The e-mail kept on the sending mobile phone 1 may be either the e-mail as originally entered or the combination of the processed e-mail and the attached file.
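The sender-side steps S02 to S06 can be sketched as a single text transformation. This is an illustrative Python sketch under stated assumptions: the actual dedicated control characters and pictogram appear only in FIG. 4 and are not given in the text, so `OPEN`, `CLOSE`, and `PICTOGRAM` below are hypothetical stand-ins, as are the function name `prepare_outgoing` and the default base name `voice`. Replacing each control-character pair together with its script by one pictogram is one reading of steps S05 and S06.

```python
import re
from typing import Optional, Tuple

OPEN, CLOSE = "[[", "]]"   # hypothetical stand-ins for the dedicated control characters
PICTOGRAM = "♪"            # hypothetical stand-in for the general-purpose pictogram
HVS_EXTENSION = ".hvs"     # example extension named in the description

_SCRIPT = re.compile(re.escape(OPEN) + r"(.*?)" + re.escape(CLOSE), re.DOTALL)

def prepare_outgoing(
    body: str, basename: str = "voice"
) -> Tuple[str, Optional[Tuple[str, str]]]:
    """Return (mail body to send, (attachment name, attachment text) or None)."""
    scripts = _SCRIPT.findall(body)
    if not scripts:
        # S02 "No": no control characters, so the mail is sent unchanged (S03).
        return body, None
    attachment_name = basename + HVS_EXTENSION          # S04: new attached file
    attachment_text = "\n".join(scripts)                # S05: record the HV-Script
    cleaned = _SCRIPT.sub(lambda m: PICTOGRAM, body)    # S05 delete + S06 replace
    return cleaned, (attachment_name, attachment_text)

cleaned, attachment = prepare_outgoing("It is raining. [[か＿３さがほ＾５し＿４い’４ね＄２ー]] See you soon.")
print(cleaned)
print(attachment)
```

The returned body and attachment would then be handed to whatever mail-sending routine the terminal uses (step S07), which is outside the scope of this sketch.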
[0028]
[Receiver operation]
Next, the operation of the mobile phone 1 on the receiving side will be described using the flowchart of FIG. 6.
In step S11, the mobile phone 1 on the receiving side determines whether or not there is a mail display operation, and waits until a mail display operation is performed by the user.
Here, it is assumed that a mail display operation is performed by the recipient (user). The determination in step S11 is then Yes, and the process proceeds to step S12.
In step S12, the content of the received e-mail is displayed on the display unit 21.
[0029]
Next, in step S13, it is determined whether or not HV-Script is present in the mail. Here, this determination is made based on whether or not there is a set of dedicated control characters in the mail. If it is determined that Yes, that is, there is a set of dedicated control characters in the mail and HV-Script is present, the process proceeds to step S14.
In step S14, the voice data corresponding to each phoneme of each character making up the words specified by the HV-Script is read from the ROM 20. Then, based on the accent symbols written in the HV-Script, the voice data is supplied to the sound source unit with speech synthesis function 16 (the voice data itself may be processed at this point to change its pitch and so on) and the sound source unit is controlled in a predetermined manner so that the uttered speech carries the intonation, and speech synthesis is performed. Thus, when the mail text contains HV-Script, it is interpreted and immediately synthesized and uttered.
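How the accent symbols might translate into per-character pitch targets can be sketched as follows. This is only an illustration under assumptions that the text does not confirm: pitch is tracked as a relative level with 0 as the standard pitch, the level reached by one character carries over to the next (consistent with the 「か＿３さが」 example, where 「が」 stays at the lowered pitch), a glide ends at the start level plus or minus the amount, and a symbol without a number uses `default_amount`. The function name `pitch_contour` and the operation labels are hypothetical.

```python
from typing import List, Optional, Tuple

# (character, operation label or None, amount or None); labels are illustrative.
Annotated = Tuple[str, Optional[str], Optional[int]]

def pitch_contour(
    moras: List[Annotated], default_amount: int = 1
) -> List[Tuple[str, int, int]]:
    """Return (character, start level, end level) relative to the standard pitch 0."""
    level = 0
    contour: List[Tuple[str, int, int]] = []
    for char, op, amount in moras:
        step = default_amount if amount is None else amount
        start = level
        if op == "onset_up":        # pitch raised at the beginning of the sound
            start = level + step
        elif op == "onset_down":    # pitch lowered at the beginning of the sound
            start = level - step
        end = start
        if op == "glide_up":        # pitch raised during the sound
            end = start + step
        elif op == "glide_down":    # pitch lowered during the sound
            end = start - step
        contour.append((char, start, end))
        level = end                 # assumption: the next character continues from here
    return contour

example = [("か", None, None), ("さ", "onset_down", 3), ("が", None, None)]
for char, start, end in pitch_contour(example):
    print(char, start, end)
```

A real sound source would map these relative levels onto its own pitch parameters; that mapping is not described in the text and is left out here.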
[0030]
If it is determined No in step S13, that is, if there is no description of HV-Script in the mail, the process proceeds to step S15.
In step S15, it is determined whether an attached file having an extension of “.hvs” is attached. When it is determined No in this determination, that is, when there is no HV-Script at all, speech synthesis is not performed and the processing is terminated as it is.
On the other hand, if it is determined as Yes in step S15, that is, if an attached file having the extension “.hvs” is attached, the process proceeds to step S16.
[0031]
In step S16, it is determined whether or not the mailer to be used is set to automatically open an attached file having the extension “.hvs”.
If the determination is Yes, that is, if the mailer is set to open the attached file automatically, the process proceeds to step S17 and the file is opened.
On the other hand, if the determination is No, that is, if the mailer is not set to open the attached file automatically, the process proceeds to step S18.
[0032]
In step S18, it is determined whether the recipient has performed an operation to open the attached file, and the process waits for such an operation. When the recipient performs the operation, the determination is Yes and the process proceeds to step S17, where the file is opened.
In step S19, as in step S14, speech synthesis is performed based on the HV-Script, and the designated word is pronounced with the designated accent.
Each operation flow described above is an example, and the present invention is not limited to the above processing flow.
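The receiving-side flow of steps S12 to S19 can be summarized in a short sketch. All names below (`receive_flow`, the callback parameters, the dictionary of attachments) are assumptions made for illustration, and the wait in step S18 is modeled as a single callback rather than a real blocking loop.

```python
from typing import Callable, Dict, List, Optional


def find_hvs_attachment(attachments: Dict[str, str]) -> Optional[str]:
    """Step S15 (sketch): name of the first '.hvs' attachment, if any."""
    for name in attachments:
        if name.endswith(".hvs"):
            return name
    return None


def receive_flow(
    body: str,
    attachments: Dict[str, str],           # filename -> file text (assumed model)
    contains_hv_script: Callable[[str], bool],
    auto_open: bool,                        # mailer setting checked in S16
    user_opens: Callable[[], bool],         # stands in for the wait in S18
    display: Callable[[str], None],
    synthesize: Callable[[str], None],
) -> List[str]:
    """Run the flow once and return the list of steps visited."""
    trace = ["S12"]
    display(body)                           # S12: show the mail
    trace.append("S13")
    if contains_hv_script(body):
        trace.append("S14")
        synthesize(body)                    # S14: synthesize from the body
        return trace
    trace.append("S15")
    name = find_hvs_attachment(attachments)
    if name is None:
        return trace                        # no HV-Script at all: end
    trace.append("S16")
    if not auto_open:
        trace.append("S18")
        if not user_opens():
            return trace                    # a real device would keep waiting
    trace.append("S17")
    script = attachments[name]              # S17: open the attached file
    trace.append("S19")
    synthesize(script)                      # S19: synthesize from the file
    return trace
```

Returning the visited steps makes the branching easy to check against the flowchart; a real terminal would of course drive the display unit and the sound source instead.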
[0033]
An embodiment of the present invention has been described in detail above with reference to the drawings. However, the specific configuration is not limited to this embodiment, and configurations and the like within a scope that does not depart from the gist of the present invention are also included. For example, the above embodiment uses a pronunciation description language whose syntax consists of a symbol that defines the accent of the immediately following character and a numerical value that defines the amount of change in the accent, but the language is not limited to this. For example, the syntax may place these symbols immediately after the character.
In addition, in the above embodiment the HV-Script (special character string) is described in the mail text when the mail is composed, and this special character string is then moved to the attached file. However, the sender may instead create this attached file separately.
Further, the setting for automatically opening the attached file on the recipient side is not limited to a setting in the mailer; the setting may be included in the attached file itself.
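The two accent syntaxes mentioned in this paragraph (symbol before the character, or symbol after it) can be illustrated with a pair of small parsers. The symbols `^` (raise) and `_` (lower), the default amount of 1, and the function names are hypothetical; the real symbols of the pronunciation description language are those explained with FIG. 3.

```python
from typing import List, Tuple

# Hypothetical accent symbols: sign of the pitch change they request.
ACCENT_SYMBOLS = {"^": +1, "_": -1}


def _read_amount(script: str, i: int) -> Tuple[int, int]:
    """Read optional digits starting at i; return (amount, next index)."""
    start = i
    while i < len(script) and script[i].isdigit():
        i += 1
    return (int(script[start:i]) if i > start else 1), i


def parse_prefix(script: str) -> List[Tuple[str, int]]:
    """Syntax '<symbol><amount><char>': the symbol precedes the character."""
    result, i = [], 0
    while i < len(script):
        ch = script[i]
        if ch in ACCENT_SYMBOLS:
            amount, i = _read_amount(script, i + 1)
            if i >= len(script):
                break  # dangling symbol with no character to modify
            result.append((script[i], ACCENT_SYMBOLS[ch] * amount))
        else:
            result.append((ch, 0))
        i += 1
    return result


def parse_postfix(script: str) -> List[Tuple[str, int]]:
    """Syntax '<char><symbol><amount>': the symbol follows the character."""
    result, i = [], 0
    while i < len(script):
        ch, change = script[i], 0
        i += 1
        if i < len(script) and script[i] in ACCENT_SYMBOLS:
            sign = ACCENT_SYMBOLS[script[i]]
            amount, i = _read_amount(script, i + 1)
            change = sign * amount
        result.append((ch, change))
    return result
```

Both parsers produce the same list of (character, pitch change) pairs for equivalent input, which is the point of the passage: the choice of syntax does not change what the synthesizer is asked to do. As a limitation of this sketch, a digit cannot itself be the accented character.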
[0034]
[Effects of the invention]
As described above in detail, according to the present invention, in addition to the conventional display of electronic mail, the use of the special character string can provide a further expression effect by speech synthesis.
Further, according to the present invention, the transmitting side creates a separate file describing the special character string, makes it an attachment of the e-mail, and deletes the special character string from the e-mail, and the receiving side performs speech synthesis when the attached file is opened. The special character string can therefore be excluded from the display when the e-mail is shown on the receiving side, so the appearance of the displayed e-mail is not impaired and the reader is not made uncomfortable.
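The transmitting-side processing summarized above can be sketched as a function that splits a composed mail into a cleaned body and the text of an attachment. The delimiters `[[` and `]]`, the function name, and the choice to store the fragments without their delimiters, one per line, are assumptions for illustration only.

```python
import re
from typing import Optional, Tuple

# Hypothetical delimiters standing in for the dedicated control characters.
HV_OPEN, HV_CLOSE = "[[", "]]"
_PATTERN = re.compile(
    re.escape(HV_OPEN) + r"(.*?)" + re.escape(HV_CLOSE), re.DOTALL
)


def split_mail(body: str) -> Tuple[str, Optional[str]]:
    """Return (body without special strings, attachment text or None)."""
    fragments = _PATTERN.findall(body)
    if not fragments:
        return body, None            # nothing to move: send the mail as-is
    cleaned = _PATTERN.sub("", body)  # delete the special strings from the body
    return cleaned, "\n".join(fragments)
```

For example, `split_mail("Good morning [[ohayo]]!")` yields the body `"Good morning !"` and the attachment text `"ohayo"`, which would then be saved under a name ending in “.hvs”.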
[0035]
Further, according to the present invention, the description consisting of the dedicated control characters for identifying the special character string is replaced with a general-purpose pictogram or the like, which allows a playful and fun expression. Further, according to the present invention, the special character string in the attached file is interpreted and synthesized when the e-mail is opened, so the receiving user does not need to open the attached file personally. Even though the description by the special character string is in a separate file (the attached file), the display of the e-mail content and the speech synthesis by the special character string can be performed almost simultaneously.
[Brief description of the drawings]
FIG. 1 is a block diagram showing a configuration of a mobile phone according to an embodiment to which the present invention is applied.
FIG. 2 is an example of a sentence (a body of an electronic mail) including a special character string in the embodiment.
FIG. 3 is a diagram for explaining the meaning of each symbol (representative examples) in the pronunciation description language according to the embodiment.
FIG. 4 shows example contents of the mail text and the attached file, from creation of the mail text on the transmitting side through conversion and creation of the attached file, in the embodiment.
FIG. 5 is a flowchart for explaining the operation of the transmitting-side mobile phone according to the embodiment.
FIG. 6 is a flowchart for explaining the operation of the receiving-side mobile phone according to the embodiment.
[Explanation of symbols]
1 ... Mobile phone (terminal device), 11 ... CPU, 12 ... Communication unit, 12a ... Antenna, 13 ... Voice processing unit, 14, 17 ... Speaker, 15 ... Microphone, 16 ... Sound source with voice synthesis function, 18 ... Operation unit, 19 ... RAM, 20 ... ROM, 21 ... Display unit, 22 ... Vibrator, 30 ... Bus

Claims

A terminal device capable of sending and receiving e-mail, comprising:
speech synthesis means for interpreting a special character string written in a pronunciation description language for speech pronunciation, the language being described in text and also defining the intonation with which a designated word is voiced, and for synthesizing the speech defined by the special character string; and display means for displaying the e-mail,
wherein, in a sending operation, when the special character string is described in a created e-mail, the terminal device identifies the special character string described in the e-mail, creates a separate file describing only the special character string, makes this file an attachment of the e-mail, and deletes the special character string from the e-mail, and
in a receiving operation, the terminal device displays the content of the received e-mail on the display means and, when the attached file attached to the e-mail is opened, interprets the special character string described in the attached file with the speech synthesis means and synthesizes speech.

The terminal device according to claim 1,
wherein dedicated control characters for identifying the special character string are described before and after the special character string.

The terminal device according to claim 2,
wherein the sending-side terminal device replaces the description consisting of the dedicated control characters with a predetermined general-purpose pictogram.

The terminal device according to claim 3,
wherein, when an e-mail to which the attached file is attached is opened, the attached file is opened in response.