JP4026512B2

JP4026512B2 - Singing synthesis data input program and singing synthesis data input device

Info

Publication number: JP4026512B2
Application number: JP2003052056A
Authority: JP
Inventors: Yuji Hisaminato; Jaume Ortola
Original assignee: Yamaha Corp
Current assignee: Yamaha Corp
Priority date: 2003-02-27
Filing date: 2003-02-27
Publication date: 2007-12-26
Anticipated expiration: 2023-02-27
Also published as: JP2004258561A

Abstract

PROBLEM TO BE SOLVED: To provide a data input interface for singing synthesis that makes lyric entry easy.
SOLUTION: When one word spans several notes, the user enters it divided into parts joined by hyphens. The word is recovered by removing the hyphens from the hyphenated input and joining the parts. The phonetic symbols of the recovered word are obtained from a database and corrected according to specified rules. The phonetic symbols are then divided at vowel boundaries and allocated to the notes.
COPYRIGHT: (C)2004,JPO&NCIPI

Description

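[0001]
[Technical Field of the Invention]
The present invention relates to a data input device that makes it easy to enter lyrics in a singing synthesis apparatus, and to a program that controls the device.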
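[0002]
[Prior Art]
Conventionally, some speech synthesis devices and speech synthesis programs, when given text, generate phonetic symbols corresponding to the text and synthesize speech according to those symbols. Some devices and programs of this type can also edit the generated phonetic symbol string. For this purpose they typically include a text editor, with which the user edits the phonetic symbol string directly.
However, the generated phonetic symbol string is a sequence of symbols representing the reading, together with symbols marking the positions of accents, pauses and the like. A user who does not know what these symbols mean finds it very difficult to edit the string.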
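[0003]
One technique for overcoming this drawback appears in Patent Document 1. It introduces symbols called utterance divisions, which are easy to understand for ordinary users who do not know phonetic symbol strings. These divisions mark the boundaries of accent phrases, voice phrases and breath groups. Anyone who speaks the language can therefore edit them easily without knowing how the phonetic symbol string is defined. Editing the utterance divisions indirectly edits the phonetic symbols, so Patent Document 1 makes a phonetic symbol string easy to edit.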
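[0004]
Today, music is also composed on computers, a practice also known as DTM (desktop music). In DTM, users may play their favorite songs themselves using a MIDI (Musical Instrument Digital Interface) sound source. They may also act as producers, creating and performing their own music while working out tones and melodies.
DTM also uses speech synthesis technology to make a computer sing along with a piece of music.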
[0005]
[Patent Document 1]
JP-A-5-11797
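[0006]
[Problems to Be Solved by the Invention]
The technique of Patent Document 1 proposes two things. The first is a phonetic symbol string generation device that can easily generate an appropriate phonetic symbol string giving highly natural prosody. The second is a text-to-speech synthesis system that uses this device. However, the technique cannot be applied to devices or programs for music editing, in which lyrics are assigned to the notes of an already completed score.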
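[0007]
Moreover, music editing often requires adding lyrics and pronunciations to notes. Allocating one vowel to each note is difficult for users who do not know phonetic symbols and syllable boundaries. For example, to assign English lyrics to notes, the user must know the syllables of each word. An ordinary user, however, has to consult a dictionary to find the syllables of a word, which is tedious to do while entering lyrics.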
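[0008]
The present invention was made in view of the above circumstances. Its object is to provide a singing synthesis data input device and a singing synthesis data input program that make it easy to allocate lyrics to notes.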
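[0009]
[Means for Solving the Problems]
To achieve the above object, a singing synthesis data input program according to the present invention causes a computer to execute four steps.
The first is a storing step of storing, in storage means, singing score data consisting of a plurality of note data corresponding to the notes that make up a piece of music.
The second is a word acquisition step. It acquires, via an input device, input data entered in association with a plurality of consecutive note data stored in the storage means. This input data contains the written characters that make up one word, together with written-character connection symbols that each join two consecutive written characters. The step removes the connection symbols from the input data to obtain the word. It also acquires the number of note data to which the user assigned the word by means of the connection symbols.
The third is a word information acquisition step. It searches a dictionary for the word acquired in the word acquisition step and obtains phonetic symbols indicating how the word is pronounced, together with division information indicating the number of syllables in the word.
The fourth is a singing score data update step. It compares the number indicated by the division information with the number of note data associated with the acquired word. If the two match, the step divides the phonetic symbols obtained for the word into syllables that each contain one vowel. It then allocates the syllables to the consecutive note data, in the singing score data, that are associated with the input data from which the word was acquired, thereby updating the singing score data stored in the storage means. If the two do not match, the allocation is cancelled.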
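[0010]
A singing synthesis data input device according to the present invention comprises four means.
The first is storage means for storing singing score data consisting of a plurality of note data corresponding to the notes that make up a piece of music.
The second is word acquisition means. It acquires, via an input device, input data entered in association with a plurality of consecutive note data stored in the storage means. This input data contains the written characters that make up one word, together with written-character connection symbols that each join two consecutive written characters. The means removes the connection symbols from the input data to obtain the word. It also acquires the number of note data to which the user assigned the word by means of the connection symbols.
The third is word information acquisition means. It searches a dictionary for the word acquired by the word acquisition means and obtains phonetic symbols indicating how the word is pronounced, together with division information indicating the number of syllables in the word.
The fourth is singing score data update means. It compares the number indicated by the division information acquired by the word information acquisition means with the number of note data associated with the word acquired by the word acquisition means. If the two match, it divides the phonetic symbols obtained for the word into syllables that each contain one vowel. It then allocates the syllables to the consecutive note data, in the singing score data, that are associated with the input data from which the word was acquired, thereby updating the singing score data stored in the storage means. If the two do not match, the allocation is cancelled.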
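[0011]
[Embodiments of the Invention]
A preferred embodiment of the present invention is described below with reference to the drawings.
FIG. 1 is a block diagram showing the configuration of a computer 1 that functions as a singing synthesis data input device according to one embodiment of the invention. The computer 1 contains a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, an operation unit 14, an HDD (hard disk drive) 15, a display 16 and a data input/output unit 17. These components are connected via a bus BUS and can exchange data with one another. A sound source unit 18 and a speaker 19 are connected to the computer 1 as external devices, but they may instead be built into the computer 1.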
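[0012]
The CPU 11 is a microprocessor that performs general-purpose data processing. It controls the other components of the computer 1 according to control programs such as the BIOS (Basic Input/Output System) stored in the ROM 12 and the OS (operating system) stored in the HDD 15.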
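[0013]
The ROM 12 is a non-volatile memory that stores control programs such as the BIOS. The RAM 13 is a volatile memory that temporarily stores data used by the CPU 11 and other components.
When the computer 1 is powered on, the CPU 11 reads the BIOS from the ROM 12 and writes it into the RAM 13. It then sets up the hardware environment according to the BIOS in the RAM 13.
The operation unit 14 includes a keypad, a mouse and the like, and sends the CPU 11 data reflecting the user's operations. The HDD 15 is a rewritable non-volatile memory with a large storage area. It stores the OS, various applications, and the data these applications use.
After setting up the hardware environment via the BIOS, the CPU 11 reads the OS from the HDD 15 and writes it into the RAM 13. According to the OS, it then performs processing such as setting up a GUI (Graphical User Interface) environment and an application execution environment. One of the main applications stored in the HDD 15 is the singing synthesis data input application.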
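[0014]
The user may instruct the CPU 11 to execute the singing synthesis data input application, for example with the mouse. The CPU 11 then reads the application from the HDD 15, writes it into the RAM 13, and sets up an environment for performing various processes according to it. In this way, the computer 1 functions as the singing synthesis data input device of this embodiment.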
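[0015]
The display 16 includes a liquid crystal display and a drive circuit that drives it under the control of the CPU 11. It displays information such as characters and graphics.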
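[0016]
The data input/output unit 17 is an interface capable of inputting and outputting various kinds of data, such as a USB (Universal Serial Bus) interface. It receives data from external devices and transfers it to the CPU 11, and it sends data generated by the CPU 11 to external devices.
The sound source unit 18 generates a musical tone signal from the input data and outputs it as sound through the speaker 19.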
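[0017]
FIG. 2 is a block diagram showing the functional configuration of the singing synthesis data input device. This device is provided when the CPU 11 of the computer 1 executes the singing synthesis data input application. As shown, the device has operation means 21, external data input means 22, data editing means 23, display means 24 and storage means 25.
Each of these corresponds to part of the computer 1. The operation means 21 is the operation unit 14, and the storage means 25 is the RAM 13 and the HDD 15. The external data input means 22 is the data input/output unit 17, and the display means 24 is the display 16. The data editing means 23 is a software module of the singing synthesis data input application.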
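[0018]
The external data input means 22 acquires singing score data 251 from an external device or the like and sends it to the data editing means 23. The data editing means 23 stores the received singing score data 251 in the storage means 25 and edits it according to operations on the operation means 21. The display means 24 displays the singing score data 251 stored in the storage means 25.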
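[0019]
FIG. 3 shows an example of the structure of the singing score data 251. The singing score data 251 is time-series data consisting of note data, one item for each note in the sequence of notes that makes up a piece of music. In FIG. 3, each horizontal row represents one note data item.
Each note data item consists of a number, a pitch, phonetic symbols, a sounding period, input characters and display characters:
- The number gives the position of the note, counted from the beginning of the piece.
- The pitch and the sounding period specify the pitch and the sounding period of the corresponding note.
- The phonetic symbols, input characters and display characters are assigned by the data editing means 23 in response to the user's operation of the operation means 21.
The distinctive feature of this embodiment is this assignment process, in which the data editing means 23 assigns phonetic symbols and related information to each note data item. The storage means 25 stores dictionary data 253 and editing rule data 252, which are referenced during the assignment process.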
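[0020]
FIG. 4 shows the structure of the dictionary data. The dictionary data is a collection of entries, each corresponding to one written word such as an English word. The entry for a word has three parts: the spelling of the word, a phonetic symbol string representing how the word is pronounced, and the number of divisions, that is, the number of syllables into which the spoken word is divided.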
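A dictionary entry as described above holds three items: the spelling, the phonetic string and the number of divisions. Below is a minimal Python sketch. The names `DictEntry`, `DICTIONARY` and `look_up` are hypothetical. The only entry is the "amazing" example used later in this description, and case-insensitive lookup is an assumption.

```python
from typing import NamedTuple, Optional


class DictEntry(NamedTuple):
    spelling: str   # written form of the word
    phonetic: str   # SAMPA pronunciation string
    divisions: int  # number of syllables the spoken word is divided into


# Hypothetical dictionary; the "amazing" values come from the worked example.
DICTIONARY = {
    "amazing": DictEntry("amazing", "@meIZIN", 3),
}


def look_up(word: str) -> Optional[DictEntry]:
    """Return the entry for `word`, or None if it is not in the dictionary."""
    return DICTIONARY.get(word.lower())
```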
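[0021]
In this embodiment, the user can operate the operation means 21 to enter lyrics to be assigned to a series of note data. Sometimes a word of several syllables has to be assigned to several consecutive notes, and this embodiment is designed to make that case convenient for the user. To assign a word of n syllables to n consecutive notes, the user need only follow the rules below.
a. Enter the first written character of the word on the first note, and the last written character of the word on the last note.
b. Always end the data entered on every note other than the last with a written-character connection symbol (for example, a hyphen).
c. Enter the written characters other than the first and the last in the same order in which they appear in the word.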
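Rule b is what tells the device where one word ends. The sketch below uses a hypothetical function `group_words` that walks the per-note entries of a lyric line and groups consecutive notes into words. An entry ending in a hyphen continues the word, and the first entry without one closes it. Treating a hyphen on the final note of the line as an input error is this sketch's own choice; the text does not state that behaviour.

```python
from typing import List, Tuple

HYPHEN = "-"  # written-character connection symbol (the example given in rule b)


def group_words(entries: List[str]) -> List[Tuple[str, List[int]]]:
    """Group per-note entries into (joined_text, note_indices) pairs, one per word."""
    words: List[Tuple[str, List[int]]] = []
    text = ""
    notes: List[int] = []
    for index, entry in enumerate(entries):
        text += entry
        notes.append(index)
        if not entry.endswith(HYPHEN):
            # No trailing hyphen: this note carries the last characters of the word.
            words.append((text, notes))
            text, notes = "", []
    if notes:
        # Rule b was broken: the line ended while a word was still open.
        raise ValueError(f"unterminated word: {text!r}")
    return words
```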
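[0022]
The data editing means 23 may receive, from the operation means 21, data entered according to these rules. It then interprets the written-character string joined by connection symbols as one word. Next, it refers to the dictionary data 253 stored in the storage means 25 to obtain the phonetic symbol string for the word. Finally, it divides this string into n syllables that each contain one vowel and allocates them to the n note data.
The editing rule data 252 holds special rules that are referenced in the allocation process described above. One example concerns schwa, the reduced vowel of English. Schwa is appropriate when a vowel is pronounced weakly and briefly. When the corresponding note is loud or long, it is converted to a full vowel. Concretely, if the allocation process assigns a schwa to a note sustained for 0.5 seconds or longer, the data editing means 23 converts it to the phonetic symbol “Q”.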
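The schwa rule above can be sketched as a small function. This rests on several assumptions:
- Syllables are SAMPA strings.
- Only the 0.5-second duration condition is implemented. The loudness condition is omitted because the text gives no threshold.
- An "@" that belongs to a SAMPA diphthong such as "@U" or "I@" is not treated as a schwa.

The function name `apply_schwa_rule` is hypothetical.

```python
import re

FULL_VOWEL = "Q"     # replacement symbol named in the text
MIN_SECONDS = 0.5    # duration threshold named in the text
# Assumed: "@" not preceded by I/e/U and not followed by U is a standalone schwa,
# so diphthongs such as "I@", "e@", "U@" and "@U" are left alone.
SCHWA = re.compile(r"(?<![IeU])@(?!U)")


def apply_schwa_rule(syllable: str, duration_s: float) -> str:
    """Replace a schwa with "Q" when its note lasts 0.5 s or longer."""
    if duration_s >= MIN_SECONDS:
        return SCHWA.sub(FULL_VOWEL, syllable)
    return syllable
```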
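[0023]
Another rule changes the pronunciation of “h”, making it voiceless or voiced depending on the surrounding sounds. Suppose the allocation process assigns the phonetic symbol “h” to a note data item. The data editing means 23 makes this “h” voiced if two conditions hold: a vowel is assigned to the preceding note data item, and there is no rest between the two notes. For example, in the lyric “She hits”, the “h” of “hits” is voiceless if there is a rest between “She” and “hits”, and voiced if there is none.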
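The “h” voicing rule depends only on the preceding note, so it can be sketched as a predicate. In this hypothetical Python version, a rest means a gap in time between the end of the previous note and the start of the current one. That interpretation, the `Note` record and the simplified set of SAMPA vowel characters are all assumptions.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed simplified set of characters that occur in English SAMPA vowels.
VOWEL_CHARS = set("@IeUAOVQ{3aiuoE")


@dataclass
class Note:
    start_s: float   # start of the sounding period, in seconds
    end_s: float     # end of the sounding period, in seconds
    phonetic: str    # SAMPA syllable assigned to the note


def h_is_voiced(prev: Optional[Note], cur: Note) -> bool:
    """True if the "h" starting `cur` should be voiced under the rule in the text."""
    if prev is None or not cur.phonetic.startswith("h"):
        return False
    prev_has_vowel = any(ch in VOWEL_CHARS for ch in prev.phonetic)
    no_rest = cur.start_s <= prev.end_s  # assumed: any time gap counts as a rest
    return prev_has_vowel and no_rest
```

Applied to the “She hits” example: a note "Si:" ending at 0.5 s followed by "hIts" starting at 0.5 s yields a voiced h, while a start at 0.75 s leaves it voiceless.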
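[0024]
Next, the operation of the singing synthesis data input device is described. When the CPU 11 executes the singing synthesis data input application, a piano roll like the one in FIG. 5 appears on the display means 24. When the user then instructs the operation means 21 to load singing score data, the data is read in from outside through the external data input means 22.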
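[0025]
The loaded singing score data is stored in the storage means 25 via the data editing means 23. The display means 24 then shows it as a piano roll screen, which FIG. 5 illustrates.
On the piano roll screen, the vertical direction corresponds to the arrangement of piano keys, with low keys at the bottom and high keys at the top. In other words, the vertical axis is the pitch axis and the horizontal axis is the time axis.
In the figure, each rectangle outlined with a thick line represents one note data item in the singing score data and is called a note bar. A note bar's position along the pitch axis indicates the pitch specified by its note data. Its position along the time axis indicates when the note starts sounding. Its length indicates the sounding period specified by the note data.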
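The piano-roll layout described above amounts to a linear mapping from note data to rectangle coordinates. The sketch below makes several assumptions that do not come from the text: a pixel scale, a pitch shown at the top edge, and screen coordinates in which y grows downward. The function `note_bar` is hypothetical.

```python
from typing import NamedTuple


class Rect(NamedTuple):
    x: float
    y: float
    width: float
    height: float


def note_bar(pitch: int, start_s: float, length_s: float,
             px_per_s: float = 100.0, px_per_semitone: float = 8.0,
             top_pitch: int = 108) -> Rect:
    """Place a note bar: x follows time, y follows pitch, width follows duration."""
    x = start_s * px_per_s                       # time axis (horizontal)
    y = (top_pitch - pitch) * px_per_semitone    # higher pitch -> nearer the top
    return Rect(x, y, length_s * px_per_s, px_per_semitone)
```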
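[0026]
In this embodiment, the user may want to split a word and assign it to several note data. As shown in FIG. 6, the user then enters the word with the operation means 21 using hyphens that indicate the word continues. In FIG. 6, the word “amazing” is entered as “a-”, “ma-” and “zing”. The “-” after “a” and “ma” shows that entry of the word is still in progress and that written characters remain to be assigned to subsequent note bars. In the example, “a-” connects to “ma-”, and the combination connects further to “zing”. Because “zing” has no hyphen, it is the final written-character string of the word.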
[0027]
The following describes how “amazing”, entered in this way, is processed in this embodiment.
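[0028]
First, the user puts note bar NB1, the first note bar to which “amazing” is to be assigned, into a lyric input state, for example by double-clicking it with the operation means 21. The user then types “a” and “-” and presses the Enter key.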
[0029]
The data editing means 23 thereby acquires the input data “a-” in association with note bar NB1. It displays “a-” on note bar NB1 and stores it in a buffer area (not shown) reserved in the storage means 25.
Next, the user selects note bar NB2 in the same way and enters “ma-”. The data editing means 23 appends the newly acquired “ma-” to the existing input data “a-” in the buffer area, so the buffer now holds “a-ma-”.
Next, the user selects note bar NB3 in the same way and enters “zing”. The data editing means 23 appends “zing” to “a-ma-”, so the buffer holds “a-ma-zing”. Because the entered “zing” does not end with “-”, the data editing means 23 detects that all the written characters of the word have been entered.
[0030]
Next, the data editing means 23 removes the hyphens from the input data “a-ma-zing” in the buffer area of the storage means 25 to obtain the word “amazing”.
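The buffer handling in paragraphs [0029] and [0030] can be sketched as a small accumulator. The class `LyricBuffer` is hypothetical. Once an entry without a trailing hyphen arrives, it returns the hyphen-free word together with the number of notes it spanned. Lone "-" entries for prolonged notes (paragraph [0037]) are deliberately not handled here.

```python
from typing import Optional, Tuple


class LyricBuffer:
    """Accumulates one word's per-note entries until an entry lacks a trailing hyphen."""

    def __init__(self) -> None:
        self.raw = ""   # buffer contents, e.g. "a-ma-"
        self.notes = 0  # number of note bars the word has been typed on so far

    def add(self, entry: str) -> Optional[Tuple[str, int]]:
        """Append one note's entry; return (word, note_count) once the word is complete."""
        self.raw += entry
        self.notes += 1
        if entry.endswith("-"):
            return None  # more characters will follow on the next note bar
        result = (self.raw.replace("-", ""), self.notes)
        self.raw, self.notes = "", 0  # ready for the next word
        return result
```

Because the hyphens are simply removed, where the user places them does not affect the recovered word. This is why the mis-divided input in paragraph [0036] still yields “amazing”.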
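[0031]
Next, the data editing means 23 searches the dictionary data 253 for “amazing” to obtain its number of divisions and its phonetic symbols. The dictionary data 253 stores “3” as the number of divisions and “@meIZIN” as the phonetic symbols, and the data editing means 23 retrieves both. It then compares the number of divisions with the number of note data (note bars) to which the user assigned the word. If the two differ, it shows an error on the display means 24 and stops processing.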
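The comparison in the paragraph above can be sketched as a check that either returns the phonetic string or raises an error, which stands in for displaying an error and stopping. `DICTIONARY`, `AssignmentError` and `check_division_count` are hypothetical names. Treating a word missing from the dictionary as an error is an assumption, since the text does not cover that case.

```python
from typing import Dict, Tuple

# Hypothetical dictionary contents: word -> (SAMPA string, number of divisions).
DICTIONARY: Dict[str, Tuple[str, int]] = {"amazing": ("@meIZIN", 3)}


class AssignmentError(Exception):
    """Raised when a word cannot be allocated to the notes it was typed on."""


def check_division_count(word: str, note_count: int) -> str:
    """Return the word's phonetic string if its division count equals note_count."""
    if word not in DICTIONARY:
        raise AssignmentError(f"{word!r} is not in the dictionary")
    phonetic, divisions = DICTIONARY[word]
    if divisions != note_count:
        # Corresponds to showing an error on the display and stopping the allocation.
        raise AssignmentError(
            f"{word!r} has {divisions} syllables but spans {note_count} notes"
        )
    return phonetic
```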
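[0032]
Next, the data editing means 23 divides the obtained phonetic symbol string so that it can be allocated to the notes. The division uses the rule that vowel positions mark the division points. The reason is that in an ordinary score each note corresponds to one syllable, and each syllable contains one vowel. The phonetic symbols of “amazing” are “@meIZIN”, so they are divided into three syllables: “@”, “meI” and “ZIN”. The data editing means 23 then stores these syllables in the phonetic symbol fields of the note data corresponding to note bars NB1, NB2 and NB3 in the singing score data.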
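The text states only that vowel positions mark the division points. The sketch below shows one way to realise this:
1. Tokenize the SAMPA string.
2. Attach each run of consonants to the vowel that follows it.
3. Attach any consonants after the last vowel to the final syllable.

This boundary placement, the greedy tokenizer and the English SAMPA vowel subset are assumptions. They do reproduce the “@”, “meI”, “ZIN” split given above.

```python
from typing import List

# Assumed English SAMPA vowel inventory (not taken from the patent text).
VOWELS = {"i:", "I", "e", "{", "A:", "Q", "V", "O:", "U", "u:", "3:", "@",
          "eI", "aI", "OI", "@U", "aU", "I@", "e@", "U@"}
# Two-character symbols (long vowels, diphthongs, affricates) are matched first.
TWO_CHAR = {s for s in VOWELS if len(s) == 2} | {"tS", "dZ"}


def tokenize(sampa: str) -> List[str]:
    """Split a SAMPA string into phoneme symbols by greedy longest match."""
    tokens: List[str] = []
    i = 0
    while i < len(sampa):
        step = 2 if sampa[i:i + 2] in TWO_CHAR else 1
        tokens.append(sampa[i:i + step])
        i += step
    return tokens


def split_syllables(sampa: str) -> List[str]:
    """Divide a phonetic string into syllables that each contain exactly one vowel."""
    syllables: List[str] = []
    onset = ""  # consonants waiting for the next vowel
    for token in tokenize(sampa):
        if token in VOWELS:
            syllables.append(onset + token)
            onset = ""
        else:
            onset += token
    if onset:
        if not syllables:
            raise ValueError(f"no vowel in {sampa!r}")
        syllables[-1] += onset  # trailing consonants close the last syllable
    return syllables
```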
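[0033]
Next, the data editing means 23 corrects the phonetic symbols in the singing score data in the storage means 25 according to the rules stored in the editing rule data 252. FIG. 7 shows the final singing score data after this processing. The phonetic symbols in FIG. 7 are written in SAMPA, a phonetic alphabet designed to make phonetic symbols easy to handle on a computer.
For the data shown in FIG. 6, no rule in the editing rule data 252 applies, so nothing is changed.
In this way, the final singing score data for “amazing” is obtained in the storage means 25.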
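[0034]
Next, with reference to FIG. 8(a), consider the case in which the user assigns the word “amazing” to four notes. In this example, the user has entered written characters on note bars NB1 to NB4 as shown. Because “amazing” has three syllables, this is an example of the user making a mistake.
In this case, the data editing means 23 stores the user's input “a-ma-zi-ng” in the buffer area, as in the process above.
Next, the data editing means 23 removes the hyphens from “a-ma-zi-ng” in the buffer area to obtain the word “amazing”.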
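[0035]
Next, the data editing means 23 searches the dictionary data 253 for “amazing” and retrieves its number of divisions, “3”, and its phonetic symbols, “@meIZIN”. It then compares the number of divisions with the number of note bars across which the word was entered. Here the user entered “amazing” across four note bars, which differs from the number of divisions. The data editing means 23 therefore shows an error on the display means 24 and stops processing.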
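[0036]
Next, consider the case in which the user correctly assigns “amazing” to three notes but divides its syllables incorrectly, as shown in FIG. 8(b). The user's input “ama-zi-ng” is stored in the buffer area. Because the hyphens are then removed, the buffer contents become the word “amazing”, exactly as before. Accordingly, the three syllables of the word's phonetic symbols, “@”, “meI” and “ZIN”, are assigned to the three note data as before.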
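[0037]
Next, consider the case in which the user assigns “amazing” to four notes, as shown in FIG. 8(c). Here the user has not mistaken the number of syllables for four. The user knows the correct syllable count and deliberately assigns “amazing” to four notes.
Specifically, suppose the user intends to prolong the sound of “ma-” and enters data on note bars NB1 to NB4 as in FIG. 8(c).
In this case, the user enters only “-” on the note bar that follows the one where “ma-” was entered. A “-” entered in this way is replaced with another special symbol, for example “*”, and stored in the buffer area.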
[0038]
Thus, when the user finishes entering the four note bars, the buffer area holds “a-ma-*zing”.
The data editing means 23 removes the hyphens from “a-ma-*zing” to obtain the string “ama*zing”. On detecting that “ama*zing” contains “*”, it copies “ama*zing” to a second buffer area. It then removes the “*” from “ama*zing” in the original buffer area to obtain the word “amazing”.
Next, the data editing means 23 searches the dictionary data 253 for “amazing” and retrieves its number of divisions, “3”, and its phonetic symbols, “@meIZIN”.
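[0039]
Next, the data editing means 23 compares the number of divisions with the number of note bars across which the word was entered. Here the number of divisions of “amazing” is 3, whereas the user entered it across 4 note bars, one more than the number of divisions.
However, the second buffer area contains one “*”. This means that one of the four note bars continues the sound of the note bar before it. In this case, the data editing means 23 subtracts the number of “*” symbols from the number of note bars and compares the result with the number of divisions. Here the two are equal, so processing continues.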
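[0040]
The data editing means 23 then proceeds as follows. It extracts syllables one at a time from the beginning of “ama*zing”, the copy kept in the second buffer area:
- “a” is extracted first. Its phonetic symbol “@” is assigned to the first of the four note data for the note bars on which “amazing” was entered.
- “ma” is extracted next. Its phonetic symbol “meI” is assigned to the second note data.
- “*” is extracted next. The phonetic symbol “I”, which prolongs the preceding “meI”, is assigned to the third note data.
- “zing” is extracted last. Its phonetic symbol “ZIN” is assigned to the fourth note data.
This processing yields the singing score data shown in FIG. 9.
Thus, in this embodiment, when the user enters a word across several note bars, each syllable of the word is automatically assigned to the note data for those note bars, and singing score data is generated. This makes it easy for the user to edit singing score data.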
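The extension handling in paragraphs [0037] to [0040] can be sketched as follows. This rests on several assumptions:
- Counting lone "-" entries directly plays the role of counting "*" in the second buffer.
- The syllables have already been split at vowels.
- The prolongation symbol is the final vowel element of the previous syllable: the second element of a diphthong such as "eI", otherwise the vowel itself.

The text gives only the "meI" to "I" case, so the general rule is an extrapolation. The function names are hypothetical.

```python
from typing import List, Optional

# Assumed English SAMPA vowels; each diphthong maps to its final element.
DIPHTHONG_TAILS = {"eI": "I", "aI": "I", "OI": "I", "@U": "U", "aU": "U",
                   "I@": "@", "e@": "@", "U@": "@"}
LONG_VOWELS = {"i:", "A:", "O:", "u:", "3:"}
SHORT_VOWELS = set("Ie{QVU@")


def extension_vowel(syllable: str) -> str:
    """Return the vowel used to prolong `syllable` on the following note."""
    for end in range(len(syllable), 0, -1):  # scan right to left for the last vowel
        pair = syllable[end - 2:end] if end >= 2 else ""
        if pair in DIPHTHONG_TAILS:
            return DIPHTHONG_TAILS[pair]
        if pair in LONG_VOWELS:
            return pair
        if syllable[end - 1] in SHORT_VOWELS:
            return syllable[end - 1]
    raise ValueError(f"no vowel in {syllable!r}")


def assign_with_extensions(entries: List[str], syllables: List[str]) -> List[str]:
    """Give each note a phonetic string; a lone "-" entry prolongs the previous syllable."""
    extensions = sum(1 for e in entries if e == "-")
    if len(entries) - extensions != len(syllables):
        raise ValueError("notes minus extensions must equal the number of divisions")
    result: List[str] = []
    remaining = iter(syllables)
    previous: Optional[str] = None
    for entry in entries:
        if entry == "-":
            if previous is None:
                raise ValueError("an extension needs a preceding syllable")
            result.append(extension_vowel(previous))
        else:
            previous = next(remaining)
            result.append(previous)
    return result
```

With the FIG. 8(c) input `["a-", "ma-", "-", "zing"]` and the syllables `["@", "meI", "ZIN"]`, this sketch yields `["@", "meI", "I", "ZIN"]`. That matches the assignment described for FIG. 9.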
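[0041]
This embodiment uses English letters as the written characters. However, the written characters may also be kanji or symbol strings, as long as the dictionary data can convert them to phonetic symbols.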
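[0042]
[Effects of the Invention]
As described above, the data input interface for a singing synthesis device according to the present invention sets correct phonetic symbols even when the user lacks knowledge of the correct syllable boundaries and divides the lyrics incorrectly, provided the number of vowels is entered correctly.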
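[Brief Description of the Drawings]
[FIG. 1] A diagram showing the configuration of a computer that implements the singing synthesis data input interface according to the present invention.
[FIG. 2] A functional block diagram of the singing synthesis data input interface according to the present invention.
[FIG. 3] A diagram showing the structure of singing score data.
[FIG. 4] A diagram showing the structure of dictionary data.
[FIG. 5] A diagram showing a display example of the piano roll.
[FIG. 6] A diagram showing a specific example of entering lyrics on the piano roll.
[FIG. 7] A diagram showing the structure of singing score data.
[FIG. 8] A diagram showing specific examples of entering lyrics on the piano roll.
[FIG. 9] A diagram showing the structure of singing score data.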
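[Explanation of Reference Numerals]
1: computer system, 11: CPU, 12: ROM, 13: RAM, 14: operation unit, 15: HDD, 16: display, 17: data input/output unit, 18: sound source unit, 19: speaker, 21: operation means, 22: external data input means, 23: data editing means, 24: display means, 25: storage means.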
BACKGROUND OF THE INVENTION
The present invention relates to a data input device that can easily input lyrics in a singing voice synthesizing device and a program that controls the data input device.
[0002]
[Prior art]
Conventionally, in a speech synthesizer or a speech synthesis program, when a text is given, a phonetic symbol corresponding to the text is generated and a speech is synthesized according to the phonetic symbol. Some devices and programs of this type have a function of editing a generated phonetic symbol string. For this editing, these devices and programs are equipped with a text editor, and a user generally edits a phonetic symbol string directly using this text editor.
However, the generated phonetic symbol sequence is a sequence of symbols representing readings, symbols and positions of accents and poses, etc. If the user does not know the meaning of these symbols, the user edits the phonetic symbol sequence. Was very difficult.
[0003]
As a technique for overcoming this drawback, for example, Patent Document 1 introduces a symbol called utterance classification that is easy to understand for a general user who does not know a phonetic symbol string. This pronunciation division represents the position of the accent phrase, voice phrase, and exhalation paragraph, and can be easily edited by those who can speak the language without knowing the definition of the phonetic symbol string. Can do. Further, since the phonetic symbols are indirectly edited by the editing operation of the phonetic classification, according to Patent Document 1, the phonetic symbol string can be easily edited.
[0004]
Today, music is composed using a computer. This music editing using a computer is also called DTM (desktop music). In this DTM, the user can play a favorite song by himself using MIDI (Musical Instruments Digital Interface) sound source, or the user himself can be a producer and think about the tone and melody. I enjoy making music.
Also, in DTM, using a voice synthesis technique, a computer is allowed to sing a song according to a song.
[0005]
[Patent Document 1]
JP-A-5-11797 [0006]
[Problems to be solved by the invention]
The technique of Patent Document 1 described above proposes a phonetic symbol string generation device capable of easily generating an appropriate phonetic symbol string that gives a highly natural prosody, and a text-to-speech synthesis system using the same. . However, the technique of Patent Document 1 cannot be applied to a device or program for music editing work in which lyrics are assigned to musical notes of a musical score that has already been completed.
[0007]
In addition, in music editing, when adding lyrics and pronunciation to notes, it is difficult for a user who does not know phonetic symbols and syllable breaks to allocate one vowel to one note. Because, for example, when assigning English lyrics to a note, the user must know the syllable of the word. However, a general user must look at the dictionary to know the syllables of words, and it is troublesome to do so when inputting lyrics.
[0008]
The present invention has been made in view of the above-described circumstances, and an object thereof is to provide a singing composition data input device and a singing composition data input program capable of easily allocating lyrics to notes. Yes.
[0009]
[Means for Solving the Problems]
In order to achieve the above object, a singing synthesizing data input program according to the present invention stores a singing score data composed of a plurality of note data corresponding to a plurality of notes constituting a musical piece in a storage means, and the storage means Input data input in association with a plurality of consecutive note data stored in a plurality of written characters constituting one word and a written character connection for connecting two consecutive written characters The input data including the symbol is acquired via the input device, the notation character connection symbol is removed from the input data to obtain the word , and the user assigns the word using the notation character connection symbol a word acquiring process for acquiring the number of note data, the word acquired in the word acquiring process, by searching the dictionary, pronunciation indicating the words in the pronunciation aspect Corresponding to the word acquired in the word acquisition process, the number represented by the segment information acquired in the word information acquisition process, and the word acquired in the word acquisition process The number of note data attached is compared, and if both match, the phonetic symbol obtained from one word in the word acquisition process is divided into syllables each containing one vowel, and each syllable is The singing score data stored in the storage means is updated by allocating to a plurality of continuous note data associated with the input data from which the word is acquired among the respective note data in the singing score data. If they do not match , the computer is caused to execute a singing score data updating process for canceling the assignment process .
[0010]
Further, the data input device for singing synthesis according to the present invention comprises a storage means for storing singing score data composed of a plurality of note data corresponding to a plurality of notes constituting a musical piece, and a plurality of continuous data stored in the storage means. Input data input in association with note data, the input data including a plurality of notation characters constituting one word and a notation character connection symbol for connecting two consecutive notation characters. , A word obtained through an input device, the word is obtained by removing the notation character connection symbol from the input data , and the user obtains the number of note data to which the word is assigned using the notation character connection symbol Table acquisition means, for the word acquired in the word obtaining unit searches the dictionary, and phonetic symbol indicating a word pronunciation embodiment, the number of syllables constituting the word Word information acquisition means for acquiring classification information, the number represented by the classification information acquired in the word acquisition means, and the number of note data associated with the word acquired in the word acquisition process, Are matched, the phonetic symbol obtained from one word in the word acquisition means is divided into syllables each including one vowel, and each syllable is included in each note data in the singing score data. Singing score data for updating the singing score data stored in the storage means by allocating to a plurality of continuous note data associated with the acquired input data, and canceling the allocating process if they do not match Updating means.
[0011]
DETAILED DESCRIPTION OF THE INVENTION
A preferred embodiment of the present invention will be described with reference to the drawings.
FIG. 1 is a block diagram showing a configuration of a computer 1 having a function as a singing synthesizing data input device according to an embodiment of the present invention. In the computer 1 shown in FIG. 1, a CPU (Central Processing Unit) 11, a ROM (Read Only Memory) 12, a RAM (Random Access Memory) 13, an operation unit 14, an HDD (Hard Disk Drive Device) 15, a display 16, and data input / output The units 17 are connected via a bus BUS and can exchange data with each other. The sound source unit 18 and the speaker 19 are connected to the computer 1 as external devices, but may be configured as devices inside the computer 1.
[0012]
The CPU 11 is a microprocessor that performs general-purpose data processing. In addition to the computer 1, the CPU 11 follows a control program such as BIOS (Basic Input / Output System) stored in the ROM 12 and an OS (operating system) stored in the HDD 15. The control processing of the constituent parts of the
[0013]
The ROM 12 is a non-volatile memory that stores a control program such as BIOS. The RAM 13 is a volatile memory for temporarily storing data used by the CPU 11 and other components. The BIOS in the ROM 12 is read by the CPU 11 and written in the RAM 13 when the computer 1 is turned on. The CPU 11 constructs a hardware usage environment according to the BIOS in the RAM 13. The operation unit 14 includes a keypad, a mouse, and the like, and transmits data reflecting operation contents performed by the user to the CPU 11. The HDD 15 is a nonvolatile memory having a large-capacity storage area, and data stored in the HDD 15 can be rewritten. The HDD 15 stores an OS, various applications, and data used by the various applications. The CPU 11 reads out the OS from the HDD 15 and writes it into the RAM 13 after constructing the hardware environment by the BIOS, and performs processing such as construction of a GUI (Graphical User Interface) environment and an application execution environment according to the OS. Among the applications stored in the HDD 15, there is a singing composition data input application.
[0014]
When the CPU 11 receives an instruction to execute the song synthesis data input application from the user by operating the mouse or the like, the CPU 11 reads the song synthesis data input application from the HDD 15 and writes it into the RAM 13, and performs various processes according to the song synthesis data input application. Build the environment to do. In this way, the computer 1 functions as a song synthesis data input device according to the present embodiment.
[0015]
The display 16 includes a liquid crystal display and a drive circuit that drives the liquid crystal display under the control of the CPU 11 and displays information such as characters and graphics.
[0016]
The data input / output unit 17 is, for example, a USB (Universal Serial Bus) interface, an interface capable of inputting / outputting various data, receives data from an external device, transfers the received data to the CPU 11, or is generated by the CPU 11. Send data to external devices.
The sound source unit 18 generates a musical sound signal based on the input data and outputs it as a musical sound from the speaker 19.
[0017]
FIG. 2 is a block diagram showing a functional configuration of a singing voice synthesizing data input device provided by the CPU 11 of the computer 1 executing a singing voice data input application. As shown in the figure, the singing voice synthesizing data input device has an operation means 21, an external data input means 22, a data editing means 23, a display means 24 and a storage means 25. Among these, the operation unit 21 is the operation unit 14 of the computer 1, and the storage unit 25 is the RAM 13 and the HDD 15 of the computer 1. The external data input means 22 is the data input / output unit 17 of the computer 1, and the display means 24 is the display 16. The data editing means 23 is a software module that constitutes a singing synthesizing data input application.
[0018]
The external data input unit 22 acquires the singing score data 251 from an external device or the like, and transmits the acquired singing score data 251 to the data editing unit 23. The data editing unit 23 stores the received singing score data 251 in the storage unit 25, and edits the singing score data 251 in accordance with the operation of the operation unit 21. The display unit 24 displays the singing score data 251 stored in the storage unit 25.
[0019]
FIG. 3 illustrates the configuration of this singing score data 251. The singing score data 251 is time-series data composed of note data corresponding to each of a series of notes constituting the music. In FIG. 3, the data for one line arranged side by side indicates one note data. One piece of note data is composed of data of numbers, pitches, phonetic symbols, pronunciation periods, input characters, and display characters. Here, the number in each piece of note data indicates the number of the note represented by the note data from the beginning of the song. The pitch and sound generation period in each note data designate the pitch and sound generation period of the note corresponding to the note data. The phonetic symbols, input characters, and display characters are information that the data editing unit 23 assigns to each piece of note data in accordance with the operation of the operation unit 21 by the user. The feature of the present embodiment is in the assignment processing of phonetic symbols to each note data performed by the data editing means 23. The storage means 25 stores dictionary data 253 and editing rule data 252 that are referred to in the process of assigning phonetic symbols to the respective note data.
[0020]
FIG. 4 shows the structure of dictionary data. Dictionary data is a collection of data corresponding to each of a large number of written characters such as English words, and the data corresponding to one word represents the notation of the word and the manner in which the word is pronounced. It consists of a phonetic symbol string and information indicating the number of divisions into how many syllables the voice when the word is pronounced is divided into.
[0021]
In the present embodiment, the user can input lyrics to be assigned to a series of note data by operating the operation means 21. When inputting the lyrics, there are cases where a word consisting of a plurality of syllables is assigned to a plurality of consecutive notes. This embodiment is intended for the convenience of the user in such a case. In the present embodiment, when the user wants to assign a word consisting of n syllables to n consecutive notes, only the following rule should be observed.
a. The first notation character in the word is input in association with the first note, and the last notation character in the word is input in association with the last note.
b. Data input in association with a note other than the last note must always end with a notation character connection symbol (for example, a hyphen).
c. The notation characters other than the first notation character and the last notation character in the word are input in the same order as the appearance order in the word.
[0022]
When the data editing unit 23 receives data input according to this rule from the operation unit 21, the data editing unit 23 interprets the notation character string connected by the notation character connection symbol as one word. Then, the data editing unit 23 refers to the dictionary data 253 stored in the storage unit 25 to obtain a phonetic symbol string corresponding to this word, and uses it as n syllables each including one vowel. Divided into n pieces of note data.
The edit rule data 252 is data representing a special rule referred to in the assignment process described above. For example, in English, there is an ambiguous vowel (Schwa sound). An ambiguous vowel is used when the pronunciation is weak and the pronunciation period is short. When the volume of the corresponding note is large or the pronunciation period is long, it is converted into a normal vowel. As a more specific example, when a Schwa sound is assigned to a note that lasts 0.5 seconds or more in the assignment process, the data editing means 23 converts this into a phonetic symbol “Q”.
[0023]
Another rule is to change the pronunciation of “h”. This is a rule that makes the pronunciation of “h” unvoiced or voiced according to the previous and next pronunciations. Specifically, when the phonetic symbol “h” is assigned to certain note data in the assignment process, the data editing means 23 assigns a vowel to the note data before the note data, and When there is no rest between the note data and the note data, the pronunciation of “h” is voiced. For example, if there is a rest between the words “She” and “hits” in the lyrics “She hits”, “h” in “hits” is silent and voiced if there is no rest.
[0024]
Next, the operation of the singing synthesizing data input device will be described. When the data input application for song synthesis is executed by the CPU 11, a piano roll as shown in FIG. Next, when an instruction to read the singing score data is given to the operation means 21, the singing score data is read from the outside using the external data input means 22.
[0025]
The read singing score data is stored in the storage means 25 via the data editing means 23. The singing score data stored in the storage means 25 is displayed on the display means 24 as a piano roll screen. FIG. 5 shows this piano roll screen. In this piano roll screen, the up and down direction corresponds to the arrangement direction of the piano keys, and the lower side represents the bass key and the upper side represents the treble key. That is, the vertical axis of the piano roll screen is the pitch axis. The horizontal axis of the piano roll screen is a time axis. In the drawing, a rectangle surrounded by a thick line represents individual note data included in the singing score data, and is called a note bar. The position of the note bar corresponding to each note data in the pitch axis direction indicates the pitch specified by the note data. The position of the note bar corresponding to each note data in the time axis direction indicates the sounding timing of the note designated by the note data. Further, the length of the note bar corresponding to each note data indicates the sound generation period designated by the note data.
[0026]
In the present embodiment, when a word is divided and assigned to a plurality of note data, the user inputs the word by the operation means 21 using a hyphen indicating that words are continuous as shown in FIG. That is, the word “amazing” in FIG. 6 is input as “a−”, “ma−”, and “zing”, but “−” attached to “a” and “ma” is the input of the word. Is still in the middle, indicating that there are still characters to be assigned to the subsequent note bar. In the example shown in the figure, “a−” is connected to “ma−”, and the connected one is further connected to “zing”. Since “zing” does not have a hyphen, it is a written character string at the end of the word.
[0027]
How “amazing” input in this way is processed in the present embodiment will be described below.
[0028]
First, the user uses the operation means 21 to enter a lyric input waiting state in which lyrics can be input to the note bar by, for example, double-clicking the first note bar NB1 to which “amazing” is assigned. Then, “a” and “−” are input by the operating means 21 and the enter key is pressed.
[0029]
Thereby, the data editing means 23 acquires the input data “a−” in association with the note bar NB1. The input data “a−” is displayed in association with the note bar NB1, and is stored in a buffer area (not shown) secured in the storage means 25. Next, the user designates the note bar NB2 and inputs “ma−” by the same operation. As a result, the data editing unit 23 adds the newly acquired input data “ma−” to the existing input data “a−” in the buffer area in the storage unit 25. As a result, the input data in the buffer area is “a-ma-”. Next, the user designates the note bar NB3 and inputs “zing” by the same operation. Accordingly, the data editing unit 23 adds the newly acquired input data “zing” to the existing input data “a-ma-” in the buffer area in the storage unit 25. As a result, the input data in the buffer area is “a-ma-zing”. In this case, since the input “zing” does not end with “−” at the end, the data editing unit 23 detects that input of all the notation characters constituting the word is completed.
[0030]
Next, the data editing unit 23 removes the hyphen from the input data “a-ma-zing” in the buffer area in the storage unit 25 to obtain the word “amazing”.
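The buffering and hyphen-removal steps in [0029] and [0030] can be sketched as follows. This is a minimal Python illustration, not the patented implementation; the function names `normalize`, `accumulate`, and `strip_hyphens` are hypothetical, and treating the minus sign “−” and the ASCII hyphen “-” as the same connection symbol is an assumption.

```python
from typing import List, Tuple


def normalize(text: str) -> str:
    # Assumption: the minus sign U+2212 and the ASCII hyphen both act as the
    # written-character connection symbol.
    return text.replace("\u2212", "-")


def accumulate(note_inputs: List[str]) -> Tuple[str, int]:
    """Append each note bar's input to a buffer until an input without a
    trailing hyphen completes the word; return (buffer, number of notes)."""
    buffer = ""
    for count, text in enumerate(note_inputs, start=1):
        text = normalize(text)
        buffer += text
        if not text.endswith("-"):
            return buffer, count  # last written character string of the word
    raise ValueError("word not finished: every input so far ends with a hyphen")


def strip_hyphens(buffer: str) -> str:
    """Remove the connection symbols to recover the dictionary word."""
    return buffer.replace("-", "")


buffer, notes = accumulate(["a-", "ma-", "zing"])
print(buffer, notes, strip_hyphens(buffer))  # a-ma-zing 3 amazing
```

Because only the hyphens are removed, an input split at the wrong places, such as “ama-”, “zi-”, “ng”, still yields “amazing” over three notes.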
[0031]
Next, the data editing unit 23 searches the dictionary data 253 for “amazing” to obtain its number of divisions and its phonetic symbols. Since the dictionary data 253 stores “3” as the number of divisions of “amazing” and “@meIZIN” as its phonetic symbols, the data editing means 23 obtains these values. Next, the data editing means 23 compares the number of divisions with the number of note data (note bars) to which the user has assigned the word. If the two numbers differ, an error is displayed on the display means 24 and processing stops.
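The dictionary lookup and the count check in [0031] might look like the sketch below. The `DictEntry` record, the `DICTIONARY` contents (only the “amazing” entry comes from the text), and `SyllableCountError` are hypothetical; raising an exception stands in for the error display on the display means 24.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DictEntry:
    divisions: int  # number of syllables ("number of divisions")
    phonetic: str   # phonetic symbol string in SAMPA


# Hypothetical stand-in for dictionary data 253; only this entry is from the text.
DICTIONARY = {"amazing": DictEntry(divisions=3, phonetic="@meIZIN")}


class SyllableCountError(Exception):
    """Raised where the described device would show an error and stop."""


def look_up_and_check(word: str, note_count: int, dictionary=DICTIONARY) -> DictEntry:
    entry = dictionary.get(word)
    if entry is None:
        raise KeyError(f"{word!r} is not in the dictionary")
    if entry.divisions != note_count:
        raise SyllableCountError(
            f"{word!r} has {entry.divisions} syllables but was input over {note_count} notes"
        )
    return entry


print(look_up_and_check("amazing", 3).phonetic)  # @meIZIN
```

Calling `look_up_and_check("amazing", 4)` raises `SyllableCountError`, which corresponds to the FIG. 8A case in which the word is spread over four note bars.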
[0032]
Next, the data editing means 23 divides the obtained phonetic symbol string so that it can be assigned to notes. The division follows a rule that uses the position of each vowel as the delimiter, because in a normal score one syllable corresponds to one note and each syllable contains one vowel. Since the phonetic symbol string of “amazing” is “@meIZIN”, it is divided into three syllables: “@”, “meI”, and “ZIN”. The data editing means 23 then stores these syllables in the phonetic symbol fields of the note data corresponding to the note bars NB1, NB2, and NB3 in the singing score data.
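One way to realize the vowel-delimited division of [0032] is sketched below. The partial SAMPA vowel inventory, the greedy tokenizer, and the rule that a single consonant directly before a vowel opens that vowel's syllable are all assumptions made for illustration; the text states only that vowel positions serve as delimiters.

```python
# Hypothetical, partial SAMPA vowel inventory (British English).
VOWELS = {
    "i:", "I", "e", "E", "{", "A:", "Q", "O:", "U", "u:", "V", "3:", "@",
    "eI", "aI", "OI", "@U", "aU", "I@", "e@", "U@",
}
# Multi-character symbols (including the affricates tS, dZ) are matched first.
MULTI = sorted({s for s in VOWELS if len(s) > 1} | {"tS", "dZ"}, key=len, reverse=True)


def tokenize(sampa: str) -> list:
    """Greedily split a SAMPA string into phoneme symbols."""
    tokens, i = [], 0
    while i < len(sampa):
        for sym in MULTI:
            if sampa.startswith(sym, i):
                tokens.append(sym)
                i += len(sym)
                break
        else:
            tokens.append(sampa[i])
            i += 1
    return tokens


def split_syllables(sampa: str) -> list:
    """Split a SAMPA string into one syllable per vowel."""
    tokens = tokenize(sampa)
    vowels = [i for i, t in enumerate(tokens) if t in VOWELS]
    if not vowels:
        return [sampa]
    starts = [0]
    for prev, cur in zip(vowels, vowels[1:]):
        # A consonant directly before a vowel opens that vowel's syllable;
        # any further consonants stay with the preceding syllable.
        starts.append(cur - 1 if cur - 1 > prev else cur)
    starts.append(len(tokens))
    return ["".join(tokens[a:b]) for a, b in zip(starts, starts[1:])]


print(split_syllables("@meIZIN"))  # ['@', 'meI', 'ZIN']
```

Treating “eI” as a single vowel symbol is what keeps “meI” together; with a character-by-character vowel test the diphthong would be split into two syllables.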
[0033]
Next, the data editing unit 23 corrects the phonetic symbols in the singing score data in the storage unit 25 according to the rules stored in the editing rule data 252. FIG. 7 shows the final singing score data after this processing. The phonetic symbols shown in FIG. 7 are written in SAMPA (Speech Assessment Methods Phonetic Alphabet), a phonetic alphabet defined so that phonetic symbols can be handled easily on a computer.
For the data shown in FIG. 6, no rule in the edit rule data 252 applies, so nothing is changed.
In this way, final singing score data corresponding to “amazing” is obtained in the storage means 25.
[0034]
Next, with reference to FIG. 8A, the operation when the user assigns the word “amazing” to four notes will be described. In this example, the user inputs written characters into the note bars NB1 to NB4 as shown in the figure. Since the word “amazing” has three syllables, this is an example in which the user has miscounted them.
In this case, the data editing unit 23 stores “a-ma-zi-ng” input by the user in the buffer area, as in the above-described processing.
Next, the data editing means 23 obtains the word “amazing” by removing the hyphen from “a-ma-zi-ng” stored in the buffer area.
[0035]
Next, the data editing unit 23 searches the dictionary data 253 for “amazing” to obtain its number of divisions and phonetic symbols. Since the dictionary data 253 stores “3” as the number of divisions of “amazing” and “@meIZIN” as its phonetic symbols, the data editing means 23 obtains these values. The data editing means 23 then compares the number of divisions with the number of note bars across which the word was input. Here the user has input “amazing” across four note bars, which differs from the number of divisions, so the data editing means 23 displays an error on the display means 24 and stops processing.
[0036]
Next, the operation when the user correctly assigns the word “amazing” to three notes but divides its syllables incorrectly, as shown in FIG. 8B, will be explained. In this case, “ama-zi-ng” input by the user is stored in the buffer area. When the hyphens are removed, the content of the buffer area becomes the word “amazing”, as described above. Accordingly, as described above, the three syllables “@”, “meI”, and “ZIN” that make up the phonetic symbol string of this word are assigned to the three note data.
[0037]
Next, the operation when the user assigns the word “amazing” to four notes as shown in FIG. 8C will be described. This is not a case in which the user has mistaken the number of syllables of “amazing” for 4, but one in which the user correctly recognizes the number of syllables and intentionally assigns the word “amazing” to four notes.
Specifically, assume that the user inputs into the note bars NB1 to NB4 as shown in FIG. 8C, intending to sustain the sound “ma−”.
In this case, the user inputs only “−” into the note bar following the one into which “ma−” was input. A “−” input on its own in this way is replaced with a different special symbol, for example “*”, before being stored in the buffer area.
[0038]
Therefore, when the user has finished input to the four note bars, the buffer area holds “a-ma-*zing”.
The data editing unit 23 deletes the hyphens from “a-ma-*zing” in the buffer area to obtain the character string “ama*zing”. When the data editing unit 23 detects that “ama*zing” in the buffer area contains “*”, it copies “ama*zing” to a second buffer area and removes the “*” from that copy to obtain the word “amazing”.
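The buffer handling for a sustained note in [0037] and [0038] can be sketched as below. The function `build_buffers` is hypothetical, and treating “−” (U+2212) as the ASCII hyphen is an assumption; the code only mirrors the described steps of replacing a stand-alone hyphen with “*”, deleting the remaining hyphens, and removing “*” from a copy to obtain the dictionary word.

```python
EXTENSION = "*"  # special symbol that replaces a stand-alone hyphen


def normalize(text: str) -> str:
    # Assumption: U+2212 and "-" are the same connection symbol.
    return text.replace("\u2212", "-")


def build_buffers(note_inputs):
    """Return (buffer, hyphen-free string, word, number of extensions)."""
    buffer = ""
    for text in map(normalize, note_inputs):
        # A note bar holding only a hyphen sustains the previous sound.
        buffer += EXTENSION if text == "-" else text
    without_hyphens = buffer.replace("-", "")      # kept in the first buffer
    word = without_hyphens.replace(EXTENSION, "")  # copy in the second buffer
    return buffer, without_hyphens, word, without_hyphens.count(EXTENSION)


print(build_buffers(["a-", "ma-", "-", "zing"]))
# ('a-ma-*zing', 'ama*zing', 'amazing', 1)
```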
Next, the data editing unit 23 searches for “amazing” in the dictionary data 253 in order to obtain the number of divisions and pronunciation symbols of “amazing”. Since the dictionary data 253 stores “3” and “@meIZIN” as the number of divisions of “amazing” and the phonetic symbols, respectively, the data editing means 23 obtains them.
[0039]
Next, the data editing means 23 compares the number of divisions with the number of note bars across which the word was input. Here, the number of divisions of “amazing” is “3”, whereas the number of note bars into which the user input “amazing” is “4”, so the number of note bars exceeds the number of divisions by one.
However, the buffered input “ama*zing” contains one “*”, which means that one of the four note bars continues the sound of the preceding note bar. In this case, the data editing means 23 subtracts the number of “*” symbols from the number of note bars and compares the result with the number of divisions. Since the two are equal here, the data editing means 23 continues processing.
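The adjusted comparison in [0039] amounts to subtracting the extension notes before checking the syllable count. A minimal sketch, with hypothetical function names and the same U+2212 assumption as above:

```python
def normalize(text: str) -> str:
    return text.replace("\u2212", "-")  # assumption: U+2212 equals "-"


def effective_note_count(note_inputs) -> int:
    """Notes whose input is only a hyphen (stored as "*") continue the
    preceding syllable, so they are not counted as new syllables."""
    extensions = sum(1 for t in note_inputs if normalize(t) == "-")
    return len(note_inputs) - extensions


def counts_match(note_inputs, divisions: int) -> bool:
    return effective_note_count(note_inputs) == divisions


print(counts_match(["a-", "ma-", "-", "zing"], 3))  # True  (FIG. 8C)
print(counts_match(["a-", "ma-", "zi-", "ng"], 3))  # False (FIG. 8A)
```

The same four-note count therefore passes in the FIG. 8C case and fails in the FIG. 8A case, which is exactly the distinction the text draws between an intentional extension and a miscounted word.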
[0040]
Then, the data editing unit 23 performs the following processing. First, it extracts syllables one by one from the head of “ama*zing” in the buffer area. Since “a” is extracted first, the phonetic symbol “@” corresponding to this syllable is assigned to the first of the four note data corresponding to the four note bars into which “amazing” was input.
Next, since “ma” is extracted, the phonetic symbol “meI” corresponding to the extracted syllable is assigned to the second note data.
Next, since “*” is extracted, the phonetic symbol “I” for extending the previous phonetic symbol “meI” is assigned to the third note data.
Finally, since “zing” is extracted, the phonetic symbol “ZIN” corresponding to the extracted syllable is assigned to the fourth note data.
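The assignment in [0040] can be sketched as follows. Two points are assumptions rather than details from the text: the sketch walks the per-note inputs instead of re-extracting syllables from “ama*zing”, and the rule for the sustain symbol (a diphthong is continued by its final element, a monophthong by itself) is hypothetical; only “meI” being continued by “I” is confirmed above.

```python
# Hypothetical sustain rule; only "meI" -> "I" is confirmed by the text.
DIPHTHONG_TAIL = {"eI": "I", "aI": "I", "OI": "I", "@U": "U", "aU": "U",
                  "I@": "@", "e@": "@", "U@": "@"}
MONOPHTHONGS = ["i:", "A:", "O:", "u:", "3:", "I", "e", "E", "{", "Q", "U", "V", "@"]


def sustain_symbol(syllable: str) -> str:
    """Return the vowel symbol that continues the given syllable."""
    for diphthong, tail in DIPHTHONG_TAIL.items():
        if diphthong in syllable:
            return tail
    for vowel in MONOPHTHONGS:
        if vowel in syllable:
            return vowel
    raise ValueError(f"no vowel found in {syllable!r}")


def assign(note_inputs, syllables):
    """Give each note either the next phonetic syllable or, for a note whose
    input is only a hyphen, the symbol that sustains the previous vowel."""
    result, next_syllable = [], 0
    for text in note_inputs:
        if text.replace("\u2212", "-") == "-":
            if not result:
                raise ValueError("an extension cannot be on the first note")
            result.append(sustain_symbol(result[-1]))
        else:
            if next_syllable >= len(syllables):
                raise ValueError("more notes than syllables")
            result.append(syllables[next_syllable])
            next_syllable += 1
    return result


print(assign(["a-", "ma-", "-", "zing"], ["@", "meI", "ZIN"]))
# ['@', 'meI', 'I', 'ZIN']
```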
Through the processing as described above, singing score data as shown in FIG. 9 is obtained.
Thus, according to the present embodiment, when a user inputs a word across a plurality of note bars, each syllable constituting the word is automatically assigned to the note data corresponding to each note bar, and singing score data is generated. The user can therefore edit singing score data easily.
[0041]
In this embodiment, an example using English alphabet letters as the written characters has been described. However, the written characters may also be Chinese characters or a symbol string, as long as they can be converted into phonetic symbols using the dictionary data.
[0042]
【Effects of the Invention】
As described above, with the data input interface for a singing synthesizer according to the present invention, the correct phonetic symbols are set even if the user does not know the correct syllable breaks and divides the lyrics incorrectly, as long as the word is input across the correct number of notes (one per vowel).
[Brief description of the drawings]
FIG. 1 is a diagram showing the configuration of a computer that implements a data input interface for song synthesis according to the present invention.
FIG. 2 is a functional block diagram of a song synthesis data input interface according to the present invention.
FIG. 3 is a diagram illustrating a configuration of singing score data.
FIG. 4 is a diagram illustrating a configuration of dictionary data.
FIG. 5 is a diagram illustrating a display example of a piano roll.
FIG. 6 is a diagram showing a specific example of inputting lyrics to the piano roll.
FIG. 7 is a diagram showing a configuration of singing score data.
FIG. 8 is a diagram showing a specific example of inputting lyrics to the piano roll.
FIG. 9 is a diagram showing a configuration of singing score data.
[Explanation of symbols]
DESCRIPTION OF SYMBOLS 1 ... Computer system, 11 ... CPU, 12 ... ROM, 13 ... RAM, 14 ... Operation unit, 15 ... HDD, 16 ... Display, 17 ... Data input/output unit, 18 ... Sound source unit, 19 ... Speaker, 21 ... Operation means, 22 ... External data input means, 23 ... Data editing means, 24 ... Display means, 25 ... Storage means.

Claims

A storage process of storing, in storage means, singing score data composed of a plurality of note data corresponding to a plurality of notes constituting a piece of music;
A word acquisition process of acquiring, via an input device, input data that is input in association with a plurality of consecutive note data stored in the storage means and that includes a plurality of written characters constituting one word and a written-character connection symbol connecting two consecutive written characters, removing the written-character connection symbol from the input data to obtain the word, and acquiring the number of note data to which the user has assigned the word using the written-character connection symbol;
A word information acquisition process for searching the dictionary for the word acquired in the word acquisition process, and acquiring a phonetic symbol indicating a pronunciation mode of the word and division information indicating the number of syllables constituting the word;
A singing score data update process of comparing the number represented by the division information acquired in the word information acquisition process with the number of note data associated with the word acquired in the word acquisition process; when the two match, dividing the phonetic symbols obtained from the one word into syllables each containing one vowel and assigning each syllable to the plurality of consecutive note data, among the note data in the singing score data, that are associated with the input data from which the word was acquired, thereby updating the singing score data stored in the storage means; and, when the two do not match, cancelling the assignment process;
A data input program for singing synthesis, characterized by causing a computer to execute the above processes.

In the data input program for singing synthesis according to claim 1,
when the input data acquired in the word acquisition process contains a vowel extension symbol indicating that a vowel is to be sustained, the singing score data is updated so as to include information instructing that the vowel of the syllable immediately preceding the vowel extension symbol, in the phonetic symbols acquired in the word information acquisition process, be sustained.

Storage means for storing singing score data composed of a plurality of note data corresponding to a plurality of notes constituting a piece of music;
Word acquisition means for acquiring, via an input device, input data that is input in association with a plurality of consecutive note data stored in the storage means and that includes a plurality of written characters constituting one word and a written-character connection symbol connecting two consecutive written characters, removing the written-character connection symbol from the input data to obtain the word, and acquiring the number of note data to which the user has assigned the word using the written-character connection symbol;
Word information acquisition means for searching the dictionary for words acquired by the word acquisition means, and acquiring phonetic symbols indicating the pronunciation mode of the words and division information indicating the number of syllables constituting the word;
Singing score data update means for comparing the number represented by the division information acquired by the word acquisition means with the number of note data associated with the word acquired in the word acquisition process; when the two match, dividing the phonetic symbols obtained from the one word by the word acquisition means into syllables each containing one vowel and assigning each syllable to the plurality of consecutive note data, among the note data in the singing score data, that are associated with the input data from which the word was acquired, thereby updating the singing score data stored in the storage means; and, when the two do not match, cancelling the assignment process;
A data input device for singing synthesis, characterized by comprising the above means.