JP2009244789A - Karaoke system with guide vocal creation function

Info

Publication number: JP2009244789A
Application number: JP2008094094A
Authority: JP
Inventors: Tomoaki Nakamura (中村 友昭)
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2008-03-31
Filing date: 2008-03-31
Publication date: 2009-10-22

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke system with a guide vocal creation function that, without requiring special and complicated operations, effectively shows a singer not only the singing timing and pitch but also what the singing would sound like if the singer sang according to the model, and that plays a guide vocal without unnaturalness.

SOLUTION: User-specific phoneme data 56, extracted from voice data collected while any user sings any song, is stored together with score data 57 created from the note data and lyrics data of each song. For a selected song, a guide vocal is created and output by combining the score data 57 of that song with the phoneme data, from the user-specific phoneme data 56, that corresponds to the user ID attached to the song.

COPYRIGHT: (C)2010, JPO&INPIT

Description

The present invention relates to a karaoke system with a guide vocal generation function that produces a guide vocal by singing voice synthesis based on a singer's voice data and outputs it.

In recent years, karaoke systems have become increasingly sophisticated, and many now output a guide vocal (a so-called "model vocal") during a performance to assist the singer. Conventionally, the singing voice of a model singer who is skilled at the song is recorded in advance and can be played back while the song is performed, which guides the user toward ideal singing (singing that earns a high score from the scoring function).

Various techniques related to the guide vocal function of such karaoke systems have been proposed. For example, Patent Document 1 discloses a karaoke apparatus that automatically determines whether the singer is keeping up with the performance by detecting whether a singing voice is being input from the microphone, and plays the guide vocal when there is no input. Patent Document 2 discloses a karaoke apparatus that measures the input volume from the microphone and, when it is low relative to the volume of the accompaniment, plays the guide vocal at a volume that makes up the shortfall. In this way, the guide vocal is used as a tool for guiding a user who is singing toward ideal singing.

Patent Document 1: JP-A-3-29598
Patent Document 2: JP-A-4-298793

However, voice quality varies enormously from person to person, and a singer's voice may differ markedly from that of the model singer of the guide vocal. In that case it is very hard for the singer to grasp the "feel" of how the song would sound if he or she sang it well. That is, with the conventional techniques the singer can recognize the singing timing and pitch well enough, but cannot tell what the singing would sound like if the singer personally sang according to the model. Furthermore, when other users (an audience) are present in the same karaoke room, a guide vocal of a different voice quality played partway through the singing sounds very unnatural to them. If the guide vocal were played in the singer's own voice, it would of course sound natural by itself, and even where it partially overlapped the singer's live voice it would be heard as "unison", so there would be no unnaturalness. Within the prior art, this could be addressed by having the singer sing and record the guide vocal in advance. But the guide vocal is used precisely because the singer cannot yet sing the song well, so this would defeat the purpose and is not feasible.

Meanwhile, singing voice synthesis technology has recently advanced, and high-performance singing voice synthesizers can now sample an individual's utterances and reproduce singing that the person never actually sang. Specifically, for example, Japanese Unexamined Patent Application Publication No. 2007-240564 discloses the following: speech segment data representing various speech segments is sampled and stored in a speech segment database; from song data containing note data and lyrics data input by a user, a singing voice synthesis score is generated in which information specifying the speech segments to be used for synthesis, their sounding timing, and their pitch is arranged in time series along the progress of the song; the speech segment data corresponding to the segments specified by the score is read from the database; and the singing voice is synthesized by applying predetermined pitch conversion and segment concatenation.

The present invention therefore makes suitable use of this singing voice synthesis technology. Its object is to provide a karaoke system with a guide vocal generation function that, without requiring special and troublesome operations by the user such as sampling speech segment (phoneme) data, can effectively let the singer know not only the singing timing and pitch but also what the singing would sound like if the singer personally sang according to the model. A further object is to play a guide vocal with little unnaturalness: even when other users are present in the same karaoke room, and even when the guide vocal partially overlaps the singer's own voice, it is heard as "unison".

To solve the above problem, the invention of claim 1 is a karaoke system with a guide vocal generation function that, when a song selected with a user ID attached is performed, produces a guide vocal by singing voice synthesis and outputs it. The system comprises user identification means, user-specific voice data collection means, user-specific phoneme data extraction means, score data creation means, and guide vocal generation means. The user identification means acquires a user ID from a user requesting login and identifies that user. The user-specific voice data collection means collects voice data while any user sings any song. The user-specific phoneme data extraction means extracts from the collected voice data, for each user ID, at least the user's own phoneme data, which are the basic units of uttered sound, and stores them in a user-specific phoneme database. The score data creation means creates score data for singing voice synthesis based on the note data and lyrics data of a given song. The guide vocal generation means, for a song selected with a user ID attached, generates a guide vocal by singing voice synthesis that combines the phoneme data corresponding to that user ID with the score data created for that song, and makes the guide vocal available for output.

In the invention of claim 2, the score data creation means creates the score data based on the song's note data and lyrics data when a given song is performed, and the guide vocal generation means generates the guide vocal by singing voice synthesis that combines the phoneme data corresponding to the user ID with the score data thus created, and makes it available for output.

According to the present invention, voice data is collected while any user sings any song. At least the user's own phoneme data, the basic units of uttered sound, are extracted from that voice data and stored by user ID in the user-specific phoneme database. Score data for singing voice synthesis is created based on the note data and lyrics data of a given song. For a song selected with a user ID attached, a guide vocal is generated by singing voice synthesis that combines the phoneme data corresponding to that user ID with the song's score data, and is made available for output. As a result, without requiring any special and troublesome operation by the user, the singer can be effectively shown not only the singing timing and pitch but also what the singing would sound like if the singer personally sang according to the model, and a guide vocal with little unnaturalness can be played.

In addition, by creating the score data for a song when that song is performed and generating the guide vocal by combining the phoneme data corresponding to the user ID with that score data, settings made for the singing, such as key and tempo, can be reflected in the guide vocal.
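As an illustration of why creating the score at performance time lets key and tempo settings carry over, the following minimal sketch applies a key shift and a tempo ratio to a list of notes before synthesis. The note layout (onset, duration, MIDI-style note number), the 440 Hz reference, and the function names are assumptions made for this example; the specification does not define a score format.

```python
from typing import List, Tuple

# (onset in seconds, duration in seconds, MIDI-style note number): assumed layout
Note = Tuple[float, float, int]


def midi_to_hz(pitch: int) -> float:
    """Equal-temperament frequency, taking note 69 (A4) as 440 Hz."""
    return 440.0 * 2.0 ** ((pitch - 69) / 12)


def apply_settings(notes: List[Note], key_shift: int, tempo_ratio: float) -> List[Note]:
    """Transpose by key_shift semitones and play tempo_ratio times faster.

    A tempo_ratio of 1.0 keeps the original tempo; 2.0 halves every onset
    and duration.
    """
    if tempo_ratio <= 0:
        raise ValueError("tempo_ratio must be positive")
    return [
        (onset / tempo_ratio, duration / tempo_ratio, pitch + key_shift)
        for onset, duration, pitch in notes
    ]
```

Because the adjusted notes are what the synthesizer would consume, a guide vocal built from them follows the same key and tempo as the accompaniment the user has chosen.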

The best mode for carrying out the present invention is described below with reference to the drawings.
FIG. 1 shows the system configuration of a karaoke system according to the present invention. FIG. 1(A) is a schematic network diagram for the case where the system is built on a communication network, FIG. 1(B) is a schematic block diagram of the host device, and FIG. 1(C) illustrates the contents registered in the user-specific phoneme DB.

In FIG. 1(A), the host device 11 forms part of the karaoke system and manages a predetermined number of karaoke performance terminals 13 (13A to 13N) via a communication network 12, to which it is connected so that data can be exchanged in both directions. Examples of the communication network 12 include the public switched telephone network, ADSL running over it, optical communication lines, the Internet, and LANs. A VPN built over the Internet is preferred.

As shown in FIG. 1(B), the host device 11 includes at least transmission/reception means 21, control means 22, user-specific phoneme data extraction means 23, and a user-specific phoneme database (DB) 24. The transmission/reception means 21 communicates (exchanges data) with each karaoke performance terminal 13 (13A to 13N) and consists of, for example, physical communication circuitry and software such as a platform that provide compatibility with the communication scheme of the communication network 12.

The control means 22 controls the host device 11 as a whole. It is, for example, a physical CPU that executes the algorithms of programs stored in a ROM (not shown). When voice data is sent from a karaoke performance terminal 13 together with a user ID, the user-specific phoneme data extraction means 23 extracts from the voice data at least the user's own phoneme data, the basic units of uttered sound, and stores them in the user-specific phoneme DB 24 in association with the user ID. Here, in addition to the phoneme data, singing style data that encodes the singer's singing technique is also extracted from the voice data and stored in association with the user ID.

That is, as shown in FIG. 1(C), the user-specific phoneme DB 24 stores, for each user ID, the user's own phoneme data, which are the basic units of uttered sound, and the user's singing style data. At a minimum, the phoneme data is stored. Phoneme data is obtained by dividing all uttered sounds into five types of speech segment (vowels, leading consonants, trailing consonants, consonant-to-vowel transitions, and vowel-to-consonant transitions) and converting them into data. Singing style data encodes, as frequency information and the like, singing techniques that give a song its character, such as vibrato and intonation. Adding it to the inputs for guide vocal generation allows the singing technique to be reflected in the guide vocal. Corresponding to the singing style data, the song DB (45) described later holds song style data (63), which encodes a model singing style, and this serves as one element in creating the score data for singing voice synthesis. This embodiment is described for the phoneme data only, but the singing style data may be included.
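The layout of the user-specific phoneme DB described above can be sketched as a small in-memory structure. This is only an illustrative model: the class names, the use of text labels as segment keys, and the representation of waveforms as lists of floats are all assumptions, not part of the specification.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List, Optional


class SegmentKind(Enum):
    """The five speech segment types named in the description."""
    VOWEL = "vowel"
    LEADING_CONSONANT = "leading_consonant"
    TRAILING_CONSONANT = "trailing_consonant"
    CONSONANT_TO_VOWEL = "consonant_to_vowel"
    VOWEL_TO_CONSONANT = "vowel_to_consonant"


@dataclass
class Segment:
    kind: SegmentKind
    label: str            # hypothetical key such as "a" or "k-a"
    samples: List[float]  # placeholder for the stored waveform


@dataclass
class UserRecord:
    segments: Dict[str, Segment] = field(default_factory=dict)
    singing_style: Optional[dict] = None  # optional vibrato/intonation data


class UserPhonemeDB:
    """Phoneme data keyed by user ID, as in FIG. 1(C)."""

    def __init__(self) -> None:
        self._records: Dict[str, UserRecord] = {}

    def store(self, user_id: str, segment: Segment) -> None:
        record = self._records.setdefault(user_id, UserRecord())
        record.segments[segment.label] = segment

    def lookup(self, user_id: str) -> Optional[UserRecord]:
        return self._records.get(user_id)
```

Keeping the singing style data optional mirrors the statement that only the phoneme data is mandatory.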

FIG. 2 is a block diagram of a karaoke performance terminal in the karaoke system of the present invention. In FIG. 2, the karaoke performance terminal 13 includes a display unit 32, a mixing amplifier 33, a microphone 34, and a speaker 35, which are externally connected by wire or wirelessly to a karaoke performance device 31 serving as the main unit. A remote input/output terminal 36 is also connected by wire or wirelessly. The remote input/output terminal 36 includes song search means 37, user identification means 38, and a terminal display unit 39.

The display unit 32 displays the usual song selection screens, video during karaoke performance, and lyric captions. A liquid crystal display (LCD), plasma display panel (PDP), or any of various other displays may be used. The mixing amplifier 33 mixes the performance audio signal sent from the karaoke performance device 31, the audio signal from the microphone 34, and the guide vocal sent from the guide vocal generation means (53) described later, amplifies the result, and outputs it from the speaker 35.
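The three-way mix performed by the mixing amplifier can be pictured with a short digital sketch. The real unit is an analog or hardware stage, so the sample-wise sum, the single `gain` parameter, and the clipping range of -1.0 to 1.0 are assumptions chosen only to illustrate the signal flow.

```python
from itertools import zip_longest
from typing import List, Sequence


def mix(
    accompaniment: Sequence[float],
    microphone: Sequence[float],
    guide_vocal: Sequence[float],
    gain: float = 1.0,
) -> List[float]:
    """Sum the three signals sample by sample, amplify, and clip to [-1, 1].

    Shorter inputs are padded with silence so that, for example, a guide
    vocal that ends early does not truncate the accompaniment.
    """
    mixed = []
    for a, m, g in zip_longest(accompaniment, microphone, guide_vocal, fillvalue=0.0):
        sample = gain * (a + m + g)
        mixed.append(max(-1.0, min(1.0, sample)))
    return mixed
```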

The remote input/output terminal 36 exchanges data with the karaoke performance device 31 through a terminal transmission/reception unit (not shown), using a wired or wireless scheme (such as infrared (IR) or a Bluetooth (registered trademark) piconet connection), and includes the song search means 37, the user identification means 38, and the terminal display unit 39 as appropriate.

The song search means 37 is a program that lets the user search for and select a song through the user interface function described later. The user identification means 38 is a program that acquires a user ID from a user requesting login and identifies that user. The user is identified by, for example, reading the user ID from an IC card carried by the user, entry of a user ID and password, or biometric authentication such as voiceprint or fingerprint. The user identification means 38 may instead be provided in the karaoke performance device 31.

The terminal display unit 39 consists of a liquid crystal display (LCD) overlaid with a touch sensor for input and output, and provides a GUI user interface function through which data such as a song selection can be entered via the touch sensor in response to displayed icons and the like.

The karaoke performance device 31 includes, as appropriate, a bus 41, a central control unit 42, a ROM 43, a RAM 44, a song DB 45, a video DB 46, a video playback control unit 47, a music performance control unit 48, an A/D conversion unit 49, user-specific voice data collection means 50, user-specific phoneme data acquisition means 51, score data creation means 52, guide vocal generation means 53, and transmission/reception units 54A and 54B. Storage areas for user-specific voice data 55, user-specific phoneme data 56, per-song score data 57, and a reservation queue 58 are formed in the RAM 44. The entire device is described here, including elements not directly related to the gist of the invention, to show that most of a conventional karaoke apparatus can be used as it is.

The central control unit 42 is a physical CPU that performs overall processing control of the system by executing the algorithms of the programs stored in the ROM 43. Besides providing the storage areas for the user-specific voice data 55, user-specific phoneme data 56, per-song score data 57, and reservation queue 58, the RAM 44 serves as a work area in which the various programs are loaded and executed. It is composed of, for example, semiconductor memory, and the concept also covers the case where it is built virtually on a hard disk.

The song DB 45 stores note data and lyrics data for each song and, where appropriate, song style data that encodes the singing techniques for that song (vibrato, intonation, and so on) as frequency and similar information (see FIG. 4). Specifically, it has a song table that associates a song ID, a song title, and an artist ID (artist name). For each song it stores a song data file, managed by song ID, in which the karaoke note data in a predetermined format (for example, MIDI (registered trademark) note data) and the lyrics data (lyric caption data) are synchronized, with the song code as the file name. Each song is also associated with allocation data that assigns a predetermined number of scene videos, stored in the video DB 46, for displaying the background video of that song.

The video DB 46 is a database that stores a predetermined number of scene video data items for background video display. The song DB 45 and the video DB 46 may be provided in the host device 11 instead of the karaoke performance device 31.

The video playback control unit 47 is an electronic circuit that, during a performance, outputs to the display unit 32 the predetermined number of scene video data items extracted from the video DB 46 and the lyric caption data (lyric text data) extracted from the song DB 45 by song code, synchronized with the song's note data. The music performance control unit 48 is an electronic circuit that digitally plays back the note data extracted from the song DB 45 by song code, converts it to analog, and outputs it to the mixing amplifier 33.

The A/D conversion unit 49 is an electronic circuit that converts the singing voice input from the microphone 34 during singing into digital form and sends it to the user-specific voice data collection means 50. The user-specific voice data collection means 50 is a program that collects voice data while any user sings any song, attaches the user ID, and stores it as user-specific voice data 55 in the RAM 44. The stored user-specific voice data 55 is sent to the host device 11 at a predetermined time (for example, after logout) by, for example, the central control unit 42. In the host device 11, as described above, the user-specific phoneme data extraction means 23 extracts at least the phoneme data, and singing style data as appropriate, from the voice data and stores them in the user-specific phoneme DB 24 in association with the user ID.

The user-specific phoneme data acquisition means 51 is a program that, based on the user ID acquired by the user identification means 38, requests at least that user's phoneme data from the host device 11 and stores the phoneme data returned by the host device 11 as user-specific phoneme data 56 in the RAM 44 in association with the user ID.

The score data creation means 52 is a program that creates score data for singing voice synthesis based on the note data and lyrics data of a given song and stores it in the per-song score data 57 in the RAM 44. The score data can be created at various times. In this embodiment it is created when a song is registered in the reservation queue 58. Other timings are described later.
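A minimal sketch of score data creation is given below, assuming the simplest possible case in which each note carries exactly one lyric syllable. The `ScoreEvent` fields and the `build_score` name are hypothetical; a real score would also have to handle rests, melisma, and the mapping from syllables to individual speech segments.

```python
from typing import List, NamedTuple, Sequence, Tuple


class ScoreEvent(NamedTuple):
    onset: float     # seconds from the start of the song
    duration: float  # seconds
    pitch: int       # MIDI-style note number (assumption)
    syllable: str    # lyric syllable sung on this note


def build_score(
    notes: Sequence[Tuple[float, float, int]],
    syllables: Sequence[str],
) -> List[ScoreEvent]:
    """Pair each note with its syllable and order the result by onset time."""
    if len(notes) != len(syllables):
        raise ValueError("this sketch assumes exactly one syllable per note")
    events = [
        ScoreEvent(onset, duration, pitch, syllable)
        for (onset, duration, pitch), syllable in zip(notes, syllables)
    ]
    return sorted(events, key=lambda e: e.onset)
```

For example, `build_score([(0.0, 0.5, 60), (0.5, 0.5, 62)], ["sa", "ku"])` yields two events that a synthesizer could consume in time order.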

The guide vocal generation means 53 is a program that, for a song selected with a user ID attached, generates a guide vocal by singing voice synthesis that combines the phoneme data corresponding to that user ID with the score data created for the song, and makes it available for output. As one example, for a song registered in the reservation queue 58, it obtains the song's score data from the per-song score data 57, obtains the corresponding phoneme data from the user-specific phoneme data 56 based on the user ID attached to the selected song, and synthesizes the singing voice from the obtained score data and phoneme data. For guide vocal generation, the pitch conversion, segment concatenation, and other techniques described in the aforementioned Japanese Unexamined Patent Application Publication No. 2007-240564 can be applied. "Available for output" means that the output timing can be chosen as appropriate, for example by user operation.

The transmission/reception unit 54A consists of the electronic circuitry and programs for exchanging data with the remote input/output terminal 36 using a wired or wireless scheme (such as IR or a Bluetooth (registered trademark) piconet connection). The transmission/reception unit 54B exchanges data with the host device 11 via the communication network 12 and consists of, for example, physical communication circuitry and software such as a platform that provide compatibility with the communication scheme. The remote input/output terminal 36 can communicate with the host device 11 via the transmission/reception units 54A and 54B.

The reservation queue 58 formed in the RAM 44 is a data storage area in which, when a song ID selected on the remote input/output terminal 36 arrives via the transmission/reception unit 54A, the central control unit 42 stores the received song ID, with its attached user ID, in reservation order.
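The reservation queue behaves as a first-in, first-out list of reservations. The sketch below models it with a `deque`; the class and field names are assumptions, and the `guide_flag` field anticipates the guide vocal flag that the selection flow attaches to each entry.

```python
from collections import deque
from dataclasses import dataclass
from typing import Deque, Optional


@dataclass(frozen=True)
class Reservation:
    song_id: str
    user_id: str
    guide_flag: bool = False  # whether a guide vocal was requested


class ReservationQueue:
    """Holds reservations in the order in which they were made."""

    def __init__(self) -> None:
        self._entries: Deque[Reservation] = deque()

    def register(self, song_id: str, user_id: str, guide_flag: bool = False) -> Reservation:
        entry = Reservation(song_id, user_id, guide_flag)
        self._entries.append(entry)
        return entry

    def next_song(self) -> Optional[Reservation]:
        """Remove and return the oldest reservation, or None if empty."""
        return self._entries.popleft() if self._entries else None

    def __len__(self) -> int:
        return len(self._entries)
```

Because each entry carries the user ID, the singer of the song currently being performed can be identified from the queue alone.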

Although not shown, the karaoke performance device 31 is also connected to an operation panel whose buttons and knobs are wired directly to electronic components such as variable resistors, and with which the user can enter a song number directly or make various adjustments to the tempo of the song being performed and to the singing voice. When the karaoke performance terminal 13 is to be used as a stand-alone karaoke system, this can be achieved by providing it with the user-specific phoneme data extraction means 23 and the user-specific phoneme DB 24 of the host device 11.

FIG. 3 shows flowcharts of the processing in the karaoke performance device of FIG. 2 for acquiring singing voice data and for song selection, and FIG. 4 illustrates the song selection of FIG. 3. FIG. 3(A) is the flowchart for sampling phoneme data, and FIG. 3(B) is the flowchart for score data creation and related processing.

In FIG. 3(A), the user-specific voice data collection means 50 first identifies, from the reservation queue 58, the user ID of the singer of the song being performed, collects the singing voice data from the microphone 34 via the A/D conversion unit 49, and temporarily stores it in the RAM 44 as user-specific voice data 55 associated with the user ID (step S1). The stored user-specific voice data 55 is sent to the host device 11, in association with the user ID, at a predetermined time (for example, after logout) by, for example, the central control unit 42 (S2). In the host device 11, the user-specific phoneme data extraction means 23 extracts phoneme data from each user's voice data and stores it in the user-specific phoneme DB 24 in association with the user ID (S3).
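Steps S1 to S3 can be sketched as two cooperating objects: a terminal that buffers voice data per user and a host that extracts and stores phoneme data. The class names are hypothetical, and the extraction algorithm itself is passed in as a function because the specification does not define it.

```python
from collections import defaultdict
from typing import Callable, Dict, List

Samples = List[float]
Extractor = Callable[[Samples], Dict[str, Samples]]


class HostDevice:
    """Stands in for host device 11 with its user-specific phoneme DB 24."""

    def __init__(self, extract: Extractor) -> None:
        self._extract = extract  # placeholder for the extraction means 23
        self.phoneme_db: Dict[str, Dict[str, Samples]] = defaultdict(dict)

    def receive(self, user_id: str, voice_data: Samples) -> None:
        # S3: extract phoneme data and store it under the user ID.
        self.phoneme_db[user_id].update(self._extract(voice_data))


class PerformanceTerminal:
    """Stands in for karaoke performance device 31 and its RAM 44."""

    def __init__(self, host: HostDevice) -> None:
        self._host = host
        self._voice_buffer: Dict[str, Samples] = defaultdict(list)

    def collect(self, user_id: str, samples: Samples) -> None:
        # S1: buffer the digitized singing voice under the singer's user ID.
        self._voice_buffer[user_id].extend(samples)

    def logout(self, user_id: str) -> None:
        # S2: send the buffered voice data to the host, here at logout.
        voice = self._voice_buffer.pop(user_id, [])
        if voice:
            self._host.receive(user_id, voice)
```

Nothing reaches the host until `logout` is called, which matches the example timing given for the transfer.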

Because the singing voice is captured while a user sings, and the user-specific phoneme data extraction means 23 extracts phoneme data from that voice data and stores it in the user-specific phoneme DB 24, phoneme data can be acquired easily without the singer being aware of it.

Next, score data creation at song selection is described with reference to FIG. 3(B). First, when a user whose user ID has been acquired on the remote input/output terminal 36 requests a song search, for example by selecting "Song search" from a menu display (not shown) (S11), the song search means 37 is executed and displays a search screen. As the result of the user's search, a song selection screen such as that shown in FIG. 4(A) is displayed (S12).

When the user chooses, as desired, whether to output the guide vocal in the guide vocal setting field 40 and presses the "Transfer" button (S13), the title (song ID) of the selected song and the guide vocal flag information (guide flag present or absent) are registered, with the user ID attached, in the reservation queue 58 in the RAM 44 (S14). If the guide vocal was not selected in the guide vocal setting field 40 and the guide flag is "absent", the process ends (S15).

On the other hand, when the guide vocal is "present" (S15), the score data creation means 52, as shown in FIG. 4(B), obtains the note data 61 and lyrics data 62 of the selected song from the music DB 45, creates score data, attaches the music ID, and stores it in the music-specific score data 57 in the RAM 44 (S16). Alternatively, the guide vocal selection function shown on the screen of FIG. 4(A) may be omitted, and score data may be created in advance for all selectable songs.

Then, as shown in FIG. 4(C), the guide vocal generation means 53 identifies, from the reservation queue 58, the music ID of the song to be performed and the user ID of the user who selected it, obtains the score data from the music-specific score data 57 based on the music ID, and obtains the selecting user's phoneme data stored in the user-specific phoneme data 56 based on the user ID (S17).
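Steps S14 to S16 can be sketched as follows. This is an illustrative sketch only: the `Reservation` class, the dictionary layout of the music DB, and the one-syllable-per-note pairing are assumptions made for the example, since the description does not specify the format of the note data 61, lyrics data 62, or score data.

```python
from dataclasses import dataclass


@dataclass
class Reservation:
    """One entry in the reservation queue 58 (field names are hypothetical)."""
    music_id: str
    user_id: str
    guide_flag: bool


def create_score(notes, lyrics):
    """S16: pair each note (pitch, duration) with one lyric syllable."""
    if len(notes) != len(lyrics):
        raise ValueError("this sketch assumes exactly one syllable per note")
    return [{"syllable": s, "pitch": p, "duration": d}
            for s, (p, d) in zip(lyrics, notes)]


def reserve(queue, score_store, music_db, music_id, user_id, guide_flag):
    """S14-S16: register the song; build score data only if the flag is set."""
    queue.append(Reservation(music_id, user_id, guide_flag))
    if guide_flag:
        notes, lyrics = music_db[music_id]
        score_store[music_id] = create_score(notes, lyrics)
```

With `music_db = {"m1": ([(60, 0.5), (62, 0.5)], ["sa", "ku"])}`, a call such as `reserve(queue, store, music_db, "m1", "u1", True)` adds one queue entry and stores a two-note score under `"m1"`; with the flag set to `False` only the queue entry is added.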

FIG. 5 shows a processing flowchart of guide vocal generation in the karaoke performance device of FIG. 2, and FIG. 6 is an explanatory diagram of the guide vocal generation in FIG. 5. Here, guide vocal generation is described as being processed, when the performance starts, based on a performance synchronization signal obtained from, for example, the music performance control unit 48.

In FIGS. 5 and 6, among the songs registered in the reservation queue 58 and performed in order, when performance of a song with a guide flag attached starts (S21), the guide vocal generation means 53, as shown in FIG. 6, creates guide vocal data by singing-voice synthesis, applying the phoneme data of the user ID to the obtained score data of the song in accordance with the performance synchronization signal from the music performance control unit 48, and outputs it via the mixing amplifier 33 (S22).
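The combination of score data and phoneme data in S22 can be pictured with a small Python sketch. It only schedules which of the user's phoneme samples should sound at which time and pitch; actual singing-voice synthesis (pitch shifting, time stretching, concatenation) is not shown. The score layout and the function name are assumptions for illustration, not the format used by the system.

```python
def schedule_guide_vocal(score, phonemes):
    """Return (start_time, pitch, duration, samples) for each score note.

    score:    list of {"syllable", "pitch", "duration"} dicts (assumed layout)
    phonemes: dict mapping a syllable to the user's recorded samples
    A note whose syllable the user never sang gets samples=None.
    """
    events = []
    start = 0.0
    for note in score:
        samples = phonemes.get(note["syllable"])
        events.append((start, note["pitch"], note["duration"], samples))
        start += note["duration"]   # next note begins when this one ends
    return events
```

In a real device the start times would follow the performance synchronization signal rather than a running sum of durations; the running sum simply stands in for that clock here.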

In this way, without requiring any special, troublesome operation such as the user sampling speech-segment (phoneme) data, the system can effectively show the singer not only the singing timing and pitch but also what the singing would sound like if the singer sang exactly according to the model. Moreover, even when other users besides the singer are present at the same karaoke session, and even if the guide vocal and the singer's own voice partly overlap, the result sounds like "unison", so the guide vocal can be played back with little unnaturalness. Furthermore, because the guide vocal is generated at performance time by singing-voice synthesis of the phoneme data corresponding to the user ID attached to the song onto the score data of the song to be performed, settings made at song selection, such as the pitch (key) of the song, can be reflected in the guide vocal.

Incidentally, if the user who selected the song does not sing, only the accompaniment and the guide vocal are output, and the user can listen to an ideal rendition in their own voice as a model. Also, when several people use the system together, the guide vocal can be output while the selecting user pretends to sing, making it appear to others as though that user were personally delivering the exemplary performance.

Note that the score data creation means 52 may create score data for all songs registered in the reservation queue 58 and store it in the music-specific score data 57, and, when no setting was made in the "guide vocal" setting field 40 at song selection (guide flag "absent"), a selection button for outputting the guide vocal may be displayed on at least one of the display unit 32 and the terminal display unit 39 of the remote input/output terminal 36 while the song is being sung. The same applies when the "guide vocal" setting field 40 is not provided on the song selection screen of FIG. 4(A).

Next, FIG. 7 shows another explanatory diagram of the karaoke performance terminal in the karaoke system of the present invention. In this embodiment, the karaoke performance device 31 is provided with singing monitoring means 71, which obtains the singing voice from the A/D conversion unit 49. The singing monitoring means 71 is a program that outputs a signal to start or stop guide vocal synthesis processing to the guide vocal generation means 53 according to the singing state of the singer during the performance.

Here, the singing state refers to, for example, a case where the singer gets stuck and falls silent, a case where the singing voice becomes quiet and drops to or below a predetermined level, or, where a singing scoring process is additionally provided, a case where the score in a given singing section falls to or below a reference value. The case where the singing voice drops to or below a predetermined level is described here.

That is, in FIG. 7(B), when the performance starts (S31), the singing monitoring means 71 monitors whether the microphone input level of the singing voice is at or below a predetermined value (S32). When the microphone input level is at or below the predetermined value (S33), it determines whether the guide vocal is being output (S34), and if not, outputs a processing start signal to the guide vocal generation means 53 (S35). For the remainder of the performance, guide vocal data is then generated by singing-voice synthesis, applying the phoneme data of the user ID to the obtained score data in accordance with the performance synchronization signal from the music performance control unit 48, and is output via the mixing amplifier 33 (S36).

On the other hand, when the singer's microphone input level exceeds the predetermined value in S33, it is determined whether the guide vocal is being output (S37). If it is, the singing monitoring means 71 outputs a processing stop signal to the guide vocal generation means 53 (S38), and the guide vocal generation means 53 stops the guide vocal synthesis processing (S39). If the guide vocal is already being output in S34, or is not being output in S37, no processing is performed. Thereafter, S32 to S39 are repeated until the performance ends (S40).
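The monitoring loop of S32 to S40 amounts to a two-state switch driven by the microphone level. The following Python sketch shows one way to express it; the function names, the numeric levels, and the string signals "start" and "stop" are assumptions for illustration.

```python
def monitor_step(mic_level, threshold, guide_on):
    """One pass of S33-S39. Returns (new_guide_on, signal or None)."""
    if mic_level <= threshold:          # S33: level at or below the threshold
        if not guide_on:                # S34: guide vocal not yet playing
            return True, "start"        # S35-S36: start synthesis and output
    else:                               # level exceeds the threshold
        if guide_on:                    # S37: guide vocal currently playing
            return False, "stop"        # S38-S39: stop synthesis
    return guide_on, None               # otherwise nothing to do


def run_monitor(levels, threshold):
    """S32/S40: repeat the check for each sampled level until the song ends."""
    guide_on = False
    signals = []
    for level in levels:
        guide_on, signal = monitor_step(level, threshold, guide_on)
        signals.append(signal)
    return signals
```

For example, `run_monitor([0.8, 0.1, 0.05, 0.7, 0.9], 0.2)` returns `[None, "start", None, "stop", None]`: the guide vocal starts when the singer goes quiet and stops once the singer is audible again. A practical implementation would probably add some hold time around the threshold to avoid rapid toggling, which the description does not discuss.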

In this way, the guide vocal is generated according to the singer's singing state and made ready for output, and is output in response to a user operation or the like, so the singer can be suitably assisted.

In the above embodiment, the music-specific score data 57 is created when a song is registered in the reservation queue 58, and the guide vocal generation means 53 generates the guide vocal at performance time. The following methods of score creation and guide vocal generation are also possible.

First, the music-specific score data 57 is created in advance for all songs and managed, for example, as a database on the host device 11. Score data for all songs is obtained from the host device 11 when the user logs in, and the guide vocal generation means 53 either generates and stores the guide vocal when a song is registered in the reservation queue 58 and outputs it at performance time, or generates and outputs the guide vocal when the performance starts.

Second, the music-specific score data 57 is created in advance for all songs and managed, for example, as a database on the host device 11. When a song is registered in the reservation queue 58, the score data for that song is obtained from the host device 11, and the guide vocal generation means 53 either generates and stores the guide vocal at that time and outputs it at performance time, or generates and outputs the guide vocal when the performance starts.

Third, the music-specific score data 57 is created when the performance starts, by obtaining the music ID from the reservation queue 58, and the guide vocal generation means 53 generates the guide vocal based on the created score data.

Fourth, the music-specific score data 57 is created when a song is registered in the reservation queue 58, by obtaining the music ID from the reservation queue 58. The guide vocal generation means 53 generates and stores the guide vocal based on the created score data and outputs it when the performance starts.

The karaoke system with a guide vocal generation function of the present invention is applicable to the field of karaoke apparatus having basic karaoke functions.

FIG. 1 is a system configuration diagram of the karaoke system according to the present invention.
FIG. 2 is a block diagram of the karaoke performance terminal in the karaoke system of the present invention.
FIG. 3 is a processing flowchart of singing voice data acquisition and song selection in the karaoke performance device of FIG. 2.
FIG. 4 is an explanatory diagram of song selection in FIG. 3.
FIG. 5 is a processing flowchart of guide vocal generation in the karaoke performance device of FIG. 2.
FIG. 6 is an explanatory diagram of the guide vocal generation in FIG. 5.
FIG. 7 is another explanatory diagram of the karaoke performance terminal in the karaoke system of the present invention.

Explanation of symbols

11 Host device
13 Karaoke performance terminal
23 User-specific phoneme data extraction means
24 User-specific phoneme DB
31 Karaoke performance device
36 Remote input/output terminal
45 Music DB
50 User-specific voice data collection means
52 Score data creation means
53 Guide vocal generation means
55 User-specific voice data
56 User-specific phoneme data
57 Music-specific score data
58 Reservation queue
61 Note data
62 Lyrics data
63 Song singing-style data
71 Singing monitoring means

Claims

A karaoke system having a guide vocal generation function that synthesizes a guide vocal as a singing voice and outputs it when a song selected with a user ID attached is performed, the system comprising:
user identification means, user-specific voice data collection means, user-specific phoneme data extraction means, score data creation means, and guide vocal generation means, wherein
the user identification means obtains a user ID from a user making a login request and identifies that user,
the user-specific voice data collection means collects voice data when any user sings any song,
the user-specific phoneme data extraction means extracts, for each user ID, from the collected voice data at least the user's own phoneme data, which is the basic unit of vocalized sound, and stores it in a user-specific phoneme database,
the score data creation means creates score data for singing-voice synthesis based on note data and lyrics data of a given song, and
the guide vocal generation means, for a song selected with a user ID attached, generates a guide vocal by singing-voice synthesis of the phoneme data corresponding to that user ID onto the created score data of the song, and makes the guide vocal available for output.

The karaoke system having a guide vocal generation function according to claim 1, wherein
the score data creation means creates the score data based on the note data and lyrics data of a song when that song is performed, and
the guide vocal generation means generates a guide vocal by singing-voice synthesis of the phoneme data corresponding to the user ID onto the created score data of the song, and makes the guide vocal available for output.