JP6916580B1 - Karaoke equipment

Info

Publication number: JP6916580B1
Application number: JP2020032659A
Authority: JP
Inventors: 豪矢吹
Original assignee: Daiichikosho Co Ltd
Current assignee: Daiichikosho Co Ltd
Priority date: 2020-02-28
Filing date: 2020-02-28
Publication date: 2021-08-11
Anticipated expiration: 2040-02-28
Also published as: JP2021135425A

Abstract

PROBLEM TO BE SOLVED: To provide a karaoke device that can avoid executing the operation command corresponding to a command word even when the command word is erroneously detected during singing. SOLUTION: A karaoke device (11) includes: a pitch storage unit (24) that stores, in chronological order, a reference pitch extracted from reference data and a voice pitch extracted from a voice signal during a karaoke performance; a command detection unit (25) that detects a command word input by voice through the voice signal during the karaoke performance; a period specifying unit (26) that specifies the voice input period from the start to the end of the voice input of the command word; a false detection determination unit (27) that determines whether the command word was falsely detected by comparing the reference pitch and the voice pitch stored in the pitch storage unit for the voice input period; and a command execution unit (28) that prohibits execution of the operation command corresponding to the command word when a false detection is determined. [Selected drawing] FIG. 2

Description

The present invention relates to a karaoke device.

In recent years, karaoke devices have come to accept voice input of command words for performing various operations related to a karaoke performance. In such a device, if the lyrics of the song being performed contain the same phrase as a command word, the singing voice may trigger an operation command that the user did not want. A karaoke device has therefore been proposed that, at the timing when a command word is detected in the singing voice by speech recognition, prohibits execution of the operation command if the lyrics to be sung at that timing match the command word (see, for example, Patent Document 1).

Patent Document 1: Japanese Unexamined Patent Publication No. 2019-117282

Some lyrics, however, are easily recognized as a command word even though they are a different phrase. For example, when the user sings the lyric ストック ("stock"), it may be recognized as ストップ ("stop"), and the command word ストップ may halt the karaoke performance. In this case, with the technique disclosed in Patent Document 1, the lyric to be sung at the timing when the command word ストップ is detected is ストック, which does not match ストップ, so the karaoke performance is halted. In other words, the technique of Patent Document 1 has the problem that when a command word is erroneously detected, the operation command is executed against the user's intention.

An object of the present invention is to provide a karaoke device that can avoid executing the operation command corresponding to a command word even when the command word is erroneously detected during singing.

The main invention for achieving the above object is a karaoke device capable of executing operation commands related to a karaoke performance through voice input of command words, the karaoke device including: a pitch storage unit that stores, in chronological order, a reference pitch extracted from reference data and a voice pitch extracted from a voice signal during the karaoke performance; a command detection unit that detects a command word input by voice through the voice signal during the karaoke performance; a period specifying unit that specifies the voice input period from the start to the end of the voice input of the command word; a false detection determination unit that determines whether the command word was falsely detected by comparing the reference pitch and the voice pitch stored in the pitch storage unit for the voice input period; and a command execution unit that prohibits execution of the operation command corresponding to the command word when a false detection is determined.

According to the present invention, when a command word is input by voice through the voice signal during a karaoke performance, the reference pitch and the voice pitch over the voice input period of the command word are compared. From the deviation between the reference pitch and the voice pitch, the device distinguishes a command word input unintentionally through the singing voice from a command word input intentionally by the user, and thereby determines whether the command word was falsely detected. As a result, operation commands are not executed against the user's intention during the karaoke performance.

FIG. 1 is a configuration diagram of the karaoke system of the first embodiment.
FIG. 2 is a control block diagram of the karaoke device of the first embodiment.
FIG. 3 is a diagram showing an example of the command table of the first embodiment.
FIG. 4 is a diagram showing a storage example of the reference pitch and the voice pitch of the first embodiment.
FIG. 5 is a diagram showing an example of the command word false detection determination process of the first embodiment.
FIG. 6 is a flowchart showing the processing of the karaoke device of the first embodiment.
FIG. 7 is a control block diagram of the karaoke device of the second embodiment.
FIG. 8 is a diagram showing an example of the pitch range setting table of the second embodiment.

<First Embodiment>
The karaoke system 10 of the first embodiment will be described with reference to FIG. 1. FIG. 1 is a configuration diagram of the karaoke system 10 of the first embodiment.

As shown in FIG. 1, the karaoke system 10 is a system for enjoying karaoke singing along with the karaoke performance of a song reserved by the user, and includes a karaoke device 11, a monitor 12, a speaker 13, a microphone 14, and a remote control device 15. The monitor 12 displays lyric telops together with a background video in time with the karaoke performance, based on video signals and the like from the karaoke device 11. The speaker 13 emits the user's singing voice together with the performance sound of the song, based on the sound emission signal from the karaoke device 11. The microphone 14 converts the user's singing voice into a voice signal and inputs it to the karaoke device 11.

The remote control device 15 is built mainly around a touch panel. It displays various information for the user, such as search menus and search results, on the touch panel and accepts the user's input through the touch panel. The remote control device 15 and the karaoke device 11 are paired via wireless communication, and various information is exchanged between them. The remote control device 15 searches for songs based on the user's touch operations. When the transfer button displayed on the touch panel is touched, the song ID is transmitted to the karaoke device 11 as reserved song information.

The karaoke device 11 registers the reserved song information received from the remote control device 15 in the reservation management table of the storage unit 21 (see FIG. 2). The storage unit 21 stores, for each song ID, various data related to karaoke singing, such as song data, lyric telop data, and background video data. The karaoke device 11 reads reserved song information from the reservation management table in order of registration, and reads the song data, lyric telop data, and background video data corresponding to the song ID of that reserved song information from the storage unit 21. The song data includes at least reference data for scoring the melody part and accompaniment data.

When the karaoke device 11 starts a karaoke performance, the lyric telops and the background video are displayed on the monitor 12 based on the lyric telop data and the background video data, in synchronization with playback of the accompaniment data. In the karaoke device 11, the accompaniment sound signal of the karaoke performance and the voice signal input from the microphone 14 are mixed by a mixer at an appropriate ratio, and the mixed signal is amplified by an amplifier and emitted from the speaker 13. Thus, when the user sings along with the karaoke performance, the singing voice is emitted from the speaker 13 together with the performance sound. The singing voice is scored based on the reference data.

The control configuration of the karaoke device 11 will be described with reference to FIGS. 2 to 5. FIG. 2 is a control block diagram of the karaoke device 11 of the first embodiment. FIG. 3 is a diagram showing an example of the command table of the first embodiment. FIG. 4 is a diagram showing a storage example of the reference pitch and the voice pitch of the first embodiment. FIG. 5 is a diagram showing an example of the command word false detection determination process of the first embodiment. For convenience of explanation, the control block diagram of FIG. 2 shows only the blocks related to song performance processing and command execution processing.

As shown in FIG. 2, the karaoke device 11 is configured so that, in addition to karaoke performance processing, it can execute operation commands related to the karaoke performance through voice input of command words. The karaoke device 11 is provided with a storage unit 21, a performance unit 22, an extraction unit 23, a pitch storage unit 24, a command detection unit 25, a period specifying unit 26, a false detection determination unit 27, and a command execution unit 28. A predetermined storage area of the storage unit 21 stores a reservation management table in which reserved song information is arranged in order of registration. Another storage area of the storage unit 21 stores song data, lyric telop data, and video data for each song ID of the reservation management information.

As shown in FIG. 3, yet another storage area of the storage unit 21 stores a command table in which operation commands are associated with command words. For example, the operation command "stop performance" is associated with the command words チュウシ ("cancel"), ストップ ("stop"), オワリ ("end"), and シュウリョウ ("finish"). The operation command "key ↑" is associated with the command words キーアゲ ("raise key"), キーアップ ("key up"), and シャープ ("sharp"). The operation command "key ↓" is associated with the command words キー下げ ("lower key"), キーダウン ("key down"), and フラット ("flat"). Various command words are likewise associated with the other operation commands "accompaniment volume ↑", "accompaniment volume ↓", "microphone volume ↑", "microphone volume ↓", "echo ↑", and "echo ↓".
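As an illustration of the command table of FIG. 3, the following is a minimal sketch assuming a simple in-memory mapping. The identifiers `COMMAND_TABLE`, `WORD_TO_COMMAND`, and `lookup_command`, and the English command keys, are hypothetical names introduced here; only the command words named in the description are included.

```python
# Hypothetical in-memory form of the command table (FIG. 3).
# Keys are illustrative names for the operation commands; values are the
# command words listed in the description.
COMMAND_TABLE = {
    "stop_performance": ["チュウシ", "ストップ", "オワリ", "シュウリョウ"],
    "key_up": ["キーアゲ", "キーアップ", "シャープ"],
    "key_down": ["キー下げ", "キーダウン", "フラット"],
}

# Inverted index so that a detected command word resolves directly to its
# operation command.
WORD_TO_COMMAND = {
    word: command
    for command, words in COMMAND_TABLE.items()
    for word in words
}


def lookup_command(command_word):
    """Return the operation command for a command word, or None if unknown."""
    return WORD_TO_COMMAND.get(command_word)
```

With this layout, `lookup_command("ストップ")` resolves to the hypothetical key `"stop_performance"`, while a word that is not registered, such as ストック, resolves to `None`.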

The performance unit 22 controls the performance processing of the song and the display processing of the monitor 12 (see FIG. 1). The performance unit 22 reads the song data, lyric telop data, and background video data from the storage unit 21, plays back the accompaniment data included in the song data, and causes the monitor 12 to display the lyric telops and the background video.

During the karaoke performance, the extraction unit 23 extracts the reference pitch from the reference data included in the song data, one sample per predetermined period (for example, 20 [msec]). At the same time, the extraction unit 23 extracts the voice pitch from the voice signal input from the microphone 14 (see FIG. 1), one sample at each reference pitch extraction timing. The reference pitch and the voice pitch are stored in the pitch storage unit 24 in association with the elapsed time from the start of the performance, at intervals of the predetermined period. In this way, the pitch storage unit 24 stores, in chronological order, the reference pitch extracted from the reference data and the voice pitch extracted from the voice signal during the karaoke performance.
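The time-keyed storage described above could be sketched as follows. This is an illustrative assumption, not the patented implementation: the class `PitchStore` and its methods are hypothetical names, and the pitch extraction itself is left out.

```python
from dataclasses import dataclass, field

SAMPLE_PERIOD_MS = 20  # example period given in the description


@dataclass
class PitchStore:
    """Hypothetical pitch storage keyed by elapsed time since performance start."""

    # elapsed_ms -> (reference_cent, voice_cent)
    samples: dict = field(default_factory=dict)

    def record(self, elapsed_ms, reference_cent, voice_cent):
        """Store one reference/voice pitch pair for one sampling instant."""
        self.samples[elapsed_ms] = (reference_cent, voice_cent)

    def between(self, start_ms, end_ms):
        """Return (elapsed_ms, reference_cent, voice_cent) tuples in time order
        for start_ms <= elapsed_ms <= end_ms."""
        return [
            (t, ref, voice)
            for t, (ref, voice) in sorted(self.samples.items())
            if start_ms <= t <= end_ms
        ]
```

Keying by elapsed time lets a later stage pull out exactly the samples that fall inside a voice input period without rescanning the audio.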

In the example shown in FIG. 4, for elapsed times from 15420 [msec] to 15480 [msec], 5900 [cent] is stored as the reference pitch, and 5555 [cent], 5670 [cent], 5646 [cent], and 5660 [cent] are stored as the voice pitches. In non-singing sections, 0 [cent] is stored as the reference pitch because no reference data exists, and 0 [cent] is stored as the voice pitch because the user is not singing. The cent, a unit of pitch interval, is normally a relative value (for example, a semitone is a difference of 100 [cent]); in this embodiment, to treat cent values as absolute, the note C4 (frequency 261.626 [Hz]) is defined as 6000 [cent].
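The description fixes C4 at 6000 [cent] but does not give a conversion formula. The sketch below assumes the standard definition of the cent (1200 cents per octave) anchored at that value; the function name `hz_to_cent` and the handling of non-positive frequencies are assumptions made for illustration.

```python
import math

C4_HZ = 261.626   # frequency of C4 given in the description
C4_CENT = 6000    # absolute cent value assigned to C4 in the description


def hz_to_cent(frequency_hz):
    """Convert a frequency to an absolute cent value anchored at C4 = 6000.

    Assumes 1200 cents per octave. Returns 0 for non-positive input, mirroring
    the 0 [cent] stored when there is no pitch (an assumption of this sketch).
    """
    if frequency_hz <= 0:
        return 0
    return C4_CENT + 1200 * math.log2(frequency_hz / C4_HZ)
```

Under this assumption, one octave above C4 maps to 7200 [cent], and each semitone step changes the value by 100 [cent], consistent with the relative definition quoted above.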

The command detection unit 25 detects a command word input by voice through the voice signal during the karaoke performance. Each time a voice signal is input from the microphone 14 (see FIG. 1), the command detection unit 25 analyzes the voice signal using known speech recognition processing and identifies the characters that were input by voice. The identified characters are compared one after another with the characters of the command words, and a command word is detected in the voice signal when all of its characters match. For example, when the characters ス, ト, ッ, and プ are identified in that order from the voice signal during the karaoke performance, the command word ストップ ("stop") is detected.

The period specifying unit 26 specifies the voice input period from the start to the end of the voice input of the command word. The period specifying unit 26 keeps track of the elapsed time from the start of the karaoke performance. The voice input period is specified by taking the elapsed time at which the user begins to utter the first character of the command word as the voice input start, and the elapsed time at which the user finishes uttering the last character as the voice input end. For example, for the command word ストップ, the elapsed time at which ス begins to be uttered is specified as the voice input start, and the elapsed time at which プ finishes being uttered is specified as the voice input end.
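Command detection and period specification can be sketched together, assuming the speech recognizer reports each recognized character with its start and end time. That timestamped output format and the function name `detect_command` are assumptions; the description only says that known speech recognition processing is used.

```python
def detect_command(recognized, command_words):
    """Find a command word in a stream of recognized characters.

    recognized: list of (character, start_ms, end_ms) in recognition order,
                one entry per character (hypothetical recognizer output).
    command_words: iterable of command word strings.

    Returns (command_word, input_start_ms, input_end_ms), or None when no
    command word matches in full.
    """
    text = "".join(ch for ch, _, _ in recognized)
    for word in command_words:
        index = text.find(word)
        if index != -1:
            # Start of the first character to end of the last character.
            input_start = recognized[index][1]
            input_end = recognized[index + len(word) - 1][2]
            return word, input_start, input_end
    return None
```

For the characters ス, ト, ッ, プ recognized between 15420 [msec] and 15480 [msec], this returns the word ストップ with that period; the sequence ス, ト, ッ, ク matches no command word and yields `None`.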

The false detection determination unit 27 determines whether the command word was falsely detected by comparing the reference pitch and the voice pitch stored in the pitch storage unit 24 for the voice input period. The false detection determination unit 27 sets a pitch range for comparison based on the reference pitch. The voice pitch is compared with this pitch range, and false detection of the command word is determined according to the proportion of voice pitch samples that fall within the pitch range. When that proportion is at or above a threshold (for example, 60 [%]), the command word is determined to be a false detection; when it is below the threshold, the command word is determined not to be a false detection.

For example, as shown in FIG. 5(A), a pitch range for comparison is set centered on the reference pitch from the voice input start t1 to the voice input end t2 of the voice input period. The reference pitch stored in the pitch storage unit 24 is based on the reference data, and the same reference pitch is maintained from note-on to note-off of each note included in the reference data. For example, when the reference pitch is 5900 [cent], a range of ±100 [cent] around it, from 5800 [cent] to 6000 [cent], is set as the pitch range.

Next, as shown in FIG. 5(B), the proportion of voice pitch samples that fall within the pitch range is calculated relative to the total number of voice pitch samples from the voice input start t1 to the voice input end t2. When 60 [%] or more of the voice pitch samples fall within the pitch range, the trajectory of the voice pitch is broadly similar to that of the reference pitch. The voice input period is therefore taken to be a period in which the user is singing, and the voice pitch to be a singing pitch extracted from the voice signal of the singing voice. Because the command word was input through the singing voice, the false detection determination unit 27 determines that the command word is a false detection.

On the other hand, as shown in FIG. 5(C), when fewer than 60 [%] of the voice pitch samples fall within the pitch range, the trajectory of the voice pitch is not similar to that of the reference pitch. The voice input period is therefore taken to be a period in which the user is not singing, and the voice pitch to be a pitch extracted from a voice signal other than the singing voice. Because it is highly likely that the user input the command word intentionally, the false detection determination unit 27 determines that the command word is not a false detection. The threshold [%] for false detection determination is set to an arbitrary value obtained experimentally, empirically, or theoretically from past data and the like.
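The determination described with reference to FIG. 5 can be sketched as follows, using the example values from the description (a ±100 [cent] pitch range and a 60 [%] threshold). The function name, the parameter names, and the treatment of an empty period are assumptions of this sketch.

```python
def is_false_detection(samples, tolerance_cent=100, threshold_percent=60):
    """Decide whether a detected command word came from the singing voice.

    samples: list of (reference_cent, voice_cent) pairs covering the voice
             input period from t1 to t2.
    Returns True (false detection, suppress the command) when the share of
    voice pitch samples within reference ± tolerance_cent is at or above
    threshold_percent.
    """
    if not samples:
        # Assumption: with no pitch samples there is no evidence of singing.
        return False
    in_range = sum(
        1 for reference, voice in samples
        if abs(voice - reference) <= tolerance_cent
    )
    # Integer comparison avoids floating-point rounding at the threshold.
    return in_range * 100 >= threshold_percent * len(samples)
```

With the FIG. 4 values (reference pitch 5900 [cent], voice pitches 5555, 5670, 5646, and 5660 [cent]), no sample lies within 5800 to 6000 [cent], so the function returns `False` and the command word is treated as intentional.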

The command execution unit 28 prohibits execution of the operation command corresponding to the command word when the false detection determination unit 27 determines that the command word is a false detection. When the false detection determination unit 27 determines that the command word is not a false detection, the command execution unit 28 executes the operation command corresponding to the command word. At execution time, the command execution unit 28 refers to the command table in the storage unit 21 and executes the operation command associated with the command word detected by the command detection unit 25. For example, the operation command "stop performance" associated with the command word ストップ is executed.

The processing of each unit of the karaoke device 11 may be implemented in software using a processor, or by a logic circuit (hardware) formed in an integrated circuit or the like. When a processor is used, the various processes are carried out by the processor reading and executing a program stored in memory. A CPU (Central Processing Unit), for example, is used as the processor. The memory consists of one or more storage media, such as ROM (Read Only Memory) and RAM (Random Access Memory), depending on the application.

Next, the processing operation of the karaoke device 11 will be described with reference to FIG. 6. FIG. 6 is a flowchart showing the processing of the karaoke device 11 of the first embodiment. The flowchart shown in FIG. 6 is only an example, and the processing operation of the karaoke device 11 is not limited to it. The description of FIG. 6 uses the reference numerals of FIG. 2 as appropriate.

As shown in FIG. 6, the karaoke device 11 reads reserved song information from the reservation management table in order of registration (step S01). Next, the song data, lyric telop data, and background video data are read from the storage unit 21 based on the song ID included in the reserved song information (step S02). Next, the performance unit 22 plays back the accompaniment data in the song data and the karaoke performance starts (step S03). When karaoke singing starts, the extraction unit 23 extracts the reference pitch from the reference data and the voice pitch from the voice signal of the user's singing, and the reference pitch and the voice pitch are stored in the pitch storage unit 24 (step S04).

Next, the command detection unit 25 analyzes the voice signal using known speech recognition processing and performs detection of a command word input by voice through the voice signal (step S05). If the command detection unit 25 does not detect a command word (No in step S05), the process moves to step S10, described later. If the command detection unit 25 detects a command word (Yes in step S05), the period specifying unit 26 specifies the voice input period from the start to the end of the voice input of the command word (step S06).

Next, the false detection determination unit 27 compares the reference pitch and the voice pitch for the voice input period and determines whether the command word was falsely detected (step S07). As described above, a pitch range based on the reference pitch is set, and the proportion of voice pitch samples within the pitch range is calculated relative to the total number of voice pitch samples in the voice input period. When that proportion is at or above the threshold (for example, 60 [%]), the command word was input through the singing voice against the user's intention, so the command word is determined to have been falsely detected.

If the false detection determination unit 27 determines that the command word is not a false detection (No in step S07), the command execution unit 28 executes the operation command associated with the command word (step S08). If the false detection determination unit 27 determines that the command word is a false detection (Yes in step S07), execution of the operation command by the command execution unit 28 is prohibited (step S09). The processes from step S04 to step S09 are repeated until the karaoke performance ends (No in step S10). When the karaoke performance ends (Yes in step S10), the reserved song information for the performed song is deleted from the reservation management table (step S11).
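One pass through steps S05 to S09 of the flowchart can be sketched as a single function. This is a simplified illustration under stated assumptions: the function name, argument shapes, and return labels are hypothetical, the ±100 [cent] range and 60 [%] threshold are the example values from the description, and speech recognition is represented only by its result.

```python
def process_detection(detection, pitch_samples, word_to_command, execute):
    """Run steps S05 to S09 once.

    detection: None, or (command_word, start_ms, end_ms) from the detector.
    pitch_samples: dict of elapsed_ms -> (reference_cent, voice_cent).
    word_to_command: dict mapping command words to operation commands.
    execute: callable invoked with the operation command when allowed.
    """
    if detection is None:                      # S05: No
        return "no_command"
    word, start_ms, end_ms = detection         # S06: voice input period
    window = [
        pair for t, pair in pitch_samples.items()
        if start_ms <= t <= end_ms
    ]
    in_range = sum(1 for ref, voice in window if abs(voice - ref) <= 100)
    # S07: false detection when at least 60 % of samples track the reference.
    false_detection = bool(window) and in_range * 100 >= 60 * len(window)
    if false_detection:
        return "suppressed"                    # S09: execution prohibited
    execute(word_to_command[word])             # S08: execute the command
    return "executed"
```

Calling this once per recognition result inside the performance loop corresponds to repeating steps S04 to S09 until the performance ends.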

As described above, according to the first embodiment, when a command word is input by voice through the voice signal during a karaoke performance, the reference pitch and the voice pitch over the voice input period of the command word are compared. From the deviation between the reference pitch and the voice pitch, the device distinguishes a command word input unintentionally through the singing voice from a command word input intentionally by the user, and thereby determines whether the command word was falsely detected. As a result, operation commands are not executed against the user's intention during the karaoke performance.

<Second Embodiment>
In the embodiment described above, a pitch range based on the reference pitch is set for determining false detection of a command word. If the user sings well, false detection of a command word is determined accurately even when the pitch range is set narrow. If the user sings poorly, however, a narrow pitch range reduces the proportion of voice pitch samples that fall within it, so false detection of a command word is not determined accurately and an operation command may be executed against the user's intention. The karaoke device of the second embodiment therefore changes the pitch range according to the scoring history of the user's past singing.

The karaoke device 30 of the second embodiment will now be described with reference to FIGS. 7 and 8. FIG. 7 is a control block diagram of the karaoke device 30 of the second embodiment. FIG. 8 is a diagram showing an example of the pitch range setting table of the second embodiment. The karaoke device 30 of the second embodiment differs from the karaoke device 11 of the first embodiment in that the pitch range can be changed. Accordingly, the description of components that are the same as in the first embodiment is omitted.

As shown in FIG. 7, the karaoke device 30 of the second embodiment is configured in substantially the same way as the karaoke device 11 of the first embodiment (see FIG. 1), and can execute operation commands related to the karaoke performance in response to voice input of command words. A server 17 is connected to the karaoke device 30 via a network 16, and user information for the karaoke device 30 is managed in a user database on the server 17. The karaoke device 30 is provided with a storage unit 31, a performance unit 32, an extraction unit 33, a pitch storage unit 34, a command detection unit 35, a period specification unit 36, a false detection determination unit 37, and a command execution unit 38.

The user database of the server 17 stores, for each user ID (identification information), the scoring history of songs the user has sung in the past. The scoring history includes the song ID, the reservation date and time, the scoring result, and the like. Each time the user sings, the scoring history is transmitted from the karaoke device 30 to the server 17 and accumulated in the user database of the server 17. The scoring process performed by the karaoke device 30 is not described in detail here; for example, scoring is performed by comparing the voice pitch and the reference pitch in chronological order using the known technique described in Japanese Patent Application Laid-Open No. 2005-107331.

The false detection determination unit 37 sets a predetermined pitch range relative to the reference pitch, based on the scoring history acquired from the user database of the server 17. Specifically, when the karaoke device 30 is used, a user ID and password are entered on the remote control device 15 (see FIG. 1), and the karaoke device 30 transmits the user ID and password to the server 17. After the server 17 authenticates the user with the user ID and password, it returns the scoring history associated with the user ID to the karaoke device 30. The karaoke device 30 calculates the average of the scoring results included in the scoring history and, based on that average, identifies the pitch range from the pitch range setting table.

For example, as shown in FIG. 8, the pitch range setting table associates the size of the pitch range with the average value of the scoring results. When the average is 85 or more, ±100 [cent] is identified as the pitch range; when it is 75 or more and 84 or less, ±150 [cent]; when it is 65 or more and 74 or less, ±200 [cent]; and when it is 64 or less, ±250 [cent]. The pitch range setting table may be stored on the server 17 or on the karaoke device 30.
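The table lookup described above can be sketched as follows. This is a minimal illustration, not the implementation in the patent: the function and constant names are hypothetical, and treating a fractional average such as 84.5 as belonging to the lower band is an assumption, since the table as described only lists integer boundaries.

```python
from statistics import mean

# Rows of the FIG. 8 table as (minimum average score, pitch range in cents).
# The list layout and all names here are illustrative assumptions.
PITCH_RANGE_TABLE = [
    (85, 100),
    (75, 150),
    (65, 200),
]
LOWEST_BAND_RANGE_CENTS = 250  # average of 64 or less


def pitch_range_for_scores(scores):
    """Return the +/- pitch range in cents for a non-empty list of past scores."""
    average = mean(scores)
    for min_average, range_cents in PITCH_RANGE_TABLE:
        if average >= min_average:
            return range_cents
    # Assumption: any average below 65 falls into the widest band.
    return LOWEST_BAND_RANGE_CENTS
```

A user with no scoring history would need separate handling, because `mean` raises an error on an empty list; the description does not state what range applies in that case.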

The karaoke device 30 associates the identified pitch range with the user ID and stores the pair as user-specific pitch range data. When a user ID and a song ID are transmitted from the remote control device 15 to the karaoke device 30 as song reservation information, the false detection determination unit 37 searches the user-specific pitch range data for the user ID. If the user ID is present in the user-specific pitch range data, the pitch range associated with the user ID is identified when the song corresponding to the song ID is performed. The reference data corresponding to the song ID is then read out, and the false detection determination unit 37 sets the pitch range relative to the reference pitch of the reference data.
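One way to picture the user-specific pitch range data and its use at reservation time is the sketch below. It assumes pitches are held as frequencies in Hz and that a range of ±c cents around a reference frequency f spans f / 2^(c/1200) to f × 2^(c/1200). The dictionary, the function names, and the `fallback_cents` parameter for users not found in the data are all hypothetical; the description does not say what happens when the user ID is absent.

```python
# user ID -> pitch range in cents (the "user-specific pitch range data").
user_pitch_ranges = {}


def store_pitch_range(user_id, range_cents):
    """Associate an identified pitch range with a user ID."""
    user_pitch_ranges[user_id] = range_cents


def pitch_bounds_for_reservation(user_id, reference_pitches_hz, fallback_cents):
    """Return (lower, upper) frequency bounds around each reference pitch.

    fallback_cents is an assumed default used when the user ID is not stored.
    """
    range_cents = user_pitch_ranges.get(user_id, fallback_cents)
    factor = 2.0 ** (range_cents / 1200.0)  # 1200 cents correspond to one octave
    return [(ref / factor, ref * factor) for ref in reference_pitches_hz]
```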

As described above, according to the second embodiment, as in the first embodiment, operation commands are not executed against the user's intention during a karaoke performance. In addition, the pitch range is set from the user's past scoring history. Therefore, false detection of command words is determined accurately in accordance with the user's singing ability.

In the first embodiment, the false detection determination unit 27 determines false detection of a command word by comparing the voice pitch with a comparison pitch range set relative to the reference pitch, but the false detection determination process is not limited to this configuration. The false detection determination unit 27 may determine false detection of a command word by comparing the reference pitch and the voice pitch without setting a pitch range. For example, the false detection determination unit 27 may determine false detection of a command word using the pitch difference between the reference pitch and the voice pitch.
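The description does not specify how the pitch difference would be evaluated. As one hypothetical reading, the sketch below averages the absolute difference in cents over the voice input period and treats a small average as a sign that the voice was following the melody, that is, as a false detection. The Hz representation, the averaging, and the `max_mean_diff_cents` parameter are all assumptions.

```python
import math


def cents_between(voice_hz, reference_hz):
    """Signed pitch difference of the voice relative to the reference, in cents."""
    return 1200.0 * math.log2(voice_hz / reference_hz)


def is_false_detection_by_difference(reference_hz, voice_hz, max_mean_diff_cents):
    """Judge false detection from the mean absolute pitch difference.

    reference_hz and voice_hz are time-aligned samples from the voice input
    period. A small mean difference is taken to mean the user was singing.
    """
    diffs = [abs(cents_between(v, r)) for r, v in zip(reference_hz, voice_hz)]
    if not diffs:
        return False  # assumption: with no pitch samples, do not block the command
    return sum(diffs) / len(diffs) <= max_mean_diff_cents
```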

In the first embodiment, the same threshold for determining false detection in the false detection determination unit 27 is set for all command words, but a different value may be set for each command word according to the length of the command word. For example, the threshold for short command words may be set to 70 [%] and the threshold for long command words to 50 [%].
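A length-dependent threshold could look like the following sketch. The 70 % and 50 % values come from the example above; measuring length by the duration of the voice input period, the one-second boundary between "short" and "long", and the use of "greater than or equal to" in the comparison are assumptions made only for illustration.

```python
SHORT_WORD_THRESHOLD = 0.70  # from the example: short command words
LONG_WORD_THRESHOLD = 0.50   # from the example: long command words
SHORT_WORD_MAX_SECONDS = 1.0  # hypothetical boundary, not given in the description


def threshold_for(duration_seconds):
    """Pick the false-detection threshold from the command word's duration."""
    if duration_seconds <= SHORT_WORD_MAX_SECONDS:
        return SHORT_WORD_THRESHOLD
    return LONG_WORD_THRESHOLD


def is_false_detection(in_range_flags, duration_seconds):
    """Judge false detection from per-sample flags (True = voice pitch in range).

    A high proportion of in-range samples suggests the user was singing.
    """
    if not in_range_flags:
        return False  # assumption: with no pitch samples, do not block the command
    ratio = sum(in_range_flags) / len(in_range_flags)
    return ratio >= threshold_for(duration_seconds)
```

With 60 % of samples in range, this sketch would let a short command word through but treat a long one as a false detection.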

In each embodiment, the false detection determination units 27 and 37 set the pitch range centered on the reference pitch, but it is sufficient for the false detection determination units 27 and 37 to set the pitch range relative to the reference pitch. For example, the false detection determination units 27 and 37 may set the pitch range with the reference pitch as its upper limit.
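The difference between a centered range and a range capped at the reference pitch can be sketched as below, assuming pitches in Hz and a range width in cents. The function names are hypothetical, and the one-sided range is assumed to extend `range_cents` below the reference pitch.

```python
import math


def in_centered_range(voice_hz, reference_hz, range_cents):
    """True if the voice pitch lies within +/- range_cents of the reference."""
    diff = 1200.0 * math.log2(voice_hz / reference_hz)
    return abs(diff) <= range_cents


def in_upper_limited_range(voice_hz, reference_hz, range_cents):
    """True if the voice pitch lies between (reference - range_cents) and the reference."""
    diff = 1200.0 * math.log2(voice_hz / reference_hz)
    return -range_cents <= diff <= 0.0
```

A voice pitch slightly above the reference counts as in range for the centered variant but not for the upper-limited one.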

In each embodiment, an example in which the karaoke devices 11 and 30 are karaoke commanders has been described, but the karaoke devices 11 and 30 may instead be configured as a mobile device, such as a mobile phone, connected to a server via a network.

In each of the embodiments and modifications described above, the function of determining false detection of a command word may be added to the karaoke devices 11 and 30 by installing a program on the karaoke devices 11 and 30. This program is stored in a storage medium. The storage medium is not particularly limited, and may be a non-transitory storage medium such as an optical disk, a magneto-optical disk, or a flash memory.

Although the present embodiments have been described, other embodiments may combine the above embodiments and modifications in whole or in part.

The technique of the present invention is not limited to the embodiments described above, and may be changed, replaced, or modified in various ways without departing from the spirit of the technical idea. Furthermore, if the technical idea can be realized in another way through advances in technology or through another derived technology, it may be implemented using that method. Therefore, the claims cover all embodiments that may fall within the scope of the technical idea.

11, 30: Karaoke device
24, 34: Pitch storage unit
25, 35: Command detection unit
26, 36: Period specification unit
27, 37: False detection determination unit
28, 38: Command execution unit

Claims

A karaoke device capable of executing operation commands related to a karaoke performance in response to voice input of command words, the karaoke device comprising:
a pitch storage unit that stores, in chronological order, a reference pitch extracted from reference data and a voice pitch extracted from a voice signal during a karaoke performance;
a command detection unit that detects a command word voice-input via the voice signal during the karaoke performance;
a period specification unit that specifies a voice input period from the start to the end of the voice input of the command word;
a false detection determination unit that determines false detection of the command word by comparing the reference pitch and the voice pitch of the voice input period stored in the pitch storage unit; and
a command execution unit that prohibits execution of the operation command corresponding to the command word when false detection of the command word is determined.

The karaoke device according to claim 1, wherein the false detection determination unit sets a pitch range for comparison relative to the reference pitch, compares the pitch range with the voice pitch, and determines false detection of the command word according to the proportion of the voice pitch that falls within the pitch range.

The karaoke device according to claim 2, wherein the false detection determination unit sets the pitch range based on the scoring history of songs the user has sung in the past.