JP2012137559A

JP2012137559A - Karaoke device and control method and control program for karaoke device

Info

Publication number: JP2012137559A
Application number: JP2010288842A
Authority: JP
Inventors: Sukenori Kaneko; 祐紀金子; Midori Nakamae; 碧中前; Kazuyo Kuroda; 和代黒田
Original assignee: Toshiba Corp
Current assignee: Toshiba Corp
Priority date: 2010-12-24
Filing date: 2010-12-24
Publication date: 2012-07-19

Abstract

PROBLEM TO BE SOLVED: To present karaoke musical pieces suitable for performance along with a background moving image generated from image data such as photograph data. SOLUTION: A karaoke playing terminal 13 comprises a recommended musical piece database in which predetermined conditions are associated with recommended karaoke musical pieces beforehand. It collates an analysis result against the predetermined conditions, refers to the recommended musical piece database, and presents the recommended karaoke musical pieces to a user. It then lets the user select one of the presented musical pieces and plays the selected karaoke musical piece.

Description

Embodiments of the present invention relate to a karaoke apparatus, a control method for a karaoke apparatus, and a control program.

Conventionally, a karaoke apparatus processes karaoke music data, outputs the karaoke music as accompaniment through a sound system such as speakers, and outputs a lyrics image to a display in synchronization with the karaoke music.

In parallel with this, karaoke apparatuses are known that process prescribed video data stored on a video CD or the like, play back a background video, and superimpose the lyrics image on that background video.
Karaoke apparatuses that present recommended karaoke music to the user based on, for example, the number of requests are also known.

JP 2008-275936 A

Incidentally, it is conceivable to use an externally input image as the background video instead of one based on prescribed video data.
In the related art, the karaoke music presented by the karaoke apparatus in such a case did not necessarily match the externally input image.

Accordingly, an object of the present invention is to provide a karaoke apparatus, a control method for a karaoke apparatus, and a control program that can present karaoke music suitable for performance along with a background moving image generated from image data such as photograph data.

The karaoke apparatus of the embodiment includes a recommended music database in which predetermined conditions and recommended karaoke music are associated in advance.
The karaoke apparatus also includes analysis means for analyzing designated still image data.

The music presentation means checks the analysis result against the predetermined conditions, refers to the recommended music database, and presents recommended karaoke music to the user.
The selection receiving means then has the user select one of the presented karaoke songs.
As a result, the karaoke apparatus plays the karaoke song that the user selected from among the recommended karaoke songs.
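
The lookup described above can be sketched roughly as follows. This is only an illustration: the condition keys (a people label and a smile label), the song titles, and the function names are hypothetical, because the actual layout of the recommended music database is the subject of FIG. 10 and is not given in this passage.

```python
# Hypothetical recommended-music database: condition -> recommended songs.
# The condition keys and song titles are placeholders, not taken from the patent.
RECOMMENDED_MUSIC_DB = {
    ("many", "high"): ["Song A", "Song B"],
    ("many", "low"): ["Song C"],
    ("few", "high"): ["Song D"],
    ("few", "low"): ["Song E", "Song F"],
}


def present_recommended_songs(people_label, smile_label, db=RECOMMENDED_MUSIC_DB):
    """Check the analysis result against the conditions and return the matches."""
    return list(db.get((people_label, smile_label), []))


def accept_selection(presented, index):
    """Return the song the user picked from the presented list."""
    if not 0 <= index < len(presented):
        raise IndexError("selection is outside the presented list")
    return presented[index]
```

A caller would first obtain the labels from the image analysis, pass them to `present_recommended_songs`, and then play whatever `accept_selection` returns.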

FIG. 1 is an explanatory diagram of the schematic configuration of a communication karaoke system including a karaoke apparatus according to an embodiment.
FIG. 2 is a block diagram of the karaoke performance terminal.
FIG. 3 is an explanatory diagram of the functional configuration of the main part of the karaoke apparatus.
FIG. 4 is an explanatory diagram of the structure of the material information.
FIG. 5 is an explanatory diagram of a configuration example of the analysis information.
FIG. 6 is a diagram illustrating an example of effects determined based on the smile level and the number of people.
FIG. 7 is an explanatory diagram of specific effect examples corresponding to each of the effect collections.
FIG. 8 is a flowchart of the karaoke playback process.
FIG. 9 is a flowchart of the material analysis process.
FIG. 10 is an explanatory diagram of the structure of the recommended music database.

Next, embodiments will be described with reference to the drawings.
FIG. 1 is an explanatory diagram of the schematic configuration of a communication karaoke system including a karaoke apparatus according to an embodiment.
The communication karaoke system 10 includes a karaoke host 11 having a karaoke database (not shown) that stores karaoke music data and the like; a plurality of karaoke performance terminals 13 connected to the karaoke host 11 via a communication network 12 such as the Internet or a VPN; and a plurality of user operation terminals 14 connected to each karaoke performance terminal 13 via a wireless communication network.

FIG. 2 is a block diagram of the karaoke performance terminal.
The karaoke performance terminal 13 includes a controller 101 that controls the entire karaoke performance terminal 13; a user interface 102 that accepts the user's operation input to the karaoke performance terminal 13, either directly or indirectly via a user operation terminal 14, and accepts data input from a user-owned USB device, memory card, or the like; and a hard disk drive (HDD) 103 that stores various data and databases.

The karaoke performance terminal 13 also includes a communication interface (I/F) 104 that communicates with the karaoke host 11 via the communication network 12; an optical disk drive 105 that records to and plays back optical disks such as CDs and DVDs; and a display controller 108 that presents various displays on a display 107 based on display image data stored in a VRAM 106.

Furthermore, the karaoke performance terminal 13 includes a sound controller 111 that superimposes the input audio from microphones 109A and 109B on a karaoke sound signal corresponding to karaoke sound data input from the controller side and outputs the result to a speaker 110, and a camera 112 that captures various images.

In the above configuration, the controller 101 includes a CPU 121 that controls the entire controller 101, a ROM 122 that stores various control programs in a nonvolatile manner, and a RAM 123 that temporarily stores various data and functions as a working area.

The user I/F 102 includes an operation panel 125 on which operators (not shown) for the user's various operations are arranged; a USB controller 127 that controls an external USB device connected via a USB connector 126; a card controller 129 that controls an external memory card connected via a card connector 128; and a remote control interface (I/F) 130 through which remote operation is performed by wireless communication from the user operation terminal 14.

FIG. 3 is an explanatory diagram of the functional configuration of the main part of the karaoke apparatus.
Here, of the functions of the moving image reproduction application program 202, the functional configuration that realizes the moving image generation function is described.

This moving image generation function can be applied not only to material data 51 stored from an external device (USB memory, memory card, etc.) via the user I/F 102 (the USB controller 127, card controller 129, etc. described above), but also to material data 51 stored in a predetermined directory in the HDD 103 and to material data 51 stored via the communication interface 104 and the communication network 12.

Here, taking material data 51 stored in a predetermined directory in the HDD 103 as an example, the material data 51 is still image data 301A, audio data 301B, moving image data 301C, and the like.

The moving image reproduction application program 202 is loaded into the RAM 123 of the controller 101 and, viewed functionally, includes a material input unit 21, a material analysis unit 22, and a video playback unit 23.

When material data 51 is input via the user I/F 102 (the USB controller 127, the card controller 129, etc.), the material input unit 21 stores the material data 51 in a material database 301 that forms part of a database 131 in the HDD 103. The material database 301 is a database for storing the material data 51 used in generated moving images.

Specifically, the material database 301 stores still image data 301A, audio data 301B, moving image data 301C, and the like as material data 51. The material data 51 stored in the material database 301 is used as candidate material for the moving image to be generated.

The material input unit 21 also notifies the material analysis unit 22 that the material data 51 has been stored in the HDD 103.

In response to the notification from the material input unit 21, the material analysis unit 22 starts analysis processing of the material data 51.
The following description covers the case where photograph data is input as the material data 51 to be analyzed. The purpose of the analysis is to output, as the analysis result, the facial expressions (in particular, smiles) and the number of persons in the photograph data serving as the material data 51.

The material analysis unit 22 broadly comprises a face image detection unit 221, a facial expression detection unit 222, and a number-of-people detection unit 223. The following description assumes that the material data 51 to be analyzed is still image data 301A.

The face image detection unit 221 executes face detection processing that detects face images from the still image data 301A. A face image can be detected, for example, by analyzing the features of the still image data 301A and searching for regions whose features resemble a face image feature sample prepared in advance. The face image feature sample is feature data extracted by statistically processing the face image features of a large number of persons.

When the face detection processing is executed, the position (coordinates), size, degree of frontality, and so on of each face image contained in the still image data 301A are detected.

Furthermore, the face image detection unit 221 classifies the plurality of face images detected from the still image data 301A into groups, one for each set of face images presumed to belong to the same person.

The face image detection unit 221 may also identify the person corresponding to a detected face image. In that case, the face image detection unit 221 uses, for example, a face image feature sample of the person to be identified to determine whether the detected face image is that person. Based on these results, the face image detection unit 221 assigns a per-person face ID to each detected face image. The face image detection unit 221 outputs the information on the detected face images (the face images themselves and the classification results) to the facial expression detection unit 222 and the number-of-people detection unit 223.

The facial expression detection unit 222, having received the face image information, detects the facial expression corresponding to each face image detected by the face image detection unit 221. It then calculates a degree (likelihood) indicating how plausible it is that the face image shows the detected expression.

In the present embodiment, the facial expression detection unit 222 determines whether the expression corresponding to a detected face image is a "smile". Specifically, the facial expression detection unit 222 judges, for example, a face image whose features resemble a "smile" face image feature sample to be a "smile".

When the facial expression detection unit 222 determines that the expression corresponding to a face image is a "smile", it calculates, as the smile level, the likelihood with which the face image is estimated to be a smile. If a plurality of face images have been detected from one piece of still image data 301A, the facial expression detection unit 222 uses, for example, the average of the smile levels of those face images as the smile level of the still image data 301A.
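
As a minimal sketch of the averaging step, assuming each detected face already has a numeric smile level: the function name and the choice to return `None` when no face is detected are illustrative assumptions, not part of the described embodiment.

```python
from statistics import mean


def image_smile_level(face_smile_levels):
    """Average the per-face smile levels of one still image.

    Returning None for an image without detected faces is an assumption;
    the text does not say how that case is handled.
    """
    if not face_smile_levels:
        return None
    return mean(face_smile_levels)
```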

The smile level is not limited to a numerical value and may be expressed by a relative index such as "high" or "low". When the smile level is expressed by a relative index and a plurality of face images have been detected from one piece of still image data 301A, the facial expression detection unit 222 adopts, for example, the index assigned to the larger number of face images (for example, "high") as the smile level of the still image data 301A.
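
The majority rule for relative indices might look like the sketch below. The helper name is hypothetical, and the explicit `tie_break` argument is an assumption added here because the text does not say what happens when both indices occur equally often.

```python
from collections import Counter


def majority_label(labels, tie_break):
    """Return the label held by the most items; use tie_break on a tie."""
    counts = Counter(labels)
    if not counts:
        raise ValueError("at least one label is required")
    ranked = counts.most_common()
    # A tie for first place is resolved by the caller-supplied default.
    if len(ranked) > 1 and ranked[0][1] == ranked[1][1]:
        return tie_break
    return ranked[0][0]
```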

To keep the explanation simple, the following description of the present embodiment uses only the smile level as an example. However, the facial expression detection unit 222 is not limited to smiles and may calculate the likelihood of any expression, such as an angry face, a crying face, a surprised face, or a blank expression.

Meanwhile, the number-of-people detection unit 223 detects the number of persons contained in the still image data 301A. For example, the number-of-people detection unit 223 takes the number of face images detected by the face image detection unit 221 as the number of persons contained in the still image data 301A. The number-of-people detection unit 223 may also detect the whole body, or part of the body, of persons including their face images, and thereby calculate a count that includes persons captured from behind.

The number of people is not limited to a numerical value and may be expressed by a relative index such as "many" or "few". For example, when a number of face images equal to or greater than a threshold is detected from the still image data 301A, the number-of-people detection unit 223 sets the number of people for the still image data 301A to "many".
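
The threshold rule for the people index can be written in a few lines. The threshold value of 5 below is purely a placeholder; the text does not state a concrete value.

```python
MANY_PEOPLE_THRESHOLD = 5  # hypothetical value, not given in the text


def people_label(face_count, threshold=MANY_PEOPLE_THRESHOLD):
    """Label an image "many" when the face count reaches the threshold."""
    return "many" if face_count >= threshold else "few"
```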

The material analysis unit 22 stores material information 302A (described later), which is attached to the material data 51, and analysis information 302B, which is generated by the analysis of the material analysis unit 22, in a material information database 302 in the HDD 103.

FIG. 4 is an explanatory diagram of the structure of the material information.
The material information 302A includes information indicating a material ID, file path, file size, file format, generation date and time, generation location, type, image size, playback time, and input path.

The items have the following meanings.
"Material ID" is identification information uniquely assigned to identify the material data 51.
"File path" indicates where the material data 51 is stored on the HDD 103.
"File size" indicates the data size of the material data 51.
"File format" indicates the data format of the material data 51 (for example, mpeg format, wma format, etc. for moving images; jpeg format, bmp format, etc. for still images; mp3 format, wav format, etc. for audio).
"Generation date and time" indicates when the material data 51 was generated (for example, November 10, 2010).
"Generation location" indicates position information for the place where the material data 51 was generated (for example, longitude and movement information obtained by GPS positioning).
"Type" indicates the kind of content in the material data 51 (for example, still image, audio, moving image).
"Image size" indicates the displayed image size (for example, 1024 x 768 pixels) when the material data 51 is still image data 301A or moving image data 301C.
"Playback time" indicates the playback duration at normal speed when the material data 51 is audio data 301B or moving image data 301C.
"Input path" indicates the route by which the material data 51 was input to the karaoke performance terminal 13 (for example, an external storage medium, an external storage device, or a server on a network).
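
One way to picture the material information 302A is as a simple record type. The sketch below is an assumption-laden illustration: the field names, the Python types, and the sample values are all hypothetical, and only the list of items comes from the description.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass
class MaterialInfo:
    """Hypothetical record mirroring the items of material information 302A."""
    material_id: str
    file_path: str
    file_size: int                              # bytes (assumed unit)
    file_format: str                            # e.g. "jpeg", "mp3"
    generated_at: str                           # generation date and time
    generated_location: Optional[str]           # position information, if any
    kind: str                                   # "still image", "audio", "moving image"
    image_size: Optional[Tuple[int, int]] = None   # only for images and video
    playback_seconds: Optional[float] = None       # only for audio and video
    input_path: str = "external storage medium"


photo = MaterialInfo(
    material_id="M0001",
    file_path="/materials/photo1.jpg",
    file_size=204800,
    file_format="jpeg",
    generated_at="2010-11-10",
    generated_location=None,
    kind="still image",
    image_size=(1024, 768),
)
```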

FIG. 5 is an explanatory diagram of a configuration example of the analysis information.
As shown in FIG. 5, the analysis information 302B includes, for example, the material ID described above, the smile level, the number of people, and face image information.

The face image information is information based on the analysis result of the face detection processing described above. It therefore includes, for example, information indicating the face image, size, position, and face ID. The face image information may also include the smile level of each face image.
The analysis information 302B stores as many pieces of face image information as there are face images detected from one piece of still image data 301A.
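
The relationship between the analysis information 302B and its per-face entries could be modeled as below. Field names and types are hypothetical, and the face image pixels themselves are left out of the sketch for brevity.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple


@dataclass
class FaceImageInfo:
    """One entry per detected face (the face image itself is omitted here)."""
    face_id: str
    position: Tuple[int, int]
    size: Tuple[int, int]
    smile_level: Optional[float] = None  # optional per-face smile level


@dataclass
class AnalysisInfo:
    """Hypothetical record mirroring analysis information 302B."""
    material_id: str
    smile_level: Optional[float]
    people_count: int
    faces: List[FaceImageInfo] = field(default_factory=list)


info = AnalysisInfo(
    material_id="M0001",
    smile_level=0.5,
    people_count=2,
    faces=[
        FaceImageInfo("F1", (10, 20), (64, 64), 0.25),
        FaceImageInfo("F2", (200, 40), (60, 60), 0.75),
    ],
)
```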

The material analysis unit 211 may also detect (recognize) objects such as persons (the whole body or part of the body, including the face image), scenery (sea, mountains, flowers, etc.), and animals (dogs, cats, fish, etc.) from the still image data 301A, and generate analysis information 302B that includes information indicating those analysis (detection) results.

Furthermore, the material analysis unit 211 may estimate the shooting time, shooting position, and so on from the material information 302A and the still image data 301A, and generate analysis information 302B that includes information indicating those analysis (estimation) results. In that case, as shown in FIG. 5, the analysis information 302B includes person image information (person image, size, position, and person ID), landscape information (landscape image, size, position, and attribute), animal information (animal image, size, position, and attribute), the shooting time, and the shooting position.

The material analysis unit 211 may also analyze the audio data 301B and generate analysis information 302B that includes information on the persons corresponding to detected voices and their number, and the mood and genre of detected music.

Furthermore, the material analysis unit 211 may analyze each image frame contained in the moving image data 301C in the same way as the still image data 301A, and generate analysis information 302B that includes the smile level, number of people, face image information, and so on described above.

The material analysis unit 211 notifies the video playback unit 23 that the material information 302A and the analysis information 302B corresponding to the input material data 51 have been stored in the material information database 302.

In response to the notification from the material analysis unit 22, the video playback unit 23 starts processing that generates a composite video (moving image) from the material data 51 and plays back (displays) the generated composite video. In doing so, the video playback unit 23 refers to the material information database 302, extracts material data 51 that satisfies a predetermined condition from the material database 301, and generates the composite video.
The video playback unit 23 includes an effect extraction unit 231, a composite video generation unit 232, and a composite video output unit 233.

The effect extraction unit 231 extracts, from an effect database 303, effect data 303A suited to the captured material data 51. The effect data 303A includes not only ordinary video effects such as zoom, rotation, noise addition, mosaic, contour extraction, and emboss, but also transitions that connect scenes.

Specifically, the effect extraction unit 231 first extracts, from the material information database 302, the smile level and the number of people contained in the analysis information 302B corresponding to the extracted material data 51.

The effect extraction unit 231 then selects effect data 303A suited to the extracted material data 51 based on the extracted smile level and number of people. For example, from the smile level and number of people corresponding to each of the extracted pieces of still image data 301A (material data 51), the effect extraction unit 231 calculates a smile-level index and a number-of-people index for those pieces of still image data 301A as a whole.

That is, the effect extraction unit 231 uses, for example, the number of extracted pieces of still image data 301A that contain a face image whose smile level is equal to or greater than a first threshold as the smile-level index for the extracted still image data 301A as a whole.

Alternatively, the effect extraction unit 231 may use, for example, the average of the smile levels corresponding to each of the extracted pieces of still image data 301A as the smile-level index for the still image data 301A as a whole.

Likewise, the effect extraction unit 231 uses, for example, the number of extracted pieces of still image data 301A whose number of people is equal to or greater than a second threshold as the number-of-people index for the extracted still image data 301A as a whole. Alternatively, the effect extraction unit 231 may use, for example, the average of the numbers of people corresponding to each of the extracted pieces of still image data 301A as the number-of-people index for the still image data 301A as a whole.
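
The two ways of forming an index over several images (counting images that reach a threshold, or averaging) can be sketched as follows. The function names and the input shapes (a list of per-face smile levels per image, and a list of per-image head counts) are assumptions made for illustration.

```python
from statistics import mean


def smile_index_by_count(per_image_face_smiles, first_threshold):
    """Count images containing at least one face at or above the threshold."""
    return sum(
        1
        for faces in per_image_face_smiles
        if any(level >= first_threshold for level in faces)
    )


def people_index_by_count(per_image_people, second_threshold):
    """Count images whose number of people is at or above the threshold."""
    return sum(1 for count in per_image_people if count >= second_threshold)


def index_by_mean(per_image_values):
    """Alternative index: the plain average of the per-image values."""
    return mean(per_image_values)
```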

As described above, the smile level and the number of people may be expressed by relative indices. For example, when a smile level of "high" or "low" has been set for each of the extracted pieces of still image data 301A, the effect extraction unit 231 adopts the index set for the larger number of pieces of still image data 301A (for example, "high") as the smile level of the still image data 301A as a whole. Alternatively, when a smile level of "high" has been set for at least a predetermined proportion (first threshold) of the extracted pieces of still image data 301A, the effect extraction unit 231 sets the smile level of the still image data 301A as a whole to "high".

Similarly, when a number of people of "many" or "few" has been set for each of the extracted pieces of still image data 301A, the effect extraction unit 231 adopts the index set for the larger number of pieces of still image data 301A (for example, "few") as the number of people for the still image data 301A as a whole. Alternatively, when a number of people of "many" has been set for at least a predetermined proportion (second threshold) of the extracted pieces of still image data 301A, the effect extraction unit 231 sets the number of people for the still image data 301A as a whole to "many".
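
The proportion-based variant of this rule might be sketched as below. The function name is hypothetical, and the concrete ratio used in the usage note is an arbitrary example, since the text only speaks of "a predetermined proportion".

```python
def overall_label(labels, positive, negative, ratio_threshold):
    """Return `positive` when its share of the labels reaches the threshold."""
    if not labels:
        raise ValueError("at least one label is required")
    share = labels.count(positive) / len(labels)
    return positive if share >= ratio_threshold else negative
```

For instance, `overall_label(["many", "few", "few", "few"], "many", "few", 0.5)` yields `"few"`, because only a quarter of the images are labeled "many".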

Based on the smile level and number of people determined as described above for the extracted still image data 301A as a whole, the effect extraction unit 231 determines the effect data 303A suited to those pieces of still image data 301A.

FIG. 6 is a diagram illustrating an example of effects determined based on the smile level and the number of people.
Based on the smile level and number of people for the extracted still image data 301A as a whole, the effect extraction unit 231 selects an effect collection 51A, which is considered suitable for material with many people and a high smile level, when the material has many people and a high smile level.

When the material has many people and a low smile level, the effect extraction unit 231 selects an effect collection 51B, which is considered suitable for material with many people and a low smile level.

When the material has few people and a low smile level, the effect extraction unit 231 selects an effect collection 51C, which is considered suitable for material with few people and a low smile level.

When the material has few people and a high smile level, the effect extraction unit 231 selects the effect collection 51D, which is considered suitable for such material.
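
The four-way selection of FIG. 6 amounts to a lookup keyed on the two indices. The sketch below assumes the indices have already been reduced to the labels "many"/"few" and "high"/"low"; the table and function names are hypothetical.

```python
# (number of people, smile level) -> effect collection reference sign
EFFECT_COLLECTIONS = {
    ("many", "high"): "51A",
    ("many", "low"): "51B",
    ("few", "low"): "51C",
    ("few", "high"): "51D",
}


def select_effect_collection(people, smile):
    """Look up the effect collection for the whole set of still images."""
    try:
        return EFFECT_COLLECTIONS[(people, smile)]
    except KeyError:
        raise ValueError(f"unknown indices: {people!r}, {smile!r}") from None


print(select_effect_collection("many", "low"))
```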

FIG. 7 is an explanatory diagram of specific example effects corresponding to each of the effect collections described above.
For the effect collection 52A, suited to material with many people and a high smile level, a group of effects (decorations) that evoke a happy or energetic impression is used. Effects that can liven up the occasion are thereby applied.

For the effect collection 52B, suited to material with many people and a low smile level, a group of effects that evoke a ceremony is used. Effects that create, for example, a solemn atmosphere are thereby applied.

For the effect collection 52C, suited to material with few people and a low smile level, a group of effects that evoke a cool or near-future impression is used.
For the effect collection 52D, suited to material with few people and a high smile level, a group of effects that evoke an impression of fantasy or magic is used.

The effect collections 52A to 52D are designed so that the impression perceived by the user changes with the colors, shapes, movements (motion), objects, and the like used in the effects.

Accordingly, the effect collection 52A, which evokes a happy or energetic impression, includes effects that use bright or vivid colors. Likewise, the effect collection 52C, which evokes a cool or near-future impression, includes effects that use geometric shapes.

Note that the effect extraction unit 231 is not limited to the four effect collections shown in FIGS. 6 and 7; it can also select the effects suited to the extracted still image data 301A from more finely classified effect collections. In that case, a predetermined number of kinds of effect collections corresponding to values (value ranges) of the number of people and the smile level are defined in advance, and the effect extraction unit 231 selects, from those defined collections, the one suited to the extracted still image data 301A.
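
A finer classification by value ranges could be sketched as a grid indexed by range boundaries. Everything concrete here is an assumption made for illustration: the boundary values, the 3 x 3 grid, and the collection names "C1" to "C9" are not taken from the embodiment.

```python
from bisect import bisect_right

# Hypothetical range boundaries (lower edges of the second and third ranges).
PEOPLE_EDGES = [2, 5]       # ranges: 0-1, 2-4, 5 or more people
SMILE_EDGES = [0.33, 0.66]  # ranges: low, medium, high smile level

# Hypothetical collections: rows follow people ranges, columns smile ranges.
GRID = [
    ["C1", "C2", "C3"],
    ["C4", "C5", "C6"],
    ["C7", "C8", "C9"],
]


def select_collection(num_people, smile_level):
    """Map numeric indices onto the predefined grid of effect collections."""
    row = bisect_right(PEOPLE_EDGES, num_people)
    col = bisect_right(SMILE_EDGES, smile_level)
    return GRID[row][col]


print(select_collection(3, 0.5))
```

Adding a range only requires one more boundary value and one more row or column, so the four-way case is simply the grid with a single boundary per axis.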

The effect extraction unit 231 may also use indices other than the number of people and the smile level to select the effect collection suited to the extracted still image data 301A.

Next, the effect extraction unit 231 extracts the effect data 303A corresponding to the selected effect collection from the effect database 303 and outputs the extracted effect data 303A to the composite video generation unit 232.

The composite video generation unit 232 then generates a composite video, containing the captured material data 51, that serves as the karaoke background moving image.
At this point, the effect data 303A extracted by the effect extraction unit 231 has been applied to the material data 51 contained in the composite video.

For example, the effect data 303A extracted by the effect extraction unit 231 is applied to the face images (objects) of persons appearing in the still image data 301A (material data 51) contained in the composite video.
The composite video generation unit 232 generates, for example, a composite video containing still image data 301A that is displayed at timings defined by the effect data 303A.

The composite video may also contain audio data 301B that is output at predetermined timings.
The composite video generation unit 232 then outputs the generated composite video to the composite video output unit 233.

Note that the effect extraction unit 231 may itself apply the effect data 303A, chosen based on facial expression (for example, smile level) and the number of people, to the captured material data 51. In that case, the composite video generation unit 232 generates, as the karaoke background moving image, a moving image (composite video) containing the still images to which the effect extraction unit 231 has applied the effects.

The composite video output unit 233 outputs the composite video generated by the composite video generation unit 232.
The composite video output unit 233 plays back the composite video and displays it on the screen (display 107).

The composite video output unit 233 may also encode the composite video and store the encoded file in a predetermined storage device (for example, the HDD 103).

With the above configuration, the moving image playback application program 202 determines the effect data (effect group) 303A suited to the material data 51 used for the composite video that serves as the karaoke background moving image.

Specifically, based on the smile level and the number of people of each piece of still image data 301A used for the composite video, the effect extraction unit 231 determines smile-level and number-of-people indices for the still image data 301A as a whole. Based on those indices, the effect extraction unit 231 selects the effect data 303A suited to the still image data 301A used for the composite video that serves as the karaoke background moving image.

Accordingly, the composite video generation unit 232 can generate a karaoke background moving image (composite video) containing still image data 301A to which appropriate effect data 303A has been applied, without the user having to select the effect data 303A used for the composite video.

Next, the generation and playback processing of the karaoke background moving image according to the embodiment will be described.
FIG. 8 is a flowchart of the karaoke playback process.
First, the controller 101 of the karaoke performance terminal 13 captures the material data 51 to be used for the composite video that serves as the karaoke background moving image (step S11).

Possible ways of capturing the material data 51 include capturing it from an external USB device via the USB connector 126, capturing it from an external memory card via the card connector 128, photographing it with the camera 112, reading shared material data stored on the HDD 103, and downloading shared material data from the karaoke host 11 via the communication network 12.

An ordinary user typically captures the material data 51 through the USB connector 126, the card connector 128, or the camera 112.

Specifically, when an external storage device such as a USB memory, a USB-connected hard disk, or a USB-connected SSD (Solid State Drive) is connected to the USB connector 126, still image data such as photo data is captured as the material data 51 via the USB controller 127.

When an external memory card is connected to the card connector 128, still image data such as photo data is captured as the material data 51 via the card controller 129.

When a photograph is taken with the camera 112 in response to a user operation, the resulting photo data is captured as the material data 51.

Next, the controller 101 executes the moving image playback application program 202 to perform the material analysis process (step S12).

FIG. 9 is a flowchart of the material analysis process.
The following assumes that the material data 51 to be analyzed is still image data 301A such as photo data.

First, the material input unit 21 determines whether still image data 301A has been input via the interface unit or the like (step S21).
If no still image data 301A has been input (step S21; No), the material input unit 21 enters a standby state.

If still image data 301A has been input (step S21; Yes), the material input unit 21 stores the input still image data 301A in the material database 301 (step S22). The material input unit 21 then notifies the material analysis unit 22 (face image detection unit 221) that still image data 301A has been input.

Next, the face image detection unit 221 detects face images from the input still image data 301A (step S23).
That is, the face image detection unit 221 detects the position (coordinates), size, frontality, and the like of each face image contained in the still image data 301A. The face image detection unit 221 may also recognize (identify) the person corresponding to each detected face image.
The face image detection unit 221 then outputs information indicating the detected face images to the facial expression detection unit 222 and the number-of-people detection unit 223.

The facial expression detection unit 222 then determines the smile level of each face image detected by the face image detection unit 221 (step S24).
Here, the smile level is an index of the likelihood that a detected face image shows a smile. When multiple face images are detected in one piece of still image data 301A, the smile level of that still image data 301A is determined based on the smile levels of those face images.

The number-of-people detection unit 223 determines the number of persons contained in the still image data 301A based on the number of face images detected by the face image detection unit 221 (step S25).
The material analysis unit 22 then stores analysis information 302B, which includes the smile level, the number of people, face image information, and the like for the still image data 301A, in the material information database 302 (step S26).

Through the above process, the smile level and the number of people for the face images contained in the input still image data 301A are determined, and the analysis information 302B containing them is stored in the material information database 302.
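
Steps S24 to S26 can be sketched as follows, taking the face detection result of step S23 as given. The embodiment only states that the image-level smile level is "based on" the per-face smile levels; averaging them is an assumption made here, as are the `Face` type, the field names, and the use of a plain dictionary in place of the material information database 302.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class Face:
    x: int        # position of the detected face image
    y: int
    size: int
    smile: float  # per-face smile likelihood in [0.0, 1.0]


def analyze_still_image(image_id, faces, info_db):
    """Derive and store analysis information for one still image."""
    record = {
        # Assumption: the image-level smile level is the mean over faces.
        "smile_level": mean(f.smile for f in faces) if faces else 0.0,
        "num_people": len(faces),   # step S25: one person per detected face
        "faces": list(faces),
    }
    info_db[image_id] = record      # step S26: store the analysis information
    return record


db = {}
analyze_still_image("photo-001", [Face(10, 20, 64, 1.0), Face(90, 25, 60, 0.5)], db)
print(db["photo-001"]["smile_level"], db["photo-001"]["num_people"])
```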

The recommended music database stored in the database 131 of the HDD 103 will now be described.
FIG. 10 is a diagram illustrating the configuration of the recommended music database.
The recommended music database 401 broadly comprises an analysis result classification database 401A and a classification-specific recommended music database 401B.
The analysis result classification database 401A associates analysis results with classification codes of recommended songs.
The classification-specific recommended music database 401B associates the classification codes of recommended songs with karaoke song identification data that identifies actual karaoke songs.

Specifically, when the analysis result is the shooting date and time, the analysis result classification database 401A associates it with the classification code of recommended songs appropriate to that season. For example, a shooting date in December is linked to the classification code of recommended songs classified as Christmas songs.

When the analysis result is the smile level, the smile level is divided into predetermined ranges and a classification code of recommended songs is associated with each range. For example, a low smile level is associated with the classification code of recommended songs with a solemn image, and a high smile level is linked to the classification code of lively recommended songs that liven up the occasion.

When the analysis result is the estimated age of a person contained in the image corresponding to the photo data, the age range is divided into a plurality of age bands, and each band is associated with a classification code whose recommended songs are karaoke songs, such as hit songs, from the era of the people in that band. Furthermore, the person's age as of the current date and time may be estimated from the estimated age of the person in the image, the shooting date and time of the photo data, and the current date and time, and a classification code whose recommended songs are karaoke songs suited to that current estimated age may be linked.
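
The current-age estimate described above is the age estimated from the photograph plus the whole years elapsed since the shooting date. The sketch below is one way to compute it; the function name is hypothetical.

```python
from datetime import date


def estimate_current_age(age_at_shooting, shot_on, today):
    """Add the whole years elapsed since the photo was taken."""
    # Subtract one year when this year's anniversary of the shooting date
    # has not yet been reached.
    before_anniversary = (today.month, today.day) < (shot_on.month, shot_on.day)
    elapsed = today.year - shot_on.year - int(before_anniversary)
    return age_at_shooting + max(elapsed, 0)


print(estimate_current_age(20, date(2000, 6, 1), date(2010, 12, 24)))
```

The resulting age can then be mapped onto an age band in the same way as the estimated age itself.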

When the analysis result is the number of persons contained in the image corresponding to the photo data, a classification code whose recommended songs are, for example, ballad-type karaoke songs may be linked.

When the analysis result is the identification of a person contained in the image corresponding to the photo data, a history of the karaoke songs that person has sung in the past may be recorded, for example, and a classification code may be linked whose recommended songs are karaoke songs contained in that user's history or similar to karaoke songs contained in it.

Specifically, the classification-specific recommended music database 401B links each classification code to the performance codes of karaoke songs, so that a classification code leads to actual karaoke songs.

Accordingly, the controller 101 executes a recommended song selection process in which it refers to the classification-specific recommended music database 401B using the classification code obtained from the analysis result and obtains the performance codes of actual karaoke songs (step S13).
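
The two-stage lookup of step S13 (analysis result to classification code via 401A, then classification code to performance codes via 401B) can be sketched with two dictionaries. The keys, classification codes, and performance codes below are placeholders invented for illustration, not data from the embodiment.

```python
# Stand-in for database 401A: analysis result -> classification code.
ANALYSIS_TO_CLASS = {
    ("month", 12): "XMAS",
    ("smile", "high"): "LIVELY",
    ("smile", "low"): "SOLEMN",
}

# Stand-in for database 401B: classification code -> performance codes.
CLASS_TO_SONGS = {
    "XMAS": ["K0001", "K0002"],
    "LIVELY": ["K0101", "K0001"],
    "SOLEMN": ["K0201"],
}


def recommend_songs(analysis_results):
    """Collect performance codes for every analysis result, without duplicates."""
    songs = []
    for result in analysis_results:
        class_code = ANALYSIS_TO_CLASS.get(result)
        for performance_code in CLASS_TO_SONGS.get(class_code, []):
            if performance_code not in songs:
                songs.append(performance_code)
    return songs


print(recommend_songs([("month", 12), ("smile", "high")]))
```

Keeping the classification code as an intermediate key means the song list of a category can be updated without touching the analysis rules.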

Then, based on the obtained performance codes, the controller 101 obtains the titles, singer names, and the like of the corresponding karaoke songs, displays them on the display 107 as a recommended song list, and prompts the user to select one of the karaoke songs.
The controller 101 thereby enters a state of waiting for a song selection operation (step S14) and determines whether the user has performed a song selection operation (step S15).

If no song selection operation has been performed (step S15; No), the controller 101 remains in the standby state.
The apparatus may also be configured to leave the song-selection waiting state when the waiting time exceeds a predetermined maximum waiting time.
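
The waiting loop of steps S14 and S15 with the optional maximum waiting time can be sketched as a polling loop with a deadline. The `poll` callback, the polling interval, and the injectable `clock` and `sleep` parameters are assumptions introduced so the sketch can run on its own.

```python
import time


def wait_for_selection(poll, max_wait_seconds,
                       clock=time.monotonic, sleep=time.sleep, interval=0.1):
    """Poll until a song is selected or the maximum waiting time elapses.

    `poll` returns the selected performance code, or None while the user
    has not yet selected anything. Returns None on timeout.
    """
    deadline = clock() + max_wait_seconds
    while clock() < deadline:
        choice = poll()
        if choice is not None:
            return choice
        sleep(interval)
    return None


# Hypothetical user who selects a song on the third poll.
answers = iter([None, None, "K0001"])
print(wait_for_selection(lambda: next(answers), 5.0, sleep=lambda s: None))
```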

If the user has performed a song selection operation, the controller 101 functions as the effect extraction unit 231 and selects the series of effects used for moving image generation based on the result of the material analysis process in step S12 (step S16). That is, functioning as the effect extraction unit 231, the controller 101 selects the effect collection suited to the captured material data 51 based on the analysis information 302B corresponding to that material data 51. The controller 101 then extracts the effect data 303A corresponding to the selected effect collection from the effect database 303.

Next, the controller 101 functions as the composite video generation unit 232 and generates a composite video using the extracted material data 51 and the effect data 303A (step S17). The generated composite video contains the material data 51 to which the effect data 303A has been applied. Alternatively, the controller 101 may apply the selected effect data 303A to the captured material data 51 while functioning as the effect extraction unit 231.
In that case, when functioning as the composite video generation unit 232, the controller 101 generates a composite video containing the material data 51 to which the effect data 303A has already been applied.

Next, the controller 101 functions as the composite video output unit 233 and causes the display 107 to display, via the display controller 108, the composite video as the background moving image of the karaoke song together with the lyrics of that song.

In parallel, the controller 101 controls the sound controller 111 to mix the karaoke song with the user's voice input from the microphones 109A and 109B and to output the result as sound from the speaker 110 (step S18).

If the user has enabled recording, or recording is enabled as a default setting, the composite video serving as the background moving image of the karaoke song and the audio in which the karaoke song is mixed with the user's voice are recorded on the HDD 103, or on a writable CD, writable DVD, or the like set in the optical disc drive 105 in advance.

The controller 101 also determines whether trick playback is being performed, such as changing the tempo of the song or changing the playback speed of the composite video by fast-forwarding or the like (step S19). If trick playback is being performed (step S19; Yes), the controller 101 calculates the playback end timing of the karaoke song, regenerates the unplayed portion of the composite video so that playback of the composite video serving as the background moving image ends at that timing (step S20), and returns the process to step S18.
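
One way to picture steps S19 and S20: after a tempo change, the remaining real playback time is the remaining song time divided by the tempo ratio, and the still images not yet shown are redistributed over that time. The even split and all names below are assumptions; the embodiment does not specify how the unplayed portion is re-timed.

```python
def remaining_playback_seconds(song_length, position, tempo_ratio):
    """Real time left until the song ends, given the new tempo ratio.

    `song_length` and `position` are in seconds at the original tempo;
    a `tempo_ratio` of 1.5 means the song now plays 1.5 times as fast.
    """
    if tempo_ratio <= 0:
        raise ValueError("tempo_ratio must be positive")
    return (song_length - position) / tempo_ratio


def reschedule_slides(remaining_images, remaining_seconds):
    """Spread the unplayed images evenly as (image, start, duration) tuples."""
    if not remaining_images:
        return []
    per_image = remaining_seconds / len(remaining_images)
    return [(image, index * per_image, per_image)
            for index, image in enumerate(remaining_images)]


left = remaining_playback_seconds(240.0, 60.0, 1.5)
print(reschedule_slides(["p4", "p5", "p6"], left))
```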

If trick playback is not being performed (step S19; No), the controller 101 continues, until playback of the karaoke song ends, to display the background moving image generated in step S17 on the display 107 together with the lyrics of the karaoke song, and to mix the karaoke song with the user's voice and output the result as sound from the speaker 110.

Through the above process, a composite video that uses the captured material data 51 and the effect data 303A suited to it is generated as the background moving image of the karaoke song, and karaoke playback can be performed.

As described above, according to this embodiment, a composite video containing the material data 51 with appropriate effect data 303A applied is generated as the background moving image of the karaoke song without the user doing any work to select effect data 303A suited to the captured material data 51. The user can then use the karaoke performance terminal 13 with the karaoke song playing while the generated background moving image is displayed on the display 107.

In other words, even if the user has no knowledge of the effect data 303A, the moving image playback application program 202 can easily generate, as the karaoke background image, a composite video containing the material data 51 with appropriate effect data 303A applied.

Note that the entire procedure of the composite video generation process of this embodiment can be executed by software. The same effects as this embodiment can therefore be obtained simply by installing a program that executes the procedure on an ordinary computer through a computer-readable storage medium storing the program, and running it.

The present invention is not limited to the above embodiment as it stands; at the implementation stage, the constituent elements can be modified and embodied without departing from the gist of the invention. Various inventions can also be formed by appropriately combining the constituent elements disclosed in the above embodiment.
For example, some constituent elements may be removed from the full set shown in the embodiment. Constituent elements from different embodiments may also be combined as appropriate.

(Modification of the embodiment)
When the photos used in the background moving image vary widely in shooting date and time, shooting location, smile level, and so on, for example when photo data (still image data) spanning several events (such as a sports day, a picnic, and a graduation ceremony) is input, a single recommended karaoke song is regarded as not selectable for the photos as a whole, and the photos are divided into a plurality of groups.

A recommended karaoke song is then selected for each group, and a medley of these karaoke songs is presented. The apparatus may be configured so that, when the user selects it, the medley of the selected karaoke songs is generated as an original medley.
In this case, a background moving image is generated for each group from the photos belonging to that group, in association with the period during which the karaoke song for that group is played, and these background moving images are joined to form the background moving image for the whole medley.
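
The grouping and medley layout described above might be sketched as follows. The modification names shooting date and time, location, and smile level as sources of variation; splitting only on gaps in shooting time is a simplifying assumption, and the function names and the per-song segment lengths are hypothetical.

```python
from datetime import datetime, timedelta


def group_by_time_gap(photos, max_gap):
    """Split (photo_id, shot_at) pairs into groups at large time gaps."""
    groups = []
    for photo_id, shot_at in sorted(photos, key=lambda p: p[1]):
        if groups and shot_at - groups[-1][-1][1] <= max_gap:
            groups[-1].append((photo_id, shot_at))   # same event
        else:
            groups.append([(photo_id, shot_at)])     # new event
    return groups


def medley_timeline(groups, segment_seconds):
    """Assign each group's photos to the period of its song in the medley."""
    timeline, start = [], 0.0
    for group, seconds in zip(groups, segment_seconds):
        timeline.append({"start": start, "end": start + seconds,
                         "photos": [photo_id for photo_id, _ in group]})
        start += seconds
    return timeline


photos = [("p1", datetime(2010, 5, 1, 10)), ("p2", datetime(2010, 5, 1, 11)),
          ("p3", datetime(2010, 10, 10, 9))]
groups = group_by_time_gap(photos, timedelta(hours=6))
print(medley_timeline(groups, [90.0, 60.0]))
```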

When the persons in the image corresponding to the photo data have a high smile level and are numerous, karaoke songs designated in advance as able to liven up the occasion may be presented.

In the above description, when the analysis result is the identification of a person contained in the image corresponding to the photo data, a history of the karaoke songs that person has sung in the past is recorded, and karaoke songs contained in that user's history, or similar to karaoke songs contained in it, are used as recommended songs.

However, the apparatus may instead be configured to photograph the participating members with the camera 112 at the start of karaoke, perform face recognition to identify each individual, check them against a member registration database or the like, and present karaoke songs recommended from each member's history and age as recommended karaoke songs.

The apparatus may also be configured to record a history of karaoke songs sung in the past for each combination (group) of identified individuals, and to present karaoke songs contained in that group's history, or similar to karaoke songs contained in it, as recommended karaoke songs.
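
A per-group history can be keyed on the set of identified members so that the same combination of people is recognized regardless of the order in which they are detected. The class and method names below are hypothetical, and the sketch covers only songs contained in the history, not the search for similar songs.

```python
from collections import defaultdict


class GroupHistory:
    """Karaoke song history kept per combination of identified individuals."""

    def __init__(self):
        self._history = defaultdict(list)

    def record(self, member_ids, performance_code):
        # frozenset makes the key independent of detection order.
        self._history[frozenset(member_ids)].append(performance_code)

    def songs_for(self, member_ids):
        return list(self._history.get(frozenset(member_ids), []))


history = GroupHistory()
history.record(["alice", "bob"], "K0001")
history.record(["bob", "alice"], "K0101")
print(history.songs_for(["alice", "bob"]))
```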

Similarly, when the user brings in still image data such as photo data as the material data 51, the apparatus may be configured to identify one or more persons contained in the still image corresponding to that data and to present recommended karaoke songs.

10 Communication karaoke system
11 Karaoke host
12 Communication network
13 Karaoke performance terminal (karaoke apparatus)
14 User operation terminal (selection accepting means)
21 Material input unit
22 Material analysis unit (analysis means)
23 Video playback unit
51 Material data
101 Controller (music presentation means, selection accepting means, moving image generation means)
107 Display (music presentation means)
108 Display controller (music presentation means)
110 Speaker (karaoke playback means)
111 Sound controller (karaoke playback means)
221 Face image detection unit (analysis means)
222 Facial expression detection unit (analysis means)
223 Number-of-people detection unit (analysis means)
231 Effect extraction unit
232 Composite video generation unit (moving image generation means)
233 Composite video output unit (karaoke playback means)
401 Recommended music database
401A Analysis result classification database
401B Classification-specific recommended music database

Claims

A karaoke apparatus comprising:
analysis means for analyzing specified still image data;
a recommended music database in which predetermined conditions and recommended karaoke songs are associated in advance;
music presentation means for checking the result of the analysis against the predetermined conditions, referring to the recommended music database, and presenting recommended karaoke songs to a user; and
selection accepting means for allowing the user to select one of the presented songs.
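As an illustration only, the collation of an analysis result against predetermined conditions might look like the sketch below. The database layout, the keys `num_people` and `smile_degree`, the thresholds, and the song titles are all assumptions made for this example; the claim does not prescribe any of them.

```python
# Hypothetical recommended music database: each entry pairs a predetermined
# condition on the analysis result with the songs recommended when it holds.
RECOMMENDED_SONG_DB = [
    (lambda r: r["num_people"] >= 3, ["Group Song"]),
    (lambda r: r["smile_degree"] >= 0.7, ["Cheerful Song"]),
]


def present_recommendations(analysis_result, db=RECOMMENDED_SONG_DB):
    """Collect the songs of every entry whose condition the result satisfies."""
    songs = []
    for condition, recommended in db:
        if condition(analysis_result):
            # Keep the first occurrence of each song and skip duplicates.
            songs.extend(s for s in recommended if s not in songs)
    return songs
```

The returned list corresponds to what would be shown to the user for selection; an empty list means no condition matched.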

The karaoke apparatus according to claim 1, wherein the analysis means analyzes a still image corresponding to the still image data and outputs, as the result of the analysis, at least one of: a smile degree estimated for a person included in the still image, an age estimated for a person included in the still image, a gender estimated for a person included in the still image, the number of persons included in the still image, and character information included in the still image.

The karaoke apparatus according to claim 1, wherein the analysis means analyzes a still image corresponding to the still image data, identifies a person included in the still image, and outputs the identified person as the result of the analysis.

The karaoke apparatus according to claim 1, wherein the analysis means analyzes a still image corresponding to the still image data, corrects the age estimated for a person included in the still image on the basis of metadata of the still image data, and thereby estimates the person's current age.
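One way to picture this correction is sketched below, on the assumption that the metadata used is the capture date of the photograph (for example, an Exif date field). The claim itself only says "metadata", so the choice of field and the function name `estimate_current_age` are hypothetical.

```python
from datetime import date


def estimate_current_age(estimated_age_in_photo, capture_date, today):
    """Add the full years elapsed since capture to the age estimated from the photo."""
    elapsed = today.year - capture_date.year
    # Subtract one year if this year's anniversary of the capture date has not arrived yet.
    if (today.month, today.day) < (capture_date.month, capture_date.day):
        elapsed -= 1
    # A capture date in the future (e.g. a wrong camera clock) yields no correction.
    return estimated_age_in_photo + max(elapsed, 0)
```

For instance, a person estimated to be 20 in a photo captured on 2000-06-15 would be treated as about 30 on 2010-12-24.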

The karaoke apparatus according to any one of claims 1 to 4, further comprising:
moving image generation means for generating, adapted to the karaoke song selected by the user, a moving image to be used as a karaoke background moving image from at least some of a plurality of specified still images; and
karaoke playback means for playing back the moving image generated by the moving image generation means in synchronization with the karaoke song selected by the user.

The karaoke apparatus according to any one of claims 1 to 5, wherein the moving image generation means comprises effect setting selection means for selecting, on the basis of the result of the analysis, an effect setting used to generate a moving image including at least some of the plurality of still images, and generates, using the selected effect setting, the moving image to which the effects constituting that effect setting are applied.

The karaoke apparatus according to any one of claims 1 to 6, wherein the music presentation means divides a plurality of pieces of the still image data into a plurality of groups on the basis of the result of the analysis, selects a recommended karaoke song for each group, and presents a medley generated from these karaoke songs as the recommended karaoke song.
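The grouping and per-group selection can be sketched roughly as follows. The category labels, the mapping `songs_by_category`, and the function name `build_medley` are assumptions for illustration; how images are actually classified and how the medley audio is assembled are outside this sketch.

```python
def build_medley(analysis_results, songs_by_category):
    """Group images by analysis category and pick one song per group.

    analysis_results: list of (image_name, category) pairs from the analysis.
    songs_by_category: hypothetical mapping from category to a recommended song.
    Returns (song, images) pairs in order of first appearance of each group.
    """
    groups = {}
    for image_name, category in analysis_results:
        groups.setdefault(category, []).append(image_name)

    medley = []
    for category, images in groups.items():
        song = songs_by_category.get(category)
        if song is not None:  # groups with no matching song are left out
            medley.append((song, images))
    return medley
```

Each returned pair ties one medley segment to the still images of its group, which is the information a background video generator would need to switch images along with the songs.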

A method for controlling a karaoke apparatus, the method being executed in the karaoke apparatus, the karaoke apparatus including a recommended music database in which results of analysis of still image data and recommended karaoke songs are associated in advance, the method comprising:
an analysis step of analyzing a plurality of specified pieces of still image data;
a music presentation step of referring to the recommended music database on the basis of the result of the analysis and presenting recommended karaoke songs to a user;
a selection acceptance step of allowing the user to select one of the presented songs; and
a moving image generation step of generating, adapted to the karaoke song selected by the user, a moving image to be used as a karaoke background moving image from at least some of the plurality of specified still images.

A control program for controlling, by a computer, a karaoke apparatus provided with a recommended music database in which predetermined conditions and recommended karaoke songs are associated in advance, the control program causing the computer to function as:
analysis means for analyzing a plurality of specified pieces of still image data;
music presentation means for checking the result of the analysis against the predetermined conditions, referring to the recommended music database, and presenting recommended karaoke songs to a user; and
selection accepting means for allowing the user to select one of the presented songs.