JP2004120127A

JP2004120127A - Image layout apparatus, image layout program, and image layout method

Info

Publication number: JP2004120127A
Application number: JP2002277960A
Authority: JP
Inventors: Atsuji Nagahara; 永原　敦示; Michihiro Nagaishi; 長石　道博
Original assignee: Seiko Epson Corp
Current assignee: Seiko Epson Corp
Priority date: 2002-09-24
Filing date: 2002-09-24
Publication date: 2004-04-15
Anticipated expiration: 2022-09-24
Also published as: JP4211338B2

Abstract

PROBLEM TO BE SOLVED: To provide an image layout apparatus that reflects in the layout not only images but also the voice, characters, and similar information attached to a moving image, and that is therefore suitable for creating a compilation rich in content.

SOLUTION: The image layout apparatus includes a moving image acquisition section 110 for acquiring a moving image; a voice recognition section 190 for performing voice recognition on the voice information attached to the moving image acquired by the moving image acquisition section 110 and outputting the recognition result as text information; an image selection section 150 for selecting an image from among the plurality of still images that make up the moving image acquired by the moving image acquisition section 110; and an image layout section 170 for laying out the text information from the voice recognition section 190 together with the image selected by the image selection section 150.

COPYRIGHT: (C)2004, JPO

Description

【０００１】
【発明の属する技術分野】
本発明は、動画像を構成する複数の静止画像のなかから選択した画像をレイアウトする装置およびプログラム、並びに方法に係り、特に、画像だけでなく動画像に付帯する音声や文字等もレイアウトに反映させ、内容の充実した編集物を作成するのに好適な画像レイアウト装置および画像レイアウトプログラム、並びに画像レイアウト方法に関する。
【０００２】
【従来の技術】
ディジタルビデオカメラで動画像を撮影し、撮影した動画像を使ってオリジナルアルバムを作成する場合、ユーザは、例えば、撮影した動画像をパソコンに取り込み、動画像を構成する複数の静止画像のなかから画像を選択し、選択画像をレイアウトした後にプリンタで印刷する。パソコン上のアプリケーションでは、多くの場合、印刷したい画像をいくつか選択すると、規定のテンプレートに選択画像を当てはめて自動レイアウトしてくれる。
【０００３】
従来、画像を利用してアルバムその他の編集物を作成する技術としては、例えば、特許文献１に開示されている電子絵本表示装置（以下、第１の従来例という。）、特許文献２に開示されているオリジナル絵本（以下、第２の従来例という。）、特許文献３に開示されているオリジナル絵本（以下、第３の従来例という。）および特許文献４に開示されているビデオプリンタ（以下、第４の従来例という。）があった。
【０００４】
第１の従来例は、あらかじめ絵本となる画像および文章などを書き込む記録媒体と、記録媒体から読み出した背景用画像データを格納する背景用メモリおよび動画用画像データを格納する動画用メモリと、背景用メモリおよび動画用メモリから読み出した背景データおよび動画データを合成して絵本にする合成部とを備え、合成部が合成した画像データを絵本として表示するように構成する。
【０００５】
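This digitizes the picture book and adds simple animation and audio, producing effects that still pictures alone cannot achieve. It also allows the subject of the picture book to be changed without using paper or reprinting, and makes storage simple and possible over long periods.
In the second conventional example, the pages of a picture book are prepared in advance with a specific character in the story left blank. An image of a person taken with a digital camera is processed by a computer, laid out in the blank character area, and printed to complete the picture book. The characters and names appearing in the story can thus be set freely, so that a one-of-a-kind picture book, different for each copy, can be printed easily and at low cost. In the method for producing this original picture book, the printed base sheets are bound with a hard cover by a proprietary method, so that an original picture book can be produced easily in a short time.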
【０００６】
第３の従来例は、原版の絵本において登場人物の名前と写真とを個人の名前および写真と取り替えて作成した。その一例として「ももじろう」は、その原版が「ももたろう」であり、題名が「ももじろう」に置き換えられるとともに、絵の中では、「ももたろう」の顔に変えて「子供（じろう）」の顔写真が使われている。さらに、物語本文中の名前の記載も「ももたろう」を「ももじろう」に置き換え、写真も全て「ももたろう」の顔が「子供（じろう）」の顔写真に取り替えられているため、世界で一冊だけのオリジナル絵本となる。
【０００７】
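In the fourth conventional example, an index code detector detects an index code superimposed on a television signal. When the index code is detected, the main CPU controls recording of the image on which the index code is superimposed into a frame memory and controls printing by the mechanism section.
This makes it possible to obtain only the necessary images as appropriate, without watching a television program from beginning to end.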
【０００８】
【特許文献１】
特開平５−１２０４００号公報
【０００９】
【特許文献２】
特開平１１−１２８５５３号公報
【００１０】
【特許文献３】
実願平１０−８４２０号公報
【００１１】
【特許文献４】
特開平７−１７０４７３号公報
【００１２】
【発明が解決しようとする課題】
ディジタルビデオカメラで撮影した動画像からオリジナルアルバムを作成する場合は、動画像を構成する複数の静止画像のなかから選択した画像をレイアウトするだけでなく、動画像に付帯する音声やクリップ等の文字も一緒にレイアウトした方が、内容の充実したアルバムを作成することができる。
【００１３】
しかしながら、第１の従来例にあっては、背景データおよび動画データを合成して絵本を作成するようになっているだけで、動画像に付帯する音声や文字等を編集物に反映させることはできない。
また、第２の従来例にあっては、ディジタルカメラにより撮影した人物画像をコンピュータにより処理して、ブランクの登場人物の部分にレイアウトしてプリントすることにより絵本を作成するようになっているだけで、同様に、動画像に付帯する音声や文字等を編集物に反映させることはできない。
【００１４】
また、第３の従来例にあっては、登場人物の名前と写真とを個人の名前および写真と取り替えて絵本を作成するようになっているだけで、同様に、動画像に付帯する音声や文字等を編集物に反映させることはできない。
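In the fourth conventional example, when the index code is detected, the main CPU merely records the image on which the index code is superimposed in the frame memory and has the mechanism section print it. Likewise, the voice, characters, and similar information attached to a moving image cannot be reflected in the compilation.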
【００１５】
そこで、本発明は、このような従来の技術の有する未解決の課題に着目してなされたものであって、画像だけでなく動画像に付帯する音声や文字等もレイアウトに反映させ、内容の充実した編集物を作成するのに好適な画像レイアウト装置および画像レイアウトプログラム、並びに画像レイアウト方法を提供することを目的としている。
【００１６】
【課題を解決するための手段】
〔発明１〕
上記目的を達成するために、発明１の画像レイアウト装置は、
動画像を構成する複数の静止画像のなかから画像を選択し、選択した画像をレイアウトする装置であって、
前記動画像に付帯する付帯情報を前記動画像から取得する付帯情報取得手段と、前記複数の静止画像のなかから画像を選択する画像選択手段と、前記付帯情報取得手段で取得した付帯情報に基づいて前記画像選択手段で選択した選択画像をレイアウトするレイアウト手段とを備えることを特徴とする。
【００１７】
このような構成であれば、付帯情報取得手段により、付帯情報が動画像から取得され、画像選択手段により、動画像を構成する複数の静止画像のなかから画像が選択される。そして、レイアウト手段により、取得された付帯情報に基づいて選択画像がレイアウトされる。
ここで、付帯情報は、動画像に付帯する情報をいい、これには、例えば、動画像に付帯する音声情報またはテキスト情報が含まれる。
【００１８】
また、付帯情報に基づいて選択画像をレイアウトすることには、例えば、付帯情報がレイアウト可能な情報であれば、付帯情報および選択画像をレイアウトすること、付帯情報がレイアウト不可能な情報であれば、付帯情報を利用した情報処理を行ってその処理結果および選択画像をレイアウトすることが含まれる。付帯情報がレイアウト可能な情報であっても、後者の方法によりレイアウトを行うこともできる。
〔発明２〕
さらに、発明２の画像レイアウト装置は、発明１の画像レイアウト装置において、
前記画像選択手段は、ユーザの好みに適合した画像を前記複数の静止画像のなかから選択するようになっていることを特徴とする。
【００１９】
このような構成であれば、画像選択手段により、ユーザの好みに適合した画像が複数の静止画像のなかから選択される。
ここで、ユーザの好みに適合した画像を選択する構成として、より具体的には、ユーザの好みに関するユーザ情報を記憶するユーザ情報記憶手段を備え、画像選択手段は、ユーザ情報記憶手段のユーザ情報に基づいて複数の静止画像のなかから画像を選択する構成を提案することができる。
〔発明３〕
さらに、発明３の画像レイアウト装置は、発明１および２のいずれかの画像レイアウト装置において、
前記動画像には、音声情報が付帯しており、
前記付帯情報取得手段は、前記動画像に付帯する音声情報に基づいて音声認識を行い、認識結果であるテキスト情報を前記付帯情報として取得するようになっており、
前記レイアウト手段は、前記付帯情報取得手段で取得したテキスト情報および前記選択画像をレイアウトするようになっていることを特徴とする。
【００２０】
このような構成であれば、付帯情報取得手段により、動画像に付帯する音声情報に基づいて音声認識が行われ、認識結果であるテキスト情報が付帯情報として取得される。そして、レイアウト手段により、取得されたテキスト情報および選択画像がレイアウトされる。
〔発明４〕
さらに、発明４の画像レイアウト装置は、発明１および２のいずれかの画像レイアウト装置において、
前記動画像には、テキスト情報が付帯しており、
前記付帯情報取得手段は、前記動画像に付帯するテキスト情報を前記付帯情報として取得するようになっており、
前記レイアウト手段は、前記付帯情報取得手段で取得したテキスト情報および前記選択画像をレイアウトするようになっていることを特徴とする。
【００２１】
このような構成であれば、付帯情報取得手段により、動画像に付帯するテキスト情報が付帯情報として取得され、レイアウト手段により、取得されたテキスト情報および選択画像がレイアウトされる。
〔発明５〕
さらに、発明５の画像レイアウト装置は、発明３および４のいずれかの画像レイアウト装置において、
さらに、前記付帯情報取得手段で取得したテキスト情報に基づいて要約を作成する要約作成手段を備え、
前記レイアウト手段は、前記要約作成手段で作成した要約および前記選択画像をレイアウトするようになっていることを特徴とする。
【００２２】
このような構成であれば、要約作成手段により、取得されたテキスト情報に基づいて要約が作成され、レイアウト手段により、作成された要約および選択画像がレイアウトされる。
〔発明６〕
さらに、発明６の画像レイアウト装置は、発明１ないし５のいずれかの画像レイアウト装置において、
さらに、前記動画像からシーンの区切を検出するシーン区切検出手段を備え、
前記画像選択手段は、前記シーン区切検出手段の検出結果に基づいて前記複数の静止画像のなかから画像を選択するようになっていることを特徴とする。
【００２３】
このような構成であれば、シーン区切検出手段により動画像からシーンの区切が検出されると、画像選択手段により、シーン区切検出手段の検出結果に基づいて複数の静止画像のなかから画像が選択される。
ここで、シーン区切検出手段の検出結果に基づいて画像を選択する構成として、より具体的には、シーン区切検出手段は、動画像からシーンの区切を検出し、シーンの区切を示すシーン区切情報を出力し、画像選択手段は、シーン区切検出手段からのシーン区切情報に基づいて、動画像を構成する複数の静止画像のうち同一シーンまたは特定シーンに属するもののなかから画像を選択する構成を提案することができる。
〔発明７〕
さらに、発明７の画像レイアウト装置は、発明１ないし５のいずれかの画像レイアウト装置において、
さらに、前記動画像からシーンの区切を検出するシーン区切検出手段を備え、
前記付帯情報取得手段は、前記シーン区切検出手段の検出結果に基づいて前記付帯情報を前記動画像から取得するようになっていることを特徴とする。
【００２４】
このような構成であれば、シーン区切検出手段により動画像からシーンの区切が検出されると、付帯情報取得手段により、シーン区切検出手段の検出結果に基づいて付帯情報が動画像から取得される。
ここで、シーン区切検出手段の検出結果に基づいて付帯情報を取得する構成として、より具体的には、シーン区切検出手段は、動画像からシーンの区切を検出し、シーンの区切を示すシーン区切情報を出力し、付帯情報取得手段は、シーン区切検出手段からのシーン区切情報に基づいて、動画像のうち同一シーンまたは特定シーンに係るものに付帯する付帯情報を動画像から取得する構成を提案することができる。
〔発明８〕
さらに、発明８の画像レイアウト装置は、発明５の画像レイアウト装置において、
さらに、前記動画像からシーンの区切を検出するシーン区切検出手段を備え、
前記要約作成手段は、前記シーン区切検出手段の検出結果および前記付帯情報取得手段で取得したテキスト情報に基づいて要約を作成するようになっていることを特徴とする。
【００２５】
このような構成であれば、シーン区切検出手段により動画像からシーンの区切が検出されると、要約作成手段により、シーン区切検出手段の検出結果および取得されたテキスト情報に基づいて要約が作成される。
ここで、シーン区切検出手段の検出結果に基づいて要約を作成する構成として、より具体的には、シーン区切検出手段は、動画像からシーンの区切を検出し、シーンの区切を示すシーン区切情報を出力し、要約作成手段は、シーン区切検出手段からのシーン区切情報、および動画像のうち同一シーンまたは特定シーンに係るものから取得したテキスト情報に基づいて要約を作成する構成を提案することができる。
〔発明９〕
さらに、発明９の画像レイアウト装置は、発明１ないし８のいずれかの画像レイアウト装置において、
前記レイアウト手段は、レイアウトの枠組みを構成する異なる複数のテンプレートのなかから前記テンプレートを選択し、選択したテンプレートおよび前記付帯情報に基づいて前記選択画像をレイアウトするようになっていることを特徴とする。
【００２６】
このような構成であれば、レイアウト手段により、異なる複数のテンプレートのなかからテンプレートが選択され、選択されたテンプレートおよび付帯情報に基づいて選択画像がレイアウトされる。
〔発明１０〕
さらに、発明１０の画像レイアウト装置は、発明１ないし９のいずれかの画像レイアウト装置において、
さらに、前記付帯情報を識別情報と対応付けて記憶する付帯情報記憶手段と、前記付帯情報記憶手段の付帯情報を提供する付帯情報提供手段とを備え、
前記レイアウト手段は、前記付帯情報に対応する識別情報を含む参照情報および前記選択画像をレイアウトするようになっており、
前記付帯情報提供手段は、前記参照情報に基づくアクセスがあったときは、当該参照情報に含まれる識別情報に対応する付帯情報を前記付帯情報記憶手段から読み出し、読み出した付帯情報をアクセス元に提供するようになっていることを特徴とする。
【００２７】
このような構成であれば、レイアウト手段により、付帯情報に対応する識別情報を含む参照情報と、選択画像とがレイアウトされる。したがって、レイアウト結果から参照情報を参照し、その参照情報に基づいて画像レイアウト装置にアクセスすることができる。
一方、参照情報に基づくアクセスがあると、付帯情報提供手段により、その参照情報に含まれる識別情報に対応する付帯情報が付帯情報記憶手段から読み出され、読み出された付帯情報がアクセス元に提供される。
【００２８】
ここで、レイアウト手段は、少なくとも参照情報および選択画像をレイアウトするようになっていればよく、付帯情報、参照情報および選択画像をレイアウトするようになっていてもよいし、参照情報および選択画像だけをレイアウトするようになっていてもよい。以下、発明１２の画像レイアウト装置において同じである。
【００２９】
また、付帯情報記憶手段は、付帯情報をあらゆる手段でかつあらゆる時期に記憶するものであり、付帯情報をあらかじめ記憶してあるものであってもよいし、付帯情報をあらかじめ記憶することなく、本装置の動作時に外部からの入力等によって付帯情報を記憶するようになっていてもよい。
〔発明１１〕
さらに、発明１１の画像レイアウト装置は、発明１０の画像レイアウト装置において、
さらに、前記識別情報を生成する識別情報生成手段を備え、
前記付帯情報取得手段は、前記付帯情報を前記動画像から取得し、取得した付帯情報を、前記識別情報生成手段で生成した識別情報と対応付けて前記付帯情報記憶手段に記憶するようになっていることを特徴とする。
【００３０】
このような構成であれば、識別情報生成手段により、識別情報が生成され、付帯情報取得手段により、付帯情報が動画像から取得され、取得された付帯情報が、生成された識別情報と対応付けられて付帯情報記憶手段に記憶される。
〔発明１２〕
さらに、発明１２の画像レイアウト装置は、発明１ないし９のいずれかの画像レイアウト装置において、
さらに、前記動画像のうち前記選択画像を含むものを識別情報と対応付けて記憶する動画像記憶手段と、前記動画像記憶手段の動画像を提供する動画像提供手段とを備え、
前記レイアウト手段は、前記選択画像を含む動画像に対応する識別情報を含む参照情報および前記選択画像をレイアウトするようになっており、
前記動画像提供手段は、前記参照情報に基づくアクセスがあったときは、当該参照情報に含まれる識別情報に対応する動画像を前記動画像記憶手段から読み出し、読み出した動画像をアクセス元に提供するようになっていることを特徴とする。
【００３１】
このような構成であれば、レイアウト手段により、選択画像を含む動画像に対応する識別情報を含む参照情報と、選択画像とがレイアウトされる。
一方、参照情報に基づくアクセスがあると、動画像提供手段により、その参照情報に含まれる識別情報に対応する動画像が動画像記憶手段から読み出され、読み出された動画像がアクセス元に提供される。
【００３２】
ここで、動画像記憶手段は、動画像をあらゆる手段でかつあらゆる時期に記憶するものであり、動画像をあらかじめ記憶してあるものであってもよいし、動画像をあらかじめ記憶することなく、本装置の動作時に外部からの入力等によって動画像を記憶するようになっていてもよい。
〔発明１３〕
さらに、発明１３の画像レイアウト装置は、発明１２の画像レイアウト装置において、
さらに、前記識別情報を生成する識別情報生成手段と、動画像を生成する動画像生成手段とを備え、
前記動画像生成手段は、前記動画像のうち前記選択画像を含むものを生成し、生成した動画像を、前記識別情報生成手段で生成した識別情報と対応付けて前記動画像記憶手段に記憶するようになっていることを特徴とする。
【００３３】
このような構成であれば、識別情報生成手段により、識別情報が生成され、動画像生成手段により、動画像のうち選択画像を含むものが生成され、生成された動画像が、生成された識別情報と対応付けられて動画像記憶手段に記憶される。
〔発明１４〕
さらに、発明１４の画像レイアウト装置は、発明１０ないし１３のいずれかの画像レイアウト装置において、
前記参照情報は、ＵＲＬであることを特徴とする。
【００３４】
このような構成であれば、レイアウト手段により、識別情報を含むＵＲＬと、選択画像とがレイアウトされる。したがって、レイアウト結果からＵＲＬを参照し、そのＵＲＬに基づいて画像レイアウト装置にアクセスすることができる。
〔発明１５〕
さらに、発明１５の画像レイアウト装置は、発明１０ないし１３のいずれかの画像レイアウト装置において、
前記参照情報は、バーコードであることを特徴とする。
【００３５】
このような構成であれば、レイアウト手段により、識別情報を含むバーコードと、選択画像とがレイアウトされる。したがって、レイアウト結果からバーコードを参照し、そのバーコードに基づいて画像レイアウト装置にアクセスすることができる。
〔発明１６〕
さらに、発明１６の画像レイアウト装置は、発明１０ないし１５のいずれかの画像レイアウト装置において、
前記参照情報は、広告情報を含むことを特徴とする。
【００３６】
このような構成であれば、レイアウト手段により、識別情報および広告情報を含む参照情報と、選択画像とがレイアウトされる。
〔発明１７〕
さらに、発明１７の画像レイアウト装置は、発明１ないし１６のいずれかの画像レイアウト装置において、
さらに、前記レイアウト手段のレイアウト結果に基づいて印刷を行う印刷手段を備えることを特徴とする。
【００３７】
このような構成であれば、印刷手段により、レイアウト手段のレイアウト結果に基づいて印刷が行われる。
〔発明１８〕
一方、上記目的を達成するために、発明１８の画像レイアウトプログラムは、
動画像を構成する複数の静止画像のなかから画像を選択し、選択した画像をレイアウトするプログラムであって、
前記動画像に付帯する付帯情報を前記動画像から取得する付帯情報取得手段、前記複数の静止画像のなかから画像を選択する画像選択手段、および前記付帯情報取得手段で取得した付帯情報に基づいて前記画像選択手段で選択した選択画像をレイアウトするレイアウト手段として実現される処理をコンピュータに実行させるためのプログラムであることを特徴とする。
【００３８】
このような構成であれば、コンピュータによってプログラムが読み取られ、読み取られたプログラムに従ってコンピュータが処理を実行すると、発明１の画像レイアウト装置と同等の作用が得られる。
〔発明１９〕
一方、上記目的を達成するために、発明１９の画像レイアウト方法は、
動画像を構成する複数の静止画像のなかから画像を選択し、選択した画像をレイアウトする方法であって、
前記動画像に付帯する付帯情報を前記動画像から取得する付帯情報取得ステップと、前記複数の静止画像のなかから画像を選択する画像選択ステップと、前記付帯情報取得ステップで取得した付帯情報に基づいて前記画像選択ステップで選択した選択画像をレイアウトするレイアウトステップとを含むことを特徴とする。
【００３９】
【発明の実施の形態】
以下、本発明の第１の実施の形態を図面を参照しながら説明する。図１ないし図１０は、本発明に係る画像レイアウト装置および画像レイアウトプログラム、並びに画像レイアウト方法の第１の実施の形態を示す図である。
本実施の形態は、本発明に係る画像レイアウト装置および画像レイアウトプログラム、並びに画像レイアウト方法を、ディジタルビデオカメラで撮影した動画像のなかから画像を選択し、選択した画像を自動レイアウトする場合について適用したものである。なお、ディジタルビデオカメラで撮影した動画像には、音声情報およびテキスト情報が付帯している。
【００４０】
本発明は、「視覚の誘導場」という概念を画像のレイアウト評価に用いて、最適なレイアウトとなる画像を選択すること、および選択画像を見栄えのよくレイアウトすることを実現するものである。まず、視覚の誘導場について簡単に説明する。
視覚の誘導場は、例えば、文字列上に存在する個々の文字の読み易さなどの評価を行うことで、その文字列全体の読み易さの指標などとして用いられている。
【００４１】
最初に、生理学および心理学的な知見に基づいた文字画像の視覚の誘導場の推定を行う例として、電子化によって得られた文字のディジタル画像から視覚の誘導場を推定する方法について説明する。
なお、文字列内の個々の文字が読み易い状態とは、個々の文字を囲む視覚の誘導場が、できるだけ干渉し合わないような間隔で配置されていることであるとされている。具体的には、個々の文字を囲む視覚の誘導場の閉曲線を考えたとき、その閉曲線のポテンシャル値が高いと他の文字との分離が難しく、読みにくいということである。このことから、視覚の誘導場の広がりを基準に、文字列内の個々の文字の読み易さを定量的に評価できると考えられる。なお、視覚の誘導場については、横瀬善正著の「形の心理学」（名古屋大学出版会（１９８６））（以下、これを参考論文という。）に記載されている。
【００４２】
参考論文に示された視覚の誘導場（以下、単に誘導場と略記する。）とは、図形の周囲に波及する「場」を考えることにより、視覚現象を説明するものである。参考論文は、直線・円弧で構成された図形を対象としているため、任意のディジタル画像の誘導場は求められない。ここでは、最初に白黒２値のディジタル画像における誘導場の計算方法を示す。
【００４３】
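Because the induction field can basically be interpreted as a Coulomb potential, the pixels forming the outline of a pattern are assumed to be point charges, and the distribution of the induction field in a digital image is computed by accumulating the Coulomb potentials they produce.
FIG. 1 shows the pixel array of a digital image. As shown in FIG. 1, suppose that a curve f(s) made up of n points forms an induction field at an arbitrary point P. The curve f(s) corresponds to a line segment of a line figure or the contour of a drawn figure. Each point p_1, p_2, ..., p_i, ..., p_n on the curve f(s) is assumed to be a point charge of positive charge 1. Scanning the curve f(s) from point P finds the n points p_1, p_2, ..., p_i, ..., p_n, and the distance from P to each point found is denoted r_i. The strength M_xy of the induction field at point P can then be defined by equation (1) below. The subscript xy of M_xy denotes the x and y coordinates of point P in the image.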
【００４４】
【数１】
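The image of equation (1) is not reproduced in this text. Given the definitions in the preceding paragraph (n unit point charges p_i on the curve f(s), each at distance r_i from P, producing a Coulomb-type potential), one form consistent with them is sketched below. The 1/n normalization is an assumption and should be checked against the published equation image.

```latex
M_{xy} = \frac{1}{n} \sum_{i=1}^{n} \frac{1}{r_i} \tag{1}
```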

【００４５】
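Equation (1) above gives the induction field of an arbitrary digital image. When there are several curves, the strength M_xy of the induction field at point P is the sum of the induction fields that the individual curves produce at P. Equation (1) is subject to the constraint that the sum is taken only over the parts directly struck by light emitted from point P. For example, if curves f_1(s), f_2(s), and f_3(s) are located relative to point P as shown in FIG. 2, the sum excludes the parts that cannot be seen from P, that is, the parts lying in the range Z hidden from P by curve f_1(s). In the example of FIG. 2, all of curve f_3(s) and part of curve f_2(s) are excluded. This is called the shielding condition.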
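The shielding condition can be illustrated with a small sketch on a binary pixel grid. This is not the implementation of the apparatus: it assumes that every "on" pixel acts as a unit charge and as an occluder, that visibility is tested by sampling the straight segment from P to each pixel, and that the field is the mean of 1/r over the visible pixels (the normalization is an assumption). The names `visible` and `induction_field` are hypothetical.

```python
import math


def visible(grid, p, q):
    """Return True if no 'on' pixel lies strictly between p and q.

    The segment is sampled once per step along its longer axis
    (an assumed, simple visibility test).
    """
    (x0, y0), (x1, y1) = p, q
    steps = max(abs(x1 - x0), abs(y1 - y0))
    for s in range(1, steps):
        t = s / steps
        x = round(x0 + (x1 - x0) * t)
        y = round(y0 + (y1 - y0) * t)
        if grid[y][x]:
            return False  # something blocks the line of sight
    return True


def induction_field(grid, px, py):
    """Mean of 1/r over the 'on' pixels visible from (px, py)."""
    total, n = 0.0, 0
    for y, row in enumerate(grid):
        for x, on in enumerate(row):
            if not on or (x, y) == (px, py):
                continue
            if visible(grid, (px, py), (x, y)):
                total += 1.0 / math.hypot(x - px, y - py)
                n += 1
    return total / n if n else 0.0


# The pixel at x=4 is hidden behind the pixel at x=2 when seen from x=0,
# so only the nearer pixel contributes.
row = [[0, 0, 1, 0, 1]]
field = induction_field(row, 0, 0)
```

Dropping the call to `visible` reproduces the unshielded case, in which every "on" pixel contributes regardless of what lies in front of it.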
【００４６】
図３（ａ）は、「Ａ」という文字について、上式（１）で計算した誘導場の例を示すものである。図３（ａ）の文字「Ａ」周辺に地図の等高線状に分布している細い線Ｌが誘導場の等ポテンシャル線であり、中央から外に行くほど誘導場の強さＭ_ｘｙは弱くなりやがて０に近づく。
図３（ａ）の誘導場の分布の形状・強さにおける特徴、特に「Ａ」の頂点付近の分布が他より鋭角な特徴は、参考論文による四角形や三角形など、図形の角付近に関する誘導場の分布の心理実験結果と一致する。
【００４７】
また、図３（ｂ）は、遮蔽条件がなく、画素すべてを正電荷１の点電荷と仮定した誘導場の例であるが、誘導場の分布は、全体的に丸くなり、参考論文による心理実験結果と異なったものとなる。このように、遮蔽条件は、誘導場を特徴づける上で重要なものとなる。
このようにして、ある文字についての誘導場を得ることができる。なお、視覚の誘導場を用いた技術の例としては、例えば、「長石道博：「視覚の誘導場を用いた読み易い和文プロポーショナル表示」、映像メディア学会誌、Ｖｏｌ．５２，Ｎｏ．１２，ｐｐ．１８６５−１８７２（１９９８）」や、「三好正純、下塩義文、古賀広昭、井手口健：「視覚の誘導場理論を用いた感性にもとづく文字配置の設計」、電子情報通信学会論文誌、８２−Ａ，９，１４６５−１４７３（１９９９）」がある。
【００４８】
本発明は、このような誘導場を利用し、文字や写真、絵、図形などからなるひとまとまりの画像について、そのレイアウトが最適なレイアウトであるか否かを評価し、それによって、これまで人間の直感や手作業に頼っていたレイアウト評価を自動的に行おうとするものである。
本実施の形態では、レイアウトの良し悪しを評価する際、レイアウト対象となるひとまとまりの画像を１つの誘導場計算対象とみなして、その誘導場を計算し、それによって求められた等ポテンシャル線の形状に基づいてレイアウトの良し悪しを評価する。
【００４９】
今、レイアウト対象となるひとまとまりの画像が図４に示されるように、文字列と写真からなる画像であるとする。図４に示される画像は、新聞記事の一部を示すもので、文字列部分Ｃと写真Ｐ１，Ｐ２からなり、図４に示されるレイアウトは、新聞紙面専門のデザイナによってなされたものであり、多くの人が見やすく内容の理解がし易いとされるレイアウトであるとする。
【００５０】
図４に示すように、ある限られた表示範囲にレイアウトされるひとまとまりの画像全体について、上式（１）を用いて誘導場を計算すると、求められた誘導場によって、図５のような等ポテンシャル線Ｌが描かれる。なお、このようなレイアウト対象となる情報全体について誘導場を計算する際、図４で示した文字列部分Ｃは、図５に示すように、それぞれの文字列を単純な線で表し、写真Ｐ１，Ｐ２は、その外形を矩形枠で表して誘導場を計算する。
【００５１】
これは、レイアウトが各要素の位置関係や大きさで決まるため、各要素を単純化して表現することができるからであり、このように、各要素を単純化して表現した状態で誘導場を計算し、求められた誘導場から等ポテンシャル線を描けば、その等ポテンシャル線は、そのレイアウト全体の等ポテンシャル線を表すことができる。
【００５２】
なお、図４に示すレイアウトは、専門のデザイナによってデザインされた見やすく内容の理解がし易いとされるレイアウトであり、このようにレイアウトされた画像全体から得られた等ポテンシャル線Ｌは、全体に凹凸が少なく丸みを帯びたものとなる。
このことから、レイアウト対象となるひとまとまりの画像全体について誘導場を計算し、それによって得られた等ポテンシャル線の形状から、その画像のレイアウトの良し悪しを判断することができる。つまり、得られた等ポテンシャル線の凹凸の度合いがわかれば、それによって当該画像のレイアウトが良いレイアウトであるかどうかの評価を行うことができる。
【００５３】
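In this embodiment, therefore, the degree of unevenness of an equipotential line is quantified as the complexity of that line, and the complexity is used as an index for evaluating the quality of the image layout. The fewer the irregularities and the rounder the equipotential line, the smaller the complexity; the more jagged the line, the larger the complexity. Writing C_i for the complexity of the i-th equipotential line, the complexity can be defined by equation (2) below, where L_i is the length of the i-th equipotential line and S_i is the area of the region it encloses. The length L_i can be taken as the number of dots making up the line, and the area S_i as the number of dots inside the region enclosed by the line.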
【００５４】
【数２】
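The image of equation (2) is not reproduced in this text. The surrounding description (complexity grows with the length L_i of the equipotential line and is smallest for a line close to a circle) is consistent with a squared-perimeter-to-area ratio, sketched below as an assumption to be checked against the published equation image.

```latex
C_i = \frac{L_i^{2}}{S_i} \tag{2}
```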

【００５５】
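According to equation (2) above, the longer the equipotential line drawn from the induction field computed for the group of images to be laid out (that is, the more jagged the line), the larger the complexity C_i. Conversely, the fewer the irregularities and the closer the line is to a circle, the smaller the complexity C_i.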
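As a rough numerical illustration of this behavior, the sketch below assumes the complexity takes the form L_i squared divided by S_i (an assumption, since the equation image is not available here) and compares a circle, a square, and a shape with the square's area but twice its perimeter. The function name `complexity` is hypothetical.

```python
import math


def complexity(length, area):
    """Complexity of one equipotential line, assumed to be L**2 / S."""
    if area <= 0:
        raise ValueError("area must be positive")
    return length ** 2 / area


# A circle of radius 10, a 10 x 10 square, and a jagged shape that keeps
# the square's area while doubling its perimeter.
circle = complexity(2 * math.pi * 10, math.pi * 10 ** 2)
square = complexity(4 * 10, 10 ** 2)
jagged = complexity(80, 100)
```

Under this assumed form the circle gives the smallest value and the jagged outline the largest, which matches the ordering described in the text.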
ここで、図４で示したひとまとまりの画像を図６で示すように色々なレイアウトとしたときのそれぞれの複雑度を計算してみる。図６では、図５と同様に、文字列部分Ｃはそれぞれの文字列を単純な線で表し、写真Ｐ１，Ｐ２は単に矩形枠で表している。
【００５６】
図６において、同図（ａ）は、図４と同じレイアウト（これをレイアウトＡ１という。）であり、同図（ｂ）は、図４の写真Ｐ２を文字列の中に配置したレイアウト（これをレイアウトＡ２という。）、同図（ｃ）は、写真Ｐ１が右下、写真Ｐ２が左上となっているレイアウト（これをレイアウトＡ３という。）、同図（ｄ）は、２つの写真Ｐ１，Ｐ２を文字列の中に配置したレイアウト（これをレイアウトＡ４という。）である。
【００５７】
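For each of these layouts, the induction field was first computed, and the complexity was then calculated by equation (2) above from the equipotential lines drawn from that field (the i-th equipotential line of each). The results are shown in FIG. 7, which plots the layouts A1 to A4 on the horizontal axis and the complexity obtained for each layout on the vertical axis.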
【００５８】
図７によれば、デザイナによってレイアウトされた読みやすく内容の理解のし易いとされるレイアウトＡ１（基準レイアウトＡ１という。）の複雑度が最も小さく、他の３つのレイアウトＡ２，Ａ３，Ａ４はいずれも、基準レイアウトＡ１に比べると、その複雑度は大きな値となっている。特に、この例においては、レイアウトＡ３が最も大きな複雑度となっている。
【００５９】
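As noted above, this is because the equipotential lines obtained from the reference layout A1 have few irregularities and are rounded overall, whereas those obtained from the other three layouts A2 to A4 are strongly uneven.
Using the equipotential lines, the energy E of the induction field over the whole image can also be defined by equation (3) below, where i denotes the i-th equipotential line, S_i the area of the region enclosed by the i-th equipotential line, and P_i the potential value on the i-th equipotential line. Viewing the induction field three-dimensionally, this corresponds to computing the volume of the field, and that volume is defined as the energy.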
【００６０】
【数３】
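The image of equation (3) is not reproduced in this text. From the symbols defined in the preceding paragraph (area S_i enclosed by the i-th equipotential line, potential value P_i on that line) and the description of the energy as the volume of the field, one consistent form is the following sum over the equipotential lines. It is offered as an assumption to be checked against the published equation image.

```latex
E = \sum_{i} S_i \, P_i \tag{3}
```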

【００６１】
以上は、新聞などの記事（多くは文字列と写真などからなる）の一部をレイアウト対象のひとまとまりの画像とし、そのひとまとまりの画像をレイアウトする場合についての評価を行った場合であるが、レイアウト対象の画像としては、一般的な画像を用いた場合の評価も同様に考えることができる。
次に、本発明に係る画像レイアウト装置の構成を図８を参照しながら説明する。図８は、本発明に係る画像レイアウト装置の構成を示す機能ブロック図である。
【００６２】
本発明に係る画像レイアウト装置は、図８に示すように、動画像を構成する複数の静止画像のなかから画像を選択してレイアウトするレイアウト部１００と、ユーザの好みに適合した画像その他特定画像の特徴を学習する学習部２００と、レイアウト条件その他の条件を入力する条件入力部３００とで構成されている。より具体的には、ＣＰＵ、ＲＯＭ、ＲＡＭおよびＩ／Ｆ等をバス接続した一般的なコンピュータとして構成し、ＣＰＵは、ＲＯＭの所定領域に格納されている所定のプログラムを起動させ、そのプログラムに従ってレイアウト部１００、学習部２００および条件入力部３００として実現される処理を実行する。
【００６３】
レイアウト部１００は、動画像を取得する動画像取得部１１０と、動画像取得部１１０で取得した動画像を構成する各静止画像について画像の特徴を示す画像特徴情報を抽出する画像特徴情報抽出部１２０と、ユーザの好みに適合した画像その他特定画像の特徴を示す画像特徴情報をユーザモデルとして記憶したユーザモデル記憶部１３０と、画像の評価値を算出する評価値算出部１４０と、動画像取得部１１０で取得した動画像を構成する複数の静止画像のなかから画像を選択する画像選択部１５０と、レイアウトの枠組みを構成する異なる複数のテンプレートを記憶したテンプレート記憶部１６０と、画像選択部１５０で選択した選択画像をレイアウトする画像レイアウト部１７０と、印刷を行う印刷部１８０と、表示を行う表示部１８５と、動画像取得部１１０で取得した動画像に付帯する音声情報に基づいて音声認識を行う音声認識部１９０と、音声認識部１９０の認識結果に基づいて要約を作成する要約作成部１９１と、動画像取得部１１０で取得した動画像からシーンの区切を検出するシーン区切検出部１９２とで構成されている。
【００６４】
動画像取得部１１０は、複数の動画像を記憶した動画像記憶媒体５０が与えられたときは、与えられた動画像記憶媒体５０からいずれかの動画像を取得するようになっている。ここで、動画像記憶媒体５０としては、例えば、ＦＤ、ＣＤ、ＭＯ、メモリカードその他のリムーバブルメモリがある。
画像特徴情報抽出部１２０は、動画像取得部１１０で取得した動画像を構成する各静止画像について、誘導場の強さＭ_ｘｙ、等ポテンシャル線の複雑度Ｃ_ｉ、誘導場のエネルギＥおよび画像を構成する各画素の三原色輝度値Ｎ_１ｘｙ，Ｎ_２ｘｙ，Ｎ_３ｘｙを画像特徴情報として抽出するようになっている。誘導場の強さＭ_ｘｙ、等ポテンシャル線の複雑度Ｃ_ｉおよび誘導場のエネルギＥは、静止画像を白黒２値化処理した画像に基づいて算出する。本実施の形態では、画像特徴情報に含まれる各特徴量Ｍ_ｘｙ，Ｃ_ｉ，Ｅ，Ｎ_１ｘｙ，Ｎ_２ｘｙおよびＮ_３ｘｙをそれぞれベクトルとして取り扱う。
【００６５】
ユーザモデル記憶部１３０は、複数のユーザモデルを記憶し、図９に示すように、ニューラルネットワーク４００により各ユーザモデルを記憶するようになっている。図９は、ニューラルネットワーク４００の構成を示す図である。なお、ユーザモデルとしては、ユーザの好みに適合した画像の特徴を示すユーザモデル、インパクトのある画像の特徴を示すユーザモデル、または特定画風の画像の特徴を示すユーザモデルが記憶されている。
【００６６】
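As shown in FIG. 9, the neural network 400 consists of i input-layer units I_i that receive the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E; j hidden-layer units H_j that receive the outputs of the input-layer units I_i; and an output-layer unit O_k that receives the outputs of the hidden-layer units H_j and outputs a preference value. The input layer I_i and the hidden layer H_j are connected by synapses with coupling coefficients W_ij, and the hidden layer H_j and the output layer O_k by synapses with coupling coefficients W_jk.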
【００６７】
また、ニューラルネットワーク４００は、後述の特徴学習部２３０によりユーザの好みに適合した画像その他特定画像の特徴を学習している。したがって、ユーザの好みに適合した画像その他特定画像から抽出した特徴量をニューラルネットワーク４００に入力したときは、嗜好値として比較的高い値が出力層Ｏ_ｋから出力され、ユーザの好みに適合しない画像その他特定画像以外の画像から抽出した特徴量をニューラルネットワーク４００に入力したときは、嗜好値として比較的低い値が出力層Ｏ_ｋから出力される。
【００６８】
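The evaluation value calculation section 140 selects, from the user model storage section 130, the user model that satisfies the evaluation value calculation condition entered through the evaluation value calculation condition input section 310 described later. It then obtains the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E from the image feature information extracted by the image feature information extraction section 120, inputs them to the neural network 400 of the selected user model, and takes the output value of the neural network 400 as the evaluation value. An evaluation value is calculated for each still image.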
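The evaluation step can be pictured as a forward pass through a network with one hidden layer. The sketch below is illustrative only: the sigmoid activation, the absence of bias terms, and the flattening of all feature quantities into one list are assumptions, and `preference_value` is a hypothetical name.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def preference_value(features, w_ij, w_jk):
    """Forward pass: input features -> hidden units H_j -> output O_k.

    w_ij[j] holds the weights from every input unit to hidden unit j;
    w_jk[j] is the weight from hidden unit j to the single output unit.
    """
    hidden = [sigmoid(sum(f * w for f, w in zip(features, ws))) for ws in w_ij]
    return sigmoid(sum(h * w for h, w in zip(hidden, w_jk)))


# With all weights at zero the network has no preference and returns 0.5.
neutral = preference_value([0.2, 0.4], [[0.0, 0.0], [0.0, 0.0]], [0.0, 0.0])
```

Calling the function once per still image yields the per-image evaluation values used by the later selection and layout steps.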
【００６９】
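Returning to FIG. 8, the scene break detection section 192 detects scene breaks in the moving image acquired by the moving image acquisition section 110 and outputs scene break information indicating those breaks to both the image selection section 150 and the voice recognition section 190. Scene breaks can be detected by a conventional scene extraction method, for example one based on changes in image luminance values.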
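One simple luminance-based approach, given here only as an assumed example of such a conventional method, is to flag a break wherever the mean luminance jumps by more than a threshold between consecutive frames. Frames are modelled as 2-D lists of luminance values; the threshold of 30 and both function names are hypothetical.

```python
def mean_luminance(frame):
    """Average luminance of a frame given as a 2-D list of pixel values."""
    pixels = [v for row in frame for v in row]
    return sum(pixels) / len(pixels)


def detect_scene_breaks(frames, threshold=30.0):
    """Return the indices of frames at which a new scene starts."""
    breaks = []
    prev = None
    for i, frame in enumerate(frames):
        cur = mean_luminance(frame)
        if prev is not None and abs(cur - prev) > threshold:
            breaks.append(i)
        prev = cur
    return breaks


# Two dark frames followed by two bright frames: one break at index 2.
frames = [[[10, 10]], [[12, 12]], [[200, 200]], [[198, 198]]]
scene_breaks = detect_scene_breaks(frames)
```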
【００７０】
音声認識部１９０は、シーン区切検出部１９２からのシーン区切情報に基づいて、動画像取得部１１０で取得した動画像のうち同一シーンまたは特定シーンに係るものに付帯する音声情報をその動画像から取得し、取得した音声情報に基づいて音声認識を行い、認識結果としてテキスト情報を要約作成部１９１に出力するようになっている。なお、同一シーンまたは特定シーンの指定は、ユーザが行うようにしてもよいし、自動的に行うようにしてもよいが、画像選択部１５０が対象とするシーンと一致させることが必要である。
【００７１】
要約作成部１９１は、音声認識部１９０からのテキスト情報に基づいて要約を作成し、作成した要約をテキスト情報として画像レイアウト部１７０に出力するようになっている。なお、要約の作成については従来の例による。
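The image selection section 150 uses the scene break information from the scene break detection section 192 to select, from the still images of the acquired moving image that belong to the same scene or a specified scene, a predetermined number of still images in descending order of the evaluation value calculated by the evaluation value calculation section 140. The selection must also satisfy the image selection condition entered through the image selection condition input section 320 described later. The scene may be specified by the user or automatically, but it must match the scene handled by the voice recognition section 190.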
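The selection rule described above amounts to a top-N pick within one scene. A minimal sketch, assuming frames are identified by index, the scene is a half-open index range, and `select_images` is a hypothetical name:

```python
def select_images(scores, scene_start, scene_end, count):
    """Pick `count` frame indices from [scene_start, scene_end),
    in descending order of evaluation value."""
    candidates = range(scene_start, min(scene_end, len(scores)))
    ranked = sorted(candidates, key=lambda i: scores[i], reverse=True)
    return ranked[:count]


# Evaluation values for six frames; the scene covers frames 1 to 4.
scores = [0.1, 0.9, 0.4, 0.8, 0.95, 0.2]
chosen = select_images(scores, 1, 5, 2)
```

Here `count` plays the role of the number of images given as the image selection condition.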
【００７２】
テンプレート記憶部１６０は、図１０に示すように、異なる複数のテンプレートを記憶するようになっている。図１０は、テンプレートの構造を示す図である。
各テンプレートは、選択画像を格納するための画像格納枠およびテキスト情報を格納するための文字格納枠を複数レイアウト領域に配置して構成されており、各画像格納枠には、選択画像を配置する優先順位が付されている。図１０（ａ）に示すテンプレートでは、優先順位として最も高い「１」を付した画像格納枠５０１ａがレイアウト領域上半分に大きく配置され、優先順位として「２」〜「５」を付した画像格納枠５０２ａ〜５０５ａがレイアウト領域下半分の４区画にそれぞれ小さく配置されている。これは、評価値が最も高い選択画像を画像格納枠５０１ａに格納し、次いで評価値が高い順に４つの選択画像を画像格納枠５０２ａ〜５０５ａにそれぞれ格納することを意味している。一方、同図（ａ）に示すテンプレートでは、画像格納枠５０１ａ〜５０５ａに対応する文字格納枠５０１ｂ〜５０５ｂが対応の画像格納枠と重畳して配置されている。
【００７３】
また、図１０（ｂ）に示すテンプレートでは、優先順位として「１」を付した画像格納枠５１１がレイアウト領域左半分に大きく配置され、優先順位として「２」〜「４」を付した画像格納枠５１２〜５１４がレイアウト領域右半分の３区画にそれぞれ小さく配置されている。これは、評価値が最も高い選択画像を画像格納枠５１１に格納し、次いで評価値が高い順に３つの選択画像を画像格納枠５１２〜５１４にそれぞれ格納することを意味している。一方、同図（ｂ）に示すテンプレートでは、画像格納枠５１１の下方に文字格納枠５１５が配置されている。
【００７４】
また、図１０（ｃ）に示すテンプレートでは、レイアウト領域を縦４つ横２つに区画し、優先順位として「１」〜「８」を付した画像格納枠５２１〜５２８が、左から右、次いで上から下の順に各区画に配置されている。これは、評価値が高い順に８つの選択画像を画像格納枠５２１〜５２８にそれぞれ格納することを意味している。一方、同図（ｃ）に示すテンプレートでは、画像格納枠５２７，５２８の下方に文字格納枠５２９が配置されている。
【００７５】
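Returning to FIG. 8, the image layout section 170 selects, from the template storage section 160, the template that satisfies the layout condition entered through the layout condition input section 330 described later. Each image selected by the image selection section 150 is then stored in an image storage frame of the selected template according to its evaluation value from the evaluation value calculation section 140. Specifically, each selected image is stored in the image storage frame whose priority matches the image's rank by evaluation value. The text information from the summary creation section 191 is stored in a character storage frame of the selected template. The selected images and the summary are laid out in this way.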
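The frame assignment can be sketched as ranking the selected images by evaluation value and pairing them with the template's image frames in priority order. The frame numbers below follow the template of FIG. 10(b) (image frames 511 to 514, character frame 515); the dictionary representation and the name `fill_template` are assumptions for illustration.

```python
def fill_template(image_frames, selected, summary, text_frame):
    """Assign images to frames by rank and place the summary text.

    image_frames: frame ids listed from priority 1 downwards.
    selected: list of (image_id, evaluation_value) pairs.
    """
    ranked = sorted(selected, key=lambda item: item[1], reverse=True)
    layout = {frame: image for frame, (image, _) in zip(image_frames, ranked)}
    layout[text_frame] = summary
    return layout


layout = fill_template(
    [511, 512, 513, 514],
    [("a", 0.3), ("b", 0.9), ("c", 0.5)],
    "summary text",
    515,
)
```

If fewer images than frames are supplied, the remaining frames simply stay empty in this sketch.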
【００７６】
印刷部１８０は、画像レイアウト部１７０でのレイアウト結果をプリンタ等で印刷するようになっている。これにより、ユーザは、画像レイアウト部１７０でのレイアウト結果を紙面にて確認することができる。
表示部１８５は、画像レイアウト部１７０でのレイアウト結果をディスプレイ等で表示するようになっている。これにより、ユーザは、画像レイアウト部１７０でのレイアウト結果を画面にて確認することができる。
【００７７】
学習部２００は、図８に示すように、動画像取得部１１０で取得した動画像を構成する複数の静止画像のなかからユーザによる画像の指定を入力する画像指定入力部２１０と、動画像取得部１１０で取得した動画像を構成する複数の静止画像のうち画像指定入力部２１０で入力した指定に係るものについて画像特徴情報を抽出する画像特徴情報抽出部２２０と、画像特徴情報抽出部２２０で抽出した画像特徴情報に基づいてユーザの好みに適合した画像その他特定画像の特徴を学習する特徴学習部２３０とで構成されている。
【００７８】
画像特徴情報抽出部２２０は、画像特徴情報抽出部１２０と同一機能を有して構成されており、動画像取得部１１０で取得した動画像を構成する複数の静止画像のうち画像指定入力部２１０で入力した指定に係るものについて、誘導場の強さＭ_ｘｙ、等ポテンシャル線の複雑度Ｃ_ｉ、誘導場のエネルギＥ、並びに画像を構成する各画素の三原色輝度値Ｎ_１ｘｙ，Ｎ_２ｘｙおよびＮ_３ｘｙを画像特徴情報として抽出するようになっている。
【００７９】
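The feature learning section 230 selects, from the user model storage section 130, the user model that satisfies the evaluation value calculation condition entered through the evaluation value calculation condition input section 310 described later. It then obtains the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E from the image feature information extracted by the image feature information extraction section 220 and, based on them, trains the neural network 400 of the selected user model by the known backpropagation method or another learning method. In training, the coupling coefficients W_ij and W_jk are determined so that a relatively high preference value is output from the output layer O_k when feature quantities extracted from a still image designated through the image designation input section 210 are input to the neural network 400. With backpropagation, for example, the coupling coefficients W_ij and W_jk are determined through forward and backward computation.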
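A single backpropagation step for a network of this shape can be sketched as follows. The sigmoid activation, squared-error loss, learning rate, lack of bias terms, and the target value of 1.0 for a designated image are all assumptions; `forward` and `train_step` are hypothetical names.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_ij, w_jk):
    """Forward computation: returns hidden activations and the output."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, ws))) for ws in w_ij]
    out = sigmoid(sum(h * w for h, w in zip(hidden, w_jk)))
    return hidden, out


def train_step(x, target, w_ij, w_jk, lr=0.5):
    """One gradient step on the squared error; updates weights in place."""
    hidden, out = forward(x, w_ij, w_jk)
    # Backward computation: error terms at the output and hidden units.
    delta_out = (out - target) * out * (1.0 - out)
    delta_hid = [delta_out * w * h * (1.0 - h) for w, h in zip(w_jk, hidden)]
    for j, h in enumerate(hidden):
        w_jk[j] -= lr * delta_out * h
    for j, ws in enumerate(w_ij):
        for i, xi in enumerate(x):
            ws[i] -= lr * delta_hid[j] * xi
    return out


# Train on one designated image so that its preference value rises.
w_ij = [[0.1, -0.2], [0.3, 0.4]]
w_jk = [0.2, -0.1]
x = [0.5, 0.8]
before = forward(x, w_ij, w_jk)[1]
for _ in range(200):
    train_step(x, 1.0, w_ij, w_jk)
after = forward(x, w_ij, w_jk)[1]
```

Repeating the step over every designated still image corresponds to the learning pass described in the text.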
【００８０】
条件入力部３００は、図８に示すように、評価値の算出に関する評価値算出条件を入力する評価値算出条件入力部３１０と、画像の選択に関する画像選択条件を入力する画像選択条件入力部３２０と、レイアウトに関するレイアウト条件を入力するレイアウト条件入力部３３０とで構成されている。
評価値算出条件入力部３１０は、ユーザモデル記憶部１３０のユーザモデルのうちいずれかを特定する内容を評価値算出条件として入力するようになっている。ユーザは、例えば、「ユーザの好みに適合した画像」、「インパクトのある画像」および「特定画風の画像」のなかから「ユーザの好みに適合した画像」を指定すると、その指定に対応するユーザモデル（ユーザの好みに適合した画像の特徴を示すユーザモデル）を特定する内容が評価値算出条件として入力される。この場合は、画像選択部１５０において、ユーザの好みに適合した画像が選択され、画像レイアウト部１７０において、ユーザの好みに適合した画像をレイアウトする場合に適切なレイアウトとなるようにレイアウトが決定される。
【００８１】
画像選択条件入力部３２０は、選択画像の枚数を特定する内容を画像選択条件として入力するようになっている。例えば、選択画像の枚数として「１０」が指定された場合は、画像選択部１５０において、動画像取得部１１０で取得した動画像を構成する複数の静止画像のなかから、評価値算出部１４０で算出した評価値が大きい順に１０枚の静止画像が選択される。
【００８２】
レイアウト条件入力部３３０は、直接印刷を行うか否か、印刷プレビューを行うか否か、印刷ページ数、およびテンプレート記憶部１６０のテンプレートのうちいずれかを特定する内容をレイアウト条件として入力するようになっている。例えば、直接印刷および印刷プレビューを行うことが、印刷ページ数として「３」が、テンプレートとして「テンプレート１」がそれぞれ指定された場合は、画像レイアウト部１７０において、テンプレート１に基づいて選択画像が３ページを上限としてレイアウトされ、表示部１８５において、画像レイアウト部１７０でのレイアウト結果が印刷プレビューされた後、印刷部１８０において、画像レイアウト部１７０でのレイアウト結果が直接印刷される。
【００８３】
次に、本実施の形態の動作を説明する。
初めに、ニューラルネットワーク４００を学習する場合を説明する。
ユーザの好みに適合した画像の特徴を示すユーザモデルについてそのニューラルネットワーク４００を学習する場合、ユーザは、まず、複数の動画像を記憶した動画像記憶媒体５０を動画像取得部１１０に与える。動画像記憶媒体５０が与えられると、動画像取得部１１０により、与えられた動画像記憶媒体５０からいずれかの動画像が取得される。
【００８４】
次に、ユーザは、「ユーザの好みに適合した画像」を評価値算出条件として指定するとともに、動画像取得部１１０で取得された動画像を構成する複数の静止画像のなかから自己の好みに適合したものをいくつか指定する。これらの指定は、評価値算出条件入力部３１０および画像指定入力部２１０に入力する。
「ユーザの好みに適合した画像」が指定されると、特徴学習部２３０により、ユーザモデル記憶部１３０のなかから、ユーザの好みに適合したユーザモデルが学習対象として選択される。
【００８５】
一方、静止画像の指定が入力されると、画像特徴情報抽出部２２０により、動画像取得部１１０で取得された動画像を構成する複数の静止画像のうち入力された指定に係るものについて、特徴量Ｍ_ｘｙ，Ｎ_１ｘｙ，Ｎ_２ｘｙ，Ｎ_３ｘｙ，Ｃ_ｉおよびＥが画像特徴情報として抽出される。そして、特徴学習部２３０により、抽出された画像特徴情報から特徴量Ｍ_ｘｙ，Ｎ_１ｘｙ，Ｎ_２ｘｙ，Ｎ_３ｘｙ，Ｃ_ｉおよびＥを得て、得られた特徴量Ｍ_ｘｙ，Ｎ_１ｘｙ，Ｎ_２ｘｙ，Ｎ_３ｘｙ，Ｃ_ｉおよびＥに基づいて、選択されたユーザモデルに係るニューラルネットワーク４００が学習される。この一連の処理は、指定されたすべての静止画像について行われる。
【００８６】
なお、インパクトのある画像の特徴を示すユーザモデルについてそのニューラルネットワーク４００を学習する場合は、上記同様の要領で、「インパクトのある画像」を指定するとともに、動画像取得部１１０で取得した動画像を構成する複数の静止画像のなかからインパクトのあるものをいくつか指定すればよい。もちろん、画像を手動で指定するに限らず、インパクトのある画像に共通する画像特徴情報を求めておき、その画像特徴情報と同一または類似の画像特徴情報を有する画像を自動的に指定するようにしてもよい。
【００８７】
また、特定画風の画像の特徴を示すユーザモデルについてそのニューラルネットワーク４００を学習する場合は、上記同様の要領で、「特定画風の画像」を指定するとともに、動画像取得部１１０で取得した動画像を構成する複数の静止画像のなかから特定画風のものをいくつか指定すればよい。もちろん、画像を手動で指定するに限らず、特定画風の画像に共通する画像特徴情報を求めておき、その画像特徴情報と同一または類似の画像特徴情報を有する画像を自動的に指定するようにしてもよい。
【００８８】
次に、画像をレイアウトする場合を説明する。
ユーザの好みに適合した画像をレイアウトする場合、ユーザは、まず、複数の動画像を記憶した動画像記憶媒体５０を動画像取得部１１０に与える。動画像記憶媒体５０が与えられると、動画像取得部１１０により、与えられた動画像記憶媒体５０からいずれかの動画像が取得される。そして、シーン区切検出部１９２により、取得された動画像からシーンの区切が検出され、シーンの区切を示すシーン区切情報が画像選択部１５０および音声認識部１９０にそれぞれ出力される。
【００８９】
次に、ユーザは、「ユーザの好みに適合した画像」を評価値算出条件として指定するとともに、所望のテンプレートをレイアウト条件として指定する。これらの指定は、例えば、ディフォルト設定にしておくことで省略することもできる。また同時に、必要があれば、画像選択条件およびその他のレイアウト条件を指定することもできる。
【００９０】
「ユーザの好みに適合した画像」が指定されると、評価値算出部１４０により、ユーザモデル記憶部１３０のなかから、ユーザの好みに適合したユーザモデルが選択される。このユーザモデルは、評価値の算出に用いられる。また、テンプレートが指定されると、画像レイアウト部１７０により、テンプレート記憶部１６０のなかから、ユーザが指定したテンプレートが選択される。このテンプレートは、選択画像のレイアウトに用いられる。
【００９１】
一方、画像特徴情報抽出部１２０により、取得された動画像を構成する各静止画像について特徴量Ｍ_ｘｙ，Ｎ_１ｘｙ，Ｎ_２ｘｙ，Ｎ_３ｘｙ，Ｃ_ｉおよびＥが画像特徴情報として抽出される。次いで、評価値算出部１４０により、抽出された画像特徴情報から特徴量Ｍ_ｘｙ，Ｎ_１ｘｙ，Ｎ_２ｘｙ，Ｎ_３ｘｙ，Ｃ_ｉおよびＥを得て、得られた特徴量Ｍ_ｘｙ，Ｎ_１ｘｙ，Ｎ_２ｘｙ，Ｎ_３ｘｙ，Ｃ_ｉおよびＥが、選択されたユーザモデルに係るニューラルネットワーク４００に入力され、その入力に伴って出力されるニューラルネットワーク４００からの出力値が評価値として算出される。この一連の処理は、動画像取得部１１０で取得された動画像を構成するすべての静止画像について行われる。
【００９２】
次いで、画像選択部１５０により、シーン区切検出部１９２からのシーン区切情報に基づいて、取得された動画像を構成する複数の静止画像のうち同一シーンまたは特定シーンに属するもののなかから、評価値が大きい順に所定数の静止画像が選択される。
一方、音声認識部１９０により、シーン区切検出部１９２からのシーン区切情報に基づいて、取得された動画像のうち同一シーンまたは特定シーンに係るものに付帯する音声情報がその動画像から取得され、取得された音声情報に基づいて音声認識が行われ、認識結果としてテキスト情報が要約作成部１９１に出力される。次いで、要約作成部１９１により、音声認識部１９０からのテキスト情報に基づいて要約が作成され、作成された要約がテキスト情報として画像レイアウト部１７０に出力される。
【００９３】
画像の選択および要約の作成が行われると、画像レイアウト部１７０により、選択画像がその評価値に基づいてレイアウトされるとともに要約がレイアウトされる。レイアウトでは、選択されたテンプレートにおいて、選択画像が、その評価値と一致する優先順位が付された画像格納枠に格納される。また、選択されたテンプレートにおいて、要約作成部１９１からのテキスト情報が文字格納枠に格納される。そして、レイアウト条件として印刷プレビューを行うことが指定されていれば、表示部１８５により、画像レイアウト部１７０でのレイアウト結果がディスプレイ等で印刷プレビューされ、レイアウト条件として直接印刷を行うことが指定されていれば、印刷部１８０により、画像レイアウト部１７０でのレイアウト結果がプリンタ等で直接印刷される。
【００９４】
なお、インパクトのある画像をレイアウトする場合は、上記同様の要領で、「インパクトのある画像」を指定するとともに、所望のテンプレートをレイアウト条件として指定すればよい。
また、特定画風の画像をレイアウトする場合は、上記同様の要領で、「特定画風の画像」を指定するとともに、所望のテンプレートをレイアウト条件として指定すればよい。
【００９５】
このようにして、本実施の形態では、動画像に付帯する音声情報に基づいて音声認識を行いその認識結果であるテキスト情報を出力する音声認識部１９０と、動画像を構成する複数の静止画像のなかから画像を選択する画像選択部１５０と、音声認識部１９０からのテキスト情報および画像選択部１５０で選択した選択画像をレイアウトする画像レイアウト部１７０とを備える。
【００９６】
これにより、音声認識結果および選択画像がレイアウトされるので、選択画像だけでなく動画像に付帯する音声もレイアウトに反映させることができる。したがって、従来に比して、比較的内容の充実した編集物を作成することができる。さらに、本実施の形態では、画像選択部１５０は、ユーザの好みに適合した画像を複数の静止画像のなかから選択するようになっている。
【００９７】
これにより、ユーザの好みに比較的沿った内容の編集物を作成することができる。
さらに、本実施の形態では、さらに、音声認識部１９０からのテキスト情報に基づいて要約を作成する要約作成部１９１を備え、画像レイアウト部１７０は、要約作成部１９１で作成した要約および選択画像をレイアウトするようになっている。
【００９８】
これにより、動画像に付帯する音声情報に関する要約を併せてレイアウトすることができるので、さらに内容の充実した編集物を作成することができるとともに、音声情報に係るレイアウト部分が比較的簡潔明瞭となり、読みやすくなる。さらに、本実施の形態では、さらに、動画像からシーンの区切を検出するシーン区切検出部１９２を備え、画像選択部１５０は、シーン区切検出部１９２からのシーン区切情報に基づいて、動画像取得部１１０で取得した動画像を構成する複数の静止画像のうち同一シーンまたは特定シーンに属するもののなかから、評価値算出部１４０で算出した評価値が大きい順に所定数の静止画像を選択するようになっている。
【００９９】
これにより、動画像が複数のシーンからなる場合は、各シーンごとに画像を選択することができるので、比較的詳細に編集を行うことができる。
さらに、本実施の形態では、さらに、動画像からシーンの区切を検出するシーン区切検出部１９２を備え、音声認識部１９０は、シーン区切検出部１９２からのシーン区切情報に基づいて、動画像取得部１１０で取得した動画像のうち同一シーンまたは特定シーンに係るものに付帯する音声情報をその動画像から取得し、取得した音声情報に基づいて音声認識を行い、認識結果としてテキスト情報を要約作成部１９１に出力するようになっている。
【０１００】
これにより、動画像が複数のシーンからなる場合は、各シーンごとに音声情報を取得することができるので、比較的詳細に編集を行うことができる。
さらに、本実施の形態では、動画像を構成する各静止画像について画像の特徴を示す画像特徴情報を抽出する画像特徴情報抽出部１２０と、画像特徴情報抽出部１２０で抽出した画像特徴情報に基づいて画像の評価値を算出する評価値算出部１４０と、動画像を構成する複数の静止画像のなかから画像を選択する画像選択部１５０と、評価値算出部１４０で算出した評価値に基づいて画像選択部１５０で選択した選択画像のレイアウトを決定する画像レイアウト部１７０とを備えている。
【０１０１】
これにより、画像の内容に応じてレイアウトを決定することができるので、画像の内容に応じて比較的見栄えのよいレイアウトを実現することができる。
さらに、本実施の形態では、ユーザの好みに適合した画像特徴情報をユーザモデルとして記憶するためのユーザモデル記憶部１３０を備え、評価値算出部１４０は、画像特徴情報抽出部１２０で抽出した画像特徴情報およびユーザモデル記憶部１３０のユーザモデルに基づいて、評価値を算出するようになっている。
【０１０２】
これにより、ユーザの好みに比較的適合したレイアウトとなるようにレイアウトを決定することができるので、ユーザにとって比較的見栄えのよいレイアウトを実現することができる。また、ユーザの好みに比較的適合したレイアウトの画像を選択することができる。
さらに、本実施の形態では、ユーザモデルは、ユーザの好みに適合した画像について視覚の誘導場の強さＭ_ｘｙを示す誘導場特徴量を含み、画像特徴情報抽出部１２０は、各静止画像ごとに、その静止画像について視覚の誘導場の強さＭ_ｘｙを算出し、算出した視覚の誘導場の強さＭ_ｘｙを示す誘導場特徴量を含む画像特徴情報を抽出するようになっている。
【０１０３】
これにより、生理学、心理学的な知見に基づく視覚の誘導場の強さＭ_ｘｙをレイアウトの決定に利用したことにより、ユーザの好みにさらに適合したレイアウトとなるようにレイアウトを決定することができる。したがって、ユーザにとってさらに見栄えのよいレイアウトを実現することができる。また、ユーザの好みにさらに適合したレイアウトの画像を選択することができる。
【０１０４】
さらに、本実施の形態では、ユーザモデルは、ユーザの好みに適合した画像について視覚の誘導場における等ポテンシャル線の複雑度Ｃ_ｉを示す複雑度特徴量を含み、画像特徴情報抽出部１２０は、各静止画像ごとに、その静止画像について視覚の誘導場を算出し、算出した視覚の誘導場から等ポテンシャル線を得て、その等ポテンシャル線の複雑度Ｃ_ｉを示す複雑度特徴量を含む画像特徴情報を抽出するようになっている。
【０１０５】
これにより、生理学、心理学的な知見に基づく視覚の誘導場における等ポテンシャル線の複雑度Ｃ_ｉをレイアウトの決定に利用したことにより、ユーザの好みにさらに適合したレイアウトとなるようにレイアウトを決定することができる。したがって、ユーザにとってさらに見栄えのよいレイアウトを実現することができる。また、ユーザの好みにさらに適合したレイアウトの画像を選択することができる。
【０１０６】
さらに、本実施の形態では、ユーザモデルは、ユーザの好みに適合した画像について視覚の誘導場のエネルギＥを示すエネルギ特徴量を含み、画像特徴情報抽出部１２０は、各静止画像ごとに、その静止画像について視覚の誘導場のエネルギＥを算出し、算出した視覚の誘導場のエネルギＥを示すエネルギ特徴量を含む画像特徴情報を抽出するようになっている。
【０１０７】
これにより、生理学、心理学的な知見に基づく視覚の誘導場のエネルギＥをレイアウトの決定に利用したことにより、ユーザの好みにさらに適合したレイアウトとなるようにレイアウトを決定することができる。したがって、ユーザにとってさらに見栄えのよいレイアウトを実現することができる。また、ユーザの好みにさらに適合したレイアウトの画像を選択することができる。
【０１０８】
さらに、本実施の形態では、画像選択部１５０は、評価値算出部１４０で算出した評価値に基づいて、動画像を構成する複数の静止画像のなかから画像を選択するようになっている。
これにより、画像の特徴に関する評価値に応じて画像を選択することができるので、比較的見栄えのよい画像を選択することができる。
【０１０９】
さらに、本実施の形態では、評価値の算出に関する評価値算出条件を入力する評価値算出条件入力部３１０を備え、評価値算出部１４０は、評価値算出条件入力部３１０で入力した評価値算出条件および画像特徴情報抽出部１２０で抽出した画像特徴情報に基づいて、評価値を算出するようになっている。
これにより、評価値の算出条件を指定することができるので、レイアウトの自由度を向上することができる。
【０１１０】
さらに、本実施の形態では、画像の選択に関する画像選択条件を入力する画像選択条件入力部３２０を備え、画像選択部１５０は、画像選択条件入力部３２０で入力した画像選択条件に基づいて、動画像を構成する複数の静止画像のなかから画像を選択するようになっている。
これにより、画像の選択条件を指定することができるので、画像選択の自由度を向上することができる。
【０１１１】
さらに、本実施の形態では、動画像を取得する動画像取得部１１０を備え、画像選択部１５０は、動画像取得部１１０で取得した動画像を構成する複数の静止画像のなかから画像を選択するようになっている。
これにより、外部の画像をレイアウト対象として取り扱うことができる。
上記第１の実施の形態において、画像選択部１５０は、発明１、２、６または１８の画像選択手段に対応し、画像選択部１５０による選択は、発明１９の画像選択ステップに対応し、画像レイアウト部１７０は、発明１、３、５、９、１７または１８のレイアウト手段に対応している。また、画像レイアウト部１７０によるレイアウトは、発明１９のレイアウトステップに対応し、印刷部１８０は、発明１７の印刷手段に対応し、音声認識部１９０は、発明１、３、５、７または１８の付帯情報取得手段に対応し、音声認識部１９０による取得は、発明１９の付帯情報取得ステップに対応している。
【０１１２】
また、上記第１の実施の形態において、要約作成部１９１は、発明５の要約作成手段に対応し、シーン区切検出部１９２は、発明６または７のシーン区切検出手段に対応し、音声情報は、発明１、３、７、９、１８または１９の付帯情報に対応している。
次に、本発明の第２の実施の形態を図面を参照しながら説明する。図１１および図１２は、本発明に係る画像レイアウト装置および画像レイアウトプログラム、並びに画像レイアウト方法の第２の実施の形態を示す図である。以下、上記第１の実施の形態と異なる部分についてのみ説明し、重複する部分については同一の符号を付して説明を省略する。
【０１１３】
本実施の形態は、本発明に係る画像レイアウト装置および画像レイアウトプログラム、並びに画像レイアウト方法を、図１１に示すように、ディジタルビデオカメラで撮影した動画像のなかから画像を選択し、選択した画像を自動レイアウトする場合について適用したものであり、上記第１の実施の形態と異なるのは、要約に代えて動画像および要約を参照するためのＵＲＬをレイアウトする点にある。
【０１１４】
まず、本発明に係る画像レイアウト装置の構成を図１１を参照しながら説明する。図１１は、本発明に係る画像レイアウト装置の構成を示す機能ブロック図である。
本発明に係る画像レイアウト装置は、インターネットに通信可能に接続し、図１１に示すように、レイアウト部１００と、学習部２００と、条件入力部３００と、外部からのアクセスに応じて動画像または要約を提供する情報提供部６００とで構成されている。
【０１１５】
レイアウト部１００は、動画像取得部１１０と、画像特徴情報抽出部１２０と、ユーザモデル記憶部１３０と、評価値算出部１４０と、画像選択部１５０と、テンプレート記憶部１６０と、画像選択部１５０で選択した選択画像をレイアウトする画像レイアウト部１７２と、印刷部１８０と、表示部１８５と、音声認識部１９０と、音声認識部１９０の認識結果に基づいて要約を作成する要約作成部１９３と、動画像取得部１１０で取得した動画像からシーンの区切を検出するシーン区切検出部１９４と、ユニークキーを生成するユニークキー生成部１９５と、動画像を生成する動画像生成部１９６と、動画像および要約をユニークキーと対応付けて記憶する動画像記憶部１９７とで構成されている。
【０１１６】
動画像記憶部１９７には、動画像生成部１９６で生成した動画像および要約作成部１９３で作成した要約がそれぞれファイルとして格納されているとともに、各ユニークキーごとに動画像、要約およびシーン区切情報の対応付けを登録するユニークキー別対応テーブル７００が格納されている。図１２は、ユニークキー別対応テーブル７００のデータ構造を示す図である。
【０１１７】
ユニークキー別対応テーブル７００には、図１２に示すように、各ユニークキーごとに１つのレコードが登録されている。各レコードは、ユニークキー生成部１９５で生成したユニークキーを登録するフィールド７０２と、動画像を格納した動画像ファイルのファイル名および格納場所を登録するフィールド７０４と、要約を格納したテキストファイルのファイル名を登録するフィールド７０６と、シーン区切情報を登録するフィールド７０８とを含んで構成されている。フィールド７０８は、さらに、シーンの区切として開始時刻を登録するフィールドと、シーンの区切として終了時刻を登録するフィールドとを含んで構成されている。
【０１１８】
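In the example of FIG. 12, the first record registers "001001" as the unique key, "d:\files\001.mpeg" as the file name and storage location of the moving image file, "Text1" as the file name of the text file, and "0:10 to 0:20" as the scene break information. This means that the portion of the moving image acquired by the moving image acquisition section 110 that belongs to the scene from start time 0:10 to end time 0:20 is stored in the moving image file "d:\files\001.mpeg", that a summary of the audio information attached to that scene is stored in the text file "Text1", and that both can be identified by the unique key "001001". Accordingly, accessing the image layout apparatus with the unique key "001001" yields the corresponding moving image and summary.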
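The table of FIG. 12 maps naturally onto a keyed record store. The sketch below is one possible in-memory representation, not the storage format of the apparatus. The field comments refer to fields 702 to 708 of the table; the class and function names are hypothetical, and the six-digit key value is used only as sample data.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRecord:
    unique_key: str    # field 702
    movie_file: str    # field 704: file name and storage location
    summary_file: str  # field 706: text file holding the summary
    scene_start: str   # field 708: start time of the scene
    scene_end: str     # field 708: end time of the scene


table = {}


def register(record):
    """Store one record under its unique key."""
    table[record.unique_key] = record


def lookup(unique_key):
    """Return the record for a key, or None if the key is unknown."""
    return table.get(unique_key)


register(KeyRecord("001001", r"d:\files\001.mpeg", "Text1", "0:10", "0:20"))
record = lookup("001001")
```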
【０１１９】
図１１に戻り、シーン区切検出部１９４は、動画像取得部１１０で取得した動画像からシーンの区切を検出し、シーンの区切を示すシーン区切情報を画像選択部１５０、音声認識部１９０および動画像生成部１９６にそれぞれ出力するようになっている。
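The unique key generation section 195 generates a unique key and outputs it to the summary creation section 193 and the moving image generation section 196. In the example of FIG. 12, the unique key is formed by taking the number assigned to the moving image file as the upper three digits and a serial number assigned in order of key generation as the lower three digits.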
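The key format described here (three digits for the file number followed by three digits for a running serial number) can be sketched as below. The closure-based generator and the name `make_key_generator` are assumptions, and the sketch does not handle serial numbers above 999.

```python
import itertools


def make_key_generator(start=1):
    """Return a function that builds keys from a file number and a
    serial number that increases with every key generated."""
    counter = itertools.count(start)

    def next_key(file_number):
        return f"{file_number:03d}{next(counter):03d}"

    return next_key


next_key = make_key_generator()
first = next_key(1)   # file 001, serial 001
second = next_key(1)  # file 001, serial 002
```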
【０１２０】
要約作成部１９３は、音声認識部１９０からのテキスト情報に基づいて要約を作成し、作成した要約を、ユニークキー生成部１９５で生成したユニークキーと対応付けて動画像記憶部１９７に記憶するようになっている。
動画像生成部１９６は、シーン区切検出部１９４からのシーン区切情報に基づいて、動画像取得部１１０で取得した動画像のうち同一シーンまたは特定シーンに係るものをその動画像から取得し、取得した動画像およびシーン区切情報を、ユニークキー生成部１９５で生成したユニークキーと対応付けて動画像記憶部１９７に記憶するようになっている。
【０１２１】
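The image layout section 172 selects, from the template storage section 160, the template that satisfies the layout condition entered through the layout condition input section 330. Each image selected by the image selection section 150 is then stored in an image storage frame of the selected template according to its evaluation value from the evaluation value calculation section 140. Specifically, each selected image is stored in the image storage frame whose priority matches the image's rank by evaluation value. The section also reads from the moving image storage section 197 the unique key associated with each selected image and stores a URL containing that key (a URL indicating the network address of the image layout apparatus) in a character storage frame of the selected template. The selected images and the URL are laid out in this way.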
【０１２２】
情報提供部６００は、図１１に示すように、動画像記憶部１９７の動画像を提供する動画像提供部６１０と、動画像記憶部１９７の要約を提供する要約提供部６２０とで構成されている。
動画像提供部６１０は、ＵＲＬに基づくアクセスがあったときは、そのＵＲＬに含まれるユニークキーに対応する動画像を動画像記憶部１９７から読み出し、読み出した動画像をアクセス元に提供するようになっている。
【０１２３】
要約提供部６２０は、ＵＲＬに基づくアクセスがあったときは、そのＵＲＬに含まれるユニークキーに対応する要約のテキスト情報を動画像記憶部１９７から読み出し、読み出したテキスト情報をアクセス元に提供するようになっている。
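Next, the operation of this embodiment will be described.
To lay out images that suit the user's preferences, the user first supplies the moving image acquisition unit 110 with a moving image storage medium 50 that stores a plurality of moving images. When the moving image storage medium 50 is supplied, the moving image acquisition unit 110 acquires one of the moving images from it. The scene division detection unit 194 then detects scene divisions in the acquired moving image and outputs scene division information indicating the scene divisions to the image selection unit 150, the voice recognition unit 190, and the moving image generation unit 196.
[0124]
Next, the user specifies "images that suit the user's preferences" as the evaluation value calculation condition and specifies a desired template as the layout condition. These specifications can be omitted, for example, by making them the default settings. At the same time, the user can also specify image selection conditions and other layout conditions if necessary.
[0125]
When "images that suit the user's preferences" is specified, the evaluation value calculation unit 140 selects, from the user model storage unit 130, the user model that matches the user's preferences. This user model is used to calculate the evaluation values. When a template is specified, the image layout unit 172 selects the template specified by the user from the template storage unit 160. This template is used to lay out the selected images.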
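[0126]
Meanwhile, the image feature information extraction unit 120 extracts the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E as image feature information for each still image constituting the acquired moving image. The evaluation value calculation unit 140 then obtains these feature quantities from the extracted image feature information and inputs them to the neural network 400 of the selected user model; the value that the neural network 400 outputs in response is taken as the evaluation value. This series of processing is performed for every still image constituting the moving image acquired by the moving image acquisition unit 110.
[0127]
Next, based on the scene division information from the scene division detection unit 194, the image selection unit 150 selects a predetermined number of still images, in descending order of evaluation value, from among those still images of the acquired moving image that belong to the same scene or a specific scene.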
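The per-scene selection step can be sketched in Python as a filter followed by a sort. The representation of a still image as a (timestamp, evaluation value) pair and the function name `select_in_scene` are assumptions made for this illustration.

```python
from typing import List, Tuple

# (timestamp in seconds, evaluation value); this representation is assumed.
Still = Tuple[float, float]


def select_in_scene(stills: List[Still], scene_start: float,
                    scene_end: float, count: int) -> List[Still]:
    """Pick the `count` best-rated stills whose timestamps fall inside the scene."""
    in_scene = [s for s in stills if scene_start <= s[0] <= scene_end]
    # Descending order of evaluation value, then keep the first `count`.
    return sorted(in_scene, key=lambda s: s[1], reverse=True)[:count]


stills = [(5.0, 0.9), (10.0, 0.2), (12.0, 0.8), (15.0, 0.5), (19.0, 0.7), (25.0, 0.95)]
picked = select_in_scene(stills, 10.0, 20.0, 2)
```

Stills outside the scene are ignored even when they have higher evaluation values, which is what restricts the selection to the same scene or a specific scene.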
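Meanwhile, based on the scene division information from the scene division detection unit 194, the voice recognition unit 190 acquires, from the acquired moving image, the audio information attached to the portion belonging to the same scene or a specific scene, performs voice recognition on the acquired audio information, and outputs text information as the recognition result to the summary creation unit 193. Next, the unique key generation unit 195 generates a unique key, the summary creation unit 193 creates a summary based on the text information from the voice recognition unit 190, and the text information of the created summary is stored in the moving image storage unit 197 in association with the generated unique key. In addition, based on the scene division information from the scene division detection unit 194, the moving image generation unit 196 extracts from the acquired moving image the portion belonging to the same scene or a specific scene, and the extracted moving image and the scene division information are stored in the moving image storage unit 197 in association with the generated unique key.
[0128]
Once the images have been selected, the summary created, and the moving image generated, the image layout unit 172 lays out the selected images based on their evaluation values and also lays out the URL. In the layout, each selected image is stored, in the selected template, in the image storage frame whose priority matches its evaluation value. The unique key for the selected image is read from the moving image storage unit 197, and a URL containing the read unique key (for example, http://www.abcd.co.jp/files.cgi?id=0010001) is stored in a character storage frame of the selected template. If a print preview is specified as a layout condition, the display unit 185 shows a print preview of the layout result from the image layout unit 172 on a display or the like; if direct printing is specified as a layout condition, the printing unit 180 prints the layout result from the image layout unit 172 directly on a printer or the like.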
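Composing a URL of the form shown in the example can be sketched with Python's standard `urllib.parse` module. The function name `build_reference_url` is hypothetical; the base address and the `id` parameter name are taken from the example URL in the text.

```python
from urllib.parse import urlencode


def build_reference_url(base_url: str, unique_key: str) -> str:
    """Append the unique key as the `id` query parameter of the base URL."""
    # urlencode escapes the value, so keys with unusual characters stay valid.
    return f"{base_url}?{urlencode({'id': unique_key})}"


url = build_reference_url("http://www.abcd.co.jp/files.cgi", "0010001")
```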
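[0129]
Because the URL is printed on the edited material, a user who receives an edited material on which the layout result is printed can access the image layout apparatus from his or her own network terminal or the like by referring to the URL, and can thereby obtain the moving image and summary related to the images that appear in the edited material.
In the image layout apparatus, when there is an access based on the URL, the moving image providing unit 610 reads the moving image corresponding to the unique key contained in the URL from the moving image storage unit 197 and provides the read moving image to the access source. Likewise, the summary providing unit 620 reads the text information of the summary corresponding to the unique key contained in the URL from the moving image storage unit 197 and provides the read text information to the access source. The moving image or summary can be provided, for example, in the form of a web page.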
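The lookup performed on access can be sketched as follows, assuming the unique key arrives as the `id` query parameter as in the example URL. The dictionary `STORE` and the function `resolve` are hypothetical stand-ins for the moving image storage unit 197 and the providing units 610 and 620; a real server would return the file contents rather than the file names.

```python
from typing import Dict, Optional, Tuple
from urllib.parse import parse_qs, urlparse

# Hypothetical store: unique key -> (moving image file, summary text file).
STORE: Dict[str, Tuple[str, str]] = {
    "0010001": (r"d:\files\001.mpeg", "Text1"),
}


def resolve(url: str) -> Optional[Tuple[str, str]]:
    """Return the (movie, summary) pair for the key in the URL, or None."""
    query = parse_qs(urlparse(url).query)
    ids = query.get("id")
    if not ids:
        return None  # no unique key in the URL
    return STORE.get(ids[0])


found = resolve("http://www.abcd.co.jp/files.cgi?id=0010001")
```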
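[0130]
To lay out images with impact, the user specifies "images with impact" in the same manner as above and specifies a desired template as the layout condition.
To lay out images in a particular painting style, the user specifies "images in a particular painting style" in the same manner as above and specifies a desired template as the layout condition.
[0131]
Thus, this embodiment comprises the voice recognition unit 190, which performs voice recognition based on the audio information attached to a moving image and outputs text information as the recognition result; the image selection unit 150, which selects images from among the plurality of still images constituting the moving image; and the image layout unit 172, which lays out the selected images together with a URL containing the unique key for the selected images.
[0132]
As a result, the selected images are laid out together with a URL for obtaining the audio information and the like attached to the moving image, so that not only the selected images but also the audio attached to the moving image can be reflected in the layout. Accordingly, an edited material with richer content than before can be created.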
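Furthermore, this embodiment comprises the moving image storage unit 197, which stores the portion of the moving image that includes the selected image in association with a unique key, and the moving image providing unit 610, which provides the moving images in the moving image storage unit 197. When there is an access based on a URL, the moving image providing unit 610 reads the moving image corresponding to the unique key contained in the URL from the moving image storage unit 197 and provides the read moving image to the access source.
[0133]
As a result, the moving image that includes the selected image can be obtained by accessing the image layout apparatus based on the URL. In addition, because the moving image can be accessed by referring to the URL, obtaining the moving image is relatively easy.
Furthermore, this embodiment comprises the moving image storage unit 197, which stores the summary in association with a unique key, and the summary providing unit 620, which provides the summaries in the moving image storage unit 197. When there is an access based on a URL, the summary providing unit 620 reads the text information of the summary corresponding to the unique key contained in the URL from the moving image storage unit 197 and provides the read text information to the access source.
[0134]
As a result, the summary related to the selected image can be obtained by accessing the image layout apparatus based on the URL. In addition, because the summary can be accessed by referring to the URL, obtaining the summary is relatively easy. In the second embodiment, the image selection unit 150 corresponds to the image selecting means of Invention 1, 2, 6, or 18; the selection by the image selection unit 150 corresponds to the image selection step of Invention 19; and the image layout unit 172 corresponds to the layout means of Invention 1, 3, 5, 9, 10, 12, 17, or 18. The layout by the image layout unit 172 corresponds to the layout step of Invention 19; the printing unit 180 corresponds to the printing means of Invention 17; and the voice recognition unit 190 corresponds to the supplementary information acquiring means of Invention 1, 3, 5, 7, 11, or 18.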
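[0135]
Also in the second embodiment, the acquisition by the voice recognition unit 190 corresponds to the supplementary information acquisition step of Invention 19; the summary creation unit 193 corresponds to the summary creating means of Invention 5; the scene division detection unit 194 corresponds to the scene division detecting means of Invention 6 or 7; and the unique key generation unit 195 corresponds to the identification information generating means of Invention 11 or 13. The moving image generation unit 196 corresponds to the moving image generating means of Invention 13; the moving image storage unit 197 corresponds to the moving image storage means of Invention 12 or 13, or to the supplementary information storage means of Invention 10 or 11; and the moving image providing unit 610 corresponds to the moving image providing means of Invention 12.
[0136]
Also in the second embodiment, the summary providing unit 620 corresponds to the supplementary information providing means of Invention 10; the audio information corresponds to the supplementary information of Invention 1, 3, 7, 9 to 11, 18, or 19; the unique key corresponds to the identification information of Inventions 10 to 13; and the URL corresponds to the reference information of Invention 10, 12, or 14.
In the first and second embodiments, the voice recognition unit 190 is provided to perform voice recognition based on the audio information attached to the moving image and output text information as the recognition result. However, the configuration is not limited to this: when text information such as clips is attached to the moving image, a text information acquisition unit that acquires the text information attached to the moving image from the moving image may be provided in place of, or together with, the voice recognition unit 190. In that case, the text information acquired by the text information acquisition unit may be handled in the same way as the text information from the voice recognition unit 190.
[0137]
In this case, the text information acquisition unit corresponds to the supplementary information acquiring means of Invention 4, the image layout units 170 and 172 correspond to the layout means of Invention 4, and the text information corresponds to the supplementary information of Invention 4.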
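In the second embodiment, a URL is used as the information for referring to the moving image or summary; however, a barcode may be used instead.
[0138]
This makes the summary or moving image accessible by referring to the barcode, so that obtaining the summary or moving image is relatively easy.
In the second embodiment, a URL containing the unique key and the network address of the image layout apparatus is laid out; however, the URL may additionally contain advertisement information.
[0139]
In this case, the URL corresponds to the reference information of Invention 16.
In the first and second embodiments, the scene division detection units 192 and 194 output scene division information to the voice recognition unit 190, and the voice recognition unit 190, based on that scene division information, acquires from the moving image acquired by the moving image acquisition unit 110 the audio information attached to the portion belonging to the same scene or a specific scene, performs voice recognition on the acquired audio information, and outputs text information as the recognition result to the summary creation units 191 and 193. However, the configuration is not limited to this: the scene division detection units 192 and 194 may output the scene division information to the summary creation units 191 and 193, and the summary creation units 191 and 193 may, based on that scene division information, create a summary from the text information for the portion of the moving image acquired by the moving image acquisition unit 110 that belongs to the same scene or a specific scene.
[0140]
This allows a summary to be created for each scene when the moving image consists of multiple scenes, so that editing can be performed in relatively fine detail.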
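The per-scene summarization variant can be illustrated by grouping recognized text by scene before summarizing. This Python sketch rests on several assumptions: recognized text is available as timestamped utterances, a scene is a half-open (start, end) interval, and `summarize` is a trivial placeholder that keeps the first sentence, since the specification does not describe the summarization method.

```python
from typing import Dict, List, Tuple

Scene = Tuple[float, float]    # (start, end) in seconds, end exclusive (assumed)
Utterance = Tuple[float, str]  # (timestamp, recognized text) (assumed)


def group_text_by_scene(scenes: List[Scene],
                        utterances: List[Utterance]) -> Dict[Scene, List[str]]:
    """Collect the recognized text that falls inside each scene."""
    grouped: Dict[Scene, List[str]] = {scene: [] for scene in scenes}
    for time, text in utterances:
        for start, end in scenes:
            if start <= time < end:
                grouped[(start, end)].append(text)
                break  # an utterance belongs to at most one scene
    return grouped


def summarize(texts: List[str], max_sentences: int = 1) -> str:
    """Placeholder summarizer: keeps the first sentences only."""
    return " ".join(texts[:max_sentences])


scenes = [(0.0, 10.0), (10.0, 20.0)]
utterances = [(1.0, "Hello."), (4.0, "Nice day."), (12.0, "Goal!")]
summaries = {scene: summarize(texts)
             for scene, texts in group_text_by_scene(scenes, utterances).items()}
```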
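In this case, the scene division detection units 192 and 194 correspond to the scene division detecting means of Invention 8, the voice recognition unit 190 corresponds to the supplementary information acquiring means of Invention 8, and the summary creation units 191 and 193 correspond to the summary creating means of Invention 8.
[0141]
In the first embodiment, the selected images and the summary are laid out according to a template. More specifically, as shown in FIG. 13, a kamishibai (picture-story show) style template may be prepared and the text information of the summary stored in the character storage frame of the template. FIG. 13 is a diagram showing the structure of a kamishibai-style template. In the example of FIG. 13 the selected image and the summary are printed on the same side, but the selected image may instead be printed on the front and the summary on the back. This makes it possible to generate a kamishibai automatically.
[0142]
In the first and second embodiments, automatically laid-out print data is created and then printed automatically. However, the configuration is not limited to this: instead of creating automatically laid-out print data, the top-ranked few images may be printed directly, one per sheet. This can be applied, for example, when the user wants to print just three good-looking images right away.
[0143]
In the first and second embodiments, the voice recognition unit 190 is provided. If, for example, the voice recognition unit 190 performs voice recognition using an HMM (Hidden Markov Model), the scene division information could be used to change the language model used for voice recognition, or to reset the state transitions of voice recognition at each scene division.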
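[0144]
In the first and second embodiments, the moving image acquisition unit 110 acquires one of the moving images from a moving image storage medium 50 storing a plurality of moving images when that medium is supplied. However, the configuration is not limited to this: when multimedia data that contains at least a moving image is supplied, the moving image acquisition unit 110 may extract the moving image from the supplied multimedia data.
[0145]
This allows multimedia data to be handled as a layout target.
In the first embodiment, automatically laid-out print data is created and then printed automatically. However, the configuration is not limited to this, and the top-ranked few images may be printed directly, one per sheet.
[0146]
This accommodates, for example, the case where the user wants to print just three good-looking images right away. It also makes it possible to build a service or system that prints directly when a memory card or the like recorded with a digital video camera is inserted into a printer.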
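In the first and second embodiments, images that suit the user's preferences are selected from among the plurality of still images. However, the configuration is not limited to this, and images that generally give a good impression may be selected instead. In that case, a plurality of users are asked to specify images that they think give a good impression, and the neural network 400 is trained on the features of the specified images in the same manner as in the first and second embodiments.
[0147]
Furthermore, in this case, the users may be asked to enter not only whether an impression is good or bad but also whether it is strong or weak, and the neural network 400 may be trained on that basis. Because general user characteristics can be learned in this way, an image layout apparatus suited to selecting images that match the preferences of multiple people can be constructed.
[0148]
Furthermore, in this case, users may be grouped by age, for example into teens, twenties, and thirties; the users in each group are asked to specify images that they think give a good impression, and the neural network 400 is trained on the features of the specified images. This yields an image layout apparatus suited to selecting images that match the preferences of people of the same generation. It can also be used to investigate which age groups prefer a given image.
[0149]
In the first and second embodiments, the neural network 400 has only one output layer O_k. However, the configuration is not limited to this, and a plurality of output layers may be provided. For example, the network may have a first output layer that outputs whether the user likes or dislikes the image, a second output layer that outputs whether the user's impression is good or bad, and a third output layer that outputs whether the user's impression is strong or weak.
[0150]
In the first and second embodiments, image feature information is extracted from all of the still images constituting the moving image acquired by the moving image acquisition unit 110. However, the configuration is not limited to this, and image feature information may be extracted only from those still images that satisfy a predetermined extraction condition. As the extraction condition, for example, the color distribution may be calculated and required to be at or above a predetermined threshold. This excludes images that are too dark overall from extraction.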
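The extraction condition can be sketched as a simple brightness filter. The text only says that a color distribution is compared against a threshold; interpreting that distribution as the mean brightness over all pixels, the threshold value of 40, and the function names are all assumptions made for this illustration.

```python
from typing import List, Tuple

Pixel = Tuple[int, int, int]  # (r, g, b), each 0 to 255


def mean_brightness(pixels: List[Pixel]) -> float:
    """Average of all channel values over all pixels."""
    if not pixels:
        raise ValueError("image has no pixels")
    total = sum(r + g + b for r, g, b in pixels)
    return total / (3 * len(pixels))


def passes_extraction_condition(pixels: List[Pixel], threshold: float = 40.0) -> bool:
    """True when the image is bright enough to be worth extracting features from."""
    return mean_brightness(pixels) >= threshold


dark = [(5, 5, 5)] * 4
bright = [(200, 180, 160)] * 4
```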
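[0151]
In the first and second embodiments, the feature quantities of all pixels constituting an image are extracted and learning is performed based on the extracted feature quantities. However, the configuration is not limited to this. For example, in each rectangular block of 5 pixels vertically by 5 pixels horizontally, only the four corner pixels may be targeted, a feature quantity (for example, the average) of the target pixels extracted, and learning performed based on the extracted feature quantity.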
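The corner-pixel sampling can be sketched as follows for a grayscale image held as a list of rows. The function name `corner_means`, the use of non-overlapping blocks, and the dropping of incomplete blocks at the image edge are assumptions; the specification gives only the 5 by 5 block size and the four-corner average.

```python
from typing import List


def corner_means(image: List[List[float]], block: int = 5) -> List[List[float]]:
    """Average the four corner pixels of each complete block-by-block tile."""
    rows, cols = len(image), len(image[0])
    result = []
    for top in range(0, rows - block + 1, block):
        row_out = []
        for left in range(0, cols - block + 1, block):
            bottom, right = top + block - 1, left + block - 1
            corners = (image[top][left], image[top][right],
                       image[bottom][left], image[bottom][right])
            row_out.append(sum(corners) / 4)
        result.append(row_out)
    return result


# 5 rows by 10 columns, pixel value = row * 10 + column.
image = [[r * 10 + c for c in range(10)] for r in range(5)]
features = corner_means(image)
```

Each 25-pixel block is reduced to one value, so the amount of data passed to learning shrinks accordingly.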
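[0152]
In the first and second embodiments, image selection and learning are performed based on the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E. However, the configuration is not limited to this, and image selection and learning may be performed based on any of the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E.
In the first and second embodiments, backpropagation is given as an example of the learning method for the neural network 400. However, the configuration is not limited to this, and unsupervised learning by self-organization may also be used. In this way, for example, the features of 25 still images constituting a moving image that the user shot with a digital video camera can be learned, learning can follow the tendencies of those images, and the user's preferences can be learned automatically.
[0153]
In the first and second embodiments, the strength of the induction field M_xy, the complexity of the equipotential lines C_i, and the energy of the induction field E are calculated from an image obtained by black-and-white binarization of the still image. However, the configuration is not limited to this, and they may be calculated from the color still image itself.
[0154]
In the first and second embodiments, the luminance values of the three primary colors are handled as separate vectors N_1xy, N_2xy, and N_3xy, one per primary color. However, the configuration is not limited to this, and they may be combined, for example by addition, and handled as a single vector.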
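Combining the three per-color vectors by addition can be sketched as an element-wise sum. The flat-list representation of each vector and the function name `combine_channels` are assumptions for illustration.

```python
from typing import List


def combine_channels(n1: List[float], n2: List[float], n3: List[float]) -> List[float]:
    """Element-wise sum of the three primary-color luminance vectors."""
    if not (len(n1) == len(n2) == len(n3)):
        raise ValueError("channel vectors must have the same length")
    return [a + b + c for a, b, c in zip(n1, n2, n3)]


combined = combine_channels([10, 20], [30, 40], [50, 60])
```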
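In the first and second embodiments, the layout unit 100, the learning unit 200, the condition input unit 300, and the information providing unit 600 are each implemented by executing a control program stored in advance in ROM. However, the configuration is not limited to this, and a program describing these procedures may instead be read into RAM from a storage medium on which it is stored and then executed.
[0155]
Here, the storage medium may be a semiconductor storage medium such as RAM or ROM, a magnetic storage medium such as an FD or HD, an optically read storage medium such as a CD, CDV, LD, or DVD, or a magnetic-storage/optically-read storage medium such as an MO. It includes any storage medium that can be read by a computer, regardless of whether the reading method is electronic, magnetic, optical, or otherwise.
[0156]
In the first and second embodiments, the image layout apparatus, the image layout program, and the image layout method according to the present invention are applied to the case of selecting images from a moving image shot with a digital video camera and automatically laying out the selected images. However, the invention is not limited to this and can be applied to other cases without departing from its gist. For example, the following modifications are conceivable.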
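[0157]
First, the information attached to the moving image may be time information. In that case, for example, by displaying a golf swing shot with a digital video camera frame by frame and also attaching time information, an edited material with even richer content can be created.
Second, the information obtained through the URL is not limited to moving images and summaries and may be 3D video.
[0158]
Third, the moving image or summary obtained based on the URL is displayed on a computer via a web browser, but it may instead be displayed using a dedicated application, or displayed directly on a television, car navigation system, mobile phone, or mobile terminal.
Fourth, the URL may support SSL (Secure Sockets Layer).
[0159]
Fifth, the moving image storage unit 197 stores moving images and summaries in association with unique keys, but it may additionally store advertisement information, commentary articles, and columns in association with unique keys. In that case, an access based on the URL yields not only the moving image and the summary but also the advertisement information, commentary articles, and columns.
[0160]
Sixth, the moving image storage unit 197 may store media information obtained by converting the moving image with a conversion unit. For example, the file format may be converted for 3D use and the result stored.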
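[0161]
[Effects of the Invention]
As described above, according to the image layout apparatus of claims 1 to 17 of the present invention, the selected image is laid out based on the supplementary information, so that not only the selected image but also the audio, text, and the like attached to the moving image can be reflected in the layout. Accordingly, an edited material with richer content than before can be created.
[0162]
Furthermore, according to the image layout apparatus of claim 2, an edited material whose content is relatively close to the user's preferences can be created.
Furthermore, according to the image layout apparatus of claim 3, the audio attached to the moving image can be reflected in the layout, so that an edited material with even richer content can be created.
[0163]
Furthermore, according to the image layout apparatus of claim 4, the text attached to the moving image can be laid out together with the selected image, so that an edited material with even richer content can be created.
Furthermore, according to the image layout apparatus of claim 5 or 8, a summary of the supplementary information attached to the moving image can be laid out together with the selected image, so that an edited material with even richer content can be created, and the part of the layout that concerns the supplementary information becomes relatively concise, clear, and easy to read.
[0164]
Furthermore, according to the image layout apparatus of claim 6, when the moving image consists of a plurality of scenes, an image can be selected for each scene, so that editing can be performed in relatively fine detail.
Furthermore, according to the image layout apparatus of claim 7, when the moving image consists of a plurality of scenes, supplementary information can be acquired for each scene, so that editing can be performed in relatively fine detail.
[0165]
Furthermore, according to the image layout apparatus of claim 8, when the moving image consists of a plurality of scenes, a summary can be created for each scene, so that editing can be performed in relatively fine detail.
Furthermore, according to the image layout apparatus of claim 10 or 11, the supplementary information related to the selected image can be obtained by accessing the image layout apparatus based on the reference information.
[0166]
Furthermore, according to the image layout apparatus of claim 12 or 13, the moving image that includes the selected image can be obtained by accessing the image layout apparatus based on the reference information.
Furthermore, according to the image layout apparatus of claim 14, the supplementary information or the moving image can be accessed by referring to the URL, so that obtaining the supplementary information or the moving image is relatively easy.
[0167]
Furthermore, according to the image layout apparatus of claim 15, the supplementary information or the moving image can be accessed by referring to the barcode, so that obtaining the supplementary information or the moving image is relatively easy.
Meanwhile, the image layout program of claim 18 provides the same effect as the image layout apparatus of claim 1.
[0168]
Meanwhile, the image layout method of claim 19 provides the same effect as the image layout apparatus of claim 1.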
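[Brief Description of the Drawings]
[FIG. 1] A diagram showing the pixel arrangement of a digital image.
[FIG. 2] A diagram explaining the shielding condition used when determining the strength of the visual induction field.
[FIG. 3] An example of the visual induction field of the character "A"; (a) shows the field obtained with the shielding condition taken into account, and (b) shows the field obtained without it.
[FIG. 4] A diagram showing an image of part of a newspaper article serving as a reference layout example.
[FIG. 5] A diagram showing the equipotential lines obtained from the induction field calculated for the image shown in FIG. 4, with each character string represented by a simple line and each photograph by a rectangular frame.
[FIG. 6] A diagram showing the reference layout of FIG. 4 and layouts obtained by varying that reference layout in various ways.
[FIG. 7] A diagram showing the complexity of each of the layouts shown in FIGS. 6(a) to 6(d).
[FIG. 8] A functional block diagram showing the configuration of the image layout apparatus according to the present invention.
[FIG. 9] A diagram showing the configuration of the neural network 400.
[FIG. 10] A diagram showing the structure of a template.
[FIG. 11] A functional block diagram showing the configuration of the image layout apparatus according to the present invention.
[FIG. 12] A diagram showing the data structure of the unique-key correspondence table 700.
[FIG. 13] A diagram showing the structure of a kamishibai-style template.
[Explanation of Reference Signs]
50: moving image storage medium; 100: layout unit; 110: moving image acquisition unit; 120: image feature information extraction unit; 130: user model storage unit; 140: evaluation value calculation unit; 150: image selection unit; 160: template storage unit; 170, 172: image layout unit; 180: printing unit; 185: display unit; 190: voice recognition unit; 191, 193: summary creation unit; 192, 194: scene division detection unit; 195: unique key generation unit; 196: moving image generation unit; 197: moving image storage unit; 200: learning unit; 210: image designation input unit; 220: image feature information extraction unit; 230: feature learning unit; 300: condition input unit; 310: evaluation value calculation condition input unit; 320: image selection condition input unit; 330: layout condition input unit; 400: neural network; 500 to 528: image storage frames; 600: information providing unit; 610: moving image providing unit; 620: summary providing unit; 700: unique-key correspondence table
[0001]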
TECHNICAL FIELD OF THE INVENTION
The present invention relates to an apparatus, a program, and a method for laying out images selected from among a plurality of still images constituting a moving image, and in particular to an image layout apparatus, an image layout program, and an image layout method suitable for creating an edited material with rich content by reflecting in the layout not only the images but also the audio, text, and the like attached to the moving image.
[0002]
[Prior art]
When shooting a moving image with a digital video camera and creating an original album from it, the user, for example, imports the shot moving image into a personal computer, selects images from among the plurality of still images that compose the moving image, lays out the selected images, and then prints them with a printer. In many cases, when the user selects several images to be printed, an application on the personal computer fits the selected images into a predefined template and lays them out automatically.
[0003]
Conventional techniques for creating an album or other edited material using images include, for example, the electronic picture book display device disclosed in Patent Document 1 (hereinafter referred to as the first conventional example), the original picture book disclosed in Patent Document 2 (hereinafter referred to as the second conventional example), the original picture book disclosed in Patent Document 3 (hereinafter referred to as the third conventional example), and the video printer disclosed in Patent Document 4 (hereinafter referred to as the fourth conventional example).
[0004]
The first conventional example comprises a recording medium on which the images and text of a picture book are written in advance, a background memory for storing background image data read from the recording medium, a moving image memory for storing moving image data, and a synthesizing unit that combines the background data and the moving image data read from the background memory and the moving image memory into a picture book.
[0005]
This digitizes the picture book, adds effects that a static picture cannot provide, such as simple animation and sound, allows the subject of the picture book to be changed without reprinting, and, because no paper is used, makes long-term storage easy.
In the second conventional example, each page of a picture book is prepared in advance with a specific character in the story left blank, and a photograph of a person taken with a digital camera is processed on a computer, laid out in the blank character portion, and printed to complete the picture book. Because the persons and names appearing in the story can be set freely, picture books that differ from one another, each the only one of its kind in the world, can easily be printed at low cost. In addition, in the method of producing the original picture book, a hard cover is bound to the printed sheets by a unique method, so that the original picture book can be produced easily and in a short time.
[0006]
In the third conventional example, an original picture book is created by replacing the names and photographs of the characters with the name and photograph of an individual. As an example, "Momojirou" is based on the original "Momotaro": the title is replaced with "Momojirou", and in the pictures the face of "Momotaro" is replaced with a photograph of the child (Jiro). The name in the story text is likewise replaced, from "Momotaro" to "Momojirou". The result is a one-of-a-kind original picture book.
[0007]
In the fourth conventional example, an index code superimposed on a television signal is detected by an index code detector. Then, when the index code is detected, the main CPU controls the recording of the image on which the index code is superimposed in the frame memory and controls the printing by the mechanism unit.
This makes it possible to appropriately obtain only necessary images without watching the television program from the beginning to the end.
[0008]
[Patent Document 1]
JP-A-5-120400
[0009]
[Patent Document 2]
JP-A-11-128553
[0010]
[Patent Document 3]
Japanese Utility Model Application No. 10-8420
[0011]
[Patent Document 4]
JP-A-7-170473
[0012]
[Problems to be solved by the invention]
When creating an original album from a moving image captured by a digital video camera, an album with rich content can be created if, in addition to laying out images selected from the plurality of still images that compose the moving image, the audio and text such as clips attached to the moving image are laid out together with them.
[0013]
However, the first conventional example merely creates a picture book by synthesizing background data and moving image data, and cannot reflect the audio, text, and the like attached to a moving image in the edited material.
Further, the second conventional example merely creates a picture book by processing a photograph of a person taken with a digital camera on a computer, laying it out in the blank character portion, and printing it; likewise, it cannot reflect the audio, text, and the like attached to a moving image in the edited material.
[0014]
Further, the third conventional example merely creates a picture book by replacing the names and photographs of the characters with the name and photograph of an individual; likewise, it cannot reflect the audio, text, and the like attached to a moving image in the edited material.
Further, in the fourth conventional example, when the index code is detected, the main CPU merely controls the recording of the image on which the index code is superimposed in the frame memory and its printing by the mechanism unit; likewise, it cannot reflect the audio, text, and the like attached to a moving image in the edited material.
[0015]
The present invention has therefore been made in view of these unresolved problems of the conventional technology, and its object is to provide an image layout apparatus, an image layout program, and an image layout method suitable for creating an edited material with rich content by reflecting in the layout not only images but also the audio, text, and the like attached to a moving image.
[0016]
[Means for Solving the Problems]
[Invention 1]
In order to achieve the above object, an image layout device according to Invention 1 is
An apparatus for selecting an image from among a plurality of still images constituting a moving image and laying out the selected image,
comprising: supplementary information acquiring means for acquiring, from the moving image, supplementary information attached to the moving image; image selecting means for selecting an image from among the plurality of still images; and layout means for laying out the selected image selected by the image selecting means, based on the supplementary information acquired by the supplementary information acquiring means.
[0017]
With such a configuration, the additional information is acquired from the moving image by the additional information acquiring unit, and the image is selected from the plurality of still images constituting the moving image by the image selecting unit. Then, the layout unit lays out the selected image based on the acquired additional information.
Here, the supplementary information refers to information attached to the moving image, and includes, for example, audio information or text information attached to the moving image.
[0018]
Laying out the selected image based on the supplementary information includes, for example, laying out the supplementary information together with the selected image when the supplementary information can itself be laid out, and, when it cannot, performing information processing that uses the supplementary information and laying out the processing result together with the selected image. Even when the supplementary information can be laid out, the latter method may be used.
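The two cases can be illustrated with a small Python sketch, assuming that text stands for supplementary information that can be laid out as it is and raw audio bytes stand for information that must be processed first. The function names and the stub recognizer are hypothetical and do not represent any real speech recognition API.

```python
from typing import Callable, Union


def prepare_for_layout(info: Union[str, bytes],
                       process: Callable[[bytes], str]) -> str:
    """Return text that can be placed in the layout next to the selected image."""
    if isinstance(info, str):
        return info       # already in a form that can be laid out
    return process(info)  # e.g. speech recognition applied to audio data


def fake_recognizer(audio: bytes) -> str:
    """Stub standing in for real information processing."""
    return f"<{len(audio)} bytes of audio transcribed>"


caption = prepare_for_layout("caption", fake_recognizer)
transcript = prepare_for_layout(b"\x00\x01", fake_recognizer)
```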
[Invention 2]
Further, the image layout device of the second aspect is the image layout device of the first aspect,
The image selecting means is configured to select an image suitable for the user's preference from the plurality of still images.
[0019]
With such a configuration, an image suitable for the user's preference is selected from the plurality of still images by the image selection unit.
Here, as a more specific configuration for selecting an image that suits the user's preference, user information storage means for storing user information on the user's preference may be provided, and the image selecting means may select an image from among the plurality of still images based on the user information in the user information storage means.
[Invention 3]
Further, the image layout device of the invention 3 is the image layout device of any of the inventions 1 and 2,
Audio information is attached to the moving image,
The supplementary information acquisition unit performs speech recognition based on speech information attached to the moving image, and acquires text information as a recognition result as the supplementary information,
The layout means lays out the text information and the selected image acquired by the incidental information acquisition means.
[0020]
With such a configuration, the additional information acquiring unit performs voice recognition based on the audio information attached to the moving image, and obtains text information as a recognition result as the additional information. Then, the acquired text information and the selected image are laid out by the layout means.
[Invention 4]
Furthermore, the image layout apparatus of the fourth aspect is the image layout apparatus of any one of the first and second aspects,
The moving image has text information attached thereto,
The supplementary information acquiring means is configured to acquire text information attached to the moving image as the supplementary information,
The layout means lays out the text information and the selected image acquired by the incidental information acquisition means.
[0021]
With such a configuration, text information accompanying the moving image is obtained as the additional information by the additional information obtaining unit, and the obtained text information and the selected image are laid out by the layout unit.
[Invention 5]
Further, the image layout device of the fifth aspect is the image layout device of any one of the third and fourth aspects,
Further, a summary creating means for creating a summary based on the text information obtained by the incidental information obtaining means,
The layout means lays out the summary created by the summary creation means and the selected image.
[0022]
With such a configuration, a summary is created based on the acquired text information by the summary creating unit, and the created summary and the selected image are laid out by the layout unit.
[Invention 6]
Furthermore, the image layout device of the invention 6 is the image layout device of any of the inventions 1 to 5,
Further, a scene division detecting unit for detecting a scene division from the moving image,
The image selection unit is configured to select an image from the plurality of still images based on a detection result of the scene division detection unit.
[0023]
With such a configuration, when a scene division is detected from a moving image by the scene division detection unit, the image selection unit selects an image from the plurality of still images based on the detection result of the scene division detection unit.
Here, as a more specific configuration for selecting an image based on the detection result of the scene division detection means, the scene division detection means may detect a scene division from the moving image and output scene division information indicating the scene division, and the image selecting unit may, based on the scene division information from the scene division detection means, select an image from among those still images constituting the moving image that belong to the same scene or a specific scene.
[Invention 7]
Furthermore, the image layout device of the seventh aspect is the image layout device of any one of the first to fifth aspects.
Further, a scene division detecting unit for detecting a scene division from the moving image,
The supplementary information acquiring means is configured to acquire the supplementary information from the moving image based on a detection result of the scene division detecting means.
[0024]
With such a configuration, when a scene division is detected from the moving image by the scene division detection unit, the supplementary information acquisition unit acquires the supplementary information from the moving image based on the detection result of the scene division detection unit.
Here, as a more specific configuration for acquiring the supplementary information based on the detection result of the scene division detection unit, the scene division detection unit may detect a scene division from the moving image and output scene division information indicating the scene division, and the supplementary information acquisition unit may, based on the scene division information from the scene division detection unit, acquire from the moving image the supplementary information attached to the portion of the moving image that belongs to the same scene or a specific scene.
[Invention 8]
Further, the image layout device of the invention 8 is the image layout device of the invention 5,
Further, a scene division detecting unit for detecting a scene division from the moving image,
The summary creating means creates an abstract based on the detection result of the scene section detection means and the text information acquired by the incidental information acquisition means.
[0025]
With such a configuration, when a scene division is detected from a moving image by the scene division detection unit, the summary creation unit creates a summary based on the detection result of the scene division detection unit and the acquired text information.
Here, as a more specific configuration for creating a summary based on the detection result of the scene division detection means, the scene division detection means may detect a scene division from the moving image and output scene division information indicating the scene division, and the summary creating means may create a summary based on the scene division information from the scene division detection means and on the text information obtained from the portion of the moving image that belongs to the same scene or a specific scene.
[Invention 9]
Further, the image layout apparatus of the ninth aspect is the image layout apparatus of any of the first to eighth aspects,
The layout means selects a template from among a plurality of different templates, each constituting a layout framework, and lays out the selected image based on the selected template and the supplementary information.
[0026]
With such a configuration, a template is selected from a plurality of different templates by the layout unit, and the selected image is laid out based on the selected template and the accompanying information.
[Invention 10]
Further, the image layout device of the tenth aspect is the image layout device of any of the first to ninth aspects,
Further, additional information storage means for storing the additional information in association with identification information, and additional information providing means for providing additional information of the additional information storage means,
The layout means lays out the reference information including the identification information corresponding to the incidental information and the selected image,
When there is an access based on the reference information, the supplementary information providing means reads the supplementary information corresponding to the identification information included in the reference information from the supplementary information storage means and provides the read supplementary information to the access source.
[0027]
With such a configuration, the layout means lays out the reference information, which includes the identification information corresponding to the supplementary information, together with the selected image. It is therefore possible to refer to the reference information in the layout result and access the image layout device based on it.
On the other hand, when there is an access based on the reference information, the supplementary information providing means reads the supplementary information corresponding to the identification information included in the reference information from the supplementary information storage means and provides the read supplementary information to the access source.
[0028]
Here, the layout means only needs to lay out at least the reference information and the selected image, and may lay out the supplementary information, the reference information and the selected image, or only the reference information and the selected image. May be laid out. Hereinafter, the same applies to the image layout apparatus of the twelfth aspect.
[0029]
Here, the supplementary information storage means may store the supplementary information by any means and at any time: it may hold the supplementary information in advance, or, without holding it in advance, it may store supplementary information supplied by external input or the like while the apparatus is operating.
[Invention 11]
Further, the image layout device of the eleventh aspect is the image layout device of the tenth aspect,
Further, the apparatus further includes an identification information generation unit configured to generate the identification information,
The supplementary information acquisition unit acquires the supplementary information from the moving image and stores the acquired supplementary information in the supplementary information storage unit in association with the identification information generated by the identification information generation unit.
[0030]
With such a configuration, the identification information is generated by the identification information generating means, the additional information is obtained from the moving image by the additional information obtaining means, and the obtained additional information is associated with the generated identification information. And stored in the accompanying information storage means.
[Invention 12]
Further, the image layout apparatus of the twelfth aspect is the image layout apparatus of any one of the first to ninth aspects,
Further, a moving image storage unit that stores a moving image including the selected image in association with identification information, and a moving image providing unit that provides a moving image of the moving image storage unit,
The layout means lays out the selected image together with reference information that includes identification information corresponding to the moving image including the selected image,
When there is an access based on the reference information, the moving image providing unit reads the moving image corresponding to the identification information included in the reference information from the moving image storage unit and provides the read moving image to the access source.
[0031]
With this configuration, the layout means lays out the selected image together with reference information including the identification information corresponding to the moving image that includes the selected image.
When there is an access based on the reference information, the moving image providing means reads the moving image corresponding to the identification information included in the reference information from the moving image storage means and provides the read moving image to the access source.
[0032]
Here, the moving image storage means may store the moving image by any means and at any time: the moving image may be stored in advance, or, without being stored in advance, it may be stored through external input or the like while the apparatus is operating.
[Invention 13]
Further, the image layout apparatus of Invention 13 is the image layout apparatus of Invention 12,
further comprising identification information generating means for generating the identification information, and moving image generating means for generating a moving image,
wherein the moving image generating means generates, from the moving image, a moving image including the selected image, and stores the generated moving image in the moving image storage means in association with the identification information generated by the identification information generating means.
[0033]
With such a configuration, the identification information generating means generates the identification information, and the moving image generating means generates a moving image including the selected image and stores it in the moving image storage means in association with the generated identification information.
[Invention 14]
Further, the image layout device of the invention 14 is the image layout device of any of the inventions 10 to 13,
The reference information is a URL.
[0034]
With such a configuration, the layout unit lays out the URL including the identification information and the selected image. Therefore, it is possible to refer to the URL from the layout result and access the image layout apparatus based on the URL.
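As a minimal sketch of how a URL could carry the identification information and be resolved on access, the following may help. The base address, the query parameter name `id`, and the dictionary standing in for the storage means are all hypothetical and are not taken from this specification.

```python
from typing import Optional
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical address of the image layout apparatus (an assumption, not from the source).
BASE_URL = "http://layout.example.com/info"


def make_reference_url(identification: str) -> str:
    """Build reference information (a URL) that embeds the identification information."""
    return f"{BASE_URL}?{urlencode({'id': identification})}"


def lookup(url: str, storage: dict) -> Optional[str]:
    """Resolve an access made via the URL to the item stored under its identification."""
    ids = parse_qs(urlparse(url).query).get("id", [])
    return storage.get(ids[0]) if ids else None


# A dictionary stands in for the storage means in this sketch.
storage = {"scene-0001": "supplementary information for scene 1"}
url = make_reference_url("scene-0001")
found = lookup(url, storage)
```

The URL printed in the layout result only needs to contain enough to recover the identification information; everything else about the address is a design choice left open here.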
[Invention 15]
Further, the image layout device of the invention 15 is the image layout device of any of the inventions 10 to 13,
The reference information is a barcode.
[0035]
With such a configuration, the barcode including the identification information and the selected image are laid out by the layout unit. Therefore, it is possible to refer to the barcode from the layout result and access the image layout device based on the barcode.
[Invention 16]
Further, the image layout device of the sixteenth aspect is the image layout device of any of the tenth to fifteenth aspects,
The reference information includes advertisement information.
[0036]
With such a configuration, the layout means lays out the selected image together with the reference information including the identification information and the advertisement information.
[Invention 17]
Furthermore, the image layout apparatus of the seventeenth aspect is the image layout apparatus of any one of the first to sixteenth aspects,
Further, a printing unit for printing based on a layout result of the layout unit is provided.
[0037]
With such a configuration, printing is performed by the printing unit based on the layout result of the layout unit.
[Invention 18]
On the other hand, in order to achieve the above object, the image layout program of Invention 18 is
a program for selecting an image from among a plurality of still images constituting a moving image and laying out the selected image,
the program causing a computer to execute processing realized as: supplementary information acquiring means for acquiring, from the moving image, supplementary information attached to the moving image; image selecting means for selecting an image from among the plurality of still images; and layout means for laying out, based on the supplementary information acquired by the supplementary information acquiring means, the selected image selected by the image selecting means.
[0038]
With such a configuration, when the program is read by the computer and the computer executes the processing in accordance with the read program, an operation equivalent to that of the image layout apparatus of the first aspect is obtained.
[Invention 19]
On the other hand, in order to achieve the above object, the image layout method of Invention 19 is
a method of selecting an image from among a plurality of still images constituting a moving image and laying out the selected image,
the method comprising: a supplementary information acquiring step of acquiring, from the moving image, supplementary information attached to the moving image; an image selecting step of selecting an image from among the plurality of still images; and a layout step of laying out, based on the supplementary information acquired in the supplementary information acquiring step, the selected image selected in the image selecting step.
[0039]
BEST MODE FOR CARRYING OUT THE INVENTION
Hereinafter, a first embodiment of the present invention will be described with reference to the drawings. FIGS. 1 to 10 are diagrams showing a first embodiment of an image layout apparatus, an image layout program, and an image layout method according to the present invention.
In this embodiment, the image layout apparatus, the image layout program, and the image layout method according to the present invention are applied to a case where images are selected from a moving image shot with a digital video camera and the selected images are automatically laid out. Note that audio information and text information are attached to the moving image shot with the digital video camera.
[0040]
The present invention uses the concept of a "visual induction field" to evaluate image layouts, and thereby selects the images best suited to a layout and lays out the selected images attractively. First, the visual induction field is briefly described.
The visual induction field is used, for example, as an index of the readability of an entire character string by evaluating the readability of the individual characters in the string.
[0041]
First, as an example of estimating the visual induction field of a character image on the basis of physiological and psychological findings, a method of estimating the visual induction field from a digitized image of a character is described.
Note that the individual characters in a character string are easy to read when the visual induction fields surrounding them are spaced so that they interfere with one another as little as possible. Specifically, consider the closed curve of the visual induction field surrounding each character: if the potential value of that closed curve is high, the character is hard to separate from the other characters and is therefore hard to read. The readability of each character in a character string can thus be evaluated quantitatively from the extent of its visual induction field. The visual induction field is described in Yoshimasa Yokose, "Psychology of Shape" (Nagoya University Press, 1986) (hereinafter referred to as the reference paper).
[0042]
The visual induction field (hereinafter simply the induction field) presented in the reference paper explains visual phenomena by positing a "field" that spreads around a figure. Because the reference paper deals only with figures composed of straight lines and circular arcs, it does not give the induction field of an arbitrary digital image. Accordingly, a method of calculating the induction field of a black-and-white binary digital image is described first.
[0043]
Since the induction field can be basically interpreted as a Coulomb potential, the pixels constituting the outer contour of the pattern are assumed to be point charges, and the distribution of the induction field in the digital image is calculated from the accumulation of the Coulomb potential generated by the pixels.
FIG. 1 is a diagram showing a pixel array of a digital image. As shown in FIG. 1, assume that an induction field is formed at an arbitrary point P by a curve f(s) composed of a sequence of n points. The curve f(s) corresponds to a line segment of a line figure or to the outline of an image figure. Each of the points p_1, p_2, ..., p_i, ..., p_n constituting the curve f(s) is assumed to be a point charge of positive charge 1. The curve f(s) is scanned from the point P to find the n points p_1, p_2, ..., p_i, ..., p_n constituting it, and the distance from P to each point found by the scan is denoted r_i. The induction field strength M_xy at the point P can then be defined by the following equation (1). The subscript xy of M_xy denotes the x coordinate and the y coordinate of the point P in the image.
[0044]
(Equation 1)
$$M_{xy} = \frac{1}{n}\sum_{i=1}^{n}\frac{1}{r_i} \qquad (1)$$

[0045]
By using the above equation (1), the induction field of an arbitrary digital image can be obtained. When there are a plurality of curves, the induction field strength M_xy at the point P is the sum of the induction fields that the individual curves create at P. Equation (1) is subject to the constraint that the sum is taken only over the portions directly struck by light emitted from the point P. For example, if curves f_1(s), f_2(s), and f_3(s) exist around the point P as shown in FIG. 2, the portions that cannot be seen from P, that is, the portions lying in the range Z shielded by the curve f_1(s), are excluded from the sum. In the example of FIG. 2, all of the curve f_3(s) and part of the curve f_2(s) are excluded. This is called the shielding condition.
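The calculation with the shielding condition can be illustrated with the following minimal sketch. It assumes that equation (1) is the mean of 1/r_i over the contour points visible from P, and it approximates "directly struck by light from P" by sampling the straight segment from P to each contour pixel on the pixel grid. The function names, the sampling density, and the two-wall test figure are illustrative assumptions and are not part of this specification.

```python
import math


def line_of_sight(p, q, occupied):
    """True if no contour pixel other than q lies on the sampled segment from p to q."""
    (x0, y0), (x1, y1) = p, q
    steps = 2 * max(abs(x1 - x0), abs(y1 - y0))
    for k in range(1, steps):
        t = k / steps
        cell = (round(x0 + t * (x1 - x0)), round(y0 + t * (y1 - y0)))
        if cell != p and cell != q and cell in occupied:
            return False  # another contour pixel shields q from p
    return True


def induction_field(p, contour, shielding=True):
    """Assumed form of equation (1): mean of 1/r_i over the contour points seen from p."""
    occupied = set(contour)
    visible = [q for q in contour
               if q != p and (not shielding or line_of_sight(p, q, occupied))]
    if not visible:
        return 0.0
    return sum(1.0 / math.dist(p, q) for q in visible) / len(visible)


# Two parallel vertical strokes; the far one is hidden behind the near one as seen from p.
wall_near = [(2, y) for y in range(-3, 4)]
wall_far = [(5, y) for y in range(-3, 4)]
p = (0, 0)
shielded = induction_field(p, wall_near + wall_far)
unshielded = induction_field(p, wall_near + wall_far, shielding=False)
```

In this example the far stroke contributes nothing under the shielding condition, so `shielded` equals the value obtained from the near stroke alone, whereas dropping the condition lets the hidden pixels change the result.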
[0046]
FIG. 3A shows an example of the induction field calculated by the above equation (1) for the character "A". The thin lines L distributed like map contour lines around the character "A" in FIG. 3A are equipotential lines of the induction field; the induction field strength M_xy weakens from the center outward and eventually approaches zero.
The shape and strength of the induction field distribution in FIG. 3A, in particular the fact that the distribution near the apex of "A" is sharper than elsewhere, agree with the psychological experiment results in the reference paper on the induction field near the corners of figures such as squares and triangles.
[0047]
FIG. 3B shows an example of the induction field calculated without the shielding condition, with all pixels assumed to be point charges of positive charge 1. The distribution of the induction field is rounded overall and differs from the psychological experiment results in the reference paper. The shielding condition is therefore important in characterizing the induction field.
In this way, the induction field of a given character can be obtained. Techniques using the visual induction field include, for example, Michihiro Nagaishi, "Easy-to-read Japanese proportional display using visual guidance field", Journal of the Institute of Image Media, Vol. 52, No. 12, pp. 1865-1872 (1998), and Masayoshi Miyoshi, Yoshifumi Shimoshio, Hiroaki Koga, and Ken Ideguchi, "Design of character layout based on sensibility using visual induction field theory", IEICE Transactions, 82-A, 9, pp. 1465-1473 (1999).
[0048]
The present invention uses such an induction field to evaluate whether a group of images composed of characters, photographs, pictures, figures, and the like is laid out optimally, thereby automating layout evaluation that has relied on intuition and manual work.
In the present embodiment, when the quality of a layout is evaluated, the group of images to be laid out is regarded as a single object of induction field calculation, the induction field is calculated, and the quality of the layout is evaluated from the shape of the equipotential lines obtained by the calculation.
[0049]
Assume that the group of images to be laid out is composed of character strings and photographs as shown in FIG. 4. The image in FIG. 4 shows part of a newspaper article and consists of a character string portion C and photographs P1 and P2. The layout in FIG. 4 was made by a designer specializing in newspaper layout and is assumed to be one that many people find easy to view and whose content is easy to understand.
[0050]
When the induction field is calculated using the above equation (1) for the entire group of images laid out within a limited display range as shown in FIG. 4, equipotential lines L as shown in FIG. 5 are drawn. When the induction field is calculated for the entire information to be laid out, each character string in the character string portion C of FIG. 4 is represented by a simple line as shown in FIG. 5, and the photographs P1 and P2 are represented by rectangular frames tracing their outlines.
[0051]
This is because a layout is determined by the positional relationship and size of its elements, so each element can be represented in simplified form. The induction field is calculated with each element simplified in this way, and the equipotential lines drawn from the resulting induction field represent the equipotential lines of the layout as a whole.
[0052]
The layout in FIG. 4 was designed by a specialist designer and is easy to view and understand, and the equipotential lines L obtained from the entire image laid out in this way are rounded overall with little unevenness.
It follows that the quality of an image layout can be judged by calculating the induction field for the entire group of images to be laid out and examining the shape of the resulting equipotential lines. That is, if the degree of unevenness of the equipotential lines is known, whether the layout is good can be evaluated.
[0053]
Therefore, in the present embodiment, the degree of unevenness of an equipotential line is obtained as the complexity of that line, and the complexity is used as an index for evaluating the quality of an image layout. The complexity is smaller the less uneven and more rounded the equipotential line is, and larger the more uneven it is. Denoting the complexity of the i-th equipotential line by C_i, it can be defined by the following equation (2), where L_i is the length of the i-th equipotential line and S_i is the area of the region enclosed by it. The length L_i can be taken as the number of dots forming the equipotential line, and the area S_i as the number of dots in the region enclosed by the line.
[0054]
(Equation 2)
$$C_i = \frac{L_i^{2}}{S_i} \qquad (2)$$

[0055]
According to the above equation (2), the longer (that is, the more uneven) the equipotential line drawn from the induction field calculated for the group of images to be laid out, the larger the complexity C_i. Conversely, the less uneven the equipotential line and the closer it is to a circle, the smaller the complexity C_i.
Here, the complexity is calculated for the group of images shown in FIG. 4 arranged in the various layouts shown in FIG. 6. In FIG. 6, as in FIG. 5, each character string in the character string portion C is represented by a simple line and the photographs P1 and P2 are represented by rectangular frames.
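The behavior of the complexity can be checked numerically with a minimal sketch. It assumes that equation (2) has the form C_i = L_i^2 / S_i and, instead of counting dots, measures the length and area of a closed polygon geometrically; the three test shapes are illustrative and do not come from the figures.

```python
import math


def perimeter_and_area(polygon):
    """Perimeter and shoelace area of a closed polygon given as (x, y) vertices."""
    n = len(polygon)
    perimeter = 0.0
    twice_area = 0.0
    for i in range(n):
        (x0, y0), (x1, y1) = polygon[i], polygon[(i + 1) % n]
        perimeter += math.hypot(x1 - x0, y1 - y0)
        twice_area += x0 * y1 - x1 * y0
    return perimeter, abs(twice_area) / 2.0


def complexity(polygon):
    """Assumed form of equation (2): squared length divided by enclosed area."""
    length, area = perimeter_and_area(polygon)
    return length ** 2 / area


# A near-circle, a square, and a square with a deep notch (strongly uneven outline).
circle = [(math.cos(2 * math.pi * k / 360), math.sin(2 * math.pi * k / 360))
          for k in range(360)]
square = [(0, 0), (4, 0), (4, 4), (0, 4)]
notched = [(0, 0), (4, 0), (4, 4), (3, 4), (3, 1), (1, 1), (1, 4), (0, 4)]
values = [complexity(shape) for shape in (circle, square, notched)]
```

Under this assumed form the value is independent of scale, the circle gives the smallest value (about 4π), and the notched outline gives the largest of the three.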
[0056]
FIG. 6A shows the same layout as FIG. 4 (referred to as layout A1). FIG. 6B shows a layout that differs from FIG. 6A with respect to the photograph P2 (referred to as layout A2), FIG. 6C shows a layout in which the photograph P1 is located at the lower right and the photograph P2 at the upper left (referred to as layout A3), and FIG. 6D shows a layout in which P2 is arranged within the character string (referred to as layout A4).
[0057]
For each of these layouts, the induction field was first calculated, and the complexity was then calculated according to the above equation (2) from the equipotential lines (the respective i-th equipotential lines) drawn from the resulting induction field. The results shown in FIG. 7 were obtained. In FIG. 7, the horizontal axis indicates the layouts A1 to A6, and the vertical axis indicates the complexity obtained for each of the layouts.
[0058]
According to FIG. 7, the layout A1 made by the designer (referred to as the reference layout A1), which is considered easy to read and easy to understand, has the lowest complexity, and the other three layouts A2, A3, and A4 all have greater complexity than the reference layout A1. In this example, the layout A3 has the greatest complexity.
[0059]
This is because, as described above, the equipotential lines obtained from the reference layout A1 have little unevenness and are rounded overall, whereas the equipotential lines obtained from the other three layouts A2 to A4 are highly uneven.
Using the equipotential lines, the energy E of the induction field over the entire image can be defined by the following equation (3), where i denotes the i-th equipotential line, S_i is the area of the region enclosed by the i-th equipotential line, and P_i is the potential value on the i-th equipotential line. This corresponds to obtaining the volume of the induction field when the field is viewed three-dimensionally, and the magnitude of that volume is defined as the energy.
[0060]
(Equation 3)
$$E = \sum_{i} S_i P_i \qquad (3)$$
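As a small worked example of the energy, the following sketch assumes that equation (3) sums the product of the enclosed area S_i and the potential value P_i over the equipotential lines. The three (area, potential) pairs are hypothetical values chosen only for illustration.

```python
def induction_field_energy(levels):
    """Assumed form of equation (3): sum of S_i * P_i over the equipotential lines.

    `levels` is a list of (enclosed_area, potential_value) pairs.
    """
    return sum(area * potential for area, potential in levels)


# Hypothetical equipotential lines: (enclosed area in dots, potential value).
levels = [(1200, 0.1), (800, 0.2), (300, 0.4)]
energy = induction_field_energy(levels)  # 120 + 160 + 120, up to rounding
```

Stacking the enclosed areas weighted by their potential values in this way approximates the volume under the field, which is the interpretation the text gives to the energy.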

[0061]
The above describes the evaluation for the case where part of an article such as a newspaper article (typically consisting of character strings and photographs) is the group of images to be laid out. The evaluation for the case where general images are laid out can be considered in the same manner.
Next, the configuration of the image layout apparatus according to the present invention will be described with reference to FIG. FIG. 8 is a functional block diagram showing the configuration of the image layout device according to the present invention.
[0062]
As shown in FIG. 8, the image layout apparatus according to the present invention comprises a layout unit 100 that selects images from among a plurality of still images constituting a moving image and lays them out, a learning unit 200 that learns the features of images that suit the user's preference and of other specific images, and a condition input unit 300 for inputting layout conditions and other conditions. More specifically, the apparatus is configured as a general-purpose computer in which a CPU, a ROM, a RAM, an I/F, and the like are connected by a bus; the CPU starts a predetermined program stored in a predetermined area of the ROM and, in accordance with the program, executes the processing realized as the layout unit 100, the learning unit 200, and the condition input unit 300.
[0063]
The layout unit 100 comprises: a moving image acquisition unit 110 that acquires a moving image; an image feature information extraction unit 120 that extracts image feature information indicating the image features of each still image constituting the moving image acquired by the moving image acquisition unit 110; a user model storage unit 130 that stores, as user models, image feature information indicating the features of images that suit the user's preference and of other specific images; an evaluation value calculation unit 140 that calculates an evaluation value of an image; an image selection unit 150 that selects images from among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110; a template storage unit 160 that stores a plurality of different templates, each defining a layout framework; an image layout unit 170 that lays out the selected images selected by the image selection unit 150; a printing unit 180 that performs printing; a display unit 185 that performs display; a voice recognition unit 190 that performs voice recognition based on the audio information attached to the moving image acquired by the moving image acquisition unit 110; a summary creation unit 191 that creates a summary based on the recognition result of the voice recognition unit 190; and a scene division detection unit 192 that detects scene divisions in the moving image acquired by the moving image acquisition unit 110.
[0064]
When a moving image storage medium 50 storing a plurality of moving images is provided, the moving image obtaining unit 110 obtains one of the moving images from the provided moving image storage medium 50. Here, the moving image storage medium 50 includes, for example, an FD, a CD, an MO, a memory card, and other removable memories.
The image feature information extraction unit 120 extracts, as image feature information for each still image constituting the moving image acquired by the moving image acquisition unit 110, the induction field strength M_xy, the complexity C_i of the equipotential lines, the induction field energy E, and the luminance values N_1xy, N_2xy, and N_3xy of the three primary colors of each pixel constituting the image. The induction field strength M_xy, the complexity C_i of the equipotential lines, and the induction field energy E are calculated from an image obtained by applying black-and-white binarization to the still image. In the present embodiment, the feature quantities M_xy, C_i, E, N_1xy, N_2xy, and N_3xy included in the image feature information are each treated as vectors.
[0065]
The user model storage unit 130 stores a plurality of user models, each in the form of a neural network 400 as shown in FIG. 9. FIG. 9 is a diagram showing the configuration of the neural network 400. The user models stored include a user model indicating the features of images that suit the user's preference, a user model indicating the features of images with impact, and a user model indicating the features of images in a specific style.
[0066]
As shown in FIG. 9, the neural network 400 is composed of input layers I_i that receive the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E, j intermediate layers H_j that receive the outputs of the input layers I_i, and an output layer O_k that receives the outputs of the intermediate layers H_j and outputs a preference value. The input layers I_i and the intermediate layers H_j are connected by synapses with coupling coefficients W_ij, and the intermediate layers H_j and the output layer O_k are connected by synapses with coupling coefficients W_jk.
[0067]
Further, the neural network 400 learns the features of images that suit the user's preference and of other specific images by means of a feature learning unit 230 described later. Therefore, when feature quantities extracted from an image that suits the user's preference or from another specific image are input to the neural network 400, a relatively high preference value is output from the output layer O_k; when feature quantities extracted from an image that neither suits the user's preference nor is a specific image are input, a relatively low preference value is output from the output layer O_k.
[0068]
The evaluation value calculation unit 140 selects, from the user model storage unit 130, the user model that satisfies the evaluation value calculation condition input via the evaluation value calculation condition input unit 310 described later. Then, from the image feature information extracted by the image feature information extraction unit 120, it obtains the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E, inputs them to the neural network 400 of the selected user model, and takes the output value of the neural network 400 as the evaluation value. The evaluation value is calculated for each still image.
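The evaluation value calculation can be pictured as a forward pass through a small three-layer network. The sketch below is only illustrative: the sigmoid activation, the layer sizes, the flattening of the features into a short list, and the numeric coefficients are assumptions, since the specification does not state them.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def evaluate(features, w_ij, w_jk):
    """Forward pass: inputs I_i -> intermediate units H_j -> one output O_k.

    w_ij[j][i] is the coupling coefficient from input i to intermediate unit j;
    w_jk[j] is the coupling coefficient from intermediate unit j to the output.
    """
    hidden = [sigmoid(sum(f * w for f, w in zip(features, row))) for row in w_ij]
    return sigmoid(sum(h * w for h, w in zip(hidden, w_jk)))


# Hypothetical, already-learned coefficients and one still image's feature vector.
w_ij = [[0.5, -0.3, 0.8], [-0.6, 0.9, 0.1]]
w_jk = [1.2, -0.7]
features = [0.2, 0.7, 0.1]
evaluation_value = evaluate(features, w_ij, w_jk)
```

Running `evaluate` once per still image yields the per-image evaluation values that the later selection and layout steps rank.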
[0069]
Referring back to FIG. 8, the scene division detection unit 192 detects scene divisions in the moving image acquired by the moving image acquisition unit 110 and outputs scene division information indicating those divisions to both the image selection unit 150 and the voice recognition unit 190. Scene divisions can be detected by a conventional scene extraction method, for example on the basis of changes in the luminance values of the images.
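One simple way to detect scene divisions from luminance changes is to compare the mean luminance of consecutive frames, as in the following sketch. The threshold value, the frame representation (rows of 0 to 255 luminance values), and the function names are assumptions made for illustration; the specification only refers to a conventional scene extraction method.

```python
def mean_luminance(frame):
    """Mean luminance of a frame given as rows of 0-255 values."""
    total = sum(sum(row) for row in frame)
    return total / sum(len(row) for row in frame)


def detect_scene_divisions(frames, threshold=40.0):
    """Return the indices i at which frame i starts a new scene."""
    means = [mean_luminance(frame) for frame in frames]
    return [i for i in range(1, len(means))
            if abs(means[i] - means[i - 1]) > threshold]


# Tiny 2x2 frames: two dark frames, two bright frames, then a dark frame again.
dark = [[10, 12], [11, 13]]
bright = [[200, 205], [210, 202]]
frames = [dark, dark, bright, bright, dark]
divisions = detect_scene_divisions(frames)  # [2, 4]
```

The resulting indices play the role of the scene division information passed to both the image selection and the voice recognition.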
[0070]
Based on the scene division information from the scene division detection unit 192, the voice recognition unit 190 acquires, from the moving image acquired by the moving image acquisition unit 110, the audio information attached to the same scene or a specific scene, performs voice recognition on the acquired audio information, and outputs text information to the summary creation unit 191 as the recognition result. The same scene or specific scene may be designated by the user or automatically, but it must match the scene handled by the image selection unit 150.
[0071]
The summary creation unit 191 creates a summary from the text information output by the voice recognition unit 190 and outputs the created summary to the image layout unit 170 as text information. The summary is created by a conventional method.
Based on the scene division information from the scene division detection unit 192, the image selection unit 150 selects, from among those still images of the moving image acquired by the moving image acquisition unit 110 that belong to the same scene or a specific scene, a predetermined number of still images in descending order of the evaluation value calculated by the evaluation value calculation unit 140. The still images are further selected so as to satisfy the image selection condition input via the image selection condition input unit 320 described later. The same scene or specific scene may be designated by the user or automatically, but it must match the scene handled by the voice recognition unit 190.
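The selection step amounts to filtering the still images by scene and taking the top entries by evaluation value. A minimal sketch follows; the tuple layout (frame index, scene number, evaluation value) and the sample values are hypothetical.

```python
def select_images(frames, scene_id, count):
    """Pick `count` frames of the given scene in descending order of evaluation value.

    `frames` is a list of (frame_index, scene_id, evaluation_value) tuples.
    """
    in_scene = [frame for frame in frames if frame[1] == scene_id]
    return sorted(in_scene, key=lambda frame: frame[2], reverse=True)[:count]


frames = [
    (0, 0, 0.31),
    (1, 0, 0.85),
    (2, 0, 0.62),
    (3, 1, 0.97),  # belongs to another scene, so it is ignored for scene 0
    (4, 0, 0.40),
]
selected = select_images(frames, scene_id=0, count=2)
```

Here `count` corresponds to the number of images given as the image selection condition, and `scene_id` to the scene that the voice recognition also targets.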
[0072]
The template storage unit 160 stores a plurality of different templates as shown in FIG. 10. FIG. 10 is a diagram showing the structure of the templates.
Each template is configured by arranging, in a layout area, a plurality of image storage frames for storing selected images and character storage frames for storing text information, and each image storage frame is assigned a priority that governs which selected image is placed in it. In the template shown in FIG. 10A, the image storage frame 501a with the highest priority "1" is arranged large in the upper half of the layout area, and the image storage frames 502a to 505a with priorities "2" to "5" are arranged small in the four sections of the lower half of the layout area. This means that the selected image with the highest evaluation value is stored in the image storage frame 501a, and the four selected images with the next highest evaluation values are stored in the image storage frames 502a to 505a, respectively. In the template shown in FIG. 10A, character storage frames 501b to 505b corresponding to the image storage frames 501a to 505a are arranged so as to overlap the corresponding image storage frames.
[0073]
Further, in the template shown in FIG. 10B, the image storage frame 511 with priority "1" is arranged large in the left half of the layout area, and the image storage frames 512 to 514 with priorities "2" to "4" are arranged small in three sections of the right half of the layout area. This means that the selected image with the highest evaluation value is stored in the image storage frame 511, and the next three selected images are stored in the image storage frames 512 to 514 in descending order of evaluation value. In the template shown in FIG. 10B, a character storage frame 515 is arranged below the image storage frame 511.
[0074]
In the template shown in FIG. 10C, the layout area is divided into four vertically and two horizontally, and the image storage frames 521 to 528 with priorities "1" to "8" are arranged in the sections from left to right and then from top to bottom. This means that the eight selected images are stored in the image storage frames 521 to 528 in descending order of evaluation value. In the template shown in FIG. 10C, a character storage frame 529 is arranged below the image storage frames 527 and 528.
[0075]
Returning to FIG. 8, the image layout unit 170 selects a template from the template storage unit 160 that satisfies a layout condition input by a layout condition input unit 330 described later. Then, the selected image selected by the image selection unit 150 is stored in the image storage frame of the selected template based on the evaluation value calculated by the evaluation value calculation unit 140 corresponding to the selected image. Specifically, the selected image is stored in an image storage frame having a priority order that matches the evaluation value. Further, the text information from the summary creating unit 191 is stored in the character storage frame in the selected template. This lays out the selected image and the summary.
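The placement rule, in which the image with the highest evaluation value goes into the frame with priority "1" and so on, can be sketched as follows. The dictionary representation of a template and the image identifiers are hypothetical; only the frame numbers 511 to 514 follow the template of FIG. 10B.

```python
def fill_template(frame_priorities, selected, summary):
    """Assign selected images to image storage frames by matching rank to priority.

    `frame_priorities` maps frame id -> priority (1 is highest).
    `selected` is a list of (image_id, evaluation_value) pairs.
    """
    ranked = sorted(selected, key=lambda item: item[1], reverse=True)
    frames_by_priority = sorted(frame_priorities, key=frame_priorities.get)
    placement = {frame: image for frame, (image, _) in zip(frames_by_priority, ranked)}
    return {"images": placement, "text": summary}


# Image storage frames of the FIG. 10B template with their priorities.
template_b = {511: 1, 512: 2, 513: 3, 514: 4}
selected = [("f12", 0.4), ("f03", 0.9), ("f27", 0.7), ("f31", 0.6)]
layout = fill_template(template_b, selected, "summary text")
```

If fewer images than frames are selected, the lowest-priority frames simply stay empty in this sketch; how the apparatus handles that case is not stated in the text.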
[0076]
The printing unit 180 prints the layout result of the image layout unit 170 with a printer or the like. This allows the user to check the layout result of the image layout unit 170 on paper.
The display unit 185 displays the layout result of the image layout unit 170 on a display or the like. Thus, the user can check the layout result of the image layout unit 170 on the screen.
[0077]
As shown in FIG. 8, the learning unit 200 comprises an image designation input unit 210 for inputting the user's designation of images from among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110, an image feature information extraction unit 220 that extracts image feature information from those still images, among the plurality constituting the moving image, that are covered by the designation input via the image designation input unit 210, and a feature learning unit 230 that learns the features of images that suit the user's preference and of other specific images based on the image feature information extracted by the image feature information extraction unit 220.
[0078]
The image feature information extraction unit 220 has the same function as the image feature information extraction unit 120. For the still images covered by the designation input via the image designation input unit 210, among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110, it extracts as image feature information the induction field strength M_xy, the complexity C_i of the equipotential lines, the induction field energy E, and the luminance values N_1xy, N_2xy, and N_3xy of the three primary colors of each pixel constituting the image.
[0079]
The feature learning unit 230 selects, from the user model storage unit 130, the user model that satisfies the evaluation value calculation condition input via the evaluation value calculation condition input unit 310 described later. Then, from the image feature information extracted by the image feature information extraction unit 220, it obtains the feature quantities M_xy, N_1xy, N_2xy, N_3xy, C_i, and E, and based on them trains the neural network 400 of the selected user model by the known back-propagation method or another learning method. In the training, the coupling coefficients W_ij and W_jk are determined so that, when the feature quantities extracted from the still images covered by the designation input via the image designation input unit 210 are input to the neural network 400, a relatively high preference value is output from the output layer O_k. For example, when the back-propagation method is used, the coupling coefficients W_ij and W_jk are determined through forward and backward computations.
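To illustrate how back-propagation can raise the preference value for a designated image, the following sketch trains a tiny network on a single feature vector with target output 1.0. The sigmoid activation, the squared-error objective, the learning rate, the number of iterations, and all numeric values are assumptions; the specification names back-propagation only as one possible learning method.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def forward(x, w_ij, w_jk):
    """Return the intermediate activations and the output preference value."""
    hidden = [sigmoid(sum(xi * w for xi, w in zip(x, row))) for row in w_ij]
    return hidden, sigmoid(sum(h * w for h, w in zip(hidden, w_jk)))


def train_step(x, target, w_ij, w_jk, rate=0.5):
    """One back-propagation update (squared error, single example), done in place."""
    hidden, out = forward(x, w_ij, w_jk)          # forward computation
    delta_out = (out - target) * out * (1.0 - out)
    for j, h in enumerate(hidden):                # backward computation
        delta_h = delta_out * w_jk[j] * h * (1.0 - h)  # uses w_jk before its update
        w_jk[j] -= rate * delta_out * h
        for i, xi in enumerate(x):
            w_ij[j][i] -= rate * delta_h * xi


# Hypothetical initial coefficients and the features of a user-designated image.
w_ij = [[0.1, -0.2, 0.05], [0.3, 0.1, -0.1]]
w_jk = [0.2, -0.1]
designated = [0.8, 0.3, 0.5]
before = forward(designated, w_ij, w_jk)[1]
for _ in range(200):
    train_step(designated, 1.0, w_ij, w_jk)
after = forward(designated, w_ij, w_jk)[1]
```

After training, `after` is higher than `before`, which is the property the text requires for images covered by the user's designation. A practical learner would also present non-designated images with a low target so that the network does not simply output high values for everything.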
[0080]
As shown in FIG. 8, the condition input unit 300 includes an evaluation value calculation condition input unit 310 for inputting an evaluation value calculation condition relating to the calculation of evaluation values, an image selection condition input unit 320 for inputting an image selection condition relating to image selection, and a layout condition input unit 330 for inputting layout conditions relating to the layout.
The evaluation value calculation condition input unit 310 is configured to input, as an evaluation value calculation condition, content specifying one of the user models in the user model storage unit 130. For example, when the user designates “an image that suits the user's preference” from among “an image that suits the user's preference”, “an image with impact”, and “an image of a specific style”, content specifying the user model corresponding to that designation (the user model indicating the characteristics of images suited to the user's preference) is input as the evaluation value calculation condition. In this case, the image selection unit 150 selects images suited to the user's preference, and the image layout unit 170 determines a layout appropriate for laying out those images.
[0081]
The image selection condition input unit 320 is configured to input, as an image selection condition, content specifying the number of images to be selected. For example, when “10” is specified as the number of selected images, the image selection unit 150 selects ten still images, in descending order of the evaluation value calculated by the evaluation value calculation unit 140, from among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110.
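The selection rule in this example can be sketched as a simple sort. The function name select_top_images, the frame identifiers, and the evaluation values below are hypothetical and serve only to illustrate picking the specified number of images in descending order of evaluation value.

```python
def select_top_images(evaluations, count):
    """Return the `count` frame ids with the highest evaluation values."""
    ranked = sorted(evaluations.items(), key=lambda item: item[1], reverse=True)
    return [frame_id for frame_id, _ in ranked[:count]]


# Hypothetical evaluation values for four still images.
scores = {"frame_001": 0.42, "frame_002": 0.91, "frame_003": 0.67, "frame_004": 0.15}
top_two = select_top_images(scores, 2)
```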
[0082]
The layout condition input unit 330 is configured to input, as layout conditions, whether to perform direct printing, whether to perform print preview, the number of pages to be printed, and content specifying one of the templates in the template storage unit 160. For example, when direct printing and print preview are both to be performed, “3” is specified as the number of print pages, and “template 1” is specified as the template, the image layout unit 170 lays out the selected images based on template 1 with three pages as the upper limit, the display unit 185 shows a print preview of the layout result of the image layout unit 170, and the printing unit 180 then directly prints that layout result.
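One way to picture the layout conditions and the page upper limit is the sketch below. The LayoutConditions class, the paginate function, and the assumption that a template holds a fixed number of image frames per page are illustrative only and are not stated in the embodiment.

```python
from dataclasses import dataclass


@dataclass
class LayoutConditions:
    direct_print: bool    # whether to print directly
    print_preview: bool   # whether to show a print preview first
    max_pages: int        # upper limit on the number of printed pages
    template_name: str    # which stored template to use


def paginate(images, frames_per_page, conditions):
    """Split images into pages, dropping any that exceed the page limit."""
    capacity = frames_per_page * conditions.max_pages
    kept = images[:capacity]
    return [kept[i:i + frames_per_page] for i in range(0, len(kept), frames_per_page)]


conditions = LayoutConditions(direct_print=True, print_preview=True,
                              max_pages=3, template_name="template 1")
# Ten hypothetical selected images, with an assumed three frames per page.
pages = paginate([f"img{i}" for i in range(10)], frames_per_page=3, conditions=conditions)
```

With ten images and three frames per page, only the first nine images fit within the three-page limit.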
[0083]
Next, the operation of the present embodiment will be described.
First, the case of training the neural network 400 will be described.
When training the neural network 400 for the user model showing the characteristics of images suited to the user's preference, the user first provides the moving image storage medium 50, which stores a plurality of moving images, to the moving image acquisition unit 110. When the moving image storage medium 50 is provided, the moving image acquisition unit 110 acquires one of the moving images from it.
[0084]
Next, the user designates “an image suitable for the user's preference” as the evaluation value calculation condition, and designates, from among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110, several that match his or her preference. These designations are input to the evaluation value calculation condition input unit 310 and the image designation input unit 210, respectively.
When “an image that matches the user's preference” is specified, the feature learning unit 230 selects a user model that matches the user's preference from the user model storage unit 130 as a learning target.
[0085]
Meanwhile, when the designation of still images is input, the image feature information extraction unit 220 extracts the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i and E as image feature information from those of the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110 that relate to the input designation. The feature learning unit 230 then obtains the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i and E from the extracted image feature information and, based on the obtained feature amounts, trains the neural network 400 relating to the selected user model. This series of processing is performed for all designated still images.
[0086]
When training the neural network 400 for the user model showing the characteristics of images having an impact, it suffices, in the same manner as described above, to designate “an image having an impact” and to designate several impactful still images from among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110. Of course, images need not be designated manually; image feature information common to impactful images may be obtained, and images having the same or similar image feature information may be designated automatically.
[0087]
When training the neural network 400 for the user model showing the characteristics of images of a specific style, it suffices, in the same manner as described above, to designate “an image of a specific style” and to designate several still images of that style from among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110. Of course, images need not be designated manually; image feature information common to images of the specific style may be obtained, and images having the same or similar image feature information may be designated automatically.
[0088]
Next, a case of laying out an image will be described.
When laying out an image suitable for the user's preference, the user first provides the moving image storage medium 50 storing a plurality of moving images to the moving image acquisition unit 110. When the moving image storage medium 50 is provided, one of the moving images is obtained from the provided moving image storage medium 50 by the moving image obtaining unit 110. Then, the scene division detection unit 192 detects a scene division from the acquired moving image, and outputs scene division information indicating the scene division to the image selection unit 150 and the voice recognition unit 190.
[0089]
Next, the user specifies “an image suitable for the user's preference” as an evaluation value calculation condition, and specifies a desired template as a layout condition. These designations can be omitted, for example, by setting default settings. At the same time, if necessary, image selection conditions and other layout conditions can be specified.
[0090]
When the “image suitable for the user's preference” is designated, the evaluation value calculation unit 140 selects a user model suitable for the user's preference from the user model storage unit 130. This user model is used for calculating an evaluation value. When a template is specified, the template specified by the user is selected by the image layout unit 170 from the template storage unit 160. This template is used for the layout of the selected image.
[0091]
Meanwhile, the image feature information extraction unit 120 extracts, for each still image constituting the moving image acquired by the moving image acquisition unit 110, the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i and E as image feature information. Next, the evaluation value calculation unit 140 obtains the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i and E from the extracted image feature information, inputs the obtained feature amounts to the neural network 400 relating to the selected user model, and takes the value output from the neural network 400 in response as the evaluation value. This series of processing is performed on all the still images constituting the moving image acquired by the moving image acquisition unit 110.
[0092]
Next, based on the scene division information from the scene division detection unit 192, the image selection unit 150 selects a predetermined number of still images, in descending order of evaluation value, from those of the plurality of still images constituting the acquired moving image that belong to the same scene or a specific scene.
Meanwhile, based on the scene division information from the scene division detection unit 192, the voice recognition unit 190 acquires from the moving image the voice information accompanying the same scene or the specific scene, performs voice recognition based on the acquired voice information, and outputs text information to the summary creation unit 191 as the recognition result. The summary creation unit 191 then creates a summary based on the text information from the voice recognition unit 190, and outputs the created summary to the image layout unit 170 as text information.
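The scene-by-scene image selection described in this paragraph might be sketched as follows. The representation of a still image as a (time, evaluation value) pair, the representation of a scene as a half-open (start, end) interval in seconds, and the name select_per_scene are assumptions for illustration.

```python
def select_per_scene(frames, scenes, count):
    """Pick the `count` best frames of each scene.

    frames: list of (time_sec, evaluation_value) pairs.
    scenes: list of (start_sec, end_sec) intervals, end exclusive.
    """
    selected = {}
    for start, end in scenes:
        in_scene = [frame for frame in frames if start <= frame[0] < end]
        in_scene.sort(key=lambda frame: frame[1], reverse=True)
        selected[(start, end)] = in_scene[:count]
    return selected


# Hypothetical frames and two detected scenes.
frames = [(1, 0.3), (4, 0.8), (9, 0.5), (12, 0.9), (15, 0.2), (18, 0.6)]
scenes = [(0, 10), (10, 20)]
picked = select_per_scene(frames, scenes, 2)
```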
[0093]
When the images have been selected and the summary has been created, the image layout unit 170 lays out the selected images based on their evaluation values and lays out the summary. In the layout, each selected image is stored in the image storage frame of the selected template whose assigned priority matches the image's evaluation value, and the text information from the summary creation unit 191 is stored in the character storage frame of the selected template. If print preview is specified as a layout condition, the display unit 185 previews the layout result of the image layout unit 170 on a display or the like; if direct printing is specified as a layout condition, the printing unit 180 prints the layout result of the image layout unit 170 directly on a printer or the like.
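The frame assignment can be illustrated with a short sketch. It assumes that priority 1 denotes the most prominent image storage frame and that the image with the highest evaluation value goes into it; the embodiment only says that the priority matches the evaluation value, so this pairing rule, the function name fill_template, and the frame names are hypothetical.

```python
def fill_template(frame_priorities, selected, summary):
    """Assign images to frames and place the summary in the character frame.

    frame_priorities: {frame_name: priority}, 1 = most prominent (assumed).
    selected: list of (image_name, evaluation_value) pairs.
    """
    frames_by_priority = sorted(frame_priorities, key=frame_priorities.get)
    images_by_value = sorted(selected, key=lambda item: item[1], reverse=True)
    layout = {frame: image
              for frame, (image, _) in zip(frames_by_priority, images_by_value)}
    layout["text"] = summary  # character storage frame
    return layout


layout = fill_template({"small": 2, "large": 1},
                       [("a.png", 0.4), ("b.png", 0.9)],
                       "summary text")
```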
[0094]
When laying out an image having an impact, it is only necessary to specify “an image having an impact” and a desired template as a layout condition in the same manner as described above.
When laying out an image in a specific style, in the same manner as described above, it is only necessary to specify “image in a specific style” and to specify a desired template as a layout condition.
[0095]
As described above, the present embodiment includes the voice recognition unit 190 that performs voice recognition based on voice information attached to a moving image and outputs text information as the recognition result, the image selection unit 150 that selects an image from among the plurality of still images constituting the moving image, and the image layout unit 170 that lays out the text information from the voice recognition unit 190 and the selected image selected by the image selection unit 150.
[0096]
As a result, the voice recognition result and the selected image are laid out, so that not only the selected image but also the sound accompanying the moving image can be reflected in the layout. Therefore, it is possible to create an edited material with relatively rich contents as compared with the related art. Further, in the present embodiment, the image selecting section 150 selects an image suitable for the user's preference from a plurality of still images.
[0097]
As a result, it is possible to create a compilation having contents relatively suited to the user's preference.
Furthermore, the present embodiment includes the summary creation unit 191 that creates a summary based on the text information from the voice recognition unit 190, and the image layout unit 170 lays out the summary created by the summary creation unit 191 together with the selected image.
[0098]
As a result, a summary of the audio information attached to the moving image can be laid out together with the image, so that a more complete edited product can be created, and the portion of the layout devoted to the audio information becomes relatively concise and easy to read. Further, the present embodiment includes the scene division detection unit 192 that detects scene divisions from the moving image, and the image selection unit 150, based on the scene division information from the scene division detection unit 192, selects a predetermined number of still images, in descending order of the evaluation value calculated by the evaluation value calculation unit 140, from those of the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110 that belong to the same scene or a specific scene.
[0099]
Thus, when a moving image is composed of a plurality of scenes, images can be selected for each scene, so that relatively fine-grained editing is possible.
Further, the present embodiment includes the scene division detection unit 192 that detects scene divisions from the moving image, and the voice recognition unit 190, based on the scene division information from the scene division detection unit 192, acquires from the moving image acquired by the moving image acquisition unit 110 the voice information accompanying the same scene or a specific scene, performs voice recognition based on the acquired voice information, and outputs text information to the summary creation unit 191 as the recognition result.
[0100]
Thus, when the moving image is composed of a plurality of scenes, voice information can be acquired for each scene, so that relatively fine-grained editing is possible.
Further, the present embodiment includes the image feature information extraction unit 120 that extracts, for each still image constituting the moving image, image feature information indicating the features of the image; the evaluation value calculation unit 140 that calculates an evaluation value of the image based on the image feature information extracted by the image feature information extraction unit 120; the image selection unit 150 that selects an image from among the plurality of still images constituting the moving image; and the image layout unit 170 that determines the layout of the selected image selected by the image selection unit 150 based on the evaluation value calculated by the evaluation value calculation unit 140.
[0101]
Thus, the layout can be determined according to the content of the image, so that a layout with a relatively good appearance can be realized according to the content of the image.
Further, in the present embodiment, a user model storage unit 130 for storing image feature information suitable for the user's preference as a user model is provided, and the evaluation value calculation unit 140 includes an image extracted by the image feature information extraction unit 120. The evaluation value is calculated based on the characteristic information and the user model in the user model storage unit 130.
[0102]
Thus, the layout can be determined so as to be a layout relatively suited to the user's preference, so that a layout that is relatively attractive to the user can be realized. Further, it is possible to select an image having a layout relatively suited to the user's preference.
Further, in the present embodiment, the user model includes an induction field feature amount indicating the strength M_xy of the visual induction field for an image that matches the user's preference, and the image feature information extraction unit 120 calculates, for each still image, the strength M_xy of the visual induction field of that still image and extracts image feature information including the induction field feature amount indicating the calculated strength M_xy.
[0103]
As a result, the strength M_xy of the visual induction field, which is based on physiological and psychological findings, is used to determine the layout, so that the layout can be determined so as to be more suitable for the user's preference. Therefore, it is possible to realize a layout that is more attractive to the user. Further, it is possible to select an image having a layout more suitable for the user's preference.
[0104]
Furthermore, in the present embodiment, the user model includes a feature amount indicating the complexity C_i of equipotential lines in the visual induction field for an image that matches the user's preference, and the image feature information extraction unit 120 calculates, for each still image, the visual induction field of that still image, obtains equipotential lines from the calculated visual induction field, and extracts image feature information including a feature amount indicating the complexity C_i of the obtained equipotential lines.
[0105]
As a result, the complexity C_i of equipotential lines in the visual induction field, which is based on physiological and psychological findings, is used to determine the layout, so that the layout can be determined so as to be more suitable for the user's preference. Therefore, it is possible to realize a layout that is more attractive to the user. Further, it is possible to select an image having a layout more suitable for the user's preference.
[0106]
Furthermore, in the present embodiment, the user model includes an energy feature amount indicating the energy E of the visual induction field for an image that matches the user's preference, and the image feature information extraction unit 120 calculates, for each still image, the energy E of the visual induction field of that still image and extracts image feature information including the energy feature amount indicating the calculated energy E.
[0107]
In this way, by using the energy E of the visual induction field based on physiological and psychological knowledge to determine the layout, the layout can be determined so as to be more suitable for the user's preference. Therefore, it is possible to realize a layout that is more attractive to the user. Further, it is possible to select an image having a layout more suitable for the user's preference.
[0108]
Further, in the present embodiment, image selecting section 150 is configured to select an image from a plurality of still images constituting a moving image based on the evaluation value calculated by evaluation value calculating section 140.
Thus, an image can be selected according to the evaluation value related to the feature of the image, so that an image having a relatively good appearance can be selected.
[0109]
Furthermore, the present embodiment includes the evaluation value calculation condition input unit 310 for inputting an evaluation value calculation condition relating to the calculation of evaluation values, and the evaluation value calculation unit 140 calculates the evaluation value based on the evaluation value calculation condition input by the evaluation value calculation condition input unit 310 and the image feature information extracted by the image feature information extraction unit 120.
Thereby, the calculation condition of the evaluation value can be specified, so that the degree of freedom of the layout can be improved.
[0110]
Further, the present embodiment includes the image selection condition input unit 320 for inputting an image selection condition relating to image selection, and the image selection unit 150 selects an image from among the plurality of still images constituting the moving image based on the image selection condition input by the image selection condition input unit 320.
Thereby, since the selection condition of the image can be designated, the degree of freedom of the image selection can be improved.
[0111]
Further, the present embodiment includes the moving image acquisition unit 110 that acquires a moving image, and the image selection unit 150 selects an image from among the plurality of still images constituting the moving image acquired by the moving image acquisition unit 110.
Thus, an external image can be handled as a layout target.
In the first embodiment, the image selection unit 150 corresponds to the image selection unit of Invention 1, 2, 6, or 18, and the selection by the image selection unit 150 corresponds to the image selection step of Invention 19. The image layout unit 170 corresponds to the layout unit of Invention 1, 3, 5, 9, 17, or 18, and the layout by the image layout unit 170 corresponds to the layout step of Invention 19. The printing unit 180 corresponds to the printing unit of Invention 17, and the acquisition by the voice recognition unit 190 corresponds to the supplementary information acquisition step of Invention 19.
[0112]
In the first embodiment, the summary creation unit 191 corresponds to the summary creation unit of Invention 5, the scene division detection unit 192 corresponds to the scene division detection unit of Invention 6 or 7, and the audio information corresponds to the supplementary information of Invention 1, 3, 7, 9, 18, or 19.
Next, a second embodiment of the present invention will be described with reference to the drawings. FIGS. 11 and 12 are diagrams showing an image layout device, an image layout program, and an image layout method according to a second embodiment of the present invention. Hereinafter, only the portions different from the first embodiment will be described, and the overlapping portions will be denoted by the same reference numerals and description thereof will be omitted.
[0113]
In the present embodiment, the image layout apparatus, the image layout program, and the image layout method according to the present invention are applied, as shown in FIG. 11, to a case in which an image is selected from a moving image captured by a digital video camera and the selected image is laid out. The present embodiment differs from the first embodiment in that a URL for referencing the moving image and the summary is laid out instead of the summary itself.
[0114]
First, the configuration of the image layout apparatus according to the present invention will be described with reference to FIG. FIG. 11 is a functional block diagram showing the configuration of the image layout device according to the present invention.
The image layout apparatus according to the present invention is communicably connected to the Internet and, as shown in FIG. 11, includes a layout unit 100, a learning unit 200, a condition input unit 300, and an information providing unit 600 that provides a moving image or a summary.
[0115]
The layout unit 100 includes the moving image acquisition unit 110, the image feature information extraction unit 120, the user model storage unit 130, the evaluation value calculation unit 140, the image selection unit 150, the template storage unit 160, an image layout unit 172 that lays out the selected image selected by the image selection unit 150, the printing unit 180, the display unit 185, the voice recognition unit 190, a summary creation unit 193 that creates a summary based on the recognition result of the voice recognition unit 190, a scene division detection unit 194 that detects scene divisions from the moving image acquired by the moving image acquisition unit 110, a unique key generation unit 195 that generates a unique key, a moving image generation unit 196 that generates a moving image, and a moving image storage unit 197 that stores the moving image and the summary in association with the unique key.
[0116]
The moving image storage unit 197 stores, as files, the moving image generated by the moving image generation unit 196 and the summary created by the summary creation unit 193, and also stores a unique key correspondence table 700 in which the correspondence among the moving image, the summary, and the scene division information is registered for each unique key. FIG. 12 is a diagram showing the data structure of the unique key correspondence table 700.
[0117]
As shown in FIG. 12, one record is registered in the unique key correspondence table 700 for each unique key. Each record comprises a field 702 for registering the unique key generated by the unique key generation unit 195, a field 704 for registering the file name and storage location of the moving image file storing the moving image, a field 706 for registering the file name of the text file storing the summary, and a field 708 for registering the scene division information. The field 708 further comprises a field for registering the start time of the scene and a field for registering the end time of the scene.
[0118]
In the example of FIG. 12, the first record registers “0010001” as the unique key, “d:\files\001.mpeg” as the file name and storage location of the moving image file, “Text1” as the file name of the text file, and “0:10 to 0:20” as the scene division information. This means that, of the moving image acquired by the moving image acquisition unit 110, the portion relating to the scene from start time 0:10 to end time 0:20 is stored in the moving image file “d:\files\001.mpeg”, that a summary of the audio information attached to that portion is stored in the text file “Text1”, and that the moving image and the summary can be identified by the unique key “001001”. Therefore, when the image layout apparatus is accessed with the unique key “001001”, the corresponding moving image and summary can be obtained.
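The record structure of the unique key correspondence table 700 can be pictured as a keyed collection of records. In the sketch below, the class name KeyRecord, the attribute names, and the use of an in-memory dictionary are assumptions; only the field numbers and the example values registered in the first record come from the description.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class KeyRecord:
    unique_key: str    # field 702
    movie_path: str    # field 704: file name and storage location
    summary_file: str  # field 706: text file holding the summary
    scene_start: str   # field 708: start time of the scene
    scene_end: str     # field 708: end time of the scene


table = {}  # hypothetical in-memory stand-in for table 700


def register(record):
    """Register one record per unique key."""
    table[record.unique_key] = record


# Values as registered in the first record of the example.
first = KeyRecord("0010001", r"d:\files\001.mpeg", "Text1", "0:10", "0:20")
register(first)
found = table[first.unique_key]
```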
[0119]
Returning to FIG. 11, the scene division detection unit 194 detects scene divisions from the moving image acquired by the moving image acquisition unit 110, and outputs scene division information indicating the scene divisions to the image selection unit 150, the voice recognition unit 190, and the moving image generation unit 196.
The unique key generation unit 195 generates a unique key, and outputs the generated unique key to the summary creation unit 193 and the moving image generation unit 196. In the example of FIG. 12, the unique key is generated by using the number assigned to the moving image file as the upper three digits and a serial number assigned in the order of unique key generation as the lower three digits.
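A minimal sketch of the key scheme just described, assuming a single serial counter shared across all moving image files and zero padding to three digits. The class name UniqueKeyGenerator is hypothetical, and the sketch does not handle numbers beyond three digits.

```python
import itertools


class UniqueKeyGenerator:
    """Upper three digits: movie file number; lower three: serial number."""

    def __init__(self):
        # Serial numbers are handed out in order of key generation.
        self._serial = itertools.count(1)

    def generate(self, movie_number):
        return f"{movie_number:03d}{next(self._serial):03d}"


generator = UniqueKeyGenerator()
key_a = generator.generate(1)
key_b = generator.generate(1)
```

Under these assumptions the first two keys generated for moving image file number 1 are "001001" and "001002".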
[0120]
The summary creation unit 193 creates a summary based on the text information from the voice recognition unit 190, and stores the created summary in the moving image storage unit 197 in association with the unique key generated by the unique key generation unit 195.
Based on the scene division information from the scene division detection unit 194, the moving image generation unit 196 acquires, from the moving image acquired by the moving image acquisition unit 110, the portion relating to the same scene or a specific scene, and stores the acquired moving image and the scene division information in the moving image storage unit 197 in association with the unique key generated by the unique key generation unit 195.
[0121]
The image layout unit 172 selects, from the template storage unit 160, a template that satisfies the layout condition input by the layout condition input unit 330. It then stores the selected image selected by the image selection unit 150 in an image storage frame of the selected template based on the evaluation value calculated for that image by the evaluation value calculation unit 140. Specifically, the selected image is stored in the image storage frame whose priority matches the evaluation value. Further, the unique key corresponding to the selected image is read from the moving image storage unit 197, and a URL that includes the read unique key (and that indicates the network address of the image layout apparatus) is stored in the character storage frame of the selected template. The selected image and the URL are thereby laid out.
[0122]
As shown in FIG. 11, the information providing unit 600 includes a moving image providing unit 610 that provides a moving image from the moving image storage unit 197, and a summary providing unit 620 that provides a summary from the moving image storage unit 197.
When there is an access based on the URL, the moving image providing unit 610 reads the moving image corresponding to the unique key included in the URL from the moving image storage unit 197, and provides the read moving image to the access source.
[0123]
When there is an access based on the URL, the summary providing unit 620 reads the text information of the summary corresponding to the unique key included in the URL from the moving image storage unit 197, and provides the read text information to the access source.
Next, the operation of the present embodiment will be described.
When laying out an image suitable for the user's preference, the user first provides the moving image storage medium 50, which stores a plurality of moving images, to the moving image acquisition unit 110. When the moving image storage medium 50 is provided, the moving image acquisition unit 110 acquires one of the moving images from it. Then, the scene division detection unit 194 detects scene divisions from the acquired moving image, and outputs scene division information indicating the scene divisions to the image selection unit 150, the voice recognition unit 190, and the moving image generation unit 196.
[0124]
Next, the user specifies “an image suitable for the user's preference” as an evaluation value calculation condition, and specifies a desired template as a layout condition. These designations can be omitted, for example, by setting default settings. At the same time, if necessary, image selection conditions and other layout conditions can be specified.
[0125]
When the “image suitable for the user's preference” is designated, the evaluation value calculation unit 140 selects a user model suitable for the user's preference from the user model storage unit 130. This user model is used for calculating an evaluation value. When a template is specified, the image layout unit 172 selects a template specified by the user from the template storage unit 160. This template is used for the layout of the selected image.
[0126]
Meanwhile, the image feature information extraction unit 120 extracts, for each still image constituting the moving image acquired by the moving image acquisition unit 110, the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i and E as image feature information. Next, the evaluation value calculation unit 140 obtains the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i and E from the extracted image feature information, inputs the obtained feature amounts to the neural network 400 relating to the selected user model, and takes the value output from the neural network 400 in response as the evaluation value. This series of processing is performed on all the still images constituting the moving image acquired by the moving image acquisition unit 110.
[0127]
Next, based on the scene division information from the scene division detection unit 194, the image selection unit 150 selects a predetermined number of still images, in descending order of evaluation value, from those of the plurality of still images constituting the acquired moving image that belong to the same scene or a specific scene.
Meanwhile, based on the scene division information from the scene division detection unit 194, the voice recognition unit 190 acquires from the moving image the voice information accompanying the same scene or the specific scene, performs voice recognition based on the acquired voice information, and outputs text information to the summary creation unit 193 as the recognition result. Next, the unique key generation unit 195 generates a unique key, the summary creation unit 193 creates a summary based on the text information from the voice recognition unit 190, and the text information of the created summary is stored in the moving image storage unit 197 in association with the generated unique key. In addition, based on the scene division information from the scene division detection unit 194, the moving image generation unit 196 acquires from the acquired moving image the portion relating to the same scene or the specific scene, and stores the acquired moving image and the scene division information in the moving image storage unit 197 in association with the generated unique key.
[0128]
When an image has been selected, a summary created, and a moving image generated, the image layout unit 172 lays out the selected image based on the evaluation value and also lays out the URL. In the layout, within the selected template, each selected image is stored in the image storage frame whose assigned priority matches its evaluation value. In addition, the unique key related to the selected image is read from the moving image storage unit 197, and a URL including the read unique key (for example, http://www.abcd.co.jp/files.cgi?id=0010001) is stored in the character storage frame of the selected template. If print preview is specified as a layout condition, the display unit 185 previews the layout result of the image layout unit 172 on a display or the like; if direct printing is specified as a layout condition, the printing unit 180 prints the layout result of the image layout unit 172 directly on a printer or the like.
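The frame assignment and URL construction can be sketched as below. The convention that priority 1 is the highest, the query parameter name `id`, and the function names are assumptions for illustration; the frame numbers are taken from the 500 to 528 range of image storage frames only as sample identifiers.

```python
from urllib.parse import urlencode

def assign_to_frames(selected, frame_priorities):
    """Place the best-rated image in the highest-priority frame, and so on.

    selected: list of (image_id, evaluation_value).
    frame_priorities: {frame_id: priority}, where 1 is assumed to be highest.
    """
    ranked_images = sorted(selected, key=lambda s: -s[1])
    ranked_frames = sorted(frame_priorities, key=frame_priorities.get)
    return {frame: image_id for frame, (image_id, _) in zip(ranked_frames, ranked_images)}

def build_url(base, unique_key):
    """Embed the unique key in the URL stored in the character storage frame."""
    return f"{base}?{urlencode({'id': unique_key})}"

layout = assign_to_frames([("a", 0.3), ("b", 0.8), ("c", 0.5)],
                          {500: 2, 502: 1, 504: 3})
url = build_url("http://www.abcd.co.jp/files.cgi", "0010001")
```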
[0129]
Because the URL is printed on the edited product, a user who has received the edited product on which the layout result is printed can access the image layout apparatus by referring to the URL at his or her own network terminal or the like, and can obtain the moving image and the summary related to the printed images.
In the image layout apparatus, when there is an access based on the URL, the moving image providing unit 610 reads the moving image corresponding to the unique key included in the URL from the moving image storage unit 197 and provides the read moving image to the access source. In addition, the summary providing unit 620 reads the text information of the summary corresponding to the unique key included in the URL from the moving image storage unit 197 and provides the read text information to the access source. The moving image or the summary can be provided, for example, in a homepage format.
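The lookup performed on access can be sketched as follows, assuming the unique key travels in a query parameter named `id` and that storage is a plain dictionary; both are illustrative assumptions, as is the function name `resolve_access`.

```python
from urllib.parse import urlparse, parse_qs

def resolve_access(url, storage):
    """Return the stored record for the unique key in `url`, or None."""
    query = parse_qs(urlparse(url).query)
    keys = query.get("id")
    if not keys:
        return None
    return storage.get(keys[0])

# Hypothetical contents of the moving image storage.
storage = {
    "0010001": {"moving_image": "scene_0010001.mpg", "summary": "Summary of scene 1"},
}
record = resolve_access("http://www.abcd.co.jp/files.cgi?id=0010001", storage)
```

A moving image provider would return `record["moving_image"]` and a summary provider `record["summary"]` to the access source.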
[0130]
When laying out an image having an impact, it is only necessary to specify “an image having an impact” and a desired template as a layout condition in the same manner as described above.
When laying out an image in a specific style, in the same manner as described above, it is only necessary to specify “image in a specific style” and to specify a desired template as a layout condition.
[0131]
As described above, the present embodiment includes the voice recognition unit 190 that performs voice recognition based on voice information attached to a moving image and outputs text information as a result of the recognition, the image selection unit 150 that selects an image from among a plurality of still images forming the moving image, and the image layout unit 172 that lays out the selected image together with a URL including a unique key related to the selected image.
[0132]
Thus, the URL for obtaining the audio information and the like attached to the moving image is laid out together with the selected image, so that not only the selected image but also the sound attached to the moving image can be reflected in the layout. Therefore, it is possible to create an edited material with relatively rich contents as compared with the related art.
Furthermore, the present embodiment includes the moving image storage unit 197, which stores a moving image including a selected image in association with a unique key, and the moving image providing unit 610, which provides a moving image held in the moving image storage unit 197. When there is an access based on the URL, the moving image providing unit 610 reads the moving image corresponding to the unique key included in the URL from the moving image storage unit 197 and provides the read moving image to the access source.
[0133]
Accordingly, by accessing the image layout device based on the URL, a moving image including the selected image can be obtained. In addition, since access to a moving image is made possible by referring to the URL, it is relatively easy to obtain the moving image.
Further, the present embodiment includes the moving image storage unit 197, which stores a summary in association with a unique key, and the summary providing unit 620, which provides a summary held in the moving image storage unit 197. When there is an access based on the URL, the summary providing unit 620 reads the text information of the summary corresponding to the unique key included in the URL from the moving image storage unit 197 and provides the read text information to the access source.
[0134]
Thus, by accessing the image layout device based on the URL, it is possible to obtain a summary related to the selected image. Further, since the summary can be accessed by referring to the URL, it is relatively easy to obtain the summary.
In the second embodiment, the image selecting unit 150 corresponds to the image selecting means of Invention 1, 2, 6, or 18, and the selection by the image selecting unit 150 corresponds to the image selecting step of Invention 19. The image layout unit 172 corresponds to the layout means of Invention 1, 3, 5, 9, 10, 12, 17, or 18, the layout by the image layout unit 172 corresponds to the layout step of Invention 19, the printing unit 180 corresponds to the printing means of Invention 17, and the voice recognition unit 190 corresponds to the supplementary information acquisition means of Invention 1, 3, 5, 7, 11, or 18.
[0135]
In the second embodiment, the acquisition by the voice recognition unit 190 corresponds to the supplementary information acquisition step of Invention 19, the summary creation unit 193 corresponds to the summary creation means of Invention 5, the scene segmentation detection unit 194 corresponds to the scene division detecting means of Invention 6 or 7, and the unique key generation unit 195 corresponds to the identification information generating means of Invention 11 or 13. Also, the moving image generation unit 196 corresponds to the moving image generation means of Invention 13, and the moving image storage unit 197 corresponds to the moving image storage means of Invention 12 or 13 or the supplementary information storage means of Invention 10 or 11. The moving image providing unit 610 corresponds to the moving image providing means of Invention 12.
[0136]
In the second embodiment, the summary providing unit 620 corresponds to the supplementary information providing means of Invention 10, the audio information corresponds to the supplementary information of Invention 1, 3, 7, 9 to 11, 18, or 19, the unique key corresponds to the identification information of Inventions 10 to 13, and the URL corresponds to the reference information of Invention 10, 12, or 14.
In the first and second embodiments, the speech recognition unit 190 that performs speech recognition based on speech information attached to a moving image and outputs text information as a result of the recognition is provided. However, the present invention is not limited to this. When text information such as a clip is attached to a moving image, a text information acquisition unit that acquires the text information attached to the moving image from the moving image may be provided instead of or together with the voice recognition unit 190. In this case, the text information acquired by the text information acquisition unit may be handled in the same manner as the text information from the speech recognition unit 190.
[0137]
In this case, the text information acquisition unit corresponds to the supplementary information acquisition means of Invention 4, the image layout units 170 and 172 correspond to the layout means of Invention 4, and the text information corresponds to the supplementary information of Invention 4.
In the second embodiment, the URL is adopted as the information for referring to the moving image or the summary. However, the present invention is not limited to this, and a barcode may be adopted instead.
[0138]
This makes it possible to access the summary or the moving image by referring to the barcode, so that it is relatively easy to obtain the summary or the moving image.
In the second embodiment, the URL including the unique key and the network address of the image layout device is laid out. However, the present invention is not limited to this, and the URL may further include advertisement information.
[0139]
In this case, the URL corresponds to the reference information of Invention 16.
In the first and second embodiments, the scene segmentation detecting units 192 and 194 output the scene segmentation information to the speech recognition unit 190, and the speech recognition unit 190, based on the scene segmentation information from the scene segmentation detection unit 192, acquires from the moving image the audio information accompanying the same scene or a specific scene among the moving images acquired by the moving image acquisition unit 110, performs speech recognition based on the acquired audio information, and outputs text information to the summary creating unit 191 as a recognition result. However, the present invention is not limited to this. The scene segmentation detecting units 192 and 194 may output the scene segmentation information to the summary creating units 191 and 193, and the summary creating units 191 and 193 may create, based on the scene segmentation information from the scene segmentation detection unit 192, a summary from the text information relating to the same scene or a specific scene among the moving images acquired by the moving image acquisition unit 110.
[0140]
Thus, when a moving image is composed of a plurality of scenes, a summary can be created for each scene, so that editing can be performed in relatively fine detail.
In this case, the scene division detection units 192 and 194 correspond to the scene division detecting means of Invention 8, the speech recognition unit 190 corresponds to the supplementary information acquisition means of Invention 8, and the summary creation units 191 and 193 correspond to the summary creating means of Invention 8.
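Per-scene summary creation can be sketched as grouping time-stamped recognized text by scene start time and then summarizing each group. The timestamped segment format, the names `group_text_by_scene` and `summarize`, and above all the "keep the first sentence" rule are placeholders assumed for illustration; the summarization method itself is not specified here.

```python
import bisect

def group_text_by_scene(segments, scene_starts):
    """Assign each recognized text segment to the scene containing its time.

    segments: list of (time_sec, text); scene_starts: sorted scene start times.
    """
    grouped = {i: [] for i in range(len(scene_starts))}
    for time_sec, text in segments:
        scene = bisect.bisect_right(scene_starts, time_sec) - 1
        if scene >= 0:
            grouped[scene].append(text)
    return grouped

def summarize(texts, max_sentences=1):
    # Placeholder summarizer: keep only the first sentence(s) of the scene.
    return " ".join(texts[:max_sentences])

scene_starts = [0.0, 30.0]
segments = [(2.0, "Kickoff."), (12.0, "First goal."), (31.5, "Second half begins.")]
grouped = group_text_by_scene(segments, scene_starts)
summaries = {scene: summarize(texts) for scene, texts in grouped.items()}
```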
[0141]
In the first embodiment, the selected image and the summary are laid out according to the template. More specifically, as shown in FIG. 13, a picture-story style template may be prepared and the text information of the summary may be stored in its character storage frame. FIG. 13 is a diagram illustrating the structure of a picture-story style template. Further, in the example of FIG. 13, the selected image and the summary are printed on the same side, but the selected image may be printed on the front side and the summary on the back side. Thereby, a picture-story show can be generated automatically.
[0142]
Further, in the first and second embodiments, the print data is automatically laid out and automatically printed. However, the present invention is not limited to this; instead of printing only the top-ranked image, the top several images can be printed directly, one at a time. Accordingly, for example, the present invention can be applied to a case where only three beautiful images are to be printed immediately.
[0143]
In the first and second embodiments, the speech recognition unit 190 is provided. For example, the speech recognition unit 190 performs speech recognition using an HMM (Hidden Markov Model). In this case, it is conceivable to change the language model used for speech recognition according to the scene segmentation information, or to reset the state transitions of speech recognition at each scene boundary.
[0144]
In the first and second embodiments, when the moving image storage medium 50 storing a plurality of moving images is provided, the moving image acquisition unit 110 acquires a moving image from it. However, the present invention is not limited to this; when multimedia data including at least a moving image is provided, the moving image may be extracted from the provided multimedia data.
[0145]
Thus, the multimedia data can be handled as a layout target.
In the first embodiment, the print data is automatically laid out, and the print data is automatically printed. However, the present invention is not limited to this. It can also be configured as follows.
[0146]
Accordingly, for example, it is possible to cope with a case where only three beautiful images are to be printed immediately. Further, it is possible to construct a service or system for directly printing when a memory card or the like taken by a digital video camera is inserted into a printer.
In the first and second embodiments, an image suited to the user's preference is selected from among a plurality of still images. However, the present invention is not limited to this, and an image suited to the preferences of a plurality of people may be selected from among the plurality of still images. In this case, a plurality of users may be asked to specify images that give a good impression, and the neural network 400 may learn the features of the specified images in the same manner as in the first and second embodiments.
[0147]
Further, in this case, it is possible not only to have a plurality of users input good / bad impressions, but also to input strong / weak impressions, and to have the neural network 400 learn based on the input. Thereby, since general user characteristics can be learned, it is possible to configure an image layout apparatus suitable for selecting an image suitable for a plurality of people's preferences.
[0148]
Further, in this case, the users may be grouped according to their ages, for example, teens, twenties, thirties, and so on, the users of each group may be asked to specify images that give them a good impression, and the neural network 400 may learn the features of the specified images. As a result, it is possible to configure an image layout device suitable for selecting an image suited to the taste of a given generation. It can also be used to find out how many people prefer an image.
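The grouping of designations by generation can be sketched as follows; one training set per group would then be used to train one user model. The input format `(age, image_id)`, the group labels, and the function names are assumptions for illustration.

```python
from collections import defaultdict

def age_group(age):
    """Map an age to a decade label such as 'teens', '20s', '30s'."""
    decade = (age // 10) * 10
    return "teens" if decade == 10 else f"{decade}s"

def build_training_sets(designations):
    """Collect the images designated as 'good impression' per age group.

    designations: list of (user_age, image_id) pairs.
    """
    groups = defaultdict(list)
    for age, image_id in designations:
        groups[age_group(age)].append(image_id)
    return dict(groups)

training_sets = build_training_sets(
    [(15, "img3"), (24, "img1"), (27, "img3"), (38, "img2")]
)
```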
[0149]
In the first and second embodiments, the neural network 400 is provided with the output layer O_k, but the invention is not limited to this, and a plurality of output layers may be provided. For example, a first output layer that outputs the user's like or dislike, a second output layer that outputs whether the impression on the user is good or bad, and a third output layer that outputs whether the impression on the user is strong or weak may be provided.
[0150]
In the first and second embodiments, the image feature information is extracted from all the still images constituting the moving image acquired by the moving image acquiring unit 110. However, the present invention is not limited to this. The image feature information may be extracted from a plurality of still images constituting a moving image acquired by the moving image acquiring unit 110, which satisfy predetermined extraction conditions. As the predetermined extraction condition, for example, a color distribution can be calculated, and a condition that the calculated distribution is equal to or more than a predetermined threshold can be set. This makes it possible to exclude an image whose color is too dark as a whole from being extracted.
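A minimal sketch of such an extraction condition follows. It interprets the "color distribution" as the variance of pixel luminance, which is an assumption, and the threshold value and function name are likewise hypothetical.

```python
from statistics import pvariance

def passes_extraction_condition(pixels, threshold):
    """Keep a frame only if its luminance variance reaches the threshold.

    pixels: flat list of luminance values (0 to 255) for one still image.
    """
    return pvariance(pixels) >= threshold

dark_frame = [10, 10, 11, 10]       # almost uniformly dark
varied_frame = [0, 255, 0, 255]     # strong contrast
keep_dark = passes_extraction_condition(dark_frame, threshold=50.0)
keep_varied = passes_extraction_condition(varied_frame, threshold=50.0)
```

Frames that fail the condition would simply be skipped before feature extraction, which reduces the number of images passed to the evaluation step.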
[0151]
Further, in the first and second embodiments, the feature amounts of all the pixels constituting the image are extracted and learning is performed based on the extracted feature amounts. However, the present invention is not limited to this. For example, in a pixel group of a rectangular area composed of five pixels in the vertical direction and five pixels in the horizontal direction, the pixels on the four sides may be targeted, a feature amount (for example, an average value) of the targeted pixels may be extracted, and learning may be performed based on the extracted feature amount.
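The reduced extraction can be sketched as below, assuming that the targeted pixels are those lying along the four edges of each 5 x 5 block and that the feature amount is their average; both readings, and the function name `border_mean`, are assumptions.

```python
def border_mean(block):
    """Average the pixels on the four edges of a square pixel block."""
    n = len(block)
    values = [block[r][c]
              for r in range(n) for c in range(n)
              if r in (0, n - 1) or c in (0, n - 1)]
    return sum(values) / len(values)

# 5 x 5 block: edge pixels are 10, interior pixels are 100.
block = [[10] * 5] + [[10, 100, 100, 100, 10] for _ in range(3)] + [[10] * 5]
feature = border_mean(block)
```

Here 16 of the 25 pixels contribute, so one value stands in for the whole block.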
[0152]
In the first and second embodiments, image selection and learning are performed based on the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i, and E, but the present invention is not limited to this, and image selection and learning may be performed based on any of the feature amounts M_xy, N_1xy, N_2xy, N_3xy, C_i, and E.
In the first and second embodiments, the back propagation method is exemplified as the learning method of the neural network 400. However, the present invention is not limited to this, and an unsupervised learning method based on self-organization can also be used. Thus, for example, the features of 25 still images composing a moving image captured by a digital video camera can be learned following the tendency of those images, so that the user's preference is learned automatically.
[0153]
In the first and second embodiments, the strength M_xy of the induction field, the complexity C_i of the equipotential lines, and the energy E of the induction field are calculated based on an image obtained by subjecting a still image to black-and-white binarization processing. However, the invention is not limited to this, and the strength M_xy of the induction field, the complexity C_i of the equipotential lines, and the energy E of the induction field may be calculated based on the color still image itself.
[0154]
In the first and second embodiments, the luminance values of the three primary colors are handled as the separate vectors N_1xy, N_2xy, and N_3xy, one for each primary color. However, the present invention is not limited to this, and they may be combined by addition or the like and handled as one vector.
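Combining the three per-color luminance arrays by addition can be sketched as follows. The nested-list representation and the function name `combine_channels` are assumptions; plain element-wise addition is only one of the combinations the text allows.

```python
def combine_channels(n1, n2, n3):
    """Add the three primary-color luminance arrays element by element."""
    return [[a + b + c for a, b, c in zip(r1, r2, r3)]
            for r1, r2, r3 in zip(n1, n2, n3)]

n1 = [[1, 2], [3, 4]]          # luminance of the first primary color
n2 = [[10, 20], [30, 40]]      # second primary color
n3 = [[100, 200], [300, 400]]  # third primary color
combined = combine_channels(n1, n2, n3)
```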
In the first and second embodiments, when realizing the layout unit 100, the learning unit 200, the condition input unit 300, or the information providing unit 600, the control program stored in the ROM is executed. However, the present invention is not limited to this, and the program may be read from a storage medium storing a program indicating these procedures into the RAM and executed.
[0155]
Here, the storage medium may be a semiconductor storage medium such as a RAM or a ROM, a magnetic storage medium such as an FD or HD, an optically read storage medium such as a CD, CDV, LD, or DVD, or a magnetically stored and optically read storage medium such as an MO, and includes any computer-readable storage medium regardless of whether the reading method is electronic, magnetic, optical, or otherwise.
[0156]
In the first and second embodiments, the image layout apparatus, the image layout program, and the image layout method according to the present invention are applied to the case of selecting an image from a moving image captured by a digital video camera and automatically laying out the selected image. However, the present invention is not limited to this, and can be applied to other cases without departing from the gist of the present invention. For example, the following modifications are possible.
[0157]
First, the information attached to the moving image may be time information. At this time, for example, a golf swing taken by a digital video camera is displayed for each frame, and time information is also added, so that an edited material with further enhanced content can be created.
Second, the information obtained by the URL is not limited to a moving image and a summary, but may be a 3D video.
[0158]
Thirdly, in the method of displaying a moving image or a summary obtained based on a URL, the moving image or the summary is displayed on a computer via a Web browser. Alternatively, it may be displayed directly on a mobile terminal.
Fourth, the URL may correspond to SSL (Secure Socket Layer).
[0159]
Fifth, the moving image storage unit 197 stores the moving image and the summary in association with the unique key, but it may further store advertisement information, commentary articles, and columns in association with the unique key. Thus, when access is made based on the URL, not only moving images and summaries but also advertisement information, commentary articles, and columns can be obtained.
[0160]
Sixth, the moving image storage unit 197 may store media information obtained by converting a moving image by the conversion unit. For example, the file format may be converted for 3D and stored.
[0161]
[Effect of the invention]
As described above, according to the image layout apparatus of the present invention, the selected image is laid out based on the additional information, so that not only the selected image but also voices, characters, and the like attached to the moving image can be reflected in the layout. Therefore, an edited material with relatively rich contents can be created as compared with the related art.
[0162]
Further, according to the image layout apparatus of the second aspect of the present invention, an edited material whose contents are relatively well suited to the user's preference can be created.
Furthermore, according to the image layout apparatus of the third aspect of the present invention, the sound accompanying the moving image can be reflected in the layout, so that an edited material with further enhanced contents can be created.
[0163]
Furthermore, according to the image layout device of the fourth aspect of the present invention, the text accompanying the moving image can be laid out together with the selected image, so that an edited material with further enhanced contents can be created.
Further, according to the image layout apparatus of the fifth or eighth aspect of the present invention, the summary of the supplementary information attached to the moving image can be laid out, so that an edited material with further enhanced contents can be created. In addition, the layout portion relating to the supplementary information becomes relatively simple and clear, and therefore easier to read.
[0164]
Further, according to the image layout apparatus of the present invention, when a moving image is composed of a plurality of scenes, an image can be selected for each scene, so that editing can be performed in relatively fine detail.
Furthermore, according to the image layout apparatus of the present invention, when a moving image includes a plurality of scenes, additional information can be acquired for each scene, so that editing can be performed in relatively fine detail.
[0165]
Furthermore, according to the image layout device of the present invention, when a moving image is composed of a plurality of scenes, a summary can be created for each scene, so that editing can be performed in relatively fine detail.
Further, according to the image layout apparatus of the tenth or eleventh aspect of the present invention, the additional information related to the selected image can be obtained by accessing the image layout apparatus based on the reference information.
[0166]
Further, according to the image layout device of the present invention, a moving image including the selected image can be obtained by accessing the image layout device based on the reference information.
Further, according to the image layout apparatus of the present invention, the additional information or the moving image can be accessed by referring to the URL, so that it is relatively easy to obtain the additional information or the moving image.
[0167]
Further, according to the image layout device of the present invention, the supplementary information or the moving image can be accessed by referring to the barcode, so that it is relatively easy to obtain the supplementary information or the moving image.
On the other hand, according to the image layout program of the eighteenth aspect of the present invention, the same effects as those of the image layout apparatus of the first aspect can be obtained.
[0168]
On the other hand, according to the image layout method of the nineteenth aspect of the present invention, the same effects as those of the image layout apparatus of the first aspect can be obtained.
[Brief description of the drawings]
FIG. 1 is a diagram showing a pixel array of a digital image.
FIG. 2 is a diagram illustrating a shielding condition when obtaining the strength of a visual guidance field.
FIG. 3 shows examples of the visual guidance field of the character "A"; FIG. 3A shows a case where the visual guidance field is obtained in consideration of the shielding condition, and FIG. 3B shows a case where the visual guidance field is obtained without considering the shielding condition.
FIG. 4 is a diagram showing an image of a part of a newspaper article as a reference layout example.
FIG. 5 is a diagram showing equipotential lines obtained from an induction field calculated for the image shown in FIG. 4, with each character string represented by a simple line and the photograph represented simply by a rectangular frame.
FIG. 6 is a diagram showing the reference layout shown in FIG. 4 and layouts obtained by variously changing the reference layout.
FIG. 7 is a diagram illustrating the complexity of each layout when the layouts are as shown in FIGS. 6 (a) to 6 (d).
FIG. 8 is a functional block diagram showing a configuration of an image layout device according to the present invention.
FIG. 9 is a diagram showing a configuration of a neural network 400.
FIG. 10 is a diagram showing a structure of a template.
FIG. 11 is a functional block diagram illustrating a configuration of an image layout device according to the present invention.
FIG. 12 is a diagram showing a data structure of a unique key correspondence table 700.
FIG. 13 is a diagram showing the structure of a picture-story style template.
[Explanation of symbols]
50: moving image storage medium, 100: layout unit, 110: moving image acquisition unit, 120: image feature information extraction unit, 130: user model storage unit, 140: evaluation value calculation unit, 150: image selection unit, 160: template storage unit, 170, 172: image layout unit, 180: printing unit, 185: display unit, 190: voice recognition unit, 191, 193: summary creation unit, 192, 194: scene division detection unit, 195: unique key generation unit, 196: moving image generation unit, 197: moving image storage unit, 200: learning unit, 210: image designation input unit, 220: image feature information extraction unit, 230: feature learning unit, 300: condition input unit, 310: evaluation value calculation condition input unit, 320: image selection condition input unit, 330: layout condition input unit, 400: neural network, 500 to 528: image storage frame, 600: information providing unit, 610: moving image providing unit, 620: summary providing unit, 700: unique key correspondence table

Claims

An apparatus for selecting an image from among a plurality of still images constituting a moving image and laying out the selected image,
the image layout apparatus comprising: supplementary information acquiring means for acquiring, from the moving image, supplementary information attached to the moving image; image selecting means for selecting an image from among the plurality of still images; and layout means for laying out the selected image selected by the image selecting means based on the supplementary information acquired by the supplementary information acquiring means.

In claim 1,
The image layout apparatus, wherein the image selecting means selects an image suited to a user's preference from among the plurality of still images.

In any one of claims 1 and 2,
Audio information is attached to the moving image,
The supplementary information acquiring means performs speech recognition based on the audio information attached to the moving image, and acquires text information as a recognition result as the supplementary information,
The image layout apparatus, wherein the layout means lays out the text information acquired by the supplementary information acquiring means and the selected image.

In any one of claims 1 and 2,
Text information is attached to the moving image,
The supplementary information acquiring means acquires the text information attached to the moving image as the supplementary information,
The image layout apparatus, wherein the layout means lays out the text information acquired by the supplementary information acquiring means and the selected image.

In any one of claims 3 and 4,
Further comprising summary creating means for creating a summary based on the text information acquired by the supplementary information acquiring means,
The image layout apparatus, wherein the layout means lays out the summary created by the summary creating means and the selected image.

In any one of claims 1 to 5,
Further comprising scene division detecting means for detecting a scene division from the moving image,
The image layout apparatus, wherein the image selecting means selects an image from among the plurality of still images based on a detection result of the scene division detecting means.

In any one of claims 1 to 5,
Further comprising scene division detecting means for detecting a scene division from the moving image,
The image layout apparatus, wherein the supplementary information acquiring means acquires the supplementary information from the moving image based on a detection result of the scene division detecting means.

In claim 5,
Further comprising scene division detecting means for detecting a scene division from the moving image,
The image layout apparatus, wherein the summary creating means creates a summary based on a detection result of the scene division detecting means and the text information acquired by the supplementary information acquiring means.

In any one of claims 1 to 8,
The image layout apparatus, wherein the layout means selects a template from among a plurality of different templates each constituting a layout framework, and lays out the selected image based on the selected template and the supplementary information.

In any one of claims 1 to 9,
Further comprising supplementary information storage means for storing the supplementary information in association with identification information, and supplementary information providing means for providing the supplementary information held in the supplementary information storage means,
The layout means lays out the selected image and reference information including the identification information corresponding to the supplementary information,
The image layout apparatus, wherein, when there is an access based on the reference information, the supplementary information providing means reads the supplementary information corresponding to the identification information included in the reference information from the supplementary information storage means, and provides the read supplementary information to the access source.

The image layout apparatus according to claim 10,
further comprising identification information generating means for generating the identification information,
wherein the supplementary information acquiring means acquires the supplementary information from the moving image and stores the acquired supplementary information in the supplementary information storage means in association with the identification information generated by the identification information generating means.
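
The storage, reference, and provision arrangement recited in the two preceding claims can be sketched roughly as follows. This is only an illustrative sketch and not the implementation described in the specification: the class `SupplementaryInfoStore`, the functions `make_reference` and `provide`, the `BASE_URL` value, and the use of a random UUID as the identification information are all assumptions made for the example. A URL is assumed as the form of the reference information.

```python
import uuid
from urllib.parse import urlencode, urlparse, parse_qs

# Hypothetical base address used only for this sketch.
BASE_URL = "http://example.com/info"


class SupplementaryInfoStore:
    """Stores supplementary information keyed by identification information."""

    def __init__(self):
        self._items = {}

    def put(self, info):
        # Generate identification information (assumed here: a random UUID)
        # and store the supplementary information in association with it.
        info_id = uuid.uuid4().hex
        self._items[info_id] = info
        return info_id

    def get(self, info_id):
        # Returns None when the identification information is unknown.
        return self._items.get(info_id)


def make_reference(info_id):
    """Build reference information (assumed: a URL) that includes the ID."""
    return f"{BASE_URL}?{urlencode({'id': info_id})}"


def provide(store, reference):
    """On access, read the ID from the reference and return the stored info."""
    ids = parse_qs(urlparse(reference).query).get("id")
    if not ids:
        return None
    return store.get(ids[0])
```

In this sketch the layout would place the string returned by `make_reference` next to the selected image, and a later access through that reference would be answered by `provide`.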

The image layout apparatus according to any one of claims 1 to 9,
further comprising moving image storage means for storing, in association with identification information, a portion of the moving image that includes the selected image, and moving image providing means for providing the moving image held in the moving image storage means,
wherein the layout means lays out the selected image together with reference information that includes the identification information corresponding to the moving image that includes the selected image, and
wherein, when an access based on the reference information occurs, the moving image providing means reads the moving image corresponding to the identification information included in the reference information from the moving image storage means and provides the read moving image to the access source.

The image layout apparatus according to claim 12,
further comprising identification information generating means for generating the identification information, and moving image generating means for generating a moving image,
wherein the moving image generating means generates, from the moving image, a moving image that includes the selected image, and stores the generated moving image in the moving image storage means in association with the identification information generated by the identification information generating means.

The image layout apparatus according to any one of claims 10 to 13,
wherein the reference information is a URL (Uniform Resource Locator).

The image layout apparatus according to any one of claims 10 to 13,
wherein the reference information is a barcode.

The image layout apparatus according to any one of claims 10 to 15,
wherein the reference information includes advertisement information.

The image layout apparatus according to any one of claims 1 to 16,
further comprising printing means for printing based on a layout result of the layout means.

An image layout program for selecting an image from among a plurality of still images constituting a moving image and laying out the selected image,
the program causing a computer to execute processing realized as: supplementary information acquiring means for acquiring, from the moving image, supplementary information attached to the moving image; image selecting means for selecting an image from among the plurality of still images; and layout means for laying out the selected image selected by the image selecting means based on the supplementary information acquired by the supplementary information acquiring means.

An image layout method for selecting an image from among a plurality of still images constituting a moving image and laying out the selected image,
the method comprising: a supplementary information acquiring step of acquiring, from the moving image, supplementary information attached to the moving image; an image selecting step of selecting an image from among the plurality of still images; and a layout step of laying out the selected image selected in the image selecting step based on the supplementary information acquired in the supplementary information acquiring step.
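
The three steps of the method can be illustrated with the following minimal sketch. It is not the method of the specification: the data types `Frame`, `Caption`, and `Movie`, the fixed-interval image selection, and the rule of pairing each selected image with the caption nearest in time are all assumptions chosen to keep the example short. The supplementary information is assumed to be already available as timed text.

```python
from dataclasses import dataclass, field


@dataclass
class Frame:
    time: float   # position of the still image in the moving image (seconds)
    image: str    # placeholder for the image data


@dataclass
class Caption:
    time: float   # time at which the text occurs (seconds)
    text: str     # supplementary information as text


@dataclass
class Movie:
    frames: list = field(default_factory=list)
    captions: list = field(default_factory=list)


def acquire_supplementary_info(movie):
    """Acquiring step: take the supplementary information from the movie."""
    return list(movie.captions)


def select_images(movie, step):
    """Selecting step (assumed rule): keep every `step`-th still image."""
    return movie.frames[::step]


def lay_out(selected, captions):
    """Layout step (assumed rule): pair each image with the nearest caption."""
    page = []
    for frame in selected:
        nearest = min(captions,
                      key=lambda c: abs(c.time - frame.time),
                      default=None)
        page.append((frame.image, nearest.text if nearest else ""))
    return page
```

A caller would run the steps in order, for example `lay_out(select_images(movie, 2), acquire_supplementary_info(movie))`, and pass the resulting list of image and text pairs to whatever renders or prints the page.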