JP4692550B2

JP4692550B2 - Image processing apparatus, processing method thereof, and program

Info

Publication number: JP4692550B2
Application number: JP2008010205A
Authority: JP
Inventors: 辰吾鶴見
Original assignee: Sony Corp
Current assignee: Sony Corp
Priority date: 2008-01-21
Filing date: 2008-01-21
Publication date: 2011-06-01
Anticipated expiration: 2028-01-21
Also published as: US20140022455A1; EP2129112A4; US8599320B2; US8717504B2; CN101622868A; WO2009093398A1; JP2009171498A; KR20100114453A; EP2129112A1; CN101622868B; US20100111499A1

Abstract

A picture conversion information supply section 130 calculates per frame an affine transformation parameter for picture conversion based on motion information about a moving picture. With reference to a reference picture, a picture conversion section 140 affine-transforms pictures making up the moving picture per frame using the calculated affine transformation parameters. Based on information indicating the center position, angle or scaling factor about the transformed pictures coming from the picture conversion section 140, a sound conversion information calculation section 190 calculates sound conversion information for converting the sound corresponding to the pictures. Based on the sound conversion information, a sound conversion processing section 200 controls the volume of each of the channels making up the sound, adds up the controlled sound of each channel, and outputs the result as output sound to speakers 220.

Description

The present invention relates to an image processing apparatus, and more particularly to an image processing apparatus capable of reproducing a moving image, a processing method thereof, and a program for causing a computer to execute the method.

In recent years, moving image reproducing apparatuses that reproduce moving images captured by digital video cameras and the like have become widespread. Digital video cameras generally have a zoom function that enlarges or reduces the subject being imaged according to the photographer's interest. When a moving image captured using such a zoom function is reproduced, the sound is output unchanged even if the size of the subject changes on the display screen, so a sufficient sense of presence cannot be obtained. It is therefore conceivable to process the sound in consideration of the imaging conditions of the digital video camera. For example, a sound conversion processing method has been proposed that adjusts the levels of the sound signals of a plurality of channels based on information about the zooming operation of a digital video camera (see, for example, Patent Document 2).
Japanese Laid-Open Patent Publication No. 2005-311604 (FIG. 2)

According to the above-described conventional technology, when a moving image is reproduced, a realistic sound effect suited to the moving image can be obtained by changing the volume according to the zoom amount of the digital video camera.

However, with the conventional technology described above, when a moving image is displayed in a partial area of the display screen, the zoom amount may not correspond to the position of the moving image on the screen, and it may not be possible to obtain an appropriate sound effect according to the position of the moving image on the screen. When a moving image is displayed in a partial area of the display screen in this way, it is important to obtain an appropriate sound effect according to the position of the moving image on the screen.

Accordingly, an object of the present invention is to generate sound suited to the display area of a moving image on the display screen when the moving image is reproduced.

The present invention has been made to solve the above problem. A first aspect of the present invention is an image processing apparatus, a processing method thereof, and a program for causing a computer to execute the method, the apparatus including: content acquisition means for acquiring content data including a moving image and sound corresponding to the moving image; image conversion information supply means for supplying image conversion information for converting, with a first image constituting the moving image as a reference, a second image that is positioned after the first image on the time axis of the moving image and is to be displayed; image conversion means for converting the second image based on the image conversion information, with the arrangement position of the first image on the display screen of display means as a reference; display control means for causing the display means to display the converted second image; sound conversion processing means for converting the sound related to the second image to generate output sound by adjusting the volume of each of a plurality of channels constituting that sound, based on an element that is specified by the image conversion information and relates to movement of the second image relative to the first image; and sound output control means for causing sound output means to output the generated output sound while the converted second image is displayed on the display means. Thereby, for an image converted based on the image conversion information, the sound is converted and output according to the area in which the image is displayed.
A second aspect of the present invention is an image processing apparatus, a processing method thereof, and a program for causing a computer to execute the method, the apparatus including: content acquisition means for acquiring content data including a moving image and sound corresponding to the moving image; image conversion information supply means for supplying image conversion information for converting, with a first image constituting the moving image as a reference, a second image that is positioned after the first image on the time axis of the moving image and is to be displayed; image conversion means for converting the second image based on the image conversion information, with the arrangement position of the first image on the display screen of display means as a reference; display control means for causing the display means to display the converted second image; sound conversion processing means for converting the sound related to the second image to generate output sound by adjusting the volume of each of a plurality of channels constituting that sound, based on an element that is specified by the image conversion information and relates to at least one of movement, rotation, and magnification of the second image relative to the first image; and sound output control means for causing sound output means to output the generated output sound while the converted second image is displayed on the display means. Thereby, for an image converted based on the image conversion information, the sound is converted and output according to the area in which the image is displayed.
In the first or second aspect, the apparatus may further include image combining means for combining the converted second image and a background image serving as a background of the second image into a composite image; the display control means may cause the display means to display the composite image; and the sound output control means may cause the sound output means to output the generated output sound while the composite image is displayed on the display means. Thereby, for an image converted based on the image conversion information, the sound is converted and output according to the area in which the image is displayed.
In the first or second aspect, the apparatus may further include sound conversion information calculation means for calculating, based on the element, sound conversion information for adjusting the volume of each of the plurality of channels constituting the sound related to the second image; and the sound conversion processing means may convert the sound related to the second image based on the calculated sound conversion information to generate the output sound. Thereby, for an image converted based on the image conversion information, the sound is converted and output according to the area in which the image is displayed.

In the first or second aspect, the image conversion information may include an element related to movement of the second image relative to the first image. This brings about the effect that the sound is converted according to the movement of the image.

In the first or second aspect, the image conversion information may include an element related to rotation of the second image relative to the first image. This brings about the effect that the sound is converted according to the rotation of the image.

In the first or second aspect, the image conversion information may include an element related to the magnification of the second image relative to the first image. This brings about the effect that the sound is converted according to the magnification of the image.

In the first or second aspect, the sound conversion processing means may include volume adjusting means and sound adding means; the volume adjusting means may adjust the volume of each of the plurality of channels constituting the sound based on the sound conversion information; and the sound adding means may add the adjusted sound for each channel. This brings about the effect that sound of a plurality of channels is converted.
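The volume adjustment and addition described above can be sketched as a small gain matrix applied to the input channels. This is only an illustration under stated assumptions: the function names `pan_gains` and `convert_sound`, the two-channel layout, and in particular the pan law that maps the horizontal center of the image to gains are hypothetical and are not taken from this document.

```python
def pan_gains(center_x, screen_width):
    """Hypothetical pan law (an assumption, not the patented formula).

    Maps the horizontal centre of the displayed image to one gain per
    (output channel, input channel) pair. At the screen centre the input
    passes through unchanged; towards an edge, both inputs are routed to
    the speaker on that side.
    """
    p = min(max(center_x / screen_width, 0.0), 1.0)  # 0.0 = left edge, 1.0 = right edge
    return {
        "L": {"L": min(1.0, 2.0 * (1.0 - p)), "R": max(0.0, 1.0 - 2.0 * p)},
        "R": {"L": max(0.0, 2.0 * p - 1.0), "R": min(1.0, 2.0 * p)},
    }


def convert_sound(inputs, gains):
    """inputs: {"L": [samples], "R": [samples]}; returns the output channels."""
    outputs = {}
    for out_name, row in gains.items():
        # Volume adjustment: scale every input channel by its gain.
        adjusted = [[row[name] * s for s in samples] for name, samples in inputs.items()]
        # Sound addition: sum the adjusted channels sample by sample.
        outputs[out_name] = [sum(column) for column in zip(*adjusted)]
    return outputs
```

For example, `convert_sound({"L": [1.0, 2.0], "R": [10.0, 20.0]}, pan_gains(1920, 1920))` routes both input channels to the right output and silences the left output. Keeping the gains in a matrix separates the question of how the gains are derived from the image position from the mixing itself.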

In the first or second aspect, the sound conversion processing means may perform the conversion processing to generate right channel and left channel sound constituting the output sound. This brings about the effect that right channel and left channel sound is generated.

In the first or second aspect, the sound conversion processing means may perform the conversion processing to generate center channel sound constituting the output sound. This brings about the effect that center channel sound is generated.

In the first or second aspect, the sound may include right channel and left channel sound, and the sound conversion processing means may generate the output sound by performing the conversion processing on the right channel and left channel sound. This brings about the effect that right channel and left channel input sound is converted to generate the output sound.

In the first or second aspect, the sound may include center channel sound, and the sound conversion processing means may generate the output sound by performing the conversion processing on the center channel sound. This brings about the effect that center channel input sound is converted to generate the output sound.

In the first or second aspect, the apparatus may further include image holding means for holding an image including the first image as a history image; the image conversion means may convert at least one of the second image and the history image held in the image holding means based on the image conversion information; and the image combining means may combine the second image and the history image, at least one of which has been converted by the image conversion means, into the composite image and cause the image holding means to hold the composite image as a new history image. This brings about the effect that a series of converted images constituting the moving image is combined and displayed as a composite image. In this case, the apparatus may further include display area extraction means for determining, from the new history image held in the image holding means, a display area to be displayed on the display means and extracting the image included in the display area as a display image; the image combining means may overwrite the display image with the second image and combine them into a new display image; the display control means may cause the display means to display the new display image; the display area extraction means may generate display area extraction information on the position, angle, or size of the display area in the holding area of the image holding means; and the sound conversion processing means may convert the sound related to the second image to generate the output sound by adjusting the volume of each of the plurality of channels constituting that sound, based on the element specified by the image conversion information and on an element that is specified by the display area extraction information and relates to the position, angle, or size of the display area in the holding area of the image holding means. This brings about the effect that the current image is displayed so as to fit within the area of the display screen.

In the first or second aspect, the image conversion means may convert the second image based on template information indicating a display area of the display means in which the moving image is to be displayed. This brings about the effect that the image is converted based on the template information.

A third aspect of the present invention is an information processing method and a program for causing a computer to execute the method, the method including: a content acquisition procedure of acquiring content data including a moving image and sound corresponding to the moving image; an image conversion information supply procedure of supplying image conversion information for converting, with a first image constituting the moving image as a reference, a second image that is positioned after the first image on the time axis of the moving image and is to be displayed; an image conversion procedure of converting the second image based on the image conversion information, with the arrangement position of the first image on the display screen of display means as a reference; a display control procedure of causing the display means to display the converted second image; a sound conversion processing procedure of converting the sound related to the second image to generate output sound by adjusting the volume of each of a plurality of channels constituting that sound, based on an element that is specified by the image conversion information and relates to at least one of movement, rotation, and magnification of the second image relative to the first image; and a sound output control procedure of causing sound output means to output the generated output sound while the converted second image is displayed on the display means. Thereby, for an image converted based on the image conversion information, the sound is converted and output according to the area in which the image is displayed.

According to the present invention, an excellent effect can be obtained in that sound suited to the display area of a moving image on the display screen can be generated when the moving image is reproduced.

Next, embodiments of the present invention will be described in detail with reference to the drawings.

FIG. 1 is a block diagram illustrating an example of the functional configuration of an image processing apparatus 100 according to an embodiment of the present invention. The image processing apparatus 100 includes a content storage unit 110, a content acquisition unit 120, an image conversion information supply unit 130, an image conversion unit 140, an image composition unit 150, an image memory 160, a display control unit 170, a display unit 180, a sound conversion information calculation unit 190, a sound conversion processing unit 200, a sound output control unit 210, a speaker 220, and an operation reception unit 230. The image processing apparatus 100 can be realized, for example, by a personal computer capable of extracting feature amounts through video analysis from a moving image captured by an imaging device such as a digital video camera and performing various kinds of image processing using the extracted feature amounts.

The content storage unit 110 stores content files, each including a moving image and the sound corresponding to that moving image. The content storage unit 110 supplies a content file to the content acquisition unit 120 in response to a request from the content acquisition unit 120.

The content acquisition unit 120 acquires a content file stored in the content storage unit 110 in response to an operation input related to content acquisition from the operation reception unit 230. The content acquisition unit 120 outputs the moving image in the acquired content file to the image conversion information supply unit 130 and the image conversion unit 140. The content acquisition unit 120 also outputs the sound corresponding to the moving image in the acquired content file to the sound conversion processing unit 200.

The image conversion information supply unit 130 analyzes the moving image output from the content acquisition unit 120 to detect motion information, and calculates affine transformation parameters based on the motion information. That is, the image conversion information supply unit 130 extracts feature points from each image constituting the moving image, extracts the optical flows (motion vectors) for the feature points, analyzes the optical flows for the extracted feature points to select the feature points that exhibit the dominant motion, and estimates the motion of the imaging device based on the optical flows for the feature points that exhibit the dominant motion. Here, the dominant motion means the regular motion indicated by a relatively large number of the optical flows for the plurality of feature points. The image conversion information supply unit 130 supplies the resulting affine transformation parameters to the image conversion unit 140.
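The selection of the dominant motion can be sketched as a consensus search over the optical flows. The sketch below is an illustration under several assumptions that are not stated in this document: feature matches are given as pairs of complex numbers (previous position, current position), the motion model is restricted to a similarity transform (uniform zoom, rotation, translation) rather than a general affine transform, the matrix maps previous-frame coordinates to current-frame coordinates, and all function names are hypothetical.

```python
from itertools import combinations


def fit_similarity(p, q):
    """Fit w = a*z + b from two matches (z1, w1), (z2, w2) given as complex numbers."""
    (z1, w1), (z2, w2) = p, q
    a = (w1 - w2) / (z1 - z2)  # complex factor encoding zoom and rotation
    b = w1 - a * z1            # complex offset encoding translation
    return a, b


def dominant_motion(matches, tol=1.0):
    """Return ((a, b), inlier_count) for the model that most optical flows agree with."""
    best, best_inliers = None, -1
    for p, q in combinations(matches, 2):
        if p[0] == q[0]:
            continue  # two identical source points cannot define a model
        a, b = fit_similarity(p, q)
        # A flow supports the model if the predicted position is within tol pixels.
        inliers = sum(1 for z, w in matches if abs(a * z + b - w) <= tol)
        if inliers > best_inliers:
            best, best_inliers = (a, b), inliers
    return best, best_inliers


def to_affine_matrix(a, b):
    """Express the complex model as a 3x3 affine matrix in homogeneous coordinates."""
    return [[a.real, -a.imag, b.real],
            [a.imag, a.real, b.imag],
            [0.0, 0.0, 1.0]]
```

Feature points on an independently moving subject produce flows that disagree with the majority, so they are counted as outliers and do not influence the estimated camera motion. The exhaustive pair search is cubic in the number of matches and is meant only to make the idea concrete.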

The image conversion unit 140 performs affine transformation, frame by frame, on the images constituting the moving image output from the content acquisition unit 120 and on the image held in the image memory 160, using the affine transformation parameters supplied from the image conversion information supply unit 130 and taking the image corresponding to the first frame as a reference. Specifically, the image conversion unit 140 performs the affine transformation using the matrix of affine transformation parameters obtained by multiplying the matrix of affine transformation parameters corresponding to the current frame by the matrices of affine transformation parameters corresponding to each of the preceding frames. The image conversion unit 140 performs affine transformation on at least one of the image constituting the moving image output from the content acquisition unit 120 and the composite image held in the image memory 160, and outputs both to the image composition unit 150. In addition, taking the image corresponding to the first frame in the image memory 160 as a reference, the image conversion unit 140 calculates the center position, angle, and magnification of the image corresponding to the current frame based on the affine transformation parameters obtained by this multiplication, and outputs them to the sound conversion information calculation unit 190. In the embodiment of the present invention, the information about the image corresponding to the first frame is referred to as reference information. The reference information indicates the center position, angle, and size of the image corresponding to the first frame in the image memory 160, and is held in the image conversion unit 140.
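The multiplication of per-frame matrices and the derivation of center position, angle, and magnification can be sketched as follows. The sketch rests on assumptions not fixed by this document: each per-frame matrix is a 3x3 homogeneous matrix that maps coordinates of frame i into coordinates of frame i-1, the first frame is placed without any additional transform, the motion contains no shear, and the function names are hypothetical.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def accumulate(per_frame_matrices):
    """Chain the per-frame matrices so the result maps the current frame
    into the coordinate system of the first (reference) frame."""
    total = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
    for m in per_frame_matrices:
        total = matmul3(total, m)
    return total


def describe(total, frame_center):
    """Return (center, angle in degrees, magnification) of the current frame
    relative to the reference frame; frame_center is the image centre in the
    frame's own pixel coordinates."""
    cx, cy = frame_center
    center = (total[0][0] * cx + total[0][1] * cy + total[0][2],
              total[1][0] * cx + total[1][1] * cy + total[1][2])
    magnification = math.hypot(total[0][0], total[1][0])  # length of the mapped x axis
    angle = math.degrees(math.atan2(total[1][0], total[0][0]))
    return center, angle, magnification
```

Under these assumptions, two successive frames that each zoom by a factor of 2 yield an accumulated magnification of 4, which is the kind of value that would be passed on for sound conversion.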

Specifically, when the moving image is reproduced and displayed with the composite image, which is formed from the images corresponding to the frames preceding the current frame, held fixed, the image conversion unit 140 performs affine transformation on the image corresponding to the current frame output from the content acquisition unit 120, using the affine transformation parameters supplied from the image conversion information supply unit 130. The image conversion unit 140 then outputs the image held in the image memory 160 and the converted image corresponding to the current frame. In this case, the image conversion unit 140 outputs the center position and angle, but not the magnification, of the image corresponding to the current frame to the sound conversion information calculation unit 190. On the other hand, when the moving image is reproduced and displayed with the image corresponding to the current frame held fixed, the image conversion unit 140 performs affine transformation on the composite image held in the image memory 160 in the direction opposite to that of the affine transformation parameters supplied from the image conversion information supply unit 130. The image conversion unit 140 then outputs the image corresponding to the current frame and the composite image converted in the opposite direction to the image composition unit 150. In this case, the image conversion unit 140 outputs only the magnification of the image corresponding to the current frame to the sound conversion information calculation unit 190. Furthermore, when the moving image is reproduced and displayed with the display magnification of the image corresponding to the current frame held fixed, the image conversion unit 140 separates the affine transformation parameters supplied from the image conversion information supply unit 130 into the element related to magnification (the zoom component) and the elements other than magnification (the elements related to movement or rotation). It applies affine transformation to the composite image corresponding to the frames preceding the current frame, which is held in the image memory 160, using the element related to enlargement and reduction in the direction opposite to that of the affine transformation parameters, and applies affine transformation to the image corresponding to the current frame output from the content acquisition unit 120 using the elements related to movement or rotation. The image conversion unit 140 then outputs both converted images to the image composition unit 150. In this case, the image conversion unit 140 outputs the center position, angle, and magnification of the image corresponding to the current frame to the sound conversion information calculation unit 190.
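The separation of the zoom component from the movement and rotation components, used in the fixed-magnification mode, can be sketched as a matrix factorization. This sketch assumes a similarity transform (uniform zoom, no shear) in 3x3 homogeneous form and assumes the factorization order M = RT · Z, where Z is a pure zoom about the origin; neither the order nor the function names come from this document.

```python
import math


def matmul3(a, b):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def split_zoom(m):
    """Factor m into (zoom, rest) with m == rest * zoom.

    zoom holds only the magnification; rest holds rotation and translation.
    """
    s = math.hypot(m[0][0], m[1][0])  # uniform scale factor of the transform
    zoom = [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]
    rest = [[m[0][0] / s, m[0][1] / s, m[0][2]],
            [m[1][0] / s, m[1][1] / s, m[1][2]],
            [0.0, 0.0, 1.0]]
    return zoom, rest


def invert_zoom(zoom):
    """Inverse of a pure zoom matrix, for transforming the composite image
    in the opposite direction."""
    s = zoom[0][0]
    return [[1.0 / s, 0.0, 0.0], [0.0, 1.0 / s, 0.0], [0.0, 0.0, 1.0]]
```

In this reading, `invert_zoom(zoom)` would be applied to the composite image held in the image memory and `rest` to the current frame, so the current frame keeps a constant display magnification while the history shrinks or grows around it.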

These conversions are performed in response to an operation input related to a reproduction instruction from the operation reception unit 230. In the example described here, the image conversion unit 140 calculates the center position, angle, and magnification of the image corresponding to the current frame in response to that operation input and outputs the calculated information to the sound conversion information calculation unit 190. Alternatively, the sound conversion information calculation unit 190 may calculate the center position, angle, and magnification of the image corresponding to the current frame using the reference information output from the image conversion unit 140 and the affine transformation parameters obtained by the multiplication. Furthermore, instead of outputting the center position, angle, and size of the image corresponding to the first frame in the image memory 160 to the sound conversion information calculation unit 190, the image conversion unit 140 may output the center position, angle, and size of the image corresponding to the frame immediately before the current frame. In this case, the sound conversion information calculation unit 190 uses the center position, angle, and size of the image corresponding to the frame immediately before the current frame in place of the reference information, and calculates the center position, angle, and magnification of the image corresponding to the current frame based on the affine transformation parameters corresponding to the current frame. The same calculation can be made relative to a frame a predetermined number of frames before the current frame, using the center position, angle, and size of the image corresponding to that earlier frame and the affine transformation parameters corresponding to each frame from that earlier frame to the current frame.

画像合成部１５０は、コンテンツ取得部１２０から出力された動画を構成する画像と、画像メモリ１６０に保持されていた合成画像とを画像変換部１４０から受け取って合成するものである。この画像合成部１５０は、合成した合成画像を画像メモリ１６０に保持させるとともに表示制御部１７０に出力する。なお、ここでは一例として、画像合成部１５０が合成画像を画像メモリ１６０に保持させる例を示すが、合成画像を画像メモリ１６０に保持させずに、予め定められた画像を画像メモリ１６０に保持させておくようにしてもよい。例えば、予め定められた画像を、公園の画像とし、この公園の画像に合成させる動画を、散歩をしている子供を撮像した動画とする場合に、この公園の画像を画像メモリ１６０に保持させ、この公園の画像上にその動画をアフィン変換させながら合成させるようにすることができる。これにより、公園上を子供が散歩するような仮想的な動画を表示させることができるようになる。 The image composition unit 150 receives from the image conversion unit 140 the images constituting the moving image output from the content acquisition unit 120 and the composite image held in the image memory 160, and composites them. The image composition unit 150 stores the resulting composite image in the image memory 160 and also outputs it to the display control unit 170. In the example described here, the image composition unit 150 stores the composite image in the image memory 160; alternatively, a predetermined image may be held in the image memory 160 instead of the composite image. For example, when the predetermined image is an image of a park and the moving image to be composited onto it shows a child taking a walk, the park image can be held in the image memory 160 and the moving image composited onto it while being affine-transformed. This makes it possible to display a virtual moving image in which the child appears to walk through the park.

画像メモリ１６０は、画像合成部１５０により合成された合成画像を保持するワークバッファである。画像メモリ１６０は、その保持している合成画像を画像変換部１４０に供給する。 The image memory 160 is a work buffer that holds the synthesized image synthesized by the image synthesis unit 150. The image memory 160 supplies the held composite image to the image conversion unit 140.

表示制御部１７０は、画像合成部１５０により合成された合成画像をフレーム毎に表示部１８０に表示させるものである。 The display control unit 170 displays the combined image combined by the image combining unit 150 on the display unit 180 for each frame.

表示部１８０は、表示制御部１７０の制御に基づいて、画像合成部１５０により合成された合成画像を表示するものである。例えば、パーソナルコンピュータやテレビジョンのディスプレイにより実現することができる。 The display unit 180 displays the synthesized image synthesized by the image synthesis unit 150 based on the control of the display control unit 170. For example, it can be realized by a display of a personal computer or a television.

音声変換情報算出部１９０は、画像変換部１４０においてアフィン変換パラメータおよび基準情報から求められた、現フレームに対応する画像の中心位置、角度または倍率に基づいて音声変換情報を算出するものである。ここにいう、音声変換情報とは、コンテンツ取得部１２０から出力された音声を変換するためのものである。この音声変換情報算出部１９０は、算出した音声変換情報を音声変換処理部２００に出力する。 The audio conversion information calculation unit 190 calculates audio conversion information based on the center position, angle, or magnification of the image corresponding to the current frame, which the image conversion unit 140 obtains from the affine transformation parameters and the reference information. The audio conversion information is used to convert the audio output from the content acquisition unit 120. The audio conversion information calculation unit 190 outputs the calculated audio conversion information to the audio conversion processing unit 200.

音声変換処理部２００は、音声変換情報算出部１９０により算出された音声変換情報に基づいてコンテンツ取得部１２０から出力された音声を変換して出力音声を生成するものである。この音声変換処理部２００は、生成した出力音声を音声出力制御部２１０に出力する。この音声変換処理部２００は、音量調整部２０１と音声加算部２０２とを備える。音量調整部２０１は、音声変換情報算出部１９０により算出された音声変換情報に基づいてコンテンツ取得部１２０から出力された音声を構成する複数のチャンネルの各音量を調整するものである。この音量調整部２０１は、調整した複数のチャンネルの音声を音声加算部２０２に出力する。音声加算部２０２は、音量調整部２０１により調整された音声をチャンネル毎に加算するものである。この音声加算部２０２は、加算した音声を出力音声として音声出力制御部２１０に出力する。 The audio conversion processing unit 200 converts the audio output from the content acquisition unit 120 based on the audio conversion information calculated by the audio conversion information calculation unit 190, and thereby generates output audio. The audio conversion processing unit 200 outputs the generated output audio to the audio output control unit 210. The audio conversion processing unit 200 includes a volume adjustment unit 201 and an audio addition unit 202. The volume adjustment unit 201 adjusts the volume of each of the plurality of channels constituting the audio output from the content acquisition unit 120, based on the audio conversion information calculated by the audio conversion information calculation unit 190, and outputs the adjusted audio of the plurality of channels to the audio addition unit 202. The audio addition unit 202 adds the audio adjusted by the volume adjustment unit 201 for each channel, and outputs the added audio to the audio output control unit 210 as the output audio.
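The two-stage structure described above (per-channel volume adjustment followed by addition) can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation in the specification: the function name `convert_audio` and the idea of expressing the audio conversion information as a gain matrix with one row per output channel are hypothetical.

```python
import numpy as np

def convert_audio(channels, gains):
    """Adjust the volume of each input channel, then add the results per output channel.

    channels: array of shape (n_in, n_samples), one row per input channel.
    gains:    array of shape (n_out, n_in); gains[o, i] is the volume applied
              to input channel i when it feeds output channel o (assumed form
              of the audio conversion information).
    Returns an array of shape (n_out, n_samples).
    """
    x = np.asarray(channels, dtype=float)
    g = np.asarray(gains, dtype=float)
    if g.ndim != 2 or x.ndim != 2 or g.shape[1] != x.shape[0]:
        raise ValueError("gains must have shape (n_out, n_in) matching channels")
    # Volume adjustment: scale every input channel for every output channel.
    adjusted = g[:, :, None] * x[None, :, :]  # shape (n_out, n_in, n_samples)
    # Addition: sum the adjusted inputs that feed each output channel.
    return adjusted.sum(axis=1)
```

With two input channels and the gains `[[1, 0.5], [0, 1]]`, for instance, the first output channel would carry the first input at full volume plus half of the second, and the second output channel would carry the second input unchanged.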

音声出力制御部２１０は、音声変換処理部２００により生成された出力音声をスピーカ２２０に出力させるものである。 The audio output control unit 210 causes the speaker 220 to output the output audio generated by the audio conversion processing unit 200.

スピーカ２２０は、音声出力制御部２１０の制御に基づいて音声変換処理部２００により生成された出力音声を出力するものである。また、このスピーカ２２０は、複数のスピーカから構成されるスピーカシステムである。 The speaker 220 outputs the output audio generated by the audio conversion processing unit 200 under the control of the audio output control unit 210. The speaker 220 is a speaker system made up of a plurality of speakers.

操作受付部２３０は、各種操作キー等を備え、これらのキーによる操作入力を受け付けると、受け付けた操作入力の内容をコンテンツ取得部１２０または画像変換部１４０に出力するものである。操作受付部２３０には、例えば、動画を再生する場合における表示モードを設定する設定キーが設けられている。この表示モードとして、例えば、現フレームに対応する画像にアフィン変換を施して、前の各フレームに対応する合成画像との合成画像を作成して表示する表示モード、前の各フレームに対応する合成画像にアフィン変換パラメータの方向とは逆方向にアフィン変換を施して、現フレームに対応する画像との合成画像を作成して表示する表示モード、または、現フレームに対応する画像の表示倍率を固定して動画を再生表示する表示モードがある。また、操作受付部２３０は、コンテンツ取得に係る操作入力の内容をコンテンツ取得部１２０に出力する。 The operation reception unit 230 includes various operation keys and, on receiving an operation input made with these keys, outputs the content of that input to the content acquisition unit 120 or the image conversion unit 140. The operation reception unit 230 is provided with, for example, a setting key for setting the display mode used when a moving image is reproduced. The display modes include, for example: a mode in which the image corresponding to the current frame is affine-transformed and composited with the composite image corresponding to the preceding frames for display; a mode in which the composite image corresponding to the preceding frames is affine-transformed in the direction opposite to that of the affine transformation parameters and composited with the image corresponding to the current frame for display; and a mode in which the moving image is reproduced with the display magnification of the image corresponding to the current frame fixed. The operation reception unit 230 also outputs the content of operation inputs related to content acquisition to the content acquisition unit 120.

なお、図１では、画像変換情報供給部１３０がアフィン変換パラメータを算出する例について説明したが、アフィン変換パラメータを関連付けた動画をコンテンツ記憶部１１０に記憶させておき、この動画をコンテンツ取得部１２０が取得して画像変換情報供給部１３０に出力し、この動画に関連付けられたアフィン変換パラメータを画像変換情報供給部１３０が抽出して画像変換部１４０に出力するようにしてもよい。 FIG. 1 describes an example in which the image conversion information supply unit 130 calculates the affine transformation parameters. Alternatively, a moving image with associated affine transformation parameters may be stored in the content storage unit 110; the content acquisition unit 120 then acquires this moving image and outputs it to the image conversion information supply unit 130, which extracts the affine transformation parameters associated with the moving image and outputs them to the image conversion unit 140.

次に、画像変換に用いられるアフィン変換パラメータを検出する検出方法について図面を参照して詳細に説明する。 Next, a detection method for detecting an affine transformation parameter used for image conversion will be described in detail with reference to the drawings.

図２（ａ）乃至（ｃ）は、動画を構成するフレームに対応する画像の一例を示す図である。図３（ａ）は、図２に示す画像３００に対応するフレームの１つ前のフレームに対応する画像について背景等を省略して簡略化した画像を示す図である。また、図３（ｂ）および（ｃ）は、図２に示す画像３００について背景等を省略して簡略化した画像を示す図である。 FIGS. 2A to 2C are diagrams showing an example of an image corresponding to a frame constituting a moving image. FIG. 3A shows a simplified image, with the background and the like omitted, of the image corresponding to the frame immediately before the frame corresponding to the image 300 shown in FIG. 2. FIGS. 3B and 3C show simplified images of the image 300 shown in FIG. 2, with the background and the like omitted.

図２および図３に示す画像３００、３２０、３３０には、人が跨っている馬の像３０１、３２１、３３１と、この馬の像３０１、３２１、３３１の手前に設置されている蛇の像３０２、３２２、３３２とが含まれている。また、図２に示すように、これらの像の背景には旗や椅子等が存在し、この旗が風になびいている。 The images 300, 320, and 330 shown in FIGS. 2 and 3 include images 301, 321, and 331 of a horse with a person astride it, and images 302, 322, and 332 of a snake placed in front of the horse images 301, 321, and 331. As shown in FIG. 2, flags, chairs, and the like are present in the background of these images, and the flags are fluttering in the wind.

図３（ａ）に示す画像３２０は、図２（ａ）乃至（ｃ）および図３（ｂ）および（ｃ）に示す画像３００、３３０に対応するフレームの１つ前のフレームに対応する画像を簡略化した画像である。また、２つの連続するフレームに対応する画像３２０および３３０は、画面内の被写体がしだいに大きくなる場合における遷移を示す画像である。すなわち、この撮影時には、画面内の被写体をしだいに大きくする操作であるズームイン操作がされている。 The image 320 shown in FIG. 3A is a simplified version of the image corresponding to the frame immediately before the frame corresponding to the images 300 and 330 shown in FIGS. 2A to 2C and FIGS. 3B and 3C. The images 320 and 330, which correspond to two consecutive frames, show a transition in which the subject in the screen gradually becomes larger. That is, a zoom-in operation, which gradually enlarges the subject in the screen, was being performed at the time of shooting.

本発明の実施の形態では、動画を構成する画像から特徴点を検出し、この特徴点に対応するオプティカルフローを用いてアフィン変換パラメータを計算する方法を例にして説明する。また、この例では、特徴点としてコーナー点を用いる場合について説明する。 In the embodiment of the present invention, a method of detecting a feature point from an image constituting a moving image and calculating an affine transformation parameter using an optical flow corresponding to the feature point will be described as an example. In this example, a case where a corner point is used as a feature point will be described.

ここで、図３（ａ）乃至（ｃ）では、画像３２０および３３０から検出された３つのコーナー点に対応するオプティカルフローを用いてアフィン変換パラメータを計算する方法を例にして説明する。 Here, in FIGS. 3A to 3C, a method for calculating affine transformation parameters using an optical flow corresponding to three corner points detected from the images 320 and 330 will be described as an example.

例えば、図３（ａ）に示す画像３２０において、特徴点として、馬の像３２１における口付近のコーナー点３２３と、馬の像３２１における人のお尻付近のコーナー点３２４と、蛇の像３２２の口付近のコーナー点３２５とが検出されているものとする。この場合において、図３（ｂ）に示す画像３３０において、勾配法やブロックマッチング法等により、画像３２０におけるコーナー点３２３、３２４および３２５に対するオプティカルフロー３３７、３３８および３３９が検出される。そして、この検出されたオプティカルフロー３３７、３３８および３３９に基づいて、画像３２０におけるコーナー点３２３、３２４および３２５に対応するコーナー点３３３、３３４および３３５が検出される。 For example, assume that in the image 320 shown in FIG. 3A, a corner point 323 near the mouth of the horse image 321, a corner point 324 near the buttocks of the person on the horse image 321, and a corner point 325 near the mouth of the snake image 322 have been detected as feature points. In this case, in the image 330 shown in FIG. 3B, optical flows 337, 338, and 339 for the corner points 323, 324, and 325 in the image 320 are detected by a gradient method, a block matching method, or the like. Based on the detected optical flows 337, 338, and 339, corner points 333, 334, and 335 corresponding to the corner points 323, 324, and 325 in the image 320 are then detected.

ここで、例えば、図３（ａ）および（ｂ）に示す画像３２０および３３０に含まれる馬の像３２１、３３１や蛇の像３２２、３３２は、地面に設置されているものであるため、撮像装置の動きとは無関係に動くものではない。このため、馬の像３２１、３３１や蛇の像３２２、３３２について検出されたコーナー点に対して求められたオプティカルフローに基づいて、撮像装置の動きを正確に推定することができる。例えば、図３（ｃ）に示すように、画像３３０において検出された３つのオプティカルフロー３３７乃至３３９に基づいて、画像３３０が、点３３６を中心にして画像３２０を拡大したものであることを推定することができる。これにより、画像３３０の撮影時における撮像装置の動きは、点３３６を中心とするズームイン動作であると判断することができる。このように、撮像装置の動きとは無関係に動くものではない物体についてコーナー点を検出し、このコーナー点に対して求められたオプティカルフローに基づいて、一定の規則性を備える撮像装置の動きを正確に検出することができる。このため、これらのコーナー点に対して求められたオプティカルフローを用いて、アフィン変換パラメータを計算して求めることができる。 Here, for example, the horse images 321 and 331 and the snake images 322 and 332 included in the images 320 and 330 shown in FIGS. 3A and 3B are installed on the ground, and therefore do not move independently of the movement of the imaging device. The motion of the imaging device can thus be estimated accurately from the optical flows obtained for the corner points detected on the horse images 321 and 331 and the snake images 322 and 332. For example, as shown in FIG. 3C, from the three optical flows 337 to 339 detected in the image 330, it can be estimated that the image 330 is the image 320 enlarged about the point 336. The movement of the imaging device when the image 330 was shot can accordingly be judged to be a zoom-in operation centered on the point 336. In this way, corner points are detected on objects that do not move independently of the imaging device, and the regular motion of the imaging device can be detected accurately from the optical flows obtained for those corner points. The affine transformation parameters can therefore be calculated using the optical flows obtained for these corner points.

しかしながら、風になびいている旗等のように、撮像装置の動きとは無関係に動く物体が画像内に含まれる場合が考えられる。例えば、図２に示す画像３００には、風になびいている旗が含まれている。このような撮像装置の動きとは無関係に動く物体についてコーナー点が検出され、このコーナー点に対して求められたオプティカルフローを用いて撮像装置の動きを推定する場合には、撮像装置の動きを正確に推定することができない。 However, an image may contain an object that moves independently of the movement of the imaging device, such as a flag fluttering in the wind. For example, the image 300 shown in FIG. 2 includes flags fluttering in the wind. When corner points are detected on such an object and the movement of the imaging device is estimated using the optical flows obtained for those corner points, the movement of the imaging device cannot be estimated accurately.

例えば、図２（ｂ）に示す画像３００において検出されたオプティカルフローを矢印で示すとともに、このオプティカルフローにより検出されたコーナー点を矢印の先端に白抜きの丸で示す。ここで、コーナー点３０３乃至３０５は、図３（ｂ）および（ｃ）に示すコーナー点３３３乃至３３５に対応するコーナー点である。また、コーナー点３０６乃至３１１は、馬の像３０１の背景に存在する旗について検出されたコーナー点である。そして、これらの旗が風になびいているため、風の影響による旗の動きがオプティカルフローとして検出されている。すなわち、コーナー点３０６乃至３１１に対応する各オプティカルフローは、撮像装置の動きとは無関係に動く旗について検出されたものである。このため、アフィン変換パラメータを計算する場合に用いられる３つのオプティカルフローに、コーナー点３０６乃至３１１のうちの少なくとも１つのコーナー点に対応するオプティカルフローが含まれている場合には、正確な撮像装置の動きを検出することができない。この場合には、正確なアフィン変換パラメータを計算することができない。 For example, in FIG. 2B the optical flows detected in the image 300 are indicated by arrows, and the corner points detected by these optical flows are indicated by white circles at the tips of the arrows. Here, the corner points 303 to 305 correspond to the corner points 333 to 335 shown in FIGS. 3B and 3C. The corner points 306 to 311 are corner points detected on the flags in the background of the horse image 301. Because these flags are fluttering in the wind, the movement of the flags caused by the wind is detected as optical flows. That is, the optical flows corresponding to the corner points 306 to 311 are detected on flags that move independently of the movement of the imaging device. Consequently, if the three optical flows used to calculate the affine transformation parameters include an optical flow corresponding to at least one of the corner points 306 to 311, the movement of the imaging device cannot be detected accurately, and accurate affine transformation parameters cannot be calculated.

以上で示したように、例えば、撮像装置の動きとは無関係に動く物体に対するオプティカルフロー（図２（ｂ）に示すコーナー点３０６乃至３１１に対応する各オプティカルフロー）と、撮像装置の動きとの関係で一定の規則性を備えるオプティカルフロー（図２（ｂ）に示すコーナー点３０６乃至３１１に対応する各オプティカルフロー以外のオプティカルフロー）とが、撮影画像から検出されることがある。 As described above, both optical flows for objects that move independently of the imaging device (the optical flows corresponding to the corner points 306 to 311 shown in FIG. 2B) and optical flows that have a certain regularity in relation to the movement of the imaging device (the optical flows other than those corresponding to the corner points 306 to 311 shown in FIG. 2B) may be detected from a captured image.

そこで、本発明の実施の形態では、３個のオプティカルフローに基づいてアフィン変換パラメータを計算するアフィン変換パラメータ計算処理を複数回行い、複数のアフィン変換パラメータを求め、これらの複数のアフィン変換パラメータの中から最適なアフィン変換パラメータを選択する例について説明する。なお、この例では、動画を構成する各画像に含まれている動物体の大きさが、画像の面積に対して比較的小さいものとする。 Therefore, the embodiment of the present invention describes an example in which the affine transformation parameter calculation process, which calculates affine transformation parameters based on three optical flows, is performed a plurality of times to obtain a plurality of sets of affine transformation parameters, and the optimum affine transformation parameters are selected from among them. In this example, the moving objects included in each image constituting the moving image are assumed to be relatively small compared with the area of the image.

ここで、アフィン変換について簡単に説明する。２次元上において、移動元の位置を（ｘ，ｙ）とし、アフィン変換後の移動先の位置を（ｘ´，ｙ´）とした場合に、アフィン変換の行列式は、式１で表すことができる。 Here, affine transformation is briefly described. In two dimensions, when the position of the movement source is (x, y) and the position of the movement destination after the affine transformation is (x′, y′), the affine transformation can be expressed in matrix form by Equation 1.

ここで、ａ乃至ｆは、アフィン変換パラメータである。また、このアフィン変換パラメータによるアフィン行列ＡＭを次の式で表すことができる。この場合に、Ｘ方向のズーム成分ＸＺ、Ｙ方向のズーム成分ＹＺ、Ｘ方向の併進成分ＸＴ、Ｙ方向の併進成分ＹＴ、回転成分Ｒについては、それぞれ次の式で求めることができる。なお、単位行列の場合には、ａ＝ｅ＝１、ｂ＝ｃ＝ｄ＝ｆ＝０となる。 Here, a to f are affine transformation parameters. The affine matrix AM formed from these affine transformation parameters can be expressed by the following equation. In this case, the zoom component XZ in the X direction, the zoom component YZ in the Y direction, the translation component XT in the X direction, the translation component YT in the Y direction, and the rotation component R can each be obtained by the following equations. For the identity matrix, a = e = 1 and b = c = d = f = 0.
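The equation images referred to above are not reproduced in this text. One conventional form that is consistent with the definitions given here, including the identity case a = e = 1 and b = c = d = f = 0, is sketched below. The exact layout of Equation 1 and the component formulas should be read as assumptions, not as a reproduction of the original figures.

```latex
\begin{equation}
\begin{bmatrix} x' & y' & 1 \end{bmatrix}
=
\begin{bmatrix} x & y & 1 \end{bmatrix}
\begin{bmatrix} a & d & 0 \\ b & e & 0 \\ c & f & 1 \end{bmatrix}
\tag{1}
\end{equation}
\begin{align*}
AM &= \begin{bmatrix} a & b & c \\ d & e & f \\ 0 & 0 & 1 \end{bmatrix}, \\
XZ &= \sqrt{a^{2} + d^{2}}, \qquad YZ = \sqrt{b^{2} + e^{2}}, \\
XT &= c, \qquad YT = f, \qquad R = \tan^{-1}\!\left(\frac{d}{a}\right)
\end{align*}
```

Written out, this form gives x′ = ax + by + c and y′ = dx + ey + f, so for the identity matrix the zoom components are 1 and the translation and rotation components are 0.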

次に、アフィン変換パラメータの計算方法について説明する。 Next, a method for calculating affine transformation parameters will be described.

最初に、動画を構成するフレームの中の１つのフレームである現フレームに対応する画像において、オプティカルフローが検出された特徴点の中から３個の特徴点が選択される。例えば、図２（ｂ）に示す画像３００において検出されたコーナー点（白抜きの丸で示す）の中からランダムに３個のコーナー点が選択される。なお、アフィン変換パラメータとして、射影変換パラメータを用いる場合には、４個の特徴点がランダムに選択される。 First, in the image corresponding to the current frame, which is one of the frames constituting the moving image, three feature points are selected from among the feature points for which an optical flow has been detected. For example, three corner points are selected at random from the corner points (indicated by white circles) detected in the image 300 shown in FIG. 2B. When projective transformation parameters are used as the affine transformation parameters, four feature points are selected at random.

続いて、選択された３個の特徴点に対応する３個のオプティカルフローを用いてアフィン変換パラメータが計算される。例えば、図２（ｂ）に示す画像３００におけるコーナー点（白抜きの丸で示す）の中から選択された３個のコーナー点に対応するオプティカルフロー（白抜きの丸に接続される矢印で示す）を用いてアフィン変換パラメータが計算される。このアフィン変換パラメータは、式１を用いて求めることができる。 Subsequently, affine transformation parameters are calculated using the three optical flows corresponding to the three selected feature points. For example, an optical flow corresponding to three corner points selected from the corner points (indicated by white circles) in the image 300 shown in FIG. 2B (indicated by arrows connected to the white circles). ) Is used to calculate the affine transformation parameters. This affine transformation parameter can be obtained using Equation 1.
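Calculating six affine transformation parameters from three optical flows amounts to solving two small linear systems. The following Python sketch illustrates this under assumptions: the function name `affine_from_three_flows` is hypothetical, and the parameter layout x′ = ax + by + c, y′ = dx + ey + f is assumed because the equation images are not available in this text.

```python
import numpy as np

def affine_from_three_flows(src_pts, dst_pts):
    """Solve x' = a*x + b*y + c and y' = d*x + e*y + f from three point pairs.

    src_pts: three (x, y) feature points in the preceding frame.
    dst_pts: the three positions the optical flows map them to in the current frame.
    Returns the 3x3 matrix [[a, b, c], [d, e, f], [0, 0, 1]].
    """
    src = np.asarray(src_pts, dtype=float)
    dst = np.asarray(dst_pts, dtype=float)
    if src.shape != (3, 2) or dst.shape != (3, 2):
        raise ValueError("expected exactly three (x, y) pairs for each argument")
    # Each row is [x, y, 1]; the same system matrix serves both output coordinates.
    system = np.hstack([src, np.ones((3, 1))])
    # np.linalg.solve raises LinAlgError when the system is singular,
    # which happens when the three source points are collinear.
    abc = np.linalg.solve(system, dst[:, 0])
    def_ = np.linalg.solve(system, dst[:, 1])
    return np.vstack([abc, def_, [0.0, 0.0, 1.0]])
```

Three points fix the six parameters exactly, which is why a single flow from an independently moving object is enough to distort the result.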

続いて、求められたアフィン変換パラメータに基づいて、アフィン変換パラメータのスコアが計算される。具体的には、求められたアフィン変換パラメータを用いて、現フレームの直前のフレームに対応する画像における全ての特徴点の移動先の位置を求める。そして、このアフィン変換パラメータを用いて求められた特徴点の位置と、現フレームにおいて検出された特徴点の位置とを比較して、互いに対応する２つの特徴点の位置の差分値が特徴点毎に計算される。差分値として、例えば、互いに対応する２つの特徴点の位置間の絶対距離が計算される。続いて、計算された差分値と、予め設定されている閾値とを特徴点毎に比較して、その差分値が閾値よりも小さい特徴点の個数をアフィン変換パラメータのスコアとして求める。このように、オプティカルフローが検出された特徴点の中から３個の特徴点をランダムに選択し、これらの特徴点に対応するオプティカルフローに基づいてアフィン変換パラメータのスコアを算出する処理を所定回数繰り返し、アフィン変換パラメータのスコアを複数算出する。この所定回数は、比較の対象となる画像の種類や画像処理装置１００の処理能力等に応じて適宜設定するようにしてもよく、固定値を用いるようにしてもよい。この所定回数として、例えば、画像処理装置１００の処理能力を考慮して２０回程度と設定することができる。 Subsequently, a score for the affine transformation parameters is calculated based on the obtained affine transformation parameters. Specifically, the obtained affine transformation parameters are used to find the destination positions of all feature points in the image corresponding to the frame immediately before the current frame. The feature point positions obtained with the affine transformation parameters are then compared with the feature point positions detected in the current frame, and the difference between the positions of each pair of corresponding feature points is calculated for each feature point. As the difference value, for example, the absolute distance between the positions of the two corresponding feature points is calculated. The calculated difference value is then compared with a preset threshold for each feature point, and the number of feature points whose difference value is smaller than the threshold is taken as the score of the affine transformation parameters. In this way, the process of randomly selecting three feature points from among the feature points for which an optical flow has been detected and calculating the score of the affine transformation parameters based on the corresponding optical flows is repeated a predetermined number of times, yielding a plurality of scores. The predetermined number of times may be set as appropriate according to the type of images being compared, the processing capability of the image processing apparatus 100, and so on, or a fixed value may be used. For example, it can be set to about 20 in consideration of the processing capability of the image processing apparatus 100.
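The score described above is a count of feature points whose predicted and detected positions agree to within a threshold. A minimal Python sketch follows; the function name `affine_score`, the default threshold value, and the 3x3 matrix layout [[a, b, c], [d, e, f], [0, 0, 1]] are assumptions made for illustration.

```python
import numpy as np

def affine_score(matrix, prev_pts, curr_pts, threshold=2.0):
    """Count feature points whose predicted position lies within `threshold`
    (Euclidean distance) of the position detected in the current frame.

    matrix:   3x3 affine matrix applied to points of the preceding frame.
    prev_pts: (n, 2) feature point positions in the preceding frame.
    curr_pts: (n, 2) corresponding positions detected in the current frame.
    """
    prev = np.asarray(prev_pts, dtype=float)
    curr = np.asarray(curr_pts, dtype=float)
    # Append a column of ones so the translation terms are applied.
    homog = np.hstack([prev, np.ones((len(prev), 1))])
    predicted = (homog @ np.asarray(matrix, dtype=float).T)[:, :2]
    distances = np.linalg.norm(predicted - curr, axis=1)
    # The score is the number of points with a difference below the threshold.
    return int(np.count_nonzero(distances < threshold))
```

A higher score means that more of the detected flows are explained by the candidate parameters.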

例えば、図２（ｂ）に示す画像３００において検出されたコーナー点の中から、コーナー点３０６乃至３１１以外のコーナー点が３個選択された場合を考える。このように選択された３個のコーナー点に対応する３個のオプティカルフローを用いてアフィン変換パラメータが計算されると、上述したように、この３個のオプティカルフローは一定の規則性を備えているため、直前のフレームに対応する画像を一定の規則に従って変換させるアフィン変換パラメータが求められる。このため、アフィン変換パラメータを用いて求められたコーナー点の位置と、現フレームにおいて検出されたコーナー点の位置とについて、コーナー点３０６乃至３１１以外のコーナー点に関して求められる差分値は、比較的小さい値が算出される。このため、アフィン変換パラメータのスコアは、大きい値になる。 For example, consider a case where three corner points other than the corner points 306 to 311 are selected from the corner points detected in the image 300 shown in FIG. 2B. When affine transformation parameters are calculated using the three optical flows corresponding to the three corner points selected in this way, these three optical flows have a certain regularity, as described above, so the resulting affine transformation parameters transform the image corresponding to the immediately preceding frame according to a consistent rule. For this reason, for the corner points other than the corner points 306 to 311, the difference values between the corner point positions obtained with the affine transformation parameters and the corner point positions detected in the current frame are relatively small. The score of the affine transformation parameters is therefore large.

一方、図２（ｂ）に示す画像３００において検出されたコーナー点の中から、コーナー点３０６乃至３１１のうちの少なくとも１個を含む３個のコーナー点が選択された場合を考える。このように選択された３個のコーナー点に対応する３個のオプティカルフローを用いてアフィン変換パラメータが計算されると、上述したように、この３個のオプティカルフローには、一定の規則性を備えていないオプティカルフローが含まれるため、直前のフレームに対応する画像を一定の規則に従って変換させるものではないアフィン変換パラメータが求められる。このため、アフィン変換パラメータを用いて求められたコーナー点の位置と、現フレームにおいて検出されたコーナー点の位置とについて求められる差分値は、任意のコーナー点で比較的大きい値が算出される。このため、アフィン変換パラメータのスコアは、小さい値になる。 On the other hand, consider a case where three corner points including at least one of the corner points 306 to 311 are selected from the corner points detected in the image 300 shown in FIG. 2B. When affine transformation parameters are calculated using the three optical flows corresponding to the three corner points selected in this way, these three optical flows include an optical flow that lacks the regularity described above, so the resulting affine transformation parameters do not transform the image corresponding to the immediately preceding frame according to a consistent rule. For this reason, the difference values between the corner point positions obtained with the affine transformation parameters and the corner point positions detected in the current frame are relatively large at arbitrary corner points. The score of the affine transformation parameters is therefore small.

続いて、求められた複数のアフィン変換パラメータのスコアの中で、スコアの値が最も大きいアフィン変換パラメータを代表アフィン変換パラメータとして選択する。そして、選択された代表アフィン変換パラメータを、画像変換部１４０に供給する。これにより、動画を構成する画像をアフィン変換する場合に、最適なアフィン変換パラメータを用いてアフィン変換することができる。 Subsequently, the affine transformation parameter having the largest score value is selected as the representative affine transformation parameter among the obtained scores of the plurality of affine transformation parameters. Then, the selected representative affine transformation parameter is supplied to the image conversion unit 140. Thereby, when the image which comprises a moving image is affine-transformed, it can affine-transform using an optimal affine transformation parameter.
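Taken together, the random selection, parameter calculation, scoring, and choice of the highest-scoring candidate can be sketched as a single loop. The Python code below is an illustration under assumptions and not the implementation in the specification: the function name `select_representative_affine`, the fixed random seed, the default threshold, and the matrix layout [[a, b, c], [d, e, f], [0, 0, 1]] are all hypothetical.

```python
import numpy as np

def select_representative_affine(prev_pts, curr_pts, iterations=20,
                                 threshold=2.0, seed=0):
    """Repeat: pick three flows at random, fit affine parameters, score them.

    Returns (best 3x3 matrix, its score). If no valid sample is found,
    the identity matrix is returned with a score of -1.
    """
    prev = np.asarray(prev_pts, dtype=float)
    curr = np.asarray(curr_pts, dtype=float)
    if len(prev) < 3 or prev.shape != curr.shape:
        raise ValueError("need at least three matching point pairs")
    rng = np.random.default_rng(seed)
    homog = np.hstack([prev, np.ones((len(prev), 1))])
    best_matrix, best_score = np.eye(3), -1
    for _ in range(iterations):
        idx = rng.choice(len(prev), size=3, replace=False)
        try:
            # Columns of `params` hold (a, b, c) and (d, e, f).
            params = np.linalg.solve(homog[idx], curr[idx])
        except np.linalg.LinAlgError:
            continue  # collinear sample; skip it
        # Score: number of points predicted to within the threshold.
        distances = np.linalg.norm(homog @ params - curr, axis=1)
        score = int(np.count_nonzero(distances < threshold))
        if score > best_score:
            best_matrix = np.vstack([params.T, [0.0, 0.0, 1.0]])
            best_score = score
    return best_matrix, best_score
```

The default of 20 iterations mirrors the example count mentioned in the text. This scheme resembles what is commonly called RANSAC, although the specification does not use that name.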

以上で示したように、動画を構成する各画像に人物や車等の動いている物体（動物体）が含まれている場合でも、画像の面積に対するその動物体の大きさが比較的小さい場合には、動物体の影響を受けずに撮像装置の動きを抽出することができる。 As described above, even when each image constituting the moving image contains a moving object such as a person or a car, the motion of the imaging device can be extracted without being affected by the moving object, provided that the moving object is relatively small compared with the area of the image.

また、撮像装置の動きを抽出することによって、ズームイン、ズームアウト、パン、チルト、ローテーション等の意図的に撮影者が移動させたと思われる動きを推定することができる。 Further, by extracting the motion of the imaging device, it is possible to estimate a motion that the photographer intentionally moves, such as zoom in, zoom out, pan, tilt, and rotation.

次に、本発明の実施の形態における画像処理装置１００の動作について図面を参照して説明する。 Next, the operation of the image processing apparatus 100 according to the embodiment of the present invention will be described with reference to the drawings.

図４は、本発明の実施の形態における画像処理装置１００によるアフィン変換パラメータ検出処理の処理手順を示すフローチャートである。 FIG. 4 is a flowchart showing a processing procedure of affine transformation parameter detection processing by the image processing apparatus 100 according to the embodiment of the present invention.

最初に、コンテンツ取得部１２０にコンテンツファイルが取得される（ステップＳ９００）。続いて、コンテンツ取得部１２０により取得されたコンテンツファイルの動画がデコードされ、時系列の順序で１つのフレームの画像が取得される（ステップＳ９０１）。続いて、取得された１つのフレームが画像変換情報供給部１３０に入力された動画の先頭のフレームであるか否かが判断される（ステップＳ９０２）。取得された１つのフレームが、先頭のフレームである場合には（ステップＳ９０２）、この先頭のフレームに対応する画像の全体から特徴点が抽出される（ステップＳ９０３）。例えば、図２（ｂ）に示すように、画像において複数のコーナー点が抽出される。続いて、アフィン変換パラメータとして単位行列のアフィン変換パラメータが選択され（ステップＳ９０４）、ステップＳ９１４に進む。 First, a content file is acquired by the content acquisition unit 120 (step S900). Subsequently, the moving image of the content file acquired by the content acquisition unit 120 is decoded, and an image of one frame is acquired in chronological order (step S901). Subsequently, it is determined whether or not the acquired one frame is the first frame of the moving image input to the image conversion information supply unit 130 (step S902). When the acquired one frame is the head frame (step S902), feature points are extracted from the entire image corresponding to the head frame (step S903). For example, as shown in FIG. 2B, a plurality of corner points are extracted from the image. Subsequently, the affine transformation parameter of the unit matrix is selected as the affine transformation parameter (step S904), and the process proceeds to step S914.

一方、取得された１つのフレームが、先頭のフレームではない場合には（ステップＳ９０２）、直前のフレームに対応する画像を基準として新たに撮影された領域から特徴点が抽出される（ステップＳ９０５）。すなわち、直前のフレームに対応する画像において既に抽出されている特徴点については、この特徴点に対応するオプティカルフローにより求めることができるため、現フレームに対応する画像においては抽出されない。 On the other hand, if the acquired one frame is not the first frame (step S902), a feature point is extracted from a region newly taken with reference to an image corresponding to the immediately preceding frame (step S905). . That is, since the feature points already extracted in the image corresponding to the immediately preceding frame can be obtained by the optical flow corresponding to the feature points, they are not extracted in the image corresponding to the current frame.

続いて、直前のフレームに対応する画像から抽出された各特徴点に対するオプティカルフローが計算される（ステップＳ９０６）。すなわち、図２（ｂ）に示すように、各コーナー点に対するオプティカルフローが計算される。 Subsequently, an optical flow for each feature point extracted from the image corresponding to the immediately preceding frame is calculated (step S906). That is, as shown in FIG. 2B, an optical flow for each corner point is calculated.

続いて、変数ｉが「１」に初期化される（ステップＳ９０７）。続いて、オプティカルフローが検出された特徴点の中から、Ｍ個の特徴点が選択される（ステップＳ９０８）。例えば、アフィン変換パラメータを用いる場合には、３個の特徴点がランダムに選択される。また、射影変換パラメータを用いる場合には、４個の特徴点がランダムに選択される。続いて、選択されたＭ個の特徴点に対応して計算されたＭ個のオプティカルフローに基づいて、アフィン変換パラメータが計算される（ステップＳ９０９）。 Subsequently, the variable i is initialized to “1” (step S907). Subsequently, M feature points are selected from the feature points from which the optical flow has been detected (step S908). For example, when using affine transformation parameters, three feature points are selected at random. In addition, when the projective transformation parameter is used, four feature points are selected at random. Subsequently, affine transformation parameters are calculated based on the M optical flows calculated corresponding to the selected M feature points (step S909).

続いて、計算して求められたアフィン変換パラメータに基づいて、アフィン変換パラメータのスコアが計算される（ステップＳ９１０）。具体的には、計算して求められたアフィン変換パラメータを用いて、直前のフレームに対応する画像における全ての特徴点の移動先の位置を求める。そして、このアフィン変換パラメータを用いて求められた特徴点の位置と、ステップＳ９０６でオプティカルフローを計算した際に求められた現フレームに対応する画像における特徴点の位置とを比較して、互いに対応する２つの特徴点の位置の差分値が特徴点毎に計算される。差分値として、例えば、互いに対応する２つの位置間の絶対距離が計算される。続いて、計算された差分値と、予め設定されている閾値とを特徴点毎に比較して、その差分値が閾値よりも小さい特徴点の個数をアフィン変換パラメータのスコアとして求める。 Subsequently, the score of the affine transformation parameter is calculated based on the affine transformation parameter obtained by calculation (step S910). Specifically, using the affine transformation parameters obtained by calculation, the movement destination positions of all feature points in the image corresponding to the immediately preceding frame are obtained. Then, the position of the feature point obtained using this affine transformation parameter is compared with the position of the feature point in the image corresponding to the current frame obtained when the optical flow is calculated in step S906 to correspond to each other. A difference value between the positions of the two feature points is calculated for each feature point. As the difference value, for example, an absolute distance between two positions corresponding to each other is calculated. Subsequently, the calculated difference value and a preset threshold value are compared for each feature point, and the number of feature points having the difference value smaller than the threshold value is obtained as a score of the affine transformation parameter.

続いて、変数ｉに「１」が加算され（ステップＳ９１１）、変数ｉが、定数Ｎよりも大きいか否かが判断される（ステップＳ９１２）。変数ｉが、定数Ｎ以下である場合には（ステップＳ９１２）、ステップＳ９０８に戻り、アフィン変換パラメータのスコア算出処理を繰り返す（ステップＳ９０８乃至Ｓ９１０）。例えば、定数Ｎとして、２０を用いることができる。 Subsequently, “1” is added to the variable i (step S911), and it is determined whether or not the variable i is larger than the constant N (step S912). If the variable i is equal to or less than the constant N (step S912), the process returns to step S908, and the affine transformation parameter score calculation process is repeated (steps S908 to S910). For example, 20 can be used as the constant N.

一方、変数ｉが定数Ｎよりも大きい場合には（ステップＳ９１２）、求められたアフィン変換パラメータのスコアのうちで、スコアの値が最も大きいアフィン変換パラメータが代表アフィン変換パラメータとして選択される（ステップＳ９１３）。続いて、選択された代表アフィン変換パラメータが、画像変換部１４０に供給される（ステップＳ９１４）。なお、現フレームが先頭のフレームである場合には、選択された単位行列のアフィン変換パラメータが、画像変換部１４０に供給される。続いて、現フレームに対応する画像と、この画像における特徴点とが上書き保存される（ステップＳ９１５）。 On the other hand, when the variable i is larger than the constant N (step S912), the affine transformation parameter having the largest score value is selected as the representative affine transformation parameter among the obtained scores of the affine transformation parameters (step S912). S913). Subsequently, the selected representative affine transformation parameter is supplied to the image conversion unit 140 (step S914). When the current frame is the first frame, the affine transformation parameters of the selected unit matrix are supplied to the image conversion unit 140. Subsequently, the image corresponding to the current frame and the feature points in this image are overwritten and saved (step S915).

Next, it is determined whether the current frame is the last frame of the moving image input to the image conversion information supply unit 130 (step S916). If it is not the last frame (step S916), the process returns to step S901 and the affine transformation parameter detection process is repeated (steps S901 to S915). If it is the last frame (step S916), the affine transformation parameter detection process ends.

In this embodiment of the present invention, an example has been described in which the affine transformation parameters are detected from the optical flow detected in the images making up the moving image. Alternatively, the imaging device may be provided with a sensor such as an acceleration sensor or a gyro sensor, or with a zoom button used for zoom operations. The sensor or zoom button then detects the amount of movement of the imaging device during shooting, and the affine transformation parameters are obtained from that amount of movement. The amount of movement detected during shooting can also be used to judge whether the affine transformation parameters obtained by the image conversion information supply unit 130 are correct. In addition, the image conversion information supply unit 130 may detect a plurality of affine transformation parameters, and one of them may be selected based on the amount of movement of the imaging device detected during shooting.

Next, the reproduction and display of a moving image using the affine transformation parameters described above is explained in detail with reference to the drawings. For the sake of explanation, the images shown in FIGS. 5 to 16 are simplified, and the amount of movement between two consecutive frames is exaggerated.

First, a case is described in which, during shooting, the magnification is not changed but the direction of the lens of the imaging device is moved up, down, left, or right about the position of the imaging device.

FIG. 5 is a diagram illustrating an example of the transition of a moving image shot by the imaging device. It shows images 401 to 403, which correspond to consecutive frames of a moving image in which a person 400 is shot against a mountain background. In this example, the photographer shoots while moving the direction of the lens of the imaging device to the right and upward. In this case, the person 400 in the moving image moves from right to left and also downward in the images making up the moving image.

FIG. 6 is a diagram in which, for each image shown in FIG. 5, the image corresponding to the immediately preceding frame is drawn with broken lines, together with an example of the detected optical flow. Image 401 shown in FIG. 6(a) is the same as image 401 shown in FIG. 5(a). The solid-line portion of image 402 shown in FIG. 6(b) is the same as image 402 shown in FIG. 5(b), and the broken-line portion of image 402 shown in FIG. 6(b) is the same as the solid-line portion of image 401 shown in FIG. 6(a). Arrows 404 to 406 in image 402 shown in FIG. 6(b) indicate an example of the optical flow detected from image 402. Similarly, the solid-line portion of image 403 shown in FIG. 6(c) is the same as image 403 shown in FIG. 5(c), and the broken-line portion of image 403 shown in FIG. 6(c) is the same as the solid-line portion of image 402 shown in FIG. 6(b). Arrows 407 to 409 in image 403 shown in FIG. 6(c) indicate an example of the optical flow detected from image 403.

As shown in FIGS. 6(b) and 6(c), the person 400 and the background mountain in the image move as the imaging device moves. The affine transformation parameters can be obtained for each frame from the optical flow detected from this movement.
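For the pure panning case of FIG. 6, the relation between optical flow and the translation component can be illustrated with a deliberately simplified estimator. This is not the detection procedure of the embodiment, which scores candidate parameters in steps S908 to S913. It only averages the flow vectors, which is the least-squares translation when every feature point moves by the same offset. The function name and the image coordinate convention (x to the right, y downward) are assumptions.

```python
def mean_translation(flows):
    """Average displacement of feature points.

    `flows` is a list of ((x0, y0), (x1, y1)) pairs giving the position
    of a feature point in the preceding frame and in the current frame.
    """
    if not flows:
        raise ValueError("at least one flow vector is required")
    dx = sum(x1 - x0 for (x0, _), (x1, _) in flows) / len(flows)
    dy = sum(y1 - y0 for (_, y0), (_, y1) in flows) / len(flows)
    return (dx, dy)
```

With y measured downward, a subject that drifts left and down between frames, as the person 400 does in FIG. 6, gives a negative `dx` and a positive `dy`.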

FIG. 7 is a diagram illustrating a display example when a moving image including images 401 to 403 shown in FIG. 5 is reproduced. In this embodiment of the present invention, the images making up the moving image are combined, so the image displayed on the display unit 180 grows larger than a normal image as playback progresses. For this reason, the first image is displayed at a size relatively small compared with the display area of the display unit 180. The user may also specify the size, position, and so on of the first image displayed.

As shown in FIG. 7(a), only image 401, corresponding to the first frame, is displayed at first. Let A1 be the matrix (a 3×3 matrix) of affine transformation parameters corresponding to image 401. Because A1 is the identity matrix, the position and size of image 401 are not transformed. When image 402, corresponding to the next frame, is displayed, image 402 is affine-transformed using the affine transformation parameters associated with that frame. Specifically, with A2 denoting the matrix of affine transformation parameters corresponding to image 402 and A1 the matrix corresponding to image 401, the value A1×A2 is computed, and image 402 is affine-transformed by the matrix A1×A2 with the position and size of image 401 of the first frame as the reference. In the image shown in FIG. 7(b), only the position of image 402 is transformed. The affine-transformed image 402 is then written over image 401, which corresponds to the immediately preceding frame, so that it overlaps it. That is, within the area of image 401, the area 410 that overlaps image 402 is overwritten with image 402, while in the area 411 that does not overlap image 402, image 401 remains in the composite. Thus, when image 402 corresponding to the second frame is displayed, a composite of the whole of image 402 and the portion of image 401 corresponding to area 411 is displayed, as shown in FIG. 7(b). An image frame indicating the most recent of the displayed images can be displayed around the image corresponding to the current frame. In FIG. 7(b), the image frame is displayed on image 402. The affine transformation parameters used to affine-transform image 402 are held in the image conversion unit 140.
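The overwrite rule described for FIG. 7(b) can be sketched on a small grid. The canvas representation (a dictionary keyed by integer pixel coordinates), the function name, and the use of frame labels instead of pixel values are all assumptions chosen to keep the example short. Only the translation-only case is covered.

```python
def paste(canvas, label, width, height, offset):
    """Write a width x height image onto the canvas at integer `offset`.

    Cells already on the canvas are overwritten where the new image
    overlaps them (area 410) and kept elsewhere (area 411).
    """
    ox, oy = offset
    for y in range(height):
        for x in range(width):
            canvas[(ox + x, oy + y)] = label
    return canvas

canvas = {}
paste(canvas, "401", 4, 3, (0, 0))    # first frame, placed untransformed
paste(canvas, "402", 4, 3, (1, -1))   # next frame, shifted by its transform
```

After the second call, every cell covered by the newer image carries its label, and the older image survives only where the two do not overlap.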

Next, when image 403, corresponding to the next frame, is displayed, image 403 is affine-transformed using the affine transformation parameters associated with that frame. That is, image 403 is affine-transformed by the affine transformation parameters obtained by multiplying together the matrix of affine transformation parameters corresponding to image 403 and the matrix corresponding to image 402 that was used for the immediately preceding affine transformation. Specifically, with A3, A2, and A1 denoting the matrices of affine transformation parameters corresponding to images 403, 402, and 401 respectively, the value A1×A2×A3 is computed, and image 403 is affine-transformed by the matrix A1×A2×A3 with the position and size of image 401 of the first frame as the reference. In the image shown in FIG. 7(c), only the position of image 403 is transformed. The affine-transformed image 403 is then written over the composite of images 401 and 402, which correspond to the preceding frames, so that it overlaps it. That is, within the area of the composite of images 401 and 402, areas 413 and 414, which overlap image 403, are overwritten with image 403, while in areas 411 and 412, which do not overlap image 403, the composite of images 401 and 402 remains. Thus, when image 403 corresponding to the third frame is displayed, a composite of the whole of image 403, the portion of image 401 corresponding to area 411, and the portion of image 402 corresponding to area 412 is displayed, as shown in FIG. 7(c). When an image frame indicating the most recent of the displayed images is displayed around the image corresponding to the current frame, it is displayed on image 403 shown in FIG. 7(c). The affine transformation parameters used to affine-transform image 403 are held in the image conversion unit 140. That is, the affine transformation parameters obtained by multiplying the matrices of affine transformation parameters corresponding to images 402 and 403 are held in the image conversion unit 140. In this way, the image corresponding to the current frame is affine-transformed by the affine transformation parameters obtained by multiplying the matrix of affine transformation parameters corresponding to the current frame by the matrices corresponding to each of the preceding frames. The affine transformation parameters obtained for this affine transformation are held in the image conversion unit 140 and used for the next affine transformation. The same applies to the cases of FIGS. 11 and 15.
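The running product A1×A2×A3 described above can be sketched as follows. The class name `CumulativeTransform` is a hypothetical stand-in for the part of the image conversion unit 140 that holds the previous product, and the matrices are plain nested lists in the usual homogeneous 3×3 form, which is an assumption about the layout.

```python
def identity3():
    return [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]

def translation3(tx, ty):
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]

def matmul3(p, q):
    """Product p x q of two 3x3 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

class CumulativeTransform:
    """Holds the product of all per-frame matrices seen so far."""

    def __init__(self):
        self.held = identity3()

    def update(self, frame_matrix):
        # Multiply the held product by the current frame's matrix and keep
        # the result for the next frame, so each step costs one product.
        self.held = matmul3(self.held, frame_matrix)
        return self.held
```

Holding the previous product means the transform for frame n is obtained with a single multiplication rather than by recomputing the whole chain from the first frame.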

FIG. 8 is a diagram illustrating a display example when a moving image including images 401 to 403 shown in FIG. 5 is reproduced. In the display example of FIG. 7, the composite image corresponding to the frames before the current frame (initially a single image) is fixed, the affine-transformed image corresponding to the current frame is written over that composite image, and the resulting composite is displayed. In the display example of FIG. 8, by contrast, the position of the image corresponding to the current frame is fixed, the composite image corresponding to the frames before the current frame is affine-transformed in the direction opposite to that of the affine transformation parameters, the image corresponding to the current frame is written over the transformed composite image, and the resulting composite is displayed. In other words, the display examples of FIGS. 7 and 8 differ in which image is displayed at a fixed position and which image is subjected to affine transformation, but are otherwise the same. Parts common to FIG. 7 are therefore described with the same reference numerals.

As shown in FIG. 8(a), only image 401, corresponding to the first frame, is displayed at first. When image 402, corresponding to the next frame, is displayed, image 401, the immediately preceding image, is affine-transformed in the direction opposite to that of the affine transformation parameters, using the affine transformation parameters associated with that frame. Specifically, with A2 denoting the matrix of affine transformation parameters corresponding to image 402 and A1 the matrix corresponding to image 401, the value inv(A1×A2) is computed, and image 401 is affine-transformed by the matrix inv(A1×A2). Here, inv A (where A is a matrix) denotes the inverse of A. In the image shown in FIG. 8(b), only the position of image 401 is transformed. Image 402, corresponding to the current frame, is then written over the inversely transformed image 401 so that it overlaps it. The composite image in which image 402 is written over image 401 is the same as the composite image shown in FIG. 7(b), so its description is omitted here.

Next, when image 403, corresponding to the next frame, is displayed, the composite of images 401 and 402, which correspond to the preceding frames, is affine-transformed in the direction opposite to that of the affine transformation parameters, using the affine transformation parameters associated with that frame. Specifically, with A3, A2, and A1 denoting the matrices of affine transformation parameters corresponding to images 403, 402, and 401 respectively, the value inv(A1×A2×A3) is computed, and the composite of images 401 and 402 is affine-transformed by the matrix inv(A1×A2×A3). In the image shown in FIG. 8(c), only the position of the composite of images 401 and 402 is transformed. Image 403, corresponding to the current frame, is then written over the inversely transformed composite of images 401 and 402 so that it overlaps it. The composite image in which image 403 is written over images 401 and 402 is the same as the composite image shown in FIG. 7(c), so its description is omitted here.
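The inverse matrix used in the FIG. 8 display mode can be computed in closed form for an affine matrix. The sketch below assumes a 3×3 matrix whose last row is (0, 0, 1); the function names are illustrative only.

```python
def invert_affine(m):
    """Invert [[a, b, c], [d, e, f], [0, 0, 1]] in closed form."""
    a, b, c = m[0]
    d, e, f = m[1]
    det = a * e - b * d
    if det == 0:
        raise ValueError("matrix is not invertible")
    # Inverse of the 2x2 linear part.
    ia, ib = e / det, -b / det
    id_, ie = -d / det, a / det
    # The inverse translation is minus the inverted linear part applied to (c, f).
    return [[ia, ib, -(ia * c + ib * f)],
            [id_, ie, -(id_ * c + ie * f)],
            [0.0, 0.0, 1.0]]

def apply3(m, point):
    """Apply a 3x3 affine matrix to a point (x, y)."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])
```

Passing the cumulative product A1×A2×A3 to `invert_affine` gives the matrix that moves the earlier composite while the current frame stays fixed.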

Next, a case is described in which, during shooting, the direction of the lens of the imaging device is not moved but the magnification is changed.

FIG. 9 is a diagram illustrating an example of the transition of a moving image shot by the imaging device. It shows images 421 to 423, which correspond to consecutive frames of a moving image in which a person 420 is shot against a mountain background. In this example, the photographer shoots while increasing the magnification of the lens of the imaging device. In this case, the person 420 in the moving image gradually becomes larger in the images making up the moving image. Although the position of the imaging device may shift slightly when the magnification is increased, this example is described without taking such movement into account.

FIG. 10 is a diagram in which, for each image shown in FIG. 9, the image corresponding to the immediately preceding frame is drawn with broken lines, together with an example of the detected optical flow. Image 421 shown in FIG. 10(a) is the same as image 421 shown in FIG. 9(a). The solid-line portion of image 422 shown in FIG. 10(b) is the same as image 422 shown in FIG. 9(b), and the broken-line portion of image 422 shown in FIG. 10(b) is the same as the solid-line portion of image 421 shown in FIG. 9(a). Arrows 424 to 426 in image 422 shown in FIG. 10(b) indicate an example of the optical flow detected from image 422. Similarly, the solid-line portion of image 423 shown in FIG. 10(c) is the same as image 423 shown in FIG. 9(c), and the broken-line portion of image 423 shown in FIG. 10(c) is the same as the solid-line portion of image 422 shown in FIG. 9(b). Arrows 427 to 429 in image 423 shown in FIG. 10(c) indicate an example of the optical flow detected from image 423.

As shown in FIGS. 10(b) and 10(c), the sizes of the person 420 and the background mountain in the image change as the magnification changes. The affine transformation parameters can be obtained for each frame from the optical flow detected from this change.
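For the pure zoom case of FIG. 10, the link between optical flow and the scale component can be illustrated with a simplified least-squares estimate. This is a sketch under strong assumptions (the zoom is centred on a known point and there is no translation or rotation) and is not the detection procedure of the embodiment. The function name and arguments are hypothetical.

```python
def estimate_scale(prev_points, curr_points, center):
    """Least-squares scale s such that (curr - center) ~ s * (prev - center)."""
    cx, cy = center
    num = 0.0
    den = 0.0
    for (x0, y0), (x1, y1) in zip(prev_points, curr_points):
        px, py = x0 - cx, y0 - cy
        qx, qy = x1 - cx, y1 - cy
        num += px * qx + py * qy   # sum of dot products p . q
        den += px * px + py * py   # sum of squared lengths of p
    if den == 0.0:
        raise ValueError("feature points must not all coincide with the center")
    return num / den
```

A value above 1 corresponds to the subject growing between frames, as the person 420 does in FIG. 10.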

FIG. 11 is a diagram illustrating a display example when a moving image including images 421 to 423 shown in FIG. 9 is reproduced.

As shown in FIG. 11(a), only image 421, corresponding to the first frame, is displayed at first. When image 422, corresponding to the next frame, is displayed, image 422 is affine-transformed using the affine transformation parameters associated with that frame. In the image shown in FIG. 11(b), only the size of image 422 is transformed. The affine-transformed image 422 is then written over image 421, which corresponds to the immediately preceding frame, so that it overlaps it. That is, within the area of image 421, the area that overlaps image 422 is overwritten with image 422. In this case, image 421 overlaps the entire area of image 422, so the whole of image 422 is written over image 421. In the area 431 of image 421 that does not overlap image 422, image 421 remains in the composite. Thus, when image 422 corresponding to the second frame is displayed, a composite of the whole of image 422 and the portion of image 421 corresponding to area 431 is displayed, as shown in FIG. 11(b). An image frame indicating the most recent of the displayed images can be displayed around the image corresponding to the current frame. In FIG. 11(b), the image frame is displayed on image 422. The affine transformation parameters used to affine-transform image 422 are held in the image conversion unit 140.

Next, when image 423, corresponding to the next frame, is displayed, image 423 is affine-transformed using the affine transformation parameters associated with that frame. That is, image 423 is affine-transformed by the affine transformation parameters obtained by multiplying together the matrix of affine transformation parameters corresponding to image 423 and the matrix corresponding to image 422 that was used for the immediately preceding affine transformation. In the image shown in FIG. 11(c), only the size of image 423 is transformed. The affine-transformed image 423 is then written over the composite of images 421 and 422, which correspond to the preceding frames, so that it overlaps it. That is, within the area of the composite of images 421 and 422, the area that overlaps image 423 is overwritten with image 423. In this case, the entire area of image 423 overlaps images 421 and 422, so the whole of image 423 is written over the composite of images 421 and 422. In areas 432 and 433 of the composite of images 421 and 422, which do not overlap image 423, the composite of images 421 and 422 remains. Thus, when image 423 corresponding to the third frame is displayed, a composite of the whole of image 423, the portion of image 421 corresponding to area 432, and the portion of image 422 corresponding to area 433 is displayed, as shown in FIG. 11(c). When an image frame indicating the most recent of the displayed images is displayed around the image corresponding to the current frame, it is displayed on image 423 shown in FIG. 11(c). The affine transformation parameters used to affine-transform image 423 are held in the image conversion unit 140. That is, the affine transformation parameters obtained by multiplying the matrices of affine transformation parameters corresponding to images 422 and 423 are held in the image conversion unit 140.

FIG. 12 is a diagram illustrating a display example when a moving image including images 421 to 423 shown in FIG. 9 is reproduced. The difference between the display examples of FIGS. 11 and 12 is the same as the difference between those of FIGS. 7 and 8: the image displayed at a fixed position and the image subjected to affine transformation differ, but the examples are otherwise the same. Parts common to FIG. 11 are therefore described with the same reference numerals.

As shown in FIG. 12(a), only image 421, corresponding to the first frame, is displayed at first. When image 422, corresponding to the next frame, is displayed, image 421, the immediately preceding image, is affine-transformed in the direction opposite to that of the affine transformation parameters, using the affine transformation parameters associated with that frame. In the image shown in FIG. 12(b), only the size of image 421 is transformed. Image 422, corresponding to the current frame, is then written over the inversely transformed image 421 so that it overlaps it. The composite image in which image 422 is written over image 421 differs in size from the composite image shown in FIG. 11(b) but is otherwise the same, so its description is omitted here.

Next, when image 423, corresponding to the next frame, is displayed, the composite of images 421 and 422, which correspond to the preceding frames, is affine-transformed in the direction opposite to that of the affine transformation parameters, using the affine transformation parameters associated with that frame. In the image shown in FIG. 12(c), only the size of the composite of images 421 and 422 is transformed. Image 423, corresponding to the current frame, is then written over the inversely transformed composite of images 421 and 422 so that it overlaps it. The composite image in which image 423 is written over the composite of images 421 and 422 differs in size from the composite image shown in FIG. 11(c) but is otherwise the same, so its description is omitted here.

Next, a case is described in which, during shooting, neither the direction of the lens nor the magnification of the imaging device is changed, but the imaging device is rotated about the shooting direction.

FIG. 13 is a diagram illustrating an example of the transition of a moving image shot by the imaging device. It shows images 441 to 443, which correspond to consecutive frames of a moving image in which a person 440 is shot against a mountain background. In this example, the photographer shoots while rotating the imaging device about the shooting direction. In this case, the person 440 in the moving image rotates in the images making up the moving image. Although the position of the imaging device may shift slightly as it is rotated, this example is described without taking such movement into account.

FIG. 14 is a diagram in which, for each image shown in FIG. 13, the image corresponding to the immediately preceding frame is drawn with broken lines, together with an example of the detected optical flow. Image 441 shown in FIG. 14(a) is the same as image 441 shown in FIG. 13(a). The solid-line portion of image 442 shown in FIG. 14(b) is the same as image 442 shown in FIG. 13(b), and the broken-line portion of image 442 shown in FIG. 14(b) is the same as the solid-line portion of image 441 shown in FIG. 13(a). Arrows 444 to 446 in image 442 shown in FIG. 14(b) indicate an example of the optical flow detected from image 442. Similarly, the solid-line portion of image 443 shown in FIG. 14(c) is the same as image 443 shown in FIG. 13(c), and the broken-line portion of image 443 shown in FIG. 14(c) is the same as the solid-line portion of image 442 shown in FIG. 13(b). Arrows 447 to 449 in image 443 shown in FIG. 14(c) indicate an example of the optical flow detected from image 443.

図１４（ｂ）および（ｃ）に示すように、撮像装置の回転に合わせて、画像に含まれる人４４０および背景の山が回転移動する。この回転移動により検出されるオプティカルフローに基づいてアフィン変換パラメータをフレーム毎に求めることができる。 As shown in FIGS. 14B and 14C, the person 440 and the background mountain included in the image rotate and move in accordance with the rotation of the imaging apparatus. Based on the optical flow detected by this rotational movement, the affine transformation parameters can be obtained for each frame.

図１５は、図１３に示す画像４４１乃至４４３を含む動画を再生する場合における表示例を示す図である。 FIG. 15 is a diagram illustrating a display example when a moving image including the images 441 to 443 illustrated in FIG. 13 is reproduced.

As shown in FIG. 15A, only the image 441 corresponding to the first frame is displayed at first. When the image 442 corresponding to the next frame is then displayed, the image 442 is affine-transformed using the affine transformation parameters associated with that frame. In the image shown in FIG. 15B, only the angle of the image 442 is transformed. The affine-transformed image 442 is then overwritten so as to overlap the image 441 corresponding to the immediately preceding frame. That is, within the area of the image 441, the region 450 that overlaps the image 442 is overwritten with the image 442, while the regions 451 and 452 that do not overlap the image 442 retain the image 441 in the composite. Consequently, when the image 442 corresponding to the second frame is displayed, a composite of the whole of the image 442 and the portions of the image 441 corresponding to the regions 451 and 452 is displayed, as shown in FIG. 15B. An image frame indicating the latest of the displayed images can also be displayed around the image corresponding to the current frame; in FIG. 15B, the image frame is displayed on the image 442. The affine transformation parameters used to transform the image 442 are held in the image conversion unit 140.

Subsequently, when the image 443 corresponding to the next frame is displayed, the image 443 is affine-transformed using the affine transformation parameters associated with that frame. More precisely, the image 443 is transformed with the affine transformation parameters obtained by multiplying the matrix of the affine transformation parameters corresponding to the image 443 by the matrix of the affine transformation parameters corresponding to the image 442, which was used for the immediately preceding affine transformation. In the image shown in FIG. 15C, only the angle of the image 443 is transformed. The affine-transformed image 443 is then overwritten so as to overlap the composite image of the images 441 and 442 corresponding to the preceding frames. That is, within the area of the composite image of the images 441 and 442, the regions 453 to 457 that overlap the image 443 are overwritten with the image 443, while the regions 458 to 461 that do not overlap the image 443 retain the composite image of the images 441 and 442. Consequently, when the image 443 corresponding to the third frame is displayed, a composite of the whole of the image 443, the portion of the image 441 corresponding to the region 459, and the portions of the image 442 corresponding to the regions 458 and 460 is displayed, as shown in FIG. 15C. When an image frame indicating the latest of the displayed images is displayed around the image corresponding to the current frame, it is displayed on the image 443 shown in FIG. 15C. The affine transformation parameters used to transform the image 443 are held in the image conversion unit 140; that is, the parameters obtained by multiplying the matrices of the affine transformation parameters corresponding to the images 442 and 443 are held in the image conversion unit 140.
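The accumulation of per-frame affine transformation parameters described above can be sketched as follows. This is a minimal illustration rather than the implementation of the image conversion unit 140: the 3×3 homogeneous matrix layout, the helper names `rotation`, `matmul3`, and `accumulate`, the 10-degree rotations, and the order of multiplication are all assumptions made for the example.

```python
import math

def rotation(theta):
    """3x3 homogeneous matrix for a rotation by theta radians (assumed layout)."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]]

def matmul3(a, b):
    """Product of two 3x3 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]

def accumulate(per_frame):
    """Yield the running product of the per-frame affine matrices.

    The running product plays the role of the parameters that are held
    after each frame; the multiplication order is an assumption.
    """
    held = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]  # reference frame
    for m in per_frame:
        held = matmul3(held, m)
        yield held

# Hypothetical parameters: each frame rotates 10 degrees relative to the previous one.
frames = [rotation(math.radians(10)), rotation(math.radians(10))]
mats = list(accumulate(frames))
angle = math.degrees(math.atan2(mats[-1][1][0], mats[-1][0][0]))
```

In this sketch the second accumulated matrix represents the combined rotation of both frames, which is why a later frame can be placed relative to the reference image rather than only relative to its predecessor.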

図１６は、図１３に示す画像４４１乃至４４３を含む動画を再生する場合における表示例を示す図である。図１５および図１６に示す表示例の相違は、図７および図８に示す表示例の相違と同様であり、固定位置に表示される画像、および、アフィン変換の対象となる画像が異なるものの、他の部分は共通する。このため、図１５に共通する部分については、共通の符号を付して説明する。 FIG. 16 is a diagram illustrating a display example when a moving image including the images 441 to 443 illustrated in FIG. 13 is reproduced. The difference between the display examples shown in FIGS. 15 and 16 is the same as the difference between the display examples shown in FIGS. 7 and 8, and the image displayed at the fixed position and the image targeted for affine transformation are different. Other parts are common. Therefore, portions common to FIG. 15 will be described with common reference numerals.

As shown in FIG. 16A, only the image 441 corresponding to the first frame is displayed at first. When the image 442 corresponding to the next frame is then displayed, the immediately preceding image 441 is affine-transformed, using the affine transformation parameters associated with that frame, in the direction opposite to that of the parameters. In the image shown in FIG. 16B, only the angle of the image 441 is transformed. The image 442 corresponding to the current frame is then overwritten so as to overlap the inversely transformed image 441. The resulting composite image differs in angle from the composite image shown in FIG. 15B but is otherwise the same, so its description is omitted here.

Subsequently, when the image 443 corresponding to the next frame is displayed, the composite image of the images 441 and 442 corresponding to the preceding frames is affine-transformed, using the affine transformation parameters associated with that frame, in the direction opposite to that of the parameters. In the image shown in FIG. 16C, only the angle of the composite image of the images 441 and 442 is transformed. The image 443 corresponding to the current frame is then overwritten so as to overlap the inversely transformed composite image. The resulting composite image differs in angle from the composite image shown in FIG. 15C but is otherwise the same, so its description is omitted here.
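Transforming the composite image "in the direction opposite" to the affine transformation parameters can be read as applying the inverse of the frame's affine matrix. The following is a minimal sketch under that reading; the matrix layout [[a, b, tx], [c, d, ty], [0, 0, 1]], the function names `invert_affine` and `apply_affine`, and the 10-degree rotation are assumptions for illustration.

```python
import math

def invert_affine(m):
    """Invert a 2D affine matrix [[a, b, tx], [c, d, ty], [0, 0, 1]]."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    ia, ib = d / det, -b / det
    ic, id_ = -c / det, a / det
    # The inverse translation is the negated translation mapped through the inverse linear part.
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]

def apply_affine(m, point):
    """Map a point (x, y) through the affine matrix m."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])

# Hypothetical per-frame parameters: a 10-degree rotation.
t = math.radians(10)
frame_matrix = [[math.cos(t), -math.sin(t), 0.0],
                [math.sin(t), math.cos(t), 0.0],
                [0.0, 0.0, 1.0]]
inverse = invert_affine(frame_matrix)

# A corner of the composite image is moved by the inverse transform,
# while the current frame itself would stay at its fixed position.
corner = (100.0, 50.0)
moved = apply_affine(inverse, corner)
```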

以上では、動画を構成する各画像の位置、倍率および角度が順次変更される場合についてそれぞれ説明したが、これらの変更が組み合わされている場合についても同様に適用することができる。 The case where the position, magnification, and angle of each image constituting the moving image are sequentially changed has been described above. However, the present invention can be similarly applied to a case where these changes are combined.

Here, an example of calculating the center position, angle, and magnification of the image corresponding to the current frame will be described. As described above, the image corresponding to the current frame is transformed using the product of the matrices of the affine transformation parameters corresponding to each frame from the frame of the reference image to the current frame. The movement amount, rotation angle, or magnification of the image corresponding to the current frame relative to the reference image can therefore be calculated from the affine transformation parameters obtained by this multiplication. Specifically, the center position, angle, and magnification of the image corresponding to the current frame can be calculated from the reference information held by the image conversion unit 140, which indicates the center position, angle, and magnification of the first frame used as the reference for the transformation, together with the product of the matrices of the affine transformation parameters corresponding to each frame up to the current frame. The center position is calculated from Equation 1 using the center position in the reference information and the multiplied matrix of affine transformation parameters; the angle θ and the magnification z are calculated from the multiplied affine transformation parameters using, for example, the following equation.
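The equation referred to above is not reproduced in this text, so the following is only a sketch of one common way to read a center position, an angle, and a magnification out of an accumulated affine matrix; it should not be taken as the equation of this embodiment. It assumes the matrix has the layout [[a, b, tx], [c, d, ty], [0, 0, 1]] and contains only rotation, uniform scaling, and translation. The function name `pose_from_affine` and the sample values are hypothetical.

```python
import math

def pose_from_affine(m, ref_center=(0.0, 0.0)):
    """Return (center, theta, zoom) for an accumulated affine matrix.

    Assumes m = [[a, b, tx], [c, d, ty], [0, 0, 1]] is a similarity transform
    (rotation, uniform scale, translation); this is an illustrative assumption.
    """
    (a, b, tx), (c, d, ty) = m[0], m[1]
    cx, cy = ref_center
    # Map the reference center through the accumulated transform.
    center = (a * cx + b * cy + tx, c * cx + d * cy + ty)
    # For a similarity transform the first column is zoom * (cos(theta), sin(theta)).
    theta = math.atan2(c, a)
    zoom = math.hypot(a, c)
    return center, theta, zoom

# Hypothetical accumulated transform: zoom 2, rotation 30 degrees, shift (5, -3).
t = math.radians(30)
m = [[2 * math.cos(t), -2 * math.sin(t), 5.0],
     [2 * math.sin(t), 2 * math.cos(t), -3.0],
     [0.0, 0.0, 1.0]]
center, theta, zoom = pose_from_affine(m)
```

Whether the magnification used for sound conversion is this scale factor or its reciprocal depends on how the parameters are defined, which this sketch does not settle.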

図１７は、本発明の実施の形態における音声変換処理部２００の構成例を示すブロック図である。ここでは一例として、右チャンネルおよび左チャンネルの入力音声を右チャンネルおよび左チャンネルの出力音声に変換する例を示す。 FIG. 17 is a block diagram illustrating a configuration example of the voice conversion processing unit 200 according to the embodiment of the present invention. Here, as an example, an example is shown in which the input sound of the right channel and the left channel is converted into the output sound of the right channel and the left channel.

The volume adjustment unit 201 includes volume amplifiers 203 to 206. The volume amplifier 203 amplifies the right-channel input sound based on the sound conversion information RR from the sound conversion information calculation unit 190. The volume amplifier 204 amplifies the right-channel input sound based on the sound conversion information RL from the sound conversion information calculation unit 190. The volume amplifier 205 amplifies the left-channel input sound based on the sound conversion information LR from the sound conversion information calculation unit 190. The volume amplifier 206 amplifies the left-channel input sound based on the sound conversion information LL from the sound conversion information calculation unit 190. The sound conversion information referred to here is information calculated from the center position, angle, and magnification of the current frame on the display screen of the display unit 180, and indicates the volume adjustment value for each channel.

音声加算部２０２は、音声加算器２０７および２０８を備える。音声加算器２０７は、音量増幅器２０３により増幅された右チャンネル入力音声および音量増幅器２０５により増幅された左チャンネル入力音声を加算するものである。この音声加算器２０７は、加算した音声を右チャンネル出力音声として音声出力制御部２１０に出力する。音声加算器２０８は、音量増幅器２０４により増幅された右チャンネル入力音声および音量増幅器２０６により増幅された左チャンネル入力音声を加算するものである。この音声加算器２０８は、加算した音声を左チャンネル出力音声として音声出力制御部２１０に出力する。これにより、入力音声が音声変換情報に従って変換されて、出力音声として音声出力制御部２１０に供給される。 The audio adder 202 includes audio adders 207 and 208. The sound adder 207 adds the right channel input sound amplified by the volume amplifier 203 and the left channel input sound amplified by the volume amplifier 205. The audio adder 207 outputs the added audio to the audio output control unit 210 as the right channel output audio. The sound adder 208 adds the right channel input sound amplified by the volume amplifier 204 and the left channel input sound amplified by the volume amplifier 206. The audio adder 208 outputs the added audio to the audio output control unit 210 as left channel output audio. Thereby, the input voice is converted according to the voice conversion information and supplied to the voice output control unit 210 as output voice.

次に、本発明の実施の形態における現フレームに対応する画像の移動に関する音声変換処理について図面を参照して詳細に説明する。 Next, audio conversion processing relating to movement of an image corresponding to the current frame in the embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 18 is a diagram outlining an example in which a captured moving image is reproduced by a normal reproduction method. FIG. 18A shows imaging ranges 511 to 513 captured by the imaging apparatus 500, with a car moving from right to left as the imaging target. The images are captured so that the cars 514 to 516 are at the centers of the imaging ranges 511 to 513. To make the drawing easier to understand, the cars 514 to 516 are drawn relatively large in relation to the area of the captured images 511 to 513; as described with reference to FIG. 3, however, it is preferable to capture the moving object relatively small in relation to the image area so that the movement of the imaging apparatus can be extracted without being influenced by the moving object. In the following, the captured images corresponding to the imaging ranges 511 to 513 are given the same reference numerals and are described as captured images 511 to 513. The imaging apparatus 500 includes a right microphone 501 and a left microphone 502, which acquire the right-channel and left-channel input sound together with the captured images 511 to 513. The acquired input sound generally matches the image shown in the finder of the imaging apparatus 500. FIG. 18B shows an example in which the captured images 511 to 513 shown in FIG. 18A are reproduced by the normal reproduction method. The captured images 511 to 513 are displayed over the entire display screen of the display unit 180, and the cars 514 to 516 in them are displayed at the center of the display screen. As the output sound accompanying the display of the captured images 511 to 513, the right-channel and left-channel input sound is output as is to the right speaker 221 and the left speaker 222. With this normal reproduction method, outputting the input sound of the captured images unchanged still yields sound that matches the captured images. The volume indications 517 and 518 show the volume of the right-channel and left-channel input sound, respectively; the right-channel volume is drawn in white and the left-channel volume in black.

FIG. 19 is a diagram outlining a reproduction example by the image processing apparatus 100 in the embodiment of the present invention. It shows an example in which the composite image formed from the images preceding the current frame is fixed, and the image corresponding to the current frame is transformed using the affine transformation parameters supplied from the image conversion information supply unit 130 and then reproduced. The captured images 511 to 513 shown in FIG. 19 are the same as those shown in FIG. 18A, but in this example they are displayed in the order 511, 512, 513 from right to left on the display screen of the display unit 180, based on the affine transformation parameters obtained from the moving direction 521 of the imaging apparatus 500. For the purpose of explanation, FIG. 19 shows the captured images 511 to 513 with no gaps between them. Unlike FIG. 18B, the captured images 511 to 513 move across the display screen in this case, so outputting the acquired input sound unchanged cannot produce a natural acoustic effect that follows their movement. In the embodiment of the present invention, the image processing apparatus 100 therefore adjusts the ratio at which the left-channel and right-channel input sound is added according to the center position of the captured images 511 to 513 on the display screen, and outputs the result for each output channel. Specifically, the volume ratio of the right-channel and left-channel input sound is adjusted according to that center position, and the summed sound is output to each speaker. Taking the display of the captured image 513 as an example, the right speaker 221 outputs sound at a volume 519 obtained by attenuating the right-channel input sound. The left-channel input sound is nominally added to the attenuated volume 519, but because the center of the captured image 513 lies on the left side of the display screen, the volume ratio of the left-channel input sound is set to "0". The right speaker 221 therefore outputs only the right-channel input sound. The left speaker 222 outputs the sum of the volume 518 of the left-channel input sound and a volume 520, which is the volume of the right-channel input sound minus the attenuated volume 519.

次に、本発明の実施の形態における音声変換情報算出部１９０による移動に関する音声変換情報の算出例について図面を参照して詳細に説明する。 Next, an example of calculating speech conversion information related to movement by the speech conversion information calculation unit 190 according to the embodiment of the present invention will be described in detail with reference to the drawings.

図２０は、本発明の実施の形態における表示部１８０の表示画面の座標系について示すブロック図である。この例では、表示画面における画像１８５を例にして説明する。 FIG. 20 is a block diagram showing the coordinate system of the display screen of the display unit 180 in the embodiment of the present invention. In this example, an image 185 on the display screen will be described as an example.

水平方向にＸ軸１８１を、垂直方向にＹ軸１８２をそれぞれ想定し、これらの軸の原点を表示部１８０の表示画面の中心とする。また、表示部１８０の表示画面の横幅１８３および縦幅１８４をそれぞれｗｉｄｔｈおよびｈｅｉｇｈｔにより表す。この座標系において、原点からの現フレームに対応する画像１８５の中心位置１８６の移動量としては、Ｘ軸方向における移動量１８７をｘ、Ｙ軸方向における移動量をｙとする。また、画像１８５とＸ軸の成す角度１８９をθとする。 Assuming the X axis 181 in the horizontal direction and the Y axis 182 in the vertical direction, the origin of these axes is the center of the display screen of the display unit 180. In addition, the horizontal width 183 and the vertical width 184 of the display screen of the display unit 180 are represented by width and height, respectively. In this coordinate system, as the movement amount of the center position 186 of the image 185 corresponding to the current frame from the origin, the movement amount 187 in the X-axis direction is x, and the movement amount in the Y-axis direction is y. An angle 189 formed by the image 185 and the X axis is θ.

このように定義した座標系を用いて、これ以降に示す音声変換情報の算出に関連するグラフおよび関係式を表すこととする。 By using the coordinate system defined as described above, a graph and a relational expression related to the calculation of the voice conversion information shown below will be expressed.

図２１は、本発明の実施の形態における現フレームに対応する画像の中心位置と出力音声との関係を例示するグラフを示す図である。図２１（ａ）および（ｂ）では、横軸を、表示画面における画像の移動量（ｘ）を示す軸とし、縦軸を、入力音声に対する出力音声の比率（Ｒａｔｅ）を示す軸とする。実線６１１および６２１は、右チャンネルの出力音声の出力割合を示しており、破線６１２および６２２は、左チャンネルの出力音声の出力割合を示している。図２１（ａ）には、右チャンネルの入力音声が移動量ｘに応じて右チャンネルおよび左チャンネルの出力音声に配分される割合が示されている。図２１（ｂ）には、左チャンネルの入力音声が移動量ｘに応じて各チャンネルの出力音声に配分される割合が示されている。最終的に、右チャンネルの出力音声については、実線６１１および６２１から定まる右チャンネルおよび左チャンネルの音声が加算されて出力される。左チャンネルの出力音声についても、破線６１２および６２２から定まる各チャンネルの音声が加算されて出力される。 FIG. 21 is a diagram illustrating a graph illustrating the relationship between the center position of the image corresponding to the current frame and the output sound in the embodiment of the present invention. In FIGS. 21A and 21B, the horizontal axis is an axis indicating the movement amount (x) of the image on the display screen, and the vertical axis is an axis indicating the ratio (Rate) of the output sound to the input sound. Solid lines 611 and 621 indicate the output ratio of the right channel output sound, and broken lines 612 and 622 indicate the output ratio of the left channel output sound. FIG. 21 (a) shows the ratio of the right channel input sound distributed to the right channel and left channel output sound according to the movement amount x. FIG. 21B shows the ratio of the input sound of the left channel distributed to the output sound of each channel according to the movement amount x. Finally, as for the right channel output sound, the right channel and left channel sound determined from the solid lines 611 and 621 are added and output. As for the output sound of the left channel, the sound of each channel determined from the broken lines 612 and 622 is added and output.

The relationship between the movement amount x and the ratio f(x) of the output sound to the input sound for the solid line 611 shown here can be expressed by the following equation.

$$
f(x) =
\begin{cases}
\dfrac{\alpha}{\mathrm{width}/2}\, x + 1 & (-\mathrm{width}/2 \le x < 0) \\
1 & (0 \le x < \mathrm{width}/2)
\end{cases}
$$

Here, width is the horizontal width of the display screen. The value of the parameter α is preferably, for example, 0.3 to 0.4.

上記式を用いると、破線６１２、実線６２１および破線６２２の関係式は、それぞれ１−ｆ（ｘ）、１−ｆ（−ｘ）およびｆ（−ｘ）として表される。 Using the above expression, the relational expressions of the broken line 612, the solid line 621, and the broken line 622 are expressed as 1-f (x), 1-f (-x), and f (-x), respectively.
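The curve f(x) and the three curves derived from it can be written out as a short sketch. The function names `f` and `gains`, the default α of 0.35, and the clamping of x to the screen width are assumptions for illustration; the text only defines f(x) for −width/2 ≤ x < width/2.

```python
def f(x, width, alpha=0.35):
    """Ratio for the solid line 611: ramps from 1 - alpha at the left edge to 1 at the center."""
    half = width / 2.0
    # Clamping positions outside the screen is an assumption of this sketch.
    x = max(-half, min(x, half))
    if x < 0:
        return (alpha / half) * x + 1.0
    return 1.0

def gains(x, width, alpha=0.35):
    """Return the four ratios for a horizontal movement amount x.

    rr: solid line 611  = f(x)      rl: broken line 612 = 1 - f(x)
    lr: solid line 621  = 1 - f(-x) ll: broken line 622 = f(-x)
    """
    rr = f(x, width, alpha)
    rl = 1.0 - rr
    ll = f(-x, width, alpha)
    lr = 1.0 - ll
    return rr, rl, lr, ll

# Image centered on the screen: both channels pass through unchanged.
centered = gains(0, 1920)
# Image at the left edge with alpha = 0.4: part of the right input moves to the left output.
left_edge = gains(-960, 1920, alpha=0.4)
```

With the image centered, the ratios reduce to a pass-through, which matches the normal reproduction case in which the input sound is output as is.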

Here, the relational expression for converting the input sound is given as Equation 2.

$$
\begin{aligned}
R' &= R \cdot RR + L \cdot LR \\
L' &= R \cdot RL + L \cdot LL
\end{aligned}
\qquad \text{(Equation 2)}
$$

Here, RR = f(x), RL = 1 − f(x), LR = 1 − f(−x), and LL = f(−x). x is the movement amount of the image corresponding to the current frame (the horizontal distance from the origin to the center position of that image). R′ and L′ are the right-channel and left-channel output sound, respectively, and R and L are the right-channel and left-channel input sound, respectively.

ここに示すＲＲ、ＲＬ、ＬＲおよびＬＬが音声変換情報に相当し、音声変換情報算出部１９０は、現フレームに対応する画像の中心位置から、これらＲＲ、ＲＬ、ＬＲおよびＬＬを算出する。 RR, RL, LR, and LL shown here correspond to audio conversion information, and the audio conversion information calculation unit 190 calculates these RR, RL, LR, and LL from the center position of the image corresponding to the current frame.
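Equation 2 amounts to a 2×2 mix of the two input channels. A minimal sketch of applying it sample by sample is shown below; the function name `convert_sound`, the representation of sound as lists of floats, and the sample values are assumptions. The gain values are illustrative and correspond to an image at the left edge of the screen with α = 0.4 under the f(x) defined earlier.

```python
def convert_sound(right_in, left_in, rr, rl, lr, ll):
    """Apply Equation 2 to each pair of samples and return (right_out, left_out)."""
    # R' = R*RR + L*LR: outputs of amplifiers 203 and 205, summed as in sound adder 207.
    right_out = [r * rr + l * lr for r, l in zip(right_in, left_in)]
    # L' = R*RL + L*LL: outputs of amplifiers 204 and 206, summed as in sound adder 208.
    left_out = [r * rl + l * ll for r, l in zip(right_in, left_in)]
    return right_out, left_out

# Hypothetical input samples for the two channels.
right_in = [1.0, 0.5]
left_in = [0.2, 0.2]
right_out, left_out = convert_sound(right_in, left_in, rr=0.6, rl=0.4, lr=0.0, ll=1.0)
```

Because RR + RL = 1 and LR + LL = 1, each input channel is redistributed between the two outputs rather than amplified overall.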

Here, as an example, the speakers 220 are assumed to be installed on the left and right of the display screen, and the sound conversion information for the sound of the image corresponding to the current frame is calculated from the horizontal position of that image on the screen. The same approach may also be applied to, for example, a speaker system installed at the center of the display screen, such as a center speaker, or a speaker system installed above and below the display screen. When applied to a speaker system installed above and below the screen, the sound conversion information can be calculated from the vertical position of the image corresponding to the current frame on the screen. When applied to a speaker system installed at the center, it can be calculated from the horizontal position of that image on the screen. In other words, sound conversion information for the sound of the image corresponding to the current frame is calculated based on the affine transformation parameters, and the sound is converted based on this sound conversion information to generate the output sound.

次に、本発明の実施の形態における現フレームに対応する画像の回転に関する音声変換処理について図面を参照して詳細に説明する。 Next, audio conversion processing relating to image rotation corresponding to the current frame in the embodiment of the present invention will be described in detail with reference to the drawings.

FIG. 22 is a diagram illustrating the relationship between the imaging apparatus 500 and the subjects. FIG. 22A shows the state at the start of imaging: the imaging apparatus 500, which includes a right microphone 501 and a left microphone 502, is capturing a person 531 who is speaking and an alarm clock 532 whose bell is ringing. In this state, the voice of the person 531 accounts for a relatively large share of the input to the right microphone 501, and the sound of the alarm clock 532 accounts for a relatively large share of the input to the left microphone 502. FIG. 22B shows a captured image 551 captured in the state of FIG. 22A. The volume indication 543 is the volume of the right-channel input sound acquired by the right microphone 501, and the volume indication 544 is the volume of the left-channel input sound acquired by the left microphone 502. The R mark 541 and the L mark 542 in the captured image 551 are shown as markers for grasping the positions of the right microphone 501 and the left microphone 502 on the captured image. FIG. 22C shows the state of FIG. 22A as viewed from the back of the imaging apparatus 500. Here, a moving image is captured while the imaging apparatus 500 is rotated 180 degrees in the clockwise direction 545. As the rotation angle increases, the share of the voice of the person 531 in the right-channel input sound acquired by the right microphone 501 gradually decreases, while the share of the bell of the alarm clock 532 gradually increases. Conversely, in the left-channel input sound acquired by the left microphone 502, the share of the bell of the alarm clock 532 gradually decreases and the share of the voice of the person 531 gradually increases. An example of reproducing the moving image captured in this way is described with reference to the next figure.

図２３は、本発明の実施の形態における画像処理装置１００による再生例の概要を示す図である。図２３（ａ）には、図２２に示す撮像装置５００で撮像された撮像動画を通常の再生方法で表示した一連の表示画像５５１乃至５５５が示されている。図２３（ｂ）には、本発明の実施の形態における画像処理装置１００により再生した一連の表示画像５６１乃至５６５の一例が示されており、この再生例は、合成画像を固定して画像変換情報供給部１３０から供給されたアフィン変換パラメータを用いて現フレームに対応する画像を変換する例である。なお、ここでは、簡略化のため表示画面の枠を省略して示しており、また、撮像画像は、表示画面の中心に表示されるものとする。 FIG. 23 is a diagram showing an outline of a reproduction example by the image processing apparatus 100 according to the embodiment of the present invention. FIG. 23A shows a series of display images 551 to 555 in which a captured moving image captured by the image capturing apparatus 500 illustrated in FIG. 22 is displayed by a normal reproduction method. FIG. 23B shows an example of a series of display images 561 to 565 reproduced by the image processing apparatus 100 according to the embodiment of the present invention. In this reproduction example, the composite image is fixed and image conversion is performed. This is an example of converting an image corresponding to the current frame using the affine transformation parameters supplied from the information supply unit 130. Here, the frame of the display screen is omitted for simplification, and the captured image is displayed at the center of the display screen.

In FIG. 23(a), the positional relationship between the R mark 541 and the L mark 542 shown on the display images 551 to 555 does not change. Therefore, even if the input sound of the display images 551 to 555 is output unchanged as the output sound, the sound matches the display images 551 to 555.

In FIG. 23(b), on the other hand, the positional relationship between the R mark 541 and the L mark 542 shown on the display images 561 to 565 changes, so a natural acoustic effect cannot be obtained if the acquired input sound is output unchanged. Therefore, in the embodiment of the present invention, the image processing apparatus 100 adjusts the ratio at which the right-channel and left-channel input sounds are added according to the angle of the display image on the display screen, and outputs the result for each output channel. Specifically, as the right-channel output sound, the apparatus outputs sound in which, according to the angle of the display images 561 to 565, the volume of the right-channel input sound is attenuated and the left-channel input sound is gradually added. As the left-channel output sound, the apparatus outputs the sum of the portion of the right-channel input sound that was attenuated in the right-channel output sound and the remainder of the left-channel input sound that was not added to the right-channel output sound.

FIG. 24 shows graphs illustrating the relationship between the angle of the image corresponding to the current frame and the output sound in the embodiment of the present invention. In FIGS. 24(a) and 24(b), the horizontal axis indicates the angle (θ) with respect to the horizontal direction, and the vertical axis indicates the ratio (Rate) of the output sound to the input sound. The solid lines 711 and 721 indicate the output ratio for the right-channel output sound, and the broken lines 712 and 722 indicate the output ratio for the left-channel output sound. FIG. 24(a) shows the ratios at which the right-channel input sound is distributed to the output sound of each channel according to the angle θ. FIG. 24(b) shows the ratios at which the left-channel input sound is distributed to the output sound of each channel according to the angle θ. Finally, for the right-channel output sound, the input sounds of the channels are added at the ratios determined by the solid lines 711 and 721 and output. Likewise, for the left-channel output sound, the input sounds of the channels are added at the ratios determined by the broken lines 712 and 722 and output.

For the solid line 711 shown here, the relationship between the angle θ of the image corresponding to the current frame and the ratio g(θ) of the output sound to the input sound can be expressed by the following equation.
$$g(\theta) = \frac{1 + \cos\theta}{2}$$

Using the above equation, the relationships for the broken line 712, the solid line 721, and the broken line 722 are expressed as 1 − g(θ), 1 − g(θ), and g(θ), respectively. The coefficients RR, RL, LR, and LL in Equation 2, which correspond to the audio conversion information, are therefore RR = g(θ), RL = 1 − g(θ), LR = 1 − g(θ), and LL = g(θ).
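The rotation-dependent coefficients can be sketched in a few lines of Python. This is only an illustration of the relations stated above: the names `g`, `RotationGains`, and `rotation_gains` are hypothetical, and the sketch assumes that θ is given in radians.

```python
import math
from typing import NamedTuple


class RotationGains(NamedTuple):
    """Coefficients RR, RL, LR, LL for a rotated image (names are illustrative)."""
    rr: float
    rl: float
    lr: float
    ll: float


def g(theta: float) -> float:
    """Ratio g(theta) = (1 + cos(theta)) / 2, with theta assumed to be in radians."""
    return (1.0 + math.cos(theta)) / 2.0


def rotation_gains(theta: float) -> RotationGains:
    """RR = LL = g(theta) and RL = LR = 1 - g(theta), as stated in the text."""
    same = g(theta)
    cross = 1.0 - same
    return RotationGains(rr=same, rl=cross, lr=cross, ll=same)
```

With θ = 0 the coefficients leave both channels unchanged, and with θ = π (the 180-degree rotation of FIG. 22) the two channels are exchanged, which matches the behaviour described for FIG. 23(b).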

Although it is assumed here, as an example, that the speakers 220 are installed on the left and right sides of the display screen, the same approach may also be applied to a speaker system installed above and below the display screen, as in FIG. 21.

Next, audio conversion processing related to the magnification of the image corresponding to the current frame in the embodiment of the present invention is described in detail with reference to the drawings.

FIG. 25 is a diagram showing an outline of a reproduction example by the image processing apparatus 100 according to the embodiment of the present invention. FIGS. 25(a) and 25(c) show the imaging apparatus 500, which has the right microphone 501 and the left microphone 502, capturing the person 531 and the alarm clock 532. FIGS. 25(b), 25(d), and 25(e) show display examples of captured images, which are assumed here to be displayed in a partial region at the center of the display screen of the display unit 180. FIG. 25(b) shows the captured image 551 captured by the imaging apparatus 500 shown in FIG. 25(a). The volume display 543 indicates the volume of the right-channel input sound acquired by the right microphone 501, and the volume display 544 indicates the volume of the left-channel input sound acquired by the left microphone 502. FIG. 25(c) shows a state in which the imaging apparatus 500 has zoomed in on the subjects from the imaging state shown in FIG. 25(a).

FIGS. 25(d) and 25(e) show reproduction examples by the image processing apparatus 100 according to the embodiment of the present invention. FIG. 25(d) is a reproduction example in which the size of the image corresponding to the current frame is fixed and the composite image is transformed using the affine transformation parameters supplied from the image conversion information supply unit 130. In this case, the zoom-in operation of the imaging apparatus 500 causes the person 531 and the alarm clock 532 in the captured image 571 to be displayed enlarged. Therefore, in the embodiment of the present invention, the image processing apparatus 100 adjusts the volumes of the left-channel and right-channel input sounds by the same ratio according to the magnification of the image corresponding to the current frame, and outputs the result for each output channel. Specifically, according to the enlargement ratio of the subjects in the captured image 571 relative to the captured image 551, the volumes 543 and 544 of the input sound of the respective channels are amplified by the same ratio (the volume displays 546 and 547 are added, respectively) and output.

FIG. 25(e), on the other hand, is a reproduction example in which the composite image is fixed and the image corresponding to the current frame is transformed using the affine transformation parameters supplied from the image conversion information supply unit 130. In this case, the size of the subjects of the captured image 571 on the display screen is the same as the size of the subjects shown in FIG. 25(b). Therefore, in the embodiment of the present invention, the image processing apparatus 100 outputs the input sound without changing its volume ratio. Specifically, in this display mode, the image conversion unit 140 outputs to the audio conversion information calculation unit 190 only the center position and angle of the image corresponding to the current frame, not its magnification.

FIG. 26 shows graphs illustrating the relationship between the magnification of the image corresponding to the current frame and the output sound in the embodiment of the present invention. In FIGS. 26(a) and 26(b), the horizontal axis indicates the magnification (z) of the image, and the vertical axis indicates the ratio (Rate) of the output sound to the input sound. FIG. 26(a) shows the ratio of the right-channel output sound to the right-channel input sound according to the magnification z. FIG. 26(b) shows the ratio of the left-channel output sound to the left-channel input sound according to the magnification z.

For the solid line 713 shown here, the relationship between the magnification z and the ratio h(z) of the output sound to the input sound can be expressed by the following equation.
$$h(z) = \begin{cases} 1 - \beta & (0 < z \le z_1) \\ \dfrac{2\beta}{z_2 - z_1}\,(z - z_1) + 1 - \beta & (z_1 \le z < z_2) \\ 1 + \beta & (z_2 \le z) \end{cases}$$

Here, z is the magnification of the image corresponding to the current frame. The parameter β is preferably set to, for example, 0.1 to 0.2 so that the magnification does not affect the sound too strongly. The values z1 and z2 are determined as appropriate in consideration of the value of β.
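As a minimal sketch, the piecewise-linear ratio h(z) can be written as follows. The function name `h_piecewise` and the default values of `z1` and `z2` are hypothetical, since the text only says that z1 and z2 are chosen as appropriate; the default β = 0.15 is simply a value inside the suggested range of 0.1 to 0.2.

```python
def h_piecewise(z: float, beta: float = 0.15, z1: float = 0.5, z2: float = 2.0) -> float:
    """Piecewise-linear volume ratio for magnification z.

    z1 and z2 are illustrative defaults, not values taken from the text.
    """
    if z <= 0.0:
        raise ValueError("magnification z must be positive")
    if z1 >= z2:
        raise ValueError("z1 must be smaller than z2")
    if z <= z1:
        # Strongly reduced image: constant lower ratio.
        return 1.0 - beta
    if z < z2:
        # Linear ramp from 1 - beta at z1 to 1 + beta at z2.
        return (2.0 * beta / (z2 - z1)) * (z - z1) + 1.0 - beta
    # Strongly enlarged image: constant upper ratio.
    return 1.0 + beta
```

The ramp is continuous at both break points, so the output volume does not jump when z crosses z1 or z2.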

Here, RR and LL in Equation 2, which correspond to the audio conversion information, are expressed as RR = LL = h(z). FIG. 25 was described for the case in which the input sound of the other channel is not added to the output sound of each channel. When such addition is performed, the volumes of the added input sounds are amplified by the same ratio, so RL and LR are also expressed as h(z), in the same way as RR and LL. Alternatively, h(z) may be, for example, a sigmoid function such as the following, which is described as having 1 + β and 1 − β as asymptotes.
$$h(z) = \left(\frac{1}{1 + e^{-(z-1)}} - 0.5\right)\cdot\beta + 1$$
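The sigmoid alternative can be evaluated directly. The sketch below follows the expression exactly as printed; the name `h_sigmoid` and the default β are assumptions. Evaluating the printed form gives h(1) = 1 and a limit of 1 + β/2 for large z, so reaching the stated limits of 1 ± β would appear to require a factor of 2β instead of β. The sketch does not make that change.

```python
import math


def h_sigmoid(z: float, beta: float = 0.15) -> float:
    """Sigmoid volume ratio, following the printed expression term by term."""
    if z <= 0.0:
        raise ValueError("magnification z must be positive")
    logistic = 1.0 / (1.0 + math.exp(-(z - 1.0)))
    return (logistic - 0.5) * beta + 1.0
```

Compared with the piecewise form, this variant changes smoothly around z = 1 and needs no break points z1 and z2.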

The above describes cases in which the center position, angle, and magnification of the image corresponding to the current frame are changed one at a time. Cases in which these changes are combined can be expressed in the same way by multiplying the respective relational expressions. Specifically, RR, RL, LR, and LL in Equation 2, which correspond to the audio conversion information, are expressed as RR = f(x)·g(θ)·h(z), RL = (1 − f(x))·(1 − g(θ))·h(z), LR = (1 − f(−x))·(1 − g(θ))·h(z), and LL = f(−x)·g(θ)·h(z). Although the right-channel and left-channel input sounds were described here as an example, the same approach may also be applied to input sound that additionally includes a center channel.
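The combined coefficients can be sketched as a single function. Because f(x) is defined outside this passage, the sketch takes `f` and `h` as caller-supplied functions instead of guessing their form; `CombinedGains` and `combined_gains` are hypothetical names, and θ is assumed to be in radians.

```python
import math
from typing import Callable, NamedTuple


class CombinedGains(NamedTuple):
    """Coefficients RR, RL, LR, LL for combined movement, rotation and zoom."""
    rr: float
    rl: float
    lr: float
    ll: float


def combined_gains(
    f: Callable[[float], float],
    h: Callable[[float], float],
    x: float,
    theta: float,
    z: float,
) -> CombinedGains:
    """Multiply the position, angle and magnification factors as in the text."""
    g_theta = (1.0 + math.cos(theta)) / 2.0
    h_z = h(z)
    return CombinedGains(
        rr=f(x) * g_theta * h_z,
        rl=(1.0 - f(x)) * (1.0 - g_theta) * h_z,
        lr=(1.0 - f(-x)) * (1.0 - g_theta) * h_z,
        ll=f(-x) * g_theta * h_z,
    )
```

Passing the functions in keeps the sketch independent of any particular choice of f(x) or of the piecewise or sigmoid form of h(z).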

FIG. 27 is a flowchart showing the processing procedure of the moving image reproduction processing by the image processing apparatus 100 according to the embodiment of the present invention.

First, a work buffer larger than the size of the images constituting the moving image is secured in the image memory 160 (step S921). Next, a content file is acquired from the content storage unit 110 (step S922). The content file is then decoded to obtain the image and sound corresponding to the current frame (step S923). Next, the image conversion information supply unit 130 supplies the affine transformation parameters corresponding to the current frame to the image conversion unit 140 (step S924). If the current frame is the first frame, the affine transformation parameters of a unit matrix are supplied. The image conversion unit 140 then determines which of the three reproduction display modes is selected (step S925).

If the selected mode is the one in which the moving image is reproduced and displayed with the composite image (formed from the images corresponding to the frames preceding the current frame) held fixed, the image conversion unit 140 outputs the center position and angle of the image corresponding to the current frame, but not its magnification, to the audio conversion information calculation unit 190 (step S926). Next, the image conversion unit 140 affine-transforms the image corresponding to the current frame using the affine transformation parameters obtained by multiplication (step S927). If the current frame is the first frame, the affine transformation uses the parameters of a unit matrix, so the image is not actually transformed. Next, the affine-transformed image corresponding to the current frame is overwritten onto the image held in the image memory 160, and the resulting composite image is stored in the image memory 160 (step S928). If the current frame is the first frame, the image corresponding to the first frame is stored in the image memory 160.

The composite image is then displayed on the display unit 180 (step S938). Next, the audio conversion processing is executed (step S950). This audio conversion processing is described in detail with reference to the next figure. It is then determined whether the current frame is the last of the frames constituting the acquired moving image (step S939). If the current frame is not the last frame (step S939), the process returns to step S923 and the composite image display processing is repeated.

If, on the other hand, it is determined in step S925 that the reproduction display mode in which the moving image is reproduced with the image corresponding to the current frame held fixed is selected, the image conversion unit 140 outputs only the magnification of the image corresponding to the current frame to the audio conversion information calculation unit 190 (step S929). Next, the image conversion unit 140 affine-transforms the composite image stored in the image memory 160 in the direction opposite to that of the affine transformation parameters, using the affine transformation parameters obtained by multiplication (step S931). If the current frame is the first frame, no composite image is stored in the image memory 160, so no image is transformed. Next, the image corresponding to the current frame is overwritten onto the composite image that has been affine-transformed in the opposite direction, and the resulting composite image is stored in the image memory 160 (step S932). If the current frame is the first frame, the image corresponding to the first frame is stored in the image memory 160. The process then proceeds to step S938.

If it is determined in step S925 that the reproduction display mode in which the moving image is reproduced with the display magnification of the image corresponding to the current frame held fixed is selected, the image conversion unit 140 outputs the center position, angle, and magnification of the image corresponding to the current frame to the audio conversion information calculation unit 190 (step S933). The elements relating to magnification are separated from the elements of the affine transformation parameters supplied by the image conversion information supply unit 130 (step S934). Next, using the separated elements relating to magnification, the composite image stored in the image memory 160 is affine-transformed in the direction opposite to that of the affine transformation parameters (step S935). If the current frame is the first frame, no composite image is stored in the image memory 160, so no image is transformed. Next, using the separated elements relating to movement or rotation, the image corresponding to the current frame is affine-transformed (step S936). If the current frame is the first frame, the affine transformation uses the parameters of a unit matrix, so the image is not actually transformed. Next, the affine-transformed image corresponding to the current frame is overwritten onto the composite image that has been affine-transformed in the opposite direction, and the resulting composite image is stored in the image memory 160 (step S937). The process then proceeds to step S938.

If it is determined in step S939 that the current frame is the last frame, the work buffer secured in the image memory 160 is released (step S941), and the moving image reproduction processing ends.
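The branch taken in step S925 determines which image attributes reach the audio conversion information calculation unit 190 (steps S926, S929, and S933). The following sketch summarizes only that selection; the enum, the tuple type, and the function name are all hypothetical, and `None` is used here to mean "not output in this mode".

```python
from enum import Enum, auto
from typing import NamedTuple, Optional, Tuple


class PlaybackMode(Enum):
    FIX_COMPOSITE = auto()      # composite image held fixed (steps S926 to S928)
    FIX_CURRENT_FRAME = auto()  # current-frame image held fixed (steps S929 to S932)
    FIX_MAGNIFICATION = auto()  # display magnification held fixed (steps S933 to S937)


class AudioInputs(NamedTuple):
    center: Optional[Tuple[float, float]]
    angle: Optional[float]
    magnification: Optional[float]


def select_audio_inputs(
    mode: PlaybackMode,
    center: Tuple[float, float],
    angle: float,
    magnification: float,
) -> AudioInputs:
    """Return the attributes passed on for audio conversion in each mode."""
    if mode is PlaybackMode.FIX_COMPOSITE:
        # Step S926: center position and angle, but not magnification.
        return AudioInputs(center, angle, None)
    if mode is PlaybackMode.FIX_CURRENT_FRAME:
        # Step S929: magnification only.
        return AudioInputs(None, None, magnification)
    if mode is PlaybackMode.FIX_MAGNIFICATION:
        # Step S933: center position, angle and magnification.
        return AudioInputs(center, angle, magnification)
    raise ValueError(f"unknown playback mode: {mode!r}")
```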

FIG. 28 is a flowchart showing an example of the processing procedure of the audio conversion processing (the processing procedure of step S950) by the image processing apparatus 100 according to the embodiment of the present invention.

First, the audio conversion information calculation unit 190 calculates the audio conversion information based on the center position, angle, or magnification of the image corresponding to the current frame output by the image conversion unit 140 (step S951). Next, based on the audio conversion information calculated by the audio conversion information calculation unit 190, the volume adjustment unit 201 adjusts the volume of each of the channels constituting the sound output from the content acquisition unit 120 (step S952). The sound addition unit 202 then adds the adjusted sounds for each channel and outputs the sums as the output sound of the respective channels (step S953). Finally, the summed output sound of each channel is output to the speakers 220 (step S954).
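Steps S952 and S953 amount to scaling each input channel once per destination channel and then summing per output channel. The sketch below shows this for a block of stereo samples. It assumes, from the description of FIG. 24, that RL is the share of the right-channel input sent to the left-channel output and LR the share of the left-channel input sent to the right-channel output; Equation 2 itself is not reproduced in this passage, and all names are hypothetical.

```python
from typing import List, NamedTuple, Sequence, Tuple


class Gains(NamedTuple):
    rr: float  # right input -> right output
    rl: float  # right input -> left output (assumed reading of RL)
    lr: float  # left input -> right output (assumed reading of LR)
    ll: float  # left input -> left output


def convert_stereo(
    right_in: Sequence[float],
    left_in: Sequence[float],
    gains: Gains,
) -> Tuple[List[float], List[float]]:
    """Adjust volumes (S952) and add per output channel (S953)."""
    if len(right_in) != len(left_in):
        raise ValueError("both channels must have the same number of samples")
    right_out: List[float] = []
    left_out: List[float] = []
    for r, l in zip(right_in, left_in):
        # S952: scale each input sample for each destination channel.
        r_to_r, r_to_l = gains.rr * r, gains.rl * r
        l_to_r, l_to_l = gains.lr * l, gains.ll * l
        # S953: add the scaled samples that feed the same output channel.
        right_out.append(r_to_r + l_to_r)
        left_out.append(r_to_l + l_to_l)
    return right_out, left_out
```

With gains (1, 0, 0, 1) the block passes through unchanged, and with gains (0, 1, 1, 0) the two channels are exchanged.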

Next, a first modification of the embodiment of the present invention is described with reference to the drawings.

FIG. 29 is a block diagram illustrating a functional configuration example of the image processing apparatus 650 according to the embodiment of the present invention. The image processing apparatus 650 is a partial modification of the image processing apparatus 100 shown in FIG. 1: it is provided with a moving image storage unit 240, a metadata storage unit 250, and a content acquisition unit 121 in place of the content storage unit 110, the content acquisition unit 120, and the image conversion information supply unit 130. The configuration other than the moving image storage unit 240, the metadata storage unit 250, and the content acquisition unit 121 is the same as that of the image processing apparatus 100 shown in FIG. 1, so its description is omitted.

The moving image storage unit 240 stores moving images as moving image files. The moving image storage unit 240 also supplies a moving image file to the content acquisition unit 120 in response to a request from the content acquisition unit 120. The moving image files stored in the moving image storage unit 240 are described in detail with reference to FIG. 30.

The metadata storage unit 250 stores, as metadata files, the affine transformation parameters calculated from motion information obtained by analyzing the moving images. The metadata storage unit 250 also supplies a metadata file to the content acquisition unit 120 in response to a request from the content acquisition unit 120. The metadata files stored in the metadata storage unit 250 are described in detail with reference to FIG. 30.

In response to an operation input related to moving image reproduction from the operation reception unit 230, the content acquisition unit 121 acquires a moving image file stored in the moving image storage unit 240 and the metadata file stored in the metadata storage unit 250 in association with that moving image file. The content acquisition unit 121 outputs the moving image of the acquired moving image file and the affine transformation parameters of the metadata file to the image conversion unit 140. The content acquisition unit 121 also outputs the sound corresponding to the moving image of the acquired moving image file to the audio conversion processing unit 200.

FIG. 30 is a diagram schematically showing the files recorded in the moving image storage unit 240 and the metadata storage unit 250 in the embodiment of the present invention. FIG. 30(a) shows moving image files 241 to 244 stored in the moving image storage unit 240 and metadata files 251 to 253 stored in the metadata storage unit 250 in association with the moving image files 241 to 244. Each moving image file is assumed to be assigned a moving image ID, which is identification information for identifying the moving image files stored in the moving image storage unit 240. For example, "#1" is assigned to the moving image file 241, "#2" to the moving image file 242, and "#n" to the moving image file 244.

FIG. 30(b) schematically shows the moving image file 241 stored in the moving image storage unit 240 and the metadata file 251 stored in the metadata storage unit 250 in association with the moving image file 241. The moving image file 241 is a file of a moving image composed of n frames, which are shown as frames 1 (245) to n (248).

The metadata file 251 stores a moving image ID 254, a frame number 255, and affine transformation parameters 256 in association with one another.

The moving image ID 254 is the moving image ID assigned to the corresponding moving image file. For example, "#1", which is assigned to the moving image file 241, is stored.

The frame number 255 is the serial number of each frame constituting the moving image of the corresponding moving image file. For example, "1" to "n", corresponding to the frames 1 (245) to n (248) constituting the moving image of the moving image file 241, are stored.

The affine transformation parameters 256 are the affine transformation parameters calculated for each frame of the moving image corresponding to the frame number 255. The affine transformation parameters 256 "a1, b1, c1, d1, e1, f1" corresponding to the frame number 255 "1" are the affine transformation parameters of a unit matrix. The affine transformation parameters 256 "am, bm, cm, dm, em, fm" corresponding to the frame number 255 "m" (where m is an integer of 2 or more) are the affine transformation parameters of the frame "m" with respect to the immediately preceding frame "m − 1".
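Because each stored parameter set is relative to the preceding frame, a transform relative to the first frame is obtained by multiplying the per-frame transforms in sequence. The sketch below models one metadata entry and that accumulation. It assumes the six parameters fill the matrix [[a, b, c], [d, e, f], [0, 0, 1]] and that each new transform is multiplied on the right; both the layout and the order are assumptions here, since the defining equation lies outside this passage, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Tuple

Affine = Tuple[float, float, float, float, float, float]  # (a, b, c, d, e, f)
IDENTITY: Affine = (1.0, 0.0, 0.0, 0.0, 1.0, 0.0)


@dataclass(frozen=True)
class MetadataRecord:
    video_id: str      # moving image ID 254, e.g. "#1"
    frame_number: int  # frame number 255, starting at 1
    params: Affine     # affine transformation parameters 256


def compose(p: Affine, q: Affine) -> Affine:
    """Matrix product P * Q for the assumed 3x3 layout."""
    a1, b1, c1, d1, e1, f1 = p
    a2, b2, c2, d2, e2, f2 = q
    return (
        a1 * a2 + b1 * d2, a1 * b2 + b1 * e2, a1 * c2 + b1 * f2 + c1,
        d1 * a2 + e1 * d2, d1 * b2 + e1 * e2, d1 * c2 + e1 * f2 + f1,
    )


def cumulative_params(records: List[MetadataRecord]) -> List[Affine]:
    """Accumulate per-frame transforms into transforms relative to frame 1."""
    accumulated = IDENTITY
    result: List[Affine] = []
    for record in sorted(records, key=lambda r: r.frame_number):
        accumulated = compose(accumulated, record.params)
        result.append(accumulated)
    return result
```

Storing the unit matrix for frame 1 means the accumulation needs no special case for the first frame.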

The above describes two cases, selected according to whether the moving image is reproduced with the image corresponding to the current frame fixed at the center of the display unit 180: creating the composite image by applying the affine transformation to the image corresponding to the current frame, and creating the composite image by applying the affine transformation, in the direction opposite to that of the affine transformation parameters, to the composite image corresponding to the preceding frames. However, it is also possible to create the composite image by sequentially applying the affine transformation to the current image corresponding to the current frame, store it sequentially in the image memory, and extract and display from the composite image in the image memory a display area, that is, the area to be displayed. This makes it possible to switch the display mode of the display unit during reproduction of the moving image. These moving image reproduction methods are described in detail below with reference to the drawings.

Next, a second modification of the embodiment of the present invention is described with reference to the drawings.

FIG. 31 is a block diagram illustrating a functional configuration example of the image processing apparatus 680 according to the embodiment of the present invention. The image processing apparatus 680 is a partial modification of the image processing apparatus 650 shown in FIG. 29. In addition to the functional configuration of the image processing apparatus 650 shown in FIG. 29, the image processing apparatus 680 includes a display area extraction unit 260 and a display memory 270, and is provided with an image composition unit 151, an image memory 161, and an audio conversion information calculation unit 191 in place of the image composition unit 150, the image memory 160, and the audio conversion information calculation unit 190. The image processing apparatus 680 can keep the image corresponding to the current frame within the display screen and performs audio conversion processing that matches this processing. The configurations of the moving image storage unit 240, the metadata storage unit 250, the image conversion unit 140, the audio conversion processing unit 200, the audio output control unit 210, and the speakers 220 are the same as those of the image processing apparatus shown in FIG. 29, so their description is omitted. Although this example describes a partial modification of the image processing apparatus 650 shown in FIG. 29, the same modification can also be applied to the image processing apparatus 100 shown in FIG. 1.

The image composition unit 151 combines images by overwriting the image corresponding to the current frame received from the image conversion unit 140 onto the composite image held in the display memory 270, based on the position of the image corresponding to the current frame in the display area output from the display area extraction unit 260. Specifically, when the display mode that fixes the image corresponding to the current frame is designated, the image composition unit 151 overwrites the image corresponding to the current frame, before affine transformation by the image conversion unit 140, onto the center of the composite image held in the display memory 270. When the display mode that fixes the composite image preceding the image corresponding to the current frame is designated, the image composition unit 151 overwrites the image corresponding to the current frame, after affine transformation by the image conversion unit 140, onto the composite image held in the display memory 270, based on the position of the image corresponding to the current frame in the display area output from the display area extraction unit 260. The size of the image corresponding to the current frame that is combined in the display memory 270 is determined according to the value of the display magnification. The image composition unit 151 also has the functions of the image composition unit 150, which are the same as described above and are therefore not described again here.

The image memory 161 is a work buffer that holds the composite image composited by the image composition unit 151. It supplies the held composite image to the image conversion unit 140 or the display area extraction unit 260.

The display area extraction unit 260 extracts, from the composite image held in the image memory 161, the image that lies within the display area, that is, the area to be displayed. The display area extraction unit 260 stores the extracted image in the display memory 270. When at least a part of the image corresponding to the current frame in the composite image held in the image memory 161 protrudes from the display area, the display area extraction unit 260 first moves the display area so that the whole of the image corresponding to the current frame falls within it, and then extracts the image lying within the display area from the composite image held in the image memory 161. Further, when the display mode that fixes the composite image preceding the current frame is designated, the display area extraction unit 260 calculates the position of the image corresponding to the current frame in the display area and outputs this position to the image composition unit 151. The display area extraction unit 260 also calculates the affine transformation parameters of the current display area relative to the area of the image memory 161 and outputs them to the sound conversion information calculation unit 191. The extraction of the image contained in the display area is described in detail with reference to FIGS. 32 to 38 and other figures, and the movement of the display area is described in detail with reference to FIGS. 33, 34, and other figures. The calculation of the position of the image corresponding to the current frame in the display area is described in detail with reference to FIG. 37. The calculation of the affine transformation parameters of the current display area is described with reference to FIGS. 32 and 35.

The display memory 270 is a display buffer that holds the image extracted from the image memory 161 by the display area extraction unit 260. The image held in the display memory 270 is displayed on the display unit 180.

The display control unit 171 causes the display unit 180 to display, frame by frame, the composite image held in the display memory 270.

The display unit 180 displays the composite image held in the display memory 270 under the control of the display control unit 171. It can be realized, for example, by the display of a personal computer or a television.

The sound conversion information calculation unit 191 calculates sound conversion information based on the center position, angle, or magnification of the image corresponding to the current frame in the display area. Specifically, the sound conversion information calculation unit 191 calculates the center position, angle, and magnification of the image corresponding to the current frame in the display area by using the center position, angle, or magnification of the image corresponding to the current frame in the image memory 161, which is output from the image conversion unit 140, together with the inverse matrix of the affine transformation parameters of the current display area, which are output from the display area extraction unit 260. The sound conversion information calculation unit 191 also has the functions of the sound conversion information calculation unit 190. These functions are the same as those described above and are not described again here. Alternatively, the sound conversion information calculation unit 191 may receive the center position, angle, or magnification of the image corresponding to the current frame in the display area directly from the display area extraction unit 260.
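As a minimal sketch of the quantities the sound conversion information calculation unit 191 works with, the following Python function reads a center position, angle, and magnification out of a single affine matrix. The function name, the 3x3 homogeneous layout, the column-vector convention, and the assumption that the matrix contains only rotation, uniform scaling, and translation are illustrative choices and are not taken from the specification.

```python
import math


def describe_in_display_area(m, image_size):
    """Return (center, angle_in_radians, magnification) of the current image.

    m is a hypothetical 3x3 homogeneous matrix that maps current-image pixel
    coordinates to display-area coordinates (column-vector convention).
    The decomposition assumes rotation + uniform scale + translation only.
    """
    width, height = image_size
    cx, cy = width / 2.0, height / 2.0
    # Map the image center into display-area coordinates.
    center = (m[0][0] * cx + m[0][1] * cy + m[0][2],
              m[1][0] * cx + m[1][1] * cy + m[1][2])
    # The first column of the linear part is the transformed x axis.
    angle = math.atan2(m[1][0], m[0][0])
    magnification = math.hypot(m[0][0], m[1][0])
    return center, angle, magnification
```

For a matrix that halves the image and shifts it by (100, 50), a 640 x 480 image has its center at (260, 170) in the display area, an angle of 0, and a magnification of 0.5.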

The operation reception unit 231 includes various operation keys and, on accepting an operation input made with these keys, outputs the content of the accepted operation input to the display area extraction unit 260. The operation reception unit 231 is provided with, for example, a playback instruction key for instructing playback of a moving image, a display magnification designation key for designating the display magnification of the moving image, and a setting key for setting the display mode used when the moving image is played back. The display modes include, for example, a mode in which the image corresponding to the current frame is displayed while being affine-transformed, with the composite image corresponding to the frames preceding the current frame held fixed, and a mode in which the composite image is displayed while being affine-transformed in the direction opposite to that of the affine transformation parameters, with the image corresponding to the current frame held fixed. These display modes can be switched even while the moving image is being played back. The operation reception unit 231 also has the functions of the operation reception unit 230. These functions are the same as those described above and are not described again here.

FIG. 32 is a diagram schematically showing the relationship between the frames of a moving image stored in the moving image storage unit 240 and the display area in the embodiment of the present invention. Only the image memory 161, the metadata storage unit 250, and the operation reception unit 231 are illustrated here, and the other components are omitted. The description takes as an example the case where a composite image is created in the image memory 161 for frames "1" to "3" of the moving image file 241 shown in FIG. 30(b), using the affine transformation parameters 256 stored in the metadata file 251. FIG. 32 shows the case where the composite image corresponding to the frames preceding the current frame is held fixed on the display unit 180.

FIG. 32(a) shows the case where frame 1 (245), the first of the frames constituting the moving image file 241 shown in FIG. 30(b), is stored in the image memory 161. For example, when the operation reception unit 231 accepts an operation input that instructs playback of the moving image file 241 stored in the moving image storage unit 240 with the composite image corresponding to the frames preceding the current frame held fixed, an image 351 corresponding to frame 1 (245) of the moving image file 241 is stored in the image memory 161, as shown in FIG. 32(a). The image 351 corresponding to the first frame may be stored at a position in the image memory 161 that is designated in advance, or at a position designated by the user through the operation reception unit 231. Alternatively, the size of the composite image for frames "1" to "n" may be calculated using the affine transformation parameters 256 for the moving image file 241 stored in the metadata file 251, and the position at which the image 351 is stored may be determined based on this calculation. In this example, the upper left position of the image 351 placed in the image memory 161 is taken as the origin, the horizontal direction as the x axis, and the vertical direction as the y axis.

As shown in FIG. 32(a), the display area when the image 351 is placed in the image memory 161 is the display area 361. The display area 361 is determined, for example, from the position and size at which the image 351 is stored, according to the value of the display magnification accepted by the operation reception unit 231. For example, when a display magnification of "0.5 times", which zooms out the image corresponding to the current frame, is designated, the display area 361 is twice the size of the image 351 and centered on the image 351. The position of the display area 361 relative to the image 351 can be determined by affine transformation parameters. That is, when a display magnification of "0.5 times" is designated, the display area is set using affine transformation parameters whose zoom components in the x and y directions are doubled. When the display area is translated or rotated relative to the image corresponding to the current frame, the position and range of the display area can likewise be determined using affine transformation parameters.
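The relationship between the display magnification and the display area can be sketched as follows. This is an illustrative Python fragment, not the implementation of the apparatus: the function names, the (x, y, width, height) rectangle layout, and the column-vector matrix convention are assumptions introduced here.

```python
def zoom_about_center(image_rect, magnification):
    """Build a 3x3 affine matrix that scales about the center of image_rect.

    image_rect is (x, y, width, height) in image-memory coordinates.
    A display magnification of 0.5 yields zoom components of 2.
    """
    x, y, w, h = image_rect
    zoom = 1.0 / magnification
    cx, cy = x + w / 2.0, y + h / 2.0
    # Scale by `zoom`, then translate so the center stays where it was.
    return [[zoom, 0.0, cx * (1.0 - zoom)],
            [0.0, zoom, cy * (1.0 - zoom)],
            [0.0, 0.0, 1.0]]


def apply_affine(m, point):
    """Apply a 3x3 affine matrix to an (x, y) point."""
    px, py = point
    return (m[0][0] * px + m[0][1] * py + m[0][2],
            m[1][0] * px + m[1][1] * py + m[1][2])


def display_area_corners(image_rect, magnification):
    """Return the four corners of the display area, clockwise from top left."""
    x, y, w, h = image_rect
    m = zoom_about_center(image_rect, magnification)
    corners = [(x, y), (x + w, y), (x + w, y + h), (x, y + h)]
    return [apply_affine(m, c) for c in corners]
```

With a 640 x 480 image stored at the origin and a magnification of 0.5, the display area spans from (-320, -240) to (960, 720), which is twice the image size and centered on the image.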

FIG. 32(b) shows the case where frame 2 (246) of the frames constituting the moving image file 241 shown in FIG. 30(b) is stored in the image memory 161. In this case, as described above, an image 352 corresponding to frame 2 (246) is transformed using the affine transformation parameters 256 stored in the metadata file 251 in association with "1" and "2" of the frame number 255, and is overwritten onto the image 351. If the image 352 corresponding to the current frame does not protrude from the display area 361, the position and size of the display area 361 are not changed. The case where the image corresponding to the current frame protrudes from the current display area is described in detail with reference to FIGS. 33 and 34. The display area 361 may also be translated or otherwise moved in accordance with the movement of the image 352 relative to the image 351.

FIG. 32(c) shows the case where frame 3 of the frames constituting the moving image file 241 shown in FIG. 30(b) is stored in the image memory 161. In this case as well, as described above, an image 353 corresponding to frame 3 is transformed using the affine transformation parameters 256 stored in the metadata file 251 in association with "1" to "3" of the frame number 255, and is overwritten onto the images 351 and 352.

Next, the processing performed when the display area is moved in accordance with the movement of the current image is described in detail with reference to the drawings.

FIG. 33 is a diagram schematically showing the process of moving the display area when the image corresponding to the current frame protrudes from the display area. FIG. 33(a) shows the relationship between a plurality of images held in the image memory 161, including an image 760 corresponding to the current frame, and a display area 759. As shown in FIG. 33(a), the whole of the current image 760 lies within the display area 759, so the display unit 180 displays the whole of the current image 760 together with the other images.

FIG. 33(b) shows the relationship between a plurality of images held in the image memory 161, including a current image 762, and the display area 759. The current image 762 is the image corresponding to the frame that follows the current image 760 shown in FIG. 33(a). As shown in FIG. 33(b), when a part of the current image 762 protrudes from the display area 759, that part of the current image 762 is not displayed on the display unit 180. In such a case, as shown in FIG. 33(b), the display area extraction unit 260 calculates a difference value 763 between one side of the display area 759 and the current image 762 protruding beyond it, and moves the display area 759 by the difference value 763 plus an additional value 764. The additional value 764 can be, for example, 5 pixels. The display area may also be moved by the difference value alone, without adding the additional value. Although FIG. 33(b) takes as an example the case where the current image 762 protrudes from the right side of the display area 759, the display area can be moved by the same method when the current image protrudes from the upper, lower, or left side. When the current image protrudes from two or more of the upper, lower, left, and right sides, a difference value is calculated for each of those sides, and the display area is moved in the direction of each side based on the corresponding difference value.
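The movement rule described above can be sketched in a few lines of Python. The function name, the (left, top, right, bottom) rectangle layout, and the use of axis-aligned bounding boxes are assumptions made for illustration. The sketch also assumes that the current image is no larger than the display area, so that it can protrude from at most one side per axis.

```python
def move_display_area(display, current, additional=5):
    """Shift the display area so that it contains the current image.

    display and current are (left, top, right, bottom) rectangles in
    image-memory coordinates. For each side the image protrudes from,
    the display area moves by the overhang (the difference value) plus
    `additional` pixels. Pass additional=0 to move by the difference only.
    """
    d_left, d_top, d_right, d_bottom = display
    c_left, c_top, c_right, c_bottom = current
    dx = dy = 0
    if c_right > d_right:          # protrudes from the right side
        dx = (c_right - d_right) + additional
    elif c_left < d_left:          # protrudes from the left side
        dx = -((d_left - c_left) + additional)
    if c_bottom > d_bottom:        # protrudes from the lower side
        dy = (c_bottom - d_bottom) + additional
    elif c_top < d_top:            # protrudes from the upper side
        dy = -((d_top - c_top) + additional)
    return (d_left + dx, d_top + dy, d_right + dx, d_bottom + dy)
```

For a display area (0, 0, 100, 100) and a current image (80, 10, 120, 50), the overhang on the right is 20 pixels, so the display area moves 25 pixels to the right. An image that lies entirely inside the display area leaves it unchanged.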

FIG. 33(c) shows a display area 765 that has been moved based on the difference value 763 calculated in the state shown in FIG. 33(b).

FIG. 34 is a diagram showing an example of the transitions that occur when the display area is moved by the movement process shown in FIG. 33. FIG. 34(a) shows an example of the transition of the display area in the image memory 161 when the display area is moved, and FIG. 34(b) shows an example of the transition of the image displayed on the display unit 180 when the display area is moved. As shown in the figure, even when the images following a current image 767 would protrude from a display area 766, the display area 766 can be moved successively according to the position of the current image. For example, when playback advances in the image memory 161 from the image 767 to a current image 769, the display area 766 moves to the position of a display area 768 in accordance with this movement. In this case, the image displayed on the display unit 180 changes from an image 770 to an image 771. As a result, even when the image displayed on the display unit 180 is enlarged or reduced, the whole of the current image can always be kept on the display unit 180.

Next, the case where the current image corresponding to the current frame is held fixed on the display unit 180 is described in detail with reference to the drawings.

FIG. 35 is a diagram schematically showing the relationship between the frames of a moving image file stored in the moving image storage unit 240 and the display area in the embodiment of the present invention. As in FIG. 32, only the image memory 161, the metadata storage unit 250, and the operation reception unit 231 are illustrated here, and the other components are omitted. The description takes as an example the case where a composite image is created in the image memory 161 for frames "1" to "3" of the moving image file 241 shown in FIG. 30(b), using the affine transformation parameters 256 stored in the metadata file 251.

FIG. 35(a) shows, as in FIG. 32(a), the case where frame 1 (245) is stored in the image memory 161. The positions and sizes of the image 351 and the display area 361 shown in FIG. 35(a) are the same as those shown in FIG. 32(a), so a detailed description is omitted here. In this example, the display area is transformed together with the image corresponding to the current frame. However, the affine transformation parameters corresponding to frame 1 (245) are those of an identity matrix, so the display area 361 corresponding to frame 1 (245) is determined by the display magnification designated through the operation reception unit 231 alone.

FIG. 35(b) shows, as in FIG. 32(b), the case where frame 2 (246) is stored in the image memory 161. In this case, as in FIG. 32(b), the image 352 corresponding to frame 2 (246) is transformed and overwritten onto the image 351, and the display area is also subjected to an affine transformation. That is, with the position and size of the image 351 as the reference, the image 352 corresponding to frame 2 (246) is transformed using the affine transformation parameters 256 stored in the metadata file 251 in association with "1" and "2" of the frame number 255. The position and size of the image 352 are then transformed using the affine transformation parameters determined according to the value of the display magnification accepted by the operation reception unit 231, and the area determined by the transformed position and size becomes a display area 362. Specifically, when the matrices of the affine transformation parameters corresponding to "1" and "2" of the frame number 255 are A1 and A2, respectively, and the matrix of the affine transformation parameters determined according to the value of the display magnification accepted by the operation reception unit 231 is B, the value of "A1 × A2 × B" is obtained, and the display area 362 is determined by the obtained matrix "A1 × A2 × B" with the position and size of the image 351 as the reference.

FIG. 35(c) shows, as in FIG. 32(c), the case where frame 3 is stored in the image memory 161. In this case as well, as described above, the image 353 corresponding to frame 3 is transformed and overwritten onto the images 351 and 352, and the display area is also subjected to an affine transformation, so that a display area 363 for the image 353 is determined. Specifically, when the matrices of the affine transformation parameters corresponding to "1" to "3" of the frame number 255 are A1 to A3, respectively, and the matrix of the affine transformation parameters determined according to the value of the display magnification accepted by the operation reception unit 231 is B, the value of "A1 × A2 × A3 × B" is obtained, and the display area 363 is determined by the obtained matrix "A1 × A2 × A3 × B" with the position and size of the image 351 as the reference.
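The matrix products "A1 × A2 × B" and "A1 × A2 × A3 × B" can be illustrated with a short Python sketch. The helper names, the 3x3 homogeneous layout, and the column-vector convention (under which the rightmost matrix is applied to a point first) are assumptions made for illustration. For brevity the matrix B here scales about the origin, whereas a practical B would typically also include a translation that keeps the current image centered.

```python
from functools import reduce

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(p, q):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def translation(tx, ty):
    """Affine matrix for a pure translation."""
    return [[1.0, 0.0, tx], [0.0, 1.0, ty], [0.0, 0.0, 1.0]]


def zoom(s):
    """Affine matrix whose x and y zoom components are both s."""
    return [[s, 0.0, 0.0], [0.0, s, 0.0], [0.0, 0.0, 1.0]]


def display_area_matrix(frame_matrices, b):
    """Return A1 x A2 x ... x Ai x B for the given per-frame matrices."""
    return reduce(matmul, list(frame_matrices) + [b], IDENTITY)
```

For example, with A1 as the identity matrix, A2 as a translation by (10, 0), and B with zoom components of 2 (a display magnification of 0.5), the product is a matrix with zoom components of 2 and an x translation of 10.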

FIG. 36 is a diagram outlining the method of enlarging the moving image displayed on the display unit 180 when the display mode that fixes the image corresponding to the current frame on the display unit 180 is designated. FIG. 36(a) schematically shows the transition of the display area when the moving image displayed on the display unit 180 is enlarged, and FIG. 36(b) shows a display example in which the images within the display areas 698 and 699 shown in FIG. 36(a) are displayed on the display unit 180.

FIG. 36(b) shows an image 730 that is extracted from the image memory 161 using the display area 698 shown in FIG. 36(a) and displayed on the display unit 180. When the operation reception unit 231 accepts an enlargement instruction while the image 730 shown in FIG. 36(b) is displayed, the display area extraction unit 260 reduces the size of the display area 698 in response. This reduction is performed so that the current image 697 remains at the center. That is, as described above, the position and size of the image 697 are transformed using the affine transformation parameters determined according to the value of the display magnification accepted by the operation reception unit 231, and the display area 698 is determined by the transformed position and size. In this example, an operation input that increases the display magnification has been made, so the zoom components of the affine transformation parameters are determined according to this increase in the display magnification.

For example, as shown in FIG. 36(a), the display area 698 is reduced in size to become the display area 699. FIG. 36(b) shows an image 731 that is extracted from the image memory 161 using the display area 699 shown in FIG. 36(a) and displayed on the display unit 180. In this way, an image that includes the image corresponding to the current frame can be displayed enlarged or reduced simply by changing the size of the display area.

As described above, the composite image being played back can be displayed successively by displaying the image that lies within the display area placed in the image memory 161. When the current image is affine-transformed and composited into the image memory 161, however, it may undergo reduction or similar processing. Consequently, when the display magnification is increased to enlarge the current image, the composite image that includes the image corresponding to the current frame may appear blurred. In this example, therefore, the current image being played back is displayed using the image as it was before being composited into the image memory 161. This display method is described in detail below with reference to the drawings.

FIGS. 37 and 38 are diagrams schematically showing the flow of the frames of a moving image file stored in the moving image storage unit 240 in the embodiment of the present invention. Only the relationship among the moving image storage unit 240, the metadata storage unit 250, the image memory 161, and the display memory 270 is illustrated here, and the other components are omitted. FIG. 37 shows the case where the composite image corresponding to the frames preceding the current frame is held fixed on the display unit 180, and FIG. 38 shows the case where the image corresponding to the current frame is held fixed on the display unit 180.

FIG. 37(a) shows the moving image file 241 and the metadata file 251 of FIG. 30(b) in simplified form. The following describes an example in which the image corresponding to frame i (247) of the moving image file 241 is displayed. That is, a composite image is assumed to have already been created for the images corresponding to frames 1 to "i-1" of the moving image file 241. It is also assumed that the display area 361 shown in FIG. 32 has been moved to the right in accordance with the movement of the current image.

FIG. 37(b) schematically shows the image memory 161 holding a composite image in which the images corresponding to the frames of the moving image file 241 have been composited. As shown in FIG. 32(b), the image 351 corresponding to frame 1 (245) of the moving image file 241 is held in the image memory 161 first. After the image 351 is held in the image memory 161, the images corresponding to frames 2 to "i-1" of the moving image file 241 are affine-transformed in sequence using the values of the affine transformation parameters 256 stored in the metadata file 251 in association with each of frames 2 to "i-1", and the affine-transformed images are overwritten in sequence and held in the image memory 161. Then, for each frame, the display area extraction unit 260 extracts from the composite image held in the image memory 161 the image lying within the display area determined according to the operation input for display magnification designation from the operation reception unit 231.

With the composite image of the images corresponding to frames 1 to "i-1" held in the image memory 161, the image corresponding to frame i (247) of the moving image file 241 is affine-transformed using the values "ai, bi, ci, di, ei, fi" of the affine transformation parameters 256 stored in the metadata file 251 in association with frame i, and the affine-transformed current image 692 is overwritten and held in the image memory 161. The display area extraction unit 260 then extracts from the composite image held in the image memory 161 the image lying within a display area 690 determined according to the operation input for display magnification designation from the operation reception unit 231, and stores the extracted image in the display memory 270, for example as shown in FIG. 37(c).

FIG. 37C schematically shows the display memory 270 holding the image extracted by the display area extraction unit 260. Here, for the current image 693 corresponding to the current frame, the image used is not the current image 692 extracted from the image memory 161 by the display area extraction unit 260 but the image acquired from the moving image storage unit 240 and affine-transformed by the image conversion unit 140. The storage position of the current image 693 in the display memory 270 can be determined from the position and size of the current image 692 in the image memory 161 and the position and size of the display area 690 in the image memory 161. For example, let A1, ..., Ai be the matrices of the affine transformation parameters stored in the metadata file 251 in association with the frame numbers 255 "1" to "i", and let C be the matrix of the affine transformation parameters for determining the display area 690 (for example, a matrix referenced to the image memory 161). Then, with the position of the image 351 as the reference, the storage position of the current image 693 in the display memory 270 can be determined using inv(C) × (A1 × … × Ai).
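For illustration only, the matrix product inv(C) × (A1 × … × Ai) described above may be sketched as follows. The 3×3 homogeneous representation, the column-vector convention, the helper names (`matmul`, `affine_inverse`, `apply`, `current_image_placement`, `translate`), and all numeric values are assumptions introduced for this sketch; the embodiment does not prescribe them.

```python
# Sketch only. Affine transforms are 3x3 homogeneous matrices (nested lists)
# acting on column vectors; this convention is an assumption of the sketch.

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(p, q):
    """Return the 3x3 matrix product p x q."""
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def affine_inverse(m):
    """Invert an affine matrix whose last row is [0, 0, 1]."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def apply(m, point):
    """Apply an affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def current_image_placement(frame_matrices, c):
    """Compute inv(C) x (A1 x ... x Ai) for the current frame i."""
    cumulative = IDENTITY
    for a in frame_matrices:
        cumulative = matmul(cumulative, a)
    return matmul(affine_inverse(c), cumulative)


def translate(dx, dy):
    """Pure translation matrix (hypothetical helper)."""
    return [[1.0, 0.0, dx], [0.0, 1.0, dy], [0.0, 0.0, 1.0]]


# Hypothetical values: frame 1 is the reference, frames 2 and 3 each pan 10 px.
frames = [IDENTITY, translate(10.0, 0.0), translate(10.0, 0.0)]
# Hypothetical display area: 2x scale with origin (5, 0) in image-memory space.
C = [[2.0, 0.0, 5.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]

placement = current_image_placement(frames, C)
corner = apply(placement, (0.0, 0.0))  # top-left corner of the current image
print(corner)
```

In this sketch, `placement` maps a point of the untransformed current frame to its coordinates in the display memory, so applying it to the four corners of the frame would give the region in which the current image 693 is stored.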

As shown in FIG. 37C, the image extracted by the display area extraction unit 260 is held in the display memory 270, and the image acquired from the moving image storage unit 240 and affine-transformed by the image conversion unit 140 is overwritten onto the extracted image and held in the display memory 270. The image held in the display memory 270 is then displayed on the display unit 180. In this way, for the current image, a relatively clean image can be displayed by using the image in the state it was in before it was subjected to processing such as reduction after the affine transformation and held in the image memory 161. The current image can also be displayed cleanly when it is enlarged or otherwise changed by a user operation.

FIG. 38A shows the moving image file 241 and the metadata file 251 shown in FIG. 30B in simplified form. The moving image storage unit 240 and the metadata storage unit 250 shown in FIG. 38A and the composite image held in the image memory 161 shown in FIG. 38B are the same as in FIGS. 37A and 37B, so their description is omitted here.

FIG. 38B schematically shows the image memory 161 holding the composite image from the image 351 to the current image 692 shown in FIG. 37B, and indicates the display area 361 shown in FIG. 35B with a broken line. In this example, as shown in FIG. 35, the display area is calculated by affine transformation in accordance with the current image 692 so that the position of the image corresponding to the current frame is fixed on the display unit 180. That is, with the image 351, the image corresponding to the current frame, as the reference, the image corresponding to frame i (247) is transformed into the image 692 using the affine transformation parameters 256 stored in the metadata file 251 in association with the frame numbers 255 "1" to "i", and is stored in the image memory 161. For the display area 695 corresponding to frame i (247), the position and size of the image 692 are transformed using affine transformation parameters determined according to the display magnification value received by the operation reception unit 231, and the display area 695 is determined by the transformed position and size. This determination of the display area is performed by the display area extraction unit 260.

FIG. 38C schematically shows the display memory 270 holding the image extracted by the display area extraction unit 260. Here, the image held in the display memory 270 (the image other than the current image 696) is the image extracted by the display area extraction unit 260 (the image lying within the display area 695) after transformation by the inverse of the matrix of the affine transformation parameters used to transform the display area 695. That is, the display area placed on the image memory 161 may, for example, become a parallelogram as a result of the affine transformation. To display the composite image within such an affine-transformed display area on the display unit 180, the composite image within the display area is transformed using the inverse of the matrix of the affine transformation parameters that were used to affine-transform the present current image. For example, let A1, ..., Ai be the matrices of the affine transformation parameters stored in the metadata file 251 in association with the frame numbers 255 "1" to "i", and let B be the matrix of the affine transformation parameters for determining the display area 695 (for example, a matrix referenced to the image corresponding to the current frame). Then inv(A1 × … × Ai × B) is used as the matrix for transforming the composite image within the display area. As a result, an image that has been transformed into a parallelogram can be transformed into a rectangle and displayed on the display unit 180, for example as shown in FIG. 38C. For the image 696 corresponding to the current frame, the image acquired from the moving image storage unit 240 and not affine-transformed is used instead of the image extracted from the image memory 161 by the display area extraction unit 260. The position and size at which the image 696 is stored in the display memory 270 are determined according to the display magnification from the operation reception unit 231.
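As a non-limiting illustration of why inv(A1 × … × Ai × B) returns the parallelogram-shaped display area to a rectangle, the following sketch maps a rectangular display area into image-memory coordinates and back. The column-vector convention, the helper names, and the matrix values (a sheared cumulative transform and a 2× magnification matrix) are hypothetical and are chosen only to make the effect visible.

```python
# Sketch only: the column-vector convention and all numeric values are
# assumptions made for illustration.

def matmul(p, q):
    """Return the 3x3 matrix product p x q."""
    return [[sum(p[r][k] * q[k][c] for k in range(3)) for c in range(3)]
            for r in range(3)]


def affine_inverse(m):
    """Invert an affine matrix whose last row is [0, 0, 1]."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def apply(m, point):
    """Apply an affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


# Hypothetical cumulative transform A1 x ... x Ai. It contains a shear, so a
# rectangle mapped through it becomes a parallelogram.
A_cumulative = [[1.0, 0.5, 40.0], [0.0, 1.0, 10.0], [0.0, 0.0, 1.0]]
# Hypothetical matrix B derived from the display magnification.
B = [[2.0, 0.0, -80.0], [0.0, 2.0, -60.0], [0.0, 0.0, 1.0]]

to_memory = matmul(A_cumulative, B)     # display rectangle -> image memory
to_display = affine_inverse(to_memory)  # inv(A1 x ... x Ai x B)

display_rect = [(0.0, 0.0), (160.0, 0.0), (160.0, 120.0), (0.0, 120.0)]
area_in_memory = [apply(to_memory, p) for p in display_rect]  # parallelogram
restored = [apply(to_display, p) for p in area_in_memory]     # rectangle again

print(area_in_memory)
print(restored)
```

In this sketch the left edge of `area_in_memory` is slanted, while `restored` coincides with `display_rect` up to rounding, which corresponds to the parallelogram-to-rectangle conversion described for FIG. 38C.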

As shown in FIG. 38C, the image extracted by the display area extraction unit 260 is held in the display memory 270, and the image acquired from the moving image storage unit 240 is overwritten onto the extracted image and held in the display memory 270. Thus, when the display mode that displays the image corresponding to the current frame at a fixed position is designated, a composite image that has once been affine-transformed can be returned by the inverse matrix to a state in which it is not affine-transformed and then displayed. As in FIG. 37, a relatively clean image can be displayed for the image corresponding to the current frame.

As described above, the composite image held in the image memory 161 is created by the same method in both cases, and moving image reproduction in two display modes can be realized from it, so the two display modes can be switched while the moving image is being reproduced. A viewer can therefore switch to the preferred display mode even during reproduction. For example, when a moving image is being reproduced in the display mode shown in FIG. 37 and a favorite person appears in the middle of the current image, a viewer who wants to watch with that person placed in the middle of the display unit 180 can switch to moving image reproduction in the display mode shown in FIG. 38 by a display mode switching operation from the operation reception unit 231. In addition, because the image acquired from the moving image storage unit 240 and affine-transformed can be used for the current image instead of the composite image held in the image memory 161, a relatively clean image can be viewed.

FIGS. 39 and 40 are flowcharts showing the procedure of the moving image reproduction processing by the image processing device 680 according to the embodiment of the present invention. Among the procedures shown in FIGS. 39 to 41, steps S921, S926, S927, S928, S939, and S941 are the same as in the procedure shown in FIG. 27, so they are given the same reference numerals and their description is omitted here.

In response to an operation input from the operation reception unit 231, the content acquisition unit 120 acquires a moving image file stored in the moving image storage unit 240 and also acquires the metadata file stored in the metadata storage unit 250 in association with that moving image file (step S961). Next, the content acquisition unit 120 decodes the moving image file and acquires the image of the current frame, which is one of the frames constituting the moving image file, and the corresponding audio (step S962). Next, the content acquisition unit 120 acquires the affine transformation parameters corresponding to the acquired current frame from the metadata file (step S963).

Next, the affine-transformed image corresponding to the current frame is overwritten onto the composite image and stored in the image memory 161 (step S928). The display area extraction unit 260 then determines whether the display mode that fixes the image corresponding to the current frame is designated (step S964). If that display mode is designated, the display area extraction unit 260 determines the position and size of the display area using the affine transformation parameters from the first frame to the current frame and the affine transformation parameters corresponding to the display magnification (step S965). Next, the display area extraction unit 260 extracts the composite image contained in the display area from the image memory 161 (step S966). Next, the display area extraction unit 260 affine-transforms the composite image extracted from the image memory 161 using the inverse of the matrix of the affine transformation parameters used to determine the display area (step S967).

Next, the display area extraction unit 260 stores the composite image that was extracted from the image memory 161 and affine-transformed in the display memory 270 (step S968). Next, the image synthesis unit 151 overwrites the current image onto the composite image stored in the display memory 270 (step S969). Next, the display unit 180 displays the composite image stored in the display memory 270 (step S970). Next, audio conversion processing is executed (step S980).

On the other hand, if the display mode that fixes the image corresponding to the current frame is not designated in step S964, the display area extraction unit 260 determines the position and size of the display area using the affine transformation parameters corresponding to the display magnification (step S971). If the display area has been moved in accordance with the transformation of the current image, the position of the most recently moved display area may be used.

Next, the display area extraction unit 260 determines whether the current image held in the image memory 161 protrudes from the display area (step S972). If the image corresponding to the current frame held in the image memory 161 does not protrude from the display area (that is, if the entire current image lies within the display area), the display area extraction unit 260 extracts the composite image contained in the display area from the image memory 161 (step S973). Next, the display area extraction unit 260 stores the composite image extracted from the image memory 161 in the display memory 270 (step S974).

Next, the display area extraction unit 260 determines the position of the image corresponding to the current frame in the display memory 270, using the matrix of the affine transformation parameters used to transform the image corresponding to the current frame and the inverse of the matrix of the affine transformation parameters used to determine the display area (step S975). Next, the image synthesis unit 151 overwrites the image corresponding to the current frame onto the composite image stored in the display memory 270 (step S976). The process then proceeds to step S970.

If, in step S972, the current image held in the image memory 161 protrudes from the display area (that is, if at least part of the current image does not lie within the display area), the display area extraction unit 260 calculates the difference value between one side of the display area and the current image protruding from the display area (step S977). Next, the display area extraction unit 260 moves the display area based on the calculated difference value (step S978). The process then proceeds to step S973.
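By way of example only, steps S972, S977, and S978 may be sketched as follows. The sketch assumes that both the display area and the current image are represented by axis-aligned bounding rectangles in image-memory coordinates; the embodiment does not state this, and the `Rect` type, the function names, and the sample coordinates are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Rect:
    """Axis-aligned rectangle in image-memory coordinates (an assumption)."""
    left: float
    top: float
    right: float
    bottom: float


def protrudes(display: Rect, image: Rect) -> bool:
    """Step S972: is any part of the current image outside the display area?"""
    return (image.left < display.left or image.right > display.right
            or image.top < display.top or image.bottom > display.bottom)


def axis_shift(lo: float, hi: float, img_lo: float, img_hi: float) -> float:
    """Step S977: signed overshoot of the image past one side of the area."""
    if img_lo < lo:
        return img_lo - lo
    if img_hi > hi:
        return img_hi - hi
    return 0.0


def move_display_area(display: Rect, image: Rect) -> Rect:
    """Step S978: translate the display area by the computed differences."""
    dx = axis_shift(display.left, display.right, image.left, image.right)
    dy = axis_shift(display.top, display.bottom, image.top, image.bottom)
    return Rect(display.left + dx, display.top + dy,
                display.right + dx, display.bottom + dy)


display = Rect(0.0, 0.0, 100.0, 100.0)
current = Rect(80.0, 10.0, 120.0, 50.0)  # hypothetical bounding box
if protrudes(display, current):
    display = move_display_area(display, current)
print(display)
```

In this sketch the display area is only translated, never resized, so a current image larger than the display area would still protrude after the move; how such a case is handled is not specified in this passage.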

FIG. 41 is a flowchart showing an example of the procedure of the audio conversion processing (the procedure of step S980) by the image processing device 680 according to the embodiment of the present invention. Among the steps shown here, steps S952, S953, and S954 are the same as in the procedure shown in FIG. 28, so they are given the same reference numerals and their description is omitted here.

First, the display area extraction unit 260 outputs the affine transformation parameters of the current display area, referenced to the area of the image memory 161 (step S981). Audio conversion information is then calculated using these affine transformation parameters of the display area and the center position, angle, and magnification of the image corresponding to the current frame output by the image conversion unit 140 (step S982).
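One conceivable way of combining the two inputs of step S982 is to re-express the center position, angle, and magnification of the current image relative to the display area, as sketched below. This is an assumption made purely for illustration and is not stated in this passage: the sketch further assumes that the display-area matrix is a similarity transform (uniform scale, rotation, and translation) acting on column vectors, and the function `relative_pose` and all values are hypothetical.

```python
import math


def affine_inverse(m):
    """Invert an affine matrix whose last row is [0, 0, 1]."""
    (a, b, tx), (c, d, ty) = m[0], m[1]
    det = a * d - b * c
    if det == 0:
        raise ValueError("affine matrix is not invertible")
    ia, ib, ic, id_ = d / det, -b / det, -c / det, a / det
    return [[ia, ib, -(ia * tx + ib * ty)],
            [ic, id_, -(ic * tx + id_ * ty)],
            [0.0, 0.0, 1.0]]


def apply(m, point):
    """Apply an affine matrix to an (x, y) point."""
    x, y = point
    return (m[0][0] * x + m[0][1] * y + m[0][2],
            m[1][0] * x + m[1][1] * y + m[1][2])


def relative_pose(display_matrix, center, angle, magnification):
    """Express the current image's pose relative to the display area.

    display_matrix is assumed to map display-area coordinates to
    image-memory coordinates and to be a similarity transform.
    """
    a, c = display_matrix[0][0], display_matrix[1][0]
    display_scale = math.hypot(a, c)   # uniform scale of the display area
    display_angle = math.atan2(c, a)   # rotation of the display area
    rel_center = apply(affine_inverse(display_matrix), center)
    return rel_center, angle - display_angle, magnification / display_scale


# Hypothetical display area: 2x scale, origin at (10, 0) in image memory.
C = [[2.0, 0.0, 10.0], [0.0, 2.0, 0.0], [0.0, 0.0, 1.0]]
rel_center, rel_angle, rel_mag = relative_pose(C, (30.0, 20.0), 0.0, 1.0)
print(rel_center, rel_angle, rel_mag)
```

The resulting relative center, angle, and magnification could then serve as inputs to whatever per-channel volume mapping is used; that mapping is outside the scope of this sketch.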

Next, a third modification of the embodiment of the present invention will be described with reference to the drawings.

FIG. 42 is a block diagram showing an example of the functional configuration of the image processing device 740 according to the embodiment of the present invention. The image processing device 740 is a partial modification of the image processing device 680 shown in FIG. 31. In addition to the functional configuration of the image processing device 680 shown in FIG. 31, the image processing device 740 includes a target image conversion information calculation unit 280 and a relative relationship information storage unit 290, and it is provided with a content acquisition unit 121, an image conversion unit 141, a volume adjustment unit 630, and an audio addition unit 640 in place of the content acquisition unit 120, the image conversion unit 140, the volume adjustment unit 201, and the audio addition unit 202. This allows the image processing device 740 to convert audio in association with a plurality of moving images when a plurality of moving images are reproduced within one display screen. The configurations of the moving image storage unit 240, the metadata storage unit 250, the image synthesis unit 151, the audio conversion information calculation unit 191, the audio output control unit 210, and the speaker 220 are the same as in the image processing device shown in FIG. 31, so their description is omitted.

In response to an operation input received by the operation reception unit 232, the content acquisition unit 121 acquires at least one of the following: one or more moving image files stored in the moving image storage unit 240; the metadata files stored in the metadata storage unit 250 in association with each of these moving image files; and the relative relationship metadata file stored in the relative relationship information storage unit 290 in common association with these moving image files. It supplies the information in each acquired file to the relevant units. Specifically, when the operation reception unit 232 receives an operation input designating the multiple moving image synthesis reproduction mode, in which a plurality of moving images are reproduced while being combined, the content acquisition unit 121 acquires the plurality of moving image files stored in the moving image storage unit 240, the metadata files stored in the metadata storage unit 250 in association with each of these moving image files, and the relative relationship metadata file stored in the relative relationship information storage unit 290 in common association with these moving image files, and outputs the moving images of the acquired moving image files and the affine transformation parameters of the metadata files to the image conversion unit 141. It also outputs the contents of the acquired metadata files and relative relationship metadata file to the target image conversion information calculation unit 280. The content acquisition unit 121 also has the functions of the content acquisition unit 120, which are the same as described above, so their description is omitted here.

The image conversion unit 141 performs affine transformation, frame by frame, on the images constituting the moving image of the moving image file output from the content acquisition unit 121, using the affine transformation parameters corresponding to each image, and outputs the affine-transformed images to the image synthesis unit 151. When the multiple moving image synthesis reproduction mode is designated, the image conversion unit 141 takes one of the moving images to be reproduced as a reference moving image and performs affine transformation on it frame by frame, using the affine transformation parameters corresponding to the images constituting the reference moving image. For each of the other moving images to be reproduced, it performs affine transformation frame by frame using the target image conversion information (affine transformation parameters) calculated by the target image conversion information calculation unit 280 together with the affine transformation parameters corresponding to the images constituting that moving image. The image conversion unit 141 also has the functions of the image conversion unit 140, which are the same as described above, so their description is omitted here. The method of transforming the other moving images will be described in detail with reference to FIG. 44 and other figures.

The operation reception unit 232 includes various input keys and, on receiving an operation input from these keys, outputs the content of the received operation input to the content acquisition unit 121, the image conversion unit 141, or the display area extraction unit 260. The operation reception unit 232 is provided with, for example, a selection key for selecting a desired moving image from the one or more moving image files stored in the moving image storage unit 240, a reproduction instruction key for instructing normal moving image reproduction, a stop key for stopping the moving image being reproduced, a display magnification designation key for designating the display magnification of the moving image, and a multiple moving image synthesis reproduction setting key for setting the multiple moving image synthesis reproduction mode. A plurality of functions may be assigned to a single key. At least part of the operation reception unit 232 and the display unit 180 may also be configured integrally as a touch panel.

When the multiple moving image synthesis reproduction mode is designated, the target image conversion information calculation unit 280 calculates target image conversion information based on the affine transformation parameters of the metadata files and the relative relationship metadata file output from the content acquisition unit 121. Here, at least one image constituting one of the moving images to be reproduced is taken as a reference image, each image constituting the other moving images is taken as a target image, and the target image conversion information is the information used to transform these target images. The calculated target image conversion information is output to the image conversion unit 141. As the reference image in a moving image, for example, the image corresponding to the first frame among the images constituting that moving image can be used. The target image conversion information is, for example, the affine transformation parameters used to transform a target image with respect to the reference image.

FIG. 43 schematically shows the files recorded in the moving image storage unit 240 and the relative relationship information storage unit 290 according to the embodiment of the present invention. Specifically, it shows the moving image files 241 to 244 stored in the moving image storage unit 240 and the relative relationship metadata files 291 to 293 stored in the relative relationship information storage unit 290 in association with the moving image files 241 to 244. This example describes a case in which frame "5" 741 and frame "8" 742 of the moving image file (#1) 241, frame "7" 743 and frame "9" 744 of the moving image file (#2) 242, and frame "3" 745 and frame "10" 746 of the moving image file (#3) 243 are stored in association with the relative relationship metadata files 291 to 293 stored in the relative relationship information storage unit 290. The moving image files stored in the moving image storage unit 240 are the same as the moving image files shown in FIG. 30, so their description is omitted here.

Each of the relative relationship metadata files 291 to 293 stores a moving image ID 294, a frame number 295, and affine transformation parameters 296 in association with one another.

The moving image ID 294 is the moving image ID assigned to each of the two moving image files corresponding to two images that share at least three matching points. For example, the relative relationship metadata file 291 stores "#1", which is assigned to the moving image file 241, and "#2", which is assigned to the moving image file 242.

The frame number 295 is the serial number of each of the two frames corresponding to two images that share at least three matching points. For example, the relative relationship metadata file 291 stores the frame number "5" of a frame constituting the moving image of the moving image file 241 and the frame number "7" of a frame constituting the moving image of the moving image file 242.

The affine transformation parameters 296 are affine transformation parameters calculated for the at least two images corresponding to the moving image ID 294 and the frame number 295. For example, the relative relationship metadata file 291 stores "ao, bo, co, do, eo, fo" as the affine transformation parameters corresponding to frame "5" of the moving image of the moving image file 241 and frame "7" of the moving image of the moving image file 242. In the embodiment of the present invention, the affine transformation parameters 296 are those obtained when, of the two corresponding moving image IDs 294 and frame numbers 295, the image corresponding to the lower frame number shown in FIG. 43 is taken as the reference image and the upper one as the target image. For example, the affine transformation parameters 296 stored in the relative relationship metadata file 291 are the affine transformation parameters of frame "5" 741 of the moving image of the moving image file (#1) 241 with respect to frame "7" 743 of the moving image of the moving image file (#2) 242.
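For illustration, the association among the moving image ID 294, the frame number 295, and the affine transformation parameters 296 may be represented by a record such as the following. The type name `RelativeRelationRecord`, the field names, the lookup function `find_relation`, and the numeric parameter values (placeholders standing in for "ao, bo, co, do, eo, fo") are hypothetical; the actual file layout is not specified here.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple


@dataclass(frozen=True)
class RelativeRelationRecord:
    """One relative relationship metadata entry (hypothetical layout)."""
    movie_ids: Tuple[str, str]      # moving image ID 294, one per moving image
    frame_numbers: Tuple[int, int]  # frame number 295, in the same order
    affine_params: Tuple[float, float, float, float, float, float]  # 296


def find_relation(records: List[RelativeRelationRecord],
                  id_a: str, id_b: str) -> Optional[RelativeRelationRecord]:
    """Return the record that links the two moving images, if any."""
    for record in records:
        if set(record.movie_ids) == {id_a, id_b}:
            return record
    return None


# Entry corresponding to relative relationship metadata file 291 in FIG. 43.
# The parameter values are placeholders for "ao, bo, co, do, eo, fo".
file_291 = RelativeRelationRecord(
    movie_ids=("#1", "#2"),
    frame_numbers=(5, 7),
    affine_params=(1.0, 0.0, 0.0, 0.0, 1.0, 0.0),
)

relation = find_relation([file_291], "#2", "#1")
print(relation)
```

The lookup ignores the order of the two IDs; which of the two frames serves as the reference image follows the convention described above and is not encoded in this sketch.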

FIG. 44 schematically shows an example of combining two moving images. This example describes combining the images 371 to 384 constituting the moving image 370 with the images 391 to 397 constituting the moving image 390. The images 378 and 394, whose interiors are hatched, are the images corresponding to the frame numbers included in the relative relationship metadata for the moving images 370 and 390.

FIG. 44(a) shows the case where the images 371 to 384 forming the moving image 370 are sequentially affine-transformed using the affine transformation parameters stored in association with each frame and are combined in the image memory 161. For example, the image 371 corresponding to the first frame is first held in the image memory 161. The images 372 to 384 are then sequentially affine-transformed with the image 371 as a reference and combined in the image memory 161. The flow of the current image under this affine transformation is indicated by an arrow 385. That is, the images 371 to 384 are sequentially combined along the arrow 385.

FIG. 44(b) shows the case where the images 391 to 397 forming the moving image 390 are sequentially affine-transformed using the affine transformation parameters stored in association with each frame and are combined in the image memory 161. FIG. 44(c) shows the relative positions of the image 378 and the image 394 when the image 394 is affine-transformed, with the image 391 as the reference image, using the affine transformation parameters included in the relative relationship metadata for the moving images 370 and 390. The composite image shown in FIG. 44(b) is the result of combining the images 391 to 397 with reference to the relative positions of the image 378 and the image 394 shown in FIG. 44(c). The flow of the current image under the affine transformation in this case is indicated by an arrow 398. That is, the images 391 to 397 are sequentially combined along the arrow 398. FIG. 44(d) shows an example in which the composite image of FIG. 44(a) and the composite image of FIG. 44(b) are combined with reference to the relative positions of the image 378 and the image 394 shown in FIG. 44(c). The example in FIG. 44(d) shows the case where the images 378 and 394 are reproduced at the same time and, among the images reproduced at the same time, the moving image 390 is overwritten on top of the moving image 370.

The calculation of the holding position of each moving image is now described concretely. First, the position of at least one image forming one of the plurality of moving images is determined. For example, the position of the image 371 corresponding to the first frame of the moving image 370 is determined. This position may be designated by the user through the operation receiving unit 232, or may be determined using the position obtained by the calculation described above. Next, the holding position of at least one of the images forming the other moving image is calculated. For example, let A1 to A14 be the matrices of the affine transformation parameters associated with the frames corresponding to the images 371 to 384, and let B1 to B7 be the matrices of the affine transformation parameters associated with the frames corresponding to the images 391 to 397. Further, let C1 be the matrix of the affine transformation parameters of the relative relationship metadata stored in association with the moving images 370 and 390. The reference image is the image 371. With the holding position of the image 371 in the image memory 161 as the reference, the holding position of the image 378 is calculated by multiplying A1 to A8, that is, using A1 × … × A8. With the same reference, the holding position of the image 394 is calculated by multiplying A1 to A8 and C1, that is, using A1 × … × A8 × C1. The holding position of the image 391 corresponding to the first frame of the moving image 390 can then be calculated by multiplying A1 to A8 and C1 by the inverse of the product of B1 to B4, that is, using "A1 × … × A8 × C1 × Inv(B1 × … × B4)". The holding positions of the other images forming the moving image 390 can be calculated in the same way using A1 to A8 and C1 together with the inverse matrices of B1 to B4 or with B5 to B7.

When an image of a moving image other than the one containing the reference image is affine-transformed, the transformation uses the matrix that was used to calculate the holding position of the image corresponding to the first frame together with the affine transformation parameters associated with the image. For example, when the image 392 of the moving image 390 is affine-transformed, the matrix B2 corresponding to the image 392 is used and the image is transformed by the matrix "A1 × … × A8 × C1 × Inv(B3 × B4)". Similarly, when the image 393 of the moving image 390 is affine-transformed, it is transformed by the matrix "A1 × … × A8 × C1 × Inv(B4)". Each image of the moving image 390 is transformed in the same way.
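The matrix products in the two preceding paragraphs can be checked with a short sketch. It assumes 3×3 homogeneous affine matrices, an identity matrix for the first frame of each moving image, and made-up pure translations for all other matrices; the helper names are hypothetical and the numbers are not from the patent.

```python
from functools import reduce

IDENTITY = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]


def matmul(p, q):
    """Multiply two 3x3 matrices given as nested lists."""
    return [[sum(p[i][k] * q[k][j] for k in range(3)) for j in range(3)]
            for i in range(3)]


def affine_inverse(m):
    """Invert a 3x3 affine matrix whose last row is (0, 0, 1)."""
    (a, b, c), (d, e, f), _ = m
    det = a * e - b * d
    if det == 0:
        raise ValueError("matrix is not invertible")
    ia, ib, ic, ie = e / det, -b / det, -d / det, a / det
    return [[ia, ib, -(ia * c + ib * f)],
            [ic, ie, -(ic * c + ie * f)],
            [0.0, 0.0, 1.0]]


def product(matrices):
    """Left-to-right product M1 x M2 x ... of a list of matrices."""
    return reduce(matmul, matrices, IDENTITY)


def translation(tx, ty):
    return [[1.0, 0.0, float(tx)], [0.0, 1.0, float(ty)], [0.0, 0.0, 1.0]]


# Hypothetical per-frame matrices: A1..A14 for images 371..384, B1..B7 for 391..397.
A = [IDENTITY] + [translation(10, 0)] * 13
B = [IDENTITY] + [translation(5, 0)] * 6
C1 = translation(0, 20)  # relative relationship between images 378 and 394

pos_378 = product(A[:8])                                     # A1 x ... x A8
pos_394 = matmul(pos_378, C1)                                # A1 x ... x A8 x C1
pos_391 = matmul(pos_394, affine_inverse(product(B[:4])))    # ... x Inv(B1 x ... x B4)
pos_392 = matmul(pos_394, affine_inverse(product(B[2:4])))   # ... x Inv(B3 x B4)
pos_393 = matmul(pos_394, affine_inverse(B[3]))              # ... x Inv(B4)
```

With these values, walking forward from the image 391 by B1 × B2 reaches the same matrix as `pos_392`, which is why the inverse of only B3 × B4 appears in the expression for the image 392.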

In this way, when a plurality of moving images are combined and reproduced, once the position and size in the image memory 161 of the reference image of one moving image have been determined, the position and size of every image can be calculated using the metadata file associated with each moving image and the relative relationship metadata file associated with the moving images. Consequently, when a plurality of moving images are combined and reproduced, reproduction can start from any position in each moving image. For example, the image memory 161 shown in FIG. 44(d) illustrates the case where the image 391 of the moving image 390 is combined after the images 371 to 374 of the moving image 370 have been combined. That is, the images 375 and 391 are combined at the same time, followed by the images 376 and 392 at the same time, and so on. In this example, among the images reproduced at the same time, the moving image 390 is overwritten on top of the moving image 370, but the moving image to be overwritten on top may instead be designated through the operation receiving unit 230.
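The pairing of simultaneously combined images in FIG. 44(d) follows from aligning the images 378 and 394 in time. A minimal sketch of that alignment, assuming one image per time step and using a hypothetical helper name, is:

```python
def simultaneous_pairs(first_ids, second_ids, anchor_first, anchor_second):
    """List (first, second) image IDs per time step so the two anchors coincide.

    None marks a time step at which a moving image has no image to combine.
    """
    offset = first_ids.index(anchor_first) - second_ids.index(anchor_second)
    start = min(0, offset)
    end = max(len(first_ids), offset + len(second_ids))
    timeline = []
    for t in range(start, end):
        a = first_ids[t] if 0 <= t < len(first_ids) else None
        j = t - offset
        b = second_ids[j] if 0 <= j < len(second_ids) else None
        timeline.append((a, b))
    return timeline


images_370 = list(range(371, 385))  # images 371..384
images_390 = list(range(391, 398))  # images 391..397
timeline = simultaneous_pairs(images_370, images_390, 378, 394)
```

Here `timeline[4]` is the pair (375, 391) and `timeline[5]` is (376, 392), matching the order described above.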

Next, the audio conversion processing performed by the image processing device 740 in the embodiment of the present invention is described.

FIG. 45 is a block diagram showing a configuration example of the audio conversion processing unit 200 of the image processing device 740 in the embodiment of the present invention. As an example, it shows conversion processing that generates right-channel and left-channel output sound when a first moving image and a second moving image are reproduced simultaneously. The input sound of each of the first and second moving images is assumed to consist of a right channel and a left channel. The following therefore describes the functions of the audio conversion processing unit 200 in which a volume adjustment unit 630 and an audio addition unit 640 are provided in place of the volume adjustment unit 201 and the audio addition unit 202 shown in FIG. 31. Because the basic configuration is the same as that shown in FIG. 17, only a brief description is given here.

The volume adjustment unit 630 includes volume amplifiers 631 to 638. The volume amplifiers 631 to 634 amplify the right-channel and left-channel input sound of the first moving image based on the audio conversion information RR_1, RL_1, LR_1 and LL_1 for the first moving image supplied from the audio conversion information calculation unit 191. The volume amplifiers 635 to 638 amplify the right-channel and left-channel input sound of the second moving image based on the audio conversion information RR_2, RL_2, LR_2 and LL_2 for the second moving image supplied from the audio conversion information calculation unit 191.

The audio addition unit 640 includes audio adders 641 to 646. The audio adders 641 and 642 add the right-channel and left-channel input sound of the first moving image, and the audio adders 643 and 644 add the right-channel and left-channel input sound of the second moving image. The audio adder 645 adds the right-channel output sound of the first moving image and the second moving image. The audio adder 646 adds the left-channel output sound of the first moving image and the second moving image.

FIG. 46 is a diagram showing an example of audio conversion processing performed by the image processing device 740 in the embodiment of the present invention when two moving images are reproduced simultaneously. FIG. 46 shows an example in which two reproduced moving images 651 and 652 are displayed on the display screen of the display unit 180. In this case, as described above, the input sound of each channel of the reproduced moving images 651 and 652 is first converted according to the center position, angle or magnification of the image corresponding to the current frame to generate output sound. Then, the output sound of the reproduced moving images 651 and 652 is added for each same channel and output to the right speaker 221 and the left speaker 222. The relational expressions for the output sound generated in this way can be written as follows.
R' = (R1' + R2') / 2
L' = (L1' + L2') / 2

Here, R1' = R1·RR_1 + L1·LR_1, L1' = R1·RL_1 + L1·LL_1, R2' = R2·RR_2 + L2·LR_2, and L2' = R2·RL_2 + L2·LL_2. R1 and L1 are the right-channel and left-channel input sound of the first moving image, and R2 and L2 are the right-channel and left-channel input sound of the second moving image. RR_1, RL_1, LR_1 and LL_1 correspond to the audio conversion information for the first moving image, and RR_2, RL_2, LR_2 and LL_2 correspond to the audio conversion information for the second moving image.
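The relational expressions above translate directly into a per-sample calculation. The sketch below assumes scalar sample values and passes the audio conversion information as a dictionary; the function names, the dictionary form and the sample values are illustrative assumptions.

```python
def convert(r_in, l_in, info):
    """Apply one moving image's audio conversion information.

    R' = R*RR + L*LR and L' = R*RL + L*LL, as in the expressions above.
    """
    r_out = r_in * info["RR"] + l_in * info["LR"]
    l_out = r_in * info["RL"] + l_in * info["LL"]
    return r_out, l_out


def mix_two(r1, l1, info1, r2, l2, info2):
    """R' = (R1' + R2') / 2 and L' = (L1' + L2') / 2."""
    r1_out, l1_out = convert(r1, l1, info1)
    r2_out, l2_out = convert(r2, l2, info2)
    return (r1_out + r2_out) / 2, (l1_out + l2_out) / 2


# Made-up coefficients: the first moving image is sent entirely to the left
# speaker, the second moving image is passed through unchanged.
info_1 = {"RR": 0.0, "RL": 1.0, "LR": 0.0, "LL": 1.0}
info_2 = {"RR": 1.0, "RL": 0.0, "LR": 0.0, "LL": 1.0}
right_out, left_out = mix_two(0.5, 0.25, info_1, 0.5, 0.5, info_2)
```

For real audio the same arithmetic would be applied to every sample of each channel buffer.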

As for the processing procedure of the audio conversion processing performed by the image processing device 740 in the embodiment of the present invention, the only addition is that, in step S982, the output sound of the plurality of moving images is added for each same channel. The rest of the audio conversion procedure is the same, so its description is omitted.

The above has described examples in which sound is converted based on motion information about a moving image. The embodiment of the present invention can also be applied to the case where sound is converted based on information other than motion information about a moving image. As such an application, a fourth modification of the embodiment of the present invention is described below with reference to the drawings. Only the functions of the content acquisition unit 120 and the image conversion unit 140 shown in FIG. 1 are described here. The other components have the same functions as described above, so their description is omitted.

The content acquisition unit 120 acquires template information indicating the display area of a moving image on the display unit 180. This template information defines the display areas in which the respective pieces of information are displayed on the display screen, for example, a moving image display area for displaying a moving image and a character display area for displaying characters in text form.

The image conversion unit 140 converts images based on the template information output from the content acquisition unit 120. That is, this conversion causes the moving image to be displayed in the display area on the display screen indicated by the template information. The image conversion unit 140 also outputs the center position, angle or magnification of the image corresponding to the current frame, as obtained from the template information, to the audio conversion information calculation unit 190.

FIG. 47 is a diagram showing examples in which sound is converted using information other than the motion information of a moving image in the embodiment of the present invention. FIG. 47(a) is an example in which a moving image 653 is displayed on the left side of the display screen of the display unit 180 and information about the moving image is displayed in text form on its right side. Because the center position of the moving image 653 lies on the left side of the display screen, the proportion of output sound sent to the left speaker 222 is made larger than that sent to the right speaker 221. In this case, the image conversion unit 140 obtains the center position and magnification of the moving image 653 from the template information indicating the display area of the moving image and outputs them to the audio conversion information calculation unit 191. FIG. 47(b) is an example in which the display area of the display unit 180 is divided in two to display moving images. Here, a moving image 654 is displayed in the left half of the display screen and a moving image 655 in the right half, so the output sound of each channel of the moving images 654 and 655 is generated according to their respective center positions. The output sound of the same channels of the moving images 654 and 655 is then added and output to the right speaker 221 and the left speaker 222. In this case, the image conversion unit 140 obtains the center positions and magnifications of the moving images 654 and 655 from the template information on the division of the display area and outputs them to the audio conversion information calculation unit 191.
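How a display area in the template could be turned into speaker proportions can be sketched as follows. The text above only states that a moving image on the left gets a larger left-speaker proportion; the linear law used here, the `DisplayRegion` type, the function names and the 1920×1080 screen are all assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DisplayRegion:
    """Hypothetical moving image display area taken from template information."""
    x: float       # left edge in pixels
    y: float       # top edge in pixels
    width: float
    height: float


def center_and_magnification(region, screen_width):
    """Center position of the region and its width relative to the screen."""
    center = (region.x + region.width / 2, region.y + region.height / 2)
    return center, region.width / screen_width


def speaker_ratios(center_x, screen_width):
    """Assumed linear law: returns (left, right) proportions that sum to 1."""
    right = min(max(center_x / screen_width, 0.0), 1.0)
    return 1.0 - right, right


SCREEN_WIDTH = 1920
left_half = DisplayRegion(0, 0, 960, 1080)     # e.g. moving image 654
right_half = DisplayRegion(960, 0, 960, 1080)  # e.g. moving image 655

(center_654, mag_654) = center_and_magnification(left_half, SCREEN_WIDTH)
(center_655, mag_655) = center_and_magnification(right_half, SCREEN_WIDTH)
ratios_654 = speaker_ratios(center_654[0], SCREEN_WIDTH)
ratios_655 = speaker_ratios(center_655[0], SCREEN_WIDTH)
```

Under this assumed law the left-half moving image is weighted 0.75 to the left speaker and 0.25 to the right, and the right-half moving image the reverse.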

The camera work detection unit 480, which detects the affine transformation parameters stored in the metadata storage unit 250 shown in FIGS. 29, 31 and 42, is now described in detail with reference to the drawings. The image conversion information supply unit 130 shown in FIG. 1 can also detect affine transformation parameters by having the same configuration as the camera work detection unit 480.

FIG. 48 is a block diagram showing a functional configuration example of the camera work detection unit 480 in the embodiment of the present invention. The camera work detection unit 480 includes a feature point extraction unit 481, an optical flow calculation unit 482 and a camera work parameter calculation unit 483, and is connected to a moving image input unit 470 and a recording control unit 490. In this example, only the configuration related to the camera work detection unit 480 is illustrated, and the illustration and description of other components are omitted.

The moving image input unit 470 receives a moving image captured by an imaging device such as a digital video camera and outputs the received moving image to the camera work detection unit 480.

The recording control unit 490 records the affine transformation parameters output from the camera work detection unit 480 in the metadata storage unit 250 as a metadata file, in association with the corresponding moving image and frame.

The feature point extraction unit 481 extracts feature points from the image corresponding to each frame of the moving image output from the moving image input unit 470 and outputs the extracted feature points to the optical flow calculation unit 482. For the first frame of the moving image output from the moving image input unit 470, the feature point extraction unit 481 extracts feature points from the entire image. For every other frame, it extracts feature points from the region that is newly captured compared with the image corresponding to the immediately preceding frame. As a feature point, for example, a point with a strong edge gradient in the vertical or horizontal direction (generally called a "corner point", and referred to as such below) can be extracted. A corner point is a feature point that is robust for optical flow calculation and can be found using edge detection. For example, corner points can be obtained by the extraction method shown in FIGS. 2 and 3. Although in this example the feature point extraction unit 481 extracts feature points from the entire image only for the first frame and from the newly captured region for the other frames, feature points may also be extracted from the entire image for each frame other than the first, depending on processing capacity and the like.

The optical flow calculation unit 482 calculates the optical flow for each feature point output from the feature point extraction unit 481 and outputs the calculated optical flows to the camera work parameter calculation unit 483. Specifically, it compares the images corresponding to two consecutive frames of the moving image output from the moving image input unit 470 (the current frame and the immediately preceding frame) and obtains the optical flow corresponding to each feature point in the image of the immediately preceding frame as the optical flow of the current frame. An optical flow is obtained for every frame of the moving image. A detection method such as the gradient method or the block matching method can be used to detect the optical flow. For example, the optical flow can be obtained by the calculation shown in FIGS. 2 and 3.
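Of the two detection methods named above, block matching is the simpler to illustrate. The following sketch is a generic sum-of-absolute-differences search over grayscale images held as nested lists; it is not the calculation of FIGS. 2 and 3, and the function names, patch size and search range are assumptions.

```python
def patch_sad(prev, curr, px, py, cx, cy, half):
    """Sum of absolute differences between two (2*half+1)-square patches."""
    total = 0
    for dy in range(-half, half + 1):
        for dx in range(-half, half + 1):
            total += abs(prev[py + dy][px + dx] - curr[cy + dy][cx + dx])
    return total


def block_match(prev, curr, point, half=1, search=2):
    """Return the flow (vx, vy) of a feature point from prev to curr.

    The point must lie at least `half` pixels inside the border of prev.
    """
    px, py = point
    height, width = len(curr), len(curr[0])
    best = None
    for vy in range(-search, search + 1):
        for vx in range(-search, search + 1):
            cx, cy = px + vx, py + vy
            # Skip candidates whose patch would leave the current image.
            if not (half <= cx < width - half and half <= cy < height - half):
                continue
            cost = patch_sad(prev, curr, px, py, cx, cy, half)
            if best is None or cost < best[0]:
                best = (cost, vx, vy)
    return best[1], best[2]


def make_frame(center_x, center_y, size=8):
    """Build a blank frame containing one textured 3x3 patch (test data)."""
    frame = [[0] * size for _ in range(size)]
    value = 10
    for dy in (-1, 0, 1):
        for dx in (-1, 0, 1):
            frame[center_y + dy][center_x + dx] = value
            value += 10
    return frame


previous_frame = make_frame(3, 3)
current_frame = make_frame(5, 4)   # same patch moved by (+2, +1)
flow = block_match(previous_frame, current_frame, (3, 3))
```

In this toy data the patch moves two pixels right and one pixel down, so `flow` is (2, 1).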

The camera work parameter calculation unit 483 performs camera work parameter calculation processing, which calculates camera work parameters using the optical flows corresponding to the feature points output from the optical flow calculation unit 482. The calculated camera work parameters are stored in the metadata storage unit 250. In the embodiment of the present invention, each image of the plurality of moving images to be reproduced is transformed in accordance with the movement of the imaging device and then displayed. To perform this transformation, the movement of the imaging device is extracted using the optical flows calculated by the optical flow calculation unit 482, and the camera work parameters (transformation parameters) are calculated based on the extracted movement. The embodiment of the present invention describes an example that uses affine transformation as the method for transforming the images of the moving image to be reproduced. It also describes an example in which the camera work parameters are the affine transformation parameters corresponding to the inverse of the matrix of affine transformation parameters calculated from the optical flows. That is, in the embodiment of the present invention, the affine transformation parameters used as transformation information are defined not as the affine matrix representing the movement of feature points between consecutive images, but as the affine transformation parameters corresponding to the affine matrix that indicates where the image following a reference image moves to when one of the consecutive images is taken as the reference image. Although an example using affine transformation parameters as the camera work parameters is described, another image transformation method such as projective transformation may be used. Affine transformation parameters can be calculated from the vectors of three points, and projective transformation parameters from the vectors of four points. The camera work parameters are transformation information for transforming other captured images with reference to at least one of the captured images forming a captured moving image, and include at least position information and posture information described in the coordinate system of the imaging device. That is, the camera work parameters include information about the position and posture of the imaging device while the photographer is shooting. Based on the affine transformation parameters obtained by the camera work parameter calculation unit 483, the movement of the imaging device caused by the photographer's operations, such as zoom-in, zoom-out, pan, tilt and rotation, can be estimated. For example, the affine transformation parameters can be obtained by the calculation shown in FIGS. 2 and 3.
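The statements that affine transformation parameters can be obtained from three points, and that the inverse matrix is used as the camera work parameter, can be illustrated with a small solver. This is a generic sketch using Cramer's rule and the assumed convention x' = a·x + b·y + c, y' = d·x + e·y + f; it is not the calculation of FIGS. 2 and 3, and the function names and point values are hypothetical.

```python
def det3(m):
    """Determinant of a 3x3 matrix given as nested lists."""
    return (m[0][0] * (m[1][1] * m[2][2] - m[1][2] * m[2][1])
            - m[0][1] * (m[1][0] * m[2][2] - m[1][2] * m[2][0])
            + m[0][2] * (m[1][0] * m[2][1] - m[1][1] * m[2][0]))


def solve3(m, rhs):
    """Solve the 3x3 linear system m * x = rhs by Cramer's rule."""
    d = det3(m)
    if d == 0:
        raise ValueError("the three points are collinear")
    solution = []
    for col in range(3):
        replaced = [row[:] for row in m]
        for r in range(3):
            replaced[r][col] = rhs[r]
        solution.append(det3(replaced) / d)
    return solution


def affine_from_three_points(src, dst):
    """Affine matrix that maps each of three src points onto its dst point."""
    system = [[x, y, 1.0] for x, y in src]
    a, b, c = solve3(system, [p[0] for p in dst])
    d, e, f = solve3(system, [p[1] for p in dst])
    return [[a, b, c], [d, e, f], [0.0, 0.0, 1.0]]


def affine_inverse(m):
    """Invert a 3x3 affine matrix whose last row is (0, 0, 1)."""
    (a, b, c), (d, e, f), _ = m
    det = a * e - b * d
    if det == 0:
        raise ValueError("matrix is not invertible")
    ia, ib, ic, ie = e / det, -b / det, -d / det, a / det
    return [[ia, ib, -(ia * c + ib * f)],
            [ic, ie, -(ic * c + ie * f)],
            [0.0, 0.0, 1.0]]


# Made-up feature points: the scene appears doubled in size and shifted by (3, 1).
previous_points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
current_points = [(3.0, 1.0), (5.0, 1.0), (3.0, 3.0)]

flow_matrix = affine_from_three_points(previous_points, current_points)
camera_work_matrix = affine_inverse(flow_matrix)  # inverse, as described above
```

With more than three feature points available, a practical implementation would need some way of choosing or fitting among them; that step is outside this sketch.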

Next, the case where the feature point extraction processing and the optical flow calculation processing in the embodiment of the present invention are performed by a multi-core processor is described in detail with reference to the drawings. The feature point extraction processing performed by the feature point extraction unit 481 shown in FIG. 48 and the optical flow calculation processing performed by the optical flow calculation unit 482 are used as examples.

FIG. 49 is a diagram showing a configuration example of the multi-core processor 800 in the embodiment of the present invention. The multi-core processor 800 is a processor in which a plurality of processor cores of different types are mounted on a single CPU (Central Processing Unit) package. That is, to maintain the processing performance of each individual processor core while keeping the configuration simple, the multi-core processor 800 carries a plurality of processor cores of two types: one type of core suited to all uses (applications), and another type of core optimized to some extent for predetermined uses.

The multi-core processor 800 includes a control processor core 801, arithmetic processor cores (#1) 811 to (#8) 818, and a bus 802, and is connected to a main memory 781. The multi-core processor 800 is also connected to other devices such as a graphics device 782 and an I/O device 783. As the multi-core processor 800, for example, "Cell" (Cell Broadband Engine), a microprocessor developed by the present applicant and others, can be employed.

The control processor core 801 is a control processor core that mainly performs processing involving frequent thread switching, as in an operating system. The control processor core 801 is described in detail with reference to FIG. 50.

The arithmetic processor cores (#1) 811 to (#8) 818 are simple, compact arithmetic processor cores that excel at multimedia processing. The arithmetic processor cores (#1) 811 to (#8) 818 are described in detail with reference to FIG. 51.

The bus 802 is a high-speed bus called the EIB (Element Interconnect Bus). The control processor core 801 and each of the arithmetic processor cores (#1) 811 to (#8) 818 are connected to it, and data access by each processor core is performed via the bus 802.

The main memory 781 is connected to the bus 802. It stores the various programs to be loaded into each processor core and the data required for the processing of each processor core, and also stores the data processed by each processor core.

The graphics device 782 is a graphics device connected to the bus 802, and the I/O device 783 is an external input/output device connected to the bus 802.

図５０は、本発明の実施の形態における制御プロセッサコア８０１の一構成例を示す図である。制御プロセッサコア８０１は、制御プロセッサユニット８０３および制御プロセッサストレージシステム８０６を備える。 FIG. 50 is a diagram showing a configuration example of the control processor core 801 in the embodiment of the present invention. The control processor core 801 includes a control processor unit 803 and a control processor storage system 806.

制御プロセッサユニット８０３は、制御プロセッサコア８０１の演算処理を行う核となるユニットであり、マイクロプロセッサのアーキテクチャをベースとする命令セットを備え、一次キャッシュとして命令キャッシュ８０４およびデータキャッシュ８０５が搭載されている。命令キャッシュ８０４は、例えば、３２ＫＢの命令キャッシュであり、データキャッシュ８０５は、例えば、３２ＫＢのデータキャッシュである。 The control processor unit 803 is the core unit that performs the arithmetic processing of the control processor core 801. It has an instruction set based on a microprocessor architecture and carries an instruction cache 804 and a data cache 805 as primary caches. The instruction cache 804 is, for example, a 32 KB instruction cache, and the data cache 805 is, for example, a 32 KB data cache.

制御プロセッサストレージシステム８０６は、制御プロセッサユニット８０３からメインメモリ７８１へのデータアクセスを制御するユニットであり、制御プロセッサユニット８０３からのメモリアクセスを高速化させるために５１２ＫＢの二次キャッシュ８０７が搭載されている。 The control processor storage system 806 is a unit that controls data access from the control processor unit 803 to the main memory 781. It is equipped with a 512 KB secondary cache 807 to speed up memory access from the control processor unit 803.

図５１は、本発明の実施の形態における演算プロセッサコア（＃１）８１１の一構成例を示す図である。演算プロセッサコア（＃１）８１１は、演算プロセッサユニット８２０およびメモリフローコントローラ８２２を備える。なお、演算プロセッサコア（＃２）８１２乃至（＃８）８１８は、演算プロセッサコア（＃１）８１１と同様の構成であるため、ここでの説明を省略する。 FIG. 51 is a diagram showing a configuration example of the arithmetic processor core (# 1) 811 in the embodiment of the present invention. The arithmetic processor core (# 1) 811 includes an arithmetic processor unit 820 and a memory flow controller 822. Note that the arithmetic processor cores (# 2) 812 to (# 8) 818 have the same configuration as that of the arithmetic processor core (# 1) 811, and thus the description thereof is omitted here.

演算プロセッサユニット８２０は、演算プロセッサコア（＃１）８１１の演算処理を行う核となるユニットであり、制御プロセッサコア８０１の制御プロセッサユニット８０３とは異なる独自の命令セットを備える。また、演算プロセッサユニット８２０には、ローカルストア（ＬＳ：Local Store）８２１が搭載されている。 The arithmetic processor unit 820 is a unit that performs the arithmetic processing of the arithmetic processor core (# 1) 811 and has a unique instruction set different from the control processor unit 803 of the control processor core 801. The arithmetic processor unit 820 is equipped with a local store (LS) 821.

ローカルストア８２１は、演算プロセッサユニット８２０の専用メモリであるとともに、演算プロセッサユニット８２０から直接参照することができる唯一のメモリである。ローカルストア８２１として、例えば、容量が２５６Ｋバイトのメモリを用いることができる。なお、演算プロセッサユニット８２０が、メインメモリ７８１や他の演算プロセッサコア（演算プロセッサコア（＃２）８１２乃至（＃８）８１８）上のローカルストアにアクセスするためには、メモリフローコントローラ８２２を利用する必要がある。 The local store 821 is a dedicated memory of the arithmetic processor unit 820 and is the only memory that the arithmetic processor unit 820 can reference directly. For example, a memory with a capacity of 256 KB can be used as the local store 821. To access the main memory 781 or the local stores of the other arithmetic processor cores (arithmetic processor cores (#2) 812 to (#8) 818), the arithmetic processor unit 820 must use the memory flow controller 822.

メモリフローコントローラ８２２は、メインメモリ７８１や他の演算プロセッサコア等との間でデータのやり取りするためのユニットであり、ＭＦＣ（Memory Flow Controller）と呼ばれるユニットである。ここで、演算プロセッサユニット８２０は、チャネルと呼ばれるインタフェースを介してメモリフローコントローラ８２２に対してデータ転送等を依頼する。 The memory flow controller 822 is a unit for exchanging data with the main memory 781 and other arithmetic processor cores, and is a unit called an MFC (Memory Flow Controller). Here, the arithmetic processor unit 820 requests the memory flow controller 822 for data transfer or the like via an interface called a channel.

以上で示したマルチコアプロセッサ８００のプログラミング・モデルとして、さまざまなものが提案されている。このプログラミング・モデルの中で最も基本的なモデルとして、制御プロセッサコア８０１上でメインプログラムを実行し、演算プロセッサコア（＃１）８１１乃至（＃８）８１８上でサブプログラムを実行するモデルが知られている。本発明の実施の形態では、このモデルを用いたマルチコアプロセッサ８００の演算方法について図面を参照して詳細に説明する。 Various programming models have been proposed for the multi-core processor 800 described above. The most basic of these is known to be a model in which a main program is executed on the control processor core 801 and subprograms are executed on the arithmetic processor cores (#1) 811 to (#8) 818. In the embodiment of the present invention, a calculation method of the multi-core processor 800 using this model will be described in detail with reference to the drawings.

図５２は、本発明の実施の形態におけるマルチコアプロセッサ８００の演算方法を模式的に示す図である。この例では、データ７８５を用いて制御プロセッサコア８０１がタスク７８４を実行する場合に、タスク７８４の一部であるタスク７８６の処理に必要なデータ７８７（データ７８５の一部）を用いて、タスク７８６を各演算プロセッサコアに実行させる場合を例に図示する。 FIG. 52 is a diagram schematically showing a calculation method of the multi-core processor 800 in the embodiment of the present invention. The figure illustrates an example in which, when the control processor core 801 executes a task 784 using data 785, it has each arithmetic processor core execute a task 786, which is a part of the task 784, using data 787 (a part of the data 785) needed for the task 786.

同図に示すように、データ７８５を用いて制御プロセッサコア８０１がタスク７８４を実行する場合には、タスク７８４の一部であるタスク７８６の処理に必要なデータ７８７（データ７８５の一部）を用いて、タスク７８６を各演算プロセッサコアに実行させる。本発明の実施の形態では、動画を構成するフレーム毎に各演算プロセッサコアにより演算処理が行われる。 As shown in the figure, when the control processor core 801 executes the task 784 using the data 785, it has each arithmetic processor core execute the task 786, which is a part of the task 784, using the data 787 (a part of the data 785) needed for the task 786. In the embodiment of the present invention, each arithmetic processor core performs arithmetic processing for each frame constituting the moving image.

同図に示すように、マルチコアプロセッサ８００が演算を行うことにより、演算プロセッサコア（＃１）８１１乃至（＃８）８１８を並列に利用して、比較的少ない時間で多くの演算を行うことができるとともに、演算プロセッサコア（＃１）８１１乃至（＃８）８１８上でＳＩＭＤ（Single Instruction/Multiple Data：単一命令／複数データ）演算を利用して、さらに少ない命令数により、比較的多くの演算処理を行うことができる。なお、ＳＩＭＤ演算については、図５６乃至図５９等を参照して詳細に説明する。 As shown in the figure, when the multi-core processor 800 performs the operations, the arithmetic processor cores (#1) 811 to (#8) 818 can be used in parallel to perform many operations in a relatively short time. In addition, by using SIMD (Single Instruction/Multiple Data) operations on the arithmetic processor cores (#1) 811 to (#8) 818, a relatively large amount of arithmetic processing can be performed with an even smaller number of instructions. SIMD operations will be described in detail with reference to FIGS. 56 to 59 and elsewhere.
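The division of work between the control processor core and the arithmetic processor cores can be illustrated with a minimal Python sketch. It is only an analogy: worker threads stand in for the eight arithmetic processor cores, and the function names `sub_task` and `run_task`, the slicing of the data into one contiguous part per core, and the summation used as the sub-task are all assumptions made for illustration, not details from the source.

```python
from concurrent.futures import ThreadPoolExecutor

NUM_ARITHMETIC_CORES = 8  # arithmetic processor cores (#1) 811 to (#8) 818


def sub_task(data_part):
    """Hypothetical stand-in for task 786: sum the values of one data slice."""
    return sum(data_part)


def run_task(data):
    """Hypothetical stand-in for task 784 running on the control processor core."""
    # Split data 785 into one contiguous part (data 787) per core (assumed split).
    step = max(1, -(-len(data) // NUM_ARITHMETIC_CORES))  # ceiling division
    parts = [data[i:i + step] for i in range(0, len(data), step)]
    # Each worker thread plays the role of one arithmetic processor core.
    with ThreadPoolExecutor(max_workers=NUM_ARITHMETIC_CORES) as pool:
        partial_results = list(pool.map(sub_task, parts))
    # The control side combines the partial results.
    return sum(partial_results)


frame = list(range(64))  # hypothetical frame data
total = run_task(frame)
```

On the real processor each core also applies SIMD operations inside its own sub-task, which this sketch does not model.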

図５３は、本発明の実施の形態におけるマルチコアプロセッサ８００により演算を行う場合におけるプログラムおよびデータの流れを模式的に示す図である。ここでは、演算プロセッサコア（＃１）８１１乃至（＃８）８１８のうちの演算プロセッサコア（＃１）８１１を例にして説明するが、演算プロセッサコア（＃２）８１２乃至（＃８）８１８についても同様に行うことができる。 FIG. 53 is a diagram schematically showing the flow of programs and data when operations are performed by the multi-core processor 800 in the embodiment of the present invention. The description here takes the arithmetic processor core (#1) 811 among the arithmetic processor cores (#1) 811 to (#8) 818 as an example, but the same applies to the arithmetic processor cores (#2) 812 to (#8) 818.

最初に、制御プロセッサコア８０１は、メインメモリ７８１に格納されている演算プロセッサコアプログラム８２３を演算プロセッサコア（＃１）８１１のローカルストア８２１にロードする指示を演算プロセッサコア（＃１）８１１に送る。これにより、演算プロセッサコア（＃１）８１１は、メインメモリ７８１に格納されている演算プロセッサコアプログラム８２３をローカルストア８２１にロードする。 First, the control processor core 801 sends the arithmetic processor core (#1) 811 an instruction to load the arithmetic processor core program 823 stored in the main memory 781 into the local store 821 of the arithmetic processor core (#1) 811. In response, the arithmetic processor core (#1) 811 loads the arithmetic processor core program 823 stored in the main memory 781 into the local store 821.

続いて、制御プロセッサコア８０１は、ローカルストア８２１に格納された演算プロセッサコアプログラム８２５の実行を演算プロセッサコア（＃１）８１１に指示する。 Subsequently, the control processor core 801 instructs the arithmetic processor core (# 1) 811 to execute the arithmetic processor core program 825 stored in the local store 821.

続いて、演算プロセッサコア（＃１）８１１は、ローカルストア８２１に格納された演算プロセッサコアプログラム８２５の実行処理に必要なデータ８２４をメインメモリ７８１からローカルストア８２１に転送する。 Subsequently, the arithmetic processor core (# 1) 811 transfers data 824 necessary for execution processing of the arithmetic processor core program 825 stored in the local store 821 from the main memory 781 to the local store 821.

続いて、演算プロセッサコア（＃１）８１１は、ローカルストア８２１に格納された演算プロセッサコアプログラム８２５に基づいて、メインメモリ７８１から転送されたデータ８２６を加工し、条件に応じた処理を実行して処理結果をローカルストア８２１に格納する。 Subsequently, the arithmetic processor core (# 1) 811 processes the data 826 transferred from the main memory 781 based on the arithmetic processor core program 825 stored in the local store 821, and executes processing according to the conditions. The processing result is stored in the local store 821.

続いて、演算プロセッサコア（＃１）８１１は、ローカルストア８２１に格納された演算プロセッサコアプログラム８２５に基づいて実行された処理結果をローカルストア８２１からメインメモリ７８１に転送する。 Subsequently, the arithmetic processor core (# 1) 811 transfers the processing result executed based on the arithmetic processor core program 825 stored in the local store 821 from the local store 821 to the main memory 781.

続いて、演算プロセッサコア（＃１）８１１は、制御プロセッサコア８０１に演算処理の終了を通知する。 Subsequently, the arithmetic processor core (# 1) 811 notifies the control processor core 801 of the end of the arithmetic processing.
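The sequence of steps above (load the program, instruct execution, transfer data in, process, transfer the result out, notify) can be traced with a small Python model. This is a sketch under stated assumptions: dictionaries stand in for the main memory 781 and the local store 821, and the class `ArithmeticCore`, its methods, and the `double_value` program are hypothetical names, not part of the source.

```python
class ArithmeticCore:
    """Toy model of one arithmetic processor core with its local store."""

    def __init__(self, main_memory):
        self.main_memory = main_memory  # dict standing in for main memory 781
        self.local_store = {}           # dict standing in for local store 821

    def load_program(self, name):
        # Step 1: copy the core program from main memory into the local store.
        self.local_store["program"] = self.main_memory[name]

    def run(self, input_key, output_key):
        # Step 3: transfer the required data from main memory to the local store.
        self.local_store["data"] = list(self.main_memory[input_key])
        # Step 4: process the data with the loaded program; keep the result locally.
        program = self.local_store["program"]
        self.local_store["result"] = [program(v) for v in self.local_store["data"]]
        # Step 5: transfer the result from the local store back to main memory.
        self.main_memory[output_key] = list(self.local_store["result"])
        # Step 6: notify the control side that processing has finished.
        return "done"


def double_value(v):
    """Hypothetical core program."""
    return 2 * v


main_memory = {"core_program": double_value, "input": [1, 2, 3]}
core = ArithmeticCore(main_memory)
core.load_program("core_program")     # control core issues the load instruction
status = core.run("input", "output")  # step 2: control core instructs execution
```

The copies made with `list(...)` mimic the fact that the core works only on data held in its own local store and never on main memory directly.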

次に、マルチコアプロセッサ８００を用いて行うＳＩＭＤ演算について図面を参照して詳細に説明する。ここで、ＳＩＭＤ演算とは、複数のデータに対する処理を１つの命令で行う演算方式である。 Next, SIMD operations performed using the multi-core processor 800 will be described in detail with reference to the drawings. Here, the SIMD operation is an operation method in which processing for a plurality of data is performed with one instruction.

図５４（ａ）は、複数のデータに対する処理をそれぞれの命令で行う演算方式の概要を模式的に示す図である。図５４（ａ）に示す演算方式は、通常の演算方式であり、例えば、スカラー演算と呼ばれている。例えば、データ「Ａ１」およびデータ「Ｂ１」を加算する命令によりデータ「Ｃ１」の処理結果が求められる。また、他の３つの演算についても同様に、同一の行にあるデータ「Ａ２」、「Ａ３」、「Ａ４」と、データ「Ｂ２」、「Ｂ３」、「Ｂ４」とを加算する命令がそれぞれの処理について行われ、この命令により、各行の値が加算処理され、この処理結果がデータ「Ｃ２」、「Ｃ３」、「Ｃ４」として求められる。このように、スカラー演算では、複数のデータに対する処理については、それぞれに対して命令を行う必要がある。 FIG. 54(a) is a diagram schematically showing an operation method in which a separate instruction is issued for each of a plurality of data items. The method shown in FIG. 54(a) is the ordinary method and is called, for example, a scalar operation. For example, an instruction to add data "A1" and data "B1" yields the result data "C1". The other three operations work the same way: a separate instruction adds the data "A2", "A3", and "A4" to the data "B2", "B3", and "B4" in the same row, and the results are obtained as data "C2", "C3", and "C4". Thus, in a scalar operation, processing a plurality of data items requires issuing an instruction for each of them.

図５４（ｂ）は、複数のデータに対する処理を１つの命令で行う演算方式であるＳＩＭＤ演算の概要を模式的に示す図である。ここで、ＳＩＭＤ演算用に１まとまりにしたデータ（点線８２７および８２８で囲まれる各データ）は、ベクターデータと呼ばれることがある。また、このようなベクターデータを用いて行われるＳＩＭＤ演算は、ベクトル演算と呼ばれることがある。 FIG. 54B is a diagram schematically showing an outline of SIMD calculation, which is an arithmetic method for performing processing on a plurality of data with one instruction. Here, data grouped for SIMD calculation (each data surrounded by dotted lines 827 and 828) may be referred to as vector data. In addition, SIMD calculation performed using such vector data may be referred to as vector calculation.

例えば、点線８２７で囲まれるベクターデータ（「Ａ１」、「Ａ２」、「Ａ３」、「Ａ４」）と、点線８２８で囲まれるベクターデータ（「Ｂ１」、「Ｂ２」、「Ｂ３」、「Ｂ４」）とを加算する１つの命令により「Ｃ１」、「Ｃ２」、「Ｃ３」、「Ｃ４」の処理結果（点線８２９で囲まれているデータ）が求められる。このように、ＳＩＭＤ演算では、複数のデータに対する処理を１つの命令で行うことができるため、演算処理を迅速に行うことができる。また、これらのＳＩＭＤ演算に関する命令を、マルチコアプロセッサ８００の制御プロセッサコア８０１が行い、この命令に対する複数データの演算処理について演算プロセッサコア（＃１）８１１乃至（＃８）８１８が並列処理を行う。 For example, a single instruction that adds the vector data enclosed by the dotted line 827 ("A1", "A2", "A3", "A4") and the vector data enclosed by the dotted line 828 ("B1", "B2", "B3", "B4") yields the results "C1", "C2", "C3", and "C4" (the data enclosed by the dotted line 829). Because a SIMD operation processes a plurality of data items with one instruction, the arithmetic processing can be performed quickly. The control processor core 801 of the multi-core processor 800 issues the instructions for these SIMD operations, and the arithmetic processor cores (#1) 811 to (#8) 818 process the plurality of data items for those instructions in parallel.

一方、例えば、データ「Ａ１」と「Ｂ１」とを加算し、データ「Ａ２」と「Ｂ２」とを減算し、データ「Ａ３」と「Ｂ３」とを乗算し、データ「Ａ４」と「Ｂ４」とを除算する処理については、ＳＩＭＤ演算では行うことができない。すなわち、複数のデータのそれぞれに対して異なる処理をする場合には、ＳＩＭＤ演算による処理を行うことがではできない。 On the other hand, a process that, for example, adds data "A1" and "B1", subtracts data "A2" and "B2", multiplies data "A3" and "B3", and divides data "A4" and "B4" cannot be performed by a SIMD operation. That is, when a different operation is to be applied to each of a plurality of data items, the processing cannot be performed by a SIMD operation.
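The contrast between scalar and SIMD operations can be sketched in plain Python. The code only models the instruction count conceptually: `simd_apply` is a hypothetical helper that represents one SIMD instruction applying the same operation to every lane, and the numeric values are made up for illustration.

```python
import operator


def scalar_add(a, b):
    """One add instruction for one pair of values."""
    return a + b


def simd_apply(op, vec_a, vec_b):
    """Model of one SIMD instruction: the same operation on every lane."""
    if len(vec_a) != len(vec_b):
        raise ValueError("vector operands must have the same number of lanes")
    return [op(x, y) for x, y in zip(vec_a, vec_b)]


A = [1, 2, 3, 4]      # vector data A1..A4
B = [10, 20, 30, 40]  # vector data B1..B4

# Scalar operation: four separate add instructions, one per pair.
C_scalar = [scalar_add(A[i], B[i]) for i in range(4)]

# SIMD operation: one add instruction covers all four lanes.
C_simd = simd_apply(operator.add, A, B)

# A different operation per lane does not fit the single-instruction model,
# so it falls back to one scalar instruction per lane.
mixed_ops = [operator.add, operator.sub, operator.mul, operator.truediv]
C_mixed = [op(x, y) for op, x, y in zip(mixed_ops, A, B)]
```

The last case corresponds to the add, subtract, multiply, divide example in the text: because `simd_apply` accepts only one operation, the mixed case has to be written lane by lane.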

次に、特徴点抽出処理およびオプティカルフロー算出処理を行う場合におけるＳＩＭＤ演算の具体的な演算方法について図面を参照して詳細に説明する。 Next, a specific calculation method of SIMD calculation in the case of performing feature point extraction processing and optical flow calculation processing will be described in detail with reference to the drawings.

図５５は、本発明の実施の形態における制御プロセッサコア８０１または演算プロセッサコア（＃１）８１１により実行されるプログラムの構成例を示す図である。ここでは、演算プロセッサコア（＃１）８１１についてのみ図示するが、演算プロセッサコア（＃２）８１２乃至（＃８）８１８についても同様の処理が行われる。 FIG. 55 is a diagram showing a configuration example of a program executed by the control processor core 801 or the arithmetic processor core (# 1) 811 in the embodiment of the present invention. Although only the arithmetic processor core (# 1) 811 is illustrated here, the same processing is performed for the arithmetic processor cores (# 2) 812 to (# 8) 818.

制御プロセッサコア８０１は、デコード８５１としてデコード８５２、インターレース８５３およびリサイズ８５４を実行する。デコード８５２は、動画ファイルをデコードする処理である。インターレース８５３は、デコードされた各フレームについてインターレース除去する処理である。リサイズ８５４は、インターレース除去された各フレームについて縮小する処理である。 The control processor core 801 executes decode 852, interlace 853, and resize 854 as the decode 851. The decode 852 is a process for decoding a moving image file. The interlace 853 is a process for removing the interlace for each decoded frame. Resizing 854 is a process of reducing each frame from which the interlace is removed.

また、制御プロセッサコア８０１は、演算プロセッサコア管理８５６として命令送信８５７および８５９、終了通知受信８５８および８６０を実行する。命令送信８５７および８５９は、演算プロセッサコア（＃１）８１１乃至（＃８）８１８に対するＳＩＭＤ演算の実行命令を送信する処理であり、終了通知受信８５８および８６０は、上記命令に対する演算プロセッサコア（＃１）８１１乃至（＃８）８１８からのＳＩＭＤ演算の終了通知を受信する処理である。さらに、制御プロセッサコア８０１は、カメラワーク検出８６１としてカメラワークパラメータ算出処理８６２を実行する。カメラワークパラメータ算出処理８６２は、演算プロセッサコア（＃１）８１１乃至（＃８）８１８によるＳＩＭＤ演算により算出されたオプティカルフローに基づいてフレーム毎にアフィン変換パラメータを算出する処理である。 The control processor core 801 also executes instruction transmissions 857 and 859 and end notification receptions 858 and 860 as arithmetic processor core management 856. The instruction transmissions 857 and 859 send SIMD operation execution instructions to the arithmetic processor cores (#1) 811 to (#8) 818, and the end notification receptions 858 and 860 receive the SIMD operation end notifications that the arithmetic processor cores (#1) 811 to (#8) 818 return for those instructions. Furthermore, the control processor core 801 executes a camera work parameter calculation process 862 as camera work detection 861. The camera work parameter calculation process 862 calculates affine transformation parameters for each frame based on the optical flow calculated by the SIMD operations of the arithmetic processor cores (#1) 811 to (#8) 818.

演算プロセッサコア（＃１）８１１は、特徴点抽出処理８６３として、ソベルフィルタ（Sobel Filter）処理８６４、二次モーメント行列（Second Moment Matrix）処理８６５、セパラブルフィルタ（Separable Filter）処理８６６、ハリスコーナー点抽出（Calc Harris）処理８６７、膨張処理（Dilation）８６８、並べ替え処理（Sort）８６９を実行する。 The arithmetic processor core (# 1) 811 includes, as a feature point extraction process 863, a Sobel filter process 864, a second moment matrix process 865, a separable filter process 866, and a Harris corner. A point extraction (Calc Harris) process 867, a dilation process (Dilation) 868, and a rearrangement process (Sort) 869 are executed.

ソベルフィルタ処理８６４は、Ｐ２のフィルタ（ｘ方向）を使って得られるｘ方向の値ｄｘと、Ｙ方向のフィルタを使って得られるｙ方向の値ｄｙとを算出する処理である。なお、ｘ方向の値ｄｘの算出については、図５６乃至図５９を参照して詳細に説明する。 The Sobel filter process 864 is a process of calculating the value dx in the x direction obtained using the P2 filter (x direction) and the value dy in the y direction obtained using the Y direction filter. The calculation of the value dx in the x direction will be described in detail with reference to FIGS. 56 to 59.

二次モーメント行列処理８６５は、ソベルフィルタ処理８６４により算出されたｄｘおよびｄｙを用いて、ｄｘ^２，ｄｙ^２，ｄｘ・ｄｙの各値を算出する処理である。 The second moment matrix process 865 is a process of calculating the values dx², dy², and dx·dy using the dx and dy calculated by the Sobel filter process 864.

セパラブルフィルタ処理８６６は、二次モーメント行列処理８６５により算出されたｄｘ^２，ｄｙ^２，ｄｘ・ｄｙの画像に対してガウシアンフィルタ（ぼかし処理）を掛ける処理である。 The separable filter process 866 is a process of applying a Gaussian filter (blurring process) to the dx², dy², and dx·dy images calculated by the second moment matrix process 865.
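The two steps above can be sketched in pure Python as follows. The per-pixel products follow the text directly. The blur is only an illustration: the source does not give the Gaussian kernel, so the 3-tap kernel (0.25, 0.5, 0.25), the edge handling by repeating border values, and all function names are assumptions.

```python
def second_moment_products(dx, dy):
    """Per-pixel dx^2, dy^2 and dx*dy from two gradient images (lists of rows)."""
    dxx = [[a * a for a in row] for row in dx]
    dyy = [[b * b for b in row] for row in dy]
    dxy = [[a * b for a, b in zip(ra, rb)] for ra, rb in zip(dx, dy)]
    return dxx, dyy, dxy


KERNEL = (0.25, 0.5, 0.25)  # assumed 3-tap approximation of a Gaussian


def blur_1d(values):
    """Apply the 3-tap kernel along one line, repeating the edge values."""
    n = len(values)
    out = []
    for i in range(n):
        left = values[max(i - 1, 0)]
        right = values[min(i + 1, n - 1)]
        out.append(KERNEL[0] * left + KERNEL[1] * values[i] + KERNEL[2] * right)
    return out


def separable_blur(image):
    """Blur an image with a horizontal 1-D pass followed by a vertical 1-D pass."""
    rows = [blur_1d(row) for row in image]             # horizontal pass
    cols = [blur_1d(list(col)) for col in zip(*rows)]  # vertical pass on columns
    return [list(row) for row in zip(*cols)]           # transpose back to rows
```

A separable filter is attractive here because two 1-D passes need fewer multiplications per pixel than one 2-D pass with the equivalent kernel, and each 1-D pass maps naturally onto lane-wise SIMD operations.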

ハリスコーナー点抽出処理８６７は、セパラブルフィルタ処理８６６により、ぼかし処理が施されたｄｘ^２，ｄｙ^２，ｄｘ・ｄｙの各値を用いて、ハリスコーナーのスコアを算出する処理である。このハリスコーナーのスコアＳは、例えば、次の式により算出される。 The Harris corner point extraction process 867 is a process of calculating a Harris corner score using the values dx², dy², and dx·dy that have been blurred by the separable filter process 866. The Harris corner score S is calculated, for example, by the following equation.
S = (dx² × dy² − dx·dy × dx·dy) / (dx² + dy² + ε)
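The score equation can be written as a small Python function. This is a direct transcription of the formula for a single pixel; the function name `harris_score` and the default value of ε are assumptions, since the source does not state a value.

```python
def harris_score(dxx, dyy, dxy, eps=1e-6):
    """Harris corner score S for one pixel.

    dxx, dyy and dxy are the blurred values of dx^2, dy^2 and dx*dy.
    eps keeps the denominator non-zero in flat regions (value assumed).
    """
    return (dxx * dyy - dxy * dxy) / (dxx + dyy + eps)


corner = harris_score(4.0, 4.0, 0.0)  # strong gradients in both directions
edge = harris_score(4.0, 0.0, 0.0)    # gradient in one direction only
flat = harris_score(0.0, 0.0, 0.0)    # no gradient
```

With these inputs the corner case scores higher than the edge and flat cases, which both evaluate to zero; this is the property the later sorting step relies on.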

膨張処理８６８は、ハリスコーナー点抽出処理８６７により算出されたハリスコーナーのスコアで構成された画像に対してぼかし処理を行う処理である。 The dilation process 868 is a process of performing a blurring process on the image composed of the Harris corner scores calculated by the Harris corner point extraction process 867.

並べ替え処理８６９は、ハリスコーナー点抽出処理８６７により算出されたハリスコーナーのスコアが高い順に画素を並べ、このスコアが高い方から所定の数だけピックアップし、このピックアップされた点を特徴点として抽出する処理である。 The rearrangement processing 869 arranges pixels in descending order of the Harris corner score calculated by the Harris corner point extraction processing 867, picks up a predetermined number from the higher score, and extracts the picked points as feature points. It is processing to do.
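The sort-and-pick step can be sketched as follows. The function name `select_feature_points` and the representation of the score image as a list of rows are assumptions; ties are resolved in raster order because Python's sort is stable.

```python
def select_feature_points(score_image, count):
    """Return the (x, y) positions of the `count` highest-scoring pixels."""
    scored = [
        (score, x, y)
        for y, row in enumerate(score_image)
        for x, score in enumerate(row)
    ]
    # Highest score first; equal scores keep their raster order.
    scored.sort(key=lambda item: item[0], reverse=True)
    return [(x, y) for _, x, y in scored[:count]]


scores = [
    [0.1, 0.9],
    [0.5, 0.3],
]
feature_points = select_feature_points(scores, 2)
```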

演算プロセッサコア（＃１）８１１は、オプティカルフロー（Optical Flow）演算処理８７０として、ピラミッド画像（Make Pyramid Image）処理８７１、オプティカルフロー算出（Calc Optical Flow）処理８７２を実行する。 The arithmetic processor core (# 1) 811 performs a pyramid image (Make Pyramid Image) process 871 and an optical flow calculation (Calc Optical Flow) process 872 as an optical flow calculation process 870.

ピラミッド画像処理８７１は、撮像装置による撮像時の画サイズから所定数の段階に縮小された画像を順次作成する処理であり、作成された画像は多重解像度画像と呼ばれる。 The pyramid image process 871 is a process of sequentially creating images reduced to a predetermined number of stages from the image size at the time of imaging by the imaging device, and the created image is called a multi-resolution image.

オプティカルフロー算出処理８７２は、ピラミッド画像処理８７１により作成された多重解像度画像のうちで、最も小さい画像についてオプティカルフローを計算し、この計算結果を用いて、１つ上の解像度の画像について再びオプティカルフローを計算する処理であり、この一連の処理を最も大きい画像に辿り着くまで繰り返し行う。 The optical flow calculation processing 872 calculates an optical flow for the smallest image among the multi-resolution images created by the pyramid image processing 871, and uses this calculation result again for the optical flow for the image of the next higher resolution. This series of processing is repeated until the largest image is reached.
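The pyramid construction and the coarse-to-fine iteration order can be sketched as below. Several details are assumptions: the source says only that the image is reduced in a predetermined number of stages, so the reduction by a factor of two with 2×2 averaging is illustrative, and `estimate_flow` is a hypothetical callback that stands in for the actual optical flow computation at one resolution.

```python
def downsample_by_two(image):
    """Halve width and height by averaging 2x2 blocks (assumed reduction)."""
    height, width = len(image) // 2, len(image[0]) // 2
    return [
        [
            (image[2 * y][2 * x] + image[2 * y][2 * x + 1]
             + image[2 * y + 1][2 * x] + image[2 * y + 1][2 * x + 1]) / 4.0
            for x in range(width)
        ]
        for y in range(height)
    ]


def make_pyramid(image, levels):
    """Return the multi-resolution images, largest first."""
    pyramid = [image]
    for _ in range(levels - 1):
        pyramid.append(downsample_by_two(pyramid[-1]))
    return pyramid


def coarse_to_fine(pyramid, estimate_flow):
    """Run estimate_flow from the smallest image up to the largest one.

    estimate_flow(image, previous_flow) is a hypothetical callback; the result
    from each level is passed on as the starting point for the next level.
    """
    flow = None
    for level in reversed(pyramid):
        flow = estimate_flow(level, flow)
    return flow
```

Starting at the smallest image keeps large motions within a few pixels, and each finer level then only has to refine the estimate it receives.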

このように、例えば、図４８に示す特徴点抽出部４８１により行われる特徴点抽出処理と、オプティカルフロー計算部４８２により行われるオプティカルフロー算出処理とについては、マルチコアプロセッサ８００を用いてＳＩＭＤ演算によって並列処理することにより処理結果を求めることができる。なお、図５５等で示す特徴点抽出処理およびオプティカルフロー算出処理は、一例であり、動画を構成する画像に対する各種フィルタ処理や閾値処理等により構成される他の処理を用いて、マルチコアプロセッサ８００によるＳＩＭＤ演算を行うようにしてもよい。 Thus, for example, the feature point extraction processing performed by the feature point extraction unit 481 shown in FIG. 48 and the optical flow calculation processing performed by the optical flow calculation unit 482 are performed in parallel by SIMD calculation using the multi-core processor 800. Processing results can be obtained by processing. Note that the feature point extraction process and the optical flow calculation process shown in FIG. 55 and the like are examples, and the multicore processor 800 uses other processes configured by various filter processes, threshold processes, and the like for the images constituting the moving image. SIMD calculation may be performed.

図５６は、本発明の実施の形態におけるメインメモリ７８１に格納されている画像データ（撮像装置により撮像された動画を構成する１つのフレームに対応する画像データ）について、ソベルフィルタ８３０を用いてフィルタリング処理を行う場合におけるデータ構造と処理の流れを概略的に示す図である。なお、同図に示すメインメモリ７８１に格納されている画像データについては、横の画素数を３２画素として簡略化して示す。また、ソベルフィルタ８３０は、３×３のエッジ抽出フィルタである。同図に示すように、メインメモリ７８１に格納されている画像データについて、ソベルフィルタ８３０を用いたフィルタリング処理を行い、このフィルタリング処理の結果が出力される。この例では、ＳＩＭＤ演算を用いて４つ分のフィルタ結果を一度に得る例について説明する。 FIG. 56 shows filtering using the Sobel filter 830 for image data (image data corresponding to one frame constituting a moving image captured by the imaging device) stored in the main memory 781 in the embodiment of the present invention. It is a figure which shows roughly the data structure in the case of performing a process, and the flow of a process. Note that the image data stored in the main memory 781 shown in the figure is shown in a simplified manner with 32 horizontal pixels. The Sobel filter 830 is a 3 × 3 edge extraction filter. As shown in the figure, filtering processing using a Sobel filter 830 is performed on the image data stored in the main memory 781, and the result of this filtering processing is output. In this example, an example will be described in which four filter results are obtained at a time using SIMD computation.

図５７は、本発明の実施の形態におけるメインメモリ７８１に格納されている画像データについてソベルフィルタ８３０を用いてＳＩＭＤ演算を行う場合におけるデータの流れを概略的に示す図である。最初は、メインメモリ７８１に格納されている画像データのうちの最初のラインを含む所定数のライン（例えば、３ライン）が演算プロセッサコアのローカルストア８２１に備えられる第一バッファ８３１にＤＭＡ（Direct Memory Access）転送されるとともに、第一バッファ８３１にＤＭＡ転送された各ラインを１つ下にずらした所定数のラインが第二バッファ８３２にＤＭＡ転送される。このように、ダブルバッファを使用することにより、ＤＭＡ転送による遅延を補うことができる。 FIG. 57 is a diagram schematically showing a data flow when SIMD calculation is performed using the Sobel filter 830 on the image data stored in the main memory 781 according to the embodiment of the present invention. Initially, a predetermined number of lines (for example, 3 lines) including the first line of the image data stored in the main memory 781 is transferred to the first buffer 831 provided in the local store 821 of the arithmetic processor core by DMA (Direct Memory Access) and a predetermined number of lines obtained by shifting down each line DMA-transferred to the first buffer 831 by one is DMA-transferred to the second buffer 832. Thus, by using the double buffer, the delay due to DMA transfer can be compensated .

図５８は、本発明の実施の形態におけるソベルフィルタ８３０を用いてフィルタリング処理を行う場合において、第一バッファ８３１に格納されている画像データから９つのベクトルを作成するベクトル作成方法を概略的に示す図である。図５７に示すように、ＤＭＡ転送が行われた後に、第一バッファ８３１に格納されている画像データから９つのベクトルが作成される。具体的には、第一バッファ８３１に格納されている画像データの１ラインにおいて左隅から４つのデータによりベクターデータ８４１が作成され、その４つのデータを右側に１つずらした４つのデータによりベクターデータ８４２が作成され、同様に、その４つのデータを右側に１つずらした４つのデータによりベクターデータ８４３が作成される。また、２ラインおよび３ラインにおいても同様に４つのデータによりベクターデータ８４４乃至８４９が作成される。 FIG. 58 is a diagram schematically showing a vector creation method for creating nine vectors from the image data stored in the first buffer 831 when filtering is performed using the Sobel filter 830 in the embodiment of the present invention. After the DMA transfer has been performed as shown in FIG. 57, nine vectors are created from the image data stored in the first buffer 831. Specifically, in line 1 of the image data stored in the first buffer 831, vector data 841 is created from the four data items starting at the left edge, vector data 842 is created from the four data items shifted one position to the right, and vector data 843 is likewise created from the four data items shifted one further position to the right. Vector data 844 to 849 are created in the same way from four data items each in lines 2 and 3.

図５９は、本発明の実施の形態におけるソベルフィルタ８３０を用いてフィルタリング処理を行う場合において、ベクターデータ８４１乃至８４９についてＳＩＭＤ演算を用いてベクトル演算を行うベクトル演算方法を概略的に示す図である。具体的には、ベクターデータ８４１乃至８４３についてＳＩＭＤ演算が順次行われ、ベクトルＡが求められる。このＳＩＭＤ演算では、最初に、『「−１」×「ベクターデータ８４１」』のＳＩＭＤ演算が実行される。続いて、『「０」×「ベクターデータ８４２」』のＳＩＭＤ演算が実行され、『「１」×「ベクターデータ８４３」』のＳＩＭＤ演算が実行される。ここで、『「０」×「ベクターデータ８４２」』については、演算結果が「０」であると確定しているため、省略することが可能である。また、『「１」×「ベクターデータ８４３」』については、演算結果が「ベクターデータ８４３」と同じ値であることが確定しているため、省略することが可能である。 FIG. 59 is a diagram schematically showing a vector operation method in which vector operations are performed on the vector data 841 to 849 using SIMD operations when filtering is performed using the Sobel filter 830 in the embodiment of the present invention. Specifically, SIMD operations are performed in sequence on the vector data 841 to 843 to obtain a vector A. First, the SIMD operation "-1" × "vector data 841" is executed. Next, the SIMD operation "0" × "vector data 842" is executed, followed by the SIMD operation "1" × "vector data 843". Here, "0" × "vector data 842" can be omitted because its result is known to be 0. Likewise, "1" × "vector data 843" can be omitted because its result is known to equal vector data 843.

続いて、『「−１」×「ベクターデータ８４１」』の演算結果と、『「０」×「ベクターデータ８４２」』の演算結果との加算処理がＳＩＭＤ演算により実行される。続いて、この加算処理の結果と、『「１」×「ベクターデータ８４３」』の演算結果との加算処理がＳＩＭＤ演算により実行される。ここで、例えば、「ベクターデータ１」×「ベクターデータ２」＋「ベクターデータ３」となるデータ構造の演算については、ＳＩＭＤ演算により実行することが可能である。そこで、ベクトルＡの演算については、例えば、『「０」×「ベクターデータ８４２」』および『「１」×「ベクターデータ８４３」』についてのＳＩＭＤ演算を省略し、『「−１」×「ベクターデータ８４１」＋「ベクターデータ８４３」』を一度のＳＩＭＤ演算により実行するようにしてもよい。 Next, the result of "-1" × "vector data 841" and the result of "0" × "vector data 842" are added by a SIMD operation. The result of that addition and the result of "1" × "vector data 843" are then added by another SIMD operation. Here, an operation of the form "vector data 1" × "vector data 2" + "vector data 3" can be executed as a single SIMD operation. For the vector A, therefore, the SIMD operations for "0" × "vector data 842" and "1" × "vector data 843" may be omitted, and "-1" × "vector data 841" + "vector data 843" may be executed as a single SIMD operation.

また、同様に、ベクターデータ８４４乃至８４６についてＳＩＭＤ演算が行われ、ベクトルＢが求められ、ベクターデータ８４７乃至８４９についてＳＩＭＤ演算が行われ、ベクトルＣが求められる。 Similarly, the SIMD operation is performed on the vector data 844 to 846 to determine the vector B, and the SIMD operation is performed on the vector data 847 to 849 to determine the vector C.

続いて、ＳＩＭＤ演算により求められたベクトルＡ乃至ＣについてＳＩＭＤ演算が行われ、ベクトルＤが求められる。このように、ＳＩＭＤ演算を行うことにより、ベクトルの要素数分（この例では４つのデータ）の結果をまとめて得ることができる。 Subsequently, the SIMD operation is performed on the vectors A to C obtained by the SIMD operation, and the vector D is obtained. As described above, by performing the SIMD operation, the results corresponding to the number of elements of the vector (four data in this example) can be collectively obtained.
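The nine-vector scheme can be sketched in plain Python with four-element lists standing in for SIMD registers. Two points are assumptions rather than statements from the source: the text gives only the coefficients -1, 0, 1 for the first line, so the standard x-direction Sobel coefficients are assumed for lines 2 and 3, and the vector D is assumed to be the lane-wise sum of A, B, and C. The names `vec_madd` and `sobel_x_four` are hypothetical.

```python
LANES = 4  # four filter results are produced at a time

# Assumed standard x-direction Sobel coefficients, one row per image line.
SOBEL_X = ((-1, 0, 1), (-2, 0, 2), (-1, 0, 1))


def vec_madd(scalar, vec, acc):
    """Model of one SIMD multiply-add: scalar * vec + acc, lane by lane."""
    return [scalar * v + a for v, a in zip(vec, acc)]


def sobel_x_four(lines, x):
    """Return four x-direction results whose 3x3 windows start at columns x..x+3.

    `lines` holds three image lines; each needs at least x + 6 pixels.
    """
    result = [0] * LANES  # accumulates vector D
    for line, coeffs in zip(lines, SOBEL_X):
        # Three vectors per line, each shifted one pixel to the right
        # (vector data 841-843, 844-846, 847-849).
        shifted = [line[x + k:x + k + LANES] for k in range(3)]
        row_sum = [0] * LANES  # vector A, B, or C
        for coeff, vec in zip(coeffs, shifted):
            if coeff == 0:
                continue  # a zero coefficient contributes nothing, so skip it
            row_sum = vec_madd(coeff, vec, row_sum)
        result = [r + s for r, s in zip(result, row_sum)]
    return result


ramp = [0, 1, 2, 3, 4, 5]  # hypothetical pixel values rising left to right
d = sobel_x_four([ramp, ramp, ramp], 0)
```

Skipping the zero coefficient and folding the multiply into the add mirrors the shortcut described in the text, where the multiplications by 0 and 1 are omitted and the remaining work is done as a single multiply-add.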

ベクトルＤが算出された後は、図５７に示す第一バッファ８３１に格納されている画像データにおいて、取り出すデータの位置を右側に１つずらしながら、同様の処理を繰り返し実行して、それぞれのベクトルＤの算出を順次行う。そして、図５７に示す第一バッファ８３１に格納されている画像データの右端までの処理が終了した場合には、処理結果をメインメモリ７８１にＤＭＡ転送する。 After the vector D has been calculated, the same processing is repeated on the image data stored in the first buffer 831 shown in FIG. 57 while the position of the extracted data is shifted one to the right each time, so that each vector D is calculated in turn. When the processing has reached the right end of the image data stored in the first buffer 831 shown in FIG. 57, the processing results are DMA-transferred to the main memory 781.

続いて、メインメモリ７８１に格納されている画像データのうちで、第二バッファ８３２にＤＭＡ転送された各ラインを１つ下にずらした所定数のラインが第一バッファ８３１にＤＭＡ転送されるとともに、第二バッファ８３２に格納されている画像データについて、上述した処理を繰り返し行う。そして、メインメモリ７８１に格納されている画像データの各ラインのうちの下端のラインに達するまで、同様の処理を繰り返し行う。 Subsequently, among the image data stored in the main memory 781, a predetermined number of lines obtained by shifting down each line DMA-transferred to the second buffer 832 by one is DMA-transferred to the first buffer 831. The above-described processing is repeated for the image data stored in the second buffer 832. The same processing is repeated until the lowermost line of the lines of image data stored in the main memory 781 is reached.
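The alternation between the two buffers can be traced with a short Python sketch. It models only the order of operations: list slices stand in for DMA transfers, and the transfers happen sequentially here, whereas on the real hardware the transfer into one buffer overlaps with the processing of the other. The function name `process_with_double_buffer` and the `process_window` callback are hypothetical.

```python
def process_with_double_buffer(image, process_window, window_lines=3):
    """Process every window of `window_lines` consecutive lines, top to bottom."""
    results = []
    last_start = len(image) - window_lines
    if last_start < 0:
        return results  # image has fewer lines than one window
    buffers = [None, None]
    buffers[0] = image[0:window_lines]  # initial transfer into the first buffer
    for start in range(last_start + 1):
        current = start % 2
        other = (start + 1) % 2
        # Fetch the window shifted down by one line into the other buffer.
        if start + 1 <= last_start:
            buffers[other] = image[start + 1:start + 1 + window_lines]
        # Process the buffer that is already filled.
        results.append(process_window(buffers[current]))
    return results


image = [[1], [2], [3], [4], [5]]  # hypothetical five-line image, one pixel wide
sums = process_with_double_buffer(image, lambda w: sum(row[0] for row in w))
```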

同様に、特徴点抽出とオプティカルフロー算出の大部分の処理をＳＩＭＤ演算により行うことによって高速化を実現することができる。 Similarly, high speed can be realized by performing most of the processing of feature point extraction and optical flow calculation by SIMD calculation.

図６０は、本発明の実施の形態におけるカメラワークパラメータ算出処理の流れを時系列で概略的に示す図である。上述したように、例えば、マルチコアプロセッサ８００を用いてＳＩＭＤ演算を行うことにより、動画についてのデコードおよび解析処理を並列化して行うことができる。このため、動画を構成する１フレームの解析時間を、デコード時間よりも短縮することが可能である。 FIG. 60 is a diagram schematically showing the flow of camera work parameter calculation processing in the embodiment of the present invention in time series. As described above, for example, by performing a SIMD operation using the multi-core processor 800, decoding and analysis processing for a moving image can be performed in parallel. For this reason, it is possible to shorten the analysis time of 1 frame which comprises a moving image rather than decoding time.

例えば、同図において、ｔ１は、制御プロセッサコア８０１が動画を構成する１フレームのデコード処理に要する時間を示し、ｔ２は、演算プロセッサコア（＃１）８１１乃至（＃８）８１８が動画を構成する１フレームの特徴点抽出処理に要する時間を示し、ｔ３は、演算プロセッサコア（＃１）８１１乃至（＃８）８１８が動画を構成する１フレームのオプティカルフロー算出処理に要する時間を示し、ｔ４は、制御プロセッサコア８０１が動画を構成する１フレームのカメラワーク検出処理に要する時間を示す。なお、ｔ５は、制御プロセッサコア８０１および演算プロセッサコア（＃１）８１１乃至（＃８）８１８が動画を構成する１フレームについて、カメラワーク検出処理に要する時間を示す。また、ｔ６は、制御プロセッサコア８０１が演算プロセッサコア（＃１）８１１乃至（＃８）８１８を管理する処理に要する時間を示す。例えば、ｔ１を「２５．０ｍｓ」とし、ｔ２を「７．９ｍｓ」とし、ｔ３を「６．７ｍｓ」とし、ｔ４を「１．２ｍｓ」とし、ｔ５を「１５．８ｍｓ」とすることができる。 For example, in the figure, t1 indicates the time the control processor core 801 needs to decode one frame of the moving image; t2 indicates the time the arithmetic processor cores (#1) 811 to (#8) 818 need for the feature point extraction process on one frame; t3 indicates the time those cores need for the optical flow calculation process on one frame; and t4 indicates the time the control processor core 801 needs for the camera work detection process on one frame. In addition, t5 indicates the time the control processor core 801 and the arithmetic processor cores (#1) 811 to (#8) 818 together need for the camera work detection processing of one frame, and t6 indicates the time the control processor core 801 needs to manage the arithmetic processor cores (#1) 811 to (#8) 818. For example, t1 can be 25.0 ms, t2 7.9 ms, t3 6.7 ms, t4 1.2 ms, and t5 15.8 ms.
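The example timings can be checked with a few lines of arithmetic. The variable names are assumptions; the values are the ones given in the text. The source does not state how t5 is composed, but the example values happen to satisfy t5 = t2 + t3 + t4, which the sketch verifies.

```python
t1_decode = 25.0   # ms, decoding of one frame (control processor core)
t2_feature = 7.9   # ms, feature point extraction (arithmetic processor cores)
t3_flow = 6.7      # ms, optical flow calculation (arithmetic processor cores)
t4_camera = 1.2    # ms, camera work detection (control processor core)
t5_total = 15.8    # ms, overall analysis time for one frame

# Sum of the three analysis stages, compared with the stated t5.
analysis_total = t2_feature + t3_flow + t4_camera
stages_match_t5 = abs(analysis_total - t5_total) < 1e-9

# Time left over per frame when analysis runs alongside decoding.
margin = t1_decode - t5_total
```

With these values the analysis of one frame finishes about 9.2 ms before the decoding of one frame does, which is the sense in which the analysis time is shorter than the decode time.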

次に、本発明の実施の形態におけるメタデータファイルを用いた動画コンテンツを再生する場合について図面を参照して詳細に説明する。 Next, a case of reproducing moving image content using a metadata file in the embodiment of the present invention will be described in detail with reference to the drawings.

図６１（ａ）は、記録媒体の一例であるブルーレイディスク（Blu-ray Disc（登録商標））８８０を模式的に示す上面図であり、図６１（ｂ）は、ブルーレイディスク８８０に記録されている各データ８８１乃至８８４を模式的に示す図である。ブルーレイディスク８８０には、例えば、撮像装置等により撮像された動画である動画コンテンツ８８２、動画コンテンツ８８２の字幕８８３、および、動画コンテンツ８８２について解析されて得られたメタデータ（例えば、図３０（ｂ）に示すメタデータファイル、図４３に示す相対関係メタデータファイル）８８４とともに、本発明の実施の形態における動画再生に係るＪａｖａ（登録商標）プログラム８８１が記録されている。 FIG. 61(a) is a top view schematically showing a Blu-ray Disc (registered trademark) 880, which is an example of a recording medium, and FIG. 61(b) is a diagram schematically showing the data 881 to 884 recorded on the Blu-ray Disc 880. Recorded on the Blu-ray Disc 880 are, for example, moving image content 882, which is a moving image captured by an imaging device or the like; subtitles 883 for the moving image content 882; and metadata 884 obtained by analyzing the moving image content 882 (for example, the metadata file shown in FIG. 30(b) and the relative relationship metadata file shown in FIG. 43), together with a Java (registered trademark) program 881 for the moving image reproduction according to the embodiment of the present invention.

図６１（ｃ）は、ブルーレイディスク８８０を再生可能なブルーレイ再生機（Blu-ray Disc Player）８９０の内部構成を模式的に示す図である。ここで、ブルーレイディスクを再生可能なブルーレイ再生機８９０は、ＣＰＵ８９１およびＯＳ８９２とともに、ＪａｖａＶＭ（Ｊａｖａ仮想マシン）およびライブラリ８９３が標準で搭載されているため、Ｊａｖａプログラムを実行することが可能である。このため、ブルーレイディスク８８０をブルーレイ再生機８９０に装着することにより、ブルーレイ再生機８９０がＪａｖａプログラム８８１をロードして実行することが可能である。これにより、ブルーレイ再生機８９０が動画コンテンツ８８２を再生する場合に、メタデータ８８４を用いて、本発明の実施の形態における動画再生を行うことが可能である。すなわち、専用のＰＣソフト等を使わずに、全てのブルーレイ再生機で本発明の実施の形態における動画再生を実現することが可能になる。 FIG. 61C is a diagram schematically showing an internal configuration of a Blu-ray player (Blu-ray Disc Player) 890 capable of playing the Blu-ray Disc 880. Here, the Blu-ray player 890 capable of playing a Blu-ray disc is equipped with a Java VM (Java Virtual Machine) and a library 893 as well as the CPU 891 and the OS 892, and thus can execute a Java program. Therefore, the Blu-ray player 890 can load and execute the Java program 881 by mounting the Blu-ray disc 880 on the Blu-ray player 890. Thereby, when the Blu-ray player 890 reproduces the moving image content 882, it is possible to reproduce the moving image according to the embodiment of the present invention using the metadata 884. That is, it is possible to realize the moving image reproduction according to the embodiment of the present invention on all Blu-ray players without using dedicated PC software or the like.

As described above, according to the embodiment of the present invention, the input sound can be converted in accordance with the position, angle, or magnification, on the display screen, of the image corresponding to the current frame of the moving image. The viewer of the moving image can thus hear sound appropriate to the position, angle, or magnification of the image corresponding to the current frame on the display screen. That is, a more realistic sound effect can be obtained.
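
One way to picture this conversion is a gain matrix that routes the input channels to the output channels according to where the current frame's image sits on the screen, with each output channel formed by adding up the gain-adjusted inputs. The Python sketch below assumes a two-channel input, a linear mapping from the horizontal center position to the gains, and an optional overall `level` factor standing in for the magnification. The function names and the mapping itself are hypothetical and do not reproduce the relationships used in the embodiment.

```python
def gain_matrix(center_x, screen_width, level=1.0):
    """Return gains m[out][in] for outputs (left, right) and inputs (left, right).

    Assumed mapping: an image centered on the screen passes both channels
    through unchanged; as the image moves toward one side, the opposite
    input channel is shifted progressively into that side's output.
    """
    position = min(max(center_x / screen_width, 0.0), 1.0)
    d = 2.0 * position - 1.0  # -1.0 = left edge, 0.0 = center, +1.0 = right edge
    if d >= 0.0:
        m = [[1.0 - d, 0.0],
             [d, 1.0]]
    else:
        m = [[1.0, -d],
             [0.0, 1.0 + d]]
    return [[level * g for g in row] for row in m]


def mix(left, right, m):
    """Adjust each input channel by its gain and add the results per output."""
    out_left = [m[0][0] * l + m[0][1] * r for l, r in zip(left, right)]
    out_right = [m[1][0] * l + m[1][1] * r for l, r in zip(left, right)]
    return out_left, out_right
```

With this mapping, an image at the right edge of the screen sends both input channels to the right output and silences the left output. A different curve or a center channel could be substituted without changing the adjust-then-add structure.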

Although the embodiment of the present invention has been described for a moving image captured by an imaging device, the embodiment can also be applied to, for example, a moving image obtained by editing a moving image captured by a camera, or a moving image into which animation or the like has been composited.

Although the embodiment of the present invention has been described for an image processing apparatus such as a personal computer, the embodiment can also be applied to, for example, a moving image reproduction apparatus such as a television.

The embodiment of the present invention can also be applied to a moving image viewing system that combines an audio device, a display device, and the like.

The embodiment of the present invention is an example for embodying the present invention, and its features correspond to the invention-specifying matters in the claims. However, the present invention is not limited to the embodiment, and various modifications can be made without departing from the gist of the present invention.

The processing procedures described in the embodiment of the present invention may be regarded as a method having this series of procedures, as a program for causing a computer to execute this series of procedures, or as a recording medium storing that program.

FIG. 1 is a block diagram showing a functional configuration example of the image processing apparatus 100 in the embodiment of the present invention.
FIG. 2 is a diagram showing an example of an image corresponding to a frame constituting a moving image.
FIG. 3 is a diagram showing an image simplified by omitting the background and the like from an image corresponding to a frame constituting a moving image.
FIG. 4 is a flowchart showing the processing procedure of the affine transformation parameter detection processing by the image processing apparatus 100 in the embodiment of the present invention.
FIG. 5 is a diagram showing an example of the transition of a moving image captured by an imaging device.
FIG. 6 is a diagram showing, for each image shown in FIG. 5, the image corresponding to the immediately preceding frame by a broken line, together with an example of the detected optical flow.
FIG. 7 is a diagram showing a display example when a moving image including the images 401 to 403 shown in FIG. 5 is reproduced.
FIG. 8 is a diagram showing a display example when a moving image including the images 401 to 403 shown in FIG. 5 is reproduced.
FIG. 9 is a diagram showing an example of the transition of a moving image captured by an imaging device.
FIG. 10 is a diagram showing, for each image shown in FIG. 9, the image corresponding to the immediately preceding frame by a broken line, together with an example of the detected optical flow.
FIG. 11 is a diagram showing a display example when a moving image including the images 421 to 423 shown in FIG. 9 is reproduced.
FIG. 12 is a diagram showing a display example when a moving image including the images 421 to 423 shown in FIG. 9 is reproduced.
FIG. 13 is a diagram showing an example of the transition of a moving image captured by an imaging device.
FIG. 14 is a diagram showing, for each image shown in FIG. 13, the image corresponding to the immediately preceding frame by a broken line, together with an example of the detected optical flow.
FIG. 15 is a diagram showing a display example when a moving image including the images 441 to 443 shown in FIG. 13 is reproduced.
FIG. 16 is a diagram showing a display example when a moving image including the images 441 to 443 shown in FIG. 13 is reproduced.
FIG. 17 is a block diagram showing a configuration example of the sound conversion processing unit 200 in the embodiment of the present invention.
FIG. 18 is a diagram showing an outline of an example in which a captured moving image is reproduced by a normal reproduction method.
FIG. 19 is a diagram showing an outline of a reproduction example by the image processing apparatus 100 in the embodiment of the present invention.
FIG. 20 is a block diagram showing the coordinate system of the display screen of the display unit 180 in the embodiment of the present invention.
FIG. 21 is a graph illustrating the relationship between the center position of the image corresponding to the current frame and the output sound in the embodiment of the present invention.
FIG. 22 is a diagram showing an example of the relationship between the imaging device 500 and a subject.
FIG. 23 is a diagram showing an outline of a reproduction example by the image processing apparatus 100 in the embodiment of the present invention.
FIG. 24 is a graph illustrating the relationship between the angle of the image corresponding to the current frame and the output sound in the embodiment of the present invention.
FIG. 25 is a diagram showing an outline of a reproduction example by the image processing apparatus 100 in the embodiment of the present invention.
FIG. 26 is a graph illustrating the relationship between the magnification of the image corresponding to the current frame and the output sound in the embodiment of the present invention.
FIG. 27 is a flowchart showing the processing procedure of the moving image reproduction processing by the image processing apparatus 100 in the embodiment of the present invention.
FIG. 28 is a flowchart showing a processing procedure example of the sound conversion processing (the processing procedure of step S950) by the image processing apparatus 100 in the embodiment of the present invention.
FIG. 29 is a block diagram showing a functional configuration example of the image processing apparatus 650 in the embodiment of the present invention.
FIG. 30 is a diagram schematically showing the files recorded in the moving image storage unit 240 and the metadata storage unit 250 in the embodiment of the present invention.
FIG. 31 is a block diagram showing a functional configuration example of the image processing apparatus 680 in the embodiment of the present invention.
FIG. 32 is a diagram schematically showing the relationship between each frame of a moving image stored in the moving image storage unit 240 and the display area in the embodiment of the present invention.
FIG. 33 is a diagram schematically showing the process of moving the display area when the image corresponding to the current frame protrudes from the display area.
FIG. 34 is a diagram showing an example of the transition when the display area is moved by the movement process shown in FIG. 33.
FIG. 35 is a diagram schematically showing the relationship between each frame of a moving image file stored in the moving image storage unit 240 and the display area in the embodiment of the present invention.
FIG. 36 is a diagram showing an outline of the enlargement method used when the moving image displayed on the display unit 180 is enlarged while a display mode that fixes the image corresponding to the current frame on the display unit 180 is designated.
FIG. 37 is a diagram schematically showing the flow of each frame of a moving image file stored in the moving image storage unit 240 in the embodiment of the present invention.
FIG. 38 is a diagram schematically showing the flow of each frame of a moving image file stored in the moving image storage unit 240 in the embodiment of the present invention.
FIG. 39 is a flowchart showing the processing procedure of the moving image reproduction processing by the image processing apparatus 650 in the embodiment of the present invention.
FIG. 40 is a flowchart showing the processing procedure of the moving image reproduction processing by the image processing apparatus 680 in the embodiment of the present invention.
FIG. 41 is a flowchart showing a processing procedure example of the sound conversion processing (the processing procedure of step S980) by the image processing apparatus 680 in the embodiment of the present invention.
FIG. 42 is a block diagram showing a functional configuration example of the image processing apparatus 740 in the embodiment of the present invention.
FIG. 43 is a diagram schematically showing the files recorded in the moving image storage unit 240 and the relative relationship information storage unit 290 in the embodiment of the present invention.
FIG. 44 is a diagram schematically showing a composition example when two moving images are combined.
FIG. 45 is a block diagram showing a configuration example of the sound conversion processing unit 200 of the image processing apparatus 740 in the embodiment of the present invention.
FIG. 46 is a diagram showing an example of the sound conversion processing during simultaneous reproduction of two moving images by the image processing apparatus 740 in the embodiment of the present invention.
FIG. 47 is a diagram showing an example in which sound is converted using information other than the motion information of the moving image in the embodiment of the present invention.
FIG. 48 is a block diagram showing a functional configuration example of the camera work detection unit 480 in the embodiment of the present invention.
FIG. 49 is a diagram showing a configuration example of the multi-core processor 800 in the embodiment of the present invention.
FIG. 50 is a diagram showing a configuration example of the control processor core 801 in the embodiment of the present invention.
FIG. 51 is a diagram showing a configuration example of the arithmetic processor core (#1) 811 in the embodiment of the present invention.
FIG. 52 is a diagram schematically showing the calculation method of the multi-core processor 800 in the embodiment of the present invention.
FIG. 53 is a diagram schematically showing the flow of programs and data when calculation is performed by the multi-core processor 800 in the embodiment of the present invention.
FIG. 54 is a diagram schematically showing an outline of a calculation method in which processing of a plurality of data items is performed with a separate instruction for each item, and an outline of SIMD calculation in which processing of a plurality of data items is performed with a single instruction.
FIG. 55 is a diagram showing a configuration example of a program executed by the control processor core 801 or the arithmetic processor core (#1) 811 in the embodiment of the present invention.
FIG. 56 is a diagram schematically showing the data structure and processing flow when filtering is performed using the Sobel filter 830 on image data stored in the main memory 781 in the embodiment of the present invention.
FIG. 57 is a diagram schematically showing the data flow when SIMD calculation is performed using the Sobel filter 830 on image data stored in the main memory 781 in the embodiment of the present invention.
FIG. 58 is a diagram schematically showing a vector creation method for creating nine vectors from the image data stored in the first buffer 831 when filtering is performed using the Sobel filter 830 in the embodiment of the present invention.
FIG. 59 is a diagram schematically showing a vector calculation method for performing vector calculation on the vector data 841 to 849 using SIMD instructions when filtering is performed using the Sobel filter 830 in the embodiment of the present invention.
FIG. 60 is a diagram schematically showing, in time series, the flow of the camera work parameter calculation processing in the embodiment of the present invention.
FIG. 61 is a diagram schematically showing the Blu-ray Disc 880, which is an example of a recording medium, the data 881 to 884 recorded on the Blu-ray Disc 880, and the internal configuration of the Blu-ray player 890 capable of playing the Blu-ray Disc 880.

Explanation of symbols

100, 650, 680, 740 Image processing apparatus
110 Content storage unit
120, 121 Content acquisition unit
130 Image conversion information supply unit
140, 141 Image conversion unit
150, 151 Image synthesis unit
160, 161 Image memory
170, 171 Display control unit
180 Display unit
190, 191 Sound conversion information calculation unit
200 Sound conversion processing unit
201, 630 Volume adjustment unit
202, 640 Sound addition unit
210 Sound output control unit
220 Speaker
230, 231, 232 Operation reception unit
260 Display area extraction unit
270 Display memory
280 Target image conversion information calculation unit
290 Relative relationship information storage unit

Claims

An image processing apparatus comprising:
content acquisition means for acquiring content data including a moving image and sound corresponding to the moving image;
image conversion information supply means for supplying image conversion information for converting, with a first image constituting the moving image as a reference, a second image that is positioned after the first image on the time axis of the moving image and is to be displayed;
image conversion means for converting the second image based on the image conversion information, with the arrangement position of the first image on a display screen of display means as a reference;
display control means for causing the display means to display the converted second image;
sound conversion processing means for generating output sound by converting the sound related to the second image through adjustment of the volume of each of a plurality of channels constituting that sound, based on an element that is specified by the image conversion information and relates to movement of the second image with respect to the first image; and
sound output control means for causing sound output means to output the generated output sound while the converted second image is displayed on the display means.

An image processing apparatus comprising:
content acquisition means for acquiring content data including a moving image and sound corresponding to the moving image;
image conversion information supply means for supplying image conversion information for converting, with a first image constituting the moving image as a reference, a second image that is positioned after the first image on the time axis of the moving image and is to be displayed;
image conversion means for converting the second image based on the image conversion information, with the arrangement position of the first image on a display screen of display means as a reference;
display control means for causing the display means to display the converted second image;
sound conversion processing means for generating output sound by converting the sound related to the second image through adjustment of the volume of each of a plurality of channels constituting that sound, based on an element that is specified by the image conversion information and relates to at least one of movement, rotation, and magnification of the second image with respect to the first image; and
sound output control means for causing sound output means to output the generated output sound while the converted second image is displayed on the display means.

The image processing apparatus according to claim 1 or 2, further comprising image synthesizing means for synthesizing the converted second image and a background image serving as a background of the second image into a composite image,
wherein the display control means causes the display means to display the composite image, and
the sound output control means causes the sound output means to output the generated output sound while the composite image is displayed on the display means.

The image processing apparatus according to claim 1 or 2, further comprising sound conversion information calculation means for calculating, based on the element, sound conversion information for adjusting the volume of each of the plurality of channels constituting the sound related to the second image,
wherein the sound conversion processing means generates the output sound by converting the sound related to the second image based on the calculated sound conversion information.

The image processing apparatus according to claim 1 or 2, wherein the image conversion information includes an element relating to at least one of movement, rotation, and magnification of the second image with respect to the first image.

The image processing apparatus according to claim 4, wherein the sound conversion processing means includes volume adjustment means and sound addition means,
the volume adjustment means adjusts the volume of each of the plurality of channels constituting the sound based on the sound conversion information, and
the sound addition means adds the adjusted sound for each channel.

The image processing apparatus according to claim 1 or 2, wherein the sound conversion processing means, by performing the conversion processing, generates right-channel and left-channel sound constituting the output sound, or center-channel sound constituting the output sound.

The image processing apparatus according to claim 1 or 2, wherein the sound includes right-channel and left-channel sound, and
the sound conversion processing means generates the output sound by performing the conversion processing on the right-channel and left-channel sound.

The image processing apparatus according to claim 1 or 2, wherein the sound includes center-channel sound, and
the sound conversion processing means generates the output sound by performing the conversion processing on the center-channel sound.

The image processing apparatus according to claim 1 or 2, further comprising image holding means for holding an image including the first image as a history image,
wherein the image conversion means converts at least one of the second image and the history image held in the image holding means based on the image conversion information, and
the image synthesizing means synthesizes the second image and the history image, at least one of which has been converted by the image conversion means, into the composite image, and causes the image holding means to hold the composite image as a new history image.

The image processing apparatus according to claim 10, further comprising display area extraction means for determining, from the new history image held in the image holding means, a display area to be displayed on the display means, and extracting the image included in the display area as a display image,
wherein the image synthesizing means overwrites the second image onto the display image and synthesizes them into a new display image,
the display control means causes the display means to display the new display image,
the display area extraction means generates display area extraction information relating to the position, angle, or size of the display area in the holding area of the image holding means, and
the sound conversion processing means generates the output sound by converting the sound related to the second image through adjustment of the volume of each of the plurality of channels constituting that sound, based on the element specified by the image conversion information and on an element that is specified by the display area extraction information and relates to the position, angle, or size of the display area in the holding area of the image holding means.

The image processing apparatus according to claim 1 or 2, wherein the image conversion means converts the second image based on template information indicating a display area in which the moving image is displayed on the display means.

An information processing method comprising:
a content acquisition procedure of acquiring content data including a moving image and sound corresponding to the moving image;
an image conversion information supply procedure of supplying image conversion information for converting, with a first image constituting the moving image as a reference, a second image that is positioned after the first image on the time axis of the moving image and is to be displayed;
an image conversion procedure of converting the second image based on the image conversion information, with the arrangement position of the first image on a display screen of display means as a reference;
a display control procedure of causing the display means to display the converted second image;
a sound conversion processing procedure of generating output sound by converting the sound related to the second image through adjustment of the volume of each of a plurality of channels constituting that sound, based on an element that is specified by the image conversion information and relates to at least one of movement, rotation, and magnification of the second image with respect to the first image; and
a sound output control procedure of causing sound output means to output the generated output sound while the converted second image is displayed on the display means.

A program causing a computer to execute:
a content acquisition procedure of acquiring content data including a moving image and sound corresponding to the moving image;
an image conversion information supply procedure of supplying image conversion information for converting, with a first image constituting the moving image as a reference, a second image that is positioned after the first image on the time axis of the moving image and is to be displayed;
an image conversion procedure of converting the second image based on the image conversion information, with the arrangement position of the first image on a display screen of display means as a reference;
a display control procedure of causing the display means to display the converted second image;
a sound conversion processing procedure of generating output sound by converting the sound related to the second image through adjustment of the volume of each of a plurality of channels constituting that sound, based on an element that is specified by the image conversion information and relates to at least one of movement, rotation, and magnification of the second image with respect to the first image; and
a sound output control procedure of causing sound output means to output the generated output sound while the converted second image is displayed on the display means.