JP2020126434A - Image processing device, control method thereof, program, and storage media

Info

Publication number: JP2020126434A
Application number: JP2019018256A
Authority: JP
Inventors: 保彦岩本; Yasuhiko Iwamoto
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2019-02-04
Filing date: 2019-02-04
Publication date: 2020-08-20
Anticipated expiration: 2039-02-04
Also published as: JP7286330B2

Abstract

To provide an image processing device capable of appropriately selecting a subject desired by a user when selecting a main subject from an image. SOLUTION: A main subject is determined from acquired images, and from among the images in which the main subject has been determined, an image to be used as data giving a negative reward in learning for the determination means is selected based on user input. SELECTED DRAWING: Figure 6

Description

The present invention relates to a technique for determining a main subject area in an image processing apparatus.

Conventionally, image pickup apparatuses and image processing apparatuses have been known to provide a function in which the apparatus analyzes an image and automatically selects a main subject without any operation by the user. For this function, a developer prepares a group of images anticipated in advance, sets the correct main subject for each image, and adjusts the parameters of the determiner based on them. For example, Patent Document 1 discloses a method of determining an object with a neural network using a learning data group prepared in advance.

In addition to a determiner trained in advance, methods of performing additional learning using information obtained from the determined area at the time of determination have also been proposed. For example, Patent Document 2 discloses a method of determining an object using a learning classifier in which a fixed classifier trained in advance and information obtained from the area determined by the fixed classifier are added to dictionary data.

Patent Document 1: Japanese Patent Laid-Open No. 2016-157219
Patent Document 2: Japanese Patent Laid-Open No. 2010-170201

S. Haykin, "Neural Networks: A Comprehensive Foundation", 2nd Edition, Prentice Hall, pp. 156-255, July 1998

Here, the images a user frequently shoots generally differ from user to user. Moreover, even for the same image, the correct main subject differs depending on the user. The function of automatically selecting the main subject should therefore also be adjusted to each user's preferences.

However, Patent Document 1 merely learns from a learning data group prepared in advance, and Patent Document 2 merely adds information obtained from the area determined by the fixed classifier to the dictionary data. Neither technique can therefore be said to adjust to the user's preferences.

The present invention has been made in view of the above problem, and its object is to provide an image processing apparatus that can appropriately select the subject desired by the user when selecting a main subject from an image.

An image processing apparatus according to the present invention comprises selection means for selecting, based on user input, from among images in which a main subject has been determined using determination means for determining a main subject from an acquired image, an image to be used as data that gives a negative reward in learning for the determination means.

According to the present invention, it is possible to provide an image processing apparatus that can appropriately select the subject desired by the user when selecting a main subject from an image.

FIG. 1 is a block diagram showing the configuration of an image pickup apparatus according to an embodiment of the present invention.
FIG. 2 is a flowchart showing the overall operation of the image pickup apparatus of the embodiment.
FIG. 3 is a schematic diagram showing examples of main subject determination results.
FIG. 4 is a schematic diagram showing an example of the overall configuration of the CNN.
FIG. 5 is a schematic diagram showing an example of a partial configuration of the CNN.
FIG. 6 is a flowchart showing a data set selection procedure in the embodiment.
FIG. 7 is a flowchart showing another data set selection procedure in the embodiment.
FIG. 8 is a diagram showing a system consisting of a smartphone and a server.

Hereinafter, embodiments will be described in detail with reference to the accompanying drawings. The following embodiments do not limit the invention according to the claims. Although a plurality of features are described in the embodiments, not all of them are essential to the invention, and the features may be combined arbitrarily. In the accompanying drawings, identical or similar components are given the same reference numerals, and duplicate descriptions are omitted.

(Configuration of the image pickup apparatus)
FIG. 1 is a block diagram showing the configuration of an image pickup apparatus 100 serving as an image processing apparatus according to an embodiment of the present invention.

In FIG. 1, the image pickup apparatus 100 is a digital still camera, video camera, or the like that shoots a subject and can record moving image and still image data on various media such as tape, solid-state memory, optical disks, and magnetic disks. The present invention is not limited to these, however, and is also applicable to other devices with a shooting function, such as camera-equipped mobile phones and tablet terminals. The units in the image pickup apparatus 100 are connected via a bus 160, and each unit is controlled by a CPU 151 (central processing unit).

The lens unit 101 comprises a fixed first-group lens 102, a zoom lens 111, a diaphragm 103, a fixed third-group lens 121, and a focus lens 131. In accordance with commands from the CPU 151, the diaphragm control circuit 105 drives the diaphragm 103 via a diaphragm motor 104 (AM), adjusting the aperture diameter of the diaphragm 103 to regulate the amount of light during shooting. The zoom control circuit 113 changes the focal length by driving the zoom lens 111 via a zoom motor 112 (ZM). The focus control circuit 133 determines the drive amount for the focus motor 132 (FM) based on the amount of defocus of the lens unit 101, and controls the focus adjustment state by driving the focus lens 131 via the focus motor 132 (FM). AF control is realized through this movement control of the focus lens 131 by the focus control circuit 133 and the focus motor 132. The focus lens 131 is a lens for focus adjustment; although FIG. 1 shows it simply as a single lens, it usually consists of multiple lenses.

The subject image formed on the image sensor 141 via the lens unit 101 is converted into an electric signal by the image sensor 141, which is a photoelectric conversion element that converts a subject image (optical image) into an electric signal. The image sensor 141 has light receiving elements arranged in m pixels horizontally and n pixels vertically. The image formed on the image sensor 141 and photoelectrically converted is shaped into an image signal (image data) by the image pickup signal processing circuit 142.

The image data output from the image pickup signal processing circuit 142 is sent to the image pickup control circuit 143 and temporarily stored in a RAM (random access memory) 154. The image data stored in the RAM 154 is compressed by the image compression/decompression circuit 153 and then recorded on the image recording medium 157. In parallel, the image data stored in the RAM 154 is sent to the image processing circuit 152. The image processing circuit 152 processes the image signal, performing operations such as reducing or enlarging the image data to an optimum size and calculating the similarity between pieces of image data. Sending the image data processed to the optimum size to the monitor display 150 as appropriate enables preview image display and through image display. The main subject determination result of the main subject determination circuit 162 can also be superimposed on the image data. By using the RAM 154 as a ring buffer, multiple pieces of image data captured within a predetermined period can be buffered together with the determination result of the main subject determination circuit 162 for each piece of image data. Likewise, the image data used for learning by the main subject determination circuit 162 and the corresponding main subject determination results can be buffered.
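The ring-buffer use of the RAM 154 can be illustrated with a minimal Python sketch. The names `FrameRecord` and `FrameRingBuffer`, the `(x, y, width, height)` region layout, and the timestamp field are assumptions made for illustration only; the embodiment does not specify a data layout.

```python
from collections import deque
from dataclasses import dataclass
from typing import Any, List, Optional, Tuple

Region = Tuple[int, int, int, int]  # (x, y, width, height); this layout is an assumption


@dataclass
class FrameRecord:
    image: Any                 # image data for one frame
    region: Optional[Region]   # main subject determination result, None if none was output
    shown_at: float            # display time in seconds (hypothetical field)


class FrameRingBuffer:
    """Fixed-capacity buffer: the oldest record is dropped when a new one arrives."""

    def __init__(self, capacity: int) -> None:
        self._records = deque(maxlen=capacity)

    def push(self, record: FrameRecord) -> None:
        self._records.append(record)

    def since(self, t0: float) -> List[FrameRecord]:
        """Return the records displayed at or after time t0, oldest first."""
        return [r for r in self._records if r.shown_at >= t0]

    def __len__(self) -> int:
        return len(self._records)
```

A bounded `deque` keeps memory use constant regardless of how long live view runs, which matches the idea of retaining only the frames from a predetermined period.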

The operation switch 156 is an input interface including a touch panel, buttons, and the like, and allows various operations, for example by selecting the function icons displayed on the monitor display 150. For instance, while viewing the through image displayed on the monitor display 150, the user can manually specify the position of the main subject or cancel a main subject that has already been specified.

Based on a user instruction input from the operation switch 156, or on the magnitude of the pixel signals of the image data temporarily stored in the RAM 154, the CPU 151 determines the accumulation time of the image sensor 141, the gain setting used when outputting a signal from the image sensor 141 to the image pickup signal processing circuit 142, and the like. The image pickup control circuit 143 receives the accumulation time and gain setting instructions from the CPU 151 and controls the image sensor 141.

The main subject determination circuit 162 uses the image signal to determine the area in which the main subject exists. The main subject determination process in the main subject determination circuit 162 is realized by feature extraction with a CNN (Convolutional Neural Network). The main subject determination circuit 162 is implemented with a GPU (Graphics Processing Unit). A GPU is originally a processor for image processing, but because it has multiple multiply-accumulate units and excels at matrix computation, it is also often used as a processor for learning. GPUs are commonly used for deep learning as well, but an FPGA (field-programmable gate array), an ASIC (application specific integrated circuit), or the like may be used instead.

The focus control circuit 133 performs AF control on a specific subject area. The diaphragm control circuit 105 performs exposure control using the luminance values of a specific subject area. The image processing circuit 152 performs gamma correction, white balance processing, and the like based on the subject area. The monitor display 150 displays the image and shows the main subject determination result as a rectangle or the like. The battery 159 is managed by the power management circuit 158 and supplies stable power to the entire image pickup apparatus 100.

The flash memory 155 stores the control program required for the operation of the image pickup apparatus 100, parameters used for the operation of each unit, and the like. When the image pickup apparatus 100 is started by a user operation (that is, when it transitions from the power-off state to the power-on state), the control program and parameters stored in the flash memory 155 are loaded into part of the RAM 154. The CPU 151 controls the operation of the image pickup apparatus 100 according to the control program and constants loaded into the RAM 154.

(Overall processing flow)
FIG. 2 is a flowchart showing the overall operation of the image pickup apparatus 100 of this embodiment. The flowchart in FIG. 2 is repeated while the image pickup apparatus is performing live view display, for example once per live view frame period.

When the image pickup apparatus 100 starts live view display, first, in S201, the image pickup control circuit 143 supplies the input image acquired using the lens unit 101 and the image sensor 141 to each unit of the image pickup apparatus 100.

In S202, the main subject determination circuit 162 performs main subject determination on the input image. Details of the main subject determination process will be described later. It is assumed that the main subject determination circuit 162 has already undergone the necessary learning; details of the learning process will also be described later.

In S203, the CPU 151 determines whether the main subject determination circuit 162 output a main subject determination result in S202. If a result was output, the process proceeds to S204; otherwise, it proceeds to S205.

In S204, the monitor display 150 displays the input image with the main subject determination result superimposed on it. In S205, the monitor display 150 displays only the input image.

In S206, the CPU 151 buffers the input image displayed in live view in S204, the main subject determination result, and the display time in the RAM 154 as one data set. In S207, the CPU 151 selects, from the data sets buffered in the RAM 154, input images to be used for positive learning or negative learning. This process will be described later.

In S208, the CPU 151 determines whether the user has issued a shooting instruction. If so, the process proceeds to S209; if not, it proceeds to S211. A shooting instruction is an instruction to start shooting a still image for recording or to start shooting a moving image for recording. The user can issue a shooting instruction by fully pressing the release button or by operating the touch panel.

In S209, the CPU 151 performs still image or moving image shooting, and when shooting ends, the process proceeds to S210.

In S210, the CPU 151 selects, from the data sets buffered in the RAM 154, data sets to be used for positive learning or negative learning. This process will be described later.

In S211, the CPU 151 determines whether the number of input images selected in S207 and S210 is equal to or greater than a threshold. If it is, the process proceeds to S212; if it is below the threshold, the process proceeds to S213.

In S212, the main subject determination circuit 162 performs additional learning. Details of the additional learning process will be described later.

In S213, the CPU 151 determines whether an end instruction has been received from the operation switch 156. If there is an end instruction, the process ends; if not, the process returns to S201 and the series of steps is repeated.
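The control flow of S201 to S213 can be summarized in a simplified Python sketch. The function `run_live_view` and its callable parameters (`determine`, `select`, `retrain`) are hypothetical stand-ins for the circuits described above, the shooting branch (S208 to S210) is omitted, the end instruction of S213 is modeled by the frame list running out, and clearing the selection after learning is an assumption.

```python
from typing import Callable, List, Optional


def run_live_view(
    frames: List[str],
    determine: Callable[[str], Optional[str]],
    select: Callable[[str, Optional[str]], bool],
    retrain: Callable[[List[str]], None],
    threshold: int,
) -> List[str]:
    """Process frames one by one and return a log of what was displayed."""
    log: List[str] = []
    selected: List[str] = []
    for frame in frames:                      # S201: acquire the input image
        result = determine(frame)             # S202: main subject determination
        if result is not None:                # S203: was a result output?
            log.append(f"{frame}+{result}")   # S204: image with result overlay
        else:
            log.append(frame)                 # S205: image only
        if select(frame, result):             # S206-S207: buffer and select
            selected.append(frame)
        if len(selected) >= threshold:        # S211: enough selected images?
            retrain(selected)                 # S212: additional learning
            selected = []                     # assumption: start a fresh selection
    return log
```

Passing the circuits in as callables keeps the sketch independent of any particular determiner or learning procedure.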

(Examples of main subject determination results)
FIG. 3 shows examples of the main subject determination result displayed in S204 of FIG. 2. FIG. 3(a) shows an example of the main subject determination result when no additional learning (S212) based on the images selected in S207 has been performed. FIG. 3(b) shows an example of the result after additional learning (S212) using images selected in S207 through the operations of a user who likes flowers and photographs them frequently. FIG. 3(c) shows an example of the result after additional learning (S212) using images selected in S207 through the operations of a user who likes birds and photographs them frequently.

(Description of the main subject determination circuit)
In this embodiment, the main subject determination circuit 162 is configured as a CNN (Convolutional Neural Network). The basic configuration of the CNN will be described with reference to FIGS. 4 and 5.

FIG. 4 shows the basic configuration of a CNN that determines the main subject from input two-dimensional image data and a position map. Processing flows from the input at the left end toward the right. The CNN treats two layers, called a feature detection layer (S layer) and a feature integration layer (C layer), as one set, and these sets are arranged hierarchically.

In the CNN, each S layer first detects the next features based on the features detected at the preceding level. The features detected in the S layer are then integrated in the C layer and passed to the next level as the detection result of that level. Various kinds of information can be input to this CNN. Examples include an RGB image, a per-pixel captured image signal before development (RAW image), depth information of the image, a map of object detection scores from an object detector, and a contrast map obtained from the variance in local regions of the image.
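As one illustration of the inputs listed above, a contrast map can be computed as the variance of pixel values in a small window around each pixel. The following Python sketch assumes a grayscale image stored as a list of rows; the function name `contrast_map`, the square window, and the clipping of the window at the image border are assumptions, since the embodiment does not describe how the map is obtained.

```python
from typing import List


def contrast_map(gray: List[List[float]], radius: int = 1) -> List[List[float]]:
    """Per-pixel variance inside a (2*radius+1) square window, clipped at the borders."""
    height, width = len(gray), len(gray[0])
    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            # Collect the pixel values of the local region around (x, y).
            window = [
                gray[j][i]
                for j in range(max(0, y - radius), min(height, y + radius + 1))
                for i in range(max(0, x - radius), min(width, x + radius + 1))
            ]
            mean = sum(window) / len(window)
            out[y][x] = sum((p - mean) ** 2 for p in window) / len(window)
    return out
```

Flat regions yield values near zero, while regions containing edges or texture yield larger values.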

An S layer consists of feature detection cell planes, each of which detects a different feature. A C layer consists of feature integration cell planes, which pool the detection results of the feature detection cell planes in the preceding stage. In the following, feature detection cell planes and feature integration cell planes are collectively called feature planes when there is no need to distinguish them. In this embodiment, the output layer, which is the final level, consists of an S layer only, without a C layer.

FIG. 5 illustrates the feature detection processing in a feature detection cell plane and the feature integration processing in a feature integration cell plane.

A feature detection cell plane consists of multiple feature detection neurons, each connected to the C layer of the preceding level in a predetermined structure. A feature integration cell plane consists of multiple feature integration neurons, each connected to the S layer of the same level in a predetermined structure. As shown in FIG. 5, let $y_M^{LS}(\xi,\zeta)$ denote the output value of the feature detection neuron at position $(\xi,\zeta)$ in the M-th cell plane of the S layer at level L, and let $y_M^{LC}(\xi,\zeta)$ denote the output value of the feature integration neuron at position $(\xi,\zeta)$ in the M-th cell plane of the C layer at level L. With the coupling coefficients of the respective neurons written as $w_M^{LS}(n,u,v)$ and $w_M^{LC}(u,v)$, the output values can be expressed as follows.

$$y_M^{LS}(\xi,\zeta) = f\left(u_M^{LS}(\xi,\zeta)\right) = f\left(\sum_{n,u,v} w_M^{LS}(n,u,v)\, y_n^{(L-1)C}(\xi+u,\zeta+v)\right) \tag{1}$$

$$y_M^{LC}(\xi,\zeta) = u_M^{LC}(\xi,\zeta) = \sum_{u,v} w_M^{LC}(u,v)\, y_M^{LS}(\xi+u,\zeta+v) \tag{2}$$

In equation (1), $f$ is an activation function. Any sigmoid function, such as a logistic function or a hyperbolic tangent function, may be used; for example, it may be realized with the tanh function. $u_M^{LS}(\xi,\zeta)$ is the internal state of the feature detection neuron at position $(\xi,\zeta)$ in the M-th cell plane of the S layer at level L. Equation (2) takes a simple linear sum without an activation function. When no activation function is used, as in equation (2), the internal state $u_M^{LC}(\xi,\zeta)$ of the neuron equals its output value $y_M^{LC}(\xi,\zeta)$. In addition, $y_n^{(L-1)C}(\xi+u,\zeta+v)$ in equation (1) and $y_M^{LS}(\xi+u,\zeta+v)$ in equation (2) are called the connection destination output values of the feature detection neuron and the feature integration neuron, respectively.
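A minimal Python sketch of the two computations follows, assuming that equation (1) applies the activation function to the sum over n, u, and v of the coupling coefficients multiplied by the preceding C-layer outputs, and that equation (2) is the same kind of sum over u and v without an activation, as the text describes. The function names, the list-of-rows plane layout, the square receptive field of half-width `radius`, and the treatment of positions outside a plane as zero are all assumptions made for illustration.

```python
import math
from typing import List

Plane = List[List[float]]  # one feature plane, indexed [zeta][xi] (row, column)


def at(plane: Plane, xi: int, zeta: int) -> float:
    """Read plane(xi, zeta); positions outside the plane count as 0 (an assumption)."""
    if 0 <= zeta < len(plane) and 0 <= xi < len(plane[0]):
        return plane[zeta][xi]
    return 0.0


def s_layer(prev_c: List[Plane], w: List[Plane], radius: int) -> Plane:
    """Equation (1): tanh of the weighted sum over n, u, v of the previous C-layer outputs."""
    height, width = len(prev_c[0]), len(prev_c[0][0])
    out = [[0.0] * width for _ in range(height)]
    for zeta in range(height):
        for xi in range(width):
            state = 0.0  # internal state u_M^LS(xi, zeta)
            for n, plane in enumerate(prev_c):
                for v in range(-radius, radius + 1):
                    for u in range(-radius, radius + 1):
                        state += w[n][v + radius][u + radius] * at(plane, xi + u, zeta + v)
            out[zeta][xi] = math.tanh(state)
    return out


def c_layer(s_plane: Plane, w: Plane, radius: int) -> Plane:
    """Equation (2): plain weighted sum over u, v of the same-level S-layer outputs."""
    height, width = len(s_plane), len(s_plane[0])
    out = [[0.0] * width for _ in range(height)]
    for zeta in range(height):
        for xi in range(width):
            out[zeta][xi] = sum(
                w[v + radius][u + radius] * at(s_plane, xi + u, zeta + v)
                for v in range(-radius, radius + 1)
                for u in range(-radius, radius + 1)
            )
    return out
```

Each call produces one cell plane M; a full layer would repeat the call once per plane with that plane's own coefficients.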

Here, $\xi$, $\zeta$, $u$, $v$, and $n$ in equations (1) and (2) are explained. The position $(\xi,\zeta)$ corresponds to position coordinates in the input image. For example, a high output value $y_M^{LS}(\xi,\zeta)$ means that the feature detected by the M-th cell plane of the S layer at level L is likely to be present at pixel position $(\xi,\zeta)$ of the input image. In equation (1), $n$ denotes the n-th cell plane of the C layer at level L-1 and is called the connection destination feature number. Basically, the multiply-accumulate operation is performed over all cell planes in the C layer at level L-1. $(u,v)$ are the relative position coordinates of the coupling coefficient, and the multiply-accumulate operation is performed over a finite range of $(u,v)$ that depends on the size of the feature to be detected. Such a finite range of $(u,v)$ is called a receptive field. The size of the receptive field is hereinafter called the receptive field size and is expressed as the number of horizontal pixels × the number of vertical pixels of the connected range.

When L = 1 in equation (1), that is, in the first S layer, $y_n^{(L-1)C}(\xi+u,\zeta+v)$ is the input image $y^{\mathrm{in\_image}}(\xi+u,\zeta+v)$ or the input position map $y^{\mathrm{in\_posi\_map}}(\xi+u,\zeta+v)$. Because neurons and pixels are distributed discretely and the connection destination feature number is also discrete, $\xi$, $\zeta$, $u$, $v$, and $n$ are not continuous variables but take discrete values. Here, $\xi$ and $\zeta$ are non-negative integers, $n$ is a natural number, and $u$ and $v$ are integers, all within finite ranges.

$w_M^{LS}(n,u,v)$ in equation (1) is the coupling coefficient distribution for detecting a given feature, and adjusting it to appropriate values makes it possible to detect that feature. Adjusting this coupling coefficient distribution is what is meant by learning. In constructing the CNN, various test patterns are presented, and the coupling coefficients are corrected repeatedly and gradually so that $y_M^{LS}(\xi,\zeta)$ takes an appropriate output value.

Next, $w_M^{LC}(u,v)$ in equation (2) is a two-dimensional Gaussian function and can be expressed as in equation (3) below.

$$w_M^{LC}(u,v) = \frac{1}{2\pi\sigma^{2}} \exp\left(-\frac{u^{2}+v^{2}}{2\sigma^{2}}\right) \tag{3}$$

Here too, $(u,v)$ is limited to a finite range, so, as in the description of the feature detection neuron, the finite range is called the receptive field and its size the receptive field size. The receptive field size may be set to an appropriate value according to the size of the M-th feature of the S layer at level L. In equation (3), $\sigma$ is a feature size factor and may be set to an appropriate constant according to the receptive field size. Specifically, it is preferable to set it so that the values at the outermost edge of the receptive field can be regarded as almost zero.
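The guideline that the outermost value of the receptive field should be almost zero can be turned into a concrete rule for choosing σ. The Python sketch below assumes the normalized two-dimensional Gaussian form for equation (3); the function names and the parameter `edge_ratio` (the ratio of the edge weight to the center weight, defaulting to 1%) are hypothetical choices, not values given in the embodiment.

```python
import math
from typing import List


def gaussian_weights(radius: int, sigma: float) -> List[List[float]]:
    """Return a (2*radius+1) square table of 2-D Gaussian weights w(u, v)."""
    norm = 1.0 / (2.0 * math.pi * sigma * sigma)
    return [
        [norm * math.exp(-(u * u + v * v) / (2.0 * sigma * sigma))
         for u in range(-radius, radius + 1)]
        for v in range(-radius, radius + 1)
    ]


def sigma_for_radius(radius: int, edge_ratio: float = 0.01) -> float:
    """Pick sigma so the weight at distance `radius` is edge_ratio times the centre weight."""
    # Solve exp(-radius^2 / (2 sigma^2)) = edge_ratio for sigma.
    return radius / math.sqrt(-2.0 * math.log(edge_ratio))
```

For example, `gaussian_weights(3, sigma_for_radius(3))` gives a 7 × 7 receptive field whose weight at the middle of each edge is 1% of the center weight.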

In the CNN of this embodiment, the computations described above are performed at each level, and the main subject is determined in the S layer of the final level.

(Flow of additional learning process)
FIG. 6 is a flowchart showing the image selection process in S207 of FIG. 2. In the following, learning a given area of the input image as a test pattern that should be determined to be the main subject is called positive (affirmative) learning, and learning a given area as a test pattern that should not be determined to be the main subject is called negative learning. Details of each kind of learning will be described later.

First, in S601, the CPU 151 determines whether a manual instruction has been given via the operation switch 156. If there has been a manual instruction, the process proceeds to S602; if not, it proceeds to S605. A manual instruction here is an operation (selection instruction) in which the user manually operates the operation switch 156 to specify the main subject area in the image. A manual instruction in S601 means that the main subject determined in S202 of FIG. 2 differs from the main subject the user wants, so the process proceeds to S602 and the subsequent steps.

In S602, the subject manually specified by the user in S601 is considered to be the correct main subject. The CPU 151 therefore selects the input image acquired in S201 of FIG. 2 and the main subject area manually specified in S601 as a data set for positive learning.

In S603, the CPU 151 determines whether the main subject area determined in S202 of FIG. 2 and the main subject area manually specified in S601 are different subjects. For example, when the two areas are separated by a predetermined distance or more, they are determined to be different subjects. If they are separated by the predetermined distance or more, the process proceeds to S604; otherwise this flowchart ends. The predetermined distance may be any sufficiently large value and may be determined, for example, as a ratio to the angle of view of the input image. More specifically, when the input image is 640×480 pixels, 64 pixels, which is 10% of the horizontal size of 640, may be used as the predetermined distance. If the results of face detection, human body detection, object detection, or the like make it clear that the main subject area determined in S202 and the main subject area manually specified in S601 are different subjects, there is no need to evaluate the distance between the two areas.
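The distance check in S603 can be sketched as follows. This is a minimal illustration, not the implementation of the embodiment: the `Region` type, the function name `is_different_subject`, and the choice of measuring separation as the Euclidean distance between region centers are all assumptions, since the text only states that the two areas must be a predetermined distance apart.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """Axis-aligned main subject area in pixels (hypothetical type)."""
    x: float  # left edge
    y: float  # top edge
    w: float  # width
    h: float  # height

    def center(self) -> tuple[float, float]:
        return (self.x + self.w / 2.0, self.y + self.h / 2.0)


def is_different_subject(auto: Region, manual: Region,
                         image_width: int, ratio: float = 0.10) -> bool:
    """Return True when the two areas are at least ratio * image_width apart.

    Assumption: separation is the distance between the region centers.
    """
    threshold = ratio * image_width  # e.g. 10% of 640 pixels = 64 pixels
    (ax, ay), (mx, my) = auto.center(), manual.center()
    return math.hypot(ax - mx, ay - my) >= threshold


# Usage with a 640x480 input image
auto_area = Region(100, 100, 50, 50)
far_manual = Region(300, 100, 50, 50)   # centers 200 pixels apart
near_manual = Region(120, 100, 50, 50)  # centers 20 pixels apart
far_result = is_different_subject(auto_area, far_manual, image_width=640)
near_result = is_different_subject(auto_area, near_manual, image_width=640)
```

A detector-based check (face, human body, or object detection) could replace this function entirely when it already shows that the two areas belong to different subjects.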

In S604, the CPU 151 selects the input image acquired in S201 of FIG. 2 and the main subject area determined in S202 as a data set for negative learning. This flowchart then ends.

In S605, the CPU 151 determines whether a cancel instruction has been given through the operation switch 156. If so, the process proceeds to S606; if not, this flowchart ends. Here, a cancel instruction is an operation that deletes the main subject determination result output in S204 of FIG. 2 and returns the imaging apparatus 100 to a state in which no main subject has been determined. For example, if the apparatus provides a function that lets the user instruct it to redo the determination of the main subject area, that instruction corresponds to a cancel instruction. Alternatively, if the user redoes the half-press of the release button, which is the operation for fixing the subject targeted for focus adjustment, that repeated half-press corresponds to a cancel instruction. A cancel instruction in S605 means that the main subject determined in S202 of FIG. 2 differs from the main subject desired by the user, so the process proceeds to S606.

In S606, the CPU 151 selects the input image acquired in S201 of FIG. 2 and the main subject area determined in S202 as a data set for negative learning. This flowchart then ends.
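The branching of S601 to S606 can be summarized in a short sketch. The names `Label`, `select_on_user_input`, and the `differs` callback are hypothetical, introduced only for illustration; the callback stands in for whatever S603 test is used (distance or detector based).

```python
from enum import Enum
from typing import Callable, List, Optional, Tuple, TypeVar

R = TypeVar("R")  # type of a main subject area; left abstract in this sketch


class Label(Enum):
    POSITIVE = "positive"  # data set for positive learning
    NEGATIVE = "negative"  # data set for negative learning


def select_on_user_input(auto_region: R,
                         manual_region: Optional[R],
                         cancelled: bool,
                         differs: Callable[[R, R], bool]) -> List[Tuple[R, Label]]:
    """Return the (area, label) pairs selected for one input image."""
    selected: List[Tuple[R, Label]] = []
    if manual_region is not None:                           # S601: manual instruction
        selected.append((manual_region, Label.POSITIVE))    # S602
        if differs(auto_region, manual_region):             # S603
            selected.append((auto_region, Label.NEGATIVE))  # S604
    elif cancelled:                                         # S605: cancel instruction
        selected.append((auto_region, Label.NEGATIVE))      # S606
    return selected
```

In this sketch an image with neither a manual instruction nor a cancel instruction yields no data set, matching the path where the flowchart simply ends.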

FIG. 7 is a flowchart showing the flow of the image selection process in S210 of FIG. 2.

In S701, the CPU 151 determines, for each data set in the group buffered in the RAM 154, whether the subject determined as the main subject in that data set is the same as the one determined when the shooting instruction was given. If it is, the process proceeds to S702; otherwise it proceeds to S703. If, before the shooting instruction, a subject other than the one that was the main subject at the time of the shooting instruction had been determined as the main subject, that earlier determination was probably wrong. A data set in which such a different subject was determined as the main subject is therefore useful as a data set for negative learning.

In S702, the CPU 151 selects data sets that satisfy the positive learning condition from the group of data sets in which the same subject as at the time of the shooting instruction was determined as the main subject. The positive learning condition here is that the difference between the display time recorded in the data set and the execution time of this step is less than a predetermined time. The predetermined time may be determined based on the time lag from when the user checks the main subject determination result output to the monitor display 150 in S204 of FIG. 2 until the user issues a shooting instruction with the operation switch 156 in S607. More specifically, the predetermined time is set to, for example, 2 seconds. When multiple data sets satisfy the condition at once, the similarity between the input images in those data sets is calculated, and if the similarity is equal to or greater than a predetermined value, one of the similar data sets is excluded from selection. In this exclusion, data sets with a larger difference between the display time and the execution time of this step may be excluded preferentially. An SAD value, a histogram difference, or the like can be used as the similarity.
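A minimal sketch of the S702 selection follows, under several assumptions: the `DataSet` type, the flattened grayscale image representation, and the SAD threshold value are hypothetical. Because a smaller SAD means more similar images, "similarity equal to or greater than a predetermined value" is expressed here as "SAD at or below a threshold".

```python
from dataclasses import dataclass
from typing import List, Sequence


@dataclass
class DataSet:
    image: Sequence[int]  # flattened grayscale pixels (hypothetical representation)
    display_time: float   # time in seconds at which the result was displayed


def sad(a: Sequence[int], b: Sequence[int]) -> int:
    """Sum of absolute differences; a smaller value means more similar images."""
    return sum(abs(p - q) for p, q in zip(a, b))


def select_recent_distinct(datasets: List[DataSet], now: float,
                           max_age: float = 2.0,
                           sad_threshold: int = 10) -> List[DataSet]:
    """Keep data sets newer than max_age and drop near-duplicate images."""
    # Condition: display time is less than max_age seconds before this step.
    recent = [d for d in datasets if now - d.display_time < max_age]
    # Visit the newest first so that, among similar images, older ones are dropped.
    recent.sort(key=lambda d: now - d.display_time)
    kept: List[DataSet] = []
    for d in recent:
        if all(sad(d.image, k.image) > sad_threshold for k in kept):
            kept.append(d)
    return kept


buffered = [
    DataSet([0, 0, 0, 0], 9.5),
    DataSet([1, 0, 0, 0], 9.0),          # nearly identical to the first image
    DataSet([100, 100, 100, 100], 8.5),  # clearly different scene
    DataSet([0, 0, 0, 0], 5.0),          # older than 2 seconds
]
chosen = select_recent_distinct(buffered, now=10.0)
```

With these inputs the second data set is dropped as a near-duplicate and the fourth as too old, leaving the first and third.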

In S703, the CPU 151 selects data sets that satisfy the negative learning condition from the group of data sets in which a subject different from the one at the time of the shooting instruction was determined as the main subject. When multiple data sets satisfy the condition at once, exclusion is performed in the same manner as in S608. Once the processing of at least one of S702 and S703 has been performed, the CPU 151 ends this flowchart.
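The S701 branch amounts to splitting the buffered data sets by whether their main subject matches the one at the time of the shooting instruction. The sketch below assumes each buffered entry carries a hypothetical subject identifier; how subject identity is established is not specified here and is left to the caller.

```python
from typing import Hashable, List, Tuple, TypeVar

D = TypeVar("D")  # a buffered data set; its contents are not needed here


def partition_by_subject(buffered: List[Tuple[Hashable, D]],
                         subject_at_shooting: Hashable) -> Tuple[List[D], List[D]]:
    """Split (subject_id, data set) pairs into S702 and S703 candidates."""
    same: List[D] = []   # candidates for positive learning (S702)
    other: List[D] = []  # candidates for negative learning (S703)
    for subject_id, dataset in buffered:
        if subject_id == subject_at_shooting:
            same.append(dataset)
        else:
            other.append(dataset)
    return same, other


positive_candidates, negative_candidates = partition_by_subject(
    [("person", "frame1"), ("car", "frame2"), ("person", "frame3")],
    subject_at_shooting="person",
)
```

Each group would then be filtered by its own learning condition and by the similarity-based exclusion before being used for learning.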

Then, in S212 of FIG. 2, the main subject determination circuit 162 performs reinforcement learning using the data sets selected in S210 and S212.

In S212, the CPU 151 also deletes all of the data sets buffered in the RAM 154 in S206 of FIG. 2.

As described above, performing additional learning based on user operations makes it possible to adjust the parameters of the main subject determination circuit 162 according to the user's preferences. In addition, because highly similar data sets are excluded during data set selection, overfitting to a specific scene can be suppressed.

(Learning method)
Next, a specific learning method will be described. In this embodiment, the coupling coefficients are adjusted by supervised learning. In supervised learning, a test pattern is presented, the output value of a neuron is actually computed, and the coupling coefficient w_M^{LS}(n, u, v) is corrected based on the relationship between that output value and the teacher signal (the desired value that the neuron should output). In the learning of this embodiment, the coupling coefficients of the feature detection layer in the final layer are corrected using the least squares method, and those of the feature detection layers in the intermediate layers are corrected using the error backpropagation method (see Non-Patent Document 1 for details of coupling coefficient correction methods such as the least squares method and error backpropagation).

In this embodiment, for learning in advance, many specific patterns that should be detected and many patterns that should not be detected are prepared as test patterns for learning. For additional learning, the test patterns to be learned are selected from the buffer by the method described above. Each test pattern consists of one set of an image and a teacher signal. Data sets selected for positive learning are used as specific patterns that should be detected, and data sets selected for negative learning are used as patterns that should not be detected.

When the tanh function is used as the activation function and a specific pattern that should be detected is presented, a teacher signal is given so that the output becomes 1 for the neurons in the region of the final layer's feature detection cell plane where the specific pattern exists. That is, a positive reward is given. Conversely, when a pattern that should not be detected is presented, a teacher signal is given so that the output becomes -1 for the neurons in the region of that pattern. That is, a negative reward is given.
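The teacher signal described above can be sketched as a target map over the final feature detection cell plane. The function name, the half-open region convention, and the use of `None` for neurons outside the region (meaning "no target in this sketch") are assumptions; the text only specifies the targets inside the region.

```python
from typing import List, Optional, Tuple


def teacher_signal(height: int, width: int,
                   region: Tuple[int, int, int, int],
                   positive: bool) -> List[List[Optional[float]]]:
    """Build per-neuron targets for a plane of height x width neurons.

    region is (top, left, bottom, right), half-open in rows and columns.
    Neurons inside the region get +1.0 (positive learning) or -1.0
    (negative learning), matching the output range of tanh.
    """
    top, left, bottom, right = region
    value = 1.0 if positive else -1.0
    return [[value if top <= r < bottom and left <= c < right else None
             for c in range(width)]
            for r in range(height)]


positive_targets = teacher_signal(3, 4, (0, 1, 2, 3), positive=True)
negative_targets = teacher_signal(3, 4, (0, 1, 2, 3), positive=False)
```

A training loop would compare only the non-`None` entries with the actual neuron outputs when correcting the coupling coefficients.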

The method described above constructs a CNN for determining the main subject from a two-dimensional image. In actual determination, the calculation is performed using the coupling coefficients w_M^{LS}(n, u, v) constructed by learning, and if the output of a neuron on the feature detection cell plane of the final layer is equal to or greater than a predetermined value, the main subject is determined to exist at that position.
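The final decision step is a threshold test on the outputs of the last feature detection cell plane. In the sketch below the function name and the example threshold of 0.5 are assumptions; the text only says "a predetermined value".

```python
from typing import List, Sequence, Tuple


def find_main_subject(output_plane: Sequence[Sequence[float]],
                      threshold: float) -> List[Tuple[int, int]]:
    """Return (row, column) positions whose neuron output is >= threshold."""
    return [(r, c)
            for r, row in enumerate(output_plane)
            for c, value in enumerate(row)
            if value >= threshold]


# Example plane of tanh outputs in the range [-1, 1]
plane = [[-0.9, 0.2],
         [0.8, -0.1]]
positions = find_main_subject(plane, threshold=0.5)
```

Here only the neuron at row 1, column 0 reaches the threshold, so the main subject is taken to exist at that position.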

In this embodiment, a configuration in which the imaging apparatus 100 includes the main subject determination circuit 162 and also performs the selection of data sets for learning and the reinforcement learning has been described as an example, but the invention is not limited to this configuration.

For example, the present invention can also be applied to a system in which a smartphone or tablet terminal with a camera function is connected by wireless communication to an external device such as a server or an edge computer.

FIG. 8 shows a system consisting of a smartphone and a server. The smartphone 801 transmits the data sets buffered in its internal RAM to the server 802, and the server 802 performs reinforcement learning to obtain the coupling coefficients for the main subject determination circuit inside the smartphone 801. The server 802 then transmits the obtained coupling coefficients to the smartphone 801, and the smartphone 801 sets the received coupling coefficients in its main subject determination circuit. Here, the process of selecting data sets for positive learning and negative learning from the data sets buffered in the RAM of the smartphone 801 may be performed by either the smartphone 801 or the server 802. If the smartphone 801 performs the selection, it simply transmits the selected data sets to the server 802. If the server 802 performs the selection, the smartphone 801 transmits to the server 802 all of the data sets buffered in the RAM together with the user operation information stored in association with those data sets. The images included in the data sets and the user operation information are assumed to be associated with each other on the time axis. With this configuration, the server 802 can select the data sets because the smartphone 801 sends it both the data sets and the user operation information.

Alternatively, the smartphone 801 may lack a main subject determination circuit, with the server 802 including one instead. In that case, the smartphone 801 transmits input images to the server 802 in real time, and the server 802 determines the main subject and transmits the result to the smartphone 801. The server 802 performs the reinforcement learning and updates the coupling coefficients of its own main subject determination circuit. In this case as well, the process of selecting data sets for positive learning and negative learning from the data sets buffered in the RAM of the smartphone 801 may be performed by either the smartphone 801 or the server 802.

Furthermore, for positive learning, an input image obtained by shooting may be used instead of a data set buffered in the RAM 154 before the shooting instruction. This is because the main subject is highly likely to have been selected correctly at the time of shooting. In this case, the data set for positive learning is a data set that includes the image recorded by shooting, whereas the data sets for negative learning are data sets buffered before the shooting instruction.

As described above, according to this embodiment, repeating ordinary camera operations makes it possible to determine the main subject in accordance with the user's preferences. Furthermore, by evaluating the relationship between the main subject determination result before the shooting instruction and the user's operation at that time, main subject determination results that the user did not intend can be identified, and data for negative learning can be selected efficiently.

(Other embodiments)
The present invention can also be realized by processing in which a program that implements one or more functions of the above-described embodiment is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

The invention is not limited to the above embodiment, and various changes and modifications can be made without departing from the spirit and scope of the invention. Accordingly, the claims are appended to make the scope of the invention public.

100: imaging apparatus, 101: lens unit, 141: image sensor, 142: imaging signal processing circuit, 143: imaging control circuit, 151: CPU, 152: image processing circuit, 162: main subject determination circuit

Claims

An image processing apparatus comprising selecting means for selecting, based on a user input, from among images in which a main subject has been determined using determining means that determines a main subject from an acquired image, an image to be used as data that gives a negative reward in learning for the determining means.

The image processing apparatus according to claim 1, wherein the selecting means selects, based on the user input, from among the images in which the main subject has been determined using the determining means, an image to be used as data that gives a positive reward in the learning for the determining means.

The image processing apparatus according to claim 1 or 2, wherein, when a first main subject specified based on the user input and a second main subject determined using the determining means are different subjects, the selecting means selects the image in which the second main subject was determined as the data that gives the negative reward.

The image processing apparatus according to claim 2, wherein, when a first main subject specified based on the user input and a second main subject determined using the determining means are different subjects, the selecting means selects the image in which the second main subject was determined as the data that gives the negative reward, and when the first main subject and the second main subject are the same subject, the selecting means selects the image in which the second main subject was determined as the data that gives the positive reward.

The image processing apparatus according to any one of claims 1 to 4, wherein, when the user input for fixing the subject targeted for focus adjustment is redone, the selecting means selects an image in which the main subject was determined using the determining means before the input was redone as the data that gives the negative reward.

The image processing apparatus according to claim 1 or 2, wherein, when a first main subject that had been determined using the determining means at the time of a user input indicating a shooting instruction and a second main subject that had been determined using the determining means before the user input indicating the shooting instruction are different subjects, the selecting means selects the image in which the second main subject was determined as the data that gives the negative reward.

The image processing apparatus according to claim 2, wherein, when a first main subject that had been determined using the determining means at the time of a user input indicating a shooting instruction and a second main subject that had been determined using the determining means before the user input indicating the shooting instruction are different subjects, the selecting means selects the image in which the second main subject was determined as the data that gives the negative reward, and when the first main subject and the second main subject are the same subject, the selecting means selects the image in which the second main subject was determined as the data that gives the positive reward.

The image processing apparatus according to any one of claims 1 to 7, further comprising calculating means for calculating a similarity between a plurality of images, wherein, for a plurality of images whose similarity calculated by the calculating means is equal to or greater than a predetermined value, the selecting means selects only an image chosen from among the plurality of images as the data that gives the negative reward.

The image processing apparatus according to any one of claims 1 to 8, further comprising the determining means.

The image processing apparatus according to any one of claims 1 to 9, further comprising learning means for performing learning for the determining means, wherein the learning means performs the learning using the data that gives the negative reward selected by the selecting means.

The image processing apparatus according to claim 9, wherein the determining means uses a result of learning acquired by wireless communication from learning means of an external device that performs learning for the determining means, and the learning means performs the learning using the data that gives the negative reward selected by the selecting means.

The image processing apparatus according to any one of claims 1 to 8, wherein an external device capable of wireless communication with the image processing apparatus includes the determining means and learning means for performing learning for the determining means, and the learning means performs the learning using the data that gives the negative reward selected by the selecting means.

The image processing apparatus according to any one of claims 1 to 12, further comprising imaging means for capturing the image in which the main subject is determined.

A method of controlling an image processing apparatus, comprising a selecting step of selecting, based on a user input, from among images in which a main subject has been determined using determining means that determines a main subject from an acquired image, an image to be used as data that gives a negative reward in learning for the determining means.

A program for causing a computer to execute the steps of the control method according to claim 14.

A computer-readable storage medium storing a program for causing a computer to execute the steps of the control method according to claim 14.