JP7292178B2 - Region dividing device, region dividing method and region dividing program

Info

Publication number: JP7292178B2
Application number: JP2019192523A
Authority: JP
Inventors: 智之吉山
Original assignee: Secom Co Ltd
Current assignee: Secom Co Ltd
Priority date: 2019-10-23
Filing date: 2019-10-23
Publication date: 2023-06-16
Anticipated expiration: 2039-10-23
Also published as: JP2021068141A

Description

The present invention relates to a technique for classifying a data group, such as an image, into classes, such as subjects, and dividing the data group into label regions.

For purposes such as automatically recognizing the scene captured in an image, techniques have been researched and developed that divide the image into regions, one for each of the objects or parts captured in it, and recognize the object or part captured in each region. Hereinafter, a captured object or part is called a subject. Region division accompanied by recognition of subjects is referred to as semantic segmentation.

In recent years in particular, techniques that perform this division and recognition on the basis of learning have been actively studied. For example, Non-Patent Document 1 below describes preparing a large number of training images in which a class representing the subject is assigned in advance to every pixel of each region divided by subject, and having a computer perform machine learning on these training images. The information assigned in advance is called annotation. When an arbitrary image is input to the trained model generated by this learning, a class is output for each pixel of the input image. That is, the input image is divided, subject by subject, into regions labeled with classes (label regions).

In addition, datasets consisting of training images and annotations have recently been published and made available. Basically, a trained model that has learned from more diverse data can perform region division with higher accuracy, so a larger training dataset is desirable.

“Fully Convolutional Networks for Semantic Segmentation”, Jonathan Long, Evan Shelhamer, and Trevor Darrell (Proceedings of the IEEE conference on computer vision and pattern recognition, 2015)

However, the diversity of the training data and the mixture of annotations assigned under different criteria tend to make region division results unstable. The mixture of annotations assigned under different criteria also lowers learning accuracy.

For example, if images of black carpet and images of similar-looking asphalt are both used for learning, the region of a floor covered with black carpet is sometimes correctly divided as a floor region, but sometimes part or all of it is erroneously divided as a road region. This is an example in which the diversity of learning makes the region division result prone to variation.

As another example, when an image of a baseball field is input, the turf region in the image may be divided as a grass region in some cases and as part of a playing field region in others. This is an example in which the mixture of annotations assigned under different criteria makes the region division result prone to variation. In published datasets, different assignment criteria are sometimes mixed: in one training image of a baseball field, the turf region is given a label representing "grass" and the dirt region a label representing "soil", whereas in another training image of a baseball field, the combined region of turf and dirt is given a label representing "playing field". In other words, both grass and playing field are correct answers for the turf region. The result therefore tends to vary with differences in the input image.

From another perspective, the existence of multiple correct answers, as in the turf example, makes learning harder to converge. The mixture of annotations assigned under different criteria is therefore also a cause of reduced learning accuracy.

Note that the above problems can arise not only with two-dimensional images but also with spatio-temporal data formed from time-series images, three-dimensional data such as point clouds, and the like.

The present invention has been made in view of the above problems, and its object is to provide a region division technique capable of suppressing variation in region division results.

(1) A region dividing device according to the present invention is a device that performs classification processing for classifying a data group distributed in a predetermined space into a plurality of classes and divides the space into label regions identified by the classes, the device comprising: model storage means that stores, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, a trained model that has been trained using a training data group, correct classes given in advance for the training data group, and a training attention class group given as a subset of the correct classes; attention class setting means that sets the attention class group for the data group in a plurality of ways; and region dividing means that, for each of the plurality of settings of the attention class group made by the attention class setting means, obtains the label regions using the classifier and selects, from among the region division results of the space based on those label regions, the one satisfying a predetermined condition on the degree of matching between the space and the label regions constituting the region division result, as the region division result for the data group.

(2) In the region dividing device described in (1) above, the attention class setting means may set the attention class group in a plurality of ways successively, by repeatedly adding a supplementary class to the attention class group to set a new attention class group, and the region dividing means may select, as the region division result for the data group, the label regions whose size is equal to or greater than a predetermined reference from among the label regions output by the classifier for the plurality of attention class groups.

(3) In the region dividing device described in (2) above, the trained model may further have been trained, as an estimator that receives the data group and the attention class group and estimates the supplementary class, using the correct supplementary class for the training data group.

(4) A region dividing method according to the present invention is a method that performs classification processing for classifying a data group distributed in a space into a plurality of classes and divides the space into label regions identified by the classes, the method comprising: a step of preparing, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, a trained model that has been trained using a training data group, correct classes given in advance for the training data group, and a training attention class group given as a subset of the correct classes; an attention class setting step of setting the attention class group for the data group in a plurality of ways; and a region dividing step of, for each of the plurality of settings of the attention class group made in the attention class setting step, obtaining the label regions using the classifier and selecting, from among the region division results of the space based on those label regions, the one satisfying a predetermined condition on the degree of matching between the space and the label regions constituting the region division result, as the region division result for the data group.

(5) A region dividing program according to the present invention is a program that causes a computer to perform classification processing for classifying a data group distributed in a space into a plurality of classes and to divide the space into label regions identified by the classes, the program causing the computer to function as: model storage means that stores, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, a trained model that has been trained using a training data group, correct classes given in advance for the training data group, and a training attention class group given as a subset of the correct classes; attention class setting means that sets the attention class group for the data group in a plurality of ways; and region dividing means that, for each of the plurality of settings of the attention class group made by the attention class setting means, obtains the label regions using the classifier and selects, from among the region division results of the space based on those label regions, the one satisfying a predetermined condition on the degree of matching between the space and the label regions constituting the region division result, as the region division result for the data group.
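The successive setting of attention class groups in (2) can be sketched as a loop. This is only an illustration under stated assumptions: `classify` and `estimate_supplement` are hypothetical callables standing in for the trained classifier and estimator, `OTHER` is a hypothetical marker for pixels outside the attention class group, and the "predetermined reference" is assumed here to be a minimum fraction of labeled pixels.

```python
from typing import Callable, FrozenSet, List, Optional, Tuple

OTHER = -1  # hypothetical marker for pixels outside the attention class group
LabelMap = List[List[int]]


def labeled_fraction(label_map: LabelMap) -> float:
    """Fraction of pixels that belong to some label region (not OTHER)."""
    total = sum(len(row) for row in label_map)
    labeled = sum(1 for row in label_map for c in row if c != OTHER)
    return labeled / total if total else 0.0


def segment_by_growing_attention(
    image: object,
    classify: Callable[[object, FrozenSet[int]], LabelMap],
    estimate_supplement: Callable[[object, FrozenSet[int]], Optional[int]],
    initial: FrozenSet[int],
    min_fraction: float = 0.95,  # assumed form of the size reference
    max_steps: int = 20,         # safety bound, not taken from the source
) -> Tuple[FrozenSet[int], LabelMap]:
    """Add one supplementary class at a time until the label regions
    cover at least min_fraction of the image."""
    attention = initial
    label_map = classify(image, attention)
    for _ in range(max_steps):
        if labeled_fraction(label_map) >= min_fraction:
            break  # label regions are large enough: select this result
        supplement = estimate_supplement(image, attention)
        if supplement is None or supplement in attention:
            break  # nothing left to add
        attention = attention | {supplement}
        label_map = classify(image, attention)
    return attention, label_map
```

The loop returns the first division result that meets the size reference, or the last one obtained if no supplementary class remains.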

According to the present invention, variation in region division results can be suppressed.

FIG. 1 is a block diagram showing the schematic configuration of an image processing system according to an embodiment of the present invention.
FIG. 2 is a schematic functional block diagram of the classifier in the first embodiment of the present invention.
FIG. 3 is a schematic diagram explaining the process of generating the synthesized feature.
FIG. 4 is a schematic functional block diagram of the image processing system according to the first embodiment operating as a learning device.
FIG. 5 is a schematic flow diagram of the operation of the image processing system according to the first embodiment during learning.
FIG. 6 is a schematic functional block diagram of the image processing system according to the first embodiment operating as a region dividing device.
FIG. 7 is a schematic flow diagram of the operation of the image processing system according to the first embodiment in region division processing.
FIG. 8 is a schematic diagram explaining a processing example of the region division processing of the image processing system according to the first embodiment.
FIG. 9 is a schematic flow diagram of the operation of the image processing system according to the second embodiment in region division processing.

Embodiments of the present invention (hereinafter referred to as embodiments) will be described below with reference to the drawings.

<<First Embodiment>>
This embodiment is an image processing system 1 in which an imaging unit and a display unit are connected to a computer; the image processing system 1 operates as a region dividing device and as its learning device.

The region dividing device according to the present invention performs classification processing for classifying target data distributed in a predetermined space into a plurality of classes and divides the space into label regions identified by the classes; the region dividing device shown as an example in this embodiment divides an image of a monitored space into regions. That is, in this embodiment, the target data to be classified are the pixels constituting a two-dimensional image, and the space to be divided is a two-dimensional space of finite size corresponding to the image. In addition to the classifier that performs the classification processing, the region dividing device includes an estimator concerning the classes contained in the target data. The learning device trains the classifier and the estimator used by the region dividing device. In this embodiment, the estimator is configured to share part of the classifier, and hereinafter the term classifier is basically used in a broad sense that includes the estimator. That is, unless otherwise noted, classifier hereinafter means the integrated configuration of the narrow-sense classifier described above and the estimator.

[Configuration of image processing system 1]
FIG. 1 is a block diagram showing the schematic configuration of the image processing system 1. The image processing system 1 comprises an imaging unit 2, a communication unit 3, a storage unit 4, an image processing unit 5, and a display unit 6.

The imaging unit 2 is a camera that acquires an image, which is a collection of target data; in this embodiment it is a surveillance camera. The imaging unit 2 is connected to the image processing unit 5 via the communication unit 3, captures the monitored space at predetermined time intervals to generate images, and sequentially inputs the generated images to the image processing unit 5. For example, the imaging unit 2 is installed on a wall of the indoor monitored space with a predetermined fixed field of view overlooking the monitored space, and captures the monitored space at a frame period of 1 second to generate color images. The imaging unit 2 may generate monochrome images instead of color images.

The communication unit 3 is a communication circuit, one end of which is connected to the image processing unit 5 and the other end to the imaging unit 2 and the display unit 6. The communication unit 3 acquires images from the imaging unit 2 and inputs them to the image processing unit 5. The communication unit 3 also receives the results of classification into classes and of division into label regions from the image processing unit 5 and outputs them to the display unit 6.

The imaging unit 2, communication unit 3, storage unit 4, image processing unit 5, and display unit 6 are connected as appropriate in a form suited to where each unit is installed. For example, when the imaging unit 2 is installed remotely from the communication unit 3 and the image processing unit 5, the imaging unit 2 and the communication unit 3 can be connected over an Internet line. The communication unit 3 and the image processing unit 5 can be connected by a bus. A LAN (Local Area Network), various cables, and the like can also be used as connection means.

The storage unit 4 is a memory device such as a ROM (Read Only Memory) or RAM (Random Access Memory) and stores various programs and data. For example, the storage unit 4 stores training data and information on the classifier, which is a trained model, and exchanges this information with the image processing unit 5. That is, information used to train the classifier, information required for the classification processing, information generated in the course of that processing, and the like are input and output between the storage unit 4 and the image processing unit 5.

The image processing unit 5 is composed of an arithmetic device such as a CPU (Central Processing Unit), DSP (Digital Signal Processor), MCU (Micro Control Unit), or GPU (Graphics Processing Unit). By reading programs from the storage unit 4 and executing them, the image processing unit 5 operates as various processing means and control means; as necessary, it reads various data from the storage unit 4 and stores generated data in the storage unit 4. For example, the image processing unit 5 trains and generates the classifier and stores the generated classifier in the storage unit 4 via the communication unit 3. The image processing unit 5 also uses the classifier to classify the pixels constituting the image from the imaging unit 2 into classes and divides the image.

The display unit 6 is a liquid crystal display, an organic EL (Electro-Luminescence) display, or the like, and displays the classification results and division results input from the image processing unit 5 via the communication unit 3.

[Classifier configuration]
FIG. 2 is a schematic functional block diagram of the broad-sense classifier described above. In this description of the classifier's configuration, to distinguish the narrow-sense classifier from the estimator, the narrow-sense classifier is called simply the classifier and the broad-sense classifier is called the classifier/estimator. The classifier/estimator receives an image and attention class information. As the classifier, it classifies each pixel of the image into a class and outputs the result; as the estimator, it estimates a supplementary class and outputs the result. The attention class information is information that specifies the attention class group.

Here, the attention class group is the set of classes for which the classifier obtains label regions, and basically consists of one or more classes. That is, attention classes are set from among a plurality of classes predetermined as classification targets, and the set whose elements are the one or more classes set as attention classes is the attention class group. The classifier identifies regions of the image to be divided that correspond to attention classes as label regions, whereas it does not identify regions corresponding to classes other than the attention classes as label regions of any specific class. As a result, regions that do not correspond to the attention class group are treated, for example, as regions of an "other" class.
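The output convention described here can be illustrated with a small sketch. It does not reproduce the trained classifier, which receives the attention class group as an input; it only shows, for a class map that is assumed to be already known, which pixels keep their class and which fall into the "other" class. The name `restrict_to_attention` and the `OTHER` marker are hypothetical.

```python
from typing import List, Set

OTHER = "other"  # hypothetical name for the class of unlabeled regions


def restrict_to_attention(class_map: List[List[str]], attention: Set[str]) -> List[List[str]]:
    """Keep the class of pixels whose class is in the attention class group;
    treat every other pixel as belonging to the 'other' class."""
    return [[c if c in attention else OTHER for c in row] for row in class_map]


# Example: a 2x2 image whose true classes are assumed known.
class_map = [["floor", "wall"], ["floor", "chair"]]
result = restrict_to_attention(class_map, {"floor", "wall"})
```

In this example the chair pixel is not assigned to any label region because "chair" is not in the attention class group.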

A supplementary class is a class added to the current attention class group as a new attention class. That is, a new attention class group is set by adding the supplementary class to the current attention class group. Updating the attention class group by adding supplementary classes basically enlarges the label regions corresponding to the attention class group so that they approach the entire region of the image; the estimator estimates a supplementary class with which the label regions corresponding to the new attention class group suitably approach the entire image. The supplementary class can be, for example, the one class among those other than the attention classes whose addition to the attention class group most reduces the size of the "other" class region that is not made a label region.

In this embodiment, the classifier/estimator is composed of a multilayer network of the kind used in deep learning and can be modeled, for example, by a convolutional neural network (CNN). The network constituting the classifier/estimator of this embodiment includes a feature extraction unit 400, an attention class information compression unit 401, a feature synthesis unit 402, a class classification unit 403, and a supplementary class estimation unit 404. Of these, the feature extraction unit 400, attention class information compression unit 401, and feature synthesis unit 402 are shared by the classifier and the estimator. The configuration in which the class classification unit 403 is connected to this shared portion constitutes the classifier, and the configuration in which the supplementary class estimation unit 404 is connected to the shared portion constitutes the estimator.

In the configuration of the classifier, the feature extraction unit 400, feature synthesis unit 402, and class classification unit 403 form a network structure of multiple layers connected in series; this portion is called the classifier main part. Similarly, in the configuration of the estimator, the feature extraction unit 400, feature synthesis unit 402, and supplementary class estimation unit 404 form a network structure of multiple layers connected in series; this portion is called the estimator main part.

The feature extraction unit 400, class classification unit 403, and supplementary class estimation unit 404 are composed of convolution layers, activation functions, pooling layers, and the like. For example, the classifier main part repeatedly computes feature maps by convolving the features of neighboring pixels, thereby aggregating relationships with surrounding pixels, and then identifies the class of each pixel of the original image. In this embodiment, the feature synthesis unit 402 is inserted partway through the network structures of the classifier main part and the estimator main part, dividing each into two parts, one before and one after the feature synthesis unit 402. The former part is the feature extraction unit 400, shared by the classifier main part and the estimator main part; the latter part is the class classification unit 403 in the classifier main part and the supplementary class estimation unit 404 in the estimator main part.

The feature extraction unit 400 receives an image and computes features from it. The feature computation performed by the feature extraction unit 400 may extend only to an intermediate level of the feature maps generated over multiple levels, and the processing performed by the class classification unit 403 and the supplementary class estimation unit 404 may include generating the feature maps from that intermediate level onward.

The class classification unit 403 classifies pixels into classes on the basis of the synthesized feature generated by the feature synthesis unit 402 and divides the image into regions. This region division outputs the label regions corresponding to the attention classes. That is, the class classification unit 403 classifies each pixel and outputs the class for pixels classified into an attention class; this yields, in the image, label regions each consisting of the group of pixels belonging to that class, and the image is divided into label regions. Specifically, adjacent pixels classified into the same class form one section of that class's label region. Parts of the image not classified into any attention class are output by the class classification unit 403 as, for example, the "other" class described above.
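How adjacent pixels of the same class form the sections of a label region can be sketched with a flood fill over a per-pixel class map. The source does not state which adjacency is used, so 4-connectivity is an assumption here, and `label_sections` is a hypothetical helper name.

```python
from collections import deque
from typing import List, Tuple


def label_sections(class_map: List[List[int]], target: int) -> List[List[Tuple[int, int]]]:
    """Group pixels of class `target` into sections of mutually adjacent pixels.
    Adjacency is 4-connectivity (up, down, left, right), which is an assumption."""
    height = len(class_map)
    width = len(class_map[0]) if height else 0
    seen = [[False] * width for _ in range(height)]
    sections = []
    for y in range(height):
        for x in range(width):
            if seen[y][x] or class_map[y][x] != target:
                continue
            seen[y][x] = True
            queue = deque([(y, x)])
            section = []
            while queue:  # breadth-first flood fill from the seed pixel
                cy, cx = queue.popleft()
                section.append((cy, cx))
                for ny, nx in ((cy - 1, cx), (cy + 1, cx), (cy, cx - 1), (cy, cx + 1)):
                    if (0 <= ny < height and 0 <= nx < width
                            and not seen[ny][nx] and class_map[ny][nx] == target):
                        seen[ny][nx] = True
                        queue.append((ny, nx))
            sections.append(section)
    return sections


sections = label_sections([[1, 1, 0], [0, 0, 0], [1, 0, 1]], target=1)
```

Each returned list is one section; together the sections of a class make up that class's label region.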

The supplementary class estimation unit 404 estimates the supplementary class on the basis of the synthesized feature generated by the feature synthesis unit 402. For example, on the basis of the synthesized feature, the supplementary class estimation unit 404 estimates as the supplementary class the class with the largest area among the classes that are contained in the image but not in the attention class group. Alternatively, the supplementary class estimation unit 404 can be configured to output a vector storing, for each class, a score indicating how likely the class is to become an attention class, and to select the supplementary class on the basis of those scores. When the attention class group already includes every class contained in the image, the unit returns the estimation result "no supplementary class".
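The "largest area" rule in the example above can be made concrete when the per-pixel classes are known, for instance when deriving a correct answer from annotations. The sketch below is not the estimator itself, which infers the supplementary class from the synthesized feature; `largest_missing_class` is a hypothetical name, and the alphabetical tie-break is an assumption.

```python
from collections import Counter
from typing import List, Optional, Set


def largest_missing_class(class_map: List[List[str]], attention: Set[str]) -> Optional[str]:
    """Return the class with the most pixels among classes present in the image
    but absent from the attention class group, or None ('no supplementary class')."""
    counts = Counter(c for row in class_map for c in row if c not in attention)
    if not counts:
        return None
    # sorted() makes ties deterministic: the alphabetically first class wins.
    return max(sorted(counts), key=lambda c: counts[c])


class_map = [["floor", "floor", "wall"], ["floor", "chair", "wall"]]
supplement = largest_missing_class(class_map, {"floor"})
```

Here "wall" covers two pixels and "chair" one, so "wall" is chosen; once every class in the image is in the attention class group the function returns `None`.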

The attention class information compression unit 401 is composed of fully connected layers and the like; it obtains a low-dimensional representation of the attention class information and outputs it to the feature synthesis unit 402. The attention class information is set according to what appears in the image and its scene. The number of classes appearing in an input image is often much smaller than the total number of classes the classifier can classify, and classes exhibit co-occurrence: for example, indoor classes are unlikely to appear in outdoor images, and walls and floors tend to appear together indoors. The attention class information can therefore be expressed with relatively low-dimensional information, and the attention class information compression unit 401 performs this dimension-reducing conversion. For example, the attention class information compression unit 401 receives attention class information expressed with a number of variables corresponding to all the predefined classes, compresses its dimensionality, and outputs attention class information expressed with fewer variables.
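A minimal sketch of this dimension reduction follows, assuming the attention class information is a multi-hot vector with one variable per predefined class and the compression is a single fully connected layer with a ReLU activation. The class count, code size, activation, and random untrained weights are all illustrative assumptions; in the described system the weights would be learned.

```python
import random
from typing import Iterable, List, Tuple


def multi_hot(attention: Iterable[int], num_classes: int) -> List[float]:
    """One variable per predefined class: 1.0 if it is an attention class."""
    vec = [0.0] * num_classes
    for c in attention:
        vec[c] = 1.0
    return vec


def make_layer(n_in: int, n_out: int, seed: int = 0) -> Tuple[List[List[float]], List[float]]:
    """Random stand-in for trained weights (n_out rows of n_in values) and biases."""
    rng = random.Random(seed)
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def dense_relu(x: List[float], weights: List[List[float]], bias: List[float]) -> List[float]:
    """Fully connected layer followed by ReLU: max(0, W x + b)."""
    return [max(0.0, sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(weights, bias)]


NUM_CLASSES = 150  # hypothetical number of predefined classes
CODE_DIM = 16      # hypothetical size of the compressed representation
weights, bias = make_layer(NUM_CLASSES, CODE_DIM)
code = dense_relu(multi_hot({3, 12, 40}, NUM_CLASSES), weights, bias)
```

The result `code` has `CODE_DIM` values regardless of how many classes are predefined, which is the point of the compression.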

The feature synthesis unit 402 synthesizes the attention class information compressed by the attention class information compression unit 401 with the features extracted by the feature extraction unit 400 to generate a synthesized feature, and inputs it to the class classification unit 403 and the supplementary class estimation unit 404.

FIG. 3 is a schematic diagram explaining the process of generating the synthesized feature amount. It schematically shows the data inside the classifier/estimator of FIG. 2. On the left side of the figure, the image 100 input to the classifier, the synthesized feature amount 110 generated by the feature amount synthesis unit 402, and the class classification result 140 output from the classifier are arranged in an order corresponding to the feature amount extraction unit 400, the feature amount synthesis unit 402, and the class classification unit 403 of FIG. 2, which form the main part of the classifier. The right side of the figure shows the input nodes 120 of the attention class information compression unit 401, the attention class information 121 input to those nodes, and the output nodes 130 of the attention class information compression unit 401.

For the classifier main-part data arranged on the left side of FIG. 3, the x-axis is taken along the width of the image 100, the y-axis along its height, and the dimension corresponding to the feature channels is represented by the c-axis. The image 100 measures W_I pixels in the x direction and H_I pixels in the y direction. The feature amount map generated by the feature amount extraction unit 400 measures W_F pixels in the x direction and H_F pixels in the y direction, and its size in the c direction, that is, its number of channels, is C. Note that the x and y sizes of the feature amount map generally do not match those of the image 100; usually W_F < W_I and H_F < H_I.

The attention class information 121 illustrated in FIG. 3 indicates, for each of N predetermined classes, whether that class is an attention class. For example, all classes that the classifier can classify are set as the N classes.

Specifically, the attention class information 121 is an N-dimensional vector in which an attention class is represented by the value "1", and a class that is not an attention class, and is therefore to be replaced with the "other" class in the class classification result, is represented by "0". The attention class information 121 in the figure is a concrete example generated for an image of an indoor scene. For example, the classes "person" and "floor" appear in the image and are therefore attention classes, so the corresponding vector elements are set to "1"; the class "road", by contrast, does not appear in the image and is therefore not an attention class, so its element is set to "0".

The input nodes 120 of the attention class information compression unit 401 correspond one-to-one to the elements of the attention class information 121, so there are N of them, whereas the number D of output nodes 130 is less than N. The attention class information compression unit 401 compresses the dimensionality of the attention class information 121 supplied to the input nodes 120 and outputs the compressed attention class information from the output nodes 130. That is, the attention class information 121 is compressed from an N-dimensional vector to a D-dimensional vector. FIG. 3 shows, as the attention class information compression unit 401, a configuration in which the input nodes 120 and the output nodes 130 are fully connected.
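As a rough illustration of the fully connected configuration of FIG. 3, the N-to-D compression can be sketched as a single linear layer. The function name and the weight and bias values are hypothetical; in practice the parameters would be learned, and any activation function is omitted here because the description does not specify one.

```python
def compress_attention_info(attention_flags, weights, biases):
    """Map an N-dim 0/1 attention vector to D values (D < N) with one dense layer.

    weights: D rows of N floats; biases: D floats (placeholders for learned values).
    """
    return [
        sum(w * a for w, a in zip(row, attention_flags)) + b
        for row, b in zip(weights, biases)
    ]


# Example with N = 4 classes compressed to D = 2 values.
flags = [1, 0, 1, 0]
weights = [[0.5, 1.0, -0.5, 2.0], [1.0, 1.0, 1.0, 1.0]]
biases = [0.0, 0.5]
compressed = compress_attention_info(flags, weights, biases)
```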

The feature amount synthesis unit 402 receives the compressed attention class information from the output nodes 130 of the attention class information compression unit 401 and combines it with the feature amount map received from the feature amount extraction unit 400 to generate the synthesized feature amount 110. The synthesized feature amount 110 is obtained by concatenating the attention class information, represented as a D-dimensional vector, to each C-dimensional feature vector specified by a pair of x and y coordinates in the feature amount map before synthesis. It has the same width and height as the feature amount map before synthesis and has (C + D) channels. For example, channels 1 to C of the synthesized feature amount 110 hold the feature amount map before synthesis, and channels (C + 1) to (C + D) are set to the output values of the 1st to D-th output nodes 130 of the attention class information compression unit 401.

In this embodiment, the same attention class information is set for every (x, y) coordinate. The synthesized feature amount 110 therefore has a structure in which each of the D elements of the attention class information is replicated in the x and y directions, expanding it to the same W_F × H_F size as the output of the feature amount extraction unit 400, and the result is stacked on the feature amount map before synthesis. In other words, while the feature values in channels 1 to C can differ depending on the coordinates (x, y), in this embodiment each of channels (C + 1) to (C + D) holds a value common to all coordinates (x, y).
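The replication and stacking described above can be sketched with plain nested lists as follows. The function name and the height-by-width-by-channel layout are assumptions made for illustration; an actual implementation would more likely use a tensor library.

```python
def synthesize_features(feature_map, compressed):
    """Append the D compressed values to every C-dim feature vector.

    feature_map: H_F x W_F x C nested lists; compressed: list of D floats.
    Returns an H_F x W_F x (C + D) nested list.
    """
    return [[list(vec) + list(compressed) for vec in row] for row in feature_map]


# H_F = 2, W_F = 3, C = 2, D = 1
fmap = [
    [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]],
    [[7.0, 8.0], [9.0, 10.0], [11.0, 12.0]],
]
merged = synthesize_features(fmap, [0.5])
```

Every position receives the same trailing D values, while the first C channels remain position-dependent.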

The image processing system 1 is described below: first its configuration and operation as a learning device, and then its configuration and operation as a region dividing device.

[Configuration as a learning device]
FIG. 4 is a schematic functional block diagram of the image processing system 1 according to the first embodiment as a learning device. The storage unit 4 functions as learning data storage means 40 and learning model storage means 41, and the image processing unit 5 functions as correct label replacement means 50, learning attention class generation means 51, learning supplementary class generation means 52, and learning means 53.

The learning data storage means 40 stores a large number of images serving as learning target data, together with the correct classes assigned to those images in advance. The learning images and their corresponding correct classes are stored in the learning data storage means 40 before the learning process begins.

The learning model storage means 41 stores the learning model of the classifier. The stored learning model is updated as the learning means 53 carries out the learning process. When learning is complete, the learning model storage means 41 stores the trained model of the classifier and functions as the model storage means 42 described later. As described above, in this embodiment the classifier is configured as a network modeled by, for example, a CNN, and the learning model storage means 41 stores, as the classifier, information including the filter coefficients of the filters that make up the network and the network structure.

The learning means 53 trains the learning model stored in the learning model storage means 41. In this training, a learning image and learning attention class information are input to the feature amount extraction unit 400 and the attention class information compression unit 401 of the classifier's learning model, respectively. The learning model is then updated based on the error of the class classification result output by the class classification unit 403 relative to its correct answer, and the error of the supplementary class output by the supplementary class estimation unit 404 relative to its correct answer. The output error used in training the classifier can be a value that integrates these two errors, for example by addition, and training can be controlled based on that integrated error.
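One way to integrate the two errors by addition is sketched below. The use of cross-entropy for both terms, the averaging over pixels, and the `weight` parameter are assumptions made for illustration; the description only states that the two errors are integrated, for example by addition.

```python
import math


def cross_entropy(probs, target_index):
    """Negative log-likelihood of the correct class (assumed error measure)."""
    return -math.log(probs[target_index])


def integrated_error(pixel_probs, pixel_targets, supp_probs, supp_target, weight=1.0):
    """Sum of the mean per-pixel classification error and the supplementary class error.

    pixel_probs: one probability list per pixel; pixel_targets: correct class per pixel.
    supp_probs: probabilities over supplementary classes; supp_target: correct index.
    weight: hypothetical balancing factor (1.0 gives plain addition).
    """
    seg = sum(cross_entropy(p, t) for p, t in zip(pixel_probs, pixel_targets))
    seg /= len(pixel_targets)
    return seg + weight * cross_entropy(supp_probs, supp_target)
```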

The learning means 53 receives the learning image used for training, the learning attention class information, and the correct answers for both the class classification and the supplementary class. Of these, the learning image is input to the learning means 53 from the learning data storage means 40. The learning attention class generation means 51, the correct label replacement means 50, and the learning supplementary class generation means 52 input the learning attention class information, the correct answer for class classification, and the correct answer for the supplementary class, respectively.

The correct label replacement means 50 reads the correct classes stored in the learning data storage means 40 and performs replacement processing on them. The replacement processing treats some of the correct classes, that is, some of the classes present in the learning image, as if they were absent. For example, the classes to be replaced can be chosen at random from among the correct classes of each learning image. In doing so, rather than replacing each class independently with a fixed probability, it is preferable to generate a random number between 0 and the number of classes contained in the correct label, inclusive, and then randomly select and replace that many classes, so that the number of replaced classes is uniformly distributed. Alternatively, instead of replacing at random, the number of times each class has been replaced may be counted, and classes with fewer replacements may be preferentially selected from among the correct classes of each learning image.
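The two-stage sampling described above (first the number of classes to replace, then which ones) can be sketched as follows. The function name and the `rng` parameter are hypothetical.

```python
import random


def choose_classes_to_replace(present_classes, rng=random):
    """Draw k uniformly from 0..len(present_classes), then pick k classes at random.

    Sampling k first makes every replacement count from 0 to the number of
    present classes equally likely.
    """
    k = rng.randint(0, len(present_classes))  # inclusive on both ends
    return set(rng.sample(sorted(present_classes), k))


# Example: an image whose correct label contains four classes.
chosen = choose_classes_to_replace({0, 1, 2, 3}, random.Random(0))
```

Replacing each class independently with a fixed probability would instead give a binomial distribution over the count, so images with no or all classes replaced would be rare.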

The processing of the correct label replacement means 50 is explained using an image showing a person, a floor, a wall, and a window as an example. Class information can be represented by a vector whose elements correspond one-to-one to all classes; this vector is called a class vector. Letting N be the number of all classes predefined as classification targets, the correct class of each pixel can be expressed as an N-dimensional class vector in which the element corresponding to that class is set to "1" and all other elements are set to "0". For example, the correct class of a pixel showing a person is expressed as an N-dimensional vector whose person element is "1" and whose other elements are "0", the correct class of a floor pixel is expressed as an N-dimensional vector whose floor element is "1" and whose other elements are "0", and the correct classes of wall and window pixels are expressed in the same way. The correct label replacement means 50 receives, for each pixel, the N-dimensional vector representing the correct class, and outputs for each pixel an (N + 1)-dimensional class vector obtained by adding an element corresponding to the "other" class. For example, when the correct label replacement means 50 treats everything except the floor, that is, the person, the wall, and the window, as not of interest, then in the (N + 1)-dimensional class vector of each pixel containing a person, wall, or window, the person, wall, and window elements are replaced with "0" and the element corresponding to the "other" class is set to "1". In the (N + 1)-dimensional class vector of each pixel containing the floor, the element corresponding to the "other" class is set to "0".
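The person/floor/wall/window example can be traced with a small sketch. It assumes, for brevity, that the correct label map stores one class index per pixel rather than one-hot vectors; the function name and the class indices are hypothetical.

```python
def replace_labels(label_map, replaced, n_classes):
    """Return per-pixel (N + 1)-dim one-hot vectors; index N is the "other" class.

    label_map: rows of class indices in 0..N-1; replaced: set of classes to hide.
    """
    other = n_classes
    out = []
    for row in label_map:
        out_row = []
        for cls in row:
            vec = [0] * (n_classes + 1)
            # Replaced classes are moved to the "other" slot; the rest keep their class.
            vec[other if cls in replaced else cls] = 1
            out_row.append(vec)
        out.append(out_row)
    return out


PERSON, FLOOR, WALL, WINDOW = 0, 1, 2, 3  # illustrative indices, N = 4
labels = [[PERSON, FLOOR], [WALL, WINDOW]]
replaced_map = replace_labels(labels, {PERSON, WALL, WINDOW}, 4)
```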

The replacement processing yields, for the attention classes, the original correct label regions based on the correct classes before replacement, and, for all other classes, correct label regions that have been replaced with the "other" class. These replaced correct label regions are supplied from the correct label replacement means 50 to the learning means 53 as the correct answer for class classification.

The learning attention class generation means 51 receives the replaced correct label regions from the correct label replacement means 50 and generates learning attention class information from them. It takes the classes corresponding to the replaced correct label regions, that is, the correct classes that remain after the replacement processing by the correct label replacement means 50, as the learning attention class group, and generates learning attention class information representing that group. One N-dimensional class vector is generated as learning attention class information for each replaced correct label image. For example, the learning attention class information is expressed as a class vector in which the elements corresponding to attention classes are "1" and all other elements are "0".
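Deriving the N-dimensional learning attention class vector from a replaced label map can be sketched as below. The input layout (per-pixel (N + 1)-dimensional one-hot vectors with the "other" class last) and the function name are assumptions made for illustration.

```python
def attention_class_vector(replaced_onehot, n_classes):
    """Collect the classes that survive replacement into one N-dim 0/1 vector.

    replaced_onehot: rows of (N + 1)-dim one-hot vectors; index N ("other") is ignored.
    """
    flags = [0] * n_classes
    for row in replaced_onehot:
        for vec in row:
            for c in range(n_classes):
                if vec[c] == 1:
                    flags[c] = 1
    return flags


# Replaced label map in which only class 1 (e.g. "floor") remains, N = 4.
example_map = [
    [[0, 0, 0, 0, 1], [0, 1, 0, 0, 0]],
    [[0, 0, 0, 0, 1], [0, 0, 0, 0, 1]],
]
attention = attention_class_vector(example_map, 4)
```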

The learning supplementary class generation means 52 reads the original correct label regions from the learning data storage means 40, receives the learning attention class information from the learning attention class generation means 51, and based on these generates the learning supplementary class, which is the correct answer for the supplementary class. Comparing the correct classes before replacement, known from the original correct label regions, with the attention classes indicated by the learning attention class information reveals which correct classes were replaced by the correct label replacement means 50 and are therefore not included in the attention classes. Among those classes, the learning supplementary class generation means 52 inputs the one with the largest area as a correct label region to the learning means 53 as the learning supplementary class. When there is no supplementary class, the learning supplementary class generation means 52 outputs an indication to that effect. Its output can be an (N + 1)-dimensional class vector whose (N + 1)-th element serves as a "no supplementary class" flag.
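The largest-area rule and the (N + 1)-dimensional output can be sketched as follows. The sketch assumes the original label map stores one class index per pixel and measures area as a pixel count; the function name is hypothetical.

```python
from collections import Counter


def learning_supplementary_class(label_map, attention_flags):
    """Build the (N + 1)-dim correct vector for the supplementary class.

    label_map: rows of original class indices; attention_flags: N-dim 0/1 vector.
    The last element is the "no supplementary class" flag.
    """
    n = len(attention_flags)
    # Pixel counts of classes present in the image but absent from the attention group.
    areas = Counter(
        cls for row in label_map for cls in row if attention_flags[cls] == 0
    )
    target = [0] * (n + 1)
    if areas:
        target[max(areas, key=areas.get)] = 1
    else:
        target[n] = 1
    return target
```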

[Operation as a learning device]
Before dividing input images into regions, the image processing system 1 trains the classifier. This training is described below. Training of the classifier in the image processing system 1 uses the learning images and learning attention class information, the replaced correct label regions, which are the correct data for class classification, and the learning supplementary classes, which are the correct data for the supplementary class. Based on the integrated error described above, the parameters of the learning model are updated repeatedly until the error converges, using a known optimization technique such as error backpropagation. This training produces both a classifier (in the narrow sense) capable of the classification processing that finds the label regions corresponding to the attention classes, and an estimator that estimates supplementary classes so as to shrink the "other class" region, that is, the region not covered by the label regions of the attention classes in the region division result. In addition to training the feature amount extraction unit 400, the class classification unit 403, and the supplementary class estimation unit 404, the training of the classifier includes training the attention class information compression unit 401 using the learning attention class information.

FIG. 5 is a schematic flowchart of the operation of the image processing system 1 during learning.

When the start of the learning operation is instructed, the image processing unit 5 reads the initial parameter values of the classifier's learning model from the learning model storage means 41 (step S1) and starts the learning operation for that model (steps S2 to S10).

The image processing unit 5 acquires a learning image and its correct label from the learning data storage means 40 (step S2). Functioning as the correct label replacement means 50, the image processing unit 5 randomly selects classes contained in the correct label and replaces them with the "other class" (step S3).

Functioning as the learning attention class generation means 51, the image processing unit 5 generates learning attention class information from the label generated by the correct label replacement means 50 (step S4). For example, the replaced correct label regions generated by the correct label replacement means 50 may consist of N + 1 classes including the "other class", but the learning attention class generation means 51 outputs a class vector over the N classes that exclude the "other class".

Next, functioning as the learning supplementary class generation means 52, the image processing unit 5 uses the original correct label and the learning attention class information to set, as the learning supplementary class, the class with the largest area in the correct label among the classes that are contained in the original correct label but not in the learning attention class information (step S5). If the correct label replacement means 50 replaced no classes, so that the learning attention class information contains every class in the original correct label, a special class meaning "no learning supplementary class" is set. That is, the learning supplementary class generation means 52 generates a class vector with N + 1 elements and outputs a vector in which the element corresponding to the class that should be added to the attention classes is "1" and all other elements are "0".

Functioning as the learning means 53, the image processing unit 5 updates the parameters of the learning model based on the learning image, the replaced correct label regions, the learning attention class information, and the learning supplementary class. The learning means 53 first inputs the learning image and the learning attention class information to the learning model and performs region division and supplementary class estimation with the parameters in effect at that time (step S6). It then compares the resulting region division with the correct label to obtain an error (step S7), and obtains the error between the estimated supplementary class and the learning supplementary class (step S8). The learning means 53 updates the parameters of the learning model, for example by stochastic gradient descent, so as to reduce these errors (step S9).

In the learning operation, the image processing system 1 repeats steps S2 to S9 with different learning data as long as the error has not converged ("NO" at step S10). When the error satisfies a predetermined convergence condition ("YES" at step S10), the system stores the parameters of the trained model (that is, the classifier) in the learning model storage means 41 and ends the learning operation (step S11).

The classifier (in the narrow sense) generated by the above training receives, together with the image (data group), attention class information (an attention class group) that instructs it to restrict the class classification result to the attention classes, and can thereby suppress the classification of pixels (data) into classes other than the attention classes. Because the attention class group used in training is a subset of the correct classes, the classifier can perform highly accurate class classification in which variation due to the diversity of the learning data is suppressed (for example, leaving no room to misclassify a floor as a road). In addition or alternatively, even when the learning data mixes annotations made under different assignment criteria, it can perform highly accurate class classification in which variation due to that mixture is suppressed (for example, leaving no room to classify turf as a playground, so that it is classified as grass). Training also converges more easily. The classifier can therefore perform highly accurate class classification (region division) in which variation due to the diversity of the learning data and the mixture of assignment criteria is suppressed.

The estimator generated by the above training can estimate with high accuracy a class that is present in the input image (data group) but absent from the input attention class group, that is, a supplementary class that should be added to the attention class group.

[Configuration as a region dividing device]
FIG. 6 is a schematic functional block diagram of the image processing system 1 according to the first embodiment as a region dividing device. The storage unit 4 functions as model storage means 42, and the image processing unit 5 functions as attention class setting means 54 and region dividing means 55. The communication unit 3 cooperates with the image processing unit 5 and functions as image input means 30 and region information output means 31. Because the main processing in the attention class setting means 54 and the region dividing means 55 is performed using the classifier, FIG. 6 shows them together as a classifier 56 for convenience.

The model storage means 42 stores the classifier generated by training. In this embodiment, the model storage means 42 is identical to the learning model storage means 41 described above for the learning device, and the classifier is the trained model described above.

The image input means 30 sequentially acquires images from the imaging unit 2 and inputs them to the classifier 56.

The region dividing means 55 receives an image (input image) from the image input means 30 and attention class information from the attention class setting means 54, performs class classification on each pixel of the input image, and outputs the label regions of the attention classes obtained from the result. Specifically, the input image and the attention class information are input to the feature amount extraction unit 400 and the attention class information compression unit 401 of the classifier, respectively, and the division into label regions is obtained from the class classification result output by the class classification unit 403. For each of the multiple attention class group settings made by the attention class setting means 54, the region dividing means 55 obtains label regions with the classifier. From among those settings, it selects as the region division result for the input image the label regions obtained under a setting for which the degree of matching between the label regions and the two-dimensional space corresponding to the input image satisfies a predetermined condition.

The attention class setting means 54 receives an image from the image input means 30 and sets multiple attention class groups to be input to the region dividing means 55. In this embodiment, the attention class setting means 54 sets these attention class groups successively, each time adding a supplementary class to the current attention class group to form a new one. It repeats this successive setting until the class classification processing in the region dividing means 55 satisfies a predetermined condition. Each attention class group that is set is supplied to the region dividing means 55 as attention class information. The supplementary class is estimated using the classifier: the input image and the attention class information are input to the feature amount extraction unit 400 and the attention class information compression unit 401 of the classifier, respectively, and the supplementary class estimation result is obtained from the supplementary class estimation unit 404.
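The successive setting of attention class groups can be sketched as a simple loop. The callables `estimate_supplementary` and `is_done` stand in for the supplementary class estimation unit 404 and the predetermined condition; their names and signatures, the `max_steps` guard, and the early stop when no new class is proposed are all assumptions made for illustration.

```python
def expand_attention_group(initial_group, estimate_supplementary, is_done, max_steps=100):
    """Grow an attention class group one supplementary class at a time.

    estimate_supplementary(group) -> class index or None (hypothetical estimator).
    is_done(group) -> True when the stopping condition is met (hypothetical).
    Returns every attention class group that was set, in order.
    """
    group = set(initial_group)
    history = [frozenset(group)]
    for _ in range(max_steps):
        if is_done(group):
            break
        supp = estimate_supplementary(group)
        if supp is None or supp in group:
            break  # nothing new to add
        group.add(supp)
        history.append(frozenset(group))
    return history


# Stub estimator: pretends the image contains classes 1, 4 and 7.
present = {1, 4, 7}

def stub_estimator(group):
    missing = sorted(present - group)
    return missing[0] if missing else None

groups = expand_attention_group({4}, stub_estimator, lambda g: False)
```

Returning the full history reflects that region division is run for each setting before one result is selected.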

The region information output means 31 outputs the label regions obtained by the region dividing means 55 to the display unit 6. For example, it generates an image color-coded by label region and outputs it to the display unit 6.

[Operation as a region dividing device]
FIG. 7 is a schematic flowchart of the operation of the image processing system 1 in region division processing.

When the image processing system 1 starts region division processing, the imaging unit 2 sequentially outputs images of the monitored space captured at predetermined time intervals. The image processing unit 5, in cooperation with the communication unit 3, repeats the operation shown in the flowchart of FIG. 7 each time an image is received from the imaging unit 2.

In this operation, the communication unit 3 first functions as the image input means 30 and, on receiving an image, inputs it to the image processing unit 5 (step S20).

The image processing unit 5 inputs the received image (the input image) to the feature amount extraction unit 400 of the classifier and calculates the feature amount of the input image (step S21). In the subsequent processing, the region dividing process is performed on one input image a plurality of times while changing the attention classes, and the feature amount calculated here is reused each time. Reusing the feature amount in this way, instead of recalculating it for every region dividing process, reduces the amount of calculation of the image processing unit 5.

Meanwhile, the initial value of the attention class group set by the attention class setting means 54 is input to the attention class information compression unit 401 of the classifier (step S22). As the initial value, the attention class setting means 54 sets, for example, an attention class group consisting of only one class. The attention class setting means 54 sets, for example, P attention class groups (1≤P), where P is determined in advance; for example, P=3. For example, each of the top P classes among the supplementary classes obtained at the output of the supplementary class estimation unit 404 when no attention class is set can be used as an initial attention class group. Alternatively, region division can be performed with each of the N classes taken one at a time as an attention class group, and, among those N attention class groups, the P classes with the smallest "other class" area in the region division result can each be used as an initial attention class group.

The attention class setting means 54 and the region dividing means 55 share the classifier, which is integrated with the estimator, and repeat the updating of the attention class group and the region dividing process so that a suitable region division result is obtained (steps S23 to S29).

So that some errors in the supplementary class estimation result can be tolerated, the attention class setting means 54 searches for a suitable attention class group while holding a plurality of attention class groups.

The supplementary class is calculated by the supplementary class estimation unit 404. Here, the supplementary class estimation unit 404 is assumed to output, as supplementary class information, a vector storing for each class a score representing how likely that class is to be a supplementary class. The attention class setting means 54 inputs the P attention class groups it holds to the classifier one after another and obtains supplementary class information for each attention class group. It then selects each of the Q classes (1≤Q<N) with the highest scores in the supplementary class information as a supplementary class for that attention class group (step S23). Q is determined in advance; for example, Q=3.

The attention class setting means 54 selects the Q supplementary classes of step S23 one at a time and adds the selected class to the current attention class group to create a trial attention class group (step S24), and the image processing unit 5 inputs the created trial attention class group to the classifier and performs the region dividing process (step S25). This series of processes is repeated until all Q supplementary classes have been processed for the current attention class group ("NO" in step S26). Note that supplementary class estimation may be performed together with the region dividing process in step S25, and the estimation result can be used as the result of step S23 for the attention class group updated later in step S28.

Ｑ個の補足クラス全てについて処理が完了すると（ステップＳ２６にて「ＹＥＳ」の場合）、現在の注目クラス群の１つについて試行注目クラス群がＱ通り作成される。 When all the Q supplementary classes have been processed ("YES" in step S26), Q trial attention class groups are created for one of the current attention class groups.

The image processing unit 5 performs the processing of steps S24 to S26 in turn for the P current attention class groups ("NO" in step S27). When the processing has been completed for all P current attention class groups ("YES" in step S27), P×Q trial attention class groups and their region division results have been obtained. The attention class setting means 54 updates the attention class groups by replacing the P attention class groups it holds with the first to P-th ranked trial attention class groups in ascending order of the "other class" area in the region division result (step S28). As a result of this update, each of the P attention class groups is a current attention class group with a supplementary class added. In this process, the label region corresponding to the attention class group basically expands and approaches the entire region of the image.

The image processing unit 5 repeats the operation of steps S23 to S28, that is, generating P×Q trial attention class groups each with one supplementary class added and selecting P attention class groups from among them, until the termination condition is satisfied ("NO" in step S29).

The termination condition concerns the degree of matching between the label region corresponding to the attention class group and the two-dimensional space corresponding to the input image. For example, the process is determined to be finished if the index representing this degree no longer changes with repetition, or if its change is equal to or less than a predetermined value. Specifically, the termination condition can be that the area of the "other class" in the image no longer decreases, that the decrease is equal to or less than a predetermined value, that "no supplementary class" is estimated for all of the attention class groups held, or that a class already included in the attention class group is estimated as the supplementary class so that no new attention class group is obtained. As another example, the process is determined to be finished if the index representing the degree of matching exceeds or falls below a predetermined standard. Specifically, the termination condition can be that the area of the "other class" label region becomes equal to or less than a predetermined ratio of the area of the input image, or that the number of classes included in the attention class group exceeds a preset maximum number. Alternatively, two or more of the above conditions may be combined, with the process finishing when any one of them is satisfied.
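One way to combine several of the termination conditions listed above is sketched below in Python. The function name `should_stop`, its parameters, and the default thresholds are assumptions chosen for illustration; the embodiment only states that such conditions can be used, not how they are parameterized.

```python
def should_stop(
    prev_other_area: int,
    curr_other_area: int,
    image_area: int,
    num_attention_classes: int,
    *,
    min_decrease: int = 0,           # hypothetical "predetermined value"
    max_other_ratio: float = 0.0,    # hypothetical "predetermined ratio"
    max_classes: int = 10,           # hypothetical "preset maximum number"
) -> bool:
    """Return True when any one of the combined termination conditions holds."""
    # The "other class" area no longer decreases, or decreases too little.
    stalled = (prev_other_area - curr_other_area) <= min_decrease
    # The "other class" area is a small enough fraction of the image.
    covered = curr_other_area <= max_other_ratio * image_area
    # The attention class group has grown beyond the allowed size.
    too_many = num_attention_classes > max_classes
    return stalled or covered or too_many
```

With the defaults shown, the search stops as soon as an iteration fails to shrink the "other class" area or the whole image has been labelled.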

When the termination condition is satisfied ("YES" in step S29), the image processing unit 5 determines that the search for the attention class group is complete and selects, from the resulting P attention class groups, the one that satisfies a predetermined condition on the degree of matching described above as the final result. In the present embodiment, the label regions corresponding to the attention class group with the smallest "other class" area among the P attention class groups are output by the region information output means 31 as the region division result of the input image (step S30).
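The search of steps S22 to S30 can be pictured as a beam search that keeps P candidate attention class groups. The following Python sketch is illustrative only. The callables `supplement_scores` and `other_area` are hypothetical stand-ins for the supplementary class estimation unit 404 and for the "other class" area of the region division result, and the removal of duplicate trial groups, the tie-breaking rule, and `max_rounds` are assumptions not stated in the embodiment.

```python
from typing import Callable, Dict, FrozenSet, List

Group = FrozenSet[str]


def search_attention_groups(
    initial_groups: List[Group],
    supplement_scores: Callable[[Group], Dict[str, float]],  # stand-in for unit 404
    other_area: Callable[[Group], int],  # "other class" area for a given group
    q: int = 3,
    max_rounds: int = 10,
) -> Group:
    groups = list(initial_groups)          # S22: the P groups currently held
    p = len(groups)
    best_area = min(other_area(g) for g in groups)
    for _ in range(max_rounds):
        trials = set()
        for group in groups:               # loop closed by S27
            scores = supplement_scores(group)
            top_q = sorted(scores, key=scores.get, reverse=True)[:q]  # S23
            for cls in top_q:              # S24 to S26
                if cls not in group:
                    trials.add(group | {cls})
        if not trials:                     # no new attention class group
            break
        # S28: rank trial groups by ascending "other class" area.
        ranked = sorted(trials, key=lambda g: (other_area(g), sorted(g)))
        new_best = other_area(ranked[0])
        if new_best >= best_area:          # S29: area no longer decreases
            break
        groups, best_area = ranked[:p], new_best
    return min(groups, key=other_area)     # S30
```

In a real system `other_area` would run the classifier on the cached feature amount; here it is left abstract so that the control flow of the search stands out.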

FIG. 8 is a schematic diagram for explaining a processing example of the region dividing process of the image processing system 1. An image 200 in FIG. 8(a) is an input image in which a wall 201, a window 202, and a person 203 are photographed together with a floor 204 covered with a black carpet.

An image 210 in FIG. 8(b) shows the label regions obtained for the input image 200 by the prior art. An image 220 in FIG. 8(c) shows the label regions obtained for the input image 200 by the image processing system 1 of the present embodiment, and images 220a to 220c show the course of the region dividing process by which those label regions are obtained.

In the prior-art result shown in FIG. 8(b), the regions in which the wall 201, the window 202, and the person 203 are photographed are correctly divided as a label region 211 of the wall class, a label region 212 of the window class, and a label region 213 of the person class, respectively. However, the region in which the floor 204 is photographed is split into a label region 214 correctly divided as the floor class and a label region 215 erroneously divided as the road class.

In the processing example shown in FIG. 8(c), the number P of attention class groups held is set to 1 to simplify the explanation. In this process, the image 220a shows the output of the class classification unit 403 of the classifier when a class vector in which the "floor" class has the value "1" and all other classes have the value "0" is input as the attention class information. A label region 224 of the floor class is obtained as the label region, and the remaining region, indicated by hatching, is the "other class". The supplementary class estimation unit 404 takes the single class with the highest score in the supplementary class information as the supplementary class; at this stage the supplementary class is "wall", and new attention class information in which the "floor" and "wall" classes have the value "1" is set.

The image 220b shows the output of the class classification unit 403 when that attention class information is input. The label region 224 of the floor class and a label region 221 of the wall class are obtained as label regions, and the remaining region, indicated by hatching, is the "other class". At this point the supplementary class estimation unit 404 estimates the supplementary class to be "window", and new attention class information in which the "floor", "wall", and "window" classes have the value "1" is set.

画像２２０ｃは当該注目クラス情報を入力したときのクラス分類部４０３の出力を表しており、ラベル領域として床のクラスのラベル領域２２４、壁のクラスのラベル領域２２１に加え窓のクラスのラベル領域２２２が得られ、それ以外の領域は斜線で示す「その他クラス」となっている。このときの補足クラス推定部４０４は補足クラスを“人”とし、“床”、“壁”、“窓”、“人”のクラスが値“１”である新たな注目クラス情報が設定される。 An image 220c represents the output of the class classification unit 403 when the attention class information is input, and includes a label region 224 for the floor class, a label region 221 for the wall class, and a label region 222 for the window class as label regions. is obtained, and the other areas are shaded "other classes". At this time, the supplementary class estimating unit 404 sets the supplementary class as "person", and sets new attention class information in which the classes of "floor", "wall", "window", and "person" have a value of "1". .

画像２２０は当該注目クラス情報を入力したときのクラス分類部４０３の出力を表しており、入力画像２００は壁のクラスのラベル領域２２１、窓のクラスのラベル領域２２２、人のクラスのラベル領域２２３および床のクラスのラベル領域２２４に分割される。この段階にて、「その他クラス」の領域がなくなることや、補足クラス推定部４０４の出力にて「追加なし」が設定されることといった終了条件が満たされ、領域分割処理が終了する。 An image 220 represents the output of the class classification unit 403 when the attention class information is input. The input image 200 includes a wall class label region 221, a window class label region 222, and a person class label region 223. and floor class label areas 224 . At this stage, the termination conditions such as that there is no "other class" area and that "no addition" is set in the output of the supplementary class estimation unit 404 are satisfied, and the area division processing is terminated.

In this process, classes that can appear in the input image are given as attention class information. For example, when the floor is an attention class, the part of the floor that was assigned to the road label region 215 in the image 210, because road has image features similar to those of floor, is more readily guided to the correct floor class, and misclassification is suppressed.

<<Second embodiment>>
An image processing system 1B according to the second embodiment of the present invention has the same configuration as that of the first embodiment shown in FIG. 1 and, as in the first embodiment, operates as a region dividing device that performs the region dividing process on an input image and as a learning device for that device. In the following description of the second embodiment, components similar to those of the first embodiment are given the same reference numerals and the description in the first embodiment is relied on. Where confusion with components of the first embodiment is to be avoided, reference numerals with "B" appended to those of the first embodiment are used. The differences from the first embodiment are mainly described.

画像処理システム１Ｂが第１の実施形態の画像処理システム１と基本的に異なる点は、分類器が第１の実施形態で定義した広義のものではなく狭義のものであり、補足クラスの推定器を要しないという点にある。つまり、第１の実施形態の分類器は広義のものとして、クラス分類を行う狭義の分類器と補足クラスを推定する推定器とが合わさったものであったが、第２の実施形態の分類器は当該推定器の機能を含まない。具体的には、第２の実施形態の分類器は図２において補足クラス推定部４０４を省略した構成である。 The basic difference between the image processing system 1B and the image processing system 1 of the first embodiment is that the classifier is not broadly defined in the first embodiment but narrowly defined, and the supplementary class estimator is not required. In other words, the classifier of the first embodiment is a combination of a narrow classifier for classifying classes and an estimator for estimating supplementary classes as a classifier in a broad sense, but the classifier of the second embodiment does not include the function of the estimator. Specifically, the classifier of the second embodiment has a configuration in which the supplementary class estimation unit 404 is omitted from FIG.

この分類器の構成に対応して、当該分類器の学習装置としての画像処理システム１Ｂでは、推定器の出力である補足クラスに対する正解データとする学習用補足クラスが不要となる。具体的には、画像処理システム１Ｂは学習装置として、図４にて学習用補足クラス生成手段５２を省略した構成である。 Corresponding to this configuration of the classifier, the image processing system 1B as a learning device for the classifier does not require a learning supplementary class as correct data for the supplementary class output from the estimator. Specifically, the image processing system 1B is configured as a learning device in which the learning supplementary class generating means 52 in FIG. 4 is omitted.

画像処理システム１Ｂにおける注目クラス設定手段５４Ｂは、注目クラス群を複数通り設定する。第１の実施形態では、或る注目クラス情報を分類器に入力しその出力に得られる補足クラスを用いて注目クラス情報を更新することで、複数通りの注目クラス群を逐次的に設定しているのに対し、本実施形態では複数通りの注目クラス群は逐次更新という形態に依らずに設定される。具体的には、注目クラス設定手段５４ＢはＮ個の全クラスを１個ずつ注目クラスとし当該１つの注目クラスからなる注目クラス群をＮ通り設定する。 The attention class setting means 54B in the image processing system 1B sets a plurality of attention class groups. In the first embodiment, by inputting certain attention class information into a classifier and updating the attention class information using supplementary classes obtained in the output, a plurality of attention class groups are sequentially set. On the other hand, in this embodiment, a plurality of attention class groups are set without depending on the form of sequential update. Specifically, the class-of-interest setting means 54B sets each of the N classes as a class of interest, and sets N groups of classes of interest each including the one class of interest.

画像処理システム１Ｂにおける領域分割手段５５Ｂは、注目クラス設定手段５４Ｂによる注目クラス群の複数通りの設定それぞれについて、分類器によりラベル領域を求め、当該ラベル領域に基づいて入力画像に対応する二次元空間を領域分割する。そして、当該領域分割処理で得られる複数通りの領域分割結果のうち、当該領域分割結果を構成するラベル領域と入力画像に対応する二次元空間との整合の度合いについて所定の条件を満たすものを入力画像についての領域分割結果として選択する。 The region dividing means 55B in the image processing system 1B obtains a label region using a classifier for each of the plurality of settings of the attention class group by the attention class setting means 54B, and based on the label region, creates a two-dimensional space corresponding to the input image. is segmented into regions. Then, among the plurality of region segmentation results obtained by the region segmentation process, those satisfying a predetermined condition regarding the degree of matching between the label regions constituting the region segmentation result and the two-dimensional space corresponding to the input image are input. Select as the segmentation result for the image.

図９は画像処理システム１Ｂの領域分割処理での動作に関する概略のフロー図である。 FIG. 9 is a schematic flow diagram of the operation of the image processing system 1B in the segmentation process.

画像処理システム１Ｂが領域分割処理を開始すると、撮影部２は所定時間おきに監視空間を撮影した画像を順次出力する。画像処理部５は通信部３と協働して、撮影部２から画像を受信するたびに図９のフロー図に示す動作を繰り返す。 When the image processing system 1B starts the region dividing process, the photographing unit 2 sequentially outputs images obtained by photographing the surveillance space at predetermined time intervals. The image processing unit 5 cooperates with the communication unit 3 and repeats the operation shown in the flowchart of FIG. 9 each time an image is received from the photographing unit 2 .

当該動作にてまず通信部３が画像入力手段３０として機能し、画像を受信すると当該画像を画像処理部５に入力する（ステップＳ４０）。 In this operation, the communication section 3 first functions as the image input means 30, and upon receiving an image, inputs the image to the image processing section 5 (step S40).

The image processing unit 5 inputs the received image (the input image) to the feature amount extraction unit 400 of the classifier and calculates the feature amount of the input image (step S41). As in the first embodiment, the feature amount calculated here is reused across the plurality of region dividing processes performed on one input image with different attention classes, which reduces the amount of calculation of the image processing unit 5.

注目クラス設定手段５４Ｂは、注目クラス情報として、クラスベクトルにて１クラスだけを注目クラス群に設定したものを生成し、これを注目クラス情報圧縮部４０１に入力する（ステップＳ４２）。領域分割手段５５Ｂは、ステップＳ４１で計算した特徴量を用いて分類器によりクラス分類処理を行い、ステップＳ４２にて設定された注目クラス群についてのラベル領域を求める（ステップＳ４３）。 The attention class setting means 54B generates attention class information in which only one class is set as an attention class group in the class vector, and inputs it to the attention class information compression section 401 (step S42). The region dividing means 55B performs class classification processing with a classifier using the feature amount calculated in step S41, and obtains a label region for the attention class group set in step S42 (step S43).

画像処理部５はＮ個の全クラスについてステップＳ４２，Ｓ４３を繰り返す（ステップＳ４４にて「ＮＯ」の場合）。これにより、注目クラス設定手段５４ＢはＮ通りの注目クラス群を設定し、領域分割手段５５Ｂは注目クラス群の当該Ｎ通りの設定それぞれについて分類器でラベル領域を求める。 The image processing unit 5 repeats steps S42 and S43 for all N classes (if "NO" in step S44). As a result, the attention class setting means 54B sets N attention class groups, and the area dividing means 55B obtains a label area for each of the N attention class groups using a classifier.

When the image processing unit 5 has finished the region dividing process with each of the classes taken as an attention class group ("YES" in step S44), it deletes, from the processing results, those corresponding to attention class group settings for which the result consisted only of the "other class" (step S45).

ステップＳ４５で残された領域分割結果には注目クラスについてラベル領域が存在する。領域分割手段５５Ｂは残った領域分割結果を組み合わせ、なるべくラベル領域間の重複がなく、かつ画像全体を埋め尽くすことができるようなラベル領域の組み合わせを作成する。この処理は、例えば次のような手法で行うことができる。 A label area exists for the target class in the area segmentation result left in step S45. The area division means 55B combines the remaining area division results to create a combination of label areas that has as little overlap between label areas as possible and can fill the entire image. This processing can be performed, for example, by the following method.

画像処理部５はステップＳ４５で残された各領域分割結果をオリジナルの領域分割結果（オリジナル結果）として保持する一方、当該各領域分割結果の複製を作成し探索用の領域分割結果（探索用結果）として保持する（ステップＳ４６）。 The image processing unit 5 retains each region segmentation result left in step S45 as the original region segmentation result (original result), while creating a copy of each region segmentation result to generate a region segmentation result for search (search result ) (step S46).

ステップＳ４７～Ｓ５３の処理は終了条件を満たすまで繰り返すループ処理である。その中で行われるステップＳ４７～Ｓ５２のループ処理は基本的に複数作成される探索用結果についてのものであり、探索用結果を例えばクラスの識別番号順に処理対象として順次設定して実行される。さらにその中で行われるステップＳ４７～Ｓ４９のループ処理は基本的に複数存在するオリジナル結果についてのものである。 The processing of steps S47 to S53 is a loop processing that is repeated until the end condition is satisfied. The loop processing of steps S47 to S52 performed therein is basically for a plurality of search results, and the search results are sequentially set as processing targets in the order of class identification numbers, for example, and executed. Further, the loop processing of steps S47 to S49 performed therein is basically for a plurality of original results.

オリジナル結果についてのループ処理は、画像処理部５が、オリジナル結果を、それらが含む注目クラスのラベル領域の面積が大きいものから順に１つずつ選択して（ステップＳ４７）、繰り返される。画像処理部５は選択したラベル領域と処理対象に設定した１つの探索用結果に含まれているラベル領域との重なり度合いを示す指標を計算する（ステップＳ４８）。ここでは当該指標としてＩｏＵ（Intersection over Union）を用いる。ＩｏＵは２つの領域の重複部分（Intersection）の面積をＩ、２つの領域の和領域（Union）の面積をＵとして、ＩｏＵ＝Ｉ／Ｕで与えられ、０～１の値を取り、０に近いほど２つの領域の重なり度合いが低いことを表す。 The loop processing for the original results is repeated after the image processing unit 5 selects the original results one by one in descending order of the area of the label region of the attention class included therein (step S47). The image processing unit 5 calculates an index indicating the degree of overlap between the selected label area and the label area included in one search result set as the processing target (step S48). Here, IoU (Intersection over Union) is used as the index. IoU is given by IoU=I/U, where I is the area of the intersection of the two regions and U is the area of the union of the two regions. The closer it is, the lower the degree of overlap between the two regions.
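The IoU index defined above can be computed directly from two pixel sets. The following Python sketch represents each label region as a set of (row, column) coordinates; that representation, and returning 0.0 when both regions are empty, are assumptions made for this example.

```python
from typing import Set, Tuple

Pixel = Tuple[int, int]


def iou(region_a: Set[Pixel], region_b: Set[Pixel]) -> float:
    """Intersection over Union of two label regions given as pixel sets."""
    union = region_a | region_b
    if not union:
        # Both regions are empty; treat the overlap as zero (assumption).
        return 0.0
    return len(region_a & region_b) / len(union)
```

A value close to 0 means the two regions barely overlap, which is the situation in which step S51 merges them.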

画像処理部５は、ＩｏＵが予め定めた閾値Ｔより大きく、且つ、処理対象としている探索用結果とのＩｏＵを計算していないオリジナル結果がある場合は、他のオリジナル結果についてステップＳ４７，Ｓ４８を繰り返す（ステップＳ４９にて「ＮＯ」の場合）。一方、ＩｏＵが閾値Ｔ以下であった場合、または、全てのオリジナル結果についてＩｏＵを計算し終えた場合は（ステップＳ４９にて「ＹＥＳ」の場合）、ステップＳ５０に処理を進める。 If there is an original result whose IoU is greater than the predetermined threshold value T and for which the IoU of the search result to be processed has not been calculated, the image processing unit 5 performs steps S47 and S48 for other original results. Repeat (if "NO" in step S49). On the other hand, if the IoU is equal to or less than the threshold T, or if the IoU has been calculated for all original results ("YES" in step S49), the process proceeds to step S50.

ステップＳ５０では画像処理部５は、処理対象に設定されている探索用結果とのＩｏＵが閾値Ｔ以下のオリジナル結果が存在するか否かを調べ、存在する場合には（ステップＳ５０にて「ＹＥＳ」の場合）、当該オリジナル結果に含まれている注目クラスのラベル領域を探索用結果に含まれているラベル領域とマージし（ステップＳ５１）、一方、存在しない場合には（ステップＳ５０にて「ＮＯ」の場合）、ステップＳ５１は省略される。ステップＳ５１にて画像処理部５は、マージした結果で探索用結果のラベル領域を更新する。なお、ラベル領域が重なった部分は、探索用結果のラベル領域を優先し残す。 In step S50, the image processing unit 5 checks whether or not there is an original result whose IoU with the search result set as the processing target is equal to or less than the threshold value T. ”), the label region of the attention class included in the original result is merged with the label region included in the search result (step S51). NO"), step S51 is omitted. In step S51, the image processing unit 5 updates the label area of the search result with the merged result. In addition, in the portion where the label areas overlap, the label area of the search result is preferentially left.

The image processing unit 5 repeats the processing of steps S47 to S51 while changing the search result to be processed ("NO" in step S52). When all the search results have been processed ("YES" in step S52), a termination determination is made (step S53). If the predetermined termination condition is not satisfied ("NO" in step S53), the processing of steps S47 to S52 is repeated. Here, the merging in step S51 enlarges the label regions in the search result, so by repeating the processing of steps S47 to S52 the "other class" region, which is the part of the search result outside the label regions, basically decreases gradually.

The termination condition in the termination determination of step S53 concerns the degree of matching between the label regions constituting the region division result obtained as a search result and the two-dimensional space corresponding to the input image. For example, the process is determined to be finished if the index representing this degree no longer changes with repetition, or if its change is equal to or less than a predetermined value. Specifically, the termination condition can be that no label region of an original result remains that can be added, that the "other class" region in the search result consequently no longer decreases, or that the decrease is equal to or less than a predetermined value. As another example, the process is determined to be finished if the index representing the degree of matching exceeds or falls below a predetermined standard. Specifically, the termination condition can be that the number of classes included in the label regions of the search result exceeds a preset maximum number, or that the area of the "other class" label region becomes equal to or less than a predetermined ratio of the area of the input image. Alternatively, two or more of the above conditions may be combined, with the process finishing when any one of them is satisfied.

画像処理部５は終了条件が満たされた場合、探索処理を終了し（ステップＳ５３にて「ＹＥＳ」の場合）、探索用結果のうち「その他クラス」の領域が最小であるものにおけるラベル領域を、入力画像の領域分割結果として領域情報出力手段３１により出力する（ステップＳ５４）。 When the termination condition is satisfied, the image processing unit 5 terminates the search process ("YES" in step S53), and determines the label area in the search result with the smallest "other class" area. , is output by the area information output means 31 as the area segmentation result of the input image (step S54).
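The merging search of steps S45 to S54 can be sketched in Python as follows. This is a simplified illustration under stated assumptions: label regions are sets of pixel coordinates, `originals` maps each class to the label region obtained when that class alone was the attention class, an original whose class is already present in a search result is skipped (an assumption needed so that the loop ends), and the termination condition used is "no original result can be added any more". The names `merge_search`, `iou`, `t`, and `max_rounds` are hypothetical.

```python
from typing import Dict, Set, Tuple

Pixel = Tuple[int, int]
Result = Dict[str, Set[Pixel]]  # class name -> pixels labelled with that class


def iou(a: Set[Pixel], b: Set[Pixel]) -> float:
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def merge_search(
    originals: Result, image: Set[Pixel], t: float = 0.1, max_rounds: int = 100
) -> Result:
    # S45: drop results that contain nothing but the "other class".
    originals = {c: r for c, r in originals.items() if r}
    # S46: one search result per original result, starting as a copy.
    search = [{c: set(r)} for c, r in originals.items()]
    # S47: originals are visited in descending order of label region area.
    by_area = sorted(originals, key=lambda c: len(originals[c]), reverse=True)
    for _ in range(max_rounds):
        changed = False
        for result in search:                       # loop closed by S52
            covered = set().union(*result.values())
            for cls in by_area:
                if cls in result:                   # already merged (assumption)
                    continue
                if iou(originals[cls], covered) <= t:   # S48 to S50
                    # S51: merge; pixels already labelled keep their label.
                    result[cls] = originals[cls] - covered
                    changed = True
                    break
        if not changed:                             # S53: nothing left to add
            break

    def other_area(result: Result) -> int:
        return len(image - set().union(*result.values()))

    return min(search, key=other_area)              # S54
```

Because existing labels win wherever regions overlap, the order in which originals are merged affects the final boundaries, which is why each search result is grown separately and the one leaving the smallest "other class" area is output.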

<<Third embodiment>>
An image processing system 1C according to the third embodiment of the present invention has the same configuration as that of the first embodiment shown in FIG. 1 and, as in the first embodiment, operates as a region dividing device that performs the region dividing process on an input image and as a learning device for that device. In the following description of the third embodiment, components similar to those of the first embodiment are given the same reference numerals and the description in the first embodiment is relied on. Where confusion with components of the first embodiment is to be avoided, reference numerals with "C" appended to those of the first embodiment are used. The differences from the first embodiment are mainly described.

画像処理システム１Ｃの分類器は第２の実施形態と同様、補足クラスの推定器を含まないもの、つまり上述した狭義の分類器であり、補足クラス推定部４０４を有さず、これに対応して学習装置の構成に関し第２の実施形態と同様、学習用補足クラス生成手段５２を要さない。 As in the second embodiment, the classifier of the image processing system 1C does not include a supplementary class estimator. As for the configuration of the learning apparatus, the supplementary class generating means 52 for learning is not required as in the second embodiment.

第１および第２の実施形態では、クラス分類部４０３は、画像中にて注目クラスに分類されない部分について、具体的なクラスを特定せず「その他クラス」として出力する構成とすることができたが、本実施形態のクラス分類部４０３は注目クラス以外のクラスについてもラベル領域を出力する。 In the first and second embodiments, the class classification unit 403 can be configured to output a portion of the image that is not classified into the attention class as "other class" without specifying a specific class. However, the class classification unit 403 of this embodiment also outputs label regions for classes other than the attention class.

The attention class information in the present embodiment has the character of bias information given in order to bias the class classification process. Specifically, the attention class information input to the classifier is defined by an N-dimensional class vector in which the value "1" is set for the elements of classes that are to be made more likely to appear in the class classification result and the value "0" is set for the elements of classes that are to be made less likely to appear.
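The N-dimensional class vector described above can be built as in the following minimal Python sketch. The function name `class_vector` and the class names in the usage example are illustrative assumptions; only the 1/0 encoding comes from the description.

```python
from typing import Iterable, List, Sequence


def class_vector(all_classes: Sequence[str], attention: Iterable[str]) -> List[int]:
    """Return an N-dimensional 0/1 vector, with 1 for each attention class."""
    attention_set = set(attention)
    return [1 if cls in attention_set else 0 for cls in all_classes]


# Example with N = 5 hypothetical classes and "floor" and "wall" as attention classes.
CLASSES = ["wall", "window", "person", "floor", "road"]
vector = class_vector(CLASSES, {"floor", "wall"})
```

The order of `all_classes` fixes which element of the vector belongs to which class, so the same order has to be used during learning and during region division.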

分類器の学習動作では、学習手段５３には置換済み正解ラベル領域ではなくオリジナルの正解ラベル領域が入力され、また学習用注目クラス生成手段５１は学習用画像についての正解のクラスを学習用注目クラス情報として学習手段５３に入力する。よって、本実施形態の学習装置の構成では正解ラベル置換手段５０も要さない。 In the learning operation of the classifier, the original correct label region is input to the learning means 53 instead of the replaced correct label region. It is input to the learning means 53 as information. Therefore, the configuration of the learning apparatus of this embodiment does not require the correct label replacement means 50 either.

画像処理システム１Ｃにおける注目クラス設定手段５４Ｃは、クラス分類部４０３の出力に基づいて補足クラスを決め、現在の注目クラス群に当該補足クラスを加えて新たな注目クラス群を設定する。つまり、注目クラス設定手段５４Ｃは、補足クラス推定部４０４を用いずに補足クラスを定める点で第１の実施形態と相違するが、一方、第１の実施形態の注目クラス設定手段５４と同様、注目クラス群に補足クラスを加えて新たな注目クラス群を設定する処理により、複数通りの注目クラス群を逐次的に設定する。 The attention class setting means 54C in the image processing system 1C determines a supplementary class based on the output of the class classification section 403, adds the supplementary class to the current attention class group, and sets a new attention class group. In other words, the attention class setting means 54C is different from the first embodiment in that the supplementary class is determined without using the supplementary class estimation section 404. A plurality of types of attention class groups are successively set by the process of setting a new attention class group by adding a complementary class to the attention class group.

具体的には、クラス分類部４０３の出力にて入力画像に現れているとされたクラスであって、現在の注目クラス群に含まれていないものを補足クラス候補とし、その中から補足クラスを選択する。例えば、注目クラス設定手段５４Ｃは、補足クラス候補のうちラベル領域の面積が最大のクラスを補足クラスとして選択し、注目クラス群を更新する。更新後の注目クラス情報では、更新前の注目クラス群と補足クラスに対応する要素に値“１”が設定され、それ以外のクラス、つまり補足クラス候補のうち補足クラスに選択されなかったものと画像に現れていないとされたクラスの要素に値“０”が設定される。 Specifically, classes that the output of the class classification unit 403 indicates as appearing in the input image but that are not included in the current attention class group are treated as supplementary class candidates, and a supplementary class is selected from among them. For example, the attention class setting means 54C selects, as the supplementary class, the candidate whose label region has the largest area, and updates the attention class group. In the updated attention class information, the value "1" is set for the elements corresponding to the pre-update attention class group and the supplementary class, and the value "0" is set for the elements of all other classes, that is, the supplementary class candidates that were not selected and the classes determined not to appear in the image.

注目クラス設定手段５４Ｃは、例えば、Ｎ個の全クラスに値“０”が設定されたクラスベクトルを注目クラス情報の初期値として設定し、クラス分類結果に基づいて順次、補足クラスを追加する処理を繰り返す。そして、画像処理部５は終了条件が満たされたときのクラス分類部４０３の出力で得られるラベル領域を、入力画像の領域分割結果として領域情報出力手段３１により出力する。 The attention class setting means 54C sets, for example, a class vector in which the value "0" is set for all N classes as the initial value of the attention class information, and repeats the process of adding supplementary classes one after another based on the class classification results. The image processing unit 5 then outputs, through the region information output means 31, the label regions obtained from the output of the class classification unit 403 at the time the termination condition is satisfied, as the region division result for the input image.
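The iterative procedure of this embodiment (start from an all-zero class vector, classify, add the largest-area candidate as the supplementary class, and classify again) might look roughly like the following sketch. The label-map representation (a nested list of per-pixel class indices), the names `select_supplementary_class`, `segment_iteratively`, and `toy_classify`, the tie-breaking rule, and the `max_iters` cap are all assumptions introduced for illustration. The real classifier is a trained model; `toy_classify` is only a stand-in that ignores the attention vector.

```python
from collections import Counter
from typing import Any, Callable, List, Optional, Tuple

LabelMap = List[List[int]]  # per-pixel class indices (assumed representation)
Classifier = Callable[[Any, List[int]], LabelMap]


def select_supplementary_class(label_map: LabelMap, attention: List[int]) -> Optional[int]:
    """Pick the class with the largest label area that is not yet an attention class."""
    areas = Counter(c for row in label_map for c in row)
    candidates = {c: a for c, a in areas.items() if attention[c] == 0}
    if not candidates:
        return None
    # Largest area wins; ties go to the lowest class index (an assumption).
    return min(candidates, key=lambda c: (-candidates[c], c))


def segment_iteratively(image: Any, classify: Classifier, num_classes: int,
                        max_iters: int = 10) -> Tuple[LabelMap, List[int]]:
    """Repeat classification while growing the attention class group."""
    attention = [0] * num_classes            # initial value: all elements 0
    label_map = classify(image, attention)
    for _ in range(max_iters):
        extra = select_supplementary_class(label_map, attention)
        if extra is None:                    # attention information no longer updated
            break
        attention[extra] = 1
        label_map = classify(image, attention)
    return label_map, attention


def toy_classify(image: LabelMap, attention: List[int]) -> LabelMap:
    """Stand-in for the trained classifier: returns the input as the label map."""
    return [row[:] for row in image]


labels, attention = segment_iteratively([[0, 0, 2], [0, 2, 1]], toy_classify, num_classes=4)
```

In this sketch the loop ends when no candidate remains or the iteration cap is reached; other termination conditions could be substituted at the same point in the loop.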

終了条件として、注目クラス群に対応する当該ラベル領域と入力画像に対応する二次元空間との整合の度合いに関する条件を設定することができる。例えば、当該度合いを表す指標の繰り返しに伴う変化が無い又は所定値以下であれば終了と判定される。具体的には、終了条件は、注目クラス情報が更新されなくなることや、注目クラス群に対応するラベル領域の面積が増加しなくなることや、当該増加が所定値以下となることとすることができる。また、例えば、整合の度合いを表す指標が所定基準を超える又は下回れば終了と判定される。具体的には、終了条件は、注目クラス群に対応するラベル領域の面積が入力画像の面積の所定割合以上となったことや、注目クラス群に対応しないラベル領域の面積が入力画像の面積の所定割合以下となったこととすることができる。また、終了条件は、例えば、クラス分類処理の繰り返し回数が予め設定した最大回数を超えたこととすることができる。或いは、終了条件を、上記条件のうちの２以上のいずれかを満たすこととしてもよい。 As the termination condition, a condition on the degree of matching between the label regions corresponding to the attention class group and the two-dimensional space corresponding to the input image can be set. For example, termination is determined when the index representing that degree no longer changes over the iterations or changes by no more than a predetermined value. Specifically, the termination condition can be that the attention class information is no longer updated, that the area of the label regions corresponding to the attention class group no longer increases, or that the increase is at or below a predetermined value. Alternatively, for example, termination is determined when the index representing the degree of matching exceeds or falls below a predetermined criterion. Specifically, the termination condition can be that the area of the label regions corresponding to the attention class group has become at least a predetermined proportion of the area of the input image, or that the area of the label regions not corresponding to the attention class group has become at most a predetermined proportion of the area of the input image. The termination condition can also be, for example, that the number of iterations of the class classification process has exceeded a preset maximum. Alternatively, two or more of the above conditions may be combined, with termination determined when any one of them is satisfied.
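Several of the termination conditions listed above can be combined into a single check, as in the sketch below. The function names and the threshold parameters `min_gain`, `min_coverage`, and `max_iters` (with their default values) are hypothetical; the specification only speaks of "a predetermined value", "a predetermined proportion", and "a preset maximum".

```python
from typing import List, Optional

LabelMap = List[List[int]]  # per-pixel class indices (assumed representation)


def attention_area(label_map: LabelMap, attention: List[int]) -> int:
    """Number of pixels labeled with a class belonging to the attention class group."""
    return sum(1 for row in label_map for c in row if attention[c] == 1)


def should_stop(label_map: LabelMap, attention: List[int], prev_area: Optional[int],
                iteration: int, *, min_gain: int = 0, min_coverage: float = 0.95,
                max_iters: int = 10) -> bool:
    """Return True if any one of the combined termination conditions holds."""
    total = sum(len(row) for row in label_map)
    area = attention_area(label_map, attention)
    # (a) The attention-class area no longer grows, or grows by at most min_gain.
    if prev_area is not None and area - prev_area <= min_gain:
        return True
    # (b) The attention-class area covers at least min_coverage of the image.
    if total > 0 and area / total >= min_coverage:
        return True
    # (c) The number of iterations has exceeded the preset maximum.
    return iteration > max_iters
```

The caller is expected to keep the area from the previous iteration and pass it as `prev_area`, using `None` on the first pass so that condition (a) is skipped.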

［変形例］ [Modification]
（１）上記各実施形態では、データ群を二次元画像とする例を示したが、この例に限られない。例えばデータ群を二次元画像の時系列とすることができる。その場合、空間は時空間であり、データは画素である。また例えば、データ群を距離画像、空間を二次元空間、データを画素（距離値）とすることもできる。なお、その場合、撮影部２は距離画像センサとなる。また例えば、データ群をポイントクラウド等の三次元計測データ、空間を三次元空間、データを計測点とすることもできる。なお、その場合は撮影部２に代えて三次元計測器が用いられる。 (1) In each of the above embodiments, the data group is a two-dimensional image, but the invention is not limited to this example. For example, the data group can be a time series of two-dimensional images. In that case, the space is a spatio-temporal space and the data are pixels. As another example, the data group can be a range image, the space a two-dimensional space, and the data pixels (distance values). In that case, the photographing unit 2 is a range image sensor. As yet another example, the data group can be three-dimensional measurement data such as a point cloud, the space a three-dimensional space, and the data measurement points. In that case, a three-dimensional measuring device is used in place of the photographing unit 2.

（２）上記第１の実施形態および変形例では、狭義の分類器の学習モデルと推定器の学習モデルとは、特徴量抽出部４００を共有する例を示したが、両学習モデルは共通部分を持たない別個のモデルとしても良い。その場合、分類器と推定器は、共通の学習用データによって学習させてもよいし、別々の学習用データによって学習させてもよい。 (2) In the first embodiment and the modification described above, the learning model of the classifier in the narrow sense and the learning model of the estimator share the feature amount extraction unit 400, but the two learning models may instead be separate models with no common part. In that case, the classifier and the estimator may be trained with common learning data or with separate learning data.

（３）上記各実施形態および各変形例では、注目クラス情報圧縮部４０１を特徴量抽出部４００およびクラス分類部４０３との同時並列的な学習によって生成している。これに代えて、学習データのクラスの出現傾向を基にした事前の主成分分析などによって注目クラス情報圧縮部４０１を別途生成してもよい。 (3) In each of the above embodiments and modifications, the attention class information compression unit 401 is generated by training it simultaneously and in parallel with the feature amount extraction unit 400 and the class classification unit 403. Instead, the attention class information compression unit 401 may be generated separately, for example by a prior principal component analysis based on the appearance tendencies of the classes in the learning data.

（４）上記各実施形態および各変形例では、分類器の注目クラス情報圧縮部４０１にて注目クラス情報を次元圧縮する例を説明した。しかし、注目クラス情報圧縮部４０１を使用せず、入力された注目クラス情報そのままを、特徴量合成部４０２にて特徴量抽出部４００からの画像特徴量と合成してもよい。 (4) In each of the above embodiments and modifications, an example was described in which the attention class information compression unit 401 of the classifier reduces the dimensionality of the attention class information. However, the attention class information compression unit 401 may be omitted, and the input attention class information may be combined as-is with the image feature amount from the feature amount extraction unit 400 in the feature amount synthesis unit 402.
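As a rough illustration of modification (4), the uncompressed class vector could be appended to the feature vector at every spatial position. Channel-wise concatenation is an assumption made here; this passage does not state how the feature amount synthesis unit 402 combines its inputs. The function name `concat_class_vector` and the nested-list feature map are likewise hypothetical.

```python
from typing import List, Sequence

FeatureMap = List[List[List[float]]]  # height x width x channels (assumed layout)


def concat_class_vector(features: FeatureMap, class_vector: Sequence[float]) -> FeatureMap:
    """Append the raw N-dimensional class vector to the channels of every position.

    Assumption: synthesis is modeled as channel-wise concatenation, which is
    one common choice but is not confirmed by the specification.
    """
    return [[list(pixel) + list(class_vector) for pixel in row] for row in features]


# A 1 x 2 feature map with 2 channels, combined with a 2-class attention vector.
combined = concat_class_vector([[[0.1, 0.2], [0.3, 0.4]]], [1, 0])
```

With this approach the channel count grows by N, which is the cost that the dimensional compression in unit 401 would otherwise reduce.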

以上で説明した領域分割装置・方法・プログラムによれば、画像（データ群）とともに注目クラス情報（注目クラス群）を入力することによって学習用データの多様性や学習用データにおける付与基準の混在に起因する変動を抑制した高精度なクラス分類ができるよう学習された分類器（狭義）で領域分割を行うので、変動を抑制した高精度な領域分割が可能となる。 According to the region dividing device, method, and program described above, region division is performed by a classifier (in the narrow sense) that has been trained, by receiving attention class information (an attention class group) together with the image (data group), to perform highly accurate class classification in which fluctuations caused by the diversity of the learning data and by the mixture of annotation criteria in the learning data are suppressed. Highly accurate region division with suppressed fluctuations is therefore possible.

特に、一つのデータ群に対して複数通りの注目クラス群を設定して領域分割を行い、データ群が分布する空間に対するラベル領域の整合の度合いが所定の条件を満たす領域分割結果を選択することによって、注目クラス群の確度が高まるので、変動を抑制した高精度な領域分割が確実に実行可能となる。 In particular, region division is performed by setting a plurality of attention class groups for one data group, and selecting a region division result that satisfies a predetermined condition for the degree of matching of label regions with respect to the space in which the data group is distributed. This increases the accuracy of the class group of interest, so that it is possible to reliably perform high-precision segmentation that suppresses fluctuations.

また、第１の実施形態およびその変形例にて例示したように、注目クラス群に補足クラスを加えては条件を満たすまで領域分割を繰り返すことにより、変動を抑制した高精度な領域分割の確実な実行が可能となる。つまり、注目クラス群が小さな部分集合であるほど多様性や付与基準の混在に起因する変動を抑制する効果は高まるため、注目クラス群を小さな部分集合から次第に大きな部分集合にして領域分割を行うことで、変動を抑制した高精度な領域分割の確実な実行が可能となる。 In addition, as exemplified in the first embodiment and its modification, repeating region division while adding a supplementary class to the attention class group until the condition is satisfied makes it possible to reliably perform highly accurate region division with suppressed fluctuations. That is, the smaller the subset forming the attention class group, the stronger the effect of suppressing fluctuations caused by diversity and mixed annotation criteria. Performing region division while growing the attention class group from a small subset to progressively larger subsets therefore makes it possible to reliably perform highly accurate region division with suppressed fluctuations.

また、第１の実施形態およびその変形例にて例示したように、注目クラス群を正解に近づけるために加えるべき補足クラスを高精度に推定できる学習が行われた推定器により補足クラスを推定しつつ領域分割を繰り返すことによっても、注目クラス群の確度をさらに高めることができる。 In addition, as exemplified in the first embodiment and its modification, the accuracy of the attention class group can be raised further by repeating region division while estimating the supplementary class with an estimator that has been trained to estimate, with high accuracy, the supplementary class that should be added to bring the attention class group closer to the correct one.

１画像処理システム、２撮影部、３通信部、４記憶部、５画像処理部、６表示部、４０学習用データ記憶手段、４１学習モデル記憶手段、４２モデル記憶手段、５０正解ラベル置換手段、５１学習用注目クラス生成手段、５２学習用補足クラス生成手段、５３学習手段、５４注目クラス設定手段、５５領域分割手段、５６分類器、４００特徴量抽出部、４０１注目クラス情報圧縮部、４０２特徴量合成部、４０３クラス分類部、４０４補足クラス推定部。 1 image processing system, 2 photographing unit, 3 communication unit, 4 storage unit, 5 image processing unit, 6 display unit, 40 learning data storage means, 41 learning model storage means, 42 model storage means, 50 correct label replacement means, 51 learning attention class generation means, 52 learning supplementary class generation means, 53 learning means, 54 attention class setting means, 55 region dividing means, 56 classifier, 400 feature amount extraction unit, 401 attention class information compression unit, 402 feature amount synthesis unit, 403 class classification unit, 404 supplementary class estimation unit.

Claims

所定の空間に分布するデータ群を複数のクラスに分類する分類処理を行い前記空間を前記クラスで識別されるラベル領域に分割する領域分割装置であって、
前記データ群と注目クラス群とを入力され当該データ群についての前記分類処理を行い前記注目クラス群についての前記ラベル領域を出力する分類器として、学習用データ群、当該学習用データ群に対し予め与えられた正解のクラス、及び当該正解のクラスの部分集合で与えられる学習用注目クラス群を用いて学習が行われた学習済みモデルを記憶しているモデル記憶手段と、
前記データ群に対する前記注目クラス群を複数通り設定する注目クラス設定手段と、
前記注目クラス設定手段による前記注目クラス群の複数通りの設定それぞれについて、前記分類器により前記ラベル領域を求め、当該ラベル領域に基づく前記空間の領域分割結果のうち、当該領域分割結果を構成する前記ラベル領域と前記空間との整合の度合いについて所定の条件を満たすものを前記データ群についての領域分割結果として選択する領域分割手段と、
を有することを特徴とする領域分割装置。 A region dividing device that performs classification processing for classifying a data group distributed in a predetermined space into a plurality of classes and divides the space into label regions identified by the classes, the device comprising:
model storage means storing a trained model that has been trained, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, using a learning data group, correct classes given in advance for the learning data group, and a learning attention class group given as a subset of the correct classes;
attention class setting means for setting the attention class group for the data group in a plurality of ways; and
region dividing means for obtaining the label regions with the classifier for each of the plurality of settings of the attention class group made by the attention class setting means, and selecting, from among the region division results of the space based on those label regions, a result for which the degree of matching between the space and the label regions constituting that result satisfies a predetermined condition, as the region division result for the data group.

前記注目クラス設定手段は、前記注目クラス群に補足クラスを加えて新たな前記注目クラス群を設定する処理により、逐次的に前記注目クラスを複数通り設定し、
前記領域分割手段は、前記複数通りの前記注目クラス群について前記分類器が出力する前記ラベル領域のうちその大きさが予め定めた基準以上となるものを前記データ群についての領域分割結果として選択すること、を特徴とする請求項１に記載の領域分割装置。 The region dividing device according to claim 1, wherein the attention class setting means successively sets the attention class group in a plurality of ways by adding a supplementary class to the attention class group to set a new attention class group, and
the region dividing means selects, as the region division result for the data group, the label regions whose size is at or above a predetermined criterion from among the label regions output by the classifier for the plurality of attention class groups.

前記学習済みモデルは、前記データ群と前記注目クラス群とを入力され前記補足クラスを推定する推定器として、さらに前記学習用データ群についての前記補足クラスの正解を用いて前記学習が行われていること、
を特徴とする請求項２に記載の領域分割装置。 The region dividing device according to claim 2, wherein the trained model has further been trained, as an estimator that receives the data group and the attention class group and estimates the supplementary class, using the correct supplementary class for the learning data group.

空間に分布するデータ群を複数のクラスに分類する分類処理を行い前記空間を前記クラスで識別されるラベル領域に分割する領域分割方法であって、
前記データ群と注目クラス群とを入力され当該データ群についての前記分類処理を行い前記注目クラス群についての前記ラベル領域を出力する分類器として、学習用データ群、当該学習用データ群に対し予め与えられた正解のクラス、及び当該正解のクラスの部分集合で与えられる学習用注目クラス群を用いて学習が行われた学習済みモデルを用意するステップと、
前記データ群に対する前記注目クラス群を複数通り設定する注目クラス設定ステップと、
前記注目クラス設定ステップにおける前記注目クラス群の複数通りの設定それぞれについて、前記分類器により前記ラベル領域を求め、当該ラベル領域に基づく前記空間の領域分割結果のうち、当該領域分割結果を構成する前記ラベル領域と前記空間との整合の度合いについて所定の条件を満たすものを前記データ群についての領域分割結果として選択する領域分割ステップと、
を有することを特徴とする領域分割方法。 A region dividing method that performs classification processing for classifying a data group distributed in a space into a plurality of classes and divides the space into label regions identified by the classes, the method comprising:
a step of preparing a trained model that has been trained, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, using a learning data group, correct classes given in advance for the learning data group, and a learning attention class group given as a subset of the correct classes;
an attention class setting step of setting the attention class group for the data group in a plurality of ways; and
a region dividing step of obtaining the label regions with the classifier for each of the plurality of settings of the attention class group made in the attention class setting step, and selecting, from among the region division results of the space based on those label regions, a result for which the degree of matching between the space and the label regions constituting that result satisfies a predetermined condition, as the region division result for the data group.

空間に分布するデータ群を複数のクラスに分類する分類処理を行い前記空間を前記クラスで識別されるラベル領域に分割する処理をコンピュータに行われるプログラムであって、
当該コンピュータを、
前記データ群と注目クラス群とを入力され当該データ群についての前記分類処理を行い前記注目クラス群についての前記ラベル領域を出力する分類器として、学習用データ群、当該学習用データ群に対し予め与えられた正解のクラス、及び当該正解のクラスの部分集合で与えられる学習用注目クラス群を用いて学習が行われた学習済みモデルを記憶しているモデル記憶手段、
前記データ群に対する前記注目クラス群を複数通り設定する注目クラス設定手段、及び、
前記注目クラス設定手段による前記注目クラス群の複数通りの設定それぞれについて、前記分類器により前記ラベル領域を求め、当該ラベル領域に基づく前記空間の領域分割結果のうち、当該領域分割結果を構成する前記ラベル領域と前記空間との整合の度合いについて所定の条件を満たすものを前記データ群についての領域分割結果として選択する領域分割手段、
として機能させることを特徴とする領域分割プログラム。 A region dividing program that causes a computer to perform classification processing for classifying a data group distributed in a space into a plurality of classes and to divide the space into label regions identified by the classes, the program causing the computer to function as:
model storage means storing a trained model that has been trained, as a classifier that receives the data group and an attention class group, performs the classification processing on the data group, and outputs the label regions for the attention class group, using a learning data group, correct classes given in advance for the learning data group, and a learning attention class group given as a subset of the correct classes;
attention class setting means for setting the attention class group for the data group in a plurality of ways; and
region dividing means for obtaining the label regions with the classifier for each of the plurality of settings of the attention class group made by the attention class setting means, and selecting, from among the region division results of the space based on those label regions, a result for which the degree of matching between the space and the label regions constituting that result satisfies a predetermined condition, as the region division result for the data group.