JP2017010318A - Information processing apparatus and information processing method

Info

Publication number: JP2017010318A
Application number: JP2015125803A
Authority: JP
Inventors: 友貴藤森; Tomoki Fujimori; 裕輔御手洗; Hirosuke Mitarai
Original assignee: Canon Inc
Current assignee: Canon Inc
Priority date: 2015-06-23
Filing date: 2015-06-23
Publication date: 2017-01-12

Abstract

PROBLEM TO BE SOLVED: To select, with high accuracy, a combination of feature quantities suitable for classifying input data. SOLUTION: An information processing apparatus generates combinations of feature quantities extracted from input data and calculates, for each generated combination, a first evaluation value that evaluates whether the combination is suitable for classifying the input data. The apparatus generates a plurality of parameters to be used in calculating a second evaluation value that evaluates a combination of feature quantities, and calculates the second evaluation value from the first evaluation value for each of the parameters. For each of the parameters, the apparatus selects feature quantities on the basis of the second evaluation value and generates a subset of the feature quantities. SELECTED DRAWING: Figure 2

Description

The present invention relates to an information processing apparatus and an information processing method that select, from a plurality of feature amounts extracted from input data, the feature amounts to be used for classifying the input data.

In information processing apparatuses such as those used for appearance inspection, one technique extracts a varied group of feature amounts, such as the mean and variance of pixel values, from an image of the inspection object and makes a pass/fail determination (two-class discrimination between non-defective and defective products). However, if all of the many feature amounts are used, the feature dimension becomes high and generalization performance deteriorates, and extracting redundant feature amounts increases the processing time. A technique that speeds up processing without degrading generalization performance by selecting appropriate feature amounts is therefore important.

Patent Document 1 below discloses a technique that, in order to select the feature amounts used for classifying input data from a plurality of feature amounts extracted from the input data, takes into account the compatibility of combinations of feature amounts and selects feature amounts suitable for classifying the input data. Specifically, a first evaluation value is calculated for combinations of feature amounts, a second evaluation value indicating the compatibility of combinations of feature amounts is calculated by comparing the first evaluation values with one another, and feature amounts are selected on the basis of the second evaluation value.

Patent Document 1: Japanese Patent No. 5414416

However, the feature amount selection disclosed in Patent Document 1 involves a parameter used in the process of calculating the evaluation value that evaluates combinations of feature amounts. This parameter needs to be set appropriately, but doing so is difficult in some cases. An object of the present invention is to enable highly accurate feature amount selection when selecting a combination of feature amounts suitable for classifying input data.

An information processing apparatus according to the present invention comprises: combination generation means for generating combinations of feature amounts by combining a plurality of feature amounts extracted from input data; first calculation means for calculating, for each combination of feature amounts, a first evaluation value that evaluates whether the combination is suitable for determining the classification of the input data; parameter generation means for generating a plurality of parameters to be used when calculating, on the basis of the first evaluation value, a second evaluation value that evaluates the combination of feature amounts; second calculation means for calculating the second evaluation value for each of the plurality of parameters; and subset generation means for selecting, for each of the plurality of parameters, feature amounts on the basis of the second evaluation value and generating a subset of the feature amounts.

According to the present invention, a plurality of parameters to be used when calculating the second evaluation value are generated, and for each of the parameters the second evaluation value is calculated and feature amounts are selected. This enables highly accurate feature amount selection when selecting a combination of feature amounts suitable for classifying input data.

A diagram showing a configuration example of the information processing system in the first embodiment.
A diagram showing a configuration example of the information processing apparatus in the first embodiment.
A flowchart showing a processing example of the information processing apparatus in the first embodiment.
A diagram showing the concept of calculating the second evaluation value in the first embodiment.
A flowchart showing the feature amount subset generation process in the first embodiment.
A flowchart showing the feature amount subset generation process in the second embodiment.
A flowchart showing the feature amount subset generation process in the third embodiment.
A conceptual diagram explaining the setting of feature amount subsets in the third embodiment.
A diagram showing a configuration example of the information processing system in the fourth embodiment.
A flowchart showing a processing example of the information processing apparatus in the fourth embodiment.
A block diagram showing computer functions capable of realizing the information processing apparatus in the present embodiment.

Embodiments of the present invention will be described below with reference to the drawings.

(First embodiment)
A first embodiment of the present invention will be described. The description below uses as an example a task in which inspection objects are conveyed along an inspection line, an image of each object is captured and inspected, and the inspection result is output. In the first embodiment, a plurality of weight parameters are generated for use in calculating the second evaluation value, which evaluates combinations of feature amounts extracted from input data. For each of the weight parameters, the second evaluation value is calculated and a feature amount subset is generated on the basis of it, so that a feature amount subset suitable for classifying the input data can be selected.

FIG. 1 is a diagram showing a configuration example of the information processing system in the first embodiment. The information processing system shown in FIG. 1 determines whether the appearance of an inspection object 101 is normal or abnormal (normal/abnormal determination). An image capturing device 102 captures an image of the inspection object 101. An information processing apparatus 103 inspects the inspection object 101: it performs the normal/abnormal determination on a set region using the image captured by the image capturing device 102.

A display device 104 consists of a monitor or the like and displays the inspection result (determination result) sent from the information processing apparatus 103. A light source 105 illuminates the inspection object 101, for example to make defects visible. The image capturing device 102 captures an image of the inspection object 101 while the light source 105 illuminates it.

FIG. 2 is a diagram showing a configuration example of the information processing apparatus 103 in the present embodiment. A feature amount extraction unit 201 extracts feature amounts from the image data, used as input data, that the image capturing device 102 obtains by imaging the inspection target region. A feature amount selection unit 202 performs feature amount selection on the feature amounts extracted by the feature amount extraction unit 201 and thereby determines combinations of feature amounts.

A parameter setting unit 203 determines which of the feature amount combinations selected by the feature amount selection unit 202 is best and also determines the parameters of the classifier. A classifier generation unit 204 generates a classifier based on the parameters determined by the parameter setting unit 203. A classifier determination unit 205 calculates the degree of abnormality of test data using the generated classifier and the feature amounts obtained from the test data, and thresholds the calculated degree of abnormality to determine whether the product is non-defective (normal) or defective (abnormal).

FIG. 3 is a flowchart showing a processing example of the information processing apparatus 103 in the present embodiment. The processing of each step shown in FIG. 3 is executed by the information processing apparatus 103.
(Step S301: Data input)
In step S301, the captured image data is input. When the inspection target region is only part of the image, only the image data of the inspection target region may be input as the evaluation target.

(Step S302: Feature amount extraction)
In step S302, a plurality of feature amounts are extracted from the inspection target region of the image data input in step S301. For example, the Haar wavelet transform is applied to the region of interest of the target image to generate images hierarchically, and feature amounts are extracted from the images at each level.

The Haar wavelet transform is a frequency transform that preserves position information. In the Haar wavelet transform, inner products are first computed between the target image and the four filters given by Equations 1-1 to 1-4.

Here, Equation 1-1 represents the vertical high-frequency component filter (HL), Equation 1-2 the horizontal high-frequency component filter (LH), Equation 1-3 the diagonal high-frequency component filter (HH), and Equation 1-4 the low-frequency component filter (LL). The filters are moved over the target image in 2×2 pixel regions without overlap, so that the resolution is halved, and four images are generated: a vertical high-frequency component image, a horizontal high-frequency component image, a diagonal high-frequency component image, and a low-frequency component image.

Inner products with the same four filters are then computed on the generated low-frequency component image, producing the four images of the next level: a vertical high-frequency component image, a horizontal high-frequency component image, a diagonal high-frequency component image, and a low-frequency component image. Because the resolution is halved each time, if the Haar wavelet transform is to be repeated eight times, for example, the numbers of pixels in the vertical and horizontal directions of the image are preferably set to multiples of 2 to the eighth power.

The four filters are applied again to the low-frequency component image generated in this way, and the process of generating the four images of the next level (vertical, horizontal, and diagonal high-frequency component images and a low-frequency component image) is repeated. In the present embodiment, the Haar wavelet transform is performed eight times as an example. Four images are generated at each level, giving 32 images; adding the input image gives the following 33 images in total.
(1) Input image
(2) Vertical high-frequency component images of the first to eighth levels
(3) Horizontal high-frequency component images of the first to eighth levels
(4) Diagonal high-frequency component images of the first to eighth levels
(5) Low-frequency component images of the first to eighth levels
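The decomposition described above can be sketched as follows. This is a minimal illustration and not the patent's implementation: Equations 1-1 to 1-4 are not reproduced in this text, so the 1/4 scaling, the sign pattern of each filter, and the function names `haar_level` and `haar_pyramid` are assumptions.

```python
import numpy as np

def haar_level(img):
    """One level of a non-overlapping 2x2 Haar decomposition.

    Returns (HL, LH, HH, LL), each at half the input resolution.
    The 1/4 scaling and the signs are assumptions of this sketch.
    """
    a = img[0::2, 0::2]  # top-left pixel of each 2x2 block
    b = img[0::2, 1::2]  # top-right
    c = img[1::2, 0::2]  # bottom-left
    d = img[1::2, 1::2]  # bottom-right
    hl = (a + b - c - d) / 4.0  # change along the vertical direction
    lh = (a - b + c - d) / 4.0  # change along the horizontal direction
    hh = (a - b - c + d) / 4.0  # change along the diagonal
    ll = (a + b + c + d) / 4.0  # block average (low-frequency component)
    return hl, lh, hh, ll

def haar_pyramid(img, levels=8):
    """Return the input image followed by four component images per level."""
    h, w = img.shape
    if h % (2 ** levels) or w % (2 ** levels):
        raise ValueError("image sides should be multiples of 2**levels")
    images = [img.astype(float)]
    low = images[0]
    for _ in range(levels):
        hl, lh, hh, low = haar_level(low)  # recurse on the low-frequency image
        images.extend([hl, lh, hh, low])
    return images

# Placeholder input: a random 256 x 512 image (both sides are multiples of 2**8).
images = haar_pyramid(np.random.default_rng(0).random((256, 512)), levels=8)
print(len(images))  # 1 input image + 8 levels x 4 components = 33
```

The size check mirrors the recommendation that the image dimensions be multiples of 2 to the eighth power when eight levels are used.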

Feature amounts are calculated for the one image before the Haar wavelet transform and for the four images at each of the eight levels obtained by the transform. Statistical feature amounts are used here: contrast, maximum value, (maximum value - minimum value), mean, variance, kurtosis, skewness, and geometric mean. Examples of the calculation formulas follow. The mean of the pixel values is calculated by Equation 2, the variance by Equation 3, the kurtosis by Equation 4, the skewness by Equation 5, the geometric mean by Equation 6, and the contrast by Equation 7. Each of Equations 2 to 7 is applied to the captured images of non-defective and defective products. The image is a pixels high (vertical direction) and b pixels wide (horizontal direction), and the pixel value at the i-th horizontal and j-th vertical position is written p(i, j).

For example, for the 33 images in total, consisting of one input image and the images generated from it, the maximum value and the statistics given by Equations 2 to 7 are extracted, seven feature amounts per image. That is, seven statistical feature amounts are extracted from each of the 33 images, so 7 × 33 = 231 feature amounts (hereinafter N) are extracted per input image. N feature amounts are extracted from each of the M patterns used as input images.
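As a rough illustration of how 33 images yield a 231-dimensional feature vector, the sketch below computes seven statistics per image. Equations 2 to 7 are not reproduced in this text, so everything here is an assumption: the mean, variance, skewness, and kurtosis use common textbook definitions, the geometric mean is taken over absolute values so that it stays defined for signed wavelet coefficients, and a Michelson-style ratio stands in for the contrast of Equation 7. The function names are hypothetical.

```python
import numpy as np

def statistical_features(img, eps=1e-12):
    """Seven statistics of one image (assumed definitions, see lead-in)."""
    p = np.asarray(img, dtype=float).ravel()
    mean = p.mean()
    var = p.var()                       # population variance
    std = np.sqrt(var) + eps            # eps avoids division by zero
    skewness = np.mean(((p - mean) / std) ** 3)
    kurtosis = np.mean(((p - mean) / std) ** 4)
    # Geometric mean over |p|: an assumption so the log is always defined.
    geo_mean = np.exp(np.mean(np.log(np.abs(p) + eps)))
    p_max, p_min = p.max(), p.min()
    # Stand-in for Equation 7: Michelson-style contrast ratio.
    contrast = (p_max - p_min) / (abs(p_max) + abs(p_min) + eps)
    return np.array([p_max, mean, var, kurtosis, skewness, geo_mean, contrast])

def feature_vector(images):
    """Concatenate the seven statistics of every image into one vector."""
    return np.concatenate([statistical_features(im) for im in images])

# Placeholder data: 33 small random images standing in for the input image
# and its 32 wavelet component images.
rng = np.random.default_rng(0)
images = [rng.random((4, 4)) + 1.0 for _ in range(33)]
x = feature_vector(images)
print(x.shape)  # (231,)
```

Repeating this for each of the M input patterns would give an M × N feature matrix with N = 231.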

Although a method using the Haar wavelet transform has been described here, other transforms such as other wavelet transforms, edge extraction, the Fourier transform, and the Gabor transform may be used. Other statistics, such as the maximum value minus the minimum value or the standard deviation, may also be used as statistical feature amounts. The above processing extracts a plurality of feature amounts from an input image pattern.

(Step S303: Generation of feature amount subsets)
In step S303, feature amounts are selected from those extracted in step S302 to generate a feature amount subset. For example, first evaluation values are calculated for combinations of the feature amounts extracted from the input data and compared with one another. Only for the top K feature amounts is a value added to the second evaluation value, which indicates the compatibility of feature amount combinations, and feature amounts are selected on the basis of the resulting second evaluation values.

Generation of the feature amount subset is described below with reference to FIG. 4(A) and FIG. 4(B). FIG. 4(A) shows the concept of calculating the second evaluation value when K = 2, and FIG. 4(B) when K = 4. As an example, assume five feature amounts A, B, C, D, and E. In the examples of FIG. 4(A) and FIG. 4(B), a smaller first evaluation value indicates that the combination is better suited to separating non-defective (normal) from defective (abnormal) products.

First, the calculation of the second evaluation value when K = 2 is described with reference to FIG. 4(A). When attention is paid to feature amount A, feature amount B gives the best first evaluation value; in combination with A it is considered better suited than C, D, or E to separating non-defective from defective products, so 2 is added to the second evaluation value of B. Feature amount C gives the second-best first evaluation value, so 1 is added to the second evaluation value of C.

When attention is paid to feature amount B, feature amount C gives the best first evaluation value; in combination with B it is considered better suited than A, D, or E to separating non-defective from defective products, so 2 is added to the second evaluation value of C. Feature amount A gives the second-best first evaluation value, so 1 is added to the second evaluation value of A.

When attention is paid to feature amount C, feature amount B gives the best first evaluation value; in combination with C it is considered better suited than A, D, or E to separating non-defective from defective products, so 2 is added to the second evaluation value of B. Feature amount A gives the second-best first evaluation value, so 1 is added to the second evaluation value of A.

When attention is paid to feature amount D, feature amount C gives the best first evaluation value; in combination with D it is considered better suited than A, B, or E to separating non-defective from defective products, so 2 is added to the second evaluation value of C. Feature amount B gives the second-best first evaluation value, so 1 is added to the second evaluation value of B.

When attention is paid to feature amount E, feature amount B gives the best first evaluation value; in combination with E it is considered better suited than A, C, or D to separating non-defective from defective products, so 2 is added to the second evaluation value of B. Feature amount A gives the second-best first evaluation value, so 1 is added to the second evaluation value of A.

As a result, the second evaluation values are 3 for A, 7 for B, 5 for C, 0 for D, and 0 for E. The feature amounts whose second evaluation value exceeds a predetermined threshold (for example, 0) form the feature amount subset. In this case, the subset consists of feature amounts A, B, and C.

Next, the calculation of the second evaluation value when K = 4 is described with reference to FIG. 4(B). When attention is paid to feature amount A, feature amount B gives the best first evaluation value; in combination with A it is considered better suited than C, D, or E to separating non-defective from defective products, so 4 is added to the second evaluation value of B. Feature amount C gives the second-best first evaluation value, so 3 is added to the second evaluation value of C. Feature amount E gives the third-best, so 2 is added to the second evaluation value of E. Finally, feature amount D gives the fourth-best, so 1 is added to the second evaluation value of D.

When attention is paid to feature amount B, feature amount C gives the best first evaluation value; in combination with B it is considered better suited than A, D, or E to separating non-defective from defective products, so 4 is added to the second evaluation value of C. Feature amount A gives the second-best first evaluation value, so 3 is added to the second evaluation value of A. Feature amount E gives the third-best, so 2 is added to the second evaluation value of E. Finally, feature amount D gives the fourth-best, so 1 is added to the second evaluation value of D.

When attention is paid to feature amount C, feature amount B gives the best first evaluation value; in combination with C it is considered better suited than A, D, or E to separating non-defective from defective products, so 4 is added to the second evaluation value of B. Feature amount A gives the second-best first evaluation value, so 3 is added to the second evaluation value of A. Feature amount D gives the third-best, so 2 is added to the second evaluation value of D. Finally, feature amount E gives the fourth-best, so 1 is added to the second evaluation value of E.

When attention is paid to feature amount D, feature amount C gives the best first evaluation value; in combination with D it is considered better suited than A, B, or E to separating non-defective from defective products, so 4 is added to the second evaluation value of C. Feature amount B gives the second-best first evaluation value, so 3 is added to the second evaluation value of B. Feature amount E gives the third-best, so 2 is added to the second evaluation value of E. Finally, feature amount A gives the fourth-best, so 1 is added to the second evaluation value of A.

When attention is paid to feature amount E, feature amount B gives the best first evaluation value; in combination with E it is considered better suited than A, C, or D to separating non-defective from defective products, so 4 is added to the second evaluation value of B. Feature amount A gives the second-best first evaluation value, so 3 is added to the second evaluation value of A. Feature amount D gives the third-best, so 2 is added to the second evaluation value of D. Finally, feature amount C gives the fourth-best, so 1 is added to the second evaluation value of C.

As a result, the second evaluation values are 10 for A, 15 for B, 12 for C, 6 for D, and 7 for E. The feature amounts whose second evaluation value exceeds a predetermined threshold (for example, 0) form the feature amount subset. In this case, the subset consists of feature amounts A, B, C, D, and E.
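The two worked examples can be reproduced with a short sketch. The partner orderings below are read from the description of FIG. 4 in the text. The voting rule (the best partner receives K points, the next K-1, down to 1) is inferred from those examples, and the function names are hypothetical.

```python
def second_evaluation_values(rankings, k):
    """Sum rank-based votes over all features of interest.

    rankings maps each feature of interest to its partner features,
    ordered from the best first evaluation value to the worst.
    The best partner receives k points, the next k-1, ..., down to 1.
    """
    scores = {name: 0 for name in rankings}
    for partners in rankings.values():
        for rank, partner in enumerate(partners[:k]):
            scores[partner] += k - rank
    return scores

def select_subset(scores, threshold=0):
    """Keep the features whose second evaluation value exceeds the threshold."""
    return [name for name, score in scores.items() if score > threshold]

# Partner orderings taken from the worked example in the text.
rankings = {
    "A": ["B", "C", "E", "D"],
    "B": ["C", "A", "E", "D"],
    "C": ["B", "A", "D", "E"],
    "D": ["C", "B", "E", "A"],
    "E": ["B", "A", "D", "C"],
}

for k in (2, 4):
    scores = second_evaluation_values(rankings, k)
    print(k, scores, select_subset(scores))
```

With K = 2 this gives 3, 7, 5, 0, 0 for A to E and the subset A, B, C; with K = 4 it gives 10, 15, 12, 6, 7 and all five features, matching the totals stated above. The example also shows how strongly the resulting subset depends on K, which is the parameter the embodiment varies.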

(Step S304: Feature amount subset selection)
In step S304, cross-validation is used to select one of the feature amount subsets generated in step S303 and to set the parameters of the classifier. The case where the projection distance method is used as the classifier is described here. The projection distance is the shortest distance between a feature vector, in the feature space whose axes are the individual feature amounts, and the hyperplane (principal plane) oriented in the directions that maximize the variance of the pattern distribution. The details are described below using equations.

The mean vector m and the covariance matrix Σ of the non-defective data can be expressed using the number n of non-defective data items and the feature vectors x_i. The mean vector m of the non-defective data is given by Equation 8, and the covariance matrix Σ by Equation 9.
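Equations 8 and 9 are not reproduced in this text. Assuming the usual sample definitions, with the 1/n normalization being an assumption (a 1/(n-1) factor is also common), they would read:

```latex
m = \frac{1}{n} \sum_{i=1}^{n} x_i ,
\qquad
\Sigma = \frac{1}{n} \sum_{i=1}^{n} (x_i - m)(x_i - m)^{\top}
```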

Let λ_i and φ_i be the i-th eigenvalue and eigenvector of the covariance matrix Σ, with the eigenvalues sorted in descending order. Using the mean vector m calculated by Equation 8 and the principal plane obtained from the covariance matrix of Equation 9, the projection distance d(x) for a feature vector x, with r denoting the number of non-defective data items, is expressed by Equation 10.

The training data is divided into L groups; (L - 1) groups serve as training data and the remaining group serves as test data. The degree of abnormality is calculated from the projection distance of Equation 10, and normal/abnormal determination is performed. An ROC curve is obtained from the calculated degrees of abnormality, a feature quantity subset is selected by the AUC (area under the curve), which is one evaluation criterion, and the number of dimensions of the subspace is set.
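Equation 10 is not reproduced in this text. As a hedged reconstruction, the projection distance method is commonly written as the residual of x - m after projection onto the leading eigenvectors of Σ. The sketch below sums over the r leading eigenvectors, which treats r as the dimension of the principal subspace; that reading, and the use of a square root rather than the squared distance, are assumptions.

```latex
d(x) = \sqrt{\, \lVert x - m \rVert^{2} - \sum_{i=1}^{r} \left( \varphi_i^{t} (x - m) \right)^{2} } \tag{10}
```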

(Step S305: Normal/abnormal determination)
In step S305, training is performed using the feature quantity subset selected in step S304 and the classifier parameter (the number of dimensions of the subspace) corresponding to the set parameter, and a normal model is generated. Using this normal model, an abnormality score based on the projection distance is calculated for all test data, and normal/abnormal determination is performed by thresholding the calculated score.

Next, the feature quantity subset generation processing in step S303 of FIG. 3 is described in detail. FIG. 5 is a flowchart of the feature quantity subset generation processing in the first embodiment.

(Step S501: Generate combinations of n feature quantities)
In step S501, the information processing apparatus 103 functions as a combination generation unit and combines the feature quantities extracted in step S302 to generate combinations of n feature quantities each. The number n of feature quantities per combination is set in advance; n = 2 is preferable, for example, from the viewpoint of computation time. When N feature quantities have been extracted from the image data serving as input data, combining them n at a time yields _NC_n combinations.
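The combination generation of step S501 can be sketched in a few lines of Python. The function name `generate_combinations` and the feature labels are hypothetical and serve only as an illustration.

```python
from itertools import combinations
from math import comb


def generate_combinations(features, n=2):
    """Return every unordered combination of n feature quantities."""
    return list(combinations(features, n))


# Hypothetical feature labels, used only for illustration.
features = ["A", "B", "C", "D", "E"]
pairs = generate_combinations(features, n=2)

# The number of combinations equals N choose n.
print(len(pairs), comb(len(features), 2))
```

With N = 5 and n = 2 this enumerates the pairs (A, B), (A, C), and so on; the count grows quickly with n, which is consistent with the preference for n = 2 stated above.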

(Step S502: Calculate a first evaluation value for each combination of n feature quantities)
In step S502, the information processing apparatus 103 functions as a first calculation unit and calculates a first evaluation value for each of the _NC_n combinations generated in step S501. The first evaluation value evaluates whether a combination of n feature quantities is suitable for determining the classification of the input data. In other words, it indicates whether the combination is suitable for use in determining whether the inspection object 101 is non-defective or defective.

As one example of the first evaluation value, the Bayes error probability estimate calculated from normal data and abnormal data is described. Let w₁ and w₂ denote the normal class and the abnormal class, and let X = [x₁, ..., x_n]^t be a vector of n features. The class-conditional probability distributions P(x|w₁) and P(x|w₂) for the normal class w₁ and the abnormal class w₂ are represented by histograms, and the posterior probability distributions P(w₁|x) and P(w₂|x) are calculated from them by Equation 11.

Then the Bayes error probability estimate, which corresponds to the overlap of the posterior probability distributions P(w₁|x) and P(w₂|x), is calculated using Equation 12.

This calculation is performed for each of the _NC_n combinations of feature quantities. The lower the Bayes error probability estimate, the better suited the combination is to classifying non-defective and defective products. The estimate is also applicable when n ≥ 3, that is, to combinations of three or more feature quantities.
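Equations 11 and 12 are not reproduced in this text, so the following is only a sketch of the textbook histogram-based estimate, not necessarily the exact form used here. It assumes equal-width bins, feature values scaled into a known range, and class priors taken from the sample counts; the function name and parameters are hypothetical.

```python
from collections import Counter


def bayes_error_estimate(normal, abnormal, bins=4, lo=0.0, hi=1.0):
    """Histogram-based Bayes error estimate for one feature combination.

    normal, abnormal: lists of equal-length tuples (one value per feature),
    assumed to be scaled into [lo, hi].
    """
    width = (hi - lo) / bins

    def cell(sample):
        # Map each feature value to a bin index, clamping hi into the last bin.
        return tuple(min(int((v - lo) / width), bins - 1) for v in sample)

    hist_normal = Counter(cell(s) for s in normal)
    hist_abnormal = Counter(cell(s) for s in abnormal)
    total = len(normal) + len(abnormal)

    # P(x|w) * P(w) equals (cell count) / (total sample count), so the
    # overlap of the two classes is a sum of per-cell minima.
    cells = set(hist_normal) | set(hist_abnormal)
    return sum(min(hist_normal[c], hist_abnormal[c]) for c in cells) / total
```

Two classes that never share a histogram cell give an estimate of 0, and two identical classes of equal size give 0.5, which matches the interpretation that lower values indicate better separation.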

Besides the Bayes error probability estimate, other evaluation values that use both normal and abnormal data can serve as the first evaluation value, for example the ratio of within-class variance to between-class variance. As an evaluation value that uses only normal data, the Euclidean distance from the centroid of the normal data to the farthest data point can also be used as the first evaluation value.

(Step S503: Set weight parameters)
In step S503, the information processing apparatus 103 functions as a parameter generation unit and sets the number K of top-ranked feature quantities to which a value is added when the second evaluation value is calculated. This K is called the weight parameter. In the present embodiment, a plurality (k) of weight parameters K are prepared. In the subsequent step S504, the weight parameter K is changed in turn, for example K = 1, ..., k, and a second evaluation value is calculated for each weight parameter. When K = m, for a given feature quantity, the value m is added to the second evaluation value of the feature quantity ranked first by first evaluation value and the value (m - 1) to that of the feature quantity ranked second. The added value decreases linearly with rank in this way, until the value 1 is added to the second evaluation value of the feature quantity ranked m-th.

(Step S504: Calculate second evaluation values)
In step S504, the information processing apparatus 103 functions as a second calculation unit and calculates the second evaluation values using the first evaluation values calculated in step S502 and the plurality of weight parameters determined in step S503. The calculation method is described in detail below.

First, one feature quantity is taken as the feature of interest, and among the combinations that include it, the K combinations with the best first evaluation values are found, where K is the weight parameter. When the Bayes error probability estimate is used as the first evaluation value, these are the K combinations with the lowest estimates. For the feature quantities in these K combinations, a predetermined value that depends on K, and that is larger the better the first evaluation value, is added to each feature quantity's second evaluation value. This processing is repeated while changing the feature of interest in turn. In this way every feature quantity is taken as the feature of interest, and the total of the values added to each feature quantity is obtained as its second evaluation value.

When n ≥ 3, each addition is applied to the second evaluation values of several feature quantities at once. For example, when n = 3, the first evaluation value is calculated for a combination of three feature quantities, so when one feature quantity is the feature of interest, the value is added to the second evaluation values of the two other feature quantities in the combination.
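The accumulation of steps S503 and S504 can be sketched as follows. This is one reading of the description, in which the value is credited to the other feature quantities in each top-K combination rather than to the feature of interest itself; the function name and the sample scores are hypothetical.

```python
def second_evaluation_values(first_scores, features, K):
    """Accumulate rank-based values for one weight parameter K.

    first_scores maps each feature combination (a tuple) to its first
    evaluation value; lower is better, as with a Bayes error estimate.
    """
    second = {f: 0 for f in features}
    for focus in features:
        # Combinations containing the feature of interest, best score first.
        ranked = sorted(
            (combo for combo in first_scores if focus in combo),
            key=lambda combo: first_scores[combo],
        )
        for rank, combo in enumerate(ranked[:K], start=1):
            for other in combo:
                if other != focus:
                    # K for rank 1, K - 1 for rank 2, ..., 1 for rank K.
                    second[other] += K - rank + 1
    return second


# Hypothetical first evaluation values for three features and n = 2.
scores = {("A", "B"): 0.1, ("A", "C"): 0.3, ("B", "C"): 0.2}
features = ["A", "B", "C"]

# One set of second evaluation values per weight parameter K = 1, ..., k.
k = 2
per_K = {K: second_evaluation_values(scores, features, K) for K in range(1, k + 1)}
```

Because the inner loop credits every member of a combination other than the feature of interest, the same sketch covers n ≥ 3 without modification.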

(Step S505: Generate feature quantity subsets)
In step S505, the information processing apparatus 103 functions as a subset generation unit and thresholds the second evaluation values calculated for each weight parameter K in step S504, so that the feature quantities whose second evaluation values satisfy the threshold are generated as a feature quantity subset. A fixed threshold would give results that vary with the value of the weight parameter K, so the threshold is applied to the cumulative value by the percentile method.

For example, suppose the second evaluation values are 4 for feature quantity A, 3 for feature quantity B, 1 for feature quantity C, and 9 for feature quantity D. Arranged in descending order, the feature quantities are D (9), A (4), B (3), C (1), and the cumulative values are D (9), A (13), B (16), C (17). Thresholding at 90% of the cumulative total gives a threshold of 17 × 0.9 = 15.3, so feature quantities D and A are generated as the subset. This processing is performed for each weight parameter K, producing as many feature quantity subsets as there are weight parameters.
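The cumulative percentile thresholding in this example can be sketched as follows. The function name is hypothetical, and the rule that a feature quantity is kept while the running total stays at or below the threshold is inferred from the worked example rather than stated explicitly.

```python
def select_by_cumulative_percentile(second, ratio=0.9):
    """Keep top-ranked features while the cumulative value stays within ratio * total."""
    ranked = sorted(second.items(), key=lambda item: item[1], reverse=True)
    threshold = ratio * sum(second.values())
    subset, cumulative = [], 0
    for name, value in ranked:
        cumulative += value
        if cumulative > threshold:
            break
        subset.append(name)
    return subset


# Second evaluation values from the example in the text.
print(select_by_cumulative_percentile({"A": 4, "B": 3, "C": 1, "D": 9}))
```

For the values above, the running totals are 9, 13, 16, 17 and the threshold is 15.3, so the loop stops at B and the subset is D and A.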

Although thresholding of the cumulative value by the percentile method is assumed here, the thresholding may be omitted and all feature quantities whose second evaluation value is greater than 0 may be used as the subset. For example, when the second evaluation values are 4 for feature quantity A, 3 for feature quantity B, 0 for feature quantity C, and 7 for feature quantity D, feature quantities A, B, and D may be used as the subset.

According to the first embodiment, a plurality of weight parameters used in obtaining the second evaluation value from the first evaluation value are set. A second evaluation value is calculated for each weight parameter, a feature quantity subset is generated from each set of second evaluation values, and the subset suited to classifying the input data is selected. This allows feature quantities to be selected with high accuracy and, as a result, normal/abnormal discrimination to be performed with higher accuracy.

(Second Embodiment)
Next, a second embodiment of the present invention is described. In the first embodiment, the values added to the second evaluation value are set discretely. In the second embodiment, by contrast, the values added are set on the basis of the eigenvalues of the covariance matrix of the input data, and the second evaluation values are calculated and the feature quantity subsets generated accordingly. Only the differences from the first embodiment are described below.

FIG. 6 shows a flowchart of the feature quantity subset generation processing in the second embodiment.
(Step S601: Determine the weights from the eigenvalues of the covariance matrix)
In step S601, the covariance matrix Σ, which serves as the feature matrix, is obtained by Equations 8 and 9 from the feature quantities (feature vectors) of the normal data extracted in step S302, and its eigenvalues are used to determine the values added to the second evaluation value. Let λ_i and φ_i be the i-th eigenvalue and eigenvector of the covariance matrix Σ obtained by Equation 9, with the eigenvalues sorted in descending order. In the present embodiment, when the second evaluation value is calculated, the M-th largest eigenvalue λ_M is added to the second evaluation value of the feature quantity with the M-th best first evaluation value.

(Step S602: Set the weight parameter)
In step S602, the weight parameter K is set. In the subsequent step S603, when the weight parameter K is k, the values obtained in step S601 are added to the second evaluation values of the feature quantities with the k best first evaluation values, in order of first evaluation value. For example, the p-th largest eigenvalue λ_p is added to the second evaluation value of the feature quantity with the p-th best first evaluation value (1 ≤ p ≤ k). For feature quantities ranked (k + 1)-th or lower by first evaluation value, 0 is added to the second evaluation value (that is, nothing is added).
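The weighting rule of steps S601 and S602 can be sketched as a lookup from rank to added value. The function name is hypothetical, and the eigenvalues are passed in as a plain list on the assumption that they have already been computed from the covariance matrix Σ.

```python
def eigenvalue_weights(eigenvalues, K):
    """Return the value added at each first-evaluation rank (index 0 = rank 1).

    Ranks 1..K receive the eigenvalues in descending order; lower ranks receive 0.
    """
    ordered = sorted(eigenvalues, reverse=True)
    return [lam if rank <= K else 0.0 for rank, lam in enumerate(ordered, start=1)]


# Hypothetical eigenvalues of a 3 x 3 covariance matrix.
weights = eigenvalue_weights([0.5, 2.0, 1.0], K=2)
```

Compared with the integer values K, K - 1, ..., 1 of the first embodiment, these weights are real-valued, so ties between second evaluation values become less likely.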

The processing in steps S603 and S604 of FIG. 6 in the second embodiment is the same as that in steps S504 and S505 of FIG. 5 in the first embodiment, respectively, so its description is omitted.

According to the second embodiment, assigning the eigenvalues of the covariance matrix, which is the feature matrix, as the values added to the second evaluation value makes it possible to use continuous values, so the second evaluation value can be calculated more precisely. As a result, normal/abnormal discrimination can be performed with high accuracy.

(Third Embodiment)
Next, a third embodiment of the present invention is described. In the first embodiment, a plurality of weight parameters are set, a second evaluation value is calculated for each weight parameter, and a feature quantity subset is generated from each set of second evaluation values. In the third embodiment, by contrast, each of the subsets generated with the different weight parameters is decomposed to produce more subsets. Only the differences from the first embodiment are described below.

FIG. 7 shows a flowchart of the feature quantity subset generation processing in the third embodiment. The processing in steps S701, S702, S703, and S704 of FIG. 7 is the same as that in steps S501, S502, S503, and S504 of FIG. 5 in the first embodiment, respectively, so its description is omitted.

(Step S705: Decompose the feature quantity combinations and generate feature quantity subsets)
In step S705, the second evaluation values calculated in step S704 are used to decompose the combinations of feature quantities and generate feature quantity subsets. The second evaluation values calculated for each weight parameter K in step S704 are arranged in descending order and thresholded, and the feature quantities whose second evaluation values are at or above the predetermined value are chosen for the subset. The thresholding uses the cumulative value of the sorted second evaluation values with the percentile method, as in step S505 of FIG. 5, so its description is omitted.

After the thresholding, the selected combinations of feature quantities are decomposed, and the decomposed combinations are set as subsets. FIG. 8 illustrates the concept of decomposing a combination of feature quantities and setting the decomposed combinations as subsets.

FIG. 8(A) shows the result of adding values to the second evaluation values when the weight parameter is set so that values are added down to the feature quantity ranked a-th by first evaluation value. The feature quantity indices are arranged in descending order of second evaluation value. In calculating the second evaluation value, the value a is added for the feature quantity ranked first by first evaluation value and the value (a - 1) for the one ranked second. In general, the value (a - n + 1) is added for the feature quantity ranked n-th, down to the value 1 for the one ranked a-th.

Because the added values are set discretely, several feature quantities end up with the same second evaluation value. Among feature quantities with equal second evaluation values, it cannot be determined which is better, so no threshold can be placed between them in the list sorted by second evaluation value. The subsets of feature quantities to select are therefore delimited at the points where the second evaluation value changes.

Consequently, from the change points of the second evaluation value, the numbers of selected feature quantities are, in order, 3, 6, 10, 12, and 14. When 3 feature quantities are selected, the feature quantity indices are 12, 25, 41. When 6 are selected, the indices are 12, 25, 41, 8, 16, 9. When 10 are selected, the indices are 12, 25, 41, 8, 16, 9, 11, 28, 7, 13. When 12 are selected, the indices are 12, 25, 41, 8, 16, 9, 11, 28, 7, 13, 26, 42. When 14 are selected, the indices are 12, 25, 41, 8, 16, 9, 11, 28, 7, 13, 26, 42, 57, 17.
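Cutting the sorted list at the change points can be sketched as follows. The function name is hypothetical, and the sample second evaluation values are invented so that the cut points come out as 3, 6, 10, 12, and 14; the actual values in FIG. 8(A) are not given in the text.

```python
def subset_sizes_at_change_points(sorted_values):
    """Return the subset sizes obtained by cutting where the value changes.

    sorted_values: second evaluation values in descending order.
    """
    sizes = []
    for i in range(1, len(sorted_values) + 1):
        # Close a subset after position i when the next value differs (or at the end).
        if i == len(sorted_values) or sorted_values[i] != sorted_values[i - 1]:
            sizes.append(i)
    return sizes


# Hypothetical second evaluation values, already sorted in descending order.
values = [5, 5, 5, 4, 4, 4, 3, 3, 3, 3, 2, 2, 1, 1]
sizes = subset_sizes_at_change_points(values)
```

Each size s then corresponds to the subset made of the first s feature quantity indices in the sorted order, so every subset contains the previous one.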

FIG. 8(B) shows a case with a weight parameter different from that of FIG. 8(A): values are added to the second evaluation values down to the feature quantity ranked b-th by first evaluation value. Here the numbers of selected feature quantities are 2, 6, 9, 12, 17, and 19.

Similarly, FIG. 8(C) shows a case with a weight parameter different from that of FIG. 8(A): values are added to the second evaluation values down to the feature quantity ranked c-th by first evaluation value. Here the numbers of selected feature quantities are 4, 6, 11, 13, 17, 21, and 24.

In this way, five feature quantity subsets (numbers of selected feature quantities: 3, 6, 10, 12, 14) are generated with the weight parameter of FIG. 8(A), six subsets (2, 6, 9, 12, 17, 19) with that of FIG. 8(B), and seven subsets (4, 6, 11, 13, 17, 21, 24) with that of FIG. 8(C). In total, 5 + 6 + 7 = 18 subsets are generated.

Subsets are generated in this way, but the number of selected feature quantities may coincide between different weight parameters, so the subset obtained with the smaller K may be given priority. In that case, the duplicated numbers of selected feature quantities are excluded, so 5 + (6 - 2) + (7 - 2) = 14 feature quantity subsets are generated.
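The priority rule for duplicated selection counts can be sketched as follows. The function name and the sample sizes are hypothetical and deliberately unrelated to FIG. 8; the sketch assumes duplicates are identified by the number of selected feature quantities alone, as the paragraph above describes.

```python
def deduplicate_sizes(sizes_by_K):
    """Drop selection counts already produced by a smaller weight parameter K."""
    seen, kept = set(), {}
    for K in sorted(sizes_by_K):
        kept[K] = [s for s in sizes_by_K[K] if s not in seen]
        seen.update(sizes_by_K[K])
    return kept


# Hypothetical selection counts for two weight parameters.
kept = deduplicate_sizes({1: [2, 4], 2: [3, 4, 6]})
total = sum(len(sizes) for sizes in kept.values())
```

Here the count 4 produced with K = 2 is dropped because K = 1 already produced it, leaving four subsets in total.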

According to the third embodiment, not only is a feature quantity subset generated for each of the weight parameters, but the subsets are also decomposed to generate more subsets. This yields more candidate feature quantity subsets to choose from and, as a result, enables highly accurate normal/abnormal discrimination.

(Fourth Embodiment)
Next, a fourth embodiment of the present invention is described. The first to third embodiments were described using an inspection task as an example. The fourth embodiment is described using a gene selection task, in which genes are selected in order to produce a low-cost chip for genetic diagnosis.

FIG. 9 shows a configuration example of the information processing system in the fourth embodiment. The messenger RNA measurement device 901 uses microarray technology to measure the number of messenger RNAs carrying various information fragments, that is, the gene expression level. The information processing apparatus 902 determines whether the number of messenger RNAs measured by the messenger RNA measurement device 901, that is, the gene expression level, is statistically significant with respect to the teacher labels, and selects genes accordingly.

FIG. 10 is a flowchart showing a processing example of the information processing apparatus 902 in the present embodiment. Each step shown in FIG. 10 is executed by the information processing apparatus 902.
(Step S1001: Screening by variance)
In step S1001, the variance of gene expression is calculated without using the teacher labels, and genes are selected in descending order of variance to narrow the set to about 1,000 genes.

(Step S1002: Screening by evaluation value)
In step S1002, the genes screened by variance in step S1001 are further narrowed to about 500 genes using an evaluation value called the t-score. The t-score is a kind of S/N ratio: for each gene, it is the ratio of the difference between the within-group mean expression levels to the average of the within-group standard deviations.
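The two screening stages of steps S1001 and S1002 can be sketched as follows. The function names and the toy data are hypothetical; the use of population variance and standard deviation, the absolute value of the mean difference, and 0/1 teacher labels are assumptions, since the text gives the t-score only in words.

```python
from statistics import mean, pstdev, pvariance


def screen_by_variance(expression, keep):
    """Keep the `keep` genes with the largest expression variance (no labels used)."""
    ranked = sorted(expression, key=lambda g: pvariance(expression[g]), reverse=True)
    return ranked[:keep]


def t_score(group_a, group_b):
    """Difference of group means over the average of the group standard deviations.

    Assumes the two groups do not both have zero spread.
    """
    spread = (pstdev(group_a) + pstdev(group_b)) / 2
    return abs(mean(group_a) - mean(group_b)) / spread


def screen_by_t_score(expression, labels, genes, keep):
    """Keep the `keep` genes with the largest t-score between the two label groups."""
    def score(gene):
        a = [v for v, y in zip(expression[gene], labels) if y == 0]
        b = [v for v, y in zip(expression[gene], labels) if y == 1]
        return t_score(a, b)
    return sorted(genes, key=score, reverse=True)[:keep]


# Toy expression levels for three genes over four samples (hypothetical data).
expression = {"g1": [1, 1, 1, 1], "g2": [0, 2, 8, 10], "g3": [4, 6, 5, 7]}
labels = [0, 0, 1, 1]
stage1 = screen_by_variance(expression, keep=2)
stage2 = screen_by_t_score(expression, labels, stage1, keep=1)
```

In the embodiment the two stages would keep about 1,000 and about 500 genes respectively; the toy values of `keep` here are only for illustration.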

(Step S1003: Generate gene subsets)
In step S1003, gene subsets are generated from the gene screening result of step S1002 using the feature quantity selection technique, namely the method of step S303 of FIG. 3 in the first embodiment. Genes are treated as feature quantities, and feature selection is performed by calculating evaluation values from the expression levels across genes. The details of step S1003 are the same as those of step S303 of FIG. 3, so their description is omitted.

(Step S1004: Select a gene subset by cross-validation)
In step S1004, a selection is made among the gene subsets generated in step S1003 by cross-validation.

According to the fourth embodiment, gene subsets are generated using the feature quantity selection technique, and genes are reselected by cross-validation. This enables highly accurate gene selection and, as a result, the production of a low-cost chip for genetic diagnosis.

(Other Embodiments of the Present Invention)
The present invention can also be realized by processing in which a program that implements one or more functions of the above embodiments is supplied to a system or apparatus via a network or a storage medium, and one or more processors in a computer of that system or apparatus read and execute the program. It can also be realized by a circuit (for example, an ASIC) that implements one or more functions.

For example, the information processing apparatus described in the above embodiments has a computer function 1100 as shown in FIG. 11, and its CPU 1101 carries out the operations of the above embodiments. As shown in FIG. 11, the computer function 1100 includes a CPU 1101, a ROM 1102, and a RAM 1103. It also includes a controller (CONSC) 1105 for an operation unit (CONS) 1109 and a display controller (DISPC) 1106 for a display (DISP) 1110 serving as a display unit. It further includes a controller (DCONT) 1107 for a hard disk (HD) 1111 and a storage device (STD) 1112 such as a flexible disk, and a network interface card (NIC) 1108. The functional units 1101, 1102, 1103, 1105, 1106, 1107, and 1108 are connected so that they can communicate with one another via a system bus 1104.

ＣＰＵ１１０１は、ＲＯＭ１１０２又はＨＤ１１１１に記憶されたソフトウェア、又はＳＴＤ１１１２より供給されるソフトウェアを実行することで、システムバス１１０４に接続された各構成部を総括的に制御する。すなわち、ＣＰＵ１１０１は、前述したような動作を行うための処理プログラムを、ＲＯＭ１１０２、ＨＤ１１１１、又はＳＴＤ１１１２から読み出して実行することで、前述の実施形態での動作を実現するための制御を行う。ＲＡＭ１１０３は、ＣＰＵ１１０１の主メモリ又はワークエリア等として機能する。ＣＯＮＳＣ１１０５は、ＣＯＮＳ１１０９からの指示入力を制御する。ＤＩＳＰＣ１１０６は、ＤＩＳＰ１１１０の表示を制御する。ＤＣＯＮＴ１１０７は、ブートプログラム、種々のアプリケーション、ユーザファイル、ネットワーク管理プログラム、及び前述の実施形態における処理プログラム等を記憶するＨＤ１１１１及びＳＴＤ１１１２とのアクセスを制御する。ＮＩＣ１１０８はネットワーク１１１３上の他の装置と双方向にデータをやりとりする。 The CPU 1101 performs overall control of each component connected to the system bus 1104 by executing software stored in the ROM 1102 or the HD 1111 or software supplied from the STD 1112. That is, the CPU 1101 reads out and executes a processing program for performing the above-described operation from the ROM 1102, the HD 1111 or the STD 1112, thereby performing control for realizing the operation in the above-described embodiment. The RAM 1103 functions as a main memory or work area for the CPU 1101. The CONSC 1105 controls an instruction input from the CONS 1109. The DISPC 1106 controls the display of the DISP 1110. The DCONT 1107 controls access to the HD 1111 and the STD 1112 that store a boot program, various applications, user files, a network management program, a processing program in the above-described embodiment, and the like. The NIC 1108 exchanges data bidirectionally with other devices on the network 1113.

なお、前記実施形態は、何れも本発明を実施するにあたっての具体化のほんの一例を示したものに過ぎず、これらによって本発明の技術的範囲が限定的に解釈されてはならないものである。すなわち、本発明はその技術思想、又はその主要な特徴から逸脱することなく、様々な形で実施することができる。 The above-described embodiments are merely examples of implementation in carrying out the present invention, and the technical scope of the present invention should not be construed as being limited thereto. That is, the present invention can be implemented in various forms without departing from the technical idea or the main features thereof.

１０３：情報処理装置 103: Information processing apparatus
２０１：特徴量抽出部 201: Feature quantity extraction unit
２０２：特徴量選択部 202: Feature quantity selection unit
２０３：パラメータ設定部 203: Parameter setting unit
２０４：識別器生成部 204: Discriminator generation unit
２０５：識別器判定部 205: Discriminator determination unit

Claims

入力データから抽出される複数の特徴量を組み合わせて、特徴量の組み合わせを生成する組み合わせ生成手段と、
前記特徴量の組み合わせに対して、前記入力データの分類の判定に適しているか否かを評価する第一評価値を算出する第一の算出手段と、
前記第一評価値に基づいて前記特徴量の組み合わせを評価する第二評価値を算出する際に使用するパラメータを複数生成するパラメータ生成手段と、
前記複数のパラメータ毎に前記第二評価値を算出する第二の算出手段と、
前記複数のパラメータ毎に、前記第二評価値に基づいて前記特徴量を選択し、前記特徴量のサブセットを生成するサブセット生成手段とを有することを特徴とする情報処理装置。
An information processing apparatus comprising:
combination generating means for generating a combination of feature amounts by combining a plurality of feature amounts extracted from input data;
first calculation means for calculating, for the combination of feature amounts, a first evaluation value for evaluating whether or not the combination is suitable for determining the classification of the input data;
parameter generating means for generating a plurality of parameters used when calculating a second evaluation value for evaluating the combination of feature amounts based on the first evaluation value;
second calculation means for calculating the second evaluation value for each of the plurality of parameters; and
subset generation means for selecting, for each of the plurality of parameters, the feature amounts based on the second evaluation value and generating a subset of the feature amounts.
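The flow recited in this claim can be illustrated with a minimal Python sketch. Several details are assumptions made only for illustration and are not stated in this section: combinations are taken to be pairs of features, a lower first evaluation value is taken to be better, each parameter `top_k` is taken to be the number of top-ranked combinations that contribute to the second evaluation value, and the second evaluation value is accumulated per feature with a bonus that decreases linearly with rank. The names `select_subsets`, `first_eval`, `params`, and `subset_size` are hypothetical.

```python
from itertools import combinations


def select_subsets(features, first_eval, params, subset_size):
    """Return one feature subset per parameter value.

    features:    list of feature names
    first_eval:  callable(pair) -> float, lower is better (assumption)
    params:      iterable of ints; each is the number of top-ranked
                 combinations that contribute scores (assumption)
    subset_size: number of features kept in each subset (assumption)
    """
    # Combination generation: here, every pair of features.
    pairs = list(combinations(features, 2))
    # First evaluation value for each combination, best first.
    ranked = sorted(pairs, key=first_eval)

    subsets = {}
    for top_k in params:
        # Second evaluation value: accumulated per feature from the
        # rank of each combination that contains the feature.
        score = {f: 0.0 for f in features}
        for rank, pair in enumerate(ranked[:top_k]):
            bonus = top_k - rank  # decreases linearly with rank
            for f in pair:
                score[f] += bonus
        # Subset generation: keep the highest-scoring features.
        best = sorted(features, key=lambda f: -score[f])
        subsets[top_k] = best[:subset_size]
    return subsets
```

Because each parameter value changes how many combinations are counted, different parameter values can yield different subsets from the same first evaluation values, which is what gives the later selection stage several candidates to choose from.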

前記第一の算出手段は、正常データと異常データとを用いて前記第一評価値を算出することを特徴とする請求項１記載の情報処理装置。 The information processing apparatus according to claim 1, wherein the first calculation unit calculates the first evaluation value using normal data and abnormal data.

前記第一評価値は、ベイズ誤り確率推定値であることを特徴とする請求項２記載の情報処理装置。 The information processing apparatus according to claim 2, wherein the first evaluation value is a Bayes error probability estimation value.

前記第一評価値は、クラス内分散・クラス間分散比であることを特徴とする請求項２記載の情報処理装置。 The information processing apparatus according to claim 2, wherein the first evaluation value is an intraclass variance / interclass variance ratio.
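As an illustration of this kind of first evaluation value, the following Python sketch computes a ratio of within-class variance to between-class variance for two classes of feature vectors. The exact definition used in the embodiments is not given in this section, so the formula below (scatter around class means divided by scatter of class means around the overall mean) is an assumption, and the function name `variance_ratio` is hypothetical. Under this definition a smaller ratio indicates better-separated classes.

```python
def variance_ratio(normal, abnormal):
    """Within-class / between-class variance for two classes.

    normal, abnormal: non-empty lists of equal-length feature vectors.
    Returns float('inf') when the two class means coincide.
    """
    def mean(rows):
        n = len(rows)
        return [sum(col) / n for col in zip(*rows)]

    def sq_dist(u, v):
        return sum((a - b) ** 2 for a, b in zip(u, v))

    all_rows = normal + abnormal
    total = len(all_rows)
    grand = mean(all_rows)

    within = 0.0
    between = 0.0
    for rows in (normal, abnormal):
        m = mean(rows)
        # Scatter of samples around their own class mean.
        within += sum(sq_dist(x, m) for x in rows)
        # Scatter of the class mean around the overall mean.
        between += len(rows) * sq_dist(m, grand)

    if between == 0.0:
        return float("inf")
    return (within / total) / (between / total)
```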

前記第一の算出手段は、正常データのみを用いて前記第一評価値を算出することを特徴とする請求項１記載の情報処理装置。 The information processing apparatus according to claim 1, wherein the first calculation unit calculates the first evaluation value using only normal data.

前記第一評価値は、前記正常データの重心からの最も離れたデータへのユークリッド距離であることを特徴とする請求項５記載の情報処理装置。 The information processing apparatus according to claim 5, wherein the first evaluation value is a Euclidean distance to data farthest from the center of gravity of the normal data.
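A minimal Python sketch of this one-class evaluation value follows. It computes the centroid of the normal data restricted to a feature combination and returns the Euclidean distance to the farthest normal sample. The function name `max_centroid_distance` is hypothetical, and the reading that a smaller value means a more compact normal class is an assumption, not something stated in this section.

```python
import math


def max_centroid_distance(normal):
    """Distance from the centroid of `normal` to its farthest sample.

    normal: non-empty list of equal-length feature vectors.
    """
    n = len(normal)
    # Centroid: per-dimension mean of the normal samples.
    centroid = [sum(col) / n for col in zip(*normal)]
    # Euclidean distance to the farthest sample.
    return max(math.dist(x, centroid) for x in normal)
```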

前記第二の算出手段は、前記第一評価値の順位に応じた値を前記第二評価値に対して加算して前記第二評価値の算出を行うことを特徴とする請求項１〜６の何れか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 1 to 6, wherein the second calculation means calculates the second evaluation value by adding, to the second evaluation value, a value corresponding to the rank of the first evaluation value.

前記第二評価値に対して加算する値は、前記第一評価値の順位に応じて線形に変化する値であることを特徴とする請求項７記載の情報処理装置。 The information processing apparatus according to claim 7, wherein the value added to the second evaluation value is a value that linearly changes in accordance with the rank of the first evaluation value.

前記第二評価値に対して加算する値は、特徴行列の固有値により算出された値であることを特徴とする請求項７記載の情報処理装置。 The information processing apparatus according to claim 7, wherein the value added to the second evaluation value is a value calculated from an eigenvalue of a feature matrix.

前記サブセット生成手段は、前記複数のパラメータ毎に生成した特徴量のサブセットを前記第二評価値の値に基づいて分解し、前記特徴量のサブセットの候補を増分させて前記特徴量のサブセットを生成することを特徴とする請求項１〜９の何れか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 1 to 9, wherein the subset generation means decomposes the subset of feature amounts generated for each of the plurality of parameters based on the second evaluation value, and generates the subset of feature amounts by increasing the candidates for the subset of feature amounts.

前記サブセット生成手段が生成した前記特徴量のサブセットから交差確認法を用いて、特徴量のサブセットの選択を行う選択手段を有することを特徴とする請求項１〜９の何れか１項に記載の情報処理装置。 The information processing apparatus according to any one of claims 1 to 9, further comprising selection means for selecting a subset of feature amounts, using a cross-validation method, from the subsets of feature amounts generated by the subset generation means.
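The selection among candidate subsets by cross-validation can be sketched as follows. This is only an illustration under stated assumptions: the classifier is a simple nearest-centroid rule chosen for brevity (this section does not specify one), the folds are formed by striding through the data, and every fold is assumed to be non-empty with every class present in each training split. The names `nearest_centroid_accuracy` and `choose_subset` are hypothetical.

```python
def nearest_centroid_accuracy(train, test, cols):
    """Accuracy of a nearest-centroid rule using only columns `cols`.

    train, test: lists of (feature_vector, label) pairs.
    """
    sums, counts = {}, {}
    for x, y in train:
        acc = sums.setdefault(y, [0.0] * len(cols))
        for i, c in enumerate(cols):
            acc[i] += x[c]
        counts[y] = counts.get(y, 0) + 1
    centroids = {y: [s / counts[y] for s in sums[y]] for y in sums}

    correct = 0
    for x, y in test:
        # Predict the label whose centroid is closest to the sample.
        pred = min(
            centroids,
            key=lambda lab: sum(
                (x[c] - m) ** 2 for c, m in zip(cols, centroids[lab])
            ),
        )
        correct += pred == y
    return correct / len(test)


def choose_subset(data, candidate_subsets, k=3):
    """Return the candidate subset with the best k-fold CV accuracy."""
    folds = [data[i::k] for i in range(k)]

    def cv_score(cols):
        total = 0.0
        for i in range(k):
            train = [s for j, fold in enumerate(folds) if j != i for s in fold]
            total += nearest_centroid_accuracy(train, folds[i], cols)
        return total / k

    return max(candidate_subsets, key=cv_score)
```

Here each candidate subset is a list of column indices, for example the subsets produced for each parameter value. When several candidates tie, `max` keeps the first one in `candidate_subsets`.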

入力データから抽出される複数の特徴量を組み合わせて、特徴量の組み合わせを生成する工程と、
前記特徴量の組み合わせに対して、前記入力データの分類の判定に適しているか否かを評価する第一評価値を算出する工程と、
前記第一評価値に基づいて前記特徴量の組み合わせを評価する第二評価値を算出する際に使用するパラメータを複数生成する工程と、
前記複数のパラメータ毎に前記第二評価値を算出する工程と、
前記複数のパラメータ毎に、前記第二評価値に基づいて前記特徴量を選択し、前記特徴量のサブセットを生成する工程とを有することを特徴とする情報処理方法。
An information processing method comprising:
a step of generating a combination of feature amounts by combining a plurality of feature amounts extracted from input data;
a step of calculating, for the combination of feature amounts, a first evaluation value for evaluating whether or not the combination is suitable for determining the classification of the input data;
a step of generating a plurality of parameters used when calculating a second evaluation value for evaluating the combination of feature amounts based on the first evaluation value;
a step of calculating the second evaluation value for each of the plurality of parameters; and
a step of selecting, for each of the plurality of parameters, the feature amounts based on the second evaluation value and generating a subset of the feature amounts.

入力データから抽出される複数の特徴量を組み合わせて、特徴量の組み合わせを生成するステップと、
前記特徴量の組み合わせに対して、前記入力データの分類の判定に適しているか否かを評価する第一評価値を算出するステップと、
前記第一評価値に基づいて前記特徴量の組み合わせを評価する第二評価値を算出する際に使用するパラメータを複数生成するステップと、
前記複数のパラメータ毎に前記第二評価値を算出するステップと、
前記複数のパラメータ毎に、前記第二評価値に基づいて前記特徴量を選択し、前記特徴量のサブセットを生成するステップとをコンピュータに実行させるためのプログラム。
A program for causing a computer to execute:
a step of generating a combination of feature amounts by combining a plurality of feature amounts extracted from input data;
a step of calculating, for the combination of feature amounts, a first evaluation value for evaluating whether or not the combination is suitable for determining the classification of the input data;
a step of generating a plurality of parameters used when calculating a second evaluation value for evaluating the combination of feature amounts based on the first evaluation value;
a step of calculating the second evaluation value for each of the plurality of parameters; and
a step of selecting, for each of the plurality of parameters, the feature amounts based on the second evaluation value and generating a subset of the feature amounts.