JP7071624B2

JP7071624B2 - Search program, search method and search device

Info

Publication number: JP7071624B2
Application number: JP2018045283A
Authority: JP
Inventors: 晃浦; 健一小林; 晴康上田
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2018-03-13
Filing date: 2018-03-13
Publication date: 2022-05-19
Anticipated expiration: 2038-03-13
Also published as: JP2019159769A

Description

The present invention relates to a search program, a search method, and a search device.

Machine learning is sometimes performed as one form of computer-based data analysis. In machine learning, training data representing a plurality of known cases is input to a computer. The computer analyzes the training data and generates a model that generalizes the relationship between factors (sometimes called explanatory variables or independent variables) and outcomes (sometimes called objective variables or dependent variables). The generated model can then be used to predict the outcomes of unknown cases. In machine learning, it is preferable that the generated model be accurate, that is, that its ability to correctly predict the outcomes of unknown cases (sometimes called prediction performance) be high.

In machine learning, even when the same training data is used, changing the hyperparameter values produces models with different prediction performance. A hyperparameter is a setting item that adjusts the behavior of machine learning. Unlike the coefficients in a model, which are determined through machine learning, hyperparameter values are specified before model generation starts. Hyperparameters include, for example, one that specifies the machine learning algorithm, such as logistic regression analysis, support vector machine (SVM), or random forest. Hyperparameters also include, for example, "C" and "γ" used in the machine learning algorithm called RBF (Radial Basis Function) kernel SVM.

For given training data, the hyperparameter values that maximize the prediction performance of the model may not be known in advance. It is therefore conceivable to have a computer search for hyperparameter values that yield high prediction performance. For example, a setting device that sets hyperparameter values has been proposed. The proposed setting device uses small training data sampled from a data set to generate a plurality of models corresponding to different hyperparameter values, and measures the prediction performance of these models. The setting device classifies the hyperparameter values into a group with high prediction performance and a group with low prediction performance, and selects the hyperparameter value to use next based on the probability of being classified into the high-performance group.

Japanese Unexamined Patent Publication No. 2016-218869

Hyperparameters may be a combination of hyperparameters for which no magnitude relationship between values is defined and hyperparameters for which a magnitude relationship between values is defined. The former correspond to a nominal scale and are sometimes called categorical hyperparameters. Examples of categorical hyperparameters include the machine learning algorithm and the classification criterion of a random forest (a choice between the Gini coefficient and entropy). The latter correspond to an ordinal scale, a ratio scale, or an interval scale and are sometimes called continuous hyperparameters. Examples of continuous hyperparameters include "C" and "γ" of the RBF kernel SVM.

In this case, searching for hyperparameter values that yield high prediction performance means searching for an appropriate combination of a categorical hyperparameter value and a continuous hyperparameter value. By their nature, continuous hyperparameters are such that models generated with nearby values are likely to have similar prediction performance. Therefore, once the categorical hyperparameter value is fixed, a continuous hyperparameter value that yields high prediction performance can be estimated without trying every selectable continuous hyperparameter value.
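The idea that nearby continuous values give similar prediction performance can be illustrated with a minimal sketch. It assumes a single scalar continuous hyperparameter and uses inverse-distance weighting purely as a stand-in estimator; the function name `estimate_from_neighbors` and the `power` parameter are hypothetical and are not taken from the source, which does not specify an estimation method at this point.

```python
def estimate_from_neighbors(tried, candidate, power=2.0):
    """Estimate prediction performance at an untried continuous value.

    tried: dict mapping each tried continuous hyperparameter value to the
           prediction performance measured for it (categorical value fixed).
    candidate: the untried continuous value to estimate.
    Inverse-distance weighting is an illustrative assumption, not the
    method described in the source.
    """
    if not tried:
        raise ValueError("at least one tried value is required")
    if candidate in tried:
        # An already-measured value needs no estimation.
        return tried[candidate]
    # Closer tried values receive larger weights.
    weights = {v: 1.0 / abs(candidate - v) ** power for v in tried}
    total = sum(weights.values())
    return sum(weights[v] * tried[v] for v in tried) / total
```

With measurements at two values, an estimate for a value between them falls between the two measured performances and leans toward the nearer one, so not every selectable value has to be tried.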

For categorical hyperparameters, by contrast, the assumption that models generated with nearby values have similar prediction performance does not hold. However, some relationship may exist between the prediction performance of models that share the same continuous hyperparameter value but differ in categorical hyperparameter value, so the two are not necessarily completely unrelated. The question is therefore how to use the prediction performance of models generated with different categorical hyperparameter values so as to search efficiently for a combination of categorical and continuous hyperparameter values that yields high prediction performance.

In one aspect, an object of the present invention is to provide a search program, a search method, and a search device that can make the search for hyperparameter values more efficient.

In one aspect, a search program is provided that causes a computer to execute the following processing. For results of machine learning executed in the past on a plurality of data sets, where the machine learning is controlled based on a first hyperparameter for which no magnitude relationship is defined between different values and a second hyperparameter for which a magnitude relationship is defined between different values, the program acquires prediction performance difference information indicating the difference between the prediction performance when a first value is set for the first hyperparameter and the prediction performance when a second value is set for the first hyperparameter. For another data set different from the plurality of data sets, the program calculates a first prediction performance by executing machine learning with the first hyperparameter set to the first value and the second hyperparameter set to a third value. Based on the first prediction performance and the prediction performance difference information, the program selects the combination of a value of the first hyperparameter and a value of the second hyperparameter to be used the next time machine learning is executed on the other data set.

In one aspect, a search method executed by a computer is also provided. In one aspect, a search device having a storage unit and a processing unit is also provided.

In one aspect, the search for hyperparameter values can be made more efficient.

A diagram illustrating the search device of the first embodiment.
A block diagram showing a hardware example of the machine learning device.
A graph showing an example of the relationship between a hyperparameter and prediction performance.
A graph showing an example of the expected value and standard deviation of the prediction performance difference.
A diagram showing an example of the progress of a hyperparameter search.
A diagram (continued) showing an example of the progress of a hyperparameter search.
A diagram showing an example of the relationship between the degree of sharing and estimated prediction performance.
A block diagram showing a functional example of the machine learning device.
A diagram showing an example of a sample history table.
A diagram showing an example of a statistics table.
A flowchart showing an example procedure for statistical information generation.
A flowchart (continued) showing an example procedure for statistical information generation.
A flowchart showing an example procedure for machine learning of the second embodiment.
A flowchart showing an example procedure for performance improvement amount estimation.
A flowchart (continued) showing an example procedure for performance improvement amount estimation.
A flowchart showing an example procedure for machine learning of the third embodiment.
A flowchart showing an example procedure for performance improvement speed estimation.
A flowchart (continued) showing an example procedure for performance improvement speed estimation.

Hereinafter, the present embodiment will be described with reference to the drawings.
[First Embodiment]
The first embodiment will be described.

FIG. 1 is a diagram illustrating a search device according to the first embodiment.
The search device 10 of the first embodiment manages the progress of machine learning. In machine learning, a model (sometimes called a learning model) that predicts the outcomes of unknown cases is generated by analyzing training data representing known cases. The search device 10 may perform the machine learning itself, or may cause another device to perform it. The search device 10 may be a client computer operated by a user, or a server computer accessed from a client computer via a network.

The search device 10 has a storage unit 11 and a processing unit 12. The storage unit 11 may be a volatile semiconductor memory such as a RAM (Random Access Memory), or non-volatile storage such as an HDD (Hard Disk Drive) or a flash memory. The processing unit 12 is, for example, a processor such as a CPU (Central Processing Unit) or a DSP (Digital Signal Processor). However, the processing unit 12 may include an application-specific electronic circuit such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The processor executes programs stored in a memory such as a RAM (which may be the storage unit 11). The programs include a search program. A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor".

For a model generated by machine learning, the prediction performance, which indicates how accurately the model predicts the outcomes of unknown cases, can be calculated using test data representing known cases. Indices of prediction performance include, for example, accuracy, precision, F-measure, mean squared error (MSE), and root mean squared error (RMSE). In machine learning, it is preferable to generate a model with high prediction performance. The prediction performance of the generated model depends on the data set (data population) used as training data and on the machine learning method.
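The indices named above can be computed directly from the true and predicted outcomes on test data. The following is a minimal sketch using only the Python standard library; it assumes that the F-measure means the harmonic mean of precision and recall (the F1 score), and that labels are encoded with a `positive` value of 1 by default. Both are assumptions, since the source does not define them.

```python
import math

def accuracy(y_true, y_pred):
    """Fraction of cases whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    """Among cases predicted positive, the fraction that are truly positive."""
    predicted = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in predicted) / len(predicted) if predicted else 0.0

def recall(y_true, y_pred, positive=1):
    """Among truly positive cases, the fraction predicted positive."""
    actual = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual) / len(actual) if actual else 0.0

def f_value(y_true, y_pred, positive=1):
    """Harmonic mean of precision and recall (assumed meaning of the F-measure)."""
    p = precision(y_true, y_pred, positive)
    r = recall(y_true, y_pred, positive)
    return 2 * p * r / (p + r) if p + r else 0.0

def mse(y_true, y_pred):
    """Mean squared error between true and predicted numeric outcomes."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    """Square root of the mean squared error."""
    return math.sqrt(mse(y_true, y_pred))
```

Note that accuracy, precision, and the F-measure are higher for better models, whereas MSE and RMSE are lower for better models, so a search that maximizes "prediction performance" would need to treat the error-based indices accordingly.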

The behavior of the machine learning of the first embodiment is controlled based on hyperparameter 14 (first hyperparameter) and hyperparameter 15 (second hyperparameter). The values of hyperparameters 14 and 15 are not determined through machine learning like the coefficients included in a model; they are set before model generation starts. Changing the values of hyperparameters 14 and 15 changes the generated model and therefore its prediction performance. The search device 10 thus repeats machine learning to search for a combination of a value of hyperparameter 14 and a value of hyperparameter 15 that yields high prediction performance.

Hyperparameter 14 is a hyperparameter for which no magnitude relationship is defined between different values. Hyperparameter 14 can be called a categorical hyperparameter and can be said to correspond to a nominal scale. When different values that can be set for hyperparameter 14 are compared, their magnitude relationship is meaningless, as are their difference and ratio. Therefore, for hyperparameter 14, the assumption that models generated with nearby values have similar prediction performance does not hold. Examples of hyperparameter 14 include the machine learning algorithm and the classification criterion of a random forest (a choice between the Gini coefficient and entropy). Examples of machine learning algorithms include logistic regression analysis, RBF kernel SVM, and random forest.

Hyperparameter 15 is a hyperparameter for which a magnitude relationship is defined between different values. Hyperparameter 15 can be called a continuous hyperparameter and can be said to correspond to an ordinal scale, an interval scale, or a ratio scale. When different values that can be set for hyperparameter 15 are compared, their magnitude relationship can be determined, but their difference and ratio may be meaningless (ordinal scale). Alternatively, their difference (distance) may be defined while their ratio is meaningless (interval scale). Alternatively, both their difference and their ratio may be defined (ratio scale). It is preferable that the distance between different values of hyperparameter 15 can be calculated. The values that can be set for hyperparameter 15 may be continuous values such as real numbers or discrete values such as integers. Examples of hyperparameter 15 include the variables "C" and "γ" of the RBF kernel SVM and an integer indicating the tree depth of a random forest.

Hyperparameter 14 may be a single hyperparameter element or a set of two or more hyperparameter elements that satisfy the above property. The value of hyperparameter 14 may therefore be a scalar value, which is the value of a single hyperparameter element, or a vector value containing the values of two or more hyperparameter elements. Similarly, hyperparameter 15 may be a single hyperparameter element or a set of two or more hyperparameter elements, and its value may be a scalar value or a vector value.
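One way to picture the two kinds of hyperparameter is a small data structure in which the categorical part supports only equality comparison while the continuous part supports a distance. The sketch below is an illustration under assumptions: the class name `HyperparameterSetting`, the helper names, the string labels, and the choice of Euclidean distance are all hypothetical and do not come from the source.

```python
import math
from dataclasses import dataclass
from typing import Tuple

@dataclass(frozen=True)
class HyperparameterSetting:
    # Hyperparameter 14: one or more categorical elements (scalar = 1-tuple).
    categorical: Tuple[str, ...]
    # Hyperparameter 15: one or more continuous elements (scalar = 1-tuple).
    continuous: Tuple[float, ...]

def same_category(a: HyperparameterSetting, b: HyperparameterSetting) -> bool:
    """Categorical values can only be tested for equality, not ordered."""
    return a.categorical == b.categorical

def continuous_distance(a: HyperparameterSetting, b: HyperparameterSetting) -> float:
    """Euclidean distance between continuous values (an assumed distance)."""
    if len(a.continuous) != len(b.continuous):
        raise ValueError("continuous parts must have the same number of elements")
    return math.dist(a.continuous, b.continuous)
```

For example, two settings that both use a hypothetical label `"rbf_svm"` with continuous values `(1.0, 0.5)` and `(4.0, 4.5)` for "C" and "γ" belong to the same category and lie a measurable distance apart, whereas no distance is defined between `"rbf_svm"` and `"random_forest"`.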

The storage unit 11 stores prediction performance difference information 13. The prediction performance difference information 13 is generated from the results of machine learning executed in the past on a plurality of data sets. It may be generated by the search device 10 or by another device.

The prediction performance difference information 13 indicates the difference between the prediction performance when value 14a (first value) is set for hyperparameter 14 and the prediction performance when value 14b (second value) is set for hyperparameter 14. The prediction performance difference information 13 may indicate the tendency of the difference between the prediction performance with value 14a and the prediction performance with value 14b regardless of the value of hyperparameter 15. Alternatively, it may indicate that difference for each of the values that can be set for hyperparameter 15.

That is, the prediction performance difference information 13 may indicate the difference between the prediction performance when value 14a is set for hyperparameter 14 and a specific value is set for hyperparameter 15, and the prediction performance when value 14b is set for hyperparameter 14 and the same specific value is set for hyperparameter 15. The prediction performance difference information 13 may include an expected value obtained by averaging the per-data-set differences over the plurality of data sets. It may also include a measure of dispersion (for example, variance or standard deviation) indicating how much the differences vary.
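The expected value and dispersion described above can be sketched as a mean and standard deviation of per-data-set differences. In the sketch below, the function name `performance_difference_stats`, the sign convention (performance with value 14b minus performance with value 14a), and the use of the population standard deviation are assumptions made for illustration; the source does not fix any of them here.

```python
import statistics

def performance_difference_stats(perf_14a, perf_14b):
    """Summarize how performance differs between two categorical values.

    perf_14a[i], perf_14b[i]: prediction performance measured on past data
    set i with hyperparameter 14 set to value 14a or 14b, respectively,
    and hyperparameter 15 set to the same specific value in both runs.
    Returns (expected difference, standard deviation of the difference).
    """
    if len(perf_14a) != len(perf_14b) or not perf_14a:
        raise ValueError("need one performance pair per past data set")
    # Assumed sign convention: 14b minus 14a.
    diffs = [b - a for a, b in zip(perf_14a, perf_14b)]
    return statistics.mean(diffs), statistics.pstdev(diffs)
```

Calling this once per selectable value of hyperparameter 15 would give the per-value form of the information; pooling all values would give the form that ignores hyperparameter 15.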

The processing unit 12 uses the prediction performance difference information 13 stored in the storage unit 11 to control machine learning on another data set that is different from the plurality of data sets used to generate the prediction performance difference information 13. In doing so, the processing unit 12 searches for a combination of a value of hyperparameter 14 and a value of hyperparameter 15 that yields high prediction performance of the model.

The processing unit 12 sets value 14a for hyperparameter 14 and value 15a (third value) for hyperparameter 15, and calculates prediction performance 16 (first prediction performance) by having machine learning executed on the other data set. For example, a model is generated using value 14a and value 15a, and prediction performance 16 is calculated for the generated model. The processing unit 12 then selects, based on the calculated prediction performance 16 and the prediction performance difference information 13, the combination of a value of hyperparameter 14 and a value of hyperparameter 15 to be used the next time machine learning is executed on the other data set. This combination differs from the combination of value 14a and value 15a.

For example, based on prediction performance 16 and the prediction performance difference information 13, the processing unit 12 estimates the prediction performance (second prediction performance) for the case where value 14b is set for hyperparameter 14 and value 15a is set for hyperparameter 15. The processing unit 12 uses prediction performance 16 to estimate how the prediction performance changes with hyperparameter 15 when value 14a is set for hyperparameter 14. Likewise, the processing unit 12 uses the second prediction performance to estimate how the prediction performance changes with hyperparameter 15 when value 14b is set for hyperparameter 14. This exploits the fact that, although nearby values of hyperparameter 14 do not imply similar prediction performance, the assumption that nearby values of hyperparameter 15 yield similar prediction performance does hold. The processing unit 12 selects, for example, the combination with the highest estimated prediction performance.
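The two steps in this paragraph, transferring a measured performance to another categorical value and then picking the most promising untried combination, can be sketched as follows. The function names, the additive use of an expected difference, and the dictionary of estimates are assumptions for illustration; how the estimates for other values of hyperparameter 15 are produced is left outside this sketch.

```python
def estimate_second_performance(first_performance, expected_difference):
    """Estimate performance for (14b, 15a) from the measured (14a, 15a).

    expected_difference: expected performance with 14b minus performance
    with 14a, taken from the prediction performance difference information
    (additive transfer and sign convention are assumptions).
    """
    return first_performance + expected_difference

def select_next_combination(estimates, tried):
    """Pick the untried combination with the highest estimated performance.

    estimates: dict mapping (value of hyperparameter 14, value of
               hyperparameter 15) to an estimated prediction performance.
    tried: set of combinations that have already been executed.
    """
    candidates = {k: v for k, v in estimates.items() if k not in tried}
    if not candidates:
        raise ValueError("no untried combination remains")
    return max(candidates, key=candidates.get)
```

As a usage example, if the combination `("14a", 1.0)` was measured at 0.80 and the expected difference for 14b is +0.05, the estimate for `("14b", 1.0)` becomes roughly 0.85, and that combination is selected over an untried `("14a", 2.0)` estimated at 0.82.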

According to the search device 10 of the first embodiment, the prediction performance difference information 13 is acquired, which indicates the difference between the prediction performance when value 14a is set for hyperparameter 14 and the prediction performance when value 14b is set for hyperparameter 14. Machine learning is tried on another data set with value 14a set for hyperparameter 14 and value 15a set for hyperparameter 15, and prediction performance 16 is calculated. Then, based on the calculated prediction performance 16 and the prediction performance difference information 13, the combination of a value of hyperparameter 14 and a value of hyperparameter 15 to be tried next on the other data set is selected. This makes the search for hyperparameter values that yield high prediction performance more efficient.

For example, one conceivable method is to search for value 14a and value 14b independently, without using prediction performance 16 when estimating the prediction performance for the case where value 14b is set for hyperparameter 14. With this method, however, combinations that differ in the value of hyperparameter 14 but have the same or similar values of hyperparameter 15 are tried separately. When the prediction performance as a function of hyperparameter 15 is similar across different values of hyperparameter 14, this leaves room for improvement in search efficiency. Another conceivable method is to apply prediction performance 16 as is, without using the prediction performance difference information 13, when estimating the prediction performance for the case where value 14b is set for hyperparameter 14. This method, however, ignores the shift in prediction performance caused by the different value of hyperparameter 14, which leaves room for improvement in estimation accuracy and, as a result, in search efficiency. In contrast, referring to the prediction performance difference information 13 obtained from past machine learning makes the search for hyperparameter values more efficient.

[Second Embodiment]
Next, a second embodiment will be described.
FIG. 2 is a block diagram showing a hardware example of a machine learning device.

The machine learning device 100 has a CPU 101, a RAM 102, an HDD 103, an image signal processing unit 104, an input signal processing unit 105, a medium reader 106, and a communication interface 107. These units are connected to a bus. The machine learning device 100 corresponds to the search device 10 of the first embodiment. The CPU 101 corresponds to the processing unit 12 of the first embodiment. The RAM 102 or the HDD 103 corresponds to the storage unit 11 of the first embodiment.

The CPU 101 is a processor that executes program instructions. The CPU 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The CPU 101 may include a plurality of processor cores, the machine learning device 100 may have a plurality of processors, and the processing described below may be executed in parallel using a plurality of processors or processor cores. A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor".

The RAM 102 is a volatile semiconductor memory that temporarily stores programs executed by the CPU 101 and data used by the CPU 101 for computation. The machine learning device 100 may include a type of memory other than RAM, and may include a plurality of memories.

The HDD 103 is a non-volatile storage device that stores data as well as software programs such as an operating system (OS), middleware, and application software. The programs include a search program. The machine learning device 100 may include other types of storage devices such as a flash memory or an SSD (Solid State Drive), and may include a plurality of non-volatile storage devices.

The image signal processing unit 104 outputs images to a display 111 connected to the machine learning device 100 in accordance with instructions from the CPU 101. Any type of display can be used as the display 111, such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), a plasma display, or an organic EL (OEL: Organic Electro-Luminescence) display.

The input signal processing unit 105 acquires an input signal from an input device 112 connected to the machine learning device 100 and outputs it to the CPU 101. As the input device 112, a pointing device (such as a mouse, touch panel, touch pad, or trackball), a keyboard, a remote controller, a button switch, or the like can be used. A plurality of types of input devices may be connected to the machine learning device 100.

The medium reader 106 is a reading device that reads programs and data recorded on a recording medium 113. As the recording medium 113, for example, a magnetic disk, an optical disc, a magneto-optical disk (MO), or a semiconductor memory can be used. Magnetic disks include flexible disks (FD) and HDDs. Optical discs include CDs (Compact Discs) and DVDs (Digital Versatile Discs).

媒体リーダ１０６は、例えば、記録媒体１１３から読み取ったプログラムやデータを、ＲＡＭ１０２やＨＤＤ１０３などの他の記録媒体にコピーする。読み取られたプログラムは、例えば、ＣＰＵ１０１によって実行される。なお、記録媒体１１３は可搬型記録媒体であってもよく、プログラムやデータの配布に用いられることがある。また、記録媒体１１３やＨＤＤ１０３を、コンピュータ読み取り可能な記録媒体と言うことがある。 The medium reader 106, for example, copies a program or data read from the recording medium 113 to another recording medium such as the RAM 102 or the HDD 103. The read program is executed by, for example, the CPU 101. The recording medium 113 may be a portable recording medium and may be used for distribution of programs and data. Further, the recording medium 113 and the HDD 103 may be referred to as a computer-readable recording medium.

通信インタフェース１０７は、ネットワーク１１４に接続され、ネットワーク１１４を介して他の装置と通信を行うインタフェースである。通信インタフェース１０７は、例えば、スイッチやルータなどの通信装置とケーブルで接続される。 The communication interface 107 is an interface that is connected to the network 114 and communicates with other devices via the network 114. The communication interface 107 is connected to a communication device such as a switch or a router by a cable.

次に、機械学習における予測性能およびハイパーパラメータについて説明する。 Next, prediction performance and hyperparameters in machine learning will be described.
第２の実施の形態の機械学習では、既知の事例を示す複数の単位データ（インスタンスやレコードと言うことがある）を含むデータセットを予め用意しておく。機械学習装置１００または他の装置が、センサデバイスなどの各種デバイスからネットワーク１１４経由でインスタンスを収集してもよい。データセットは「ビッグデータ」と言われるサイズの大きなデータセットであってもよい。各インスタンスは、１以上の説明変数の値と目的変数の値とを含む。説明変数および目的変数それぞれを属性と言うことがあり、説明変数の値および目的変数の値それぞれを属性値と言うことがある。 In the machine learning of the second embodiment, a data set containing a plurality of unit data items (sometimes called instances or records) representing known cases is prepared in advance. The machine learning device 100 or another device may collect instances from various devices such as sensor devices via the network 114. The data set may be a large data set of the kind called "big data". Each instance contains the values of one or more explanatory variables and the value of an objective variable. The explanatory variables and the objective variable may each be referred to as attributes, and their values as attribute values.

機械学習装置１００は、データセットの中から一部のインスタンスを訓練データとしてサンプリングし、訓練データを用いてモデルを生成する。モデルは、説明変数と目的変数との間の関係を示し、１以上の係数を含む。モデルは、線形式、二次以上の多項式、指数関数、対数関数などの各種数式によって表されることがある。数式の形は、機械学習の前にユーザによって指定されてもよい。係数は、機械学習を通じて訓練データに基づいて決定される。生成されたモデルを用いることで、未知の事例の説明変数の値（要因）から、未知の事例の目的変数の値（結果）を予測することができる。モデルによって予測される結果は、０以上１以下の確率値などの連続値であってもよいし、ＹＥＳ／ＮＯの２値などの離散値であってもよい。 The machine learning device 100 samples some instances from the data set as training data, and generates a model using the training data. The model shows the relationship between the explanatory variables and the objective variables and contains one or more coefficients. Models may be represented by various mathematical formulas such as linear form, polynomials of degree 2 or higher, exponential functions, and logarithmic functions. The form of the formula may be specified by the user prior to machine learning. Coefficients are determined based on training data through machine learning. By using the generated model, the value (result) of the objective variable of the unknown case can be predicted from the value (factor) of the explanatory variable of the unknown case. The result predicted by the model may be a continuous value such as a probability value of 0 or more and 1 or less, or a discrete value such as a two value of YES / NO.

生成されたモデルに対しては予測性能を算出することができる。予測性能は、未知の事例の結果を正確に予測する能力であり、精度と言うこともできる。機械学習装置１００は、データセットの中から訓練データ以外のインスタンスをテストデータとしてサンプリングし、テストデータを用いて予測性能を算出する。テストデータのサイズは、例えば、訓練データのサイズの１／２程度とする。機械学習装置１００は、テストデータに含まれる説明変数の値をモデルに入力し、モデルが出力する目的変数の値（予測値）とテストデータに含まれる目的変数の値（実績値）とを比較する。なお、生成したモデルの予測性能を検証することを「バリデーション」と言うことがある。 Prediction performance can be calculated for a generated model. Prediction performance is the ability to accurately predict the results of unknown cases, and can also be called accuracy. The machine learning device 100 samples instances other than the training data from the data set as test data, and calculates the prediction performance using the test data. The size of the test data is, for example, about 1/2 of the size of the training data. The machine learning device 100 inputs the values of the explanatory variables included in the test data into the model, and compares the value of the objective variable output by the model (predicted value) with the value of the objective variable included in the test data (actual value). Verifying the prediction performance of a generated model is sometimes called "validation".
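The sampling described above can be sketched as follows. This is a minimal illustration, not the patent's implementation: the function name `sample_train_test`, the fixed random seed, and the choice of exactly half the training size for the test data are assumptions (the text only says "about 1/2").

```python
import random


def sample_train_test(dataset, train_size, seed=0):
    """Sample disjoint training and test data from one data set.

    The test data is drawn only from instances not used for training,
    and its size is half the training size (an assumed reading of
    "about 1/2" in the text).
    """
    test_size = train_size // 2
    if train_size + test_size > len(dataset):
        raise ValueError("data set is too small for the requested sizes")
    rng = random.Random(seed)
    # Draw all needed indices at once, without replacement, so that the
    # training and test instances never overlap.
    indices = rng.sample(range(len(dataset)), train_size + test_size)
    train = [dataset[i] for i in indices[:train_size]]
    test = [dataset[i] for i in indices[train_size:]]
    return train, test
```

For a data set of 100 instances and `train_size=40`, this returns 40 training instances and 20 test instances with no instance in both.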

予測性能の指標として、正答率、適合率、Ｆ値、平均二乗誤差、平均二乗誤差平方根などが挙げられる。例えば、結果がＹＥＳ／ＮＯの２値で表されるとする。また、Ｎ件のテストデータの事例のうち、予測値＝ＹＥＳかつ実績値＝ＹＥＳの件数をＴｐ、予測値＝ＹＥＳかつ実績値＝ＮＯの件数をＦｐ、予測値＝ＮＯかつ実績値＝ＹＥＳの件数をＦｎ、予測値＝ＮＯかつ実績値＝ＮＯの件数をＴｎとする。この場合、正答率は予測が当たった割合であり、（Ｔｐ＋Ｔｎ）／Ｎと算出される。適合率は「ＹＥＳ」の予測を間違えない確率であり、Ｔｐ／（Ｔｐ＋Ｆｐ）と算出される。Ｆ値は、（２×再現率×適合率）／（再現率＋適合率）と算出される。再現率は、Ｔｐ／（Ｔｐ＋Ｆｎ）と算出される。各事例の実績値をｙと表し予測値をｙ＾と表すと、平均二乗誤差はｓｕｍ（ｙ－ｙ＾）^２／Ｎと算出され、平均二乗誤差平方根は（ｓｕｍ（ｙ－ｙ＾）^２／Ｎ）^１／２と算出される。 Examples of prediction performance indexes include accuracy, precision, the F value, the mean squared error, and the root mean squared error. For example, assume that the result is represented by the two values YES / NO. Of the N test data cases, let Tp be the number of cases where the predicted value = YES and the actual value = YES, Fp the number of cases where the predicted value = YES and the actual value = NO, Fn the number of cases where the predicted value = NO and the actual value = YES, and Tn the number of cases where the predicted value = NO and the actual value = NO. In this case, the accuracy is the proportion of correct predictions, and is calculated as (Tp + Tn) / N. The precision is the probability that a prediction of "YES" is not mistaken, and is calculated as Tp / (Tp + Fp). The F value is calculated as (2 × recall × precision) / (recall + precision). The recall is calculated as Tp / (Tp + Fn). When the actual value of each case is written y and the predicted value y^, the mean squared error is calculated as sum((y − y^)^2) / N, and the root mean squared error is calculated as (sum((y − y^)^2) / N)^(1/2).
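The indexes defined above can be computed directly from the counts Tp, Fp, Fn and Tn. The following is an illustrative sketch; the function names are hypothetical, YES / NO is encoded as `True` / `False`, and returning 0.0 when a denominator is zero is an assumption that the text does not address.

```python
import math


def binary_metrics(predicted, actual):
    """Return accuracy, precision, recall and F value for YES/NO labels."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)          # predicted YES, actual YES
    fp = sum(1 for p, a in pairs if p and not a)      # predicted YES, actual NO
    fn = sum(1 for p, a in pairs if not p and a)      # predicted NO, actual YES
    tn = sum(1 for p, a in pairs if not p and not a)  # predicted NO, actual NO
    n = tp + fp + fn + tn
    accuracy = (tp + tn) / n
    # Assumption: report 0.0 when a denominator would be zero.
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if recall + precision:
        f_value = 2 * recall * precision / (recall + precision)
    else:
        f_value = 0.0
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f_value": f_value}


def mean_squared_error(actual, predicted):
    """sum((y - y^)^2) / N over all cases."""
    return sum((y - y_hat) ** 2 for y, y_hat in zip(actual, predicted)) / len(actual)


def root_mean_squared_error(actual, predicted):
    """Square root of the mean squared error."""
    return math.sqrt(mean_squared_error(actual, predicted))
```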

機械学習装置１００は、ハイパーパラメータθの値を変更することで機械学習の挙動を調整する。ハイパーパラメータθは、モデルに含まれる係数のように機械学習を通じて値が決定されるものではなく、モデル生成の開始前に値が指定されるものである。ハイパーパラメータθの値を変えると、生成されるモデルが変わり予測性能が変わる。第２の実施の形態では、ハイパーパラメータθは、カテゴリカルハイパーパラメータｃと連続量ハイパーパラメータｘの組である。すなわちθ＝（ｃ，ｘ）である。 The machine learning device 100 adjusts the behavior of machine learning by changing the value of the hyperparameter θ. The hyperparameter θ is not determined through machine learning like the coefficients included in the model, but the value is specified before the start of model generation. When the value of hyperparameter θ is changed, the generated model changes and the prediction performance changes. In the second embodiment, the hyperparameter θ is a set of the categorical hyperparameter c and the continuous quantity hyperparameter x. That is, θ = (c, x).

カテゴリカルハイパーパラメータは、大小関係を定義できない値をとるハイパーパラメータである。カテゴリカルハイパーパラメータについては、近い値を用いて生成されたモデルは近い予測性能をもつという仮定が成立しない。カテゴリカルハイパーパラメータの例として、機械学習アルゴリズムやランダムフォレストのクラス分類方法が挙げられる。カテゴリカルハイパーパラメータは、２以上のハイパーパラメータ要素を含むベクトルでもよい。ただし、以下では説明を簡単にするため、カテゴリカルハイパーパラメータは単一のハイパーパラメータ要素であるスカラであると想定する場合がある。 Categorical hyperparameters are hyperparameters that take values whose magnitude relationship cannot be defined. For categorical hyperparameters, the assumption that models generated with close values have close predictive performance does not hold. Examples of categorical hyperparameters include machine learning algorithms and random forest classification methods. The categorical hyperparameter may be a vector containing two or more hyperparameter elements. However, for the sake of simplicity, we may assume that the categorical hyperparameters are scalars, which are single hyperparameter elements.

連続量ハイパーパラメータは、カテゴリカルハイパーパラメータ以外のハイパーパラメータであり、大小関係を定義できる値をとるハイパーパラメータである。連続量ハイパーパラメータは、好ましくは、異なる値の間の距離を定義できるハイパーパラメータである。連続量ハイパーパラメータについては、近い値を用いて生成されたモデルは近い予測性能をもつという仮定が成立する。連続量ハイパーパラメータの値は、実数のような連続値でもよいし整数のような離散値でもよい。連続量ハイパーパラメータの例として、ＲＢＦカーネルＳＶＭの変数「Ｃ」と「γ」やランダムフォレストの木の深さが挙げられる。連続量ハイパーパラメータは、２以上のハイパーパラメータ要素を含むベクトルでもよい。ただし、以下では説明を簡単にするため、連続量ハイパーパラメータは単一のハイパーパラメータ要素であるスカラであると想定する場合がある。 The continuous quantity hyperparameter is a hyperparameter other than the categorical hyperparameter, and is a hyperparameter that takes a value that can define a magnitude relationship. A continuous quantity hyperparameter is preferably a hyperparameter that can define the distance between different values. For continuous quantity hyperparameters, the assumption holds that models generated with close values have close predictive performance. The value of the continuous quantity hyperparameter may be a continuous value such as a real number or a discrete value such as an integer. Examples of continuous quantity hyperparameters include the variables "C" and "γ" of the RBF kernel SVM and the depth of the tree in the random forest. The continuous quantity hyperparameter may be a vector containing two or more hyperparameter elements. However, in the following, for the sake of simplicity, it may be assumed that the continuous quantity hyperparameter is a scalar which is a single hyperparameter element.

上記のように、カテゴリカルハイパーパラメータでは機械学習アルゴリズムを指定することができる。機械学習装置１００が実行可能な機械学習アルゴリズムの例として、ロジスティック回帰分析、ＳＶＭ、ランダムフォレストなどが挙げられる。 As mentioned above, machine learning algorithms can be specified by categorical hyperparameters. Examples of machine learning algorithms that can be executed by the machine learning device 100 include logistic regression analysis, SVM, and random forest.

ロジスティック回帰分析は、目的変数ｙの値と説明変数ｘ_１，ｘ_２，…，ｘ_ｋの値をＳ字曲線にフィッティングする回帰分析である。目的変数ｙおよび説明変数ｘ_１，ｘ_２，…，ｘ_ｋは、ｌｏｇ（ｙ／（１－ｙ））＝ａ_１ｘ_１＋ａ_２ｘ_２＋…＋ａ_ｋｘ_ｋ＋ｂの関係を満たすと仮定される。ａ_１，ａ_２，…，ａ_ｋ，ｂは係数であり、回帰分析によって決定される。 Logistic regression analysis is a regression analysis that fits the value of the objective variable y and the values of the explanatory variables x_1, x_2, ..., x_k to an S-shaped curve. The objective variable y and the explanatory variables x_1, x_2, ..., x_k are assumed to satisfy the relationship log(y / (1 − y)) = a_1 x_1 + a_2 x_2 + ... + a_k x_k + b. Here a_1, a_2, ..., a_k and b are coefficients, which are determined by the regression analysis.
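Solving the assumed relationship for y gives the S-shaped (logistic) curve referred to in the text. The rearrangement below is standard algebra rather than an additional statement from the source:

```latex
\log\frac{y}{1-y} = \sum_{i=1}^{k} a_i x_i + b
\quad\Longleftrightarrow\quad
y = \frac{1}{1 + \exp\!\left(-\left(\sum_{i=1}^{k} a_i x_i + b\right)\right)}
```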

サポートベクタマシンは、Ｎ次元空間に配置されたインスタンスの集合を、２つのクラスに最も明確に分割するような境界面を算出する機械学習アルゴリズムである。境界面は、各クラスとの距離（マージン）が最大になるように算出される。 A support vector machine is a machine learning algorithm that calculates a boundary surface that most clearly divides a set of instances arranged in N-dimensional space into two classes. The boundary surface is calculated so that the distance (margin) from each class is maximized.

ランダムフォレストは、複数のインスタンスを適切に分類するためのモデルを生成する機械学習アルゴリズムである。ランダムフォレストでは、データセットからインスタンスをランダムにサンプリングする。説明変数の一部をランダムに選択し、選択した説明変数の値に応じてサンプリングしたインスタンスを分類する。説明変数の選択とインスタンスの分類を繰り返すことで、複数の説明変数の値に基づく階層的な決定木を生成する。インスタンスのサンプリングと決定木の生成を繰り返すことで複数の決定木を取得し、複数の決定木を合成することでインスタンスを分類するための最終的なモデルを生成する。 Random forest is a machine learning algorithm that creates a model for properly classifying multiple instances. Random forest randomly samples instances from a dataset. Randomly select some of the explanatory variables and classify the sampled instances according to the value of the selected explanatory variable. By repeating the selection of explanatory variables and the classification of instances, a hierarchical decision tree based on the values of multiple explanatory variables is generated. Multiple decision trees are obtained by repeating instance sampling and decision tree generation, and the final model for classifying instances is generated by synthesizing multiple decision trees.

次に、ハイパーパラメータの調整による予測性能の変化について説明する。 Next, changes in prediction performance due to hyperparameter adjustment will be described.
図３は、ハイパーパラメータと予測性能との関係例を示すグラフである。 FIG. 3 is a graph showing an example of the relationship between hyperparameters and prediction performance.
曲線２１ａは、カテゴリカルハイパーパラメータ値をｃ_１に固定してデータセットＤ_１に対して機械学習を行ったときにおける、連続量ハイパーパラメータ値と予測性能との間の関係を示している。カテゴリカルハイパーパラメータ値とデータセットが同一でも、連続量ハイパーパラメータ値を変えると予測性能が変わる。ただし、曲線２１ａは連続的曲線であり、近い連続量ハイパーパラメータ値からは近い予測性能が得られる。 The curve 21a shows the relationship between the continuous quantity hyperparameter value and the prediction performance when the categorical hyperparameter value is fixed to c₁ and machine learning is performed on the data set D₁. Even if the categorical hyperparameter value and the data set are the same, the prediction performance changes when the continuous quantity hyperparameter value is changed. However, the curve 21a is a continuous curve, and close continuous quantity hyperparameter values yield close prediction performance.

曲線２１ｂは、カテゴリカルハイパーパラメータ値をｃ_２に固定してデータセットＤ_１に対して機械学習を行ったときにおける、連続量ハイパーパラメータ値と予測性能との間の関係を示している。曲線２１ａと曲線２１ｂとは、データセットが同一であるもののカテゴリカルハイパーパラメータ値が異なる。データセットと連続量ハイパーパラメータ値が同一でも、カテゴリカルハイパーパラメータ値を変えると予測性能が変わる。ただし、曲線２１ａと曲線２１ｂとは類似した予測性能の分布を示している。 The curve 21b shows the relationship between the continuous quantity hyperparameter value and the prediction performance when the categorical hyperparameter value is fixed to c ₂ and machine learning is performed on the data set D ₁ . Curve 21a and curve 21b have the same data set but different categorical hyperparameter values. Even if the data set and the continuous quantity hyperparameter value are the same, the prediction performance changes when the categorical hyperparameter value is changed. However, the curve 21a and the curve 21b show similar distributions of prediction performance.

曲線２１ｃは、カテゴリカルハイパーパラメータ値をｃ_１に固定してデータセットＤ_２に対して機械学習を行ったときにおける、連続量ハイパーパラメータ値と予測性能との間の関係を示している。曲線２１ａと曲線２１ｃとは、カテゴリカルハイパーパラメータ値が同一であるもののデータセットが異なる。カテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値が同一でも、データセットを変えると予測性能が変わる。曲線２１ａと曲線２１ｃとは類似しない予測性能の分布を示している。 The curve 21c shows the relationship between the continuous quantity hyperparameter value and the prediction performance when the categorical hyperparameter value is fixed to c ₁ and machine learning is performed on the data set D ₂ . Curves 21a and curves 21c have the same categorical hyperparameter values, but different data sets. Even if the categorical hyperparameter value and the continuous quantity hyperparameter value are the same, the prediction performance changes when the data set is changed. Curves 21a and 21c show distributions of predictive performance that are not similar.

曲線２１ｄは、カテゴリカルハイパーパラメータ値をｃ_２に固定してデータセットＤ_２に対して機械学習を行ったときにおける、連続量ハイパーパラメータ値と予測性能との間の関係を示している。曲線２１ｂと曲線２１ｄとは、カテゴリカルハイパーパラメータ値が同一であるもののデータセットが異なる。曲線２１ｂと曲線２１ｄとは類似しない予測性能の分布を示している。一方で、曲線２１ｃと曲線２１ｄとは、データセットが同一であるもののカテゴリカルハイパーパラメータ値が異なる。曲線２１ｃと曲線２１ｄとは類似した予測性能の分布を示している。 The curve 21d shows the relationship between the continuous quantity hyperparameter value and the prediction performance when the categorical hyperparameter value is fixed to c ₂ and machine learning is performed on the data set D ₂ . Curve 21b and curve 21d have the same categorical hyperparameter values, but different data sets. Curves 21b and 21d show distributions of predictive performance that are not similar. On the other hand, the curve 21c and the curve 21d have the same data set but different categorical hyperparameter values. Curves 21c and 21d show similar distributions of predictive performance.

このように、生成されるモデルの予測性能は、カテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値とデータセットの影響を受ける。あるデータセットからモデルを生成するとき、予測性能が最大になるカテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値の組を予め特定することは難しい。そこで、機械学習装置１００は、同一のデータセットに対して機械学習を繰り返して、最適なカテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値の組を探索する。探索では、可能な組み合わせを網羅的に試行するのではなく、予測性能が高くなる可能性がある組み合わせに絞って機械学習を試行することが好ましい。よって、最適なカテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値の組を探索する方法が問題となる。 Thus, the predictive performance of the generated model is affected by the categorical hyperparameter values, the continuous quantity hyperparameter values and the dataset. When generating a model from a certain data set, it is difficult to specify in advance the pair of categorical hyperparameter values and continuous quantity hyperparameter values that maximize the prediction performance. Therefore, the machine learning device 100 repeats machine learning for the same data set to search for the optimum set of categorical hyperparameter values and continuous quantity hyperparameter values. In the search, it is preferable to try machine learning by focusing on the combinations that may improve the prediction performance, instead of trying all possible combinations. Therefore, the problem is how to search for the optimal set of categorical hyperparameter values and continuous quantity hyperparameter values.

あるカテゴリカルハイパーパラメータ値について幾つかの連続量ハイパーパラメータ値を試行した場合、算出された予測性能を用いて、回帰分析などにより他の連続量ハイパーパラメータ値の予測性能を推定することは可能である。これは、近い値からは近い予測性能が得られるという連続量ハイパーパラメータの性質を利用したものである。例えば、カテゴリカルハイパーパラメータ値ｃ_１について幾つかの異なる連続量ハイパーパラメータ値を試行した場合、その試行結果から曲線２１ａを推定することができる。 When several continuous-quantity hyperparameter values are tried for a certain categorical hyperparameter value, it is possible to estimate the prediction performance of other continuous-quantity hyperparameter values by regression analysis etc. using the calculated prediction performance. be. This utilizes the property of continuous quantity hyperparameters that close prediction performance can be obtained from close values. For example, when several different continuous quantity hyperparameter values are tried for the categorical hyperparameter value c ₁ , the curve 21a can be estimated from the trial result.

一方で、あるカテゴリカルハイパーパラメータ値について予測性能の分布を推定するときに、他のカテゴリカルハイパーパラメータ値の試行結果をそのまま利用することは難しい。カテゴリカルハイパーパラメータ値が異なると予測性能が変わるためである。例えば、カテゴリカルハイパーパラメータ値ｃ_１の曲線２１ａを推定するときに、カテゴリカルハイパーパラメータ値ｃ_２で算出された予測性能をそのまま利用することは難しい。 On the other hand, when estimating the distribution of prediction performance for a certain categorical hyperparameter value, it is difficult to use the trial result of another categorical hyperparameter value as it is. This is because the prediction performance changes when the categorical hyperparameter values are different. For example, when estimating the curve 21a of the categorical hyperparameter value c ₁ , it is difficult to use the prediction performance calculated by the categorical hyperparameter value c ₂ as it is.

ただし、特定の２つのカテゴリカルハイパーパラメータ値の間では、データセットが変わっても予測性能差が概ね一定範囲に収まることがある。図３の例では、カテゴリカルパラメータ値ｃ_１，ｃ_２それぞれの予測性能の分布自体は、データセットが変わると大きく変化する。その一方で、カテゴリカルパラメータ値ｃ_１の予測性能とカテゴリカルハイパーパラメータ値ｃ_２の予測性能の差は、データセットに関係なく十分小さい。予測性能の分布が近似するカテゴリカルパラメータ値の例としては、ランダムフォレストで使用する「ジニ係数」と「エントロピー」というクラス分類方法が挙げられる。 However, the prediction performance difference between two specific categorical hyperparameter values may be within a generally constant range even if the data set changes. In the example of FIG. 3, the distribution itself of the prediction performance of each of the categorical parameter values c ₁ and c ₂ changes greatly when the data set changes. On the other hand, the difference between the prediction performance of the categorical parameter value c ₁ and the prediction performance of the categorical hyperparameter value c ₂ is sufficiently small regardless of the data set. Examples of categorical parameter values that approximate the distribution of predictive performance include the "Gini coefficient" and "entropy" classification methods used in random forests.

そこで、機械学習装置１００は、過去の機械学習の履歴から、２つのカテゴリカルハイパーパラメータ値の間の予測性能差の傾向を分析しておく。そして、機械学習装置１００は、この分析結果を利用して、あるカテゴリカルハイパーパラメータ値の予測性能から別のカテゴリカルハイパーパラメータ値の予測性能を推定する。 Therefore, the machine learning device 100 analyzes the tendency of the prediction performance difference between the two categorical hyperparameter values from the history of past machine learning. Then, the machine learning device 100 uses this analysis result to estimate the prediction performance of another categorical hyperparameter value from the prediction performance of one categorical hyperparameter value.

図４は、予測性能差の期待値および標準偏差の例を示すグラフである。 FIG. 4 is a graph showing an example of the expected value and standard deviation of the prediction performance difference.
曲線２２ａは、カテゴリカルハイパーパラメータ値ｃ_１の予測性能とカテゴリカルハイパーパラメータ値ｃ_２の予測性能の差の期待値（平均値）を示している。曲線２２ｂは、カテゴリカルハイパーパラメータ値ｃ_１の予測性能とカテゴリカルハイパーパラメータ値ｃ_２の予測性能の差の標準偏差（分散度）を示している。予測性能差の期待値および標準偏差は、連続量ハイパーパラメータ値それぞれに対して算出される。ただし、連続量ハイパーパラメータ値に対する期待値および標準偏差の変化が小さい場合、連続量ハイパーパラメータ値に関係なく代表して１つの期待値および標準偏差を算出してもよい。 The curve 22a shows the expected value (mean value) of the difference between the prediction performance of the categorical hyperparameter value c₁ and the prediction performance of the categorical hyperparameter value c₂. The curve 22b shows the standard deviation (degree of dispersion) of the difference between the prediction performance of the categorical hyperparameter value c₁ and the prediction performance of the categorical hyperparameter value c₂. The expected value and standard deviation of the prediction performance difference are calculated for each continuous quantity hyperparameter value. However, when the expected value and standard deviation change little with the continuous quantity hyperparameter value, a single representative expected value and standard deviation may be calculated regardless of the continuous quantity hyperparameter value.

機械学習装置１００は、複数のデータセットそれぞれについて、カテゴリカルハイパーパラメータ値ｃ_１の予測性能とカテゴリカルハイパーパラメータ値ｃ_２の予測性能との差を算出する。機械学習装置１００は、複数のデータセットの間で予測性能差の平均値を算出することで曲線２２ａを求める。また、機械学習装置１００は、複数のデータセットの間で予測性能差の標準偏差を算出することで曲線２２ｂを求める。なお、使用するデータセットによってモデルの予測性能の変動範囲は異なる。このため、予測性能差の期待値と標準偏差を算出するにあたっては、機械学習装置１００は、データセット毎の予測性能の変動範囲（例えば、最大値と最小値）に基づいて予測性能を正規化しておく。 The machine learning device 100 calculates, for each of a plurality of data sets, the difference between the prediction performance of the categorical hyperparameter value c₁ and the prediction performance of the categorical hyperparameter value c₂. The machine learning device 100 obtains the curve 22a by calculating the average of the prediction performance difference across the plurality of data sets. Likewise, the machine learning device 100 obtains the curve 22b by calculating the standard deviation of the prediction performance difference across the plurality of data sets. The range over which model prediction performance varies differs depending on the data set used. Therefore, before calculating the expected value and standard deviation of the prediction performance difference, the machine learning device 100 normalizes the prediction performance based on the range of variation of the prediction performance for each data set (for example, its maximum and minimum values).
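The calculation of the curves 22a and 22b can be sketched as below. Several details are assumptions made for illustration and are not stated in the source: the history is held as a dictionary `{data set name: {(c, x): prediction performance}}`, normalization is min-max scaling per data set, the population standard deviation is used, and the difference is taken as performance(c_a) minus performance(c_b).

```python
import statistics


def normalize(perf):
    """Min-max normalize one data set's performances to the range [0, 1]."""
    lo, hi = min(perf.values()), max(perf.values())
    if hi == lo:
        # Assumption: a data set with no variation maps to 0.0 everywhere.
        return {key: 0.0 for key in perf}
    return {key: (value - lo) / (hi - lo) for key, value in perf.items()}


def difference_statistics(history, c_a, c_b, x_values):
    """For each x, return (mean, std) of perf(c_a, x) - perf(c_b, x) across data sets."""
    normalized = [normalize(perf) for perf in history.values()]
    stats = {}
    for x in x_values:
        diffs = [norm[(c_a, x)] - norm[(c_b, x)] for norm in normalized]
        stats[x] = (statistics.mean(diffs), statistics.pstdev(diffs))
    return stats
```

The mean per x corresponds to a point on the curve 22a and the standard deviation per x to a point on the curve 22b.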

予測性能差の期待値および標準偏差を複数の連続量ハイパーパラメータ値それぞれに対して算出しておく場合、機械学習装置１００は、例えば、グリッド法またはランダム法によって複数の連続量ハイパーパラメータ値を選択する。グリッド法は、連続量ハイパーパラメータの空間の中から一定間隔毎に値を選択する方法である。ランダム法は、連続量ハイパーパラメータの空間の中からランダムに値を選択する方法である。ただし、機械学習装置１００は、グリッド法を採用することが好ましい。 When the expected value and standard deviation of the predicted performance difference are calculated for each of a plurality of continuous quantity hyperparameter values, the machine learning device 100 selects a plurality of continuous quantity hyperparameter values by, for example, a grid method or a random method. do. The grid method is a method of selecting values at regular intervals from the space of continuous quantity hyperparameters. The random method is a method of randomly selecting a value from the space of continuous quantity hyperparameters. However, it is preferable that the machine learning device 100 adopts the grid method.
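The two selection methods can be contrasted with a short sketch for a scalar continuous quantity hyperparameter. The function names, the inclusion of both end points in the grid, and the uniform distribution for the random method are assumptions for illustration.

```python
import random


def grid_values(low, high, count):
    """Grid method: pick `count` values at regular intervals in [low, high]."""
    if count < 2:
        raise ValueError("count must be at least 2")
    step = (high - low) / (count - 1)
    return [low + i * step for i in range(count)]


def random_values(low, high, count, seed=0):
    """Random method: pick `count` values uniformly at random in [low, high]."""
    rng = random.Random(seed)
    return [rng.uniform(low, high) for _ in range(count)]
```

With the grid method, every data set in the history is evaluated at the same hyperparameter values, so per-value differences can be compared directly across data sets; this may be one reason the text prefers it, although the source does not state the reason.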

次に、予測性能差の情報を利用したハイパーパラメータ探索について説明する。 Next, the hyperparameter search using the prediction performance difference information will be described.
図５は、ハイパーパラメータ探索の進行例を示す図である。 FIG. 5 is a diagram showing an example of the progress of a hyperparameter search.
一例として、機械学習装置１００は、カテゴリカルハイパーパラメータ値ｃ_１と連続量ハイパーパラメータ値ｘ_１の組を用いて機械学習を行い、予測性能２３ａを算出する。予測性能２３ａは測定値であるため、不確かさを示す分散（エラーバー）をもたない。また、機械学習装置１００は、カテゴリカルハイパーパラメータ値ｃ_２と連続量ハイパーパラメータ値ｘ_２の組を用いて機械学習を行い、予測性能２３ｂを算出する。予測性能２３ｂは測定値であるため、不確かさを示す分散をもたない。 As an example, the machine learning device 100 performs machine learning using the pair of the categorical hyperparameter value c₁ and the continuous quantity hyperparameter value x₁, and calculates the prediction performance 23a. Since the prediction performance 23a is a measured value, it has no variance (error bar) indicating uncertainty. The machine learning device 100 also performs machine learning using the pair of the categorical hyperparameter value c₂ and the continuous quantity hyperparameter value x₂, and calculates the prediction performance 23b. Since the prediction performance 23b is a measured value, it has no variance indicating uncertainty.

次に、機械学習装置１００は、予測性能２３ｂと予測性能差情報を用いて、カテゴリカルハイパーパラメータ値ｃ_１と連続量ハイパーパラメータ値ｘ_２の組の予測性能２３ｃを推定する。予測性能２３ｃは推定値であるため、不確かさを示す分散をもつ。予測性能２３ｃの期待値は、連続量ハイパーパラメータｘ_２におけるカテゴリカルハイパーパラメータ値ｃ_１から見た予測性能差を、予測性能２３ｂに加えたものである。予測性能２３ｃの標準偏差は、連続量ハイパーパラメータｘ_２における予測性能差の標準偏差である。 Next, the machine learning device 100 uses the prediction performance 23b and the prediction performance difference information to estimate the prediction performance 23c for the pair of the categorical hyperparameter value c₁ and the continuous quantity hyperparameter value x₂. Since the prediction performance 23c is an estimated value, it has a variance indicating uncertainty. The expected value of the prediction performance 23c is the prediction performance 23b plus the prediction performance difference, as seen from the categorical hyperparameter value c₁, at the continuous quantity hyperparameter value x₂. The standard deviation of the prediction performance 23c is the standard deviation of the prediction performance difference at the continuous quantity hyperparameter value x₂.
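The estimation step can be written in a few lines. In this sketch the class `Performance` and the function `estimate_other_category` are hypothetical names; a measured value is represented with a standard deviation of zero, and `diff_mean` is assumed to be the expected difference as seen from the target categorical value (target minus source).

```python
from dataclasses import dataclass


@dataclass
class Performance:
    mean: float
    std: float = 0.0  # a measured value carries no uncertainty


def estimate_other_category(measured, diff_mean, diff_std):
    """Estimate the performance of another categorical value at the same x.

    The expected value is the measured performance plus the expected
    performance difference; the standard deviation is that of the difference.
    """
    return Performance(mean=measured.mean + diff_mean, std=diff_std)
```

For example, if the prediction performance 23b were measured as 0.80 and the difference seen from c₁ at x₂ had an expected value of -0.05 and a standard deviation of 0.02 (illustrative numbers only), the prediction performance 23c would be estimated as 0.75 with a standard deviation of 0.02.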

すると、機械学習装置１００は、予測性能２３ａ，２３ｃを用いて、回帰分析手法などによりカテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線を推定する。このとき、機械学習装置１００は、推定の不確かさに起因して予測性能が高くなる連続ハイパーパラメータ値を見逃してしまうリスクを下げるため、予測性能曲線の示す推定値がばらつく確率も算出しておく。例えば、機械学習装置１００は、回帰分析手法などにより予測性能曲線に対して９５％信頼区間を算出しておく。９５％信頼区間を算出するにあたっては、推定された予測性能である予測性能２３ｃの不確かさも考慮される。 Then, using the prediction performances 23a and 23c, the machine learning device 100 estimates the prediction performance curve for the categorical hyperparameter value c₁ by a regression analysis method or the like. At this time, to reduce the risk that estimation uncertainty causes it to overlook a continuous quantity hyperparameter value that yields high prediction performance, the machine learning device 100 also calculates the probability that the estimates given by the prediction performance curve deviate. For example, the machine learning device 100 calculates a 95% confidence interval for the prediction performance curve by a regression analysis method or the like. In calculating the 95% confidence interval, the uncertainty of the prediction performance 23c, which is itself an estimate, is also taken into account.

また、機械学習装置１００は、予測性能２３ａと予測性能差情報を用いて、カテゴリカルハイパーパラメータ値ｃ_２と連続量ハイパーパラメータ値ｘ_１の組の予測性能２３ｄを推定する。予測性能２３ｄは推定値であるため、不確かさを示す分散をもつ。予測性能２３ｄの期待値は、連続量ハイパーパラメータｘ_１におけるカテゴリカルハイパーパラメータ値ｃ_２から見た予測性能差を、予測性能２３ａに加えたものである。予測性能２３ｄの標準偏差は、連続量ハイパーパラメータｘ_１における予測性能差の標準偏差である。 Further, the machine learning device 100 uses the prediction performance 23a and the prediction performance difference information to estimate the prediction performance 23d for the pair of the categorical hyperparameter value c₂ and the continuous quantity hyperparameter value x₁. Since the prediction performance 23d is an estimated value, it has a variance indicating uncertainty. The expected value of the prediction performance 23d is the prediction performance 23a plus the prediction performance difference, as seen from the categorical hyperparameter value c₂, at the continuous quantity hyperparameter value x₁. The standard deviation of the prediction performance 23d is the standard deviation of the prediction performance difference at the continuous quantity hyperparameter value x₁.

すると、機械学習装置１００は、予測性能２３ｂ，２３ｄを用いて、回帰分析手法などによりカテゴリカルハイパーパラメータ値ｃ_２の予測性能曲線を推定する。このとき、機械学習装置１００は、予測性能曲線の示す推定値がばらつく確率も算出しておく。 Then, the machine learning device 100 estimates the prediction performance curve of the categorical hyperparameter value c ₂ by the regression analysis method or the like using the prediction performances 23b and 23d. At this time, the machine learning device 100 also calculates the probability that the estimated value shown by the prediction performance curve varies.

機械学習装置１００は、カテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線とカテゴリカルハイパーパラメータ値ｃ_２の予測性能曲線に基づいて、予測性能が高い可能性があるカテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値の組を選択する。例えば、機械学習装置１００は、算出した９５％信頼区間の上限値（ＵＣＢ：Upper Confidence Bound）が最も高いカテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値の組を選択する。ここでは、機械学習装置１００は、カテゴリカルハイパーパラメータ値ｃ_２と連続量ハイパーパラメータ値ｘ_３の組を選択する。 The machine learning device 100 is based on the prediction performance curve of the categorical hyperparameter value c ₁ and the prediction performance curve of the categorical hyperparameter value c ₂ , and the categorical hyperparameter value and the continuous quantity hyper which may have high prediction performance. Select a set of parameter values. For example, the machine learning device 100 selects a set of a categorical hyperparameter value and a continuous quantity hyperparameter value having the highest calculated upper limit value (UCB: Upper Confidence Bound) of the 95% confidence interval. Here, the machine learning device 100 selects a set of the categorical hyperparameter value c ₂ and the continuous quantity hyperparameter value x ₃ .
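The selection by upper confidence bound can be sketched as follows. It is assumed here, for illustration only, that each candidate pair already has an estimated mean and standard deviation read off its prediction performance curve, and that the upper limit of the 95% confidence interval is approximated as mean + 1.96 × standard deviation under a normal distribution; the source does not specify how the interval is computed. The function name `select_by_ucb` is hypothetical.

```python
def select_by_ucb(candidates, z=1.96):
    """Return the (c, x) pair whose upper confidence bound is highest.

    candidates maps (categorical value, continuous value) to (mean, std).
    z=1.96 corresponds to a two-sided 95% interval under a normal assumption.
    """
    return max(candidates, key=lambda pair: candidates[pair][0] + z * candidates[pair][1])
```

A pair with a slightly lower expected performance but a larger uncertainty can be selected over a pair with a higher expected performance, which is how the search avoids overlooking promising but poorly explored hyperparameter values.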

次に、機械学習装置１００は、選択したカテゴリカルハイパーパラメータ値ｃ_２と連続量ハイパーパラメータ値ｘ_３の組を用いて機械学習を行い、予測性能２３ｅを算出する。予測性能２３ｅは測定値であるため、不確かさを示す分散をもたない。すると、機械学習装置１００は、予測性能２３ｂ，２３ｄ，２３ｅを用いて、回帰分析手法などによりカテゴリカルハイパーパラメータ値ｃ_２の予測性能曲線を更新する。 Next, the machine learning device 100 performs machine learning using the set of the selected categorical hyperparameter value c ₂ and the continuous quantity hyperparameter value x ₃ , and calculates the prediction performance 23e. Since the predicted performance 23e is a measured value, it does not have a variance indicating uncertainty. Then, the machine learning device 100 updates the prediction performance curve of the categorical hyperparameter value c ₂ by the regression analysis method or the like using the prediction performances 23b, 23d, 23e.

図６は、ハイパーパラメータ探索の進行例を示す図（続き）である。 FIG. 6 is a diagram (continued) showing an example of the progress of a hyperparameter search.
機械学習装置１００は、前述の予測性能２３ｅと予測性能差情報を用いて、カテゴリカルハイパーパラメータ値ｃ_１と連続量ハイパーパラメータ値ｘ_３の組の予測性能２３ｆを推定する。予測性能２３ｆは推定値であるため、不確かさを示す分散をもつ。予測性能２３ｆの期待値は、連続量ハイパーパラメータｘ_３におけるカテゴリカルハイパーパラメータ値ｃ_１から見た予測性能差を、予測性能２３ｅに加えたものである。予測性能２３ｆの標準偏差は、連続量ハイパーパラメータｘ_３における予測性能差の標準偏差である。機械学習装置１００は、予測性能２３ａ，２３ｃ，２３ｆを用いて、回帰分析手法などによりカテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線を更新する。 The machine learning device 100 uses the above-mentioned prediction performance 23e and the prediction performance difference information to estimate the prediction performance 23f for the pair of the categorical hyperparameter value c₁ and the continuous quantity hyperparameter value x₃. Since the prediction performance 23f is an estimated value, it has a variance indicating uncertainty. The expected value of the prediction performance 23f is the prediction performance 23e plus the prediction performance difference, as seen from the categorical hyperparameter value c₁, at the continuous quantity hyperparameter value x₃. The standard deviation of the prediction performance 23f is the standard deviation of the prediction performance difference at the continuous quantity hyperparameter value x₃. The machine learning device 100 uses the prediction performances 23a, 23c, and 23f to update the prediction performance curve for the categorical hyperparameter value c₁ by a regression analysis method or the like.

次に、機械学習装置１００は、カテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線とカテゴリカルハイパーパラメータ値ｃ_２の予測性能曲線に基づいて、予測性能が高い可能性があるカテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値の組を選択する。ここでは、機械学習装置１００は、カテゴリカルハイパーパラメータ値ｃ_１と連続量ハイパーパラメータ値ｘ_４の組を選択する。 Next, the machine learning device 100 has a categorical hyperparameter value that may have high prediction performance based on the prediction performance curve of the categorical hyperparameter value c ₁ and the prediction performance curve of the categorical hyperparameter value c ₂ . Select a set of continuous quantity hyperparameter values. Here, the machine learning device 100 selects a set of the categorical hyperparameter value c ₁ and the continuous quantity hyperparameter value x ₄ .

機械学習装置１００は、選択したカテゴリカルハイパーパラメータ値ｃ_１と連続量ハイパーパラメータ値ｘ_４の組を用いて機械学習を行い、予測性能２３ｇを算出する。予測性能２３ｇは測定値であるため、不確かさを示す分散をもたない。機械学習装置１００は、予測性能２３ａ，２３ｃ，２３ｆ，２３ｇを用いて、回帰分析手法などによりカテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線を更新する。 The machine learning device 100 performs machine learning using the selected set of the categorical hyperparameter value c₁ and the continuous quantity hyperparameter value x₄, and calculates the prediction performance 23g. Since the prediction performance 23g is a measured value, it does not have a variance indicating uncertainty. The machine learning device 100 uses the prediction performances 23a, 23c, 23f, and 23g to update the prediction performance curve of the categorical hyperparameter value c₁ by a regression analysis method or the like.

次に、機械学習装置１００は、予測性能２３ｇと予測性能差情報を用いて、カテゴリカルハイパーパラメータ値ｃ_２と連続量ハイパーパラメータ値ｘ_４の組の予測性能２３ｈを推定する。予測性能２３ｈは推定値であるため、不確かさを示す分散をもつ。予測性能２３ｈの期待値は、連続量ハイパーパラメータｘ_４におけるカテゴリカルハイパーパラメータ値ｃ_２から見た予測性能差を、予測性能２３ｇに加えたものである。予測性能２３ｈの標準偏差は、連続量ハイパーパラメータｘ_４における予測性能差の標準偏差である。機械学習装置１００は、予測性能２３ｂ，２３ｄ，２３ｅ，２３ｈを用いて、回帰分析手法などによりカテゴリカルハイパーパラメータ値ｃ_２の予測性能曲線を更新する。 Next, the machine learning device 100 estimates the prediction performance 23h of the set of the categorical hyperparameter value c₂ and the continuous quantity hyperparameter value x₄ by using the prediction performance 23g and the prediction performance difference information. Since the prediction performance 23h is an estimated value, it has a variance indicating uncertainty. The expected value of the prediction performance 23h is obtained by adding, to the prediction performance 23g, the prediction performance difference seen from the categorical hyperparameter value c₂ at the continuous quantity hyperparameter value x₄. The standard deviation of the prediction performance 23h is the standard deviation of the prediction performance difference at the continuous quantity hyperparameter value x₄. The machine learning device 100 uses the prediction performances 23b, 23d, 23e, and 23h to update the prediction performance curve of the categorical hyperparameter value c₂ by a regression analysis method or the like.
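
The estimation rule described above (expected value = measured value plus expected difference, standard deviation = standard deviation of the difference) can be sketched in Python as follows. This is only an illustrative sketch: the names `Estimate` and `transfer_estimate` and all numeric values are hypothetical, and it assumes that the measured value and the difference statistics are expressed on the same scale.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Estimate:
    """A prediction performance value together with its uncertainty."""
    mean: float
    std: float  # 0.0 would correspond to a measured (not estimated) value


def transfer_estimate(measured: float, diff_mean: float, diff_std: float) -> Estimate:
    """Estimate the performance of (c_i, x) from a measurement at (c_j, x).

    diff_mean and diff_std are the expected value and the standard deviation
    of the prediction performance difference perf(c_i, x) - perf(c_j, x).
    """
    return Estimate(mean=measured + diff_mean, std=diff_std)


# Hypothetical numbers, for illustration only.
measured_c1_x4 = 0.80  # plays the role of the measured prediction performance 23g
est_c2_x4 = transfer_estimate(measured_c1_x4, diff_mean=-0.05, diff_std=0.02)
```

Here `est_c2_x4` plays the role of the estimated prediction performance 23h; a regression step that consumes such points together with their standard deviations is outside the scope of this sketch.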

このように、１回のモデル生成および予測性能の測定によって、カテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線とカテゴリカルハイパーパラメータ値ｃ_２の予測性能曲線の両方が更新される。これを繰り返すことで予測性能の推定精度が向上し、予測性能が最大になるハイパーパラメータ値に近づくことができる。 In this way, both the prediction performance curve of the categorical hyperparameter value c ₁ and the prediction performance curve of the categorical hyperparameter value c ₂ are updated by one model generation and measurement of the prediction performance. By repeating this, the estimation accuracy of the prediction performance is improved, and it is possible to approach the hyperparameter value that maximizes the prediction performance.

図７は、共有度と推定予測性能との関係例を示す図である。
ここでは、カテゴリカルハイパーパラメータ値ｃ_２の試行結果に基づいてカテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線が更新されることに着目する。カテゴリカルハイパーパラメータ値ｃ_１の予測性能曲線の９５％信頼区間の広さは、カテゴリカルハイパーパラメータ値ｃ_１，ｃ_２の間の共有度（予測性能の類似度）に依存する。共有度は、予測性能差の標準偏差に反比例していると言える。 FIG. 7 is a diagram showing an example of the relationship between the degree of sharing and the estimated prediction performance.
Here, note that the prediction performance curve of the categorical hyperparameter value c₁ is updated based on the trial results of the categorical hyperparameter value c₂. The width of the 95% confidence interval of the prediction performance curve of the categorical hyperparameter value c₁ depends on the degree of sharing (similarity of prediction performance) between the categorical hyperparameter values c₁ and c₂. The degree of sharing can be said to be inversely proportional to the standard deviation of the prediction performance difference.

共有度が強い（類似度が高い）とき、連続量ハイパーパラメータ値ｘ_３に対して予測性能２４ａが推定され、連続量ハイパーパラメータ値ｘ_２に対して予測性能２４ｂが推定される。予測性能２４ａ，２４ｂの分散は比較的狭い。一方、共有度が弱い（類似度が低い）とき、連続量ハイパーパラメータ値ｘ_３に対して予測性能２４ｃが推定され、連続量ハイパーパラメータ値ｘ_２に対して予測性能２４ｄが推定される。予測性能２４ｃ，２４ｄの分散は予測性能２４ａ，２４ｂと比べて広い。 When the degree of sharing is strong (the similarity is high), the prediction performance 24a is estimated for the continuous quantity hyperparameter value x₃, and the prediction performance 24b is estimated for the continuous quantity hyperparameter value x₂. The variance of the prediction performances 24a and 24b is relatively small. On the other hand, when the degree of sharing is weak (the similarity is low), the prediction performance 24c is estimated for the continuous quantity hyperparameter value x₃, and the prediction performance 24d is estimated for the continuous quantity hyperparameter value x₂. The variance of the prediction performances 24c and 24d is larger than that of the prediction performances 24a and 24b.

よって、共有度が強いときは予測性能曲線の推定の不確かさが小さいとみなされる。一方、共有度が弱いときは予測性能曲線の推定の不確かさが大きいとみなされ、予測性能が予想外に高くなる可能性が表現される。このように、予測性能差の標準偏差に応じて、カテゴリカルハイパーパラメータ値ｃ_１，ｃ_２の間で機械学習の試行結果を共有して予測性能曲線を連動させる強度が自動的に調整されることになる。 Therefore, when the degree of sharing is strong, the uncertainty of the estimated prediction performance curve is considered to be small. On the other hand, when the degree of sharing is weak, the uncertainty of the estimated prediction performance curve is considered to be large, which expresses the possibility that the prediction performance becomes unexpectedly high. In this way, the strength with which the machine learning trial results are shared between the categorical hyperparameter values c₁ and c₂ and the prediction performance curves are linked is automatically adjusted according to the standard deviation of the prediction performance difference.

なお、カテゴリカルハイパーパラメータ値ｃ_１，ｃ_２の間で機械学習の試行結果を単純に共有する方法も考えられる。この方法では、カテゴリカルハイパーパラメータ値の違いによる予測性能の変化を無視することになり、推定された予測性能の分散が考慮されない。単純共有では、連続量ハイパーパラメータ値ｘ_３に対して予測性能２４ｅが推定され、連続量ハイパーパラメータ値ｘ_２に対して予測性能２４ｆが推定される。予測性能２４ｅ，２４ｆの分散は０であるとみなされる。よって、単純共有では予測性能曲線の推定の不確かさが過小評価されてしまい、予測性能が予想外に高くなる可能性が表現されない。 A method of simply sharing the machine learning trial results between the categorical hyperparameter values c ₁ and c ₂ is also conceivable. In this method, the change in the prediction performance due to the difference in the categorical hyperparameter values is ignored, and the variance of the estimated prediction performance is not taken into consideration. In simple sharing, the prediction performance 24e is estimated for the continuous quantity hyperparameter value x ₃ , and the prediction performance 24f is estimated for the continuous quantity hyperparameter value x ₂ . The variance of the predicted performances 24e and 24f is considered to be zero. Therefore, in simple sharing, the uncertainty of estimation of the prediction performance curve is underestimated, and the possibility that the prediction performance becomes unexpectedly high cannot be expressed.

また、カテゴリカルハイパーパラメータ値ｃ_１，ｃ_２の間で機械学習の試行結果を共有しない方法も考えられる。この方法では、異なるカテゴリカルハイパーパラメータ値の間では予測性能に全く関連性がないと仮定することになり、他のカテゴリカルハイパーパラメータ値の試行によって得られた情報が利用されない。無共有では、連続量ハイパーパラメータ値ｘ_２，ｘ_３に対して何ら情報がないとみなして予測性能曲線が推定される。よって、無共有では予測性能曲線の推定の不確かさが過大評価されてしまい、予測性能がばらつく可能性を早期に絞り込むことが難しい。 A method in which the machine learning trial results are not shared between the categorical hyperparameter values c₁ and c₂ is also conceivable. This method assumes that there is no relation at all in prediction performance between different categorical hyperparameter values, so the information obtained by trials of other categorical hyperparameter values is not utilized. With no sharing, the prediction performance curve is estimated on the assumption that there is no information for the continuous quantity hyperparameter values x₂ and x₃. Therefore, with no sharing, the uncertainty of the estimated prediction performance curve is overestimated, and it is difficult to narrow down the possible variation in prediction performance at an early stage.
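
The three ways of sharing a trial result discussed above can be contrasted with a small Python sketch. The function names, the mode strings, and the numbers are hypothetical; the per-point 95% interval assumes a normal distribution, which is an assumption of this sketch and not something stated in the text.

```python
from typing import Optional, Tuple

Z95 = 1.96  # two-sided 95% quantile of the standard normal distribution


def shared_point(measured: float, diff_mean: float, diff_std: float,
                 mode: str) -> Optional[Tuple[float, float]]:
    """Return the (mean, std) that a c_2 measurement contributes to the c_1 curve."""
    if mode == "statistical":  # sharing weighted by the difference statistics
        return (measured + diff_mean, diff_std)
    if mode == "simple":       # copy the value as if it had been measured for c_1
        return (measured, 0.0)
    if mode == "none":         # the measurement contributes nothing
        return None
    raise ValueError(f"unknown mode: {mode}")


def interval95(point: Tuple[float, float]) -> Tuple[float, float]:
    """Per-point 95% interval under a normal assumption."""
    mean, std = point
    return (mean - Z95 * std, mean + Z95 * std)
```

With a small `diff_std` (strong sharing) the interval is narrow, with a large `diff_std` (weak sharing) it is wide, and with simple sharing it collapses to a single value, which corresponds to the underestimated uncertainty described above.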

次に、機械学習装置１００の処理について説明する。
図８は、機械学習装置の機能例を示すブロック図である。
機械学習装置１００は、サンプル履歴記憶部１２１、統計情報記憶部１２２、データセット記憶部１２３および学習結果記憶部１２４を有する。また、機械学習装置１００は、サンプル履歴分析部１３１、制限時間入力部１３２、学習制御部１３３、学習実行部１３４および予測性能推定部１３５を有する。サンプル履歴記憶部１２１、統計情報記憶部１２２、データセット記憶部１２３および学習結果記憶部１２４は、例えば、ＲＡＭ１０２またはＨＤＤ１０３の記憶領域を用いて実装される。サンプル履歴分析部１３１、制限時間入力部１３２、学習制御部１３３、学習実行部１３４および予測性能推定部１３５は、例えば、ＣＰＵ１０１が実行するプログラムを用いて実装される。 Next, the processing of the machine learning device 100 will be described.
FIG. 8 is a block diagram showing a functional example of the machine learning device.
The machine learning device 100 has a sample history storage unit 121, a statistical information storage unit 122, a data set storage unit 123, and a learning result storage unit 124. Further, the machine learning device 100 includes a sample history analysis unit 131, a time limit input unit 132, a learning control unit 133, a learning execution unit 134, and a prediction performance estimation unit 135. The sample history storage unit 121, the statistical information storage unit 122, the data set storage unit 123, and the learning result storage unit 124 are implemented using, for example, the storage area of the RAM 102 or the HDD 103. The sample history analysis unit 131, the time limit input unit 132, the learning control unit 133, the learning execution unit 134, and the prediction performance estimation unit 135 are implemented using, for example, a program executed by the CPU 101.

サンプル履歴記憶部１２１は、過去に様々なデータセットに対して実行された機械学習の結果を示すサンプル履歴を記憶する。サンプル履歴を生成するための機械学習は、機械学習装置１００によって実行されてもよいし他の装置によって実行されてもよい。サンプル履歴は、ユーザからの要求に応じて実行された運用上の機械学習で生成されたものでもよいし、サンプル履歴生成のために実行された試験的な機械学習で生成されたものでもよい。ただし、ハイパーパラメータ値の網羅性の観点から、試験的な機械学習によってサンプル履歴を生成しておくことが好ましい。サンプル履歴は、１回のモデル生成毎にデータセットＩＤとハイパーパラメータ値と予測性能と実行時間とを含む。 The sample history storage unit 121 stores a sample history showing the results of machine learning executed for various data sets in the past. The machine learning for generating the sample history may be performed by the machine learning device 100 or by another device. The sample history may be generated by operational machine learning executed in response to a request from the user, or may be generated by experimental machine learning executed for sample history generation. However, from the viewpoint of completeness of hyperparameter values, it is preferable to generate a sample history by experimental machine learning. The sample history includes the data set ID, hyperparameter values, prediction performance, and execution time for each model generation.

統計情報記憶部１２２は、サンプル履歴から生成された統計情報を記憶する。統計情報は、２つのカテゴリカルハイパーパラメータ値と１つの連続量ハイパーパラメータ値の組み合わせ毎に、予測性能差の期待値と予測性能差の標準偏差とを含む。また、統計情報は、２つのカテゴリカルハイパーパラメータ値と１つの連続量ハイパーパラメータ値の組み合わせ毎に、実行時間差の期待値と実行時間差の標準偏差とを含む。実行時間差は、前述の予測性能差と同様の考え方に基づいて算出されるものである。ただし、後述するように第２の実施の形態では、機械学習の制御に実行時間は参照されないため、統計情報は実行時間差の期待値と実行時間差の標準偏差とを含まなくてもよい。 The statistical information storage unit 122 stores statistical information generated from the sample history. The statistical information includes the expected value of the predicted performance difference and the standard deviation of the predicted performance difference for each combination of the two categorical hyperparameter values and one continuous quantity hyperparameter value. In addition, the statistical information includes the expected value of the execution time difference and the standard deviation of the execution time difference for each combination of the two categorical hyperparameter values and one continuous quantity hyperparameter value. The execution time difference is calculated based on the same concept as the above-mentioned predicted performance difference. However, as will be described later, in the second embodiment, since the execution time is not referred to in the control of machine learning, the statistical information does not have to include the expected value of the execution time difference and the standard deviation of the execution time difference.

データセット記憶部１２３は、今回の機械学習に用いるデータセット（インスタンスの集合）を記憶する。データセット記憶部１２３に記憶されるデータセットは、サンプル履歴を生成するために過去に使用されたデータセットとは異なる。 The data set storage unit 123 stores the data set (set of instances) used for the machine learning this time. The data set stored in the data set storage unit 123 is different from the data set used in the past for generating the sample history.

学習結果記憶部１２４は、データセット記憶部１２３に記憶されたデータセットに対する機械学習の学習結果を記憶する。学習結果は、予測性能が最も高かったモデルと、そのモデルの生成に使用したハイパーパラメータ値と、そのモデルの予測性能とを含む。 The learning result storage unit 124 stores the learning result of machine learning for the data set stored in the data set storage unit 123. The training results include the model with the highest prediction performance, the hyperparameter values used to generate the model, and the prediction performance of the model.

サンプル履歴分析部１３１は、サンプル履歴記憶部１２１に記憶されたサンプル履歴を分析し、統計情報を生成して統計情報記憶部１２２に保存する。サンプル履歴分析部１３１は、ユーザからの要求に応じて統計情報を生成してもよいし、一定時間毎に統計情報を生成するなど継続的に統計情報を更新してもよい。 The sample history analysis unit 131 analyzes the sample history stored in the sample history storage unit 121, generates statistical information, and stores it in the statistical information storage unit 122. The sample history analysis unit 131 may generate statistical information in response to a request from the user, or may continuously update the statistical information, such as generating statistical information at regular time intervals.

サンプル履歴分析部１３１は、２つのカテゴリカルハイパーパラメータ値と１つの連続量ハイパーパラメータ値の組み合わせ毎に、各データセットにおける予測性能差を算出し、複数のデータセットの間でその期待値と標準偏差とを算出する。また、サンプル履歴分析部１３１は、２つのカテゴリカルハイパーパラメータ値と１つの連続量ハイパーパラメータ値の組み合わせ毎に、各データセットにおける実行時間差を算出し、複数のデータセットの間でその期待値と標準偏差とを算出する。ただし、第２の実施の形態ではサンプル履歴分析部１３１は、実行時間差については算出しなくてもよい。 The sample history analysis unit 131 calculates the prediction performance difference in each data set for each combination of two categorical hyperparameter values and one continuous quantity hyperparameter value, and the expected value and standard among a plurality of data sets. Calculate the deviation. Further, the sample history analysis unit 131 calculates the execution time difference in each data set for each combination of two categorical hyperparameter values and one continuous quantity hyperparameter value, and sets the expected value among a plurality of data sets. Calculate the standard deviation. However, in the second embodiment, the sample history analysis unit 131 does not have to calculate the execution time difference.

制限時間入力部１３２は、データセット記憶部１２３に記憶されたデータセットに対する機械学習を開始してからの経過時間の閾値として、制限時間を学習制御部１３３に対して入力する。制限時間の範囲内で、できる限り予測性能の高いモデルが生成されることになる。ハイパーパラメータ探索の性質上、制限時間が長い方がモデルの予測性能が高くなることが多い。制限時間は、例えば、今回のデータセットに対する機械学習の開始前に、ユーザによって入力デバイス１１２から入力される。 The time limit input unit 132 inputs the time limit to the learning control unit 133 as a threshold of the elapsed time from the start of machine learning for the data set stored in the data set storage unit 123. Within the time limit, a model with the highest possible predictive performance will be generated. Due to the nature of hyperparameter search, the longer the time limit, the higher the prediction performance of the model. The time limit is input by the user from the input device 112, for example, before the start of machine learning for this dataset.

学習制御部１３３は、今回のデータセットに対する機械学習を制御する。学習制御部１３３は、ハイパーパラメータ値、すなわち、カテゴリカルハイパーパラメータ値と連続量ハイパーパラメータ値の組を１つ選択し、選択したハイパーパラメータ値を学習実行部１３４に通知して機械学習を実行させる。学習制御部１３３は、当該ハイパーパラメータ値を使用した機械学習が終了すると、他のハイパーパラメータ値を使用した場合の予測性能を予測性能推定部１３５に推定させる。学習制御部１３３は、現在までに達成された最大の予測性能（達成予測性能）を超える予測性能が得られる可能性のある他のハイパーパラメータ値を次に選択する。学習制御部１３３は、以上の処理を経過時間が制限時間に到達するまで繰り返し、学習結果を学習結果記憶部１２４に保存する。 The learning control unit 133 controls machine learning for the current data set. The learning control unit 133 selects one hyperparameter value, that is, one set of a categorical hyperparameter value and a continuous quantity hyperparameter value, and notifies the learning execution unit 134 of the selected hyperparameter value to have it execute machine learning. When the machine learning using that hyperparameter value is completed, the learning control unit 133 causes the prediction performance estimation unit 135 to estimate the prediction performance that would be obtained with other hyperparameter values. The learning control unit 133 then selects another hyperparameter value that may yield a prediction performance exceeding the maximum prediction performance achieved so far (achievement prediction performance). The learning control unit 133 repeats the above processing until the elapsed time reaches the time limit, and stores the learning result in the learning result storage unit 124.
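
The control flow of the learning control unit 133 can be sketched as the following loop. This is a minimal sketch under stated assumptions: `select_next`, `run_learning`, and `update_estimates` are hypothetical callables standing in for the selection step, the learning execution unit 134, and the prediction performance estimation unit 135, and the time limit is checked only before each trial starts.

```python
import time
from typing import Callable, Hashable, Optional, Tuple

HyperParams = Tuple[Hashable, float]  # (categorical value, continuous quantity value)


def search(select_next: Callable[[], HyperParams],
           run_learning: Callable[[HyperParams], float],
           update_estimates: Callable[[HyperParams, float], None],
           time_limit_sec: float) -> Tuple[Optional[HyperParams], float]:
    """Repeat select -> learn -> estimate until the time limit is reached."""
    start = time.monotonic()
    best_params: Optional[HyperParams] = None
    best_perf = float("-inf")  # achievement prediction performance so far
    while time.monotonic() - start < time_limit_sec:
        params = select_next()          # choose the next hyperparameter value
        perf = run_learning(params)     # build a model and measure its performance
        update_estimates(params, perf)  # refresh the estimates for other values
        if perf > best_perf:
            best_params, best_perf = params, perf
    return best_params, best_perf
```

Because the elapsed time is tested only at the top of the loop, a trial that starts just before the limit can overrun it; whether the device described here interrupts such a trial is not stated in this passage.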

学習実行部１３４は、学習制御部１３３から指定されたハイパーパラメータ値を使用した機械学習を実行する。学習実行部１３４が使用する機械学習アルゴリズムは、カテゴリカルハイパーパラメータ値の中で指定されることがある。機械学習アルゴリズムが指定されなかった場合、学習実行部１３４は所定の機械学習アルゴリズムを使用してもよい。 The learning execution unit 134 executes machine learning using the hyperparameter values specified by the learning control unit 133. The machine learning algorithm used by the learning execution unit 134 may be specified in the categorical hyperparameter values. If no machine learning algorithm is specified, the learning execution unit 134 may use a predetermined machine learning algorithm.

学習実行部１３４は、データセット記憶部１２３に記憶されたデータセットの中から一部のインスタンスを訓練データとして抽出し、他の一部のインスタンスをテストデータとして抽出する。学習実行部１３４は、抽出した訓練データと指定されたハイパーパラメータ値を用いて機械学習を実行してモデルを生成する。学習実行部１３４は、生成したモデルと抽出したテストデータを用いてバリデーションを行い、モデルの予測性能を測定する。また、学習実行部１３４は、指定されたハイパーパラメータ値を用いた機械学習を開始してから終了するまでの実行時間を測定する。学習実行部１３４は、生成したモデルと測定した予測性能と測定した実行時間とを学習制御部１３３に出力する。ただし、第２の実施の形態では学習実行部１３４は、実行時間を測定しなくてもよい。 The learning execution unit 134 extracts some instances as training data from the data set stored in the data set storage unit 123, and extracts some other instances as test data. The learning execution unit 134 executes machine learning using the extracted training data and the specified hyperparameter values to generate a model. The learning execution unit 134 performs validation using the generated model and the extracted test data, and measures the prediction performance of the model. Further, the learning execution unit 134 measures the execution time from the start to the end of machine learning using the designated hyperparameter values. The learning execution unit 134 outputs the generated model, the measured prediction performance, and the measured execution time to the learning control unit 133. However, in the second embodiment, the learning execution unit 134 does not have to measure the execution time.
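
One trial of the learning execution unit 134 (split the data set, build a model, validate it, and time the run) can be sketched as below. The callables `train` and `score`, the parameters `test_fraction` and `seed`, and the random shuffling are assumptions introduced for illustration; the text only says that some instances are used as training data and some other instances as test data. The sketch also assumes that the measured execution time covers both model generation and validation.

```python
import random
import time
from typing import Any, Callable, Sequence, Tuple


def run_trial(dataset: Sequence[Any],
              train: Callable[[Sequence[Any]], Any],
              score: Callable[[Any, Sequence[Any]], float],
              test_fraction: float = 0.2,
              seed: int = 0) -> Tuple[Any, float, float]:
    """Return (model, prediction performance, execution time in seconds)."""
    rng = random.Random(seed)
    indices = list(range(len(dataset)))
    rng.shuffle(indices)
    n_test = max(1, int(len(indices) * test_fraction))
    test_data = [dataset[i] for i in indices[:n_test]]   # held out for validation
    train_data = [dataset[i] for i in indices[n_test:]]  # disjoint from test_data

    started = time.perf_counter()
    model = train(train_data)              # hyperparameter values are assumed bound inside `train`
    performance = score(model, test_data)  # validation on the held-out instances
    elapsed = time.perf_counter() - started
    return model, performance, elapsed
```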

予測性能推定部１３５は、今回のデータセットに対するハイパーパラメータ探索の履歴（探索履歴）に基づいて、カテゴリカルハイパーパラメータ値毎に予測性能曲線を推定し、カテゴリカルハイパーパラメータ値毎に予測性能改善量を算出する。予測性能改善量は、予測性能の推定値と現在の達成予測性能との差であり、次に特定のハイパーパラメータ値を選択した場合に期待される達成予測性能の上昇の程度を示す指標である。予測性能推定部１３５は、算出した予測性能改善量を学習制御部１３３に通知する。学習制御部１３３では、予測性能改善量が最大になるハイパーパラメータ値が次に選択される。なお、第２の実施の形態では予測性能改善量がハイパーパラメータ探索の指標として使用されるため、予測性能推定部１３５では実行時間に関する計算は行われない。 The prediction performance estimation unit 135 estimates a prediction performance curve for each categorical hyperparameter value based on the history of the hyperparameter search for the current data set (search history), and calculates a prediction performance improvement amount for each categorical hyperparameter value. The prediction performance improvement amount is the difference between the estimated value of the prediction performance and the current achievement prediction performance, and is an index showing the degree of increase in the achievement prediction performance expected when a specific hyperparameter value is selected next. The prediction performance estimation unit 135 notifies the learning control unit 133 of the calculated prediction performance improvement amount. The learning control unit 133 next selects the hyperparameter value that maximizes the prediction performance improvement amount. Since the prediction performance improvement amount is used as the index for hyperparameter search in the second embodiment, the prediction performance estimation unit 135 does not perform calculations on the execution time.
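
The selection by prediction performance improvement amount can be illustrated with the short Python sketch below. It takes the estimated prediction performance of each candidate as given, because how that estimate is read off the prediction performance curve is not specified in this passage; the function names and the numbers are hypothetical.

```python
from typing import Dict, Hashable, Tuple

HyperParams = Tuple[Hashable, float]  # (categorical value, continuous quantity value)


def improvement(estimate: float, achieved: float) -> float:
    """Estimated prediction performance minus the achievement prediction performance."""
    return estimate - achieved


def select_next(estimates: Dict[HyperParams, float], achieved: float) -> HyperParams:
    """Pick the candidate whose prediction performance improvement amount is largest."""
    return max(estimates, key=lambda hp: improvement(estimates[hp], achieved))


# Hypothetical estimates, for illustration only.
estimates = {("c1", 0.1): 0.82, ("c1", 0.4): 0.88, ("c2", 0.4): 0.86}
chosen = select_next(estimates, achieved=0.85)
```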

図９は、サンプル履歴テーブルの例を示す図である。
サンプル履歴テーブル１４１は、サンプル履歴記憶部１２１に記憶される。サンプル履歴テーブル１４１は、複数回のモデル生成の履歴を示す。サンプル履歴テーブル１４１は、データセットＩＤ、カテゴリカルハイパーパラメータ値（カテゴリカルＨＰ値）、連続量ハイパーパラメータ値（連続量ＨＰ値）、予測性能および実行時間の項目を有する。 FIG. 9 is a diagram showing an example of a sample history table.
The sample history table 141 is stored in the sample history storage unit 121. The sample history table 141 shows the history of a plurality of model generations. The sample history table 141 has items of data set ID, categorical hyperparameter value (categorical HP value), continuous quantity hyperparameter value (continuous quantity HP value), prediction performance and execution time.

データセットＩＤの項目には、モデル生成に使用したデータセットの識別子が登録される。カテゴリカルハイパーパラメータ値の項目には、モデル生成に使用したカテゴリカルハイパーパラメータ値が登録される。連続量ハイパーパラメータ値の項目には、モデル生成に使用した連続量ハイパーパラメータ値が登録される。予測性能の項目には、測定された予測性能が登録される。実行時間の項目には、モデル生成に要した時間が登録される。 In the data set ID item, the identifier of the data set used for model generation is registered. In the item of categorical hyperparameter value, the categorical hyperparameter value used for model generation is registered. In the item of continuous quantity hyperparameter value, the continuous quantity hyperparameter value used for model generation is registered. The measured prediction performance is registered in the prediction performance item. The time required to generate the model is registered in the execution time item.

サンプル履歴テーブル１４１は、カテゴリカルハイパーパラメータ値および連続量ハイパーパラメータ値が同一でデータセットが異なる複数のレコードを含む。また、サンプル履歴テーブル１４１は、データセットＩＤおよび連続量ハイパーパラメータ値が同一でカテゴリカルハイパーパラメータ値が異なる複数のレコードを含む。また、サンプル履歴テーブル１４１は、データセットＩＤおよびカテゴリカルハイパーパラメータ値が同一で連続量ハイパーパラメータ値が異なる複数のレコードを含む。 The sample history table 141 includes a plurality of records having the same categorical hyperparameter value and continuous quantity hyperparameter value but different data sets. Further, the sample history table 141 includes a plurality of records having the same data set ID and continuous quantity hyperparameter values but different categorical hyperparameter values. Further, the sample history table 141 includes a plurality of records having the same data set ID and categorical hyperparameter values but different continuous quantity hyperparameter values.
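
As an illustration of the table layout, one record of the sample history table 141 could be represented in Python as follows. The class name, field names, and all values are hypothetical; only the five items themselves come from the description above.

```python
from dataclasses import dataclass
from typing import List


@dataclass(frozen=True)
class SampleRecord:
    dataset_id: str        # data set ID
    categorical_hp: str    # categorical hyperparameter value
    continuous_hp: float   # continuous quantity hyperparameter value
    performance: float     # measured prediction performance
    exec_time: float       # time required for model generation


def records_for(history: List[SampleRecord], dataset_id: str) -> List[SampleRecord]:
    """Return the records that belong to one data set."""
    return [r for r in history if r.dataset_id == dataset_id]


# Hypothetical rows: same data set and continuous value, different categorical values,
# plus the same hyperparameter values on a different data set.
history = [
    SampleRecord("D1", "c1", 0.1, 0.80, 12.0),
    SampleRecord("D1", "c2", 0.1, 0.75, 30.0),
    SampleRecord("D2", "c1", 0.1, 0.60, 11.0),
]
```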

予測性能推定部１３５は、今回のデータセットに対する探索履歴としてサンプル履歴テーブル１４１と同様のテーブルを保持する。ただし、探索履歴はデータセットＩＤを含まなくてよい。よって、探索履歴は、カテゴリカルハイパーパラメータ値、連続量ハイパーパラメータ値、予測性能および実行時間をそれぞれ含む１以上のレコードを有する。なお、第２の実施の形態では探索履歴は実行時間を含まなくてもよい。 The prediction performance estimation unit 135 holds a table similar to the sample history table 141 as the search history for the current data set. However, the search history does not have to include the data set ID. Therefore, the search history has one or more records including categorical hyperparameter values, continuous quantity hyperparameter values, prediction performance and execution time, respectively. In the second embodiment, the search history does not have to include the execution time.

図１０は、統計テーブルの例を示す図である。
統計テーブル１４２は、統計情報記憶部１２２に記憶される。統計テーブル１４２は、カテゴリカルハイパーパラメータ値、連続量ハイパーパラメータ値、予測性能期待値、予測性能標準偏差、実行時間期待値および実行時間標準偏差の項目を有する。 FIG. 10 is a diagram showing an example of a statistical table.
The statistical table 142 is stored in the statistical information storage unit 122. The statistical table 142 has items of categorical hyperparameter value, continuous quantity hyperparameter value, predicted performance expected value, predicted performance standard deviation, execution time expected value and execution time standard deviation.

カテゴリカルハイパーパラメータ値の項目には、異なる２つのカテゴリカルハイパーパラメータ値の組（ｃ_ｉ，ｃ_ｊ）が登録される。ここでは、２つの値の順序も区別される。よって、（ｃ_１，ｃ_２）と（ｃ_２，ｃ_１）は別の組として認識される。連続量ハイパーパラメータ値の項目には、１つの連続量ハイパーパラメータ値が登録される。 In the item of categorical hyperparameter value, a pair (c_i, c_j) of two different categorical hyperparameter values is registered. Here, the order of the two values is also distinguished. Therefore, (c₁, c₂) and (c₂, c₁) are recognized as different pairs. In the item of continuous quantity hyperparameter value, one continuous quantity hyperparameter value is registered.

予測性能期待値の項目には、カテゴリカルハイパーパラメータ値ｃ_ｉから見たカテゴリカルハイパーパラメータ値ｃ_ｊとの間の予測性能差の期待値（平均値）が登録される。（ｃ_ｉ，ｃ_ｊ）の予測性能差は、カテゴリカルハイパーパラメータ値ｃ_ｉの予測性能からカテゴリカルハイパーパラメータ値ｃ_ｊの予測性能を引いたものに相当する。ただし、前述のように予測性能差は正規化されている。予測性能標準偏差の項目には、カテゴリカルハイパーパラメータ値ｃ_ｉ，ｃ_ｊの間の予測性能差の標準偏差が登録される。ただし、予測性能差のばらつきを示す指標として分散などの他の指標を用いることもできる。 In the item of predicted performance expected value, the expected value (average value) of the predicted performance difference between the categorical hyperparameter value c _i and the categorical hyperparameter value c _j is registered. The difference in prediction performance of (c _i , c _j ) corresponds to the prediction performance of the categorical hyperparameter value c _i minus the prediction performance of the categorical hyperparameter value c _j . However, as described above, the predicted performance difference is normalized. In the item of predicted performance standard deviation, the standard deviation of the predicted performance difference between the categorical hyperparameter values c _i and c _j is registered. However, other indexes such as variance can also be used as an index showing the variation in the predicted performance difference.

実行時間期待値の項目には、カテゴリカルハイパーパラメータ値ｃ_ｉから見たカテゴリカルハイパーパラメータ値ｃ_ｊとの間の実行時間差の期待値（平均値）が登録される。（ｃ_ｉ，ｃ_ｊ）の実行時間差は、カテゴリカルハイパーパラメータ値ｃ_ｉの実行時間からカテゴリカルハイパーパラメータ値ｃ_ｊの実行時間を引いたものに相当する。ただし、実行時間差は正規化されている。実行時間標準偏差の項目には、カテゴリカルハイパーパラメータ値ｃ_ｉ，ｃ_ｊの間の実行時間差の標準偏差が登録される。ただし、実行時間差のばらつきを示す指標として分散などの他の指標を用いることもできる。 In the item of expected execution time, the expected value (mean value) of the execution time difference between the categorical hyperparameter value c _i and the categorical hyperparameter value c _j is registered. The execution time difference of (c _i , c _j ) corresponds to the execution time of the categorical hyperparameter value c _i minus the execution time of the categorical hyperparameter value c _j . However, the execution time difference is normalized. In the item of execution time standard deviation, the standard deviation of the execution time difference between the categorical hyperparameter values c _i and c _j is registered. However, other indicators such as variance can also be used as an indicator of variation in execution time difference.

図１１は、統計情報生成の手順例を示すフローチャートである。
（Ｓ１０）サンプル履歴分析部１３１は、サンプル履歴テーブル１４１の中から１つのデータセットＩＤ（データセットＤ_ｄのデータセットＩＤ）を選択し、選択したデータセットＩＤを含むレコードをサンプル履歴テーブル１４１から抽出する。 FIG. 11 is a flowchart showing an example of a procedure for generating statistical information.
(S10) The sample history analysis unit 131 selects one data set ID (the data set ID of a data set D_d) from the sample history table 141, and extracts the records including the selected data set ID from the sample history table 141.

（Ｓ１１）サンプル履歴分析部１３１は、ステップＳ１０で抽出したレコードから予測性能の最大値ｐｍａｘ＝ｍａｘ_ｃ，ｘ（ｆ（ｘ；Ｄ_ｄ，ｃ））と予測性能の最小値ｐｍｉｎ＝ｍｉｎ_ｃ，ｘ（ｆ（ｘ；Ｄ_ｄ，ｃ））を検索する。ｆ（ｘ；Ｄ，ｃ）は、カテゴリカルハイパーパラメータｃと連続量ハイパーパラメータｘとデータセットＤから特定される予測性能を表す。サンプル履歴分析部１３１は、検索した最大値ｐｍａｘと最小値ｐｍｉｎに基づいて、予測性能の正規化関数ｈｐ（）を決定する。例えば、ｈｐ（ｆ（ｘ；Ｄ_ｄ，ｃ））＝（ｆ（ｘ；Ｄ_ｄ，ｃ）－ｐｍｉｎ）／（ｐｍａｘ－ｐｍｉｎ）と決定する。なお、正規化関数を、予測性能の平均と標準偏差を用いて定義することもできる。また、予測性能の中央値と四分位範囲を用いて定義することもできる。また、予測性能の中央値と中央絶対偏差（ＭＡＤ：Median Absolute Deviation）を用いて定義することもできる。 (S11) The sample history analysis unit 131 searches the records extracted in step S10 for the maximum value of the prediction performance, pmax = max_{c,x} f(x; D_d, c), and the minimum value of the prediction performance, pmin = min_{c,x} f(x; D_d, c). f(x; D, c) represents the prediction performance specified from the categorical hyperparameter c, the continuous quantity hyperparameter x, and the data set D. The sample history analysis unit 131 determines the normalization function hp() of the prediction performance based on the retrieved maximum value pmax and minimum value pmin. For example, it determines hp(f(x; D_d, c)) = (f(x; D_d, c) - pmin) / (pmax - pmin). The normalization function can also be defined using the mean and standard deviation of the prediction performance. It can also be defined using the median and interquartile range of the prediction performance, or using the median and median absolute deviation (MAD) of the prediction performance.
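
The min-max normalization function hp() of step S11, and one of the alternatives mentioned above, can be sketched in Python as follows. The function names are hypothetical. The handling of the degenerate case where all values are equal, and the "subtract the median, divide by the MAD" form of the alternative, are assumptions of this sketch; the text does not spell them out.

```python
from statistics import median
from typing import Callable, Sequence


def min_max_normalizer(values: Sequence[float]) -> Callable[[float], float]:
    """Build hp(v) = (v - pmin) / (pmax - pmin) from the values of one data set."""
    pmin, pmax = min(values), max(values)
    span = pmax - pmin
    if span == 0:
        # Assumption: when every value is identical, map everything to 0.0.
        return lambda v: 0.0
    return lambda v: (v - pmin) / span


def median_mad_normalizer(values: Sequence[float]) -> Callable[[float], float]:
    """Alternative: center on the median and scale by the median absolute deviation."""
    med = median(values)
    mad = median([abs(v - med) for v in values])
    if mad == 0:
        # Assumption: same fallback as above for a degenerate spread.
        return lambda v: 0.0
    return lambda v: (v - med) / mad
```

The same construction applies to execution times by passing execution time values instead of prediction performance values.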

（Ｓ１２）サンプル履歴分析部１３１は、ステップＳ１０で抽出したレコードに含まれる各予測性能を、ステップＳ１１で決定した正規化関数ｈｐ（）を用いて正規化する。
（Ｓ１３）サンプル履歴分析部１３１は、ステップＳ１０で抽出したレコードから実行時間の最大値ｔｍａｘ＝ｍａｘ_ｃ，ｘ（ｔ（ｘ；Ｄ_ｄ，ｃ））と実行時間の最小値ｔｍｉｎ＝ｍｉｎ_ｃ，ｘ（ｔ（ｘ；Ｄ_ｄ，ｃ））を検索する。ｔ（ｘ；Ｄ，ｃ）は、カテゴリカルハイパーパラメータｃと連続量ハイパーパラメータｘとデータセットＤから特定される実行時間を表す。サンプル履歴分析部１３１は、検索した最大値ｔｍａｘと最小値ｔｍｉｎに基づいて、実行時間の正規化関数ｈｔ（）を決定する。例えば、ｈｔ（ｔ（ｘ；Ｄ_ｄ，ｃ））＝（ｔ（ｘ；Ｄ_ｄ，ｃ）－ｔｍｉｎ）／（ｔｍａｘ－ｔｍｉｎ）と決定する。なお、正規化関数を、実行時間の平均と標準偏差を用いて定義することもできる。また、実行時間の中央値と四分位範囲を用いて定義することもできる。また、実行時間の中央値と中央絶対偏差を用いて定義することもできる。 (S12) The sample history analysis unit 131 normalizes each prediction performance included in the records extracted in step S10 by using the normalization function hp () determined in step S11.
(S13) The sample history analysis unit 131 searches the records extracted in step S10 for the maximum value of the execution time, tmax = max_{c,x} t(x; D_d, c), and the minimum value of the execution time, tmin = min_{c,x} t(x; D_d, c). t(x; D, c) represents the execution time specified from the categorical hyperparameter c, the continuous quantity hyperparameter x, and the data set D. The sample history analysis unit 131 determines the normalization function ht() of the execution time based on the retrieved maximum value tmax and minimum value tmin. For example, it determines ht(t(x; D_d, c)) = (t(x; D_d, c) - tmin) / (tmax - tmin). The normalization function can also be defined using the mean and standard deviation of the execution time. It can also be defined using the median and interquartile range of the execution time, or using the median and median absolute deviation of the execution time.

（Ｓ１４）サンプル履歴分析部１３１は、ステップＳ１０で抽出したレコードに含まれる各実行時間を、ステップＳ１３で決定した正規化関数ｈｔ（）を用いて正規化する。なお、第２の実施の形態ではステップＳ１３，Ｓ１４を省略してもよい。 (S14) The sample history analysis unit 131 normalizes each execution time included in the record extracted in step S10 by using the normalization function ht () determined in step S13. In the second embodiment, steps S13 and S14 may be omitted.

（Ｓ１５）サンプル履歴分析部１３１は、ステップＳ１０で全てのデータセットＩＤを選択したか判断する。全てのデータセットＩＤを選択した場合はステップＳ１６に処理が進み、未選択のデータセットＩＤがある場合はステップＳ１０に処理が進む。 (S15) The sample history analysis unit 131 determines whether all the data set IDs have been selected in step S10. If all the data set IDs are selected, the process proceeds to step S16, and if there are unselected data set IDs, the process proceeds to step S10.
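
Steps S10 to S15 for the prediction performance can be sketched as a loop over data sets. The tuple layout of `Record` and the function name are hypothetical, execution time is left out for brevity (it would be normalized the same way with ht()), and mapping a data set with zero spread to 0.0 is an assumption of this sketch.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# (data set ID, categorical value, continuous quantity value, prediction performance)
Record = Tuple[str, str, float, float]


def normalize_per_dataset(records: List[Record]) -> List[Record]:
    """Min-max normalize the prediction performance separately for each data set."""
    by_dataset: Dict[str, List[Record]] = defaultdict(list)
    for rec in records:                 # S10: collect the records of each data set ID
        by_dataset[rec[0]].append(rec)

    normalized: List[Record] = []
    for recs in by_dataset.values():    # S15: continue until every data set ID is handled
        pmin = min(r[3] for r in recs)  # S11: minimum and maximum within the data set
        pmax = max(r[3] for r in recs)
        span = pmax - pmin
        for d, c, x, p in recs:         # S12: apply hp() to each record
            hp = (p - pmin) / span if span else 0.0
            normalized.append((d, c, x, hp))
    return normalized
```

Normalizing within each data set keeps a data set on which every model scores high from dominating the statistics computed across data sets.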

図１２は、統計情報生成の手順例を示すフローチャート（続き）である。
（Ｓ１６）サンプル履歴分析部１３１は、１つのカテゴリカルハイパーパラメータ値（カテゴリカルハイパーパラメータ値ｃ_ｉ）を選択する。 FIG. 12 is a flowchart (continued) showing an example of a procedure for generating statistical information.
(S16) The sample history analysis unit 131 selects one categorical hyperparameter value (categorical hyperparameter value c_i).

（Ｓ１７）サンプル履歴分析部１３１は、ステップＳ１６と異なる他の１つのカテゴリカルハイパーパラメータ値（カテゴリカルハイパーパラメータ値ｃ_ｊ）を選択する。
（Ｓ１８）サンプル履歴分析部１３１は、サンプル履歴テーブル１４１に現れる１つの連続量ハイパーパラメータ値（連続量ハイパーパラメータ値ｘ_ｋ）を選択する。 (S17) The sample history analysis unit 131 selects one other categorical hyperparameter value (categorical hyperparameter value c_j) different from the one selected in step S16.
(S18) The sample history analysis unit 131 selects one continuous quantity hyperparameter value (continuous quantity hyperparameter value x _k ) that appears in the sample history table 141.

（Ｓ１９）サンプル履歴分析部１３１は、ステップＳ１２で正規化した予測性能（正規化予測性能）のうち、カテゴリカルハイパーパラメータ値ｃ_ｉと連続量ハイパーパラメータ値ｘ_ｋの組に対応する正規化予測性能を抽出する。また、サンプル履歴分析部１３１は、カテゴリカルハイパーパラメータ値ｃ_ｊと連続量ハイパーパラメータ値ｘ_ｋの組に対応する正規化予測性能を抽出する。サンプル履歴分析部１３１は、データセットＩＤ毎に２つの正規化予測性能の差（正規化予測性能差）を算出し、正規化予測性能差の期待値ｐμ_ｉｊ（ｘ_ｋ）を算出する。ｐμ_ｉｊ（ｘ_ｋ）＝Ｅ_Ｄ［ｈｐ（ｆ（ｘ_ｋ；Ｄ，ｃ_ｉ））－ｈｐ（ｆ（ｘ_ｋ；Ｄ，ｃ_ｊ））］と算出される。そして、サンプル履歴分析部１３１は、（ｃ_ｉ，ｃ_ｊ），ｘ_ｋの組に対応付けて期待値ｐμ_ｉｊ（ｘ_ｋ）を統計テーブル１４２に登録する。 (S19) From the prediction performances normalized in step S12 (normalized prediction performances), the sample history analysis unit 131 extracts the normalized prediction performances corresponding to the set of the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_k. The sample history analysis unit 131 also extracts the normalized prediction performances corresponding to the set of the categorical hyperparameter value c_j and the continuous quantity hyperparameter value x_k. The sample history analysis unit 131 calculates the difference between the two normalized prediction performances (normalized prediction performance difference) for each data set ID, and calculates the expected value pμ_ij(x_k) of the normalized prediction performance difference. It is calculated as pμ_ij(x_k) = E_D[hp(f(x_k; D, c_i)) - hp(f(x_k; D, c_j))]. Then, the sample history analysis unit 131 registers the expected value pμ_ij(x_k) in the statistical table 142 in association with the set of (c_i, c_j) and x_k.

(S20) The sample history analysis unit 131 calculates the standard deviation pσ_ij(x_k) of the normalized prediction performance difference: pσ_ij(x_k) = SD_D[hp(f(x_k; D, c_i)) - hp(f(x_k; D, c_j))]. The sample history analysis unit 131 then registers the standard deviation pσ_ij(x_k) in the statistical table 142 in association with the combination of (c_i, c_j) and x_k.

(S21) From the execution times normalized in step S14 (normalized execution times), the sample history analysis unit 131 extracts those corresponding to the pair of the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_k. It likewise extracts the normalized execution times corresponding to the pair of the categorical hyperparameter value c_j and the continuous quantity hyperparameter value x_k. For each data set ID, the sample history analysis unit 131 calculates the difference between the two normalized execution times (normalized execution time difference), and then calculates the expected value tμ_ij(x_k) of that difference over the data sets: tμ_ij(x_k) = E_D[ht(t(x_k; D, c_i)) - ht(t(x_k; D, c_j))]. The sample history analysis unit 131 then registers the expected value tμ_ij(x_k) in the statistical table 142 in association with the combination of (c_i, c_j) and x_k.

(S22) The sample history analysis unit 131 calculates the standard deviation tσ_ij(x_k) of the normalized execution time difference: tσ_ij(x_k) = SD_D[ht(t(x_k; D, c_i)) - ht(t(x_k; D, c_j))]. The sample history analysis unit 131 then registers the standard deviation tσ_ij(x_k) in the statistical table 142 in association with the combination of (c_i, c_j) and x_k. In the second embodiment, steps S21 and S22 may be omitted.

(S23) The sample history analysis unit 131 determines whether every continuous quantity hyperparameter value appearing in the sample history table 141 has been selected in step S18. If all of them have been selected, the process proceeds to step S24; if an unselected continuous quantity hyperparameter value remains, the process returns to step S18.

(S24) The sample history analysis unit 131 determines whether every other categorical hyperparameter value has been selected in step S17. If all of them have been selected, the process proceeds to step S25; if an unselected other categorical hyperparameter value remains, the process returns to step S17.

(S25) The sample history analysis unit 131 determines whether every categorical hyperparameter value has been selected in step S16. If all of them have been selected, the statistical information generation ends; if an unselected categorical hyperparameter value remains, the process returns to step S16.
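The triple loop of steps S16 to S25 can be sketched in Python as follows. This is only an illustrative sketch: the dictionary layout of `history`, the function name `build_statistics`, and the use of the population standard deviation for SD_D are assumptions, not details given in this description.

```python
from itertools import permutations
from statistics import mean, pstdev


def build_statistics(history):
    """Build pμ_ij(x_k) and pσ_ij(x_k) from normalized prediction performances.

    history: dict mapping (dataset_id, c, x) -> normalized prediction performance
             (hypothetical layout standing in for the sample history table).
    Returns a dict mapping (c_i, c_j, x_k) -> (p_mu, p_sigma).
    """
    datasets = sorted({d for d, _, _ in history})
    categories = sorted({c for _, c, _ in history})
    xs = sorted({x for _, _, x in history})
    stats = {}
    for c_i, c_j in permutations(categories, 2):          # S16, S17
        for x_k in xs:                                    # S18
            # One difference per data set that has both measurements.
            diffs = [
                history[(d, c_i, x_k)] - history[(d, c_j, x_k)]
                for d in datasets
                if (d, c_i, x_k) in history and (d, c_j, x_k) in history
            ]
            if diffs:
                # S19: expected value, S20: standard deviation (population SD assumed).
                stats[(c_i, c_j, x_k)] = (mean(diffs), pstdev(diffs))
    return stats


# Toy sample history with two data sets and two categorical values.
history = {
    ("d1", "rbf", 1.0): 0.8, ("d1", "linear", 1.0): 0.6,
    ("d2", "rbf", 1.0): 0.7, ("d2", "linear", 1.0): 0.3,
}
stats = build_statistics(history)
```

The same routine would apply to steps S21 and S22 if the dictionary held normalized execution times instead of normalized prediction performances.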

FIG. 13 is a flowchart showing an example of the machine learning procedure of the second embodiment.
(S30) The learning control unit 133 specifies the data set D, the time limit T, and the hyperparameter search range Θ. The hyperparameter search range Θ includes a categorical hyperparameter search range C and a continuous quantity hyperparameter search range X.

(S31) The prediction performance estimation unit 135 initializes the search history S to the empty set φ.
(S32) Referring to the search history S, the prediction performance estimation unit 135 calculates, for each categorical hyperparameter value in the categorical hyperparameter search range C, the maximum prediction performance improvement amount g(x; c), and identifies the continuous quantity hyperparameter value at which that improvement amount is obtained. The method for estimating the prediction performance improvement amount is described later.

(S33) From the prediction performance improvement amounts calculated in step S32 for the respective categorical hyperparameter values, the learning control unit 133 selects the maximum one. The learning control unit 133 identifies the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_ci at which the selected prediction performance improvement amount is obtained.

(S34) The learning execution unit 134 extracts training data and test data from the data set stored in the data set storage unit 123. The learning execution unit 134 executes machine learning based on the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_ci selected in step S33, and generates a model from the training data. Using the test data, the learning execution unit 134 measures the prediction performance f(x_ci; c_i) of the generated model. The learning control unit 133 acquires the model and the prediction performance f(x_ci; c_i).

(S35) The prediction performance estimation unit 135 acquires the categorical hyperparameter value c_i, the continuous quantity hyperparameter value x_ci, and the prediction performance f(x_ci; c_i). The prediction performance estimation unit 135 adds the record (c_i, x_ci, f(x_ci; c_i)) to the search history S.

(S36) The learning control unit 133 determines whether the time elapsed since the execution of step S30 has exceeded the time limit T. If the elapsed time has exceeded the time limit T, the process proceeds to step S37; if the elapsed time is equal to or less than the time limit T, the process returns to step S32.

(S37) The learning control unit 133 stores, in the learning result storage unit 124, the model with the highest prediction performance among the models generated so far. The learning control unit 133 also stores, in the learning result storage unit 124, the prediction performance of that model (achieved prediction performance) and the categorical hyperparameter value and continuous quantity hyperparameter value used to generate it.
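The control flow of steps S30 to S37 can be sketched as a time-limited loop. The sketch below is illustrative only: `propose` and `train_and_score` are hypothetical callables standing in for the improvement estimation of step S32 and the model training of step S34, and the injectable `clock` exists solely so the loop can be exercised without waiting.

```python
import time


def search(categories, time_limit, propose, train_and_score, clock=time.monotonic):
    """Time-limited hyperparameter search (steps S30 to S37).

    propose(history, c) -> (improvement, x): hypothetical stand-in for step S32.
    train_and_score(c, x) -> (model, performance): hypothetical stand-in for step S34.
    """
    start = clock()                                   # S30: start of the time budget
    history = []                                      # S31: search history S
    best = None
    while True:
        # S32: best proposal per categorical value; S33: best over all of them.
        proposals = [(propose(history, c), c) for c in categories]
        (_, x), c = max(proposals, key=lambda p: p[0][0])
        model, perf = train_and_score(c, x)           # S34: train and measure
        history.append((c, x, perf))                  # S35: record the trial
        if best is None or perf > best[3]:
            best = (model, c, x, perf)
        if clock() - start > time_limit:              # S36: time limit check
            return best, history                      # S37: best model so far


# Deterministic toy run with a fake clock that advances one unit per call.
def fake_clock():
    fake_clock.t += 1
    return fake_clock.t


fake_clock.t = -1
best, history = search(
    categories=["a", "b"],
    time_limit=3,
    propose=lambda hist, c: (1.0 if c == "a" else 0.5, float(len(hist))),
    train_and_score=lambda c, x: (f"model-{c}-{x}", x / 10),
    clock=fake_clock,
)
```

As in step S36, the time check runs only after a trial finishes, so the last trial may end slightly past the limit.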

FIG. 14 is a flowchart showing an example of the procedure for estimating the performance improvement amount.
This performance improvement amount estimation is executed in step S32 described above.
(S40) The prediction performance estimation unit 135 selects one categorical hyperparameter value (categorical hyperparameter value c_i).

(S41) From the search history S, the prediction performance estimation unit 135 retrieves the search history S_i, which is the set of records containing the selected categorical hyperparameter value c_i.
(S42) The prediction performance estimation unit 135 initializes the point set R_i to the empty set φ. The prediction performance estimation unit 135 also normalizes the prediction performances f(x; c_i) contained in the search history S_i. The normalization function used here is determined based on the distribution of the prediction performances in the search history S_i. For example, the maximum value pmax and the minimum value pmin are found in the search history S_i, and the normalization function is determined as hp(f(x; c_i)) = (f(x; c_i) - pmin) / (pmax - pmin). Alternatively, the normalization function may be determined based on the distribution of the prediction performances in the whole search history S, which also contains records of other categorical hyperparameter values. For each continuous quantity hyperparameter value appearing in the search history S_i, the prediction performance estimation unit 135 adds the record (x, hp(f(x; c_i)), 0) to the point set R_i. The first term is the continuous quantity hyperparameter value, the second term is the normalized prediction performance, and the third term is the standard deviation.

(S43) The prediction performance estimation unit 135 selects another categorical hyperparameter value (categorical hyperparameter value c_j) different from the one selected in step S40.
(S44) From the search history S, the prediction performance estimation unit 135 retrieves the search history S_j, which is the set of records containing the selected categorical hyperparameter value c_j.

(S45) The prediction performance estimation unit 135 normalizes the prediction performances f(x; c_j) contained in the search history S_j. The normalization function used here is determined based on the distribution of the prediction performances in the search history S_j. For example, the maximum value pmax and the minimum value pmin are found in the search history S_j, and the normalization function is determined as hp(f(x; c_j)) = (f(x; c_j) - pmin) / (pmax - pmin). Alternatively, the normalization function may be determined based on the distribution of the prediction performances in the whole search history S, or the same normalization function as in step S42 may be used.

In addition, for each continuous quantity hyperparameter value appearing in the search history S_j, the prediction performance estimation unit 135 looks up, in the statistical table 142, the expected value pμ_ij(x) and the standard deviation pσ_ij(x) of the normalized prediction performance difference corresponding to (c_i, c_j). The statistical table 142 may not contain a record that exactly matches a continuous quantity hyperparameter value appearing in the search history S_j. In that case, one possible method is to use the record closest to the desired continuous quantity hyperparameter value. Another possible method is to estimate pμ_ij(x) and pσ_ij(x) for the desired continuous quantity hyperparameter value by linear interpolation between the records immediately before and after it.
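The two lookup strategies mentioned above (nearest record and linear interpolation) can be sketched as follows. The row layout `(x_k, p_mu, p_sigma)` for one (c_i, c_j) pair, the function name `lookup_stats`, and the clamping to the edge record outside the tabulated range are assumptions made for illustration.

```python
from bisect import bisect_left


def lookup_stats(table, x, method="linear"):
    """Return (p_mu, p_sigma) at x from rows (x_k, p_mu, p_sigma) sorted by x_k."""
    xs = [row[0] for row in table]
    pos = bisect_left(xs, x)
    if pos < len(xs) and xs[pos] == x:        # exact match in the table
        return table[pos][1], table[pos][2]
    if pos == 0:                              # below the range: clamp (assumption)
        return table[0][1], table[0][2]
    if pos == len(xs):                        # above the range: clamp (assumption)
        return table[-1][1], table[-1][2]
    lo, hi = table[pos - 1], table[pos]       # records before and after x
    if method == "nearest":
        row = lo if x - lo[0] <= hi[0] - x else hi
        return row[1], row[2]
    w = (x - lo[0]) / (hi[0] - lo[0])         # linear interpolation weight
    return lo[1] + w * (hi[1] - lo[1]), lo[2] + w * (hi[2] - lo[2])


table = [(1.0, 0.1, 0.02), (3.0, 0.3, 0.06)]
interpolated = lookup_stats(table, 2.0)
nearest = lookup_stats(table, 1.5, method="nearest")
```

For hyperparameters that are searched on a logarithmic scale, interpolating on the logarithm of x may be a more natural choice; that variant is not covered here.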

(S46) For each continuous quantity hyperparameter value appearing in the search history S_j, the prediction performance estimation unit 135 adds the record (x, hp(f(x; c_j)) + pμ_ij(x), pσ_ij(x)) to the point set R_i. The normalized prediction performance in the second term is the normalized prediction performance for the categorical hyperparameter value c_j plus the expected value of the normalized prediction performance difference. The standard deviation in the third term is the standard deviation of the normalized prediction performance difference.
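Steps S41 to S46 amount to merging measured points for c_i with converted points from every other categorical value. A minimal sketch under stated assumptions: `history` is a list of `(c, x, performance)` tuples, `stats` is a hypothetical callable returning `(pμ_ij(x), pσ_ij(x))`, normalization is min-max per categorical value, and a category with a single distinct performance is mapped to 0.0 to avoid division by zero.

```python
def min_max(values):
    """Return a min-max normalization function hp for the given values."""
    lo, hi = min(values), max(values)
    span = hi - lo
    # A zero span (one distinct value) maps everything to 0.0 (assumption).
    return lambda v: (v - lo) / span if span else 0.0


def build_point_set(history, c_i, stats):
    """Build R_i as a list of (x, normalized performance, standard deviation)."""
    points = []
    for c in sorted({rec[0] for rec in history}):
        records = [(x, f) for cat, x, f in history if cat == c]   # S41 / S44
        hp = min_max([f for _, f in records])                     # S42 / S45
        for x, f in records:
            if c == c_i:
                points.append((x, hp(f), 0.0))                    # measured point
            else:
                mu, sigma = stats(c_i, c, x)                      # table lookup
                points.append((x, hp(f) + mu, sigma))             # S46: converted point
    return points


history = [("a", 1.0, 0.6), ("a", 2.0, 0.8), ("b", 1.5, 0.5), ("b", 3.0, 0.7)]
points = build_point_set(history, "a", lambda c_i, c_j, x: (0.1, 0.05))
```

Measured points carry a standard deviation of 0, while converted points carry pσ_ij(x), which is what lets the later regression treat them as less certain.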

(S47) The prediction performance estimation unit 135 determines whether every other categorical hyperparameter value has been selected in step S43. If all of them have been selected, the process proceeds to step S48; if an unselected other categorical hyperparameter value remains, the process returns to step S43.

FIG. 15 is a flowchart (continued) showing an example of the procedure for estimating the performance improvement amount.
(S48) The prediction performance estimation unit 135 samples one continuous quantity hyperparameter value (continuous quantity hyperparameter value x_k) from the continuous quantity hyperparameter search range X. Preferably, a continuous quantity hyperparameter value that does not appear in the search history S_i is selected. Sampling is performed by, for example, a grid method or a random method.

(S49) Based on the points in the point set R_i, the prediction performance estimation unit 135 estimates the expected value and standard deviation of the normalized prediction performance hp(f(x_k; c_i)) for the continuous quantity hyperparameter value x_k selected in step S48. If the point set R_i contains no point with a non-zero standard deviation, that is, no estimated point, the expected value and standard deviation of hp(f(x_k; c_i)) can be calculated by ordinary regression analysis. If the point set R_i does contain a point with a non-zero standard deviation, the expected value and standard deviation of hp(f(x_k; c_i)) are calculated with the estimation uncertainty of that point taken into account. When estimated points are present, the standard deviation of hp(f(x_k; c_i)) often becomes larger.

For example, the prediction performance estimation unit 135 draws sample values from the distributions of the normalized prediction performances that have a non-zero standard deviation, and performs regression analysis using the drawn sample values. The prediction performance estimation unit 135 repeats this many times as a Monte Carlo simulation and averages the regression analysis results to obtain the expected value and standard deviation of hp(f(x_k; c_i)). As another example, the prediction performance estimation unit 135 obtains the expected value and standard deviation of hp(f(x_k; c_i)) analytically by an estimation method using a Gaussian process. An estimation method using a Gaussian process is described, for example, in the following document: P. W. Goldberg, C. K. I. Williams and C. M. Bishop, "Regression with Input-dependent Noise: A Gaussian Process Treatment", In Advances in Neural Information Processing 11 (NIPS 1997), 1997.
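The Monte Carlo variant can be sketched as follows. The description does not fix the regression model, so a straight-line least-squares fit is used here purely as a stand-in; the normal sampling distribution, the number of runs, and the function names are likewise assumptions.

```python
import random
from statistics import mean, pstdev


def fit_line(xs, ys):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sxx
    return slope, my - slope * mx


def monte_carlo_estimate(points, x_k, runs=2000, seed=0):
    """Estimate mean and standard deviation of the prediction at x_k.

    points: list of (x, normalized performance, standard deviation).
    Points with a non-zero standard deviation are resampled in every run.
    """
    rng = random.Random(seed)
    xs = [p[0] for p in points]
    predictions = []
    for _ in range(runs):
        ys = [rng.gauss(y, sd) if sd > 0 else y for _, y, sd in points]
        slope, intercept = fit_line(xs, ys)
        predictions.append(slope * x_k + intercept)
    return mean(predictions), pstdev(predictions)


measured = [(1.0, 0.1, 0.0), (3.0, 0.3, 0.0)]    # standard deviation 0
converted = [(2.0, 0.2, 0.05)]                   # carries pσ_ij(x)
expected, deviation = monte_carlo_estimate(measured + converted, x_k=4.0)
```

In this sketch the spread of the predictions reflects only the uncertainty of the converted points; a fuller treatment would also include the regression's own residual uncertainty.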

(S50) Based on the expected value and standard deviation of hp(f(x_k; c_i)) estimated in step S49, the prediction performance estimation unit 135 calculates the prediction performance improvement amount g(x_k; c_i) for the pair of the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_k. For example, the prediction performance estimation unit 135 calculates the upper limit of the 95% confidence interval (UCB: Upper Confidence Bound) from the expected value and standard deviation of hp(f(x_k; c_i)), and takes the UCB minus the current achieved prediction performance as the prediction performance improvement amount g(x_k; c_i). The achieved prediction performance is also normalized for this calculation. A value larger than the expected value of hp(f(x_k; c_i)) is used so that hyperparameter values that could achieve high prediction performance are not mistakenly discarded because of estimation uncertainty.

Instead of the UCB, the prediction performance estimation unit 135 may integrate the estimated distribution of the prediction performance to calculate the probability that the prediction performance exceeds the achieved prediction performance (PI: Probability of Improvement). Alternatively, it may integrate the estimated distribution of the prediction performance to calculate the expected amount by which the prediction performance exceeds the achieved prediction performance (EI: Expected Improvement).
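The three criteria (UCB, PI, EI) can be written in closed form if the estimated prediction performance is assumed to follow a normal distribution. That normality assumption, the factor z = 1.96 for the upper end of a two-sided 95% interval, and the function names are choices made for this sketch, not details stated in this description.

```python
from math import erf, exp, pi, sqrt


def norm_pdf(z):
    """Standard normal probability density."""
    return exp(-0.5 * z * z) / sqrt(2.0 * pi)


def norm_cdf(z):
    """Standard normal cumulative distribution."""
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))


def ucb_improvement(mu, sigma, best, z=1.96):
    """Upper confidence bound minus the achieved (normalized) performance."""
    return mu + z * sigma - best


def probability_of_improvement(mu, sigma, best):
    """Probability that the performance exceeds the achieved performance."""
    if sigma == 0:
        return 1.0 if mu > best else 0.0
    return 1.0 - norm_cdf((best - mu) / sigma)


def expected_improvement(mu, sigma, best):
    """Expected amount by which the performance exceeds the achieved performance."""
    if sigma == 0:
        return max(mu - best, 0.0)
    z = (mu - best) / sigma
    return (mu - best) * norm_cdf(z) + sigma * norm_pdf(z)


g_ucb = ucb_improvement(mu=0.5, sigma=0.1, best=0.6)
g_pi = probability_of_improvement(mu=0.5, sigma=0.1, best=0.5)
g_ei = expected_improvement(mu=0.5, sigma=0.1, best=0.5)
```

Note that the UCB-based value can be positive even when the expected value is below the achieved performance, which is the behavior step S50 relies on to avoid discarding uncertain candidates.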

(S51) The prediction performance estimation unit 135 determines whether a sufficient number of continuous quantity hyperparameter values have been sampled in step S48, that is, whether the number of sampled continuous quantity hyperparameter values has reached a predetermined threshold. If the number is sufficient, the process proceeds to step S52; if it is insufficient, the process returns to step S48.

(S52) From the prediction performance improvement amounts calculated in step S50, the prediction performance estimation unit 135 selects the maximum one. This becomes the prediction performance improvement amount for the categorical hyperparameter value c_i. The prediction performance estimation unit 135 also selects the continuous quantity hyperparameter value at which this prediction performance improvement amount is obtained.

(S53) The prediction performance estimation unit 135 determines whether every categorical hyperparameter value has been selected in step S40. If all of them have been selected, the performance improvement amount estimation ends; if an unselected categorical hyperparameter value remains, the process returns to step S40.

According to the machine learning device 100 of the second embodiment, information on the prediction performance differences between different categorical hyperparameter values is generated from the history of past machine learning. When machine learning is then performed on a new data set, the trial results of the hyperparameter search are converted and shared among the different categorical hyperparameter values based on that prediction performance difference information. This makes the hyperparameter search more efficient.

For example, compared with performing the hyperparameter search independently for each categorical hyperparameter value, the uncertainty of the estimated prediction performance curve can be reduced earlier, which makes it easier to reach hyperparameter values with sufficiently high prediction performance within the time limit. Compared with simply sharing trial results without considering the prediction performance differences that arise between different categorical hyperparameter values, the estimation accuracy of the prediction performance curve improves. Furthermore, using the standard deviation of the prediction performance difference allows the uncertainty of the estimated prediction performance curve to be reflected appropriately, which reduces the risk of overlooking hyperparameter values that yield high prediction performance.

[Third Embodiment]
Next, a third embodiment will be described. The description focuses on the differences from the second embodiment, and matters that are the same as in the second embodiment may be omitted.

Whereas the machine learning of the second embodiment uses the prediction performance improvement amount as the evaluation criterion for hyperparameter values, the machine learning of the third embodiment uses the prediction performance improvement speed. The prediction performance improvement speed is the prediction performance improvement amount per unit time, that is, the estimated prediction performance improvement amount divided by the estimated execution time. The machine learning device of the third embodiment can be realized with the same configuration as the machine learning device 100 of the second embodiment shown in FIGS. 2 and 8 to 10. The following description of the third embodiment therefore sometimes uses the same reference numerals as FIGS. 2 and 8 to 10.
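The effect of switching the criterion from improvement amount to improvement speed can be illustrated with a small sketch. The candidate values, the function name, and the `min_time` guard against division by zero are hypothetical.

```python
def improvement_speed(improvement, estimated_time, min_time=1e-9):
    """Prediction performance improvement amount per unit of execution time."""
    return improvement / max(estimated_time, min_time)


# (categorical value, continuous value) -> (estimated improvement, estimated time)
candidates = {
    ("rbf", 1.0): (0.10, 50.0),
    ("linear", 0.5): (0.06, 10.0),
}
best_by_amount = max(candidates, key=lambda k: candidates[k][0])
best_by_speed = max(candidates, key=lambda k: improvement_speed(*candidates[k]))
```

Here the candidate with the larger improvement amount loses under the speed criterion because its estimated execution time is five times longer.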

FIG. 16 is a flowchart showing an example of the machine learning procedure of the third embodiment.
(S60) The learning control unit 133 specifies the data set D, the time limit T, and the hyperparameter search range Θ. The hyperparameter search range Θ includes a categorical hyperparameter search range C and a continuous quantity hyperparameter search range X.

(S61) The prediction performance estimation unit 135 initializes the search history S to the empty set φ.
(S62) Referring to the search history S, the prediction performance estimation unit 135 calculates, for each categorical hyperparameter value in the categorical hyperparameter search range C, the maximum prediction performance improvement speed v(x; c), and identifies the continuous quantity hyperparameter value at which that improvement speed is obtained. The method for estimating the prediction performance improvement speed is described later.

(S63) From the prediction performance improvement speeds calculated in step S62 for the respective categorical hyperparameter values, the learning control unit 133 selects the maximum one. The learning control unit 133 identifies the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_ci at which the selected prediction performance improvement speed is obtained.

(S64) The learning execution unit 134 extracts training data and test data from the data set stored in the data set storage unit 123. The learning execution unit 134 executes machine learning based on the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_ci selected in step S63, and generates a model from the training data. Using the test data, the learning execution unit 134 measures the prediction performance f(x_ci; c_i) of the generated model. The learning execution unit 134 also measures the execution time t(x_ci; c_i). The learning control unit 133 acquires the model, the prediction performance f(x_ci; c_i), and the execution time t(x_ci; c_i).

(S65) The prediction performance estimation unit 135 acquires the categorical hyperparameter value c_i, the continuous quantity hyperparameter value x_ci, the prediction performance f(x_ci; c_i), and the execution time t(x_ci; c_i). The prediction performance estimation unit 135 adds the record (c_i, x_ci, f(x_ci; c_i), t(x_ci; c_i)) to the search history S.

(S66) The learning control unit 133 determines whether the time elapsed since the execution of step S60 has exceeded the time limit T. If the elapsed time has exceeded the time limit T, the process proceeds to step S67; if the elapsed time is equal to or less than the time limit T, the process returns to step S62.

(S67) The learning control unit 133 stores, in the learning result storage unit 124, the model with the highest prediction performance among the models generated so far. The learning control unit 133 also stores, in the learning result storage unit 124, the prediction performance of that model (achieved prediction performance) and the categorical hyperparameter value and continuous quantity hyperparameter value used to generate it. The learning control unit 133 may additionally store the execution time required to generate that model.

FIG. 17 is a flowchart showing an example of the procedure for estimating the performance improvement speed.
(S70) The prediction performance estimation unit 135 selects one categorical hyperparameter value (categorical hyperparameter value c_i).

(S71) From the search history S, the prediction performance estimation unit 135 retrieves the search history S_i, which is the set of records containing the selected categorical hyperparameter value c_i.
(S72) The prediction performance estimation unit 135 initializes the point set pR_i to the empty set φ. The prediction performance estimation unit 135 also normalizes the prediction performances f(x; c_i) contained in the search history S_i. For each continuous quantity hyperparameter value appearing in the search history S_i, the prediction performance estimation unit 135 adds the record (x, hp(f(x; c_i)), 0) to the point set pR_i.

(S73) The prediction performance estimation unit 135 initializes the point set tR_i to the empty set φ. The prediction performance estimation unit 135 also normalizes the execution times t(x; c_i) contained in the search history S_i. The normalization function used here is determined based on the distribution of the execution times in the search history S_i. For example, the maximum value tmax and the minimum value tmin are found in the search history S_i, and the normalization function is determined as ht(t(x; c_i)) = (t(x; c_i) - tmin) / (tmax - tmin). Alternatively, the normalization function may be determined based on the distribution of the execution times in the whole search history S, which also contains records of other categorical hyperparameter values. For each continuous quantity hyperparameter value appearing in the search history S_i, the prediction performance estimation unit 135 adds the record (x, ht(t(x; c_i)), 0) to the point set tR_i. The first term is the continuous quantity hyperparameter value, the second term is the normalized execution time, and the third term is the standard deviation.

(S74) The prediction performance estimation unit 135 selects one other categorical hyperparameter value (categorical hyperparameter value c_j) different from the one selected in step S70.
(S75) The prediction performance estimation unit 135 retrieves from the search history S the search history S_j, which is the set of records containing the selected categorical hyperparameter value c_j.

(S76) The prediction performance estimation unit 135 normalizes the prediction performance f(x; c_j) included in the search history S_j. In addition, for each continuous quantity hyperparameter value appearing in the search history S_j, the prediction performance estimation unit 135 retrieves from the statistical table 142 the expected value pμ_ij(x) and the standard deviation pσ_ij(x) of the normalized prediction performance difference corresponding to (c_i, c_j).

(S77) For each continuous quantity hyperparameter value appearing in the search history S_j, the prediction performance estimation unit 135 adds a record (x, hp(f(x; c_j)) + pμ_ij(x), pσ_ij(x)) to the point set pR_i. The normalized prediction performance in the second term is the normalized prediction performance for the categorical hyperparameter value c_j plus the expected value of the normalized prediction performance difference. The standard deviation in the third term is the standard deviation of the normalized prediction performance difference.

(S78) The prediction performance estimation unit 135 normalizes the execution time t(x; c_j) included in the search history S_j. The normalization function is determined based on the distribution of execution times in the search history S_j. For example, the maximum value tmax and the minimum value tmin are retrieved from the search history S_j, and the normalization function is set to ht(t(x; c_j)) = (t(x; c_j) - tmin) / (tmax - tmin). Alternatively, the normalization function may be determined based on the distribution of the prediction performance of the search history S, or the same normalization function as in step S73 may be used.

In addition, for each continuous quantity hyperparameter value appearing in the search history S_j, the prediction performance estimation unit 135 retrieves from the statistical table 142 the expected value tμ_ij(x) and the standard deviation tσ_ij(x) of the normalized execution time difference corresponding to (c_i, c_j). The statistical table 142 may contain no record that exactly matches a continuous quantity hyperparameter value appearing in the search history S_j. In that case, one option is to use the record closest to the desired continuous quantity hyperparameter value. Another option is to estimate tμ_ij(x) and tσ_ij(x) for the desired continuous quantity hyperparameter value by linear interpolation from the records immediately before and after it.
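Both fallback strategies for a missing table entry (nearest record and linear interpolation) can be sketched as below. The table layout, a list of (x, mean, standard deviation) rows sorted by x for one pair (c_i, c_j), and the function name `lookup_stats` are assumptions for illustration; clamping to the end rows outside the table range is likewise a choice made here, not something the text specifies.

```python
import bisect


def lookup_stats(table, x, method="linear"):
    """Return (mu, sigma) at x from rows (x, mu, sigma) sorted by x."""
    xs = [row[0] for row in table]
    pos = bisect.bisect_left(xs, x)
    if pos < len(xs) and xs[pos] == x:
        return table[pos][1], table[pos][2]      # exact match
    if pos == 0:
        return table[0][1], table[0][2]          # below range: clamp (assumption)
    if pos == len(xs):
        return table[-1][1], table[-1][2]        # above range: clamp (assumption)
    lo, hi = table[pos - 1], table[pos]          # neighbors around x
    if method == "nearest":
        row = lo if x - lo[0] <= hi[0] - x else hi
        return row[1], row[2]
    w = (x - lo[0]) / (hi[0] - lo[0])            # linear interpolation weight
    return lo[1] + w * (hi[1] - lo[1]), lo[2] + w * (hi[2] - lo[2])


# Hypothetical rows of the statistical table for one pair (c_i, c_j).
table_ij = [(0.0, 0.1, 0.02), (1.0, 0.3, 0.04)]
mu, sigma = lookup_stats(table_ij, 0.5)
```

If the continuous hyperparameter is searched on a logarithmic scale, interpolating over log(x) rather than x may be more appropriate; the sketch leaves that choice to the caller.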

(S79) For each continuous quantity hyperparameter value appearing in the search history S_j, the prediction performance estimation unit 135 adds a record (x, ht(t(x; c_j)) + tμ_ij(x), tσ_ij(x)) to the point set tR_i. The normalized execution time in the second term is the normalized execution time for the categorical hyperparameter value c_j plus the expected value of the normalized execution time difference. The standard deviation in the third term is the standard deviation of the normalized execution time difference.
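Steps S77 and S79 apply the same operation to prediction performance and to execution time: a value measured under c_j is shifted by the expected difference, and the standard deviation of that difference is attached as the uncertainty of the resulting point. A minimal sketch, assuming a hypothetical helper `transfer_records` and a dictionary `stats_ij` that already holds an exact (mean, standard deviation) entry for every x in S_j:

```python
def transfer_records(history_j, normalize, stats_ij):
    """Build records (x, normalized value + mu_ij(x), sigma_ij(x)).

    history_j: list of (x, raw value measured under c_j)
    normalize: normalization function such as hp or ht
    stats_ij:  dict mapping x to (mu_ij(x), sigma_ij(x))  (assumed layout)
    """
    records = []
    for x, value in history_j:
        mu, sigma = stats_ij[x]
        records.append((x, normalize(value) + mu, sigma))
    return records


# Hypothetical execution times measured under c_j, and table entries.
history_j = [(1.0, 30.0), (10.0, 50.0)]
stats_ij = {1.0: (-0.1, 0.05), 10.0: (0.2, 0.1)}

# Example normalizer with tmin = 10 and tmax = 50 (illustrative values).
tR_i_extra = transfer_records(history_j, lambda t: (t - 10.0) / 40.0, stats_ij)
```

Unlike the records taken directly from S_i, these points have a nonzero third term, which lets a later estimator weight them less than directly measured points.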

(S80) The prediction performance estimation unit 135 determines whether all other categorical hyperparameter values have been selected in step S74. If all of them have been selected, the process proceeds to step S81; if an unselected categorical hyperparameter value remains, the process returns to step S74.

FIG. 18 is a flowchart (continued) showing an example of the procedure for estimating the performance improvement speed.
(S81) The prediction performance estimation unit 135 samples one continuous quantity hyperparameter value (continuous quantity hyperparameter value x_k) from the continuous quantity hyperparameter search range X. Preferably, a continuous quantity hyperparameter value that does not appear in the search history S_i is selected.

(S82) Based on the points in the point set pR_i, the prediction performance estimation unit 135 estimates the expected value and the standard deviation of the normalized prediction performance hp(f(x_k; c_i)) for the continuous quantity hyperparameter value x_k selected in step S81.
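This passage does not state which estimator is used in step S82. As one possible illustration, the sketch below uses Gaussian process regression with an RBF kernel, treating the third term of each record as per-point observation noise. The kernel, its length scale, the constant prior mean, and all function names are assumptions introduced here, not details taken from the source.

```python
import math


def rbf(a, b, length=1.0):
    """Squared-exponential kernel with unit signal variance (assumption)."""
    return math.exp(-0.5 * ((a - b) / length) ** 2)


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][c] * z[c] for c in range(r + 1, n))
        z[r] = (M[r][n] - s) / M[r][r]
    return z


def gp_predict(points, x_k, length=1.0, jitter=1e-9):
    """points: records (x, y, sigma) as in pR_i. Returns (mean, std) at x_k."""
    xs = [p[0] for p in points]
    ys = [p[1] for p in points]
    prior_mean = sum(ys) / len(ys)
    n = len(points)
    K = [[rbf(xs[i], xs[j], length) for j in range(n)] for i in range(n)]
    for i in range(n):
        # Per-point noise variance from the third term, plus a small jitter.
        K[i][i] += points[i][2] ** 2 + jitter
    k_star = [rbf(x, x_k, length) for x in xs]
    alpha = solve(K, [y - prior_mean for y in ys])
    beta = solve(K, k_star)
    mean = prior_mean + sum(k * a for k, a in zip(k_star, alpha))
    var = rbf(x_k, x_k, length) - sum(k * b for k, b in zip(k_star, beta))
    return mean, math.sqrt(max(var, 0.0))


# Hypothetical pR_i: two measured points (sigma 0) and one transferred point.
pR_i = [(0.0, 0.2, 0.0), (1.0, 0.6, 0.0), (2.0, 0.5, 0.1)]
mean_k, std_k = gp_predict(pR_i, 1.5)
```

With this choice, points measured directly for c_i are reproduced almost exactly, while points transferred from other categorical values only pull the estimate in proportion to their stated uncertainty.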

(S83) Based on the expected value and the standard deviation of hp(f(x_k; c_i)) estimated in step S82, the prediction performance estimation unit 135 calculates the prediction performance improvement amount g(x_k; c_i) for the pair of the categorical hyperparameter value c_i and the continuous quantity hyperparameter value x_k.
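The formula for the improvement amount g is not given in this passage. Purely as an illustration of how an expected value and a standard deviation can be turned into an improvement amount, the sketch below uses the expected improvement criterion common in Bayesian optimization. This is a stand-in technique, and the function name and the `best` argument (the best normalized prediction performance achieved so far) are assumptions.

```python
import math


def expected_improvement(mean, std, best):
    """Expected amount by which a Normal(mean, std) value exceeds `best`."""
    if std <= 0.0:
        # No uncertainty: the improvement is deterministic.
        return max(mean - best, 0.0)
    z = (mean - best) / std
    pdf = math.exp(-0.5 * z * z) / math.sqrt(2.0 * math.pi)
    cdf = 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))
    return (mean - best) * cdf + std * pdf


# Hypothetical estimate for hp(f(x_k; c_i)) and the best value so far.
g = expected_improvement(mean=0.4, std=0.1, best=0.4)
```

A candidate whose expected value equals the current best still has a positive improvement amount as long as its standard deviation is nonzero, which is what makes uncertain candidates worth trying.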

(S84) Based on the points in the point set tR_i, the prediction performance estimation unit 135 estimates the expected value of the normalized execution time ht(t(x_k; c_i)) for the continuous quantity hyperparameter value x_k selected in step S81. This expected value can be estimated in the same way as the expected value of the normalized prediction performance hp(f(x_k; c_i)) in step S82. However, when estimating the expected value of the normalized execution time ht(t(x_k; c_i)), the uncertainty (standard deviation) of the estimated point need not be taken into account.

(S85) The prediction performance estimation unit 135 calculates the prediction performance improvement speed v(x_k; c_i) by dividing the prediction performance improvement amount g(x_k; c_i) calculated in step S83 by the expected value of the normalized execution time calculated in step S84.

(S86) The prediction performance estimation unit 135 determines whether a sufficient number of continuous quantity hyperparameter values have been sampled in step S81, that is, whether the number of sampled continuous quantity hyperparameter values has reached a predetermined threshold. If the number is sufficient, the process proceeds to step S87; if it is insufficient, the process returns to step S81.

(S87) The prediction performance estimation unit 135 selects the maximum prediction performance improvement speed from among the prediction performance improvement speeds calculated in step S85. This becomes the prediction performance improvement speed for the categorical hyperparameter value c_i. The prediction performance estimation unit 135 also selects the continuous quantity hyperparameter value at which this prediction performance improvement speed is obtained.
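Steps S85 through S87 amount to dividing each candidate's improvement amount by its expected normalized execution time and keeping the candidate with the largest quotient. A minimal sketch, assuming the improvement amounts and expected times are already available; the function name and the small floor `eps` are assumptions. The floor is there because a min-max normalized execution time can be exactly 0, and the text does not say how that case is handled.

```python
def best_candidate(candidates, eps=1e-6):
    """candidates: list of (x_k, g, expected normalized time).

    Returns (x_k, v) with the largest speed v = g / expected time.
    """
    best_x, best_v = None, float("-inf")
    for x_k, g, expected_time in candidates:
        # Assumption: floor the denominator to avoid division by zero.
        v = g / max(expected_time, eps)
        if v > best_v:
            best_x, best_v = x_k, v
    return best_x, best_v


# Hypothetical sampled values x_k for one categorical value c_i.
candidates = [(0.1, 0.02, 0.1), (1.0, 0.05, 0.5), (10.0, 0.06, 1.0)]
x_best, v_best = best_candidate(candidates)
```

In this example the candidate with the smallest improvement amount wins because it is expected to finish much faster, which is the trade-off the speed criterion is meant to capture.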

(S88) The prediction performance estimation unit 135 determines whether all categorical hyperparameter values have been selected in step S70. If all of them have been selected, the performance improvement speed estimation ends; if an unselected categorical hyperparameter value remains, the process returns to step S70.

The machine learning device of the third embodiment provides the same effects as the second embodiment. In addition, in the third embodiment, the hyperparameter value with the largest estimated prediction performance improvement speed is selected next. Therefore, even when the time limit is short, hyperparameter values that yield high prediction performance can be searched for efficiently.

10 Search device
11 Storage unit
12 Processing unit
13 Prediction performance difference information
14, 15 Hyperparameters
14a, 14b, 15a Values
16 Prediction performance

Claims

A search program that causes a computer to execute a process comprising:
acquiring prediction performance difference information that, for results of machine learning executed in the past on a plurality of data sets, the machine learning being controlled based on a first hyperparameter for which no magnitude relationship is defined between different values and a second hyperparameter for which a magnitude relationship is defined between different values, indicates the difference between the prediction performance when a first value is set in the first hyperparameter and the prediction performance when a second value is set in the first hyperparameter;
calculating a first prediction performance by executing the machine learning on another data set different from the plurality of data sets, with the first value set in the first hyperparameter and a third value set in the second hyperparameter; and
selecting, based on the first prediction performance and the prediction performance difference information, a combination of a value of the first hyperparameter and a value of the second hyperparameter to be used when the machine learning is next executed on the other data set.

The search program according to claim 1, wherein, in selecting the combination, a second prediction performance, which would be calculated if the second value were set in the first hyperparameter and the third value were set in the second hyperparameter, is estimated based on the first prediction performance and the prediction performance difference information, and the combination is selected based on the first prediction performance and the second prediction performance.

The search program according to claim 2, wherein
the prediction performance difference information indicates, for each of a plurality of values that can be set in the second hyperparameter, the difference between the prediction performance when the first value is set in the first hyperparameter and the prediction performance when the second value is set, and
in selecting the combination, the second prediction performance is estimated based on the difference corresponding to the third value among the plurality of differences, indicated by the prediction performance difference information, that correspond to the plurality of values.

The search program according to claim 1, further causing the computer to execute a process of calculating a plurality of prediction performances by executing the machine learning on the plurality of data sets, and generating the prediction performance difference information by normalizing the plurality of prediction performances based on the distribution of prediction performance for each data set.

The search program according to claim 4, wherein, in generating the prediction performance difference information, the plurality of prediction performances are normalized based on the maximum value and the minimum value of the prediction performance for each data set.

A search method executed by a computer, the method comprising:
acquiring prediction performance difference information that, for results of machine learning executed in the past on a plurality of data sets, the machine learning being controlled based on a first hyperparameter for which no magnitude relationship is defined between different values and a second hyperparameter for which a magnitude relationship is defined between different values, indicates the difference between the prediction performance when a first value is set in the first hyperparameter and the prediction performance when a second value is set in the first hyperparameter;
calculating a first prediction performance by executing the machine learning on another data set different from the plurality of data sets, with the first value set in the first hyperparameter and a third value set in the second hyperparameter; and
selecting, based on the first prediction performance and the prediction performance difference information, a combination of a value of the first hyperparameter and a value of the second hyperparameter to be used when the machine learning is next executed on the other data set.

A search device comprising:
a storage unit that stores prediction performance difference information that, for results of machine learning executed in the past on a plurality of data sets, the machine learning being controlled based on a first hyperparameter for which no magnitude relationship is defined between different values and a second hyperparameter for which a magnitude relationship is defined between different values, indicates the difference between the prediction performance when a first value is set in the first hyperparameter and the prediction performance when a second value is set in the first hyperparameter; and
a processing unit that calculates a first prediction performance by executing the machine learning on another data set different from the plurality of data sets, with the first value set in the first hyperparameter and a third value set in the second hyperparameter, and that selects, based on the first prediction performance and the prediction performance difference information, a combination of a value of the first hyperparameter and a value of the second hyperparameter to be used when the machine learning is next executed on the other data set.