JP7231829B2

JP7231829B2 - Machine learning program, machine learning method and machine learning apparatus

Info

Publication number: JP7231829B2
Application number: JP2019137027A
Authority: JP
Inventors: Kenichi Kobayashi (小林 健一)
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 2019-07-25
Filing date: 2019-07-25
Publication date: 2023-03-02
Anticipated expiration: 2039-07-25
Also published as: JP2021022051A

Description

The present invention relates to a machine learning program, a machine learning method, and a machine learning apparatus.

Machine learning is one form of computer-based data analysis. In machine learning, training data representing a number of known cases is input to a computer. The computer analyzes the training data and learns a model that generalizes the relationship between factors (sometimes called explanatory variables or independent variables) and outcomes (sometimes called objective variables or dependent variables). The learned model can then be used to predict outcomes for unknown cases.

An important concern in machine learning is the accuracy of the learned model, that is, its ability to correctly predict the outcomes of unknown cases (sometimes called model accuracy, predictive performance, or simply performance). Model accuracy depends on many factors, such as the nature of the phenomenon being analyzed, the size of the training data used to train the model, and the machine learning algorithm. If the resulting model is insufficiently accurate, machine learning may be rerun after adjustments such as increasing the size of the training data or changing the machine learning algorithm. For this reason, model accuracy is often evaluated by extracting, from the data population, test data representing known cases different from the training data, and inputting that test data to the model learned from the training data.

A generalization-ability evaluation method has been proposed in which multiple pairs of training data and test data are generated from the same data population by different splitting methods; for each pair, a model is learned from the training data and its accuracy is evaluated with the test data, and the average of the accuracies is then calculated. A result prediction device has also been proposed that extracts training data from a database, performs regression analysis, evaluates the accuracy of the regression model, and, if the accuracy is insufficient, adds training data and performs the regression analysis again. In addition, an information processing system has been proposed that extracts test data similar to the training data from a data population of samples labeled with teacher labels, and uses that similar test data to evaluate the accuracy of a classification model learned from the training data.

JP-A-9-54764
Japanese Unexamined Patent Application Publication No. 2014-13560
WO2017/183548

However, evaluating model accuracy raises the question of how large the test data should be. If the test data is too small, the result is strongly affected by which samples happen to be selected as test data, so the calculated accuracy becomes inaccurate and unreliable. On the other hand, if the test data is too large, evaluating accuracy takes a long time, which is inefficient. In conventional machine learning, the test data size has been set by rules of thumb, such as about 1/2 to 1/4 of the training data size. There is therefore room for improvement in how model accuracy is evaluated using test data.

In one aspect, an object of the present invention is to provide a machine learning program, a machine learning method, and a machine learning apparatus that can appropriately determine the size of test data for machine learning.

In one aspect, there is provided a machine learning program that causes a computer to execute the following processing. Using a plurality of pieces of first training data extracted from a data set, a plurality of first models corresponding to those pieces of first training data are learned by machine learning. Each of two or more records included in first test data extracted from the data set is input to the plurality of first models, thereby generating error information that indicates the prediction error calculated for each combination of one of the first models and one of the records. Based on the error information, a correspondence is determined between the size of test data and the variance of a measured value of model accuracy calculated using that test data. When the accuracy of a second model, learned using second training data extracted from the data set, is measured using second test data extracted from the data set, the size of the second test data is determined, based on the correspondence, so that the variance of the accuracy measurement calculated for the second model satisfies a predetermined condition.

In another aspect, a machine learning method executed by a computer is provided. In yet another aspect, a machine learning apparatus having a storage unit and a processing unit is provided.

In one aspect, the size of test data for machine learning is determined appropriately.

FIG. 1 is a diagram illustrating an example of a machine learning apparatus according to a first embodiment.
FIG. 2 is a diagram illustrating an example of the hardware of a machine learning apparatus according to a second embodiment.
FIG. 3 is a graph illustrating an example of the relationship between training data size and predictive performance.
FIG. 4 is a graph illustrating an example of the variance of predictive performance measurements.
FIG. 5 is a graph illustrating an example of the expected loss and expected bias of predictive performance.
FIG. 6 is a block diagram illustrating an example of the functions of the machine learning apparatus.
FIG. 7 is a diagram illustrating an example of an error profile table.
FIG. 8 is a diagram illustrating an example of a variance function table.
FIG. 9 is a flowchart illustrating an example of a machine learning procedure.
FIG. 10 is a flowchart (continued) illustrating the example of the machine learning procedure.

Hereinafter, embodiments will be described with reference to the drawings.
[First embodiment]
A first embodiment will be described.

FIG. 1 is a diagram illustrating an example of a machine learning apparatus according to the first embodiment.
The machine learning apparatus 10 of the first embodiment generates a model by machine learning using training data and measures the accuracy of the model using test data. The machine learning apparatus 10 may also be called an information processing apparatus or a computer. It may be a client device operated by a user or a server device accessed by other devices.

The machine learning apparatus 10 includes a storage unit 11 and a processing unit 12. The storage unit 11 may be volatile semiconductor memory such as RAM (Random Access Memory), or nonvolatile storage such as an HDD (Hard Disk Drive) or flash memory. The processing unit 12 is, for example, a processor such as a CPU (Central Processing Unit), a GPU (Graphics Processing Unit), or a DSP (Digital Signal Processor). The processing unit 12 may also include special-purpose electronic circuits such as an ASIC (Application Specific Integrated Circuit) or an FPGA (Field Programmable Gate Array). The processor executes programs stored in memory such as RAM (which may be the storage unit 11). A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor."

The storage unit 11 stores a data set 13. The data set 13 contains multiple records representing known cases. A record may also be called a sample, a row, or a data block. Each record contains the values of one or more explanatory variables and the value of one objective variable. An explanatory variable may also be called a column. The value of the objective variable is the correct answer supplied by the user and may also be called a teacher label. The data set 13 may contain a large number of records, one million or more, and may be large-scale data referred to as big data.

The processing unit 12 extracts, from the data set 13, multiple sets of training data (first training data), including training data 14a, 14b, and 14c. Only a few sets need to be extracted here, and each set can be small. For example, about 10 sets are extracted, each of about 10,000 records. Each set may be about 1/100 the size of the training data 18 described later. The processing unit 12 may extract records from the data set 13 so that no record appears in more than one set of training data, or it may allow the same record to appear in different sets. The processing unit 12 may extract the records from the data set 13 at random.
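The extraction of several small training sets can be sketched as follows. This is a minimal Python illustration, assuming the data set is an in-memory sequence of records and that a seeded `random.Random` stands in for the random extraction; `sample_training_sets` and its parameters are hypothetical names, not part of the original disclosure.

```python
import random
from typing import List, Sequence, TypeVar

T = TypeVar("T")

def sample_training_sets(dataset: Sequence[T], num_sets: int, set_size: int,
                         disjoint: bool = True, seed: int = 0) -> List[List[T]]:
    """Draw num_sets random training sets of set_size records each.

    disjoint=True:  no record appears in more than one set.
    disjoint=False: each set is sampled independently, so sets may overlap.
    """
    rng = random.Random(seed)
    if disjoint:
        if num_sets * set_size > len(dataset):
            raise ValueError("data set too small for disjoint training sets")
        # Shuffle the record indices once, then cut consecutive chunks.
        indices = list(range(len(dataset)))
        rng.shuffle(indices)
        return [[dataset[i] for i in indices[k * set_size:(k + 1) * set_size]]
                for k in range(num_sets)]
    # Independent draws: a record may be chosen for several sets.
    population = list(dataset)
    return [rng.sample(population, set_size) for _ in range(num_sets)]
```

With `disjoint=True` no record is shared between sets, matching the first option in the text; `disjoint=False` corresponds to the second option, where overlap is allowed.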

The processing unit 12 also extracts test data 15 (first test data) from the data set 13. The test data 15 may be smaller than each of the training data 14a, 14b, and 14c, for example about 1/2 to 1/4 of their size, such as about 5,000 records. The processing unit 12 preferably extracts the records of the test data 15 from the data set 13 so that they do not overlap with the training data 14a, 14b, and 14c.
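Extracting test records that do not overlap the training records might look like this. The sketch assumes records are identified by their index in the data set and that the indices already used for training are known; `sample_test_indices` is a hypothetical helper.

```python
import random
from typing import Iterable, List

def sample_test_indices(n_records: int, used: Iterable[int], test_size: int,
                        seed: int = 1) -> List[int]:
    """Randomly pick test_size record indices not used for any training set."""
    used_set = set(used)
    remaining = [i for i in range(n_records) if i not in used_set]
    if test_size > len(remaining):
        raise ValueError("not enough unused records for the test data")
    rng = random.Random(seed)
    return rng.sample(remaining, test_size)
```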

Using the multiple sets of training data, including training data 14a, 14b, and 14c, the processing unit 12 learns by machine learning multiple models corresponding to those sets. One model is learned from training data 14a, another is learned independently from training data 14b, and yet another is learned independently from training data 14c.

The same machine learning algorithm is used to learn all of these models. The algorithm may be specified by the user. Examples of machine learning algorithms include regression analysis, support vector machines, and random forests. A model expresses the relationship between the explanatory variables and the objective variable, and typically includes one or more explanatory variables, one or more coefficients, and one objective variable. The coefficients are determined from the training data through machine learning.
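To keep the later sketches concrete, the following assumes the simplest possible "same algorithm": ordinary least-squares regression with one explanatory variable `x` and the objective variable `y`. This is a stand-in chosen for illustration; the text names regression analysis, support vector machines, and random forests only as examples and does not prescribe this one. `fit_linear` and `fit_all` are hypothetical names.

```python
from typing import List, Sequence, Tuple

# Hypothetical record layout: (explanatory variable x, objective variable y).
Record = Tuple[float, float]
# A learned model is represented by its coefficients (intercept a, slope b).
Model = Tuple[float, float]

def fit_linear(train: Sequence[Record]) -> Model:
    """Ordinary least squares for y = a + b * x."""
    n = len(train)
    mean_x = sum(x for x, _ in train) / n
    mean_y = sum(y for _, y in train) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in train)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in train)
    if sxx == 0:
        raise ValueError("x must vary within the training data")
    b = sxy / sxx
    return mean_y - b * mean_x, b

def fit_all(training_sets: Sequence[Sequence[Record]]) -> List[Model]:
    """Apply the same algorithm to each training set independently."""
    return [fit_linear(ts) for ts in training_sets]
```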

Next, the processing unit 12 generates error information 16 using the test data 15 and the learned models. The error information 16 may also be called an error profile. It indicates the prediction error calculated for each combination of one of the models (one per set of training data) and one of two or more records included in the test data 15.

Specifically, the processing unit 12 inputs one record of the test data 15 into one model learned from a given set of training data, and thereby calculates one prediction error for that pair of model and record. For example, the processing unit 12 substitutes the values of the explanatory variables contained in a record of the test data 15 into the explanatory variables of the model. It then compares the predicted value, which is the value of the objective variable calculated by the model, with the correct value, which is the value of the objective variable contained in the record, and calculates the difference between the two as the prediction error.
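The error profile can be held as a matrix with one row per model and one column per test record. This sketch assumes the (intercept, slope) models of a one-variable linear regression and takes the error as predicted value minus correct value; that sign convention and the names `predict` and `error_profile` are assumptions for illustration.

```python
from typing import List, Sequence, Tuple

Model = Tuple[float, float]   # (intercept a, slope b); assumed model form
Record = Tuple[float, float]  # (explanatory x, correct objective value y)

def predict(model: Model, x: float) -> float:
    """Substitute the explanatory variable into the model."""
    a, b = model
    return a + b * x

def error_profile(models: Sequence[Model],
                  test: Sequence[Record]) -> List[List[float]]:
    """errors[i][j] is the prediction error of model i on test record j."""
    return [[predict(m, x) - y for x, y in test] for m in models]
```

For example, `error_profile([(0.0, 1.0), (1.0, 1.0)], [(1.0, 1.0), (2.0, 3.0)])` returns `[[0.0, -1.0], [1.0, 0.0]]`.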

Next, the processing unit 12 determines a correspondence 17 based on the error information 16. The correspondence 17 indicates the relationship between the size of test data and the variance, that is, the degree of spread, of the model accuracy measurements calculated using that test data. Model accuracy is the ability to correctly predict the outcomes of unknown cases and may also be called predictive performance or performance. Indicators of model accuracy include accuracy (correct answer rate), precision, mean squared error (MSE), and root mean squared error (RMSE).

The correspondence 17 is, for example, a nonlinear relationship in which the variance decreases as the test data size increases, asymptotically approaching a lower bound. In general, extracting test data from the data set 13 involves randomness in which records are selected. When the test data is small, the accuracy measurement is therefore strongly influenced by that randomness, and the risk that it deviates from the true value increases. Increasing the test data size reduces the variance. However, extracting training data from the data set 13 also involves randomness in record selection, so increasing the test data size alone cannot reduce the variance of the accuracy measurement to zero.
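The floor described above can be made precise with the law of total variance. As an illustrative assumption not stated in the text, let the accuracy measurement $M$ be an average of per-record losses over $n_{\text{test}}$ test records drawn independently given the training data $D$, and let $\sigma^2(D)$ be the variance of one record's loss under the model learned from $D$. Then

$$
\operatorname{Var}[M]
= \mathbb{E}\big[\operatorname{Var}[M \mid D]\big] + \operatorname{Var}\big[\mathbb{E}[M \mid D]\big]
= \frac{\bar{\sigma}^2}{n_{\text{test}}} + \tau^2,
\qquad \bar{\sigma}^2 = \mathbb{E}\big[\sigma^2(D)\big],
$$

where $\tau^2 = \operatorname{Var}\big[\mathbb{E}[M \mid D]\big]$ is the variance caused by the random choice of training records. The first term vanishes as $n_{\text{test}}$ grows, while $\tau^2$ does not depend on the test size at all, which is why enlarging the test data alone cannot drive the variance to zero.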

The correspondence 17 may depend on the data set 13 and on the machine learning algorithm used. The processing unit 12 therefore determines the correspondence 17 from the error information 16. For example, the processing unit 12 calculates a prediction bias for each record of the test data 15 by averaging those prediction errors in the error information 16 that share the same test record but come from different models. The processing unit 12 then combines the prediction biases of the two or more records of the test data 15 to determine the values of the parameters that define the correspondence 17.
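The per-record prediction bias is the column mean of the error profile. The text only says that the biases are then combined to determine the parameters of the correspondence 17; the population variance used below is one hypothetical way of combining them, and `prediction_biases` and `bias_spread` are illustrative names.

```python
from statistics import pvariance
from typing import List, Sequence

def prediction_biases(errors: Sequence[Sequence[float]]) -> List[float]:
    """For each test record (column), average the errors of all models (rows)."""
    n_models = len(errors)
    n_records = len(errors[0])
    return [sum(row[j] for row in errors) / n_models for j in range(n_records)]

def bias_spread(errors: Sequence[Sequence[float]]) -> float:
    """Hypothetical aggregate: population variance of the per-record biases."""
    return pvariance(prediction_biases(errors))
```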

The correspondence 17 may be a variance function that calculates the variance from a first parameter that does not depend on the training data size, a second parameter that depends on the training data size, and a third parameter that indicates the test data size. In this case, the processing unit 12 may estimate the value of the first parameter using the error information 16. The variance function then becomes a function whose variables are the second and third parameters.
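The text does not give the formula of the variance function at this point. As a hypothetical example only, the sketch below takes the form $V = \beta + \alpha / n_{\text{test}}$, where `alpha` plays the role of the first parameter (independent of training data size), `beta` the second (dependent on training data size), and `n_test` the third. It reproduces the qualitative behavior described above: it decreases with test size and is bounded below by `beta`.

```python
def variance_function(alpha: float, beta: float, n_test: int) -> float:
    """Hypothetical variance of an accuracy measurement.

    alpha:  first parameter, assumed independent of training data size
    beta:   second parameter, assumed to depend on training data size
    n_test: third parameter, the test data size (number of records)
    """
    if n_test <= 0:
        raise ValueError("n_test must be positive")
    # The test-sampling term shrinks as 1 / n_test; beta is the floor.
    return beta + alpha / n_test
```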

Next, the processing unit 12 extracts training data 18 (second training data) from the data set 13. The training data 18 may be much larger than the training data 14a, 14b, and 14c, and its size may be specified by the user, for example about one million records. The processing unit 12 learns a model using the training data 18.

After the model is learned, the processing unit 12 extracts test data 19 (second test data) from the data set 13. The processing unit 12 preferably extracts the records of the test data 19 from the data set 13 so that they do not overlap with the training data 18. The processing unit 12 measures the accuracy of the model learned from the training data 18 using the test data 19. For example, the processing unit 12 substitutes the explanatory variable values in each record of the test data 19 into the model's explanatory variables, and measures the accuracy by comparing the objective variable value predicted by the model with the correct objective variable value contained in that record.

At this time, the processing unit 12 determines the size of the test data 19 based on the correspondence 17 so that the variance of the model accuracy measurement satisfies a predetermined condition. For example, the processing unit 12 calculates, from the correspondence 17, an efficiency index indicating how much the variance decreases for a predetermined increase in size, and determines the size of the test data 19 based on that index. If the correspondence 17 is a nonlinear relationship in which the variance asymptotically approaches a lower bound as the test data size increases, the efficiency index decreases as the test data size increases. The size of the test data 19 may be set to the largest size at which the efficiency index is still equal to or greater than a threshold.
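A minimal sketch of choosing the size by the efficiency index, assuming the index at size $n$ is $(V(n) - V(n + \Delta)) / \Delta$ for a fixed step $\Delta$ and that candidate sizes are scanned upward from a minimum. `choose_test_size` and its parameters are hypothetical; the variance curve passed in could be any decreasing function, such as one fitted from the error information.

```python
from typing import Callable

def choose_test_size(variance: Callable[[int], float], step: int,
                     threshold: float, n_min: int, n_max: int) -> int:
    """Largest n in n_min, n_min + step, ... (up to n_max) whose efficiency
    index (V(n) - V(n + step)) / step is still >= threshold.

    Assumes V is decreasing, so the index shrinks as n grows and the scan
    can stop at the first size that falls below the threshold. Returns
    n_min if even the first candidate fails the threshold.
    """
    best = n_min
    n = n_min
    while n <= n_max:
        efficiency = (variance(n) - variance(n + step)) / step
        if efficiency < threshold:
            break
        best = n
        n += step
    return best
```

For instance, with `V(n) = 0.5 + 100 / n`, a step of 100, and a threshold of 1e-4, the index equals 100 / (n (n + 100)); it is still at least 1e-4 at n = 900 but drops below it at n = 1000, so the function returns 900.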

Alternatively, for example, the processing unit 12 determines the value of the second parameter of the variance function, which depends on the training data size, based on the result of learning the model with the training data 18. Then, with the first and second parameters fixed at their determined values, the processing unit 12 varies the third parameter, which indicates the test data size, to search for a test data size at which the variance satisfies the predetermined condition.
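The search over the third parameter can be sketched as a binary search, assuming a hypothetical variance function of the form $\beta + \alpha / n_{\text{test}}$ and, as the predetermined condition, that the variance must not exceed a target value. Because this variance decreases monotonically in $n_{\text{test}}$, the smallest qualifying size can be found by bisection; the function name and the condition are assumptions for illustration.

```python
from typing import Optional

def smallest_size_meeting(alpha: float, beta: float, target: float,
                          n_max: int) -> Optional[int]:
    """Smallest n_test in [1, n_max] with beta + alpha / n_test <= target,
    or None if no size up to n_max meets the target."""
    def v(n: int) -> float:
        return beta + alpha / n

    if n_max < 1 or v(n_max) > target:
        return None
    lo, hi = 1, n_max
    # Invariant: v(hi) <= target; every n < lo has v(n) > target.
    while lo < hi:
        mid = (lo + hi) // 2
        if v(mid) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo
```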

The generation of the error information 16 and the determination of the correspondence 17 may be performed either before or after machine learning with the training data 18. The processing unit 12 outputs the model learned using the training data 18 and the accuracy measured using the test data 19. The processing unit 12 may store the learned model and the measured accuracy in a storage device, display them on a display device, or transmit them to another information processing apparatus.

With the machine learning apparatus 10 of the first embodiment, multiple models are learned using the small training data 14a, 14b, and 14c. Using the small test data 15, error information 16 is generated that indicates the prediction error calculated for each combination of one of those models and one of two or more records of the test data 15. Based on the error information 16, a correspondence 17 is determined between the test data size and the variance of model accuracy measurements. Then, when the accuracy of the model learned using the training data 18 is measured with the test data 19, the size of the test data 19 is determined based on the correspondence 17 so that the variance of the accuracy measurement satisfies a predetermined condition.

As a result, the size of the test data 19 can be determined appropriately, taking into account how that size affects the variance of model accuracy measurements. This prevents the reliability of the measurement from degrading because the test data 19 is too small, and prevents the processing time from growing because the test data 19 is too large. Consequently, the accuracy of a model learned by machine learning can be measured reliably and in a short time, making accuracy measurement more efficient. In particular, the test data can be made smaller than under the rule of thumb that sets the test data size to about 1/2 to 1/4 of the training data size.

[Second embodiment]
Next, a second embodiment will be described.
FIG. 2 is a diagram illustrating an example of the hardware of a machine learning apparatus according to the second embodiment.

The machine learning apparatus 100 includes a CPU 101, a RAM 102, an HDD 103, an image interface 104, an input interface 105, a medium reader 106, and a communication interface 107. These units are connected to a bus inside the machine learning apparatus 100. The machine learning apparatus 100 corresponds to the machine learning apparatus 10 of the first embodiment. The CPU 101 corresponds to the processing unit 12 of the first embodiment, and the RAM 102 or the HDD 103 corresponds to its storage unit 11.

The CPU 101 is a processor that executes program instructions. The CPU 101 loads at least part of the programs and data stored in the HDD 103 into the RAM 102 and executes the programs. The CPU 101 may include multiple processor cores, and the machine learning apparatus 100 may include multiple processors. A set of multiple processors is sometimes called a "multiprocessor" or simply a "processor."

The RAM 102 is volatile semiconductor memory that temporarily stores programs executed by the CPU 101 and data the CPU 101 uses in computation. The machine learning apparatus 100 may include types of memory other than RAM, and may include multiple memories.

The HDD 103 is nonvolatile storage that stores software programs, such as an OS (Operating System), middleware, and application software, as well as data. The machine learning apparatus 100 may include other types of storage, such as flash memory or an SSD (Solid State Drive), and may include multiple storage devices.

The image interface 104 outputs images to a display device 111 connected to the machine learning apparatus 100, in accordance with instructions from the CPU 101. Any type of display device can be used as the display device 111, such as a CRT (Cathode Ray Tube) display, a liquid crystal display (LCD), an organic electro-luminescence (OEL) display, or a projector. Output devices other than the display device 111, such as a printer, may also be connected to the machine learning apparatus 100.

The input interface 105 receives input signals from an input device 112 connected to the machine learning apparatus 100. Any type of input device can be used as the input device 112, such as a mouse, touch panel, touch pad, or keyboard. Multiple types of input devices may be connected to the machine learning apparatus 100.

The medium reader 106 is a reading device that reads programs and data recorded on a recording medium 113. Any type of recording medium can be used as the recording medium 113, such as a magnetic disk (for example, a flexible disk (FD) or an HDD), an optical disc (for example, a CD (Compact Disc) or a DVD (Digital Versatile Disc)), or semiconductor memory. The medium reader 106 copies, for example, programs and data read from the recording medium 113 to another recording medium such as the RAM 102 or the HDD 103. The read program is executed by, for example, the CPU 101. The recording medium 113 may be a portable recording medium and may be used to distribute programs and data. The recording medium 113 and the HDD 103 may be referred to as computer-readable recording media.

The communication interface 107 is connected to a network 114 and communicates with other information processing apparatuses via the network 114. The communication interface 107 may be a wired communication interface connected to a wired communication device such as a switch or router, or a wireless communication interface connected to a wireless communication device such as a base station or access point.

Next, the relationship between training data size and predictive performance in machine learning will be described.
In the machine learning of the second embodiment, a data set containing multiple records representing known cases is collected in advance. A record may also be called a sample, an instance, a row, a data block, or unit data. The machine learning apparatus 100 or another information processing apparatus may collect the data set from various devices, such as sensor devices, via the network 114. The collected data set may be large, of the kind referred to as "big data." Each record usually contains the values of one or more explanatory variables and the value of one objective variable. For example, in machine learning that forecasts product demand, historical data is collected in which factors that affect product demand, such as temperature and humidity, are the explanatory variables and the product demand is the objective variable.

The machine learning apparatus 100 samples some of the records in the collected data set as training data and learns a model using the training data. A model expresses the relationship between the explanatory variables and the objective variable and typically includes one or more explanatory variables, one or more coefficients, and one objective variable. The model may be expressed in various mathematical forms, such as a linear expression, a polynomial of degree two or higher, an exponential function, or a logarithmic function. The form may be specified by the user before machine learning. The coefficients are determined from the training data by machine learning.

By using the learned model, the value of the objective variable (the result) of an unknown case can be predicted from the values of its explanatory variables (the factors). For example, next season's product demand can be predicted from next season's weather forecast. The result predicted by the model may be a continuous quantity, such as a probability between 0 and 1, or a discrete value, such as a YES/NO binary value.

A "prediction performance" can be calculated for a learned model. Prediction performance is the ability to predict the results of unknown cases accurately and can also be called "accuracy." The machine learning device 100 samples records other than the training data from the collected data set as test data and calculates the prediction performance using the test data. It inputs the explanatory variable values in the test data into the model. It then compares the objective variable values output by the model (predicted values) with the objective variable values in the test data (actual values). Verifying the prediction performance of a learned model is sometimes called "validation."
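The sampling and validation steps above can be sketched as follows. This is a minimal sketch that assumes records are held in memory as `(X, Y)` tuples. The helper names `holdout_split` and `validate` are hypothetical and not taken from the source. A trained model is stood in for by any Python callable.

```python
import random
from typing import Callable, List, Sequence, Tuple

# A record is (explanatory variable values X, objective variable value Y).
Record = Tuple[Tuple[float, ...], float]


def holdout_split(data: Sequence[Record], train_size: int, test_size: int,
                  seed: int = 0) -> Tuple[List[Record], List[Record]]:
    """Sample disjoint training and test records from the collected data set."""
    if train_size + test_size > len(data):
        raise ValueError("data set is smaller than train_size + test_size")
    rng = random.Random(seed)
    # Draw distinct indices once and cut them into two parts, so that no
    # record is used as both training data and test data.
    picked = rng.sample(range(len(data)), train_size + test_size)
    train = [data[i] for i in picked[:train_size]]
    test = [data[i] for i in picked[train_size:]]
    return train, test


def validate(model: Callable[[Tuple[float, ...]], float],
             test: Sequence[Record]) -> List[Tuple[float, float]]:
    """Feed each test record's explanatory values to the model and pair the
    predicted value with the actual value, ready for a metric to compare."""
    return [(model(x), y) for x, y in test]
```

For example, `train, test = holdout_split(data, 6, 3)` followed by `validate(model, test)` yields (predicted, actual) pairs. Any of the indices in the next paragraph can then be computed from those pairs.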

Indices of prediction performance include accuracy, precision, mean squared error (MSE), and root mean squared error (RMSE). For example, assume the result is a YES/NO binary value. Among the $n$ test data records, define four counts:
- Tp: predicted value = YES and actual value = YES.
- Fp: predicted value = YES and actual value = NO.
- Fn: predicted value = NO and actual value = YES.
- Tn: predicted value = NO and actual value = NO.

Accuracy is the proportion of correct predictions and is calculated as (Tp + Tn)/n. Precision is the probability that a "YES" prediction is correct and is calculated as Tp/(Tp + Fp). Write the actual value of each case as $Y$ and the predicted value as $y$. The mean squared error is then $\mathrm{MSE} = \sum (Y - y)^2 / n$. The root mean squared error is $\mathrm{RMSE} = \left(\sum (Y - y)^2 / n\right)^{1/2}$, so $\mathrm{MSE} = \mathrm{RMSE}^2$.
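The four indices just defined can be computed directly. This is a minimal sketch that assumes YES is encoded as `True` and NO as `False`. The function names are illustrative only and do not refer to any particular library.

```python
import math
from typing import Sequence, Tuple


def confusion_counts(predicted: Sequence[bool],
                     actual: Sequence[bool]) -> Tuple[int, int, int, int]:
    """Count (Tp, Fp, Fn, Tn) for YES/NO outcomes, with YES encoded as True."""
    pairs = list(zip(predicted, actual))
    tp = sum(p and a for p, a in pairs)
    fp = sum(p and not a for p, a in pairs)
    fn = sum(not p and a for p, a in pairs)
    tn = sum(not p and not a for p, a in pairs)
    return tp, fp, fn, tn


def accuracy(tp: int, fp: int, fn: int, tn: int) -> float:
    """Proportion of correct predictions: (Tp + Tn) / n."""
    return (tp + tn) / (tp + fp + fn + tn)


def precision(tp: int, fp: int) -> float:
    """Probability that a YES prediction is correct: Tp / (Tp + Fp)."""
    return tp / (tp + fp)


def mse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Mean squared error: sum((Y - y)^2) / n."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("need two non-empty sequences of equal length")
    return sum((Y - y) ** 2 for Y, y in zip(actual, predicted)) / len(actual)


def rmse(actual: Sequence[float], predicted: Sequence[float]) -> float:
    """Root mean squared error: the square root of the MSE."""
    return math.sqrt(mse(actual, predicted))
```

For instance, predictions `[True, True, False, False, True]` against actual values `[True, False, True, False, True]` give counts (2, 1, 1, 1). That corresponds to an accuracy of 3/5 and a precision of 2/3.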

There are various procedures (machine learning algorithms) for learning a model from training data. The user may specify the machine learning algorithm used by the machine learning device 100. Alternatively, the machine learning device 100 may select it according to a predetermined evaluation method. The number of machine learning algorithms available to the machine learning device 100 may be on the order of tens to hundreds. Examples include logistic regression analysis, support vector machines, and random forests.

Logistic regression analysis fits the value of the objective variable $y$ and the values of the explanatory variables $x_1, x_2, \ldots, x_d$ to an S-shaped curve. The variables are assumed to satisfy $\log(y/(1-y)) = a_1 x_1 + a_2 x_2 + \cdots + a_d x_d + b$. The coefficients $a_1, a_2, \ldots, a_d, b$ are determined by the regression analysis.
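The relation $\log(y/(1-y)) = a_1 x_1 + \cdots + a_d x_d + b$ can be illustrated with a short sketch. The text does not say how the coefficients are determined. As an assumption, the sketch fits them by plain gradient descent on the mean log-loss, which is one common approach but not necessarily the one the patent has in mind. The names `logistic_predict` and `fit_logistic` are hypothetical.

```python
import math
from typing import List, Sequence, Tuple


def logistic_predict(x: Sequence[float], a: Sequence[float], b: float) -> float:
    """Return y in (0, 1) satisfying log(y / (1 - y)) = a_1 x_1 + ... + a_d x_d + b."""
    z = sum(ai * xi for ai, xi in zip(a, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def fit_logistic(xs: Sequence[Sequence[float]], ys: Sequence[int],
                 lr: float = 0.5, steps: int = 2000) -> Tuple[List[float], float]:
    """Fit the coefficients a_1..a_d and b by gradient descent on the mean log-loss
    (an assumed fitting method, used here only for illustration)."""
    d, n = len(xs[0]), len(xs)
    a, b = [0.0] * d, 0.0
    for _ in range(steps):
        grad_a, grad_b = [0.0] * d, 0.0
        for x, y in zip(xs, ys):
            # Derivative of the log-loss w.r.t. z is (predicted probability - label).
            r = logistic_predict(x, a, b) - y
            for j in range(d):
                grad_a[j] += r * x[j] / n
            grad_b += r / n
        a = [aj - lr * g for aj, g in zip(a, grad_a)]
        b -= lr * grad_b
    return a, b
```

On a toy data set such as `xs = [[-2.0], [-1.0], [1.0], [2.0]]` with labels `[0, 0, 1, 1]`, the fitted $a_1$ comes out positive. The predicted $y$ then lies above 0.5 for positive $x$ and below 0.5 for negative $x$.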

A support vector machine is a machine learning algorithm that computes the boundary surface that most clearly divides a set of records arranged in a space into two classes. The boundary surface is computed so that its distance to each class (the margin) is maximized.

A random forest is a machine learning algorithm that generates a model for appropriately classifying multiple records. It builds that model in the following steps:
1. Randomly sample records from the data set.
2. Randomly select some of the explanatory variables, and classify the sampled records according to the values of the selected variables.
3. Repeat the variable selection and classification to build a hierarchical decision tree based on the values of multiple explanatory variables.
4. Repeat the record sampling and tree generation to obtain multiple decision trees.
5. Combine the decision trees into the final model for classifying records.

When one machine learning algorithm is applied to a given data set, the larger the number of records sampled as training data (the training data size), the higher the prediction performance.
FIG. 3 is a graph showing an example of the relationship between training data size and prediction performance.

Curve 31 shows the relationship between a model's prediction performance and the training data size. The training data sizes satisfy $s_1 < s_2 < s_3 < s_4 < s_5$. For example, each size is two or four times the one before it: $s_2$ relative to $s_1$, $s_3$ to $s_2$, $s_4$ to $s_3$, and $s_5$ to $s_4$.

As curve 31 shows, the prediction performance at training data size $s_2$ tends to be higher than at $s_1$. Similarly, the performance at $s_3$ tends to be higher than at $s_2$, at $s_4$ higher than at $s_3$, and at $s_5$ higher than at $s_4$. Thus, the larger the training data size, the higher the prediction performance tends to be. While the prediction performance is still low, it rises sharply as the training data size increases. However, prediction performance has an upper limit. As performance approaches that limit, the ratio of the gain in prediction performance to the increase in training data size gradually decreases. That is, curve 31 shows that prediction performance increases with training data size while asymptotically approaching an upper limit.

This relationship between training data size and prediction performance varies with the machine learning algorithm used. It also varies with the nature of the collected data set (the type of data set). It is therefore not easy to estimate, before machine learning starts, either the upper limit of prediction performance shown by curve 31 or the prediction performance at each training data size.

Next, the reliability of measured values of prediction performance will be described.
FIG. 4 is a graph showing an example of the variance of prediction performance measurements.
A measured prediction performance of a model trained at a given training data size risks deviating from the expected value. That expected value is determined by the machine learning algorithm and the nature of the data set. Even with the same data set, chance in which records are selected as training data and test data makes the measured prediction performance vary. This "variation" of the measurements can be interpreted as their variance, standard deviation, or the like. The variance tends to be larger when the training data size is smaller and smaller when it is larger. Likewise, the variance tends to be larger when the test data size is smaller and smaller when it is larger.

Graph 32 shows the relationship between training data size and prediction performance. Here, the same machine learning algorithm and the same data set were used to generate a model and measure its prediction performance 50 times for each training data size. The test data size is made proportional to the training data size, for example one half or one quarter of it. Graph 32 plots the 50 prediction performance measurements for each training data size. It uses accuracy as the prediction performance index, so a larger value indicates higher prediction performance.

In graph 32, the prediction performance measurements for training data size = 100 are widely spread, from about 0.58 to about 0.68. Those for training data size = 500 range from about 0.69 to about 0.75, a narrower range than for size 100. Beyond that, the range of measurements narrows further as the training data size increases. Once the training data size is large enough, the measurements converge to about 0.76.

The variance of the prediction performance measurements is examined further below.
First, the bias-variance decomposition will be described. The bias-variance decomposition is sometimes used to evaluate how good a machine learning algorithm is. It uses three indices: loss, bias, and variance. The relationship loss = bias² + variance holds.

Loss is an index of how often, or by how much, a model generated by machine learning mispredicts. Types of loss include the 0-1 loss and the squared loss. The 0-1 loss assigns 0 to a successful prediction and 1 to a failed one, so its expected value is the probability that a prediction fails. The fewer the wrong predictions, the smaller the expected 0-1 loss, and the more the wrong predictions, the larger it is. The squared loss is the square of the difference between the predicted value and the true value (the prediction error). The smaller the prediction error, the smaller the squared loss, and the larger the error, the larger the loss. The expected loss (the expected value of the loss) and the prediction performance can be converted into each other.

If the prediction performance is accuracy and the loss is the 0-1 loss, expected loss = 1 − prediction performance. If the prediction performance is the mean squared error (MSE) and the loss is the squared loss, expected loss = MSE. If the prediction performance is the root mean squared error (RMSE) and the loss is the squared loss, expected loss = $\mathrm{RMSE}^2$.
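The three conversion rules just listed fit in one small function. This is a minimal sketch. The metric names `"accuracy"`, `"mse"`, and `"rmse"` are labels chosen here for illustration, not identifiers from the source.

```python
def expected_loss(performance: float, metric: str) -> float:
    """Convert a prediction-performance value into the corresponding expected loss."""
    if metric == "accuracy":   # 0-1 loss: expected loss = 1 - accuracy
        return 1.0 - performance
    if metric == "mse":        # squared loss: expected loss = MSE
        return performance
    if metric == "rmse":       # squared loss: expected loss = RMSE^2
        return performance ** 2
    raise ValueError(f"unsupported metric: {metric}")
```

The same mapping applies when `performance` is an upper limit $c$ on prediction performance rather than a single measurement. In that case it yields the lower-bound losses $1 - c$, $c$, and $c^2$ discussed with FIG. 5.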

Bias is an index of how far the predicted values of a model generated by machine learning deviate systematically from the true values. The smaller the bias, the more accurate the model. Variance is an index of how much the predicted values of models generated by machine learning vary. The smaller the variance, the more accurate the model. However, there is often a trade-off between bias and variance.

Consider a low-complexity model, such as a low-degree polynomial (which can also be called a model with low expressive power). However its coefficients are adjusted, it is difficult to make such a model output predicted values close to the true values for all of the records. In other words, a low-complexity model cannot represent complex phenomena, so such models tend to have a large bias. In contrast, a high-complexity model, such as a high-degree polynomial (a model with high expressive power), leaves room to match all records. By appropriately adjusting its coefficients, it can output predicted values close to the true values for every record. High-complexity models therefore tend to have a small bias.

On the other hand, high-complexity models run the risk of overfitting, in which the generated model depends too heavily on the characteristics of the records used as training data. A model produced by overfitting often fails to output appropriate predicted values for other records. For example, a polynomial of degree $d$ can be used to generate a model whose predicted values exactly match the true values of $d + 1$ records (a model whose residuals are 0).
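The claim that a degree-$d$ polynomial can reproduce $d + 1$ records exactly can be checked concretely. As an illustrative choice, the sketch below builds that polynomial by Lagrange interpolation with exact rational arithmetic. The records are hypothetical.

```python
from fractions import Fraction
from typing import Callable, Sequence, Tuple


def interpolating_polynomial(points: Sequence[Tuple[int, int]]) -> Callable[[int], Fraction]:
    """Return the polynomial of degree <= d that passes exactly through d + 1 points,
    evaluated in Lagrange form with exact rational arithmetic."""
    xs = [Fraction(x) for x, _ in points]
    ys = [Fraction(y) for _, y in points]
    if len(set(xs)) != len(xs):
        raise ValueError("x values must be distinct")

    def f(x) -> Fraction:
        x = Fraction(x)
        total = Fraction(0)
        for j, (xj, yj) in enumerate(zip(xs, ys)):
            term = yj
            for m, xm in enumerate(xs):
                if m != j:
                    term *= (x - xm) / (xj - xm)   # Lagrange basis factor
            total += term
        return total

    return f


records = [(0, 1), (1, 3), (2, 2), (3, 5)]   # hypothetical (X, Y) records, d = 3
f = interpolating_polynomial(records)
residuals = [Y - f(X) for X, Y in records]   # all zero: a "perfect" fit
```

Every residual is 0, yet `f(4)` evaluates to 19, far outside the range 1 to 5 seen in the records. This is the kind of large prediction error on unseen records that the next paragraph attributes to overly complex models.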

However, a model whose residuals are 0 on particular records is usually overly complex. It carries a high risk of outputting predicted values with very large prediction errors for other records, so high-complexity models tend to have a large variance. In contrast, a low-complexity model has a low risk of outputting predicted values with very large prediction errors, and its variance tends to be small. Thus, bias and variance, as components of the loss, depend on the characteristics of the machine learning algorithm that generates the model.

Next, formal definitions of loss, bias, and variance are given. Here, an example of decomposing the squared loss into bias and variance is described.
Suppose that $m$ sets of training data $D_k$ ($k = 1, 2, \ldots, m$) are extracted from the same data set and $m$ models are generated. Suppose also that test data $T$ containing $n$ records is extracted from the same data set. The $i$-th record (also called a test case) contains the explanatory variable value $X_i$ and the true value $Y_i$ of the objective variable ($i = 1, 2, \ldots, n$). The $k$-th model computes the predicted value $y_{ik}$ of the objective variable for the explanatory variable value $X_i$.

The prediction error $e_{ik}$ between the $k$-th model and the $i$-th record is then defined as $e_{ik} = Y_i - y_{ik}$, and its loss (here the squared loss) is $e_{ik}^2$. For the $i$-th record, a bias $B_i$, a variance $V_i$, and a loss $L_i$ are defined:
- The bias is $B_i = E_D[e_{ik}]$, where $E_D[\cdot]$ denotes the mean (expected value) over the $m$ training data sets.
- The variance is $V_i = V_D[e_{ik}]$, where $V_D[\cdot]$ denotes the variance over the $m$ training data sets.
- The loss is $L_i = E_D[e_{ik}^2]$.

From the relationship among loss, bias, and variance described above, $L_i = B_i^2 + V_i$ holds.

For the test data $T$ as a whole, an expected bias EB2, an expected variance EV, and an expected loss EL are defined:
- The expected bias is $\mathrm{EB2} = E_x[B_i^2]$, where $E_x[\cdot]$ denotes the mean (expected value) over the $n$ records.
- The expected variance is $\mathrm{EV} = E_x[V_i]$.
- The expected loss is $\mathrm{EL} = E_x[L_i]$.

From the relationship among loss, bias, and variance described above, $\mathrm{EL} = \mathrm{EB2} + \mathrm{EV}$ holds.
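The definitions of $L_i$, $B_i$, $V_i$, EL, EB2, and EV can be computed from a table of errors $e_{ik}$. This is a minimal sketch that assumes the errors are already available as a list of rows, one row per test record and one column per model. The function name `bias_variance` is hypothetical. $V_D$ is taken as the population variance (dividing by $m$), which is what makes $L_i = B_i^2 + V_i$ hold exactly.

```python
from statistics import fmean, pvariance
from typing import Sequence, Tuple


def bias_variance(errors: Sequence[Sequence[float]]) -> Tuple[float, float, float]:
    """errors[i][k] = e_ik = Y_i - y_ik for test record i and model k.

    Returns (EL, EB2, EV) averaged over the n test records."""
    loss = [fmean([e * e for e in row]) for row in errors]  # L_i = E_D[e_ik^2]
    bias = [fmean(row) for row in errors]                   # B_i = E_D[e_ik]
    var = [pvariance(row) for row in errors]                # V_i = V_D[e_ik], divides by m
    el = fmean(loss)                                        # EL  = E_x[L_i]
    eb2 = fmean([b * b for b in bias])                      # EB2 = E_x[B_i^2]
    ev = fmean(var)                                         # EV  = E_x[V_i]
    return el, eb2, ev
```

With two records and two models, `errors = [[1.0, 3.0], [-2.0, 0.0]]` gives EL = 3.5, EB2 = 2.5, and EV = 1.0, so EL = EB2 + EV.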

The idea of the bias-variance decomposition can be applied to estimate the variance of prediction performance measurements. The variance of the measurements is approximated by $\mathrm{VL} = C \times (\mathrm{EL} + \mathrm{EB2}) \times (\mathrm{EL} - \mathrm{EB2})$, where:
- VL is the variance of the prediction performance measurements at training data size $s$.
- $C$ is a constant.
- EL is the expected loss at training data size $s$.
- EB2 is the expected bias.

The meaning of this formula is explained below.
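The VL formula is simple to evaluate once EL, EB2, and $C$ are known. The text does not give a value for $C$, so it is left as a parameter here. The shape $(\mathrm{EL} + \mathrm{EB2})(\mathrm{EL} - \mathrm{EB2})$ mirrors a per-record identity, $V_M[e_{ik}^2] = 2(L_i + B_i^2)(L_i - B_i^2)$. That identity follows when the errors across models are normally distributed, which is the assumption made in step (c) of the derivation below. The sketch checks only that per-record identity by Monte Carlo, for one hypothetical record. It does not validate VL itself, and the function names are hypothetical.

```python
import random
from statistics import fmean


def measurement_variance(el: float, eb2: float, c: float) -> float:
    """VL = C * (EL + EB2) * (EL - EB2); the value of C is not given in the text."""
    return c * (el + eb2) * (el - eb2)


def squared_loss_variance_mc(bias: float, variance: float,
                             samples: int = 200_000, seed: int = 1) -> float:
    """Monte Carlo estimate of Var[e^2] for an error e ~ Normal(bias, variance)."""
    rng = random.Random(seed)
    sq = [rng.gauss(bias, variance ** 0.5) ** 2 for _ in range(samples)]
    mean = fmean(sq)
    return fmean([(s - mean) ** 2 for s in sq])


# Per-record check for one hypothetical test record i.
B, V = 0.5, 0.25
L = B * B + V                                # L_i = B_i^2 + V_i = 0.5
closed_form = 2 * (L + B * B) * (L - B * B)   # 2 (L_i + B_i^2)(L_i - B_i^2) = 0.375
estimate = squared_loss_variance_mc(B, V)     # should land close to closed_form
```

VL shrinks toward 0 as EL approaches EB2, that is, as the gap closes with growing training data. This matches the narrowing spread of measurements in FIG. 4.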

FIG. 5 is a graph showing an example of the expected loss and the expected bias of prediction performance.
Curve 33 is a loss curve showing the relationship between training data size and the estimated loss. The vertical axis of FIG. 3 is prediction performance, whereas the vertical axis of FIG. 5 has been converted to loss. As described above, prediction performance and loss can be converted into each other according to the prediction performance index and the loss index used. Curve 33 is a nonlinear curve: as the training data size increases, the loss decreases monotonically and approaches a fixed lower-bound loss. The loss drops steeply while the training data size is small, and the drop becomes small as the training data size grows.

The loss at the point on curve 33 at training data size $s_p$ (the distance from loss = 0 to that point) corresponds to the expected loss $\mathrm{EL}_p$ at training data size $s_p$. The lower-bound loss identified by curve 33 corresponds to the upper limit of prediction performance identified by curve 31 in FIG. 3, and it is greater than 0. For example, let $c$ be the upper limit of prediction performance. The lower-bound loss is then $1 - c$ when the performance index is accuracy, $c$ when it is MSE, and $c^2$ when it is RMSE. The lower-bound loss corresponds to the expected bias EB2 of this machine learning algorithm. The reason is that, once the training data size is large enough, the characteristics of the training data match those of the data set, and the expected variance approaches 0.

The difference between the expected loss $\mathrm{EL}_p$ and the expected bias EB2 can be called the gap at training data size $s_p$. The gap represents how much the machine learning algorithm can still reduce the loss by increasing the training data size. Equivalently, it corresponds to the distance between a point on curve 31 in FIG. 3 and the upper limit of prediction performance. It therefore also represents how much the algorithm can still improve its prediction performance with more training data. The gap reflects the expected variance at training data size $s_p$.

Next, the mathematical basis for the formula expressing the variance of prediction performance measurements is explained.
(a) Formal description of the problem
Suppose that $m$ sets of training data $D_1, D_2, \ldots, D_m$ and test data $T$ are extracted from the same data set. Let $f_k$ be the model learned by giving training data $D_k$ to a certain machine learning algorithm ($k = 1, 2, \ldots, m$). Let the test data $T$ be a set of records $\langle Y_i, X_i \rangle$ ($i = 1, 2, \ldots, n$). Here $X_i$ is the value of the explanatory variable (the input value), and $Y_i$ is the known value (true value) of the objective variable corresponding to $X_i$. Let $y_{ik} = f_k(X_i)$ be the value predicted by model $f_k$ for input value $X_i$ (the predicted value). The prediction error of model $f_k$ for input value $X_i$ is defined as $e_{ik} = Y_i - y_{ik}$. The number of records in the test data $T$, that is, the size of $T$, is $n$. Below, $i$ and $j$ are mainly used as indices identifying records of the test data $T$, and $k$ as an index identifying models.

When the machine learning algorithm performs regression, the predicted value is a continuous quantity, and the squared loss of Equation (1) is often used as the loss index. Averaging this squared loss over all records of the test data $T$ gives the MSE (mean squared error) of Equation (2).

Here, $E[\cdot]$ is the operator that gives an expected value, and $V[\cdot]$ is the operator that gives a variance. The subscript $X$ on $E[\cdot]$ and $V[\cdot]$ indicates an operation over the records of the test data $T$. The subscript $M$ indicates an operation over the models. That is:
- $E_X[\cdot]$ is the expected value averaged over the records of $T$.
- $E_M[\cdot]$ is the expected value averaged over the models.
- $V_X[\cdot]$ is the variance over the records of $T$.
- $V_M[\cdot]$ is the variance over the models.

In addition, $\mathrm{cov}(\cdot,\cdot)$ is the covariance function, and $\mathrm{cor}(\cdot,\cdot)$ is the correlation coefficient function. The subscripts $X$ and $M$ are attached to them in the same way.

When the machine learning algorithm performs binary classification, the predicted value is a binary discrete value such as $\{-1, 1\}$, and the 0-1 loss of Equation (3) is often used as the loss index. Averaging this 0-1 loss over all records of the test data $T$ and subtracting the result from 1 gives the accuracy of Equation (4).

A smaller MSE indicates higher prediction performance, whereas a larger accuracy indicates higher prediction performance. In both, however, the loss averaged over the whole test data $T$ expresses how good the model's prediction performance is, so this averaged loss can be called the model loss. The model loss $\mathrm{ML}_k$ of model $f_k$ is given by Equation (5). The variance of prediction performance is expressed as the variance of the model loss across the models, as in Equation (6).
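Equations (1) to (6) appear only as figures in the original publication and are not reproduced in this text. The block below is a reconstruction from the definitions in the preceding paragraphs, offered as a reading aid rather than a copy of the patent's figures, so its notation may differ. Here $\ell_{ik}$ denotes the loss of model $f_k$ on record $i$.

```latex
\begin{align}
\ell_{ik} &= e_{ik}^2 = (Y_i - y_{ik})^2 \tag{1}\\
\mathrm{MSE}_k &= E_X[e_{ik}^2] = \frac{1}{n}\sum_{i=1}^{n} (Y_i - y_{ik})^2 \tag{2}\\
\ell_{ik} &= \begin{cases} 0 & (y_{ik} = Y_i)\\ 1 & (y_{ik} \neq Y_i) \end{cases}
           = \frac{1 - Y_i\,y_{ik}}{2} \quad \text{for } Y_i, y_{ik} \in \{-1, 1\} \tag{3}\\
\mathrm{Accuracy}_k &= 1 - E_X[\ell_{ik}] \tag{4}\\
\mathrm{ML}_k &= E_X[\ell_{ik}] = \frac{1}{n}\sum_{i=1}^{n} \ell_{ik} \tag{5}\\
V_M[\mathrm{ML}_k] &= E_M\!\left[\big(\mathrm{ML}_k - E_M[\mathrm{ML}_k]\big)^2\right] \tag{6}
\end{align}
```

For MSE, $\mathrm{ML}_k = \mathrm{MSE}_k$. For accuracy, $\mathrm{Accuracy}_k = 1 - \mathrm{ML}_k$, and a constant shift does not change a variance, so $V_M[\mathrm{Accuracy}_k] = V_M[\mathrm{ML}_k]$. Equation (6) therefore covers both indices.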

(b) Bias-variance decomposition
The loss incurred by a model's predictions can be decomposed into bias and variance. Bias is a quantity indicating the systematic deviation of the model's predicted values, and the lower the bias, the more accurate the model. Models with low expressive power (such as low-complexity models with few adjustable coefficients) tend to have high bias. Variance is a quantity indicating the spread of the model's predicted values, and the lower the variance, the more accurate the model. Models with high expressive power (such as high-complexity models with many adjustable coefficients) tend to have high variance. Such models also carry the risk of overfitting, that is, fitting the training data too closely.

The loss $L_i$, bias $B_i$, and variance $V_i$ for input value $X_i$ of the test data $T$ are defined by Equations (7) to (9). The loss $L_i$ is the expected squared error across the models, the bias $B_i$ is the expected error across the models, and the variance $V_i$ is the variance of the error across the models. The relationship of Equation (10) (the bias-variance decomposition) holds among $L_i$, $B_i$, and $V_i$.

Over the various input values $X_i$, define three expectations:
- The expected loss EL is the expected value of $L_i$.
- The expected bias EB2 is the expected value of $B_i^2$.
- The expected variance EV is the expected value of $V_i$.

EL, EB2, and EV are defined by Equations (11) to (13). The relationship of Equation (14) (the bias-variance decomposition) holds among them.
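Equations (7) to (15) are not reproduced in this text. The following reconstruction is built from the verbal definitions above and is not a copy of the patent's figures. It assumes the squared loss $\ell_{ik} = e_{ik}^2$.

```latex
\begin{align}
L_i &= E_M[e_{ik}^2] \tag{7}\\
B_i &= E_M[e_{ik}] \tag{8}\\
V_i &= V_M[e_{ik}] = E_M\!\left[(e_{ik} - B_i)^2\right] \tag{9}\\
L_i &= B_i^2 + V_i \tag{10}\\
\mathrm{EL} &= E_X[L_i] \tag{11}\\
\mathrm{EB2} &= E_X[B_i^2] \tag{12}\\
\mathrm{EV} &= E_X[V_i] \tag{13}\\
\mathrm{EL} &= \mathrm{EB2} + \mathrm{EV} \tag{14}\\
\mathrm{EL} &= E_X\!\big[E_M[e_{ik}^2]\big] = E_M\!\big[E_X[e_{ik}^2]\big] = E_M[\mathrm{ML}_k] \tag{15}
\end{align}
```

Equation (10) is the identity $E[e^2] = (E[e])^2 + V[e]$ applied across the models. Equation (14) follows by averaging (10) over the records. Equation (15) holds because the two finite averages $E_X$ and $E_M$ can be taken in either order.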

The goal here is to derive the relationship between EL, EB2, EV and the variance of the model loss. The expected loss EL equals the expected value of the model loss $\mathrm{ML}_k$, as shown in Equation (15). However, the variance of the loss $L_i$ is not equal to the variance of the model loss $\mathrm{ML}_k$. A formula for estimating the variance of prediction performance is derived below in five steps:
1. Express the variance of the loss in terms of bias and variance.
2. Decompose the variance of the model loss into an instance component and an interaction component.
3. Calculate the instance component.
4. Calculate the interaction component.
5. Express the variance of the model loss in terms of bias and variance.

(c) Expressing the variance of the loss in terms of bias and variance
Consider the error vector obtained by fixing an input value $X_i$ of the test data $T$ and listing the errors of the models. Regard the error $e$ as a random variable and assume that it follows a normal distribution. The variance of the loss across the models is then defined as in Equation (16). It can be expressed either by the pair of bias $B_i$ and variance $V_i$ or by the pair of loss $L_i$ and bias $B_i$. The step from the first to the second line of Equation (16) uses the statistical property in Equation (17), which gives the expected value of the fourth power of a random variable. In Equation (17), $X$ is a random variable, $S$ is its skewness, and $K$ is its kurtosis. For a normal distribution, $S = 0$ and $K = 3$.
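Equations (16) and (17) are not reproduced in this text. The block below reconstructs them from the surrounding description and may not match the patent's exact layout. In (17), $\mu = E[X]$ and $\sigma^2 = V[X]$. Substituting $\mu = B_i$, $\sigma^2 = V_i$, $S = 0$, and $K = 3$ for the normally distributed error gives the second line of (16).

```latex
\begin{align}
V_M[e_{ik}^2] &= E_M[e_{ik}^4] - \big(E_M[e_{ik}^2]\big)^2 \notag\\
  &= B_i^4 + 6B_i^2 V_i + 3V_i^2 - (B_i^2 + V_i)^2 \notag\\
  &= 4B_i^2 V_i + 2V_i^2 = 2V_i\,(2B_i^2 + V_i) \notag\\
  &= 2\,(L_i - B_i^2)(L_i + B_i^2) \tag{16}\\[4pt]
E[X^4] &= \mu^4 + 6\mu^2\sigma^2 + 4\mu\sigma^3 S + \sigma^4 K \tag{17}
\end{align}
```

The last line of (16) uses $V_i = L_i - B_i^2$ from Equation (10). Its form $(L_i + B_i^2)(L_i - B_i^2)$ is the per-record counterpart of the factor $(\mathrm{EL} + \mathrm{EB2})(\mathrm{EL} - \mathrm{EB2})$ in the VL formula.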

(d) Decomposing the variance of the model loss into an instance component and an interaction component
From the basic properties of variance, Equation (18) holds for the variance of the prediction performance (the variance of the model loss across multiple models). If this is viewed as the average of the elements of an n×n matrix, each diagonal element (i = j) represents the variance of the loss for input value X_i, and its correlation coefficient is 1. The off-diagonal elements (i ≠ j), on the other hand, reflect the correlation of the losses between different input values. Because errors on different input values arise in largely unrelated ways, the absolute value of this correlation coefficient is often sufficiently small, and it approaches 0 as the prediction performance of the model improves. Because the diagonal and off-diagonal elements behave differently, they are treated separately, as in Equation (19).
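The identity behind Equations (18) and (19) can be checked numerically. The sketch below is an illustration under stated assumptions, not the patent's implementation: it takes a hypothetical error matrix errors[k][i] (model k, test record i), uses squared error as the loss, and uses population (divide-by-m) moments so that the split is exact. The diagonal terms give the instance component and the off-diagonal terms the interaction component.

```python
import random
from statistics import fmean

def model_loss_variance_decomposition(errors):
    """errors[k][i]: error of model k on test record i (m models x n records).

    Returns (total, instance, interaction), where total is the population
    variance across models of the model loss ML_k = mean_i errors[k][i]**2.
    total == instance + interaction up to floating-point rounding.
    """
    m = len(errors)
    n = len(errors[0])
    # Per-record losses L[k][i] = e^2 (squared error is an assumption here)
    losses = [[e * e for e in row] for row in errors]
    # Mean loss of each record over the m models
    col_mean = [fmean(losses[k][i] for k in range(m)) for i in range(n)]

    def cov(i, j):
        # Covariance across models between the losses on records i and j
        return fmean((losses[k][i] - col_mean[i]) * (losses[k][j] - col_mean[j])
                     for k in range(m))

    model_loss = [fmean(row) for row in losses]
    ml_mean = fmean(model_loss)
    total = fmean((x - ml_mean) ** 2 for x in model_loss)

    # Diagonal (i == j) elements of the n x n matrix: instance component
    instance = sum(cov(i, i) for i in range(n)) / n ** 2
    # Off-diagonal (i != j) elements: interaction component
    interaction = sum(cov(i, j) for i in range(n) for j in range(n) if i != j) / n ** 2
    return total, instance, interaction

rng = random.Random(0)
errs = [[rng.gauss(0.1, 1.0) for _ in range(50)] for _ in range(20)]  # 20 models x 50 records
total, inst, inter = model_loss_variance_decomposition(errs)
print(total, inst + inter)  # equal up to rounding
```

Because the instance term is the average per-record loss variance divided by n, it shrinks as test records are added, while the interaction term does not have that 1/n factor.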

Equation (19) decomposes the variance of the model loss into an instance component (the first term) and an interaction component (the second term). The first term is the expected value of the variance of the loss and often accounts for most of the variance of the model loss. The second term is the expected value of the covariance between different input values, and its contribution to the variance of the model loss is often small. Because the first term is inversely proportional to the size n of the test data T, adding records to the test data T reduces the variance of the model loss. However, because the second term remains, the reduction has a limit.

(e) Calculating the instance component
Consider the first term of Equation (19). Equation (20) follows from Equation (16) above. To calculate the first and second terms of Equation (20), several assumptions are made. Because many machine learning algorithms train a model to output an unbiased estimator, the expected value of the error is assumed to be 0, as shown in Equation (21). From Equation (21), the property of Equation (22) is derived for the bias B_i.
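To make the per-record quantities concrete, the sketch below computes B_i, V_i, and L_i from a hypothetical error matrix. The definitions are our assumption, consistent with the error vector described in (c): B_i is the mean of the errors of the m models on record X_i, V_i their population variance, and L_i their mean square, so L_i = B_i^2 + V_i holds exactly. With zero-mean synthetic errors, the average of the B_i is close to 0, in line with the unbiasedness assumption of Equation (21).

```python
import random
from statistics import fmean, pvariance

def per_record_stats(errors):
    """errors[k][i]: error of model k on test record i.

    Returns (B, V, L), lists indexed by record i (hypothetical definitions):
    B_i = mean error over models, V_i = population variance of the errors,
    L_i = mean squared error over models.
    """
    m, n = len(errors), len(errors[0])
    B, V, L = [], [], []
    for i in range(n):
        col = [errors[k][i] for k in range(m)]  # error vector for fixed X_i
        B.append(fmean(col))
        V.append(pvariance(col))
        L.append(fmean(e * e for e in col))
    return B, V, L

rng = random.Random(1)
errs = [[rng.gauss(0.0, 1.0) for _ in range(100)] for _ in range(200)]  # 200 models x 100 records
B, V, L = per_record_stats(errs)
print(max(abs(l - (b * b + v)) for b, v, l in zip(B, V, L)))  # tiny: L_i = B_i^2 + V_i
print(fmean(B))  # close to 0 for zero-mean errors
```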

It is assumed that, for some probability distributions, the expected value and variance may change depending on the training data size and the sampling method of the training data, while the skewness and kurtosis, which describe the shape of the distribution, do not change (or change only very slowly). Specifically, the distribution of the errors of the multiple models for input value X_i is assumed to be normal, with kurtosis 3 and skewness 0. It is also assumed that the kurtosis K1 of the distribution of the bias B_i does not change. The kurtosis K1 of the distribution of B_i is defined as in Equation (23). Equation (24) is obtained from Equation (23) and Equation (12) above.

It is further assumed that the kurtosis K2 of the distribution of the errors across multiple input values for a model f_k is common to all models and does not change. The kurtosis K2 is defined as in Equation (25). The values of K1 and K2 usually lie in the range 3 to 10, and the two are often close to each other.

Equation (26) is derived from Equation (25). Substituting Equation (26) into Equations (18) and (19) yields Equation (27). Here, because the kurtosis K2 is much smaller than the size n, 1 − K2/n is approximated by 1. Substituting Equations (20) and (23) into Equations (18) and (19) yields Equation (28). Subtracting Equation (27) from Equation (28) yields Equation (29). Equation (30) is then obtained from Equations (20), (24), and (29). This is the main part of the first term of Equation (19).

(f) Calculating the interaction component
A fixed point Cor1v is defined as in Equation (31). Because the value of Cor1v often does not change, or changes only very slowly, when the training data size is varied, it is assumed here to be independent of the training data size. Cor1v typically takes a value of about 0.001 to 0.1.

Here, the statistical property shown in Equation (32) (the expected value of the correlation coefficient of errors) is used: when the expected value of the errors is 0, the expected value of the correlation coefficient between two errors is approximately 0. From this property, Equation (33) holds, and Equation (34) is obtained from Equation (31) above.

Equation (35) also holds. The step from the second line to the third line of Equation (35) assumes that the correlation coefficient cor_M is independent of the variances V_i and V_j. The step from the third line to the fourth line uses Equation (34) above, together with the fact that the expected value of V_i·V_j is approximately EV^2. The approximation on the fourth line ignores 1/(n−1)^2 because the test data size n is much larger than 1.

Here, the statistical property shown in Equation (36) (the relationship between the square of a covariance and the covariance of squares) is used. Equation (36) holds if the random variables X and Y jointly follow a bivariate normal distribution. Because the errors are assumed to follow a normal distribution, Equation (37) is obtained using Equation (36). Equation (38) also holds. The step from the first line to the second line of Equation (38) assumes that the covariance cov_M is approximately independent of the biases B_i and B_j. The approximation on the second line of Equation (38) uses the property that the expected value of B_i·B_j is approximately the square of the expected value of B_i, which is approximately 0. Substituting Equations (35) and (38) into Equation (37) yields Equation (39). This is the main part of the second term of Equation (19).
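For reference, the standard identity for jointly normal X and Y with means μ_X, μ_Y and covariance σ_XY is given below; we assume, but cannot confirm from the text, that this is the content of Equation (36). It follows by writing X = μ_X + x and Y = μ_Y + y, noting that the odd central moments of a bivariate normal vanish and that E[x^2 y^2] = σ_X^2 σ_Y^2 + 2σ_XY^2. Applied to the errors on two records i and j, with the biases B_i, B_j as the means and cov_M as the covariance across models, the second term carries the factor B_i B_j, which is why Equation (38) can drop it once the expected value of B_i B_j is taken to be approximately 0.

```latex
\begin{aligned}
\operatorname{Cov}(X^2, Y^2) &= 2\,\sigma_{XY}^2 + 4\,\mu_X\,\mu_Y\,\sigma_{XY}\\
\operatorname{Cov}(e_i^2, e_j^2) &= 2\,\mathrm{cov}_M^2 + 4\,B_i B_j\,\mathrm{cov}_M
\end{aligned}
```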

(g) Expressing the variance of the model loss in terms of bias and variance
From Equations (18), (19), (30), and (39) above, the approximation in Equation (40) holds. Because the kurtosis K2 is approximately equal to the kurtosis K1, Equation (40) is approximated by Equation (41). Because K1(EL + EB2) is typically much larger than Cor1v(EL − EB2), Equation (41) is further approximated by Equation (42). The kurtosis K1 is not known in advance, but in practice it is often sufficient to know the ratio between variances. Equation (42) can therefore be simplified to Equation (43) using a proportionality constant C. This yields a formula stating that the variance of the measured prediction performance is proportional both to the difference between the expected loss EL and the expected bias EB2 and to their sum, that is, to the product (EL − EB2)(EL + EB2).

Equation (41), which gives the variance of the measured prediction performance, can be rewritten as Equation (44). Viewed as a function of the test data size n, the first term of Equation (44) is the test-data-dependent component of the variance, which decreases as n increases. The second term is the training-data-dependent component, which does not decrease as n increases. Equation (44) therefore shows that, although the variance of the measured prediction performance decreases as the test data size n increases, it has a lower bound greater than 0.

In FIG. 4 described above, the test data size was set in proportion to the training data size, for example to 1/2 or 1/4 of it. However, this way of choosing the test data size leaves room for improvement in balancing the reliability of the measured prediction performance against the cost of measuring it. If the test data size is too small, the measured prediction performance has a large potential variance, and the calculated value is less reliable. If the test data size is too large, the variance of the measured prediction performance decreases little, so the extra test processing contributes little to reliability yet is carried out anyway, increasing the test processing load.

According to the relationship between the variance of the measured prediction performance and the test data size n given by Equation (44), there exists an efficient test data size n that balances the reliability of the measured value against the test load. The machine learning device 100 of the second embodiment therefore determines an appropriate test data size on the basis of Equation (44).

Once the data set and the machine learning algorithm are specified, the machine learning device 100 determines the kurtosis K1, the fixed point Cor1v, and the expected bias EB2 in Equation (44). The machine learning device 100 thereby generates a variance function f(n, EL) that takes the test data size n and the expected loss EL as arguments and estimates the variance of the measured prediction performance. K1, Cor1v, and EB2 are parameters that do not depend on the training data size. Therefore, for the same data set and machine learning algorithm, the same variance function can be used to estimate the variance of the measured prediction performance even when the training data size differs.

To determine K1, Cor1v, and EB2 for a given pair of data set and machine learning algorithm, it is preferable, as described above, to prepare an error profile containing the exhaustive errors between m sets of training data and one set of test data. The machine learning device 100 therefore extracts from the same data set m sets of training data, each much smaller than the training data size of the model whose prediction performance is to be measured, and trains m models by machine learning, one on each set. The machine learning device 100 also extracts sufficiently small test data from the data set and exhaustively calculates the errors between the records of the test data and the m models.

For example, suppose that the training data size of the model whose prediction performance is to be measured is 1,000,000 records. In this case, 10 training data sets are used to generate the error profile, each with a size of 10,000 records. The test data size is set to 5,000 records, half the size of each training data set. This yields 10 × 5,000 = 50,000 errors between the 10 models and the 5,000 test records. Using this error profile, the machine learning device 100 calculates the kurtosis K1 of Equation (23), the fixed point Cor1v of Equation (31), and the expected bias EB2 of Equation (12).
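The sketch below shows how such an error profile could be assembled with the sizes from this example. It is hypothetical throughout: fit_line (ordinary least squares on one explanatory variable) stands in for whatever machine learning algorithm is being evaluated, the synthetic data set is invented, each training set is drawn independently from the records not used for testing, and the error is taken as prediction minus true value. The estimators for K1, Cor1v, and EB2 (Equations (23), (31), (12)) are not reproduced here.

```python
import random
from statistics import fmean

def fit_line(train):
    # Hypothetical stand-in for the machine learning algorithm:
    # ordinary least squares for y = a + b*x on one explanatory variable.
    xs = [x for x, _ in train]
    ys = [y for _, y in train]
    mx, my = fmean(xs), fmean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in train)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x

def build_error_profile(dataset, m, train_size, test_size, rng):
    """Return errors[k][i] = prediction of model k on test record i minus its true value."""
    idx = list(range(len(dataset)))
    rng.shuffle(idx)
    test = [dataset[i] for i in idx[:test_size]]   # one small test set
    pool = idx[test_size:]                          # records available for training
    models = []
    for _ in range(m):                              # m small training sets, m models
        train = [dataset[i] for i in rng.sample(pool, train_size)]
        models.append(fit_line(train))
    # Exhaustive errors: every model against every test record
    return [[f(x) - y for x, y in test] for f in models]

rng = random.Random(42)
data = [(x, 2.0 * x + rng.gauss(0.0, 1.0)) for x in (rng.uniform(0, 10) for _ in range(20_000))]
profile = build_error_profile(data, m=10, train_size=10_000, test_size=5_000, rng=rng)
print(len(profile), len(profile[0]), len(profile) * len(profile[0]))  # 10 5000 50000
```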

Once the variance function f(n, EL) has been generated, the machine learning device 100 substitutes into it the expected loss EL corresponding to the model whose prediction performance is to be measured. As shown in FIG. 5, EL varies with the training data size even for the same data set and machine learning algorithm. The expected loss EL corresponding to the target model must therefore be used.

The expected loss EL corresponding to a given training data size may either be given without measurement or be obtained by measurement on the target model. A case where no measurement is needed is one in which the prediction performance has already been measured for multiple models that share the data set and machine learning algorithm but differ in training data size. In that case, the unknown EL can be estimated from those measured values by a statistical method such as regression analysis, for example using the nonlinear curves of FIGS. 3 and 5.

When EL is measured on the target model, the machine learning device 100, for example, inputs the small test data used to generate the error profile into the target model and calculates the errors for the records in that test data. The machine learning device 100 then calculates the expected loss EL of Equation (11) from those errors. For example, 5,000 errors are calculated from 5,000 test records, and EL is calculated as their expected value.

The expected loss EL measured in this way corresponds to the large training data size, because the target model was trained with that size. However, because small test data are used, the measured value has a larger variance than the true EL measured with large test data. In that sense, EL measured with small test data is an approximation or an estimate.

To improve the accuracy of the expected loss EL input to the variance function f(n, EL), the machine learning device 100 may perform the estimation of EL and the selection of the test data size n twice. For example, the machine learning device 100 inputs EL measured with the small test data into f(n, EL) and tentatively selects a test data size n by the method described below. It then extracts test data of the tentatively selected size from the data set and re-measures EL using the extracted test data. Finally, it inputs the re-measured EL into f(n, EL), selects the test data size n again by the same method, and adopts this as the final test data size.
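The order of the two rounds can be sketched as follows. Both callbacks are hypothetical placeholders: measure_el(n) stands for measuring the target model's expected loss on n test records, and select_n(el) stands for any size search over f(n, EL) with EL fixed. The fake callbacks in the usage example only record which sizes are requested.

```python
from typing import Callable

def choose_test_size_two_pass(measure_el: Callable[[int], float],
                              select_n: Callable[[float], int],
                              small_n: int) -> int:
    el_first = measure_el(small_n)        # EL from the small error-profile test data
    n_tentative = select_n(el_first)      # tentative n from f(n, EL)
    el_second = measure_el(n_tentative)   # re-measure EL on test data of size n_tentative
    return select_n(el_second)            # final test data size

calls = []

def fake_measure_el(n):
    calls.append(n)  # record which test data sizes were used
    return 0.5       # hypothetical expected loss

def fake_select_n(el):
    return 4000      # hypothetical result of the size search

result = choose_test_size_two_pass(fake_measure_el, fake_select_n, small_n=5000)
print(result, calls)  # 4000 [5000, 4000]
```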

With EL input to the variance function f(n, EL) and held fixed, the variance function gives a one-to-one correspondence between the test data size n and the estimated variance. The machine learning device 100 determines an appropriate test data size n by evaluating the estimated variance while varying n.

The relationship between the test data size n and the estimated variance is a nonlinear curve along which the estimated variance decreases toward its lower bound as n increases. While n is small, the estimated variance decreases sharply per unit increase in n; the larger n becomes, the smaller this decrease per unit increase. To keep n small while maintaining the reliability of the measured prediction performance, an appropriate test data size n is the smallest size for which the estimated variance is acceptably small.

For example, the machine learning device 100 calculates f(n, EL)/f(2n, EL) as an effect index. This index is the ratio by which the variance shrinks when the test data size n is doubled, and it measures the variance-reduction effect: a larger value indicates a greater effect, and a smaller value a smaller effect. Given the relationship between n and the estimated variance, the effect index decreases as n increases.

The machine learning device 100 calculates the effect index for a small test data size n and compares it with a threshold set in advance, such as 1.1. If the effect index is greater than or equal to the threshold, the machine learning device 100 doubles n and repeats the comparison until the effect index falls below the threshold. When the effect index falls below the threshold, the machine learning device 100 adopts the current n as the appropriate test data size.
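A minimal sketch of this doubling search, assuming the variance function is available as a Python callable of n with EL already fixed. Because Equation (44) is not reproduced here, f_example is a hypothetical stand-in with the qualitative shape described for it: a term that shrinks like 1/n plus a floor that does not depend on n. The constants 100.0 and 0.003 are arbitrary.

```python
def effect_index(f, n, growth=2):
    # Ratio f(n) / f(growth * n): how much the estimated variance shrinks when n grows
    return f(n) / f(growth * n)

def select_test_size_by_doubling(f, n0, threshold=1.1, growth=2, n_max=10**12):
    n = n0
    # Keep growing n while doing so still reduces the estimated variance enough
    while effect_index(f, n, growth) >= threshold and n * growth <= n_max:
        n *= growth
    return n  # first n whose effect index is below the threshold

def f_example(n):
    # Hypothetical stand-in for f(n, EL) with EL fixed
    return 100.0 / n + 0.003

n_selected = select_test_size_by_doubling(f_example, n0=1000)
print(n_selected)  # 256000
```

Here the index is still about 1.115 at n = 128,000 and drops to about 1.061 at n = 256,000, so the search stops at 256,000. The growth factor and threshold are exposed as parameters because the text treats them as adjustable.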

The growth factor for n ("double") and the threshold ("1.1") in the above method are adjustable parameters that the user can change. As another way of searching for an appropriate test data size n using the variance function f(n, EL), the machine learning device 100 may, for example, calculate the lower bound of the estimated variance as n tends to infinity, and then select the test data size n at which the estimated variance equals a predetermined multiple (for example, 1.1 times) of that lower bound.
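The lower-bound variant can be sketched as a binary search for the smallest n whose estimated variance is within a given multiple of the floor. This assumes only that f decreases monotonically toward a known floor greater than 0; the example function is again a hypothetical stand-in for Equation (44), whose floor is its constant term.

```python
def select_test_size_near_floor(f, floor, ratio=1.1, n_hi=1 << 40):
    """Smallest integer n >= 1 with f(n) <= ratio * floor.

    Assumes f decreases monotonically toward floor > 0 as n grows.
    """
    target = ratio * floor
    if f(n_hi) > target:
        raise ValueError("target variance not reached for n <= n_hi")
    lo, hi = 1, n_hi
    # Binary search on the monotone predicate f(n) <= target
    while lo < hi:
        mid = (lo + hi) // 2
        if f(mid) <= target:
            hi = mid
        else:
            lo = mid + 1
    return lo

def f_example(n):
    # Hypothetical stand-in: variance term a/n plus floor b, with a = 100.0, b = 0.003
    return 100.0 / n + 0.003

n_floor = select_test_size_near_floor(f_example, floor=0.003)
print(n_floor)  # 333334
```

For this particular form the answer can also be written in closed form as the smallest n with n ≥ a / ((ratio − 1) · b) = 100 / (0.1 × 0.003) ≈ 333,333.3; the binary search is used so that the same code works for any monotone f.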

The test data size determined by the machine learning device 100 in this way is much smaller than with the conventional practice of setting the test data size to 1/2 or 1/4 of the training data size. For example, with a training data size of 1,000,000 records, the conventional practice yields a test data size of 500,000 or 250,000 records. By contrast, the method of the second embodiment can reduce the test data size to the order of tens of thousands of records while keeping the variance of the measured prediction performance comparable to that of the conventional practice. Test processing can therefore be accelerated while the reliability of the measured prediction performance is maintained.

Note that the final test data size n determined in the second embodiment is much smaller than with the conventional practice. Therefore, even when EL is calculated with test data once or twice, the overall test processing load remains much smaller than with the conventional practice.

Next, the functions and processing procedures of the machine learning device 100 will be described.
FIG. 6 is a block diagram showing an example of the functions of the machine learning device.
The machine learning device 100 has a data storage unit 121, a control information storage unit 122, a learning result storage unit 123, a model generation unit 124, a test execution unit 125, a test size determination unit 126, and a machine learning control unit 127. The data storage unit 121, the control information storage unit 122, and the learning result storage unit 123 are implemented using, for example, storage areas of the RAM 102 or the HDD 103. The model generation unit 124, the test execution unit 125, the test size determination unit 126, and the machine learning control unit 127 are implemented using, for example, programs executed by the CPU 101.

The data storage unit 121 stores a data set containing many records that can be used as training data or test data. Each record contains values of the explanatory variables and the value of the objective variable, which serves as the teacher label. The data set may be large, for example millions of records. The machine learning device 100 may receive the data set from a user or from another information processing device, or may collect it from a sensor device.

The control information storage unit 122 stores various control information generated while models are trained with training data and their prediction performance is measured with test data. This control information includes the error profile used to generate the variance function and the parameters of the variance function.

The learning result storage unit 123 stores the results of machine learning. These results include the trained model and the measured prediction performance of that model.
The model generation unit 124 generates a model by machine learning. It receives the specification of a machine learning algorithm and training data from the machine learning control unit 127, and trains the model by determining its coefficients from the records in the training data according to the specified algorithm. Machine learning algorithms include regression analysis, support vector machines, and random forests. The model generation unit 124 provides the trained model to the machine learning control unit 127.

The test execution unit 125 tests a model. It receives a model and test data from the machine learning control unit 127, inputs the values of the explanatory variables in each test record into the model, and calculates the predicted value of the objective variable according to the model. It compares the true value of the objective variable in each test record with the predicted value calculated by the model to obtain the error. The test execution unit 125 then generates an error profile listing these errors.

The test execution unit 125 provides the error profile to the machine learning control unit 127. Alternatively, it converts the error profile into a prediction performance or an expected loss and provides that to the machine learning control unit 127. Prediction performance indices include accuracy, precision, mean squared error, and root mean squared error. The prediction performance or expected loss can be calculated from the errors for the records in the test data. Which information is provided depends on the request from the machine learning control unit 127.
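The indices named above can be computed from a model's test results as in the sketch below. The function names are hypothetical, and the formulas are the standard textbook definitions; if the loss is squared error, the MSE of a single model over its test records is the natural sample estimate of its expected loss, although whether Equation (11) uses exactly this form is not shown here.

```python
import math
from statistics import fmean

def regression_metrics(errors):
    """MSE and RMSE from a list of per-record errors (prediction - true value)."""
    mse = fmean(e * e for e in errors)
    return {"mse": mse, "rmse": math.sqrt(mse)}

def accuracy(predicted, actual):
    """Fraction of records whose predicted label equals the true label."""
    return fmean(1.0 if p == a else 0.0 for p, a in zip(predicted, actual))

def precision(predicted, actual, positive):
    """Of the records predicted as `positive`, the fraction that truly are positive."""
    predicted_pos = [a for p, a in zip(predicted, actual) if p == positive]
    if not predicted_pos:
        return 0.0
    return sum(1 for a in predicted_pos if a == positive) / len(predicted_pos)

reg = regression_metrics([1.0, -1.0, 2.0, -2.0])
acc = accuracy(["a", "b", "a"], ["a", "a", "a"])
prec = precision([1, 1, 0, 1], [1, 0, 0, 1], positive=1)
print(reg, acc, prec)
```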

The test size determination unit 126 determines the test data size. First, it receives an error profile from the machine learning control unit 127. This error profile lists the errors, measured with small test data, of m models trained on m sets of small training data. Using this error profile, the test size determination unit 126 determines the parameters of the variance function for estimating the variance of the measured prediction performance. These parameters include the kurtosis K1, the fixed point Cor1v, and the expected bias EB2. The test size determination unit 126 knows in advance the form of the variance function and how to determine its parameters. It provides the parameters of the variance function to the machine learning control unit 127.

The test size determination unit 126 also receives from the machine learning control unit 127 the previously calculated parameters of the variance function and the expected loss EL corresponding to the training data size of the target model. It substitutes EL into the variance function f(n, EL) and calculates the estimated variance while varying the test data size n. It then determines an appropriate test data size n and provides it to the machine learning control unit 127. For example, the test size determination unit 126 detects the largest test data size n within the range in which the effect index calculated from the estimated variance is greater than or equal to the threshold.

The machine learning control unit 127 controls machine learning. First, it identifies the machine learning algorithm and the training data size for which a model is to be trained and its prediction performance measured. These may be specified by the user or selected by the machine learning control unit 127 according to a predetermined rule.

Next, the machine learning control unit 127 causes the test size determination unit 126 to determine the parameters of the variance function. Alternatively, the parameters of the variance function may be determined after the model whose predictive performance is to be measured has been learned.

To determine the parameters of the variance function, the machine learning control unit 127 extracts m sets of small training data and one set of small test data from the data set stored in the data storage unit 121. The machine learning control unit 127 provides the m sets of training data to the model generation unit 124 and acquires m models from the model generation unit 124. The machine learning control unit 127 provides the m models and the one set of test data to the test execution unit 125 and acquires the error profile from the test execution unit 125. The machine learning control unit 127 then provides the error profile to the test size determination unit 126, acquires the parameters of the variance function from the test size determination unit 126, and stores them as control information in the control information storage unit 122.

Next, the machine learning control unit 127 causes the model generation unit 124 to learn the target model. The machine learning control unit 127 extracts training data of the previously specified size from the data set stored in the data storage unit 121. The machine learning control unit 127 provides the extracted training data to the model generation unit 124 and acquires the learned model from the model generation unit 124. The machine learning control unit 127 stores the model in the learning result storage unit 123.

Next, the machine learning control unit 127 causes the test size determination unit 126 to determine an appropriate test data size for measuring the predictive performance of the target model. First, the machine learning control unit 127 provides the test execution unit 125 with the model stored in the learning result storage unit 123 and the small test data used when determining the parameters of the variance function. The machine learning control unit 127 may retain that small test data for this purpose. Alternatively, instead of reusing that test data, the machine learning control unit 127 may extract test data of the same size from the data set stored in the data storage unit 121.

The machine learning control unit 127 acquires the expected loss from the test execution unit 125, and provides the test size determination unit 126 with the expected loss and the parameters of the variance function stored in the control information storage unit 122. Instead of measuring the expected loss with the target model, however, the machine learning control unit 127 may estimate the expected loss by a statistical method such as regression analysis. The machine learning control unit 127 acquires the test data size from the test size determination unit 126.

The machine learning control unit 127 then extracts test data of the determined size from the data set stored in the data storage unit 121. Preferably, the records in the test data do not overlap with the training data. The machine learning control unit 127 provides the extracted test data and the model stored in the learning result storage unit 123 to the test execution unit 125. The machine learning control unit 127 acquires the measured value of predictive performance from the test execution unit 125 and stores it in the learning result storage unit 123. Alternatively, the machine learning control unit 127 may acquire, from the test execution unit 125, an updated expected loss computed on this test data, and then acquire, from the test size determination unit 126, an updated test data size based on the updated expected loss.

When model learning and measurement of predictive performance are complete, the machine learning control unit 127 outputs the model and the measured value of predictive performance. For example, the machine learning control unit 127 displays them on the display device 111. The machine learning control unit 127 may instead output them to another output device, or transmit them to another information processing apparatus.

FIG. 7 is a diagram showing an example of an error profile table.
The error profile table 131 is stored in the control information storage unit 122. The error profile table 131 stores m×n errors calculated exhaustively over every combination of the m sets of training data and the n records of test data. The columns of the error profile table 131 correspond to the training data $D_1, D_2, \ldots, D_m$. The rows correspond to the input values $X_1, X_2, \ldots, X_n$ contained in the n records of test data. Inputting the input value $X_i$ of one test record into the model learned from training data $D_k$ yields the error $e_{ik}$, which is the difference between the predicted value and the true value.
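A minimal Python sketch of how the error profile table 131 could be filled, assuming each learned model is a plain callable that maps a single input value $X_i$ to a prediction and each test record is an (input, true value) pair. Real records would carry several explanatory variables; the scalar input, the function name `error_profile`, and the two toy models are illustrative assumptions, not an API defined by the patent.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[float], float]  # maps an input value X_i to a prediction
Record = Tuple[float, float]      # (input value X_i, true value)


def error_profile(models: Sequence[Model],
                  test_records: Sequence[Record]) -> List[List[float]]:
    """Build the n x m error table: row i = record X_i, column k = model from D_k.

    Each entry is e_ik = predicted value - true value.
    """
    return [[model(x) - y for model in models] for x, y in test_records]


# Two toy models standing in for models learned from D_1 and D_2.
def model_d1(x: float) -> float:
    return 2.0 * x


def model_d2(x: float) -> float:
    return 2.0 * x + 1.0


print(error_profile([model_d1, model_d2], [(1.0, 2.0), (3.0, 5.0)]))
# [[0.0, 1.0], [1.0, 2.0]]
```

With 10 models and 5,000 test records, as in the example of step S13, the same comprehension would produce 5,000 rows of 10 errors each.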

FIG. 8 is a diagram showing an example of a variance function table.
The variance function table 132 is stored in the control information storage unit 122. The variance function table 132 stores the values of three parameters: the kurtosis K1, the fixed point Cor1v, and the expected bias EB2. These three parameters appear in Equation (44) and do not depend on the training data size. The values stored in the variance function table 132 are calculated from the error profile table 131.

FIG. 9 is a flowchart illustrating an example of a machine learning procedure.
(S10) The machine learning control unit 127 specifies a machine learning algorithm and a training data size. The machine learning algorithm and training data size may be specified by the user.

(S11) The machine learning control unit 127 extracts m sets of small training data and one set of small test data from the data storage unit 121. For example, 10 sets of training data of 10,000 records each and one set of test data of 5,000 records are extracted.
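The extraction in step S11 can be sketched as random sampling. The Python sketch below assumes records can be drawn uniformly at random, keeps the test records disjoint from every training set (the text only says non-overlap with training data is preferable), and lets the m training sets overlap one another; `extract_small_sets` and its `seed` parameter are hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def extract_small_sets(data: Sequence[T], m: int, train_size: int,
                       test_size: int, seed: int = 0) -> Tuple[List[List[T]], List[T]]:
    """Draw m training sets and one test set from `data` (hypothetical helper).

    The test records are removed from the pool first, so no training set
    shares a record with the test set; training sets may overlap each other.
    """
    rng = random.Random(seed)
    indices = list(range(len(data)))
    rng.shuffle(indices)
    test_idx, pool = indices[:test_size], indices[test_size:]
    if train_size > len(pool):
        raise ValueError("not enough records left for a training set")
    train_sets = [[data[j] for j in rng.sample(pool, train_size)] for _ in range(m)]
    return train_sets, [data[j] for j in test_idx]


# Toy data set of 100 record IDs: 3 training sets of 20 and one test set of 10.
train_sets, test_set = extract_small_sets(list(range(100)), m=3, train_size=20, test_size=10)
print(len(train_sets), [len(t) for t in train_sets], len(test_set))  # 3 [20, 20, 20] 10
```

For the example in step S11, the call would be `extract_small_sets(records, m=10, train_size=10_000, test_size=5_000)`.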

(S12) The model generation unit 124 learns m models from the m sets of training data according to the machine learning algorithm specified in step S10.
(S13) The test execution unit 125 inputs each record of the test data extracted in step S11 into each of the m models learned in step S12, calculates the errors, and generates the error profile table 131 listing them. Specifically, for each pair of one model and one test record, the test execution unit 125 inputs the explanatory-variable values of the record into the model and calculates, as the error, the difference between the predicted value of the objective variable output by the model and the true value contained in the record. For example, 10 models and 5,000 test records yield an error profile table 131 containing 10×5,000 errors.

(S14) From the error profile table 131, the test size determination unit 126 determines, according to predetermined formulas, the values of the parameters that define the variance function f(n, EL). The parameters include the kurtosis K1, the fixed point Cor1v, and the expected bias EB2. The parameter values determined here depend on the data set used and the specified machine learning algorithm, but not on the training data size.
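Equation (44) and the exact formulas for K1, Cor1v, and EB2 are not reproduced in this section, so the Python sketch below shows only the step the claims describe explicitly: averaging each test record's errors over the m models to obtain a per-record prediction bias, then combining the biases. Combining them as a mean of squared biases is an assumption made for illustration and should not be read as the patent's definition of EB2.

```python
from statistics import fmean
from typing import List, Sequence


def prediction_biases(profile: Sequence[Sequence[float]]) -> List[float]:
    """Average each row (one test record X_i) over its m model errors e_ik."""
    return [fmean(row) for row in profile]


def mean_squared_bias(profile: Sequence[Sequence[float]]) -> float:
    """Combine the per-record biases; the squared-mean form is an assumption."""
    return fmean(b * b for b in prediction_biases(profile))


profile = [[0.0, 1.0],  # errors of record X_1 under the models from D_1, D_2
           [1.0, 2.0]]  # errors of record X_2
print(prediction_biases(profile))  # [0.5, 1.5]
print(mean_squared_bias(profile))  # 1.25
```

Because every quantity here is computed from the error profile alone, it depends on the data set and algorithm but not on the size of the large training data, which is the property step S14 relies on.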

(S15) The machine learning control unit 127 extracts training data of the size specified in step S10 from the data storage unit 121.
(S16) The model generation unit 124 learns a model from the training data extracted in step S15 according to the machine learning algorithm specified in step S10.

FIG. 10 is a flowchart (continued) illustrating the example machine learning procedure.
(S17) The test execution unit 125 inputs each record of the small test data extracted in step S11 into the model learned in step S16 and calculates the errors. Test data different from that extracted in step S11 may be used instead.

(S18) The test execution unit 125 estimates the expected loss EL of the model learned in step S16 from the errors calculated in step S17, according to a predetermined formula.
(S19) The test size determination unit 126 substitutes the expected loss EL estimated in step S18 into the variance function f(n, EL) having the parameter values determined in step S14. The test size determination unit 126 determines the largest test data size n1 within the range in which the variance calculated by f(n, EL) satisfies a predetermined condition. For example, the test size determination unit 126 compares a predetermined threshold with the value of an effect index indicating the rate at which the variance decreases when the test data size n is doubled, and repeatedly doubles n until the effect index falls below the threshold. In this way, the largest test data size n1 is selected.
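The doubling search of step S19 can be sketched in Python as follows. Because the concrete form of f(n, EL) is not given in this section, the sketch takes the variance as a function of n alone (EL already substituted) and uses the hypothetical curve 4/n + 0.01 purely as an example. It also reads the effect index as the relative drop in variance, (f(n) - f(2n)) / f(n), and adds an `n_max` cap for the records available; both are assumptions.

```python
from typing import Callable


def choose_test_size(variance: Callable[[int], float], n_start: int,
                     threshold: float, n_max: int) -> int:
    """Double n while doubling still cuts the variance by at least `threshold`.

    `variance(n)` stands for f(n, EL) with the expected loss EL substituted.
    The effect index is taken as (f(n) - f(2n)) / f(n); `n_max` caps n at
    the number of available records. Both are illustrative assumptions.
    """
    n = n_start
    while 2 * n <= n_max:
        current, doubled = variance(n), variance(2 * n)
        effect = (current - doubled) / current
        if effect < threshold:
            break  # doubling again would not reduce the variance enough
        n *= 2
    return n


def example_curve(n: int) -> float:
    # Hypothetical variance curve decreasing toward a floor of 0.01.
    return 4.0 / n + 0.01


print(choose_test_size(example_curve, n_start=100, threshold=0.3, n_max=100_000))  # 400
```

With this curve, doubling from 100 to 200 cuts the variance by 40%, from 200 to 400 by about 33%, and from 400 to 800 by only 25%, so with a threshold of 0.3 the search stops at 400. For any curve of the form a/n + b the index equals a / (2(a + bn)), which shrinks as n grows, so with b > 0 it eventually falls below any positive threshold; the `n_max` cap bounds the loop otherwise.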

(S20) The machine learning control unit 127 extracts test data of the size n1 determined in step S19 from the data storage unit 121.
(S21) The test execution unit 125 inputs each record of the test data extracted in step S20 into the model learned in step S16 and calculates the errors.

(S22) The test execution unit 125 re-estimates the expected loss EL of the model learned in step S16 from the errors calculated in step S21, according to a predetermined formula.
(S23) The test size determination unit 126 substitutes the expected loss EL re-estimated in step S22 into the variance function f(n, EL) having the parameter values determined in step S14, and determines the largest test data size n2 within the range in which the variance calculated by f(n, EL) satisfies the predetermined condition. The test data size n2 may be determined in the same way as in step S19.

(S24) The machine learning control unit 127 extracts test data of the size n2 determined in step S23 from the data storage unit 121.
(S25) The test execution unit 125 inputs each record of the test data extracted in step S24 into the model learned in step S16 and calculates the errors. From these errors, the test execution unit 125 calculates the measured value of the model's predictive performance.

(S26) The machine learning control unit 127 stores the model learned in step S16 and the measured value of predictive performance calculated in step S25 in the learning result storage unit 123. The machine learning control unit 127 also displays the model and the measured value of predictive performance on the display device 111.

In the above flowchart, the expected loss EL of the target model is estimated twice. If the expected loss EL is estimated only once, steps S19 to S22 can be omitted. If the expected loss EL is estimated by a statistical method without using the target model, steps S17 to S22 can be omitted.
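The two estimation passes (steps S17 to S23, also recited in claim 6) can be summarized in a short Python sketch. The callables `errors_on_sample` (test the model on n freshly extracted records and return its errors) and `size_for_loss` (the size search of step S19 for a given EL) are hypothetical stand-ins, and estimating EL as the mean squared error is an assumption, since the section refers only to a predetermined formula.

```python
from statistics import fmean
from typing import Callable, List, Sequence


def expected_loss(errors: Sequence[float]) -> float:
    # Assumption: EL is estimated as the mean squared prediction error.
    return fmean(e * e for e in errors)


def two_pass_test_size(small_test_errors: Sequence[float],
                       errors_on_sample: Callable[[int], Sequence[float]],
                       size_for_loss: Callable[[float], int]) -> int:
    """Return the final test data size n2 after one refinement pass."""
    el_first = expected_loss(small_test_errors)      # S17-S18: EL from small test data
    n1 = size_for_loss(el_first)                     # S19: provisional size n1
    el_second = expected_loss(errors_on_sample(n1))  # S20-S22: re-estimate EL on n1 records
    return size_for_loss(el_second)                  # S23: final size n2


# Toy stand-ins: larger expected loss calls for more test records.
def size_for_loss(el: float) -> int:
    return 2000 if el >= 1.0 else 1000


def errors_on_sample(n: int) -> List[float]:
    return [0.5] * n


print(two_pass_test_size([2.0, 2.0], errors_on_sample, size_for_loss))  # 1000
```

Here the small test data suggests EL = 4.0 and a provisional size of 2,000, but the re-estimate on those 2,000 records gives EL = 0.25, so the final size drops to 1,000. Estimating EL only once corresponds to returning `size_for_loss(el_first)` directly, which mirrors omitting steps S19 to S22.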

In the machine learning apparatus 100 of the second embodiment, multiple sets of small training data and one set of small test data are used to generate an error profile showing the distribution of errors that arise under the same data set and machine learning algorithm. Based on the error profile, a variance function is determined that takes the expected loss and the test data size as arguments and calculates the variance of the predictive-performance measurement. The expected loss of the target model, learned with large training data, is then estimated, and an appropriate test data size for measuring the target model's predictive performance is determined from the relationship between test data size and variance given by the variance function.

The test data size is chosen to be as small as possible while keeping the measured predictive performance reliable enough for practical use, that is, while keeping its variance within an acceptable range. This prevents the measurement from becoming unreliable because the test data is too small. It also avoids wasteful test processing that would not improve the reliability of the measurement because the test data is too large, which reduces the test processing load and shortens test time. The predictive performance of the learned model can therefore be measured reliably and quickly, making test processing more efficient. For example, compared with the conventional practice of setting the test data size to roughly one-half to one-quarter of the training data size, the test data size can be reduced while keeping the variance of the measured values at a comparable level.

REFERENCE SIGNS LIST
10 machine learning device
11 storage unit
12 processing unit
13 data set
14a, 14b, 14c, 18 training data
15, 19 test data
16 error information
17 correspondence

Claims

A machine learning program that causes a computer to execute a process comprising:
learning, by machine learning, a plurality of first models corresponding to a plurality of first training data extracted from a data set, using the plurality of first training data;
generating error information indicating a prediction error calculated for each combination of the plurality of first models and two or more records included in first test data extracted from the data set, by inputting each of the two or more records into the plurality of first models;
determining, based on the error information, a correspondence between a size of test data and a variance of a measured value of model accuracy calculated using the test data; and
when an accuracy of a second model learned using second training data extracted from the data set is measured using second test data extracted from the data set, determining a size of the second test data based on the correspondence such that the variance of the measured value of accuracy calculated for the second model satisfies a predetermined condition.

The machine learning program according to claim 1, wherein the correspondence is a nonlinear relationship in which the variance decreases, asymptotically approaching a lower limit, as the size of the test data increases, and the size of the second test data is determined based on an efficiency index indicating a degree of decrease in variance for a predetermined increase in size.

The machine learning program according to claim 2, wherein the predetermined condition is that a value of the efficiency index is equal to or greater than a threshold, and the size of the second test data is determined to be the largest size within a range satisfying the predetermined condition.

The machine learning program according to claim 1, wherein the determining of the correspondence includes calculating, for each of the two or more records, a prediction bias by averaging the prediction errors calculated for the plurality of first models, and combining the prediction biases of the two or more records to determine values of parameters representing the correspondence.

The machine learning program according to claim 1, wherein the determining of the correspondence includes estimating a value of a first parameter of a variance function that calculates the variance using the first parameter, which does not depend on the size of training data, a second parameter, which depends on the size of training data, and a third parameter, which indicates the size of test data, and
the determining of the size of the second test data includes estimating a value of the second parameter based on a learning result of the second model, and searching, by varying a value of the third parameter, for a test data size whose variance satisfies the predetermined condition.

The machine learning program according to claim 5, wherein the determining of the size of the second test data includes provisionally selecting a value of the second parameter based on prediction errors calculated by inputting the first test data into the second model, provisionally selecting a test data size using the provisionally selected value of the second parameter, and determining the value of the second parameter based on prediction errors calculated by inputting test data of the provisionally selected size, extracted from the data set, into the second model.

A machine learning method comprising, by a computer:
learning, by machine learning, a plurality of first models corresponding to a plurality of first training data extracted from a data set, using the plurality of first training data;
generating error information indicating a prediction error calculated for each combination of the plurality of first models and two or more records included in first test data extracted from the data set, by inputting each of the two or more records into the plurality of first models;
determining, based on the error information, a correspondence between a size of test data and a variance of a measured value of model accuracy calculated using the test data; and
when an accuracy of a second model learned using second training data extracted from the data set is measured using second test data extracted from the data set, determining a size of the second test data based on the correspondence such that the variance of the measured value of accuracy calculated for the second model satisfies a predetermined condition.

A machine learning apparatus comprising:
a storage unit that stores a data set; and
a processing unit that learns, by machine learning, a plurality of first models corresponding to a plurality of first training data extracted from the data set, using the plurality of first training data; generates error information indicating a prediction error calculated for each combination of the plurality of first models and two or more records included in first test data extracted from the data set, by inputting each of the two or more records into the plurality of first models; determines, based on the error information, a correspondence between a size of test data and a variance of a measured value of model accuracy calculated using the test data; and, when an accuracy of a second model learned using second training data extracted from the data set is measured using second test data extracted from the data set, determines a size of the second test data based on the correspondence such that the variance of the measured value of accuracy calculated for the second model satisfies a predetermined condition.