WO2019198408A1 - Learning device, learning method, and learning program - Google Patents

Learning device, learning method, and learning program Download PDF

Info

Publication number
WO2019198408A1
Authority
WO
WIPO (PCT)
Prior art keywords
experimental
output
learning
value
model
Prior art date
Application number
PCT/JP2019/010290
Other languages
French (fr)
Japanese (ja)
Inventor
豪啓 安藤
理貴 近藤
Original Assignee
富士フイルム株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士フイルム株式会社
Priority to JP2020513128A (patent JP6804009B2)
Publication of WO2019198408A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models

Definitions

  • the present disclosure relates to a learning device, a learning method, and a learning program.
  • A data analysis device has been proposed that extracts an evaluation item and its value for changing data having a second result value into data having a first result value, based on the relationship between the data having the first result value and the data having the second result value (see Japanese Patent Laid-Open No. 2000-305941).
  • When changing the value of an extracted evaluation item, this data analysis device examines the influence on the result value and calculates the effect of the change.
  • There has also been proposed a data set selection device that uses an active learning device including a plurality of different prediction algorithms to learn, with each of the prediction algorithms, the correspondence between a plurality of attribute values of training data and the corresponding output values (see Japanese Patent Laid-Open No. 2007-304782).
  • This data set selection device predicts output values corresponding to the prediction data using a plurality of correspondence relationships respectively learned by a plurality of prediction algorithms, and obtains a plurality of prediction values for each of the plurality of prediction algorithms.
  • This data set selection device also selects, from the data set of the prediction data, the items for which the variation among the plurality of prediction result values obtained from the plurality of prediction algorithms is large.
  • The technique described in Japanese Patent Laid-Open No. 2000-305941 can only search for data similar to data already existing in the database; therefore, even if it is applied to a method for searching for experimental conditions for generating a material, there is a problem that appropriate experimental conditions cannot always be found.
  • In addition, the techniques described in Japanese Patent Application Laid-Open Nos. 2007-304782 and 2016-530585 do not consider searching for new experimental conditions in the first place. This problem is not limited to the research and development of materials, and can also arise in the research and development of drugs.
  • This disclosure has been made in view of the above circumstances, and an object thereof is to enable search for appropriate experimental conditions for generating a material or a drug.
  • The learning device of the present disclosure includes: a derivation unit that takes as input a plurality of combinations of an experimental condition for generating a material or a drug and a performance value of the experimental result, inputs the plurality of combinations into an output model that outputs an experimental condition, inputs the output experimental condition into an experimental model for performing a virtual experiment, and derives an evaluation value of the output model using the performance value of the experimental result obtained from that input; and a learning unit that learns the output model by machine learning that reflects the evaluation value derived by the derivation unit.
  • In the learning device of the present disclosure, the evaluation value may be a better value as the ratio of values satisfying the target performance among the plurality of performance values is higher, or a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller.
  • the derivation unit may correct the evaluation value to be low when an experimental condition that does not satisfy a predetermined rule is output from the output model.
  • the derivation unit may correct the experimental condition output from the output model to an experimental condition that can be used for an actual experiment.
  • the output model may be a model learned using a genetic algorithm.
  • In another aspect, the learning device of the present disclosure includes a learning unit that, for an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and outputs an action value in reinforcement learning, selects an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output by inputting the plurality of combinations and a plurality of different experimental condition candidates, inputs the selected candidate into an experimental model for performing a virtual experiment, and learns the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • In this learning device of the present disclosure, the reward may be a better value as the ratio of values satisfying the target performance among the plurality of performance values is higher, or a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller.
  • the reinforcement learning may be Q learning
  • the action value may be a Q value
  • The learning device of the present disclosure may further include an output unit that outputs, as the next experimental condition candidate to be tested, the experimental condition candidate for which the cumulative action value obtained by sequentially inputting a plurality of experimental condition candidates to the output model multiple times is maximized.
  • the experimental model may be a model obtained by machine learning.
  • the learning device of the present disclosure may include a plurality of experimental models, and the creation conditions of each of the plurality of experimental models may be different.
  • the experimental model may be a mathematical expression configured to include a function such as sin or exp. As a result, an output model can be generated even in an experimental system in which no experimental data is obtained.
  • In the learning device of the present disclosure, there may be a plurality of output models, and the creation conditions of the plurality of output models may be different.
  • The learning method of the present disclosure is a method in which a computer executes a process of: taking as input a plurality of combinations of an experimental condition for generating a material or a drug and a performance value of the experimental result; inputting the plurality of combinations into an output model that outputs an experimental condition; deriving an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model for performing a virtual experiment; and learning the output model by machine learning that reflects the derived evaluation value.
  • The learning program of the present disclosure causes a computer to execute the same process: taking as input a plurality of combinations of an experimental condition for generating a material or a drug and a performance value of the experimental result, inputting the plurality of combinations into an output model that outputs an experimental condition, deriving an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model for performing a virtual experiment, and learning the output model by machine learning that reflects the derived evaluation value.
  • The learning method of the present disclosure may instead be a method in which, for an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and outputs an action value in reinforcement learning, a computer executes a process of selecting an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output by inputting the plurality of combinations and a plurality of different experimental condition candidates, inputting the selected candidate into an experimental model for performing a virtual experiment, and learning the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • The learning program of the present disclosure may likewise cause a computer to execute, for an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and outputs an action value in reinforcement learning, a process of selecting an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output by inputting the plurality of combinations and a plurality of different experimental condition candidates, inputting the selected candidate into an experimental model for performing a virtual experiment, and learning the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • The learning device of the present disclosure may also be expressed as a device having a processor that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of the experimental results, inputs the plurality of combinations into an output model that outputs experimental conditions, derives an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model for performing a virtual experiment, and learns the output model by machine learning that reflects the derived evaluation value.
  • Similarly, the learning device of the present disclosure may have a processor that, for an output model that takes as input a combination of an experimental condition for generating a material or a drug and a performance value of the experimental result together with an experimental condition candidate and outputs an action value in reinforcement learning, inputs an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value into an experimental model for performing a virtual experiment, and learns the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • the learning device 10 includes a derivation unit 12 and a learning unit 14. Further, the storage unit 42 (see FIG. 10) of the learning device 10 stores learning data 20, a plurality of output models 22, and a plurality of experiment models 24.
  • FIG. 2 shows an example of the learning data 20.
  • the learning data 20 includes a combination of an experimental condition for generating a material and a material performance value as an experimental result when an experiment is performed under the experimental condition.
  • the experimental conditions are, for example, conditions for generating a material such as a semiconductor resist material, and include a main component composition, an additive amount, and process conditions.
  • the main component composition indicates the ratio of the main component of the material
  • the additive amount indicates the concentration of the additive
  • the process condition indicates the temperature at which the material is generated.
  • the performance value of the learning data 20 indicates the performance value of the material when the material is generated under the corresponding experimental condition.
  • the performance value according to the present embodiment is a scale representing the quality of the material. Examples of the performance value include a degree of unevenness on the surface of the material and a degree representing whether a hole having a desired size is formed. In this embodiment, the smaller the performance value, the better the material.
  • the learning data 20 of the present embodiment includes a combination of a plurality of different experimental conditions and performance values.
  • the experimental conditions may include a plurality of the same conditions.
  • FIG. 3 shows an example of the output model 22.
  • the output model 22 according to this embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. A plurality of combinations of experimental conditions and performance values are input to the input layer of the output model 22.
  • the output layer of the output model 22 outputs one experimental condition.
  • FIG. 4 shows an example of the data structure of the experimental conditions output from the output layer of the output model 22. As shown in FIG. 4, the output layer of the output model 22 outputs experimental conditions including, for example, the main component composition, the additive amount, and the process conditions.
  • the output model 22 is configured, for example, as shown in the following (1) to (3).
  • (1) Number of nodes in the input layer: N × M, where N represents the number of items in an experimental condition and M represents the number of experiments.
  • (2) Configuration of the intermediate layers: 10 convolution layers, each with a 3 × 3 kernel, 32 filters, a stride of 2, and a ReLU activation function.
  • (3) Number of nodes in the output layer: N × 1.
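  • As a concrete illustration of configuration (1) to (3), the following is a minimal sketch in PyTorch, not taken from the patent: the treatment of the N × M input as a single-channel 2-D map, the padding, the pooling, and the final linear projection to one experimental condition are all assumptions made only so the sketch runs end to end.

```python
import torch
import torch.nn as nn

class OutputModelSketch(nn.Module):
    """Hypothetical stand-in for the output model 22: (1, N, M) table in, N-item condition out."""

    def __init__(self, n_items: int, n_layers: int = 10):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(n_layers):
            # (2): 3 x 3 kernel, 32 filters, stride 2, ReLU activation
            layers += [nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1), nn.ReLU()]
            in_ch = 32
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)      # assumed: collapse whatever spatial size remains
        self.head = nn.Linear(32, n_items)       # assumed head producing an N x 1 output (3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, N, M) -- N condition/performance items by M experiments (1)
        h = self.pool(self.features(x)).flatten(1)
        return self.head(h)                      # (batch, N): one candidate experimental condition
```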
  • the plurality of output models 22 according to the present embodiment have different model creation conditions. More specifically, the plurality of output models 22 have different model creation conditions depending on at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial value of the weight.
  • FIG. 5 shows an example of the experimental model 24.
  • the experimental model 24 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer.
  • the experimental model 24 is a model for performing a virtual experiment, and one experimental condition is input to the input layer of the experimental model 24.
  • the output layer of the experimental model 24 outputs a performance value of an experimental result corresponding to one experimental condition input to the input layer.
  • FIG. 6 shows an example of the data structure of the performance value of the experimental result output from the output layer of the experimental model 24.
  • the experimental model 24 may output a plurality of types of performance values. In this case, for example, the experimental model 24 outputs both the degree of unevenness on the surface of the material and the light sensitivity of the material as the performance value of the material.
  • the experimental model 24 is configured, for example, as shown in the following (4) to (6).
  • (4) Number of nodes in the input layer: N × 1, where N represents the number of items in an experimental condition.
  • (5) Configuration of the intermediate layers: 4 convolution layers, each with a 3 × 3 kernel, 32 filters, a stride of 2, and a ReLU activation function.
  • (6) Number of nodes in the output layer: 1 × J, where J represents the number of types of performance values.
  • the plurality of experimental models 24 have different model creation conditions. Specifically, the plurality of experimental models 24 have different model creation conditions, because at least one of the number of intermediate layers, the number of nodes of each layer of the intermediate layer, and the initial value of the weight is different.
  • The deriving unit 12 inputs a plurality of combinations of the experimental conditions for generating the material and the performance values of the experimental results to the output model 22, and acquires the experimental conditions output from the output model 22. Specifically, the derivation unit 12 first inputs the combinations of all experimental conditions and performance values included in the learning data 20 to the output model 22, and acquires the experimental conditions output from the output model 22. The derivation unit 12 may instead input to the output model 22 a plurality of combinations of some of the experimental conditions and performance values included in the learning data 20, or a plurality of combinations of experimental conditions and performance values different from the learning data 20.
  • the derivation unit 12 corrects the experimental condition output from the output model 22 to an experimental condition that can be used for an actual experiment.
  • the derivation unit 12 corrects the experimental condition output from the output model 22 to the closest experimental condition that satisfies the constraints of the experimental apparatus actually used.
  • For example, when the temperature that can be set in the process condition is limited to multiples of 5 °C and the temperature in the experimental condition output from the output model 22 is not a multiple of 5 °C (for example, 92.3 °C), the derivation unit 12 corrects the temperature output from the output model 22 to the nearest multiple of 5 °C (for example, 90 °C).
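  • A minimal sketch of this correction, assuming the only constraint is that the process temperature must be a multiple of 5 °C; the function name is a placeholder:

```python
def snap_temperature(temp_c: float, step: float = 5.0) -> float:
    """Round an output temperature to the nearest value the experimental apparatus can actually set."""
    return round(temp_c / step) * step

snap_temperature(92.3)  # -> 90.0, matching the example above
```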
  • the derivation unit 12 inputs the experimental conditions obtained by the correction to each experimental model 24, and acquires the performance values output from each experimental model 24, respectively.
  • The derivation unit 12 adds the combination of the experimental condition input to the corresponding experimental model 24 and the acquired performance value to the plurality of combinations of experimental conditions and performance values that were input to the output model 22, thereby obtaining a new plurality of combinations of experimental conditions and performance values.
  • The derivation unit 12 then inputs the obtained plurality of combinations of experimental conditions and performance values to the output model 22 again, and inputs the resulting experimental conditions to each of the corresponding experimental models 24. Thereby, the derivation unit 12 again obtains performance values corresponding to the experimental conditions input to the corresponding experimental models 24.
  • the deriving unit 12 adds a combination of the experimental condition input to the corresponding experimental model 24 and the obtained performance value to the plurality of combinations of the experimental condition and performance value input to the output model 22 described above. Then, the process of obtaining the performance value again using the corresponding experimental model 24 is repeated a predetermined number of times (for example, 100 times).
  • the derivation unit 12 performs the above processing on each output model 22. That is, for each output model 22, the derivation unit 12 obtains a plurality of combinations of the experimental conditions output from the output model 22 for a predetermined number of times and the performance values corresponding to the experimental conditions.
  • the derivation unit 12 derives an evaluation value of the output model 22 by using the obtained performance value for a predetermined number of times for each output model 22.
  • For example, the derivation unit 12 performs virtual experiments until a performance value satisfying the target performance (in this embodiment, a performance value equal to or less than the target value) is obtained, and derives the evaluation value of the output model 22 as a better value as the number of virtual experiments required (N in FIG. 7) is smaller.
  • In FIG. 7, the vertical axis indicates the performance value and the horizontal axis indicates the number of virtual experiments; in the example of FIG. 7, a performance value satisfying the target performance is obtained for the first time in the N-th virtual experiment.
  • Alternatively, the derivation unit 12 may derive the evaluation value of the output model 22 as a better value as the ratio of performance values satisfying the target performance among the obtained performance values for the predetermined number of times (the number of performance values enclosed by the chain-line rectangle in FIG. 8, relative to the number of all performance values) is higher. Note that "good" in FIG. 8 means that the target performance is satisfied. Further, the derivation unit 12 may derive the evaluation value of the output model 22 as a better value as each performance value is closer to the target value.
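  • A minimal sketch of the two evaluation criteria described above; the function names and the "smaller performance value is better" convention stated earlier for this embodiment are the only assumptions:

```python
def experiments_until_target(performance_values, target):
    """Criterion of FIG. 7 -- smaller is better: number of virtual experiments until the target is met."""
    for n, value in enumerate(performance_values, start=1):
        if value <= target:              # performance values at or below the target satisfy the target performance
            return n
    return len(performance_values)       # target never reached within the predetermined number of experiments

def ratio_meeting_target(performance_values, target):
    """Criterion of FIG. 8 -- larger is better: fraction of performance values meeting the target."""
    good = sum(1 for v in performance_values if v <= target)
    return good / len(performance_values)
```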
  • the derivation unit 12 may correct the evaluation value to be low when an experimental condition that does not satisfy a predetermined rule is output from the output model 22.
  • Examples of the predetermined rule include rules based on the user's empirical knowledge, such as that the material A and the material B are not to be mixed, or that five or more kinds of materials are not to be mixed.
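  • A minimal sketch of such a rule check, assuming the two example rules above, a hypothetical condition format with a "materials" field, and an evaluation value for which higher means better; the penalty size is arbitrary:

```python
FORBIDDEN_PAIRS = {frozenset({"material_A", "material_B"})}  # hypothetical rule table
MAX_KINDS_OF_MATERIALS = 4                                    # "five or more kinds are not mixed"

def violates_rules(condition: dict) -> bool:
    materials = set(condition["materials"])
    if len(materials) > MAX_KINDS_OF_MATERIALS:
        return True
    return any(pair <= materials for pair in FORBIDDEN_PAIRS)

def corrected_evaluation(evaluation: float, condition: dict, penalty: float = 1000.0) -> float:
    """Correct the evaluation value to be low (worse) when the output condition violates a rule."""
    return evaluation - penalty if violates_rules(condition) else evaluation
```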
  • the learning unit 14 learns the experimental model 24 using an error back propagation method as an example of machine learning. Specifically, the learning unit 14 inputs an experimental condition included in the learning data 20 to the experimental model 24 and acquires a performance value output from the experimental model 24. Then, the learning unit 14 learns the experimental model 24 so that the difference between the acquired performance value and the performance value corresponding to the experimental condition included in the learning data 20 is minimized. The learning unit 14 performs the process of learning the experimental model 24 using combinations of all experimental conditions and performance values included in the learning data 20. The learning unit 14 may learn the experimental model 24 using a plurality of combinations of some experimental conditions and performance values included in the learning data 20. Further, the data input to each experimental model 24 when the learning unit 14 learns each experimental model 24 may be the same data or different data among the experimental models 24.
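  • A minimal training-loop sketch for this step, assuming the experimental model 24 is a PyTorch module mapping an experimental condition to its performance values and that learning_data is a list of (condition, performance) tensor pairs; the optimizer, learning rate, and epoch count are illustrative choices, not values from the patent:

```python
import torch
import torch.nn as nn

def train_experimental_model(model: nn.Module, learning_data, epochs: int = 100) -> nn.Module:
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()  # minimize the difference between predicted and measured performance values
    for _ in range(epochs):
        for condition, performance in learning_data:
            optimizer.zero_grad()
            loss = loss_fn(model(condition), performance)
            loss.backward()   # error back-propagation
            optimizer.step()
    return model
```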
  • the learning unit 14 uses the evaluation value derived by the deriving unit 12 for each output model 22 to learn each output model 22 by machine learning using a genetic algorithm as an example of an optimization algorithm.
  • parameters such as an individual selection method (for example, roulette selection), a crossover method (for example, two-point crossover), and a mutation probability used in this genetic algorithm are preset by the user.
  • For example, the learning unit 14 generates a new output model 22 by crossing the two output models 22 having the best evaluations among the output models 22. This crossover is performed, for example, by combining the input layer and the input-layer-side half of the intermediate layers of one output model 22 with the output-layer-side half of the intermediate layers and the output layer of the other output model 22.
  • The crossover method is not limited to this example. For example, the crossover may be performed by combining the upper half of the input layer, intermediate layers, and output layer (in the orientation shown in FIG. 3) of one output model 22 with the lower half of the input layer, intermediate layers, and output layer of the other output model 22.
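  • A minimal sketch of the layer-splice crossover described above, assuming each output model 22 is represented simply as an ordered list of layers (or weight tensors) of equal length; selection, mutation, and the roulette and two-point variants mentioned earlier are omitted:

```python
def crossover(parent_a: list, parent_b: list) -> list:
    """Combine the input-side half of parent_a with the output-side half of parent_b."""
    assert len(parent_a) == len(parent_b), "this sketch assumes structurally identical parents"
    cut = len(parent_a) // 2
    return parent_a[:cut] + parent_b[cut:]
```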
  • the learning unit 14 generates the next generation output model 22 using a genetic algorithm so that the number of output models 22 does not change between generations. That is, the output model 22 is learned by updating the weight value of the output model 22 by using a genetic algorithm. In addition, by learning the output model 22, the evaluation value derived by the deriving unit 12 is reflected.
  • the derivation process of the evaluation value of each output model 22 by the derivation unit 12 and the learning process of the output model 22 group by the learning unit 14 are performed for a predetermined number of generations (for example, 10,000 generations). Then, the learning unit 14 stores, in the storage unit 42, one output model 22 having the best evaluation indicated by the evaluation value in the final generation as an output model 22A used in an operation phase described later. Note that the derivation process of the evaluation value of each output model 22 by the derivation unit 12 and the learning process of the output model 22 group by the learning unit 14 may be performed until the evaluation value converges.
  • the learning device 10 includes a reception unit 30 and an output unit 32. Further, the storage unit 42 of the learning device 10 stores the output model 22A obtained in the learning phase described above.
  • the accepting unit 30 accepts a plurality of combinations of an experimental condition for generating a material input by the user via the input unit 44 (see FIG. 10) and a performance value of the material of the experimental result.
  • the output unit 32 inputs a plurality of combinations of the experimental conditions and performance values received by the receiving unit 30 to the output model 22A, and acquires the experimental conditions output from the output model 22A.
  • the output unit 32 corrects the experimental condition output from the output model 22A to an experimental condition that can be used for an actual experiment, as in the derivation unit 12 in the learning phase. Then, the output unit 32 outputs the experimental condition obtained by the correction to the display unit 43 (see FIG. 10).
  • the user visually observes the experimental conditions displayed on the display unit 43 and performs an experiment under the experimental conditions as necessary. Note that the output unit 32 may output (store) experimental conditions obtained by the correction to the storage unit 42.
  • The learning device 10 is realized by the computer shown in FIG. 10. As illustrated in FIG. 10, the learning device 10 includes a CPU (Central Processing Unit) 40, a memory 41 as a temporary storage area, and a nonvolatile storage unit 42.
  • the learning apparatus 10 includes a display unit 43 such as a liquid crystal display and an input unit 44 such as a keyboard and a mouse.
  • the CPU 40, the memory 41, the storage unit 42, the display unit 43, and the input unit 44 are connected via a bus 45.
  • the storage unit 42 is realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like.
  • a learning program 50 is stored in the storage unit 42 as a storage medium.
  • the CPU 40 reads the learning program 50 from the storage unit 42, and executes the read learning program 50 after expanding it in the memory 41.
  • When the CPU 40 executes the learning program 50, it functions as the derivation unit 12, the learning unit 14, the reception unit 30, and the output unit 32.
  • When the learning device 10 executes the learning program 50, the experimental model learning process shown in FIG. 11, the output model learning process shown in FIG. 12, and the experiment condition output process shown in FIG. 13 are executed.
  • the experimental model learning process illustrated in FIG. 11 is executed, for example, when an instruction to perform the experimental model learning process is input by the user via the input unit 44 in the learning phase.
  • the output model learning process illustrated in FIG. 12 is executed, for example, when a user inputs an execution instruction for the output model learning process via the input unit 44 in the learning phase.
  • the experiment condition output process shown in FIG. 13 is executed, for example, when the user inputs an execution instruction for the experiment condition output process via the input unit 44 in the operation phase.
  • In step S10 of FIG. 11, the learning unit 14 reads the learning data 20 from the storage unit 42.
  • In step S12, the learning unit 14 generates a plurality of experimental models 24 having different model creation conditions.
  • In step S14, the learning unit 14 selects one experimental model 24 to be learned from the plurality of experimental models 24 generated by the processing in step S12. Note that when the process of step S14 is repeatedly executed, the learning unit 14 selects an experimental model 24 that has not been selected so far.
  • In step S16, the learning unit 14 uses the learning data 20 read in step S10 to learn the experimental model 24 selected in step S14 by the error back-propagation method.
  • In step S18, the learning unit 14 stores the experimental model 24 learned by the process in step S16 in the storage unit 42.
  • In step S20, the learning unit 14 determines whether or not the processing in steps S14 to S18 has been completed for all the experimental models 24 generated by the processing in step S12. If this determination is negative, the process returns to step S14. If the determination is affirmative, the experimental model learning process ends.
  • the learning unit 14 generates a plurality of output models 22 having different model creation conditions.
  • In step S32, the derivation unit 12 inputs a plurality of combinations of experimental conditions for generating materials and performance values of experimental results to each output model 22, and acquires the experimental conditions output from each output model 22.
  • When step S32 is executed for the first time of each generation of the output models 22 (that is, when step S32 is executed for the very first time, or for the first time after a negative determination in step S46 described later), the plurality of combinations of experimental conditions and performance values used are the combinations of all the experimental conditions and performance values included in the learning data 20.
  • When step S32 is executed for the second and subsequent times of each generation of the output models 22 (that is, after a negative determination in step S40), the plurality of combinations used are the plurality of combinations of experimental conditions and performance values input to the output model 22 in the previous execution of step S32, to which the combination of the experimental condition and the performance value is added in step S38 described later.
  • In step S34, the derivation unit 12 corrects the experimental conditions output from each output model 22 by the processing in step S32 to experimental conditions that can be used in an actual experiment.
  • In step S36, the derivation unit 12 inputs each experimental condition corrected by the processing in step S34 to each experimental model 24, and acquires the performance value output from each experimental model 24. Further, the derivation unit 12 holds, for each output model 22, a plurality of combinations of experimental conditions and performance values corresponding to the experimental conditions output from that output model 22.
  • In step S38, the derivation unit 12 adds, to the plurality of combinations of experimental conditions and performance values input to the output model 22 by the immediately preceding processing of step S32, the combination of the experimental condition input to the experimental model 24 and the performance value acquired by the process of step S36 this time. The plurality of combinations of experimental conditions and performance values obtained by this addition are used in the step S32 executed next, after a negative determination is made in step S40 described later.
  • In step S40, the derivation unit 12 determines whether or not the processes in steps S32 to S38 have been repeated a predetermined number of times (for example, 100 times). If the determination is negative, the process returns to step S32. If the determination is affirmative, the process proceeds to step S42.
  • In step S42, the derivation unit 12 derives, for each output model 22, the evaluation value of the output model 22 using the performance values for the predetermined number of times obtained by the repeated processing of steps S32 to S38.
  • In step S44, the learning unit 14 generates the next generation of output models 22 by a genetic algorithm using the evaluation value derived by the process of step S42 for each output model 22.
  • This next-generation output model 22 is used in step S32 to be executed next after a negative determination is made in step S46 described later.
  • In step S46, the learning unit 14 determines whether or not the number of generations of the output models 22 has reached a predetermined number of generations (for example, 10,000 generations). If this determination is negative, the process returns to step S32. If the determination is affirmative, the process proceeds to step S48.
  • In step S48, as described above, the learning unit 14 stores one output model 22 having the best evaluation indicated by the evaluation value in the final generation in the storage unit 42 as the output model 22A. When the process of step S48 ends, the output model learning process ends.
  • In step S50 of FIG. 13, the accepting unit 30 accepts a plurality of combinations of the experimental conditions for generating the material input by the user via the input unit 44 and the performance values of the material of the experimental results.
  • In step S52, the output unit 32 reads the output model 22A from the storage unit 42.
  • In step S54, the output unit 32 inputs the plurality of combinations of experimental conditions and performance values received by the process of step S50 to the output model 22A read by the process of step S52, and acquires the experimental condition output from the output model 22A.
  • In step S56, the output unit 32 corrects the experimental condition output from the output model 22A by the processing in step S54 to an experimental condition usable in an actual experiment.
  • In step S58, the output unit 32 outputs the experimental condition corrected by the process of step S56 to the display unit 43 as described above. Through the processing in step S58, the experimental condition is displayed on the display unit 43. When the process of step S58 ends, the experiment condition output process ends.
  • As described above, according to the present embodiment, the experimental conditions output from the output model 22 are input to the experimental model 24 for performing a virtual experiment.
  • The evaluation value of the output model 22 is derived using the performance values of the experimental results obtained by this input.
  • The output model 22 is then learned by machine learning using the derived evaluation value. Therefore, by using the output model 22 learned in this way, it is possible to search for appropriate experimental conditions for the material.
  • the learning device 10 includes a derivation unit 12A, a learning unit 14A, and a generation unit 16.
  • the storage unit 42 stores learning data 20, a plurality of output models 22B, and a plurality of experimental models 24.
  • FIG. 15 shows an example of the output model 22B.
  • the output model 22B according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer.
  • a plurality of combinations of experimental conditions for generating materials and performance values of experimental results, and one experimental condition candidate are input to the input layer of the output model 22B.
  • The output layer of the output model 22B outputs a Q value as an example of an action value in reinforcement learning. That is, the learning device 10 according to the present embodiment learns the output model 22B by Q learning, as an example of reinforcement learning, with the plurality of combinations of experimental conditions and performance values as the current state s and the experimental condition candidate as the action a.
  • the plurality of output models 22B according to the present embodiment also have different model creation conditions, like the output model 22 according to the first embodiment.
  • the generation unit 16 generates a plurality of different experimental condition candidates.
  • the generation unit 16 generates experimental condition candidates that satisfy a predetermined rule and can be used in an actual experiment. Since this rule and the experimental conditions that can be used in the actual experiment are the same as those in the first embodiment, description thereof is omitted. Specifically, each time a plurality of different experimental condition candidates are generated, the generation unit 16 randomly generates experimental condition candidates that satisfy a predetermined rule and can be used in an actual experiment.
  • the deriving unit 12A derives a value (hereinafter referred to as “reward value”) used as a reward when the learning unit 14A described later learns each output model 22B according to Q learning.
  • Specifically, the derivation unit 12A inputs, to the output model 22B, a plurality of combinations of experimental conditions for generating materials and performance values of experimental results, together with an experimental condition candidate generated by the generation unit 16, and acquires the Q value output from the output model 22B.
  • The derivation unit 12A performs this input individually for all the generated experimental condition candidates, pairing the plurality of combinations of experimental conditions and performance values with each one of the plurality of experimental condition candidates generated by the generation unit 16. That is, the derivation unit 12A acquires the Q value output from the output model 22B for each of the plurality of experimental condition candidates generated by the generation unit 16.
  • the derivation unit 12A inputs the experimental condition candidate corresponding to any of the Q values equal to or greater than a predetermined value among the plurality of acquired Q values to the experimental model 24.
  • In the present embodiment, the derivation unit 12A inputs the experimental condition candidate corresponding to the maximum Q value among the plurality of acquired Q values to each experimental model 24, and acquires the performance value output from each experimental model 24.
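  • A minimal sketch of this selection step, assuming output_model is a callable that scores one (state, candidate) pair with a Q value; all names are placeholders:

```python
def best_candidate(output_model, state, candidates):
    """Return the generated experimental condition candidate with the largest Q value."""
    q_values = [output_model(state, c) for c in candidates]
    best_index = max(range(len(candidates)), key=q_values.__getitem__)
    return candidates[best_index]
```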
  • the derivation unit 12A holds a plurality of combinations of experimental conditions and performance values of experimental results, respectively.
  • the derivation unit 12A includes an experimental condition and a performance value obtained by adding a combination of the experimental condition input to the experimental model 24 and the derived performance value to a plurality of combinations of the experimental condition and the performance value input to the output model 22B. Get multiple combinations.
  • The derivation unit 12A again inputs, to the output model 22B, the plurality of combinations of experimental conditions and performance values obtained in this way, together with each one of the plurality of experimental condition candidates generated by the generation unit 16, individually for all the generated candidates.
  • Similarly to the above-described processing, the derivation unit 12A again acquires the Q value output from the output model 22B for each of the plurality of experimental condition candidates and, using the experimental models 24, acquires the performance value corresponding to the selected experimental condition candidate.
  • the deriving unit 12A repeats the process for acquiring the performance value corresponding to the experimental condition candidate a predetermined number of times (for example, 100 times).
  • the derivation unit 12A performs the above processing for each output model 22B. That is, the derivation unit 12A acquires a performance value for a predetermined number of times for each output model 22B. Similarly to the derivation unit 12 according to the first embodiment, the derivation unit 12A derives an evaluation value of the output model 22B for each output model 22B using the obtained performance values for a predetermined number of times (see FIG. 7). .
  • The derivation unit 12A derives the reward value so that a higher reward is obtained for an output model 22B having a higher derived evaluation value. For example, the derivation unit 12A derives the reward value of the top three output models 22B in descending order of evaluation value as “1”, derives the reward value of the bottom three output models 22B as “−1”, and derives the reward value of the remaining output models 22B as “0”.
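  • A minimal sketch of this reward assignment, assuming at least seven output models 22B, a "higher evaluation value is better" ordering, and no ties:

```python
def assign_rewards(evaluations):
    """Top three models get reward 1, bottom three get -1, every other model gets 0."""
    order = sorted(range(len(evaluations)), key=lambda i: evaluations[i], reverse=True)
    rewards = [0.0] * len(evaluations)
    for i in order[:3]:
        rewards[i] = 1.0
    for i in order[-3:]:
        rewards[i] = -1.0
    return rewards
```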
  • the learning unit 14A learns each output model 22B by using the reward value derived by the derivation unit 12A as the reward r in the Q learning.
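  • For background only: a standard Q-learning temporal-difference target, which the reward value above would enter as r; the patent does not spell out the update rule it uses, and the discount factor here is an arbitrary example:

```python
def q_learning_target(reward: float, next_q_values, gamma: float = 0.99) -> float:
    """Standard target r + gamma * max_a' Q(s', a') used to update the predicted Q value."""
    return reward + gamma * max(next_q_values)
```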
  • the process for deriving the reward value of each output model 22B by the deriving unit 12A and the learning process of each output model 22B by the learning unit 14A are performed a predetermined number of times (for example, 10,000 times). Then, in the last round, the learning unit 14A stores one output model 22B having the best evaluation indicated by the evaluation value in the storage unit 42 as an output model 22C used in an operation phase described later. Note that the process for deriving the reward value of each output model 22B by the deriving unit 12A and the learning process of each output model 22B by the learning unit 14A may be performed until the evaluation value converges.
  • the learning unit 14A uses the learning data 20 to learn the experimental model 24 according to the error back-propagation method, similarly to the learning unit 14 according to the first embodiment.
  • the learning device 10 includes a generation unit 16, a reception unit 30, and an output unit 32A. Further, the storage unit 42 of the learning device 10 stores the output model 22C obtained in the learning phase described above.
  • The output unit 32A inputs, to the output model 22C, the plurality of combinations of experimental conditions and performance values received by the receiving unit 30 together with each one of the plurality of experimental condition candidates generated by the generation unit 16, individually for all the generated candidates.
  • the output unit 32A acquires the Q value output from the output model 22C corresponding to each of the inputs. Then, the output unit 32A outputs the experimental condition candidate corresponding to the maximum Q value among the acquired Q values to the display unit 43 as the experimental condition candidate to be the next experiment target.
  • The output unit 32A may instead output, as the experimental condition candidate to be the next experiment target, an experimental condition candidate corresponding to any Q value that is equal to or higher than a predetermined value (for example, the second largest Q value that is equal to or higher than the predetermined value).
  • The output unit 32A may also output (store) to the storage unit 42 the experimental condition candidate corresponding to the maximum Q value among the acquired Q values as the experimental condition candidate to be the next experiment target.
  • Since the hardware configuration of the learning device 10 according to the present embodiment is the same as that of the learning device 10 according to the first embodiment (see FIG. 10), the description thereof is omitted.
  • When the CPU 40 executes the learning program 50, it functions as the derivation unit 12A, the learning unit 14A, the generation unit 16, the reception unit 30, and the output unit 32A.
  • the output model learning process illustrated in FIG. 18 is executed, for example, when a user inputs an execution instruction for the output model learning process via the input unit 44 in the learning phase.
  • the experiment condition output process shown in FIG. 19 is executed, for example, when the user inputs an instruction to execute the experiment condition output process via the input unit 44 in the operation phase.
  • In step S60, the learning unit 14 generates a plurality of output models 22B having different model creation conditions.
  • The processes in steps S62 to S70 are executed in the same way for each output model 22B generated by the processing of step S60.
  • In step S62, the generation unit 16 generates a plurality of different experimental condition candidates as described above.
  • In step S64, as described above, the derivation unit 12A inputs a plurality of combinations of the experimental conditions for generating the material and the performance values of the experimental results, together with the experimental condition candidates generated by the process of step S62, to the output model 22B, and acquires the Q values output from the output model 22B.
  • When step S64 is executed for the first time in the learning process of the output model 22B (that is, when step S64 is executed for the very first time, or for the first time after a negative determination in step S78 described later), the plurality of combinations of experimental conditions and performance values used are the combinations of all the experimental conditions and performance values included in the learning data 20.
  • When step S64 is executed for the second and subsequent times in the learning process of the output model 22B (that is, after a negative determination in step S70), the plurality of combinations used are the plurality of combinations of experimental conditions and performance values input to the output model 22B in the previous execution of step S64, to which the combination of the experimental condition and the performance value is added in step S68 described later.
  • In step S66, the derivation unit 12A inputs the experimental condition candidate corresponding to the maximum Q value among the plurality of Q values acquired by the process of step S64 to each experimental model 24, and acquires the performance value output from each experimental model 24. Further, the derivation unit 12A holds, for the experimental condition candidate corresponding to the maximum Q value, a plurality of combinations of the experimental condition and the performance value of the experimental result.
  • In step S68, the derivation unit 12A adds, to the plurality of combinations of experimental conditions and performance values input to the output model 22B by the immediately preceding processing of step S64, the combination of the experimental condition input to the experimental model 24 and the performance value acquired by the process of step S66 this time. The plurality of combinations obtained by this addition are used in the step S64 executed next, after a negative determination is made in step S70 described later.
  • In step S70, the derivation unit 12A determines whether or not the processes in steps S62 to S68 have been repeated a predetermined number of times (for example, 100 times). If this determination is negative, the process returns to step S62. If the determination is affirmative, the process proceeds to step S72.
  • In step S72, the derivation unit 12A derives, for each output model 22B, the evaluation value of the output model 22B using the performance values for the predetermined number of times obtained by the repeated processing of steps S62 to S68.
  • In step S74, as described above, the derivation unit 12A derives the reward value so that a higher reward is obtained for an output model 22B having a higher evaluation value derived by the process of step S72.
  • In step S76, the learning unit 14A learns each output model 22B using the reward value derived by the process in step S74 as the reward r in the Q learning.
  • In step S78, the learning unit 14 determines whether or not the processes in steps S62 to S76 have been repeated a predetermined number of times (for example, 10,000 times). If this determination is negative, the process returns to step S62. If the determination is affirmative, the process proceeds to step S80.
  • In step S80, as described above, the learning unit 14A stores in the storage unit 42, as the output model 22C, one output model 22B having the best evaluation indicated by the evaluation value derived by the process of step S72 executed last. When the process of step S80 ends, the output model learning process ends.
  • the accepting unit 30 accepts a plurality of combinations of the experimental conditions for generating the material input by the user via the input unit 44 and the performance value of the material of the experimental result.
  • The output unit 32A reads the output model 22C from the storage unit 42.
  • the generation unit 16 generates a plurality of different experimental condition candidates as described above.
  • In step S96, the output unit 32A inputs, to the output model 22C, the plurality of combinations of experimental conditions and performance values received by the process of step S90 together with each one of the plurality of experimental condition candidates generated by the process of step S92, individually for all the generated experimental condition candidates.
  • the output unit 32A acquires the Q value output from the output model 22C corresponding to each of the inputs.
  • In step S98, the output unit 32A outputs the experimental condition candidate corresponding to the maximum Q value among the plurality of Q values acquired by the process of step S96 to the display unit 43 as the experimental condition candidate to be tested next.
  • the experiment condition output process ends.
  • As described above, according to the present embodiment, the experimental condition candidate that maximizes the Q value output from the output model 22B, which takes as input a plurality of combinations of experimental conditions for generating materials and performance values of experimental results together with an experimental condition candidate, is input to the experimental model 24.
  • an evaluation value of the output model 22B is derived using the performance value of the experimental result obtained by this input, and a reward given to the output model 22B is derived according to the derived evaluation value.
  • the output model 22B is learned by Q learning using the derived reward. Therefore, by using the output model 22B learned in this way, it is possible to search for an appropriate experimental condition for the material.
  • Each of the above embodiments has described the case where experimental conditions for generating a material are searched for; experimental conditions for generating a drug can be searched for in the same manner.
  • Although each of the above embodiments has described the case where the experimental model 24 is a model obtained by machine learning, the present disclosure is not limited to this as long as the experimental model allows a virtual experiment to be performed.
  • For example, as the experimental model 24, an arbitrary function that takes one experimental condition as an input and outputs a performance value of the experimental result corresponding to that experimental condition may be applied. Even when such a model is applied, the output models 22 and 22B are optimized by learning.
  • the experimental model 24 may be a simulator that simulates an experiment.
  • The output unit 32A may output, as the experimental condition candidate to be the next experiment target, the experimental condition candidate for which the cumulative Q value obtained by sequentially inputting a plurality of experimental condition candidates to the output model 22C is maximized.
  • the output unit 32A first obtains a Q value corresponding to each of a plurality of first experimental condition candidates from the output model 22C, as in the second embodiment.
  • Next, the output unit 32A adds, for example, the combination of a first experimental condition candidate and its performance value to the plurality of combinations of experimental conditions and performance values that were input to the output model 22C the first time. This performance value may be estimated by a known method such as an SVM (Support Vector Machine).
  • The output unit 32A then inputs the plurality of combinations of experimental conditions and performance values obtained by this addition, together with a plurality of second experimental condition candidates, to the output model 22C, and obtains a Q value corresponding to each of the second experimental condition candidates.
  • the output unit 32A outputs the experimental condition candidate that maximizes the cumulative value of the first Q value and the second Q value as the next experimental condition candidate.
  • Although the case of using the cumulative value of two Q values has been described here, the cumulative value of three or more Q values can be used in the same manner.
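  • A minimal sketch of this two-step variant, assuming model scores one (state, candidate) pair with a Q value, estimate_performance stands in for the known estimation method (for example an SVM regressor), and extend_state appends a (candidate, estimated performance) combination to the state; all names are placeholders:

```python
def best_two_step_candidate(model, state, first_candidates, second_candidates,
                            estimate_performance, extend_state):
    """Return the first-round candidate maximizing the cumulative Q value over two rounds."""
    best, best_score = None, float("-inf")
    for c1 in first_candidates:
        q1 = model(state, c1)
        next_state = extend_state(state, c1, estimate_performance(c1))
        q2 = max(model(next_state, c2) for c2 in second_candidates)
        if q1 + q2 > best_score:
            best, best_score = c1, q1 + q2
    return best
```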
  • The various processes that the CPU executes by running software (programs) in each of the above embodiments may instead be executed by various processors other than a CPU.
  • Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed exclusively to execute specific processing, such as an ASIC (Application Specific Integrated Circuit).
  • The above-described various processes may be executed by one of these various processors, or by a combination of two or more processors of the same type or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA).
  • the hardware structure of these various processors is an electric circuit in which circuit elements such as semiconductor elements are combined.
  • the learning program 50 has been previously stored (installed) in the storage unit 42.
  • The learning program 50 may be provided in a form recorded on a non-transitory recording medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory.
  • the learning program 50 may be downloaded from an external device via a network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)

Abstract

Provided is a learning device comprising: a derivation unit for deriving an evaluation value for an output model which takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and outputs the experimental conditions, said evaluation value of said output model being derived using the performance values of the experimental results obtained by inputting, into an experimental model for virtually carrying out experiments, the experimental conditions outputted by inputting the plurality of combinations into the output model; and a learning unit for learning the output model by machine learning taking into account the evaluation value derived by the derivation unit.

Description

Learning device, learning method, and learning program
 The learning device of the present disclosure includes: a derivation unit that inputs a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results into an output model that takes such combinations as input and outputs an experimental condition, inputs the experimental condition output by the output model into an experimental model that performs a virtual experiment, and derives an evaluation value of the output model using the performance value of the experimental result thus obtained; and a learning unit that trains the output model by machine learning that reflects the evaluation value derived by the derivation unit.
 This makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the evaluation value may be a value that is better as the ratio of values satisfying the target performance among the plurality of performance values is higher, a value that is better as the number of virtual experiments required before a performance value satisfying the target performance is obtained is smaller, or a value that is better as the performance value is closer to the target performance.
 Because the search behavior is evaluated using an appropriate performance value, an appropriate output model is obtained, which in turn makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the derivation unit may correct the evaluation value downward when the output model outputs an experimental condition that does not satisfy a predetermined rule.
 This increases the likelihood of obtaining experimental conditions that satisfy predetermined rules based on past empirical knowledge, and thus makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the derivation unit may correct the experimental condition output from the output model to an experimental condition that can be used in an actual experiment.
 This increases the likelihood of obtaining experimental conditions usable in actual experiments, and thus makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the output model may be a model trained using a genetic algorithm.
 This makes it possible to search for more appropriate experimental conditions for generating a material or a drug.
 The learning device of the present disclosure, in another aspect, includes a learning unit that trains an output model which takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results, together with a candidate experimental condition, and outputs an action value for reinforcement learning. Among the plurality of action values output by inputting the plurality of combinations together with each of a plurality of different candidate experimental conditions, the candidate corresponding to an action value equal to or greater than a predetermined value is input into an experimental model that performs a virtual experiment, and the output model is trained using, as a reward, a value derived on the basis of the performance value of the experimental result thus obtained.
 This makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In this learning device, the reward may be a value that is better as the ratio of values satisfying the target performance among the plurality of performance values is higher, a value that is better as the number of virtual experiments required before a performance value satisfying the target performance is obtained is smaller, or a value that is better as the performance value is closer to the target performance.
 As a result of using an appropriate value as the performance measure, it is possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the reinforcement learning may be Q-learning and the action value may be a Q value.
 This makes it possible to search for more appropriate experimental conditions for generating a material or a drug.
 The learning device of the present disclosure may further include an output unit that, when the output model trained by the learning unit is used, sequentially inputs a plurality of candidate experimental conditions into the output model a plurality of times and outputs, as the candidate experimental condition to be tested next, the candidate for which the cumulative action value is largest.
 This makes it possible to search for more appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the experimental model may be a model obtained by machine learning.
 This makes it possible to generate an output model specialized for a specific problem.
 In the learning device of the present disclosure, a plurality of experimental models may exist, each created under different conditions.
 By using a plurality of virtual experimental results obtained from experimental models created under different conditions, it is possible to search for more appropriate experimental conditions for generating a material or a drug. The experimental model may also be a mathematical expression including functions such as sin or exp, which makes it possible to generate an output model even for an experimental system for which no experimental data has been obtained at all.
 In the learning device of the present disclosure, a plurality of output models may exist, each created under different conditions.
 By learning from the evaluation of the plurality of performance values obtained from the plurality of experimental conditions output by output models created under different conditions, it is possible to search for more appropriate experimental conditions for generating a material or a drug.
 The learning method of the present disclosure is a method in which a computer executes processing of inputting a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results into an output model that outputs an experimental condition, deriving an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model that performs a virtual experiment, and training the output model by machine learning that reflects the derived evaluation value.
 The learning program of the present disclosure causes a computer to execute the same processing: deriving the evaluation value of the output model from the virtual experimental results and training the output model by machine learning that reflects the derived evaluation value.
 The learning method of the present disclosure, in another aspect, is a method in which a computer executes processing of training an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results, together with a candidate experimental condition, and outputs an action value for reinforcement learning; among the plurality of action values output by inputting the plurality of combinations together with each of a plurality of different candidate experimental conditions, the candidate corresponding to an action value equal to or greater than a predetermined value is input into an experimental model that performs a virtual experiment, and a value derived on the basis of the performance value of the experimental result thus obtained is used as the reward for the training.
 The learning program of the present disclosure, in another aspect, causes a computer to execute this reinforcement-learning-based training processing.
 The learning device of the present disclosure may also be expressed as having a processor that inputs the plurality of combinations into the output model, inputs the output experimental condition into the experimental model that performs a virtual experiment, derives the evaluation value of the output model using the performance value of the resulting experimental result, and trains the output model by machine learning that reflects the derived evaluation value.
 Similarly, the learning device of the present disclosure may be expressed as having a processor that trains the output model for reinforcement learning, using as a reward a value derived on the basis of the performance value obtained by inputting, into the experimental model, a candidate experimental condition corresponding to an action value equal to or greater than the predetermined value.
 According to the present disclosure, it is possible to search for appropriate experimental conditions for generating a material or a drug.
FIG. 1 is a block diagram showing an example of the functional configuration of the learning device in the learning phase according to the first embodiment.
FIG. 2 is a diagram showing an example of the learning data according to each embodiment.
FIG. 3 is a diagram showing an example of the output model according to the first embodiment.
FIG. 4 is a diagram showing an example of the data structure of data output from the output model according to the first embodiment.
FIG. 5 is a diagram showing an example of the experimental model according to each embodiment.
FIG. 6 is a diagram showing an example of the data structure of data output from the experimental model according to each embodiment.
FIG. 7 is a diagram for explaining the process of deriving the evaluation value of the output model according to the first embodiment.
FIG. 8 is a diagram for explaining the process of deriving the evaluation value of the output model according to a modification.
FIG. 9 is a block diagram showing an example of the functional configuration of the learning device in the operation phase according to the first embodiment.
FIG. 10 is a block diagram showing an example of the hardware configuration of the learning device according to each embodiment.
FIG. 11 is a flowchart showing an example of the experimental model learning process according to each embodiment.
FIG. 12 is a flowchart showing an example of the output model learning process according to the first embodiment.
FIG. 13 is a flowchart showing an example of the experimental condition output process according to the first embodiment.
FIG. 14 is a block diagram showing an example of the functional configuration of the learning device in the learning phase according to the second embodiment.
FIG. 15 is a diagram showing an example of the output model according to the second embodiment.
FIG. 16 is a diagram for explaining the process of deriving the evaluation value of the output model according to the second embodiment.
FIG. 17 is a block diagram showing an example of the functional configuration of the learning device in the operation phase according to the second embodiment.
FIG. 18 is a flowchart showing an example of the output model learning process according to the second embodiment.
FIG. 19 is a flowchart showing an example of the experimental condition output process according to the second embodiment.
 Hereinafter, exemplary embodiments for carrying out the technology of the present disclosure will be described in detail with reference to the drawings.
 [First Embodiment]
 First, the functional configuration of the learning device 10 in the learning phase according to the present embodiment will be described with reference to FIG. 1. As shown in FIG. 1, the learning device 10 includes a derivation unit 12 and a learning unit 14. The storage unit 42 of the learning device 10 (see FIG. 10) stores learning data 20, a plurality of output models 22, and a plurality of experimental models 24.
 FIG. 2 shows an example of the learning data 20. As shown in FIG. 2, the learning data 20 according to the present embodiment includes combinations of an experimental condition for generating a material and the performance value of the material obtained when an experiment is performed under that condition. The experimental conditions are, for example, conditions for generating a material such as a semiconductor resist material, and include a main-component composition, an additive amount, and a process condition. In the example of FIG. 2, the main-component composition indicates the ratio of the main components of the material, the additive amount indicates the concentration of an additive, and the process condition indicates the temperature at which the material is generated.
 The performance value in the learning data 20 indicates the performance of the material generated under the corresponding experimental condition. The performance value according to the present embodiment is a measure of how well the material turned out, such as the degree of unevenness of the material surface or the degree to which a hole of a desired size was formed. In the present embodiment, a smaller performance value indicates a better material. The learning data 20 of the present embodiment includes combinations of a plurality of different experimental conditions and performance values; the same experimental condition may appear more than once.
 FIG. 3 shows an example of the output model 22. As shown in FIG. 3, the output model 22 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. A plurality of combinations of experimental conditions and performance values are input to the input layer of the output model 22, and the output layer outputs one experimental condition. FIG. 4 shows an example of the data structure of the experimental condition output from the output layer of the output model 22; it includes, for example, a main-component composition, an additive amount, and a process condition.
 Specifically, the output model 22 is configured, for example, as shown in (1) to (3) below.
(1) Number of nodes in the input layer: N × M, where N is the number of items in an experimental condition and M is the number of experiments.
(2) Configuration of the intermediate layers: ten convolution layers with a 3 × 3 kernel, 32 filters, a stride of 2, and ReLU activation.
(3) Number of nodes in the output layer: N × 1
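As a rough illustration of configuration (1) to (3), the following is a minimal Keras sketch. Several points are assumptions of the sketch rather than details given in this publication: the M rows of the history are stacked into a single-channel 2-D array, each row is assumed to hold the N condition items plus the corresponding performance value, padding="same" is used so that ten stride-2 convolutions stay well defined for small inputs, and the default parameter values are only examples.

```python
import tensorflow as tf

def build_output_model(n_items=3, n_experiments=100, n_conv=10):
    """Output model 22: a history of (condition, performance) rows in, one condition out."""
    # One row per past experiment: n_items condition values plus one performance value.
    inputs = tf.keras.Input(shape=(n_experiments, n_items + 1, 1))
    x = inputs
    for _ in range(n_conv):  # ten 3x3 / 32-filter / stride-2 / ReLU convolution layers
        x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same",
                                   activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    # The output layer produces one experimental condition (N values).
    outputs = tf.keras.layers.Dense(n_items)(x)
    return tf.keras.Model(inputs, outputs)
```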
 The plurality of output models 22 according to the present embodiment are created under different conditions. Specifically, they differ in at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial values of the weights.
 FIG. 5 shows an example of the experimental model 24. As shown in FIG. 5, the experimental model 24 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. The experimental model 24 is a model that performs a virtual experiment: one experimental condition is input to its input layer, and its output layer outputs the performance value of the experimental result corresponding to that condition. FIG. 6 shows an example of the data structure of the performance value output from the output layer of the experimental model 24. The experimental model 24 may output a plurality of types of performance values; for example, it may output both the degree of unevenness of the material surface and the photosensitivity of the material.
 Specifically, the experimental model 24 is configured, for example, as shown in (4) to (6) below.
(4) Number of nodes in the input layer: N × 1, where N is the number of items in an experimental condition.
(5) Configuration of the intermediate layers: four convolution layers with a 3 × 3 kernel, 32 filters, a stride of 2, and ReLU activation.
(6) Number of nodes in the output layer: 1 × J, where J is the number of types of performance values.
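A corresponding sketch of configuration (4) to (6). Here 1-D convolutions with padding="same" stand in for the convolution layers so that a length-N condition vector remains a valid input after four stride-2 layers; this substitution and the default parameter values are assumptions of the sketch.

```python
import tensorflow as tf

def build_experiment_model(n_items=3, n_outputs=1, n_conv=4):
    """Experimental model 24: one experimental condition in, J performance values out."""
    inputs = tf.keras.Input(shape=(n_items, 1))          # N x 1 input
    x = inputs
    for _ in range(n_conv):  # four 32-filter / stride-2 / ReLU convolution layers
        x = tf.keras.layers.Conv1D(32, 3, strides=2, padding="same",
                                   activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(n_outputs)(x)        # 1 x J output
    return tf.keras.Model(inputs, outputs)
```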
 The plurality of experimental models 24 according to the present embodiment are likewise created under different conditions, differing in at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial values of the weights.
 The derivation unit 12 inputs a plurality of combinations of experimental conditions for generating a material and performance values of experimental results into the output model 22 and acquires the experimental condition output from the output model 22. Specifically, the derivation unit 12 first inputs all of the combinations of experimental conditions and performance values contained in the learning data 20 into the output model 22 and acquires the output experimental condition. The derivation unit 12 may instead input only some of the combinations contained in the learning data 20, or combinations different from the learning data 20, into the output model 22.
 The derivation unit 12 also corrects the experimental condition output from the output model 22 to an experimental condition that can be used in an actual experiment. In the present embodiment, the derivation unit 12 corrects the output experimental condition to the closest experimental condition that satisfies the constraints of the experimental apparatus actually used. For example, if the apparatus only accepts process temperatures in 5 °C steps and the temperature in the output condition is not a multiple of 5 °C (for example, 92.3 °C), the derivation unit 12 corrects that temperature to the nearest multiple of 5 (for example, 90 °C).
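The temperature example above amounts to snapping a proposed value onto the grid the apparatus actually supports. A minimal sketch, assuming the process temperature is the last item of the condition vector and that the grid is 5 °C:

```python
import numpy as np

def correct_condition(condition, temperature_step=5.0):
    """Round the process temperature (assumed to be the last item of the
    condition vector) to the nearest value the apparatus can be set to."""
    corrected = np.asarray(condition, dtype=float).copy()
    corrected[-1] = temperature_step * np.round(corrected[-1] / temperature_step)
    return corrected
```

For example, a condition whose proposed temperature is 92.3 °C would be corrected to 90 °C, matching the example in the text.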
 Next, the derivation unit 12 inputs the corrected experimental condition into each experimental model 24 and acquires the performance value output from each experimental model 24.
 Furthermore, the derivation unit 12 adds the combination of the experimental condition input to the corresponding experimental model 24 and the derived performance value to the set of combinations that was input to the output model 22, thereby obtaining an extended set of combinations of experimental conditions and performance values. The derivation unit 12 then inputs this extended set into the output model 22 again, inputs the experimental condition thus obtained into the corresponding experimental model 24, and obtains a new performance value. The derivation unit 12 repeats this process of appending the newly obtained combination and querying the output model 22 and the corresponding experimental model 24 a predetermined number of times (for example, 100 times).
 The derivation unit 12 performs the above processing for each output model 22. That is, for each output model 22, the derivation unit 12 obtains the experimental conditions output by that output model 22 over the predetermined number of iterations together with the performance values corresponding to those conditions.
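The repeated virtual-experiment loop run for one pair of output model and experimental model can be sketched as follows. The fixed-size history window and the row layout (N condition items plus one performance value per row) are assumptions chosen so that the models from the earlier sketches can be reused; correct_condition is the hypothetical helper from the rounding sketch above, and the history is assumed to already contain at least `window` rows.

```python
import numpy as np

def run_virtual_campaign(output_model, experiment_model, history,
                         window=100, n_rounds=100):
    """history: (M, N+1) array of past conditions and their performance values."""
    performances = []
    for _ in range(n_rounds):
        # 1. Ask the output model for the next experimental condition,
        #    feeding it the most recent `window` rows of the history.
        x = history[-window:][np.newaxis, ..., np.newaxis]
        condition = output_model.predict(x, verbose=0)[0]
        # 2. Correct it to a condition usable on the real apparatus.
        condition = correct_condition(condition)
        # 3. Run the virtual experiment to obtain a performance value.
        perf = experiment_model.predict(condition[np.newaxis, :, np.newaxis],
                                        verbose=0)[0, 0]
        performances.append(perf)
        # 4. Append the new (condition, performance) pair so that it becomes
        #    part of the history fed to the output model in the next round.
        history = np.vstack([history, np.append(condition, perf)])
    return performances
```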
 For each output model 22, the derivation unit 12 derives an evaluation value of the output model 22 using the performance values obtained over the predetermined number of iterations. In the present embodiment, as shown in FIG. 7 as an example, the derivation unit 12 derives the evaluation value such that it is better as the number of virtual experiments (N in FIG. 7) required before a performance value satisfying the target performance (in the present embodiment, a performance value equal to or less than the target value) is obtained is smaller. In FIG. 7, the vertical axis shows the performance value and the horizontal axis shows the virtual experiment count at which each value was obtained; in this example, a performance value satisfying the target performance is obtained for the first time at the N-th virtual experiment.
 Alternatively, as shown in FIG. 8 as an example, the derivation unit 12 may derive the evaluation value such that it is better as the ratio of performance values satisfying the target performance (the number of performance values enclosed by the dash-dotted rectangle in FIG. 8 relative to the total number of performance values) is higher; "good" in FIG. 8 means that the target performance is satisfied. The derivation unit 12 may also derive the evaluation value such that it is better as each performance value is closer to the target value.
 The derivation unit 12 may also correct the evaluation value downward when the output model 22 outputs an experimental condition that does not satisfy a predetermined rule. Examples of such rules include rules based on the user's empirical knowledge, such as that material A and material B are never mixed, or that five or more kinds of materials are never mixed.
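Written out, the evaluation criteria described above might look like the following sketch, which covers the fewest-experiments criterion and the good-ratio criterion and subtracts a simple penalty when a rule-violating condition was proposed; the exact functional forms are assumptions, since the text only fixes the qualitative ordering (fewer experiments, a higher ratio, or values closer to the target are better).

```python
def evaluate_by_first_success(performances, target, penalty=0.0):
    """Larger is better; smaller-is-better performance values, as in this embodiment."""
    for n, perf in enumerate(performances, start=1):
        if perf <= target:                 # target performance reached
            return 1.0 / n - penalty       # reached at the N-th experiment -> 1/N
    return -penalty                        # target never reached

def evaluate_by_good_ratio(performances, target, penalty=0.0):
    """Larger is better; ratio of values that satisfy the target performance."""
    good = sum(1 for perf in performances if perf <= target)
    return good / len(performances) - penalty
```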
 The learning unit 14 trains the experimental model 24 using the error backpropagation method as an example of machine learning. Specifically, the learning unit 14 inputs an experimental condition contained in the learning data 20 into the experimental model 24, acquires the performance value output from the experimental model 24, and trains the experimental model 24 so that the difference between the acquired performance value and the performance value corresponding to that experimental condition in the learning data 20 is minimized. The learning unit 14 performs this training using all of the combinations of experimental conditions and performance values contained in the learning data 20, although it may instead use only some of them. The data input to each experimental model 24 during training may be the same or different between the experimental models 24.
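Training the experimental model 24 on the learning data 20 is ordinary supervised regression with backpropagation. A minimal sketch reusing build_experiment_model from above; the file names, optimizer, loss, epoch count, and batch size are illustrative assumptions.

```python
import numpy as np

# Hypothetical arrays extracted from the learning data 20:
# conditions   : (K, N) past experimental conditions
# performances : (K, J) measured performance values
conditions = np.load("conditions.npy")
performances = np.load("performances.npy")

experiment_model = build_experiment_model(n_items=conditions.shape[1],
                                          n_outputs=performances.shape[1])
experiment_model.compile(optimizer="adam", loss="mse")   # minimize the squared error
experiment_model.fit(conditions[..., np.newaxis], performances,
                     epochs=200, batch_size=16, verbose=0)
```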
 The learning unit 14 also uses the evaluation value derived by the derivation unit 12 for each output model 22 to train each output model 22 by machine learning using a genetic algorithm as an example of an optimization algorithm. Parameters such as the individual selection method (for example, roulette selection), the crossover method (for example, two-point crossover), and the mutation probability used in the genetic algorithm are set in advance by the user.
 In detail, for example, the learning unit 14 generates a new output model 22 by mating the two output models 22 with the best evaluations. This mating is performed, for example, by joining the input layer and the input-side half of the intermediate layers of one output model 22 with the output-side half of the intermediate layers and the output layer of the other output model 22. The mating method is not limited to this example; for instance, the upper halves of the input layer, intermediate layers, and output layer shown in FIG. 3 of one output model 22 may be joined with the lower halves of those of the other. In the present embodiment, the learning unit 14 generates the next generation of output models 22 with the genetic algorithm so that the number of output models 22 does not change between generations. That is, the output model 22 is trained by updating its weight values through the genetic algorithm, and this training reflects the evaluation value derived by the derivation unit 12.
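One generation-to-generation update of the genetic algorithm can be sketched as follows. For brevity it operates on flat weight vectors with two-point crossover and Gaussian mutation and breeds every child from the two best-scoring parents, which assumes all individuals share one architecture; the publication itself describes splicing layer groups of two parent networks and leaves the selection, crossover, and mutation settings to the user, so this is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def next_generation(population, scores, mutation_rate=0.01, mutation_scale=0.1):
    """population: list of flat weight vectors (one per output model 22).
    scores: evaluation values from the derivation unit 12 (larger = better).
    Returns a new population of the same size."""
    order = np.argsort(scores)[::-1]
    p1, p2 = population[order[0]], population[order[1]]     # two best parents
    children = []
    while len(children) < len(population):
        a, b = sorted(rng.choice(len(p1), size=2, replace=False))
        child = np.concatenate([p1[:a], p2[a:b], p1[b:]])   # two-point crossover
        mask = rng.random(child.shape) < mutation_rate       # occasional mutation
        child = np.where(mask,
                         child + rng.normal(0.0, mutation_scale, child.shape),
                         child)
        children.append(child)
    return children
```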
 The derivation of the evaluation value of each output model 22 by the derivation unit 12 and the training of the group of output models 22 by the learning unit 14 described above are performed for a predetermined number of generations (for example, 10,000 generations). The learning unit 14 then stores, in the storage unit 42, the single output model 22 with the best evaluation value in the final generation as the output model 22A used in the operation phase described later. The derivation and training may instead be repeated until the evaluation value converges.
 Next, the functional configuration of the learning device 10 in the operation phase according to the present embodiment will be described with reference to FIG. 9. As shown in FIG. 9, the learning device 10 includes a reception unit 30 and an output unit 32, and the storage unit 42 of the learning device 10 stores the output model 22A obtained in the learning phase described above.
 The reception unit 30 receives a plurality of combinations of experimental conditions for generating a material and performance values of experimental results input by the user via the input unit 44 (see FIG. 10).
 The output unit 32 inputs the plurality of combinations received by the reception unit 30 into the output model 22A and acquires the experimental condition output from the output model 22A. Like the derivation unit 12 in the learning phase, the output unit 32 corrects the output experimental condition to one that can be used in an actual experiment, and then outputs the corrected condition to the display unit 43 (see FIG. 10). The user views the experimental condition displayed on the display unit 43 and, if necessary, performs an experiment under that condition. The output unit 32 may also output (store) the corrected experimental condition to the storage unit 42.
 Next, the hardware configuration of the learning device 10 will be described with reference to FIG. 10. The learning device 10 is realized by the computer shown in FIG. 10, which includes a CPU (Central Processing Unit) 40, a memory 41 serving as a temporary storage area, and a nonvolatile storage unit 42. The learning device 10 also includes a display unit 43 such as a liquid crystal display and an input unit 44 such as a keyboard and a mouse. The CPU 40, the memory 41, the storage unit 42, the display unit 43, and the input unit 44 are connected via a bus 45.
 The storage unit 42 is realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. A learning program 50 is stored in the storage unit 42 serving as a storage medium. The CPU 40 reads the learning program 50 from the storage unit 42, loads it into the memory 41, and executes it; by executing the learning program 50, the CPU 40 functions as the derivation unit 12, the learning unit 14, the reception unit 30, and the output unit 32.
 Next, the operation of the learning device 10 according to the present embodiment will be described with reference to FIGS. 11 to 13. When the learning device 10 executes the learning program 50, the experimental model learning process shown in FIG. 11, the output model learning process shown in FIG. 12, and the experimental condition output process shown in FIG. 13 are executed. The experimental model learning process and the output model learning process are executed in the learning phase when the user inputs the corresponding execution instruction via the input unit 44, and the experimental condition output process is executed in the operation phase when the user inputs its execution instruction via the input unit 44.
 In step S10 of FIG. 11, the learning unit 14 reads the learning data 20 from the storage unit 42. In step S12, the learning unit 14 generates a plurality of experimental models 24 with different creation conditions. In step S14, the learning unit 14 selects, from the experimental models 24 generated in step S12, one experimental model 24 to be trained; when step S14 is executed repeatedly, an experimental model 24 not yet selected is chosen.
 In step S16, the learning unit 14 trains the experimental model 24 selected in step S14 by error backpropagation using the learning data 20 read in step S10, as described above. In step S18, the learning unit 14 stores the trained experimental model 24 in the storage unit 42. In step S20, the learning unit 14 determines whether steps S14 to S18 have been completed for all of the experimental models 24 generated in step S12; if not, the process returns to step S14, and if so, the experimental model learning process ends.
 In step S30 of FIG. 12, the learning unit 14 generates a plurality of output models 22 with different creation conditions. In step S32, the derivation unit 12 inputs a plurality of combinations of experimental conditions for generating a material and performance values of experimental results into each output model 22 and acquires the experimental condition output from each output model 22.
 When step S32 is executed for the first time in each generation of the output models 22 (that is, the very first time step S32 is executed, or the first time it is executed after a negative determination in step S46 described later), these combinations are all of the combinations of experimental conditions and performance values contained in the learning data 20. When step S32 is executed for the second and subsequent times within a generation (that is, after a negative determination in step S40), the combinations are those input to the output model 22 in the previous execution of step S32 with the combination added in step S38, described later, appended.
 In step S34, the derivation unit 12 corrects the experimental conditions output from each output model 22 in step S32 to conditions usable in an actual experiment, as described above. In step S36, the derivation unit 12 inputs each corrected experimental condition into each experimental model 24 and acquires the performance values output from each experimental model 24. The derivation unit 12 also holds, for each output model 22, the combinations of experimental conditions and performance values corresponding to the conditions output by that output model 22.
 In step S38, the derivation unit 12 adds, to the combinations input to the output model 22 in the current execution of step S32, the combination of the experimental condition input to the experimental model 24 in the current execution of step S36 and the obtained performance value. The combinations obtained by this addition are used in the next execution of step S32 after a negative determination in step S40 described below.
 In step S40, the derivation unit 12 determines whether steps S32 to S38 have been repeated a predetermined number of times (for example, 100 times). If not, the process returns to step S32; if so, the process proceeds to step S42.
 In step S42, the derivation unit 12 derives, for each output model 22, the evaluation value of that output model 22 using the performance values obtained through the repetition of steps S32 to S38, as described above. In step S44, the learning unit 14 uses the evaluation value derived in step S42 for each output model 22 to generate the next generation of output models 22 by the genetic algorithm, as described above. This next generation is used in the next execution of step S32 after a negative determination in step S46 described below.
 In step S46, the learning unit 14 determines whether the number of generations of the output models 22 has reached a predetermined number (for example, 10,000 generations). If not, the process returns to step S32; if so, the process proceeds to step S48. In step S48, the learning unit 14 stores, as the output model 22A, the single output model 22 with the best evaluation value in the final generation in the storage unit 42, and the output model learning process ends.
 In step S50 of FIG. 13, the reception unit 30 receives a plurality of combinations of experimental conditions for generating a material and performance values of experimental results input by the user via the input unit 44. In step S52, the output unit 32 reads the output model 22A from the storage unit 42. In step S54, the output unit 32 inputs the combinations received in step S50 into the output model 22A read in step S52 and acquires the experimental condition output from the output model 22A.
 In step S56, the output unit 32 corrects the experimental condition output from the output model 22A in step S54 to an experimental condition usable in an actual experiment, as described above. In step S58, the output unit 32 outputs the corrected experimental condition to the display unit 43, on which it is displayed, and the experimental condition output process ends.
 As described above, according to the present embodiment, the experimental condition obtained from the output model 22, which takes as input a plurality of combinations of experimental conditions for generating a material and performance values of experimental results and outputs an experimental condition, is input into the experimental model 24 that performs a virtual experiment. The evaluation value of the output model 22 is derived using the performance value of the experimental result obtained from this input, and the output model 22 is then trained by machine learning using the derived evaluation value. By using the output model 22 trained in this way, appropriate experimental conditions for the material can be searched for.
[Second Embodiment]
A second embodiment of the disclosed technology will be described. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted. First, the functional configuration of the learning device 10 in the learning phase according to the present embodiment will be described with reference to FIG. 14. As illustrated in FIG. 14, the learning device 10 includes a derivation unit 12A, a learning unit 14A, and a generation unit 16. The storage unit 42 stores the learning data 20, a plurality of output models 22B, and a plurality of experimental models 24.
FIG. 15 shows an example of the output model 22B. As shown in FIG. 15, the output model 22B according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. The input layer of the output model 22B receives a plurality of combinations of experimental conditions for generating a material and performance values of experimental results, together with one experimental condition candidate. The output layer of the output model 22B outputs a Q value, which is an example of an action value in reinforcement learning. That is, the learning device 10 according to the present embodiment trains the output model 22B according to Q-learning, an example of reinforcement learning, with the plurality of combinations of experimental conditions and performance values as the current state s and the experimental condition candidate as the action a. As with the output models 22 according to the first embodiment, the plurality of output models 22B according to the present embodiment are created under mutually different model creation conditions.
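A minimal numpy stand-in for such a network is sketched below; the two hidden layers, their widths, and the flattening of the history into a fixed-length state vector are arbitrary choices for illustration and are not taken from the embodiment.

```python
import numpy as np

class QNetwork:
    """Maps (flattened history of condition/performance pairs, one candidate) to a Q value."""

    def __init__(self, state_dim, action_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        dims = [state_dim + action_dim, hidden, hidden, 1]
        self.weights = [rng.normal(0.0, 0.1, (i, o)) for i, o in zip(dims, dims[1:])]
        self.biases = [np.zeros(o) for o in dims[1:]]

    def q_value(self, state, candidate):
        x = np.concatenate([state, candidate])        # current state s and action a
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            x = np.maximum(x @ w + b, 0.0)            # ReLU hidden layers
        return (x @ self.weights[-1] + self.biases[-1]).item()

net = QNetwork(state_dim=8, action_dim=3)
state = np.zeros(8)                                   # flattened (condition, performance) history
candidate = np.array([0.2, 0.5, 0.1])                 # one experimental condition candidate
print(net.q_value(state, candidate))
```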
The generation unit 16 generates a plurality of different experimental condition candidates. In the present embodiment, the generation unit 16 generates experimental condition candidates that satisfy a predetermined rule and are usable in an actual experiment. Since this rule and the experimental conditions usable in an actual experiment are the same as in the first embodiment, their description is omitted. Specifically, each time it generates a plurality of different experimental condition candidates, the generation unit 16 randomly generates candidates that satisfy the predetermined rule and are usable in an actual experiment.
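A sketch of such a generator is shown below, assuming, purely for illustration, that the predetermined rule is that the mixing ratios of a condition sum to 1 and that usable values lie on a 0.05 grid.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_candidates(n_candidates, n_components=3, step=0.05):
    candidates = []
    while len(candidates) < n_candidates:
        raw = rng.dirichlet(np.ones(n_components))    # random ratios summing to 1
        snapped = np.round(raw / step) * step         # snap onto usable grid values
        if np.isclose(snapped.sum(), 1.0):            # keep only candidates still satisfying the rule
            candidates.append(snapped)
    return candidates

candidates = generate_candidates(n_candidates=5)
```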
The derivation unit 12A derives a value (hereinafter referred to as a "reward value") that the learning unit 14A, described later, uses as a reward when training each output model 22B according to Q-learning. The details of the processing by which the derivation unit 12A derives the reward value are described below.
First, the derivation unit 12A inputs, to the output model 22B, a plurality of combinations of experimental conditions for generating a material and performance values of experimental results together with an experimental condition candidate generated by the generation unit 16, and acquires the Q value output from the output model 22B. Specifically, as illustrated in FIG. 16, the derivation unit 12A inputs to the output model 22B, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values together with one of the plurality of experimental condition candidates generated by the generation unit 16. That is, the derivation unit 12A acquires the Q value output from the output model 22B for each of the plurality of experimental condition candidates generated by the generation unit 16.
Next, the derivation unit 12A inputs, to the experimental models 24, an experimental condition candidate corresponding to one of the acquired Q values that is equal to or greater than a predetermined value. In the present embodiment, the derivation unit 12A inputs the experimental condition candidate corresponding to the largest of the acquired Q values to each experimental model 24 and acquires the performance value output from each experimental model 24. Like the derivation unit 12 according to the first embodiment, the derivation unit 12A holds the resulting plurality of combinations of experimental conditions and performance values of the experimental results.
Furthermore, the derivation unit 12A obtains a plurality of combinations of experimental conditions and performance values by adding the combination of the experimental condition input to the experimental model 24 and the derived performance value to the plurality of combinations that were input to the output model 22B. The derivation unit 12A then again inputs to the output model 22B, individually for every generated candidate, the obtained plurality of combinations of experimental conditions and performance values together with one of the plurality of experimental condition candidates generated by the generation unit 16. In the same manner as the processing described above, the derivation unit 12A again acquires a performance value corresponding to an experimental condition candidate, using the Q values output from the output model 22B for the respective candidates and the experimental models 24. The derivation unit 12A repeats this processing for acquiring a performance value corresponding to an experimental condition candidate a predetermined number of times (for example, 100 times).
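Putting the above together, the repeated candidate scoring and virtual experimentation performed by the derivation unit 12A could be sketched as follows; q_fn, experiment_fn and candidate_fn are placeholder callables, and the fixed-length window over the most recent pairs is a simplification introduced only to keep the state dimension constant.

```python
import numpy as np

def rollout(q_fn, experiment_fn, candidate_fn, history, n_steps=100, window=2):
    performances = []
    for _ in range(n_steps):
        recent = history[-window:]                        # simplification: fixed-length history window
        state = np.concatenate([np.append(c, p) for c, p in recent])
        candidates = candidate_fn()
        q_values = [q_fn(state, a) for a in candidates]
        best = candidates[int(np.argmax(q_values))]       # candidate with the largest Q value
        performance = experiment_fn(best)                 # virtual experiment by the experimental model 24
        history = history + [(best, performance)]         # combinations extended with the new pair
        performances.append(performance)
    return performances                                   # later used to derive the evaluation value

# toy usage with placeholder callables
rng = np.random.default_rng(0)
history = [(np.array([0.3, 0.4, 0.3]), 0.5), (np.array([0.2, 0.5, 0.3]), 0.6)]
perfs = rollout(
    q_fn=lambda s, a: float(s.sum() + a[0]),
    experiment_fn=lambda a: float(a @ np.array([0.2, 0.5, 0.3])),
    candidate_fn=lambda: [rng.dirichlet(np.ones(3)) for _ in range(20)],
    history=history,
    n_steps=5,
)
```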
The derivation unit 12A performs the above processing for each output model 22B. That is, the derivation unit 12A acquires the predetermined number of performance values for each output model 22B. Like the derivation unit 12 according to the first embodiment, the derivation unit 12A derives, for each output model 22B, the evaluation value of the output model 22B using the obtained performance values for the predetermined number of iterations (see FIG. 7).
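The evaluation value can be computed from the collected performance values in several ways; the three variants below mirror the alternatives recited in claim 2 (ratio meeting a target, number of virtual experiments needed to reach the target, closeness to the target), with the target of 0.8 being an arbitrary example.

```python
import numpy as np

def evaluation_ratio(perfs, target=0.8):
    # higher is better: fraction of virtual experiments that met the target
    return float(np.mean(np.asarray(perfs) >= target))

def evaluation_trials_to_target(perfs, target=0.8):
    # fewer is better: index of the first virtual experiment that met the target
    hits = np.nonzero(np.asarray(perfs) >= target)[0]
    return int(hits[0]) + 1 if hits.size else len(perfs) + 1

def evaluation_closeness(perfs, target=0.8):
    # closer is better: negated distance of the best performance to the target
    return -float(np.min(np.abs(np.asarray(perfs) - target)))
```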
The derivation unit 12A also derives reward values such that an output model 22B with a higher derived evaluation value obtains a higher reward. For example, the derivation unit 12A derives a reward value of "1" for the three output models 22B with the highest evaluation values, a reward value of "-1" for the three output models 22B with the lowest evaluation values, and a reward value of "0" for the other output models 22B.
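The ranking-based reward just described amounts to the following few lines; the cut-off of three models at each end follows the example in the text.

```python
import numpy as np

def rank_rewards(evaluation_values, n_top=3, n_bottom=3):
    order = np.argsort(evaluation_values)          # ascending order of evaluation values
    rewards = np.zeros(len(evaluation_values))
    rewards[order[-n_top:]] = 1.0                  # the three best output models
    rewards[order[:n_bottom]] = -1.0               # the three worst output models
    return rewards

print(rank_rewards([0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6]))
```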
The learning unit 14A trains each output model 22B using the reward value derived by the derivation unit 12A as the reward r in Q-learning.
The processing by the derivation unit 12A for deriving the reward value of each output model 22B and the learning processing of each output model 22B by the learning unit 14A are performed a predetermined number of times (for example, 10,000 times). In the final iteration, the learning unit 14A stores in the storage unit 42 the one output model 22B with the best evaluation indicated by the evaluation value, as the output model 22C used in the operation phase described later. Note that the processing for deriving the reward values and the learning processing of the output models 22B may instead be performed until the evaluation values converge.
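For context, the standard Q-learning bootstrap target that such an update would be driven by is shown below; an actual implementation would move the network weights toward this target by backpropagation, which is omitted here.

```python
import numpy as np

def q_learning_target(reward, next_q_values, gamma=0.99):
    # r + gamma * max_a' Q(s', a'): the standard Q-learning target
    return reward + gamma * float(np.max(next_q_values))

target = q_learning_target(reward=1.0, next_q_values=[0.2, 0.5, 0.1])
# the per-transition loss would then be (Q(s, a) - target) ** 2
```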
Like the learning unit 14 according to the first embodiment, the learning unit 14A also trains the experimental models 24 using the learning data 20 according to the error backpropagation method.
Next, the functional configuration of the learning device 10 in the operation phase according to the present embodiment will be described with reference to FIG. 17. As illustrated in FIG. 17, the learning device 10 includes the generation unit 16, the accepting unit 30, and an output unit 32A. The storage unit 42 of the learning device 10 stores the output model 22C obtained in the learning phase described above.
The output unit 32A inputs to the output model 22C, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values accepted by the accepting unit 30 together with one of the plurality of experimental condition candidates generated by the generation unit 16. The output unit 32A acquires the Q value output from the output model 22C for each of these inputs. The output unit 32A then outputs, to the display unit 43, the experimental condition candidate corresponding to the largest of the acquired Q values as the candidate for the experimental condition to be tried next. Note that the output unit 32A may instead output to the display unit 43, as the candidate for the next experiment, an experimental condition candidate corresponding to any Q value equal to or greater than a predetermined value (for example, the second-largest Q value that is equal to or greater than the predetermined value). The output unit 32A may also output (store) the experimental condition candidate corresponding to the largest Q value to the storage unit 42 as the candidate for the next experiment.
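In code, the selection performed by the output unit 32A reduces to scoring every candidate and taking an argmax, as in the sketch below; q_fn again stands in for the trained output model 22C.

```python
import numpy as np

def propose_next_condition(q_fn, state, candidates):
    q_values = np.array([q_fn(state, a) for a in candidates])
    best_index = int(np.argmax(q_values))                 # candidate with the largest Q value
    return candidates[best_index], float(q_values[best_index])

# toy usage with placeholder data
rng = np.random.default_rng(0)
state = rng.random(8)                                     # flattened history of condition/performance pairs
candidates = [rng.dirichlet(np.ones(3)) for _ in range(20)]
best, best_q = propose_next_condition(lambda s, a: float(a[0] - a[2]), state, candidates)
```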
Since the hardware configuration of the learning device 10 according to the present embodiment is the same as that of the learning device 10 according to the first embodiment (see FIG. 10), its description is omitted. By executing the learning program 50, the CPU 40 functions as the derivation unit 12A, the learning unit 14A, the generation unit 16, the accepting unit 30, and the output unit 32A.
Next, the operation of the learning device 10 according to the present embodiment will be described with reference to FIGS. 18 and 19. Since the experimental model learning process is the same as in the first embodiment (see FIG. 11), its description is omitted. The output model learning process shown in FIG. 18 is executed, for example, when the user inputs an instruction to execute the output model learning process via the input unit 44 in the learning phase. The experimental condition output process shown in FIG. 19 is executed, for example, when the user inputs an instruction to execute the experimental condition output process via the input unit 44 in the operation phase.
In step S60 of FIG. 18, the learning unit 14A generates a plurality of output models 22B created under mutually different model creation conditions. The following steps S62 to S70 are executed in the same manner for each output model 22B generated in step S60. In step S62, the generation unit 16 generates a plurality of different experimental condition candidates, as described above.
In step S64, as described above, the derivation unit 12A inputs, to the output model 22B, the plurality of combinations of experimental conditions for generating a material and performance values of experimental results together with an experimental condition candidate generated in step S62, and acquires the Q value output from the output model 22B.
When step S64 is executed for the first time in the learning process of an output model 22B (that is, the very first time step S64 is executed, or in the first pass after a negative determination in step S78 described later), this plurality of combinations of experimental conditions and performance values consists of all the combinations of experimental conditions and performance values included in the learning data 20. When step S64 is executed for the second or subsequent time in the learning process of the output model 22B (that is, when step S64 is executed after a negative determination in step S70), the plurality of combinations is the set of combinations input to the output model 22B in the previous execution of step S64, with the combination of experimental condition and performance value added in step S68 described later.
In step S66, the derivation unit 12A inputs the experimental condition candidate corresponding to the largest of the Q values acquired in step S64 to each experimental model 24 and acquires the performance value output from each experimental model 24. The derivation unit 12A also holds, for the candidate corresponding to the largest Q value, the resulting combinations of the experimental condition and the performance values of the experimental results.
In step S68, the derivation unit 12A adds the following combination of experimental condition and performance value to the plurality of combinations input to the output model 22B in the current (immediately preceding) execution of step S64: namely, the combination of the experimental condition input to the experimental model 24 in the current execution of step S66 and the acquired performance value. The plurality of combinations obtained by this addition is used in the next execution of step S64, after a negative determination in step S70 described later.
In step S70, the derivation unit 12A determines whether the processing of steps S62 to S68 has been repeated a predetermined number of times (for example, 100 times). If the determination is negative, the process returns to step S62; if it is affirmative, the process proceeds to step S72.
In step S72, as described above, the derivation unit 12A derives, for each output model 22B, the evaluation value of the output model 22B using the performance values for the predetermined number of iterations obtained by repeating steps S62 to S68. In step S74, as described above, the derivation unit 12A derives reward values such that an output model 22B with a higher evaluation value derived in step S72 obtains a higher reward.
In step S76, the learning unit 14A trains each output model 22B using the reward value derived in step S74 as the reward r in Q-learning. In step S78, the learning unit 14A determines whether the processing of steps S62 to S76 has been repeated a predetermined number of times (for example, 10,000 times). If the determination is negative, the process returns to step S62; if it is affirmative, the process proceeds to step S80. In step S80, as described above, the learning unit 14A stores in the storage unit 42, as the output model 22C, the one output model 22B with the best evaluation indicated by the evaluation values derived in the last execution of step S72. When step S80 ends, the output model learning process ends.
In step S90 of FIG. 19, the accepting unit 30 accepts a plurality of combinations, input by the user via the input unit 44, of experimental conditions for generating a material and performance values of the materials obtained as experimental results. In step S92, the output unit 32A reads the output model 22C from the storage unit 42. In step S94, the generation unit 16 generates a plurality of different experimental condition candidates, as described above.
In step S96, the output unit 32A inputs to the output model 22C, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values accepted in step S90 together with one of the plurality of experimental condition candidates generated in step S94. The output unit 32A acquires the Q value output from the output model 22C for each of these inputs.
In step S98, the output unit 32A outputs, to the display unit 43, the experimental condition candidate corresponding to the largest of the Q values acquired in step S96 as the candidate for the experimental condition to be tried next. When step S98 ends, the experimental condition output process ends.
As described above, according to the present embodiment, the experimental condition candidate that maximizes the Q value obtained from the output model 22B, which takes as input a plurality of combinations of experimental conditions for generating a material and performance values of experimental results together with an experimental condition candidate and which outputs a Q value, is input to the experimental model 24. The evaluation value of the output model 22B is derived using the performance value of the experimental result obtained from this input, and the reward given to the output model 22B is derived according to the derived evaluation value. The output model 22B is then trained by Q-learning using the derived reward. Therefore, by using the output model 22B trained in this way, an appropriate experimental condition for the material can be searched for.
In each of the above embodiments, the case of deriving experimental conditions for generating a material has been described, but the present disclosure is not limited to this. For example, experimental conditions for generating a drug may be derived.
In each of the above embodiments, a trained model obtained by machine learning is applied as the experimental model 24, but any model capable of performing a virtual experiment may be used. For example, an arbitrary function that takes one experimental condition as input and outputs the performance value of the experimental result corresponding to that experimental condition may be applied as the experimental model 24. Even when such a model is applied, the output models 22 and 22B are optimized through learning. The experimental model 24 may also be, for example, a simulator that simulates an experiment.
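As one concrete (and entirely fictitious) example of such a function, the stand-in below treats the condition as mixing ratios and scores them against an assumed ideal composition.

```python
import numpy as np

def function_experimental_model(condition, ideal=np.array([0.5, 0.3, 0.2])):
    # performance peaks at 1.0 when the condition matches the assumed ideal composition
    return float(1.0 - np.abs(np.asarray(condition) - ideal).sum() / 2.0)

print(function_experimental_model([0.4, 0.4, 0.2]))   # approximately 0.9
```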
In the operation phase of the second embodiment, the output unit 32A may output, as the candidate for the next experiment, the experimental condition candidate that maximizes the cumulative Q value obtained by sequentially inputting a plurality of experimental condition candidates to the output model 22C a plurality of times. In this case, the output unit 32A first obtains, as in the second embodiment, the Q value corresponding to each of the first-round experimental condition candidates from the output model 22C. Next, the output unit 32A adds, for example, the combination of an experimental condition candidate input to the output model 22C in the first round and its performance value to the plurality of combinations of experimental conditions and performance values input to the output model 22C in the first round. This performance value may be estimated by a known method such as an SVM (Support Vector Machine). Then, as in the first round, the output unit 32A inputs the plurality of combinations obtained by this addition together with each of the second-round experimental condition candidates to the output model 22C, thereby obtaining from the output model 22C the Q value corresponding to each of the second-round candidates. In this case, the output unit 32A outputs, as the candidate for the next experiment, the experimental condition candidate for which the cumulative value of the first-round and second-round Q values is the largest. Although the case of using the cumulative value of two Q values has been described here, a cumulative value of three or more Q values can be used in the same manner.
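A sketch of this two-step variant is given below, using scikit-learn's SVR as one example of estimating the intermediate performance value and again assuming a fixed two-pair history window; all function names are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

def two_step_cumulative_q(q_fn, history, step1_candidates, step2_candidates):
    # regressor that estimates a performance value from an experimental condition
    X = np.array([c for c, _ in history])
    y = np.array([p for _, p in history])
    estimator = SVR().fit(X, y)

    state1 = np.concatenate([np.append(c, p) for c, p in history[-2:]])
    best_total, best_first = -np.inf, None
    for a1 in step1_candidates:
        q1 = q_fn(state1, a1)                              # first-round Q value
        p1 = float(estimator.predict([a1])[0])             # estimated performance of the first candidate
        extended = history[-1:] + [(a1, p1)]               # history grown by the estimated pair
        state2 = np.concatenate([np.append(c, p) for c, p in extended])
        q2 = max(q_fn(state2, a2) for a2 in step2_candidates)  # best second-round Q value
        if q1 + q2 > best_total:
            best_total, best_first = q1 + q2, a1
    return best_first, best_total       # first-round candidate with the largest cumulative Q value
```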
The various kinds of processing executed in each of the above embodiments by the CPU running software (programs) may be executed by various processors other than a CPU. Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed exclusively for executing specific processing, such as an ASIC (Application Specific Integrated Circuit). The various kinds of processing may be executed by one of these processors or by a combination of two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these processors is an electric circuit in which circuit elements such as semiconductor elements are combined.
In each of the above embodiments, the learning program 50 is stored (installed) in the storage unit 42 in advance, but the present disclosure is not limited to this. The learning program 50 may be provided in a form recorded on a non-transitory recording medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory. The learning program 50 may also be downloaded from an external device via a network.
This application claims priority to Japanese Patent Application No. 2018-076001 filed on April 11, 2018, the entire contents of which are incorporated herein by reference.

Claims (16)

1.  A learning device comprising:
     a derivation unit that derives an evaluation value of an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and that outputs an experimental condition, the evaluation value being derived using a performance value of an experimental result obtained by inputting the experimental condition, which is output by inputting the plurality of combinations to the output model, into an experimental model that performs a virtual experiment; and
     a learning unit that trains the output model by machine learning that reflects the evaluation value derived by the derivation unit.
2.  The learning device according to claim 1, wherein the evaluation value is a better value as the ratio of values satisfying a target performance among the plurality of performance values is higher, a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller, or a better value as the performance value is closer to the target performance.
3.  The learning device according to claim 1 or claim 2, wherein the derivation unit corrects the evaluation value to a lower value when an experimental condition that does not satisfy a predetermined rule is output from the output model.
4.  The learning device according to any one of claims 1 to 3, wherein the derivation unit corrects the experimental condition output from the output model to an experimental condition usable in an actual experiment.
5.  The learning device according to any one of claims 1 to 4, wherein the output model is a model trained using a genetic algorithm.
6.  A learning device comprising:
     a learning unit that trains an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and that outputs an action value in reinforcement learning, by using, as a reward, a value derived on the basis of a performance value of an experimental result obtained by inputting, into an experimental model that performs a virtual experiment, an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among a plurality of action values output by inputting the plurality of combinations together with each of a plurality of different experimental condition candidates to the output model.
7.  The learning device according to claim 6, wherein the reward is a better value as the ratio of values satisfying a target performance among the plurality of performance values is higher, a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller, or a better value as the performance value is closer to the target performance.
8.  The learning device according to claim 6 or claim 7, wherein the reinforcement learning is Q-learning and the action value is a Q value.
9.  The learning device according to any one of claims 6 to 8, further comprising an output unit that, when the output model trained by the learning unit is used, outputs, as a candidate for an experimental condition to be tried next, the experimental condition candidate for which the cumulative action value output by sequentially inputting a plurality of experimental condition candidates to the output model a plurality of times is the largest.
10.  The learning device according to any one of claims 1 to 9, wherein the experimental model is a model obtained by machine learning.
11.  The learning device according to any one of claims 1 to 10, wherein a plurality of the experimental models exist and the plurality of experimental models are created under mutually different model creation conditions.
12.  The learning device according to any one of claims 1 to 11, wherein a plurality of the output models exist and the plurality of output models are created under mutually different model creation conditions.
13.  A learning method in which a computer executes processing comprising: deriving an evaluation value of an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and that outputs an experimental condition, by using a performance value of an experimental result obtained by inputting the experimental condition, which is output by inputting the plurality of combinations to the output model, into an experimental model that performs a virtual experiment; and training the output model by machine learning that reflects the derived evaluation value.
14.  A learning program for causing a computer to execute processing comprising: deriving an evaluation value of an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and that outputs an experimental condition, by using a performance value of an experimental result obtained by inputting the experimental condition, which is output by inputting the plurality of combinations to the output model, into an experimental model that performs a virtual experiment; and training the output model by machine learning that reflects the derived evaluation value.
15.  A learning method in which a computer executes processing comprising: training an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and that outputs an action value in reinforcement learning, by using, as a reward, a value derived on the basis of a performance value of an experimental result obtained by inputting, into an experimental model that performs a virtual experiment, an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among a plurality of action values output by inputting the plurality of combinations together with each of a plurality of different experimental condition candidates to the output model.
16.  A learning program for causing a computer to execute processing comprising: training an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and that outputs an action value in reinforcement learning, by using, as a reward, a value derived on the basis of a performance value of an experimental result obtained by inputting, into an experimental model that performs a virtual experiment, an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among a plurality of action values output by inputting the plurality of combinations together with each of a plurality of different experimental condition candidates to the output model.
PCT/JP2019/010290 2018-04-11 2019-03-13 Learning device, learning method, and learning program WO2019198408A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020513128A JP6804009B2 (en) 2018-04-11 2019-03-13 Learning devices, learning methods, and learning programs

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-076001 2018-04-11
JP2018076001 2018-04-11

Publications (1)

Publication Number Publication Date
WO2019198408A1 true WO2019198408A1 (en) 2019-10-17

Family

ID=68163548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/010290 WO2019198408A1 (en) 2018-04-11 2019-03-13 Learning device, learning method, and learning program

Country Status (2)

Country Link
JP (1) JP6804009B2 (en)
WO (1) WO2019198408A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003058579A (en) * 2001-08-21 2003-02-28 Bridgestone Corp Method for optimizing design/blending
JP2017107902A (en) * 2015-12-07 2017-06-15 ファナック株式会社 Machine learning device learning action of laminating core sheet, laminated core manufacturing device, laminated core manufacturing system, and machine learning method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022079829A1 (en) * 2020-10-14 2022-04-21 NEC Corporation Information processing device, information processing method, information processing system, and storage medium

Also Published As

Publication number Publication date
JPWO2019198408A1 (en) 2021-02-12
JP6804009B2 (en) 2020-12-23


Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19785385; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020513128; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19785385; Country of ref document: EP; Kind code of ref document: A1)