WO2019198408A1 - Learning device, learning method, and learning program - Google Patents

Learning device, learning method, and learning program Download PDF

Info

Publication number
WO2019198408A1
Authority
WO
WIPO (PCT)
Prior art keywords
experimental
output
learning
value
model
Prior art date
Application number
PCT/JP2019/010290
Other languages
French (fr)
Japanese (ja)
Inventor
豪啓 安藤
理貴 近藤
Original Assignee
富士フイルム株式会社
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by 富士フイルム株式会社
Priority to JP2020513128A (patent JP6804009B2)
Publication of WO2019198408A1

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/12Computing arrangements based on biological models using genetic models

Definitions

  • the present disclosure relates to a learning device, a learning method, and a learning program.
  • A data analysis device has been proposed that extracts an evaluation item and its value for changing data having a second result value into data having a first result value, based on the relationship between the data having the first result value and the data having the second result value (see Japanese Patent Laid-Open No. 2000-305941).
  • When changing the value of an extracted evaluation item, this data analysis device examines the influence on the result value and calculates the effect of the change.
  • There has also been proposed a data set selection device that uses an active learning device including a plurality of different prediction algorithms to learn, with each of the prediction algorithms, the correspondence between a plurality of attribute values of training data and the corresponding output values (see Japanese Patent Laid-Open No. 2007-304782).
  • This data set selection device predicts output values corresponding to the prediction data using a plurality of correspondence relationships respectively learned by a plurality of prediction algorithms, and obtains a plurality of prediction values for each of the plurality of prediction algorithms.
  • This data set selection device also selects, from the data set of the prediction data, the items for which the variation among the plurality of prediction result values obtained from the plurality of prediction algorithms is large.
  • The technique described in Japanese Patent Laid-Open No. 2000-305941 can only search for data similar to data already existing in the database; therefore, even if it is applied to a method for searching for experimental conditions for generating a material, there is a problem that appropriate experimental conditions cannot always be found.
  • In addition, the techniques described in Japanese Patent Application Laid-Open Nos. 2007-304782 and 2016-530585 do not consider searching for new experimental conditions in the first place. This problem is not limited to the research and development of materials, and can also arise in the research and development of drugs.
  • This disclosure has been made in view of the above circumstances, and an object thereof is to enable search for appropriate experimental conditions for generating a material or a drug.
  • The learning device of the present disclosure includes: a derivation unit that takes as input a plurality of combinations of an experimental condition for generating a material or a drug and a performance value of the experimental result, inputs the plurality of combinations into an output model that outputs an experimental condition, inputs the output experimental condition into an experimental model for performing a virtual experiment, and derives an evaluation value of the output model using the performance value of the experimental result obtained from that input; and a learning unit that learns the output model by machine learning that reflects the evaluation value derived by the derivation unit.
  • In the learning device of the present disclosure, the evaluation value may be a better value as the ratio of values satisfying the target performance among the plurality of performance values is higher, or a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller.
  • the derivation unit may correct the evaluation value to be low when an experimental condition that does not satisfy a predetermined rule is output from the output model.
  • the derivation unit may correct the experimental condition output from the output model to an experimental condition that can be used for an actual experiment.
  • the output model may be a model learned using a genetic algorithm.
  • In another aspect, the learning device of the present disclosure includes a learning unit that, for an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and outputs an action value in reinforcement learning, selects an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output by inputting the plurality of combinations and a plurality of different experimental condition candidates, inputs the selected candidate into an experimental model for performing a virtual experiment, and learns the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • In this learning device of the present disclosure, the reward may be a better value as the ratio of values satisfying the target performance among the plurality of performance values is higher, or a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller.
  • the reinforcement learning may be Q learning
  • the action value may be a Q value
  • The learning device of the present disclosure may further include an output unit that outputs, as the next experimental condition candidate to be tested, the experimental condition candidate for which the cumulative action value obtained by sequentially inputting a plurality of experimental condition candidates to the output model multiple times is maximized.
  • the experimental model may be a model obtained by machine learning.
  • the learning device of the present disclosure may include a plurality of experimental models, and the creation conditions of each of the plurality of experimental models may be different.
  • the experimental model may be a mathematical expression configured to include a function such as sin or exp. As a result, an output model can be generated even in an experimental system in which no experimental data is obtained.
  • In the learning device of the present disclosure, there may be a plurality of output models, and the creation conditions of the plurality of output models may be different.
  • The learning method of the present disclosure is a method in which a computer executes a process of: taking as input a plurality of combinations of an experimental condition for generating a material or a drug and a performance value of the experimental result; inputting the plurality of combinations into an output model that outputs an experimental condition; deriving an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model for performing a virtual experiment; and learning the output model by machine learning that reflects the derived evaluation value.
  • The learning program of the present disclosure causes a computer to execute the same process: taking as input a plurality of combinations of an experimental condition for generating a material or a drug and a performance value of the experimental result, inputting the plurality of combinations into an output model that outputs an experimental condition, deriving an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model for performing a virtual experiment, and learning the output model by machine learning that reflects the derived evaluation value.
  • The learning method of the present disclosure may instead be a method in which, for an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and outputs an action value in reinforcement learning, a computer executes a process of selecting an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output by inputting the plurality of combinations and a plurality of different experimental condition candidates, inputting the selected candidate into an experimental model for performing a virtual experiment, and learning the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • The learning program of the present disclosure may likewise cause a computer to execute, for an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and outputs an action value in reinforcement learning, a process of selecting an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among the plurality of action values output by inputting the plurality of combinations and a plurality of different experimental condition candidates, inputting the selected candidate into an experimental model for performing a virtual experiment, and learning the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • The learning device of the present disclosure may also be expressed as a device having a processor that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of the experimental results, inputs the plurality of combinations into an output model that outputs experimental conditions, derives an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model for performing a virtual experiment, and learns the output model by machine learning that reflects the derived evaluation value.
  • Similarly, the learning device of the present disclosure may have a processor that, for an output model that takes as input a combination of an experimental condition for generating a material or a drug and a performance value of the experimental result together with an experimental condition candidate and outputs an action value in reinforcement learning, inputs an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value into an experimental model for performing a virtual experiment, and learns the output model using, as a reward, a value derived based on the performance value of the experimental result obtained from that input.
  • the learning device 10 includes a derivation unit 12 and a learning unit 14. Further, the storage unit 42 (see FIG. 10) of the learning device 10 stores learning data 20, a plurality of output models 22, and a plurality of experiment models 24.
  • FIG. 2 shows an example of the learning data 20.
  • the learning data 20 includes a combination of an experimental condition for generating a material and a material performance value as an experimental result when an experiment is performed under the experimental condition.
  • the experimental conditions are, for example, conditions for generating a material such as a semiconductor resist material, and include a main component composition, an additive amount, and process conditions.
  • the main component composition indicates the ratio of the main component of the material
  • the additive amount indicates the concentration of the additive
  • the process condition indicates the temperature at which the material is generated.
  • the performance value of the learning data 20 indicates the performance value of the material when the material is generated under the corresponding experimental condition.
  • the performance value according to the present embodiment is a scale representing the quality of the material. Examples of the performance value include a degree of unevenness on the surface of the material and a degree representing whether a hole having a desired size is formed. In this embodiment, the smaller the performance value, the better the material.
  • the learning data 20 of the present embodiment includes a combination of a plurality of different experimental conditions and performance values.
  • the experimental conditions may include a plurality of the same conditions.
  • FIG. 3 shows an example of the output model 22.
  • the output model 22 according to this embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. A plurality of combinations of experimental conditions and performance values are input to the input layer of the output model 22.
  • the output layer of the output model 22 outputs one experimental condition.
  • FIG. 4 shows an example of the data structure of the experimental conditions output from the output layer of the output model 22. As shown in FIG. 4, the output layer of the output model 22 outputs experimental conditions including, for example, the main component composition, the additive amount, and the process conditions.
  • the output model 22 is configured, for example, as shown in the following (1) to (3).
  • (1) Number of nodes in the input layer: N × M, where N represents the number of items in an experimental condition and M represents the number of experiments.
  • (2) Configuration of the intermediate layers: 10 convolution layers, each with a 3 × 3 kernel, 32 filters, a stride of 2, and a ReLU activation function.
  • (3) Number of nodes in the output layer: N × 1.
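  • As a concrete illustration of configuration (1) to (3), the following is a minimal sketch in PyTorch, not taken from the patent: the treatment of the N × M input as a single-channel 2-D map, the padding, the pooling, and the final linear projection to one experimental condition are all assumptions made only so the sketch runs end to end.

```python
import torch
import torch.nn as nn

class OutputModelSketch(nn.Module):
    """Hypothetical stand-in for the output model 22: (1, N, M) table in, N-item condition out."""

    def __init__(self, n_items: int, n_layers: int = 10):
        super().__init__()
        layers, in_ch = [], 1
        for _ in range(n_layers):
            # (2): 3 x 3 kernel, 32 filters, stride 2, ReLU activation
            layers += [nn.Conv2d(in_ch, 32, kernel_size=3, stride=2, padding=1), nn.ReLU()]
            in_ch = 32
        self.features = nn.Sequential(*layers)
        self.pool = nn.AdaptiveAvgPool2d(1)      # assumed: collapse whatever spatial size remains
        self.head = nn.Linear(32, n_items)       # assumed head producing an N x 1 output (3)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, 1, N, M) -- N condition/performance items by M experiments (1)
        h = self.pool(self.features(x)).flatten(1)
        return self.head(h)                      # (batch, N): one candidate experimental condition
```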
  • the plurality of output models 22 according to the present embodiment have different model creation conditions. More specifically, the plurality of output models 22 have different model creation conditions depending on at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial value of the weight.
  • FIG. 5 shows an example of the experimental model 24.
  • the experimental model 24 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer.
  • the experimental model 24 is a model for performing a virtual experiment, and one experimental condition is input to the input layer of the experimental model 24.
  • the output layer of the experimental model 24 outputs a performance value of an experimental result corresponding to one experimental condition input to the input layer.
  • FIG. 6 shows an example of the data structure of the performance value of the experimental result output from the output layer of the experimental model 24.
  • the experimental model 24 may output a plurality of types of performance values. In this case, for example, the experimental model 24 outputs both the degree of unevenness on the surface of the material and the light sensitivity of the material as the performance value of the material.
  • the experimental model 24 is configured, for example, as shown in the following (4) to (6).
  • (4) Number of nodes in the input layer: N × 1, where N represents the number of items in an experimental condition.
  • (5) Configuration of the intermediate layers: 4 convolution layers, each with a 3 × 3 kernel, 32 filters, a stride of 2, and a ReLU activation function.
  • (6) Number of nodes in the output layer: 1 × J, where J represents the number of types of performance values.
  • the plurality of experimental models 24 have different model creation conditions. Specifically, the plurality of experimental models 24 have different model creation conditions, because at least one of the number of intermediate layers, the number of nodes of each layer of the intermediate layer, and the initial value of the weight is different.
  • The deriving unit 12 inputs a plurality of combinations of the experimental conditions for generating the material and the performance values of the experimental results to the output model 22, and acquires the experimental conditions output from the output model 22. Specifically, the derivation unit 12 first inputs the combinations of all experimental conditions and performance values included in the learning data 20 to the output model 22, and acquires the experimental conditions output from the output model 22. The derivation unit 12 may instead input to the output model 22 a plurality of combinations of some of the experimental conditions and performance values included in the learning data 20, or a plurality of combinations of experimental conditions and performance values different from the learning data 20.
  • the derivation unit 12 corrects the experimental condition output from the output model 22 to an experimental condition that can be used for an actual experiment.
  • the derivation unit 12 corrects the experimental condition output from the output model 22 to the closest experimental condition that satisfies the constraints of the experimental apparatus actually used.
  • For example, when the temperature that can be set in the process condition is limited to multiples of 5 °C and the temperature in the experimental condition output from the output model 22 is not a multiple of 5 °C (for example, 92.3 °C), the derivation unit 12 corrects the temperature output from the output model 22 to the nearest multiple of 5 °C (for example, 90 °C).
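  • A minimal sketch of this correction, assuming the only constraint is that the process temperature must be a multiple of 5 °C; the function name is a placeholder:

```python
def snap_temperature(temp_c: float, step: float = 5.0) -> float:
    """Round an output temperature to the nearest value the experimental apparatus can actually set."""
    return round(temp_c / step) * step

snap_temperature(92.3)  # -> 90.0, matching the example above
```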
  • the derivation unit 12 inputs the experimental conditions obtained by the correction to each experimental model 24, and acquires the performance values output from each experimental model 24, respectively.
  • The derivation unit 12 adds the combination of the experimental condition input to the corresponding experimental model 24 and the acquired performance value to the plurality of combinations of experimental conditions and performance values that were input to the output model 22, thereby obtaining a new plurality of combinations of experimental conditions and performance values.
  • The derivation unit 12 then inputs the obtained plurality of combinations of experimental conditions and performance values to the output model 22 again, and inputs the resulting experimental conditions to each of the corresponding experimental models 24. Thereby, the derivation unit 12 again obtains performance values corresponding to the experimental conditions input to the corresponding experimental models 24.
  • the deriving unit 12 adds a combination of the experimental condition input to the corresponding experimental model 24 and the obtained performance value to the plurality of combinations of the experimental condition and performance value input to the output model 22 described above. Then, the process of obtaining the performance value again using the corresponding experimental model 24 is repeated a predetermined number of times (for example, 100 times).
  • the derivation unit 12 performs the above processing on each output model 22. That is, for each output model 22, the derivation unit 12 obtains a plurality of combinations of the experimental conditions output from the output model 22 for a predetermined number of times and the performance values corresponding to the experimental conditions.
  • the derivation unit 12 derives an evaluation value of the output model 22 by using the obtained performance value for a predetermined number of times for each output model 22.
  • For example, the derivation unit 12 performs virtual experiments until a performance value satisfying the target performance (in this embodiment, a performance value equal to or less than the target value) is obtained, and derives the evaluation value of the output model 22 as a better value as the number of virtual experiments required (N in FIG. 7) is smaller.
  • In FIG. 7, the vertical axis indicates the performance value and the horizontal axis indicates the number of virtual experiments; in the example of FIG. 7, a performance value satisfying the target performance is obtained for the first time in the N-th virtual experiment.
  • Alternatively, the derivation unit 12 may derive the evaluation value of the output model 22 as a better value as the ratio of performance values satisfying the target performance among the obtained performance values for the predetermined number of times (the number of performance values enclosed by the chain-line rectangle in FIG. 8, relative to the number of all performance values) is higher. Note that "good" in FIG. 8 means that the target performance is satisfied. Further, the derivation unit 12 may derive the evaluation value of the output model 22 as a better value as each performance value is closer to the target value.
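  • A minimal sketch of the two evaluation criteria described above; the function names and the "smaller performance value is better" convention stated earlier for this embodiment are the only assumptions:

```python
def experiments_until_target(performance_values, target):
    """Criterion of FIG. 7 -- smaller is better: number of virtual experiments until the target is met."""
    for n, value in enumerate(performance_values, start=1):
        if value <= target:              # performance values at or below the target satisfy the target performance
            return n
    return len(performance_values)       # target never reached within the predetermined number of experiments

def ratio_meeting_target(performance_values, target):
    """Criterion of FIG. 8 -- larger is better: fraction of performance values meeting the target."""
    good = sum(1 for v in performance_values if v <= target)
    return good / len(performance_values)
```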
  • the derivation unit 12 may correct the evaluation value to be low when an experimental condition that does not satisfy a predetermined rule is output from the output model 22.
  • Examples of the predetermined rule include rules based on the user's empirical knowledge, such as that the material A and the material B are not to be mixed, or that five or more kinds of materials are not to be mixed.
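  • A minimal sketch of such a rule check, assuming the two example rules above, a hypothetical condition format with a "materials" field, and an evaluation value for which higher means better; the penalty size is arbitrary:

```python
FORBIDDEN_PAIRS = {frozenset({"material_A", "material_B"})}  # hypothetical rule table
MAX_KINDS_OF_MATERIALS = 4                                    # "five or more kinds are not mixed"

def violates_rules(condition: dict) -> bool:
    materials = set(condition["materials"])
    if len(materials) > MAX_KINDS_OF_MATERIALS:
        return True
    return any(pair <= materials for pair in FORBIDDEN_PAIRS)

def corrected_evaluation(evaluation: float, condition: dict, penalty: float = 1000.0) -> float:
    """Correct the evaluation value to be low (worse) when the output condition violates a rule."""
    return evaluation - penalty if violates_rules(condition) else evaluation
```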
  • the learning unit 14 learns the experimental model 24 using an error back propagation method as an example of machine learning. Specifically, the learning unit 14 inputs an experimental condition included in the learning data 20 to the experimental model 24 and acquires a performance value output from the experimental model 24. Then, the learning unit 14 learns the experimental model 24 so that the difference between the acquired performance value and the performance value corresponding to the experimental condition included in the learning data 20 is minimized. The learning unit 14 performs the process of learning the experimental model 24 using combinations of all experimental conditions and performance values included in the learning data 20. The learning unit 14 may learn the experimental model 24 using a plurality of combinations of some experimental conditions and performance values included in the learning data 20. Further, the data input to each experimental model 24 when the learning unit 14 learns each experimental model 24 may be the same data or different data among the experimental models 24.
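  • A minimal training-loop sketch for this step, assuming the experimental model 24 is a PyTorch module mapping an experimental condition to its performance values and that learning_data is a list of (condition, performance) tensor pairs; the optimizer, learning rate, and epoch count are illustrative choices, not values from the patent:

```python
import torch
import torch.nn as nn

def train_experimental_model(model: nn.Module, learning_data, epochs: int = 100) -> nn.Module:
    optimizer = torch.optim.Adam(model.parameters(), lr=1e-3)
    loss_fn = nn.MSELoss()  # minimize the difference between predicted and measured performance values
    for _ in range(epochs):
        for condition, performance in learning_data:
            optimizer.zero_grad()
            loss = loss_fn(model(condition), performance)
            loss.backward()   # error back-propagation
            optimizer.step()
    return model
```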
  • the learning unit 14 uses the evaluation value derived by the deriving unit 12 for each output model 22 to learn each output model 22 by machine learning using a genetic algorithm as an example of an optimization algorithm.
  • parameters such as an individual selection method (for example, roulette selection), a crossover method (for example, two-point crossover), and a mutation probability used in this genetic algorithm are preset by the user.
  • For example, the learning unit 14 generates a new output model 22 by crossing the two output models 22 having the best evaluations among the output models 22. This crossover is performed, for example, by combining the input layer and the input-layer-side half of the intermediate layers of one output model 22 with the output-layer-side half of the intermediate layers and the output layer of the other output model 22.
  • The crossover method is not limited to this example. For example, the crossover may be performed by combining the upper half of the input layer, intermediate layers, and output layer (in the orientation shown in FIG. 3) of one output model 22 with the lower half of the input layer, intermediate layers, and output layer of the other output model 22.
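  • A minimal sketch of the layer-splice crossover described above, assuming each output model 22 is represented simply as an ordered list of layers (or weight tensors) of equal length; selection, mutation, and the roulette and two-point variants mentioned earlier are omitted:

```python
def crossover(parent_a: list, parent_b: list) -> list:
    """Combine the input-side half of parent_a with the output-side half of parent_b."""
    assert len(parent_a) == len(parent_b), "this sketch assumes structurally identical parents"
    cut = len(parent_a) // 2
    return parent_a[:cut] + parent_b[cut:]
```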
  • the learning unit 14 generates the next generation output model 22 using a genetic algorithm so that the number of output models 22 does not change between generations. That is, the output model 22 is learned by updating the weight value of the output model 22 by using a genetic algorithm. In addition, by learning the output model 22, the evaluation value derived by the deriving unit 12 is reflected.
  • the derivation process of the evaluation value of each output model 22 by the derivation unit 12 and the learning process of the output model 22 group by the learning unit 14 are performed for a predetermined number of generations (for example, 10,000 generations). Then, the learning unit 14 stores, in the storage unit 42, one output model 22 having the best evaluation indicated by the evaluation value in the final generation as an output model 22A used in an operation phase described later. Note that the derivation process of the evaluation value of each output model 22 by the derivation unit 12 and the learning process of the output model 22 group by the learning unit 14 may be performed until the evaluation value converges.
  • the learning device 10 includes a reception unit 30 and an output unit 32. Further, the storage unit 42 of the learning device 10 stores the output model 22A obtained in the learning phase described above.
  • the accepting unit 30 accepts a plurality of combinations of an experimental condition for generating a material input by the user via the input unit 44 (see FIG. 10) and a performance value of the material of the experimental result.
  • the output unit 32 inputs a plurality of combinations of the experimental conditions and performance values received by the receiving unit 30 to the output model 22A, and acquires the experimental conditions output from the output model 22A.
  • the output unit 32 corrects the experimental condition output from the output model 22A to an experimental condition that can be used for an actual experiment, as in the derivation unit 12 in the learning phase. Then, the output unit 32 outputs the experimental condition obtained by the correction to the display unit 43 (see FIG. 10).
  • the user visually observes the experimental conditions displayed on the display unit 43 and performs an experiment under the experimental conditions as necessary. Note that the output unit 32 may output (store) experimental conditions obtained by the correction to the storage unit 42.
  • The learning device 10 is realized by the computer shown in FIG. 10. As illustrated in FIG. 10, the learning device 10 includes a CPU (Central Processing Unit) 40, a memory 41 as a temporary storage area, and a nonvolatile storage unit 42.
  • the learning apparatus 10 includes a display unit 43 such as a liquid crystal display and an input unit 44 such as a keyboard and a mouse.
  • the CPU 40, the memory 41, the storage unit 42, the display unit 43, and the input unit 44 are connected via a bus 45.
  • the storage unit 42 is realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like.
  • a learning program 50 is stored in the storage unit 42 as a storage medium.
  • the CPU 40 reads the learning program 50 from the storage unit 42, and executes the read learning program 50 after expanding it in the memory 41.
  • When the CPU 40 executes the learning program 50, it functions as the derivation unit 12, the learning unit 14, the reception unit 30, and the output unit 32.
  • When the learning device 10 executes the learning program 50, the experimental model learning process shown in FIG. 11, the output model learning process shown in FIG. 12, and the experiment condition output process shown in FIG. 13 are executed.
  • the experimental model learning process illustrated in FIG. 11 is executed, for example, when an instruction to perform the experimental model learning process is input by the user via the input unit 44 in the learning phase.
  • the output model learning process illustrated in FIG. 12 is executed, for example, when a user inputs an execution instruction for the output model learning process via the input unit 44 in the learning phase.
  • the experiment condition output process shown in FIG. 13 is executed, for example, when the user inputs an execution instruction for the experiment condition output process via the input unit 44 in the operation phase.
  • In step S10 of FIG. 11, the learning unit 14 reads the learning data 20 from the storage unit 42.
  • In step S12, the learning unit 14 generates a plurality of experimental models 24 having different model creation conditions.
  • In step S14, the learning unit 14 selects one experimental model 24 to be learned from the plurality of experimental models 24 generated by the processing in step S12. Note that when the process of step S14 is repeatedly executed, the learning unit 14 selects an experimental model 24 that has not been selected so far.
  • In step S16, the learning unit 14 uses the learning data 20 read in step S10 to learn the experimental model 24 selected in step S14 by the error back-propagation method.
  • In step S18, the learning unit 14 stores the experimental model 24 learned by the process in step S16 in the storage unit 42.
  • In step S20, the learning unit 14 determines whether or not the processing in steps S14 to S18 has been completed for all the experimental models 24 generated by the processing in step S12. If this determination is negative, the process returns to step S14. If the determination is affirmative, the experimental model learning process ends.
  • the learning unit 14 generates a plurality of output models 22 having different model creation conditions.
  • In step S32, the derivation unit 12 inputs a plurality of combinations of experimental conditions for generating materials and performance values of experimental results to each output model 22, and acquires the experimental conditions output from each output model 22.
  • When step S32 is executed for the first time of each generation of the output models 22 (that is, when step S32 is executed for the very first time, or for the first time after a negative determination in step S46 described later), the plurality of combinations of experimental conditions and performance values used are the combinations of all the experimental conditions and performance values included in the learning data 20.
  • When step S32 is executed for the second and subsequent times of each generation of the output models 22 (that is, after a negative determination in step S40), the plurality of combinations used are the plurality of combinations of experimental conditions and performance values input to the output model 22 in the previous execution of step S32, to which the combination of the experimental condition and the performance value is added in step S38 described later.
  • In step S34, the derivation unit 12 corrects the experimental conditions output from each output model 22 by the processing in step S32 to experimental conditions that can be used in an actual experiment.
  • In step S36, the derivation unit 12 inputs each experimental condition corrected by the processing in step S34 to each experimental model 24, and acquires the performance value output from each experimental model 24. Further, the derivation unit 12 holds, for each output model 22, a plurality of combinations of experimental conditions and performance values corresponding to the experimental conditions output from that output model 22.
  • In step S38, the derivation unit 12 adds, to the plurality of combinations of experimental conditions and performance values input to the output model 22 by the immediately preceding processing of step S32, the combination of the experimental condition input to the experimental model 24 and the performance value acquired by the process of step S36 this time. The plurality of combinations of experimental conditions and performance values obtained by this addition are used in the step S32 executed next, after a negative determination is made in step S40 described later.
  • In step S40, the derivation unit 12 determines whether or not the processes in steps S32 to S38 have been repeated a predetermined number of times (for example, 100 times). If the determination is negative, the process returns to step S32. If the determination is affirmative, the process proceeds to step S42.
  • In step S42, the derivation unit 12 derives, for each output model 22, the evaluation value of the output model 22 using the performance values for the predetermined number of times obtained by the repeated processing of steps S32 to S38.
  • In step S44, the learning unit 14 generates the next generation of output models 22 by a genetic algorithm using the evaluation value derived by the process of step S42 for each output model 22.
  • This next-generation output model 22 is used in step S32 to be executed next after a negative determination is made in step S46 described later.
  • In step S46, the learning unit 14 determines whether or not the number of generations of the output models 22 has reached a predetermined number of generations (for example, 10,000 generations). If this determination is negative, the process returns to step S32. If the determination is affirmative, the process proceeds to step S48.
  • In step S48, as described above, the learning unit 14 stores one output model 22 having the best evaluation indicated by the evaluation value in the final generation in the storage unit 42 as the output model 22A. When the process of step S48 ends, the output model learning process ends.
  • In step S50 of FIG. 13, the accepting unit 30 accepts a plurality of combinations of the experimental conditions for generating the material input by the user via the input unit 44 and the performance values of the material of the experimental results.
  • In step S52, the output unit 32 reads the output model 22A from the storage unit 42.
  • In step S54, the output unit 32 inputs the plurality of combinations of experimental conditions and performance values received by the process of step S50 to the output model 22A read by the process of step S52, and acquires the experimental condition output from the output model 22A.
  • In step S56, the output unit 32 corrects the experimental condition output from the output model 22A by the processing in step S54 to an experimental condition usable in an actual experiment.
  • In step S58, the output unit 32 outputs the experimental condition corrected by the process of step S56 to the display unit 43 as described above. Through the processing in step S58, the experimental condition is displayed on the display unit 43. When the process of step S58 ends, the experiment condition output process ends.
  • As described above, according to the present embodiment, the experimental conditions output from the output model 22 are input to the experimental model 24 for performing a virtual experiment.
  • The evaluation value of the output model 22 is derived using the performance values of the experimental results obtained by this input.
  • The output model 22 is then learned by machine learning using the derived evaluation value. Therefore, by using the output model 22 learned in this way, it is possible to search for appropriate experimental conditions for the material.
  • the learning device 10 includes a derivation unit 12A, a learning unit 14A, and a generation unit 16.
  • the storage unit 42 stores learning data 20, a plurality of output models 22B, and a plurality of experimental models 24.
  • FIG. 15 shows an example of the output model 22B.
  • the output model 22B according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer.
  • a plurality of combinations of experimental conditions for generating materials and performance values of experimental results, and one experimental condition candidate are input to the input layer of the output model 22B.
  • The output layer of the output model 22B outputs a Q value as an example of an action value in reinforcement learning. That is, the learning device 10 according to the present embodiment learns the output model 22B by Q learning, as an example of reinforcement learning, with the plurality of combinations of experimental conditions and performance values as the current state s and the experimental condition candidate as the action a.
  • the plurality of output models 22B according to the present embodiment also have different model creation conditions, like the output model 22 according to the first embodiment.
  • the generation unit 16 generates a plurality of different experimental condition candidates.
  • the generation unit 16 generates experimental condition candidates that satisfy a predetermined rule and can be used in an actual experiment. Since this rule and the experimental conditions that can be used in the actual experiment are the same as those in the first embodiment, description thereof is omitted. Specifically, each time a plurality of different experimental condition candidates are generated, the generation unit 16 randomly generates experimental condition candidates that satisfy a predetermined rule and can be used in an actual experiment.
  • the deriving unit 12A derives a value (hereinafter referred to as “reward value”) used as a reward when the learning unit 14A described later learns each output model 22B according to Q learning.
  • Specifically, the derivation unit 12A inputs, to the output model 22B, a plurality of combinations of experimental conditions for generating materials and performance values of experimental results, together with an experimental condition candidate generated by the generation unit 16, and acquires the Q value output from the output model 22B.
  • The derivation unit 12A performs this input individually for all the generated experimental condition candidates, pairing the plurality of combinations of experimental conditions and performance values with each one of the plurality of experimental condition candidates generated by the generation unit 16. That is, the derivation unit 12A acquires the Q value output from the output model 22B for each of the plurality of experimental condition candidates generated by the generation unit 16.
  • the derivation unit 12A inputs the experimental condition candidate corresponding to any of the Q values equal to or greater than a predetermined value among the plurality of acquired Q values to the experimental model 24.
  • In the present embodiment, the derivation unit 12A inputs the experimental condition candidate corresponding to the maximum Q value among the plurality of acquired Q values to each experimental model 24, and acquires the performance value output from each experimental model 24.
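  • A minimal sketch of this selection step, assuming output_model is a callable that scores one (state, candidate) pair with a Q value; all names are placeholders:

```python
def best_candidate(output_model, state, candidates):
    """Return the generated experimental condition candidate with the largest Q value."""
    q_values = [output_model(state, c) for c in candidates]
    best_index = max(range(len(candidates)), key=q_values.__getitem__)
    return candidates[best_index]
```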
  • the derivation unit 12A holds a plurality of combinations of experimental conditions and performance values of experimental results, respectively.
  • the derivation unit 12A includes an experimental condition and a performance value obtained by adding a combination of the experimental condition input to the experimental model 24 and the derived performance value to a plurality of combinations of the experimental condition and the performance value input to the output model 22B. Get multiple combinations.
  • The derivation unit 12A again inputs, to the output model 22B, the plurality of combinations of experimental conditions and performance values obtained in this way, together with each one of the plurality of experimental condition candidates generated by the generation unit 16, individually for all the generated candidates.
  • Similarly to the above-described processing, the derivation unit 12A again acquires the Q value output from the output model 22B for each of the plurality of experimental condition candidates and, using the experimental models 24, acquires the performance value corresponding to the selected experimental condition candidate.
  • the deriving unit 12A repeats the process for acquiring the performance value corresponding to the experimental condition candidate a predetermined number of times (for example, 100 times).
  • the derivation unit 12A performs the above processing for each output model 22B. That is, the derivation unit 12A acquires a performance value for a predetermined number of times for each output model 22B. Similarly to the derivation unit 12 according to the first embodiment, the derivation unit 12A derives an evaluation value of the output model 22B for each output model 22B using the obtained performance values for a predetermined number of times (see FIG. 7). .
  • The derivation unit 12A derives the reward value so that a higher reward is obtained for an output model 22B having a higher derived evaluation value. For example, the derivation unit 12A derives the reward value of the top three output models 22B in descending order of evaluation value as “1”, derives the reward value of the bottom three output models 22B as “−1”, and derives the reward value of the remaining output models 22B as “0”.
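  • A minimal sketch of this reward assignment, assuming at least seven output models 22B, a "higher evaluation value is better" ordering, and no ties:

```python
def assign_rewards(evaluations):
    """Top three models get reward 1, bottom three get -1, every other model gets 0."""
    order = sorted(range(len(evaluations)), key=lambda i: evaluations[i], reverse=True)
    rewards = [0.0] * len(evaluations)
    for i in order[:3]:
        rewards[i] = 1.0
    for i in order[-3:]:
        rewards[i] = -1.0
    return rewards
```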
  • the learning unit 14A learns each output model 22B by using the reward value derived by the derivation unit 12A as the reward r in the Q learning.
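  • For background only: a standard Q-learning temporal-difference target, which the reward value above would enter as r; the patent does not spell out the update rule it uses, and the discount factor here is an arbitrary example:

```python
def q_learning_target(reward: float, next_q_values, gamma: float = 0.99) -> float:
    """Standard target r + gamma * max_a' Q(s', a') used to update the predicted Q value."""
    return reward + gamma * max(next_q_values)
```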
  • the process for deriving the reward value of each output model 22B by the deriving unit 12A and the learning process of each output model 22B by the learning unit 14A are performed a predetermined number of times (for example, 10,000 times). Then, in the last round, the learning unit 14A stores one output model 22B having the best evaluation indicated by the evaluation value in the storage unit 42 as an output model 22C used in an operation phase described later. Note that the process for deriving the reward value of each output model 22B by the deriving unit 12A and the learning process of each output model 22B by the learning unit 14A may be performed until the evaluation value converges.
  • the learning unit 14A uses the learning data 20 to learn the experimental model 24 according to the error back-propagation method, similarly to the learning unit 14 according to the first embodiment.
  • the learning device 10 includes a generation unit 16, a reception unit 30, and an output unit 32A. Further, the storage unit 42 of the learning device 10 stores the output model 22C obtained in the learning phase described above.
  • The output unit 32A inputs, to the output model 22C, the plurality of combinations of experimental conditions and performance values received by the receiving unit 30 together with each one of the plurality of experimental condition candidates generated by the generation unit 16, individually for all the generated candidates.
  • the output unit 32A acquires the Q value output from the output model 22C corresponding to each of the inputs. Then, the output unit 32A outputs the experimental condition candidate corresponding to the maximum Q value among the acquired Q values to the display unit 43 as the experimental condition candidate to be the next experiment target.
  • The output unit 32A may instead output, as the experimental condition candidate to be the next experiment target, an experimental condition candidate corresponding to any Q value that is equal to or higher than a predetermined value (for example, the second largest Q value that is equal to or higher than the predetermined value).
  • The output unit 32A may also output (store) to the storage unit 42 the experimental condition candidate corresponding to the maximum Q value among the acquired Q values as the experimental condition candidate to be the next experiment target.
  • Since the hardware configuration of the learning device 10 according to the present embodiment is the same as that of the learning device 10 according to the first embodiment (see FIG. 10), the description thereof is omitted.
  • When the CPU 40 executes the learning program 50, it functions as the derivation unit 12A, the learning unit 14A, the generation unit 16, the reception unit 30, and the output unit 32A.
  • the output model learning process illustrated in FIG. 18 is executed, for example, when a user inputs an execution instruction for the output model learning process via the input unit 44 in the learning phase.
  • the experiment condition output process shown in FIG. 19 is executed, for example, when the user inputs an instruction to execute the experiment condition output process via the input unit 44 in the operation phase.
  • In step S60, the learning unit 14 generates a plurality of output models 22B having different model creation conditions.
  • The processes in steps S62 to S70 are executed in the same way for each output model 22B generated by the processing of step S60.
  • In step S62, the generation unit 16 generates a plurality of different experimental condition candidates as described above.
  • In step S64, as described above, the derivation unit 12A inputs a plurality of combinations of the experimental conditions for generating the material and the performance values of the experimental results, together with the experimental condition candidates generated by the process of step S62, to the output model 22B, and acquires the Q values output from the output model 22B.
  • When step S64 is executed for the first time in the learning process of the output model 22B (that is, when step S64 is executed for the very first time, or for the first time after a negative determination in step S78 described later), the plurality of combinations of experimental conditions and performance values used are the combinations of all the experimental conditions and performance values included in the learning data 20.
  • When step S64 is executed for the second and subsequent times in the learning process of the output model 22B (that is, after a negative determination in step S70), the plurality of combinations used are the plurality of combinations of experimental conditions and performance values input to the output model 22B in the previous execution of step S64, to which the combination of the experimental condition and the performance value is added in step S68 described later.
  • In step S66, the derivation unit 12A inputs the experimental condition candidate corresponding to the maximum Q value among the plurality of Q values acquired by the process of step S64 to each experimental model 24, and acquires the performance value output from each experimental model 24. Further, the derivation unit 12A holds, for the experimental condition candidate corresponding to the maximum Q value, a plurality of combinations of the experimental condition and the performance value of the experimental result.
  • In step S68, the derivation unit 12A adds, to the plurality of combinations of experimental conditions and performance values input to the output model 22B by the immediately preceding processing of step S64, the combination of the experimental condition input to the experimental model 24 and the performance value acquired by the process of step S66 this time. The plurality of combinations obtained by this addition are used in the step S64 executed next, after a negative determination is made in step S70 described later.
  • In step S70, the derivation unit 12A determines whether or not the processes in steps S62 to S68 have been repeated a predetermined number of times (for example, 100 times). If this determination is negative, the process returns to step S62. If the determination is affirmative, the process proceeds to step S72.
  • In step S72, the derivation unit 12A derives, for each output model 22B, the evaluation value of the output model 22B using the performance values for the predetermined number of times obtained by the repeated processing of steps S62 to S68.
  • In step S74, as described above, the derivation unit 12A derives the reward value so that a higher reward is obtained for an output model 22B having a higher evaluation value derived by the process of step S72.
  • In step S76, the learning unit 14A learns each output model 22B using the reward value derived by the process in step S74 as the reward r in the Q learning.
  • In step S78, the learning unit 14 determines whether or not the processes in steps S62 to S76 have been repeated a predetermined number of times (for example, 10,000 times). If this determination is negative, the process returns to step S62. If the determination is affirmative, the process proceeds to step S80.
  • In step S80, as described above, the learning unit 14A stores in the storage unit 42, as the output model 22C, one output model 22B having the best evaluation indicated by the evaluation value derived by the process of step S72 executed last. When the process of step S80 ends, the output model learning process ends.
  • the accepting unit 30 accepts a plurality of combinations of the experimental conditions for generating the material input by the user via the input unit 44 and the performance value of the material of the experimental result.
  • The output unit 32A reads the output model 22C from the storage unit 42.
  • the generation unit 16 generates a plurality of different experimental condition candidates as described above.
  • In step S96, the output unit 32A inputs, to the output model 22C, the plurality of combinations of experimental conditions and performance values received by the process of step S90 together with each one of the plurality of experimental condition candidates generated by the process of step S92, individually for all the generated experimental condition candidates.
  • the output unit 32A acquires the Q value output from the output model 22C corresponding to each of the inputs.
  • In step S98, the output unit 32A outputs the experimental condition candidate corresponding to the maximum Q value among the plurality of Q values acquired by the process of step S96 to the display unit 43 as the experimental condition candidate to be tested next.
  • the experiment condition output process ends.
  • As described above, according to the present embodiment, the experimental condition candidate that maximizes the Q value output from the output model 22B, which takes as input a plurality of combinations of experimental conditions for generating materials and performance values of experimental results together with an experimental condition candidate, is input to the experimental model 24.
  • an evaluation value of the output model 22B is derived using the performance value of the experimental result obtained by this input, and a reward given to the output model 22B is derived according to the derived evaluation value.
  • the output model 22B is learned by Q learning using the derived reward. Therefore, by using the output model 22B learned in this way, it is possible to search for an appropriate experimental condition for the material.
  • Each of the above embodiments has described the case where experimental conditions for generating a material are searched for; experimental conditions for generating a drug can be searched for in the same manner.
  • Although each of the above embodiments has described the case where the experimental model 24 is a model obtained by machine learning, the present disclosure is not limited to this as long as the experimental model allows a virtual experiment to be performed.
  • For example, as the experimental model 24, an arbitrary function that takes one experimental condition as an input and outputs a performance value of the experimental result corresponding to that experimental condition may be applied. Even when such a model is applied, the output models 22 and 22B are optimized by learning.
  • the experimental model 24 may be a simulator that simulates an experiment.
  • The output unit 32A may output, as the experimental condition candidate to be the next experiment target, the experimental condition candidate for which the cumulative Q value obtained by sequentially inputting a plurality of experimental condition candidates to the output model 22C is maximized.
  • the output unit 32A first obtains a Q value corresponding to each of a plurality of first experimental condition candidates from the output model 22C, as in the second embodiment.
  • Next, the output unit 32A adds, for example, the combination of a first experimental condition candidate and its performance value to the plurality of combinations of experimental conditions and performance values that were input to the output model 22C the first time. This performance value may be estimated by a known method such as an SVM (Support Vector Machine).
  • The output unit 32A then inputs the plurality of combinations of experimental conditions and performance values obtained by this addition, together with a plurality of second experimental condition candidates, to the output model 22C, and obtains a Q value corresponding to each of the second experimental condition candidates.
  • the output unit 32A outputs the experimental condition candidate that maximizes the cumulative value of the first Q value and the second Q value as the next experimental condition candidate.
  • Although the case of using the cumulative value of two Q values has been described here, the cumulative value of three or more Q values can be used in the same manner.
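  • A minimal sketch of this two-step variant, assuming model scores one (state, candidate) pair with a Q value, estimate_performance stands in for the known estimation method (for example an SVM regressor), and extend_state appends a (candidate, estimated performance) combination to the state; all names are placeholders:

```python
def best_two_step_candidate(model, state, first_candidates, second_candidates,
                            estimate_performance, extend_state):
    """Return the first-round candidate maximizing the cumulative Q value over two rounds."""
    best, best_score = None, float("-inf")
    for c1 in first_candidates:
        q1 = model(state, c1)
        next_state = extend_state(state, c1, estimate_performance(c1))
        q2 = max(model(next_state, c2) for c2 in second_candidates)
        if q1 + q2 > best_score:
            best, best_score = c1, q1 + q2
    return best
```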
  • The various processes that the CPU executes by running software (programs) in each of the above embodiments may instead be executed by various processors other than a CPU.
  • Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed exclusively to execute specific processing, such as an ASIC (Application Specific Integrated Circuit).
  • The above-described various processes may be executed by one of these various processors, or by a combination of two or more processors of the same type or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA).
  • the hardware structure of these various processors is an electric circuit in which circuit elements such as semiconductor elements are combined.
  • the learning program 50 has been previously stored (installed) in the storage unit 42.
  • The learning program 50 may be provided in a form recorded on a non-transitory recording medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory.
  • the learning program 50 may be downloaded from an external device via a network.

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Software Systems (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Biophysics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Health & Medical Sciences (AREA)
  • Medical Informatics (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Genetics & Genomics (AREA)
  • Biomedical Technology (AREA)
  • Computational Linguistics (AREA)
  • General Health & Medical Sciences (AREA)
  • Molecular Biology (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)
  • Feedback Control In General (AREA)

Abstract

Provided is a learning device comprising: a derivation unit for deriving an evaluation value for an output model which takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and outputs the experimental conditions, said evaluation value of said output model being derived using the performance values of the experimental results obtained by inputting, into an experimental model for virtually carrying out experiments, the experimental conditions outputted by inputting the plurality of combinations into the output model; and a learning unit for learning the output model by machine learning taking into account the evaluation value derived by the derivation unit.

Description

Learning device, learning method, and learning program
 The learning device of the present disclosure includes: a derivation unit that inputs a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results into an output model that takes such combinations as input and outputs an experimental condition, inputs the experimental condition output by the output model into an experimental model that performs a virtual experiment, and derives an evaluation value of the output model using the performance value of the experimental result thus obtained; and a learning unit that trains the output model by machine learning that reflects the evaluation value derived by the derivation unit.
 This makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the evaluation value may be a value that is better as the ratio of values satisfying the target performance among the plurality of performance values is higher, a value that is better as the number of virtual experiments required before a performance value satisfying the target performance is obtained is smaller, or a value that is better as the performance value is closer to the target performance.
 Because the search behavior is evaluated using an appropriate performance value, an appropriate output model is obtained, which in turn makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the derivation unit may correct the evaluation value downward when the output model outputs an experimental condition that does not satisfy a predetermined rule.
 This increases the likelihood of obtaining experimental conditions that satisfy predetermined rules based on past empirical knowledge, and thus makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the derivation unit may correct the experimental condition output from the output model to an experimental condition that can be used in an actual experiment.
 This increases the likelihood of obtaining experimental conditions usable in actual experiments, and thus makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the output model may be a model trained using a genetic algorithm.
 This makes it possible to search for more appropriate experimental conditions for generating a material or a drug.
 The learning device of the present disclosure, in another aspect, includes a learning unit that trains an output model which takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results, together with a candidate experimental condition, and outputs an action value for reinforcement learning. Among the plurality of action values output by inputting the plurality of combinations together with each of a plurality of different candidate experimental conditions, the candidate corresponding to an action value equal to or greater than a predetermined value is input into an experimental model that performs a virtual experiment, and the output model is trained using, as a reward, a value derived on the basis of the performance value of the experimental result thus obtained.
 This makes it possible to search for appropriate experimental conditions for generating a material or a drug.
 In this learning device, the reward may be a value that is better as the ratio of values satisfying the target performance among the plurality of performance values is higher, a value that is better as the number of virtual experiments required before a performance value satisfying the target performance is obtained is smaller, or a value that is better as the performance value is closer to the target performance.
 As a result of using an appropriate value as the performance measure, it is possible to search for appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the reinforcement learning may be Q-learning and the action value may be a Q value.
 This makes it possible to search for more appropriate experimental conditions for generating a material or a drug.
 The learning device of the present disclosure may further include an output unit that, when the output model trained by the learning unit is used, sequentially inputs a plurality of candidate experimental conditions into the output model a plurality of times and outputs, as the candidate experimental condition to be tested next, the candidate for which the cumulative action value is largest.
 This makes it possible to search for more appropriate experimental conditions for generating a material or a drug.
 In the learning device of the present disclosure, the experimental model may be a model obtained by machine learning.
 This makes it possible to generate an output model specialized for a specific problem.
 In the learning device of the present disclosure, a plurality of experimental models may exist, each created under different conditions.
 By using a plurality of virtual experimental results obtained from experimental models created under different conditions, it is possible to search for more appropriate experimental conditions for generating a material or a drug. The experimental model may also be a mathematical expression including functions such as sin or exp, which makes it possible to generate an output model even for an experimental system for which no experimental data has been obtained at all.
 In the learning device of the present disclosure, a plurality of output models may exist, each created under different conditions.
 By learning from the evaluation of the plurality of performance values obtained from the plurality of experimental conditions output by output models created under different conditions, it is possible to search for more appropriate experimental conditions for generating a material or a drug.
 The learning method of the present disclosure is a method in which a computer executes processing of inputting a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results into an output model that outputs an experimental condition, deriving an evaluation value of the output model using the performance value of the experimental result obtained by inputting the output experimental condition into an experimental model that performs a virtual experiment, and training the output model by machine learning that reflects the derived evaluation value.
 The learning program of the present disclosure causes a computer to execute the same processing: deriving the evaluation value of the output model from the virtual experimental results and training the output model by machine learning that reflects the derived evaluation value.
 The learning method of the present disclosure, in another aspect, is a method in which a computer executes processing of training an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results, together with a candidate experimental condition, and outputs an action value for reinforcement learning; among the plurality of action values output by inputting the plurality of combinations together with each of a plurality of different candidate experimental conditions, the candidate corresponding to an action value equal to or greater than a predetermined value is input into an experimental model that performs a virtual experiment, and a value derived on the basis of the performance value of the experimental result thus obtained is used as the reward for the training.
 The learning program of the present disclosure, in another aspect, causes a computer to execute this reinforcement-learning-based training processing.
 The learning device of the present disclosure may also be expressed as having a processor that inputs the plurality of combinations into the output model, inputs the output experimental condition into the experimental model that performs a virtual experiment, derives the evaluation value of the output model using the performance value of the resulting experimental result, and trains the output model by machine learning that reflects the derived evaluation value.
 Similarly, the learning device of the present disclosure may be expressed as having a processor that trains the output model for reinforcement learning, using as a reward a value derived on the basis of the performance value obtained by inputting, into the experimental model, a candidate experimental condition corresponding to an action value equal to or greater than the predetermined value.
 According to the present disclosure, it is possible to search for appropriate experimental conditions for generating a material or a drug.
FIG. 1 is a block diagram showing an example of the functional configuration of the learning device in the learning phase according to the first embodiment.
FIG. 2 is a diagram showing an example of the learning data according to each embodiment.
FIG. 3 is a diagram showing an example of the output model according to the first embodiment.
FIG. 4 is a diagram showing an example of the data structure of data output from the output model according to the first embodiment.
FIG. 5 is a diagram showing an example of the experimental model according to each embodiment.
FIG. 6 is a diagram showing an example of the data structure of data output from the experimental model according to each embodiment.
FIG. 7 is a diagram for explaining the process of deriving the evaluation value of the output model according to the first embodiment.
FIG. 8 is a diagram for explaining the process of deriving the evaluation value of the output model according to a modification.
FIG. 9 is a block diagram showing an example of the functional configuration of the learning device in the operation phase according to the first embodiment.
FIG. 10 is a block diagram showing an example of the hardware configuration of the learning device according to each embodiment.
FIG. 11 is a flowchart showing an example of the experimental model learning process according to each embodiment.
FIG. 12 is a flowchart showing an example of the output model learning process according to the first embodiment.
FIG. 13 is a flowchart showing an example of the experimental condition output process according to the first embodiment.
FIG. 14 is a block diagram showing an example of the functional configuration of the learning device in the learning phase according to the second embodiment.
FIG. 15 is a diagram showing an example of the output model according to the second embodiment.
FIG. 16 is a diagram for explaining the process of deriving the evaluation value of the output model according to the second embodiment.
FIG. 17 is a block diagram showing an example of the functional configuration of the learning device in the operation phase according to the second embodiment.
FIG. 18 is a flowchart showing an example of the output model learning process according to the second embodiment.
FIG. 19 is a flowchart showing an example of the experimental condition output process according to the second embodiment.
 Hereinafter, exemplary embodiments for carrying out the technology of the present disclosure will be described in detail with reference to the drawings.
 [First Embodiment]
 First, the functional configuration of the learning device 10 in the learning phase according to the present embodiment will be described with reference to FIG. 1. As shown in FIG. 1, the learning device 10 includes a derivation unit 12 and a learning unit 14. The storage unit 42 of the learning device 10 (see FIG. 10) stores learning data 20, a plurality of output models 22, and a plurality of experimental models 24.
 FIG. 2 shows an example of the learning data 20. As shown in FIG. 2, the learning data 20 according to the present embodiment includes combinations of an experimental condition for generating a material and the performance value of the material obtained when an experiment is performed under that condition. The experimental conditions are, for example, conditions for generating a material such as a semiconductor resist material, and include a main-component composition, an additive amount, and a process condition. In the example of FIG. 2, the main-component composition indicates the ratio of the main components of the material, the additive amount indicates the concentration of an additive, and the process condition indicates the temperature at which the material is generated.
 The performance value in the learning data 20 indicates the performance of the material generated under the corresponding experimental condition. The performance value according to the present embodiment is a measure of how well the material turned out, such as the degree of unevenness of the material surface or the degree to which a hole of a desired size was formed. In the present embodiment, a smaller performance value indicates a better material. The learning data 20 of the present embodiment includes combinations of a plurality of different experimental conditions and performance values; the same experimental condition may appear more than once.
 FIG. 3 shows an example of the output model 22. As shown in FIG. 3, the output model 22 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. A plurality of combinations of experimental conditions and performance values are input to the input layer of the output model 22, and the output layer outputs one experimental condition. FIG. 4 shows an example of the data structure of the experimental condition output from the output layer of the output model 22; it includes, for example, a main-component composition, an additive amount, and a process condition.
 Specifically, the output model 22 is configured, for example, as shown in (1) to (3) below.
(1) Number of nodes in the input layer: N × M, where N is the number of items in an experimental condition and M is the number of experiments.
(2) Configuration of the intermediate layers: ten convolution layers with a 3 × 3 kernel, 32 filters, a stride of 2, and ReLU activation.
(3) Number of nodes in the output layer: N × 1
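As a rough illustration of configuration (1) to (3), the following is a minimal Keras sketch. Several points are assumptions of the sketch rather than details given in this publication: the M rows of the history are stacked into a single-channel 2-D array, each row is assumed to hold the N condition items plus the corresponding performance value, padding="same" is used so that ten stride-2 convolutions stay well defined for small inputs, and the default parameter values are only examples.

```python
import tensorflow as tf

def build_output_model(n_items=3, n_experiments=100, n_conv=10):
    """Output model 22: a history of (condition, performance) rows in, one condition out."""
    # One row per past experiment: n_items condition values plus one performance value.
    inputs = tf.keras.Input(shape=(n_experiments, n_items + 1, 1))
    x = inputs
    for _ in range(n_conv):  # ten 3x3 / 32-filter / stride-2 / ReLU convolution layers
        x = tf.keras.layers.Conv2D(32, 3, strides=2, padding="same",
                                   activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    # The output layer produces one experimental condition (N values).
    outputs = tf.keras.layers.Dense(n_items)(x)
    return tf.keras.Model(inputs, outputs)
```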
 The plurality of output models 22 according to the present embodiment are created under different conditions. Specifically, they differ in at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial values of the weights.
 FIG. 5 shows an example of the experimental model 24. As shown in FIG. 5, the experimental model 24 according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. The experimental model 24 is a model that performs a virtual experiment: one experimental condition is input to its input layer, and its output layer outputs the performance value of the experimental result corresponding to that condition. FIG. 6 shows an example of the data structure of the performance value output from the output layer of the experimental model 24. The experimental model 24 may output a plurality of types of performance values; for example, it may output both the degree of unevenness of the material surface and the photosensitivity of the material.
 Specifically, the experimental model 24 is configured, for example, as shown in (4) to (6) below.
(4) Number of nodes in the input layer: N × 1, where N is the number of items in an experimental condition.
(5) Configuration of the intermediate layers: four convolution layers with a 3 × 3 kernel, 32 filters, a stride of 2, and ReLU activation.
(6) Number of nodes in the output layer: 1 × J, where J is the number of types of performance values.
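A corresponding sketch of configuration (4) to (6). Here 1-D convolutions with padding="same" stand in for the convolution layers so that a length-N condition vector remains a valid input after four stride-2 layers; this substitution and the default parameter values are assumptions of the sketch.

```python
import tensorflow as tf

def build_experiment_model(n_items=3, n_outputs=1, n_conv=4):
    """Experimental model 24: one experimental condition in, J performance values out."""
    inputs = tf.keras.Input(shape=(n_items, 1))          # N x 1 input
    x = inputs
    for _ in range(n_conv):  # four 32-filter / stride-2 / ReLU convolution layers
        x = tf.keras.layers.Conv1D(32, 3, strides=2, padding="same",
                                   activation="relu")(x)
    x = tf.keras.layers.Flatten()(x)
    outputs = tf.keras.layers.Dense(n_outputs)(x)        # 1 x J output
    return tf.keras.Model(inputs, outputs)
```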
 The plurality of experimental models 24 according to the present embodiment are likewise created under different conditions, differing in at least one of the number of intermediate layers, the number of nodes in each intermediate layer, and the initial values of the weights.
 The derivation unit 12 inputs a plurality of combinations of experimental conditions for generating a material and performance values of experimental results into the output model 22 and acquires the experimental condition output from the output model 22. Specifically, the derivation unit 12 first inputs all of the combinations of experimental conditions and performance values contained in the learning data 20 into the output model 22 and acquires the output experimental condition. The derivation unit 12 may instead input only some of the combinations contained in the learning data 20, or combinations different from the learning data 20, into the output model 22.
 The derivation unit 12 also corrects the experimental condition output from the output model 22 to an experimental condition that can be used in an actual experiment. In the present embodiment, the derivation unit 12 corrects the output experimental condition to the closest experimental condition that satisfies the constraints of the experimental apparatus actually used. For example, if the apparatus only accepts process temperatures in 5 °C steps and the temperature in the output condition is not a multiple of 5 °C (for example, 92.3 °C), the derivation unit 12 corrects that temperature to the nearest multiple of 5 (for example, 90 °C).
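The temperature example above amounts to snapping a proposed value onto the grid the apparatus actually supports. A minimal sketch, assuming the process temperature is the last item of the condition vector and that the grid is 5 °C:

```python
import numpy as np

def correct_condition(condition, temperature_step=5.0):
    """Round the process temperature (assumed to be the last item of the
    condition vector) to the nearest value the apparatus can be set to."""
    corrected = np.asarray(condition, dtype=float).copy()
    corrected[-1] = temperature_step * np.round(corrected[-1] / temperature_step)
    return corrected
```

For example, a condition whose proposed temperature is 92.3 °C would be corrected to 90 °C, matching the example in the text.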
 Next, the derivation unit 12 inputs the corrected experimental condition into each experimental model 24 and acquires the performance value output from each experimental model 24.
 Furthermore, the derivation unit 12 adds the combination of the experimental condition input to the corresponding experimental model 24 and the derived performance value to the set of combinations that was input to the output model 22, thereby obtaining an extended set of combinations of experimental conditions and performance values. The derivation unit 12 then inputs this extended set into the output model 22 again, inputs the experimental condition thus obtained into the corresponding experimental model 24, and obtains a new performance value. The derivation unit 12 repeats this process of appending the newly obtained combination and querying the output model 22 and the corresponding experimental model 24 a predetermined number of times (for example, 100 times).
 The derivation unit 12 performs the above processing for each output model 22. That is, for each output model 22, the derivation unit 12 obtains the experimental conditions output by that output model 22 over the predetermined number of iterations together with the performance values corresponding to those conditions.
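The repeated virtual-experiment loop run for one pair of output model and experimental model can be sketched as follows. The fixed-size history window and the row layout (N condition items plus one performance value per row) are assumptions chosen so that the models from the earlier sketches can be reused; correct_condition is the hypothetical helper from the rounding sketch above, and the history is assumed to already contain at least `window` rows.

```python
import numpy as np

def run_virtual_campaign(output_model, experiment_model, history,
                         window=100, n_rounds=100):
    """history: (M, N+1) array of past conditions and their performance values."""
    performances = []
    for _ in range(n_rounds):
        # 1. Ask the output model for the next experimental condition,
        #    feeding it the most recent `window` rows of the history.
        x = history[-window:][np.newaxis, ..., np.newaxis]
        condition = output_model.predict(x, verbose=0)[0]
        # 2. Correct it to a condition usable on the real apparatus.
        condition = correct_condition(condition)
        # 3. Run the virtual experiment to obtain a performance value.
        perf = experiment_model.predict(condition[np.newaxis, :, np.newaxis],
                                        verbose=0)[0, 0]
        performances.append(perf)
        # 4. Append the new (condition, performance) pair so that it becomes
        #    part of the history fed to the output model in the next round.
        history = np.vstack([history, np.append(condition, perf)])
    return performances
```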
 For each output model 22, the derivation unit 12 derives an evaluation value of the output model 22 using the performance values obtained over the predetermined number of iterations. In the present embodiment, as shown in FIG. 7 as an example, the derivation unit 12 derives the evaluation value such that it is better as the number of virtual experiments (N in FIG. 7) required before a performance value satisfying the target performance (in the present embodiment, a performance value equal to or less than the target value) is obtained is smaller. In FIG. 7, the vertical axis shows the performance value and the horizontal axis shows the virtual experiment count at which each value was obtained; in this example, a performance value satisfying the target performance is obtained for the first time at the N-th virtual experiment.
 Alternatively, as shown in FIG. 8 as an example, the derivation unit 12 may derive the evaluation value such that it is better as the ratio of performance values satisfying the target performance (the number of performance values enclosed by the dash-dotted rectangle in FIG. 8 relative to the total number of performance values) is higher; "good" in FIG. 8 means that the target performance is satisfied. The derivation unit 12 may also derive the evaluation value such that it is better as each performance value is closer to the target value.
 The derivation unit 12 may also correct the evaluation value downward when the output model 22 outputs an experimental condition that does not satisfy a predetermined rule. Examples of such rules include rules based on the user's empirical knowledge, such as that material A and material B are never mixed, or that five or more kinds of materials are never mixed.
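Written out, the evaluation criteria described above might look like the following sketch, which covers the fewest-experiments criterion and the good-ratio criterion and subtracts a simple penalty when a rule-violating condition was proposed; the exact functional forms are assumptions, since the text only fixes the qualitative ordering (fewer experiments, a higher ratio, or values closer to the target are better).

```python
def evaluate_by_first_success(performances, target, penalty=0.0):
    """Larger is better; smaller-is-better performance values, as in this embodiment."""
    for n, perf in enumerate(performances, start=1):
        if perf <= target:                 # target performance reached
            return 1.0 / n - penalty       # reached at the N-th experiment -> 1/N
    return -penalty                        # target never reached

def evaluate_by_good_ratio(performances, target, penalty=0.0):
    """Larger is better; ratio of values that satisfy the target performance."""
    good = sum(1 for perf in performances if perf <= target)
    return good / len(performances) - penalty
```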
 The learning unit 14 trains the experimental model 24 using the error backpropagation method as an example of machine learning. Specifically, the learning unit 14 inputs an experimental condition contained in the learning data 20 into the experimental model 24, acquires the performance value output from the experimental model 24, and trains the experimental model 24 so that the difference between the acquired performance value and the performance value corresponding to that experimental condition in the learning data 20 is minimized. The learning unit 14 performs this training using all of the combinations of experimental conditions and performance values contained in the learning data 20, although it may instead use only some of them. The data input to each experimental model 24 during training may be the same or different between the experimental models 24.
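Training the experimental model 24 on the learning data 20 is ordinary supervised regression with backpropagation. A minimal sketch reusing build_experiment_model from above; the file names, optimizer, loss, epoch count, and batch size are illustrative assumptions.

```python
import numpy as np

# Hypothetical arrays extracted from the learning data 20:
# conditions   : (K, N) past experimental conditions
# performances : (K, J) measured performance values
conditions = np.load("conditions.npy")
performances = np.load("performances.npy")

experiment_model = build_experiment_model(n_items=conditions.shape[1],
                                          n_outputs=performances.shape[1])
experiment_model.compile(optimizer="adam", loss="mse")   # minimize the squared error
experiment_model.fit(conditions[..., np.newaxis], performances,
                     epochs=200, batch_size=16, verbose=0)
```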
 The learning unit 14 also uses the evaluation value derived by the derivation unit 12 for each output model 22 to train each output model 22 by machine learning using a genetic algorithm as an example of an optimization algorithm. Parameters such as the individual selection method (for example, roulette selection), the crossover method (for example, two-point crossover), and the mutation probability used in the genetic algorithm are set in advance by the user.
 In detail, for example, the learning unit 14 generates a new output model 22 by mating the two output models 22 with the best evaluations. This mating is performed, for example, by joining the input layer and the input-side half of the intermediate layers of one output model 22 with the output-side half of the intermediate layers and the output layer of the other output model 22. The mating method is not limited to this example; for instance, the upper halves of the input layer, intermediate layers, and output layer shown in FIG. 3 of one output model 22 may be joined with the lower halves of those of the other. In the present embodiment, the learning unit 14 generates the next generation of output models 22 with the genetic algorithm so that the number of output models 22 does not change between generations. That is, the output model 22 is trained by updating its weight values through the genetic algorithm, and this training reflects the evaluation value derived by the derivation unit 12.
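One generation-to-generation update of the genetic algorithm can be sketched as follows. For brevity it operates on flat weight vectors with two-point crossover and Gaussian mutation and breeds every child from the two best-scoring parents, which assumes all individuals share one architecture; the publication itself describes splicing layer groups of two parent networks and leaves the selection, crossover, and mutation settings to the user, so this is only an illustration.

```python
import numpy as np

rng = np.random.default_rng(0)

def next_generation(population, scores, mutation_rate=0.01, mutation_scale=0.1):
    """population: list of flat weight vectors (one per output model 22).
    scores: evaluation values from the derivation unit 12 (larger = better).
    Returns a new population of the same size."""
    order = np.argsort(scores)[::-1]
    p1, p2 = population[order[0]], population[order[1]]     # two best parents
    children = []
    while len(children) < len(population):
        a, b = sorted(rng.choice(len(p1), size=2, replace=False))
        child = np.concatenate([p1[:a], p2[a:b], p1[b:]])   # two-point crossover
        mask = rng.random(child.shape) < mutation_rate       # occasional mutation
        child = np.where(mask,
                         child + rng.normal(0.0, mutation_scale, child.shape),
                         child)
        children.append(child)
    return children
```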
 The derivation of the evaluation value of each output model 22 by the derivation unit 12 and the training of the group of output models 22 by the learning unit 14 described above are performed for a predetermined number of generations (for example, 10,000 generations). The learning unit 14 then stores, in the storage unit 42, the single output model 22 with the best evaluation value in the final generation as the output model 22A used in the operation phase described later. The derivation and training may instead be repeated until the evaluation value converges.
 Next, the functional configuration of the learning device 10 in the operation phase according to the present embodiment will be described with reference to FIG. 9. As shown in FIG. 9, the learning device 10 includes a reception unit 30 and an output unit 32, and the storage unit 42 of the learning device 10 stores the output model 22A obtained in the learning phase described above.
 The reception unit 30 receives a plurality of combinations of experimental conditions for generating a material and performance values of experimental results input by the user via the input unit 44 (see FIG. 10).
 The output unit 32 inputs the plurality of combinations received by the reception unit 30 into the output model 22A and acquires the experimental condition output from the output model 22A. Like the derivation unit 12 in the learning phase, the output unit 32 corrects the output experimental condition to one that can be used in an actual experiment, and then outputs the corrected condition to the display unit 43 (see FIG. 10). The user views the experimental condition displayed on the display unit 43 and, if necessary, performs an experiment under that condition. The output unit 32 may also output (store) the corrected experimental condition to the storage unit 42.
 Next, the hardware configuration of the learning device 10 will be described with reference to FIG. 10. The learning device 10 is realized by the computer shown in FIG. 10, which includes a CPU (Central Processing Unit) 40, a memory 41 serving as a temporary storage area, and a nonvolatile storage unit 42. The learning device 10 also includes a display unit 43 such as a liquid crystal display and an input unit 44 such as a keyboard and a mouse. The CPU 40, the memory 41, the storage unit 42, the display unit 43, and the input unit 44 are connected via a bus 45.
 The storage unit 42 is realized by an HDD (Hard Disk Drive), an SSD (Solid State Drive), a flash memory, or the like. A learning program 50 is stored in the storage unit 42 serving as a storage medium. The CPU 40 reads the learning program 50 from the storage unit 42, loads it into the memory 41, and executes it; by executing the learning program 50, the CPU 40 functions as the derivation unit 12, the learning unit 14, the reception unit 30, and the output unit 32.
 Next, the operation of the learning device 10 according to the present embodiment will be described with reference to FIGS. 11 to 13. When the learning device 10 executes the learning program 50, the experimental model learning process shown in FIG. 11, the output model learning process shown in FIG. 12, and the experimental condition output process shown in FIG. 13 are executed. The experimental model learning process and the output model learning process are executed in the learning phase when the user inputs the corresponding execution instruction via the input unit 44, and the experimental condition output process is executed in the operation phase when the user inputs its execution instruction via the input unit 44.
 In step S10 of FIG. 11, the learning unit 14 reads the learning data 20 from the storage unit 42. In step S12, the learning unit 14 generates a plurality of experimental models 24 with different creation conditions. In step S14, the learning unit 14 selects, from the experimental models 24 generated in step S12, one experimental model 24 to be trained; when step S14 is executed repeatedly, an experimental model 24 not yet selected is chosen.
 In step S16, the learning unit 14 trains the experimental model 24 selected in step S14 by error backpropagation using the learning data 20 read in step S10, as described above. In step S18, the learning unit 14 stores the trained experimental model 24 in the storage unit 42. In step S20, the learning unit 14 determines whether steps S14 to S18 have been completed for all of the experimental models 24 generated in step S12; if not, the process returns to step S14, and if so, the experimental model learning process ends.
 In step S30 of FIG. 12, the learning unit 14 generates a plurality of output models 22 with different creation conditions. In step S32, the derivation unit 12 inputs a plurality of combinations of experimental conditions for generating a material and performance values of experimental results into each output model 22 and acquires the experimental condition output from each output model 22.
 When step S32 is executed for the first time in each generation of the output models 22 (that is, the very first time step S32 is executed, or the first time it is executed after a negative determination in step S46 described later), these combinations are all of the combinations of experimental conditions and performance values contained in the learning data 20. When step S32 is executed for the second and subsequent times within a generation (that is, after a negative determination in step S40), the combinations are those input to the output model 22 in the previous execution of step S32 with the combination added in step S38, described later, appended.
 In step S34, the derivation unit 12 corrects the experimental conditions output from each output model 22 in step S32 to conditions usable in an actual experiment, as described above. In step S36, the derivation unit 12 inputs each corrected experimental condition into each experimental model 24 and acquires the performance values output from each experimental model 24. The derivation unit 12 also holds, for each output model 22, the combinations of experimental conditions and performance values corresponding to the conditions output by that output model 22.
 In step S38, the derivation unit 12 adds, to the combinations input to the output model 22 in the current execution of step S32, the combination of the experimental condition input to the experimental model 24 in the current execution of step S36 and the obtained performance value. The combinations obtained by this addition are used in the next execution of step S32 after a negative determination in step S40 described below.
 In step S40, the derivation unit 12 determines whether steps S32 to S38 have been repeated a predetermined number of times (for example, 100 times). If not, the process returns to step S32; if so, the process proceeds to step S42.
 In step S42, the derivation unit 12 derives, for each output model 22, the evaluation value of that output model 22 using the performance values obtained through the repetition of steps S32 to S38, as described above. In step S44, the learning unit 14 uses the evaluation value derived in step S42 for each output model 22 to generate the next generation of output models 22 by the genetic algorithm, as described above. This next generation is used in the next execution of step S32 after a negative determination in step S46 described below.
 In step S46, the learning unit 14 determines whether the number of generations of the output models 22 has reached a predetermined number (for example, 10,000 generations). If not, the process returns to step S32; if so, the process proceeds to step S48. In step S48, the learning unit 14 stores, as the output model 22A, the single output model 22 with the best evaluation value in the final generation in the storage unit 42, and the output model learning process ends.
 In step S50 of FIG. 13, the reception unit 30 receives a plurality of combinations of experimental conditions for generating a material and performance values of experimental results input by the user via the input unit 44. In step S52, the output unit 32 reads the output model 22A from the storage unit 42. In step S54, the output unit 32 inputs the combinations received in step S50 into the output model 22A read in step S52 and acquires the experimental condition output from the output model 22A.
 In step S56, the output unit 32 corrects the experimental condition output from the output model 22A in step S54 to an experimental condition usable in an actual experiment, as described above. In step S58, the output unit 32 outputs the corrected experimental condition to the display unit 43, on which it is displayed, and the experimental condition output process ends.
 As described above, according to the present embodiment, the experimental condition obtained from the output model 22, which takes as input a plurality of combinations of experimental conditions for generating a material and performance values of experimental results and outputs an experimental condition, is input into the experimental model 24 that performs a virtual experiment. The evaluation value of the output model 22 is derived using the performance value of the experimental result obtained from this input, and the output model 22 is then trained by machine learning using the derived evaluation value. By using the output model 22 trained in this way, appropriate experimental conditions for the material can be searched for.
[Second Embodiment]
A second embodiment of the disclosed technology will be described. Components identical to those of the first embodiment are given the same reference numerals, and their description is omitted. First, the functional configuration of the learning device 10 in the learning phase according to the present embodiment will be described with reference to FIG. 14. As illustrated in FIG. 14, the learning device 10 includes a derivation unit 12A, a learning unit 14A, and a generation unit 16. The storage unit 42 stores the learning data 20, a plurality of output models 22B, and a plurality of experimental models 24.
FIG. 15 shows an example of the output model 22B. As shown in FIG. 15, the output model 22B according to the present embodiment is a neural network including an input layer, a plurality of intermediate layers, and an output layer. The input layer of the output model 22B receives a plurality of combinations of experimental conditions for generating a material and performance values of experimental results, together with one experimental condition candidate. The output layer of the output model 22B outputs a Q value, which is an example of an action value in reinforcement learning. That is, the learning device 10 according to the present embodiment trains the output model 22B according to Q-learning, an example of reinforcement learning, with the plurality of combinations of experimental conditions and performance values as the current state s and the experimental condition candidate as the action a. As with the output models 22 according to the first embodiment, the plurality of output models 22B according to the present embodiment are created under mutually different model creation conditions.
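A minimal numpy stand-in for such a network is sketched below; the two hidden layers, their widths, and the flattening of the history into a fixed-length state vector are arbitrary choices for illustration and are not taken from the embodiment.

```python
import numpy as np

class QNetwork:
    """Maps (flattened history of condition/performance pairs, one candidate) to a Q value."""

    def __init__(self, state_dim, action_dim, hidden=32, seed=0):
        rng = np.random.default_rng(seed)
        dims = [state_dim + action_dim, hidden, hidden, 1]
        self.weights = [rng.normal(0.0, 0.1, (i, o)) for i, o in zip(dims, dims[1:])]
        self.biases = [np.zeros(o) for o in dims[1:]]

    def q_value(self, state, candidate):
        x = np.concatenate([state, candidate])        # current state s and action a
        for w, b in zip(self.weights[:-1], self.biases[:-1]):
            x = np.maximum(x @ w + b, 0.0)            # ReLU hidden layers
        return (x @ self.weights[-1] + self.biases[-1]).item()

net = QNetwork(state_dim=8, action_dim=3)
state = np.zeros(8)                                   # flattened (condition, performance) history
candidate = np.array([0.2, 0.5, 0.1])                 # one experimental condition candidate
print(net.q_value(state, candidate))
```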
The generation unit 16 generates a plurality of different experimental condition candidates. In the present embodiment, the generation unit 16 generates experimental condition candidates that satisfy a predetermined rule and are usable in an actual experiment. Since this rule and the experimental conditions usable in an actual experiment are the same as in the first embodiment, their description is omitted. Specifically, each time it generates a plurality of different experimental condition candidates, the generation unit 16 randomly generates candidates that satisfy the predetermined rule and are usable in an actual experiment.
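A sketch of such a generator is shown below, assuming, purely for illustration, that the predetermined rule is that the mixing ratios of a condition sum to 1 and that usable values lie on a 0.05 grid.

```python
import numpy as np

rng = np.random.default_rng(0)

def generate_candidates(n_candidates, n_components=3, step=0.05):
    candidates = []
    while len(candidates) < n_candidates:
        raw = rng.dirichlet(np.ones(n_components))    # random ratios summing to 1
        snapped = np.round(raw / step) * step         # snap onto usable grid values
        if np.isclose(snapped.sum(), 1.0):            # keep only candidates still satisfying the rule
            candidates.append(snapped)
    return candidates

candidates = generate_candidates(n_candidates=5)
```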
The derivation unit 12A derives a value (hereinafter referred to as a "reward value") that the learning unit 14A, described later, uses as a reward when training each output model 22B according to Q-learning. The details of the processing by which the derivation unit 12A derives the reward value are described below.
First, the derivation unit 12A inputs, to the output model 22B, a plurality of combinations of experimental conditions for generating a material and performance values of experimental results together with an experimental condition candidate generated by the generation unit 16, and acquires the Q value output from the output model 22B. Specifically, as illustrated in FIG. 16, the derivation unit 12A inputs to the output model 22B, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values together with one of the plurality of experimental condition candidates generated by the generation unit 16. That is, the derivation unit 12A acquires the Q value output from the output model 22B for each of the plurality of experimental condition candidates generated by the generation unit 16.
Next, the derivation unit 12A inputs, to the experimental models 24, an experimental condition candidate corresponding to one of the acquired Q values that is equal to or greater than a predetermined value. In the present embodiment, the derivation unit 12A inputs the experimental condition candidate corresponding to the largest of the acquired Q values to each experimental model 24 and acquires the performance value output from each experimental model 24. Like the derivation unit 12 according to the first embodiment, the derivation unit 12A holds the resulting plurality of combinations of experimental conditions and performance values of the experimental results.
Furthermore, the derivation unit 12A obtains a plurality of combinations of experimental conditions and performance values by adding the combination of the experimental condition input to the experimental model 24 and the derived performance value to the plurality of combinations that were input to the output model 22B. The derivation unit 12A then again inputs to the output model 22B, individually for every generated candidate, the obtained plurality of combinations of experimental conditions and performance values together with one of the plurality of experimental condition candidates generated by the generation unit 16. In the same manner as the processing described above, the derivation unit 12A again acquires a performance value corresponding to an experimental condition candidate, using the Q values output from the output model 22B for the respective candidates and the experimental models 24. The derivation unit 12A repeats this processing for acquiring a performance value corresponding to an experimental condition candidate a predetermined number of times (for example, 100 times).
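Putting the above together, the repeated candidate scoring and virtual experimentation performed by the derivation unit 12A could be sketched as follows; q_fn, experiment_fn and candidate_fn are placeholder callables, and the fixed-length window over the most recent pairs is a simplification introduced only to keep the state dimension constant.

```python
import numpy as np

def rollout(q_fn, experiment_fn, candidate_fn, history, n_steps=100, window=2):
    performances = []
    for _ in range(n_steps):
        recent = history[-window:]                        # simplification: fixed-length history window
        state = np.concatenate([np.append(c, p) for c, p in recent])
        candidates = candidate_fn()
        q_values = [q_fn(state, a) for a in candidates]
        best = candidates[int(np.argmax(q_values))]       # candidate with the largest Q value
        performance = experiment_fn(best)                 # virtual experiment by the experimental model 24
        history = history + [(best, performance)]         # combinations extended with the new pair
        performances.append(performance)
    return performances                                   # later used to derive the evaluation value

# toy usage with placeholder callables
rng = np.random.default_rng(0)
history = [(np.array([0.3, 0.4, 0.3]), 0.5), (np.array([0.2, 0.5, 0.3]), 0.6)]
perfs = rollout(
    q_fn=lambda s, a: float(s.sum() + a[0]),
    experiment_fn=lambda a: float(a @ np.array([0.2, 0.5, 0.3])),
    candidate_fn=lambda: [rng.dirichlet(np.ones(3)) for _ in range(20)],
    history=history,
    n_steps=5,
)
```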
The derivation unit 12A performs the above processing for each output model 22B. That is, the derivation unit 12A acquires the predetermined number of performance values for each output model 22B. Like the derivation unit 12 according to the first embodiment, the derivation unit 12A derives, for each output model 22B, the evaluation value of the output model 22B using the obtained performance values for the predetermined number of iterations (see FIG. 7).
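The evaluation value can be computed from the collected performance values in several ways; the three variants below mirror the alternatives recited in claim 2 (ratio meeting a target, number of virtual experiments needed to reach the target, closeness to the target), with the target of 0.8 being an arbitrary example.

```python
import numpy as np

def evaluation_ratio(perfs, target=0.8):
    # higher is better: fraction of virtual experiments that met the target
    return float(np.mean(np.asarray(perfs) >= target))

def evaluation_trials_to_target(perfs, target=0.8):
    # fewer is better: index of the first virtual experiment that met the target
    hits = np.nonzero(np.asarray(perfs) >= target)[0]
    return int(hits[0]) + 1 if hits.size else len(perfs) + 1

def evaluation_closeness(perfs, target=0.8):
    # closer is better: negated distance of the best performance to the target
    return -float(np.min(np.abs(np.asarray(perfs) - target)))
```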
The derivation unit 12A also derives reward values such that an output model 22B with a higher derived evaluation value obtains a higher reward. For example, the derivation unit 12A derives a reward value of "1" for the three output models 22B with the highest evaluation values, a reward value of "-1" for the three output models 22B with the lowest evaluation values, and a reward value of "0" for the other output models 22B.
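The ranking-based reward just described amounts to the following few lines; the cut-off of three models at each end follows the example in the text.

```python
import numpy as np

def rank_rewards(evaluation_values, n_top=3, n_bottom=3):
    order = np.argsort(evaluation_values)          # ascending order of evaluation values
    rewards = np.zeros(len(evaluation_values))
    rewards[order[-n_top:]] = 1.0                  # the three best output models
    rewards[order[:n_bottom]] = -1.0               # the three worst output models
    return rewards

print(rank_rewards([0.1, 0.9, 0.4, 0.7, 0.2, 0.8, 0.3, 0.6]))
```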
The learning unit 14A trains each output model 22B using the reward value derived by the derivation unit 12A as the reward r in Q-learning.
The processing by the derivation unit 12A for deriving the reward value of each output model 22B and the learning processing of each output model 22B by the learning unit 14A are performed a predetermined number of times (for example, 10,000 times). In the final iteration, the learning unit 14A stores in the storage unit 42 the one output model 22B with the best evaluation indicated by the evaluation value, as the output model 22C used in the operation phase described later. Note that the processing for deriving the reward values and the learning processing of the output models 22B may instead be performed until the evaluation values converge.
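For context, the standard Q-learning bootstrap target that such an update would be driven by is shown below; an actual implementation would move the network weights toward this target by backpropagation, which is omitted here.

```python
import numpy as np

def q_learning_target(reward, next_q_values, gamma=0.99):
    # r + gamma * max_a' Q(s', a'): the standard Q-learning target
    return reward + gamma * float(np.max(next_q_values))

target = q_learning_target(reward=1.0, next_q_values=[0.2, 0.5, 0.1])
# the per-transition loss would then be (Q(s, a) - target) ** 2
```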
Like the learning unit 14 according to the first embodiment, the learning unit 14A also trains the experimental models 24 using the learning data 20 according to the error backpropagation method.
Next, the functional configuration of the learning device 10 in the operation phase according to the present embodiment will be described with reference to FIG. 17. As illustrated in FIG. 17, the learning device 10 includes the generation unit 16, the accepting unit 30, and an output unit 32A. The storage unit 42 of the learning device 10 stores the output model 22C obtained in the learning phase described above.
The output unit 32A inputs to the output model 22C, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values accepted by the accepting unit 30 together with one of the plurality of experimental condition candidates generated by the generation unit 16. The output unit 32A acquires the Q value output from the output model 22C for each of these inputs. The output unit 32A then outputs, to the display unit 43, the experimental condition candidate corresponding to the largest of the acquired Q values as the candidate for the experimental condition to be tried next. Note that the output unit 32A may instead output to the display unit 43, as the candidate for the next experiment, an experimental condition candidate corresponding to any Q value equal to or greater than a predetermined value (for example, the second-largest Q value that is equal to or greater than the predetermined value). The output unit 32A may also output (store) the experimental condition candidate corresponding to the largest Q value to the storage unit 42 as the candidate for the next experiment.
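In code, the selection performed by the output unit 32A reduces to scoring every candidate and taking an argmax, as in the sketch below; q_fn again stands in for the trained output model 22C.

```python
import numpy as np

def propose_next_condition(q_fn, state, candidates):
    q_values = np.array([q_fn(state, a) for a in candidates])
    best_index = int(np.argmax(q_values))                 # candidate with the largest Q value
    return candidates[best_index], float(q_values[best_index])

# toy usage with placeholder data
rng = np.random.default_rng(0)
state = rng.random(8)                                     # flattened history of condition/performance pairs
candidates = [rng.dirichlet(np.ones(3)) for _ in range(20)]
best, best_q = propose_next_condition(lambda s, a: float(a[0] - a[2]), state, candidates)
```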
Since the hardware configuration of the learning device 10 according to the present embodiment is the same as that of the learning device 10 according to the first embodiment (see FIG. 10), its description is omitted. By executing the learning program 50, the CPU 40 functions as the derivation unit 12A, the learning unit 14A, the generation unit 16, the accepting unit 30, and the output unit 32A.
Next, the operation of the learning device 10 according to the present embodiment will be described with reference to FIGS. 18 and 19. Since the experimental model learning process is the same as in the first embodiment (see FIG. 11), its description is omitted. The output model learning process shown in FIG. 18 is executed, for example, when the user inputs an instruction to execute the output model learning process via the input unit 44 in the learning phase. The experimental condition output process shown in FIG. 19 is executed, for example, when the user inputs an instruction to execute the experimental condition output process via the input unit 44 in the operation phase.
In step S60 of FIG. 18, the learning unit 14A generates a plurality of output models 22B created under mutually different model creation conditions. The following steps S62 to S70 are executed in the same manner for each output model 22B generated in step S60. In step S62, the generation unit 16 generates a plurality of different experimental condition candidates, as described above.
In step S64, as described above, the derivation unit 12A inputs, to the output model 22B, the plurality of combinations of experimental conditions for generating a material and performance values of experimental results together with an experimental condition candidate generated in step S62, and acquires the Q value output from the output model 22B.
When step S64 is executed for the first time in the learning process of an output model 22B (that is, the very first time step S64 is executed, or in the first pass after a negative determination in step S78 described later), this plurality of combinations of experimental conditions and performance values consists of all the combinations of experimental conditions and performance values included in the learning data 20. When step S64 is executed for the second or subsequent time in the learning process of the output model 22B (that is, when step S64 is executed after a negative determination in step S70), the plurality of combinations is the set of combinations input to the output model 22B in the previous execution of step S64, with the combination of experimental condition and performance value added in step S68 described later.
In step S66, the derivation unit 12A inputs the experimental condition candidate corresponding to the largest of the Q values acquired in step S64 to each experimental model 24 and acquires the performance value output from each experimental model 24. The derivation unit 12A also holds, for the candidate corresponding to the largest Q value, the resulting combinations of the experimental condition and the performance values of the experimental results.
In step S68, the derivation unit 12A adds the following combination of experimental condition and performance value to the plurality of combinations input to the output model 22B in the current (immediately preceding) execution of step S64: namely, the combination of the experimental condition input to the experimental model 24 in the current execution of step S66 and the acquired performance value. The plurality of combinations obtained by this addition is used in the next execution of step S64, after a negative determination in step S70 described later.
In step S70, the derivation unit 12A determines whether the processing of steps S62 to S68 has been repeated a predetermined number of times (for example, 100 times). If the determination is negative, the process returns to step S62; if it is affirmative, the process proceeds to step S72.
In step S72, as described above, the derivation unit 12A derives, for each output model 22B, the evaluation value of the output model 22B using the performance values for the predetermined number of iterations obtained by repeating steps S62 to S68. In step S74, as described above, the derivation unit 12A derives reward values such that an output model 22B with a higher evaluation value derived in step S72 obtains a higher reward.
In step S76, the learning unit 14A trains each output model 22B using the reward value derived in step S74 as the reward r in Q-learning. In step S78, the learning unit 14A determines whether the processing of steps S62 to S76 has been repeated a predetermined number of times (for example, 10,000 times). If the determination is negative, the process returns to step S62; if it is affirmative, the process proceeds to step S80. In step S80, as described above, the learning unit 14A stores in the storage unit 42, as the output model 22C, the one output model 22B with the best evaluation indicated by the evaluation values derived in the last execution of step S72. When step S80 ends, the output model learning process ends.
In step S90 of FIG. 19, the accepting unit 30 accepts a plurality of combinations, input by the user via the input unit 44, of experimental conditions for generating a material and performance values of the materials obtained as experimental results. In step S92, the output unit 32A reads the output model 22C from the storage unit 42. In step S94, the generation unit 16 generates a plurality of different experimental condition candidates, as described above.
In step S96, the output unit 32A inputs to the output model 22C, individually for every generated candidate, the plurality of combinations of experimental conditions and performance values accepted in step S90 together with one of the plurality of experimental condition candidates generated in step S94. The output unit 32A acquires the Q value output from the output model 22C for each of these inputs.
In step S98, the output unit 32A outputs, to the display unit 43, the experimental condition candidate corresponding to the largest of the Q values acquired in step S96 as the candidate for the experimental condition to be tried next. When step S98 ends, the experimental condition output process ends.
As described above, according to the present embodiment, the experimental condition candidate that maximizes the Q value obtained from the output model 22B, which takes as input a plurality of combinations of experimental conditions for generating a material and performance values of experimental results together with an experimental condition candidate and which outputs a Q value, is input to the experimental model 24. The evaluation value of the output model 22B is derived using the performance value of the experimental result obtained from this input, and the reward given to the output model 22B is derived according to the derived evaluation value. The output model 22B is then trained by Q-learning using the derived reward. Therefore, by using the output model 22B trained in this way, an appropriate experimental condition for the material can be searched for.
In each of the above embodiments, the case of deriving experimental conditions for generating a material has been described, but the present disclosure is not limited to this. For example, experimental conditions for generating a drug may be derived.
In each of the above embodiments, a trained model obtained by machine learning is applied as the experimental model 24, but any model capable of performing a virtual experiment may be used. For example, an arbitrary function that takes one experimental condition as input and outputs the performance value of the experimental result corresponding to that experimental condition may be applied as the experimental model 24. Even when such a model is applied, the output models 22 and 22B are optimized through learning. The experimental model 24 may also be, for example, a simulator that simulates an experiment.
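As one concrete (and entirely fictitious) example of such a function, the stand-in below treats the condition as mixing ratios and scores them against an assumed ideal composition.

```python
import numpy as np

def function_experimental_model(condition, ideal=np.array([0.5, 0.3, 0.2])):
    # performance peaks at 1.0 when the condition matches the assumed ideal composition
    return float(1.0 - np.abs(np.asarray(condition) - ideal).sum() / 2.0)

print(function_experimental_model([0.4, 0.4, 0.2]))   # approximately 0.9
```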
In the operation phase of the second embodiment, the output unit 32A may output, as the candidate for the next experiment, the experimental condition candidate that maximizes the cumulative Q value obtained by sequentially inputting a plurality of experimental condition candidates to the output model 22C a plurality of times. In this case, the output unit 32A first obtains, as in the second embodiment, the Q value corresponding to each of the first-round experimental condition candidates from the output model 22C. Next, the output unit 32A adds, for example, the combination of an experimental condition candidate input to the output model 22C in the first round and its performance value to the plurality of combinations of experimental conditions and performance values input to the output model 22C in the first round. This performance value may be estimated by a known method such as an SVM (Support Vector Machine). Then, as in the first round, the output unit 32A inputs the plurality of combinations obtained by this addition together with each of the second-round experimental condition candidates to the output model 22C, thereby obtaining from the output model 22C the Q value corresponding to each of the second-round candidates. In this case, the output unit 32A outputs, as the candidate for the next experiment, the experimental condition candidate for which the cumulative value of the first-round and second-round Q values is the largest. Although the case of using the cumulative value of two Q values has been described here, a cumulative value of three or more Q values can be used in the same manner.
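A sketch of this two-step variant is given below, using scikit-learn's SVR as one example of estimating the intermediate performance value and again assuming a fixed two-pair history window; all function names are illustrative.

```python
import numpy as np
from sklearn.svm import SVR

def two_step_cumulative_q(q_fn, history, step1_candidates, step2_candidates):
    # regressor that estimates a performance value from an experimental condition
    X = np.array([c for c, _ in history])
    y = np.array([p for _, p in history])
    estimator = SVR().fit(X, y)

    state1 = np.concatenate([np.append(c, p) for c, p in history[-2:]])
    best_total, best_first = -np.inf, None
    for a1 in step1_candidates:
        q1 = q_fn(state1, a1)                              # first-round Q value
        p1 = float(estimator.predict([a1])[0])             # estimated performance of the first candidate
        extended = history[-1:] + [(a1, p1)]               # history grown by the estimated pair
        state2 = np.concatenate([np.append(c, p) for c, p in extended])
        q2 = max(q_fn(state2, a2) for a2 in step2_candidates)  # best second-round Q value
        if q1 + q2 > best_total:
            best_total, best_first = q1 + q2, a1
    return best_first, best_total       # first-round candidate with the largest cumulative Q value
```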
The various kinds of processing executed in each of the above embodiments by the CPU running software (programs) may be executed by various processors other than a CPU. Examples of such processors include a PLD (Programmable Logic Device) whose circuit configuration can be changed after manufacture, such as an FPGA (Field-Programmable Gate Array), and a dedicated electric circuit, which is a processor having a circuit configuration designed exclusively for executing specific processing, such as an ASIC (Application Specific Integrated Circuit). The various kinds of processing may be executed by one of these processors or by a combination of two or more processors of the same or different types (for example, a plurality of FPGAs, or a combination of a CPU and an FPGA). More specifically, the hardware structure of these processors is an electric circuit in which circuit elements such as semiconductor elements are combined.
In each of the above embodiments, the learning program 50 is stored (installed) in the storage unit 42 in advance, but the present disclosure is not limited to this. The learning program 50 may be provided in a form recorded on a non-transitory recording medium such as a CD-ROM (Compact Disc Read Only Memory), a DVD-ROM (Digital Versatile Disc Read Only Memory), or a USB (Universal Serial Bus) memory. The learning program 50 may also be downloaded from an external device via a network.
This application claims priority to Japanese Patent Application No. 2018-076001 filed on April 11, 2018, the entire contents of which are incorporated herein by reference.

Claims (16)

1.  A learning device comprising:
     a derivation unit that derives an evaluation value of an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and that outputs an experimental condition, the evaluation value being derived using a performance value of an experimental result obtained by inputting the experimental condition, which is output by inputting the plurality of combinations to the output model, into an experimental model that performs a virtual experiment; and
     a learning unit that trains the output model by machine learning that reflects the evaluation value derived by the derivation unit.
2.  The learning device according to claim 1, wherein the evaluation value is a better value as the ratio of values satisfying a target performance among the plurality of performance values is higher, a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller, or a better value as the performance value is closer to the target performance.
3.  The learning device according to claim 1 or claim 2, wherein the derivation unit corrects the evaluation value to a lower value when an experimental condition that does not satisfy a predetermined rule is output from the output model.
4.  The learning device according to any one of claims 1 to 3, wherein the derivation unit corrects the experimental condition output from the output model to an experimental condition usable in an actual experiment.
5.  The learning device according to any one of claims 1 to 4, wherein the output model is a model trained using a genetic algorithm.
6.  A learning device comprising:
     a learning unit that trains an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and that outputs an action value in reinforcement learning, by using, as a reward, a value derived on the basis of a performance value of an experimental result obtained by inputting, into an experimental model that performs a virtual experiment, an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among a plurality of action values output by inputting the plurality of combinations together with each of a plurality of different experimental condition candidates to the output model.
7.  The learning device according to claim 6, wherein the reward is a better value as the ratio of values satisfying a target performance among the plurality of performance values is higher, a better value as the number of virtual experiments performed until a performance value satisfying the target performance is obtained is smaller, or a better value as the performance value is closer to the target performance.
8.  The learning device according to claim 6 or claim 7, wherein the reinforcement learning is Q-learning and the action value is a Q value.
9.  The learning device according to any one of claims 6 to 8, further comprising an output unit that, when the output model trained by the learning unit is used, outputs, as a candidate for an experimental condition to be tried next, the experimental condition candidate for which the cumulative action value output by sequentially inputting a plurality of experimental condition candidates to the output model a plurality of times is the largest.
10.  The learning device according to any one of claims 1 to 9, wherein the experimental model is a model obtained by machine learning.
11.  The learning device according to any one of claims 1 to 10, wherein a plurality of the experimental models exist and the plurality of experimental models are created under mutually different model creation conditions.
12.  The learning device according to any one of claims 1 to 11, wherein a plurality of the output models exist and the plurality of output models are created under mutually different model creation conditions.
13.  A learning method in which a computer executes processing comprising: deriving an evaluation value of an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and that outputs an experimental condition, by using a performance value of an experimental result obtained by inputting the experimental condition, which is output by inputting the plurality of combinations to the output model, into an experimental model that performs a virtual experiment; and training the output model by machine learning that reflects the derived evaluation value.
14.  A learning program for causing a computer to execute processing comprising: deriving an evaluation value of an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results and that outputs an experimental condition, by using a performance value of an experimental result obtained by inputting the experimental condition, which is output by inputting the plurality of combinations to the output model, into an experimental model that performs a virtual experiment; and training the output model by machine learning that reflects the derived evaluation value.
15.  A learning method in which a computer executes processing comprising: training an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and that outputs an action value in reinforcement learning, by using, as a reward, a value derived on the basis of a performance value of an experimental result obtained by inputting, into an experimental model that performs a virtual experiment, an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among a plurality of action values output by inputting the plurality of combinations together with each of a plurality of different experimental condition candidates to the output model.
16.  A learning program for causing a computer to execute processing comprising: training an output model that takes as input a plurality of combinations of experimental conditions for generating a material or a drug and performance values of experimental results together with an experimental condition candidate and that outputs an action value in reinforcement learning, by using, as a reward, a value derived on the basis of a performance value of an experimental result obtained by inputting, into an experimental model that performs a virtual experiment, an experimental condition candidate corresponding to an action value equal to or greater than a predetermined value among a plurality of action values output by inputting the plurality of combinations together with each of a plurality of different experimental condition candidates to the output model.
PCT/JP2019/010290 2018-04-11 2019-03-13 Learning device, learning method, and learning program WO2019198408A1 (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
JP2020513128A JP6804009B2 (en) 2018-04-11 2019-03-13 Learning devices, learning methods, and learning programs

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2018-076001 2018-04-11
JP2018076001 2018-04-11

Publications (1)

Publication Number Publication Date
WO2019198408A1 true WO2019198408A1 (en) 2019-10-17

Family

ID=68163548

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2019/010290 WO2019198408A1 (en) 2018-04-11 2019-03-13 Learning device, learning method, and learning program

Country Status (2)

Country Link
JP (1) JP6804009B2 (en)
WO (1) WO2019198408A1 (en)

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2003058579A (en) * 2001-08-21 2003-02-28 Bridgestone Corp Method for optimizing design/blending
JP2017107902A (en) * 2015-12-07 2017-06-15 ファナック株式会社 Machine learning device learning action of laminating core sheet, laminated core manufacturing device, laminated core manufacturing system, and machine learning method

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2022079829A1 (en) * 2020-10-14 2022-04-21 NEC Corporation Information processing device, information processing method, information processing system, and storage medium

Also Published As

Publication number Publication date
JPWO2019198408A1 (en) 2021-02-12
JP6804009B2 (en) 2020-12-23


Legal Events

Code Title Description
121 Ep: the epo has been informed by wipo that ep was designated in this application (Ref document number: 19785385; Country of ref document: EP; Kind code of ref document: A1)
ENP Entry into the national phase (Ref document number: 2020513128; Country of ref document: JP; Kind code of ref document: A)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: pct application non-entry in european phase (Ref document number: 19785385; Country of ref document: EP; Kind code of ref document: A1)