WO2021176753A1 - Data value definition method, data collection facilitation method, data value definition system, and data collection facilitation system - Google Patents


Info

Publication number: WO2021176753A1
Authority: WO, WIPO (PCT)
Prior art keywords: data, prediction model, value, contribution, prediction
Application number: PCT/JP2020/033223
Other languages: French (fr), Japanese (ja)
Inventors: 陽之 小田, 茂紀 松本
Original assignee: 株式会社日立製作所 (Hitachi, Ltd.)
Application filed by: 株式会社日立製作所 (Hitachi, Ltd.)

Classifications

    • G PHYSICS
      • G06 COMPUTING; CALCULATING OR COUNTING
        • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
          • G06N5/00 Computing arrangements using knowledge-based models
            • G06N5/04 Inference or reasoning models
          • G06N99/00 Subject matter not provided for in other groups of this subclass
        • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
          • G06Q10/00 Administration; Management
            • G06Q10/04 Forecasting or optimisation specially adapted for administrative or management purposes, e.g. linear programming or "cutting stock problem"
            • G06Q10/06 Resources, workflows, human or project management; Enterprise or organisation planning; Enterprise or organisation modelling
            • G06Q10/10 Office automation; Time management
          • G06Q30/00 Commerce
            • G06Q30/02 Marketing; Price estimation or determination; Fundraising
          • G06Q50/00 Information and communication technology [ICT] specially adapted for implementation of business processes of specific business sectors, e.g. utilities or tourism
            • G06Q50/04 Manufacturing
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
      • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
        • Y02P CLIMATE CHANGE MITIGATION TECHNOLOGIES IN THE PRODUCTION OR PROCESSING OF GOODS
          • Y02P90/00 Enabling technologies with a potential contribution to greenhouse gas [GHG] emissions mitigation
            • Y02P90/30 Computing systems specially adapted for manufacturing

Definitions

  • The present invention relates to a data value definition method, a data collection promotion method, a data value definition system, and a data collection promotion system that give a data provider an incentive to collect data by defining the value of the data.
  • In particular, the present invention relates to a data value definition method, a data collection promotion method, a data value definition system, and a data collection promotion system that are highly effective when applied to transactions of valuable data that is costly to acquire.
  • As prior art, the failure notification system described in Patent Document 1 is known.
  • In that technology, the provider of failure information is rewarded by determining the importance of the reported failure information from a preset failure-importance database and rewarding the provider accordingly.
  • Because it relies on such a preliminary evaluation, the method of Patent Document 1 cannot properly define the value of data.
  • In product development, design prototype conditions that realize the desired product functions and quality may be obtained from a prediction model created by a machine learning method.
  • Patent Document 1 also cannot define the value of data by taking into account how much the data used contributes to the design prototype conditions that the prediction model proposes and selects.
  • An object of the present invention is to provide a data value definition method, a data collection promotion method, a data value definition system, and a data collection promotion system that promote data accumulation and sharing by giving a data provider the right to receive a reward according to the degree to which the data it provides contributes to product development.
  • In the data value definition method according to the present invention, a plurality of data items related to an objective variable, namely the evaluation value of the product function to be optimized, are acquired; each data item has, as elements, feature amount data and objective variable data that is the evaluation value.
  • The method is characterized in that the value of each of the data items related to the product function to be optimized is defined based on its contribution, which indicates how much the prediction result changes when that data item is not included.
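One way to read the contribution described above is as a leave-one-out measure: retrain the prediction model without data item i and record how far the prediction at the proposed design condition moves. The Python sketch below assumes that reading and measures the change as an absolute difference. It uses k-nearest-neighbour regression only as a stand-in learner (the source names methods such as random forests and neural networks), and `fit_knn`, `loo_contributions`, and the toy records are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A record pairs a feature vector D4(vec)(i) with one objective variable D5(i, k).
Record = Tuple[List[float], float]
Model = Callable[[Sequence[float]], float]


def fit_knn(records: Sequence[Record], k: int = 2) -> Model:
    """Stand-in learner (assumption): k-nearest-neighbour regression."""
    data = list(records)

    def predict(x: Sequence[float]) -> float:
        # Rank training records by squared Euclidean distance to x, average the k nearest.
        nearest = sorted(data, key=lambda r: sum((a - b) ** 2 for a, b in zip(r[0], x)))[:k]
        return sum(y for _, y in nearest) / len(nearest)

    return predict


def loo_contributions(records: Sequence[Record], x_candidate: Sequence[float],
                      fit: Callable[[Sequence[Record]], Model] = fit_knn) -> List[float]:
    """Contribution of record i = |prediction with all data - prediction without record i|."""
    full_prediction = fit(records)(x_candidate)
    contributions = []
    for i in range(len(records)):
        # Retrain on every record except i, then predict at the same candidate condition.
        reduced = [r for j, r in enumerate(records) if j != i]
        contributions.append(abs(full_prediction - fit(reduced)(x_candidate)))
    return contributions


# Usage: four single-feature records and a proposed design condition x = 1.2.
records = [([0.0], 0.0), ([1.0], 1.0), ([2.0], 2.0), ([10.0], 10.0)]
print(loo_contributions(records, [1.2]))  # [0.0, 0.5, 1.0, 0.0]
```

In this toy case, records 0 and 3 are not among the candidate's nearest neighbours whether or not they are removed, so they receive zero contribution, while removing record 2 shifts the prediction most. Retraining once per data item is simple but becomes costly for large data sets.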
  • The present invention also provides a data collection promotion method using the data value definition method, in which data related to an optimization target is obtained from a data provider and consideration is paid for the data value determined from the data's contribution to the product.
  • This data collection promotion method is characterized in that presenting the reward evaluation method to the data provider in advance increases the provider's incentive to secure data.
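The source does not fix how a contribution is converted into consideration. As a hypothetical reward evaluation rule of the kind that could be presented to providers in advance, the sketch below splits a fixed budget in proportion to each data item's contribution and sums the shares per provider; `allocate_rewards`, the proportional rule, and the budget are assumptions.

```python
from collections import defaultdict
from typing import Dict, Sequence


def allocate_rewards(contributions: Sequence[float], providers: Sequence[str],
                     budget: float) -> Dict[str, float]:
    """Hypothetical rule: split a budget in proportion to contribution, summed per provider."""
    if len(contributions) != len(providers):
        raise ValueError("need exactly one provider per data item")
    total = sum(contributions)
    rewards: Dict[str, float] = defaultdict(float)
    for contribution, provider in zip(contributions, providers):
        # If no data item changed the prediction, nobody is paid from this budget.
        rewards[provider] += budget * contribution / total if total > 0 else 0.0
    return dict(rewards)


print(allocate_rewards([0.0, 0.5, 1.0, 0.0], ["A", "B", "B", "C"], budget=300.0))
# {'A': 0.0, 'B': 300.0, 'C': 0.0}
```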
  • The present invention also provides a data value definition system comprising: a first means for acquiring a plurality of data items from an optimization target; a second means for generating a data set including a plurality of sets of feature amount data and objective variable data; a third means for creating a prediction model of the optimization target from the data set; a fourth means for presenting a design prototype condition by searching over feature amount vectors; a fifth means for determining, when the objective variable corresponding to the design prototype is predicted, the degree of contribution of each data item to the design prototype condition from the change in the prediction result of the prediction model depending on whether that data item is included in the training data; and a sixth means for determining, from the contribution, the value of the data obtained through the design prototype.
  • The present invention also provides a data collection promotion system using the data value definition system, which obtains data related to the optimization target from a data provider and presents to the data provider the consideration for the determined value of the data, thereby increasing the provider's incentive to provide data.
  • A flow chart showing an example of the data value definition method according to the present invention.
  • An explanatory table of data according to the present invention.
  • An explanatory table of feature amount data according to the present invention.
  • An explanatory table of objective variable data according to the present invention.
  • A diagram illustrating the relationship between the acquired data D1 (h) and the raw data D2 (h).
  • A schematic diagram showing the relationship between a database and a data set used to create a prediction model.
  • A schematic diagram showing the prediction model use history data D6 (m, n) and the prediction model use history database according to the present invention.
  • The conversion definition table in FIG.
  • A schematic diagram showing the schematic structure of the data value definition system according to the present invention.
  • The data value definition method and the data collection promotion method according to the present invention are executed by software on a data value definition system and a data collection promotion system, each composed of a computer system. Accordingly, the data value definition method is described first, in the first embodiment, and the data value definition system composed of the computer system is then described in the second embodiment. In the third embodiment, the data collection promotion method and system acquire and use not only data from within the company but also data drawn broadly from external organizations.
  • In the following, the data value definition method according to the present invention is described by taking as an example its application to a product production site or a product development site (hereinafter, "production development site").
  • FIG. 1 is a flow chart showing an example of the data value definition method according to the present invention.
  • This embodiment is described as an example of the present invention by following the series of processing steps shown in FIG. 1, which starts at processing step S101 and ends at processing step S116.
  • The data value definition itself is processed in processing steps S102 to S115.
  • FIGS. 2 to 4 are explanatory tables of the data, the feature amount data, and the objective variable data, respectively; the examples below follow the data definitions given in these tables.
  • The index i identifies each record data D3 (i).
  • The index i used in D3 train (i), D3 validation (i), D3 test (i), and D3 not train (i) corresponds to the index of D3 (i), but these subsets do not necessarily contain an element for every i.
  • In processing step S102, the data acquirer acquires a plurality of data items related to the product function to be optimized at the production development site.
  • These data may be offline data accumulated in the past, or monitoring data transmitted continuously online from the production development site.
  • Possible data types include image data, output data of measuring equipment, signal data, input/output data of simulation analyses, audio data, and material property data.
  • The data may also combine several of these types, or be secondary data processed from them.
  • Some of the acquired data may include environmental data, such as temperature and humidity, that cannot be controlled at the time of product design.
  • Each data item may be in a format suited to it; the data need not share a unified format.
  • A format to which tagging information such as metadata can be added is preferable; a structured data format (XML or JSON) is more preferable; and a record-type data format that makes it easy to organize information into the tabular form suitable for machine learning is still more preferable. A representation that can easily be converted into such a format by information processing may also be used. The data may also include the date and time at which it was acquired at the production development site.
  • The acquired data is denoted acquired data D1 (h), where the index h identifies each data item acquired in processing step S102.
  • FIG. 5 is a diagram illustrating the relationship between the acquired data D1 (h) and the raw data D2 (h).
  • The raw data D2 (h) of FIG. 5 is composed of the following four elements or a combination of them.
  • The first element 502 of the raw data D2 (h) is the index h of the acquired data D1 (h), which uniquely identifies a given raw data item in the database.
  • The second element 503 of the raw data D2 (h) is the data addition time, which records when the data was added to the database.
  • The third element 504 of the raw data D2 (h) is the data provider information.
  • The data provider information identifies the person who acquired the data at the development site. It is not limited to this: any information that identifies the person who holds the right to receive compensation for adding the acquired data to the database may be used as data provider information.
  • The fourth element 201 of the raw data D2 (h) is the acquired data D1 (h) obtained in processing step S102.
  • In other words, the raw data D2 (h) is the acquired data D1 (h) with an index 502, a data addition time 503, and data provider information 504 newly added.
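The structure of the raw data D2 (h) can be sketched as a small record type that wraps the acquired data D1 (h) with elements 502 to 504. The `DataLake` class below is a hypothetical in-memory stand-in for the database 601, and the field names, provider labels, and sample contents are illustrative assumptions.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Any, List


@dataclass
class RawData:
    """Raw data D2(h): the acquired data D1(h) plus three bookkeeping elements."""
    index: int          # element 502: index h, unique within the database
    added_at: datetime  # element 503: time the data was added to the database
    provider: str       # element 504: whoever holds the right to compensation
    content: Any        # element 201: acquired data D1(h), in any format


@dataclass
class DataLake:
    """Hypothetical in-memory stand-in for the database (data lake) 601."""
    items: List[RawData] = field(default_factory=list)

    def add(self, acquired: Any, provider: str) -> RawData:
        # Wrap D1(h) into D2(h); h simply counts up from 1, as in D2(1), D2(2), ...
        raw = RawData(index=len(self.items) + 1,
                      added_at=datetime.now(timezone.utc),
                      provider=provider,
                      content=acquired)
        self.items.append(raw)
        return raw


lake = DataLake()
lake.add({"composition": {"Fe": 0.7, "Ni": 0.3}, "hardness": 210.0}, provider="lab-A")
lake.add("path/to/micrograph.png", provider="lab-B")
print([(r.index, r.provider) for r in lake.items])  # [(1, 'lab-A'), (2, 'lab-B')]
```

Because `content` is typed `Any`, the acquired data keeps whatever format it arrived in, matching the point that D1 (h) need not share a unified format.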
  • The acquired data D1 (h) and the raw data D2 (h) in processing steps S102 and S103 correspond to items 201 and 202, respectively, in the data explanation table of FIG. 2.
  • The raw data D2 (h) (202 in FIG. 2) obtained in processing step S103 is then added to the database.
  • This database is shown as 601 in FIG. 6.
  • The raw data D2 (h) is later converted into the format of the record data D3 (i) contained in the data set 602 of FIG. 6.
  • Because D3 (i) has a common record-type format, the acquired data content D1 (h) contained in the raw data D2 (h) does not need to have a fixed format.
  • The database 601 can also be referred to as the data lake 601.
  • The data lake 601 shown at the top of FIG. 6 may also use a unified data format.
  • D2 (1), D2 (2), ... in FIG. 6 denote individual raw data D2 (h); the data lake 601 holds the set of raw data D2 (h).
  • By applying the series of processes from processing step S102 to processing step S104 of FIG. 1 to each acquired data item D1 (h), each item is converted into raw data D2 (h) in turn, and the raw data D2 (h) is accumulated in the database.
  • Because the series of processes from processing step S102 to processing step S104 of FIG. 1 acquires and accumulates the raw data D2 (h), it can be regarded as implementing the data addition unit 10 of FIG. 12 as a function of the computer system.
  • After the raw data D2 (h) has been generated and accumulated in the database 601 for all acquired data D1 (h), processing steps S105 to S109 of FIG. 1 are performed using the database 601 to build a prediction model by machine learning.
  • The series of processes from processing step S105 to processing step S109 of FIG. 1 constructs the prediction model, and can be regarded as implementing the prediction model creation unit 30 of FIG. 12, described later, as a function of the computer system.
  • FIG. 6 is a schematic diagram showing the relationship between the data lake and the data set used to create the prediction model; it illustrates how that data set is created.
  • Processing step S105 of FIG. 1 corresponds to process S105 in FIG. 6.
  • In processing step S105, the data set 605 used for model creation (lower part of FIG. 6) is generated from the data contained in the data lake 601 illustrated in the upper part of FIG. 6.
  • The raw data D2 (h) (202 in FIG. 2) is converted into the record-type data shown in each row of the data set 605 (hereinafter, record data D3 (i); 203 in FIG. 2 and FIG. 6).
  • For example, the raw data D2 (1) in FIG. 6 can be converted into the record data D3 (1).
  • As indicated by 203 in FIG. 2, the raw data D2 (h) and the record data D3 (i) need not correspond one-to-one; a plurality of record data D3 (i) may be generated from one raw data D2 (h).
  • Each record data D3 (i) is thus generated by conversion from raw data D2 (h).
  • Each record data D3 (i) holds the index (603 in FIG. 6) of the raw data D2 (h) used to generate it; this index is not used to create the prediction model.
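The conversion from raw data D2 (h) to record data D3 (i), including the one-to-many case and the source index 603 that is kept but not used for modelling, might look like the following sketch. The raw-item layout (a `samples` list with `x` and `y` keys) and the names `RecordData`, `build_dataset`, and `samples_to_pairs` are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Iterable, List, Tuple

Pair = Tuple[List[float], List[float]]  # (D4(vec)(i), D5(vec)(i))


@dataclass
class RecordData:
    """Record data D3(i)."""
    source_index: int      # index 603 of the raw data D2(h); traceability only
    features: List[float]  # feature vector D4(vec)(i)
    targets: List[float]   # objective variable vector D5(vec)(i)


def build_dataset(raw_items: Iterable[Tuple[int, Dict]],
                  convert: Callable[[Dict], List[Pair]]) -> List[RecordData]:
    """Convert each raw item (h, content) into zero or more records D3(i)."""
    dataset: List[RecordData] = []
    for h, content in raw_items:
        for features, targets in convert(content):  # one D2(h) may yield several D3(i)
            dataset.append(RecordData(h, features, targets))
    return dataset


def samples_to_pairs(content: Dict) -> List[Pair]:
    """Hypothetical conversion: a raw item carries a list of measured samples."""
    return [(s["x"], s["y"]) for s in content.get("samples", [])]


raw = [(1, {"samples": [{"x": [0.7, 0.3], "y": [210.0]}, {"x": [0.5, 0.5], "y": [190.0]}]}),
       (2, {"samples": [{"x": [0.9, 0.1], "y": [230.0]}]})]
dataset = build_dataset(raw, samples_to_pairs)
X = [r.features for r in dataset]  # model inputs: source_index is deliberately left out
y = [r.targets for r in dataset]
```

Swapping `samples_to_pairs` for a different conversion is also where an alternative feature representation D4' (vec) (i) would be introduced.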
  • Each record data D3 (i) has a feature vector D4 (vec) (i) 301 and an objective variable vector D5 (vec) (i) 401.
  • The j-th component of the feature vector D4 (vec) (i) is the feature amount D4 (i, j) 302.
  • The k-th component of the objective variable vector D5 (vec) (i) 401 is the objective variable D5 (i, k) 402.
  • D5 (i, k) corresponds to 402 in FIG. 4.
  • The feature vector D4 (vec) (i) 301 and the objective variable vector D5 (vec) (i) 401 each have at least one component.
  • The record data D3 (i) 203 of the data set 605 is described in detail below, taking as an example the development of a new material at a production development site.
  • In this case, D4 (vec) (i) and D5 (i, k) form the record data D3 (i) of a material, as follows.
  • Let D3 (i) be the data combining the feature vector D4 (vec) (i), which represents the characteristics of one material, with the objective variable data D5 (i, k), which represents the functions of that material.
  • The feature vector D4 (vec) (i) of the material can be generated from information on the material composition.
  • Examples of feature amounts D4 (i, j) generated from composition information include the average of element-derived quantities, such as the atomic weight, atomic number, and electronegativity of the elements contained in the material, weighted by the composition ratio of each element in the material. Similarly, a weighted variance of such element-derived information can be considered, although the feature amounts are not limited to these.
  • Alternatively, a vector D4 (vec) (i) whose dimension equals the number of element species, with each vector element holding the composition ratio of the corresponding element, can be used.
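The two composition-based feature types described above, composition-weighted statistics of element-derived quantities and a vector with one dimension per element species, are sketched below. The property table uses standard values for Cr, Fe, and Ni (atomic number, standard atomic weight, Pauling electronegativity), but the choice of elements is illustrative, and the weighted variance is assumed to be the composition-weighted population variance about the weighted mean.

```python
from typing import Dict, List

# Illustrative element property table (atomic number, standard atomic weight,
# Pauling electronegativity); the choice of elements and properties is an assumption.
ELEMENT_PROPS: Dict[str, Dict[str, float]] = {
    "Cr": {"Z": 24, "weight": 51.996, "chi": 1.66},
    "Fe": {"Z": 26, "weight": 55.845, "chi": 1.83},
    "Ni": {"Z": 28, "weight": 58.693, "chi": 1.91},
}


def weighted_mean_var(composition: Dict[str, float], prop: str) -> List[float]:
    """Composition-weighted mean and variance of one element-derived property."""
    total = sum(composition.values())
    fractions = {el: c / total for el, c in composition.items()}  # normalise to sum to 1
    mean = sum(f * ELEMENT_PROPS[el][prop] for el, f in fractions.items())
    var = sum(f * (ELEMENT_PROPS[el][prop] - mean) ** 2 for el, f in fractions.items())
    return [mean, var]


def composition_vector(composition: Dict[str, float], elements: List[str]) -> List[float]:
    """One dimension per element species, holding that element's composition ratio."""
    total = sum(composition.values())
    return [composition.get(el, 0.0) / total for el in elements]


alloy = {"Fe": 1.0, "Ni": 1.0}
d4 = weighted_mean_var(alloy, "Z") + weighted_mean_var(alloy, "chi")
print(weighted_mean_var(alloy, "Z"))                   # [27.0, 1.0]
print(composition_vector(alloy, ["Cr", "Fe", "Ni"]))  # [0.0, 0.5, 0.5]
```

Concatenating the statistics for several properties, as in `d4`, gives one possible feature vector D4 (vec) (i).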
  • A feature vector D4 (vec) (i) generated from nanometer-order information on a single-phase material, such as its crystal structure and molecular structure, is also conceivable.
  • Feature amounts representing the crystal structure and atomic arrangement include those in References 1 to 5 below, but these are merely examples and the feature amounts are not limited to them.
  • Reference 1: P. J. Steinhardt, D. R. Nelson, and M. Ronchetti, Phys. Rev. B 28, 784 (1983).
  • Reference 2: M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett.
  • A feature vector D4 (vec) (i) generated from the micrometer-order structure of the material can also be considered.
  • For example, a feature vector D4 (vec) (i) containing the average particle size, the variance of the particle size distribution, the phase fraction, the dislocation density, and the variance of the concentration distribution within the contained phases can be used; the feature vector is not particularly restricted.
  • A feature vector D4 (vec) (i) generated from the process conditions of the material can also be used, containing, for example, the isothermal holding temperature, the isothermal holding time, the cooling rate, and environmental conditions such as humidity, temperature, and pressure.
  • Secondary feature amounts obtained by reducing the dimension of such feature vectors, for example by principal component analysis or an autoencoder, may also be used.
  • The properties set as the objective variable D5 (i, k) are measured or expected values indicating the physical properties and functions of the material, such as hardness, thermal conductivity, tensile strength, yield stress, elongation, reflectance, and solubility. These values are not limited to actual measurements and may be calculated values output by a computer simulation used at the production development site. For example, values derived by materials simulation, such as lattice thermal conductivity, band gap, solubility parameter, the probability that the material exists stably, formation energy, energy relative to other phases, and diffusion coefficient, can be used as objective variables.
  • For process design, the feature vector D4 (vec) (i) consists of, for example, condition values and measured values that determine the process conditions, such as compression pressure, heat input, the mixing ratio of additive elements, outside air temperature, and outside air humidity. These values, obtained for each process condition, and vectors having them as elements are referred to as process quantities.
  • The process quantities may include both controllable factors, such as the mixing ratio of additives, and uncontrollable environmental factors, such as outside air temperature and outside air humidity.
  • In the record data D3 (i), each process quantity of one process can be defined as a feature amount D4 (i, j), and the feature vector D4 (vec) (i) can be defined with these as its elements.
  • Likewise, an objective variable vector D5 (vec) (i) can be defined for the same process, with each function evaluation value expressed as a component D5 (i, k) of that vector.
  • Because the record data D3 (i) contains the feature vector D4 (vec) (i), whose components are process quantities, and the objective variable vector D5 (vec) (i), which indicates the evaluation values of the product functions output by the same process, D3 (i) can be said to hold information on the input/output relationship of the process.
  • Whatever the target, each record data D3 (i) likewise has a feature vector D4 (vec) (i) and an objective variable vector D5 (vec) (i).
  • The data set 605, composed of a plurality of record data D3 (i), is used to generate a prediction model in the next stage (S106 to S107 in FIG. 1).
  • The feature vector D4 (vec) (i) 301 has one or more components, and its j-th component is denoted as the feature amount D4 (i, j) 302.
  • The components D4 (i, j) of the feature vector D4 (vec) (i) of each record data D3 (i) may contain missing values. If missing values are allowed, they can be imputed before the prediction model is created, for example by multiple imputation, although the imputation method is not limited to this.
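The source names multiple imputation as one option for filling missing feature values. The sketch below uses simpler single column-mean imputation as a stand-in; it is not multiple imputation, which generates several plausible completed data sets and pools the results. `mean_impute` is a hypothetical helper, and `None` is assumed to mark a missing value.

```python
import math
from typing import List, Optional


def mean_impute(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each None with the mean of the observed values in the same column."""
    n_cols = len(rows[0])
    column_means = []
    for j in range(n_cols):
        observed = [row[j] for row in rows if row[j] is not None]
        # A column with no observations cannot be filled this way; mark it NaN.
        column_means.append(sum(observed) / len(observed) if observed else math.nan)
    return [[column_means[j] if v is None else v for j, v in enumerate(row)] for row in rows]


features = [[1.0, None], [3.0, 4.0], [None, 8.0]]
print(mean_impute(features))  # [[1.0, 6.0], [3.0, 4.0], [2.0, 8.0]]
```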
  • The objective variable vector D5 (vec) (i) has, as each element, an objective variable that is the evaluation value of one function.
  • One prediction model takes D4 (vec) (candidate) (p) as input and outputs the k-th objective variable D5 (predict) (p, k) of FIG. 4.
  • To create it, pairs of the feature vector D4 (vec) (i) (input) and the objective variable D5 (i, k) (output) of the record data D3 (i) used as the data set are used.
  • A prediction model is created using the data for all i as the data set.
  • When creating prediction models M (k-1), M (k), M (k+1), ... that all take the same feature vector D4 (vec) (i) as input, the prediction models can also be created by optimizing them simultaneously. The objective variable vector D5 (vec) (i) of the record data D3 (i) in the data set can therefore have one or more objective variable elements.
  • The optimized prediction model M (k) is a function that takes D4 (vec) (309 in FIG. 3) as input and predicts the objective variable D5 k (414 in FIG. 4); one prediction model is determined for each index k of the objective variable D5 k.
  • A neural network or a random forest, for example, can be used, but the optimization method is not limited to these.
  • The objective variable data D5 (i, k) may also contain missing values. These may be imputed based on the distribution of the feature vectors D4 (vec) (i) in the data set and the objective variables D5 (i, k) that are not missing, or the missing values of D5 (i, k) may be imputed step by step with a semi-supervised learning method until the prediction model finally adopted is created.
  • The procedure for dividing the record data D3 (i) in the data set into training data D3 train (i) (204 in FIG. 2), validation data D3 validation (i) (205 in FIG. 2), and test data D3 test (i) (206 in FIG. 2) is described next.
  • The data set composed of record data D3 (i) is divided into the training data D3 train (i), the validation data D3 validation (i), and the test data D3 test (i). Preferably, it is divided into three sets: a training data set composed of D3 train (i), a validation data set composed of D3 validation (i), and a test data set composed of D3 test (i).
  • The training data set is used for training to create the prediction model.
  • The validation data set is used to select hyperparameters so as to prevent overfitting.
  • The final prediction accuracy is evaluated with the test data set.
  • It is better to select a prediction model with high generalization performance by evaluating its prediction error or the distribution of its prediction error.
  • The validation data set and the test data set may be the same.
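A three-way split into training, validation, and test sets can be sketched with the standard library alone. The 60/20/20 fractions, the fixed seed, and the name `split_dataset` are assumptions; the source does not give split ratios.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(data: Sequence[T], fractions: Tuple[float, float, float] = (0.6, 0.2, 0.2),
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle once, then cut into training, validation and test subsets."""
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    items = list(data)
    random.Random(seed).shuffle(items)  # a fixed seed keeps the split reproducible
    n_train = int(len(items) * fractions[0])
    n_val = int(len(items) * fractions[1])
    train = items[:n_train]
    validation = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]  # the remainder, so no record is dropped
    return train, validation, test


train, validation, test = split_dataset(range(10))
print(len(train), len(validation), len(test))  # 6 2 2
```

Hyperparameters would be tuned on `validation` and the final accuracy reported on `test`. If the validation and test sets are allowed to coincide, one could pass `fractions=(0.8, 0.2, 0.0)` and reuse the validation set for the final evaluation.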
  • Methods for creating the prediction model in processing step S107 include generalized linear regression, logistic regression, support vector machines, random forests, and neural networks, but are not limited to these. The prediction model is also not limited to a regression model; for the same purpose, a prediction model that predicts a classification result may be created.
  • In processing step S108, the prediction performance of the prediction model created in processing step S107 is evaluated.
  • For a regression problem, the evaluation index may be a prediction error such as RMSE (root mean squared error) or MAE (mean absolute error), or the coefficient of determination, but is not limited to these. Neither the type of accuracy index nor the threshold that defines its acceptable range matters, as long as the user of the prediction model can judge whether the prediction accuracy is sufficient and decide whether to use the model.
  • For a classification problem, the values of the elements of the confusion matrix showing the classification result, and the accuracy, precision, and recall calculated from them, can be considered.
  • Here too, the type of evaluation index for the prediction accuracy does not matter.
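The evaluation indices named above follow standard definitions, sketched below: RMSE and MAE from the prediction errors, the coefficient of determination as one minus the ratio of the residual to the total sum of squares, and accuracy, precision, and recall from the cells of a binary confusion matrix. The acceptance threshold used for the S109 decision is a hypothetical user choice, since the source leaves both the index and the threshold open.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE and coefficient of determination R^2."""
    n = len(y_true)
    errors = [p - t for t, p in zip(y_true, y_pred)]
    mean_true = sum(y_true) / n
    ss_res = sum(e * e for e in errors)
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return {
        "RMSE": math.sqrt(ss_res / n),
        "MAE": sum(abs(e) for e in errors) / n,
        "R2": 1.0 - ss_res / ss_tot if ss_tot > 0 else math.nan,  # undefined for constant y
    }


def classification_metrics(tp: int, fp: int, fn: int, tn: int) -> Dict[str, float]:
    """Accuracy, precision and recall from the cells of a binary confusion matrix."""
    return {
        "accuracy": (tp + tn) / (tp + fp + fn + tn),
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
    }


m = regression_metrics([1.0, 2.0, 3.0, 4.0], [1.0, 2.0, 3.0, 6.0])
print(m["RMSE"], m["MAE"])  # 1.0 0.5
usable = m["RMSE"] <= 1.5   # hypothetical user-chosen acceptance threshold (step S109)
```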
  • In processing step S109, based on the prediction model evaluation result of processing step S108, it is determined from the evaluated prediction performance whether the prediction model may be used for product development. If it may not, the operations from data acquisition in processing step S102 through processing step S108 are improved, a new prediction model is created, and the model is evaluated again; the model is improved in this way until it is good enough for use in product development.
  • For example, the data set D3 (i) can be changed by repeating processing step S102 and adding new raw data D2 (h) to the database. Another conceivable improvement is to change the feature conversion used when converting raw data D2 (h) into data D3 (i), creating a feature vector D4' (vec) (i) in a format different from the previously used feature vectors D4 (vec) (i).
  • When the definition of the prediction model to be optimized must be changed, the optimization method and the functional form to be optimized are changed in processing step S106.
  • The prediction model is used, for example, in the prototype design stage of product development and for optimizing process conditions at the production site.
  • Processing steps S110 and S111 are where the constructed prediction model is actually used, and can be regarded as implementing the prediction model use unit 40 of FIG. 12, described later, as a function of the computer system.
  • Here, the optimum design prototype conditions are predicted with the prediction model and proposed.
  • The objective variable D5 (predict) (p, k) (408 in FIG. 4) output by the prediction model is the predicted value of a function that the product is required to satisfy.
  • The feature vector D4 (vec) (candidate) (p) corresponds to a design prototype condition of the product and can be proposed as numerical values of the elements of the feature vector.
  • The feature vector D4 (vec) (candidate) (p) proposed from the prediction model either corresponds one-to-one to the design prototype condition itself or is obtained by converting the design prototype condition.
  • The prediction model M (m) takes the feature vector D4 (vec) (candidate) (p) as input and predicts the k-th functional value of the product as the objective variable D5 (predict) (p, k).
  • By searching for the D4 (vec) (candidate) (p) at which the output objective variable vector D5 (vec) (predict) (p) satisfies the requirements on several functional values, suitable design inputs can be found.
  • Because the output D5 (predict) (p, k) of the prediction model can be estimated for a plurality of candidate conditions D4 (vec) (candidate) (p), searching for input feature vectors D4 (vec) (candidate) (p) whose predicted output satisfies the required function makes the search for design prototype conditions in product development more efficient. The design prototype conditions to be carried out are then proposed from the obtained input feature vector D4 (vec) (candidate) (p).
  • Specific methods of searching for D4 (vec) (candidate) (p) are described below.
  • One way to search for a feature vector D4 (vec) (candidate) (p) that yields a desired objective variable is to feed a plurality of input feature vectors D4 (vec) to the prediction model under a genetic algorithm or a gradient method, and to search, sequentially and comprehensively, for the input whose objective function (the predicted objective variable or objective variable vector) is closest to the desired value.
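The genetic-algorithm variant of this search can be sketched as follows. This is a minimal illustration, not the method of the document: the truncation selection, uniform crossover and Gaussian mutation operators, the function names `genetic_search` and `toy_model`, the target value, and the search bounds are all assumptions, and `toy_model` merely stands in for a trained prediction model M (m).

```python
import random
from typing import Callable, List, Sequence, Tuple

Vector = List[float]


def genetic_search(
    model: Callable[[Sequence[float]], float],
    target: float,
    bounds: Sequence[Tuple[float, float]],
    pop_size: int = 40,
    generations: int = 200,
    mutation_scale: float = 0.05,
    seed: int = 0,
) -> Vector:
    """Search for an input feature vector whose predicted objective is closest to target."""
    rng = random.Random(seed)

    def clip(x: Vector) -> Vector:
        # Keep every component inside its (low, high) search range.
        return [min(max(v, lo), hi) for v, (lo, hi) in zip(x, bounds)]

    def gap(x: Vector) -> float:
        # Fitness: distance between the model output and the desired value.
        return abs(model(x) - target)

    population = [[rng.uniform(lo, hi) for lo, hi in bounds] for _ in range(pop_size)]
    for _ in range(generations):
        population.sort(key=gap)
        parents = population[: pop_size // 2]  # keep the better half (elitist truncation)
        children = []
        while len(parents) + len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover, then Gaussian mutation scaled to each search range.
            child = [
                rng.choice((va, vb)) + rng.gauss(0.0, mutation_scale * (hi - lo))
                for va, vb, (lo, hi) in zip(a, b, bounds)
            ]
            children.append(clip(child))
        population = parents + children
    return min(population, key=gap)


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in for a trained prediction model M(m).
    return x[0] ** 2 + 2.0 * x[1]


best = genetic_search(toy_model, target=3.0, bounds=[(-2.0, 2.0), (-2.0, 2.0)])
print(best, toy_model(best))
```

A gradient method would instead follow the derivative of the model output, which requires a differentiable M (m); the genetic algorithm above only needs to call the model.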
  • The design prototype conditions can be determined uniquely from the obtained feature vectors D4 (vec) (candidate) (p). Alternatively, the conditions of a plurality of design prototype candidates can be converted into feature vectors,
  • and a promising design prototype condition can be estimated by selecting, among them, the feature vector closest in feature space to the D4 (vec) (candidate) (p) obtained from the prediction model.
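Selecting the candidate closest to the model's proposal can be sketched as a Euclidean nearest-neighbor lookup. The function name, recipe names, and vectors below are hypothetical, and the features are assumed to be on comparable scales (for example, standardized); `math.dist` requires Python 3.8 or later.

```python
import math
from typing import Dict, Sequence


def closest_condition(
    candidate: Sequence[float],
    conditions: Dict[str, Sequence[float]],
) -> str:
    """Return the name of the prototype condition whose feature vector is
    closest (Euclidean distance) to the model-proposed candidate vector."""
    return min(conditions, key=lambda name: math.dist(candidate, conditions[name]))


# Hypothetical feature vectors of three feasible prototype recipes.
conditions = {
    "recipe_A": [0.2, 1.0],
    "recipe_B": [0.9, 1.1],
    "recipe_C": [1.5, -0.3],
}
print(closest_condition([1.0, 1.0], conditions))  # recipe_B
```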
  • Alternatively, D4 (vec) (candidate) (p) can be found simply by a grid search over the feature values D4 j, the elements of the feature vector D4 (vec) (310 in FIG. 3): the grid is searched for a desired D5 (vec) (predict) (p), and the corresponding feature vector D4 (vec) (candidate) (p) is output.
  • Another method is to convert all candidate design prototype conditions into feature vectors, input them to the prediction model, evaluate the objective variables D5 (i, k) of the output objective variable vectors D5 (vec), and select the best feature vector D4 (vec) (candidate) (p) from among the inputs. A promising design prototype condition can then be identified from the selected D4 (vec) (candidate) (p).
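Both the grid search and the screening of converted candidate conditions reduce to evaluating the model on a finite set of feature vectors and keeping the best one. In the minimal sketch below, `select_best` and `toy_model` are hypothetical, "best" is assumed to mean "prediction closest to a desired value", and the grid spacing is arbitrary.

```python
import itertools
from typing import Callable, Iterable, Sequence, Tuple

Point = Tuple[float, ...]


def select_best(
    model: Callable[[Sequence[float]], float],
    candidates: Iterable[Point],
    target: float,
) -> Point:
    """Evaluate every candidate feature vector and return the one whose
    prediction is closest to the desired objective value."""
    return min(candidates, key=lambda x: abs(model(x) - target))


def toy_model(x: Sequence[float]) -> float:
    # Hypothetical stand-in for M(m); its maximum, 5.0, is at x = (1.0, -0.5).
    return 5.0 - (x[0] - 1.0) ** 2 - (x[1] + 0.5) ** 2


# Grid search: every combination of feature values on a regular grid.
axis = [i / 4 for i in range(-8, 9)]  # -2.0, -1.75, ..., 2.0
print(select_best(toy_model, itertools.product(axis, axis), target=5.0))  # (1.0, -0.5)

# Candidate screening: feature vectors converted from existing prototype conditions.
candidates = [(0.0, 0.0), (1.2, -0.4), (2.0, 1.0)]
print(select_best(toy_model, candidates, target=5.0))  # (1.2, -0.4)
```

The number of grid points is the number of values per axis raised to the number of features, so a plain grid search is practical only for a few features.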
  • The methods of proposing design prototype conditions described above are examples. Any method may be used as long as design prototype conditions can be estimated from the function evaluation result (the predicted objective variable or objective variable vector) output by the prediction model, for example by proposing the corresponding input feature vector D4 (vec) (candidate) (p) of the prediction model, or a range of input feature vectors corresponding to D4 (vec) (candidate) (p).
  • In this way, the feature vector D4 (vec) (candidate) (p) corresponding to a design prototype condition is proposed using the prediction model M (m).
  • In processing step S111, the feature vector D4 (vec) (candidate) (p) proposed at this time is converted, as appropriate, into the n-th usage history data D6 (m, n) of the m-th prediction model M (m), which corresponds to 208 in FIG. 2.
  • This usage history data D6 (m, n) is added to and stored in the prediction model usage history database 709 as appropriate.
  • n is an index assigned separately within each prediction model M (m): it indicates that the record is the n-th entry in the usage history data of M (m), so that usage history data D6 (m, n) for the same prediction model M (m) are distinguished by n.
  • The usage history data D6 (m, n) consists of all, or a combination, of the following four elements.
  • The first element 702 is the index m identifying the prediction model M (m). Because this index identifies the prediction model defined as the optimization target in processing step S106 of FIG. 1, it makes processing step S112, described later, possible.
  • The second element 703 is the record data D3 (i) used to create the prediction model M (m), or information through which D3 (i) can be referenced, for example an index identifying the record data in the data set. It also holds information identifying how each record was used when the prediction model was created: as training data D3 train (i), test data D3 test (i), or validation data D3 validation (i). Like the first element, this information is used in processing step S112 to analyze the contribution of each data item to the prediction model usage history.
  • Because the record data D3 (i) holds the index h of the raw data D2 (h) used to generate it (603 in FIG. 6), the time at which the corresponding raw data D2 (h) was added to the database (503 in FIG. 5) can be looked up. This addition time is used in processing step S113.
  • The third element 704 is the prediction model M (m) as a function (in the programming sense) that takes as input the feature vector D4 (vec) used in proposing the design prototype condition and outputs the objective variable D5 k.
  • The prediction model M (m) may be an object representing a program function, or an API of an external program, as long as it returns the result of applying some processing to its input.
  • The fourth element 705 is the "usage history of the prediction model M (m)", namely the set P (m, n) of feature vectors D4 (vec) (candidate) (p) corresponding to the design prototype conditions proposed by M (m).
  • Here m is the index of the prediction model M (m),
  • and n corresponds to the index n of D6 (m, n), indicating that the set belongs to the n-th usage history data. P (m, n) is therefore the set of feature vectors D4 (vec) (candidate) (p) proposed when the n-th prototype condition proposal is made using M (m).
  • The aggregated usage history data D6 (m) (708) of prediction model M (m), held in the prediction model usage history database 709, is obtained by integrating the n-th usage history data D6 (m, n) of M (m) over all indices n, separately for each index m.
  • the usage history database 709 stores all usage history data for each m.
  • The aggregated usage history data D6 (m) has the same data format as the usage history data D6 (m, n) shown at 208 and includes the four elements 702, 703, 704, and 705.
  • In D6 (m) (708), the usage history of M (m) corresponding to element 705 is a set of feature vectors D4 (vec) (candidate) (p), denoted P (m).
  • P (m) is the union, over all n, of the sets P (m, n) held as element 705 by the usage history data D6 (m, n) of M (m) added to D6 (m) (708).
  • No feature vector D4 (vec) (candidate) (p) appears more than once in this set.
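One possible in-memory layout for D6 (m, n) and the aggregation into P (m) is sketched below. The field names, the use of Python tuples for feature vectors, and the dictionary that records each record's role are assumptions for illustration; the document does not prescribe a data format beyond the four elements 702 to 705.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, List, Sequence, Tuple

FeatureVector = Tuple[float, ...]


@dataclass
class UsageHistory:
    """Hypothetical in-memory layout of one usage history record D6(m, n)."""
    m: int                                     # element 702: prediction model index
    n: int                                     # position in M(m)'s usage history
    record_roles: Dict[int, str]               # element 703: record i -> "train" / "test" / "validation"
    model: Callable[[Sequence[float]], float]  # element 704: M(m) as a callable
    proposals: FrozenSet[FeatureVector]        # element 705: the set P(m, n)


def aggregate_proposals(histories: List[UsageHistory], m: int) -> FrozenSet[FeatureVector]:
    """P(m): union of P(m, n) over all n for model m; a set holds each vector only once."""
    merged = set()
    for record in histories:
        if record.m == m:
            merged |= record.proposals
    return frozenset(merged)


def model_1(x: Sequence[float]) -> float:
    return sum(x)  # hypothetical stand-in for M(1)


database = [
    UsageHistory(1, 1, {0: "train", 1: "test"}, model_1, frozenset({(0.1, 0.2), (0.3, 0.4)})),
    UsageHistory(1, 2, {0: "train", 1: "test"}, model_1, frozenset({(0.3, 0.4), (0.5, 0.6)})),
    UsageHistory(2, 1, {2: "train"}, model_1, frozenset({(9.0, 9.0)})),
]
print(sorted(aggregate_proposals(database, 1)))  # [(0.1, 0.2), (0.3, 0.4), (0.5, 0.6)]
```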
  • The vectors D4 (vec) (candidate) (p) in this set P (m) are used for data value definition in processing steps S112 to S115.
  • The prediction model usage history database 709 is created by processing steps S110 and S111 described above.
  • The value of each raw data item D2 (h) is then defined using the prediction model usage history database 709 (FIG. 7). Processing steps S112 to S115 define the data value of each raw data item D2 (h) from this database; in the computer system, they can be regarded as implemented by the data value definition unit 50 in FIG. 12, described later.
  • These steps use the feature vectors D4 (vec) (candidate) (p) corresponding to the design prototype conditions in database 709,
  • and the outputs D5 (predict) (p, k) for those feature vectors, on which the proposals were based.
  • The final value of each raw data item D2 (h) is defined according to the contribution of each record data item D3 (i) and the time at which the raw data D2 (h) underlying D3 (i) was added to the database.
  • Specifically, the contribution of each data item to the prediction results in the prediction model usage history is analyzed in processing step S112; a weight on that contribution is calculated from each data item's addition time in processing step S113; the weighted contribution of each data item is calculated in processing step S114; and the weighted contribution is finally converted into a data value in processing step S115. This flow is described below.
  • FIG. 8 is a conceptual diagram of a method for defining the contribution of each data to the design prototype data point (p) according to the present invention.
  • The schematic diagrams of the feature space, 801 and 802 in FIG. 8, show the feature vector D4 (vec) (309 in FIG. 3) on the horizontal axis
  • and the objective variable D5 k (414 in FIG. 4) on the vertical axis.
  • D5 k is the k-th component of the objective variable vector D5 (vec), whose components are the functions to be predicted, and represents one evaluation value of those functions.
  • In this example, the goal of product development is to maximize the predicted function D5 k.
  • In the feature-space schematics 801 and 802 of FIG. 8, D4 (vec) is drawn on a single axis, although it is usually a multidimensional vector.
  • The prediction model M (m) outputs the objective variable D5 k for the feature vector D4 (vec).
  • The figure shows its response curve 806 (a response surface when D4 (vec) is multidimensional).
  • As D4 (vec) increases, the output value D5 k first rises, then levels off, then rises again to its predicted maximum at the feature vector D4 (vec) (candidate) (p), and then decreases.
  • The training data D4 train (i) used to create the prediction model M (m) (response curve 806) are shown as black squares in the feature space, as in legend 803.
  • Point 804 represents the usage history: its horizontal coordinate is the feature vector D4 (vec) (candidate) (p) corresponding to the condition under which the design prototype is made, and the point shows the response of the prediction model M (m) to D4 (vec) (candidate) (p).
  • the method of quantifying the contribution of the training data D3 (') train (i) to the prediction result of the prediction model M (m) is shown below.
  • In processing step S111, the feature vector D4 (vec) (candidate) (p) indicating a design prototype condition proposed by the prediction model M (m) is stored in the prediction model usage history database 709 of FIG. 7.
  • The prediction model M (m) takes as input the feature vector D4 (vec) (candidate) (p) corresponding to the design prototype condition and outputs the value D5 (predict) (p, k).
  • This response of the prediction model is shown by the circle at point 804.
  • The contribution of a given training data item D3 (') train (i) to the prediction D5 (predict) (p, k) of M (m) is quantified by comparing M (m) with a prediction model M delegate (D3 (') train (i), m) created without that training data item.
  • For the feature vector D4 (vec) (candidate) (p) corresponding to the design prototype condition, M delegate (D3 (') train (i), m) outputs the prediction shown by the black cross 807,
  • while the prediction model M (m) outputs the prediction shown at point 804 for the same D4 (vec) (candidate) (p).
  • Methods of quantifying this contribution are given for two cases: when the measured value D5 (actual) (p, k) of the objective variable D5 k (410 in FIG. 4) for the D4 (vec) (candidate) (p) corresponding to the design prototype condition has become known after prototyping and can be used in the definition, and when the measured value D5 (actual) (p, k) is not known.
  • First, consider the case in which the measured value D5 (actual) (p, k) has been obtained by design prototyping, and define the contribution using it.
  • The measured value D5 (actual) (p, k) of the objective variable D5 k used in equation (1) is the variable corresponding to 409 in FIG. 4.
  • The predicted value D5 (predict) (p, k) of D5 k from the prediction model M (m) is the variable corresponding to 408 in FIG. 4, and the predicted value D5 (v-predict (i)) (p, k) of D5 k from the prediction model M delegate (D3 train (i), m) is the variable shown at 412 in FIG. 4.
  • The contribution C (D3 train (i), p) is the difference indicated by the double-headed arrows 809 in FIG. 8.
  • This difference, obtained by comparing the prediction D5 (v-predict (i)) (p, k) (807) with the prediction D5 (predict) (p, k) (804), indicates how much the training data D3 (') train (i) improves the accuracy with which the measured value D5 (actual) (p, k) (808) is predicted. Data that lowers prediction accuracy therefore has a negative contribution.
  • When the measured value is not known, the contribution can be defined as follows: the change in the prediction for the feature vector D4 (vec) (candidate) (p) indicating the design prototype condition proposed by the prediction model M (m), between the cases where the training data D3 (') train (i) is and is not included in the training data, is defined as the contribution of D3 (') train (i).
  • In either case, a data set excluding the training data item D3 (') train (i) whose contribution is sought is created, and the prediction model is optimized again on it to obtain the prediction model M delegate (D3 (') train (i), m).
  • In principle, this makes it possible to derive the change in the prediction result with and without D3 (') train (i) and to obtain its contribution.
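The leave-one-out procedure can be sketched as follows. Equation (1) is not reproduced in this text, so the formulas here are assumptions that follow the description of the arrows 809: with a measured value, the contribution is the absolute error of M delegate minus the absolute error of M (m), so it is positive when the sample improves accuracy; without one, it is the signed change in the prediction. The k-nearest-neighbor regressor `fit_knn` is a hypothetical stand-in for re-optimizing the actual prediction model.

```python
import math
from typing import Callable, List, Optional, Sequence, Tuple

Sample = Tuple[Tuple[float, ...], float]  # (feature vector D4, objective value D5_k)
Model = Callable[[Sequence[float]], float]


def fit_knn(train: List[Sample], k: int = 2) -> Model:
    """Hypothetical stand-in for optimizing a prediction model: k-nearest-neighbor regression."""
    def predict(x: Sequence[float]) -> float:
        nearest = sorted(train, key=lambda s: math.dist(s[0], x))[:k]
        return sum(y for _, y in nearest) / len(nearest)
    return predict


def loo_contributions(
    train: List[Sample],
    x_candidate: Sequence[float],
    measured: Optional[float] = None,
    fit: Callable[[List[Sample]], Model] = fit_knn,
) -> List[float]:
    """Contribution of each training sample to the prediction at x_candidate.

    measured given:  |y - pred_without_i| - |y - pred_full|  (> 0: sample i improves accuracy)
    measured absent: pred_full - pred_without_i              (signed change in the prediction)
    Both formulas are assumptions, not the document's equation (1).
    """
    pred_full = fit(train)(x_candidate)           # D5(predict)(p, k) from M(m)
    contributions = []
    for i in range(len(train)):
        reduced = train[:i] + train[i + 1:]       # data set without D3_train(i)
        pred_without = fit(reduced)(x_candidate)  # output of M_delegate(D3_train(i), m)
        if measured is None:
            contributions.append(pred_full - pred_without)
        else:
            contributions.append(abs(measured - pred_without) - abs(measured - pred_full))
    return contributions


train = [((0.0,), 1.0), ((1.0,), 2.0), ((2.0,), 4.0), ((3.0,), 3.0)]
print([round(c, 3) for c in loo_contributions(train, (1.9,))])                # [0.0, -0.5, 0.5, 0.0]
print([round(c, 3) for c in loo_contributions(train, (1.9,), measured=3.8)])  # [0.0, -0.5, 0.5, 0.0]
```

This refits the model once per training record, which can be expensive for large data sets.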
  • The definition of contribution above is based on the contribution to the prediction result and is therefore limited to the training data D3 train (i) in the data set; it cannot define a contribution for the test data D3 test (i) or the validation data D3 validation (i), which are excluded from training when the prediction model is created. Since D3 test (i) and D3 validation (i) do not contribute to the training that creates the prediction model, their contributions C (D3 (') test (i), p) and C (D3 (') validation (i), p) may simply be set to 0.
  • Alternatively, contributions can be defined for the test data D3 test (i) and the validation data D3 validation (i) by associating each with nearby training data in the feature space.
  • For example, using the Euclidean distance in the feature vector space D4 (vec), i.e., the square root of the sum of squared differences over the feature components, D3 test (i) can be treated as having the same contribution as the value C (D3 train (i), p) of the training data item D3 train (i) closest to it.
  • Likewise, the contribution of the validation data D3 validation (i) to the prediction result can be defined as equal to that of the closest training data item D3 train (i).
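Assigning a contribution to a test or validation record from its nearest training record can be sketched as below. The function name and example values are hypothetical; ties in distance are broken by dictionary insertion order.

```python
import math
from typing import Dict, Sequence


def inherit_contribution(
    x: Sequence[float],
    train_features: Dict[int, Sequence[float]],
    train_contrib: Dict[int, float],
) -> float:
    """Contribution for a test/validation record: copy the value C(D3_train(i), p)
    of the training record nearest to it in Euclidean distance."""
    nearest = min(train_features, key=lambda i: math.dist(x, train_features[i]))
    return train_contrib[nearest]


train_features = {0: (0.0, 0.0), 1: (1.0, 1.0), 2: (3.0, 0.5)}  # hypothetical D4(vec)(i)
train_contrib = {0: 0.1, 1: 0.7, 2: -0.2}                       # hypothetical C(D3_train(i), p)
print(inherit_contribution((0.9, 1.2), train_features, train_contrib))  # 0.7
```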
  • In this way, the contribution C (D3 (i), p) of each record data item D3 (i) to the change in the prediction for the feature vector D4 (vec) (candidate) (p) indicating a design prototype condition proposed by the prediction model M (m) can be defined.
  • Next, the contribution C (D3 (i), p) defined for each data item D3 (i) in the data set is standardized, i.e., made dimensionless, to remove the dependence of its scale on the predicted objective function. Any standardization method may be used as long as the relative ratios of the contributions are preserved.
  • To prevent the value of any data from becoming negative, negative contributions C (D3 (i), p) are replaced with 0.
  • The contributions C (D3 (i), p) to a given feature vector D4 (vec) (candidate) (p) are then summed over all data D3 (i), and each C (D3 (i), p) is standardized by dividing it by this total.
  • This standardized contribution is set as the standardized contribution C standard (D3 (i), p), and this value is used in the calculation of the data value in the subsequent processing step S113.
  • In this way, the standardized contribution C standard (D3 (i), p) of each record data item D3 (i) to each design prototype data point D4 (vec) (candidate) (p) in the prediction model usage history is obtained.
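The clipping and sum normalization described above can be sketched as follows for one design prototype data point (p). The function name is hypothetical, and returning all zeros when no record has a positive contribution is an assumption; the text does not cover that case.

```python
from typing import Dict


def standardize(contrib: Dict[int, float]) -> Dict[int, float]:
    """C_standard(D3(i), p): clip negative contributions to 0, then divide by the total."""
    clipped = {i: max(c, 0.0) for i, c in contrib.items()}
    total = sum(clipped.values())
    if total == 0.0:
        return {i: 0.0 for i in clipped}  # assumed convention when nothing contributes
    return {i: c / total for i, c in clipped.items()}


print(standardize({0: 0.0, 1: -0.5, 2: 0.5, 3: 1.5}))  # {0: 0.0, 1: 0.0, 2: 0.25, 3: 0.75}
```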
  • FIG. 9 shows a conceptual diagram of a weighting method according to the data addition time with respect to the contribution of each record data D3 (i) to the design prototype data point (p).
  • The standardized contribution C standard (D3 (i), p) expresses how much data D3 (i) contributes to the prediction result, as derived from the change in the prediction obtained by inputting the design prototype data point (p), D4 (vec) (candidate) (p), of FIG. 8.
  • The following describes processing step S113, which weights the contribution so that the value of the data reflects its acquisition time.
  • Graph 901 plots, on the vertical axis, the standardized contribution C standard (D3 (i), p) of data D3 (i) to the prediction result obtained in processing step S112,
  • and, on the horizontal axis, the Euclidean distance in the feature space D4 (vec) from the feature vector D4 (vec) (candidate) (p) of the design prototype data point (p).
  • Three data items D3 (i), data (1), data (2), and data (3), are shown as a black triangle 905, a black square 906, and a black cross 907, respectively.
  • The j-th component D4 (i, j) of the feature vector D4 (vec) (i) of each data item D3 (i) used to create the prediction model is often standardized over the entire data set, at the stage of creating the data set on which the features are based, so that each component j has a standard deviation of 1; the features D4 (i, j) therefore do not retain scale differences specific to the underlying information.
  • The time t i at which record data D3 (i) was added to the database is the time at which the raw data D2 (h) from which D3 (i) was generated was added to the database (503 in FIG. 5). Since D3 (i) holds the index h of its raw data D2 (h), t i can be obtained from that addition time; in other words, t i is a value that depends on h.
  • Graph 902 in FIG. 9 shows that the later data were added to the database, the lower the weight change rate, so that later-added data have a lower value with respect to the prediction result for the design prototype data point (p).
  • The data were added in the order 907, 906, 905, so their data values decrease in that order.
  • The weight change rate f (t i) is a decreasing function and can be defined, for example, by equation (3) with 0 < a < 1.
  • Here t first is the t i of the record data D3 (i) acquired earliest in the data set.
  • Alternatively, f (t i) may be a function that treats raw data acquired within a given period as having a constant weight.
  • With a positive step width w and a function S w (t i) that increases by w at each step of width w (q = 0, 1, 2, ... being the step index), the weight change rate f (t i) can be defined as in equation (5).
  • Here a is the same as in equation (3).
  • With this f (t i), raw data D2 (h) acquired within the same period of width w are weighted with equal importance.
  • More generally, data acquisition periods such as data acquisition phase 1, data acquisition phase 2, ... can be defined arbitrarily, and f (t i) can be defined to output a constant value within each period while reducing the importance of data from later periods.
  • f (t i) is not limited to the examples above; any function that represents a decrease in data value over time may be used.
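Equations (3) and (5) are not reproduced in this text. Assuming equation (3) has the exponential form a ** (t i - t first) and equation (5) raises a to the step function S w (t i), both weight change rates can be sketched as below. Time is in arbitrary units (months in the example), and the function names are hypothetical.

```python
import math


def f_exponential(t: float, t_first: float, a: float) -> float:
    """Assumed form of equation (3): f(t) = a ** (t - t_first) with 0 < a < 1."""
    if not 0.0 < a < 1.0:
        raise ValueError("a must satisfy 0 < a < 1")
    return a ** (t - t_first)


def f_stepwise(t: float, t_first: float, w: float, a: float) -> float:
    """Assumed form of equation (5): f(t) = a ** S_w(t), constant within each period.

    S_w(t) = q * w, where q >= 0 is the integer with q * w <= t - t_first < (q + 1) * w.
    """
    if w <= 0.0:
        raise ValueError("w must be positive")
    q = math.floor((t - t_first) / w)  # index of the acquisition period
    s_w = q * w                        # S_w(t): grows by w at each period boundary
    return f_exponential(s_w, 0.0, a)  # a ** s_w, reusing the range check on a


print(f_exponential(2.0, 0.0, 0.5))    # 0.25
print(f_stepwise(2.9, 0.0, 3.0, 0.5))  # 1.0  (still in the first period)
print(f_stepwise(4.0, 0.0, 3.0, 0.5))  # 0.125
```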
  • In 903, the three data points 908 before weighting, shown as black circles, correspond to data (1) 905 through data (3) 907 in graph 901.
  • With weighting, the contribution of each data item D3 (i) changes as shown by the black arrows.
  • The weighted contribution f (t i) C standard (D3 (i), p) of each data item is obtained through processing steps S113 and S114.
  • This weighted contribution is then standardized: the sum of f (t i) C standard (D3 (i), p) over all data D3 (i) subject to value definition is calculated, and each weighted contribution is divided by this total.
  • The standardized value is called the standardized weighted contribution c m,p (i h) and is used in processing step S115 to define the data value.
  • Here i h denotes the index i of the record data D3 (i) generated from the raw data D2 (h).
  • Negative weighted contributions may be set to 0 before this standardization.
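Standardizing the weighted contributions f (t i) C standard (D3 (i), p) into c m,p (i h) for one model m and one design prototype data point (p) can be sketched as follows. The function name and the dictionary inputs keyed by record index are assumptions; negative values are clipped to 0, which the text allows as an option.

```python
from typing import Dict


def weighted_standardized(
    c_standard: Dict[int, float],  # C_standard(D3(i), p) per record index i
    weight: Dict[int, float],      # f(t_i) per record index i
) -> Dict[int, float]:
    """c_{m,p}(i): f(t_i) * C_standard(D3(i), p), negatives set to 0, divided by the total."""
    weighted = {i: max(weight[i] * c, 0.0) for i, c in c_standard.items()}
    total = sum(weighted.values())
    if total == 0.0:
        return {i: 0.0 for i in weighted}
    return {i: v / total for i, v in weighted.items()}


c = weighted_standardized({0: 0.5, 1: 0.5}, {0: 1.0, 1: 0.5})
print({i: round(v, 3) for i, v in c.items()})  # {0: 0.667, 1: 0.333}
```

Equal standardized contributions end up unequal because the later record (weight 0.5) is discounted.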
  • In processing step S115, the data value of each raw data item D2 (h) is defined from the standardized weighted contributions c m,p (i h) obtained in processing step S114. Since D3 (i) is converted from D2 (h), each index i is associated with an index h; the index i associated with h is therefore written i h.
  • the process up to the derivation of the value V (h) of the data D2 (h) will be described with reference to FIGS. 10, 11, and 12.
  • FIG. 10 is a conceptual diagram showing an example of data-driven product development
  • FIG. 11 is a conceptual diagram showing the relationship between product profit and the contribution of data (i h)
  • FIG. 12 is a conversion definition table in FIG.
  • Design prototypes are made under multiple conditions before the finished product (1000 in FIG. 10) is produced.
  • The design prototype conditions proposed based on the prediction model M (m) are shown by 1001A and 1001B; they correspond to point 804, the design prototype data point (p) in FIG. 8 described above.
  • The design prototypes 1001A and 1001B are design prototype conditions proposed based on the function predictions output by prediction models such as M (A) and M (B).
  • The prediction models M (A), M (B), ... that proposed the design prototype conditions are linked to the record data D3 (i h) in the data sets used to generate them, and each record data item D3 (i) is in turn linked to the raw data D2 (h) from which it was generated.
  • The feature vectors D4 (vec) (candidate) (p) corresponding to product design prototype conditions are thus linked to D2 (h) through the prediction models M (m) and the record data D3 (i h). The data value V (h) of each raw data item D2 (h) can therefore be defined according to the progress of product development.
  • Based on these links, the data value V (h) is defined according to the product profit Sales, as shown in FIG. 11.
  • the whole picture is shown in 1101 of FIG.
  • Within the overall picture, the areas contributed by each development contribution element to the product profit Sales (1102 in FIG. 11) are shown by 1103, 1104A, 1104B, 1105A, and 1105B.
  • The area 1103 enclosed by a square corresponds to the data contribution rate R data (1109), which indicates how much the data as a whole contribute to product profit; its size is Sales R data.
  • The data contribution rate R data may be set, in the form of a contract, either arbitrarily before product design starts or according to the production situation after product production starts.
  • The present technique for determining the value of each data item can be implemented in combination with such methods.
  • The data contribution area 1103, of size Sales R data, is further divided according to the contribution rate M rate (m) (1110) of each prediction model M (m).
  • m is an index that distinguishes each prediction model M (m).
  • the contribution regions by the prediction models M (A) and M (B) are indicated by 1104A and 1104B.
  • the size of the contribution region according to the prediction model M (m) is indicated by M rate (m) R data Sales .
  • The contribution area of each prediction model M (m), of size M rate (m) R data Sales, can in turn be divided into the contribution areas of the individual data items D3 (i).
  • If the contribution ratio of data D3 (i h) within the contribution region of a prediction model M (m) is C m (i h) (1111 in FIG. 11),
  • the contribution region of D3 (i h) within that of M (m) has size C m (i h) M rate (m) R data Sales.
  • the data value V (h) of the raw data D2 (h) can be defined by the equation (6).
  • where m is the index of the prediction model M (m),
  • and I (D2 (h)) is the set whose elements are the indices i h of the record data D3 (i h) associated with the raw data D2 (h).
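Equation (6) itself is not reproduced in this text. Reading the bullets above literally, with the region of size C m (i h) M rate (m) R data Sales summed over every model m and every record i h in I (D2 (h)), V (h) can be computed as below. This is a sketch under that assumption: the dictionary layout, the names `data_value` and `records_of`, and the example numbers are hypothetical, and how C m (i h) is obtained from the per-point values c m,p (i h) is not specified here, so it is taken as given.

```python
from typing import Dict, Set


def data_value(
    h: int,
    sales: float,                      # product profit Sales
    r_data: float,                     # data contribution rate R_data
    m_rate: Dict[str, float],          # M_rate(m) per prediction model
    c_m: Dict[str, Dict[int, float]],  # C_m(i) per model m and record index i
    records_of: Dict[int, Set[int]],   # I(D2(h)): record indices generated from raw data h
) -> float:
    """V(h) = sum over m and i in I(D2(h)) of C_m(i) * M_rate(m) * R_data * Sales (assumed form)."""
    total = 0.0
    for m, rate in m_rate.items():
        contributions = c_m.get(m, {})
        total += rate * sum(contributions.get(i, 0.0) for i in records_of.get(h, set()))
    return total * r_data * sales


sales, r_data = 1_000_000.0, 0.1
m_rate = {"A": 0.6, "B": 0.4}
c_m = {"A": {0: 0.5, 1: 0.5}, "B": {1: 0.25, 2: 0.75}}
records_of = {10: {0, 1}, 11: {2}}
for h in (10, 11):
    print(h, round(data_value(h, sales, r_data, m_rate, c_m, records_of), 2))
```

With M rate (m) summing to 1 and each C m summing to 1, the values V (h) add up to the whole data contribution area Sales R data (here 70,000 + 30,000 = 100,000).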
  • The provider of the raw data D2 (h) is identified from its data provider information (504 in FIG. 5),
  • and the provider can be paid a reward as appropriate according to the data value V (h).
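If each provider is paid according to the total value of the raw data it supplied, the payout can be aggregated as below. Summing V (h) per provider is an assumption about how "according to the data value" is applied, and the provider labels and values are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def rewards_by_provider(values: Dict[int, float], provider_of: Dict[int, str]) -> Dict[str, float]:
    """Total data value V(h) of the raw data supplied by each provider."""
    totals: Dict[str, float] = defaultdict(float)
    for h, value in values.items():
        totals[provider_of[h]] += value
    return dict(totals)


values = {10: 70000.0, 11: 30000.0, 12: 5000.0}        # hypothetical V(h)
provider_of = {10: "lab_X", 11: "lab_Y", 12: "lab_X"}  # from provider information (504 in FIG. 5)
print(rewards_by_provider(values, provider_of))        # {'lab_X': 75000.0, 'lab_Y': 30000.0}
```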
  • In summary, a plurality of data related to the product function to be optimized are acquired and converted into data for creating a prediction model, consisting of feature data and objective variable data that evaluate the product function.
  • From these data, a prediction model is created that takes as input a feature vector determined from a plurality of features and outputs a predicted value of the objective variable, a value evaluating the function to be optimized.
  • Using this prediction model, the feature vector that serves as input information indicating the optimum design prototype condition is obtained.
  • A contribution indicating how much each of the plurality of data used to create the prediction model contributed to the prediction for this feature vector is then defined, for training data using the prediction obtained when that data is excluded from the training data set.
  • This example thus shows a data value definition method and a data collection facilitation method characterized by the above. It allows the provider of each data item to be paid an appropriate reward according to the defined data value.
  • Example 1 described the data value definition method; Example 2 describes a data value definition system that implements it on a computer.
  • The functions of this system may include some or all of the functions of the first embodiment.
  • FIG. 13 is a schematic diagram showing a schematic configuration of a data value definition system according to the present invention.
  • The data value definition system 1200 of FIG. 13 defines the value of data from the contribution of each data item to product development, taking into account the contribution of the prediction model to the prediction result, and consists broadly of six functional units.
  • These are a data addition unit 10 that acquires data and adds it to the database; a data storage unit 20 that stores various data, such as the database built from the acquired data and the prediction model usage history database;
  • a prediction model creation unit 30 that creates a prediction model from the created database; a prediction model use unit 40 that uses the created prediction model for product design; a data value definition unit 50 that defines the value of each data item based on the prediction model usage history database;
  • and an input/output unit 60 for input and output of analysis calculations.
  • the internal configuration of each part will be described more specifically.
  • The data addition unit 10 has a data acquisition mechanism 11, a data addition time assignment mechanism 12, and a database addition mechanism 13, which can be regarded as responsible, respectively, for data acquisition (processing step S102), assignment of an addition time to the data (processing step S103), and addition to the database (processing step S104) in FIG. 1.
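A minimal in-memory sketch of processing steps S103 and S104, stamping each acquired raw data item with its addition time (503 in FIG. 5) and provider (504 in FIG. 5), is shown below. The class and field names are hypothetical; a real data storage mechanism 21 would be a database rather than a Python list.

```python
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Any, List, Optional


@dataclass
class RawData:
    h: int              # index of raw data D2(h)
    payload: Any        # acquired measurement or simulation result
    added_at: datetime  # addition time (503 in FIG. 5)
    provider: str       # data provider information (504 in FIG. 5)


class RawDatabase:
    """Hypothetical in-memory stand-in for the data storage mechanism 21."""

    def __init__(self) -> None:
        self._rows: List[RawData] = []

    def add(self, payload: Any, provider: str, now: Optional[datetime] = None) -> RawData:
        # S103: stamp the addition time (UTC by default); S104: append to the database.
        stamp = now if now is not None else datetime.now(timezone.utc)
        row = RawData(h=len(self._rows), payload=payload, added_at=stamp, provider=provider)
        self._rows.append(row)
        return row

    def get(self, h: int) -> RawData:
        return self._rows[h]


db = RawDatabase()
first = db.add({"yield": 0.82}, provider="lab_X", now=datetime(2020, 9, 1, tzinfo=timezone.utc))
second = db.add({"yield": 0.91}, provider="lab_Y")
print(first.h, second.h, db.get(0).provider)  # 0 1 lab_X
```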
  • Measurement devices such as microscopes, IR analyzers, NMR analyzers, X-ray analyzers, and electron beam analyzers, as well as analysis devices based on computer simulation and various other analysis devices, may be included in the data acquisition mechanism 11. The data acquisition mechanism 11 may also handle data entered directly through the input mechanism 61 described later or data received by the data communication mechanism 63; in this respect, the input mechanism 61 and the data communication mechanism 63 can also be regarded as part of the data acquisition mechanism 11.
  • There is no particular limitation on the device configuration of each mechanism of the data addition unit 10; a conventional analysis device (for example, a computer) or the measuring devices mentioned above (a microscope, etc.) can be used as appropriate.
  • the data captured by the data addition unit 10 is stored in the data storage mechanism 21 of the data storage unit 20.
  • A plurality of data addition units 10 may exist for one data storage device; in that case, data can advantageously be collected from and used across a plurality of product development or production sites or research bases.
  • The data storage mechanism 21 is not particularly limited as long as it can store the necessary data, and a conventional data storage device (for example, random access memory (RAM), a hard disk (HD), or a solid state drive (SSD)) can be used as appropriate. The data storage device need not be realized by a single node; a plurality of nodes may be connected over a network to perform distributed processing. Likewise, for analysis processing suited to distributed execution, the units 30, 40, and 50 may also be configured as a plurality of nodes connected on a network.
  • the prediction model creation unit 30 has a data set creation mechanism 31, a prediction model definition mechanism 32, a prediction model creation mechanism 33, and a prediction model performance evaluation mechanism 34, and each mechanism creates a data set (processing) in FIG. It can be said that it is responsible for step S105), definition of a prediction model to be optimized (processing step S106), creation of a prediction model (processing step S107), and evaluation of prediction model performance (processing step S108, processing step S109). Further, the result calculated in each mechanism of the prediction model creation unit 30 is stored in the data storage mechanism 21.
  • The prediction model usage unit 40 has a prediction model usage mechanism 41 and a prediction model usage history addition mechanism 42, which are responsible, respectively, for using the prediction model (processing step S110 in FIG. 1) and saving the prediction model usage history (processing step S111). The results calculated and the data created by each mechanism of the prediction model usage unit 40 are stored in the data storage mechanism 21.
  • The data value definition unit 50 has a mechanism 51 that analyzes each data item's contribution to the prediction model usage history, a mechanism 52 that acquires weights for the contributions based on each data item's addition time, and a mechanism 53 that converts weighted data contributions into data values.
  • These mechanisms are responsible for analyzing each data item's contribution to the prediction model usage history (processing step S112 in FIG. 1), calculating weights for the contributions according to each data item's addition time (processing step S113), calculating each data item's weighted contribution (processing step S114), and converting the weighted data contributions into data values (processing step S115).
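The division of labor in the data value definition unit 50 (steps S113 to S115) can be sketched as follows. This is a minimal illustration under stated assumptions, not the patented procedure: the per-record contributions from step S112 are taken as given, and both the time-based weight rule (`hypothetical_weight`, which zeroes out records added after the model was used) and the single linear conversion rate are placeholders, because the actual weighting method and conversion definition table are not given in this passage.

```python
from collections import defaultdict
from typing import Callable, Dict, List, Tuple

# One prediction-model usage event from the usage history:
# (usage_time, {record index i: contribution of D3(i) from step S112}).
UsageEvent = Tuple[float, Dict[int, float]]


def hypothetical_weight(added_at: float, used_at: float) -> float:
    """Placeholder weight rule (assumption): a record added after the
    model was used cannot have contributed, so it gets weight 0."""
    return 1.0 if added_at <= used_at else 0.0


def weighted_contributions(
    history: List[UsageEvent],
    added_at: Dict[int, float],
    weight: Callable[[float, float], float] = hypothetical_weight,
) -> Dict[int, float]:
    """Steps S113-S114: weight each contribution by the record's addition
    time and sum the weighted contributions per record over the history."""
    totals: Dict[int, float] = defaultdict(float)
    for used_at, contributions in history:
        for i, c in contributions.items():
            totals[i] += weight(added_at[i], used_at) * c
    return dict(totals)


def to_data_value(weighted: Dict[int, float], rate: float) -> Dict[int, float]:
    """Step S115 (assumption): convert weighted contribution to a data
    value with a single linear rate instead of a conversion table."""
    return {i: rate * c for i, c in weighted.items()}


if __name__ == "__main__":
    history = [(10.0, {0: 0.5, 1: 0.2}), (20.0, {0: 0.1, 1: 0.3, 2: 0.4})]
    added = {0: 1.0, 1: 5.0, 2: 25.0}
    print(to_data_value(weighted_contributions(history, added), rate=100.0))
```

Running the example prints `{0: 60.0, 1: 50.0, 2: 0.0}`: record 2 was added after the last usage event, so under the placeholder rule it earns nothing. Passing a different `weight` callable swaps in another time-weighting scheme without touching the aggregation.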
  • There is no particular limitation on the device configuration of each mechanism of the prediction model creation unit 30, the prediction model usage unit 40, and the data value definition unit 50; a conventional analysis device (for example, a computer) can be used as appropriate.
  • The input/output unit 60 has an input mechanism 61 that inputs analysis conditions (for example, direct input of acquired data, machine learning model selection information, model hyperparameters, optimization algorithm parameters, and parameter search ranges) and an output mechanism 62 that outputs analysis results; it is responsible for input and output throughout the flow from processing step S103 to processing step S116. Information such as the various analysis conditions and the analysis results for the input data is stored in the data storage mechanism 21, and the output information can be stored there in the same way.
  • the device configuration of the input mechanism 61 and the output mechanism 62 is not particularly limited, and conventional input / output devices (for example, keyboard, display, printer) can be used as appropriate.
  • In Example 3, the data value definition method of Example 1 and the data value definition system of Example 2 are further extended to configure a data collection promotion method and a data collection promotion system that use them.
  • In Example 3, data on the optimization target is acquired not only from within the company itself but also widely from external organizations. The data providers are then offered compensation according to the value determined by the evaluation, which further strengthens the incentive to secure data.
  • 10 Data addition unit
  • 20 Data storage unit
  • 30 Prediction model creation unit
  • 40 Prediction model usage unit
  • 50 Data value definition unit
  • 60 Input / output unit

Abstract

Provided are a data value definition method and a data collection system that give a data provider a right to receive a reward in accordance with a contribution degree of product development by the data provider, thus enabling encouragement of data accumulation and sharing. The present invention is characterized by: acquiring a plurality of data items related to a function evaluation value that is to be optimized; converting the data items to prediction model creation data items having, as elements, feature amount data and objective variable data that is an evaluation value; creating a prediction model for predicting, from the plurality of prediction model creation data items, an objective variable that is a value for evaluating the function to be optimized; obtaining a feature amount vector that is input information indicating a promising design/trial manufacture condition obtained by using the prediction model for outputting a predicted value of the objective variable upon input of a feature amount vector determined from a plurality of feature amounts; obtaining a contribution degree indicating how much a predicted result is changed when the plurality of prediction model creation data items are not included in learning data; and determining a value for each of the plurality of initially-acquired data items related to a product function that is to be optimized, on the basis of the contribution degree.

Description

Data value definition method, data collection promotion method, data value definition system, and data collection promotion system
 The present invention relates to a data value definition method, a data collection promotion method, a data value definition system, and a data collection promotion system that give data providers an incentive to collect data by defining the value of that data. In particular, the invention is highly effective when applied to transactions of valuable data whose acquisition cost is high.
 At product development sites, shorter product cycles and increasingly diverse customer needs make efficient, data-science-based product design desirable for developing new products efficiently.
 On the other hand, product development sites give priority to immediate development issues. In the short term, data accumulation has low priority and does not progress, so conventional trial-and-error development continues.
 From a long-term perspective, however, accumulating and utilizing data can accelerate product development. The value of acquiring and accumulating data is therefore excessively underestimated from a short-term perspective, and investment in future product development in the form of data accumulation has been neglected.
 As an existing data collection system, the failure notification system disclosed in Patent Document 1 is known. Its technique for rewarding failure information providers determines the importance of each failure report from a preset failure importance database, so that the information provider can be rewarded accordingly.
Patent Document 1: Japanese Unexamined Patent Application Publication No. 2011-086154
 Thus, at product development sites, immediate development issues are prioritized from a short-term perspective and data accumulation does not progress; as a result, efficient product development that makes use of data does not advance either.
 One reason is that data alone has no immediate effect on product development, so the cost-effectiveness of data acquisition is unclear. This makes it difficult to evaluate properly how much data collection activities contribute to product development; data collectors therefore gain no benefit from accumulating data, and data collection and accumulation stall.
 In addition, such data is generally confined to a single product. Even data that could be used for developing other products is used only in the product development in which it was acquired, so at present the value of data is determined solely by its contribution to development within that product.
 On the other hand, the nature of data analysis requires accumulating data on a reasonably large scale to improve analytical effectiveness. At product development sites in particular, a product results from a large number of control factors as inputs, so the relationship between control factors and product functions is complex. Moreover, product functions contain errors originating from multiple factors. Under such conditions, estimating the relationship between many control factors and their results requires a large amount of data, so product development sites strongly demand larger accumulated datasets.
 However, product development involves little repetitive work and the development goal changes every time, so it is difficult for a single company to assemble large-scale data. It is therefore desirable to share databases in which multiple institutions, including companies, have accumulated data. If accumulated data can be disclosed and shared among multiple institutions in this way, data utilization at product development sites will be promoted, and data that generated no profit at small scale can gain value. Sharing and using data among multiple institutions is therefore the ideal. At present, however, data sharing between companies has not progressed.
 The reason is that providing data yields no direct gain. In addition, because the value of an individual piece of data used in product development is unclear, it is not easy to grant a gain to data providers either. Data providers thus find it hard to obtain compensation for the data they provide, and providing data amounts to nothing more than an information leak, so there is no merit in sharing data.
 Rewards could be offered to give data provision the merit described above, but at product development sites it is difficult to estimate accurately in advance the profit a developed product will generate. If compensation for provided data is paid before development based on a preliminary evaluation, then when the data does not produce the expected results, the data user may bear excessive risk, or the data provider may receive an excessively low reward relative to the development results.
 Such conventional incentives therefore cannot assign data an appropriate value that reflects development results. The value of data at a product development site must instead be defined through ex-post evaluation after development results are achieved, based on the degree to which the data contributed to those results. In this respect, the method of Patent Document 1, which relies on a preliminary evaluation, cannot define data value appropriately. Furthermore, at product development sites where data utilization is advanced, design/trial-manufacture conditions that achieve the desired product functions and quality are obtained from a prediction model created with machine learning, in order to make development more efficient. Patent Document 1 also cannot define the value of data in a way that takes into account how much the data used contributes to the design/trial-manufacture conditions selected by the prediction model.
 Accordingly, an object of the present invention is to provide a data value definition method, a data collection promotion method, a data collection system, and a data collection promotion system that promote data accumulation and sharing by granting data providers the right to receive rewards according to how much the data they provide contributes to product development.
 To this end, the present invention acquires a plurality of data items related to an objective variable, which is the evaluation value of the product function to be optimized; converts the data into data for prediction model creation, each having feature data and objective variable data (the evaluation value) as elements; creates, from the plurality of data items for prediction model creation, a prediction model that predicts the objective variable, i.e., the value that evaluates the function to be optimized; uses that prediction model, which takes as input a feature vector determined from a plurality of features and outputs a predicted value of the objective variable, to obtain a feature vector that is input information indicating a promising design/trial-manufacture condition; obtains a contribution indicating how much the prediction result changes when each data item for prediction model creation is excluded from the training data; and, based on the contribution, determines a data value for each of the initially acquired data items related to the product function to be optimized.
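The contribution described above, i.e., how much the prediction for a proposed design/trial-manufacture condition changes when one data item is left out of the training data, can be illustrated with a leave-one-out loop. This is a hedged sketch: the passage does not fix a model type or a measure of change, so the one-feature least-squares line (`fit_line`) and the absolute difference used here are stand-ins chosen for brevity; any fitting routine with the same shape could be passed in as `fit`.

```python
from typing import Callable, List, Sequence

Predictor = Callable[[Sequence[float]], float]
Fitter = Callable[[List[Sequence[float]], List[float]], Predictor]


def fit_line(X: List[Sequence[float]], y: List[float]) -> Predictor:
    """Stand-in model (assumption): ordinary least squares on the first feature."""
    xs = [row[0] for row in X]
    n = len(xs)
    mx, my = sum(xs) / n, sum(y) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (t - my) for x, t in zip(xs, y))
    slope = sxy / sxx if sxx else 0.0
    intercept = my - slope * mx
    return lambda v: intercept + slope * v[0]


def loo_contributions(
    X: List[Sequence[float]],
    y: List[float],
    x_p: Sequence[float],
    fit: Fitter = fit_line,
) -> List[float]:
    """Contribution of each record i to the prediction at design point x_p:
    |prediction with all records - prediction with record i removed|."""
    base = fit(X, y)(x_p)
    contributions = []
    for i in range(len(X)):
        # Retrain without record i and compare predictions at x_p.
        model_without_i = fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:])
        contributions.append(abs(base - model_without_i(x_p)))
    return contributions
```

For example, with `X = [[0.0], [1.0], [2.0], [3.0]]`, `y = [0.0, 1.0, 2.0, 10.0]`, and `x_p = [4.0]`, the full model predicts 11.0 and the model without the last record predicts 4.0, so record 3 receives the largest contribution (7.0), while removing record 1 leaves the prediction essentially unchanged. Retraining once per record is simple but costs one fit per data item, which is why cheaper approximations may be preferable for large datasets.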
 The present invention is also a data collection promotion method using the data value definition method, in which data related to the optimization target is obtained from data providers, and the incentive to secure data is increased by presenting the data providers in advance with the method for evaluating their compensation for the determined data value, a reward based on the data's value in contributing to the product.
 The present invention is also a data value definition system comprising: a first means for acquiring a plurality of data items from an optimization target; a second means for generating a data set containing multiple pairs of feature data and objective variable data; a third means for creating a prediction model of the optimization target from the data set; a fourth means for presenting design/trial-manufacture conditions from a feature vector by searching, on the basis of a prediction model that represents how the objective variable data responds to a feature vector determined by the plurality of feature data given as input, for the feature vector that yields the optimal objective variable; a fifth means for determining, when predicting the objective variable corresponding to a design/trial manufacture, each data item's contribution to the design/trial-manufacture conditions from the change in the prediction model's result depending on whether that data item is included in the training data; and a sixth means for determining, from the contribution, the value of the data obtained through design and trial manufacture.
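The fourth means, which searches the feature-vector space for the input that the prediction model scores best and presents it as a design/trial-manufacture condition, can be sketched as an exhaustive search over a grid of candidate feature values. The grid search is an assumption made for illustration; the passage does not name the optimization algorithm, and `propose_condition` and its parameters are hypothetical.

```python
from itertools import product
from typing import Callable, Optional, Sequence, Tuple

FeatureVector = Tuple[float, ...]


def propose_condition(
    predict: Callable[[FeatureVector], float],
    grid: Sequence[Sequence[float]],
    maximize: bool = True,
) -> Tuple[FeatureVector, float]:
    """Return the candidate feature vector with the best predicted objective.

    grid[j] lists the candidate values for feature j (the search range).
    """
    best_vec: Optional[FeatureVector] = None
    best_val = float("-inf") if maximize else float("inf")
    # Enumerate every combination of candidate values (Cartesian product).
    for vec in product(*grid):
        val = predict(vec)
        if (val > best_val) if maximize else (val < best_val):
            best_vec, best_val = vec, val
    if best_vec is None:
        raise ValueError("empty search grid or no finite prediction")
    return best_vec, best_val
```

With a toy predictor `lambda v: -(v[0] - 1.0) ** 2 - (v[1] - 2.0) ** 2` and the grid `[[0.0, 1.0, 2.0], [0.0, 1.0, 2.0, 3.0]]`, the proposed condition is `(1.0, 2.0)`. Exhaustive search grows exponentially with the number of features, so a real system would likely substitute a smarter optimizer behind the same `predict` interface.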
 The present invention is also a data collection promotion system using the data collection system, in which data to be optimized is obtained from data providers and users' incentive to provide data is increased by presenting the data providers with compensation for the determined value of the data.
 By defining the value of each data item according to the product development results of the prediction model used for product development, data providers are granted the right to receive compensation according to their contribution to development, which promotes data accumulation and sharing.
  • A flow chart showing an example of the data value definition method according to the present invention.
  • An explanatory table of data according to the present invention.
  • An explanatory table of feature data according to the present invention.
  • An explanatory table of objective variable data according to the present invention.
  • A diagram illustrating the relationship between acquired data D1(h) and raw data D2(h).
  • A schematic diagram showing the relationship between the database and the data set used to create the prediction model.
  • A schematic diagram showing prediction model usage history data D6(m, n) and the prediction model usage history database according to the present invention.
  • A conceptual diagram of how each data item's contribution to the proposed design prototype conditions for a design prototype data point (p) is defined according to the present invention.
  • A conceptual diagram of the method for weighting each record data D3(i)'s contribution to a design prototype data point (p) according to its data addition time.
  • A conceptual diagram showing an example of data-driven product development according to the present invention.
  • A conceptual diagram showing the relationship between product profit and the contribution of data (i_h) according to the present invention.
  • The conversion definition table in FIG. 11.
  • A schematic diagram showing the outline configuration of the data value definition system according to the present invention.
 Embodiments of the present invention are described in detail below with reference to the drawings. The data value definition method and the data collection promotion method according to the present invention are executed by software in a data value definition system and a data collection promotion system implemented on a computer system. Accordingly, the data value definition method is described first in Example 1, and the data value definition system implemented on the computer system is then described in Example 2. In Example 3, the data collection promotion method and the data collection promotion system acquire and use data not only from within the company itself but also widely from external organizations.
 Example 1 describes, as an example, applying the data value definition method according to the present invention at a product production site or a product development site (hereinafter, production development site).
 FIG. 1 is a flow chart showing an example of the data value definition method according to the present invention. This embodiment is presented as one example of the invention by describing the series of steps that starts at processing step S101 and ends at processing step S116 in FIG. 1. The substantive processing of data value definition is performed in processing steps S102 to S115.
 In addition, FIGS. 2 to 4 show an explanatory table of data, an explanatory table of feature data, and an explanatory table of objective variable data, respectively, and the following description of the embodiment follows the data definitions given in these tables. In these tables, the index i identifies each data item D3(i). The index i used in D3_train(i), D3_validation(i), D3_test(i), and D3_not train(i) corresponds to that of D3(i), but an element does not necessarily exist for every i.
 First, in processing step S102, the data acquirer acquires a plurality of data items related to the product function to be optimized at the production development site. The data may be offline data acquired and accumulated in the past, or monitoring data transmitted continuously online from the production development site. Because the data format does not matter, the data may be image data, measuring instrument output, signal data, simulation input/output data, audio data, material property data, and so on. It may also be a combination of several of these types, or secondary data derived from them. Further, part of the acquired data may include environmental data, such as temperature and humidity, that cannot be controlled at the time of product design.
 Each data item may be in whatever format suits it; a unified data format is not required. However, when the data is added to the database, a format to which tagging information such as meta information can be attached is desirable; a data format with an information structure (such as XML or JSON) is more desirable; and a record-type data format, which makes it easy to organize the information into the tabular form suited to machine learning, is still more desirable. The data may also be in a representation that information processing can easily convert into one of these formats. The data may further include time information giving the date and time at which it was acquired at the production development site.
 The acquired data is denoted acquired data D1(h), using an index h that identifies the data acquired in processing step S102.
 In processing step S103, the acquired data D1(h) is converted into raw data D2(h) so that it can be added to and stored in a database in the computer system. FIG. 5 illustrates the relationship between the acquired data D1(h) and the raw data D2(h). The raw data D2(h) in FIG. 5 consists of at least all of the following four elements, or a combination of them.
 The first element 502 of the raw data D2(h) is the index h of the acquired data D1(h), which is used to uniquely identify a piece of raw data within the database.
 The second element 503 of the raw data D2(h) is the data addition time, which holds the time at which the data was added to the database.
 The third element 504 of the raw data D2(h) is data provider information. Data provider information normally identifies the person who acquired the data at the development site, but it is not limited to this: any information that identifies the person holding the right to receive compensation for adding the acquired data to the database may serve as data provider information.
 The fourth element 201 of the raw data D2(h) is the content of the acquired data D1(h) obtained in processing step S102.
 In short, as the relationship shown in FIG. 5 indicates, the raw data D2(h) is the acquired data D1(h) with an index 502, a data addition time 503, and data provider information 504 newly attached.
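The wrapping of acquired data D1(h) into raw data D2(h) described above can be sketched as a small record type plus an append function for the data lake. This is an illustration under assumptions: the class and function names, the use of `len(lake)` as the next index h (which presumes indices are assigned sequentially and never deleted), ISO 8601 timestamps, and JSON serialization (one of the structured formats the text calls desirable) are choices made here, not details from the source.

```python
import json
from dataclasses import asdict, dataclass
from datetime import datetime, timezone
from typing import Any, Dict, List


@dataclass
class RawData:
    """Raw data D2(h): the acquired data D1(h) plus three attached elements."""
    index: int                 # element 502: index h, unique within the database
    added_at: str              # element 503: data addition time (ISO 8601)
    provider: str              # element 504: holder of the right to compensation
    acquired: Dict[str, Any]   # element 201: content of the acquired data D1(h)


def add_to_lake(lake: List[RawData], acquired: Dict[str, Any],
                provider: str, now: datetime) -> RawData:
    """Steps S103-S104 (sketch): wrap D1(h) as D2(h) and append it to the lake."""
    record = RawData(index=len(lake), added_at=now.isoformat(),
                     provider=provider, acquired=acquired)
    lake.append(record)
    return record


def to_json(record: RawData) -> str:
    """Serialize D2(h) as JSON so that it keeps its structure and tags."""
    return json.dumps(asdict(record), sort_keys=True)
```

Calling `add_to_lake` once per acquired item mirrors the sequential S102 to S104 loop; passing `now` explicitly (for example `datetime(2020, 1, 1, tzinfo=timezone.utc)`) keeps the addition time testable instead of reading the system clock inside the function.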
 ここで、処理スッテプS102およびS103で言及した取得データD1(h)および生データD2(h)は、図2で示すデータの説明表に示した、201,202にそれぞれ対応する。 Here, the acquired data D1 (h) and the raw data D2 (h) mentioned in the processing steps S102 and S103 correspond to 201 and 202 shown in the data explanation table shown in FIG. 2, respectively.
 次の処理ステップS104では、処理ステップS103で得られた生データD2(h)図2の202をデータベースに追加する。ここでいうデータベースは、図6の601に示すとおりである。なお生データD2(h)は、図6のデータセット602に含まれるレコードデータD3(i)の形式に変換される。この際にレコード型として共通のフォーマットであるD3(i)に変換されれば良いため、前記生データD2(h)が内部にもつ取得データの内容D1(h)のデータのフォーマットは決まった形式でなくともよい。データベース601はデータレイク601とも表記できる。 In the next processing step S104, the raw data D2 (h) 202 of FIG. 2 obtained in the processing step S103 is added to the database. The database referred to here is as shown in 601 of FIG. The raw data D2 (h) is converted into the format of the record data D3 (i) included in the data set 602 of FIG. At this time, since it is sufficient to convert to D3 (i), which is a common format as a record type, the data format of the acquired data content D1 (h) contained in the raw data D2 (h) is a fixed format. It does not have to be. Database 601 can also be referred to as data lake 601.
 図6の上部に示すデータレイク601は、統一的なデータ・フォーマットを用いることもできる。図6のD2(1)、D2(2)・・・は各々の生データD2(h)を示しており、データレイク601は、生データD2(h)の集合を保持している。 The data lake 601 shown at the top of FIG. 6 can also use a unified data format. D2 (1), D2 (2) ... In FIG. 6 indicate each raw data D2 (h), and the data lake 601 holds a set of raw data D2 (h).
 個別の取得データD1(h)に対して図1の処理ステップS102から処理ステップS104の一連の処理を個別の取得データD1(h)を生データD2(h)に順次変換して、データベースに生データD2(h)は蓄積されていく。 A series of processes from the process step S102 to the process step S104 of FIG. 1 for the individual acquired data D1 (h) is sequentially converted from the individual acquired data D1 (h) into the raw data D2 (h) and generated in the database. Data D2 (h) is accumulated.
 図1の処理ステップS102から処理ステップS104に至る一連の処理は、生データD2(h)を取得して蓄積する処理であり、いわば計算機システム内の機能として後述する図12におけるデータ追加部10を構成したものということができる。 The series of processes from the process step S102 to the process step S104 of FIG. 1 is a process of acquiring and accumulating the raw data D2 (h), so to speak, as a function in the computer system, the data addition unit 10 in FIG. It can be said that it is composed.
 全ての取得データD1(h)に対して生データD2(h)を変換取得し、蓄積した後のデータベース601を用いて、図1の処理ステップS105~処理ステップS109を次に行うことによって、機械学習により予測モデルを構築する。 By performing the processing steps S105 to S109 of FIG. 1 next using the database 601 after converting and acquiring the raw data D2 (h) for all the acquired data D1 (h) and accumulating the raw data D2 (h), the machine Build a prediction model by learning.
 図1の処理ステップS105から処理ステップS109に至る一連の処理は、予測モデルを構築する処理であり、いわば計算機システム内の機能として後述する図12における予測モデル作成部30を構成したものということができる。 The series of processes from the process step S105 to the process step S109 in FIG. 1 is a process for constructing a prediction model, and it can be said that the prediction model creation unit 30 in FIG. 12, which will be described later, is configured as a function in the computer system. can.
FIG. 6 is a schematic diagram showing the relationship between the data lake and the data set used to create the prediction model. It illustrates how that data set is created. Within the series of processes for creating the prediction model, the data set creation in processing step S105 is described first. Processing step S105 of FIG. 1 corresponds to process S105 in FIG. 6.
In processing step S105, the data set 605 used for model creation (lower part of FIG. 3) is generated from the data contained in the data lake 601 shown in the upper part of FIG. 6. Each raw data item D2(h) (202 in FIG. 2) is converted into the record-type data shown in each row of the data set 605 (hereinafter, record data D3(i); 203 in FIGS. 2 and 6). However, not every raw data item D2(h) needs to be converted.
For example, the raw data D2(1) in FIG. 6 can be converted into record data D3(1). However, as shown at 203 in FIG. 2, raw data D2(h) and record data D3(i) need not correspond one-to-one. One or more records D3(i) may be generated from a single raw data item D2(h).
Each record D3(i) is generated by conversion from raw data D2(h). Each record D3(i) carries the index (603 in FIG. 6) of the raw data D2(h) used to generate it, but this index is not used to create the prediction model. Each record D3(i) also has a feature vector D4(vec)(i) 301 and an objective variable vector D5(vec)(i) 401. The j-th component of D4(vec)(i) is the feature D4(i,j) 302. Likewise, the k-th component of D5(vec)(i) 401 is the objective variable D5(i,k) 402, which corresponds to 402 in FIG. 6. The feature vector D4(vec)(i) 301 and the objective variable vector D5(vec)(i) each have at least one component.
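As a rough illustration of the relationship between raw data D2(h) and record data D3(i), the sketch below models both as Python dataclasses. Several details are assumptions made for this example, not structures defined in the source: the class names `RawData` and `Record`, the `converter` callback, and the use of `None` to mark a missing value. It shows the one-to-many conversion and the back-reference to D2(h) that is not used for training.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Optional, Sequence, Tuple


@dataclass
class RawData:
    """Hypothetical container for one raw data item D2(h) held in the data lake."""
    h: int                    # index of the raw data item
    content: Dict[str, Any]   # acquired data D1(h); its internal format is not fixed


@dataclass
class Record:
    """Hypothetical record-type row D3(i) of the modelling data set."""
    i: int
    source_h: int                     # index of the originating D2(h); not a model input
    features: List[Optional[float]]   # feature vector D4(vec)(i); None marks a missing value
    targets: List[Optional[float]]    # objective variable vector D5(vec)(i)

    def __post_init__(self) -> None:
        # Both vectors must have at least one component.
        if not self.features or not self.targets:
            raise ValueError("D4(vec)(i) and D5(vec)(i) need at least one component")


# Hypothetical converter signature: content -> list of (features, targets) pairs.
Converter = Callable[[Dict[str, Any]], Sequence[Tuple[Sequence[float], Sequence[float]]]]


def to_records(raw: RawData, start_i: int, converter: Converter) -> List[Record]:
    """Convert one D2(h) into one or more D3(i); a one-to-many mapping is allowed."""
    pairs = converter(raw.content)  # user-supplied feature/target extraction
    return [Record(start_i + n, raw.h, list(x), list(y)) for n, (x, y) in enumerate(pairs)]
```

For example, a raw item whose content holds two measured samples yields two records that both point back to the same `h`.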
The meaning of one record D3(i) 203 of the data set 605 is now explained in detail, taking as an example the development of a new material at a production development site. The following shows what D4(vec)(i) and D5(i,k) may look like for material record data D3(i). The record D3(i) combines D4(vec)(i), which represents the characteristics of one material, with the objective variable data D5(i,k), which represents the functions of that material.
As one example of a material feature vector D4(vec)(i), the vector can be generated from material composition information. Features D4(i,j) generated from the composition include element-derived quantities, such as the atomic weight, atomic number, and electronegativity of the elements contained in the material, averaged with weights given by the composition ratio of each element. Similarly, the weighted variance of such element-derived quantities may be used, although the features are not limited to these.
Alternatively, the vector D4(vec)(i) may be built directly from the material composition. It has one component per element species, and each component holds the composition ratio of the corresponding element.
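The two composition-derived feature types described above can be sketched as follows: composition-weighted means and variances of element properties, and a composition-ratio vector with one slot per element. The three-element property table is a small illustrative assumption. A real implementation would draw on a complete elemental database, and the choice of properties here is only an example.

```python
from typing import Dict, List

# Small illustrative property table (an assumption for this sketch): atomic number Z,
# standard atomic weight, and Pauling electronegativity.
ELEMENT_PROPS: Dict[str, Dict[str, float]] = {
    "Cr": {"Z": 24, "weight": 51.996, "chi": 1.66},
    "Fe": {"Z": 26, "weight": 55.845, "chi": 1.83},
    "Ni": {"Z": 28, "weight": 58.693, "chi": 1.91},
}
ELEMENTS = sorted(ELEMENT_PROPS)  # fixed element order for the composition vector


def ratios(comp: Dict[str, float]) -> Dict[str, float]:
    """Normalise raw amounts to composition ratios that sum to 1."""
    total = sum(comp.values())
    return {el: x / total for el, x in comp.items()}


def weighted_mean_var(comp: Dict[str, float], prop: str) -> List[float]:
    """Composition-weighted mean and variance of one element-derived property."""
    w = ratios(comp)
    mean = sum(w[el] * ELEMENT_PROPS[el][prop] for el in w)
    var = sum(w[el] * (ELEMENT_PROPS[el][prop] - mean) ** 2 for el in w)
    return [mean, var]


def composition_vector(comp: Dict[str, float]) -> List[float]:
    """One component per element species, holding that element's composition ratio."""
    w = ratios(comp)
    return [w.get(el, 0.0) for el in ELEMENTS]


def composition_features(comp: Dict[str, float]) -> List[float]:
    """Concatenate the weighted statistics and the composition-ratio vector."""
    feats: List[float] = []
    for prop in ("Z", "weight", "chi"):
        feats += weighted_mean_var(comp, prop)
    return feats + composition_vector(comp)
```

For an equiatomic Fe-Ni composition, the first two features are the mean atomic number 27 and its variance 1, and the last three are the ratios for Cr, Fe, and Ni.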
A material feature vector D4(vec)(i) may also be generated from the crystal structure or molecular structure, that is, from single-phase material information on the order of nanometers. Features representing crystal structure and atomic arrangement include those of Documents 1 to 5 below. These are merely examples, and the features are not limited to them.
・Document 1 [P. J. Steinhardt, D. R. Nelson, and M. Ronchetti, Phys. Rev. B 28, 784 (1983).]
・Document 2 [M. Rupp, A. Tkatchenko, K.-R. Müller, and O. A. von Lilienfeld, Phys. Rev. Lett. 108, 058301 (2012).]
・Document 3 [K. Hansen, F. Biegler, R. Ramakrishnan, W. Pronobis, O. A. von Lilienfeld, K.-R. Müller, and A. Tkatchenko, J. Phys. Chem. Lett. 6, 2326 (2015).]
・Document 4 [A. Seko, A. Togo, and I. Tanaka, Phys. Rev. B 99, 214108 (2019).]
・Document 5 [T. Xie and J. C. Grossman, Phys. Rev. Lett. 120, 145301 (2018).]
A feature vector D4(vec)(i) generated from the micrometer-scale microstructure of the material is also conceivable. For example, D4(vec)(i) may include the average particle size, the variance of the particle size distribution, the phase fraction, the dislocation density, and the variance of the concentration distribution within each constituent phase. There are no particular restrictions on the feature vector.
A feature vector D4(vec)(i) generated from the processing conditions of the material may also be used. One example contains the isothermal holding temperature, the isothermal holding time, the cooling rate, and environmental conditions such as humidity, air temperature, and pressure. These are only examples. For material products, any feature vector generated from a combination of material characteristics and process conditions such as those described above can be used.
Secondary features may also be used. These are obtained by reducing the dimensionality of the feature vectors described above, for example by principal component analysis or an autoencoder.
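Below is a minimal sketch of the principal-component route. It is written in plain Python with power iteration and deflation so that it has no dependencies. In practice a library PCA, such as scikit-learn's `PCA`, would normally be used. The function names are hypothetical, and the autoencoder alternative is not shown.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def pca_fit(X: Matrix, n_comp: int, iters: int = 200) -> Tuple[Matrix, List[float]]:
    """Leading principal axes of X via power iteration with deflation."""
    n, d = len(X), len(X[0])
    mu = [sum(r[j] for r in X) / n for j in range(d)]
    Xc = [[r[j] - mu[j] for j in range(d)] for r in X]
    # Sample covariance matrix of the centred feature vectors.
    C = [[sum(r[a] * r[b] for r in Xc) / (n - 1) for b in range(d)] for a in range(d)]
    axes: Matrix = []
    for _ in range(n_comp):
        v = [1.0 + 0.1 * j for j in range(d)]  # arbitrary non-symmetric start vector
        s = math.sqrt(sum(x * x for x in v))
        v = [x / s for x in v]
        for _ in range(iters):
            w = [sum(C[a][b] * v[b] for b in range(d)) for a in range(d)]
            norm = math.sqrt(sum(x * x for x in w))
            if norm == 0.0:  # no variance left in this subspace
                break
            v = [x / norm for x in w]
        lam = sum(v[a] * C[a][b] * v[b] for a in range(d) for b in range(d))
        axes.append(v)
        # Deflation: subtract the found component so the next pass finds the next axis.
        C = [[C[a][b] - lam * v[a] * v[b] for b in range(d)] for a in range(d)]
    return axes, mu


def pca_transform(X: Matrix, axes: Matrix, mu: List[float]) -> Matrix:
    """Project each feature vector onto the principal axes (the secondary features)."""
    return [[sum((r[j] - mu[j]) * v[j] for j in range(len(mu))) for v in axes] for r in X]
```

Keeping only the first `n_comp` projections gives the reduced secondary feature vector. The number of components to keep is a modelling choice.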
Next, the objective variables D5(i,k) of record data D3(i) in materials development are described. The properties used as objective variables D5(i,k) are measured or expected values that indicate the physical properties and functions of the material. Examples are hardness, thermal conductivity, tensile strength, yield stress, elongation, reflectance, and solubility. These values need not be experimentally measured. They may also be values calculated by computer simulations used at the production development site. Values derived from materials simulation that can serve as objective variables include lattice thermal conductivity, band gap, solubility parameter, the probability that the material exists stably, formation energy, energy relative to other phases, and diffusion coefficient.
At a general production development site, the feature vector D4(vec)(i) for process design consists, for example, of the condition values that define the process conditions and of measured values. Examples include compression pressure, heat input, the mixing ratio of additive elements, outside air temperature, and outside air humidity. The values obtained for each process condition, and vectors having these values as components, are referred to as process quantities.
The process quantities may range from controllable factors, such as the mixing ratio of additives, to uncontrollable environmental factors, such as outside air temperature and humidity. In process design, each process quantity related to one process can be defined as a feature D4(i,j) of the record D3(i), and the vector of these components as the feature vector D4(vec)(i). The functional evaluation values of the same process then form the objective variable vector D5(vec)(i), whose components are the objective variables D5(i,k). The record D3(i) thus holds two vectors for the same process: a feature vector D4(vec)(i) of process quantities, and an objective variable vector D5(vec)(i) of product-function evaluation values that are the output of that process. D3(i) can therefore be said to hold information on the input/output relationship of the process.
The structure of record data D3(i) has been described above with concrete examples from a materials development site and from a production site performing process design. In both cases, D3(i) has a feature vector D4(vec)(i) and an objective variable vector D5(vec)(i). The data set 605, composed of multiple records D3(i), is used to generate the prediction model in the next stage (S106 to S107 in FIG. 1).
Next, the flow for creating a prediction model from the data set in S106 to S107 is described. The feature vector D4(vec)(i) 301 has one or more components, and its j-th component is written as the feature D4(i,j) 302. The components D4(i,j) of each record's feature vector may contain missing values. If missing values are allowed, they can be imputed before the prediction model is created. Multiple imputation is one example, but the imputation method is not limited to it.
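To make the idea of multiple imputation concrete, the sketch below fills each missing D4(i,j) by drawing at random from the observed values of the same column. It repeats this m times to obtain m completed data sets, and then pools a per-data-set estimate across them. This random hot-deck scheme is a deliberately simplified stand-in, assumed for illustration. Full multiple-imputation methods such as MICE (for example scikit-learn's `IterativeImputer`) model each column conditionally on the others. The function names are hypothetical.

```python
import random
from typing import List, Optional

Row = List[Optional[float]]


def impute_once(X: List[Row], rng: random.Random) -> List[List[float]]:
    """Fill each missing D4(i,j) with a random draw from the observed values of column j."""
    d = len(X[0])
    observed = [[r[j] for r in X if r[j] is not None] for j in range(d)]
    if any(not col for col in observed):
        raise ValueError("every feature column needs at least one observed value")
    return [[v if v is not None else rng.choice(observed[j]) for j, v in enumerate(r)]
            for r in X]


def multiple_imputation(X: List[Row], m: int = 5, seed: int = 0) -> List[List[List[float]]]:
    """Return m completed copies of the data set; downstream results are pooled over them."""
    rng = random.Random(seed)
    return [impute_once(X, rng) for _ in range(m)]


def pooled_column_mean(completed: List[List[List[float]]], j: int) -> float:
    """Example of pooling: average a per-data-set estimate over the m imputations."""
    means = [sum(r[j] for r in data) / len(data) for data in completed]
    return sum(means) / len(means)
```

In a modelling pipeline, a prediction model would be trained on each completed copy, and the resulting predictions or error estimates would be combined.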
To explain how the prediction model to be optimized is defined in processing step S106 of FIG. 1, the relationship between the prediction model and the data set used to optimize it is described below. That data set consists of multiple pairs, each combining a feature vector D4(vec)(i) with an objective variable vector D5(vec)(i) or objective variable D5(i,k).
The objective variable vector D5(vec)(i) has objective variables as its components, each serving as the evaluation value of one function. Typically, one prediction model takes D4(vec)(candidate)(p) as input and outputs the k-th objective variable D5(predict)(p,k) (408 in FIG. 4). To optimize such a prediction model, the input feature vector D4(vec)(i) and the output objective variable D5(i,k) of each record D3(i) are treated as one pair. The model is then created using all such pairs over i as the data set.
Suppose instead that one or more objective variables D5_{k-1}, D5_k, D5_{k+1}, ... are to be predicted from the same input feature vector D4(vec)(i) by prediction models M(k-1), M(k), M(k+1), .... In that case, the multiple prediction models can also be created by optimizing them simultaneously. Accordingly, the objective variable vector D5(vec)(i) of each record D3(i) in the data set may have one or more objective variable components.
The optimized prediction model M(k) is a function that takes D4(vec) (309 in FIG. 3) as input and predicts the objective variable D5_k (414 in FIG. 4). One such model is determined for each k. Neural networks and random forests are examples of methods that optimize multiple prediction models simultaneously, but the optimization method is not limited to these.
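The per-k structure of M(k) can be sketched with ordinary least squares. All models share the same input D4(vec)(i), and one model is fitted per objective-variable column k. Note that this sketch fits each M(k) independently with a linear model, chosen here only because it is easy to verify. The simultaneous optimization mentioned above, for example one neural network with several outputs, would instead share parameters across the k outputs. The helper names and the tiny ridge term are assumptions.

```python
from typing import Callable, List, Sequence

Model = Callable[[Sequence[float]], float]


def _solve(A: List[List[float]], b: List[float]) -> List[float]:
    """Gauss-Jordan elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(n):
            if r != c:
                f = M[r][c] / M[c][c]
                M[r] = [a - f * pc for a, pc in zip(M[r], M[c])]
    return [M[i][n] / M[i][i] for i in range(n)]


def fit_linear(X: Sequence[Sequence[float]], y: Sequence[float], ridge: float = 1e-9) -> Model:
    """Least-squares model y = w0 + w.x via (very lightly regularised) normal equations."""
    Xb = [[1.0, *r] for r in X]  # prepend a bias column
    d = len(Xb[0])
    A = [[sum(r[a] * r[b] for r in Xb) + (ridge if a == b else 0.0) for b in range(d)]
         for a in range(d)]
    rhs = [sum(r[a] * t for r, t in zip(Xb, y)) for a in range(d)]
    w = _solve(A, rhs)
    return lambda x: w[0] + sum(wi * xi for wi, xi in zip(w[1:], x))


def fit_models(X: Sequence[Sequence[float]], Y: Sequence[Sequence[float]]) -> List[Model]:
    """One model M(k) per objective-variable column k, all sharing the inputs D4(vec)(i)."""
    return [fit_linear(X, [row[k] for row in Y]) for k in range(len(Y[0]))]
```

`fit_models(X, Y)[k]` then plays the role of M(k), mapping a feature vector to a prediction of D5_k.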
The objective variable data D5(i,k) may also contain missing values. They can be handled in two ways. One is to impute them based on the distribution of the feature vectors D4(vec)(i) in the data set and the non-missing objective variables D5(i,k). The other is to fill them in step by step with a semi-supervised learning method until the prediction model that is finally adopted has been created.
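One simple way to impute missing D5(i,k) "based on the distribution of the feature vectors" is k-nearest-neighbour imputation. A missing target is replaced by the mean target of the closest labelled records in feature space. The sketch below assumes Euclidean distance and a fixed neighbour count. It illustrates the imputation route only; the semi-supervised alternative (for example iterative self-training) is not shown, and the function name is hypothetical.

```python
import math
from typing import List, Optional, Sequence


def knn_fill_targets(X: Sequence[Sequence[float]], y: Sequence[Optional[float]],
                     k: int = 2) -> List[float]:
    """Fill missing D5(i,k) with the mean target of the k nearest labelled records."""
    labelled = [(x, t) for x, t in zip(X, y) if t is not None]
    if not labelled:
        raise ValueError("need at least one non-missing objective value")
    filled: List[float] = []
    for x, t in zip(X, y):
        if t is not None:
            filled.append(t)  # observed targets are kept unchanged
            continue
        nearest = sorted(labelled, key=lambda p: math.dist(x, p[0]))[:k]
        filled.append(sum(tn for _, tn in nearest) / len(nearest))
    return filled
```

Because distances are computed on raw feature values, features would normally be scaled to comparable ranges first.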
Before the prediction model is created in processing step S107, the record data D3(i) in the data set is divided into three groups: training data D3_train(i) (204 in FIG. 2), validation data D3_validation(i) (205 in FIG. 2), and test data D3_test(i) (206 in FIG. 2). This procedure is described next. Each record D3(i) is assigned to one of the three groups. The data set is thus preferably divided into a training data set composed of D3_train(i), a validation data set composed of D3_validation(i), and a test data set composed of D3_test(i).
The training data set is used to learn the prediction model. The validation data set is used to select hyperparameters so as to prevent overfitting. The final prediction accuracy is evaluated on the test data set.
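A minimal sketch of the three-way split follows. It assumes a random shuffle with a fixed seed and 70/15/15 proportions; neither choice is specified in the source. The function name is hypothetical.

```python
import random
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def split_dataset(records: Sequence[T], frac_train: float = 0.7, frac_val: float = 0.15,
                  seed: int = 0) -> Tuple[List[T], List[T], List[T]]:
    """Shuffle records D3(i) and split them into training, validation and test sets."""
    idx = list(range(len(records)))
    random.Random(seed).shuffle(idx)
    n_train = round(frac_train * len(idx))
    n_val = round(frac_val * len(idx))
    train = [records[i] for i in idx[:n_train]]
    val = [records[i] for i in idx[n_train:n_train + n_val]]
    test = [records[i] for i in idx[n_train + n_val:]]  # the remainder becomes the test set
    return train, val, test
```

With 20 records this yields 14 training, 3 validation, and 3 test records, and every record lands in exactly one set.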
It is also preferable to perform cross-validation or a similar procedure before the prediction model is created. This means changing, several times, how the whole data set is divided into the training, validation, and test data sets. A prediction model with high generalization performance is then selected by evaluating its prediction error or the distribution of that error. If the amount of data is small, however, the validation data set and the test data set may be the same.
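The sketch below shows k-fold cross-validation used to choose a hyperparameter. Each point is held out exactly once, and the candidate with the lowest cross-validated RMSE is kept. The model (a one-dimensional k-nearest-neighbour regressor), its hyperparameter (the neighbour count), and the candidate values are all illustrative assumptions, not choices made in the source.

```python
import math
import random
from typing import Dict, Sequence, Tuple

Point = Tuple[float, float]  # (feature, objective) pair for a 1-D illustration


def knn_predict(train: Sequence[Point], x: float, k: int) -> float:
    """Mean objective value of the k training points nearest to x."""
    nearest = sorted(train, key=lambda p: abs(p[0] - x))[:k]
    return sum(t for _, t in nearest) / len(nearest)


def kfold_rmse(data: Sequence[Point], k: int, n_folds: int = 5, seed: int = 0) -> float:
    """Cross-validated RMSE of a k-nearest-neighbour regressor."""
    idx = list(range(len(data)))
    random.Random(seed).shuffle(idx)
    folds = [idx[f::n_folds] for f in range(n_folds)]  # each fold is held out once
    sq_sum = 0.0
    for fold in folds:
        held_out = set(fold)
        train = [data[i] for i in idx if i not in held_out]
        for i in fold:
            x, t = data[i]
            sq_sum += (knn_predict(train, x, k) - t) ** 2
    return math.sqrt(sq_sum / len(data))


def select_k(data: Sequence[Point],
             candidates: Sequence[int] = (1, 3, 5, 9)) -> Tuple[int, Dict[int, float]]:
    """Pick the hyperparameter with the lowest cross-validated error."""
    scores = {k: kfold_rmse(data, k) for k in candidates}
    return min(scores, key=scores.get), scores
```

The test set would stay outside this loop, so that the final accuracy estimate is not biased by the hyperparameter choice.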
After the prediction model has been defined in processing step S106, several methods can be used to create it in processing step S107. These include, but are not limited to, generalized linear regression, logistic regression, support vector machines, random forests, and neural networks. Likewise, the prediction model need not be a regression model; it may also predict a classification result.
Next, in processing step S108, the prediction performance of the model created in processing step S107 is evaluated. For regression problems, possible performance measures include the coefficient of determination and prediction errors such as RMSE (root mean squared error) and MAE (mean absolute error), but the measure is not limited to these. Any evaluation index of prediction accuracy may be used, with any threshold defining its acceptable range. The only requirement is that the user can judge from it whether the model's accuracy is sufficient, and thus decide whether to use the model.
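The regression measures named above can be computed on the held-out set as follows. The acceptance rule in `is_acceptable` is a hypothetical example of the user-chosen threshold; the source leaves both the index and the threshold to the user.

```python
import math
from typing import Dict, Sequence


def regression_metrics(y_true: Sequence[float], y_pred: Sequence[float]) -> Dict[str, float]:
    """RMSE, MAE and coefficient of determination R^2 on a held-out set."""
    n = len(y_true)
    err = [p - t for t, p in zip(y_true, y_pred)]
    sse = sum(e * e for e in err)
    mean_t = sum(y_true) / n
    ss_tot = sum((t - mean_t) ** 2 for t in y_true)
    return {
        "rmse": math.sqrt(sse / n),
        "mae": sum(abs(e) for e in err) / n,
        "r2": 1.0 - sse / ss_tot if ss_tot > 0 else float("nan"),
    }


def is_acceptable(metrics: Dict[str, float], rmse_max: float) -> bool:
    """Hypothetical acceptance rule: compare RMSE with a user-chosen threshold."""
    return metrics["rmse"] <= rmse_max
```

For targets [1, 2, 3, 4] and predictions [1, 2, 3, 5], RMSE is 0.5, MAE is 0.25, and R^2 is 0.8.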
Similarly, for classification problems, prediction accuracy can be evaluated using the elements of the confusion matrix of the classification results, or the accuracy, precision, and recall calculated from them. Again, any type of index may be used, as long as the user can judge whether the model's accuracy is sufficient for use.
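For the binary case, the confusion-matrix entries and the derived accuracy, precision, and recall can be sketched as follows. Treating the label `1` as the positive class is an assumption of the example. Multi-class problems would use a full matrix or per-class averaging.

```python
from typing import Dict, Hashable, Sequence


def confusion_counts(y_true: Sequence[Hashable], y_pred: Sequence[Hashable],
                     positive: Hashable = 1) -> Dict[str, int]:
    """Binary confusion-matrix entries with respect to the chosen positive class."""
    pairs = list(zip(y_true, y_pred))
    return {
        "tp": sum(t == positive and p == positive for t, p in pairs),
        "fp": sum(t != positive and p == positive for t, p in pairs),
        "fn": sum(t == positive and p != positive for t, p in pairs),
        "tn": sum(t != positive and p != positive for t, p in pairs),
    }


def classification_metrics(cm: Dict[str, int]) -> Dict[str, float]:
    """Accuracy, precision and recall derived from the confusion-matrix entries."""
    total = sum(cm.values())
    pred_pos = cm["tp"] + cm["fp"]
    true_pos = cm["tp"] + cm["fn"]
    return {
        "accuracy": (cm["tp"] + cm["tn"]) / total,
        "precision": cm["tp"] / pred_pos if pred_pos else 0.0,
        "recall": cm["tp"] / true_pos if true_pos else 0.0,
    }
```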
Next, in processing step S109, the evaluation result of processing step S108 is used to determine whether the prediction model is acceptable for use in product development. If it is not, the operations from data acquisition in processing step S102 through processing step S108 are improved, and a new prediction model is created and evaluated. This cycle is repeated until, at processing step S109, the model's prediction performance is sufficient for use in product development.
For example, if the quantity or quality of the acquired data needs improvement, processing step S102 is repeated and new raw data D2(h) is added to the database, which changes the data set D3(i). Another possible improvement is to change the feature conversion applied when raw data D2(h) is converted into records D3(i). This creates a feature vector D4'(vec)(i) in a different format from the previously used D4(vec)(i). If the definition of the prediction model to be optimized needs to change, the optimization method or the functional form to be optimized is changed in processing step S106.
As in these examples, the processing in steps S102 to S108 is modified and verification is repeated through step S109. The prediction model is improved in this way until it can be used for product development. Once the prediction performance is judged sufficient, the prediction model can be used for product development in processing step S110.
The prediction model is then used, for example, at the prototype design stage of product development or to optimize process conditions at a production site. This corresponds to the series of stages from processing step S110 to processing step S111. Steps S110 and S111 are where the constructed prediction model is actually used. In other words, they constitute the prediction model use unit 40 of FIG. 12, described later, as a function within the computer system.
First, in processing step S110, the prediction model is used to predict the optimal design and prototyping conditions, which are then proposed.
The prediction model outputs the objective variable D5(predict)(p,k) (408 in FIG. 4), a prediction of a functional value that the product should satisfy. The search looks for the input feature vector D4(vec)(candidate)(p) (303 in FIG. 3) for which D5(predict)(p,k) takes the desired value. This feature vector corresponds to the product's design and prototyping conditions, which can be proposed as the numerical values of its components. Alternatively, several feature vectors D4(vec)(candidate)(p) that satisfy the requirements can be obtained. The acceptable range of the design and prototyping conditions can then be proposed as the numerical range of each vector component.
The feature vector D4(vec)(candidate)(p) proposed by the prediction model either corresponds one-to-one with the design and prototyping conditions themselves, or is obtained by transforming them. The prediction model M(m) takes D4(vec)(candidate)(p) as input and predicts the product's k-th functional value as the objective variable D5(predict)(p,k). The same D4(vec)(candidate)(p) can be fed to several prediction models, giving one objective variable per model. Together these form the objective variable vector D5(vec)(predict)(p). This makes it possible to search for a D4(vec)(candidate)(p) whose predicted vector D5(vec)(predict)(p) satisfies the requirements on several functional values at once.
Accordingly, feature vectors D4(vec)(candidate)(p) corresponding to various conditions can be given to the prediction model, and the resulting outputs D5(predict)(p,k) can be estimated. This finds the inputs whose outputs satisfy the required functions, and makes the search for design and prototyping conditions in product development more efficient. The design and prototyping conditions to be carried out are then proposed from the input feature vector D4(vec)(candidate)(p) obtained in this way.
A concrete search method is described below. The goal is an input feature vector D4(vec)(candidate)(p) whose predicted objective variable vector D5(vec)(predict)(p), or objective variable D5(predict)(p,k), satisfies the desired values. A genetic algorithm or a gradient method can be used, for example. The input feature vectors D4(vec) given to the prediction model are varied according to the resulting change in the predicted objective variable. The search thus proceeds sequentially and exhaustively toward values for which the target function (the predicted objective variable or objective variable vector) comes closest to the desired value. The resulting local solution is preferably presented as the feature vector D4(vec)(candidate)(p).
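The gradient-method variant of this search can be sketched as projected gradient descent on the squared gap between the model output and the desired value. The gradient is taken by central finite differences, so the model can be any callable, and each step is clipped back into the allowed input range. The genetic-algorithm alternative is not shown. The step size, iteration count, and function names are assumptions. As the text notes, the result is a local solution that depends on the starting point `x0`.

```python
from typing import Callable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def numeric_grad(f: Callable[[List[float]], float], x: List[float],
                 h: float = 1e-6) -> List[float]:
    """Central-difference gradient, so the model need not expose derivatives."""
    g = []
    for j in range(len(x)):
        xp, xm = list(x), list(x)
        xp[j] += h
        xm[j] -= h
        g.append((f(xp) - f(xm)) / (2 * h))
    return g


def inverse_search(model: Model, target: float, x0: Sequence[float],
                   bounds: Sequence[Tuple[float, float]], lr: float = 0.05,
                   steps: int = 1000) -> Tuple[List[float], float]:
    """Projected gradient descent on (M(x) - target)^2 within box bounds."""
    def loss(x: List[float]) -> float:
        return (model(x) - target) ** 2

    x = [min(max(xi, lo), hi) for xi, (lo, hi) in zip(x0, bounds)]  # start inside bounds
    for _ in range(steps):
        g = numeric_grad(loss, x)
        # Step downhill, then clip each component back into its allowed range.
        x = [min(max(xi - lr * gi, lo), hi) for xi, gi, (lo, hi) in zip(x, g, bounds)]
    return x, model(x)
```

If the desired value cannot be reached inside the bounds, the search stops at the bound that brings the prediction closest to it.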
The evaluation function used to search for the input feature vector D4(vec)(candidate)(p) depends on the goal. Optimizing a single objective variable D5(predict)(p,k) calls for a different function than optimizing an objective variable vector D5(vec)(predict)(p) with several objective variables as components. Designing and prototyping a product may also impose constraints that D4(vec)(candidate)(p) or D5(vec)(predict)(p) must satisfy. These constraints can be incorporated into the search algorithm by changing the search range for D4(vec)(candidate)(p) or the evaluation function used in the optimization.
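One way to build such an evaluation function is sketched below. It is a weighted sum of squared deviations of each predicted D5(predict)(p,k) from its target, plus a penalty term for leaving the permitted input range. Passing a single model gives the single-objective case. The weighted-sum form, the linear penalty, and its coefficient are assumptions for illustration; other scalarizations or hard constraints are equally possible.

```python
from typing import Callable, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def make_score(models: Sequence[Model], targets: Sequence[float], weights: Sequence[float],
               bounds: Sequence[Tuple[float, float]], penalty: float = 1e3
               ) -> Callable[[Sequence[float]], float]:
    """Evaluation function for a candidate D4(vec)(candidate)(p); lower is better."""
    def score(x: Sequence[float]) -> float:
        # Weighted squared deviation of each predicted D5(predict)(p,k) from its target.
        dev = sum(w * (m(x) - t) ** 2 for m, t, w in zip(models, targets, weights))
        # Design constraints enter as a penalty on leaving the allowed input range.
        viol = sum(max(0.0, lo - xi) + max(0.0, xi - hi) for xi, (lo, hi) in zip(x, bounds))
        return dev + penalty * viol
    return score
```

The returned `score` can be minimized by any of the search methods mentioned above.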
From the obtained feature vector D4(vec)(candidate)(p), the design and prototyping conditions can be determined uniquely. Alternatively, several candidate design and prototyping conditions can each be converted into a candidate feature vector. A promising condition is then estimated by selecting the candidates that lie closest in feature space to the D4(vec)(candidate)(p) obtained from the prediction model.
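The nearest-candidate step can be sketched as a ranking by Euclidean distance in feature space. Euclidean distance and the dictionary of named candidates are assumptions of this example. Features on very different scales would normally be standardized before the distances are compared.

```python
import math
from typing import Dict, List, Sequence


def nearest_conditions(proposed: Sequence[float], candidates: Dict[str, Sequence[float]],
                       n: int = 1) -> List[str]:
    """Rank candidate design/prototyping conditions (already converted to feature vectors)
    by Euclidean distance to the model-proposed D4(vec)(candidate)(p)."""
    ranked = sorted(candidates, key=lambda name: math.dist(candidates[name], proposed))
    return ranked[:n]
```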
D4(vec)(candidate)(p) can also be found simply by a grid search over each feature D4_j, that is, over each component of the feature vector space D4(vec) (310 in FIG. 3). The search looks for the feature vector D4(vec)(candidate)(p) that yields the desired D5(vec)(predict)(p). Alternatively, the range of design and prototyping conditions to be examined may be fixed in advance. In that case, all those conditions can be converted into feature vectors and input to the prediction model. The best feature vector D4(vec)(candidate)(p) is then selected by evaluating the output objective variable vector D5(vec) or the objective variables D5(i,k). Promising design and prototyping conditions can be identified from the D4(vec)(candidate)(p) determined in this way.
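Both variants described above come down to evaluating the model on a finite list of inputs and keeping the best one. The list is either a full grid over the per-feature values D4_j or a predetermined set of converted conditions. The sketch assumes a single objective and absolute deviation from the target as the selection criterion; the helper names are hypothetical.

```python
import itertools
from typing import Callable, Iterable, List, Sequence, Tuple

Model = Callable[[Sequence[float]], float]


def grid(*axes: Sequence[float]) -> List[List[float]]:
    """All combinations of per-feature grid values D4_j."""
    return [list(p) for p in itertools.product(*axes)]


def best_input(model: Model, target: float,
               candidates: Iterable[Sequence[float]]) -> Tuple[List[float], float]:
    """Evaluate every candidate and keep the one whose prediction is closest to the target."""
    best = min(candidates, key=lambda x: abs(model(x) - target))
    return list(best), model(best)
```

A predetermined list of conditions can be passed directly as `candidates` instead of `grid(...)`. Grid size grows multiplicatively with the number of features, which limits this approach to low-dimensional or coarsely discretized searches.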
The above methods of proposing design and prototyping conditions are only examples. Any method may be used that starts from the functional evaluation results output by the prediction model (the predicted objective variable or objective variable vector) and proposes a corresponding input feature vector D4(vec)(candidate)(p), or a range of input feature vectors. The only requirement is that the design and prototyping conditions corresponding to D4(vec)(candidate)(p) can be estimated from the proposal.
In processing step S110, the prediction model M(m) is used as described above to propose feature vectors D4(vec)(candidate)(p) corresponding to design and prototype conditions. In processing step S111, each proposed feature vector D4(vec)(candidate)(p) is converted into the n-th usage history data D6(m,n) of the m-th prediction model M(m) (208 in FIG. 2). The n-th usage history data D6(m,n) of the m-th prediction model M(m) corresponds to 208 in FIG. 7. This usage history data D6(m,n) is added to and stored in the prediction model usage history database 709 as needed.
The structure of the n-th usage history data D6(m,n) 208 of prediction model M(m) is described next. Here, m is an index uniquely assigned to each prediction model M(m), and n indicates that the record is the n-th entry in the usage history data of M(m). Usage history records for the same prediction model M(m) are therefore distinguished from one another by n.
As shown at 208 in FIG. 7, the usage history data D6(m,n) consists of all four of the following elements, or a combination of them. The first element 702 is the index m that identifies the prediction model M(m). This index identifies the model as it was defined as the optimization target in process S106 of FIG. 1, which makes the later processing step S112 possible.
The second element 703 is the record data D3(i) used to create the prediction model M(m), or information through which D3(i) can be referenced, such as an index identifying records in the data set. It also records how each record was used when the model was created, that is, whether as training data D3_train(i), test data D3_test(i), or validation data D3_validation(i). Like the first element, this information is used in processing step S112 to analyze each record's contribution to the prediction model usage history.
In addition, because each record D3(i) holds the index h (603 in FIG. 6) of the raw data D2(h) used to generate it, the time at which the corresponding raw data D2(h) was added to the database (503 in FIG. 5) is known. This addition time is used in processing step S113.
The third element 704 is a function (in the programming sense) that takes as input the feature vector D4(vec) used to propose design and prototype conditions and outputs the objective variable D5_k; it represents the functionality of the prediction model M(m). The form of M(m) is not restricted: it may be an object representing a program function or the API of an external program, as long as it returns the result of applying a fixed process to its input.
The fourth element 705 is the usage history of the prediction model M(m): the set P(m,n) of feature vectors D4(vec)(candidate)(p) corresponding to the design and prototype conditions proposed by M(m). Here, the index m refers to the prediction model M(m), and n indicates that the set belongs to the n-th usage history record, matching the index n of D6(m,n). P(m,n) is thus the set of feature vectors D4(vec)(candidate)(p) proposed when M(m) was used for the n-th time to propose prototype conditions.
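One way to hold a usage history record D6(m,n) in code is sketched below. The class and field names are illustrative assumptions; only the four elements (702 to 705) and the indices m and n come from the description.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, Set, Tuple

Vector = Tuple[float, ...]


@dataclass
class UsageHistoryRecord:
    """Illustrative (hypothetical) layout of one usage history record D6(m, n)."""
    model_index: int                    # element 702: index m of M(m)
    use_index: int                      # n: which use of M(m) this record describes
    record_roles: Dict[int, str]        # element 703: record index i -> "train" / "test" / "validation"
    model: Callable[[Vector], float]    # element 704: the function D4(vec) -> D5_k
    proposals: Set[Vector] = field(default_factory=set)  # element 705: P(m, n)


record = UsageHistoryRecord(
    model_index=1,
    use_index=3,
    record_roles={0: "train", 1: "train", 2: "test"},
    model=lambda x: sum(x),             # hypothetical stand-in for M(1)
    proposals={(0.5, 1.0), (0.7, 0.9)},
)
# Element 703 lets later steps pick out the records that were used for training.
training_ids = [i for i, role in record.record_roles.items() if role == "train"]
```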
Next, the relationship between the usage history data D6(m,n) and the prediction model usage history database 709 is described. The lower part of FIG. 7 shows an example structure of the database 709. The complete usage history data D6(m) 708 of prediction model M(m) stored in database 709 is obtained by grouping the usage history records D6(m,n) by model index m and merging them over all indices n. Item 708 in FIG. 7 shows the complete usage history data of M(m); items 706 and 707 show those of M(1) and M(2). The usage history database 709 stores this complete usage history data for every m.
The complete usage history data D6(m) has the same format as the usage history data D6(m,n) shown at 208 and contains the four elements 702, 703, 704, and 705.
However, in the complete usage history data D6(m) 708, the usage history of M(m) corresponding to element 705 is written as the set P(m) of feature vectors D4(vec)(candidate)(p).
The set P(m) merges, over all n, the sets P(m,n) of feature vectors held as element 705 by the usage history records D6(m,n) added to D6(m) 708. The feature vectors D4(vec)(candidate)(p) in P(m) contain no duplicates. They are used to define data value in processing steps S112 to S115.
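The merge of P(m,n) into P(m) can be sketched with Python sets, which enforce the no-duplicates property directly. The class below is a hypothetical in-memory stand-in for database 709, keyed by the model index m; a real system would presumably use a persistent store.

```python
from typing import Dict, FrozenSet, Iterable, List, Set, Tuple

Vector = Tuple[float, ...]


class UsageHistoryDatabase:
    """Hypothetical in-memory stand-in for the usage history database 709."""

    def __init__(self) -> None:
        # m -> list of (n, P(m, n)) pairs, one per usage history record D6(m, n)
        self._by_model: Dict[int, List[Tuple[int, FrozenSet[Vector]]]] = {}

    def add(self, m: int, n: int, proposals: Iterable[Vector]) -> None:
        self._by_model.setdefault(m, []).append((n, frozenset(proposals)))

    def all_proposals(self, m: int) -> Set[Vector]:
        """P(m): union of P(m, n) over every n. Because this is a set, a feature
        vector proposed in several uses of M(m) appears only once."""
        merged: Set[Vector] = set()
        for _, p_mn in self._by_model.get(m, []):
            merged |= p_mn
        return merged


db = UsageHistoryDatabase()
db.add(m=1, n=1, proposals=[(0.0, 1.0), (1.0, 1.0)])
db.add(m=1, n=2, proposals=[(1.0, 1.0), (2.0, 0.0)])   # (1.0, 1.0) proposed again
db.add(m=2, n=1, proposals=[(5.0, 5.0)])
```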
Processing steps S110 and S111 above create the prediction model usage history database 709.
Next, in processing steps S112 to S115 of FIG. 1, the value of each raw data item D2(h) is defined using the prediction model usage history database (709 in FIG. 7). Because steps S112 through S115 together define a data value for each raw data item D2(h) from the usage history database 709, they can be regarded as implementing, as a function of the computer system, the data value definition unit 50 of FIG. 12 described later.
In the processing from step S110 to step S115, the feature vectors D4(vec)(candidate)(p) corresponding to the design and prototype conditions stored in the usage history database 709 are used to define the final value of each raw data item D2(h) from two factors: the contribution of each record D3(i) to the objective variable D5(predict)(p,k), which is the model output for D4(vec)(candidate)(p) and the basis on which that vector was proposed; and the time at which the raw data D2(h) underlying each record D3(i) was added to the database.
Processing step S112 analyzes each record's contribution to the predictions recorded in the prediction model usage history; step S113 computes a weight on that contribution according to each record's addition time; step S114 computes each record's weighted contribution; and finally step S115 converts the weighted contributions into data values, thereby defining the value of each raw data item D2(h). This flow is described below.
First, in processing step S112, the contribution of each record D3(i) to the proposal of the design and prototype condition represented by a feature vector D4(vec)(candidate)(p) in the complete usage history data of M(m) is determined.
Processing step S112 is described with reference to FIG. 8, a conceptual diagram of the method, according to the present invention, for defining the contribution of each data item to a design prototype data point (p).
In FIG. 8, the schematic feature-space diagrams 801 and 802 plot the feature vector D4(vec) (309 in FIG. 3) on the horizontal axis and the objective variable D5_k (414 in FIG. 4) on the vertical axis. D5_k is the k-th component of the objective variable vector D5(vec), whose components are the functions to be predicted, and it represents one evaluation value of a target function. The example in FIG. 8 assumes that the product development goal is to maximize the target function D5_k.
For simplicity, the horizontal axis of diagrams 801 and 802 shows the feature vector D4(vec) in one dimension, although D4(vec) is normally a multidimensional vector. With D5_k on the vertical axis, the prediction model M(m) appears as the response curve 806 that outputs D5_k for a given D4(vec) (a response surface when D4(vec) is multidimensional).
In diagrams 801 and 802, as D4(vec) increases from left to right, the output D5_k of prediction model M(m) 806 first rises, then levels off, then rises again to its predicted maximum at the feature vector D4(vec)(candidate)(p), and decreases thereafter.
The training data D4_train(i) used to create M(m) 806 are plotted as black squares in feature space, as shown in legend 803. Point 804 represents the usage history: it shows the response of M(m) at D4(vec)(candidate)(p), the point on the horizontal D4(vec) axis that corresponds to the conditions under which design and prototyping were performed.
Next, using diagram 802, we describe how to define the contribution of an arbitrarily selected training record D3'_train(i) to point 804, which represents the response of M(m). Below, the feature vector of the selected training record D3'_train(i) is written D4'_train(vec)(i) and its objective variable D5'_train(i).
The contribution of D3'_train(i) to the predictions of M(m) is quantified as follows. The feature vector D4(vec)(candidate)(p) representing the design and prototype conditions proposed by M(m) is the one stored in the prediction model usage history database 709 of FIG. 7 in processing step S111.
First, the prediction model M(m) takes the feature vector D4(vec)(candidate)(p) corresponding to the design and prototype conditions as input and outputs the value D5(predict)(p,k). This response is shown by the circle 804.
The contribution of a training record D3'_train(i) to the prediction D5(predict)(p,k) of M(m) is quantified by comparing the prediction of M(m) (solid black line) with the prediction (dashed black line) of the model M_delete(D3'_train(i), m) 810, that is, the model that would have been obtained had D3'_train(i) been excluded from the training data set when M(m) was generated.
For the feature vector D4(vec)(candidate)(p) corresponding to the design and prototype conditions, M_delete(D3'_train(i), m) outputs the black cross 807. In other words, M(m) predicts that the design and prototype condition maximizing the target function value D5_k lies at the black circle 804 for design and prototype data (p), whereas without the training record D3'_train(i), the model M_delete(D3'_train(i), m) given D4(vec)(candidate)(p) as input would not have predicted that D5_k is maximized there. D3'_train(i) therefore contributed substantially to the proposal of design and prototype condition (p).
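The comparison between M(m) and M_delete(D3'_train(i), m) can be sketched by brute-force retraining, once per training record. To keep the example small, it assumes a one-dimensional feature (as drawn in FIG. 8) and uses an ordinary least-squares line as a stand-in for the prediction model; the patent does not prescribe this model, and the function names are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

Point = Tuple[float, float]      # (x, y): a 1-D feature D4 and objective D5_k
Model = Callable[[float], float]


def fit_line(data: Sequence[Point]) -> Model:
    """Toy stand-in for model training: ordinary least squares y = a + b*x.
    Needs at least two distinct x values."""
    n = len(data)
    mx = sum(x for x, _ in data) / n
    my = sum(y for _, y in data) / n
    sxx = sum((x - mx) ** 2 for x, _ in data)
    sxy = sum((x - mx) * (y - my) for x, y in data)
    b = sxy / sxx
    a = my - b * mx
    return lambda x: a + b * x


def leave_one_out_predictions(train: Sequence[Point], x_candidate: float,
                              fit: Callable[[Sequence[Point]], Model] = fit_line
                              ) -> Tuple[float, List[float]]:
    """Return D5(predict) from the model trained on all records, and
    D5(v-predict[i]) from each model retrained without record i (M_delete)."""
    full = fit(train)(x_candidate)
    deleted = [fit(train[:i] + train[i + 1:])(x_candidate) for i in range(len(train))]
    return full, deleted


train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 7.0)]
full, deleted = leave_one_out_predictions(train, x_candidate=3.0)
```

With these toy data, the full model predicts about 5.8 at x = 3, while the model trained without the last record (3, 7) predicts 3.0, so that record is what pushes the prediction up at this candidate point. This requires one retraining per record, which is the cost concern raised below.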
Two ways of quantifying this contribution are presented: one for when the measured value D5(actual)(p,k) of the objective variable D5_k (410 in FIG. 4) for D4(vec)(candidate)(p) becomes known after prototyping and can be used in the definition, and one for when D5(actual)(p,k) is not known.
First, consider the case in which the measured value D5(actual)(p,k) has been obtained through design and prototyping and is used to define the contribution. The contribution of any training record D3_train(i) can be defined from three values: the measured value D5(actual)(p,k) (410 in FIG. 4) of the objective variable D5_k for the design built under the conditions corresponding to D4(vec)(candidate)(p); the prediction D5(predict)(p,k) of M(m) for D4(vec)(candidate)(p) (408 in FIG. 4); and the prediction D5(v-predict[i])(p,k) of M_delete(D3'_train(i), m) (412 in FIG. 4). This contribution is written C(D3_train(i), p); it is determined by the i-th training record D3_train(i) and by p, the index of the feature vector D4(vec)(candidate)(p) corresponding to the design and prototype conditions. It is given by Equation (1).
[Equation (1)]
In Equation (1), the measured value D5(actual)(p,k) of the objective variable D5_k corresponds to 409 in FIG. 4; the prediction D5(predict)(p,k) of D5_k by M(m) corresponds to 408 in FIG. 4; and the prediction D5(v-predict(i))(p,k) of D5_k by M_delete(D3_train(i), m) corresponds to 412 in FIG. 4.
As indicated by 809 in FIG. 8, the contribution C(D3_train(i), p) is the difference marked by the double-headed arrow. Obtained by comparing the prediction D5(v-predict[i])(p,k) 807 with the prediction D5(predict)(p,k) 804, this difference expresses how much the training record D3'_train(i) improved the accuracy of the prediction of the measured value D5(actual) 808. Data that reduced prediction accuracy therefore receive a negative contribution.
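Equation (1) appears only as an image in the source. Its description (the improvement in prediction accuracy for the measured value, negative when accuracy got worse) is consistent with a reduction in absolute error, and the sketch below assumes that form: C = |D5(actual) - D5(v-predict[i])| - |D5(actual) - D5(predict)|. The original equation may differ, and the function name is hypothetical.

```python
def contribution_with_actual(actual: float, predict: float,
                             predict_without_i: float) -> float:
    """Assumed form of Eq. (1): how much including training record i reduced
    the absolute error of the prediction at D4(vec)(candidate)(p).

    actual            : D5(actual)(p, k), measured after prototyping
    predict           : D5(predict)(p, k), from M(m)
    predict_without_i : D5(v-predict[i])(p, k), from M_delete(D3_train(i), m)
    """
    error_without_i = abs(actual - predict_without_i)
    error_with_i = abs(actual - predict)
    return error_without_i - error_with_i   # > 0: record i improved accuracy
```

For example, with a measured value of 6.0, a full-model prediction of 5.8, and a prediction of 3.0 without record i, the contribution is about 2.8. If the measured value had been 3.0 instead, the same record would score about -2.8.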
Although it is preferable for the measured value D5(actual)(p,k) to be known, when it is not, the contribution can be defined as follows. Following Equation (2), the contribution is simply the difference between the predictions of the two models, M_delete(D3_train(i), m) and M(m), for the input feature vector D4(vec)(candidate)(p).
[Equation (2)]
In other words, the contribution of training record D3'_train(i) is defined as the amount by which the prediction for the feature vector D4(vec)(candidate)(p), which represents the design and prototype conditions proposed by M(m), changes depending on whether D3'_train(i) is included in the training data.
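Equation (2) is also an image in the source. The sketch below assumes the sign convention D5(predict) minus D5(v-predict[i]), so that a record that raised the prediction of the maximized objective D5_k gets a positive contribution. This sign choice and the function name are assumptions.

```python
from typing import List, Sequence


def contributions_without_actual(predict: float,
                                 predict_without: Sequence[float]) -> List[float]:
    """Assumed form of Eq. (2), evaluated for every training record at once.

    predict            : D5(predict)(p, k) from M(m)
    predict_without[i] : D5(v-predict[i])(p, k) from M_delete(D3_train(i), m)
    A positive value means including record i raised the prediction at
    D4(vec)(candidate)(p), supporting a proposal that maximizes D5_k.
    """
    return [predict - q for q in predict_without]
```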
In principle, the contribution can be obtained by creating a data set that excludes the training record D3'_train(i) whose contribution is sought, re-optimizing the prediction model to create M_delete(D3'_train(i), m), and deriving the resulting change in the prediction. This is impractical because of its computational cost, however, and the following alternatives are preferable. It is desirable to describe the contribution of each training record D3_train(i) to the prediction by focusing only on the model's response in a local region of feature space around the feature vector D4(vec)(candidate)(p), corresponding to design and prototype condition (p), that is input to the model. For example, for generalized linear models and convolutional neural networks, the method in [Pang Wei Koh and Percy Liang, "Understanding Black-box Predictions via Influence Functions," in International Conference on Machine Learning, 2017, pp. 1885-1894] can be applied. For ensemble learning methods such as random forests and GBDT (Gradient Boosted Decision Trees), the method in [Boris Sharchilev et al., "Finding Influential Training Samples for Gradient Boosted Decision Trees," arXiv preprint arXiv:1802.06640, 2018] can be used. The approach is not limited to these methods, however; any method suffices as long as the contribution can be defined from the change in the prediction caused by including or excluding each training record, or from a technique that can estimate that change.
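For ordinary least squares, the effect of deleting one training record has an exact closed form (a Sherman-Morrison rank-one update), so every leave-one-out prediction change can be computed from a single fit without retraining. This is not the influence-function method of Koh and Liang or the GBDT method of Sharchilev et al. It is only a simple special case of the same idea, and the sketch assumes a 1-D feature with an intercept. The function name is hypothetical.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float]   # (x, y): a 1-D feature and the objective D5_k


def loo_prediction_changes(train: Sequence[Point], x_candidate: float) -> List[float]:
    """For OLS y = a + b*x, return D5(predict) - D5(v-predict[i]) at x_candidate
    for every training record i, using
        change_i = xc^T (X^T X)^-1 x_i * e_i / (1 - h_ii),
    where e_i is the residual of record i and h_ii its leverage.
    Requires h_ii < 1, i.e. the fit must stay well-posed without record i."""
    n = len(train)
    sx = sum(x for x, _ in train)
    sxx = sum(x * x for x, _ in train)
    sy = sum(y for _, y in train)
    sxy = sum(x * y for x, y in train)
    det = n * sxx - sx * sx
    if det == 0:
        raise ValueError("need at least two distinct feature values")
    # (X^T X)^-1 for design-matrix rows (1, x)
    inv = ((sxx / det, -sx / det), (-sx / det, n / det))
    # Full-data coefficients: beta = (X^T X)^-1 X^T y
    a = inv[0][0] * sy + inv[0][1] * sxy
    b = inv[1][0] * sy + inv[1][1] * sxy

    def quad(u: Tuple[float, float], v: Tuple[float, float]) -> float:
        """u^T (X^T X)^-1 v for 2-vectors."""
        return sum(u[r] * inv[r][c] * v[c] for r in range(2) for c in range(2))

    xc = (1.0, x_candidate)
    changes = []
    for x, y in train:
        xi = (1.0, x)
        residual = y - (a + b * x)
        leverage = quad(xi, xi)
        changes.append(quad(xc, xi) * residual / (1.0 - leverage))
    return changes


train = [(0.0, 0.0), (1.0, 1.0), (2.0, 2.0), (3.0, 7.0)]
changes = loo_prediction_changes(train, x_candidate=3.0)
```

For these toy data, the change attributed to the last record is 2.8, which matches the difference obtained by actually refitting the line without that record (5.8 versus 3.0).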
The definitions above are based on contribution to the prediction and therefore apply only to the training data D3_train(i) within the data set. They do not define a contribution for the test data D3_test(i) or the validation data D3_validation(i), which are excluded from training when the model is created. Because D3_test(i) and D3_validation(i) do not contribute to training the prediction model, their contributions C(D3'_test(i), p) and C(D3'_validation(i), p) may simply be set to 0.
On the other hand, a record can lose its contribution, and with it its data value, merely because the data set split happened to place it in the test or validation data rather than the training data. To prevent this, a contribution can be defined for each test record D3_test(i) and validation record D3_validation(i) by associating it with training data near it in feature space.
For example, using the Euclidean distance in the feature vector space D4(vec), computed as the square root of the sum of squared differences over the feature components, a test record D3_test(i) can be treated as having the same value as the contribution C(D3_train(i), p) of the training record D3_train(i) closest to it. The contribution of a validation record D3_validation(i) to the prediction can likewise be defined as equal to that of the nearest training record D3_train(i).
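The nearest-training-record rule can be sketched as follows, using Python's `math.dist` for the Euclidean distance (Python 3.8 or later). The function name and the plain-list data layout are illustrative assumptions.

```python
import math
from typing import List, Sequence


def assign_from_nearest_train(train_features: Sequence[Sequence[float]],
                              train_contrib: Sequence[float],
                              other_features: Sequence[Sequence[float]]) -> List[float]:
    """For each test or validation feature vector, copy the contribution
    C(D3_train(i), p) of the training record nearest to it in Euclidean distance."""
    result = []
    for v in other_features:
        nearest = min(range(len(train_features)),
                      key=lambda j: math.dist(v, train_features[j]))
        result.append(train_contrib[nearest])
    return result


# Two training records with known contributions; two held-out records to score.
held_out = assign_from_nearest_train([(0.0, 0.0), (5.0, 5.0)], [0.2, 0.9],
                                     [(1.0, 0.0), (4.0, 6.0)])
```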
These operations define, for every record D3(i) in the data set, whether D3_train(i), D3_test(i), or D3_validation(i), its contribution C(D3(i), p) to the change in the prediction for the feature vector D4(vec)(candidate)(p) representing the design and prototype conditions proposed by M(m).
Next, the contributions C(D3(i), p) defined for the records D3(i) in the data set are standardized. Standardization makes them dimensionless, removing the effect of a contribution scale that depends on the target function being predicted. Any standardization method may be used, as long as it preserves the relative proportions of the contributions.
For example, because contributions defined by Equation (1) can be negative, negative values of C(D3(i), p) are replaced with 0 so that the value of such data does not become negative. Each C(D3(i), p) is then divided by the sum of the contributions C(D3(i), p) over all records D3(i) for the given feature vector D4(vec)(candidate)(p). The result is the standardized contribution C_standard(D3(i), p), which is used to calculate data value in the subsequent processing step S113.
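The clip-then-normalize example above can be sketched in a few lines. How to handle the case where no record has a positive contribution is not stated in the text; returning all zeros is an assumption made here to avoid division by zero.

```python
from typing import List, Sequence


def standardize_contributions(contribs: Sequence[float]) -> List[float]:
    """C_standard(D3(i), p): clip negative C(D3(i), p) to 0, then divide by the
    sum over all records for the same candidate vector D4(vec)(candidate)(p)."""
    clipped = [max(c, 0.0) for c in contribs]
    total = sum(clipped)
    if total == 0:
        # Assumption: if no record contributed positively, all values are 0.
        return [0.0] * len(clipped)
    return [c / total for c in clipped]
```

After this step, the standardized contributions for one candidate point are non-negative and sum to 1 (unless all were zero), which removes the units of D5_k.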
Processing step S112 thus yields the standardized contribution C_standard(D3(i), p) of each record D3(i) to the design prototype data D4(vec)(candidate)(p) in the prediction model usage history.
Next, the calculation in processing step S113 of FIG. 1 of a weight on each record's contribution according to its addition time, and the calculation in processing step S114 of each record's weighted contribution, are described with reference to FIG. 9.
FIG. 9 is a conceptual diagram of the method for weighting the contribution of each record D3(i) to the design prototype data point (p) according to the time at which the data was added.
The standardized contribution C_standard(D3(i), p) is the contribution of record D3(i) to the prediction, obtained from the change in the prediction for the input D4(vec)(candidate)(p), the design prototype data point (p) of FIG. 8. The following describes how, in S113, this contribution is weighted so that the value of the data reflects the time at which it was acquired.
In diagram 901 of FIG. 9, the vertical axis is the standardized contribution C_standard(D3(i), p) of record D3(i) to the prediction, obtained in processing step S112, and the horizontal axis is the Euclidean distance in the feature space D4(vec) from the feature vector D4(vec)(candidate)(p) of the design prototype data point (p). Three records D3(i), data (1), data (2), and data (3), are shown as the black triangle 905, the black square 906, and the black cross 907, respectively.
Here, the j-th component D4(i,j) of the feature vector D4(vec)(i) of each record D3(i) used to create the prediction model is often standardized over the whole data set (all i) when the data set is created, so that each component j has a standard deviation of 1. The features D4(i,j) therefore do not retain the scale differences specific to the information from which they were derived.
In FIG. 9, panel 902 plots the addition time t_i of data D3(i) on the horizontal axis and the weight change rate f(t_i) on the vertical axis. A larger t_i means that D3(i) was added to the database later; because the importance weight of the data decreases as t_i increases, the rate f(t_i) decreases.
Here, the time t_i at which record data D3(i) is added to the database corresponds to the time at which the raw data D2(h) from which D3(i) was generated was added to the database (503 in FIG. 5). Therefore, t_i can be obtained from that database addition time using the index h of the raw data D2(h) held by record data D3(i). In other words, the addition time t_i is a value that depends on h.
Panel 902 of FIG. 9 shows that, for the prediction result at design prototype data point (p), the later data are added to the database, the lower the weight change rate and hence the lower the data value. In this example, the data were added in the order 907, 906, 905, and the data value decreases in that order.
f(t_i) is a decreasing function; for example, it can be obtained using Eq. (3) with 0 < a < 1. Here, t_first is the t_i of the record data D3(i) acquired earliest in the data set.
Figure JPOXMLDOC01-appb-M000003 (Equation (3))
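Equation (3) appears only as an image, so its exact form is not available here. As an assumption, the sketch below uses the exponential form f(t_i) = a^(t_i - t_first), one decreasing function that uses 0 < a < 1 and t_first as the text describes and equals 1 for the earliest record. The function name `weight_continuous` and the default a = 0.9 are hypothetical.

```python
def weight_continuous(t_i, t_first, a=0.9):
    """Hypothetical decreasing weight f(t_i) = a ** (t_i - t_first), with 0 < a < 1.

    Returns 1 for the earliest record (t_i == t_first) and decays for later ones.
    This is an assumed form, not a transcription of Eq. (3).
    """
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    if t_i < t_first:
        raise ValueError("t_first must be the earliest addition time in the data set")
    return a ** (t_i - t_first)
```

With a = 0.5 and times in days, a record added two days after the first one would receive a weight of 0.25.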
As another example, f(t_i) need not treat the addition time t_i of raw data D2(h) to the database as a continuous variable; it may instead be a function that assigns the same value to all raw data acquired within a given period. For example, using a positive step width w, a non-negative integer q, and the staircase function S_w(t_i) of Eq. (4), which increases by w at every interval of width w, the weight change rate f(t_i) can be defined as in Eq. (5).
Figure JPOXMLDOC01-appb-M000004 (Equation (4))
Figure JPOXMLDOC01-appb-M000005 (Equation (5))
Here, a is defined as in Eq. (3). Using this f(t_i), for example, all raw data D2(h) acquired within the same period of width w receive the same importance weight.
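A sketch of the stepwise variant follows. The staircase S_w(t) = q·w for q·w ≤ t < (q+1)·w is one reading of the description of Eq. (4); combining it as f(t_i) = a^(S_w(t_i - t_first)) is an assumed reading of Eq. (5), which is not reproduced in the text, and all names are hypothetical.

```python
import math


def staircase(t, w):
    """S_w(t): equals q*w on the interval q*w <= t < (q+1)*w, for integer q >= 0."""
    if w <= 0:
        raise ValueError("step width w must be positive")
    if t < 0:
        raise ValueError("t must be non-negative")
    return math.floor(t / w) * w


def weight_stepwise(t_i, t_first, w, a=0.9):
    """Hypothetical f(t_i) = a ** S_w(t_i - t_first): constant within each period of width w."""
    if not 0 < a < 1:
        raise ValueError("a must satisfy 0 < a < 1")
    return a ** staircase(t_i - t_first, w)
```

With w = 2 and a = 0.5, records added at times 0 and 1 share the weight 1.0, records at 2 and 3 share 0.25, and so on.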
A similar approach is to define data acquisition periods arbitrarily, such as data acquisition phase 1, phase 2, and so on, with f(t_i) outputting a constant value within each period and the importance weight f(t_i) decreasing for later periods.
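The phase-based alternative can be sketched with a sorted list of phase start times. The phase boundaries and per-phase weights are hypothetical inputs chosen by the user, not values from the source.

```python
from bisect import bisect_right


def phase_of(t_i, phase_starts):
    """0-based acquisition phase of time t_i.

    phase_starts holds the sorted start times of phases 2, 3, ...; everything
    before phase_starts[0] belongs to phase 1 (index 0).
    """
    return bisect_right(phase_starts, t_i)


def weight_by_phase(t_i, phase_starts, phase_weights):
    """f(t_i) that is constant within a phase and non-increasing across phases."""
    if len(phase_weights) != len(phase_starts) + 1:
        raise ValueError("need one weight per phase")
    if any(later > earlier for earlier, later in zip(phase_weights, phase_weights[1:])):
        raise ValueError("weights must not increase for later phases")
    return phase_weights[phase_of(t_i, phase_starts)]
```

Using `bisect_right` places a record added exactly at a phase start time into the new phase.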
There is no particular restriction on f(t_i), and it is not limited to the above examples as long as it can represent a decrease in data value over time.
Using the weight change rate f(t_i) obtained in panel 902 of FIG. 9, each standardized contribution C_standard(D3(i), p) (the vertical axis of panel 901) is multiplied by f(t_i), converting the data contributions of panel 901 into the weighted contributions f(t_i) C_standard(D3(i), p) shown in panel 903.
In the example shown in panel 903, the three data points 908 before weighting, indicated by black circles, correspond to data (1) 905 through data (3) 907 of panel 901. Taking into account the weight change rate f(t_i) according to data addition time moves each data D3(i) as indicated by the black arrows.
For example, in panel 901 of FIG. 9, the contribution is highest for data (1) 905, followed by data (2) 906 and data (3) 907. In panel 903, however, once the weight change rate f(t_i) is taken into account, the order is reversed, running from data (3) 907 down to data (1) 905: data added earlier are rated more highly.
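The reweighting of panel 901 into panel 903 amounts to an element-wise product. The sketch below uses made-up contributions and addition times for data (1) to (3) and again assumes f(t_i) = a^(t_i - t_first), to reproduce the kind of ranking reversal described; the numbers are illustrative only and are not read from FIG. 9.

```python
def weighted_contributions(c_standard, add_times, a=0.8):
    """f(t_i) * C_standard(D3(i), p) with a hypothetical f(t_i) = a ** (t_i - t_first)."""
    t_first = min(add_times.values())
    return {i: (a ** (add_times[i] - t_first)) * c for i, c in c_standard.items()}


# Illustrative numbers only: data (1) has the largest raw contribution but was
# added last; data (3) has the smallest but was added first.
c_standard = {1: 0.5, 2: 0.3, 3: 0.2}
add_times = {1: 10, 2: 5, 3: 0}
weighted = weighted_contributions(c_standard, add_times)

raw_rank = sorted(c_standard, key=c_standard.get, reverse=True)
weighted_rank = sorted(weighted, key=weighted.get, reverse=True)
```

Here `raw_rank` is [1, 2, 3] and `weighted_rank` is [3, 2, 1], mirroring the reversal between panels 901 and 903.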
Thus, the weighted contribution f(t_i) C_standard(D3(i), p) of each data is obtained through processing steps S113 and S114. These weighted contributions are then standardized: the sum of f(t_i) C_standard(D3(i), p) is computed over all D3(i) subject to data value definition, and each weighted contribution is divided by this total. The standardized value is called the standardized weighted contribution c_{m,p}(i_h) and is used in processing step S115 to define the data value. Here, i_h denotes the index i of the record data D3(i) generated from raw data D2(h). During standardization, a threshold may also be applied to f(t_i) C_standard(D3(i), p) so that all weighted contributions other than the top n are set to 0 before standardizing.
Hereinafter, the standardized weighted contribution c_{m,p}(i_h) is used.
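The standardization into c_{m,p}(i_h), including the optional top-n cut-off, can be sketched as follows. `standardize_weighted` and its `top_n` parameter are hypothetical names, and ties at the cut-off are broken by input order.

```python
def standardize_weighted(weighted, top_n=None):
    """Normalize weighted contributions f(t_i) * C_standard so they sum to 1.

    If top_n is given, all but the top_n largest values are set to 0 first.
    """
    values = dict(weighted)
    if top_n is not None:
        keep = set(sorted(values, key=values.get, reverse=True)[:top_n])
        values = {i: (v if i in keep else 0.0) for i, v in values.items()}
    total = sum(values.values())
    if total <= 0:
        raise ValueError("weighted contributions must have a positive sum")
    return {i: v / total for i, v in values.items()}
```

For example, `standardize_weighted({1: 1.0, 2: 3.0})` gives `{1: 0.25, 2: 0.75}`.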
In processing step S115, the data value of each data D2(h) is defined based on the standardized weighted contributions c_{m,p}(i_h) obtained in processing step S114. Because data D3(i) are converted from D2(h), the index i is associated with h; the index i corresponding to h is therefore written i_h below. The derivation of the value V(h) of data D2(h) is described with reference to FIGS. 10, 11, and 12.
FIG. 10 is a conceptual diagram of an example of data-driven product development, FIG. 11 is a conceptual diagram of the relationship between product profit and the contribution of data (i_h), and FIG. 12 is the conversion definition table for FIG. 11.
At the product development site, design prototypes are produced under multiple conditions before the finished product (1000 in FIG. 10) is completed. The design prototype conditions proposed on the basis of prediction models M(m) are shown as 1001A and 1001B; they correspond to the design prototype data points (p), shown as point 804, in FIG. 8 described above.
Design prototypes 1001A and 1001B are design and prototype conditions proposed on the basis of the functional prediction results output by prediction models such as prediction model M(A) and prediction model M(B).
The prediction models M(A), M(B), ... that proposed these conditions are linked to the record data D3(i_h) contained in the data sets used to generate them. Each record data D3(i) is in turn linked to the raw data D2(h) used to generate it.
In data-driven product development, the feature data D4(vec)(candidate)(p) corresponding to product design prototype conditions are therefore linked to D2(h) through the prediction models M(m) and the record data D3(i_h), so the data value V(h) of each data D2(h) can be defined as product development progresses.
Following the linkage shown in FIG. 10, the data value V(h) is defined according to the product profit P_sales. The overall picture is shown at 1101 in FIG. 11, where the contribution regions of the development contribution elements within the product profit P_sales (1102 in FIG. 11) are indicated by 1103, 1104A, 1104B, 1105A, and 1105B.
Region 1102, the entire area shown at 1101 in FIG. 11, represents the total product profit P_sales. Within it, the region 1103 enclosed by the square is the share of product profit attributable to the data as a whole, determined by the data contribution rate R_data (1109); region 1103 therefore has size P_sales R_data.
The data contribution rate R_data may, for example, be set arbitrarily by contract before product design begins, or set arbitrarily according to production conditions after production starts. Since the way R_data is determined is not particularly restricted, any method that rationally determines the contribution rate R_data of the data as a whole to product development can be combined with the present technique, which determines the value of each data.
In FIG. 11, the data contribution region is region 1103 of size P_sales R_data. This region can be divided into contribution regions of the individual prediction models according to the contribution rate M_rate(m) (1110) of each prediction model M(m), where m is an index distinguishing the prediction models. For example, the contribution regions of prediction models M(A) and M(B) are shown as 1104A and 1104B. In general, the contribution region of prediction model M(m) has size M_rate(m) R_data P_sales.
Likewise, the contribution region of each prediction model M(m), of size M_rate(m) R_data P_sales, can be divided into contribution regions of the individual data D3(i). If C_m(i_h) (1111 in FIG. 11) denotes the contribution ratio of data D3(i_h) within the contribution region of a prediction model M(m), the contribution region of D3(i_h) within that model is C_m(i_h) M_rate(m) R_data P_sales.
The data value V(h) of raw data D2(h) can then be defined by Eq. (6), where m is the index distinguishing prediction models M(m), and the set I(D2(h)) has as its elements the indices i_h of the record data D3(i_h) linked to raw data D2(h).
Figure JPOXMLDOC01-appb-M000006 (Equation (6))
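Equation (6) is likewise only an image. Reading the region decomposition described above, one consistent form is V(h) = Σ_m Σ_{i_h ∈ I(D2(h))} C_m(i_h) · M_rate(m) · R_data · P_sales. The sketch below assumes that form; every figure in it is invented for illustration, and Eq. (7), which relates C_m(i_h) to c_{m,p}(i_h), is not modeled (C_m(i_h) is simply taken as given).

```python
def data_value(h, links, contrib, m_rate, r_data, p_sales):
    """Assumed reading of Eq. (6):
    V(h) = sum over m, and over i_h in I(D2(h)), of C_m(i_h) * M_rate(m) * R_data * P_sales.

    links:   h -> set I(D2(h)) of record indices i_h derived from raw data D2(h)
    contrib: m -> {i_h: C_m(i_h)}, contribution ratios within model M(m)
    m_rate:  m -> M_rate(m), contribution rate of model M(m)
    """
    region = r_data * p_sales  # size of the data contribution region 1103
    return sum(
        ratios.get(i, 0.0) * m_rate[m] * region
        for m, ratios in contrib.items()
        for i in links.get(h, ())
    )


# Invented figures for illustration only.
links = {0: {0, 1}, 1: {2}}
contrib = {"A": {0: 0.5, 1: 0.5}, "B": {1: 0.25, 2: 0.75}}
m_rate = {"A": 0.6, "B": 0.4}
values = {h: data_value(h, links, contrib, m_rate, r_data=0.1, p_sales=1_000_000) for h in links}
```

When each C_m sums to 1 over its records, the M_rate values sum to 1, and each record derives from exactly one raw data, the values V(h) add up to R_data · P_sales (here 70,000 + 30,000 = 100,000).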
Next, writing the weighted contribution of data D3(i_h) obtained in processing step S114 as c_{m,p}(i_h), the relationship between C_m(i_h) in Eq. (6) and the weighted contribution is given by Eq. (7).
Figure JPOXMLDOC01-appb-M000007 (Equation (7))
Equations (6) and (7) thus define the relationship between the weighted contributions c_{m,p}(i_h) and the data value V(h) of the raw data.
Because processing step S115 determines the data value V(h) of raw data D2(h), the data provider of D2(h) can be paid a reward as appropriate according to V(h), using the data provider information held by D2(h) (504 in FIG. 5).
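Paying providers then reduces to aggregating V(h) by the provider recorded for each D2(h). The text leaves the payout rule open ("as appropriate"), so the `payout_ratio` parameter below is a hypothetical stand-in for whatever rule is agreed.

```python
from collections import defaultdict


def rewards_by_provider(values, provider_of, payout_ratio=1.0):
    """Sum data values V(h) per provider and scale by a hypothetical payout ratio.

    values:      h -> V(h)
    provider_of: h -> provider recorded with raw data D2(h)
    """
    totals = defaultdict(float)
    for h, v in values.items():
        totals[provider_of[h]] += payout_ratio * v
    return dict(totals)
```

A provider that supplied several raw data receives the sum of their individual values.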
The above describes an embodiment of a data value definition method and a data collection promotion method that follow the sequence of processing steps S101 to S116 in the flowchart of FIG. 1. A plurality of data related to the product function to be optimized are acquired and converted into data for prediction model creation, whose elements are feature data and objective variable data (the evaluation value of the product function). From these data, a prediction model is created that takes as input a feature vector determined by a plurality of features and outputs a predicted value of the objective variable, the value that evaluates the function to be optimized. Using this model, a feature vector is obtained as input information indicating the optimal design prototype conditions. The contribution of each data for prediction model creation to the prediction result at that feature vector is defined as the degree to which the prediction result changes when that data is excluded from the training data set. Finally, based on these contributions, a value is assigned to each of the data related to the product function to be optimized.
This makes it possible to pay the data provider of each data an appropriate reward according to the data value so defined.
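The contribution definition summarized above (how much the prediction at the candidate feature vector changes when one record is left out of the training data) can be illustrated with a leave-one-out loop. The source does not fix the prediction model; purely as an assumption, the sketch uses Gaussian-kernel (Nadaraya-Watson) regression because it "retrains" in closed form, and it standardizes the absolute changes so they sum to 1. Names, bandwidth, and data are hypothetical.

```python
import math


def kernel_predict(train, x, bandwidth=1.0):
    """Nadaraya-Watson regression: Gaussian-weighted mean of training targets at x."""
    num = den = 0.0
    for features, y in train:
        d2 = sum((a - b) ** 2 for a, b in zip(features, x))
        w = math.exp(-d2 / (2.0 * bandwidth ** 2))
        num += w * y
        den += w
    if den == 0.0:
        raise ValueError("all kernel weights underflowed; increase the bandwidth")
    return num / den


def loo_contributions(train, x_candidate, bandwidth=1.0):
    """Standardized leave-one-out contribution of each training record to the
    prediction at x_candidate: |pred(all) - pred(all but i)|, scaled to sum to 1."""
    if len(train) < 2:
        raise ValueError("need at least two training records")
    full = kernel_predict(train, x_candidate, bandwidth)
    raw = [
        abs(full - kernel_predict(train[:i] + train[i + 1:], x_candidate, bandwidth))
        for i in range(len(train))
    ]
    total = sum(raw)
    return [r / total for r in raw] if total > 0 else raw


# Hypothetical training records (features, objective value) and candidate point p.
train = [([0.0], 0.0), ([1.0], 1.0), ([5.0], 10.0)]
contributions = loo_contributions(train, [0.5])
```

The distant record at 5.0 barely moves the prediction at 0.5, so its contribution is close to zero, while the two nearby records share almost all of it.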
Example 1 described the data value definition method; Example 2 describes a data value definition system that implements it on a computer. The system may include some or all of the functions of Example 1.
The configuration of the data value definition system is described with reference to FIG. 13, a schematic diagram of the data value definition system according to the present invention.
The data value definition system 1200 of FIG. 13 defines the value of each data from its contribution to product development, taking into account its contribution to the prediction results of the prediction models. It consists broadly of six functional parts.
These are: a data addition unit 10, which acquires data and adds them to the database; a data storage unit 20, which stores various data, such as the database built from acquired data and the prediction model usage history database; a prediction model creation unit 30, which creates prediction models from the accumulated database; a prediction model usage unit 40, which applies the created prediction models to product design; a data value definition unit 50, which defines the value of each data based on the prediction model usage history database; and an input/output unit 60, which handles input to and output from the analysis calculations. The internal configuration of each part is described in more detail below.
The data addition unit 10 has a data acquisition mechanism 11, a data addition time assignment mechanism 12, and a database addition mechanism 13, which handle data acquisition (processing step S102), assignment of an addition time to data (processing step S103), and addition to the database (processing step S104) in FIG. 1, respectively. Concretely, the data acquisition mechanism 11 may include measurement and analysis devices such as microscopes, IR analyzers, NMR analyzers, X-ray analyzers, electron beam analyzers, and computer-simulation analyzers. It may also handle data entered directly through the input mechanism 61 described later, or data received by the data communication mechanism 63; in this respect, the input mechanism 61 and the data communication mechanism 63 can also be regarded as part of the data acquisition mechanism 11.
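The role of mechanisms 11 to 13 (acquire data, stamp an addition time, append to the database) can be sketched as a small append-only store. The field names, the injectable clock, and the use of list position as the index h are assumptions for illustration; the stored addition time and provider correspond to items 503 and 504 of FIG. 5 as described earlier.

```python
import time
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass(frozen=True)
class RawRecord:
    h: int            # raw-data index h
    payload: Any      # acquired data D2(h), e.g. a measurement result
    added_at: float   # database addition time (cf. 503 in FIG. 5)
    provider: str     # data provider information (cf. 504 in FIG. 5)


class RawDatabase:
    """Append-only store: acquire (S102), stamp addition time (S103), append (S104)."""

    def __init__(self, clock: Callable[[], float] = time.time):
        self._clock = clock
        self.records: List[RawRecord] = []

    def add(self, payload: Any, provider: str) -> RawRecord:
        record = RawRecord(h=len(self.records), payload=payload,
                           added_at=self._clock(), provider=provider)
        self.records.append(record)
        return record
```

Injecting the clock keeps the addition time testable; a production system might instead rely on the database's own insertion timestamp.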
As long as the required functions are properly performed, the device configuration of each mechanism of the data addition unit 10 is not particularly limited, and conventional analysis devices (e.g., computers) and the measuring devices mentioned above (microscopes, etc.) can be used as appropriate.
The data captured by the data addition unit 10 are stored in the data storage mechanism 21 of the data storage unit 20. A plurality of data addition units 10 may be connected to a single data storage device, which has the advantage that data can be collected from multiple production and development sites and research bases and put to use.
The data storage mechanism 21 is not particularly limited as long as it can store the necessary data, and conventional data storage devices (e.g., random access memory (RAM), hard disk (HD), solid state drive (SSD)) can be used as appropriate. The data storage device need not be a single node: multiple nodes may be connected over a network and perform distributed processing. Likewise, for analysis processing suited to distributed execution, units 30, 40, and 50 may also be configured as multiple nodes connected over a network.
The prediction model creation unit 30 has a data set creation mechanism 31, a prediction model definition mechanism 32, a prediction model creation mechanism 33, and a prediction model performance evaluation mechanism 34, which handle data set creation (processing step S105), definition of the prediction model for the optimization target (processing step S106), prediction model creation (processing step S107), and prediction model performance evaluation (processing steps S108 and S109) in FIG. 1, respectively. Results calculated in each mechanism of the prediction model creation unit 30 are stored in the data storage mechanism 21.
The prediction model usage unit 40 has a prediction model usage mechanism 41 and a prediction model usage history addition mechanism 42, which handle use of the prediction model (processing step S110) and saving of the prediction model usage history (processing step S111) in FIG. 1, respectively. Results calculated and data created in each mechanism of the prediction model usage unit 40 are stored in the data storage mechanism 21.
The data value definition unit 50 has a mechanism 51 that analyzes the contribution of each data to the prediction model usage history, a mechanism 52 that obtains weights for the contributions based on each data addition time, and a mechanism 53 that converts weighted data contributions into data values. These handle, in FIG. 1, the analysis of each data's contribution to the prediction model usage history (processing step S112), the calculation of contribution weights according to each data addition time (processing step S113) and of each data's weighted contribution (processing step S114), and the conversion from weighted data contribution to data value (processing step S115), respectively.
As long as the required functions are properly performed, the device configuration of each mechanism of the prediction model creation unit 30, the prediction model usage unit 40, and the data value definition unit 50 is not particularly limited, and conventional analysis devices (e.g., computers) can be used as appropriate.
The input/output unit 60 has an input mechanism 61 for entering analysis conditions (e.g., direct input of acquired data, selection of the machine learning model, the model's hyperparameters, parameters of the optimization algorithm, and parameter search ranges) and an output mechanism 62 for outputting analysis results, and handles input and output for the flow from processing step S103 to processing step S116. Information such as the analysis conditions and analysis results for the input data is stored in the data storage mechanism 21, and output information can likewise be stored there.
As long as the required input and desired output are possible, the device configuration of the input mechanism 61 and the output mechanism 62 is not particularly limited, and conventional input/output devices (e.g., keyboard, display, printer) can be used as appropriate.
The embodiments above are described to aid understanding of the present invention, and the invention is not limited to the specific configurations described. For example, part of the configuration of an embodiment may be replaced with a configuration within the common general technical knowledge of those skilled in the art, or such a configuration may be added to an embodiment, and embodiments may be combined as appropriate. That is, part of the configuration of the embodiments in this specification may be deleted, replaced with another configuration, or supplemented with another configuration without departing from the technical idea of the invention.
Example 3 further extends the data value definition method of Example 1 and the data value definition system of Example 2 into a data collection promotion method or data collection promotion system that uses them.
In Example 3, when a plurality of data are acquired for the optimization target, data are obtained not only from within the company but also widely from external organizations. The data providers are then offered and paid consideration according to the value determined by the evaluation, further raising the incentive to supply data.
10: data addition unit, 20: data storage unit, 30: prediction model creation unit, 40: prediction model usage unit, 50: data value definition unit, 60: input/output unit

Claims (9)

  1.  A data value definition method, comprising: acquiring a plurality of data related to a function evaluation value to be optimized; converting the data into data for prediction model creation having, as elements, feature data and objective variable data that is the evaluation value; creating, from the plurality of data for prediction model creation, a prediction model that predicts an objective variable, which is a value evaluating the function to be optimized, by taking as input a feature vector determined from a plurality of features and outputting a predicted value of the objective variable; obtaining, using the prediction model, a feature vector that is input information indicating promising design prototype conditions; obtaining a contribution indicating how much the prediction result changes when each of the plurality of data for prediction model creation is not included in the training data; and determining, based on the contribution, a value for each of the initially acquired plurality of data related to the product function to be optimized.
  2.  The data value definition method according to claim 1,
     wherein, in obtaining the feature vector that is input information indicating the optimal design prototype conditions obtained using the prediction model and obtaining the contribution indicating how much the prediction result changes when each of the plurality of data for prediction model creation is not included in the training data, the contribution of each data to the prediction result is defined by focusing on how much whether the data is included in the training data affects changes in the local feature space.
  3.  The data value definition method according to claim 1 or 2,
     wherein a plurality of contributions of data obtained by design prototyping are obtained, and the contributions are corrected to higher-weighted contributions according to the order in which the contributions were obtained.
  4.  The data value definition method according to any one of claims 1, 2, and 3,
     wherein, with respect to the profit of the product to be optimized, the value of the data is determined in consideration of the contribution rate of each data within that profit, by using the data value definition method to evaluate the degree of influence on the prediction of the measured value of the objective variable, which is the design prototype result, when the prediction model proposed design conditions, thereby evaluating the entire series of processes by which the design data lead to the proposal of design prototype conditions.
  5.  The data value definition method according to claim 4,
     wherein a plurality of the prediction models are generated, and the contribution rate is set for the plurality of prediction models.
  6.  The data value definition method according to any one of claims 1 to 5, wherein, when the optimization target is a material, the plurality of data related to the optimization target are data related to that material.
  7.  A data collection facilitation method using the data value definition method according to any one of claims 1 to 6, wherein the data on the optimization target is obtained from a data provider, and compensation for the determined value of the data is presented to the data provider, thereby increasing the incentive to provide data.
  8.  A data value definition system comprising: a first means for acquiring a plurality of data from an optimization target; a second means for generating data sets each including a plurality of feature data and objective variable data; a third means for creating a prediction model of the optimization target from the plurality of data sets; a fourth means for comparing data obtained through design prototyping with the characteristics of the prediction model created on a plane defined by the feature vector, which is determined by the plurality of feature data, and the objective variable data; a fifth means for determining the contribution of the data obtained through design prototyping; and a sixth means for determining, from the contribution, the value of the data obtained through design prototyping.
  9.  A data collection facilitation system using the data value definition system according to claim 8, wherein the data on the optimization target is obtained from a data provider, and compensation for the determined value of the data is presented to the data provider, thereby increasing the incentive to provide data.
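Claim 2, together with the fifth means of claim 8, defines each data item's contribution by how much the prediction changes near the proposed optimal design condition when that item is left out of the training data. The sketch below shows one way to compute such a leave-one-out contribution. It assumes a Gaussian-kernel (Nadaraya-Watson) regressor as the prediction model, chosen because that model is inherently local. It then measures contribution as the absolute change in the prediction at the proposed feature vector `x_opt`. The model, the `bandwidth` parameter, and the function names are illustrative assumptions and are not given in the patent.

```python
import math
from typing import List, Sequence, Tuple

# A training point: (feature vector, measured objective-variable value).
Point = Tuple[Sequence[float], float]


def kernel_predict(data: List[Point], x: Sequence[float], bandwidth: float = 1.0) -> float:
    """Gaussian-kernel (Nadaraya-Watson) prediction at feature vector x.

    Hypothetical stand-in for the prediction model: training points close to x
    dominate the weighted average, so each point's influence is local.
    """
    numerator = denominator = 0.0
    for features, target in data:
        dist2 = sum((a - b) ** 2 for a, b in zip(features, x))
        weight = math.exp(-dist2 / (2.0 * bandwidth ** 2))
        numerator += weight * target
        denominator += weight
    return numerator / denominator if denominator > 0 else 0.0


def loo_contributions(data: List[Point], x_opt: Sequence[float],
                      bandwidth: float = 1.0) -> List[float]:
    """Contribution of each point: |change in the prediction at x_opt| when it is left out."""
    full = kernel_predict(data, x_opt, bandwidth)
    contributions = []
    for i in range(len(data)):
        reduced = data[:i] + data[i + 1:]  # training data without point i
        contributions.append(abs(full - kernel_predict(reduced, x_opt, bandwidth)))
    return contributions
```

As an example, take `data = [((0.0,), 1.0), ((1.0,), 2.0), ((5.0,), 10.0)]` and `x_opt = (0.5,)`. The two points near `x_opt` each receive a contribution of about 0.5, while the distant point at 5.0 contributes almost nothing. This matches the local behaviour the claim emphasizes.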
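Claim 3 corrects the contributions according to the order in which they were obtained, but it does not say how. The following is a minimal sketch that assumes a geometric schedule: the k-th contribution (k = 0 for the earliest) is multiplied by `decay ** k`, so earlier data keeps the higher weight. Both the schedule and its direction are assumptions, and the function name is hypothetical.

```python
from typing import List


def order_weighted_contributions(contributions: List[float], decay: float = 0.9) -> List[float]:
    """Re-weight contributions by the order in which they were obtained.

    Assumption: the k-th contribution (0 = earliest) is multiplied by decay ** k,
    so earlier data keeps a higher weight. Pass decay > 1 to favour later data.
    """
    if decay <= 0:
        raise ValueError("decay must be positive")
    return [c * decay ** k for k, c in enumerate(contributions)]
```

For example, `order_weighted_contributions([1.0, 1.0, 1.0], decay=0.5)` returns `[1.0, 0.5, 0.25]`.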
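Claims 4, 5 and 7 tie the value of the data to a share of the product's profit, allow several prediction models, and present the resulting value to data providers as compensation. The sketch below shows one possible reading. Each model's contributions are normalised into contribution rates, and the rates are averaged across models. Each provider's compensation is then the profit multiplied by its averaged rate. The averaging rule, the provider keys, and the function names are hypothetical, because the patent does not specify how rates from multiple models are combined.

```python
from typing import Dict, List


def contribution_rates(contributions: Dict[str, float]) -> Dict[str, float]:
    """Normalise one model's contributions so they sum to 1 (all zero if total <= 0)."""
    total = sum(contributions.values())
    if total <= 0:
        return {key: 0.0 for key in contributions}
    return {key: value / total for key, value in contributions.items()}


def allocate_profit(profit: float, per_model: List[Dict[str, float]]) -> Dict[str, float]:
    """Split `profit` among data providers (hypothetical reading of claims 4, 5 and 7).

    Each model's contributions are converted to rates, the rates are averaged
    over all models (a provider absent from a model gets rate 0 there), and
    each provider's compensation is profit * averaged rate.
    """
    if not per_model:
        return {}
    providers = sorted(set().union(*per_model))
    averaged = {p: 0.0 for p in providers}
    for contributions in per_model:
        for provider, rate in contribution_rates(contributions).items():
            averaged[provider] += rate / len(per_model)
    return {p: profit * rate for p, rate in averaged.items()}
```

Suppose the profit is 1000 and two models give contributions `{"A": 3.0, "B": 1.0}` and `{"A": 1.0, "B": 1.0}`. The averaged rates are then 0.625 and 0.375, so `allocate_profit` assigns 625.0 to provider A and 375.0 to provider B.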

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
JP2020-035620 2020-03-03
JP2020035620A JP2021140296A (en) 2020-03-03 2020-03-03 Data value definition method, data collection facilitation method, data value definition system, and data collection facilitation system

Publications (1)

Publication Number Publication Date
WO2021176753A1 (en) 2021-09-10

Family

ID=77612959

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/JP2020/033223 WO2021176753A1 (en) 2020-03-03 2020-09-02 Data value definition method, data collection facilitation method, data value definition system, and data collection facilitation system

Country Status (2)

Country Link
JP (1) JP2021140296A (en)
WO (1) WO2021176753A1 (en)

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
JP2008299486A (en) * 2007-05-30 2008-12-11 Toshiba Corp Data deletion device, and method and program for data deletion
US20100114664A1 (en) * 2007-01-16 2010-05-06 Bernard Jobin Method And System For Developing And Evaluating And Marketing Products Through Use Of Intellectual Capital Derivative Rights
JP2017520068A (en) * 2014-05-23 2017-07-20 データロボット, インコーポレイテッド Systems and techniques for predictive data analysis
JP2018180712A (en) * 2017-04-06 2018-11-15 テンソル・コンサルティング株式会社 Model variable candidate generating device and method
JP2018206200A (en) * 2017-06-07 2018-12-27 Kddi株式会社 Management device, method for management, and program
JP2020024541A (en) * 2018-08-07 2020-02-13 株式会社キーエンス Data analysis device and data analysis method

Cited By (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113792189A (en) * 2021-09-30 2021-12-14 中国人民解放军国防科技大学 Crowd-sourcing software development contribution efficiency evaluation method, device, equipment and medium
CN113792189B (en) * 2021-09-30 2024-05-14 中国人民解放军国防科技大学 Method, device, equipment and medium for evaluating contribution efficiency of crowd-sourced software development
CN115409419A (en) * 2022-09-26 2022-11-29 河南星环众志信息科技有限公司 Value evaluation method and device of business data, electronic equipment and storage medium
CN115409419B (en) * 2022-09-26 2023-12-05 河南星环众志信息科技有限公司 Method and device for evaluating value of business data, electronic equipment and storage medium

Also Published As

Publication number Publication date
JP2021140296A (en) 2021-09-16

Similar Documents

Publication Publication Date Title
CN109784806B (en) Supply chain control method, system and storage medium
Wang et al. Big data analytics for forecasting cycle time in semiconductor wafer fabrication system
Li et al. An integrated location-inventory problem in a closed-loop supply chain with third-party logistics
Semenoglou et al. Investigating the accuracy of cross-learning time series forecasting methods
Hatefi et al. Multi-criteria ABC inventory classification with mixed quantitative and qualitative criteria
Wang et al. Big data driven cycle time parallel prediction for production planning in wafer manufacturing
Song et al. Prioritising technical attributes in QFD under vague environment: a rough-grey relational analysis approach
Proietti et al. Dynamic factor analysis with non-linear temporal aggregation constraints
Makridakis et al. Statistical, machine learning and deep learning forecasting methods: Comparisons and ways forward
WO2021176753A1 (en) Data value definition method, data collection facilitation method, data value definition system, and data collection facilitation system
Attanasio et al. Towards an automated, fast and interpretable estimation model of heating energy demand: A data-driven approach exploiting building energy certificates
Garmabaki et al. Maintenance optimization using multi-attribute utility theory
JPWO2017056367A1 (en) Information processing system, information processing method, and information processing program
Zhang et al. Optimized scenario reduction: Solving large-scale stochastic programs with quality guarantees
Ebrahimnezhad et al. A new extended analytical hierarchy process technique with incomplete interval-valued information for risk assessment in IT outsourcing
US11995667B2 (en) Systems and methods for business analytics model scoring and selection
MirHassani et al. Quantum binary particle swarm optimization-based algorithm for solving a class of bi-level competitive facility location problems
Carrizosa et al. A sparsity-controlled vector autoregressive model
Yan et al. Evaluation of agri-product supply chain competitiveness based on extension theory
Dehghan Shoorkand et al. A deep learning approach for integrated production planning and predictive maintenance
Ira et al. Tuning of multivariable model predictive controllers through expert bandit feedback
Fu et al. Resilient supply chain framework for semiconductor distribution and an empirical study of demand risk inference
Guan et al. Ultra-short-term wind power prediction method combining financial technology feature engineering and XGBoost algorithm
Allu et al. Predicting the success rate of a start-up using lstm with a swish activation function
Narbaev et al. A machine learning study to improve the reliability of project cost estimates

Legal Events

Date Code Title Description
121 EP: the EPO has been informed by WIPO that EP was designated in this application
Ref document number: 20923407
Country of ref document: EP
Kind code of ref document: A1

NENP Non-entry into the national phase
Ref country code: DE

122 EP: PCT application non-entry in European phase
Ref document number: 20923407
Country of ref document: EP
Kind code of ref document: A1