CN107958268A - Training method and device for data model - Google Patents

Training method and device for data model

Info

Publication number
CN107958268A
Authority
CN
China
Prior art keywords
score
data model
scoring
modeling
index
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN201711175464.6A
Other languages
Chinese (zh)
Inventor
王雪洁
李长山
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Uf Financial Information Technology Ltd By Share Ltd
Original Assignee
Uf Financial Information Technology Ltd By Share Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Uf Financial Information Technology Ltd By Share Ltd filed Critical Uf Financial Information Technology Ltd By Share Ltd
Priority to CN201711175464.6A priority Critical patent/CN107958268A/en
Publication of CN107958268A publication Critical patent/CN107958268A/en
Pending legal-status Critical Current

Links

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/214 Generating training patterns; Bootstrap methods, e.g. bagging or boosting
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G06F18/20 Analysing
    • G06F18/21 Design or setup of recognition systems or techniques; Extraction of features in feature space; Blind source separation
    • G06F18/217 Validation; Performance evaluation; Active pattern learning techniques
    • G06F18/2193 Validation; Performance evaluation; Active pattern learning techniques based on specific statistical tests

Landscapes

  • Engineering & Computer Science (AREA)
  • Data Mining & Analysis (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • Artificial Intelligence (AREA)
  • Bioinformatics & Computational Biology (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Evolutionary Biology (AREA)
  • Evolutionary Computation (AREA)
  • Bioinformatics & Cheminformatics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Life Sciences & Earth Sciences (AREA)
  • Probability & Statistics with Applications (AREA)
  • Electrically Operated Instructional Devices (AREA)

Abstract

The present invention provides a training method and device for a data model. The training method includes: obtaining a modeling problem type and sample data, and identifying the sample data type; determining sample parameters and an issuable index according to the modeling problem type and the sample data; determining a modeling algorithm according to the modeling problem type, the sample parameters and a preset model selection strategy; training a data model according to the modeling algorithm, and inputting the sample data into the data model to obtain an output result; scoring the output result to obtain a scoring result; judging whether the scoring result meets the issuable index; and, when the scoring result does not meet the issuable index, optimizing the preset model selection strategy and returning to the step of determining a modeling algorithm according to the modeling problem type, the sample parameters and the preset model selection strategy. By introducing an automatic scoring mechanism into the modeling process, the invention optimizes both the data model and the model selection strategy, reduces manual intervention and improves modeling efficiency.

Description

Training method and device for data model
Technical Field
The invention relates to the technical field of data mining, in particular to a training method and device of a data model.
Background
Using mining analysis based on big data to support enterprise decision making requires accurate, good-quality data and a sound understanding of the business; only then can a targeted analysis and prediction model be trained from massive data with a suitable mining algorithm and deployed to production. Fig. 1 shows a flow diagram of a classical data mining process, CRISP-DM (Cross-Industry Standard Process for Data Mining). When analyzing and modeling business data, business modeling staff typically explore, process and model the data with analysis and mining tools such as SPSS, SAS and R, converting business problems into data problems and modeling the data once analysis and preparation are complete. During data modeling, the analysis and prediction model trained on the sample data must be evaluated (for example, for accuracy and error) to judge whether it can be put into a production environment and deployed to solve the business problem.
FIG. 2 shows a flow diagram of classical data modeling in the background of the invention. As shown in fig. 2, business modeling staff draw on their experience and the business problem, use statistical analysis and visual exploration, and select different mining algorithms (classification, clustering, association, etc.) to train on and evaluate the preprocessed (filtered, converted, combined, etc.) data. The parameter values of the corresponding algorithm model are obtained by training on the input sample data, and the accuracy of the model is evaluated on a validation data set to determine whether the model can be put into a production environment. In the production environment, generated business data is fed to the model as input, and the model computes an analysis and prediction result that serves as a reference for production decisions.
In the whole modeling analysis process, the part of the flow in the dotted frame and the production deployment process require modeling staff to select a mining algorithm based on their business domain knowledge and train it. When the training result does not meet the requirement (for example, the error is large), the algorithm or its parameters must be readjusted, and many attempts are often needed to find a relatively optimized model result. This step generally takes up a large portion of the entire analysis and mining project.
Disclosure of Invention
The present invention has been made to solve at least one of the problems occurring in the prior art or the related art.
To this end, a first aspect of the present invention is directed to a method for training a data model.
A second aspect of the present invention is to provide a training apparatus for data models.
In view of this, according to a first aspect of the present invention, a method for training a data model is provided, including: obtaining a modeling problem type and sample data, and identifying the sample data type; determining sample parameters and an issuable index according to the modeling problem type and the sample data; determining a modeling algorithm according to the modeling problem type, the sample parameters and a preset model selection strategy; training a data model according to the modeling algorithm, and inputting the sample data into the data model to obtain an output result; scoring the output result to obtain a scoring result; judging whether the scoring result meets the issuable index; and, when the scoring result does not meet the issuable index, optimizing the preset model selection strategy and returning to the step of determining a modeling algorithm according to the modeling problem type, the sample parameters and the preset model selection strategy.
The invention provides a training method of a data model, which comprises the steps of firstly identifying the type of obtained sample data (such as the sample data is digital or character type, continuous or discrete type and the like), determining sample parameters (such as classification indexes of classification problems, mean values of clustering problems and the like) and issuable indexes (such as the accuracy rate is more than 95% and the like) according to the type of the sample data and the type of obtained modeling problems (such as classification problems, clustering problems, association problems and the like), then selecting one or more modeling algorithms from a modeling algorithm cluster according to the type of the modeling problems, the sample parameters and a preset model selection strategy, training the data model, finally grading the data model by using the sample data, judging whether the grading result meets the issuable indexes, optimizing the preset model selection strategy if the grading result does not meet the issuable indexes, and returning to re-determine the modeling algorithms. According to the method, the corresponding mining algorithm is automatically selected through the preset model selection strategy to model the sample data, the preset model selection strategy is automatically optimized through evaluating the data model, manual intervention is not needed, the objectivity of the model is greatly improved, subjective omission and errors of modeling personnel are reduced, the deployable model meeting the production environment can be selected, the threshold of the business modeling personnel for applying the mining algorithm is reduced, and the modeling accuracy and efficiency are improved.
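The first step described above, identifying whether sample data is numeric or character typed and continuous or discrete, can be illustrated with a minimal sketch. The function name `identify_column_type` and the `max_discrete` cutoff are assumptions made for illustration; the patent does not specify how the type is detected.

```python
from typing import Sequence


def identify_column_type(values: Sequence[object], max_discrete: int = 10) -> str:
    """Label a column as 'character', 'numeric-discrete' or 'numeric-continuous'.

    max_discrete is a hypothetical cutoff: an integer-valued column with at
    most this many distinct values is treated as discrete.
    """
    # Any non-numeric value makes the column character typed (bool is excluded on purpose).
    if not all(isinstance(v, (int, float)) and not isinstance(v, bool) for v in values):
        return "character"
    distinct = set(values)
    all_integral = all(float(v).is_integer() for v in distinct)
    if all_integral and len(distinct) <= max_discrete:
        return "numeric-discrete"
    return "numeric-continuous"
```

A real platform would likely inspect column metadata as well; this sketch only looks at the values themselves.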
The training method of the data model according to the present invention may further have the following technical features:
In the above technical solution, preferably, determining a modeling algorithm according to the modeling problem type, the sample parameters and the preset model selection strategy specifically includes: determining the range of modeling algorithm types according to the modeling problem type; and determining a modeling algorithm within that range according to the sample parameters and the preset model selection strategy.
In this technical solution, the range of modeling algorithm types is determined according to the modeling problem type. For example, if the modeling problem is a classification problem, the algorithms corresponding to classification problems, such as decision trees, logistic regression and fuzzy rules, can be selected from the modeling algorithm cluster. Because the sample parameters reflect the characteristics of the sample data, one or more algorithms are then finally selected for modeling within that range according to the sample parameters and the preset model selection strategy, which makes modeling more accurate and reliable and improves modeling efficiency.
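The two-stage selection described above can be sketched as follows. Only the classification entries of the algorithm cluster come from the text; the clustering and association entries, the `Strategy` callable type and the function names are assumptions for illustration.

```python
from typing import Callable, Dict, List

# Hypothetical algorithm cluster. Only the classification entries are named in the text.
ALGORITHM_CLUSTER: Dict[str, List[str]] = {
    "classification": ["decision_tree", "logistic_regression", "fuzzy_rules"],
    "clustering": ["k_means"],    # assumed example
    "association": ["apriori"],   # assumed example
}

# Assumed shape of a selection strategy: decides whether an algorithm suits the sample parameters.
Strategy = Callable[[str, Dict[str, float]], bool]


def candidate_range(problem_type: str) -> List[str]:
    """Stage 1: narrow the cluster to the algorithms matching the problem type."""
    return list(ALGORITHM_CLUSTER.get(problem_type, []))


def select_algorithms(problem_type: str,
                      sample_params: Dict[str, float],
                      strategy: Strategy) -> List[str]:
    """Stage 2: apply the selection strategy inside the narrowed range."""
    return [name for name in candidate_range(problem_type)
            if strategy(name, sample_params)]
```

For instance, a strategy that rejects `logistic_regression` when a hypothetical `n_classes` parameter exceeds 2 would leave the decision tree and fuzzy rules as candidates for a three-class problem.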
In any of the above technical solutions, preferably, the scoring result includes: an accuracy score and at least one or a combination of the following: a performance index score, a stability index score and a custom index score.
In this technical solution, the scoring of the data model comprises an accuracy score, a performance index score, a stability index score and a custom index score. The user can select among them according to actual needs, and considering all of these aspects together helps ensure the reliability of the data model.
In any of the above technical solutions, preferably, the calculation formula of the scoring result is:
$$SCORE_{total} = SCORE_{acc} \times W_{acc} + SCORE_{perf} \times W_{perf} + SCORE_{robust} \times W_{robust} + SCORE_{cust} \times W_{cust}$$
where $SCORE_{total}$ is the total score, $SCORE_{acc}$ is the accuracy score, $W_{acc}$ is the preset accuracy score weight, $SCORE_{perf}$ is the performance index score, $W_{perf}$ is the preset performance index score weight, $SCORE_{robust}$ is the stability index score, $W_{robust}$ is the preset stability index score weight, $SCORE_{cust}$ is the custom index score, and $W_{cust}$ is the preset custom index score weight.
In this technical solution, the scoring result of the data model is a weighted sum of the accuracy score, the performance index score, the stability index score and the custom index score. The user can select one or more items according to actual needs to score the data model and adjust the weights accordingly. Generally, the accuracy score has the highest weight, which ensures the reliability of the data model.
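The weighted sum can be written as a short function. This is a sketch: the dictionary keys `acc`, `perf`, `robust` and `cust` are illustrative names, and treating a missing score or weight as zero is an assumption that mirrors the user leaving an item unselected.

```python
from typing import Dict

COMPONENTS = ("acc", "perf", "robust", "cust")


def total_score(scores: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of the four component scores.

    A component missing from either dict contributes zero, which models
    the user not selecting that item.
    """
    return sum(scores.get(c, 0.0) * weights.get(c, 0.0) for c in COMPONENTS)
```

With the example weights `{"acc": 10.0, "perf": 0.1, "robust": 0.5}` (arbitrary values chosen so that accuracy dominates), a model scoring 0.03 on accuracy, -2.0 on performance and 1 on stability receives a total of about 0.6.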
In any of the above technical solutions, preferably, the accuracy score is calculated as:
$$SCORE_{acc} = \begin{cases} 0, & acc < acc_{threshold} \\ acc - acc_{threshold}, & acc \geq acc_{threshold} \end{cases}$$
where $acc$ is the accuracy of the data model, $acc_{threshold}$ is the preset accuracy threshold, and the accuracy of the data model is the ratio of the number of correct results output by the data model to the number of sample data.
In the technical scheme, when the accuracy of the data model is smaller than a preset accuracy threshold, the accuracy of the data model is lower, which indicates that the data model cannot meet the production requirement, and the accuracy score is zero; when the accuracy of the data model is greater than or equal to the accuracy threshold, the accuracy score is the difference between the accuracy of the data model and the accuracy threshold, and the higher the accuracy of the data model is, the higher the accuracy score is.
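The piecewise rule described above translates directly into code. The function names are illustrative, and rejecting a non-positive sample count is an added safeguard rather than something the text specifies.

```python
def accuracy(correct: int, total: int) -> float:
    """Ratio of correct outputs to the number of sample data."""
    if total <= 0:
        raise ValueError("total must be positive")
    return correct / total


def accuracy_score(acc: float, acc_threshold: float) -> float:
    """Zero below the threshold; otherwise the margin above the threshold."""
    return acc - acc_threshold if acc >= acc_threshold else 0.0
```

Note that a model sitting exactly on the threshold also scores zero, so the accuracy score alone does not separate it from a model below the threshold.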
In any of the above technical solutions, preferably, the performance index scoring formula is:
$$SCORE_{perf} = T_{min} - T_{i}$$
where $SCORE_{perf}$ is the performance index score, $T_{min}$ is the minimum time spent training the data model, and $T_{i}$ is the time actually spent training the data model.
In this technical solution, the performance index score reflects the time cost of obtaining an output result for the same sample data. The time spent in each iteration of the data model training process is recorded, and the shortest of these is taken as the minimum time spent training the data model. The performance index score is the difference between this minimum time and the time actually spent training the data model, so the less time actually spent, the higher the performance index score.
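A small sketch of the performance index score follows. The helper `time_call` is a hypothetical way to record the duration of one training run; the patent does not say how time is measured or in which unit.

```python
import time
from typing import Callable, List


def time_call(train: Callable[[], object]) -> float:
    """Return the wall-clock seconds one training call takes."""
    start = time.perf_counter()
    train()
    return time.perf_counter() - start


def performance_scores(durations: List[float]) -> List[float]:
    """Score each run as T_min - T_i, where T_min is the shortest recorded duration."""
    t_min = min(durations)
    return [t_min - t for t in durations]
```

Because the minimum is subtracted from itself, the fastest run scores 0 and every other run scores a negative value, so the performance weight acts as a penalty on slower training.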
In any of the above technical solutions, preferably, if an abnormal condition occurs during the training of the data model and the difference between the output result of the data model under the abnormal condition and its output result under the non-abnormal condition is within a preset range, the stability index score $SCORE_{robust}$ is 1; otherwise, the stability index score $SCORE_{robust}$ is 0.
In this technical solution, abnormal conditions (such as field value control, insufficient computing resources and the like) may occur during the training of the data model. If the output result under the abnormal conditions differs little from the output result under non-abnormal conditions, the data model is relatively stable and the stability index score is 1; otherwise, the stability index score is 0. If no abnormal condition occurs, the user can set the weight of the stability index score to zero when calculating the total score, according to the actual situation.
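The stability check can be sketched as a comparison between two sets of outputs. The text does not define how the difference is measured, so the largest element-wise absolute difference used here, and the `tolerance` parameter standing in for the preset range, are assumptions.

```python
from typing import Sequence


def stability_score(normal_output: Sequence[float],
                    abnormal_output: Sequence[float],
                    tolerance: float) -> int:
    """Return 1 if outputs under abnormal conditions stay within tolerance, else 0.

    The difference measure (maximum absolute element-wise difference) is an
    assumption; outputs of different lengths are treated as unstable.
    """
    if len(normal_output) != len(abnormal_output):
        return 0
    max_diff = max(
        (abs(a - b) for a, b in zip(normal_output, abnormal_output)),
        default=0.0,
    )
    return 1 if max_diff <= tolerance else 0
```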
In any of the above technical solutions, preferably, when the modeling algorithm is a custom algorithm, the custom index score $SCORE_{cust}$ is a score given by a business expert that measures the effect of the data model; when the modeling algorithm is not a custom algorithm, the custom index score $SCORE_{cust}$ is 0.
In this technical solution, when a custom algorithm is selected for modeling, a custom index score needs to be included in the total score; this score is the business expert's rating of the effect of the data model.
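The custom index rule reduces to a simple branch. In this sketch the function name and the choice to raise an error when a custom algorithm has no expert score are assumptions; the text only states which value applies in each case.

```python
from typing import Optional


def custom_score(is_custom_algorithm: bool,
                 expert_score: Optional[float] = None) -> float:
    """Expert-given score for custom algorithms, 0 for built-in ones."""
    if not is_custom_algorithm:
        return 0.0
    if expert_score is None:
        # Assumed behavior: a custom algorithm cannot be scored without expert input.
        raise ValueError("a custom algorithm needs an expert-provided score")
    return expert_score
```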
In any of the above technical solutions, preferably, when the scoring result satisfies the issuable index, the data model with the highest total score is determined as the final data model.
In the technical scheme, when the scoring result meets the issuable index, the data model with the highest total score is selected as the final data model, so that the automatic screening of the model is realized, and the method is applied to the actual production environment.
In a second aspect of the present invention, an apparatus for training a data model is provided, including: the acquisition unit is used for acquiring the type of the modeling problem and sample data and identifying the type of the sample data; the first determining unit is used for determining sample parameters and issuable indexes according to the modeling problem type and sample data; the second determining unit is used for determining a modeling algorithm according to the modeling problem type, the sample parameter and a preset model selection strategy; the modeling unit is used for training a data model according to a modeling algorithm and inputting sample data into the data model to obtain an output result; the scoring unit is used for scoring the output result to obtain a scoring result; the judging unit is used for judging whether the grading result meets the issuable index or not; and the optimizing unit is used for optimizing the preset model selection strategy when the grading result does not meet the issuable index, and returning to determine the modeling algorithm continuously according to the modeling problem type, the sample parameter and the preset model selection strategy.
The invention provides a training device of a data model, which comprises the steps of firstly identifying the type of acquired sample data (for example, the sample data is digital or character type, continuous or discrete type and the like), determining sample parameters (for example, classification indexes of classification problems, mean values of clustering problems and the like) and issuable indexes (for example, the accuracy rate is more than 95% and the like) according to the type of the sample data and the type of acquired modeling problems (for example, classification problems, clustering problems, association problems and the like), then selecting one or more modeling algorithms in a modeling algorithm cluster according to the type of the modeling problems, the sample parameters and a preset model selection strategy, training the data model, finally grading the data model by using the sample data, judging whether the grading result meets the issuable indexes, if not, optimizing the preset model selection strategy, and returning to re-determine the modeling algorithms. According to the method, the corresponding mining algorithm is automatically selected through the preset model selection strategy to model the sample data, the preset model selection strategy is automatically optimized through evaluating the data model, manual intervention is not needed, the objectivity of the model is greatly improved, subjective omission and errors of modeling personnel are reduced, the deployable model meeting the production environment can be selected, the threshold of the business modeling personnel for applying the mining algorithm is reduced, and the modeling accuracy and efficiency are improved.
The training device for the data model according to the present invention may further have the following technical features:
In the foregoing technical solution, preferably, the second determining unit specifically includes: a third determining unit, used for determining the range of modeling algorithm types according to the modeling problem type; and a selection unit, used for determining the modeling algorithm within that range according to the sample parameters and the preset model selection strategy.
In this technical solution, the range of modeling algorithm types is determined according to the modeling problem type. For example, if the modeling problem is a classification problem, the algorithms corresponding to classification problems, such as decision trees, logistic regression and fuzzy rules, can be selected from the modeling algorithm cluster. Because the sample parameters reflect the characteristics of the sample data, one or more algorithms are then finally selected for modeling within that range according to the sample parameters and the preset model selection strategy, which makes modeling more accurate and reliable and improves modeling efficiency.
In any of the above technical solutions, preferably, the scoring result includes: an accuracy score and at least one or a combination of the following: a performance index score, a stability index score and a custom index score.
In this technical solution, the scoring of the data model comprises an accuracy score, a performance index score, a stability index score and a custom index score. The user can select among them according to actual needs, and considering all of these aspects together helps ensure the reliability of the data model.
In any of the above technical solutions, preferably, the calculation formula of the scoring result is:
$$SCORE_{total} = SCORE_{acc} \times W_{acc} + SCORE_{perf} \times W_{perf} + SCORE_{robust} \times W_{robust} + SCORE_{cust} \times W_{cust}$$
where $SCORE_{total}$ is the total score, $SCORE_{acc}$ is the accuracy score, $W_{acc}$ is the preset accuracy score weight, $SCORE_{perf}$ is the performance index score, $W_{perf}$ is the preset performance index score weight, $SCORE_{robust}$ is the stability index score, $W_{robust}$ is the preset stability index score weight, $SCORE_{cust}$ is the custom index score, and $W_{cust}$ is the preset custom index score weight.
In this technical solution, the scoring result of the data model is a weighted sum of the accuracy score, the performance index score, the stability index score and the custom index score. The user can select one or more items according to actual needs to score the data model and adjust the weights accordingly. Generally, the accuracy score has the highest weight, which ensures the reliability of the data model.
In any of the above technical solutions, preferably, the accuracy score is calculated as:
$$SCORE_{acc} = \begin{cases} 0, & acc < acc_{threshold} \\ acc - acc_{threshold}, & acc \geq acc_{threshold} \end{cases}$$
where $acc$ is the accuracy of the data model, $acc_{threshold}$ is the preset accuracy threshold, and the accuracy of the data model is the ratio of the number of correct results output by the data model to the number of sample data.
In the technical scheme, when the accuracy of the data model is smaller than a preset accuracy threshold, the accuracy of the data model is lower, which indicates that the data model cannot meet the production requirement, and the accuracy score is zero; when the accuracy of the data model is greater than or equal to the accuracy threshold, the accuracy score is the difference between the accuracy of the data model and the accuracy threshold, and the higher the accuracy of the data model is, the higher the accuracy score is.
In any of the above technical solutions, preferably, the performance index scoring formula is:
$$SCORE_{perf} = T_{min} - T_{i}$$
where $T_{min}$ is the minimum time spent training the data model, and $T_{i}$ is the time actually spent training the data model.
In this technical solution, the performance index score reflects the time cost of obtaining an output result for the same sample data. The time spent in each iteration of the data model training process is recorded, and the shortest of these is taken as the minimum time spent training the data model. The performance index score is the difference between this minimum time and the time actually spent training the data model, so the less time actually spent, the higher the performance index score.
In any of the above technical solutions, preferably, if an abnormal condition occurs during the training of the data model and the difference between the output result of the data model under the abnormal condition and its output result under the non-abnormal condition is within a preset range, the stability index score $SCORE_{robust}$ is 1; otherwise, the stability index score $SCORE_{robust}$ is 0.
In this technical solution, abnormal conditions (such as field value control, insufficient computing resources and the like) may occur during the training of the data model. If the output result under the abnormal conditions differs little from the output result under non-abnormal conditions, the data model is relatively stable and the stability index score is 1; otherwise, the stability index score is 0. If no abnormal condition occurs, the user can set the weight of the stability index score to zero when calculating the total score, according to the actual situation.
In any of the above technical solutions, preferably, when the modeling algorithm is a custom algorithm, the custom index score $SCORE_{cust}$ is a score given by a business expert that measures the effect of the data model; when the modeling algorithm is not a custom algorithm, the custom index score $SCORE_{cust}$ is 0.
In this technical solution, when a custom algorithm is selected for modeling, a custom index score needs to be included in the total score; this score is the business expert's rating of the effect of the data model.
In any of the above technical solutions, preferably, the optimization unit is further configured to determine the data model with the highest total score as the final data model when the scoring result satisfies the issuable index.
In the technical scheme, when the scoring result meets the issuable index, the data model with the highest total score is selected as the final data model, so that the automatic screening of the model is realized, and the method is applied to the actual production environment.
Additional aspects and advantages of the invention will be set forth in part in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention.
Drawings
The above and/or additional aspects and advantages of the present invention will become apparent and readily appreciated from the following description of the embodiments, taken in conjunction with the accompanying drawings of which:
FIG. 1 is a flow diagram illustrating a classical data mining process in the background of the invention;
FIG. 2 is a flow chart illustrating a classical data modeling in the background of the invention;
FIG. 3 illustrates a flow diagram of a method of training a data model according to an embodiment of the invention;
FIG. 4 shows a flow diagram of a method of training a data model according to an embodiment of the invention;
FIG. 5 shows a schematic block diagram of a training apparatus for a data model of an embodiment of the present invention;
FIG. 6 shows a schematic block diagram of a training apparatus for a data model of an embodiment of the present invention;
FIG. 7 is a schematic flow chart diagram illustrating a mining modeling method in accordance with an exemplary embodiment of the present invention;
FIG. 8 illustrates a model diagram of an auto-training evaluation mechanism in accordance with a specific embodiment of the present invention;
FIG. 9 is a schematic diagram illustrating the effect of the mining modeling method applied to the data analysis platform according to the embodiment of the present invention.
Detailed Description
In order that the above objects, features and advantages of the present invention can be more clearly understood, a more particular description of the invention will be rendered by reference to the appended drawings. It should be noted that the embodiments and features of the embodiments of the present application may be combined with each other without conflict.
In the following description, numerous specific details are set forth in order to provide a thorough understanding of the present invention, however, the present invention may be practiced in other ways than those specifically described herein, and therefore the scope of the present invention is not limited to the specific embodiments disclosed below.
Embodiments of the first aspect of the present invention provide a method for training a data model, and fig. 3 shows a flowchart of the method for training a data model according to an embodiment of the present invention. The training method of the data model shown in fig. 3 includes:
Step 102, obtaining a modeling problem type and sample data, and identifying the sample data type;
Step 104, determining sample parameters and an issuable index according to the modeling problem type and the sample data;
Step 106, determining a modeling algorithm according to the modeling problem type, the sample parameters and a preset model selection strategy;
Step 108, training a data model according to the modeling algorithm, and inputting the sample data into the data model to obtain an output result;
Step 110, scoring the output result to obtain a scoring result;
Step 112, judging whether the scoring result meets the issuable index;
Step 114, when the scoring result does not meet the issuable index, optimizing the preset model selection strategy, and returning to step 106.
The invention provides a training method of a data model, which comprises the steps of firstly identifying the type of obtained sample data (such as the sample data is digital or character type, continuous or discrete type and the like), determining sample parameters (such as classification indexes of classification problems, mean values of clustering problems and the like) and issuable indexes (such as the accuracy rate is more than 95% and the like) according to the type of the sample data and the type of obtained modeling problems (such as classification problems, clustering problems, association problems and the like), checking the contents of necessary items, value range and the like of the sample parameters, selecting one or more modeling algorithms from a modeling algorithm cluster according to the type of the modeling problems, the sample parameters and a preset model selection strategy, training the data model, finally grading the data model by using the sample data, judging whether a grading result meets the issuable indexes or not, optimizing the preset model selection strategy if the grading result does not meet the issuable indexes, and returning to redetermine the modeling algorithms. According to the method, the corresponding mining algorithm is automatically selected through the preset model selection strategy to model the sample data, the preset model selection strategy is automatically optimized through evaluating the data model, manual intervention is not needed, the objectivity of the model is greatly improved, subjective omission and errors of modeling personnel are reduced, the deployable model meeting the production environment can be selected, the threshold of the business modeling personnel for applying the mining algorithm is reduced, and the modeling accuracy and efficiency are improved.
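The loop formed by steps 106 to 114 can be sketched as below. The patent does not specify how the selection strategy is optimized, so this sketch swaps in a simple stand-in: the strategy is an ordered list of algorithm names and "optimizing" it means dropping the algorithm that just failed. The callables `train_and_score` and `meets_index` and the `max_rounds` cap are hypothetical.

```python
from typing import Callable, List, Optional, Tuple


def train_until_issuable(
    strategy: List[str],
    train_and_score: Callable[[str], float],
    meets_index: Callable[[float], bool],
    max_rounds: int = 10,
) -> Optional[Tuple[str, float]]:
    """Repeat select, train, score and check until a model meets the issuable index."""
    ranking = list(strategy)
    for _ in range(max_rounds):
        if not ranking:
            break
        algorithm = ranking[0]               # step 106: determine the modeling algorithm
        score = train_and_score(algorithm)   # steps 108-110: train, run on samples, score
        if meets_index(score):               # step 112: compare with the issuable index
            return algorithm, score
        ranking = ranking[1:]                # step 114: stand-in for optimizing the strategy
    return None
```

The `max_rounds` cap is a design choice added so the sketch always terminates; the text itself describes returning to step 106 without stating a stopping rule for the case where no model qualifies.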
FIG. 4 shows a flow diagram of a method for training a data model according to an embodiment of the invention. The training method of the data model shown in fig. 4 includes:
step 202, obtaining a modeling problem type and sample data, and identifying the sample data type;
step 204, determining sample parameters and issuable indexes according to the modeling problem type and sample data;
step 206, determining the range of the modeling algorithm type according to the modeling problem type;
step 208, determining a modeling algorithm within the range of modeling algorithm types according to the sample parameters and a preset model selection strategy;
step 210, training a data model according to a modeling algorithm, and inputting sample data into the data model to obtain an output result;
step 212, scoring the output result to obtain a scoring result;
step 214, judging whether the scoring result meets the issuable index;
step 216, when the scoring result does not meet the issuable index, optimizing a preset model selection strategy, and returning to step 208;
step 218, when the scoring result meets the issuable index, selecting the data model with the highest total score as the final data model, so as to realize automatic screening of the model, and applying the model to the actual production environment.
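The loop formed by steps 208 to 218 can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation of the invention: the function name `train_until_issuable`, the three callables it receives, the `max_rounds` limit, and the toy data are all hypothetical, and the issuable index is assumed to be a simple accuracy threshold.

```python
from typing import Callable, Dict, List, Optional, Tuple

Result = Tuple[float, float]  # (accuracy, total score) of one trained model


def train_until_issuable(
    select_algorithms: Callable[[dict], List[str]],
    train_and_score: Callable[[str], Result],
    optimize_strategy: Callable[[dict, Dict[str, float]], dict],
    strategy: dict,
    accuracy_threshold: float,
    max_rounds: int = 10,
) -> Optional[str]:
    """Repeat steps 208-216 until a model meets the issuable index.

    max_rounds is an assumption added so that the sketch always terminates.
    Returns the name of the selected algorithm, or None if none qualified.
    """
    for _round in range(max_rounds):
        candidates = select_algorithms(strategy)  # step 208
        # steps 210-212: train each candidate and score its output
        results = {name: train_and_score(name) for name in candidates}
        # step 214: keep only models whose accuracy exceeds the threshold
        passing = {
            name: total
            for name, (acc, total) in results.items()
            if acc > accuracy_threshold
        }
        if passing:
            # step 218: the passing model with the highest total score wins
            return max(passing, key=passing.get)
        # step 216: adjust the selection strategy and try again
        totals = {name: total for name, (_acc, total) in results.items()}
        strategy = optimize_strategy(strategy, totals)
    return None


# Toy stand-ins: the strategy is an offset into a fixed algorithm list.
ALGORITHMS = ["decision_tree", "logistic_regression", "fuzzy_rule"]
FAKE_RESULTS = {
    "decision_tree": (0.90, 0.0),
    "logistic_regression": (0.97, 0.02),
    "fuzzy_rule": (0.96, 0.01),
}


def select(strategy: dict) -> List[str]:
    return ALGORITHMS[strategy["offset"]:strategy["offset"] + 1]


def optimize(strategy: dict, totals: Dict[str, float]) -> dict:
    return {"offset": strategy["offset"] + 1}


best = train_until_issuable(
    select, FAKE_RESULTS.__getitem__, optimize, {"offset": 0}, 0.95
)
```

In this toy run the first candidate misses the 95% threshold, the strategy advances, and the second candidate is returned.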
In this embodiment, steps 206 and 208 first determine the range of modeling algorithm types according to the modeling problem type. For example, if the modeling problem is a classification problem, the algorithms in the modeling algorithm cluster that correspond to classification problems, such as decision trees, logistic regression, and fuzzy rules, may be selected. Because the sample parameters reflect the characteristics of the sample data, one or more algorithms are then selected for modeling within that range according to the sample parameters and the preset model selection strategy, which makes modeling more accurate and reliable and improves modeling efficiency.
In step 218, when the scoring result meets the issuable index, the data model with the highest total score is selected as the final data model, which realizes automatic screening of the model, and the model is applied to the actual production environment. For example, if the issuable index is an accuracy above 98%, the model with the highest total score is selected for deployment from among all data models whose accuracy exceeds 98%.
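The two-stage narrowing in steps 206 and 208 might look like the sketch below. It rests on assumptions: the `ALGORITHM_CLUSTER` table is hypothetical (only the classification entries and K-means are named in this document; `apriori` is an illustrative placeholder), the model selection strategy is reduced to a preference ordering, and the sample parameters are left out for brevity.

```python
from typing import Dict, List

# Hypothetical algorithm cluster: modeling problem type -> candidate algorithms.
ALGORITHM_CLUSTER: Dict[str, List[str]] = {
    "classification": ["decision_tree", "logistic_regression", "fuzzy_rule"],
    "clustering": ["k_means"],
    "association": ["apriori"],  # illustrative assumption, not from the text
}


def algorithm_range(problem_type: str) -> List[str]:
    """Step 206: narrow the cluster to algorithms that fit the problem type."""
    if problem_type not in ALGORITHM_CLUSTER:
        raise ValueError(f"unknown modeling problem type: {problem_type}")
    return list(ALGORITHM_CLUSTER[problem_type])


def choose_algorithms(
    problem_type: str, preferred_order: List[str], limit: int
) -> List[str]:
    """Step 208: pick up to `limit` algorithms from the range, in strategy order."""
    candidates = algorithm_range(problem_type)
    ranked = [name for name in preferred_order if name in candidates]
    # Algorithms the strategy does not mention follow in cluster order.
    ranked += [name for name in candidates if name not in ranked]
    return ranked[:limit]


chosen = choose_algorithms("classification", ["logistic_regression"], 2)
```

Keeping the range lookup separate from the strategy-driven choice mirrors the split between the third determining unit and the selecting unit described for the apparatus.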
In one embodiment of the present invention, preferably, the scoring result includes an accuracy score and at least one of: a performance index score, a stability index score, and a custom index score.
In this embodiment, the scoring of the data model comprises the accuracy score, the performance index score, the stability index score, and the custom index score. The user can select among them according to actual needs, and considering several aspects together helps ensure the reliability of the data model.
In one embodiment of the present invention, preferably, the calculation formula of the scoring result is:
$$\mathrm{SCORE}_{total} = \mathrm{SCORE}_{acc} \times W_{acc} + \mathrm{SCORE}_{perf} \times W_{perf} + \mathrm{SCORE}_{robust} \times W_{robust} + \mathrm{SCORE}_{cust} \times W_{cust}$$
where $\mathrm{SCORE}_{total}$ is the total score, $\mathrm{SCORE}_{acc}$ is the accuracy score, $W_{acc}$ is the preset accuracy score weight, $\mathrm{SCORE}_{perf}$ is the performance index score, $W_{perf}$ is the preset performance index score weight, $\mathrm{SCORE}_{robust}$ is the stability index score, $W_{robust}$ is the preset stability index score weight, $\mathrm{SCORE}_{cust}$ is the custom index score, and $W_{cust}$ is the preset custom index score weight.
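The weighted sum can be written directly in code. The sketch below is illustrative only; the `Scores` and `Weights` containers and the default of zero for unused components are assumptions, chosen so that an unselected score simply drops out of the total.

```python
from dataclasses import dataclass


@dataclass
class Scores:
    acc: float            # accuracy score
    perf: float = 0.0     # performance index score
    robust: float = 0.0   # stability index score
    cust: float = 0.0     # custom index score


@dataclass
class Weights:
    acc: float
    perf: float = 0.0
    robust: float = 0.0
    cust: float = 0.0


def total_score(s: Scores, w: Weights) -> float:
    """Weighted sum of the four component scores."""
    return (
        s.acc * w.acc
        + s.perf * w.perf
        + s.robust * w.robust
        + s.cust * w.cust
    )


# Example: only the accuracy and stability scores are selected.
example_total = total_score(
    Scores(acc=0.02, robust=1.0), Weights(acc=0.8, robust=0.2)
)
```

Here the total is 0.02 × 0.8 + 1 × 0.2 = 0.216, up to floating-point rounding.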
In one embodiment of the present invention, preferably, the accuracy score formula is:
$$\mathrm{SCORE}_{acc} = \begin{cases} 0, & acc < acc_{threshold} \\ acc - acc_{threshold}, & acc \geq acc_{threshold} \end{cases}$$
where $acc$ is the accuracy of the data model, $acc_{threshold}$ is a preset accuracy threshold, and the accuracy of the data model is the ratio of the number of correct results output by the data model to the number of sample data.
In one embodiment of the present invention, preferably, the performance index score formula is:
$$\mathrm{SCORE}_{perf} = T_{min} - T_i$$
where $\mathrm{SCORE}_{perf}$ is the performance index score, $T_{min}$ is the minimum time spent training the data model, and $T_i$ is the time actually spent training the data model.
In an embodiment of the present invention, preferably, if an abnormal condition occurs during the training of the data model and the difference between the output result of the data model and its output result in a non-abnormal condition is within a preset range, the stability index score $\mathrm{SCORE}_{robust}$ is 1; otherwise, $\mathrm{SCORE}_{robust}$ is 0.
In one embodiment of the present invention, preferably, when the modeling algorithm is a custom algorithm, the custom index score $\mathrm{SCORE}_{cust}$ is a score given by a business expert to measure the effect of the data model; when the modeling algorithm is not a custom algorithm, $\mathrm{SCORE}_{cust}$ is 0.
In this embodiment, the scoring result of the data model is a result of weighted summation of the accuracy rating, the performance index rating, the stability index rating and the user-defined index rating, and a user can select one or more of the accuracy rating, the performance index rating, the stability index rating and the user-defined index rating according to actual needs to score the data model and adjust the weight accordingly.
For the accuracy rating, when the accuracy of the data model is smaller than a preset accuracy threshold, the accuracy of the data model is lower, which indicates that the data model cannot meet the production requirement, and the accuracy rating is zero; when the accuracy of the data model is greater than or equal to the accuracy threshold, the accuracy score is the difference between the accuracy of the data model and the accuracy threshold, and the higher the accuracy of the data model is, the higher the accuracy score is.
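The accuracy rule just described is a thresholded difference. A minimal sketch, with hypothetical function names, could be:

```python
def accuracy(correct_count: int, sample_count: int) -> float:
    """Ratio of correct outputs to the number of sample data."""
    if sample_count <= 0:
        raise ValueError("sample_count must be positive")
    return correct_count / sample_count


def accuracy_score(acc: float, acc_threshold: float) -> float:
    """Zero below the threshold, otherwise the margin above it."""
    if acc < acc_threshold:
        return 0.0
    return acc - acc_threshold


below = accuracy_score(accuracy(90, 100), 0.95)   # model misses the threshold
above = accuracy_score(accuracy(98, 100), 0.95)   # model clears it by 0.03
```

A model exactly at the threshold scores zero as well, so the accuracy score alone cannot distinguish it from a failing model; the separate issuable-index check does that.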
In addition, the performance index score reflects the time cost of obtaining an output result for the same sample data. The time spent in each iteration of the data model training process is recorded, and the shortest recorded time is taken as the minimum time spent training the data model. The performance index score is the difference between this minimum time and the time actually spent training the data model, so the less time actually spent, the higher the performance index score.
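As a rough sketch of this rule, assume that one training time is recorded per iteration and keyed by a label; the function name and the dictionary layout are hypothetical.

```python
from typing import Dict


def performance_scores(train_times: Dict[str, float]) -> Dict[str, float]:
    """Score each recorded training time as T_min - T_i.

    The fastest run scores 0 and every slower run scores below 0.
    """
    if not train_times:
        raise ValueError("at least one training time is required")
    t_min = min(train_times.values())
    return {name: t_min - t_i for name, t_i in train_times.items()}


perf = performance_scores({"iteration_1": 5.0, "iteration_2": 2.0})
```

Because the score is never positive, its weight in the total acts as a penalty per unit of extra training time, in whatever time unit the recordings use.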
In addition, if an abnormal condition (such as field value control, insufficient computing resources and the like) occurs in the training process of the data model, and the output result of the data model under the abnormal condition is not greatly different from the output result under the non-abnormal condition, which indicates that the data model is relatively stable, the stability index score is 1, otherwise, the stability index score is 0, and if no abnormal condition occurs, the user can set the weight of the stability index score to zero when calculating the total score according to the actual condition.
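The stability rule compares the output under abnormal conditions with the output under normal conditions. The text does not say how the difference is measured, so the sketch below assumes, purely for illustration, that it is the fraction of predictions that differ; `stability_score` and `max_mismatch_ratio` are hypothetical names.

```python
from typing import Sequence


def stability_score(
    normal_output: Sequence,
    abnormal_output: Sequence,
    max_mismatch_ratio: float,
) -> int:
    """Return 1 if the two outputs differ within the preset range, else 0."""
    if not normal_output or len(normal_output) != len(abnormal_output):
        raise ValueError("outputs must be non-empty and of equal length")
    mismatches = sum(
        1 for a, b in zip(normal_output, abnormal_output) if a != b
    )
    ratio = mismatches / len(normal_output)
    return 1 if ratio <= max_mismatch_ratio else 0


stable = stability_score([1, 0, 1, 1], [1, 0, 1, 0], max_mismatch_ratio=0.25)
unstable = stability_score([1, 0, 1, 1], [1, 0, 1, 0], max_mismatch_ratio=0.1)
```

For continuous outputs a numeric distance would be a more natural difference measure; the binary result is the same either way.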
In addition, when the modeling is performed by using a custom algorithm, a custom index score, which is a score of the effect of the data model given by a business expert, needs to be set in the total score.
It should be noted that, when scoring the data model, the accuracy score is generally the most important factor and carries the largest weight, while the performance index score, the stability index score, and the custom index score are optional. The user can select one or more of them to evaluate the data model together with the accuracy score according to actual needs. For example, if a decision tree algorithm is used for modeling and an abnormality occurs during training, the user can evaluate the data model with the accuracy score and the stability index score, adjusting the corresponding weights accordingly.
In a second aspect of the present invention, a training apparatus for a data model is provided, and fig. 5 shows a schematic block diagram of the training apparatus for a data model according to an embodiment of the present invention. The training apparatus 300 of the data model shown in fig. 5 includes:
an obtaining unit 302, configured to obtain a modeling problem type and sample data, and identify the sample data type;
a first determining unit 304, configured to determine a sample parameter and a publishable index according to the modeling problem type and the sample data;
a second determining unit 306, configured to determine a modeling algorithm according to the modeling problem type, the sample parameter, and a preset model selection policy;
the modeling unit 308 is used for training the data model according to a modeling algorithm and inputting sample data into the data model to obtain an output result;
the scoring unit 310 is configured to score the output result to obtain a scoring result;
a judging unit 312, configured to judge whether the scoring result meets the issuable index;
and the optimizing unit 314 is configured to optimize the preset model selection policy when the scoring result does not meet the issuable index, and return to determine the modeling algorithm according to the modeling problem type, the sample parameter, and the preset model selection policy.
The invention provides a training device for a data model. The device first identifies the type of the acquired sample data (for example, whether the data is numeric or character data, continuous or discrete). It then determines sample parameters (for example, the classification index of a classification problem or the mean values of a clustering problem) and an issuable index (for example, an accuracy above 95%) according to the sample data type and the acquired modeling problem type (for example, a classification, clustering, or association problem), and checks the sample parameters for required items, value ranges, and the like. Next, it selects one or more modeling algorithms from a modeling algorithm cluster according to the modeling problem type, the sample parameters, and a preset model selection strategy, and trains the data model. Finally, it scores the data model using the sample data and judges whether the scoring result meets the issuable index; if not, it optimizes the preset model selection strategy and determines the modeling algorithm again. Because the mining algorithm is selected automatically through the preset model selection strategy, and that strategy is itself optimized automatically by evaluating the data model, no manual intervention is needed. This improves the objectivity of the model, reduces subjective omissions and errors by modeling personnel, allows a deployable model that meets the requirements of the production environment to be selected, lowers the threshold for business modeling personnel to apply mining algorithms, and improves modeling accuracy and efficiency.
FIG. 6 shows a schematic block diagram of a training apparatus for a data model according to an embodiment of the present invention. The training apparatus of the data model shown in fig. 6 includes:
an obtaining unit 402, configured to obtain a modeling problem type and sample data, and identify the sample data type;
a first determining unit 404, configured to determine a sample parameter and an issuable index according to the modeling problem type and the sample data;
a second determining unit 406, configured to determine a modeling algorithm according to the modeling problem type, the sample parameter, and a preset model selection policy;
the modeling unit 408 is used for training a data model according to a modeling algorithm and inputting sample data into the data model to obtain an output result;
the scoring unit 410 is used for scoring the output result to obtain a scoring result;
a judging unit 412, configured to judge whether the scoring result meets the issuable index;
the optimizing unit 414 is configured to optimize the preset model selection policy when the scoring result does not meet the issuable index, and return to determine a modeling algorithm according to the modeling problem type, the sample parameter, and the preset model selection policy;
the second determining unit 406 specifically includes:
a third determining unit 462, configured to determine a range of a modeling algorithm type according to the modeling problem type;
a selecting unit 464, configured to determine a modeling algorithm within a range of modeling algorithm types according to the sample parameters and a preset model selection policy;
and the optimizing unit 414 is further configured to determine the data model with the highest total score as the final data model when the scoring result satisfies the issuable index.
In this technical scheme, the range of modeling algorithm types is first determined according to the modeling problem type. For example, if the modeling problem is a classification problem, the algorithms in the modeling algorithm cluster that correspond to classification problems, such as decision trees, logistic regression, and fuzzy rules, may be selected. Because the sample parameters reflect the characteristics of the sample data, one or more algorithms are then selected for modeling within that range according to the sample parameters and the preset model selection strategy, which makes modeling more accurate and reliable and improves modeling efficiency.
And when the scoring result meets the issuable index, selecting the data model with the highest total score as the final data model, realizing automatic screening of the model, and applying the model to the actual production environment.
In one embodiment of the present invention, preferably, the scoring result includes an accuracy score and at least one of: a performance index score, a stability index score, and a custom index score.
In this embodiment, the scoring of the data model comprises the accuracy score, the performance index score, the stability index score, and the custom index score. The user can select among them according to actual needs, and considering several aspects together helps ensure the reliability of the data model.
In one embodiment of the present invention, preferably, the calculation formula of the scoring result is:
$$\mathrm{SCORE}_{total} = \mathrm{SCORE}_{acc} \times W_{acc} + \mathrm{SCORE}_{perf} \times W_{perf} + \mathrm{SCORE}_{robust} \times W_{robust} + \mathrm{SCORE}_{cust} \times W_{cust}$$
where $\mathrm{SCORE}_{total}$ is the total score, $\mathrm{SCORE}_{acc}$ is the accuracy score, $W_{acc}$ is the preset accuracy score weight, $\mathrm{SCORE}_{perf}$ is the performance index score, $W_{perf}$ is the preset performance index score weight, $\mathrm{SCORE}_{robust}$ is the stability index score, $W_{robust}$ is the preset stability index score weight, $\mathrm{SCORE}_{cust}$ is the custom index score, and $W_{cust}$ is the preset custom index score weight.
In one embodiment of the present invention, preferably, the accuracy score formula is:
$$\mathrm{SCORE}_{acc} = \begin{cases} 0, & acc < acc_{threshold} \\ acc - acc_{threshold}, & acc \geq acc_{threshold} \end{cases}$$
where $acc$ is the accuracy of the data model, $acc_{threshold}$ is a preset accuracy threshold, and the accuracy of the data model is the ratio of the number of correct results output by the data model to the number of sample data.
In one embodiment of the present invention, preferably, the performance index score formula is:
$$\mathrm{SCORE}_{perf} = T_{min} - T_i$$
where $\mathrm{SCORE}_{perf}$ is the performance index score, $T_{min}$ is the minimum time spent training the data model, and $T_i$ is the time actually spent training the data model.
In an embodiment of the present invention, preferably, if an abnormal condition occurs during the training of the data model and the difference between the output result of the data model and its output result in a non-abnormal condition is within a preset range, the stability index score $\mathrm{SCORE}_{robust}$ is 1; otherwise, $\mathrm{SCORE}_{robust}$ is 0.
In one embodiment of the present invention, preferably, when the modeling algorithm is a custom algorithm, the custom index score $\mathrm{SCORE}_{cust}$ is a score given by a business expert to measure the effect of the data model; when the modeling algorithm is not a custom algorithm, $\mathrm{SCORE}_{cust}$ is 0.
In this embodiment, the scoring result of the data model is a weighted summation result of the accuracy rating, the performance index rating, the stability index rating and the user-defined index rating, and a user can select one or more of the accuracy rating, the performance index rating, the stability index rating and the user-defined index rating according to actual needs to score the data model and adjust the weight accordingly.
For the accuracy rating, when the accuracy of the data model is smaller than a preset accuracy threshold, the accuracy of the data model is lower, which indicates that the data model cannot meet the production requirement, and the accuracy rating is zero; when the accuracy of the data model is greater than or equal to the accuracy threshold, the accuracy score is the difference between the accuracy of the data model and the accuracy threshold, and the higher the accuracy of the data model is, the higher the accuracy score is.
In addition, the performance index score reflects the time cost of obtaining an output result for the same sample data. The time spent in each iteration of the data model training process is recorded, and the shortest recorded time is taken as the minimum time spent training the data model. The performance index score is the difference between this minimum time and the time actually spent training the data model, so the less time actually spent, the higher the performance index score.
In addition, if an abnormal condition (such as field value control, insufficient computing resources and the like) occurs in the training process of the data model, and the output result of the data model under the abnormal condition is not greatly different from the output result under the non-abnormal condition, which indicates that the data model is relatively stable, the stability index score is 1, otherwise, the stability index score is 0, and if no abnormal condition occurs, the user can set the weight of the stability index score to zero when calculating the total score according to the actual condition.
In addition, when the modeling is performed by using a custom algorithm, a custom index score, which is a score of the effect of the data model given by a business expert, needs to be set in the total score.
It should be noted that, when scoring the data model, the accuracy score is generally the most important factor and carries the largest weight, while the performance index score, the stability index score, and the custom index score are optional. The user can select one or more of them to evaluate the data model together with the accuracy score according to actual needs. For example, if a decision tree algorithm is used for modeling and an abnormality occurs during training, the user can evaluate the data model with the accuracy score and the stability index score, adjusting the corresponding weights accordingly.
The specific embodiment is as follows:
FIG. 7 is a flowchart illustrating a mining modeling method according to an embodiment of the present invention. Comparing fig. 7 with fig. 2, it can be seen that in the method of fig. 7, modeling evaluation can be automatically performed on input data, an optimal model is determined and automatically deployed, seamless connection without manual intervention is achieved, and therefore, an available model does not need to be separately deployed in a production flow, and self-optimization updating is completed.
FIG. 8 illustrates a model diagram of an auto-training evaluation mechanism, in accordance with an embodiment of the present invention.
From the perspective of the overall business data flow, the automatic training evaluation device first performs automatic modeling on the sampled business data it receives (label 1), and then evaluates availability and performs self-iterative updating according to the model prediction results from the production environment (labels 2 and 4). In addition, the business problem definition module is mainly used to judge the type of the analysis modeling problem, such as classification, clustering, or relevance analysis. Business personnel are required to make this basic judgment and specify it as input.
The other main parts of the technical principle of the training evaluation device are as follows:
Mining model parameter definition:
The main functions are as follows: 1. identifying the data type of the sample, such as whether column data is numeric or character, continuous or discrete, and its missing values and data distribution; 2. determining sample parameters such as classification indexes, the K value, and convergence rules or penalty functions; 3. checking parameters, including required items and value ranges; 4. determining the issuable index of the mining model and its threshold (e.g., accuracy > 95%).
Modeling algorithm descriptor:
The descriptor mainly maintains basic description information for each algorithm in the algorithm cluster; this information is retrieved according to the algorithm type (classification, clustering, and the like), algorithm parameters, data types, and so on. It also supports extension with custom algorithms by providing an XML format in which information such as algorithm parameters and types is described and registered.
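The document does not give the XML format itself, so the following sketch invents one purely for illustration: every element name, attribute name, and the `REGISTRY` dictionary are assumptions. It shows how a custom algorithm description could be parsed and registered with the Python standard library.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Hypothetical descriptor; the schema is an assumption, not the patent's format.
DESCRIPTOR_XML = """
<algorithm name="my_custom_classifier" type="classification" custom="true">
  <dataTypes>numeric,discrete</dataTypes>
  <parameter name="max_depth" type="int" required="true"/>
  <parameter name="penalty" type="float" required="false"/>
</algorithm>
"""

REGISTRY: Dict[str, dict] = {}  # algorithm name -> parsed description


def parse_descriptor(xml_text: str) -> dict:
    """Turn one <algorithm> element into a plain dictionary."""
    root = ET.fromstring(xml_text.strip())
    data_types = root.findtext("dataTypes") or ""
    return {
        "name": root.get("name"),
        "type": root.get("type"),
        "custom": root.get("custom") == "true",
        "data_types": [t for t in data_types.split(",") if t],
        "parameters": {
            p.get("name"): {
                "type": p.get("type"),
                "required": p.get("required") == "true",
            }
            for p in root.findall("parameter")
        },
    }


def register(xml_text: str) -> dict:
    """Parse a descriptor and store it under the algorithm name."""
    description = parse_descriptor(xml_text)
    REGISTRY[description["name"]] = description
    return description


registered = register(DESCRIPTOR_XML)
```

Storing the `custom` flag alongside the type would let a scorer decide whether a custom index score applies to the model.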
Mining algorithm selector:
The selector mainly comprises two parts: 1. defining the applicable algorithm range within the algorithm cluster according to the modeling algorithm descriptions, in combination with the modeling problem definition and the sample parameter definition; 2. optimizing the model selection strategy according to the algorithm selection history and the scoring results, and determining the training parameters of the model, such as the selection strategy for the K value in K-MEANS.
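How the selection strategy is optimized from history is not specified here. One simple possibility, offered only as an assumption, is to re-rank the candidate algorithms by their mean past total score while trying untested algorithms first; `rerank_by_history` and the history layout are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def rerank_by_history(
    candidates: List[str], history: List[Tuple[str, float]]
) -> List[str]:
    """Order candidates for the next round of training.

    history holds (algorithm name, total score) pairs from earlier rounds.
    Untried algorithms come first; tried ones follow by mean score, best first.
    """
    past: Dict[str, List[float]] = defaultdict(list)
    for name, score in history:
        past[name].append(score)

    def sort_key(name: str) -> Tuple[bool, float]:
        scores = past.get(name)
        if not scores:
            return (False, 0.0)  # untried: sorts ahead of every tried algorithm
        return (True, -sum(scores) / len(scores))

    # sorted() is stable, so untried algorithms keep their original order.
    return sorted(candidates, key=sort_key)


order = rerank_by_history(
    ["decision_tree", "logistic_regression", "fuzzy_rule"],
    [("decision_tree", 0.1), ("logistic_regression", 0.3), ("decision_tree", 0.3)],
)
```

Putting untried algorithms first favors exploration; a strategy that prefers known good algorithms would simply flip the first element of the sort key.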
Modeling algorithm cluster:
The cluster consists of the basic mining modeling algorithms and the algorithm packages customized by business modeling personnel. It mainly comprises two parts: 1. the algorithm description, which includes the algorithm classification, algorithm parameters, output data, evaluation parameters, and the like; 2. the algorithm execution package, which supports multiple implementations, such as Java, Python, R, and other runtime environments, and whose runtime supports the PFM/PMML model format.
Model scorer:
The model scorer mainly evaluates the intermediate results output during the self-iteration of data model training and gives a score for each intermediate data model; the score serves as the basis for model optimization, so that an optimal data model is finally obtained. The scoring content comprises the accuracy index, the performance index, the stability index, and the custom index.
Analysis and prediction result evaluation:
The model deployed in the production environment is evaluated in real time. When the evaluation index falls below a preset threshold (specified when the sample parameters are defined), the self-optimization update mechanism of the model is triggered, so that automatic screening and automatic updating of the model are realized and the self-optimization of the data modeling process is completed automatically.
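A production-side trigger of this kind could be sketched as below. The sliding window, the class name `ProductionMonitor`, and the use of accuracy as the evaluation index are assumptions made for illustration; the document only states that an update is triggered when the index falls below a preset threshold.

```python
from collections import deque


class ProductionMonitor:
    """Tracks recent prediction outcomes and signals when to retrain."""

    def __init__(self, threshold: float, window: int = 100) -> None:
        self.threshold = threshold
        self._outcomes = deque(maxlen=window)  # True = correct prediction

    def accuracy(self) -> float:
        """Accuracy over the outcomes currently held in the window."""
        return sum(self._outcomes) / len(self._outcomes)

    def record(self, correct: bool) -> bool:
        """Record one outcome; return True if a model update should start.

        No signal is raised until the window is full, so that a few early
        mistakes do not trigger retraining.
        """
        self._outcomes.append(correct)
        window_full = len(self._outcomes) == self._outcomes.maxlen
        return window_full and self.accuracy() < self.threshold


monitor = ProductionMonitor(threshold=0.75, window=4)
signals = [monitor.record(ok) for ok in (True, True, False, False)]
```

In this toy run the fourth outcome fills the window with an accuracy of 0.5, which is below 0.75, so the last signal is `True`.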
Fig. 9 is a schematic diagram illustrating the effect of the mining modeling method applied to the data analysis platform according to the embodiment of the present invention. As shown in fig. 9, first, sample data is classified and then a data distribution map and an original data model are output, an iterative automatic update is performed on the data model, then, preprocessed data is substituted into the updated data model to output a predicted value, and finally, the accuracy of the model is evaluated and visualized, so that the observation and adjustment of business personnel are facilitated.
In the description herein, the description of the terms "one embodiment," "some embodiments," "specific embodiments," etc., means that a particular feature, structure, material, or characteristic described in connection with the embodiment or example is included in at least one embodiment or example of the invention. In this specification, the schematic representations of the terms used above do not necessarily refer to the same embodiment or example. Furthermore, the particular features, structures, materials, or characteristics described may be combined in any suitable manner in any one or more embodiments or examples.
The above description is only a preferred embodiment of the present invention and is not intended to limit the present invention, and various modifications and changes may be made by those skilled in the art. Any modification, equivalent replacement, or improvement made within the spirit and principle of the present invention should be included in the protection scope of the present invention.

Claims (18)

1. A method for training a data model, comprising:
obtaining a modeling problem type and sample data, and identifying the type of the sample data;
determining the sample parameters and a publishable index according to the modeling problem type and the sample data;
determining a modeling algorithm according to the modeling problem type, the sample parameters and a preset model selection strategy;
training a data model according to the modeling algorithm, and inputting the sample data into the data model to obtain an output result;
scoring the output result to obtain a scoring result;
judging whether the scoring result meets the publishable index;
and when the scoring result does not meet the publishable index, optimizing the preset model selection strategy, and returning to the step of determining a modeling algorithm according to the modeling problem type, the sample parameters and the preset model selection strategy.
2. The method for training a data model according to claim 1, wherein the determining a modeling algorithm according to the modeling problem type, the sample parameter, and a preset model selection strategy specifically comprises:
determining the range of the type of a modeling algorithm according to the type of the modeling problem;
and determining a modeling algorithm within the range of the type of the modeling algorithm according to the sample parameters and the preset model selection strategy.
3. The method for training a data model according to claim 1, wherein
the scoring result comprises an accuracy score and at least one of: a performance index score, a stability index score, and a custom index score.
4. The method for training a data model according to claim 3, wherein
the calculation formula of the scoring result is as follows:
$$\mathrm{SCORE}_{total} = \mathrm{SCORE}_{acc} \times W_{acc} + \mathrm{SCORE}_{perf} \times W_{perf} + \mathrm{SCORE}_{robust} \times W_{robust} + \mathrm{SCORE}_{cust} \times W_{cust}$$
where $\mathrm{SCORE}_{total}$ is the total score, $\mathrm{SCORE}_{acc}$ is the accuracy score, $W_{acc}$ is the preset accuracy score weight, $\mathrm{SCORE}_{perf}$ is the performance index score, $W_{perf}$ is the preset performance index score weight, $\mathrm{SCORE}_{robust}$ is the stability index score, $W_{robust}$ is the preset stability index score weight, $\mathrm{SCORE}_{cust}$ is the custom index score, and $W_{cust}$ is the preset custom index score weight.
5. The method for training a data model according to claim 4, wherein the accuracy score formula is:
$$\mathrm{SCORE}_{acc} = \begin{cases} 0, & acc < acc_{threshold} \\ acc - acc_{threshold}, & acc \geq acc_{threshold} \end{cases}$$
where $acc$ is the accuracy of the data model, $acc_{threshold}$ is a preset accuracy threshold, and the accuracy of the data model is the ratio of the number of correct results output by the data model to the number of sample data.
6. The method for training a data model according to claim 4, wherein the performance index score formula is:
$$\mathrm{SCORE}_{perf} = T_{min} - T_i$$
where $T_{min}$ is the minimum time spent training the data model and $T_i$ is the time actually spent training the data model.
7. The method for training a data model according to claim 4, wherein
if an abnormal condition occurs in the training process of the data model and the difference between the output result of the data model and the output result under a non-abnormal condition is within a preset range, the stability index score $\mathrm{SCORE}_{robust}$ is 1; otherwise, the stability index score $\mathrm{SCORE}_{robust}$ is 0.
8. The method for training a data model according to claim 4, wherein
when the modeling algorithm is a custom algorithm, the custom index score $\mathrm{SCORE}_{cust}$ is a score given by a business expert to measure the effect of the data model;
when the modeling algorithm is not a custom algorithm, the custom index score $\mathrm{SCORE}_{cust}$ is 0.
9. A method of training a data model according to any one of claims 1 to 8, further comprising:
and when the scoring result meets the publishable index, determining the data model with the highest total score as the final data model.
10. An apparatus for training a data model, comprising:
the acquisition unit is used for acquiring the type of the modeling problem and sample data and identifying the type of the sample data;
the first determining unit is used for determining the sample parameters and the issuable index according to the modeling problem type and the sample data;
the second determining unit is used for determining a modeling algorithm according to the modeling problem type, the sample parameter and a preset model selection strategy;
the modeling unit is used for training a data model according to the modeling algorithm and inputting the sample data into the data model to obtain an output result;
the scoring unit is used for scoring the output result to obtain a scoring result;
the judging unit is used for judging whether the scoring result meets the publishable index;
and the optimizing unit is used for optimizing the preset model selection strategy when the scoring result does not meet the publishable index, and returning to determine a modeling algorithm according to the modeling problem type, the sample parameter and the preset model selection strategy.
11. The apparatus for training a data model according to claim 10, wherein the second determining unit specifically includes:
the third determining unit is used for determining the range of the modeling algorithm type according to the modeling problem type;
and the selection unit is used for determining a modeling algorithm in the range of the type of the modeling algorithm according to the sample parameters and the preset model selection strategy.
12. The apparatus for training a data model according to claim 10, wherein
the scoring result comprises an accuracy score and at least one of: a performance index score, a stability index score, and a custom index score.
13. The apparatus for training a data model according to claim 12, wherein
the calculation formula of the scoring result is as follows:
$$\mathrm{SCORE}_{total} = \mathrm{SCORE}_{acc} \times W_{acc} + \mathrm{SCORE}_{perf} \times W_{perf} + \mathrm{SCORE}_{robust} \times W_{robust} + \mathrm{SCORE}_{cust} \times W_{cust}$$
where $\mathrm{SCORE}_{total}$ is the total score, $\mathrm{SCORE}_{acc}$ is the accuracy score, $W_{acc}$ is the preset accuracy score weight, $\mathrm{SCORE}_{perf}$ is the performance index score, $W_{perf}$ is the preset performance index score weight, $\mathrm{SCORE}_{robust}$ is the stability index score, $W_{robust}$ is the preset stability index score weight, $\mathrm{SCORE}_{cust}$ is the custom index score, and $W_{cust}$ is the preset custom index score weight.
14. The apparatus for training a data model according to claim 13, wherein the accuracy score formula is:
$$\mathrm{SCORE}_{acc} = \begin{cases} 0, & acc < acc_{threshold} \\ acc - acc_{threshold}, & acc \geq acc_{threshold} \end{cases}$$
where $acc$ is the accuracy of the data model, $acc_{threshold}$ is a preset accuracy threshold, and the accuracy of the data model is the ratio of the number of correct results output by the data model to the number of sample data.
15. The apparatus for training a data model according to claim 13, wherein the performance index score is formulated as:
$$SCORE_{perf} = T_{min} - T_i$$
wherein $T_{min}$ is the minimum time taken to train the data model, and $T_i$ is the time actually taken to train the data model.
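A small sketch of the performance index score follows. It assumes that $T_{min}$ is the shortest training time observed among the candidate models and that times are measured in seconds with `time.perf_counter`; neither detail is stated in the claim. Under that assumption the score is at most 0, and the fastest candidate scores exactly 0.

```python
import time


def timed_training(train_fn):
    """Run a training callable and return (model, elapsed seconds)."""
    start = time.perf_counter()
    model = train_fn()
    return model, time.perf_counter() - start


def performance_score(t_min, t_i):
    """SCORE_perf = T_min - T_i, as written in claim 15."""
    return t_min - t_i


# Illustrative training times in seconds for three candidate models.
times = [12.5, 10.0, 18.0]
perf_scores = [performance_score(min(times), t) for t in times]
```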
16. The apparatus for training a data model according to claim 13, wherein
if an abnormal condition occurs during training of the data model and the difference between the output result of the data model and its output result under the abnormal condition is within a preset range, the stability index score $SCORE_{robust}$ is 1; otherwise, the stability index score $SCORE_{robust}$ is 0.
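The binary stability index score can be sketched as below. The sketch assumes scalar outputs and treats the "preset range" as an absolute tolerance; it also follows the literal wording of the claim by returning 0 when no abnormal condition occurred (represented here by `None`). All names are illustrative.

```python
def stability_score(normal_output, abnormal_output, tolerance):
    """Return 1 if the output under an abnormal condition stays within tolerance, else 0."""
    if abnormal_output is None:  # no abnormal condition occurred
        return 0
    return 1 if abs(normal_output - abnormal_output) <= tolerance else 0
```

For example, `stability_score(0.90, 0.88, 0.05)` falls inside the tolerance, while `stability_score(0.90, 0.70, 0.05)` does not.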
17. The apparatus for training a data model according to claim 13, wherein
when the modeling algorithm is a custom algorithm, the custom index score $SCORE_{cust}$ is the score given by a business expert to measure the effect of the data model;
when the modeling algorithm is not a custom algorithm, the custom index score $SCORE_{cust}$ is 0.
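The rule for the custom index score reduces to a simple branch, sketched here. The function signature and the decision to raise an error when a custom algorithm has no expert score are assumptions; the claim does not say how a missing expert score is handled.

```python
def custom_score(is_custom_algorithm, expert_score=None):
    """Expert-assigned score for custom algorithms, 0 otherwise."""
    if not is_custom_algorithm:
        return 0.0
    if expert_score is None:
        raise ValueError("a custom algorithm needs an expert-assigned score")
    return float(expert_score)
```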
18. The apparatus for training a data model according to any one of claims 10 to 17, wherein
the optimizing unit is further configured to determine the data model with the highest total score as the final data model when the scoring result meets the publishable index.
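Choosing the final data model as described in claim 18 can be sketched as follows. The `(name, total_score)` tuples, the threshold comparison used for "meets the publishable index" and the `None` result when no candidate qualifies are assumptions made for illustration.

```python
def pick_final_model(candidates, publishable_score):
    """Return the (name, total_score) pair with the highest publishable score, or None."""
    publishable = [c for c in candidates if c[1] >= publishable_score]
    if not publishable:
        return None
    return max(publishable, key=lambda c: c[1])


# Illustrative candidates with made-up total scores.
candidates = [("tree", 0.72), ("forest", 0.85), ("logistic", 0.64)]
final_model = pick_final_model(candidates, publishable_score=0.70)
```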
CN201711175464.6A 2017-11-22 2017-11-22 The training method and device of a kind of data model Pending CN107958268A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201711175464.6A CN107958268A (en) 2017-11-22 2017-11-22 The training method and device of a kind of data model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201711175464.6A CN107958268A (en) 2017-11-22 2017-11-22 The training method and device of a kind of data model

Publications (1)

Publication Number Publication Date
CN107958268A true CN107958268A (en) 2018-04-24

Family

ID=61959558

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201711175464.6A Pending CN107958268A (en) 2017-11-22 2017-11-22 The training method and device of a kind of data model

Country Status (1)

Country Link
CN (1) CN107958268A (en)

Cited By (18)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664401A (en) * 2018-05-11 2018-10-16 阿里巴巴集团控股有限公司 Bury a little rational appraisal procedure and device
CN109190674A (en) * 2018-08-03 2019-01-11 百度在线网络技术(北京)有限公司 The generation method and device of training data
CN109242135A (en) * 2018-07-16 2019-01-18 阿里巴巴集团控股有限公司 A kind of model method for running, device and service server
CN109299178A (en) * 2018-09-30 2019-02-01 北京九章云极科技有限公司 A kind of application method and data analysis system
CN109389143A (en) * 2018-06-19 2019-02-26 北京九章云极科技有限公司 A kind of Data Analysis Services system and method for automatic modeling
CN109901979A (en) * 2019-01-24 2019-06-18 平安科技(深圳)有限公司 Model optimization intelligent evaluation method, server and computer readable storage medium
CN110020670A (en) * 2019-03-07 2019-07-16 阿里巴巴集团控股有限公司 A kind of model alternative manner, device and equipment
CN110209561A (en) * 2019-05-09 2019-09-06 北京百度网讯科技有限公司 Evaluating method and evaluating apparatus for dialogue platform
CN110222710A (en) * 2019-04-30 2019-09-10 北京深演智能科技股份有限公司 Data processing method, device and storage medium
CN110689134A (en) * 2018-07-05 2020-01-14 第四范式(北京)技术有限公司 Method, apparatus, device and storage medium for performing machine learning process
CN111078984A (en) * 2019-11-05 2020-04-28 深圳奇迹智慧网络有限公司 Network model publishing method and device, computer equipment and storage medium
CN111126419A (en) * 2018-10-30 2020-05-08 顺丰科技有限公司 Dot clustering method and device
CN111664550A (en) * 2019-01-18 2020-09-15 深圳创新奇智科技有限公司 Energy efficiency optimization method and system based on prediction model and optimization algorithm
CN111708810A (en) * 2020-06-17 2020-09-25 北京世纪好未来教育科技有限公司 Model optimization recommendation method and device and computer storage medium
CN112204581A (en) * 2018-06-05 2021-01-08 三菱电机株式会社 Learning device, deduction device, method and program
CN112381158A (en) * 2020-11-18 2021-02-19 山东高速信息集团有限公司 Artificial intelligence-based data efficient training method and system
CN113065658A (en) * 2021-03-30 2021-07-02 山东英信计算机技术有限公司 Method and system for improving accuracy of artificial intelligence inference result
CN113570257A (en) * 2021-07-30 2021-10-29 北京房江湖科技有限公司 Index data evaluation method and device based on scoring model, medium and equipment

Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567391A (en) * 2010-12-20 2012-07-11 ***通信集团广东有限公司 Method and device for building classification forecasting mixed model
CN103886203A (en) * 2014-03-24 2014-06-25 美商天睿信息***(北京)有限公司 Automatic modeling system and method based on index prediction
CN104807775A (en) * 2015-01-29 2015-07-29 湖南省农产品加工研究所 NIR spectrum analysis model and method used for identifying frying oil quality
CN104954210A (en) * 2015-06-19 2015-09-30 重庆邮电大学 Method for matching different service types in power distribution communication network with wireless communication modes
CN106663224A (en) * 2014-06-30 2017-05-10 亚马逊科技公司 Interactive interfaces for machine learning model evaluations
CN106897918A (en) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 A kind of hybrid machine learning credit scoring model construction method
CN107067179A (en) * 2017-04-20 2017-08-18 中国电子技术标准化研究院 A kind of industrial control system standard compliance assessment system
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method
CN107194526A (en) * 2017-03-29 2017-09-22 国网浙江省电力公司经济技术研究院 A kind of sales marketization reform progress appraisal procedure based on fuzzy clustering
CN107292412A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of problem Forecasting Methodology and forecasting system
US20190012581A1 (en) * 2017-07-06 2019-01-10 Nokia Technologies Oy Method and an apparatus for evaluating generative machine learning model

Patent Citations (11)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102567391A (en) * 2010-12-20 2012-07-11 ***通信集团广东有限公司 Method and device for building classification forecasting mixed model
CN103886203A (en) * 2014-03-24 2014-06-25 美商天睿信息***(北京)有限公司 Automatic modeling system and method based on index prediction
CN106663224A (en) * 2014-06-30 2017-05-10 亚马逊科技公司 Interactive interfaces for machine learning model evaluations
CN104807775A (en) * 2015-01-29 2015-07-29 湖南省农产品加工研究所 NIR spectrum analysis model and method used for identifying frying oil quality
CN104954210A (en) * 2015-06-19 2015-09-30 重庆邮电大学 Method for matching different service types in power distribution communication network with wireless communication modes
CN107292412A (en) * 2016-03-31 2017-10-24 阿里巴巴集团控股有限公司 A kind of problem Forecasting Methodology and forecasting system
CN106897918A (en) * 2017-02-24 2017-06-27 上海易贷网金融信息服务有限公司 A kind of hybrid machine learning credit scoring model construction method
CN107194526A (en) * 2017-03-29 2017-09-22 国网浙江省电力公司经济技术研究院 A kind of sales marketization reform progress appraisal procedure based on fuzzy clustering
CN107103050A (en) * 2017-03-31 2017-08-29 海通安恒(大连)大数据科技有限公司 A kind of big data Modeling Platform and method
CN107067179A (en) * 2017-04-20 2017-08-18 中国电子技术标准化研究院 A kind of industrial control system standard compliance assessment system
US20190012581A1 (en) * 2017-07-06 2019-01-10 Nokia Technologies Oy Method and an apparatus for evaluating generative machine learning model

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
殷复莲: "《数据分析与数据挖掘实用教程》", 30 September 2017 *

Cited By (27)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN108664401A (en) * 2018-05-11 2018-10-16 阿里巴巴集团控股有限公司 Bury a little rational appraisal procedure and device
CN108664401B (en) * 2018-05-11 2021-10-12 创新先进技术有限公司 Method and device for evaluating reasonability of buried point
CN112204581A (en) * 2018-06-05 2021-01-08 三菱电机株式会社 Learning device, deduction device, method and program
CN113935434A (en) * 2018-06-19 2022-01-14 北京九章云极科技有限公司 Data analysis processing system and automatic modeling method
CN109389143A (en) * 2018-06-19 2019-02-26 北京九章云极科技有限公司 A kind of Data Analysis Services system and method for automatic modeling
CN110689134A (en) * 2018-07-05 2020-01-14 第四范式(北京)技术有限公司 Method, apparatus, device and storage medium for performing machine learning process
CN109242135B (en) * 2018-07-16 2021-12-21 创新先进技术有限公司 Model operation method, device and business server
CN109242135A (en) * 2018-07-16 2019-01-18 阿里巴巴集团控股有限公司 A kind of model method for running, device and service server
CN109190674B (en) * 2018-08-03 2021-07-20 百度在线网络技术(北京)有限公司 Training data generation method and device
CN109190674A (en) * 2018-08-03 2019-01-11 百度在线网络技术(北京)有限公司 The generation method and device of training data
CN109299178A (en) * 2018-09-30 2019-02-01 北京九章云极科技有限公司 A kind of application method and data analysis system
CN111126419B (en) * 2018-10-30 2023-12-01 顺丰科技有限公司 Dot clustering method and device
CN111126419A (en) * 2018-10-30 2020-05-08 顺丰科技有限公司 Dot clustering method and device
CN111664550A (en) * 2019-01-18 2020-09-15 深圳创新奇智科技有限公司 Energy efficiency optimization method and system based on prediction model and optimization algorithm
CN109901979A (en) * 2019-01-24 2019-06-18 平安科技(深圳)有限公司 Model optimization intelligent evaluation method, server and computer readable storage medium
CN110020670B (en) * 2019-03-07 2023-07-18 创新先进技术有限公司 Model iteration method, device and equipment
CN110020670A (en) * 2019-03-07 2019-07-16 阿里巴巴集团控股有限公司 A kind of model alternative manner, device and equipment
CN110222710A (en) * 2019-04-30 2019-09-10 北京深演智能科技股份有限公司 Data processing method, device and storage medium
CN110209561A (en) * 2019-05-09 2019-09-06 北京百度网讯科技有限公司 Evaluating method and evaluating apparatus for dialogue platform
CN110209561B (en) * 2019-05-09 2024-02-09 北京百度网讯科技有限公司 Evaluation method and evaluation device for dialogue platform
CN111078984A (en) * 2019-11-05 2020-04-28 深圳奇迹智慧网络有限公司 Network model publishing method and device, computer equipment and storage medium
CN111078984B (en) * 2019-11-05 2024-02-06 深圳奇迹智慧网络有限公司 Network model issuing method, device, computer equipment and storage medium
CN111708810A (en) * 2020-06-17 2020-09-25 北京世纪好未来教育科技有限公司 Model optimization recommendation method and device and computer storage medium
CN111708810B (en) * 2020-06-17 2022-05-27 北京世纪好未来教育科技有限公司 Model optimization recommendation method and device and computer storage medium
CN112381158A (en) * 2020-11-18 2021-02-19 山东高速信息集团有限公司 Artificial intelligence-based data efficient training method and system
CN113065658A (en) * 2021-03-30 2021-07-02 山东英信计算机技术有限公司 Method and system for improving accuracy of artificial intelligence inference result
CN113570257A (en) * 2021-07-30 2021-10-29 北京房江湖科技有限公司 Index data evaluation method and device based on scoring model, medium and equipment

Similar Documents

Publication Publication Date Title
CN107958268A (en) The training method and device of a kind of data model
Husain An analysis of modeling audit quality measurement based on decision support systems (DSS)
US7849062B1 (en) Identifying and using critical fields in quality management
US8990145B2 (en) Probabilistic data mining model comparison
US20120016701A1 (en) Intelligent timesheet assistance
Staron et al. A method for forecasting defect backlog in large streamline software development projects and its industrial evaluation
US20110258008A1 (en) Business process model design measurement
CN115293667A (en) Management method of project progress and cost management system
CN112116184A (en) Factory risk estimation using historical inspection data
Herraiz et al. Impact of installation counts on perceived quality: A case study on debian
US20090099907A1 (en) Performance management
CN106096635B (en) The warning classification method of cost-sensitive neural network based on threshold operation
Raza et al. A model for analyzing performance problems and root causes in the personal software process
US20040015382A1 (en) Data-driven management decision tool for total resource management
CN112613718B (en) Specific place risk assessment method and device
CN108197740A (en) Business failure Forecasting Methodology, electronic equipment and computer storage media
CN114066322A (en) Method for evaluating unmanned station operation management and risk prevention and control capacity of oil and gas pipeline
CN113537759A (en) User experience measurement model based on weight self-adaptation
CN113128851A (en) Construction risk assessment method, device and equipment and computer readable storage medium
US9373084B2 (en) Computer system and information presentation method using computer system
CN117893100B (en) Construction method of quality evaluation data updating model based on convolutional neural network
US20230281505A1 (en) Automatic data quality monitoring using machine learning
US11768753B2 (en) System and method for evaluating and deploying data models having improved performance measures
CN117035469B (en) Method and device for measuring and calculating land indexes of public and railway intermodal transportation junction functional area construction
Liao et al. Implementation of traffic data quality verification for WIM sites

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
RJ01 Rejection of invention patent application after publication

Application publication date: 20180424
