CN113052509B - Model evaluation method, model evaluation device, electronic apparatus, and storage medium - Google Patents


Info

Publication number
CN113052509B
CN113052509B
Authority
CN
China
Prior art keywords
model, random, value, type, evaluated
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110487843.9A
Other languages
Chinese (zh)
Other versions
CN113052509A (en)
Inventor
李策
孔繁爽
刘晏萁
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Industrial and Commercial Bank of China Ltd ICBC
Original Assignee
Industrial and Commercial Bank of China Ltd ICBC
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Industrial and Commercial Bank of China Ltd (ICBC)
Priority to CN202110487843.9A
Publication of CN113052509A
Application granted
Publication of CN113052509B
Legal status: Active

Classifications

    • G06Q10/0635 Risk analysis of enterprise or organisation activities
    • G06F18/241 Classification techniques relating to the classification model, e.g. parametric or non-parametric approaches
    • G06N20/00 Machine learning
    • G06Q10/06393 Score-carding, benchmarking or key performance indicator [KPI] analysis
    • G06Q40/02 Banking, e.g. interest calculation or account maintenance


Abstract

The disclosure provides a model evaluation method in the field of artificial intelligence. The method comprises the following steps. A first index of a model to be evaluated is obtained based on the model's prediction results for user data in a test set, where the first index characterizes the prediction performance of the model to be evaluated and the model to be evaluated is a machine learning model. At least one random model is determined based on the type of the model to be evaluated, where the random model is a non-machine-learning model and the at least one random model comprises a first random model. A second index is obtained based on the first random model's random prediction results for the user data in the test set, where the second index characterizes the prediction performance of the first random model. The model to be evaluated is then evaluated based on the difference between the first index and the second index. The disclosure also provides a model evaluation apparatus, an electronic device, and a storage medium.

Description

Model evaluation method, model evaluation device, electronic apparatus, and storage medium
Technical Field
The present disclosure relates to the field of artificial intelligence, and more particularly, to a model evaluation method, a model evaluation apparatus, an electronic device, and a storage medium.
Background
Machine learning models are used in an ever wider range of industries, for example in risk prevention and control and in intelligent marketing in the financial industry. Before a machine learning model is formally deployed, its performance must be understood.
At present, an evaluation index is generally selected according to the model type and the modeling objective, and model quality is judged by comparing index values. For example, typical evaluation indexes for a classification model include the AUC value, accuracy, recall, and precision, while typical indexes for a regression model include the mean square error, the root mean square error, and the R² coefficient. Each type of evaluation index reflects model performance through differences in its numerical value.
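By way of illustration only, the common indexes named above can be computed as in the following sketch; the function names are illustrative and are not part of the disclosure.

```python
import math

def accuracy(y_true, y_pred):
    # Fraction of samples whose predicted label matches the true label.
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)

def precision(y_true, y_pred, positive=1):
    # Proportion of true positives among all samples predicted as positive.
    predicted_pos = [t for t, p in zip(y_true, y_pred) if p == positive]
    return sum(t == positive for t in predicted_pos) / len(predicted_pos) if predicted_pos else 0.0

def recall(y_true, y_pred, positive=1):
    # Proportion of actual positives that the model recovered.
    actual_pos = [p for t, p in zip(y_true, y_pred) if t == positive]
    return sum(p == positive for p in actual_pos) / len(actual_pos) if actual_pos else 0.0

def rmse(y_true, y_pred):
    # Root mean square error, a typical index for regression models.
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))
```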
In implementing the concepts of the present disclosure, the inventors found that at least the following problems exist in the prior art:
The evaluation indexes in the related art depend heavily on the quality of the test set used: if the test set is poorly constructed, the indexes cannot objectively reflect the effective performance of the model to be evaluated.
Disclosure of Invention
In view of the above, the embodiments of the present disclosure provide a model evaluation method that can objectively measure the effective performance of a machine learning model independently of test-set quality, together with a model evaluation apparatus, an electronic device, and a storage medium.
One aspect of the disclosed embodiments provides a model evaluation method comprising the following steps. A first index of the model to be evaluated is obtained based on the model's prediction results for user data in a test set, where the first index characterizes the prediction performance of the model to be evaluated and the model to be evaluated is a machine learning model. At least one random model is determined based on the type of the model to be evaluated, where the random model is a non-machine-learning model and the at least one random model comprises a first random model. A second index is obtained based on the first random model's random prediction results for the user data in the test set, where the second index characterizes the prediction performance of the first random model. The model to be evaluated is then evaluated based on the difference between the first index and the second index.
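The steps of the method can be sketched, by way of example only, as the following minimal pseudo-implementation. All names here are hypothetical: `model` and `random_model` stand for any callables that map an item of user data to a prediction, and `metric` for any evaluation index such as accuracy.

```python
def model_evaluation(model, random_model, metric, test_x, test_y):
    """Return the gap between the model's index and the random baseline's index."""
    # First index: prediction performance of the model to be evaluated.
    first_index = metric(test_y, [model(x) for x in test_x])
    # Second index: prediction performance of the random model on the same test set.
    second_index = metric(test_y, [random_model(x) for x in test_x])
    # The model is evaluated by the difference between the two indexes.
    return first_index - second_index
```

A larger positive gap indicates a model whose performance genuinely exceeds random prediction, regardless of how easy the test set is.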
According to an embodiment of the present disclosure, before the first random model is determined based on the type of the model to be evaluated, the method further comprises: determining a second random model based on the type of the model to be evaluated, where the prediction performance of the second random model is lower than that of the first random model and the at least one random model comprises the second random model; obtaining a third index based on the second random model's random prediction results for the user data in the test set, where the third index characterizes the prediction performance of the second random model; and, when the difference between the first index and the third index satisfies a preset condition, determining the first random model based on the type of the model to be evaluated.
According to an embodiment of the present disclosure, the method further comprises obtaining the type of the model to be evaluated, including: obtaining the type of the predicted value output when the model to be evaluated predicts the user data in the test set, where the predicted value corresponds to the prediction result; and determining the type of the model to be evaluated based on the type of the predicted value, where the type of the predicted value comprises at least one of a discrete value type and a continuous value type.
According to an embodiment of the disclosure, when the type of the predicted value is a discrete value type, determining at least one random model based on the type of the model to be evaluated comprises: determining a correct probability for each random model, where the correct probability is the probability that the random predicted value output when the random model predicts each item of user data in the test set matches the first tag value of that user data, the first tag value being a value labeled in advance according to the type of each item of user data, the first tag value being of a discrete value type, and the type of the first tag value corresponding to the type of the user data in the test set; and determining each random model based on its correct probability.
According to an embodiment of the present disclosure, determining each random model includes determining the first random model, specifically: obtaining the distribution proportion of each item of user data in a first training set, where each item of user data corresponds to one of the first tag values and the first training set is used to train the model to be evaluated; determining a corresponding first probability based on the distribution proportion of each item of user data; and determining the first random model based on the first probability, where the correct probability comprises the first probability.
According to an embodiment of the present disclosure, determining each random model includes determining the second random model, specifically: obtaining the N categories of the first tag value, where N is an integer greater than or equal to 2; determining a second probability of 1/N based on the N categories; and determining the second random model based on the second probability, where the correct probability comprises the second probability.
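A minimal sketch of the two discrete-case baselines described above, assuming the training labels are available as a plain list; the factory names are illustrative and not from the disclosure.

```python
import random

def make_uniform_random_model(labels, seed=0):
    # Second random model: each of the N label categories is predicted
    # with equal probability 1/N.
    categories = sorted(set(labels))
    rng = random.Random(seed)
    return lambda _user: rng.choice(categories)

def make_distribution_random_model(labels, seed=0):
    # First random model: predicts each label with its frequency (the
    # distribution proportion) in the training set, which on imbalanced
    # data yields a higher correct probability than the uniform baseline.
    rng = random.Random(seed)
    pool = list(labels)
    return lambda _user: rng.choice(pool)
```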
According to an embodiment of the disclosure, when the type of the predicted value is a continuous value type, determining at least one random model based on the type of the model to be evaluated comprises: determining a random prediction rule for each random model based on the second label value corresponding to each item of user data in a second training set, where the second training set is used to train the model to be evaluated; obtaining a corresponding continuous value set based on each random prediction rule; and determining each random model based on each continuous value set.
According to an embodiment of the present disclosure, the first random model predicting the user data in the test set includes: obtaining a second label value set that comprises the second label values corresponding to all user data in the second training set, where the continuous value set comprises the second label value set; and, when the first random model predicts each item of user data in the test set, randomly extracting a second label value from the second label value set as the random predicted value.
According to an embodiment of the present disclosure, the second random model predicting the user data in the test set includes: obtaining the maximum value and the minimum value among the second label values; determining a random prediction interval based on the maximum value and the minimum value, where the continuous value set comprises the random prediction interval; and, when the second random model predicts each item of user data in the test set, randomly drawing a value from the random prediction interval as the random predicted value.
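The two continuous-case baselines can likewise be sketched as follows, assuming the second label values of the training set are given as a list of numbers; function names are illustrative.

```python
import random

def make_label_sampling_model(train_label_values, seed=0):
    # First random model (continuous case): randomly draws one second
    # label value from the training set as the random predicted value.
    rng = random.Random(seed)
    values = list(train_label_values)
    return lambda _user: rng.choice(values)

def make_interval_sampling_model(train_label_values, seed=0):
    # Second random model: draws uniformly from the random prediction
    # interval [min(label values), max(label values)].
    lo, hi = min(train_label_values), max(train_label_values)
    rng = random.Random(seed)
    return lambda _user: rng.uniform(lo, hi)
```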
Another aspect of the disclosed embodiments provides a model evaluation apparatus comprising a first acquisition module, a random model module, a second acquisition module, and a model evaluation module. The first acquisition module obtains a first index of the model to be evaluated based on the model's prediction results for user data in a test set, where the first index characterizes the prediction performance of the model to be evaluated and the model to be evaluated is a machine learning model. The random model module determines at least one random model based on the type of the model to be evaluated, where the random model is a non-machine-learning model and the at least one random model includes a first random model. The second acquisition module obtains a second index based on the first random model's random prediction results for the user data in the test set, where the second index characterizes the prediction performance of the first random model. The model evaluation module evaluates the model to be evaluated based on the difference between the first index and the second index.
Another aspect of an embodiment of the present disclosure provides an electronic device. The electronic device includes one or more memories, and one or more processors. The memory stores executable instructions. The processor executes the executable instructions to implement the method as described above.
Another aspect of the disclosed embodiments provides a computer-readable storage medium storing computer-executable instructions that, when executed, are configured to implement a method as described above.
Yet another aspect of the disclosed embodiments provides a computer program product comprising a computer program/instruction which, when executed by a processor, implements a method as described above.
One or more of the above embodiments have the following advantages or benefits. The method can at least partially solve the problem in the related art that the evaluation of a machine learning model is easily affected by test-set quality. A first index is obtained from the model to be evaluated predicting the user data in the test set, and a second index is obtained from the first random model randomly predicting the same user data; the model to be evaluated is then assessed through the difference between the two indexes. Because the first random model predicts randomly and is little affected by the test set, this difference can objectively reflect the performance of the model to be evaluated.
Drawings
The above and other objects, features and advantages of the present disclosure will become more apparent from the following description of embodiments thereof with reference to the accompanying drawings in which:
FIG. 1 schematically illustrates an exemplary system architecture to which a model evaluation method may be applied, according to an embodiment of the present disclosure;
FIG. 2 schematically illustrates a flow chart of a model evaluation method according to an embodiment of the disclosure;
FIG. 3 schematically illustrates a flow chart of a model evaluation method according to another embodiment of the present disclosure;
FIG. 4 schematically illustrates a flow chart of obtaining a type of model to be evaluated, according to an embodiment of the disclosure;
FIG. 5 schematically illustrates a flow chart of determining at least one stochastic model when the type of predicted value is a discrete value type, according to an embodiment of the disclosure;
FIG. 6 schematically illustrates a flow diagram for determining the first stochastic model according to an embodiment of the disclosure;
FIG. 7 schematically illustrates a flow chart of determining the second stochastic model according to another embodiment of the disclosure;
FIG. 8 schematically illustrates a flow chart of determining at least one stochastic model when the type of predicted value is a continuous value type, according to an embodiment of the disclosure;
FIG. 9 schematically illustrates a flow chart of predicting user data in the test set by a first stochastic model according to an embodiment of the disclosure;
FIG. 10 schematically illustrates a flow chart of predicting user data in the test set by a second stochastic model according to an embodiment of the disclosure;
FIG. 11 schematically illustrates a block diagram of a model evaluation apparatus according to an embodiment of the disclosure;
FIG. 12 schematically illustrates an operational flow diagram for model evaluation using a model evaluation device according to an embodiment of the present disclosure; and
FIG. 13 schematically illustrates a block diagram of a computer system suitable for implementing the model evaluation method and apparatus, in accordance with an embodiment of the present disclosure.
Detailed Description
Hereinafter, embodiments of the present disclosure will be described with reference to the accompanying drawings. It should be understood that the description is only exemplary and is not intended to limit the scope of the present disclosure. In the following detailed description, for purposes of explanation, numerous specific details are set forth in order to provide a thorough understanding of the embodiments of the present disclosure. It may be evident, however, that one or more embodiments may be practiced without these specific details. In addition, in the following description, descriptions of well-known structures and techniques are omitted so as not to unnecessarily obscure the concepts of the present disclosure.
The disclosure provides a model evaluation method in the field of artificial intelligence. The method comprises the following steps. A first index of the model to be evaluated is obtained based on the model's prediction results for user data in the test set, where the first index characterizes the prediction performance of the model to be evaluated and the model to be evaluated is a machine learning model. At least one random model is determined based on the type of the model to be evaluated, where the random model is a non-machine-learning model and the at least one random model includes a first random model. A second index is obtained based on the first random model's random prediction results for the user data in the test set, where the second index characterizes the prediction performance of the first random model. The model to be evaluated is then evaluated based on the difference between the first index and the second index.
It should be noted that the model evaluation method and apparatus of the embodiments of the present disclosure may be used in the financial field as well as in any other field; their fields of application are not limited.
According to embodiments of the present disclosure, the model to be evaluated may be a classification model (e.g., a binary or multi-class classification model) or a regression model. For example, when the model to be evaluated is a classification model, it can be used in the financial field to predict and identify the risk of a user's credit business; the predicted value output by the classification model is a discrete value, for example "0" for a risk user and "1" for a normal user. When the model to be evaluated is a regression model, the predicted value it outputs is a continuous value, and the model can be used in scenarios whose results vary continuously, such as house-price estimation and temperature prediction; for example, the temperature of region A, such as 16.8 °C, 18.9 °C, or 25.3 °C, can be predicted from inputs such as the region's humidity, wind, and season.
Fig. 1 schematically illustrates an exemplary system architecture 100 in which a model evaluation method may be applied according to an embodiment of the present disclosure. It should be noted that fig. 1 is only an example of a system architecture to which embodiments of the present disclosure may be applied to assist those skilled in the art in understanding the technical content of the present disclosure, but does not mean that embodiments of the present disclosure may not be used in other devices, systems, environments, or scenarios.
As shown in fig. 1, a system architecture 100 according to this embodiment may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 is used as a medium to provide communication links between the terminal devices 101, 102, 103 and the server 105. The network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, among others.
The user may interact with the server 105 via the network 104 using the terminal devices 101, 102, 103 to receive or send messages or the like. Various communication client applications, such as shopping class applications, web browser applications, search class applications, instant messaging tools, mailbox clients, social platform software, etc. (by way of example only) may be installed on the terminal devices 101, 102, 103.
According to an embodiment of the present disclosure, user data may be collected by the terminal devices 101, 102, 103 and stored in the server 105 to form a training set or a test set. The server 105 may train the model to be evaluated using the user data in the training set. After training is completed, the model to be evaluated can predict the user data in the test set to obtain indexes reflecting its performance.
In some embodiments of the present disclosure, the sources of the user data in the training set and the test set may partially or completely differ. Data such as loan records, property status, repayment records, credit reports, and income levels of M users are collected by the terminal devices 101, 102, 103, and the M users are labeled in advance, for example, defaulting customers are labeled "0" and normal customers are labeled "1". The server 105 may extract a portion of the M users to form a training set for training the model to be evaluated, and then extract another portion to form a test set for testing.
Taking the precision index in the related art as an example, it characterizes the proportion of true positives (i.e., samples correctly predicted as positive) among all samples predicted as positive. If this index is computed on a severely imbalanced data set, such as a test set in which most samples are positive, the model need only be biased toward judging most samples as positive to obtain a high precision value. For example, if the ratio of normal users to loan-risk users in the test set is 9 to 1, the model to be evaluated attains high accuracy simply by judging all users to be normal, even if the server 105 has not trained it at all. In some embodiments of the present disclosure, a sample may include a user or user data, and the test set may include the samples to be tested, where the model to be evaluated outputs a prediction result for each user by predicting the user data in the test set.
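The 9-to-1 example can be checked numerically. The sketch below assumes, as elsewhere in the disclosure, that normal users are labeled "1" and risk users "0"; the model shown is deliberately untrained.

```python
# Test set with 9 normal users ("1") for every loan-risk user ("0").
y_true = [1] * 9 + [0]

def untrained_model(_user):
    return 1  # simply labels every user "normal", having learned nothing

y_pred = [untrained_model(u) for u in y_true]
accuracy = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
precision = sum(t == 1 and p == 1 for t, p in zip(y_true, y_pred)) / sum(p == 1 for p in y_pred)
# Both indexes come out at 0.9, illustrating how an imbalanced test set
# inflates index values for a model with no real predictive ability.
```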
The terminal devices 101, 102, 103 may be a variety of electronic devices having a display screen and supporting web browsing, including but not limited to smartphones, tablets, laptop and desktop computers, and the like.
The server 105 may be a server providing various services, such as a background management server (by way of example only) providing support for websites browsed by users using the terminal devices 101, 102, 103. The background management server may analyze and process the received data such as the user request, and feed back the processing result (e.g., the web page, information, or data obtained or generated according to the user request) to the terminal device.
It should be noted that, the model evaluation method provided by the embodiment of the present disclosure may be generally performed by the server 105. Accordingly, the model evaluation apparatus provided by the embodiments of the present disclosure may be generally provided in the server 105. The model evaluation method provided by the embodiments of the present disclosure may also be performed by a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105. Accordingly, the model evaluation apparatus provided by the embodiments of the present disclosure may also be provided in a server or a server cluster that is different from the server 105 and is capable of communicating with the terminal devices 101, 102, 103 and/or the server 105.
It should be understood that the number of terminal devices, networks and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
The model evaluation method of the embodiment of the present disclosure will be described in detail below by taking a machine learning model applied in the financial field as an example.
Fig. 2 schematically illustrates a flow chart of a model evaluation method according to an embodiment of the present disclosure.
As shown in fig. 2, the method may include operations S210 to S240.
In operation S210, a first index of the model to be evaluated is obtained based on a prediction result of the model to be evaluated on the user data in the test set, where the first index is used to characterize a prediction performance of the model to be evaluated, and the model to be evaluated is a machine learning model.
Taking a credit business risk management model as an example, according to an embodiment of the disclosure the model can judge whether a sample to be tested carries business risk. For example, the output value of the model may be made discrete, with "0" representing a defaulting customer and "1" a normal customer. Specifically, suppose the test set contains 4000 samples, of which 3180 are positive (normal customers) and 820 are negative (defaulting customers). After the model is trained, it predicts the test set; for example, 3120 positive samples are predicted correctly and 60 incorrectly, while 762 negative samples are predicted correctly and 58 incorrectly, giving a prediction accuracy of approximately 0.97 (i.e., the first index).
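The arithmetic behind the 0.97 figure, written out from the counts in the example:

```python
tp, fn = 3120, 60   # positive samples (normal customers): correct / wrong
tn, fp = 762, 58    # negative samples (defaulting customers): correct / wrong
total = tp + fn + tn + fp          # the 4000 test samples
first_index = (tp + tn) / total    # (3120 + 762) / 4000 = 0.9705, about 0.97
```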
In some embodiments of the present disclosure, the output value of the model may be made continuous, e.g., a percentage representing the probability that a customer defaults. A default probability greater than or equal to 60% may represent a defaulting customer and less than 50% a normal customer (by way of example only), with probabilities between 50% and 60% marking customers to be reviewed. After the test set is predicted, the root mean square error between the prediction results and the customers' actual default probabilities is calculated as the first index.
It should be noted that, the method of training the machine learning model may be a method existing in the related art, and the disclosure is not limited thereto.
In operation S220, at least one stochastic model is determined based on the type of model to be evaluated, wherein the stochastic model is a non-machine learning model, and the at least one stochastic model includes a first stochastic model.
According to the embodiment of the disclosure, different models to be evaluated are constructed from different starting points, for example to determine whether a user is a risk user or to obtain a user's default probability, so the results they output differ (e.g., discrete values versus continuous values); the random model can therefore be determined according to the corresponding type of output result.
The random model may be a mathematical model whose prediction performance is adjusted, for example, by setting the probability of random prediction. Because the random model is governed by the set probability, dependence on the quality of the user data in the test set can be avoided.
In operation S230, a second index is obtained based on the random prediction result of the first random model on the user data in the test set, wherein the second index is used to characterize the prediction performance of the first random model.
According to the embodiment of the disclosure, for example, the 4000 test-set samples are judged randomly by the first random model: each user is judged a positive sample with probability 3180/4000 and a negative sample with probability 820/4000. The prediction results of the first random model are then tallied, and the resulting prediction accuracy is taken as the second index.
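By way of illustration, the expected value of this second index can be computed in closed form rather than by simulation; the assumption here is that the random model's predictions are independent of the true labels.

```python
p_pos = 3180 / 4000   # probability the first random model predicts "positive"
p_neg = 820 / 4000
# A random prediction is counted correct when it matches the true class, so
# the expected accuracy of this baseline (the second index) is p_pos^2 + p_neg^2.
second_index = p_pos ** 2 + p_neg ** 2          # about 0.674
first_index = 0.9705   # the trained model's accuracy from the earlier example
gap = first_index - second_index                # the difference used in operation S240
```

The trained model thus outperforms the distribution-matched random baseline by roughly 0.30 in accuracy, a gap that reflects genuine learning rather than test-set imbalance.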
In operation S240, the model to be evaluated is evaluated based on the difference between the first index and the second index.
According to the model evaluation method of the embodiment of the disclosure, the second index of the first random model eliminates the strong dependence on the test set, because the user data in the test set is not taken as a reference. By comparing the model to be evaluated with the first random model, the degree of performance improvement of the model to be evaluated can be reflected through the difference between the first index and the second index, so that the effectiveness of the model to be evaluated is objectively reflected.
Fig. 3 schematically illustrates a flow chart of a model evaluation method according to another embodiment of the present disclosure.
As shown in fig. 3, the model evaluation method of the embodiment of the present disclosure may further include operations S310 to S350 before determining the first random model based on the type of the model to be evaluated, in addition to operations S210, S230, and S240.
In operation S210, a first index of the model to be evaluated is obtained based on a prediction result of the model to be evaluated on the user data in the test set, where the first index is used to characterize a prediction performance of the model to be evaluated, and the model to be evaluated is a machine learning model.
In operation S310, a second stochastic model is determined based on the type of model to be evaluated, wherein the predicted performance of the second stochastic model is lower than the predicted performance of the first stochastic model, and at least one stochastic model comprises the second stochastic model.
According to the embodiment of the disclosure, a random condition can be preset to adjust the prediction performance of the random model on the user data in the test set. The stronger the prediction performance of the random model, the lower its randomness, and the greater the cost business personnel must spend on setting the conditions. Therefore, a second random model with lower prediction performance can be determined first and used to evaluate the performance of the model to be evaluated.
In operation S320, a third index is obtained based on the random prediction result of the second random model on the user data in the test set, where the third index is used to characterize the prediction performance of the second random model.
In operation S330, it is determined whether the difference between the first index and the third index satisfies the preset condition, if so, operation S350 is performed, and if not, operation S340 is performed.
In operation S340, when the difference between the first index and the third index does not satisfy the preset condition, the model to be evaluated is determined as an invalid model.
According to the embodiment of the disclosure, the prediction performance of the second random model is lower, that is, the accuracy of the prediction results reflected by the third index is poor. The difference between the first index and the third index can therefore reflect the degree of improvement of the model to be evaluated relative to the second random model. If the improvement is limited or absent, the performance of the model to be evaluated is considered low, and the model to be evaluated is an invalid model.
In operation S350, when the difference between the first index and the third index satisfies a preset condition, a first random model is determined based on the type of the model to be evaluated.
In operation S230, a second index is obtained based on the random prediction result of the first random model on the user data in the test set, wherein the second index is used to characterize the prediction performance of the first random model.
In operation S240, the model to be evaluated is evaluated based on the difference between the first index and the second index.
According to the embodiment of the disclosure, the model to be evaluated is first evaluated with the easily obtained second random model; if this evaluation is not passed, the first random model need not be determined. This improves evaluation efficiency and reduces evaluation cost.
Various implementations of the methods shown in fig. 2 or 3 are further described below with reference to fig. 4-9 in conjunction with specific embodiments.
Fig. 4 schematically illustrates a flow chart of obtaining a type of model to be evaluated, according to an embodiment of the disclosure.
As shown in fig. 4, obtaining the type of the model to be evaluated may include operations S410 to S420.
In operation S410, a type of a predicted value output by the model to be evaluated for predicting user data in the test set is obtained, wherein the predicted value has a correspondence with the predicted result.
In operation S420, a type of the model to be evaluated is determined based on the type of the predicted value, wherein the type of the predicted value includes at least one of a discrete value type and a continuous value type.
In embodiments of the present disclosure, the determination may be made by checking whether the predicted value is of an integer type or a floating-point type. For example, "0", "1", "2" (by way of example only) have no fractional part and belong to the integer type (discrete value type), while "2.1", "5.0", "85.2" (by way of example only) are written with fractional parts and belong to the floating-point type (continuous value type).
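One possible sketch of this type check, assuming the predicted values are available as Python `int` or `float` objects (the helper name is hypothetical):

```python
def predicted_value_type(values):
    # Integer-typed outputs -> discrete value type; floating-point outputs
    # (written with a fractional part, e.g. 5.0) -> continuous value type.
    if all(isinstance(v, int) for v in values):
        return "discrete"
    return "continuous"

# Outputs such as 0, 1, 2 are discrete; 2.1, 5.0, 85.2 are continuous.
discrete_kind = predicted_value_type([0, 1, 2])
continuous_kind = predicted_value_type([2.1, 5.0, 85.2])
```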
According to the embodiment of the disclosure, the type of the model to be evaluated can be determined while the model is being constructed. For example, when different business personnel are responsible for the three stages of construction, training, and testing, the type of the model to be evaluated can be determined during testing by judging whether the predicted value is of the discrete value type or the continuous value type. This saves the time of communicating with other business personnel, and the test personnel need not perform manual confirmation.
Fig. 5 schematically illustrates a flow chart of determining at least one random model when the type of predicted value is a discrete value type, according to an embodiment of the disclosure.
As shown in fig. 5, determining the at least one random model may include operations S510 to S520.
In operation S510, a correct probability of each random model is determined, wherein the correct probability includes a probability that a random predicted value outputted by predicting each user data in the test set by the random model is identical to a first tag value of the user data, the first tag value including a value labeled in advance according to a type of each user data, wherein the first tag value is a discrete value type, and the type of the first tag value corresponds to a type of the user data in the test set.
In operation S520, each random model is determined based on the correct probability.
Fig. 6 schematically illustrates a flow chart of determining a first random model according to an embodiment of the disclosure.
As shown in fig. 6, determining the first random model in operation S520 may include operations S610 to S630.
In operation S610, a distribution ratio of each user data in a first training set is obtained, where each user data corresponds to a first label value, and the first training set is used to train a model to be evaluated.
In operation S620, a corresponding first probability is determined based on the distribution ratio of each user data.
In operation S630, a first random model is determined based on the first probability, wherein the correct probability comprises the first probability.
Taking the credit business risk management model as an example, according to embodiments of the present disclosure, the training set sample size may be 5000 (4000 positive samples and 1000 negative samples). The distribution ratio of positive samples is then 0.8, and that of negative samples is 0.2. For each sample, the first random model determines a first probability of 80% for a positive sample and 20% for a negative sample; that is, the probability of outputting "1" is 80%, and the probability of outputting "0" is 20%.
According to embodiments of the present disclosure, when predicting each user in the test set using the first random model, a value in the [0,1] interval may first be randomly generated using a random number generation function. The generated random value is then judged: when it is less than or equal to 0.8, the first random model outputs "1"; when it is greater than 0.8, the first random model outputs "0". This method forms a mathematical model serving as the first random model and uses the training set to control its random prediction probability, thereby improving the prediction performance of the first random model.
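The threshold scheme above can be sketched as follows; the function name is hypothetical, and the 0.8 probability comes from the example's training-set distribution:

```python
import random

def first_random_model(p_positive):
    # Generate a value in [0, 1]; output 1 (positive) if it is <= p_positive,
    # otherwise output 0 (negative).
    return 1 if random.random() <= p_positive else 0

# Distribution ratio from the training set in the example: 4000/5000 positive.
p_positive = 0.8

random.seed(42)
predictions = [first_random_model(p_positive) for _ in range(10000)]
positive_rate = sum(predictions) / len(predictions)  # should be close to 0.8
```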
Fig. 7 schematically illustrates a flow chart of determining a second random model according to another embodiment of the present disclosure.
As shown in fig. 7, determining the second random model in operation S520 may include operations S710 to S730.
In operation S710, N categories of the first tag value are obtained, where N is an integer greater than or equal to 2.
In operation S720, a second probability is determined based on the N categories, including: the second probability is 1/N.
In operation S730, a second random model is determined based on the second probabilities, wherein the correct probabilities include the second probabilities.
According to embodiments of the present disclosure, the training set of, for example, the credit business risk management model described above contains two user data types, positive and negative, represented by the two tag values "1" and "0". With N=2, it can be determined that the probability of the second random model outputting "1" or "0" when predicting each sample is equal, that is, 50%.
According to embodiments of the present disclosure, when predicting each user in the test set using the second random model, a value in the [0,1] interval may first be randomly generated using a random number generation function. The generated random value is then judged: when it is less than or equal to 0.5, the second random model outputs "1"; when it is greater than 0.5, it outputs "0". This method forms a mathematical model serving as the second random model and does not need to acquire the distribution ratio of the user data in the training set, saving the time of acquiring the training set so that the second random model can be obtained more quickly.
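A minimal sketch of such a second random model, assuming a uniform choice over the N label categories (here N = 2, so each category has probability 1/N = 50%); the function name is hypothetical:

```python
import random

def second_random_model(categories):
    # Pick one of the N label categories uniformly, i.e. with probability 1/N each.
    return random.choice(categories)

random.seed(7)
# Two categories (1 = positive, 0 = negative), so each has probability 1/2.
preds = [second_random_model([0, 1]) for _ in range(10000)]
ones_rate = sum(preds) / len(preds)  # should be close to 0.5
```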
It should be noted that the above process of constructing the first random model or the second random model by the random number generating function is merely an example, and the random model may be obtained in other manners, which is not limited in the present disclosure.
Fig. 8 schematically illustrates a flow chart of determining at least one random model when the type of predicted value is a continuous value type, according to an embodiment of the disclosure.
As shown in fig. 8, determining the at least one random model may include operations S810 to S830.
In operation S810, a random prediction rule for each random model is determined based on a second tag value corresponding to each user data in a second training set, where the second training set is used to train the model to be evaluated.
In operation S820, a corresponding set of consecutive values is obtained based on each random prediction rule.
In operation S830, each random model is determined based on each successive set of values.
According to embodiments of the present disclosure, the random prediction rule of each random model affects its prediction performance. Here, the random prediction rule may be a rule for collecting continuous values to form a continuous value set. When each sample in the test set is randomly predicted, a random predicted value of each random model is obtained from the continuous value set, and the obtained random predicted value is taken as the prediction result for that sample.
Fig. 9 schematically illustrates a flow chart of predicting user data in a test set by a first stochastic model according to an embodiment of the disclosure.
As shown in fig. 9, when the predicted value is a continuous value, the first random model predicting the user data in the test set may include operations S910 to S920.
In operation S910, a second set of tag values is obtained, where the second set of tag values includes second tag values corresponding to all user data in the second training set, and the continuous set of values includes the second set of tag values.
In operation S920, when the first random model predicts each user data in the test set, one second tag value is randomly extracted from the second tag value set as a random predicted value.
According to embodiments of the present disclosure, for example, a credit business risk management model may output the probability of default for each user. For example, the second training set includes 10 users, whose second tag values (i.e., default probabilities) are [15.2%, 20.0%, 30.5%, 40.5%, 52.6%, 56.8%, 60.5%, 70.5%, 80.2%, 90.7%] (by way of example only). The test set also comprises 10 users, and when the first random model predicts a tested user, one tag value can be randomly extracted from this set as the prediction result. After the user data in the test set is predicted, a random predicted value set is obtained, and the root mean square error between the random predicted value set and the tag value set of the test set can be calculated as an index of the first random model.
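This sampling scheme might be sketched as follows; the training labels are those of the example, while the test-set label values are hypothetical, for illustration only:

```python
import math
import random

# Second tag values (default probabilities) of the ten training-set users in the example.
train_labels = [0.152, 0.200, 0.305, 0.405, 0.526, 0.568, 0.605, 0.705, 0.802, 0.907]

# Hypothetical actual default probabilities of the ten test-set users (illustrative only).
test_labels = [0.18, 0.25, 0.33, 0.42, 0.50, 0.55, 0.63, 0.68, 0.79, 0.88]

random.seed(1)
# The first random model draws each prediction at random from the training label set.
random_predictions = [random.choice(train_labels) for _ in test_labels]

# Root mean square error of the random predictions, used as the model's index.
index = math.sqrt(
    sum((p - a) ** 2 for p, a in zip(random_predictions, test_labels)) / len(test_labels)
)
```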
According to embodiments of the present disclosure, a credit business risk management model, for example, is constructed for use in real life, so when it is trained or tested, the sample distribution in the training set or test set generally follows the real data sample distribution. Therefore, by randomly obtaining predicted values from the training set to predict the user data in the test set, the first random model can achieve higher prediction performance.
It should be noted that, the above description is given by taking 10 users as examples for clarity, and the number of users may be selected according to the needs in practical application, which is not limited by the disclosure.
Fig. 10 schematically illustrates a flow chart of predicting user data in a test set by a second stochastic model according to an embodiment of the disclosure.
As shown in fig. 10, when the predicted value is a continuous value, the second random model predicting the user data in the test set may include operations S1010 to S1030.
In operation S1010, the maximum value and the minimum value in the second tag value are obtained.
In operation S1020, a random prediction interval is determined based on the maximum value and the minimum value, wherein the continuous value set includes the random prediction interval.
In operation S1030, when the second random model predicts each user data in the test set, a value is randomly obtained from the random prediction interval as a random prediction value.
According to an embodiment of the present disclosure, the minimum value of the second tag values of the users in the second training set is, for example, 15.2%, and the maximum value is 90.7%. Thus, a random prediction interval [15.2%, 90.7%] can be determined, and the second random model randomly obtains a value from this interval when predicting each test user. After the user data in the test set is predicted, a random predicted value set is obtained, and the root mean square error between the random predicted value set and the tag value set of the test set can be calculated as an index of the second random model.
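A minimal sketch of the interval-based second random model, using the example's extremes; only the minimum and maximum of the training labels are needed:

```python
import random

# Second tag values of the training-set users in the example.
train_labels = [0.152, 0.200, 0.305, 0.405, 0.526, 0.568, 0.605, 0.705, 0.802, 0.907]

# Only the extremes are needed to build the random prediction interval.
low, high = min(train_labels), max(train_labels)  # [15.2%, 90.7%]

random.seed(3)
# The second random model draws each prediction uniformly from the interval.
random_predictions = [random.uniform(low, high) for _ in range(10)]
```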
According to embodiments of the present disclosure, the second stochastic model can be constructed based on only the maximum and minimum of the second tag values, so it is constructed faster than the first stochastic model, which acquires all the second tag values in the training set.
Fig. 11 schematically illustrates a block diagram of a model evaluation apparatus 1100 according to an embodiment of the disclosure.
As shown in fig. 11, the model evaluation apparatus 1100 includes a first acquisition module 1110, a random model module 1120, a second acquisition module 1130, and a model evaluation module 1140.
The first obtaining module 1110 may, for example, perform operation S210, configured to obtain, based on a prediction result of the model to be evaluated on user data in the test set, a first index of the model to be evaluated, where the first index is used to characterize a prediction performance of the model to be evaluated, and the model to be evaluated is a machine learning model.
The stochastic model module 1120 may, for example, perform operation S220 for determining at least one stochastic model based on the type of model to be evaluated, wherein the stochastic model is a non-machine learning model, the at least one stochastic model comprising a first stochastic model.
According to an embodiment of the present disclosure, the stochastic model module 1120 may include a first stochastic model module, which may perform operations S310 to S350, for example, for determining a second stochastic model based on the type of the model to be evaluated, wherein the predictive performance of the second stochastic model is lower than the predictive performance of the first stochastic model, and at least one stochastic model includes the second stochastic model. And obtaining a third index based on the random prediction result of the second random model on the user data in the test set, wherein the third index is used for representing the prediction performance of the second random model. And when the difference value between the first index and the third index meets a preset condition, determining a first random model based on the type of the model to be evaluated.
According to an embodiment of the present disclosure, the first random model module may further perform operations S510 to S520 for determining a correct probability of each random model, where the correct probability includes a probability that a random predicted value outputted by predicting each user data in the test set by the random model is the same as a first tag value of the user data, the first tag value includes a value labeled in advance according to a type of each user data, and the first tag value is a discrete value type, and the type of the first tag value corresponds to a type of the user data in the test set. Each random model is determined based on the probability of correctness.
According to an embodiment of the disclosure, the first stochastic model module may further perform operations S610 to S630, for example, to obtain a distribution ratio of each user data in a first training set, where each user data corresponds to a first label value, and the first training set is used to train the model to be evaluated. A corresponding first probability is determined based on the distribution ratio of each user data. A first stochastic model is determined based on the first probability, wherein the correct probability comprises the first probability.
According to an embodiment of the present disclosure, the first random model module may further perform operations S710 to S730, for example, to obtain N kinds of first tag values, where N is an integer greater than or equal to 2. Determining a second probability based on the N categories, including: the second probability is 1/N. A second random model is determined based on the second probabilities, wherein the correct probabilities include the second probabilities.
According to an embodiment of the present disclosure, the stochastic model module 1120 may further include a second stochastic model module, for example, which may perform operations S810 to S830, for determining a stochastic prediction rule of each stochastic model based on the second label value corresponding to each user data in a second training set, wherein the second training set is used for training the model to be evaluated. A corresponding set of consecutive values is obtained based on each random prediction rule. Each random model is determined based on each set of consecutive values.
According to an embodiment of the present disclosure, the random model module 1120 may further perform operations S910 to S920, for example, to obtain a second set of tag values, where the second set of tag values includes second tag values corresponding to all user data in the second training set, and the continuous set of values includes the second set of tag values. And when the first random model predicts each user data in the test set, randomly extracting a second tag value from the second tag value set as a random predicted value.
According to an embodiment of the present disclosure, the random model module 1120 may further perform operations S1010 to S1030, for example, for obtaining the maximum value and the minimum value in the second tag value. A random prediction interval is determined based on the maximum and minimum values, wherein the set of consecutive values includes the random prediction interval. When the second random model predicts each user data in the test set, a value is randomly obtained from the random prediction interval to serve as a random prediction value.
The second obtaining module 1130 may, for example, perform operation S230, configured to obtain a second index based on a random prediction result of the first random model on the user data in the test set, where the second index is used to characterize the prediction performance of the first random model.
The model evaluation module 1140 may perform operation S240, for example, for evaluating the model to be evaluated based on the difference between the first index and the second index.
According to an embodiment of the present disclosure, the model evaluation apparatus 1100 may further include a type determination module. The type determining module may, for example, perform operations S410 to S420, and is configured to obtain a type of a predicted value that is output by the model to be evaluated in predicting the user data in the test set, where the predicted value has a correspondence with the predicted result. The type of the model to be evaluated is determined based on the type of the predicted value, wherein the type of the predicted value comprises at least one of a discrete value type and a continuous value type.
The detailed flow of model evaluation using the model evaluation device 1100 is described in detail below with reference to fig. 12.
Fig. 12 schematically illustrates an operational flow diagram for model evaluation using the model evaluation device 1100 according to an embodiment of the present disclosure.
As shown in fig. 12, performing model evaluation using the model evaluation device 1100 may include operations S1210 to S1270.
In operation S1210, the model type to be evaluated, the model training set to be evaluated, the model index to be evaluated (i.e., the first index), and the model test set to be evaluated may be acquired by the model evaluation device 1100.
In operation S1220, it is determined whether the model type to be evaluated is a discrete value type or a continuous value type. If the discrete value type is the discrete value type, operation S1240 is performed, and if the continuous value type is the continuous value type, operation S1230 is performed.
In operation S1230, for example, operations S810 to S830 may be performed to determine the first random model and the second random model, which will not be described herein.
In operation S1240, for example, operations S510 to S520 may be performed to determine the first random model and the second random model, which will not be described herein.
In operation S1250, random predictions of user data in the test set may be made using the first random model and the second random model, respectively.
According to the embodiment of the disclosure, the second random model can be obtained first to conduct random prediction on the user data in the test set, and the first random model is obtained after the preset condition is met to conduct prediction on the user data in the test set.
In operation S1260, a second index is obtained based on the first random model, and a third index is obtained based on the second random model.
In operation S1270, based on the first index, the second index, and the third index, the differences between the random models and the model to be evaluated are compared as a validity index for evaluating the performance of the model. The validity index Availability is calculated as follows:

Availability = 0, when (n0 - nA)/n1 <= α;
Availability = (n0 - nB)/nB, when (n0 - nA)/n1 > α.
Wherein n1 is the number of samples in the test set, n0 is the number of test-set samples correctly predicted by the original model to be evaluated, nA is the number of test-set samples correctly predicted by random model A (i.e., the second random model), nB is the number of test-set samples correctly predicted by random model B (i.e., the first random model), and α is a tolerance coefficient, which may be set to 0.1 by default.
In accordance with an embodiment of the present disclosure, (n0 - nA)/n1 measures the degree of improvement of the original model compared with random model A; when this improvement is limited, the validity is directly assigned 0 and the model is considered invalid. The validity index Availability measures the degree to which the original model is improved compared with random model B. For example, the model may be considered invalid when the validity index is smaller than 0.01, somewhat effective when between 0.01 and 0.05, relatively effective when between 0.05 and 0.1, and very effective when greater than 0.1.
Also taking the credit business risk management model described above as an example, according to an embodiment of the present disclosure, the model determines whether user data in the test set carries a business risk, with an output of 0 representing a default customer (negative sample) and 1 representing a normal customer (positive sample); that is, the output value is a discrete value. The number of users in the test set is 4000, with 3180 positive samples and 820 negative samples. For example, after the model predicts on the test set, 3120 positive samples are predicted correctly and 60 incorrectly, and 762 negative samples are predicted correctly and 58 incorrectly, giving a prediction accuracy of 0.97 (i.e., the first index).
As can be seen from the above, the model to be evaluated is a classification model and its predicted value is of the discrete value type. The validity index Availability is calculated next:
First, random model A is determined. The probability of determining each user as a positive sample or a negative sample is set to 50%; specific reference may be made to operations S710 to S730, which are not repeated here.

Then, for example, random model A predicts 2000 samples on the test set correctly, so its accuracy is 0.5 (i.e., the third index).

Next, the difference between the first index and the third index is obtained as 0.47. Because it is greater than the tolerance coefficient 0.1, the first random model can be determined to continue the evaluation.
Next, random model B is determined. For example, the distribution ratio of positive to negative samples in the training set is 8:2, so random model B determines that a user is a positive sample with probability 80% and a negative sample with probability 20%; specific reference may be made to operations S610 to S630, which are not repeated here.

Again, random model B predicts 2640 samples on the test set correctly, so its accuracy is 0.66 (i.e., the second index).

Finally, the difference between the first index and the second index is 0.31. After dimensionless treatment, dividing 0.31 by 0.66 finally gives a validity index of approximately 0.47.
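The validity computation of this worked example can be sketched as follows, assuming the index is 0 when the lift over random model A is within the tolerance α and otherwise equals the dimensionless lift over random model B (consistent with the numbers above):

```python
def availability(n1, n0, n_a, n_b, alpha=0.1):
    # n1: test-set size; n0: samples correctly predicted by the model to be evaluated;
    # n_a / n_b: samples correctly predicted by random models A and B, respectively.
    if (n0 - n_a) / n1 <= alpha:
        return 0.0  # lift over random model A is within tolerance: invalid model
    return (n0 - n_b) / n_b  # dimensionless lift over random model B

# Numbers from the worked example: 3882 = 3120 + 762 correct predictions,
# 2000 correct for random model A, 2640 correct for random model B.
validity = availability(n1=4000, n0=3882, n_a=2000, n_b=2640)
```

With these numbers validity ≈ 0.47, which falls in the "very effective" band (greater than 0.1).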
Any number of modules, sub-modules, units, sub-units, or at least some of the functionality of any number of the sub-units according to embodiments of the present disclosure may be implemented in one module. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented as split into multiple modules. Any one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be implemented at least in part as a hardware circuit, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system-on-chip, a system-on-substrate, a system-on-package, an Application Specific Integrated Circuit (ASIC), or in any other reasonable manner of hardware or firmware that integrates or encapsulates the circuit, or in any one of or a suitable combination of three of software, hardware, and firmware. Or one or more of the modules, sub-modules, units, sub-units according to embodiments of the present disclosure may be at least partially implemented as computer program modules, which, when executed, may perform the corresponding functions.
For example, any number of the first acquisition module 1110, the random model module 1120, the second acquisition module 1130, and the model evaluation module 1140 may be combined in one module to be implemented, or any one of the modules may be split into a plurality of modules. Or at least some of the functionality of one or more of the modules may be combined with, and implemented in, at least some of the functionality of other modules. According to embodiments of the present disclosure, at least one of the first acquisition module 1110, the random model module 1120, the second acquisition module 1130, and the model evaluation module 1140 may be implemented at least in part as hardware circuitry, such as a Field Programmable Gate Array (FPGA), a Programmable Logic Array (PLA), a system on a chip, a system on a substrate, a system on a package, an Application Specific Integrated Circuit (ASIC), or may be implemented in hardware or firmware in any other reasonable manner of integrating or packaging the circuitry, or in any one of or a suitable combination of three of software, hardware, and firmware. Or at least one of the first acquisition module 1110, the stochastic model module 1120, the second acquisition module 1130, and the model evaluation module 1140 may be at least partially implemented as a computer program module, which when executed, may perform the corresponding functions.
FIG. 13 schematically illustrates a block diagram of a computer system suitable for implementing the model evaluation method and apparatus, in accordance with an embodiment of the present disclosure. The computer system illustrated in fig. 13 is merely an example, and should not be construed as limiting the functionality and scope of use of the embodiments of the present disclosure.
As shown in fig. 13, a computer system 1300 according to an embodiment of the present disclosure includes a processor 1301 that can perform various appropriate actions and processes according to a program stored in a Read Only Memory (ROM) 1302 or a program loaded from a storage section 1308 into a Random Access Memory (RAM) 1303. Processor 1301 may include, for example, a general purpose microprocessor (e.g., a CPU), an instruction set processor and/or an associated chipset and/or a special purpose microprocessor (e.g., an Application Specific Integrated Circuit (ASIC)), or the like. Processor 1301 may also include on-board memory for caching purposes. Processor 1301 may include a single processing unit or multiple processing units for performing different actions of the method flow according to embodiments of the present disclosure.
In the RAM 1303, various programs and data necessary for the operation of the system 1300 are stored. The processor 1301, the ROM 1302, and the RAM 1303 are connected to each other through a bus 1304. The processor 1301 performs various operations of the method flow according to the embodiment of the present disclosure by executing programs in the ROM 1302 and/or the RAM 1303. Note that the program may be stored in one or more memories other than the ROM 1302 and the RAM 1303. Processor 1301 may also perform various operations of the method flow according to embodiments of the present disclosure by executing programs stored in one or more memories.
According to an embodiment of the present disclosure, the system 1300 may also include an input/output (I/O) interface 1305, which is also connected to the bus 1304. The system 1300 may also include one or more of the following components connected to the I/O interface 1305: an input portion 1306 including a keyboard, a mouse, and the like; an output portion 1307 including a Cathode Ray Tube (CRT), a Liquid Crystal Display (LCD), a speaker, and the like; a storage portion 1308 including a hard disk or the like; and a communication portion 1309 including a network interface card such as a LAN card, a modem, or the like. The communication portion 1309 performs communication processing via a network such as the Internet. A drive 1310 is also connected to the I/O interface 1305 as needed. A removable medium 1311, such as a magnetic disk, an optical disk, a magneto-optical disk, or a semiconductor memory, is mounted on the drive 1310 as needed, so that a computer program read therefrom is installed into the storage portion 1308 as needed.
According to embodiments of the present disclosure, the method flow according to embodiments of the present disclosure may be implemented as a computer software program. For example, embodiments of the present disclosure include a computer program product comprising a computer program embodied on a computer readable storage medium, the computer program comprising program code for performing the method shown in the flowcharts. In such embodiments, the computer program may be downloaded and installed from a network via the communication portion 1309 and/or installed from the removable medium 1311. The above-described functions defined in the system of the embodiments of the present disclosure are performed when the computer program is executed by the processor 1301. The systems, devices, apparatus, modules, units, etc. described above may be implemented by computer program modules according to embodiments of the disclosure.
Embodiments of the present disclosure also include a computer program product comprising a computer program, the computer program comprising program code for performing the methods provided by the embodiments of the present disclosure; when the computer program product runs on an electronic device, the program code causes the electronic device to implement the model evaluation method provided by the embodiments of the present disclosure.
In one embodiment, the computer program may be carried on a tangible storage medium such as an optical storage device or a magnetic storage device. In another embodiment, the computer program may also be transmitted and distributed over a network medium in the form of a signal, downloaded and installed via the communication portion 1309, and/or installed from the removable medium 1311. The computer program may include program code that may be transmitted using any appropriate medium, including but not limited to wireless, wired, or any suitable combination of the foregoing.
According to embodiments of the present disclosure, the program code of the computer programs provided by embodiments of the present disclosure may be written in any combination of one or more programming languages; in particular, such computer programs may be implemented in high-level procedural and/or object-oriented programming languages, and/or assembly/machine languages. Programming languages include, but are not limited to, Java, C++, Python, C, or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, partly on a remote computing device, or entirely on the remote computing device or server. In the latter case, the remote computing device may be connected to the user's computing device through any kind of network, including a Local Area Network (LAN) or a Wide Area Network (WAN), or may be connected to an external computing device (e.g., via the Internet using an Internet service provider).
The flowcharts and block diagrams in the figures illustrate the architecture, functionality, and operation of possible implementations of systems, methods and computer program products according to various embodiments of the present disclosure. In this regard, each block in the flowchart or block diagrams may represent a module, segment, or portion of code, which comprises one or more executable instructions for implementing the specified logical function(s). It should also be noted that, in some alternative implementations, the functions noted in the block may occur out of the order noted in the figures. For example, two blocks shown in succession may, in fact, be executed substantially concurrently, or the blocks may sometimes be executed in the reverse order, depending upon the functionality involved. It will also be noted that each block of the block diagrams or flowchart illustration, and combinations of blocks in the block diagrams or flowchart illustration, can be implemented by special purpose hardware-based systems which perform the specified functions or acts, or combinations of special purpose hardware and computer instructions.
The embodiments of the present disclosure are described above. These examples are for illustrative purposes only and are not intended to limit the scope of the present disclosure. Various alternatives and modifications can be made by those skilled in the art without departing from the scope of the disclosure, and such alternatives and modifications are intended to fall within the scope of the disclosure.
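The core evaluation flow of the disclosure — obtaining a first index for the model under evaluation, a second index for a random baseline model, and judging the model by their difference — can be illustrated with a minimal sketch. This is an illustrative reading, not the patented implementation: the accuracy metric, the `margin` threshold, and all function names are assumptions introduced here.

```python
def accuracy(preds, labels):
    # Fraction of predictions matching the labels; stands in for the
    # "first index" / "second index" measures of predictive performance.
    return sum(p == l for p, l in zip(preds, labels)) / len(labels)

def evaluate_against_random(model_predict, random_predict, test_X, test_y, margin=0.1):
    # First index: predictive performance of the model to be evaluated.
    first_index = accuracy([model_predict(x) for x in test_X], test_y)
    # Second index: performance of the (non-machine-learning) random model
    # on the same test-set user data.
    second_index = accuracy([random_predict(x) for x in test_X], test_y)
    # Evaluate the model based on the difference between the two indices.
    diff = first_index - second_index
    return {"first_index": first_index, "second_index": second_index,
            "difference": diff, "passes": diff >= margin}
```

A model whose first index barely exceeds the random baseline's second index has learned little from the data, which is the failure mode this comparison is designed to surface.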

Claims (12)

1. A model evaluation method, comprising:
acquiring a first index of a to-be-evaluated model based on a prediction result of the to-be-evaluated model on user data in a test set, wherein the first index is used for representing the prediction performance of the to-be-evaluated model, and the to-be-evaluated model is a machine learning model;
Determining at least one stochastic model based on the type of the model to be evaluated, wherein the stochastic model is a non-machine learning model, and the at least one stochastic model comprises a first stochastic model;
acquiring a second index based on a random prediction result of the first random model on the user data in the test set, wherein the second index is used for representing the prediction performance of the first random model;
Evaluating the model to be evaluated based on the difference value between the first index and the second index;
Wherein the method further comprises obtaining a type of the model to be evaluated, comprising:
obtaining the type of a predicted value output by the to-be-evaluated model for predicting the user data in the test set, wherein the predicted value has a corresponding relation with the predicted result;
Determining the type of the model to be evaluated based on the type of the predicted value, wherein the type of the predicted value comprises at least one of a discrete value type and a continuous value type;
Wherein said determining at least one stochastic model based on the type of the model under evaluation comprises:
Determining the at least one stochastic model based on the type of predicted value being a discrete value type; or alternatively,
The at least one stochastic model is determined based on the type of predictive value being a continuous value type.
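Claim 1's type determination — classifying the model by whether its predicted values are of a discrete or a continuous value type — can be sketched as a simple heuristic. The rule below (integers/strings count as discrete, floats as continuous) is an assumption for illustration only; the claim itself does not prescribe how the type is detected.

```python
def predicted_value_type(sample_predictions):
    # Hypothetical heuristic: classify the model's output values as a
    # discrete value type (class indices / labels) or a continuous value
    # type (regression outputs).
    if all(isinstance(p, (int, str)) and not isinstance(p, bool)
           for p in sample_predictions):
        return "discrete"
    return "continuous"
```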
2. The model evaluation method according to claim 1, wherein before determining the first random model based on the type of the model to be evaluated, further comprising:
determining a second stochastic model based on the type of the model to be evaluated, wherein the predicted performance of the second stochastic model is lower than the predicted performance of the first stochastic model, the at least one stochastic model comprising the second stochastic model;
acquiring a third index based on a random prediction result of the second random model on the user data in the test set, wherein the third index is used for representing the prediction performance of the second random model;
and when the difference value between the first index and the third index meets a preset condition, determining the first random model based on the type of the model to be evaluated.
3. The model evaluation method according to claim 2, wherein, when the type of the predicted value is a discrete value type, the determining at least one random model based on the type of the model to be evaluated comprises:
determining a correct probability of each random model, wherein the correct probability comprises a probability that a random predicted value output by predicting each user data in the test set by the random model is the same as a first tag value of the user data, the first tag value comprises a value marked in advance according to the type of each user data, the first tag value is of a discrete value type, and the type of the first tag value corresponds to the type of the user data in the test set;
each of the random models is determined based on the probability of correctness.
4. A model evaluation method according to claim 3, wherein said determining each of said stochastic models comprises determining said first stochastic model, which specifically comprises:
obtaining a distribution proportion of each user data in a first training set, wherein each user data corresponds to one type of first label value, and the first training set is used for training the model to be evaluated;
determining a corresponding first probability based on the distribution proportion of each user data;
The first stochastic model is determined based on the first probability, wherein the correct probability comprises the first probability.
5. A model evaluation method according to claim 3, wherein said determining each of said stochastic models comprises determining said second stochastic model, which specifically comprises:
obtaining N categories of the first tag value, wherein N is an integer greater than or equal to 2;
Determining a second probability based on the N categories, including: the second probability is 1/N;
determining the second random model based on the second probability, wherein the correct probability includes the second probability.
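Claims 4 and 5 above describe two discrete-type random baselines: a first random model whose class probabilities follow the distribution proportions in the first training set, and a second random model that predicts each of the N classes with probability 1/N. A minimal illustrative sketch, assuming Python's `random` module; the factory-function names are hypothetical:

```python
import random

def make_first_random_model(train_labels, seed=0):
    # First random model (claim 4): the probability of predicting each class
    # equals that class's distribution proportion in the training set.
    # Drawing uniformly from the raw label list reproduces those proportions.
    labels = list(train_labels)
    rng = random.Random(seed)
    return lambda _x: rng.choice(labels)

def make_second_random_model(train_labels, seed=0):
    # Second random model (claim 5): each of the N classes is predicted
    # with equal probability 1/N.
    classes = sorted(set(train_labels))
    rng = random.Random(seed)
    return lambda _x: rng.choice(classes)
```

Because the first model tracks the class imbalance of the training data, its correct probability is at least that of the uniform 1/N model, matching claim 2's requirement that the second random model's predicted performance be lower.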
6. The model evaluation method according to claim 2, wherein, when the type of the predicted value is a continuous value type, the determining at least one random model based on the type of the model to be evaluated comprises:
Determining a random prediction rule of each random model based on a second label value corresponding to each user data in a second training set, wherein the second training set is used for training the model to be evaluated;
obtaining a corresponding continuous value set based on each random prediction rule;
each of the stochastic models is determined based on each of the successive sets of values.
7. The model evaluation method according to claim 6, wherein the predicting, by the first stochastic model, the user data in the test set comprises:
Obtaining a second label value set, wherein the second label value set comprises second label values corresponding to all user data in the second training set, and the continuous value set comprises the second label value set;
Wherein,
And randomly extracting a second tag value from the second tag value set as a random predicted value when the first random model predicts each piece of user data in the test set.
8. The model evaluation method according to claim 6, wherein the predicting, by the second stochastic model, the user data in the test set comprises:
Obtaining a maximum value and a minimum value among the second label values;
determining a random prediction interval based on the maximum value and the minimum value, wherein the set of consecutive values includes the random prediction interval;
Wherein,
When the second random model predicts each user data in the test set, a value is randomly obtained from the random prediction interval to serve as a random prediction value.
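Claims 7 and 8 above describe the two continuous-type random baselines: a first random model that draws a random predicted value from the set of second label values observed in the second training set, and a second random model that draws uniformly from the random prediction interval [minimum, maximum] of those label values. An illustrative sketch under the same assumptions as before (Python's `random` module, hypothetical function names):

```python
import random

def make_first_random_model(train_values, seed=0):
    # First random model (claim 7): randomly extract one of the second label
    # values from the training set as the random predicted value.
    values = list(train_values)
    rng = random.Random(seed)
    return lambda _x: rng.choice(values)

def make_second_random_model(train_values, seed=0):
    # Second random model (claim 8): draw uniformly from the random
    # prediction interval [min, max] spanned by the second label values.
    lo, hi = min(train_values), max(train_values)
    rng = random.Random(seed)
    return lambda _x: rng.uniform(lo, hi)
```

Sampling from the observed label values preserves the empirical label distribution, while sampling uniformly over the interval ignores it, so the first model is generally the stronger baseline of the two, consistent with claim 2.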
9. A model evaluation apparatus comprising:
The first acquisition module is used for acquiring a first index of the model to be evaluated based on a prediction result of the model to be evaluated on user data in a test set, wherein the first index is used for representing the prediction performance of the model to be evaluated, and the model to be evaluated is a machine learning model;
A stochastic model module for determining at least one stochastic model based on the type of model to be evaluated, wherein the stochastic model is a non-machine learning model, the at least one stochastic model comprising a first stochastic model;
The second acquisition module is used for acquiring a second index based on a random prediction result of the first random model on the user data in the test set, wherein the second index is used for representing the prediction performance of the first random model;
The model evaluation module is used for evaluating the model to be evaluated based on the difference value of the first index and the second index;
The apparatus further comprises a type determination module for obtaining the type of the model to be evaluated, including:
obtaining the type of a predicted value output by the to-be-evaluated model for predicting the user data in the test set, wherein the predicted value has a corresponding relation with the predicted result;
Determining the type of the model to be evaluated based on the type of the predicted value, wherein the type of the predicted value comprises at least one of a discrete value type and a continuous value type;
Wherein said determining at least one stochastic model based on the type of the model under evaluation comprises:
Determining the at least one stochastic model based on the type of predicted value being a discrete value type; or alternatively,
The at least one stochastic model is determined based on the type of predictive value being a continuous value type.
10. An electronic device, comprising:
one or more memories storing executable instructions; and
One or more processors executing the executable instructions to implement the method of any of claims 1-8.
11. A computer readable storage medium having stored thereon executable instructions which, when executed by a processor, cause the processor to perform the method according to any of claims 1-8.
12. A computer program product comprising a computer program which, when executed by a processor, implements the method according to any one of claims 1 to 8.
CN202110487843.9A 2021-04-30 2021-04-30 Model evaluation method, model evaluation device, electronic apparatus, and storage medium Active CN113052509B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110487843.9A CN113052509B (en) 2021-04-30 2021-04-30 Model evaluation method, model evaluation device, electronic apparatus, and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110487843.9A CN113052509B (en) 2021-04-30 2021-04-30 Model evaluation method, model evaluation device, electronic apparatus, and storage medium

Publications (2)

Publication Number Publication Date
CN113052509A CN113052509A (en) 2021-06-29
CN113052509B true CN113052509B (en) 2024-07-02

Family

ID=76518271

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110487843.9A Active CN113052509B (en) 2021-04-30 2021-04-30 Model evaluation method, model evaluation device, electronic apparatus, and storage medium

Country Status (1)

Country Link
CN (1) CN113052509B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN116362607B (en) * 2023-03-30 2023-11-03 中国人民解放军军事科学院***工程研究院 Material reserve efficiency evaluation method and device

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446251A (en) * 2018-09-04 2019-03-08 北京睿企信息科技有限公司 System and method for distributed artificial intelligence application development
CN109919684A (en) * 2019-03-18 2019-06-21 上海盛付通电子支付服务有限公司 For generating method, electronic equipment and the computer readable storage medium of information prediction model

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US10650928B1 (en) * 2017-12-18 2020-05-12 Clarify Health Solutions, Inc. Computer network architecture for a pipeline of models for healthcare outcomes with machine learning and artificial intelligence
CN110555486B (en) * 2019-09-11 2022-04-19 北京百度网讯科技有限公司 Model structure delay prediction method and device and electronic equipment
CN111191797B (en) * 2020-01-03 2023-07-28 深圳追一科技有限公司 Information processing method, information processing device, electronic equipment and storage medium
CN111488994A (en) * 2020-03-04 2020-08-04 清华大学 Positive sample learning model evaluation method and device

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109446251A (en) * 2018-09-04 2019-03-08 北京睿企信息科技有限公司 System and method for distributed artificial intelligence application development
CN109919684A (en) * 2019-03-18 2019-06-21 上海盛付通电子支付服务有限公司 For generating method, electronic equipment and the computer readable storage medium of information prediction model

Also Published As

Publication number Publication date
CN113052509A (en) 2021-06-29

Similar Documents

Publication Publication Date Title
US11562304B2 (en) Preventative diagnosis prediction and solution determination of future event using internet of things and artificial intelligence
US20180278640A1 (en) Selecting representative metrics datasets for efficient detection of anomalous data
US11972382B2 (en) Root cause identification and analysis
CN113537337A (en) Training method, abnormality detection method, apparatus, device, and storage medium
CN112990281A (en) Abnormal bid identification model training method, abnormal bid identification method and abnormal bid identification device
CN113052509B (en) Model evaluation method, model evaluation device, electronic apparatus, and storage medium
JP7170689B2 (en) Output device, output method and output program
CN117234844A (en) Cloud server abnormality management method and device, computer equipment and storage medium
CN116823164A (en) Business approval method, device, equipment and storage medium
CN116225848A (en) Log monitoring method, device, equipment and medium
CN116739605A (en) Transaction data detection method, device, equipment and storage medium
CN114706856A (en) Fault processing method and device, electronic equipment and computer readable storage medium
CN115563507A (en) Generation method, device and equipment for renewable energy power generation scene
CN112860652B (en) Task state prediction method and device and electronic equipment
CN112200602B (en) Neural network model training method and device for advertisement recommendation
CN115269315A (en) Abnormity detection method, device, equipment and medium
CN114493853A (en) Credit rating evaluation method, credit rating evaluation device, electronic device and storage medium
CN113869904A (en) Suspicious data identification method, device, electronic equipment, medium and computer program
CN113537519A (en) Method and device for identifying abnormal equipment
CN110895564A (en) Potential customer data processing method and device
CN117131405A (en) Application anomaly detection method, device, equipment and medium
CN116049508A (en) Test element information generation method, device, equipment and storage medium
CN117132233A (en) Method, device, equipment and medium for evaluating progress of information system project
CN116126831A (en) Training method of stability prediction model and database stability detection method
CN114048056A (en) Root cause positioning method, apparatus, device, medium, and program product

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant