CN109685089B - System and method for evaluating model performance - Google Patents

System and method for evaluating model performance

Info

Publication number
CN109685089B
CN109685089B (granted publication of application CN201710971628.XA)
Authority
CN
China
Prior art keywords
model
evaluation
performance
task
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201710971628.XA
Other languages
Chinese (zh)
Other versions
CN109685089A (en)
Inventor
谢慧霞
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Original Assignee
Beijing Jingdong Century Trading Co Ltd
Beijing Jingdong Shangke Information Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Jingdong Century Trading Co Ltd and Beijing Jingdong Shangke Information Technology Co Ltd
Priority to CN201710971628.XA
Publication of CN109685089A
Application granted
Publication of CN109685089B
Legal status: Active

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F18/00 Pattern recognition
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06Q INFORMATION AND COMMUNICATION TECHNOLOGY [ICT] SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES; SYSTEMS OR METHODS SPECIALLY ADAPTED FOR ADMINISTRATIVE, COMMERCIAL, FINANCIAL, MANAGERIAL OR SUPERVISORY PURPOSES, NOT OTHERWISE PROVIDED FOR
    • G06Q10/00 Administration; Management
    • G06Q10/10 Office automation; Time management
    • G06Q10/103 Workflow collaboration or project management
    • H ELECTRICITY
    • H04 ELECTRIC COMMUNICATION TECHNIQUE
    • H04L TRANSMISSION OF DIGITAL INFORMATION, e.g. TELEGRAPHIC COMMUNICATION
    • H04L67/00 Network arrangements or protocols for supporting network services or applications
    • H04L67/50 Network services
    • H04L67/55 Push-based network services

Abstract

The application discloses a system and a method for evaluating model performance, relating to the technical field of computer information processing. The system comprises: a model management module used for registering the model according to a predetermined model registration protocol and providing a uniform model calling interface; a data management module used for storing model data, the model data comprising training data and test data; and a performance evaluation module used for acquiring the model through the model calling interface and evaluating the performance of the model using the model data. The system and the method can evaluate models automatically, reducing manual effort and labor cost.

Description

System and method for evaluating model performance
Technical Field
The invention relates to the field of computer information processing, in particular to a system and a method for evaluating model performance.
Background
Machine learning is the core of artificial intelligence. In general, machine learning realizes a machine's prediction function through different mathematical models built into the machine for different problems. The general flow of machine learning is: design the model, train the model, validate the model, evaluate the model, and update the model or deploy it online. Model performance evaluation is one of the key steps, since it indicates how successful the trained model is. Realizing continuous evaluation of model performance and adjusting the corresponding parameters and algorithms of the model in a timely manner is therefore an important task.
An existing model performance evaluation test comprises the following steps: 1. manually build the model and select a data set; 2. divide the data set into two parts, a training set and a validation set; 3. train the model using the training set; 4. after training is finished, verify the model using the validation set; 5. evaluate the performance of the model on the validation set.
The disadvantages of the prior art are as follows: 1. model training and verification are performed by the same group of people, so there is a possibility of falsifying performance data and the data reliability is low; 2. the whole process requires manual participation and the models are operated by hand, so when the number of models is large the evaluation workload is huge; 3. once an evaluation starts, its completion time cannot be predicted, completion must be detected manually, delays are possible, and the evaluation result cannot be obtained in time; 4. when the data set or the algorithm is updated, the model must be re-evaluated manually; it cannot be re-evaluated actively, so the performance change of the model cannot be continuously tracked, the change trend cannot be monitored, and no abnormality early warning can be performed.
Therefore, a new system and method for evaluating model performance is needed.
The above information disclosed in this background section is only for enhancement of understanding of the background of the invention, and therefore it may contain information that does not constitute prior art already known to a person of ordinary skill in the art.
Disclosure of Invention
In view of this, the present invention provides a system and a method for evaluating model performance, which can automatically evaluate a model, reduce manual effort, and lower labor cost.
Additional features and advantages of the invention will be set forth in the detailed description which follows, or may be learned by practice of the invention.
According to an aspect of the present invention, there is provided a system for evaluating the performance of a model, the system comprising: a model management module used for registering the model according to a predetermined model registration protocol and providing a uniform model calling interface; a data management module used for storing model data, the model data comprising training data and test data; and a performance evaluation module used for acquiring the model through the model calling interface and evaluating the performance of the model using the model data.
In an exemplary embodiment of the present disclosure, the system further comprises: a rule setting module used for storing model processing rules and alarm rules, wherein the model processing rules comprise predetermined-time processing rules.
In an exemplary embodiment of the present disclosure, the system further comprises: a pushing task module used for executing pushing tasks, wherein the pushing tasks comprise an alarm pushing task and an evaluation report task.
In an exemplary embodiment of the present disclosure, the system further comprises: an evaluation task module used for analyzing and processing the performance evaluation result of the model through the rules of the rule setting module, and for calling the pushing task module to push alarm information when the evaluation result of the model meets the alarm condition.
In an exemplary embodiment of the present disclosure, the model management module includes: a model issuing submodule used for recording the historical information and release information of the model and creating a model evaluation task when a predetermined condition is met; and a model interface submodule used for providing the model calling interface of the model according to the stored information about the model.
In an exemplary embodiment of the present disclosure, the performance evaluation module includes: an evaluation algorithm submodule used for providing an evaluation algorithm of the model; and a model evaluation submodule used for evaluating and analyzing the model through the evaluation algorithm of the model.
In an exemplary embodiment of the present disclosure, the training data is in a 1:1 relationship with the test data and is from the same batch of data sets.
According to an aspect of the present invention, there is provided a method of evaluating performance of a model, the method including: obtaining a model to be evaluated; the model evaluation system automatically evaluates the performance of the model through a preset evaluation algorithm; and performing subsequent processing according to the performance evaluation result of the model.
In an exemplary embodiment of the present disclosure, further comprising: and detecting preset parameters, and automatically evaluating the performance of the model when the state of the preset parameters is updated, wherein the preset parameters comprise model change parameters and algorithm change parameters.
In an exemplary embodiment of the present disclosure, the model evaluation system automatically performs performance evaluation on the model through a predetermined evaluation algorithm, including: and the model evaluation system automatically evaluates the performance of the model through an accuracy algorithm.
In an exemplary embodiment of the disclosure, the performing subsequent processing according to the performance evaluation result of the model includes: judging whether an alarm condition is met according to the performance evaluation result; and when the alarm condition is met, pushing an alarm message to the user.
In an exemplary embodiment of the disclosure, the performing subsequent processing according to the performance evaluation result of the model further includes: generating an evaluation report according to the evaluation result; pushing the assessment report to a user.
According to an aspect of the present invention, there is provided an electronic apparatus including: one or more processors; and storage means for storing one or more programs, wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method as above.
According to an aspect of the invention, a computer-readable medium is proposed, on which a computer program is stored, wherein the program, when executed by a processor, implements the method as above.
According to the system and the method for evaluating model performance, the model can be evaluated automatically, reducing manual effort and labor cost.
It is to be understood that both the foregoing general description and the following detailed description are exemplary and explanatory only and are not restrictive of the invention, as claimed.
Drawings
The above and other objects, features and advantages of the present invention will become more apparent by describing in detail exemplary embodiments thereof with reference to the attached drawings. The drawings described below are only some embodiments of the invention and other drawings may be derived from those drawings by a person skilled in the art without inventive effort.
FIG. 1 is an architecture diagram illustrating a system for evaluating model performance in accordance with an exemplary embodiment.
FIG. 2 is a block diagram illustrating a system for evaluating model performance in accordance with an exemplary embodiment.
FIG. 3 is a block diagram of a system for evaluating model performance according to another exemplary embodiment.
FIG. 4 is a flow diagram illustrating a method of evaluating model performance in accordance with an exemplary embodiment.
FIG. 5 is a flow chart illustrating a method of evaluating model performance according to another exemplary embodiment.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
FIG. 7 is a schematic diagram illustrating a computer readable medium according to an example embodiment.
DETAILED DESCRIPTION OF EMBODIMENT(S) OF THE INVENTION
Example embodiments will now be described more fully with reference to the accompanying drawings. Example embodiments may, however, be embodied in many different forms and should not be construed as limited to the embodiments set forth herein; rather, these embodiments are provided so that this disclosure will be thorough and complete, and will fully convey the concept of example embodiments to those skilled in the art. The same reference numerals denote the same or similar parts in the drawings, and thus, a repetitive description thereof will be omitted.
Furthermore, the described features, structures, or characteristics may be combined in any suitable manner in one or more embodiments. In the following description, numerous specific details are provided to provide a thorough understanding of embodiments of the invention. One skilled in the relevant art will recognize, however, that the invention may be practiced without one or more of the specific details, or with other methods, components, devices, steps, and so forth. In other instances, well-known methods, devices, implementations or operations have not been shown or described in detail to avoid obscuring aspects of the invention.
The block diagrams shown in the figures are functional entities only and do not necessarily correspond to physically separate entities. That is, these functional entities may be implemented in the form of software, or in one or more hardware modules or integrated circuits, or in different networks and/or processor means and/or microcontroller means.
The flow charts shown in the drawings are merely illustrative and do not necessarily include all of the contents and operations/steps, nor do they necessarily have to be performed in the order described. For example, some operations/steps may be decomposed, and some operations/steps may be combined or partially combined, so that the actual execution sequence may be changed according to the actual situation.
It will be understood that, although the terms first, second, third, etc. may be used herein to describe various components, these components should not be limited by these terms; the terms are used only to distinguish one component from another. Thus, a first component discussed below may be termed a second component without departing from the teachings of the disclosed concept. As used herein, the term "and/or" includes any and all combinations of one or more of the associated listed items.
It will be appreciated by those skilled in the art that the drawings are merely schematic representations of exemplary embodiments, and that the blocks or flow charts in the drawings are not necessarily required to practice the present invention and are, therefore, not intended to limit the scope of the present invention.
The following detailed description of exemplary embodiments of the disclosure refers to the accompanying drawings.
FIG. 1 is an architecture diagram illustrating a system for evaluating model performance in accordance with an exemplary embodiment.
As shown in fig. 1, the system architecture 100 may include terminal devices 101, 102, 103, a network 104, and a server 105. The network 104 serves as a medium for providing communication links between the terminal devices 101, 102, 103 and the server 105. Network 104 may include various connection types, such as wired, wireless communication links, or fiber optic cables, to name a few.
The user may use the terminal devices 101, 102, 103 to interact with the server 105 via the network 104 to receive or send messages or the like. The terminal devices 101, 102, 103 may have various communication client applications installed thereon, such as a shopping application, a web browser application, a search application, an instant messaging tool, a mailbox client, social platform software, and the like.
The terminal devices 101, 102, 103 may be various electronic devices having a display screen and supporting web browsing, including but not limited to smart phones, tablet computers, laptop portable computers, desktop computers, and the like.
The server 105 may be a server that provides various services, such as processing a model evaluation test request submitted by a user using the terminal devices 101, 102, 103. The background management server may analyze and otherwise process the received data such as the information query request, and feed back a processing result (e.g., an evaluation result) to the terminal device.
It should be noted that the model evaluation method provided in the embodiments of the present application is generally executed by the server 105, and accordingly the evaluation result is generally sent back to the terminal devices 101, 102, 103.
It should be understood that the number of terminal devices, networks, and servers in fig. 1 is merely illustrative. There may be any number of terminal devices, networks, and servers, as desired for implementation.
FIG. 2 is a block diagram illustrating a system for evaluating model performance in accordance with an exemplary embodiment.
The model management module 202 is configured to register models according to a predetermined model registration protocol and to provide a uniform model call interface. For example, the model management module 202 is mainly responsible for management of the models: it stores the unified model registration protocol, developers register their models according to that protocol, release information is recorded after a model is changed and released, and the module may also store the address information of each model prediction interface.
Model management module 202 may also include, for example: the model issuing submodule is used for recording the historical information and issuing information of the model and creating a model evaluation task when a preset condition is met; and the model interface submodule is used for providing a stored model calling interface of the model according to the stored relevant information of the model.
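For illustration only, the following minimal Python sketch shows one possible shape of such a registration protocol and of the addressing of a model calling interface; the names (ModelRecord, ModelRegistry, register, addressing, predict_url) are assumptions of this sketch and are not defined by the disclosure.

```python
from dataclasses import dataclass, field
from typing import Dict, List

@dataclass
class ModelRecord:
    """Registration record kept by the model management module (assumed schema)."""
    name: str
    version: str
    predict_url: str  # address of the model prediction interface
    release_history: List[str] = field(default_factory=list)

class ModelRegistry:
    """Stores models registered under a unified protocol and resolves their call interfaces."""

    def __init__(self) -> None:
        self._models: Dict[str, ModelRecord] = {}

    def register(self, record: ModelRecord) -> None:
        """Register a model, recording release information on every (re-)release."""
        existing = self._models.get(record.name)
        if existing is not None:
            existing.release_history.append(record.version)
            existing.predict_url = record.predict_url
        else:
            record.release_history.append(record.version)
            self._models[record.name] = record

    def addressing(self, name: str) -> str:
        """Return the prediction-interface address stored for a registered model."""
        return self._models[name].predict_url
```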
The data management module 204 is configured to store model data, which includes training data and test data. The data management module 204 may, for example, be used for storage management of the data used in model evaluation, which may be uploaded by a tester. In order to reduce evaluation errors caused by the data itself, the data set corresponding to a model may be divided into a training set (training data) and a test set (test data). Training data: the data in the training set is used to train the model. Test data: the data in the test set is used to verify the model and evaluate its performance; the test data has a 1:1 relationship with the training data and comes from the same batch of data sets.
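A minimal sketch of the 1:1 split described above, assuming scikit-learn is available (the function name split_dataset is illustrative):

```python
from sklearn.model_selection import train_test_split

def split_dataset(features, labels, seed=42):
    """Split one batch of data into training and test halves in a 1:1 ratio."""
    X_train, X_test, y_train, y_test = train_test_split(
        features, labels, test_size=0.5, random_state=seed)
    return (X_train, y_train), (X_test, y_test)
```

Because both halves come from the same batch and the split is 1:1, differences in evaluation results are less likely to stem from the data itself.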
The performance evaluation module 206 is configured to obtain the model through the model call interface and perform performance evaluation on the model through the model data. The performance evaluation module 206 traverses all models in the task, calls a model prediction interface corresponding to the model to obtain a prediction result, and may analyze the prediction result using an evaluation algorithm, for example, to perform performance evaluation on the model.
The performance evaluation module 206 may, for example, include an evaluation algorithm sub-module for providing an evaluation algorithm for the model; and the model evaluation submodule is used for evaluating and analyzing the model through an evaluation algorithm of the model.
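As a rough sketch of the evaluation described above, assuming the prediction interface is an HTTP endpoint and the response carries a "prediction" field (both are assumptions of this sketch, not details fixed by the disclosure):

```python
import requests  # assumed HTTP client for calling the model prediction interface

def evaluate_accuracy(predict_url, test_records):
    """Call the prediction interface for every test record and measure accuracy.

    Each record is assumed to look like {"input": ..., "label": ...}.
    """
    correct = 0
    for record in test_records:
        resp = requests.post(predict_url, json={"input": record["input"]}, timeout=10)
        resp.raise_for_status()
        if resp.json().get("prediction") == record["label"]:
            correct += 1
    return correct / len(test_records)
```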
According to the system for evaluating model performance, model training is isolated from model evaluation by providing an independent model evaluation system, which can improve data reliability; model performance evaluation is executed automatically according to rules designated by the user, so the model can be evaluated automatically, reducing manual effort and labor cost.
In an exemplary embodiment of the present disclosure, the system further comprises a rule setting module used for storing model processing rules and alarm rules, wherein the model processing rules comprise predetermined-time processing rules. The model processing rules may include, for example: the engine detects the matching relationship between the task scheduling rule and the current time, and notifies the execution engine to execute the evaluation task when the time specified by the scheduling rule is reached. The alarm rules include: after the model evaluation is finished, the rule engine detects whether the evaluation result matches an alarm rule, and if so, notifies the execution engine to execute the alarm task.
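The two rule types can be pictured with the following minimal sketch; the task and result field names (next_run, accuracy, min_accuracy) are assumptions:

```python
from datetime import datetime

def schedule_due(task, now=None):
    """Scheduling rule: has the time specified by the task's scheduling rule been reached?"""
    now = now or datetime.now()
    return now >= task["next_run"]  # assumed task field

def alarm_triggered(result, alarm_rule):
    """Alarm rule: does the evaluation result match the alarm condition?"""
    return result["accuracy"] < alarm_rule["min_accuracy"]
```

When schedule_due holds, the rule engine would notify the execution engine to run the evaluation task; when alarm_triggered holds after an evaluation, it would notify the execution engine to run the alarm task.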
In an exemplary embodiment of the present disclosure, the system further comprises a pushing task module used for executing pushing tasks, wherein the pushing tasks comprise an alarm pushing task and an evaluation report task. The alarm pushing task may include, for example: a mail engine, which pushes evaluation reports and alarm mails, and a short message engine, which pushes alarm short messages. Alarm mail: when the model evaluation result matches an alarm rule, an alarm mail is generated according to the alarm-receiving mailbox configured for the task. Alarm short message: when the model evaluation result matches an alarm rule, an alarm short message is generated according to the alarm-receiving mobile phone number configured for the task.
The evaluation reporting task may, for example, include: and generating a corresponding evaluation report according to the evaluation result of the model in the task, wherein the report shows the model, the detailed evaluation result of the model and the evaluation trend of the model under the task.
According to the system for evaluating the model performance, the evaluation result mail is automatically generated and synchronously sent after the evaluation is finished, so that the delay of obtaining the evaluation result is reduced.
In an exemplary embodiment of the present disclosure, the system further comprises an evaluation task module used for analyzing and processing the performance evaluation result of the model through the rules of the rule setting module, and for calling the pushing task module to push alarm information when the evaluation result of the model meets the alarm condition. Evaluation task management is mainly used for the configuration of evaluation tasks, task queue management, and evaluation result management; an evaluation task can be created by a tester. Task configuration comprises the task name, the task association model, task scheduling rules, alarm rules, and result recipients. A task has a 1:N relationship with models. The task execution rule specifies the specific execution time of the task and whether it is executed repeatedly, and may be given as a Cron-like expression. An alarm rule is set so that an alarm is raised when the evaluation result reaches a specified state or threshold. The result recipient designates the receivers of the evaluation report and the alarm information, configured as mobile phone numbers for short messages and mailboxes for mails.
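For illustration, such a task configuration could be represented as follows; this is a minimal sketch whose field names and default values are assumptions:

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class EvaluationTask:
    """Assumed shape of an evaluation task configuration."""
    name: str
    model_names: List[str]                 # one task relates to N models (1:N)
    schedule: str = "0 0 2 * * *"          # Cron-like execution rule (example value)
    min_accuracy: float = 0.9              # alarm threshold
    mail_recipients: List[str] = field(default_factory=list)
    sms_recipients: List[str] = field(default_factory=list)
```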
According to the system for evaluating model performance, the alarm rules set by the user (for example, a fluctuation amplitude) reduce the possibility of false alarms on the evaluation result caused by the data itself.
FIG. 3 is a block diagram of a system for evaluating model performance according to another exemplary embodiment.
As shown in fig. 3, the system for evaluating model performance includes a model management module 302 used for managing the models: it unifies the model registration protocol, developers register models according to the corresponding protocol, release information is recorded after a model is changed and released, and the address information of the model prediction interfaces is stored.
The model registration protocol 3022 stipulates the model registration mode and makes explicit provision for how a model is bound to the system. The system can obtain model information in the manner specified by the protocol.
When the model is released, the model release management 3024 records model history information and release information, and actively creates a model evaluation task.
The model addressing service 3026 receives model information and returns the prediction interface corresponding to the model; the system calls the prediction interface to obtain the model prediction result.
The data set management module 304 manages the data sets used for model evaluation, which are mainly uploaded by testers. In order to reduce evaluation errors caused by the data itself, the data set corresponding to a model is divided into a training set and a test set. Training set management: the data in the training set is used to train the model. Test set management: the data in the test set is used to verify the model and evaluate its performance; it has a 1:1 relationship with the training set and comes from the same batch of data sets.
The evaluation task management 306 is used for configuration of evaluation tasks, task queue management, and evaluation result management. The evaluation task is created by the tester.
Task configuration 3062 includes the task name, the task association model, task scheduling rules, alarm rules, and result recipients. A task has a 1:N relationship with models. The task execution rule specifies the specific execution time of the task and whether it is executed repeatedly, and may be given as a Cron-like expression. An alarm rule is set so that an alarm is raised when the evaluation result reaches a specified state or threshold. The result recipient designates the receivers of the evaluation report and the alarm information, configured as mobile phone numbers for short messages and mailboxes for mails. A Cron expression is a character string expressing a time dimension; it is composed of 6 (or 7) fields separated by spaces and can represent a time point or a time range. Different timing-task frameworks may use different syntax formats.
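To make the Cron matching concrete, here is a deliberately simplified matcher for the 6-field form (seconds first); real Cron dialects also support ranges, steps, and '?', which are omitted here:

```python
from datetime import datetime

def field_matches(spec: str, value: int) -> bool:
    """Match one Cron field: '*' or a comma-separated list of integers."""
    return spec == "*" or value in {int(p) for p in spec.split(",")}

def cron_matches(expr: str, t: datetime) -> bool:
    """Match a 6-field expression: second minute hour day month weekday."""
    fields = expr.split()
    values = (t.second, t.minute, t.hour, t.day, t.month, t.isoweekday() % 7)
    return all(field_matches(f, v) for f, v in zip(fields, values))

# Example: "0 0 2 * * *" matches every day at 02:00:00.
# cron_matches("0 0 2 * * *", datetime(2017, 10, 18, 2, 0, 0))  -> True
```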
After creation of an evaluation task is completed, task queue management 3064 pushes the corresponding task to the evaluation task queue.
After the execution of the task is completed, the evaluation result management 3066 pushes the evaluation report and the alarm information according to the alarm rule configured by the user and the result receiver information.
The rules engine module 308 contains detection of task scheduling rules and alarm rules.
The task scheduling rule engine 3082 detects the matching relationship between the task scheduling rule and the current time, and notifies the execution engine to execute the evaluation task when the time specified by the scheduling rule is reached.
After a model evaluation finishes, the alarm rule detection 3084 of the rule engine 308 checks whether the evaluation result matches the alarm rule and, if so, notifies the execution engine to execute the alarm task.
The evaluation task queue 310 is used to store evaluation tasks to be executed.
The execution engine 312 traverses the evaluation task queue 310, performs task scheduling rule detection using the rule engine 308, and executes an evaluation task once its execution condition is satisfied. After the evaluation task is completed, a task evaluation report is generated and inserted into the push task queue 314; the rule engine 308 is used to match the alarm rules, and after an alarm condition is met the alarm task is executed, generating alarm information that is inserted into the push task queue 314.
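One pass of the execution engine over the queues might look like the following sketch; the rule_engine and evaluator objects and the queue item format are assumptions:

```python
def run_execution_engine(eval_queue, push_queue, rule_engine, evaluator):
    """Traverse the evaluation task queue once, run due tasks, and queue push tasks."""
    pending = []
    while not eval_queue.empty():
        task = eval_queue.get()
        if not rule_engine.schedule_due(task):      # scheduling rule not yet satisfied
            pending.append(task)
            continue
        results = evaluator.evaluate(task)          # evaluate all models in the task
        push_queue.put({"type": "report", "task": task, "results": results})
        if rule_engine.alarm_triggered(results, task):
            push_queue.put({"type": "alarm", "task": task, "results": results})
    for task in pending:                            # re-queue tasks that were not due
        eval_queue.put(task)
```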
Model evaluation 3124 traverses all models in the task, calls the model prediction interface corresponding to each model to obtain the prediction result, and analyzes the prediction result using an evaluation algorithm to evaluate the performance of the model; for example, an accuracy algorithm may be used.
The push task queue 314 stores evaluation reports and alarm information to be pushed.
The push engine 316 traverses the push task queue and executes the corresponding push tasks. The push tasks include the evaluation report: a corresponding evaluation report is generated according to the evaluation results of the models in the task, and the report shows the models under the task, their detailed evaluation results, and their evaluation trends.
The mail engine 3162 pushes evaluation reports and alarm mails: when the model evaluation result matches an alarm rule, an alarm mail is generated according to the alarm-receiving mailbox configured for the task.
The short message engine 3164 pushes alarm short messages: when the model evaluation result matches an alarm rule, an alarm short message is generated according to the alarm-receiving mobile phone number configured for the task.
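A rough sketch of the two engines, using Python's standard smtplib for mail; the queue item format, mail_conf, and the sms_gateway object are assumptions of this sketch:

```python
import smtplib
from email.message import EmailMessage

def send_mail(smtp_host, sender, recipients, subject, body):
    """Mail engine: send an evaluation report or alarm mail via SMTP."""
    msg = EmailMessage()
    msg["From"], msg["To"], msg["Subject"] = sender, ", ".join(recipients), subject
    msg.set_content(body)
    with smtplib.SMTP(smtp_host) as server:
        server.send_message(msg)

def run_push_engine(push_queue, mail_conf, sms_gateway=None):
    """Traverse the push task queue and execute the corresponding push task."""
    while not push_queue.empty():
        item = push_queue.get()
        subject = "Alarm" if item["type"] == "alarm" else "Evaluation report"
        send_mail(mail_conf["host"], mail_conf["sender"],
                  item["recipients"], subject, str(item["results"]))
        if item["type"] == "alarm" and sms_gateway is not None:
            sms_gateway.send(item["recipients"], subject)  # assumed SMS gateway API
```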
Those skilled in the art will appreciate that the modules described above may be distributed in the apparatus as described in the embodiments, or may be located, with corresponding changes, in one or more apparatuses different from those of the embodiments. The modules of the above embodiments may be combined into one module or further split into multiple sub-modules.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present invention can be embodied in the form of a software product, which can be stored in a non-volatile storage medium (which can be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which can be a personal computer, a server, a mobile terminal, or a network device, etc.) to execute the method according to the embodiments of the present invention.
FIG. 4 is a flow diagram illustrating a method of evaluating model performance in accordance with an exemplary embodiment.
As shown in fig. 4, in S402, a model to be evaluated is acquired. The model may be, for example, a model obtained after machine learning. The model to be evaluated may be uploaded, for example, by a developer.
In S404, the model evaluation system automatically performs performance evaluation on the model through a predetermined evaluation algorithm. The model evaluation system may, for example, automatically perform a performance evaluation on the model via an accuracy algorithm.
In S406, subsequent processing is performed according to the performance evaluation result of the model. This comprises: judging whether an alarm condition is met according to the performance evaluation result, and pushing an alarm message to the user when the alarm condition is met. It further comprises: generating an evaluation report according to the evaluation result and pushing the evaluation report to the user.
According to the method for evaluating model performance, model training is isolated from model evaluation through an independent model evaluation system, which improves data reliability; model performance evaluation is executed automatically according to user-specified rules, reducing manual participation and saving human resources; and an evaluation result mail is automatically generated and sent synchronously after the evaluation finishes, reducing the delay in obtaining the evaluation result.
In an exemplary embodiment of the present disclosure, further comprising: and detecting preset parameters, and automatically evaluating the performance of the model when the state of the preset parameters is updated, wherein the preset parameters comprise model change parameters and algorithm change parameters.
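A minimal sketch of this change detection (the registry layout and the reevaluate callback are assumptions):

```python
def detect_updates(model_registry, last_seen, reevaluate):
    """Re-evaluate a model whenever its model or algorithm version has changed."""
    for name, record in model_registry.items():    # record: {"version": ...}
        if last_seen.get(name) != record["version"]:
            reevaluate(name)                       # schedule a new evaluation task
            last_seen[name] = record["version"]
```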
According to the method for evaluating model performance, when the data set or the algorithm is updated the model is automatically re-evaluated, the model is observed continuously, performance trend analysis of the model is realized, and the fluctuation percentage within the normal range is calculated.
It should be clearly understood that the present disclosure describes how to make and use particular examples, but the principles of the present disclosure are not limited to any details of these examples. Rather, these principles can be applied to many other embodiments based on the teachings of the present disclosure.
FIG. 5 is a flow chart illustrating a method of evaluating model performance according to another exemplary embodiment.
In S502, the developer registers the model, specifying the communication protocol and the prediction interface corresponding to the model.
In S504, the model is changed.
In S506, the tester uploads the data set. The data set is uploaded by the tester, split into a training set and a test set, and matched with the corresponding model.
In S508, a task is created. After selecting the model to be tested, the alarm rule (alarm threshold and alarm receivers: mailbox and mobile phone number) and the task execution rule of the model are specified, and a test task is created. The task execution rule adopts a Cron expression, and the user can select one model, several models, or all models at a time.
In S510, a task queue is generated. The tasks are executed by a Worker: the rule engine polls the tasks at regular intervals to generate the task queue, and a to-be-executed task list is generated when a task's execution time has been reached and the task has not yet been completed. The Worker is a task executor; this function can also be realized through various other common timing-task facilities, such as the Java Timer or Quartz.
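As an analogy to those timing facilities, a minimal polling Worker can be sketched with Python's standard threading.Timer; generate_task_queue is a placeholder for the rule engine's polling step:

```python
import threading

def start_worker(interval_seconds, generate_task_queue):
    """Poll the tasks at a fixed interval, re-arming the timer after each tick."""
    def tick():
        generate_task_queue()
        threading.Timer(interval_seconds, tick).start()
    tick()
```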
In S512, the task queue is traversed.
In S514, the models are traversed. The to-be-executed task list is traversed; when a task is neither locked nor suspended, the models in the task are traversed, the corresponding interface is called according to the protocol to obtain the model's predictions, and the performance evaluation of the model is executed.
In S516, an evaluation result is generated. The task is locked before performance evaluation is carried out on each model and unlocked after the performance evaluation finishes. After the whole task succeeds, an evaluation result mail is generated and inserted into the push task queue; when the evaluation result matches an alarm rule, an alarm mail and an alarm short message are generated and inserted into the push task queue. a) According to the configured execution rule, one test task can be executed multiple times, producing multiple test records that correspond to different evaluation results. b) During task execution, the user can choose to suspend the task; after the user selects suspension, the automatic test module detects the suspension information and pauses the evaluation of the models under the task. A suspended task can be resumed. c) The user can edit and modify the task information at any time without affecting a test task that is currently performing model evaluation.
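The lock-evaluate-unlock step can be pictured with the following minimal sketch; an in-process threading.Lock stands in for whatever locking a real deployment would use (for example, a flag on the task record):

```python
import threading

task_locks = {}  # task id -> Lock (illustrative in-process locking)

def evaluate_task_models(task_id, models, evaluate_one):
    """Lock the task while its models are evaluated; unlock when finished."""
    lock = task_locks.setdefault(task_id, threading.Lock())
    with lock:                       # task locked before performance evaluation
        return [evaluate_one(m) for m in models]
```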
In S518, an evaluation report is generated.
In S520, an alert mail and a short message are generated.
In S522, the report and the alarm information are transmitted. The push engine polls the push task queue at regular intervals and actively pushes the information when there are tasks to be pushed.
Those skilled in the art will appreciate that all or part of the steps implementing the above embodiments may be implemented as computer programs executed by a CPU. When executed by the CPU, the computer program performs the functions defined by the method provided by the present invention. The program may be stored in a computer readable storage medium, which may be a read-only memory, a magnetic disk, an optical disk, or the like.
Furthermore, it should be noted that the above-mentioned figures are only schematic illustrations of the processes involved in the method according to exemplary embodiments of the invention, and are not intended to be limiting. It will be readily understood that the processes shown in the above figures are not intended to indicate or limit the chronological order of the processes. In addition, it is also readily understood that these processes may be performed synchronously or asynchronously, e.g., in multiple modules.
FIG. 6 is a block diagram illustrating an electronic device in accordance with an example embodiment.
An electronic device 200 according to this embodiment of the invention is described below with reference to fig. 6. The electronic device 200 shown in fig. 6 is only an example, and should not bring any limitation to the functions and the scope of use of the embodiments of the present invention.
As shown in fig. 6, the electronic device 200 is embodied in the form of a general purpose computing device. The components of the electronic device 200 may include, but are not limited to: at least one processing unit 210, at least one memory unit 220, a bus 230 connecting different system components (including the memory unit 220 and the processing unit 210), a display unit 240, and the like.
Wherein the storage unit stores program code executable by the processing unit 210, so that the processing unit 210 performs the steps according to various exemplary embodiments of the present invention described in the method sections of this specification. For example, the processing unit 210 may perform the steps shown in fig. 4.
The memory unit 220 may include readable media in the form of volatile memory units, such as a random access memory unit (RAM)2201 and/or a cache memory unit 2202, and may further include a read only memory unit (ROM) 2203.
The storage unit 220 may also include a program/utility 2204 having a set (at least one) of program modules 2205, such program modules 2205 including, but not limited to: an operating system, one or more application programs, other program modules, and program data, each of which, or some combination thereof, may comprise an implementation of a network environment.
Bus 230 may be one or more of several types of bus structures, including a memory unit bus or memory unit controller, a peripheral bus, an accelerated graphics port, a processing unit, or a local bus using any of a variety of bus architectures.
The electronic device 200 may also communicate with one or more external devices 300 (e.g., keyboard, pointing device, bluetooth device, etc.), with one or more devices that enable a user to interact with the electronic device 200, and/or with any devices (e.g., router, modem, etc.) that enable the electronic device 200 to communicate with one or more other computing devices. Such communication may occur via an input/output (I/O) interface 250. Also, the electronic device 200 may communicate with one or more networks (e.g., a Local Area Network (LAN), a Wide Area Network (WAN), and/or a public network such as the Internet) via the network adapter 260. The network adapter 260 may communicate with other modules of the electronic device 200 via the bus 230. It should be appreciated that although not shown in the figures, other hardware and/or software modules may be used in conjunction with the electronic device 200, including but not limited to: microcode, device drivers, redundant processing units, external disk drive arrays, RAID systems, tape drives, and data backup storage systems, among others.
Through the above description of the embodiments, those skilled in the art will readily understand that the exemplary embodiments described herein may be implemented by software, or by software in combination with necessary hardware. Therefore, the technical solution according to the embodiments of the present disclosure may be embodied in the form of a software product, which may be stored in a non-volatile storage medium (which may be a CD-ROM, a USB disk, a removable hard disk, etc.) or on a network, and includes several instructions to enable a computing device (which may be a personal computer, a server, or a network device, etc.) to execute the method according to the embodiments of the present disclosure.
FIG. 7 is a schematic diagram illustrating a computer readable medium according to an example embodiment.
Referring to fig. 7, a program product 400 for implementing the above method according to an embodiment of the present invention is described. It may employ a portable compact disc read-only memory (CD-ROM), include program code, and run on a terminal device such as a personal computer. However, the program product of the present invention is not limited thereto; in this document, a readable storage medium may be any tangible medium that can contain or store a program for use by or in connection with an instruction execution system, apparatus, or device.
The program product may employ any combination of one or more readable media. The readable medium may be a readable signal medium or a readable storage medium. A readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any combination of the foregoing. More specific examples (a non-exhaustive list) of the readable storage medium include: an electrical connection having one or more wires, a portable disk, a hard disk, a Random Access Memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing.
A computer readable signal medium may include a propagated data signal with readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated data signal may take many forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A readable signal medium may also be any readable medium other than a readable storage medium that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device. Program code embodied on a readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.
Program code for carrying out operations for aspects of the present invention may be written in any combination of one or more programming languages, including an object oriented programming language such as Java or C++, as well as conventional procedural programming languages such as the "C" programming language or similar programming languages. The program code may execute entirely on the user's computing device, partly on the user's device, as a stand-alone software package, partly on the user's computing device and partly on a remote computing device, or entirely on the remote computing device or server. In the case of a remote computing device, the remote computing device may be connected to the user computing device through any kind of network, including a local area network (LAN) or a wide area network (WAN), or may be connected to an external computing device (e.g., through the internet using an internet service provider).
The computer readable medium carries one or more programs which, when executed by a device, cause the computer readable medium to perform the functions of: obtaining a model to be evaluated; the model evaluation system automatically evaluates the performance of the model through a preset evaluation algorithm; and performing subsequent processing according to the performance evaluation result of the model.
Those skilled in the art will readily appreciate from the foregoing detailed description that the systems and methods for assessing performance of a model according to embodiments of the present invention may have one or more of the following advantages.
According to some embodiments, by isolating model training from model evaluation, the method and the device can reduce the risk of artificial data falsification and improve data credibility.
According to some embodiments, the system of the invention can automatically perform model performance evaluation, reducing manual effort and labor cost.
According to some embodiments, the system of the invention can monitor the state of the model or of the data, automatically evaluate the model after the data set or the model is updated, continuously track the performance change of the model, realize performance trend analysis of the model, quickly locate the defects of the model, and shorten the training time before the model goes online as much as possible.
Exemplary embodiments of the present invention are specifically illustrated and described above. It is to be understood that the invention is not limited to the precise construction, arrangements, or instrumentalities described herein; on the contrary, the invention is intended to cover various modifications and equivalent arrangements included within the spirit and scope of the appended claims.
In addition, the structures, proportions, sizes, and the like shown in the drawings of this specification are only used to match the contents disclosed in the specification, for understanding and reading by those skilled in the art, and are not used to limit the conditions under which the present disclosure can be implemented, so they carry no technical significance in themselves; any modification of the structures, change of the proportional relations, or adjustment of the sizes should still fall within the scope covered by the technical contents disclosed herein, provided the technical effects and the objects achievable by the present disclosure are not affected. The terms "above", "first", "second", and "a" as used in this specification are for clarity of description only and are not intended to limit the scope of the present disclosure; changes or adjustments of their relative relationships, without substantial changes to the technical content, shall also be regarded as within the implementable scope.

Claims (13)

1. A system for evaluating performance of a model, comprising:
the model management module is used for registering the model according to a preset model registration protocol and providing a uniform model calling interface;
the data management module is used for storing model data, and the model data comprises training data and test data;
the rule setting module is used for storing model processing rules and alarm rules, wherein the model processing rules comprise preset time processing rules; and
and the performance evaluation module is used for acquiring the model through the model calling interface and evaluating the performance of the model through the model data.
2. The system of claim 1, further comprising:
and the pushing task module is used for executing pushing tasks, wherein the pushing tasks comprise an alarm pushing task and an evaluation report task.
3. The system of claim 2, further comprising:
and the evaluation task module is used for analyzing and processing the performance evaluation result of the model through the rules of the rule setting module and calling the pushing task module to push the alarm information when the evaluation result of the model meets the alarm condition.
4. The system of claim 1, wherein the model management module comprises:
the model issuing submodule is used for recording the historical information and issuing information of the model and creating a model evaluation task when a preset condition is met; and
and the model interface submodule is used for providing a stored model calling interface of the model according to the stored relevant information of the model.
5. The system of claim 1, wherein the performance evaluation module comprises:
the evaluation algorithm submodule is used for providing an evaluation algorithm of the model; and
and the model evaluation submodule is used for evaluating and analyzing the model through an evaluation algorithm of the model.
6. The system of claim 1, wherein the training data is in a 1:1 relationship with the test data and is from the same batch of data sets.
7. A method of evaluating performance of a model, comprising:
obtaining a model to be evaluated;
the model evaluation system automatically evaluates the performance of the model through a preset evaluation algorithm; and
and performing subsequent processing according to the performance evaluation result of the model.
8. The method of claim 7, further comprising:
and detecting preset parameters, and automatically evaluating the performance of the model when the state of the preset parameters is updated, wherein the preset parameters comprise model change parameters and algorithm change parameters.
9. The method of claim 7, wherein the model evaluation system automatically evaluates the model for performance through a predetermined evaluation algorithm, comprising:
and the model evaluation system automatically evaluates the performance of the model through an accuracy algorithm.
10. The method of claim 7, wherein the subsequent processing based on the performance evaluation of the model comprises:
judging whether an alarm condition is met according to the performance evaluation result;
and when the alarm condition is met, pushing an alarm message to the user.
11. The method of claim 10, wherein the subsequent processing based on the performance evaluation of the model further comprises:
generating an evaluation report according to the evaluation result;
pushing the assessment report to a user.
12. An electronic device, comprising:
one or more processors;
storage means for storing one or more programs;
wherein the one or more programs, when executed by the one or more processors, cause the one or more processors to implement the method of any one of claims 7-11.
13. A computer-readable medium, on which a computer program is stored, which, when being executed by a processor, carries out the method according to any one of claims 7-11.
Application CN201710971628.XA, priority date 2017-10-18, filing date 2017-10-18: System and method for evaluating model performance. Status: Active. Granted as CN109685089B.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201710971628.XA CN109685089B (en) 2017-10-18 2017-10-18 System and method for evaluating model performance

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201710971628.XA CN109685089B (en) 2017-10-18 2017-10-18 System and method for evaluating model performance

Publications (2)

Publication Number Publication Date
CN109685089A CN109685089A (en) 2019-04-26
CN109685089B true CN109685089B (en) 2020-12-22

Family

ID=66184072

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201710971628.XA Active CN109685089B (en) 2017-10-18 2017-10-18 System and method for evaluating model performance

Country Status (1)

Country Link
CN (1) CN109685089B (en)

Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110928907A (en) * 2019-11-18 2020-03-27 第四范式(北京)技术有限公司 Target task processing method and device and electronic equipment
CN111026436B (en) * 2019-12-09 2021-04-02 支付宝(杭州)信息技术有限公司 Model joint training method and device
CN111144738A (en) * 2019-12-24 2020-05-12 太平金融科技服务(上海)有限公司 Information processing method, information processing device, computer equipment and storage medium
CN111581272B (en) * 2020-05-25 2023-08-29 泰康保险集团股份有限公司 System, method, apparatus, and computer readable medium for processing data
CN111767948B (en) * 2020-06-22 2023-08-08 北京百度网讯科技有限公司 Model interception method and device, electronic equipment and storage medium
CN112130865A (en) * 2020-09-30 2020-12-25 北京明略昭辉科技有限公司 Model management method and system
CN113271236A (en) * 2021-06-11 2021-08-17 国家计算机网络与信息安全管理中心 Engine evaluation method, device, equipment and storage medium
CN113554357A (en) * 2021-09-22 2021-10-26 北京国研科技咨询有限公司 Informatization project cost evaluation method based on big data and electronic equipment
CN114860402B (en) * 2022-05-10 2023-10-20 北京百度网讯科技有限公司 Scheduling strategy model training method, scheduling device, scheduling equipment and scheduling medium


Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20040236552A1 (en) * 2003-05-22 2004-11-25 Kimberly-Clark Worldwide, Inc. Method of evaluating products using a virtual environment

Patent Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2006042358A1 (en) * 2004-10-22 2006-04-27 In The Chair Pty Ltd A method and system for assessing a musical performance
CN101021810A (en) * 2007-03-08 2007-08-22 山东浪潮齐鲁软件产业股份有限公司 Software system performance estimating method
CN101482891A (en) * 2008-01-08 2009-07-15 富士通株式会社 Performance evaluation simulation
CN104200087A (en) * 2014-06-05 2014-12-10 清华大学 Parameter optimization and feature tuning method and system for machine learning
CN106663224A (en) * 2014-06-30 2017-05-10 亚马逊科技公司 Interactive interfaces for machine learning model evaluations
CN106169096A (en) * 2016-06-24 2016-11-30 山西大学 A kind of appraisal procedure of machine learning system learning performance
CN106250987A (en) * 2016-07-22 2016-12-21 无锡华云数据技术服务有限公司 A kind of machine learning method, device and big data platform

Also Published As

Publication number Publication date
CN109685089A (en) 2019-04-26

Similar Documents

Publication Publication Date Title
CN109685089B (en) System and method for evaluating model performance
CN109460664B (en) Risk analysis method and device, electronic equipment and computer readable medium
CN106992994B (en) Automatic monitoring method and system for cloud service
RU2713574C1 (en) Systems and devices for assessing the architecture and implementing strategies for security
CN110851342A (en) Fault prediction method, device, computing equipment and computer readable storage medium
Kaufman et al. Democratizing online controlled experiments at Booking. com
CN113157545A (en) Method, device and equipment for processing service log and storage medium
CA3123916C (en) Microapp functionality recommendations with cross-application activity correlation
US20180143897A1 (en) Determining idle testing periods
US10372572B1 (en) Prediction model testing framework
US20210365762A1 (en) Detecting behavior patterns utilizing machine learning model trained with multi-modal time series analysis of diagnostic data
WO2023200596A1 (en) Automated positive train control event data extraction and analysis engine and method therefor
CN113282920B (en) Log abnormality detection method, device, computer equipment and storage medium
CN114398465A (en) Exception handling method and device of Internet service platform and computer equipment
CN116662193A (en) Page testing method and device
WO2022022572A1 (en) Calculating developer time during development process
US11782938B2 (en) Data profiling and monitoring
CN111767290B (en) Method and apparatus for updating user portraits
CN114461499A (en) Abnormal information detection model construction method and gray scale environment abnormal detection method
CN109960659B (en) Method and device for detecting application program
CN112882948A (en) Stability testing method, device and system for application and storage medium
CN114036054A (en) Code quality evaluation method, device, equipment, medium and program product
CN113918525A (en) Data exchange scheduling method, system, electronic device, medium, and program product
Brozek et al. Application of mobile devices within distributed simulation-based decision making
CN114301713A (en) Risk access detection model training method, risk access detection method and risk access detection device

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant