CN117422155A

CN117422155A - Automatic iteration model service system and method based on automatic data screening

Info

Publication number: CN117422155A
Application number: CN202311746585.7A
Authority: CN
Inventors: 许靖; 柴磊; 郭帅; 袁靖
Original assignee: Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Current assignee: Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Priority date: 2023-12-19
Filing date: 2023-12-19
Publication date: 2024-01-19

Abstract

The invention provides an automatic iterative model service system and method based on automatic data screening. A model type and model parameters are selected based on a training task and imported into a model center to construct an initial model. Based on the data characteristics of the update data corresponding to the training task, an optimal scheduling strategy is matched from a scheduling strategy library, and self-iterative scheduling of the initial model is configured according to that strategy. An iteration task is established based on the self-iterative scheduling in combination with model management parameters, and a new model is established or the initial model is upgraded based on the iteration task to obtain a second model. The performance of the second model is evaluated, and the second model continues to be iterated based on the performance evaluation result until the result meets the requirement, at which point self-iteration stops and a final optimized model is obtained. This addresses data drift and concept drift as well as the lack of generality and portability, and improves model performance.

Description

Automatic iteration model service system and method based on automatic data screening

Technical Field

The invention relates to the fields of computer science and machine learning, in particular to an automatic iteration model service system and method based on automatic data screening.

Background

In the field of machine learning, iterative upgrading of models has long been a key challenge. Traditional machine learning models typically remain static after training and are not self-adaptive. However, rapidly changing data and real-time requirements place new demands on model flexibility. This motivates self-iteration of the model to ensure its continued optimization and improvement.

Currently, many applications in the machine learning field rely on manual intervention to update parameters or architecture of models to accommodate new data and challenges. This method is not only time consuming and laborious, but also may lead to human error. Researchers have therefore sought automated and intelligent methods so that models can automatically identify performance degradation, improve strategies, and iterate themselves without human intervention.

In recent years, with the rapid development of deep learning, self-iteration techniques for neural network models have received a great deal of attention. These techniques include automatic hyperparameter tuning, data augmentation, model architecture search, transfer learning and the like, and allow a model to keep improving its adaptability and performance across applications. They not only accelerate the development and application of models, but also raise the level of intelligence and automation of machine learning systems, thereby meeting evolving demands. Against this background, a model self-iteration system based on scheduling technology is potentially important and opens new possibilities for the field of machine learning.

The current model has the following problems in iterative upgrade:

1. Manual intervention cost: upgrades and modifications to traditional machine learning models often require expensive human effort and time. Professional data scientists and engineers must intervene manually to adjust parameters and to retrain and deploy models. This process is not only time consuming and laborious, but also prone to error.

2. Model performance problems: machine learning models often face performance challenges in a changing data environment. The performance of the model may be affected by new data distributions, concept drift, or problems inside the model. However, effective methods to identify and solve these performance problems in time are lacking, so the model gradually loses effectiveness in practical applications.

3. Data drift and concept drift: the distribution and concepts of real-world data often change, which may be due to seasonal changes, new trends, changes in the data sources, or other factors. Conventional machine learning models have difficulty adapting to these changes automatically and require manual adjustment and retraining, resulting in system downtime and performance degradation.

4. Lack of versatility and portability issues: automated machine learning iterative methods are typically domain or application specific, lacking versatility and portability. This means that each new problem or application requires re-development and optimization of the iterative process, resulting in repeated labor and resource wastage.

Disclosure of Invention

The invention provides an automatic iteration model service system and method based on data automatic screening, which are used for solving the problems in the background technology.

An automatic iterative model service system based on data automatic screening, comprising:

the model building module is used for selecting model types and model parameters based on training tasks, importing the model types and the model parameters into a model center, and building to obtain an initial model;

the trigger determining module is used for matching an optimal scheduling strategy from the scheduling strategy library based on the data characteristics of the update data corresponding to the training task, and configuring self-iterative scheduling of the initial model according to the optimal scheduling strategy;

the scheduling execution module is used for establishing an iteration task based on self-iteration scheduling and combining model management parameters, and establishing a new model or upgrading an initial model based on the iteration task to obtain a second model;

and the evaluation iteration module is used for evaluating the performance of the second model, and continuing to iterate the second model based on the performance evaluation result until the evaluation result meets the requirement, and stopping self-iteration to obtain a final optimization model.

Preferably, the automatic iterative model service system based on automatic data screening further comprises a resource allocation module, which is used for allocating resources during model scheduling and iteration;

the resource allocation module comprises:

the monitoring unit is used for monitoring the service system to obtain a scheduling and iterative operation process;

the resource allocation unit is used for planning and allocating the existing resources based on the scheduling and iterative operation process, and allocating the resources to the corresponding operation modules and operation units according to the allocation result.
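The resource allocation unit can be illustrated with a minimal sketch, under stated assumptions. The patent does not specify an allocation rule; the proportional split below, the function name `allocate_resources`, and the task names are all hypothetical and chosen only for illustration.

```python
def allocate_resources(total_units: int, demand: dict[str, float]) -> dict[str, int]:
    """Split total_units across tasks in proportion to their monitored demand.

    Assumption: demand values are non-negative numbers reported by a
    (hypothetical) monitoring unit; the proportional rule is illustrative only.
    """
    total_demand = sum(demand.values())
    if total_demand <= 0:
        return {name: 0 for name in demand}
    # Exact proportional share for each task.
    raw = {name: total_units * d / total_demand for name, d in demand.items()}
    # Start from the integer part of each share.
    shares = {name: int(r) for name, r in raw.items()}
    # Hand out the remaining units to the tasks with the largest remainders.
    leftover = total_units - sum(shares.values())
    by_remainder = sorted(raw, key=lambda n: raw[n] - shares[n], reverse=True)
    for name in by_remainder[:leftover]:
        shares[name] += 1
    return shares
```

For example, `allocate_resources(8, {"scheduling": 1.0, "iteration": 3.0})` gives two units to scheduling and six to iteration, and the shares always sum to the total budget.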

Preferably, the model building module includes:

the type determining unit is used for analyzing the training task, determining the training purpose and determining the model type based on the training purpose;

the parameter determining unit is used for acquiring a training data set corresponding to the training task and setting model parameters based on the data characteristics of the training data set;

the model construction unit is used for importing the model framework corresponding to the model type into the model center, configuring the model framework with the model parameters, and constructing an initial model.
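The three units of the model building module can be sketched as follows. This is only an illustration: the purpose-to-type mapping, the parameter rule, and the names `ModelCenter`, `determine_params`, and `build_initial_model` are assumptions that do not appear in the patent.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from training purpose to model type (type determining unit).
PURPOSE_TO_TYPE = {
    "classification": "gradient_boosting",
    "regression": "linear_regression",
}


@dataclass
class ModelCenter:
    """Hypothetical registry that stores configured model frameworks."""
    frameworks: dict = field(default_factory=dict)

    def build(self, model_type: str, params: dict) -> dict:
        # Register the configured framework and return an initial model record.
        self.frameworks[model_type] = params
        return {"type": model_type, "params": params, "version": 1}


def determine_params(dataset: list[list[float]]) -> dict:
    """Parameter determining unit: derive parameters from data characteristics."""
    n_rows = len(dataset)
    n_features = len(dataset[0]) if dataset else 0
    # Illustrative rule only: allow deeper models when more rows are available.
    return {"n_features": n_features, "max_depth": 3 if n_rows < 1000 else 6}


def build_initial_model(purpose: str, dataset: list[list[float]], center: ModelCenter) -> dict:
    """Model construction unit: combine type and parameters into an initial model."""
    model_type = PURPOSE_TO_TYPE[purpose]
    return center.build(model_type, determine_params(dataset))
```

Calling `build_initial_model("classification", [[1.0, 2.0], [3.0, 4.0]], ModelCenter())` returns a record whose type is `"gradient_boosting"` with parameters derived from the two-feature data set.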

Preferably, the trigger determining module includes:

the data acquisition unit is used for periodically acquiring new data related to the training task, judging whether the data difference between the new data and the training data is larger than the preset data difference, if so, taking the new data related to the training task as updated data, otherwise, not updating the data;

the difference determining unit is used for acquiring data set classification characteristics of the training data after the presence of the update data is detected, grouping the update data based on the data set classification characteristics to obtain a plurality of groups of new data sets, and determining set differences between the new data sets and the data sets of the training data;

the strategy determining unit is used for acquiring, from the scheduling strategy library, candidate scheduling strategies that match the data characteristics of the update data, determining a scheduling weight for each scheduling type based on the set difference, and selecting the optimal scheduling strategy from the candidate scheduling strategies based on the scheduling weight;

and the scheduling determination unit is used for determining scheduling tasks and scheduling configuration resources from the optimal scheduling strategy, and performing resource scheduling based on the scheduling tasks and the scheduling configuration resources to obtain self-iterative scheduling of the initial model.
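The flow through the data acquisition, difference determining, and strategy determining units can be sketched as below. The patent does not define how the data difference, the set difference, or the scheduling weight is computed, so the mean-shift measure, the `min_rows` and `target_difference` fields, and the weight formula are assumptions for illustration. The data is assumed to be already grouped by its classification features.

```python
from statistics import mean


def data_difference(train_values: list[float], new_values: list[float]) -> float:
    """Stand-in drift measure: absolute difference of the two means."""
    return abs(mean(new_values) - mean(train_values))


def select_optimal_strategy(train_groups, new_groups, strategy_library, preset_difference):
    """Return the strategy with the largest scheduling weight, or None if no update."""
    train_all = [v for values in train_groups.values() for v in values]
    new_all = [v for values in new_groups.values() for v in values]
    overall = data_difference(train_all, new_all)
    # Data acquisition unit: new data only counts as update data above the threshold.
    if overall <= preset_difference:
        return None
    # Difference determining unit: per-group set differences (fallback: overall).
    set_differences = [
        data_difference(train_groups[key], new_groups[key])
        for key in new_groups if key in train_groups
    ] or [overall]
    average_difference = mean(set_differences)
    # Strategy determining unit: keep candidates whose data requirement is met.
    candidates = [s for s in strategy_library if len(new_all) >= s["min_rows"]]
    if not candidates:
        return None

    def scheduling_weight(strategy):
        # Hypothetical weight: highest when the observed difference is closest
        # to the drift level the strategy is designed for.
        return 1.0 / (1.0 + abs(average_difference - strategy["target_difference"]))

    return max(candidates, key=scheduling_weight)
```

With a small drift the sketch favors a strategy designed for small differences (for example an incremental update), and with a large drift it favors one designed for large differences (for example a full retrain).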

Preferably, the trigger determining module further includes:

the triggering determining unit is used for starting a scheduling technology by taking the monitored updated data as a triggering condition, and the scheduling technology assists in completing the selection of the optimal scheduling strategy;

the trigger determining unit is further used for setting a scheduling time period, and when the existence of updated data is not monitored in the scheduling time period, starting a scheduling technology to select a fixed scheduling strategy to perform self-iterative scheduling on the initial model.
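The two trigger conditions can be summarized in a short sketch. The function name `decide_trigger` and the use of elapsed seconds are assumptions; the patent only states that update data triggers selection of the optimal strategy and that a fixed strategy is used when no update data is monitored within the scheduling time period.

```python
from typing import Optional


def decide_trigger(update_detected: bool,
                   seconds_since_last_run: float,
                   period_seconds: float) -> Optional[str]:
    """Decide which kind of scheduling strategy to start, if any."""
    if update_detected:
        # Update data was monitored: select the optimal scheduling strategy.
        return "optimal"
    if seconds_since_last_run >= period_seconds:
        # No update data within the scheduling time period: fall back to the fixed strategy.
        return "fixed"
    # Neither condition holds yet, so nothing is scheduled.
    return None
```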

Preferably, the scheduling execution module includes:

the scheduling function determining unit is used for acquiring a scheduling flow from iterative scheduling and acquiring the scheduling function of each node in the scheduling flow;

the execution data determining unit is used for matching corresponding target management parameters from the model management parameters based on the functional characteristics of the scheduling function, and integrating the functional characteristics and the corresponding target management parameters to obtain total execution data;

the execution function determining unit is used for decomposing the total execution data according to the unit execution characteristics to obtain a plurality of single execution data and determining the execution function corresponding to the single execution data;

the task determining unit is used for sequencing all execution functions based on the iteration standard execution sequence to obtain an execution function sequence, and establishing an iteration task based on the execution function sequence;

the judging unit is used for acquiring the data characteristics and the execution characteristics of the iteration task and judging whether the execution characteristics indicate re-execution;

if so, and the data characteristics are different from the training data characteristics of the initial model, the iteration task is determined to be the establishment of a new model, and the resources related to establishing the new model are called;

otherwise, the iteration task is determined to be the upgrading of the initial model, and the resources related to upgrading the model are called;

the model determining unit is used for establishing the new model or upgrading the initial model based on the iteration task in combination with the related resources, to obtain a second model.
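Two steps of the scheduling execution module lend themselves to a small sketch: ordering the execution functions by a standard sequence, and deciding between a new model and an upgrade. The source wording of the two branches is ambiguous, so the sketch assumes one reading: a new model is established only when the task is re-executed and its data characteristics differ from those of the initial training data. The standard order and all names are hypothetical.

```python
# Hypothetical iteration standard execution sequence.
STANDARD_ORDER = ["prepare_data", "train", "validate", "deploy"]


def build_iteration_task(execution_functions: list[str]) -> list[str]:
    """Task determining unit: sort execution functions by the standard sequence."""
    return sorted(execution_functions, key=STANDARD_ORDER.index)


def decide_task_kind(re_execute: bool, task_features: dict, initial_features: dict) -> str:
    """Judging unit, under the assumed reading described in the lead-in."""
    if re_execute and task_features != initial_features:
        # Re-execution on data unlike the initial training data: build a new model.
        return "new_model"
    # Otherwise treat the iteration task as an upgrade of the initial model.
    return "upgrade"
```

For instance, `build_iteration_task(["deploy", "train", "prepare_data"])` yields the functions in the order prepare, train, deploy.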

Preferably, the scheduling function determining unit includes:

the node determining unit is used for acquiring a scheduling flow of iterative scheduling, dividing the scheduling flow according to unit scheduling characteristics and obtaining a plurality of nodes;

and the function determining unit is used for determining the scheduling function of the node based on the unit scheduling characteristics corresponding to the node.

Together, these two units acquire the scheduling flow from the iterative scheduling and obtain the scheduling function of each node in the scheduling flow.

Preferably, the evaluation iteration module comprises:

the index determining unit is used for determining the evaluation indices of the model based on the training task, and dividing the evaluation indices into pre-modeling, in-modeling and post-modeling evaluation indices according to the node in the model building process at which each index is evaluated;

the classification evaluation unit is used for evaluating the modeling data of the second model before modeling based on the pre-modeling evaluation index to obtain a first evaluation result, evaluating the second model during modeling based on the in-modeling evaluation index to obtain a second evaluation result, and evaluating the modeled second model based on the post-modeling evaluation index to obtain a third evaluation result;

the data determining unit is used for constructing overall evaluation data from the first evaluation result, the second evaluation result and the third evaluation result, and correlating the overall evaluation data based on a modeling flow to obtain overall evaluation correlation data;

the comprehensive evaluation unit is used for comprehensively evaluating the overall evaluation associated data to obtain a performance evaluation result;

and the iteration unit is used for continuing to iterate the second model based on the performance evaluation result, comparing the performance evaluation result with a preset evaluation requirement after each iteration is finished, and stopping self-iteration if the performance evaluation result meets the preset evaluation requirement to obtain a final optimization model.

Preferably, the comprehensive evaluation unit includes:

the first evaluation unit is used for evaluating the overall evaluation associated data according to a preset evaluation rule to obtain an initial evaluation result;

and the secondary evaluation unit is used for re-evaluating the initial evaluation result based on the association relation in the overall evaluation association data to obtain a performance evaluation result.
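The two-pass comprehensive evaluation can be sketched as below. The patent names neither the preset evaluation rule nor the way the association relation is used, so the weighted average, the default weights, and the discount by the weakest upstream stage are assumptions. Scores are assumed to lie in the range 0 to 1.

```python
def comprehensive_evaluation(pre: float, during: float, post: float,
                             weights: tuple[float, float, float] = (0.2, 0.3, 0.5)) -> float:
    """Combine the three stage results into one performance evaluation result."""
    # First evaluation: a preset rule, here a weighted average of the stage scores.
    initial = weights[0] * pre + weights[1] * during + weights[2] * post
    # Secondary evaluation: later stages depend on earlier ones along the modeling
    # flow, so the initial result is discounted by the weakest upstream stage.
    return initial * min(pre, during)
```

Under these assumptions a model with a strong post-modeling score but poor pre-modeling data quality receives a lower overall result than the weighted average alone would suggest.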

An automatic iterative model service method based on data automatic screening, comprising the following steps:

S1: based on the training task, selecting a model type and model parameters, importing the model type and the model parameters into a model center, and constructing an initial model;

S2: based on the data characteristics of the update data corresponding to the training task, matching an optimal scheduling strategy from a scheduling strategy library, and configuring self-iterative scheduling of the initial model according to the optimal scheduling strategy;

S3: establishing an iteration task based on the self-iterative scheduling in combination with model management parameters, and establishing a new model or upgrading the initial model based on the iteration task to obtain a second model;

S4: evaluating the performance of the second model, and continuing to iterate the second model based on the performance evaluation result until the result meets the requirement, then stopping self-iteration to obtain a final optimized model.
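The stopping rule of the final step can be sketched as a simple loop. The callables `iterate` and `evaluate`, the model record, and the `max_rounds` safety cap are assumptions added for illustration; the patent only requires that iteration continue until the evaluation result meets the requirement.

```python
from typing import Callable, Tuple


def run_self_iteration(model: dict,
                       iterate: Callable[[dict], dict],
                       evaluate: Callable[[dict], float],
                       required: float,
                       max_rounds: int = 20) -> Tuple[dict, int]:
    """Iterate the model until its evaluation meets the requirement."""
    rounds = 0
    # max_rounds is a hypothetical safety cap so the loop always terminates.
    while evaluate(model) < required and rounds < max_rounds:
        model = iterate(model)
        rounds += 1
    return model, rounds
```

A caller would supply its own `iterate` (for example, one round of retraining or upgrading) and `evaluate` (for example, the performance evaluation result) and receive the final model together with the number of rounds used.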

Compared with the prior art, the invention has the following beneficial effects:

1. Automated model iteration: one of the main innovation points of the invention is to realize automatic iteration of the machine learning model without manual parameter adjustment and model update. The new modeling task can be automatically triggered, and the upgrading process of the model is accelerated.

2. Adaptive and automatic modeling: new data distributions and conceptual drifts can be accommodated by scheduling techniques, automatically triggering modeling tasks to accommodate these changes. This helps to solve the data drift and concept drift problems without manual intervention.

3. Commonality and portability: the invention provides a universal solution, which is suitable for various machine learning tasks and fields. This means that the iterative process does not need to be re-developed and optimized for each new problem, improving portability.

4. Real-time feedback and model optimization: another innovation of the present invention is a real-time feedback loop together with continuous optimization of the model. Performance is evaluated in real time on the new model result returned by the modeling service, and the evaluation result is fed back to the modeling service. This enables continuous model improvement, allowing the model to remain efficient in a changing data environment. This real-time feedback mechanism helps the system respond quickly to performance degradation and new data distributions, thereby reducing response time.

5. Maximization of resource utilization: another significant innovation of the present invention is the maximized utilization of resources. Through the scheduling technology, the computing resources can be effectively planned and allocated to meet the requirements of modeling tasks. This includes allocating computing resources, data storage, and other critical resources. The maximized utilization of the resources not only improves the efficiency, but also reduces the resource waste, and contributes to optimizing the cost effectiveness of model improvement.

Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities particularly pointed out in the written application.

The technical scheme of the invention is further described in detail through the drawings and the embodiments.

Drawings

The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:

FIG. 1 is a block diagram of an automatic iterative model service system based on data automatic screening in an embodiment of the invention;

FIG. 2 is a block diagram of a model building block in accordance with an embodiment of the present invention;

FIG. 3 is a flowchart of an automatic iterative model service method based on automatic data screening in an embodiment of the present invention.

Detailed Description

The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.

Example 1:

The embodiment of the invention provides an automatic iterative model service system based on automatic data screening, as shown in FIG. 1, comprising:

In this embodiment, the model types include various machine learning models.

In this embodiment, the model parameters include, for example, sample weights, various types of parameters, various types of coefficients, and the like.

The invention has the following beneficial effects:

1. Automated model iteration: one of the main innovation points of the invention is to realize automatic iteration of the machine learning model without manual parameter adjustment and model update. The model management service can automatically trigger a new modeling task to accelerate the upgrading process of the model.

2. Adaptive and automatic modeling: by scheduling techniques, the model management service can adapt to new data distributions and conceptual drifts, automatically triggering modeling tasks to adapt to these changes. This helps to solve the data drift and concept drift problems without manual intervention.

4. Real-time feedback and model optimization: another innovation of the present invention is a real-time feedback loop together with continuous optimization of the model. The model management service can immediately perform performance evaluation according to the new model result returned by the modeling service and feed the result back to the modeling service. This enables continuous model improvement, allowing the model to remain efficient in a changing data environment. This real-time feedback mechanism helps the system respond quickly to performance degradation and new data distributions, thereby reducing response time.

5. Maximization of resource utilization: another significant innovation of the present invention is the maximized utilization of resources. Through scheduling techniques, the model management service can efficiently plan and allocate computing resources to meet the demands of modeling tasks. This includes allocating computing resources, data storage, and other critical resources. The maximized utilization of the resources not only improves the efficiency, but also reduces the resource waste, and contributes to optimizing the cost effectiveness of model improvement.

Example 2:

Based on Embodiment 1, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening, which further comprises a resource allocation module used for allocating resources during model scheduling and iteration;

the resource allocation module comprises:

The beneficial effects of the above design are: throughout the process, the resource allocation module plans and allocates computing resources to meet the requirements of modeling tasks. This covers computing resources, data storage, and other critical resources, ensuring efficient task execution and maximum resource utilization and providing a resource basis for modeling.

Example 3:

Based on Embodiment 1, the embodiment of the invention provides an automatic iterative model service system based on automatic data screening in which, as shown in FIG. 2, the model building module includes:

The beneficial effects of the above design are: the model framework corresponding to the model type is imported into the model center and configured with the model parameters, so that an initial model is constructed and model building is achieved.

Example 4:

Based on Embodiment 1, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening in which the trigger determining module comprises:

In this embodiment, the candidate scheduling strategy with the largest scheduling weight is taken as the optimal scheduling strategy.

The beneficial effects of the above design are: candidate scheduling strategies that match the data characteristics of the update data are acquired from the scheduling strategy library; a scheduling weight is determined for each scheduling type based on the set difference; the optimal scheduling strategy is selected from the candidates based on the scheduling weight; and scheduling tasks and scheduling configuration resources are determined from the optimal scheduling strategy and used for resource scheduling to obtain the self-iterative scheduling of the initial model. Automatic iteration of the machine learning model is thereby realized without manual parameter adjustment and model updating. The model management service can automatically trigger new modeling tasks, which accelerates the upgrading of the model and lets it adapt to new data distributions and concept drift. This helps to solve the data drift and concept drift problems without manual intervention.

Example 5:

Based on Embodiment 4, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening in which the trigger determining module further comprises:

The beneficial effects of the above design are: a scheduling time period is set, and when no update data is monitored within the scheduling time period, the scheduling technology is started to select a fixed scheduling strategy for self-iterative scheduling of the initial model, so that the model is still upgraded.

Example 6:

Based on Embodiment 1, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening in which the scheduling execution module comprises:

The beneficial effects of the above design are: an iteration task is established from the scheduling flow of the self-iterative scheduling, the data characteristics and the execution characteristics of the iteration task are acquired, and it is judged whether the execution characteristics indicate re-execution;

if so, and the data characteristics are different from the training data characteristics of the initial model, the iteration task is determined to be the establishment of a new model, and the resources related to establishing the new model are called; if not, the iteration task is determined to be the upgrading of the initial model, and the resources related to upgrading the model are called. Based on the iteration task and in combination with the related resources, the new model is established or the initial model is upgraded to obtain a second model. The model is thus upgraded or established without manual parameter adjustment and model updating, and the upgrading and establishment of the model are accelerated.

Example 7:

Based on Embodiment 6, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening in which the scheduling function determining unit comprises:

The beneficial effects of the above design are: the scheduling flow of the self-iterative scheduling is acquired and divided according to unit scheduling characteristics to obtain a plurality of nodes, and the scheduling function of each node is determined based on the unit scheduling characteristics corresponding to that node, which ensures that the scheduling functions obtained are accurate and well matched.

Example 8:

Based on Embodiment 1, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening in which the evaluation iteration module comprises:

The beneficial effects of the above design are: the model data are evaluated before modeling, during modeling and after modeling, and the second model continues to be iterated according to the evaluation results. After each iteration, the performance evaluation result is compared with the preset evaluation requirement; if the requirement is met, self-iteration stops and a final optimized model is obtained, realizing a real-time feedback loop and continuous optimization of the model. Performance is evaluated in real time on the new model result returned by the modeling service, and the evaluation result is fed back to the modeling service. This enables continuous model improvement, allowing the model to remain efficient in a changing data environment. This real-time feedback mechanism helps the system respond quickly to performance degradation and new data distributions, thereby reducing response time.

Example 9:

Based on Embodiment 8, the embodiment of the invention provides an automatic iterative model service system based on automatic data screening in which the comprehensive evaluation unit comprises:

The beneficial effects of the above design are: the overall relevance and accuracy of the obtained performance evaluation result are ensured.

Example 10:

The embodiment of the invention provides an automatic iterative model service method based on automatic data screening which, as shown in FIG. 3, comprises the following steps:

In this embodiment, the model types include various machine learning models.

The invention has the following beneficial effects:

It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims

1. An automatic iterative model service system based on automatic data screening, comprising:

2. The automatic iterative model service system based on automatic data screening of claim 1, further comprising a resource allocation module used for allocating resources during model scheduling and iteration;

the resource allocation module comprising:

3. The automatic iterative model service system based on automatic data screening of claim 1, wherein said model building module comprises:

4. The automatic iterative model service system based on automatic data screening of claim 1, wherein said trigger determination module comprises:

5. The automatic iterative model service system based on automatic data screening of claim 4, wherein said trigger determination module further comprises:

6. The automatic iterative model service system based on automatic data screening of claim 1, wherein said schedule execution module comprises:

7. The automatic iterative model service system based on automatic data screening of claim 6, wherein said scheduling function determining unit comprises:

8. The automatic iterative model service system based on automatic data screening of claim 1, wherein said evaluation iteration module comprises:

9. The automatic iterative model service system based on automatic data screening of claim 8, wherein said comprehensive evaluation unit comprises:

10. An automatic iterative model service method based on automatic data screening, which is characterized by comprising the following steps: