CN117422155A - Automatic iteration model service system and method based on automatic data screening - Google Patents

Info

Publication number
CN117422155A
Authority
CN
China
Prior art keywords
model
data
scheduling
evaluation
iteration
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202311746585.7A
Other languages
Chinese (zh)
Inventor
许靖
柴磊
郭帅
袁靖
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Original Assignee
Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Magic Digital Intelligent Artificial Intelligence Co ltd
Priority to CN202311746585.7A
Publication of CN117422155A
Legal status: Pending

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/50 Allocation of resources, e.g. of the central processing unit [CPU]

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Data Mining & Analysis (AREA)
  • Computing Systems (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention provides an automatic iterative model service system and method based on automatic data screening. A model type and model parameters are selected based on a training task and imported into a model center to construct an initial model. Based on the data characteristics of the update data corresponding to the training task, an optimal scheduling strategy is matched from a scheduling strategy library, and self-iterative scheduling of the initial model is configured according to that strategy. An iteration task is established based on the self-iterative scheduling in combination with model management parameters, and a new model is established or the initial model is upgraded based on the iteration task to obtain a second model. The performance of the second model is evaluated, and the second model continues to be iterated based on the evaluation result until the result meets the requirement, at which point self-iteration stops and a final optimized model is obtained. This addresses data drift and concept drift as well as the lack of generality and portability, and improves model performance.

Description

Automatic iteration model service system and method based on automatic data screening
Technical Field
The invention relates to the fields of computer science and machine learning, in particular to an automatic iteration model service system and method based on automatic data screening.
Background
In the field of machine learning, iterative upgrading of models has long been a key challenge. Traditional machine learning models typically remain static after training and do not adapt on their own. However, rapidly changing data and real-time requirements place new demands on model flexibility. This motivates model self-iteration, which keeps a model continuously optimized and improved.
Currently, many applications in the machine learning field rely on manual intervention to update parameters or architecture of models to accommodate new data and challenges. This method is not only time consuming and laborious, but also may lead to human error. Researchers have therefore sought automated and intelligent methods so that models can automatically identify performance degradation, improve strategies, and iterate themselves without human intervention.
In recent years, with the rapid development of deep learning, self-iteration techniques for neural network models have received a great deal of attention. These techniques include automatic hyperparameter tuning, data augmentation, model architecture search, and transfer learning, which allow a model to keep improving its adaptability and performance across applications. These innovations not only accelerate the development and application of models but also raise the level of intelligence and automation of machine learning systems, thereby meeting evolving demands. Against this technical background, a model self-iteration system based on scheduling technology is potentially important and opens new possibilities for the field of machine learning.
The current model has the following problems in iterative upgrade:
1. Cost of manual intervention: upgrading and modifying traditional machine learning models often requires expensive human effort and time. Professional data scientists and engineers must intervene manually to adjust parameters and to retrain and redeploy models. This process is not only time-consuming and laborious but also prone to error.
2. Model performance problems: machine learning models often face performance challenges in a changing data environment. Performance may be affected by new data distributions, concept drift, or problems inside the model. However, effective methods for identifying and solving these problems in time are lacking, so the model gradually loses effectiveness in practical applications.
3. Data drift and concept drift: the distribution and concepts of real-world data often change, for example because of seasonal changes, new trends, or changes in the data sources. Conventional machine learning models struggle to adapt to these changes automatically and require manual adjustment and retraining, resulting in system downtime and performance degradation.
4. Lack of generality and portability: automated machine learning iteration methods are typically specific to a domain or application and lack generality and portability. Each new problem or application therefore requires the iterative process to be developed and optimized again, resulting in repeated labor and wasted resources.
Disclosure of Invention
The invention provides an automatic iteration model service system and method based on automatic data screening, which address the problems described in the Background above.
An automatic iterative model service system based on data automatic screening, comprising:
the model building module is used for selecting model types and model parameters based on training tasks, importing the model types and the model parameters into a model center, and building to obtain an initial model;
the trigger determining module is used for matching an optimal scheduling strategy from the scheduling strategy library based on the data characteristics of the update data corresponding to the training task, and configuring self-iterative scheduling of the initial model according to the optimal scheduling strategy;
the scheduling execution module is used for establishing an iteration task based on self-iteration scheduling and combining model management parameters, and establishing a new model or upgrading an initial model based on the iteration task to obtain a second model;
and the evaluation iteration module is used for evaluating the performance of the second model, and continuing to iterate the second model based on the performance evaluation result until the evaluation result meets the requirement, and stopping self-iteration to obtain a final optimization model.
Preferably, an automatic iterative model service system based on automatic data screening further comprises: the resource allocation module is used for allocating resources in the model scheduling and iteration processes;
a resource allocation module, comprising:
the monitoring unit is used for monitoring the service system to obtain a scheduling and iterative operation process;
the resource allocation unit is used for planning and allocating the existing resources based on the scheduling and iterative operation process, and allocating the resources to the corresponding operation modules and operation units according to the allocation result.
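The planning step of the resource allocation unit can be illustrated with a minimal sketch. The patent does not name an allocation policy, so proportional sharing is an assumption here, and `RunningTask` and `allocate` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class RunningTask:
    """A scheduling or iteration task reported by the monitoring unit (hypothetical)."""
    name: str
    demand: float  # requested resource units, e.g. CPU cores


def allocate(tasks: list[RunningTask], capacity: float) -> dict[str, float]:
    """Share `capacity` among tasks without exceeding any task's demand.

    Assumption: proportional sharing when demand exceeds capacity.
    """
    total = sum(task.demand for task in tasks)
    if total <= capacity:
        # Enough resources: every task receives what it asked for.
        return {task.name: task.demand for task in tasks}
    scale = capacity / total
    return {task.name: task.demand * scale for task in tasks}


plan = allocate([RunningTask("retrain", 6.0), RunningTask("evaluate", 2.0)], capacity=4.0)
```

Here the two tasks ask for 8 units in total while only 4 exist, so each receives half of its demand.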
Preferably, in the automatic iterative model service system based on automatic data screening, the model building module includes:
the type determining unit is used for analyzing the training task, determining the training purpose and determining the model type based on the training purpose;
the parameter determining unit is used for acquiring a training data set corresponding to the training task and setting model parameters based on the data characteristics of the training data set;
the model construction unit is used for importing the model frames corresponding to the model types into the model information, configuring the model frames by using model parameters, and constructing to obtain an initial model.
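The three units of the model building module might be sketched as follows. The purpose-to-type mapping, the model type names, and the inverse-frequency class weighting are all illustrative assumptions; the patent only states that the model type follows from the training purpose and that parameters follow from the data characteristics.

```python
from dataclasses import dataclass, field

# Hypothetical mapping from training purpose to model type.
PURPOSE_TO_MODEL_TYPE = {
    "classification": "gradient_boosting_classifier",
    "regression": "linear_regressor",
}


@dataclass
class InitialModel:
    model_type: str
    params: dict = field(default_factory=dict)


def determine_type(training_task: dict) -> str:
    """Type determining unit: derive the model type from the training purpose."""
    return PURPOSE_TO_MODEL_TYPE[training_task["purpose"]]


def determine_params(labels: list[int]) -> dict:
    """Parameter determining unit: set class weights from label frequencies.

    Assumption: inverse-frequency weighting, n / (k * count).
    """
    counts: dict[int, int] = {}
    for label in labels:
        counts[label] = counts.get(label, 0) + 1
    n, k = len(labels), len(counts)
    return {"class_weight": {label: n / (k * c) for label, c in counts.items()}}


def build_initial_model(training_task: dict, labels: list[int]) -> InitialModel:
    """Model construction unit: configure the chosen framework with the parameters."""
    return InitialModel(determine_type(training_task), determine_params(labels))


model = build_initial_model({"purpose": "classification"}, [0, 0, 0, 1])
```

With three samples of class 0 and one of class 1, the rare class receives the larger weight.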
Preferably, the trigger determining module includes:
the data acquisition unit is used for periodically acquiring new data related to the training task, judging whether the data difference between the new data and the training data is larger than the preset data difference, if so, taking the new data related to the training task as updated data, otherwise, not updating the data;
the difference determining unit is used for acquiring data set classification characteristics of the training data after the presence of the update data is detected, grouping the update data based on the data set classification characteristics to obtain a plurality of groups of new data sets, and determining set differences between the new data sets and the data sets of the training data;
the strategy determining unit is used for acquiring a scheduling strategy to be selected which meets the data characteristics from the scheduling strategy library based on the data characteristics of the updated data, determining the scheduling weight of the scheduling type based on the set difference, and selecting and acquiring an optimal scheduling strategy from the scheduling strategy to be selected based on the scheduling weight;
and the scheduling determination unit is used for determining scheduling tasks and scheduling configuration resources from the optimal scheduling strategy, and performing resource scheduling based on the scheduling tasks and the scheduling configuration resources to obtain self-iterative scheduling of the initial model.
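The data acquisition and difference determining units can be illustrated with a small sketch. The patent does not define the difference measure, so the absolute gap between means is used purely as a stand-in, and the record layout (`segment`, `value`) is hypothetical.

```python
from statistics import mean


def data_difference(train: list[float], new: list[float]) -> float:
    """Stand-in difference metric (assumption): absolute gap between the means."""
    return abs(mean(new) - mean(train))


def select_update_data(train, new, threshold):
    """Data acquisition unit: keep new data only if it differs enough."""
    return new if data_difference(train, new) > threshold else None


def group_values(records, key):
    """Group record values by a data set classification characteristic."""
    groups = {}
    for record in records:
        groups.setdefault(record[key], []).append(record["value"])
    return groups


def set_differences(train_records, update_records, key):
    """Difference determining unit: per-group difference against the training data."""
    train_groups = group_values(train_records, key)
    update_groups = group_values(update_records, key)
    return {name: data_difference(train_groups[name], values)
            for name, values in update_groups.items() if name in train_groups}


train = [{"segment": "a", "value": 1.0}, {"segment": "a", "value": 3.0},
         {"segment": "b", "value": 10.0}]
update = [{"segment": "a", "value": 5.0}, {"segment": "b", "value": 10.0}]
diffs = set_differences(train, update, key="segment")
```

In this toy input only group "a" has shifted, which is the kind of signal the strategy determining unit would then turn into scheduling weights.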
Preferably, the trigger determining module further includes:
the triggering determining unit is used for starting a scheduling technology by taking the monitored updated data as a triggering condition, and the scheduling technology assists in completing the selection of the optimal scheduling strategy;
the trigger determining unit is further used for setting a scheduling time period, and when the existence of updated data is not monitored in the scheduling time period, starting a scheduling technology to select a fixed scheduling strategy to perform self-iterative scheduling on the initial model.
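The two trigger conditions (update data detected, or a scheduling time period elapsing without update data) reduce to a small decision function. This is a sketch only; `Trigger` and `decide_trigger` are hypothetical names, and measuring the period in seconds is an assumption.

```python
from enum import Enum


class Trigger(Enum):
    DATA_UPDATE = "match optimal strategy"
    PERIOD_ELAPSED = "use fixed strategy"
    NONE = "wait"


def decide_trigger(has_update_data: bool,
                   seconds_since_last_run: float,
                   period_seconds: float) -> Trigger:
    """Decide whether, and how, to start self-iterative scheduling."""
    if has_update_data:
        # Update data is the primary trigger condition.
        return Trigger.DATA_UPDATE
    if seconds_since_last_run >= period_seconds:
        # No update data within the period: fall back to the fixed strategy.
        return Trigger.PERIOD_ELAPSED
    return Trigger.NONE
```

The fallback branch ensures the model is still refreshed periodically even when the monitored data appears stable.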
Preferably, the scheduling execution module includes:
the scheduling function determining unit is used for acquiring a scheduling flow from iterative scheduling and acquiring the scheduling function of each node in the scheduling flow;
the execution data determining unit is used for matching corresponding target management parameters from the model management parameters based on the functional characteristics of the scheduling function, and integrating the functional characteristics and the corresponding target management parameters to obtain total execution data;
the execution function determining unit is used for decomposing the total execution data according to the unit execution characteristics to obtain a plurality of single execution data and determining the execution function corresponding to the single execution data;
the task determining unit is used for sequencing all execution functions based on the iteration standard execution sequence to obtain an execution function sequence, and establishing an iteration task based on the execution function sequence;
the judging unit is used for acquiring the data characteristics and the execution characteristics of the iteration task and judging whether the execution characteristics call for re-execution;
if so, that is, when the data characteristics differ from the training data characteristics of the initial model, the iteration task is determined to be the establishment of a new model, and the resources for establishing the new model are called;
otherwise, that is, when the data characteristics are consistent with the training data characteristics of the initial model, the iteration task is determined to be an upgrade of the initial model, and the resources for the model upgrade are called;
the model determining unit is used for establishing the new model or upgrading the initial model based on the iteration task in combination with the related resources to obtain a second model.
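Two steps of the scheduling execution module lend themselves to a short sketch: ordering execution functions by a standard iteration order, and choosing between a new model and an upgrade. The step names in `STANDARD_ORDER` are hypothetical, and the decision rule reflects one reading of the judging unit (build a new model when the data characteristics differ from those of the initial training data, otherwise upgrade).

```python
# Hypothetical standard execution order for one iteration.
STANDARD_ORDER = ["load_data", "preprocess", "train", "evaluate", "deploy"]


def build_iteration_task(execution_functions: list[str]) -> list[str]:
    """Task determining unit: sort execution functions into the standard order."""
    rank = {name: position for position, name in enumerate(STANDARD_ORDER)}
    return sorted(execution_functions, key=lambda name: rank[name])


def decide_action(data_features: set[str], training_features: set[str]) -> str:
    """Judging unit (interpretation): new model if the features differ, else upgrade."""
    return "new_model" if data_features != training_features else "upgrade"


task = build_iteration_task(["train", "load_data", "evaluate"])
```

Comparing feature sets for equality is the simplest possible test; a real system would more plausibly compare distributions or schemas with a tolerance.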
Preferably, the scheduling function determining unit includes:
the node determining unit is used for acquiring a scheduling flow of iterative scheduling, dividing the scheduling flow according to unit scheduling characteristics and obtaining a plurality of nodes;
and the function determining unit is used for determining the scheduling function of the node based on the unit scheduling characteristics corresponding to the node.
Together, these two units acquire the scheduling flow from the iterative scheduling and obtain the scheduling function of each node in the scheduling flow.
Preferably, the evaluation iteration module comprises:
the index determining unit is used for determining evaluation indexes of the model based on the training task, and dividing the evaluation indexes into pre-modeling evaluation indexes, in-modeling evaluation indexes and post-modeling evaluation indexes according to the node at which each index is evaluated during model establishment;
the classification evaluation unit is used for evaluating the modeling data of the second model based on the pre-modeling evaluation indexes to obtain a first evaluation result, evaluating the modeling process of the second model based on the in-modeling evaluation indexes to obtain a second evaluation result, and evaluating the resulting model based on the post-modeling evaluation indexes to obtain a third evaluation result;
the data determining unit is used for constructing overall evaluation data from the first evaluation result, the second evaluation result and the third evaluation result, and correlating the overall evaluation data based on a modeling flow to obtain overall evaluation correlation data;
the comprehensive evaluation unit is used for comprehensively evaluating the overall evaluation associated data to obtain a performance evaluation result;
and the iteration unit is used for continuing to iterate the second model based on the performance evaluation result, comparing the performance evaluation result with a preset evaluation requirement after each iteration is finished, and stopping self-iteration if the performance evaluation result meets the preset evaluation requirement to obtain a final optimization model.
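The combination of the three stage results into one performance evaluation result can be sketched as a weighted average. The weighting scheme, the metric names, and the normalisation of each value to [0, 1] are assumptions made for illustration; the patent only requires that the three results be associated and comprehensively evaluated.

```python
from dataclasses import dataclass


@dataclass
class StageResult:
    stage: str    # "pre", "during" or "post" modeling
    metric: str   # hypothetical metric name
    value: float  # assumed normalised to [0, 1], higher is better


def performance(results: list[StageResult], weights: dict[str, float]) -> float:
    """Comprehensive evaluation: weighted average over the stage results."""
    total_weight = sum(weights[r.stage] for r in results)
    return sum(weights[r.stage] * r.value for r in results) / total_weight


def meets_requirement(score: float, required: float) -> bool:
    """Iteration unit: self-iteration stops once the preset requirement is met."""
    return score >= required


results = [
    StageResult("pre", "data_completeness", 1.0),
    StageResult("during", "convergence", 0.5),
    StageResult("post", "accuracy", 0.75),
]
score = performance(results, weights={"pre": 1.0, "during": 1.0, "post": 2.0})
```

With these numbers the score is 0.75, so a preset requirement of 0.8 would cause another iteration of the second model.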
Preferably, the comprehensive evaluation unit includes:
the first evaluation unit is used for evaluating the overall evaluation associated data according to a preset evaluation rule to obtain an initial evaluation result;
and the secondary evaluation unit is used for re-evaluating the initial evaluation result based on the association relation in the overall evaluation association data to obtain a performance evaluation result.
An automatic iterative model service method based on data automatic screening, comprising the following steps:
S1: based on the training task, selecting a model type and model parameters, importing the model type and the model parameters into a model center, and constructing to obtain an initial model;
S2: based on the data characteristics of the update data corresponding to the training task, matching an optimal scheduling strategy from a scheduling strategy library, and configuring self-iterative scheduling of an initial model according to the optimal scheduling strategy;
S3: establishing an iteration task based on self-iteration scheduling and combining model management parameters, and establishing a new model or upgrading an initial model based on the iteration task to obtain a second model;
S4: performing performance evaluation on the second model, and continuing to iterate the second model based on the performance evaluation result until the evaluation result meets the requirement, and stopping self-iteration to obtain a final optimization model.
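The four steps above chain into a loop, which the following sketch makes explicit. Every callable passed to `run_service` is a hypothetical stand-in for the corresponding module, re-matching the strategy on each round is an assumption, and the `max_rounds` cap is added only so the loop always terminates.

```python
def run_service(task, build, match_strategy, execute, evaluate,
                required: float, max_rounds: int = 5):
    """Chain the four steps; stop once the evaluation meets the requirement."""
    model = build(task)                          # step 1: initial model
    history = []
    for _ in range(max_rounds):
        strategy = match_strategy(task, model)   # step 2: pick a scheduling strategy
        model = execute(model, strategy)         # step 3: new model or upgrade
        score = evaluate(model)                  # step 4: performance evaluation
        history.append((strategy, score))
        if score >= required:
            break
    return model, history


# Toy stand-ins: the "model" is just an integer quality level.
final_model, history = run_service(
    task="demo",
    build=lambda task: 0,
    match_strategy=lambda task, model: "full_retrain" if model == 0 else "incremental",
    execute=lambda model, strategy: model + (3 if strategy == "full_retrain" else 1),
    evaluate=lambda model: model / 5,
    required=1.0,
)
```

In the toy run, one full retraining is followed by two incremental upgrades before the requirement is met.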
Compared with the prior art, the invention has the following beneficial effects:
1. Automated model iteration: one of the main innovations of the invention is automatic iteration of the machine learning model without manual parameter adjustment or model updating. New modeling tasks are triggered automatically, which accelerates model upgrades.
2. Adaptive and automatic modeling: through scheduling techniques, the system adapts to new data distributions and concept drift by automatically triggering modeling tasks. This helps to solve the data drift and concept drift problems without manual intervention.
3. Generality and portability: the invention provides a general solution that is suitable for a variety of machine learning tasks and fields. The iterative process therefore does not need to be developed and optimized again for each new problem, which improves portability.
4. Real-time feedback and model optimization: another innovation of the invention is a real-time feedback loop together with continuous optimization of the model. Performance is evaluated in real time on each new model result returned by the modeling service, and the evaluation is fed back to the modeling service. This enables continuous model improvement and allows the model to remain effective in a changing data environment. The real-time feedback mechanism helps the system react quickly to performance degradation and new data distributions, thereby reducing response time.
5. Maximization of resource utilization: another significant innovation of the present invention is the maximized utilization of resources. Through the scheduling technology, the computing resources can be effectively planned and allocated to meet the requirements of modeling tasks. This includes allocating computing resources, data storage, and other critical resources. The maximized utilization of the resources not only improves the efficiency, but also reduces the resource waste, and contributes to optimizing the cost effectiveness of model improvement.
Additional features and advantages of the invention will be set forth in the description which follows, and in part will be obvious from the description, or may be learned by practice of the invention. The objects and other advantages of the invention may be realized and obtained by means of the instrumentalities particularly pointed out in the written application.
The technical scheme of the invention is further described in detail through the drawings and the embodiments.
Drawings
The accompanying drawings are included to provide a further understanding of the invention and are incorporated in and constitute a part of this specification, illustrate the invention and together with the embodiments of the invention, serve to explain the invention. In the drawings:
FIG. 1 is a block diagram of an automatic iterative model service system based on data automatic screening in an embodiment of the invention;
FIG. 2 is a block diagram of a model building block in accordance with an embodiment of the present invention;
FIG. 3 is a flowchart of an automatic iterative model service method based on automatic data screening in an embodiment of the present invention.
Detailed Description
The preferred embodiments of the present invention will be described below with reference to the accompanying drawings, it being understood that the preferred embodiments described herein are for illustration and explanation of the present invention only, and are not intended to limit the present invention.
Example 1:
the embodiment of the invention provides an automatic iterative model service system based on data automatic screening, as shown in fig. 1, comprising:
the model building module is used for selecting model types and model parameters based on training tasks, importing the model types and the model parameters into a model center, and building to obtain an initial model;
the trigger determining module is used for matching an optimal scheduling strategy from the scheduling strategy library based on the data characteristics of the update data corresponding to the training task, and configuring self-iterative scheduling of the initial model according to the optimal scheduling strategy;
the scheduling execution module is used for establishing an iteration task based on self-iteration scheduling and combining model management parameters, and establishing a new model or upgrading an initial model based on the iteration task to obtain a second model;
and the evaluation iteration module is used for evaluating the performance of the second model, and continuing to iterate the second model based on the performance evaluation result until the evaluation result meets the requirement, and stopping self-iteration to obtain a final optimization model.
In this embodiment, the model types include various machine learning models.
In this embodiment, the model parameters include, for example, sample weights, various types of parameters, various types of coefficients, and the like.
The invention has the following beneficial effects:
1. Automated model iteration: one of the main innovations of the invention is automatic iteration of the machine learning model without manual parameter adjustment or model updating. The model management service can automatically trigger a new modeling task, which accelerates model upgrades.
2. Adaptive and automatic modeling: through scheduling techniques, the model management service adapts to new data distributions and concept drift by automatically triggering modeling tasks. This helps to solve the data drift and concept drift problems without manual intervention.
3. Generality and portability: the invention provides a general solution that is suitable for a variety of machine learning tasks and fields. The iterative process therefore does not need to be developed and optimized again for each new problem, which improves portability.
4. Real-time feedback and model optimization: another innovation of the invention is a real-time feedback loop together with continuous optimization of the model. The model management service can immediately evaluate the performance of each new model result returned by the modeling service and feed the evaluation back to the modeling service. This enables continuous model improvement and allows the model to remain effective in a changing data environment. The real-time feedback mechanism helps the system react quickly to performance degradation and new data distributions, thereby reducing response time.
5. Maximization of resource utilization: another significant innovation of the present invention is the maximized utilization of resources. Through scheduling techniques, the model management service can efficiently plan and allocate computing resources to meet the demands of modeling tasks. This includes allocating computing resources, data storage, and other critical resources. The maximized utilization of the resources not only improves the efficiency, but also reduces the resource waste, and contributes to optimizing the cost effectiveness of model improvement.
Example 2:
based on embodiment 1, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening, which further comprises: the resource allocation module is used for allocating resources in the model scheduling and iteration processes;
a resource allocation module, comprising:
the monitoring unit is used for monitoring the service system to obtain a scheduling and iterative operation process;
the resource allocation unit is used for planning and allocating the existing resources based on the scheduling and iterative operation process, and allocating the resources to the corresponding operation modules and operation units according to the allocation result.
The beneficial effects of the above design are as follows: throughout the process, the resource allocation module effectively plans and allocates computing resources to meet the requirements of modeling tasks. This covers computing resources, data storage, and other critical resources, ensuring efficient task execution and maximum resource utilization, and providing a resource basis for modeling.
Example 3:
based on embodiment 1, an embodiment of the present invention provides an automatic iterative model service system based on automatic data screening, as shown in fig. 2, a model building module includes:
the type determining unit is used for analyzing the training task, determining the training purpose and determining the model type based on the training purpose;
the parameter determining unit is used for acquiring a training data set corresponding to the training task and setting model parameters based on the data characteristics of the training data set;
the model construction unit is used for importing the model frames corresponding to the model types into the model information, configuring the model frames by using model parameters, and constructing to obtain an initial model.
The beneficial effects of the above design are as follows: the model framework corresponding to the model type is imported and configured with the model parameters, so that the initial model is constructed and model building is completed.
Example 4:
based on embodiment 1, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening, in which the trigger determining module comprises:
the data acquisition unit is used for periodically acquiring new data related to the training task, judging whether the data difference between the new data and the training data is larger than the preset data difference, if so, taking the new data related to the training task as updated data, otherwise, not updating the data;
the difference determining unit is used for acquiring data set classification characteristics of the training data after the presence of the update data is detected, grouping the update data based on the data set classification characteristics to obtain a plurality of groups of new data sets, and determining set differences between the new data sets and the data sets of the training data;
the strategy determining unit is used for acquiring a scheduling strategy to be selected which meets the data characteristics from the scheduling strategy library based on the data characteristics of the updated data, determining the scheduling weight of the scheduling type based on the set difference, and selecting and acquiring an optimal scheduling strategy from the scheduling strategy to be selected based on the scheduling weight;
and the scheduling determination unit is used for determining scheduling tasks and scheduling configuration resources from the optimal scheduling strategy, and performing resource scheduling based on the scheduling tasks and the scheduling configuration resources to obtain self-iterative scheduling of the initial model.
In this embodiment, the candidate scheduling strategy with the largest scheduling weight is taken as the optimal scheduling strategy.
The beneficial effects of the above design are as follows: candidate scheduling strategies that satisfy the data characteristics of the update data are acquired from the scheduling strategy library; scheduling weights for the scheduling types are determined based on the set differences; and the optimal scheduling strategy is selected from the candidates based on those weights. Scheduling tasks and scheduling configuration resources are then determined from the optimal scheduling strategy, and resource scheduling is performed on that basis to obtain the self-iterative scheduling of the initial model. Automatic iteration of the machine learning model is thereby realized without manual parameter adjustment or model updating. The model management service can automatically trigger new modeling tasks, accelerating model upgrades and adapting to new data distributions and concept drift without manual intervention.
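Selecting the strategy with the largest scheduling weight, as this embodiment describes, can be sketched as follows. The strategy library contents, the two scheduling types, and the rule that a larger set difference favours full retraining are all assumptions for illustration; the patent does not specify how the weights are computed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Strategy:
    name: str
    schedule_type: str                  # hypothetical types: "incremental", "full_retrain"
    supported_features: frozenset[str]  # data characteristics the strategy can handle


def candidates(library: list[Strategy], data_features: set[str]) -> list[Strategy]:
    """Keep only strategies that satisfy the data characteristics of the update data."""
    return [s for s in library if data_features <= s.supported_features]


def type_weights(set_diff: float, threshold: float = 1.0) -> dict[str, float]:
    """Assumed rule: the larger the set difference, the more full retraining is favoured."""
    drift = min(set_diff / threshold, 1.0)
    return {"full_retrain": drift, "incremental": 1.0 - drift}


def pick_optimal(library: list[Strategy], data_features: set[str], set_diff: float) -> Strategy:
    """Return the candidate whose scheduling type carries the largest weight."""
    pool = candidates(library, data_features)
    if not pool:
        raise LookupError("no scheduling strategy matches the data characteristics")
    weights = type_weights(set_diff)
    return max(pool, key=lambda s: weights[s.schedule_type])


library = [
    Strategy("nightly_incremental", "incremental", frozenset({"tabular", "streaming"})),
    Strategy("full_rebuild", "full_retrain", frozenset({"tabular"})),
    Strategy("image_refresh", "full_retrain", frozenset({"image"})),
]
large_drift_choice = pick_optimal(library, {"tabular"}, set_diff=0.9)
small_drift_choice = pick_optimal(library, {"tabular"}, set_diff=0.2)
```

A large set difference selects the full rebuild, while a small one selects the incremental strategy; the image strategy is filtered out because it does not support tabular data.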
Example 5:
based on embodiment 4, the embodiment of the invention provides an automatic iteration model service system based on automatic data screening, and the trigger determining module further comprises:
the triggering determining unit is used for starting a scheduling technology by taking the monitored updated data as a triggering condition, and the scheduling technology assists in completing the selection of the optimal scheduling strategy;
the trigger determining unit is further used for setting a scheduling time period, and when the existence of updated data is not monitored in the scheduling time period, starting a scheduling technology to select a fixed scheduling strategy to perform self-iterative scheduling on the initial model.
The beneficial effects of the above design are as follows: a scheduling time period is set, and when no update data is detected within that period, the scheduling technology is started and a fixed scheduling strategy is selected to perform self-iterative scheduling on the initial model, so that the model is still upgraded.
Example 6:
Based on embodiment 1, this embodiment of the invention provides an automatic iteration model service system based on automatic data screening, in which the scheduling execution module comprises:
a scheduling function determining unit, used for acquiring the scheduling flow of the self-iterative scheduling and acquiring the scheduling function of each node in the scheduling flow;
an execution data determining unit, used for matching the corresponding target management parameters from the model management parameters based on the functional characteristics of each scheduling function, and integrating the functional characteristics with the corresponding target management parameters to obtain total execution data;
an execution function determining unit, used for decomposing the total execution data according to unit execution characteristics to obtain multiple items of single execution data, and determining the execution function corresponding to each item;
a task determining unit, used for sorting all execution functions according to the standard iteration execution order to obtain an execution function sequence, and establishing an iteration task based on that sequence;
a judging unit, used for acquiring the data characteristics and the execution characteristic of the iteration task and judging whether the execution characteristic is re-execution;
if so, the data characteristics differ from the training data characteristics of the initial model, the iteration task is determined to be the establishment of a new model, and the resources for establishing the new model are called;
otherwise, the data characteristics differ from the training data characteristics of the initial model, the iteration task is determined to be an upgrade of the initial model, and the resources for upgrading the model are called;
a model determining unit, used for establishing the new model or upgrading the initial model based on the iteration task in combination with the related resources, to obtain a second model.
The beneficial effects of the above design are as follows: an iteration task is established from the scheduling flow of the self-iterative scheduling, and the data characteristics and execution characteristic of the iteration task are acquired to judge whether the execution characteristic is re-execution. If so, the iteration task is determined to be the establishment of a new model and the resources for establishing the new model are called; if not, the data characteristics differ from the training data characteristics of the initial model, the iteration task is determined to be an upgrade of the initial model, and the resources for upgrading the model are called. Based on the iteration task and the related resources, the new model is established or the initial model is upgraded to obtain a second model, so that models are upgraded or established without manual parameter tuning or manual model updates, which accelerates both processes.
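The task determining unit and the judging unit can be sketched as follows. The step names in `STANDARD_ORDER`, the `IterationTask` record, and the boolean `re_execute` flag are hypothetical; the text does not enumerate the standard execution order or say how the execution characteristic is represented. Because the text associates both branches with data characteristics that differ from the initial training data, this sketch branches only on the re-execution flag.

```python
from dataclasses import dataclass

# Assumed "standard iteration execution order"; the text does not list the steps.
STANDARD_ORDER = ["load_data", "preprocess", "train", "validate", "publish"]


@dataclass
class IterationTask:
    steps: list          # execution functions, sorted into the standard order
    re_execute: bool     # the "execution characteristic" checked by the judging unit
    kind: str = ""       # "new_model" or "upgrade", filled in by classify()


def build_task(execution_functions, re_execute):
    """Sort execution functions by the standard order and wrap them in a task."""
    ordered = sorted(execution_functions, key=STANDARD_ORDER.index)
    return IterationTask(steps=ordered, re_execute=re_execute)


def classify(task):
    """Re-execution means building a new model; otherwise upgrade the initial model."""
    task.kind = "new_model" if task.re_execute else "upgrade"
    return task


task = classify(build_task(["train", "load_data", "validate"], re_execute=True))
print(task.kind, task.steps)
```

A fuller version would also look up the resources tied to each task kind, which the text calls the related resources of new model establishment or of model upgrading.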
Example 7:
Based on embodiment 6, this embodiment of the invention provides an automatic iteration model service system based on automatic data screening, in which the scheduling function determining unit comprises:
a node determining unit, used for acquiring the scheduling flow of the self-iterative scheduling and dividing the scheduling flow according to unit scheduling characteristics to obtain a plurality of nodes;
a function determining unit, used for determining the scheduling function of each node based on the unit scheduling characteristics corresponding to that node.
The beneficial effects of the above design are as follows: dividing the scheduling flow of the self-iterative scheduling into nodes according to unit scheduling characteristics, and deriving each node's scheduling function from its corresponding characteristics, ensures that the obtained scheduling functions are accurate and well matched to the flow.
Example 8:
Based on embodiment 1, this embodiment of the invention provides an automatic iteration model service system based on automatic data screening, in which the evaluation iteration module comprises:
an index determining unit, used for determining the evaluation indexes of the model based on the training task, and dividing them into pre-modeling, in-modeling, and post-modeling evaluation indexes according to the position at which each index is evaluated during model establishment;
a classification evaluation unit, used for evaluating the modeling data of the second model before modeling based on the pre-modeling evaluation indexes to obtain a first evaluation result, evaluating the second model during modeling based on the in-modeling evaluation indexes to obtain a second evaluation result, and evaluating the model after modeling based on the post-modeling evaluation indexes to obtain a third evaluation result;
a data determining unit, used for constructing overall evaluation data from the first, second, and third evaluation results, and associating the overall evaluation data according to the modeling flow to obtain overall evaluation associated data;
a comprehensive evaluation unit, used for comprehensively evaluating the overall evaluation associated data to obtain a performance evaluation result;
an iteration unit, used for continuing to iterate the second model based on the performance evaluation result, comparing the performance evaluation result with a preset evaluation requirement after each iteration, and stopping self-iteration once the performance evaluation result meets the preset evaluation requirement, to obtain a final optimization model.
The beneficial effects of the above design are as follows: the second model is evaluated before, during, and after modeling, and iteration of the second model continues according to the evaluation results. After each iteration the performance evaluation result is compared with the preset evaluation requirement; once the requirement is met, self-iteration stops and the final optimization model is obtained. This realizes a real-time feedback loop and continuous optimization of the model: performance is evaluated as soon as the modeling service returns a new model result, and the evaluation is fed back to the modeling service. The model therefore keeps improving and stays effective in a changing data environment, and the system responds quickly to performance degradation and new data distributions.
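The evaluate-and-iterate loop of this embodiment can be sketched as below. Several details are assumptions rather than part of the text: the three stage results are combined with fixed weights, the preset evaluation requirement is a single numeric threshold, and a `max_rounds` cap is added so the loop always terminates. The `evaluate` and `improve` callables are placeholders supplied by the caller.

```python
def overall_score(pre, during, post, weights=(0.2, 0.3, 0.5)):
    """Combine the pre-, in-, and post-modeling results; the weights are assumed."""
    return weights[0] * pre + weights[1] * during + weights[2] * post


def iterate_until_ok(model, evaluate, improve, requirement, max_rounds=10):
    """Iterate the model until its overall score meets the preset requirement."""
    history = []
    for _ in range(max_rounds):
        score = overall_score(*evaluate(model))
        history.append(score)
        if score >= requirement:
            break  # requirement met: stop self-iteration
        model = improve(model, score)
    return model, history


def evaluate(model):
    # Toy stand-in: the same score for all three evaluation stages.
    return model, model, model


def improve(model, score):
    # Toy stand-in: each iteration raises the model's quality by 10 points.
    return model + 10


final_model, history = iterate_until_ok(50, evaluate, improve, requirement=75)
print(final_model, len(history))
```

Keeping the per-round scores in `history` mirrors the feedback idea in the text: each evaluation result is available to whatever performs the next iteration.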
Example 9:
Based on embodiment 8, this embodiment of the invention provides an automatic iteration model service system based on automatic data screening, in which the comprehensive evaluation unit comprises:
a first evaluation unit, used for evaluating the overall evaluation associated data according to a preset evaluation rule to obtain an initial evaluation result;
a secondary evaluation unit, used for re-evaluating the initial evaluation result based on the association relations in the overall evaluation associated data to obtain the performance evaluation result.
The beneficial effects of the above design are as follows: re-evaluating the initial result against the association relations ensures that the performance evaluation result is accurate and reflects the modeling flow as a whole.
Example 10:
This embodiment of the invention provides an automatic iteration model service method based on automatic data screening, which, as shown in Fig. 3, comprises the following steps:
S1: based on the training task, selecting a model type and model parameters, importing the model type and the model parameters into a model center, and constructing an initial model;
S2: based on the data characteristics of the update data corresponding to the training task, matching an optimal scheduling strategy from a scheduling strategy library, and configuring self-iterative scheduling of the initial model according to the optimal scheduling strategy;
S3: establishing an iteration task based on the self-iterative scheduling in combination with model management parameters, and establishing a new model or upgrading the initial model based on the iteration task to obtain a second model;
S4: evaluating the performance of the second model and continuing to iterate the second model based on the performance evaluation result until the evaluation result meets the requirement, whereupon self-iteration stops and a final optimization model is obtained.
In this embodiment, the model types include various machine learning models.
In this embodiment, the model parameters include, for example, sample weights, various types of parameters, various types of coefficients, and the like.
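Steps S1 to S4 can be chained as in the sketch below. Every callable passed to `run_service` is a caller-supplied placeholder, and the numeric requirement and the `max_rounds` cap are assumptions; the method itself does not prescribe these interfaces.

```python
def run_service(task, build_initial, match_strategy, run_iteration, evaluate,
                requirement, max_rounds=5):
    """Chain steps S1 to S4; all callables are hypothetical placeholders."""
    model = build_initial(task)                    # S1: construct the initial model
    strategy = match_strategy(task)                # S2: match the scheduling strategy
    score = None
    for _ in range(max_rounds):
        model = run_iteration(model, strategy)     # S3: new model or upgrade
        score = evaluate(model)                    # S4: performance evaluation
        if score >= requirement:
            break                                  # requirement met: stop iterating
    return model, score


# Toy stand-ins so the sketch runs end to end.
model, score = run_service(
    task={"name": "demo"},
    build_initial=lambda task: {"version": 0},
    match_strategy=lambda task: "fixed",
    run_iteration=lambda m, strategy: {"version": m["version"] + 1},
    evaluate=lambda m: m["version"] * 30,
    requirement=80,
)
print(model, score)
```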
The invention has the following beneficial effects:
1. Automated model iteration: a main innovation of the invention is automatic iteration of the machine learning model without manual parameter tuning or manual model updates. The model management service can automatically trigger new modeling tasks, which accelerates model upgrades.
2. Adaptive and automatic modeling: through scheduling techniques, the model management service adapts to new data distributions and concept drift by automatically triggering modeling tasks in response to these changes. This addresses data drift and concept drift without manual intervention.
3. Generality and portability: the invention provides a general solution suitable for a variety of machine learning tasks and fields. The iteration process therefore does not need to be redeveloped and re-optimized for each new problem, which improves portability.
4. Real-time feedback and model optimization: another innovation of the invention is a real-time feedback loop with continuous optimization of the model. The model management service evaluates performance as soon as the modeling service returns a new model result and feeds the evaluation back to the modeling service. The model therefore keeps improving and stays effective in a changing data environment, and the system responds quickly to performance degradation and new data distributions.
5. Maximized resource utilization: another significant innovation of the invention is the maximized utilization of resources. Through scheduling techniques, the model management service plans and allocates computing resources, data storage, and other critical resources to meet the demands of modeling tasks. This improves efficiency, reduces resource waste, and improves the cost-effectiveness of model improvement.
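As an illustration of the resource planning mentioned in item 5, the sketch below grants a fixed pool of resource units to tasks in priority order. The priority-based greedy rule, the task names, and the single undifferentiated resource pool are assumptions; the text does not state which allocation policy is used.

```python
def allocate(capacity, demands):
    """Grant each task its requested units in priority order until capacity runs out.

    demands: list of (task_name, priority, units); higher priority is served first.
    Returns the per-task grants and the units left over.
    """
    grants = {}
    remaining = capacity
    for name, _priority, units in sorted(demands, key=lambda d: -d[1]):
        granted = min(units, remaining)  # never hand out more than is left
        grants[name] = granted
        remaining -= granted
    return grants, remaining


grants, left = allocate(8, [("evaluate", 1, 4), ("train", 3, 6), ("screen", 2, 1)])
print(grants, left)
```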
It will be apparent to those skilled in the art that various modifications and variations can be made to the present invention without departing from the spirit or scope of the invention. Thus, it is intended that the present invention also include such modifications and alterations insofar as they come within the scope of the appended claims or the equivalents thereof.

Claims (10)

1. An automatic iteration model service system based on automatic data screening, comprising:
a model building module, used for selecting a model type and model parameters based on a training task, importing the model type and the model parameters into a model center, and constructing an initial model;
a trigger determining module, used for matching an optimal scheduling strategy from a scheduling strategy library based on the data characteristics of update data corresponding to the training task, and configuring self-iterative scheduling of the initial model according to the optimal scheduling strategy;
a scheduling execution module, used for establishing an iteration task based on the self-iterative scheduling in combination with model management parameters, and establishing a new model or upgrading the initial model based on the iteration task to obtain a second model;
an evaluation iteration module, used for evaluating the performance of the second model and continuing to iterate the second model based on the performance evaluation result until the evaluation result meets the requirement, whereupon self-iteration stops and a final optimization model is obtained.
2. The automatic iteration model service system based on automatic data screening according to claim 1, further comprising a resource allocation module, used for allocating resources during model scheduling and iteration;
the resource allocation module comprises:
a monitoring unit, used for monitoring the service system to obtain the scheduling and iteration operation process;
a resource allocation unit, used for planning and allocating the existing resources based on the scheduling and iteration operation process, and distributing the resources to the corresponding operation modules and operation units according to the allocation result.
3. The automatic iteration model service system based on automatic data screening according to claim 1, wherein the model building module comprises:
a type determining unit, used for analyzing the training task to determine the training purpose, and determining the model type based on the training purpose;
a parameter determining unit, used for acquiring the training data set corresponding to the training task and setting the model parameters based on the data characteristics of the training data set;
a model construction unit, used for importing the model framework corresponding to the model type into the model information, configuring the model framework with the model parameters, and constructing the initial model.
4. The automatic iteration model service system based on automatic data screening according to claim 1, wherein the trigger determining module comprises:
a data acquisition unit, used for periodically acquiring new data related to the training task and judging whether the data difference between the new data and the training data is greater than a preset data difference; if so, the new data is taken as update data; otherwise, no data update is performed;
a difference determining unit, used for acquiring, after update data is detected, the data set classification characteristics of the training data, grouping the update data based on those characteristics to obtain multiple new data sets, and determining the set differences between the new data sets and the data sets of the training data;
a strategy determining unit, used for acquiring, from the scheduling strategy library, candidate scheduling strategies that match the data characteristics of the update data, determining a scheduling weight for each scheduling type based on the set differences, and selecting the optimal scheduling strategy from the candidate scheduling strategies based on the scheduling weights;
a scheduling determination unit, used for determining scheduling tasks and scheduling configuration resources from the optimal scheduling strategy, and performing resource scheduling based on the scheduling tasks and the scheduling configuration resources to obtain the self-iterative scheduling of the initial model.
5. The automatic iteration model service system based on automatic data screening according to claim 4, wherein the trigger determining module further comprises:
a trigger determining unit, used for starting the scheduling technology with the detection of update data as the trigger condition, the scheduling technology assisting in selecting the optimal scheduling strategy;
the trigger determining unit is further used for setting a scheduling time period and, when no update data is detected within the scheduling time period, starting the scheduling technology to select a fixed scheduling strategy for self-iterative scheduling of the initial model.
6. The automatic iteration model service system based on automatic data screening according to claim 1, wherein the scheduling execution module comprises:
a scheduling function determining unit, used for acquiring the scheduling flow of the self-iterative scheduling and acquiring the scheduling function of each node in the scheduling flow;
an execution data determining unit, used for matching the corresponding target management parameters from the model management parameters based on the functional characteristics of each scheduling function, and integrating the functional characteristics with the corresponding target management parameters to obtain total execution data;
an execution function determining unit, used for decomposing the total execution data according to unit execution characteristics to obtain multiple items of single execution data, and determining the execution function corresponding to each item;
a task determining unit, used for sorting all execution functions according to the standard iteration execution order to obtain an execution function sequence, and establishing an iteration task based on that sequence;
a judging unit, used for acquiring the data characteristics and the execution characteristic of the iteration task and judging whether the execution characteristic is re-execution;
if so, the data characteristics differ from the training data characteristics of the initial model, the iteration task is determined to be the establishment of a new model, and the resources for establishing the new model are called;
otherwise, the data characteristics differ from the training data characteristics of the initial model, the iteration task is determined to be an upgrade of the initial model, and the resources for upgrading the model are called;
a model determining unit, used for establishing the new model or upgrading the initial model based on the iteration task in combination with the related resources, to obtain a second model.
7. The automatic iteration model service system based on automatic data screening according to claim 6, wherein the scheduling function determining unit comprises:
a node determining unit, used for acquiring the scheduling flow of the self-iterative scheduling and dividing the scheduling flow according to unit scheduling characteristics to obtain a plurality of nodes;
a function determining unit, used for determining the scheduling function of each node based on the unit scheduling characteristics corresponding to that node.
8. The automatic iteration model service system based on automatic data screening according to claim 1, wherein the evaluation iteration module comprises:
an index determining unit, used for determining the evaluation indexes of the model based on the training task, and dividing them into pre-modeling, in-modeling, and post-modeling evaluation indexes according to the position at which each index is evaluated during model establishment;
a classification evaluation unit, used for evaluating the modeling data of the second model before modeling based on the pre-modeling evaluation indexes to obtain a first evaluation result, evaluating the second model during modeling based on the in-modeling evaluation indexes to obtain a second evaluation result, and evaluating the model after modeling based on the post-modeling evaluation indexes to obtain a third evaluation result;
a data determining unit, used for constructing overall evaluation data from the first, second, and third evaluation results, and associating the overall evaluation data according to the modeling flow to obtain overall evaluation associated data;
a comprehensive evaluation unit, used for comprehensively evaluating the overall evaluation associated data to obtain a performance evaluation result;
an iteration unit, used for continuing to iterate the second model based on the performance evaluation result, comparing the performance evaluation result with a preset evaluation requirement after each iteration, and stopping self-iteration once the performance evaluation result meets the preset evaluation requirement, to obtain a final optimization model.
9. The automatic iteration model service system based on automatic data screening according to claim 8, wherein the comprehensive evaluation unit comprises:
a first evaluation unit, used for evaluating the overall evaluation associated data according to a preset evaluation rule to obtain an initial evaluation result;
a secondary evaluation unit, used for re-evaluating the initial evaluation result based on the association relations in the overall evaluation associated data to obtain the performance evaluation result.
10. An automatic iteration model service method based on automatic data screening, comprising the following steps:
S1: based on the training task, selecting a model type and model parameters, importing the model type and the model parameters into a model center, and constructing an initial model;
S2: based on the data characteristics of the update data corresponding to the training task, matching an optimal scheduling strategy from a scheduling strategy library, and configuring self-iterative scheduling of the initial model according to the optimal scheduling strategy;
S3: establishing an iteration task based on the self-iterative scheduling in combination with model management parameters, and establishing a new model or upgrading the initial model based on the iteration task to obtain a second model;
S4: evaluating the performance of the second model and continuing to iterate the second model based on the performance evaluation result until the evaluation result meets the requirement, whereupon self-iteration stops and a final optimization model is obtained.
CN202311746585.7A 2023-12-19 2023-12-19 Automatic iteration model service system and method based on automatic data screening Pending CN117422155A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311746585.7A CN117422155A (en) 2023-12-19 2023-12-19 Automatic iteration model service system and method based on automatic data screening

Publications (1)

Publication Number Publication Date
CN117422155A true CN117422155A (en) 2024-01-19

Family

ID=89528845

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311746585.7A Pending CN117422155A (en) 2023-12-19 2023-12-19 Automatic iteration model service system and method based on automatic data screening

Country Status (1)

Country Link
CN (1) CN117422155A (en)

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN110704178A (en) * 2019-09-04 2020-01-17 北京三快在线科技有限公司 Machine learning model training method, platform, electronic equipment and readable storage medium
CN111444848A (en) * 2020-03-27 2020-07-24 广州英码信息科技有限公司 Specific scene model upgrading method and system based on federal learning
US20220094709A1 (en) * 2020-09-18 2022-03-24 Paypal, Inc. Automatic Machine Learning Vulnerability Identification and Retraining
CN115630708A (en) * 2022-10-31 2023-01-20 中国建设银行股份有限公司 Model updating method and device, electronic equipment, storage medium and product

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
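Liu Li et al.: "A new production method for remote sensing interpretation based on human-machine fusion intelligence", Bulletin of Surveying and Mapping, no. 7, 31 December 2022 (2022-12-31), pages 118 - 123 *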

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination