CN113505895B

CN113505895B - Machine learning engine service system, model training method and configuration method

Info

Publication number: CN113505895B
Application number: CN202110897441.6A
Authority: CN
Inventors: 程战战 (Cheng Zhanzhan)
Original assignee: Shanghai Goldway Intelligent Transportation System Co Ltd
Current assignee: Shanghai Goldway Intelligent Transportation System Co Ltd
Priority date: 2021-08-05
Filing date: 2021-08-05
Publication date: 2023-05-05
Anticipated expiration: 2041-08-05
Also published as: CN113505895A

Abstract

An embodiment of the invention provides a machine learning engine service system, a model training method and a configuration method. In the system, a model management module obtains the target configuration information corresponding to a target task determined by a user, where each piece of configuration information includes information on the network model used by a task model. Then, based on the target configuration information and following a target training process, a model training engine is invoked to train the model on a target data set. Because the configuration information of a task model contains the information that must be invoked when the task model is trained, a user can obtain the desired result simply by invoking the corresponding configuration file to train the model, without any code migration. This lowers the threshold for using machine learning and improves its robustness.

Description

Machine learning engine service system, model training method and configuration method

Technical Field

The present invention relates to the field of machine learning technologies, and in particular, to a machine learning engine service system, a model training method and a configuration method.

Background

Machine learning techniques are currently used widely in many technical fields, such as video surveillance, behavior analysis and image processing.

To implement machine learning, the related art provides several deep learning frameworks, such as Caffe (Convolutional Architecture for Fast Feature Embedding), TensorFlow and PyTorch.

With these deep learning frameworks, researchers working with common machine learning models such as neural network models do not need to code a complex neural network from scratch: they can select an existing model as needed and obtain its parameters through training, or add their own layers on top of an existing model and then train it.

However, when a new algorithm is needed, a developer must write code for it and then call the function interfaces of the deep learning framework in order to use the custom algorithm.

With the rapid development of artificial intelligence technology, advanced machine learning algorithms, for example those based on semi-supervised learning or active learning, have emerged. An advanced machine learning algorithm is defined here as a machine learning idea or paradigm that is usually implemented by combining several sub-algorithms; the sub-algorithms used are not fixed and can be chosen by the developer according to actual requirements.

In practical algorithm development, for each advanced machine learning algorithm, a developer usually develops each required sub-algorithm as an isolated single-point technology and then splices the sub-algorithms together through code migration to define a new algorithm.

In practice, a single sub-algorithm may be applicable to several advanced machine learning algorithms. Even so, it still has to be migrated separately into each advanced machine learning algorithm, which is costly.

Therefore, deep learning frameworks in the related art cannot be applied directly to advanced machine learning algorithms: developers must not only develop each sub-algorithm in isolation but also migrate code, which results in poor robustness.

Disclosure of Invention

An object of embodiments of the invention is to provide a machine learning engine service system and a model training method that improve the robustness of machine learning. The specific technical solutions are as follows:

In a first aspect of embodiments of the present invention, a machine learning engine service system is provided, the system including:

a model management module, a data management module, a model training engine, a configuration information storage module and a model information storage module;

the model training engine is configured to train network models;

the configuration information storage module is configured to store the configuration information of each task model provided by the system; the configuration information of each task model includes the configuration information of the network model used by the task model, the hyperparameters of that network model, and the configuration information of the data set used by the task model;

the model information storage module is configured to store the model data of each network model provided by the system; the model data of a network model includes the program code of the network model;

the model management module is configured to: obtain, from the configuration information storage module, the target configuration information of the target task model corresponding to a target task determined by a user; construct and load the target task model based on the target configuration information and the model data of the network models stored in the model information storage module; invoke the data management module to load a target data set; and, following a preset target training process of the target task, invoke the model training engine to train the target task model on the target data set, the target training process including training the target network model specified in the target configuration information;

the data management module is configured to, when invoked by the model management module, load the target data set based on the configuration information of the target data set in the target configuration information.

In one embodiment of the invention, the system further includes a policy information storage module and a policy management module;

the configuration information stored in the configuration information storage module further includes, for at least one task model, policy configuration information; the policy configuration information includes the policy identifier of each of the one or more policies that the task model needs to invoke, and the policy parameter configuration information of each policy; each policy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm;

the policy information storage module is configured to store the program code of each policy provided by the system;

the model management module is specifically configured to: obtain, from the configuration information storage module, the target configuration information of the target task model corresponding to a target task determined by a user; construct and load the target task model based on the target configuration information and the model data of the network models stored in the model information storage module; and invoke the data management module to load a target data set;

and, when the configuration information of the target task model contains policy configuration information, invoke the model training engine and the policy management module following a preset target training process of the target task to train the target task model on the target data set, the target training process including training the target network model specified in the target configuration information and loading and executing the target policy specified in the target configuration information;

and the policy management module is configured to, when invoked by the model management module, load and execute the target policy based on the program code of the policies stored in the policy information storage module and the policy configuration information of the target task model, and feed the policy execution result back to the model management module.

In one embodiment of the invention, the system further includes training interfaces; each training interface corresponds to one task model and contains the training process of that task model;

when the configuration information of the target task model includes policy information, the model management module invoking the model training engine and the policy management module following the preset target training process of the target task to train the target task model on the target data set includes: determining the target training interface corresponding to the target task, and calling the target training interface, which invokes the model training engine and the policy management module following the preset target training process of the target task to train the target task model on the target data set.
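The one-to-one mapping between task models and training interfaces can be sketched as a small registry. This is a minimal illustration under stated assumptions: the model training engine and the policy management module are replaced by plain callables, and the registry, the task name `active_learning_classification` and its three-step flow are hypothetical, not the patent's implementation.

```python
from typing import Callable, Dict, List

# Stand-ins (assumption): the engine and the policy manager are plain callables
# that take a network-model name or a policy identifier and return a result.
Engine = Callable[[str], str]
PolicyManager = Callable[[str], str]
TrainingInterface = Callable[[Engine, PolicyManager], List[str]]

# Hypothetical registry: exactly one training interface per task model.
TRAINING_INTERFACES: Dict[str, TrainingInterface] = {}


def training_interface(task: str) -> Callable[[TrainingInterface], TrainingInterface]:
    """Register a training interface for a task, enforcing a one-to-one mapping."""
    def register(fn: TrainingInterface) -> TrainingInterface:
        if task in TRAINING_INTERFACES:
            raise ValueError(f"task {task!r} already has a training interface")
        TRAINING_INTERFACES[task] = fn
        return fn
    return register


@training_interface("active_learning_classification")
def al_classifier_interface(engine: Engine, policies: PolicyManager) -> List[str]:
    # Hypothetical preset training process: train, run a selection policy, retrain.
    return [engine("classifier"), policies("reliability"), engine("classifier")]


def train_task(task: str, engine: Engine, policies: PolicyManager) -> List[str]:
    """Model-management side: find the target training interface and call it."""
    return TRAINING_INTERFACES[task](engine, policies)


trace = train_task(
    "active_learning_classification",
    engine=lambda model: f"trained {model}",
    policies=lambda policy_id: f"ran {policy_id}",
)
print(trace)  # ['trained classifier', 'ran reliability', 'trained classifier']
```

Keeping the training process inside the interface means the model management module only needs the task name to start training; the order of engine and policy calls lives in one place per task model.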

In one embodiment of the invention, the system further includes a model configuration module;

the model configuration module is configured to obtain the configuration information with which an R&D user configures a task model and to store it in the configuration information storage module;

the model information storage module is further configured to store the model data of new network models imported by R&D users;

the policy information storage module is further configured to store the program code of new policies imported by R&D users.

In one embodiment of the invention, the system further includes a data output module;

the data output module is configured to output all data to be output in a preset unified output format.
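A preset unified output format can be as simple as one fixed record layout that every module serializes into. The sketch below assumes a JSON record with three keys (`source`, `type`, `data`); that layout and the helper names are hypothetical, chosen only to illustrate the idea.

```python
import json
from typing import Any, Dict

# Hypothetical unified record layout: every module emits the same three keys.
OUTPUT_KEYS = ("source", "type", "data")


def to_unified_output(source: str, kind: str, data: Dict[str, Any]) -> str:
    """Serialize one piece of output data in a single, fixed JSON layout."""
    record = {"source": source, "type": kind, "data": data}
    # sort_keys gives a stable field order, so outputs are easy to diff and parse.
    return json.dumps(record, sort_keys=True, ensure_ascii=False)


def from_unified_output(text: str) -> Dict[str, Any]:
    """Parse a unified record and check that it has exactly the expected keys."""
    record = json.loads(text)
    if set(record) != set(OUTPUT_KEYS):
        raise ValueError(f"unexpected keys: {sorted(record)}")
    return record


line = to_unified_output("model_management", "evaluation", {"accuracy": 0.9})
print(line)  # {"data": {"accuracy": 0.9}, "source": "model_management", "type": "evaluation"}
```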

In one embodiment of the invention, the system further includes one or more of a script display module, a tool module and a log analysis module;

the script display module is configured to display to the user, in response to a display instruction triggered by the user and/or an R&D user, script instances of the process in which the model management module trains the target task model;

the tool module is configured to provide auxiliary tools to users and/or R&D users; the auxiliary tools include at least a feature visualization tool for displaying feature changes and/or feature distributions during training of the target task model;

the log analysis module is configured to analyze the log data generated by the model management module while training the target task model.

In a second aspect of embodiments of the present invention, a method for training a task model is further provided, applied to the above machine learning engine service system, the method including:

obtaining, by the model management module, a target task determined by a user;

obtaining the target configuration information of the corresponding target task model from the configuration information storage module;

constructing and loading the target task model based on the target configuration information and the model data of the network models stored in the model information storage module, the model data of a network model including the program code of the network model;

invoking the data management module to load a target data set based on the configuration information of the target data set in the target configuration information;

following a preset training process of the target task, invoking the model training engine to train the target task model on the target data set, the training process including training the target network model specified in the target configuration information;

the configuration information stored in the configuration information storage module further includes, for at least one task model, policy configuration information; the policy configuration information includes the policy identifier of each of the one or more policies that the task model needs to invoke, and the policy parameter configuration information of each policy, where each policy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm;

the policy management module is configured to, when invoked by the model management module, load and execute the target policy based on the program code of the policies stored in the policy information storage module and the policy configuration information of the target task model, and feed the policy execution result back to the model management module;

the step of invoking the model training engine, following the preset training process of the target task, to train the target task model on the target data set includes:

when the configuration information of the target task model contains policy information, invoking the model training engine and the policy management module following the preset target training process of the target task to train the target task model on the target data set, the target training process including training the target network model specified in the target configuration information and loading and executing the target policy specified in the target configuration information;

and the step of, when the configuration information of the target task model contains policy information, invoking the model training engine and the policy management module following the preset target training process of the target task to train the target task model on the target data set includes:

determining the target training interface corresponding to the target task, and calling the target training interface, which invokes the model training engine and the policy management module following the preset target training process of the target task to train the target task model on the target data set.

In one embodiment of the present invention, when the configuration information of the target task model includes policy information, the step of invoking the model training engine and the policy management module following the preset training process of the target task to train the target task model on the target data set includes:

determining the current training process step to be executed based on the preset training process of the target task;

when the current step is executing the target policy, invoking the policy management module, which obtains the program code of the target policy from the policy information storage module, loads it, and executes the target policy in the current step;

receiving the policy execution result fed back by the policy management module;

when the current step is training the target network model specified in the target configuration information, invoking the model training engine to train the target network model on the target data set;

receiving the training result fed back by the model training engine;

and, when the current step is not the last step of the training process, returning to the step of determining the current training process step to be executed based on the preset training process of the target task.
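The step-by-step loop above (determine the current step, dispatch it to the policy management module or to the model training engine, collect the fed-back result, repeat until the last step) can be sketched as follows. The `Step` type, the step kinds `"policy"` and `"train"`, and the callables standing in for the two modules are hypothetical illustrations, not the patent's interfaces.

```python
from dataclasses import dataclass
from typing import Any, Callable, List


@dataclass
class Step:
    kind: str    # "policy" or "train" (hypothetical step kinds)
    target: str  # policy identifier or network-model name


def run_training_process(
    steps: List[Step],
    policy_manager: Callable[[str], Any],
    training_engine: Callable[[str], Any],
) -> List[Any]:
    """Execute a preset training process step by step, collecting fed-back results."""
    results: List[Any] = []
    # Each loop iteration "determines the current step"; moving on to the next
    # iteration corresponds to returning to that step until the last one is done.
    for step in steps:
        if step.kind == "policy":
            # Policy management module loads and executes the policy, feeds back a result.
            results.append(policy_manager(step.target))
        elif step.kind == "train":
            # Model training engine trains the named network model, feeds back a result.
            results.append(training_engine(step.target))
        else:
            raise ValueError(f"unknown step kind: {step.kind!r}")
    return results


process = [Step("train", "student"), Step("policy", "pseudo_label"), Step("train", "student")]
log = run_training_process(
    process,
    policy_manager=lambda pid: f"policy {pid} done",
    training_engine=lambda net: f"trained {net}",
)
print(log)  # ['trained student', 'policy pseudo_label done', 'trained student']
```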

In one embodiment of the invention, the method further comprises:

outputting all data to be output in a preset unified output format.

In a third aspect of embodiments of the present invention, a method for configuring a task model is further provided, applied to the above machine learning engine service system, the method including:

receiving the model data of each network model imported by an R&D user and storing it in the model information storage module, the model data of a network model including the program code of the network model;

obtaining the configuration information with which an R&D user configures each task model and storing it in the configuration information storage module, the configuration information of each task model including the structure information of the network model used by the task model, the hyperparameters of that network model, and the configuration information of the data set used by the task model.

The method further includes:

receiving the program code of each policy imported by an R&D user and storing it in the policy information storage module;

obtaining the policy configuration information with which an R&D user configures at least one task model and storing it in the configuration information storage module, where the policy configuration information includes the policy identifier of each of the one or more policies that the task model needs to invoke, and the policy parameter configuration information of each policy; each policy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm.

In yet another aspect of embodiments of the present invention, an electronic device is provided, including a processor, a communication interface, a memory and a communication bus, where the processor, the communication interface and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

and the processor is configured to implement the steps of any of the above task model training or configuration methods when executing the program stored in the memory.

An embodiment of the invention further provides a computer-readable storage medium storing a computer program which, when executed by a processor, implements the steps of any of the above task model training or configuration methods.

Embodiments of the present invention also provide a computer program product comprising instructions which, when run on a computer, cause the computer to perform a method of training or configuring a task model as described in any one of the above.

Embodiments of the invention have the following beneficial effects:

In the machine learning engine service system provided by embodiments of the invention, the model management module obtains the corresponding target configuration information from the configuration information storage module based on the target task determined by the user, where the configuration information of each task model includes the structure information and hyperparameters of the network model used by that task model. The module then constructs and loads the target task model from the target configuration information and the model data of the network models stored in the model information storage module, and invokes the model training engine, following the target training process of the target task, to train the target task model on the target data set loaded by the data management module. Because the configuration information of a task model contains the information that must be invoked when the task model is trained, a user can obtain the desired result simply by invoking the corresponding configuration file to train the model, without any code migration. This lowers the threshold for using machine learning and improves its robustness.

Of course, a product or method implementing the invention need not achieve all of the above advantages at the same time.

Drawings

To describe the technical solutions of the embodiments of the invention or of the prior art more clearly, the drawings used in the embodiments or in the description of the prior art are briefly introduced below. The drawings described below show only some embodiments of the invention; those skilled in the art may obtain other embodiments from these drawings.

FIG. 1 is a schematic diagram of a machine learning engine service system according to an embodiment of the present invention;

FIG. 2 is a schematic diagram of a second architecture of a machine learning engine service system according to an embodiment of the present invention;

FIG. 3a is a schematic diagram of a third architecture of a machine learning engine service system according to an embodiment of the present invention;

FIG. 3b is a schematic diagram of the external interfaces of a machine learning engine service system according to an embodiment of the present invention;

FIG. 4 is a schematic diagram of a fourth architecture of a machine learning engine service system according to an embodiment of the present invention;

FIG. 5 is a schematic diagram of a fifth architecture of a machine learning engine service system according to an embodiment of the present invention;

FIG. 6a is a block diagram of a machine learning engine service system according to an embodiment of the present invention;

FIG. 6b shows the variation of a loss curve;

FIG. 7 is a schematic diagram of a training process of a machine learning engine service system according to an embodiment of the present invention;

FIG. 8 is a flowchart of a task model training method according to an embodiment of the present invention;

FIG. 9 is a second flowchart of a task model training method according to an embodiment of the present invention;

FIG. 10 is a flowchart of the specific training steps of a task model training method according to an embodiment of the present invention;

FIG. 11 is a flowchart of a task model configuration method according to an embodiment of the present invention;

FIG. 12 is a second flowchart of a task model configuration method according to an embodiment of the present invention;

FIG. 13 is a schematic structural diagram of an electronic device according to an embodiment of the present invention;

FIG. 14 is a schematic structural diagram of another electronic device according to an embodiment of the present invention.

Detailed Description

The technical solutions in the embodiments of the present invention are described clearly and completely below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present invention. All other embodiments obtained by those of ordinary skill in the art based on the embodiments of the present invention fall within the scope of protection of the present invention.

To improve the robustness of machine learning, embodiments of the invention provide a machine learning engine service system, a model training method and a configuration method. The machine learning engine service system is described in detail first.

As shown in FIG. 1, which is a schematic structural diagram of a machine learning engine service system according to an embodiment of the present invention, the system may include a model management module 101, a configuration information storage module 102, a data management module 103, a model information storage module 104 and a model training engine 105;

the model training engine 105 is configured to train network models;

the configuration information storage module 102 is configured to store the configuration information of each task model provided by the system; the configuration information of each task model includes the configuration information of the network model used by the task model, the hyperparameters of that network model, and the configuration information of the data set used by the task model;

the model information storage module 104 is configured to store the model data of each network model provided by the system; the model data of a network model includes the program code of the network model;

the model management module 101 is configured to: obtain, from the configuration information storage module 102, the target configuration information of the target task model corresponding to a target task determined by a user; construct and load the target task model based on the target configuration information and the model data of the network models stored in the model information storage module 104; invoke the data management module 103 to load a target data set; and, following a preset target training process of the target task, invoke the model training engine 105 to train the target task model on the target data set, the target training process including training the target network model specified in the target configuration information;

the data management module 103 is configured to, when invoked by the model management module 101, load the target data set based on the configuration information of the target data set in the target configuration information.
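The interplay of modules 101 to 105 can be illustrated with a small in-memory skeleton. This is a minimal sketch under loud assumptions: the class names, the configuration keys (`network`, `structure`, `hyperparameters`, `dataset`) and the stub "ResNet" builder are hypothetical, and the engine is a stand-in callable rather than Caffe, TensorFlow or PyTorch.

```python
from typing import Any, Callable, Dict, List


class ConfigStore:
    """Configuration information storage module (102): task name -> configuration."""
    def __init__(self, configs: Dict[str, Dict[str, Any]]) -> None:
        self._configs = configs

    def get(self, task: str) -> Dict[str, Any]:
        return self._configs[task]


class ModelStore:
    """Model information storage module (104): network name -> code (here, a builder)."""
    def __init__(self, builders: Dict[str, Callable[..., Any]]) -> None:
        self._builders = builders

    def build(self, name: str, **structure: Any) -> Any:
        return self._builders[name](**structure)


class DataManager:
    """Data management module (103); loading is faked from an in-memory table."""
    def __init__(self, datasets: Dict[str, List[Any]]) -> None:
        self._datasets = datasets

    def load(self, dataset_cfg: Dict[str, Any]) -> List[Any]:
        return self._datasets[dataset_cfg["name"]]


class ModelManager:
    """Model management module (101) wiring the other modules together."""
    def __init__(self, configs: ConfigStore, models: ModelStore,
                 data: DataManager, engine: Callable[..., Any]) -> None:
        self.configs, self.models, self.data, self.engine = configs, models, data, engine

    def run(self, task: str) -> Any:
        cfg = self.configs.get(task)                                  # target configuration
        net = cfg["network"]
        model = self.models.build(net["name"], **net["structure"])    # construct and load
        dataset = self.data.load(cfg["dataset"])                      # target data set
        return self.engine(model, dataset, **cfg["hyperparameters"])  # train


def build_resnet(num_layers: int) -> Dict[str, Any]:
    # Stub standing in for real network code stored in module 104.
    return {"arch": "ResNet", "layers": num_layers}


def fake_engine(model: Any, dataset: List[Any], learning_rate: float) -> Dict[str, Any]:
    # Stand-in for the model training engine (105): reports what it was given.
    return {"model": model, "samples": len(dataset), "lr": learning_rate}


configs = ConfigStore({"classification": {
    "network": {"name": "ResNet", "structure": {"num_layers": 18}},
    "hyperparameters": {"learning_rate": 0.01},
    "dataset": {"name": "toy"},
}})
manager = ModelManager(configs, ModelStore({"ResNet": build_resnet}),
                       DataManager({"toy": [1, 2, 3]}), fake_engine)
result = manager.run("classification")
print(result)
```

The user-facing entry point is only `run(task)`; everything else is resolved from stored configuration and stored code, which is the property the description relies on to avoid code migration.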

In embodiments of the invention, a user selects a task provided by the machine learning engine service system to meet his or her needs. Tasks may include classification, detection and the like, and the corresponding task models may be a classifier model, a detector model and the like, without limitation. Configuration information for the task model of each task can be prepared in advance and stored in the configuration information storage module.

As a specific implementation, the configuration information of each task model can be stored as a configuration file. A developer can write the configuration file during development; a user of the machine learning engine service system can also modify the configuration file in the configuration information storage module to meet his or her own requirements.

As described above, the model information storage module stores the model data of each network model provided by the system, where the network models may include a ResNet network, an RPN network, an RCNN network and the like, and the model data of a network model is its concrete program code. A task model may use one or more of these network models; accordingly, its configuration information may include the structure and hyperparameters of each network model it uses, for example network structure information such as the number of layers and the numbers of input and output channels, and hyperparameters such as the learning rate.
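A task-model configuration file of the kind described above might look like the JSON below, holding the network structure (layers, input and output channels), a hyperparameter (learning rate) and the data set configuration. The file layout, the key names and the path `datasets/cls.json` are hypothetical; the `override` helper only illustrates a user modifying a stored configuration without touching code.

```python
import copy
import json
from typing import Any, Dict

# Hypothetical configuration file for a classification task model.
CONFIG_TEXT = """
{
  "task": "classification",
  "network": {"name": "ResNet", "num_layers": 18, "in_channels": 3, "out_channels": 10},
  "hyperparameters": {"learning_rate": 0.01},
  "dataset": {"path": "datasets/cls.json", "train_ratio": 0.8}
}
"""

REQUIRED_SECTIONS = ("network", "hyperparameters", "dataset")


def load_task_config(text: str) -> Dict[str, Any]:
    """Parse a configuration file and check that the required sections exist."""
    cfg = json.loads(text)
    missing = [key for key in REQUIRED_SECTIONS if key not in cfg]
    if missing:
        raise KeyError(f"configuration is missing sections: {missing}")
    return cfg


def override(cfg: Dict[str, Any], section: str, **updates: Any) -> Dict[str, Any]:
    """Return a modified copy, as a user rewriting the stored file might do."""
    new_cfg = copy.deepcopy(cfg)
    new_cfg[section].update(updates)
    return new_cfg


base = load_task_config(CONFIG_TEXT)
tuned = override(base, "hyperparameters", learning_rate=0.001)
print(base["hyperparameters"]["learning_rate"], tuned["hyperparameters"]["learning_rate"])  # 0.01 0.001
```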

As a specific implementation, on the basis of FIG. 1 and as shown in FIG. 2, the system may further include a policy management module 206 and a policy information storage module 207;

the configuration information stored in the configuration information storage module 102 further includes, for at least one task model, policy configuration information; the policy configuration information includes the policy identifier of each of the one or more policies that the task model needs to invoke, and the policy parameter configuration information of each policy; each policy identifier corresponds to one sub-algorithm of an advanced machine learning algorithm;

the policy information storage module 207 is configured to store the program code of each policy provided by the system;

the model management module 101 is specifically configured to: obtain, from the configuration information storage module 102, the target configuration information of the target task model corresponding to a target task determined by a user; construct and load the target task model based on the target configuration information and the model data of the network models stored in the model information storage module 104; and invoke the data management module 103 to load a target data set;

and, when the configuration information of the target task model includes policy configuration information, invoke the model training engine 105 and the policy management module 206 following a preset target training process of the target task to train the target task model on the target data set, the target training process including training the target network model specified in the target configuration information and loading and executing the target policy specified in the target configuration information;

the policy management module 206 is configured to, when invoked, load and execute the target policy based on the program code of the policies stored in the policy information storage module 207 and the policy configuration information of the target task model, and feed the policy execution result back to the model management module 101.

As mentioned above, the policy information storage module stores the program code of each policy provided by the system. In embodiments of the invention, a policy is a specific sub-algorithm of an advanced machine learning algorithm, such as the SVM algorithm, the KNN algorithm, an augmentation method (augmentor) or a minimum-reliability method (reliability); an advanced machine learning algorithm is a machine learning idea or paradigm that is usually implemented by combining several sub-algorithms. Each task model may contain one or more of these sub-algorithms (policies), so the configuration information of a task model may also include policy configuration information, consisting of the identifier of each policy and its policy parameter configuration information. Policy identifiers correspond one-to-one to policies: when policies are invoked from a configuration file, a specific policy is located by its identifier, and its policy parameter configuration information specifies how the sub-algorithm is concretely executed. As a specific implementation, a policy identifier can be generated for each sub-algorithm (policy) and added to the configuration file when the file is written.

In this way, each policy can be invoked by different task models, which improves the reusability of each sub-algorithm and reduces algorithm development cost.
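Reuse through policy identifiers can be sketched as a registry that maps each identifier to policy code and instantiates it from the policy parameter configuration. The identifier `reliability` comes from the text above; treating it as "select the k lowest-scoring samples" is an assumption made for illustration, as are the registry, the class name and the `{"id": ..., "params": ...}` entry layout.

```python
from typing import Any, Callable, Dict, List

# Policy information storage (hypothetical): policy identifier -> policy code.
POLICY_REGISTRY: Dict[str, Callable[..., Any]] = {}


def register_policy(policy_id: str) -> Callable[[type], type]:
    """Attach a policy identifier to a sub-algorithm class."""
    def register(cls: type) -> type:
        POLICY_REGISTRY[policy_id] = cls
        return cls
    return register


@register_policy("reliability")
class LowestScoreSelection:
    """Hypothetical sub-algorithm: select the k samples with the lowest scores."""

    def __init__(self, k: int) -> None:
        self.k = k

    def __call__(self, scores: Dict[str, float]) -> List[str]:
        return sorted(scores, key=scores.get)[: self.k]


def build_policies(policy_cfg: List[Dict[str, Any]]) -> List[Any]:
    """Policy management: look up each identifier and apply its parameters."""
    return [POLICY_REGISTRY[item["id"]](**item.get("params", {})) for item in policy_cfg]


# Two different task models reuse the same policy code with different parameters.
classifier_policies = build_policies([{"id": "reliability", "params": {"k": 1}}])
detector_policies = build_policies([{"id": "reliability", "params": {"k": 2}}])
scores = {"img_a": 0.9, "img_b": 0.2, "img_c": 0.5}
print(classifier_policies[0](scores), detector_policies[0](scores))  # ['img_b'] ['img_b', 'img_c']
```

The policy code is written once and stored once; each task model only carries an identifier and parameters in its configuration, which is where the claimed reduction in migration cost comes from.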

As described above, the configuration information of each task model includes the configuration information of the network model used by the task model, the hyperparameters of that network model, and the policy identifiers and policy parameter configuration information of the policies the task model needs to invoke; it may further include the configuration information of the data set used by the task model. Based on the data set configuration information, the data management module can construct and split the data set, for example into a training set and a test set, and can also add or delete data; the results of construction and splitting are loaded for use during model training. As a specific implementation, the data management module may convert data sets into a unified format such as JSON. The data management module may include sub-modules such as an incremental dataset management (IncDatasetManager) sub-module and an active learning dataset management (ALDatasetManager) sub-module.
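The data management operations listed above (construct, add, delete, split into training and test sets, convert to JSON) can be sketched with a small class. The class name, the sample layout (`id`, `label`) and the seeded-shuffle split are assumptions for illustration; they are not the interfaces of `IncDatasetManager` or `ALDatasetManager`.

```python
import json
import random
from typing import Any, Dict, List, Tuple

Sample = Dict[str, Any]


class DatasetManager:
    """Hypothetical sketch of the data management module."""

    def __init__(self, samples: List[Sample]) -> None:
        self.samples = list(samples)  # constructed data set

    def add(self, sample: Sample) -> None:
        self.samples.append(sample)

    def remove(self, sample_id: str) -> None:
        self.samples = [s for s in self.samples if s["id"] != sample_id]

    def split(self, train_ratio: float, seed: int = 0) -> Tuple[List[Sample], List[Sample]]:
        """Shuffle with a fixed seed, then cut into a training set and a test set."""
        shuffled = list(self.samples)
        random.Random(seed).shuffle(shuffled)
        cut = int(len(shuffled) * train_ratio)
        return shuffled[:cut], shuffled[cut:]

    def to_json(self) -> str:
        """Convert the data set to a unified JSON representation."""
        return json.dumps({"samples": self.samples}, ensure_ascii=False)


manager = DatasetManager([{"id": f"s{i}", "label": i % 2} for i in range(10)])
manager.add({"id": "s10", "label": 0})
manager.remove("s3")
train_set, test_set = manager.split(train_ratio=0.8)
print(len(train_set), len(test_set))  # 8 2
```

A fixed seed keeps the split reproducible, so repeated training runs driven by the same configuration see the same training and test sets.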

In the embodiment of the invention, after the model management module in the machine learning engine service system acquires the target task selected by the user, it can acquire the corresponding target configuration information from the configuration information storage module, construct a target task model according to the target configuration information, call the model training engine and the policy management module, and train the target task model on the target data set loaded by the data management module. As a specific implementation, the target data set may include a training set and a test set: the model management module may train the task model on the training set, evaluate it on the test set, and finally output a trained model file and an evaluation result. In the embodiment of the present invention, the model training engine may be Caffe, TensorFlow, PyTorch, or the like.

In an embodiment of the present invention, the model management module may include a semi-supervised detector management (SemiDetectorManager) sub-module, an active learning detector management (ALDetectorManager) sub-module, an active learning classifier management (ALClassifierManager) sub-module, and the like.

As a specific implementation of the embodiment of the present invention, the model training may be performed according to a preset target training process, which may include the training order of the network models in the target configuration information. As described above, when the configuration information of the target task model includes policy configuration information, the target training process may further include the execution order of the target policies.

Generally, the learning methods used in model training mainly include active learning, semi-supervised learning, and incremental learning. According to its algorithmic idea, each learning method can be abstracted into an operation flow made up of several atomic algorithms. For example, in semi-supervised learning, unlabeled data is usually obtained and input into a trained teacher model to obtain the corresponding output labels; the same unlabeled data is also input into the model to be trained (the student model) to obtain the corresponding output pseudo-labels, and consistency training is then performed based on the labels output by the teacher model and the pseudo-labels output by the student model. This process can be abstracted into the following flow: acquiring unlabeled data, labeling the unlabeled data with pseudo-labels, and performing consistency training based on the pseudo-labels. A developer can develop and optimize each link in this operation flow, and the target training process is the result of that development and optimization.
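The three atomic links of the semi-supervised flow can be sketched as separate, individually replaceable functions. This is a toy sketch under assumptions: models are plain Python callables on small vectors, the helper names are hypothetical, and mean squared error is used as the consistency measure only because it is one common choice, not because the source specifies it.

```python
from typing import Callable, List, Sequence

Vector = List[float]


def acquire_unlabeled(pool: Sequence[dict]) -> List[Vector]:
    """Atomic link 1: collect the inputs of samples that have no label."""
    return [s["x"] for s in pool if s.get("label") is None]


def teacher_labels(teacher: Callable[[Vector], Vector], inputs: List[Vector]) -> List[Vector]:
    """Atomic link 2: the trained teacher model labels the unlabeled data."""
    return [teacher(x) for x in inputs]


def consistency_loss(student: Callable[[Vector], Vector],
                     inputs: List[Vector], targets: List[Vector]) -> float:
    """Atomic link 3: mean squared difference between student pseudo-labels and teacher labels."""
    total, count = 0.0, 0
    for x, target in zip(inputs, targets):
        for p, q in zip(student(x), target):
            total += (p - q) ** 2
            count += 1
    return total / count if count else 0.0


def teacher(x: Vector) -> Vector:   # toy "trained" teacher model
    return list(x)


def student(x: Vector) -> Vector:   # toy student model being trained
    return [0.5 * v for v in x]


pool = [{"x": [1.0, 0.0], "label": None},
        {"x": [0.0, 1.0], "label": 1},
        {"x": [0.5, 0.5], "label": None}]
inputs = acquire_unlabeled(pool)
loss = consistency_loss(student, inputs, teacher_labels(teacher, inputs))
print(loss)  # 0.09375
```

Keeping each link as its own function is what lets a developer swap in, for example, a different labeling rule without touching the other two links.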

As a specific implementation manner of the embodiment of the present invention, based on fig. 2, as shown in fig. 3a, fig. 3a is a third structural schematic diagram of a machine learning engine service system provided by the embodiment of the present invention; the system may further include a plurality of training interfaces 304, where each training interface corresponds to a task model, and each training interface includes a training process of the corresponding task model;

Therefore, when the configuration information of the target task model includes policy information, the model management module 101 calling the model training engine 105 and the policy management module 206 according to the preset target training process of the target task, and training the target task model based on the target data set, may specifically be: determining the target training interface corresponding to the target task, and calling the target training interface, which in turn calls the model training engine 105 and the policy management module 206 according to the preset target training process of the target task to train the target task model based on the target data set.

That is, as a specific implementation of the embodiment of the present invention, after the user selects the target task, the machine learning engine service system may invoke the model training engine and the policy management module through the training interface corresponding to the target task, and obtain from the model information storage module 104 the model data of the network models included in the corresponding task model, so as to train the target task model. As shown in fig. 3b, the training interfaces (interface layer) may include an incremental interface, an active interface, a semi-supervised interface, and the like. The user (user layer) selects the target task, which determines the corresponding target configuration information, target task model, and target data set, and then calls the engine service (algorithm layer) through the corresponding interface to train the target task model; within the engine service, the model training engine and the policy management module carry out the specific algorithm execution.
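The routing from the user layer to the interface layer can be sketched as a dispatch table keyed by task type. This is a minimal sketch: the task-type keys, the interface function names, and the returned strings are hypothetical placeholders for calls into the engine service, not the system's actual interfaces.

```python
from typing import Callable, Dict


def incremental_interface(cfg: dict) -> str:
    return f"incremental training of {cfg['model']}"


def active_interface(cfg: dict) -> str:
    return f"active-learning training of {cfg['model']}"


def semi_supervised_interface(cfg: dict) -> str:
    return f"semi-supervised training of {cfg['model']}"


# Interface layer: one hypothetical entry point per kind of task model.
TRAINING_INTERFACES: Dict[str, Callable[[dict], str]] = {
    "incremental": incremental_interface,
    "active": active_interface,
    "semi_supervised": semi_supervised_interface,
}


def train_target_task(task_type: str, cfg: dict) -> str:
    """Route the user's target task (user layer) to its training interface (interface layer)."""
    interface = TRAINING_INTERFACES.get(task_type)
    if interface is None:
        raise ValueError(f"no training interface registered for {task_type!r}")
    # A real interface would call the model training engine and policy manager here.
    return interface(cfg)


print(train_target_task("semi_supervised", {"model": "yolo_v3"}))
```

A table lookup keeps the user layer unaware of how each interface trains its model; adding a new kind of task model only adds one entry.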

As described above, the machine learning engine service system provided by the embodiment of the invention allows a user to select a target task independently and train a model for that task. In addition, the system can also serve developer users (research and development users), who can use it to refine and upgrade the algorithm ideas they have formulated.

Accordingly, as a specific implementation of the embodiment of the present invention, based on fig. 2, as shown in fig. 4, the machine learning engine service system may further include a model configuration module 408;

the model configuration module 408 is configured to obtain the configuration information with which the research and development user configures each task model, and to store it in the configuration information storage module;

the research and development user can configure the configuration information of different task models according to different task requirements, and the configuration information of each task model may include one or more pieces of network model information, one or more pieces of policy information, and data set configuration information.

The model information storage module 104 is further configured to store the model data of new network models imported by the research and development user;

The policy information storage module 207 is further configured to store the program code of new policies imported by the research and development user.

If the network models and policies used by the research and development user when configuring a task model are already stored in the model information storage module and the policy information storage module, no new network model or policy program code needs to be imported. If the task model uses network models and/or policies that are not yet stored, the research and development user needs to import the new network models and/or policy program code into the corresponding storage module.

As a specific implementation of the embodiment of the present invention, based on fig. 2, as shown in fig. 5, the machine learning engine service system further includes a data output module 508;

the data output module 508 is configured to output various data to be output according to a preset unified output format.

As a specific implementation of the embodiment of the present invention, the data to be output may include the trained model, an operation result list, and the like. If the model training engine is PyTorch, the trained model can be output as a .pth file and the operation result list can be output as HTML or the like; the data output module can package these output data into a unified JSON format for output.
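Packaging heterogeneous outputs into one JSON envelope could look like the sketch below. The envelope fields (`model`, `results`, `metrics`, `path`, `format`) and the example file paths are hypothetical; the source states only that outputs are packaged into a unified JSON format.

```python
import json
from pathlib import PurePosixPath
from typing import List


def _describe(path: str) -> dict:
    """Record a file path together with its format, taken from the file extension."""
    return {"path": path, "format": PurePosixPath(path).suffix.lstrip(".")}


def package_outputs(model_path: str, result_files: List[str], metrics: dict) -> str:
    """Wrap heterogeneous training outputs in one JSON envelope (hypothetical schema)."""
    envelope = {
        "model": _describe(model_path),
        "results": [_describe(p) for p in result_files],
        "metrics": metrics,
    }
    return json.dumps(envelope, sort_keys=True)


print(package_outputs("work_dirs/latest.pth", ["work_dirs/report.html"], {"mAP": 0.5}))
```

The envelope references the .pth and HTML files by path instead of embedding them, so downstream consumers only need to parse a single JSON format.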

As shown in fig. 6a, the machine learning engine service system 600 may further include one or more of a script presentation module 608, a tool module 609, and a log analysis module 610, in addition to the model management module 601, the configuration information storage module 602, the data management module 603, the model information storage module 604, the model training engine 605, the policy management module 606, and the policy information storage module 607, as an implementation of an embodiment of the present invention:

the script display module 608 is configured to display, to a user, a script instance of a process of performing the target task model training by the model management module based on a display instruction triggered by the user and/or the developing user;

as a specific implementation of the embodiment of the present invention, the user or the research and development user may trigger the display instruction by, for example, clicking a "display" button, which is not limited herein. After receiving the display instruction, the system can display the model training process to the user, for example, the specific running process of the training code.

The tool module 609 is configured to provide auxiliary tools for the user and/or the research and development user; the auxiliary tools at least include a feature visualization tool for displaying feature changes and/or feature distributions during training of the target task model;

As a specific implementation of the embodiment of the present invention, feature visualization may be performed during training in various ways, such as visualizing the loss value or visualizing single-channel or multi-channel feature maps. Fig. 6b is a loss curve showing how the test loss and the training loss change with the number of iterations, which intuitively shows the model training process to the user. The auxiliary tools may further include a test tool and the like, which are not particularly limited herein.

The log analysis module 610 is configured to analyze log data generated by the model management module during training of the target task model.

The log data can record the various operation data generated while the target task model is trained; based on the log data, a user can locate abnormal parts of the training process and then modify and optimize them.
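A simple form of log analysis scans training records for losses that cannot be parsed, are non-finite, or spike sharply. This sketch assumes a hypothetical log-line format (`epoch=1 iter=20 loss=0.4312`) and an arbitrary spike threshold; the real log format of the system is not given in the source.

```python
import math
import re
from typing import List, Tuple

# Hypothetical log-line format, e.g. "epoch=1 iter=20 loss=0.4312".
LINE_RE = re.compile(r"epoch=(\d+)\s+iter=(\d+)\s+loss=(\S+)")


def find_anomalies(lines: List[str], spike_factor: float = 3.0) -> List[Tuple[int, int, str]]:
    """Flag iterations whose loss is unparsable, non-finite, or above spike_factor x the last good loss."""
    anomalies: List[Tuple[int, int, str]] = []
    prev = None  # last finite loss seen
    for line in lines:
        match = LINE_RE.search(line)
        if match is None:
            continue  # ignore lines that are not training records
        epoch, it = int(match.group(1)), int(match.group(2))
        try:
            loss = float(match.group(3))
        except ValueError:
            anomalies.append((epoch, it, "unparsable loss"))
            continue
        if not math.isfinite(loss):
            anomalies.append((epoch, it, "non-finite loss"))
            continue
        if prev is not None and loss > spike_factor * prev:
            anomalies.append((epoch, it, "loss spike"))
        prev = loss
    return anomalies


log = ["epoch=1 iter=10 loss=0.50", "epoch=1 iter=20 loss=0.45",
       "epoch=1 iter=30 loss=2.10", "epoch=1 iter=40 loss=nan", "checkpoint saved"]
print(find_anomalies(log))  # [(1, 30, 'loss spike'), (1, 40, 'non-finite loss')]
```

Comparing against the last finite loss, instead of a fixed threshold, keeps the check meaningful as the loss scale shrinks over training.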

As shown in fig. 7, fig. 7 is a schematic diagram of a training flow of the machine learning engine service system provided by an embodiment of the present invention, illustrating a YOLO-based semi-supervised detection method. First, the configuration file solution/xx_solution.py is read through a configuration file instruction, and then the model manager (model_manager) is called. The model manager builds a detector according to the network model structure configuration file models/builder.py in the configuration file. When the detector is built, the model is initialized through the __init__ function according to the model configuration file models/detectors/yolo_v3.py, and the detector (build_detector), the backbone network (build_backbone), the neck network (build_neck), and the head network (build_head) are built. In general, the backbone is the main network of the task model and is used to extract features; the neck is connected after the backbone and further processes the features extracted by the backbone so that the subsequent network can make better use of them.
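The configuration-driven construction of a detector from backbone, neck, and head components can be sketched with small registries. This is a sketch in the spirit of the builder files named above, not their contents: the registries, `build`, and the component classes `DarkNet`, `YOLOV3Neck`, and `YOLOV3Head` (with their single fields) are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Callable, Dict

Registry = Dict[str, Callable[..., object]]
BACKBONES: Registry = {}
NECKS: Registry = {}
HEADS: Registry = {}


def register(registry: Registry):
    """Class decorator that records a component under its class name."""
    def decorator(cls):
        registry[cls.__name__] = cls
        return cls
    return decorator


@register(BACKBONES)
@dataclass
class DarkNet:
    depth: int


@register(NECKS)
@dataclass
class YOLOV3Neck:
    num_scales: int


@register(HEADS)
@dataclass
class YOLOV3Head:
    num_classes: int


def build(registry: Registry, cfg: dict) -> object:
    """Instantiate the component named by cfg['type'] with the remaining keys as arguments."""
    args = dict(cfg)  # copy so the stored configuration is not mutated
    kind = args.pop("type")
    return registry[kind](**args)


@dataclass
class Detector:
    backbone: object  # extracts features from the input
    neck: object      # further processes the backbone features
    head: object      # produces the final predictions


def build_detector(cfg: dict) -> Detector:
    return Detector(backbone=build(BACKBONES, cfg["backbone"]),
                    neck=build(NECKS, cfg["neck"]),
                    head=build(HEADS, cfg["head"]))


cfg = {"backbone": {"type": "DarkNet", "depth": 53},
       "neck": {"type": "YOLOV3Neck", "num_scales": 3},
       "head": {"type": "YOLOV3Head", "num_classes": 80}}
detector = build_detector(cfg)
print(detector)
```

Because each component is resolved by its `type` string, swapping a backbone becomes a configuration change rather than a code migration.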

The data manager (dataset_manager) may construct the data set based on the data set configuration information (dataset/builder.py) in the target configuration file, and the policy manager (strategy_manager) may construct the policies based on the target policy configuration information (Strategy/builder.py) in the target configuration file, i.e., invoke the policies stored in the policy information storage module. The model training interface (train_model), based on the training interface configuration information (apis/train.py), uses the _non_dist_train() function to load the target data set (building a DataLoader), calls the Runner (the model training engine), and registers hook interfaces, which ensure that the system can call the model training engine normally.

Then, the operation interface of the model training engine is configured based on the model training engine configuration information (runner/runner.py), and iterative training is performed through the runner.run() method, which carries out each round of training through the runner.train() method. In each training iteration, the batch processor runs the forward algorithm of the model and calculates the error. Specifically, the model forward pass can be implemented in models/detectors/base.py through the forward() method, which then calls the forward_train() method to run the training forward algorithm and calculate the error. As described above, a semi-supervised learning algorithm generally involves a teacher model and a student model, and consistency training is performed on the labels they output, so the error calculated here can be the error between the labels output by the teacher model and the pseudo-labels output by the student model. The model is then optimized based on this error, so that consistency training is carried out on the outputs of the teacher model and the student model.
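The control flow of run(), train(), the batch processor, forward_train(), and registered hooks can be illustrated with a self-contained toy. Everything here is hypothetical: `ToyRunner`, `Hook`, `LossLoggerHook`, and the one-parameter `ScalarStudent` are illustrations of the described control flow, not the API of any real training engine, and plain gradient descent stands in for whatever optimizer the system uses.

```python
from typing import Callable, List, Sequence, Tuple


class Hook:
    """Base hook: the runner calls these methods at fixed points of training."""
    def before_epoch(self, runner: "ToyRunner") -> None:
        pass

    def after_iter(self, runner: "ToyRunner") -> None:
        pass


class LossLoggerHook(Hook):
    """Collects the loss of every iteration (e.g. to draw a loss curve)."""
    def __init__(self) -> None:
        self.losses: List[float] = []

    def after_iter(self, runner: "ToyRunner") -> None:
        self.losses.append(runner.last_loss)


class ToyRunner:
    """Minimal, hypothetical stand-in for a training engine's runner."""
    def __init__(self, batch_processor: Callable[[Sequence[float]], float],
                 data_loader: List[Sequence[float]], max_epochs: int) -> None:
        self.batch_processor = batch_processor
        self.data_loader = data_loader
        self.max_epochs = max_epochs
        self.hooks: List[Hook] = []
        self.epoch = 0
        self.last_loss = float("nan")

    def register_hook(self, hook: Hook) -> None:
        self.hooks.append(hook)

    def train(self) -> None:
        """One epoch: run the batch processor on every batch, calling hooks around it."""
        for hook in self.hooks:
            hook.before_epoch(self)
        for batch in self.data_loader:
            self.last_loss = self.batch_processor(batch)
            for hook in self.hooks:
                hook.after_iter(self)
        self.epoch += 1

    def run(self) -> None:
        """Iterate epochs until max_epochs is reached."""
        while self.epoch < self.max_epochs:
            self.train()


class ScalarStudent:
    """Toy student model y = w * x, trained to agree with a fixed teacher."""
    def __init__(self, w: float = 0.0, lr: float = 0.1) -> None:
        self.w, self.lr = w, lr

    def forward_train(self, xs: Sequence[float], targets: Sequence[float]) -> Tuple[float, float]:
        """Return the mean squared consistency error and its gradient with respect to w."""
        n = len(xs)
        loss = sum((self.w * x - t) ** 2 for x, t in zip(xs, targets)) / n
        grad = sum(2 * (self.w * x - t) * x for x, t in zip(xs, targets)) / n
        return loss, grad


def make_batch_processor(student: ScalarStudent, teacher_w: float) -> Callable[[Sequence[float]], float]:
    def batch_processor(xs: Sequence[float]) -> float:
        targets = [teacher_w * x for x in xs]  # teacher outputs for this batch
        loss, grad = student.forward_train(xs, targets)
        student.w -= student.lr * grad         # plain gradient-descent step
        return loss
    return batch_processor


student = ScalarStudent()
runner = ToyRunner(make_batch_processor(student, teacher_w=2.0),
                   [[1.0, 2.0], [0.5, 1.5]], max_epochs=20)
logger = LossLoggerHook()
runner.register_hook(logger)
runner.run()
print(round(student.w, 4))
```

Hooks let logging or checkpointing attach to the loop without editing it, which is the role the registered hook interfaces play in the flow described above.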

As can be seen from the above embodiments, with the machine learning engine service system provided by the embodiments of the present invention, each policy can be invoked directly without code migration, which lowers the threshold of machine learning and improves its robustness.

Based on the same technical conception as the machine learning engine service system, the embodiment of the invention also provides a training method of a task model, which is applied to the machine learning engine service system, as shown in fig. 8, and fig. 8 is a flow chart of the model training method provided by the embodiment of the invention; the method specifically comprises the following steps:

step 801, a model management module obtains a target task determined by a user;

step 802, obtaining target configuration information of a corresponding target task model from the configuration information storage module;

step 803, constructing and loading the target task model based on the target configuration information and the model data of the network model stored in the model information storage module; the model data of the network model includes program code for the network model;

step 804, calling the data management module, and loading a target data set based on configuration information of the target data set in the target configuration information;

Step 805, calling the model training engine according to the preset training process of the target task, and training the target task model based on the target data set, where the training process includes training the target network model in the target configuration information.
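Steps 801 to 805 can be condensed into a single function to show how the stores and the training engine are chained. This is a hedged sketch: the dictionary-based stores, the `networks`/`hyperparams`/`dataset` keys, the task name `det_semi`, the path `data/train.json`, and the `load_dataset` and `train` callables are hypothetical stand-ins for the modules named above.

```python
from typing import Any, Callable, Dict


def train_task_model(task_name: str,
                     config_store: Dict[str, dict],
                     model_store: Dict[str, str],
                     load_dataset: Callable[[dict], Any],
                     train: Callable[[dict, Any], Any]) -> Any:
    """Steps 801-805 in one function; every store and callable is a hypothetical stand-in."""
    cfg = config_store[task_name]                                      # step 802: target configuration
    networks = {name: model_store[name] for name in cfg["networks"]}  # step 803: fetch model code
    task_model = {"task": task_name, "networks": networks,
                  "hyperparams": cfg.get("hyperparams", {})}
    dataset = load_dataset(cfg["dataset"])                             # step 804: load target data set
    return train(task_model, dataset)                                  # step 805: call training engine


config_store = {"det_semi": {"networks": ["yolo_v3"], "hyperparams": {"lr": 0.01},
                             "dataset": {"path": "data/train.json"}}}
model_store = {"yolo_v3": "<program code of yolo_v3>"}
# Step 801: the user's chosen task name is passed in.
result = train_task_model("det_semi", config_store, model_store,
                          load_dataset=lambda d: [d["path"]],
                          train=lambda m, ds: (sorted(m["networks"]), len(ds)))
print(result)
```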

As a specific implementation of the embodiment of the present invention, as described above, the system may further include: the policy information storage module and the policy management module;

the configuration information stored in the configuration information storage module further includes, for at least one task model, policy configuration information; the policy configuration information includes the policy identifier of each of the at least one policy that the task model needs to call and the policy parameter configuration information of each policy, where one policy identifier corresponds to one sub-algorithm in the advanced machine learning algorithm;

Accordingly, based on fig. 8, as shown in fig. 9, the step 805 may specifically include the following steps:

step 905, calling the model training engine and the policy management module according to a preset target training process of the target task under the condition that the configuration information of the target task model contains policy information, and training the target task model based on the target data set; the target training process comprises the following steps: training the target network model in the target configuration information, and loading and executing the target strategy in the target configuration information.

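As a specific implementation of the embodiment of the present invention, as described above, the machine learning engine service system may further include a plurality of training interfaces, where each training interface corresponds to one task model and includes the training process of the corresponding task model. In this case, step 905 may specifically include: determining the target training interface corresponding to the target task, and calling the target training interface, which calls the model training engine and the policy management module according to the preset target training process of the target task to train the target task model based on the target data set.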

As a specific implementation of the embodiment of the present invention, as shown in fig. 10, the step 905 may specifically include the following steps:

step 1001, determining a current training process step to be executed based on a preset training process of the target task;

in the embodiment of the invention, the current training process step may be either executing the target policy or training the target network model.

Step 1002, when the current training process step is executing the target policy, calling the policy management module to obtain the program code of the target policy from the policy information storage module, and loading and executing the target policy in the current training process step;

specifically, the policy management module may acquire the program code of the target policy from the policy information storage module according to the policy identifier and the policy configuration information in the target configuration information, and load the program code so that it is used for calculation, pseudo-labeling, and other operations in the model training process.

Step 1003, receiving a policy execution result fed back by the policy management module;

as a specific implementation manner of the embodiment of the present invention, the policy execution result fed back by the policy management module may include pseudo tag information of a sample, and the like.

Step 1004, when the current training process step is training the target network model in the target configuration information, calling the model training engine to train the target network model based on the target data set;

step 1005, receiving a training result fed back by the model training engine;

as a specific implementation of the embodiment of the present invention, model save points (checkpoints), i.e., snapshots of the model, may be set in the above configuration file; for example, the checkpoint write interval may be set to 20 minutes with the most recent 10 checkpoints retained, which is not specifically limited herein. The model training engine may return the model saved at a checkpoint to the model management module for iterative training.
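The example retention rule (write at most every 20 minutes, keep the newest 10) can be sketched as a small bookkeeping class. This sketch tracks only checkpoint names and caller-supplied timestamps; `CheckpointKeeper` and `maybe_save` are hypothetical names, and actually writing or deleting snapshot files is left out.

```python
from collections import deque
from typing import Deque, Optional


class CheckpointKeeper:
    """Save a snapshot at most every `interval_s` seconds and keep only the newest `max_keep`."""
    def __init__(self, interval_s: float = 20 * 60, max_keep: int = 10) -> None:
        self.interval_s = interval_s
        self.max_keep = max_keep
        self.saved: Deque[str] = deque()
        self.last_save: Optional[float] = None

    def maybe_save(self, now: float, snapshot_name: str) -> Optional[str]:
        """Record a checkpoint if the interval has elapsed; return the name of any evicted one."""
        if self.last_save is not None and now - self.last_save < self.interval_s:
            return None  # too soon since the last checkpoint
        self.last_save = now
        self.saved.append(snapshot_name)
        if len(self.saved) > self.max_keep:
            return self.saved.popleft()  # oldest checkpoint falls out of the window
        return None


keeper = CheckpointKeeper()
evicted = []
for minute in range(0, 300, 5):  # simulate 5 hours of training, polled every 5 minutes
    old = keeper.maybe_save(minute * 60, f"ckpt_{minute}")
    if old is not None:
        evicted.append(old)
print(evicted)
```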

Step 1006, judging whether the current training process step is the last step of the training process; if not, returning to the step of determining the current training process step to be executed based on the preset training process of the target task; if yes, proceeding to step 1007 to end the training.
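The loop of steps 1001 to 1007 can be sketched as an interpreter over an ordered list of steps, each either a policy execution or a network-training step. The step schema (`name`, `kind`, `policy_id`, `params`, `network`), the toy policy, and the `train_network` callable are hypothetical; the sketch shows only the dispatch and feedback pattern.

```python
from typing import Any, Callable, Dict, List


def run_training_flow(flow: List[dict],
                      policies: Dict[str, Callable[..., Any]],
                      train_network: Callable[[str, dict], Any],
                      dataset: Any) -> Dict[str, Any]:
    """Execute a preset training flow step by step (hypothetical step schema)."""
    state: Dict[str, Any] = {"dataset": dataset}
    for step in flow:                                          # step 1001: next step of the preset flow
        if step["kind"] == "policy":
            policy = policies[step["policy_id"]]               # step 1002: fetch policy by identifier
            outcome = policy(state, **step.get("params", {}))  # step 1003: policy result fed back
        elif step["kind"] == "train":
            outcome = train_network(step["network"], state)    # steps 1004-1005: engine trains, reports back
        else:
            raise ValueError(f"unknown step kind: {step['kind']!r}")
        state[step["name"]] = outcome                          # later steps can use earlier results
    return state                                               # steps 1006-1007: ends after the last step


def pseudo_label_policy(state: Dict[str, Any], threshold: float) -> List[float]:
    # Toy policy: keep samples whose (hypothetical) confidence passes the threshold.
    return [s for s in state["dataset"] if s >= threshold]


def train_network(network: str, state: Dict[str, Any]) -> str:
    return f"{network} trained on {len(state['pseudo_labels'])} samples"


flow = [{"name": "pseudo_labels", "kind": "policy", "policy_id": "pseudo_label",
         "params": {"threshold": 0.5}},
        {"name": "fit", "kind": "train", "network": "student"}]
state = run_training_flow(flow, {"pseudo_label": pseudo_label_policy}, train_network, [0.2, 0.7, 0.9])
print(state["fit"])  # student trained on 2 samples
```

Storing each step's result in a shared state dictionary is one simple way to let a training step consume the pseudo-labels a preceding policy step fed back.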

After the training is finished, the trained model can be tested in the test set, and retraining or outputting results and the like are performed according to the test results. As a specific implementation manner of the embodiment of the present invention, various data to be output may be output according to a preset unified output format.

The training method of the task model has been described in detail in the foregoing system embodiment, so it is only briefly described here.

The training method of the task model provided by the embodiment of the invention is applied to the machine learning engine service system, the model management module acquires corresponding target configuration information from the configuration information storage module based on the target task determined by the user, constructs and loads the target task model according to the target configuration information and the model data of the network model stored in the model information storage module, invokes the model training engine and the strategy management module according to the target training flow of the target task, and trains the target task model based on the target data set loaded by the data management module. According to the training method for the task model, the configuration information of the task model comprises information of strategies to be called by the task model, and program codes of all the strategies are stored in the strategy information storage module, so that when a user uses the task model, the corresponding strategies can be directly called, code migration is not needed, the threshold of machine learning is reduced, and the robustness of machine learning is improved.

Based on the same technical concept as the machine learning engine service system, the embodiment of the invention also provides a task model configuration method, which is applied to the machine learning engine service system, as shown in fig. 11, and specifically includes the following steps:

step 1101, receiving model data of each network model imported by a research and development user, and storing the model data in the model information storage module; the model data of the network model includes program code for the network model;

step 1102, obtaining the configuration information with which the research and development user configures each task model, and storing it in the configuration information storage module; the configuration information of each task model includes the structural information of the network model used by the task model, the hyperparameters of the network model, and the configuration information of the data set used by the task model.

As an implementation manner of the embodiment of the present invention, as described above, the machine learning engine service system may further include a policy information storage module and a policy management module;

accordingly, based on fig. 11, as shown in fig. 12, the above method may further include the following steps:

step 1203, receiving the program code of each policy imported by the research and development user, and storing it in the policy information storage module;

Step 1204, obtaining the policy configuration information configured by the research and development user for at least one task model, and storing it in the configuration information storage module, where the policy configuration information includes the policy identifier of each of the at least one policy that the task model needs to call and the policy parameter configuration information of each policy; one policy identifier corresponds to one sub-algorithm in the advanced machine learning algorithm;
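Steps 1101, 1102, 1203, and 1204 can be sketched as operations on three stores, with a check that a task configuration only references models and policies that have already been imported. The dataclass fields, `EngineStores`, its method names, and the example values are hypothetical illustrations of the configuration content described above.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class PolicyConfig:
    policy_id: str                                        # identifies one sub-algorithm
    params: Dict[str, object] = field(default_factory=dict)


@dataclass
class TaskConfig:
    network_models: List[str]                             # network models used by the task model
    hyperparams: Dict[str, object]
    dataset: Dict[str, object]
    policies: List[PolicyConfig] = field(default_factory=list)


class EngineStores:
    """Hypothetical stand-ins for the model, policy and configuration storage modules."""
    def __init__(self) -> None:
        self.models: Dict[str, str] = {}
        self.policies: Dict[str, str] = {}
        self.configs: Dict[str, TaskConfig] = {}

    def import_model(self, name: str, code: str) -> None:        # step 1101
        self.models[name] = code

    def import_policy(self, policy_id: str, code: str) -> None:  # step 1203
        self.policies[policy_id] = code

    def save_config(self, task: str, cfg: TaskConfig) -> None:   # steps 1102 and 1204
        missing = [m for m in cfg.network_models if m not in self.models]
        missing += [p.policy_id for p in cfg.policies if p.policy_id not in self.policies]
        if missing:
            raise KeyError(f"import these before configuring {task!r}: {missing}")
        self.configs[task] = cfg


stores = EngineStores()
stores.import_model("yolo_v3", "<model code>")
stores.import_policy("augmentor", "<policy code>")
stores.save_config("det_semi", TaskConfig(["yolo_v3"], {"lr": 0.01}, {"path": "data/train.json"},
                                          [PolicyConfig("augmentor", {"p": 0.5})]))
```

Rejecting configurations that reference unknown identifiers mirrors the rule that new network models or policies must be imported before a task model can use them.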

The above process is described in detail in the previous system embodiment, and will not be repeated here.

The configuration method of the task model provided by the embodiment of the invention is applied to the machine learning engine service system. First, the model data of each network model and the program code of each policy imported by the research and development user are received and stored in the model information storage module and the policy information storage module, respectively; then the configuration information of each task model configured by the research and development user is obtained and stored in the configuration information storage module. The configuration information of each task model includes the structural information and hyperparameters of the network model used by the task model, the configuration information of the data set used by the task model, and the policy identifiers and policy configuration information of the policies to be invoked, where one policy identifier corresponds to one sub-algorithm in advanced machine learning. With this configuration method, because the configuration information of each task model includes the information of the policies that the task model needs to call and the program code of each policy is stored in the policy information storage module, a user of the task model can call the corresponding policies directly without code migration, which lowers the threshold of machine learning and improves its robustness.

The embodiment of the present invention further provides an electronic device, as shown in fig. 13, including a processor 1301, a communication interface 1302, a memory 1303 and a communication bus 1304, where the processor 1301, the communication interface 1302, and the memory 1303 complete communication with each other through the communication bus 1304,

a memory 1303 for storing a computer program;

processor 1301, when executing the program stored in memory 1303, implements the following steps:

The training method of the task model provided by the embodiment of the invention is applied to the machine learning engine service system, the model management module acquires corresponding target configuration information from the configuration information storage module based on the target task determined by the user, constructs and loads the target task model according to the target configuration information and the model data of the network model stored in the model information storage module, invokes the model training engine according to the target training process of the target task, and trains the target task model based on the target data set loaded by the data management module. According to the training method for the task model, the configuration information of the task model comprises the related information required to be called when the task model is trained, when a user uses the task model, the corresponding configuration file can be directly called to train the model to obtain a required result, and code migration is not required, so that the threshold of machine learning is reduced, and the robustness of the machine learning is improved.

The embodiment of the present invention also provides an electronic device, as shown in fig. 14, including a processor 1401, a communication interface 1402, a memory 1403, and a communication bus 1404, where the processor 1401, the communication interface 1402, and the memory 1403 perform communication with each other through the communication bus 1404,

A memory 1403 for storing a computer program;

the processor 1401 is configured to execute the program stored in the memory 1403, and implement the following steps:

The configuration method of the task model provided by the embodiment of the invention is applied to the machine learning engine service system. First, the model data of each network model imported by the research and development user is received and stored in the model information storage module; then the configuration information of each task model configured by the research and development user is obtained and stored in the configuration information storage module, where the configuration information of each task model includes the structural information and hyperparameters of the network model used by the task model and the configuration information of the data set used by the task model. With this configuration method, because the configuration information of a task model contains the information that needs to be called when the task model is trained, a user of the task model can directly call the corresponding configuration file to train the model and obtain the desired result without code migration, which lowers the threshold of machine learning and improves its robustness.

The communication bus of the above electronic devices may be a peripheral component interconnect (Peripheral Component Interconnect, PCI) bus, an extended industry standard architecture (Extended Industry Standard Architecture, EISA) bus, or the like. The communication bus may be divided into an address bus, a data bus, a control bus, and so on. For ease of illustration, only one thick line is shown in the figures, but this does not mean that there is only one bus or one type of bus.

The communication interface is used for communication between the electronic device and other devices.

The memory may include random access memory (Random Access Memory, RAM) or non-volatile memory (Non-Volatile Memory, NVM), such as at least one disk memory. Optionally, the memory may also be at least one storage device located remotely from the aforementioned processor.

The processor may be a general-purpose processor, including a central processing unit (Central Processing Unit, CPU), a network processor (Network Processor, NP), and the like; it may also be a digital signal processor (Digital Signal Processor, DSP), an application specific integrated circuit (Application Specific Integrated Circuit, ASIC), a field programmable gate array (Field-Programmable Gate Array, FPGA) or other programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component.

In yet another embodiment of the present invention, a computer readable storage medium is also provided, in which a computer program is stored, which when executed by a processor, implements the steps of the training or configuration method of any of the task models described above.

In yet another embodiment of the present invention, there is also provided a computer program product containing instructions that, when run on a computer, cause the computer to perform the training or configuration method of any of the task models of the above embodiments.

The above embodiments may be implemented in whole or in part by software, hardware, firmware, or any combination thereof. When implemented in software, they may be implemented in whole or in part in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, the processes or functions according to the embodiments of the present invention are produced in whole or in part. The computer may be a general purpose computer, a special purpose computer, a computer network, or other programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another, for example, by wired means (e.g., coaxial cable, optical fiber, digital subscriber line (DSL)) or wireless means (e.g., infrared, radio, microwave). The computer-readable storage medium may be any available medium that can be accessed by a computer, or a data storage device such as a server or data center that integrates one or more available media. The available medium may be a magnetic medium (e.g., floppy disk, hard disk, magnetic tape), an optical medium (e.g., DVD), or a semiconductor medium (e.g., solid state disk (SSD)), etc.

It is noted that relational terms such as first and second, and the like are used solely to distinguish one entity or action from another entity or action without necessarily requiring or implying any actual such relationship or order between such entities or actions. Moreover, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or apparatus that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or apparatus. Without further limitation, an element defined by the phrase "comprising one … …" does not exclude the presence of other like elements in a process, method, article, or apparatus that comprises the element.

The embodiments in this specification are described in a related manner: identical or similar parts of the embodiments may be understood by referring to one another, and each embodiment focuses on its differences from the others. In particular, the method and electronic device embodiments are substantially similar to the system embodiments and are therefore described relatively briefly; for relevant details, refer to the corresponding parts of the description of the system embodiments.

The foregoing describes only preferred embodiments of the present invention and is not intended to limit the scope of the present invention. Any modification, equivalent replacement, improvement, or the like made within the spirit and principle of the present invention shall fall within the protection scope of the present invention.

Claims

1. A machine learning engine service system, the system comprising:

a model management module, a data management module, a model training engine, a configuration information storage module, and a model information storage module; the system further comprises a policy information storage module and a policy management module;

the model training engine is used for training a network model;

the data management module is used for loading a target data set, based on the configuration information of the target data set contained in the target configuration information, when called by the model management module;

2. The system of claim 1, wherein

the system further comprises: each training interface corresponds to one task model, and each training interface comprises a training process of the corresponding task model;

3. The system of claim 1, wherein the system further comprises: a model configuration module;

4. The system of claim 1, wherein the system further comprises: a data output module;

5. The system of claim 1, wherein the system further comprises: one or more of a script presentation module, a tool module, and a log analysis module:

6. A method for training a task model, applied to the machine learning engine service system of claim 1, the method comprising:

7. The method of claim 6, wherein

the system further comprises: the policy information storage module and the policy management module;

8. The method of claim 7, wherein

9. The method according to claim 7 or 8, wherein,

in the case that the configuration information of the target task model contains policy information, the step of invoking the model training engine and the policy management module according to a preset training process of the target task and training the target task model based on the target data set comprises:

receiving a policy execution result fed back by the policy management module;

receiving training results fed back by the model training engine;

10. The method of claim 6, wherein the method further comprises:

11. A method for configuring a task model, applied to the machine learning engine service system of claim 1, the method comprising:

12. The configuration method according to claim 11, wherein the system further comprises: the policy information storage module and the policy management module;

the method further comprises the steps of:

13. An electronic device, comprising a processor, a communication interface, a memory, and a communication bus, wherein the processor, the communication interface, and the memory communicate with one another through the communication bus;

the memory is configured to store a computer program;

the processor is configured to implement the method steps of any one of claims 6-10 or 11-12 when executing the program stored in the memory.

14. A computer-readable storage medium, wherein the computer-readable storage medium stores a computer program that, when executed by a processor, implements the method steps of any one of claims 6-10 or 11-12.
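The cooperation among the modules recited in claims 1 and 9 can be illustrated with a minimal Python sketch, assuming each configuration record holds the network model information, the data set configuration, and optional policy information. All class, field, and method names (TaskConfig, train_task, load, execute, train, config_store, model_store) are hypothetical and do not come from the specification. The loader, policy step, and training step are placeholders that only report what they would do; a real system would call an actual data pipeline and training framework.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class TaskConfig:
    """Hypothetical configuration information for one task model."""
    network_model: str            # network model information used by the task model
    dataset: Dict[str, str]       # configuration information of the target data set
    policy: Optional[str] = None  # optional policy information


class DataManagementModule:
    def load(self, dataset_cfg: Dict[str, str]) -> List[int]:
        # Placeholder loader: returns dummy sample indices instead of reading real data.
        return list(range(int(dataset_cfg.get("size", "0"))))


class ModelTrainingEngine:
    def train(self, network_model: str, data: List[int]) -> Dict[str, object]:
        # Placeholder "training": reports which network saw how many samples.
        return {"model": network_model, "samples": len(data)}


class PolicyManagementModule:
    def execute(self, policy: str, data: List[int]) -> Dict[str, object]:
        # Placeholder policy step (the specification does not define its contents here).
        return {"policy": policy, "applied_to": len(data)}


@dataclass
class ModelManagementModule:
    config_store: Dict[str, TaskConfig]  # stands in for the configuration information storage module
    model_store: Dict[str, Dict[str, object]] = field(default_factory=dict)  # model information storage
    data: DataManagementModule = field(default_factory=DataManagementModule)
    engine: ModelTrainingEngine = field(default_factory=ModelTrainingEngine)
    policies: PolicyManagementModule = field(default_factory=PolicyManagementModule)

    def train_task(self, target_task: str) -> Dict[str, object]:
        cfg = self.config_store[target_task]   # target configuration information for the target task
        dataset = self.data.load(cfg.dataset)  # data management module loads the target data set
        result: Dict[str, object] = {}
        if cfg.policy is not None:
            # Configuration contains policy information: also invoke the policy management module
            result["policy_result"] = self.policies.execute(cfg.policy, dataset)
        result["training_result"] = self.engine.train(cfg.network_model, dataset)
        self.model_store[target_task] = result["training_result"]  # record the trained model's info
        return result


if __name__ == "__main__":
    store = {"detect": TaskConfig("net-a", {"size": "8"}, policy="p1")}
    print(ModelManagementModule(store).train_task("detect"))
```

In this sketch the user supplies only the task name; everything the training run needs is looked up from the stored configuration, which mirrors the idea that a task model can be trained by calling its configuration rather than migrating code.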