CN114358309A - Distributed machine learning model training method, device, equipment and storage medium - Google Patents

Distributed machine learning model training method, device, equipment and storage medium Download PDF

Info

Publication number
CN114358309A
CN114358309A
Authority
CN
China
Prior art keywords
training
model
data
machine learning
distributed
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202111513463.4A
Other languages
Chinese (zh)
Inventor
周贤谦
鲁迪超
陈牧
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Shenzhen Kingdom Technology Co ltd
Original Assignee
Shenzhen Kingdom Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Shenzhen Kingdom Technology Co ltd filed Critical Shenzhen Kingdom Technology Co ltd
Priority to CN202111513463.4A priority Critical patent/CN114358309A/en
Publication of CN114358309A publication Critical patent/CN114358309A/en
Pending legal-status Critical Current

Landscapes

  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The invention discloses a distributed machine learning model training method, device, equipment and storage medium, belonging to the technical field of machine learning. The method comprises: when a training instruction of a user is obtained, extracting training information from the training instruction; acquiring data to be trained from a server based on the training information; creating a corresponding training model according to the data to be trained; and performing distributed model training on the data to be trained based on the training model. Because the training information is extracted from the user's training instruction and the corresponding data to be trained is obtained from the server to create the corresponding training model, the data to be trained can be fetched directly from the server, which makes training-data management convenient, and the distributed deployment mode improves the efficiency of model training.

Description

Distributed machine learning model training method, device, equipment and storage medium
Technical Field
The invention relates to the technical field of machine learning, in particular to a distributed machine learning model training method, device, equipment and storage medium.
Background
Existing machine learning platforms provide only modeling and training functions and are inconvenient for managing training data and for managing and releasing models after training. In training-data management, operators must compare the data of each training run manually, and the case of multiple data sources is not well supported: training data can only be imported one batch at a time through file import or database import, so multiple data sources cannot be imported simultaneously. If imported data needs to be processed or edited, the training data can only be rearranged offline and imported again, which makes training-data management inconvenient.
The above is only for the purpose of assisting understanding of the technical aspects of the present invention, and does not represent an admission that the above is prior art.
Disclosure of Invention
The invention mainly aims to provide a distributed machine learning model training method, a device, equipment and a storage medium, and aims to solve the technical problem that training data management in the prior art is inconvenient.
In order to achieve the above object, the present invention provides a distributed machine learning model training method, including the following steps:
when a training instruction of a user is obtained, extracting training information in the training instruction;
acquiring data to be trained from a server based on the training information;
creating a corresponding training model according to the data to be trained;
and carrying out distributed model training on the data to be trained based on the training model.
Optionally, the extracting training information in the training instruction includes:
extracting a corresponding scene type according to the training instruction;
and acquiring corresponding training information according to the scene type.
Optionally, when the training instruction of the user is obtained, before the extracting of the training information in the training instruction, the method further includes:
when the operation of a user is detected, acquiring first original file data and second original file data imported by the user through a preset import interface, wherein the first original file data is located in a first preset server, the second original file data is located in a second preset server, and the first preset server is different from the second preset server;
reading first data content of the first original file data and second data content of the second original file data;
merging the first data content and the second data content in a preset splicing mode to obtain original file data;
and storing the original file data to a server as data to be trained.
Optionally, after the storing the original file data to a server as data to be trained, the method further includes:
when an updating instruction of a user is received, extracting updating information in the updating instruction;
updating the original file data based on the updating information;
and storing the updated original file data to the server.
Optionally, the performing distributed model training on the data to be trained based on the training model includes:
creating a model canvas based on the training model;
establishing a functional module of the training model based on the model canvas and the training information;
obtaining a training process according to the functional module;
importing the data to be trained based on the training process to obtain a model operation task;
and monitoring the model operation task through a preset distributed system, and issuing the model operation task to a corresponding server through a preset message queue for parallel distributed operation so as to realize distributed model training of the data to be trained.
Optionally, after the performing distributed model training on the data to be trained based on the training model, the method further includes:
storing the model canvas into a corresponding preset canvas file in a server;
when a modification instruction of a user is acquired, extracting identification information in the modification instruction, wherein the identification information comprises: one or more of an experiment identifier, a model identifier and a canvas identifier;
analyzing the preset canvas file according to the identification information;
and reloading the model canvas according to the analysis result so as to modify the model canvas.
Optionally, after the performing distributed model training on the data to be trained based on the training model, the method further includes:
setting version information of the trained model;
packing the version information into a model executable file;
storing the model executable file into a preset database and a server corresponding to a service system;
and generating a model issuing instruction, and informing the service system so that the service system determines a model version according to the issuing instruction to issue the model.
In addition, to achieve the above object, the present invention further provides a distributed machine learning model training apparatus, including:
the extraction module is used for extracting training information in a training instruction when the training instruction of a user is obtained;
the acquisition module is used for acquiring data to be trained from the server based on the training information;
the creating module is used for creating a corresponding training model according to the data to be trained;
and the training module is used for carrying out distributed model training on the data to be trained based on the training model.
In addition, to achieve the above object, the present invention further provides a distributed machine learning model training apparatus, including: a memory, a processor, and a distributed machine learning model training program stored on the memory and executable on the processor, the distributed machine learning model training program configured to implement the steps of the distributed machine learning model training method as described above.
Furthermore, to achieve the above object, the present invention further provides a storage medium having a distributed machine learning model training program stored thereon, which when executed by a processor implements the steps of the distributed machine learning model training method as described above.
When a training instruction of a user is obtained, training information in the training instruction is extracted; data to be trained is acquired from a server based on the training information; a corresponding training model is created according to the data to be trained; and distributed model training is performed on the data to be trained based on the training model. Because the training information is extracted from the user's training instruction and the corresponding data to be trained is obtained from the server to create the corresponding training model, the data to be trained can be fetched directly from the server, which makes training-data management convenient, and the distributed deployment mode improves the efficiency of model training.
Drawings
FIG. 1 is a schematic structural diagram of a distributed machine learning model training device of a hardware operating environment according to an embodiment of the present invention;
FIG. 2 is a flowchart illustrating a first embodiment of a distributed machine learning model training method according to the present invention;
FIG. 3 is a flowchart illustrating a distributed machine learning model training method according to a second embodiment of the present invention;
FIG. 4 is a table management diagram illustrating a distributed machine learning model training method according to a second embodiment of the present invention;
FIG. 5 is a flowchart illustrating a distributed machine learning model training method according to a third embodiment of the present invention;
FIG. 6 is a flowchart illustrating a user-dragged model training process according to a third embodiment of the distributed machine learning model training method of the present invention;
FIG. 7 is a distributed operation graph according to a third embodiment of the distributed machine learning model training method of the present invention;
FIG. 8 is a flowchart illustrating a fourth embodiment of a distributed machine learning model training method according to the present invention;
FIG. 9 is a flowchart illustrating a fifth embodiment of the distributed machine learning model training method of the present invention;
FIG. 10 is a block diagram illustrating a first embodiment of the distributed machine learning model training apparatus according to the present invention.
The implementation, functional features and advantages of the objects of the present invention will be further explained with reference to the accompanying drawings.
Detailed Description
It should be understood that the specific embodiments described herein are merely illustrative of the invention and are not intended to limit the invention.
Referring to fig. 1, fig. 1 is a schematic structural diagram of a distributed machine learning model training device of a hardware operating environment according to an embodiment of the present invention.
As shown in fig. 1, the distributed machine learning model training apparatus may include: a processor 1001, such as a Central Processing Unit (CPU), a communication bus 1002, a user interface 1003, a network interface 1004, and a memory 1005. The communication bus 1002 is used to enable connective communication between these components. The user interface 1003 may include a display screen (Display) and an input unit such as a keyboard (Keyboard); optionally, the user interface 1003 may also include a standard wired interface and a wireless interface. The network interface 1004 may optionally include a standard wired interface and a wireless interface (e.g., a Wireless-Fidelity (Wi-Fi) interface). The memory 1005 may be a Random Access Memory (RAM) or a Non-Volatile Memory (NVM), such as a disk memory. The memory 1005 may alternatively be a storage device separate from the processor 1001.
Those skilled in the art will appreciate that the architecture shown in FIG. 1 does not constitute a limitation of the distributed machine learning model training apparatus and may include more or fewer components than shown, or some components in combination, or a different arrangement of components.
As shown in fig. 1, a memory 1005, which is a storage medium, may include therein an operating system, a network communication module, a user interface module, and a distributed machine learning model training program.
In the distributed machine learning model training apparatus shown in fig. 1, the network interface 1004 is mainly used for data communication with a network server, and the user interface 1003 is mainly used for data interaction with a user. The distributed machine learning model training apparatus of the present invention invokes, through the processor 1001, the distributed machine learning model training program stored in the memory 1005 and executes the distributed machine learning model training method provided by the embodiments of the present invention.
An embodiment of the present invention provides a distributed machine learning model training method, and referring to fig. 2, fig. 2 is a schematic flow diagram of a first embodiment of the distributed machine learning model training method of the present invention.
In this embodiment, the distributed machine learning model training method includes the following steps:
step S10: when a training instruction of a user is obtained, extracting training information in the training instruction.
It should be noted that the execution subject of this embodiment is a service platform together with a machine learning platform capable of performing distributed machine learning model training; other devices that can implement the same or similar functions may also be used, which is not limited in this embodiment. The whole service platform is implemented on a Web framework. In the service platform, a newly added menu linking to the corresponding platform Web address opens the machine learning platform, and while using the machine learning platform the user can query the service platform's data and indexes synchronously. The database of the machine learning platform and the database of the service platform are independent of each other. The machine learning platform implements the machine learning functions and can serve either as an extensible module embedded in other service platforms or as a standalone platform, strengthening the platform's practical service application scenarios.
In specific implementation, a user enters the platform's web operation interface through the service system. When the user has a model training requirement, the platform generates a training instruction according to the user's operation, and the specific training information the user needs is obtained by parsing the training instruction. For example, if the user needs to train on data of an off-site investment simulation, an off-site investment simulation instruction can be generated, and the specific information to be trained is extracted from it.
Step S20: and acquiring data to be trained from a server based on the training information.
It should be understood that the server may store data imported by the user or data imported from other databases. After the training information is obtained, the data to be trained can be determined directly according to the training information and obtained from the server where it is stored.
Step S30: and creating a corresponding training model according to the data to be trained.
In specific implementation, a user can create a model through the model management interface provided by the platform, determine the model to be created from the data to be trained, create the corresponding training model, and store the corresponding model information in the platform's background database for convenient subsequent query and use.
Step S40: and carrying out distributed model training on the data to be trained based on the training model.
It should be noted that the model training functions in the Web interface each correspond to a function interface of the background. When the front end runs the model, it transmits the specified parameters for each function in the sequence specified by the connecting lines and calls the corresponding background interface; the background interface performs the functional operation according to the transmitted data and the corresponding file information stored in the server, ensuring the operation of model training.
In this embodiment, the user may perform distributed model training on the data to be trained by using a distributed algorithm, such as a consistent hash algorithm, a modulo algorithm, and the like, which is not limited in this embodiment. In practical application, the speed of model training can be increased by planning the Central Processing Unit (CPU) resources and the hard disk resources of each server.
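For illustration only, the following minimal Python sketch shows one of the distribution strategies mentioned above, a consistent hash ring that maps model operation tasks onto running servers; the server names and virtual-node count are illustrative assumptions, not part of the embodiment:

import bisect
import hashlib

class ConsistentHashRing:
    """Consistent hashing: task-to-server assignments stay stable
    even when servers are added or removed."""
    def __init__(self, servers, replicas=100):
        self._ring = []  # sorted list of (hash, server) points on the ring
        for server in servers:
            for i in range(replicas):  # virtual nodes smooth the load
                bisect.insort(self._ring, (self._hash(f"{server}#{i}"), server))

    @staticmethod
    def _hash(key):
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def server_for(self, task_id):
        """Return the running server responsible for the given task."""
        idx = bisect.bisect(self._ring, (self._hash(task_id), ""))
        if idx == len(self._ring):  # wrap around the ring
            idx = 0
        return self._ring[idx][1]

ring = ConsistentHashRing(["server-T1", "server-T2", "server-T3"])
print(ring.server_for("random_forest_task_42"))  # e.g. "server-T2"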
In this embodiment, when a training instruction of a user is obtained, the training information in the training instruction is extracted; data to be trained is acquired from a server based on the training information; a corresponding training model is created according to the data to be trained; and distributed model training is performed on the data to be trained based on the training model. Because the training information is extracted from the user's training instruction and the corresponding data to be trained is obtained from the server to create the corresponding training model, the data to be trained can be fetched directly from the server, which makes training-data management convenient, and the distributed deployment mode improves the efficiency of model training.
Referring to fig. 3, fig. 3 is a flowchart illustrating a distributed machine learning model training method according to a second embodiment of the present invention.
Based on the first embodiment, the step S10 of the distributed machine learning model training method in this embodiment specifically includes:
step S101: and extracting a corresponding scene type according to the training instruction.
It should be understood that the scene type is a specific training scene required by the user, for example, a risk scene, a transaction scene, and the like, and the corresponding scene type may be extracted according to a training instruction sent by the user.
Step S102: and acquiring corresponding training information according to the scene type.
In specific implementation, when a scene type that a user needs to train is acquired, specific training information can be acquired according to the scene type.
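As an illustrative sketch only, assuming the training instruction arrives as a JSON payload and that scene types map to training-information records (all names below are hypothetical), steps S101-S102 might look like:

import json

SCENE_TRAINING_INFO = {  # hypothetical scene-type lookup table
    "risk": {"table": "risk_features", "label": "is_risky"},
    "transaction": {"table": "txn_features", "label": "txn_class"},
}

def extract_training_info(instruction_json):
    """Step S101: extract the scene type; step S102: look up its training information."""
    scene_type = json.loads(instruction_json)["scene_type"]
    return SCENE_TRAINING_INFO[scene_type]

print(extract_training_info('{"scene_type": "risk"}'))
# {'table': 'risk_features', 'label': 'is_risky'}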
In this embodiment, before the step S10 of extracting the training information in the training instruction when the training instruction of the user is obtained, the method further includes:
step S11: when the operation of a user is detected, first original file data and second original file data imported by the user are obtained through a preset import interface, the first original file data are located in a first preset server, the second original file data are located in a second preset server, and the first preset server is different from the second preset server.
The preset import interface refers to an upload interface provided by the platform, and the file data imported by the user is transferred using FTP (File Transfer Protocol). The first original file data includes local file data and the second original file data includes database data; the first preset server is a local server, and the second preset server is a database of a business system or an external database, such as one or more of a MySQL database, an Oracle database, and a Microsoft SQL Server database, which is not limited in this embodiment.
When it is detected that a user has logged in to the web interface, a data import menu is provided in the web interface. The user browses to the local location of a data file, and the local first original file data is transmitted to the server of the web platform using the FTP protocol through an upload interface provided by the platform, such as an HTTP interface; the platform can subsequently read the file directly from the server when using the data. For the second original file data, the configured data source can be accessed directly from the front-end web code: an SQL statement editor is embedded in the web interface, and the required data can be queried directly from the database according to the edited query statement. The data source may be the database of the service system or an external database, and the data in the database is obtained through a direct database connection configured with Open Database Connectivity (ODBC).
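A minimal sketch of this database import path, using pyodbc against a configured ODBC data source; the DSN, credentials, and query below are placeholder assumptions:

import pyodbc

def fetch_second_original_data(query):
    """Run the SQL edited in the web interface against the configured data source."""
    conn = pyodbc.connect("DSN=business_db;UID=ml_user;PWD=***")  # hypothetical DSN
    try:
        cursor = conn.cursor()
        cursor.execute(query)
        columns = [col[0] for col in cursor.description]
        return [dict(zip(columns, row)) for row in cursor.fetchall()]
    finally:
        conn.close()

rows = fetch_second_original_data("SELECT * FROM trade_records WHERE dt >= '2021-01-01'")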
Step S12: and reading first data content of the first original file data and second data content of the second original file data.
In specific implementation, the server stores the first original file data and the second original file data. The data source of the first original file data differs from that of the second original file data, and the second original file data may itself come from different databases, so the first original file data and the second original file data in the server need to be merged to facilitate subsequent training.
It should be understood that the platform reads the data content of the first original file data and the data content of the second original file data, and the read contents are stored in the web cache.
Step S13: and merging the first data content and the second data content in a preset splicing mode to obtain original file data.
The preset splicing mode is to load the data content of the first original file data and the data content of the second original file data into a table for splicing. The original file data is the file data obtained by combining the first original file data and the second original file data.
In this embodiment, the first original file data and the second original file data are dynamically spliced in table form to produce a complete data source, and the merged original file data is stored as a new data file through a storage interface provided by the platform, such as a serial hard disk interface or a Small Computer System Interface (SCSI), and saved in the server as the data source for subsequent training.
In specific implementation, if the user uploads local file data multiple times, the file contents of all the uploaded local file data can be read into the web cache and dynamically spliced in table form into one complete new data source, which is stored in the server as a data source for subsequent training. Likewise, if data from the same database is uploaded multiple times, the contents of all the uploads can be read into the web cache and dynamically spliced in table form into a complete new data source stored in the server for subsequent training.
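A sketch of the splicing step, under the assumption that the cached contents of both data sources have been written to CSV files with a shared schema; the file paths are illustrative:

import pandas as pd

first = pd.read_csv("/srv/cache/first_original.csv")    # e.g. the uploaded local file
second = pd.read_csv("/srv/cache/second_original.csv")  # e.g. the database query result

# Row-wise splice when both sources share a schema; switch to pd.merge on a
# key column if the two sources describe the same entities instead.
merged = pd.concat([first, second], ignore_index=True)
merged.to_csv("/srv/datasets/merged_training_data.csv", index=False)  # data to be trained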
Step S14: and storing the original file data to a server as data to be trained.
It should be understood that the merged file data is used as a new data source and stored in the server as data to be trained subsequently.
Further, after the merged original file data is saved to the server, it can be edited and modified through a data editing function provided by the platform. Accordingly, after step S14, the method further includes: when an update instruction of a user is received, extracting update information in the update instruction; updating the original file data based on the update information; and storing the updated original file data to the server.
It should be noted that the update instruction is an instruction for modifying and updating the original file data, and the specific information to be updated can be extracted from it; after the update information is received, the original file data can be edited and updated accordingly. The edit types may include column editing, row editing, field value editing, and the like, which are not limited in this embodiment. In the web interface, the data in the data table can be edited; after the update is complete, a save button can be clicked to call the file export interface provided by the background, and the data cached at the front end is stored at the location designated by the server, generating a new Comma-Separated Values (CSV) format file that preserves the update trace and facilitates subsequent model training.
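A sketch of this update path, applying a hypothetical column edit and field-value edit before exporting a new CSV file so the update trace is preserved; the column names and paths are assumptions:

import pandas as pd

data = pd.read_csv("/srv/datasets/merged_training_data.csv")

data = data.rename(columns={"amt": "amount"})  # column editing
data.loc[data["amount"] < 0, "amount"] = 0     # field value editing

# Export a new CSV instead of overwriting, preserving the update trace.
data.to_csv("/srv/datasets/merged_training_data_v2.csv", index=False)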
FIG. 4 is a schematic diagram of table management in the second embodiment of the distributed machine learning model training method of the present invention. A user can search the stored table library by entering keywords and filter by table type, update time, creation time, and the like. For example, entering "machine learning" returns 2 matching model tables of type machine learning; the model table library records the table name, creation mode, table description, creation time, update time, and the responsible person. After the original file data is updated, the stored table data can be queried by update time.
In this embodiment, when a user operation is detected, first original file data and second original file data imported by the user are acquired through a preset import interface, where the first original file data is located in a first preset server, the second original file data is located in a second preset server, and the two preset servers are different; the first data content of the first original file data and the second data content of the second original file data are read; the two data contents are merged in a preset splicing mode to obtain original file data; and the original file data is stored to a server as data to be trained. The corresponding scene type is extracted according to the training instruction, and the corresponding training information is acquired according to the scene type. By merging the data imported from different data sources into one complete new data source, the problem of importing multiple data sources simultaneously is solved.
Referring to fig. 5, fig. 5 is a flowchart illustrating a distributed machine learning model training method according to a third embodiment of the present invention.
Based on the first embodiment, a third embodiment of the distributed machine learning model training method of the present invention is provided. The step S40 of the distributed machine learning model training method of this embodiment specifically includes:
step S401: a model canvas is created based on the training model.
In specific implementation, after a user creates a model through a model management interface provided by a platform, a model canvas is created under the corresponding model, and corresponding model canvas information is stored to a specified position in a server in an XML format, so that subsequent use is facilitated.
Step S402: and establishing a functional module of the training model based on the model canvas and the training information.
It should be noted that the functional modules include functional modules such as importing training data, data deduplication, missing value filling, feature engineering, data splitting, random forest training algorithm, parameter adjustment, binary evaluation, model publishing, and the like, and may also include other functional modules, and may be created according to user requirements.
It should be appreciated that a user may build the functional modules of the training model by adding modeling training functions to the model canvas.
Step S403: and obtaining a training process according to the functional module.
In this embodiment, the training process is the specific sequence of steps for model training. As shown in FIG. 6, FIG. 6 is a user drag-and-drop model training process diagram in the third embodiment of the distributed machine learning model training method of the present invention; the common functions include components, experiments, a model table library, and engineering. The functional modules comprise a data source module, a data preprocessing module, a feature engineering module, a statistical analysis module, and a machine learning module: the data source module includes a data reading table and a data writing table; the data preprocessing module includes data merging, normalization, type conversion, data splitting, and standardization; the feature engineering module includes feature marking, feature conversion, feature importance evaluation, feature selection, and feature generation; the statistical analysis module includes a histogram, a data view, normal distribution, T test, covariance, and a scatter diagram; and the machine learning module includes classification, specifically GBDT binary classification. A training process is generated by selecting the corresponding modules; here the process comprises data reading table - data splitting - random forest - prediction - binary evaluation, and after data splitting is selected, the specific splitting mode, splitting ratio, output ratio, and random seed are selected and saved.
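As a condensed, non-authoritative sketch of the "data reading table - data splitting - random forest - prediction - binary evaluation" flow of FIG. 6 (scikit-learn is assumed here; the feature and label columns, 80/20 split ratio, and random seed are illustrative):

import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

table = pd.read_csv("/srv/datasets/merged_training_data.csv")  # data reading table
X, y = table.drop(columns=["label"]), table["label"]

# Data splitting with a fixed random seed, as configured in the canvas.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

model = RandomForestClassifier(n_estimators=100, random_state=42)  # random forest module
model.fit(X_train, y_train)

scores = model.predict_proba(X_test)[:, 1]    # prediction module
print("AUC:", roc_auc_score(y_test, scores))  # binary (two-class) evaluation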
In specific implementation, the model training functions of the platform interface correspond to the function interfaces of the background. When the front end runs the model, it transmits the specified input parameters for each function in the sequence specified by the connecting lines and calls the background interfaces; the background interfaces perform the functional operations according to the transmitted data and the corresponding file information stored in the server, ensuring the operation of model training.
Step S404: and importing the data to be trained based on the training flow to obtain a model operation task.
It should be understood that the model operation task includes an operation task corresponding to each functional module in the model. After the training process is generated, the specific data to be trained can be imported and the corresponding functional operations executed to obtain the model operation task.
Step S405: and monitoring the model operation task through a preset distributed system, and issuing the model operation task to a corresponding server through a preset message queue for parallel distributed operation so as to realize distributed model training of the data to be trained.
In a specific implementation, the preset distributed system is Celery with its visual task monitoring, and the preset message queue is RabbitMQ. Through Celery's visual task monitoring, the execution status of background tasks can be easily observed, and the tasks generated by the machine learning platform can be issued to the running servers through the distributed channels of RabbitMQ for parallel operation.
As shown in fig. 7, fig. 7 is a distributed operation diagram in the third embodiment of the distributed machine learning model training method according to the present invention. The model operation tasks may include random forest, grid tuning, logistic regression, GBDT, and the like. The model operation tasks are divided among message queues MQ1, MQ2, MQ3, ..., MQα through the distributed channel of RabbitMQ for issuing, where α is the number of message queues; the operation tasks of each message queue are issued to server T1, server T2, server T3, server T4, ..., server Tβ, where β is the number of running servers. The operation tasks of one message queue can be distributed to 3 running servers for parallel operation.
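A minimal sketch of this dispatch mechanism: a Celery application backed by a RabbitMQ broker, with model operation tasks routed to named queues that worker servers consume in parallel; the broker URL, task body, and queue names are illustrative assumptions:

from celery import Celery

app = Celery("ml_platform", broker="amqp://guest:guest@rabbitmq-host:5672//")

@app.task(name="tasks.run_module")
def run_module(module_name, params):
    """Execute one functional module of a model operation task."""
    # ... call the background function interface for `module_name` here ...
    return f"{module_name} finished"

# Workers started on servers T1..Tn with `celery -A ml_platform worker -Q MQ1`
# consume their queue in parallel; tasks are issued to specific queues:
run_module.apply_async(args=["random_forest", {"n_estimators": 100}], queue="MQ1")
run_module.apply_async(args=["grid_tuning", {"folds": 5}], queue="MQ2")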
In this embodiment, a model canvas is created based on the training model; the functional modules of the training model are established based on the model canvas and the training information; a training process is obtained according to the functional modules; the data to be trained is imported based on the training process to obtain a model operation task; and the model operation task is monitored through a preset distributed system and issued to the corresponding servers through a preset message queue for parallel distributed operation, thereby realizing distributed model training of the data to be trained. Because the model operation tasks to be run are monitored by the preset distributed system and distributed to the corresponding running servers through the preset message queue for parallel operation, the distributed task distribution reduces the time required for model training and improves operating efficiency, and planning the CPU and hard disk resources of each server further increases the training speed.
Referring to fig. 8, fig. 8 is a flowchart illustrating a fourth embodiment of the distributed machine learning model training method according to the present invention.
Based on the first embodiment and the third embodiment, a fourth embodiment of the distributed machine learning model training method of the present invention is provided. After the step S40, the method for training a distributed machine learning model in this embodiment further includes:
step S41: and storing the model canvas into a corresponding preset canvas file in a server.
It should be understood that the preset canvas file is an XML file. When a corresponding model canvas is created according to the training model and the parameters are adjusted according to the information displayed in the canvas, the canvas information of the model canvas can be saved as an XML file at a designated location in the server, archiving the current modeling training; the model flow in the saved model canvas can later be re-edited and its data verified at any time.
Step S42: when a modification instruction of a user is acquired, extracting identification information in the modification instruction, wherein the identification information comprises: one or more of an experiment identification, a model identification, and a canvas identification.
In specific implementation, the modification instruction is an instruction for modifying information in the canvas. When a user issues a modification instruction, the specific identification information can be extracted from it; the identification information comprises an experiment identifier, a model identifier, and a canvas identifier. When the canvas information is stored, the model canvas can be indexed by the identification information; when a canvas needs to be found, the identification information in the canvas information is matched one by one to obtain the corresponding model canvas.
Step S43: and analyzing the preset canvas file according to the identification information.
It should be noted that, when the user needs to re-edit, the user can search in the server through the identification information, and the platform parses the XML file of the corresponding canvas, so as to restore the original modeling flowchart in the web interface.
Step S44: and reloading the model canvas according to the analysis result so as to modify the model canvas.
It should be understood that after parsing the XML file of the canvas, all training result data is also saved in the database and can be reloaded in the web interface through the corresponding identification information. Re-editing can be performed through the loaded model canvas.
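A sketch of the canvas persistence of steps S41-S44, serializing a model canvas to XML keyed by its identifiers and parsing it back for reloading; the XML schema below is an assumption, not the platform's actual format:

import xml.etree.ElementTree as ET

def save_canvas(path, experiment_id, model_id, canvas_id, modules):
    """Step S41: serialize the model canvas to a preset XML canvas file."""
    canvas = ET.Element("canvas", experiment=experiment_id, model=model_id, id=canvas_id)
    for name, params in modules:
        node = ET.SubElement(canvas, "module", name=name)
        for key, value in params.items():
            ET.SubElement(node, "param", key=key, value=str(value))
    ET.ElementTree(canvas).write(path, encoding="utf-8", xml_declaration=True)

def load_canvas(path, canvas_id):
    """Steps S43-S44: parse the canvas file and return the flow for reloading."""
    root = ET.parse(path).getroot()
    if root.get("id") != canvas_id:  # one-by-one identifier matching
        raise LookupError("canvas identifier does not match")
    return [(m.get("name"), {p.get("key"): p.get("value") for p in m})
            for m in root.iter("module")]

save_canvas("/srv/canvases/exp1_m1_c1.xml", "exp1", "m1", "c1",
            [("data_split", {"ratio": 0.8}), ("random_forest", {"trees": 100})])
print(load_canvas("/srv/canvases/exp1_m1_c1.xml", "c1"))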
In this embodiment, the model canvas is stored in a corresponding preset canvas file in a server; when a modification instruction of a user is acquired, the identification information in the modification instruction is extracted, the identification information comprising one or more of an experiment identifier, a model identifier, and a canvas identifier; the preset canvas file is parsed according to the identification information; and the model canvas is reloaded according to the parsing result so that it can be modified. Because all the flows and data in the model canvas are stored at corresponding locations in the server, when a user needs to re-edit, the stored preset canvas file is parsed directly and the previous model canvas is restored after one-by-one matching of the identification information, which is convenient for the user and efficient to query.
Referring to fig. 9, fig. 9 is a flowchart illustrating a fifth embodiment of the distributed machine learning model training method according to the present invention.
Based on the first embodiment, the third embodiment and the fourth embodiment, a fifth embodiment of the distributed machine learning model training method of the present invention is provided. After the step S40, the method for training a distributed machine learning model in this embodiment further includes:
step S45: and setting version information of the trained model.
It should be understood that the version information may include specific information of the trained model, such as a unique version number generated from the training content, training time, and training scenario; when the user retrains the model, the updated model's version can be updated accordingly.
Step S46: and packaging the version information into a model executable file.
In a specific implementation, the model executable file refers to a program file that can be loaded and executed, such as an exe or com file; after the model is packaged into an executable file according to its version information, it can be released.
Step S47: and storing the model executable file into a preset database and a server corresponding to the service system.
It should be noted that the preset database refers to the background database of the machine learning platform. After the trained model is packaged into the model executable file, the file is stored in the machine learning platform server and in a server agreed upon with the service system.
Step S48: and generating a model issuing instruction, and informing the service system so that the service system determines a model version according to the issuing instruction to issue the model.
In this embodiment, when saving is complete, a model issuing instruction can be generated to release the model. Because the server agreed upon with the service system already stores the model executable file to be released, the service system can be notified directly to release the model. Each released version is stored in the database, and when a user logs in to the service system, the user can enter the web platform directly to manage, adjust, or retrain the models on the platform. Embedding the distributed machine learning management platform into the service system makes it convenient for the service system to manage the platform.
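A sketch of the release flow of steps S45-S48, generating a version number from the training scenario and time, packaging the model with its version manifest, storing it at the agreed server path, and notifying the business system; the notification endpoint and paths are assumptions:

import hashlib
import time
import joblib
import requests

def publish_model(model, scene_type, agreed_dir="/srv/shared_models"):
    stamp = time.strftime("%Y%m%d%H%M%S")
    digest = hashlib.md5(f"{scene_type}{stamp}".encode()).hexdigest()[:8]
    version = f"{scene_type}-{stamp}-{digest}"  # unique version number (step S45)

    artifact = f"{agreed_dir}/{version}.pkl"
    joblib.dump({"version": version, "model": model}, artifact)  # packaging (S46, S47)

    # Model issuing instruction: notify the business system of the new version (S48).
    requests.post("http://business-system/api/model-release",
                  json={"version": version, "artifact": artifact})
    return version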
In this embodiment, version information of the trained model is set; the version information is packaged into a model executable file; the model executable file is stored in a preset database and in the server corresponding to the service system; and a model release instruction is generated to notify the service system, so that the service system determines the model version according to the release instruction and releases the model. Because the released version information is stored as an executable file in the server agreed upon with the service system, the service system only needs to be notified when a model version is to be released, which facilitates linkage between the model and the service system and makes it convenient for the service system to call and monitor the model.
Referring to fig. 10, fig. 10 is a block diagram illustrating a first embodiment of a distributed machine learning model training apparatus according to the present invention.
As shown in fig. 10, the distributed machine learning model training apparatus according to the embodiment of the present invention includes:
the extracting module 10 is configured to extract training information in a training instruction when the training instruction of a user is obtained.
And an obtaining module 20, configured to obtain data to be trained from a server based on the training information.
And a creating module 30, configured to create a corresponding training model according to the data to be trained.
And the training module 40 is used for performing distributed model training on the data to be trained based on the training model.
In this embodiment, when a training instruction of a user is obtained, the training information in the training instruction is extracted; data to be trained is acquired from a server based on the training information; a corresponding training model is created according to the data to be trained; and distributed model training is performed on the data to be trained based on the training model. Because the training information is extracted from the user's training instruction and the corresponding data to be trained is obtained from the server to create the corresponding training model, the data to be trained can be fetched directly from the server, which makes training-data management convenient, and the distributed deployment mode improves the efficiency of model training.
In an embodiment, the extracting module 10 is further configured to extract a corresponding scene type according to the training instruction; and acquiring corresponding training information according to the scene type.
In an embodiment, the extracting module 10 is further configured to, when detecting that a user performs an operation, obtain, through a preset import interface, first original file data and second original file data imported by the user, where the first original file data is located in a first preset server, the second original file data is located in a second preset server, and the first preset server is different from the second preset server; reading first data content of the first original file data and second data content of the second original file data; merging the first data content and the second data content in a preset splicing mode to obtain original file data; and storing the original file data to a server as data to be trained.
In an embodiment, the extracting module 10 is further configured to, when an update instruction of a user is received, extract update information in the update instruction; updating the original file data based on the updating information; and storing the updated original file data to the server.
In an embodiment, the training module 40 is further configured to create a model canvas based on the training model; establishing a functional module of the training model based on the model canvas and the training information; obtaining a training process according to the functional module; importing the data to be trained based on the training process to obtain a model operation task; and monitoring the model operation task through a preset distributed system, and issuing the model operation task to a corresponding server through a preset message queue for parallel distributed operation so as to realize distributed model training of the data to be trained.
In an embodiment, the training module 40 is further configured to store the model canvas into a corresponding preset canvas file in a server; when a modification instruction of a user is acquired, extracting identification information in the modification instruction, wherein the identification information comprises: one or more of an experiment identifier, a model identifier and a canvas identifier; analyzing the preset canvas file according to the identification information; and reloading the model canvas according to the analysis result so as to modify the model canvas.
In an embodiment, the training module 40 is further configured to set version information of the trained model; packing the version information into a model executable file; storing the model executable file into a preset database and a server corresponding to a service system; and generating a model issuing instruction, and informing the service system so that the service system determines a model version according to the issuing instruction to issue the model.
In addition, to achieve the above object, the present invention further provides a distributed machine learning model training apparatus, including: a memory, a processor, and a distributed machine learning model training program stored on the memory and executable on the processor, the distributed machine learning model training program configured to implement the steps of the distributed machine learning model training method as described above.
Since the distributed machine learning model training device adopts all technical solutions of all the embodiments, at least all the beneficial effects brought by the technical solutions of the embodiments are achieved, and no further description is given here.
Furthermore, an embodiment of the present invention further provides a storage medium, where a distributed machine learning model training program is stored, and when executed by a processor, the distributed machine learning model training program implements the steps of the distributed machine learning model training method described above.
Since the storage medium adopts all technical solutions of all the embodiments, at least all the beneficial effects brought by the technical solutions of the embodiments are achieved, and no further description is given here.
It should be understood that the above is only an example, and the technical solution of the present invention is not limited in any way, and in a specific application, a person skilled in the art may set the technical solution as needed, and the present invention is not limited thereto.
It should be noted that the above-described work flows are only exemplary, and do not limit the scope of the present invention, and in practical applications, a person skilled in the art may select some or all of them to achieve the purpose of the solution of the embodiment according to actual needs, and the present invention is not limited herein.
In addition, the technical details that are not described in detail in this embodiment may refer to the training method for the distributed machine learning model provided in any embodiment of the present invention, and are not described herein again.
Further, it is to be noted that, in this document, the terms "comprises," "comprising," or any other variation thereof, are intended to cover a non-exclusive inclusion, such that a process, method, article, or system that comprises a list of elements does not include only those elements but may include other elements not expressly listed or inherent to such process, method, article, or system. Without further limitation, an element defined by the phrase "comprising an … …" does not exclude the presence of other like elements in a process, method, article, or system that comprises the element.
The above-mentioned serial numbers of the embodiments of the present invention are merely for description and do not represent the merits of the embodiments.
Through the above description of the embodiments, those skilled in the art will clearly understand that the method of the above embodiments can be implemented by software plus a necessary general hardware platform, and certainly can also be implemented by hardware, but in many cases, the former is a better implementation manner. Based on such understanding, the technical solution of the present invention or portions thereof that contribute to the prior art may be embodied in the form of a software product, where the computer software product is stored in a storage medium (e.g. Read Only Memory (ROM)/RAM, magnetic disk, optical disk), and includes several instructions for enabling a terminal device (e.g. a mobile phone, a computer, a server, or a network device) to execute the method according to the embodiments of the present invention.
The above description is only a preferred embodiment of the present invention, and not intended to limit the scope of the present invention, and all modifications of equivalent structures and equivalent processes, which are made by using the contents of the present specification and the accompanying drawings, or directly or indirectly applied to other related technical fields, are included in the scope of the present invention.

Claims (10)

1. A distributed machine learning model training method, characterized in that the distributed machine learning model training method comprises:
when a training instruction of a user is obtained, extracting training information in the training instruction;
acquiring data to be trained from a server based on the training information;
creating a corresponding training model according to the data to be trained;
and carrying out distributed model training on the data to be trained based on the training model.
2. The method of distributed machine learning model training of claim 1, wherein the extracting training information in the training instructions comprises:
extracting a corresponding scene type according to the training instruction;
and acquiring corresponding training information according to the scene type.
3. The method for training the distributed machine learning model according to claim 1, wherein before extracting the training information in the training instruction when the training instruction of the user is obtained, the method further comprises:
when the operation of a user is detected, acquiring first original file data and second original file data imported by the user through a preset import interface, wherein the first original file data is located in a first preset server, the second original file data is located in a second preset server, and the first preset server is different from the second preset server;
reading first data content of the first original file data and second data content of the second original file data;
merging the first data content and the second data content in a preset splicing mode to obtain original file data;
and storing the original file data to a server as data to be trained.
4. The distributed machine learning model training method of claim 3, wherein after storing the original file data to a server as data to be trained, the method further comprises:
when an updating instruction of a user is received, extracting updating information in the updating instruction;
updating the original file data based on the updating information;
and storing the updated original file data to the server.
5. The distributed machine learning model training method of any of claims 1 to 4, wherein the performing distributed model training on the data to be trained based on the training model comprises:
creating a model canvas based on the training model;
establishing a functional module of the training model based on the model canvas and the training information;
obtaining a training process according to the functional module;
importing the data to be trained based on the training process to obtain a model operation task;
and monitoring the model operation task through a preset distributed system, and issuing the model operation task to a corresponding server through a preset message queue for parallel distributed operation so as to realize distributed model training of the data to be trained.
6. The distributed machine learning model training method of any of claims 1 to 4, after the distributed model training of the data to be trained based on the training model, further comprising:
storing the model canvas into a corresponding preset canvas file in a server;
when a modification instruction of a user is acquired, extracting identification information in the modification instruction, wherein the identification information comprises: one or more of an experiment identifier, a model identifier and a canvas identifier;
analyzing the preset canvas file according to the identification information;
and reloading the model canvas according to the analysis result so as to modify the model canvas.
7. The distributed machine learning model training method of any of claims 1 to 4, after the distributed model training of the data to be trained based on the training model, further comprising:
setting version information of the trained model;
packing the version information into a model executable file;
storing the model executable file into a preset database and a server corresponding to a service system;
and generating a model issuing instruction, and informing the service system so that the service system determines a model version according to the issuing instruction to issue the model.
8. A distributed machine learning model training apparatus, comprising:
the extraction module is used for extracting training information in a training instruction when the training instruction of a user is obtained;
the acquisition module is used for acquiring data to be trained from the server based on the training information;
the creating module is used for creating a corresponding training model according to the data to be trained;
and the training module is used for carrying out distributed model training on the data to be trained based on the training model.
9. A distributed machine learning model training apparatus, comprising: a memory, a processor, and a distributed machine learning model training program stored on the memory and executable on the processor, the distributed machine learning model training program configured to implement the distributed machine learning model training method of any of claims 1-7.
10. A storage medium having stored thereon a distributed machine learning model training program which, when executed by a processor, implements a distributed machine learning model training method according to any one of claims 1 to 7.
CN202111513463.4A 2021-12-02 2021-12-02 Distributed machine learning model training method, device, equipment and storage medium Pending CN114358309A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202111513463.4A CN114358309A (en) 2021-12-02 2021-12-02 Distributed machine learning model training method, device, equipment and storage medium

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202111513463.4A CN114358309A (en) 2021-12-02 2021-12-02 Distributed machine learning model training method, device, equipment and storage medium

Publications (1)

Publication Number Publication Date
CN114358309A true CN114358309A (en) 2022-04-15

Family

ID=81098906

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202111513463.4A Pending CN114358309A (en) 2021-12-02 2021-12-02 Distributed machine learning model training method, device, equipment and storage medium

Country Status (1)

Country Link
CN (1) CN114358309A (en)

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118312326A (en) * 2024-06-06 2024-07-09 新华策(北京)科技有限公司 Efficient distributed large model training method and system

Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030144868A1 (en) * 2001-10-11 2003-07-31 Macintyre James W. System, method, and computer program product for processing and visualization of information
CN106408184A (en) * 2016-09-12 2017-02-15 中山大学 User credit evaluation model based on multi-source heterogeneous data
CN111445597A (en) * 2018-12-28 2020-07-24 哈曼国际工业有限公司 Data stitching and integration for machine learning
CN112765162A (en) * 2020-12-31 2021-05-07 医渡云(北京)技术有限公司 Method, device, medium and equipment for determining unique identity based on multi-source data
CN112766605A (en) * 2021-02-02 2021-05-07 郑州地铁集团有限公司 Multi-source passenger flow prediction system and method based on container cloud platform
CN113609201A (en) * 2021-08-10 2021-11-05 珍岛信息技术(上海)股份有限公司 Service data processing method and system

Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20030144868A1 (en) * 2001-10-11 2003-07-31 Macintyre James W. System, method, and computer program product for processing and visualization of information
CN106408184A (en) * 2016-09-12 2017-02-15 中山大学 User credit evaluation model based on multi-source heterogeneous data
CN111445597A (en) * 2018-12-28 2020-07-24 哈曼国际工业有限公司 Data stitching and integration for machine learning
CN112765162A (en) * 2020-12-31 2021-05-07 医渡云(北京)技术有限公司 Method, device, medium and equipment for determining unique identity based on multi-source data
CN112766605A (en) * 2021-02-02 2021-05-07 郑州地铁集团有限公司 Multi-source passenger flow prediction system and method based on container cloud platform
CN113609201A (en) * 2021-08-10 2021-11-05 珍岛信息技术(上海)股份有限公司 Service data processing method and system

Non-Patent Citations (1)

* Cited by examiner, † Cited by third party
Title
BIN Chenzhong, "Research on Personalized Tourism Location Services Integrating Multi-Source Information", China Master's Theses Full-text Database, Information Science and Technology, 15 March 2021 (2021-03-15), pages 138-53 *

Cited By (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN118312326A (en) * 2024-06-06 2024-07-09 新华策(北京)科技有限公司 Efficient distributed large model training method and system

Similar Documents

Publication Publication Date Title
CN112099768B (en) Business process processing method and device and computer readable storage medium
CN106844307B (en) System and method for converting Excel into Word based on mark
US10120658B2 (en) Method and system for realizing software development tasks
CN110941427B (en) Code generation method and code generator
US20150213066A1 (en) System and method for creating data models from complex raw log files
CN111061733B (en) Data processing method, device, electronic equipment and computer readable storage medium
CN111858608A (en) Data management method, device, server and storage medium
CN113434133B (en) Application building method, device, equipment and computer readable storage medium
CN110249312A (en) Data integration operation changing
CN113806429A (en) Canvas type log analysis method based on large data stream processing framework
CN114358309A (en) Distributed machine learning model training method, device, equipment and storage medium
CN117873486A (en) Front-end and back-end code automatic generation method, device, equipment and storage medium
CN113568614A (en) Task issuing method, electronic device and storage medium
CN110062112A (en) Data processing method, device, equipment and computer readable storage medium
US11657351B2 (en) Management system for responding to incidents based on previous workflows
US20200278981A1 (en) Information processing device and non-transitory computer readable medium
CN111783391A (en) Online artificial text marking system and method
JP3529301B2 (en) Automatic data file converter
CN117891531B (en) System parameter configuration method, system, medium and electronic equipment for SAAS software
US20220147568A1 (en) Mapping expression generator
WO2019176824A1 (en) Data creator, data classifier, data acquisition system, data creation method, data classification method, and recording medium
CN114416323A (en) Task creation method and device, computer equipment and computer readable storage medium
CN118070764A (en) Data processing method, apparatus, device, storage medium, and program product
JP2007094453A (en) Program development support system, program development support method and program
JP2016151987A (en) Material preparation support system, material preparation support method and program

Legal Events

Date Code Title Description
PB01 Publication
PB01 Publication
SE01 Entry into force of request for substantive examination
SE01 Entry into force of request for substantive examination