CN113537507B - Machine learning system, method and electronic equipment - Google Patents

Machine learning system, method and electronic equipment

Info

Publication number
CN113537507B
CN113537507B
Authority
CN
China
Prior art keywords
data
computing
machine learning
data transfer
computing system
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202010907359.2A
Other languages
Chinese (zh)
Other versions
CN113537507A (en
Inventor
李伟
陈守志
苏函晶
洪立涛
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Tencent Technology Shenzhen Co Ltd
Original Assignee
Tencent Technology Shenzhen Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Tencent Technology Shenzhen Co Ltd filed Critical Tencent Technology Shenzhen Co Ltd
Priority to CN202010907359.2A priority Critical patent/CN113537507B/en
Publication of CN113537507A publication Critical patent/CN113537507A/en
Application granted granted Critical
Publication of CN113537507B publication Critical patent/CN113537507B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Medical Informatics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Physics & Mathematics (AREA)
  • Computing Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The application provides a machine learning system, a machine learning method, an electronic device, and a computer-readable storage medium, relating to data computation and data transfer in the technical field of cloud computing. The method comprises the following steps: receiving a data transfer request between any two sub-computing systems; and when the data transfer request is detected to meet a security condition for cross-system data transfer, authorizing execution of the data transfer operation corresponding to the data transfer request; wherein the plurality of sub-computing systems are configured to perform computing tasks in corresponding processing stages based on the stored data. The application can strengthen the data security of the processing flow of the machine learning model.

Description

Machine learning system, method and electronic equipment
Technical Field
The present application relates to cloud computing technology and artificial intelligence technology, and in particular, to a machine learning system, a method, an electronic device, and a computer readable storage medium.
Background
Artificial Intelligence (AI) is the theory, method, technology, and application system that uses a digital computer or a machine controlled by a digital computer to simulate, extend, and expand human intelligence, perceive the environment, acquire knowledge, and use the knowledge to obtain optimal results. Machine Learning (ML) is an important branch of artificial intelligence that mainly studies how computers simulate or implement human learning behavior to acquire new knowledge or skills and reorganize existing knowledge structures so as to continuously improve their own performance.
The life cycle of a machine learning model (comprising modeling and application) is a multi-flow, highly flexible process that involves big-data processing in cloud technology, such as data computation and data transfer in different sub-flows; it is divided into a plurality of sub-flows, each carried by a corresponding subsystem.
In the modeling and application processes, data such as the parameters and features of the machine learning model are frequently adjusted to meet various requirements. If an erroneous adjustment causes a safety accident in a subsystem, the data of different sub-flows may be lost or destroyed, affecting data security.
Disclosure of Invention
Embodiments of the present application provide a machine learning system, method, electronic device, and computer-readable storage medium, which can enhance data security of a process flow of a machine learning model.
The technical scheme of the embodiment of the application is realized as follows:
An embodiment of the present application provides a machine learning system including:
A publishing system, and a plurality of sub-computing systems that are isolated from each other and correspond to different processing stages of the machine learning model; wherein,
the plurality of sub-computing systems are used for executing computing tasks in corresponding processing stages according to the stored data; and
the publishing system is used for receiving a data transfer request between any two sub-computing systems, and, when the data transfer request is detected to meet a security condition for cross-system data transfer, authorizing execution of the data transfer operation corresponding to the data transfer request.
The embodiment of the application provides a machine learning method which is applied to a plurality of sub-computing systems, wherein the plurality of sub-computing systems are isolated from each other and correspond to different processing stages of a machine learning model;
the machine learning method includes:
receiving a data transfer request between any two sub-computing systems;
Authorizing execution of a data transfer operation corresponding to the data transfer request when the data transfer request is detected to meet a security condition of cross-system data transfer;
wherein the plurality of sub-computing systems are configured to perform computing tasks in corresponding processing stages according to the stored data.
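As a non-authoritative illustration of the method above (the class and function names below are hypothetical assumptions and do not appear in the patent), the receive-check-authorize flow might be sketched as:

```python
# Hypothetical sketch of the publishing system's authorization flow.
# All names (PublishingSystem, TransferRequest, "task_result") are
# illustrative assumptions, not part of the patent.
from dataclasses import dataclass

@dataclass
class TransferRequest:
    source: str        # name of the source sub-computing system
    target: str        # name of the target sub-computing system
    data_type: str     # type of the data to be transferred

class PublishingSystem:
    # Example policy: only results of computing tasks may cross systems.
    ALLOWED_TYPES = {"task_result"}

    def meets_security_condition(self, req: TransferRequest) -> bool:
        # A minimal security condition: the data type must be permitted.
        return req.data_type in self.ALLOWED_TYPES

    def handle(self, req: TransferRequest) -> bool:
        # Authorize the data transfer operation only when the security
        # condition for cross-system data transfer is met.
        return self.meets_security_condition(req)

ps = PublishingSystem()
ok = ps.handle(TransferRequest("offline", "nearline", "task_result"))
denied = ps.handle(TransferRequest("offline", "nearline", "raw_data"))
```

In a real deployment the security condition would combine several checks (type, time window, direction), as the later schemes describe; the single-type check here only shows the shape of the flow.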
In the above scheme, the method further comprises:
the plurality of sub-computing systems are isolated from one another by at least one of the following:
using different computing resources to perform their respective computing tasks; and using different storage resources to store the data needed to perform the computing tasks as well as the results of the computing tasks.
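The isolation strategies above can be illustrated with a minimal sketch; the resource names and the `is_isolated` helper are hypothetical assumptions, not part of the patent:

```python
# Hypothetical resource map illustrating mutual isolation: each
# sub-computing system performs its tasks on its own computing
# resource and stores data in its own storage resource.
resources = {
    "offline":  {"compute": "cluster-a", "storage": "db-offline"},
    "nearline": {"compute": "cluster-b", "storage": "db-nearline"},
    "online":   {"compute": "cluster-c", "storage": "kv-online"},
}

def is_isolated(res):
    # Isolation holds when no computing or storage resource is shared
    # between any two sub-computing systems.
    computes = [v["compute"] for v in res.values()]
    storages = [v["storage"] for v in res.values()]
    return len(set(computes)) == len(computes) and len(set(storages)) == len(storages)

isolated = is_isolated(resources)
```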
In the above scheme, the plurality of sub-computing systems include an offline computing system, a near-line computing system, and an online computing system; the machine learning method further includes:
The offline computing system executes an offline training computing task of the machine learning model according to the stored historical data;
wherein the offline training computing task comprises: performing feature statistical conversion on the historical data, extracting historical samples from the historical data based on the historical features obtained by the feature statistical conversion, and training the machine learning model based on the historical samples;
the near-line computing system executes a near-line training computing task according to real-time data;
wherein the near-line training computing task comprises: performing feature statistical conversion on the real-time data, extracting real-time samples from the real-time data based on the real-time features obtained by the feature statistical conversion, and training the machine learning model based on the real-time samples; and
The online computing system responds to a real-time prediction request, executes an online prediction computing task of the machine learning model, and responds to the prediction request based on the obtained prediction result.
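To make the division of labor among the three sub-computing systems concrete, the following sketch uses a toy averaging "model"; all function names and the arithmetic are hypothetical assumptions, not the patent's actual training or prediction procedures:

```python
# Toy three-stage pipeline: offline training, near-line training,
# online prediction. The "feature conversion" (doubling) and the
# averaged "parameter" are illustrative stand-ins only.

def offline_training(history):
    # Feature statistical conversion of historical data, then "training"
    # to obtain a historical parameter.
    features = [x * 2 for x in history]
    params = sum(features) / len(features)
    return features, params

def nearline_training(realtime, history_params):
    # Reuse the historical parameter and refine it with real-time samples.
    features = [x * 2 for x in realtime]
    params = (history_params + sum(features) / len(features)) / 2
    return features, params

def online_predict(sample, params):
    # Respond to a real-time prediction request with the latest parameters.
    return sample * params

feats, p0 = offline_training([1, 2, 3])
_, p1 = nearline_training([4, 5], p0)
prediction = online_predict(10, p1)
```

The point of the sketch is the data flow, offline output feeding near-line training and near-line output feeding online prediction, which is exactly where the authorized cross-system transfers of the following schemes sit.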
In the above scheme, the near-line computing system executing the near-line training computing task according to the real-time data includes:
the near-line computing system acquires first data from the offline computing system through a first data transfer operation authorized by the publishing system, and executes the near-line training computing task by combining the first data with the real-time data;
wherein the first data includes at least one of:
historical parameters of the machine learning model calculated in the offline training computing task, and historical features calculated in the offline training computing task.
In the above solution, executing the near-line training computing task by combining the first data and the real-time data includes:
the near-line computing system takes the historical features as real-time features, and extracts real-time samples from the real-time data according to the real-time features;
and trains the machine learning model deployed with the historical parameters based on the real-time samples.
In the above solution, after the near-line computing system executes the near-line training computing task according to the real-time data, the method further includes:
the online computing system acquires second data from the near-line computing system through a second data transfer operation authorized by the publishing system, and executes the online prediction computing task by combining the second data with the stored data to be detected;
wherein the second data includes at least one of:
real-time parameters of the machine learning model calculated in the near-line training computing task, and real-time features calculated in the near-line training computing task.
In the above solution, executing the online prediction computing task by combining the second data and the stored data to be detected includes:
the online computing system extracts a sample to be detected from the data to be detected according to the real-time features;
and performs prediction processing on the sample to be detected through the machine learning model deployed with the real-time parameters, to obtain a prediction result.
In the above solution, after the near-line computing system executes the near-line training computing task according to the real-time data, the method further includes:
the near-line computing system combines the second data and the stored data to be detected to execute a near-line prediction computing task, obtaining a prediction result;
wherein the second data includes at least one of:
real-time parameters of the machine learning model calculated in the near-line training computing task, and real-time features calculated in the near-line training computing task; and
the online computing system responds to a real-time prediction request by acquiring the prediction result from the near-line computing system through a third data transfer operation authorized by the publishing system.
In the above solution, after the offline training computing task of the machine learning model is executed, the method further includes:
the offline computing system combines the first data and the stored data to be detected to execute an offline prediction computing task, obtaining a prediction result;
wherein the first data includes at least one of:
historical parameters of the machine learning model calculated in the offline training computing task, and historical features calculated in the offline training computing task; and
the online computing system responds to the prediction request by obtaining the prediction result from the offline computing system through a fourth data transfer operation authorized by the publishing system.
In the above scheme, the method further comprises:
when the type of the data to be transferred corresponding to the data transfer request conforms to a type permitted for transfer, determining that the data transfer request meets the security condition;
wherein the types permitted for transfer include results of computing tasks.
In the above scheme, the method further comprises:
and when the data transfer time of the data transfer request falls within a time interval in which data transfer is permitted, determining that the data transfer request meets the security condition.
In the above scheme, the method further comprises:
and when the data transfer direction of the data transfer request conforms to a set transfer direction, determining that the data transfer request meets the security condition.
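The three security conditions above (permitted data type, permitted time interval, and set transfer direction) can be combined into a single check, as in this sketch; the concrete policy values and names are hypothetical assumptions:

```python
# Illustrative combined security check for cross-system data transfer:
# the data type, the transfer time, and the transfer direction must all
# conform to the policy. All policy values below are assumptions.
from datetime import time

ALLOWED_TYPES = {"task_result"}                    # types permitted for transfer
ALLOWED_WINDOW = (time(1, 0), time(5, 0))          # e.g. a low-traffic window
ALLOWED_DIRECTIONS = {("offline", "nearline"),     # set transfer directions
                      ("nearline", "online")}

def meets_security_condition(data_type, transfer_time, source, target):
    type_ok = data_type in ALLOWED_TYPES
    start, end = ALLOWED_WINDOW
    time_ok = start <= transfer_time <= end
    direction_ok = (source, target) in ALLOWED_DIRECTIONS
    return type_ok and time_ok and direction_ok

allowed = meets_security_condition("task_result", time(2, 30), "offline", "nearline")
blocked = meets_security_condition("task_result", time(2, 30), "online", "offline")
```

Note the direction check alone rejects the second request even though its type and time conform; the schemes above present the three conditions as independently applicable.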
In the above scheme, the method further comprises:
the sub-computing system performs backup processing on the stored original data to obtain backup data, so as to
recover the original data according to the backup data when an error occurs in the original data.
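A minimal sketch of the backup-and-recovery behavior described above, with in-memory dictionaries standing in for real storage resources (all names and values are hypothetical):

```python
# Backup processing on the stored original data, followed by recovery
# from the backup after an erroneous adjustment corrupts the original.
original = {"model_params": [0.1, 0.2], "features": ["age", "clicks"]}
backup = {k: list(v) for k, v in original.items()}   # backup data (copied lists)

# An erroneous adjustment corrupts the original data...
original["model_params"] = None

# ...so the original data is recovered according to the backup data.
if original["model_params"] is None:
    original["model_params"] = list(backup["model_params"])

recovered = original["model_params"]
```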
In the above scheme, the method further comprises:
the sub-computing system performs audit processing on set data among the stored data to obtain an audit log comprising the data access operations performed on the set data, so as to
locate the data access operation causing an error according to the audit log when an error occurs in the set data.
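The audit processing described above might be sketched as follows; the log entry format and the operator names are hypothetical assumptions:

```python
# Illustrative audit log: every data access operation performed on the
# set (monitored) data is recorded, so that an error can later be
# traced back to the operation that caused it.
audit_log = []

def access(data, key, value, operator):
    # Record the operation before applying it.
    audit_log.append({"op": "write", "key": key, "value": value, "by": operator})
    data[key] = value

data = {"model_id": "m-001"}
access(data, "model_id", "m-002", "alice")
access(data, "model_id", None, "bob")       # the erroneous operation

# Locate the data access operation causing the error from the audit log.
culprit = next(e["by"] for e in reversed(audit_log) if e["value"] is None)
```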
An embodiment of the present application provides an electronic device, including:
A memory for storing executable instructions;
and the processor is used for realizing the machine learning method provided by the embodiment of the application when executing the executable instructions stored in the memory.
The embodiment of the application provides a computer readable storage medium which stores executable instructions for causing a processor to execute, thereby realizing the machine learning method provided by the embodiment of the application.
The embodiment of the application has the following beneficial effects:
Executing the computing tasks of different processing stages of the machine learning model on a plurality of mutually isolated sub-computing systems effectively ensures that the computing tasks of different processing stages do not affect each other; meanwhile, when data needs to be transferred between different sub-computing systems, whether to authorize execution of the data transfer operation is determined according to a security audit performed by the publishing system, which can enhance the data security of the processing flow of the machine learning model.
Drawings
FIG. 1 is a schematic diagram of an alternative architecture of a machine learning system provided by an embodiment of the present application;
FIG. 2 is a schematic diagram of an alternative architecture of an electronic device provided by an embodiment of the present application;
FIG. 3A is a schematic flow chart of an alternative machine learning method provided by an embodiment of the present application;
FIG. 3B is a schematic flow chart of an alternative machine learning method provided by an embodiment of the present application;
FIG. 3C is a schematic flow chart of an alternative machine learning method provided by an embodiment of the present application;
FIG. 3D is a schematic flow chart of an alternative machine learning method provided by an embodiment of the present application;
FIG. 4 is an alternative schematic diagram of a storage medium isolation policy provided by an embodiment of the present application;
FIG. 5 is an alternative schematic diagram of data transfer provided by an embodiment of the present application;
FIG. 6 is an alternative schematic diagram of a computing resource isolation policy provided by an embodiment of the present application.
Detailed Description
To make the objects, technical solutions, and advantages of the present application clearer, the present application is described in further detail below with reference to the accompanying drawings. The described embodiments should not be construed as limiting the present application, and all other embodiments obtained by those skilled in the art without inventive effort fall within the scope of the present application.
In the following description, reference is made to "some embodiments" which describe a subset of all possible embodiments, but it is to be understood that "some embodiments" can be the same subset or different subsets of all possible embodiments and can be combined with one another without conflict.
In the following description, the terms "first", "second", "third", and the like are merely used to distinguish similar objects and do not denote a specific ordering of the objects. It should be understood that, where permitted, "first", "second", and "third" may be interchanged in a specific order or sequence, so that the embodiments of the application described herein can be practiced in an order other than that illustrated or described herein. In the following description, the term "plurality" means at least two.
In the embodiments of the present application, when the examples are applied, the collection and processing of relevant data should strictly comply with the requirements of relevant national laws and regulations, the informed consent or separate consent of the personal information subject should be obtained, and subsequent data use and processing should be carried out within the scope authorized by laws, regulations, and the personal information subject.
Unless defined otherwise, all technical and scientific terms used herein have the same meaning as commonly understood by one of ordinary skill in the art to which this application belongs. The terminology used herein is for the purpose of describing embodiments of the application only and is not intended to be limiting of the application.
Before describing the embodiments of the present application in further detail, the terms involved in the embodiments of the present application are explained as follows.
1) The sub-computing system: for executing computing tasks in one or more processing stages of the machine learning model based on its own computing resources. The processing stages of the machine learning model include, but are not limited to, a training stage and a prediction stage.
2) The publishing system: used for auditing data transfer requests between different sub-computing systems and determining whether to authorize execution of the corresponding data transfer operation.
3) Offline environment: executes computing tasks based on previously generated historical data; real-time performance is poor, and no real-time service is provided. In the embodiments of the application, the offline environment is built on the offline computing system.
4) Near-line environment: executes computing tasks based on data generated in real time, but does not guarantee the provision of real-time services. In the embodiments of the application, the near-line environment is built on the near-line computing system.
5) Online environment: responds to real-time requests, executes the corresponding computing tasks, and guarantees the provision of real-time services. In the embodiments of the application, the online environment is built on the online computing system.
6) Cloud computing: a computing mode that distributes computing tasks over a resource pool formed by a large number of electronic devices, so that various application systems (sub-computing systems) can acquire computing resources, storage resources, and information services on demand. The network that provides the resources is referred to as the "cloud". From the user's perspective, resources in the cloud appear infinitely expandable and can be acquired at any time, used on demand, expanded at any time, and paid for according to use.
7) Database: a collection of data stored together in an organized manner, sharable by multiple users, with minimal redundancy, and independent of application programs; users can perform operations such as adding, querying, updating, and deleting on the data in a database.
The life cycle of a machine learning model is a multi-flow, particularly flexible process. To prevent the input or output data of the sub-flows from affecting each other, the inputs and outputs of computed data are usually distinguished by application parameters such as model ID, model time, sample time, application scenario ID, sample table, and feature table. Related personnel often need to fine-tune the model parameters, features, samples, and the like in different sub-flows and then compare the effects. If one fine-tuning is verified to be effective, pushing it to all application scenarios requires manually updating each scenario, and this repeated updating incurs high labor cost. In addition, manual updates may introduce modification errors (such as entering a wrong model ID), which easily cause serious security accidents; as a result, both data security and modeling efficiency are low.
The embodiment of the application provides a machine learning system, a machine learning method, electronic equipment and a computer readable storage medium, which can strengthen the data security in the processing process of a machine learning model and improve the modeling efficiency. The following describes exemplary applications of the electronic device provided by the embodiments of the present application, where the electronic device provided by the embodiments of the present application may be implemented as various types of terminal devices such as a notebook computer, a tablet computer, a desktop computer, a set-top box, a mobile device (e.g., a mobile phone, a portable music player, a personal digital assistant, a dedicated messaging device, a portable game device), and the like, and may also be implemented as a server.
The embodiments of the present application are applicable to machine learning processing flows in various application scenarios. Taking a content recommendation scenario as an example, the computing task of the training stage of the machine learning model (the training computing task) can be executed in the cloud according to the user's content trigger records in an application program; then, when a real-time prediction request initiated in the application program is received, the computing task of the prediction stage (the prediction computing task) is executed in the cloud or on the terminal device carrying the application program, and the prediction request is responded to according to the obtained prediction result. The type of the content is not limited here; it may be, for example, an advertisement, a payment coupon, an official account, a movie, or a television show.
Referring to FIG. 1, FIG. 1 is a schematic diagram of an alternative architecture of a machine learning system 100 according to an embodiment of the present application. For ease of understanding, a plurality of sub-computing systems comprising an offline computing system 300, a near-line computing system 400, and an online computing system 500 is taken as an example, together with a content recommendation application scenario. In FIG. 1, the publishing system 200 includes a server 200-1, the offline computing system 300 includes a server 300-1 and a database 300-2, the near-line computing system 400 includes a server 400-1 and a database 400-2, and the online computing system 500 includes a terminal device 500-1. The storage resources in a sub-computing system may be provided by at least one of a database, a distributed file system, and a distributed memory system; a database is used here only as an example. The computing resources of the online computing system 500 may also be provided by a server; a terminal device is used here only as an example.
In the processing flow of the machine learning model, the server 300-1 first performs the offline training computing task based on historical data stored in the database 300-2, such as the user's historical content trigger records. The historical data may be generated by the online computing system 500 and obtained by the offline computing system 300 through a data transfer request (not shown in FIG. 1), or may be manually stored in the offline computing system 300. The embodiment of the application does not limit the form of a content trigger record; it may comprise user data, data of the recommended content, and the trigger result.
The server 300-1 may transfer the historical features computed in the offline training computing task, as well as the historical parameters of the machine learning model, through the publishing system 200 to the database 400-2 of the near-line computing system 400. The server 400-1 in the near-line computing system 400 takes the historical features in the database 400-2 as real-time features and performs the near-line training computing task based on the real-time features, the historical parameters, and the real-time data. The real-time data, such as the user's current content trigger record generated in real time, may be generated by the online computing system 500 and obtained by the near-line computing system 400 through a data transfer request, or may be manually stored in the near-line computing system 400. In addition, a historical feature here refers to a type of feature, for example, which features are extracted from user data as object features and which features are extracted from content data as content features.
The server 400-1 may transfer real-time features in the near-line training computing task, as well as real-time parameters of the machine learning model, through the publication system 200 to the online computing system 500, such as locally to the terminal device 500-1. The terminal device 500-1 responds to the real-time prediction request, performs an online prediction calculation task of the machine learning model according to the real-time parameters and the real-time characteristics, and responds to the prediction request based on the obtained prediction result.
The terminal device 500-1 may display various results and final results in the processing of the machine learning model in the graphical interface 510-1. In fig. 1, taking a content recommendation scenario as an example, an intelligent content recommendation button and recommended content 1 and content 2 are shown, where the intelligent content recommendation button is used to be triggered to generate a real-time prediction request, and content 1 and content 2 are the above prediction results.
It should be noted that the above processing procedure of the machine learning model is merely an example and is not limited to the embodiment of the present application, for example, the server 300-1 may also perform an offline prediction calculation task to obtain a prediction result, and transfer the prediction result to the terminal device 500-1 through the publishing system 200; for another example, the server 400-1 may also perform a near-line prediction calculation task to obtain a prediction result, and transfer the prediction result to the terminal device 500-1 through the publishing system 200.
In some embodiments, the servers in FIG. 1 (such as the server 200-1, the server 300-1, and the server 400-1) may be independent physical servers, server clusters or distributed systems formed by multiple physical servers, or cloud servers providing basic cloud computing services such as cloud services, cloud databases, cloud computing, cloud functions, cloud storage, network services, cloud communication, middleware services, domain name services, security services, CDNs, and big data and artificial intelligence platforms. For example, the server 300-1 may provide cloud services for the offline training computing task and the offline prediction computing task; the server 400-1 may provide cloud services for the near-line training computing task and the near-line prediction computing task; and when the online computing system includes a server, that server may provide cloud services for the online prediction computing task. The terminal device 500-1 may be, but is not limited to, a smart phone, a tablet computer, a notebook computer, a desktop computer, a smart speaker, or a smart watch. The terminal device and the server may be directly or indirectly connected through wired or wireless communication, which is not limited in the embodiments of the present application.
Referring to FIG. 2, FIG. 2 is a schematic architecture diagram of an electronic device 800 according to an embodiment of the present application. The electronic device 800 may be an electronic device providing computing resources in the publishing system or in a sub-computing system; for ease of understanding, FIG. 2 illustrates the case where the electronic device 800 is a server. The electronic device 800 shown in FIG. 2 includes: at least one processor 810, a memory 840, and at least one network interface 820. The components in the electronic device 800 are coupled together by a bus system 830. It is understood that the bus system 830 is used to enable communication between these components. In addition to a data bus, the bus system 830 includes a power bus, a control bus, and a status signal bus; however, for clarity of illustration, the various buses are labeled in FIG. 2 as the bus system 830.
The processor 810 may be an integrated circuit chip having signal processing capabilities, such as a general-purpose processor (for example, a microprocessor or any conventional processor), a digital signal processor (DSP), another programmable logic device, a discrete gate or transistor logic device, or discrete hardware components.
Memory 840 may be removable, non-removable, or a combination thereof. Exemplary hardware devices include solid state memory, hard drives, optical drives, and the like. Memory 840 optionally includes one or more storage devices physically located remote from processor 810.
The memory 840 includes volatile memory or non-volatile memory, and may also include both volatile and non-volatile memory. The non-volatile memory may be a read-only memory (ROM), and the volatile memory may be a random access memory (RAM). The memory 840 described in the embodiments of the present application is intended to comprise any suitable type of memory.
In some embodiments, memory 840 is capable of storing data to support various operations, examples of which include programs, modules and data structures, or subsets or supersets thereof, as exemplified below.
An operating system 841 including system programs for handling various basic system services and performing hardware-related tasks, such as a framework layer, a core library layer, a driver layer, etc., for implementing various basic services and handling hardware-based tasks;
Network communication module 842, for reaching other computing devices via one or more (wired or wireless) network interfaces 820; exemplary network interfaces 820 include: Bluetooth, Wireless Fidelity (Wi-Fi), Universal Serial Bus (USB), and the like.
In some embodiments, the machine learning process may be implemented in software. Fig. 2 shows the software in the form of programs and plug-ins stored in the memory 840, including the following software modules: a sub-computing module 8431 and a publishing module 8432. These modules are logical, and thus may be arbitrarily combined or further split depending on the functions implemented. In addition, fig. 2 collectively illustrates the various software modules involved in the machine learning process; in practice, these software modules may be deployed in different electronic devices, for example, the sub-computing module 8431 deployed in an electronic device of a sub-computing system and the publishing module 8432 deployed in an electronic device of the publishing system. The functions of the respective modules are described hereinafter.
In other embodiments, the machine learning process may be implemented in hardware. For example, the machine learning method provided by the embodiments of the present application may be performed by a processor in the form of a hardware decoding processor, which may employ one or more application-specific integrated circuits (ASICs), DSPs, programmable logic devices (PLDs), complex programmable logic devices (CPLDs), field-programmable gate arrays (FPGAs), or other electronic components.
In some embodiments, for the case where the electronic device is a terminal device, other structures may be included in addition to those shown in fig. 2. For example, a user interface may also be included, which includes one or more output devices that enable the presentation of media content, such as one or more speakers and/or one or more visual displays. The user interface may also include one or more input devices with components that facilitate user input, such as a keyboard, a mouse, a microphone, a touch screen display, a camera, and other input buttons and controls.
The memory 840 may also include a presentation module for enabling the presentation of information (e.g., user interfaces for operating peripheral devices and displaying content and information) via the one or more output devices (e.g., display screens, speakers, etc.) associated with the user interface; the memory 840 may also include an input processing module for detecting one or more user inputs or interactions from the one or more input devices and translating the detected inputs or interactions accordingly.
The machine learning method provided by the embodiment of the application will be described in connection with exemplary applications and implementations of the electronic device provided by the embodiment of the application.
Referring to fig. 3A, fig. 3A is a schematic flow chart of an alternative machine learning method provided in an embodiment of the present application, which is illustrated by an offline computing system and a near-line computing system for ease of understanding, but this is not meant to limit the embodiment of the present application.
In step 101, a plurality of sub-computing systems perform computing tasks in corresponding processing stages according to stored data.
In an embodiment of the present application, a plurality of sub-computing systems isolated from each other are included, and the plurality of sub-computing systems correspond to different processing stages of a machine learning model. The number of machine learning models is not limited, and the processing stages of a machine learning model include, but are not limited to, a training stage and a prediction stage. Each sub-computing system performs a computing task in its corresponding processing stage based on the data it stores.
In some embodiments, the plurality of sub-computing systems are isolated from one another by at least one of the following: using different computing resources to perform their respective computing tasks; using different storage resources to store the data needed to perform the computing tasks as well as the results of the computing tasks.
Here, two ways of isolation are provided. In the first way, the plurality of sub-computing systems use different computing resources to perform their respective computing tasks, i.e., the computing resources used by different sub-computing systems are different. For example, if there are 10 available servers, 5 of them may be assigned to the offline computing system and the remaining 5 to the near-line computing system, thus isolating the offline computing system from the near-line computing system in terms of computing resources. In this way, the situation where a computing task cannot be executed normally because different sub-computing systems compete for computing resources can be prevented.
In the second way, the plurality of sub-computing systems use different storage resources to store the data required for executing the computing tasks and the results of the computing tasks; that is, different sub-computing systems correspond to different storage spaces, and a sub-computing system has data access rights only to its corresponding storage space, not to other storage spaces. For example, if the storage space corresponding to sub-computing system A is A1 and the storage space corresponding to sub-computing system B is B1, then storage space A1 only allows sub-computing system A to perform data access operations and prohibits sub-computing system B from doing so (provided that sub-computing system B has not obtained the permission to perform a data transfer operation). A data access operation comprises at least one of a read operation, a write operation, a modification operation, and a query operation on the data stored in the storage space. In addition, the storage space includes at least one of a database, a distributed file system, and a distributed memory system; other forms of storage space may also be applied, for example, a data warehouse when the amount of data to be stored is large, and this is not limited. In this way, different sub-computing systems are isolated in terms of storage resources, and the security of the data stored in the sub-computing systems is improved. At least one of the two ways may be applied for isolation according to the actual application scenario.
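The second isolation way described above can be sketched as follows; the class and system names are hypothetical, and a real deployment would enforce this at the storage layer (database, distributed file system, etc.) rather than in application code:

```python
class StorageSpace:
    """Storage space bound to exactly one sub-computing system (hypothetical)."""

    def __init__(self, owner: str):
        self.owner = owner
        self._data = {}

    def _check(self, requester: str):
        # Only the owning sub-computing system may access this space
        # (unless the publishing system has granted a data transfer
        # permission, which is out of scope for this sketch).
        if requester != self.owner:
            raise PermissionError(
                f"{requester} has no access to {self.owner}'s storage space")

    def write(self, requester: str, key: str, value):
        self._check(requester)
        self._data[key] = value

    def read(self, requester: str, key: str):
        self._check(requester)
        return self._data[key]


# Storage space A1 belongs to sub-computing system A.
a1 = StorageSpace(owner="A")
a1.write("A", "history_params", [0.1, 0.2])

denied = False
try:
    a1.read("B", "history_params")  # sub-computing system B is prohibited
except PermissionError:
    denied = True
```

Sub-computing system A can read and write its own space freely, while any access attempt by B is rejected before touching the data.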
In some embodiments, the plurality of sub-computing systems includes an offline computing system, a near-line computing system, and an online computing system; performing computing tasks in corresponding processing stages based on the stored data may be implemented as follows. The offline computing system executes an offline training computing task of the machine learning model according to the stored historical data; the offline training computing task comprises: feature statistical conversion based on the historical data, extracting historical samples from the historical data based on the historical features obtained by the feature statistical conversion, and training the machine learning model based on the historical samples. The near-line computing system executes a near-line training computing task according to real-time data; the near-line training computing task comprises: feature statistical conversion based on the real-time data, extracting real-time samples from the real-time data based on the real-time features obtained by the feature statistical conversion, and training the machine learning model based on the real-time samples. The online computing system, in response to a real-time prediction request, executes an online prediction computing task of the machine learning model and responds to the prediction request based on the obtained prediction result.
In an embodiment of the present application, the plurality of sub-computing systems may include an offline computing system, a near-line computing system, and an online computing system, for building an offline environment, a near-line environment, and an online environment, respectively. To facilitate understanding, the role of each sub-computing system is illustrated in terms of content recommendation scenarios.
For the offline computing system, it is used to perform the offline training computing task of the machine learning model based on the stored historical data, where the historical data is data collected from the online application environment (online environment) of the machine learning model. In the process of executing the offline training computing task, feature statistical conversion is first performed based on the historical data, i.e., feature engineering processing, to obtain historical features; the historical features represent the types of features extracted from the historical data, and the rules of the feature statistical conversion may be preset. For example, the historical data may be historical content trigger records, where a content trigger record includes user data, data of recommended content, and a trigger result. The historical features may include object features such as gender, age, city of residence, and hobbies and interests in the user data, and content features such as content type, display position, and display duration in the data of the recommended content; they may also include the trigger result, which may indicate whether the user triggered the recommended content, or the duration for which the user triggered it (such as advertisement browsing duration or game playing duration). The trigger form may be a click, a long press, or the like according to the actual application scenario, and is not limited. Then, historical samples are extracted from the historical data according to the historical features, for example, historical samples of the form "object features - content features - trigger result"; the parameters of the machine learning model are updated based on the historical samples, and for ease of distinction, the updated parameters are named historical parameters.
The update method is not limited here, and for example, parameter update may be performed by combining back propagation and gradient descent mechanisms. In addition to offline training computing tasks, offline computing systems may also perform offline predictive computing tasks, as described in detail below.
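The offline training flow above can be sketched as follows; the feature mapping and the single-layer logistic model standing in for the machine learning model are illustrative assumptions, with back propagation reduced to one gradient-descent step per sample:

```python
import math

def extract_sample(record):
    """Feature statistical conversion: map a historical content trigger
    record to an 'object features - content features - trigger result'
    sample (hypothetical feature scaling)."""
    features = [record["age"] / 100.0, record["display_position"] / 10.0]
    return features, record["triggered"]

def train_step(params, sample, lr=0.1):
    """One gradient-descent update of a logistic model on a history sample."""
    features, label = sample
    z = sum(w * x for w, x in zip(params, features))
    pred = 1.0 / (1.0 + math.exp(-z))
    # Gradient of the log loss w.r.t. each parameter, propagated back.
    return [w - lr * (pred - label) * x for w, x in zip(params, features)]

history_data = [
    {"age": 30, "display_position": 2, "triggered": 1},
    {"age": 55, "display_position": 8, "triggered": 0},
]
history_params = [0.0, 0.0]
for record in history_data:
    history_params = train_step(history_params, extract_sample(record))
```

After iterating over the historical content trigger records, `history_params` holds what the text names the historical parameters.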
For near-line computing systems, the near-line training computing tasks of the machine learning model may be performed based on real-time data, which is also data collected from the on-line application environment of the machine learning model. Similar to the offline training computing task, in the process of executing the near-line training computing task, feature statistical conversion based on real-time data is firstly executed, namely feature engineering processing is performed, so as to obtain real-time features, and the real-time features represent types of features extracted from the real-time data. Then, a real-time sample is extracted from the real-time data according to the real-time feature, for example, a real-time sample of 'object feature-content feature-trigger result' is obtained, parameters of the machine learning model are updated based on the real-time sample, and the updated parameters are named as real-time parameters for convenience of distinguishing. In addition to the near-line training computing tasks, the near-line computing system may also perform near-line prediction computing tasks, as described in detail below.
For the online computing system, when a real-time prediction request is received, the online prediction computing task of the machine learning model is executed, and the prediction request is responded to based on the obtained prediction result. For example, in the process of executing the online prediction computing task, feature statistical conversion may be performed on the stored data to be measured, so as to extract samples to be tested from the data to be measured. In the context of content recommendation, a sample to be tested differs from the historical samples and real-time samples above in that no trigger result is present in it. Here, for illustration, the data to be measured includes the user data of the user corresponding to the prediction request and the data of a plurality of contents to be recommended; a sample to be tested may include the object features extracted from the user data and the content features extracted from the data of a certain content to be recommended, i.e., each content to be recommended corresponds to one sample to be tested. Then, prediction processing is performed on the plurality of samples to be tested through the machine learning model to obtain a prediction result for each sample to be tested, where the prediction result is the predicted trigger result of the content to be recommended corresponding to that sample.
According to the predicted trigger results, the plurality of contents to be recommended can be screened. For example, when a predicted trigger result indicates whether a trigger occurs, the contents to be recommended whose predicted trigger results indicate a trigger are taken as the screened contents; when a predicted trigger result indicates a trigger duration, the contents to be recommended whose predicted trigger durations exceed a duration threshold are taken as the screened contents. Finally, the online computing system recommends the screened contents to the user corresponding to the prediction request, which can improve the content recommendation effect and enhance the user experience. In some cases, the screened contents may also be used as the prediction result.
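A minimal sketch of the screening step, with hypothetical content identifiers; the predicted trigger result is either a boolean (trigger or no trigger) or a predicted trigger duration compared against a duration threshold:

```python
def screen_contents(predictions, duration_threshold=None):
    """Screen contents to be recommended by their predicted trigger results.

    predictions: list of (content_id, predicted_result) pairs, where the
    predicted result is either a bool (trigger / no trigger) or a
    predicted trigger duration.
    """
    selected = []
    for content_id, result in predictions:
        if duration_threshold is None:
            if result:                      # result indicates a trigger
                selected.append(content_id)
        elif result > duration_threshold:   # duration exceeds the threshold
            selected.append(content_id)
    return selected

# Boolean trigger results:
ads = screen_contents([("ad1", True), ("ad2", False)])        # -> ["ad1"]
# Duration results with a 30-second threshold:
games = screen_contents([("g1", 45.0), ("g2", 12.0)], 30.0)   # -> ["g1"]
```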
It should be noted that, the scenario of the embodiment of the present application is not limited to content recommendation, and may be, for example, a scenario such as computer vision (e.g., face detection, human body detection, or vehicle detection), speech technology (e.g., speech recognition or speech synthesis), or natural language processing (e.g., entity recognition, part-of-speech labeling, machine translation, or robot question and answer).
In some embodiments, between any of the steps, further comprising: and the sub-computing system performs backup processing on the stored original data to obtain backup data so as to restore according to the backup data when the original data has errors.
For the sub-computing system, the stored original data can be backed up to obtain backup data. For example, the online computing system may backup parameters of the machine learning model, and if the parameters of the machine learning model are wrong after the machine learning model is online, and thus the online prediction computing task cannot be executed, the online computing system may restore, that is, roll back to the machine learning model before the error occurs, according to the backed-up parameters. By the method, the fault tolerance of the sub-computing system in executing the computing task can be improved.
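The backup-and-rollback behavior can be sketched as follows; this in-memory store is a hypothetical stand-in, and a real system would persist backups in the sub-computing system's own storage resources:

```python
import copy

class ParameterStore:
    """Keeps the live parameters of a machine learning model plus a backup
    taken before each update, so that errors can be rolled back."""

    def __init__(self, params):
        self.params = params
        self._backup = copy.deepcopy(params)

    def update(self, new_params):
        self._backup = copy.deepcopy(self.params)  # back up original data first
        self.params = new_params

    def rollback(self):
        # Restore from the backup data when the original data has errors.
        self.params = copy.deepcopy(self._backup)

store = ParameterStore([0.1, 0.2])
store.update([float("nan"), 0.3])   # a faulty online update
store.rollback()                    # roll back to the model before the error
```

After the rollback, the store again holds the parameters that were live before the faulty update.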
In some embodiments, between any of the steps, further comprising: the sub-computing system performs audit processing on the set data in the stored data to obtain an audit log comprising data access operations executed on the set data, so as to locate the data access operations causing errors according to the audit log when errors occur in the set data.
In the embodiment of the present application, the sub-computing system may also perform audit processing on set data among the stored data, where the set data can be specified according to the actual application scenario; for example, the set data may be the parameters of the machine learning model. Audit processing means recording the data access operations executed on the set data to obtain an audit log; the recorded data access operations can also be set according to the actual application scenario, for example, recording only the modification operations executed on the parameters of the machine learning model. Thus, when an error occurs in the set data, the data access operation causing the error can be located according to the audit log, so that related personnel can repair the error accurately; the locating may be done manually or by means of a specific locating rule. In this way, errors can be traced accurately and quickly.
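A sketch of such audit processing; the operator names and the recorded fields are illustrative assumptions:

```python
import time

class AuditedStore:
    """Records every modification of set data in an audit log, so that an
    error-causing data access operation can be located afterwards."""

    def __init__(self):
        self._data = {}
        self.audit_log = []

    def modify(self, operator: str, key: str, value):
        # Record the data access operation before applying it.
        self.audit_log.append({
            "time": time.time(),
            "operator": operator,
            "operation": "modify",
            "key": key,
            "value": value,
        })
        self._data[key] = value

    def locate_error(self, key: str):
        """Return the data access operations performed on a faulty item."""
        return [e for e in self.audit_log if e["key"] == key]

store = AuditedStore()
store.modify("near_line_system", "model_params", [0.1])
store.modify("unknown_job", "model_params", None)   # the faulty modification
trace = store.locate_error("model_params")
```

`trace` lists every modification of `model_params` in order, so the last entry points directly at the operation that introduced the error.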
In step 102, the publishing system receives a data transfer request between any two sub-computing systems.
Here, there may be a need for data transfer between different sub-computing systems; for example, the offline computing system needs to transfer the trained historical parameters to the near-line computing system. In the embodiment of the present application, data transfer between the isolated sub-computing systems is mediated by the publishing system: after receiving a data transfer request between any two sub-computing systems, the publishing system judges whether to authorize it, where the data transfer request may be a data sending request or a data acquisition request.
In step 103, when it is detected that the data transfer request satisfies the security condition of cross-system data transfer, the publishing system authorizes the execution of the data transfer operation corresponding to the data transfer request.
If the publishing system detects that the data transfer request satisfies the security condition of cross-system data transfer, it authorizes the execution of the data transfer operation corresponding to the data transfer request; if it detects that the security condition is not satisfied, no processing is performed, so that the security of the data and the orderliness of the machine learning model's processing are ensured to the greatest extent. It should be noted that the embodiment of the present application does not limit the execution order of executing computing tasks and performing data transfer; for example, a computing task may be executed first and data transferred according to its result, or a computing task may be executed according to data obtained by data transfer.
In addition, backup and audit can also be implemented by the publishing system. For example, the publishing system can perform backup processing on the transferred original data to obtain backup data, so that when an error occurs in the original data, it can be restored from the backup data; the publishing system can also perform audit processing on set data among the transferred data to obtain an audit log comprising the data access operations executed on the set data, so that when an error occurs in the set data, the data access operation causing the error can be located according to the audit log.
When transferring data, one implementation is that a sub-computing system can obtain the names and/or other identifiable information of the data stored by other sub-computing systems, but cannot obtain the storage addresses of that data. The data list of each sub-computing system (including the names and/or other identifiable information of the stored data) may be obtained by the publishing system in real time or periodically and synchronized to each sub-computing system. Taking the case of an offline computing system, a near-line computing system, and an online computing system as an example, the publishing system may synchronize the data lists of the offline computing system and the online computing system to the near-line computing system, and so on. On this basis, the publishing system may synchronize according to a set synchronization rule, for example, synchronizing the data list of the offline computing system only to the near-line computing system, and the data list of the near-line computing system only to the online computing system. After receiving a data transfer request, if the publishing system detects that the data transfer request satisfies the security condition, it sends the storage address of the data to be transferred (corresponding to the data transfer request) to the corresponding sub-computing system, so that the sub-computing system transfers the data according to the storage address.
For example, the data transfer request is a data acquisition request sent by the near-line computing system for acquiring the historical parameters in the offline computing system; when the publishing system detects that the data acquisition request satisfies the security condition, it can send the storage address of the historical parameters in the offline computing system to the near-line computing system, so that the near-line computing system acquires the historical parameters from the offline computing system according to the storage address.
Another implementation is that a sub-computing system can obtain the storage addresses of the data stored by other sub-computing systems, but lacks the data transfer permission. The publishing system may acquire the data list of each sub-computing system (including the storage addresses of the stored data, and possibly names and/or other identifiable information) in real time or periodically and synchronize it to each sub-computing system, where the synchronization rule may also be freely set. After receiving a data transfer request, if the publishing system detects that the data transfer request satisfies the security condition, it grants the data transfer permission to the corresponding sub-computing system, so that the sub-computing system performs the data transfer according to the permission; the form of the data transfer permission is not limited and may be an authentication password or another form. For example, the data transfer request is a data acquisition request sent by the near-line computing system for acquiring the historical parameters in the offline computing system; when the publishing system detects that the data acquisition request satisfies the security condition, it may send the data transfer permission (here, a data acquisition permission) to the near-line computing system, so that the near-line computing system acquires the historical parameters from the offline computing system according to the permission and the previously acquired storage address of the historical parameters.
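Both implementations hinge on the same authorization gate. The first implementation (revealing a storage address only after the security condition is met) might look like this sketch, where the direction-based security condition and the storage address are hypothetical:

```python
class PublishingSystem:
    """Sketch of the publishing system mediating cross-system data transfer.

    Hypothetical security condition: the transfer direction must be one
    of the set transfer directions.
    """

    ALLOWED_DIRECTIONS = {("offline", "near_line"), ("near_line", "online")}

    def __init__(self, address_book):
        # address_book maps (system, data_name) -> storage address.
        self.address_book = address_book

    def request_transfer(self, source: str, target: str, data_name: str):
        if (source, target) not in self.ALLOWED_DIRECTIONS:
            return None   # security condition not met: no processing
        # Authorized: reveal the storage address of the data to be
        # transferred, so the requesting sub-computing system can fetch it.
        return self.address_book[(source, data_name)]

publisher = PublishingSystem(
    {("offline", "history_params"): "hdfs://offline/params/v3"})
addr = publisher.request_transfer("offline", "near_line", "history_params")
blocked = publisher.request_transfer("online", "offline", "history_params")
```

An authorized request yields the storage address; a request in an illegal direction yields nothing, and the requester never learns where the data lives.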
In some embodiments, prior to step 103, further comprising: when the type of the data to be transferred corresponding to the data transfer request accords with the type of the permitted transfer, the issuing system determines that the data transfer request meets the security condition; wherein the type of transfer allowed includes the result of the computational task.
Here, an example of a security condition is provided, namely the type of transfer allowed. The type of transfer allowed may be set uniformly for all sub-computing systems, or individually for different sub-computing systems. For example, the type of transfer allowed may be uniformly set as the result of a computing task, which includes the features obtained after performing a training computing task and the parameters of the machine learning model, as well as the prediction results obtained after performing a prediction computing task. As another example, in a scenario where the offline computing system performs an offline prediction computing task and the online computing system obtains the prediction results directly from the offline computing system, for a data transfer request involving the offline computing system, the type of transfer allowed may be set to include only the prediction results obtained after the offline prediction computing task is performed. By constraining the data transfer process to specific types in this way, the security of data transfer can be improved.
In some embodiments, prior to step 103, further comprising: when the data transfer time of the data transfer request accords with the time interval allowing the data transfer, the issuing system determines that the data transfer request meets the security condition.
Here, another example of a security condition is provided, namely a time interval in which data transfer is permitted; this is illustrated below for two specific cases. In the first case, the data transfer time refers to the time at which the sub-computing system issues the data transfer request, or the time at which the publishing system receives it. In this case, the historical load of the plurality of sub-computing systems in each time interval may be counted in advance, and the one or more time intervals with the lowest load may be used as the time intervals in which data transfer is permitted; the time intervals may be divided according to the actual application scenario, for example, one day may be divided into 24 time intervals of one hour each. If the data transfer time falls within a time interval in which data transfer is permitted, it is determined that the data transfer request satisfies the security condition. This ensures that data transfer takes place in a time interval with a lower load, effectively avoiding consequences such as an overly long transfer or errors during transfer caused by an excessively high load.
In addition, the time interval in which data transfer is permitted can be set according to the security requirements of different sub-computing systems. For example, data transfers from the offline computing system to the near-line computing system are relatively frequent and have little impact, so the time interval for them may be left unrestricted, i.e., such requests can be directly determined to satisfy the security condition. For data transfers from the near-line computing system to the online computing system, which have a greater impact on the online services provided by the online computing system, a time interval in which the usage frequency of the online services is below a frequency threshold (such as around 3 a.m.) can be used as the time interval in which data transfer is permitted. Because there are fewer users and a lower usage frequency in that interval, even if the online services have problems due to the data transfer, the impact can be minimized, which also facilitates quick repair by related personnel.
In the other case, the data transfer time is the generation time of the data to be transferred corresponding to the data transfer request; when the generation time falls within a time interval in which data transfer is permitted, it is determined that the data transfer request satisfies the security condition. The time interval may be set according to the timeliness requirements of the machine learning model's processing, for example, within one day of the current time. In this way, the degradation of the model processing effect (training effect or prediction effect) caused by transferring data that has lost its timeliness can be prevented, and data transmission resources can be saved.
In some embodiments, prior to step 103, further comprising: when the data transfer direction of the data transfer request accords with the set transfer direction, the issuing system determines that the data transfer request meets the security condition.
Here, another example of a security condition is provided, namely the set transfer direction. For example, the set transfer directions may include the direction from the offline computing system to the near-line computing system and the direction from the near-line computing system to the online computing system, or may include only the direction from the offline computing system to the online computing system. By constraining data transfer with set transfer directions in this way, the orderliness of data transfer is improved and errors caused by data transfer in an illegal direction are avoided.
It should be noted that the type of transfer allowed, the time interval in which data transfer is permitted, and the set transfer direction may be applied individually or in any combination, and the publishing system may also determine whether to authorize the execution of a data transfer operation by means of manual review. In the case where a plurality of machine learning models are to be modeled, security conditions corresponding to the different machine learning models can be set respectively, so as to meet the security requirements of each machine learning model.
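Putting the three example security conditions together (allowed transfer type, permitted time interval, and set transfer direction), a combined check might look like this sketch; all concrete values are illustrative assumptions, and any subset of the conditions may be applied in practice:

```python
from datetime import datetime, time

def satisfies_security_conditions(request,
                                  allowed_types={"computation_result"},
                                  allowed_window=(time(2, 0), time(4, 0)),
                                  allowed_directions={("offline", "near_line"),
                                                      ("near_line", "online")}):
    """Check a data transfer request against the three example security
    conditions; a real publishing system may apply any subset of them."""
    type_ok = request["data_type"] in allowed_types
    t = request["transfer_time"].time()
    window_ok = allowed_window[0] <= t <= allowed_window[1]
    direction_ok = (request["source"], request["target"]) in allowed_directions
    return type_ok and window_ok and direction_ok

request = {
    "data_type": "computation_result",
    "transfer_time": datetime(2020, 9, 1, 3, 0),   # 3 a.m., low-load window
    "source": "near_line",
    "target": "online",
}
authorized = satisfies_security_conditions(request)   # -> True
```

Only a request that passes every applied condition is authorized; failing any one of them leaves the request unprocessed.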
As shown in fig. 3A, by a plurality of sub-computing systems isolated from each other, it is effectively ensured that computing tasks in different processing stages do not affect each other; meanwhile, whether the data transfer operation is authorized or not is determined by the release system, so that risks in the machine learning model processing process can be reduced, and data security is enhanced.
In some embodiments, referring to fig. 3B, fig. 3B is a schematic flow chart of an alternative machine learning method provided in an embodiment of the present application, and for ease of understanding, an offline computing system, a near-line computing system, and an online computing system are taken as examples, and the illustrated steps are described in connection therewith.
In step 201, an offline computing system performs offline training computing tasks of a machine learning model based on stored historical data.
Here, the offline computing system performs feature statistical conversion based on the stored historical data to obtain historical features, which indicate the types of features extracted from the historical data; the historical data may be obtained by the offline computing system from the online computing system by way of data transfer. The offline computing system then extracts historical samples from the historical data based on the historical features. For example, if the historical data is content trigger records including user data, data of recommended content, and trigger results, the historical samples extracted from the historical content trigger records may be of the form "object features - content features - trigger result". The machine learning model is then trained based on the historical samples: for example, prediction is performed on the object features and content features in a historical sample through the machine learning model to obtain a predicted result, the difference (i.e., the loss value) between the predicted result and the trigger result in the historical sample is determined according to the loss function of the machine learning model, back propagation is performed in the machine learning model according to the difference, and the parameters of the machine learning model are updated along the gradient descent direction during back propagation. For ease of distinction, the parameters of the machine learning model obtained after performing the offline training computing task are named historical parameters.
In step 202, the near-line computing system obtains first data from the offline computing system through a first data transfer operation authorized by the publishing system; the first data includes at least one of: historical parameters of the machine learning model calculated in the offline training calculation task and historical features calculated in the offline training calculation task.
Here, the offline computing system or the near-line computing system may send a data transfer request corresponding to the first data to the publishing system. When the publishing system detects that the data transfer request satisfies the security condition, it authorizes execution of the data transfer operation corresponding to the request; for ease of distinction, this data transfer operation is referred to as the first data transfer operation.
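A minimal sketch of this authorization flow: a sub-computing system submits a data transfer request, and the publishing system grants it only when a security condition holds, here modeled as a whitelist of allowed transfer directions. All class and field names are hypothetical.

```python
class PublishingSystem:
    """Sketch of the authorization step (names hypothetical): a data transfer
    request is granted only when it satisfies a security condition, modeled
    here as a whitelist of allowed transfer directions."""
    def __init__(self, allowed_directions):
        self.allowed_directions = set(allowed_directions)

    def authorize(self, request):
        # security condition: source -> target must be an allowed direction
        direction = (request["source"], request["target"])
        return direction in self.allowed_directions

publisher = PublishingSystem(allowed_directions=[
    ("offline", "near-line"),   # first data transfer operation
    ("near-line", "online"),    # second data transfer operation
    ("offline", "online"),
])

# the offline (or near-line) system requests transfer of the first data
request = {"source": "offline", "target": "near-line", "data": "history parameters"}
granted = publisher.authorize(request)
```

Only when `granted` is true does the data transfer operation actually execute; a request in an unlisted direction (for example, online to offline) is refused.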
In step 203, the near-line computing system performs a near-line training computing task in conjunction with the first data and the real-time data.
Here, the real-time data may be acquired by the near-line computing system from the online computing system by way of data transfer, and the near-line computing system may acquire the real-time data in a streaming manner.
In some embodiments, performing the near-line training computing task in combination with the first data and the real-time data as described above may be implemented as follows: the near-line computing system takes the historical features as real-time features and extracts real-time samples from the real-time data according to the real-time features; the machine learning model deploying the history parameters is then trained based on the real-time samples.
When the first data includes both the historical features and the history parameters, the near-line computing system may directly use the historical features as real-time features, extract real-time samples from the real-time data according to those features, and then train the machine learning model deploying the history parameters based on the real-time samples; the resulting parameters are referred to as real-time parameters.
In addition, when the first data includes only the historical features, the near-line computing system may directly use the historical features as real-time features to obtain real-time samples, and then train the machine learning model stored in the near-line computing system based on the real-time samples to obtain the real-time parameters. When the first data includes only the history parameters, the near-line computing system may perform feature statistical conversion on the real-time data to obtain real-time features, extract real-time samples from the real-time data according to those features, and then train the machine learning model deploying the history parameters based on the real-time samples. In this way, the model training process in the near-line computing system tracks the real-time situation, improving the accuracy of the resulting real-time parameters.
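The cases above (first data carrying both items, only the historical features, or only the history parameters) can be sketched as one dispatch function. The helper logic and every name here are illustrative assumptions, and a single perceptron-style pass stands in for the actual training.

```python
def feature_statistical_conversion(data):
    # hypothetical stand-in: the feature types are the record keys, minus the label
    return sorted(data[0].keys() - {"trigger"})

def near_line_training(first_data, real_time_data, local_parameters):
    """Sketch of step 203 (names hypothetical). first_data may carry history
    features, history parameters, or both; real_time_data is a list of records."""
    # history features present -> reuse them directly as real-time features;
    # otherwise derive real-time features from the real-time data itself
    features = first_data.get("history_features") \
        or feature_statistical_conversion(real_time_data)

    # extract real-time samples of the form "features -> trigger result"
    samples = [([record[f] for f in features], record["trigger"])
               for record in real_time_data]

    # history parameters present -> continue training from them; otherwise
    # train the machine learning model stored in the near-line system
    params = list(first_data.get("history_parameters", local_parameters))

    # a single perceptron-style pass stands in for "training"
    for x, y in samples:
        pred = 1 if sum(p * xi for p, xi in zip(params, x)) > 0 else 0
        params = [p + 0.1 * (y - pred) * xi for p, xi in zip(params, x)]
    return params  # the real-time parameters
```

Whichever branch is taken, the output plays the role of the real-time parameters that the second data transfer operation later carries to the online computing system.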
In step 204, the online computing system obtains second data from the near-line computing system through a second data transfer operation authorized by the publishing system; the second data includes at least one of: real-time parameters of the machine learning model calculated in the near-line training computing task, and real-time features calculated in the near-line training computing task.
Here, the near-line computing system or the online computing system may send a data transfer request corresponding to the second data to the publishing system, and the publishing system authorizes execution of the second data transfer operation corresponding to the request when it detects that the request satisfies the security condition.
In step 205, in response to a real-time prediction request, the online computing system performs an online prediction computing task in combination with the second data and the stored data to be predicted, and responds to the prediction request based on the obtained prediction result.
The data to be predicted may be obtained by the online computing system from the offline computing system or the near-line computing system by way of data transfer, may be stored locally in the online computing system in advance, or may be carried in the received prediction request. When the online computing system receives the real-time prediction request, it performs the online prediction computing task in combination with the second data and the stored data to be predicted, and responds to the prediction request based on the obtained prediction result.
In some embodiments, performing the online prediction computing task in combination with the second data and the stored data to be predicted as described above may be implemented as follows: the online computing system extracts samples to be predicted from the data to be predicted according to the real-time features, and performs prediction processing on the samples through the machine learning model deploying the real-time parameters to obtain a prediction result.
When the second data includes both the real-time features and the real-time parameters, the online computing system may directly extract samples to be predicted from the data to be predicted according to the real-time features. For example, if the data to be predicted includes user data and data of a plurality of contents to be recommended, the online computing system extracts object features from the user data according to the feature types indicated by the real-time features, extracts content features from the data of each content to be recommended, and combines the object features with the content features of each content to be recommended to obtain a plurality of samples to be predicted. The machine learning model deploying the real-time parameters then performs prediction processing on the samples to obtain prediction results, and the prediction request is responded to based on the prediction results. For example, when a prediction result indicates whether a trigger occurs, the contents to be recommended whose prediction results indicate a trigger are taken as the screened contents; when a prediction result indicates a trigger duration, the contents to be recommended whose predicted trigger durations exceed a duration threshold are taken as the screened contents. The screened contents are recommended to the user corresponding to the prediction request as the response to the prediction request, enabling accurate content recommendation and improving the user experience.
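A sketch of this sample construction and screening, assuming a linear model whose score is read as a predicted trigger duration; the feature names, parameter values, and threshold are invented for illustration.

```python
def online_prediction(second_data, user_data, candidates, duration_threshold=5.0):
    """Sketch of the screening above (names and values invented): pair the
    object features with each candidate's content features, score the sample
    with the real-time parameters as a predicted trigger duration, and keep
    candidates above the duration threshold."""
    feature_types = second_data["real_time_features"]
    params = second_data["real_time_parameters"]

    # object features extracted from the user data
    object_features = [user_data[f] for f in feature_types if f in user_data]

    recommended = []
    for content in candidates:
        content_features = [content[f] for f in feature_types if f in content]
        sample = object_features + content_features    # sample to be predicted
        predicted_duration = sum(p * x for p, x in zip(params, sample))
        if predicted_duration > duration_threshold:
            recommended.append(content["id"])
    return recommended

second_data = {"real_time_features": ["age", "topic_score"],
               "real_time_parameters": [0.1, 10.0]}
user_data = {"age": 30.0}
candidates = [{"id": "a", "topic_score": 0.9},
              {"id": "b", "topic_score": 0.1}]
screened = online_prediction(second_data, user_data, candidates)
```

With these invented values, candidate "a" scores 12.0 and candidate "b" scores 4.0, so only "a" survives the duration threshold and would be recommended in the response.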
In addition, the processing when the second data includes only the real-time features or only the real-time parameters is similar to the processing when the first data includes only the historical features or only the history parameters, and is not repeated here.
As shown in fig. 3B, by means of offline training, near-line training, and online prediction, the embodiment of the application can effectively improve the model training effect and the accuracy of the finally obtained prediction result.
In some embodiments, referring to fig. 3C, fig. 3C is an optional flowchart of a machine learning method according to an embodiment of the present application. Based on fig. 3B, after step 203, the near-line computing system may further, in step 301, perform a near-line prediction computing task in combination with the second data and the stored data to be predicted to obtain a prediction result; the second data includes at least one of: real-time parameters of the machine learning model calculated in the near-line training computing task, and real-time features calculated in the near-line training computing task.
For the case in which the online computing system has limited computing capability and cannot support execution of the online prediction computing task, in the embodiment of the application the near-line computing system may perform the near-line prediction computing task in combination with the second data calculated in the near-line training computing task and the stored data to be predicted, where the data to be predicted may be obtained by the near-line computing system from the offline computing system or the online computing system by way of data transfer, or may be stored in advance locally in the near-line computing system.
In some embodiments, performing the near-line prediction computing task in combination with the second data and the stored data to be predicted to obtain a prediction result may be implemented as follows: the near-line computing system extracts samples to be predicted from the stored data to be predicted according to the real-time features, and performs prediction processing on the samples through the machine learning model deploying the real-time parameters to obtain a prediction result.
Here, the process of performing the near-line prediction calculation task is similar to the process of performing the on-line prediction calculation task described above, and will not be described here.
In FIG. 3C, following step 301, in step 302 the online computing system may, in response to a real-time prediction request, obtain the prediction result from the near-line computing system through a third data transfer operation authorized by the publishing system, and respond to the prediction request.
Here, the near-line computing system or the online computing system may send a data transfer request corresponding to the prediction result to the publishing system, and the publishing system authorizes execution of the third data transfer operation corresponding to the request when it detects that the request satisfies the security condition.
As shown in fig. 3C, the method provided by the embodiment of the application can reduce the processing pressure on the online computing system by means of offline training, near-line prediction, and online acquisition, and is suitable for scenarios in which the processing capability of the online computing system is low, for example when terminal devices provide the computing resources of the online computing system.
In some embodiments, referring to fig. 3D, fig. 3D is a schematic flowchart of an alternative machine learning method provided by an embodiment of the present application. Based on fig. 3B, after step 201, the offline computing system may further, in step 401, perform an offline prediction computing task in combination with the first data and the stored data to be predicted to obtain a prediction result; the first data includes at least one of: history parameters of the machine learning model calculated in the offline training computing task, and historical features calculated in the offline training computing task.
For scenarios with low timeliness requirements, for example when the recommended content is a payment coupon, prediction may be performed offline by the offline computing system. That is, the offline computing system performs the offline prediction computing task in combination with the first data and the stored data to be predicted to obtain a prediction result, where the data to be predicted may be obtained by the offline computing system from the near-line computing system or the online computing system by way of data transfer, or may be stored in advance in the offline computing system.
In some embodiments, performing the offline prediction computing task in combination with the first data and the stored data to be predicted to obtain a prediction result may be implemented as follows: the offline computing system extracts samples to be predicted from the stored data to be predicted based on the historical features, and performs prediction processing on the samples through the machine learning model deploying the history parameters to obtain a prediction result.
When the first data includes both the historical features and the history parameters, the offline computing system extracts samples to be predicted from the data to be predicted based on the historical features, and performs prediction processing on the samples through the machine learning model deploying the history parameters to obtain a prediction result. When the first data includes only the historical features, the offline computing system may extract samples to be predicted based on the historical features and perform prediction processing on them through the original machine learning model (i.e., the machine learning model before the offline training computing task was performed). When the first data includes only the history parameters, the offline computing system may perform feature statistical conversion based on the data to be predicted to extract samples to be predicted, and then perform prediction processing on the samples through the machine learning model deploying the history parameters.
In step 402, in response to a real-time prediction request, the online computing system obtains the prediction result from the offline computing system through a fourth data transfer operation authorized by the publishing system, and responds to the prediction request.
Here, the offline computing system or the online computing system may send a data transfer request corresponding to the prediction result to the publishing system, and the publishing system authorizes execution of the fourth data transfer operation corresponding to the request when it detects that the request satisfies the security condition.
As shown in fig. 3D, in the embodiment of the present application the prediction result is calculated in advance by means of offline prediction and online acquisition, which reduces the processing pressure when the prediction request arrives; this approach is suitable for scenarios with low real-time requirements.
In the following, an exemplary application of the embodiment of the present application in a practical application scenario is described. The embodiment of the application can be applied to various machine learning modeling scenarios, such as content recommendation in applications, where the content may be an advertisement, a payment coupon, a movie, a television series, and the like, without limitation. Next, each module involved in the machine learning modeling process is described in tabular form:
Here, each module involved will be specifically explained.
Data warehouse: a repository that stores large volumes of data, mainly for data analysis, such as a Hive data warehouse. In the embodiment of the application, the data warehouse may store report logs related to an application as well as processed, standardized data for machine learning modeling; for example, when data related to a payment application is used for machine learning modeling, the data warehouse may store transaction records related to the payment application, statistical data generated from those records, and the like.
Streaming data processing system: a system for processing the data streams generated by real-time online services, such as the Flink stream processing framework. The data warehouse described above handles data that has already been generated (corresponding to the historical data above), while the streaming data processing system handles data as it is being generated (corresponding to the real-time data above).
Computing clusters: in the processing of the machine learning model, a large amount of data is often generated, and a single electronic device is difficult to process, so that a computing cluster including a plurality of electronic devices may be used to perform a computing task, such as a Hadoop cluster, a Spark cluster, or a distributed TensorFlow cluster.
Scheduling system: the processing of machine learning models often involves multi-person collaboration, multi-process computing, etc., and therefore, a scheduling system is needed to fully utilize computing cluster resources to implement a scheduling strategy for multi-user, multi-task scenarios, such as another resource coordinator (Yet Another Resource Negotiator, yacn) scheduling system.
Distributed storage system: includes distributed file systems and distributed memory systems, both deployed on a computing cluster; the difference is that a distributed file system uses disks while a distributed memory system uses memory. During the execution of a computing task, data is typically stored on disk in a distributed file system, such as the Hadoop Distributed File System (HDFS); when providing real-time online services under heavy data volume, data is instead stored in a distributed memory system, such as a FeatureKV distributed memory system. That is, distributed file systems are typically used for offline analysis, such as storing device logs or application logs generated at any time, while distributed memory systems are typically used for online services, such as storing object features for performing prediction computing tasks.
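The online-serving side of this split can be pictured with a minimal sharded key-value store holding object features; this toy stands in for a distributed memory system such as a FeatureKV-like store, which in reality spreads shards across cluster nodes rather than in-process dictionaries.

```python
class FeatureStore:
    """Toy sharded key-value store standing in for a distributed memory
    system; a real deployment routes each shard to a cluster node."""
    def __init__(self, num_shards=4):
        self.shards = [{} for _ in range(num_shards)]

    def _shard(self, key):
        # route each key to a shard, as a cluster would route to a node
        return self.shards[hash(key) % len(self.shards)]

    def put(self, key, features):
        self._shard(key)[key] = features

    def get(self, key):
        return self._shard(key).get(key)

# object features used by a prediction computing task (invented values)
store = FeatureStore()
store.put("user:42", {"age": 30, "clicks_7d": 12})
```

Feature reads during online prediction then become single-key lookups against memory, which is what lets the online environment answer at the millisecond level described below.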
On-line service: a model inference or computing service provided over a network, which often involves loading machine learning models, reading features, and so on. For example, the online service may be an intelligent content recommendation service.
The plurality of environments in the above table will be described.
Offline environment: computation (sample statistical conversion, feature statistical conversion, model training, offline prediction, and the like) is performed based on already-generated historical data; data processing is completed in batch mode, so its timeliness is poor and immediate updating of data and models cannot be achieved.
Near-line environment: data processing (real-time feature calculation, real-time model training, and the like) is completed in streaming mode based on data generated in real time; data is processed with a delay on the order of minutes or even seconds, so the provision of real-time services cannot be guaranteed.
On-line environment: online prediction of the model is performed in response to a prediction request sent by a user; the calculation generally needs to be completed at the millisecond level to ensure that real-time online services are provided.
In the processing of a machine learning model, data in the offline environment is usually pushed to the near-line environment, data in the offline environment is pushed to the online environment, and data in the near-line environment is pushed to the online environment; the effect is ultimately exerted in the online environment. To avoid the data security problems caused by the links affecting one another, the modules are partitioned according to the offline, near-line, and online environments. The partitioning covers two aspects: one is the storage medium, the other is the computing resources.
For the storage medium, the embodiment of the application provides the storage-medium isolation policy shown in fig. 4: each computing environment corresponds to an independent storage space, and by controlling the data access permissions of each storage space it is ensured at the system level that data is not used across environments; for example, the data access permissions of the offline data warehouse and the offline distributed file system are held only by the electronic devices in the offline environment. When data is transferred between different computing environments, the transfer can succeed only after being audited and authorized by the automated publishing system.
The publishing system is an auditing and transfer mechanism introduced for multiplexing (transferring) data across computing environments, preventing data errors caused by manual operations during such transfers. In the publishing system, auditing may be performed by setting allowed transfer directions. The embodiment of the application provides the schematic diagram of data transfer shown in fig. 5: the publishing system may authorize only the data transfer operations corresponding to the transfer directions shown as ①, ②, and ③, and not authorize data transfer operations in other directions. Of course, this does not limit the embodiment of the application; the set transfer directions may be determined for different machine learning systems according to the manner and environment in which the data is used.
Moreover, the auditing mechanism may be determined according to the security requirements of different computing environments. For example, transfers of data from the offline environment to the near-line environment are relatively frequent and have limited impact, so the publishing system may leave the allowed time interval for such transfers unrestricted and require manual auditing by a collaborating developer. Transfers from the offline environment to the online environment, or from the near-line environment to the online environment, have a larger impact, so the publishing system may restrict the allowed time interval to the non-core working hours of the online service and additionally require manual auditing by more responsible persons.
In addition, backup and audit mechanisms may be applied. The backup mechanism backs up data before it goes online so that it can be rolled back; the audit mechanism audits designated data so that, when a problem occurs, the source of the problem can be traced and conveniently located. In the embodiment of the application, the backup and audit mechanisms may be deployed in the publishing system, or in each computing environment.
For computing resources, the embodiment of the application provides the computing-resource isolation policy shown in fig. 6. To prevent different computing environments from competing when tasks are scheduled by the scheduling system, which could cause some important tasks to miss their deadlines, different computing clusters may be assigned to different computing environments. For example, given a Spark cluster comprising 10 servers, 5 servers may be assigned to the offline environment as an offline-Spark cluster providing computing resources to the offline environment, and the other 5 servers may be assigned to the near-line environment as a near-line-Spark cluster providing computing resources to the near-line environment, thereby achieving mutual isolation at the computing-resource level.
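The cluster split in this example (10 servers, 5 per environment) can be sketched as a simple disjoint partition of a server pool; the server and environment names are illustrative.

```python
def partition_clusters(servers, assignments):
    """Divide one pool of servers into disjoint per-environment clusters
    (the 5/5 split mirrors the example above; names are illustrative)."""
    clusters, start = {}, 0
    for environment, count in assignments.items():
        clusters[environment] = servers[start:start + count]
        start += count
    return clusters

servers = [f"server-{i}" for i in range(10)]
clusters = partition_clusters(servers, {"offline-spark": 5, "near-line-spark": 5})
```

Because the resulting clusters share no servers, scheduling pressure in one environment cannot starve tasks in the other, which is exactly the isolation goal stated above.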
According to the embodiment of the application, problems of manual constraint are turned into system constraints, achieving the following technical effects: 1) risks in the machine learning model processing are reduced, and the manner of data use is standardized; 2) routine invalid work of related personnel (such as algorithm engineers) is reduced, the efficiency of the system is improved, and labor costs are lowered; 3) the standardized data operation mode provides unified input for subsequent anomaly detection and the like, facilitating extension to other applications.
Continuing with the exemplary structures of the software modules in the electronic device 800 provided by embodiments of the present application: in some embodiments, as shown in FIG. 2, when the electronic device 800 is an electronic device in a sub-computing system, the sub-computing module 8431 is configured to: execute the computing task of the corresponding processing stage according to the stored data, wherein the plurality of sub-computing systems are isolated from each other and correspond to different processing stages of the machine learning model. When the electronic device 800 is an electronic device in the publishing system, the publishing module 8432 is configured to: receive a data transfer request between any two sub-computing systems, and authorize execution of the data transfer operation corresponding to the data transfer request when it is detected that the request satisfies the security condition for cross-system data transfer.
In some embodiments, the plurality of sub-computing systems are isolated from each other in at least one of the following ways: different computing resources are used to perform the respective computing tasks; different storage resources are used to store the data needed to perform the computing tasks and the results of the computing tasks.
In some embodiments, the plurality of sub-computing systems includes an offline computing system, a near-line computing system, and an online computing system. When the electronic device 800 is an electronic device in the offline computing system, the sub-computing module 8431 may be updated to an offline computing module, which is further configured to: perform the offline training computing task of the machine learning model according to the stored historical data; the offline training computing task includes: performing feature statistical conversion based on the historical data, extracting historical samples from the historical data based on the historical features obtained by the feature statistical conversion, and training the machine learning model based on the historical samples.
In some embodiments, when the electronic device 800 is an electronic device in the near-line computing system, the sub-computing module 8431 may be updated to a near-line computing module, which is further configured to: execute the near-line training computing task according to the real-time data; the near-line training computing task includes: performing feature statistical conversion based on the real-time data, extracting real-time samples from the real-time data based on the real-time features obtained by the feature statistical conversion, and training the machine learning model based on the real-time samples.
In some embodiments, when the electronic device 800 is an electronic device in an online computing system, the sub-computing module 8431 may be updated to an online computing module, which is also to: and responding to the real-time prediction request, executing an online prediction calculation task of the machine learning model, and responding to the prediction request based on the obtained prediction result.
In some embodiments, the near line calculation module is further to: acquiring first data from an offline computing system through a first data transfer operation authorized by a release system, and executing a near-line training computing task by combining the first data and real-time data; wherein the first data comprises at least one of: historical parameters of the machine learning model calculated in the offline training calculation task and historical features calculated in the offline training calculation task.
In some embodiments, the near line calculation module is further to: taking the history characteristic as a real-time characteristic, and extracting a real-time sample from real-time data according to the real-time characteristic; based on the real-time samples, training a machine learning model deploying historical parameters.
In some embodiments, the online computing module is further to: acquiring second data from the near-line computing system through a second data transfer operation authorized by the release system, and executing an online prediction computing task by combining the second data and the stored data to be detected; wherein the second data comprises at least one of: real-time parameters of the machine learning model calculated in the near-line training calculation task and real-time characteristics calculated in the near-line training calculation task.
In some embodiments, the online computing module is further to: extracting a sample to be detected from the data to be detected according to the real-time characteristics; and carrying out prediction processing on the sample to be detected by deploying a machine learning model of the real-time parameters to obtain a prediction result.
In some embodiments, the near line calculation module is further to: executing a near line prediction calculation task by combining the second data and the stored data to be detected to obtain a prediction result; wherein the second data comprises at least one of: real-time parameters of the machine learning model calculated in the near-line training calculation task and real-time characteristics calculated in the near-line training calculation task; the online computing module is also for: in response to the real-time forecast request, a forecast result is obtained from the near-line computing system through a third data transfer operation authorized by the issuing system, and the forecast request is responded.
In some embodiments, the offline computing module is further configured to: execute the offline prediction computing task in combination with the first data and the stored data to be predicted to obtain a prediction result, the first data including at least one of: history parameters of the machine learning model calculated in the offline training computing task, and historical features calculated in the offline training computing task. The online computing module is further configured to: in response to a real-time prediction request, obtain the prediction result from the offline computing system through a fourth data transfer operation authorized by the publishing system, and respond to the prediction request.
In some embodiments, the publish module 8432 is further configured to: when the type of the data to be transferred corresponding to the data transfer request accords with the type of the allowable transfer, determining that the data transfer request meets the security condition; wherein the type of transfer allowed includes the result of the computational task.
In some embodiments, the publish module 8432 is further configured to: and when the data transfer time of the data transfer request accords with the time interval allowing the data transfer, determining that the data transfer request meets the security condition.
In some embodiments, the publish module 8432 is further configured to: when the data transfer direction of the data transfer request accords with the set transfer direction, determining that the data transfer request meets the security condition.
In some embodiments, the sub-calculation module 8431 is further configured to: and carrying out backup processing on the stored original data to obtain backup data so as to restore according to the backup data when the original data has errors.
In some embodiments, the sub-calculation module 8431 is further configured to: and performing audit processing on the set data in the stored data to obtain an audit log comprising data access operations executed on the set data, so as to locate the data access operation causing the error according to the audit log when the set data has the error.
Embodiments of the present application provide a computer program product or computer program comprising computer instructions stored in a computer readable storage medium. The processor of the computer device reads the computer instructions from the computer-readable storage medium, and the processor executes the computer instructions, so that the computer device performs the machine learning method according to the embodiment of the present application.
Embodiments of the present application provide a computer-readable storage medium having executable instructions stored therein that, when executed by a processor, cause the processor to perform a method provided by the embodiments of the present application, for example, the machine learning method shown in fig. 3A, 3B, 3C, or 3D. Note that "computer" here covers various computing devices, including terminal devices and servers.
In some embodiments, the computer-readable storage medium may be FRAM, ROM, PROM, EPROM, EEPROM, flash memory, magnetic surface memory, an optical disc, or CD-ROM; it may also be any of various devices including one of, or any combination of, the above memories.
In some embodiments, the executable instructions may be in the form of programs, software modules, scripts, or code, written in any form of programming language (including compiled or interpreted languages, or declarative or procedural languages), and they may be deployed in any form, including as stand-alone programs or as modules, components, subroutines, or other units suitable for use in a computing environment.
As an example, executable instructions may, but need not, correspond to files in a file system, and may be stored as part of a file that holds other programs or data, for example in one or more scripts in a hypertext markup language (HTML) document, in a single file dedicated to the program in question, or in multiple coordinated files (e.g., files that store one or more modules, sub-programs, or portions of code).
As an example, executable instructions may be deployed to be executed on one computing device or on multiple computing devices located at one site or distributed across multiple sites and interconnected by a communication network.
In summary, the following technical effects can be achieved by the embodiments of the present application:
1) The mutually isolated sub-computing systems effectively ensure that computing tasks in different processing stages do not affect one another; meanwhile, because the publishing system determines whether a data transfer operation is authorized, risks in the machine learning model processing pipeline are reduced and data security is enhanced. Isolation can be realized by at least one of computing resource isolation and storage resource isolation, which improves flexibility.
2) The publishing system can vet a data transfer request against at least one of the permitted transfer types, the time interval during which data transfer is permitted, and the set transfer direction, which improves the flexibility and safety of data transfer and reduces risk during the transfer process.
3) During processing of the machine learning model, mode combinations such as offline training + near-line training + online prediction, offline training + near-line prediction, and offline training + offline prediction can be used, which improves processing flexibility and allows a mode to be selected according to the actual application scenario.
4) Based on the backup mechanism, data can be rolled back quickly when an error occurs, minimizing loss; based on the audit mechanism, the source of an error can be traced quickly so that the relevant personnel can repair the data as soon as possible.
The above is merely an example of the present application and is not intended to limit the scope of the present application. Any modification, equivalent replacement, improvement, etc. made within the spirit and scope of the present application are included in the protection scope of the present application.

Claims (15)

1. A machine learning system, comprising:
A publishing system and a plurality of sub-computing systems that are isolated from each other and correspond to different processing stages of a machine learning model; wherein,
The plurality of sub-computing systems are configured to execute the computing tasks of their corresponding processing stages according to stored data, wherein the plurality of sub-computing systems comprise an offline computing system, a near-line computing system, and an online computing system; the offline computing system is configured to execute an offline training computing task of the machine learning model according to stored historical data; the near-line computing system is configured to execute a near-line training computing task according to real-time data; and the online computing system is configured to, in response to a real-time prediction request, execute an online prediction computing task of the machine learning model and respond to the prediction request based on the obtained prediction result;
The publishing system is configured to receive a data transfer request between any two of the sub-computing systems, and is further configured to perform at least one of the following: determining that the data transfer request meets a security condition when the type of the data to be transferred in the data transfer request matches a permitted transfer type, wherein the permitted transfer types include results of computing tasks; determining that the data transfer request meets the security condition when the data transfer time of the data transfer request falls within a time interval during which data transfer is permitted; determining that the data transfer request meets the security condition when the data transfer direction of the data transfer request matches a set transfer direction;
When it is detected that the data transfer request meets the security condition for cross-system data transfer, the publishing system opens data transfer permission to the sub-computing system corresponding to the data transfer request, so that that sub-computing system performs the data transfer according to the data transfer permission.
2. The machine learning system of claim 1, wherein,
The plurality of sub-computing systems are isolated from one another by at least one of the following:
executing their respective computing tasks using different computing resources; using different storage resources to store the data needed to execute the computing tasks, as well as the results of the computing tasks.
3. The machine learning system of claim 1, wherein,
The offline training computing task comprises: performing feature statistical conversion on the historical data, extracting historical samples from the historical data based on the historical features obtained by the feature statistical conversion, and training the machine learning model based on the historical samples;
the near-line training computing task comprises: performing feature statistical conversion on the real-time data, extracting real-time samples from the real-time data based on the real-time features obtained by the feature statistical conversion, and training the machine learning model based on the real-time samples.
4. The machine learning system of claim 1, wherein,
The near-line computing system is further configured to obtain first data from the offline computing system through a first data transfer operation authorized by the publishing system, and to execute the near-line training computing task by combining the first data with the real-time data;
wherein the first data includes at least one of:
historical parameters of the machine learning model calculated in the offline training computing task, and historical features calculated in the offline training computing task.
5. The machine learning system of claim 4, wherein,
The near-line computing system is further configured to:
use the historical features as real-time features, and extract real-time samples from the real-time data according to the real-time features;
train the machine learning model deployed with the historical parameters based on the real-time samples.
6. The machine learning system of claim 4, wherein,
The online computing system is further configured to obtain second data from the near-line computing system through a second data transfer operation authorized by the publishing system, and to execute the online prediction computing task by combining the second data with the stored data to be detected;
wherein the second data includes at least one of:
real-time parameters of the machine learning model calculated in the near-line training computing task, and real-time features calculated in the near-line training computing task.
7. The machine learning system of claim 6, wherein,
The online computing system is further configured to:
extract a sample to be detected from the data to be detected according to the real-time features;
perform prediction processing on the sample to be detected using the machine learning model deployed with the real-time parameters, to obtain a prediction result.
8. The machine learning system of claim 4, wherein,
The near-line computing system is further configured to execute a near-line prediction computing task by combining the second data with the stored data to be detected, to obtain a prediction result;
wherein the second data includes at least one of:
real-time parameters of the machine learning model calculated in the near-line training computing task, and real-time features calculated in the near-line training computing task;
The online computing system is further configured to respond to a real-time prediction request by obtaining the prediction result from the near-line computing system through a third data transfer operation authorized by the publishing system.
9. The machine learning system of claim 3, wherein,
The offline computing system is further configured to execute an offline prediction computing task by combining the first data with the stored data to be detected, to obtain a prediction result;
wherein the first data includes at least one of:
historical parameters of the machine learning model calculated in the offline training computing task, and historical features calculated in the offline training computing task;
the online computing system is further configured to, in response to a real-time prediction request, obtain the prediction result from the offline computing system through a fourth data transfer operation authorized by the publishing system.
10. The machine learning system of any one of claims 1-9, wherein
The sub-computing system is further configured to back up the stored original data to obtain backup data, so that the original data can be restored from the backup data when an error occurs in the original data.
11. The machine learning system of any one of claims 1-9, wherein
The sub-computing system is further configured to perform audit processing on set data among the stored data to obtain an audit log comprising the data access operations performed on the set data, so that when an error occurs in the set data, the data access operation causing the error can be located according to the audit log.
12. A machine learning method, applied to a plurality of sub-computing systems that are isolated from each other and correspond to different processing stages of a machine learning model;
the machine learning method includes:
receiving a data transfer request between any two sub-computing systems;
performing at least one of the following: determining that the data transfer request meets a security condition when the type of the data to be transferred in the data transfer request matches a permitted transfer type, wherein the permitted transfer types include results of computing tasks; determining that the data transfer request meets the security condition when the data transfer time of the data transfer request falls within a time interval during which data transfer is permitted; determining that the data transfer request meets the security condition when the data transfer direction of the data transfer request matches a set transfer direction;
when it is detected that the data transfer request meets the security condition for cross-system data transfer, opening data transfer permission to the sub-computing system corresponding to the data transfer request, so that that sub-computing system performs the data transfer according to the data transfer permission;
wherein,
The plurality of sub-computing systems are configured to execute the computing tasks of their corresponding processing stages according to stored data, wherein the plurality of sub-computing systems comprise an offline computing system, a near-line computing system, and an online computing system; the offline computing system is configured to execute an offline training computing task of the machine learning model according to stored historical data; the near-line computing system is configured to execute a near-line training computing task according to real-time data; and the online computing system is configured to, in response to a real-time prediction request, execute an online prediction computing task of the machine learning model and respond to the prediction request based on the obtained prediction result.
13. An electronic device, comprising:
A memory for storing executable instructions;
A processor configured to implement the machine learning method of claim 12 when executing the executable instructions stored in the memory.
14. A computer-readable storage medium storing executable instructions which, when executed by a processor, implement the machine learning method of claim 12.
15. A computer program product comprising computer instructions or a computer program which, when executed by a processor, implements the machine learning method of claim 12.
CN202010907359.2A 2020-09-02 2020-09-02 Machine learning system, method and electronic equipment Active CN113537507B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202010907359.2A CN113537507B (en) 2020-09-02 2020-09-02 Machine learning system, method and electronic equipment

Publications (2)

Publication Number Publication Date
CN113537507A CN113537507A (en) 2021-10-22
CN113537507B true CN113537507B (en) 2024-05-24

Family

Family ID: 78094220

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202010907359.2A Active CN113537507B (en) 2020-09-02 2020-09-02 Machine learning system, method and electronic equipment

Country Status (1)

Country Link
CN (1) CN113537507B (en)

Families Citing this family (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114357292B (en) * 2021-12-29 2023-10-13 杭州溢六发发电子商务有限公司 Model training method, device and storage medium

Citations (8)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102685119A (en) * 2012-04-28 2012-09-19 上海杰之能信息科技有限公司 Data transmitting/receiving method, data transmitting/receiving device, transmission method, transmission system and server
CN105656850A (en) * 2014-11-13 2016-06-08 腾讯数码(深圳)有限公司 Data processing method, and related device and system
US9400731B1 (en) * 2014-04-23 2016-07-26 Amazon Technologies, Inc. Forecasting server behavior
CN107609652A (en) * 2017-08-30 2018-01-19 第四范式(北京)技术有限公司 Perform the distributed system and its method of machine learning
CN108475252A (en) * 2015-12-26 2018-08-31 英特尔公司 Technology for distributed machines study
CN111125518A (en) * 2019-12-10 2020-05-08 海尔优家智能科技(北京)有限公司 System and method for recommending household appliance information
CN111210020A (en) * 2019-11-22 2020-05-29 清华大学 Method and system for accelerating distributed machine learning
CN111427669A (en) * 2020-04-27 2020-07-17 安谋科技(中国)有限公司 Method, apparatus, medium, and system for managing virtual machines on computer device

Legal Events

Date Code Title Description
PB01 Publication
REG Reference to a national code

Ref country code: HK

Ref legal event code: DE

Ref document number: 40055180

Country of ref document: HK

SE01 Entry into force of request for substantive examination
GR01 Patent grant