CN115756875B - Online service deployment method and system of machine learning model for streaming data


Info

Publication number: CN115756875B
Application number: CN202310015610.8A
Authority: CN (China)
Prior art keywords: gRPC, service, message queue, data, request
Legal status: Active (granted)
Other versions: CN115756875A (application)
Other languages: Chinese (zh)
Inventors: 张田田, 涂燕晖, 程海博
Assignee: Shandong Future Network Research Institute Industrial Internet Innovation Application Base Of Zijinshan Laboratory
Filing date: 2023-01-06; priority to CN202310015610.8A
Publication of application CN115756875A: 2023-03-07
Grant and publication of CN115756875B: 2023-05-05

Classifications

    • Y: General tagging of new technological developments; general tagging of cross-sectional technologies spanning over several sections of the IPC; technical subjects covered by former USPC cross-reference art collections [XRACs] and digests
    • Y02: Technologies or applications for mitigation or adaptation against climate change
    • Y02D: Climate change mitigation technologies in information and communication technologies [ICT], i.e. information and communication technologies aiming at the reduction of their own energy use
    • Y02D10/00: Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention provides a method and a system for deploying an online machine learning model service for streaming data, relating to the field of machine learning model deployment. The method comprises the following steps: constructing a streaming-data-oriented machine learning model online service framework, which comprises an external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service, wherein the real-time streaming data processing channel comprises a gRPC service cluster and a message queue service; establishing a bidirectional communication connection between a client and a gRPC server node using gRPC bidirectional streaming; receiving streaming request data and storing it in the message queue service; and monitoring the message queue service, selecting a corresponding model for prediction whenever new streaming request data is received, writing the prediction result into the message queue service, and pushing it to the client. The invention provides an asynchronous WEB interface for the online machine learning model service that receives, caches, and processes data in real time and sends back model prediction results, thereby avoiding pointless blocking of the client.

Description

Online service deployment method and system of machine learning model for streaming data
Technical Field
The invention relates to the technical field of machine learning model deployment, and in particular to a method and system for deploying an online machine learning model service for streaming data.
Background
Machine learning is being applied and deployed ever more widely, and building model services with enterprise application value from machine learning models in an easy-to-use, efficient, and convenient way is what closes the loop of the machine learning application life cycle. Common machine learning frameworks in industry, such as TensorFlow and PyTorch, generally provide offline training together with a corresponding model-serving solution that exposes the model through an HTTP or gRPC interface. Model online services are currently provided in a batch-processing mode: a user sends a batch of request data to the model service, and the model service predicts and sends the result back to the client. Such services mainly focus on training on, and predicting from, a static model and historical static data; for example, when a user browses a website, news may be pushed according to the user's historical behavior data.
In most practical use cases, the user only continues with subsequent operations after the prediction result becomes available in the mobile application or is displayed on the web page, which is drawing more and more attention to real-time machine learning; examples include real-time recommendation models that encode recent session events, and the surge-price prediction algorithms used in concert ticket booking and ride-pooling applications. Real-time machine learning means processing streaming data, which is harder than batch processing: the volume of data is unbounded, and the rate and speed of data arrival can change, so the traditional batch-processing mode clearly cannot meet the demands of dynamic, real-time data processing.
Batch processing can be understood as executing a series of related tasks one after another, sequentially or in parallel, where the input of a batch is the data collected over a period of time. In most cases, both the input and output data of a batch are bounded data. With the rapid development of the Internet, all kinds of information are growing explosively and new data is generated continuously, so the batch model deployment mode suffers from repeated data transmission, low processing speed, long response times, and an inability to predict in real time. For some applications batch prediction merely degrades the user experience without catastrophic consequences, e.g. advertisement ranking, Twitter's trending-hashtag ranking, Facebook's news feed ranking, and arrival-time estimation. For others, however, the absence of online prediction can be disastrous or render the application useless, e.g. high-frequency trading, autonomous driving, voice assistants, face/fingerprint unlocking of mobile phones, fall detection for the elderly, and fraud detection: for fraudulent transactions, real-time detection makes it possible to block the event outright. Batch processing remains a good choice for scenarios that do not require real-time analysis results, especially when the business logic is very complex and the data volume is large, as useful information is then easier to mine from the data. Therefore, when an application requires real-time analysis, or when the end time and the volume of the data transmission cannot be determined, a stream-processing architecture is needed.
Disclosure of Invention
The invention aims to provide a machine learning model online service deployment method and system for streaming data that, by constructing a distributed service deployment architecture and developing a real-time processing channel using streaming technology, achieve near-real-time processing of streaming data and can handle large volumes of data, thereby overcoming drawbacks of existing machine learning model online service deployment methods such as high processing latency and low computing performance.
To achieve the above purpose, the invention proposes the following technical scheme. A machine learning model online service deployment method for streaming data comprises the following steps:
constructing a streaming-data-oriented machine learning model online service framework, which comprises a unified external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service; the real-time streaming data processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
receiving connection requests from the application clients, and establishing bidirectional communication connections between the application clients and the gRPC server nodes in the gRPC service cluster using gRPC bidirectional streaming;
each gRPC server node in the gRPC service cluster continuously receiving streaming request data from its application clients and storing it, in order, into the message queue service;
monitoring the message queue service and, when new streaming request data is received, selecting a corresponding machine learning model in the distributed model prediction service to perform model prediction and writing the prediction result into the message queue service;
and monitoring the message queue service and, when a new prediction result is received, pushing it in real time through the unified external API interface to the application client that issued the request.
Further, the unified external API interface of the model online service is provided by a WEB API gateway;
each gRPC server node in the gRPC service cluster maintains a long-lived connection with its application clients over which bidirectional communication is possible, and is used to receive streaming data requests, cache them in the message queue service, and asynchronously push prediction results to the application client in real time once they are observed in the monitored message queue service;
the message queue service comprises a request message queue for caching streaming request data and a reply message queue for caching prediction results.
Further, the application client is a gRPC client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway;
when a gRPC client initiates a gRPC request, a bidirectional communication connection and interaction with a gRPC server node is established as follows:
receiving the gRPC request of the gRPC client and determining the currently available gRPC server nodes;
selecting a connectable gRPC server node according to a preset load balancing policy, and sending the node information to the gRPC client so that the gRPC client establishes a bidirectional communication connection with that gRPC server node according to the node information;
and assigning a unique client ID to the gRPC client that has established the bidirectional communication connection, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, realizing the interaction between the gRPC server node and the gRPC client.
Further, the gRPC server node receives streaming request data and stores it, in order, into the message queue service as follows:
the gRPC server node generates a request ID for each piece of streaming request data according to the order in which the streaming request data is received;
each piece of streaming request data, with its request ID attached, is sent in order to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply, and the model prediction parameters.
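For concreteness, one piece of streaming request data as described above can be pictured as a small record. The sketch below is purely illustrative: its field names are assumptions for exposition, not part of the claimed scheme.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, Optional

@dataclass
class StreamingRequest:
    """One piece of streaming request data (hypothetical field names).

    Per the scheme above, model_name and predict_params may be set only in
    the first request of a stream; later requests reuse those values.
    """
    client_id: str                    # assigned when the bidirectional connection is established
    request_id: int                   # assigned by the gRPC server node in arrival order
    model_input: Any                  # the data to run the prediction on
    model_name: Optional[str] = None  # which machine learning model to apply
    predict_params: Dict[str, Any] = field(default_factory=dict)  # model prediction parameters
```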
Further, the process from monitoring the message queue service to writing the prediction result into the message queue service is as follows:
monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data stream, using a window mode chosen for the application scenario, into bounded data sets for the selected machine learning model to process;
inputting the segmented data sets into the machine learning model to obtain prediction results;
and writing the prediction result into the reply message queue identified by the client ID, attaching the request ID of the corresponding streaming request data to the prediction result, so that prediction results can be reassembled in order if they arrive out of order.
The invention further discloses a machine learning model online service deployment system for streaming data, comprising:
a construction module for constructing a streaming-data-oriented machine learning model online service framework comprising a unified external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service; the real-time streaming data processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
a first receiving module for receiving connection requests from the application clients and establishing bidirectional communication connections between the application clients and the gRPC server nodes in the gRPC service cluster using gRPC bidirectional streaming;
a second receiving module by which each gRPC server node in the gRPC service cluster continuously receives streaming request data from its application clients and stores it, in order, into the message queue service;
a first monitoring module for monitoring the message queue service and, when new streaming request data is received, selecting a corresponding machine learning model in the distributed model prediction service to perform model prediction and writing the prediction result into the message queue service;
and a second monitoring module for monitoring the message queue service and, when a new prediction result is received, pushing it in real time through the unified external API interface to the application client that issued the request.
Further, the unified external API interface of the streaming-data-oriented machine learning model online service framework is provided by a WEB API gateway;
each gRPC server node in the gRPC service cluster maintains a long-lived connection with its application clients over which bidirectional communication is possible, and is used to receive streaming data requests, cache them in the message queue service, and asynchronously push prediction results to the application client in real time once they are observed in the monitored message queue service;
the message queue service comprises a request message queue for caching streaming request data and a reply message queue for caching prediction results.
Further, the execution units by which the first receiving module establishes bidirectional communication connections between the application clients and the gRPC service cluster include:
a receiving and judging unit for receiving the gRPC request initiated by a gRPC client and determining the currently available gRPC server nodes, wherein the gRPC client is the application client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway;
a selecting and sending unit for selecting a connectable gRPC server node according to a preset load balancing policy and sending the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with that gRPC server node according to the node information;
and an assignment and interaction unit for assigning a unique client ID to the gRPC client that has established the bidirectional communication connection, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, realizing the interaction between the gRPC server node and the gRPC client.
Further, the specific execution units by which the second receiving module receives streaming request data via the gRPC server node and stores it, in order, into the message queue service include:
a generation unit for generating a request ID for each piece of streaming request data according to the order in which the gRPC server node receives the streaming request data;
and a sending unit for sending each piece of streaming request data, with its request ID attached, in order to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply, and the model prediction parameters.
Further, the specific execution units by which the first monitoring module monitors the message queue service and finally writes the prediction result into the message queue service include:
a first monitoring unit for monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data stream, using a window mode chosen for the application scenario, into bounded data sets for the selected machine learning model to process;
a model prediction unit for inputting the segmented data sets into the machine learning model to obtain prediction results;
and a writing unit for writing the prediction result into the reply message queue identified by the client ID and attaching the request ID of the corresponding streaming request data to the prediction result, so that prediction results can be reassembled in order if they arrive out of order.
The technical scheme above yields the following beneficial effects:
The invention discloses a method and system for deploying an online machine learning model service for streaming data. The method comprises: constructing a streaming-data-oriented machine learning model online service framework comprising a unified external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service, the latter comprising a plurality of machine learning models; receiving connection requests from application clients and establishing bidirectional communication connections between the application clients and the gRPC server nodes in the gRPC service cluster using gRPC bidirectional streaming; each gRPC server node in the gRPC service cluster receiving streaming request data and storing it into the message queue service; and monitoring the message queue service, selecting a corresponding machine learning model to perform model prediction whenever new streaming request data is received, writing the prediction result into the message queue service, and pushing it in real time to the application client through the unified external API interface.
The distributed service deployment architecture achieves near-real-time processing of streaming data and can handle data in large volumes; the specific advantages include:
1) Online model serving is converted from batch prediction to the real-time prediction of stream processing, effectively reducing processing latency. By developing a real-time processing channel using streaming technology, event data is cached in a message queue, the model service monitors the event data stored in the queue, predicts and writes the result back to the queue in real time, and then responds to the user, fully realizing the real-time character of the method; this real-time character keeps the data fresh and lets the model respond to the latest changes.
2) In the method, data is processed by a system designed for unbounded stream processing: input data is divided into windows of a size appropriate to the application scenario, and each window is processed as an independent bounded data set, greatly improving data processing efficiency and reducing processing latency.
3) The streaming-data-oriented machine learning model online service framework adopts an architecture that supports load balancing of model computation and parallel scaling, so that ever-growing streaming data volumes and model processing loads are handled effectively, computing capacity can be distributed evenly over time, and computing performance is high.
It should be understood that all combinations of the foregoing concepts, as well as additional concepts described in more detail below, may be considered a part of the inventive subject matter of the present disclosure as long as such concepts are not mutually inconsistent.
The foregoing and other aspects, embodiments, and features of the present teachings will be more fully understood from the following description, taken together with the accompanying drawings. Other additional aspects of the invention, such as features and/or advantages of the exemplary embodiments, will be apparent from the description which follows, or may be learned by practice of the embodiments according to the teachings of the invention.
Drawings
The drawings are not necessarily drawn to scale. In the drawings, identical or nearly identical components illustrated in various figures may be represented by the same numeral. For clarity, not every component is labeled in every drawing. Embodiments of various aspects of the invention will now be described, by way of example, with reference to the accompanying drawings, in which:
FIG. 1 is a schematic diagram of an online service framework of a machine learning model for streaming data according to the present invention;
FIG. 2 is a schematic diagram of load balancing across gRPC server nodes implemented via the external API interface of the model online service;
FIG. 3 is a schematic diagram of a gRPC server node preprocessing and buffering streaming request data;
FIG. 4 is a schematic diagram of the distributed model prediction service subscribing to streaming request data;
FIG. 5 is a schematic diagram of the distributed model prediction service writing a prediction result to a message queue;
FIG. 6 is a schematic diagram of a prediction result being pushed to an application client;
FIG. 7 is a flow chart of the machine learning model online service deployment method for streaming data.
Detailed Description
In order to make the objects, technical solutions, and advantages of the embodiments of the present invention clearer, the technical solutions of the embodiments will be described clearly and completely below with reference to the accompanying drawings. It will be apparent that the described embodiments are some, but not all, embodiments of the invention. All other embodiments obtained by a person of ordinary skill in the art without creative effort, based on the described embodiments, fall within the protection scope of the invention. Unless defined otherwise, technical or scientific terms used herein have the ordinary meaning understood by one of ordinary skill in the art to which the invention belongs.
The terms "first," "second," and the like in the description and claims do not denote any order, quantity, or importance, but merely distinguish different elements. Likewise, unless the context clearly indicates otherwise, singular forms such as "a," "an," or "the" do not denote a limitation of quantity but rather the presence of at least one. Terms such as "comprises" or "comprising" mean that the element or article preceding the term covers the features, integers, steps, operations, elements, and/or components listed after the term, without excluding the presence or addition of one or more other features, integers, steps, operations, elements, components, and/or groups thereof.
Most existing model online service deployment methods process data in batch mode. Batch processing can complete data processing batch by batch, but on the one hand the data processed is mostly bounded data and unbounded data cannot be handled, and on the other hand the processing efficiency is low, processing is delayed, and real-time prediction is impossible; adopting a stream-processing mode can effectively overcome these shortcomings of batch processing. Based on the characteristics of streaming data, the invention provides a machine learning model online service deployment method and system for streaming data that adopt a distributed architecture to decouple the model service and the WEB service and make them mutually independent, develop a real-time processing channel using streaming technology, realize real-time machine learning prediction on streaming data, provide more real-time data analysis capability, and achieve high service availability.
The method and system for deploying an online machine learning model service for streaming data disclosed by the invention are further described below in combination with the specific embodiments shown in the drawings.
The embodiment shown in FIG. 7 discloses the flow of a machine learning model online service deployment method for streaming data, which specifically comprises the following steps:
Step S102: construct a streaming-data-oriented machine learning model online service framework comprising a unified external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service; the real-time streaming data processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes.
In practice, the online service framework realizes continuous transmission of streaming data and real-time machine learning analysis. As shown in FIG. 1, the unified external API interface of the model online service is provided by the WEB API gateway, which reduces the interaction complexity between application clients and servers and centrally handles task partitioning and server-side load balancing. The gRPC service cluster gathers multiple gRPC server nodes to provide the same service, which raises the overall computing capacity of the service while appearing to the application client as a single service node. Each gRPC server node maintains a long-lived connection with its application clients over which bidirectional communication is possible; it receives streaming data requests, caches them in the message queue service, monitors the message queue service for prediction results, and pushes the results asynchronously and in real time to the application client. The message queue service caches streaming request data and analysis results, enabling asynchronous request handling and peak shaving during data surges; it comprises a request message queue for caching streaming request data and a reply message queue for caching prediction results. Built on the message queue service, the online service framework achieves load balancing and horizontal scaling of prediction requests across multiple machine learning models, and therefore offers low latency, scalability, and high throughput.
Step S104: receive connection requests from the application clients, and establish bidirectional communication connections between the application clients and the gRPC server nodes in the gRPC service cluster using gRPC bidirectional streaming.
In the load-balancing diagram shown in FIG. 2, the application client is a gRPC client; in this scheme, the application client is a client implementing the gRPC protocol, and it communicates with the gRPC server nodes over gRPC. To establish a connection, the gRPC client initiates a connection request toward the gRPC server nodes in the gRPC service cluster. Specifically, all gRPC server nodes in the cluster register their service information with the WEB API gateway in advance, and when a gRPC client then initiates a gRPC request, the following flow takes place: the gRPC request of the gRPC client is received and the currently available gRPC server nodes are determined; a connectable gRPC server node is selected according to the preset load balancing policy, and the node information is sent to the gRPC client so that the client establishes a bidirectional communication connection with that node according to the node information; and a unique client ID is assigned to the gRPC client that has established the bidirectional communication connection, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, realizing the interaction between the gRPC server node and the gRPC client. As shown in FIG. 2, during the bidirectional communication the gRPC server node caches the client's information until the connection is closed.
The currently available gRPC server nodes are obtained by querying the WEB API gateway. Specifically, each gRPC server node registers its service information, including its IP and port, with the WEB API gateway at startup; the gateway periodically sends a health-check heartbeat to each gRPC server node, and whether a server node is available is judged from the node state returned. In addition, how a gRPC server node is selected differs between load balancing policies. For example, under the polling policy, the WEB API gateway cyclically assigns the received connection requests to the gRPC server nodes in the cluster; under the random policy, the WEB API gateway picks one gRPC server node at random from the cluster list and forwards the connection request to it.
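As a rough sketch of this registration and node-selection flow under the polling and random policies, consider the following; the class and method names are hypothetical, and a real gateway would probe each node over the network instead of trusting an in-process list:

```python
import itertools
import random

class WebApiGateway:
    """Hypothetical sketch of the WEB API gateway's node registry."""

    def __init__(self, policy: str = "round_robin"):
        self.nodes = []        # registered (ip, port) pairs
        self.policy = policy   # "round_robin" (polling) or "random"
        self._cursor = None    # round-robin position

    def register(self, ip: str, port: int) -> None:
        """Called by a gRPC server node at startup with its IP and port."""
        self.nodes.append((ip, port))
        self._cursor = itertools.cycle(self.nodes)  # rebuild cursor on change

    def healthy_nodes(self) -> list:
        # Placeholder: the gateway would periodically send a health-check
        # heartbeat to each node and drop any that fail to respond.
        return self.nodes

    def pick_node(self):
        """Return the node a connecting gRPC client should be sent to."""
        candidates = self.healthy_nodes()
        if not candidates:
            raise RuntimeError("no gRPC server node currently available")
        if self.policy == "random":
            return random.choice(candidates)
        return next(self._cursor)  # cycle through nodes under the polling policy
```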
Step S106: each gRPC server node in the gRPC service cluster continuously receives streaming request data from its application clients and stores it, in order, into the message queue service.
The streaming request data initiated by an application client is sent over the established bidirectional communication connection to the corresponding gRPC server node, which processes it further. In this embodiment, as shown in FIG. 3, the gRPC server node allocates a thread or process to each gRPC client to handle its requests and writes the received request data of each gRPC client, in order, into the corresponding request message queue. Specifically, the gRPC server node generates a request ID for each piece of streaming request data according to the order in which the streaming request data is received; it then sends each piece of streaming request data, with its request ID attached, to the request message queue identified by the client ID. The streaming request data carries three kinds of information: the model input data, the machine learning model to apply, and the model prediction parameters. Within the bidirectional communication link formed by a gRPC client and a gRPC server node, the model input data is recorded in every piece of streaming request data, whereas the machine learning model to apply and the model prediction parameters may be marked only in the first streaming request, determined by request order; subsequent streaming requests then use the same machine learning model and model prediction parameters.
Optionally, when the request message queue is a single shared queue, the gRPC server node, after generating the request ID, sends each piece of streaming request data with both the request ID and the client ID attached to the message queue, so that the result of processing the request can later be fed back to the corresponding application client.
Step S108: monitor the message queue service; when new streaming request data is received, select a corresponding machine learning model in the distributed model prediction service to perform model prediction, and write the prediction result into the message queue service.
The listening work is performed by the distributed model prediction services; specifically, all distributed model prediction services subscribe to the data in the request message queue and the reply message queue in a group-subscription manner. For streaming request data, the process from monitoring the message queue service to writing the prediction result into the message queue service is as follows: the request message queue is monitored, and when new streaming request data is received, the unbounded data stream is segmented, using a window mode chosen for the application scenario, into bounded data sets for the selected machine learning model to process; the segmented data sets are then fed into the machine learning model to obtain prediction results. As shown in FIG. 4, each time a distributed model prediction service receives one piece of streaming request data from its subscription, it parses the content and determines which machine learning model to use, how to acquire the model input data, and how to cut the unbounded data stream (including the window mode and window size determined by the application scenario); it thereby obtains the machine learning model and the corresponding input data and performs the model prediction.
After the model prediction completes and a prediction result is obtained, the result is written into the reply message queue identified by the client ID, with the request ID of the corresponding streaming request data attached, so that prediction results can be reassembled in order if they arrive out of order. As shown in FIG. 5, the distributed model prediction service obtains the client ID and the request ID while processing the streaming request data, and accordingly writes the prediction result into the reply message queue of the corresponding client ID.
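A minimal sketch of this subscribe, window, predict, and write-back loop is given below, using a fixed-size count window as one possible window mode; the model registry, queue client, and field names are all assumptions, and model.predict is assumed to return one prediction per input:

```python
class PredictionWorker:
    """Hypothetical instance of the distributed model prediction service."""

    def __init__(self, queue_service, model_registry, window_size: int = 8):
        self.queue_service = queue_service
        self.model_registry = model_registry  # model name -> loaded model
        self.window_size = window_size        # count window, one possible window mode
        self.buffers = {}                     # client ID -> requests awaiting a full window

    def on_request(self, msg: dict) -> None:
        """Called for each streaming request consumed from the request queue."""
        window = self.buffers.setdefault(msg["client_id"], [])
        window.append(msg)
        if len(window) < self.window_size:
            return                            # window not yet full; keep buffering
        self.buffers[msg["client_id"]] = []   # start collecting the next window

        # Per the scheme, the model and parameters come from the first request.
        model = self.model_registry[window[0]["model_name"]]
        inputs = [m["model_input"] for m in window]
        predictions = model.predict(inputs, **window[0]["predict_params"])

        for m, pred in zip(window, predictions):
            self.queue_service.publish(
                f"reply-{m['client_id']}",    # reply queue identified by client ID
                {"request_id": m["request_id"], "prediction": pred},
            )
```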
Step S110: monitor the message queue service; when a new prediction result is received, push it in real time through the unified external API interface to the application client that issued the request.
The prediction results in the reply message queue are further monitored, and a new prediction result is pushed in real time to the corresponding application client as soon as it is received. As shown in FIG. 6, each gRPC server node listens to the reply message queues of the clients it handles; after receiving prediction results, it sorts them by request ID and then pushes them to the application client in order.
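Because each reply carries the original request ID, this in-order push can be realized with a small reorder buffer; a minimal sketch, again with hypothetical names:

```python
import heapq

class ReplyPusher:
    """Hypothetical in-order delivery of prediction results to one client."""

    def __init__(self, send_to_client):
        self.send_to_client = send_to_client  # e.g. writes to the gRPC response stream
        self.next_id = 1                      # next request ID due for delivery
        self.pending = []                     # min-heap of (request_id, prediction)

    def on_reply(self, request_id: int, prediction) -> None:
        """Called for each prediction result consumed from the reply queue."""
        heapq.heappush(self.pending, (request_id, prediction))
        # Flush every buffered result whose turn has come, so the client
        # always receives predictions in the order it sent the requests.
        while self.pending and self.pending[0][0] == self.next_id:
            _, pred = heapq.heappop(self.pending)
            self.send_to_client(pred)
            self.next_id += 1
```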
In this scheme, after a gRPC server node receives streaming request data from its corresponding gRPC client, the request data is identified by the unique client ID; once the model service finishes processing, the prediction result of the corresponding request is likewise identified by the client ID, so that the gRPC server node, upon receiving a prediction result, can push it to the corresponding gRPC client.
Compared with existing model online service deployment modes, the machine learning model online service deployment method for streaming data disclosed in this embodiment, by developing a real-time processing channel using streaming technology, achieves fast, real-time data processing. In particular, for streaming data, the message queue service avoids repeated uploading of data and the resulting waste of bandwidth, fully realizing real-time machine learning on streaming data. The deployment mode in this method not only decouples the model service and the WEB service and makes them mutually independent, but also allows the service deployment of this scheme to be scaled horizontally so as to support larger data volumes, provide more real-time data analysis capability, and achieve high service availability.
An embodiment of the present invention further provides an electronic device comprising a memory, a processor, and a computer program stored on the memory and executable on the processor; when the processor runs the computer program, the machine learning model online service deployment method for streaming data disclosed in the foregoing embodiments is implemented.
The above-described program may be run on a processor or stored in a memory, i.e., a computer-readable medium. Computer-readable media include permanent and non-permanent, removable and non-removable media, and information storage may be implemented by any method or technology. The information may be computer-readable instructions, data structures, program modules, or other data. Examples of computer storage media include, but are not limited to, phase-change memory (PRAM), static random-access memory (SRAM), dynamic random-access memory (DRAM), other types of random-access memory (RAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), flash memory or other memory technology, compact disc read-only memory (CD-ROM), digital versatile discs (DVD) or other optical storage, magnetic cassettes, magnetic tape or magnetic disk storage or other magnetic storage devices, or any other non-transmission medium that can store information accessible by a computing device. As defined herein, computer-readable media do not include transitory computer-readable media such as modulated data signals and carrier waves.
These computer programs may also be loaded onto a computer or other programmable data processing apparatus so that a series of operational steps are performed on the computer or other programmable apparatus to produce a computer-implemented process, such that the instructions executed on the computer or other programmable apparatus provide steps for implementing the functions specified in one or more blocks of the flowcharts and/or block diagrams; the corresponding method steps may be implemented by different modules.
This embodiment further provides an apparatus or system, which may be called a machine learning model online service deployment system for streaming data, comprising: a construction module for constructing a streaming-data-oriented machine learning model online service framework comprising a unified external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service, wherein the real-time streaming data processing channel comprises a gRPC service cluster and a message queue service, the distributed model prediction service comprises a plurality of machine learning models with model prediction functions, and the gRPC service cluster comprises a plurality of gRPC server nodes; a first receiving module for receiving connection requests from the application clients and establishing bidirectional communication connections between the application clients and the gRPC server nodes in the gRPC service cluster using gRPC bidirectional streaming; a second receiving module by which each gRPC server node in the gRPC service cluster continuously receives streaming request data from its application clients and stores it, in order, into the message queue service; a first monitoring module for monitoring the message queue service and, when new streaming request data is received, selecting a corresponding machine learning model in the distributed model prediction service to perform model prediction and writing the prediction result into the message queue service; and a second monitoring module for monitoring the message queue service and, when a new prediction result is received, pushing it in real time through the unified external API interface to the application client that issued the request.
The steps of the machine learning model online service deployment method for streaming data were described in the embodiment above and are not repeated here.
For example, the unified external API interface of the streaming-data-oriented machine learning model online service framework constructed by the construction module is provided by a WEB API gateway; each gRPC server node in the gRPC service cluster of the real-time streaming data processing channel maintains a long-lived connection with its application clients over which bidirectional communication is possible, and is used to receive streaming data requests, cache them in the message queue service, and asynchronously push prediction results to the application client in real time once they are observed in the monitored message queue service; the message queue service of the real-time streaming data processing channel comprises a request message queue for caching streaming request data and a reply message queue for caching prediction results.
For another example, the execution units by which the first receiving module establishes bidirectional communication connections between the application clients and the gRPC service cluster include:
a receiving and judging unit for receiving the gRPC request initiated by a gRPC client and determining the currently available gRPC server nodes, wherein the gRPC client is the application client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway; a selecting and sending unit for selecting a connectable gRPC server node according to a preset load balancing policy and sending the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with that gRPC server node according to the node information; and an assignment and interaction unit for assigning a unique client ID to the gRPC client that has established the bidirectional communication connection, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, realizing the interaction between the gRPC server node and the gRPC client.
For another example, the specific execution units by which the second receiving module receives streaming request data via the gRPC server node and stores it, in order, into the message queue service include: a generation unit for generating a request ID for each piece of streaming request data according to the order in which the gRPC server node receives the streaming request data; and a sending unit for sending each piece of streaming request data, with its request ID attached, in order to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply, and the model prediction parameters.
For another example, the specific execution units by which the first monitoring module monitors the message queue service and finally writes the prediction result into the message queue service include: a first monitoring unit for monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data stream, using a window mode chosen for the application scenario, into bounded data sets for the selected machine learning model to process; a model prediction unit for inputting the segmented data sets into the machine learning model to obtain prediction results; and a writing unit for writing the prediction result into the reply message queue identified by the client ID and attaching the request ID of the corresponding streaming request data to the prediction result, so that prediction results can be reassembled in order if they arrive out of order.
For another example, the execution units by which the second monitoring module monitors the message queue service and finally pushes the prediction result in real time to the application client that issued the request include: a second monitoring unit for monitoring the reply message queue; and a pushing unit for pushing, whenever the reply message queue is observed to receive a new prediction result, that result in real time to the application client corresponding to the client ID that identifies the reply message queue.
The method and system disclosed by the invention enable the streaming-data-oriented machine learning model online service framework to provide an asynchronous WEB interface to the outside, to receive, cache, and process data in real time, and to send back analysis results, thereby avoiding pointless blocking of the client. By exploiting a distributed service deployment architecture, the deployment scheme effectively overcomes drawbacks of existing machine learning model online service deployment methods, such as high processing latency and low computing performance, achieves near-real-time processing of streaming data, and can handle data in large volumes.
While the invention has been described with reference to preferred embodiments, it is not limited to them. Those skilled in the art will appreciate that various modifications and adaptations can be made without departing from the spirit and scope of the invention. Accordingly, the scope of the invention is defined by the appended claims.

Claims (8)

1. An online service deployment method of a machine learning model for streaming data, characterized by comprising the following steps:
constructing a streaming-data-oriented machine learning model online service framework, which comprises a unified external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service; the real-time streaming data processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
receiving connection requests from the application clients, and establishing bidirectional communication connections between the application clients and the gRPC server nodes in the gRPC service cluster using gRPC bidirectional streaming;
each gRPC server node in the gRPC service cluster continuously receiving streaming request data from its application clients and storing it, in order, into the message queue service;
monitoring the message queue service and, when new streaming request data is received, selecting a corresponding machine learning model in the distributed model prediction service to perform model prediction and writing the prediction result into the message queue service;
monitoring the message queue service and, when a new prediction result is received, pushing it in real time through the unified external API interface to the application client that issued the request;
wherein the unified external API interface of the model online service is provided by a WEB API gateway;
each gRPC server node in the gRPC service cluster maintains a long-lived connection with its application clients over which bidirectional communication is possible, and is used to receive streaming data requests, cache them in the message queue service, and asynchronously push prediction results to the application client in real time once they are observed in the monitored message queue service;
the message queue service comprises a request message queue for caching streaming request data and a reply message queue for caching prediction results.
2. The online service deployment method of a machine learning model for streaming data according to claim 1, wherein the application client is a gRPC client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway;
when a gRPC client initiates a gRPC request, a bidirectional communication connection and interaction with a gRPC server node is established as follows:
receiving the gRPC request of the gRPC client and determining the currently available gRPC server nodes;
selecting a connectable gRPC server node according to a preset load balancing policy, and sending the node information to the gRPC client so that the gRPC client establishes a bidirectional communication connection with that gRPC server node according to the node information;
and assigning a unique client ID to the gRPC client that has established the bidirectional communication connection, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, realizing the interaction between the gRPC server node and the gRPC client.
3. The online service deployment method of a machine learning model for streaming data according to claim 2, wherein the gRPC server node receives streaming request data and stores it, in order, into the message queue service as follows:
the gRPC server node generates a request ID for each piece of streaming request data according to the order in which the streaming request data is received;
each piece of streaming request data, with its request ID attached, is sent in order to the request message queue identified by the client ID; the streaming request data comprises the model input data, the machine learning model to apply, and the model prediction parameters.
4. The online service deployment method of a machine learning model for streaming data according to claim 3, wherein the process from monitoring the message queue service to writing the prediction result into the message queue service is:
monitoring the request message queue and, when new streaming request data is received, segmenting the unbounded data stream, using a window mode chosen for the application scenario, into bounded data sets for the selected machine learning model to process;
inputting the segmented data sets into the machine learning model to obtain prediction results;
and writing the prediction result into the reply message queue identified by the client ID, attaching the request ID of the corresponding streaming request data to the prediction result, so that prediction results can be reassembled in order if they arrive out of order.
5. A machine learning model online service deployment system for streaming data, characterized by comprising:
a construction module for constructing a streaming-data-oriented machine learning model online service framework comprising a unified external API interface for the model online service, a real-time streaming data processing channel, and a distributed model prediction service; the real-time streaming data processing channel comprises a gRPC service cluster and a message queue service; the distributed model prediction service comprises a plurality of machine learning models with model prediction functions; the gRPC service cluster comprises a plurality of gRPC server nodes;
a first receiving module for receiving connection requests from the application clients and establishing bidirectional communication connections between the application clients and the gRPC server nodes in the gRPC service cluster using gRPC bidirectional streaming;
a second receiving module by which each gRPC server node in the gRPC service cluster continuously receives streaming request data from its application clients and stores it, in order, into the message queue service;
a first monitoring module for monitoring the message queue service and, when new streaming request data is received, selecting a corresponding machine learning model in the distributed model prediction service to perform model prediction and writing the prediction result into the message queue service;
a second monitoring module for monitoring the message queue service and, when a new prediction result is received, pushing it in real time through the unified external API interface to the application client that issued the request;
wherein the unified external API interface of the machine learning model online service framework constructed by the construction module is provided by a WEB API gateway; each gRPC server node in the gRPC service cluster maintains a long-lived connection with its application clients over which bidirectional communication is possible, and is used to receive streaming data requests, cache them in the message queue service, and asynchronously push prediction results to the application client in real time once they are observed in the monitored message queue service; the message queue service comprises a request message queue for caching streaming request data and a reply message queue for caching prediction results.
6. The machine learning model online service deployment system for streaming data according to claim 5, wherein the execution units by which the first receiving module establishes bidirectional communication connections between the application clients and the gRPC service cluster include:
a receiving and judging unit for receiving the gRPC request initiated by a gRPC client and determining the currently available gRPC server nodes, wherein the gRPC client is the application client, and each gRPC server node in the gRPC service cluster registers its service information with the WEB API gateway;
a selecting and sending unit for selecting a connectable gRPC server node according to a preset load balancing policy and sending the node information to the gRPC client, so that the gRPC client establishes a bidirectional communication connection with that gRPC server node according to the node information;
and an assignment and interaction unit for assigning a unique client ID to the gRPC client that has established the bidirectional communication connection, so that the gRPC server node receives the streaming request data sent by the gRPC client corresponding to that client ID, realizing the interaction between the gRPC server node and the gRPC client.
7. The system for deploying a machine learning model online service for streaming data according to claim 6, wherein the specific execution units by which the second receiving module receives the streaming request data through the gRPC server nodes and sequentially stores it into the message queue service comprise:
a generation unit configured to generate a request ID for each item of streaming request data according to the order in which the gRPC server node receives it;
and a sending unit configured to send each item of streaming request data, with its request ID attached, to the request message queue in sequence, using the client ID as the identifier; the streaming request data comprises the model input data, the machine learning model to be applied for prediction, and the model prediction parameters.
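The generation and sending units might reduce to a few lines, as in the sketch below, which assumes Redis Streams as the queue backend and an in-process counter for arrival-ordered request IDs; the patent names only an abstract message queue service, so both choices are assumptions.

```python
# Sketch: stamp each stream item with an arrival-ordered request ID and
# append it to the per-client request message queue (Redis Streams assumed).
import itertools
import json

import redis

r = redis.Redis()
_req_counter = itertools.count(1)


def enqueue_request(client_id: str, model_input, model_name: str, params: dict):
    request_id = next(_req_counter)       # ordered by time of receipt
    r.xadd(f"requests:{client_id}", {     # queue keyed by client ID
        "request_id": request_id,
        "payload": json.dumps({
            "input": model_input,         # model input data
            "model": model_name,          # which deployed model to apply
            "params": params,             # model prediction parameters
        }),
    })
    return request_id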
8. The system for deploying a machine learning model online service for streaming data according to claim 7, wherein the specific execution units by which the first monitoring module monitors the message queue service and writes the prediction result into the message queue service comprise:
a first monitoring unit configured to monitor the request message queue and, upon receiving new streaming request data, to segment the unbounded data stream, based on the window mode suited to the application scenario, into data sets that the selected machine learning model can process;
a model prediction unit configured to feed the segmented data sets into the machine learning model to obtain prediction results;
and a writing unit configured to write each prediction result into the reply message queue identified by the client ID, attaching the request ID of the corresponding streaming request data to the prediction result so that the prediction results can be reconstructed in order if they arrive unordered.
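Putting claim 8 together, the sketch below shows one possible monitoring/prediction/write-back loop, again assuming Redis Streams, a count-based tumbling window, and a model object with a batch predict() method; the patent selects the window mode per application scenario, so the fixed window size here is only illustrative.

```python
# Sketch: drain the request queue, cut the unbounded stream into windows,
# run the chosen model, and tag each reply with its originating request ID.
import json

import redis

r = redis.Redis()


def consume(client_id: str, model, window_size: int = 32):
    stream, last_id = f"requests:{client_id}", "0"
    window = []                                   # (request_id, input) pairs
    while True:
        for _, entries in r.xread({stream: last_id}, block=1000) or []:
            for entry_id, fields in entries:
                last_id = entry_id
                payload = json.loads(fields[b"payload"])
                window.append((int(fields[b"request_id"]), payload["input"]))
        if len(window) >= window_size:            # tumbling window is full
            ids, inputs = zip(*window[:window_size])
            window = window[window_size:]
            for req_id, pred in zip(ids, model.predict(list(inputs))):
                r.xadd(f"replies:{client_id}", {  # per-client reply queue
                    "request_id": req_id,         # lets the client restore order
                    "prediction": json.dumps(pred),  # assumes JSON-serializable output
                })
```

Because every reply carries the originating request_id, a client receiving results out of order can sort or reassemble them, which is exactly what the writing unit's ID attachment enables.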
CN202310015610.8A 2023-01-06 2023-01-06 Online service deployment method and system of machine learning model for streaming data Active CN115756875B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202310015610.8A CN115756875B (en) 2023-01-06 2023-01-06 Online service deployment method and system of machine learning model for streaming data

Publications (2)

Publication Number Publication Date
CN115756875A CN115756875A (en) 2023-03-07
CN115756875B true CN115756875B (en) 2023-05-05

Family

ID=85348261

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202310015610.8A Active CN115756875B (en) 2023-01-06 2023-01-06 Online service deployment method and system of machine learning model for streaming data

Country Status (1)

Country Link
CN (1) CN115756875B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111752795A (en) * 2020-06-18 2020-10-09 多加网络科技(北京)有限公司 Full-process monitoring alarm platform and method thereof
CN111787066A (en) * 2020-06-06 2020-10-16 王科特 Internet of things data platform based on big data and AI

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN111383659B (en) * 2018-12-28 2021-03-23 广州市百果园网络科技有限公司 Distributed voice monitoring method, device, system, storage medium and equipment
CN110300050B (en) * 2019-05-23 2023-02-07 中国平安人寿保险股份有限公司 Message pushing method and device, computer equipment and storage medium
CN111200606A (en) * 2019-12-31 2020-05-26 深圳市优必选科技股份有限公司 Deep learning model task processing method, system, server and storage medium
US11206316B2 (en) * 2020-03-04 2021-12-21 Hewlett Packard Enterprise Development Lp Multiple model injection for a deployment cluster

Also Published As

Publication number Publication date
CN115756875A (en) 2023-03-07

Similar Documents

Publication Publication Date Title
CN109074377B (en) Managed function execution for real-time processing of data streams
CN110874440B (en) Information pushing method and device, model training method and device, and electronic equipment
US20160127284A1 (en) Cloud messaging services optimization through adaptive message compression
CN111277848B (en) Method and device for processing interactive messages in live broadcast room, electronic equipment and storage medium
CN106131213A (en) A kind of service management and system
CN113010818A (en) Access current limiting method and device, electronic equipment and storage medium
WO2019014114A1 (en) Scaling hosts in distributed event handling systems
WO2021244473A1 (en) Frequency control method and apparatus
CN108427619B (en) Log management method and device, computing equipment and storage medium
Srirama et al. Croudstag: social group formation with facial recognition and mobile cloud services
CN105577772A (en) Material receiving method, material uploading method and device
CN106130960A (en) Judgement system, load dispatching method and the device of steal-number behavior
CN110866040A (en) User portrait generation method, device and system
Panchali Edge computing-background and overview
CN104753922A (en) Method for pre-loading, server side, client and system
TWI602431B (en) Method and device for transmitting information
US10638336B2 (en) Method of enabling improved efficiency in a first communications network, network entity, network device, computer programs and computer program products
CN111796935B (en) Consumption instance distribution method and system for calling log information
CN115756875B (en) Online service deployment method and system of machine learning model for streaming data
CN116028696A (en) Resource information acquisition method and device, electronic equipment and storage medium
CN111737297B (en) Method and device for processing link aggregation call information
CN115914375A (en) Disaster tolerance processing method and device for distributed message platform
CN115086194A (en) Data transmission method for cloud application, computing equipment and computer storage medium
CN113867946A (en) Method, device, storage medium and electronic equipment for accessing resources
CN113873025A (en) Data processing method and device, storage medium and electronic equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant