CN113419750B - Model reasoning service calling system and method - Google Patents

Model reasoning service calling system and method

Info

Publication number
CN113419750B
Authority
CN
China
Prior art keywords
service
model
application
container
inference
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN202110976068.3A
Other languages
Chinese (zh)
Other versions
CN113419750A (en)
Inventor
张险全
薛延波
赵鹏
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Beijing Hanlan Wolf Technology Co ltd
Original Assignee
Beijing Huapin Borui Network Technology Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Beijing Huapin Borui Network Technology Co Ltd filed Critical Beijing Huapin Borui Network Technology Co Ltd
Priority to CN202110976068.3A priority Critical patent/CN113419750B/en
Publication of CN113419750A publication Critical patent/CN113419750A/en
Application granted granted Critical
Publication of CN113419750B publication Critical patent/CN113419750B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/61 Installation
    • G06F8/63 Image based installation; Cloning; Build to order
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/60 Software deployment
    • G06F8/65 Updates
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F8/00 Arrangements for software engineering
    • G06F8/70 Software maintenance or management
    • G06F8/71 Version control; Configuration management
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/54 Interprogram communication
    • G06F9/547 Remote procedure calls [RPC]; Web services
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00 Machine learning
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06N COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N5/00 Computing arrangements using knowledge-based models
    • G06N5/04 Inference or reasoning models
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/54 Indexing scheme relating to G06F9/54
    • G06F2209/544 Remote

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • General Engineering & Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Artificial Intelligence (AREA)
  • Data Mining & Analysis (AREA)
  • Evolutionary Computation (AREA)
  • Computing Systems (AREA)
  • Computer Security & Cryptography (AREA)
  • Computational Linguistics (AREA)
  • Computer Vision & Pattern Recognition (AREA)
  • Medical Informatics (AREA)
  • Stored Programmes (AREA)

Abstract

The embodiment of the invention discloses a model inference service calling system, which comprises: a service application cluster comprising a plurality of service application nodes, each service application node being provided with an application container and each application container being provided with a model inference service SDK; an inference service cluster comprising a plurality of inference service containers, each of which calls the loaded model in the model library to perform inference and returns an inference result; a feature library for storing feature data of a plurality of versions of models; a model library for storing the other data of the plurality of versions of models; and an automatic release deployment platform that selects a model from the model library according to a user request and configures the business application cluster and the inference service cluster. The embodiment of the invention also discloses a model inference service calling method. The invention can efficiently complete the algorithm model inference service and decouples it from the business application, thereby better realizing dual-track iteration.

Description

Model reasoning service calling system and method
Technical Field
The invention relates to the technical field of computers, in particular to a model inference service calling system and method.
Background
In the field of AI, algorithm model inference is an important link. Existing algorithm model inference services lack a standardized workflow, so iteration efficiency is low; moreover, business applications are coupled too tightly to the algorithm model inference service, which in turn couples the training and iteration of the algorithm model too tightly to the evolution of the system architecture, so the two affect each other, updates are difficult, and efficiency is low.
Disclosure of Invention
In order to solve the above problems, the present invention aims to provide a system and a method for calling a model inference service, which can efficiently complete an algorithm model inference service, decouple the algorithm model inference service from a business application, and implement better dual-track iteration.
The embodiment of the invention provides a model inference service calling system, which comprises:
the system comprises two service application clusters which respectively support a local mode and a remote mode, wherein each service application cluster comprises a plurality of service application nodes, each service application node is provided with an application container, each application container is provided with a model inference service SDK, one of the two service application clusters is determined as a target service application cluster according to a service scene corresponding to a user request, the user request is routed to each application container in the target service application cluster, each application container respectively obtains feature data from the feature library according to the received user request for preprocessing and respectively invokes a model inference service interface of the SDK, and each model inference service SDK respectively invokes the corresponding inference service container according to a request parameter of the model inference service interface;
the reasoning service cluster comprises a plurality of reasoning service containers, and each reasoning service container calls the loaded model in the model library to carry out reasoning and returns a reasoning result to each application container;
the feature library is used for storing feature data of a plurality of versions of models;
the model library is used for storing other data of the models of the plurality of versions, and the other data comprises model files and model metadata;
and the automatic release deployment platform selects a model to be deployed from the model library according to a user request, and configures the business application cluster and the inference service cluster.
As a further improvement of the present invention, the system further comprises: an algorithm platform for updating the models in the model library,
the inference service container is provided with a model monitoring module to monitor whether the model in the model base has a new version or not and download the model of the new version from the model base when the new version exists.
As a further improvement of the present invention, for a business application cluster supporting local mode,
the application container and the inference service container are deployed on the same service application node, and the model inference service SDK calls the inference service container through a local mode.
As a further improvement of the invention, for the service application cluster supporting the remote mode, the application container and the inference service container are deployed on different service application nodes, and the model inference service SDK calls the inference service container through the remote mode.
As a further improvement of the present invention, the business application mirror image and the model inference service mirror image are customized and distributed in the mirror repository according to the business application and the model inference service,
the resource configuration of the target business application cluster is determined according to a user request,
in the target business application cluster, model inference service mirror images are deployed on the business application nodes where the inference service containers are located, and business application mirror images are deployed on the business application nodes where the application containers are located.
The embodiment of the invention also provides a model inference service calling method, which comprises the following steps:
the access layer receives a user request, selects a target service application cluster according to the service scene, and routes the user request to each application container in the target service application cluster, wherein the target service application cluster is a service application cluster supporting a local mode or a remote mode;
each application container acquires feature data from the feature library for preprocessing according to the received user request and respectively calls a model reasoning service interface of the model reasoning service SDK;
each model reasoning service SDK calls a corresponding reasoning service container in a local mode or a remote mode according to the request parameters of a reasoning service interface, wherein the request parameters of the reasoning service interface are obtained by conversion according to model metadata in a model library;
and each inference service container calls the loaded model in the model library respectively to carry out inference and returns an inference result to each application container.
As a further improvement of the invention, the inference service container is provided with a model monitoring module, the inference service container is initialized after being started, and the model monitoring module is started,
the method further comprises the following steps:
the inference service container monitors whether the model in the model base has a new version, if so, the next step is carried out, otherwise, the model in the model base is continuously monitored;
the inference service container downloads a new version of the model from the model library, carries out validity verification, carries out the next step if the verification is successful, and otherwise rolls back, unloads the new version of the model and sends alarm information;
the inference service container preheats the new version of the model;
and the inference service container brings the new version of the model online, unloads the old version of the model, and continues to monitor the model in the model library.
As a further improvement of the present invention, the method further comprises:
customizing mirror images for business application and model inference service and packaging and issuing to a mirror image warehouse,
setting the resource allocation of the target service application cluster according to the user request,
in the target business application cluster, deploying the model inference service mirror images on the business application nodes where the inference service containers are located, and deploying the business application mirror images on the business application nodes where the application containers are located.
As a further improvement of the invention, for the business application cluster supporting the local mode, the application container and the inference service container are deployed on the same business application node,
the deploying the model inference service mirror image on the business application node where each inference service container is located, and the deploying the business application mirror image on the business application node where each application container is located, includes:
s11, the automatic release deployment platform deploys the model inference service mirror image on each business application node;
s12, each inference service container is initialized after being started, if the initialization is successful, the model inference service mirror image is deployed successfully, the inference service container starts a model monitoring module and continues the next step, otherwise, the automatic release deployment platform terminates all inference service containers, the model inference service mirror image is deployed unsuccessfully, and the deployment process is finished;
s13, the automatic release deployment platform deploys the service application mirror images on each service application node;
s14, initializing after starting each application container, if the initialization is successful, the service application mirror image deployment is successful, and continuing the next step, otherwise, the automatic release deployment platform terminates all application containers, the service application mirror image deployment is failed, and the deployment process is finished;
and S15, repeating S11-S14 until all the service application nodes are deployed.
As a further improvement of the invention, for the business application cluster supporting the remote mode, the application container and the inference service container are deployed on different business application nodes,
the deploying the model inference service mirror image on the business application node where each inference service container is located, and the deploying the business application mirror image on the business application node where each application container is located, includes:
s21, the automatic release deployment platform deploys the model inference service mirror image on the service application node where each inference service container is located;
s22, each inference service container is initialized after being started, if the initialization is successful, the model inference service mirror image is deployed successfully, the inference service container starts a model monitoring module and continues the next step, otherwise, the automatic release deployment platform terminates all inference service containers, the model inference service mirror image is deployed unsuccessfully, and the deployment process is finished;
s23, repeating S21 and S22 until all the business application nodes where the inference service containers are located are deployed;
s24, the automatic release deployment platform deploys the service application mirror images on the service application nodes where the application containers are located;
s25, initializing after starting each application container, if the initialization is successful, the service application mirror image deployment is successful, and continuing the next step, otherwise, the automatic release deployment platform terminates all application containers, the service application mirror image deployment is failed, and the deployment process is finished;
and S26, repeating S24 and S25 until all the service application nodes where the application containers are located are deployed.
Embodiments of the present invention also provide an electronic device, which includes a memory and a processor, and is characterized in that the memory is configured to store one or more computer instructions, where the one or more computer instructions are executed by the processor to implement the method.
Embodiments of the present invention also provide a computer-readable storage medium, on which a computer program is stored, the computer program being executed by a processor to implement the method.
The invention has the beneficial effects that:
the resources of the business application cluster are isolated, reasonably distributed and called, a high-availability operation environment is provided for the algorithm model reasoning service, the algorithm model reasoning service can be efficiently completed, the iteration efficiency is improved, the algorithm model reasoning service is decoupled with the business application, the operation environments of different services are completely isolated, the training and iteration of the algorithm model are decoupled with the evolution of a system architecture without mutual influence, the double-track iteration can be better realized, and the cooperative work efficiency is improved.
Drawings
In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art will be briefly described below. It is obvious that the drawings in the following description are only some embodiments of the invention, and that for a person skilled in the art, other drawings can be derived from them without inventive effort.
Fig. 1 is a schematic diagram of a model inference service invocation system according to an exemplary embodiment of the present invention.
Detailed Description
The technical solutions in the embodiments of the present invention will be clearly and completely described below with reference to the drawings in the embodiments of the present invention, and it is obvious that the described embodiments are only a part of the embodiments of the present invention, and not all of the embodiments. All other embodiments, which can be derived by a person skilled in the art from the embodiments given herein without making any creative effort, shall fall within the protection scope of the present invention.
It should be noted that, if directional indications (such as up, down, left, right, front, and back … …) are involved in the embodiment of the present invention, the directional indications are only used to explain the relative positional relationship between the components, the movement situation, and the like in a specific posture (as shown in the drawing), and if the specific posture is changed, the directional indications are changed accordingly.
In addition, in the description of the present invention, the terms used are for illustrative purposes only and are not intended to limit the scope of the present invention. The terms "comprises" and/or "comprising" specify the presence of stated elements, steps, operations, and/or components, but do not preclude the presence or addition of one or more other elements, steps, operations, and/or components. The terms "first," "second," and the like are only used to distinguish one element from another and do not necessarily denote order or limit those elements. In addition, in the description of the present invention, "a plurality" means two or more unless otherwise specified. These and/or other aspects will become apparent to those of ordinary skill in the art from the following drawings and description of the embodiments of the present invention. The drawings are only for purposes of illustrating the described embodiments of the invention. One skilled in the art will readily recognize from the following description that alternative embodiments of the structures and methods illustrated in the present application may be employed without departing from the principles described in the present application.
As shown in fig. 1, the model inference service invoking system according to the embodiment of the present invention includes:
the system comprises two service application clusters which respectively support a local mode and a remote mode, wherein each service application cluster comprises a plurality of service application nodes, each service application node is provided with an application container, each application container is provided with a model inference service SDK, one of the two service application clusters is determined as a target service application cluster according to a service scene corresponding to a user request, the user request is routed to each application container in the target service application cluster, each application container respectively obtains feature data from the feature library according to the received user request for preprocessing and respectively invokes a model inference service interface of the SDK, and each model inference service SDK respectively invokes the corresponding inference service container according to a request parameter of the model inference service interface;
the reasoning service cluster comprises a plurality of reasoning service containers, and each reasoning service container calls the loaded model in the model library to carry out reasoning and returns a reasoning result to each application container;
the feature library is used for storing feature data of a plurality of versions of models;
the model library is used for storing other data of the models of the plurality of versions, and the other data comprises model files and model metadata;
and the automatic release deployment platform selects a model to be deployed from the model library according to a user request, and configures the business application cluster and the inference service cluster.
The service application cluster includes a plurality of service application nodes. That is, the Side Car mode service application cluster (local mode service application cluster) includes a plurality of service application nodes, for example, App Node-1 (first application Node), …, App Node-n (nth application Node) shown in fig. 1, where App Node-1 hosts application container-1, …, and App Node-n hosts application container-n. The Rpc mode service application cluster (remote mode service application cluster) likewise includes a plurality of service application nodes, such as App Node-1 (first application Node), …, App Node-n (nth application Node) shown in fig. 1, where App Node-1 hosts application container-1, …, and App Node-n hosts application container-n. The number of service application nodes included in the two service application clusters and the number of application containers arranged on the service application nodes are not particularly limited.
The inference service cluster includes a plurality of inference service containers, such as inference service containers-1, …, inference service container-n shown in FIG. 1. The number of inference service containers which are contained in the inference service cluster is not particularly limited.
The model inference service SDK is the client-side SDK used by the system; it encapsulates the model inference service interface and the model input parameters (covering single request calls, batch request collection, timeout reconnection, local mode calls, remote mode calls and the like). For all service scenes, a single Serving SDK can be used to complete the invocation of the algorithm model inference service, so usability is high and access cost is low. When the corresponding inference service container is called in batch mode, the requests are cached first; once the batch submission threshold is met, the batch request interface of the model inference service SDK is called, otherwise the single request interface of the model inference service SDK is called.
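As a concrete illustration only, the following Python sketch shows what such a Serving SDK client could look like, with single-call, batch-collection and timeout-reconnection behavior; the class and method names (ServingClient, infer, submit, the /v1/... paths) and the transport abstraction are assumptions made for this sketch, not the patent's actual API.

# Illustrative sketch of a Serving SDK client; names and endpoints are assumptions.
from dataclasses import dataclass
from typing import Any, Dict, List


@dataclass
class InferRequest:
    model_name: str
    model_version: str
    features: Dict[str, Any]


class ServingClient:
    """Encapsulates the model inference service interface: single request call,
    batch request collection with a submit threshold, and timeout reconnection."""

    def __init__(self, transport, batch_threshold: int = 32,
                 timeout_s: float = 0.5, max_retries: int = 2):
        self._transport = transport        # local (Side Car) or remote (RPC) transport
        self._batch_threshold = batch_threshold
        self._timeout_s = timeout_s
        self._max_retries = max_retries
        self._pending: List[InferRequest] = []

    def infer(self, request: InferRequest) -> Dict[str, Any]:
        """Single request interface with timeout reconnection."""
        for attempt in range(self._max_retries + 1):
            try:
                return self._transport.call("/v1/infer", request, timeout=self._timeout_s)
            except TimeoutError:
                if attempt == self._max_retries:
                    raise
                self._transport.reconnect()

    def submit(self, request: InferRequest) -> List[Dict[str, Any]]:
        """Batch mode: cache requests and flush through the batch interface once
        the batch submission threshold is met; below the threshold nothing is sent."""
        self._pending.append(request)
        if len(self._pending) < self._batch_threshold:
            return []
        batch, self._pending = list(self._pending), []
        return self._transport.call("/v1/infer_batch", batch, timeout=self._timeout_s)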
The inference service container packages the algorithm framework and publishes the algorithm model as a model inference service; the algorithm model is deployed in the inference service container through packaging, continuous integration is supported, and an algorithm developer only needs to set the relevant parameters of the model to operate it. The inference service container supports two modes: a local mode (Side Car mode), in which the inference service container and the application container are deployed on the same business application node and the application container completes algorithm inference through local calls, and a remote mode (RPC mode), in which the inference service container and the application container are deployed on different business application nodes and the application container completes algorithm inference through remote calls. After the inference service container calls the loaded model in the model library to perform inference, it returns the inference result to the caller of the inference service, namely the application container.
The feature library stores the feature data required by model inference; an algorithm developer produces features and stores them in the feature library for use in online inference. The application container obtains feature data from the feature library and preprocesses it (including feature data conversion, mapping and the like), converting it into the feature data required by the model, namely the input parameters required by the model, which can then be submitted to the inference service container for model inference. Since the feature data stored in the feature library may not match the feature format and type expected by the model input, the feature data needs to be preprocessed for the specific model so that the processed data can be fed into the model.
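A minimal Python sketch of this preprocessing step is given below; the feature names, the mapping table and the feature_store interface are hypothetical examples used only to illustrate the conversion and mapping described above.

# Hypothetical example of mapping raw feature-library values to model inputs.
from typing import Any, Dict, List

# raw feature key -> (model input name, converter); entries are illustrative only
FEATURE_MAPPING = {
    "user_age":              ("age", float),
    "user_city":             ("city_id", int),
    "recent_click_sequence": ("clicks", lambda v: [int(x) for x in v.split(",")]),
}


def fetch_features(feature_store, user_id: str, keys: List[str]) -> Dict[str, Any]:
    """Read the raw feature values for one request from the feature library."""
    return {k: feature_store.get(user_id, k) for k in keys}


def preprocess(raw: Dict[str, Any]) -> Dict[str, Any]:
    """Convert and map raw feature values into the input parameters the model expects."""
    model_inputs: Dict[str, Any] = {}
    for raw_key, (input_name, convert) in FEATURE_MAPPING.items():
        if raw_key in raw:
            model_inputs[input_name] = convert(raw[raw_key])
    return model_inputs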
The model library is a highly available object storage system that stores the algorithmic models. Multiple versions of the model are stored in the model library, and the inference service container generally loads the latest version of the model. The model library stores a plurality of versions of models, so as to support quick rollback, and when the online loading of the updated models fails or the effect is not in line with expectation, smooth offline rollback can be performed, so that the influence on the service is reduced. It should be noted that the model in the model library may be an online training model or an offline training model, and may be selected according to business requirements.
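To make the versioning behavior concrete, here is a small sketch of version resolution against such a store; the directory layout (one sub-directory per version with a metadata file) and the sortable version names are assumptions for illustration, not a layout the patent prescribes.

# Assumed layout: <model_root>/<model_name>/<version>/ with a metadata.json inside.
import json
from pathlib import Path
from typing import List


def list_versions(model_root: Path, model_name: str) -> List[str]:
    """Assume version names sort lexicographically, e.g. 20210801_001."""
    return sorted(p.name for p in (model_root / model_name).iterdir() if p.is_dir())


def resolve_version(model_root: Path, model_name: str, rollback_steps: int = 0) -> Path:
    """Return the latest version by default, or step back for a quick rollback."""
    versions = list_versions(model_root, model_name)
    return model_root / model_name / versions[-(1 + rollback_steps)]


def load_metadata(version_dir: Path) -> dict:
    """Model metadata (input schema, framework, etc.) stored alongside the model file."""
    return json.loads((version_dir / "metadata.json").read_text())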
The automatic release deployment platform is a platform for automated build, release and deployment; it can complete a project release with one click through a visual interface, reducing manual intervention and improving work efficiency. The platform supports blue-green release, canary release, gray release, production release and other release types, records changes in real time, and supports quick rollback. The relevant configuration carried out according to the data requested by the user includes: selecting, from the model library, the model and the corresponding version to be deployed as the online inference service; selecting, from the mirror image warehouse, the container mirror image used by the model inference service at runtime; configuring the hardware resources required for running the inference service (including the configuration of the application containers and inference service containers in the target business application cluster); setting the input parameters of the model; and the like.
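Purely as an illustration of what such a release configuration might record, the sketch below uses a Python dataclass; every field name and value here is an assumed example rather than the platform's real schema.

# Assumed shape of one release configuration; field names and values are examples only.
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class ReleaseConfig:
    model_name: str                  # model selected from the model library
    model_version: str               # version to deploy as the online inference service
    serving_image: str               # runtime container image from the image warehouse
    release_type: str = "canary"     # blue-green / canary / gray / production
    cpu: str = "4"                   # hardware resources for the inference service
    memory: str = "8Gi"
    gpu: int = 0
    replicas: int = 2
    model_input_params: Dict[str, str] = field(default_factory=dict)


release = ReleaseConfig(
    model_name="ctr_ranker",
    model_version="20210801_003",
    serving_image="registry.example.com/serving/runtime:2.5",
    release_type="gray",
    model_input_params={"batch_threshold": "32", "mode": "local"},
)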
The containers in the service application cluster and the inference service cluster of the system are both stateless, highly available architectures with no single point of failure; if one service application node goes down, the containers on the other service application nodes can still provide services, which guarantees the high availability of the model inference service. The feature library and the model library are both distributed, highly available systems without single points of failure. The system is therefore a highly available, extensible, highly concurrent model algorithm inference service architecture. It supports a local mode and a remote mode, reasonably abstracts and encapsulates the complex invocation links into a Serving SDK (model inference service SDK), supports calling inference services of various algorithm frameworks through the Serving SDK, standardizes the inference service flow of the whole algorithm model, and improves the iteration efficiency of the algorithm model. The algorithm model inference service and the business application are decoupled, and the operating environments of different services are completely isolated, so that the training and iteration of the algorithm model are decoupled from the evolution of the system architecture (namely, of the modules in the system, including the business application cluster, the inference service cluster, the automatic release deployment platform and the like) and do not affect each other; dual-track iteration can thus be better realized, the dependence between cooperating parties is reduced, and the working efficiency between them is improved.
In an optional embodiment, the system further comprises: an algorithm platform for updating the models in the model library,
the inference service container is provided with a model monitoring module to monitor whether the model in the model base has a new version or not and download the model of the new version from the model base when the new version exists.
The inference service container of the invention can monitor model changes in the model library and reload the model when it is updated. A machine learning model is usually updated iteratively over time, and the online service needs to pick up the update promptly to keep the model's online inference accurate. The invention automatically updates, loads and brings online the algorithm model produced by the algorithm platform (which is used for model training), reducing manual operation while leaving online business unaffected.
In an optional implementation manner, for a business application cluster supporting a local mode, the application container and the inference service container are deployed on the same business application node, and the model inference service SDK invokes the inference service container through the local mode.
It can be understood that in the Side Car mode business application cluster, the application container and the inference service container are deployed on the same business application Node, for example, application container-1 and inference service container-1 are deployed on App Node-1, …, and application container-n and inference service container-n are deployed on App Node-n.
In an optional embodiment, for a business application cluster supporting a remote mode, the application container and the inference service container are deployed on different business application nodes, and the model inference service SDK invokes the inference service container through the remote mode.
It can be understood that in the Rpc mode business application cluster, the application container and the inference service container are deployed on different business application nodes, for example, application container-1 is deployed on App Node-1, …, application container-n is deployed on App Node-n, inference service container-1 is deployed on App Node-11, …, and inference service container-n is deployed on App Node-1n, where App Node-11, … and App Node-1n are business application nodes different from App Node-1, … and App Node-n.
It can also be understood that in the local mode the two containers share the hardware resources of the same service application node, so a computation-intensive task may have a certain influence on the application container, but the local calling mode reduces network transmission and is more efficient. The remote mode requires remote calls and may incur some network overhead, but it provides better resource isolation from the algorithm inference service; more inference service containers can be added according to computing performance requirements, and their hardware configuration can be further upgraded. Through the two service application clusters, the system of the invention can select an appropriate deployment mode for the application containers and inference service containers according to different service scenes and requirements. In addition, the algorithm model inference service is packaged as a standard model inference service mirror image, so an algorithm developer only needs to set the relevant model input parameters of the algorithm model through the Serving SDK, without additional development.
In an alternative embodiment, the business application image and the model inference service image are customized and published in an image repository based on the business application and the model inference service,
the resource configuration of the target business application cluster is determined according to a user request,
in the target business application cluster, model inference service mirror images are deployed on the business application nodes where the inference service containers are located, and business application mirror images are deployed on the business application nodes where the application containers are located.
The method and the device can determine the target service application cluster according to the service scene corresponding to the user request, determine the resource configuration (including the configuration of each service application node) of the target service application cluster, route the user request to a specified application container (serving as a target application container), and run the target application container through the service application mirror image. Correspondingly, the target application container calls each inference service container (as a target inference service container), and the target inference service container operates according to the model inference service mirror image.
In the invention, the operating environments and container mirror images of the business application and the model inference service are preset; the application container supports various business scenes, the inference service container supports various algorithm frameworks, and there is no need to build a software environment for running a machine learning algorithm model. By isolating, reasonably allocating and invoking the resources of the business application cluster, a highly available operating environment is provided for the algorithm model inference service. Because both the application containers and the inference service containers are clustered, there is no single point of failure to worry about. The inference service container cluster can also be scaled, which guarantees the resource requirements of the model inference service while reducing the waste of idle resources. In addition, automatic model updating reduces manual operation and improves efficiency without affecting online business.
The embodiment of the invention discloses a model inference service calling method, which comprises the following steps:
s1, the access layer receives the user request, selects the target service application cluster according to the service scene, and routes to each application container in the target service application cluster, wherein, the target service application cluster is the service application cluster supporting the local mode or the remote mode;
s2, each application container respectively obtains feature data from the feature library for preprocessing according to the received user request, and respectively calls a model inference service interface of the model inference service SDK;
s3, each model inference service SDK calls a corresponding inference service container in a local mode or a remote mode according to the request parameters of the inference service interface, wherein the request parameters of the inference service interface are obtained by conversion according to model metadata in a model library;
and S4, each inference service container calls the loaded model in the model library respectively to perform inference, and returns an inference result to each application container.
As described above, when the corresponding inference service container is called in batch mode, the requests are cached first; once the batch submission threshold is met, the batch request interface of the model inference service SDK is called, otherwise the single request interface of the model inference service SDK is called. Therefore, in S3, the interface invocation manner of the application container's model inference service SDK may be determined according to the model input parameters.
It will also be appreciated that the model inference service SDK of the application container determines from the initialization parameters (i.e. the model input parameters) whether to invoke the inference service container in local mode or in remote mode.
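The sketch below illustrates one possible way of resolving the transport from those initialization parameters; the parameter keys and endpoint values are assumptions used only for illustration.

# Hypothetical mode resolution from the SDK initialization parameters.
def build_endpoint(init_params: dict) -> str:
    """Local (Side Car) mode targets the inference service container on the same
    node, e.g. via localhost or a unix socket; remote (RPC) mode targets the
    inference service cluster address."""
    if init_params.get("mode", "local") == "local":
        return init_params.get("local_endpoint", "unix:///var/run/serving.sock")
    return init_params["remote_endpoint"]  # e.g. "inference-cluster.example:8500"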
In an optional implementation mode, the inference service container is provided with a model monitoring module, the inference service container is initialized after being started, and the model monitoring module is started,
the method further comprises the following steps:
the inference service container monitors whether the model in the model base has a new version, if so, the next step is carried out, otherwise, the model in the model base is continuously monitored;
the inference service container downloads a new version of the model from the model library, carries out validity verification, carries out the next step if the verification is successful, and otherwise rolls back, unloads the new version of the model and sends alarm information;
the inference service container preheats the new version of the model, so that request response jitter caused by model updating can be reduced;
and the inference service container brings the new version of the model online, unloads the old version of the model to release its resources, and continues to monitor the model in the model library.
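The whole monitor/download/validate/warm-up/online cycle can be summarized in the following Python sketch; the model_store and server objects and their methods are placeholders standing in for whatever the actual model library client and serving framework provide, so this is an assumed outline rather than the patent's implementation.

# Assumed outline of the model monitoring module's update loop.
import time


def watch_and_update(model_store, server, model_name: str, poll_interval_s: int = 30):
    current_version = server.loaded_version(model_name)
    while True:
        latest = model_store.latest_version(model_name)
        if latest == current_version:
            time.sleep(poll_interval_s)            # no new version, keep monitoring
            continue

        local_path = model_store.download(model_name, latest)
        if not server.validate(local_path):        # validity verification failed
            server.unload(model_name, latest)      # roll back: drop the new version
            server.alert(f"model {model_name}:{latest} failed validation")
            time.sleep(poll_interval_s)
            continue

        server.warm_up(model_name, latest)         # pre-run requests to reduce response jitter
        server.bring_online(model_name, latest)
        server.unload(model_name, current_version) # free the old version's resources
        current_version = latest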
In an optional embodiment, the method further comprises:
customizing mirror images for business application and model inference service and packaging and issuing to a mirror image warehouse,
setting the resource allocation of the target service application cluster according to the user request,
in the target business application cluster, deploying the model inference service mirror images on the business application nodes where the inference service containers are located, and deploying the business application mirror images on the business application nodes where the application containers are located.
As described above, after the target service application cluster is determined according to the service scenario corresponding to the user request, the user request is routed to the specified application container, the application container provides services according to the corresponding service application mirror image, the inference service container is called, and the inference service container completes inference services according to the corresponding model inference service mirror image.
In an alternative embodiment, for a business application cluster supporting a local mode, the application container and the inference service container are deployed on the same business application node,
the deploying the model inference service mirror image on the business application node where each inference service container is located, and the deploying the business application mirror image on the business application node where each application container is located, includes:
s11, the automatic release deployment platform deploys the model inference service mirror image on each business application node;
s12, each inference service container is initialized after being started, if the initialization is successful, the model inference service mirror image is deployed successfully, the inference service container starts a model monitoring module and continues the next step, otherwise, the automatic release deployment platform terminates all inference service containers, the model inference service mirror image is deployed unsuccessfully, and the deployment process is finished;
s13, the automatic release deployment platform deploys the service application mirror images on each service application node;
s14, initializing after starting each application container, if the initialization is successful, the service application mirror image deployment is successful, and continuing the next step, otherwise, the automatic release deployment platform terminates all application containers, the service application mirror image deployment is failed, and the deployment process is finished;
and S15, repeating S11-S14 until all the service application nodes are deployed.
The above process may be understood as the startup process of the local mode service application cluster; after the application containers and inference service containers are deployed, the deployment of the target service application cluster is complete, that is, the model inference service requested by the user can be completed through the target service application cluster. Because the application container and the model inference service container are deployed on the same business application node, they are in a symbiotic relationship: they are not independent of each other and can affect each other. In S12, if the model inference service mirror image fails to deploy, that is, the inference service container fails to deploy, the inference service container is terminated (start failure) and no connection can be established; the subsequent flow of S13 is not continued and the whole deployment ends in failure. In S14, if the business application mirror image fails to deploy, that is, the application container fails to deploy, the application container is terminated (start failure) and no connection can be established; the subsequent flow of S15 is not continued and the whole deployment ends in failure.
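Read as pseudocode, the Side Car rollout of S11-S15 could be outlined as follows; the platform object and its deploy/terminate methods are hypothetical stand-ins for the automatic release deployment platform.

# Assumed outline of the local (Side Car) mode rollout in S11-S15.
def deploy_local_mode(platform, nodes, serving_image, app_image):
    for node in nodes:
        # S11/S12: model inference service mirror image first, on this node
        serving = platform.deploy(node, serving_image)
        if not serving.start_and_init():
            platform.terminate_all(kind="serving")
            raise RuntimeError("model inference service mirror image deployment failed")
        serving.start_model_monitor()

        # S13/S14: business application mirror image on the same node
        app = platform.deploy(node, app_image)
        if not app.start_and_init():
            platform.terminate_all(kind="app")
            raise RuntimeError("business application mirror image deployment failed")
    # S15: the loop repeats until every business application node is deployed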
In an alternative embodiment, for a business application cluster supporting remote mode, the application container and the inference service container are deployed on different business application nodes,
the deploying the model inference service mirror image on the business application node where each inference service container is located, and the deploying the business application mirror image on the business application node where each application container is located, includes:
s21, the automatic release deployment platform deploys the model inference service mirror image on the service application node where each inference service container is located;
s22, each inference service container is initialized after being started, if the initialization is successful, the model inference service mirror image is deployed successfully, the inference service container starts a model monitoring module and continues the next step, otherwise, the automatic release deployment platform terminates all inference service containers, the model inference service mirror image is deployed unsuccessfully, and the deployment process is finished;
s23, repeating S21 and S22 until all the business application nodes where the inference service containers are located are deployed;
s24, the automatic release deployment platform deploys the service application mirror images on the service application nodes where the application containers are located;
s25, initializing after starting each application container, if the initialization is successful, the service application mirror image deployment is successful, and continuing the next step, otherwise, the automatic release deployment platform terminates all application containers, the service application mirror image deployment is failed, and the deployment process is finished;
and S26, repeating S24 and S25 until all the service application nodes where the application containers are located are deployed.
The above process may be understood as the startup process of the remote mode business application cluster; after the application containers and inference service containers are deployed, the deployment of the target business application cluster is complete, that is, the model inference service requested by the user can be completed through the target business application cluster. The application containers and the model inference service containers are deployed on different business application nodes, so they are not in a symbiotic relationship: they are independent of each other and do not affect each other. In S22, if the model inference service mirror image fails to deploy, that is, the inference service container fails to deploy, the inference service container is terminated and no connection can be established; the subsequent flow of S23 is not continued and the whole deployment ends in failure, but the termination of the inference service container (start failure) does not cause the application containers to terminate, and once the cause is found the inference service container can be independently redeployed, that is, S21 and S22 are performed again. In S25, if the business application mirror image fails to deploy, that is, the application container fails to deploy, the application container is terminated and no connection can be established; the subsequent flow of S26 is not continued and the whole deployment ends in failure, but the termination of the application container (start failure) does not cause the inference service containers to terminate, and once the cause is found the application container can be independently redeployed, that is, S24 and S25 are performed again.
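For contrast with the Side Car sketch above, the remote (RPC) mode rollout of S21-S26 might be outlined as below; again the platform object and its methods are assumptions, and the key difference is that all inference service nodes are brought up before any application node, and a failure on one side does not tear down the other.

# Assumed outline of the remote (RPC) mode rollout in S21-S26.
def deploy_remote_mode(platform, serving_nodes, app_nodes, serving_image, app_image):
    # S21-S23: every business application node hosting an inference service container
    for node in serving_nodes:
        serving = platform.deploy(node, serving_image)
        if not serving.start_and_init():
            platform.terminate_all(kind="serving")   # application containers unaffected
            raise RuntimeError("model inference service mirror image deployment failed")
        serving.start_model_monitor()

    # S24-S26: every business application node hosting an application container
    for node in app_nodes:
        app = platform.deploy(node, app_image)
        if not app.start_and_init():
            platform.terminate_all(kind="app")        # inference service containers stay up
            raise RuntimeError("business application mirror image deployment failed")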
The disclosure also relates to an electronic device comprising a server, a terminal and the like. The electronic device includes: at least one processor; a memory communicatively coupled to the at least one processor; and a communication component communicatively coupled to the storage medium, the communication component receiving and transmitting data under control of the processor; wherein the memory stores instructions executable by the at least one processor to implement the method of the above embodiments.
In an alternative embodiment, the memory is used as a non-volatile computer-readable storage medium for storing non-volatile software programs, non-volatile computer-executable programs, and modules. The processor executes various functional applications of the device and data processing, i.e., implements the method, by executing nonvolatile software programs, instructions, and modules stored in the memory.
The memory may include a storage program area and a storage data area, wherein the storage program area may store an operating system, an application program required for at least one function; the storage data area may store a list of options, etc. Further, the memory may include high speed random access memory, and may also include non-volatile memory, such as at least one magnetic disk storage device, flash memory device, or other non-volatile solid state storage device. In some embodiments, the memory optionally includes memory located remotely from the processor, and such remote memory may be connected to the external device via a network. Examples of such networks include, but are not limited to, the internet, intranets, local area networks, mobile communication networks, and combinations thereof.
One or more modules are stored in the memory and, when executed by the one or more processors, perform the methods of any of the method embodiments described above.
The above product can execute the method provided by the embodiments of the present application and has the corresponding functional modules and beneficial effects of executing the method; for technical details not described in detail in this embodiment, reference may be made to the method provided by the embodiments of the present application.
The present disclosure also relates to a computer-readable storage medium for storing a computer-readable program for causing a computer to perform some or all of the above-described method embodiments.
That is, as can be understood by those skilled in the art, all or part of the steps in the method for implementing the embodiments described above may be implemented by a program instructing related hardware, where the program is stored in a storage medium and includes several instructions to enable a device (which may be a single chip, a chip, or the like) or a processor (processor) to execute all or part of the steps of the method described in the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk or an optical disk, and other various media capable of storing program codes.
In the description provided herein, numerous specific details are set forth. It is understood, however, that embodiments of the invention may be practiced without these specific details. In some instances, well-known methods, structures and techniques have not been shown in detail in order not to obscure an understanding of this description.
Furthermore, those of ordinary skill in the art will appreciate that while some embodiments described herein include some features included in other embodiments, rather than other features, combinations of features of different embodiments are meant to be within the scope of the invention and form different embodiments. For example, in the claims, any of the claimed embodiments may be used in any combination.
It will be understood by those skilled in the art that while the present invention has been described with reference to exemplary embodiments, various changes may be made and equivalents may be substituted for elements thereof without departing from the scope of the invention. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the invention without departing from the essential scope thereof. Therefore, it is intended that the invention not be limited to the particular embodiment disclosed, but that the invention will include all embodiments falling within the scope of the appended claims.

Claims (10)

1. A model inference service invocation system, characterized in that said system comprises:
the system comprises two service application clusters which respectively support a local mode and a remote mode, wherein each service application cluster comprises a plurality of service application nodes, each service application node is provided with an application container, each application container is provided with a model inference service SDK, one of the two service application clusters is determined as a target service application cluster according to a service scene corresponding to a user request, the user request is routed to each application container in the target service application cluster, each application container respectively obtains feature data from the feature library according to the received user request for preprocessing and respectively invokes a model inference service interface of the SDK, and each model inference service SDK respectively invokes the corresponding inference service container according to a request parameter of the model inference service interface;
the reasoning service cluster comprises a plurality of reasoning service containers, and each reasoning service container calls the loaded model in the model library to carry out reasoning and returns a reasoning result to each application container;
the feature library is used for storing feature data of a plurality of versions of models;
the model library is used for storing other data of the models of the plurality of versions, and the other data comprises model files and model metadata;
the automatic release deployment platform selects a model to be deployed from the model library according to a user request, and configures a business application cluster and an inference service cluster;
for the service application cluster supporting the local mode, the application container and the inference service container are deployed on the same service application node, and the model inference service SDK calls the inference service container through the local mode;
for the service application cluster supporting the remote mode, the application container and the inference service container are deployed on different service application nodes, and the model inference service SDK calls the inference service container through the remote mode.
2. The system of claim 1, wherein the system further comprises: an algorithm platform for updating the models in the model library,
the inference service container is provided with a model monitoring module to monitor whether the model in the model base has a new version or not and download the model of the new version from the model base when the new version exists.
3. The system of claim 1, wherein the business application image and the model inference service image are customized and published in the image repository according to the business application and the model inference service,
the resource configuration of the target business application cluster is determined according to a user request,
in the target business application cluster, model inference service mirror images are deployed on the business application nodes where the inference service containers are located, and business application mirror images are deployed on the business application nodes where the application containers are located.
4. A model inference service invocation method, characterized in that the method comprises:
the access layer receives a user request, selects a target service application cluster according to the service scene, and routes the user request to each application container in the target service application cluster, wherein the target service application cluster is a service application cluster supporting a local mode or a remote mode;
each application container acquires feature data from the feature library for preprocessing according to the received user request and respectively calls a model reasoning service interface of the model reasoning service SDK;
each model reasoning service SDK calls a corresponding reasoning service container in a local mode or a remote mode according to the request parameters of a reasoning service interface, wherein the request parameters of the reasoning service interface are obtained by conversion according to model metadata in a model library;
each inference service container calls the loaded model in the model library respectively to carry out inference and returns an inference result to each application container;
for the service application cluster supporting the local mode, the application container and the inference service container are deployed on the same service application node, and the model inference service SDK calls the inference service container through the local mode; for the service application cluster supporting the remote mode, the application container and the inference service container are deployed on different service application nodes, and the model inference service SDK calls the inference service container through the remote mode.
5. The method of claim 4, wherein the inference service container is provided with a model monitoring module, and the inference service container is initialized after startup and starts the model monitoring module,
the method further comprises the following steps:
the inference service container monitors whether a new version of the model exists in the model library; if so, the next step is carried out, otherwise the model library continues to be monitored;
the inference service container downloads the new version of the model from the model library and carries out validity verification; if the verification succeeds, the next step is carried out, otherwise the container rolls back, unloads the new version of the model and sends an alarm;
the inference service container warms up the new version of the model;
and the inference service container brings the new version of the model online, unloads the old version of the model, and continues to monitor the model library.
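An informal sketch of this hot-update flow, assuming hypothetical runtime and library helpers (load, validate, warm_up, set_online, unload, download); it is not the claimed implementation, only one way such a sequence could look:

```python
def update_model(runtime, library, model_name, new_version, old_model):
    path = library.download(model_name, new_version)
    candidate = None
    try:
        candidate = runtime.load(path)          # load the new version into memory
        if not runtime.validate(candidate):     # validity verification
            raise ValueError("validation failed")
        runtime.warm_up(candidate)              # e.g. a few dummy inferences
    except Exception as exc:
        if candidate is not None:
            runtime.unload(candidate)           # unload the rejected new version
        send_alarm(f"{model_name} {new_version} rejected: {exc}")
        return old_model                        # roll back: keep serving the old version

    runtime.set_online(candidate)               # the new version starts serving
    runtime.unload(old_model)                   # the old version is released
    return candidate

def send_alarm(message):
    # Stub: a real system would page on-call or post to a monitoring channel.
    print("ALARM:", message)
```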
6. The method of claim 4 or 5, wherein the method further comprises:
customizing images for the business application and the model inference service, and packaging and publishing them to an image repository,
setting the resource configuration of the target business application cluster according to the user request,
in the target business application cluster, deploying the model inference service image on the business application nodes where the inference service containers are located, and deploying the business application image on the business application nodes where the application containers are located.
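Purely as a sketch of the "customize, package and publish images" step, the snippet below drives the standard docker CLI from Python; the registry URL, image names, tags and the resource figures are placeholders, not values from the patent.

```python
import subprocess

REGISTRY = "registry.example.com/ml-platform"   # hypothetical image repository

def build_and_push(context_dir: str, image: str, tag: str) -> str:
    ref = f"{REGISTRY}/{image}:{tag}"
    subprocess.run(["docker", "build", "-t", ref, context_dir], check=True)
    subprocess.run(["docker", "push", ref], check=True)
    return ref

# One image per role, as in the claims:
# app_ref = build_and_push("./business-app", "business-app", "1.0.0")
# infer_ref = build_and_push("./inference-service", "inference-service", "1.0.0")

# Resource configuration of the target business application cluster, derived
# from the user request (illustrative values only).
cluster_resources = {
    "replicas": 4,
    "app_container": {"cpu": "2", "memory": "4Gi"},
    "inference_container": {"cpu": "4", "memory": "8Gi", "gpu": 1},
}
```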
7. The method of claim 6, wherein deploying the model inference service image on the business application node where each inference service container is located and deploying the business application image on the business application node where each application container is located comprises:
S11, the automatic release deployment platform deploys the model inference service image on each business application node;
S12, each inference service container is initialized after startup; if the initialization succeeds, the model inference service image is deployed successfully, the inference service container starts the model monitoring module and the next step is carried out; otherwise, the automatic release deployment platform terminates all inference service containers, the model inference service image deployment fails, and the deployment process ends;
S13, the automatic release deployment platform deploys the business application image on each business application node;
S14, each application container is initialized after startup; if the initialization succeeds, the business application image is deployed successfully and the next step is carried out; otherwise, the automatic release deployment platform terminates all application containers, the business application image deployment fails, and the deployment process ends;
and S15, repeating S11-S14 until all business application nodes are deployed.
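A hedged sketch of this rollout, reading S11-S15 as iterating node by node: on each business application node the inference service image is deployed and initialized first, then the business application image, and any failed initialization aborts the rollout. The platform/scheduler API is entirely illustrative.

```python
def roll_out(platform, nodes, inference_image, app_image):
    for node in nodes:
        # S11/S12: inference service container first.
        infer = platform.deploy(node, inference_image)
        if not infer.wait_until_initialized():
            platform.terminate_all(inference_image)
            raise RuntimeError(f"inference image failed to initialize on {node}")
        infer.start_model_watcher()

        # S13/S14: business application container on the same node.
        app = platform.deploy(node, app_image)
        if not app.wait_until_initialized():
            platform.terminate_all(app_image)
            raise RuntimeError(f"application image failed to initialize on {node}")
    # S15: the loop ends once every business application node is deployed.
```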
8. The method of claim 6, wherein deploying the model inference service image on the business application node where each inference service container is located and deploying the business application image on the business application node where each application container is located comprises:
S21, the automatic release deployment platform deploys the model inference service image on the business application node where each inference service container is located;
S22, each inference service container is initialized after startup; if the initialization succeeds, the model inference service image is deployed successfully, the inference service container starts the model monitoring module and the next step is carried out; otherwise, the automatic release deployment platform terminates all inference service containers, the model inference service image deployment fails, and the deployment process ends;
S23, repeating S21 and S22 until all business application nodes where the inference service containers are located are deployed;
S24, the automatic release deployment platform deploys the business application image on the business application node where each application container is located;
S25, each application container is initialized after startup; if the initialization succeeds, the business application image is deployed successfully and the next step is carried out; otherwise, the automatic release deployment platform terminates all application containers, the business application image deployment fails, and the deployment process ends;
and S26, repeating S24 and S25 until all business application nodes where the application containers are located are deployed.
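For contrast with claim 7, here is an equally informal sketch of the two-phase rollout of S21-S26, which fits the remote mode: every node hosting inference service containers is deployed and verified first, and only then the nodes hosting application containers. As before, the platform API is a stand-in, not a real library.

```python
def roll_out_remote(platform, inference_nodes, app_nodes, inference_image, app_image):
    # S21-S23: deploy the model inference service image on every inference node.
    for node in inference_nodes:
        container = platform.deploy(node, inference_image)
        if not container.wait_until_initialized():
            platform.terminate_all(inference_image)
            raise RuntimeError(f"inference image failed on {node}")
        container.start_model_watcher()

    # S24-S26: deploy the business application image on every application node.
    for node in app_nodes:
        container = platform.deploy(node, app_image)
        if not container.wait_until_initialized():
            platform.terminate_all(app_image)
            raise RuntimeError(f"application image failed on {node}")
```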
9. An electronic device comprising a memory and a processor, wherein the memory is configured to store one or more computer instructions, wherein the one or more computer instructions are executed by the processor to implement the method of any one of claims 4-8.
10. A computer-readable storage medium, on which a computer program is stored, the computer program being executable by a processor for implementing the method according to any of claims 4-8.
CN202110976068.3A 2021-08-24 2021-08-24 Model reasoning service calling system and method Active CN113419750B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110976068.3A CN113419750B (en) 2021-08-24 2021-08-24 Model reasoning service calling system and method

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110976068.3A CN113419750B (en) 2021-08-24 2021-08-24 Model reasoning service calling system and method

Publications (2)

Publication Number Publication Date
CN113419750A (en) 2021-09-21
CN113419750B (en) 2021-11-02

Family

ID=77719921

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110976068.3A Active CN113419750B (en) 2021-08-24 2021-08-24 Model reasoning service calling system and method

Country Status (1)

Country Link
CN (1) CN113419750B (en)

Families Citing this family (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113869530B (en) * 2021-12-02 2022-03-01 北京华品博睿网络技术有限公司 Machine learning online feature production system and method
CN114398188A (en) * 2022-01-14 2022-04-26 深圳市商汤科技有限公司 Model management method, model management service, client, and storage medium
CN114911492B (en) * 2022-05-17 2024-03-08 北京百度网讯科技有限公司 Inference service deployment method, device, equipment and storage medium
CN115277652B (en) * 2022-06-29 2024-03-22 北京百度网讯科技有限公司 Streaming media processing method and device based on reasoning service and electronic equipment
CN116048542B (en) * 2023-02-11 2023-10-31 之江实验室 Optimized deployment method and device for computer vision deep learning model

Citations (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107483559A (en) * 2017-07-27 2017-12-15 北京小米移动软件有限公司 The offer method and device of SDK services
CN109934361A (en) * 2019-02-25 2019-06-25 江苏电力信息技术有限公司 A kind of automation operation platform model based on container and big data
US10348562B2 (en) * 2016-01-11 2019-07-09 Netapp, Inc. Methods and systems for selecting compatible resources in networked storage environments
CN111414233A (en) * 2020-03-20 2020-07-14 京东数字科技控股有限公司 Online model reasoning system
CN111625245A (en) * 2020-05-22 2020-09-04 苏州浪潮智能科技有限公司 Inference service deployment method, device, equipment and storage medium
CN112015521A (en) * 2020-09-30 2020-12-01 北京百度网讯科技有限公司 Configuration method and device of inference service, electronic equipment and storage medium
CN112035218A (en) * 2020-09-09 2020-12-04 马上消费金融股份有限公司 Method, device and equipment for providing model service

Family Cites Families (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106790483A (en) * 2016-12-13 2017-05-31 武汉邮电科学研究院 Hadoop group systems and fast construction method based on container technique
CN112771622A (en) * 2018-07-18 2021-05-07 辉达公司 Virtualized computing platform for inference, advanced processing, and machine learning applications
CN112231054B (en) * 2020-10-10 2022-07-08 苏州浪潮智能科技有限公司 Multi-model inference service deployment method and device based on k8s cluster
CN112732491B (en) * 2021-01-22 2024-03-12 中国人民财产保险股份有限公司 Data processing system and business data processing method based on data processing system

Also Published As

Publication number Publication date
CN113419750A (en) 2021-09-21

Similar Documents

Publication Publication Date Title
CN113419750B (en) Model reasoning service calling system and method
CN107769949B (en) Application component deployment method and deployment node
US20140075438A1 (en) Method, server, and system for starting an application
CN110248355B (en) Internet of things equipment eUICC card-based code number management method, equipment, platform and system
CN111352653B (en) System development method based on PaaS cloud platform server and server
WO2002084479A2 (en) Method and apparatus for performing online application upgrades in a java platform
CN110413259B (en) Android service framework, method and device for realizing extended service
US9654343B2 (en) System and method for managing service characteristics
CN111984271B (en) Block chain application program processing method and device and block chain application system
EP1405183A2 (en) Method and apparatus for providing application specific strategies to a java platform including load balancing policies
CN102572896A (en) Upgrading method and upgrading device for wireless communication system
CN112269647A (en) Node scheduling, switching and coordinating method and corresponding device, equipment and medium thereof
CN113419818B (en) Basic component deployment method, device, server and storage medium
CN115514667A (en) Access service processing method, system, device, electronic equipment and storage medium
US7177934B2 (en) Method and apparatus for providing application specific strategies to a JAVA platform including start and stop policies
CN112561070A (en) Communication service providing method, device, base station, server and storage medium
CN114546588A (en) Task deployment method and device, storage medium and electronic device
CN111897565A (en) Data processing method, device and equipment based on Internet of things
CN114553703B (en) Deployment method, device, equipment and storage medium of industrial equipment control strategy
CN116149701A (en) Online software upgrading method and system for edge terminal
CN111459530A (en) Patching method, device and storage medium
CN114546648A (en) Task processing method and task processing platform
CN112564979A (en) Execution method and device for construction task, computer equipment and storage medium
CN111666090B (en) Online updating support system for application system extension assembly
CN117076007B (en) Method and device for reducing intrusion of middle platform architecture codes and middle platform system

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant
TR01 Transfer of patent right

Effective date of registration: 20240316

Address after: Room 13, 1801, 18th Floor, Building 1, No.16 Taiyanggong Middle Road, Chaoyang District, Beijing, 100028

Patentee after: Beijing Hanlan Wolf Technology Co.,Ltd.

Country or region after: China

Address before: 09 / F, 1801, 18 / F, building 1, No. 16, Taiyanggong Middle Road, Chaoyang District, Beijing 100028

Patentee before: BEIJING HUAPIN BORUI NETWORK TECHNOLOGY CO.,LTD.

Country or region before: China