CN112860402A - Dynamic batch processing task scheduling method and system for deep learning inference service - Google Patents

Dynamic batch processing task scheduling method and system for deep learning inference service

Info

Publication number
CN112860402A
CN112860402A
Authority
CN
China
Prior art keywords
batch
size
upper limit
deep learning
task
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202110192645.XA
Other languages
Chinese (zh)
Other versions
CN112860402B (en)
Inventor
张德宇
罗云臻
张尧学
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Central South University
Original Assignee
Central South University
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Central South University filed Critical Central South University
Priority to CN202110192645.XA priority Critical patent/CN112860402B/en
Publication of CN112860402A publication Critical patent/CN112860402A/en
Application granted granted Critical
Publication of CN112860402B publication Critical patent/CN112860402B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/48Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806Task transfer initiation or dispatching
    • G06F9/4843Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5011Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals
    • G06F9/5016Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resources being hardware resources other than CPUs, Servers and Terminals the resource being the memory
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5005Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F9/5027Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N20/00Machine learning
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06NCOMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N3/00Computing arrangements based on biological models
    • G06N3/02Neural networks
    • G06N3/08Learning methods
    • YGENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02DCLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D10/00Energy efficient computing, e.g. low power processors, power management or thermal management

Abstract

The invention discloses a dynamic batch processing task scheduling method and system for deep learning inference service. In the method, the number of queue-waiting tasks at each batch departure time and the departing batch size are described by a two-dimensional Markov process, the steady-state probability of the two-dimensional Markov process is determined, and the average service delay in the deep learning inference service system is determined according to the steady-state probability; an optimization model over the upper limit of the batch size is then constructed to jointly optimize the average service delay and the memory usage, and the optimization model is solved to determine the upper limit of the batch size of the batch processing tasks. The invention has the advantages of suiting dynamic environments and achieving better average service delay and memory occupation.

Description

Dynamic batch processing task scheduling method and system for deep learning inference service
Technical Field
The invention relates to the technical field of edge computing and cloud computing, in particular to a dynamic batch processing task scheduling method and system for deep learning inference service.
Background
Due to the excellent performance of deep learning in fields such as image processing and natural language processing, and the increasing popularity of mobile devices running systems such as Android and iOS, mobile devices can now provide a great number of intelligent applications to end users. For example, there are over 16,500 mobile applications on Google Play that use deep learning as a core component to provide intelligent services ranging from computer vision to text and audio processing. Specific examples include Seeing AI, a mobile phone app developed by Microsoft that assists visually impaired people in recognizing their surroundings through the camera, and Adobe Scan, which converts images to text using deep-learning-based text recognition.
A common way for mobile devices to provide intelligent services using deep learning is to run inference on pre-trained models. However, deep learning model inference has high requirements in terms of energy, memory, and computation cycles. Although some mobile neural network accelerators, such as NPUs and TPUs, have been released to speed up on-device deep learning inference, their computing power is still very limited, and it is difficult to guarantee a high quality of service.
To provide efficient mobile intelligence services, a more effective solution is to offload model inference onto powerful edge or cloud servers. As the application range of deep learning models keeps expanding, the demand for deep learning inference has grown rapidly in recent years, as can be observed from information released by leading high-tech companies. Specifically, DLIS, a dedicated Deep Learning Inference Service platform deployed by Microsoft, receives hundreds of thousands of deep learning inference requests every second, and the deep learning inference demand of Facebook's data centers has tripled within two years.
In mobile applications such as AR and VR, the critical issue is the strict low latency requirement, typically in the millisecond range. With the significant increase in the amount of deep learning inference task requests, this strict low latency requirement becomes a challenge even for powerful GPU servers.
Due to the highly parallel computing architecture of the GPU, batching inputs together can significantly improve computational efficiency. By analyzing the throughput of representative deep learning models on two GPU servers under different batch sizes, the throughput of different batch sizes for different deep learning models shown in FIG. 1 is obtained, which shows that batching inputs can greatly improve throughput. Meanwhile, by studying the relationship between batched inputs and video memory occupation, the relationship between batch size and video memory occupation for different deep learning models shown in FIG. 2 is obtained; in the worst case the video memory occupation reaches 2558 MB.
However, existing research on improving the throughput of deep learning inference services and reducing their delay through batch processing is discussed in a static environment, meaning that the tasks of the inference service are assumed to be waiting statically at the server. In an actual network service, however, tasks arrive at random, so how to reasonably use batch processing to optimize the deep learning inference service under random arrivals has not been studied in depth in the prior art and is of practical significance.
Disclosure of Invention
The technical problem to be solved by the invention is as follows: aiming at the technical problems in the prior art, the invention provides a dynamic batch processing task scheduling method and system for deep learning inference service that suits dynamic environments and achieves better average service delay and memory occupation.
In order to solve the technical problems, the technical scheme provided by the invention is as follows: a dynamic batch processing task scheduling method of deep learning inference service, in which the number of queue-waiting tasks at each batch departure time and the departing batch size are described by a two-dimensional Markov process, the steady-state probability of the two-dimensional Markov process is determined, and the average service delay in the deep learning inference service system is determined according to the steady-state probability;
optimizing the average service delay and the memory usage over the upper limit of the batch size of the batch processing tasks by the optimization model shown in formula (1),
min_b  E(W(b)) + γ·m_b,   subject to  1 ≤ b ≤ B ≤ N,  λ < B·μ_B        (1)
In formula (1), E(W(b)) is the average service delay corresponding to the batch size upper limit b, b is the upper limit of the batch size of the batch processing tasks, W(b) is the service delay, γ is the weight of the memory usage relative to the average service delay, m_b is the memory usage corresponding to the batch size upper limit b, B is the maximum value of the upper limit of the batch size, N is the maximum number of tasks waiting in the batch processing task queue, λ is the task arrival rate, and μ_B is the service rate when the batch size is B. Solving the optimization model of formula (1) determines the upper limit of the batch size of the batch processing tasks.
Further, the average service delay is calculated by equation (2),
E(W(b)) = E(L) / (λ·(1 − P_block))        (2)
In formula (2), E(W(b)) is the average service delay corresponding to the batch size upper limit b, L is the average task number, λ is the task arrival rate, and P_block is the blocking probability of a task.
Further, the average task number is determined by equation (3),
[Formula (3) appears as an image in the original document.]
the blocking probability is determined by equation (4),
[Formula (4) appears as an image in the original document.]
In formulas (3) and (4), E(L) is the average task number, n is the number of waiting tasks in the batch processing task queue, r is the batch size, a is the lower limit of the batch size of the batch processing tasks, b is the upper limit of the batch size of the batch processing tasks, π_{n,r} is the steady-state probability that the number of waiting tasks is n and the batch size is r, π_{n,0} is the steady-state probability that the number of waiting tasks is n and the batch size is 0, and π_{N,r} is the steady-state probability that the number of waiting tasks is N and the batch size is r.
Further, the solving process of the optimization model comprises the following steps:
initializing the upper limit of the batch size of the batch processing task and the step length for adjusting the upper limit of the batch size in each iteration; taking the sum of the average service delay and the memory usage corresponding to the upper limit of the batch size as a convergence parameter; and in each iteration, adjusting the upper limit of the batch size according to the step length, and when the convergence parameter obtained in the current iteration is larger than the convergence parameter of the previous iteration, taking the upper limit of the batch size obtained in the current iteration as the optimal solution output by the optimization model.
Further, in the first iteration, the method further includes a process of correcting the adjustment direction of the step size: and when the difference between the average service delay obtained in the first iteration and the average service delay corresponding to the initialized upper limit of the batch size is larger than a preset threshold value, changing the adjustment direction for adjusting the upper limit of the batch size.
A dynamic batch processing task scheduling system of deep learning inference service carries out task scheduling according to the dynamic batch processing task scheduling method of the deep learning inference service.
Compared with the prior art, the invention has the following advantages: compared with the traditional single-task processing method, the processing speed is greatly improved; compared with the batch processing method with the optimal fixed batch size, the speed is greatly improved and the video memory occupation is noticeably reduced; compared with a greedy dynamic batch processing method, the video memory occupation is greatly reduced while the service delay remains essentially the same.
Drawings
Fig. 1 shows the throughput of different deep learning models in the prior art.
Fig. 2 is a graph of the memory occupancy of different deep learning models in the prior art.
FIG. 3 is a flow chart illustrating an embodiment of the present invention.
Fig. 4 is a schematic diagram illustrating the relationship between the batch size and the inference throughput rate (a) and GPU utilization rate (b) of the deep learning models GoogLeNet and DenseNet-169 on an NVIDIA RTX 2080 GPU in an embodiment of the present invention.
Fig. 5 is a schematic diagram illustrating the relationship between the batch size and the inference throughput rate (a) and GPU utilization rate (b) of the deep learning models GoogLeNet and DenseNet-169 on an NVIDIA Titan Xp GPU in an embodiment of the present invention.
Fig. 6 is a schematic diagram of the GPU video memory occupancy of GoogLeNet and DenseNet-169 inference on the NVIDIA RTX 2080 GPU (a) and the NVIDIA Titan Xp GPU (b).
Fig. 7 is a schematic diagram of the service delay and blocking probability of the deep learning model GoogLeNet inference service on the NVIDIA RTX 2080 GPU under dynamic batch processing (a = 1) and static batch processing (a = b), with a task arrival rate of 990 tasks/second during the service process.
Fig. 8 is a schematic diagram illustrating the relationship between the dynamic batch lower limit and the service delay of the deep learning model GoogLeNet inference service on the NVIDIA RTX 2080 GPU under a fixed dynamic batch upper limit in an embodiment of the present invention.
Fig. 9 is a schematic diagram comparing the video memory occupation of the deep learning model GoogLeNet inference service on the NVIDIA RTX 2080 GPU under dynamic batch processing and static batch processing in an embodiment of the present invention.
Fig. 10 is a diagram illustrating the queuing model corresponding to the deep learning inference service system model in an embodiment of the present invention.
Fig. 11 is a diagram comparing the deep learning models GoogLeNet and DenseNet-169 inference services on the NVIDIA RTX 2080 GPU in the real case and the model-prediction case, where the task arrival rate is 990 tasks/second for GoogLeNet and 330 tasks/second for DenseNet-169, in an embodiment of the present invention.
Fig. 12 is a schematic diagram illustrating how the GoogLeNet inference service is affected by the batch size upper limit b and the arrival rate λ in an embodiment of the present invention.
Fig. 13 is a schematic diagram comparing the service delay of the method of the present invention with different static batch processing under varying task arrival rates for the deep learning model GoogLeNet inference service on the NVIDIA RTX 2080 GPU in an embodiment of the present invention.
Fig. 14 is a schematic diagram comparing the video memory occupation of the method of the present invention with different static batch processing and greedy dynamic batch processing under varying task arrival rates for the deep learning model GoogLeNet inference service on the NVIDIA RTX 2080 GPU in an embodiment of the present invention.
Detailed Description
The invention is further described below with reference to the drawings and specific preferred embodiments of the description, without thereby limiting the scope of protection of the invention.
According to the dynamic batch processing task scheduling method of the deep learning inference service of this embodiment, the number of queue-waiting tasks at each batch departure time and the departing batch size are described by a two-dimensional Markov process, the steady-state probability of the two-dimensional Markov process is determined, and the average service delay in the deep learning inference service system is determined according to the steady-state probability;
optimizing the average service delay and the memory usage over the upper limit of the batch size of the batch processing tasks by the optimization model shown in formula (1),
min_b  E(W(b)) + γ·m_b,   subject to  1 ≤ b ≤ B ≤ N,  λ < B·μ_B        (1)
In formula (1), E(W(b)) is the average service delay corresponding to the batch size upper limit b, b is the upper limit of the batch size of the batch processing tasks, W(b) is the service delay, γ is the weight of the memory usage relative to the average service delay, m_b is the memory usage corresponding to the batch size upper limit b, B is the maximum value of the upper limit of the batch size, N is the maximum number of tasks waiting in the batch processing task queue, λ is the task arrival rate, and μ_B is the service rate when the batch size is B. Solving the optimization model of formula (1) determines the upper limit of the batch size of the batch processing tasks.
In the present embodiment, the average service delay is determined by the calculation of equation (2),
E(W(b)) = E(L) / (λ·(1 − P_block))        (2)
In formula (2), E(W(b)) is the average service delay corresponding to the batch size upper limit b, L is the average task number, λ is the task arrival rate, and P_block is the blocking probability of a task; the remaining parameters are defined as above.
In this embodiment, the average task number is determined by equation (3),
[Formula (3) appears as an image in the original document.]
the blocking probability is determined by equation (4),
[Formula (4) appears as an image in the original document.]
In formulas (3) and (4), E(L) is the average task number, n is the number of waiting tasks in the batch processing task queue, r is the batch size, a is the lower limit of the batch size, b is the upper limit of the batch size, π_{n,r} is the steady-state probability that the number of waiting tasks is n and the batch size is r, π_{n,0} is the steady-state probability that the number of waiting tasks is n and the batch size is 0, and π_{N,r} is the steady-state probability that the number of waiting tasks is N and the batch size is r; the remaining parameters are defined as above.
In this embodiment, the solving process of the optimization model includes: initializing the upper limit of the batch size of the batch processing task and the step length for adjusting the upper limit of the batch size in each iteration; taking the sum of the average service delay and the memory usage corresponding to the upper limit of the batch size as a convergence parameter; and in each iteration, adjusting the upper limit of the batch size according to the step length, and when the convergence parameter obtained in the current iteration is larger than the convergence parameter of the previous iteration, taking the upper limit of the batch size obtained in the current iteration as the optimal solution output by the optimization model.
In this embodiment, during the first iteration, the method further includes a process of correcting the adjustment direction of the step size: and when the difference between the average service delay obtained in the first iteration and the average service delay corresponding to the initialized upper limit of the batch size is larger than a preset threshold value, changing the adjustment direction for adjusting the upper limit of the batch size.
A dynamic batch processing task scheduling system of deep learning inference service carries out task scheduling according to the dynamic batch processing task scheduling method of the deep learning inference service.
The method of the present embodiment is verified and analyzed through a specific simulation experiment. In the experiment, by analyzing the pattern of actual network deep learning inference services, the batch-based deep learning inference service system is described as follows.
The GPU server receives deep learning model inference tasks from a large number of mobile devices or other terminals; the task arrival process follows a Poisson distribution with rate λ, the inference process follows a general distribution, and the inference delay depends on the current batch size. When the GPU server runs only one deep learning inference model, the inference process follows a deterministic distribution since there are no other tasks competing with it. In the more realistic scenario where multiple model inference services run on a single server, the inference delay is modeled by a general distribution to describe the varying delays caused by competition for computing resources among the services.
According to this pattern of the deep learning inference service system, in the experiment of this embodiment the batch-based deep learning inference service system is modeled as an M/G(a,b)/1/N queuing model following D. G. Kendall's notation, where M indicates that the task arrival process obeys a Poisson distribution, G indicates that the inference process obeys a general distribution, a is the lower limit of the inference batch size, b is the upper limit of the inference batch size, 1 is the number of servers, and N is the maximum number of tasks waiting in the batch processing task queue of the queuing system. After the queuing model is established, it is analyzed to obtain the average service delay. Specifically, a closed-form formula for the average service delay of the deep learning inference service system is constructed theoretically, and its result comprises both the queuing delay and the inference delay. Extensive experimental analysis shows that, as the number of images that must be loaded onto the video card grows with the batch and the amount of intermediate data generated during inference increases, the video memory occupancy grows linearly with the batch size, and this relationship can be described by a linear function. Finally, the closed-form formula and the linear function are combined into the objective function of the optimization problem, whose optimization variable is the batch size upper limit b. Since the batch size takes discrete values and the search space is small, the problem can be solved by traversing the search space.
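As an illustration of this traversal-based solution, the Python sketch below enumerates the candidate upper bounds b from 1 to B and keeps the one minimizing E(W(b)) + γ·m_b; the callables avg_service_delay and memory_usage are hypothetical placeholders standing in for the queuing-model evaluation and the fitted linear memory function, not code from the patent.

```python
# Minimal sketch: solve the optimization problem by traversing the discrete
# search space of the batch-size upper bound b (1 <= b <= B).
def solve_by_traversal(avg_service_delay, memory_usage, B, gamma):
    best_b, best_cost = None, float("inf")
    for b in range(1, B + 1):
        cost = avg_service_delay(b) + gamma * memory_usage(b)   # E(W(b)) + gamma * m_b
        if cost < best_cost:
            best_b, best_cost = b, cost
    return best_b, best_cost

# Toy stand-in models, for illustration only.
toy_delay = lambda b: 1.0 / b + 0.002 * b        # delay that first drops, then flattens
toy_memory = lambda b: 500 + 32 * b              # linear memory model m_b = k*b + m_0
print(solve_by_traversal(toy_delay, toy_memory, B=64, gamma=0.001))
```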
In currently popular deep learning frameworks such as TensorFlow, PyTorch, MXNet, and DyNet, when GPU acceleration is used, the images in a batch are organized into a matrix, which is passed to the computation of the deep learning model already loaded on the GPU. On one hand, batch processing lets all images share the convolution kernel weights in convolution operations, reducing the invocation of model parameters and thus the delay; on the other hand, batch processing exploits the parallelism of convolution operations and fully-connected-layer neuron operations, which further utilizes the parallel computing power of the GPU architecture. In the experiments, deep learning model inference is carried out on an RTX 2080 GPU with 8 GB of video memory and a Titan X Pascal (Titan Xp) GPU with 12 GB of memory, using the CUDA v10.0.130 interface and the cuDNN v7.6.2 acceleration library, with PyTorch as the deep learning framework.
In this embodiment, for comparison, the throughput and video memory occupancy of 5 typical deep learning models were tested at different batch sizes, with an input image size of 224 × 224 pixels. Considering that the number of tasks in the queue during service is an arbitrary integer rather than a power of 2, the two models DenseNet-169 and GoogLeNet were chosen, as shown in FIGS. 4 and 5; their FLOPs (floating point operations) are 3.2 × 10^9 and 1.41 × 10^9, respectively. As the batch size is varied from 1 to 64, the throughput first increases rapidly with the batch size and then fluctuates around a certain point, which shows that batch processing can accelerate model inference to a considerable extent; the other tested models exhibit the same characteristics. For GoogLeNet running on the RTX 2080 and the Titan Xp, batch processing improves throughput by up to 7.67 times and 25.27 times, respectively, and the throughput and the GPU utilization follow almost the same trend as the batch size changes. For statistical validity, the throughput and video memory occupancy results in the experiments shown in FIGS. 4 and 5 are averages over 100 runs.
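A throughput/memory measurement of this kind can be sketched with PyTorch and torchvision as below; the patent does not disclose its measurement scripts, so the warm-up count, number of timed runs, and use of random input tensors are assumptions.

```python
# Hedged sketch: measure inference throughput and peak GPU memory of GoogLeNet
# at several batch sizes on a CUDA GPU (details such as warm-up are assumed).
import time
import torch
from torchvision.models import googlenet

device = torch.device("cuda")
model = googlenet(weights=None).eval().to(device)   # randomly initialized GoogLeNet

for batch_size in [1, 2, 4, 8, 16, 32, 64]:
    x = torch.randn(batch_size, 3, 224, 224, device=device)
    torch.cuda.reset_peak_memory_stats(device)
    with torch.no_grad():
        for _ in range(5):                 # warm-up runs
            model(x)
        torch.cuda.synchronize()
        start = time.time()
        for _ in range(100):               # timed runs
            model(x)
        torch.cuda.synchronize()
    elapsed = time.time() - start
    throughput = batch_size * 100 / elapsed                     # images per second
    mem_mb = torch.cuda.max_memory_allocated(device) / 2**20    # peak memory in MB
    print(f"batch={batch_size:3d}  {throughput:8.1f} img/s  {mem_mb:7.1f} MB")
```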
In the experiment, a curve was fitted by the least squares method to analyze the relationship between the batch size and the inference delay. The experiments give the batch inference delay at batch size r as τ_r = v·r + τ_0, where v denotes the slope by which the inference delay grows with the batch size as I/O operations increase, τ_0 denotes the intercept of the inference delay, and v > 0, τ_0 > 0. From this, the service rate μ_r (in batches/s) at batch size r can be expressed as
μ_r = 1 / (v·r + τ_0).
At batch size r, the throughput rate (in images/s) is r × μ_r.
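The fitting step described above can be sketched as follows; the measured delays in the example are illustrative numbers, not the values behind FIGS. 4 and 5.

```python
# Sketch: least-squares fit of the batch inference delay tau_r = v*r + tau_0,
# then the service rate mu_r = 1/(v*r + tau_0) (batches/s) and throughput r*mu_r.
import numpy as np

batch_sizes = np.array([1, 2, 4, 8, 16, 32, 64])
delays_s = np.array([0.004, 0.005, 0.007, 0.011, 0.019, 0.035, 0.067])  # illustrative

v, tau0 = np.polyfit(batch_sizes, delays_s, deg=1)    # slope v > 0, intercept tau_0 > 0

def service_rate(r):
    return 1.0 / (v * r + tau0)                       # mu_r, in batches per second

for r in batch_sizes:
    print(f"r={r:3d}  mu_r={service_rate(r):7.2f} batch/s  "
          f"throughput={r * service_rate(r):8.1f} img/s")
```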
In a scenario where a server provides deep learning model inference services, a large number of clients may submit tasks to the server, and the server organizes the tasks into a queue. The maximum number of tasks waiting in the batch task queue is denoted by N in the experiment. According to service level agreements (SLA), response time is an important metric for cloud or network services; since an excessively large N may cause tasks at the tail of the queue to time out due to high delay, N should not be too large and is set to 128 in this experiment. The service delay W consists of two parts: the queuing delay and the inference delay. To mimic the random arrival of inference tasks in the experiments, the task arrival process is assumed to follow a Poisson distribution.
In a practical system, the tasks arrive randomly and cannot always be served immediately, so the deep learning service delay includes not only the inference delay but also the queuing delay. In the experiment, the service delay W and the blocking probability P_block of GoogLeNet on the RTX 2080 GPU are tested under different system states. The batch size processed by the server is determined as follows: if the number of waiting tasks is less than the batch size upper limit b, the server processes all of them as one batch; otherwise, the batch size processed by the server is the upper limit b. Considering that the batch size is limited by the GPU memory, a maximum value B is set for b in the experiment, i.e., b ≤ B and B ≤ N, with B set to 64. B·μ_B corresponds to the maximum throughput rate of a service process. The flow intensity is defined as
ρ = λ / (B·μ_B),
where μ_B represents the service rate when the batch size is B and λ the task arrival rate; ρ < 1 is required, since once λ is greater than or equal to the maximum throughput rate B·μ_B, further increasing the arrival rate λ only increases P_block.
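A minimal sketch of the batching rule and the stability condition just described is given below; the numeric parameter values are assumptions chosen only to mirror the ρ = 0.75, B = 64 setting used in the experiment.

```python
# Sketch: server-side batch-size decision (serve min(n, b) once at least a tasks
# wait) and the flow-intensity check rho = lambda / (B * mu_B) < 1.
def next_batch_size(n_waiting, a, b):
    if n_waiting < a:
        return None                 # keep waiting until at least a tasks are queued
    return min(n_waiting, b)        # serve all waiting tasks, capped at the upper limit b

def flow_intensity(arrival_rate, B, mu_B):
    return arrival_rate / (B * mu_B)    # rho; must stay below 1 for a stable system

mu_B = 990.0 / (0.75 * 64)              # assumed so that rho = 0.75 at lambda = 990 tasks/s
print(next_batch_size(40, a=1, b=32), flow_intensity(990.0, B=64, mu_B=mu_B))
```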
Comparing the cases a = 1 (a is the lower limit of the batch size; a = 1 corresponds to dynamic batch sizes) and a = b (i.e., a fixed batch size), and setting ρ to 0.75, the task arrival rate is λ = 990, i.e., an average of 990 tasks per second. As shown in FIG. 7, in the case a = 1 the service delay decreases as b is increased while b is still small, and only fluctuates slightly as b increases further. The service delay in the case a = b is larger than in the case a = 1, because a task that arrives first must wait until there are at least b tasks in the queue before batch processing can start. FIG. 8 shows that, with the batch upper limit b fixed, increasing the value of the lower limit a increases the service delay; the corresponding video memory occupation is shown in FIG. 9. It can thus be determined that dynamic batch sizes perform better than fixed batch sizes. In addition, when a = b = 1, i.e., no batching is performed, the average service delay is 781 ms and P_block is 83%. P_block follows almost the same trend as the service delay W, because the lower the service delay, the lower the queuing delay of the tasks, and thus the lower the probability that the queue is full when a task arrives.
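For concreteness, the arrival rate quoted above follows from the chosen flow intensity; the value of B·μ_B below is implied by the patent's numbers rather than stated explicitly:

```latex
\lambda = \rho \, B\mu_B = 0.75 \times B\mu_B = 990\ \text{tasks/s}
\quad\Longrightarrow\quad
B\mu_B = \frac{990}{0.75} = 1320\ \text{images/s}
```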
In deep learning inference computation, GPU video memory is an important resource of the GPU computing process. Unlike CPU computation, a physical GPU cannot precisely limit the video memory usage of a process, but the GPU can be virtualized through remote API technology and PCI pass-through technology, or allocated to different containers for different services through nvidia-docker. The GPU server provides all of its video memory to a process and prepares to allocate more memory when the process applies for it. Because Out-of-Memory and Page Fault are fatal errors for the GPU, the process in which the fault occurs stops running, and different processes compete for GPU memory, it is necessary to reduce the GPU memory usage of a process as much as possible without affecting its running speed. Because each image needs to be loaded into memory during inference and an output tensor is generated at each layer of the neural network, the memory usage m_r is linear in the batch size r. As shown in FIG. 6, the relationship between memory usage and batch size can be expressed as m_r = k·r + m_0, where k is the slope representing the growth of video memory usage with the batch size r, m_0 represents the video memory usage of loading the deep learning model, and k > 0, m_0 > 0.
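A least-squares fit of this linear memory model can be sketched as follows; the sample occupancy values are illustrative and anchored only by the 2558 MB worst case mentioned earlier, not taken from FIG. 6.

```python
# Sketch: fit the linear video-memory model m_r = k*r + m_0 from occupancy
# measured at a few batch sizes (values below are illustrative).
import numpy as np

batch_sizes = np.array([1, 4, 8, 16, 32, 64])
memory_mb = np.array([980, 1080, 1210, 1480, 2010, 2558])

k, m0 = np.polyfit(batch_sizes, memory_mb, deg=1)    # slope k > 0, intercept m_0 > 0
m = lambda r: k * r + m0                             # m_r, used as m_b in formula (1)
print(f"k={k:.1f} MB/image, m0={m0:.1f} MB, predicted m_48={m(48):.0f} MB")
```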
In the deep learning inference service system based on dynamic batch processing, the size of each batch is constrained by a ≤ r ≤ b: when the number of waiting tasks reaches at least a, the server starts batch inference, and the number of tasks in one batch cannot exceed b. The batch size is limited by the GPU video memory, i.e., the maximum value of b is B (b ≤ B). To obtain the average service delay, the number of waiting tasks in the system at any time needs to be analyzed; the evolution of this number over time is a non-Markov process because the service process obeys a general distribution and, in addition, the inference delay depends on the batch size between a and b. To simplify the analysis, the embedded Markov chain (eMC) technique is first used in this experiment to obtain the transition probabilities of a Markov process with two dimensions: the number of queue-waiting tasks n and the batch size r. The embedded Markov process records the number of waiting tasks at the completion of each batch, referred to in this embodiment as the batch departure time, as shown in FIG. 10. The probability matrix of the system state at an arbitrary time is obtained through the relationship between the system states at the batch departure times and the state probabilities at arbitrary times. In this embodiment, X(t) = (n(t), r(t)) denotes the two-dimensional Markov process formed by the evolution of the number of queue-waiting tasks and the departing batch size at the batch departure times, where t indexes the departure times, n(t) is the number of queue-waiting tasks at the t-th batch departure, and r(t) is the size of the t-th departing batch.
The proof that the relationship between the number of waiting tasks and the departing batch size of the deep learning inference service can be represented by a two-dimensional Markov process is as follows:
Let V_{t,t+1}(r) denote the number of tasks arriving between the batch departure times t and t+1 when the batch size is r. The transition relations between n(t) and r(t) in the different cases are derived as follows:
n (t) < a, that is, the number of tasks in the queue at the batch departure time t is less than a, and the server needs to wait until a-n (t) tasks arrive before the inference of the batch size a can be carried out, so that n (t +1) ═ Vt,t+1(a) And r (t +1) ═ a.
A ≦ n (t) ≦ b, i.e., the number of tasks at the time t of batch departure is between a and b, and all n (t) tasks will be inferred as a batch. Thus, there is n (t +1) ═ Vt,t+1(n (t)) and r (t +1) ═ n (t).
N (t) > b, i.e. the number of tasks at the time t when the batch leaves is greater than b, the first b tasks will be inferred as one batch. Thus, there is n (t +1) ═ n (t) — b + Vt,t+1(b) And r (t +1) ═ b.
It can thus be determined that the values of n(t+1) and r(t+1) are determined by n(t), r(t) and V_{t,t+1}(r(t+1)), and that V_{t,t+1}(r(t+1)) follows a Poisson distribution and is memoryless between the batch departure times. Therefore, the relationship between the number of waiting tasks of the deep learning inference service and the departing batch size can be represented by a two-dimensional Markov process.
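The transition rules above can be exercised in a small simulation of the embedded process; the sketch below assumes Poisson arrivals at rate λ and the linear batch service time v·r + τ_0, and the parameter values are illustrative assumptions.

```python
# Sketch: simulate X(t) = (n(t), r(t)) at batch departure times following the
# three cases above, with Poisson arrivals during each batch's (assumed linear)
# service time and a queue capacity of N waiting tasks.
import numpy as np

rng = np.random.default_rng(0)
lam, v, tau0 = 990.0, 0.0005, 0.004     # assumed arrival rate and delay-model parameters
a, b, N = 1, 32, 128

def arrivals_during_batch(r):
    return rng.poisson(lam * (v * r + tau0))          # V_{t,t+1}(r)

def step(n):
    if n < a:                       # wait until a batch of size a can be formed
        r_next, n_next = a, arrivals_during_batch(a)
    elif n <= b:                    # serve all waiting tasks as one batch
        r_next, n_next = n, arrivals_during_batch(n)
    else:                           # serve the first b tasks, the rest keep waiting
        r_next, n_next = b, n - b + arrivals_during_batch(b)
    return min(n_next, N), r_next   # the finite queue truncates the waiting tasks

n, waiting = 0, []
for _ in range(10000):
    n, r = step(n)
    waiting.append(n)
print("mean number of waiting tasks at departures:", np.mean(waiting))
```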
From the Markov property of the process X(t), the probability of each state of the system can be analyzed. The state space of X(t) is composed of the number of remaining tasks, ranging from 0 to N, and the departing batch size, ranging from a to b. A probability transition matrix with dimensions (N+1)(b−a+1) × (N+1)(b−a+1) is used to describe the transitions. To simplify the analysis, the original matrix is divided into sub-matrices, each of size (b−a+1) × (b−a+1), whose elements are denoted θ(·); the value in the parentheses of θ(·) represents a batch size between a and b. To facilitate the derivation, first define, for a batch of r tasks being processed, the probability that j tasks arrive at the server during its service.
The batch size in the parentheses of θ(·) equals a if n(t) ≤ a, equals n(t) if a < n(t) ≤ b, and equals b if n(t) > b. The value of θ(·) is determined in the following cases:
n (t) is less than or equal to b, and N (t +1) is less than or equal to N-1. In this case, the next batch to be reasoned is equal in size to the batch size
Figure BDA0002945694400000126
If n (t) < a, reasoning after waiting for one task to arrive, otherwise, performing batch reasoning on all tasks. Probability of n (t +1) tasks in the server at the completion of the next batch, etcIn that
Figure BDA0002945694400000127
Namely, it is
Figure BDA0002945694400000131
In
Figure BDA0002945694400000132
The other θ (·) is 0.
N (t) ≦ b and N (t +1) ═ N. For the same reason as in the previous case, in this case the batch size to be reasoned for the next batch is equal to
Figure BDA00029456944000001321
Since N (t +1) ═ N, it means that at least N tasks arrive in the service process of the batch of tasks. The probability of there being n (t +1) tasks in the server at the completion of the next batch is equal to
Figure BDA0002945694400000133
Namely, it is
Figure BDA0002945694400000134
In
Figure BDA0002945694400000135
The other θ (·) is 0.
B < N (t). ltoreq.N and N (t + 1). ltoreq.N-1. In this case, the next batch size to be inferred
Figure BDA0002945694400000136
To achieve N (t +1) ≦ N-1, the number of tasks reached is equal to N (t +1) - (N (t) -b). The probability of n (t +1) tasks being present in the server at the completion of the next batch is equal to
Figure BDA0002945694400000137
Namely, it is
Figure BDA0002945694400000138
In
Figure BDA0002945694400000139
The other θ (·) is 0.
B < N (t) ≦ N and N (t +1) ═ N. In this case, the next batch size to be inferred
Figure BDA00029456944000001310
Since N (t +1) ═ N, at least N (t +1) - (N (t) — b) tasks arrive during the batch inference process. The probability of there being n (t +1) tasks in the server at the completion of the next batch is equal to
Figure BDA00029456944000001311
Namely, it is
Figure BDA00029456944000001312
In
Figure BDA00029456944000001313
The other θ (·) is 0.
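Under the additional assumption that the batch service time is deterministic (so that the number of arrivals during a batch of size r is Poisson with mean λ·(v·r + τ_0)), the four cases above can be assembled into a transition matrix as sketched below. For brevity the sketch tracks only the number of waiting tasks n at departure epochs, since the next batch size is a deterministic function of n; the helper names and parameter values are assumptions, not the patent's.

```python
# Sketch: transition matrix of the embedded chain over the number of waiting
# tasks at departure epochs, following the four cases above. Arrivals during a
# batch of size r are assumed Poisson with mean lam * (v*r + tau0).
import numpy as np
from scipy.stats import poisson

lam, v, tau0 = 990.0, 0.0005, 0.004     # assumed parameters
a, b, N = 1, 32, 128

def arrival_pmf(j, r):
    """Probability that j tasks arrive while a batch of r tasks is served."""
    return poisson.pmf(j, lam * (v * r + tau0))

def next_batch(n):
    return max(a, min(n, b))            # batch size started after a departure with n waiting

P = np.zeros((N + 1, N + 1))
for n in range(N + 1):
    r = next_batch(n)
    carry = max(n - b, 0)               # tasks still waiting if n > b
    for n1 in range(carry, N):          # n(t+1) <= N-1 needs exactly n1 - carry arrivals
        P[n, n1] = arrival_pmf(n1 - carry, r)
    P[n, N] = 1.0 - P[n, :N].sum()      # n(t+1) = N: at least N - carry arrivals

assert np.allclose(P.sum(axis=1), 1.0)
```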
From the transition probability matrix, the steady-state probability matrix of X(t) at the batch departure times can be derived by solving the balance equations of the embedded chain. The entries of this matrix give the steady-state probability that, at a batch departure time, there are n tasks in the server's queue and the departing batch size is r. Since all states of the system are to be described, the steady-state probability matrix at the batch departure times must be related to the steady-state probability matrix π of the system at an arbitrary time, where π_{n,r} denotes the probability of each state in π. The functional relationship between the two is as follows. For 0 ≤ n ≤ N − 1, the steady-state probabilities of the system states are given by expressions (presented as image formulas in the original) for the cases r = a, a + 1 ≤ r ≤ b − 1, and r = b, where s_r denotes the mean inference delay when the batch size is r. The steady-state probabilities in the states with n = N are likewise given for r = a, a + 1 ≤ r ≤ b − 1, and r = b, in terms of p_{n,r}(0), the probability that there are n tasks in the queue and the remaining service time of a batch of size r is 0; the expression for p_{n,r}(0) is also given as an image formula in the original.
After the above analysis, the important performance indexes can be obtained by calculation. The average number of tasks E(L) in the system, given by formula (3), comprises both the tasks in the queue and the tasks in service; the average service delay E(W) then follows from Little's law, and the blocking probability P_block is given by formula (4). From λ(1 − P_block), the effective arrival rate can be calculated.
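Continuing the simplified sketch above, the stationary distribution of the embedded chain can be solved numerically and the metrics then follow the verbal description of formulas (3) and (4) (whose exact expressions appear only as images in the original) together with Little's law. Note that, as a rough approximation, the departure-epoch distribution is used here in place of the arbitrary-time distribution that the patent derives, so this is illustrative rather than the patent's exact computation.

```python
# Sketch: stationary distribution of the embedded chain and approximate metrics.
# E(L) counts queued plus in-service tasks (cf. formula (3)), P_block is the
# full-queue probability (cf. formula (4)), and E(W) follows Little's law with
# effective arrival rate lam * (1 - P_block).
import numpy as np

def stationary(P):
    """Solve pi = pi P with sum(pi) = 1 as a least-squares linear system."""
    m = P.shape[0]
    A = np.vstack([P.T - np.eye(m), np.ones(m)])
    rhs = np.zeros(m + 1); rhs[-1] = 1.0
    pi, *_ = np.linalg.lstsq(A, rhs, rcond=None)
    return pi

def performance_metrics(P, lam, a, b, N):
    pi = stationary(P)
    in_service = np.array([max(a, min(n, b)) for n in range(N + 1)])
    E_L = float(np.dot(pi, np.arange(N + 1) + in_service))   # queued + in-service tasks
    P_block = float(pi[N])                                   # probability the queue is full
    E_W = E_L / (lam * (1.0 - P_block))                      # Little's law
    return E_L, P_block, E_W

# Usage with the matrix P and parameters from the previous sketch:
# print(performance_metrics(P, lam, a, b, N))
```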
Through the experiments and the above analysis, it can be determined that in the dynamic-batching deep learning inference service the service delay first decreases and then fluctuates as the upper bound b of the batch size increases, while the video memory occupancy increases monotonically with the batch size. Therefore, in different system states a good balance between delay and memory usage can be achieved by adjusting the value of the upper bound b, which yields the optimization model shown in formula (1). Among the constraints in formula (1), the flow intensity ρ = λ/(B·μ_B) < 1 is required for practical reasons to ensure stable operation of the system, and 1 ≤ b ≤ B is the range of the upper bound of the batch size. Experiments show that E(W(b)) decreases when b is increased for smaller values of b and fluctuates only slightly as b increases further, so γ can be set to a smaller value if the system is not very sensitive to video memory occupation. The solution variable of the optimization problem is the batch size upper bound b, and the goal is to minimize the sum of the service delay E(W(b)) and γ·m_b.
Since the queuing model can accurately predict the service delay under different system states, the influence of the upper bound b of the batch size and the arrival rate λ on the average service delay E(W(b)) can be analyzed with the queuing model, as shown in FIG. 12. The following is found:
When λ < b·μ_b, i.e., b·μ_b is above the arrival rate λ, the service delay E(W(b)) decreases only slightly with increasing b. As can be seen from FIGS. 4 and 5, b·μ_b increases monotonically with b, where μ_b denotes the service rate for batch size b. Since b is the upper limit of the batch size during service, b·μ_b is the upper limit of the throughput rate during service. When b·μ_b is higher than the arrival rate, the server can infer the arriving tasks in time, and further increasing b brings only a marginal reduction of the service delay.
When b·μ_b is equal to or only slightly below λ, the service delay E(W(b)) increases dramatically as b decreases. In this case the task queue of the queuing system builds up a backlog, so that each newly arriving task faces a nearly full task queue.
When λ > b·μ_b, i.e., b·μ_b is below the arrival rate λ, the service delay E(W(b)) keeps increasing as b decreases. In a queuing system with limited queue capacity, the throughput rate is already saturated once the arrival rate exceeds it; in this case lowering the throughput rate, i.e., lowering the value of b, further increases the queuing delay of the tasks.
FIG. 12 reflects how E(W(b)) varies with the batch size upper bound b and the arrival rate λ for the GoogLeNet inference service; the other deep learning models listed in FIG. 1 show the same characteristics due to the similarity of their inference processes.
In this embodiment, after the optimization model shown in formula (1) is determined, it is solved through an iterative process, which is more efficient than brute-force search. The pseudocode of the iterative algorithm is given as an image in the original; its steps, referenced below by their line numbers, are as follows.
the method comprises the following steps: first, when
Figure BDA0002945694400000171
A relatively low average service delay and memory usage can be achieved. For equation λ ═ b μbB in (1) is solved to obtain
Figure BDA0002945694400000172
Will be lambda tau0V (1- λ v) rounding up is to ensure
Figure BDA0002945694400000173
(lines 1-2). Then, the values of E (W (b)) and E (W (b-1)) corresponding to the current b are calculated to obtain the surge condition of the service delay. Because E (W (b)) and E (W (b-1)) correspond to E (W (b)))
Figure BDA0002945694400000174
And
Figure BDA0002945694400000175
average service delay (line 3). Finally, the value of b is adjusted according to the trade-off parameters γ and k. When: 1) when E (W (b-1)) -E (W (b)) < gamma k, it means that the weight of the video memory usage in (OP) is greater than the burst rate of the service delay, and the value of b needs to be reduced to reduce the video memory usage. 2) E (W (b-1)) -E (W (b)) > gamma k, indicating that b can be increased further to obtain lower clothesTraffic delay (lines 7-8, 11). 3) Changing the value of b results in E (W (b)) + gammambBecomes large, and the value of b is the optimal solution b*(lines 15-16). Due to E (W (b)) and mbRespectively, with the monotone decreasing and increasing of b, so E (W (b)) + gamma mbIn the definition domain [1, B]Must have a minimum value, the worst case for the algorithm to search is b*At the end points of the defined domain.
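The steps above can be condensed into the following Python sketch; since the patent's pseudocode is reproduced only as an image, the variable names, step handling, and stopping test here are a reconstruction of the textual description rather than the original algorithm, and E_W stands for any callable that evaluates the average service delay from the queuing model.

```python
# Sketch of the iterative search for the batch-size upper bound b*: start from
# b = ceil(lam*tau0 / (1 - lam*v)) so that b*mu_b >= lam (lines 1-2), compare the
# delay surge E(W(b-1)) - E(W(b)) with gamma*k to pick a direction (line 3 and
# cases 1)/2)), and stop once E(W(b)) + gamma*m_b gets worse (case 3)).
import math

def search_b(E_W, lam, v, tau0, gamma, k, m0, B, step=1):
    mem = lambda b: k * b + m0                       # linear memory model m_b
    cost = lambda b: E_W(b) + gamma * mem(b)         # objective of formula (1)

    b = min(B, max(1, math.ceil(lam * tau0 / (1 - lam * v))))
    direction = -1 if b > 1 and E_W(b - 1) - E_W(b) < gamma * k else 1

    best_b, best_cost = b, cost(b)
    while 1 <= b + direction * step <= B:
        b += direction * step
        c = cost(b)
        if c > best_cost:                            # objective became worse: stop
            break
        best_b, best_cost = b, c
    return best_b
```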
The method is implemented on the deep learning framework PyTorch, and the performance of the optimization method is evaluated with an NVIDIA RTX 2080 GPU and the deep learning model GoogLeNet. FIG. 13 compares the service delay of the method of the present invention with different static batch processing configurations for the GoogLeNet inference service on the NVIDIA RTX 2080 GPU under varying task arrival rates, where the arrival rate transitions through 330, 800, 730, 930, 1120, 990, 330, 530, 670, 400 tasks/second and 50000 tasks arrive at each arrival rate. FIG. 14 compares, under the same varying task arrival rates, the video memory occupation of the method of the present invention with different static batch processing configurations and with greedy dynamic batch processing for the GoogLeNet inference service on the NVIDIA RTX 2080 GPU. Compared with the single-task processing method, the optimization method is 31 times faster; compared with the batch processing method with the optimal fixed batch size, as shown in FIGS. 13 and 14, the method of the present invention is 2.2 times faster and its GPU video memory occupation is 0.8 times as large; compared with the greedy dynamic batch processing method, the GPU video memory occupation is only 0.3 times as large while the service delay is essentially the same.
The foregoing is merely illustrative of the preferred embodiments of the invention and is not to be construed as limiting the invention in any way. Although the present invention has been described with reference to the preferred embodiments, it is not intended to be limited thereto. Therefore, any simple modification, equivalent change or variation made to the above embodiments according to the technical essence of the present invention, without departing from the content of the technical solution of the present invention, shall fall within the protection scope of the technical solution of the present invention.

Claims (6)

1. A dynamic batch processing task scheduling method of deep learning inference service is characterized in that:
describing the number of queue-waiting tasks at each batch departure time and the departing batch size by a two-dimensional Markov process, determining the steady-state probability of the two-dimensional Markov process, and determining the average service delay in the deep learning inference service system according to the steady-state probability;
optimizing the average service delay and the memory usage over the upper limit of the batch size of the batch processing tasks by the optimization model shown in formula (1),
min_b  E(W(b)) + γ·m_b,   subject to  1 ≤ b ≤ B ≤ N,  λ < B·μ_B        (1)
In formula (1), E(W(b)) is the average service delay corresponding to the batch size upper limit b, b is the upper limit of the batch size of the batch processing tasks, W(b) is the service delay, γ is the weight of the memory usage relative to the average service delay, m_b is the memory usage corresponding to the batch size upper limit b, B is the maximum value of the upper limit of the batch size, N is the maximum number of tasks waiting in the batch processing task queue, λ is the task arrival rate, and μ_B is the service rate when the batch size is B; solving the optimization model of formula (1) determines the upper limit of the batch size of the batch processing tasks.
2. The dynamic batch task scheduling method of deep learning inference service of claim 1, wherein:
the average service delay is determined by the calculation of equation (2),
E(W(b)) = E(L) / (λ·(1 − P_block))        (2)
In formula (2), E(W(b)) is the average service delay corresponding to the batch size upper limit b, L is the average task number, λ is the task arrival rate, and P_block is the blocking probability of a task.
3. The dynamic batch task scheduling method of deep learning inference service of claim 2, characterized in that:
the average task number is determined by equation (3),
[Formula (3) appears as an image in the original document.]
the blocking probability is determined by equation (4),
[Formula (4) appears as an image in the original document.]
In formulas (3) and (4), E(L) is the average task number, n is the number of waiting tasks in the batch processing task queue, r is the batch size, a is the lower limit of the batch size of the batch processing tasks, b is the upper limit of the batch size of the batch processing tasks, π_{n,r} is the steady-state probability that the number of waiting tasks is n and the batch size is r, π_{n,0} is the steady-state probability that the number of waiting tasks is n and the batch size is 0, and π_{N,r} is the steady-state probability that the number of waiting tasks is N and the batch size is r.
4. The method for scheduling the dynamic batch processing task of the deep learning inference service as claimed in claim 3, wherein the solving process of the optimization model comprises:
initializing the upper limit of the batch size of the batch processing task and the step length for adjusting the upper limit of the batch size in each iteration; taking the sum of the average service delay and the memory usage corresponding to the upper limit of the batch size as a convergence parameter; and in each iteration, adjusting the upper limit of the batch size according to the step length, and when the convergence parameter obtained in the current iteration is larger than the convergence parameter of the previous iteration, taking the upper limit of the batch size obtained in the current iteration as the optimal solution output by the optimization model.
5. The method for scheduling the dynamic batch processing task of the deep learning inference service as claimed in claim 4, wherein in the first iteration, the method further comprises a process of correcting the adjustment direction of the step size: and when the difference between the average service delay obtained in the first iteration and the average service delay corresponding to the initialized upper limit of the batch size is larger than a preset threshold value, changing the adjustment direction for adjusting the upper limit of the batch size.
6. A dynamic batch processing task scheduling system of deep learning inference service, characterized in that task scheduling is performed according to the dynamic batch processing task scheduling method of deep learning inference service of any one of claims 1 to 5.
CN202110192645.XA 2021-02-20 2021-02-20 Dynamic batch task scheduling method and system for deep learning reasoning service Active CN112860402B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202110192645.XA CN112860402B (en) 2021-02-20 2021-02-20 Dynamic batch task scheduling method and system for deep learning reasoning service

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202110192645.XA CN112860402B (en) 2021-02-20 2021-02-20 Dynamic batch task scheduling method and system for deep learning reasoning service

Publications (2)

Publication Number Publication Date
CN112860402A true CN112860402A (en) 2021-05-28
CN112860402B CN112860402B (en) 2023-12-05

Family

ID=75988278

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202110192645.XA Active CN112860402B (en) 2021-02-20 2021-02-20 Dynamic batch task scheduling method and system for deep learning reasoning service

Country Status (1)

Country Link
CN (1) CN112860402B (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961328A (en) * 2021-10-26 2022-01-21 深圳大学 Task processing method and device, storage medium and electronic equipment
CN117376423A (en) * 2023-12-08 2024-01-09 西南民族大学 Deep learning reasoning service scheduling method, system, equipment and storage medium

Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN106792779A (en) * 2016-12-30 2017-05-31 浙江大学 It is a kind of to permit and exempting from the cellular network connection control method of licensed band work
US20180121601A1 (en) * 2016-10-28 2018-05-03 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods for performing secondary and/or tertiary processing
CN110312272A (en) * 2019-07-23 2019-10-08 中南大学 A kind of network services block resource allocation methods and storage medium
CN112346866A (en) * 2020-11-05 2021-02-09 中国科学院计算技术研究所 GPU (graphics processing Unit) scheduling method and system based on asynchronous data transmission

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20180121601A1 (en) * 2016-10-28 2018-05-03 Edico Genome, Corp. Bioinformatics systems, apparatuses, and methods for performing secondary and/or tertiary processing
CN106792779A (en) * 2016-12-30 2017-05-31 浙江大学 It is a kind of to permit and exempting from the cellular network connection control method of licensed band work
CN110312272A (en) * 2019-07-23 2019-10-08 中南大学 A kind of network services block resource allocation methods and storage medium
CN112346866A (en) * 2020-11-05 2021-02-09 中国科学院计算技术研究所 GPU (graphics processing Unit) scheduling method and system based on asynchronous data transmission

Non-Patent Citations (8)

* Cited by examiner, † Cited by third party
Title
C.ZHOU ET AL.: ""Delay-Aware IoT Task Scheduling in Space-Air-Ground Integrated Network"", 《2019 IEEE GLOBAL COMMUNICATIONS CONFERENCE (GLOBECOM)》 *
LIN, LEI, QIAN WANG, AND ADEL W. SADEK.: ""Border crossing delay prediction using transient multi-server queueing models"", 《TRANSPORTATION RESEARCH PART A: POLICY AND PRACTICE64 (2014)》 *
PANDA, GOPINATH, ABHIJIT DATTA BANIK, AND DIBYAJYOTI GUHA.: ""Stationary analysis and optimal control under multiple working vacation policy in a GI/M (a, b)/1 queue"", 《JOURNAL OF SYSTEMS SCIENCE AND COMPLEXITY 31 (2018)》 *
ZHANG, DEYU, ET AL.: ""Delay-optimal proactive service framework for block-stream as a service"", 《IEEE WIRELESS COMMUNICATIONS LETTERS 7.4 (2018)》, pages 598 - 601 *
ZHAO, WENJUAN, XIUSHUANG WANG, SHUNFU JIN, WUYI YUE, AND YUTAKA TAKAHASHI.: ""An Energy Efficient Task Scheduling Strategy in a Cloud Computing System and its Performance Evaluation using a Two-Dimensional Continuous Time Markov Chain Model"", 《ELECTRONICS 8》 *
何华; 林闯; 赵增华; 庞善臣: "Modeling and performance analysis of Hadoop fair scheduling using deterministic and stochastic Petri nets", Journal of Computer Applications
王斐: "Research on cloud service optimization technology based on *** scheduling and stochastic algorithms", China Doctoral Dissertations Full-text Database, Basic Sciences
赵海军; 崔梦天; 李明东; 何先波: "Research on QoS performance of broadband wireless access networks based on CTMC and state space models", Acta Electronica Sinica

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113961328A (en) * 2021-10-26 2022-01-21 深圳大学 Task processing method and device, storage medium and electronic equipment
CN117376423A (en) * 2023-12-08 2024-01-09 西南民族大学 Deep learning reasoning service scheduling method, system, equipment and storage medium
CN117376423B (en) * 2023-12-08 2024-03-12 西南民族大学 Deep learning reasoning service scheduling method, system, equipment and storage medium

Also Published As

Publication number Publication date
CN112860402B (en) 2023-12-05

Similar Documents

Publication Publication Date Title
JP6539236B2 (en) System and method for use in effective neural network deployment
US20220391771A1 (en) Method, apparatus, and computer device and storage medium for distributed training of machine learning model
CN108885571B (en) Input of batch processing machine learning model
US10140572B2 (en) Memory bandwidth management for deep learning applications
CN113950066A (en) Single server part calculation unloading method, system and equipment under mobile edge environment
US20210295168A1 (en) Gradient compression for distributed training
CN109657793B (en) Model training method and device, storage medium and electronic equipment
EP3602419B1 (en) Neural network optimizer search
CN110929839B (en) Method and device for training neural network, electronic equipment and computer storage medium
CN112860402B (en) Dynamic batch task scheduling method and system for deep learning reasoning service
CN113469355B (en) Multi-model training pipeline in distributed system
CN110795235B (en) Method and system for deep learning and cooperation of mobile web
CN110795246A (en) Resource utilization rate prediction method and device
CN110531996B (en) Particle swarm optimization-based computing task unloading method in multi-micro cloud environment
WO2019001323A1 (en) Signal processing system and method
CN114356540A (en) Parameter updating method and device, electronic equipment and storage medium
CN114514536A (en) Neural network training in distributed systems
CN114240506A (en) Modeling method of multi-task model, promotion content processing method and related device
CN109840597B (en) Model prediction method and device, electronic equipment and storage medium
CN110489955B (en) Image processing, device, computing device and medium applied to electronic equipment
CN111858916B (en) Method and device for clustering sentences
WO2023284347A1 (en) Task execution method and apparatus
CN113361621B (en) Method and device for training model
CN110782017B (en) Method and device for adaptively adjusting learning rate
CN114298329A (en) Model training method, device, equipment and storage medium

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant