CN116932290A - Data processing system for obtaining target model - Google Patents

Data processing system for obtaining target model

Info

Publication number
CN116932290A
CN116932290A
Authority
CN
China
Prior art keywords
target
request
model
key
preset
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
CN202311196936.1A
Other languages
Chinese (zh)
Other versions
CN116932290B (en)
Inventor
于伟
赵洲洋
靳雯
王全修
石江枫
王林
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Rizhao Ruian Information Technology Co ltd
Beijing Rich Information Technology Co ltd
Original Assignee
Rizhao Ruian Information Technology Co ltd
Beijing Rich Information Technology Co ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Rizhao Ruian Information Technology Co ltd, Beijing Rich Information Technology Co ltd filed Critical Rizhao Ruian Information Technology Co ltd
Priority to CN202311196936.1A priority Critical patent/CN116932290B/en
Publication of CN116932290A publication Critical patent/CN116932290A/en
Application granted granted Critical
Publication of CN116932290B publication Critical patent/CN116932290B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1402 Saving, restoring, recovering or retrying
    • G06F 11/1415 Saving, restoring, recovering or retrying at system level
    • G06F 11/1443 Transmit or communication errors
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 11/00 Error detection; Error correction; Monitoring
    • G06F 11/07 Responding to the occurrence of a fault, e.g. fault tolerance
    • G06F 11/14 Error detection or correction of the data by redundancy in operation
    • G06F 11/1479 Generic software techniques for error detection or fault masking
    • G06F 11/1489 Generic software techniques for error detection or fault masking through recovery blocks
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/505 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering the load
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F 9/00
    • G06F 2209/50 Indexing scheme relating to G06F 9/50
    • G06F 2209/5022 Workload threshold
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F 9/00
    • G06F 2209/50 Indexing scheme relating to G06F 9/50
    • G06F 2209/505 Clust
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 2209/00 Indexing scheme relating to G06F 9/00
    • G06F 2209/50 Indexing scheme relating to G06F 9/50
    • G06F 2209/508 Monitor
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Quality & Reliability (AREA)
  • Software Systems (AREA)
  • Data Exchanges In Wide-Area Networks (AREA)

Abstract

The application relates to the technical field of request data processing, and in particular to a data processing system for acquiring a target model. The system comprises a gateway node ID, a set of preset model IDs, a processor, and a memory storing a computer program which, when executed by the processor, implements the following steps: when a target request suffers a request fault, acquire the target request number corresponding to the target request and the current retry number of the key target request; if the current retry number is smaller than the target request number, acquire a target retry interval and send the key target request to the gateway node according to that interval, so as to invoke the target model corresponding to the key target request; if the target model is not invoked, adjust the retry interval and continue sending the key target request. The application schedules target requests reasonably by adjusting the retry number and the retry interval, thereby making reasonable use of system resources.

Description

Data processing system for obtaining target model
Technical Field
The application relates to the technical field of request data processing, in particular to a data processing system for acquiring a target model.
Background
At present, distributed clusters used in fields such as data recognition, data query, image recognition and image processing generally deploy a plurality of models for users to call, so that a corresponding model can be provided according to the service request sent by a user terminal and text or image processing can be performed. In actual operation, however, request faults occasionally occur, and the prior art still has the following problems:
On the one hand, when a request fault occurs, the fault is generally fed back to the user terminal, and the request is sent again only after the user terminal takes corresponding processing or repair measures; requests therefore cannot be scheduled reasonably, and the invocation of the target model is delayed.
On the other hand, when the utilization rate of a certain model is high and many subsequent requests call that model, the number of copies of the model may be insufficient, so that multiple requests have to queue, which affects the processing efficiency of the service.
Disclosure of Invention
Aiming at the technical problems, the application adopts the following technical scheme:
a data processing system for acquiring a target model, the system comprising: gateway node ID, preset model ID set a= { a 1 ,A 2 ,……,A i ,……,A m Copy ID set A of preset model corresponding to the sequence number (A) 0 ={A 0 1 ,A 0 2 ,……,A 0 i ,……,A 0 m A processor and a memory storing a computer program, wherein A i For the ith preset model ID, A 0 i Is A i The corresponding set of duplicate IDs of the preset model, i=1, 2 … … m, m is the number of preset model IDs, when the computer program is executed by the processor, the following steps are implemented:
s100, when a request fault occurs in a target request, acquiring target request times H corresponding to the target request 0 Wherein H is 0 Meets the following conditions:
H 0 =INT(H max ×R 0 ) Wherein INT () is a rounding function, H max For the preset maximum retry number, R 0 The latest total utilization rate of a target model corresponding to the target request is obtained, wherein the target model is any A in A i A corresponding preset model.
S200, acquiring the current retry number H of the key target request; where H is the number of times the critical target request has been sent.
S300, if H is less than H 0 And acquiring a target retry interval G corresponding to the key target request.
S400, according to G, sending a key target request to a gateway node corresponding to the gateway node ID so as to call a target model corresponding to the key target request.
S500, when the target model corresponding to the key target request is not called, executing S200-S500 again until the target model corresponding to the key target request is called or until H=H 0 When this is the case, the cycle is ended.
Compared with the prior art, the data processing system for acquiring a target model has obvious beneficial effects, can achieve considerable technical progress and practicality, has wide industrial utilization value, and provides at least the following beneficial effects:
The application provides a data processing system for acquiring a target model, the system comprising: a gateway node ID, a set of preset model IDs, the corresponding sets of preset-model copy IDs, a processor, and a memory storing a computer program which, when executed by the processor, implements the following steps: when a target request suffers a request fault, acquire the target request number corresponding to the target request and the current retry number of the key target request; if the current retry number is smaller than the target request number, acquire the target retry interval corresponding to the key target request and send the key target request to the gateway node corresponding to the gateway node ID according to that interval, so as to invoke the target model corresponding to the key target request; if the target model is not invoked, adjust the retry interval and continue sending the key target request. On the one hand, when the target model is not invoked, the target retry interval is adjusted according to the current retry number and the target request is re-sent at the adjusted interval, which realizes reasonable scheduling of the multiple target requests corresponding to the target model and reasonable use of system resources. On the other hand, the number of copies of a preset model can be adjusted automatically according to the total utilization rate of that preset model and the weight of the target requests corresponding to it, which improves the service processing efficiency of the target requests while ensuring the performance of the distributed cluster and the reasonable use of its resources.
Drawings
In order to more clearly illustrate the technical solutions of the embodiments of the present application, the drawings required for the description of the embodiments are briefly described below. It is apparent that the drawings in the following description relate only to some embodiments of the present application, and a person of ordinary skill in the art may obtain other drawings from them without inventive effort.
FIG. 1 is a flowchart of a data processing system executing a computer program for obtaining a target model according to an embodiment of the present application.
Detailed Description
The following description of the embodiments of the present application will be made clearly and completely with reference to the accompanying drawings, in which it is apparent that the embodiments described are only some embodiments of the present application, but not all embodiments. All other embodiments, which can be made by those skilled in the art based on the embodiments of the application without making any inventive effort, are intended to fall within the scope of the application.
It should be noted that the terms "first," "second," and the like in the description and the claims of the present application and the above figures are used for distinguishing between similar objects and not necessarily for describing a particular sequential or chronological order. It is to be understood that the data so used may be interchanged where appropriate such that the embodiments of the application described herein may be implemented in sequences other than those illustrated or otherwise described herein. Furthermore, the terms "comprises," "comprising," and "having," and any variations thereof, are intended to cover a non-exclusive inclusion, such that a process, method, system, article, or server that comprises a list of steps or elements is not necessarily limited to those steps or elements expressly listed or inherent to such process, method, article, or apparatus, but may include other steps or elements not expressly listed or inherent to such process, method, article, or apparatus.
The application provides a data processing system for acquiring a target model, the system comprising: a gateway node ID, a preset model ID set A = {A_1, A_2, …, A_i, …, A_m}, a corresponding set of preset-model copy ID sets A^0 = {A^0_1, A^0_2, …, A^0_i, …, A^0_m}, a processor, and a memory storing a computer program, wherein A_i is the i-th preset model ID, A^0_i is the copy ID set of the preset model corresponding to A_i, i = 1, 2, …, m, and m is the number of preset model IDs; when the computer program is executed by the processor, the following steps are implemented, as shown in FIG. 1:
S100, when a request fault occurs in a target request, acquiring the target request number H_0 corresponding to the target request.
Specifically, the preset model ID refers to a unique identity of a preset model, where the preset model is a model that processes an image or performs data recognition on a target file, and the target file is a file including character information, for example: text, tables.
Specifically, a copy ID is a unique identity of a copy, distinct from the copy IDs of the other preset models; it can be understood that any two copy IDs of the same preset model are different, and that the number of copies of a preset model is the number of standby models corresponding to that preset model.
Specifically, the request fault is a situation that a target request is in error or the target request fails.
Specifically, H_0 meets the following condition:
H_0 = INT(H_max × R_0), wherein INT() is a rounding function, H_max is the preset maximum retry number, R_0 is the latest total utilization rate of the target model corresponding to the target request, and the target model is the preset model corresponding to any A_i in A.
Specifically, the preset maximum retry number is the maximum number of retries, preset by a person skilled in the art, that may be performed after the target request suffers a request fault.
Specifically, R_0 meets the following condition:
R_0 = σ × (R_0^1 × 0.4 + R_0^2 × 0.3 + R_0^3 × 0.3), wherein R_0^1 is the CPU utilization rate corresponding to the target model, R_0^2 is the memory utilization rate corresponding to the target model, R_0^3 is the video memory utilization rate corresponding to the target model, and σ is the number of copies corresponding to the target model.
As above, R_0, the latest total utilization rate, is used when calculating the target request number, so that the target request number corresponding to the target request can be determined reasonably from the total utilization rate, which prevents the target request from simply continuing to retry up to the maximum retry number.
In a specific embodiment, σ is acquired by the following steps:
S101, from A^0, acquiring the target-model copy ID set A^0_i = {A^0_i1, A^0_i2, …, A^0_iu, …, A^0_iv} corresponding to the target model, wherein A^0_iu is the u-th target-model copy ID corresponding to the target model, u = 1, 2, …, v, and v is the number of target-model copy IDs corresponding to the target model.
S102, determining v as σ.
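For illustration only, the following is a minimal Python sketch of S100-S102 under the formulas above; the dictionary of copy IDs, the preset maximum retry number and the utilization figures are assumed example values, not part of the claimed system.

```python
# Illustrative copy ID sets A^0_i, keyed by preset model ID A_i (assumed data).
copy_ids = {
    "A_1": ["A_1-copy-1", "A_1-copy-2"],
    "A_2": ["A_2-copy-1"],
}

H_MAX = 8  # preset maximum retry number H_max (assumed value)

def total_utilization(model_id, cpu_util, mem_util, vram_util):
    """R_0 = sigma * (0.4 * CPU + 0.3 * memory + 0.3 * video memory)."""
    sigma = len(copy_ids[model_id])  # S101-S102: sigma is the copy count v
    return sigma * (0.4 * cpu_util + 0.3 * mem_util + 0.3 * vram_util)

def target_request_number(model_id, cpu_util, mem_util, vram_util):
    """S100: H_0 = INT(H_max * R_0), a retry budget scaled by the model's load."""
    r0 = total_utilization(model_id, cpu_util, mem_util, vram_util)
    return int(H_MAX * r0)  # INT() taken here as truncation toward zero

# Example: a model with two copies under moderate load.
print(target_request_number("A_1", cpu_util=0.35, mem_util=0.40, vram_util=0.25))  # -> 5
```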
S200, obtaining the current retry number H of the key target request.
Specifically, H is the number of times the key target request has been sent.
Specifically, the key target request is a request, identical to the target request, that is sent to the gateway node corresponding to the gateway node ID after the target request suffers a request fault; it can be understood as the retry sent after the target request fails.
By acquiring the current retry number of the key target request, the current retry number can be compared with the target request number to determine whether to retry the key target request again.
S300, if H < H_0, acquiring the target retry interval G corresponding to the key target request.
Specifically, G meets the following condition:
G = G_0 + 2^H, wherein G_0 is the preset initial retry interval.
Specifically, the preset initial retry interval is a time interval, preset by a person skilled in the art, between the failure of the target request and the first sending of the key target request.
When the current retry number is smaller than the target request number, the target retry interval is recalculated on the basis of the initial retry interval, and the next retry of the key target request is performed with a gradually increasing target retry interval, which reduces excessive occupation of system resources and the impact on other requests.
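To illustrate how G grows, here is a small Python sketch of G = G_0 + 2^H; the 2-second initial interval is an assumed value.

```python
G0_SECONDS = 2.0  # preset initial retry interval G_0 (assumed value)

def retry_interval(current_retries: int) -> float:
    """G = G_0 + 2^H: the waiting time grows exponentially with the retry count H."""
    return G0_SECONDS + 2 ** current_retries

# H = 0, 1, 2, 3 -> 3.0, 4.0, 6.0, 10.0 seconds
print([retry_interval(h) for h in range(4)])
```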
S400, according to G, sending the key target request to the gateway node corresponding to the gateway node ID so as to invoke the target model corresponding to the key target request; it can be understood that: when the time elapsed since the request fault of the last target request or key target request reaches G, the key target request is sent to the gateway node again.
In this way, the key target request is sent at the target retry interval, which avoids frequent retries of the target request interfering with the sending of other requests and thus ensures reasonable and stable operation of the system.
In a specific embodiment, S400 further includes the following steps:
S401, if the key target request is successfully sent to the gateway node, acquiring, through the gateway node, the target model ID corresponding to the key target request from A;
S402, invoking the target model corresponding to the key target request according to the target model ID.
S500, when the target model corresponding to the key target request is not invoked, executing S200-S500 again until the target model corresponding to the key target request is invoked or until H = H_0, at which point the loop ends; it can be understood that: if the sending of the key target request fails, S200-S500 are executed again.
When the target model is invoked, the re-sending of the key target request ends; if the target model has not been obtained, the key target request is re-sent until the target model is invoked within the target request number; if the target model is still not invoked once the retry number reaches the target request number, the loop ends. This prevents the request from being retried indefinitely and reduces the waste of system resources and the impact on system performance.
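Putting S100-S500 together, the retry loop might be sketched in Python as follows; send_to_gateway and model_invoked are hypothetical callables standing in for the gateway interaction, which the embodiment does not specify.

```python
import time

def send_with_retry(send_to_gateway, model_invoked, h0: int, g0: float = 2.0) -> bool:
    """Retry the key target request until the target model is invoked or H reaches H_0.

    send_to_gateway(): sends the key target request to the gateway node (hypothetical).
    model_invoked():   returns True once the target model has been invoked (hypothetical).
    h0: target request number H_0 from S100.
    g0: preset initial retry interval G_0 in seconds (assumed value).
    """
    h = 0                        # S200: current retry number H
    while h < h0:                # S300: retry only while H < H_0
        time.sleep(g0 + 2 ** h)  # S300: wait the target retry interval G = G_0 + 2^H
        send_to_gateway()        # S400: re-send the key target request to the gateway node
        h += 1
        if model_invoked():      # S500: stop once the target model has been invoked
            return True
    return False                 # H reached H_0 without a successful invocation
```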
In another specific embodiment, the computer program, when executed by the processor, further implements the following steps:
S1, acquiring, within a first preset time interval, the total utilization rate R of the preset model corresponding to any A_i.
Specifically, the first preset time interval is a time interval set by a person skilled in the art according to actual requirements; for example: half an hour, one hour, etc.; it can be understood that: s1 and subsequent steps are performed every half hour or every hour to complete the expansion or contraction of the distributed cluster.
Specifically, R meets the following condition:
R = σ_0 × (R_1 × 0.4 + R_2 × 0.3 + R_3 × 0.3), wherein R_1 is the CPU utilization rate of the preset model corresponding to A_i, R_2 is the memory utilization rate of the preset model corresponding to A_i, R_3 is the video memory utilization rate of the preset model corresponding to A_i, and σ_0 is the acquired number of copies corresponding to A_i.
Specifically, the number of copies corresponding to A_i is the number of standby models corresponding to A_i.
Specifically, R_1 meets the following condition:
R_1 = η_1 × T / CL, wherein η_1 is the CPU utilization coefficient, T is the average completion time of the tasks corresponding to the plurality of requests that call the preset model corresponding to A_i within a historical time period, and CL is the CPU limit value; the historical time period is a time period set by a person skilled in the art according to actual requirements and is not described further herein.
Specifically, R_2 meets the following condition:
R_2 = η_2 × S / ML, wherein η_2 is the memory utilization coefficient, S is the memory size occupied by the preset model corresponding to A_i, and ML is the memory limit value.
Specifically, R_3 meets the following condition:
R_3 = η_3 × S × L / GL, wherein η_3 is the video memory utilization coefficient, L is the average data volume of the services processed by the preset model corresponding to A_i within the historical time period, and GL is the video memory limit value.
In a specific embodiment, η_1, η_2 and η_3 are all obtained by testing in the actual environment, specifically as follows:
η_1 = T_1 / N_1 / CL, wherein T_1 is the total execution time of the preset model corresponding to A_i on N_1 CPU models, and N_1 is the number of different CPU models used in the test.
η_2 = F / N_2 / ML, wherein F is the total memory size occupied by N_2 preset models of different sizes corresponding to A_i, and N_2 is the number of preset models corresponding to A_i used in the test. Preset models of different sizes corresponding to A_i can be understood as target models trained with different samples, or target models configured with different parameters, according to actual requirements.
η_3 = T_2 / N_3 / GL, wherein T_2 is the total execution time of the preset model corresponding to A_i on N_3 GPU models, and N_3 is the number of different GPU models used in the test.
Specifically, the CPU limit value, the memory limit value and the video memory limit value are all hardware parameter values corresponding to the distributed cluster and are not described further herein.
The total utilization rate is calculated once every first preset time interval, so that the resource usage of the target model in the distributed cluster is obtained in time for the subsequent expansion and contraction processing of the distributed cluster.
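The utilization bookkeeping of S1 can be sketched in Python as follows, combining the η coefficients with the formula for R; the limit values and test measurements are purely illustrative assumptions.

```python
def eta_coefficients(t1, n1, cl, f, n2, ml, t2, n3, gl):
    """Coefficients measured in a test environment:
    eta_1 = T_1/N_1/CL (CPU), eta_2 = F/N_2/ML (memory), eta_3 = T_2/N_3/GL (video memory)."""
    return t1 / n1 / cl, f / n2 / ml, t2 / n3 / gl

def total_model_utilization(copies, eta, t, cl, s, ml, l, gl):
    """R = sigma_0 * (0.4*R_1 + 0.3*R_2 + 0.3*R_3) for one preset model A_i."""
    eta1, eta2, eta3 = eta
    r1 = eta1 * t / cl      # R_1: CPU utilization rate
    r2 = eta2 * s / ml      # R_2: memory utilization rate
    r3 = eta3 * s * l / gl  # R_3: video memory utilization rate
    return copies * (0.4 * r1 + 0.3 * r2 + 0.3 * r3)

# Assumed test measurements and limits (CL, ML, GL) for one model with 2 copies.
eta = eta_coefficients(t1=12.0, n1=3, cl=4.0, f=48.0, n2=3, ml=16.0, t2=72.0, n3=3, gl=24.0)
print(total_model_utilization(copies=2, eta=eta, t=0.8, cl=4.0, s=6.0, ml=16.0, l=2.0, gl=24.0))
```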
S2, if R > λ_1, updating, according to A^0_i, the number of copies corresponding to A_i to the first target number δ_(k+1), so as to complete the capacity expansion of the distributed cluster; wherein λ_1 is the first target threshold.
Specifically, δ_(k+1) meets the following condition:
δ_(k+1) = δ_k + 1, wherein δ_k is the first current copy number corresponding to A_i.
Specifically, the first current copy number is the number of copies corresponding to A_i acquired before the current update; it can be understood that the first current copy number is the acquired number of target model copies.
Specifically, the first target threshold is a threshold set by a person skilled in the art according to actual requirements, for example 80%; it can be understood that: when the total utilization rate exceeds 80% of the total resources of the distributed cluster, the number of copies of the preset model corresponding to A_i is increased to complete the capacity expansion processing of the distributed cluster.
When the total utilization rate of the distributed cluster exceeds the first target threshold, that is, the set upper limit, the copies of the target model are being called frequently, and the number of copies of the target model is therefore increased so that subsequent requests have a sufficient number of copies to call.
S3, if R ≤ λ_1, acquiring the target weight W corresponding to the target request.
Specifically, the target request is any request, sent by the user terminal and obtained from the gateway node corresponding to the gateway node ID, that calls the preset model corresponding to A_i.
Specifically, W meets the following condition:
W = W_1/W_1^0 × 0.5 + W_2/W_2^0 × 0.35 + W_3/W_3^0 × 0.15, wherein W_1 is the request amount of the target request in the minute preceding the current time, W_1^0 is the total request amount in the minute preceding the current time, W_2 is the request amount of the target request in the ten minutes preceding the current time, W_2^0 is the total request amount in the ten minutes preceding the current time, W_3 is the request amount of the target request in the hour preceding the current time, and W_3^0 is the total request amount in the hour preceding the current time.
Specifically, the current time is the time at which acquisition of W starts.
Specifically, the total request amount in the minute preceding the current time is the sum of the request amounts, sent by the user terminal in that minute, for calling each of the m preset models.
Specifically, the total request amount in the ten minutes preceding the current time and the total request amount in the hour preceding the current time are obtained in the same way as the total request amount in the minute preceding the current time.
When the total utilization rate has not reached the set upper limit, the target weight of the target request is calculated, so that the two conditions can be judged together to decide whether to expand or contract the distributed cluster.
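A minimal Python sketch of the target weight W follows; the request counters are assumed to be collected from gateway-side statistics, which the embodiment leaves open.

```python
def target_weight(model_counts, total_counts):
    """W = W_1/W_1^0*0.5 + W_2/W_2^0*0.35 + W_3/W_3^0*0.15.

    model_counts: requests calling this preset model in the last minute, ten minutes, hour.
    total_counts: requests calling all m preset models over the same three windows.
    """
    (w1, w2, w3), (t1, t2, t3) = model_counts, total_counts
    return w1 / t1 * 0.5 + w2 / t2 * 0.35 + w3 / t3 * 0.15

# Example: a model receiving a growing share of recent traffic.
print(target_weight(model_counts=(120, 800, 3000), total_counts=(400, 4000, 20000)))  # -> 0.2425
```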
In another specific embodiment, S3 further includes the following steps:
S31, acquiring, within a second preset time interval, the target request set B = {B_1, B_2, …, B_p, …, B_q} corresponding to A_i, wherein B_p is the p-th target request in the time period corresponding to any second preset time interval, p = 1, 2, …, q, and q is the number of target requests in the time period corresponding to any second preset time interval.
Specifically, the second preset time interval is a time interval set by a person skilled in the art according to actual requirements, for example 1 second; it can be understood that: B is acquired every second, and the acquired B is the set of all target requests in the preceding second.
By setting the second preset time interval, the target requests can be acquired every second and sorted separately for each second, which prevents an excessively long waiting time caused by sorting too many requests at once.
S32, acquiring the target priority value ζ corresponding to any B_p.
Specifically, ζ meets the following condition:
ζ = C_1 × C_2 + D_1 × D_2 + E_1 × E_2, wherein C_1 is the request delay weight, C_2 is the request delay value, D_1 is the queries-per-second (QPS) weight, D_2 is the QPS value, E_1 is the request type weight, and E_2 is the request type value.
Specifically, C_1 meets the following condition:
C_1 = C_0 × q / C_z, wherein C_0 is the data volume corresponding to B_p, and C_z is the theoretical total amount of request data per second.
Specifically, the request delay value is the delay value of B_p at the time of the request.
Specifically, D_1 meets the following condition:
D_1 = R / D_0, wherein D_0 is the system utilization rate corresponding to B_p.
Specifically, D_0 meets the following condition:
D_0 = (DC + DM + DG) / R, wherein DC is the CPU consumption value corresponding to B_p, DM is the memory consumption value corresponding to B_p, and DG is the video memory consumption value corresponding to B_p.
Specifically, the QPS value is the number of requests processed per second as set in the distributed cluster.
Specifically, E_1 meets the following condition:
E_1 = E_z / E_0, wherein E_0 is the number of requests belonging to the request type of B_p within a preset time period of the historical operation, and E_z is the number of all requests within that preset time period; it can be understood that: the fewer the requests of a request type, the larger the request type weight.
Specifically, the request type value is a value set by a person skilled in the art according to actual service requirements; for example, if the request types corresponding to the request type value include data recognition and image processing, and data recognition is to be processed preferentially, the request type value corresponding to data recognition is set to 1 and the request type value corresponding to image processing is set to 2.
As above, the target priority value of each target request can be obtained from the multiple parameters corresponding to that target request, so that the request order of the target requests can be adjusted according to the target priority values.
S33, sending, according to the q values of ζ, a scheduling instruction to the gateway node corresponding to the gateway node ID, so as to realize the ordered scheduling of B.
Specifically, the scheduling instruction is an instruction for the gateway node to sort the q target requests in descending order of their ζ values, so that target requests with higher priority values pass through the gateway node first and the target models corresponding to those requests are invoked first.
As above, the q target requests are sorted according to their target priority values, so that the gateway node sends the target requests with higher target priority values first; the target model corresponding to a high-priority target request can thus be invoked by the corresponding user as early as possible, realizing priority processing of important services.
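The per-second prioritization of S31-S33 might look as follows in Python; the request fields and the example weights are illustrative assumptions, since the embodiment leaves the data layout to the implementer.

```python
from dataclasses import dataclass

@dataclass
class TargetRequest:
    request_id: str
    delay_weight: float  # C_1
    delay_value: float   # C_2
    qps_weight: float    # D_1
    qps_value: float     # D_2
    type_weight: float   # E_1
    type_value: float    # E_2

def priority(req: TargetRequest) -> float:
    """zeta = C_1*C_2 + D_1*D_2 + E_1*E_2 for one target request B_p."""
    return (req.delay_weight * req.delay_value
            + req.qps_weight * req.qps_value
            + req.type_weight * req.type_value)

def schedule(batch):
    """S33: order one second's batch B in descending zeta, highest priority first."""
    return [r.request_id for r in sorted(batch, key=priority, reverse=True)]

batch = [
    TargetRequest("B_1", 0.6, 0.2, 1.2, 0.5, 2.0, 1.0),
    TargetRequest("B_2", 0.4, 0.8, 0.9, 0.7, 1.0, 2.0),
]
print(schedule(batch))  # -> ['B_2', 'B_1']
```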
S4, if W > λ_2, updating, according to A^0_i, the number of copies corresponding to A_i to the second target number δ^0_(k+1), so as to complete the capacity expansion of the distributed cluster; wherein λ_2 is the second target threshold.
Specifically, δ^0_(k+1) meets the following condition:
δ^0_(k+1) = δ^0_k + 1, wherein δ^0_k is the second current copy number corresponding to A_i.
Specifically, the second current copy number is the number of copies of the preset model corresponding to A_i acquired before the current update.
Specifically, the second target threshold is a threshold set by a person skilled in the art according to actual requirements, for example 1.5; it can be understood that: when the target weight exceeds 1.5, the number of copies of the preset model corresponding to A_i is increased to complete the capacity expansion processing of the distributed cluster.
When the total utilization rate has not reached the set upper limit but the target weight exceeds the second target threshold, that is, the set upper limit, the target request is of higher importance, and the number of copies of the target model corresponding to the target request should be increased, so as to satisfy the calls of the multiple target requests corresponding to the target model, reduce the waiting time of the target requests and improve the calling efficiency.
S5, if W ≤ λ_2, updating, according to A^0_i, the number of copies corresponding to A_i to the third target number θ_(k+1), so as to complete the capacity reduction of the distributed cluster; it can be understood that: on the basis that the total utilization rate is smaller than the first target threshold, when the target weight is also no greater than the second target threshold, the number of copies of the preset model corresponding to A_i is reduced to complete the capacity-reduction processing of the distributed cluster.
Specifically, θ_(k+1) meets the following condition:
θ_(k+1) = θ_k − 1, wherein θ_k is the third current copy number corresponding to A_i.
Specifically, the third current copy number is the number of copies corresponding to A_i acquired before the current update.
When the total utilization rate has not reached the set upper limit and the target weight of the target request is smaller than the second target threshold, the target model is requested relatively rarely and its copies are redundant; the number of copies of the target model is therefore reduced, realizing reasonable use of system resources and system space.
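The scale-out/scale-in decision of S2, S4 and S5 reduces to a short rule, sketched below in Python; λ_1 = 0.8 and λ_2 = 1.5 follow the examples in the text, and the floor of one copy is an added assumption.

```python
LAMBDA_1 = 0.8  # first target threshold for the total utilization rate R (example value)
LAMBDA_2 = 1.5  # second target threshold for the target weight W (example value)

def next_copy_count(current_copies: int, total_utilization: float, target_weight: float) -> int:
    """Copy count of one preset model A_i for the next first preset time interval.

    S2: R > lambda_1                   -> scale out (delta_(k+1) = delta_k + 1)
    S4: R <= lambda_1 and W > lambda_2 -> scale out (delta^0_(k+1) = delta^0_k + 1)
    S5: R <= lambda_1 and W <= lambda_2 -> scale in (theta_(k+1) = theta_k - 1)
    """
    if total_utilization > LAMBDA_1:
        return current_copies + 1
    if target_weight > LAMBDA_2:
        return current_copies + 1
    return max(current_copies - 1, 1)  # keep at least one copy (assumed floor)

print(next_copy_count(current_copies=2, total_utilization=0.65, target_weight=1.8))  # -> 3
```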
The application provides a data processing system for acquiring a target model, the system comprising: a gateway node ID, a set of preset model IDs, the corresponding sets of preset-model copy IDs, a processor, and a memory storing a computer program which, when executed by the processor, implements the following steps: when a target request suffers a request fault, acquire the target request number corresponding to the target request and the current retry number of the key target request; if the current retry number is smaller than the target request number, acquire the target retry interval corresponding to the key target request and send the key target request to the gateway node corresponding to the gateway node ID according to that interval, so as to invoke the target model corresponding to the key target request; if the target model is not invoked, adjust the retry interval and continue sending the key target request. On the one hand, when the target model is not invoked, the target retry interval is adjusted according to the current retry number and the target request is re-sent at the adjusted interval, which realizes reasonable scheduling of the multiple target requests corresponding to the target model and reasonable use of system resources. On the other hand, the number of copies of a preset model can be adjusted automatically according to the total utilization rate of that preset model and the weight of the target requests corresponding to it, which improves the service processing efficiency of the target requests while ensuring the performance of the distributed cluster and the reasonable use of its resources.
While certain specific embodiments of the application have been described in detail by way of example, it will be appreciated by those skilled in the art that the above examples are for illustration only and are not intended to limit the scope of the application. Those skilled in the art will also appreciate that many modifications may be made to the embodiments without departing from the scope and spirit of the application. The scope of the application is defined by the appended claims.

Claims (10)

1. A data processing system for acquiring a target model, the system comprising: a gateway node ID, a preset model ID set A = {A_1, A_2, …, A_i, …, A_m}, a corresponding set of preset-model copy ID sets A^0 = {A^0_1, A^0_2, …, A^0_i, …, A^0_m}, a processor, and a memory storing a computer program, wherein A_i is the i-th preset model ID, A^0_i is the copy ID set of the preset model corresponding to A_i, i = 1, 2, …, m, and m is the number of preset model IDs; when the computer program is executed by the processor, the following steps are implemented:
S100, when a request fault occurs in a target request, acquiring the target request number H_0 corresponding to the target request, wherein H_0 meets the following condition:
H_0 = INT(H_max × R_0), wherein INT() is a rounding function, H_max is the preset maximum retry number, R_0 is the latest total utilization rate of the target model corresponding to the target request, and the target model is the preset model corresponding to any A_i in A;
S200, acquiring the current retry number H of the key target request, wherein H is the number of times the key target request has been sent;
S300, if H < H_0, acquiring the target retry interval G corresponding to the key target request;
S400, according to G, sending the key target request to the gateway node corresponding to the gateway node ID so as to invoke the target model corresponding to the key target request;
S500, when the target model corresponding to the key target request is not invoked, executing S200-S500 again until the target model corresponding to the key target request is invoked or until H = H_0, at which point the loop ends.
2. The data processing system for acquiring a target model according to claim 1, wherein the request fault is a situation in which the target request is in error or the target request fails.
3. The data processing system for acquiring a target model according to claim 1, wherein R_0 meets the following condition:
R_0 = σ × (R_0^1 × 0.4 + R_0^2 × 0.3 + R_0^3 × 0.3), wherein R_0^1 is the CPU utilization rate corresponding to the target model, R_0^2 is the memory utilization rate corresponding to the target model, R_0^3 is the video memory utilization rate corresponding to the target model, and σ is the number of copies corresponding to the target model.
4. The data processing system for acquiring a target model according to claim 3, wherein σ is acquired by the following steps:
S101, from A^0, acquiring the target-model copy ID set A^0_i = {A^0_i1, A^0_i2, …, A^0_iu, …, A^0_iv} corresponding to the target model, wherein A^0_iu is the u-th target-model copy ID corresponding to the target model, u = 1, 2, …, v, and v is the number of target-model copy IDs corresponding to the target model;
S102, determining v as σ.
5. The data processing system for acquiring a target model according to claim 1, wherein the following steps are further implemented:
S1, acquiring, within a first preset time interval, the total utilization rate R of the preset model corresponding to any A_i;
S2, if R > λ_1, updating, according to A^0_i, the number of copies corresponding to A_i to the first target number δ_(k+1) to complete the capacity expansion of the distributed cluster, wherein λ_1 is the first target threshold;
S3, if R ≤ λ_1, acquiring the target weight W corresponding to the target request;
S4, if W > λ_2, updating, according to A^0_i, the number of copies corresponding to A_i to the second target number δ^0_(k+1) to complete the capacity expansion of the distributed cluster, wherein λ_2 is the second target threshold;
S5, if W ≤ λ_2, updating, according to A^0_i, the number of copies corresponding to A_i to the third target number θ_(k+1) to complete the capacity reduction of the distributed cluster.
6. The data processing system for acquiring a target model according to claim 5, wherein δ_(k+1) meets the following condition:
δ_(k+1) = δ_k + 1, wherein δ_k is the first current copy number corresponding to A_i.
7. The data processing system for acquiring a target model according to claim 5, wherein W meets the following condition:
W = W_1/W_1^0 × 0.5 + W_2/W_2^0 × 0.35 + W_3/W_3^0 × 0.15, wherein W_1 is the request amount of the target request in the minute preceding the current time, W_1^0 is the total request amount in the minute preceding the current time, W_2 is the request amount of the target request in the ten minutes preceding the current time, W_2^0 is the total request amount in the ten minutes preceding the current time, W_3 is the request amount of the target request in the hour preceding the current time, and W_3^0 is the total request amount in the hour preceding the current time.
8. The data processing system for acquiring a target model according to claim 1, wherein the key target request is a request, identical to the target request, that is sent to the gateway node corresponding to the gateway node ID after the target request suffers a request fault.
9. The data processing system for acquiring a target model according to claim 1, wherein, in S300, G meets the following condition:
G = G_0 + 2^H, wherein G_0 is the preset initial retry interval.
10. The data processing system for acquiring a target model according to claim 1, wherein S400 further comprises the following steps:
S401, if the key target request is successfully sent to the gateway node, acquiring, through the gateway node, the target model ID corresponding to the key target request from A;
S402, invoking the target model corresponding to the key target request according to the target model ID.
CN202311196936.1A 2023-09-18 2023-09-18 Data processing system for obtaining target model Active CN116932290B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN202311196936.1A CN116932290B (en) 2023-09-18 2023-09-18 Data processing system for obtaining target model

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN202311196936.1A CN116932290B (en) 2023-09-18 2023-09-18 Data processing system for obtaining target model

Publications (2)

Publication Number Publication Date
CN116932290A true CN116932290A (en) 2023-10-24
CN116932290B CN116932290B (en) 2023-12-08

Family

ID=88389984

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202311196936.1A Active CN116932290B (en) 2023-09-18 2023-09-18 Data processing system for obtaining target model

Country Status (1)

Country Link
CN (1) CN116932290B (en)


Patent Citations (6)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN109697113A (en) * 2018-12-29 2019-04-30 广州华多网络科技有限公司 Request method, apparatus, equipment and the readable storage medium storing program for executing retried
CN112463366A (en) * 2020-11-19 2021-03-09 上海交通大学 Cloud-native-oriented micro-service automatic expansion and contraction capacity and automatic fusing method and system
CN114675945A (en) * 2021-05-21 2022-06-28 腾讯云计算(北京)有限责任公司 Service calling method and device, computer equipment and storage medium
WO2023103342A1 (en) * 2021-12-09 2023-06-15 深圳前海微众银行股份有限公司 Cluster resource quota allocation method and apparatus, and electronic device
JP7103705B1 (en) * 2021-12-21 2022-07-20 北京穿楊科技有限公司 Cluster-based capacity reduction processing method and equipment
CN116048734A (en) * 2023-03-29 2023-05-02 贵州大学 Method, device, medium and equipment for realizing AI (advanced technology attachment) service

Also Published As

Publication number Publication date
CN116932290B (en) 2023-12-08

Similar Documents

Publication Publication Date Title
CN107423120B (en) Task scheduling method and device
CN112162865B (en) Scheduling method and device of server and server
CN109561148B (en) Distributed task scheduling method based on directed acyclic graph in edge computing network
CN107911478B (en) Multi-user calculation unloading method and device based on chemical reaction optimization algorithm
CN105159782A (en) Cloud host based method and apparatus for allocating resources to orders
CN110365748A (en) Treating method and apparatus, storage medium and the electronic device of business datum
CN114936086B (en) Task scheduler, task scheduling method and task scheduling device under multi-computing center scene
CN111191794B (en) Training task processing method, device and equipment and readable storage medium
CN116932290B (en) Data processing system for obtaining target model
CN111258729B (en) Redis-based task allocation method and device, computer equipment and storage medium
CN116932231B (en) Expansion and contraction system of distributed cluster
CN109871958B (en) Method, device and equipment for training model
CN110413393B (en) Cluster resource management method and device, computer cluster and readable storage medium
CN108900865B (en) Server, and scheduling method and execution method of transcoding task
CN116820769A (en) Task allocation method, device and system
CN115964198A (en) Distributed flexible transaction processing method and device based on long transaction
CN110442455A (en) A kind of data processing method and device
US8869171B2 (en) Low-latency communications
CN113590357A (en) Method and device for adjusting connection pool, computer equipment and storage medium
CN114912627A (en) Recommendation model training method, system, computer device and storage medium
CN113051063A (en) Task scheduling method and device for distributed tasks and electronic equipment
CN111901425A (en) CDN scheduling method and device based on Pareto algorithm, computer equipment and storage medium
CN111045805A (en) Method and device for rating task executor, computer equipment and storage medium
CN111258757A (en) Automatic task arranging method and device, computer equipment and storage medium
CN114816720B (en) Scheduling method and device of multi-task shared physical processor and terminal equipment

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination
GR01 Patent grant