CN115373836A - Computing network, computing power measurement method, scheduling device and related products

Computing network, computing power measurement method, scheduling device and related products

Info

Publication number
CN115373836A
CN115373836A
Authority
CN
China
Prior art keywords
computing
probe
task
target
center
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Pending
Application number
CN202210872143.6A
Other languages
Chinese (zh)
Inventor
丁肇辉
朱波
施泰龙
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Huawei Technologies Co Ltd
Original Assignee
Huawei Technologies Co Ltd
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Huawei Technologies Co Ltd filed Critical Huawei Technologies Co Ltd
Publication of CN115373836A

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061 Partitioning or combining of resources
    • G06F 9/5072 Grid computing
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F 9/4806 Task transfer initiation or dispatching
    • G06F 9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F 9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00 Arrangements for program control, e.g. control units
    • G06F 9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/46 Multiprogramming arrangements
    • G06F 9/50 Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5005 Allocation of resources, e.g. of the central processing unit [CPU] to service a request
    • G06F 9/5027 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals
    • G06F 9/5044 Allocation of resources, e.g. of the central processing unit [CPU] to service a request the resource being a machine, e.g. CPUs, Servers, Terminals considering hardware capabilities
    • Y GENERAL TAGGING OF NEW TECHNOLOGICAL DEVELOPMENTS; GENERAL TAGGING OF CROSS-SECTIONAL TECHNOLOGIES SPANNING OVER SEVERAL SECTIONS OF THE IPC; TECHNICAL SUBJECTS COVERED BY FORMER USPC CROSS-REFERENCE ART COLLECTIONS [XRACs] AND DIGESTS
    • Y02 TECHNOLOGIES OR APPLICATIONS FOR MITIGATION OR ADAPTATION AGAINST CLIMATE CHANGE
    • Y02D CLIMATE CHANGE MITIGATION TECHNOLOGIES IN INFORMATION AND COMMUNICATION TECHNOLOGIES [ICT], I.E. INFORMATION AND COMMUNICATION TECHNOLOGIES AIMING AT THE REDUCTION OF THEIR OWN ENERGY USE
    • Y02D 10/00 Energy efficient computing, e.g. low power processors, power management or thermal management

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Mathematical Physics (AREA)
  • Management, Administration, Business Operations System, And Electronic Commerce (AREA)

Abstract

The embodiments of this application disclose a computing network, a computing power measurement method, a scheduling device and related products, belonging to the field of computer technologies. In the method, a scheduling device in the computing network acquires a probe task and issues it to a target computing center among a plurality of computing centers, where the probe task is set according to a computing task executed by the target computing center. The target computing center executes the probe task and reports the performance parameters observed while executing it to the scheduling device, and the scheduling device evaluates the computing power of the target computing center according to the reported performance parameters. Because the probe task is set based on a computing task actually executed by the target computing center, the computing power measured with the probe task does not differ much from the computing power the target computing center exhibits in real operation, so the computing power measurement method provided by the embodiments of this application can measure the computing power of the target computing center more accurately.

Description

Computing network, computing power measurement method, scheduling device and related products
The present application claims priority to the Chinese patent application with application number 202210503412.1, entitled "A Computing Power Estimation Method", filed on May 9, 2022, the entire contents of which are incorporated herein by reference.
Technical Field
The present application relates to the field of computer technologies, and in particular, to a computing network, a computing power measurement method, a scheduling apparatus, and related products.
Background
A computing network generally has a plurality of computing centers deployed in it that can efficiently process computing tasks, where the computing tasks include data processing tasks triggered by user-side application programs, and a scheduler of the computing network dispatches the computing tasks to the computing centers for execution. A supercomputing center is a computing center that processes data at high speed by means of a supercomputer, and its computing power indicates its computing capability. If the computing power of a supercomputing center is not estimated accurately enough, a computing task may be dispatched to a supercomputing center whose actual computing power cannot meet the computing power required by the task, which affects the execution efficiency of the computing task.
In the related art, a standards organization defines and publishes a benchmark program, which includes a series of standard operation instructions. When the computing power of a supercomputing center needs to be measured, the supercomputing center is controlled to execute the benchmark program, and its computing power is measured by the performance it exhibits while executing the benchmark program.
In the above technique, because the benchmark program is preset by a standards organization, when a vendor builds the software and hardware of a supercomputing center it usually tunes their parameters against the benchmark program, so that the supercomputing center exhibits its best performance when executing the benchmark program. However, the supercomputing center usually cannot reach that best performance when executing real computing tasks, so the benchmark program cannot accurately measure the computing power of the supercomputing center.
Disclosure of Invention
The application provides a computing network, a computing power measurement method, a scheduling device and related products, which can improve the accuracy of measuring the computing power of a computing center in the computing network. The technical solution is as follows:
In a first aspect, a computing network is provided that includes a scheduling apparatus and a plurality of computing centers. The scheduling apparatus is configured to acquire a probe task and issue the probe task to a target computing center among the plurality of computing centers, the probe task being set according to a computing task executed by the target computing center; the target computing center is configured to execute the probe task and report, to the scheduling apparatus, the performance parameters of the target computing center observed while executing the probe task; and the scheduling apparatus is configured to evaluate the computing power of the target computing center according to the performance parameters reported by the target computing center.
In the present application, the probe task can be set based on the computing task executed by the target computing center, and the scheduling apparatus can then measure the computing power of the target computing center based on the probe task. Because the probe task is set based on a computing task the target computing center actually executes, the computing power measured with the probe task does not differ much from the computing power the target computing center exhibits in real operation. In other words, the computing power measurement method provided by this application can measure the computing power of the target computing center more accurately.
Based on the computing network provided in the first aspect, in some embodiments, the probe task includes at least one probe operator, and each of the at least one probe operator indicates an operator that affects the performance of the target computing center while the computing task is executed.
In this application, the probe task can be set based on the operators that affect the performance of the target computing center during execution of the computing task, which further narrows the gap between the computing power measured with the probe task and the computing power the target computing center exhibits in real operation, thereby improving the accuracy of the measured computing power.
Based on the computing network provided in the first aspect, in some embodiments, the scheduling apparatus includes a probe operator library, and the probe operator library includes a plurality of probe operators. In this scenario, the scheduling apparatus is further configured to: receive the computing task; match at least one execution operator, indicated by the computing task for executing the computing task, against the plurality of probe operators in the probe operator library to obtain the at least one probe operator; and generate the probe task based on the at least one probe operator.
In this way, the scheduling apparatus can automatically generate the probe task based on the computing task to be processed and the probe operator library, measure the computing power of the target computing center based on the probe task, and then reasonably schedule the target computing center to execute the pending target task based on the measured computing power. This improves the flexibility of measuring the computing power of the target computing center.
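As an illustration of this matching step, the following sketch shows one way a scheduler could map the execution operators of an incoming computing task onto a probe operator library. It is only a minimal sketch under assumptions of our own: the operator names, the similarity threshold and the ProbeTask structure are hypothetical, and the patent does not prescribe a particular matching algorithm.

```python
from dataclasses import dataclass, field
from difflib import SequenceMatcher


@dataclass
class ProbeTask:
    # Probe operators selected from the library, in execution order.
    operators: list = field(default_factory=list)


def match_probe_operators(execution_operators, probe_operator_library, threshold=0.6):
    """Match each execution operator indicated by the computing task against the
    probe operator library and collect the matched probe operators."""
    matched = []
    for exec_op in execution_operators:
        best_op, best_score = None, 0.0
        for probe_op in probe_operator_library:
            # One simple similarity measure over operator identifiers; the patent
            # leaves the concrete matching algorithm open.
            score = SequenceMatcher(None, exec_op, probe_op).ratio()
            if score > best_score:
                best_op, best_score = probe_op, score
        if best_op is not None and best_score >= threshold:
            matched.append(best_op)
    return ProbeTask(operators=matched)


# Hypothetical usage: an AI training task dominated by convolution and GEMM.
library = ["conv2d", "dgemm", "fft", "sparse_spmv"]
task_ops = ["conv2d_forward", "dgemm_nn", "relu"]
print(match_probe_operators(task_ops, library).operators)  # ['conv2d', 'dgemm']
```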
Based on the computing network provided in the first aspect, in some embodiments, the probe task is stored in the scheduling apparatus. In this scenario, when acquiring the probe task, the scheduling apparatus is specifically configured to: in response to a computing power measurement instruction, acquire the probe task from the scheduling apparatus, where the probe task was generated according to the computing task after the computing task was executed.
In this way, after the computing task has been executed, the scheduling apparatus can determine and store a probe task based on that computing task, and then issue the probe task to the target computing center whenever its computing power needs to be measured. When the scheduling apparatus later receives a computing task reported by the user side again, it schedules the target computing center to execute the newly received computing task according to the measured computing power of the target computing center, which improves the efficiency of executing computing tasks.
Based on the computing network provided in the first aspect, in some embodiments, the computing resources of the target computing center are divided into a plurality of computing units. In this scenario, when executing the probe task, the target computing center is specifically configured to: execute the probe task through a target computing unit among the plurality of computing units. In this scenario, the scheduling apparatus takes the computing power evaluated from the performance parameters as the computing power of the target computing unit.
In this way, the scheduling device can measure the computing power of fine-grained computing resources on the target computing center, which improves the flexibility of computing power measurement.
Based on the computing network provided in the first aspect, in some embodiments, the scheduling apparatus includes the identifications of the plurality of computing units included in the target computing center. In this scenario, the scheduling apparatus is specifically configured to: determine a target computing unit from the plurality of computing units, and, when issuing the probe task, also issue the identification of the target computing unit to the target computing center. Accordingly, before executing the probe task through the target computing unit among the plurality of computing units, the target computing center is further configured to: determine, according to the identification of the target computing unit, the target computing unit that is to execute the probe task.
In this way, the scheduling device can designate the target computing unit on the target computing center that executes the probe task, so that the computing power of a specific computing unit is measured, which improves the flexibility of computing power measurement.
Based on the computing network provided in the first aspect, in some embodiments, when the scheduling device determines the target computing unit from a plurality of computing units included in the target computing center, the scheduling device is specifically configured to: acquiring resource configuration information of a target computing center, wherein the resource configuration information indicates an operator which can be executed by each computing unit in a plurality of computing units; and determining a target calculation unit from the plurality of calculation units based on the operator which can be executed by each calculation unit in the plurality of calculation units and the probe operator included in the probe task.
By the method, the resource configuration information of the target computing center can be reported to the scheduling device in advance, so that the scheduling device determines the target computing unit based on the resource configuration information of the target computing center.
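As an illustration, the sketch below shows how a scheduler might use such pre-reported resource configuration information to pick candidate target computing units, assuming the configuration is a mapping from unit identifiers to the operators each unit can execute; the unit names and operator identifiers are invented for illustration.

```python
def select_target_units(resource_config, probe_operators):
    """Pick the computing units whose executable operators cover every probe
    operator contained in the probe task.

    resource_config: dict mapping computing unit id -> set of executable operator ids
    probe_operators: iterable of probe operator ids in the probe task
    """
    needed = set(probe_operators)
    return [unit_id for unit_id, ops in resource_config.items() if needed.issubset(ops)]


# Hypothetical resource configuration reported in advance by a target computing center.
resource_config = {
    "gpu-node-0": {"dgemm", "conv2d", "fft"},
    "cpu-node-1": {"dgemm", "sparse_spmv"},
}
print(select_target_units(resource_config, ["dgemm", "conv2d"]))  # ['gpu-node-0']
```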
Based on the computing network provided in the first aspect, in some embodiments, when the scheduling device determines the target computing unit from a plurality of computing units included in the target computing center, the scheduling device is specifically configured to: issuing the identifier of the probe operator included in the probe task to a target computing center; receiving the identification of a computing unit which can execute the probe task in the plurality of computing units reported by the target computing center; a target computing unit is determined from among the plurality of computing units capable of performing the probe task.
The scheduling device can determine the target computing unit through the resource configuration information of the target computing center and also can determine the target computing unit through interaction with the target computing center, so that the flexibility of computing power measurement is improved.
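The interactive variant could look roughly like the following sketch, in which the scheduler sends the probe operator identifiers to the computing center and the center answers with the identifiers of the units able to execute them; the transport function and the "pick the first reported unit" policy are assumptions, not requirements of the patent.

```python
def determine_unit_via_center(send_to_center, probe_operator_ids):
    """Interactive determination of the target computing unit: send the probe
    operator identifiers to the computing center, receive the identifiers of the
    units able to execute them, then pick one (here simply the first reported)."""
    capable_unit_ids = send_to_center({"probe_operators": probe_operator_ids})
    return capable_unit_ids[0] if capable_unit_ids else None


# Hypothetical transport: the center answers which of its units can run the operators.
fake_center = lambda message: ["gpu-node-0", "gpu-node-3"]
print(determine_unit_via_center(fake_center, ["dgemm", "conv2d"]))  # gpu-node-0
```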
Based on the computing network provided in the first aspect, in some embodiments, when the scheduling device determines the target computing unit from a plurality of computing units included in the target computing center, the scheduling device is specifically configured to: presenting the identities of the plurality of computing units to a user; responding to the operation of a user, generating a calculation unit setting instruction, wherein the setting instruction comprises an identifier of a target calculation unit; the identification of the target computing unit is obtained from the computing unit setting instruction.
In the above manner, the target computing unit can be specified by the user. For example, in some scenarios a user may specify that the computing task be executed by a computing unit of a certain type, and the identifier of the target computing unit sent by the scheduling apparatus is then the one specified by the user. This improves the flexibility of computing power measurement.
Based on the computing network provided in the first aspect, in some embodiments, the target computing center reports a plurality of performance parameters. In this scenario, when evaluating the computing power of the target computing center according to the performance parameters reported by the target computing center, the scheduling device is specifically configured to: convert the performance parameters into a unified computing power measurement value through a preset strategy.
In this application, the reported performance parameters can be converted into a unified computing power measurement value through a preset strategy, so that the computing power of different computing centers can be compared.
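One plausible preset strategy is a weighted normalization of the reported performance parameters, as sketched below; the parameter names, reference values and weights are assumptions made for the example, since the patent only requires that some preset strategy produce a unified value.

```python
def unified_computing_power(performance_params, reference, weights):
    """Fold several reported performance parameters into a single computing power
    score via a preset weighted-normalization strategy."""
    score = 0.0
    for name, value in performance_params.items():
        # Normalize each parameter against a reference value so heterogeneous
        # metrics (throughput, speedup, ...) become comparable before weighting.
        score += weights.get(name, 0.0) * (value / reference[name])
    return score


# Hypothetical parameters reported while executing the probe task.
params = {"dgemm_tflops": 18.0, "fft_tflops": 4.5, "completion_speedup": 1.2}
reference = {"dgemm_tflops": 10.0, "fft_tflops": 5.0, "completion_speedup": 1.0}
weights = {"dgemm_tflops": 0.5, "fft_tflops": 0.3, "completion_speedup": 0.2}
print(round(unified_computing_power(params, reference, weights), 3))  # 1.41
```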
Based on the computing network provided in the first aspect, in some embodiments, the target computing unit comprises at least two computing units of a plurality of computing units comprised by the target computing center. In this scenario, the scheduling apparatus is further configured to: and issuing the target number to the target computing center when the identification of the target computing unit is issued, wherein the target number is the number of computing units which execute the probe task in parallel in at least two computing units included in the target computing unit.
In this way, the scheduling device can also specify the scale of the computing units used to execute the probe task, which further improves the flexibility of computing power measurement.
Based on the computing network provided in the first aspect, in some embodiments, the target computing unit comprises at least two computing units of a plurality of computing units comprised by the target computing center. In this scenario, the scheduling apparatus is further configured to: and issuing a plurality of different target quantities to a target computing center when the identification of the target computing unit is issued, wherein each target quantity is the quantity of the computing units which execute the probe task in parallel in at least two computing units. Correspondingly, the performance parameters reported by the target computing center comprise performance parameters corresponding to a plurality of different target quantities one to one; the scheduling device is used for: and determining the corresponding relation between the execution efficiency of the target computing unit for executing the probe task and the number of the computing units on the basis of the performance parameters which are in one-to-one correspondence with the different target numbers.
At this time, the scheduling device is further configured to: for a computing task to be processed, determine a scheduling number based on the correspondence between the execution efficiency of the target computing unit executing the probe task and the number of computing units; and schedule that number of computing units in the target computing unit to execute the computing task to be processed.
Through this implementation, the scheduling device can measure the correspondence between the execution efficiency of the target computing center and the number of computing units, which avoids wasting computing resources by scheduling an excessive number of computing units when the computing units of the target computing center are later scheduled to execute computing tasks.
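The following sketch illustrates how a scheduler could use the measured correspondence to pick a scheduling number that avoids wasting computing units, under the assumption that "waste" means a marginal efficiency gain below a threshold; both the threshold and the measured values are illustrative.

```python
def choose_unit_count(efficiency_by_count, min_gain=0.05):
    """Given the measured correspondence between the number of computing units
    executing the probe task in parallel and the execution efficiency, pick the
    smallest count beyond which adding more units yields little further gain.

    efficiency_by_count: dict mapping unit count -> measured execution efficiency
    min_gain: relative improvement below which extra units are considered wasted
    """
    counts = sorted(efficiency_by_count)
    chosen = counts[0]
    for prev, curr in zip(counts, counts[1:]):
        gain = (efficiency_by_count[curr] - efficiency_by_count[prev]) / efficiency_by_count[prev]
        if gain < min_gain:
            break
        chosen = curr
    return chosen


# Hypothetical measurements: efficiency stops improving meaningfully beyond 8 units.
measured = {1: 1.0, 2: 1.9, 4: 3.4, 8: 5.5, 16: 5.7}
print(choose_unit_count(measured))  # 8
```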
Based on the computing network provided in the first aspect, in some embodiments, the scheduling apparatus is configured to: issue the probe task to the target computing center multiple times within a reference time period; receive the performance parameters reported each time by the target computing center; and determine the performance of the target computing center's network resources within the reference time period based on the repeatedly reported performance parameters.
The computing power measurement method provided by this application can measure both the performance of the computing resources in a computing center and the performance of its network resources, which further improves the flexibility of computing power measurement.
At this time, the scheduling device is further configured to: for a computing task to be processed, schedule the target computing center to execute it within a specified time period based on the performance of the target computing center's network resources within the reference time period.
In this way, once the scheduling device has determined the period within the reference time period during which the network resources of the target computing center perform best, it can schedule the target computing center to execute computing tasks during that period of good network performance.
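A minimal sketch of this idea, assuming the repeated probe measurements have already been aggregated into a network performance score per time window (the window boundaries and scores below are invented):

```python
def best_execution_window(network_perf_by_window):
    """Pick the time window within the reference time period in which the target
    computing center's network resources performed best (higher score is better)."""
    return max(network_perf_by_window, key=network_perf_by_window.get)


# Hypothetical per-window scores aggregated from probe tasks issued repeatedly
# within the reference time period.
measurements = {(0, 6): 92.0, (6, 12): 41.5, (12, 18): 38.0, (18, 24): 67.3}
print(best_execution_window(measurements))  # (0, 6)
```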
In a second aspect, a computing power measurement method is provided. For the technical effects of the method, reference may be made to the technical effects of the computing network provided in the first aspect, which are not repeated below.
In particular, the method is applied to a computing network comprising a scheduling device and a plurality of computing centers. In the method, a scheduling device acquires a probe task and issues the probe task to a target computing center in a plurality of computing centers, and the probe task is set according to the computing task executed by the target computing center; the target computing center executes the probe task and reports the performance parameters of the target computing center in the process of executing the probe task to the scheduling device; and the scheduling device evaluates the computing power of the target computing center according to the performance parameters reported by the target computing center.
Based on the method provided by the second aspect, in some embodiments, the probe task includes at least one probe operator, and each probe operator in the at least one probe operator indicates an operator of the target computing center that affects performance of the target computing center in performing the computing task.
Based on the method provided by the second aspect, in some embodiments, the scheduling apparatus includes a probe operator library, and the probe operator library includes a plurality of probe operators. In this scenario, the implementation process of the scheduling device for acquiring the probe task may be as follows: receiving the computing task; matching at least one executive operator for executing the calculation task indicated by the calculation task with a plurality of probe operators in a probe operator library to obtain at least one probe operator; the probe task is generated based on at least one probe operator.
Based on the method provided in the second aspect, in some embodiments, the probe task is stored in the scheduling apparatus. In this scenario, the scheduling device may acquire the probe task as follows: in response to a computing power measurement instruction, acquire the probe task from the scheduling device, where the probe task was generated according to the computing task after the computing task was executed.
Based on the method provided in the second aspect, in some embodiments, the computing resources of the target computing center are divided into a plurality of computing units. In this scenario, the implementation process of the target computing center executing the probe task may be: the probe task is performed by a target computing unit of the plurality of computing units. Accordingly, the scheduling device takes the calculation power of the calculation center evaluated according to the performance parameters as the calculation power of the target calculation unit.
Based on the method provided in the second aspect, in some embodiments, the scheduling apparatus includes therein identifications of the plurality of computing units included in the target computing center. In this scenario, the implementation process of issuing the probe task by the scheduling device may be as follows: determining a target computing unit from a plurality of computing units; and when the probe task is issued, the identification of the target calculation unit is also issued to the target calculation center. Accordingly, before the target computing center executes the probe task through the target computing unit in the plurality of computing units, the target computing center can also determine the target computing unit executing the probe task according to the identification of the target computing unit.
Based on the method provided in the second aspect, in some embodiments, the implementation process of the scheduling apparatus determining the target computing unit from the plurality of computing units may be: acquiring resource configuration information of a target computing center, wherein the resource configuration information indicates an operator which can be executed by each computing unit in a plurality of computing units; and determining a target calculation unit from the plurality of calculation units based on an operator which can be executed by each calculation unit in the plurality of calculation units and a probe operator included in the probe task.
Based on the method provided in the second aspect, in some embodiments, the implementation process of the scheduling apparatus determining the target computing unit from the plurality of computing units may be: issuing the identifier of the probe operator included in the probe task to a target computing center; receiving the identification of a computing unit which can execute the probe task in a plurality of computing units reported by a target computing center; a target computing unit is determined from among the plurality of computing units capable of performing the probe task.
Based on the method provided by the second aspect, in some embodiments, the implementation process of the scheduling apparatus determining the target computing unit from the plurality of computing units may be: presenting the identities of the plurality of computing units to a user; responding to the operation of a user, generating a calculation unit setting instruction, wherein the calculation unit setting instruction comprises an identifier of a target calculation unit; the identification of the target computing unit is obtained from the computing unit setting instruction.
Based on the method provided in the second aspect, in some embodiments, the target computing center reports a plurality of performance parameters. In this scenario, the implementation process of the scheduling device for evaluating the computing power of the target computing center according to the performance parameters reported by the target computing center may be as follows: and converting the performance parameters into a unified measurement value of computational power through a preset strategy.
In a third aspect, a scheduling apparatus is provided, where the scheduling apparatus has the function of implementing the behavior of the computing power measurement method in the second aspect. The scheduling apparatus comprises at least one module for implementing the computing power measurement method provided in the second aspect.
In a fourth aspect, a scheduling apparatus is provided, where the scheduling apparatus includes a processor and a memory, the memory being used to store a program that supports the scheduling apparatus in executing the computing power measurement method provided in the second aspect and to store data used to implement that method. The processor is configured to execute the program stored in the memory. The scheduling apparatus may further comprise a communication bus for establishing a connection between the processor and the memory.
In a fifth aspect, a computer-readable storage medium is provided, comprising computer program instructions which, when executed by a scheduling apparatus, cause the scheduling apparatus to perform the computing power measurement method provided in the second aspect.
A sixth aspect provides a computer program product comprising instructions which, when executed by a scheduling apparatus, cause the scheduling apparatus to perform the computing power measurement method provided in the second aspect.
The technical effects obtained by the above second to sixth aspects are similar to the technical effects obtained by the corresponding technical means in the first aspect, and are not described herein again.
Drawings
Fig. 1 is a schematic architecture diagram of a computing network according to an embodiment of the present application;
fig. 2 is a schematic hardware structure diagram of a scheduling apparatus according to an embodiment of the present application;
fig. 3 is a schematic hardware structure diagram of a compute node according to an embodiment of the present application;
fig. 4 is a schematic setting flow chart of a probe task according to an embodiment of the present application;
fig. 5 is a flowchart of a computing power measurement method according to an embodiment of the present application;
fig. 6 is a schematic diagram illustrating a probe task issuing process provided in the embodiment of the present application;
fig. 7 is a schematic diagram illustrating another probe task issuing process provided in the embodiment of the present application;
fig. 8 is a schematic diagram illustrating another probe task issuing process provided in the embodiment of the present application;
FIG. 9 is a schematic diagram illustrating a correspondence relationship between execution efficiency and the number of computing units in a target computing center according to an embodiment of the present disclosure;
fig. 10 is a schematic block structure diagram of a scheduling apparatus according to an embodiment of the present application.
Detailed Description
To make the objects, technical solutions and advantages of the embodiments of the present application more clear, the embodiments of the present application will be further described in detail with reference to the accompanying drawings.
Before explaining the embodiments of the present application in detail, an application scenario of the embodiments of the present application will be described.
A supercomputer refers to a computer whose specifications and performance far exceed those of an ordinary personal computer, so that it can run high-speed computing tasks that an ordinary computer cannot. Accordingly, supercomputing can provide the operation speed and storage capacity required in certain high-tech and advanced research fields. In addition, supercomputing is widely applied in fields such as industrial simulation, geological exploration, weather forecasting, and gene sequencing.
It should be noted that a supercomputer is usually not a stand-alone computer, but a cluster comprising a plurality of computers, and thus supercomputers are also referred to as supercomputer centers. In this regard, a supercomputing center typically includes a plurality of computing nodes to efficiently process data through the computing resources on the plurality of computing nodes. In addition, the super-computation center may also be referred to as a super-computation system, and the name of the super-computation center is not limited in the embodiment of the present application. It should be understood that reference herein to "a plurality" means two or more.
The operation of a supercomputing center consumes a large amount of electric energy, so its energy consumption is very large. At present, supercomputing centers are no longer ranked simply by computing performance, but also by factors such as the energy-consumption-to-performance ratio. In addition, with requirements such as carbon neutrality and carbon peaking, many fields have focused on how to reduce energy consumption. Because of these factors, energy saving is becoming an important goal in building supercomputing centers.
Therefore, the construction of a supercomputing center cannot one-sidedly pursue computing speed; system power consumption and system utilization efficiency must be considered as a whole to improve the overall efficiency ratio of the supercomputing center. There are currently two main ways to improve the overall efficiency ratio of a supercomputing center. One is to make breakthroughs in energy-consumption technologies and in the performance of computing hardware, and build supercomputing centers with a higher efficiency ratio. The other is to build a computing network comprising a plurality of supercomputing centers, and use the computing network to achieve unified scheduling of computing resources across regions and across supercomputing centers. The computing network may also be referred to as a computing force network or the like, which is not limited in the embodiments of the present application.
The computing network is a novel information infrastructure which can distribute and flexibly schedule computing resources, storage resources and network resources of a super computing center among the cloud, the network and the edge according to business requirements. The computing network enables flexible scheduling of supercomputing centers.
In the scene of the computing network, different supercomputing centers integrate respective computing resources into the computing network, so that the computing network performs uniform scheduling on the supercomputing centers. One premise for scheduling over different supercomputing centers is to accurately measure the computing power that the supercomputing centers can provide.
Computing power may be understood as the computing capability of a supercomputing center. In the era of the Internet of Everything, computing power is like water and electricity: it can be used to meet the real-time computing demands of applications such as autonomous driving, cloud gaming, face recognition, and virtual reality (VR)/augmented reality (AR).
In recent years, the computing power of the world's leading supercomputing centers has been growing in the direction of diversification, which makes it difficult to measure accurately. For example, many supercomputing centers provide diversified computing resources such as central processing units (CPUs), graphics processing units (GPUs) and network processing units (NPUs) to complete the computing tasks of applications such as big data, AI, and scientific computing. Different computing resources perform differently when executing the same task, and the same type of computing resource also performs differently when running different applications.
In addition, as business requirements keep expanding, the overall technology stack of a supercomputing center is becoming more and more complex, from the underlying computing chips (x86 (an instruction set architecture originating from Intel's 16-bit microprocessors)/Advanced RISC Machine (ARM)/GPU/NPU) and basic software (math libraries/operator libraries) to the programming frameworks and upper-layer model applications. Every link affects the computing power the supercomputing center provides as a whole, which further aggravates the difficulty of accurately measuring its computing power.
Currently, in order to measure the computing power of a supercomputing center, researchers have defined a series of benchmark programs. A benchmark program includes a series of standard operation instructions that simulate real computing scenarios such as physical simulation, 3D image rendering and image processing. The computing power of a supercomputing center under these computing scenarios is measured by the performance it exhibits while executing the benchmark program.
For example, in an HPC scenario, the commonly used benchmark programs include LINPACK (the linear system package) and High Performance LINPACK (HPL). LINPACK is currently the most popular benchmark program for testing supercomputing center performance internationally; it evaluates the double-precision floating-point performance (FLOPS, the number of floating-point operations completed per second) of a supercomputing center by having it solve a system of N linear equations using Gaussian elimination. HPL is a benchmark program used to evaluate the parallel computing power of a supercomputing center. Besides LINPACK and HPL, the industry has other benchmark programs, such as SPECfp (the Standard Performance Evaluation Corporation floating-point benchmark), SPEC MPI (its message passing interface benchmark), the High Performance Conjugate Gradients (HPCG) benchmark, IO500 for input/output (IO), and STREAM, to evaluate the performance of a supercomputing center in various respects under the HPC scenario.
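For reference, when LINPACK/HPL reports FLOPS for a dense system of N equations solved by Gaussian elimination, the operation count conventionally used by the benchmark itself (not something stated in this patent) is approximately:

```latex
\text{FLOPS} \approx \frac{\tfrac{2}{3}N^{3} + 2N^{2}}{t_{\text{solve}}}
```

where N is the number of unknowns and t_solve is the wall-clock time the supercomputing center needs to solve the system.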
As another example, in AI computing scenarios there are also many benchmark programs for evaluating the performance of a supercomputing center. MLPerf (machine learning performance) is an open, standardized set of benchmark tests used to measure the training and inference performance of machine learning hardware, software, and services. MLPerf can reflect the performance of a supercomputing center on common machine learning tasks such as image classification, target object recognition and detection, medical image segmentation, machine translation, natural language processing, and intelligent recommendation. ResNet-50 (a residual neural network) is a pre-trained image classification network; it is typically loaded onto the supercomputing center to perform an image classification inference task, and the speed at which the supercomputing center completes the task is used to measure its deep learning performance.
When the computing power of a supercomputing center is measured with a benchmark program, the vendor of the supercomputing center often configures in advance the resources used to execute the benchmark program and optimizes various parameters of the supercomputing center, so that the center performs at its best when executing the benchmark program; as a result, the computing power measured with the benchmark program is inaccurate.
In addition, in order to show their own strengths, some vendors may even customize benchmark programs that highlight the performance advantages of their supercomputing centers. This also makes the computing power of a supercomputing center measured with a benchmark program inaccurate.
Moreover, the computing power charging policy currently common in the industry roughly grades different supercomputing centers by their benchmark test results and determines each center's charging policy according to its grade. Because the computing power of a supercomputing center is difficult to quantify accurately, such a charging policy can be unfair.
In addition, measuring the computing power of a supercomputing center with a benchmark program has the following problem besides inaccuracy.
Currently, a benchmark program can only measure the computing performance a supercomputing center can provide as a whole. When a supercomputing center is scheduled, however, often only part of its computing resources is used according to the user's requirements. Performance test results obtained by running a benchmark program therefore hardly reflect the performance of fine-grained computing resources within the supercomputing center.
Based on this, the embodiments of this application provide a computing network and a computing power measurement method to solve the technical problems of measuring computing power with a benchmark program.
For ease of understanding, the overall idea of the embodiments of this application is described first. The embodiments of this application provide a method for measuring the computing power of a computing center in a computing network. On the one hand, the method does not need to consider differences in computing power scale or performance between computing centers; it directly analyzes the computing tasks executed during the actual operation of a computing center, sets probe tasks (PTs) that resemble those actually executed computing tasks, and accurately measures the computing power of the computing center from how the center completes the probe tasks. On the other hand, the computing resources of the computing center are divided at a fine granularity, so that the computing power of fine-grained computing resources on the computing center can be measured. The specific implementations are described in the following embodiments.
The computing center in the computing network may be a computing center that executes a computing task, such as a supercomputing center, and the embodiment of the present application does not limit the specific form of the computing center.
The architecture of the computing network provided in the embodiments of the present application is explained in detail below.
Fig. 1 is a schematic architecture diagram of a computing network according to an embodiment of the present application. As shown in fig. 1, the computing network 00 includes a scheduling device 10, a plurality of computing centers 20, and a user terminal 30. The scheduling device 10 and any one of the computing centers 20 may communicate with each other. The user terminal 30 and the dispatching device 10 can communicate with each other.
It should be noted that the number of the plurality of computing centers 20 in the computing network 00 may be any number, and fig. 1 illustrates two computing centers 20. In addition, computing network 00 may include multiple clients 30, with fig. 1 illustrating one client 30.
An application program is installed on the user terminal 30; when running, the application program generates computing tasks that require computing resources for processing, and these computing tasks may, for example, belong to any computing scenario such as HPC or AI. The application program sends a computing task to the scheduling device 10 through the user terminal 30. When the scheduling device 10 receives the computing task, it may schedule one computing center 20 among the plurality of computing centers 20 to execute it. In particular, the scheduling device 10 may refer to the computing power of each computing center 20 when selecting which computing center 20 to schedule. Based on this, the scheduling apparatus 10 accurately measures the computing power of each computing center 20 in advance.
As shown in fig. 1, a meta-scheduler is deployed on the scheduling device 10; the meta-scheduler is a software module that implements the computing power measurement method provided in the present application. Its specific implementation is described in detail in the following embodiments and is not repeated here.
It should be noted that, in a scenario where the computing center 20 in the computing network is a supercomputing center, unlike a conventional scheduler that performs job management and resource scheduling inside the supercomputing center, the meta scheduler disposed on the scheduling apparatus provided in the embodiment of the present application is used to perform job management and resource scheduling between supercomputing centers in the computing network.
The hardware architectures of the dispatching device 10, the computing center 20 and the user terminal 30 are explained below.
(1) Scheduling device
Fig. 2 is a schematic hardware structure diagram of a scheduling apparatus according to an embodiment of the present application. It should be noted that the structure shown in fig. 2 does not limit the hardware implementation of the scheduling apparatus 10 provided in the embodiment of the present application; any device capable of implementing the functions of the scheduling apparatus 10 may be used to execute the computing power measurement method provided in the embodiments of the present application.
Specifically, as shown in fig. 2, the scheduling device 10 includes a processor 101, a communication bus 102, a memory 103, and at least one communication interface 104.
The processor 101 may be a central processing unit (CPU), an application-specific integrated circuit (ASIC), or one or more integrated circuits for controlling the execution of programs in the computing power measurement scheme provided by the embodiments of the present application.
The communication bus 102 is used to transfer information between the above components.
Memory 103 may be, but is not limited to, a read-only Memory (ROM) or other type of static storage device that may store static information and instructions, a Random Access Memory (RAM) or other type of dynamic storage device that may store information and instructions, an electrically erasable programmable read-only Memory (EEPROM), a compact disk read-only Memory (CD-ROM) or other optical disk storage, optical disk storage (including compact disk, laser disk, optical disk, digital versatile disk, blu-ray disk, etc.), magnetic disk or other magnetic storage device, or any other medium that can be used to carry or store desired program code in the form of instructions or data structures and that can be accessed by a computer. The memory 103 may be self-contained and coupled to the processor 101 via the communication bus 102. The memory 103 may also be integrated with the processor 101.
The memory 103 is used for storing program codes for executing the computational power measurement scheme provided by the embodiment of the application, and is controlled by the processor 101 to execute. The processor 101 is used to execute program code stored in the memory 103. One or more software modules may be included in the program code. For example, the meta-scheduler shown in fig. 1 is included in the program code.
The communication interface 104 may be any device, such as a transceiver, for communicating with other devices or a communication network, such as an ethernet, a Radio Access Network (RAN), a Wireless Local Area Network (WLAN), etc.
In one embodiment, the scheduling apparatus 10 may include a plurality of processors, such as the processor 101 and the processor 105 shown in fig. 2. Each of these processors may be a single-core (single-CPU) processor or a multi-core (multi-CPU) processor. A processor here may refer to one or more devices, circuits, and/or processing cores for processing data (e.g., computer program instructions).
(2) Computing center
As shown in fig. 1, the computing center 20 includes a plurality of computing nodes 201 and at least one management node 202. Two compute nodes 201 and one management node 202 are illustrated in fig. 1.
The computing node 201 is configured to provide computing power required by the computing center 20, and the management node 202 is configured to manage the computing node 201 in the computing center 20. For example, the management node 202 may schedule a certain computing node 201 to execute a computing task, and in this scenario, the management node 202 may also be referred to as a scheduler.
The computing node 201 is a device with computing capability, such as a server, a desktop computer, and the like.
In software, each compute node 201 has an operating system thereon. In hardware, as shown in fig. 3, the computing node 201 includes at least a processor 2011, a memory 2012 and a network card 2013. The processor 2011, the memory 2012 and the network card 2013 are connected by a bus. The processor 2011 and the memory 2012 provide computing resources, among other things.
The processor 2011 may be a Central Processing Unit (CPU) configured to process a computing task from outside the computing node. Alternatively, the processor 2011 may also be a Graphics Processing Unit (GPU) configured to process computational tasks related to image data. The computing tasks may include, for example, image and graphics processing tasks such as geometry transformation, lighting, cubic environment texture mapping and vertex blending, texture compression and bump mapping, rendering, and so on. Optionally, the processor 2011 may also be a Tensor Processing Unit (TPU) for processing computational tasks such as machine learning. Optionally, the processor 2011 may also be a Neural-Network Process Unit (NPU) configured to Process computation tasks related to mass multimedia data such as videos and images.
The memory 2012 is an internal memory directly exchanging data with the processor 2011, can read and write data at any time, and is fast enough to be used as a temporary data storage for an operating system or other programs in operation. The memory includes at least two types of memory, for example, the memory may be a random access memory (ram) or a Read Only Memory (ROM). In practical applications, the computing node 201 may be configured with a plurality of memories 2012 and different types of memories 2012. The number and type of the memories 2012 are not limited in this embodiment.
In summary, in the embodiment of the present application, the computing node 201 can provide diversified computing resources.
The network card 2013 is used to communicate with other computing nodes 201 and management node 202.
In addition, the computing node 201 may be a node that integrates computing and storage, in which case the computing node 201 also includes storage resources such as hard disks. Alternatively, the computing node 201 may not include storage resources such as hard disks, in which case it can use the storage resources of another node that has them to store data.
(3) User terminal
The user terminal 30 may be referred to as an application server. The user side 30 may be a physical machine or a virtual machine. Physical machines include, but are not limited to, desktop computers, servers, notebook computers, and mobile devices.
The user end 30 may report computing tasks to the scheduling apparatus 10 through, for example, an optical fiber switch, an Ethernet switch, an InfiniBand (IB) switch, or an RDMA over Converged Ethernet (RoCE) switch.
In terms of software, the user end 30 has an application program installed thereon, and the user implements a corresponding function through the application program. For example, the user may implement image processing through an image processing application, and the user may implement artificial intelligence through an AI application. In the running process of the application program, a data processing type calculation task is triggered, the user side 30 is responsible for reporting the calculation task to the scheduling device 10, and the scheduling device 10 schedules a calculation center 20 to process the calculation task.
Based on the architecture of the computing network shown in fig. 1, the scheduling device measures the computing power of each computing center with the computing power measurement method provided by the embodiments of this application. The method measures the computing power of a computing center based on how the computing center executes a probe task, and the probe task is set based on the computing tasks executed by the computing centers in the computing network. The computing power measured in this way does not differ much from the computing power the computing center exhibits when actually executing computing tasks.
In addition, in order to measure the computation power of the computation resource with a fine granularity in the computation center, the computation center may divide the computation resource into a plurality of computation units, where the computation units are also referred to as Probe Units (PUs), so that the computation power of a computation unit with a fine granularity can be measured.
For the sake of easy understanding, the relevant contents of how to set up the probe tasks and configure the computing unit are explained first.
1. Setting up of probe tasks
For the sake of convenience in the following description, the calculation task referred to by the set probe task is referred to as a target task. The target task is a calculation task executed by the calculation center, namely the target task is a calculation task executed when the calculation center actually runs. Illustratively, in the computing network shown in fig. 1, the target task is a computing task reported to the scheduling device by the user end, and the computing task is a computing task triggered by an application installed on the user end.
In the present embodiment, the probe task can be set by the following two implementations.
In a first implementation, the scheduling device automatically generates the probe task based on the target task and the probe operator library.
The probe operator library includes a plurality of probe operators, and the probe operators may also be referred to as Probe Kernels (PK).
The plurality of probe operators are determined by experts who analyze the various types of computing tasks in the computing network in advance. Specifically, the experts analyze in advance the execution operators indicated by a large number of computing tasks in the computing network, and among these execution operators, the operators that affect the performance of the computing center when it executes the computing tasks are taken as probe operators. Because such operators usually embody the core computing process of a computing task, they may also be called core operators. How the experts analyze the core operators is not described in detail in the embodiments of the present application.

For example, in an HPC scenario, operators such as double-precision general matrix multiplication (DGEMM) and the fast Fourier transform (FFT) consume a relatively large amount of computing power, so these operators can be used as probe operators. Likewise, in an AI scenario, operators such as double-precision dense matrix multiplication, sparse matrix-vector multiplication, trigonometric functions, and convolution (CONV) consume a large amount of computing power, so these operators can also be used as probe operators.

In addition, after the experts analyze typical applications in the industry to obtain a series of probe operators, the probe operators can be uploaded from the computing centers to the probe operator library based on their identifiers, which yields the plurality of probe operators included in the probe operator library. This is not described in detail here.
Based on this, the scheduling device may automatically generate the probe task based on the target task and the probe operator library as follows: the scheduling device matches at least one execution operator, indicated by the target task for executing the target task, against the plurality of probe operators in the probe operator library to obtain at least one probe operator, and generates the probe task based on the at least one probe operator. For convenience of the following description, the at least one execution operator indicated by the target task is called the operators indicated by the target task, and the at least one matched probe operator is called the matched probe operators.
The probe operator matched with the operator indicated by the target task can be obtained through a matching algorithm. The matching algorithm may be, for example, a similarity matching algorithm between codes corresponding to operators, and the like, which is not limited in this embodiment of the present application.
In addition, for convenience of the following description, an operator among the operators indicated by the target task that matches a probe operator is called a matching operator. The probe task may then be generated from the matched probe operators as follows: for any matched probe operator, determine the number of times its corresponding matching operator is executed while the target task is executed, scale that number down by a reference ratio to obtain the number of times the probe operator is executed in the probe task, and generate the probe task based on the execution count of each probe operator and the logical relationship between the inputs and outputs of the probe operators.

For example, the matched probe operators include probe operator 1, probe operator 2, and probe operator 3. The matching operator corresponding to probe operator 1 is executed 3000 times while the target task is executed, the matching operator corresponding to probe operator 2 is executed 2000 times, and the matching operator corresponding to probe operator 3 is executed 1000 times. Assuming the reference ratio is 1000:1, the execution counts of probe operator 1, probe operator 2, and probe operator 3 in the probe task are 3, 2, and 1, respectively. In addition, the output of probe operator 1 and the output of probe operator 2 are inputs of probe operator 3. The probe task generated on this basis may be: execute probe operator 1 three times and determine its average output from the three results, execute probe operator 2 twice and determine its average output from the two results, and then execute probe operator 3 based on the average outputs of probe operator 1 and probe operator 2.
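To make the generation step above concrete, the following Python sketch illustrates the matching-and-scaling logic under stated assumptions: the code-similarity matcher, the similarity threshold, and the data shapes are illustrative stand-ins, not the actual implementation of the scheduling device.

```python
from dataclasses import dataclass
from difflib import SequenceMatcher

@dataclass
class ProbeStep:
    operator_id: str   # matched probe operator
    repetitions: int   # executions of this probe operator in the probe task

def code_similarity(a: str, b: str) -> float:
    # Stand-in for the similarity matching between operator codes.
    return SequenceMatcher(None, a, b).ratio()

def generate_probe_task(target_ops, probe_library, reference_ratio=1000, threshold=0.8):
    """target_ops: list of (operator_code, execution_count) indicated by the target task.
    probe_library: dict of probe_operator_id -> operator_code (assumed representation)."""
    steps = []
    for op_code, exec_count in target_ops:
        # Match the execution operator against the probe operator library.
        best_id, best_sim = None, 0.0
        for probe_id, probe_code in probe_library.items():
            sim = code_similarity(op_code, probe_code)
            if sim > best_sim:
                best_id, best_sim = probe_id, sim
        if best_id is None or best_sim < threshold:
            continue  # no probe operator matches this execution operator
        # Scale the execution count down by the reference ratio; keep small counts as-is.
        reps = exec_count if exec_count < reference_ratio else max(1, exec_count // reference_ratio)
        steps.append(ProbeStep(best_id, reps))
    return steps  # ordered probe operators and repetition counts forming one probe task
```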
It should be noted that, for any matched probe operator, if its corresponding matching operator is executed only a small number of times, for example only once, the execution count need not be scaled down by the reference ratio and can be used directly as the execution count of the probe operator.

In addition, in the process of generating probe tasks from the matched probe operators, one probe task or a plurality of probe tasks may be generated. For example, if the inputs and outputs of some of the matched probe operators are correlated while the inputs and outputs of the other probe operators are not correlated with those of the preceding probe operators, two independent probe tasks can be set. Alternatively, the matched probe operators can be placed directly into one probe task without considering whether their inputs and outputs are correlated, in which case the probe task may include a plurality of mutually unrelated subtasks.
Based on the above, in the embodiments of the present application, a probe task can be understood as a binary executable program that runs on the computing center, and the content of the executable program is a computing task that executes a series of probe operators.

In addition, the process of automatically generating the probe task from the operators indicated by the target task and the probe operator library can be carried out by the scheduling device. Optionally, to avoid placing heavy data-processing pressure on the scheduling device, the probe operator library may be deployed on another server; in this case, the other server automatically generates the probe task based on the operators indicated by the target task and the probe operator library, and then delivers the probe task to the scheduling device.
In a second implementation, the target task is analyzed by an expert to set a probe task for the target task.
In some embodiments, the scheduling device may be connected to an administrator end, which is a terminal used by an expert. In this case, when the administrator end detects a probe-task setting instruction triggered by the expert, it obtains the target task from the scheduling device and displays it. The expert analyzes the operators indicated by the target task on the administrator end to determine the core operators of the target task, takes these core operators as probe operators, combines them into a probe task, and sends the probe task to the scheduling device through the administrator end.

Fig. 4 is a schematic flowchart of setting a probe task according to an embodiment of the present application. As shown in fig. 4, for a computing task such as the direct solution of a large-scale dense matrix, an expert analyzes attributes such as the running time of the task through the administrator end. The analysis shows that double-precision matrix multiplication (DGEMM) is the operator with the highest computing power consumption in this task, so the expert can take DGEMM as a probe operator and set a probe task as shown in fig. 4 based on it, where the probe task includes three DGEMM operators. The administrator end then uploads the probe task to the scheduling device.

In other embodiments, the scheduling device is a device that includes input and output components. In this case, the expert analyzes the target task directly on the scheduling device to obtain its core operators and then combines the analyzed core operators into a probe task on the scheduling device. The specific implementation is not described in detail here.
In the second implementation manner, how the expert analyzes the core operator in the target task is also not limited.
The first implementation manner and the second implementation manner are used to illustrate a process of setting a probe task based on a target task, and the embodiments of the present application do not limit various implementation manners of setting a probe task based on a target task.
Furthermore, in the embodiments of the present application, the probe operator library may be maintained by a dedicated system, such as the scheduling device or another server. The expert can modify, update, or delete probe operators in the probe operator library on the scheduling device, or upload new probe operators to the library.
Further, optionally, since the probe operator library includes a large number of probe operators, in order to facilitate management of the probe operators, the expert may set an operator category for the probe operators in the probe operator library, which may be, for example, an HPC category, an AI category, and the like.
Thus, when a probe task is automatically generated for a target task to be processed in the first implementation manner, an operator category can first be selected from the probe operator library based on the type of the target task, the matched probe operators can then be determined from the probe operators in the selected category, and the probe task can be generated from the matched probe operators. This improves the efficiency of automatically generating probe tasks in the first implementation manner.

The type of a computing task may be determined based on the type of application that triggered it. The type of an application program may be specified by an expert, which is not described in detail in the embodiments of the present application. For example, computing tasks triggered by image processing applications form one type, and computing tasks triggered by AI applications form another type.

Accordingly, when an expert analyzes a probe operator, the operator category of the probe operator can be set based on the type of the computing task from which the probe operator was derived. For a target task to be processed, the type of the target task can likewise be determined based on the type of application that triggered it.
Alternatively, the type of the computing task may be determined in other ways, such as dividing the type of the computing task based on resources required by the computing task, dividing the type of the computing task based on operators involved in the computing task, and the like, and embodiments of the present application are not described in detail here.
In addition, when setting a probe task, the expert may test it in advance. The purpose of the test is to ensure that the execution time of the probe task under a relatively small computing resource configuration is on the order of seconds, so that when a computing center subsequently executes the probe task, it does not interfere with the other computing tasks currently being executed.
2. Configuration of computing units
In the embodiments of the present application, to measure the computing power of fine-grained computing resources in a computing center, the computing center may divide its computing resources into a plurality of computing units in advance and report information about the divided computing units to the scheduling device. The scheduling device can then measure the computing power of a particular computing unit in the computing center. The information about the divided computing units may be called the resource configuration information of the computing center.
The computing unit is a basic logic unit for executing tasks in the computing center. For any compute center in a compute network, the compute center may configure compute units according to the type of compute resources of each compute node in the compute center.
For example, the computing resources of the computing nodes in a computing center include three types: CPU, GPU, and NPU. The CPU resources include 100 CPU cores, the GPU resources include 100 GPU cores, and the NPU resources include 100 NPU cores. In this scenario, the computing center may divide the CPU resources into 20 computing units of 5 CPU cores each, the GPU resources into 20 computing units of 5 GPU cores each, and the NPU resources into 20 computing units of 5 NPU cores each.

It should be noted that the above describes the computing resource types of a computing center using the CPU, GPU, and NPU as examples. Optionally, a computing resource type may also be a subclass of the CPU, GPU, or NPU; that is, if the computing center contains multiple models of CPU, GPU, or NPU, the computing units may be configured according to these subclasses, which is not described in detail here.

It should also be noted that a CPU core, GPU core, or NPU core may be understood as a logical core or a physical core of the corresponding computing resource. Taking the CPU core as an example, a CPU core may be a CPU logical core, which is a logical-level core divided on top of a CPU physical core. For example, one CPU physical core may be divided into two CPU logical cores: by driving the physical core at high speed, an application program can behave as if two physical cores were running, which achieves the effect of dividing one CPU physical core into two CPU logical cores.

In addition, different computing centers in a computing network may configure their computing units in the same way or in different ways. The way computing units are configured covers how the computing units are configured based on computing resource type and how many computing resources each computing unit includes. For example, for computing center 1 and computing center 2 in a computing network, the computing resources of each include 100 CPU cores. In this scenario, computing center 1 may be configured with 20 computing units of 5 CPU cores each, while computing center 2 may be configured with 10 computing units of 10 CPU cores each.
In other words, the computing centers in the computing network are configured with computing units independently from each other, and each computing center can configure computing units according to the related information of its own computing resource.
The configuration of computing units by the computing center can be carried out manually by an administrator. Specifically, the administrator can specify which computing resources count as one type of computing resource and how many computing resources make up one computing unit, so that the computing center divides its computing resources into computing units in the manner specified by the administrator.

After the computing units are configured, the computing center may report resource configuration information to the scheduling device. The resource configuration information includes, for example, attributes of the computing units in the computing center: which types of computing units the computing center includes, how many computing units of each type there are, the operators supported by each type of computing unit, and so on.
The operators supported by a type of computing unit are the operators that this type of computing unit can execute, and they are related to the types of computing resources that make up that type of computing unit.

In some embodiments, the computing center is configured with a correspondence between computing resource types and supported operators; this correspondence can be uploaded to the computing center in advance by an administrator and is not described in detail here. After determining a computing unit, the computing center can determine the operators it supports based on the types of computing resources that compose the computing unit and this correspondence.

Table 1 is a schematic table of resource configuration information provided in an embodiment of the present application. As shown in Table 1, the resource configuration information reported by the computing center to the scheduling device includes three types of computing units: computing unit 1, computing unit 2, and computing unit 3. There are 10 computing units of type 1, 20 computing units of type 2, and 5 computing units of type 3. Computing unit 1 supports operator 1, operator 3, and operator 4; computing unit 2 supports operator 1, operator 2, operator 3, and operator 5; computing unit 3 supports operator 1, operator 2, and operator 5.
TABLE 1
Computing unit identifier | Number of computing units | Supported operators
Computing unit 1 | 10 | Operator 1, operator 3, operator 4
Computing unit 2 | 20 | Operator 1, operator 2, operator 3, operator 5
Computing unit 3 | 5 | Operator 1, operator 2, operator 5
It should be noted that the above process of configuring computing units is only an example. In practice, the administrator of a computing center may divide the computing resources of the computing center into computing units in any manner required; further examples are not given here.
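As a rough illustration of how the resource configuration information in Table 1 could be represented when reported to the scheduling device, the following Python sketch defines an assumed structure; the field names and identifiers are illustrative only and do not reflect an actual message format.

```python
from dataclasses import dataclass, field

@dataclass
class ComputeUnitType:
    unit_id: str                 # e.g. "computing unit 1"
    count: int                   # number of computing units of this type
    supported_operators: list    # operators this type of unit can execute

@dataclass
class ResourceConfiguration:
    center_id: str
    unit_types: list = field(default_factory=list)

# The configuration from Table 1, expressed in this illustrative structure.
config = ResourceConfiguration(
    center_id="computing center 20",
    unit_types=[
        ComputeUnitType("computing unit 1", 10, ["operator 1", "operator 3", "operator 4"]),
        ComputeUnitType("computing unit 2", 20, ["operator 1", "operator 2", "operator 3", "operator 5"]),
        ComputeUnitType("computing unit 3", 5, ["operator 1", "operator 2", "operator 5"]),
    ],
)
```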
After the probe task is set and the computing units are configured, the scheduling device can measure the computing power of a computing center. Fig. 5 is a flowchart of a computing power measuring method according to an embodiment of the present application. As shown in fig. 5, the method includes the following steps 501-503.

Step 501: the scheduling device acquires a probe task and issues the probe task to the target computing center, where the probe task is set according to a computing task executed by the target computing center.

For how to set a probe task based on a computing task executed by the target computing center, refer to the foregoing description; details are not repeated here. Also, for convenience of the following description, the computing task to which the probe task refers is again called the target task.
In the embodiment of the present application, the scheduling apparatus may acquire a probe task and issue the probe task to the target computing center in the following two exemplary scenarios.
Scenario one: when a target task to be processed is received, a probe task is set based on the target task and issued to the target computing center.

In scenario one, when the scheduling device receives a target task to be processed reported by the user end, it determines a probe task on the fly based on the target task, accurately measures the computing power of each computing center in the computing network based on the probe task, and then reasonably schedules one computing center to execute the target task based on the measured computing power of each computing center.
The scheduling device may determine a probe task on the fly based on the target task to be processed in the following two examples.

For example, when receiving the target task to be processed, if the scheduling device determines that the target task has not been executed before the current time, it may determine a probe task on the fly based on the target task, accurately measure the computing power of the target computing center based on the probe task, and schedule the target computing center to execute the target task based on the measured computing power.

As another example, a measurement period may be set in advance so that the scheduling device measures the computing power of the target computing center once per measurement period. On this basis, when a measurement period arrives, the scheduling device may determine a probe task on the fly based on the received target task to be processed, measure the computing power of each computing center based on the probe task, and schedule the target computing center to execute the target task based on the measured computing power.

For the target task to be processed, to avoid affecting its execution efficiency, the scheduling device may determine the probe task using the first implementation manner of setting a probe task described above. Specifically, when the target task to be processed is received, the probe task is determined based on the execution operators indicated by the target task and the probe operator library. For the specific implementation, refer to the foregoing detailed description of setting a probe task, which is not repeated here.
Scenario two: after the target task has been executed, a probe task is generated according to the target task and issued to the target computing center.

In scenario two, after the target task has been executed, the scheduling device may set a probe task, through an expert, based on information about the execution of the target task, store the probe task, and then issue the probe task to the target computing center when the computing power of the target computing center needs to be measured. Illustratively, after the scheduling device determines and stores the probe task, the probe task is obtained from the scheduling device in response to a computing power measuring instruction and issued through step 501.

In scenario two, after the computing power of the target computing center has been measured, when the scheduling device later receives a computing task reported by the user end, it directly schedules the target computing center to execute the received computing task according to the measured computing power of the target computing center, which improves the execution efficiency of the computing task.
The specific process of determining the probe task based on the target task by the scheduling device may also refer to the foregoing detailed implementation manner of setting the probe task, and a description thereof is not repeated here.
It should be noted that, in scenario two, because the computing tasks in the computing network may include multiple types of computing tasks, the scheduling device may determine one probe task for each type of computing task and record the correspondence between probe tasks and task types. When the scheduling device later receives a computing task to be processed, it can obtain, based on the type of the computing task, the matching probe task from this correspondence, and then schedule the target computing center according to the computing power measured with the obtained probe task.
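A trivial sketch of such a correspondence is an ordinary mapping from task type to the stored probe task; the type names and probe task identifiers below are hypothetical.

```python
# Assumed correspondence recorded in scenario two: one probe task per computing task type.
probe_task_by_type = {
    "image processing": "probe_task_hpc_dgemm",
    "AI inference": "probe_task_ai_conv",
}

def probe_task_for(task_type: str):
    # Look up the probe task matching the type of a newly received computing task.
    return probe_task_by_type.get(task_type)

print(probe_task_for("AI inference"))  # -> 'probe_task_ai_conv'
```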
The related content of the type of the calculation task may refer to the related content of the probe setting task, and is not described herein again.
Furthermore, the target computing center in step 501 may be one or more of the computing centers in the computing network shown in fig. 1. When the target computing center is a single computing center in the computing network, the computing power of that computing center can be measured based on the embodiment shown in fig. 5. When the target computing center is a plurality of computing centers in the computing network, the differences between the computing power of different computing centers can be measured based on the embodiment shown in fig. 5.

Fig. 6 is a schematic diagram of a probe task issuing process according to an embodiment of the present disclosure. As shown in fig. 6, the target computing center includes computing center A, computing center B, and computing center C. The probe task issued by the scheduling device includes probe operators such as double-precision dense matrix multiplication, sparse matrix-vector multiplication, trigonometric functions, and CONV. In fig. 6, double-precision dense matrix multiplication is denoted by (1), sparse matrix-vector multiplication by (2), trigonometric functions by (3), and CONV by (4). The scheduling device issues the probe task to computing center A, computing center B, and computing center C respectively.

In addition, in the scenario where the computing resources of the target computing center are divided into fine-grained computing units, in some embodiments the scheduling device holds the identifiers of the plurality of computing units included in the target computing center. The scheduling device may therefore determine a target computing unit from the plurality of computing units and issue the identifier of the target computing unit to the target computing center along with the probe task. The target computing center then determines, according to the identifier of the target computing unit, the target computing unit that executes the probe task and controls the target computing unit to execute the probe task.

Fig. 7 is a schematic flowchart of another probe task issuing process according to an embodiment of the present application. As shown in fig. 7, the scheduling device issues the probe task to computing center A and computing center B, and also issues the identifier of the target computing unit. In fig. 7, computing center A and computing center B each include a plurality of PUs, each PU representing a computing unit, and the target computing units are marked by dashed boxes. When computing center A and computing center B receive the identifier of the target computing unit, they control the target computing unit to execute the probe task and report the performance parameters to the scheduling device after the probe task has been executed.
An implementation example in which the scheduling apparatus determines the target computing unit from the plurality of computing units is as follows.
In some embodiments, the scheduling device may obtain the resource configuration information of the target computing center and determine, based on the resource configuration information, the operators that each computing unit included in the target computing center can execute; the scheduling device then selects the target computing unit from the plurality of computing units included in the target computing center based on the operators each computing unit can execute and the probe operators included in the probe task.

As can be seen from the foregoing process of configuring computing units, the scheduling device may obtain the resource configuration information of the target computing center by receiving the resource configuration information reported by the target computing center.
Optionally, after the target computing center has configured the computing unit, the resource configuration information may be actively uploaded to the network manager, and the subsequent scheduling device may obtain the resource configuration information of the target computing center from the network manager.
Optionally, after the computing unit of the target computing center is configured by the administrator through manual operation, the administrator may also directly upload the resource configuration information of the target computing center to the scheduling device.
In addition, when the probe task includes probe operators, the scheduling device may select the target computing unit from the plurality of computing units included in the target computing center as follows: for any computing unit of the target computing center, match each probe operator included in the probe task against the operators supported by that computing unit; if every probe operator included in the probe task matches an operator supported by the computing unit, the computing unit is determined to be able to execute the probe task; the target computing unit is then determined from the computing units that can execute the probe task. For example, the scheduling device may directly take a computing unit capable of executing the probe task as the target computing unit.

Here, a probe operator matching an operator supported by a computing unit can be understood as the similarity between the code corresponding to the probe operator and the code corresponding to the supported operator being high.
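A minimal sketch of this selection step follows, treating "matching" as simple membership of the probe operator in a computing unit's supported-operator set; the unit types reuse the illustrative Table 1 data, and the function name is an assumption.

```python
def select_target_units(unit_types, probe_operators):
    """unit_types: list of (unit_id, supported_operators) pairs reported by the computing center.
    Returns the identifiers of unit types that support every probe operator in the probe task."""
    return [
        unit_id
        for unit_id, supported in unit_types
        if all(op in supported for op in probe_operators)
    ]

# With the configuration of Table 1, a probe task built from operator 1 and
# operator 2 can run on computing unit 2 and computing unit 3 only.
unit_types = [
    ("computing unit 1", {"operator 1", "operator 3", "operator 4"}),
    ("computing unit 2", {"operator 1", "operator 2", "operator 3", "operator 5"}),
    ("computing unit 3", {"operator 1", "operator 2", "operator 5"}),
]
print(select_target_units(unit_types, ["operator 1", "operator 2"]))
# -> ['computing unit 2', 'computing unit 3']
```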
Optionally, in other embodiments, before issuing the probe task to the target computing center, the scheduling device may issue to the target computing center the identifiers of the probe operators included in the probe task. Based on these identifiers, the target computing center reports to the scheduling device the identifiers of the computing units, among its plurality of computing units, that can execute the probe task, and the scheduling device determines the target computing unit from those computing units.

As shown in fig. 8, the scheduling device may first issue a resource acquisition request to the target computing center, where the request carries attributes of the probe task, such as the identifiers of the probe operators included in the probe task. On receiving the resource acquisition request, the target computing center reports to the scheduling device, based on the operators supported by each of its computing units, the identifiers of the computing units that can execute the probe task. On receiving the identifiers reported by the target computing center, the scheduling device determines the target computing unit from the computing units that can execute the probe task, and issues the identifier of the target computing unit when issuing the probe task.
Optionally, in other embodiments, the target computing unit may also be specified by the user. For example, in some scenarios a user may specify that a computing task is to be executed by a certain type of computing unit; in this case, the identifier of the target computing unit issued by the scheduling device is the one specified by the user.

Based on this, the scheduling device may determine the target computing unit from the plurality of computing units as follows: present the identifiers of the plurality of computing units to the user, generate a computing unit setting instruction in response to the user's operation, where the instruction includes the identifier of the target computing unit, and obtain the identifier of the target computing unit from the computing unit setting instruction.

After the scheduling device presents the identifiers of the plurality of computing units to the user, the user may select the target computing unit from the plurality of computing units, so that the target computing unit is specified by the user. The user's operation is the operation of selecting the target computing unit from the plurality of computing units.

In addition, optionally, in other embodiments, the scheduling device may issue the probe task to the target computing center without issuing the identifier of the target computing unit. In this scenario, after receiving the probe task, the target computing center itself determines a computing unit capable of executing the probe task, that is, the target computing center itself determines the target computing unit and controls the target computing unit to execute the probe task. So that the scheduling device can subsequently attribute the measured computing power accurately, the target computing center reports the identifier of the determined target computing unit when reporting the performance parameters of executing the probe task.
In addition, based on the foregoing description of configuring computing units, a given type of computing unit in a computing center may include a plurality of computing units. In this scenario, when the scheduling device designates the target computing unit to execute the probe task, it may further designate the number of computing units of the target computing unit that execute the probe task in parallel, thereby designating computing units of a certain scale to execute the probe task.

Based on this, in some embodiments, when issuing the probe task and the identifier of the target computing unit to the target computing center, the scheduling device may also issue a target number to the target computing center, where the target number is the number of computing units, among the at least two computing units included in the target computing unit, that execute the probe task in parallel.

Optionally, in some embodiments, the scheduling device may not issue the target number when issuing the probe task to the target computing center. In this scenario, the target computing center itself determines the target number of computing units in the target computing unit that execute the probe task in parallel, and, so that the scheduling device can subsequently measure the computing power accurately, the target computing center reports the determined target number to the scheduling device.

Further, in some scenarios, the scheduling device may need to measure the correspondence between the execution efficiency of the target computing center and the number of computing units, so as to avoid scheduling too many computing units when the computing units of the target computing center are later scheduled to execute computing tasks. In this case, when issuing the probe task to the target computing center, the scheduling device can issue a plurality of different target numbers, each being the number of computing units, among the at least two computing units in the target computing unit, that execute the probe task in parallel. The target computing center then executes the probe task once for each of the different target numbers, so that the scheduling device can determine the correspondence between the execution efficiency of the target computing center and the number of computing units based on the performance parameters of these executions. The details are explained later.
Step 502: the target computing center executes the probe task and reports the performance parameters of the target computing center in the process of executing the probe task to the scheduling device.
For the convenience of description later, the performance parameters reported by the target computing center are referred to as evaluation results.
In some embodiments, the evaluation result includes an execution duration of the probe task executed by the target computing center, and/or an accuracy of an output result obtained after the probe task is executed by the target computing center.
Because the probe task is in essence a small-scale computing task, the target computing center produces an output result after executing the probe task, and the output result can be understood as a numerical value. The precision of the output result obtained after the target computing center executes the probe task may, for example, include the computational accuracy or the floating-point precision of the output result, where the computational accuracy can be understood as the error between the output result and a preset true value. The preset true value may be uploaded in advance by an expert.

It should be noted that the performance parameters related to the process of the target computing center executing the probe task are not limited to the execution duration and the precision of the output result; when the embodiments of the present application are applied, any performance parameter related to the execution of the probe task may be carried in the evaluation result, and examples are not listed one by one here.

In other embodiments, in the scenario where the scheduling device also issues a plurality of different target numbers when issuing the probe task to the target computing center, the evaluation result includes a plurality of sub-evaluation results in one-to-one correspondence with the different target numbers. Each sub-evaluation result may, for example, include performance parameters, such as the execution duration, of the target computing center executing the probe task with the corresponding target number. The scheduling device subsequently determines, based on the plurality of sub-evaluation results, the correspondence between the execution efficiency of the target computing unit executing the probe task and the number of computing units.

In the above scenario, the target computing center takes the performance parameters of executing the probe task with each target number as one sub-evaluation result and packs the plurality of sub-evaluation results into one evaluation result sent to the scheduling device. Optionally, the target computing center may instead send a plurality of evaluation results to the scheduling device, each corresponding to one target number and indicating the performance parameters of the target computing center executing the probe task with that target number.
Step 503: and the scheduling device evaluates the computing power of the target computing center according to the performance parameters reported by the target computing center.
In some embodiments, if the scheduling device can determine the identifier of the target computing unit that executed the probe task, step 503 can be understood specifically as: the scheduling device measures the computing power of the target computing unit based on the evaluation result, where the target computing unit is the computing unit, among the plurality of computing units included in the target computing center, that executed the probe task. That is, the scheduling device takes the computing power evaluated from the reported performance parameters as the computing power of the target computing unit, thereby measuring the computing power of fine-grained computing resources in the target computing center.
The specific implementation manner of the scheduling apparatus determining the identifier of the target computing unit may refer to step 501, and is not described in detail here.
In addition, in some embodiments, because the evaluation result includes performance parameters of several different dimensions, such as the execution duration and the precision of the output result, the evaluation result may be converted into a specific computing power value so that the differences between the computing power of different computing centers can be evaluated intuitively.

Based on this, step 503 may be implemented as follows: the scheduling device converts the plurality of performance parameters into a unified computing power value based on a preset policy.

For convenience of description, the preset policy is called a metric policy. The metric policy can be understood as an algorithm that converts each performance parameter in the evaluation result into a computing power value. For example, if the evaluation result includes multiple performance parameters, the metric policy may indicate how the value of each performance parameter is converted into a computing power value, so that a computing power value corresponding to each performance parameter is obtained, and how the computing power values of the individual performance parameters are combined into one unified computing power value.

For example, the evaluation result includes two performance parameters: the execution duration and the output precision. The metric policy may include two mapping relationships, a first mapping relationship and a second mapping relationship. The first mapping relationship includes a plurality of duration intervals and the computing power value corresponding to each duration interval, and the second mapping relationship includes a plurality of precision intervals and the computing power value corresponding to each precision interval. The metric policy also includes a first weight corresponding to the execution duration and a second weight corresponding to the output precision.

Based on this, when the scheduling device receives the execution duration and the output precision reported by the target computing center, it determines the computing power value corresponding to the reported execution duration from the first mapping relationship and the computing power value corresponding to the reported output precision from the second mapping relationship. It then weights the two computing power values by the first weight and the second weight; the resulting value is the unified computing power of the target computing center.
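A minimal sketch of such a metric policy follows; the interval boundaries, computing power values, and weights are invented purely for illustration and are not part of the embodiments.

```python
def value_from_intervals(x, intervals):
    """intervals: list of (upper_bound, computing_power_value) sorted by upper_bound.
    Returns the computing power value of the first interval whose bound x does not exceed."""
    for upper, value in intervals:
        if x <= upper:
            return value
    return intervals[-1][1]

# First mapping relationship: execution duration (seconds) -> computing power value
# (shorter durations map to higher values; the numbers are illustrative only).
duration_intervals = [(1.0, 100), (5.0, 80), (20.0, 50), (float("inf"), 20)]
# Second mapping relationship: output precision (error vs. preset true value) -> computing power value.
precision_intervals = [(1e-9, 100), (1e-6, 80), (1e-3, 50), (float("inf"), 20)]

def unified_computing_power(duration, precision_error, w_duration=0.6, w_precision=0.4):
    d_value = value_from_intervals(duration, duration_intervals)
    p_value = value_from_intervals(precision_error, precision_intervals)
    # Weighted combination of the per-parameter computing power values.
    return w_duration * d_value + w_precision * p_value

print(unified_computing_power(duration=3.2, precision_error=5e-7))  # -> 80.0
```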
The above metric policy is only an example and does not limit the metric policies provided in the embodiments of the present application.

In addition, in the embodiments of the present application, the metric policy may be set by an expert and uploaded to the scheduling device. In this scenario, the scheduling device obtains the metric policy in response to a metric policy uploading operation.

The embodiments of the present application do not limit the specific way in which the expert sets the metric policy; when the embodiments are applied, the expert may set a specific metric policy based on actual requirements.
In addition, to prevent randomness and system jitter from affecting the measured computing power, the probe task can be issued to the target computing center multiple times, so that the target computing center executes the probe task multiple times and multiple evaluation results are obtained. The scheduling device averages the multiple evaluation results to obtain the average computing power of the target computing center, which improves the accuracy of the measured computing power.

In addition, after the scheduling device issues the probe task to the target computing center, the target computing center can not only control the target computing unit to execute the probe task but also report which of its computing units, other than the target computing unit, could also be used to execute the probe task. The scheduling device can thus measure which types of computing units in the target computing center can execute the probe task, enabling the measurement of diversified computing power.

In addition, to make the differences between the computing power of different computing centers comparable on a common basis, based on the embodiment shown in fig. 5 the scheduling device may issue the same probe task to the same number of computing units in different computing centers in the computing network, so that the differences in the computing power performance of different computing centers can be compared on that basis.
In summary, the embodiment shown in fig. 5 has the following technical effects:
(1) Based on the embodiment shown in fig. 5, the computing power of the target computing center can be measured using the probe task. Because the probe task is set based on the computing tasks executed by the target computing center, the computing power of the target computing center measured with the probe task differs little from the computing power the target computing center exhibits when it actually runs. That is, the computing power measuring method provided by the embodiments of the present application can measure the computing power of the target computing center more accurately.

Furthermore, when computing power is provided externally as a resource in a computing network, billing must be considered. Because the computing power measuring method provided by the embodiments of the present application can measure the computing power of a computing center accurately, unreasonable computing power charging is avoided.
(2) Based on the embodiment shown in fig. 5, by dividing the computing resources on the computing center into fine-grained computing units, the scheduling apparatus can measure the computation power of the fine-grained computing units, thereby measuring the performance of the fine-grained computing resources on the computing center.
Besides the above, the computing power measuring method provided by the embodiments of the present application may also include the following technical solutions.

1. Evaluating how computing power performance varies with computing units of different scales.

Specifically, as described in step 501, when the scheduling device issues the probe task and the identifier of the target computing unit to the target computing center, it may also issue a plurality of different target numbers, each being the number of computing units, among the at least two computing units in the target computing unit, that execute the probe task in parallel. In this scenario, the target computing center controls target computing units of the different target numbers to execute the probe task separately and reports the performance parameters corresponding to each target number, so that the scheduling device can determine the correspondence between the execution efficiency of the target computing center and the number of computing units based on these performance parameters. That is, the variation of computing power performance under computing units of different scales is evaluated.
In a typical case, the correspondence between the execution efficiency of the target computing center and the number of computing units follows the trend shown in fig. 9: when the number of computing units is small, the execution efficiency of the target computing center increases significantly as the number of computing units increases, but once the number of computing units reaches a certain threshold, the execution efficiency changes little as the number increases further.

Based on this, after obtaining the correspondence between the execution efficiency of the target computing center and the number of computing units, the scheduling device can keep the number of scheduled computing units within the threshold shown in fig. 9 when it later decides how many computing units of the target computing center to schedule, thereby avoiding wasting computing resources by scheduling too many computing units.

Thus, in some embodiments, for a computing task to be processed, the scheduling device may determine a scheduling number based on the correspondence between the execution efficiency of the target computing unit on the target computing center executing the probe task and the number of computing units, and then schedule that number of computing units among the target computing units on the target computing center to execute the computing task to be processed.
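One simple way to derive a scheduling number from such a correspondence is to stop increasing the unit count once the marginal efficiency gain becomes small; the following sketch illustrates this under assumed data, and the threshold and sample curve are not measured values.

```python
def choose_scheduling_number(efficiency_by_count, min_gain=0.05):
    """efficiency_by_count: list of (unit_count, execution_efficiency) sorted by unit_count,
    e.g. the correspondence measured from probe tasks run with different target numbers.
    Returns the smallest unit count beyond which the relative efficiency gain
    drops below min_gain (an approximation of the threshold in fig. 9)."""
    counts = [c for c, _ in efficiency_by_count]
    effs = [e for _, e in efficiency_by_count]
    for i in range(1, len(effs)):
        gain = (effs[i] - effs[i - 1]) / max(effs[i - 1], 1e-9)
        if gain < min_gain:
            return counts[i - 1]   # adding more units no longer helps much
    return counts[-1]

# Assumed example curve: efficiency rises quickly, then flattens around 8 units.
curve = [(1, 1.0), (2, 1.9), (4, 3.4), (8, 5.0), (16, 5.2), (32, 5.3)]
print(choose_scheduling_number(curve))  # -> 8
```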
2. Measuring the performance of the network resources of the computing center.

The computing power measuring method provided by the embodiments of the present application can measure not only the performance of the computing resources of a computing center but also the performance of its network resources.

Based on this, in some embodiments, step 501 may be implemented as follows: the scheduling device issues the probe task to the target computing center multiple times within a reference time period. Correspondingly, step 502 may be implemented as: the scheduling device receives the evaluation results returned by the target computing center multiple times, each returned evaluation result containing the performance parameters of one execution of the probe task, so that multiple evaluation results are obtained. Correspondingly, step 503 may be implemented as: the scheduling device determines the performance of the network resources of the target computing center within the reference time period based on the multiple evaluation results.

Because the target computing center may need to acquire data over the network when executing the probe task, the execution duration of the probe task also reflects the performance of the network resources during execution. Therefore, when the target computing center executes the probe task multiple times within the reference time period, the scheduling device can determine the change in the network resources of the target computing center within the reference time period based on the change in the execution durations of the multiple executions of the probe task.

For example, suppose the execution durations of the multiple executions of the probe task within the reference time period are stable for a while and then suddenly increase. The corresponding trend of the network resources, such as bandwidth, of the target computing center within the reference time period may be that the bandwidth is stable for a period of time and then suddenly decreases.
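The following sketch shows one assumed way to turn the execution durations of repeated probe runs within a reference time period into a rough view of when network performance was good; the timestamps, durations, and slack factor are fabricated for illustration.

```python
from statistics import median

def good_network_slots(runs, slack=1.2):
    """runs: list of (timestamp, execution_duration_seconds) for repeated executions
    of the same probe task within the reference time period.
    Returns the timestamps at which the duration stayed close to the median,
    taken here as a rough proxy for good network resource performance."""
    durations = [d for _, d in runs]
    baseline = median(durations)
    return [t for t, d in runs if d <= baseline * slack]

# Assumed measurements: durations are stable, then suddenly increase
# (e.g. the bandwidth of the target computing center drops late in the day).
runs = [("10:00", 12.1), ("12:00", 12.4), ("14:00", 12.0), ("16:00", 12.3), ("18:00", 19.7)]
print(good_network_slots(runs))  # -> ['10:00', '12:00', '14:00', '16:00']
```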
In this way, the scheduling device can determine the periods within the reference time period in which the network resource performance of the target computing center is good. When the scheduling device later schedules the target computing center to execute a computing task, it can control the target computing center to execute the computing task in a time slot with good network resource performance.

Based on this, in some embodiments, for a computing task to be processed, the scheduling device may schedule the target computing center to execute the computing task within a specified time period, based on the performance of the network resources of the target computing center within the reference time period, where the network resource performance within the specified time period is good.

For example, the reference time period is one day, and the scheduling device measures that the network resource performance of the target computing center is good between two and four o'clock in the afternoon. Based on this, when scheduling the target computing center to execute the computing task to be processed, the scheduling device can control the target computing center to execute the computing task between two and four o'clock in the afternoon.

In addition, if the process in which a computing center executes the probe task is called a probe test, then each computing center performs the probe test at least once before the scheduling device schedules the computing center to execute computing tasks, so that the scheduling device can perceive the differences between the computing power of the computing centers. In some embodiments, the scheduling device may periodically control the computing centers to perform probe tests so as to detect changes in their computing power in time. In other embodiments, a computing center may actively request a probe test from the scheduling device when its computing power is adjusted (for example, when new computing resources are added), so that the scheduling device can perceive the change in computing power.

It should be noted that, because a probe task is in essence also a computing task, the computing center does not, when actually executing the probe task, handle it separately from the computing tasks triggered by applications; it processes the probe task as an ordinarily received computing task. In this case, the computing center may mark the probe task so that, after the probe task has been executed, the performance parameters of the execution are reported to the scheduling device.
In addition, the embodiment of the application also provides a scheduling device. As shown in fig. 10, the scheduling apparatus 1000 includes the following modules.
An acquisition module 1001 for acquiring a probe task. The specific implementation manner may refer to step 501 in the embodiment of fig. 5.
The issuing module 1002 is configured to issue a probe task to a target computing center among the multiple computing centers, where the probe task is set according to a computing task executed by the target computing center. The specific implementation manner may refer to step 501 in the embodiment of fig. 5.
The receiving module 1003 is configured to receive performance parameters of the target computing center during the process of executing the probe task. The specific implementation manner may refer to step 502 in the embodiment of fig. 5.
And the evaluation module 1004 is configured to evaluate the computing power of the target computing center according to the performance parameters reported by the target computing center. The specific implementation manner may refer to step 503 in the embodiment of fig. 5.
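Purely as an illustrative sketch (not the apparatus's actual implementation), the cooperation of the four modules might be expressed as follows; the method names, the stub computing center, and the FLOPS-based score are assumptions introduced here.

class SchedulingApparatus:
    """Sketch of the module split of scheduling apparatus 1000."""

    def acquire_probe_task(self, computing_task):            # module 1001
        # A probe task set according to the computing task to be executed.
        return {"operators": computing_task["operators"]}

    def issue_probe_task(self, probe_task, target_center):   # module 1002
        target_center.run(probe_task)

    def receive_performance(self, report):                    # module 1003
        self.last_report = report

    def evaluate_computing_power(self):                       # module 1004
        # Fold the reported performance parameters into one value.
        r = self.last_report
        return r["flops"] / max(r["duration_s"], 1e-9)

class StubCenter:
    """Stand-in for a target computing center, used only for this example."""
    def __init__(self, apparatus):
        self.apparatus = apparatus
    def run(self, probe_task):
        self.apparatus.receive_performance({"flops": 2e12, "duration_s": 4.0})

apparatus = SchedulingApparatus()
task = apparatus.acquire_probe_task({"operators": ["matmul", "conv2d"]})
apparatus.issue_probe_task(task, StubCenter(apparatus))
print(apparatus.evaluate_computing_power())   # -> 5e11 (effective FLOPS)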
Optionally, the probe task includes at least one probe operator, each of the at least one probe operator indicating an operator of the target computing center that affects performance of the target computing center in performing the computing task.
Optionally, the scheduling apparatus includes a probe operator library, and the probe operator library includes a plurality of probe operators;
the acquisition module is used for: receiving the computing task; matching at least one execution operator, indicated by the computing task, for executing the computing task against the plurality of probe operators in the probe operator library to obtain the at least one probe operator; and generating the probe task based on the at least one probe operator.
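For illustration, the matching of execution operators against a probe operator library might look like the following sketch; the library contents and operator names are assumptions, not the actual library.

PROBE_OPERATOR_LIBRARY = {
    "matmul":    {"op": "matmul",    "shape": (1024, 1024)},
    "conv2d":    {"op": "conv2d",    "shape": (1, 3, 224, 224)},
    "allreduce": {"op": "allreduce", "size_mb": 256},
}

def build_probe_task(execution_operators):
    """Keep only the execution operators that have a matching probe operator
    in the library, and assemble them into a single probe task."""
    matched = [PROBE_OPERATOR_LIBRARY[op]
               for op in execution_operators if op in PROBE_OPERATOR_LIBRARY]
    return {"probe_operators": matched}

# A computing task that indicates matmul, conv2d and an unknown custom operator:
print(build_probe_task(["matmul", "conv2d", "my_custom_op"]))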
Optionally, the scheduling device includes the probe task;
the acquisition module is used for: in response to a computing power measurement instruction, acquiring the probe task from the scheduling device, where the probe task is generated according to the computing task after the computing task has been executed.
Optionally, the computing resources of the target computing center are divided into a plurality of computing units.
The computing unit that executes the probe task is a target computing unit among the plurality of computing units, and the computing power of the computing center evaluated by the scheduling device according to the performance parameters is used as the computing power of the target computing unit.
Optionally, the scheduling device includes identifiers of a plurality of computing units included in the target computing center.
The issuing module is used for determining a target calculation unit from a plurality of calculation units; and when the probe task is issued, the identification of the target computing unit is also issued to the target computing center, so that the target computing center determines the target computing unit for executing the probe task according to the identification of the target computing unit before executing the probe task through the target computing unit in the plurality of computing units.
Optionally, the issuing module is specifically configured to:
acquiring resource configuration information of a target computing center, wherein the resource configuration information indicates an operator which can be executed by each computing unit in a plurality of computing units;
and determining a target calculation unit from the plurality of calculation units based on the operator which can be executed by each calculation unit in the plurality of calculation units and the probe operator included in the probe task.
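A minimal sketch of selecting the target computing unit from the resource configuration information might be as follows; the mapping layout and unit identifiers are illustrative assumptions.

def pick_target_unit(resource_config, probe_operators):
    """resource_config maps each computing unit identifier to the operators
    that unit can execute; return the first unit able to run every probe
    operator of the probe task, or None if no single unit can."""
    needed = {p["op"] for p in probe_operators}
    for unit_id, supported in resource_config.items():
        if needed <= set(supported):
            return unit_id
    return None

config = {"npu-0": ["matmul", "conv2d"], "cpu-0": ["matmul"]}
probe_ops = [{"op": "matmul"}, {"op": "conv2d"}]
print(pick_target_unit(config, probe_ops))   # -> npu-0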
Optionally, the issuing module is specifically configured to:
issuing the identifier of the probe operator included in the probe task to a target computing center; receiving the identification of a computing unit which can execute a probe task in a plurality of computing units reported by a target computing center; a target computing unit is determined from among the plurality of computing units capable of performing the probe task.
Optionally, the issuing module is specifically configured to:
presenting the identities of the plurality of computing units to a user; responding to the operation of a user, generating a calculation unit setting instruction, wherein the calculation unit setting instruction comprises an identifier of a target calculation unit; the identification of the target computing unit is obtained from the computing unit setting instruction.
Optionally, the target computing center reports a plurality of performance parameters;
the evaluation module is specifically configured to:
and converting the performance parameters into a unified measurement value of computational power through a preset strategy.
In summary, in the embodiments of this application, the scheduling device may measure the computing power of the target computing center based on the probe task. Since the probe task is set based on the computing tasks actually executed by the target computing center, the computing power measured based on the probe task differs little from the computing power exhibited when the target computing center actually runs. That is, the computing power measurement method provided in the embodiments of this application can measure the computing power of the target computing center more accurately.
It should be noted that, when the scheduling apparatus provided in the foregoing embodiments measures computing power, the division of the foregoing functional modules is merely used as an example. In practical applications, the functions may be allocated to different functional modules as required, that is, the internal structure of the scheduling apparatus may be divided into different functional modules to complete all or some of the functions described above. In addition, the scheduling apparatus provided in the foregoing embodiments and the embodiments of the computing power measurement method belong to the same concept; for details of the specific implementation process, refer to the method embodiments, which are not described herein again.
In the foregoing embodiments, the implementation may be wholly or partly realized by software, hardware, firmware, or any combination thereof. When software is used for implementation, the implementation may be wholly or partly in the form of a computer program product. The computer program product includes one or more computer instructions. When the computer instructions are loaded and executed on a computer, all or some of the processes or functions described in the embodiments of this application are produced. The computer may be a general-purpose computer, a special-purpose computer, a computer network, or another programmable apparatus. The computer instructions may be stored in a computer-readable storage medium or transmitted from one computer-readable storage medium to another computer-readable storage medium, for example, from one website, computer, server, or computing center to another website, computer, server, or computing center in a wired (for example, coaxial cable, optical fiber, or Digital Subscriber Line (DSL)) or wireless (for example, infrared, radio, or microwave) manner. The computer-readable storage medium may be any usable medium accessible to a computer, or a data storage device, such as a server or a computing center, that integrates one or more usable media. The usable medium may be a magnetic medium (for example, a floppy disk, a hard disk, or a magnetic tape), an optical medium (for example, a Digital Versatile Disc (DVD)), or a semiconductor medium (for example, a Solid State Drive (SSD)), or the like.
It will be understood by those skilled in the art that all or part of the steps for implementing the above embodiments may be implemented by hardware, or may be implemented by a program instructing relevant hardware, where the program may be stored in a computer-readable storage medium, and the above-mentioned storage medium may be a read-only memory, a magnetic disk or an optical disk, etc.
The above-mentioned embodiments are provided by way of example and not intended to limit the present disclosure, and any modifications, equivalents, improvements, etc. made within the spirit and principle of the present disclosure should be included in the scope of the present disclosure.
In addition, information (including but not limited to user equipment information, user personal information, etc.), data (including but not limited to data for analysis, stored data, displayed data, etc.), and signals referred to in the embodiments of the present application are authorized by the user or sufficiently authorized by various parties, and the collection, use, and processing of the relevant data is required to comply with relevant laws and regulations and standards in relevant countries and regions.

Claims (32)

1. A computing network, comprising a scheduling apparatus and a plurality of computing centers;
the dispatching device is used for acquiring a probe task and issuing the probe task to a target computing center in the plurality of computing centers, and the probe task is set according to the computing task executed by the target computing center;
the target computing center is used for executing the probe task and reporting the performance parameters of the target computing center in the process of executing the probe task to the scheduling device;
and the scheduling device is used for evaluating the computing power of the target computing center according to the performance parameters reported by the target computing center.
2. The computing network of claim 1, wherein the probe task includes at least one probe operator, each of the at least one probe operator indicating one operator of the target computing center that affects performance of the target computing center in performing the computing task.
3. The computing network of claim 1 or 2, wherein the scheduling means comprises a probe operator library, the probe operator library comprising a plurality of probe operators;
when acquiring the probe task, the scheduling device is specifically configured to:
receiving the computing task;
matching at least one executive operator which is indicated by the computing task and used for executing the computing task with a plurality of probe operators in the probe operator library to obtain at least one probe operator;
generating the probe task based on the at least one probe operator.
4. The computing network of claim 1 or 2, wherein the probe task is included in the scheduling device;
when acquiring the probe task, the scheduling device is specifically configured to:
and responding to a computing power measurement instruction, and acquiring the probe task from the scheduling device, wherein the probe task is generated according to the computing task after the computing task has been executed.
5. The computing network of any of claims 1-4, wherein the computing resources of the target computing center are divided into a plurality of computing units;
when the target computing center executes the probe task, the target computing center is specifically configured to:
performing, by a target computing unit of the plurality of computing units, the probe task;
and the computing power of the computing center evaluated by the scheduling device according to the performance parameters is used as the computing power of the target computing unit.
6. The computing network of claim 5, wherein the scheduling means includes an identification of the plurality of computing units included in the target computing center;
when issuing a probe task, the scheduling device is specifically configured to:
determining the target computing unit from the plurality of computing units;
when the probe task is issued, the identification of the target calculation unit is also issued to the target calculation center;
the target computing center, prior to performing the probe task by a target computing unit of the plurality of computing units, is further to:
and determining a target computing unit for executing the probe task according to the identification of the target computing unit.
7. The computing network of claim 6, wherein the scheduling means, when determining the target computing unit from the plurality of computing units, is specifically configured to:
acquiring resource configuration information of the target computing center, wherein the resource configuration information indicates operators which can be executed by each computing unit in the plurality of computing units;
determining the target computing unit from the plurality of computing units based on an operator that each computing unit of the plurality of computing units can execute and a probe operator included in the probe task.
8. The computing network of claim 6, wherein the scheduling means, when determining the target computing unit from the plurality of computing units, is specifically configured to:
issuing the identifier of the probe operator included in the probe task to the target computing center;
receiving an identifier of a computing unit which can execute the probe task and is reported by the target computing center;
determining the target computing unit from among the computing units capable of performing the probe task.
9. The computing network of claim 6, wherein the scheduling means, when determining the target computing unit from the plurality of computing units, is specifically configured to:
presenting the identities of the plurality of computing units to a user;
responding to the operation of a user, and generating a calculation unit setting instruction, wherein the calculation unit setting instruction comprises the identification of the target calculation unit;
and acquiring the identification of the target computing unit from the computing unit setting instruction.
10. The computing network of any of claims 1-9, wherein a plurality of performance parameters are reported by the target computing center;
when evaluating the computing power of the target computing center according to the performance parameters reported by the target computing center, the scheduling device is specifically configured to:
and converting the performance parameters into a unified measurement value of computing power through a preset strategy.
11. A computing power measurement method, wherein the method is applied to a computing network, and the computing network comprises a scheduling device and a plurality of computing centers; the method comprises the following steps:
the dispatching device acquires a probe task and issues the probe task to a target computing center in the plurality of computing centers, wherein the probe task is set according to the computing task executed by the target computing center;
the target computing center executes the probe task and reports the performance parameters of the target computing center in the process of executing the probe task to the scheduling device;
and the scheduling device evaluates the computing power of the target computing center according to the performance parameters reported by the target computing center.
12. The method of claim 11, in which the probe task comprises at least one probe operator, each of the at least one probe operator indicating one operator of the target computing center that affects performance of the target computing center in performing the computing task.
13. The method of claim 11 or 12, wherein the scheduling apparatus comprises a probe operator library, the probe operator library comprising a plurality of probe operators;
the scheduling device acquires the probe task, and comprises:
receiving the computing task;
matching at least one executive operator which is indicated by the computing task and used for executing the computing task with a plurality of probe operators in the probe operator library to obtain at least one probe operator;
generating the probe task based on the at least one probe operator.
14. The method according to claim 11 or 12, wherein the probe task is included in the scheduling device;
the scheduling device acquires the probe task, and comprises:
responding to a computing power measurement instruction, and acquiring the probe task from the scheduling device, wherein the probe task is generated according to the computing task after the computing task has been executed.
15. The method of any of claims 11-14, wherein the computing resources of the target computing center are divided into a plurality of computing units;
the target computing center performs the probe tasks, including:
performing, by a target computing unit of the plurality of computing units, the probe task;
and the computing power of the computing center evaluated by the scheduling device according to the performance parameters is used as the computing power of the target computing unit.
16. The method of claim 15, wherein the scheduling means includes an identification of the plurality of computing units included by the target computing center;
the dispatching device issues the probe tasks, and the method comprises the following steps:
determining the target computing unit from the plurality of computing units;
when the probe task is issued, the identification of the target calculation unit is also issued to the target calculation center;
the target computing center, prior to performing the probe task by a target computing unit of the plurality of computing units, further comprises:
and determining a target computing unit for executing the probe task according to the identification of the target computing unit.
17. The method of claim 16, wherein the scheduling means determining the target computational unit from the plurality of computational units comprises:
acquiring resource configuration information of the target computing center, wherein the resource configuration information indicates operators which can be executed by each computing unit in the plurality of computing units;
determining the target computing unit from the plurality of computing units based on an operator that each computing unit of the plurality of computing units can execute and a probe operator included in the probe task.
18. The method of claim 16, wherein the scheduling means determining the target computational unit from the plurality of computational units comprises:
issuing the identifier of the probe operator included in the probe task to the target computing center;
receiving an identifier of a computing unit which can execute the probe task and is reported by the target computing center;
determining the target computing unit from among the computing units capable of performing the probe task.
19. The method of claim 16, wherein the scheduling means determining the target computing unit from the plurality of computing units comprises:
presenting the identities of the plurality of computing units to a user;
responding to the operation of a user, and generating a calculation unit setting instruction, wherein the calculation unit setting instruction comprises the identification of the target calculation unit;
and acquiring the identification of the target computing unit from the computing unit setting instruction.
20. The method of any one of claims 11-19, wherein a plurality of performance parameters are reported by the target computing center;
the scheduling device evaluates the computing power of the target computing center according to the performance parameters reported by the target computing center, and comprises the following steps:
and converting the performance parameters into a unified measurement value of computing power through a preset strategy.
21. A scheduling apparatus, comprising:
the acquisition module is used for acquiring a probe task;
the issuing module is used for issuing the probe tasks to a target computing center in the plurality of computing centers, and the probe tasks are set according to the computing tasks executed by the target computing center;
a receiving module, configured to receive a performance parameter, reported by the target computing center, of the target computing center in the process of executing the probe task;
and the evaluation module is used for evaluating the computing power of the target computing center according to the performance parameters reported by the target computing center.
22. The scheduling apparatus of claim 21, wherein the probe task comprises at least one probe operator, each of the at least one probe operator indicating one operator of the target computing center that affects performance of the target computing center in performing the computing task.
23. The scheduling apparatus of claim 21 or 22 wherein the scheduling apparatus comprises a probe operator library, the probe operator library comprising a plurality of probe operators;
the acquisition module is configured to:
receiving the computing task;
matching at least one executive operator which is indicated by the computing task and used for executing the computing task with a plurality of probe operators in the probe operator library to obtain at least one probe operator;
generating the probe task based on the at least one probe operator.
24. The scheduling apparatus according to claim 21 or 22, wherein the probe task is included in the scheduling apparatus,
the acquisition module is configured to:
responding to a computing power measurement instruction, and acquiring the probe task from the scheduling device, wherein the probe task is generated according to the computing task after the computing task has been executed.
25. The scheduling apparatus of any one of claims 21-24 wherein the computing resources of the target computing center are divided into a plurality of computing units;
wherein the computing unit executing the probe task is a target computing unit of the plurality of computing units;
and the computing power of the computing center evaluated by the scheduling device according to the performance parameters is used as the computing power of the target computing unit.
26. The scheduling apparatus of claim 25 wherein the scheduling apparatus includes therein an identification of the plurality of computing units included by the target computing center;
the issuing module is used for:
determining the target computing unit from the plurality of computing units;
and when the probe task is issued, issuing the identification of the target computing unit to the target computing center so that the target computing center determines the target computing unit for executing the probe task according to the identification of the target computing unit before the target computing unit in the plurality of computing units executes the probe task.
27. The scheduling apparatus of claim 26 wherein the issuing module is configured to:
acquiring resource configuration information of the target computing center, wherein the resource configuration information indicates operators which can be executed by each computing unit in the plurality of computing units;
determining the target computing unit from the plurality of computing units based on an operator that each computing unit of the plurality of computing units can execute and a probe operator included in the probe task.
28. The scheduling apparatus of claim 26 wherein the issuing module is configured to:
issuing the identifier of the probe operator included in the probe task to the target computing center;
receiving an identifier of a computing unit which can execute the probe task and is reported by the target computing center;
determining the target computing unit from among the computing units capable of performing the probe task.
29. The scheduling apparatus of claim 26 wherein the issuing module is configured to:
presenting the identities of the plurality of computing units to a user;
responding to the operation of a user, and generating a calculation unit setting instruction, wherein the calculation unit setting instruction comprises the identification of the target calculation unit;
and acquiring the identification of the target computing unit from the computing unit setting instruction.
30. The scheduling apparatus of any one of claims 21-29 wherein a plurality of performance parameters are reported by the target computing center;
the evaluation module is to:
and converting the performance parameters into a unified measurement value of computing power through a preset strategy.
31. A computer program product comprising instructions which, when executed by a scheduling apparatus, cause the scheduling apparatus to perform the method of any one of claims 11 to 20.
32. A computer readable storage medium comprising computer program instructions which, when executed by a scheduling apparatus, cause the scheduling apparatus to perform the method of any one of claims 11 to 20.
CN202210872143.6A 2022-05-09 2022-07-22 Computing network, computing force measuring method, scheduling device and related products Pending CN115373836A (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202210503412 2022-05-09
CN2022105034121 2022-05-09

Publications (1)

Publication Number Publication Date
CN115373836A true CN115373836A (en) 2022-11-22

Family

ID=84061124

Family Applications (1)

Application Number Title Priority Date Filing Date
CN202210872143.6A Pending CN115373836A (en) 2022-05-09 2022-07-22 Computing network, computing force measuring method, scheduling device and related products

Country Status (1)

Country Link
CN (1) CN115373836A (en)

Patent Citations (4)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN114217948A (en) * 2015-11-13 2022-03-22 谷歌有限责任公司 Performance monitoring in distributed storage systems
CN106355251A (en) * 2016-04-29 2017-01-25 北京大学 Data processing device and data processing method
CN109857633A (en) * 2018-12-14 2019-06-07 武汉斗鱼鱼乐网络科技有限公司 A kind of task calculates power estimation method, device and storage medium
CN112767993A (en) * 2021-03-03 2021-05-07 清华大学 Test method and test system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
WO2024124909A1 (en) * 2022-12-13 2024-06-20 中兴通讯股份有限公司 Communication method, electronic device, and storage medium
CN117453376A (en) * 2023-12-20 2024-01-26 宁德时代新能源科技股份有限公司 Control method, device, equipment and storage medium for high-throughput calculation
CN117453376B (en) * 2023-12-20 2024-05-03 宁德时代新能源科技股份有限公司 Control method, device, equipment and storage medium for high-throughput calculation

Similar Documents

Publication Publication Date Title
Liu et al. A hierarchical framework of cloud resource allocation and power management using deep reinforcement learning
Hsu et al. Micky: A cheaper alternative for selecting cloud instances
CN115373836A (en) Computing network, computing force measuring method, scheduling device and related products
Wu et al. Hybrid evolutionary scheduling for energy-efficient fog-enhanced internet of things
CN110389820A (en) A kind of private clound method for scheduling task carrying out resources based on v-TGRU model
Wu et al. A power consumption model for cloud servers based on elman neural network
CN113821332B (en) Method, device, equipment and medium for optimizing efficiency of automatic machine learning system
US11609784B2 (en) Method for distributing a computational process, workload distribution device and system for distributing a computational process
Tuli et al. GOSH: Task scheduling using deep surrogate models in fog computing environments
Hou et al. AlphaR: Learning-powered resource management for irregular, dynamic microservice graph
CN116991558B (en) Computing power resource scheduling method, multi-architecture cluster, device and storage medium
Zhang et al. Estimating power consumption of containers and virtual machines in data centers
Chen et al. Silhouette: Efficient cloud configuration exploration for large-scale analytics
CN113158435B (en) Complex system simulation running time prediction method and device based on ensemble learning
CN116820730B (en) Task scheduling method, device and storage medium of multi-engine computing system
CN109857633A (en) A kind of task calculates power estimation method, device and storage medium
CN111061618B (en) Cloud platform simulation system, cloud platform performance test method and computer equipment
CN116795198A (en) Energy consumption optimization method and device for data center and storage medium
CN113448687B (en) Hyper-heuristic task scheduling method and system based on reinforcement learning in cloud environment
Kim et al. Fedgpo: Heterogeneity-aware global parameter optimization for efficient federated learning
Sen et al. Predictive price-performance optimization for serverless query processing
CN112860531B (en) Block chain wide consensus performance evaluation method based on deep heterogeneous graph neural network
CN111598390B (en) Method, device, equipment and readable storage medium for evaluating high availability of server
Zuo et al. A cloud resource evaluation model based on entropy optimization and ant colony clustering
Yang et al. Tear up the bubble boom: Lessons learned from a deep learning research and development cluster

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination