WO2022110446A1 - Simulation method and apparatus for heterogeneous cluster scheduling, computer device, and storage medium


Info

Publication number
WO2022110446A1
WO2022110446A1 · PCT/CN2020/139683 · CN2020139683W
Authority
WO
WIPO (PCT)
Prior art keywords
deep learning
scheduling
learning load
heterogeneous
load
Prior art date
Application number
PCT/CN2020/139683
Other languages
French (fr)
Chinese (zh)
Inventor
叶可江
陈文艳
须成忠
Original Assignee
中国科学院深圳先进技术研究院
Priority date
Filing date
Publication date
Application filed by 中国科学院深圳先进技术研究院 (Shenzhen Institutes of Advanced Technology, Chinese Academy of Sciences)
Publication of WO2022110446A1


Classifications

    • G: PHYSICS
    • G06: COMPUTING; CALCULATING OR COUNTING
    • G06F: ELECTRIC DIGITAL DATA PROCESSING
    • G06F 9/00: Arrangements for program control, e.g. control units
    • G06F 9/06: Arrangements for program control using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F 9/44: Arrangements for executing specific programs
    • G06F 9/455: Emulation; interpretation; software simulation, e.g. virtualisation or emulation of application or operating system execution engines
    • G06F 9/45533: Hypervisors; virtual machine monitors
    • G06F 9/45558: Hypervisor-specific management and integration aspects
    • G06F 2009/4557: Distribution of virtual machine instances; migration and load balancing
    • G06F 2009/45595: Network integration; enabling network access in virtual machine instances
    • G06F 9/46: Multiprogramming arrangements
    • G06F 9/50: Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F 9/5061: Partitioning or combining of resources
    • G06F 9/5077: Logical partitioning of resources; management or configuration of virtualized resources
    • G06N: COMPUTING ARRANGEMENTS BASED ON SPECIFIC COMPUTATIONAL MODELS
    • G06N 20/00: Machine learning

Definitions

  • the present application relates to the field of information technology, and in particular, to a simulation method, apparatus, computer equipment and storage medium for scheduling of heterogeneous clusters.
  • a MapReduce simulator can simulate a large-scale cluster with a small number of nodes and accurately simulate the running process of a job, thus providing a complete Hadoop cluster performance test platform that helps solve the testing problem of large-scale clusters.
  • Apache also provides the YARN Scheduler Load Simulator (SLS), a tool that can load applications on a single machine to simulate a large-scale YARN cluster.
  • the simulator uses the actual YARN ResourceManager and simulates the NodeManager and ApplicationMaster within the same Java virtual machine, removing the network factor by processing and scheduling NM/AM heartbeat events.
  • the purpose of the embodiments of the present application is to propose a simulation method, apparatus, computer device and storage medium for heterogeneous cluster scheduling, so as to at least solve the problems that, in traditional heterogeneous cluster scheduling methods, preparing the experimental environment is highly complex and purchasing hardware resources is expensive.
  • an embodiment of the present application provides a simulation method for heterogeneous cluster scheduling, which adopts the following technical solutions:
  • based on the historical heterogeneous resource configuration information, an operation mode for executing instructions and an instruction scheduling policy are set for a pre-trained deep learning load;
  • based on kubernetes virtualization technology, cluster-node simulation and expansion are performed on the running deep learning workload to simulate its large-scale heterogeneous cluster operation and obtain its running behavior characteristics and running status data.
  • the method also includes:
  • the basic deep learning load benchmark is trained based on the training data set, and the trained deep learning load benchmark is obtained;
  • Making the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instructions includes:
  • the deep learning workload benchmark is run according to the operation mode and the instruction scheduling policy.
  • the method also includes:
  • the scheduling performance data is evaluated based on the preset performance index, and the scheduling performance evaluation result is obtained.
  • the embodiment of the present application also provides a simulation device for heterogeneous cluster scheduling, which adopts the following technical solutions:
  • the request receiving module is used to receive the simulation running request sent by the user terminal;
  • the information reading module is used to respond to the simulation operation request and read the local database to obtain the historical heterogeneous resource configuration information
  • the instruction setting module is used to set the operation mode of executing the instruction and the instruction scheduling policy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information;
  • the load operation module is used to make the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instruction;
  • the simulation operation module is used to simulate and expand the running deep learning load based on the kubernetes virtualization technology.
  • the device also includes:
  • the data set acquisition module is used to read the local database and obtain the training data set;
  • the load training module is used to train the basic deep learning load benchmark based on the training data set to obtain the trained deep learning load benchmark;
  • the load operation module includes:
  • the load running unit is used to make the deep learning load benchmark run according to the running mode and the instruction scheduling policy based on the execution instruction.
  • the device also includes:
  • the performance indicator collection module is used to collect scheduling performance indicators for the running deep learning load based on the indicator collection configuration file to obtain scheduling performance data;
  • the performance evaluation module is used to evaluate the scheduling performance data based on the preset performance index to obtain the scheduling performance evaluation result.
  • the embodiment of the present application also provides a computer device, which adopts the following technical solutions:
  • It includes a memory and a processor, where a computer program is stored in the memory, and when the processor executes the computer program, the steps of the above-mentioned simulation method for scheduling a heterogeneous cluster are implemented.
  • the embodiments of the present application also provide a computer-readable storage medium, which adopts the following technical solutions:
  • a computer program is stored on the computer-readable storage medium, and when the computer program is executed by the processor, the steps of the above-mentioned simulation method for scheduling a heterogeneous cluster are implemented.
  • the present application provides a method for simulating heterogeneous cluster scheduling, including: receiving a simulation running request sent by a user terminal; responding to the simulation running request by reading a local database to obtain historical heterogeneous resource configuration information;
  • setting, based on the historical heterogeneous resource configuration information, an operation mode for executing instructions and an instruction scheduling policy for a pre-trained deep learning load; making the deep learning load run according to the operation mode and the instruction scheduling policy based on the execution instructions; and, based on kubernetes virtualization technology, performing cluster-node simulation and expansion on the running deep learning load to simulate its large-scale heterogeneous cluster operation and obtain the running behavior characteristics and running status data of the deep learning load.
  • by simulating and expanding cluster nodes for the running deep learning workload, the large-scale heterogeneous cluster operation of the workload can be simulated and its running status data obtained. This not only reproduces the large-scale cluster deployment mode of kubernetes but also simulates large-scale heterogeneous cluster application scenarios with a small number of nodes, thus providing a low-cost experimental environment for scheduling R&D personnel and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
  • FIG. 1 is a schematic diagram of an exemplary principle to which the present application can be applied;
  • FIG. 2 is a flowchart of an embodiment of a method for simulating heterogeneous cluster scheduling according to the present application;
  • FIG. 3 is a flowchart of deep learning load training in the simulation method for heterogeneous cluster scheduling of the present application;
  • FIG. 4 is a flowchart of performance data evaluation in the simulation method for heterogeneous cluster scheduling of the present application;
  • FIG. 5 is a schematic structural diagram of an embodiment of a simulation apparatus for heterogeneous cluster scheduling according to the present application;
  • FIG. 6 is a schematic structural diagram of deep learning load training in the simulation apparatus for heterogeneous cluster scheduling of the present application;
  • FIG. 7 is a schematic structural diagram of performance data evaluation in the simulation apparatus for heterogeneous cluster scheduling of the present application;
  • FIG. 8 is a schematic structural diagram of an embodiment of a computer device according to the present application.
  • Referring to FIGS. 1-2, a flowchart of an embodiment of a method for simulating heterogeneous cluster scheduling according to the present application is shown. For the convenience of description, only the parts related to the present application are shown.
  • in step S1, a simulation running request sent by the user terminal is received.
  • the simulation running request is a request sent by scheduling R&D personnel who, in order to obtain the runtime characteristics of the deep learning load under different high-performance heterogeneous hardware configurations, need environmental support.
  • in step S2, in response to the simulation running request, the local database is read to obtain historical heterogeneous resource configuration information.
  • the historical heterogeneous resource configuration information is collected before the simulation runs, so that the operation mode and API calling relationships of the deep learning load's application program under chips of different architectures can be set in a targeted manner.
  • the architecture data can specifically include CPU models (AtomicSimple, TimingSimple, InOrder, O3, etc.), GPU architectures (Tesla, Fermi, Kepler, Volta, Turing, etc.), and FPGA, TPU and other data.
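As an illustrative sketch only (the patent does not specify a data format), the historical heterogeneous resource configuration information listed above could be represented and checked like this; the dictionary layout and the `validate_config` helper are hypothetical names invented for this example:

```python
# Hypothetical representation of the historical heterogeneous resource
# configuration read from the local database; field names are illustrative.
HISTORICAL_CONFIG = {
    "cpu_models": ["AtomicSimple", "TimingSimple", "InOrder", "O3"],
    "gpu_architectures": ["Tesla", "Fermi", "Kepler", "Volta", "Turing"],
    "accelerators": ["FPGA", "TPU"],
}

def validate_config(requested: dict, known: dict = HISTORICAL_CONFIG) -> bool:
    """Check that every device a simulation run requests appears in the
    historical configuration information collected before the run."""
    return all(
        dev in known.get(category, [])
        for category, devices in requested.items()
        for dev in devices
    )
```

A request for an `O3` CPU model with a `Volta` GPU would validate, while an architecture absent from the historical data would be rejected before any simulation is attempted.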
  • in step S3, based on the historical heterogeneous resource configuration information, an operation mode for executing instructions and an instruction scheduling policy are set for the pre-trained deep learning load.
  • the operation mode refers to the manner, such as serial or parallel execution, in which an application program implementing the deep learning load executes instructions under different architectures.
  • the instruction scheduling policy refers to a scheduling policy when an application implementing a deep learning load under different architectures executes an instruction, and a parameter configuration file of different hardware combinations capable of executing the scheduling policy.
  • setting the operation mode for executing instructions and the instruction scheduling policy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information may specifically be analyzing the existing hardware types and architectures recorded in that information, and then setting the operation mode, such as serial or parallel execution, in which the deep learning load's application program executes instructions under each architecture, the scheduling policy that the application program follows when executing instructions under each architecture, and the parameter configuration files of the different hardware combinations capable of executing that scheduling policy, so that the application running state of the deep learning load under different architectures can later be simulated and its runtime characteristics under different hardware configurations further analyzed.
  • in this way, the autonomous configuration of different hardware architecture combinations simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
  • in step S4, the deep learning load is made to run according to the operation mode and the instruction scheduling policy based on the execution instructions.
  • making the deep learning load run according to the operation mode and the instruction scheduling policy based on the execution instructions may specifically be having the deep learning load's application program execute in the configured operation modes, such as serial or parallel execution, under different architectures, and execute the scheduling policy based on the parameter configuration files of different hardware combinations. This simulates the running state of the deep learning load's application program under different architectures, so that the runtime characteristics of the deep learning load under different hardware configurations can be analyzed. In this way, autonomous configuration of different hardware architecture combinations is realized, the underlying resource configuration strategy of heterogeneous clusters is simulated, the configuration of heterogeneous environments is simplified, and the cost of purchasing physical hardware is saved.
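The two operation modes named above, serial and parallel execution, can be sketched in a few lines. The `run_workload` helper, its mode names, and the thread-pool choice are illustrative assumptions, not the patent's actual mechanism:

```python
from concurrent.futures import ThreadPoolExecutor

def run_workload(tasks, mode="serial", workers=4):
    """Execute a list of zero-argument callables either serially or in
    parallel, mirroring the two operation modes described above."""
    if mode == "serial":
        return [task() for task in tasks]
    # Parallel mode: submit all tasks, then collect results in
    # submission order so both modes return comparable output.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]
```

Either mode returns the same results; what differs, and what a scheduler experiment would measure, is how the instruction streams overlap in time.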
  • in step S5, based on kubernetes virtualization technology, cluster-node simulation and expansion are performed on the running deep learning load, and its large-scale heterogeneous cluster operation is simulated to obtain the running behavior characteristics and running status data of the deep learning load.
  • Kubernetes has become a container orchestration tool widely used in industry and academia due to its features of portability, scalability and self-healing.
  • the cluster-node simulation and expansion of the running deep learning load based on kubernetes virtualization technology may be implemented by deploying the cluster with k8s and using the k8s container orchestration tool to make a small number of nodes emulate physical-machine characteristics, yielding a cluster of large-scale simulated nodes.
  • such a cluster can simulate the operation of the deep learning load on a large-scale heterogeneous cluster, so as to accurately obtain running status data that intuitively reflects the running status of the cluster and provide environment support for scheduling research.
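A minimal sketch of the node-expansion idea, in the spirit of kubemark-style simulated (hollow) nodes rather than the patent's actual implementation; the class, field names, per-node GPU capacity, and first-fit placement are all invented for illustration:

```python
class SimulatedCluster:
    """Register many virtual nodes on few physical machines and place
    workloads on them, imitating a large heterogeneous cluster."""

    def __init__(self):
        self.nodes = {}

    def add_virtual_nodes(self, count, gpu_per_node=8):
        """Expand the cluster with `count` simulated nodes, each
        advertising a fixed fake GPU capacity."""
        start = len(self.nodes)
        for i in range(start, start + count):
            self.nodes[f"sim-node-{i}"] = {"gpu_free": gpu_per_node, "pods": []}

    def schedule(self, pod_name, gpus):
        """First-fit placement over the simulated nodes; returns the
        chosen node name, or None if no node has enough free GPUs."""
        for name, node in self.nodes.items():
            if node["gpu_free"] >= gpus:
                node["gpu_free"] -= gpus
                node["pods"].append(pod_name)
                return name
        return None
```

A scheduling researcher could swap `schedule` for the policy under test and observe placement behavior across hundreds of simulated nodes without buying hardware.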
  • Referring to FIG. 3, a flowchart of deep learning load training in the simulation method for heterogeneous cluster scheduling of the present application is shown. For the convenience of description, only the parts related to the present application are shown.
  • before the above step S4, the method further includes step S301 and step S302; the above step S4 includes step S303.
  • in step S301, a local database is read to obtain a training data set.
  • in step S302, the basic deep learning load benchmark is trained based on the training data set to obtain a trained deep learning load benchmark.
  • in step S303, the deep learning load benchmark is made to run according to the operation mode and the instruction scheduling policy based on the execution instructions.
  • the basic deep learning workload benchmark is obtained by selecting representative benchmarks from several different types of existing deep learning workloads, and is mainly used to test data such as the execution time, transmission speed, throughput, and resource occupancy of the workload.
  • the training data set is a training data set of different scales corresponding to each benchmark.
  • the selected basic deep learning load benchmark is trained on training data sets of different scales collected for each benchmark, yielding a trained deep learning load benchmark whose application program can be used to test the simulated deep learning load under different architectures.
  • then, when the benchmark's application program executes in operation modes such as serial or parallel execution under different architectures, and executes the scheduling policy based on the parameter configuration files of different hardware combinations, the running state of the benchmark's application program under different architectures can be simulated and its runtime characteristics under different hardware configurations analyzed.
  • this realizes autonomous configuration of different hardware architecture combinations, simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
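The benchmark measurement described above (execution time and throughput across training data sets of different scales) can be sketched as follows. The `profile_benchmark` function and its callable interface are assumptions for illustration; any workload taking a dataset size would fit:

```python
import time

def profile_benchmark(benchmark, dataset_sizes):
    """Run a benchmark callable on datasets of different scales and
    record execution time and throughput, two of the metrics the
    deep learning load benchmarks are used to test."""
    results = []
    for n in dataset_sizes:
        start = time.perf_counter()
        benchmark(n)  # the workload under test
        elapsed = time.perf_counter() - start
        results.append({
            "size": n,
            "seconds": elapsed,
            "throughput": n / elapsed if elapsed else float("inf"),
        })
    return results
```

Comparing these curves across the simulated hardware configurations is what exposes the runtime characteristics the scheduling R&D personnel are after.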
  • the method further includes:
  • script editing refers to the setting of a configuration file in which performance indicators and the sampling interval can be freely selected.
  • the performance indicators of the application layer of the deep learning load benchmark include CPU utilization, memory utilization, disk IO size, network bandwidth, and the like.
  • the performance indicators of the micro-architecture layer of the deep learning load benchmark are data such as IPC (Instructions per Cycle), branch prediction (Branch Predict), and cache misses (Cache misses).
  • the application-layer and micro-architecture-layer performance indicators of the deep learning load benchmark are set via script to obtain an indicator collection configuration file that supports freely selecting the performance indicators and the sampling interval.
  • scheduling performance indicators can then be collected for the running deep learning load based on this collection configuration file, and the collected data can be evaluated to produce performance evaluation results, so as to effectively evaluate the performance of the scheduling algorithm and, to a certain extent, effectively reduce the evaluation time and cost of heterogeneous clusters.
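A hypothetical shape for such an indicator collection configuration, with freely selectable indicators from both layers and a sampling interval; the key names are illustrative, as the patent does not fix a file format:

```python
# Hypothetical indicator-collection configuration in the spirit of the
# script-edited file described above; key names are illustrative.
COLLECTION_CONFIG = {
    "sampling_interval_s": 5,
    "application_layer": [
        "cpu_utilization", "memory_utilization",
        "disk_io", "network_bandwidth",
    ],
    "microarchitecture_layer": [
        "ipc", "branch_predict", "cache_misses",
    ],
}

def selected_indicators(config):
    """Flatten the freely selected performance indicators from the
    application layer and the micro-architecture layer."""
    return config["application_layer"] + config["microarchitecture_layer"]
```

A collector would then sample exactly these indicators every `sampling_interval_s` seconds while the deep learning load runs.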
  • Referring to FIG. 4, a flowchart of performance data evaluation in the simulation method for heterogeneous cluster scheduling according to the present application is shown. For the convenience of description, only the parts related to the present application are shown.
  • the method further includes: step S401 and step S402.
  • in step S401, scheduling performance indicators are collected for the running deep learning load based on the indicator collection configuration file to obtain scheduling performance data.
  • in step S402, the scheduling performance data is evaluated based on a preset performance index to obtain a scheduling performance evaluation result.
  • the preset performance indicators are set according to actual application requirements and are used to select key indicators for evaluating scheduling performance, such as the time cost of completing each scheduling decision, the average job turnaround time, and changes in the overall resource utilization of the cluster; no specific restriction is imposed here.
  • the scheduling performance evaluation result is an evaluation index that intuitively reflects the scheduling performance of the deep learning load benchmark during its simulated run.
  • scheduling performance indicators are collected for the deep learning load during the simulation run, yielding scheduling performance data that intuitively reflects the scheduling performance of the benchmark's simulated run. Key indicators that meet the preset performance index requirements, such as the time cost of completing each scheduling decision, the average job turnaround time, and changes in the overall resource utilization of the cluster, are then screened out of the scheduling performance data, so as to obtain a scheduling performance evaluation result that intuitively reflects scheduling performance and, to a certain extent, effectively reduce the evaluation time and cost of heterogeneous clusters.
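Two of the key indicators named above, per-decision scheduling cost and average job turnaround time, can be computed from collected records as in this sketch; the record field names (`submitted_at`, `scheduled_at`, `finished_at`) are assumptions for illustration:

```python
def evaluate_scheduling(records):
    """Compute the average scheduling time cost (submission to placement)
    and the average job turnaround time (submission to completion) from
    collected per-job timestamp records."""
    sched_costs = [r["scheduled_at"] - r["submitted_at"] for r in records]
    turnarounds = [r["finished_at"] - r["submitted_at"] for r in records]
    return {
        "avg_scheduling_cost": sum(sched_costs) / len(sched_costs),
        "avg_turnaround": sum(turnarounds) / len(turnarounds),
    }
```

Screening the collected scheduling performance data down to a summary like this is what turns raw samples into the scheduling performance evaluation result.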
  • the present application provides a method for simulating heterogeneous cluster scheduling, including: receiving a simulation running request sent by a user terminal; responding to the request by reading a local database to obtain historical heterogeneous resource configuration information; setting, based on that information, the operation mode for executing instructions and the instruction scheduling policy of a pre-trained deep learning workload; making the workload run according to the operation mode and scheduling policy based on the execution instructions; and, based on kubernetes virtualization technology, performing cluster-node simulation and expansion on the running workload to simulate its large-scale heterogeneous cluster operation and obtain its running behavior characteristics and running status data.
  • specifically, the basic deep learning workload benchmark is trained on the training data set to obtain a trained benchmark; the operation mode for executing instructions and the instruction scheduling policy are set for the benchmark based on the pre-collected historical heterogeneous resource configuration information; the benchmark is then run according to that mode and policy to accurately obtain the running behavior characteristics of the deep learning load; the running load is simulated and expanded on cluster nodes based on kubernetes virtualization to simulate large-scale heterogeneous cluster operation and obtain running status data; finally, an indicator collection configuration file is set from the application-layer and micro-architecture-layer performance indicators of the benchmark, scheduling performance indicators are collected for the running load, and the collected scheduling performance data is evaluated to quickly obtain a scheduling performance evaluation result.
  • this not only simulates the large-scale cluster deployment of kubernetes but also simulates large-scale heterogeneous cluster application scenarios with a small number of nodes, thus providing a low-cost experimental environment for scheduling R&D personnel and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
  • the aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disk, a read-only memory (Read-Only Memory, ROM), or a random access memory (Random Access Memory, RAM).
  • the present application provides an embodiment of a simulation apparatus for heterogeneous cluster scheduling; the apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be specifically applied to various electronic devices.
  • the simulation apparatus 100 for heterogeneous cluster scheduling includes: a request receiving module 101, an information reading module 102, an instruction setting module 103, a load running module 104 and a simulation running module 105, wherein:
  • a request receiving module 101 configured to receive a simulation running request sent by a user terminal
  • the simulation running request is a request sent by scheduling R&D personnel who, in order to obtain the runtime characteristics of the deep learning load under different high-performance heterogeneous hardware configurations, need environmental support.
  • an information reading module 102 configured to read a local database to obtain historical heterogeneous resource configuration information in response to the simulation operation request;
  • the historical heterogeneous resource configuration information is collected before the simulation runs, so that the operation mode and API calling relationships of the deep learning load's application program under chips of different architectures can be set in a targeted manner.
  • the architecture data can specifically include CPU models (AtomicSimple, TimingSimple, InOrder, O3, etc.), GPU architectures (Tesla, Fermi, Kepler, Volta, Turing, etc.), and FPGA, TPU and other data.
  • the instruction setting module 103 is used for setting the operation mode of executing the instruction and the instruction scheduling strategy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information;
  • the operation mode refers to the operation mode, such as serial operation and parallel operation, when an application program implementing a deep learning load under different architectures executes an instruction.
  • the instruction scheduling policy refers to a scheduling policy when an application implementing a deep learning load under different architectures executes an instruction, and a parameter configuration file of different hardware combinations capable of executing the scheduling policy.
  • setting the operation mode for executing instructions and the instruction scheduling policy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information may specifically be analyzing the existing hardware types and architectures recorded in that information, and then setting the operation mode, such as serial or parallel execution, in which the deep learning load's application program executes instructions under each architecture, the scheduling policy that the application program follows when executing instructions under each architecture, and the parameter configuration files of the different hardware combinations capable of executing that scheduling policy, so that the application running state of the deep learning load under different architectures can later be simulated and its runtime characteristics under different hardware configurations further analyzed.
  • in this way, the autonomous configuration of different hardware architecture combinations simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
  • a load running module 104 configured to make the deep learning load run according to the running mode and the instruction scheduling policy based on the execution instruction
  • making the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instructions may specifically be as follows: the application program of the deep learning load executes in an operation mode such as serial operation or parallel operation under different architectures, and under those architectures the application executes the scheduling strategy based on the parameter configuration files of the different hardware combinations. This simulates the running state of the deep learning load's application under different architectures, allowing the runtime characteristics of the deep learning load under different hardware configurations to be analyzed. In this way, the autonomous configuration function for different hardware architecture combinations is realized, the underlying resource configuration strategy of heterogeneous clusters is simulated, the configuration of heterogeneous environments is simplified, and the cost of purchasing physical hardware is saved.
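A minimal sketch of the serial-versus-parallel dispatch just described: the same list of tasks is executed either one by one or concurrently, depending on the selected operation mode. The task list and the thread-pool choice are illustrative assumptions, not the patent's actual mechanism.

```python
# Illustrative: run a deep learning load's tasks serially or in parallel,
# as selected by the instruction settings.
from concurrent.futures import ThreadPoolExecutor

def run_load(tasks, operation_mode):
    """Execute a list of zero-argument task callables serially or in parallel."""
    if operation_mode == "serial":
        return [task() for task in tasks]
    # Parallel mode: submit all tasks to a thread pool and gather results
    # in submission order.
    with ThreadPoolExecutor() as pool:
        futures = [pool.submit(task) for task in tasks]
        return [f.result() for f in futures]

tasks = [lambda i=i: i * i for i in range(4)]   # stand-in training steps
serial_out = run_load(tasks, "serial")
parallel_out = run_load(tasks, "parallel")
```

Both modes produce the same results here; in a real simulator the point of the switch is to expose the timing and scheduling differences between the two execution styles.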
  • the simulation operation module 105 is used for performing cluster node simulation expansion on the running deep learning load based on the kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load, and obtaining the running behavior characteristics and running status data of the deep learning load.
  • Kubernetes has become a container orchestration tool widely used in industry and academia due to its features of portability, scalability and self-healing.
  • the cluster node simulation and expansion of the running deep learning load based on the kubernetes virtualization technology may be implemented by deploying the cluster with k8s and, based on the k8s container orchestration tool, using a small number of nodes that mimic the characteristics of physical machines to simulate a cluster of large-scale nodes. The large-scale heterogeneous cluster operation of the deep learning load can thereby be simulated, so as to accurately obtain running status data that intuitively reflects the running status of the cluster and provide low-cost experimental environment support for scheduling R&D personnel.
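The node-expansion idea can be modeled in a few lines: each physical machine backs many simulated nodes that copy its hardware profile, so a handful of hosts can present a large heterogeneous cluster to the scheduler. This is a plain-Python model of the concept, not a use of the real kubernetes API; all node names and fields are illustrative.

```python
# Minimal model of cluster node simulation expansion: `factor` virtual
# nodes are generated per physical node, each inheriting its hardware profile.
def expand_cluster(physical_nodes, factor):
    """Return simulated node records, `factor` per physical node."""
    simulated = []
    for node in physical_nodes:
        for i in range(factor):
            simulated.append({
                "name": f"{node['name']}-sim-{i}",
                "cpu": node["cpu"],
                "gpu": node["gpu"],
                "host": node["name"],       # which real machine backs this node
            })
    return simulated

physical = [{"name": "host-a", "cpu": 32, "gpu": 2},
            {"name": "host-b", "cpu": 64, "gpu": 4}]
cluster = expand_cluster(physical, factor=100)  # 2 hosts -> 200 simulated nodes
```

In an actual k8s-based implementation these records would correspond to virtual node objects registered with the cluster, against which the scheduler under evaluation places pods.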
  • the present application provides a simulation device for heterogeneous cluster scheduling. In the device, the operation mode of executing instructions and the instruction scheduling strategy are set for a pre-trained deep learning load based on pre-collected historical heterogeneous resource configuration information; the deep learning load is made to run according to the operation mode and the instruction scheduling strategy based on the execution instructions, so as to accurately obtain the running behavior characteristics of the deep learning load; further, cluster node simulation expansion is performed on the running deep learning load based on the kubernetes virtualization technology, so as to simulate large-scale heterogeneous cluster operation of the deep learning load and obtain its running status data.
  • Referring to FIG. 6, there is shown a schematic structural diagram of deep learning load training of a simulation device for heterogeneous cluster scheduling according to the present application. For the convenience of description, only parts related to the present application are shown.
  • the apparatus further includes: a data set acquisition module 601 and a load training module 602 ; the above-mentioned load operation module 104 includes: a load operation unit 603 .
  • the load training module 602 is used to train the basic deep learning load benchmark based on the training data set to obtain the trained deep learning load benchmark;
  • the load running unit 603 is configured to run the deep learning load benchmark according to the running mode and the instruction scheduling policy based on the execution instruction.
  • the basic deep learning workload benchmarks are representative benchmarks selected from several different types of existing deep learning workloads, and are mainly used to test data such as the execution time, transmission speed, throughput, and resource occupancy of the workload.
  • the training data set is a training data set of different scales corresponding to each benchmark.
  • the selected basic deep learning load benchmark is trained on training data sets of different scales corresponding to each benchmark, so as to obtain a trained deep learning load benchmark whose application can be used to test the simulated deep learning load and run under different architectures. Then, when the application of the deep learning load benchmark executes operation modes such as serial operation and parallel operation under different architectures, and executes the scheduling strategy based on the parameter configuration files of different hardware combinations, the running state of the application under different architectures can be simulated, and the runtime characteristics of the deep learning load benchmark under different hardware configurations can be analyzed. This realizes the autonomous configuration function for different hardware architecture combinations, simulates the underlying resource configuration strategy of heterogeneous clusters, simplifies the configuration of heterogeneous environments, and saves the cost of purchasing physical hardware.
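A hedged sketch of how such a benchmark might be exercised: run the trained load over datasets of different scales and record execution time and throughput, the kind of data the text says the benchmarks are used to test. The workload function here is a stand-in, not a real deep learning job.

```python
# Illustrative benchmark harness: measure execution time and throughput
# of a workload over datasets of different scales.
import time

def run_benchmark(workload, dataset_sizes):
    results = []
    for n in dataset_sizes:
        start = time.perf_counter()
        workload(n)                      # stand-in for training over n samples
        elapsed = time.perf_counter() - start
        results.append({
            "samples": n,
            "seconds": elapsed,
            "throughput": n / elapsed if elapsed > 0 else float("inf"),
        })
    return results

# A trivial CPU-bound stand-in workload; a real harness would invoke the
# trained deep learning load benchmark here.
results = run_benchmark(lambda n: sum(range(n)), [10_000, 100_000])
```

Each record corresponds to one benchmark run at one dataset scale; comparing records across hardware configurations gives the runtime characteristics discussed above.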
  • the device further includes:
  • script editing refers to the setting of a configuration file in which performance indicators and sampling intervals can be freely selected.
  • the performance indicators of the application layer of the deep learning load benchmark include CPU utilization, memory utilization, disk IO size, network bandwidth, and the like.
  • the performance indicators of the micro-architecture layer of the deep learning load benchmark are data such as IPC (Instructions per Cycle), branch prediction (Branch Predict), and cache misses (Cache misses).
  • the performance indicators of the application layer of the deep learning load benchmark, such as CPU utilization, memory utilization, disk IO size, and network bandwidth, and the performance indicators of the micro-architecture layer, such as IPC, branch prediction, and cache misses, are script-edited to obtain an indicator collection configuration file that supports freely selecting performance indicators and the sampling interval. Scheduling performance indicators can subsequently be collected for the running deep learning load based on the collection configuration file, and the collected data can be evaluated to obtain performance evaluation results, so as to effectively evaluate the performance of the scheduling algorithm and, to a certain extent, effectively reduce the evaluation time and cost of heterogeneous clusters.
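The script-editing step can be sketched as building a small configuration object that selects indicators from the two layers and a sampling interval, then serializing it to the collection configuration file. The indicator names and schema below are assumptions for illustration.

```python
# Hypothetical sketch of producing an indicator collection configuration file
# with freely selected indicators and a sampling interval.
import json

APP_LAYER = ["cpu_utilization", "memory_utilization", "disk_io", "network_bandwidth"]
UARCH_LAYER = ["ipc", "branch_predict", "cache_misses"]

def make_collection_config(indicators, sampling_interval_s):
    """Validate the chosen indicators and return a config dictionary."""
    unknown = [i for i in indicators if i not in APP_LAYER + UARCH_LAYER]
    if unknown:
        raise ValueError(f"unknown indicators: {unknown}")
    return {"indicators": indicators, "sampling_interval_s": sampling_interval_s}

config = make_collection_config(["cpu_utilization", "ipc"], sampling_interval_s=5)
config_json = json.dumps(config)    # would be written out as the config file
```

The collector then reads this file and samples only the named indicators at the chosen interval, which is what makes the indicator selection "free" rather than fixed.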
  • Referring to FIG. 7, a schematic structural diagram of the performance data evaluation of the simulation apparatus for heterogeneous cluster scheduling according to the present application is shown. For the convenience of description, only the parts related to the present application are shown.
  • the apparatus further includes: a performance index collection module 701 and a performance evaluation module 702 .
  • a performance indicator collection module 701, configured to collect scheduling performance indicators for the running deep learning load based on the indicator collection configuration file to obtain scheduling performance data.
  • the performance evaluation module 702 is configured to evaluate the scheduling performance data based on a preset performance index to obtain a scheduling performance evaluation result.
  • the preset performance indicators are set according to actual application requirements and can be used to select key indicators for evaluating scheduling performance, such as the time cost of completing each scheduling, the average job turnaround time, and changes in the overall resource utilization of the cluster; no specific restriction is imposed here.
  • the scheduling performance evaluation result is an evaluation index that can be used to intuitively reflect the scheduling performance when the deep learning load benchmark is simulated and run.
  • the scheduling performance indicators are collected for the deep learning load in the simulation operation, and scheduling performance data that intuitively reflects the scheduling performance of the deep learning load benchmark's simulated run is obtained. Furthermore, the key indicators that meet the preset performance indicator requirements, such as the time cost of completing each scheduling, the average job turnaround time, and changes in the overall resource utilization of the cluster, are screened out from the scheduling performance data, so as to obtain a scheduling performance evaluation result that intuitively reflects the scheduling performance of the deep learning load benchmark's simulated run, and to a certain extent effectively reduce the evaluation time and cost of heterogeneous clusters.
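Two of the evaluation indicators named above have simple closed forms: per-decision scheduling cost (end minus start of each scheduling decision) and average job turnaround time (finish minus submit, averaged over jobs). A minimal sketch, with illustrative job and decision records:

```python
# Sketch of two scheduling evaluation indicators from the text.
def average_turnaround(jobs):
    """Average of (finish - submit) over all job records."""
    return sum(j["finish"] - j["submit"] for j in jobs) / len(jobs)

def scheduling_costs(decisions):
    """Time cost of each scheduling decision."""
    return [d["end"] - d["start"] for d in decisions]

jobs = [{"submit": 0.0, "finish": 30.0}, {"submit": 10.0, "finish": 50.0}]
decisions = [{"start": 0.0, "end": 0.2}, {"start": 10.0, "end": 10.1}]
avg_tat = average_turnaround(jobs)        # (30 + 40) / 2 = 35.0
costs = scheduling_costs(decisions)
```

Cluster-wide resource utilization change, the third indicator mentioned, would be computed from the sampled utilization time series rather than from per-job records.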
  • the present application provides a simulation device for heterogeneous cluster scheduling, including: a request receiving module for receiving a simulation running request sent by a user terminal; an information reading module for responding to the simulation running request and reading the local database to obtain historical heterogeneous resource configuration information; an instruction setting module for setting, based on the historical heterogeneous resource configuration information, the operation mode of executing instructions and the instruction scheduling strategy for the pre-trained deep learning load; a load operation module for making the deep learning load run according to the operation mode and the instruction scheduling strategy based on the execution instructions; and a simulation operation module for performing cluster node simulation expansion on the running deep learning load based on the kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load, and obtaining the running behavior characteristics and running status data of the deep learning load.
  • the basic deep learning workload benchmark is trained based on the training data set to obtain the trained deep learning workload benchmark, and the operation mode of executing instructions and the instruction scheduling strategy are set for the benchmark based on the pre-collected historical heterogeneous resource configuration information. The deep learning load benchmark is then made to run according to the operation mode and the instruction scheduling strategy based on the execution instructions, so as to accurately obtain the running behavior characteristics of the deep learning load; cluster node simulation expansion is performed on the running deep learning load based on the kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load to obtain the running status data. Then, based on the performance indicators of the application layer and the micro-architecture layer of the deep learning load benchmark, the indicator collection configuration file is set, scheduling performance indicators are collected for the running deep learning load, and the collected scheduling performance data is evaluated to quickly obtain a scheduling performance evaluation result. This not only simulates the large-scale cluster deployment mode of kubernetes, but also simulates the application scenarios of large-scale heterogeneous clusters through a small number of nodes, thus providing a low-cost experimental environment for scheduling R&D personnel and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
  • FIG. 8 is a block diagram of a basic structure of a computer device according to this embodiment.
  • the computer device 8 includes a memory 81, a processor 82, and a network interface 83 that communicate with each other through a system bus. It should be noted that only the computer device 8 with components 81-83 is shown in the figure, but it should be understood that implementation of all shown components is not required; more or fewer components may be implemented instead.
  • the computer device here is a device that can automatically perform numerical calculation and/or information processing according to preset or stored instructions, and its hardware includes but is not limited to microprocessors, application-specific integrated circuits (ASIC), field-programmable gate arrays (FPGA), digital signal processors (DSP), embedded devices, etc.
  • the computer equipment may be a desktop computer, a notebook computer, a palmtop computer, a cloud server and other computing equipment.
  • the computer device can perform human-computer interaction with the user through a keyboard, a mouse, a remote control, a touch pad or a voice control device.
  • the memory 81 includes at least one type of readable storage medium, and the readable storage medium includes flash memory, hard disk, multimedia card, card-type memory (e.g., SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disk, optical disk, etc.
  • the memory 81 may be an internal storage unit of the computer device 8 , such as a hard disk or a memory of the computer device 8 .
  • the memory 81 may also be an external storage device of the computer device 8, such as a plug-in hard disk equipped on the computer device 8, a smart media card (SMC), a secure digital (SD) card, a flash card, etc.
  • the memory 81 may also include both the internal storage unit of the computer device 8 and its external storage device.
  • the memory 81 is generally used to store the operating system and various application software installed on the computer device 8 , such as program codes of a method for simulating heterogeneous cluster scheduling.
  • the memory 81 can also be used to temporarily store various types of data that have been output or will be output.
  • the processor 82 may, in some embodiments, be a central processing unit (CPU), controller, microcontroller, microprocessor, or other data processing chip.
  • the processor 82 is typically used to control the overall operation of the computer device 8 .
  • the processor 82 is configured to run the program code stored in the memory 81 or process data, for example, the program code for executing the simulation method of the heterogeneous cluster scheduling.
  • the network interface 83 may include a wireless network interface or a wired network interface, and the network interface 83 is generally used to establish a communication connection between the computer device 8 and other electronic devices.
  • the present application also provides another embodiment, which is to provide a computer-readable storage medium, where the computer-readable storage medium stores a simulation program for heterogeneous cluster scheduling, and the simulation program can be executed by at least one processor to cause the at least one processor to execute the steps of the above-described simulation method for heterogeneous cluster scheduling.
  • the method of the above embodiments can be implemented by means of software plus a necessary general-purpose hardware platform, and of course can also be implemented by hardware, but in many cases the former is the better implementation.
  • the technical solution of the present application, in essence or in the part that contributes to the prior art, can be embodied in the form of a software product. The computer software product is stored in a storage medium (such as ROM/RAM, magnetic disk, or CD-ROM) and includes several instructions to make a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, or a network device, etc.) execute the methods described in the various embodiments of this application.


Abstract

Embodiments of the present application relate to the technical field of information, and relate to a simulation method for heterogeneous cluster scheduling, comprising: receiving a simulation operation request sent by a user terminal; in response to the simulation operation request, reading a local database to obtain historical heterogeneous resource configuration information; setting the operation mode of an execution instruction and an instruction scheduling policy for a pre-trained deep learning load on the basis of the historical heterogeneous resource configuration information; enabling, on the basis of the execution instruction, the deep learning load to operate according to the operation mode and the instruction scheduling policy; performing, on the basis of a kubernetes virtualization technology, cluster node simulation extension on the deep learning load in operation, so as to simulate large-scale heterogeneous cluster operation of the deep learning load to obtain operation behavior characteristics and operation state data of the deep learning load. The present application also provides a simulation apparatus for heterogeneous cluster scheduling, a computer device, and a storage medium. The present application can provide a low-cost experimental environment, and effectively reduce the evaluation time of a heterogeneous cluster to a certain extent.

Description

Simulation method, apparatus, computer device and storage medium for heterogeneous cluster scheduling

Technical Field

The present application relates to the field of information technology, and in particular to a simulation method, apparatus, computer device and storage medium for heterogeneous cluster scheduling.

Background Art
The rapid development of artificial intelligence has led to the emergence of more and more deep learning applications. Because such applications require massive training data and high-performance computing power at runtime, they have driven the production and application of high-performance chips such as GPUs, FPGAs, and TPUs. In order to meet the computing power and storage capacity requirements of upper-layer deep learning applications, heterogeneous clusters of CPUs, GPUs, and the like have gradually become the main cluster configuration solution.

In recent years, with the rise of deep learning, high-performance computing has gradually entered the field of cloud clusters. Because deep learning applications have a strong demand for computing power, general-purpose computing platforms cannot meet their requirements, whereas high-performance computing provides a continuous and stable computing guarantee for the operation of such applications. However, high-performance chips such as GPUs are expensive to purchase, the underlying architectures are complex and diverse, and cluster scale continues to expand, which brings huge challenges to upper-layer deep learning application scheduling. There are many problems in designing scheduling algorithms for deep learning applications on heterogeneous clusters: on the one hand, the large number of underlying hardware resources such as GPUs makes acquisition expensive; on the other hand, scheduling decisions are affected by many factors such as fairness, capacity guarantees, and resource reliability, and it is difficult to find large-scale clusters on which to evaluate the performance of scheduling algorithms. Therefore, how to design a lightweight heterogeneous cluster scheduling simulator that is easy to configure and use according to actual needs is the main problem of current heterogeneous scheduling research.

At present, "High-Performance Optimization of MapReduce FairScheduler and Design and Implementation of a Super-Large-Scale Cluster Simulator" by Pan Xuming et al., working with real online production clusters, designed and implemented a simulator for super-large-scale Hadoop clusters and carried out verification tests of its functions and performance. Its main functions are (1) simulating a super-large-scale cluster with 1 to 2 servers, and (2) simulating clients submitting jobs concurrently and providing comprehensive benchmark tests. A simulated cluster of 2000 nodes was built on the simulator, and comprehensive comparative tests were performed on FIFO, FairScheduler, and a new fair scheduler. In addition, "Research and Design of a MapReduce Cluster Simulator for Performance Tuning" by Liu Zhijun et al. designed a MapReduce simulator that can simulate a large-scale cluster with a small number of nodes and accurately simulate the running process of jobs, thus providing a complete Hadoop cluster performance test platform to help solve the testing problems of large-scale clusters. Apache also provides the Yarn Scheduler Load Simulator (SLS), a tool that can load applications on a single machine to simulate a large-scale YARN cluster. The simulator uses the actual YARN ResourceManager and, within the same Java virtual machine, removes network factors by processing and scheduling NM/AM heartbeat events to simulate the NodeManager and ApplicationMaster.

However, on the one hand, these technologies use simulators targeting CPU processors, such as gem5, which supports multiple ISAs and CPU models and is highly configurable; simulators targeting GPUs, such as GPGPU-Sim, which supports different GPU architectures; or the mixed CPU-GPU simulator gem5-gpu, which integrates gem5 and GPGPU-Sim. On the other hand, existing cluster simulators are mainly Hadoop and Yarn scheduling simulators designed for big data workloads. These simulators lag behind in terms of both the underlying hardware architecture and the upper-layer scheduling system, making it difficult to meet the urgent needs of scheduling optimization researchers for heterogeneous cluster scheduling; in addition, preparing the experimental environment is highly complex and purchasing hardware resources is expensive.
Technical Problem

The purpose of the embodiments of the present application is to propose a simulation method, apparatus, computer device and storage medium for heterogeneous cluster scheduling, so as to at least solve the problems of the high complexity of experimental environment preparation and the high cost of hardware resource purchase in traditional heterogeneous cluster scheduling methods.

Technical Solution

In order to solve the above technical problems, an embodiment of the present application provides a simulation method for heterogeneous cluster scheduling, which adopts the following technical solution:
receiving a simulation running request sent by a user terminal;

in response to the simulation running request, reading a local database to obtain historical heterogeneous resource configuration information;

setting, based on the historical heterogeneous resource configuration information, an operation mode of executing instructions and an instruction scheduling policy for a pre-trained deep learning load;

making the deep learning load run according to the operation mode and the instruction scheduling policy based on the execution instructions;

performing cluster node simulation expansion on the running deep learning load based on kubernetes virtualization technology, and simulating large-scale heterogeneous cluster operation of the deep learning load to obtain the running behavior characteristics and running status data of the deep learning load.
Further, the method also includes:

reading the local database to obtain a training data set; and

training a basic deep learning load benchmark based on the training data set to obtain a trained deep learning load benchmark.

Making the deep learning load run according to the operation mode and the instruction scheduling policy based on the execution instructions specifically includes:

making the deep learning load benchmark run according to the operation mode and the instruction scheduling policy based on the execution instructions.

Further, the method also includes:

performing script editing on the performance indicators of the application layer and the micro-architecture layer of the deep learning load benchmark to obtain an indicator collection configuration file corresponding to the deep learning load benchmark.

Further, the method also includes:

collecting scheduling performance indicators for the running deep learning load based on the indicator collection configuration file to obtain scheduling performance data; and

evaluating the scheduling performance data based on preset performance indicators to obtain a scheduling performance evaluation result.
In order to solve the above technical problems, an embodiment of the present application also provides a simulation apparatus for heterogeneous cluster scheduling, which adopts the following technical solution:

a request receiving module, configured to receive a simulation running request sent by a user terminal;

an information reading module, configured to respond to the simulation running request and read a local database to obtain historical heterogeneous resource configuration information;

an instruction setting module, configured to set, based on the historical heterogeneous resource configuration information, an operation mode of executing instructions and an instruction scheduling policy for a pre-trained deep learning load;

a load running module, configured to make the deep learning load run according to the operation mode and the instruction scheduling policy based on the execution instructions; and

a simulation running module, configured to perform cluster node simulation expansion on the running deep learning load based on kubernetes virtualization technology, simulate large-scale heterogeneous cluster operation of the deep learning load, and obtain the running behavior characteristics and running status data of the deep learning load.

Further, the apparatus also includes:

a data set acquisition module, configured to read the local database and obtain a training data set; and

a load training module, configured to train a basic deep learning load benchmark based on the training data set to obtain a trained deep learning load benchmark.

The load running module includes:

a load running unit, configured to make the deep learning load benchmark run according to the operation mode and the instruction scheduling policy based on the execution instructions.

Further, the apparatus also includes:

a module for performing script editing on the performance indicators of the application layer and the micro-architecture layer of the deep learning load benchmark to obtain an indicator collection configuration file corresponding to the deep learning load benchmark.

Further, the apparatus also includes:

a performance indicator collection module, configured to collect scheduling performance indicators for the running deep learning load based on the indicator collection configuration file to obtain scheduling performance data; and

a performance evaluation module, configured to evaluate the scheduling performance data based on preset performance indicators to obtain a scheduling performance evaluation result.
In order to solve the above technical problems, an embodiment of the present application also provides a computer device, which adopts the following technical solution:

the computer device includes a memory and a processor, where a computer program is stored in the memory, and when the processor executes the computer program, the steps of the above simulation method for heterogeneous cluster scheduling are implemented.

In order to solve the above technical problems, an embodiment of the present application also provides a computer-readable storage medium, which adopts the following technical solution:

a computer program is stored on the computer-readable storage medium, and when the computer program is executed by a processor, the steps of the above simulation method for heterogeneous cluster scheduling are implemented.

Beneficial Effects

The present application provides a simulation method for heterogeneous cluster scheduling, including: receiving a simulation running request sent by a user terminal; in response to the simulation running request, reading a local database to obtain historical heterogeneous resource configuration information; setting, based on the historical heterogeneous resource configuration information, an operation mode of executing instructions and an instruction scheduling policy for a pre-trained deep learning load; making the deep learning load run according to the operation mode and the instruction scheduling policy based on the execution instructions; and performing cluster node simulation expansion on the running deep learning load based on kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load, and obtaining the running behavior characteristics and running status data of the deep learning load. Setting the operation mode and the instruction scheduling policy for the pre-trained deep learning load based on the pre-collected historical heterogeneous resource configuration information, and making the load run accordingly, allows the running behavior characteristics of the deep learning load to be obtained accurately; performing cluster node simulation expansion on the running load based on kubernetes virtualization technology then simulates large-scale heterogeneous cluster operation to obtain the running status data. This not only simulates the large-scale cluster deployment mode of kubernetes, but also simulates the application scenarios of large-scale heterogeneous clusters through a small number of nodes, thus providing a low-cost experimental environment for scheduling R&D personnel and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
Description of the Drawings
To illustrate the solutions in the present application more clearly, the accompanying drawings used in the description of the embodiments of the present application are briefly introduced below. Obviously, the drawings described below are merely some embodiments of the present application; for those of ordinary skill in the art, other drawings can be obtained from these drawings without creative effort.
FIG. 1 is a schematic diagram of an exemplary principle to which the present application can be applied;
FIG. 2 is a flowchart of an embodiment of the simulation method for heterogeneous cluster scheduling according to the present application;
FIG. 3 is a flowchart of deep learning workload training in the simulation method for heterogeneous cluster scheduling according to the present application;
FIG. 4 is a flowchart of performance data evaluation in the simulation method for heterogeneous cluster scheduling according to the present application;
FIG. 5 is a schematic structural diagram of an embodiment of the simulation apparatus for heterogeneous cluster scheduling according to the present application;
FIG. 6 is a schematic structural diagram of deep learning workload training in the simulation apparatus for heterogeneous cluster scheduling according to the present application;
FIG. 7 is a schematic structural diagram of performance data evaluation in the simulation apparatus for heterogeneous cluster scheduling according to the present application;
FIG. 8 is a schematic structural diagram of an embodiment of a computer device according to the present application.
Embodiments of the Present Invention
Unless otherwise defined, all technical and scientific terms used herein have the same meaning as commonly understood by those skilled in the technical field of the present application. The terms used in the specification of the application are only for the purpose of describing specific embodiments and are not intended to limit the present application. The terms "comprising" and "having", and any variations thereof, in the specification and claims of the present application and in the above description of the drawings are intended to cover a non-exclusive inclusion. The terms "first", "second" and the like in the specification and claims of the present application or in the above drawings are used to distinguish different objects, rather than to describe a specific order.
Reference herein to an "embodiment" means that a particular feature, structure, or characteristic described in connection with the embodiment can be included in at least one embodiment of the present application. The appearance of this phrase in various places in the specification does not necessarily refer to the same embodiment, nor to a separate or alternative embodiment mutually exclusive of other embodiments. Those skilled in the art understand, explicitly and implicitly, that the embodiments described herein may be combined with other embodiments.
To enable those skilled in the art to better understand the solutions of the present application, the technical solutions in the embodiments of the present application are described clearly and completely below with reference to the accompanying drawings.
Embodiment 1
Referring to FIGS. 1-2, a flowchart of an embodiment of the simulation method for heterogeneous cluster scheduling according to the present application is shown. For ease of description, only the parts related to the present application are shown.
In step S1, a simulation run request sent by a user terminal is received.
In this embodiment, the simulation run request is an operation request issued by scheduling researchers who require environmental support in order to obtain the runtime characteristics of deep learning workloads under different high-performance heterogeneous hardware configurations.
In step S2, in response to the simulation run request, a local database is read to obtain historical heterogeneous resource configuration information.
In this embodiment, the historical heterogeneous resource configuration information is architecture data of existing CPUs and high-performance chips such as GPUs, collected before the simulation runs so that the run modes and API call relationships of the deep learning workload's applications on chips of different architectures can be set in a targeted manner. Specifically, this architecture data may include CPU models (AtomicSimple, TimingSimple, InOrder, O3, etc.), GPU architectures (Tesla, Fermi, Kepler, Volta, Turing, etc.), as well as FPGA and TPU data.
In step S3, based on the historical heterogeneous resource configuration information, the run mode of execution instructions and the instruction scheduling policy are set for the pre-trained deep learning workload.
In this embodiment, the run mode refers to the computation mode, such as serial or parallel execution, used when an application implementing a deep learning workload under different architectures executes instructions.
In this embodiment, the instruction scheduling policy refers to the scheduling policy used when an application implementing a deep learning workload under different architectures executes instructions, together with the parameter configuration files of the different hardware combinations capable of executing that policy.
In this embodiment, setting the run mode of execution instructions and the instruction scheduling policy for the pre-trained deep learning workload based on the historical heterogeneous resource configuration information may specifically involve analyzing the existing hardware types and architectures according to that information, so as to set the computation modes (such as serial and parallel execution) used when applications of the deep learning workload execute instructions under different architectures, the scheduling policies used when such applications execute instructions, and the parameter configuration files of the different hardware combinations capable of executing those policies. This enables the subsequent simulation of the application running state of the deep learning workload under different architectures and further analysis of the runtime characteristics of the deep learning workload under different hardware configurations, thereby realizing autonomous configuration of different hardware architecture combinations, simulating the underlying resource configuration policy of heterogeneous clusters, simplifying the configuration of heterogeneous environments, and saving the cost of purchasing physical hardware.
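The hardware-combination parameter configuration described above can be illustrated with a minimal sketch. The enumeration strategy and the field names (`cpu_model`, `gpu_arch`, `run_mode`) are assumptions for illustration rather than the disclosed implementation, although the CPU models and GPU architectures listed are those named earlier in this description.

```python
# Hypothetical sketch: enumerating hardware-combination parameter profiles
# from historical heterogeneous resource configuration data.
from itertools import product

CPU_MODELS = ["AtomicSimple", "TimingSimple", "InOrder", "O3"]
GPU_ARCHS = ["Tesla", "Fermi", "Kepler", "Volta", "Turing"]
RUN_MODES = ["serial", "parallel"]

def build_profiles():
    """Enumerate candidate CPU/GPU/run-mode combinations as parameter profiles."""
    return [
        {"cpu_model": cpu, "gpu_arch": gpu, "run_mode": mode}
        for cpu, gpu, mode in product(CPU_MODELS, GPU_ARCHS, RUN_MODES)
    ]

profiles = build_profiles()
print(len(profiles))  # 4 * 5 * 2 = 40 combinations
```

Each profile could then be handed to the simulator as one candidate underlying resource configuration of the heterogeneous cluster.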
In step S4, based on the execution instructions, the deep learning workload is caused to run according to the run mode and the instruction scheduling policy.
In this embodiment, causing the deep learning workload to run according to the run mode and the instruction scheduling policy based on the execution instructions may specifically involve having the applications of the deep learning workload execute computation modes such as serial and parallel execution under different architectures, and having those applications execute the scheduling policy under different architectures based on the parameter configuration files of different hardware combinations. This makes it possible to simulate the running state of the applications of the deep learning workload under different architectures and then analyze the runtime characteristics of the deep learning workload under different hardware configurations, thereby realizing autonomous configuration of different hardware architecture combinations, simulating the underlying resource configuration policy of heterogeneous clusters, simplifying the configuration of heterogeneous environments, and saving the cost of purchasing physical hardware.
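As a hedged illustration of the serial and parallel run modes just described, the following sketch dispatches a workload's operations according to a configured run mode; the function and names are hypothetical, not part of the disclosed system.

```python
# Illustrative only: execute a list of no-argument operations either serially
# or in parallel, according to the run mode set for the workload.
from concurrent.futures import ThreadPoolExecutor

def run_workload(instructions, run_mode):
    """Run each callable in `instructions` per the configured run mode."""
    if run_mode == "serial":
        return [op() for op in instructions]
    if run_mode == "parallel":
        with ThreadPoolExecutor() as pool:
            return list(pool.map(lambda op: op(), instructions))
    raise ValueError(f"unknown run mode: {run_mode}")

ops = [lambda i=i: i * i for i in range(4)]
print(run_workload(ops, "serial"))    # [0, 1, 4, 9]
print(run_workload(ops, "parallel"))  # [0, 1, 4, 9]
```

`ThreadPoolExecutor.map` preserves input order, so both modes return the same result list; only the execution strategy differs.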
In step S5, based on Kubernetes virtualization technology, cluster-node simulation expansion is performed on the running deep learning workload, the large-scale heterogeneous cluster operation of the deep learning workload is simulated, and the running behavior characteristics and running state data of the deep learning workload are obtained.
In this embodiment, Kubernetes (K8s) has become a container orchestration tool widely used in industry and academia due to its portability, scalability, and self-healing.
In this embodiment, performing cluster-node simulation expansion on the running deep learning workload based on Kubernetes virtualization technology may specifically involve deploying the cluster with K8s and, using the K8s container orchestration tool, simulating physical machine characteristics so that a small number of nodes is expanded into a cluster of large-scale nodes that simulates large-scale heterogeneous cluster operation of the deep learning workload. Running state data that intuitively reflects the cluster's operating state can thus be obtained accurately, which not only realizes simulated expansion of cluster nodes but also provides good environmental support for research on cluster scheduling optimization.
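The node-expansion idea (a few physical nodes standing in for a large cluster, in the spirit of hollow-node tooling such as Kubemark) might be sketched as follows. The record fields here are illustrative assumptions, not the Kubernetes node API.

```python
# Hedged sketch: derive many simulated node records from a few physical nodes,
# so the scheduler under test sees a large-scale cluster.
def expand_nodes(physical_nodes, factor):
    """Generate `factor` virtual node records per physical node."""
    virtual = []
    for phys in physical_nodes:
        for i in range(factor):
            virtual.append({
                "name": f"{phys['name']}-sim-{i}",  # unique simulated node name
                "host": phys["name"],               # physical node backing it
                "cpu": phys["cpu"],
                "memory": phys["memory"],
                "simulated": True,
            })
    return virtual

cluster = expand_nodes([{"name": "node-a", "cpu": 16, "memory": "64Gi"}], 100)
print(len(cluster))  # 100 simulated nodes from one physical node
```

In a real deployment the equivalent step would register such virtual nodes with the cluster's API server so that scheduling decisions are exercised at scale without the hardware cost.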
The present application provides a simulation method for heterogeneous cluster scheduling, comprising: receiving a simulation run request sent by a user terminal; in response to the simulation run request, reading a local database to obtain historical heterogeneous resource configuration information; setting, based on the historical heterogeneous resource configuration information, the run mode of execution instructions and the instruction scheduling policy for a pre-trained deep learning workload; causing the deep learning workload, based on the execution instructions, to run according to the run mode and the instruction scheduling policy; and performing cluster-node simulation expansion on the running deep learning workload based on Kubernetes virtualization technology, so as to simulate large-scale heterogeneous cluster operation of the deep learning workload and obtain the running behavior characteristics and running state data of the deep learning workload. The run mode of execution instructions and the instruction scheduling policy are set for the pre-trained deep learning workload based on the pre-collected historical heterogeneous resource configuration information, and the deep learning workload is caused, based on the execution instructions, to run according to the run mode and the instruction scheduling policy, so that its running behavior characteristics are obtained accurately. Furthermore, cluster-node simulation expansion is performed on the running deep learning workload based on Kubernetes virtualization technology, so that large-scale heterogeneous cluster operation of the deep learning workload is simulated to obtain the running state data. This not only simulates the Kubernetes large-scale cluster deployment mode, but also simulates the application scenarios of large-scale heterogeneous clusters with only a small number of nodes, thereby providing scheduling researchers with a low-cost experimental environment and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
Continuing to refer to FIG. 3, a flowchart of deep learning workload training in the simulation method for heterogeneous cluster scheduling according to the present application is shown. For ease of description, only the parts related to the present application are shown.
In some optional implementations of Embodiment 1, before step S4, the method further includes step S301 and step S302, and step S4 includes step S303.
In step S301, a local database is read to obtain a training data set.
In step S302, a basic deep learning workload benchmark is trained based on the training data set to obtain a trained deep learning workload benchmark.
In step S303, based on the execution instructions, the deep learning workload benchmark is caused to run according to the run mode and the instruction scheduling policy.
In this embodiment, the basic deep learning workload benchmarks are representative benchmarks selected from several existing types of deep learning workloads, mainly used to test data such as the workload's execution time, transfer speed, throughput, and resource occupancy.
In this embodiment, the training data sets are training data sets of different scales corresponding to each benchmark.
In this embodiment, the selected basic deep learning workload benchmark is trained on the training data sets of different scales collected for each benchmark, so as to obtain a deep learning workload benchmark that can be used to test applications of the simulated deep learning workload running under different architectures. Subsequently, when the benchmark's applications execute computation modes such as serial and parallel execution under different architectures, and execute the scheduling policy based on the parameter configuration files of different hardware combinations, the running state of the benchmark's applications can be simulated under different architectures and the runtime characteristics of the benchmark under different hardware configurations can be analyzed, thereby realizing autonomous configuration of different hardware architecture combinations, simulating the underlying resource configuration policy of heterogeneous clusters, simplifying the configuration of heterogeneous environments, and saving the cost of purchasing physical hardware.
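Running a benchmark over training sets of several scales (steps S301-S302) can be sketched minimally as below; `step_fn` and the toy "training" loop are placeholders standing in for the actual benchmarks, not the disclosed implementation.

```python
# Illustrative only: run one training pass of a benchmark over synthetic
# datasets of several scales and record the final accumulated value per scale.
def train_benchmark(dataset_sizes, step_fn):
    """Apply `step_fn` across a dataset of each scale; return per-scale results."""
    results = {}
    for size in dataset_sizes:
        value = 0.0
        for sample in range(size):  # one pass over a dataset of this scale
            value = step_fn(value, sample)
        results[size] = value
    return results

# Toy "training step": accumulate the sum of the sample indices.
out = train_benchmark([10, 100], lambda v, s: v + s)
print(out)  # {10: 45.0, 100: 4950.0}
```

In practice the per-scale results would be timing, throughput, and resource-occupancy measurements rather than a scalar, matching the test data the benchmarks are said to produce.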
In some optional implementations of Embodiment 1, after step S302, the method further includes:
performing script editing on the application-layer and microarchitecture-layer performance indicators of the deep learning workload benchmark to obtain an indicator-collection configuration file corresponding to the deep learning workload benchmark.
In this embodiment, script editing refers to setting up a configuration file in which performance indicators and the sampling interval can be freely selected.
In this embodiment, application-layer performance indicators of the deep learning workload benchmark include CPU utilization, memory utilization, disk I/O size, network bandwidth, and the like.
In this embodiment, microarchitecture-layer performance indicators of the deep learning workload benchmark include data such as IPC (instructions per cycle), branch prediction, and cache misses.
In this embodiment, in order to effectively evaluate the scheduling performance of the scheduling algorithms used in simulated runs of the deep learning workload benchmark under different architectures, script settings are made for the application-layer performance indicators of the benchmark (such as CPU utilization, memory utilization, disk I/O size, and network bandwidth) and for its microarchitecture-layer performance indicators (such as IPC (instructions per cycle), branch prediction, and cache misses), yielding an indicator-collection configuration file with functions for freely selecting performance indicators and the sampling interval. Based on this configuration file, scheduling performance indicators can subsequently be collected from the running deep learning workload, and the collected data can be evaluated to obtain a performance evaluation result, thereby effectively evaluating the performance of the scheduling algorithm and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
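An indicator-collection configuration file with freely selectable indicators and sampling interval might look like the following sketch; the field names are assumptions chosen for illustration, not the patented file format.

```python
# Sketch of building an indicator-collection configuration: the researcher
# freely selects application-layer and microarchitecture-layer indicators
# plus a sampling interval.
import json

def make_collection_config(app_metrics, uarch_metrics, interval_s):
    """Assemble an indicator-collection profile as a JSON-serialisable dict."""
    return {
        "application_layer": app_metrics,          # e.g. CPU/memory utilization
        "microarchitecture_layer": uarch_metrics,  # e.g. IPC, cache misses
        "sampling_interval_seconds": interval_s,
    }

cfg = make_collection_config(
    ["cpu_utilization", "memory_utilization", "disk_io", "network_bandwidth"],
    ["ipc", "branch_predict", "cache_misses"],
    5,
)
print(json.dumps(cfg, indent=2))
```

The resulting file would then drive periodic sampling of the running workload at the chosen interval.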
Continuing to refer to FIG. 4, a flowchart of performance data evaluation in the simulation method for heterogeneous cluster scheduling according to the present application is shown. For ease of description, only the parts related to the present application are shown.
In some optional implementations of Embodiment 1, after step S5, the method further includes step S401 and step S402.
In step S401, scheduling performance indicators are collected from the running deep learning workload based on the indicator-collection configuration file to obtain scheduling performance data.
In step S402, the scheduling performance data is evaluated based on preset performance indicators to obtain a scheduling performance evaluation result.
In this embodiment, the preset performance indicators are key indicators, set according to actual application requirements, that can be used to evaluate scheduling performance, such as the time cost of completing each scheduling, the average job turnaround time, and changes in overall cluster resource utilization; no specific limitation is imposed here.
In this embodiment, the scheduling performance evaluation result is an evaluation indicator that can intuitively reflect the scheduling performance of the deep learning workload benchmark during simulated operation.
In this embodiment, scheduling performance indicators are collected from the deep learning workload during the simulated run according to the sampling interval in the indicator-collection configuration file, yielding scheduling performance data that can intuitively reflect the scheduling performance of the deep learning workload benchmark during simulation. Key indicators meeting the preset performance requirements, such as the time cost of completing each scheduling, the average job turnaround time, and changes in overall cluster resource utilization, are then filtered out of the scheduling performance data, so as to obtain a scheduling performance evaluation result that intuitively reflects the scheduling performance of the benchmark during simulated operation, while effectively reducing, to a certain extent, the evaluation time and cost of heterogeneous clusters.
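Two of the key indicators named above, the per-scheduling time cost and the average job turnaround time, can be computed from collected records as in this sketch; the record format (submit/start/finish timestamps) is an assumption for illustration.

```python
# Illustrative evaluation: derive mean scheduling cost and mean job
# turnaround time from per-job timestamp records (all in seconds).
def evaluate(records):
    """records: dicts with 'submit', 'start', and 'finish' timestamps."""
    sched_costs = [r["start"] - r["submit"] for r in records]   # queue-to-start
    turnaround = [r["finish"] - r["submit"] for r in records]   # submit-to-done
    return {
        "mean_scheduling_cost": sum(sched_costs) / len(sched_costs),
        "mean_turnaround_time": sum(turnaround) / len(turnaround),
    }

jobs = [
    {"submit": 0.0, "start": 1.0, "finish": 11.0},
    {"submit": 2.0, "start": 4.0, "finish": 10.0},
]
print(evaluate(jobs))  # {'mean_scheduling_cost': 1.5, 'mean_turnaround_time': 9.5}
```

A change in overall cluster resource utilization, the third indicator mentioned, would similarly be computed from the sampled utilization series rather than per-job timestamps.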
To sum up, the present application provides a simulation method for heterogeneous cluster scheduling, comprising: receiving a simulation run request sent by a user terminal; in response to the simulation run request, reading a local database to obtain historical heterogeneous resource configuration information; setting, based on the historical heterogeneous resource configuration information, the run mode of execution instructions and the instruction scheduling policy for a pre-trained deep learning workload; causing the deep learning workload, based on the execution instructions, to run according to the run mode and the instruction scheduling policy; and performing cluster-node simulation expansion on the running deep learning workload based on Kubernetes virtualization technology, so as to simulate large-scale heterogeneous cluster operation of the deep learning workload and obtain its running behavior characteristics and running state data. The basic deep learning workload benchmark is trained based on the training data set to obtain a trained deep learning workload benchmark, and the run mode of execution instructions and the instruction scheduling policy are set for the deep learning workload benchmark based on the pre-collected historical heterogeneous resource configuration information. The deep learning workload benchmark is then caused, based on the execution instructions, to run according to the run mode and the instruction scheduling policy, so that the running behavior characteristics of the deep learning workload are obtained accurately. Cluster-node simulation expansion is then performed on the running deep learning workload based on Kubernetes virtualization technology, so that large-scale heterogeneous cluster operation of the deep learning workload is simulated to obtain the running state data. Then, based on the indicator-collection configuration file set up from the application-layer and microarchitecture-layer performance indicators of the deep learning workload benchmark, scheduling performance indicators are collected from the running deep learning workload, and the collected scheduling performance data is evaluated to quickly obtain a scheduling performance evaluation result. This not only simulates the Kubernetes large-scale cluster deployment mode, but also simulates the application scenarios of large-scale heterogeneous clusters with only a small number of nodes, thereby providing scheduling researchers with a low-cost experimental environment and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
Those of ordinary skill in the art can understand that all or part of the processes in the methods of the above embodiments can be implemented by instructing relevant hardware through a computer program. The computer program can be stored in a computer-readable storage medium and, when executed, may include the processes of the embodiments of the above methods. The aforementioned storage medium may be a non-volatile storage medium such as a magnetic disk, an optical disc, or a read-only memory (ROM), or a random access memory (RAM).
It should be understood that although the steps in the flowcharts of the accompanying drawings are displayed in sequence as indicated by the arrows, these steps are not necessarily executed in that order. Unless explicitly stated herein, the execution of these steps is not strictly limited in order, and they may be executed in other orders. Moreover, at least some of the steps in the flowcharts may include multiple sub-steps or stages, which are not necessarily completed at the same time but may be executed at different times; their execution order is not necessarily sequential, and they may be executed in turn or alternately with other steps or with at least part of the sub-steps or stages of other steps.
Embodiment 2
With further reference to FIG. 5, as an implementation of the method shown in FIG. 2, the present application provides an embodiment of a simulation apparatus for heterogeneous cluster scheduling. This apparatus embodiment corresponds to the method embodiment shown in FIG. 2, and the apparatus can be applied to various electronic devices.
As shown in FIG. 5, the simulation apparatus 100 for heterogeneous cluster scheduling provided in Embodiment 2 of the present application includes: a request receiving module 101, an information reading module 102, an instruction setting module 103, a workload running module 104, and a simulation running module 105. Wherein:
The request receiving module 101 is configured to receive a simulation run request sent by a user terminal.
In this embodiment, the simulation run request is an operation request issued by scheduling researchers who require environmental support in order to obtain the runtime characteristics of deep learning workloads under different high-performance heterogeneous hardware configurations.
The information reading module 102 is configured to, in response to the simulation run request, read a local database to obtain historical heterogeneous resource configuration information.
In this embodiment, the historical heterogeneous resource configuration information is architecture data of existing CPUs and high-performance chips such as GPUs, collected before the simulation runs so that the run modes and API call relationships of the deep learning workload's applications on chips of different architectures can be set in a targeted manner. Specifically, this architecture data may include CPU models (AtomicSimple, TimingSimple, InOrder, O3, etc.), GPU architectures (Tesla, Fermi, Kepler, Volta, Turing, etc.), as well as FPGA and TPU data.
The instruction setting module 103 is configured to set, based on the historical heterogeneous resource configuration information, the run mode of execution instructions and the instruction scheduling policy for a pre-trained deep learning workload.
In this embodiment, the run mode refers to the computation mode, such as serial or parallel execution, used when an application implementing a deep learning workload under different architectures executes instructions.
In this embodiment, the instruction scheduling policy refers to the scheduling policy used when an application implementing a deep learning workload under different architectures executes instructions, together with the parameter configuration files of the different hardware combinations capable of executing that policy.
In this embodiment, setting the run mode of execution instructions and the instruction scheduling policy for the pre-trained deep learning load based on the historical heterogeneous resource configuration information may specifically mean analyzing the existing hardware types and architectures according to that information, and then configuring the computation modes (such as serial and parallel execution) used when deep learning load applications execute instructions under different architectures, the scheduling policy applied during such execution, and the parameter configuration files of the hardware combinations capable of executing that policy. This allows the running state of deep learning load applications under different architectures to be simulated later, and the runtime characteristics of the load under different hardware configurations to be analyzed further, thereby providing autonomous configuration of different hardware architecture combinations, simulating the underlying resource configuration policy of a heterogeneous cluster, simplifying the configuration of heterogeneous environments, and saving the cost of purchasing physical hardware.
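The enumeration of hardware combinations and run modes described above can be sketched as follows. This is a minimal illustration, assuming one configuration entry per combination; the entry fields are assumptions for the example, since the application does not fix a concrete file format:

```python
from itertools import product

# Illustrative sketch: generate one parameter configuration per combination of
# CPU model, GPU architecture, and run mode under a given scheduling policy.
CPU_MODELS = ["AtomicSimple", "TimingSimple", "InOrder", "O3"]
GPU_ARCHS = ["Tesla", "Fermi", "Kepler", "Volta", "Turing"]
RUN_MODES = ["serial", "parallel"]

def build_param_configs(scheduling_policy: str) -> list:
    """Enumerate parameter configuration entries for every hardware/run-mode
    combination that could execute the given scheduling policy."""
    return [
        {
            "cpu_model": cpu,
            "gpu_arch": gpu,
            "run_mode": mode,
            "scheduling_policy": scheduling_policy,
        }
        for cpu, gpu, mode in product(CPU_MODELS, GPU_ARCHS, RUN_MODES)
    ]

configs = build_param_configs("fifo")
print(len(configs))  # 4 CPU models x 5 GPU architectures x 2 run modes = 40
```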
A load running module 104, configured to make the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions;
In this embodiment, making the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions may specifically mean having the application of the deep learning load execute computation modes such as serial and parallel execution under different architectures, and execute the scheduling policy under different architectures based on the parameter configuration files of different hardware combinations. This makes it possible to simulate the running state of deep learning load applications under different architectures and then analyze the runtime characteristics of the load under different hardware configurations, thereby providing autonomous configuration of different hardware architecture combinations, simulating the underlying resource configuration policy of a heterogeneous cluster, simplifying the configuration of heterogeneous environments, and saving the cost of purchasing physical hardware.
A simulation running module 105, configured to perform simulated cluster-node expansion of the running deep learning load based on Kubernetes virtualization technology, simulate large-scale heterogeneous cluster operation of the deep learning load, and obtain the running behavior characteristics and running state data of the deep learning load.
In this embodiment, Kubernetes (K8s) has become a container orchestration tool widely used in industry and academia because of its portability, scalability, and self-healing features.
In this embodiment, performing simulated cluster-node expansion of the running deep learning load based on Kubernetes virtualization technology may specifically mean deploying a cluster with K8s and, using the K8s container orchestration tool, simulating the characteristics of physical machines so that a small number of nodes emulates a cluster with a large number of nodes, thereby simulating large-scale heterogeneous cluster operation of the deep learning load. This makes it possible to accurately obtain running state data that directly reflects the cluster's running state; it not only achieves simulated expansion of cluster nodes but also provides good environmental support for research on cluster scheduling optimization.
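The "few physical nodes emulate many" idea can be sketched as below: one physical node template is replicated into many lightweight virtual node manifests, in the spirit of Kubernetes hollow-node/virtual-node techniques. The manifest fields are illustrative, not a complete Kubernetes Node object:

```python
# Hedged sketch: expand one physical node template into `count` simulated
# node manifests; field names are assumptions for illustration only.
def expand_virtual_nodes(template: dict, count: int) -> list:
    """Produce `count` virtual node manifests from one physical template."""
    nodes = []
    for i in range(count):
        node = dict(template)  # shallow copy of the physical template
        node["metadata"] = {
            "name": f"{template['name_prefix']}-{i:04d}",
            "labels": {"simulated": "true"},
        }
        nodes.append(node)
    return nodes

physical_template = {"name_prefix": "hollow-node",
                     "cpu": "8", "memory": "16Gi", "gpu_arch": "Volta"}
cluster = expand_virtual_nodes(physical_template, 1000)
print(len(cluster), cluster[0]["metadata"]["name"])
```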
The present application provides a simulation apparatus for heterogeneous cluster scheduling that sets the run mode of execution instructions and the instruction scheduling policy for a pre-trained deep learning load based on pre-collected historical heterogeneous resource configuration information; makes the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions, so as to accurately obtain the running behavior characteristics of the load; and then performs simulated cluster-node expansion of the running deep learning load based on Kubernetes virtualization technology, thereby simulating large-scale heterogeneous cluster operation of the load and obtaining running state data. This not only simulates the Kubernetes large-scale cluster deployment mode but also simulates large-scale heterogeneous cluster application scenarios with a small number of nodes, providing scheduling developers with a low-cost experimental environment and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
Continuing to refer to FIG. 6, there is shown a schematic structural diagram of deep learning load training in the simulation apparatus for heterogeneous cluster scheduling according to the present application. For ease of description, only the parts related to the present application are shown.
In some optional implementations of the second embodiment, the apparatus further includes a data set acquisition module 601 and a load training module 602, and the above-mentioned load running module 104 includes a load running unit 603.
A data set acquisition module 601, configured to read the local database and obtain a training data set;
A load training module 602, configured to train a basic deep learning load benchmark based on the training data set to obtain a trained deep learning load benchmark;
A load running unit 603, configured to make the deep learning load benchmark run according to the run mode and the instruction scheduling policy based on the execution instructions.
In this embodiment, the basic deep learning load benchmarks are representative benchmarks selected from several existing types of deep learning loads, and are mainly used to test data such as the execution time, transmission speed, throughput, and resource occupancy of the load.
In this embodiment, the training data sets are data sets of different scales corresponding to each benchmark.
In this embodiment, the selected basic deep learning load benchmark is trained by collecting the training data sets of different scales corresponding to each benchmark, so as to obtain a deep learning load benchmark that can be used to test the simulated operation of deep learning load applications under different architectures. Subsequently, when the application of the deep learning load benchmark executes computation modes such as serial and parallel execution under different architectures, and executes the scheduling policy based on the parameter configuration files of different hardware combinations, the running state of the benchmark's application under those architectures can be simulated and its runtime characteristics under different hardware configurations analyzed, thereby providing autonomous configuration of different hardware architecture combinations, simulating the underlying resource configuration policy of a heterogeneous cluster, simplifying the configuration of heterogeneous environments, and saving the cost of purchasing physical hardware.
In some optional implementations of the second embodiment, the apparatus further includes:
performing script editing on the performance metrics of the application layer and the microarchitecture layer of the deep learning load benchmark to obtain a metric collection configuration file corresponding to the deep learning load benchmark.
In this embodiment, script editing refers to the setting of a configuration file in which the performance metrics and the sampling interval can be freely selected.
In this embodiment, the application-layer performance metrics of the deep learning load benchmark include CPU utilization, memory utilization, disk I/O size, network bandwidth, and the like.
In this embodiment, the microarchitecture-layer performance metrics of the deep learning load benchmark include data such as IPC (Instructions per Cycle), branch prediction (Branch Predict), and cache misses.
In this embodiment, in order to effectively evaluate the scheduling performance of the scheduling algorithm used in the simulated runs of the deep learning load benchmark under different architectures, script settings are applied to the benchmark's application-layer performance metrics, such as CPU utilization, memory utilization, disk I/O size, and network bandwidth, as well as to its microarchitecture-layer performance metrics, such as IPC (Instructions per Cycle), branch prediction, and cache misses, to obtain a metric collection configuration file that supports free selection of the performance metrics and of the sampling interval. Based on this collection configuration file, scheduling performance metrics can subsequently be collected from the running deep learning load and the collected data evaluated to obtain performance evaluation results, thereby enabling effective evaluation of the scheduling algorithm's performance and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
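A minimal sketch of assembling such a metric collection configuration file is shown below. The metric names restate the examples in the text; the JSON layout and function name are assumptions for illustration, not a format defined by this application:

```python
import json

# Hedged sketch of the "script editing" step: pick a subset of the
# application-layer and microarchitecture-layer metrics plus a sampling
# interval, validate the selection, and emit a configuration file body.
APP_LAYER_METRICS = ["cpu_utilization", "memory_utilization",
                     "disk_io_size", "network_bandwidth"]
MICROARCH_METRICS = ["ipc", "branch_predict", "cache_misses"]

def build_collection_config(selected: list, interval_s: float) -> str:
    """Validate the selected metrics and return a JSON configuration body."""
    known = set(APP_LAYER_METRICS) | set(MICROARCH_METRICS)
    unknown = [m for m in selected if m not in known]
    if unknown:
        raise ValueError(f"unknown metrics: {unknown}")
    return json.dumps({"metrics": selected, "sampling_interval_s": interval_s})

config = build_collection_config(["cpu_utilization", "ipc"], 0.5)
print(config)
```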
Continuing to refer to FIG. 7, there is shown a schematic structural diagram of performance data evaluation in the simulation apparatus for heterogeneous cluster scheduling according to the present application. For ease of description, only the parts related to the present application are shown.
In some optional implementations of the second embodiment, the apparatus further includes a performance metric collection module 701 and a performance evaluation module 702.
A performance metric collection module 701, configured to collect scheduling performance metrics from the running deep learning load based on the metric collection configuration file to obtain scheduling performance data;
A performance evaluation module 702, configured to evaluate the scheduling performance data based on preset performance metrics to obtain a scheduling performance evaluation result.
In this embodiment, the preset performance metrics are key indicators, set according to actual application requirements, that can be used to evaluate scheduling performance, such as the time cost of completing each scheduling decision, the average job turnaround time, and changes in the cluster's overall resource utilization; no specific limitation is imposed here.
In this embodiment, the scheduling performance evaluation result is an evaluation indicator that directly reflects the scheduling performance of the deep learning load benchmark during simulated runs.
In this embodiment, scheduling performance metrics are collected from the deep learning load during the simulated run according to the sampling interval in the metric collection configuration file, yielding scheduling performance data that directly reflects the scheduling performance of the benchmark's simulated run. Key indicators that meet the preset performance metric requirements, such as the time cost of completing each scheduling decision, the average job turnaround time, and changes in the cluster's overall resource utilization, are then filtered from the scheduling performance data, producing a scheduling performance evaluation result that directly reflects the scheduling performance of the benchmark's simulated run while, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
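The three key indicators named above can be computed from collected samples as sketched below. The record layout (submit, scheduled, and finish timestamps plus a utilization fraction) is an assumption for the example; the application does not fix a data format:

```python
from statistics import mean

# Illustrative evaluation of scheduling performance data: per-decision
# scheduling cost, average job turnaround time, and the change in overall
# cluster resource utilization over the simulated run.
def evaluate_scheduling(records: list) -> dict:
    """Summarize scheduling performance data collected during a simulated run.

    Each record: {"submit": t, "scheduled": t, "finish": t, "util": fraction}
    """
    return {
        "avg_schedule_cost": mean(r["scheduled"] - r["submit"] for r in records),
        "avg_turnaround": mean(r["finish"] - r["submit"] for r in records),
        "utilization_change": records[-1]["util"] - records[0]["util"],
    }

sample = [
    {"submit": 0.0, "scheduled": 1.0, "finish": 10.0, "util": 0.40},
    {"submit": 2.0, "scheduled": 2.5, "finish": 14.0, "util": 0.55},
    {"submit": 4.0, "scheduled": 5.5, "finish": 12.0, "util": 0.70},
]
result = evaluate_scheduling(sample)
print(result)
```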
To sum up, the present application provides a simulation apparatus for heterogeneous cluster scheduling, including: a request receiving module, configured to receive a simulation run request sent by a user terminal; an information reading module, configured to read a local database to obtain historical heterogeneous resource configuration information in response to the simulation run request; an instruction setting module, configured to set, based on the historical heterogeneous resource configuration information, the run mode of execution instructions and the instruction scheduling policy for a pre-trained deep learning load; a load running module, configured to make the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions; and a simulation running module, configured to perform simulated cluster-node expansion of the running deep learning load based on Kubernetes virtualization technology, simulate large-scale heterogeneous cluster operation of the deep learning load, and obtain its running behavior characteristics and running state data.
The basic deep learning load benchmark is trained on the training data set to obtain a trained deep learning load benchmark, and the run mode of execution instructions and the instruction scheduling policy are set for the benchmark based on the pre-collected historical heterogeneous resource configuration information. The benchmark is then made to run according to the run mode and the instruction scheduling policy based on the execution instructions, so that the running behavior characteristics of the deep learning load are accurately obtained; simulated cluster-node expansion of the running load is performed based on Kubernetes virtualization technology, so that large-scale heterogeneous cluster operation of the load is simulated and running state data obtained. Finally, using the metric collection configuration file set from the application-layer and microarchitecture-layer performance metrics of the benchmark, scheduling performance metrics are collected from the running load and the collected scheduling performance data are evaluated to quickly obtain a scheduling performance evaluation result. This not only simulates the Kubernetes large-scale cluster deployment mode but also simulates large-scale heterogeneous cluster application scenarios with a small number of nodes, providing scheduling developers with a low-cost experimental environment and, to a certain extent, effectively reducing the evaluation time and cost of heterogeneous clusters.
To solve the above technical problems, an embodiment of the present application further provides a computer device. For details, refer to FIG. 8, which is a block diagram of the basic structure of the computer device of this embodiment.
The computer device 8 includes a memory 81, a processor 82, and a network interface 83 that are communicatively connected to one another through a system bus. It should be noted that the figure shows only a computer device 8 having the components 81-83; it should be understood that implementation of all the illustrated components is not required, and more or fewer components may be implemented instead. Those skilled in the art will understand that the computer device here is a device capable of automatically performing numerical computation and/or information processing according to preset or stored instructions, and its hardware includes, but is not limited to, microprocessors, application-specific integrated circuits (ASICs), field-programmable gate arrays (FPGAs), digital signal processors (DSPs), embedded devices, and the like.
The computer device may be a desktop computer, a notebook computer, a palmtop computer, a cloud server, or other computing device. The computer device may interact with the user through a keyboard, a mouse, a remote control, a touch pad, a voice control device, or the like.
The memory 81 includes at least one type of readable storage medium, including flash memory, hard disks, multimedia cards, card-type memory (for example, SD or DX memory), random access memory (RAM), static random access memory (SRAM), read-only memory (ROM), electrically erasable programmable read-only memory (EEPROM), programmable read-only memory (PROM), magnetic memory, magnetic disks, optical discs, and the like. In some embodiments, the memory 81 may be an internal storage unit of the computer device 8, such as its hard disk or internal memory. In other embodiments, the memory 81 may also be an external storage device of the computer device 8, such as a plug-in hard disk, a smart media card (SMC), a secure digital (SD) card, or a flash card provided on the computer device 8. Of course, the memory 81 may include both the internal storage unit of the computer device 8 and its external storage device. In this embodiment, the memory 81 is generally used to store the operating system and various application software installed on the computer device 8, such as the program code of the simulation method for heterogeneous cluster scheduling. In addition, the memory 81 may also be used to temporarily store various types of data that have been output or are to be output.
In some embodiments, the processor 82 may be a central processing unit (CPU), a controller, a microcontroller, a microprocessor, or another data processing chip. The processor 82 is generally used to control the overall operation of the computer device 8. In this embodiment, the processor 82 is configured to run the program code stored in the memory 81 or to process data, for example, to run the program code of the simulation method for heterogeneous cluster scheduling.
The network interface 83 may include a wireless or wired network interface, and is generally used to establish a communication connection between the computer device 8 and other electronic devices.
The present application further provides another implementation, namely a computer-readable storage medium storing a simulation program for heterogeneous cluster scheduling, where the simulation program can be executed by at least one processor to cause the at least one processor to perform the steps of the simulation method for heterogeneous cluster scheduling described above.
From the description of the above implementations, those skilled in the art can clearly understand that the methods of the above embodiments can be implemented by software plus a necessary general-purpose hardware platform, and of course also by hardware, but in many cases the former is the better implementation. Based on this understanding, the technical solution of the present application, in essence or in the part contributing to the prior art, can be embodied in the form of a software product stored in a storage medium (such as a ROM/RAM, a magnetic disk, or an optical disc) and including several instructions for causing a terminal device (which may be a mobile phone, a computer, a server, an air conditioner, a network device, or the like) to perform the methods described in the various embodiments of the present application.
Obviously, the embodiments described above are only some, not all, of the embodiments of the present application. The accompanying drawings show preferred embodiments of the present application but do not limit its patent scope. The present application may be implemented in many different forms; rather, these embodiments are provided so that the disclosure of the present application will be thorough and complete. Although the present application has been described in detail with reference to the foregoing embodiments, those skilled in the art can still modify the technical solutions described in the foregoing specific implementations or make equivalent replacements for some of the technical features. Any equivalent structure made using the contents of the specification and drawings of the present application, whether applied directly or indirectly in other related technical fields, likewise falls within the patent protection scope of the present application.

Claims (10)

  1. A simulation method for heterogeneous cluster scheduling, characterized by comprising the following steps:
    receiving a simulation run request sent by a user terminal;
    in response to the simulation run request, reading a local database to obtain historical heterogeneous resource configuration information;
    setting, based on the historical heterogeneous resource configuration information, a run mode of execution instructions and an instruction scheduling policy for a pre-trained deep learning load;
    making the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions;
    performing simulated cluster-node expansion of the running deep learning load based on Kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load, and obtaining running behavior characteristics and running state data of the deep learning load.
  2. The simulation method for heterogeneous cluster scheduling according to claim 1, characterized in that, before the step of making the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions, the method further comprises:
    reading the local database to obtain a training data set;
    training a basic deep learning load benchmark based on the training data set to obtain a trained deep learning load benchmark;
    wherein the making the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions specifically comprises:
    making the deep learning load benchmark run according to the run mode and the instruction scheduling policy based on the execution instructions.
  3. The simulation method for heterogeneous cluster scheduling according to claim 2, characterized in that, after the step of training the basic deep learning load benchmark based on the training data set to obtain the trained deep learning load benchmark, the method further comprises:
    performing script editing on performance metrics of the application layer and the microarchitecture layer of the deep learning load benchmark to obtain a metric collection configuration file corresponding to the deep learning load benchmark.
  4. The simulation method for heterogeneous cluster scheduling according to claim 3, characterized in that, after the step of, while the deep learning load runs in the execution mode, performing simulated cluster-node expansion of the deep learning load based on Kubernetes virtualization technology, simulating large-scale heterogeneous cluster operation of the deep learning load, and obtaining the running behavior characteristics and running state data of the deep learning load, the method further comprises:
    collecting scheduling performance metrics from the running deep learning load based on the metric collection configuration file to obtain scheduling performance data;
    evaluating the scheduling performance data based on preset performance metrics to obtain a scheduling performance evaluation result.
  5. A simulation apparatus for heterogeneous cluster scheduling, characterized by comprising:
    a request receiving module, configured to receive a simulation run request sent by a user terminal;
    an information reading module, configured to read a local database to obtain historical heterogeneous resource configuration information in response to the simulation run request;
    an instruction setting module, configured to set, based on the historical heterogeneous resource configuration information, a run mode of execution instructions and an instruction scheduling policy for a pre-trained deep learning load;
    a load running module, configured to make the deep learning load run according to the run mode and the instruction scheduling policy based on the execution instructions;
    a simulation running module, configured to perform simulated cluster-node expansion of the running deep learning load based on Kubernetes virtualization technology, simulate large-scale heterogeneous cluster operation of the deep learning load, and obtain running behavior characteristics and running state data of the deep learning load.
  6. The simulation apparatus for heterogeneous cluster scheduling according to claim 5, wherein the apparatus further comprises:
    a data set acquisition module, configured to read the local database and obtain a training data set; and
    a load training module, configured to train a basic deep learning load benchmark based on the training data set to obtain a trained deep learning load benchmark;
    wherein the load running module comprises:
    a load running unit, configured to cause, based on the execution instructions, the deep learning load benchmark to run according to the run mode and the instruction scheduling policy.
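Claim 6's train-then-run pattern can be illustrated with a toy stand-in: fit a one-parameter model by gradient descent (the "benchmark training"), then run the trained model as the workload under a scheduling policy. The model, data, and the `fifo`/`sorted` policy choice are all illustrative assumptions; real benchmarks would be full deep learning workloads.

```python
# Toy stand-in for claim 6: train a basic benchmark (fit y = 2x),
# then run the trained benchmark under an instruction scheduling policy.

def train_benchmark(dataset, lr=0.1, steps=200):
    """Gradient descent on mean squared error for y = w * x."""
    w = 0.0
    for _ in range(steps):
        grad = sum(2 * (w * x - y) * x for x, y in dataset) / len(dataset)
        w -= lr * grad
    return w

def run_benchmark(w, inputs, policy="fifo"):
    """The scheduling policy only decides the processing order here."""
    order = inputs if policy == "fifo" else sorted(inputs)
    return [w * x for x in order]

dataset = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0)]
w = train_benchmark(dataset)               # converges to ~2.0
outputs = run_benchmark(w, [3.0, 1.0], policy="fifo")
```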
  7. The simulation apparatus for heterogeneous cluster scheduling according to claim 6, wherein the apparatus is further configured to:
    perform script editing on the performance indicators of the application layer and the microarchitecture layer of the deep learning load benchmark to obtain a metric-collection configuration file corresponding to the deep learning load benchmark.
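A metric-collection configuration file of the kind claim 7 describes might look like the following. The concrete metric names (throughput, IPC, cache-miss rate) and the JSON layout are assumptions; the patent only states that the file is produced by scripting application-layer and microarchitecture-layer indicators.

```python
# Hypothetical metric-collection config for a benchmark (claim 7).
# Layer split: application-level training metrics vs. hardware
# microarchitecture counters.  All names are illustrative.
import json

def build_metric_config(benchmark_name):
    return {
        "benchmark": benchmark_name,
        "application_layer": ["images_per_sec", "epoch_time_s", "loss"],
        "microarchitecture_layer": ["ipc", "llc_miss_rate", "dram_bw_gbs"],
        "interval_s": 5,          # sampling period for the collector
    }

cfg = build_metric_config("resnet50-train")
serialized = json.dumps(cfg, indent=2)   # what gets written to disk
```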
  8. The simulation apparatus for heterogeneous cluster scheduling according to claim 7, wherein the apparatus further comprises:
    a performance metric collection module, configured to collect scheduling performance metrics from the running deep learning load based on the metric-collection configuration file to obtain scheduling performance data; and
    a performance evaluation module, configured to evaluate the scheduling performance data against preset performance indicators to obtain a scheduling performance evaluation result.
  9. A computer device, comprising a memory and a processor, wherein a computer program is stored in the memory, and the processor, when executing the computer program, implements the steps of the simulation method for heterogeneous cluster scheduling according to any one of claims 1 to 4.
  10. A computer-readable storage medium, wherein a computer program is stored on the computer-readable storage medium, and the computer program, when executed by a processor, implements the steps of the simulation method for heterogeneous cluster scheduling according to any one of claims 1 to 4.
PCT/CN2020/139683 2020-11-30 2020-12-25 Simulation method and apparatus for heterogeneous cluster scheduling, computer device, and storage medium WO2022110446A1 (en)

Applications Claiming Priority (2)

Application Number Priority Date Filing Date Title
CN202011375112.7A CN112433819B (en) 2020-11-30 2020-11-30 Simulation method and device for heterogeneous cluster scheduling, computer equipment and storage medium
CN202011375112.7 2020-11-30

Publications (1)

Publication Number Publication Date
WO2022110446A1 true WO2022110446A1 (en) 2022-06-02

Family

ID=74697516

Family Applications (1)

Application Number Title Priority Date Filing Date
PCT/CN2020/139683 WO2022110446A1 (en) 2020-11-30 2020-12-25 Simulation method and apparatus for heterogeneous cluster scheduling, computer device, and storage medium

Country Status (2)

Country Link
CN (1) CN112433819B (en)
WO (1) WO2022110446A1 (en)


Families Citing this family (9)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN113420517B (en) * 2021-05-28 2023-01-06 清华大学 FPGA virtualization hardware system stack design oriented to cloud deep learning reasoning
CN113298176B (en) * 2021-06-10 2023-04-25 中国科学技术大学 Heterogeneous model self-adaptive cooperation method
CN113377540A (en) * 2021-06-15 2021-09-10 上海商汤科技开发有限公司 Cluster resource scheduling method and device, electronic equipment and storage medium
CN113504966B (en) * 2021-06-22 2023-10-31 中国科学院计算技术研究所 GPU cluster scheduling strategy simulation method and GPU cluster simulator
CN113391925A (en) * 2021-06-25 2021-09-14 北京字节跳动网络技术有限公司 Cloud resource management method, system, medium, and computer device
CN113553140B (en) * 2021-09-17 2022-03-18 阿里云计算有限公司 Resource scheduling method, equipment and system
CN113973049B (en) * 2021-10-13 2022-08-02 中国科学院计算技术研究所 Method for managing and deploying bit stream of FPGA (field programmable Gate array) cluster
CN114637650B (en) * 2022-03-11 2023-04-18 电子科技大学 Elastic expansion method based on Kubernetes cluster
CN116170518B (en) * 2023-04-26 2023-07-18 北京太极信息***技术有限公司 Method and equipment for cloud cross-architecture management of domestic chip container

Citations (4)

Publication number Priority date Publication date Assignee Title
CN102236582A (en) * 2011-07-15 2011-11-09 浙江大学 Method for balanced distribution of virtualization cluster load in a plurality of physical machines
CN105205003A (en) * 2015-10-28 2015-12-30 努比亚技术有限公司 Automated testing method and device based on clustering system
US20200186616A1 (en) * 2018-12-11 2020-06-11 Sap Se Kubernetes as a distributed operating system for multitenancy/multiuser
CN112000421A (en) * 2020-07-15 2020-11-27 北京计算机技术及应用研究所 Management scheduling technology based on super-fusion architecture

Family Cites Families (3)

Publication number Priority date Publication date Assignee Title
US11526370B2 (en) * 2019-03-10 2022-12-13 Microsoft Technology Licensing, Llc. Cloud resource management using machine learning
CN111274036B (en) * 2020-01-21 2023-11-07 南京大学 Scheduling method of deep learning task based on speed prediction
CN111966484A (en) * 2020-06-23 2020-11-20 北京大学 Cluster resource management and task scheduling method and system based on deep reinforcement learning


Cited By (6)

Publication number Priority date Publication date Assignee Title
CN115237581A (en) * 2022-09-21 2022-10-25 之江实验室 Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device
CN115237581B (en) * 2022-09-21 2022-12-27 之江实验室 Heterogeneous computing power-oriented multi-strategy intelligent scheduling method and device
CN116523045A (en) * 2023-03-13 2023-08-01 之江实验室 Deep learning reasoning simulator oriented to multi-core chip
CN116523045B (en) * 2023-03-13 2023-11-07 之江实验室 Deep learning reasoning simulator oriented to multi-core chip
CN117271268A (en) * 2023-11-20 2023-12-22 成都大征创智科技有限公司 Cluster architecture performance evaluation method in digital computing platform
CN117271268B (en) * 2023-11-20 2024-01-30 成都大征创智科技有限公司 Cluster architecture performance evaluation method in digital computing platform

Also Published As

Publication number Publication date
CN112433819B (en) 2024-04-19
CN112433819A (en) 2021-03-02

Similar Documents

Publication Publication Date Title
CN112433819B (en) Simulation method and device for heterogeneous cluster scheduling, computer equipment and storage medium
Liu et al. FogWorkflowSim: An automated simulation toolkit for workflow performance evaluation in fog computing
Di et al. Characterizing and modeling cloud applications/jobs on a Google data center
WO2012033909A2 (en) Method and system of simulating a data center
US11429434B2 (en) Elastic execution of machine learning workloads using application based profiling
CN103955373A (en) Design method of SDN (Software Defined Networking) application integration development environment
Huang et al. A simulation-based optimization approach for reliability-aware service composition in edge computing
CN105630575A (en) Performance evaluation method aiming at KVM virtualization server
Murshed et al. Using the GridSim toolkit for enabling grid computing education
Shen et al. Performance prediction of parallel computing models to analyze cloud-based big data applications
Rak Performance modeling using queueing petri nets
JP2012509546A (en) Method and data processing system for simulating embedded systems
Wang et al. An Efficient Load Prediction-Driven Scheduling Strategy Model in Container Cloud
Khan Hadoop performance modeling and job optimization for big data analytics
Huang et al. Performance and replica consistency simulation for quorum-based NoSQL system cassandra
Zhang et al. Repeatable multi-dimensional virtual network embedding in cloud service platform
Oladimeji et al. A comprehensive survey on cloud computing simulators
Jawaddi et al. Autoscaling in Serverless Computing: Taxonomy and OpenChallenges
Kim et al. RETRACTED ARTICLE: Simulator considering modeling and performance evaluation for high-performance computing of collaborative-based mobile cloud infrastructure
Sinaei et al. Run-time mapping algorithm for dynamic workloads using association rule mining
He et al. Dynamic scalable stochastic petri net: A novel model for designing and analysis of resource scheduling in cloud computing
Skrinarova Implementation and evaluation of scheduling algorithm based on PSO HC for elastic cluster criteria
Amar et al. Tunable scheduling in a GridRPC framework
Hernández et al. A Simulation-based Scheduling Strategy for Scientific Workflows.
Kołodziej et al. Modeling and Simulation in HPC and Cloud Systems

Legal Events

Date Code Title Description
121 Ep: the EPO has been informed by WIPO that EP was designated in this application (Ref document number: 20963334; Country of ref document: EP; Kind code of ref document: A1)
NENP Non-entry into the national phase (Ref country code: DE)
122 Ep: PCT application non-entry in European phase (Ref document number: 20963334; Country of ref document: EP; Kind code of ref document: A1)