CN107861606A

CN107861606A - Heterogeneous multi-core power capping method by coordinating DVFS and task mapping

Info

Publication number: CN107861606A
Application number: CN201711163506.4A
Authority: CN
Inventors: 方娟; 汪梦萱; 马傲男; 程妍瑾; 常泽清
Original assignee: Beijing University of Technology
Current assignee: Beijing University of Technology
Priority date: 2017-11-21
Filing date: 2017-11-21
Publication date: 2018-03-30

Abstract

The present invention discloses a heterogeneous multi-core power capping method that coordinates DVFS and task mapping. First, for the heterogeneous system, scripts are implemented that separately measure the compute-node power, CPU power and GPU power after a program finishes executing. The selected parallel benchmark programs are then modified to obtain the execution time of each kernel function. Next, with the CPU and GPU set to different frequencies, the application is run only on the CPU and only on the GPU, respectively, to obtain detailed runtime information, including the total execution time, the execution time of each kernel function, the compute-node power, the CPU power and the GPU power. Based on this runtime information, a prediction model is designed, comprising an execution-time model and a power model. Finally, based on the prediction model, the system power and execution time under different CPU frequencies, GPU frequencies and task mapping schemes are obtained and filled into a configuration table, and an improved greedy algorithm searches for the optimal configuration. The invention limits system power to the power budget while improving system performance.

Description

Heterogeneous multi-core power capping method by coordinating DVFS and task mapping

Technical field

The invention belongs to the field of computer architecture, and in particular relates to a heterogeneous multi-core power capping method that coordinates DVFS and task mapping.

Background

Through continuous research and development in recent years, advanced architectures represented by multi-core processors have gradually replaced single-core processors and become the main path to higher processor performance. Compared with homogeneous multi-core processors, heterogeneous multi-core platforms can achieve better performance. Power capping is a technique that limits the power of a heterogeneous system to a predetermined level. Power consumption and heat dissipation limit further performance gains of heterogeneous multi-core systems. Modern processors are built to withstand power only up to a certain level before it causes damage, so systems that enforce an upper limit on processor power are required. The most common power budgeting technique today runs hardware components at different frequencies, and therefore at different power levels; the main idea is dynamic voltage and frequency scaling (DVFS). However, when DVFS is used to limit the power of a heterogeneous system, load imbalance arises between the CPU and the GPU. Decomposing a parallel program into tasks that can execute concurrently and mapping each task to the most suitable processor makes full use of the system's computing capability and improves performance, but such mapping schemes usually do not consider system power. This document presents a scheme that combines DVFS and task mapping to improve system performance while limiting system power to a budget.

Summary of the invention

The present invention proposes a heterogeneous multi-core power capping method that coordinates DVFS and task mapping, limiting system power to the budget while improving system performance. First, for the heterogeneous system, a script is implemented that measures the compute-node power, CPU power and GPU power separately after program execution completes; the selected parallel benchmark programs are then modified to obtain the execution time of each kernel function. Next, with the CPU and GPU set to different frequencies, the application is run only on the CPU and only on the GPU, respectively, to obtain detailed runtime information, including the total execution time, the execution time of each kernel function, the compute-node power, the CPU power and the GPU power. Based on this information, a prediction model is designed, comprising an execution-time model and a power model. Finally, based on the prediction model, the system power and execution time under different CPU frequencies, GPU frequencies and task mapping schemes are computed and filled into a configuration table. An improved greedy algorithm then searches for the optimal configuration (CPU frequency, GPU frequency, task mapping table).

To achieve the above object, the present invention adopts the following technical solution.

A heterogeneous multi-core power capping method that coordinates DVFS and task mapping, combining the two to limit system power to the power budget while pursuing system performance, comprises the following steps:

Step 1, implement measurement of the application's total execution time, CPU power and GPU power after the application finishes executing.

Step 2, modify the selected parallel benchmark programs to obtain the execution time of each kernel function in the program.

Step 3, execute the application only on the CPU or only on the GPU, respectively, under each selectable CPU (GPU) frequency, and obtain detailed runtime information, including the program's total execution time, the execution time of each kernel function in the program, the total system power, the CPU power and the GPU power.

Step 4, design a prediction model that predicts the power and execution time of different task mapping schemes under different CPU and GPU frequencies. The inputs of the prediction model are the CPU frequency, the GPU frequency and the task mapping scheme. Task mapping here means that each kernel function is mapped as a whole to either the CPU or the GPU, rather than distributing a kernel function's work in some proportion for simultaneous execution on both the CPU and the GPU. The outputs of the prediction model are the predicted total execution time of the program and the predicted system power. The prediction model comprises an execution-time model and a power model.

Step 4.1, execution time model

The total execution time of the application can be obtained from the execution time of each kernel in the program and the corresponding data transfer time. The total execution time of the program is expressed by formula ①.

Here, f_cpu and f_gpu denote the CPU frequency and the GPU frequency, respectively, and T_i(f_cpu, f_gpu) denotes the execution time of the i-th kernel function together with the data transfer time it requires, expressed by formula ②.

The first part of formula ② is the execution time and the second part is the data transfer time. The kernel execution times were obtained in Step 2. H2D and D2H denote the data transfer cost from host to device and from device to host, respectively. Required equals 1 or 0, indicating whether kernel k requires data d. The data may already reside on the device, in which case no transfer is needed, so OnDevice indicates whether the data is on the device. size denotes the data size.
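Formulas ① and ② are not reproduced in this text. The Python sketch below shows one possible reading of the execution-time model under stated assumptions: (a) kernels run one after another, so formula ① is taken as the plain sum of the per-kernel times T_i; (b) a buffer's transfer cost is its size times a per-unit H2D cost when it moves to the GPU, or a per-unit D2H cost when it moves back to the CPU; (c) every buffer starts in host memory. The names `Kernel`, `kernel_time` and `total_time` are hypothetical.

```python
from dataclasses import dataclass

@dataclass
class Kernel:
    name: str
    exec_time: dict   # device ("cpu" or "gpu") -> profiled execution time at the chosen frequencies
    required: set     # buffers this kernel needs (Required = 1)

def kernel_time(kernel, device, location, sizes, h2d, d2h):
    """T_i: kernel execution time plus transfer time for buffers not yet on `device`."""
    transfer = 0.0
    for d in sorted(kernel.required):
        if location[d] != device:            # OnDevice = 0, so the buffer must be copied
            per_unit = h2d if device == "gpu" else d2h
            transfer += sizes[d] * per_unit
            location[d] = device             # after the copy the buffer resides on `device`
    return kernel.exec_time[device] + transfer

def total_time(kernels, mapping, sizes, h2d, d2h):
    """Formula 1 under the ASSUMPTION that kernels run one after another."""
    location = {d: "cpu" for d in sizes}     # ASSUMPTION: every buffer starts in host memory
    return sum(kernel_time(k, dev, location, sizes, h2d, d2h)
               for k, dev in zip(kernels, mapping))
```

For example, with kernel k1 (CPU 4.0, GPU 1.0, needs buffer a) and kernel k2 (CPU 1.0, GPU 3.0, needs a and b), sizes a = 100 and b = 50, H2D = 0.01 and D2H = 0.02, the mapping (GPU, CPU) gives 2.0 for k1 (1.0 execution plus 1.0 to copy a to the GPU) and 3.0 for k2 (1.0 execution plus 2.0 to copy a back), 5.0 in total. If CPU and GPU kernels overlap in time, as in Fig. 4, a plain sum overestimates the total.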

Step 4.2, power consumption model

System power can be expressed as the sum of three parts: the idle power P_idle, the CPU power P_cpu and the GPU power P_gpu. The system power is expressed by formula ③.

$P = P_{idle}(f_{cpu}, f_{gpu}) + P_{cpu}(f_{cpu}, f_{gpu}) + P_{gpu}(f_{cpu}, f_{gpu})$ ③

Here, P_idle denotes the idle power, which depends on the CPU and GPU frequencies but not on the task mapping. It can be obtained from a run in which the application executes only on the CPU, by subtracting the measured CPU and GPU power from the measured total power, as given by formula ④.

The three quantities in formula ④ denote, respectively, the total system power, the CPU power and the GPU power obtained when the application executes only on the CPU; they depend on the configured frequencies.
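A minimal sketch of formula ④ as read here: applying formula ③ to the CPU-only run, the idle power is the measured total power minus the measured CPU and GPU power, tabulated for each frequency pair. The function name and the dictionary layout of the profile data are hypothetical.

```python
def idle_power_table(cpu_only_runs):
    """Formula 4 as read here: P_idle = P_total - P_cpu - P_gpu, measured in CPU-only runs.

    cpu_only_runs maps (f_cpu, f_gpu) -> (p_total, p_cpu, p_gpu) in watts.
    """
    return {freqs: p_total - p_cpu - p_gpu
            for freqs, (p_total, p_cpu, p_gpu) in cpu_only_runs.items()}
```

Because P_idle does not depend on the task mapping, one CPU-only profile run per frequency pair is enough to fill this table.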

While the application executes, the CPU and GPU are not always busy because of data dependences between kernel functions, so the CPU power and GPU power vary with the given task mapping scheme. Based on this observation, we assume that each device's power is proportional to the ratio of its busy time to the total execution time, so the CPU power and GPU power can be expressed by formulas ⑤ and ⑥, respectively.

Here, λ_cpu and λ_gpu denote the CPU and GPU busy-time ratios, respectively. The formulas also use the CPU power and GPU power measured when the application executes only on the GPU, and estimates of the maximum dynamic power of the CPU and of the GPU.

Because λ_cpu and λ_gpu are the CPU and GPU busy-time ratios, they can be obtained from the runtime information measured in Step 3. During program execution, the busy time of each device is defined as the sum of the actual execution times of the kernel functions it runs. λ_cpu and λ_gpu are expressed by formulas ⑦ and ⑧, respectively:

$\lambda_{cpu} = t_{cpu} / t_{total}$ ⑦

$\lambda_{gpu} = t_{gpu} / t_{total}$ ⑧

where t_cpu denotes the CPU busy time, t_gpu the GPU busy time, and t_total the total execution time of the application.
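The following Python sketch computes the busy-time ratios (each device's summed kernel time divided by the total execution time) and adds up the three power terms of formula ③. The exact forms of formulas ⑤ and ⑥ are not reproduced in this text, so `device_power` uses an assumed linear form, baseline power plus busy ratio times the estimated maximum dynamic power. This is one way to express the stated proportionality assumption, not necessarily the patent's formula; the choice of baseline is also an assumption. All function names are hypothetical.

```python
def busy_ratios(cpu_kernel_times, gpu_kernel_times, t_total):
    """Formulas 7 and 8: a device's busy time is the sum of the kernel times it ran."""
    t_cpu = sum(cpu_kernel_times)
    t_gpu = sum(gpu_kernel_times)
    return t_cpu / t_total, t_gpu / t_total

def device_power(p_base, p_dyn_max, busy_ratio):
    """ASSUMED linear form for formulas 5 and 6 (not taken from the patent text)."""
    return p_base + busy_ratio * p_dyn_max

def system_power(p_idle, p_cpu, p_gpu):
    """Formula 3: idle power plus CPU power plus GPU power."""
    return p_idle + p_cpu + p_gpu
```

For instance, CPU kernels of 1.0 s and 2.0 s and a GPU kernel of 4.0 s in an 8.0 s run give λ_cpu = 0.375 and λ_gpu = 0.5.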

Step 5, build the configuration parameter table based on the prediction model.

Different cpu frequency and GPU frequencies, different task mapping scheme are calculated according to execution time model and power consumption model Under the execution time and power consumption, and insert in configuration parameter table.

Step 6, search for the optimal parameter set in the configuration parameter table using an improved greedy algorithm. The search has two stages. First, at a given CPU frequency and GPU frequency, a greedy algorithm searches for the task mapping scheme with the shortest execution time; from this task mapping, the CPU frequency, the GPU frequency and the power model, the predicted system power under this parameter configuration is obtained. Then the CPU and GPU frequencies are changed and the predicted system power is computed again as in the previous stage, finally yielding the optimal configuration that stays within the power budget.

Step 6.1, for a given CPU frequency and GPU frequency, search for the task mapping scheme with the shortest execution time, and obtain the predicted system power from the power model.
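The greedy algorithm itself is not spelled out in this text. As an illustration only, the sketch below implements a simple one-pass greedy search: starting from an all-GPU mapping, it visits each kernel in turn and moves it to the other device if that lowers the predicted total time. This is a generic greedy heuristic, not necessarily the patent's improved greedy algorithm; `greedy_mapping` and `predict_time` are hypothetical names, and `predict_time` is assumed to be already bound to the given CPU and GPU frequencies.

```python
def greedy_mapping(num_kernels, predict_time, start="gpu"):
    """One-pass greedy search for a low-time whole-kernel mapping at fixed frequencies."""
    mapping = [start] * num_kernels
    best = predict_time(tuple(mapping))
    for i in range(num_kernels):
        trial = mapping.copy()
        trial[i] = "cpu" if mapping[i] == "gpu" else "gpu"   # move kernel i to the other device
        t = predict_time(tuple(trial))
        if t < best:                                        # keep the move only if it helps
            mapping, best = trial, t
    return tuple(mapping), best
```

The predicted system power for the returned mapping then comes from the power model at the same frequencies. When the predicted time is a plain sum of independent per-kernel costs this pass finds the best mapping; with transfer costs between kernels it is only a heuristic.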

Step 6.2, change the CPU and GPU frequencies over their selectable settings and repeat Step 6.1 to obtain, for each frequency combination, the mapping scheme with the shortest execution time, then compute the system power for that mapping. Given the power budget, select the frequency settings and mapping scheme that are optimal while staying within the budget.
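A minimal sketch of the outer search of Step 6.2, assuming a helper `best_mapping_at(f_cpu, f_gpu)` that returns the fastest mapping found and its predicted time (for example, a greedy search as in Step 6.1) and a `predict_power` callable; both names are hypothetical. Among frequency pairs whose predicted power stays within the budget, it keeps the one with the shortest predicted time.

```python
from itertools import product

def power_capped_search(cpu_freqs, gpu_freqs, best_mapping_at, predict_power, budget):
    """Return (f_cpu, f_gpu, mapping, time, power) of the fastest in-budget configuration."""
    best = None
    for f_cpu, f_gpu in product(cpu_freqs, gpu_freqs):
        mapping, t = best_mapping_at(f_cpu, f_gpu)          # Step 6.1 at this frequency pair
        p = predict_power(f_cpu, f_gpu, mapping)
        if p <= budget and (best is None or t < best[3]):   # respect the power cap
            best = (f_cpu, f_gpu, mapping, t, p)
    return best   # None if no configuration meets the budget
```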

Compared with the prior art, the present invention has the following advantages:

DVFS and task mapping are combined so that system performance is maintained while system power is limited to the power budget. Most existing system power capping techniques rely on dynamic voltage and frequency scaling, because device frequency has the largest influence on system power; however, they do not account for the load imbalance that changing the CPU and GPU frequencies can cause. Decomposing a parallel program into tasks that can execute concurrently and mapping each task to the most suitable processor makes full use of the system's computing capability and improves performance, but such mapping schemes usually do not consider system power. The present invention therefore combines the two optimization strategies, DVFS and task mapping, to limit system power to a given budget level while improving system performance.

Brief description of the drawings

To make the purpose and scheme of the present invention easier to understand, the invention is further described below with reference to the figures.

Fig. 1 is an architecture diagram of the CPU-GPU heterogeneous multi-core system. The heterogeneous system is simulated with gem5-gpu, with a 4-core CPU and a GPU composed of 8 CUs integrated on the same chip.

Fig. 2 is a schematic diagram of the power capping scheme based on detailed runtime information in the present invention.

Fig. 3 is a schematic diagram of fine-grained synchronization between the CPU and GPU under traditional task data partitioning.

Fig. 4 is a schematic diagram of the CPU and GPU busy times under the kernel-level task partitioning used in the present invention, and of the CPU and GPU idle time caused by waiting for task data.

Embodiment

The present invention will be further described below in conjunction with the accompanying drawings.

Fig. 1 shows the heterogeneous multi-core system built with the gem5-gpu simulator. It simulates a heterogeneous architecture in which a 4-core CPU and a GPU composed of 8 CUs are integrated on the same chip. In gem5-gpu, the simulated architecture can be changed flexibly through configuration files, and gem5-gpu supports DVFS.

On the heterogeneous multi-core system simulated with gem5-gpu, the present invention implements a power capping method that combines DVFS and task mapping, comprising the following steps:

Step 1, implement measurement of the total execution time, CPU power and GPU power after the application finishes executing.

In gem5-gpu, when an application finishes executing, a file stats.txt containing all of the program's runtime statistics, including the program execution time, is generated automatically. The McPAT module can measure CPU power independently and the GPUWattch module can measure GPU power independently; by configuring the McPAT and GPUWattch modules in gem5-gpu, the CPU power and GPU power can be obtained after program execution completes.
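The sketch below reads a single statistic, such as `sim_seconds` (simulated seconds), from the text of gem5's stats.txt, whose statistic lines have the form `name value # description`. The sample content in the test is made up for illustration, only the first occurrence (the first statistics dump) is read, and McPAT and GPUWattch outputs have their own formats that are not parsed here. The function name is hypothetical.

```python
def read_gem5_stat(stats_text, name="sim_seconds"):
    """Return the first value of statistic `name` in gem5 stats.txt content."""
    for line in stats_text.splitlines():
        fields = line.split()
        # Statistic lines look like: "<name> <value> # <description>"
        if len(fields) >= 2 and fields[0] == name:
            return float(fields[1])
    raise KeyError(name)
```

In practice the text would come from reading the stats.txt file in the simulation output directory.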

Step 2, modify the selected parallel benchmark programs to obtain the execution time of each kernel function in the program.

OpenCL programs can execute on different devices, including the CPU and the GPU. The benchmark programs used in the present invention are the OpenCL version of the NAS Parallel Benchmarks. Each benchmark has different characteristics: some programs contain more than 60 kernels, while others have only two. The test programs are rewritten to collect the execution time of each kernel.

Step 3, execute the application only on the CPU or only on the GPU, respectively, under each selectable CPU (GPU) frequency, and obtain detailed runtime information as the input to the runtime-information-based power capping scheme.

Fig. 2 shows the flow of the power capping strategy based on runtime information. CPU Profile Runs and GPU Profile Runs denote the runtime information obtained after executing the application only on the CPU and only on the GPU, respectively. From this runtime information, the prediction model of Step 4 is built, comprising the execution-time model and the power model, shown as Time model and Power model in Fig. 2. The inputs of the prediction model are the CPU frequency, the GPU frequency and the task mapping scheme; the outputs are the corresponding predicted execution time and predicted system power. The different inputs and outputs form a configuration table, and based on this table an improved greedy algorithm searches for the optimal mapping scheme and frequency settings as described in Step 6, shown as Distribute parallel tasks and set device frequencies in Fig. 2.

Step 4, build the prediction model, comprising the execution-time model and the power model, to predict the execution time and system power of the application in the heterogeneous multi-core environment. The runtime information obtained in Step 3 by executing the application only on the CPU and only on the GPU under different frequency settings is the basis for building the prediction model. Formulas ① to ⑧ show that, from the program's runtime information, which includes the execution information of each kernel, the CPU power and the GPU power, the program's execution time and power can be predicted under different CPU frequencies, GPU frequencies and task mapping schemes.

The task mapping in the present invention means mapping any kernel in the program, as a whole, to the CPU or the GPU. This differs from the traditional approach of distributing a certain proportion of a task's data to the CPU and the GPU. Proportional data distribution means that a kernel executes the same code on the CPU and the GPU simultaneously, with the data it needs divided between them in proportion, and both devices processing their share at the same time. Under proportional data distribution, the CPU and GPU must synchronize after each kernel finishes, so a fixed distribution ratio may leave the CPU or the GPU idle for much of the time while a kernel runs. Fig. 3 illustrates this fine-grained synchronization between the CPU and GPU under task data partitioning, with a fixed data distribution ratio α. For kernel1, the GPU executes more efficiently and the CPU processes more slowly, so the GPU finishes its share before the CPU and then sits idle waiting for the CPU to finish. Conversely, for kernel2 the CPU processes faster, so the CPU finishes first and then sits idle waiting for the GPU. Fig. 3 thus shows that the CPU-GPU synchronization required by task data partitioning produces a large amount of CPU and GPU idle time during execution.

Distributing each kernel as a whole to the CPU or the GPU avoids the need for synchronization between the CPU and GPU, but in exchange the data transfer time can be longer. Fig. 4 shows CPU and GPU execution with whole-kernel mapping: no CPU-GPU synchronization is needed, but data transfers between the CPU and GPU are more frequent.
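To make the contrast between Fig. 3 and Fig. 4 concrete, the sketch below models a kernel with `work` units processed at assumed constant CPU and GPU rates. Under data partitioning with a fixed ratio (here `alpha` is taken as the fraction of data given to the CPU, an assumption), the faster device idles until the slower one finishes; under whole-kernel mapping the kernel runs on the faster device with no end-of-kernel barrier. Transfer time, which the text notes can be longer under whole-kernel mapping, is ignored. All names are hypothetical.

```python
def split_kernel(work, cpu_rate, gpu_rate, alpha):
    """Data partitioning: fraction alpha of the data on the CPU, the rest on the GPU."""
    t_cpu = alpha * work / cpu_rate
    t_gpu = (1 - alpha) * work / gpu_rate
    # Both devices synchronise at the end of the kernel, so the faster one waits.
    return max(t_cpu, t_gpu), abs(t_cpu - t_gpu)   # (kernel time, idle time)

def whole_kernel(work, cpu_rate, gpu_rate):
    """Whole-kernel mapping: run on the faster device; no CPU-GPU barrier."""
    device = "cpu" if cpu_rate > gpu_rate else "gpu"
    return device, work / max(cpu_rate, gpu_rate)
```

For example, with work 100, a CPU rate of 10 and a GPU rate of 40 (the kernel1 case), a 50/50 split takes 5.0 time units, of which the GPU idles for 3.75, while mapping the whole kernel to the GPU takes 2.5 units.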

Step 5, build the configuration parameter table based on the prediction model.

Using the execution-time prediction model and the power prediction model of Step 4, the execution time and power under every possible combination of CPU frequency, GPU frequency and task mapping scheme can be computed and stored in the configuration parameter table.

Step 6, search for the optimal parameter set in the configuration parameter table using an improved greedy algorithm.

Step 6.1, for a given CPU frequency and GPU frequency, search for the task mapping scheme with the shortest execution time, and obtain the predicted system power from the power model, as shown in Algorithm 1.

Step 6.2, change the CPU and GPU frequencies over their selectable settings and repeat Step 6.1 to obtain, for each frequency combination, the mapping scheme with the shortest execution time, then compute the system power for that mapping. Given the power budget, select the frequency settings and mapping scheme that are optimal while staying within the budget. This step is shown in Algorithm 2.

Claims

1. A heterogeneous multi-core power capping method by coordinating DVFS and task mapping, characterized by comprising the following steps:

Step 1, implementing measurement of the total execution time, CPU power and GPU power after the application finishes executing;

Step 2, modifying the selected parallel benchmark programs to obtain the execution time of each kernel function in the program;

Step 3, executing the application only on the CPU or only on the GPU, respectively, under each selectable CPU or GPU frequency, and obtaining detailed runtime information, including the program's total execution time, the execution time of each kernel function in the program, the total system power, the CPU power and the GPU power;

Step 4, designing a prediction model that predicts the power and execution time of different task mapping schemes under different CPU and GPU frequencies; wherein the inputs of the prediction model are the CPU frequency, the GPU frequency and the task mapping scheme, the task mapping meaning that each kernel function is mapped as a whole to the CPU or the GPU, rather than a kernel function being distributed in some proportion for simultaneous execution on the CPU and the GPU; and the outputs of the prediction model are the predicted total execution time of the program and the predicted system power;

Step 5, computing, according to the execution-time model and the power model, the execution time and power under different CPU frequencies, GPU frequencies and task mapping schemes, and filling them into a configuration parameter table;

Step 6, searching for the optimal parameter set in the configuration parameter table using an improved greedy algorithm, in two stages: first, at a given CPU frequency and GPU frequency, using a greedy algorithm to search for the task mapping scheme with the shortest execution time, and obtaining, from this task mapping, the CPU frequency, the GPU frequency and the power model, the predicted system power under this parameter configuration; then changing the CPU and GPU frequencies and computing the predicted system power again as in the previous stage, finally obtaining the optimal configuration that stays within the power budget.
2. The heterogeneous multi-core power capping method by coordinating DVFS and task mapping according to claim 1, characterized in that the prediction model in Step 4 comprises an execution-time model and a power model,

Step 4.1, execution time model

The total execution time of the application can be obtained from the execution time of each kernel in the program and the corresponding data transfer time. The total execution time of the program is expressed by formula ①.

Here, f_cpu and f_gpu denote the CPU frequency and the GPU frequency, respectively, and T_i(f_cpu, f_gpu) denotes the execution time of the i-th kernel function together with the data transfer time it requires, expressed by formula ②.

The first part of formula ② is the execution time and the second part is the data transfer time. The kernel execution times were obtained in Step 2. H2D and D2H denote the data transfer cost from host to device and from device to host, respectively. Required equals 1 or 0, indicating whether kernel k requires data d. The data may already reside on the device, in which case no transfer is needed, so OnDevice indicates whether the data is on the device. size denotes the data size.

Step 4.2, power consumption model

System power can be expressed as the sum of three parts: the idle power P_idle, the CPU power P_cpu and the GPU power P_gpu. The system power is expressed by formula ③.

$P = P_{idle}(f_{cpu}, f_{gpu}) + P_{cpu}(f_{cpu}, f_{gpu}) + P_{gpu}(f_{cpu}, f_{gpu})$ ③

Here, P_idle denotes the idle power, which depends on the CPU and GPU frequencies but not on the task mapping. It can be obtained from a run in which the application executes only on the CPU, by subtracting the measured CPU and GPU power from the measured total power, as given by formula ④.

The three quantities in formula ④ denote, respectively, the total system power, the CPU power and the GPU power obtained when the application executes only on the CPU; they depend on the configured frequencies.

While the application executes, the CPU and GPU are not always busy because of data dependences between kernel functions, so the CPU power and GPU power vary with the given task mapping scheme. Based on this observation, we assume that each device's power is proportional to the ratio of its busy time to the total execution time, so the CPU power and GPU power can be expressed by formulas ⑤ and ⑥, respectively.

Here, λ_cpu and λ_gpu denote the CPU and GPU busy-time ratios, respectively. The formulas also use the CPU power and GPU power measured when the application executes only on the GPU, and estimates of the maximum dynamic power of the CPU and of the GPU.

Because λ_cpu and λ_gpu are the CPU and GPU busy-time ratios, they can be obtained from the runtime information measured in Step 3. During program execution, the busy time of each device is defined as the sum of the actual execution times of the kernel functions it runs. λ_cpu and λ_gpu are expressed by formulas ⑦ and ⑧, respectively:

$\lambda_{cpu} = t_{cpu} / t_{total}$ ⑦

$\lambda_{gpu} = t_{gpu} / t_{total}$ ⑧

where t_cpu denotes the CPU busy time, t_gpu the GPU busy time, and t_total the total execution time of the application.