CN110134567B

CN110134567B - Microprocessor non-uniform sampling heat distribution reconstruction method based on convolution neural network

Info

Publication number: CN110134567B
Application number: CN201910358496.2A
Authority: CN
Inventors: 李鑫; 欧兴涛; 李智; 周巍; 段哲民
Original assignee: Northwestern Polytechnical University
Current assignee: Northwestern Polytechnical University
Priority date: 2019-04-30
Filing date: 2019-04-30
Publication date: 2023-03-14
Anticipated expiration: 2039-04-30
Also published as: CN110134567A

Abstract

The invention relates to a method for reconstructing the heat distribution of a microprocessor from non-uniformly sampled temperature data, based on a convolutional neural network. First, a sample temperature data set is built using an infrared-transparent oil-cooled heat dissipation system. Second, the models are trained in the chip design or post-silicon stage, and the trained networks are stored. Third, in the running stage of the processor, a classification network determines the class of the workload application program from the temperature data sampled by the thermal sensors. Finally, the corresponding heat distribution reconstruction network produces the temperature distribution of the whole chip. Because a workload classification network model and separate heat distribution reconstruction network models are designed, the heat distribution is reconstructed from the non-uniform temperature data sampled by a limited number of thermal sensors, and the recovered whole-chip temperature distribution is more accurate.

Description

Microprocessor non-uniform sampling heat distribution reconstruction method based on convolution neural network

Technical Field

The invention belongs to the technical field of chip temperature monitoring, and particularly relates to a method for reconstructing the heat distribution of a microprocessor from non-uniformly sampled temperature data, based on a convolutional neural network. First, a sample temperature data set is built using an infrared-transparent oil-cooled heat dissipation system. Second, the models are trained in the chip design or post-silicon stage, and the trained networks are stored. Third, in the running stage of the processor, a classification network determines the class of the workload application program from the temperature data sampled by the thermal sensors. Finally, the corresponding heat distribution reconstruction network produces the temperature distribution of the whole chip.

Background

In recent years, high-performance multi-core processors have commonly integrated on-chip thermal sensors for continuous thermal monitoring of the chip. Heat distribution reconstruction uses the observed thermal sensor readings to recover the temperature distribution of the whole chip, and is mainly applied in dynamic thermal management to achieve global temperature sensing. In practice, the number and placement of built-in thermal sensors are limited by manufacturing cost, design complexity, and similar constraints. When a hot spot occurs in an area where no thermal sensor is placed, global temperature sensing plays a critical role in preventing damage to a functional unit caused by the lack of temperature information for that area. In addition, the temperature information provided by a reconstructed whole-chip heat map is critical to fine-grained thermal management of multi-core processors, such as thermally driven spatial thread migration and dynamic voltage and frequency scaling. Moreover, run-time power estimates for each functional module computed from the whole-chip temperature distribution are more accurate than those of power simulation tools. Heat distribution reconstruction is therefore important for ensuring thermally safe operation of processor chips, especially in the dark silicon era.

Thermal distribution reconstruction techniques are mainly divided into those based on uniform sampling and those based on non-uniform sampling. Because thermal sensor placement is constrained, dynamic thermal management mainly relies on reconstruction from non-uniform samples to achieve global temperature sensing of the chip. The accuracy of the reconstruction largely determines the efficiency of dynamic thermal management: inaccurate temperature estimates may trigger false warnings and unnecessary responses, which reduces its reliability. Heat distribution reconstruction is generally implemented with interpolation techniques, but interpolation algorithms are poorly suited to real-time monitoring because of their heavy computation and long running time. How to reconstruct the thermal distribution quickly and accurately has gradually become a research focus in the field of on-chip temperature sensing.

Among existing thermal distribution reconstruction techniques, spectral techniques are considered among the most promising. Their basic premise is that the spatially varying chip temperature signal is treated like a time-varying signal, so that Nyquist-Shannon sampling theory and two-dimensional discrete signal processing can be applied to uniformly spaced thermal sensors to reconstruct the heat distribution; for non-uniformly spaced thermal sensors, a Voronoi diagram must first be constructed to convert the readings into uniformly spaced samples. Abdullah Nazma Nowroz, Ryan Cochran and Sherief Reda published the article "Thermal monitoring of real processors: techniques for sensor allocation and full characterization" in the Proceedings of the 47th Design Automation Conference in 2010, which uses the k-LSE method (a spectral technique based on the discrete cosine transform rather than the discrete Fourier transform) to reconstruct the thermal distribution. However, because the chip temperature signal is not band-limited, the method suffers from edge effects and is notably deficient in estimating hot-spot temperatures. Ruo-lin Wang, Xin Li, Wen-jiang Liu, Tao Liu, Meng-tie Rong and Liang Zhou published the article "Surface spline interpolation method for thermal reconstruction with limited sensor data of non-uniform placements" in the Journal of Shanghai Jiaotong University in 2014, proposing a non-uniform-sampling thermal distribution reconstruction method based on surface spline interpolation. The basic idea is to regard the temperature value of each data point on the chip as the height of that point, construct a continuous temperature surface by surface spline interpolation, and thereby reconstruct the temperature distribution of the whole chip. However, when the number of thermal sensors is limited, the hot-spot temperature reconstructed by this method shows obvious errors, and accurate temperature sensing cannot be achieved.

Disclosure of Invention

Technical problem to be solved

In order to avoid the defects of the prior art, the invention provides a microprocessor non-uniform sampling heat distribution reconstruction method based on a convolutional neural network.

Technical scheme

A microprocessor non-uniform sampling heat distribution reconstruction method based on a convolutional neural network is characterized by comprising the following steps:

step 1: collecting a sample temperature data set by using an infrared-transparent oil-cooled heat dissipation system, and selecting the SPEC CPU2006 standard performance evaluation benchmark as the test program;

step 2: scaling the feature range of the chip sample temperature data by the Min-Max normalization method, mapping it uniformly to [0,1]:

$$t'_{x,y} = \frac{t_{x,y} - \min(T)}{\max(T) - \min(T)}$$

wherein $T = (t_{x,y})_{H \times W}$ is the discretized heat map matrix of the processor, H and W represent the height and width of the discretized heat map, respectively, $t_{x,y}$ represents the temperature at coordinates (x, y) on the chip, $t'_{x,y}$ is the normalized temperature, and max(T) and min(T) are the maximum and minimum values of the heat map matrix T, respectively;

step 3: designing a workload classification network model, which comprises an input layer, four fully connected layers and an output layer;

the input data is a one-dimensional vector of size N × 1 obtained by normalizing the chip temperature data; the computations of the four fully connected layers are carried out in sequence, and the workload classification result is obtained through the SoftMax classifier of the output layer, the result corresponding to the 29 classes of benchmark test programs in SPEC CPU2006;

step 4: designing a separate heat distribution reconstruction network model for each of the 29 classes of benchmark test programs, each comprising an input layer, two fully connected layers, a dimension adjustment (Reshape) layer and three convolutional layers;

selecting the reconstruction network according to the result of step 3; after the operations of the two fully connected layers, the input data becomes a one-dimensional vector of dimension 3600; a dimension adjustment operation then converts the data to size 60 × 60, and three layers of convolution are applied to the 60 × 60 data to obtain the final reconstructed chip temperature distribution; the three convolutional layers have 64, 32 and 1 filters, respectively, all convolution kernels are 3 × 3, and all strides are 1;

step 5: training the networks; the Caffe framework is used as the platform for training and testing the convolutional neural network models, and the network model parameters are updated and optimized using the back-propagation algorithm and the stochastic gradient descent algorithm.

In the infrared-transparent oil-cooled heat dissipation system, two sapphire windows separated by a gap of 1 mm are placed on the microprocessor, each sapphire window being 4 mm thick; inorganic mineral oil is injected between the two sapphire windows at a flow rate of 2.5 gallons per minute, and a cooled mid-wave infrared imager, InfraTec 8300, is used to capture the temperature of the microprocessor.

The activation function of the fully connected layers in step 3 is the ReLU function.

The initial learning rate, weight decay and SGD momentum of the convolutional neural network model in step 5 are set to 0.0001, 0.0005 and 0.9, respectively, and the network model is trained for 500000 iterations.

Advantageous effects

The invention provides a method for reconstructing the heat distribution of a microprocessor from non-uniformly sampled temperature data, based on a convolutional neural network. First, a sample temperature data set is built using an infrared-transparent oil-cooled heat dissipation system. Second, the models are trained in the chip design or post-silicon stage, and the trained networks are stored. Third, in the running stage of the processor, a classification network determines the class of the workload application program from the temperature data sampled by the thermal sensors. Finally, the corresponding heat distribution reconstruction network produces the temperature distribution of the whole chip. Because a workload classification network model and separate heat distribution reconstruction network models are designed, the heat distribution is reconstructed from the non-uniform temperature data sampled by a limited number of thermal sensors, and the recovered whole-chip temperature distribution is more accurate.

Drawings

FIG. 1 is a basic block diagram of a convolutional neural network-based microprocessor thermal distribution reconstruction method according to the present invention

FIG. 2 is a diagram of a workload classification network model

FIG. 3 is a diagram of a structure of a heat distribution reconstruction network model

FIG. 4 is a graph of reconstructed root mean square error for the SPEC CPU2006 benchmark program

FIG. 5 is a graph of maximum error reconstructed for the SPEC CPU2006 benchmark

FIG. 6 is a raw heat map of the mcf benchmark program

FIG. 7 is a heat map reconstructed by the k-LSE method of the mcf benchmarking program

FIG. 8 is a reconstructed heat map of a surface spline interpolation method of an mcf benchmark test program

FIG. 9 is a convolution neural network method reconstruction heat map of the mcf benchmarking program

Detailed Description

The invention will now be further described with reference to the following examples and drawings:

The method of the invention achieves accurate heat distribution reconstruction by combining several network models. First, a sample temperature data set is built using infrared thermal measurement, with the SPEC CPU2006 standard performance evaluation benchmark suite (12 integer benchmarks and 17 floating-point benchmarks) as the workload; second, a classification network determines the class of the workload application program; finally, the corresponding reconstruction network reconstructs the temperature distribution of the chip. A total of 30 network models (1 classification network and 29 reconstruction networks) therefore need to be designed and trained.

The invention comprises the following steps:

step 1: building a sample temperature data set by using an infrared-transparent oil-cooled heat dissipation system, and selecting the SPEC CPU2006 standard performance evaluation benchmark as the test program;

step 2: scaling the feature range of the chip discrete temperature data by the Min-Max normalization method, mapping it uniformly to [0,1] to improve the convergence speed of the model:

$$t'_{x,y} = \frac{t_{x,y} - \min(T)}{\max(T) - \min(T)}$$

wherein $T = (t_{x,y})_{H \times W}$ is the discretized heat map matrix of the processor, $t_{x,y}$ represents the temperature at coordinates (x, y) on the chip, and max(T) and min(T) are the maximum and minimum values of the heat map matrix T, respectively;

step 3: designing a workload classification network model, which mainly comprises an input layer, four fully connected (FC for short) layers and an output layer;

step 4: designing a separate heat distribution reconstruction network model for each of the 29 classes of benchmark test programs, each mainly comprising an input layer, two fully connected layers, a dimension adjustment (Reshape) layer and three convolutional (Conv for short) layers;

selecting the reconstruction network according to the result of step 3; after the operations of the two fully connected layers, the input data becomes a one-dimensional vector of dimension 3600; a dimension adjustment operation then converts the data to size 60 × 60, and three layers of convolution are applied to the 60 × 60 data to obtain the final reconstructed chip temperature distribution. The three convolutional layers have 64, 32 and 1 filters, respectively, all convolution kernels are 3 × 3, and all strides are 1;

step 5: training the networks; the Caffe (Convolutional Architecture for Fast Feature Embedding) framework is used as the platform for training and testing the convolutional neural network models, and the Back-Propagation (BP) algorithm and the Stochastic Gradient Descent (SGD) algorithm are used to update and optimize the network model parameters. The initial learning rate, weight decay and SGD momentum are set to 0.0001, 0.0005 and 0.9, respectively, and the network model is trained for 500000 iterations.

The method of the invention comprises two phases. In the chip design or post-silicon phase (Design or Post-Silicon Phase), the models are trained and the trained networks are stored. In the running phase of the processor (Run Time Phase), based on the temperature data sampled by the thermal sensors, the classification network first determines the class of the workload application program, and the corresponding reconstruction network then reconstructs the temperature distribution of the chip.
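The run-time phase described above can be sketched as a simple dispatch: classify first, then call the reconstruction network stored for that class. This is only an illustrative sketch; the function name `reconstruct_heat_map`, the dictionary of per-class networks and the stub models are hypothetical stand-ins for the trained networks, which the source does not give in code form.

```python
from typing import Callable, Dict, List, Sequence

Vector = List[float]
HeatMap = List[List[float]]


def reconstruct_heat_map(
    normalized_readings: Sequence[float],
    classify: Callable[[Vector], int],
    reconstructors: Dict[int, Callable[[Vector], HeatMap]],
) -> HeatMap:
    """Run-time phase: pick the reconstruction network that matches the workload class."""
    x = list(normalized_readings)          # N x 1 vector of normalized sensor readings
    workload_id = classify(x)              # index of one of the benchmark classes
    return reconstructors[workload_id](x)  # network trained for that class only


# Hypothetical stand-ins for trained models (two classes instead of 29, for brevity).
def stub_classifier(x: Vector) -> int:
    return 0 if sum(x) / len(x) < 0.5 else 1


def make_stub_network(value: float) -> Callable[[Vector], HeatMap]:
    return lambda x: [[value] * 60 for _ in range(60)]


networks = {0: make_stub_network(0.2), 1: make_stub_network(0.8)}
heat_map = reconstruct_heat_map([0.9, 0.7, 0.8], stub_classifier, networks)
```

Keeping the reconstruction networks in a table keyed by class index mirrors the one-classifier-plus-29-reconstructors arrangement, at the memory cost the description discusses later.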

The basic framework of the microprocessor heat distribution reconstruction method based on the convolutional neural network is shown in fig. 1, and the specific implementation process is as follows:

1. Acquiring the real temperature distribution of the processor during operation by using the infrared-transparent oil-cooled heat dissipation system.

The infrared-transparent oil-cooled heat dissipation system of the present invention consists essentially of two sapphire windows separated by a gap of 1 mm, each sapphire window being about 4 mm thick. Sapphire is an infrared-transparent material that compensates for the Thermal Interface Material (TIM); a sapphire window on top of the die increases heat capacity and improves lateral heat diffusion, while inorganic mineral oil (Sigma M3156) flows between the two sapphire windows to ensure heat dissipation. Inorganic mineral oil is highly transparent to the infrared spectrum and has a large specific heat capacity and relatively high thermal conductivity, which makes it a suitable coolant. Two variable-speed direct-current pumps connected in series serve as the power system of the experimental platform; inorganic mineral oil is injected between the two sapphire windows at a flow rate of 2.5 gallons per minute, with the flow velocity fixed at 8 meters per second. To ensure that the circulating mineral oil carries heat away, the temperature of the oil-cooling thermostat is set to 10 ℃. The invention uses a cooled mid-wave infrared imager, InfraTec 8300, with an operating wavelength of 3.7-4.8 μm; the silicon substrate is partially transparent to infrared radiation in this band (transmittance about 55%). The thermal image resolution is set to 640 × 512 and the sampling interval to 17 ms, so that temperature changes during program execution are captured accurately.

The test processor selected in the present invention was AMD Athlon II X4 610e, with an operating frequency of 2.4GHz. All 29 applications in the standard performance evaluation benchmark SPEC CPU2006 were subjected to dynamic temperature extraction on the processor using the thermal modeling platform described above to obtain real-time, accurate sample temperature data.

The heatmap samples for each SPEC CPU2006 benchmark in the sample temperature database were 3000 (87000 heatmap samples in total for 29 benchmarks), of which 2700 were randomly chosen as the training dataset and the remaining 300 as the testing dataset. The resolution of each heat map sample is set to 60 × 60, i.e., containing 3600 pixels.
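The random 2700/300 split per benchmark can be sketched as follows. The source only states that the training samples are chosen at random; the seeding scheme and the function name `split_samples` are assumptions made for reproducibility of the illustration.

```python
import random


def split_samples(num_samples=3000, num_train=2700, seed=0):
    """Randomly split sample indices of one benchmark into training and test sets."""
    rng = random.Random(seed)        # fixed seed is an assumption, for repeatability
    indices = list(range(num_samples))
    rng.shuffle(indices)
    train = sorted(indices[:num_train])
    test = sorted(indices[num_train:])
    return train, test


# One split per SPEC CPU2006 benchmark (29 in total).
splits = {benchmark: split_samples(seed=benchmark) for benchmark in range(29)}
total_train = sum(len(train) for train, _ in splits.values())
total_test = sum(len(test) for _, test in splits.values())
```

With 29 benchmarks this yields 78300 training and 8700 test heat maps, which together make up the 87000 samples mentioned above.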

2. Normalizing the input temperature data.

The feature range of the chip discrete temperature data is scaled by the Min-Max normalization method and mapped uniformly to [0,1]. Assume the discretized heat map matrix of the processor is $T = (t_{x,y})_{H \times W}$, where $t_{x,y}$ represents the temperature at coordinates (x, y) on the chip, H and W represent the height and width of the discretized heat map (i.e., the resolution of the heat map), respectively, and $0 \le x \le H-1$, $0 \le y \le W-1$. The chip discrete temperature is then normalized by Min-Max as:

$$t'_{x,y} = \frac{t_{x,y} - \min(T)}{\max(T) - \min(T)} \tag{1}$$

where max(T) and min(T) represent the maximum and minimum values of the heat map matrix T, respectively. The temperature values of the normalized heat map matrix $T' = (t'_{x,y})_{H \times W}$ lie in [0,1].
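The Min-Max normalization described above is small enough to show in full. The sketch below assumes the heat map is held as a list of rows; the function name and the handling of a constant heat map (raising an error) are illustrative choices, not part of the source.

```python
def min_max_normalize(heat_map):
    """Map every temperature of an H x W heat map into [0, 1]."""
    flat = [t for row in heat_map for t in row]
    t_min, t_max = min(flat), max(flat)
    if t_max == t_min:
        # The source does not say how a constant map is handled; refusing is an assumption.
        raise ValueError("heat map is constant; Min-Max normalization is undefined")
    scale = t_max - t_min
    return [[(t - t_min) / scale for t in row] for row in heat_map]


example = [[40.0, 45.0], [50.0, 60.0]]   # temperatures in degrees Celsius
normalized = min_max_normalize(example)  # [[0.0, 0.25], [0.5, 1.0]]
```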

3. Designing a workload classification network model to determine the source of the thermal sensor temperature samples.

The structure of the workload classification network model is shown in fig. 2 and its framework in table 1. It mainly comprises one input layer, four fully connected layers and one output layer. After the sampled temperature data of the N on-chip thermal sensors is obtained, it is first normalized, and the normalized sensor readings are then arranged into a one-dimensional vector of size N × 1 as the input data. After passing through the four fully connected layers in sequence, the input data becomes a one-dimensional vector of dimension 720. The activation function in the fully connected layers is the ReLU function, described mathematically by formula (2):

$$f(x) = \max(0, x) \tag{2}$$

Compared with the Sigmoid function, the ReLU function helps the stochastic gradient descent algorithm converge, with a convergence speed about 6 times faster. Finally, the workload classification result is obtained through the SoftMax classifier of the output layer; it corresponds to one of the 29 classes of benchmark test programs in SPEC CPU2006, i.e., it identifies which SPEC CPU2006 benchmark test program the thermal sensor temperature samples come from.

TABLE 1 workload classification network framework
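A forward pass through a classifier of this shape can be sketched in plain Python. Only the number of layers, the ReLU activation, the 720-dimensional output of the fourth fully connected layer and the 29-way SoftMax come from the description; the first three layer widths (64 each), the extra linear map from 720 to 29 scores, and the random untrained weights are assumptions made for illustration.

```python
import math
import random


def relu(v):
    return [max(0.0, x) for x in v]


def softmax(v):
    m = max(v)                                   # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in v]
    total = sum(exps)
    return [e / total for e in exps]


def dense(v, weights, bias):
    return [sum(w * x for w, x in zip(row, v)) + b for row, b in zip(weights, bias)]


def build_classifier(n_sensors, hidden=(64, 64, 64, 720), n_classes=29, seed=0):
    """Untrained weights drawn from a Gaussian; hidden widths other than 720 are assumed."""
    rng = random.Random(seed)
    sizes = [n_sensors, *hidden, n_classes]
    layers = []
    for n_in, n_out in zip(sizes, sizes[1:]):
        weights = [[rng.gauss(0.0, 0.01) for _ in range(n_in)] for _ in range(n_out)]
        layers.append((weights, [0.0] * n_out))
    return layers


def classify(x, layers):
    h = list(x)
    for weights, bias in layers[:-1]:            # four fully connected layers with ReLU
        h = relu(dense(h, weights, bias))
    probs = softmax(dense(h, *layers[-1]))       # SoftMax over the 29 workload classes
    return probs.index(max(probs)), probs


sensor_vector = [random.Random(1).random() for _ in range(36)]  # 36 normalized readings
label, probs = classify(sensor_vector, build_classifier(n_sensors=36))
```

With untrained weights the predicted label is meaningless; the sketch only shows how an N × 1 input becomes a probability vector over 29 classes.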

4. Designing a thermal distribution reconstruction network for each of the 29 classes of benchmark programs.

The structure of the heat distribution reconstruction network model is shown in fig. 3 and its framework in table 2. It mainly comprises one input layer, two fully connected layers, one dimension adjustment (Reshape) layer and three convolutional layers. The input data and normalization for the reconstruction network are the same as for the classification network. After passing through the two fully connected layers in sequence, the input data becomes a one-dimensional vector of dimension 3600. Because the output of the reconstruction network is the reconstructed heat map of the chip (a two-dimensional matrix), and in order to facilitate convolution on the data, reduce the number of network model parameters and lower the computational complexity, a Reshape operation is added after the two fully connected layers, converting the data to size 60 × 60. Finally, three layers of convolution are applied to the 60 × 60 data to obtain the final reconstructed chip temperature distribution. The three convolutional layers have 64, 32 and 1 filters, respectively, all convolution kernels are 3 × 3, and all strides are 1. To keep the convolved feature maps the same size as the input data, the input is also zero-padded, i.e., a border whose elements are all 0 is added (pad size set to 1).

TABLE 2 thermal distribution reconstruction network framework
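The shape bookkeeping of the reconstruction network can be checked with a few lines of arithmetic. The sketch below reshapes a 3600-element vector into 60 × 60 and applies the standard convolution output-size formula with the stated 3 × 3 kernels, stride 1 and pad 1. The parameter counts assume one bias per filter, which the source does not state.

```python
def reshape_to_map(vector, height=60, width=60):
    """Row-major reshape of a flat vector into a height x width map."""
    if len(vector) != height * width:
        raise ValueError("vector length does not match the target shape")
    return [vector[r * width:(r + 1) * width] for r in range(height)]


def conv_output_size(size, kernel=3, stride=1, pad=1):
    return (size + 2 * pad - kernel) // stride + 1


def conv_param_count(in_channels, out_channels, kernel=3):
    # Weights plus one bias per filter (the bias term is an assumption).
    return kernel * kernel * in_channels * out_channels + out_channels


feature_map = reshape_to_map(list(range(3600)))

size = 60
channels = [1, 64, 32, 1]        # input map, then 64, 32 and 1 filters
shapes, params = [], []
for c_in, c_out in zip(channels, channels[1:]):
    size = conv_output_size(size)
    shapes.append((c_out, size, size))
    params.append(conv_param_count(c_in, c_out))
# shapes -> [(64, 60, 60), (32, 60, 60), (1, 60, 60)]
# params -> [640, 18464, 289]
```

The spatial size stays at 60 × 60 through all three layers, so the single-filter output of the last layer is directly the reconstructed heat map.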

5. Building the experimental platform and training the networks.

The Caffe framework is used as the platform for training and testing the convolutional neural network models. In the training stage, the training parameters are initialized from a Gaussian distribution, and the network model parameters are updated and optimized using the BP algorithm and the SGD algorithm. The initial learning rate, weight decay and SGD momentum are set to 0.0001, 0.0005 and 0.9, respectively, and the network model is trained for 500000 iterations.
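To make the role of the three hyperparameters concrete, the following sketch applies one common form of the SGD update with momentum and L2 weight decay to a single scalar weight. It is an illustration of the update rule only, not the Caffe implementation; the toy loss `(w - 3)^2`, the constant learning rate and the function name are assumptions.

```python
def sgd_momentum_step(weight, grad, velocity,
                      lr=0.0001, momentum=0.9, weight_decay=0.0005):
    """One SGD update: weight decay is added to the gradient, momentum smooths the step."""
    regularized_grad = grad + weight_decay * weight
    velocity = momentum * velocity - lr * regularized_grad
    return weight + velocity, velocity


def toy_loss(w):
    return (w - 3.0) ** 2        # hypothetical loss with its minimum at w = 3


w, v = 0.0, 0.0
initial_loss = toy_loss(w)
first_w, first_v = sgd_momentum_step(w, 2.0 * (w - 3.0), v)   # first step moves w to 0.0006

for _ in range(1000):
    w, v = sgd_momentum_step(w, 2.0 * (w - 3.0), v)
final_loss = toy_loss(w)
```

With a learning rate as small as 0.0001 each step is tiny, which is consistent with the large number of training iterations quoted above.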

In this embodiment, five groups of randomly placed thermal sensors are used (9, 16, 25, 36 and 49 sensors, respectively), and the convolutional neural network based (CNN-based) thermal distribution reconstruction method is compared with the surface spline (Surface Spline) interpolation algorithm and the k-LSE method in terms of Root-Mean-Square Error (RMSE), maximum error, and the like. Assume that M heat map samples are used for testing; the root mean square error and the maximum error are evaluated between each heat map T and its reconstructed result. Because local temperature peaks on the chip can cause Thermal Runaway, the maximum error is an important index of thermal sensing performance.

The average classification accuracy and average reconstruction results for all benchmark test programs in SPEC CPU2006 are shown in tables 3 and 4, respectively. The classification accuracy is the ratio of the number of correctly classified samples to the total number of samples. As can be seen from table 3, the workload classification accuracy is higher than 95% for every number of thermal sensors; in addition, as can be seen from table 4, the method of the present invention performs significantly better than the other two reconstruction methods for every number of thermal sensors. Figs. 4 and 5 give a visual comparison of the reconstruction accuracy for each benchmark program using 36 thermal sensors. It can be seen from figs. 4 and 5 that, with the method of the present invention, the root mean square error and the maximum error of all benchmark test programs are limited to 0.2 ℃ and 2 ℃, respectively.

It is worth noting that the superior performance of the convolutional neural network based thermal distribution reconstruction method comes at the cost of memory, since the trained network models must be stored in memory. In this example, storing the classification network model and a single reconstruction network model requires approximately 3206 kilobytes (KB) and 5208 kilobytes (KB), respectively, so storing all 30 network models (1 classification network and 29 reconstruction networks) requires 154238 kilobytes (KB) in total.

The raw heat map of the mcf benchmark program is shown in fig. 6, and the thermal distributions of the mcf benchmark program reconstructed by the three methods described above, using 36 thermal sensors, are shown in figs. 7-9.
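The two error measures can be sketched as follows. The exact definitions used in the evaluation are not reproduced in the text, so this sketch makes explicit assumptions: RMSE is taken over all pixels of one heat map, the maximum error is the largest absolute pixel difference, and both are averaged over the M test samples. All function names are illustrative.

```python
import math


def rmse(original, reconstructed):
    """Root-mean-square error over all pixels of one heat map."""
    squared = [(a - b) ** 2
               for row_o, row_r in zip(original, reconstructed)
               for a, b in zip(row_o, row_r)]
    return math.sqrt(sum(squared) / len(squared))


def max_error(original, reconstructed):
    """Largest absolute pixel difference of one heat map."""
    return max(abs(a - b)
               for row_o, row_r in zip(original, reconstructed)
               for a, b in zip(row_o, row_r))


def evaluate(pairs):
    """Average both measures over M (original, reconstructed) pairs (averaging is assumed)."""
    m = len(pairs)
    mean_rmse = sum(rmse(o, r) for o, r in pairs) / m
    mean_max = sum(max_error(o, r) for o, r in pairs) / m
    return mean_rmse, mean_max


original = [[50.0, 60.0], [70.0, 80.0]]
reconstructed = [[51.0, 60.0], [70.0, 78.0]]
mean_rmse, mean_max = evaluate([(original, reconstructed)])
```

For this toy pair the squared differences are 1, 0, 0 and 4, so the RMSE is the square root of 1.25 and the maximum error is 2.0.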

TABLE 3 average Classification accuracy for different thermal sensor counts

TABLE 4 average reconstruction results for different thermal sensor counts

Claims

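1. A microprocessor non-uniform sampling heat distribution reconstruction method based on a convolutional neural network, characterized by comprising the following steps:

step 1: collecting a sample temperature data set by using an infrared-transparent oil-cooled heat dissipation system, and selecting the SPEC CPU2006 standard performance evaluation benchmark as the test program;

step 2: scaling the feature range of the chip sample temperature data by the Min-Max normalization method, mapping it uniformly to [0,1]:

$$t'_{x,y} = \frac{t_{x,y} - \min(T)}{\max(T) - \min(T)}$$

wherein $T = (t_{x,y})_{H \times W}$ is the discretized heat map matrix of the processor, H and W represent the height and width of the discretized heat map, respectively, $t_{x,y}$ represents the temperature at coordinates (x, y) on the chip, $t'_{x,y}$ is the normalized temperature, and max(T) and min(T) are the maximum and minimum values of the heat map matrix T, respectively;

step 3: designing a workload classification network model, which comprises an input layer, four fully connected layers and an output layer;

the input data is a one-dimensional vector of size N × 1 obtained by normalizing the chip temperature data; the operations of the four fully connected layers are carried out in sequence, and the workload classification result is obtained through the SoftMax classifier of the output layer, the result corresponding to the 29 classes of benchmark test programs in SPEC CPU2006;

step 4: designing a separate heat distribution reconstruction network model for each of the 29 classes of benchmark test programs, each comprising an input layer, two fully connected layers, a dimension adjustment layer and three convolutional layers;

selecting the reconstruction network according to the result of step 3; after the operations of the two fully connected layers, the input data becomes a one-dimensional vector of dimension 3600; a dimension adjustment operation then converts the data to size 60 × 60, and three layers of convolution are applied to the 60 × 60 data to obtain the final reconstructed chip temperature distribution; the three convolutional layers have 64, 32 and 1 filters, respectively, all convolution kernels are 3 × 3, and all strides are 1;

step 5: training the networks; the Caffe framework is used as the platform for training and testing the convolutional neural network models, and the network model parameters are updated and optimized using the back-propagation algorithm and the stochastic gradient descent algorithm.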

2. The method of claim 1, wherein in the infrared-transparent oil-cooled heat dissipation system two sapphire windows separated by a gap of 1 mm are placed on the microprocessor, each sapphire window having a thickness of 4 mm, inorganic mineral oil is injected between the two sapphire windows at a flow rate of 2.5 gallons per minute, and the temperature of the microprocessor is captured by a cooled mid-wave infrared imager.

3. The convolutional neural network-based microprocessor non-uniform sampling heat distribution reconstruction method according to claim 1, wherein the activation function of the fully connected layers in step 3 is the ReLU function.

4. The method as claimed in claim 1, wherein the initial learning rate, the weight decay and the SGD momentum of the convolutional neural network model in step 5 are set to 0.0001, 0.0005 and 0.9, respectively, and the network model is trained for 500000 iterations.