CN103279446A - Heterogeneous hybrid computing multi-platform system using central processing unit (CPU) + graphics processing unit (GPU) + Many Integrated Core (MIC)


Publication number
CN103279446A
Authority
CN
China
Prior art keywords
gpu
cpu
mic
performance
card
Prior art date
Legal status
Pending
Application number
CN 201310229342
Other languages
Chinese (zh)
Inventor
张清
张广勇
Current Assignee
Inspur Electronic Information Industry Co Ltd
Original Assignee
Inspur Electronic Information Industry Co Ltd
Priority date
Filing date
Publication date
Application filed by Inspur Electronic Information Industry Co Ltd
Priority to CN 201310229342
Publication of CN103279446A
Legal status: Pending

Landscapes

  • Multi Processors (AREA)

Abstract

The invention relates to a multi-platform system for heterogeneous hybrid computing that combines a central processing unit (CPU), a graphics processing unit (GPU) and a Many Integrated Core (MIC) coprocessor. The system comprises a CPU platform and a connector. The CPU platform includes a CPU chip, at least one GPU card and at least one Intel MIC card. The connector attaches the GPU card and the MIC card to the CPU platform. The heterogeneous system raises system performance and computing density, meets the requirements of different high-performance applications, and addresses the low system performance and low software productivity encountered in high-performance computing applications. The heterogeneous information-processing system is built from the CPU chip, a GPU chip and a MIC chip. The preferred configuration is a high-computing-density system of the currently popular kind, with dual-socket CPU chips, two GPU chips and two MIC chips, which effectively improves system performance and meets the requirements of high-performance applications.

Description

A multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing
Technical field
The present invention relates to the field of computer technology, and specifically to a multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing.
Background technology
High-performance computing is a leading-edge technology in the information field. It directly supports national security, progress in defense science and technology, and the development of advanced weapons, and it is one of the important indicators of a nation's overall strength. With the rapid development of the information society, the demands placed on information-processing capability keep rising. High-performance computing is needed not only in oil exploration, weather forecasting, aerospace and defense, and scientific research; demand is also growing rapidly in a wider range of fields such as finance, e-government, education, enterprise and online gaming.
Computing speed is particularly important for high-performance computing, which is moving toward multi-core and many-core processors and uses heterogeneous parallelism to raise application speed. CPU+GPU is currently a very mature heterogeneous cooperative computing model, well suited to highly parallel applications and algorithms such as computational fluid dynamics and FFT computation, but GPUs pose major challenges in programming efficiency, fine-grained parallel algorithm design and large-scale parallel performance. With the formal release of Intel MIC (Intel Many Integrated Core), CPU+MIC becomes a good option for high-performance computing: the architecture raises application performance while greatly improving programming efficiency, and MIC combined with the CPU can resolve more application performance bottlenecks. Its performance is nevertheless challenged by applications with a low degree of vectorization and by memory-intensive applications. A CPU+GPU+MIC heterogeneous hybrid computing multi-platform structure combines the advantages of the CPU+GPU and CPU+MIC heterogeneous models and can better satisfy the computing-performance demands of different applications.
Summary of the invention
The purpose of the invention is to provide a multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing.
The purpose of the invention is achieved as follows. The system comprises:
A central processing unit (CPU) platform comprising a CPU chip; at least one GPU card; at least one Many Integrated Core (MIC) card; and a connector for connecting the GPU card and the MIC card to the CPU platform, the connector being a PCIe slot. The memory configuration of the system is not less than 128 GB, and the supported peak power is not less than 1800 W. The operating system, compilers and drivers of the CPU platform all support the GPU and the MIC; the operating system is Linux, and the compilers are Intel's icc, icpc and ifort and NVIDIA's nvcc. The system further comprises 2 CPU chips, 2 GPU cards and 2 MIC cards; each CPU chip comprises 8 cores, each GPU card comprises 512 GPU cores, and each MIC card comprises at least 50 cores.
The system comprises: a first execution unit, whose processor is implemented by the 2 CPU chips and which performs information processing; and second and third execution units, both connected to the first execution unit, whose processors are implemented by the 2 GPU cards and the 2 MIC cards respectively and which perform information processing in parallel with the first execution unit.
The first, second and third execution units perform information processing using multithreading.
The first, second and third execution units perform information processing on the principle of load balancing.
The first execution unit starts 16 threads to perform information processing, the second execution unit starts at least 10,000 GPU threads, and the third execution unit starts at least 200 threads.
The beneficial effects of the invention are as follows. The technical problem to be solved is to provide a multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing that addresses the low system performance and low software productivity of high-performance computing applications. The heterogeneous information-processing system of the invention consists of CPU, GPU and MIC chips. It preferably adopts the currently popular high-computing-density configuration of dual-socket CPU chips, 2 GPU chips and 2 MIC chips, which effectively improves system performance and meets the requirements of high-performance applications.
Description of drawings
Fig. 1 is a schematic diagram of the module structure of Embodiment 1 of the multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing;
Fig. 2 is a schematic diagram of the module structure of Embodiment 2 of the multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing;
Fig. 3 shows the result of running PSTM serially;
Fig. 4 shows the result of running PSTM on the system.
Embodiment
The system of the invention is explained below with reference to the accompanying drawings.
Embodiment 1
The invention is a multi-platform system based on CPU+GPU+MIC heterogeneous hybrid computing. As shown in Fig. 1, the system comprises:
A central processing unit (CPU) platform, which comprises a CPU chip;
At least one GPU card;
At least one Many Integrated Core (MIC) card;
A connector for connecting the MIC card and the GPU card to the CPU platform.
Specifically, the connector is a PCIe slot.
The GPU is a Fermi-architecture GPU developed by NVIDIA, and the MIC is a many-core chip developed by Intel for high-performance parallel computing. Both the GPU and the MIC provide highly parallel computing capability, and the double-precision peak performance of each reaches more than 1 TFlops. The CPU+GPU+MIC hybrid heterogeneous approach combines the advantages of the three platforms, adapts to and accelerates different high-performance applications, speeds up the development of high-performance computing, and quickly resolves the performance bottlenecks of high-performance computing applications.
The system targets high-performance computing applications and adopts a CPU+GPU+MIC heterogeneous architecture that merges the multi-core computing capability of the CPU platform with the many-core computing capability of the GPU and the MIC. It makes full use of all three kinds of chip so that all of them take part in the computation, which greatly strengthens the computing capability of the system and resolves the performance bottleneck of high-performance computing applications. The system is therefore a high-performance system that meets the demands of different applications and can accelerate them. It is also a low-energy, high-density system: its performance per watt is far higher than that of a homogeneous CPU platform, so the whole system saves energy and reduces machine-room space while achieving high performance. Overall, it is a high-efficiency, high-density system.
The memory configuration of the system is 128 GB or more, and the supported peak power is 1800 W or more.
The operating system, compilers and drivers of the CPU platform all support MIC.
The operating system is Linux, and the compilers are Intel's icc, icpc and ifort and NVIDIA's nvcc.
Preferably, the system comprises 2 CPU chips, 2 GPU cards and 2 MIC cards; each CPU chip comprises 8 cores, each GPU card comprises 512 GPU cores, and each MIC card comprises more than 50 cores.
To make the purpose, technical solution and advantages of the invention clearer, the invention is described in detail below with reference to the drawings and embodiments.
The invention is based on a CPU+GPU+MIC heterogeneous hybrid architecture and achieves high performance, high computing density, low power consumption and broad application adaptability. It is described below in two parts, hardware and system environment configuration:
Hardware components:
The CPU platform is dual-socket and supports 2 CPUs working simultaneously. This implementation uses 2 Intel E5-2680 8-core CPUs with a clock frequency of 2.7 GHz.
The system has 4 or more PCIe slots and can take 2 GPU cards and 2 MIC cards. This system uses 2 MIC cards, each with more than 50 cores.
The system needs a large memory configuration, at least twice that of the original CPU platform. This system is configured with 128 GB or more of memory.
The system power supply supports 1800 W or more so that the whole system runs normally; the peak power supported by this system is 1800 W.
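The hardware minimums listed above can be collected into a small configuration check. The following is a minimal sketch, not part of the patent: the `NodeConfig` type and the `config_problems` function are hypothetical names introduced only for illustration, and the thresholds are the ones stated in the text (at least one GPU card and one MIC card, one PCIe slot per card, 128 GB of memory, 1800 W of peak power).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeConfig:
    """Hypothetical description of one heterogeneous node."""
    cpu_sockets: int
    gpu_cards: int
    mic_cards: int
    pcie_slots: int
    memory_gb: int
    psu_peak_watts: int


def config_problems(cfg: NodeConfig) -> list[str]:
    """Return the stated minimum requirements that cfg fails to meet."""
    problems = []
    if cfg.cpu_sockets < 1:
        problems.append("at least one CPU chip is required")
    if cfg.gpu_cards < 1:
        problems.append("at least one GPU card is required")
    if cfg.mic_cards < 1:
        problems.append("at least one MIC card is required")
    # Each accelerator card occupies one PCIe slot.
    if cfg.pcie_slots < cfg.gpu_cards + cfg.mic_cards:
        problems.append("not enough PCIe slots for all GPU and MIC cards")
    if cfg.memory_gb < 128:
        problems.append("memory must be at least 128 GB")
    if cfg.psu_peak_watts < 1800:
        problems.append("peak power support must be at least 1800 W")
    return problems


# The preferred configuration from the text: 2 CPUs, 2 GPU cards, 2 MIC cards.
preferred = NodeConfig(cpu_sockets=2, gpu_cards=2, mic_cards=2,
                       pcie_slots=4, memory_gb=128, psu_peak_watts=1800)
```

Calling `config_problems(preferred)` returns an empty list, while a node with too few slots or too little memory yields one message per unmet requirement.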
System environment configuration:
The operating system must support MIC, which requires installing a Linux operating system. This implementation uses Red Hat Enterprise Linux 6.0 GA 64-bit, kernel 2.6.32-71;
The compilers must support the GPU and the MIC; Intel's icc, icpc and ifort and NVIDIA's nvcc compiler can be used;
Drivers supporting the GPU and the MIC must be installed.
Embodiment 2
For the system to be efficient, hardware and software must be designed together so that application software runs on the system as efficiently as possible.
Accordingly, the multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing can also be described from the following angle. As shown in Fig. 2, the system comprises:
A first execution unit, whose processor is implemented by 2 CPU chips and which performs information processing;
Second and third execution units, both connected to the first execution unit, whose processors are implemented by 2 GPU cards and 2 MIC cards respectively and which perform information processing in parallel with the first execution unit;
Specifically, the first, second and third execution units perform information processing using multithreading and on the principle of load balancing.
The first execution unit starts 16 threads to perform information processing; each of the 2 GPU cards of the second execution unit starts thousands or tens of thousands of lightweight GPU threads; and each of the 2 MIC cards of the third execution unit starts 200 or more threads.
Preferably, each CPU chip comprises at least 8 cores, with each core running one thread; each GPU card comprises 512 GPU cores; and each MIC card comprises at least 50 cores, with each core able to run 4 threads.
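The thread counts quoted above follow directly from the core counts. The sketch below reproduces that arithmetic; the function names are hypothetical, and the GPU block size of 256 threads is an assumption chosen only to show how a launch of at least 10,000 GPU threads could be sized (the text does not give a block size).

```python
def cpu_threads(sockets: int = 2, cores_per_socket: int = 8,
                threads_per_core: int = 1) -> int:
    """Threads on the first execution unit: one thread per CPU core."""
    return sockets * cores_per_socket * threads_per_core


def mic_threads_per_card(cores: int = 50, threads_per_core: int = 4) -> int:
    """Threads on one MIC card: up to 4 threads per core."""
    return cores * threads_per_core


def gpu_blocks_for(min_threads: int, threads_per_block: int = 256) -> int:
    """Smallest block count whose total thread count reaches min_threads.

    threads_per_block=256 is an assumed value, not taken from the text.
    """
    return -(-min_threads // threads_per_block)  # ceiling division


blocks = gpu_blocks_for(10_000)          # 40 blocks
gpu_threads_per_card = blocks * 256      # 10240 threads, at least 10,000
```

With the defaults, `cpu_threads()` gives 16 and `mic_threads_per_card()` gives 200, matching the figures in the text.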
Mainstream servers today are dual-socket, that is, they hold 2 CPUs. A current Sandy Bridge CPU provides 40 PCIe lanes, so 2 CPUs provide 80 lanes. With 2 GPU cards and 2 MIC cards in the PCIe slots, PCIe is used most efficiently, and data transfer between the CPU and the GPU and between the CPU and the MIC performs best.
To test the performance of the system, a high-performance computing application can be selected whose algorithm has highly parallel tasks, no data dependences between the parallel tasks, good parallelism, and high demands on system performance. Seismic prestack time migration (PSTM) has exactly these characteristics. Taking this application as an example, the process of improving on an existing CPU platform that runs it single-threaded is described below:
The original PSTM program runs single-threaded on the CPU platform. First, using the multi-core CPU platform, it is reimplemented with multiple threads using the OpenMP programming model, so that all computing tasks run in parallel on 16 threads and the computing power of all cores of the 2 CPUs is brought into play;
The computing capability of the whole system is divided into 5 devices: the first GPU card, as device 0, starts tens of thousands of GPU threads; the second GPU card, as device 1, starts tens of thousands of GPU threads; the first MIC card, as device 2, starts 200 or more threads; the second MIC card, as device 3, starts 200 or more threads; and the 2 CPUs, as device 4, start 16 threads, as shown in Fig. 2;
The computing tasks of the whole PSTM are divided according to the computing capability of these five devices so that the five devices compute in parallel at the same time. All 5 devices take part in the computation, so the CPU, GPU and MIC compute simultaneously with the load kept balanced, and the whole system achieves high performance.
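Dividing the work according to each device's computing capability can be sketched as a proportional split. The text does not say how the shares are computed, so the following is one possible approach, not the patented method: it uses a largest-remainder split over integer weights. The weights `[4, 4, 3, 3, 1]` are invented for illustration and are not measured capabilities; the function names are hypothetical.

```python
def partition_by_weight(total_items: int, weights: list[int]) -> list[int]:
    """Split total_items across devices in proportion to integer weights.

    Uses the largest-remainder method so the shares always sum to
    total_items.
    """
    total_weight = sum(weights)
    shares = [total_items * w // total_weight for w in weights]
    remainders = [total_items * w % total_weight for w in weights]
    leftover = total_items - sum(shares)
    # Give the leftover items to the devices with the largest remainders,
    # breaking ties by device index.
    order = sorted(range(len(weights)), key=lambda i: (-remainders[i], i))
    for i in order[:leftover]:
        shares[i] += 1
    return shares


def to_ranges(shares: list[int]) -> list[tuple[int, int]]:
    """Turn per-device counts into contiguous half-open index ranges."""
    ranges, start = [], 0
    for count in shares:
        ranges.append((start, start + count))
        start += count
    return ranges


# Devices 0-1 are the GPU cards, 2-3 the MIC cards, 4 the two CPUs.
# Hypothetical relative capabilities, for illustration only.
weights = [4, 4, 3, 3, 1]
shares = partition_by_weight(91, weights)   # 91 survey lines in the test case
ranges = to_ranges(shares)
```

With these illustrative weights, the 91 survey lines are split as 25, 24, 18, 18 and 6. In practice the weights would be measured per device, for example from a short timing run, so that all five devices finish at about the same time.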
Specifically, take as an example a test with 91 survey lines, 963 common midpoint (CMP) points on each survey line, and 110,000 input traces for migration imaging. On the original homogeneous CPU system, PSTM takes 76053 s in single-threaded serial mode, whereas the running time on this system is 537 s, a large performance improvement. The imaging result of the serial CPU version of PSTM is shown in Fig. 3, and the imaging result of running on this system is shown in Fig. 4, where the horizontal axis is the common midpoint of a given survey line and the vertical axis is time. The two images are essentially identical, which shows that the computed result is correct.
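The size of the improvement can be made explicit from the two running times given above. The sketch below only restates the reported figures; the variable names are hypothetical.

```python
survey_lines = 91
cmp_points_per_line = 963
serial_seconds = 76053   # single-threaded CPU run reported in the text
hetero_seconds = 537     # CPU+GPU+MIC run reported in the text

# Total number of CMP points imaged in the test.
total_cmp_points = survey_lines * cmp_points_per_line

# Speedup of the heterogeneous system over the serial CPU run.
speedup = serial_seconds / hetero_seconds

print(f"CMP points: {total_cmp_points}")
print(f"speedup: {speedup:.1f}x")
```

This gives 87,633 CMP points and a speedup of roughly 141.6 times over the single-threaded run. Note that the baseline is a serial run on one core, not the 16-thread OpenMP version, so the figure combines the gains from CPU multithreading with those from the GPU and MIC cards.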
The system of the invention features high performance, low power consumption, high computing density and broad application adaptability. It resolves the performance bottlenecks and power-consumption problems of high-performance applications, meets the needs of production and scientific research, and reduces machine-room construction costs as well as management, operation and maintenance costs. In the invention, the CPU takes part not only in logic computation but also in the compute-intensive kernels, while the GPU and the MIC take part only in the compute-intensive kernels. CPU+GPU+MIC heterogeneous hybrid computing thus maximizes performance.
The seismic prestack time migration embodiment shows that the whole system achieves high performance, low power consumption and high computing density and largely satisfies the scientific-research and industrial-production requirements of high-performance applications. The system also reduces machine-room construction costs and management, operation and maintenance costs.
Apart from the technical features described in the specification, everything else is known to those skilled in the art.

Claims (5)

1. A multi-platform system for CPU+GPU+MIC heterogeneous hybrid computing, characterized in that the system comprises:
A central processing unit (CPU) platform comprising a CPU chip; at least one GPU card; at least one Many Integrated Core (MIC) card; and a connector for connecting the GPU card and the MIC card to the CPU platform, the connector being a PCIe slot; the memory configuration of the system is not less than 128 GB, and the supported peak power is not less than 1800 W; the operating system, compilers and drivers of the CPU platform all support the GPU and the MIC, the operating system is Linux, and the compilers are Intel's icc, icpc and ifort and NVIDIA's nvcc; the system further comprises 2 CPU chips, 2 GPU cards and 2 MIC cards, each CPU chip comprising 8 cores, each GPU card comprising 512 GPU cores, and each MIC card comprising at least 50 cores.
2. The multi-platform system according to claim 1, characterized in that the system comprises: a first execution unit, whose processor is implemented by the 2 CPU chips and which performs information processing; and second and third execution units, both connected to the first execution unit, whose processors are implemented by the 2 GPU cards and the 2 MIC cards respectively and which perform information processing in parallel with the first execution unit.
3. The multi-platform system according to claim 2, characterized in that the first, second and third execution units perform information processing using multithreading.
4. The multi-platform system according to claim 2, characterized in that the first, second and third execution units perform information processing on the principle of load balancing.
5. The multi-platform system according to claim 2, characterized in that the first execution unit starts 16 threads to perform information processing, the second execution unit starts at least 10,000 GPU threads to perform information processing, and the third execution unit starts at least 200 threads to perform information processing.
CN 201310229342 2013-06-09 2013-06-09 Isomerism mixed calculation multi-platform system using central processing unit (CPU)+graphic processing unit (GPU)+many integrated core (MIC) Pending CN103279446A (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN 201310229342 CN103279446A (en) 2013-06-09 2013-06-09 Isomerism mixed calculation multi-platform system using central processing unit (CPU)+graphic processing unit (GPU)+many integrated core (MIC)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN 201310229342 CN103279446A (en) 2013-06-09 2013-06-09 Isomerism mixed calculation multi-platform system using central processing unit (CPU)+graphic processing unit (GPU)+many integrated core (MIC)

Publications (1)

Publication Number Publication Date
CN103279446A true CN103279446A (en) 2013-09-04

Family

ID=49061971

Family Applications (1)

Application Number Title Priority Date Filing Date
CN 201310229342 Pending CN103279446A (en) 2013-06-09 2013-06-09 Isomerism mixed calculation multi-platform system using central processing unit (CPU)+graphic processing unit (GPU)+many integrated core (MIC)

Country Status (1)

Country Link
CN (1) CN103279446A (en)

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461849A (en) * 2014-12-08 2015-03-25 东南大学 Method for measuring power consumption of CPU (Central Processing Unit) and GPU (Graphics Processing Unit) software on mobile processor
CN104536936A (en) * 2015-01-28 2015-04-22 浪潮电子信息产业股份有限公司 Draw-bar box type programmable calculator device
CN105183079A (en) * 2015-09-01 2015-12-23 浪潮(北京)电子信息产业有限公司 Portable programmable calculator
CN105227669A (en) * 2015-10-15 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of aggregated structure system of CPU and the GPU mixing towards degree of depth study
CN105893151A (en) * 2016-04-01 2016-08-24 浪潮电子信息产业股份有限公司 High-dimensional data stream processing method based on CPU + MIC heterogeneous platform

Cited By (7)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN104461849A (en) * 2014-12-08 2015-03-25 东南大学 Method for measuring power consumption of CPU (Central Processing Unit) and GPU (Graphics Processing Unit) software on mobile processor
CN104461849B (en) * 2014-12-08 2017-06-06 东南大学 CPU and GPU software power consumption measuring methods in a kind of mobile processor
CN104536936A (en) * 2015-01-28 2015-04-22 浪潮电子信息产业股份有限公司 Draw-bar box type programmable calculator device
CN105183079A (en) * 2015-09-01 2015-12-23 浪潮(北京)电子信息产业有限公司 Portable programmable calculator
CN105227669A (en) * 2015-10-15 2016-01-06 浪潮(北京)电子信息产业有限公司 A kind of aggregated structure system of CPU and the GPU mixing towards degree of depth study
CN105893151A (en) * 2016-04-01 2016-08-24 浪潮电子信息产业股份有限公司 High-dimensional data stream processing method based on CPU + MIC heterogeneous platform
CN105893151B (en) * 2016-04-01 2019-03-08 浪潮电子信息产业股份有限公司 High-dimensional data stream processing method based on CPU + MIC heterogeneous platform

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
WD01 Invention patent application deemed withdrawn after publication

Application publication date: 20130904
