CN203465722U - Computer system facing multi-scale calculation - Google Patents


Info

Publication number
CN203465722U
Authority
CN
China
Prior art keywords
computer system
many
core processor
processor
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Expired - Fee Related
Application number
CN201320106696.7U
Other languages
Chinese (zh)
Inventor
葛蔚
李博
李静海
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Institute of Process Engineering of CAS
Original Assignee
Institute of Process Engineering of CAS
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Institute of Process Engineering of CAS filed Critical Institute of Process Engineering of CAS
Priority to CN201320106696.7U priority Critical patent/CN203465722U/en
Application granted granted Critical
Publication of CN203465722U publication Critical patent/CN203465722U/en
Anticipated expiration legal-status Critical
Expired - Fee Related legal-status Critical Current

Landscapes

  • Multi Processors (AREA)

Abstract

The utility model discloses a computer system for multi-scale computing. The system comprises a plurality of general-purpose multi-core processors, a multiple-instruction multiple-data (MIMD) many-core processor, a single-instruction multiple-data (SIMD) many-core processor, and a network interface card. The general-purpose multi-core processors control and schedule the computation. The MIMD many-core processor handles computing tasks with many conditionals and branches. The SIMD many-core processor handles computing tasks with few conditionals and branches. The network interface card connects the computer system to a computer network. With this system, computing tasks with different characteristics are each executed on the hardware best suited to them, which improves overall computing efficiency, shortens solution time, and reduces operating cost.

Description

A computer system for multi-scale computing
Technical field
The utility model relates to the field of high-performance computing, and in particular to a computer system for multi-scale computing.
Background
With the rapid development of modern science and technology, traditional scientific experiments and theoretical research methods can no longer fully meet the needs of modern research and technological progress. Computer simulation, a research method that emerged in the 20th century, is regarded as an "accelerator" of scientific and technological progress and is receiving growing attention. Large-scale computer simulation relies on computing power: a theoretical model of the research target is solved with numerical methods so that virtual experiments can be run on a computer. Such simulation is inexpensive, fast, convenient, flexible, and widely applicable, and it can even simulate experiments that are beyond the reach of present-day technology, such as the evolution of galaxies and the formation of the universe. Supercomputer systems have therefore become research equipment that countries around the world compete to develop.
Research has shown that multi-scale structure and discreteness are common features of most simulated objects. Further case studies show that a complex system with multi-scale structure can be simulated with the following generally applicable approach:
1) at a suitable scale, the system is discretized into a large number of simple model units with superposable short-range interactions;
2) besides interacting with one another, these units are subject to one or more variational or extremum conditions, so that they behave differently from units moving independently;
3) the form of the applied constraint in turn depends on the behavior of the constrained units, so higher-level, more complex model units can be introduced that embody this constraint-feedback mechanism through their interaction with lower-level units;
4) the relationships between these units can be nested, forming a multi-level computational model.
For this approach, a computing system can be designed with multi-level short-range connections in which the computing units go, from top to bottom, from complex to simple and from few to many. Suitable mappings are then established between model units and computing units, and between the interactions among model units and the connections among computing units, so that the performance of the computing hardware is exploited to the fullest and unnecessary hardware overhead is reduced. In addition, based on the physical stability conditions of the simulated object, the constraints that upper-level units impose on lower-level units can be used to correct computational errors, which safeguards the accuracy of the computation at the level of the mechanism.
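The modeling approach above can be illustrated with a minimal Python sketch. It is not the method of the utility model: the one-dimensional layout, the repulsive spring force, the cutoff value, and the mean-velocity constraint standing in for an "upper-level unit" are all assumptions chosen only to show short-range interaction between lower-level units combined with a correction imposed from a higher level.

```python
def short_range_forces(x, cutoff=1.5, k=1.0):
    """Superposable pairwise repulsion between units closer than `cutoff` (assumed force law)."""
    f = [0.0] * len(x)
    for i in range(len(x)):
        for j in range(i + 1, len(x)):
            d = x[j] - x[i]
            if abs(d) < cutoff:
                mag = k * (cutoff - abs(d))
                sign = 1.0 if d > 0 else -1.0
                f[i] -= sign * mag  # unit i is pushed away from unit j
                f[j] += sign * mag  # equal and opposite reaction on unit j
    return f


def apply_upper_constraint(v, target_mean):
    """Hypothetical upper-level unit: shift all velocities so their mean equals target_mean."""
    mean = sum(v) / len(v)
    return [vi - mean + target_mean for vi in v]


def step(x, v, dt, target_mean):
    """One time step: lower-level interaction first, then the upper-level correction."""
    f = short_range_forces(x)
    v = [vi + fi * dt for vi, fi in zip(v, f)]
    v = apply_upper_constraint(v, target_mean)
    x = [xi + vi * dt for xi, vi in zip(x, v)]
    return x, v
```

Because each unit interacts only with neighbors inside the cutoff, the force loop can be partitioned across many processors, while the constraint step needs only a small amount of aggregated data.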
Chinese patent application No. 200910237027.1 proposes a computer software and hardware architecture for multi-scale complex systems, based on a variational multi-scale method, that achieves efficient simulation of complex systems. However, the disclosed architecture is customized for simulating complex systems, so its scope of application is narrow.
Single-instruction multiple-data (SIMD) technology, also known as single-instruction-stream multiple-data-stream technology, uses a single control unit to drive multiple arithmetic units synchronously, thereby achieving parallel computation. Multiple-instruction multiple-data (MIMD) technology, also known as multiple-instruction-stream multiple-data-stream technology, uses multiple control units to drive multiple arithmetic units asynchronously, thereby achieving parallel computation.
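The difference between the two execution models can be sketched in Python as follows. This is only a conceptual model: the function names are hypothetical, the SIMD case is emulated with an ordinary loop rather than real vector hardware, and the MIMD case uses threads from the standard library to stand in for independent control units.

```python
from concurrent.futures import ThreadPoolExecutor


def simd_apply(op, lanes):
    """SIMD model: one instruction stream, the same operation applied to every data lane."""
    return [op(x) for x in lanes]


def mimd_apply(ops, lanes):
    """MIMD model: each worker runs its own operation on its own data, independently."""
    with ThreadPoolExecutor(max_workers=len(lanes)) as pool:
        futures = [pool.submit(op, x) for op, x in zip(ops, lanes)]
        return [f.result() for f in futures]
```

For example, `simd_apply(lambda x: x * 2, [1, 2, 3])` doubles every element with one operation, whereas `mimd_apply([abs, str], [-1, 5])` runs a different operation on each element.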
Explicit and implicit algorithms are the two kinds of algorithm commonly used in dynamic analysis. An explicit algorithm applies a difference scheme to the dynamic equations and needs no equilibrium iteration, so each step is fast; as long as the time step is small enough, convergence problems generally do not arise. It also needs less memory than an implicit algorithm. Its computation is easy to parallelize, and the program is relatively simple to write. However, the time step of an explicit algorithm is generally small, so its advantages show only when the problem is large enough.
In an implicit algorithm, the dynamic equilibrium equations must be solved iteratively in every time step, and each iteration requires solving a large system of linear equations, which consumes considerable computing resources, disk space, and memory. The time step can be much larger than in an explicit algorithm, but in practice it is limited by the number of iterations and the degree of nonlinearity, so a reasonable value has to be chosen.
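The trade-off between the two kinds of algorithm can be seen on a small test problem. The sketch below is an illustration only and is not taken from the utility model: it assumes the scalar decay equation dy/dt = -k*y and compares forward (explicit) Euler with backward (implicit) Euler. For this scalar case the implicit update has a closed form; for a system of equations the same step would require solving a linear system in every time step.

```python
def explicit_euler(y0, k, dt, steps):
    """Forward Euler for dy/dt = -k*y: the new value uses only known data."""
    y = y0
    for _ in range(steps):
        y = y + dt * (-k * y)
    return y


def implicit_euler(y0, k, dt, steps):
    """Backward Euler for dy/dt = -k*y: solve y_new = y + dt*(-k*y_new) for y_new."""
    y = y0
    for _ in range(steps):
        y = y / (1.0 + k * dt)
    return y
```

With k = 10, the explicit scheme decays as expected for dt = 0.01 but grows without bound for dt = 0.3, because the amplification factor 1 - k*dt then has magnitude greater than 1. The implicit scheme stays bounded for both step sizes, at the price of an equation solve per step.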
Summary of the utility model
The purpose of the utility model is to provide a computer system for multi-scale computing that can use multi-scale computing techniques to improve computing efficiency while offering a wide scope of application and strong scalability.
To this end, the utility model adopts the following technical solution:
A computer system for multi-scale computing, comprising:
a plurality of general-purpose multi-core processors, for controlling and scheduling the computation;
a multiple-instruction multiple-data (MIMD) many-core processor, for processing computing tasks with many conditionals and branches;
a single-instruction multiple-data (SIMD) many-core processor, for processing computing tasks with few conditionals and branches; and
a network interface card, for connecting the computer system to a computer network.
Further, the general-purpose multi-core processors, the MIMD many-core processor, the SIMD many-core processor, and a plurality of network interface cards are all connected to a peripheral bus (PCI), which carries communication among the processors and between the processors and the network interface cards.
Further, the general-purpose multi-core processors are all connected to a shared memory through a memory bus, so that storage is shared among them.
Further, the general-purpose multi-core processors are interconnected by a high-speed bus for communication among them.
Further, the general-purpose multi-core processors are Xeon series processors from Intel, and the high-speed bus between them uses Intel's QuickPath Interconnect (QPI) technology.
Further, the general-purpose multi-core processors are Phenom series processors from AMD, and the high-speed bus between them is AMD's HyperTransport (HT) bus.
Further, the MIMD many-core processor is a Many Integrated Core (MIC) series many-core processor from Intel.
Further, the SIMD many-core processor is a Fermi series general-purpose graphics processing unit (GPGPU) from Nvidia.
Further, the SIMD many-core processor is a Kepler series general-purpose graphics processing unit (GPGPU) from Nvidia.
Further, the network interface card is an Ethernet card.
Further, the network interface card is an InfiniBand interface card.
The utility model provides a computer system for multi-scale computing that suits a large number of supercomputing problems across many research fields. Compared with other supercomputer systems for scientific computing, it has the following features:
1) Wide applicability. With this architecture, the various kinds of interaction between units can be embedded as modules into a general overall algorithm and data structure, so no separate simulation software needs to be written for each case.
2) Strong scalability. To accommodate all kinds of algorithms and applications, a general-purpose supercomputer needs fast data exchange between any pair of processors. In multi-scale discrete simulation, by contrast, each processor needs to exchange or share data with only a few specific processors. As long as reliability permits, the number of processors in such a system can be scaled arbitrarily while relative cost and utilization efficiency remain constant.
3) High parallel efficiency. The interactions between discrete units can be computed simultaneously on a large number of processors rather than sequentially on a traditional central processing unit (CPU). This greatly increases the share of components engaged in computation and reduces the overhead of storage hardware, thereby lowering hardware manufacturing cost and operating power consumption for the same computing capability.
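The scalability argument in feature 2) can be made concrete by counting communication links. The sketch below is an illustration under an assumption that is not in the original text: processors are arranged in a non-periodic three-dimensional grid and each one exchanges data only with its face neighbors. The function names are hypothetical.

```python
def all_to_all_links(n):
    """Links needed if every pair of n processors must exchange data directly."""
    return n * (n - 1) // 2


def nearest_neighbor_links(nx, ny, nz):
    """Links in an nx*ny*nz grid where each processor talks only to its face neighbors."""
    return (nx - 1) * ny * nz + nx * (ny - 1) * nz + nx * ny * (nz - 1)
```

For 64 processors, full pairwise connectivity needs 2016 links, while a 4 x 4 x 4 nearest-neighbor grid needs 144. The first count grows quadratically with the number of processors and the second grows roughly linearly, which is why the cost per processor can stay constant as the system is extended.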
Brief description of the drawings
Fig. 1 is a schematic diagram of the logical structure of the computer system provided by a specific embodiment of the utility model.
Fig. 2 is a schematic diagram of the physical layout of a device implementing the computer system provided by a specific embodiment of the utility model.
Detailed description
The technical solution of the utility model is further described below with reference to the drawings and through a specific embodiment.
Referring to Fig. 1, the computer system for multi-scale computing comprises a plurality of general-purpose multi-core processors 101, a memory bus 102, a shared memory 103, a peripheral bus (PCI) 104, a high-speed bus 105, an MIMD many-core processor 106, an SIMD many-core processor 107, and a network interface card 108.
The general-purpose multi-core processors 101 are connected to the peripheral bus (PCI) 104, through which they can communicate with the MIMD many-core processor 106 and the SIMD many-core processor 107. This communication is interrupt-driven. Specifically, when a general-purpose multi-core processor 101 needs to communicate with the MIMD many-core processor 106 or the SIMD many-core processor 107 over the peripheral bus (PCI) 104, it sends an interrupt request message through an interrupt pin of the bus; once it receives an interrupt acknowledgement message, it can communicate with the target processor.
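The request, acknowledge, and communicate sequence described above can be modeled as a small state machine. The Python sketch below is purely illustrative: the class, state, and method names are hypothetical, and it models only the ordering of the handshake, not real PCI signaling.

```python
from enum import Enum, auto


class State(Enum):
    IDLE = auto()
    REQUESTED = auto()
    ACKNOWLEDGED = auto()


class InterruptLink:
    """Toy model of the handshake between a host processor and a target processor."""

    def __init__(self):
        self.state = State.IDLE
        self.delivered = []

    def request(self):
        """Host raises an interrupt request."""
        if self.state is not State.IDLE:
            raise RuntimeError("a request is already pending")
        self.state = State.REQUESTED

    def acknowledge(self):
        """Target acknowledges the pending request."""
        if self.state is not State.REQUESTED:
            raise RuntimeError("no request to acknowledge")
        self.state = State.ACKNOWLEDGED

    def send(self, payload):
        """Host may transfer data only after the acknowledgement has arrived."""
        if self.state is not State.ACKNOWLEDGED:
            raise RuntimeError("cannot send before acknowledgement")
        self.delivered.append(payload)
        self.state = State.IDLE
```

Calling `send` before `request` and `acknowledge` raises an error, which mirrors the rule that communication starts only after the interrupt acknowledgement message is received.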
In one implementation of this embodiment, the general-purpose multi-core processors 101 are Xeon series processors from Intel. In another implementation, they are Phenom series processors from AMD.
The memory bus 102 connects the general-purpose multi-core processors 101 to the shared memory 103 so that the processors can read from and write to the shared memory. The memory bus 102 comprises address lines and data lines. The address lines carry address signals that identify the memory location a general-purpose multi-core processor 101 is reading or writing at a given moment. The data lines carry the data that the processor reads from or writes to the shared memory 103.
The shared memory 103 is connected to the general-purpose multi-core processors 101 through the memory bus 102 and provides them with a shared storage area, through which large volumes of data can be exchanged among the processors. The shared memory 103 is double data rate synchronous dynamic random-access memory (DDR SDRAM).
The peripheral bus (PCI) 104 is connected to the general-purpose multi-core processors 101, the MIMD many-core processor 106, the SIMD many-core processor 107, and the network interface card 108, and carries communication among the processors and the network interface card.
The peripheral bus (PCI) 104 is 64 bits wide and runs at 133 MHz. All devices connected to it communicate in interrupt-driven fashion.
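The stated bus width and clock rate imply a theoretical peak transfer rate, which the following Python sketch computes. It assumes one transfer per clock cycle and a nominal clock of exactly 133,000,000 Hz; both are simplifying assumptions, and sustained throughput on a real bus would be lower. The function name is hypothetical.

```python
def peak_bandwidth_bytes_per_s(width_bits, clock_hz, transfers_per_clock=1):
    """Theoretical peak: bytes per transfer times transfers per second."""
    return width_bits // 8 * clock_hz * transfers_per_clock


# 64-bit bus at a nominal 133 MHz, one transfer per clock (assumed)
PCI_PEAK = peak_bandwidth_bytes_per_s(64, 133_000_000)
```

Under these assumptions the peak is 1,064,000,000 bytes per second, or roughly 1.06 GB/s, shared by all devices on the bus.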
The high-speed bus 105 interconnects the general-purpose multi-core processors 101 for communication among them. It is used mainly for exchanges that are small in volume but frequent, such as control signals passed between the processors.
If the general-purpose multi-core processors are Xeon series processors from Intel, the high-speed bus uses Intel's QuickPath Interconnect (QPI) technology; if they are Phenom series processors from AMD, the high-speed bus is AMD's HyperTransport (HT) bus.
The MIMD many-core processor 106 is connected to the peripheral bus (PCI) 104 and processes computing tasks with many conditionals and branches. In one implementation of this embodiment, the MIMD many-core processor 106 is a Many Integrated Core (MIC) series many-core processor from Intel.
The SIMD many-core processor 107 is connected to the peripheral bus (PCI) 104 and processes computing tasks with few conditionals and branches. In one implementation of this embodiment, the SIMD many-core processor 107 is a Fermi series general-purpose graphics processing unit (GPGPU) from Nvidia. In another implementation, it is a Kepler series general-purpose graphics processing unit (GPGPU) from Nvidia.
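The division of labor between the two many-core processors can be expressed as a simple dispatch rule. The sketch below is hypothetical: the text does not define how "many" or "few" conditionals and branches are measured, so the `branch_fraction` metric and the 0.1 threshold are assumptions introduced only for illustration.

```python
def choose_processor(branch_fraction, threshold=0.1):
    """Pick a target for a task from its share of conditional/branch instructions.

    branch_fraction: assumed metric in [0, 1]; threshold: assumed cut-off.
    """
    if not 0.0 <= branch_fraction <= 1.0:
        raise ValueError("branch_fraction must lie in [0, 1]")
    return "MIMD" if branch_fraction > threshold else "SIMD"
```

A regular stencil update with almost no branching would be routed to the SIMD processor 107, while a task dominated by case distinctions would be routed to the MIMD processor 106.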
Referring to Fig. 2, the device implementing the computer system is a rack-mounted server. The server comprises a chassis 201, a mainboard 202, a power supply 203, fans 204, a hard disk 205, memory 206, the general-purpose multi-core processors 101, the MIMD many-core processor 106, the SIMD many-core processor 107, and the network interface card 108. The mainboard 202, the fans 204, and the hard disk 205 are installed in the chassis 201. The mainboard 202 is an integrated circuit board on which all components of the device are mounted. The fans 204 cool the processors in the device. The hard disk 205 is the main storage medium of the device.
The power supply 203, shared memory slots 206, peripheral expansion slots 207, and general-purpose multi-core processor sockets 208 are mounted on the mainboard 202. The power supply 203 transforms and rectifies the AC mains supplied to the chassis into DC power usable by the processors and other devices connected to the mainboard 202, and delivers that DC power to them.
The shared memory slots 206 hold the shared memory 103. They are connected to the general-purpose multi-core processor sockets 208 through the memory bus 102, so that the general-purpose multi-core processors 101 can access the shared memory 103 over the memory bus 102.
The peripheral expansion slots 207 hold the MIMD many-core processor 106 and the SIMD many-core processor 107. They are connected to the peripheral bus (PCI) 104, over which the processors communicate.
The general-purpose multi-core processor sockets 208 hold the general-purpose multi-core processors 101. They are connected to the peripheral bus (PCI) 104, over which the processors communicate. The sockets 208 are also connected to the high-speed bus 105, so that the general-purpose multi-core processors 101 installed in them can communicate with one another.
To further illustrate how the computer system improves computing efficiency, the following describes the application of this embodiment to multi-scale discrete simulation of a particle-fluid system.
In this application scenario, the general-purpose multi-core processors are Xeon series processors from Intel, and the high-speed link between them uses Intel's QuickPath Interconnect (QPI) technology. The MIMD many-core processor is a Many Integrated Core (MIC) series many-core processor from Intel. The SIMD many-core processor is a Fermi series general-purpose graphics processing unit (GPGPU) from Nvidia.
The computer system is used for multi-scale simulation of both small-scale and large-scale particle-fluid systems. In a small-scale particle-fluid system, the fluid can generally be computed with an explicit algorithm. The explicit algorithm has good data locality and uniform operations with few logical conditionals and branches, so it is better suited to SIMD parallel processing. Checking particle interactions, processing those interactions, and updating particle states also have good locality, but different particles act on different objects and require different interaction computations, and the computation contains a considerable number of conditionals and branches, so this work is better suited to MIMD parallel processing. Accordingly, when simulating a small-scale particle-fluid system, the general-purpose multi-core processors 101 compute the global distribution of the flow field, which includes multi-objective optimization and implicit numerical solution of partial differential equations, and also control and schedule the whole computation. The MIMD many-core processor 106 computes the evolution of particles or particle clusters, which mainly consists of checking particle interactions, processing those interactions, and updating particle states. The SIMD many-core processor 107 numerically solves the partial differential equations of fluid motion with the explicit algorithm, at a resolution that is generally finer than the size of a particle or particle cluster.
When simulating a large-scale particle-fluid system, a fluid computing grid coarser than a particle or particle cluster can be used to speed up the computation. In that case, an implicit algorithm should be used for the sake of numerical stability and efficiency. The implicit algorithm involves more data exchange and conditional operations, so the SIMD many-core processor 107 handles the particle computation and the MIMD many-core processor 106 handles the fluid computation. In both modes of multi-scale discrete simulation of the particle-fluid system, more SIMD many-core processors 107 than MIMD many-core processors 106 should be configured to match the corresponding computational load.
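The two simulation modes swap the roles of the many-core processors, which can be summarized in a small lookup table. The Python sketch below only restates the assignment given in the text; the dictionary keys and the function name are hypothetical labels.

```python
# Role assignment per simulation mode, as described in the text.
ROLES = {
    "small_scale": {
        "control": "CPU",          # general-purpose multi-core processors 101
        "fluid": "SIMD",           # explicit solver on processor 107
        "particles": "MIMD",       # interaction checks and updates on processor 106
        "fluid_scheme": "explicit",
    },
    "large_scale": {
        "control": "CPU",
        "fluid": "MIMD",           # implicit solver on processor 106
        "particles": "SIMD",       # particle computation on processor 107
        "fluid_scheme": "implicit",
    },
}


def processor_for(mode, workload):
    """Look up which processor class handles a workload in a given mode."""
    return ROLES[mode][workload]
```

For example, `processor_for("small_scale", "fluid")` gives `"SIMD"`, while `processor_for("large_scale", "fluid")` gives `"MIMD"`.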
In a large-scale parallel processing system containing multiple such computer systems, the simulation can be parallelized across the computer systems by spatial partitioning, with the information on each partition boundary exchanged through the respective network interface cards 108.
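Spatial partitioning with boundary exchange can be sketched in one dimension as follows. This is a simplified illustration, not the implementation of the utility model: the domain is a plain Python list, each sublist stands for the part held by one computer system, and the network transfer is replaced by a direct copy of the neighboring boundary cells. All function names are hypothetical.

```python
def split_domain(cells, parts):
    """Split a 1D domain into equal contiguous subdomains (length must divide evenly)."""
    if len(cells) % parts != 0:
        raise ValueError("domain length must be divisible by the number of parts")
    size = len(cells) // parts
    return [cells[i * size:(i + 1) * size] for i in range(parts)]


def exchange_boundaries(subdomains):
    """For each subdomain, return (left_ghost, right_ghost) copied from its neighbors.

    None marks a physical domain edge with no neighbor.
    """
    ghosts = []
    last = len(subdomains) - 1
    for i in range(len(subdomains)):
        left = subdomains[i - 1][-1] if i > 0 else None
        right = subdomains[i + 1][0] if i < last else None
        ghosts.append((left, right))
    return ghosts
```

Each subdomain needs only one value from each neighbor per step, so the volume of data crossing the network interface cards scales with the partition boundary rather than with the whole domain.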
In this application scenario, the computer system follows the concept of multi-scale simulation and assigns computing tasks to different processors according to the characteristics of each task, so that every processor handles the tasks best suited to its own processing characteristics, which improves the computing efficiency of the whole system.
The computer system for multi-scale computing provided by this embodiment comprises several different kinds of processor, together with communication buses and a shared memory for efficient communication among them. It can select the processor best suited to each computing task according to the task's characteristics, which improves computing efficiency. The system also uses general-purpose processor chips and bus standards, which broadens its scope of application, improves its scalability, and yields good parallel computing performance.
Although the utility model has been described in detail above, it is not limited to this description, and those skilled in the art may make various modifications in accordance with its principle. All modifications made in accordance with the principle of the utility model should therefore be understood to fall within its scope of protection.

Claims (10)

1. A computer system for multi-scale computing, characterized in that the computer system comprises:
a plurality of general-purpose multi-core processors, for controlling and scheduling the computation;
a multiple-instruction multiple-data (MIMD) many-core processor, for processing computing tasks with many conditionals and branches;
a single-instruction multiple-data (SIMD) many-core processor, for processing computing tasks with few conditionals and branches; and
a network interface card, for connecting the computer system to a computer network.
2. The computer system for multi-scale computing according to claim 1, characterized in that the general-purpose multi-core processors, the MIMD many-core processor, the SIMD many-core processor, and a plurality of network interface cards are all connected to a peripheral bus (PCI), which carries communication among the processors and between the processors and the network interface cards.
3. The computer system for multi-scale computing according to claim 1, characterized in that the general-purpose multi-core processors are all connected to a shared memory through a memory bus, so that storage is shared among them.
4. The computer system for multi-scale computing according to claim 1, characterized in that the general-purpose multi-core processors are interconnected by a high-speed bus for communication among them.
5. The computer system for multi-scale computing according to claim 1 or 4, characterized in that the general-purpose multi-core processors are Xeon series processors from Intel, and the high-speed bus between them uses Intel's QuickPath Interconnect (QPI) technology.
6. The computer system for multi-scale computing according to claim 1 or 4, characterized in that the general-purpose multi-core processors are Phenom series processors from AMD, and the high-speed bus between them is AMD's HyperTransport (HT) bus.
7. The computer system for multi-scale computing according to claim 1, characterized in that the MIMD many-core processor is a Many Integrated Core (MIC) series many-core processor from Intel.
8. The computer system for multi-scale computing according to claim 1, characterized in that the SIMD many-core processor is a Fermi series general-purpose graphics processing unit (GPGPU) from Nvidia.
9. The computer system for multi-scale computing according to claim 1, characterized in that the SIMD many-core processor is a Kepler series general-purpose graphics processing unit (GPGPU) from Nvidia.
10. The computer system for multi-scale computing according to claim 1, characterized in that the network interface card is an Ethernet card or an InfiniBand interface card.
CN201320106696.7U 2013-03-08 2013-03-08 Computer system facing multi-scale calculation Expired - Fee Related CN203465722U (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201320106696.7U CN203465722U (en) 2013-03-08 2013-03-08 Computer system facing multi-scale calculation

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201320106696.7U CN203465722U (en) 2013-03-08 2013-03-08 Computer system facing multi-scale calculation

Publications (1)

Publication Number Publication Date
CN203465722U true CN203465722U (en) 2014-03-05

Family

ID=50178071

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201320106696.7U Expired - Fee Related CN203465722U (en) 2013-03-08 2013-03-08 Computer system facing multi-scale calculation

Country Status (1)

Country Link
CN (1) CN203465722U (en)

Cited By (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198049A (en) * 2013-03-08 2013-07-10 中国科学院过程工程研究所 Multiscale-calculation-oriented computer system
CN105022715A (en) * 2015-07-08 2015-11-04 浪潮(北京)电子信息产业有限公司 Server backplane interconnection method and system

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN103198049A (en) * 2013-03-08 2013-07-10 中国科学院过程工程研究所 Multiscale-calculation-oriented computer system
CN103198049B (en) * 2013-03-08 2016-05-11 中国科学院过程工程研究所 A kind of computer system towards Multi-Scale Calculation
CN105022715A (en) * 2015-07-08 2015-11-04 浪潮(北京)电子信息产业有限公司 Server backplane interconnection method and system


Legal Events

Date Code Title Description
GR01 Patent grant
CF01 Termination of patent right due to non-payment of annual fee

Granted publication date: 20140305

Termination date: 20170308
