CN105468567A - Heterogeneous many-core discrete memory access optimization method - Google Patents


Info

Publication number: CN105468567A
Authority: CN (China)
Prior art keywords: task; core; memory access; optimization method; counting variable
Legal status: Granted
Application number: CN201510830202.3A
Other languages: Chinese (zh)
Other versions: CN105468567B (en)
Inventors: 袁欣辉 (Yuan Xinhui), 潘治 (Pan Zhi), 林蓉芬 (Lin Rongfen), 王礼生 (Wang Lisheng)
Current assignee: Wuxi Jiangnan Computing Technology Institute
Original assignee: Wuxi Jiangnan Computing Technology Institute
Priority date: 2015-11-24
Filing date: 2015-11-24
Publication date: 2016-04-06

Application events:
2015-11-24: Application filed by Wuxi Jiangnan Computing Technology Institute
2015-11-24: Priority to CN201510830202.3A (CN105468567B)
2016-04-06: Publication of CN105468567A
Application granted
2018-02-06: Publication of CN105468567B
Status: Active


Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00 Digital computers in general; Data processing equipment in general
    • G06F15/16 Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163 Interprocessor communication
    • G06F15/167 Interprocessor communication using a common memory, e.g. mailbox

Landscapes

  • Engineering & Computer Science (AREA)
  • Computer Hardware Design (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention provides a discrete memory access optimization method for heterogeneous many-core processors, comprising: step one, dividing the overall task into a plurality of task segments; step two, establishing a counting variable in a memory space accessible to both the master core and the slave cores; step three, judging whether the value of the counting variable is less than the number of segments of the overall task, and if so, carrying out step four; step four, the master core and each slave core dynamically take task segments from a task pool, perform an atomic add-one operation on the counting variable, and complete the memory access operation for the task segments taken; processing then returns to step three.

Description

Discrete memory access optimization method for heterogeneous many-core processors
Technical field
The present invention relates to the field of computer technology and, more particularly, to a discrete memory access optimization method for heterogeneous many-core processors.
Background art
A heterogeneous many-core processor is a novel processor architecture that packages a general-purpose processor (the master core) together with a cluster of accelerator cores (the slave cores), providing very high computing performance. However, the master core and the slave-core cluster share the processor's memory bandwidth, and when data locality is poor, the memory bandwidth actually achieved is markedly lower than the bandwidth achieved when accessing contiguous data. For data-intensive applications, the memory access bottleneck is pronounced. Data locality comprises temporal locality and spatial locality. Temporal locality means that data accessed recently is likely to be accessed again within a short time; spatial locality means that, over a short time, the data a program accesses is concentrated in a small region of memory, so data near the data just accessed is likely to be accessed next.
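To make the locality argument concrete, the following sketch counts the distinct cache lines touched by a sequence of array accesses. It assumes 8-byte elements, 64-byte cache lines and a line-aligned array (typical values chosen for illustration, not taken from the patent), and it models discrete access as a fixed stride; `cache_lines_touched` is a hypothetical helper.

```python
# Hypothetical helper for illustration; element and line sizes are assumed values.
def cache_lines_touched(indices, elem_bytes=8, line_bytes=64):
    """Return how many distinct cache lines the accessed elements fall in,
    assuming element i sits at byte offset i * elem_bytes from a line-aligned base."""
    return len({(i * elem_bytes) // line_bytes for i in indices})


n = 1024
contiguous = range(n)            # a[0], a[1], ..., a[1023]
strided = range(0, n * 8, 8)     # a[0], a[8], a[16], ...: one element per line

print(cache_lines_touched(contiguous))  # 128: eight elements share each line
print(cache_lines_touched(strided))     # 1024: every access lands on a new line
```

Both patterns read 1024 elements, but the strided one transfers eight times as many cache lines, most of whose bytes go unused, which is why poor spatial locality lowers the useful bandwidth.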
The existing architecture closest to this new one is the hybrid CPU (Central Processing Unit) + GPU (Graphics Processing Unit) architecture. To improve discrete memory access performance on that architecture (discrete memory access meaning program accesses with poor data locality), the CPU and GPU access memory simultaneously: the task is first split between the CPU and the GPU in a fixed ratio according to their respective memory bandwidths, and the GPU's share is then divided evenly among its threads, which yields good overall performance.
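A minimal sketch of this prior-art fixed-ratio scheme, assuming the two bandwidths are known constants (the figures below are illustrative, not from the patent); `static_split` is a hypothetical name, and work items are assumed to be indexed 0 to n-1.

```python
# Hypothetical sketch of the prior-art split; bandwidth figures are assumed inputs.
def static_split(n_items, cpu_bw, gpu_bw, gpu_threads):
    """Split n_items between CPU and GPU in proportion to bandwidth, then divide the
    GPU share evenly among its threads. Returns (cpu_range, list of thread ranges)."""
    cpu_count = round(n_items * cpu_bw / (cpu_bw + gpu_bw))
    cpu_range = range(0, cpu_count)
    base, extra = divmod(n_items - cpu_count, gpu_threads)
    thread_ranges, start = [], cpu_count
    for t in range(gpu_threads):
        size = base + (1 if t < extra else 0)   # spread any remainder over the first threads
        thread_ranges.append(range(start, start + size))
        start += size
    return cpu_range, thread_ranges


cpu, gpu = static_split(1000, cpu_bw=50.0, gpu_bw=200.0, gpu_threads=4)
print(len(cpu), [len(r) for r in gpu])  # 200 [200, 200, 200, 200]
```

The split is fixed before any work runs; nothing corrects it if the real bandwidth ratio or the per-item costs differ from the assumed values.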
Under the CPU+GPU hybrid architecture, the CPU and GPU have independent memory access paths that do not interfere with one another, so the task can easily be split in a fixed ratio according to their respective bandwidths. This approach, however, is difficult to apply directly to a heterogeneous many-core processor. First, the master and slave cores share memory bandwidth, so one core's memory accesses can conflict with those of other cores and degrade their performance; moreover, the memory access instructions issued by each core are unordered, random and entirely unpredictable, so the ratio of memory bandwidth between the master core and the slave-core cluster is not fixed. Second, because the actual memory access volume differs from task to task, the load of each task differs as well; dividing tasks evenly across the slave-core cluster therefore causes load imbalance, and the most heavily loaded core drags down the performance of the whole problem. Partitioning tasks by a fixed ratio is therefore infeasible.
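The load-imbalance effect of an even split can be shown with a few lines of arithmetic. This sketch assumes each task's cost is an abstract time unit proportional to its memory traffic; the cost list and the helper `even_split_makespan` are hypothetical.

```python
# Hypothetical helper; costs are abstract time units proportional to memory traffic.
def even_split_makespan(costs, n_workers):
    """Assign each worker an equal-count contiguous block of tasks and return
    (makespan, per-worker loads), where makespan is the slowest worker's load."""
    base, extra = divmod(len(costs), n_workers)
    loads, start = [], 0
    for w in range(n_workers):
        size = base + (1 if w < extra else 0)
        loads.append(sum(costs[start:start + size]))
        start += size
    return max(loads), loads


costs = [9, 9, 1, 1, 1, 1, 1, 1]   # two heavy tasks, six light ones
makespan, loads = even_split_makespan(costs, 4)
print(loads, makespan)             # [18, 2, 2, 2] 18
```

Every worker receives two tasks, yet one worker draws both heavy ones; no schedule could finish before 9 units (the longest single task), but the even split needs 18.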
Summary of the invention
The technical problem addressed by the present invention is to overcome the above defects of the prior art by providing a discrete memory access optimization method for heterogeneous many-core processors that exploits the hardware characteristics of such processors to improve discrete memory access performance and thereby the performance of data-intensive applications.
To achieve the above technical objective, the present invention provides a discrete memory access optimization method for heterogeneous many-core processors, characterized by comprising:
First step: dividing the overall task into a plurality of task segments;
Second step: establishing a counting variable in a memory space accessible to both the master core and the slave cores;
Third step: judging whether the value of the counting variable is less than the number of segments of the overall task, and if so, performing the fourth step;
Fourth step: the master core and each slave core dynamically take a task segment from the task pool, perform an atomic add-one operation on the counting variable, and complete the memory access operation for the task segment taken; processing then returns to the third step.
Preferably, in the third step, if the value of the counting variable is determined to equal the number of segments of the overall task, the task is judged to be complete and processing stops; otherwise, the master core and each slave core of the slave-core cluster continue to take memory access tasks dynamically from the task pool.
Preferably, in the first step, the overall task is divided into a predetermined number of task segments; in the fourth step, the master core and each slave core dynamically request a memory access task segment to process.
Preferably, in the first step, the overall task is divided into task segments of a predetermined size; in the fourth step, the master core and the slave cores complete the memory access tasks simultaneously, with tasks assigned dynamically.
Preferably, the predetermined size of the task segments is adjustable, and the master core and the slave cores complete the memory access tasks simultaneously, with tasks assigned dynamically.
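The first step leaves the partitioning rule open; the preferred variants use either a predetermined number of segments or a predetermined, adjustable segment size. The two helpers below are a hypothetical sketch of both variants, assuming the overall task is a contiguous index range 0 to n-1.

```python
# Hypothetical helpers; segments are index ranges over the overall task.
def split_by_count(n_items, n_segments):
    """Divide [0, n_items) into n_segments contiguous ranges whose sizes differ by at most 1."""
    base, extra = divmod(n_items, n_segments)
    segments, start = [], 0
    for s in range(n_segments):
        size = base + (1 if s < extra else 0)
        segments.append(range(start, start + size))
        start += size
    return segments


def split_by_size(n_items, seg_size):
    """Divide [0, n_items) into ranges of seg_size items; the last one may be shorter."""
    return [range(start, min(start + seg_size, n_items))
            for start in range(0, n_items, seg_size)]


print([len(s) for s in split_by_count(10, 4)])  # [3, 3, 2, 2]
print([len(s) for s in split_by_size(10, 4)])   # [4, 4, 2]
```

A smaller segment size lets the dynamic scheme balance load more finely, at the price of more updates to the shared counter; that trade-off is what the adjustable size is for.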
With the method of the present invention, the master core and the slave-core cluster access memory simultaneously, making full use of the chip's memory bandwidth. The method also exploits the master core's cache: when an access hits in the cache, the master core completes its work without consuming memory bandwidth, so the discrete memory bandwidth actually achieved may exceed the total bandwidth. In addition, dynamic task division solves two problems: the difficulty of choosing a task division ratio between the master core and the slave-core cluster, and uneven load across cores dragging down overall performance. The dynamic task division method of the invention makes full use of the memory bandwidth of a heterogeneous many-core processor while flexibly and dynamically dividing the workload between the master core and the slave-core cluster, and among the cores of the slave-core cluster, according to actual runtime conditions, effectively alleviating the discrete memory access bottleneck on heterogeneous many-core processors.
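The benefit of dynamic division can be checked with a small discrete-event simulation. This is a sketch under stated assumptions: all workers run at the same speed, claiming a segment costs nothing, and segment costs are known only so the simulation can advance time (the method itself needs no such knowledge). `dynamic_makespan` is a hypothetical name.

```python
import heapq


# Hypothetical simulation; identical worker speeds and zero claiming overhead are assumed.
def dynamic_makespan(costs, n_workers):
    """Simulate self-scheduling: whenever a worker becomes free it claims the next
    unclaimed segment in counter order. Returns the time the last worker finishes."""
    free_at = [(0, w) for w in range(n_workers)]  # (time the worker becomes free, worker id)
    heapq.heapify(free_at)
    for cost in costs:                            # segments are claimed in counter order
        t, w = heapq.heappop(free_at)             # the earliest-free worker claims it
        heapq.heappush(free_at, (t + cost, w))
    return max(t for t, _ in free_at)


costs = [9, 9, 1, 1, 1, 1, 1, 1]
print(dynamic_makespan(costs, 4))  # 9
```

With these costs and four workers, an even static split of two segments per worker could hand one worker both heavy segments and finish at 18 units, whereas self-scheduling finishes at 9, the length of the longest single segment.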
Brief description of the drawings
The present invention, together with its attendant advantages and features, will be more completely and more readily understood by reference to the following detailed description in conjunction with the accompanying drawings, in which:
Fig. 1 schematically shows a flowchart of the discrete memory access optimization method for heterogeneous many-core processors according to a preferred embodiment of the invention.
It should be noted that the drawings serve to illustrate the present invention and do not limit it. Drawings showing structures may not be drawn to scale, and identical or similar elements are given identical or similar reference numerals.
Embodiment
To make the content of the present invention clearer and easier to understand, it is described in detail below with reference to specific embodiments and the accompanying drawings.
In a heterogeneous many-core processor, the master core and the slave-core cluster share memory bandwidth. If only the master core accesses memory, the memory bandwidth cannot be fully used; if only the slave-core cluster accesses memory, the benefit that the master core's data cache brings to memory access is wasted. When the master and slave cores access memory cooperatively, the cache may allow the program to achieve memory access performance exceeding the total discrete memory access bandwidth.
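Why the total bandwidth can be exceeded can be illustrated with an idealized model, which is an assumption of this sketch rather than a formula from the patent: DRAM delivers total bandwidth B, the master core's cache misses consume a fraction s of it, and a fraction h of the master core's accesses hit in its cache and use no DRAM bandwidth. The master core then serves s·B/(1−h) bytes per second and the slave cores (1−s)·B, so the combined rate is B·(1 − s + s/(1−h)), which exceeds B whenever s > 0 and h > 0. `effective_bandwidth` is a hypothetical name.

```python
# Idealized model (an assumption, not from the patent): ignores contention between cores.
def effective_bandwidth(total_bw, master_share, hit_rate):
    """Useful bandwidth when master-core cache hits bypass DRAM.

    total_bw     -- DRAM bandwidth shared by all cores (e.g. GB/s)
    master_share -- fraction of DRAM bandwidth consumed by master-core misses (0..1)
    hit_rate     -- fraction of master-core accesses served by its cache (0..1)
    """
    if not 0 <= hit_rate < 1:
        raise ValueError("hit_rate must be in [0, 1)")
    slave_rate = (1 - master_share) * total_bw
    master_rate = master_share * total_bw / (1 - hit_rate)  # only misses reach DRAM
    return slave_rate + master_rate


print(effective_bandwidth(100.0, 0.2, 0.5))  # 120.0
```

The model deliberately leaves out the core-to-core interference that makes the real bandwidth ratio unpredictable; it only shows why the bound can be exceeded at all.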
Because of the structural characteristics of heterogeneous many-core processors, the memory access capability of the master and slave cores and the memory bandwidth they actually use are difficult to quantify precisely. An inappropriate task division can make cooperative memory access ineffective or even counterproductive, and dividing the workload in a fixed ratio generalizes poorly and may cause more pronounced load imbalance.
The present invention is proposed to solve this problem. Preferred embodiments of the present invention are described in detail below with reference to the accompanying drawings.
Fig. 1 schematically shows a flowchart of the discrete memory access optimization method for heterogeneous many-core processors according to a preferred embodiment of the invention.
As shown in Fig. 1, the discrete memory access optimization method for heterogeneous many-core processors according to a preferred embodiment of the invention comprises:
First step S1: dividing the overall task into a plurality of task segments.
Preferably, in the first step S1, the overall task may be divided into a predetermined number of task segments. Alternatively and preferably, in the first step S1, the overall task may be divided into task segments of a predetermined size. Further preferably, a user or operator may adjust the predetermined segment size, which is particularly useful for retaining dynamic balancing capability without increasing coordination overhead.
Second step S2: establishing a counting variable in a memory space accessible to both the master core and the slave cores, and setting its initial value to 0. The counting variable records the progress of the overall task.
Third step S3: judging whether the value of the counting variable is less than the number of segments of the overall task (that is, judging whether the overall task has been fully processed). If it is less, the fourth step S4 is performed; if it is not less (that is, it equals the number of segments), the task is judged to be complete and processing stops.
Fourth step S4: the master core and each slave core dynamically take a task segment from the task pool, perform an atomic add-one operation on the counting variable, and complete the memory access operation for the task segment taken; processing then returns to the third step S3.
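A minimal sketch of steps S2 to S4, assuming Python threads stand in for the master core (worker 0) and the slave cores, and a lock-protected `AtomicCounter` stands in for the hardware atomic add; `run_task_pool`, `process` and the segment list are hypothetical names. One design choice goes beyond the text: the step-S3 bound check is applied to the value returned by the atomic increment itself. If a core read the counter and then incremented it as two separate operations, two cores could both see N−1 and one of them would claim a non-existent segment N.

```python
import threading


class AtomicCounter:
    """Emulates a hardware atomic counter in memory shared by all cores.
    Python exposes no atomic integer, so a lock provides fetch-and-add semantics."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def fetch_add(self, delta=1):
        """Atomically add delta and return the value held before the addition."""
        with self._lock:
            old = self._value
            self._value += delta
            return old


def run_task_pool(segments, process, n_workers):
    """Hypothetical driver for steps S2-S4; returns the segment indices each worker took."""
    counter = AtomicCounter(0)                 # S2: shared counting variable, initially 0
    taken = [[] for _ in range(n_workers)]

    def worker(wid):
        while True:
            idx = counter.fetch_add(1)         # S4: atomic +1; the old value names our segment
            if idx >= len(segments):           # S3: every segment already claimed, so stop
                return
            process(segments[idx])             # S4: memory access work for this segment
            taken[wid].append(idx)

    threads = [threading.Thread(target=worker, args=(w,)) for w in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return taken


# Usage: a gather over 100 elements cut into segments of 16.
data = list(range(100))
out = [0] * len(data)
segments = [range(s, min(s + 16, len(data))) for s in range(0, len(data), 16)]


def gather(seg):
    for i in seg:
        out[i] = data[i] * 2


taken = run_task_pool(segments, gather, n_workers=4)
print(sorted(i for t in taken for i in t))  # [0, 1, 2, 3, 4, 5, 6]
```

A side effect of this variant is that the counter ends at the segment count plus the number of workers, because each worker's final, unsuccessful claim still increments it; the test of step S3 therefore uses "not less than" rather than "equal to".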
Here, an atomic operation is a single operation, or a sequence of operations, that is independent and indivisible and cannot be interrupted before it completes. When multiple cores access the same resource, atomic operations guarantee that the cores operate on that resource at different times. Common atomic operations include atomic add, atomic subtract, and atomic compare-and-swap.
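Of the atomic operations listed, compare-and-swap is the most general. The sketch below shows, assuming only compare-and-swap were available, how the atomic add-one applied to the counting variable can be built from it with a retry loop. `AtomicCell` and `atomic_add_via_cas` are hypothetical names, and the lock merely emulates hardware atomicity.

```python
import threading


class AtomicCell:
    """Hypothetical emulated atomic integer; the lock stands in for hardware atomicity."""

    def __init__(self, value=0):
        self._value = value
        self._lock = threading.Lock()

    def load(self):
        with self._lock:
            return self._value

    def compare_and_swap(self, expected, new):
        """Store new only if the current value equals expected; return True on success."""
        with self._lock:
            if self._value == expected:
                self._value = new
                return True
            return False


def atomic_add_via_cas(cell, delta=1):
    """Fetch-and-add built from compare-and-swap: retry whenever another core
    changed the value between our load and our swap. Returns the old value."""
    while True:
        old = cell.load()
        if cell.compare_and_swap(old, old + delta):
            return old


def add_many(cell, times):
    for _ in range(times):
        atomic_add_via_cas(cell)


cell = AtomicCell(0)
threads = [threading.Thread(target=add_many, args=(cell, 1000)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(cell.load())  # 4000: no increment is lost
```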
Preferably, to obtain better results, the data may be preprocessed, for example by reordering it and removing invalid data, to improve data locality and make fuller use of the cache.
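A hypothetical sketch of such preprocessing for an index-driven gather: out-of-range indices are treated as invalid and dropped, and the rest are sorted so that accesses to the same or neighbouring cache lines become consecutive. It assumes 8-byte elements and 64-byte lines, and that the processing order does not affect the result (for example, a reduction, or results written back by original position); `preprocess_indices` and `line_switches` are illustrative names.

```python
# Hypothetical helpers; element size, line size and the validity rule are assumptions.
def preprocess_indices(indices, n_valid):
    """Drop out-of-range (invalid) indices and sort the remainder."""
    return sorted(i for i in indices if 0 <= i < n_valid)


def line_switches(indices, elem_bytes=8, line_bytes=64):
    """Count how often consecutive accesses move to a different cache line."""
    lines = [(i * elem_bytes) // line_bytes for i in indices]
    return sum(1 for a, b in zip(lines, lines[1:]) if a != b)


raw = [17, 3, -1, 16, 2, 40, 18, 1, 99]
unsorted_valid = [i for i in raw if 0 <= i < 64]       # [17, 3, 16, 2, 40, 18, 1]
clean = preprocess_indices(raw, n_valid=64)
print(clean)                                           # [1, 2, 3, 16, 17, 18, 40]
print(line_switches(unsorted_valid), line_switches(clean))  # 6 2
```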
The present invention is applicable to processors with a heterogeneous many-core architecture. Its advantages are: 1. the master core and the slave-core cluster perform memory access operations simultaneously, making full use of the chip's memory bandwidth, and because the master core has a cache, when the master core's cache hits, this approach can achieve discrete memory access performance higher than the total discrete memory access bandwidth; 2. dynamic task division solves both the difficulty of choosing a task division ratio between the master core and the slave-core cluster and the problem of uneven load across cores dragging down overall performance. The method is flexible, has low performance overhead, and offers a large overall benefit.
In addition, it should be noted that, unless otherwise specified or indicated, the terms "first", "second", "third" and so on in the specification are used only to distinguish the components, elements, steps, etc. in the specification, and not to indicate logical or sequential relationships between them.
It can be understood that, although the present invention has been disclosed above by way of preferred embodiments, these embodiments are not intended to limit the present invention. Any person of ordinary skill in the art may, without departing from the scope of the technical solution of the present invention, use the technical content disclosed above to make many possible changes and modifications to the technical solution, or to modify it into equivalent embodiments. Therefore, any simple modification, equivalent change or modification made to the above embodiments according to the technical essence of the present invention, without departing from the content of its technical solution, still falls within the scope of protection of the technical solution of the present invention.

Claims (5)

1. A discrete memory access optimization method for heterogeneous many-core processors, characterized by comprising:
First step: dividing the overall task into a plurality of task segments;
Second step: establishing a counting variable in a memory space accessible to both the master core and the slave cores;
Third step: judging whether the value of the counting variable is less than the number of segments of the overall task, and if so, performing the fourth step;
Fourth step: the master core and each slave core dynamically take a task segment from the task pool, perform an atomic add-one operation on the counting variable, and complete the memory access operation for the task segment taken; processing then returns to the third step.
2. The discrete memory access optimization method for heterogeneous many-core processors according to claim 1, characterized in that, in the third step, if the value of the counting variable is determined to equal the number of segments of the overall task, the task is judged to be complete and processing stops; otherwise, the master core and each slave core of the slave-core cluster dynamically take tasks from the task pool for processing.
3. The discrete memory access optimization method for heterogeneous many-core processors according to claim 1 or 2, characterized in that, in the first step, the overall task is divided into a predetermined number of task segments; in the fourth step, the master core and each slave core dynamically request a memory access task segment to process.
4. The discrete memory access optimization method for heterogeneous many-core processors according to claim 1 or 2, characterized in that, in the first step, the overall task is divided into task segments of a predetermined size; in the fourth step, the master core and the slave cores complete the memory access tasks simultaneously, with tasks assigned dynamically.
5. The discrete memory access optimization method for heterogeneous many-core processors according to claim 1 or 2, characterized in that the predetermined size of the task segments is adjustable, and the master core and the slave cores complete the memory access tasks simultaneously, with tasks assigned dynamically.

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201510830202.3A CN105468567B (en) 2015-11-24 2015-11-24 Discrete memory access optimization method for heterogeneous many-core processors

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
CN201510830202.3A CN105468567B (en) 2015-11-24 2015-11-24 Discrete memory access optimization method for heterogeneous many-core processors

Publications (2)

Publication Number Publication Date
CN105468567A (en) 2016-04-06
CN105468567B (en) 2018-02-06

Family

ID=55606286

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201510830202.3A Active CN105468567B (en) 2015-11-24 2015-11-24 Discrete memory access optimization method for heterogeneous many-core processors

Country Status (1)

Country Link
CN (1) CN105468567B (en)

Cited By (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885585A (en) * 2016-09-30 2018-04-06 Robert Bosch GmbH Dynamic task scheduling device in a multi-core electronic control unit
CN112540936A (en) * 2019-09-23 2021-03-23 Wuxi Jiangnan Computing Technology Institute Discrete memory access read-write method oriented to heterogeneous many-core architecture
CN113568718A (en) * 2020-04-29 2021-10-29 北京希姆计算科技有限公司 Task allocation method and device, electronic equipment and computer readable storage medium

Citations (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US20110213854A1 (en) * 2008-12-04 2011-09-01 Yaron Haviv Device, system, and method of accessing storage
CN102567275A (en) * 2010-12-08 2012-07-11 Institute of Acoustics, Chinese Academy of Sciences Method and system for memory access among multiple operating systems on a multi-core processor
US20120221795A1 (en) * 2010-07-16 2012-08-30 Panasonic Corporation Shared memory system and control method therefor
CN102929724A (en) * 2012-11-06 2013-02-13 Wuxi Jiangnan Computing Technology Institute Multistage memory access method and discrete memory access method based on heterogeneous multi-core processor
CN104281495A (en) * 2014-10-13 2015-01-14 Hunan Agricultural University Method for task scheduling of shared cache of multi-core processor

Non-Patent Citations (3)

* Cited by examiner, † Cited by third party
Title
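李雁冰 (Li Yanbing) et al.: "Loop tiling for heterogeneous multi-core processors" (面向异构多核处理器的循环分块), Computer Engineering and Design (《计算机工程与设计》) *
杨阳 (Yang Yang): "Research on task scheduling for embedded heterogeneous multi-core processors" (嵌入式异构多核处理器的任务调度研究), China Master's Theses Full-text Database, Information Science and Technology (《中国优秀硕士学位论文全文数据库 信息科技辑》) *
许瑾晨 (Xu Jinchen) et al.: "Memory access optimization method for a math function library on the slave cores of heterogeneous many-core processors" (面向异构众核从核的数学函数库访存优化方法), Computer Science (《计算机科学》) *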

Cited By (5)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107885585A (en) * 2016-09-30 2018-04-06 Robert Bosch GmbH Dynamic task scheduling device in a multi-core electronic control unit
CN112540936A (en) * 2019-09-23 2021-03-23 Wuxi Jiangnan Computing Technology Institute Discrete memory access read-write method oriented to heterogeneous many-core architecture
CN113568718A (en) * 2020-04-29 2021-10-29 北京希姆计算科技有限公司 Task allocation method and device, electronic equipment and computer readable storage medium
WO2021218492A1 (en) * 2020-04-29 2021-11-04 北京希姆计算科技有限公司 Task allocation method and apparatus, electronic device, and computer readable storage medium
EP4145283A4 (en) * 2020-04-29 2023-09-06 Stream Computing Inc Task allocation method and apparatus, electronic device, and computer readable storage medium

Also Published As

Publication number Publication date
CN105468567B (en) 2018-02-06

Similar Documents

Publication Publication Date Title
US10609129B2 (en) Method and system for multi-tenant resource distribution
Gregg et al. Fine-Grained Resource Sharing for Concurrent GPGPU Kernels
US9659081B1 (en) Independent data processing environments within a big data cluster system
DE102013114072B4 (en) System and method for hardware scheduling of indexed barriers
CN102099789B (en) Multi-dimensional thread grouping for multiple processors
CN101799773B (en) Memory access method of parallel computing
CN103765376A (en) Graphics processor with non-blocking concurrent architecture
CN101551761A (en) Method for sharing stream memory of heterogeneous multi-processor
CN104834505B (en) Synchronization method for NUMA (Non Uniform Memory Access) sensing under multi-core and multi-thread environment
CN101366004A (en) Methods and apparatus for multi-core processing with dedicated thread management
JP2008191949A (en) Multi-core system, and method for distributing load of the same
US11720496B2 (en) Reconfigurable cache architecture and methods for cache coherency
Puri et al. A parallel algorithm for clipping polygons with improved bounds and a distributed overlay processing system using mpi
US10725940B2 (en) Reallocate memory pending queue based on stall
CN105468567A (en) Heterogeneous many-core discrete memory access optimization method
CN105718315A (en) Task processing method and server
CN105637482A (en) Method and device for processing data stream based on gpu
US7647482B2 (en) Methods and apparatus for dynamic register scratching
KR102253788B1 (en) Methods of and apparatus for multidimensional indexing in microprocessor systems
CN104156271B (en) Method and system for load balancing in a cooperative computing cluster
CN107851041A (en) The dynamic tuning of multiprocessor/multi-core computing system
CN103197918B (en) Hyperchannel timeslice group
Hoffmann et al. Performance evaluation of task pools based on hardware synchronization
Valero et al. Towards a more efficient use of gpus
Chatterjee et al. Data structures and algorithms for counting problems on graphs using gpu

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant