CN106155814B - Reconfigurable arithmetic unit supporting multiple working modes and working method thereof - Google Patents

Reconfigurable arithmetic unit supporting multiple working modes and working method thereof

Info

Publication number
CN106155814B
CN106155814B (application CN201610523519.7A)
Authority
CN
China
Prior art keywords
data
cache unit
layer interface
operand cache
controller
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Active
Application number
CN201610523519.7A
Other languages
Chinese (zh)
Other versions
CN106155814A (en)
Inventor
宋宇鲲 (Song Yukun)
李浩洋 (Li Haoyang)
张多利 (Zhang Duoli)
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
Hefei University of Technology
Original Assignee
Hefei University of Technology
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by Hefei University of Technology filed Critical Hefei University of Technology
Priority to CN201610523519.7A
Publication of CN106155814A
Application granted
Publication of CN106155814B
Legal status: Active
Anticipated expiration

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46Multiprogramming arrangements
    • G06F9/50Allocation of resources, e.g. of the central processing unit [CPU]
    • G06F9/5061Partitioning or combining of resources
    • G06F9/5066Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F15/00Digital computers in general; Data processing equipment in general
    • G06F15/16Combinations of two or more digital computers each having at least an arithmetic unit, a program unit and a register, e.g. for a simultaneous processing of several programs
    • G06F15/163Interprocessor communication
    • G06F15/173Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake
    • G06F15/17306Intercommunication techniques
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00Indexing scheme relating to G06F9/00
    • G06F2209/50Indexing scheme relating to G06F9/50
    • G06F2209/5022Workload threshold

Landscapes

  • Engineering & Computer Science (AREA)
  • Theoretical Computer Science (AREA)
  • Software Systems (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Computer Hardware Design (AREA)
  • Mathematical Physics (AREA)
  • Small-Scale Networks (AREA)
  • Memory System Of A Hierarchy Structure (AREA)

Abstract

The invention discloses a reconfigurable arithmetic unit supporting multiple working modes, and a working method thereof. The reconfigurable arithmetic unit comprises a control layer, an operation layer and a storage layer. The control layer comprises a state layer interface, a configuration layer interface, a data layer interface, an address generator and a controller; the operation layer comprises an arithmetic unit; the storage layer comprises a source operand cache unit and a destination operand cache unit. The reconfigurable arithmetic unit supports three working modes (storage operation mode, pulsation operation mode and stream operation mode), which gives the algorithm mapping of a computing system greater flexibility. When mapping tasks onto the computing system, the specific working mode of each reconfigurable arithmetic unit can be selected according to the characteristics and bottleneck of the algorithm to be mapped, combined with the actual network-communication and memory-bandwidth conditions of the system, so as to balance operation throughput against network-communication and memory-access pressure and improve the working efficiency of the whole system.

Description

Reconfigurable arithmetic unit supporting multiple working modes and working method thereof
Technical field
The present invention relates to the field of high-density computing and digital signal processing systems, and in particular to a reconfigurable arithmetic unit for a chip multi-core computing system and a working method thereof.
Background technique
Application-specific integrated circuits (ASICs) and general-purpose processors (GPPs) are two common kinds of data-processing hardware. An ASIC is designed for a specific application: its operation efficiency is high, but it lacks versatility. A GPP is used for general-purpose computing: it is highly flexible, but its operation efficiency is low relative to an ASIC. Reconfigurable computing strikes a balance between the efficiency of an ASIC and the versatility of a GPP: while guaranteeing versatility within a given domain, it obtains efficiency higher than that of a GPP, and is a common way of organizing computing power in current multi-core computing systems.
Multi-core technology, with its low power consumption, strong parallel-processing capability and excellent computing performance, has become the mainstream of processor design. However, achieving efficient communication between operation cores and efficient mapping of tasks onto them directly determines whether the computing power of a multi-core system can be exploited, and is a major issue that current multi-core systems face.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes a reconfigurable arithmetic unit supporting three working modes (storage operation mode, pulsation operation mode and stream operation mode) and a working method thereof, so as to provide greater flexibility for the algorithm mapping of a coarse-grained computing system. When mapping tasks onto the computing system, the specific working mode of each reconfigurable arithmetic unit can be selected according to the characteristics and bottleneck of the algorithm to be mapped, combined with the actual network-communication and memory-bandwidth conditions of the system, so as to balance operation throughput against network-communication and memory-access pressure and improve the working efficiency of the whole system.
The technical scheme adopted by the present invention to achieve this purpose is as follows:
The reconfigurable arithmetic unit supporting multiple working modes of the present invention is mounted on any two routing nodes of a network-on-chip, and is characterized in that the reconfigurable arithmetic unit comprises a control layer, an operation layer and a storage layer;
The control layer comprises a state layer interface, a configuration layer interface, a data layer interface, an address generator and a controller;
The operation layer comprises an arithmetic unit;
The storage layer comprises a source operand cache unit and a destination operand cache unit;
The working modes of the reconfigurable arithmetic unit comprise a storage operation mode, a pulsation operation mode and a stream operation mode;
The configuration layer interface of the reconfigurable arithmetic unit receives the configuration information sent by the network-on-chip and judges which working mode is selected;
If the storage operation mode is selected, the configuration information contains data blocking information. The state layer interface sends a data request message to the source node through the network-on-chip according to the data blocking information; the data layer interface receives one data block through the network-on-chip as the current data block and stores it into the source operand cache unit;
After the data block has been stored, the current data block in the source operand cache unit is read, under the control of the controller, according to the addresses generated by the address generator, and fed into the arithmetic unit for computation; the resulting operation results are stored into the destination operand cache unit;
After a whole data block has been fed into the arithmetic unit, the state layer interface again sends a data request message to the source node through the network-on-chip according to the data blocking information, in order to receive the next data block;
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the computation of the current data block has finished, the state layer interface reads the operation results from the destination operand cache unit and sends them to the destination node through the network-on-chip, thereby completing the processing of the current data block;
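The storage-mode flow above (buffer a whole block, then compute in an order chosen by the address generator) can be sketched behaviorally in Python. All names here are illustrative assumptions, not the patent's implementation, and the address generator is reduced to a function returning a read order:

```python
def storage_mode_process(blocks, compute, addr_order):
    """Storage mode sketch: each block is fully buffered before any
    computation starts; an address-generator function decides the read
    order within the block, fusing data selection with computation."""
    results = []
    for block in blocks:                      # one data request per block
        src_cache = list(block)               # block fully stored first
        dst_cache = [compute(src_cache[a]) for a in addr_order(len(src_cache))]
        results.append(dst_cache)             # then sent to the destination node
    return results

# Example: square each element, reading each block in reverse order
out = storage_mode_process([[1, 2, 3], [4, 5, 6]],
                           lambda x: x * x,
                           lambda n: range(n - 1, -1, -1))
# out == [[9, 4, 1], [36, 25, 16]]
```

The address-order function stands in for the configurable jump rules of the address generator; a looping order would model the data-reuse feature mentioned later in the text.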
If the pulsation operation mode is selected, the configuration information likewise contains data blocking information. The state layer interface sends a data request message to the source node through the network-on-chip according to the data blocking information; the data layer interface receives one data block through the network-on-chip as the current data block and stores it into the source operand cache unit;
While the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the current data block from the source operand cache unit and performs the computation; the resulting operation results are stored into the destination operand cache unit;
If the source operand cache unit is empty, reading from it stops immediately and the state of the arithmetic unit is preserved (scene protection) until the source operand cache unit becomes non-empty again;
After a whole data block has been fed into the arithmetic unit, the state layer interface again sends a data request message to the source node through the network-on-chip according to the data blocking information, in order to receive the next data block;
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation results from the destination operand cache unit and sends them to the destination node through the network-on-chip, thereby completing the processing of the current data block;
If the stream operation mode is selected, the configuration information contains total-data-amount information. The state layer interface sends a data request message to the source node through the network-on-chip according to the total-data-amount information; the data layer interface receives the data stream through the network-on-chip and buffers it in the source operand cache unit. When the amount of data in the source operand cache unit exceeds its upper storage threshold, the data layer interface sends a source pending signal to the source node through the network-on-chip; when the amount of data in the source operand cache unit falls below its lower storage threshold, the data layer interface revokes the source pending signal through the network-on-chip, so that the data stream continues to be received, until the total amount of data has been received;
While the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data stream from the source operand cache unit and performs the computation; the resulting operation results are stored into the destination operand cache unit. During the computation of the data stream, if the source operand cache unit becomes empty, the arithmetic unit performs a scene protection operation; when the source operand cache unit becomes non-empty again, the scene protection is revoked and computation on the data stream continues;
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation results from the destination operand cache unit and sends them to the destination node through the network-on-chip;
When the amount of data in the destination operand cache unit exceeds its upper storage threshold, the reading of the data stream from the source operand cache unit is paused under the control of the controller; when the amount of data falls below the lower storage threshold, the reading of the data stream resumes under the control of the controller. When the data layer interface receives, through the network-on-chip, a destination pending signal sent by the destination node, the data layer interface pauses reading operation results from the destination operand cache unit and resumes when the destination pending signal is revoked, thereby completing the processing of the data stream.
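The upper/lower threshold handshake of the stream mode behaves like a watermark-controlled FIFO with hysteresis. A minimal Python sketch follows; the class name and threshold values are illustrative assumptions, not the patent's values:

```python
class WatermarkFifo:
    """Sketch of the source-operand cache flow control in stream mode:
    assert a 'pending' (pause) signal when occupancy crosses the upper
    threshold, and revoke it only when occupancy falls below the lower
    threshold, so the signal does not toggle on every element."""
    def __init__(self, upper, lower):
        self.buf, self.upper, self.lower = [], upper, lower
        self.pending = False          # source pending signal to the sender

    def push(self, x):
        self.buf.append(x)
        if len(self.buf) > self.upper:
            self.pending = True       # ask the source node to pause the stream

    def pop(self):
        x = self.buf.pop(0)
        if self.pending and len(self.buf) < self.lower:
            self.pending = False      # revoke pending; the stream resumes
        return x

f = WatermarkFifo(upper=4, lower=2)
for v in range(6):
    f.push(v)
assert f.pending                      # crossed the upper threshold
while len(f.buf) >= 2:
    f.pop()
assert not f.pending                  # dropped below the lower threshold
```

The gap between the two thresholds is what gives the hysteresis: it matches the text's rationale that avoiding frequent assertion and revocation of pending signals reduces power consumption.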
The working method of the reconfigurable arithmetic unit supporting multiple working modes of the present invention is applied in a coarse-grained computing system, and is characterized in that the reconfigurable arithmetic unit comprises a control layer, an operation layer and a storage layer;
The control layer comprises a state layer interface, a configuration layer interface, a data layer interface, an address generator and a controller;
The operation layer comprises an arithmetic unit;
The storage layer comprises a source operand cache unit and a destination operand cache unit;
The working modes of the reconfigurable arithmetic unit comprise a storage operation mode, a pulsation operation mode and a stream operation mode;
The working method proceeds as follows:
Step 1: after the reconfigurable arithmetic unit receives the configuration information through the configuration layer interface, the host state machine of the controller jumps to the mode selection state F_CALMODE;
Step 2: in the mode selection state F_CALMODE, whether the storage operation mode is selected is judged from the configuration information. If so, the configuration information contains data blocking information, the host state machine of the controller jumps to the storage import state FS_SRC, and step 3 is executed; otherwise the host state machine of the controller jumps to the stream operation state FF_CAL and step 7 is executed;
Step 3: initialize the block counter i = 0;
Step 4: the controller sends a data request message through the state layer interface according to the data blocking information, receives the i-th data block through the data layer interface as the current data block, and stores it into the source operand cache unit;
When the current data block has been completely received, an operation start signal cal_start_w is generated and the host state machine of the controller jumps to the storage execution state FS_CAL;
Step 5: the controller reads the current data block from the source operand cache unit according to the addresses generated by the address generator and performs the computation, storing the operation results into the destination operand cache unit; at the same time, the amount of data of the block already computed is counted. When the count reaches the data amount of the current data block specified in the configuration information, the controller generates an operation end signal cal_finish_w, and the host state machine of the controller jumps to the storage wait state FS_OVERHEAD, completing the operation control of the current data block;
Step 6: judge whether the block counter i has reached the preset number of blocks n. If so, an all-operations-finished signal all_cal_finished_w is generated, the host state machine of the controller jumps to the end state F_END, and step 10 is executed. Otherwise the state layer interface, under the control of the controller, reads the operation results from the destination operand cache unit and sends them to the destination node through the network-on-chip; after the processing of the current data block has thus been completed, i+1 is assigned to i and step 4 is executed again;
Step 7: the controller sends a data request message through the state layer interface and receives a data block or data stream through the data layer interface. While the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data block or data stream from the source operand cache unit and performs the computation, storing the operation results into the destination operand cache unit; at the same time, the amount of data already computed in the block or stream is counted;
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation results from the destination operand cache unit and sends them to the destination node through the network-on-chip;
When the count reaches the data amount specified in the configuration information, the controller generates the operation end signal cal_finish_w;
Step 8: judge whether the pulsation operation mode is selected. If so, the host state machine of the controller jumps to the pulsation wait state FF_OVERHEAD and step 9 is executed; otherwise the mode is the stream operation mode, the host state machine of the controller jumps to the end state F_END, and step 10 is executed;
Step 9: judge whether the block counter i has reached the preset number of blocks n. If so, the host state machine of the controller jumps to the end state F_END and step 10 is executed; otherwise the block counter i is incremented by 1 and step 7 is executed again;
Step 10: wait until the source operand cache unit is empty; the host state machine of the controller then jumps to the idle state F_IDLE.
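The ten steps above can be traced as a simplified reconstruction of the host state machine. The state names come from the patent text; the transition logic below is an assumption distilled from steps 1 to 10, not the actual RTL:

```python
def host_state_machine(mode, n_blocks):
    """Trace the controller's main states for a run of n_blocks blocks.
    Storage mode loops FS_SRC -> FS_CAL -> FS_OVERHEAD per block (steps
    3-6); pulsation mode loops FF_CAL -> FF_OVERHEAD per block (steps
    7-9); stream mode makes one FF_CAL pass over the whole stream."""
    trace = ["F_CALMODE"]               # step 1: mode selection
    if mode == "storage":
        for _ in range(n_blocks):
            trace += ["FS_SRC", "FS_CAL", "FS_OVERHEAD"]
    elif mode == "pulse":
        for _ in range(n_blocks):
            trace += ["FF_CAL", "FF_OVERHEAD"]
    else:                               # stream: no per-block looping
        trace += ["FF_CAL"]
    trace += ["F_END", "F_IDLE"]        # step 10: drain and return to idle
    return trace

# e.g. two blocks in pulsation mode
assert host_state_machine("pulse", 2) == [
    "F_CALMODE", "FF_CAL", "FF_OVERHEAD",
    "FF_CAL", "FF_OVERHEAD", "F_END", "F_IDLE"]
```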
Compared with the prior art, the advantageous effects of the present invention are as follows:
1. The present invention supports three working modes: storage operation mode, pulsation operation mode and stream operation mode. This gives the algorithm mapping of an on-chip multi-core computing system greater flexibility and gives the programmer more mapping choices. In all working modes the reconfigurable arithmetic unit has the same external characteristics and uses the same network interface protocol, which makes it easy to integrate and use in the computing system. When mapping tasks onto the computing system, the specific working mode of each reconfigurable arithmetic unit can be selected according to the characteristics and bottleneck of the algorithm to be mapped, combined with the actual network-communication and memory-bandwidth conditions of the system, so as to balance operation throughput against network-communication and memory-access pressure and improve the working efficiency of the whole computing system. Under the control of the master controller of the computing system, the working mode of a reconfigurable arithmetic unit can be switched dynamically between the pulsation operation mode and the stream operation mode, realizing priority management of tasks and making the task execution process of the system more controllable.
2. When the present invention works in the storage operation mode, the data are stored in the source operand cache unit in advance. With the cooperation of the address generator, a variety of flexible address jump rules can be realized, fusing the two steps of data selection and computation; reordering and other pre-operations on the pending data before computation are reduced or even avoided, which lowers the difficulty of mapping complex algorithms for the programmer and raises the working efficiency of the reconfigurable arithmetic unit. By configuring the loop function of the address generator, data reuse can also be realized, reducing the number of data transfers of the reconfigurable arithmetic unit and raising the share of computation time in the total task time. For algorithms with a high data-reuse rate, such as matrix multiplication and the fast Fourier transform and its inverse, this has very high application value: it can greatly improve computational efficiency and shorten the total time of the computing task. In the storage operation mode the control of the reconfigurable arithmetic unit is simple and the debugging difficulty during task mapping is low, which facilitates the fast mapping of complex algorithms. In this mode, the next step only starts after the previous operation on a data block has completed; the computation and transmission of one data block proceed continuously, the data transmission does not stall once the data link of the network-on-chip has been established, and the link utilization of the network-on-chip is high, which greatly reduces the pressure the arithmetic unit puts on network communication and memory bandwidth.
3. When the present invention works in the pulsation operation mode, data are processed in a pipeline with data blocks as the unit, so that the data transmission time and the computation time of the reconfigurable arithmetic unit partially overlap, improving the working efficiency of the system. Since a link must be re-established for the processing of each data block, the transmission path can be adjusted according to task priority and network occupancy to determine the execution order of operations; one-to-many service of a reconfigurable arithmetic unit can also be realized, giving better mapping flexibility and making full use of the computing power. In this mode the data scale of a data block is smaller than the capacities of the source operand cache unit and the destination operand cache unit inside the reconfigurable arithmetic unit, so no memory overflow errors occur and no flow-control logic is needed in the network-on-chip; the control and debugging difficulty is moderate, striking a balance between control complexity, flexibility and operation efficiency.
4. When the present invention works in the stream operation mode, the data transmission time and the computation time of the reconfigurable arithmetic unit overlap to the greatest possible extent. The data are processed in the form of a data stream, and only one link is established and released for the whole processing task, which reduces the clock cycles lost by the network-on-chip in repeatedly establishing and releasing links, and also reduces the clock cycles lost by the arithmetic unit in repeatedly entering and leaving the arithmetic pipeline, so a high operation efficiency can be obtained. In the computing system, several reconfigurable arithmetic units can be combined in cascade to complete a multi-stage operation; all cascaded reconfigurable arithmetic units then form one assembly line, further overlapping data transmission time and computation time and realizing ultra-deep pipelining structurally, which improves working efficiency further. The source operand cache and the destination operand cache are equipped with upper and lower thresholds, which both avoids memory overflow, guaranteeing the safety of the data, and reserves enough latency margin so that the data supply of the arithmetic unit does not stall, guaranteeing the working efficiency of the reconfigurable arithmetic unit; frequent assertion and revocation of pending signals is also avoided, reducing power consumption.
Detailed description of the invention
Fig. 1 is a structural diagram of the chip multi-core computing system targeted by the present invention;
Fig. 2 is a structural diagram of the present invention;
Fig. 3 is a schematic diagram of the host state machine of the controller of the present invention;
Fig. 4 is a schematic diagram of the storage operation mode of the present invention;
Fig. 5 is a schematic diagram of the pulsation operation mode of the present invention;
Fig. 6 is a schematic diagram of the stream operation mode of the present invention.
Specific embodiment
In this embodiment, a reconfigurable arithmetic unit supporting multiple working modes is mounted on any two routing nodes of the network-on-chip of the chip multi-core computing system shown in Fig. 1, and completes the data exchange with the network-on-chip through the local interface of the network-on-chip. Fig. 2 gives the structural block diagram of the invention: the reconfigurable arithmetic unit comprises a control layer, an operation layer and a storage layer;
The control layer comprises a state layer interface, a configuration layer interface, a data layer interface, an address generator and a controller;
The operation layer comprises an arithmetic unit;
The storage layer comprises a source operand cache unit and a destination operand cache unit;
The source operand cache unit and the destination operand cache unit are equipped with cache thresholds, whose values are set according to factors such as the scale of the network-on-chip of the computing system, the capacities of the source operand cache and the destination operand cache inside the reconfigurable arithmetic unit, and the number of pipeline stages of the arithmetic unit. In this design the thresholds are divided into an upper threshold and a lower threshold: the upper threshold guarantees the safety of data transmission by avoiding cache overflow and the resulting data loss, while the lower threshold avoids the reduction of the arithmetic unit's working efficiency caused by data starvation and, at the same time, reduces the repetition of control signals, reducing power consumption;
The data layer interface module provides a source pending signal and a destination pending signal, which are used to request, through the network-on-chip, that the node sending the data pause the transmission; the scene protection logic in the arithmetic unit protects the working state of the arithmetic unit while the data are stalled, preventing abnormal value jumps and data loss inside the arithmetic unit;
The working modes of the reconfigurable arithmetic unit comprise the storage operation mode, the pulsation operation mode and the stream operation mode. In all working modes the reconfigurable arithmetic unit has the same external characteristics and uses the same network interface protocol, which makes it convenient to integrate the reconfigurable arithmetic unit in the computing system and to make it cooperate with other arithmetic units. When the reconfigurable arithmetic unit works in the storage operation mode or the pulsation operation mode, if the data scale of the computing task is larger than the capacity of the source operand cache unit, the data must be "blocked": the data are first divided into several data blocks according to the capacity of the source operand cache unit, and each data block is then processed in turn;
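The "blocking" operation just described amounts to cutting the task data into chunks no larger than the source operand cache. A minimal sketch, assuming a simple linear data layout (the function name and layout are illustrative, not specified by the patent):

```python
def split_into_blocks(data, cache_capacity):
    """Blocking step used in the storage and pulsation modes: when the
    task's data do not fit in the source operand cache, cut them into
    blocks no larger than the cache capacity; the last block may be
    shorter. A straightforward reconstruction for illustration."""
    return [data[i:i + cache_capacity]
            for i in range(0, len(data), cache_capacity)]

blocks = split_into_blocks(list(range(10)), cache_capacity=4)
# blocks == [[0, 1, 2, 3], [4, 5, 6, 7], [8, 9]]
```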
The configuration layer interface of the reconfigurable arithmetic unit receives the configuration information sent by the network-on-chip and judges which working mode is selected;
If the storage operation mode is selected, the configuration information contains data blocking information. The state layer interface sends a data request message to the source node through the network-on-chip according to the data blocking information; the data layer interface receives one data block through the network-on-chip as the current data block and stores it into the source operand cache unit;
When the data block has been stored, the current data block in the source operand cache unit is read, under the control of the controller, according to the addresses generated by the address generator, and fed into the arithmetic unit for computation; the resulting operation results are stored into the destination operand cache unit;
After a whole data block has been fed into the arithmetic unit, the state layer interface again sends a data request message to the source node through the network-on-chip according to the data blocking information, in order to receive the next data block;
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the computation of the current data block has finished, the state layer interface reads the operation results from the destination operand cache unit and sends them to the destination node through the network-on-chip, thereby completing the processing of the current data block;
As shown in Fig. 4, in the storage operation mode the three steps of source data input, computation and operation result output are executed strictly in order with a data block as the unit. Before the computation starts, the entire data block has already been stored in the source operand cache unit of the reconfigurable arithmetic unit, so there is no temporal overlap between the steps of one data block; between different data blocks, however, source data input and operation result output can overlap in time, hiding the data transmission time to the greatest possible extent;
If the pulsation operation mode is selected, the configuration information contains data blocking information. The state layer interface sends a data request message to the source node through the network-on-chip according to the data blocking information; the data layer interface receives one data block through the network-on-chip as the current data block and stores it into the source operand cache unit;
While the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the current data block from the source operand cache unit and performs the computation; the resulting operation results are stored into the destination operand cache unit;
If the source operand cache unit is empty, reading from it stops immediately and the state of the arithmetic unit is preserved (scene protection) until the source operand cache unit becomes non-empty again;
After a whole data block has been fed into the arithmetic unit, the state layer interface again sends a data request message to the source node through the network-on-chip according to the data blocking information, in order to receive the next data block;
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation results from the destination operand cache unit and sends them to the destination node through the network-on-chip, thereby completing the processing of the current data block;
As shown in Fig. 5, in the pulsation operation mode, source data input, computation and operation result output are executed in order with a data block as the unit, but within the same data block the three steps of source data input, computation and operation result output partially overlap in time and proceed in a pipelined manner. The data blocks themselves are still computed in order: the computation of the next data block only starts after the computation of the previous data block has finished, with no temporal overlap, guaranteeing the safety of the data.
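The pipelined, stall-on-empty behavior of the pulsation mode can be sketched with a cycle-by-cycle toy model; the arrival pattern, function name and cycle granularity are assumptions for illustration only:

```python
def pulse_mode_consume(arrivals, compute):
    """Pulsation-mode consumer sketch: each cycle, at most one element
    arrives (None = no arrival). Computation runs whenever the source
    cache is non-empty; on an empty cache the datapath stalls with its
    state held ('scene protection'), and no result is produced that
    cycle, so input and computation overlap without losing data."""
    cache, results, stalls = [], [], 0
    for item in arrivals:
        if item is not None:
            cache.append(item)        # source data input
        if cache:
            results.append(compute(cache.pop(0)))   # computation
        else:
            stalls += 1               # pipeline frozen, nothing is lost
    return results, stalls

res, stalls = pulse_mode_consume([1, None, 2, 3, None], lambda x: x + 10)
# res == [11, 12, 13], stalls == 2
```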
If stream operation mode, indicate in configuration information to include total amount of data information, state layer interface is according to total amount of data Information sends data request information to source node by network-on-chip;Data layer interface receives data flow simultaneously by network-on-chip It is cached by source operand cache unit;When the data volume in source operand cache unit is more than its amount of storage upper threshold value, then Data layer interface is by network-on-chip transmission source pending signal to source node;When the data volume in source operand cache unit is lower than When its reserves lower threshold value, data layer interface cancels source pending signal to source node, to continue to data by network-on-chip Stream, the reception until completing total amount of data;
When the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data stream from the source operand cache unit, performs the operation, and stores the obtained operation results in the destination operand cache unit. During the computation of the data stream, if the source operand cache unit becomes empty, the arithmetic unit performs a context-save (scene-protection) operation; as soon as the source operand cache unit is non-empty again, the saved context is restored and computation on the data stream continues;
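The context-save behaviour can be illustrated with a toy ALU whose internal state is simply held while the source cache is empty. The running-sum operation and the use of `None` to model an empty-source cycle are assumptions for illustration, not the patent's actual datapath:

```python
class StreamingAlu:
    """Toy streaming ALU: on an empty-source cycle (modelled as None)
    the internal state is frozen (context saved), so the results after
    the stall match those of an uninterrupted stream."""
    def __init__(self) -> None:
        self.acc = 0  # internal context, preserved across stalls

    def step(self, word):
        if word is None:       # source cache empty: stall, hold context
            return None
        self.acc += word       # hypothetical operation: running sum
        return self.acc
```

Feeding the words 1, 2, 3 with stall cycles interleaved yields the same partial sums 1, 3, 6 as the stall-free stream, which is exactly what the scene-protection mechanism guarantees.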
When the configuration-layer interface receives the destination node's request transmitted over the on-chip network and the destination operand cache unit is non-empty, the state-layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the on-chip network;
When the amount of data in the destination operand cache unit exceeds its upper storage threshold, the source operand cache unit, under the control of the controller, pauses the reading of the data stream; when the amount of data in the source operand cache unit falls below its lower storage threshold, the reading of the data stream continues under the control of the controller. When the data-layer interface receives a destination-suspend signal sent by the destination node over the on-chip network, the data-layer interface pauses reading operation results from the destination operand cache unit; when the destination-suspend signal is revoked, reading of the operation results continues. The processing of the data stream is thereby completed.
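The watermark-based flow control used in the stream operation mode can be sketched as a small FIFO model with hysteresis: crossing the upper threshold raises a suspend (pending) signal upstream, and draining below the lower threshold revokes it. The capacity and threshold values below are illustrative assumptions:

```python
from collections import deque

class OperandCache:
    """Operand cache unit with upper/lower watermark hysteresis,
    modelling the source/destination suspend signalling."""
    def __init__(self, capacity: int = 16, upper: int = 12, lower: int = 4):
        self.buf = deque()
        self.capacity, self.upper, self.lower = capacity, upper, lower
        self.suspend = False  # the "pending" signal sent upstream

    def push(self, word) -> None:
        assert len(self.buf) < self.capacity, "overflow: suspend was ignored"
        self.buf.append(word)
        if len(self.buf) > self.upper:
            self.suspend = True    # ask the sender to pause the stream

    def pop(self):
        word = self.buf.popleft()
        if self.suspend and len(self.buf) < self.lower:
            self.suspend = False   # revoke the signal: resume the stream
        return word
```

The gap between the two thresholds keeps the suspend signal from toggling on every word, which matters when the signal has to travel across the on-chip network with some latency.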
As shown in Figure 6, in the stream operation mode, source-data input, operation execution and result output are carried out on the data as a single stream. The pending data does not need to be partitioned into blocks: whatever the data scale of the computing task, it is processed as one sequential data stream, and the three steps of source-data input, operation execution and result output essentially coincide in time. This hides the data-transfer time to the greatest possible extent and yields high data-processing efficiency.
In the present embodiment, a working method of a reconfigurable arithmetic unit supporting multiple working modes is applied in a computing system based on an on-chip network. The reconfigurable arithmetic unit includes a control layer, an operation layer and a storage layer;
The control layer includes: a state-layer interface, a configuration-layer interface, a data-layer interface, an address generator and a controller;
The operation layer includes: an arithmetic unit;
The storage layer includes: a source operand cache unit and a destination operand cache unit;
The working modes of the reconfigurable arithmetic unit include a storage operation mode, a pulsation operation mode and a stream operation mode;
As shown in Figure 3, the working method proceeds as follows:
Step 1: after the reconfigurable arithmetic unit has received the configuration information through the configuration-layer interface, the master state machine of the controller jumps to the configuration-complete state F_CFG;
Step 2: judge, according to the configuration information, whether the operation needs imported parameters. If parameters are needed, the master state machine of the controller jumps to the parameter-import state F_PARAMETER and step 3 is executed; otherwise the master state machine of the controller jumps directly to the mode-selection state F_CALMODE and step 4 is executed;
Step 3: in the parameter-import state F_PARAMETER, wait to receive the parameter-import-finished signal parameter_done_i, which marks the end of parameter import, then jump to the mode-selection state F_CALMODE;
Step 4: in the mode-selection state F_CALMODE, judge according to the configuration information whether the storage operation mode is selected. If so, the configuration information contains data-block partition information, the master state machine jumps to the storage-operation import state FS_SRC, and step 5 is executed; otherwise the master state machine jumps to the stream-operation state FF_CAL and step 11 is executed;
Step 5: initialize i = 0 and j = 0;
Step 6: the controller sends a data request message through the state-layer interface according to the data-block partition information, receives the i-th data block as the current data block through the data-layer interface, and stores it into the source operand cache unit;
When the current data block has been completely received, the operation start signal cal_start_w is generated and the master state machine jumps to the storage-operation execution state FS_CAL;
Step 7: the controller reads the current data block from the source operand cache unit according to the addresses generated by the address generator and performs the operation, storing the obtained operation results in the destination operand cache unit. At the same time, the amount of data processed within the data block is counted; when the count reaches the data amount of the current data block specified in the configuration information, the controller generates the operation end signal cal_finish_w;
Step 8: if the operation type in the configuration information is FFT or IFFT, the master state machine of the controller jumps to the FFT operation state FS_FFT and step 9 is executed; otherwise the master state machine of the controller jumps to the storage wait state FS_OVERHEAD and step 10 is executed, completing the operation control processing of the current data block;
Step 9: if the FFT iteration counter j has not reached the preset value in the configuration information, the data interfaces of the source operand cache and the destination operand cache are swapped, the iteration counter j is incremented by 1, and step 7 is re-executed; otherwise, jump to the storage wait state FS_OVERHEAD, clear the FFT iteration counter j so that j = 0, and execute step 10;
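The buffer swap in step 9 is a classic ping-pong scheme: each FFT pass reads from one operand cache and writes to the other, and their roles are exchanged between passes so that no data is ever copied back. A minimal sketch, in which `butterfly_pass` is a placeholder standing in for one FFT stage (an assumption for illustration):

```python
def run_fft_iterations(src, dst, iterations, butterfly_pass):
    """Run `iterations` FFT passes, swapping the roles of the source
    and destination operand caches after each pass (as in step 9)."""
    for _ in range(iterations):
        dst[:] = butterfly_pass(src)  # one pass: read src, write dst
        src, dst = dst, src           # exchange the data interfaces
    return src                        # buffer now holding the result
```

With a stand-in pass that doubles every element, three iterations multiply the input by 8; only the interface roles change between passes, never the buffer contents.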
Step 10: judge whether the batch counter i has reached the preset batch value n. If so, the all-operations-finished signal all_cal_finished_w is generated, the master state machine of the controller jumps to the end state F_END, and step 14 is executed. Otherwise, the state-layer interface, under the control of the controller, reads the operation results from the destination operand cache unit and sends them to the destination node through the on-chip network; after the processing of the current data block is completed, i + 1 is assigned to i and step 6 is re-executed;
Step 11: the controller sends a data request message through the state-layer interface and receives a data block or data stream through the data-layer interface. When the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data block or data stream from the source operand cache unit and performs the operation, storing the obtained operation results in the destination operand cache unit. At the same time, the amount of data processed in the data block or data stream is counted;
When the configuration-layer interface receives the destination node's request transmitted over the on-chip network and the destination operand cache unit is non-empty, the state-layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the on-chip network;
When the counted data amount reaches the amount specified in the configuration information, the controller generates the operation end signal cal_finish_w;
Step 12: judge whether the pulsation operation mode is selected. If so, the master state machine jumps to the pulsation wait state FF_OVERHEAD and step 13 is executed; otherwise the stream operation mode is in effect, the master state machine jumps to the end state F_END, and step 14 is executed;
Step 13: judge whether the batch counter i has reached the preset batch value n. If so, the master state machine jumps to the end state F_END and step 14 is executed; otherwise the batch counter i is incremented by 1 and step 11 is re-executed;
Step 14: wait until the source operand cache unit is empty, then jump to the finish state F_FINISH and send a computing-power release request to the system master controller through the state-layer interface;
Step 15: after the computing-power release request has been sent, the master state machine of the controller jumps to the idle state F_IDLE.
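Steps 1 to 15 together describe the controller's master state machine, whose transitions can be condensed into a single table-style function. This is a behavioural sketch only: the boolean condition names below are assumptions chosen for readability, not signal names from the original, while the state names match those in the text:

```python
from enum import Enum, auto

class S(Enum):
    F_IDLE = auto(); F_CFG = auto(); F_PARAMETER = auto(); F_CALMODE = auto()
    FS_SRC = auto(); FS_CAL = auto(); FS_FFT = auto(); FS_OVERHEAD = auto()
    FF_CAL = auto(); FF_OVERHEAD = auto(); F_END = auto(); F_FINISH = auto()

def next_state(state, *, cfg_done=False, needs_params=False, param_done=False,
               storage_mode=False, cal_start=False, cal_finish=False,
               is_fft=False, fft_done=False, all_done=False,
               pulsation=False, batch_done=False, src_empty=False,
               released=False):
    """One-step transition function for the master state machine,
    condensed from steps 1-15 (condition names are assumptions)."""
    if state is S.F_IDLE and cfg_done:
        return S.F_CFG                                       # step 1
    if state is S.F_CFG:
        return S.F_PARAMETER if needs_params else S.F_CALMODE  # step 2
    if state is S.F_PARAMETER and param_done:
        return S.F_CALMODE                                   # step 3
    if state is S.F_CALMODE:
        return S.FS_SRC if storage_mode else S.FF_CAL        # step 4
    if state is S.FS_SRC and cal_start:
        return S.FS_CAL                                      # step 6
    if state is S.FS_CAL and cal_finish:
        return S.FS_FFT if is_fft else S.FS_OVERHEAD         # steps 7-8
    if state is S.FS_FFT and fft_done:
        return S.FS_OVERHEAD                                 # step 9
    if state is S.FS_OVERHEAD:
        return S.F_END if all_done else S.FS_SRC             # step 10
    if state is S.FF_CAL and cal_finish:
        return S.FF_OVERHEAD if pulsation else S.F_END       # step 12
    if state is S.FF_OVERHEAD:
        return S.F_END if batch_done else S.FF_CAL           # step 13
    if state is S.F_END and src_empty:
        return S.F_FINISH                                    # step 14
    if state is S.F_FINISH and released:
        return S.F_IDLE                                      # step 15
    return state  # no enabling condition: hold the current state
```

Flattening the control flow into one transition function makes the mode-dependent paths (FS_* for storage/pulsation, FF_* for stream) easy to check against the step list.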

Claims (2)

1. A reconfigurable arithmetic unit supporting multiple working modes, mounted on any two routing nodes of an on-chip network, characterized in that the reconfigurable arithmetic unit comprises: a control layer, an operation layer and a storage layer;
The control layer includes: a state-layer interface, a configuration-layer interface, a data-layer interface, an address generator and a controller;
The operation layer includes: an arithmetic unit;
The storage layer includes: a source operand cache unit and a destination operand cache unit;
The working modes of the reconfigurable arithmetic unit include a storage operation mode, a pulsation operation mode and a stream operation mode;
The configuration-layer interface of the reconfigurable arithmetic unit judges, from the configuration information sent over the on-chip network, which working mode is to be used;
In the storage operation mode, the configuration information contains data-block partition information; the state-layer interface sends a data request message to the source node through the on-chip network according to the data-block partition information, and the data-layer interface receives one data block as the current data block through the on-chip network and stores it into the source operand cache unit;
After the data block has been stored, the arithmetic unit, under the control of the controller and according to the addresses generated by the address generator, reads the current data block from the source operand cache unit, executes the operation, and stores the obtained operation results in the destination operand cache unit;
After the entire data block has been fed into the arithmetic unit, the state-layer interface sends a data request message to the source node again through the on-chip network according to the data-block partition information, so as to receive the next data block;
When the configuration-layer interface receives the destination node's request transmitted over the on-chip network and the operation on the current data block has been completed, the state-layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the on-chip network, thereby completing the processing of the current data block;
In the pulsation operation mode, the configuration information likewise contains data-block partition information; the state-layer interface sends a data request message to the source node through the on-chip network according to the data-block partition information, and the data-layer interface receives one data block as the current data block through the on-chip network and stores it into the source operand cache unit;
When the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the current data block from the source operand cache unit and performs the operation, storing the obtained operation results in the destination operand cache unit;
If the source operand cache unit is empty, reading data from the source operand cache unit stops immediately and the context of the arithmetic unit is saved (scene protection), until the source operand cache unit becomes non-empty;
After the entire data block has been fed into the arithmetic unit, the state-layer interface sends a data request message to the source node again through the on-chip network according to the data-block partition information, in order to receive the next data block;
When the configuration-layer interface receives the destination node's request transmitted over the on-chip network and the destination operand cache unit is non-empty, the state-layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the on-chip network, thereby completing the processing of the current data block;
In the stream operation mode, the configuration information contains total-data-amount information; the state-layer interface sends a data request message to the source node through the on-chip network according to the total-data-amount information, and the data-layer interface receives the data stream through the on-chip network and buffers it in the source operand cache unit. When the amount of data in the source operand cache unit exceeds its upper storage threshold, the data-layer interface sends a source-suspend signal to the source node through the on-chip network; when the amount of data in the source operand cache unit falls below its lower storage threshold, the data-layer interface revokes the source-suspend signal to the source node through the on-chip network, so that the data stream continues to be received until the total amount of data has arrived;
When the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data stream from the source operand cache unit, performs the operation, and stores the obtained operation results in the destination operand cache unit. During the computation of the data stream, if the source operand cache unit becomes empty, the arithmetic unit performs a context-save operation; once the source operand cache unit is non-empty again, the saved context is restored and computation on the data stream continues;
When the configuration-layer interface receives the destination node's request transmitted over the on-chip network and the destination operand cache unit is non-empty, the state-layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the on-chip network;
When the amount of data in the destination operand cache unit exceeds its upper storage threshold, the source operand cache unit, under the control of the controller, pauses the reading of the data stream; when the amount of data in the source operand cache unit falls below its lower storage threshold, the reading of the data stream continues under the control of the controller. When the data-layer interface receives a destination-suspend signal sent by the destination node over the on-chip network, the data-layer interface pauses reading operation results from the destination operand cache unit; when the destination-suspend signal is revoked, reading of the operation results continues, thereby completing the processing of the data stream.
2. A working method of the reconfigurable arithmetic unit supporting multiple working modes, applied in a coarse-grained computing system, characterized in that the reconfigurable arithmetic unit comprises: a control layer, an operation layer and a storage layer;
The control layer includes: a state-layer interface, a configuration-layer interface, a data-layer interface, an address generator and a controller;
The operation layer includes: an arithmetic unit;
The storage layer includes: a source operand cache unit and a destination operand cache unit;
The working modes of the reconfigurable arithmetic unit include a storage operation mode, a pulsation operation mode and a stream operation mode;
The working method proceeds as follows:
Step 1: after the reconfigurable arithmetic unit has received the configuration information through the configuration-layer interface, the master state machine of the controller jumps to the mode-selection state F_CALMODE;
Step 2: in the mode-selection state F_CALMODE, judge according to the configuration information whether the storage operation mode is selected. If so, the configuration information contains data-block partition information, the master state machine of the controller jumps to the storage-operation import state FS_SRC, and step 3 is executed; otherwise, the master state machine of the controller jumps to the stream-operation state FF_CAL and step 7 is executed;
Step 3: initialize i = 0;
Step 4: the controller sends a data request message through the state-layer interface according to the data-block partition information, receives the i-th data block as the current data block through the data-layer interface, and stores it into the source operand cache unit;
When the current data block has been completely received, the operation start signal cal_start_w is generated and the master state machine of the controller jumps to the storage-operation execution state FS_CAL;
Step 5: the controller reads the current data block from the source operand cache unit according to the addresses generated by the address generator and performs the operation, storing the obtained operation results in the destination operand cache unit. At the same time, the amount of data processed within the data block is counted; when the count reaches the data amount of the current data block specified in the configuration information, the controller generates the operation end signal cal_finish_w, and the master state machine of the controller jumps to the storage wait state FS_OVERHEAD, completing the operation control processing of the current data block;
Step 6: judge whether the batch counter i has reached the preset batch value n. If so, the all-operations-finished signal all_cal_finished_w is generated, the master state machine of the controller jumps to the end state F_END, and step 10 is executed. Otherwise, the state-layer interface, under the control of the controller, reads the operation results from the destination operand cache unit and sends them to the destination node through the on-chip network; after the processing of the current data block is completed, i + 1 is assigned to i and step 4 is re-executed;
Step 7: the controller sends a data request message through the state-layer interface and receives a data block or data stream through the data-layer interface. When the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data block or data stream from the source operand cache unit and performs the operation, storing the obtained operation results in the destination operand cache unit. At the same time, the amount of data processed in the data block or data stream is counted;
When the configuration-layer interface receives the destination node's request transmitted over the on-chip network and the destination operand cache unit is non-empty, the state-layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the on-chip network;
When the counted data amount reaches the amount specified in the configuration information, the controller generates the operation end signal cal_finish_w;
Step 8: judge whether the pulsation operation mode is selected. If so, the master state machine of the controller jumps to the pulsation wait state FF_OVERHEAD and step 9 is executed; otherwise the stream operation mode is in effect, the master state machine of the controller jumps to the end state F_END, and step 10 is executed;
Step 9: judge whether the batch counter i has reached the preset batch value n. If so, the master state machine of the controller jumps to the end state F_END and step 10 is executed; otherwise the batch counter i is incremented by 1 and step 7 is re-executed;
Step 10: wait until the source operand cache unit is empty, then the master state machine of the controller jumps to the idle state F_IDLE.
CN201610523519.7A 2016-07-04 2016-07-04 A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method Active CN106155814B (en)

Priority Applications (1)

Application Number Priority Date Filing Date Title
CN201610523519.7A CN106155814B (en) 2016-07-04 2016-07-04 A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method


Publications (2)

Publication Number Publication Date
CN106155814A CN106155814A (en) 2016-11-23
CN106155814B true CN106155814B (en) 2019-04-05

Family

ID=58061746

Family Applications (1)

Application Number Title Priority Date Filing Date
CN201610523519.7A Active CN106155814B (en) 2016-07-04 2016-07-04 A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method

Country Status (1)

Country Link
CN (1) CN106155814B (en)

Families Citing this family (10)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN107193656B (en) * 2017-05-17 2020-01-10 深圳先进技术研究院 Resource management method of multi-core system, terminal device and computer readable storage medium
CN107193755A (en) * 2017-06-29 2017-09-22 合肥工业大学 A kind of MMU memory management unit and its working method suitable for general floating point processor
CN108334738B (en) * 2017-12-29 2021-12-14 创业慧康科技股份有限公司 Dynamic calculation power distribution method for distributed big data processing
CN110309098A (en) * 2019-06-27 2019-10-08 上海金卓网络科技有限公司 Interaction control method, device, equipment and storage medium between a kind of processor
US11467846B2 (en) * 2019-08-02 2022-10-11 Tenstorrent Inc. Overlay layer for network of processor cores
CN111045954B (en) * 2019-11-29 2023-08-08 北京航空航天大学青岛研究院 NAND-SPIN-based in-memory computing acceleration method
CN110990767B (en) * 2019-11-29 2021-08-31 华中科技大学 Reconfigurable number theory transformation unit and method applied to lattice cryptosystem
CN112214448B (en) * 2020-10-10 2024-04-09 声龙(新加坡)私人有限公司 Data dynamic reconstruction circuit and method of heterogeneous integrated workload proving operation chip
CN112379928B (en) * 2020-11-11 2023-04-07 海光信息技术股份有限公司 Instruction scheduling method and processor comprising instruction scheduling unit
CN113543045B (en) * 2021-05-28 2022-04-26 平头哥(上海)半导体技术有限公司 Processing unit, correlation device, and tensor operation method

Citations (3)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
CN102576304A (en) * 2009-06-19 2012-07-11 奇异计算有限公司 Processing with compact arithmetic processing element
CN103955584A (en) * 2014-05-12 2014-07-30 合肥工业大学 Upper bound optimization method of on-chip network restructuring cache based on multi-path routing
CN104618303A (en) * 2015-02-05 2015-05-13 东南大学 Reconfigurable modulation and demodulation method applied to baseband processing

Family Cites Families (1)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US9218205B2 (en) * 2012-07-11 2015-12-22 Ca, Inc. Resource management in ephemeral environments


Non-Patent Citations (2)

* Cited by examiner, † Cited by third party
Title
S. Ciricescu, R. Essick, B. Lucas et al., "The reconfigurable streaming vector processor (RSVP™)," Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), IEEE, 2003 (published 2004).
Design of a one-dimensional reconfigurable computing system model; Du Gaoming, Zhang Min, Song Yukun; Journal of Hefei University of Technology (《合肥工业大学学报》); 2015-01-31; Vol. 38, No. 1; full text.

Also Published As

Publication number Publication date
CN106155814A (en) 2016-11-23

Similar Documents

Publication Publication Date Title
CN106155814B (en) A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method
CN110347635B (en) Heterogeneous multi-core microprocessor based on multilayer bus
CN103543954B (en) A kind of data storage and management method and device
US9195610B2 (en) Transaction info bypass for nodes coupled to an interconnect fabric
CN106227507B (en) Computing system and its controller
CN109542830B (en) Data processing system and data processing method
CN104699631A (en) Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor)
CN103714026B (en) A kind of memory access method supporting former address data exchange and device
WO2015187209A1 (en) Transactional traffic specification for network-on-chip design
CN111190735B (en) On-chip CPU/GPU pipelining calculation method based on Linux and computer system
CN107590085A (en) A kind of dynamic reconfigurable array data path and its control method with multi-level buffer
CN104765701B (en) Data access method and equipment
CN110399221A (en) Data processing method, system and terminal device
CN102508803A (en) Matrix transposition memory controller
CN107315717A (en) A kind of apparatus and method for performing vectorial arithmetic
Chen et al. ArSMART: An improved SMART NoC design supporting arbitrary-turn transmission
CN105824604B (en) Multiple-input and multiple-output processor pipeline data synchronization unit and method
KR20220136426A (en) Queue Allocation in Machine Learning Accelerators
CN116663639B (en) Gradient data synchronization method, system, device and medium
CN103166863B (en) Lump type 8X8 low delay high bandwidth intersection cache queue slice upstream routers
CN116074267B (en) Data communication system and SoC chip
Zhao et al. Insight and reduction of MapReduce stragglers in heterogeneous environment
CN106569968B (en) For data transmission structure and dispatching method between the array of reconfigurable processor
CN105518617B (en) Data cached processing method and processing device
CN115550173A (en) Dynamic calculation communication scheduling method based on WFBP and link characteristics

Legal Events

Date Code Title Description
C06 Publication
PB01 Publication
C10 Entry into substantive examination
SE01 Entry into force of request for substantive examination
GR01 Patent grant
GR01 Patent grant