CN106155814B - A reconfigurable arithmetic unit supporting multiple working modes and its working method - Google Patents
A reconfigurable arithmetic unit supporting multiple working modes and its working method
- Publication number: CN106155814B
- Application number: CN201610523519.7A
- Authority: CN (China)
- Prior art keywords: data, cache unit, layer interface, operand cache, controller
- Legal status: Active (the legal status is an assumption and is not a legal conclusion; Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed)
Classifications
- G06F9/5066 - Algorithms for mapping a plurality of inter-dependent sub-tasks onto a plurality of physical CPUs (under G06F9/50 - Allocation of resources, e.g. of the central processing unit [CPU])
- G06F15/17306 - Intercommunication techniques (under G06F15/173 - Interprocessor communication using an interconnection network, e.g. matrix, shuffle, pyramid, star, snowflake)
- G06F2209/5022 - Workload threshold (indexing scheme relating to G06F9/50)
Abstract
The invention discloses a reconfigurable arithmetic unit supporting multiple working modes and its working method. The reconfigurable arithmetic unit comprises a control layer, an operation layer, and a storage layer. The control layer comprises a state layer interface, a configuration layer interface, a data layer interface, an address generator, and a controller; the operation layer comprises an operator; the storage layer comprises a source operand cache unit and a destination operand cache unit. The reconfigurable arithmetic unit supports three working modes (the storage operation mode, the systolic operation mode, and the stream operation mode), which gives the algorithm mapping of a computing system greater flexibility. When mapping tasks onto the computing system, the specific working mode of each reconfigurable arithmetic unit can be selected according to the characteristics and bottleneck of the algorithm to be mapped, combined with the actual network communication and memory bandwidth conditions of the computing system, so as to balance computational throughput against network communication and memory access pressure and improve the efficiency of the whole system.
Description
Technical field
The present invention relates to the fields of high-density computing and digital signal processing systems, and specifically to a reconfigurable arithmetic unit for an on-chip multi-core computing system and its working method.
Background art
Application-specific integrated circuits (ASICs) and general-purpose processors (GPPs) are two common kinds of data-processing hardware. An ASIC is designed for a specific application; its computational efficiency is high, but it lacks generality. A GPP is designed for general-purpose computing; it is very flexible, but its computational efficiency is low compared with an ASIC. Reconfigurable computing strikes a balance between the efficiency of an ASIC and the generality of a GPP: on the basis of retaining a degree of generality within a given application domain, it achieves efficiency well above that of a GPP, and it has become a common way of organizing compute power in current multi-core computing systems.
Multi-core technology has become the mainstream of processor design thanks to its low power consumption, strong parallel processing capability, and excellent computational performance. However, achieving efficient communication between compute cores and efficient task mapping directly determines whether the compute power of a multi-core system can be brought into play, and it is a major problem that current multi-core systems face.
Summary of the invention
To overcome the shortcomings of the prior art, the present invention proposes a reconfigurable arithmetic unit supporting three working modes (the storage operation mode, the systolic operation mode, and the stream operation mode) and its working method, so as to give the algorithm mapping of a coarse-grained computing system greater flexibility. When mapping tasks onto the computing system, the most suitable working mode of each reconfigurable arithmetic unit can be selected according to the specific characteristics and bottleneck of the algorithm to be mapped, combined with the actual network communication and memory bandwidth conditions of the computing system, so as to balance computational throughput against network communication and memory access pressure and improve the efficiency of the whole system.
The technical solution adopted by the invention to achieve this purpose is as follows:
A reconfigurable arithmetic unit supporting multiple working modes according to the present invention is mounted on any two routing nodes of a network-on-chip. Its features are that the reconfigurable arithmetic unit comprises a control layer, an operation layer, and a storage layer;
The control layer comprises: a state layer interface, a configuration layer interface, a data layer interface, an address generator, and a controller;
The operation layer comprises: an operator;
The storage layer comprises: a source operand cache unit and a destination operand cache unit;
The working modes of the reconfigurable arithmetic unit include the storage operation mode, the systolic operation mode, and the stream operation mode;
The configuration layer interface of the reconfigurable arithmetic unit receives the configuration information sent over the network-on-chip and determines which working mode it specifies;
In the storage operation mode, the configuration information contains data blocking information. The state layer interface sends a data request message to the source node over the network-on-chip according to the data blocking information; the data layer interface receives one data block over the network-on-chip as the current data block and stores it in the source operand cache unit;
After the data block has been stored completely, the operator, under the control of the controller and according to the addresses generated by the address generator, reads the current data block from the source operand cache unit, feeds it into the operator for computation, and stores the results in the destination operand cache unit;
After an entire data block has been fed into the operator, the state layer interface again sends a data request message to the source node according to the data blocking information, in order to receive the next data block over the network-on-chip;
When the configuration layer interface receives a destination node request transmitted over the network-on-chip and the current data block has finished computation, the state layer interface reads the results from the destination operand cache unit and sends them to the destination node over the network-on-chip, completing the processing of the current data block;
In the systolic operation mode, the configuration information likewise contains data blocking information. The state layer interface sends a data request message to the source node over the network-on-chip according to the data blocking information; the data layer interface receives one data block over the network-on-chip as the current data block and stores it in the source operand cache unit;
While the source operand cache unit is non-empty, the operator, under the control of the controller, reads the current data block from the source operand cache unit and performs the computation, storing the results in the destination operand cache unit;
If the source operand cache becomes empty, reading from it stops immediately and the operator's context is preserved, until the source operand cache is non-empty again;
After an entire data block has been fed into the operator, the state layer interface again sends a data request message to the source node according to the data blocking information, in order to receive the next data block over the network-on-chip;
When the configuration layer interface receives a destination node request transmitted over the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the results from the destination operand cache unit and sends them to the destination node over the network-on-chip, completing the processing of the current data block;
In the stream operation mode, the configuration information contains total-data-amount information. The state layer interface sends a data request message to the source node over the network-on-chip according to the total data amount; the data layer interface receives the data stream over the network-on-chip and buffers it in the source operand cache unit. When the amount of data in the source operand cache unit exceeds its upper storage threshold, the data layer interface sends a source suspend signal to the source node over the network-on-chip; when the amount of data in the source operand cache unit falls below its lower storage threshold, the data layer interface revokes the source suspend signal over the network-on-chip so that the data stream continues to be received, until the total data amount has been received;
While the source operand cache unit is non-empty, the operator, under the control of the controller, reads the data stream from the source operand cache unit, feeds it into the operator for computation, and stores the results in the destination operand cache unit. During the computation of the data stream, if the source operand cache unit becomes empty, the operator preserves its context; when the source operand cache unit is non-empty again, the context preservation is revoked and computation on the data stream continues;
When the configuration layer interface receives a destination node request transmitted over the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the results from the destination operand cache unit and sends them to the destination node over the network-on-chip;
When the amount of data in the destination operand cache unit exceeds its upper storage threshold, the source operand cache unit, under the control of the controller, pauses the reading of the data stream; when the amount of data in the source operand cache unit falls below its lower storage threshold, reading of the data stream resumes under the control of the controller. When the data layer interface receives a destination suspend signal sent by the destination node over the network-on-chip, it pauses reading results from the destination operand cache unit, and resumes reading once the destination suspend signal is revoked, thereby completing the processing of the data stream.
The working method of a reconfigurable arithmetic unit supporting multiple working modes according to the present invention is applied in a coarse-grained computing system. Its features are that the reconfigurable arithmetic unit comprises a control layer, an operation layer, and a storage layer;
The control layer comprises: a state layer interface, a configuration layer interface, a data layer interface, an address generator, and a controller;
The operation layer comprises: an operator;
The storage layer comprises: a source operand cache unit and a destination operand cache unit;
The working modes of the reconfigurable arithmetic unit include the storage operation mode, the systolic operation mode, and the stream operation mode;
The working method proceeds as follows:
Step 1: after the reconfigurable arithmetic unit receives configuration information through the configuration layer interface, the main state machine of the controller jumps to the mode selection state F_CALMODE;
Step 2: in the mode selection state F_CALMODE, the configuration information is examined to determine whether the storage operation mode is selected. If so, the configuration information contains data blocking information, the main state machine of the controller jumps to the storage operation import state FS_SRC, and step 3 is executed; otherwise, the main state machine of the controller jumps to the stream operation state FF_CAL and step 7 is executed;
Step 3: initialize i=0;
Step 4: the controller sends a data request message through the state layer interface according to the data blocking information, and receives the i-th data block through the data layer interface as the current data block, storing it in the source operand cache unit;
When the current data block has been received completely, the computation start signal cal_start_w is generated, and the main state machine of the controller jumps to the storage operation execution state FS_CAL;
Step 5: the controller reads the current data block from the source operand cache unit according to the addresses generated by the address generator and performs the computation, storing the results in the destination operand cache unit; at the same time, the amount of data fed into the computation is counted. When the count reaches the data amount of the current data block specified in the configuration information, the controller generates the computation end signal cal_finish_w, and the main state machine of the controller jumps to the storage wait state FS_OVERHEAD, completing the computation control of the current data block;
Step 6: check whether the block counter i has reached the preset block count n. If so, the all-computations-finished signal all_cal_finished_w is generated, the main state machine of the controller jumps to the end state F_END, and step 10 is executed; otherwise, the state layer interface, under the control of the controller, reads the results from the destination operand cache unit and sends them to the destination node over the network-on-chip, completing the processing of the current data block, after which i+1 is assigned to i and step 4 is executed again;
Step 7: the controller sends a data request message through the state layer interface and receives a data block or data stream through the data layer interface. While the source operand cache unit is non-empty, the operator, under the control of the controller, reads the data block or data stream from the source operand cache unit and performs the computation, storing the results in the destination operand cache unit; at the same time, the amount of data fed into the computation is counted;
When the configuration layer interface receives a destination node request transmitted over the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the results from the destination operand cache unit and sends them to the destination node over the network-on-chip;
When the count reaches the data amount specified in the configuration information, the controller generates the computation end signal cal_finish_w;
Step 8: check whether the systolic operation mode is selected. If so, the main state machine of the controller jumps to the systolic wait state FF_OVERHEAD and step 9 is executed; otherwise, the mode is the stream operation mode, the main state machine of the controller jumps to the end state F_END, and step 10 is executed;
Step 9: check whether the block counter i has reached the preset block count n. If so, the main state machine of the controller jumps to the end state F_END and step 10 is executed; otherwise, the block counter i is incremented by 1 and step 7 is executed again;
Step 10: wait until the source operand cache unit is empty, then the main state machine of the controller jumps to the idle state F_IDLE.
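The flow of steps 1 to 10 can be sketched as a small software model of the controller's main state machine (Fig. 3). This is an illustrative sketch, not the patent's actual hardware: block delivery, signal generation, and result forwarding are reduced to state-trace bookkeeping.

```python
# Illustrative model (not the patent's RTL) of the controller's main state
# machine from steps 1-10: F_CALMODE, then either the storage loop
# (FS_SRC -> FS_CAL -> FS_OVERHEAD per block) or the FF_CAL path shared by
# the systolic and stream modes, then F_END and F_IDLE.
STORAGE, SYSTOLIC, STREAM = "storage", "systolic", "stream"

def run_unit(mode, n_blocks=1):
    """Return the sequence of states the main state machine passes through."""
    trace = ["F_CALMODE"]                        # step 1: configuration received
    if mode == STORAGE:                          # step 2: storage operation mode
        for _ in range(n_blocks):                # steps 3-6: one pass per block
            trace += ["FS_SRC",                  # step 4: import one data block
                      "FS_CAL",                  # step 5: compute current block
                      "FS_OVERHEAD"]             # step 5: storage wait state
        trace.append("F_END")                    # step 6: all n blocks finished
    elif mode == SYSTOLIC:                       # steps 7-9: one pass per block
        for _ in range(n_blocks):
            trace += ["FF_CAL", "FF_OVERHEAD"]   # step 8: systolic wait state
        trace.append("F_END")                    # step 9: block counter reached n
    else:                                        # stream mode: a single pass
        trace += ["FF_CAL", "F_END"]             # step 8: straight to the end
    trace.append("F_IDLE")                       # step 10: source cache drained
    return trace
```

For example, `run_unit(STORAGE, n_blocks=2)` yields the two-block storage-mode trace, while `run_unit(STREAM)` visits FF_CAL only once because a stream is never partitioned into blocks.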
Compared with the prior art, the beneficial effects of the invention are as follows:
1. The present invention supports three working modes (the storage operation mode, the systolic operation mode, and the stream operation mode), giving the algorithm mapping of an on-chip multi-core computing system greater flexibility and giving programmers more choices of mapping style. Under the different working modes the reconfigurable arithmetic unit presents the same external characteristics and uses the same network interface protocol, which simplifies its integration and use in a computing system. When mapping tasks onto the computing system, the most suitable working mode of each reconfigurable arithmetic unit can be selected according to the specific characteristics and bottleneck of the algorithm to be mapped, combined with the actual network communication and memory bandwidth conditions of the computing system, so as to balance computational throughput against network communication and memory access pressure and improve the efficiency of the whole computing system. Under the control of the computing system's master controller, the working mode of a reconfigurable arithmetic unit can be switched dynamically between the systolic operation mode and the stream operation mode, enabling priority management of the computation of tasks and giving the system's task execution flow stronger controllability.
2. When the present invention works in the storage operation mode, data is stored in the source operand cache unit in advance. With the cooperation of the address generator, a variety of flexible address-stepping patterns can be realized, fusing the two steps of data selection and computation; pre-processing such as reordering the pending data before computation is reduced or even avoided, which lowers the difficulty of mapping complex algorithms for the programmer and improves the efficiency of the reconfigurable arithmetic unit. By configuring the loop function of the address generator, data reuse can also be realized, reducing the number of data transfers of the reconfigurable arithmetic unit and increasing the proportion of time spent on computation within the total task time. For algorithms with a high data-reuse rate, such as matrix multiplication and the fast Fourier transform and its inverse, this has very high application value: it can greatly improve computational efficiency and shorten the total time of the computing task. In the storage operation mode the reconfigurable arithmetic unit is simple to control and easy to debug during task mapping, which facilitates fast mapping of complex algorithms. In this mode, each step starts only after the previous step of the data block has completed; the computation and transmission of a data block proceed continuously within their respective steps, the data transfer does not stall once a data link has been set up on the network-on-chip, and the utilization of the network-on-chip data links is higher, so the pressure the arithmetic unit places on network communication and memory bandwidth is greatly reduced.
3. When the present invention works in the systolic operation mode, data is processed in a pipeline whose unit is the data block, so the data transfer time and the computation time of the reconfigurable arithmetic unit can partially overlap, improving the efficiency of the system. Because a link must be re-established for the processing of each data block, the transmission path can be adjusted according to task priority and network occupancy, which determines the computation order; one-to-many service by a reconfigurable arithmetic unit can also be realized, giving better mapping flexibility and making fuller use of the available compute power. In this mode the data size of a data block is less than the capacities of the source operand cache unit and destination operand cache unit inside the reconfigurable arithmetic unit, so no memory overflow errors occur and no flow-control logic is needed on the network-on-chip; control and debugging difficulty are moderate, striking a balance between control complexity, flexibility, and computational efficiency.
4. When the present invention works in the stream operation mode, the data transfer time and computation time of the reconfigurable arithmetic unit overlap to the fullest extent: data is processed in the form of a data stream, and only one link is established and torn down for the entire processing task. This reduces the clock cycles lost to repeatedly establishing and tearing down links on the network-on-chip, and also reduces the clock cycles lost to the operator repeatedly entering and leaving the arithmetic pipeline, so high computational efficiency can be obtained. Within a computing system, multiple reconfigurable arithmetic units can be combined in cascade to complete a multi-stage computation; all the cascaded reconfigurable arithmetic units then form one pipeline, further overlapping data transfer time and computation time, structurally realizing ultra-deep pipelining and further improving efficiency. The source operand cache and destination operand cache of the system are equipped with upper and lower thresholds, which avoids memory overflow and guarantees data safety while reserving enough latency margin, so that the operator is not starved of data; this preserves the efficiency of the reconfigurable arithmetic unit and also avoids frequent assertion and revocation of the suspend signals, reducing power consumption.
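The cascading described in point 4 can be illustrated with a toy software model, a hypothetical sketch in which each generator stage stands in for one stream-mode reconfigurable arithmetic unit and iteration stands in for the NoC link between stages:

```python
# Toy model (hypothetical, not the patent's hardware): cascaded stream-mode
# units form one deep pipeline. Each stage consumes its predecessor's output
# element by element, so data transfer and computation overlap fully.
def stage(op, upstream):
    """One stream-mode unit: apply `op` to every element of the input stream."""
    for x in upstream:
        yield op(x)

def cascade(ops, source):
    """Chain several units; the whole cascade behaves as a single pipeline."""
    stream = iter(source)
    for op in ops:
        stream = stage(op, stream)
    return stream

# Example: three cascaded units computing (2*x + 1) ** 2 element by element.
result = list(cascade([lambda x: 2 * x, lambda x: x + 1, lambda x: x * x],
                      range(4)))
```

Because the stages are lazy generators, no stage buffers the whole stream, which mirrors how each cascaded unit only needs its bounded operand caches.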
Detailed description of the invention
Fig. 1 is a structural diagram of the on-chip multi-core computing system targeted by the present invention;
Fig. 2 is a structural diagram of the present invention;
Fig. 3 is a schematic diagram of the main state machine of the controller of the present invention;
Fig. 4 is a schematic diagram of the storage operation mode of the present invention;
Fig. 5 is a schematic diagram of the systolic operation mode of the present invention;
Fig. 6 is a schematic diagram of the stream operation mode of the present invention.
Specific embodiments
In this embodiment, as shown in Fig. 1, a reconfigurable arithmetic unit supporting multiple working modes is mounted on any two routing nodes of the network-on-chip of an on-chip multi-core computing system, and completes data exchange with the network-on-chip through the local interfaces of the network-on-chip. Fig. 2 gives the structural block diagram of the invention; the reconfigurable arithmetic unit comprises a control layer, an operation layer, and a storage layer;
The control layer comprises: a state layer interface, a configuration layer interface, a data layer interface, an address generator, and a controller;
The operation layer comprises: an operator;
The storage layer comprises: a source operand cache unit and a destination operand cache unit;
The source operand cache unit and destination operand cache unit are equipped with cache thresholds. The cache thresholds are set according to factors such as the scale of the computing system's network-on-chip, the capacities of the source operand cache and destination operand cache inside the reconfigurable arithmetic unit, and the number of pipeline stages of the operator. In this design each threshold is split into an upper threshold and a lower threshold: the upper threshold guarantees the safety of data transmission, avoiding data loss through cache overflow, while the lower threshold avoids the loss of operator efficiency caused by data starvation and at the same time reduces how often the control signals toggle, lowering power consumption;
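The two-threshold scheme is a hysteresis (watermark) flow control. A minimal software sketch follows; the threshold values are chosen arbitrarily for illustration and are not taken from the patent:

```python
# Minimal sketch of the two-threshold (hysteresis) flow control described
# above. Threshold values are illustrative, not from the patent.
class CacheFlowControl:
    def __init__(self, upper=12, lower=4):
        self.upper, self.lower = upper, lower
        self.fill = 0            # current amount of data in the cache
        self.suspend = False     # state of the suspend signal to the sender

    def update(self, delta):
        """Apply a fill-level change and return the new suspend state."""
        self.fill += delta
        if self.fill > self.upper:      # upper threshold crossed:
            self.suspend = True         # ask the sender to pause (no overflow)
        elif self.fill < self.lower:    # fell below the lower threshold:
            self.suspend = False        # revoke the signal (no starvation)
        return self.suspend             # between thresholds: state unchanged
```

Between the two thresholds the suspend signal keeps its previous value; this dead band is exactly what prevents the frequent assertion and revocation of control signals mentioned above.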
The data layer interface module is equipped with a source suspend signal and a destination suspend signal; these two signals are sent by the receiving node of a network-on-chip transfer to the sending node to request a pause in the data flow. The context preservation logic in the operator protects the operator's working state while the data is stalled, preventing abnormal value jumps or data loss inside the operator;
The working modes of the reconfigurable arithmetic unit include the storage operation mode, the systolic operation mode, and the stream operation mode. Under the different working modes the reconfigurable arithmetic unit presents the same external characteristics and uses the same network interface protocol, making it easy to integrate reconfigurable arithmetic units into a computing system and have them cooperate with other arithmetic units. When the reconfigurable arithmetic unit works in the storage operation mode or the systolic operation mode, if the data size of the computing task is larger than the capacity of the unit's source operand cache, the data must be partitioned into blocks: the data is first divided into several data blocks according to the capacity of the source operand cache unit, and the blocks are then processed one after another;
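The blocking rule (blocks sized to fit the source operand cache) can be sketched as follows; the capacity value is an arbitrary illustration:

```python
# Sketch of the "blocking" step: split a task's data into blocks no larger
# than the source operand cache capacity. The capacity here is illustrative.
def partition(data, cache_capacity):
    """Divide `data` into consecutive blocks of at most `cache_capacity`."""
    return [data[i:i + cache_capacity]
            for i in range(0, len(data), cache_capacity)]

# Each resulting block fits in the source operand cache and is processed
# in turn, as described above.
blocks = partition(list(range(10)), cache_capacity=4)
```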
The configuration layer interface of the reconfigurable arithmetic unit receives the configuration information sent over the network-on-chip and determines which working mode it specifies;
In the storage operation mode, the configuration information contains data blocking information. The state layer interface sends a data request message to the source node over the network-on-chip according to the data blocking information; the data layer interface receives one data block over the network-on-chip as the current data block and stores it in the source operand cache unit;
After the data block has been stored, the operator, under the control of the controller and according to the addresses generated by the address generator, reads the current data block from the source operand cache unit, feeds it into the operator for computation, and stores the results in the destination operand cache unit;
After an entire data block has been fed into the operator, the state layer interface again sends a data request message to the source node over the network-on-chip according to the data blocking information, in order to receive the next data block;
When the configuration layer interface receives a destination node request transmitted over the network-on-chip and the current data block has finished computation, the state layer interface reads the results from the destination operand cache unit and sends them to the destination node over the network-on-chip, completing the processing of the current data block;
As shown in Fig. 4, in the storage operation mode the three steps (source data input, computation, and result output) are executed strictly in order, data block by data block. Before computation begins, the entire data block has already been stored in the source operand cache unit of the reconfigurable arithmetic unit, and the steps of a given data block do not overlap in time. Between data blocks, however, the source data input of one block may overlap in time with the result output of another, so the data transfer time is hidden to the greatest possible extent;
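One way to picture the storage-mode schedule is a toy timing model in which the compute phases of successive blocks never overlap while transfers fill the gaps. The phase durations, and the assumption that the next block's input starts as soon as the previous compute has drained the source cache, are illustrative only, not taken from the patent:

```python
# Toy timing model of the storage operation mode (Fig. 4): within a block the
# phases input -> compute -> output run strictly in order; the next block's
# input is assumed to start once the previous compute has emptied the source
# cache, overlapping with the previous block's result output.
def storage_schedule(n_blocks, t_in=3, t_cal=5):
    """Return the (start, end) time of the compute phase of each block."""
    spans, t = [], 0
    for _ in range(n_blocks):
        start_cal = t + t_in          # input must finish before compute starts
        end_cal = start_cal + t_cal
        spans.append((start_cal, end_cal))
        t = end_cal                   # next input begins; output runs alongside
    return spans
```

With the default durations, two blocks compute over the intervals (3, 8) and (11, 16): the compute phases never overlap, but the 3-cycle gap between them is covered by the overlapping transfer of the next block.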
In the systolic operation mode, the configuration information likewise contains data blocking information. The state layer interface sends a data request message to the source node over the network-on-chip according to the data blocking information; the data layer interface receives one data block over the network-on-chip as the current data block and stores it in the source operand cache unit;
While the source operand cache unit is non-empty, the operator, under the control of the controller, reads the current data block from the source operand cache unit and performs the computation, storing the results in the destination operand cache unit;
If the source operand cache becomes empty, reading from it stops immediately and the operator's context is preserved, until the source operand cache is non-empty again;
After an entire data block has been fed into the operator, the state layer interface again sends a data request message to the source node over the network-on-chip according to the data blocking information, in order to receive the next data block;
When the configuration layer interface receives a destination node request transmitted over the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the results from the destination operand cache unit and sends them to the destination node over the network-on-chip, completing the processing of the current data block;
As shown in Fig. 5, in the systolic operation mode the three steps (source data input, computation, and result output) are still executed in order, data block by data block, but within the same data block the three steps partially overlap in time and proceed in a pipelined fashion. The data blocks themselves are computed strictly in order: the computation of the next data block begins only after the computation of the previous data block has ended, with no overlap in time between blocks, which guarantees data safety.
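The within-block pipelining and the stall-on-empty behaviour can be modelled cycle by cycle. The arrival pattern below is hypothetical; the point is that the operator consumes whenever the source cache is non-empty and preserves its context when the cache runs dry:

```python
# Cycle-by-cycle sketch of systolic-mode behaviour within one data block:
# the operator processes one operand per cycle while the source operand cache
# is non-empty, and stalls (context preserved) whenever it runs dry. The
# arrival pattern is a hypothetical example of NoC delivery.
from collections import deque

def systolic_block(arrivals, op):
    """`arrivals[t]` is the list of operands delivered in cycle t.
    Returns (results, stall_cycles)."""
    cache, results, stalls = deque(), [], 0
    for incoming in arrivals:
        cache.extend(incoming)            # input overlaps with computation
        if cache:
            results.append(op(cache.popleft()))
        else:
            stalls += 1                   # cache empty: operator state is held
    while cache:                          # drain the cache after input ends
        results.append(op(cache.popleft()))
    return results, stalls
```

A gap in the arrivals (an empty cycle with an empty cache) simply costs a stall cycle; the results are unaffected, which is what the context preservation logic guarantees.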
If stream operation mode, indicate in configuration information to include total amount of data information, state layer interface is according to total amount of data
Information sends data request information to source node by network-on-chip;Data layer interface receives data flow simultaneously by network-on-chip
It is cached by source operand cache unit;When the data volume in source operand cache unit is more than its amount of storage upper threshold value, then
Data layer interface is by network-on-chip transmission source pending signal to source node;When the data volume in source operand cache unit is lower than
When its reserves lower threshold value, data layer interface cancels source pending signal to source node, to continue to data by network-on-chip
Stream, the reception until completing total amount of data;
When the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data stream from the source operand cache unit, feeds it into the arithmetic unit for operation, and stores the obtained operation results in the destination operand cache unit. During the computation of the data stream, if the source operand cache unit becomes empty, the arithmetic unit performs a scene-protection (context save) operation; when the source operand cache unit becomes non-empty again, the scene-protection operation is revoked and the operation on the data stream continues;
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip.
When the amount of data in the destination operand cache unit exceeds its upper storage threshold, reading of the data stream from the source operand cache unit is suspended under the control of the controller; when the amount of data in the source operand cache unit falls below its lower storage threshold, reading of the data stream continues under the control of the controller. When the data layer interface receives a destination pending signal sent by the destination node through the network-on-chip, the data layer interface suspends reading operation results from the destination operand cache unit until the destination pending signal is revoked, whereupon reading of operation results continues; this completes the processing of the data stream.
As shown in Fig. 6, in the stream operation mode, source-data input, operation execution, and operation-result output are all performed on the data stream; the data to be processed need not be partitioned into blocks. Regardless of the data scale of the computing task, the data is processed sequentially as a single stream, and the three steps of source-data input, operation execution, and operation-result output essentially coincide in time. This hides the data-transfer time to the greatest extent and yields high data-processing efficiency.
In this embodiment, a working method of the reconfigurable arithmetic unit supporting multiple working modes is applied in a computing system based on a network-on-chip. The reconfigurable arithmetic unit comprises: a control layer, an operation layer, and a storage layer;
the control layer comprises: a state layer interface, a configuration layer interface, a data layer interface, an address generator, and a controller;
the operation layer comprises: an arithmetic unit;
the storage layer comprises: a source operand cache unit and a destination operand cache unit;
the working modes of the reconfigurable arithmetic unit include a storage operation mode, a pulsation operation mode, and a stream operation mode.
As shown in Fig. 3, the working method proceeds as follows:
Step 1: after the reconfigurable arithmetic unit receives the configuration information through the configuration layer interface, the host state machine of the controller jumps to the configuration-complete state F_CFG.
Step 2: judge, according to the configuration information, whether the operation needs parameters to be imported. If parameters are needed, the host state machine of the controller jumps to the parameter-import state F_PARAMETER and proceeds to step 3; otherwise the host state machine of the controller jumps directly to the mode-selection state F_CALMODE and proceeds to step 4.
Step 3: in the parameter-import state F_PARAMETER, wait for the parameter-import-finished signal parameter_done_i; when parameter importing ends, jump to the mode-selection state F_CALMODE.
Step 4: in the mode-selection state F_CALMODE, judge according to the configuration information whether the mode is the storage operation mode. If so, the configuration information contains data-block-partition information; the host state machine jumps to the storage-operation import state FS_SRC, and step 5 is executed. Otherwise, the host state machine jumps to the stream-operation state FF_CAL and proceeds to step 11.
Step 5: initialize i = 0, j = 0.
Step 6: the controller sends a data request message upward through the state layer interface according to the data-block-partition information, receives the i-th data block as the current data block through the data layer interface, and stores it into the source operand cache unit. When the current data block has been completely received, the operation start signal cal_start_w is generated and the host state machine jumps to the storage-operation execution state FS_CAL.
Step 7: the controller reads the current data block from the source operand cache unit according to the addresses generated by the address generator and performs the operation, storing the obtained operation result into the destination operand cache unit; meanwhile, the amount of data in the data block being operated on is counted. When the counted amount reaches the data amount of the current data block specified in the configuration information, the controller generates the operation end signal cal_finish_w.
Step 8: if the operation type in the configuration information is an FFT or IFFT operation, the host state machine of the controller jumps to the FFT operation state FS_FFT and step 9 is executed; otherwise the host state machine of the controller jumps to the storage wait state FS_OVERHEAD and step 10 is executed, thereby completing the operation-control processing of the current data block.
Step 9: if the FFT iteration counter j has not reached the preset value in the configuration information, swap the data interfaces of the source operand cache and the destination operand cache, increment the iteration counter j by 1, and re-execute step 7; otherwise jump to the storage wait state FS_OVERHEAD, clear the FFT iteration counter j so that j = 0, and execute step 10.
Step 10: judge whether the batch counter i has reached the preset batch value n. If so, generate the all-operations-finished signal all_cal_finished_w; the host state machine of the controller jumps to the end state F_END and proceeds to step 14. Otherwise, the state layer interface, under the control of the controller, reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip; after the processing of the current data block is completed, assign i + 1 to i and re-execute step 6.
Step 11: the controller sends a data request message upward through the state layer interface and receives a data block or data stream through the data layer interface. When the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data block or data stream from the source operand cache unit and performs the operation, storing the obtained operation result into the destination operand cache unit; meanwhile, the amount of data in the data block or data stream being operated on is counted.
When the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip.
When the counted amount reaches the data amount specified in the configuration information, the controller generates the operation end signal cal_finish_w.
Step 12: judge whether the mode is the pulsation operation mode. If so, the host state machine jumps to the pulsation wait state FF_OVERHEAD and proceeds to step 13; otherwise the mode is the stream operation mode, and the host state machine jumps to the end state F_END and proceeds to step 14.
Step 13: judge whether the batch counter i has reached the preset batch value n. If so, the host state machine jumps to the end state F_END and proceeds to step 14; otherwise, increment the batch counter i by 1 and re-execute step 11.
Step 14: wait until the source operand cache unit is empty, jump to the finished state F_FINISH, and send a computing-power release request to the system master controller through the state layer interface.
Step 15: after the computing-power release request has been sent, the host state machine of the controller jumps to the idle state F_IDLE.
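Steps 1–15 describe the controller's host state machine. Its transition structure can be sketched as a table-driven model; the state and signal names follow the description above, while the condition labels and the dispatch function are a hypothetical simplification:

```python
# Sketch of the controller's host state machine from steps 1-15.
# Each entry maps (state, condition) -> next state; the condition labels
# paraphrase the configuration fields and signals named in the steps.
TRANSITIONS = {
    ("F_CFG", "needs_parameters"): "F_PARAMETER",
    ("F_CFG", "no_parameters"): "F_CALMODE",
    ("F_PARAMETER", "parameter_done_i"): "F_CALMODE",
    ("F_CALMODE", "storage_mode"): "FS_SRC",
    ("F_CALMODE", "other_mode"): "FF_CAL",
    ("FS_SRC", "cal_start_w"): "FS_CAL",          # block fully received
    ("FS_CAL", "fft_or_ifft"): "FS_FFT",
    ("FS_CAL", "cal_finish_w"): "FS_OVERHEAD",
    ("FS_FFT", "iterations_left"): "FS_CAL",      # swap src/dst caches, j += 1
    ("FS_FFT", "iterations_done"): "FS_OVERHEAD", # j cleared to 0
    ("FS_OVERHEAD", "batch_done"): "F_END",       # i == n, all_cal_finished_w
    ("FS_OVERHEAD", "more_blocks"): "FS_SRC",     # i += 1
    ("FF_CAL", "pulsation_mode"): "FF_OVERHEAD",
    ("FF_CAL", "stream_mode"): "F_END",
    ("FF_OVERHEAD", "batch_done"): "F_END",
    ("FF_OVERHEAD", "more_blocks"): "FF_CAL",
    ("F_END", "src_cache_empty"): "F_FINISH",     # then request power release
    ("F_FINISH", "release_sent"): "F_IDLE",
}

def run(start, events):
    """Walk the state machine through a sequence of condition events."""
    state = start
    for ev in events:
        state = TRANSITIONS[(state, ev)]
    return state

print(run("F_CFG", ["no_parameters", "storage_mode", "cal_start_w",
                    "cal_finish_w", "batch_done", "src_cache_empty",
                    "release_sent"]))   # F_IDLE
```

The table makes the loop structure explicit: the FS_* states form the storage-mode block loop, FF_* the pulsation/stream loop, and both drain through F_END before the computing-power release.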
Claims (2)
1. A reconfigurable arithmetic unit supporting multiple working modes, attached to any two routing nodes of a network-on-chip, characterized in that the reconfigurable arithmetic unit comprises: a control layer, an operation layer, and a storage layer;
the control layer comprises: a state layer interface, a configuration layer interface, a data layer interface, an address generator, and a controller;
the operation layer comprises: an arithmetic unit;
the storage layer comprises: a source operand cache unit and a destination operand cache unit;
the working modes of the reconfigurable arithmetic unit include a storage operation mode, a pulsation operation mode, and a stream operation mode;
the configuration layer interface of the reconfigurable arithmetic unit judges, from the configuration information sent by the network-on-chip, which working mode applies;
if the mode is the storage operation mode, the configuration information contains data-block-partition information; the state layer interface sends a data request message to the source node through the network-on-chip according to the data-block-partition information; the data layer interface receives one data block as the current data block through the network-on-chip and stores it into the source operand cache unit;
after the data block has been stored, the arithmetic unit, under the control of the controller, reads the current data block from the source operand cache unit according to the addresses generated by the address generator and feeds it into the arithmetic unit to perform the operation, and the obtained operation result is stored in the destination operand cache unit;
after a data block has been entirely fed into the arithmetic unit, the state layer interface sends a data request message again to the source node through the network-on-chip according to the data-block-partition information, so as to receive the next data block;
when the configuration layer interface receives a destination node request transmitted by the network-on-chip and the operation on the current data block has been completed, the state layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip, thereby completing the processing of the current data block;
if the mode is the pulsation operation mode, the configuration information contains data-block-partition information; the state layer interface sends a data request message to the source node through the network-on-chip according to the data-block-partition information; the data layer interface receives one data block as the current data block through the network-on-chip and stores it into the source operand cache unit;
when the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the current data block from the source operand cache unit and performs the operation, and the obtained operation result is stored in the destination operand cache unit; if the source operand cache unit is empty, reading of data from the source operand cache unit stops immediately and the arithmetic unit performs scene protection (context saving) until the source operand cache unit is non-empty;
after a data block has been entirely fed into the arithmetic unit, the state layer interface sends a data request message again to the source node through the network-on-chip according to the data-block-partition information, so as to receive the next data block;
when the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip, thereby completing the processing of the current data block;
if the mode is the stream operation mode, the configuration information contains total-data-amount information; the state layer interface sends a data request message to the source node through the network-on-chip according to the total-data-amount information; the data layer interface receives the data stream through the network-on-chip and buffers it in the source operand cache unit; when the amount of data in the source operand cache unit exceeds its upper storage threshold, the data layer interface sends a source pending signal to the source node through the network-on-chip; when the amount of data in the source operand cache unit falls below its lower storage threshold, the data layer interface revokes the source pending signal to the source node through the network-on-chip, so that the data stream continues to be received until the total amount of data has been received;
when the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data stream from the source operand cache unit and feeds it into the arithmetic unit for operation, and the obtained operation result is stored in the destination operand cache unit; during the computation of the data stream, if the source operand cache unit becomes empty, the arithmetic unit performs a scene-protection operation; when the source operand cache unit becomes non-empty again, the scene-protection operation is revoked and the operation on the data stream continues;
when the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip;
when the amount of data in the destination operand cache unit exceeds its upper storage threshold, reading of the data stream from the source operand cache unit is suspended under the control of the controller; when the amount of data in the source operand cache unit falls below its lower storage threshold, reading of the data stream continues under the control of the controller; when the data layer interface receives a destination pending signal sent by the destination node through the network-on-chip, the data layer interface suspends reading operation results from the destination operand cache unit until the destination pending signal is revoked, whereupon reading of operation results continues; thereby completing the processing of the data stream.
2. a kind of working method for the reconfigurable arithmetic unit for supporting multiple-working mode, is applied in coarseness computing system
In, characterized in that the reconfigurable arithmetic unit includes: control layer, operation layer and accumulation layer;
The control layer includes: state layer interface, configuration layer interface, data layer interface, address generator and controller;
The operation layer includes: arithmetic unit;
The accumulation layer includes: source operand cache unit, destination operand cache unit;
The operating mode of the reconfigurable arithmetic unit includes storage operation mode, pulsation operation mode and stream operation mode;
The working method is to carry out as follows:
After step 1, reconfigurable arithmetic unit pass through the configuration layer interface to configuration information, the major state of the controller
Machine jumps to model selection state F_CALMODE;
step 2: in the mode-selection state F_CALMODE, judge according to the configuration information whether the mode is the storage operation mode; if so, the configuration information contains data-block-partition information, the host state machine of the controller jumps to the storage-operation import state FS_SRC, and step 3 is executed; otherwise, the host state machine of the controller jumps to the stream-operation state FF_CAL and proceeds to step 7;
step 3: initialize i = 0;
step 4: the controller sends a data request message upward through the state layer interface according to the data-block-partition information, receives the i-th data block as the current data block through the data layer interface, and stores it into the source operand cache unit; when the current data block has been completely received, the operation start signal cal_start_w is generated and the host state machine of the controller jumps to the storage-operation execution state FS_CAL;
step 5: the controller reads the current data block from the source operand cache unit according to the addresses generated by the address generator and performs the operation, and the obtained operation result is stored in the destination operand cache unit; meanwhile, the amount of data in the data block being operated on is counted; when the counted amount reaches the data amount of the current data block specified in the configuration information, the controller generates the operation end signal cal_finish_w; the host state machine of the controller jumps to the storage wait state FS_OVERHEAD, thereby completing the operation-control processing of the current data block;
step 6: judge whether the batch counter i has reached the preset batch value n; if so, the all-operations-finished signal all_cal_finished_w is generated, the host state machine of the controller jumps to the end state F_END, and step 10 follows; otherwise, the state layer interface, under the control of the controller, reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip; after the processing of the current data block is completed, i + 1 is assigned to i and step 4 is re-executed;
step 7: the controller sends a data request message upward through the state layer interface and receives a data block or data stream through the data layer interface; when the source operand cache unit is non-empty, the arithmetic unit, under the control of the controller, reads the data block or data stream from the source operand cache unit and performs the operation, and the obtained operation result is stored in the destination operand cache unit; meanwhile, the amount of data in the data block or data stream being operated on is counted;
when the configuration layer interface receives a destination node request transmitted by the network-on-chip and the destination operand cache unit is non-empty, the state layer interface reads the operation result from the destination operand cache unit and sends it to the destination node through the network-on-chip;
when the counted amount reaches the data amount specified in the configuration information, the controller generates the operation end signal cal_finish_w;
step 8: judge whether the mode is the pulsation operation mode; if so, the host state machine of the controller jumps to the pulsation wait state FF_OVERHEAD and step 9 follows; otherwise the mode is the stream operation mode, and the host state machine of the controller jumps to the end state F_END and step 10 follows;
step 9: judge whether the batch counter i has reached the preset batch value n; if so, the host state machine of the controller jumps to the end state F_END and step 10 follows; otherwise, the batch counter i is incremented by 1 and step 7 is re-executed;
step 10: wait until the source operand cache unit is empty; the host state machine of the controller jumps to the idle state F_IDLE.
Priority Applications (1)
Application Number | Priority Date | Filing Date | Title |
---|---|---|---|
CN201610523519.7A CN106155814B (en) | 2016-07-04 | 2016-07-04 | A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method |
Publications (2)
Publication Number | Publication Date |
---|---|
CN106155814A CN106155814A (en) | 2016-11-23 |
CN106155814B true CN106155814B (en) | 2019-04-05 |
Family
ID=58061746
Family Applications (1)
Application Number | Title | Priority Date | Filing Date |
---|---|---|---|
CN201610523519.7A Active CN106155814B (en) | 2016-07-04 | 2016-07-04 | A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method |
Country Status (1)
Country | Link |
---|---|
CN (1) | CN106155814B (en) |
Families Citing this family (10)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN107193656B (en) * | 2017-05-17 | 2020-01-10 | 深圳先进技术研究院 | Resource management method of multi-core system, terminal device and computer readable storage medium |
CN107193755A (en) * | 2017-06-29 | 2017-09-22 | 合肥工业大学 | A kind of MMU memory management unit and its working method suitable for general floating point processor |
CN108334738B (en) * | 2017-12-29 | 2021-12-14 | 创业慧康科技股份有限公司 | Dynamic calculation power distribution method for distributed big data processing |
CN110309098A (en) * | 2019-06-27 | 2019-10-08 | 上海金卓网络科技有限公司 | Interaction control method, device, equipment and storage medium between a kind of processor |
US11467846B2 (en) * | 2019-08-02 | 2022-10-11 | Tenstorrent Inc. | Overlay layer for network of processor cores |
CN111045954B (en) * | 2019-11-29 | 2023-08-08 | 北京航空航天大学青岛研究院 | NAND-SPIN-based in-memory computing acceleration method |
CN110990767B (en) * | 2019-11-29 | 2021-08-31 | 华中科技大学 | Reconfigurable number theory transformation unit and method applied to lattice cryptosystem |
CN112214448B (en) * | 2020-10-10 | 2024-04-09 | 声龙(新加坡)私人有限公司 | Data dynamic reconstruction circuit and method of heterogeneous integrated workload proving operation chip |
CN112379928B (en) * | 2020-11-11 | 2023-04-07 | 海光信息技术股份有限公司 | Instruction scheduling method and processor comprising instruction scheduling unit |
CN113543045B (en) * | 2021-05-28 | 2022-04-26 | 平头哥(上海)半导体技术有限公司 | Processing unit, correlation device, and tensor operation method |
Citations (3)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
CN102576304A (en) * | 2009-06-19 | 2012-07-11 | 奇异计算有限公司 | Processing with compact arithmetic processing element |
CN103955584A (en) * | 2014-05-12 | 2014-07-30 | 合肥工业大学 | Upper bound optimization method of on-chip network restructuring cache based on multi-path routing |
CN104618303A (en) * | 2015-02-05 | 2015-05-13 | 东南大学 | Reconfigurable modulation and demodulation method applied to baseband processing |
Family Cites Families (1)
Publication number | Priority date | Publication date | Assignee | Title |
---|---|---|---|---|
US9218205B2 (en) * | 2012-07-11 | 2015-12-22 | Ca, Inc. | Resource management in ephemeral environments |
2016-07-04: application CN201610523519.7A filed in China; granted as patent CN106155814B (status: active)
Non-Patent Citations (2)
Title |
---|
S. Ciricescu, R. Essick, B. Lucas, et al. The reconfigurable streaming vector processor (RSVP™). Proceedings of the 36th Annual IEEE/ACM International Symposium on Microarchitecture (MICRO-36), IEEE, 2003. |
Du Gaoming, Zhang Min, Song Yukun. Design of a one-dimensional reconfigurable computing system model. Journal of Hefei University of Technology, Vol. 38, No. 1, Jan. 2015. |
Similar Documents
Publication | Publication Date | Title |
---|---|---|
CN106155814B (en) | A kind of reconfigurable arithmetic unit that supporting multiple-working mode and its working method | |
CN110347635B (en) | Heterogeneous multi-core microprocessor based on multilayer bus | |
CN103543954B (en) | A kind of data storage and management method and device | |
US9195610B2 (en) | Transaction info bypass for nodes coupled to an interconnect fabric | |
CN106227507B (en) | Computing system and its controller | |
CN109542830B (en) | Data processing system and data processing method | |
CN104699631A (en) | Storage device and fetching method for multilayered cooperation and sharing in GPDSP (General-Purpose Digital Signal Processor) | |
CN103714026B (en) | A kind of memory access method supporting former address data exchange and device | |
WO2015187209A1 (en) | Transactional traffic specification for network-on-chip design | |
CN111190735B (en) | On-chip CPU/GPU pipelining calculation method based on Linux and computer system | |
CN107590085A (en) | A kind of dynamic reconfigurable array data path and its control method with multi-level buffer | |
CN104765701B (en) | Data access method and equipment | |
CN110399221A (en) | Data processing method, system and terminal device | |
CN102508803A (en) | Matrix transposition memory controller | |
CN107315717A (en) | A kind of apparatus and method for performing vectorial arithmetic | |
Chen et al. | ArSMART: An improved SMART NoC design supporting arbitrary-turn transmission | |
CN105824604B (en) | Multiple-input and multiple-output processor pipeline data synchronization unit and method | |
KR20220136426A (en) | Queue Allocation in Machine Learning Accelerators | |
CN116663639B (en) | Gradient data synchronization method, system, device and medium | |
CN103166863B (en) | Lump type 8X8 low delay high bandwidth intersection cache queue slice upstream routers | |
CN116074267B (en) | Data communication system and SoC chip | |
Zhao et al. | Insight and reduction of MapReduce stragglers in heterogeneous environment | |
CN106569968B (en) | For data transmission structure and dispatching method between the array of reconfigurable processor | |
CN105518617B (en) | Data cached processing method and processing device | |
CN115550173A (en) | Dynamic calculation communication scheduling method based on WFBP and link characteristics |
Legal Events
Date | Code | Title | Description |
---|---|---|---|
C06 | Publication | ||
PB01 | Publication | ||
C10 | Entry into substantive examination | ||
SE01 | Entry into force of request for substantive examination | ||
GR01 | Patent grant | ||