CN101021781A

CN101021781A - Stream processor expansion method with flexible allocation of arithmetic group resources

Info

Publication number: CN101021781A
Application number: CN 200710034574
Authority: CN
Inventors: 衣晓飞; 陈海燕; 蒋江; 杨学军; 张民选; 邢座程; 张明; 穆长富; 阳柳; 曾献君; 马驰远; 李勇; 高军; 李晋文; 倪晓强; 唐遇星; 张承义; 齐树波
Original assignee: National University of Defense Technology
Current assignee: National University of Defense Technology
Priority date: 2007-03-19
Filing date: 2007-03-19
Publication date: 2007-08-22
Anticipated expiration: 2027-03-19
Also published as: CN100456231C

Abstract

The invention discloses a stream processor expansion method with flexible allocation of arithmetic group resources. Its steps are: (1) providing a microcontroller array composed of two or more microcontrollers, and adding to the stream register file (SRF) a stream-read interface for each microcontroller, where every microcontroller has the same structure and the same interface to the stream controller and can start executing a kernel-level program under the control of the stream controller; (2) providing the interfaces between the microcontrollers and the arithmetic groups; (3) modifying the stream controller by adding an interface to each microcontroller; (4) modifying the Clusterop stream instruction in the stream controller so that a kernel-level program executes on only one microcontroller and a subset of the arithmetic groups, which allows several microcontrollers to execute kernel-level programs at the same time whenever arithmetic group resources permit.

Description

Stream processor expansion method with flexible allocation of arithmetic group resources

Technical field

The present invention relates to the design of stream processing architectures for stream applications, and in particular to a stream processor expansion method with flexible allocation of arithmetic group resources.

Background technology

Stream applications, typified by floating-point computation and image processing, are characterized chiefly by compute intensity, parallelism, and locality. Stream processing is a specialized form of processing that targets these characteristics. Its basic idea is to decompose the computation so that operating on data and accessing data are separated from each other: the data is processed in blocks, multiple parallel computing units are provided to process the data blocks, and the computing units communicate with one another explicitly. A processor with a novel architecture of this kind is called a stream processor.

A typical stream processor exploits the data parallelism in an application: it uses multiple arithmetic groups that execute the same instruction on different data sets in SIMD fashion. Fig. 1 shows the architecture of the Imagine processor from Stanford University in the United States. A program running on a stream processor is divided into two levels, the stream-level program and the kernel-level program. The stream-level program operates on whole streams; it covers loading and storing stream data and launching kernel-level operations. The two main classes of stream instruction are therefore memory access instructions and kernel-level operations. A stream memory access instruction loads a complete stream from external memory into the stream register file (SRF), or stores a complete stream from the SRF to external memory. When hardware resources suffice, several stream memory transfers can proceed concurrently. A kernel-level operation computes on a set of input streams and produces one or more output streams. It runs on the data-parallel arithmetic groups, each of which executes the same sequence of operations on independent stream elements. The stream buffers (SB) time-multiplex the single port of the SRF among all interfaces, so that the SRF appears to have many logical ports. The microcontroller is the kernel-level controller: it receives the parameters and control signals passed down from the stream level, loads the microcode of the kernel-level program into its instruction memory, and, most importantly, controls the execution of the kernel-level program on the arithmetic groups.

In practice, most stream applications are not entirely regular computations; in particular, their computation scale and degree of parallelism vary. For example, the number of arithmetic groups in the arithmetic group array is fixed, but the number of operations that can run in parallel does not always match it. As a result, some arithmetic groups sit idle during certain periods. With multiple microcontrollers, these idle arithmetic groups can be put to use, so that several kernel-level programs execute in parallel. Our design approach is therefore to provide multiple microcontrollers that form a controller array alongside the original arithmetic group array, and to connect the two arrays through a switch. A stream processor extended in this way can support a wider range of applications more flexibly.

Existing architectural extensions of stream processors fall into two classes: core extension and multi-core extension. Core extension includes adding FPGA modules to the arithmetic groups as configurable units, changing the type and number of arithmetic units in an arithmetic group, and replacing the centralized stream register file with distributed register files. It can provide computing resources tailored to different application types. Multi-core extension includes placing several fully homogeneous stream processing cores inside the stream processor, each consisting of one microcontroller and several arithmetic groups, and replacing the centralized SRF with distributed SRFs. These improvements are not very flexible, however, and in some cases they can waste computing resources badly.

Summary of the invention

The technical problem to be solved by the present invention is as follows: in view of the technical problems of the prior art, the invention provides a stream processor expansion method with flexible allocation of arithmetic group resources that can execute multiple kernel-level programs in parallel, allocate computing resources flexibly, and accelerate stream data processing.

To solve the above technical problem, the solution proposed by the invention is a stream processor expansion method with flexible allocation of arithmetic group resources, characterized by the following steps:

(1) providing a microcontroller array composed of two or more microcontrollers, and adding to the stream register file a stream-read interface for each microcontroller in the array, where every microcontroller has the same structure and the same interface to the stream controller and can start executing a kernel-level program under the control of the stream controller;

(2) providing the interfaces between the microcontrollers in the microcontroller array and the arithmetic groups;

(3) modifying the stream controller by adding to it an interface to each microcontroller;

(4) modifying the stream instruction Clusterop in the stream controller: a 10-bit execution field is added to the Clusterop instruction to indicate the number of the microcontroller and the numbers of the arithmetic groups that execute the kernel-level program, so that a kernel-level program executes on only one microcontroller and a subset of the arithmetic groups, and several microcontrollers can execute kernel-level programs simultaneously whenever arithmetic group resources permit.

The microcontrollers in the microcontroller array may be fully cross-connected to the arithmetic group array, in which case each microcontroller can connect to and control any arithmetic group.

Alternatively, the microcontrollers in the microcontroller array may be separately connected to the arithmetic group array, in which case each microcontroller can control only a subset of the arithmetic groups, and each arithmetic group can be controlled by only one specific microcontroller.

The number of microcontrollers in the microcontroller array is 2 to 4.

Compared with the prior art, the advantages of the invention are as follows. By flexibly configuring the number of arithmetic groups governed by each microcontroller, the method makes full use of the computing resources and supports parallel execution of multiple kernel-level programs, achieving a resource-efficient multi-core extension. The improved stream processor can execute kernel-level programs in parallel and thereby accelerate stream data processing, which prior-art stream processors cannot do. It can also allocate the arithmetic group resources of the stream processor on demand to handle stream applications of different scales, which widens the range of applications of the stream processor. The improved stream processor is a heterogeneous multi-core microprocessor and can serve as a basic building block for computing platforms that support large-scale parallel stream applications.

Description of drawings

Fig. 1 is a schematic diagram of the architecture of a classical stream processor;

Fig. 2 is a block diagram of a stream processor that includes the microcontroller array;

Fig. 3 is a flow diagram of the method of the invention;

Fig. 4 is a schematic diagram of the connection between the microcontroller array and the stream register file;

Fig. 5 is a schematic diagram of the fully cross-connected control mode between the microcontroller array and the arithmetic group array;

Fig. 6 is a schematic diagram of the separately connected control mode between the microcontroller array and the arithmetic group array;

Fig. 7 is a schematic diagram of the connection between the microcontroller array and the stream controller;

Fig. 8 shows the original Clusterop instruction format;

Fig. 9 shows the Clusterop instruction format after extension.

Embodiment

The invention is described in further detail below with reference to the drawings and specific embodiments.

Fig. 1 shows the architecture of a classical stream processor. It comprises a stream controller, a microcontroller, a stream register file, an arithmetic group array, a stream memory system, a network interface, and a host interface. The stream controller decodes and executes the stream-level program, and manages and schedules all computing resources in the stream processor. The microcontroller accepts instructions from the stream controller; its main task is to execute kernel-level programs on the arithmetic group array, and the kernel-level instructions it executes come from the stream register file. The stream register file is the data buffering structure of the stream processor: under the direction of the stream controller it supplies kernel-level programs to the microcontroller and exchanges data with the arithmetic groups and with the stream memory system. The arithmetic group array is the computing part of the stream processor. Each arithmetic group contains adders, multipliers, a divide and square-root unit, and a communication unit, and all arithmetic groups in the array execute the same kernel-level instructions. The network interface provides high-speed interconnection between stream processor chips. The host interface connects the stream processor to the host, which may be a core on the same chip or a processor core off chip, and which is responsible for sending stream processing tasks to the stream processor.

Fig. 3 is a flow diagram of the method of the invention. First, the microcontroller is extended into a microcontroller array, going from the original single microcontroller to 2 to 4 microcontrollers. Next, a stream-read interface is added to the stream register file (SRF) for each microcontroller in the array. The interface between the microcontrollers and the arithmetic groups is then modified. The microcontroller array and the arithmetic group array can be connected in two ways. One is full cross-connection, in which each microcontroller can connect to and control any arithmetic group (see Fig. 5). The other is separate connection, in which each microcontroller can control only a subset of the arithmetic groups and each arithmetic group can be controlled by only one specific microcontroller (see Fig. 6). At the same time the stream controller is modified: besides adding an interface to each microcontroller, the Clusterop instruction is modified and new instruction decoding logic is added to the stream controller to decode the modified Clusterop instruction. The detailed steps are:

In the first step, the number of microcontrollers is increased to n (n = 2, 4, etc.) to form the microcontroller array. Each microcontroller has a unique number so that the stream controller can perform resource management and resource conflict checking. The microcontroller array should not contain too many microcontrollers: too many would make the connections between the microcontroller array and the arithmetic group array overly complex and reduce the processing speed of the stream processor. Given the characteristics of stream applications, the preferred embodiment provides 2 to 4 microcontrollers, so that 2 to 4 kernel-level programs can execute simultaneously, which meets the needs of most stream applications. The microcontrollers have identical internal structures and are independent of one another, and the stream controller can assign a kernel-level task to any of them (Fig. 2 shows the structure of the improved stream processor).

In the second step, the original stream register file is kept unchanged as a centralized SRF. A stream-read port is provided for each microcontroller in the microcontroller array and is used to transfer kernel-level programs. The SRF stays centralized rather than being split into distributed SRFs because the microcontroller array places little additional pressure on SRF bandwidth, so the existing SRF structure remains suitable.

In the third step, every microcontroller in the microcontroller array is given the same interface to the stream controller. The instruction decoder in the stream controller is extended so that, by decoding the newly added bit field of the instruction, it directs different microcontrollers to execute kernel-level programs.

In the fourth step, the microcontroller array and the arithmetic group array are interconnected, for which several interconnection modes are possible. The first is full cross-connection: the microcontroller array is fully cross-connected to the arithmetic group array, so that each microcontroller can control any arithmetic group. The second is separately connected control: each microcontroller controls only its own dedicated subset of the arithmetic groups, and the arithmetic groups can be divided evenly among the microcontrollers.

Note that, whichever connection mode is used, the stream compiler guarantees that no arithmetic group is ever controlled by more than one microcontroller at the same time.
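The exclusivity rule above can be checked mechanically. The following is a minimal sketch, not the compiler's actual algorithm: it assumes each running kernel-level program is described by an 8-bit arithmetic group mask, as in the extended Clusterop instruction, and the function name `find_conflicts` is hypothetical.

```python
def find_conflicts(active):
    """Report arithmetic groups claimed by more than one microcontroller.

    `active` maps a microcontroller number to its arithmetic group mask
    (bit i set means arithmetic group i is in use). This representation
    is an assumption made for illustration.
    """
    conflicts = []
    items = sorted(active.items())
    for i, (mc_a, mask_a) in enumerate(items):
        for mc_b, mask_b in items[i + 1:]:
            shared = mask_a & mask_b  # groups claimed by both microcontrollers
            if shared:
                conflicts.append((mc_a, mc_b, shared))
    return conflicts


# Microcontroller 0 uses group 0; microcontroller 1 uses groups 1, 2, 3.
legal = {0: 0b00000001, 1: 0b00001110}
# Both microcontrollers claim group 1, which the rule forbids.
illegal = {0: 0b00000011, 1: 0b00001110}

print(find_conflicts(legal))    # []
print(find_conflicts(illegal))  # [(0, 1, 2)]
```

An empty result means the assignment respects the rule; each reported tuple names the two microcontrollers and the mask of the arithmetic groups they share.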

In the fifth step, the stream instruction Clusterop is modified. The original Clusterop instruction directs the microcontroller to execute a kernel-level program; the start address of the kernel-level program is given by mpc, and its input and output streams are given by sdr0-sdr7. A 10-bit execution field is added to the Clusterop instruction to indicate the number of the microcontroller and the numbers of the arithmetic groups that execute the kernel-level program, so that a kernel-level program executes on only one microcontroller and a subset of the arithmetic groups. In this way, whenever arithmetic group resources permit, several microcontrollers can execute kernel-level programs simultaneously.

Fig. 2 shows a stream processor with a microcontroller array obtained by the method of the invention, in which the single microcontroller of the classical stream processor is replaced by the microcontroller array. The other components related to the microcontroller need only minor changes to their interfaces. The microcontroller array contains 2 to 4 microcontrollers; each has the same structure and can start executing a kernel-level program under the control of the stream controller.

Fig. 4 is a schematic diagram of the interface between the microcontroller array and the stream register file. Each microcontroller in the array has one stream-read interface to the stream register file. The stream register file allocates a corresponding stream buffer to each microcontroller to buffer the data blocks that the microcontroller is about to read. Stream-read means that the microcontroller can read stream data from the stream register file but cannot write stream data to it. The stream data that a microcontroller reads from the stream register file is the kernel-level program code that this microcontroller will decode and execute.

Fig. 5 is a schematic diagram of the fully cross-connected mode between the microcontroller array and the arithmetic group array. Every microcontroller is connected to every arithmetic group. For example, microcontroller 0 can execute a kernel-level program on all 4 arithmetic groups, and microcontroller 1 can likewise execute a kernel-level program on all 4 arithmetic groups. The stream controller guarantees, however, that at any given time an arithmetic group is used by only one microcontroller. For example, microcontroller 0 can control arithmetic group 0 while microcontroller 1 simultaneously controls arithmetic groups 1, 2, and 3. Full cross-connection is the most flexible connection mode: it lets the compiler choose, according to the characteristics of a kernel-level program, how many arithmetic groups to use for the computation, and it lets several kernel-level programs execute simultaneously as long as the total computing resources suffice. The arithmetic groups of the arithmetic group array can be allocated to the different microcontrollers in any combination.
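To make the full cross-connection allocation concrete, here is a small sketch of a bookkeeping structure that hands any free arithmetic group to any microcontroller. It assumes 4 arithmetic groups as in the example above; the class `CrossbarAllocator` and its first-free selection policy are hypothetical and are not taken from the patent.

```python
class CrossbarAllocator:
    """Tracks arithmetic group ownership under full cross-connection."""

    def __init__(self, num_groups=4):
        self.num_groups = num_groups
        self.owner = {}  # arithmetic group index -> microcontroller number

    def allocate(self, mc, count):
        """Give `count` free arithmetic groups to microcontroller `mc`.

        Returns the chosen group indices, or None when too few groups
        are free (the kernel-level program would have to wait).
        """
        free = [g for g in range(self.num_groups) if g not in self.owner]
        if count > len(free):
            return None
        chosen = free[:count]  # assumed policy: lowest-numbered free groups
        for g in chosen:
            self.owner[g] = mc
        return chosen

    def release(self, mc):
        """Free every arithmetic group held by microcontroller `mc`."""
        self.owner = {g: m for g, m in self.owner.items() if m != mc}


alloc = CrossbarAllocator(num_groups=4)
print(alloc.allocate(0, 1))  # [0]
print(alloc.allocate(1, 3))  # [1, 2, 3]
print(alloc.allocate(2, 1))  # None
alloc.release(0)
print(alloc.allocate(2, 1))  # [0]
```

The first two calls reproduce the example in the text (microcontroller 0 on arithmetic group 0, microcontroller 1 on arithmetic groups 1 to 3); because each group has at most one owner in the table, no group is shared at any time.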

Fig. 6 is a schematic diagram of the separately connected mode between the microcontroller array and the arithmetic group array. In this mode, supposing there are 4 arithmetic groups and 2 microcontrollers, the 4 arithmetic groups are divided evenly between the 2 microcontrollers, so that each microcontroller uses 2 arithmetic groups. For example, microcontroller 0 uses arithmetic groups 0 and 1, and microcontroller 1 uses arithmetic groups 2 and 3. Under this mode each microcontroller is connected only to a fixed set of arithmetic groups; even if other arithmetic groups are idle, it cannot use them. The drawback of separate connection is that the connections are fixed and the allocation is static. Its advantages are that the connections are simple and the wires few, which saves on-chip routing resources and helps raise the clock frequency of the chip.
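The fixed wiring of the separately connected mode can be sketched as a static partition. This is an illustrative model only, assuming an even split of contiguous group numbers as in the 4-group, 2-microcontroller example; the helper names `partition` and `can_run` are hypothetical.

```python
def partition(num_groups, num_mcs):
    """Evenly divide arithmetic groups among microcontrollers.

    Returns a mapping from microcontroller number to the list of
    arithmetic groups wired to it. Contiguous numbering is an assumption.
    """
    if num_groups % num_mcs != 0:
        raise ValueError("arithmetic groups cannot be divided evenly")
    per_mc = num_groups // num_mcs
    return {mc: list(range(mc * per_mc, (mc + 1) * per_mc))
            for mc in range(num_mcs)}


def can_run(wiring, mc, requested):
    """A microcontroller may only use arithmetic groups it is wired to."""
    return set(requested) <= set(wiring[mc])


wiring = partition(4, 2)
print(wiring)                      # {0: [0, 1], 1: [2, 3]}
print(can_run(wiring, 0, [0, 1]))  # True
print(can_run(wiring, 0, [1, 2]))  # False
```

The last call shows the drawback named in the text: microcontroller 0 cannot reach arithmetic group 2, whether or not that group is idle.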

Fig. 7 is a schematic diagram of the connection between the microcontroller array and the stream controller. The stream controller provides each microcontroller in the array with an interface identical to the original one. New instruction decoding logic is added to the stream controller so that it can decode the format of the modified stream instruction Clusterop, send the correct control signals to the different microcontrollers, and schedule these microcontrollers to execute kernel-level programs.
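The decode-and-dispatch behaviour described here might be modelled as follows. This is a rough behavioural sketch rather than the hardware design: it assumes the 10-bit execution field layout of the extended Clusterop instruction (high 2 bits for the microcontroller number, low 8 bits as an arithmetic group mask), and the class `StreamControllerModel`, its methods, and the stall-on-conflict policy are hypothetical.

```python
class StreamControllerModel:
    """Behavioural model of dispatching extended Clusterop instructions."""

    def __init__(self, num_mcs=2):
        self.num_mcs = num_mcs
        self.running = {}   # microcontroller number -> arithmetic group mask
        self.busy_mask = 0  # union of the masks of all running kernels

    def issue(self, exec_field):
        """Try to start a kernel-level program; return False to stall."""
        mc = (exec_field >> 8) & 0b11  # high 2 bits: microcontroller number
        mask = exec_field & 0xFF       # low 8 bits: arithmetic group mask
        if mc >= self.num_mcs:
            raise ValueError("no such microcontroller")
        # Stall if the microcontroller or any requested group is in use.
        if mc in self.running or (mask & self.busy_mask):
            return False
        self.running[mc] = mask
        self.busy_mask |= mask
        return True

    def complete(self, mc):
        """Mark the kernel on microcontroller `mc` finished; free its groups."""
        self.busy_mask &= ~self.running.pop(mc)


sc = StreamControllerModel(num_mcs=2)
print(sc.issue(0b0000001111))  # True
print(sc.issue(0b0111110000))  # True
print(sc.issue(0b0100000001))  # False
sc.complete(0)
sc.complete(1)
print(sc.issue(0b0100000001))  # True
```

The third instruction stalls because microcontroller 1 is already running a kernel and arithmetic group 0 is held by microcontroller 0; once both kernels complete, the same instruction is accepted.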

Fig. 8 shows the format of the stream instruction Clusterop of a prior-art stream processor. This instruction starts the microcontroller and all of the arithmetic groups executing a kernel-level program. The fields sdr0 to sdr7 identify the input and output streams of the kernel-level program, of which there are at most 8. The field mpc gives the storage address of the kernel-level program.

Fig. 9 shows the format of the stream instruction Clusterop of the extended stream processor. This instruction starts one microcontroller and a subset of the arithmetic groups executing the kernel-level program of that microcontroller. As in the original instruction, sdr0 to sdr7 identify the input and output streams of the kernel-level program, of which there are at most 8, and mpc gives the storage address of the kernel-level program. The added bit field is the execution field. It is 10 bits wide: the high 2 bits give the microcontroller number and the low 8 bits identify the arithmetic groups. A microcontroller number of 0 means that the kernel-level program is executed by microcontroller 0, and a value of 1 means that it is executed by microcontroller 1. The low 8 bits form an arithmetic group mask in which each bit corresponds to one arithmetic group: a bit set to 1 means that the arithmetic group whose number equals that bit's position takes part in executing the kernel-level program. For example, 11110000 means that arithmetic groups 4, 5, 6, and 7 take part. An execution field of 0111110000 therefore means that this kernel-level program is executed jointly by microcontroller 1 and arithmetic groups 4, 5, 6, and 7.
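The execution field layout can be illustrated with a short encode/decode sketch. Only the 10-bit field described above is modelled; the positions of sdr0 to sdr7 and mpc within the full instruction word are not given in the text and are left out. The function names are hypothetical.

```python
MC_BITS = 2     # high 2 bits: microcontroller number
GROUP_BITS = 8  # low 8 bits: one bit per arithmetic group


def encode_exec_field(mc, groups):
    """Pack a microcontroller number and arithmetic group numbers into 10 bits."""
    if not 0 <= mc < (1 << MC_BITS):
        raise ValueError("microcontroller number out of range")
    mask = 0
    for g in groups:
        if not 0 <= g < GROUP_BITS:
            raise ValueError("arithmetic group number out of range")
        mask |= 1 << g  # bit position equals the arithmetic group number
    return (mc << GROUP_BITS) | mask


def decode_exec_field(field):
    """Unpack a 10-bit execution field into (microcontroller, group list)."""
    mc = (field >> GROUP_BITS) & ((1 << MC_BITS) - 1)
    mask = field & ((1 << GROUP_BITS) - 1)
    groups = [g for g in range(GROUP_BITS) if mask & (1 << g)]
    return mc, groups


field = encode_exec_field(1, [4, 5, 6, 7])
print(format(field, "010b"))     # 0111110000
print(decode_exec_field(field))  # (1, [4, 5, 6, 7])
```

The printed value matches the worked example in the text: microcontroller 1 together with arithmetic groups 4, 5, 6, and 7 yields the execution field 0111110000.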

Claims

1. A stream processor expansion method with flexible allocation of arithmetic group resources, characterized in that the steps are:

(1) providing a microcontroller array composed of two or more microcontrollers, and adding to the stream register file a stream-read interface for each microcontroller in the array, where every microcontroller has the same structure and the same interface to the stream register file and can start executing a kernel-level program under the control of the stream controller;

2. The stream processor expansion method with flexible allocation of arithmetic group resources according to claim 1, characterized in that the microcontrollers in the microcontroller array are fully cross-connected to the arithmetic group array, and each microcontroller can connect to and control any arithmetic group.

3. The stream processor expansion method with flexible allocation of arithmetic group resources according to claim 1, characterized in that the microcontrollers in the microcontroller array are separately connected to the arithmetic group array, each microcontroller can control only a subset of the arithmetic groups, and each arithmetic group can be controlled by only one specific microcontroller.

4. The stream processor expansion method with flexible allocation of arithmetic group resources according to claim 1, 2, or 3, characterized in that the number of microcontrollers in the microcontroller array is 2 to 4.