CA1176757A

CA1176757A - Data processing system for parallel processings

Info

Publication number: CA1176757A
Application number: CA000398861A
Authority: CA
Inventors: Kazushi Sakamoto; Tetsuro Okamoto; Shigeaki Okutani
Original assignee: Fujitsu Ltd
Current assignee: Fujitsu Ltd
Priority date: 1981-03-20
Filing date: 1982-03-19
Publication date: 1984-10-23
Also published as: KR830009518A; KR860001274B1; EP0061096A1; AU8161482A; ES510535A0; BR8201533A; AU538595B2; DE3262186D1; ES8303743A1; JPS57155666A; JPS6161436B2; US4507728A; EP0061096B1

Abstract

There is disclosed a data processing system which has plural operation units and can execute plural instructions in parallel. The system also has plural instruction control circuits, each of which comprises at least two stages, one for reading source operands from a local storage and another for writing a result operand into the local storage. Each instruction control circuit is provided with specific bank timings for accessing the local storage. The invention is advantageous over existing vector processors in that it permits the control of three or more operation units with comparatively simple hardware.

Description

This invention relates to a data processing system having plural operation units, and is particularly effective in a vector processor, which processes vector instructions that each require a comparatively long processing time.
Data processing systems providing plural operation units, for example a store operation unit, a load operation unit, an adding operation unit, a multiplying operation unit and a dividing operation unit, are already in use. However, such existing scalar processing systems do not allow two or more operation units to execute different instructions in parallel. Moreover, an existing vector processor, for example the CRAY-1, allows two operation units to execute different instructions in parallel, but the instruction control means used for such parallel execution is considerably complicated, and it is almost impossible to control three or more operation units for parallel operation by the same method. Reference should be made to the following references for a description of the CRAY-1.
1) Communications of the ACM, January 1978, Vol. 21, Number 1, Pages 63 - 66, "The CRAY-1 Computer System."

2) "The CRAY-1 COMPUTER PRELIMINARY REFERENCE MANUAL" 1975, by Cray Research Inc.
It is an object of the present invention to provide a novel data processing system which causes plural operation units to execute different instructions in parallel.
It is another object of the present invention to realize the control of these parallel operations with comparatively simplified hardware.
According to the present invention, there is provided a data processing system for parallel processing which comprises a plurality of operation units each of which executes a different instruction respectively, and a plurality of instruction control means each of which includes at least two stages for reading out source operand data from a local storage and for writing the result operand data into the local storage, wherein each of said instruction control means controls different instruction execution in parallel.
The invention will now be described in greater detail with reference to the accompanying drawings, in which:
Figure 1 shows the pipe line processing of instructions in a scalar processor;
Figure 2 shows the pipe line processing of instructions in the existing vector processor;
Figure 3 is a block diagram of an ordinary vector processor;
Figure 4 is a block diagram of a vector processor in the present invention;
Figure 5 shows the structure of an ordinary vector instruction;
Figure 6 shows in detail one of the circuit blocks of Figure 4;
Figure 6A is a diagram showing how the load instruction is executed in Figure 6;
Figure 7 shows in detail another circuit block of Figure 4;
Figure 8 shows in detail a further circuit block of Figure 4;
Figure 9A shows in detail another circuit block of Figure 4;
Figure 9B shows in detail yet another circuit block of Figure 4;
Figure 10A shows in detail still another circuit block of Figure 4;
Figure 10B is a timing diagram explaining the operation of Figure 10A;
Figure 11 shows the structure of the vector register in Figure 4;
Figures 12 to 15B show the pipe line processing of instructions in the present invention; and
Figure 16 explains the function of the COMPRESS instruction.

Figure 1 explains the pipe line processing in an ordinary scalar computer. In this Figure, F is an instruction fetch stage, D is an instruction decode stage and E is an instruction execute stage, respectively. As will be understood from Figure 1, for example, while the instruction 1 is being executed, the instruction 2 is decoded and the instruction 3 is fetched, i.e., the instructions 1, 2, 3, ... are executed in the form of a processing stream. However, it is impossible to have the instructions 1, 2 and 3 in the same stage at the same time.
In the control method of Figure 1, if the processing time of the three stages of each instruction is equal, the instructions flow smoothly. But if such processing times are not equal, some idle time is generated between stages as shown in Figure 2.
In the case of a vector processor, an instruction processes many elements of vectors and as a result the execute stage becomes longer than the other stages, thus resulting in an idle time as shown in Figure 2. The existing vector processor provides plural operation units of the pipe line structure corresponding to various instructions such as load, store, addition, multiplication and division, but it can execute only one instruction at a time. As a result, when one operation unit is operating, the others are left idle.
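The idle time described above can be illustrated with a small scheduling sketch. It is not taken from the patent: the function names and the clock counts are hypothetical, and it assumes that each of the three stages handles one instruction at a time and that an instruction simply waits while the next stage is occupied.

```python
def schedule(durations):
    """Return (fetch_start, decode_start, execute_start, execute_end) per instruction.

    `durations` is a list of (fetch, decode, execute) clock counts.
    Assumption: one instruction per stage at a time; an instruction waits
    when the following stage is still occupied by its predecessor.
    """
    fetch_free = decode_free = execute_free = 0
    timeline = []
    for fetch, decode, execute in durations:
        fetch_start = fetch_free
        decode_start = max(fetch_start + fetch, decode_free)
        execute_start = max(decode_start + decode, execute_free)
        execute_end = execute_start + execute
        fetch_free = fetch_start + fetch
        decode_free = decode_start + decode
        execute_free = execute_end
        timeline.append((fetch_start, decode_start, execute_start, execute_end))
    return timeline


def idle_before_execute(durations):
    """Clocks each instruction waits between the end of decode and the start of execute."""
    return [
        e_start - (d_start + decode)
        for (_, decode, _), (_, d_start, e_start, _) in zip(durations, schedule(durations))
    ]


# Equal stage times (the Figure 1 situation): no waiting.
print(idle_before_execute([(1, 1, 1)] * 3))
# A long execute stage (the Figure 2 situation): later instructions wait.
print(idle_before_execute([(1, 1, 8)] * 3))
```

With equal stage times no instruction waits; with an eight-clock execute stage the waiting grows with every instruction, which is the situation the parallel instruction controls are meant to relieve.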
Figure 3 shows the general block diagram of an ordinary vector processor, wherein 1 is the instruction fetch unit; 2 is the instruction decoder; 3 is the instruction control unit; 4 is the main storage unit; 5 is the storage control unit; 6 is the load operation unit; 7 is the store operation unit; 8 is the arithmetic unit; 9 is the vector registers, respectively.
The instruction fetch unit 1 is actually a scalar processor. This scalar processor sequentially fetches instructions from the main storage unit 4 via a route not illustrated. When the instruction fetched is a scalar instruction, the scalar processor executes the scalar instruction by itself, and when the instruction fetched is a vector instruction, it transfers the instruction to the decoder 2, which belongs to the vector processor. The instruction decoder 2 decodes the vector instruction transferred and gives the result of decoding to the instruction control unit 3. The instruction control unit 3 controls execution of instructions and controls the load operation unit 6, store operation unit 7, arithmetic unit 8 and vector registers 9 in accordance with the result of instruction decoding. The storage control unit 5 intervenes between the access request generating unit and the main storage unit. The load operation unit 6 extracts vector data from the main storage unit 4 and writes this vector data into the vector register 9. The store operation unit 7 stores the vector data read out of the vector register to the main storage unit 4. The arithmetic unit 8 operates on a pair of vector data read out of the vector register 9. The vector data resulting from the calculation is stored back in the vector register 9.
The vector register 9 is composed of high speed memory elements.
Figure 4 is the block diagram of an embodiment of the present invention, corresponding to the circuit of Figure 3 with the instruction fetch unit 1 removed. Components 2, 4, 5, 6, 7 and 9 remain largely unchanged. Additionally, in Figure 4, 8 - 1 is an adder; 8 - 2 is a multiplier; 8 - 3 is a divider; 10 is a control logic means; 11 - L to 11 - F are instruction controls; 12 is a vector register trigger unit; 13 is a bank timing generator of vector register 9; 14 - E and 14 - F are selectors; G are gates and DB is a data bus, respectively.
A vector instruction contains, as shown in Figure 5, the instruction code OP which indicates the operation to be executed. Moreover, the load/store instruction, which is used for sending or receiving data between the main storage unit 4 and vector register 9, contains the address MA of the main storage unit and the address R of the vector register, while the arithmetic instruction contains the addresses R1 to R3 of three vector registers which designate two input operands and one output operand.
Upon receiving a vector instruction, the decoder 2 decodes the operation code to distinguish the kind of instruction (adding instruction, load instruction, etc.), and the result is sent to the control logic means 10. This control logic means 10 provides the unit trigger logic and the control select logic. In accordance with the kind of instruction and the control information sent from the instruction controls, it generates the gate signal L, S, E or F, which indicates which of the instruction controls 11 - L to 11 - F should control the instruction, and the unit trigger signal, which indicates which of the operation units 6, 7, 8 - 1, 8 - 2 and 8 - 3 should execute the instruction.
As will be explained later, the bank timing generator 13 comprises a ring counter and generates a bank timing signal determined by the ring counter. The bank timing signal specifies the timing of reading/writing data from/to the vector registers in accordance with the kind of instruction control. Only when this timing matches can the pertinent instruction control be started.
The instruction control 11 - L controls the load operation unit 6, while the instruction control 11 - S controls the store operation unit 7. The instruction control 11 - E controls any one of the adder 8 - 1, multiplier 8 - 2 and divider 8 - 3. The instruction control 11 - F is similar to the instruction control 11 - E. Each of the instruction controls 11 - L to 11 - F holds the instruction input from the corresponding gate G. The instruction control 11 - L also holds a warning signal sent from the load operation unit 6. The instruction control 11 - S also holds a warning signal sent from the store operation unit 7. The instruction control 11 - E holds an instruction, sends a select signal to the selector 14 - E and then fetches the warning signal selected by the selector 14 - E. The instruction control 11 - F operates in the same way as the instruction control 11 - E. The instruction controls 11 - L to 11 - F each send to the control logic means 10, as control information, a busy signal which indicates that the instruction control is busy, together with the addresses of the operation units being controlled and of the vector registers used for executing instructions.
The control logic means 10 refers to the kind of instruction to be input, the bank timing and the control information. When the specified conditions are satisfied, the control logic means 10 selects the instruction control to which the instruction should be input and opens the gate G corresponding to the selected instruction control. Simultaneously, it selects the operation unit and generates a unit trigger signal to the selected operation unit. The vector register trigger unit 12 is informed of the address of the vector register contained in the instruction and of the operation unit in which the instruction is to be executed, simultaneously receives the signal indicating the trigger start bank timing (the vector data receiving unit), and controls the data sent and received between the vector registers and the operation units 6, 7, 8 - 1, 8 - 2, 8 - 3. Each of the instruction controls 11 - L to 11 - F stores the name of the operation unit being controlled and the address of the vector registers being used until the warning signal which indicates the end of execution of the instruction is sent from the corresponding operation unit, and transmits these data to the control logic means 10 as the control information.
Figure 6 shows the internal circuit of instruction control 11 - L.
The instruction control is composed of the following two stages.
1. R stage (READ control stage), which holds the instruction code (OP) of an instruction and the address (R1) of the vector register while reading data from the main storage or vector register. During this period, BUSY becomes ON.
2. W stage (WRITE control stage), which holds the instruction code (OP) of an instruction and the address (R1) of the vector register while writing data to the main storage or vector register. During this period, BUSY becomes ON.
In Figure 6, 212, 215 are busy flip-flops; 213, 214, 216, 217 are registers.
201, 202 and 218 represent the respective values of the load/store instruction in the format shown in Figure 5. 206 is the "Busy" signal of the R stage in the load control, 207 is the operation code of the R stage in the load control and 208 is the first operand register number of the R stage in the load control. Similarly, 209 is the "Busy" signal of the W stage in the load control, 210 is the operation code of the W stage in the load control and 211 is the first operand register of the W stage in the load control. 203, 204 and 205 are the signals from the load operation unit 6 representing the timings of write start, read end and write end, respectively. 219 is the memory address signal fed to the load operation unit 6. 407 is the gate signal L given from the circuit of Figure 8 in control logic 10. 412 is the start L signal given from the circuit of Figure 9A.
The load instruction is generally executed as shown in Figure 6A.
Figure 6A shows an example of loading eight (8) vector data elements from the memory to the vector register. At the timing of T1, START L of Figure 6 sets the busy flip-flop 212; at the timing of T2, WRITE START WARNING (hereinafter abbreviated as WSW) sets the busy flip-flop 215; at the timing of T3, READ END WARNING (hereinafter abbreviated as REW) resets the flip-flop 212; and at the timing of T4, WRITE END WARNING (hereinafter abbreviated as WEW) resets the flip-flop 215. The START L sets the OP code and vector register address to the R stage. In addition, these data are set to the W stage by the WSW. The instruction control 11 - S has a similar structure and is therefore not shown.
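The T1 to T4 sequence of Figure 6A can be followed in a small behavioural sketch of the two stages. This is only an illustration under stated assumptions: the class and method names are hypothetical, and the model tracks only the busy flags, the OP code and the register address R1, not the real register and flip-flop wiring of Figure 6.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Stage:
    """One stage (R or W): a busy flag plus the held OP code and R1 address."""
    busy: bool = False
    op: Optional[str] = None
    r1: Optional[int] = None


class LoadInstructionControl:
    """Hypothetical behavioural model of instruction control 11 - L."""

    def __init__(self):
        self.r = Stage()  # READ control stage
        self.w = Stage()  # WRITE control stage

    def start_l(self, op, r1):
        # T1: START L sets the R-stage busy flag and latches OP and R1.
        self.r = Stage(True, op, r1)

    def write_start_warning(self):
        # T2: WSW sets the W-stage busy flag and copies OP and R1 from the R stage.
        self.w = Stage(True, self.r.op, self.r.r1)

    def read_end_warning(self):
        # T3: REW resets the R-stage busy flag.
        self.r.busy = False

    def write_end_warning(self):
        # T4: WEW resets the W-stage busy flag.
        self.w.busy = False


ctl = LoadInstructionControl()
ctl.start_l("LOAD", 3)
ctl.write_start_warning()
ctl.read_end_warning()
print(ctl.r.busy, ctl.w.busy)  # R stage is free again while W is still busy
ctl.write_end_warning()
```

Because the R stage is released at T3 while the W stage is still busy, a following instruction could in principle enter the R stage before the previous write has finished, which is the point of splitting the control into two stages.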
Figure 7 shows the internal circuit of the instruction control 11 - E and selector 14 - E. This circuit is almost the same as Figure 6 in structure and operation, except that three pairs of vector register addresses are used and the signals ADD, MULTI or DIVIDE are used in order to indicate the kind of arithmetic operation. The instruction control 11 - F has the same structure as Figure 7.
230, 231, 232 and 234 represent the respective values of the arithmetic instruction in the format shown in Figure 5. 235, 236 and 237 are the signals from the arithmetic unit 8 representing the timings of write start for the add, multiply and divide operations, respectively. 238, 239 and 240 are the signals from unit 8 representing the timings of read end for the add, multiply and divide operations, respectively. 241, 242 and 243 are the signals from unit 8 representing the timings of write end for the add, multiply and divide operations, respectively.
244 is the "Busy" signal of the R stage of control 11 - E. 245 is the operation code of the R stage in control 11 - E. 246 is the R2 address of the R stage in control 11 - E, 247 is the R1 address of the R stage in control 11 - E, 248 is the R3 address of the R stage in control 11 - E, and 249, 250 and 251 are, respectively, the add, multiply and divide operations for the R stage in control 11 - E. 252 - 259 correspond respectively to 244 - 251 except that they relate to the W stage rather than the R stage.
404, 405 and 406 are signals from Figure 8 designating the kinds of operations, these signals being passed to registers 260 - 265. 409 is the gate signal E given from the circuit of Figure 8. 420 is the start signal from the circuit of Figure 9B.
Figure 8 shows an example of the internal circuit of the control select logic in the control logic means 10. For the load/store instruction, an output of decoder 2 directly becomes the gate signal L or S. For the other, arithmetic instructions, the gate signal E or F is obtained by a logical operation between an output of the decoder and the bank timing signals E3 and F3.
Figure 9A shows an example of the internal circuit of the unit trigger logic in the control logic means 10. This figure particularly shows the trigger logic for load/store, while Figure 9B shows the trigger logic for arithmetic instructions.
In Figure 9A, 414 is a bank timing reserve circuit, in which the registers 415 and 416 show at which of the bank timings K and L the LOAD instruction and the STORE instruction are respectively being executed. When LOAD OP of 402 is "1", the bank timing (L or K) which is different from the bank timing (K or L) of the STORE being executed is "1", the BUSY signal of instruction control L of 206 is "0" and moreover the CONFLICT signal (explained later) is "0", the START LOAD of 412 is generated. This signal is the trigger signal of the load operation unit and simultaneously the START signal of instruction control 11 - L and also the START signal of the register corresponding to the LOAD instruction in the bank timing reserve circuit 414. The same operation is carried out when the STORE OP of 403 is "1".
In Figure 9B, when ADD OP of 404 is "1", the bank timing E3 of 311 is "1", ER BUSY of 244 is "0" (the R stage of the instruction control is not BUSY) and the CONFLICT signal is "0", an output of 422 becomes "1" and START ADD of 417 is generated. Simultaneously, START E of 420 is transferred to the instruction control E. The same is true of the outputs of 423 to 427.
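The start condition for an arithmetic instruction stated above is a four-input AND, which can be written out as a short sketch. The function and parameter names are hypothetical; only the four conditions themselves come from the description of Figure 9B.

```python
def start_arith(op_selected, bank_timing, r_stage_busy, conflict):
    """Return True when an arithmetic start signal (e.g. START ADD) would be raised.

    op_selected  -- the decoded operation line is "1" (e.g. ADD OP)
    bank_timing  -- the bank timing of this instruction control is "1" (E3 or F3)
    r_stage_busy -- the R stage of this instruction control is BUSY
    conflict     -- the CONFLICT signal is "1"
    """
    return bool(op_selected and bank_timing and not r_stage_busy and not conflict)


# START ADD through instruction control E: ADD OP, timing E3, R stage free, no conflict.
print(start_arith(op_selected=True, bank_timing=True, r_stage_busy=False, conflict=False))
# The same request is held back while the R stage is still busy.
print(start_arith(op_selected=True, bank_timing=True, r_stage_busy=True, conflict=False))
```

The same function would be evaluated once per combination of operation (add, multiply, divide) and instruction control (E or F), which presumably corresponds to the six outputs 422 to 427.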
Figure 10A shows an example of the internal circuit of the bank timing generating circuit. 302 to 309 are 1-bit registers. At first, a SET signal of 301 causes only the bank timing of 302 to become "1" while the others become "0". When the SET signal becomes "0", the "1" bit shifts to the right, from K to F1, E3 and so on, and when it reaches F2 it returns, so that K becomes "1" again. Thereafter, this cycle is repeated. The bank timings E3, F3, K and L are sent to the control logic means, determining the timing of triggering the operation units and instruction controls.
Figure 10B shows the phase relation of each bank timing.
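The ring counter of Figure 10A can be sketched as a one-hot shift register. The register order used here (K, F1, E3, E2, L, E1, F3, F2) is an assumption inferred from the description, which only states that the "1" bit moves from K to F1, E3 and onwards until it returns from F2 to K; the names in the code are likewise hypothetical.

```python
from itertools import islice

# Assumed order of the eight 1-bit registers, starting from the one set by SET.
BANK_TIMINGS = ["K", "F1", "E3", "E2", "L", "E1", "F3", "F2"]


def bank_timing_ring():
    """Yield the name of the single active bank timing on each clock."""
    state = [1] + [0] * 7  # after SET only the first register (K) holds "1"
    while True:
        yield BANK_TIMINGS[state.index(1)]
        state = [state[-1]] + state[:-1]  # shift right, wrapping the last bit around


print(list(islice(bank_timing_ring(), 9)))
```

Exactly one timing is active per clock, and the pattern repeats every eight clocks.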
Figure 11 shows the structure of vector register 9. In Figure 11, B1 to B8 are banks and 1, 2, ... are vector elements, respectively. The vector register 9 is composed of eight banks B1 to B8, and each of the banks B1 to B8 stores plural vector elements. When access is made to the vector register 9, designation of one address automatically allows sequential access to the eight elements located at that same address in the eight banks B1 to B8.
A vector having 8 x N elements can be designated by designating N addresses. In Figure 5, the vector register VRx having eight elements can be designated by designating address 0, and the vector register VRy having 16 elements can be designated by designating the addresses n and n + 1.
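The addressing rule above (one address covers eight elements, one per bank) can be expressed as a small sketch. The function names and the 0-based element index are hypothetical, and the mapping of consecutive elements to banks B1, B2, ... B8 is an assumption consistent with the sequential bank access described in the text.

```python
BANKS = 8  # banks B1 to B8


def addresses_needed(num_elements):
    """Number of vector register addresses needed for a vector of this length."""
    return -(-num_elements // BANKS)  # ceiling division


def locate(element_index, base_address):
    """Return (address, bank number 1..8) of a 0-based element index."""
    return base_address + element_index // BANKS, element_index % BANKS + 1


print(addresses_needed(8), addresses_needed(16))
print(locate(0, 5), locate(15, 5))
```

For a 16-element vector starting at a hypothetical address 5, the first element sits in bank B1 at address 5 and the last in bank B8 at address 6.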
For the actual read/write operation, the access is made at first to the bank B1, then to the bank B2 after one clock and then sequentially to the banks B3, B4, B5, B6, B7 and B8.
It is impossible to make an access simultaneously to plural addresses in the same bank. Therefore, the timing for reading each address of the bank B1 must be controlled adequately. In the case of an ordinary vector arithmetic instruction, a pair of vector data are read from the vector registers VR3 and VR2, and the result of the arithmetic operation on them is written into the vector register VR1. Thus, as shown in Figure 10B, the clock train is partitioned into groups of eight clocks, and the eight clocks of each group are given the names E3, E2, L, E1, F3, F2, K, F1, respectively. The clocks E3 and F3 are timings for reading the address of bank B1 designated as the vector register VR3, while the clocks E2 and F2 are timings for reading the address of bank B1 designated as the vector register VR2, and the clocks E1 and F1 are timings for writing data into the address of bank B1 designated as the vector register VR1. The clocks K and L designate the timing for accessing the bank B1 on the occasion of executing the load instruction or store instruction.
Figure 12 shows an example of reading or writing the vector elements for executing the arithmetic instruction. At the timing E3, an access is made to the bank B1 of vector register VR3; at the timing E2, to the bank B2 of the vector register VR3 and the bank B1 of the vector register VR2; at the timing E1, to the bank B4 of the vector register VR3, bank B3 of the vector register VR2 and bank B1 of the vector register VR1, respectively. Successive accesses are made as indicated in the Figure. As explained above, parallel accesses are never made simultaneously to the same bank during execution of one instruction.
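Why the fixed bank timings keep accesses apart can be checked with a short sketch. It assumes that the eight clock names within each group run in the order E3, E2, L, E1, F3, F2, K, F1, that each access stream touches bank B1 on its own named clock and then advances one bank per clock, and that all eight streams are active at once (the worst case). The names `SLOTS`, `bank_at` and `banks_in_use` are hypothetical.

```python
# Assumed order of the clock names within each group of eight clocks.
SLOTS = ["E3", "E2", "L", "E1", "F3", "F2", "K", "F1"]


def bank_at(slot, clock):
    """Bank (1..8) touched at `clock` by the stream whose B1 access falls on `slot`."""
    phase = SLOTS.index(slot)
    return (clock - phase) % 8 + 1


def banks_in_use(clock):
    """Bank touched by every stream at `clock`, assuming all streams are active."""
    return {slot: bank_at(slot, clock) for slot in SLOTS}


# Clock 3 is the E1 slot: VR3 (started at E3) is at B4, VR2 (E2) at B3, VR1 (E1) at B1.
print(bank_at("E3", 3), bank_at("E2", 3), bank_at("E1", 3))
# At any clock the eight streams occupy eight different banks.
print(all(len(set(banks_in_use(c).values())) == 8 for c in range(16)))
```

Since every stream has a distinct phase within the eight-clock group, no two streams can land on the same bank in the same clock, whichever instruction controls happen to be running.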
As shown in Figure 4, the instruction control 11 - L exclusively controls the load operation unit 6, while the instruction control 11 - S exclusively controls the store operation unit 7. The load operation unit 6 accesses and reads the main storage unit 4 in accordance with the memory address sent from the instruction control 11 - L and transmits vector data to the data bus DB. The store operation unit 7 stores the vector data sent from the vector register 9 to the main storage unit 4, in accordance with the memory address sent from the instruction control 11 - S.
Upon receiving the load instruction, the control logic circuit 10 transmits the unit trigger signal to the load operation unit 6 at the bank timing L, simultaneously transmits the gate signal L to open the gate G corresponding to the instruction control 11 - L, and moreover informs the vector register trigger unit 12, by means of the LOAD OP 402, that the vector data should be received by the load operation unit. Upon reception of the store instruction following the load instruction, the control logic means 10 transmits the unit trigger signal to the store operation unit 7 at the bank timing K, transmits the gate signal S, and moreover informs the vector register trigger unit 12, by means of the STORE OP 403, that the vector data should be received by the store operation unit 7.
Figure 13 shows the instruction execution sequence on the occasion that the load instruction and the store instruction are fetched in succession.
In case plural arithmetic instructions are fetched continuously, the instruction control unit 3 of Figure 4 operates as explained below.
When the addition instruction is fetched, it is decoded by the instruction decoder 2 and sent to the control logic means 10. In case the instruction decode signal is received later than the timing F3 but earlier than the timing E3 and the instruction control 11 - E is not busy, the control logic means 10 transmits the unit trigger signal to the adder 8 - 1 at the timing E3 and opens the gate G corresponding to the instruction control 11 - E by sending the gate signal E. Moreover, it informs the vector register trigger unit 12 by START ADD 417 and START E that the vector data should be received by the adder 8 - 1. When these signals are received, access to the vector register 9 is started at the timings E3, E2, E1. In case the instruction decode signal is received later than the timing E3 but earlier than the timing F3, or the instruction control 11 - E is busy, the control logic means 10 sends the unit trigger signal to the adder 8 - 1 at the timing F3 under the condition that the instruction control 11 - F is not busy, then opens the gate G corresponding to the instruction control 11 - F by sending the gate signal F and informs the vector register trigger unit 12 that the vector data should be received by the adder 8 - 1 and that the bank timings are F3, F2 and F1.

When the multiplication instruction is fetched under the condition that the adder 8 - 1 is controlled by the instruction control 11 - E, the control logic means 10 sends the unit trigger signal to the multiplier 8 - 2 at the timing F3, opens the gate G corresponding to the instruction control 11 - F by sending the gate signal F and informs the vector register trigger unit 12 that the vector data should be received by the multiplier 8 - 2 and that the bank timings are F3, F2 and F1. When the division instruction is fetched under the condition that the instruction controls 11 - E and 11 - F are both busy, execution of the division instruction is delayed until the instruction control 11 - E or 11 - F becomes idle.
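The choice between the instruction controls E and F described above can be sketched as picking the earliest upcoming start slot among the controls that are not busy. This is a simplified illustration, not the circuit of Figure 8: it assumes the clock order E3, E2, L, E1, F3, F2, K, F1 with clock 0 falling on E3, assumes that an instruction arriving exactly on a start slot can use that slot, and ignores the CONFLICT signal. All names are hypothetical.

```python
# Assumed order of the clock names within each group of eight clocks.
SLOTS = ["E3", "E2", "L", "E1", "F3", "F2", "K", "F1"]


def select_control(arrival_clock, e_busy, f_busy):
    """Return (control name, start clock), or None when both controls are busy."""
    candidates = []
    for name, slot, busy in (("E", "E3", e_busy), ("F", "F3", f_busy)):
        if not busy:
            wait = (SLOTS.index(slot) - arrival_clock) % 8  # clocks until the next start slot
            candidates.append((arrival_clock + wait, name))
    if not candidates:
        return None  # the instruction is queued until a control becomes idle
    start, name = min(candidates)
    return name, start


print(select_control(1, e_busy=False, f_busy=False))  # after E3, before F3
print(select_control(5, e_busy=False, f_busy=False))  # after F3, before E3
print(select_control(2, e_busy=True, f_busy=False))   # adder already running under E
print(select_control(2, e_busy=True, f_busy=True))    # both busy
```

An instruction arriving between E3 and F3 goes to control F, one arriving between F3 and E3 goes to control E, and a busy control is skipped in favour of the other, as in the addition, multiplication and division cases above.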
Figure 14 shows an example of the instruction executing condition and vector register access condition in case the addition instruction of 22 elements, multiplication instruction of eight elements and division instruction of eight elements are fetched continuously. In the example of Figure 14, when the addition instruction is fetched, the bank timings E3, E2, E1 are assigned to this addition instruction and the adder 8 - 1 is controlled by the instruction control 11 - E. For the multiplication instruction following the addition instruction, the bank timings F3, F2, F1 are assigned and the multiplier 8 - 2 is controlled by the instruction control 11 - F. The division instruction following the multiplication instruction is queued until the instruction control 11 - E or 11 - F becomes idle, and in the example shown in Figure 14, since the instruction control 11 - F becomes idle first, the bank timings F3, F2, F1 are assigned to the division instruction at that point and the divider 8 - 3 is controlled by the instruction control 11 - F.
The CONFLICT mentioned above will now be explained. A register conflict means that the result of the first instruction, for example the addition instruction, is used as one operand of the second instruction, for example the multiplication instruction. The register conflict can be detected by comparing the first operand address R1 of the first instruction with the second or third operand address R2 or R3 of the second instruction. When a conflict occurs, it is necessary, as shown in Figure 15B, to start the MULTI instruction at the timing F3 after the end of writing the first element of the ADD instruction. When no conflict occurs, as shown in Figure 15A, the MULTI instruction can be started at the timing F3 immediately after the start of the ADD instruction.
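The comparison that detects a register conflict is simple enough to state directly. The sketch below uses hypothetical names and plain integers for register addresses; it only expresses the rule that the result address R1 of the first instruction is compared with the source addresses R2 and R3 of the second.

```python
def register_conflict(first_r1, second_r2, second_r3):
    """True when the second instruction reads the register the first one writes."""
    return first_r1 in (second_r2, second_r3)


# ADD writes register 1; MULTI reads registers 1 and 4: conflict (the Figure 15B case).
print(register_conflict(first_r1=1, second_r2=1, second_r3=4))
# ADD writes register 1; MULTI reads registers 5 and 6: no conflict (the Figure 15A case).
print(register_conflict(first_r1=1, second_r2=5, second_r3=6))
```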
Figure 16 explains a COMPRESS instruction. In case the COMPRESS instruction is fetched, when the element of vector register VR2 corresponding to the element Xi (i = 1, 2, ...) of vector register VR3 is "1", the element Xi is stored in the vector register VR1 without leaving any vacant element, and when the element of vector register VR2 is "0", the element Xi is not stored in the vector register VR1. The COMPRESS instruction is executed by the load operation unit 6 or store operation unit 7.
When the COMPRESS instruction is actually fetched, the control logic means 10 checks whether either of the instruction controls 11 - L and 11 - S is vacant and also checks whether either of the instruction controls 11 - E and 11 - F is vacant. If the instruction controls 11 - L and 11 - E are vacant, the control logic means 10 transmits the unit trigger signal to the load operation unit 6 at the timing E3 and, by sending the gate signal L and gate signal E, informs the vector register trigger unit 12 that the vector data should be received by the load operation unit 6 and that the bank timings are E3, E2, E1.
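The effect of COMPRESS on the register contents can be shown with a short sketch. It models only the data result (which elements of VR3 end up packed in VR1 under the mask in VR2), not the timing or the operation unit involved; the function name and the example values are hypothetical.

```python
def compress(vr3, vr2):
    """Pack the elements of vr3 whose corresponding mask element in vr2 is 1."""
    return [x for x, mask in zip(vr3, vr2) if mask == 1]


vr3 = [10, 20, 30, 40, 50]
vr2 = [1, 0, 1, 1, 0]
print(compress(vr3, vr2))  # the selected elements, stored without gaps
```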
As explained above, the present invention provides a plurality of instruction controls, each of which has a plurality of stages, and allows them to execute different instructions in parallel. Moreover, conflicts of access to the vector registers by the plurality of instruction controls can be prevented with simplified hardware by limiting the timing at which each instruction control accesses the vector registers.

Claims

THE EMBODIMENTS OF THE INVENTION IN WHICH AN EXCLUSIVE
PROPERTY OR PRIVILEGE IS CLAIMED ARE DEFINED AS FOLLOWS:

1. A data processing system for parallel processing which comprises a plurality of operation units each of which executes a different instruction respectively, a plurality of instruction control means each of which includes at least two stages for reading out source operand data from a local storage and for writing the result operand data into the local storage, and each of said instruction control means controls different instruction execution in parallel.

2. A data processing system according to claim 1, in which each of said stages comprises registers which hold an operation code of the instruction and operand addresses.

3. A data processing system according to claim 2, in which each instruction control means corresponds to each operation unit.

4. A data processing system according to claim 2, in which at least one of said instruction control means is used commonly for at least two of said operation units.