GB2454816A - Method for executing a load instruction in a pipeline processor, putting the data in the target address into a buffer then loading the requested data. - Google Patents


Info

Publication number
GB2454816A
GB2454816A (application GB0822115A; granted as GB2454816B)
Authority
GB
United Kingdom
Prior art keywords
pipeline
value
location
execution unit
data
Prior art date
Legal status (The legal status is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the status listed.)
Granted
Application number
GB0822115A
Other versions
GB2454816B (en)
GB0822115D0 (en)
Inventor
Son Dao Trong
Juergen Haess
David Shane Hutton
Michael Klein
John Gilbert Rell Jr
Eric Mark Schwarz
Kevin Chung-Lung Shum
Current Assignee (The listed assignees may be inaccurate. Google has not performed a legal analysis and makes no representation or warranty as to the accuracy of the list.)
International Business Machines Corp
Original Assignee
International Business Machines Corp
Priority date (The priority date is an assumption and is not a legal conclusion. Google has not performed a legal analysis and makes no representation as to the accuracy of the date listed.)
Filing date
Publication date
Application filed by International Business Machines Corp filed Critical International Business Machines Corp
Publication of GB0822115D0 publication Critical patent/GB0822115D0/en
Publication of GB2454816A publication Critical patent/GB2454816A/en
Application granted granted Critical
Publication of GB2454816B publication Critical patent/GB2454816B/en
Active legal-status Critical Current
Anticipated expiration legal-status Critical

Links

Classifications

    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3824Operand accessing
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/30003Arrangements for executing specific machine instructions
    • G06F9/3004Arrangements for executing specific machine instructions to perform operations on memory
    • G06F9/30043LOAD or STORE instructions; Clear instruction
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3836Instruction issuing, e.g. dynamic instruction scheduling or out of order instruction execution
    • G06F9/3842Speculative instruction execution
    • GPHYSICS
    • G06COMPUTING; CALCULATING OR COUNTING
    • G06FELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00Arrangements for program control, e.g. control units
    • G06F9/06Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3861Recovery, e.g. branch miss-prediction, exception handling

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

Disclosed is a method and system for operating the execution unit of a computer, the execution unit having a pipeline-based execution flow during which load instructions are processed. The load instructions having the function of loading data from a storage means into a predetermined location within the pipeline, preferably a register-implemented pipeline. The method has the steps of, when a load instruction occurs in the pipeline, reading (610) the current value of the target location, and buffering (620) the current target value at a predetermined location within said pipeline. Next, the value of the source location is loaded (610) and stored (620) at the target location, the pipeline is executed according to its execution flow, using the loaded value for computing purposes. If an event (630) indicating that the loaded value is not correct occurs, (660) the buffered original value may be used instead of the loaded value. The execution unit may be a floating point unit with the reading and/or loading of the data being done using a multiply-add data path.

Description

DESCRIPTION
Out of Order Execution of Floating Point Loads with Integrated Refresh Mechanism
1. BACKGROUND OF THE INVENTION
1.1. FIELD OF THE INVENTION
The present invention relates to computer processor technology, more particularly it relates to a method and system for operating an execution unit of an electronic computer system, wherein the execution unit comprises a pipeline-based execution flow during which amongst other instructions also load instructions are processed having the function of loading data from a storage means into a predetermined location within the pipeline, preferably a register-implemented pipeline.
1.2. DESCRIPTION AND DISADVANTAGES OF PRIOR ART
In earlier prior art processors, the processing of instructions is normally done "in-order" in the following steps:
1. Instruction fetch.
2. If input operands are available (in registers, for instance), the instruction is dispatched to the appropriate functional unit. If one or more operands are unavailable during the current clock cycle (generally because they are being fetched from memory), the processor stalls until they are available.
3. The instruction is executed by the appropriate functional unit.
4. The functional unit writes the results back to the register file.
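The four steps above may be sketched, purely for illustration, as a small Python model of in-order execution with the stall of step 2. This is a hypothetical sketch; all names (`run_in_order`, `ready_at`, the tuple format) are assumptions and not part of the prior art description:

```python
def run_in_order(program, regfile, ready_at):
    """Execute instructions strictly in program order.

    program:  list of (dest, src, op) tuples
    ready_at: dict mapping a register to the cycle its value arrives
    """
    clock = 0
    for dest, src, op in program:
        # Step 2: stall until the input operand is available.
        if clock < ready_at.get(src, 0):
            clock = ready_at[src]          # the processor stalls
        # Step 3: execute in the appropriate functional unit.
        result = op(regfile[src])
        clock += 1
        # Step 4: write the result back to the register file.
        regfile[dest] = result
        ready_at[dest] = clock
    return regfile, clock
```

With a dependent pair of instructions and a slowly arriving operand, the model reproduces the stall that the in-order scheme cannot avoid.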
Figure 1 illustrates such an in-order load instruction in a critical case of a read instruction 10 being dependent on a preceding instruction's 12 result, by schematically showing the scheme of storage locations of a shift register based, prior art execution pipeline in a "time-line" way, wherein the cycles (columns) are assumed to increase from left to right, and the pipeline depth extends from top to bottom over a couple of rows. As is apparent from the drawing, the instruction 10, which needs to read data from a storage location in the Floating Point Register (FPR) pipeline, disadvantageously has to wait for a time gap of 7 cycles (end of cycle 0 to begin of cycle 8) until the preceding instruction 12 has written its computed data to that storage location.
A more recent prior art "out-of-order" paradigm breaks up the processing of instructions into the following steps:
1. Instruction fetch.
2. Instruction dispatch to an instruction queue (also called instruction buffer or reservation stations).
3. The instruction waits in the queue until its input operands are available. The instruction is then allowed to leave the queue before earlier, older instructions.
4. The instruction is issued to the appropriate functional unit and executed by that unit.
5. The results are queued.
6. Only after all older instructions have had their results written back to the register file is this result written back to the register file. This is called the graduation or retire stage.
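The six steps above may likewise be sketched as a toy Python model: instructions issue from a queue as soon as their operands are ready, possibly younger-before-older, and retire to the register file strictly in program order. This is an illustrative sketch under stated assumptions (the `late` parameter stands in for a slow memory load; all names are invented, and the loop assumes every operand eventually arrives):

```python
def run_out_of_order(program, regfile, late=None):
    """program: list of (dest, srcs, op). Returns (regfile, issue_order)."""
    values = dict(regfile)            # registers plus forwarded results
    pending = list(enumerate(program))
    done, issue_order = {}, []
    while pending:
        issued = False
        for i, (seq, (dest, srcs, op)) in enumerate(pending):
            # Step 3: leave the queue as soon as operands are available,
            # possibly before earlier, older instructions.
            if all(s in values for s in srcs):
                values[dest] = op(*[values[s] for s in srcs])
                done[seq] = (dest, values[dest])
                issue_order.append(seq)
                pending.pop(i)
                issued = True
                break
        if not issued and late:
            values.update(late)       # a pending load from memory completes
            late = None
    # Step 6: retire results in program order (graduation).
    for seq in sorted(done):
        dest, val = done[seq]
        regfile[dest] = val
    return regfile, issue_order
```

In the example below, the older instruction waits on a slow operand, so the younger, independent one issues first; the issue order differs from program order while the final register state is still correct.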
The key concept of out-of-order processing is to allow the processor to avoid a class of stalls that occur when the data needed to perform an operation is not available. In the outline above, the out-of-order processor avoids the stall that the in-order processor incurs at step 2 when an instruction is not completely ready to be processed due to missing data.
Out-of-order processors fill the 7 free "slots" of such a gap in time with other instructions that are ready, then re-order the results at the end to make it appear that the instructions were processed as normal. The way the instructions are ordered in the original computer code is known as program order; in the processor they are handled in data order, the order in which the data (the operands) become available in the processor's registers.
Prior art pipelined out-of-order microprocessors use speculative execution to reduce the cost of conditional branch instructions.
When a conditional branch instruction is encountered, the processor guesses which way the branch is most likely to go (this is called branch prediction) and immediately starts executing instructions from that point. If the guess later proves to be incorrect, all computation past the branch point is discarded. The early execution is relatively cheap because the pipeline stages involved would otherwise lie dormant until the next instruction was known. However, wasted instructions consume CPU cycles that could otherwise have delivered performance, and on a laptop those cycles drain the battery. There is always a penalty for a mispredicted branch. This holds all the more true for the large-depth pipelines of floating point execution units (FPUs), where the penalty is quite high, because the computation of a result requires a relatively high amount of computational power. To partially get rid of these drawbacks, out-of-order loads are introduced.
Figure 2 illustrates such an out-of-order load instruction by way of the scheme of figure 1: the load instruction 12 writes a speculative result into the Floating Point Register (FPR) already in the second cycle. The subsequent instruction reads that data immediately from that register location and processes this data.
In cycle 9, the computed result of the instruction 12 is written to a recovery unit, see arrow 14.
With particular focus on the present invention: in order to increase the performance of execution units having a relatively large pipeline depth, such as a floating point execution unit, it is important to start any computation as soon as all the input operands are ready. For operands which are the result of a previous instruction, this is done in prior art through a forwarding path originating near the end of the pipeline.
However, if this previous instruction is a load instruction, then its result is known long before the end of the pipeline and it will unnecessarily slow down the floating point unit if it waits for the load to reach the end of the pipeline before its data are forwarded to the following instruction.
The standard solution to this problem is to implement a register pipeline and additional forwarding paths from all stages of the load instruction. The problem with this solution is that it takes up a huge number of registers, wiring resources and control logic.
Another attempt to solve the problem is to execute the load instruction out-of-order with respect to other arithmetic instructions. This means that the load instruction writes its data to the register file as early as possible, so upcoming instructions can directly load their input operand from the FPR instead of reading them from a forwarding path, thus greatly lessening the amount of wiring that is needed.
This solution works fine as long as no instructions are killed due to wrongly predicted branches or rejected due to cache misses.
However, if the load instruction is killed or rejected after it wrote its result to the register file, a wrong value exists in the FPR.
This needs to be fixed by calling some refresh mechanism from a recovery unit (RU) which restores the original content of the FPR. The RU keeps copies of all floating point registers.
These copies are updated in-order, so it is possible to reconstruct the correct value of a register.
Such a solution to this problem is sketched in figure 3 for a prior art IBM Power6 server, illustrating a killed out-of-order load instruction including an external, fictive refresh mechanism 30. This straightforward approach may implement a register pipeline and additional forwarding paths from all stages of the load instruction, requiring a huge number of registers, wiring resources and control logic. But it could also manage cache rejects and branches, which happen quite often and which, in the absence of such additional hardware, would be the cause for stalling the whole processor until any refresh has finished. Such stalling is a major performance drawback. So, either one tolerates the disadvantage of a huge amount of registers, wiring resources and control logic, or one tolerates the performance decrease imposed by a stalling processor. Until now, no way is known either to find a tolerable compromise between the two or to find a solution which avoids both disadvantages.
1.3. OBJECTIVES OF THE INVENTION
The objective of the present invention is thus to improve the performance of load instruction processing with a tolerably small amount of hardware.
2. SUMMARY AND ADVANTAGES OF THE INVENTION
This objective of the invention is achieved by the features stated in the enclosed independent claims. Further advantageous arrangements and embodiments of the invention are set forth in the respective subclaims. Reference should now be made to the appended claims.
The inventive method to solve the above problem is based on the idea to execute the load instruction out-of-order with respect to other arithmetic instructions. This means that the load instruction writes its data to the Floating Point Register (FPR) as early as possible, so upcoming instructions can directly load their input operand from the FPR instead of reading them from a forwarding path requiring enormous wiring, control logic and registers. By avoiding this, the inventive method greatly reduces the amount of wiring that is needed.
The problem of using wrongly loaded values due to wrongly predicted branches could be fixed by calling some refresh mechanism from a recovery unit (RU) which restores the original content of the FPR. The RU keeps copies of all floating point registers. These copies are updated in-order, so it is always possible to reconstruct the correct value of a register. However, this would require quite a few additional cycles to complete.
With the present invention, the fix can be done within the floating point pipeline itself by an internal re-order mechanism, which is implemented not only by writing the FPR out-of-order but also by taking the original source value of the load instruction all the way down the pipeline and using this value to overwrite the value that was written earlier (out-of-order) to the FPR. This eliminates the need for an additional recovery process.
Thus, with respect to the wording used in the appended claims, a method and a respective system for operating an execution unit of an electronic computer system are disclosed, wherein the execution unit comprises a pipeline-based execution flow during which load instructions are processed amongst other instructions, having the function of loading data from a source location of a storage means into a predetermined target location within the pipeline, wherein the method is characterized by the steps of:
a) reading the current (original) value of the target location, and buffering this current target value at a predetermined location within the pipeline,
b) loading the value of the source location and storing the loaded value at the target location (early write into FPR),
c) executing the pipeline according to its execution flow, using the loaded value for computing purposes,
d) on occurrence of an event indicating that the loaded value is not correct (for example in case of a mispredicted branch, a data error, a cache miss, etc.), deciding to use the previously buffered original target value instead of the loaded value.
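Steps a) through d) can be condensed into a purely illustrative Python sketch of a single load. All identifiers (`execute_load`, `fpr`, `memory`) are assumptions introduced for illustration and do not appear in the claims:

```python
def execute_load(fpr, memory, target, source, killed):
    """Model one load instruction with the buffer-and-restore mechanism."""
    buffered = fpr[target]          # step a): read and buffer the old value
    fpr[target] = memory[source]    # step b): early write into the FPR
    # step c): dependent instructions may now read fpr[target] speculatively
    if killed:                      # step d): a kill/reject event arrives
        fpr[target] = buffered      # restore the original value in-pipeline
    return fpr
```

If the kill event never arrives, the early-written load value simply stays in place; if it does, the buffered original value overwrites it without involving any external recovery process.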
When the execution unit is a floating point execution unit, the resulting advantage is that the performance gain is remarkably high, as the pipeline depth is relatively large.
Further advantageously, step a), step b), or both of them can be implemented in a floating point execution unit by using a multiply-add data path, usually implemented in prior art for the calculation of floating point operands.
So, according to the inventive method, in case of a load instruction, not only is the source operand read which is supposed to be written to the floating point register, but also the old value acting as the target for the load instruction in the FPR. This is preferably done in a floating point unit having a multiply-add path, without adding additional data paths and without needing more wiring resources, just by using these multiply-add data paths.
If, in the event of a kill or a reject, the inventive method discovers that a wrong value has been written to the FPR, it controls the data paths such that the original FPR value will be taken all the way down the pipeline and then will be stored back into the FPR using the standard FPR write data paths. In this case, no check-pointing of the data is done in the recovery unit.
The treatment of write-after-write hazards is preferably done as follows: If an instruction writes to a storage location x of the FPR and it is followed by a load instruction which also writes to x, the write of the first instruction to the FPR has to be suppressed since it would be before the load instruction in an in-order unit and thus should not write the FPR after the early load has updated it.
However, if the load instruction has been killed or rejected, the previous instruction is switched back to write its result, since otherwise the data would be lost. Also, the check-pointing of the previous instruction must not be suppressed.
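The write-after-write rule of the two preceding paragraphs may be sketched as follows; this is an illustrative model of the final value at FPR location x, with all names being assumptions made for the example:

```python
def fpr_value_at_x(older_result, old_value, load_value, load_killed):
    """Final value of FPR location x after an older in-order write
    followed (in program order) by an early out-of-order load to x."""
    fpr_x = load_value            # early out-of-order write of the load
    suppress_older = True         # the load is younger in program order,
                                  # so the older write must be suppressed
    if load_killed:
        fpr_x = old_value         # inventive restore of the original value
        suppress_older = False    # re-enable the older instruction's write
    if not suppress_older:
        fpr_x = older_result      # the older instruction writes after all
    return fpr_x
```

When the load survives, its value stands and the older write stays suppressed; when the load is killed, the restore and the re-enabled older write together leave x exactly as an in-order machine would.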
3. BRIEF DESCRIPTION OF THE DRAWINGS
The present invention is illustrated by way of example and is not limited by the shape of the figures of the drawings, in which:
Figure 1 illustrates an in-order load instruction by schematically showing the scheme of storage locations of a shift register based, prior art execution pipeline in a "time-line" way, wherein the cycles (columns) are assumed to increase from left to right, and the pipeline has a depth of ten stages (rows);
Figure 2 illustrates an out-of-order load instruction by way of the scheme of figure 1;
Figure 3 illustrates a killed out-of-order load instruction by way of the scheme of figure 1, including an external, fictive refresh mechanism;
Figure 4 illustrates a killed out-of-order load instruction by way of the scheme of figure 1, including an internal, inventive refresh mechanism;
Figure 5 depicts a circuit diagram illustrating the inventive method when implemented in the multiply-add circuit of a prior art floating point execution unit; and
Figure 6 illustrates the control flow of the most important steps of a preferred embodiment of the inventive method.
4. DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENT
With general reference to the figures and with special reference now to figure 4, a killed out-of-order load instruction is sketched, wherein the kill is handled by an inventive refresh mechanism internal to the pipeline logic, in contrast to the external mechanism 30 of figure 3. Data is read in the first cycle at step 40, and a new result is written to the FPR immediately in the next cycle, see arrow 42 as depicted in figure 4. But according to the inventive method, both the old data, i.e. the original data at the load target location, and the new data, i.e. the source data for the load target location, are read at step 40, see circle 41. In the cases of cache misses and wrongly predicted branches, illustrated by circle 43, the load instruction is killed simply by writing the old data back to the FPR, see circle 45, thus effecting a kind of "undoing" of the effects of reading the (wrong) new data.
With reference to figure 5, a preferred embodiment of the inventive method is described as applied to a prior art floating point execution unit. In figure 5, a circuit diagram is given including an embodiment of the inventive internal refresh mechanism when implemented in the multiply-add circuit 50 of a prior art floating point execution unit.
Such a prior art out-of-order floating point execution unit is in turn described in more detail in "Binary floating-point unit design: The fused multiply-add dataflow", chapter 1, section 3.
It processes binary floating point instructions by using multiply-add instructions.
With particular reference now to the details of figure 5 the multiply-add circuit 50 comprises three read ports 54, 56, 58.
Port 54 is connected to an alignment unit 53, and ports 56, 58 are connected to the multiplier unit.
According to the invention, a multiplexer 52 with select lines 51A, 51B (described further below) is provided at the outputs of the alignment unit 53 and the multiplier unit 55, connecting either the alignment result operand or the multiplier result operand to an adder unit, which is in turn not depicted in the drawing. A write connection is provided from the read ports 56 and 58 in order to enable writing to the floating point register, as indicated by reference sign 59.
Next, a preferred embodiment of the inventive method will be described wherein the step of reading the current value of the target location and buffering the current target value at some predetermined location within the pipeline, as well as the steps of loading the value of the source location and storing the loaded value at the target location are implemented by using the multiply-add data path of figure 5, used in prior art for calculation purposes of floating point operands.
Since there are already multiple read ports in the register file to load all three operands A, B and C of a multiply-add instruction (A * C) + B, the source operand which is supposed to be stored in the FPR at a location "x" is read as operand A, and the old target value of FPR location "x" as operand B. Operand A is then used to write its value at FPR location "x" out of order. Operand C is given as the constant "1".
At this point, it is still unknown whether the load instruction will later be killed or not. So it is yet unknown whether the inventive method needs the A operand at the end of the pipeline for check-pointing (if not killed), or the B operand for restoring the original value (if the load is killed).
The multiply-add mechanism "abused" according to this preferred inventive feature works by first multiplying operand A with operand C. A multiplication by "1" leaves operand A unchanged, which is desired.
In the meantime, the operand B, read by read port 54, is only shifted in the alignment circuit 53 to the correct position to be added to the product A*C later on. Since it is known, when a load instruction comes into the pipeline, that an addition is actually not desired, respective control signals are generated by the inventive method which force the operand B to remain unchanged, while operand A is likewise passed unchanged through the multiplier by the multiplication with the neutral "1".
Respective control lines are depicted with reference numbers 62 and 64.
With this implementation, operands A and B remain unchanged in the inventively used data paths during a few cycles (the depth of the multiplier). These cycles are enough to wait for the kill decision coming from outside the depicted unit via the select lines 51A, 51B: if the load instruction is not killed, the adder circuit is controlled to ignore the B operand, passing down operand A unchanged. In case of a kill, A is ignored instead, and B is passed down. If the load instruction is not killed, the RU is written with the new value of the load result, but the FPR is not written, since it was written earlier. If the load is killed, the RU is not written, but the FPR is written with operand B to recover the original value of the FPR before the load.
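The "abuse" of the multiply-add path described above may be summarised in a purely illustrative sketch: operand A travels through the multiplier unchanged (multiplied by the neutral constant 1), operand B is held in the alignment path, and the late kill decision selects which of the two leaves the pipeline and which unit is written. The function name and return convention are assumptions for the example:

```python
def fused_multiply_add_load(a_load_value, b_old_value, killed):
    """Model the load travelling the multiply-add data path.

    Returns (pipeline result, write-FPR?, write-RU?)."""
    C = 1.0                        # neutral multiplier: A * 1 == A
    product = a_load_value * C     # A passes the multiplier unchanged
    aligned_b = b_old_value        # B is held unchanged in the aligner
    if killed:
        # the adder ignores A; B restores the FPR, the RU is not written
        return aligned_b, True, False
    # the adder ignores B; A is check-pointed in the RU, the FPR was
    # already written early and is not written again
    return product, False, True
```

The select lines 51A, 51B of figure 5 correspond here to the `killed` flag arriving a few cycles after the operands entered the pipeline.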
In addition to the circuit and control described before, which are basically restricted to units comprising a multiply-add circuit, the basic steps of the control flow of the inventive method are depicted in figure 6 and described as follows. The control flow of figure 6 may also be implemented in a way different from that of figure 5, and may be used also in execution units other than FPUs. The first step 610 comprises reading both the source operand, which is supposed to be written to the FPR, and the old value of the target FPR.
The next step 620 is then to keep both values stored separately until the "cache reject" or "branch wrong" signals arrive.
Then a decision 630 is made: if the load instruction is not killed (left branch from decision 630), operand A is taken for use in the pipeline, and the data of the load instruction are passed to the RU to check-point the result of the load, step 650.
Otherwise, if the load instruction is killed, the next step is to take operand B and write the old (former) value back to the FPR; the RU is not written to, step 660.
Step 640 implements a mechanism to deal with write-after-write hazards: this basically means performing the step of suppressing an FPR write if an FPU instruction is followed by a load instruction overwriting the same FPR. A further step is then performed to re-enable the FPR write if this following load instruction is killed.
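The control flow of figure 6 (steps 610 through 660) can be sketched, for illustration only, as a small function recording which unit is written in each branch of decision 630; all identifiers are assumptions introduced for the example:

```python
def load_control_flow(source_value, old_fpr_value, killed):
    """Model steps 610-660: returns (value written to RU or None,
    value written back to FPR or None)."""
    a, b = source_value, old_fpr_value   # step 610: read both values
    # step 620: both values stay buffered until the "cache reject" or
    # "branch wrong" signal resolves the kill decision
    ru_write, fpr_write = None, None
    if not killed:                       # decision 630, left branch
        ru_write = a                     # step 650: check-point the load
    else:                                # decision 630, right branch
        fpr_write = b                    # step 660: restore the old value
    return ru_write, fpr_write
```

Exactly one of the two units is written in either branch, matching the mutually exclusive RU/FPR writes described for steps 650 and 660.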
The invention can take the form of an entirely hardware embodiment, an entirely software embodiment or an embodiment containing both hardware and software elements. In a preferred embodiment, the invention is implemented in software, which includes but is not limited to firmware, resident software, microcode, etc. The circuit as described above is part of the design for an integrated circuit chip. The chip design is created in a graphical computer programming language, and stored in a computer storage medium (such as a disk, tape, physical hard drive, or virtual hard drive such as in a storage access network). If the designer does not fabricate chips or the photolithographic masks used to fabricate chips, the designer transmits the resulting design by physical means (e.g., by providing a copy of the storage medium storing the design) or electronically (e.g., through the Internet) to such entities, directly or indirectly. The stored design is then converted into the appropriate format (e.g., GDSII) for the fabrication of photolithographic masks, which typically include multiple copies of the chip design in question that are to be formed on a wafer.
The photolithographic masks are utilized to define areas of the wafer (and/or the layers thereon) to be etched or otherwise processed.

Claims (4)

CLAIMS
1. A method for operating an execution unit of an electronic computer system, wherein the execution unit comprises a pipeline-based execution flow during which load instructions are processed amongst other instructions, having the function of loading data from a source location of a storage means into a predetermined target location within said pipeline, characterized by the steps of: in case of a load instruction occurring in the pipeline: a) reading (610) the current value of said target location, and buffering (620) said current target value at a predetermined location within said pipeline, b) loading (610) the value of the source location and storing (620) the loaded value at said target location, c) executing the pipeline according to its execution flow, using the loaded value for computing purposes, d) on occurrence (630) of an event indicating that the loaded value is not correct, deciding to use (660) said buffered value instead of the loaded value.
2. The method according to claim 1, wherein said execution unit is a floating point execution unit, and step a) is done by using a multiply-add data path implemented for calculation purposes of floating point operands.
3. The method according to claim 1, wherein said execution unit is a floating point execution unit, and step b) is done by using a multiply-add data path implemented for calculation purposes of floating point operands.
4. An electronic data processing system having an execution unit, implementing a pipeline-based execution flow during which load instructions are processed amongst other instructions, having the function of loading data from a source location of a storage means into a predetermined target location within said pipeline, characterized by a functional component (52, 56) performing the steps of: in case of a load instruction occurring in the pipeline: a) reading (610) the current value of said target location, and buffering (620) said current target value at a predetermined location within said pipeline, b) loading (610) the value of the source location and storing (620) the loaded value at said target location, c) executing the pipeline according to its execution flow, using the loaded value for computing purposes, d) on occurrence (630) of an event indicating that the loaded value is not correct, deciding to use (660) said buffered value instead of the loaded value.
GB0822115A 2008-01-15 2008-12-04 Out of order execution of floating point loads with integrated refresh mechanism Active GB2454816B (en)

Applications Claiming Priority (1)

Application Number Priority Date Filing Date Title
EP08100459 2008-01-15

Publications (3)

Publication Number Publication Date
GB0822115D0 GB0822115D0 (en) 2009-01-07
GB2454816A true GB2454816A (en) 2009-05-20
GB2454816B GB2454816B (en) 2012-02-22

Family

ID=40262631

Family Applications (1)

Application Number Title Priority Date Filing Date
GB0822115A Active GB2454816B (en) 2008-01-15 2008-12-04 Out of order execution of floating point loads with integrated refresh mechanism

Country Status (1)

Country Link
GB (1) GB2454816B (en)

Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5951676A (en) * 1997-07-30 1999-09-14 Integrated Device Technology, Inc. Apparatus and method for direct loading of offset register during pointer load operation
US20030110366A1 (en) * 2001-12-12 2003-06-12 Intel Corporation Run-ahead program execution with value prediction

Patent Citations (2)

* Cited by examiner, † Cited by third party
Publication number Priority date Publication date Assignee Title
US5951676A (en) * 1997-07-30 1999-09-14 Integrated Device Technology, Inc. Apparatus and method for direct loading of offset register during pointer load operation
US20030110366A1 (en) * 2001-12-12 2003-06-12 Intel Corporation Run-ahead program execution with value prediction

Also Published As

Publication number Publication date
GB2454816B (en) 2012-02-22
GB0822115D0 (en) 2009-01-07

Similar Documents

Publication Publication Date Title
US6862677B1 (en) System and method for eliminating write back to register using dead field indicator
JP5313279B2 (en) Non-aligned memory access prediction
US7594096B2 (en) Load lookahead prefetch for microprocessors
US8688963B2 (en) Checkpoint allocation in a speculative processor
US5619664A (en) Processor with architecture for improved pipelining of arithmetic instructions by forwarding redundant intermediate data forms
MX2009001748A (en) Method and apparatus for executing processor instructions based on a dynamically alterable delay.
US20040139299A1 (en) Operand forwarding in a superscalar processor
EP1562107B1 (en) Apparatus and method for performing early correction of conditional branch instruction mispredictions
US5898864A (en) Method and system for executing a context-altering instruction without performing a context-synchronization operation within high-performance processors
US7185182B2 (en) Pipelined microprocessor, apparatus, and method for generating early instruction results
US8977837B2 (en) Apparatus and method for early issue and recovery for a conditional load instruction having multiple outcomes
US7779234B2 (en) System and method for implementing a hardware-supported thread assist under load lookahead mechanism for a microprocessor
US20210311742A1 (en) An apparatus and method for predicting source operand values and optimized processing of instructions
Torng et al. Interrupt handling for out-of-order execution processors
US6983359B2 (en) Processor and method for pre-fetching out-of-order instructions
US20070088935A1 (en) Method and apparatus for delaying a load miss flush until issuing the dependent instruction
US6851044B1 (en) System and method for eliminating write backs with buffer for exception processing
Shum et al. Design and microarchitecture of the IBM System z10 microprocessor
GB2454816A (en) Method for executing a load instruction in a pipeline processor, putting the data in the target address into a buffer then loading the requested data.
US8966230B2 (en) Dynamic selection of execution stage
US7783692B1 (en) Fast flag generation
US6718460B1 (en) Mechanism for error handling in a computer system
US7100024B2 (en) Pipelined microprocessor, apparatus, and method for generating early status flags
US7991816B2 (en) Inverting data on result bus to prepare for instruction in the next cycle for high frequency execution units
JP3795449B2 (en) Method for realizing processor by separating control flow code and microprocessor using the same

Legal Events

Date Code Title Description
746 Register noted 'licences of right' (sect. 46/1977)

Effective date: 20130107