CN114721724A

CN114721724A - RISC-V instruction set-based six-stage pipeline processor

Info

Publication number: CN114721724A
Application number: CN202210221774.1A
Authority: CN
Inventors: 吕治宽; 黄逸轩
Original assignee: University of Electronic Science and Technology of China
Current assignee: University of Electronic Science and Technology of China
Priority date: 2022-03-07
Filing date: 2022-03-07
Publication date: 2022-07-08

Abstract

The invention relates to the field of computer components, in particular to a RISC-V instruction set-based six-stage pipeline processor which comprises a six-stage pipeline, a first register, a data memory and an instruction memory, wherein an instruction fetching module, a decoding module, a distributing module, a transmitting module, an executing module and a write-back module of the six-stage pipeline are sequentially connected, the data memory is connected with the executing module and the write-back module, the first register is connected with the distributing module and the write-back module, and the instruction memory is connected with the instruction fetching module. The processor decodes and issues two instructions per cycle and executes them out of order, providing a reference model for the superscalar parallel design of RISC-V processors. A Gshare branch prediction circuit is provided, which improves the success rate of branch prediction. Register renaming significantly reduces pipeline stalls caused by read-after-write and write-after-write hazards. A complete branch misprediction repair circuit reduces the time lost when a branch prediction fails.

Description

RISC-V instruction set-based six-stage pipeline processor

Technical Field

The invention relates to the field of computer components, in particular to a six-stage pipeline processor based on a RISC-V instruction set.

Background

In the traditional classification of computer architectures, processor applications are divided into three domains: the server domain, the personal computer (PC) domain, and the embedded domain. In the server and PC domains, the x86 architecture is currently the undisputed giant, and no other instruction set architecture can challenge its monopoly.

In recent years, with the rapid development of Internet of Things technology, market demand has grown for low-power, high-performance processors for intelligent terminals. How to meet the processor performance requirements of intelligent terminals while reducing cost and power consumption is a popular research direction. Currently, this direction is dominated by ARM's Cortex-A family of processor architectures.

Existing domestic pipeline designs are shallow, and most of them lack out-of-order superscalar parallelism, so processor performance is limited.

Disclosure of Invention

The invention aims to provide a six-stage pipeline processor based on the RISC-V instruction set with an out-of-order superscalar structure. The invention also implements a branch prediction method and a branch misprediction repair method.

In order to achieve the above object, the present invention provides a six-stage pipeline processor based on the RISC-V instruction set, including a six-stage pipeline, a first register, a data memory and an instruction memory, wherein the six-stage pipeline includes an instruction fetching module, a decoding module, a distributing module, a transmitting module, an executing module and a write-back module, the instruction fetching module, the decoding module, the distributing module, the transmitting module, the executing module and the write-back module are connected in sequence, the data memory is connected with the executing module and the write-back module, the first register is connected with the distributing module and the write-back module, and the instruction memory is connected with the instruction fetching module;

the instruction fetching module is used for fetching an instruction from the instruction memory and sending the instruction to the decoding module;

the decoding module is used for extracting an operation code, a function code, a source first register, a destination first register and an immediate of an instruction and sending the operation code, the function code, the source first register, the destination first register and the immediate to the distribution module;

the distribution module is used for reading a source operand and mapping a destination operand to a rename register;

the transmitting module is used for storing the data read by the distributing module in a reservation station and issuing instructions according to their ready state and the instruction order;

the execution module is used for executing the instruction and sending it to the write-back module;

the write-back module is used for committing instructions by in-order write-back and updating the prediction tag.

Wherein the instruction fetching module comprises a program counter register, an instruction temporary storage unit, an instruction selection unit and a branch prediction unit,

the program counter register is used for storing the program counter value that addresses the instruction memory in the current cycle;

the instruction temporary storage unit is used for storing the fetched 128-bit instruction data;

the branch prediction unit is a Gshare branch prediction circuit;

the instruction selection unit selects 1 or 2 instructions to be executed in the current cycle according to the program counter and the result of the branch prediction unit.

The branch prediction unit comprises a branch history register and a pattern history register, wherein the branch history register is used for recording the current global branch jump state, and the pattern history register is used for describing the jump strength under the current program counter and branch history register. A jump address can be looked up in a branch target buffer according to the current program counter, and the pattern history register and the branch target buffer are used for predicting whether the current instruction is a branch instruction, whether it jumps, and its jump address, so as to change the program counter value of the next instruction.

Wherein the decoding module comprises an instruction information decoding unit and a branch prediction label generating unit;

the instruction information decoding unit is used for extracting an operation code, a function code, a reservation station ID, a source first register, a destination first register and an immediate of an instruction;

the branch prediction tag generation unit generates a prediction tag according to whether the instruction fetch module predicts that the instruction is a jump instruction or not and transfers the prediction tag to a distribution module along with the instruction.

The distribution module comprises an architecture register unit, a renaming register unit and a source operand manager;

the architecture register unit comprises the architecture registers and their idle/busy states;

the renaming register unit comprises a renaming register and an effective state thereof;

the renaming register unit further comprises the mapping relation between the architecture registers and the renaming registers;

the source operand manager generates the data to be written to the transmitting module from the requested first register number, the architecture register and renaming register information, and the rename tag of the instruction.

The transmitting module is composed of reservation stations corresponding to the instructions, and the reservation stations are used for obtaining, storing and issuing the first register operands of the instructions.

The execution module comprises an adder, a multiplier, a memory read-write unit, a memory cache unit and a branch unit, wherein the adder executes addition, subtraction and logic operations; the multiplier executes multiplication; the branch unit computes a branch target and compares it with the predicted branch target to generate a flag indicating whether the prediction is correct; the operation of the memory read-write unit is pipelined into 2 stages; and the memory cache unit is used for storing the data and addresses of write instructions.

In the RISC-V instruction set-based six-stage pipeline processor of the invention, the six-stage pipeline comprises an instruction fetch unit, a decode unit, a dispatch (distributing) unit, an issue (transmitting) unit, an execution unit and a write-back unit. The instruction fetch unit fetches at most 2 instructions from the instruction memory (IMEM) using the Program Counter (PC) and sends them to the subsequent pipeline stages. The instruction fetch stage completes in one clock cycle by reading the IMEM on the negative clock edge and writing the fetched instructions into the latch of the decode unit on the positive clock edge, while the PC is speculatively updated using the Gshare branch predictor. The decode unit extracts the operation code, function code, source first register, destination first register and immediate of each instruction and sends the generated data to the subsequent pipeline stage, and a branch prediction tag generator (Prediction Tag Gen) assigns a branch prediction tag to the instruction. If a branch instruction is present, the branch prediction tag generator and the branch misprediction fix table (Miss Prediction Fix Table) are updated. The dispatch unit first fetches the operands required by the two instructions from the Architectural Register File (ARF) and the Rename Register File (RRF). Entries are then allocated in both the reorder buffer and the rename registers to perform register renaming. Finally, the dispatch unit writes the data required for instruction execution to a Reservation Station (RS). Data bypassing is performed by the Source Operand Manager. The issue unit selects instructions whose operands are all available and issues them to the execution stage; an instruction is removed from the reservation station as soon as it is issued. In the execution stage, the instruction is executed and its result is written into the RRF; at the same time, the reorder buffer is notified that the instruction has completed execution. The branch unit determines whether the branch prediction was correct and reports the branch result to the affected modules so that they can take the corresponding actions. When a branch misprediction occurs, the speculatively executed instructions are invalidated by means of the Miss Prediction Fix Table. Every instruction except Load/Store completes its execution stage in one clock cycle; the operation of the Load/Store unit is pipelined into 2 stages. The write-back unit allows up to 2 instructions in the reorder buffer to complete. When an instruction completes, its data is moved from the RRF to the ARF, the renaming table of the ARF is updated, and the branch predictor is updated with the new information.

The processor therefore decodes and issues two instructions per cycle and executes them out of order, providing a reference model for the superscalar parallel design of RISC-V processors. A Gshare branch prediction circuit is provided, which improves the success rate of branch prediction. Register renaming significantly reduces pipeline stalls caused by read-after-write and write-after-write hazards. A complete branch misprediction repair circuit reduces the time lost when a branch prediction fails.

Drawings

In order to more clearly illustrate the embodiments of the present invention or the technical solutions in the prior art, the drawings used in the description of the embodiments or the prior art are briefly described below. The drawings described below show only some embodiments of the present invention, and those skilled in the art can obtain other drawings from them without creative effort.

FIG. 1 is a microarchitectural block diagram of a six-stage pipelined RISC-V processor with an out-of-order superscalar architecture provided by the present invention.

FIG. 2 is a block diagram of an instruction fetch module of a six-stage pipelined RISC-V processor with an out-of-order superscalar architecture as provided by the present invention.

FIG. 3 is a block diagram of the decoding module of a six-stage pipelined RISC-V processor with an out-of-order superscalar architecture provided by the present invention.

FIG. 4 is a block diagram of a distribution module of a six-stage pipelined RISC-V processor with an out-of-order superscalar architecture provided by the present invention.

FIG. 5 is a block diagram of a transmit module of a six-stage pipelined RISC-V processor with an out-of-order superscalar architecture provided by the present invention.

FIG. 6 is a block diagram of an execution module of a six-stage pipelined RISC-V processor with an out-of-order superscalar architecture provided by the present invention.

FIG. 7 is a block diagram of a write-back module of a six-stage pipelined RISC-V processor with an out-of-order superscalar architecture provided by the present invention.

1-six-stage pipeline, 2-first register, 3-data memory, 4-instruction memory, 11-instruction fetching module, 12-decoding module, 13-distributing module, 14-transmitting module, 15-execution module, 16-write-back module, 111-program counter register, 112-instruction temporary storage unit, 113-instruction selection unit, 114-branch prediction unit, 121-instruction information decoding unit, 122-branch prediction tag generation unit, 131-architecture register unit, 132-renaming register unit, 133-renaming register unit, 134-source operand manager, 151-adder, 152-multiplier, 153-memory read-write unit, 154-memory cache unit, 155-branch unit.

Detailed Description

Reference will now be made in detail to embodiments of the present invention, examples of which are illustrated in the accompanying drawings, wherein the same or similar reference numerals refer throughout to the same or similar elements or to elements having the same or similar functions. The embodiments described below with reference to the drawings are exemplary, are intended to explain the invention, and are not to be construed as limiting the invention.

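Referring to FIGS. 1 to 7, the present invention provides a six-stage pipeline processor based on the RISC-V instruction set, comprising a six-stage pipeline 1, a first register 2, a data memory 3 and an instruction memory 4: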

the six-stage pipeline 1 comprises an instruction fetching module 11, a decoding module 12, a distributing module 13, a transmitting module 14, an executing module 15 and a write-back module 16, wherein the instruction fetching module 11, the decoding module 12, the distributing module 13, the transmitting module 14, the executing module 15 and the write-back module 16 are sequentially connected, the data memory 3 is connected with the executing module 15 and the write-back module 16, the first register 2 is connected with the distributing module 13 and the write-back module 16, and the instruction memory 4 is connected with the instruction fetching module 11;

the instruction fetching module 11 is configured to fetch an instruction from the instruction memory 4 and send the instruction to the decoding module 12;

the decoding module 12 is configured to extract an operation code, a function code, a source first register 2, a destination first register 2, and an immediate of an instruction, and send the operation code, the function code, the source first register 2, the destination first register 2, and the immediate to the distributing module 13;

the distributing module 13 is used for reading the source operand and mapping the destination operand to the rename register;

the transmitting module 14 is configured to store the data read by the distributing module 13 in a reservation station and to issue instructions according to their ready state and the instruction order;

the execution module 15 is configured to execute the instruction and send it to the write-back module 16;

the write-back module 16 is used for committing instructions by in-order write-back and updating the prediction tag.

In this embodiment, the six-stage pipeline 1 comprises an instruction fetch unit, a decode unit, a dispatch (distributing) unit, an issue (transmitting) unit, an execution unit and a write-back unit. The instruction fetch unit fetches up to 2 instructions from the instruction memory 4 (IMEM) using the Program Counter (PC) and sends them to the subsequent pipeline stages. The instruction fetch stage completes in one clock cycle by reading the IMEM on the negative clock edge and writing the fetched instructions into the latch of the decode unit on the positive clock edge, while the PC is speculatively updated using the Gshare branch predictor. The decoding module 12 extracts the operation code, function code, source first register 2, destination first register 2 and immediate of each instruction and sends the resulting data to the subsequent pipeline stage, and a branch prediction tag generator (Prediction Tag Gen) assigns a branch prediction tag to the instruction. If a branch instruction is present, the branch prediction tag generator and the branch misprediction fix table (Miss Prediction Fix Table) are updated. The dispatch unit first fetches the operands required by the two instructions from the Architectural Register File (ARF) and the Rename Register File (RRF). Entries are then allocated in both the reorder buffer and the rename registers to perform the renaming of the first register 2. Finally, the dispatch unit writes the data required for instruction execution to a Reservation Station (RS). Data bypassing is performed by the Source Operand Manager 134. The issue unit selects instructions whose operands are all available and issues them to the execution stage; an instruction is removed from the reservation station as soon as it is issued. In the execution stage, the instruction is executed and its result is written into the RRF; at the same time, the reorder buffer is notified that the instruction has completed execution. The branch unit 155 determines whether the branch prediction was correct and reports the branch result to the affected modules so that they can take the corresponding actions. When a branch misprediction occurs, the speculatively executed instructions are invalidated by means of the Miss Prediction Fix Table. Every instruction except Load/Store completes its execution stage in one clock cycle; the operation of the Load/Store unit is pipelined into 2 stages. The write-back unit allows up to 2 instructions in the reorder buffer to complete. When an instruction completes, its data is moved from the RRF to the ARF, the renaming table of the ARF is updated, and the branch predictor is updated with the new information.

The processor therefore decodes and issues two instructions per cycle and executes them out of order, providing a reference model for the superscalar parallel design of RISC-V processors. A Gshare branch prediction circuit is provided, which improves the success rate of branch prediction. Register renaming significantly reduces pipeline stalls caused by read-after-write and write-after-write hazards. A complete branch misprediction repair circuit reduces the time lost when a branch prediction fails.

Further, the instruction fetching module 11 comprises a program counter register 111, an instruction temporary storage unit 112, an instruction selection unit 113 and a branch prediction unit 114,

the program counter register 111 is used for storing the program counter value that addresses the instruction memory 4 in the current cycle;

the instruction temporary storage unit 112 is used for storing the fetched 128-bit instruction data;

the branch prediction unit 114 is a Gshare branch prediction circuit;

the instruction selection unit 113 selects 1 or 2 instructions to be executed in the current cycle according to the program counter and the result of the branch prediction unit 114.

In this embodiment, the program counter register 111 stores the current program counter, the instruction temporary storage unit 112 stores 128 bits of instruction data, and the instruction selection unit 113 selects the instructions to be executed according to the Program Counter (PC) and the branch prediction result. Two 32-bit instructions are selected from the 128 bits based on PC[3:2]; only the first instruction is valid in the cycle if PC[2] is 1 or if the two selected 32-bit instructions are predicted to be branch instructions.
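The selection rule above can be sketched in Python as a rough behavioural model. The function name `select_instructions`, the little-endian word order inside the 128-bit line, and the single `predicted_branch` flag standing in for the predictor output are illustrative assumptions, not details given in the text.

```python
from typing import List


def select_instructions(line: int, pc: int, predicted_branch: bool) -> List[int]:
    """Pick 1 or 2 32-bit instructions from a 128-bit fetch line.

    Assumptions (not stated in the source): word 0 sits in bits [31:0] of
    `line`, and `predicted_branch` is the branch predictor's verdict for
    the selected instructions.
    """
    words = [(line >> (32 * i)) & 0xFFFFFFFF for i in range(4)]
    index = (pc >> 2) & 0x3          # PC[3:2] selects the first word
    first = words[index]
    # Only the first instruction is valid if PC[2] is 1 or a branch is predicted.
    second_valid = ((pc >> 2) & 0x1) == 0 and not predicted_branch
    if second_valid:
        return [first, words[index + 1]]   # index is 0 or 2 here, so index + 1 is in range
    return [first]


# Example 128-bit line holding four recognisable words.
LINE = (0x44444444 << 96) | (0x33333333 << 64) | (0x22222222 << 32) | 0x11111111
```

With this model, a PC ending in 0x0 or 0x8 yields two instructions unless a branch is predicted, and a PC ending in 0x4 or 0xC yields one.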

Further, the branch prediction unit 114 includes a branch history register and a pattern history register. The branch history register records the current global branch jump state, and the pattern history register describes the jump strength under the current program counter and branch history register. A jump address can be looked up in a branch target buffer according to the current program counter, and the pattern history register and the branch target buffer are used to predict whether the current instruction is a branch instruction, whether it jumps, and its jump address, so as to change the program counter value of the next instruction.

In this embodiment, the branch prediction unit 114 is a Gshare branch prediction circuit comprising a branch history register, a pattern history register and a branch target buffer. The branch history register stores the history of whether global branches jumped. The pattern history register stores the jump tendency of a given PC under a given global branch history: a value greater than 1 means jump, and a value less than 2 means no jump. The branch target buffer stores branch instruction addresses and their jump addresses. First, part of the PC is XORed with the branch history register, and the result is used as the address for reading one entry of the pattern history register; if the entry is greater than 1, a jump is predicted, otherwise no jump is predicted. At the same time, the branch instruction address and jump address are looked up in the branch target buffer using part of the PC, and a jump can be predicted if the stored branch instruction address equals the current PC. If a jump is predicted, the PC is set to the predicted jump address in the next cycle. The branch history register is updated with the prediction result and repaired with the execution result, while the pattern history register and the branch target buffer are updated with the execution result.
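A minimal behavioural sketch of such a Gshare predictor in Python may make the flow easier to follow. The class name `Gshare`, the table sizes, the initial counter value of 1, the history length equal to the index width, and the sequential fall-through of `pc + 4` are all assumptions made for illustration; the text does not specify them.

```python
class Gshare:
    """Toy Gshare predictor; sizes and initial values are assumptions."""

    def __init__(self, index_bits=4):
        self.mask = (1 << index_bits) - 1
        self.bhr = 0                            # branch history register
        self.pht = [1] * (1 << index_bits)      # 2-bit counters, > 1 means jump
        self.btb = {}                           # index -> (branch_pc, jump_address)

    def _pht_index(self, pc, bhr):
        # Part of the PC XORed with the global history addresses the PHT.
        return ((pc >> 2) ^ bhr) & self.mask

    def predict(self, pc):
        snapshot = self.bhr                     # kept so the history can be repaired
        counter = self.pht[self._pht_index(pc, snapshot)]
        entry = self.btb.get((pc >> 2) & self.mask)
        hit = entry is not None and entry[0] == pc
        taken = hit and counter > 1
        if hit:                                 # speculative history update
            self.bhr = ((snapshot << 1) | int(taken)) & self.mask
        next_pc = entry[1] if taken else pc + 4
        return taken, next_pc, snapshot

    def update(self, pc, taken, target, snapshot, predicted):
        # PHT and BTB are updated with the execution result.
        i = self._pht_index(pc, snapshot)
        self.pht[i] = min(3, self.pht[i] + 1) if taken else max(0, self.pht[i] - 1)
        if taken:
            self.btb[(pc >> 2) & self.mask] = (pc, target)
        if predicted != taken:                  # repair history on a misprediction
            self.bhr = ((snapshot << 1) | int(taken)) & self.mask
```

In this sketch `predict` returns the history snapshot so that `update` can index the same counter and rebuild the history register after a misprediction, which is one simple way to realise "updated with the prediction result, repaired with the execution result".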

Further, the decode module 12 includes an instruction information decode unit 121 and a branch prediction tag generation unit 122;

the instruction information decoding unit 121 is configured to extract an operation code, a function code, a reservation station ID, a source first register 2, a destination first register 2, and an immediate of an instruction;

the branch prediction tag generation unit 122 generates a prediction tag according to whether the instruction fetch module 11 predicts a jump instruction or not and passes the prediction tag along with the instruction to the dispatch module 13.

In this embodiment, the decoding module 12 extracts the operation code, function code, reservation station ID, source first register 2, destination first register 2 and immediate of the instruction and generates a branch prediction tag. The instruction information decoding unit 121 is responsible for extracting the operation code, function code, reservation station ID, source first register 2, destination first register 2 and immediate of the instruction; the branch prediction tag generation unit 122 is responsible for generating the branch prediction tag. The branch prediction tag is 5 bits wide and is generated in sequence by circularly shifting a single 1 to the left starting from the lowest bit; if a prediction fails, the tag is restored using a tag repair vector generated by the execution unit.
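The tag sequence described above amounts to a 5-bit one-hot value rotated left by one position per allocation. A small Python sketch follows; the names `TAG_BITS`, `next_tag` and `tag_sequence` are hypothetical, and the repair path driven by the tag repair vector is left out because the text does not define its encoding.

```python
TAG_BITS = 5
TAG_MASK = (1 << TAG_BITS) - 1


def next_tag(tag: int) -> int:
    """Rotate a 5-bit one-hot tag left by one position."""
    return ((tag << 1) | (tag >> (TAG_BITS - 1))) & TAG_MASK


def tag_sequence(count: int, start: int = 0b00001) -> list:
    """Return `count` successive tags, starting from the lowest bit."""
    tags, tag = [], start
    for _ in range(count):
        tags.append(tag)
        tag = next_tag(tag)
    return tags
```

Because the tag is one-hot, at most five tags are distinct at any time, after which the sequence wraps around to the lowest bit.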

Further, the distributing module 13 includes an architecture register unit 131, a renaming register unit 132, a renaming register unit 133, and a source operand manager 134;

the architectural register unit 131 comprises the architectural registers and their idle/busy states;

the rename register unit 132 contains rename registers and their valid states;

the rename register unit 133 contains the mapping relationship between the architectural registers and the rename registers;

the source operand manager 134 generates the data to be written to the transmitting module 14 from the requested first register 2 number, the architectural register and rename register information, and the rename tag of the instruction.

In this embodiment, the distributing module 13 reads source operands and maps destination operands to rename registers. The architectural register unit 131 has its data updated by the write-back stage, supplies source operands that have already been written back, and has a flag bit marking the busy/idle state of each first register 2. The renaming register unit 132 has its data updated by the execution stage and its valid state updated by the write-back stage, and supplies source operands whose results have been produced but not yet written back. The renaming register unit 133 stores the mapping between the destination first register 2 and the rename register, so that the rename register corresponding to a required source operand can be read according to the mapping; when a prediction is wrong, the corresponding instruction state is restored according to the branch prediction tag returned by the execution unit. The source operand manager 134 first determines whether the source operand is equal to the destination operand currently returned by the execution unit. If they are equal, it obtains the rename register of that destination operand. If they are not equal, it determines whether the architectural register corresponding to the source operand is busy: if the architectural register is idle, it obtains the required architectural register data; if the architectural register is busy, it determines whether the corresponding rename register is valid, obtaining the rename register data if it is valid and the rename register address if it is not. In this determination, the operand is considered available only if the architectural register is idle or the rename register is valid.
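The decision chain of the source operand manager can be sketched in Python as follows. The `Operand` record, the function name `read_source`, and the list and dictionary layouts of the register files are illustrative assumptions. In particular, the bypass case is modelled here as forwarding the result value directly and treating it as ready, which is an assumption: the text only says that the rename register of the returned destination operand is obtained.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Operand:
    ready: bool   # True if `value` is data, False if it is a rename register address
    value: int


def read_source(src: int,
                arf: List[int], busy: List[bool],
                rename_map: Dict[int, int],
                rrf: List[int], valid: List[bool],
                bypass: Optional[Tuple[int, int]] = None) -> Operand:
    """Resolve one source operand (hypothetical data layout).

    `bypass` is (destination register, result) returned by the execution
    unit in the current cycle, or None.
    """
    # 1. Source equals the destination just returned by the execution unit.
    if bypass is not None and bypass[0] == src:
        return Operand(True, bypass[1])        # assumed: forwarded result is usable
    # 2. Architectural register idle: its data is up to date.
    if not busy[src]:
        return Operand(True, arf[src])
    # 3. Architectural register busy: consult the rename register.
    tag = rename_map[src]
    if valid[tag]:
        return Operand(True, rrf[tag])
    return Operand(False, tag)                 # wait on the rename register address
```

An operand returned with `ready=False` would sit in the reservation station holding only the rename register address until the producing instruction delivers its result.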

Further, the transmitting module 14 is composed of reservation stations corresponding to the instructions, and the reservation stations are used for obtaining, storing and issuing the first register 2 operands of the instructions.

In this embodiment, when the reservation station receives the write enable signal of the distributing module 13, a free-entry finder finds a free entry for the instruction and temporarily stores the corresponding data. When an instruction in the reservation station is ready to execute and the corresponding execution unit will be free in the next cycle, the reservation station issues the instruction to the execution unit.
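As a rough illustration of this behaviour, the Python sketch below models a reservation station with a free-entry finder and an oldest-ready-first issue choice. The class and method names, the entry count, the age counter used to approximate "instruction order", and the stall-on-full behaviour are assumptions not taken from the text.

```python
class ReservationStation:
    """Toy reservation station; size and selection policy are assumptions."""

    def __init__(self, size=4):
        self.entries = [None] * size
        self.next_age = 0

    def write(self, op, ready):
        """Free-entry finder: store `op` in the first free entry, if any."""
        for i, entry in enumerate(self.entries):
            if entry is None:
                self.entries[i] = {"op": op, "ready": ready, "age": self.next_age}
                self.next_age += 1
                return i
        return None                      # full: dispatch would have to stall (assumed)

    def wake(self, index):
        """Mark an entry ready once its missing operand has arrived."""
        self.entries[index]["ready"] = True

    def issue(self, unit_free):
        """Issue the oldest ready instruction if the execution unit is free."""
        if not unit_free:
            return None
        ready = [(e["age"], i) for i, e in enumerate(self.entries) if e and e["ready"]]
        if not ready:
            return None
        _, i = min(ready)
        op = self.entries[i]["op"]
        self.entries[i] = None           # removed as soon as it is issued
        return op
```

Younger ready instructions can overtake older ones that are still waiting for operands, which is what gives the design its out-of-order execution.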

Further, the execution module 15 includes an adder 151, a multiplier 152, a memory read/write unit 153, a memory cache unit 154, and a branch unit 155.

In this embodiment, the execution module 15 includes: an adder 151 that performs addition, subtraction and logical operations; a multiplier 152 that performs multiplication; a branch unit 155 that computes the branch target and compares it with the predicted branch target to generate a flag indicating whether the prediction was correct; and a memory read/write unit 153 whose operation is pipelined into 2 stages. On receiving an instruction from the reservation station, the memory read/write unit 153 accesses both the data memory 3 and the memory cache unit 154; it receives the data of the memory cache unit 154 in the 1st stage and the data of the data memory 3 in the 2nd stage, and at the same time writes back the data and notifies the reorder buffer that execution of the instruction is complete. The memory cache unit 154 stores the data and addresses of write instructions and writes the data into the data memory 3 when the write instruction enters the reorder buffer. It can hold the data and addresses of several write instructions at once and compare them with the address of a read instruction; if a matching address exists, the data is returned directly to the memory read/write unit 153.
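The address comparison performed by the memory cache unit resembles a store buffer with load forwarding, which can be sketched in Python as below. The class name `StoreBuffer`, the dictionary used as data memory, the choice of the newest matching write when several share an address, and the explicit `drain_oldest` step are assumptions for illustration; the text does not spell out these details.

```python
class StoreBuffer:
    """Toy model of the memory cache unit; policies are assumptions."""

    def __init__(self):
        self.pending = []                # (address, data), oldest first

    def store(self, address, data):
        """Buffer the data and address of a write instruction."""
        self.pending.append((address, data))

    def load(self, address, data_memory):
        """Return buffered data on an address match, else read data memory."""
        for pending_address, data in reversed(self.pending):
            if pending_address == address:
                return data              # forwarded from the newest matching write
        return data_memory.get(address, 0)

    def drain_oldest(self, data_memory):
        """Write the oldest buffered store into data memory."""
        if self.pending:
            address, data = self.pending.pop(0)
            data_memory[address] = data
```

A read therefore sees the value of an earlier write to the same address even before that write has reached the data memory.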

Further, the write-back module 16 is responsible for committing instructions by in-order write-back and updating the prediction tag. The write-back unit includes a reorder buffer and a branch misprediction fix table. The reorder buffer stores an instruction completion pointer that is consistent with the rename register of the written-back instruction, and can complete at most 2 instructions per cycle; branch instructions and write instructions, however, can only be completed one entry at a time. When a branch instruction completes, the instruction PC, the jump address, the jump outcome and the branch history register are returned to the branch prediction unit 114 so that its data can be updated; when a write instruction completes, the memory cache unit 154 is notified of the completion. The branch misprediction fix table stores the state of each prediction tag and its relationship to other prediction tags. The prediction tag state is used to update the rename state under different prediction tags in the renaming register unit 133 and the branch history register under different prediction tags in the branch prediction unit 114, so as to repair a prediction failure. The relationship to other prediction tags allows the instructions under the associated branch prediction tags to be repaired as well when a tag's branch prediction fails.
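The in-order completion rule can be illustrated with a short Python sketch. The function name `commit`, the entry fields, and in particular the reading of "one entry at a time" as "a branch or write instruction completes alone in its cycle" are assumptions; the text could also be read as allowing one such instruction alongside an ordinary one.

```python
from collections import deque


def commit(rob: deque, width: int = 2) -> list:
    """Complete up to `width` finished instructions from the head of the ROB.

    Assumed policy: a branch or store is committed alone in its cycle.
    """
    completed = []
    while rob and len(completed) < width and rob[0]["done"]:
        single = rob[0]["kind"] in ("branch", "store")
        if single and completed:
            break                        # leave it for the next cycle
        completed.append(rob.popleft()["name"])
        if single:
            break
    return completed


def example_rob() -> deque:
    """Small reorder buffer used to exercise `commit` (illustrative data)."""
    return deque([
        {"name": "i0", "kind": "alu", "done": True},
        {"name": "i1", "kind": "alu", "done": True},
        {"name": "i2", "kind": "store", "done": True},
        {"name": "i3", "kind": "alu", "done": True},
        {"name": "i4", "kind": "alu", "done": False},
    ])
```

Calling `commit` once per cycle on `example_rob()` completes `i0` and `i1` together, then the store `i2` alone, then `i3`, and then stops because `i4` has not finished executing.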

While the invention has been described with reference to a preferred embodiment, it will be understood by those skilled in the art that various changes in form and detail may be made therein without departing from the spirit and scope of the invention as defined by the appended claims.

Claims

1. A RISC-V instruction set based six-stage pipelined processor,

the six-stage pipeline comprises an instruction fetching module, a decoding module, a distributing module, a transmitting module, an executing module and a write-back module, wherein the instruction fetching module, the decoding module, the distributing module, the transmitting module, the executing module and the write-back module are sequentially connected, the data memory is connected with the executing module and the write-back module, the first register is connected with the distributing module and the write-back module, and the instruction memory is connected with the instruction fetching module;

the distribution module is used for reading a source operand and mapping a destination operand to a renaming register;

2. A RISC-V instruction set based six-stage pipelined processor of claim 1,

the instruction fetching module comprises a program counter register, an instruction temporary storage unit, an instruction selection unit and a branch prediction unit,

the branch prediction unit is a Gshare branch prediction circuit;

3. A RISC-V instruction set based six-stage pipelined processor as in claim 2,

the branch prediction unit comprises a branch history register and a pattern history register, wherein the branch history register is used for recording the current global branch jump state, the pattern history register is used for describing the jump strength under the current program counter and branch history register, a jump address can be looked up in a branch target buffer according to the current program counter, and the pattern history register and the branch target buffer are used for predicting whether the current instruction is a branch instruction, whether it jumps, and its jump address, so as to change the program counter value of the next instruction.

4. A RISC-V instruction set based six-stage pipelined processor as in claim 1,

the decoding module comprises an instruction information decoding unit and a branch prediction label generating unit;

the branch prediction tag generation unit generates a prediction tag according to whether the instruction fetch module predicts a jump instruction or not and passes the prediction tag along with the instruction to the dispatch module.

5. A RISC-V instruction set based six-stage pipelined processor as in claim 1,

the architecture register unit comprises an architecture register and an idle/busy state thereof;

6. A RISC-V instruction set based six-stage pipelined processor as in claim 1,

7. A RISC-V instruction set based six-stage pipelined processor as in claim 1,

the execution module comprises an adder, a multiplier, a memory read-write unit, a memory cache unit and a branch unit, wherein the adder executes addition, subtraction and logic operations; the multiplier executes multiplication; the branch unit computes a branch target and compares it with the predicted branch target to generate an indication of whether the prediction is correct; the operation of the memory read-write unit is pipelined into 2 stages; and the memory cache unit is used for storing the data and addresses of write instructions.