CN115454506A - Instruction scheduling apparatus, method, chip, and computer-readable storage medium - Google Patents


Info

Publication number
CN115454506A
Authority
CN
China
Prior art keywords
instructions
instruction
queue
scheduling
control information
Prior art date
Legal status
Pending
Application number
CN202211103555.XA
Other languages
Chinese (zh)
Inventor
徐征帅
尹磊祖
吴盼望
韩术
Current Assignee
Zeku Technology Beijing Corp Ltd
Original Assignee
Zeku Technology Beijing Corp Ltd
Priority date
Filing date
Publication date
Application filed by Zeku Technology Beijing Corp Ltd
Priority to CN202211103555.XA
Publication of CN115454506A
Legal status: Pending (current)

Classifications

    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/30 Arrangements for executing machine instructions, e.g. instruction decode
    • G06F9/38 Concurrent instruction execution, e.g. pipeline or look ahead
    • G06F9/3867 Concurrent instruction execution, e.g. pipeline or look ahead using instruction pipelines
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F9/00 Arrangements for program control, e.g. control units
    • G06F9/06 Arrangements for program control, e.g. control units using stored programs, i.e. using an internal store of processing equipment to receive or retain programs
    • G06F9/46 Multiprogramming arrangements
    • G06F9/48 Program initiating; Program switching, e.g. by interrupt
    • G06F9/4806 Task transfer initiation or dispatching
    • G06F9/4843 Task transfer initiation or dispatching by program, e.g. task dispatcher, supervisor, operating system
    • G06F9/4881 Scheduling strategies for dispatcher, e.g. round robin, multi-level priority queues
    • G PHYSICS
    • G06 COMPUTING; CALCULATING OR COUNTING
    • G06F ELECTRIC DIGITAL DATA PROCESSING
    • G06F2209/00 Indexing scheme relating to G06F9/00
    • G06F2209/48 Indexing scheme relating to G06F9/48
    • G06F2209/484 Precedence

Landscapes

  • Engineering & Computer Science (AREA)
  • Software Systems (AREA)
  • Theoretical Computer Science (AREA)
  • Physics & Mathematics (AREA)
  • General Engineering & Computer Science (AREA)
  • General Physics & Mathematics (AREA)
  • Advance Control (AREA)

Abstract

The application provides an instruction scheduling apparatus, method, chip, and computer-readable storage medium. The instruction scheduling apparatus is applied to a processing chip that includes a plurality of execution units, and includes a scheduling logic module configured to determine target dispatch instructions in an instruction queue according to data dependencies between the instructions in the instruction queue and/or scheduling policy control information, and to dispatch the target dispatch instructions from the instruction queue to a reservation station. The scheduling policy control information indicates the scheduling priorities of the instructions of the plurality of execution units.

Description

Instruction scheduling apparatus, method, chip, and computer-readable storage medium
Technical Field
The present application relates to the field of chips, and more particularly, to an instruction scheduling apparatus, method, chip, and computer-readable storage medium.
Background
Instruction scheduling improves instruction parallelism by changing the execution order of instructions without affecting program correctness. Instruction scheduling techniques include static scheduling and dynamic scheduling: static scheduling is performed by programmers and compilers before the program runs, whereas dynamic scheduling is performed by hardware while the program runs.
The Tomasulo algorithm is a common dynamic scheduling algorithm that allows out-of-order execution so that the multiple execution units in a processing chip are used more efficiently. Specifically, when an instruction reaches the head of the instruction queue and the reservation station has a free entry, the instruction is dispatched to the reservation station; then, once the instruction's source operands are ready, it is issued from the reservation station to an execution unit.
There is still room to further optimize this dynamic scheduling method.
Disclosure of Invention
The application provides an instruction scheduling apparatus, method, chip, and computer-readable storage medium, which help improve instruction parallelism.
In a first aspect, an instruction scheduling apparatus is provided, which is applied in a processing chip that includes a plurality of execution units. The instruction scheduling apparatus includes a scheduling logic module configured to determine a target dispatch instruction in an instruction queue according to data dependencies between the instructions in the instruction queue and/or scheduling policy control information, and to dispatch the target dispatch instruction from the instruction queue to a reservation station, where the scheduling policy control information indicates the scheduling priorities of the instructions of the plurality of execution units.
In a second aspect, an instruction scheduling method is provided, including: determining a target dispatch instruction in an instruction queue according to data dependencies between the instructions in the instruction queue and/or scheduling policy control information, where the scheduling policy control information indicates the scheduling priorities of the instructions of different execution units; and dispatching the target dispatch instruction from the instruction queue to a reservation station.
In a third aspect, a chip is provided, which includes:
a processor; and
a memory for storing executable instructions of the processor;
wherein the processor is configured to perform the above instruction scheduling method by executing the executable instructions.
In a fourth aspect, a computer-readable storage medium is provided for storing a computer program that causes a computer to perform the method of the second aspect or any of its implementations.
With the above technical solution, data dependencies between instructions and/or scheduling policy control information are taken into account when determining which instructions to dispatch to the reservation station. Instructions that have no data dependency on other instructions, and/or instructions with a high scheduling priority, can therefore be dispatched to the reservation station first, which improves instruction parallelism.
Drawings
Fig. 1 is a schematic diagram of a basic structure of a floating-point unit of a MIPS processor based on the Tomasulo algorithm.
Fig. 2 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present application.
Fig. 3 is a schematic structural diagram of a scheduling logic module according to an embodiment of the present application.
Fig. 4 is a schematic structural diagram of another scheduling logic module provided in an embodiment of the present application.
Fig. 5 is a schematic structural diagram of another scheduling logic module provided in an embodiment of the present application.
Fig. 6 is a flowchart of instruction scheduling performed by the instruction scheduling apparatus according to an embodiment of the present application.
Fig. 7 is an overall configuration diagram of an instruction scheduling apparatus according to an embodiment of the present application.
Fig. 8 is a schematic diagram illustrating an instruction flow of an instruction scheduling apparatus according to an embodiment of the present application.
Fig. 9 is a schematic structural diagram of a scheduling logic module according to an embodiment of the present application.
Fig. 10 is a schematic diagram of an instruction scheduling method provided in an embodiment of the present application.
Fig. 11 is a schematic block diagram of a chip provided according to an embodiment of the present application.
Detailed Description
The technical solutions in the embodiments of the present application are described below with reference to the accompanying drawings. The described embodiments are only some, not all, of the embodiments of the present application. All other embodiments obtained by a person of ordinary skill in the art based on the embodiments of the present application without creative effort fall within the protection scope of the present application.
To facilitate understanding of the embodiments of the present application, the Tomasulo algorithm relevant to the present application is described first.
Fig. 1 is a schematic diagram of a basic structure of a floating-point unit of a MIPS processor based on the Tomasulo algorithm.
As shown in Fig. 1, the structure may include the following:
Functional units (also called execution units), including a floating-point adder and a floating-point multiplier: the adder performs addition and subtraction, and the multiplier performs multiplication and division.
An instruction queue, which holds instructions waiting to be dispatched to a reservation station (RS). Instructions enter the queue from the instruction unit and leave it in first-in, first-out order.
Reservation stations, which hold instructions that have been dispatched and are waiting to be issued. A reservation station stores each instruction's opcode, operands, and related information.
Load (L) buffers and store (S) buffers, which hold the data or addresses for memory reads and writes.
A common data bus (CDB), which carries the results computed by the functional units to wherever those results are needed.
Floating-point (FP) registers, which are connected to the functional units through a bus and to the store buffers through the CDB.
A pipeline using the Tomasulo algorithm has three stages:
Step 1: Outflow (dispatch)
If the instruction at the head of the instruction queue finds a free entry in its corresponding reservation station (denoted r), the instruction flows out of the queue into r.
If the instruction's operands are ready in the registers, the operand values are sent to r.
If an operand is not ready, the identifier of the reservation station that will produce it is recorded in r. Once that reservation station finishes its computation, the result is forwarded to r.
The destination register is then reserved, i.e., marked as waiting for the result of r.
If the reservation station has no free entry, the instruction cannot flow out of the instruction queue.
In summary, the outflow step buffers the operands, performs register renaming, and reserves the destination register.
Step 2: Execute
For floating-point instructions, the floating-point operation is performed to produce the result.
For load/store instructions, the effective address is computed and placed in the load buffer or store buffer.
Step 3: Write result
After a functional unit finishes its computation, it places the result on the CDB, and all registers and reservation stations waiting for that result obtain it from the CDB.
In the Tomasulo algorithm, only one instruction can be scheduled at a time, which limits instruction parallelism. Moreover, instructions flow out to the reservation stations strictly in their instruction-queue order, which cannot guarantee optimal processor performance.
How to improve the dynamic instruction scheduling algorithm so as to raise processor performance is therefore a problem to be solved.
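The outflow (Step 1) and write-result (Step 3) stages of the Tomasulo algorithm can be sketched in Python as follows. This is a minimal, textbook-style model for illustration only: the `Station` class, the `outflow` and `write_result` functions, and the Vj/Vk/Qj/Qk field layout are assumptions borrowed from the usual textbook presentation, not structures defined in this application. Step 2 (execute) is omitted; an entry would be ready to execute once both `qj` and `qk` are `None`.

```python
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Station:
    """One reservation-station entry with textbook fields: vj/vk hold operand
    values, qj/qk hold the tag of the station that will produce a missing operand."""
    name: str
    busy: bool = False
    op: Optional[str] = None
    vj: Optional[float] = None
    vk: Optional[float] = None
    qj: Optional[str] = None
    qk: Optional[str] = None


def outflow(instr: Tuple[str, str, str, str], stations: List[Station],
            regs: Dict[str, float], reg_status: Dict[str, str]) -> Optional[str]:
    """Step 1: move the head instruction (op, dest, src1, src2) into a free station.

    reg_status maps a register to the tag of the station that will write it.
    Returns the station name, or None when no entry is free (the queue stalls).
    """
    op, dest, src1, src2 = instr
    r = next((s for s in stations if not s.busy), None)
    if r is None:
        return None
    r.busy, r.op = True, op
    # Copy a ready operand's value; otherwise record the producing station's tag.
    r.vj, r.qj = (None, reg_status[src1]) if src1 in reg_status else (regs[src1], None)
    r.vk, r.qk = (None, reg_status[src2]) if src2 in reg_status else (regs[src2], None)
    reg_status[dest] = r.name  # reserve the destination register (renaming)
    return r.name


def write_result(tag: str, value: float, stations: List[Station],
                 regs: Dict[str, float], reg_status: Dict[str, str]) -> None:
    """Step 3: broadcast the result of station `tag` on the CDB."""
    for s in stations:
        if s.qj == tag:
            s.vj, s.qj = value, None
        if s.qk == tag:
            s.vk, s.qk = value, None
        if s.name == tag:
            s.busy = False  # the producing entry becomes free
    # Only registers still reserved by this station take the value.
    for reg in [r for r, t in reg_status.items() if t == tag]:
        regs[reg] = value
        del reg_status[reg]
```

Because `write_result` only updates registers whose `reg_status` entry still names the broadcasting station, a register that a later instruction has re-reserved keeps waiting for that later result; this is how renaming through reservation-station tags avoids WAW hazards. The sketch also makes the limitation above visible: `outflow` only ever looks at the single instruction at the head of the queue.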
Fig. 2 is a schematic structural diagram of an instruction scheduling apparatus according to an embodiment of the present application. The instruction scheduling apparatus may be applied in a processing chip, and the processing chip may include a plurality of execution units.
It should be understood that the present application does not limit the type of processing chip. For example, it may be a dedicated processing chip, such as an image processing chip, a video processing chip, or an audio processing chip, or it may be a general-purpose processing chip.
In some embodiments, the plurality of execution units may include, but are not limited to, at least one of the following execution units:
an Arithmetic Logic Unit (ALU), a data exchange unit (PMT), a load unit (LDU), and a store unit (STU).
It should be understood that the embodiments of the present application do not limit how reservation stations are configured. For example, each execution unit may have its own reservation station, all execution units may share one reservation station, or the execution units may be divided into groups with one reservation station per group; in the last case, the groups may contain the same or different numbers of execution units.
In some embodiments, as shown in fig. 2, the instruction scheduling apparatus 200 may include:
a scheduling logic module 210, configured to determine a target dispatch instruction in the instruction queue according to data dependencies between the instructions in the instruction queue and/or scheduling policy control information, and to dispatch the target dispatch instruction from the instruction queue to the reservation station, where M is an integer greater than or equal to 1.
In some embodiments, the scheduling logic module includes the instruction queue; in other embodiments, the instruction queue and the scheduling logic module are separate modules.
In some embodiments, the scheduling logic module 210 is further configured to:
when the reservation station has free space, determine a target dispatch instruction in the instruction queue according to data dependencies between the instructions in the instruction queue and/or scheduling policy control information.
In one embodiment, before dispatching instructions to the reservation station, the scheduling logic module 210 may first check whether the reservation station has free space and, if it does, determine the target dispatch instruction to be dispatched to the reservation station according to data dependencies between the instructions in the instruction queue and/or scheduling policy control information.
Mode 1: The target dispatch instructions are determined according to data dependencies between the instructions in the instruction queue.
In the related art, instructions are dispatched to the reservation station strictly in their instruction-queue order. An instruction dispatched early may be unable to execute because it depends on data from other instructions, so it occupies a reservation-station entry for a long time and reduces instruction parallelism. In the embodiments of the present application, data dependencies between instructions are considered when determining which instructions to dispatch, so instructions with no data dependency on other instructions are dispatched, and therefore executed, first. This shortens reservation-station occupancy, improves reservation-station utilization, and improves instruction parallelism.
In some embodiments, the data dependency between instructions may refer to:
whether a register (e.g., a source or destination register) of one instruction is related to a register (e.g., a source or destination register) of another instruction, that is, whether a data dependency such as Read After Write (RAW), Write After Write (WAW), or Write After Read (WAR) exists between them.
For example, whether an instruction in the instruction queue has data dependencies with other instructions may refer to: whether the instruction has a data dependency with other instructions prior to the instruction in the instruction queue.
In some embodiments, two instructions are considered to have a data dependency when a RAW, WAW, or WAR dependency exists between their registers; otherwise, they are considered to have no data dependency.
In the embodiments of the present application, for ease of distinction and description, an instruction that has no data dependency with other instructions is referred to as a first-type instruction, and an instruction that has a data dependency with other instructions is referred to as a second-type instruction (or non-first-type instruction).
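A minimal Python sketch of the register comparison described above, assuming each instruction is reduced to one optional destination register and a tuple of source registers. The `Instr` type and the `dependency_kinds` function are hypothetical names used only for illustration; real hardware would compare register indices with comparators rather than Python sets.

```python
from typing import NamedTuple, Optional, Set, Tuple


class Instr(NamedTuple):
    dest: Optional[str]       # destination register, or None (e.g. a store)
    srcs: Tuple[str, ...]     # source registers


def dependency_kinds(earlier: Instr, later: Instr) -> Set[str]:
    """Return the kinds of data dependency of `later` on `earlier` (empty set = none)."""
    kinds: Set[str] = set()
    if earlier.dest is not None and earlier.dest in later.srcs:
        kinds.add("RAW")  # later reads a register that earlier writes
    if earlier.dest is not None and earlier.dest == later.dest:
        kinds.add("WAW")  # both write the same register
    if later.dest is not None and later.dest in earlier.srcs:
        kinds.add("WAR")  # later overwrites a register that earlier still reads
    return kinds
```

Under the rule above, two instructions have a data dependency exactly when `dependency_kinds` returns a non-empty set.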
In some embodiments, as shown in fig. 3, the scheduling logic module 210 may include:
an instruction analysis module 211, configured to analyze data dependencies between instructions in the instruction queue.
In one particular embodiment, the instruction analysis module may analyze data dependencies between each of the N instructions and other instructions that precede the instruction.
In some embodiments, the instruction analysis module may first check the type of each instruction during analysis. CSR instructions (which are used to set the first CR or the second CR, whose roles are described below) do not participate in instruction reordering; the subsequent instruction analysis and reordering are performed only on non-CSR instructions.
That is, the N instructions do not include a CSR instruction.
In some embodiments, the instruction analysis module may analyze data dependencies between instructions in the instruction queue based on a sliding window.
Optionally, the size of the sliding window may be N instructions, where N is an integer greater than 1. That is, the instruction analysis module can analyze the data dependencies among N instructions at a time.
The specific value of N is not limited in the embodiments of the present application; for example, N may be 3, 4, 5, or 6, or N may equal the depth of the instruction queue.
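A minimal sketch of the sliding-window analysis, assuming that each of the first N instructions is compared only against the instructions ahead of it inside the window (instructions already sitting in the reservation station are not modeled) and that CSR instructions have already been excluded, as stated above. `Instr`, `has_dependency`, and `classify_window` are hypothetical names.

```python
from typing import List, NamedTuple, Optional, Tuple


class Instr(NamedTuple):
    name: str
    dest: Optional[str]
    srcs: Tuple[str, ...]


def has_dependency(earlier: Instr, later: Instr) -> bool:
    """True if `later` has a RAW, WAW or WAR dependency on `earlier`."""
    if earlier.dest is not None and (earlier.dest in later.srcs or earlier.dest == later.dest):
        return True  # RAW or WAW
    return later.dest is not None and later.dest in earlier.srcs  # WAR


def classify_window(queue: List[Instr], n: int) -> List[bool]:
    """For the first n instructions, True marks a first-type instruction,
    i.e. one with no dependency on any earlier instruction in the window."""
    window = queue[:n]
    return [not any(has_dependency(prev, ins) for prev in window[:i])
            for i, ins in enumerate(window)]
```

The instruction at the head of the window is always first-type here, because nothing precedes it inside the window.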
In some embodiments, as shown in fig. 3, the scheduling logic module 210 may further include:
the instruction reordering module 212 is configured to reorder the instructions in the instruction queue according to data dependency between the instructions.
In some implementations, the instruction reordering module may order the first N instructions in the instruction queue according to the data dependencies between them, to obtain a first queue.
For example, if all of the N instructions are second-type instructions, their order is unchanged; that is, the N instructions appear in the first queue in the same order as in the instruction queue.
For another example, if some of the N instructions are first-type instructions, the first-type instructions are placed at the head of the first queue and the second-type instructions at its tail. Within each group, instructions keep the order they had in the instruction queue; that is, the relative order among the first-type instructions is unchanged, and so is the relative order among the second-type instructions.
Through this reordering, the first-type instructions among the first N instructions in the instruction queue are placed at the head of the queue and the second-type instructions at the tail. Instructions are then dispatched to the reservation station in this order, so instructions with no data dependency on other instructions are dispatched first.
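The reordering described above is a stable partition of the window. A minimal sketch, assuming the first-type flags come from a prior dependency analysis (`reorder_by_dependency` is a hypothetical name):

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def reorder_by_dependency(window: Sequence[T], first_type: Sequence[bool]) -> List[T]:
    """Build the first queue: first-type instructions at the head, second-type
    at the tail, each group keeping its original relative order."""
    head = [ins for ins, ok in zip(window, first_type) if ok]
    tail = [ins for ins, ok in zip(window, first_type) if not ok]
    return head + tail
```

When no instruction is first-type, `head` is empty and the window comes back unchanged, matching the first example above.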
In some embodiments, the scheduling logic module 210 further includes:
an instruction selection module, configured to select target dispatch instructions from the first queue.
Optionally, the target dispatch instructions include one or more first-type instructions in the first queue.
In some embodiments, the number of target dispatch instructions is less than or equal to K, where K is the maximum number of instructions that can be dispatched to the reservation station, or the maximum number of instructions that can be dispatched to it at one time.
For example, if the first queue contains K or more first-type instructions, the first K first-type instructions may be selected from the first queue as the target dispatch instructions.
For another example, if the first queue contains fewer than K first-type instructions, all of the first-type instructions may be selected from the first queue as the target dispatch instructions.
In some embodiments, K is obtained from a second control register (CR), denoted DP_NUM_CR.
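Both selection cases above reduce to taking at most K first-type instructions from the head of the first queue. A minimal sketch, assuming the first queue is represented as `(instruction, is_first_type)` pairs (`select_targets` is a hypothetical name):

```python
from typing import List, Sequence, Tuple, TypeVar

T = TypeVar("T")


def select_targets(first_queue: Sequence[Tuple[T, bool]], k: int) -> List[T]:
    """Pick up to k first-type instructions from the first queue, in queue order."""
    first_type = [ins for ins, ok in first_queue if ok]
    return first_type[:k]  # all of them when fewer than k exist
```

Because the reordering already placed first-type instructions at the head, this is the same as taking the first min(m, K) entries, where m is the number of first-type instructions.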
In some embodiments, the value of the second CR may be set via a Control and Status Register (CSR) instruction.
In some embodiments, there is one second CR for each execution unit in the plurality of execution units.
For example, the second CR for an execution unit may be used to indicate the maximum number of instructions for that execution unit that may be dispatched to the reservation station at one time.
Optionally, the values of the second CR corresponding to each execution unit may be the same, or may also be different.
In other words, the maximum number of instructions that can be dispatched at one time per execution unit may be the same or, alternatively, may be different.
In some embodiments, multiple execution units correspond to the same second CR.
The value of this second CR may indicate either the maximum number of instructions that one execution unit can dispatch at one time (this maximum then being the same for all execution units) or the sum of the maximum numbers of instructions that all execution units can dispatch at one time.
In some embodiments, if multiple execution units correspond to the same second CR and the value of that CR is $X_{\text{sum}}$, the maximum total number of instructions that all execution units can dispatch at one time, then K may take the value $K = X_{\text{sum}}$.
In some embodiments, if multiple execution units correspond to the same second CR and the value of that CR is X, the maximum number of instructions that one execution unit can dispatch at one time, then K may take the value $K = S \times X$, where S is the number of execution units.
In some embodiments, if each of the plurality of execution units corresponds to its own second CR, and the value of the second CR corresponding to execution unit i is $X_i$, then K may take the value
$$K = \sum_{i=1}^{S} X_i,$$
where S is the number of execution units.
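The three ways of deriving K from the second CR can be sketched as a single function. This is an illustration under stated assumptions: reading DP_NUM_CR is modeled as passing plain integers, and the `CrLayout` enum and `max_dispatch` function are hypothetical names, not part of the application.

```python
from enum import Enum
from typing import Sequence


class CrLayout(Enum):
    SHARED_TOTAL = 1     # one second CR holding X_sum for all units
    SHARED_PER_UNIT = 2  # one second CR holding X, the same limit for every unit
    PER_UNIT = 3         # one second CR per unit, holding X_i


def max_dispatch(layout: CrLayout, cr_values: Sequence[int], num_units: int) -> int:
    """Compute K from the second CR value(s); num_units plays the role of S."""
    if layout is CrLayout.SHARED_TOTAL:
        return cr_values[0]              # K = X_sum
    if layout is CrLayout.SHARED_PER_UNIT:
        return num_units * cr_values[0]  # K = S * X
    if len(cr_values) != num_units:
        raise ValueError("expected one second-CR value per execution unit")
    return sum(cr_values)                # K = X_1 + ... + X_S
```

For example, four units that each allow two instructions per cycle give K = 8 whether the limit is stored once as X = 2 or per unit as X_i = 2.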
Mode 2: The target dispatch instructions are determined according to the scheduling policy control information.
In some embodiments, the scheduling policy control information may be used to control the scheduling policy applied to instructions, for example, a compute-first, bandwidth-first, or efficiency-first scheduling policy.
In some embodiments, the scheduling policy control information is used to indicate scheduling priorities of instructions of the plurality of execution units.
For example, the compute-first, bandwidth-first, and efficiency-first scheduling policies may be implemented by configuring the scheduling priorities of the instructions of the plurality of execution units.
Therefore, in the embodiments of the present application, the instruction scheduling policy is configurable, which allows engineers to take part in hardware instruction scheduling. For example, a suitable scheduling policy may be configured for processors with different performance characteristics, or for different requirements or application scenarios, which helps achieve optimal processor performance.
In some embodiments, the scheduling policy control information may be obtained from a first control register (CR). In some embodiments, the value of the first CR may be set via a control and status register (CSR) instruction.
In some embodiments, the plurality of execution units correspond to a first CR, with different values of the first CR indicating different scheduling priorities of instructions of the plurality of execution units.
The following examples assume that the plurality of execution units include an ALU, a PMT, an LDU, and an STU.
As one example, when the first CR takes a first value, the scheduling priority order of the instructions of the execution units is:
ALU > LDU > STU > PMT. In some scenarios, this order may be used to implement a compute-first scheduling policy.
As another example, when the first CR takes a second value, the scheduling priority order of the instructions of the execution units is:
LDU > STU > PMT > ALU. In some scenarios, this order may be used to implement a bandwidth-first scheduling policy.
As yet another example, when the first CR takes a third value, the scheduling priority order of the instructions of the execution units is:
ALU = LDU = STU = PMT. In some scenarios, this order may be used to implement an efficiency-first scheduling policy.
The first value, the second value, and the third value are different from one another.
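A minimal sketch of how a single shared first CR could be decoded into per-unit priority ranks. The application does not give the concrete first, second, and third values, so the encoding 0/1/2 below is purely hypothetical, as are the names `FIRST_CR_POLICIES` and `unit_ranks`.

```python
from typing import Dict, List, Optional

UNITS = ("ALU", "LDU", "STU", "PMT")

# Hypothetical encoding of the shared first CR (actual values are not specified).
FIRST_CR_POLICIES: Dict[int, Optional[List[str]]] = {
    0: ["ALU", "LDU", "STU", "PMT"],  # compute-first
    1: ["LDU", "STU", "PMT", "ALU"],  # bandwidth-first
    2: None,                          # efficiency-first: all units equal
}


def unit_ranks(first_cr: int) -> Dict[str, int]:
    """Map each execution unit to a rank; a lower rank means higher priority."""
    order = FIRST_CR_POLICIES[first_cr]
    if order is None:
        return {u: 0 for u in UNITS}
    return {u: i for i, u in enumerate(order)}
```

Returning ranks rather than an ordered list lets the "all equal" policy be expressed directly as identical ranks.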
In some embodiments, each of the plurality of execution units corresponds to its own first CR, and the value of an execution unit's first CR indicates either the weight of that unit's instruction scheduling priority or the unit's position in the scheduling priority order.
The following examples assume that the execution units include an ALU, a PMT, an LDU, and an STU.
The first CRs corresponding to the ALU, PMT, LDU, and STU are denoted ALU_CR, PMT_CR, LDU_CR, and STU_CR, respectively.
In some embodiments, the value of the first CR indicates a scheduling priority weight: if ALU_CR = 5, LDU_CR = 4, STU_CR = 3, and PMT_CR = 2, the scheduling priority order is ALU > LDU > STU > PMT.
In other embodiments, the value of the first CR indicates a position in the scheduling priority order: if ALU_CR = 1, LDU_CR = 2, STU_CR = 3, and PMT_CR = 4, the scheduling priority order is ALU > LDU > STU > PMT.
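The two interpretations of per-unit first CRs can be sketched as follows, with the CR contents modeled as a dictionary keyed by unit name. `priority_order` and its `values_are_weights` flag are hypothetical names chosen for illustration.

```python
from typing import Dict, List


def priority_order(unit_crs: Dict[str, int], values_are_weights: bool) -> List[str]:
    """Return execution units from highest to lowest scheduling priority.

    values_are_weights=True: a larger CR value means a higher priority.
    values_are_weights=False: the CR value is a position, 1 = highest priority.
    """
    if values_are_weights:
        return sorted(unit_crs, key=lambda u: unit_crs[u], reverse=True)
    return sorted(unit_crs, key=lambda u: unit_crs[u])
```

Both examples above (weights 5/4/3/2 and positions 1/2/3/4) yield the same order, ALU > LDU > STU > PMT.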
It should be understood that these scheduling priority orders are only examples. In practice, the order may be adjusted flexibly according to factors such as scheduling requirements, system computing power, system bandwidth, instruction order, and overall system efficiency; the present application does not limit this.
In some embodiments, as shown in fig. 4, the scheduling logic module 210 may include:
an instruction reordering module 213, configured to order the first N instructions in the instruction queue according to the scheduling policy control information to obtain a third queue; for example, the first N instructions are ordered according to the scheduling priorities of the instructions of the plurality of execution units.
Through this reordering, the high-priority instructions among the first N instructions in the instruction queue are placed at the head of the queue and the low-priority instructions at the tail. Instructions are then dispatched to the reservation station in this order, so high-priority instructions are dispatched first, which helps meet the scheduling requirements and improves processor performance.
In some embodiments, the scheduling logic module 210 further includes:
an instruction selection module, configured to select target dispatch instructions from the third queue.
In some embodiments, the target dispatch instructions include the first K instructions in the third queue, where K is the maximum number of instructions that can be dispatched to the reservation station. For how K is configured, see Mode 1; it is not repeated here for brevity.
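A minimal sketch of Mode 2: sort the window by the rank of each instruction's execution unit to form the third queue, then take its first K entries. It assumes instructions are `(name, unit)` pairs and that equal-priority instructions keep their queue order; the application does not state the tie-breaking rule. `priority_dispatch` is a hypothetical name.

```python
from typing import Dict, List, Sequence, Tuple

Instr = Tuple[str, str]  # (instruction name, execution unit); hypothetical representation


def priority_dispatch(window: Sequence[Instr], rank: Dict[str, int],
                      k: int) -> Tuple[List[Instr], List[Instr]]:
    """Return (third queue, target dispatch instructions).

    Lower rank = higher priority. sorted() is stable, so instructions of
    equal priority keep their instruction-queue order (an assumption).
    """
    third_queue = sorted(window, key=lambda ins: rank[ins[1]])
    return third_queue, third_queue[:k]
```

With ranks ALU=0, LDU=1, STU=2, PMT=3 (compute-first) and K=2, a window PMT, ALU, LDU, ALU dispatches the two ALU instructions first.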
Mode 3: the target dispatch instructions to be dispatched to the reservation station are determined based on both the data dependencies between instructions in the instruction queue and the scheduling policy control information.
For the specific handling of the data dependencies and of the scheduling policy control information, refer to Mode 1 and Mode 2 respectively; details are not repeated here for brevity.
Thus, in this embodiment, both the data dependencies between instructions and the scheduling policy control information are considered when determining which instructions to dispatch to the reservation station. On the one hand, this helps ensure that instructions with no data dependency on other instructions are dispatched first; on the other hand, it helps ensure that instructions with high scheduling priority are dispatched first. Instruction-level parallelism and processor performance can therefore both be taken into account.
In some embodiments, the scheduling logic module 210 may include:
an instruction analysis module 214 for analyzing data dependencies between instructions in the instruction queue;
a first instruction reordering module 215, configured to reorder the instructions in the instruction queue according to the data dependencies between them; and
a second instruction reordering module 216, configured to reorder the instructions a second time according to the scheduling policy control information.
In a particular embodiment, the instruction analysis module 214 may analyze data dependencies between instructions in the instruction queue based on a sliding window.
For example, instruction analysis module 214 may analyze data dependencies between each of the N instructions and other instructions that precede the instruction.
Optionally, the size of the sliding window may be N instructions, where N is an integer greater than 1; that is, the instruction analysis module analyzes the data dependencies among N instructions at a time.
The specific value of N is not limited in the embodiments of the present application; for example, N may be 3, 4, 5, or 6, or N may be equal to the depth of the instruction queue.
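The sliding-window analysis can be illustrated as follows. The text does not define which conflicts count as a data dependency, so this sketch assumes the usual register hazards (read-after-write, write-after-read, write-after-write) on named registers; `Instr`, `depends_on`, and `analyze_window` are hypothetical names. Each of the first N instructions is checked only against the instructions that precede it inside the window.

```python
from dataclasses import dataclass
from typing import Optional, Tuple


@dataclass(frozen=True)
class Instr:
    name: str
    dst: Optional[str]          # destination register, None if the instruction writes none
    srcs: Tuple[str, ...] = ()  # source registers


def depends_on(later, earlier):
    """Assumed definition: True if `later` has a RAW, WAR or WAW hazard
    with `earlier` on any named register."""
    raw = earlier.dst is not None and earlier.dst in later.srcs
    war = later.dst is not None and later.dst in earlier.srcs
    waw = later.dst is not None and later.dst == earlier.dst
    return raw or war or waw


def analyze_window(queue, n):
    """Return one flag per instruction in the first N of `queue`:
    True means it depends on no earlier instruction in the window
    (i.e. it would be a first-type instruction)."""
    window = queue[:n]
    return [not any(depends_on(ins, prev) for prev in window[:i])
            for i, ins in enumerate(window)]


q = [Instr("LD", "r1", ("r2",)),          # r1 <- mem[r2]
     Instr("ADD", "r3", ("r1", "r4")),    # reads r1: RAW on LD
     Instr("SUB", "r5", ("r6", "r7")),    # independent
     Instr("MUL", "r2", ("r5",))]         # writes r2 read by LD: WAR
print(analyze_window(q, 4))  # [True, False, True, False]
```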
In some implementations, the first instruction reordering module 215 may order the first N instructions in the instruction queue according to the data dependencies between them to obtain a first queue. For the specific ordering method, refer to Mode 1; details are not repeated here for brevity.
In some implementations, the second instruction reordering module 216 may reorder the first queue according to the scheduling policy control information. For example, according to the scheduling policy control information, L instructions in the first queue are ordered to obtain a second queue, where L is less than or equal to N.
In some embodiments, the L instructions may be the first-type instructions in the first queue.
In some embodiments, the L instructions do not include CSR instructions; that is, CSR instructions do not participate in the ordering.
That is, when the first queue contains a plurality of first-type instructions, the second instruction reordering module 216 may order these first-type instructions according to the scheduling policy control information to obtain the second queue.
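The two-stage ordering of Mode 3 can be sketched as below. This is an assumption-laden illustration, not the patented implementation: the `independent` flag is assumed to come from the dependency analysis, CSR instructions are assumed to have been removed already, and the per-unit weights are assumed to be read from the first CRs. `build_queues` is a hypothetical name.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    name: str
    unit: str          # execution unit, e.g. "ALU", "LDU", "STU", "PMT"
    independent: bool  # True = first-type (no dependency on earlier instructions)


def build_queues(queue, n, weights):
    window = queue[:n]
    first_type = [ins for ins in window if ins.independent]
    others = [ins for ins in window if not ins.independent]
    # First queue: first-type instructions at the head, the rest at the tail,
    # each group keeping its original relative order. If all or none of the
    # N instructions are first-type, the original order is left unchanged.
    first_queue = first_type + others
    # Second queue: only the L first-type instructions are re-sorted by the
    # priority weight of their execution unit (higher weight first).
    second_queue = sorted(first_type, key=lambda ins: -weights[ins.unit])
    return first_queue, second_queue


W = {"ALU": 5, "LDU": 4, "STU": 3, "PMT": 2}
q = [Instr("LD", "LDU", False), Instr("PMT", "PMT", True),
     Instr("ST", "STU", False), Instr("ALU1", "ALU", True)]
first, second = build_queues(q, 5, W)
print([i.name for i in first])   # ['PMT', 'ALU1', 'LD', 'ST']
print([i.name for i in second])  # ['ALU1', 'PMT']
```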
In some embodiments, the scheduling logic module 210 further includes:
an instruction selection module, configured to select target dispatch instructions from the L instructions.
For example, if L is greater than or equal to the value of DP_NUM_CR (for the meaning of DP_NUM_CR, refer to the related description of Mode 1), that is, the number of first-type instructions in the first queue is greater than or equal to K, the L instructions may be reordered according to the scheduling policy control information to obtain the second queue, and the first K instructions, which have the highest scheduling priority, are selected from the second queue as the target dispatch instructions.
For another example, if L is less than K, that is, the number of first-type instructions in the first queue is less than K, all L instructions may be used as target dispatch instructions.
In other embodiments, when the number of first-type instructions in the first queue is greater than K, the second instruction reordering module 216 may reorder the first-type instructions in the first queue a second time according to the scheduling policy control information to obtain the second queue, from which the target dispatch instructions to be dispatched to the reservation station are selected.
In some embodiments, if the first queue contains only one first-type instruction, that instruction alone may be taken as the target dispatch instruction to be dispatched to the reservation station.
In other embodiments, if all instructions in the first queue are second-type instructions, no instruction is dispatched to the reservation station.
In this embodiment, the instruction selection module may be implemented by a selector, such as an N-K selector. The input of the N-K selector may be N instructions or fewer than N instructions, and its output may be K instructions or fewer than K instructions.
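The selection rules above reduce to taking at most K entries from the head of the reordered list of first-type instructions. A minimal sketch, assuming K is the value of DP_NUM_CR and the input is already in priority order; `select_targets` is a hypothetical name.

```python
def select_targets(second_queue, k):
    """N-K selector sketch: from the L reordered first-type instructions
    (L <= N), return at most K target dispatch instructions."""
    if k < 0:
        raise ValueError("K must be non-negative")
    # L >= K     -> the first K (highest-priority) instructions
    # 0 < L < K  -> all L instructions, e.g. a single first-type instruction
    # L == 0     -> nothing is dispatched (all instructions are second-type)
    return list(second_queue[:k])


print(select_targets(["ALU1", "ALU2", "LD"], 2))  # ['ALU1', 'ALU2']
print(select_targets(["LD"], 2))                  # ['LD']
print(select_targets([], 2))                      # []
```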
The instruction dispatch flow of Mode 3 is described below with reference to FIG. 6. As shown in FIG. 6, the flow may include the following steps:
S301: Decode the instructions in the instruction queue and determine their types.
S302: Determine whether the instruction is a CSR instruction.
If so, perform S303; otherwise, perform S311.
S303: Extract the CSR information, for example, the scheduling policy control information.
S304: Write the CR register, for example, write the scheduling policy control information to the first CR.
S311: Determine whether the reservation station has free space.
If so, perform S312; otherwise, end the flow.
S312: Analyze the data dependencies between the instructions in the instruction queue.
For the specific analysis method, refer to the behavior of the instruction analysis module in the foregoing embodiment; details are not repeated here.
S313: Order the instructions according to the data dependencies between them.
For the specific ordering method, refer to the related descriptions of Mode 1 and Mode 3 above; details are not repeated here.
S314: Reorder the instructions according to the value of the CR register.
The CR register may be the first CR, whose value indicates the scheduling policy control information; that is, the instructions are reordered according to the scheduling policy control information.
S315: Select the target dispatch instructions.
The target dispatch instructions are selected from the reordered queue, for example, based on the value of DP_NUM_CR.
S316: Set the reservation station.
For example, after a target dispatch instruction is dispatched to the reservation station, the state of the corresponding reservation station entry is set to busy.
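The flow S301 to S316 can be modeled as one scheduling pass, as in the sketch below. Several details are assumptions made for illustration: only the instruction at the head of the queue is checked for being a CSR write, the dependency flags (S312) are precomputed on each instruction, the number dispatched is capped by both DP_NUM_CR and the number of free reservation-station rows, and a unit without a configured weight defaults to 0. All class and field names are hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class Instr:
    name: str
    unit: str                                    # "CSR" for a CSR write, else the execution unit
    independent: bool = True                     # result of the dependency analysis (S312)
    csr_value: Optional[Dict[str, int]] = None   # weights carried by a CSR write


@dataclass
class Scheduler:
    rs_rows: int                                 # reservation-station capacity
    dp_num_cr: int                               # K: max instructions dispatched per pass
    cr: Dict[str, int] = field(default_factory=dict)   # first CRs: unit -> weight
    busy: List[Optional[str]] = field(init=False, default_factory=list)

    def __post_init__(self):
        self.busy = [None] * self.rs_rows        # None = row free (busy bit 0)

    def step(self, queue: List[Instr], n: int) -> List[str]:
        """One pass of S301-S316; dispatched instructions leave `queue`."""
        # S301/S302: a CSR instruction at the head only updates the CRs.
        if queue and queue[0].unit == "CSR":
            ins = queue.pop(0)
            self.cr.update(ins.csr_value or {})  # S303/S304
            return []
        free = [r for r, row in enumerate(self.busy) if row is None]
        if not free:                             # S311: no free space -> end
            return []
        window = queue[:n]                       # S312: flags precomputed
        first = [ins for ins in window           # S313: first-type, non-CSR
                 if ins.independent and ins.unit != "CSR"]
        first.sort(key=lambda ins: -self.cr.get(ins.unit, 0))   # S314
        targets = first[:min(self.dp_num_cr, len(free))]        # S315
        for row, ins in zip(free, targets):      # S316: set rows busy
            self.busy[row] = ins.name
            queue.remove(ins)
        return [t.name for t in targets]


s = Scheduler(rs_rows=2, dp_num_cr=1)
q = [Instr("CSRW", "CSR", csr_value={"ALU": 5, "LDU": 4}),
     Instr("LD", "LDU"), Instr("ALU1", "ALU")]
print(s.step(q, 5), s.step(q, 5), s.step(q, 5))  # [] ['ALU1'] ['LD']
```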
In some embodiments, the scheduling logic module 210 is further configured to:
when a target dispatch instruction is dispatched to the reservation station, set the latch state of the registers associated with the target dispatch instruction to locked; and
when the target dispatch instruction has been executed by the corresponding execution unit, set the latch state of those registers to unlocked.
Optionally, the registers associated with the target dispatch instruction may include source registers and/or destination registers.
For example, when a first instruction is dispatched to the reservation station, the state of its registers is locked; after the first instruction is executed by the execution unit, the state of its registers is unlocked.
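The lock and unlock behavior can be sketched as a small table from register name to the in-flight instructions holding it. This is a hypothetical model: the text does not say how several instructions sharing a register are handled, so here a register simply stays locked until every instruction that locked it has executed.

```python
from collections import defaultdict


class RegisterLocks:
    """Hypothetical latch-state table: a register is locked while any
    dispatched-but-not-executed instruction that uses it is in flight."""

    def __init__(self):
        self._holders = defaultdict(set)   # register -> instructions holding it

    def on_dispatch(self, instr, regs):
        """Dispatch to the reservation station: lock source/destination registers."""
        for r in regs:
            self._holders[r].add(instr)

    def on_executed(self, instr):
        """Execution finished: release every register this instruction held."""
        for r in list(self._holders):
            self._holders[r].discard(instr)
            if not self._holders[r]:
                del self._holders[r]

    def is_locked(self, reg):
        return bool(self._holders.get(reg))


locks = RegisterLocks()
locks.on_dispatch("ADD", ["r1", "r2", "r3"])
locks.on_dispatch("SUB", ["r3", "r4"])
locks.on_executed("ADD")
print(locks.is_locked("r1"), locks.is_locked("r3"))  # False True
```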
In some embodiments of the present application, after an instruction is dispatched to the reservation station, an execution unit may trigger the scheduling logic module to check the data dependencies between the instructions in the reservation station to determine which instructions can be issued to the execution unit. For example, an instruction in the reservation station that has no data dependency on other instructions is issued to the corresponding execution unit.
In some embodiments, if the data dependency check has already been performed before the instructions are dispatched to the reservation station, the scheduling logic module need not check the data dependencies again, and the execution units may execute the instructions in the reservation station directly. This is the case in Mode 1 and Mode 3. In Mode 2, by contrast, no data dependency check is performed before dispatch, so the scheduling logic module needs to check the data dependencies between the instructions to determine which instructions can be issued to the execution units.
In some embodiments, the scheduling logic module 210 is further configured to:
when a target dispatch instruction has been dispatched to the reservation station, receive first control information (a check signal) from a first execution unit, the first control information instructing the scheduling logic module to check the data dependencies between the instructions in the reservation station to determine the issuable instructions.
For example, in Mode 2, the execution unit may send a check signal to the scheduling logic module to trigger it to check the data dependencies between the instructions in the reservation station.
In some embodiments of the present application, after an instruction is issued to an execution unit, the execution unit executes it and further triggers the scheduling logic module to clear the information related to the instruction from the reservation station.
For example, after the target dispatch instruction is issued to the first execution unit, second control information (a clear signal) is received from the first execution unit, instructing the scheduling logic module to clear the information associated with the target dispatch instruction from the reservation station.
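The check and clear handshake can be sketched with a reservation station whose rows hold entries and whose busy bit is simply "row is occupied". Two behaviors are assumptions, not statements from the text: each entry records the names of the in-flight instructions it waits on, and a clear also removes the cleared instruction from the wait sets of the remaining entries (a Tomasulo-style wakeup). All names are hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional, Set


@dataclass
class Entry:
    name: str
    unit: str
    waits_on: Set[str]  # in-flight instructions this entry depends on (assumed)


class ReservationStation:
    def __init__(self, rows: int):
        self.rows: List[Optional[Entry]] = [None] * rows  # None = busy bit 0

    def dispatch(self, entry: Entry) -> int:
        row = self.rows.index(None)  # raises ValueError if no row is free
        self.rows[row] = entry       # busy bit 1
        return row

    def on_check(self, unit: str) -> List[str]:
        """'check' from an execution unit: entries for that unit whose
        dependencies are resolved, i.e. the issuable instructions."""
        return [e.name for e in self.rows
                if e is not None and e.unit == unit and not e.waits_on]

    def on_clear(self, name: str) -> None:
        """'clear' from an execution unit: free the row of `name` and
        (assumed) drop it from the wait sets of the remaining entries."""
        for i, e in enumerate(self.rows):
            if e is None:
                continue
            if e.name == name:
                self.rows[i] = None
            else:
                e.waits_on.discard(name)


rs = ReservationStation(2)
rs.dispatch(Entry("LD", "LDU", set()))
rs.dispatch(Entry("ADD", "ALU", {"LD"}))
print(rs.on_check("ALU"), rs.on_check("LDU"))  # [] ['LD']
rs.on_clear("LD")
print(rs.on_check("ALU"))                      # ['ADD']
```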
With reference to fig. 7 and fig. 8, the operation principle of the instruction scheduling apparatus provided in the embodiment of the present application is described by taking the multiple execution units including ALU, PMT, LDU, and STU as examples.
In an embodiment of the present application, the scheduling logic module is configured to implement at least one of the following functions:
1. buffering the instructions of the processing chip;
2. queuing the instructions to be issued in order;
3. analyzing the N instructions at the head of the instruction queue; and
4. determining which instructions to dispatch first according to the scheduling policy control information.
In the embodiment of the present application, an instruction first reaches the scheduling logic module, which determines, through its analysis and the scheduling priorities, which instructions are dispatched first to the reservation station; the instructions are then issued to the respective execution units for execution.
In the embodiment of the present application, the reservation stations may be organized in different ways: each execution unit may have its own independent reservation station, all execution units may share one reservation station, or the reservation stations may be configured per group of execution units.
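The three reservation-station organizations can be expressed as a mapping from execution unit to reservation-station index. The sketch below is hypothetical configuration code; the mode strings and the function name `rs_mapping` are illustrative only.

```python
def rs_mapping(units, mode, groups=None):
    """Return a dict mapping each execution unit to a reservation-station id."""
    if mode == "shared":       # all execution units share one reservation station
        return {u: 0 for u in units}
    if mode == "per_unit":     # each execution unit has its own reservation station
        return {u: i for i, u in enumerate(units)}
    if mode == "grouped":      # one reservation station per group of units
        mapping = {u: gid for gid, group in enumerate(groups) for u in group}
        missing = set(units) - set(mapping)
        if missing:
            raise ValueError(f"units without a group: {sorted(missing)}")
        return mapping
    raise ValueError(f"unknown mode: {mode}")


units = ["ALU", "PMT", "LDU", "STU"]
print(rs_mapping(units, "grouped", [["ALU", "PMT"], ["LDU", "STU"]]))
# {'ALU': 0, 'PMT': 0, 'LDU': 1, 'STU': 1}
```

A group may also contain a single unit, so the grouped form covers the per-unit case as well.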
As shown in FIG. 7 and FIG. 8, instructions are first stored in a circular buffer of the instruction queue. The instructions in the instruction queue are arranged in order. The scheduling logic module may fetch a plurality of instructions (for example, N instructions) from the instruction queue at the same time and control their dispatch order according to the data dependencies between them and the configuration information of the first CR (that is, the scheduling policy control information), thereby converting a serial, in-order instruction stream into a parallel, out-of-order instruction stream. Specifically, the scheduling logic module first checks, for each instruction, the data dependencies between it and the instructions arranged before it in the instruction queue, and reorders the instruction queue according to this dependency information. For example, instructions with no data dependency on other instructions are placed at the head of the queue and are then sorted a second time according to the configuration information of the first CR, while instructions that have data dependencies on other instructions are placed at the tail of the queue with their relative order unchanged. The instructions to be dispatched to the reservation station are then selected from the queue produced by this second sorting.
Optionally, after the instructions are dispatched to the reservation station, the scheduling logic module may also stall earlier instructions according to the data dependencies between instructions until those dependencies are resolved.
As shown in FIG. 8, after instructions are dispatched to the reservation stations, the execution units may send a check signal to the scheduling logic module to trigger it to check the data dependencies between the instructions in the reservation stations and determine which instructions can be issued to the execution units. After an instruction is executed by an execution unit, the execution unit may further send a clear signal to the scheduling logic module to trigger it to clear the information related to that instruction from the reservation station.
Therefore, in the embodiment of the present application, after an instruction reaches the instruction queue, it is dispatched to a reservation station according to the scheduling logic described above, and each execution unit controls the execution of instructions by checking or clearing the reservation station. This achieves out-of-order dispatch and out-of-order execution: the whole process involves two out-of-order stages, namely conditional out-of-order dispatch and out-of-order execution.
FIG. 9 is a schematic structural diagram of a scheduling logic module according to an embodiment of the present application. As shown in FIG. 9, the scheduling logic module may include the following modules:
an instruction queue, an instruction analysis module, an instruction reordering module, a selection module, and a reservation station.
Instructions from the instruction unit arrive at the instruction queue first; the instruction analysis module then analyzes the data dependencies between the instructions in the instruction queue, for example, among the first N instructions.
The instruction reordering module (corresponding to the first instruction reordering module described above, or to the combination of the first and second instruction reordering modules) is configured to reorder the instructions according to the data dependencies between them and the scheduling policy control information.
For example, the instructions may first be ordered according to the data dependencies between them and then ordered a second time according to the scheduling priority weight of each execution unit's instructions, where the weight for each execution unit may be obtained from that unit's first CR, e.g., from ALU_CR, PMT_CR, LDU_CR, and STU_CR respectively.
The selection module is configured to select the target dispatch instructions from the reordered queue and set the reservation station, that is, dispatch the target dispatch instructions to the reservation station.
In other embodiments, the second instruction reordering module may be incorporated into the selection module. In this case, the inputs of the selection module may include the reordered queue output by the first instruction reordering module and the scheduling policy control information, and the selection module selects the target dispatch instructions from that queue according to the scheduling policy control information.
In some embodiments, the selection module may be an N-K selector, whose input may be N instructions or fewer than N instructions and whose output may be K instructions or fewer than K instructions.
The instruction scheduling process is described below using an example in which the instruction analysis module analyzes 5 instructions at a time (N = 5), there are 4 execution units (LDU, STU, ALU, and PMT), and only one instruction is dispatched to the reservation station at a time (M = 1). The scheduling priority weights of the execution units are ALU_CR = 5, LDU_CR = 4, STU_CR = 3, and PMT_CR = 2; that is, the scheduling priority is ALU > LDU > STU > PMT. The scheduling process is illustrated using 6 instructions in the instruction queue.
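The priority order used in this example follows directly from the four first-CR values. The sketch below assumes the CR values are available as a dictionary keyed by register name in the form `<UNIT>_CR`; `priority_order` is a hypothetical helper.

```python
def priority_order(cr_values):
    """Derive the execution-unit priority order from '<UNIT>_CR' weights
    (higher weight = higher scheduling priority; ties keep input order)."""
    weights = {}
    for reg, weight in cr_values.items():
        unit = reg[:-len("_CR")] if reg.endswith("_CR") else reg
        weights[unit] = weight
    return sorted(weights, key=weights.get, reverse=True)


order = priority_order({"ALU_CR": 5, "LDU_CR": 4, "STU_CR": 3, "PMT_CR": 2})
print(" > ".join(order))  # ALU > LDU > STU > PMT
```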
In this example, the instruction analysis module analyzes the instructions once every clock cycle.
It should be understood that, in the following tables, an instruction may be in one of four states: dispatch, issue, execution complete, and write-back. The dispatch state indicates that the instruction has been dispatched to the reservation station, and the issue state indicates that the instruction has been issued to the execution unit, that is, it starts executing. The number given for each state in the tables indicates the clock cycle in which that state is reached.
Table 1-1 shows the states of the instructions in the instruction queue in the first clock cycle, and Table 1-2 shows the queue reordered after the first clock cycle according to the data dependencies and scheduling priority weights among the first 5 instructions in the instruction queue.
TABLE 1-1
TABLE 1-2
As can be seen from Table 1-1, among the first 5 instructions, ALU2 has no data dependency on the other instructions, while the other 4 instructions all have data dependencies. ALU2 is therefore placed at the head of the queue, and the other instructions keep their relative order from the instruction queue. The ALU2 instruction is dispatched to the first row of the reservation station, and the busy bit of that row is set to 1 to indicate that the row now holds an instruction.
Table 2-1 is the state of the instructions in the instruction queue for the second clock cycle, and Table 2-2 is the queue reordered after the second clock cycle.
TABLE 2-1
TABLE 2-2
In the second clock cycle, the first instruction, ALU2, starts executing. The LD instruction is at the head of the reordered queue, so it is dispatched to the second row of the reservation station, and the busy bit of that row is set to 1 to indicate that the row now holds an instruction. Because the first instruction in the instruction queue, LD, has not yet been executed, the sliding window cannot yet slide forward, so only four instructions are analyzed in the second clock cycle.
Table 3-1 is the state of the instructions in the instruction queue for the third clock cycle, and Table 3-2 is the queue reordered after the third clock cycle.
TABLE 3-1
TABLE 3-2
In the third clock cycle, the LD instruction starts executing. The ALU1 instruction is at the head of the reordered queue, so it is dispatched to the third row of the reservation station, and the busy bit of that row is set to 1 to indicate that the row now holds an instruction.
Table 4-1 is the state of the instructions in the instruction queue for the fourth clock cycle, and Table 4-2 is the queue reordered after the fourth clock cycle.
TABLE 4-1
TABLE 4-2
In the fourth clock cycle, ALU2 writes back, LD is in the execute stage, ALU1 is in the issue stage, and ALU3 is dispatched to the fourth row of the reservation station, whose busy bit is set to 1 to indicate that the row now holds an instruction.
Table 5-1 is the state of the instructions in the instruction queue for the fifth clock cycle, and Table 5-2 is the queue reordered after the fifth clock cycle.
TABLE 5-1
TABLE 5-2
In the fifth clock cycle, LD writes back, ALU1 is in the execute stage, ALU3 is in the issue stage, and the STU instruction is dispatched to the fifth row of the reservation station, whose busy bit is set to 1 to indicate that the row now holds an instruction.
Table 6-1 is the state of the instructions in the instruction queue for the sixth clock cycle, and Table 6-2 is the queue reordered after the sixth clock cycle.
TABLE 6-1
TABLE 6-2
In the sixth clock cycle, ALU1 writes back, ALU3 is in the execute stage, the STU instruction is in the issue stage, and the PMT instruction is dispatched to the sixth row of the reservation station, whose busy bit is set to 1 to indicate that the row now holds an instruction.
Table 7-1 is the state of the instructions in the instruction queue for the seventh clock cycle, and Table 7-2 is the reordered queue after the seventh clock cycle.
TABLE 7-1
TABLE 7-2
In the seventh clock cycle, ALU3 writes back, the STU instruction is in the execute stage, and the PMT instruction is in the issue stage.
Table 8-1 is the state of the instructions in the instruction queue for the eighth clock cycle, and Table 8-2 is the queue reordered after the eighth clock cycle.
TABLE 8-1
TABLE 8-2
In the eighth clock cycle, the STU instruction writes back and the PMT instruction is in the execute stage.
Table 9-1 is the state of the instructions in the instruction queue for the ninth clock cycle, and Table 9-2 is the reordered queue after the ninth clock cycle.
TABLE 9-1
TABLE 9-2
In the ninth clock cycle, the instruction PMT writes back.
Table 10 is a timing table for scheduling the instructions in the instruction queue using the Tomasulo algorithm:
TABLE 10
With the scheduling method of this embodiment, executing the instructions takes 9 clock cycles in total, whereas the Tomasulo algorithm takes 11 clock cycles; the parallel efficiency of instruction execution is thus improved.
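For scale, the improvement can be expressed relative to the Tomasulo schedule. The following arithmetic uses only the two cycle counts quoted above (9 and 11); the percentage and speedup are derived values for this single six-instruction example and should not be read as general results:

```latex
\Delta = 11 - 9 = 2 \text{ cycles}, \qquad
\frac{\Delta}{11} = \frac{2}{11} \approx 18.2\,\%, \qquad
\text{speedup} = \frac{11}{9} \approx 1.22
```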
The apparatus embodiments of the present application are described in detail above with reference to FIG. 2 to FIG. 9, and the method embodiments are described in detail below with reference to FIG. 10. It should be understood that the method embodiments correspond to the apparatus embodiments; for similar descriptions, refer to the apparatus embodiments.
FIG. 10 is a schematic flowchart of an instruction scheduling method 400 according to an embodiment of the present application. The method 400 may be performed by the instruction scheduling apparatus in the foregoing embodiments, or by a processing chip including the instruction scheduling apparatus. The method 400 includes at least some of the following steps:
S410: Determine a target dispatch instruction in the instruction queue according to the data dependencies between instructions in the instruction queue and/or the scheduling policy control information, and dispatch the target dispatch instruction from the instruction queue to the reservation station, where the scheduling policy control information indicates the scheduling priorities of the instructions of different execution units.
In some embodiments, the scheduling policy control information is retrieved from the first control register CR.
In some embodiments, each execution unit corresponds to a first CR, and a value of the first CR corresponding to the execution unit is used to indicate a weight of a scheduling priority of an instruction of the execution unit.
In some embodiments, determining the target dispatch instruction in the instruction queue according to the data dependencies between instructions in the instruction queue and/or the scheduling policy control information includes:
when the reservation station has free space, determining the target dispatch instruction in the instruction queue according to the data dependencies between instructions in the instruction queue and/or the scheduling policy control information.
In some embodiments, determining the target dispatch instruction in the instruction queue according to the data dependencies between instructions in the instruction queue and/or the scheduling policy control information includes:
ordering the first N instructions in the instruction queue according to the data dependencies between the instructions to obtain a first queue;
ordering L instructions in the first queue according to the scheduling policy control information to obtain a second queue, where L is less than or equal to N; and
determining the target dispatch instruction from the second queue.
In some embodiments, the N instructions do not include a control and status register (CSR) instruction used to set the scheduling policy control information.
In some embodiments, ordering the L instructions in the first queue according to the scheduling policy control information to obtain the second queue includes:
when the first queue contains a plurality of first-type instructions, ordering the L instructions in the first queue according to the scheduling policy control information to obtain the second queue, where the L instructions are the first-type instructions, and a first-type instruction has no data dependency on the instructions preceding it in the instruction queue.
In some embodiments, the target dispatch instruction includes the L instructions if L is less than or equal to K; or
If L is greater than K, the target dispatch instruction includes the first K instructions in the second queue;
where K is the maximum number of instructions that can be dispatched to the reservation station.
In some embodiments, determining the target dispatch instruction in the instruction queue according to the data dependencies between instructions in the instruction queue and/or the scheduling policy control information includes:
ordering the first N instructions in the instruction queue according to the data dependencies between the instructions to obtain a first queue; and
determining the target dispatch instruction in the first queue.
In some embodiments, the target dispatch instruction includes one or more first-type instructions in the first queue, a first-type instruction being one that has no data dependency on the instructions preceding it in the instruction queue.
In some embodiments, if the number of first-type instructions in the first queue is less than or equal to K, the target dispatch instruction includes all of the first-type instructions in the first queue; or
if the number of first-type instructions in the first queue is greater than K, the target dispatch instruction includes the first K first-type instructions in the first queue;
where K is the maximum number of instructions that can be dispatched to the reservation station.
In some embodiments, if none of the N instructions is a first-type instruction, the order of the N instructions in the first queue is the same as their order in the instruction queue; or
if some of the N instructions are first-type instructions, the first-type instructions among the N instructions are placed at the head of the first queue and the other instructions at the tail of the first queue; or
if all of the N instructions are first-type instructions, the order of the N instructions in the first queue is the same as their order in the instruction queue.
In some embodiments, determining the target dispatch instruction in the instruction queue according to the data dependencies between instructions in the instruction queue and/or the scheduling policy control information includes:
ordering the first N instructions in the instruction queue according to the scheduling policy control information to obtain a third queue; and
determining the target dispatch instruction in the third queue.
In some embodiments, the target dispatch instruction includes the first K instructions in the third queue, where K is the maximum number of instructions that can be dispatched to the reservation station.
In some embodiments, K is obtained from a second CR.
In some embodiments, each execution unit corresponds to one second CR; or
all execution units correspond to the same second CR.
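The two second-CR configurations can be modeled by letting the second-CR value be either a single integer shared by all execution units or a per-unit table. This is a hypothetical sketch of how K might be looked up; `dispatch_limit` is not a name from the text.

```python
def dispatch_limit(second_cr, unit=None):
    """Return K. `second_cr` is either an int (one second CR shared by all
    execution units) or a dict unit -> int (one second CR per unit)."""
    if isinstance(second_cr, int):
        return second_cr
    if unit is None:
        raise ValueError("per-unit second CRs need an execution unit")
    return second_cr[unit]


print(dispatch_limit(2))                          # 2 (shared second CR)
print(dispatch_limit({"ALU": 2, "LDU": 1}, "LDU"))  # 1 (per-unit second CR)
```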
In some embodiments, the method further includes:
when the target dispatch instruction is dispatched to the reservation station, setting the latch state of the registers associated with the target dispatch instruction to locked; and
when the target dispatch instruction has been executed by the corresponding execution unit, setting the latch state of those registers to unlocked.
In some embodiments, the method further includes:
when the target dispatch instruction has been dispatched to the reservation station, receiving first control information from a first execution unit, the first control information instructing the scheduling logic module to check the data dependencies between the instructions in the reservation station to determine the issuable instructions.
In some embodiments, the method further includes:
after the target dispatch instruction is issued to the first execution unit, receiving second control information from the first execution unit, the second control information instructing the scheduling logic module to clear the information related to the target dispatch instruction from the reservation station.
In some embodiments, all execution units share a reservation station; or
Each execution unit corresponds to one reservation station; or
Each group of execution units corresponds to one reservation station, wherein one group of execution units comprises one execution unit or comprises a plurality of execution units.
Fig. 11 is a schematic structural diagram of a processing chip of an embodiment of the present application. As shown in fig. 11, the processing chip 700 may include: a communication interface 701, a memory 702, a processor 703 and a communication bus 704. The communication interface 701, the memory 702, and the processor 703 realize communication with each other through the communication bus 704. The communication interface 701 is used for the processing chip 700 to perform data communication with an external device. The memory 702 may be used for storing software programs and modules, and the processor 703 may operate the software programs and modules stored in the memory 702, for example, the software programs of the corresponding operations in the foregoing method embodiments.
In some embodiments of the present application, the processor 703 may invoke the software programs and modules stored in the memory 702 to perform the following operations: determining a target dispatch instruction in an instruction queue according to data dependencies between instructions in the instruction queue and/or scheduling policy control information, where the scheduling policy control information indicates the scheduling priorities of the instructions of different execution units; and dispatching the target dispatch instruction from the instruction queue to a reservation station.
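The dispatch step can be sketched as a per-cycle routine with a pluggable selection rule. This is a minimal sketch that rests on three assumptions:

- instructions are unique string labels;
- nothing is dispatched when the reservation station has no free space, as in claims 4 and 24;
- no more instructions are dispatched than there are free slots.

`dispatch_cycle`, `in_order`, and `Selector` are hypothetical names.

```python
from collections import deque
from typing import Callable, Sequence

# Hypothetical selection rule: given the queued instructions (oldest first) and
# the number of free reservation-station slots, return the instructions to dispatch.
Selector = Callable[[Sequence[str], int], list]


def dispatch_cycle(queue: deque, rs: list, rs_capacity: int, select: Selector) -> list:
    """Move the selected target instructions from the queue to the reservation station."""
    free = rs_capacity - len(rs)
    if free <= 0 or not queue:
        return []  # no free space (or nothing queued): nothing is dispatched this cycle
    targets = select(list(queue), free)[:free]
    for instr in targets:
        queue.remove(instr)  # a selected instruction may leave the queue out of order
        rs.append(instr)
    return targets


def in_order(instrs: Sequence[str], free: int) -> list:
    # Trivial rule for illustration: the oldest instructions first.
    return list(instrs[:free])
```

The selection rule is kept separate so that dependency-based, policy-based, or combined orderings can be swapped in without changing the dispatch step itself.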
Optionally, the processing chip 700 may be integrated into a terminal or server that has a memory, a processor, and computing capability, such as a tablet computer, a mobile phone, or a notebook computer; alternatively, the processing chip 700 may itself be the terminal or server.
Optionally, the processing chip may be a dedicated processing chip or a general-purpose processing chip.
It should be understood that the processing chip mentioned in the embodiments of the present application may also be referred to as a system-level chip, a system chip, a chip system, or a system-on-chip.
In some embodiments, the present application further provides a computer device comprising a memory and a processor, where the memory stores a computer program and the processor implements the steps of the foregoing method embodiments when executing the computer program.
In some embodiments, the computer device may be a cell phone, a personal computer, a server, a network device, or the like.
An embodiment of the present application further provides a computer-readable storage medium for storing a computer program. The computer-readable storage medium may be applied to a computer device, and the computer program causes the computer device to perform the corresponding procedures of the instruction scheduling method in the embodiments of the present application. For brevity, details are not repeated here.
An embodiment of the present application further provides a computer program product, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the corresponding procedures of the instruction scheduling method in the embodiments of the present application. For brevity, details are not repeated here.
An embodiment of the present application further provides a computer program, including computer instructions stored in a computer-readable storage medium. A processor of a computer device reads the computer instructions from the computer-readable storage medium and executes them, causing the computer device to perform the corresponding procedures of the instruction scheduling method in the embodiments of the present application. For brevity, details are not repeated here.
It should be understood that the processor in the embodiments of the present application may be an integrated circuit chip with signal processing capability. During implementation, the steps of the foregoing method embodiments may be completed by integrated logic circuits of hardware in the processor or by instructions in the form of software. The processor may be a general-purpose processor, a digital signal processor (DSP), an application-specific integrated circuit (ASIC), a field-programmable gate array (FPGA) or another programmable logic device, a discrete gate or transistor logic device, or a discrete hardware component, and may implement or perform the methods, steps, and logic blocks disclosed in the embodiments of the present application. A general-purpose processor may be a microprocessor, or the processor may be any conventional processor. The steps of the methods disclosed with reference to the embodiments of the present application may be performed directly by a hardware decoding processor, or by a combination of hardware and software modules in a decoding processor. The software module may reside in a storage medium well known in the art, such as a random access memory (RAM), flash memory, read-only memory (ROM), programmable ROM (PROM), erasable PROM (EPROM), or registers. The storage medium is located in the memory, and the processor reads information from the memory and completes the steps of the foregoing methods in combination with its hardware.
It can be clearly understood by those skilled in the art that, for convenience and simplicity of description, the specific working processes of the above-described systems, apparatuses and units may refer to the corresponding processes in the foregoing method embodiments, and are not described herein again.
In the several embodiments provided in the present application, it should be understood that the disclosed system, apparatus and method may be implemented in other ways. For example, the above-described apparatus embodiments are merely illustrative, and for example, the division of the units is only one type of logical functional division, and other divisions may be realized in practice, for example, multiple units or components may be combined or integrated into another system, or some features may be omitted, or not executed. In addition, the shown or discussed mutual coupling or direct coupling or communication connection may be an indirect coupling or communication connection through some interfaces, devices or units, and may be in an electrical, mechanical or other form.
The units described as separate parts may or may not be physically separate, and parts displayed as units may or may not be physical units, may be located in one position, or may be distributed on multiple network units. Some or all of the units can be selected according to actual needs to achieve the purpose of the solution of the embodiment.
In addition, functional units in the embodiments of the present application may be integrated into one processing unit, or each unit may exist alone physically, or two or more units are integrated into one unit.
The functions, if implemented in the form of software functional units and sold or used as a stand-alone product, may be stored in a computer readable storage medium. Based on such understanding, the technical solution of the present application or portions thereof that substantially contribute to the prior art may be embodied in the form of a software product stored in a storage medium and including instructions for causing a computer device (which may be a mobile phone, a personal computer, a server, or a network device) to perform all or part of the steps of the method according to the embodiments of the present application. And the aforementioned storage medium includes: a U-disk, a removable hard disk, a Read-Only Memory (ROM), a Random Access Memory (RAM), a magnetic disk, an optical disk, or other various media capable of storing program codes.
The foregoing descriptions are merely specific embodiments of the present application, and the protection scope of the present application is not limited thereto. Any variation or substitution readily conceivable by a person skilled in the art within the technical scope disclosed in the present application shall fall within the protection scope of the present application. Therefore, the protection scope of the present application shall be subject to the protection scope of the claims.

Claims (43)

1. An instruction scheduling apparatus applied in a processing chip, the processing chip including a plurality of execution units, the instruction scheduling apparatus comprising:
and the scheduling logic module, configured to determine a target dispatch instruction in the instruction queue according to data dependencies between instructions in the instruction queue and/or scheduling policy control information, and to dispatch the target dispatch instruction from the instruction queue to the reservation station, wherein the scheduling policy control information indicates the scheduling priorities of the instructions of the plurality of execution units.
2. The apparatus according to claim 1, wherein the scheduling policy control information is obtained from a first control register (CR).
3. The apparatus of claim 2, wherein each execution unit of the plurality of execution units corresponds to a first CR, and wherein a value of the first CR for the execution unit is used to indicate a weight of a scheduling priority of the instructions of the execution unit.
4. The apparatus of any of claims 1-3, wherein the scheduling logic module is further configured to:
determine the target dispatch instruction in the instruction queue according to data dependencies between instructions in the instruction queue and/or the scheduling policy control information when the reservation station has free space.
5. The apparatus of any of claims 1-4, wherein the scheduling logic module is specifically configured to:
sort the first N instructions in the instruction queue according to data dependencies between instructions in the instruction queue to obtain a first queue;
sort L instructions in the first queue according to the scheduling policy control information to obtain a second queue, wherein L is less than or equal to N; and
determining the target dispatch instruction from the second queue.
6. The apparatus of claim 5, wherein the N instructions do not include a control and status register (CSR) instruction, the CSR instruction being used to set the scheduling policy control information.
7. The apparatus of claim 5, wherein the scheduling logic module is further configured to:
when a plurality of first-class instructions exist in the first queue, sort L instructions in the first queue according to the scheduling policy control information to obtain the second queue, wherein the L instructions are the plurality of first-class instructions, and a first-class instruction has no data dependency on any other instruction that precedes it in the instruction queue.
8. The apparatus of claim 7,
if L is less than or equal to K, the target dispatch instruction includes the L instructions; or
if L is greater than K, the target dispatch instruction includes the first K instructions in the second queue;
where K is the maximum number of instructions that can be dispatched to the reservation station.
9. The apparatus of any of claims 1-4, wherein the scheduling logic module is specifically configured to:
sort the first N instructions in the instruction queue according to data dependencies between instructions in the instruction queue to obtain a first queue;
determining the target dispatch instruction in the first queue.
10. The apparatus of claim 9, wherein the target dispatch instruction comprises one or more first-class instructions in the first queue, a first-class instruction having no data dependency on any other instruction that precedes it in the instruction queue.
11. The apparatus of claim 10, wherein if the number of first-class instructions in the first queue is less than or equal to K, the target dispatch instruction comprises all first-class instructions in the first queue; or
if the number of first-class instructions in the first queue is greater than K, the target dispatch instruction comprises the first K first-class instructions in the first queue;
where K is the maximum number of instructions that can be dispatched to the reservation station.
12. The apparatus according to any one of claims 5 to 11,
if none of the N instructions is a first-class instruction, the N instructions are arranged in the first queue in the same order as in the instruction queue; or
if some of the N instructions are first-class instructions, the first-class instructions among the N instructions are arranged at the head of the first queue and the non-first-class instructions among the N instructions are arranged at the tail of the first queue; or
if all of the N instructions are first-class instructions, the N instructions are arranged in the first queue in the same order as in the instruction queue.
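The selection rules of claims 5 to 12 can be sketched together. This is a minimal sketch, not the claimed implementation, and it makes three assumptions:

- a data dependency means a RAW, WAW, or WAR conflict through a shared register;
- a larger first-CR weight means a higher scheduling priority (claim 3 says only that the value indicates a weight);
- instructions keep queue order within each part of the first queue.

`Instr`, `first_queue`, `select_with_policy`, and `select_dependency_only` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    # Hypothetical instruction record; `unit` names the target execution unit.
    name: str
    unit: str
    dst: tuple[int, ...] = ()
    src: tuple[int, ...] = ()


def depends(younger: Instr, older: Instr) -> bool:
    # Assumed dependency test: RAW, WAW or WAR through a shared register.
    return bool(set(younger.src) & set(older.dst)
                or set(younger.dst) & set(older.dst)
                or set(younger.dst) & set(older.src))


def first_queue(queue: list[Instr], n: int) -> tuple[list[Instr], list[Instr]]:
    """Claims 5 and 12: stable-partition the first N instructions.

    Returns (first queue, first-class instructions). An instruction is
    first-class if it depends on no instruction ahead of it in the queue.
    First-class instructions go to the head and the rest to the tail; keeping
    queue order inside each part is an assumption for the mixed case.
    """
    window = queue[:n]
    flags = [not any(depends(x, y) for y in window[:i]) for i, x in enumerate(window)]
    head = [x for x, ok in zip(window, flags) if ok]
    tail = [x for x, ok in zip(window, flags) if not ok]
    return head + tail, head


def select_with_policy(queue: list[Instr], n: int, k: int,
                       weights: dict[str, int]) -> list[Instr]:
    """Claims 5, 7 and 8: sort the L first-class instructions by unit weight, keep up to K."""
    _, first_class = first_queue(queue, n)
    # Larger weight = higher priority (assumed); sorted() is stable, so ties keep queue order.
    second_queue = sorted(first_class, key=lambda x: -weights.get(x.unit, 0))
    return second_queue[:k]  # all L instructions if L <= K, otherwise the first K


def select_dependency_only(queue: list[Instr], n: int, k: int) -> list[Instr]:
    """Claims 9 to 11: up to K first-class instructions, in first-queue order."""
    _, first_class = first_queue(queue, n)
    return first_class[:k]
```

The head of the instruction queue has no predecessors, so it is always first-class. A non-empty window therefore always yields at least one candidate.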
13. The apparatus of any of claims 1-4, wherein the scheduling logic module is specifically configured to:
sort the first N instructions in the instruction queue according to the scheduling policy control information to obtain a third queue; and
determining the target dispatch instruction in the third queue.
14. The apparatus of claim 13, wherein the target dispatch instruction comprises the first K instructions in the third queue, where K is the maximum number of instructions that can be dispatched to the reservation station.
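Claims 13 and 14 skip the dependency stage and order the window by policy alone. This is a minimal sketch. It again assumes that a larger first-CR weight means a higher priority and that ties keep queue order. `Instr`, `third_queue`, and `select_policy_only` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Instr:
    # Hypothetical instruction record; `unit` names the target execution unit.
    name: str
    unit: str


def third_queue(queue: list[Instr], n: int, weights: dict[str, int]) -> list[Instr]:
    """Claim 13: order the first N instructions by their execution unit's weight.

    Assumes a larger weight means a higher scheduling priority; sorted() is
    stable, so instructions with equal weight keep their queue order.
    """
    return sorted(queue[:n], key=lambda x: -weights.get(x.unit, 0))


def select_policy_only(queue: list[Instr], n: int, k: int,
                       weights: dict[str, int]) -> list[Instr]:
    """Claim 14: the target dispatch instructions are the first K of the third queue."""
    return third_queue(queue, n, weights)[:k]
```

This variant does not look at data dependencies when it dispatches. Any dependency checking would then fall to the reservation station, for example the check triggered by the first control information in claim 18.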
15. The apparatus of claim 8, 11 or 14, wherein the K is obtained from a second CR.
16. The apparatus of claim 15,
each execution unit of the plurality of execution units corresponds to a second CR, or,
the plurality of execution units correspond to the same second CR.
17. The apparatus of any of claims 1-16, wherein the scheduling logic module is further configured to:
set the latch state of the registers associated with the target dispatch instruction to locked when the target dispatch instruction is dispatched to the reservation station; and
set the latch state of the registers associated with the target dispatch instruction to unlocked when the target dispatch instruction has been executed by the corresponding execution unit.
18. The apparatus of any of claims 1-17, wherein the scheduling logic module is further configured to:
receive first control information from a first execution unit when the target dispatch instruction is dispatched to the reservation station, the first control information instructing the scheduling logic module to check data dependencies between instructions in the reservation station to determine issuable instructions.
19. The apparatus of any of claims 1-18, wherein the scheduling logic module is further configured to:
after the target dispatch instruction is issued to a first execution unit, receive second control information from the first execution unit, the second control information instructing the scheduling logic module to clear information related to the target dispatch instruction from the reservation station.
20. The apparatus of any one of claims 1-19, wherein
the plurality of execution units share one reservation station; or
each execution unit of the plurality of execution units corresponds to one reservation station; or
the plurality of execution units are divided into multiple groups, each group of execution units corresponds to one reservation station, and one group of execution units comprises one execution unit or a plurality of execution units.
21. An instruction scheduling method, the method comprising:
determining a target dispatch instruction in an instruction queue according to data dependencies between instructions in the instruction queue and/or scheduling policy control information, and dispatching the target dispatch instruction from the instruction queue to a reservation station, wherein the scheduling policy control information indicates the scheduling priorities of the instructions of different execution units.
22. The method of claim 21, wherein the scheduling policy control information is obtained from a first control register (CR).
23. The method of claim 22, wherein each execution unit corresponds to a first CR, and wherein the value of the first CR corresponding to the execution unit is used to indicate a weight of a scheduling priority of the instruction of the execution unit.
24. The method of any of claims 21-23, wherein determining a target dispatch instruction in an instruction queue based on data dependencies and/or scheduling policy control information between instructions in the instruction queue comprises:
determining the target dispatch instruction in the instruction queue according to data dependencies between instructions in the instruction queue and/or the scheduling policy control information when the reservation station has free space.
25. The method of any of claims 21-24, wherein determining a target dispatch instruction in an instruction queue based on data dependencies and/or scheduling policy control information between instructions in the instruction queue comprises:
sorting the first N instructions in the instruction queue according to data dependencies between instructions in the instruction queue to obtain a first queue;
sorting L instructions in the first queue according to the scheduling policy control information to obtain a second queue, wherein L is less than or equal to N; and
determining the target dispatch instruction from the second queue.
26. The method of claim 25, wherein the N instructions do not include a control and status register (CSR) instruction, the CSR instruction being used to set the scheduling policy control information.
27. The method of claim 25, wherein the sorting the L instructions in the first queue according to the scheduling policy control information to obtain a second queue comprises:
when a plurality of first-class instructions exist in the first queue, sorting L instructions in the first queue according to the scheduling policy control information to obtain the second queue, wherein the L instructions are the first-class instructions, and a first-class instruction has no data dependency on any other instruction that precedes it in the instruction queue.
28. The method of claim 27, wherein if L is less than or equal to K, the target dispatch instruction includes the L instructions; or
if L is greater than K, the target dispatch instruction includes the first K instructions in the second queue;
where K is the maximum number of instructions that can be dispatched to the reservation station.
29. The method of any of claims 21-24, wherein determining a target dispatch instruction in an instruction queue based on data dependencies and/or scheduling policy control information between instructions in the instruction queue comprises:
sorting the first N instructions in the instruction queue according to data dependencies between instructions in the instruction queue to obtain a first queue;
determining the target dispatch instruction in the first queue.
30. The method of claim 29, wherein the target dispatch instruction comprises one or more first-class instructions in the first queue, a first-class instruction having no data dependency on any other instruction that precedes it in the instruction queue.
31. The method of claim 30, wherein if the number of first-class instructions in the first queue is less than or equal to K, the target dispatch instruction comprises all first-class instructions in the first queue; or
if the number of first-class instructions in the first queue is greater than K, the target dispatch instruction comprises the first K first-class instructions in the first queue;
where K is the maximum number of instructions that can be dispatched to the reservation station.
32. The method of any one of claims 25-31,
if none of the N instructions is a first-class instruction, the N instructions are arranged in the first queue in the same order as in the instruction queue; or
if some of the N instructions are first-class instructions, the first-class instructions among the N instructions are arranged at the head of the first queue and the non-first-class instructions among the N instructions are arranged at the tail of the first queue; or
if all of the N instructions are first-class instructions, the N instructions are arranged in the first queue in the same order as in the instruction queue.
33. The method of any of claims 21-24, wherein determining a target dispatch instruction in an instruction queue based on data dependencies and/or scheduling policy control information between instructions in the instruction queue comprises:
sorting the first N instructions in the instruction queue according to the scheduling policy control information to obtain a third queue; and
determining the target dispatch instruction in the third queue.
34. The method of claim 33, wherein the target dispatch instruction comprises the first K instructions in the third queue, where K is the maximum number of instructions that can be dispatched to the reservation station.
35. The method of claim 28, 31 or 34, wherein the K is obtained from a second CR.
36. The method of claim 35,
each execution unit corresponds to a second CR, or,
all execution units correspond to the same second CR.
37. The method according to any one of claims 21-36, further comprising:
setting the latch state of the registers associated with the target dispatch instruction to locked when the target dispatch instruction is dispatched to the reservation station; and
setting the latch state of the registers associated with the target dispatch instruction to unlocked when the target dispatch instruction has been executed by the corresponding execution unit.
38. The method according to any one of claims 21-37, further comprising:
when the target dispatch instruction is dispatched to the reservation station, receiving first control information from a first execution unit, the first control information being used to instruct the scheduling logic module to check data dependencies between instructions in the reservation station to determine an issuable instruction.
39. The method according to any one of claims 21-38, further comprising:
after the target dispatch instruction is issued to a first execution unit, receiving second control information from the first execution unit, the second control information being used to instruct the scheduling logic module to clear information related to the target dispatch instruction from the reservation station.
40. The method of any one of claims 21-39,
all execution units share one reservation station; or
each execution unit corresponds to one reservation station; or
each group of execution units corresponds to one reservation station, wherein one group of execution units comprises one execution unit or a plurality of execution units.
41. A chip comprising a processor for calling and running a computer program from a memory so that a device on which the chip is installed performs the method of any of claims 21 to 40.
42. A computer device comprising a processor and a memory for storing a computer program, the processor being configured to invoke and execute the computer program stored in the memory such that the computer device performs the method of any of claims 21 to 40.
43. A computer-readable storage medium for storing a computer program which causes a computer to perform the method of any one of claims 21 to 40.
Priority Applications (1)

Application Number: CN202211103555.XA
Priority Date: 2022-09-09
Filing Date: 2022-09-09
Title: Instruction scheduling apparatus, method, chip, and computer-readable storage medium
Status: Pending

Publications (1)

CN115454506A, published 2022-12-09

Family ID: 84303863 (one family application, CN202211103555.XA)

Country Status: CN (1), CN115454506A

Legal Events

Date Code Title Description
PB01 Publication
SE01 Entry into force of request for substantive examination